Hi there, I've been running my own little Lemmy instance, basically to see how it handles federation and the like. I have had registration open (with email validation) for the odd person or two who might want to use it.

Early this morning (4:43am my time) about 15 new users registered at the exact same time, all with the same name structure: random words followed by 4 numbers. Since they all signed up within 1 minute of each other, they are obviously bots.

Going through the UI, I have not been able to find a way to remove them. I have since changed my registration policy to require an application, a captcha, and email validation, to help avoid polluting the ecosystem with bots.

Any help would be appreciated. I am running it all under Docker.

  • b3nsn0w@pricefield.org
    10 upvotes · edited · 1 year ago

    the only mass solution i found was to install pgadmin, log into the db, and manually remove all the bot accounts from local_user. you should also remove them from the person table so they don’t show up in your instance stats (you can easily find them by running SELECT * FROM person WHERE local = true ORDER BY published DESC in the query tool), but removing them from local_user is enough to stop them from logging in.
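
    If that query returns a long list, a small shell filter can narrow it to names shaped like the ones in the post (letters followed by four digits). This is only a sketch: flag_suspect_names is a made-up helper name, and the pattern is an assumption about what the bot names look like. A match is a hint, not proof, since username generators produce the same shape for real people.

    ```shell
    # Hypothetical helper: read usernames on stdin (one per line) and print
    # only those made of letters followed by exactly four digits.
    flag_suspect_names() {
      grep -E '^[A-Za-z]+[0-9]{4}$'
    }

    # Possible usage from a shell inside the Postgres container, assuming the
    # default 'lemmy' database user (-A: unaligned output, -t: rows only):
    #   psql -U lemmy -At -c "SELECT name FROM person WHERE local = true" | flag_suspect_names
    ```

    Comparing the flagged names against the published timestamps from the query is what separates a burst of bot sign-ups from a lone real user.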

  • cosmic_slate@dmv.social
    8 upvotes · 1 year ago

    If you’re running a small instance for experimentation and play, IMO you should seriously consider not leaving registration open and using manual approval instead.

    If I were a malicious bot operator, I’d look for instances that are as minimally staffed as possible, with no signs that the admins are keeping as close an eye on things as admins elsewhere do.

  • Trapping5341@lemmy.world
    7 upvotes · 1 year ago

    I feel attacked. 😂

    In all seriousness, they probably are bots, but I personally just let Bitwarden make me a username, and this is the default way it generates one.

  • Ricaz@lemmy.world
    6 upvotes · 1 year ago

    This is a problem for any web application. There are many solutions, none are perfect.

    On some sites (like 4chan) you’re required to solve a captcha every single time you post, unless you pay a yearly fee not to.

    To avoid it, you would need people actively monitoring, banning, and setting up bot detection patterns.

    Then again, there are cheap services online where real people are hired to create human accounts and spam you anyway, so…

    • talung@lemmy.talung.org (OP)
      3 upvotes · 1 year ago

      And how would I do that in the UI? That is the issue: I haven’t found a way to even find those users on my system, even though it shows 15 extra users.

      • Wander@yiffit.net
        2 upvotes · 1 year ago

        Did you find a solution? The database query in the comment above should work. You can open a shell in the Docker container where the database is running with docker exec -it instancedomain_postgres_1 busybox /bin/sh and then run psql -U databaseuser, where the database user is ‘lemmy’ by default.

        Run docker ps to find the exact name of the Postgres container, which in your case is likely lemmytalungorg_postgres_1.

        • talung@lemmy.talung.org (OP)
          3 upvotes · edited · 1 year ago

          Thanks, I have gone through and identified the REAL accounts, gathered their IDs, and deleted the rest from the local_user and person tables.

          I haven’t really played much with Postgres, so it took some time to look up how it all works.

          EDIT: yup, made sure the person one was local only :)
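
          For anyone repeating this, the keep-list approach can be sketched as a small shell helper that only prints the SQL, so it can be read before anything runs. Assumptions: gen_cleanup_sql is a hypothetical name, local_user is assumed to reference person through a person_id column, and the output ends in ROLLBACK so a trial run changes nothing. The keep list has to include your own admin account.

          ```shell
          # Hypothetical helper: print (do not run) SQL that deletes every local
          # account whose person ID is NOT in the list given as arguments.
          gen_cleanup_sql() {
            if [ "$#" -eq 0 ]; then
              echo "usage: gen_cleanup_sql KEEP_ID [KEEP_ID ...]" >&2
              return 1
            fi
            keep=""
            for id in "$@"; do
              # Accept plain integers only, so nothing unexpected ends up in the SQL.
              case "$id" in
                ''|*[!0-9]*) echo "not a numeric id: $id" >&2; return 1 ;;
              esac
              keep="${keep:+$keep, }$id"
            done
            printf 'BEGIN;\n'
            # Assumption: local_user.person_id points at person.id.
            printf 'DELETE FROM local_user WHERE person_id NOT IN (%s);\n' "$keep"
            # "local = true" keeps federated (remote) users out of the delete.
            printf 'DELETE FROM person WHERE local = true AND id NOT IN (%s);\n' "$keep"
            printf 'ROLLBACK; -- change to COMMIT; once the reported row counts look right\n'
          }

          # Possible usage inside the Postgres container, keeping person IDs 2 and 5:
          #   gen_cleanup_sql 2 5 | psql -U lemmy
          ```

          Because of the ROLLBACK, psql reports how many rows each DELETE would remove without keeping the change, which makes it easy to check the counts against the number of bot sign-ups first.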

  • BrightSide@demotheque.com
    2 upvotes · 1 year ago

    I experienced this on my own little instance as well, although they managed to create 69000+ users before I noticed that something weird might be going on and I got a message from a helpful soul indicating the same thing.

    I was pointed to a writeup by lemmy.ninja @ https://lemmy.ninja/post/30492 on how they experienced this and proceeded to clean everything up, and adapted this to my own instance. Only thing to mention of note was that I disabled triggers while doing the cleanup. The DELETE statements on the person table were excruciatingly slow, to the point where I estimated that the time to completion approached 48 hours.

    • RotaryKeyboard@lemmy.ninja
      1 upvote · 1 year ago

      Only thing to mention of note was that I disabled triggers while doing the cleanup.

      Could you explain what that does and how you did it? I can add that to the post as a note.

      • BrightSide@demotheque.com
        2 upvotes · edited · 1 year ago

        Sure. Triggers are normally a good idea as they make sure that data is consistent. For example, when you delete a user, a trigger will run to also decrease the number of users by one. But since they run for every row, they can certainly impact performance. Foreign key checks are also implemented as triggers, so if you’re missing an index and the db has to crawl through huge tables of data for every delete (I suspect this was the cause of the slowdown during DELETE), that too will affect performance.

        While I don’t do it often, here is the superuser command I use in psql to disable the triggers before doing any other commands:

        SET session_replication_role TO replica;

        This is something Postgres uses internally when applying replication data, as it assumes all the data is correct and valid and doesn’t fire any of the triggers or rules that would normally apply when modifying data. As you can see from the name, this is a session setting. If you quit the db session, everything goes back to normal, so nothing is permanently changed and you don’t run the risk of forgetting to change it back when you’re done.

        If you do want to go back to normal operation during the same session for some reason, this gets you back to the default:

        SET session_replication_role TO origin;
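
        A variation that may be safer in practice is to scope the setting to a single transaction with SET LOCAL, which Postgres reverts at COMMIT or ROLLBACK. The sketch below is a shell helper (without_triggers is a made-up name) that wraps whatever SQL it reads on stdin; it assumes the input does not contain its own BEGIN or COMMIT. Keep in mind that replica mode also skips the foreign-key triggers that would cascade deletes, so dependent rows have to be removed explicitly.

        ```shell
        # Hypothetical helper: wrap SQL read from stdin in one transaction in
        # which triggers are disabled. SET LOCAL only lasts until the
        # transaction ends, so the session returns to normal by itself.
        without_triggers() {
          printf 'BEGIN;\n'
          printf 'SET LOCAL session_replication_role TO replica;\n'
          cat
          printf 'COMMIT;\n'
        }

        # Possible usage (needs a superuser, like the plain SET above);
        # cleanup.sql is a placeholder for your own reviewed DELETE statements:
        #   without_triggers < cleanup.sql | psql -U lemmy
        ```

        Since the counter-updating triggers are skipped as well, the instance stats will not adjust on their own after a cleanup done this way.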

        And thank you for the helpful post about the bot problem in the first place.