this post was submitted on 20 Mar 2025
512 points (99.6% liked)

Technology

67536 readers
4682 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related news or articles.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
  9. Check for duplicates before posting, duplicates may be removed
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[โ€“] wjs018@piefed.social 114 points 1 week ago* (last edited 1 week ago) (7 children)

Really great piece. We have recently seen many popular lemmy instances struggle under recent scraping waves, and that is hardly the first time its happened. I have some firsthand experience with the second part of this article that talks about AI-generated bug reports/vulnerabilities for open source projects.

I help maintain a python library and got a bug report a couple weeks back of a user getting a type-checking issue and a bit of additional information. It didn't strictly follow the bug report template we use, but it was well organized enough, so I spent some time digging into it and came up with no way to reproduce this at all. Thankfully, the lead maintainer was able to spot the report for what it was and just closed it and saved me from further efforts to diagnose the issue (after an hour or two were burned already).

[โ€“] BrianTheeBiscuiteer@lemmy.world 6 points 1 week ago* (last edited 1 week ago) (1 children)

Testing out a theory with ChatGPT there might be a way, albeit clunky, to detect AI. I asked ChatGPT a simple math question then told it to disregard the rest of the message, then I asked it if it was AI. It answered the math question and told me it was ai. Now a bot probably won't admit to being AI but it might be foolish enough to consider instruction that you explicitly told it not to follow.

Or you might simply be able to waste its resources by asking it to do something computationally difficult that most people would just reject outright.

Of course all of this could just result in making AI even harder to detect once it learns these tricks. ๐Ÿ˜ฌ

[โ€“] itsralC@lemm.ee 3 points 6 days ago

These aren't actual LLMs scraping the web, they're your usual scraping bots used in an industrial scale, disregarding conventions about what they should or shouldn't scrape.

load more comments (5 replies)