this post was submitted on 18 Aug 2025
1099 points (99.1% liked)

Technology

[–] Wispy2891@lemmy.world 13 points 3 weeks ago (2 children)

Question: do those artificial stupidity bots want to steal the issues, or the code? Because why are they wasting so many resources scraping millions of pages when they could steal everything via SSH (once a month, not 120 times a second)?
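(A toy sketch of that point, with a made-up repo URL: one SSH clone pulls every file plus the full history, instead of hitting millions of rendered pages.)

```python
# Hypothetical example: fetch an entire repo once over SSH instead of crawling it.
import subprocess
from pathlib import Path

# Repo URL is invented for illustration only.
subprocess.run(
    ["git", "clone", "git@codeberg.org:example/project.git", "project"],
    check=True,
)

for path in Path("project").rglob("*"):
    if path.is_file():
        text = path.read_text(errors="ignore")  # all the code, from one cheap fetch a month
```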

[–] lime@feddit.nu 5 points 3 weeks ago

they just want all text

[–] bizza@lemmy.zip 13 points 3 weeks ago

I use Anubis on my personal website, not because I think anything I’ve written is important enough that companies would want to scrape it, but as a “fuck you” to those companies regardless

That the bots are learning to get around it is disheartening; Anubis was a pain to set up and get running.

[–] mfed1122@discuss.tchncs.de 13 points 3 weeks ago* (last edited 3 weeks ago) (7 children)

Okay what about...what about uhhh... Static site builders that render the whole page out as an image map, making it visible for humans but useless for crawlers 🤔🤔🤔
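(A minimal sketch of that idea, assuming Pillow; the function name and page text are made up, and a real site generator would still need to emit an HTML image map for the links.)

```python
# Render a page's text into an image so it stays readable to humans
# but isn't plain text for a crawler. Purely illustrative.
from PIL import Image, ImageDraw

def render_page_as_image(text: str, path: str) -> None:
    img = Image.new("RGB", (800, 40 + 20 * text.count("\n")), "white")
    draw = ImageDraw.Draw(img)
    draw.multiline_text((20, 20), text, fill="black")  # default bitmap font
    img.save(path)

render_page_as_image("Hello, human readers.\nNothing to scrape here.", "page.png")
```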

[–] echodot@feddit.uk 7 points 3 weeks ago (1 children)

AI is pretty good at OCR now. I think that would just make it worse for humans while making very little difference to the AI.

[–] mfed1122@discuss.tchncs.de 4 points 3 weeks ago (1 children)

The crawlers themselves are likely not AI, though, and yes, OCR could be done effectively without AI anyway. This idea ultimately boils down to the same hope Anubis had: making the processing cost high enough that scraping isn't worth it.
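(For scale, a rough sketch of how cheap the counter-move is, assuming Tesseract via pytesseract rather than any large model: one call turns the "image-only" page back into text.)

```python
# Reads the rendered page image back into plain text with classic OCR.
from PIL import Image
import pytesseract  # requires the Tesseract binary to be installed

text = pytesseract.image_to_string(Image.open("page.png"))
print(text)  # the "image-only" page is machine-readable again
```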

[–] nymnympseudonym@lemmy.world 6 points 3 weeks ago (2 children)

OCR could be done effectively without AI

OCR has been done with neural nets since before convolutional networks took off in the 2010s

[–] iopq@lemmy.world 4 points 3 weeks ago

AI these days reads text from images better than humans can

[–] Monument@lemmy.sdf.org 10 points 3 weeks ago

Increasingly, I’m reminded of this: Paul Bunyan vs. the spam bot (or how Paul Bunyan triggered the singularity to win a bet). It’s a medium-length read from the old internet, but fun.

[–] Goretantath@lemmy.world 9 points 3 weeks ago (4 children)

I knew that was the worse option. Use the one that traps them in an infinite maze.
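(A toy sketch of that kind of crawler maze, assuming Flask; real tarpits along these lines exist, this is just the shape of the trick: every URL returns filler text plus links to more generated URLs, so a crawler never runs out of pages.)

```python
# Minimal crawler-maze sketch: each path deterministically generates
# filler text and links to five more paths, forever.
import hashlib
from flask import Flask

app = Flask(__name__)

@app.route("/", defaults={"path": ""})
@app.route("/<path:path>")
def maze(path):
    seed = hashlib.sha256(path.encode()).hexdigest()
    links = "".join(
        f'<p><a href="/{seed[i:i + 8]}">{seed[i:i + 8]}</a></p>' for i in range(0, 40, 8)
    )
    return f"<html><body><p>Lorem ipsum {seed}</p>{links}</body></html>"

if __name__ == "__main__":
    app.run()
```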
