this post was submitted on 19 Aug 2025
827 points (99.3% liked)

[–] FauxLiving@lemmy.world -5 points 5 days ago* (last edited 5 days ago) (6 children)

The number of people just reacting to the headline in the comments on these kinds of articles is always surprising.

Your browser acts as an agent too; you don’t manually visit every script link, image source and CSS file. Everyone has experienced how annoying it is to have your browser be targeted by Cloudflare.

There’s a pretty major difference between a human user loading a page and having it summarized and a bot that is scraping 1500 pages/second.

Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short-sighted. They exist to provide their clients with services, including bot mitigation. But a user-initiated operation isn’t the same as a bot.

Which is the point of the article and the article’s title.

It isn’t clear why OP had to alter the headline to bait the anti-ai crowd.

[–] spankmonkey@lemmy.world 12 points 5 days ago (1 children)

But a user-initiated operation isn’t the same as a bot.

Oh fuck off with that AI company propaganda.

The AI companies already overwhelmed sites to get training data and are repeating their shitty scraping practices when users interact with their AI. It's the same fucking thing.

Web crawlers for search engines don't scrape pages every time a user searches like AI does. Both web crawlers and scrapers are bots, and how a human initiates their operation, scheduled or not, doesn't matter as much as the fact that they do things very differently and only one of the two respects robots.txt.
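For anyone unsure what "respects robots.txt" means in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleAIBot` user agent, and the `is_allowed` helper are made up for illustration; this is not a claim about how any particular crawler or AI tool actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this example).
# A real crawler would fetch this file from the site it wants to visit.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed("ExampleAIBot", "https://example.com/article"))
    print(is_allowed("SomeBrowser", "https://example.com/article"))
```

A bot that respects robots.txt runs a check like this before every fetch and skips the URL when the answer is no. Nothing enforces it on the server side, which is why sites end up reaching for bot mitigation.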

[–] _cryptagion@lemmy.dbzer0.com 4 points 5 days ago (1 children)

Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short-sighted. They exist to provide their clients with services, including bot mitigation.

Well I suppose it's a good thing then that the anti-AI shield is opt-in, and Cloudflare isn't making any decisions for anyone on whether or not AI scrapers get to visit their pages. That little bit of context makes your entire argument fall apart.

[–] FauxLiving@lemmy.world 0 points 5 days ago (2 children)

It isn’t opt-in.

You can block all bot page scraping, which also blocks user-initiated AI tools, or you can block no traffic.

There isn’t an option to block bot page scraping but allow user-initiated AI tools.

Because, as the article points out, Cloudflare is not able to distinguish between the two.

[–] ubergeek@lemmy.today 2 points 5 days ago (1 children)

That's not true. I just viewed my panel in CF, and Perplexity is an optional block, which is off by default.

[–] FauxLiving@lemmy.world 1 points 5 days ago (1 children)

They must be A/B testing a new feature then; it’s not on mine.

[–] ubergeek@lemmy.today 3 points 5 days ago

Log into your dashboard, click "AI Audit", and you'll see the toggles.

[–] _cryptagion@lemmy.dbzer0.com 1 points 5 days ago (1 children)

For site owners, there’s no appreciable difference between the two in how they affect systems.

[–] FauxLiving@lemmy.world 1 points 5 days ago

There’s a pretty significant difference in request rate. A tool trying to search and summarize will hit a search engine once, and each website maybe 5 times (if every search engine link points to the site).

A bot trying to scrape content from a website can generate thousands or tens of thousands of requests per second.
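To make the request-rate difference concrete, here is a rough sketch of a per-client sliding-window counter. The `RequestRateTracker` class, the one-second window, and the threshold of 20 requests are all assumptions picked for illustration; this is not how Cloudflare or any real bot-mitigation product classifies traffic.

```python
from collections import deque


class RequestRateTracker:
    """Counts each client's requests inside a sliding time window."""

    def __init__(self, window_seconds: float = 1.0, max_requests: int = 20):
        # Both values are illustrative assumptions, not real-world thresholds.
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self._timestamps = {}  # client id -> deque of request times

    def record(self, client_id: str, now: float) -> bool:
        """Record one request at time `now` (seconds).

        Returns True if the client is over the threshold for the window.
        """
        times = self._timestamps.setdefault(client_id, deque())
        times.append(now)
        # Drop requests that have fallen out of the window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        return len(times) > self.max_requests


if __name__ == "__main__":
    tracker = RequestRateTracker()
    # A summarizer tool: about 5 requests spread over a couple of seconds.
    print(any(tracker.record("summarizer", i * 0.5) for i in range(5)))
    # A scraper: 1500 requests within a single second.
    print(any(tracker.record("scraper", i / 1500) for i in range(1500)))
```

With these numbers the summarizer never trips the threshold and the scraper trips it almost immediately. A rate check like this only measures volume per client; it says nothing about whether a human initiated the request.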

[–] OmgItBurns@discuss.online 4 points 5 days ago

I think part of the issue is that it does act more like a search engine crawler than a traditional user. A lot of sites rely on real human traffic for revenue (serving ads, requests to sign up for Patreon, using affiliate links, etc.) that gets bypassed by these bots. Hell, in some cases the people running the sites are just looking for interaction. So while there is a spike in traffic, and potentially cost, the people running these sites aren't getting the benefit of that traffic.

Basically, these have the same issues as the summaries that Google shows in its search results but, potentially, a much larger impact on the host's bandwidth.

[–] ubergeek@lemmy.today 2 points 4 days ago

Cheering for Cloudflare to be the arbiter of what technologies are allowed is incredibly short-sighted.

Except they aren't. It's a toggle available to users, and by default it allows Perplexity's scraping.

[–] HarkMahlberg@kbin.earth 1 points 5 days ago (1 children)

In a better timeline, we wouldn't need to cheer the victory of one megacorporation over another; they would both be the losers. But also people are still capable of holding two thoughts simultaneously.

For instance, we'd all be happy to see Apple lose the Epic Games lawsuit and be forced out of their monopoly on app stores on iOS. But those same people are aware it would allow Epic to continue being a disgusting company.

bait the anti-ai crowd

Oh I see lol

[–] FauxLiving@lemmy.world 0 points 5 days ago

What does any of that have to do with the fact that Cloudflare isn’t able to classify traffic in order to distinguish between human user generated traffic and mass scraping bot traffic?

If they’re incapable of distinguishing the two, then their customers are having legitimate user requests blocked by Cloudflare with no ability to opt out.

Oh I see lol

Yeah, I think people who’re unable to think rationally about a problem because they made up their mind before knowing any of the details are intellectually lazy.

[–] unpossum@sh.itjust.works -2 points 5 days ago (1 children)

Thank you for trying to fight the irrational anti-AI brainrot on lemmy! It’s probably a lost cause, but your efforts are appreciated :)

[–] FauxLiving@lemmy.world 1 points 5 days ago* (last edited 5 days ago)

It’s an uphill battle. Lots of motivated reasoning and bad-faith arguments.

e: looks like Cloudflare is adding this distinction in their control panel. So it seems like they, too, disagree with the brain rot. Source: https://lemmy.world/post/34677771/18880370