this post was submitted on 19 Aug 2025
827 points (99.3% liked)

Technology

Glitchvid@lemmy.world 240 points 4 days ago

When a firm outright admits to bypassing or trying to bypass measures taken to keep them out, you'd think that would be a slam dunk case of unauthorized access under the CFAA with felony enhancements.

GamingChairModel@lemmy.world 95 points 4 days ago

Fuck that. I don't need prosecutors and the courts to rule that accessing publicly available information in a way that the website owner doesn't want is literally a crime. That logic would extend to ad blockers and editing HTML/js in an "inspect element" tab.

EncryptKeeper@lemmy.world 55 points 4 days ago

That logic would not extend to ad blockers, as the point of concern is gaining unauthorized access to a computer system or asset. Blocking ads would not be considered gaining unauthorized access to anything. In fact it would be the opposite of that.

GamingChairModel@lemmy.world 18 points 4 days ago

gaining unauthorized access to a computer system

And my point is that defining "unauthorized" to include visitors using unauthorized tools/methods to access a publicly visible resource would be a policy disaster.

If I put a banner on my site that says "by visiting my site you agree not to modify the scripts or ads displayed on the site," does that make my visit with an ad blocker "unauthorized" under the CFAA? I think the answer should obviously be "no," and that the way to define "authorization" is whether the website puts up some kind of login/authentication mechanism to block or allow specific users, not whether it makes a simple request to the visiting public to please respect the rules of the site.

To me, a robots.txt is more like a friendly request to unauthenticated visitors than it is a technical implementation of some kind of authentication mechanism.

Scraping isn't hacking. I agree with the Third Circuit and the EFF: If the website owner makes resources available to visitors without authentication, then accessing those resources isn't a crime, even if the website owner didn't intend for site visitors to use that specific method.
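
To make the robots.txt point concrete: the check happens entirely on the crawler's side, and nothing on the server enforces it. A minimal sketch using Python's standard-library `urllib.robotparser`, assuming a hypothetical site whose robots.txt disallows a crawler that identifies itself as `PerplexityBot` and allows everyone else (the file contents and the helper name `crawler_may_fetch` are illustrative, not taken from any real site):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: block one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_may_fetch(user_agent: str, url: str) -> bool:
    """Return what a *cooperating* crawler would decide for this URL.

    The server never runs this check; a client that skips it, or that sends a
    different User-Agent, is not stopped by robots.txt at all.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


print(crawler_may_fetch("PerplexityBot", "https://example.com/article"))  # False
print(crawler_may_fetch("Mozilla/5.0 (X11; Linux x86_64)", "https://example.com/article"))  # True
```

The second call is the crux: a client that announces itself as a browser gets a "yes" from the very same file, which is why robots.txt only works against crawlers that identify themselves honestly.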

Glitchvid@lemmy.world 19 points 4 days ago (last edited 4 days ago)

When sites put up challenges like Anubis or other measures to authenticate that the viewer isn't a robot, and scrapers then employ measures to thwart that authentication (via spoofing or other means), I think that's a reasonable violation of the CFAA in spirit, especially since these mass scraping activities are getting attention for the damage they are causing to site operators (another factor in the CFAA, and one that would promote this to felony activity).

The fact is these laws are already on the books, we may as well utilize them to shut down this objectively harmful activity AI scrapers are doing.

kibiz0r@midwest.social 28 points 4 days ago

They already prosecute people under the unauthorized access provision. They just don’t prosecute rich people under it.

floquant@lemmy.dbzer0.com 233 points 4 days ago

It's difficult to be a shittier company than OpenAI, but Perplexity seems to be trying hard.

Brunbrun6766@lemmy.world 69 points 4 days ago

Step 1, SOMEHOW find a more punchable face than Altman

WolfLink@sh.itjust.works 146 points 4 days ago

This is a nice CloudFlare ad

pyre@lemmy.world 28 points 4 days ago

yeah. still not worth dealing with fucking cloudflare. fuck cloudflare.

Kissaki@feddit.org 100 points 4 days ago (last edited 4 days ago)

Perplexity argues that a platform’s inability to differentiate between helpful AI assistants and harmful bots causes misclassification of legitimate web traffic.

So, I assume Perplexity uses appropriate identifiable user-agent headers, to allow hosters to decide whether to serve them one way or another?
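
The system being alluded to is simple on the server side: the site reads the declared `User-Agent` header and decides what to serve. A minimal sketch, assuming a site owner with a hypothetical block list (`PerplexityBot` and `GPTBot` are crawler names those companies have documented, but the list, the function name `decide`, and the 403/200 policy are illustrative only):

```python
from typing import Optional

# Hypothetical block list chosen by a site owner; matched case-insensitively as substrings.
BLOCKED_AGENT_TOKENS = ("perplexitybot", "gptbot")


def decide(user_agent: Optional[str]) -> int:
    """Return the HTTP status a site might send for a request with this User-Agent.

    403 for self-identified crawlers on the block list, 200 for everything else.
    This only works if the client tells the truth about who it is.
    """
    ua = (user_agent or "").lower()
    if any(token in ua for token in BLOCKED_AGENT_TOKENS):
        return 403
    return 200


print(decide("Mozilla/5.0 (compatible; PerplexityBot/1.0)"))  # 403
print(decide("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))    # 200
```

That second case is why the "stealth scraping" accusation matters: a header-only check like this cannot tell a crawler sending a browser-like User-Agent from a real browser.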

lime@feddit.nu 37 points 4 days ago

yeah it's almost like there was already a system for this in place

JeeBaiChow@lemmy.world 87 points 4 days ago
Amberskin@europe.pub 70 points 3 days ago

Uh, are they admitting they are trying to circumvent technological protections set up to restrict access to a system?

Isn’t that a literal computer crime?

dinckelman@lemmy.world 41 points 3 days ago

No-no, see. When an AI-first company does it, it's actually called courageous innovation. Crimes are for poor people

utopiah@lemmy.world 16 points 3 days ago

(puts on evil hat) CloudFlare should DRM their protection then DMCA Perplexity and other US based "AI" companies to oblivion. Side effect: it might break the Internet.

frezik@lemmy.blahaj.zone 70 points 4 days ago

Traveling snake oil salesman complains he can't pick people's locks.

tibi@lemmy.world 67 points 4 days ago

You could say they are... Perplexed.

cupcakezealot@piefed.blahaj.zone 63 points 4 days ago
boonhet@sopuli.xyz 50 points 4 days ago

As far as security is concerned, their w's are pretty common tbh. It's just the whole centralization issue.

NotASharkInAManSuit@lemmy.world 54 points 4 days ago

That’s the entire point, dipshit. I wish we got one of the cool techno dystopias rather than this boring corporate idiot one.

Dojan@pawb.social 15 points 4 days ago

I'm still holding out for Stephen Hawking to mail out Demon Summoning programs.

EtherWhack@lemmy.world 50 points 4 days ago
sylver_dragon@lemmy.world 48 points 4 days ago

You'd think that a competent technology company, with their own AI would be able to figure out a way to spoof Cloudflare's checks. I'd still think that.

spankmonkey@lemmy.world 65 points 4 days ago (last edited 4 days ago)

Or find a more efficient way to manage data, since their current approach is basically DDoSing the internet for training data and also for responding to user interactions.

Quill7513@slrpnk.net 29 points 4 days ago

see, but they're not competent. further, they don't care. most of these ai companies are snake oil. they're selling you a solution that doesn't meaningfully solve a problem. their main way of surviving is saying "this is what it can do now, just imagine what it can do if you invest money in my company."

they're scammers, the lot of them, running Ponzi schemes with our money. if the planet dies for it, that's no concern of theirs. Ponzi schemes require the schemer to have no long-term plan, just a line of credit that they can keep drawing from until they skip town before the tax collector comes.

lemmyng@piefed.ca 19 points 4 days ago

Perplexity: "But that would cost us moneeyyyy!"

ubergeek@lemmy.today 44 points 4 days ago

Good. I went through my CF panel, and blocked some of those "AI Assistants" that by default were open, including Perplexity's.

Electricd@lemmybefree.net 38 points 3 days ago

I don't like Cloudflare, but it's nice that they allow people to stop AI scraping if they want to.

tempest@lemmy.ca 26 points 3 days ago

CloudFlare has become an Internet protection racket and I'm not happy about it.

Laser@feddit.org 20 points 3 days ago

It's been this way from the very beginning. But they don't fit the definition of a protection racket, as they're not the ones attacking you if you don't pay up. So they're more like a security company that has no competitors due to the investment needed to operate.

iAvicenna@lemmy.world 35 points 4 days ago
prex@aussie.zone 24 points 4 days ago

They tried nothing & they're all out of ideas.

TheGrandNagus@lemmy.world 32 points 3 days ago

Can't believe I've lived to see Cloudflare be the good guys

peoplebeproblems@midwest.social 32 points 4 days ago

Well... Good.

wosat@lemmy.world 29 points 4 days ago

This is why companies like Perplexity and OpenAI are creating browsers.

good, that means it’s working

I’m gonna be frustrated (though not surprised) if the response is anything other than this.

Ermiar@lemmy.world 20 points 4 days ago (last edited 4 days ago)
kescusay@lemmy.world 20 points 4 days ago

I set up a WAF for my company's publicly facing developer portal to block out bot traffic from assholes like these guys. It reduced bot traffic to the site by something like - I kid you not - 99.999%.

Fucking data vultures.

BaroqueInMind@piefed.social 19 points 4 days ago

Cry more, Perplexity.

SugarCatDestroyer@lemmy.world 18 points 4 days ago (last edited 4 days ago)

It seems like it's some kind of distraction to make people think things aren't as bad as they really are; it just sounds too far-fetched to me.

It's like a bear that has eaten too much and starts whining because a small rabbit is running away from him, even though the bear has already eaten almost all the rabbits and is clearly full.

Ekybio@lemmy.world 18 points 4 days ago

Can someone with more knowledge shine a bit more light on this whole situation? I'm out of the loop on the technical details.

spankmonkey@lemmy.world 51 points 4 days ago

AI crawlers tend to overwhelm websites by scraping data in the least efficient way possible, basically DDoSing a huge portion of the internet. Perplexity already scraped the net for training data and is now hammering it inefficiently for searches.

Cloudflare is just trying to keep the bots from overwhelming everything.

panda_abyss@lemmy.ca 31 points 4 days ago (last edited 4 days ago)

Cloudflare runs as a CDN/cache/gateway service in front of a ton of websites. Their service is to help protect against DDoS and malicious traffic.

A few weeks ago cloudflare announced they were going to block AI crawling (good, in my opinion). However they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.

This is a response to that from Perplexity, an AI search company. I don’t actually know how their service works, but they were specifically called out in the announcement, and Cloudflare accused them of “stealth scraping”, ignoring robots.txt, and other things.

very_well_lost@lemmy.world 29 points 4 days ago (last edited 4 days ago)

A few weeks ago cloudflare announced they were going to block AI crawling (good, in my opinion). However they also added a paid service that these AI crawlers can use, so it actually becomes a revenue source for them.

I think it's also worth pointing out that all of the big AI companies are currently burning through cash at an absolutely astonishing rate, and none of them are anywhere close to being profitable. So pay-walling the data they use is probably gonna be pretty painful for their already-tortured bottom line (good).

BetaDoggo_@lemmy.world 19 points 4 days ago (last edited 4 days ago)

Perplexity (an "AI search engine" company with $500 million in funding) can't bypass Cloudflare's anti-bot checks. For each search, Perplexity scrapes the top results and summarizes them for the user. Cloudflare intentionally blocks Perplexity's scrapers because they ignore robots.txt and mimic real users to get around Cloudflare's blocking features. Perplexity argues that their scraping is acceptable because it's user-initiated.

Personally, I think Cloudflare is in the right here. The scraped sites get zero revenue from Perplexity searches (unless the user decides to go through the sources section and click the links), and Perplexity's scraping is unnecessarily traffic-intensive since they don't cache the scraped data.
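
The caching point is easy to see in code. A minimal sketch, assuming a hypothetical service that fetches the same result pages for many users' searches: a time-limited in-memory cache means repeat lookups within the TTL cost the origin site nothing. `CachingFetcher`, the injected `fetch` function, and the one-hour TTL are illustrative choices, not anything Perplexity is known to use.

```python
import time
from typing import Callable, Dict, Tuple


class CachingFetcher:
    """Serve repeat requests for the same URL from memory for `ttl_seconds`.

    `fetch` is whatever actually hits the origin site (hypothetical here);
    `clock` is injectable so the expiry behaviour can be tested.
    """

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}
        self.origin_requests = 0  # how many times we actually hit the site

    def get(self, url: str) -> str:
        now = self._clock()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: no traffic to the origin
        body = self._fetch(url)
        self.origin_requests += 1
        self._cache[url] = (now, body)
        return body


# 1,000 searches that all cite the same article cost the site one request.
fetcher = CachingFetcher(lambda url: f"<html>{url}</html>")
for _ in range(1000):
    fetcher.get("https://example.com/article")
print(fetcher.origin_requests)  # 1
```

A real system would more likely honor the site's own Cache-Control headers than a fixed TTL; the fixed value here just keeps the sketch short.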
