this post was submitted on 24 May 2025
1 point (100.0% liked)

Science Memes

(page 3) 50 comments
[–] ZeffSyde@lemmy.world 0 points 14 hours ago (1 children)

I'm imagining a bleak future where, in order to access data from a website, you have to pass a three-tiered system of tests that makes 'click here to prove you aren't a robot' and 'select all of the images that have a traffic light' seem like child's play.

[–] Tiger_Man_@lemmy.blahaj.zone 0 points 12 hours ago (1 children)

All you need to protect data from AI is to use a non-HTTP protocol, at least for now.

[–] Bourff@lemmy.world 0 points 12 hours ago

Easier said than done. I know of IPFS, but how widespread and easy to use is it?

[–] Tiger_Man_@lemmy.blahaj.zone 0 points 12 hours ago (1 children)

How can I make something like this?

[–] Iambus@lemmy.world 0 points 12 hours ago

Typical bluesky post

[–] gmtom@lemmy.world 0 points 11 hours ago (1 children)

Cool, but as with most of the anti-AI tricks, it's completely trivial to work around. So you might stop them for a week or two, but they'll add like 3 lines of code to detect this and it'll become useless.
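For illustration of the "few lines of code" workaround this comment describes, here is a hypothetical sketch (the function name, depth limit, and per-host cap are all illustrative, not taken from any real crawler): a scraper can sidestep an infinite link maze simply by bounding crawl depth and pages-per-host.

```python
from urllib.parse import urlparse

# Illustrative limits a crawler might use to escape a link maze.
MAX_DEPTH = 3
MAX_PAGES_PER_HOST = 256

def should_follow(url: str, depth: int, seen_per_host: dict) -> bool:
    """Return True if a toy crawler should fetch this URL.

    A tar pit relies on the crawler following links forever; capping
    depth and per-host page counts defeats that in a few lines.
    """
    host = urlparse(url).netloc
    seen = seen_per_host.setdefault(host, 0)
    if depth > MAX_DEPTH or seen >= MAX_PAGES_PER_HOST:
        return False
    seen_per_host[host] = seen + 1
    return True
```

This is why simple mazes only buy time; the more durable defenses discussed below aim at the economics of scraping rather than at trapping the crawler outright.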

[–] JackbyDev@programming.dev 0 points 10 hours ago (2 children)

I hate this argument. All cyber security is an arms race. If this helps small site owners stop small bot scrapers, good. Solutions don't need to be perfect.

[–] ByteOnBikes@slrpnk.net 0 points 7 hours ago (2 children)

I worked at a major tech company in 2018 that didn't take security seriously, because that was literally their philosophy: refuse to do anything until it's an absolutely perfect security solution, and treat everything else as wasted resources.

I've since left, and I continue to see them in the news for data leaks.

Small brain people man.

[–] Joeffect@lemmy.world 0 points 6 hours ago

Did they lock their doors?

[–] JackbyDev@programming.dev 0 points 6 hours ago

So many companies let perfect become the enemy of good, and it's insane. Recently, a discussion about getting our team to use a consistent formatting scheme devolved into this type of thing. If the thing being proposed is better than what we currently have, let's implement it as is; then, if you have concerns about ways to make it better, let's address those later in another iteration.

[–] moseschrute@lemmy.world 0 points 8 hours ago

I bet someone like Cloudflare could bounce them around traps across multiple domains under their DNS and make the trap harder to detect.

[–] stm@lemmy.dbzer0.com 0 points 10 hours ago

Such a stupid title, great software!

[–] antihumanitarian@lemmy.world 0 points 6 hours ago (1 children)

Some details: one of the major players using the tar-pit strategy is Cloudflare. They're a giant in networking and infrastructure, and they use AI (more traditional models, not LLMs) ubiquitously to detect bots. So it is an arms race, but one where both sides have massive incentives.

Generating nonsense is indeed detectable, but that misunderstands the purpose: economics. Scraping bots are used because they're a cheap way to get training data. If you make a non-zero portion of training data poisonous, scrapers have to spend increasingly many resources to filter it out. The better the nonsense, the harder it is to detect. Cloudflare is known to use small LLMs to generate the nonsense, hence requiring systems at least that complex to differentiate it.

So, in short, the tar pit with garbage data actually decreases the average value of scraped data for bots that ignore do-not-scrape instructions.
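To make the poisoning idea concrete, here is a toy Markov-chain babbler; this is a sketch of the general technique only, not Nepenthes' or Cloudflare's actual implementation (all names here are invented). Each word is chosen based only on the previous word, so the output is locally plausible yet globally meaningless, which is exactly what makes it cheap to produce and costly to filter.

```python
import random
from collections import defaultdict

def build_chain(corpus: str) -> dict:
    """Map each word to the list of words that follow it in the corpus."""
    words = corpus.split()
    chain = defaultdict(list)
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain

def babble(chain: dict, length: int = 50, seed: int = 0) -> str:
    """Emit locally plausible but meaningless text from the chain."""
    rng = random.Random(seed)
    word = rng.choice(list(chain))
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        # Dead end (word never followed by anything): restart anywhere.
        word = rng.choice(followers) if followers else rng.choice(list(chain))
        out.append(word)
    return " ".join(out)
```

Because every bigram in the output occurs somewhere in real text, simple statistical filters pass it, which is why, as the comment notes, telling good garbage apart can require models comparable to the ones being poisoned.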

[–] fossilesque@mander.xyz 0 points 4 hours ago

The fact the internet runs on lava lamps makes me so happy.

[–] mlg@lemmy.world 0 points 5 hours ago

--recurse-depth=3 --max-hits=256

[–] Novocirab@feddit.org 0 points 4 hours ago* (last edited 4 hours ago) (1 children)

There should be a federated system for blocking IP ranges that other server operators within a chain of trust have already identified as belonging to crawlers.

(Here's an advantage of Markov chain maze generators like Nepenthes: Even when crawlers recognize that they have been served garbage and delete it, one still has obtained highly reliable evidence that the IPs that requested it do, in fact, belong to crawlers.)
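A minimal sketch of that evidence-gathering idea, assuming a hypothetical trap path `/maze/` that only the maze generator links to and serves (the path, function names, and export format are all illustrative): any request under it is strong evidence of an automated link-follower, and the flagged IPs could then be shared with trusted peer operators.

```python
import json
import time

TRAP_PREFIX = "/maze/"  # hypothetical path served only by the tar pit

def record_crawler(ip: str, path: str, evidence: dict) -> bool:
    """Flag an IP as a crawler if it requested a trap-only URL.

    No human-facing page links into TRAP_PREFIX, so a hit there is
    reliable evidence of automated link-following, even if the
    crawler later discards the garbage it was served.
    """
    if not path.startswith(TRAP_PREFIX):
        return False
    evidence[ip] = {"path": path, "ts": int(time.time())}
    return True

def export_blocklist(evidence: dict) -> str:
    """Serialize flagged IPs for sharing with trusted peer operators."""
    return json.dumps(sorted(evidence), indent=2)
```

A federated version would mainly need to add signing and a chain-of-trust check before importing another operator's list, which is roughly the niche CrowdSec (mentioned below in the thread) occupies commercially.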

[–] Opisek@lemmy.world 0 points 4 hours ago (2 children)

You might want to take a look at CrowdSec if you don't already know it.

[–] rekabis@lemmy.ca 0 points 3 hours ago* (last edited 3 hours ago)

Holy shit, those prices. Like, I wouldn't be able to afford any package at even 10% of the going rate.

Anything available for the lone operator running a handful of Internet-addressable servers behind a single symmetrical SOHO connection? As in, anything for the other 95% of us who don't have literal mountains of cash to burn?

[–] Novocirab@feddit.org 0 points 3 hours ago* (last edited 3 hours ago) (1 children)

Thanks. Makes sense that things roughly along those lines already exist, of course. CrowdSec's pricing, which apparently starts at $900/month, seems forbiddingly expensive for most small-to-medium projects, though. Do you, or does anyone else, know of a similar solution for small or even nonexistent budgets? (Personally, I'm not running any servers or projects right now, but I may in the future.)
