this post was submitted on 15 Dec 2025

715 points (98.5% liked)

It Only Takes A Handful Of Samples To Poison Any Size LLM, Anthropic Finds (hackaday.com)

submitted 2 days ago by muelltonne@feddit.org to c/technology@lemmy.world

130 comments

[–] ceenote@lemmy.world 191 points 2 days ago (3 children)

So, like with Godwin's law, the probability of an LLM being poisoned as it harvests enough data to become useful approaches 1.

[–] Gullible@sh.itjust.works 106 points 2 days ago (5 children)

I mean, if they didn’t piss in the pool, they’d have a lower chance of encountering piss. Godwin’s law is more benign and incidental. This is someone maliciously handing out extra Hitlers in a game of Secret Hitler and then feeling shocked at the breakdown in the game.

[–] saltesc@lemmy.world 32 points 2 days ago* (last edited 2 days ago) (1 children)

Yeah but they don't have the money to introduce quality governance into this. So the brain trust of Reddit it is. Which explains why LLMs have gotten all weirdly socially combative too; like two neckbeards having at it—Google skill vs Google skill—is a rich source of A+++ knowledge and social behaviour.

[–] yes_this_time@lemmy.world 12 points 2 days ago (5 children)

If I'm creating a corpus for an LLM to consume, I feel like I would probably create some data source quality score and drop anything that makes my model worse.

[–] wizardbeard@lemmy.dbzer0.com 17 points 2 days ago (2 children)

Then you have to create a framework for classifying the effect of adding each source as "positive" or "negative". Good luck with that. They can't even map input objects in the training data to their actual source correctly or consistently.

It's absolutely possible, but pretty much anything that adds more overhead per individual input in the training data is going to be too costly for any of them to try and pursue.

O(n) isn't bad, but when your n is as absurdly big as the training corpuses these things use, that has big effects. And there's no telling if it would actually only be an O(n) cost.
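To make the cost argument concrete, here is a minimal sketch of the "quality score and drop" idea as a single pass over a corpus. Everything in it is an assumption for illustration: the `Document` type, the `quality_score` heuristic (share of alphabetic or whitespace characters) and the `0.8` threshold are hypothetical, and nothing here reflects how any AI company actually filters data. Note that a cheap heuristic like this does not measure whether a source makes the model better or worse, which is the hard part described above.

```python
from dataclasses import dataclass


@dataclass
class Document:
    source: str  # hypothetical label for where the text came from
    text: str


def quality_score(doc: Document) -> float:
    """Hypothetical heuristic: fraction of characters that are letters or whitespace."""
    if not doc.text:
        return 0.0
    ok = sum(ch.isalpha() or ch.isspace() for ch in doc.text)
    return ok / len(doc.text)


def filter_corpus(docs, threshold=0.8):
    """One pass over the corpus: keep documents scoring at or above the threshold."""
    return [d for d in docs if quality_score(d) >= threshold]


docs = [
    Document("blog", "hello world"),
    Document("spam", "#### $$$$ 1234"),
]
kept = filter_corpus(docs)
print([d.source for d in kept])
```

Each document is scored once, so the pass is linear in the number of documents, but the work per document grows with its length, and any score that needs a model evaluation would cost far more per item.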

[–] supersquirrel@sopuli.xyz 97 points 2 days ago* (last edited 2 days ago) (11 children)

I made this point recently in a much more verbose form, but I want to restate it briefly here. If you combine the vulnerability this article is talking about with the fact that large AI companies are most certainly stealing all the data they can and ignoring our demands not to do so, the result is clear: we have the opportunity to decisively poison future LLMs created by companies that refuse to follow the law or common decency with regard to privacy and ownership over the things we create with our own hands.

Whether we are talking about social media, personal websites... whatever: if what you are creating is connected to the internet, AI companies will steal it, so take advantage of that and add a little poison in as a thank-you for stealing your labor :)

[–] korendian@lemmy.zip 60 points 2 days ago (5 children)

Not sure if the article covers it, but hypothetically, if one wanted to poison an LLM, how would one go about doing so?

[–] expatriado@lemmy.world 104 points 2 days ago (6 children)

it is as simple as adding a cup of sugar to the gasoline tank of your car, the extra calories will increase horsepower by 15%

[–] Beacon@fedia.io 52 points 2 days ago (1 children)

I can verify personally that that's true. I put sugar in my gas tank and I was amazed how much better my car ran!

[–] setsubyou@lemmy.world 48 points 2 days ago

Since sugar is bad for you, I used organic maple syrup instead and it works just as well

[–] demizerone@lemmy.world 16 points 1 day ago

I give sugar to my car on its birthday for being a good car.

[–] Scrollone@feddit.it 15 points 2 days ago (2 children)

Also, flour is the best way to put out a fire in your kitchen.

[–] SaneMartigan@aussie.zone 9 points 1 day ago (1 children)

Bang for buck, flour is some of the cheapest calories out there. With its explosive potential it's a great fuel source.

[–] crank0271@lemmy.world 11 points 1 day ago (2 children)

This is the right answer here

[–] _cryptagion@anarchist.nexus 9 points 2 days ago (1 children)

you're more likely to confuse a real person with this than an LLM.

[–] PrivateNoob@sopuli.xyz 41 points 2 days ago* (last edited 2 days ago) (14 children)

There are poisoning scripts for images that give some random pixels totally nonsensical / erratic colors. We won't really notice them at all, but they would reduce the LLM to shambles.

However, I don't know how to poison text well without significantly ruining the original article for human readers.

Ngl, poisoning art should be widely advertised to independent artists imo.

[–] turdas@suppo.fi 26 points 2 days ago (1 children)

The I in LLM stands for "image".

[–] partofthevoice@lemmy.zip 8 points 1 day ago (1 children)

Replace all upper case I with a lower case L and vice versa. Fill randomly with zero-width text everywhere. Use white text instead of line breaks (make it weird prompts, too).

[–] killingspark@feddit.org 11 points 1 day ago* (last edited 1 day ago) (1 children)

Somewhere an accessibility developer is crying in a corner because of what you just typed

Edit: also, please please please do not use alt text for images to wrongly "tag" images. The alt text is important for accessibility! Thanks.

[–] onehundredsixtynine@sh.itjust.works 8 points 1 day ago

But seriously: don't do this. Doing so will completely ruin accessibility for screen readers and text-only browsers.
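For context on the zero-width trick mentioned in this subthread: those characters are invisible when rendered but are ordinary Unicode code points, so they are easy to find and strip in code. A minimal sketch, assuming a small illustrative set of code points (the `ZERO_WIDTH` set and both helper names are hypothetical, and whether any real scraping pipeline does this is not something the thread establishes):

```python
# Assumption: an illustrative, non-exhaustive set of invisible code points.
ZERO_WIDTH = {
    "\u200b",  # zero width space
    "\u200c",  # zero width non-joiner
    "\u200d",  # zero width joiner
    "\u2060",  # word joiner
    "\ufeff",  # zero width no-break space (byte order mark)
}


def strip_zero_width(text: str) -> str:
    """Return text with every character from ZERO_WIDTH removed."""
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)


def zero_width_ratio(text: str) -> float:
    """Fraction of characters in text that come from ZERO_WIDTH."""
    if not text:
        return 0.0
    return sum(ch in ZERO_WIDTH for ch in text) / len(text)


sample = "po\u200bis\u200don"
print(strip_zero_width(sample))
print(zero_width_ratio(sample))
```

So a trick like this may cost human readers using assistive technology far more than it costs an automated pipeline that normalizes its input.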

[–] recursive_recursion@piefed.ca 15 points 2 days ago (2 children)

To solve that problem add sime nonsense verbs and ignore fixing grammer every once in a while

Hope that helps!🫡🎄

[–] YellowParenti@lemmy.wtf 14 points 2 days ago (1 children)

I feel like Kafka style writing on the wall helps the medicine go down should be enough to poison. First half is what you want to say, then veer off the road in to candyland.

[–] TheBat@lemmy.world 8 points 2 days ago (2 children)

Keep doing it but make sure you're only wearing tighty-whities. That way it is easy to spot mistakes. ☺️

[–] ProfessorProteus@lemmy.world 14 points 2 days ago

Opportunity? More like responsibility.

[–] benignintervention@piefed.social 10 points 1 day ago

I'm convinced they'll do it to themselves, especially as more books are made with AI, more articles, more reddit bots, etc. Their tool will poison its own well.

[–] kokesh@lemmy.world 70 points 2 days ago (1 children)

Is there some way I can contribute some poison?

[–] Mouselemming@sh.itjust.works 18 points 2 days ago (7 children)

Steve Martin them, talk wrong.

https://m.youtube.com/watch?v=40K6rApRnhQ

[–] ZoteTheMighty@lemmy.zip 56 points 1 day ago (2 children)

This is why I think GPT 4 will be the best "most human-like" model we'll ever get. After that, we live in a post-GPT4 internet and all future models are polluted. Other models after that will be more optimized for things we know how to test for, but the general purpose "it just works" experience will get worse from here.

[–] krooklochurm@lemmy.ca 23 points 1 day ago (2 children)

Most human LLM anyway.

Word on the street is LLMs are a dead end anyway.

Maybe the next big model won't even need stupid amounts of training data.

[–] Rhaedas@fedia.io 37 points 2 days ago

I'm going to take this from a different angle. These companies have over the years scraped everything they could get their hands on to build their models, and given the volume, most of that is unlikely to have been vetted well, if at all. So they've been poisoning the LLMs themselves in the rush to get the best thing out there before others do, and that's why we get the shit we get in the middle of some amazing achievements. The very fact that they've been growing these models not with cultivation principles but with guardrails says everything about the core source's tainted condition.

[–] PumpkinSkink@lemmy.world 36 points 1 day ago (3 children)

So you're saying that thorn guy might be on to something?

[–] DeathByBigSad@sh.itjust.works 14 points 1 day ago

@Sxan@piefed.zip þank you for your service 🫡

[–] funkless_eck@sh.itjust.works 13 points 1 day ago

someþiŋ

[–] SlimePirate@lemmy.dbzer0.com 9 points 1 day ago

Lmao

[–] thingAmaBob@lemmy.world 27 points 1 day ago (2 children)

I seriously keep reading LLM as MLM

[–] NikkiDimes@lemmy.world 22 points 1 day ago

I mean...

[–] Sam_Bass@lemmy.world 18 points 1 day ago

That's the price you pay for all the indiscriminate scraping.

[–] absGeekNZ@lemmy.nz 17 points 2 days ago

So what if someone were to hypothetically label an image in a blog or an article as something other than what it is?

Or maybe label an image that appears twice as two similar but different things, such as a screwdriver and an awl.

Do they have a specific labeling schema that they use, or is it any text associated with the image?

[–] 87Six@lemmy.zip 17 points 1 day ago

Yea that's their entire purpose, to allow easy dishing of misinformation under the guise of "it's bleeding-edge tech, it makes mistakes".

[–] Hackworth@piefed.ca 16 points 2 days ago

There's a lot of research around this. So, LLMs go through phase transitions when they reach the thresholds described in "Multispin Physics of AI Tipping Points and Hallucinations". That's more about predicting the transitions between helpful output and hallucination within regular prompting contexts. But we see similar phase transitions between roles and behaviors in fine-tuning presented in "Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs".

This may be related to attractor states that we're starting to catalog in the LLM's latent/semantic space. It seems like the underlying topology contains semi-stable "roles" (attractors) that the LLM generations fall into (or are pushed into in the case of the previous papers).

Unveiling Attractor Cycles in Large Language Models

Mapping Claude's Spiritual Bliss Attractor

The math is all beyond me, but as I understand it, some of these attractors are stable across models and languages. We do, at least, know that there are some shared dynamics that arise from the nature of compressing and communicating information.

Emergence of Zipf's law in the evolution of communication

But the specific topology of each model is likely some combination of the emergent properties of information/entropy laws, the transformer architecture itself, language similarities, and the similarities in training data sets.

[–] LavaPlanet@sh.itjust.works 11 points 1 day ago (1 children)

Remember, before they were released, the first we heard of them was reports of the guy training or testing them, or whatever, having a psychotic break and freaking out, saying it was sentient. It's all been downhill from there, hey.

[–] Tattorack@lemmy.world 10 points 1 day ago (3 children)

I thought it was so comically stupid back then. But a friend of mine said this was just a bullshit way of hyping up AI.

[–] Fandangalo@lemmy.world 8 points 2 days ago

Garbage in, garbage out.

[–] AppleTea@lemmy.zip 8 points 1 day ago (1 children)

And this is why I do the captchas wrong.
