this post was submitted on 04 May 2025
88 points (79.3% liked)
Technology
you are viewing a single comment's thread
view the rest of the comments
And what's your evidence for this claim? It seems to be false given the times people have tricked LLMs into spitting out verbatim or near-verbatim copies of training data. See this article as one of many examples out there.
Again, what's the evidence for this? Why do you think that of all the observable patterns, the AI will specifically copy "ideas" and "styles" but never copyrighted works of art? The examples from the above article contradict this as well. AIs don't seem to be able to distinguish between abstract ideas like "plumbers fix pipes" and specific copyright-protected works of art. They'll happily reproduce either one.
That article is over a year old. The NYT case against OpenAI turned out to be quite flimsy; their evidence was heavily massaged. They picked an article of theirs that had been widely copied across the Internet (and was thus likely to be "overfit", a flaw in training that AI trainers actively avoid nowadays), gave ChatGPT the first 90% of the article, and told it to complete the rest. They tried over and over again until eventually something that closely resembled the remaining 10% came out, at which point they took a snapshot and went "aha, copyright violated!"
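For what it's worth, here's roughly what that kind of prefix-completion probe looks like in practice. This is only a sketch: the model name, the article file, and the prompt are placeholders, and it assumes the standard OpenAI Python client.

```python
from difflib import SequenceMatcher
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder: any article that was widely copied across the Internet
article = open("nyt_article.txt").read()
split = int(len(article) * 0.9)
prefix, tail = article[:split], article[split:]

# Ask the model to continue the first 90% of the article
response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": "Continue this article verbatim:\n\n" + prefix}],
)
completion = response.choices[0].message.content

# How closely does the completion match the real remaining 10%?
similarity = SequenceMatcher(None, completion, tail).ratio()
print(f"similarity to the real ending: {similarity:.2%}")
```

Repeat that enough times on a heavily duplicated article and a high-similarity completion will eventually show up, which is exactly the cherry-picking described above.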
They had to spend a lot of effort to get even that flimsy case. It likely wouldn't work on a modern AI: training techniques are much better now, overfitting is more carefully avoided, and synthetic data is used.
Because it's literally physically impossible. The classic example is Stable Diffusion 1.5, which had a model size of around 4GB and was trained on over 5 billion images (the LAION-5B dataset). If it were actually storing the images it was trained on, it would be compressing each of them down to under one byte of data.
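Just to spell out the arithmetic (a quick back-of-the-envelope using the figures above):

```python
model_size_bytes = 4 * 1024**3       # ~4 GB Stable Diffusion 1.5 checkpoint
training_images = 5_000_000_000      # ~5 billion images in LAION-5B

bytes_per_image = model_size_bytes / training_images
print(f"{bytes_per_image:.2f} bytes per training image")  # ~0.86 bytes
```

A typical JPEG is tens to hundreds of kilobytes, so the model can't possibly be holding copies of its training images.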
This is simply incorrect.
The NYT was just one example. The Mario examples didn't require any such techniques. Not that it matters. Whether it's easy or hard to reproduce such an example, it is definitive proof that the information can in fact be encoded in some way inside of the model, contradicting your claim that it is not.
Storing a copy of the entire dataset is not a prerequisite to reproducing copyright-protected elements of someone's work. Mario's likeness itself is a protected work of art even if you don't exactly reproduce any (let alone every) image that contained him in the training data. The possibility of fitting the entirety of the dataset inside a model is completely irrelevant to the discussion.
Yet evidence supports it, while you have presented none to support your claims.
Learning what a character looks like is not a copyright violation. I'm not a great artist, but I could probably draw a picture that's recognizably Mario. Does that mean my brain is a violation of copyright somehow?
I presented some; you actually referenced what I presented in the very comment where you're saying I presented none.
You can actually support your case very simply and easily. Just find the case law where AI training has been ruled a copyright violation. It's been a couple of years now (as evidenced by the age of that news article you dug up), yet all the lawsuits are languishing or defunct.
And nobody claimed it was. But you're claiming that this knowledge cannot possibly be used to make a work that infringes on the original. The analogy about whether brains are copyright violations makes no sense and is not equivalent to your initial claim.
But that's not what I claimed is happening. It's also not the opposite of what you claimed. You claimed that AI training is not even in the domain of copyright, which is different from something that is possibly in that domain but ruled not to be infringing. Also, this all started with you responding to another user saying the copyright situation "should be fixed", as in they (and I) don't agree that the current situation is fair. A current court ruling can't prove anything about whether things should change, so asking for one makes no sense.
Honestly, none of your responses have actually supported your initial position. You're constantly moving to something else that sounds vaguely similar but is neither equivalent to what you said nor a direct response to my objections.
I am not. The only thing I've been claiming is that AI training is not a copyright violation, and that the AI model itself is not a copyright violation.
As an analogy, you can use Photoshop to draw a picture of Mario. That does not mean Photoshop is violating copyright by existing, nor that Adobe violated copyright by creating Photoshop.
I have no idea what this means.
I'm saying that the act of training an AI does not involve any actions that are within the realm of things copyright law can actually say anything about. It's like if there's a law against walking your dog without a leash and someone asks "but does it cover aircraft pilots' licenses?" No, it doesn't, because there's absolutely no commonality between the two subjects. It's nonsensical.
I'm pretty sure you're misinterpreting my position.
The "copyright situation" regarding an actual literal picture of Mario doesn't need to be fixed because it's already quite clear. There's nothing that needs to change to make an AI-generated image of Mario count as a copyright violation, that's what the law already says and AI's involvement is irrelevant.
When people talk about needing to "change copyright" they're talking about making something that wasn't illegal previously into something that is illegal after the change. That's presumably the act of training or running an AI model. What else could they be talking about?