this post was submitted on 28 Mar 2025
You assume a uniform distribution. I’m guessing that it’s not. The question isn’t “Does the model contain compressed representations of all works it was trained on?” Retaining enough information about any single image is enough to be a copyright issue.
Besides, the situation isn’t as obviously flawed with image models as it is with LLMs. LLMs are just broken in this regard, because retaining only a handful of bytes is enough to violate copyright.
I think there will be a “find out” stage fairly soon. Currently, the US projects lots and lots of soft power on the rest of the world to enforce copyright terms favourable to Disney and friends. Accepting copyright violations for AI will erode that power internationally over time.
Personally, I do think we need to rework copyright anyway, so I’m not complaining that much. Change the law, go ahead and make the high seas legal. But set against current copyright laws, most large datasets and most models constitute copyright violations. Just imagine the shitshow if OpenAI were a European company training on material from Disney.