this post was submitted on 27 May 2025
521 points (97.8% liked)
Technology
Ok, but is training an AI so it can plagiarize, often verbatim or with extreme visual accuracy, fair use? I see the first two articles argue that it is, but they don't mention the many cases where the crawlers and scrapers ignored rules set up to tell them to piss off. That would certainly invalidate several fair use claims.
Instead of charging for everything they scrape, the law should force them to release all their data and training sets for free. "But they spent money and time and resources!" So did everyone who created the stuff they're using for their training, so they can fuck off.
The article by Tory also says these things:
I'd wager 99.9% of the art and content created by AI could go straight to the trashcan and nobody would miss it. Comparing AI to the internet is like comparing writing to doing drugs.
You can plagiarize with a computer with copy & paste too. That doesn't change the fact that computers have legitimate non-infringing use cases.
I agree
But 99.9% of the internet is stuff that no one would miss. Things don't have to have value to you to be worth having around. That trash could serve as inspiration for the 0.1% of people who'd use it, or garner feedback that helps people improve.
I don't really disagree with your other two points, but
They sure do, but that is not one of them. That's de facto copyright infringement or plagiarism, especially if you then turn around and sell that product.
The key point being made is that if you are committing de facto copyright infringement or plagiarism by creating a copy, it shouldn't matter whether that copy was made through copy-paste, by re-compressing the same image, or by using an AI model. The product here is the copy-paste operation, the image editor, or the AI model, not the (copyrighted) image itself. You can still sell computers with copy-paste (despite some attempts from large copyright holders via DRM), and you can still sell image editors.
However, unlike copy-paste and the image editor, the AI model could memorize and emit training data even when the input doesn't imply the copyrighted work. (This excludes the case where the image itself was provided, or a highly detailed description of the work, since then it would clearly be the user who is at fault and intending for this to happen.)
At the same time, it should be noted that exact replication of training data isn't desirable in any case, and online services for image generation could include an image similarity check against their training data; many probably do this already.
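A similarity check like the one described above can be sketched very roughly with a perceptual hash: reduce each image to a short bit signature and flag generated outputs whose signature sits within a small Hamming distance of any training image's signature. This is a minimal illustration, not how any real service implements it; the function names are hypothetical, and production systems would use far more robust hashes and a nearest-neighbour index rather than a linear scan.

```python
# Hypothetical sketch of a training-data similarity check using a
# simple "average hash": each bit records whether a pixel is brighter
# than the image's mean. Real systems use sturdier perceptual hashes.

def average_hash(pixels):
    """Hash a 2D grayscale image (values 0-255) into a list of bits."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]

def hamming(a, b):
    """Count the number of bit positions where two hashes differ."""
    return sum(x != y for x, y in zip(a, b))

def too_similar(candidate, training_hashes, max_distance=2):
    """Flag a generated image whose hash is within max_distance bits
    of any precomputed training-image hash."""
    h = average_hash(candidate)
    return any(hamming(h, t) <= max_distance for t in training_hashes)
```

In use, the service would precompute `average_hash` for its training images once, then run `too_similar` on each output before returning it, regenerating or rejecting anything flagged as a near-duplicate.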