this post was submitted on 04 Dec 2025
Technology
This is the definition of ethically sourced data from the Apertus website:
So they still train on websites, blogs and social media. Ethical my ass.
That is at least an improvement over including in its corpus the entire worldwide collection of copyrighted materials.
And they respect robots.txt afaik
But the other stuff is copyrighted as well, most of the time.
Just because it's free to look at doesn't mean it's free to download, modify or feed into an AI.
Yeah, for sure. I’m not saying it is good at all, just that scraping some proportion of copyrighted material is an improvement over scraping all the copyrighted material.
Let's include the whole paragraph at least.
So it's opt-out. Great
As I read it, data must be usable under Swiss copyright law, non-personal, and available on the open web. Further, they retroactively respect opt-out requests.
Sounds like a ripping good way to keep corporate data (and government secrets) off the public radar.
That way we won't find out whose hands public tax dollars (or public-owned structures rented to corporations) wind up in.
?? Which are improved by using ChatGPT because?