This is the definition of ethically sourced data from the Apertus website:
[...] the training corpus builds only on data which is publicly available.
So they still train on websites, blogs, and social media. Ethical my ass.
That is at least an improvement over including in its corpus the entire worldwide collection of copyrighted materials.
And they respect robots.txt afaik
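For anyone unfamiliar: "respecting robots.txt" means the crawler checks a site's machine-readable crawl rules before fetching each page. A minimal sketch using Python's standard-library robotparser (the domain and user-agent string are placeholders, not Apertus's actual crawler):

```python
from urllib import robotparser

# Load the site's machine-readable crawl rules (placeholder domain).
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# A compliant crawler skips any URL the rules disallow for its user agent.
user_agent = "ExampleBot/1.0"  # hypothetical crawler name
page = "https://example.com/blog/post-1"
if rp.can_fetch(user_agent, page):
    print("allowed to crawl:", page)
else:
    print("opt-out respected, skipping:", page)
```

Note that compliance is entirely voluntary: nothing stops a crawler from ignoring the file, which is exactly why opt-out feels weak to people in this thread.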
But the other stuff is copyrighted as well, most of the time.
Just because it's free to look at doesn't mean it's free to download, modify or feed into an AI.
Yeah, for sure. I’m not saying it is good at all, just that scraping some proportion of copyrighted material is an improvement over scraping all the copyrighted material.
Let's include the whole paragraph at least.
Apertus was developed with due consideration to Swiss data protection laws, Swiss copyright laws, and the transparency obligations under the EU AI Act. Particular attention has been paid to data integrity and ethical standards: the training corpus builds only on data which is publicly available. It is filtered to respect machine-readable opt-out requests from websites, even retroactively, and to remove personal data, and other undesired content before training begins.
So it's opt-out. Great
As I read it, the data must be usable under Swiss copyright law, must not be personal data, and must be available on the open web. Further, they retroactively respect opt-out requests.
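The "retroactively" part presumably means re-running the opt-out filter over the already-collected corpus before each training run, not just honoring opt-outs at crawl time. A purely hypothetical sketch of what such a filter could look like (the document structure and opt-out list are invented for illustration; nothing here is Apertus's actual pipeline):

```python
from urllib.parse import urlparse

# Hypothetical corpus: each document keeps the URL it was scraped from.
corpus = [
    {"url": "https://alice.example/post", "text": "..."},
    {"url": "https://bob.example/article", "text": "..."},
]

# Hypothetical set of domains whose owners opted out after the crawl.
opted_out_domains = {"bob.example"}

def still_permitted(doc):
    """Keep only documents whose source domain has not opted out."""
    return urlparse(doc["url"]).hostname not in opted_out_domains

# Re-filtering before training is what makes the opt-out retroactive.
filtered_corpus = [doc for doc in corpus if still_permitted(doc)]
print(len(filtered_corpus), "of", len(corpus), "documents kept")
```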
Sounds like a ripping good way to keep corporate data (and government secrets) off the public radar.
That way we won't find out whose hands public tax dollars (or publicly owned structures rented to corporations) wind up in.
?? Which are improved by using ChatGPT because?