this post was submitted on 21 Nov 2025
Technology
Yes, they are. Not sure why you are bringing that up.
For those wondering what the actual difference is (possibly because they don't seem to know):
At a high level, training is when you ingest data to create a model based on characteristics of that data. Inference is when you then apply a model to (preferably new) data. So think of training as "teaching" a model what a cat is, and inference as having that model scan through images for cats.
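To make the split concrete, here is a toy sketch in plain Python. It is nothing like a real image model: the two-number "features" and the nearest-centroid approach are assumptions picked only to keep it short. The point is that `train` builds the model from labeled data, and `infer` only reads that model.

```python
# Toy sketch: "training" builds a model from labeled data,
# "inference" applies that finished model to new data.

def train(examples):
    """examples: list of (features, label). Returns the mean feature vector per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def infer(model, features):
    """Return the label whose centroid is closest to features. Never modifies model."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(model, key=lambda label: sq_dist(model[label]))

# Made-up labeled data: pretend the two numbers are "pointy ears" and "whiskers" scores.
labeled = [
    ((0.9, 0.8), "cat"),
    ((0.8, 0.9), "cat"),
    ((0.1, 0.2), "not cat"),
    ((0.2, 0.1), "not cat"),
]

model = train(labeled)                   # training: data in, model out
prediction = infer(model, (0.85, 0.75))  # inference: model applied to new data
print(prediction)
```

The new data passed to `infer` never changes `model`. It only becomes training data if someone stores it and feeds it back into `train` later.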
And a huge part of making a good model is providing good data. That is, generally speaking, done by labeling things ahead of time. Back in the day it was paying people to take an Amazon survey where they said "hot dog or no hot dog". These days... it is "anti-bot" technology that gets that for free (think about WHY every single website cares which squares contain a fire hydrant or a bicycle...)
But labels ALSO come from simple metrics like "Did the user use what we suggested?" Instead of saying "not hot dog" it is "good reply" or "no reply" or "still read email" or "ignored email" and so forth.
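A rough sketch of how that kind of metric turns into labels, in Python. The event names and the `label_events` helper are hypothetical, invented for illustration; I have no idea what any actual product logs.

```python
# Hypothetical event names; real products log different signals.
POSITIVE = {"accepted_suggestion", "read_email"}
NEGATIVE = {"dismissed_suggestion", "ignored_email"}

def label_events(events):
    """Turn (item_id, event) interaction logs into (item_id, label) training rows."""
    rows = []
    for item_id, event in events:
        if event in POSITIVE:
            rows.append((item_id, 1))
        elif event in NEGATIVE:
            rows.append((item_id, 0))
        # Anything else carries no clear signal, so it is skipped.
    return rows

log = [
    ("s1", "accepted_suggestion"),
    ("s2", "ignored_email"),
    ("s3", "opened_settings"),
]
rows = label_events(log)
print(rows)
```

Nobody had to fill out a survey here. The labels fall out of what users already did.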
And once you know what your pain points are with TOTALLY anonymized user data, you can then "reproduce" said user data to add to your training set. Which is the kind of bullshit Facebook, allegedly, has done for years, where they'll GLADLY delete your data if you request it... but not that picture of you at the McDonald's down the street, because that belongs to Ronjon Buck who worked there one summer. But they'll gladly anonymize your user data so the picture of you actually just corresponds to "User 25156161616" that happens to be the sibling of your sister and so forth...
That is literally just a feedback loop and is core to pretty much any "agentic" network/graph.
There also tend to be laws about opting in and forced EULAs. It is almost like the megacorps have acknowledged that they'll just do whatever and MAYBE pay a fine after they have already made far more money than it costs them.
I am bringing it up because the setting Google is presenting only describes using AI on your data, not training AI on your data.