this post was submitted on 25 Nov 2025
Programmer Humor
I was talking about you, but not /srs; that was an attempt at satire. I'm dismissing the results by appealing to the fact that there's a process.
Reward is an AI maths term (from reinforcement learning). It's the value the model's weights are updated to increase, similar in role to "loss" or "error" (which get minimized instead), if you've heard of those.
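To make "reward" a bit more concrete, here's a minimal sketch of a reward-weighted update (REINFORCE-style) on a toy two-choice policy. Everything in it is an assumption for illustration: the names `reinforce_step` and `train`, the learning rate, and the made-up rule that choice 1 earns a reward of 1. Real language-model training is far more involved.

```python
import math
import random

def softmax(logits):
    # Convert raw scores into probabilities.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, action, reward, lr=0.1):
    """One update: nudge the chosen action's log-probability in proportion to reward."""
    probs = softmax(logits)
    # d/d(logit_i) of log pi(action) is (1 if i == action else 0) - probs[i].
    return [
        logit + lr * reward * ((1.0 if i == action else 0.0) - probs[i])
        for i, logit in enumerate(logits)
    ]

def train(steps=500, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0]
    for _ in range(steps):
        probs = softmax(logits)
        action = 0 if rng.random() < probs[0] else 1
        # Hypothetical grader: only action 1 is rewarded.
        reward = 1.0 if action == 1 else 0.0
        logits = reinforce_step(logits, action, reward)
    return softmax(logits)

final_probs = train()
```

A reward of zero leaves the weights untouched, and a positive reward makes the rewarded choice more likely next time, which is the sense in which reward plays the role that loss plays elsewhere, just with the sign flipped.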
Yes, this is also possible; it depends on minute details of the training set, which we don't know.
Edit: As I understand it, these models are trained in multiple modes: one where they're trying to predict text (supervised learning), and others where the model is given a prompt and its response is sent to another system to be graded, e.g. for factual accuracy. A model could learn to identify which "training mode" it's in and behave differently. Although I'm sure the ML guys have already thought of that and tried to prevent it.
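A rough sketch of how the two training signals differ, in case it helps. The function names, the toy probabilities, and `toy_fact_grader` are all hypothetical; this only shows the shape of each signal, not how any real lab trains its models.

```python
import math

def next_token_loss(predicted_probs, target_token):
    """Prediction mode: cross-entropy against the token that actually came next."""
    return -math.log(predicted_probs[target_token])

def graded_reward(response, grader):
    """Graded mode: a separate system scores the whole response."""
    return grader(response)

def toy_fact_grader(response):
    # Hypothetical grader: rewards a response containing one known-correct answer.
    return 1.0 if "paris" in response.lower() else 0.0

# Prediction mode: the signal is a per-token loss (lower is better).
loss = next_token_loss({"Paris": 0.7, "Lyon": 0.3}, "Paris")

# Graded mode: the signal is a single score for the response (higher is better).
reward = graded_reward("The capital of France is Paris.", toy_fact_grader)
```

The point is that the two modes look different from the model's side (token-by-token targets versus one score at the end), which is why it's at least conceivable a model could tell them apart.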
I agree, and I noted this in my comment. I'm just saying this isn't evidence either way.