A 100% accurate AI would be useful. A 99.999% accurate AI is in fact useless, because of the damage that one miss might do.
It's like the French say: Add one drop of wine in a barrel of sewage and you get sewage. Add one drop of sewage in a barrel of wine and you get sewage.
I think it largely depends on what kind of AI we're talking about. iOS has had models that let you extract subjects from images for a while now, and that's pretty nifty. Affinity Photo recently got the same feature. Noise cancellation can also be quite useful.
As for LLMs? Fuck off, honestly. My company apparently pays for MS CoPilot, something I only discovered when the garbage popped up the other day. I wrote a few random sentences for it to fix, and the only thing it managed to consistently do was screw the entire text up. Maybe it doesn't handle Swedish? I don't know.
One of the examples I sent to a friend is as follows, but in Swedish:
And CoPilot was like "yeah, let me fix this for you!"
Most AIs struggle with languages other than English, unfortunately. I hate how it reinforces the "defaultness" of English.
99.999% accurate would be pretty useful. There's plenty of misinformation without AI. Nothing and nobody will be perfect.
Trouble is, they range from 0-95% accurate depending on the topic and given context, while being very confident when they're wrong.
We're not talking about an AI running a nuclear reactor; this article is about AI assistants on a personal phone. A 0.001% failure rate for apps on your phone isn't that insane, and generally the only consequence of a failure would be that you need to try a slightly different query. Tools like Alexa or Siri mishear user commands probably more than 0.001% of the time, and yet those tools have absolutely caught on for a significant number of people.
The issue is that the failure rate of AI is high enough that you have to vet the outputs, which typically requires about as much work as doing the task yourself. And using AI for creative things like art or videos is a fun novelty, but it isn't something you do regularly, so your phone trying to promote apps you only want to use once in a blue moon is annoying. If AI were actually so useful that you could query it with anything and get back exactly what you wanted 99.999% of the time, it would absolutely become much more useful.
People love to make these claims.
Nothing is "100% accurate" to begin with. Humans spew constant FUD and outright malicious misinformation. Just do some googling for anything medical, for example.
So either we acknowledge that everything is already "sewage" and this changes nothing, or we acknowledge that people can already find value from searching for answers to questions; they just need to apply critical thought toward whether I_Fucked_your_mom_416 on gamefaqs is a valid source or not.
Which gets to my big issue with most of the "AI Assistant" features. They don't source their information. I am all for not needing to remember the magic incantations to restrict my searches to a single site or use boolean operators when I can instead "ask jeeves" as it were. But I still want the citation of where information was pulled from so I can at least skim it.
99.999% would be fantastic.
90% is not good enough to be a primary feature that discourages inspection (like a naive chatbot).
What we have now is like...I dunno, anywhere from <1% to maybe 80% depending on your use case and definition of accuracy, I guess?
I haven't used Samsung's stuff specifically. Some web search engines do cite their sources, and I find that to be a nice little time-saver. With the prevalence of SEO spam, most results have like one meaningful sentence buried in 10 paragraphs of nonsense. When the AI can effectively extract that tiny morsel of information, it's great.
Ideally, I don't ever want to hear an AI's opinion, and I don't ever want information that's baked into the model from training. I want it to process text with an awareness of complex grammar, syntax, and vocabulary. That's what LLMs are actually good at.
Again: What is the percent "accurate" of an SEO infested blog about why ivermectin will cure all your problems? What is the percent "accurate" of some kid on gamefaqs insisting that you totally can see Lara's tatas if you do this 90 button command? Or even the people who insist that Jimi was talking about wanting to kiss some dude in Purple Haze.
Everyone is hellbent on insisting that AI hallucinates and... it does. You know who else hallucinates? Dumbfucks. And the internet is chock full of them. And guess what LLMs are training on? It's the same reason I always laugh when people talk about how AI can't do feet or hands while ignoring the existence of Rob Liefeld, or WHY so many cartoon characters only have four fingers.
Like I said: I don't like the AI Assistants that won't tell me where they got information from and it is why I pay for Kagi (they are also AI infested but they put that at higher tiers so I get a better search experience at the tier I pay for). But I 100% use stuff like chatgpt to sift through the ninety bazillion blogs to find me a snippet of a helm chart that I can then deep dive on whether a given function even exists.
But the reality is that people are still benchmarking LLMs against a reality that has never existed. The question shouldn't be "we need this to be 100% accurate and never hallucinate"; it should be "what web pages or resources were used to create this answer", followed by doing what we should always be doing: checking the sources to see if they at least seem trustworthy.
I don't think that's a good comparison in context. If Forbes replaced all their bloggers with ChatGPT, that might very well be a net gain. But that's not the use case we're talking about. Nobody goes to Forbes as their first step for information anyway (I mean...I sure hope not...).
Correct.
If we're talking about an AI search summarizer, then the accuracy lies not in how correct the information is in regard to my query, but in how closely the AI summary matches the cited source material. Kagi does this pretty well. Last I checked, Bing and Google did it very badly. Not sure about Samsung.
On top of that, the UX is critically important. In a traditional search engine, the source comes before the content. I can implicitly ignore any results from Forbes blogs. Even Kagi shunts the sources into footnotes. That's not a great UX because it elevates unvetted information above its source. In this context, I think it's fair to consider the quality of the source material as part of the "accuracy", the same way I would when reading Wikipedia. If Wikipedia replaced their editors with ChatGPT, it would most certainly NOT be a net gain.
You know, I was happy to dig through 9yo StackOverflow posts and adapt answers to my needs, because at least those examples did work for somebody. LLMs for me are just glorified autocorrect functions, and I treat them as such.
A colleague of mine had a recent experience with Copilot hallucinating a few Python functions that looked legit, ran without issue and did fuck all. We figured it out on testing, but boy was that a wake up call (colleague in question has what you might call an early adopter mindset).
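For illustration (this is a made-up sketch, not the actual code my colleague got back), the failure mode looks something like this: a function that runs cleanly, returns something plausible, and still does nothing useful. Only a real test catches it.

```python
# Hypothetical example of a "looks legit, runs fine, does nothing" function.
def deduplicate_records(records):
    """Supposedly removes duplicate entries from a list of dicts."""
    seen = set()
    result = []
    for record in records:
        key = id(record)  # Bug: id() is unique per object, so no duplicate is ever detected
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result


def test_deduplicate_records():
    # Two distinct dicts with identical contents should collapse into one entry,
    # but this assertion fails because both survive the "deduplication".
    data = [{"name": "a"}, {"name": "a"}]
    assert len(deduplicate_records(data)) == 1
```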
For real. If a human performs task X with 80% accuracy, an AI needs to perform the same task with 80.1% accuracy to be a better choice - not 100%. Furthermore, we should consider how much time it would take for a human to perform the task versus an AI. That difference can justify the loss of accuracy. It all depends on the problem you're trying to solve. With that said, it feels like AI in mobile devices hardly solves any problems.
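A rough back-of-the-envelope sketch of that accuracy-versus-time tradeoff (all numbers are made up, purely for illustration):

```python
# Illustrative numbers only: how speed can offset a small loss of accuracy.
def correct_results_per_hour(accuracy, minutes_per_task):
    """Expected number of correct results produced in an hour."""
    return (60 / minutes_per_task) * accuracy

human = correct_results_per_hour(accuracy=0.80, minutes_per_task=15)  # 3.2 correct/hour
ai = correct_results_per_hour(accuracy=0.75, minutes_per_task=2)      # 22.5 correct/hour
print(f"human: {human:.1f}/h, AI: {ai:.1f}/h")
```

Of course, that ignores what the wrong answers cost you, which is exactly why it depends on the problem you're trying to solve.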
Perplexity is kinda half-decent with showing its sources, and I do rely on it a lot to get me 50% of the way there, at which point I jump into the suggested sources, do some of my own thinking, and do the other 50% myself.
It's been pretty useful to me so far.
I've realised I don't want complete answers to anything really. Give me a roundabout gist or template, and then tell me where to look for more if I'm interested.
I think you nailed it. In the grand scheme of things, critical thinking is always required.
The problem is that, when it comes to LLMs, people seem to use magical thinking instead. I'm not an artist, so I oohed and aahed at some of the AI art I got to see, especially in the early days, when we weren't flooded with all this AI slop. But when I saw the coding shit it spewed? Thanks, I'll pass.
The only legit use of AI in my field that I know of is a unit test generator, where tests were measured for stability and code coverage increase before being submitted for dev approval. But actual non-trivial production grade code? Hell no.
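The gating logic was roughly along these lines (a sketch from memory, not the actual tool; it assumes pytest and coverage.py): a generated test only goes to a human reviewer if it is stable across repeated runs and measurably raises coverage.

```python
# Sketch of an approval gate for generated tests (assumes pytest + coverage.py).
import subprocess


def is_stable(test_path, runs=5):
    """Reject flaky generated tests: they must pass several times in a row."""
    return all(
        subprocess.run(["pytest", test_path, "-q"]).returncode == 0
        for _ in range(runs)
    )


def coverage_percent(paths):
    """Run the given test paths under coverage.py and return total line coverage."""
    subprocess.run(["coverage", "run", "-m", "pytest", "-q", *paths])
    out = subprocess.run(["coverage", "report", "--format=total"],
                         capture_output=True, text=True)
    return float(out.stdout.strip())


def worth_reviewing(generated_test, existing_suite="tests/"):
    """Only forward tests that are stable and actually increase coverage."""
    baseline = coverage_percent([existing_suite])
    with_generated = coverage_percent([existing_suite, generated_test])
    return is_stable(generated_test) and with_generated > baseline
```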
Even those examples are the kinds of things that "fall apart" if you actually think things through.
Art? Actual human artists tend to use a ridiculous amount of "AI" these days and have been for well over a decade (probably closer to two, depending on how you define "AI"). Stuff like magic erasers/brushes inherently looks at the picture around the selection (training data) and then extrapolates/magicks what it would look like if you didn't have that logo on your shirt, and so forth. Same with a lot of weathering techniques/algorithms.
Same with coding. People more or less understand that anyone who is working on something more complex than a coding exercise is going to be googling a lot (even if it is just that you will never ever remember how to do file i/o in python off the top of your head). So a tool that does exactly that is.... bad?
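(And it really is that level of thing - the snippet you end up looking up for the hundredth time:)

```python
# The boilerplate nobody memorizes: read a text file, write another one out.
with open("input.txt", encoding="utf-8") as src:
    lines = src.readlines()

with open("output.txt", "w", encoding="utf-8") as dst:
    dst.writelines(lines)
```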
Which gets back to the reality of things. Much like with writing a business email or organizing a calendar: If a computer program can do your entire job for you... maybe shut the fuck up about that program? Chatgpt et al aren't meant to replace the senior or principal software engineer who is in lots of design meetings or optimizing the critical path of your corporate secret sauce.
It is replacing junior engineers and interns (which is gonna REALLY hurt in ten years but...). Chatgpt hallucinated a nonsense function? That is what CI testing and code review are for. Same as if that intern forgot to commit a file or that rockstar from facebook never ran the test suite.
Of course, the problem there is that the internet is chock full of "rock star coders" who just insist the world would be a better place if they never had to talk to anyone and were always given perfectly formed tickets so they could just put their headphones on and work and ignore Sophie's birthday and never be bothered by someone asking them for help (because, trust me, you ALWAYS want to talk to That Guy about... anything). And they don't realize that they were never actually hot shit and were mostly always doing entry level work.
Personally? I only trust AI to directly write my code for me if it is in an airgapped environment because I will never trust black box code I pulled off the internet to touch corporate data. But I will 100% use it in place of google to get an example of how to do something that I can use for a utility function or adapt to solving my real problem. And, regardless, I will review and test that just as thoroughly as the code Fred in accounting's son wrote because I am the one staying late if we break production.
And just to add on, here is what I told a friend's kid who is an undergrad comp sci:
LLMs are awesome tools. But if the only thing you bring to the table is that you can translate the tickets I assigned to you to a query to chatgpt? Why am I paying you? Why am I not expensing a prompt engineering course on udemy and doing it myself?
Right now? Finding a job is hard but there are a lot of people like me who understand we still need to hire entry level coders to make sure we have staff ready to replace attrition over the next decade (or even five years). But I can only hire so many people and we aren't a charity: If you can't do your job we will drop you the moment we get told to trim our budget.
So use LLMs because they are an incredibly useful tool. But also get involved in design and planning as quickly as possible. You don't want to be the person writing the prompts. You want to be the person figuring out what prompts we need to write.
In short, AI is useful when it's improving workflow efficiency and not much else beyond that. People just unfortunately see it as a replacement for the worker entirely.
If you wanna get loose with your definition of "AI," you can go all the way back to the MS Paint magic wand tool for art. It's simply an algorithm for identifying pixels within a certain color tolerance of each other.
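In spirit it's nothing more exotic than a flood fill with a tolerance check, something like this toy version (assuming a plain 2D grid of RGB tuples, not any particular app's implementation):

```python
from collections import deque


def magic_wand_select(pixels, start, tolerance):
    """Toy 'magic wand': flood-fill outward from a starting pixel, selecting
    neighbours whose colour is within `tolerance` of the starting colour.
    `pixels` is a 2D grid (list of lists) of (r, g, b) tuples."""
    rows, cols = len(pixels), len(pixels[0])
    sr, sc = start
    target = pixels[sr][sc]

    def close_enough(color):
        return all(abs(a - b) <= tolerance for a, b in zip(color, target))

    selected = {(sr, sc)}
    queue = deque([(sr, sc)])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in selected
                    and close_enough(pixels[nr][nc])):
                selected.add((nr, nc))
                queue.append((nr, nc))
    return selected
```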
The issue has never been the tool itself, just the way that it's made and/or how companies intend to use it.
Companies want to replace their entire software division, senior engineers included, with ChatGPT or equivalent because it's cheaper, and they don't value the skill of their employees at all. They don't care how often it's wrong, or how much more work the people that they didn't replace have to do to fix what the AI breaks, so long as it's "good enough."
It's the same in art. By the time somebody is working as an artist, they're essentially at a senior software engineer level of technical knowledge and experience. But society doesn't value that skill at all, and has tried to replace it with what is essentially a coding tool trained on code sourced from pirated software and sold on the cheap. A new market of cheap knockoffs on demand.
There's a great story I heard from somebody who works at a movie studio where they tried hiring AI prompters for their art department. At first, things were great. The senior artist could ask the team for concept art of a forest, and the prompters would come back the next day with 15 different pictures of forests while your regular artists might have that many at the end of the week. However, if you said, "I like this one, but give me some versions without the people in them," they'd come back the next day with 15 new pictures of forests, but not the original without the people. They simply could not iterate, only generate new images. They didn't have any of the technical knowledge required to do the job because they depended completely on the AI to do it for them. Needless to say, the studio has put a ban on hiring AI prompters.