Technology

76672 readers

2239 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related news or articles.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
Check for duplicates before posting, duplicates may be removed
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

L3s@lemmy.world

enu@lemmy.world

technopagan@lemmy.world

L4s@lemmy.world

L3s@hackingne.ws

L4s@hackingne.ws

926

AI agents wrong ~70% of time: Carnegie Mellon study (www.theregister.com)

submitted 4 months ago by eli001@lemmy.world to c/technology@lemmy.world

177 comments fedilink hide all child comments

you are viewing a single comment's thread
view the rest of the comments

[–] davidagain@lemmy.world 1 points 4 months ago (1 children)

I would be in breach of contract to tell you the details. How about you just stop trying to blame me for the clear and obvious lies that the LLM churned out and start believing that LLMs ARE are strikingly fallible, because, buddy, you have your head so far in the sand on this issue it's weird.

The solution to the problem was to realise that an LLM cannot be trusted for accuracy even if the first few results are completely accurate, the bullshit well creep in. Don't trust the LLM. Check every fucking thing.

In the end I wrote a quick script that broke the input up on tab characters and wrote the sentence. That's how formulaic it was. I regretted deeply trying to get an LLM to use data.

The frustrating thing is that it is clearly capable of doing the task some of the time, but drifting off into FANTASY is its strong suit, and it doesn't matter how firmly or how often you ask it to be accurate or use the input carefully. It's going to lie to you before long. It's an LLM. Bullshitting is what it does. Get it to do ONE THING only, then check the fuck out of its answer. Don't trust it to tell you the truth any more than you would trust Donald J Trump to.

[–] Melvin_Ferd@lemmy.world 1 points 4 months ago* (last edited 4 months ago) (1 children)

This is crazy. I've literally been saying they are fallible. You're saying your professional fed and LLM some type of dataset. So I can't really say what it was you're trying to accomplish but I'm just arguing that trying to have it process data is not what they're trained to do. LLM are incredible tools and I'm tired of trying to act like they're not because people keep using them for things they're not built to do. It's not a fire and forget thing. It does need to be supervised and verified. It's not exactly an answer machine. But it's so good at parsing text and documents, summarizing, formatting and acting like a search engine that you can communicate with rather than trying to grok some arcane sentence. Its power is in language applications.

It is so much fun to just play around with and figure out where it can help. I'm constantly doing things on my computer it's great for instructions. Especially if I get a problem that's kind of unique and needs a big of discussion to solve.

[–] davidagain@lemmy.world 1 points 4 months ago* (last edited 4 months ago) (1 children)

it’s so good at parsing text and documents, summarizing

No. Not when it matters. It makes stuff up. The less you carefully check every single fucking thing it says, the more likely you are to believe some lies it subtly slipped in as it went along. If truth doesn't matter, go ahead and use LLMs.

If you just want some ideas that you're going to sift through, independently verify and check for yourself with extreme skepticism as if Donald Trump were telling you how to achieve world peace, great, you're using LLMs effectively.

But if you're trusting it, you're doing it very, very wrong and you're going to get humiliated because other people are going to catch you out in repeating an LLM's bullshit.

[–] Melvin_Ferd@lemmy.world 1 points 4 months ago (1 children)

If it's so bad as if you say, could you give an example of a prompt where it'll tell you incorrect information.

[–] davidagain@lemmy.world 1 points 4 months ago (1 children)

It's like you didn't listen to anything I ever said, or you discounted everything I said as fiction, but everything your dear LLM said is gospel truth in your eyes. It's utterly irrational. You have to be trolling me now.

[–] Melvin_Ferd@lemmy.world 1 points 4 months ago (1 children)

Should be easy if it's that bad though

[–] davidagain@lemmy.world 1 points 4 months ago* (last edited 4 months ago) (1 children)

I already told you my experience of the crapness of LLMs and even explained why I can't share the prompt etc. You clearly weren't listening or are incapable of taking in information.

There's also all the testing done by the people talked about in the article we're discussing which you're also irrationally dismissing.

You have extreme confirmation bias.

Everything you hear that disagrees with your absurd faith in the accuracy of the extreme blagging of LLMs gets dismissed for any excuse you can come up with.

[–] Melvin_Ferd@lemmy.world 1 points 4 months ago (1 children)

You're projecting here. I'm asking you to give an example of any prompt. You're saying it's so bad that it needs to be babysat because it's errors. I'll only asking for your to give an example and you're saying that's confirmation bias and acting like I'm being religiously ignorant

[–] davidagain@lemmy.world 1 points 4 months ago

This is you

https://youtu.be/mkcKQmr7kRc