this post was submitted on 01 Dec 2025
70 points (72.4% liked)
Technology
you are viewing a single comment's thread
view the rest of the comments
If you read the post, it's actually quite a good method: having an LLM flag potential errors and then reviewing them manually as a human is quite productive.
I've done exactly that on a project that relies on user-submitted content; moderating submissions at even a moderate scale is hard, but having an LLM look through them for me is easy. I can then check anything it flags and moderate manually. Neither the accuracy nor the precision is perfect, but both are high enough to be useful, so it's a low-effort way to find a decent number of the things you're looking for. In my case I was looking for abusive submissions from untrusted users; in the OP author's case they were looking for errors. I'm quite sure this method would never find all errors, and as per the article, the "errors" it flags aren't always real either. But the reward-to-effort ratio is high on a task that would otherwise be infeasible.
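For a rough sense of what that workflow can look like, here's a minimal sketch. It is not the commenter's actual setup: the OpenAI SDK, the model name, the prompt, and the `llm_flags`/`triage` helpers are all illustrative placeholders. The key point is that the model only fills a review queue; a human makes every moderation decision.

```python
# Minimal sketch: ask an LLM to triage user submissions, then queue anything
# it flags for manual review. Nothing is blocked or removed automatically.
# Assumptions (not from the thread): the OpenAI Python SDK, the model name,
# and the helper names below are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You review user-submitted content for abuse (harassment, spam, slurs). "
    "Reply with exactly one word: FLAG if a human should look at it, OK otherwise."
)

def llm_flags(submission: str) -> bool:
    """Return True if the model thinks a human should review this submission."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable model works
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": submission},
        ],
        temperature=0,
    )
    answer = (resp.choices[0].message.content or "").strip().upper()
    return answer.startswith("FLAG")

def triage(submissions: list[str]) -> list[str]:
    """Collect flagged submissions into a queue for manual moderation."""
    review_queue = []
    for s in submissions:
        if llm_flags(s):
            review_queue.append(s)  # a human decides; false positives just get ignored
    return review_queue
```

Because the output is only a queue of suggestions, an imperfect model still pays off: missed items are no worse than the status quo, and false positives cost one quick human glance.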
But we don't know what the false positive rate is either. How many submissions were blocked that shouldn't have been? It seems like you don't have a way to even find that metric out unless somebody complains about it.
It isn't doing anything automatically; it isn't moderating for me. It's just flagging submissions for human review: "hey, maybe have a look at this one." So if it flags something it shouldn't, which is common, I simply ignore it. And as I said, the error rate is moderate, and although I haven't measured it precisely, the flagging is still accurate enough to be quite useful.