And it rarely works in scientific fields right away - usually an established wrong idea needs to be overwhelmed with serious proof before scientists start to consider that what they "know" might be wrong.
MangoCats
Toxicity is everywhere; you can't recognize that "Drill baby drill" has sexual connotations if you've never been exposed to sexual double entendre like that before.
Is it just me that thinks this seems like a no-brainer?
Yes, and no. When raising our children, my wife prefers the "ban the bad stuff" approach. I don't encourage exposure to bad stuff, but when my kid wants to buy and watch a raunchy movie, instead of yelling "NO!" and making him put it back, I let him buy it and we watch it, together, pausing to point out the unrealistic and awful parts and explain how imitating these things in real life can cause problems for you.
Well - if you want to devolve into argument, you can argue all day long about "what is reasoning?"
Hallucinations and the cost of running the models.
So, inaccurate information in books is nothing new. Agreed that the rate of hallucinations needs to decline, a lot, but there has always been a need for a veracity filter - just because it comes from "a book" or "the TV" has never been an indication of absolute truth, even though many people stop there and assume it is. In other words: blind trust is not a new problem.
The cost of running the models is an interesting one - how does it compare with publishing on paper, shipping globally, and storing in environmentally controlled libraries that require individuals to physically travel to and from them to access the information? What's the price of the resulting increased ignorance of the general population due to the high cost of information access?
What good is a bunch of knowledge stuck behind a search engine when people don't know how to access it, or access it efficiently?
Granted, search engines already take us 95% (IMO) of the way from paper libraries to what AI is almost succeeding in being today, but ease of access to information has tremendous value - and developing ways to easily access the information available on the internet is a very valuable endeavor.
Personally, I feel more emphasis should be put on establishing the veracity of the information before we go making all the garbage easier to find.
I also worry that "easy access" to automated interpretation services is going to lead to a bunch of information encoded in languages that most people don't know, because they're dependent on machines to do the translation for them. As an example: a shiny new computer language comes out, but the software developer is too lazy to learn it, so the developer uses AI to write code in the new language instead...
I'm not trained or paid to reason, I am trained and paid to follow established corporate procedures. On rare occasions my input is sought to improve those procedures, but the vast majority of my time is spent executing tasks governed by a body of (not quite complete, sometimes conflicting) procedural instructions.
If AI can execute those procedures as well as, or better than, human employees, I doubt employers will care if it is reasoning or not.
I think as we approach the uncanny valley of machine intelligence, it's no longer a cute cartoon but a menacing, creepy, not-quite imitation of ourselves.
My impression of LLM training and deployment is that it's massively parallel in nature - it could be implemented one instruction at a time, but in practice it isn't.
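To illustrate what I mean, here's a toy numpy sketch of my own (nothing to do with any real LLM stack): the same matrix-vector product can be ground out one scalar multiply-add at a time, or handed to the library as a single call that it is free to spread across cores and SIMD lanes.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
x = rng.standard_normal(512)

# Serial version: the same math as one scalar multiply-add at a time
y_serial = np.zeros(512)
for i in range(512):
    for j in range(512):
        y_serial[i] += W[i, j] * x[j]

# Parallel-friendly version: one vectorized call the library can parallelize
y_vec = W @ x

print(np.allclose(y_serial, y_vec))  # True - identical result, very different cost
```

Real training stacks take this much further (batching, GPUs, multi-node), but the underlying math is the same either way.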
It's not just the memorization of patterns that matters, it's the recall of appropriate patterns on demand. Call it what you will, even if AI is just a better librarian for search work, that's value - that's the new Google.
7.50 on each USB drive
Ouch!
Not my hypothesis. And it is just bullshit, but if you pay attention, they have made similar runs at taxing and controlling the internet periodically since the 1990s.
I say it's simply easier to recognize something when you've seen more examples of it.
If you're training an image discriminator on apples, bananas, oranges, pears and penises, it will inevitably do better overall if 10-30% of the images it trains on are penises, rather than 0.01% penises - even if in operation it is only expected to encounter dick pics very rarely.
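A concrete sketch of what I mean - the helper name and the 20% target here are just my own illustration, not anyone's actual pipeline: oversample the rare class until it makes up a meaningful fraction of the training set, so the discriminator actually gets enough examples to learn it.

```python
import numpy as np

def rebalance_indices(labels, rare_class, target_frac, rng=None):
    """Oversample one rare class so it makes up roughly target_frac
    of the returned training indices (illustrative only)."""
    if rng is None:
        rng = np.random.default_rng(0)
    labels = np.asarray(labels)
    rare = np.flatnonzero(labels == rare_class)
    common = np.flatnonzero(labels != rare_class)
    # How many rare samples are needed to hit the target fraction,
    # given that the common samples are kept as-is.
    n_rare = int(target_frac / (1.0 - target_frac) * len(common))
    picked = rng.choice(rare, size=n_rare, replace=True)  # sample with replacement
    idx = np.concatenate([common, picked])
    rng.shuffle(idx)
    return idx

# Toy label set: classes 0-3 are fruit, class 4 is the rare one
labels = np.array([0, 1, 2, 3] * 250 + [4])  # 1000 fruit images, 1 rare image
idx = rebalance_indices(labels, rare_class=4, target_frac=0.2)
print((labels[idx] == 4).mean())  # ~0.2 of the training set, instead of ~0.001
```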