Context: There is no doubt that AI models struggle with accuracy. Developers have wrestled with hallucinations and with models doubling down on incorrect information. Because the use of AI varies so much between individual use cases, it has been difficult to quantify error rates as percentages. A research team now claims to have those numbers.
The Tow Center for Digital Journalism recently studied eight AI search engines: ChatGPT Search, Perplexity, Perplexity Pro, Gemini, DeepSeek Search, Grok-2 Search, Grok-3 Search, and Copilot. The researchers also recorded how frequently each tool refused to respond.
Researchers randomly selected 200 news articles from 20 news publishers (10 per publisher). They ensured that each story appeared in the top three Google results when searched using an excerpt from the article. They then ran the same query through each AI search tool and graded accuracy on whether the response correctly cited the article, the news organization, and the URL.
They then labeled each response on a scale ranging from “completely correct” to “completely incorrect.” As you can see in the chart below, the AIs performed poorly, except for both versions of Perplexity. Collectively, the AI engines answered about 60 percent of queries inaccurately, and they delivered these wrong results with unwarranted “confidence.”
Chart: accuracy results for each AI search engine (click to enlarge).
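To make the setup concrete, here is a rough Python sketch of what an evaluation loop like this could look like. The data layout, the `tool.search` client interface, and the simplified three-label scale are assumptions for illustration only; this is not the Tow Center's actual tooling.

```python
# Hypothetical sketch of the evaluation described above. Helper names,
# the client interface, and the simplified label scale are assumed.
from dataclasses import dataclass

@dataclass
class Article:
    publisher: str   # news organization the excerpt came from
    title: str       # headline of the source article
    url: str         # canonical URL of the source article
    excerpt: str     # passage used as the search query

def grade(response: dict, article: Article) -> str:
    """Grade one AI answer on the three attributes the study checked:
    the article, the news organization, and the URL."""
    checks = [
        response.get("title") == article.title,
        response.get("publisher") == article.publisher,
        response.get("url") == article.url,
    ]
    if all(checks):
        return "completely correct"
    if any(checks):
        return "partially correct"
    return "completely incorrect"

def evaluate(tool, articles: list[Article]) -> dict:
    """Run every excerpt through one AI search tool and tally the labels."""
    tally = {"completely correct": 0, "partially correct": 0,
             "completely incorrect": 0, "declined": 0}
    for article in articles:
        response = tool.search(article.excerpt)  # assumed client call
        if response is None:                     # tool refused to answer
            tally["declined"] += 1
        else:
            tally[grade(response, article)] += 1
    return tally
```

A per-tool tally like this is all that is needed to reproduce the headline percentages reported below.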
The study is fascinating because it quantifiably confirms what we have known for a few years: that LLMs are “the slickest con artists of all time.” They report with complete authority that what they say is true even when it is not, sometimes arguing the point or inventing further false assertions when confronted.
In an anecdotal 2023 article, Ted Gioia (The Honest Broker) pointed out dozens of ChatGPT responses showing that the bot confidently “lies” in answer to a wide range of queries. Some of the examples involved adversarial questions, but many were just general ones.
“If I believed half of what I heard about ChatGPT, I could let it take over The Honest Broker while I sit on the beach drinking margaritas and searching for my lost shaker of salt,” Gioia sarcastically noted.
Even when ChatGPT admitted it was wrong, it would follow up with more fabricated information. The LLM appears to be programmed to respond to every user input, no matter what. The researchers’ data confirms that hypothesis: ChatGPT was the only AI tool to answer all 200 article queries, yet it rated “completely accurate” on only 28 percent of them and was completely inaccurate 57 percent of the time.
ChatGPT isn’t even the worst. Both versions of X’s Grok AI performed badly, with Grok-3 Search answering 94 percent of queries incorrectly. Microsoft’s Copilot wasn’t much better: it declined to answer 104 of the 200 queries. Of the remaining 96, only 16 were “completely correct,” 14 were “partially correct,” and 66 were “completely incorrect,” meaning roughly 70 percent of the answers it did give were wrong.
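For those checking the math on Copilot, the figures above work out as follows; this is back-of-envelope arithmetic on the reported breakdown, not additional data from the study.

```python
# Back-of-envelope check of the Copilot breakdown reported above.
answered = 200 - 104                 # 96 queries actually answered
completely_correct = 16
partially_correct = 14
completely_incorrect = 66

assert completely_correct + partially_correct + completely_incorrect == answered

print(f"Refusal rate: {104 / 200:.0%}")                                    # 52%
print(f"Incorrect among answered: {completely_incorrect / answered:.0%}")  # ~69%
```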
The most bizarre thing about all of this is that these companies are not transparent about their lack of accuracy while charging $20 to $200 a month for access to their latest AI models. Worse, the premium models, Perplexity Pro ($20/month) and Grok-3 Search, answered slightly more questions correctly than their free counterparts but had much higher error rates. What a con.
But not everyone is in agreement. TechRadar’s Lance Ulanoff said that after trying ChatGPT Search, he may never use Google again. He described the tool as fast, aware, and accurate, with a clean, ad-free interface.
Read the Tow Center’s full paper, published in the Columbia Journalism Review, for all the details, and let us know what you think.