Even premium AI tools can distort news and fabricate links. These are the worst


AI tools and news just don’t seem to mix — even at the premium tier.

The Tow Center for Digital Journalism at Columbia University has conducted a new study showing that AI chatbots often misidentify news articles, present incorrect information without qualification, and fabricate links to news articles that do not exist. The findings build on an initial study Tow published in November, which showed ChatGPT Search misrepresenting publishers' content without any awareness that it could be wrong.

Also: This new AI benchmark measures the amount of lying models do

This trend is not new. The BBC reported last month that ChatGPT and Gemini struggled to accurately summarize news stories, instead delivering "significant inaccuracies" and "distortions." Here's what you need to know about the models that are least reliable.

Failing to identify news articles

Two researchers randomly selected 10 articles from each of 20 publishers. Eight chatbots were given article excerpts and asked to return the corresponding article's headline, publisher, publication date, and URL.

Also: Gemini could soon have access to your Google Search history, if you allow it

Chart: Columbia Journalism Review

After running 1,600 queries in total (10 articles × 20 publishers × 8 chatbots), the researchers ranked responses based on how accurately each chatbot retrieved the article, publisher, and URL. The chatbots returned wrong answers to more than 60% of the queries. Within that, results varied by chatbot: Perplexity got 37% of its queries wrong, while Grok 3 erred on 94%.

Chart: Columbia Journalism Review

Why does this matter? If chatbots are worse than Google at correctly retrieving news, they can’t necessarily be relied upon to interpret and cite that news — which makes the content of their responses, even when linked, much more dubious.

Confidently giving wrong answers

Researchers note the chatbots returned wrong answers with “alarming confidence,” tending not to qualify their results or admit to knowledge gaps. ChatGPT “never declined to provide an answer,” despite 134 of its 200 responses being incorrect. Out of all eight tools, Copilot declined to answer more queries than it responded to.

“All of the tools were consistently more likely to provide an incorrect answer than to acknowledge limitations,” the report clarifies.

Paid tiers aren’t more reliable

While premium models like Grok 3 Search and Perplexity Pro answered more queries correctly than their free counterparts, they also gave wrong answers more confidently, which calls into question the value of their often-astronomical subscription costs.

“This contradiction stems primarily from [the bots’] tendency to provide definitive, but wrong, answers rather than declining to answer the question directly,” the report explains. “The fundamental concern extends beyond the chatbots’ factual errors to their authoritative conversational tone, which can make it difficult for users to distinguish between accurate and inaccurate information.”

Also: Don't trust ChatGPT Search and definitely verify anything it tells you

“This unearned confidence presents users with a potentially dangerous illusion of reliability and accuracy,” the report added.

Fabricating links

AI models are known to hallucinate regularly. While all eight chatbots fabricated article links in their responses, Tow found that Gemini and Grok 3 did so the most, in more than half of their responses. "Even when Grok correctly identified an article, it often linked to a fabricated URL," the report notes, meaning that Grok could find the right title and publisher, but then manufacture the actual article link.

This pattern is confirmed by an analysis of Comscore data done by Generative AI in the Newsroom, a Northwestern University initiative. The data it studied from July to November of 2024 showed that ChatGPT produced 205 broken URLs. The researchers noted that while publications may occasionally remove stories, resulting in 404 errors, the lack of archived data makes it "likely that the model has hallucinated plausible-looking links to authoritative news outlets when responding to user queries."

Also: This absurdly simple trick disables AI in Google Search results

These findings are troubling given that AI search engines are gaining in popularity. Google's AI Mode, released last week, replaces normal search results with a chatbot (despite the unpopularity of its AI Overviews). ChatGPT and other popular AI tools are potential misinformation engines, especially considering that 400 million people use ChatGPT weekly, and they all pull content from fact-checked, credited news sites.

According to the Tow report, AI tools that miscredit sources or inaccurately represent their work can have a negative impact on publishers' reputations.

Ignoring blocked crawlers

The news for publishers gets worse: Columbia's Tow report found that several chatbots were still able to retrieve articles from publishers that had blocked their crawlers via the Robots Exclusion Protocol (robots.txt). Paradoxically, the chatbots also failed to correctly answer queries about sites that did allow them access to their content.
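For context, a publisher blocks a crawler by listing its user-agent token in a robots.txt file at the root of its domain. A minimal sketch, using GPTBot and PerplexityBot (the publicly documented tokens for OpenAI's and Perplexity's crawlers), might look like this:

```
# Disallow OpenAI's and Perplexity's crawlers site-wide;
# all other agents remain unaffected.
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /
```

Note that the protocol is purely advisory: nothing technically prevents a crawler from fetching the pages anyway, which is exactly the behavior the Tow report describes.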

Also: AI agents aren't only assistants: How they're changing work today

Perplexity and others have been caught doing this in the past year. No publisher is exempt, either: the report found that even a licensing agreement with an AI company does not guarantee that a publisher's content will be cited correctly.

Columbia's report is only one symptom of a larger problem, and other reports confirm it. The Generative AI in the Newsroom report also discovered that chatbots rarely send traffic to the news websites from which they extract information (and, therefore, human labor). ChatGPT passed only 3% of its referrals to news sites from July to November 2024; Perplexity passed 7%. Comparatively, the AI tools favored educational resources such as Scribd.com and Coursera, sending them up to 30% of their traffic.

The bottom line: Original reporting remains a more reliable source of news than what AI tools regurgitate. Check every link before you accept what a chatbot says as fact, and use your own critical thinking and media literacy to evaluate its responses.
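If you want to automate that first sanity check, a short script can at least flag cited links that do not resolve. Here is a minimal sketch (the check_links helper and the placeholder URL are illustrative assumptions, not part of the study); keep in mind that a URL that loads is not necessarily correctly attributed:

```python
import requests

def check_links(urls: list[str]) -> None:
    """Flag chatbot-cited URLs that do not resolve (e.g., hallucinated links)."""
    for url in urls:
        try:
            # HEAD avoids downloading the page body; some servers reject HEAD,
            # so fall back to GET when they answer 405 Method Not Allowed.
            resp = requests.head(url, allow_redirects=True, timeout=10)
            if resp.status_code == 405:
                resp = requests.get(url, allow_redirects=True, timeout=10, stream=True)
        except requests.RequestException as err:
            print(f"UNREACHABLE  {url} ({err})")
            continue
        label = "OK" if resp.status_code < 400 else "BROKEN"
        print(f"{label:<11} {resp.status_code} {url}")

# Placeholder usage; substitute the links a chatbot actually gave you.
check_links(["https://example.com/some-article"])
```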
