
AI models are using material from retracted scientific papers


Challenges of AI Chatbots Using Retracted Scientific Papers

AI’s Struggle with Retracted Research and Its Impact on Trust

Recent investigations reveal that some AI chatbots incorporate information from retracted scientific studies when responding to queries. This issue, highlighted by independent research, raises serious concerns about the dependability of AI-driven tools in interpreting and summarizing scientific literature. Such shortcomings could hinder global efforts by governments and industries to confidently adopt AI technologies for research and decision-making.

How AI Chatbots Handle Scientific References

AI-powered chatbots and search engines are known to generate inaccurate or even fabricated citations. A subtler problem arises when the cited studies are real but have been officially withdrawn from the scientific record. For example, Dr. Weikuan Gu, a medical researcher at the University of Tennessee, Memphis, found that chatbots such as OpenAI’s GPT-4o sometimes summarize retracted medical imaging papers without alerting users to their invalid status, misleading anyone who relies on the chatbot’s output without checking the original source.

Empirical Studies on AI’s Recognition of Retracted Papers

Gu’s team tested GPT-4o with questions about 21 retracted medical imaging articles. The chatbot referenced five of the retracted papers but attached cautionary notes to only three. In a broader study conducted in August, researchers evaluated ChatGPT-4o Mini’s responses to 217 retracted or low-quality scientific articles across various disciplines; none of the chatbot’s answers acknowledged the retractions or quality concerns. No comparable assessment of GPT-5, which launched recently, has yet been published.

Growing Reliance on AI for Scientific and Medical Information

With the public increasingly turning to AI chatbots for medical advice and health diagnostics, and academics using AI to review and summarize scientific papers, the stakes are high. The trend is expected to accelerate, especially following the US National Science Foundation’s recent $75 million investment to develop AI models tailored for scientific research. This amplifies the urgency to address AI’s limitations in handling retracted or flawed studies.

Expert Opinions on the Importance of Retraction Awareness

Yuanxi Fu, an information science expert at the University of Illinois Urbana-Champaign, emphasizes that AI tools accessible to the public must treat retraction status as a critical quality marker. She notes a consensus within the scientific community that retracted papers should be considered invalid and that lay users deserve clear warnings about such content. Despite these concerns, OpenAI has not publicly commented on this issue.

Widespread Issue Across Multiple AI Research Tools

This challenge is not unique to ChatGPT. In June, a comparative test of AI research assistants, including Elicit, Ai2 ScholarQA (now part of the Allen Institute’s Asta tool), Perplexity, and Consensus, found that many referenced retracted articles without indicating their retraction. Ai2 ScholarQA cited 17 retracted papers and Consensus cited 18, all without disclaimers.

Industry Efforts to Address Retraction Data Integration

Some companies have begun improving their systems. Christian Salem, a cofounder of Consensus, explained that the platform recently enhanced its database with retraction information from publishers, data aggregators, and web crawlers, including Retraction Watch, a prominent curated archive of retracted studies. In a recent retest, Consensus cited only five retracted papers, a significant reduction. Elicit removes retracted articles flagged in OpenAlex data, though Ai2 ScholarQA currently lacks automatic retraction filtering, and Perplexity openly acknowledges that its results are not always accurate.
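The filtering step these tools describe can be sketched simply: given retraction data keyed by DOI (OpenAlex, for instance, exposes a retraction flag on work records), a tool checks each candidate citation before surfacing it. The function name and the data below are illustrative, not any vendor’s actual code:

```python
def filter_citations(citations, retracted_dois):
    """Split candidate citations into usable ones and ones to flag.

    citations: list of dicts with at least a "doi" key.
    retracted_dois: set of lowercase DOIs known to be retracted,
    e.g. sourced from a retraction database.
    """
    usable, flagged = [], []
    for c in citations:
        doi = c.get("doi", "").lower().strip()
        if doi in retracted_dois:
            flagged.append(c)  # surface only with a retraction warning
        else:
            usable.append(c)
    return usable, flagged


# Illustrative data, not real papers:
retracted = {"10.1000/retracted.1"}
cites = [
    {"doi": "10.1000/retracted.1", "title": "Withdrawn study"},
    {"doi": "10.1000/ok.2", "title": "Valid study"},
]
ok, warn = filter_citations(cites, retracted)
```

Note that, as the article goes on to explain, this only works as well as the retraction data behind it.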

Limitations of Retraction Databases and Labeling Practices

Relying solely on retraction databases is insufficient. Ivan Oransky, co-founder of Retraction Watch, cautions that no database is fully comprehensive, given the resources required to maintain one. Caitlin Baker, a discovery-tool specialist at the University of Regina, adds that publishers label retractions and related notices (such as “expressions of concern,” “errata,” and “corrections”) inconsistently, and that these annotations can stem from a range of issues, including data integrity problems, methodological flaws, and content disputes.

Challenges Posed by Preprints and Distributed Copies

Many researchers share their work on preprint servers and repositories, scattering multiple versions of the same paper across the internet. This complicates efforts to track retractions effectively. Moreover, AI training datasets may not be updated in real time; if a paper is retracted after the model’s training cutoff, the AI will continue to treat it as valid. Aaron Tay, a librarian at Singapore Management University, highlights that most academic search engines do not cross-reference retraction data dynamically, leaving users vulnerable to outdated or incorrect information.
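One reason scattered copies defeat simple DOI lookups is that a preprint and its published version usually carry different DOIs, so a check keyed only on the retracted journal DOI misses the preprint. A crude mitigation, sketched here with made-up data, is to also compare normalized titles:

```python
import re


def normalize_title(title):
    """Lowercase and strip punctuation/whitespace so near-identical
    titles of different versions of a paper compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def is_retracted(record, retracted_dois, retracted_titles):
    """Check a paper against retraction data by DOI first, then by
    normalized title (catches preprint copies with different DOIs)."""
    if record.get("doi", "").lower() in retracted_dois:
        return True
    return normalize_title(record.get("title", "")) in retracted_titles


# Illustrative: the preprint's DOI differs from the retracted journal
# version's, but the normalized titles still match.
retracted_dois = {"10.1000/journal.123"}
retracted_titles = {normalize_title("A Retracted Imaging Study.")}
preprint = {"doi": "10.5555/preprint.9", "title": "A retracted imaging study"}
```

Title matching is heuristic and can produce false positives on similarly named papers, which is part of why, as Tay notes, dynamic cross-referencing remains rare in academic search engines.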

Proposed Solutions for Enhancing AI Reliability in Scientific Contexts

Experts advocate for enriching AI responses with contextual information, such as peer reviews and community critiques from platforms like PubPeer, alongside the original publications. Leading journals, including Nature and BMJ, publish retraction notices as standalone articles linked to the original papers. Yuanxi Fu suggests that AI developers should leverage these notices and related news coverage to improve the accuracy of their models’ outputs.

Final Thoughts: Caution and Critical Evaluation Needed

Both AI users and developers must exercise vigilance when engaging with AI-generated scientific content. Aaron Tay advises skepticism, noting that AI tools are still in their infancy and prone to errors. As AI continues to evolve, integrating robust mechanisms to identify and flag retracted research will be essential to maintain trust and ensure responsible use.
