AI models are using material from retracted scientific papers

Challenges of AI Chatbots Using Retracted Scientific Papers

Recent investigations reveal that some AI chatbots draw on material from retracted scientific papers when generating responses. This issue, highlighted by independent studies, casts doubt on the reliability of AI-driven tools for assessing scientific literature. It also poses challenges for governments and industries aiming to integrate AI technologies into scientific research workflows.

Why Retracted Papers Pose a Risk for AI Responses

AI-powered search engines and conversational agents are known to occasionally produce inaccurate information or fabricate references. However, even when AI cites genuine scientific papers, the inclusion of retracted studies can mislead users. Weikuan Gu, a medical researcher at the University of Tennessee, emphasizes that if users accept AI-generated answers without verifying the retraction status of cited papers, it can propagate misinformation.

Testing AI Models Against Retracted Medical Research

Gu’s team evaluated OpenAI’s ChatGPT (GPT-4o) by querying it with data derived from 21 retracted medical imaging studies. The chatbot referenced five retracted papers, but only flagged three with cautionary notes. Although it cited valid, non-retracted studies for other questions, the AI appeared unaware of the retraction status in many cases. Another research group tested ChatGPT-4o mini on 217 retracted or low-quality papers across various scientific disciplines and found that the AI failed to mention any retractions or quality concerns. Notably, no similar assessments have been published for the recently released GPT-5 model.

Growing Dependence on AI for Scientific Literature Review

AI chatbots are increasingly used by the public, students, and researchers to obtain scientific information and summarize complex studies. This trend is expected to accelerate, especially with significant investments like the US National Science Foundation’s $75 million funding initiative launched in August 2023 to develop AI models tailored for scientific research.

The Importance of Retraction Awareness in Public-Facing AI Tools

Yuanxi Fu, an information science expert at the University of Illinois Urbana-Champaign, stresses that AI tools accessible to the general public must clearly indicate when a paper has been retracted. Retracted studies are effectively removed from the scientific record, and users outside the research community should be explicitly warned to avoid relying on invalidated findings. OpenAI has not publicly commented on these findings.

Widespread Issues Across AI Research Tools

This problem extends beyond ChatGPT. In June 2023, several AI platforms designed for academic research, including Elicit, Ai2 ScholarQA (now integrated into the Allen Institute's Asta), Perplexity, and Consensus, were tested using the same set of 21 retracted papers. Elicit cited five of the retracted studies, Ai2 ScholarQA referenced 17, Perplexity 11, and Consensus 18; none of the tools flagged the retractions.

Industry Efforts to Address Retraction Data Integration

Some companies have begun improving their systems. Christian Salem, cofounder of Consensus, explains that their platform now incorporates retraction data from multiple sources, including publishers, data aggregators, independent web crawlers, and Retraction Watch’s manually curated database. After these enhancements, Consensus reduced its citations of retracted papers to five in a recent test.

Elicit reports removing retracted papers identified by the OpenAlex scholarly catalogue but continues to work on consolidating retraction sources. Ai2 currently lacks automatic detection or removal of retracted content, while Perplexity acknowledges it cannot guarantee complete accuracy.
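The kind of lookup Elicit describes can be approximated with a simple check against the public OpenAlex REST API, which exposes a boolean `is_retracted` field on each work record. The sketch below is illustrative only: the function names and the placeholder DOI are assumptions, not part of any vendor's actual pipeline.

```python
import json
import urllib.request


def is_retracted(work: dict) -> bool:
    """Return True if an OpenAlex work record carries the retraction flag."""
    # OpenAlex marks retracted works with a boolean "is_retracted" field.
    return bool(work.get("is_retracted", False))


def fetch_work(doi: str) -> dict:
    """Fetch a work record from the public OpenAlex API by DOI (no key needed)."""
    url = f"https://api.openalex.org/works/https://doi.org/{doi}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical usage: check a DOI before citing it.
    work = fetch_work("10.1000/example")  # placeholder DOI; substitute a real one
    print("RETRACTED" if is_retracted(work) else "no retraction flag")
```

Because the flag depends on how quickly OpenAlex's upstream sources record a retraction, a missing flag is not proof that a paper stands, which is exactly the coverage gap the curators quoted below describe.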

Limitations of Retraction Databases and Publisher Practices

Ivan Oransky, cofounder of Retraction Watch, cautions that no retraction database is fully comprehensive due to the labor-intensive nature of manual curation. Additionally, publishers vary widely in how they label retractions and related notices. Terms like “correction,” “expression of concern,” “erratum,” and “retracted” may be applied inconsistently, reflecting different issues such as methodological flaws, data problems, or conflicts of interest.

Complications from Preprints and Distributed Copies

Many researchers share their work on preprint servers and repositories, resulting in multiple versions scattered across the internet. AI models trained on datasets with cutoff dates may not reflect recent retractions, leading to outdated or inaccurate responses. Aaron Tay, a librarian at Singapore Management University, notes that most academic search engines do not perform real-time retraction checks, making users reliant on the accuracy of the underlying data corpus.

Enhancing AI Reliability Through Contextual Information

Experts advocate for providing AI models with richer contextual data when generating answers. This could include integrating peer review comments, critiques from platforms like PubPeer, and openly accessible retraction notices published by journals such as Nature and the BMJ. These notices are often available as separate articles outside paywalls, offering valuable metadata that AI systems should leverage.

The Role of Users and Developers in Ensuring Accuracy

Both AI developers and users must exercise caution. Aaron Tay advises skepticism when interpreting AI-generated scientific information, especially given the early stage of these technologies. As AI tools evolve, continuous efforts to improve data quality and transparency will be essential to maintain trust in their scientific outputs.
