DeepSeek accused of using Google Gemini to train new AI model

DeepSeek recently released an updated version of its AI model, named R1-0528, which is designed for advanced reasoning tasks such as writing computer code.

The model appears to perform well, but controversy has emerged over how it was trained.

DeepSeek hasn’t said where it got the data to train R1-0528, but some AI experts believe it may have used information from Google’s Gemini AI models.

One developer, Sam Paech, pointed out that DeepSeek's model often favors the same kinds of words and phrases that Google's Gemini does.

Another developer, who studies how AI models handle free speech, noticed that DeepSeek's reasoning traces (the "thoughts" the model writes out as it works through a problem) read like Gemini's.

If you’re wondering why new deepseek r1 sounds a bit different, I think they probably switched from training on synthetic openai to synthetic gemini outputs.

— Sam Paech (@sam_paech)

This isn’t the first time DeepSeek has faced accusations of copying data. Back in December, one of its older models started referring to itself as ChatGPT.

That raised suspicions that it might have been trained using data from ChatGPT.

There's also evidence from Microsoft and OpenAI that DeepSeek may have used a technique called distillation, in which a model is trained on the outputs of a larger, more capable model.

Microsoft reportedly found that data was being extracted from OpenAI accounts late last year, and those accounts were believed to be linked to DeepSeek.

Although distillation isn't illegal, OpenAI's terms of service specifically ban using its model outputs to build competing AI.

But because so much of the internet is now filled with AI-generated content, it’s hard to tell where training data originally came from.

Even when models sound similar, it might just be because they’ve learned from the same messy sources online. Still, some experts believe DeepSeek may have intentionally used Gemini outputs.

One researcher said it would make sense for DeepSeek to generate synthetic training data from powerful models like Gemini, since the company is short on computing power but has plenty of money.

To prevent this kind of distillation, companies such as OpenAI, Google, and Anthropic are adding new safeguards, such as requiring ID verification or hiding their models' reasoning traces.

Do you think DeepSeek used Google's data to train its model? Do you think that's okay, given that Google likely doesn't have full consent for all the data it has used? Tell us below in the comments.
