DeepSeek accused of using Google Gemini to train new AI model

DeepSeek recently released an updated version of its AI model, named R1-0528, which is designed for advanced reasoning tasks such as writing computer code.

The model appears to perform well, but controversy has emerged over how it was trained.

DeepSeek hasn’t said where it got the data to train R1-0528, but some AI experts believe it may have used information from Google’s Gemini AI models. 

One developer, Sam Paech, pointed out that DeepSeek’s model often uses the same kinds of phrases that Gemini does.

Another developer, who studies how AI handles free speech, noticed that the step-by-step thought process DeepSeek’s model generates while working through problems resembles Gemini’s.

This isn’t the first time DeepSeek has faced accusations of copying data. Back in December, one of its older models started referring to itself as ChatGPT. 

That raised suspicions that it might have been trained using data from ChatGPT.

There’s also evidence from Microsoft and OpenAI that DeepSeek may have used a method called distillation, which essentially means training a model on the outputs of a more advanced one.

Microsoft reportedly found that data was being extracted from OpenAI accounts late last year, and those accounts were believed to be linked to DeepSeek.

Although distillation isn’t illegal, OpenAI’s rules specifically ban using its model outputs to build competitors. 

But because so much of the internet is now filled with AI-generated content, it’s hard to tell where training data originally came from. 

Even when models sound similar, it might just be because they’ve learned from the same messy sources online. Still, some experts believe DeepSeek may have intentionally used Gemini outputs. 

One researcher said it would make sense for DeepSeek to generate synthetic training data using powerful tools like Gemini, since it has limited computing resources but plenty of money.

To stop this kind of copying, companies like OpenAI, Google, and Anthropic are adding new security measures, such as requiring ID checks or hiding their models’ thought processes.

Do you think DeepSeek used Google’s data to train its model? Do you think it’s okay, given that Google likely doesn’t have 100% consent for all the data it has used? Tell us below in the comments.
