How a top Chinese AI model overcame US sanctions

January 27, 2025

This could be a truly equalizing breakthrough, one that is great for researchers, developers, and especially those from the Global South, says Hancheng Cao, an assistant professor of information systems at Emory University.

DeepSeek’s success is all the more remarkable given the restrictions facing Chinese AI companies, such as the tightened US export controls on cutting-edge chips. Early evidence suggests that these measures are not working as intended. Rather than weakening China’s AI capabilities, the sanctions seem to be driving startups such as DeepSeek to innovate by focusing on efficiency, resource pooling, and collaboration.

According to Zihan Wang, a former DeepSeek employee and current PhD student at Northwestern University, DeepSeek reworked its training process to reduce the strain on its GPUs, a variant released by Nvidia for the Chinese market whose performance is capped at half that of the company’s top products.

Researchers have praised DeepSeek R1 for its ability to tackle complex reasoning tasks, especially in mathematics and coding. The model uses a “chain-of-thought” approach, similar to that of ChatGPT o1, which lets it solve problems by working through questions step by step.

Dimitris Papailiopoulos, a principal researcher at Microsoft’s AI Frontiers Research lab, says that what surprised him most about R1 was its engineering simplicity. “DeepSeek focused on accurate answers instead of detailing every logical step, significantly reducing computation time while maintaining high levels of effectiveness,” he explains.

DeepSeek also released six smaller versions of R1 that can run locally on laptops. It claims that one of these even outperforms OpenAI’s o1-mini on some benchmarks. DeepSeek did not respond to MIT Technology Review’s request for comment.

Despite all the buzz surrounding R1, DeepSeek is still relatively unknown. Liang Wenfeng, an alumnus of Zhejiang University with a background in electronic and information engineering, founded the company in Hangzhou, China, in July 2023. It was incubated by High-Flyer, a hedge fund that Liang founded in 2015. Like Sam Altman of OpenAI, Liang aims to create artificial general intelligence (AGI), a form of AI that can match or even beat humans at a variety of tasks.

Training large language models (LLMs) requires a team of highly trained researchers as well as a lot of computing power. Kai-Fu Lee, a veteran entrepreneur and former head of Google China, said in a recent interview with the Chinese media outlet LatePost that it is usually only “front row players” who build foundation models like ChatGPT because they are so resource-intensive. The US export controls on advanced semiconductors further complicate the situation.

High-Flyer’s decision to enter the AI market is directly linked to these restrictions. Long before the anticipated sanctions, Liang had acquired a large stockpile of Nvidia A100 processors, which are now banned from export to China. The Chinese media outlet 36Kr estimates that the company has more than 10,000 units in stock. Dylan Patel, founder and CEO of the AI research consultancy SemiAnalysis, puts the number at over 20,000, and other estimates suggest at least 50,000. Recognizing the potential of this stockpile for training AI, Liang founded DeepSeek, which was able to combine these processors with lower-power chips to develop its models.

Tech titans like Alibaba and ByteDance, as well as a few startups with deep-pocketed investors, dominate the Chinese AI market, making it difficult for small or medium-sized businesses to compete. A company like DeepSeek, which has no plans to raise money, is rare.

Wang, the former DeepSeek employee, told MIT Technology Review that he was given the freedom to experiment while working at DeepSeek, “a luxury that few new graduates would be able to get at any other company.” “We [most Chinese companies] need to consume twice as much computing power to achieve similar results. Combined with data efficiency gaps, this could require up to four times more computing power. Our goal is to continually close these gaps,” he said.

DeepSeek has found ways to reduce the amount of memory used and speed up calculations without sacrificing accuracy. Wang says that the team enjoys turning a hardware problem into an opportunity to innovate.

Liang remains deeply involved in DeepSeek’s research process and runs experiments with his team. Wang explains that the team is united by a culture of collaboration and a commitment to hardcore research.

In addition to prioritizing efficiency, Chinese companies are increasingly adopting open-source principles. Alibaba Cloud has released more than 100 open-source AI models that support 29 languages and cater to various applications, including coding and math. Startups like MiniMax and 01.AI have also open-sourced their AI models.

According to a white paper published last year by the China Academy of Information and Communications Technology, a state-affiliated research institute, the number of AI large language models worldwide has reached 1,328, with 36% originating in China. That makes China the second-largest contributor to AI after the United States. Thomas Qitong Cao is an assistant professor at Tufts University who specializes in technology policy.

According to Matt Sheehan, an AI researcher at the Carnegie Endowment for International Peace, “the US export control has essentially forced Chinese companies into a corner that forces them to be more efficient with their limited computing resources. We will probably see a lot more consolidation in the future due to the lack of compute.”

This might have already started to happen. Alibaba Cloud announced two weeks ago that it had partnered with the Beijing-based startup 01.AI, founded by Kai-Fu Lee, to merge research teams and create an “industrial large-model laboratory.” “The rapid development of AI requires agility from Chinese companies to survive.”