DeepSeek-R1, released on Monday, sent a shockwave through the AI community, challenging assumptions about what is required to achieve cutting-edge AI performance. This open-source model, comparable to OpenAI’s o1 at only 3%-5% of the cost, has not only captured developers’ attention but also challenged enterprises to rethink their AI strategies.
The model has rocketed to the top of the trending models on Hugging Face; as of this writing, it has been downloaded 109,000 times as developers rush to try it and understand what it means for AI development. Users are also commenting on DeepSeek’s accompanying search feature (available on the DeepSeek site), which now competes with OpenAI, Perplexity, and Google’s Gemini Deep Research.
The implications for enterprise AI strategies are profound. With lower costs and open access, enterprises now have an alternative to expensive proprietary models like OpenAI’s, and DeepSeek’s release could democratize access, allowing smaller organizations to compete in the AI arms race. For enterprises developing AI-driven products, the breakthrough challenges assumptions about OpenAI’s dominance and offers a blueprint for cost-efficient innovation. How DeepSeek achieved its feats, and what it means for the many users of AI models, is the most instructive part of this story.
DeepSeek’s breakthrough – moving to pure reinforcement learning
Back in November, DeepSeek announced that its R1-lite-preview model had surpassed OpenAI’s o1-preview, but at the time it offered only a limited version of the model. Monday’s release of the full R1 comes with accompanying documentation: in a technical paper, the company revealed a key innovation, a deliberate departure from the conventional supervised fine-tuning (SFT) process widely used to train large language models (LLMs).
SFT, a standard step in AI development, involves training models on curated datasets of worked examples, often in the “chain-of-thought” (CoT) style, and is considered crucial for improving reasoning abilities. DeepSeek challenged this assumption, choosing to rely solely on reinforcement learning (RL) to train the model. This bold move forced DeepSeek-R1 to develop independent reasoning skills and avoid the brittleness often introduced by prescriptive datasets. The team did have to correct some flaws, which led them to reintroduce SFT in the final stages of model building, but the results confirmed that reinforcement learning alone can drive substantial performance gains.
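To make the distinction concrete, the sketch below contrasts what a supervised CoT training example prescribes with what a pure-RL setup provides. The field names and the toy example are illustrative assumptions, not DeepSeek’s actual data format.

```python
# Illustrative contrast between SFT data and a pure-RL setup.
# Field names and examples are assumptions, not DeepSeek's format.

# SFT: each example prescribes the full chain of thought the model must imitate.
sft_example = {
    "prompt": "What is 17 * 24?",
    "target": "First, 17 * 20 = 340. Then 17 * 4 = 68. So 17 * 24 = 408.",
}

# Pure RL: only the prompt and a verifiable final answer exist; the model must
# discover its own reasoning and is rewarded only when the answer checks out.
rl_example = {
    "prompt": "What is 17 * 24?",
    "reference_answer": "408",  # consumed by a reward function, never shown as a target
}
```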
The company used open source to get a lot of its work done
First, a little background on how DeepSeek got to where it is today. A 2023 spin-off of the Chinese hedge fund High-Flyer Quant, DeepSeek began by developing AI models for its own use before releasing them to the public. Little is known about exactly how it built them, but it likely drew on open projects created by Meta, such as the Llama model, and on the ML library PyTorch.
High-Flyer Quant acquired more than 10,000 Nvidia GPUs to train its models before U.S. export restrictions took effect, and despite the trade barriers that fleet has reportedly grown to 50,000 GPUs through alternative supply routes. Even so, it pales in comparison to the resources of leading AI labs such as OpenAI, Google, and Anthropic.
DeepSeek’s ability to achieve competitive results with limited resources highlights how ingenuity and resourcefulness can challenge the high-cost paradigm of training state-of-the-art LLMs.
DeepSeek’s exact budget is not known, but the model was reportedly trained for about $5.58 million over two months, according to Nvidia’s Jim Fan. Although the company hasn’t disclosed the exact training datasets it used (sidenote: critics say this means DeepSeek is not truly open-source), modern techniques make training on web and open datasets increasingly accessible. Estimating the total cost of training DeepSeek-R1 is difficult: while running 50,000 GPUs suggests significant expenditures (potentially tens of millions of dollars), exact figures remain speculative.
It’s clear that DeepSeek has been innovative from the start. Last year, reports surfaced about some of its early innovations, such as its mixture-of-experts architecture and multi-head latent attention.
Update: Jeffrey Emanuel, a former quant and now an entrepreneur, has just published a detailed report on DeepSeek’s infrastructure innovations. It’s a long report, but a very good one. Its “Theoretical threat” section covers three other innovations worth mentioning: (1) mixed-precision training, which let DeepSeek use 8-bit floating-point numbers through much of training instead of 32-bit, dramatically reducing memory requirements per GPU and therefore the number of GPUs needed; (2) multi-token prediction during inference; and (3) improvements in GPU communication efficiency via its DualPipe algorithm, resulting in higher GPU utilization.
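For readers who want to see what the mixed-precision piece looks like in practice, here is a minimal sketch using stock PyTorch. DeepSeek reportedly relied on custom FP8 kernels; PyTorch’s built-in autocast (shown here with bfloat16) is only an off-the-shelf analogue, and the layer, data, and hyperparameters are placeholder assumptions.

```python
# Minimal sketch of mixed-precision training (an analogue, not DeepSeek's code).
# DeepSeek reportedly used custom FP8 kernels; stock PyTorch autocast supports
# float16/bfloat16, which is what this sketch uses.
import torch
from torch import nn

model = nn.Linear(4096, 4096).cuda()  # stand-in for a transformer layer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 4096, device="cuda")
target = torch.randn(8, 4096, device="cuda")

for step in range(10):
    optimizer.zero_grad()
    # The forward pass runs in low precision, roughly halving activation memory
    # versus FP32; master weights and optimizer state stay in full precision.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = nn.functional.mse_loss(model(x), target)
    loss.backward()
    optimizer.step()
```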
How DeepSeek-R1 reached the “aha moment”
To arrive at DeepSeek-R1’s final version, the team first trained an intermediate model, DeepSeek-R1-Zero, using reinforcement learning alone. Relying only on RL, DeepSeek rewarded both correct answers and the logical processes that led to them.
The model demonstrated an ability to prioritize tasks by difficulty, allocating more processing time to harder problems. DeepSeek’s researchers called this an “aha moment,” in which the model spontaneously identified and articulated new solutions to challenging problems. This milestone showed that reinforcement learning can unlock advanced reasoning abilities without relying on traditional training methods such as SFT.
The researchers conclude: “It underscores the power and beauty of reinforcement learning: rather than explicitly teaching the model on how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies.”
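The paper describes rule-based rewards for answer accuracy and output format rather than a learned reward model. The sketch below illustrates the general idea; the tags, weights, and matching logic are simplified assumptions, not DeepSeek’s actual implementation.

```python
# Illustrative sketch of a rule-based reward of the kind described in the R1
# paper (accuracy + format). Details are simplified assumptions.
import re

def reward(completion: str, reference_answer: str) -> float:
    score = 0.0
    # Format reward: the model is asked to wrap its reasoning and answer in tags.
    if re.search(r"<think>.*</think>\s*<answer>.*</answer>", completion, re.DOTALL):
        score += 0.5
    # Accuracy reward: compare the extracted answer against a known-correct one
    # (this works for verifiable domains such as math or code).
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip() == reference_answer.strip():
        score += 1.0
    return score

print(reward("<think>2 + 2 = 4</think> <answer>4</answer>", "4"))  # 1.5
```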
More than RL
It’s true, the model needed more than RL. The paper goes on to discuss how, despite RL producing unexpected and powerful reasoning behaviors, the intermediate DeepSeek-R1-Zero model faced some challenges, including poor readability and language mixing (starting in Chinese and switching to English mid-answer, for instance). The team therefore built a new model, which would become DeepSeek-R1. Starting from the V3 base model, it was first given a limited injection of SFT, focused on a “small quantity of long CoT data,” or what was called cold-start data, to fix some of those issues. It was then put through the same reinforcement learning process as R1-Zero, and the paper describes further fine-tuning in the final stages; a rough sketch of the overall recipe follows.
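The outline below is only a schematic of that multi-stage recipe; the function names are placeholder stubs for illustration, not DeepSeek’s code.

```python
# Rough sketch of the multi-stage recipe the R1 paper describes.
# The stage functions are placeholder stubs, not DeepSeek's implementation.

def supervised_finetune(model, dataset):
    # Stub: fine-tune on labeled examples (e.g., long chain-of-thought data).
    return model

def reinforcement_learning(model, prompts):
    # Stub: sample completions, score them with rule-based rewards, update the policy.
    return model

def train_r1(v3_base, cold_start_data, prompts, curated_sft_data):
    model = supervised_finetune(v3_base, cold_start_data)   # 1. cold-start SFT on a small set of long CoT examples
    model = reinforcement_learning(model, prompts)          # 2. reasoning-focused RL, as with R1-Zero
    model = supervised_finetune(model, curated_sft_data)    # 3. additional SFT on curated data
    model = reinforcement_learning(model, prompts)          # 4. final RL pass
    return model
```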
The ramifications of the release
One question that arises is why the release came as such a surprise; it’s not as if open-source models are new. Open-source models have been gaining momentum, and with good reason: we reported recently that these models are likely to win in the enterprise because of their low cost and malleability.
Meta’s open-weights Llama 3 model, for example, was a huge hit last year as developers fine-tuned it to create their own custom models. DeepSeek-R1’s reasoning has likewise already been distilled into a variety of much smaller models. The difference is that DeepSeek delivers industry-leading performance even in these tiny distilled versions, which can run, for example, on mobile phones; a simplified sketch of how such distillation works follows.
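In broad strokes, this kind of distillation means fine-tuning a small “student” model on reasoning traces generated by the larger teacher. The sketch below shows the idea using the Hugging Face transformers library; the student checkpoint, the toy trace, and the hyperparameters are illustrative assumptions.

```python
# Minimal sketch of reasoning distillation: fine-tune a small student model on
# chain-of-thought completions produced by a larger teacher such as R1.
# The checkpoint name, toy trace, and hyperparameters are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

student_name = "Qwen/Qwen2.5-1.5B"  # assumed small student checkpoint
tok = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForCausalLM.from_pretrained(student_name)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

# In practice these traces would be sampled at scale from the teacher model;
# a single hand-written pair stands in here.
traces = [
    {
        "prompt": "What is 12 * 13? ",
        "completion": "12 * 10 = 120 and 12 * 3 = 36, so 12 * 13 = 156.",
    },
]

student.train()
for ex in traces:
    batch = tok(ex["prompt"] + ex["completion"], return_tensors="pt")
    # Standard next-token (causal LM) loss over the teacher-written trace.
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```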
DeepSeek-R1 is not only faster than the leading open-source alternative, Llama 3, but also more reliable. It also reveals the entire chain of thought behind its answers, something Meta’s Llama does not do by default; Llama must be aggressively prompted to do so.
That transparency has also become a PR problem for OpenAI, which hides its chains of reasoning from users, citing competitive reasons and a desire not to confuse users when a model makes a mistake. Transparency lets developers pinpoint and correct errors in a model’s reasoning, streamlining customizations to meet enterprise requirements more effectively.
DeepSeek’s success highlights a wider shift in the AI landscape for enterprise decision-makers: leaner and more efficient development practices are becoming increasingly viable. It may be necessary for organizations to reevaluate the benefits of their partnerships with proprietary AI service providers.
No massive lead
Although DeepSeek has delivered a string of innovations, it does not hold a massive lead: it published its research, so other model companies can learn from it and adapt. Meta and Mistral, the French open-source model company, may be a little behind, but they will likely catch up within months. As Meta’s chief AI scientist Yann LeCun put it, the point is that everyone benefits from everyone else’s ideas: no one is “faster” than anyone else, no country is “losing” to another, and no one has a monopoly on good ideas. Everyone is learning from everyone else. It’s the execution that counts.
In the end, consumers, startups, and other users stand to benefit the most, because DeepSeek will keep driving down the price of using these models (though the cost of actually running them for inference is another matter). This rapid commoditization may pose challenges, and even massive pain, for leading AI providers that have invested heavily in proprietary technology. Many commentators, including investor and former Meta executive Chamath Palihapitiya, have suggested this could mean years of OpEx and CapEx spending by OpenAI and others will have been wasted.
There has been plenty of debate about whether it is ethical to use the DeepSeek-R1 model because of the biases instilled in it by Chinese law, for example its refusal to answer questions about the Chinese government’s brutal crackdown at Tiananmen Square. Many developers are unbothered, viewing such biases as rare edge cases that can be reduced with fine-tuning, and they point out that models from OpenAI and other companies carry their own, albeit different, biases. Meta’s Llama has become a popular open-source model despite its training datasets not being made public, its own hidden biases, and the lawsuits that have been filed against the company as a result.
OpenAI’s $500 billion Stargate project raises many questions regarding the ROI of its large investments
All of this raises serious questions about the investment plans being pursued by OpenAI and Microsoft. OpenAI’s $500 billion Stargate project reflects its commitment to building massive, high-performance data centers to power its advanced models. This strategy, backed by partners including Oracle and SoftBank, rests on the belief that achieving artificial general intelligence (AGI) requires unprecedented compute resources. DeepSeek’s demonstration that a high-performing model can be built for a fraction of that cost casts doubt on the sustainability of this approach.
Entrepreneur Arnaud Bernard captured this dynamic by contrasting China’s frugal, decentralized innovation with the U.S. reliance on centralized, resource-intensive infrastructure: “It is about the world realizing China has caught up – and in some areas, overtaken – the U.S.” In a further sign of that, Doubao-1.5 Pro has just been announced, featuring a “Deep Thinking” mode that reportedly beats OpenAI’s o1 on the AIME benchmark.
Want a deeper look at how DeepSeek-R1 is reshaping AI? Watch our in-depth YouTube discussion, where I dig into this breakthrough with ML developer Sam Witteveen. Together, we explore the technical details, the implications for enterprises, and the future of AI.