Open-source DeepSeek R1 uses pure reinforcement learning to match OpenAI o1



Chinese AI startup DeepSeek, known for challenging leading AI vendors with open-source technology, has just dropped another bombshell: a new open reasoning LLM named DeepSeek-R1.

Based on the recently released DeepSeek-V3 mixture-of-experts model, DeepSeek-R1 matches o1, OpenAI’s frontier reasoning LLM, across math, coding and reasoning tasks. The best part? It does so at 90-95% lower cost.

This release marks a major leap forward for open source, showing that open models are closing the gap with closed commercial models in the race toward artificial general intelligence (AGI). To show off its work, DeepSeek used R1 to distill its reasoning performance into six Llama and Qwen models. In one instance, the distilled Qwen-1.5B model outperformed the much larger GPT-4o and Claude 3.5 Sonnet on select math benchmarks.

The distilled models, along with the main R1, have been open-sourced and are available on Hugging Face under an MIT license.

What is DeepSeek R1?

The industry’s focus has shifted to artificial general intelligence (AGI), a level of AI that can perform intellectual tasks like humans. Many teams are doubling down on improving models’ reasoning abilities. OpenAI’s o1 model was the first notable move in this domain: it uses a chain-of-thought reasoning process to work through a problem. Through RL (reinforcement learning, or reward-driven optimization), o1 learns to hone its chain of thought and refine its strategies, eventually learning to recognize and correct its mistakes and to try new approaches when the current ones aren’t working.

Continuing this line of work, DeepSeek has released DeepSeek-R1, which uses a mix of RL and supervised fine-tuning to handle complex reasoning tasks and match o1.

DeepSeek-R1 scored 79.8% on the AIME 2024 mathematics test and 97.3% on MATH-500. It also earned a 2,029 rating on Codeforces, better than 96.3% of human programmers. By contrast, o1-1217 scored 79.2%, 96.4% and 96.6% on these benchmarks, respectively.

The model also showed strong general knowledge, with 90.8% accuracy on MMLU, just behind o1’s 91.8%.

Performance of DeepSeek-R1 vs OpenAI o1 and o1-mini

The training pipeline

DeepSeek’s reasoning performance marks a major win for the Chinese company in the US-dominated AI space, especially since the entire work, including details of how the model was trained, is open-source. The work, however, is not as straightforward as it sounds. According to the paper describing the research, DeepSeek-R1 was developed as an enhanced version of DeepSeek-R1-Zero, a breakthrough model trained solely through reinforcement learning.

The company first used DeepSeek-V3-base as the base model, developing its reasoning capabilities without any supervised data, relying entirely on self-evolution through a pure RL-based trial-and-error process. Developed intrinsically from the work itself, this ability allows the model to solve increasingly complex reasoning problems by leveraging extended test-time computation to explore and refine its thought processes in greater depth.
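To build intuition for what a pure-RL, trial-and-error signal can look like, here is a toy rule-based accuracy reward. This is a hypothetical sketch for illustration only, not DeepSeek’s actual reward implementation; the regex and the `\boxed{}` answer convention are assumptions.

```python
import re

def accuracy_reward(completion: str, gold_answer: str) -> float:
    """Toy rule-based reward (hypothetical, not DeepSeek's code):
    1.0 if the completion's last \\boxed{...} answer matches the
    reference, else 0.0. Illustrates the kind of verifiable signal
    a pure-RL reasoning run can optimize without supervised data."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", completion)
    if matches and matches[-1].strip() == gold_answer:
        return 1.0
    return 0.0

# The policy samples a long chain of thought; the reward checks only
# the final answer, so better reasoning is rewarded indirectly:
r = accuracy_reward(r"Halving twice gives \boxed{42}", "42")  # 1.0
```

Because the reward scores only the verifiable final answer, the model is free to discover, lengthen and revise its own intermediate reasoning, which is the self-evolution the paper describes.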

The researchers note that DeepSeek-R1-Zero naturally developed a number of powerful and interesting reasoning behaviors during training, and after thousands of RL steps it performed superbly on reasoning benchmarks. Its pass@1 score on AIME 2024 rose from 15.6% to 70.0%, and with majority voting it improved further to 86.7%, matching the performance of OpenAI-o1-0912. However, the pure-RL model struggled with problems such as poor readability and language mixing. To fix these issues, the company turned to a multi-stage approach that combines supervised learning with reinforcement learning.
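The pass@1 and majority-voting metrics quoted above can be sketched in a few lines; the function names below are my own shorthand, not DeepSeek’s evaluation code:

```python
from collections import Counter

def pass_at_1(first_answers, references):
    """Fraction of problems whose first sampled answer is correct."""
    correct = sum(a == ref for a, ref in zip(first_answers, references))
    return correct / len(references)

def majority_vote(sampled_answers):
    """Self-consistency: sample many answers for one problem and
    return the most frequent one, then score that single answer."""
    return Counter(sampled_answers).most_common(1)[0][0]

# 64 samples for one problem, most of which agree on the same answer:
best = majority_vote(["42"] * 40 + ["41"] * 15 + ["40"] * 9)  # "42"
```

Majority voting helps because independent samples that reason incorrectly tend to scatter across different wrong answers, while correct reasoning converges on one.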

The researchers explain that they began by collecting cold-start data to fine-tune the DeepSeek-V3-Base model. “Following this, we perform reasoning-oriented RL like DeepSeek-R1-Zero. Upon nearing convergence in the RL process, we create new SFT data through rejection sampling on the RL checkpoint, combined with supervised data from DeepSeek-V3 in domains such as writing, factual QA and self-cognition, and then retrain the DeepSeek-V3-Base model. After fine-tuning with the new data, the checkpoint undergoes an additional RL process, taking into account prompts from all scenarios. After these steps, we obtained a checkpoint referred to as DeepSeek-R1, which achieves performance comparable to OpenAI-o1-1217.”

DeepSeek R1 is much more affordable than OpenAI o1

Beyond performance that nearly matches OpenAI o1 on benchmarks, the model is also very affordable. OpenAI’s o1 costs $15 per million input tokens and $60 per million output tokens, while DeepSeek Reasoner, which is based on the R1 model, costs $0.55 per million input tokens and $2.19 per million output tokens.
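To make the price gap concrete, here is a small cost calculation using these per-million-token list prices (o1 at $15 in / $60 out, DeepSeek Reasoner at $0.55 in / $2.19 out); the 2M-input / 1M-output workload is an arbitrary example:

```python
# List prices in USD per million tokens.
O1_IN, O1_OUT = 15.00, 60.00   # OpenAI o1
R1_IN, R1_OUT = 0.55, 2.19     # DeepSeek Reasoner

def cost(price_in, price_out, in_tokens, out_tokens):
    """Total cost in USD for a given token workload."""
    return (in_tokens * price_in + out_tokens * price_out) / 1_000_000

o1_cost = cost(O1_IN, O1_OUT, 2_000_000, 1_000_000)  # 90.0
r1_cost = cost(R1_IN, R1_OUT, 2_000_000, 1_000_000)  # ~3.29
savings = 1 - r1_cost / o1_cost                      # ~0.96, i.e. ~96% cheaper
```

The exact percentage saved depends on the input/output mix of the workload, since the two price ratios differ between input and output tokens.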

The model can be tested as “DeepThink” on DeepSeek’s chat platform, which is similar to ChatGPT. Users can access the model weights and code repository via Hugging Face under an MIT license, or use the API to integrate it directly.
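For direct API integration, DeepSeek’s endpoint follows the familiar OpenAI-style chat-completions format. The sketch below only constructs the request rather than sending it; the endpoint URL and the `deepseek-reasoner` model name reflect DeepSeek’s public documentation at the time of writing and should be verified, and the API key is a placeholder.

```python
import json

API_URL = "https://api.deepseek.com/chat/completions"
API_KEY = "YOUR_API_KEY"  # placeholder; supply your own key

def build_request(prompt: str) -> tuple[dict, str]:
    """Construct the headers and JSON body for a deepseek-reasoner
    chat-completions call (OpenAI-compatible request shape)."""
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": "deepseek-reasoner",
        "messages": [{"role": "user", "content": prompt}],
    })
    return headers, body

# To send: requests.post(API_URL, headers=headers, data=body)
headers, body = build_request("Prove that the square root of 2 is irrational.")
```

Because the request shape is OpenAI-compatible, existing OpenAI client libraries can typically be pointed at DeepSeek’s base URL with only the model name and key changed.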
