OpenAI responds to DeepSeek competition with detailed reasoning traces for o3-mini



OpenAI now shows more details about the reasoning process of o3-mini, its latest reasoning model. The change, announced on OpenAI's X account, comes at a time when the AI lab faces increased pressure from DeepSeek-R1, a rival model that displays all of its reasoning tokens.

Models like o3 and R1 undergo a lengthy “chain of thought” (CoT) process in which they generate extra tokens to break down the problem, reason about and test different answers, and reach a final solution. Previously, OpenAI’s reasoning models hid their chain of thought and only produced a high-level overview of the reasoning steps. This made it difficult for users and developers to understand the model’s reasoning logic and to change their instructions and prompts to steer it in the right direction.

OpenAI considered chain of thought a competitive advantage and hid it to prevent rivals from copying it to train their own models. But with R1 and other open models showing their full reasoning trace, the lack of transparency has become a disadvantage for OpenAI.

The new version of o3-mini shows a more detailed version of its CoT. Although we still don’t see the raw reasoning tokens, it provides much more clarity on the reasoning process.

Why it matters

In our previous experiments with o1 and R1, we found that o1 performed slightly better at data analysis and reasoning problems. One of the main limitations was that it was impossible to determine why the model made errors, and it made many mistakes when faced with messy data from the internet. R1’s chain of thought enabled us to troubleshoot problems and change our prompts to improve reasoning.

In one of our experiments, for example, both models failed to provide the correct answer. R1’s detailed thought process helped us determine that the problem wasn’t with the model, but with the retrieval stage that gathered the information from the internet. In other experiments, R1’s chain of thought gave us hints when it failed to parse the information we provided, whereas o1 could only give us a rough overview of how it had arrived at its response.

In a variation of an experiment we had run with o1, we tested the new o3-mini model. We gave the model a text file with prices for various stocks from January 2024 to January 2025. The file was unformatted and noisy, a mix of plain text and HTML. We asked the model to calculate the value of a portfolio of $140 invested in the Magnificent 7 stocks on the first of every month from January 2024 to January 2025, with the $140 distributed evenly among all the stocks. (We used the term “Mag 7” in the prompt to make the problem a little more challenging.)

This time, o3-mini’s CoT proved to be very helpful. The model first reasoned about what the Mag 7 is. It then filtered the data so that only the relevant stocks were kept (to make the task more challenging, we included a few non-Mag 7 stocks in the data). Finally, the model calculated the monthly amount of investment in each stock and provided the correct answer.
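For reference, the underlying math the prompt asks for is straightforward once the noisy file has been parsed. Below is a minimal Python sketch of that dollar-cost-averaging calculation, assuming the prices have already been extracted into a simple lookup table; the tickers and data structure are illustrative, not the actual dataset from our experiment.

```python
from collections import defaultdict

# Illustrative Mag 7 tickers; the experiment's real data also contained
# a few non-Mag 7 stocks that the model had to filter out.
MAG_7 = ["AAPL", "MSFT", "GOOGL", "AMZN", "NVDA", "META", "TSLA"]


def portfolio_value(prices: dict[str, dict[str, float]],
                    months: list[str],
                    monthly_budget: float = 140.0) -> float:
    """Invest `monthly_budget` on the first of every month, split evenly
    across the Mag 7, then value the accumulated shares at the last
    month's prices. `prices[ticker][month]` is an assumed lookup table
    built from the parsed price file."""
    per_stock = monthly_budget / len(MAG_7)  # $20 per stock per month
    shares: dict[str, float] = defaultdict(float)
    for month in months:
        for ticker in MAG_7:
            shares[ticker] += per_stock / prices[ticker][month]
    last_month = months[-1]
    return sum(shares[t] * prices[t][last_month] for t in MAG_7)
```

o3-mini’s visible reasoning made it easy to confirm it was following roughly this logic: identify the tickers, filter the data, then do the month-by-month arithmetic.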

It will take a lot more testing to see the limits of the new chain of thought, since OpenAI is still hiding a lot of details. But in our vibe checks, it seems that the new format is much more useful.

What it means for OpenAI

DeepSeek-R1 had three distinct advantages over OpenAI’s reasoning models when it was released: it was open, inexpensive, and transparent.

OpenAI has been able to close the gap since then. While o1 costs $60 per million output tokens, o3-mini costs only $4.40 and outperforms o1 on many reasoning benchmarks. R1 costs between $7 and $8 per million tokens with U.S. providers. DeepSeek itself offers R1 for $2.19 per million tokens, but many organizations won’t be able to use it because it is hosted in China.
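To put those prices in perspective, here is a rough back-of-the-envelope comparison in Python. The token count is an assumption chosen to represent a single reasoning-heavy response, and all quoted figures are treated as per-million output-token prices, which is a simplification.

```python
# Per-million token prices quoted above (USD); the R1 figure for
# U.S. providers uses the midpoint of the $7-$8 range.
PRICE_PER_MILLION = {
    "o1": 60.00,
    "o3-mini": 4.40,
    "R1 (U.S. providers)": 7.50,
    "R1 (DeepSeek-hosted)": 2.19,
}

# Assumed size of one reasoning-heavy response, including reasoning
# tokens; purely illustrative.
OUTPUT_TOKENS_PER_REQUEST = 20_000

for model, price in PRICE_PER_MILLION.items():
    cost = OUTPUT_TOKENS_PER_REQUEST / 1_000_000 * price
    print(f"{model:>22}: ${cost:.3f} per request")
```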

With the new changes to its CoT output, OpenAI has been able to work around the transparency problem.

What OpenAI does about open-sourcing its models remains to be seen. R1 has been adapted, forked, and hosted by a number of different labs and businesses since its release. This could make it the preferred reasoning model for enterprises. OpenAI CEO Sam Altman recently admitted that he had been “on the wrong side of history” when it came to the open-source debate. We’ll see how OpenAI’s future releases reflect this realization.
