On the last day of OpenAI’s 12 days of ‘shipmas,’ the company unveiled its latest models, o3 and o3-mini, which excel at reasoning and even outperform o1 on a series of benchmarks, including math and science. At the unveiling, OpenAI CEO Sam Altman said o3-mini was slated to drop at the end of January, and today, the company made good on its promise.
O3-Mini
On Friday, OpenAI released o3-mini, the most cost-efficient model in its reasoning series, which previously consisted of o1 and o1-mini. According to the company, the model is also strong in science, mathematics, and coding.
OpenAI o3-mini is now available in ChatGPT and the API.
Pro users will have unlimited access, and Plus and Team users will have tripled rate limits (compared to o1-mini).
ChatGPT users can use o3-mini for free by selecting the Reason button in the message composer.
Source: OpenAI (@OpenAI) on X, January 31, 2025, https://twitter.com/OpenAI/status/1885406586136383634
In ChatGPT, o3-mini uses medium reasoning effort to balance speed and accuracy. The new model is faster and performs better than o1-mini. Although the original o1 still has broader general knowledge, o3-mini’s main advantage is that it’s faster.
Benchmark Performance
Expert testers compared o3-mini with o1-mini and found that o3-mini gave more accurate, better-reasoned, and clearer answers. According to OpenAI’s post, the testers preferred o3-mini’s responses 56% of the time and observed a 39% reduction in major errors.
Beyond the human preference evaluations, o3-mini with medium reasoning effort, the default that ChatGPT gives users, outperformed o1-mini on several STEM benchmarks, including Competition Math (AIME 2024), PhD-level Science Questions (GPQA Diamond), and Competition Code (Codeforces).
Also notable is that o3-mini with high reasoning effort came close to o1’s performance and sometimes even surpassed it, as seen in the AIME 2024 and Software Engineering (SWE-bench Verified) benchmarks. With medium reasoning effort, o3-mini matched o1’s performance on the Codeforces benchmark.
Safety
OpenAI assessed o3-mini’s safety before public release through jailbreak and disallowed content evaluations. The company found that the model significantly surpasses GPT-4o on these evaluations. OpenAI posted the evaluation results and also published an o3-mini System Card, a 37-page PDF that contains the detailed results.
Access
OpenAI o3-mini is now available to subscribers of OpenAI’s paid tiers, including ChatGPT Plus and Team. Plus and Team users get a threefold increase in the rate limit, from 50 messages per day with o1-mini to 150 messages per day with o3-mini. ChatGPT Enterprise access will follow in a few days.
As a paying user, I was not yet able to access o3-mini at the time of writing; the o1-mini option was still available instead.
Don’t worry if you don’t have a subscription: you can still try o3-mini from a free account. Free ChatGPT users only need to select “Reason” in the message composer or regenerate a response. OpenAI CEO Sam Altman confirmed the free access in a post on X. OpenAI has not specified any limitations for the new model.