OpenAI launches new model o3-mini

Openai

On the last day of OpenAI’s 12 days of ‘shipmas,’ the company unveiled its latest models, o3 and o3-mini, which excel at reasoning and even outperform o1 on a series of benchmarks, including math and science. At launch, OpenAI CEO Sam Altman said o3 was slated to drop at the end of January, and today, the company made good on its promise.

O3-Mini

On Friday, OpenAI OpenAI released the o3 mini model, the most cost efficient model in OpenAI reasoning series. The series was previously made up of o1 & o1-mini. According to the company, the model is also strong in science, mathematics, and coding.

OpenAI o3 Mini is now available on ChatGPT and API.
Pro Users will have unlimited access and Plus & Team Users will have tripled rate limits (compared to o1 mini).
ChatGPT users can use o3 mini for free by selecting the Reason button in the message composer.

– OpenAI @OpenAI””https://twitter.com/OpenAI/status/1885406586136383634?ref_src=twsrc^tfw””> January 31, 2025

o3 mini will balance speed and accuracy by using medium reasoning effort. The new model is faster and has higher performance than the o1 mini. Although the original o1 still has more general knowledge, it’s main advantage is that it’s faster.

Benchmark Performance

Expert testers compared the performance of o3 mini to o1 mini and found that o3 mini provided more accurate, well-reasoned, and clearer answers than o1mini. According to the post they preferred o3 mini responses 56% of time and observed a reduction in 39% major errors.

In addition to human preference evaluations in several STEM benchmarks including the Competition Math (AIME 2020), PhD-level Science Questions(GPQA Diamond), Competition Code (Codeforces), o3 mini with medium reasoning — that is what ChatGPT will defaultly give users — outperformed o1 mini.

Openai

Also notable is that o3-mini, with high reasoning effort in the benchmarks, came close to o1 performance, sometimes even surpassing it, as seen in the AIME 2024 above and Software Engineering (SWE-bench Verified) benchmarks. The o3-mini model with medium reasoning effort matched o1’s performance in the Codeforces benchmark.

Safety

OpenAI assessed o3-mini’s safety through public release through jailbreak and disallowed content evaluations. The company found that the model significantly surpasses GPT-4o on the evaluations. OpenAI posted the evaluation results below and also launched an o3-mini System Card, a The detailed results of the evaluations are included in a 37-page PDF .

Access

OpenAI o3 mini is now available to all subscribers of OpenAI’s paid tiers including ChatGPT Plus and Team. Plus and Team users have a three-fold increase in the rate limit. From 50 messages per week with o1 mini to 150 messages. ChatGPT Enterprise will be available in a few days.

Copilot’s powerful ‘Think Deeper” feature is now free for all users. I was a paid subscriber at the time this article was written, but did not have access to o3 mini. Instead, the o1 mini option is still available.

Don’t worry if you donā€™t have a subscription: You can still test o3-mini from your free account. ChatGPT free users only need to click “Reason” or regenerate an answer in the message textbox. OpenAI CEO Sam Altman has confirmed that free access is available. Post on XOpenAI has not specified any limitations for the new model.

Artificial Intelligence

www.aiobserver.co

More from this stream

Recomended