OpenAI’s GPT-5 is here, with up to 80% fewer hallucinations

OpenAI launched GPT-5, its most advanced model to date, on Thursday.

OpenAI CEO Sam Altman, an AI hype man, described it as being like talking to a personal expert who can write applications on request. “We think this idea of software on demand is going to be one of the defining characteristics of the GPT-5 era,” he said, kicking off a 75-minute presentation filled with code demonstrations.

OpenAI claims that GPT-5 beats its earlier models at coding, writing, math, and visual perception, and that it hallucinates and deceives less often.


GPT-5, as you may know, is not a single model. OpenAI routes prompts to a variety of models based on signals such as the user’s intention or the request’s complexity.

OpenAI says simple prompts may be routed to a small, efficient version of the model that responds quickly without “thinking,” while a larger, deeper reasoning model handles complex or nuanced tasks. Reasoning is activated automatically based on the user’s prompts. Paid users can also toggle reasoning on permanently if desired.
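To make the routing idea concrete, here is a minimal sketch in Python. It is not OpenAI’s router, which is described as a trained model. The model names, the cue words, the 50-word threshold, and the `route_prompt` function are all hypothetical stand-ins for the learned signals mentioned above.

```python
from dataclasses import dataclass

# Hypothetical model names for illustration; not OpenAI's actual identifiers.
FAST_MODEL = "fast-small-model"
REASONING_MODEL = "deep-reasoning-model"

# Hypothetical cue words standing in for a learned complexity signal.
COMPLEX_CUES = ("prove", "debug", "step by step", "analyze", "derive")


@dataclass
class RoutingDecision:
    model: str
    reasoning: bool


def route_prompt(prompt: str, force_reasoning: bool = False) -> RoutingDecision:
    """Pick a model from crude signals: a user toggle, cue words, and length."""
    text = prompt.lower()
    looks_complex = len(text.split()) > 50 or any(cue in text for cue in COMPLEX_CUES)
    if force_reasoning or looks_complex:
        return RoutingDecision(REASONING_MODEL, reasoning=True)
    return RoutingDecision(FAST_MODEL, reasoning=False)


print(route_prompt("What is the capital of France?"))
print(route_prompt("Prove that the square root of 2 is irrational."))
```

The `force_reasoning` flag mirrors the toggle offered to paid users. A real router would replace the keyword heuristic with a classifier trained on usage signals.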

The routing model is reportedly trained continuously on new input signals so that it gets better at deciding which model should handle a request and when to trigger reasoning. OpenAI says it plans to eventually integrate these capabilities into a single model. It also claims that this architecture is not only faster but also more efficient than previous designs. The company stated in a blog post:

“GPT-5 gets more value out of less thinking time. In our evaluations, GPT-5 (with thinking) performs better than OpenAI o3 with 50-80 percent less output tokens across capabilities, including visual reasoning, agentic coding, and graduate-level scientific problem solving.”

ChatGPT Plus and Free users will be able to access GPT-5 and GPT-5 mini, while Enterprise and Pro users will also get a Pro version that can reason for longer. API users will have access to the standard and mini versions, along with a nano version available at a lower price.

Revolutionary upgrade or overhyped iteration?

OpenAI’s presentation was full of hyperbolic statements and demos billing GPT-5 as its smartest model yet. The company’s own results, however, told a different story, showing mostly iterative improvements. Your eyes are not deceiving you: GPT-5 posts only incremental gains on math benchmarks such as AIME 2025.

On the AIME math benchmark, GPT-5 Pro managed a 1.6-point lead over the previous flagship o3 model when using tools, and a 7.8-point advantage without them. For free-tier users, the new models are a bigger upgrade over GPT-4o: GPT-5 (non-Pro) managed a 57.5-point advantage. It was the same story on the FrontierMath and HMMT math benchmarks.

GPT-5 maintained a single-digit lead over last generation’s models in nearly all benchmark suites.

“Benchmarks, they’re exciting numbers, but we’re starting to saturate them, like when you’re moving between 98% and 99% in some benchmark it means you need something else to really capture how great the model is,” OpenAI president Greg Brockman admitted.

It is no wonder that so much of the presentation focused on demos and testimonials. Altman was especially excited about GPT-5’s performance on health-related queries. ChatGPT appears to have supplanted WebMD as a self-diagnosis tool.

In one testimonial, the company appeared to suggest that users who are having trouble understanding their health conditions upload their medical documents to ChatGPT so GPT-5 can make sense of them. What was it Altman said about feeding ChatGPT sensitive data?

OpenAI turns off the voices

Although GPT-5’s benchmark improvements were marginal, the models should be less prone to hallucination, a major problem in which models fabricate information to satisfy a user’s request. In our tests this week, OpenAI’s much smaller, less capable open-weight models hallucinated an imaginary presidential candidate whom Donald Trump defeated in 2024. In a blog post, the company stated:

“GPT-5’s responses are around 45 percent less likely to contain a factual error than GPT-4o, and when thinking, GPT-5’s responses are around 80 percent less likely to contain a factual error than OpenAI o3.”

Along with reducing hallucinations, OpenAI implemented evaluations that test for deceptive behavior from the models. The company explained:

“In order to achieve a high reward during training, reasoning models may learn to lie about successfully completing a task or be overly confident about an uncertain answer.”

OpenAI claims to have reduced the deception rate from 4.8 percent to 2.1 percent when tested on real-world chat data.
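It helps to separate percentage points from relative reductions when reading these figures. The short Python sketch below uses only the 4.8 percent and 2.1 percent deception rates quoted above. The `relative_reduction` helper is our own illustration, not something OpenAI published.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fractional drop from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Deception rates quoted in the article, in percent.
before, after = 4.8, 2.1

print(f"absolute drop: {before - after:.1f} percentage points")
print(f"relative drop: {relative_reduction(before, after):.1%}")
```

By this measure the deception rate fell by 2.7 percentage points, or roughly 56 percent in relative terms. The “45 percent” and “80 percent less likely” hallucination claims read as relative figures of the same kind, so they say nothing on their own about the absolute error rate.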

On the subject of safety, OpenAI implemented new measures for handling potentially dubious prompts about sensitive topics. The model builder claims that GPT-5 will provide the most complete answer possible within acceptable safety margins.

Instead of refusing to answer a question about how to ignite a potentially explosive compound, the model may direct the user to where they can find this information and issue warnings in its response.

ChatGPT gets a personality

Along with the new models, OpenAI is also releasing four new optional personalities that let users choose how professional or edgy their AI assistant should be.

Four personalities will be available at launch: Cynic, Robot, Listener, and Nerd. The model builder notes that these personalities are opt-in and, for the time being, limited to text chat. Voice capabilities will be added later.

“This lets you interact with ChatGPT in a way that’s consistent with your own communication style,” said Mark Chen, OpenAI’s chief research officer.

OpenAI made sure to stress that these personalities were specifically tuned to avoid being too sycophantic when praising user questions and inputs.

Availability

OpenAI’s GPT-5 models are available today in ChatGPT to all users, including Free, Plus, and Pro. Enterprise and educational users will receive the GPT-5 models next week. ChatGPT’s pricing remains the same at $20 per month for Plus and $200 per month for the unlimited Pro plan. Developers can also access the models through the API. OpenAI publishes full API pricing, including the cost of input, output, and cached tokens.

OpenAI released its first open-weight models since GPT-2 earlier this week.

Boot note

Also this week, Anthropic released Claude Opus 4.1, an updated version of its model that showed similar iterative improvements on coding benchmarks.
