OpenAI updates Operator from GPT-4o to o3, which makes its $200 monthly ChatGPT Pro subscription more attractive

May 23, 2025 at 2:51 PM

Credit: VentureBeat created with Midjourney


This week was full of AI announcements, including those from Microsoft, Google and Anthropic. OpenAI has news of its own to share as well, and not just its $6.5 billion purchase of “io,” the new hardware initiative from Jony Ive’s design team.

Today, the company upgraded Operator, its self-navigating web agent and cursor controller within ChatGPT, from the previous GPT-4o multimodal large language model to the more powerful o3 reasoning model.

Subscribers to OpenAI’s $200-per-month ChatGPT Pro plan can access the update as a “research preview” today, May 23, 2025.

This is OpenAI’s way of saying that the product has not yet been sanded down or perfected and may still have some kinks.

Kinks aside, OpenAI’s ChatGPT Pro plan suddenly looks more affordable next to Google’s top-tier AI subscription bundle, which costs nearly $250 per month for access to its latest Gemini multimodal, Imagen image generation and Veo video generation models.

What is OpenAI Operator and why is it used?

Operator, OpenAI’s first semi-autonomous agent, is what the company terms a Computer-Using Agent. It debuted in January 2025. The idea is to move beyond the ChatGPT chatbot interface and let OpenAI’s models take more actions on the user’s behalf.

Operator was designed to autonomously point, scroll, click and type in order to complete web-based tasks such as booking dinner reservations or compiling shopping lists. This agentic ability lets it work directly through the browser interface, including gathering data online.

For safety, privacy and security reasons, Operator did not use any web browser installed on the user’s PC or Mac. Instead, it ran in a cloud-hosted virtual browser accessible via a standalone site, operator.chatgpt.com, where users could input requests and observe the agent perform tasks in real time.

The product combined vision, reasoning and interaction capabilities, based on GPT-4o. This marked a new direction in agentic AI for OpenAI.

It launched as a research preview for ChatGPT Pro subscribers and included safety measures such as user confirmations, Watch Mode and restrictions on high-risk web platforms.

The product was also tested in enterprise contexts including travel planning and civic services, demonstrating its versatility across both consumer- and business-oriented environments.

OpenAI’s o3 update offers improved accuracy, structure and success rates

OpenAI is aiming to improve performance in several key dimensions. The new o3-based Operator shows improved persistence and accuracy when interacting with browsers.

In practice, this means Operator is more likely to complete tasks successfully, with less need for the user to repeat or correct requests. Users can also expect clearer, more structured and more comprehensive responses.

Comparative evaluations show that the new model has a distinct advantage over its predecessor. In human preference studies, users preferred the o3 model for its style, comprehensiveness and clarity. It also performs well on efficiency and instruction following, although results for factual accuracy are more balanced.

Performance on third-party evaluation benchmarks reflects these enhancements. On the OSWorld benchmark, which measures completion of browser-based tasks, the o3 model scored 42.9, compared with the previous version’s 38.1.

However, OpenAI notes that because of limitations in the automated grading system, the actual performance gain could be closer to 20 percentage points.

On WebArena, the new model achieved a score of 62.9, up from 48.1. The most dramatic improvement appears on the GAIA benchmark, where the o3 model scores 62.2, vastly surpassing the prior model’s 12.3.
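To put the three reported results side by side, the short Python sketch below computes the absolute gain (in points) and the relative gain (as a percentage of the old score) for each benchmark. The scores are the ones quoted above; the function and variable names are illustrative only, and the output is no substitute for OpenAI's own analysis.

```python
# Scores as reported in the article:
# (previous GPT-4o-based Operator, new o3-based Operator).
REPORTED_SCORES = {
    "OSWorld": (38.1, 42.9),
    "WebArena": (48.1, 62.9),
    "GAIA": (12.3, 62.2),
}


def score_gains(scores):
    """Return {benchmark: (absolute_gain, relative_gain_percent)}, rounded to one decimal."""
    gains = {}
    for name, (old, new) in scores.items():
        absolute = round(new - old, 1)
        relative = round((new - old) / old * 100, 1)
        gains[name] = (absolute, relative)
    return gains


if __name__ == "__main__":
    for name, (absolute, relative) in score_gains(REPORTED_SCORES).items():
        print(f"{name}: +{absolute} points ({relative}% relative)")
```

The relative figure is what makes the GAIA jump stand out: the absolute gain is large on its own, but it starts from a much lower baseline than the other two benchmarks.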

Side-by-side task comparisons further illustrate these gains. In one example involving a restaurant booking request, the new model provided a clearer and more detailed list of available reservations, including locations, Michelin ratings and seating notes, presented in a well-formatted table. The previous version, while functional, delivered less information in a less organized manner, according to an image included with OpenAI’s o3 Operator release notes.

The safety measures that were introduced in earlier versions remain. However, the o3 model has been fine-tuned to fit its role as a system agent. OpenAI has enhanced training against harmful task completion, prompt injection vulnerabilities and mistakes involving the user’s intent.

Evaluations have shown that the model confirms 94% of sensitive actions before they are executed, with 100% confirmation for financial transactions. Susceptibility to prompt injection has also decreased, from 23% to 20%.

The o3 Operator maintains an extra-cautious boundary for certain high-risk web interactions, such as those involving email or financial platforms. It may require user supervision through Watch Mode, or it may refuse to proceed. These measures are part of a layered safety approach that combines robustness at the model level with real-time monitoring.
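The layered behavior described here (confirm, supervise or refuse) can be pictured as a simple gating function. The Python sketch below is purely hypothetical: OpenAI has not published Operator's internal logic, and every name in it (`ProposedAction`, `gate`, the category labels) is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    PROCEED = "proceed"
    ASK_CONFIRMATION = "ask_confirmation"
    REQUIRE_WATCH_MODE = "require_watch_mode"
    REFUSE = "refuse"


@dataclass(frozen=True)
class ProposedAction:
    # All fields are hypothetical; the article does not describe Operator's internals.
    description: str
    site_category: str          # e.g. "email", "finance", "general"
    is_sensitive: bool          # e.g. submitting a payment or sending a message
    looks_harmful: bool = False


# Assumed labels for the high-risk platforms the article mentions.
HIGH_RISK_CATEGORIES = {"email", "finance"}


def gate(action: ProposedAction, user_is_watching: bool) -> Decision:
    """Apply the checks in order of severity and return the first that triggers."""
    if action.looks_harmful:
        return Decision.REFUSE
    if action.site_category in HIGH_RISK_CATEGORIES and not user_is_watching:
        return Decision.REQUIRE_WATCH_MODE
    if action.is_sensitive:
        return Decision.ASK_CONFIRMATION
    return Decision.PROCEED
```

In this sketch a payment on an ordinary site triggers a confirmation prompt, while any action on an email or finance site first requires the user to be watching. The real system presumably relies on model-level training and monitoring, not a fixed rule table.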

The upgrade to Operator is a technical advancement, but it also reflects OpenAI’s commitment to responsible AI deployment.

As the system is able to take real-world action, it introduces new risks. The development team continues to improve its safety protocols in response.

OpenAI’s updated system card documentation shows that the model is below high-risk thresholds for categories like biological and chemical misuse. It also has no native coding environment or terminal access, which further reduces potential misuse vectors. Operator is still a research preview, and only ChatGPT Pro users can access it. In the Responses API, Operator will continue to use the GPT-4o model, at least until further notice.

Impact on enterprise technical decision makers

The upgraded Operator could meaningfully improve workflows for professionals working in AI engineering, orchestration and data management. Its improved accuracy and structured outputs should reduce the testing and troubleshooting burden for those building or maintaining machine-learning models.

It is a reliable and practical tool to automate browser-based components in complex pipelines.

Data engineers can now delegate manual web interactions, such as data verification and scraping, with more confidence. This frees up time for higher-level optimization work. The model’s multiple safety mechanisms allow security professionals to simulate user behavior more safely in audits and incident response drills.

The o3-based Operator is a useful addition to modern technical toolkits, introducing both a framework for risk mitigation and an upgrade of capability across these disciplines.
