After weeks of buzz, OpenAI has released Operator, its AI agent. Operator is a web app that can carry out simple online tasks in a browser, such as booking concert tickets or placing an online grocery order. The app is powered by a new model called Computer-Using Agent (CUA, or “coo-ah”), built on top of GPT-4o, OpenAI’s large multimodal language model.
Operator is available now to users in the US who subscribe to ChatGPT Pro, OpenAI’s $200-a-month service. The company says it plans to make the tool available to other users at a later date.
OpenAI says that Operator outperforms similar tools, such as Anthropic’s Computer Use (a version of Claude 3.5 Sonnet that can carry out simple tasks on a computer) and Google DeepMind’s Mariner (a web-browsing agent built on Gemini 2.0).
The fact that three of the world’s top AI companies have converged on the same vision of what agent-based models could be makes one thing clear: the battle for AI dominance has a new front, and it’s our computer screens.
Ali Farhadi, CEO of the Allen Institute for AI, says that moving from generating text and images to actually doing things is the right direction. It unlocks business opportunities and solves new problems, he says.
Farhadi sees doing things on a computer screen as the natural first step for agents. “It’s constrained enough to actually work,” he says. AI2 is developing its own computer-using agents, he adds.
Don’t believe the hype
OpenAI’s announcement confirms one of two rumors that circulated on the internet this week. One predicted that OpenAI would reveal an agent-based app, after details of Operator were leaked on social media ahead of its release. The other predicted that OpenAI would soon reveal a new superintelligence, and that officials for newly inaugurated President Trump would be briefed on it. Could the two rumors be connected? OpenAI superfans were curious to find out.
Nope. OpenAI gave MIT Technology Review a preview of Operator yesterday. The tool offers a glimpse of how large language models might do more than just answer questions. But Operator is still experimental. “It’s early, and it still makes mistakes,” says Yash Kumar, a researcher at OpenAI.
As for the wild superintelligence rumors, let’s leave those to OpenAI CEO Sam Altman. “twitter hype is out of control again,” he wrote in a post on January 20. “pls chill and cut your expectations 100x!”
Like Anthropic’s Computer Use and Google DeepMind’s Mariner, Operator scans the pixels on a computer screen to work out what actions to take. CUA, the model behind it, has been trained to interact with the same graphical user interfaces (buttons, text fields, menus) that people use online. It looks at the screen, takes an action, looks again, takes another action, and so on. This lets the model carry out tasks on any website that a human can use.
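The look-act-look cycle described above can be sketched as a simple loop. This is a toy illustration under loud assumptions, not OpenAI’s implementation: `Action`, `FakeScreen`, `toy_policy`, and `run_agent` are hypothetical names, the “screenshot” is a plain dictionary rather than pixels, and a few hard-coded rules stand in for the CUA model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    kind: str         # hypothetical action types: "type", "click", or "done"
    target: str = ""  # label of the GUI element to act on
    text: str = ""    # text to enter for "type" actions


@dataclass
class FakeScreen:
    """Toy stand-in for a web page. A real agent sees pixels; this one exposes a dict."""
    fields: dict = field(default_factory=dict)
    submitted: bool = False

    def screenshot(self) -> dict:
        # Snapshot of the current GUI state (a copy, so later actions don't alter it).
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        # Change the GUI state the way a keystroke or click would.
        if action.kind == "type":
            self.fields[action.target] = action.text
        elif action.kind == "click" and action.target == "Submit":
            self.submitted = True


def toy_policy(observation: dict) -> Action:
    """Hard-coded rules standing in for the model: fill each field, submit, then stop."""
    if "Party size" not in observation["fields"]:
        return Action("type", "Party size", "2")
    if "Time" not in observation["fields"]:
        return Action("type", "Time", "6:30 pm")
    if not observation["submitted"]:
        return Action("click", "Submit")
    return Action("done")


def run_agent(screen: FakeScreen, policy: Callable[[dict], Action], max_steps: int = 10) -> List[Action]:
    """Look at the screen, pick one action, perform it, and repeat until done."""
    history: List[Action] = []
    for _ in range(max_steps):  # cap the loop so a confused policy cannot run forever
        observation = screen.screenshot()  # 1. look
        action = policy(observation)       # 2. decide
        history.append(action)
        if action.kind == "done":
            break
        screen.apply(action)               # 3. act, then loop back and look again
    return history


screen = FakeScreen()
steps = run_agent(screen, toy_policy)
print([a.kind for a in steps])  # ['type', 'type', 'click', 'done']
```

Because the policy sees a fresh observation on every pass, it never needs to know in advance how the page will respond; it simply reacts to whatever is on screen next, which is the property that lets a GUI agent work on sites without an API.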
Reiichiro Nakano, a scientist at OpenAI, says that models have traditionally used software through specialized APIs. (An API is a piece of code that acts as a connector, letting different pieces of software talk to one another.) This puts many apps and websites out of reach, he says. “But if we create a model which can use the same user interface that humans use every day, it opens up an entirely new range of software previously inaccessible.” OpenAI says CUA was trained using techniques similar to those used for its reasoning models o1 and o3.
OpenAI tested CUA using a variety of industry benchmarks to determine the agent’s ability to perform tasks on a computer. The company claims its model beats Computer Use and Mariner on all of them.
On OSWorld, a benchmark that measures how well an agent can perform tasks such as merging PDF files or manipulating an image, CUA scored 38.1%, compared with 22.0% for Computer Use. Humans scored 72.4%. On WebVoyager, a benchmark that tests how an agent performs in a browser, CUA scored 87%, Mariner 83.5%, and Computer Use 56%. (Mariner can only carry out tasks in a browser, so it does not have an OSWorld score.)
For now, Operator can only carry out tasks in a web browser. OpenAI plans to make CUA’s wider capabilities available in the future through an API that developers can use to build their own apps. Anthropic took the same approach with Computer Use, which it released to developers in October 2024.
OpenAI says it has tested CUA’s safety using red teams, which explored what happens when users ask the model to do unacceptable tasks (such as researching how to make bioweapons), when websites contain hidden instructions designed to derail it, and when the model itself breaks down. According to Casey Chu, a researcher on the team, the model has been trained to stop and ask the user for information before doing anything with external side effects.
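Chu describes a behavior the model was trained to exhibit. The sketch below swaps that learned behavior for an explicit, hard-coded rule, purely to illustrate the idea of pausing before side effects; it is not how OpenAI implements it. The names `SIDE_EFFECT_KINDS`, `Action`, and `execute_with_confirmation`, and the particular action kinds, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical action kinds treated as having external side effects.
SIDE_EFFECT_KINDS = {"purchase", "submit_order", "send_message"}


@dataclass
class Action:
    kind: str
    description: str


def execute_with_confirmation(
    actions: List[Action],
    confirm: Callable[[Action], bool],
    perform: Callable[[Action], None],
) -> List[str]:
    """Run actions in order, asking the user before any step with external side effects.

    If the user declines, stop before that step and skip everything after it.
    Returns the kinds of the actions that were actually performed.
    """
    performed: List[str] = []
    for action in actions:
        if action.kind in SIDE_EFFECT_KINDS and not confirm(action):
            break  # user said no: halt before the irreversible step
        perform(action)
        performed.append(action.kind)
    return performed


plan = [
    Action("search", "Find concert tickets"),
    Action("select", "Pick four seats"),
    Action("purchase", "Buy the tickets"),
]
log: List[str] = []
result = execute_with_confirmation(plan, confirm=lambda a: False, perform=lambda a: log.append(a.description))
print(result)  # ['search', 'select']
```

Here the user declines, so the purchase never runs. The rule halts the whole plan rather than skipping only the declined step, on the assumption that later steps may depend on the one the user refused.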
Look! No hands
To use Operator, you type instructions into a text box. Instead of launching a browser on your own computer, Operator sends your instructions to a remote web browser running on an OpenAI server. OpenAI claims this makes the system more effective. It is another important difference between Operator and both Computer Use and Mariner (which runs inside Google Chrome on your own computer).
Because it runs in the cloud, Operator can carry out multiple tasks at once, says Kumar. In the live demonstration, he asked Operator to book a table for two at 6.30 p.m. at a restaurant called Octavia in San Francisco. Operator immediately opened OpenTable and began clicking through the options. “As you can tell, my hands are not on the keyboard,” he said.
OpenAI has partnered with a number of businesses, including OpenTable, StubHub, Instacart, DoorDash, and Uber. Operator appears to suggest preset websites for certain tasks.
As the tool navigated OpenTable’s dropdown menus, Kumar asked Operator to find four tickets for a Kendrick Lamar show on StubHub. While it did that, he pasted in a picture of a shopping list written on a piece of paper and asked Operator to add the items to his Instacart order.
While waiting, he flipped between Operator’s tabs. “If it wants help or confirmation, it will ask you questions and you’ll be able to answer them,” he said.
Kumar has been using Operator at home, where it helps him keep on top of his grocery shopping. “I can quickly click a picture of a list and then send it to Operator,” he says. It has also become a helper in his personal life. “I have a night out every Thursday,” Kumar says. Every Thursday morning, he asks Operator to send him a list of restaurants with a table for two that evening. “I could do that but it would take me 10 minutes,” he says. “I often forget to do this. Operator allows me to complete the task in just one click. Booking is not a burden.”