OpenAI’s agent tool may be nearing release

OpenAI’s agent software may be close to release

OpenAI is close to releasing a tool that will take control of your computer and perform actions for you.

Tibor Bláho is a software developer with a reputation of accurately leaking upcoming AI product information. Publications claim to have found evidence of OpenAI’s long-rumored Operator Tool. Publications Bloomberg has previously published Report on Operator which is said to a be an “agenttic” system capable autonomously handling tasks such as writing code and booking trips.

According to of The Information, OpenAI has set January as the month for Operator’s release. Blaho’s code uncovered this weekend lends credence to the report. Blaho reports that OpenAI’s ChatGPT for macOS now has options to define shortcuts for “Toggle Operator” or “Force Quit Operator”. These are currently hidden. Blaho reported that OpenAI had added references to Operator to its website. However, these references are not yet publicly visible.

Confirmed: the ChatGPT macOS desktop application has hidden options for defining shortcuts to the desktop launcher that “Toggle Operator”and “Force Quit Operator”. https://t.co/rSFobi4iPN pic.twitter.com/j19YSlexAS

— Tibor Blaho (@btibor91)””https://twitter.com/btibor91/status/1881110210867290191?ref_src=twsrc%5Etfw”” rel=””nofollow” “> January 19, 2025

Blaho reports that OpenAI’s website also contains tables not yet public, comparing the performance and capabilities of Operator with other computer-based AI systems. These tables could be placeholders. If the numbers are accurate, it suggests that Operator may not be 100% reliable depending on the task.

OpenAI’s website already contains references to Operator/OpenAI CUA – “Operator System Card Table”“Operator Research Eval Table” and “Operator Refusal Ratio Table”.

Includes comparison to Claude 3. 5 Sonnet Computer Use, Google Mariner etc.

Preview of tables… pic.twitter.com/OOBgC3ddkU

— Tibor Blaho (@btibor91)””https://twitter.com/btibor91/status/1881285255266750564?ref_src=twsrc%5Etfw”” rel=””nofollow” “> January 20, 2025

“OpenAI Computer Use Agent” — perhaps the AI model that powers Operator — scored 38.1% on OSWorld, an OSWorld benchmark that attempts to mimic a realistic computer environment. This score is ahead of Anthropic’s computer-controlling models but far behind the 72.4% humans scored. OpenAI CUA outperforms humans on WebVoyager which measures an AI’s ability navigate and interact with website. According to leaked benchmarks, the model does not achieve human-level scores in another web-based benchmark called WebArena. According to the leaked benchmarks, Operator also struggles to perform tasks that a human would be able to do easily. Operator only succeeded 60% of the time in a test where he was asked to sign up with a cloud service provider and launch a virtual machine. Operator only succeeded 10% of the times when asked to create a Bitcoin wallet.

OpenAI has been contacted for comment. We will update this article if we receive a response. OpenAI’s entry into the AI agent market comes as competitors such as Anthropic, Google and others are making moves for the nascent sector. AI agents may be They are speculative and riskybut tech giants have already hailed them as the next big thing for AI. According to the analytics firm Markets and Markets, the market for AI agents may be worth $47.1 billion in 2030.

Agents are primitive today. Some experts have expressed concerns about their safety if technology improves rapidly.

According to one of the leaked charts, Operator performed well in selected safety evaluations. This included tests that tried to get the system perform “illicit activity” and search for sensitive personal data. According tosafety testing is one of the reasons for Operator’s long development cycle. In a recent X Wojciech Zaremba, co-founder of OpenAI, criticized Anthropic in a postfor releasing a software agent that he claimed lacked safety mitigations. Zaremba wrote: “I can only imagine what the negative reactions would be if OpenAI released a similar agent.”

It is worth noting that OpenAI was criticized by AI experts, including former staff, for allegedly reducing the emphasis on safety work to quickly productize its technology.

Kyle Wiggers, a senior reporter for TechCrunch, has a special interest on artificial intelligence. His writings have appeared in VentureBeat, Digital Trends and a variety of gadget blogs, including Android Police and Android Authority, Droid-Life and XDA-Developers. He lives in Brooklyn, with his partner who is a piano teacher, and plays the piano occasionally. Sometimes — but mostly unsuccessfully.

View Bio

www.aiobserver.co

More from this stream

Recomended