Image Credit: VentureBeat via ChatGPT
New AI-powered browser use agents are emerging and promising to transform the way enterprises interact with websites. These agents are able to autonomously navigate websites, retrieve data, and complete transactions. However, early testing has revealed significant gaps between performance and promise.
Consumer examples like ordering pizza or buying game tickets have made headlines, but it is important to understand where the developer and enterprise use cases lie. “We don’t know what will be a killer app,” said Sam Witteveen, co-founder of Red Dragon, a company that develops AI agent apps. “My guess is that it will be things that take up time on the internet that you don’t enjoy.” This includes things like searching the web for the best price on a product or booking the most suitable hotel accommodations. More likely, the agents will be used in conjunction with other tools, such as Deep Research, allowing companies to do even more sophisticated research and execute tasks on the web.
Companies must carefully evaluate the rapidly changing landscape as established players and startups take different approaches to solving autonomous browsing challenges.
Key players in the browser use agent landscape
This field is crowded, with both large tech companies and innovative startups.
- OpenAI’s Operator (launched in January 2025) – Available to ChatGPT Pro subscribers ($200/month), focusing primarily on consumer-friendly web automation
- Convergence’s Proxy (launched in December 2024) – UK startup offering limited access (5 sessions/day for free) or unlimited access at $20/month
- Google’s Project Mariner (announced December 2024)
- Anthropic’s Computer Use (released October 2024) – A new update is expected soon
- Microsoft’s OmniParser V2 (February 2025) – A project that converts UI screenshots into structured information, allowing LLMs to interpret and interact with websites
- ByteDance’s UI-TARS – Requires deeper access to the system, raising security concerns
- Browser Use – A developer tool that allows a choice of AI models, including Google’s Gemini Flash
Operator and Proxy are the most consumer-friendly and ready to use out of the box. Many of the other products seem geared more toward developers or enterprise use. For example, Browser Use, a Y Combinator startup, lets the user choose which model drives the agent. This gives you greater control over how the agent works, including the ability to use a model running on your local machine, but it is also more complicated.
Each of the others listed above provides a different level of functionality and interaction with machine resources. I decided not to test ByteDance’s UI-TARS at this time, as it requested lower-level access to my machine’s security and privacy features. (If I do test it, I will definitely use a second computer.)
Testing reveals challenges in reasoning
The easiest to test are OpenAI Operator and Convergence Proxy. Our testing revealed that reasoning capabilities can be more important than raw automation features. Operator was particularly buggy.
I asked both agents to find the five most popular VentureBeat stories and summarize them. VentureBeat does not have a “most popular” section as such, and Operator struggled with the task. It fell into an endless scrolling loop while searching for “most popular stories,” which required manual intervention. It also turned up a three-year-old article titled “Top five stories for the week.” Proxy, by contrast, demonstrated better reasoning: it identified the five stories most visible on the homepage as a practical proxy for popularity and gave accurate summaries.
This distinction became more apparent on real-world tasks. I asked the agents to make a reservation at a romantic restaurant in Napa, California, at noon. Operator approached the task in a linear manner, first finding a romantic place and then checking availability for noon. It reached a dead end when no tables were available. Proxy took a more sophisticated approach, using OpenTable to locate restaurants that were both romantic and available at the desired time. It even came back with a slightly higher-rated restaurant.
Even seemingly straightforward tasks revealed significant differences. When searching Amazon for the price of a YubiKey 5C NFC, Proxy found it more quickly than Operator.
OpenAI has not revealed much about the technology behind its Operator agent, saying only that it trained its model for browser-use tasks. Convergence has provided more details: its agent uses what it calls Generative Tree Search, which “leverages Web-World Models to predict the state the web will be in after a proposed action is taken. These are generated recursively, resulting in a tree of futures which is then sorted by our value models to determine the next best action. Our Web-World models are also used to train agents without having to generate a lot of expensive data.”
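The look-ahead idea in that description can be illustrated with a toy example. The sketch below is not Convergence’s implementation: the `world_model` and `value_model` functions, the page names and the action names are all hypothetical stand-ins, and a hard-coded lookup table takes the place of any learned model. It only shows the general shape of predicting future states recursively and ranking actions by the best future each one opens up.

```python
from typing import Dict

# Hypothetical toy "web": each page maps an action to the page it leads to.
TOY_SITE: Dict[str, Dict[str, str]] = {
    "home": {"click_search": "search", "click_deals": "deals"},
    "search": {"type_query": "results"},
    "deals": {"click_banner": "ad_page"},
    "results": {"open_product": "product"},
    "ad_page": {},
    "product": {},
}

# Hypothetical scores for how close each page is to the goal (the product page).
PAGE_SCORES: Dict[str, float] = {
    "product": 1.0, "results": 0.6, "search": 0.3,
    "deals": 0.2, "home": 0.1, "ad_page": 0.0,
}

def world_model(state: str, action: str) -> str:
    """Predict the page reached by taking `action` on `state` (stand-in for a learned model)."""
    return TOY_SITE[state][action]

def value_model(state: str) -> float:
    """Score how promising a predicted page is for the goal."""
    return PAGE_SCORES[state]

def best_future_value(state: str, depth: int) -> float:
    """Recursively expand predicted futures; return the best score reachable within `depth` steps."""
    actions = TOY_SITE[state]
    if depth == 0 or not actions:
        return value_model(state)
    best_child = max(best_future_value(world_model(state, a), depth - 1) for a in actions)
    return max(value_model(state), best_child)

def choose_action(state: str, depth: int = 3) -> str:
    """Rank the available actions by the best future each leads to and return the top one."""
    actions = list(TOY_SITE[state])
    if not actions:
        raise ValueError(f"no actions available on page {state!r}")
    return max(actions, key=lambda a: best_future_value(world_model(state, a), depth - 1))

print(choose_action("home"))
```

In this toy setup, the search path wins from the home page because it leads to the product page within the look-ahead depth, while the deals path ends at an ad page.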
Benchmarks are not useful for the moment
On paper, these tools look similar. Convergence’s Proxy achieves 88% on the WebVoyager benchmark, which evaluates web agents on 643 real-world tasks across 15 popular websites such as Booking.com and Amazon. OpenAI’s Operator scores 87%, while Browser Use says it reaches 89%, but only after making a slight change to the WebVoyager codebase “according to our needs,” it admitted.
These scores should not be taken too seriously, however, as they can be gamed. The real test is how the tools perform on real-world use cases. It is still very early in the game, and products are changing almost daily. Results will depend on what you are trying to accomplish, and you may also want to rely on your gut feeling when trying different products.
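To put the headline percentages in perspective, it can help to convert them into task counts. This is a minimal sketch using the 643-task figure and two of the reported scores. The helper name `tasks_passed` is only for illustration, and rounding to whole tasks is an assumption, since only percentages are reported.

```python
TOTAL_TASKS = 643  # WebVoyager task count cited in the article

def tasks_passed(success_rate_pct: float, total: int = TOTAL_TASKS) -> int:
    """Approximate number of tasks completed for a reported success rate, rounded to whole tasks."""
    return round(total * success_rate_pct / 100)

operator = tasks_passed(87)     # OpenAI Operator's reported score
browser_use = tasks_passed(89)  # Browser Use's self-reported score
gap = browser_use - operator

print(operator, browser_use, gap)
```

Under these assumptions, a two-point difference amounts to roughly 13 tasks out of 643, a margin that small changes to the evaluation setup could plausibly erase.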
Enterprise automation implications
The implications for enterprise automation are significant, Witteveen explains in our video podcast conversation on the topic, where we dive deep into the browser-use trend. Many companies pay for virtual assistants, operated by real people, to handle basic web-research and data-gathering tasks. These browser-use agents may change the equation.
“If AI takes over this,” Witteveen says, “that’s what will be the first low-hanging fruit of people losing their jobs. It will show up in these types of things.”
This could also feed into the trend of robotic process automation (RPA), where companies use browsers to automate tasks. As mentioned earlier, the most powerful use cases will come when agents combine browser use with other tools. Deep Research, for example, is an LLM-driven agent that uses a search tool; combined with browser use, it could perform more sophisticated tasks.
The cost dynamics are driving innovation
The availability of powerful open-source reasoning models like DeepSeek R1 is another key factor driving rapid development. These models allow companies building browser agents to compete with larger players more effectively than if they had to build their own models.
Pricing pressure is already apparent. OpenAI requires a $200-per-month ChatGPT Pro subscription to access Operator. Convergence, on the other hand, offers a limited free plan (up to 5 uses per day) and a $20/month unlimited package. This competitive dynamic will accelerate enterprise adoption even though use cases are not yet clear.
Security and integration challenges
There are still a few hurdles to overcome before enterprise adoption becomes widespread. Some websites actively block automated browsing, while others require CAPTCHA validation. Both OpenAI’s and Convergence’s tools can get past CAPTCHAs, but they do so by handing control to the user to fill them out rather than solving them directly, since the whole purpose of a CAPTCHA is to verify that a human is on the other end. Some tools, such as ByteDance’s UI-TARS, request deep system access, which raises security concerns for enterprise deployment.
The approach to website collaboration also varies. OpenAI has worked specifically with partners such as Instacart and Priceline, while others attempt to navigate any website. This inconsistency may affect enterprise use cases. And if an agent is prompted for login information, the process slows down, because you will have to enter it yourself.
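The hand-off pattern described here, where the agent pauses and lets a person complete the CAPTCHA, could be structured roughly as follows. This is a sketch under stated assumptions: the step dictionaries, the `needs_human` flag and the `ask_human` callback are hypothetical, and nothing here reflects how Operator or Proxy are actually built.

```python
from typing import Callable, Dict, List

Step = Dict[str, object]

def run_with_handoff(steps: List[Step], ask_human: Callable[[str], bool]) -> List[str]:
    """Run scripted steps, pausing for a person whenever a step is flagged as needing one."""
    log: List[str] = []
    for step in steps:
        name = str(step["name"])
        if step.get("needs_human", False):
            # The agent does not attempt the challenge itself; it waits for the user.
            if not ask_human(name):
                log.append(f"aborted:{name}")
                break
            log.append(f"human:{name}")
        else:
            log.append(f"agent:{name}")
    return log

# Hypothetical task: the CAPTCHA step is flagged for the user to complete.
steps: List[Step] = [
    {"name": "open_site"},
    {"name": "solve_captcha", "needs_human": True},
    {"name": "submit_form"},
]

print(run_with_handoff(steps, ask_human=lambda name: True))
```

The same flag could mark a login prompt, which is one way to see why such steps slow an otherwise automated run: the loop cannot continue until the person responds.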
Looking ahead
Enterprises evaluating these tools should focus on specific use cases that could provide clear value, whether in research, customer support, or process automation. The technology is advancing rapidly, but the success of this new technology will depend on how well it matches capabilities with concrete business needs.
As the market evolves, we can expect to see more enterprise features and possibly specialized agents for specific tasks or industries. The race between established companies and innovative startups will drive both technical advancements and competitive pricing. 2025 will be a critical year for enterprise browser use agent adoption.
For more on the testing results and these trends, check out the full video conversation between Sam Witteveen and myself.