
Former Google DeepMind engineer who developed Simular says that other AI agents are not doing it right

When Ang Li, co-founder of agent software biz Simular, started working at Google DeepMind, software engineers were sceptical about the usefulness and effectiveness of machine learning, or artificial intelligence as it is now called. Li told The Register that the production team would often say “machine learning never works in production.”

“That is kind of interesting because we have lots of papers also hyping AI,” he said of his time there between 2017 and 2019.

Li explained that at one point the Google Ads team asked DeepMind to use AlphaGo, the system that conquered the game of Go, to improve Google’s advertising revenue.

“I think some people tried it, but it actually dropped the revenue,” Li said. “That’s the funny part because the real world system is very complex.”

Machine learning methods, Li explained, are based on statistics and assume a static dataset. “In the real world, for example, on YouTube, you have videos being uploaded every day. In ads, you have search queries coming every day. And this distribution of data keeps changing. That’s actually the core reason why machine learning doesn’t work in production.”

All of this was before OpenAI released ChatGPT in November 2022. Nearly three years into the generative AI hype cycle, and despite billions of dollars in capital expenditure from dazzled investors, machine learning still struggles in production.

We noted last month that AI agents, AI models that use tools in a continuous loop, only complete office tasks about 30 percent of the time.

Success rates vary depending on the benchmark and when it is measured. The OSWorld benchmark, which measures how well agent software can perform real-world computer tasks, was established in April 2024. Benchmark tasks include directives such as: “Please update my bookkeeping sheet with the recent transactions from the provided folder, detailing my expenses over the past few days.”

The top performing AI agent at the time was GPT-4 (with Vision), which managed an overall success rate of 12.24 percent.

A week ago, the best performing AI agent was the GUI Test-time scaling Agent (GTA1) which, when paired with OpenAI’s o3 model, scored a task success rate of 45.2 percent on the OSWorld benchmark. GTA1 is the result of research from the Australian National University and the University of Hong Kong.

This is a significant improvement over last year’s state of the art, but the best agent still fails to complete office automation tasks at least half the time. Human workers can achieve a task completion rate of 72.36 percent.

When Li co-founded Simular in 2023 with Jiachen Yang, he said he told people that the company was building agents. People didn’t get it and tried to convince him to call them assistants. Now, everyone is building agents.

Simular’s S2 agent framework is currently ranked fourth on OSWorld, and sixth on the AndroidWorld benchmark, reflecting the company’s vision for autonomous computing.

“Basically for now we need to carry computers every day with us, but in the future we don’t have to,” Li said. The agent would also know the user’s preferences and habits, which are stored locally on the user’s computer. “This is the vision we’re pushing for.”

Simular Pro is a $500/month computer agent for macOS on Apple silicon that automates desktop tasks. Li expects the price to deter casual use; it is aimed instead at industries such as insurance and healthcare, which involve a lot of repetitive, form-filling computer tasks.

“Insurance, healthcare, finance, they have no API for developers or business to automate their workflow. They are pretty painful. They have to hire people around the world to sit in on the computers. They say if you can automate this, it’s going to be a huge productivity boost for them. Most of the customers are actually in these categories.”

To attract organizational interest in this type of office task automation, agents will likely need to get things right about as often as human workers do. Li, however, believes the industry is headed in the wrong direction.

“We believe everyone else is doing the wrong thing,” he said. “It’s not really the wrong thing. It’s like they are not going in the right direction. Everyone says agents are based on LLMs. We believe this type of technology is only one part of the reinforcement learning framework.”

Li makes a distinction between exploitation, which is executing a known solution without considering other options, and exploration, where an LLM tries out different possible paths to find a resolution.

He said that other companies are too focused on exploration and don’t spend sufficient time on exploitation. Simular’s S2 framework uses the LLM to explore, but when it finds a solution it converts it into symbolic code, similar to JavaScript, so that tasks can run predictably and programmatically.
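The explore-then-exploit pattern Li describes can be sketched in a few lines of Python. This is a toy illustration, not Simular's actual implementation: the "LLM" is a stub that proposes candidate action sequences, and the environment check, action names, and caching scheme are all invented for the example.

```python
import random

def llm_propose_actions(task, rng):
    """Stub standing in for an LLM proposing an action sequence (exploration)."""
    verbs = ["open_sheet", "read_folder", "append_rows", "save"]
    return rng.sample(verbs, k=len(verbs))  # random permutation of the steps

def environment_accepts(actions):
    """Stub success check: the environment requires this exact order."""
    return actions == ["open_sheet", "read_folder", "append_rows", "save"]

class Agent:
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.script_cache = {}  # task -> proven action sequence ("symbolic" program)

    def run(self, task):
        # Exploitation: replay a proven script deterministically, no model call.
        if task in self.script_cache:
            return self.script_cache[task]
        # Exploration: let the LLM try different paths until one succeeds.
        for _ in range(1000):
            actions = llm_propose_actions(task, self.rng)
            if environment_accepts(actions):
                self.script_cache[task] = actions  # freeze the solution
                return actions
        raise RuntimeError("no successful plan found")

agent = Agent()
plan = agent.run("update bookkeeping sheet")  # explores, then caches
replay = agent.run("update bookkeeping sheet")  # replays the cached script
```

The design point is the second call: once a working sequence is found, it runs without the stochastic model in the loop, which is what makes the behaviour predictable and repeatable.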

Li views Simular more as a technical infrastructure provider than a maker of agent products. As he describes it, the goal is to create a neuro-symbolic continuous reinforcement learning framework for building agents.

He said that continuous learning is one of the most difficult problems for AI researchers. If you keep training a neural network on new data, “it will gradually, catastrophically forget what you learned ten days ago,” he explained. Then there’s cost: it becomes prohibitively expensive to keep adding knowledge to static models by retraining them.
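The forgetting effect Li mentions can be demonstrated with a deliberately tiny example: a one-parameter linear model trained by gradient descent on one task, then on a second, conflicting task. The tasks and numbers are toy values invented for the demo, not anything from Simular or DeepMind.

```python
# Minimal illustration of catastrophic forgetting: a one-parameter model
# y = w * x trained with gradient descent on squared error.

def train(w, data, lr=0.1, steps=200):
    for _ in range(steps):
        for x, y in data:
            pred = w * x
            w -= lr * 2 * (pred - y) * x  # gradient step on (pred - y)^2
    return w

def loss(w, data):
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

task_a = [(1.0, 2.0), (2.0, 4.0)]    # consistent with w = 2
task_b = [(1.0, -2.0), (2.0, -4.0)]  # consistent with w = -2

w = train(0.0, task_a)          # learn task A: w converges near 2
loss_a_before = loss(w, task_a)  # essentially zero
w = train(w, task_b)            # continue training on task B only
loss_a_after = loss(w, task_a)  # task A performance collapses

print(loss_a_before, loss_a_after)
```

Because the single weight is overwritten to fit task B, nothing of task A survives; larger networks forget more gradually, but the mechanism, shared parameters being pulled toward the newest data distribution, is the same one Li describes.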

Li says that in order to reach what the industry calls AGI, or artificial general intelligence, the point at which AI models can perform most tasks as well as humans, the future will require continuous learning. ®

www.aiobserver.co
