Google’s recent announcement of a “new category of agentic experiences” feels like a turning point. At its I/O event in May, Google showed off a digital assistant that did more than just answer questions. It helped with a bicycle repair: finding the user manual, locating YouTube tutorials, and even calling local stores to ask about parts, all without human nudging. These capabilities may soon extend beyond the Google ecosystem. The company has created an open standard called Agent2Agent, or A2A, which allows agents from different companies to communicate and work together.
The vision is exciting: intelligent software agents that act like digital coworkers, booking your flights, rescheduling meetings, filing expenses, and talking to one another to get things done. But if we’re not careful, we could derail the idea before it delivers any real benefit. As with other tech trends, hype can run ahead of reality, and when expectations outpace what the technology can actually do, a backlash follows.
Let’s start with the word “agent.” It’s currently used to describe everything from simple scripts to sophisticated AI workflows. With no standard definition, companies can market basic automation as something far more advanced. This “agentwashing,” a form of misleading customers, invites disappointment. We don’t need a rigid standard, but we do need clearer expectations about what these systems are supposed to do, how autonomously they operate, and how reliably they work.
Reliability is the next major challenge. Most of today’s agents are powered by large language models (LLMs), which generate probabilistic responses. These systems are powerful but also unpredictable: they can make things up, go off course, or fail in subtle ways, especially when asked to complete multistep tasks that involve external tools and chain LLM outputs together. One recent example: an automated support agent told users of Cursor, an AI coding tool, that they could only use the software on one device. Complaints followed, and some users reportedly canceled their subscriptions. It turned out to be a mistake; the policy didn’t exist, and the AI had invented it.

This kind of error can cause serious damage in enterprise settings. We must stop treating LLMs as standalone products and instead build complete systems around them: systems that account for uncertainty, monitor outputs, manage costs, and include guardrails for safety and accuracy. Such measures can help ensure that the output adheres to the user’s requirements, complies with the company’s access policies, respects privacy constraints, and so on. AI21, a company I cofounded and which has received funding from Google, is already moving in this direction. We are wrapping language models in more deliberate, structured architectures. Our latest product, Maestro, is designed for enterprise reliability, combining LLMs with company data, public information, and other tools to ensure dependable outputs.
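The idea of a guardrail layer can be sketched in a few lines. The sketch below is a hypothetical illustration, not AI21’s actual architecture: `call_llm` is a stand-in for a real model call, and the policy check is a deliberately simple example of refusing to ship an answer that asserts a policy the company doesn’t have, as in the Cursor incident.

```python
# Hypothetical guardrail wrapper around an LLM-based support agent.
# call_llm() and the policy list are illustrative stand-ins, not a real API.

FORBIDDEN_CLAIMS = ["one device only"]  # policies the product does NOT have


def call_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned answer here."""
    return "You may install the software on any of your devices."


def guarded_answer(prompt: str, max_retries: int = 2) -> str:
    """Return a model answer only if it passes simple policy checks."""
    for _ in range(max_retries + 1):
        draft = call_llm(prompt)
        # Guardrail: reject drafts asserting policies that don't exist.
        if any(claim in draft.lower() for claim in FORBIDDEN_CLAIMS):
            continue  # retry rather than ship a hallucinated policy
        return draft
    # Escalate to a human rather than guess.
    return "I'm not sure; let me connect you with a support person."


print(guarded_answer("Can I use the app on two machines?"))
```

Real systems would layer many such checks (grounding against documents, access-policy enforcement, cost limits), but the shape is the same: the model proposes, and the surrounding system decides what reaches the user.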
Even the smartest agent is of little use in a vacuum. For the agent model to work, agents must cooperate (booking travel, checking weather, submitting expense reports) without constant supervision. Google’s A2A protocol is meant to enable this: a universal language that lets agents share what they are capable of and divide up tasks. In principle, it’s a good idea.
In practice, though, A2A falls short. It defines how agents communicate with each other, but not what they actually mean. If one agent says it can provide “wind conditions,” another must guess whether that is useful for evaluating the weather along a flight path. Without a shared vocabulary or context, coordination becomes fragile. We have seen this problem before in distributed computing, and solving it at scale is far from trivial.
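The semantic gap is easy to demonstrate. The capability “cards” below are illustrative, not the real A2A schema: a weather agent advertises skills as free-text strings, and a naive keyword match cannot tell that “wind conditions” is exactly what a flight-planning agent needs.

```python
# Hypothetical capability cards in the spirit of A2A-style agent discovery.
# Field names and matching logic are illustrative, not the A2A spec.

weather_agent = {"skills": ["wind conditions", "precipitation forecast"]}
flight_agent_need = "aviation weather along a flight path"


def naive_match(need: str, skills: list) -> list:
    """Keyword overlap says nothing about meaning."""
    return [s for s in skills if s in need]


# The weather agent could help, but string matching can't see that.
print(naive_match(flight_agent_need, weather_agent["skills"]))  # -> []
```

Bridging this gap requires shared ontologies or semantic matching on top of the wire protocol; the protocol alone only guarantees that the messages arrive, not that they are understood.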
Another assumption is that agents are naturally cooperative. Within a single company’s ecosystem, such as Google’s, that may hold, but in the real world agents will represent different vendors, customers, or even competitors. If my travel-planning agent requests price quotes from your airline’s agent, and your agent is incentivized to favor certain airlines, mine may not be able to get me the best or least expensive itinerary. Without some mechanism for aligning incentives, such as contracts, payments, or game-theoretic designs, expecting seamless collaboration may be unrealistic.
None of these issues is insurmountable. Shared semantics can be developed. Protocols can evolve. Agents can learn to negotiate and collaborate in more sophisticated ways. But these problems won’t solve themselves, and if they are ignored, the term “agent,” like other overhyped tech buzzwords before it, could lose its meaning. Some CIOs already roll their eyes when they hear it.
This is a warning. We don’t want the excitement to obscure the pitfalls, leaving users and developers to discover them the hard way. That would be a shame, because the potential is real. But we need to match our ambition with thoughtful design, clear definitions, and realistic expectations. If we do, agents won’t be just another passing trend; they could become the backbone of how we get things done in the digital world.
Yoav Shoham is a professor at Stanford University and a cofounder of AI21 Labs. His 1993 paper on agent-oriented programming won the AI Journal Classic Paper Award. He is a coauthor of Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations, a standard textbook.

