Lessons learned from leading AI agent deployments reveal critical strategies for enterprises

Image credit: Michael O’Donnell Photography



Many companies are rushing AI agents to production, and many will fail. But the failures have nothing to do with their AI models.

On day two of VB Transform, industry leaders shared their hard-won experiences from deploying AI agents at scale. The panel, moderated by Joanne Chen, general partner at Foundation Capital, included Shawn Malhotra, CTO of Rocket Companies, which uses agents to help customers throughout the home ownership process, from mortgage underwriting to customer chat; Shailesh Nalawadi, head of product at Sendbird, which builds agentic customer experiences for companies across various verticals; and Thys Waanders, SVP of AI Transformation at Cognigy, whose platform automates the customer experience for large enterprise contact centers. Their shared discovery: companies that build evaluation and orchestration infrastructure first succeed, while those that rush powerful models to production fail at scale.

>>See our Transform 2025 coverage right here

Understanding the return on investment is a key component of designing AI agents for success.

Early AI agent deployments focused on cost reduction. Enterprise leaders say that while cost reduction remains a major component, they are now seeing more complex ROI patterns that require different technical architectures.

Malhotra shared the most dramatic cost-reduction example, from Rocket Companies: “We had an engineer who, in two days, was able to create a simple agent that could handle a very niche issue called ‘transfer taxes calculations’ in mortgage underwriting.” He said those two days of work saved a million dollars per year in expenses.

Waanders of Cognigy noted that cost per call is a key metric. By using AI agents to automate parts of calls, he said, it is possible to significantly reduce average handling time.

Methods for generating revenue

Saving money is one thing, but generating more revenue is quite another. Malhotra reported his team’s success in increasing conversion rates: Clients are more likely to convert when they get answers to their questions quicker and have a positive experience.

Nalawadi emphasized proactive outreach as a source of new revenue. His team helps companies reach out to customers before they even realize they have an issue. A food delivery example illustrates this perfectly: “They know when an order is going to be late, and instead of waiting for the customer to get upset and call, they realized there was an opportunity to reach out proactively,” he said.

Why AI agents fail in production

Production deployments of AI agents come with challenges. Nalawadi identified a core technical failure: companies build AI agents without evaluation infrastructure.

“Before even starting to build it, you should already have an evaluation infrastructure in place,” Nalawadi said. “We were all software engineers once. Unit tests are always run before a production release. I think of evals as the unit tests for an AI agent system.”

Traditional software testing methods don’t work with AI agents. It is impossible, he said, to enumerate every possible input and write comprehensive test cases. Nalawadi and his team learned this from customer service deployments in retail, food delivery, and financial services, where standard quality assurance approaches missed edge cases that only emerged in production.
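Nalawadi’s “evals as unit tests” analogy can be illustrated with a minimal sketch. The agent function, eval cases, and helper names below are hypothetical stand-ins, not Sendbird’s actual tooling; the point is that each case pairs an input with an automatable check, and the release is gated on the pass rate:

```python
# Minimal sketch of "evals as unit tests" for an AI agent.
# The agent below is a hypothetical stand-in, not a real product.

def agent(query: str) -> str:
    """Stand-in for a real AI agent; returns a canned answer."""
    if "transfer tax" in query.lower():
        return "The transfer tax is calculated as a percentage of the sale price."
    return "I'm not sure; let me route you to a specialist."

# Each eval case pairs an input with a simple, automatable check.
EVAL_CASES = [
    ("How is the transfer tax calculated?", lambda r: "percentage" in r),
    ("What's the weather?", lambda r: "specialist" in r),  # out-of-scope query
]

def run_evals(threshold: float = 1.0) -> float:
    """Run all eval cases and gate the release on the pass rate."""
    passed = sum(check(agent(query)) for query, check in EVAL_CASES)
    rate = passed / len(EVAL_CASES)
    assert rate >= threshold, f"Eval pass rate {rate:.0%} below {threshold:.0%}"
    return rate

rate = run_evals()
print(f"pass rate: {rate:.0%}")  # prints "pass rate: 100%"
```

In practice the checks would be richer (semantic similarity, LLM-graded rubrics) and the case set would grow with every production edge case discovered, but the unit-test discipline is the same: no release until the evals pass.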

AI testing AI: The new quality assurance paradigm

Given this complexity, what should organizations do? Waanders solved the testing problem with simulation. “We have a feature we’re releasing shortly that is about simulating possible conversations,” he explained. “It’s basically AI agents testing AI agents.”

This goes beyond conversation-quality testing to behavioral analysis at scale. How do agents respond to angry customers? Do they support multiple languages? What happens when customers use slang?

“The biggest problem is that you don’t even know what you don’t know,” Waanders said. “How does it respond to anything anyone could think of?” The only way to find out is to simulate thousands of scenarios and push the software to its limits.

This approach tests demographic variations, emotions, and edge cases which human QA teams cannot cover.
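The “AI agents testing AI agents” approach can be sketched roughly as follows. Both functions here are hypothetical stand-ins (not Cognigy’s actual feature): a simulator agent role-plays customer messages across personas, languages, and phrasings, and each generated scenario is replayed against the agent under test:

```python
import itertools

# Hypothetical sketch of simulation-based agent testing: a simulator
# role-plays customers across personas, languages, and phrasings, and
# every combination is replayed against the production agent.

PERSONAS = ["angry customer", "confused first-timer", "power user"]
LANGUAGES = ["English", "Spanish", "German"]
PHRASINGS = ["formal", "slang"]

def simulate_user(persona: str, language: str, phrasing: str) -> str:
    """Stand-in for an LLM that role-plays a customer message."""
    return f"[{language}/{phrasing}] {persona}: my order is late!"

def agent_reply(message: str) -> str:
    """Stand-in for the production agent under test."""
    return "I'm sorry about the delay; here is your updated ETA."

def run_simulations() -> list[tuple[str, str]]:
    """Replay every persona/language/phrasing combination and collect transcripts."""
    transcripts = []
    for persona, language, phrasing in itertools.product(PERSONAS, LANGUAGES, PHRASINGS):
        user_msg = simulate_user(persona, language, phrasing)
        transcripts.append((user_msg, agent_reply(user_msg)))
    return transcripts

transcripts = run_simulations()
print(len(transcripts))  # 3 personas x 3 languages x 2 phrasings = 18 scenarios
```

In a real system the simulator and the agent would both be LLM-driven, the transcripts would be scored automatically, and the scenario grid would run to thousands of combinations; the combinatorial structure is what lets simulation cover variations a human QA team cannot.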

The coming explosion of complexity

Currently, AI agents mostly handle single tasks. Enterprise leaders must prepare for a new reality: hundreds of agents in each organization, learning from one another. The infrastructure implications are huge. Failure modes multiply exponentially when agents share data and collaborate, and traditional monitoring systems cannot track these interactions.

Companies need to architect for this complexity right now. Retrofitting infrastructure to support multi-agent systems is more expensive than building it from scratch.

“If you fast-forward to what is theoretically possible, it could be hundreds in an organization, and they may be learning from each other,” Chen said. “The number of possible outcomes explodes. The complexity explodes.”
