Salesforce takes aim at ‘jagged intelligence’ in its push for more reliable AI

Credit: VentureBeat, made with Midjourney

Salesforce is taking on one of the most persistent challenges in artificial intelligence for business applications: the gap between an AI system’s raw intelligence and its ability to perform consistently in unpredictable enterprise environments, which the company calls “jagged intelligence.”

In a research announcement today, Salesforce AI Research revealed several new benchmarks and frameworks intended to make future AI agents more intelligent and trustworthy for enterprise use. The innovations are designed to improve both the capabilities and the consistency of AI systems, especially when they are deployed as autonomous agents within complex business environments.

“While LLMs excel at standardized testing, planning intricate trips, and generating sophisticated poetry, their brilliance often stumbles when faced with the need for reliable and consistent task execution in dynamic and unpredictable enterprise environments,” Silvio Savarese said during a press conference preceding the announcement.

This initiative represents Salesforce’s push towards what Savarese refers to as “Enterprise General Intelligence” (EGI): AI designed specifically for business complexity, rather than the more hypothetical pursuit of Artificial General Intelligence (AGI).

“We define EGI as AI agents that are purpose-built for business and optimized not only for capability but also for consistency,” Savarese explained. While AGI may conjure up images of superintelligent machines surpassing human intelligence, businesses are not waiting for that distant, illusory future. They are applying these foundational concepts now to solve real-world problems at scale.

How Salesforce is measuring and fixing AI’s inconsistent performance in enterprise settings

The research focuses on quantifying and addressing AI inconsistency. Salesforce introduced the SIMPLE dataset, a public benchmark of 225 simple reasoning questions designed to measure how uneven an AI system’s capabilities are.

“Today’s AI has a lot of flaws, so we have to fix that. How can we improve something without first measuring it? That is exactly what this SIMPLE benchmark is,” Shelby Heinecke, Senior Director of Research at Salesforce, said during the press conference.

For enterprise applications, this inconsistency is not merely an academic concern. A single mistake by an AI agent could disrupt operations, undermine customer trust or cause substantial financial damage.

Savarese wrote in his commentary that AI is not a hobby for businesses; it’s an essential tool that requires predictability.

Inside CRMArena, Salesforce’s virtual testing grounds for enterprise AI agents

Perhaps the most significant innovation is CRMArena, a benchmarking framework that simulates realistic customer relationship management scenarios. It allows comprehensive testing of AI agents in professional contexts, bridging the gap between academic benchmarks and real-world business needs.

“Recognizing that current AI models often fall short in reflecting the complex demands of enterprise environments, we have introduced CRMArena, a novel benchmarking framework meticulously designed to simulate realistic, professionally grounded scenarios,” Savarese stated. The framework measures agent performance across three personas: service agents, analysts and managers. Early testing revealed that, even with guided prompting, leading agents succeeded at function calling for these personas’ use cases less than 65% of the time.

Savarese explained that CRMArena is a tool used internally to improve agents. “It allows us to stress-test these agents, understand where they are failing, and then use the lessons we learn from these failure cases to improve agents,” he said.

New embedding models that understand enterprise context better than ever before

Among the technical innovations Salesforce announced is SFR-Embedding, a new model for deeper contextual understanding. It leads the Massive Text Embedding Benchmark (MTEB) across 56 datasets.

SFR-Embedding is more than just research: Heinecke said it will launch in Data Cloud very soon.

A specialized version, SFR-Embedding Code, was also introduced for developers, enabling high-quality code search and streamlining development. Salesforce claims that the 7B parameter version leads the Code Information Retrieval benchmark, while smaller models (400M and 2B) offer cost-effective and efficient alternatives.

Why smaller, action-focused AI models may outperform large language models for business tasks

Salesforce has also announced xLAM (Large Action Model), a family of models designed to predict actions rather than just generate text. These models start at 1 billion parameters, a fraction of the size of many leading language models.

“What’s unique about our xLAM model is that if we look at the sizes of our models, we have a 1B, all the way up until a 70B,” Heinecke explained, adding that the 1B model is a fraction of the size of many of today’s large language models. Despite its small size, she said, the model is highly capable at executing the next step in a task sequence.

Unlike standard language models, these action models are specifically trained to anticipate and execute the next steps in a task sequence, making them especially valuable for autonomous agents that need to interact with enterprise systems.

“Large action models are LLMs under the hood. We build them by taking an LLM and fine-tuning it on what we call action trajectories,” Heinecke said.

Salesforce’s Trust Layer: How it establishes guardrails for business use

To address enterprise concerns about AI safety and reliability, Salesforce introduced SFR-Guard, a family of models trained on both publicly available data and CRM-specific internal data. These models reinforce the company’s Trust Layer, which provides guardrails for AI agent behavior. The company announced that “Agentforce guardrails set clear boundaries for agent behaviour based on business requirements, policies, and standards, ensuring agents behave within predefined limitations.”

The company also launched ContextualJudgeBench, a benchmark for evaluating LLM-based judge models in context. It tests over 2,000 challenging response pairs for accuracy, conciseness, faithfulness and appropriate refusal to answer.

Looking beyond text, Salesforce unveiled TACO, a multimodal family of action models designed to tackle complex, multi-step problems through chains of thought and action (CoTA). This approach allows AI to interpret and answer complex queries involving multiple media types. Salesforce claims an improvement of up to 20% on the challenging MMVet benchmark.

Co-innovation in action: How customer feedback shapes Salesforce’s enterprise AI roadmap

Itai Asseo, Senior Director of Brand Strategy and Incubation at Salesforce AI Research, stressed the importance of customer-driven co-innovation in developing enterprise-ready AI solutions.

“When we talk to customers, we find that there is a low tolerance for answers that are inaccurate and irrelevant when dealing with enterprise-level data,” Asseo said. “We’ve made progress with reasoning engines, RAG techniques, and other methods around LLMs.”

What’s next in Salesforce AI? The road to Enterprise General Intelligence

Salesforce is stepping up its research at a crucial time for enterprise AI adoption. Businesses are looking for AI systems that combine advanced features with reliable performance.

As the tech industry pursues ever-larger models with impressive raw abilities, Salesforce is focusing on the consistency gap. That focus reflects a more nuanced approach to AI development, one that prioritizes real-world business requirements over academic benchmarks.

The technologies announced Thursday will be rolled out over the next few months. SFR-Embedding will be the first to arrive, on Data Cloud, while other innovations are expected to power future versions of Agentforce.

As Savarese stated in the press conference, “It is not about replacing people.” In the race for enterprise AI dominance, Salesforce is betting that consistency and reliability, not just raw intelligence, will ultimately determine the winners of the business AI revolution.
