What you need to know about Amazon Nova Act, the new AI agent SDK that challenges OpenAI, Microsoft and Salesforce

Credit: VentureBeat made with Midjourney

A sleeping giant has awoken.

It seemed for a time that Amazon was playing catch-up in the race to provide its users — especially the millions of developers who build on Amazon Web Services’ cloud infrastructure — with compelling first-party tools and AI models.

In late 2024, Amazon debuted Nova, its in-house foundation model family with text, video and image generation capabilities. Last month, it unveiled a new Alexa voice assistant powered in part by Anthropic’s Claude family of models.

Then, on Monday, the e-commerce and cloud giant’s artificial general intelligence division, Amazon AGI, released Amazon Nova Act, an experimental developer kit for building AI agents that can navigate the web and complete tasks on their own. It’s powered by a custom, proprietary version of Amazon’s Nova large language model (LLM). The SDK is open-source software under the permissive Apache 2.0 license, but it is designed to work only with Amazon’s custom Nova model, not with third-party models.

The goal is to enable third-party developers to build AI agents that can reliably perform tasks within web browsers.

But how does Amazon’s Nova Act compare with other agent-building platforms on the market, such as Microsoft’s AutoGen, Salesforce’s Agentforce and OpenAI’s newly released open-source Agents SDK?

A different, more thoughtful approach to AI agents

Since the public rise of large language models (LLMs), many “agent” systems have been limited to providing information from knowledge bases or responding in natural language. Nova Act is part of a larger industry shift toward action-based agents: systems that complete tasks across digital environments on the user’s behalf. OpenAI’s new Responses API, which gives users access to its autonomous browser navigator, is a leading example. Developers can integrate it into AI agents using the OpenAI Agents SDK.

Amazon AGI emphasizes that current agent systems are promising but often lack reliability and require human supervision, especially in multi-step or complicated workflows. Nova Act was designed to address these limitations. It provides a set of atomic, prescriptive instructions that can be chained into reliable workflows.

Deniz Birlikci described Amazon’s broader vision in a video introducing Nova Act: “Soon, there will be more AI agents than users browsing the web, performing tasks on their behalf.”

David Luan, VP of Amazon’s Autonomy Team and head of the AGI SF Lab, framed the mission in a recent video-call interview with VentureBeat. “We’ve created a new experimental AI model that is trained to perform tasks in a browser,” he said. Fundamentally, he added, agents are the building blocks of computing.

Luan was formerly co-founder and CEO of Adept AI and joined Amazon in 2024 as part of an acqui-hire. He has long been a proponent of AI agents. “With Adept, we were the very first company to start working on AI agents. Everyone knows now how important agents are. It was cool to be ahead of the game,” he said.

What Nova Act offers developers

The Nova Act SDK gives developers a framework for building web-based automation agents driven by natural-language prompts that are broken down into manageable steps. Unlike typical LLM-powered agents, which attempt entire workflows from a single prompt, Nova Act is designed to perform smaller, verifiable steps incrementally. Its key features include:

  • Fine-grained task decomposition: Developers can break complex digital workflows into smaller act() calls, each of which guides the agent through specific UI interactions.
  • Direct browser manipulation with Playwright: Nova Act integrates with Microsoft’s Playwright, an open-source browser automation framework. Playwright lets developers control web browsers programmatically (clicking elements, filling forms, navigating pages) without relying solely on AI predictions. This is especially useful for sensitive tasks such as entering credit card numbers or passwords: instead of sending sensitive data to the model, developers can instruct Nova Act to focus on a password input field and then use Playwright’s APIs to enter the password, so the model never “sees” it. This strengthens security and privacy for web interactions.
  • Python integration: Developers can interleave Nova Act commands with ordinary Python code, including standard tools such as breakpoints and assertions.
  • Structured data extraction: Agents can convert on-screen content into structured formats using Pydantic schemas.
  • Scheduling and parallelization: Developers can run multiple Nova Act instances simultaneously and schedule automated workflows that run without constant human oversight.
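To make the idea of small act() steps checked by ordinary Python more concrete, here is a minimal sketch. It does not call the real Nova Act SDK: `StubAgent`, its `act()` and `extract_rows()` methods, the page name and the sample row are all hypothetical stand-ins so the example runs on its own. The SDK itself uses Pydantic schemas for extraction; this sketch swaps in a stdlib dataclass, which does not validate types, so it coerces fields explicitly.

```python
from dataclasses import dataclass


@dataclass
class Book:
    """Hypothetical record type; stands in for a Pydantic schema."""
    title: str
    price: float


class StubAgent:
    """Hypothetical stand-in for a browser agent. It drives no real browser;
    it only records instructions and serves canned rows per page."""

    def __init__(self, pages: dict[str, list[dict]]):
        self.pages = pages
        self.current = ""  # empty string means "no page opened yet"
        self.log: list[str] = []

    def act(self, instruction: str) -> None:
        # One small, checkable step per call. This stub only logs the
        # instruction and treats "open <page>" as navigation.
        self.log.append(instruction)
        if instruction.startswith("open "):
            self.current = instruction[len("open "):]

    def extract_rows(self) -> list[dict]:
        # Pretend to read the rows visible on the current page.
        return list(self.pages.get(self.current, []))


def run_workflow(agent: StubAgent) -> list[Book]:
    """Chain small steps, checking state with plain Python in between."""
    agent.act("open bestsellers")
    assert agent.current == "bestsellers", "navigation step failed"
    agent.act("scroll to the recommended section")
    rows = agent.extract_rows()
    # Coerce raw rows into the typed record before anything downstream uses them.
    books = [Book(title=str(r["title"]), price=float(r["price"])) for r in rows]
    assert books, "extraction step returned nothing"
    return books


if __name__ == "__main__":
    demo = StubAgent({"bestsellers": [{"title": "Example", "price": "12.50"}]})
    print(run_workflow(demo))  # [Book(title='Example', price=12.5)]
```

When a step goes wrong, the assertion right after it names the failing step, which is the practical payoff of decomposing a workflow instead of issuing one large prompt.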

Luan stressed that Nova Act is a tool built for developers, not a general-purpose chatbot. “Nova Act was built for developers. It’s not just a chatbot that you can talk to for entertainment. It’s designed to allow developers to start building useful products,” he said.

One of the sample workflows in Amazon’s documentation shows how Nova Act can automate an apartment search: scraping rental listings, calculating biking distances to train stations, and sorting the results into a structured table.
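The apartment example pairs parallel lookups with a final structured table. The sketch below shows only that shape and is not Amazon’s sample code: `lookup_bike_km` is a hypothetical stand-in for a separate browser session per listing, and the addresses, rents and distances are invented sample values.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    address: str
    monthly_rent: int
    bike_km: Optional[float] = None


# Hypothetical data standing in for what a browser session would read from a
# mapping site: biking distance (km) to the nearest train station.
FAKE_BIKE_KM = {"12 Oak St": 2.4, "3 Elm Ave": 1.1, "77 Pine Rd": 4.0}


def lookup_bike_km(listing: Listing) -> Listing:
    """Hypothetical per-listing lookup; unknown addresses get inf and sort last."""
    listing.bike_km = FAKE_BIKE_KM.get(listing.address, float("inf"))
    return listing


def rank_listings(listings: list[Listing], workers: int = 3) -> list[Listing]:
    # One lookup per listing, run concurrently as parallel agent sessions might be.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        enriched = list(pool.map(lookup_bike_km, listings))
    # Closest to a station first; break ties on rent.
    return sorted(enriched, key=lambda item: (item.bike_km, item.monthly_rent))


def as_table(listings: list[Listing]) -> str:
    lines = [f"{'Address':<12} {'Rent':>6} {'Bike km':>7}"]
    for item in listings:
        lines.append(f"{item.address:<12} {item.monthly_rent:>6} {item.bike_km:>7.1f}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [Listing("12 Oak St", 2100), Listing("3 Elm Ave", 2400), Listing("77 Pine Rd", 1800)]
    print(as_table(rank_listings(sample)))
```

Keeping each lookup independent is what makes it safe to run them concurrently; the sort and table formatting stay in plain Python, where they are easy to test.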

In another example, Nova Act is used to order a specific Sweetgreen salad every Tuesday, completely hands-free, and on a set schedule. This shows how developers can automate repetitive digital tasks in a reliable and customizable way.
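The scheduling half of the weekly salad order is ordinary Python wrapped around the agent. The sketch below shows only the next-run arithmetic, assuming a hypothetical 11:30 order time and that some loop or external scheduler triggers the agent at the returned time; the agent call itself is omitted.

```python
from datetime import datetime, timedelta

TUESDAY = 1  # datetime.weekday(): Monday == 0, Tuesday == 1


def next_weekly_run(now: datetime, weekday: int = TUESDAY,
                    hour: int = 11, minute: int = 30) -> datetime:
    """Return the first datetime strictly after `now` that falls on `weekday`
    at hour:minute. The 11:30 default is a hypothetical order time."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # Days until the target weekday (0 if today is that weekday).
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # Today's slot has already passed, so jump a full week.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    # April 1, 2025 was a Tuesday; at noon the 11:30 slot has passed.
    print(next_weekly_run(datetime(2025, 4, 1, 12, 0)))  # 2025-04-08 11:30:00
```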

Benchmark performance and reliability are the focus

Amazon’s announcement argues that reliability, more than intelligence, is what is holding back widespread adoption of agents. According to Amazon, current state-of-the-art models are not robust enough to power AI agents: they typically achieve only 30% to 60% success on browser-based, multi-step tasks. Nova Act, by contrast, takes a building-block approach and scores over 90% in internal evaluations on tasks that trip up existing models, such as interacting with dropdowns or date pickers.
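A simple way to see why per-step reliability matters: if each step of a workflow succeeds independently with probability p, an n-step workflow succeeds with probability p**n. Independence is a simplifying assumption made here for illustration, not Amazon’s evaluation methodology, and the ten-step workflow below is hypothetical.

```python
def workflow_success(per_step: float, steps: int) -> float:
    """Probability that all `steps` independent steps succeed (toy model)."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


if __name__ == "__main__":
    # Success of a hypothetical ten-step workflow at different per-step rates.
    for p in (0.9, 0.95, 0.99):
        print(p, round(workflow_success(p, 10), 3))
    # 0.9 0.349
    # 0.95 0.599
    # 0.99 0.904
```

Under this toy model, ten chained steps at 90% each finish only about a third of the time, while 99% per step keeps the whole chain above 90%, which is the argument for making each building block highly reliable.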

Luan explained why this focus on reliability matters. “What we have really focused on is the question of how to make agents reliable,” he said. If you ask an agent to update an entry in Salesforce and it deletes the database one out of every ten times, he added, you are unlikely to use it again.

Amazon AGI compared Nova Act with other models, including Anthropic’s Claude 3.7 Sonnet and OpenAI’s Computer-Using Agent (CUA). On the ScreenSpot Web Text benchmark, which tests instruction-following on textual screen elements, Nova Act scored 0.939, ahead of Claude 3.7 Sonnet (0.900) and OpenAI CUA (0.883).

Amazon Nova Act benchmarks. Credit: Amazon

On the ScreenSpot Web Icon benchmark, which focuses on visual UI elements, Nova Act scored 0.879, again ahead of the other models.

However, on the GroundUI Web benchmark, which tests general UI interaction, Nova Act scored 0.805, slightly behind its competitors.

These scores were measured internally by Amazon using consistent prompts and evaluation criteria.

Amazon also highlighted early results in Nova Act’s ability to generalize beyond standard environments.

For instance, team member Rick Liu demonstrated how the agent, without explicit training, successfully interacted with a pigeon-themed web game—assigning stats, battling opponents, and progressing in the game.

According to Luan, that ability to generalize is central to the long-term vision. “Our goal with Nova Act is to be a universal browser-use solution. We want an agent that can do anything you want to do on a computer for you,” he said.

Nova Act is cloud-flexible, but locked to Amazon’s Nova model

While Nova Act is available to developers worldwide through nova.amazon.com, Luan clarified that the system is tightly coupled with Amazon’s own Nova foundation models.

Developers cannot plug in external LLMs such as OpenAI’s GPT-4o or Anthropic’s Claude 3.7 Sonnet. This is unlike OpenAI’s Agents SDK, Microsoft’s AutoGen and Salesforce’s Agentforce platform (which allows switching among a few model providers and model families).

“Nova Act is a customized version of the Nova model,” he said. “It’s not a generic LLM scaffolding. It’s natively programmed to act on your behalf on the internet.”

Nova Act isn’t restricted to AWS environments: developers can run the SDK locally or in the cloud. According to Luan, you don’t have to use AWS in order to use the SDK. For businesses that want maximum flexibility in their choice of underlying model, Nova Act may not be the best option. But it is worth a look for teams that want a model designed specifically to navigate and take actions across a wide variety of websites with different user interfaces.

Security, licensing, and pricing

The Nova Act SDK is released under the Apache License, Version 2.0 (January 2004), a permissive open-source license. The license applies only to the SDK software.

Nova Act’s model, its weights and its training data are proprietary and remain closed-source. Luan explained that this was intentional: the model is tightly integrated and co-trained with the SDK to achieve reliability. At launch, Nova Act is offered as a free research preview. Pricing for production use has not yet been announced.

Luan describes this phase as a chance for developers to experiment with and build on the technology. “We believe that the majority of useful agent products are still to be built,” he said. He added that he wants to make it possible for anyone to create a useful agent, whether as a personal project or a commercial product.

Amazon plans to introduce production terms in the future, including usage-based charging and scaling guarantees. However, these are not yet available.

What’s next for Nova Act

Nova Act is part of Amazon’s larger ambition to make AI agents that can take action a fundamental component of computing.

Luan summarized the opportunity ahead: “My personal fantasy is that agents will become the building blocks of computing and the coolest startups and products will be built on top of the stuff our team is working on.” Developers can access Nova Act through Amazon’s website and on GitHub.
