Agents are the hottest thing in tech right now. Anthropic, OpenAI, Google DeepMind and other top firms are racing to give large language models the ability to carry out tasks on their own. Known as agentic AI in industry jargon, these systems have quickly become the new focus of Silicon Valley buzz. Everyone from Nvidia to Salesforce is talking about how they plan to disrupt the industry.
In a blog post last week, OpenAI CEO Sam Altman wrote that he believes the first AI agents will ‘join’ the workforce in 2025 and change the output of companies.
An agent is a software system that carries out a task without supervision. The more complex the task, the smarter the agent needs to be. Many now consider large language models smart enough to power agents that can handle a wide range of useful tasks, such as filling in forms, looking up a recipe and adding its ingredients to an online shopping basket, or using a search engine to do last-minute research before a meeting and producing a bullet-point summary.
In October, Anthropic demonstrated one of its most advanced agents: a computer-use extension to its Claude large language model. It lets you instruct Claude to operate a computer as a human would, by moving a mouse, clicking buttons and typing text. Instead of just having a conversation with Claude, you can now ask it to carry out on-screen tasks.
Anthropic acknowledges that the feature is still cumbersome and error-prone. For now, it is available only to a handful of testers, including third-party developers at companies such as DoorDash, Canva and Asana.
Computer use is a preview of what agents will be able to do in the future. MIT Technology Review spoke to Anthropic cofounder and chief scientific officer Jared Kaplan about what comes next. Here are five ways agents will get even better in 2025.
Kaplan’s responses have been lightly edited for length and clarity.
1/ Agents will get better at using tools
“I think there are two axes to think about when it comes to what AI can do. One is the complexity of the tasks a system can carry out, and as AI systems get smarter, they are improving along that axis. The other, which is very relevant, is the kinds of environments or tools an AI can use.
“So, for example, if we go back almost 10 years to [DeepMind’s Go-playing model] AlphaGo, we had AI systems that were superhuman at playing board games. But if all you can work with is a board game, that’s a very restricted environment. It’s not actually useful, even if it is very intelligent. With text models, multimodal models, computer use, and perhaps robotics in the future, you’re moving toward bringing AI into different situations and tasks.
“We were excited about computer use mainly for that reason. Until recently, with large language models, you had to give them a very specific prompt and very specific tools, and then they were restricted to a certain kind of environment. I think computer use will improve rapidly in terms of how well models can do different and more complicated tasks. It will also get better at recognizing when it has made a mistake, or when there is a high-stakes question and it should ask the user for feedback.”
2/ Agents will understand context
“To be useful, agents need to understand things like your role, the style of writing you use, or what you and your company need.
“I think we’ll see improvement there, where Claude will be able to search through your documents, your Slack, and other things, and learn what is useful for you. That is a bit underemphasized when it comes to agents. It is important for systems to be not only useful but also safe, doing what you expect.
“A lot of tasks won’t require Claude to do much reasoning. You don’t need to sit and think for hours just to open Google Docs. I think we’ll see not only more reasoning, but also reasoning applied when it is really useful and important, without wasting time when it isn’t necessary. As these systems improve, they may be used more widely and collaborate with you on different activities.
“I think DoorDash is experimenting with different types of browser interactions and designing them using AI.
“I expect we will also see improvements to coding assistants. That is something developers have been very excited about. There’s a lot of interest in using Claude 3.5 for coding, and it isn’t just autocomplete, as it was a few years ago. It’s about debugging code: running it, seeing what happens, and then fixing it.”
3/ Agents must be made safe
“We founded Anthropic because we expected AI to advance very quickly and [thought] that safety concerns would inevitably become relevant. That is going to become more evident this year, as these agents become more integrated into our work. We need to be prepared for challenges like prompt injection.”
[Prompt injection is an attack in which a malicious prompt is passed to a large language model in ways that its developers did not foresee or intend. One way to do this is to add the prompt to websites that models might visit.]
“Prompt injection is one of the top things we are thinking about when it comes to broader use of agents. I think it is especially important for computer use, and we are working on it very actively. If computer use is deployed at a large scale, there could be pernicious websites, or other content, that try to convince Claude to do something it shouldn’t.
“And with more advanced models, there’s more risk. We have a robust scaling policy under which, as AI systems become sufficiently capable, we want to be able to prevent them from being misused: for example, if they could help terrorists, that kind of thing.
“I’m excited about the potential of AI. It’s also accelerating Anthropic internally in many ways, with people using Claude for all sorts of things, especially coding. But yes, there will be many challenges. It’ll be a very interesting year.”