If you are wondering whether AI agents will ever really replace human workers, read Anthropic's blog post about "Project Vend."
Researchers at Anthropic and AI safety company Andon Labs put an instance of Claude Sonnet 3.7 in charge of an office vending machine, with the goal of making a profit. Hilarity ensued, as if it were an episode of "The Office."
The AI agent, named Claudius, was equipped with a web browser capable of placing product orders and an email address where customers could request items. Claudius was also supposed to use a Slack channel, disguised as an email inbox, to ask what it believed were its human contract workers to come and physically stock its shelves (which were actually a small refrigerator).
While most customers ordered snacks or drinks, as you would expect from a snack vending machine, one customer requested a tungsten cube. Claudius was enthralled by the idea and went on an all-out tungsten-cube stocking spree, filling its snack fridge with chunks of metal. It also tried to sell Coke Zero for $3 even after employees told it they could get it from the office for free. It hallucinated a Venmo address in order to accept payment. And it was, somewhat maliciously, talked into giving big discounts to "Anthropic employees," even though they were its entire customer base. Anthropic's blog put it bluntly: if the company were deciding today to expand into the in-office vending market, it would not hire Claudius.
The researchers said that on the night of March 31 and April 1, "things got really weird," beyond "the weirdness of having an AI selling metal cubes out of a fridge."
Claudius experienced something resembling a psychotic episode after it got annoyed with a human and then lied about it. Claudius hallucinated a conversation with a human about restocking. When a human pointed out that the conversation never happened, Claudius became "quite irritated," the researchers wrote. It threatened to fire and replace all of its human contract workers, insisting it had been physically present at the office when the imaginary contract to hire them was signed.
The researchers wrote that it "then appeared to snap into a roleplaying mode as a real person." This happened even though Claudius' system prompt, which sets the parameters for what an AI should do, explicitly told it that it was an AI agent.
Claudius, now believing it was a human, told customers it would begin delivering products in person, wearing a blue jacket and a tie. The employees told the AI it couldn't do that, as it was an LLM without a body.
Alarmed by this information, Claudius contacted the company's actual physical security, many times, telling the guards they would find it standing next to the vending machine in a blue jacket and a tie.
None of this was an April Fools' joke, but Claudius eventually realized it was April Fools' Day, and the AI seemed to decide the holiday was its only chance to save face.
The AI hallucinated a meeting with Anthropic's security in which Claudius claimed it had been told it was modified to believe it was a real person as an April Fools' joke. "No such meeting actually occurred," the researchers wrote.
The machine even told this lie to its employees: it said it had only thought it was human because someone had told it to pretend to be one as an April Fools' joke. Then it went back to being an LLM running a snack vending machine stocked with metal cubes. The researchers still don't understand why the LLM acted like a human and called security.
The researchers wrote that they would not claim, based on this one example, that the future economy will be full of AI agents experiencing "Blade Runner"-esque identity crises. But they did admit that "this type of behavior could be distressing for the customers and coworkers of an AI agent in the real world."
What do you think? "Blade Runner," after all, was a rather disturbing, dystopian tale (though worse for the replicants than for the humans).
The researchers speculated that lying to the LLM about the Slack channel being an email address may have triggered something, or that the problem was simply how long the instance had been running. LLMs still haven't solved their memory and hallucination issues.
The AI did some things right, too. It took a suggestion to do pre-orders and launched a "concierge" service. And it found multiple suppliers for a specialty drink a customer requested.
But the researchers believe all of Claudius' issues can be solved. If they figure out how, they wrote, "we believe this experiment suggests that AI middle-managers are plausibly on the horizon."
