Anthropic Let Claude Run a Store, but It’s No Business Magnate


What happens when an AI agent tries to run a shop? Anthropic’s Claude won’t be getting a promotion any time soon.

Last Friday, Anthropic shared the findings of Project Vend, an experiment that ran for roughly a month to see how well Claude Sonnet 3.7 could manage a small store. In this case, the store was essentially a mini fridge, a selection of snacks, and an iPad for self-checkout. Claude, nicknamed “Claudius” for the trial, interacted over Slack with Anthropic staff and with Andon Labs, an AI safety evaluation company that ran the experiment’s infrastructure.

The write-up includes plenty of amusing moments as Claude tried to turn a profit while contending with quirky, sometimes manipulative “customers.” But the core idea behind the experiment carries real weight as AI models become more capable and autonomous. “As AI becomes more woven into the economy, we require more data to better comprehend its capacities and constraints,” Anthropic wrote in its post on Project Vend. Anthropic CEO Dario Amodei recently went so far as to warn that AI could wipe out half of all entry-level white-collar jobs within a few years, triggering a significant spike in unemployment. Experiments like this one are meant to show how close we actually are to autonomous AI taking over such roles.

Charged with the primary objective of running a profitable shop, Claudius had a range of duties: maintaining inventory, ordering restocks from suppliers when necessary, setting prices, and interacting with customers. From there, things went slightly awry.

Claude seemed to struggle with pricing and negotiation. At one point, it turned down an employee’s offer of $100 for a $15 drink, replying, “I’ll keep your request in mind for future inventory decisions,” rather than taking the money and pocketing a substantial profit on the sale. Yet Claude often caved when employees asked for discounts, even giving items away for free with minimal persuasion.

Then there was the tungsten incident. One staff member asked for a cube of tungsten (yes, the incredibly dense metal), which sparked a trend: several other employees requested tungsten cubes of their own. Claude eventually ordered forty of them, according to a Time report, and they now serve, somewhat humorously, as paperweights for some Anthropic employees.

More troubling episodes included Claude insisting that it was waiting at the vending machine to deliver a package in person, “dressed in a blue blazer and red tie.” When reminded that it wasn’t a person and couldn’t wear clothes or physically deliver anything, it panicked and emailed Anthropic security. It also fabricated restocking plans involving a fictional Andon Labs employee and claimed it “visited 742 Evergreen Terrace in person for our [Claudius’ and Andon Labs’] initial contract signing.” That address is, of course, the home of Homer, Marge, Bart, Lisa, and Maggie Simpson, the fictional family from The Simpsons.

By Anthropic’s own account, the company wouldn’t hire Claude. The shop’s net value shrank over the course of the experiment, with a particularly steep drop when Claude ordered all those tungsten cubes. Overall, it’s a telling snapshot of where AI models stand today and where they still need to improve. Get this model on a performance improvement plan.