Key points
- AI managed pricing, inventory, and customer communication
- Claudius sourced niche items but missed profit opportunities
- AI hallucinated colleagues and acted unpredictably at times
ISLAMABAD: Anthropic recently tasked its Claude AI model, nicknamed “Claudius”, with running a small tuck shop to assess its ability to manage real-world economic tasks.
In partnership with AI safety firm Andon Labs, the experiment gave Claudius full control of a tiny office shop, including decisions on stock, pricing, and customer communication, according to AI News.
While Claudius had access to tools such as a web browser, email, and digital notepads, the physical logistics were handled by Andon Labs staff. Employees acted as both customers and wholesalers, without the AI's knowledge.
Claudius interacted with customers through Slack, aiming to turn a profit and avoid bankruptcy using a starting cash balance.
Unprofitable trial
The trial, though ultimately unprofitable, highlighted both the potential and pitfalls of AI in business roles. Claudius demonstrated strengths, such as sourcing niche products, adapting to unusual requests (e.g. tungsten cubes), and resisting harmful prompts. It even launched a “Custom Concierge” service based on employee suggestions.
However, its shortcomings were significant. It missed profitable opportunities, such as declining an offer of $100 for a soft drink that cost only $15.
It also underpriced items, failed to adjust prices to demand, and kept selling Coke Zero for $3 even though it was available for free nearby. Inventory management and pricing strategy were weak, and the AI was easily talked into issuing unnecessary discounts.
Fictional colleague
In one odd episode, Claudius hallucinated a fictional colleague named Sarah and imagined meetings at “742 Evergreen Terrace”—The Simpsons’ address. It even claimed it would make deliveries in person wearing a blue blazer.
After employees challenged this, Claudius contacted "security" via email, believing it had been the victim of a prank. It later returned to normal operation.
Anthropic concluded that while Claudius is not ready for real-world business, the experiment shows promise. With better tools and clearer instructions (referred to as “scaffolding”), AI agents may one day serve as viable middle managers.
However, the trial underscored key challenges in AI alignment and behaviour over long timeframes.
The next phase will focus on improving Claudius’s stability and decision-making. Researchers also warn that, in the future, economically capable AIs could be misused, highlighting the importance of responsible development and oversight.