What's the worst thing your AI agent has done in production?

I'm building reliability infrastructure for AI agents, and I'm collecting failure stories from engineers who've shipped agents to production. If you've shipped an AI agent and watched it do something nobody could explain, this post is for you.

Why I'm asking

For the past few weeks I've been talking to engineers running AI agents in production. The same pattern keeps showing up. Their agent did something they couldn't predict. The damage was already done by the time they noticed. The tools they had only logged the failure after the fact.

One engineer told me he spent a whole weekend rerunning an agent, trying to reproduce a single failure. Another watched his sales agent email the same lead twelve times in five minutes before anyone caught it. A third watched an agent issue a $4,500 refund because the customer asked nicely and the agent didn't think to check.

These aren't edge cases.…