I tested Claude's consistency across prompts: here's what I found

Every developer building an AI-powered app assumes their LLM gives consistent answers. I did too, until I actually measured it.

I built llm-test-kit, an open-source test suite for LLM-powered applications. While building it, I ran hundreds of tests against Claude Sonnet and discovered something that surprised me.

The finding

Claude is content-consistent but format-inconsistent. Run the same factual question three times and you'll get the same answer every time. But the structure (headers, bullet points, analogies) changes with every response.

Here's what that looks like in practice. I ran "What is an API?" three times:

Run 1:

    # API (Application Programming Interface)

    An API is a set of rules and protocols that allows different software applications to communicate with each other.

    ## Simple Analogy

    Think of it like a restaurant menu...…