There is a reflex in AI engineering right now: when in doubt, reach for the biggest model you can afford. GPT-4o for the customer support bot. Claude Opus for the internal search tool. A frontier-class model for the document classifier that runs ten thousand times a day.

That reflex is expensive. And in a growing number of production scenarios, it is also wrong.

Small language models are no longer a compromise you accept when you cannot afford the real thing. They are a deliberate architectural choice that, in the right context, beats larger models on latency, cost, privacy, and even accuracy. This post gives you the framework to know when that context applies to your project.

What Makes a Model "Small"?

The working definition across the industry is any language model under ten billion parameters. In practice, most SLMs deployed in production today sit between one and seven billion parameters.…