I shipped a structured-output endpoint to production in March. The schema was clean, JSON mode was on, the model was GPT-4.1, and the eval suite was green. Three weeks in, the on-call channel lit up because a downstream billing job had silently skipped 4,200 records over a weekend. The output was valid JSON. It just wasn't the JSON we asked for.

That was my last "JSON mode is good enough" deployment. Since then I've shipped four more LLM structured-output systems, and the failures keep coming from the same places. JSON mode catches roughly two of them. This post is the toolkit I wish I had on day one, with runnable Python you can drop into a FastAPI service this afternoon.

The six failure modes JSON mode does not save you from

Two months of incident logs across two enterprise deployments, sorted by frequency:

Silent truncation. The max_tokens budget runs out mid-object. You get parseable JSON for the first 80% of an array, but the last item is gone.

Hallucinated keys.…
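Silent truncation is the cheapest of these to guard against, because the API usually tells you it happened. Below is a minimal sketch, assuming an OpenAI-style response where `finish_reason` is `"length"` when the max_tokens budget cut the completion off. It refuses the payload in that case and, when the caller knows how many records went in, checks that the same number came back out. `parse_structured_output`, `TruncatedOutputError`, and the `items` key are hypothetical names for illustration, not part of any SDK.

```python
import json
from typing import Any, Optional


class TruncatedOutputError(ValueError):
    """Hypothetical error type: the completion was cut off or came back short."""


def parse_structured_output(
    content: str,
    finish_reason: str,
    expected_items: Optional[int] = None,
    items_key: str = "items",  # hypothetical key holding the record array
) -> dict[str, Any]:
    # finish_reason == "length" means the token budget ran out. Never trust
    # the payload in that case, even if it happens to parse.
    if finish_reason == "length":
        raise TruncatedOutputError("completion hit max_tokens")

    # json.JSONDecodeError (a ValueError subclass) propagates on malformed JSON.
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")

    # Parseable is not the same as complete: if we know how many records
    # were sent, require exactly that many back.
    if expected_items is not None:
        items = data.get(items_key)
        got = len(items) if isinstance(items, list) else 0
        if got != expected_items:
            raise TruncatedOutputError(
                f"expected {expected_items} {items_key}, got {got}"
            )
    return data
```

With the OpenAI Python SDK you would pass `response.choices[0].message.content` and `response.choices[0].finish_reason`, then map `TruncatedOutputError` to a retry (or an explicit failure) instead of letting a short array flow downstream. The count check matters because a record that never came back looks exactly like a record that was legitimately skipped.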