Bulletproofing LLM Structured Output in Python: Healing Retries, Cost Caps, and Drift Detection (Runnable Code)

DEV Community·Nitin Srivastava·24 days ago

#python #llm #ai #model #import #json

I shipped a structured-output endpoint to production in March. The schema was clean, JSON mode was on, the model was GPT-4.1, the eval suite was green. Three weeks in, the on-call channel lit up because a downstream billing job had silently skipped 4,200 records over a weekend. The output was valid JSON. It just wasn't the JSON we asked for.

That was my last "JSON mode is good enough" deployment. Since then I've shipped four more LLM structured-output systems and the failures keep coming from the same places, and JSON mode catches roughly two of them. This post is the toolkit I wish I had on day one, with runnable Python you can drop into a FastAPI service this afternoon.

The six failure modes JSON mode does not save you from

Two months of incident logs across two enterprise deployments, sorted by frequency:

Silent truncation. max_tokens runs out mid-object. You get parseable JSON for the first 80% of an array, but the last item is gone.

Hallucinated keys.…
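To make the silent-truncation case concrete, here is a minimal sketch of a guard that refuses to parse a response the model did not finish. It is not the author's code. It assumes you already have the response text and the `finish_reason` string from your client; the OpenAI Chat Completions API reports `"length"` there when `max_tokens` ran out. The names `TruncatedOutputError` and `parse_complete_json` are hypothetical, chosen for illustration.

```python
import json
from typing import Any


class TruncatedOutputError(ValueError):
    """Hypothetical error type: the model stopped because it hit the token limit."""


def parse_complete_json(raw_text: str, finish_reason: str) -> Any:
    """Parse model output only if the model reports that it finished on its own.

    Assumption: finish_reason follows the OpenAI Chat Completions convention,
    where "length" means generation was cut off by max_tokens.
    """
    if finish_reason == "length":
        # Parseable-but-incomplete JSON is the dangerous case, so fail
        # before json.loads gets a chance to succeed on a partial payload.
        raise TruncatedOutputError(
            "model output was cut off by max_tokens; refusing to parse"
        )
    return json.loads(raw_text)


if __name__ == "__main__":
    # A finished response parses normally.
    print(parse_complete_json('{"items": [1, 2]}', "stop"))

    # The same text flagged as truncated is rejected instead of trusted.
    try:
        parse_complete_json('{"items": [1, 2]}', "length")
    except TruncatedOutputError as exc:
        print(f"rejected: {exc}")
```

Checking the stop reason before parsing matters because a successful `json.loads` says nothing about whether the array is complete. A caller would typically treat `TruncatedOutputError` as a signal to retry with a larger token budget or a smaller batch.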
