You have a judge LLM in your pipeline. You've told it: "Return JSON only. No preamble, no explanation. Just the JSON object." It works great in testing. It works great in staging. Then in production it returns: Sure! Here's my evaluation of the response: { "score" : 4 , "reason" : "The answer is mostly correct but..." } Enter fullscreen mode Exit fullscreen mode Your json.loads() throws. Your pipeline catches nothing. Downstream code receives None and keeps running. Your evaluation scores are silently wrong for the next 200 requests before anyone notices. Was this the model misbehaving? No. Was there ever a way to actually force JSON output? Yes — but it's not the prompt. Let me show you the real mechanism. What "return JSON only" actually does When you write a format instruction in a prompt, you are doing exactly one thing: shifting the probability distribution over the next token.…