I caught my own production summarization API doing something embarrassing today, and I think yours might be doing it too.

I sent it this:

Quick test: Anthropic released Claude Opus 4 with extended thinking and a new agent SDK. It has 200k context and improved coding.

It sent me back this:

"summary" : "<think> \n Okay, the user wants a concise summary of the given text in 2–3 sentences. Let me read the original text again: \" Quick test: Anthropic released Claude Opus 4... \"\n\n First, I need to identify the key points. The main elements are the release of Claude Opus 4 by Anthropic, the features mentioned are extended thinking, a new agent SDK, 200k context, and improved coding. \n\n The user wants it direct and clear..."

That is not a summary. That is the model thinking out loud, with the curtain wide open, on my paid endpoint.

The next call returned a clean two-sentence summary. The one after that was clean too. Then call four leaked again. Coin flip.…
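
Because the leak is intermittent, it helps to check every response instead of trusting a few clean calls. Here is a minimal sketch of such a check, assuming the reasoning always starts with a literal `<think>` tag as it does in the output above. The function name `strip_think` is my own placeholder, not part of any SDK, and the handling of an unclosed tag is an assumption about how a truncated response might look.

```python
import re

# A complete <think>...</think> block, newlines included.
_CLOSED_THINK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)
# An opening <think> that never closes (assumed to happen when output is cut off).
_OPEN_THINK = re.compile(r"<think>.*\Z", re.DOTALL | re.IGNORECASE)


def strip_think(text: str) -> tuple[str, bool]:
    """Return (cleaned_text, leaked).

    leaked is True when the raw text contained a <think> tag.
    cleaned_text may be empty if the response was nothing but reasoning.
    """
    leaked = "<think>" in text.lower()
    cleaned = _CLOSED_THINK.sub("", text)   # drop well-formed blocks first
    cleaned = _OPEN_THINK.sub("", cleaned)  # then anything after a dangling tag
    return cleaned.strip(), leaked


if __name__ == "__main__":
    raw = "<think>\nOkay, the user wants a concise summary...</think>\nAnthropic released Claude Opus 4."
    summary, leaked = strip_think(raw)
    print(leaked, repr(summary))
```

If `leaked` is true and the cleaned text is empty, the response was reasoning only, so retrying the call is probably safer than returning an empty summary. Logging the `leaked` flag per request would also show how often the coin actually lands on the wrong side.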