Book: LLM Observability Pocket Guide: Picking the Right Tracing & Evals Tools for Your Team

Also by me: Thinking in Go (2-book series): Complete Guide to Go Programming + Hexagonal Architecture in Go

My project: Hermes IDE | GitHub, an IDE for developers who ship with Claude Code and other AI coding tools

Me: xgabriel.com | GitHub

You ship a multi-tenant SaaS with an AI feature. One Anthropic key fans out to every customer (swap in OpenAI, Bedrock, Vertex; the shape is the same).

On Tuesday morning a single tenant burns through your minute-bucket. Usually it is the one running a backfill nobody told you about. The whole platform starts returning 429s. Your enterprise customer's CEO demo at 10 AM hits the rate limit two minutes in, because a free-tier tenant is replaying their support inbox through your summarizer.

A team I talked to had a worse version of this. They added "20 requests per minute per tenant" the first time it happened.…