p95 latency dropped from 2.3 seconds to 180 milliseconds. Same hardware, same database, same traffic. The only thing that changed was how we cached — and I don't mean slapping @lru_cache on a function. I'm writing this because every Redis caching tutorial I read before this project showed me the same 15-line example: redis.get(key) or fetch_from_db() . That code works in a notebook. It will absolutely wreck you in production the first time real traffic hits it. This is the layered strategy that actually survived. FastAPI + Python on the server, Redis 7 for caching, Postgres behind it. Everything below is from a real project we shipped for a B2B client earlier this year — roughly 800 requests per minute on the hot endpoints, with read-heavy traffic around product catalog and pricing. The endpoint that was killing us The problematic endpoint returned a pricing quote for a product variant, filtered by region, customer tier, and active promotions.…