The Problem We Were Actually Solving We needed to distinguish between real treasure spawns and synthetic spam. The original design used a lightweight LLM filter called TreasureLLM that ran on top of every /spawn request; it cost 12 ms and dropped only 0.3 % of fake spawns in the demo. The problem was that the filter was pure Python, blocking, and our traffic model showed that once we crossed 300 k ccu the filter would become the new tail latency at 100 ms. At that point the geo-fence lookup we already had in Redis would have to do extra round-trips to validate the result, which was a latency stack we had not budgeted. The documentation for TreasureLLM promised sub-5 ms responses with ONNX, but the actual compilation artifact came with a 256 MB model that fit into neither our 512 MB Redis container nor our 1 MB hot cache. What We Tried First (And Why It Failed) We tried three things in the same weekend: Fuse TreasureLLM directly into the geofence micro-service using coroutines.…