Why the Treasure Hunt Engine Killed Our Weekend Before the Scale-Out

DEV Community: machinelearning·Lisa Zulu·1 day ago

The Problem We Were Actually Solving

We needed to distinguish real treasure spawns from synthetic spam. The original design ran a lightweight LLM filter called TreasureLLM on every /spawn request. It cost 12 ms per call and, in the demo, dropped only 0.3% of fake spawns. The trouble was that the filter was pure Python and blocking. Our traffic model showed that once we crossed 300k CCU, it would become the new tail-latency bottleneck at 100 ms. At that point the geo-fence lookup we already had in Redis would need extra round-trips to validate the result, which stacked latency we had not budgeted for. The TreasureLLM documentation promised sub-5 ms responses with ONNX. The actual compiled artifact, however, shipped a 256 MB model that fit into neither our 512 MB Redis container nor our 1 MB hot cache.

What We Tried First (And Why It Failed)

We tried three things in the same weekend:

Fuse TreasureLLM directly into the geofence micro-service using coroutines.…
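The excerpt does not show the authors' traffic model. As a hedged illustration of why a blocking 12 ms filter turns into a latency problem under load, here is a textbook M/D/1 queue: Poisson arrivals, a fixed service time, and a single server. This is a sketch under stated assumptions, not the team's model. It assumes one blocking filter worker and takes the 12 ms service time from the article. The function names are hypothetical. The model yields mean latency only, and real tail latency would be worse.

```python
# Textbook M/D/1 queue: Poisson arrivals, deterministic service, one server.
# Assumption (not from the article): a single blocking TreasureLLM worker.
# Function names are hypothetical and for illustration only.


def mean_response_ms(utilization: float, service_ms: float) -> float:
    """Mean time in system (queue wait + service) for an M/D/1 queue.

    Pollaczek-Khinchine with deterministic service gives
    Wq = rho * S / (2 * (1 - rho)).
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    wait_ms = utilization * service_ms / (2.0 * (1.0 - utilization))
    return service_ms + wait_ms


def max_throughput_rps(service_ms: float) -> float:
    """Saturation point of one blocking worker, in requests per second."""
    return 1000.0 / service_ms


def utilization_for_target(service_ms: float, target_ms: float) -> float:
    """Solve S + rho*S / (2 * (1 - rho)) = T for rho."""
    if target_ms <= service_ms:
        raise ValueError("target must exceed the bare service time")
    # Rearranged: rho / (1 - rho) = k with k = 2 * (T - S) / S, so rho = k / (1 + k).
    k = 2.0 * (target_ms - service_ms) / service_ms
    return k / (1.0 + k)


if __name__ == "__main__":
    service = 12.0  # ms per filter call, the article's demo figure
    cap = max_throughput_rps(service)
    rho = utilization_for_target(service, 100.0)
    print(f"one blocking worker saturates at {cap:.1f} req/s")
    print(f"mean latency reaches 100 ms at {rho:.1%} utilization (~{rho * cap:.0f} req/s)")
    for u in (0.5, 0.8, 0.9, 0.95):
        print(f"rho={u:.2f}: mean {mean_response_ms(u, service):.1f} ms")
```

The point of the sketch is the shape of the curve. A 12 ms blocking call looks cheap at low utilization, but as a worker nears saturation (about 83 req/s here), queueing delay rather than the filter itself dominates. Mean latency climbs past 100 ms at roughly 94% utilization.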
