Introduction

We hit a hard scaling wall after shipping a realtime feature tied to our AI agents. Latency spiked, message loss crept in, and ops time ballooned. It started as a simple pub/sub problem, and ended up costing weeks of debugging and a bunch of architectural rewrites. Here is what we learned the hard way, the wrong assumptions we made, and the changes that actually stuck.

The Trigger

Traffic patterns changed: bursts of short-lived connections from a new client, plus background AI agents that produced a steady stream of small events.

Symptoms:

- WebSocket connections dropping intermittently under burst load.
- End-to-end message delivery inconsistency between services.
- Backpressure not propagated, causing memory spikes in a few services.
- Too many homegrown glue scripts to coordinate AI steps.

At first, this looked fine. Our monolith handled modest load. But at 10M events a day, operational complexity became the real bottleneck.…
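
To make the backpressure symptom above concrete, here is a minimal sketch of what "propagating backpressure" can look like between one producer and one slower consumer. It is an illustration only, not the code from our services: the names `producer`, `consumer`, `run_pipeline`, and `max_buffered` are hypothetical, and it assumes Python's `asyncio` with a bounded in-process queue standing in for whatever sits between two services.

```python
import asyncio


async def producer(queue: asyncio.Queue, n_events: int) -> None:
    """Emit n_events small events, then a None sentinel to signal the end."""
    for i in range(n_events):
        # put() suspends while the queue is full, so a slow consumer
        # slows the producer down instead of letting memory grow.
        await queue.put(i)
    await queue.put(None)


async def consumer(queue: asyncio.Queue, processed: list, depths: list) -> None:
    """Drain the queue, recording how deep the buffer was at each step."""
    while True:
        item = await queue.get()
        if item is None:
            break
        depths.append(queue.qsize())
        await asyncio.sleep(0)  # stand-in for real (slower) processing work
        processed.append(item)


async def run_pipeline(n_events: int = 1000, max_buffered: int = 10):
    """Run producer and consumer together; return (events processed, peak depth)."""
    # maxsize is the whole point: an unbounded queue (maxsize=0) would
    # accept every event immediately and buffer the backlog in memory.
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_buffered)
    processed: list = []
    depths: list = []
    await asyncio.gather(
        producer(queue, n_events),
        consumer(queue, processed, depths),
    )
    return len(processed), max(depths)


if __name__ == "__main__":
    count, peak = asyncio.run(run_pipeline())
    print(f"processed={count} peak_buffer_depth={peak}")
```

The design choice to notice is the bound on the queue. With it, the buffer depth can never exceed `max_buffered`, and the cost of a slow consumer shows up as a slower producer rather than as a memory spike. Across a network hop the same idea needs an explicit signal (credits, acks, or a blocking send), which this in-process sketch does not cover.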