Introduction

We built an AI feature that depended on low-latency bi-directional comms: model feedback loops, live agent coordination, and user-facing streaming results over WebSockets. At first it was fast and simple. Then a combination of connection churn, uneven load, and our own optimistic assumptions turned the system into a nightly firefight. Here’s what we learned the hard way and how adding a realtime orchestration layer changed the game.

The Trigger

Latency spikes during peak periods started to cascade. A few symptoms we saw:

- 99th-percentile request times shot up while median stayed fine.
- Messages duplicated or arrived out of order when an upstream retried.
- Our homegrown fanout layer collapsed under connection churn.

The immediate fallout: agents missed context, models processed stale inputs, and customers saw wrong or delayed streaming outputs.

What We Tried (and Why It Failed)

Vertical scaling the fanout service

We beefed up the box running the WebSocket proxy and fanout logic.…