You built an AI voice bot that handles one call perfectly. Then you run a real campaign, and 50 calls come in simultaneously. Contexts bleed between sessions. Your server buckles. The architecture that was fine for demos breaks in production.

Here's the counterintuitive insight that fixes this: a voice bot is just an HTTP server. Once you see it that way, scaling becomes trivial.

Why Concurrent Voice Calls Seem Hard

Each live phone call requires:

- A persistent RTP media stream carrying audio
- Real-time speech-to-text per call
- Text-to-speech generation and delivery per response
- Session state (conversation history, caller context)
- Proper teardown when the call ends

At 100 concurrent calls, you're managing 100 simultaneous audio streams plus 100 STT engines running in parallel. At 1,000, the infrastructure problem completely dominates the AI problem.…
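
To make the session-state and teardown requirements above concrete, here is a minimal sketch of per-call isolation in the request-handler style the "HTTP server" framing suggests. It is an illustration, not the architecture the article goes on to describe: `CallSession`, `SessionStore`, `handle_event`, and the event names `"speech"` and `"hangup"` are all hypothetical, and the speech-to-text, model, and text-to-speech steps are replaced by a placeholder reply.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CallSession:
    """State for exactly one live call (hypothetical structure)."""
    call_id: str
    history: List[str] = field(default_factory=list)


class SessionStore:
    """Maps call IDs to sessions so no two calls share a history."""

    def __init__(self) -> None:
        self._sessions: Dict[str, CallSession] = {}

    def get_or_create(self, call_id: str) -> CallSession:
        # Create a session only the first time this call ID is seen.
        if call_id not in self._sessions:
            self._sessions[call_id] = CallSession(call_id)
        return self._sessions[call_id]

    def end(self, call_id: str) -> None:
        # Teardown: drop the call's state; unknown IDs are ignored.
        self._sessions.pop(call_id, None)

    def active_calls(self) -> int:
        return len(self._sessions)


def handle_event(store: SessionStore, call_id: str, event: str, text: str = "") -> str:
    """Request-style handler: all state is looked up by call_id, none is global."""
    if event == "hangup":
        store.end(call_id)
        return ""
    session = store.get_or_create(call_id)
    session.history.append(text)
    # Placeholder for the STT -> model -> TTS pipeline (assumption):
    # the reply only reports which turn of this call was handled.
    return f"{call_id}: turn {len(session.history)}"


store = SessionStore()
handle_event(store, "call-A", "speech", "hello")
handle_event(store, "call-B", "speech", "hi")
print(handle_event(store, "call-A", "speech", "book a table"))  # call-A: turn 2
handle_event(store, "call-A", "hangup")
print(store.active_calls())  # 1
```

Because the handler keeps nothing between invocations except what it fetches by call ID, one caller's history cannot leak into another's. The in-memory dictionary only works within a single process; a deployment spread across several workers would presumably need a shared store instead.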