This is the “real app” version of the 5-minute quickstart: a polished UI, AudioWorklet mic capture, temporary-token auth, and full barge-in handling. The AssemblyAI Voice Agent API does the speech recognition, the LLM, and the TTS server-side; you’re just shuttling audio bytes.

Why One WebSocket Beats a Multi-Service Pipeline

A traditional browser voice agent needs you to wire up streaming STT, an LLM, and a TTS provider, then orchestrate audio routing between them in the browser. Every hop adds latency, every provider needs a key, and every glue layer adds a failure mode.

| | Multi-service browser pipeline | Voice Agent API |
|---|---|---|
| Services to wire up | STT + LLM + TTS (3+ vendors) | |
| API keys to manage | 3+ | |
| Round trips per turn | 3 (mic→STT→LLM→TTS→speaker) | |
| Browser key exposure | Hard to avoid | |
| Turn detection | Configure separately | |
| Barge-in / interruption | Implement yourself | |
| Tool calling | Wire LLM tools manually | |

The endpoint is one URL: wss://agents.assemblyai.com/v1/ws. Send 24 kHz PCM, get 24 kHz PCM back. That’s it.…
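To make "send 24 kHz PCM, get 24 kHz PCM back" concrete, here is a minimal sketch of the sample conversion a browser client would likely need, since AudioWorklet hands you 32-bit float samples. It assumes the PCM is 16-bit signed little-endian, which this section does not state. The helper names `floatToPcm16`, `pcm16ToFloat`, and `chunkDurationMs` are made up for illustration and are not part of any AssemblyAI SDK.

```typescript
// Sample rate quoted in the text above.
const SAMPLE_RATE_HZ = 24_000;

// ASSUMPTION: the wire format is 16-bit signed PCM. The text only says
// "24 kHz PCM", so check the bit depth against the API reference.
// Int16Array uses platform byte order, which is little-endian on
// effectively all browsers.
function floatToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1] so out-of-range input saturates rather than wraps.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative and positive halves scale differently: int16 spans -32768..32767.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

// Inverse direction: received 16-bit PCM back to floats for Web Audio playback.
function pcm16ToFloat(pcm: Int16Array): Float32Array {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = pcm[i] / 0x8000;
  }
  return out;
}

// Duration of a mono chunk in milliseconds at the 24 kHz rate.
function chunkDurationMs(pcm: Int16Array): number {
  return (pcm.length * 1000) / SAMPLE_RATE_HZ;
}
```

In a browser, the `Int16Array` from `floatToPcm16` (or its underlying `buffer`) is what you would hand to a WebSocket opened on the endpoint above. The message framing the API expects is not described in this section, so it is left out of the sketch.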