Adding voice messages to an AI companion is one of those features that sounds simple until you try to ship it. "Just use a TTS API and send the audio." Sure, in theory. In practice, you are solving latency, character consistency, emotional expressiveness, and cost optimization all at once.

Here is how voice synthesis works in production for AI companions, based on what I have learned building and studying these systems.

The TTS landscape in 2026

The text-to-speech market has fragmented into two tiers.

Tier one: high-quality providers with natural-sounding output. Fish Audio, ElevenLabs, and PlayHT are the leaders here. The voices sound human, and they handle emphasis, pacing, and emotional variation. They cost between $15 and $30 per million characters.

Tier two: cost-efficient providers with acceptable quality. This tier includes Google Cloud TTS, Amazon Polly, and various open-source models (Bark, XTTS). They are cheaper by an order of magnitude, but their output has audible synthesis artifacts.…
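To make the pricing gap concrete, here is a rough monthly cost estimate. This is a back-of-the-envelope sketch, not real provider pricing. The usage figures (1,000 users, five voice messages per user per day, 200 characters per message) are hypothetical. The tier-two rate is simply assumed to be one tenth of tier one, based on the "order of magnitude" comparison above. The function name `monthly_tts_cost` is illustrative.

```python
# Back-of-the-envelope TTS cost estimate. All usage figures are hypothetical.

def monthly_tts_cost(users, messages_per_user_per_day, chars_per_message,
                     usd_per_million_chars, days=30):
    """Estimate monthly synthesis spend in USD, assuming per-character billing."""
    total_chars = users * messages_per_user_per_day * chars_per_message * days
    return total_chars / 1_000_000 * usd_per_million_chars

# Tier-one range quoted in the text: $15-$30 per million characters.
TIER_ONE_LOW, TIER_ONE_HIGH = 15.0, 30.0
# Assumption: tier two is roughly 10x cheaper ("an order of magnitude").
TIER_TWO_LOW, TIER_TWO_HIGH = TIER_ONE_LOW / 10, TIER_ONE_HIGH / 10

if __name__ == "__main__":
    usage = dict(users=1_000, messages_per_user_per_day=5, chars_per_message=200)
    for label, low, high in [("tier one", TIER_ONE_LOW, TIER_ONE_HIGH),
                             ("tier two", TIER_TWO_LOW, TIER_TWO_HIGH)]:
        lo_cost = monthly_tts_cost(usd_per_million_chars=low, **usage)
        hi_cost = monthly_tts_cost(usd_per_million_chars=high, **usage)
        print(f"{label}: ${lo_cost:,.2f} to ${hi_cost:,.2f} per month")
```

With these made-up numbers, the app synthesizes 30 million characters a month, which works out to roughly $450 to $900 on tier one versus $45 to $90 on the assumed tier-two rate. Because spend scales linearly with characters synthesized, message length and message frequency are the main levers on cost.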