I got tired of seeing small businesses miss calls, so I built Vokio: a voice AI agent that answers real phone calls, remembers callers between sessions, and generates a post-call summary automatically. Here's the full architecture and how I solved the hard parts.

Stack

- Python + Flask (webhook server)
- Vapi (telephony + STT)
- Claude Haiku (conversation)
- Deepgram Nova 3 (Spanish STT)
- Azure TTS (Spanish voices)
- SQLite (memory between calls)

How it works

Vapi handles the phone call and speech recognition. Instead of using a built-in LLM, I configured Vapi to use a Custom LLM pointing to my Flask server. Every time the caller says something, Vapi sends a POST request to my endpoint and expects an OpenAI-compatible streaming response.

```python
@app.route("/chat/completions", methods=["POST"])
def chat():
    data = request.…
```
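
To make "OpenAI-compatible streaming response" concrete, here is a minimal sketch of the wire format, not the actual Vokio handler. It assumes the request body carries an OpenAI-style `messages` list, and it uses a hypothetical `fake_reply` generator in place of the Claude Haiku call. The `id` and `model` values are placeholders, and the helper names (`sse_chunks`, `last_user_message`) are made up for illustration.

```python
import json
import time
from typing import Iterable, Iterator


def last_user_message(payload: dict) -> str:
    """Return the most recent user turn from an OpenAI-style `messages` list.

    Assumption: the incoming POST body follows the OpenAI chat format.
    """
    for message in reversed(payload.get("messages", [])):
        if message.get("role") == "user":
            return message.get("content", "")
    return ""


def fake_reply(user_text: str) -> Iterator[str]:
    """Hypothetical stand-in for the LLM call: echoes the text word by word."""
    for word in f"You said: {user_text}".split():
        yield word + " "


def sse_chunks(tokens: Iterable[str], model: str = "placeholder-model") -> Iterator[str]:
    """Yield Server-Sent Events shaped like OpenAI `chat.completion.chunk` objects."""
    base = {
        "id": "chatcmpl-sketch",  # placeholder id, not a real completion id
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
    }
    for token in tokens:
        choice = {"index": 0, "delta": {"content": token}, "finish_reason": None}
        yield f"data: {json.dumps(dict(base, choices=[choice]))}\n\n"
    # Final chunk: empty delta with a finish reason, then the [DONE] sentinel.
    final = {"index": 0, "delta": {}, "finish_reason": "stop"}
    yield f"data: {json.dumps(dict(base, choices=[final]))}\n\n"
    yield "data: [DONE]\n\n"
```

Inside a Flask route, a generator like this could be returned as `Response(sse_chunks(fake_reply(text)), mimetype="text/event-stream")`, so each piece of the reply goes out as soon as it is produced instead of after the full reply is ready.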