Two-tower models offer sub-10ms inference latency, while vector DB + LLM pipelines provide richer semantics and stronger cold-start handling. Hybrid architectures reduce churn by 15-20%.

Two-tower models and vector DB + LLM architectures are competing paradigms for personalized recommendation at scale. The choice between them hinges on latency budgets, cold-start handling, and how much semantic depth the product requires.

Key facts
- Two-tower models achieve sub-10ms inference for millions of users.
- LLM re-ranking adds 100-500ms per query.
- Hybrid architectures reduce churn by 15-20% compared with pure systems.
- Vector DB + LLM excels at cold-start for new items.
- Pinterest and Netflix use hybrid two-tower + LLM deployments.

Recommender systems at scale face a fundamental trade-off: throughput versus semantic richness. Two-tower models, popularized by Google's 2019 YouTube recommendation paper, embed users and items into a shared latent space via dual neural networks.…
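To make the dual-network idea concrete, here is a minimal stdlib-only sketch. It assumes each tower is a single random linear layer followed by L2 normalization; real towers are trained multi-layer networks. The names `make_tower` and `top_k`, the feature sizes, and the 8-dimensional embedding are hypothetical choices for illustration. What the sketch shows is structural: the user and item towers take different inputs but emit vectors of the same dimension, so relevance reduces to a dot product in the shared latent space.

```python
import math
import random
from typing import Callable, Dict, List

Vector = List[float]

def make_tower(in_dim: int, out_dim: int, seed: int) -> Callable[[Vector], Vector]:
    """Hypothetical tower: one random linear layer plus L2 normalization.
    Real towers are trained multi-layer networks; this only shows the data flow."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def tower(features: Vector) -> Vector:
        out = [sum(w * x for w, x in zip(row, features)) for row in weights]
        norm = math.sqrt(sum(v * v for v in out)) or 1.0  # guard against an all-zero output
        return [v / norm for v in out]

    return tower

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def top_k(user_vec: Vector, item_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Brute-force retrieval: score every item against the user embedding, keep the k best."""
    ranked = sorted(item_vecs, key=lambda item_id: dot(user_vec, item_vecs[item_id]), reverse=True)
    return ranked[:k]

# Different input sizes, same output size: both towers map into one shared 8-d latent space.
user_tower = make_tower(in_dim=4, out_dim=8, seed=0)
item_tower = make_tower(in_dim=6, out_dim=8, seed=1)

# Item embeddings depend only on item features, so they can be precomputed offline.
rng = random.Random(42)
catalog = {f"item{i}": item_tower([rng.random() for _ in range(6)]) for i in range(20)}

# At request time only the user tower runs, followed by the lookup.
user_vec = user_tower([0.5, 0.1, 0.9, 0.3])
print(top_k(user_vec, catalog, k=5))
```

Because item embeddings do not depend on the user, they can be computed ahead of time and indexed; at request time only the user tower runs, which is what keeps this design cheap per query. The sketch scores every item by brute force; production systems typically replace that loop with an approximate nearest-neighbor index in a vector store.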
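The latency figures in the key facts suggest how a hybrid system is usually arranged: a fast retrieval stage produces an ordered candidate list, and the slow LLM stage re-scores only a small head of that list. The sketch below assumes a retrieve-then-rerank pipeline with a simple additive latency budget. `LatencyBudget`, `hybrid_recommend`, `k_rerank`, and the `rerank_score` callable (a stand-in for an LLM relevance scorer) are hypothetical names, and the budget values in the example are illustrative, not measurements.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class LatencyBudget:
    retrieval_ms: float  # two-tower retrieval stage (sub-10ms per the key facts)
    rerank_ms: float     # LLM re-ranking stage (100-500ms per query per the key facts)
    limit_ms: float      # hypothetical end-to-end budget for the serving surface

    def fits(self) -> bool:
        # Assumes the two stages run sequentially, so their latencies add.
        return self.retrieval_ms + self.rerank_ms <= self.limit_ms

def hybrid_recommend(
    retrieved: Sequence[str],
    rerank_score: Callable[[str], float],
    k_rerank: int,
    budget: LatencyBudget,
) -> List[str]:
    """Re-rank only the first k_rerank retrieved candidates with the expensive scorer.
    If the LLM stage would blow the budget, keep the two-tower ordering unchanged."""
    if not budget.fits():
        return list(retrieved)
    head = sorted(retrieved[:k_rerank], key=rerank_score, reverse=True)
    return head + list(retrieved[k_rerank:])

# Hypothetical LLM scores for five candidates already ordered by the two-tower stage.
retrieved = ["a", "b", "c", "d", "e"]
llm_scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 1.0, "e": 0.0}

relaxed = LatencyBudget(retrieval_ms=8, rerank_ms=300, limit_ms=400)
print(hybrid_recommend(retrieved, llm_scores.__getitem__, k_rerank=3, budget=relaxed))
```

With the quoted figures, a sub-10ms retrieval stage plus a 100-500ms re-rank puts the hybrid path at roughly 110-510ms end to end, which is why the expensive stage is confined to a short candidate list and given a fallback. Note that "d" keeps its retrieval position even though its LLM score is highest: the re-ranker can only reorder what falls inside `k_rerank`.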