NVIDIA Groq 3 LPX is a new rack-scale inference accelerator for the NVIDIA Vera Rubin platform , designed for the low-latency and large-context demands of agentic systems. Co-designed with the NVIDIA Vera Rubin NVL72 , LPX equips the AI factory with an engine optimized for fast, predictable token generation, while Vera Rubin NVL72 remains the flexible, general-purpose workhorse for training and inference, delivering high throughput across prefill and decode, including long-context processing, decode attention, and high-concurrency serving at scale. This combination matters because the agentic future demands a new category of inference. As generation speeds approach 1,000 tokens per second per user, models move beyond conversation-speed interaction toward speed of thought computing. At that rate, AI systems can reason, simulate, and respond continuously, enabling experiences that feel less like turn-based chat and more like real-time collaboration. This shift also raises the ceiling for multi-agent systems.…