At the Google Cloud Next conference, Google and NVIDIA outlined a hardware roadmap designed to reduce the cost of AI inference at scale. The companies detailed the new A5X bare-metal instances, which run on NVIDIA Vera Rubin NVL72 rack-scale systems. Through hardware and software codesign, the architecture aims to deliver up to ten times lower inference cost per token compared with previous generations, while also achieving ten times higher token throughput per megawatt.

Connecting thousands of processors requires massive bandwidth to avoid processing delays. The A5X instances address this challenge by pairing NVIDIA ConnectX-9 SuperNICs with Google Virgo networking technology. This configuration scales to 80,000 NVIDIA Rubin GPUs within a single site cluster, and up to 960,000 GPUs across a multisite deployment.…
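
To put the scale figures above in perspective, here is a back-of-the-envelope sketch in Python. It rests on two assumptions that the article does not state: that each NVL72 rack holds 72 Rubin GPUs (inferred from the product name), and that a multisite deployment is built from full-size 80,000-GPU sites. The function names are illustrative only.

```python
# Rough scale arithmetic for the cluster sizes quoted in the article.
GPUS_PER_RACK = 72          # ASSUMPTION: inferred from the "NVL72" name
GPUS_PER_SITE = 80_000      # from the article: single-site cluster limit
GPUS_MULTISITE = 960_000    # from the article: multisite deployment limit


def ceil_div(numerator: int, denominator: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-numerator // denominator)


def racks_needed(gpus: int, gpus_per_rack: int = GPUS_PER_RACK) -> int:
    """Minimum number of racks needed to host the given GPU count."""
    return ceil_div(gpus, gpus_per_rack)


def sites_needed(total_gpus: int, gpus_per_site: int = GPUS_PER_SITE) -> int:
    """Minimum number of full-size sites needed for the given GPU count."""
    return ceil_div(total_gpus, gpus_per_site)


if __name__ == "__main__":
    print("racks per site:", racks_needed(GPUS_PER_SITE))
    print("sites in max deployment:", sites_needed(GPUS_MULTISITE))
    print("racks in max deployment:", racks_needed(GPUS_MULTISITE))
```

Under these assumptions, a single site at its limit would span a little over 1,100 racks, and the multisite maximum corresponds to twelve such sites. The real rack and site counts depend on configuration details the article does not give.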