NVIDIA and Google infrastructure cuts AI inference costs


AI News·Ryan Daws·about 1 month ago




At the Google Cloud Next conference, Google and NVIDIA outlined a hardware roadmap designed to address the cost of AI inference at scale. The companies detailed the new A5X bare-metal instances, which run on NVIDIA Vera Rubin NVL72 rack-scale systems. Through hardware and software codesign, the architecture aims to deliver up to ten times lower inference cost per token compared to previous generations, along with ten times higher token throughput per megawatt. Connecting thousands of processors requires massive network bandwidth to prevent processing delays. The A5X instances address this challenge by pairing NVIDIA ConnectX-9 SuperNICs with Google Virgo networking technology. This configuration scales to 80,000 NVIDIA Rubin GPUs within a single-site cluster, and up to 960,000 GPUs across a multisite deployment.…
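To put the quoted figures in proportion, the arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope illustration only. It assumes the "NVL72" name means 72 GPUs per rack, treats the "up to ten times" figure as a best case, and uses a made-up baseline price, none of which comes from the announcement itself.

```python
import math

# Figures quoted in the article
GPUS_PER_SITE = 80_000      # single-site cluster
GPUS_MULTISITE = 960_000    # multisite deployment
MAX_COST_REDUCTION = 10     # "up to ten times lower" cost per token

# Assumption: "NVL72" is read as 72 GPUs per rack-scale system
GPUS_PER_RACK = 72


def racks_needed(gpus: int, gpus_per_rack: int = GPUS_PER_RACK) -> int:
    """Whole racks required to host the given number of GPUs."""
    return math.ceil(gpus / gpus_per_rack)


def implied_sites(total: int = GPUS_MULTISITE, per_site: int = GPUS_PER_SITE) -> int:
    """Number of full-size sites implied by the multisite figure."""
    return total // per_site


def best_case_cost(baseline_cost: float, factor: int = MAX_COST_REDUCTION) -> float:
    """Cost per token if the full 'up to' reduction were achieved."""
    return baseline_cost / factor


if __name__ == "__main__":
    print("Racks per site:", racks_needed(GPUS_PER_SITE))
    print("Implied sites:", implied_sites())
    # Hypothetical baseline of 2.00 (any currency) per million tokens
    print("Best-case cost per million tokens:", best_case_cost(2.00))
```

Under these assumptions, the multisite figure corresponds to twelve clusters of the maximum single-site size, and a single site works out to a little over 1,100 racks.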
