NVIDIA Platform Delivers Lowest Token Cost Enabled by Extreme Co-Design

NVIDIA Technical Blog·Ashraf Eassa·about 1 month ago

#datacentercloud #hardwaresemiconductor #dynamo #infiniband #nvidia

Co-designed hardware, software, and models are key to delivering the highest AI factory throughput and lowest token cost. Measuring this goes far beyond peak chip specifications. Rigorous AI inference performance benchmarks are critical to understanding real-world token output, which drives AI factory revenue. MLPerf Inference v6.0 is the latest in a series of industry benchmarks that measure performance across a wide range of model architectures and use cases. In this latest round, systems powered by NVIDIA Blackwell Ultra GPUs delivered the highest throughput across the widest range of models and scenarios. This brings the cumulative number of NVIDIA MLPerf training and inference wins since 2018 to 291, which is 9x that of all other submitters combined. In this round, the NVIDIA partner ecosystem participated broadly, with 14 partners, the largest number of partners submitting on any platform.…
