Accelerating Data Processing with NVIDIA Multi-Instance GPU and Locality Domains


NVIDIA Technical Blog·Mukul Joshi·about 1 month ago




NVIDIA flagship data center GPUs in the NVIDIA Ampere, NVIDIA Hopper, and NVIDIA Blackwell families all exhibit non-uniform memory access (NUMA) behavior but expose a single memory space. Most programs therefore have no issue with memory non-uniformity. However, as bandwidth increases in newer GPU generations, taking compute and data locality into account can yield significant performance and power gains. This post first analyzes the memory hierarchy of NVIDIA GPUs and discusses the power and performance impact of transferring data over the die-to-die link. It then reviews how to use NVIDIA Multi-Instance GPU (MIG) mode to achieve data localization. Finally, it presents results comparing MIG mode with unlocalized execution for the Wilson-Dslash stencil operator use case. Note: The techniques described in this post are exploratory, and the field is evolving quickly. New developments may supersede what is described here.…
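As a rough illustration of what localizing work to MIG instances can look like from the host side, the sketch below builds one process environment per MIG instance by setting `CUDA_VISIBLE_DEVICES` to that instance's UUID, so each worker process sees only its own slice of the GPU. This is not taken from the post and may differ from the approach it describes. The helper name `mig_worker_envs` and the UUID strings are hypothetical placeholders; real MIG UUIDs are listed by `nvidia-smi -L` on a system with MIG enabled.

```python
import os


def mig_worker_envs(mig_uuids, base_env=None):
    """Return one environment dict per MIG instance.

    Each dict is a copy of the base environment with CUDA_VISIBLE_DEVICES
    set to a single MIG UUID, so a process launched with it can only see
    that one instance. (Hypothetical helper, for illustration only.)
    """
    base = dict(os.environ if base_env is None else base_env)
    envs = []
    for uuid in mig_uuids:
        env = dict(base)                      # do not mutate the shared base
        env["CUDA_VISIBLE_DEVICES"] = uuid    # restrict visibility to one instance
        envs.append(env)
    return envs


# Placeholder UUIDs (assumptions); substitute the values reported by `nvidia-smi -L`.
example_uuids = [
    "MIG-00000000-0000-0000-0000-000000000000",
    "MIG-11111111-1111-1111-1111-111111111111",
]
envs = mig_worker_envs(example_uuids, base_env={})
```

Each resulting dict could then be passed as the `env` argument of `subprocess.Popen` to start one worker per instance. How the data is partitioned across those workers is a separate design decision that this sketch does not address.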
