Speeding Up Variable-Length Training with Dynamic Context Parallelism and NVIDIA Megatron Core

NVIDIA Technical Blog·Kunlun Li·about 1 month ago

This post introduces Dynamic Context Parallelism (Dynamic-CP), a scheduling approach in NVIDIA Megatron Core for LLM post-training or DiT pre-training. It selects the context-parallel (CP) size dynamically for each microbatch so that variable-length sequences are handled efficiently, achieving up to a 1.48x speedup on real-world datasets.

In large-scale model training, an often-overlooked bottleneck is the variability of sequence lengths in real-world datasets. Both LLM training and large-scale video generation show clear long-tail distributions in sequence length: a small fraction of ultra-long samples accounts for a disproportionately large share of the computational workload and memory consumption. In LLM training, this leads to widely varying text sequence lengths across batches. In video generation, high-resolution, multi-second videos can span tens of thousands of tokens.…
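As a rough illustration of why choosing the CP size per microbatch can help with a long-tail length distribution, the sketch below makes two simplifying assumptions. First, attention cost grows with the square of sequence length, ignoring linear terms. Second, it uses a hypothetical heuristic that picks the smallest power-of-two CP size whose per-rank shard fits a fixed token budget. The names `choose_cp_size`, `attention_cost_share`, `max_tokens_per_rank`, and `max_cp` are illustrative only and are not Megatron Core APIs. The actual Dynamic-CP scheduler may use a different cost model and selection policy.

```python
from typing import List


def choose_cp_size(seq_len: int, max_tokens_per_rank: int, max_cp: int) -> int:
    """Hypothetical heuristic: return the smallest power-of-two CP size whose
    per-rank shard (ceil(seq_len / cp) tokens) fits the token budget.
    The result is capped at the largest power of two not exceeding max_cp."""
    cp = 1
    # -(-a // b) is integer ceiling division.
    while cp * 2 <= max_cp and -(-seq_len // cp) > max_tokens_per_rank:
        cp *= 2
    return cp


def attention_cost_share(seq_lens: List[int]) -> List[float]:
    """Fraction of total attention cost per sample, assuming cost ~ L^2.
    This is a simplification that ignores linear (MLP/projection) terms."""
    costs = [n * n for n in seq_lens]
    total = sum(costs)
    return [c / total for c in costs]


if __name__ == "__main__":
    # Toy long-tail set: three short samples and one long one (made-up lengths).
    seq_lens = [1024, 1024, 1024, 32768]
    shares = attention_cost_share(seq_lens)
    for n, share in zip(seq_lens, shares):
        cp = choose_cp_size(n, max_tokens_per_rank=8192, max_cp=8)
        print(f"len={n:>6}  cost share={share:.3f}  cp_size={cp}")
```

With these toy lengths, the single 32K-token sample carries about 99.7% of the modeled attention cost and is assigned CP size 4. The short samples keep CP size 1 rather than being sharded across a group sized for the longest sequence.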
