Menu

Post image 1
Post image 2
1 / 2
0

Architecture Teardown: How Meta Trains LLMs for Code Generation on 100k GPU Clusters

DEV Community·ANKUSH CHOUDHARY JOHAL·about 1 month ago
#lhXiQyzF
#code#how#architecture#teardown#training#meta
Reading 0:00
15s threshold

In Q3 2024, Meta trained a 70B parameter code-specialized LLM on 100,000 Nvidia H100 GPUs, achieving 214 TFLOPS per GPU and 92% cluster utilization – a 3x improvement over their 2023 16k A100 cluster runs, with total training cost of $17.4M for 21 days of continuous operation. 📡 Hacker News Top Stories Right Now Ghostty is leaving GitHub (2250 points) Bugs Rust won't catch (158 points) How ChatGPT serves ads (265 points) Before GitHub (389 points) Show HN: Auto-Architecture: Karpathy's Loop, pointed at a CPU (88 points) Key Insights Meta’s custom collective communication library (C3) achieves 98.7% bandwidth utilization on 100k H100 nodes, vs 89% for NCCL 2.18.3 PyTorch 2.3.0 with FSDP2 and custom activation checkpointing reduces memory footprint by 41% vs vanilla FSDP for 70B code models Total training cost for 70B model on 100k GPUs is $17.4M, 22% cheaper than equivalent 16k A100 cluster runs when accounting for H100’s 3.2x throughput By 2026, Meta plans to deploy 250k GPU clusters with custom silicon,…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Read More