LukeW | Rethinking Networking for the AI/ML Era

www.lukew.com·Luke Wroblewski·about 1 month ago

In her AI Speaker Series presentation at Sutter Hill Ventures, Google Distinguished Engineer Nandita Dukkipati explained how AI/ML workloads have completely broken traditional networking. Here are my notes from her talk:

AI broke our networking assumptions. Traditional networking expected some latency variance and occasional failures. AI workloads demand perfection: high bandwidth, ultra-low jitter (tens of microseconds), and near-flawless reliability. One slow node kills the entire training job.

Why AI is different: These workloads use bulk synchronous parallel computing. Every node waits at a barrier until all nodes complete their step, so the slowest worker determines overall speed. There is no "good enough" when 99 of 100 nodes finish fast.

Real example: Gemini traffic shows bursts of hundreds of milliseconds at line rate, but average utilization is 5x below peak. The bursts are synchronized, so there are no statistical multiplexing benefits. The traffic is both latency sensitive and bandwidth intensive.…
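
To make the barrier point concrete, here is a minimal sketch of why the slowest worker sets the pace in a bulk synchronous parallel step. The function names and the timing values (99 workers at 10 ms, one straggler at 50 ms) are made up for illustration and are not figures from the talk.

```python
def step_time(worker_times_ms):
    """Duration of one synchronous step.

    Every worker waits at the barrier, so the step takes as long
    as the slowest worker, not the average one.
    """
    return max(worker_times_ms)


def job_time(steps):
    """Total time for a job made of consecutive barrier-synchronized steps."""
    return sum(step_time(times) for times in steps)


# Hypothetical numbers: 99 workers finish in 10 ms, one straggler takes 50 ms.
times = [10.0] * 99 + [50.0]

mean_ms = sum(times) / len(times)   # what an "average" view would suggest
actual_ms = step_time(times)        # what the training job actually sees

print(f"mean worker time: {mean_ms} ms")
print(f"barrier step time: {actual_ms} ms")
print(f"slowdown vs. mean: {actual_ms / mean_ms:.2f}x")
```

With these assumed values the mean worker time is about 10 ms, yet every step costs 50 ms, which is why tail latency matters more than average latency for this kind of workload.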
