DeepSeek-V4: Finally, a Context Window Built for Agents

DEV Community·Aamer Mihaysi·19 days ago

Most long-context models are benchmarks in search of a use case. DeepSeek-V4 is different. It is built for the one workload that actually needs a million tokens: agents running long-horizon tasks.

The specs are straightforward. Two MoE checkpoints: V4-Pro at 1.6T total parameters with 49B active, and V4-Flash at 284B total with 13B active. Both ship with a 1M-token context window.

But the headline is not the window size. It is what happens to inference cost as you use it. At 1M tokens, V4-Pro requires 27% of the single-token FLOPs of V3.2, and its KV cache uses 10% of the memory. V4-Flash drops further: 10% of the FLOPs and 7% of the KV cache. Against a standard grouped-query attention baseline, V4 uses roughly 2% of the cache size.

These are not incremental gains. They are the difference between a demo and a production deployment.

Hybrid Attention

The architecture splits attention into two mechanisms that alternate across layers.…
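To put the cost ratios quoted earlier in concrete terms, here is a minimal sketch of the arithmetic. The ratios and parameter counts come from the figures above; the function names and the baseline numbers in the usage note are hypothetical, and the V4-Flash ratios are read as also being relative to V3.2, which is an assumption.

```python
# Ratios quoted in the article for a 1M-token context.
# Assumption: both models' figures are relative to V3.2.
RELATIVE_COST = {
    "V4-Pro": {"flops": 0.27, "kv_cache": 0.10},
    "V4-Flash": {"flops": 0.10, "kv_cache": 0.07},
}

# Parameter counts quoted in the article (total, active), in billions.
PARAMS_B = {
    "V4-Pro": (1600.0, 49.0),
    "V4-Flash": (284.0, 13.0),
}


def scaled_cost(model, baseline_flops, baseline_kv_gb):
    """Scale a caller-supplied V3.2 baseline by the quoted ratios.

    The baseline values are hypothetical inputs, not measured numbers.
    """
    ratio = RELATIVE_COST[model]
    return {
        "flops": baseline_flops * ratio["flops"],
        "kv_cache_gb": baseline_kv_gb * ratio["kv_cache"],
    }


def savings_factor(ratio):
    """Turn a 'uses X of the baseline' ratio into an 'N times smaller' factor."""
    return 1.0 / ratio


def active_fraction(model):
    """Share of total parameters that are active per token."""
    total, active = PARAMS_B[model]
    return active / total
```

For example, with a made-up baseline of 100 GB of KV cache, `scaled_cost("V4-Pro", 1.0, 100.0)` gives a cache of about 10 GB, and `savings_factor(0.07)` shows that V4-Flash's cache is roughly 14 times smaller than the baseline. `active_fraction("V4-Pro")` comes out at about 3% of total parameters active per token.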
