
DeepSeek-V4: Finally, a Context Window Built for Agents

DEV Community · Aamer Mihaysi · 19 days ago

Most long-context models are benchmarks in search of a use case. DeepSeek-V4 is different. It is built for the one workload that actually needs a million tokens: agents running long-horizon tasks.

The specs are straightforward. Two MoE checkpoints: V4-Pro at 1.6T total parameters with 49B active, and V4-Flash at 284B total with 13B active. Both ship with a 1M-token context window.

But the headline is not the window size. It is what happens to inference cost as you use it. At 1M tokens, V4-Pro requires 27% of the single-token FLOPs that V3.2 does, and its KV cache uses 10% of the memory. V4-Flash drops further: 10% of the FLOPs and 7% of the KV cache. Against a standard grouped-query attention baseline, V4 uses roughly 2% of the cache size.

These are not incremental gains. They are the difference between a demo and a production deployment.

Hybrid Attention

The architecture splits attention into two mechanisms that alternate across layers.…
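
To put the cache figures quoted earlier in concrete terms, here is a rough back-of-the-envelope sketch. It computes the KV cache size of a standard grouped-query attention model at 1M tokens and then applies the article's "roughly 2%" ratio. The layer count, KV-head count, head dimension, and 2-byte element size are hypothetical placeholders chosen for illustration, not DeepSeek's actual configuration, and the function name is invented for this example.

```python
def gqa_kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """KV cache size for standard grouped-query attention.

    Each layer stores one key and one value vector (hence the factor 2)
    per KV head for every token in the context.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


# Hypothetical baseline shape (assumed for illustration only).
baseline_bytes = gqa_kv_cache_bytes(
    seq_len=1_000_000,  # the 1M-token window from the article
    n_layers=60,        # assumption
    n_kv_heads=8,       # assumption
    head_dim=128,       # assumption
)

# Ratio quoted in the article: V4 uses roughly 2% of a GQA baseline's cache.
V4_VS_GQA_RATIO = 0.02
v4_estimate_bytes = baseline_bytes * V4_VS_GQA_RATIO

print(f"GQA baseline: {baseline_bytes / 1e9:.1f} GB per sequence")
print(f"At ~2% of that: {v4_estimate_bytes / 1e9:.1f} GB per sequence")
```

Under these assumed dimensions the baseline cache is in the hundreds of gigabytes for a single 1M-token sequence, while 2% of it fits comfortably on one accelerator. That gap is the sense in which the ratios separate a demo from a deployment.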
