The Architecture that scales DeepSeek V4 to 1M token context

Reddit r/learnmachinelearning·u/AvvYaa·about 1 month ago

#attention #deepseek #compressed #sparse #architecture #photo

A visual explanation of DeepSeek V4, covering Compressed Sparse Attention (CSA), Heavily Compressed Attention (HCA), Sliding Window Attention (SWA), DeepSeek Sparse Attention (DSA), and more.
