DeepSeek V4 KV Cache Design: How 1M Tokens Fit in 10 GiB
DeepSeek V4 supports a 1M-token context, yet the KV cache for its 61-layer model fits in roughly 9.6 GiB in BF16, a 6.3× reduction over a naive full-attention cache. This post breaks down how three orthogonal techniques combine to make that possible. ...
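Before diving into the techniques, the raw arithmetic is easy to sanity-check. Below is a minimal sizing helper; the head count, head dimension, and latent width are assumptions borrowed from DeepSeek-V3-class models for illustration, not figures from this post:

```python
# Back-of-envelope KV-cache sizing for a dense (non-sparse) cache.
# All model dimensions below are assumptions, not figures from the post.

def kv_cache_gib(tokens: int, layers: int, values_per_token_per_layer: int,
                 bytes_per_value: int = 2) -> float:
    """Total KV-cache size in GiB: tokens x layers x cached width x dtype size."""
    total_bytes = tokens * layers * values_per_token_per_layer * bytes_per_value
    return total_bytes / 2**30

# Naive multi-head attention caches K and V for every head:
# n_heads * head_dim * 2 values per token per layer.
# Assuming 128 heads x 128 dims (DeepSeek-V3-like) in BF16:
naive = kv_cache_gib(1_000_000, 61, 128 * 128 * 2)  # ~3723 GiB

# An MLA-style compressed cache stores one small latent per token per
# layer (DeepSeek V3 caches 512 latent dims + 64 RoPE dims = 576):
mla = kv_cache_gib(1_000_000, 61, 576)              # ~65 GiB
```

Note that under these assumed dimensions, latent compression alone still leaves the cache well above 10 GiB at 1M tokens, which is why additional, orthogonal savings (for example, reducing how many tokens or layers retain full-context entries) are needed to reach the figure quoted above.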