DeepSeek V4 KV Cache Design: How 1M Tokens Fit in 10 GiB
DeepSeek V4 supports a 1M-token context, yet the KV cache for its 61-layer model fits in roughly 9.6 GiB in BF16, a 6.3× reduction over a naive full-attention cache. This post breaks down how three orthogonal techniques combine to make that possible. ...
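Before diving into the techniques, the raw arithmetic is easy to sanity-check. Below is a minimal sizing helper; the head count, head dimension, and latent width are assumptions borrowed from DeepSeek-V3-class models for illustration, not figures from this post:

```python
# Back-of-envelope KV-cache sizing for a dense (non-sparse) cache.
# All model dimensions below are assumptions, not figures from the post.

def kv_cache_gib(tokens: int, layers: int, values_per_token_per_layer: int,
                 bytes_per_value: int = 2) -> float:
    """Total KV-cache size in GiB: tokens x layers x cached width x dtype size."""
    total_bytes = tokens * layers * values_per_token_per_layer * bytes_per_value
    return total_bytes / 2**30

# Naive multi-head attention caches K and V for every head:
# n_heads * head_dim * 2 values per token per layer.
# Assuming 128 heads x 128 dims (DeepSeek-V3-like) in BF16:
naive = kv_cache_gib(1_000_000, 61, 128 * 128 * 2)  # ~3723 GiB

# An MLA-style compressed cache stores one small latent per token per
# layer (DeepSeek V3 caches 512 latent dims + 64 RoPE dims = 576):
mla = kv_cache_gib(1_000_000, 61, 576)              # ~65 GiB
```

Note that under these assumed dimensions, latent compression alone still leaves the cache well above 10 GiB at 1M tokens, which is why additional, orthogonal savings (for example, reducing how many tokens or layers retain full-context entries) are needed to reach the figure quoted above.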