Hi there 👋

Welcome to Yi Liu’s blog, where I share my thoughts on technology, programming, and more.

DeepSeek V4 KV Cache Design: How 1M Tokens Fit in 10 GiB

DeepSeek V4 supports 1M-token context, yet its KV cache for a 61-layer model fits in ~9.6 GiB (BF16) — a 6.3× reduction over naive full attention. This post breaks down how three orthogonal techniques combine to make that possible. ...

April 26, 2026 · 6 min · 1148 words

Debugging Transformers Upgrade with torch DebugMode: v4.57.3 → v5.0

January 20, 2026 · 38 min · 8010 words