Hi there 👋
Welcome to Yi Liu’s blog, where I share my thoughts on technology, programming, and more.
DeepSeek V4 supports 1M-token context, yet its KV cache for a 61-layer model fits in ~9.6 GiB (BF16) — a 6.3× reduction over naive full attention. This post breaks down how three orthogonal techniques combine to make that possible. ...
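To give a feel for where such memory figures come from, here is a minimal sketch of the naive full-attention KV cache arithmetic: every layer stores a K and a V tensor of shape (seq_len × kv_heads × head_dim). The dimensions below are hypothetical placeholders, not DeepSeek's actual configuration, and the sketch deliberately omits the cache-compression techniques the post discusses.

```python
def kv_cache_bytes(seq_len: int, n_layers: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Naive KV cache size: K and V each hold seq_len x n_kv_heads x head_dim
    values per layer, at bytes_per_elem bytes each (2 for BF16)."""
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_elem

# Hypothetical dimensions for illustration only.
naive = kv_cache_bytes(seq_len=1_000_000, n_layers=61,
                       n_kv_heads=8, head_dim=128)
print(f"{naive / 2**30:.1f} GiB")
```

Plugging in a model's real head count, head dimension, and any latent-compression factor into the same formula is how a headline number like "~9.6 GiB for 1M tokens" can be checked.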