TV Layout Formula — Interactive Visualizer

mma.sync.m16n8k16 — TV_layout_C: shape ((4,8),(2,2)) : stride ((16,1),(8,64))
col = tid % 8
row = (tid / 8) * 2 + (vid % 2) + (vid / 2) * 8
T_inner (tid%8) = 0,   T_outer (tid/8) = 0
V_inner (vid%2) = 0,   V_outer (vid/2) = 0

Offset Computation

offset = (tid%8)*1 + (tid/8)*16 + (vid%2)*8 + (vid/2)*64
       = 0 + 0 + 0 + 0
       = 0
col = 0,   row = 0   → C[0, 0]
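
The offset sum above can be sketched in a few lines of Python. This is a minimal illustration, not the visualizer's own code: the function names `tv_offset` and `offset_to_coord` are hypothetical, `/` in the page's formulas is taken to be integer division, and the decoding step assumes offset = row*8 + col, which is what the page's col and row formulas imply for an 8-column matrix.

```python
def tv_offset(tid: int, vid: int) -> int:
    """Offset into C for thread `tid` (0..31) and value `vid` (0..3)."""
    t_inner, t_outer = tid % 8, tid // 8   # thread sub-coordinates
    v_inner, v_outer = vid % 2, vid // 2   # value sub-coordinates
    # Strides as listed on the page: 1, 16, 8, 64.
    return t_inner * 1 + t_outer * 16 + v_inner * 8 + v_outer * 64


def offset_to_coord(offset: int) -> tuple[int, int]:
    """Decode an offset into (row, col), assuming offset = row*8 + col."""
    return offset // 8, offset % 8


print(tv_offset(0, 0), offset_to_coord(tv_offset(0, 0)))  # 0 (0, 0)
```

For example, tid = 9 and vid = 3 give 1 + 16 + 8 + 64 = 89, which decodes to row 11, col 1, the same result as evaluating the col and row formulas directly.
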
Highlight legend: current (tid, vid); same thread (all 4 values); same value index (all 32 threads)

16×8 Output Matrix C
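
A text version of the ownership grid can be rebuilt from the col and row formulas. The sketch below is illustrative only: `tv_coord` and `owner` are hypothetical names, and the formulas are taken exactly as stated on this page rather than checked against any hardware documentation. It also confirms that the 32 × 4 (tid, vid) pairs cover all 128 elements with no overlap.

```python
ROWS, COLS = 16, 8


def tv_coord(tid, vid):
    """(row, col) of the element held by thread `tid` in value slot `vid`."""
    col = tid % 8
    row = (tid // 8) * 2 + (vid % 2) + (vid // 2) * 8
    return row, col


# owner[row][col] records which (tid, vid) pair holds that element of C.
owner = [[None] * COLS for _ in range(ROWS)]
for tid in range(32):
    for vid in range(4):
        row, col = tv_coord(tid, vid)
        assert owner[row][col] is None  # no element is claimed twice
        owner[row][col] = (tid, vid)

# One line per matrix row, one "TxxvY" label per column.
for cells in owner:
    print(" ".join(f"T{tid:02d}v{vid}" for tid, vid in cells))
```

Each printed cell reads as thread and value index, so the first row starts with `T00v0 T01v0` and the last row ends with `T31v3`.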

Thread 0's Registers

C[0, 0]
C[1, 0]
C[8, 0]
C[9, 0]
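
The register list above can be reproduced for any thread with a short helper. This is a sketch under the page's own formulas; `thread_registers` is a hypothetical name, not part of any library.

```python
def thread_registers(tid):
    """Return the (row, col) of C held in each of thread `tid`'s 4 value slots."""
    regs = []
    for vid in range(4):
        col = tid % 8
        row = (tid // 8) * 2 + (vid % 2) + (vid // 2) * 8
        regs.append((row, col))
    return regs


for vid, (row, col) in enumerate(thread_registers(0)):
    print(f"vid {vid}: C[{row}, {col}]")
# vid 0: C[0, 0]
# vid 1: C[1, 0]
# vid 2: C[8, 0]
# vid 3: C[9, 0]
```

Calling `thread_registers(31)` gives rows 6, 7, 14 and 15 in column 7.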