MX means microscaling: one small block of values shares one scale, while every value keeps its own low-bit floating-point element.
The visual shows 32 private elements. In memory, the spec does not prescribe the exact physical layout.
Each private element is FP4 E2M1: 1 sign bit, 2 exponent bits, 1 mantissa bit. A block stores 32 such elements plus one 8-bit E8M0 scale.
| Property | Value |
|---|---|
| Element type | FP4 E2M1 |
| Element max normal | ±6.0 |
| Min normal / subnormal | ±1.0 / ±0.5 |
| Block storage | 8 + 32×4 = 136 bits |
MXFP8 uses FP8 private elements. The spec allows E4M3 or E5M2. This example uses E4M3, which is common when precision matters more than very large range.
| Property | E4M3 | E5M2 |
|---|---|---|
| Exponent bias | 7 | 15 |
| Max normal | ±448 | ±57,344 |
| Min normal | ±2⁻⁶ | ±2⁻¹⁴ |
| Block storage | 8 + 32×8 = 264 bits | |
X = 2^(encoded_exponent − 127). Multiplying by X is conceptually the same as shifting the exponent of each private element.
real = scale × (q − zero_point). MX floating formats use real = X × P. Zero is represented by the FP element zero itself, so no stored zero-point field is needed.
Assume these are 32 weights from one linear layer block. We use the same source block for MXFP4 and MXFP8, then compare the reconstructed values.
V[i] / X, so values fit the local FP4/FP8 element range.X × P[i].Very compact. The shared scale gives each 32-value block a local range, but FP4 E2M1 still has only a few representable levels. Small values can become zero, and medium values can move to coarse buckets.
Less compact but much more accurate. With 8 element bits, the same shared-scale idea gives better local precision and usually much smaller reconstruction error.