MXFP4 / MXFP8 Quantization

MX means microscaling: one small block of values shares one scale, while every value keeps its own low-bit floating-point element.

For one MX block:
real_value[i] = X × P[i]

X = shared E8M0 power-of-two scale
P[i] = private FP4 or FP8 element

Block size k = 32; scale = E8M0 (8 bits); no explicit zero point; power-of-two scaling.

One block layout

One shared scale X (8-bit E8M0) plus 32 private elements P[0] to P[31]. The spec does not prescribe the exact physical layout in memory.

1. Format layout

MXFP4

4.25 bits / value

Each private element is FP4 E2M1: 1 sign bit, 2 exponent bits, 1 mantissa bit. A block stores 32 such elements plus one 8-bit E8M0 scale.

Bit layout: S E E M

Property	Value
Element type	FP4 E2M1
Element max normal	±6.0
Min normal / subnormal	±1.0 / ±0.5
Block storage	8 + 32×4 = 136 bits
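
To make the table concrete, here is a minimal Python sketch that decodes all sixteen FP4 E2M1 codes. It assumes the bit order sign, exponent, mantissa and an exponent bias of 1; the name decode_e2m1 is ours and purely illustrative.

```python
def decode_e2m1(code: int) -> float:
    """Decode a 4-bit FP4 E2M1 code (0..15), assuming bit order S E E M and bias 1."""
    sign = -1.0 if code & 0b1000 else 1.0
    exp = (code >> 1) & 0b11
    man = code & 0b1
    if exp == 0:
        # Subnormal: no implicit leading 1, effective exponent is 1 - bias = 0.
        return sign * man * 0.5
    return sign * (1.0 + 0.5 * man) * 2.0 ** (exp - 1)

# Distinct magnitudes the element can hold (+0 and -0 collapse to one entry).
magnitudes = sorted({abs(decode_e2m1(c)) for c in range(16)})
print(magnitudes)  # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

# Storage cost: one 8-bit scale amortized over 32 four-bit elements.
bits_per_value = (8 + 32 * 4) / 32
print(bits_per_value)  # 4.25
```

Only eight magnitudes exist per sign, which is why the shared scale matters so much for MXFP4.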

MXFP8

8.25 bits / value

MXFP8 uses FP8 private elements. The spec allows E4M3 or E5M2. This example uses E4M3, which is common when precision matters more than very large range.

Bit layout (E4M3): S E E E E M M M

Property	E4M3	E5M2
Exponent bias	7	15
Max normal	±448	±57,344
Min normal	±2⁻⁶	±2⁻¹⁴
Block storage	8 + 32×8 = 264 bits	8 + 32×8 = 264 bits
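
The table entries follow from the exponent bias and the way each format reserves special codes. The sketch below assumes the common FP8 conventions: E4M3 has no infinities and reserves only the all-ones mantissa at the top exponent for NaN, while E5M2 reserves the whole top exponent for Inf/NaN as IEEE 754 does. The helper names are illustrative.

```python
def max_normal(exp_bits: int, man_bits: int, bias: int, ieee_like: bool) -> float:
    if ieee_like:
        # Top exponent code is reserved for Inf/NaN (E5M2).
        exp_field = (1 << exp_bits) - 2
        man_field = (1 << man_bits) - 1
    else:
        # Only the all-ones mantissa at the top exponent is NaN (E4M3).
        exp_field = (1 << exp_bits) - 1
        man_field = (1 << man_bits) - 2
    return (1.0 + man_field / (1 << man_bits)) * 2.0 ** (exp_field - bias)

def min_normal(bias: int) -> float:
    # The smallest normal has exponent field 1 and mantissa 0.
    return 2.0 ** (1 - bias)

print(max_normal(4, 3, 7, ieee_like=False))   # 448.0
print(max_normal(5, 2, 15, ieee_like=True))   # 57344.0
print(min_normal(7) == 2.0 ** -6)             # True
print(min_normal(15) == 2.0 ** -14)           # True
print((8 + 32 * 8) / 32)                      # 8.25
```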

2. Scale and zero point

Scale in MXFP4/MXFP8
The scale is an 8-bit E8M0 value. It stores only an exponent, so the scale is a power of two: X = 2^(encoded_exponent − 127). Multiplying by X is conceptually the same as shifting the exponent of each private element.
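
As a small illustration, the following Python sketch decodes an E8M0 byte and applies it to an element by adjusting the exponent only. It assumes the byte 0xFF is reserved for NaN; the function names are hypothetical.

```python
import math

E8M0_BIAS = 127

def decode_e8m0(byte: int) -> float:
    """Return the power-of-two scale for an E8M0 byte; 0xFF is treated as NaN."""
    if byte == 0xFF:
        return math.nan
    return 2.0 ** (byte - E8M0_BIAS)

def apply_scale(p: float, byte: int) -> float:
    """Scale an element by 2**(byte - 127) via an exponent adjustment (ldexp)."""
    return math.ldexp(p, byte - E8M0_BIAS)

print(decode_e8m0(0b01111110))        # 0.5
print(apply_scale(6.0, 0b01111110))   # 3.0
```

Because the scale carries no mantissa, applying it never changes the mantissa bits of an element, only its exponent.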

Zero point in MXFP4/MXFP8
There is no explicit zero point. Classic affine INT quantization uses real = scale × (q − zero_point). MX floating formats use real = X × P. Zero is represented by the FP element zero itself, so no stored zero-point field is needed.
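
The two dequantization rules can be put side by side in a short Python sketch. The scale and zero-point values are made up for illustration.

```python
def dequant_affine(q: int, scale: float, zero_point: int) -> float:
    # Classic INT quantization: real zero is whichever integer equals zero_point.
    return scale * (q - zero_point)

def dequant_mx(p: float, x: float) -> float:
    # MX: the element is already a signed float, so real zero is simply p == 0.0.
    return x * p

print(dequant_affine(q=128, scale=0.02, zero_point=128))  # 0.0
print(dequant_mx(p=0.0, x=0.5))                           # 0.0
print(dequant_mx(p=-1.5, x=0.5))                          # -0.75
```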

3. Worked example: quantize one 32-value block

Assume the block holds 32 weights from one linear layer. We use the same source block for MXFP4 and MXFP8, then compare the reconstructed values.

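Source block: 32 FP32/BF16-like values. Here the maximum absolute value (amax) is 2.90.

Choose X: Use a power-of-two scale based on amax and the largest power of two representable by the element type: X = 2^(floor(log2(amax)) − emax), where emax is the exponent of that largest power of two (2 for FP4 E2M1, 8 for FP8 E4M3). For amax = 2.90 and FP4, X = 2^(1 − 2) = 0.5.

Normalize: Compute V[i] / X, so values fit the local FP4/FP8 element range.

Round: Round normalized values to FP4 E2M1 or FP8 E4M3 elements.

Dequantize: Recover approximate values with X × P[i].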
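
The steps above can be sketched in Python for the MXFP4 case. This is a minimal illustration, not a reference implementation: the 32 values are a made-up block with amax = 2.90 (not the block from the original visual, so the error figures will differ), ties are rounded to the even code, and clamping the shared exponent to the E8M0 range is omitted.

```python
import math

FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # E2M1 magnitudes, index = code
FP4_EMAX = 2  # largest power of two in E2M1 is 2**2 = 4

def quantize_mxfp4(block):
    """Return (shared_exponent, elements) for one block; elements are FP4 values."""
    amax = max(abs(v) for v in block)
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) = e - 1.
    # An all-zero block gets exponent 0 here, an arbitrary choice for this sketch.
    shared_exp = (math.frexp(amax)[1] - 1 - FP4_EMAX) if amax > 0 else 0
    x = 2.0 ** shared_exp
    elements = []
    for v in block:
        mag = min(abs(v) / x, FP4_GRID[-1])  # normalize, saturate at 6.0
        # Nearest grid point; ties go to the even code (mantissa bit 0).
        idx = min(range(8), key=lambda i: (abs(FP4_GRID[i] - mag), i & 1))
        elements.append(math.copysign(FP4_GRID[idx], v))
    return shared_exp, elements

def dequantize(shared_exp, elements):
    """Recover approximate values as X * P[i] with X = 2**shared_exp."""
    return [math.ldexp(p, shared_exp) for p in elements]

# Hypothetical source block (32 values, amax = 2.90).
block = [
    0.12, -0.45, 0.80, -1.30, 2.90, -2.10, 0.05, 0.33,
    -0.72, 1.10, -1.75, 2.40, -0.02, 0.60, -0.95, 1.55,
    0.27, -0.38, 0.91, -1.20, 2.05, -2.60, 0.15, 0.48,
    -0.66, 1.35, -1.90, 2.25, -0.08, 0.55, -1.05, 1.65,
]

shared_exp, elems = quantize_mxfp4(block)
recon = dequantize(shared_exp, elems)
errors = [abs(a - b) for a, b in zip(block, recon)]

print("X =", 2.0 ** shared_exp, "E8M0 bits =", format(shared_exp + 127, "08b"))
print("mean abs error =", sum(errors) / len(errors))
print("max abs error  =", max(errors))
```

With this block the largest value 2.90 normalizes to 5.8 and rounds to the FP4 element 6.0, so it dequantizes to 3.0, while 0.05 normalizes to 0.1 and rounds to zero.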

MXFP4 result for this block:

Shared scale X: 0.5

E8M0 bits: 01111110

Mean abs error: 0.1266

Max abs error: 0.4500

Chart: original vs. dequantized values. Bars are scaled to the largest magnitude in the source block.

4. What to remember

MXFP4

Very compact. The shared scale gives each 32-value block a local range, but FP4 E2M1 still has only a few representable levels. Small values can become zero, and medium values can move to coarse buckets.

MXFP8

Less compact but much more accurate. With 8 element bits, the same shared-scale idea gives better local precision and usually much smaller reconstruction error.
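
The accuracy gap can be checked with a short Python sketch that quantizes the same values with FP4 E2M1 and FP8 E4M3 elements. Everything here is illustrative: the block is made up (eight values repeated four times to fill 32 slots), the helper names are ours, rounding is to nearest with ties to the even code, and the block is assumed to contain at least one nonzero value.

```python
import math

def decode_magnitude(code, exp_bits, man_bits, bias):
    """Decode the non-sign bits of a small float (no Inf/NaN handling)."""
    exp = code >> man_bits
    frac = (code & ((1 << man_bits) - 1)) / (1 << man_bits)
    if exp == 0:
        return frac * 2.0 ** (1 - bias)          # subnormal
    return (1.0 + frac) * 2.0 ** (exp - bias)    # normal

# Positive grids, indexed by code. E4M3 code 127 is NaN, so it is left out.
FP4_GRID = [decode_magnitude(c, 2, 1, 1) for c in range(8)]     # 0 .. 6
FP8_GRID = [decode_magnitude(c, 4, 3, 7) for c in range(127)]   # 0 .. 448

def mean_abs_error(block, grid, emax_elem):
    """Quantize one block with a shared power-of-two scale and return the mean error."""
    amax = max(abs(v) for v in block)
    x = 2.0 ** (math.frexp(amax)[1] - 1 - emax_elem)
    total = 0.0
    for v in block:
        mag = min(abs(v) / x, grid[-1])  # normalize and saturate
        idx = min(range(len(grid)), key=lambda i: (abs(grid[i] - mag), i & 1))
        total += abs(abs(v) - x * grid[idx])  # sign is preserved, so compare magnitudes
    return total / len(block)

# Hypothetical block of 32 values with amax = 2.90.
block = [0.12, -0.45, 0.80, -1.30, 2.90, -2.10, 0.05, 0.33] * 4

err4 = mean_abs_error(block, FP4_GRID, emax_elem=2)
err8 = mean_abs_error(block, FP8_GRID, emax_elem=8)
print(f"MXFP4 mean abs error:        {err4:.4f}")
print(f"MXFP8 (E4M3) mean abs error: {err8:.4f}")
```

On this block the MXFP8 error comes out well below the MXFP4 error, because E4M3 offers 126 nonzero magnitudes per sign against seven for E2M1.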