Algorithm visualization

SignRound / AutoRound: learn the rounding, not the model

A visual walkthrough of low-bit weight-only quantization: start from FP weights, add trainable rounding offsets V and clipping scalars α, β, reconstruct each block output, and update only those quantization parameters with SignSGD.

What is optimized?

Only quantization parameters: rounding offset V, upper clip α, lower clip β.

Inference cost

0 extra ops

After tuning, export normal INT weights + scale/zero-point.

1. Big picture

RTN rounds each weight independently. SignRound instead uses calibration data to make the quantized block output match the FP block output.

RTN path

W → scale / zero-point → round(W / s + zp) → W_int

Fast, but each weight is rounded locally. It does not look at how the block output changes.


SignRound path

W̃ = s · clip(round(W / s + zp + V), n, m)

Train V, α, β so W̃X reconstructs WX for calibration inputs.
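The two paths can be compared in a few lines of plain Python. This is a minimal sketch, not the reference implementation: the function name qdq, the min/max formulas for the scale and zero-point, the way α and β shrink the range, and the subtraction of the zero-point on dequantization are all assumptions made for illustration (the compact formula above leaves the zero-point handling implicit).

```python
import math

def round_half_up(x):
    # Python's built-in round() rounds halves to even; use an explicit rule instead.
    return math.floor(x + 0.5)

def qdq(weights, offsets, alpha=1.0, beta=1.0, bits=4):
    """Quantize-dequantize a flat list of weights with per-weight rounding offsets."""
    n, m = 0, 2 ** bits - 1                      # integer clipping range
    w_max, w_min = alpha * max(weights), beta * min(weights)
    scale = (w_max - w_min) / m                  # assumed min/max scale
    zp = round_half_up(-w_min / scale)           # assumed zero-point
    ints = [min(max(round_half_up(w / scale + zp + v), n), m)
            for w, v in zip(weights, offsets)]
    dequant = [scale * (q - zp) for q in ints]   # assumption: subtract zp on dequant
    return ints, dequant, scale

weights = [-1.0, 0.4, 0.6, 2.0]
rtn_ints, rtn_deq, scale = qdq(weights, [0.0] * 4, bits=2)            # V = 0 is RTN
sr_ints, sr_deq, _ = qdq(weights, [0.0, 0.2, -0.2, 0.0], bits=2)      # shifted rounding
_, _, shrunk_scale = qdq(weights, [0.0] * 4, alpha=0.5, bits=2)       # tighter upper clip
```

With all offsets at zero the function reduces to RTN. The offsets 0.2 and -0.2 are hand-picked here to push the two middle weights across their rounding thresholds; in SignRound they would be learned from the reconstruction loss.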

2. Pipeline

Collect calibration X

Feed a few samples through the FP model and cache the input to the current block.

FP forward

Compute the reference output: Y_fp = W X.

QDQ with V, α, β

Quantize + dequantize weights using trainable rounding and clipping.

Quant forward

Compute Y_q = W̃ X through the same block.

Reconstruction loss

Minimize ||Y_fp - Y_q||², not the direct weight error.

SignSGD update

Move by the sign of the gradient and save the best parameters.
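The reconstruction-loss step is the one that separates this pipeline from RTN, and a tiny example shows why. The sketch below uses made-up numbers and hypothetical helper names (block_output, weight_error, output_error); the "block" is a single dot product and the integer grid has step 1.

```python
def block_output(weights, x):
    """Toy block: one dot product, y = w . x."""
    return sum(w * xi for w, xi in zip(weights, x))

def weight_error(w_q, w_fp):
    """Squared error measured directly on the weights."""
    return sum((a - b) ** 2 for a, b in zip(w_q, w_fp))

def output_error(w_q, w_fp, inputs):
    """Squared error measured on the block output over calibration inputs."""
    return sum((block_output(w_q, x) - block_output(w_fp, x)) ** 2 for x in inputs)

w_fp = [0.4, 0.4]          # FP weights
calib = [[1.0, 1.0]]       # one calibration input
w_rtn = [0.0, 0.0]         # nearest rounding of both weights
w_alt = [1.0, 0.0]         # round the first weight up instead

rtn_w, rtn_y = weight_error(w_rtn, w_fp), output_error(w_rtn, w_fp, calib)
alt_w, alt_y = weight_error(w_alt, w_fp), output_error(w_alt, w_fp, calib)
```

Here the alternative rounding is worse on the weights (about 0.52 versus 0.32) but much better on the output (about 0.04 versus 0.64), because the two RTN errors point the same way and add up in the output.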

3. Interactive toy example

This demo uses a tiny 4×4 weight matrix and a small calibration input. The update is a simplified SignSGD-style reconstruction loop to show the mechanism; real LLM quantization applies the same idea block by block at large scale.

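[Interactive demo: a "Run the optimizer" control with settings for the learning rate of V and the animation speed; readouts for the current step, RTN loss, current loss, and best loss; and a slider for the selected rounding offset V[0,0], ranging from -0.5 to +0.5, where 0 corresponds to RTN.]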

Crossing the center threshold can change the final integer rounding decision. SignSGD only needs the direction, not the exact gradient magnitude.
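The threshold effect can be made concrete for a single weight. In this sketch, u stands for the pre-rounding position W / s + zp, and both helper names are hypothetical; the offset range of -0.5 to +0.5 is taken from the slider in the demo.

```python
import math

def rounded_level(u, v):
    """Integer chosen for position u after adding rounding offset v."""
    return math.floor(u + v + 0.5)

def up_threshold(u):
    """Offsets at or above this value round u to floor(u) + 1; smaller ones to floor(u)."""
    return 0.5 - (u - math.floor(u))

u = 1.4                                        # RTN would round this to 1
sweep = [rounded_level(u, v) for v in (-0.5, -0.25, 0.0, 0.25, 0.5)]
threshold = up_threshold(u)                    # about 0.1 for u = 1.4
```

Because the offset stays within half a step, it can move a weight to at most the neighbouring integer, and only weights that sit close to a threshold flip after a few small updates.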

What the update does

θ_{t+1} = θ_t − lr · sign(∂loss / ∂θ), θ ∈ {V, α, β}

The goal is not to recover the exact FP weights. The goal is to choose integer weights that make the block output close to the FP block output.
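A single update step of this form is short enough to write out. The sketch below is illustrative only: the helper names are hypothetical, and clamping the parameters to a fixed interval after each step is an assumption (shown here with the -0.5 to +0.5 range used for V in the demo).

```python
def sign(g):
    """Return -1, 0, or +1 depending on the sign of g."""
    return (g > 0) - (g < 0)

def signsgd_step(params, grads, lr, lower, upper):
    """Move each parameter by a fixed step against the gradient sign, then clamp."""
    return [min(max(p - lr * sign(g), lower), upper)
            for p, g in zip(params, grads)]

V = [0.0, 0.0, 0.49]
grads = [3.2, -0.001, -5.0]        # very different magnitudes
V_next = signsgd_step(V, grads, lr=0.1, lower=-0.5, upper=0.5)
```

The first two parameters move by exactly the same amount in opposite directions even though their gradients differ by more than three orders of magnitude, and the third is clamped at the upper bound.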

[Demo panels: rounding offsets V, clip α, clip β.]

Matrices shown: FP weights W (fixed), rounding offsets V (trained), integer weights W_int (exported), dequantized weights W̃ (used for the loss).

Legend: quantized integer; good reconstruction; offset near rounding threshold; larger local error.

4. Pseudocode

initialize V = 0, α = 1, β = 1, best_loss = ∞
for step in 1..T:
    X = block_input(calibration_batch)
    Y_fp = forward(W, X)                # reference
    W̃ = quant_dequant(W, V, α, β)       # QDQ weight
    Y_q = forward(W̃, X)                 # quantized block
    loss = mse(Y_q, Y_fp)               # reconstruction
    if loss < best_loss:
        best_loss = loss
        save V, α, β
    (V, α, β) = (V, α, β) - lr · sign(gradient)
export W_int, scale, zero_point
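The loop can be turned into a small runnable toy in plain Python. This is a simplified sketch under several assumptions, not the actual SignRound or AutoRound code: only V is trained (α and β stay at 1), the block is a single linear layer, the gradient sign comes from a hand-written straight-through estimate instead of autograd, V is clamped to the -0.5 to +0.5 range, and the names qdq, forward, and tune_offsets are hypothetical.

```python
import math

def qdq(w, v, scale, zp, n, m):
    """Quantize-dequantize one weight with rounding offset v (alpha = beta = 1)."""
    q = min(max(math.floor(w / scale + zp + v + 0.5), n), m)
    return scale * (q - zp)  # assumption: zero-point subtracted on dequant

def forward(weights, x):
    """Toy block: a single linear layer, y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def tune_offsets(W, X, bits=2, lr=0.1, steps=10):
    flat = [w for row in W for w in row]
    m = 2 ** bits - 1
    scale = (max(flat) - min(flat)) / m
    zp = math.floor(-min(flat) / scale + 0.5)
    V = [[0.0] * len(row) for row in W]
    Y_fp = [forward(W, x) for x in X]           # reference outputs, computed once
    rtn_loss, best_loss, best_V = None, math.inf, None
    for _ in range(steps):
        Wq = [[qdq(w, v, scale, zp, 0, m) for w, v in zip(rw, rv)]
              for rw, rv in zip(W, V)]
        err = [[yq - yf for yq, yf in zip(forward(Wq, x), y)]
               for x, y in zip(X, Y_fp)]
        loss = sum(e * e for row in err for e in row) / (len(X) * len(W))
        if rtn_loss is None:
            rtn_loss = loss                      # V = 0 on the first pass is plain RTN
        if loss < best_loss:
            best_loss, best_V = loss, [row[:] for row in V]
        for o, row in enumerate(V):
            for i in range(len(row)):
                # Straight-through estimate: away from the clip bounds, d(loss)/dV[o][i]
                # has the sign of sum_s err[s][o] * X[s][i], because scale > 0.
                g = sum(err[s][o] * X[s][i] for s in range(len(X)))
                row[i] = min(max(row[i] - lr * ((g > 0) - (g < 0)), -0.5), 0.5)
    return rtn_loss, best_loss, best_V

W = [[-1.0, 0.45, 0.25, 2.0]]    # one output row, four weights
X = [[0.0, 1.0, 1.0, 0.0]]       # one calibration input
rtn_loss, best_loss, best_V = tune_offsets(W, X)
```

On this input the loss falls from about 0.49 (RTN) to about 0.09 once the offset pushes the weight 0.45 up to the next integer. With a constant step size the offsets then bounce back and forth across the threshold, which illustrates why the loop keeps the best parameters seen so far instead of the last ones.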

5. Why this works

Rounding is threshold-based. For a weight near a rounding border, a small shift in V can flip the integer choice.

Output reconstruction is block-aware. Some local weight errors matter less if the block output remains close.

SignSGD is lightweight. It ignores gradient magnitude and only follows the direction, which is enough for bounded rounding/clipping parameters.

Serving stays simple. The tuned parameters are folded into the final quantized checkpoint.
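The last point can be checked directly: once the rounding decision is made, V is no longer needed. The sketch below uses a hypothetical checkpoint layout (a dict with w_int, scale, and zero_point) and assumes the zero-point is subtracted on dequantization; real export formats pack the integers differently.

```python
import math

def export_checkpoint(weights, offsets, scale, zp, bits):
    """Fold the tuned offsets into the integer weights; V itself is not stored."""
    n, m = 0, 2 ** bits - 1
    ints = [min(max(math.floor(w / scale + zp + v + 0.5), n), m)
            for w, v in zip(weights, offsets)]
    return {"w_int": ints, "scale": scale, "zero_point": zp}

def inference_weights(ckpt):
    """Standard dequantization at serving time, with no SignRound-specific step."""
    return [ckpt["scale"] * (q - ckpt["zero_point"]) for q in ckpt["w_int"]]

ckpt = export_checkpoint([-1.0, 0.45, 0.25, 2.0], [0.0, 0.1, 0.1, 0.0],
                         scale=1.0, zp=1, bits=2)
served = inference_weights(ckpt)
```

The exported dict has the same fields an RTN checkpoint would have, so the serving path does not change; only the integers inside it differ.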