Mismatch in dynamic quantization performance for torchao and torch.quantization · Issue #152813 · pytorch/pytorch


Open
PioneerAlexander opened this issue May 5, 2025 · 1 comment
Labels
oncall: quantization Quantization support in PyTorch

Comments

PioneerAlexander commented May 5, 2025

Hi everyone!

Can someone explain why I get different performance when I apply torch.quantization.quantize_dynamic versus torchao.quantize_?

More specifically, I have an LSTM model with two fully connected layers (one at the front, one at the back). In order to quantize it with torchao, I reimplemented the LSTM layer (and checked that it matches the nn.LSTM implementation).
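The reimplemented layer itself is not shown in the issue; below is a minimal sketch of what such a Linear-based LSTM cell could look like, so that quantization passes targeting nn.Linear can reach its weights (the class name and structure are illustrative, not the author's code):

import torch
import torch.nn as nn

class LinearLSTMCell(nn.Module):
    """LSTM cell expressed via nn.Linear so dynamic quantization can target its weights."""
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.hidden_size = hidden_size
        # Gates packed as [i, f, g, o], mirroring nn.LSTM's parameter layout.
        self.ih = nn.Linear(input_size, 4 * hidden_size)
        self.hh = nn.Linear(hidden_size, 4 * hidden_size)

    def forward(self, x, state):
        h, c = state
        gates = self.ih(x) + self.hh(h)
        i, f, g, o = gates.chunk(4, dim=-1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, (h, c)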

I then compare dynamic int8-activation / int8-weight quantization in both libraries (see the comparison sketch after the two snippets below):

quantize_(model, Int8DynamicActivationInt8WeightConfig())

model = torch.quantization.quantize_dynamic(
model, {nn.Linear}, dtype=torch.qint8
)
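A hedged sketch of how both paths can be applied to independent copies of the same trained model, so the comparison starts from identical weights (the deepcopy bookkeeping and variable names are mine, assuming the standard torchao.quantization imports):

import copy
import torch
import torch.nn as nn
from torchao.quantization import quantize_, Int8DynamicActivationInt8WeightConfig

# torchao path: quantizes the copy in place (works on CPU and GPU)
model_ao = copy.deepcopy(model)
quantize_(model_ao, Int8DynamicActivationInt8WeightConfig())

# torch.quantization path: eager-mode dynamic quantization, CPU only
model_eager = torch.quantization.quantize_dynamic(
    copy.deepcopy(model).cpu(), {nn.Linear}, dtype=torch.qint8
)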

The first (torchao) solution was tested on a GPU (NVIDIA A100 80GB PCIe, not MI300) with nvcc 12.1, cuDNN 9.8, and torch 2.5.1.

The metric value drops by 1%.

But when I run the second solution (on CPU, since GPU is not yet supported by torch.quantization), the metric value drops by 35%.

What could possibly be wrong?

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel @msaroufim

@zou3519 added the oncall: quantization (Quantization support in PyTorch) label on May 6, 2025
PioneerAlexander (Author) commented:

I created a script to reproduce the issue; try it out: https://pastebin.com/ACeySMtj

I trained an LSTM to predict the function y = sin(x) (R → R) and compared the quality metrics MSE and MAE.
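Roughly, the comparison boils down to something like the following (a sketch with hypothetical helper and variable names, not the actual pastebin code):

import torch

@torch.no_grad()
def eval_metrics(model, x_test, y_test):
    # x_test / y_test: test inputs and sin(x) targets on the model's device
    pred = model(x_test)
    mse = torch.mean((pred - y_test) ** 2).item()
    mae = torch.mean(torch.abs(pred - y_test)).item()
    return mse, mae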

Baseline solution (no quantization): MSE 0.00035, MAE 0.01150
Quantization of the trained model using torch.quantization: MSE 0.00047, MAE 0.01554
Quantization of the trained model using torchao (run on CPU/GPU): MSE 0.00037, MAE 0.01223

torch.quantization and torchao perform the same kind of quantization, yet the resulting metrics differ (torch.quantization is worse by roughly 25%).

I received a useful answer from a torchao team developer:

torchao uses the _int_mm op (https://github.com/pytorch/ao/blob/72e3c1169efdbf2ecbfdc601e93fc83a5f79208e/torchao/kernel/intmm.py#L132) for both CPU and GPU, while torch.ao.quantization uses the fbgemm CPU op.

Changing the engine with torch.backends.quantized.engine = "qnnpack" reduced the metric drop to 12%.
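For reference, a sketch of how the engine switch fits in (the set of available engines depends on the PyTorch build; torch.backends.quantized.supported_engines lists them):

import torch
import torch.nn as nn

# Check which eager-mode quantized engines this build supports,
# then select one before quantizing / running the quantized model.
print(torch.backends.quantized.supported_engines)  # e.g. ['none', 'fbgemm', 'qnnpack', 'x86']
torch.backends.quantized.engine = "qnnpack"

model_q = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)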

If that is indeed the cause, the issue can be closed.
