Mismatch in dynamic quantization performance for torchao and torch.quantization · Issue #152813 · pytorch/pytorch


Open
PioneerAlexander opened this issue May 5, 2025 · 1 comment
Labels
oncall: quantization Quantization support in PyTorch

Comments

PioneerAlexander commented May 5, 2025

Hi everyone!

Can someone explain why I get different performance when I apply torch.quantization.quantize_dynamic versus torchao.quantize_?

More specifically, I have an LSTM model with two fully connected layers (one at the front, one at the back). In order to quantize it with torchao, I reimplemented the LSTM layer (and checked that it matches the nn.LSTM implementation).
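The reimplemented layer itself is not shown in the issue; below is a minimal sketch of what such a Linear-based LSTM cell could look like, so that quantization passes targeting nn.Linear can reach its weights (the class name and structure are illustrative, not the author's code):

import torch
import torch.nn as nn

class LinearLSTMCell(nn.Module):
    """LSTM cell expressed via nn.Linear so dynamic quantization can target its weights."""
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.hidden_size = hidden_size
        # Gates packed as [i, f, g, o], mirroring nn.LSTM's parameter layout.
        self.ih = nn.Linear(input_size, 4 * hidden_size)
        self.hh = nn.Linear(hidden_size, 4 * hidden_size)

    def forward(self, x, state):
        h, c = state
        gates = self.ih(x) + self.hh(h)
        i, f, g, o = gates.chunk(4, dim=-1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, (h, c)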

I then compare dynamic int8-activation / int8-weight quantization in both libraries (see the comparison sketch after the two snippets below):

quantize_(model, Int8DynamicActivationInt8WeightConfig())

model = torch.quantization.quantize_dynamic(
model, {nn.Linear}, dtype=torch.qint8
)
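A hedged sketch of how both paths can be applied to independent copies of the same trained model, so the comparison starts from identical weights (the deepcopy bookkeeping and variable names are mine, assuming the standard torchao.quantization imports):

import copy
import torch
import torch.nn as nn
from torchao.quantization import quantize_, Int8DynamicActivationInt8WeightConfig

# torchao path: quantizes the copy in place (works on CPU and GPU)
model_ao = copy.deepcopy(model)
quantize_(model_ao, Int8DynamicActivationInt8WeightConfig())

# torch.quantization path: eager-mode dynamic quantization, CPU only
model_eager = torch.quantization.quantize_dynamic(
    copy.deepcopy(model).cpu(), {nn.Linear}, dtype=torch.qint8
)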

The first (torchao) solution was tested on a GPU (NVIDIA A100 80GB PCIe, not MI300) with nvcc 12.1, cuDNN 9.8, and torch 2.5.1.

The metric value drops by 1%.

But when I run the second solution (on CPU, since GPU is not yet supported by torch.quantization), the metric value drops by 35%.

What could possibly be wrong?

cc @jerryzh168 @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel @msaroufim

@zou3519 added the oncall: quantization (Quantization support in PyTorch) label on May 6, 2025
PioneerAlexander (Author) commented:

I created a script to reproduce the issue; try it out: https://pastebin.com/ACeySMtj

I trained an LSTM to predict the function y = sin(x) (R → R) and compared the quality metrics MSE and MAE.
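Roughly, the comparison boils down to something like the following (a sketch with hypothetical helper and variable names, not the actual pastebin code):

import torch

@torch.no_grad()
def eval_metrics(model, x_test, y_test):
    # x_test / y_test: test inputs and sin(x) targets on the model's device
    pred = model(x_test)
    mse = torch.mean((pred - y_test) ** 2).item()
    mae = torch.mean(torch.abs(pred - y_test)).item()
    return mse, mae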

Baseline solution (no quantization): MSE 0.00035, MAE 0.01150
Quantization of the trained model using torch.quantization: MSE 0.00047, MAE 0.01554
Quantization of the trained model using torchao (run on CPU/GPU): MSE 0.00037, MAE 0.01223

torch.quantization and torchao perform the same kind of quantization, yet the resulting metrics differ (torch.quantization is worse by roughly 25%).

I received a useful answer from a torchao team developer:

torchao uses the _int_mm op (https://github.com/pytorch/ao/blob/72e3c1169efdbf2ecbfdc601e93fc83a5f79208e/torchao/kernel/intmm.py#L132) for both CPU and GPU, while torch.ao.quantization uses the fbgemm CPU op.

Changing the engine with torch.backends.quantized.engine = "qnnpack" reduced the metric drop to 12%.
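For reference, a sketch of how the engine switch fits in (the set of available engines depends on the PyTorch build; torch.backends.quantized.supported_engines lists them):

import torch
import torch.nn as nn

# Check which eager-mode quantized engines this build supports,
# then select one before quantizing / running the quantized model.
print(torch.backends.quantized.supported_engines)  # e.g. ['none', 'fbgemm', 'qnnpack', 'x86']
torch.backends.quantized.engine = "qnnpack"

model_q = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)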

If that is indeed the cause, the issue can be closed.
