[PT2E][X86] Migrate fusion passes in Inductor to torchao #2140
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2140
Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit 38517e8 with merge base 2c901b3. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @jerryzh168 @jansel Could you please review this PR? We would especially like to hear your comments on (1) whether it sounds OK to you that we copy Inductor code into torchao along with Inductor's internal utilities, and (2) whether it is OK that we keep duplicate passes for now. Thanks!
I think out-of-tree passes are fine. Do we need a better registration system so the changes can be local to a specific torch.compile() call rather than mutating globals? cc @eellison
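The suggestion above is to scope pass registration to a single compile call instead of mutating globals. A minimal sketch of what such a scoped registry could look like, in plain Python; `PassRegistry` and `local_passes` are illustrative names, not an existing Inductor API:

```python
# Hypothetical sketch of a registration system scoped to a single
# torch.compile() call, as suggested above. All names are illustrative;
# no such API exists in Inductor today.
from contextlib import contextmanager


class PassRegistry:
    """Holds the currently active pre-grad passes."""

    def __init__(self):
        self.passes = []

    def run(self, graph):
        # Apply each registered pass in order.
        for p in self.passes:
            graph = p(graph)
        return graph


REGISTRY = PassRegistry()


@contextmanager
def local_passes(*passes):
    """Register passes only for the duration of the `with` block,
    restoring the previous registry state afterwards."""
    saved = list(REGISTRY.passes)
    REGISTRY.passes.extend(passes)
    try:
        yield REGISTRY
    finally:
        REGISTRY.passes = saved
```

A compile entry point would then read its pass list from the registry, so passes registered inside the `with` block never leak into other compilations.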
Should this be in prototype? I think under torchao/quantization/pt2e might be better. Also, the folder name can probably be something like inductor_passes to be more specific. I'd recommend: torchao/quantization/pt2e/inductor_passes/x86.py
Thanks. I have moved it as you suggested.
I also feel hiding compile API in
```python
global FUSION_PATH_REGISTERED
if not FUSION_PATH_REGISTERED:
    global torch
    import torch._inductor.config

    from torchao.prototype.inductor.fx_passes.quantization import (
        _register_quantization_weight_pack_pass,
        quant_lift_up,
    )

    torch._inductor.config.pre_grad_custom_pass = quant_lift_up
    _register_quantization_weight_pack_pass()
    FUSION_PATH_REGISTERED = True
```
Can this part happen during import of x86_inductor_quantizer?
Thanks. I have modified per your suggestion.
Thanks for your comments. We will just keep the current implementation then.
Yeah, I think this might be cleaner with something like
Hi @Xia-Weiwen, will you add the registration system in PyTorch first and then refine this PR?
No. I plan to keep the current implementation. When the new registration system is added to Inductor by the Meta Inductor team, I will switch to it in another PR.
```python
)

torch._inductor.config.pre_grad_custom_pass = quant_lift_up
_register_quantization_weight_pack_pass()
```
I'm a bit concerned about this. Not sure how we should handle it, but it seems that:
- The patterns from quantization.py in TorchAO will be registered here once.
- And inside torch.compile, when freezing turns on, the same patterns from pytorch/torch/_inductor/fx_passes/quantization.py inside Torch Inductor will be registered again.
Thanks for the comments. As we discussed offline and as I explained in the summary above, duplicate passes will be applied only once: once a pattern has been rewritten, it is gone from the graph, so re-applying the same pass is a no-op.
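The "once applied, the pattern is gone" argument can be illustrated with a toy rewriter over a list of op names; this is not Inductor's actual pattern matcher, just a sketch of why double registration is harmless:

```python
# Illustrative sketch (not Inductor's real matcher): once a [dequant, linear]
# pair is rewritten into a fused op, the pattern no longer exists in the
# graph, so applying the same pass a second time changes nothing.
def fuse_dequant_linear(ops):
    """Rewrite every adjacent [dequant, linear] pair into one fused op."""
    out, i = [], 0
    while i < len(ops):
        if ops[i:i + 2] == ["dequant", "linear"]:
            out.append("qlinear")  # fused op replaces the matched pair
            i += 2
        else:
            out.append(ops[i])
            i += 1
    return out


graph = ["dequant", "linear", "relu"]
once = fuse_dequant_linear(graph)   # ["qlinear", "relu"]
twice = fuse_dequant_linear(once)   # unchanged: pattern already consumed
```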
```python
    quant_lift_up,
)

torch._inductor.config.pre_grad_custom_pass = quant_lift_up
```
Be careful to check whether any other pre_grad_custom_pass was registered before; see issue pytorch/pytorch#151876. cc @Valentine233, who is working on it.
Thanks for the comment. It is potentially unsafe. I will modify this part after pre_grad_custom_pass is refactored, probably in another PR.
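One way to avoid clobbering a previously registered pass, as the comment above warns about, is to chain the new pass after any existing one. `chain_passes` below is an illustrative helper under that assumption, not an existing Inductor API:

```python
# Hypothetical guard against overwriting a previously registered
# pre_grad_custom_pass (the concern raised in pytorch/pytorch#151876).
# `chain_passes` is an illustrative helper, not an existing Inductor API.
def chain_passes(existing, new):
    """Return a pass that runs `existing` (if any) and then `new`."""
    if existing is None:
        return new

    def chained(graph):
        existing(graph)
        new(graph)

    return chained


# Instead of plain assignment, one could then write something like:
#   cfg = torch._inductor.config
#   cfg.pre_grad_custom_pass = chain_passes(cfg.pre_grad_custom_pass, quant_lift_up)
```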
Thanks for the comments. I have moved the registration out of the lowering function. Please review again. Thanks.
Hi @jerryzh168 Could you please review this PR again? Thanks.
Hi @jerryzh168 Do you have any concerns about this PR? Thanks.
```python
from torchao.utils import TORCH_VERSION_AT_LEAST_2_8

if TORCH_VERSION_AT_LEAST_2_8:
    torch._inductor.config.pre_grad_custom_pass = quant_lift_up
```
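A pure-Python sketch of what a version gate of this kind checks: compare the "major.minor" prefix of a version string against a minimum. The actual torchao helper may differ in detail; `version_at_least` is an illustrative name:

```python
# Sketch of a version gate like TORCH_VERSION_AT_LEAST_2_8: compare the
# "major.minor" prefix of a version string against a minimum tuple.
# Illustrative only; the real torchao helper may be implemented differently.
def version_at_least(version: str, minimum: tuple) -> bool:
    # Drop any local build suffix such as "+cu121" before parsing.
    parts = version.split("+")[0].split(".")
    return tuple(int(p) for p in parts[:2]) >= minimum


version_at_least("2.8.0", (2, 8))  # True
version_at_least("2.7.1", (2, 8))  # False
```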
Just confirming: is this the final API already, or do we still expect more changes here?
I think this is subject to change. Discussions are still underway. We plan to modify this part after the API is changed. Thanks.
Summary
In this PR, we migrate the fusion passes of quantized ops for the X86Inductor backend from the PyTorch Inductor source code to torchao. This is the first step in migrating quantization-related fusion passes from PyTorch core to torchao.
With this PR landed, we can add fusion passes for new ops in torchao instead of in PyTorch core, so we want this PR merged early.
We plan to do the migration in the following steps:
(Steps 2 and 3 have no dependency on each other and can be reordered.)
Fusion passes need to be registered with Inductor before calling torch.compile, and it would be less user-friendly to ask users to register them in their own code. There are two options for automatic registration:
1. Register the passes inside lower_pt2e_quantized_to_x86
2. Register the passes during import of torchao.quantization.pt2e.quantizer.x86_inductor_quantizer
The problem with option 1 is that we should allow registering other fusion passes out of tree. So, we decided to go with option 2.
Test plan
We copied related UTs from https://github.com/pytorch/pytorch/blob/main/test/inductor/test_mkldnn_pattern_matcher.py
The test cases are run only with torch nightly, since some torch features, such as onednn.qconv_pointwise, are only available in nightly.
Use the following cmd to run the tests:
Explanation of implementation