Introduce aoti_call_delegate HOP by SherlockNoMad · Pull Request #145630 · pytorch/pytorch · GitHub

Introduce aoti_call_delegate HOP #145630


Closed
SherlockNoMad wants to merge 1 commit

Conversation

SherlockNoMad (Contributor)

Summary:
Previously, the AOTI compile node was represented as a kernel-less custom op in the exported program. That node was not runnable in eager mode, even though running eagerly is a common practice for numerical validation during lowering.

I introduce a new HOP to address this.

The schema is as follows:

```
aoti_call_delegate(lowered_module: AOTInductorEPModule, original_gm: fx.GraphModule, weights: List[Tensor], inputs: List[Tensor])
```
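To make "eager runnable for numerical validation" concrete, here is a minimal sketch of the intended usage. It is not taken from this PR's tests: it assumes the HOP is reachable as `torch.ops.higher_order.aoti_call_delegate`, that it returns a list of output tensors, and that `lowered_module` and `original_gm` were produced by an existing AOTI lowering step (not shown here).

```python
import torch

# Assumed to come from an existing AOTI lowering flow (illustrative names):
#   lowered_module: the AOTInductorEPModule produced by lowering
#   original_gm:    the fx.GraphModule that was lowered (weights as get_attr nodes)
weights = list(original_gm.parameters())   # weights passed explicitly, per the HOP schema
inputs = [torch.randn(8, 16)]              # example inputs

# Eager call through the HOP: executes the AOTI-compiled delegate.
delegate_outs = torch.ops.higher_order.aoti_call_delegate(
    lowered_module, original_gm, weights, inputs
)

# Numerical validation: run the original graph module eagerly and compare.
reference_out = original_gm(*inputs)
torch.testing.assert_close(delegate_outs[0], reference_out)
```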

There are a few problems exposed by the HOP:

  • AOTI expects an FX graph with weights as get_attr nodes, i.e. a stateful graph. HOPs expect graph_module arguments to be stateless, and the export serializer also expects a stateless graph. Currently, to make AOTI happy, I am making `original_gm` stateful and bypassing serialization for `original_gm` (the sketch after this list illustrates the stateful/stateless distinction).
  • As a result, the HOP is not re-traceable: functionalization on a stateful graph-module argument will fail.
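As a small, self-contained torch.fx illustration of that stateful vs. stateless distinction (unrelated to this PR's own code): tracing an `nn.Module` yields a stateful graph whose weight appears as a `get_attr` node (the form AOTI expects), while the HOP- and serializer-friendly form lifts the weight to an explicit placeholder input.

```python
import torch
import torch.fx as fx

class TinyLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(4, 4))

    def forward(self, x):
        return x @ self.weight

# Stateful graph: the weight is captured as a get_attr node.
stateful_gm = fx.symbolic_trace(TinyLinear())
print([n.op for n in stateful_gm.graph.nodes])
# ['placeholder', 'get_attr', 'call_function', 'output']

# Stateless graph: the weight is an explicit placeholder input instead,
# which is what HOP graph arguments and the export serializer expect.
def tiny_linear(x, weight):
    return x @ weight

stateless_gm = fx.symbolic_trace(tiny_linear)
print([n.op for n in stateless_gm.graph.nodes])
# ['placeholder', 'placeholder', 'call_function', 'output']
```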

Test Plan: buck2 test 'fbcode//mode/opt' fbcode//deeplearning/aot_inductor/cpu/test:cpu_lowering_utils_test

Reviewed By: zhxchen17

Differential Revision: D68359391

pytorch-bot (bot) commented Jan 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145630

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (4 Unrelated Failures)

As of commit 133a4e8 with merge base 5a527fa:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D68359391

SherlockNoMad added a commit to SherlockNoMad/pytorch that referenced this pull request Jan 24, 2025

SherlockNoMad added a commit to SherlockNoMad/pytorch that referenced this pull request Jan 27, 2025

```python
AOTI_LOWERED_MODULE = "AOTInductorEPModule"


class AOTICallDelegate(HigherOrderOperator):
```
A reviewer (Contributor) commented on this snippet:

is it valuable to make this op more general? Like couldn't it also work for MTIA too?


SherlockNoMad added a commit to SherlockNoMad/pytorch that referenced this pull request Jan 29, 2025

SherlockNoMad added a commit to SherlockNoMad/pytorch that referenced this pull request Jan 30, 2025

@facebook-github-bot (Contributor)

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

5 participants