Introduce aoti_call_delegate HOP by SherlockNoMad · Pull Request #145630 · pytorch/pytorch · GitHub

Introduce aoti_call_delegate HOP #145630


Closed
SherlockNoMad wants to merge 1 commit

Conversation

SherlockNoMad (Contributor)

Summary:
Previously, the AOTI compile node was represented as a kernel-less custom op in the exported program. That node was not runnable in eager mode, even though running eagerly is a common practice for numerical validation during lowering.

I introduce a new HOP to address this.

The schema is as follows:

```
aoti_call_delegate(lowered_module: AOTInductorEPModule, original_gm: fx.GraphModule, weights: List[Tensor], inputs: List[Tensor])
```
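To make "eager runnable for numerical validation" concrete, here is a minimal sketch of the intended usage. It is not taken from this PR's tests: it assumes the HOP is reachable as `torch.ops.higher_order.aoti_call_delegate`, that it returns a list of output tensors, and that `lowered_module` and `original_gm` were produced by an existing AOTI lowering step (not shown here).

```python
import torch

# Assumed to come from an existing AOTI lowering flow (illustrative names):
#   lowered_module: the AOTInductorEPModule produced by lowering
#   original_gm:    the fx.GraphModule that was lowered (weights as get_attr nodes)
weights = list(original_gm.parameters())   # weights passed explicitly, per the HOP schema
inputs = [torch.randn(8, 16)]              # example inputs

# Eager call through the HOP: executes the AOTI-compiled delegate.
delegate_outs = torch.ops.higher_order.aoti_call_delegate(
    lowered_module, original_gm, weights, inputs
)

# Numerical validation: run the original graph module eagerly and compare.
reference_out = original_gm(*inputs)
torch.testing.assert_close(delegate_outs[0], reference_out)
```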

There are a few problems exposed by the HOP:

  • AOTI expects an FX graph with weights as get_attr nodes, i.e. a stateful graph. HOPs expect graph_module arguments to be stateless, and the export serializer also expects a stateless graph. Currently, to make AOTI happy, I am making `original_gm` stateful and bypassing serialization for `original_gm` (the sketch after this list illustrates the stateful/stateless distinction).
  • As a result, the HOP is not re-traceable: functionalization on a stateful graph-module argument will fail.
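As a small, self-contained torch.fx illustration of that stateful vs. stateless distinction (unrelated to this PR's own code): tracing an `nn.Module` yields a stateful graph whose weight appears as a `get_attr` node (the form AOTI expects), while the HOP- and serializer-friendly form lifts the weight to an explicit placeholder input.

```python
import torch
import torch.fx as fx

class TinyLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(4, 4))

    def forward(self, x):
        return x @ self.weight

# Stateful graph: the weight is captured as a get_attr node.
stateful_gm = fx.symbolic_trace(TinyLinear())
print([n.op for n in stateful_gm.graph.nodes])
# ['placeholder', 'get_attr', 'call_function', 'output']

# Stateless graph: the weight is an explicit placeholder input instead,
# which is what HOP graph arguments and the export serializer expect.
def tiny_linear(x, weight):
    return x @ weight

stateless_gm = fx.symbolic_trace(tiny_linear)
print([n.op for n in stateless_gm.graph.nodes])
# ['placeholder', 'placeholder', 'call_function', 'output']
```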

Test Plan: buck2 test 'fbcode//mode/opt' fbcode//deeplearning/aot_inductor/cpu/test:cpu_lowering_utils_test

Reviewed By: zhxchen17

Differential Revision: D68359391

pytorch-bot (bot) commented Jan 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145630

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (4 Unrelated Failures)

As of commit 133a4e8 with merge base 5a527fa:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D68359391

SherlockNoMad added a commit to SherlockNoMad/pytorch that referenced this pull request Jan 24, 2025

SherlockNoMad added a commit to SherlockNoMad/pytorch that referenced this pull request Jan 27, 2025

```python
AOTI_LOWERED_MODULE = "AOTInductorEPModule"


class AOTICallDelegate(HigherOrderOperator):
```
A reviewer (Contributor) commented on this snippet:

is it valuable to make this op more general? Like couldn't it also work for MTIA too?


SherlockNoMad added a commit to SherlockNoMad/pytorch that referenced this pull request Jan 29, 2025

SherlockNoMad added a commit to SherlockNoMad/pytorch that referenced this pull request Jan 30, 2025

@facebook-github-bot (Contributor)

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

5 participants