[ONNX] Migrate torchlib from onnxscript #139301

Open
justinchuby opened this issue Oct 30, 2024 · 5 comments

justinchuby (Collaborator) commented Oct 30, 2024

Now that the torchlib translation library is close to stable in onnxscript, it is ready to be migrated into PyTorch so that

  1. It can evolve with aten operators without having to worry about backward compatibility for different PyTorch versions
  2. We can use newer opsets by default, again without having to worry about BC. The proposal is to upgrade to opset 22 for torch 2.6 or torch 2.7. This makes it easier for us and users to leverage new dtypes and the corrected GroupNormalization.

Considerations

  1. We should still allow implementations in onnxscript to override those in PyTorch, so we can continue to add new implementations after a torch version is released. This will be part of the stable framework APIs in onnxscript (see the sketch after this list).
  2. We can update the default opset version for each torch version.
  3. The opinfo unit tests should be migrated as well.
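To make considerations 1 and 2 concrete, here is a minimal sketch, assuming a plain dictionary-based lookup; every name in it (`_IN_TREE_TORCHLIB`, `_ONNXSCRIPT_OVERRIDES`, `resolve_decomp`, `DEFAULT_OPSET_BY_TORCH_VERSION`) is hypothetical, and the opset numbers are for illustration only, not the actual torch.onnx or onnxscript APIs.

```python
# Hypothetical sketch only; names and opset numbers are illustrative.
from typing import Callable, Optional

# Implementations shipped inside a given PyTorch release.
_IN_TREE_TORCHLIB: dict[str, Callable] = {}
# Overrides registered later through a stable onnxscript hook (consideration 1).
_ONNXSCRIPT_OVERRIDES: dict[str, Callable] = {}

# Consideration 2: each torch release can pick its own default opset.
DEFAULT_OPSET_BY_TORCH_VERSION = {"2.5": 18, "2.6": 22}


def resolve_decomp(aten_name: str) -> Optional[Callable]:
    """An onnxscript override wins over the in-tree copy, so new or fixed
    implementations can still ship after a torch version is released."""
    return _ONNXSCRIPT_OVERRIDES.get(aten_name, _IN_TREE_TORCHLIB.get(aten_name))
```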
@justinchuby justinchuby added module: onnx Related to torch.onnx triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Oct 30, 2024
@justinchuby justinchuby self-assigned this Oct 30, 2024
@justinchuby justinchuby changed the title [ONNX][RFC] Migrate torchlib from onnxscript [ONNX][RFC] Migrate torchlib from onnxscript (torch 2.6/2.7) Oct 30, 2024
@justinchuby justinchuby added this to the 2.6.0 milestone Oct 30, 2024
@justinchuby justinchuby modified the milestones: 2.6.0, 2.7.0 Nov 19, 2024
@justinchuby justinchuby changed the title [ONNX][RFC] Migrate torchlib from onnxscript (torch 2.6/2.7) [ONNX][RFC] Migrate torchlib from onnxscript Jan 23, 2025
@justinchuby justinchuby changed the title [ONNX][RFC] Migrate torchlib from onnxscript [ONNX] Migrate torchlib from onnxscript Feb 4, 2025
justinchuby (Collaborator, Author) commented Feb 15, 2025

Migrate the ATen op decomp library for ONNX ("torchlib") from ONNX Script with necessary changes in the onnx registry.

The migration

"torchlib" is what we the decomp library from aten ops to onnx in the onnxscript project. (name doesn't matter, can be changed.) It is the improved version of the "symbolic functions" in torch/onnx implemented using onnxscript, a graph builder for onnx. Since PyTorch 2.1, it has been a dependency for torch.onnx via onnxscript and has been powering the torch.onnx exporter.

torchlib was hosted in onnxscript for rapid evolution. However, it is time to migrate the logic into PyTorch because:

  1. The logic is developed for, and belongs to, torch.onnx; it is the equivalent of the ONNX "symbolic functions" for FX graphs
  2. Migrating to PyTorch decouples torch.onnx from logic in onnxscript, which is a good thing.
  3. Maintaining its compatibility across multiple PyTorch versions is becoming harder and harder. After migration we can evolve the logic with ATen operators without having to worry about backward compatibility for different PyTorch versions
  4. We can use newer opsets by default, again without having to worry about BC. The proposal is to upgrade to opset 21 (from opset 18, released 2 years ago) for torch 2.7. This makes it easier for developers and users to leverage new dtypes and new operators like the corrected GroupNormalization.

Test and infra impact

The tests leverage OpInfo and run only in an ONNX shard. The added tests finish within 40 seconds on a 10-core local machine; a sketch of the test structure is shown below.
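For illustration, an OpInfo-driven test is typically structured as follows; the class name, the chosen ops, and the assertion are placeholders (the actual migrated tests export to ONNX and compare numerics), so this is only a sketch of the test shape.

```python
# Sketch of an OpInfo-parametrized test; names and assertions are placeholders.
import torch
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests,
    ops,
)
from torch.testing._internal.common_methods_invocations import op_db
from torch.testing._internal.common_utils import TestCase, run_tests


class TestOnnxDecompSketch(TestCase):
    # Parametrize over a couple of ops from op_db; the real suite covers many more.
    @ops([op for op in op_db if op.name in ("abs", "add")], allowed_dtypes=(torch.float32,))
    def test_output_match(self, device, dtype, op):
        for sample in op.sample_inputs(device, dtype):
            expected = op(sample.input, *sample.args, **sample.kwargs)
            # The real test would export the op to ONNX and compare against
            # this eager result; here we only sanity-check that it runs.
            self.assertIsNotNone(expected)


instantiate_device_type_tests(TestOnnxDecompSketch, globals(), only_for="cpu")

if __name__ == "__main__":
    run_tests()
```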

No new dependencies are introduced. Packaging and test activities should remain the same.

State of the migrated code

The migrated code is lifted from https://github.com/microsoft/onnxscript/tree/main/onnxscript/function_libs/torch_lib. It has been reviewed by the same set of reviewers who own the torch.onnx component.

Next steps

The follow-up PRs will decouple the implementation from the ONNX Script type system to improve type checking, and bump the ONNX opset version used.

pytorchmergebot pushed a commit that referenced this issue Feb 19, 2025
This PR sets up the registry to accept onnx decomp functions to be moved into PyTorch (#139301).

The ops from onnx script are currently appended to the registry. When the ops are moved into PyTorch, the moved ops take precedence because they appear first in the registry list.

After the migration, the hooks for loading ops from onnx script will be removed.

1. Use a private field `_pt_onnx_signature` to store function signatures to avoid conflicts
2. Update the registry to record the signature in OnnxDecompMeta and update the dispatcher to leverage the data structure
3. Update the registry to prepare for onnx op registration, and update the onnx_impl decorator to support a no_compile option

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Pull Request resolved: #147396
Approved by: https://github.com/titaiwangms
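The precedence behavior described in this commit can be illustrated with a small sketch; the class and method names below are simplified stand-ins for the actual registry and `OnnxDecompMeta`, not the real torch.onnx code.

```python
# Simplified sketch of registry precedence; not the actual torch.onnx registry.
from dataclasses import dataclass
from typing import Callable


@dataclass
class DecompEntry:  # stand-in for OnnxDecompMeta
    onnx_function: Callable
    fx_target: str


class Registry:
    def __init__(self) -> None:
        self._registry: dict[str, list[DecompEntry]] = {}

    def register(self, target: str, fn: Callable, *, from_onnxscript: bool) -> None:
        entries = self._registry.setdefault(target, [])
        entry = DecompEntry(onnx_function=fn, fx_target=target)
        if from_onnxscript:
            entries.append(entry)     # onnxscript ops are appended (lower precedence)
        else:
            entries.insert(0, entry)  # ops migrated into PyTorch appear first

    def dispatch(self, target: str) -> Callable:
        # The dispatcher picks the first matching entry, so migrated ops win.
        return self._registry[target][0].onnx_function
```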
pytorchmergebot pushed a commit that referenced this issue Feb 19, 2025
Create scaffold for onnx op test data and common logic. This PR creates the scaffolding for new onnx decomp functions described in #139301. It adds two ops: abs and add, and enables the related tests.

#139301
Pull Request resolved: #147392
Approved by: https://github.com/titaiwangms
ghstack dependencies: #147396
albanD (Collaborator) commented Feb 27, 2025

Thanks for the details @justinchuby !
I wanted to share a bit more context on where we're coming from here to clarify our position. The bottom line here is really that we want to keep things stable (in our infra and code) and maintainable (code debt slowing down development is a growing concern).

That being said, this kind of major migration is sometimes needed, but such migrations are always disruptive to users and maintainers and so need care.
In particular, in principle, I would expect:

  • The older system that this is replacing must be deleted or at least moved out. It is not ok to say this will be done in the future. It has happened many times and "it will be done later" <=> "it will never be done".
  • The PR needs to be split to allow smooth integration within infra and review (each reviewer only having to look at the relevant piece).
  • There must be a clear articulation on how the whole code sent here is tested. Splitting the PR also helps this story significantly.

For the particular ask here, I would say:

  • I 100% agree that this new export system is great and we do want to find a way to get it into core.
  • I don't really buy the argument that "we did it out of core to move fast" and at the same time "it is reviewed by the same people with the same standard as core". If that were true, why do you need to be out of core to move fast?
  • I think the PR should be split into scaffolding + a couple of example ops + their testing, and then all the other ops. How to best build such scaffolding and integrate it into the testing infra would most likely benefit from a review by core maintainers. But a 20k-line PR makes that impossible.
  • The PR split should make it clear how everything being added is tested, to give everyone confidence that everything works and will continue working.
  • There needs to be a concrete plan to remove the logic that this replaces.

justinchuby (Collaborator, Author) commented Feb 27, 2025

Thanks! For the points you shared:

The bottom line here is really that we want to keep things stable (in our infra and code) and maintainable (code debt slowing down development is a growing concern).

I fully agree with the goals here.

The older system that this is replacing must be deleted or at least moved out. It is not ok to say this will be done in the future. It has happened many times and "it will be done later" <=> "it will never be done".

Currently it is part of the new ONNX exporter that works on FX, so the logic itself is not "replacing" anything. The logic, along with the whole export flow, will replace the jit/torchscript-based ONNX exporter. It is about as removable as TorchScript itself (i.e., not immediately, because of the vast number of legacy dependents), but we are encouraging users to migrate to the new flow.

The PR needs to be split to allow smooth integration within infra and review (each reviewer only having to look at the relevant piece).
There must be a clear articulation on how the whole code sent here is tested. Splitting the PR also helps this story significantly.

I think the PR should be split into scaffolding + a couple example ops + their testing and all the other ops.

Sure, that has been done according to your suggestion.

If that were true, why do you need to be out of core to move fast?

That is because the CI runtime in onnxscript is significantly smaller (on the order of 2-5 minutes), which allows faster iteration. We also had more frequent releases, so users were able to use the latest updates sooner than they could with PyTorch. In short, every one of the PRs was peer-reviewed with the same (high) standard we hold ourselves to, and can be traced in onnxscript. The logic is also used in production.

There needs to be a concrete plan to remove the logic that this replaces.

As noted above: the logic, along with the whole export flow, will replace the jit/torchscript-based ONNX exporter. It is about as removable as TorchScript itself (i.e., not immediately, because of the vast number of legacy dependents), but we are encouraging users to migrate to the new flow.

justinchuby (Collaborator, Author) commented

@albanD, let us know whether the info above addresses some of your concerns. The migration would also avoid issues like microsoft/onnxscript#2085, which affects development in core.

justinchuby (Collaborator, Author) commented Mar 10, 2025

Since there has not been much movement in the discussion and we are blocked on the migration, I am going to create split-up PRs and handle the migration piecemeal instead. cc @MaanavD @xadupre @gramalingam @titaiwangms

@justinchuby justinchuby modified the milestones: 2.7.0, 2.8.0 Mar 13, 2025