[ONNX] Migrate torchlib from onnxscript #139301

Open
justinchuby opened this issue Oct 30, 2024 · 5 comments

justinchuby (Collaborator) commented Oct 30, 2024

Now that the torchlib translation library is close to stable in onnxscript, it is ready to be migrated into PyTorch so that

  1. It can evolve with aten operators without having to worry about backward compatibility for different PyTorch versions
  2. We can use newer opsets by default, again without having to worry about BC. The proposal is to upgrade to opset 22 for torch 2.6 or torch 2.7. This makes it easier for us and users to leverage new dtypes and the corrected GroupNormalization.

Considerations

  1. We should still allow implementations in onnxscript to override those in PyTorch, so we can continue to add new implementations after a torch version is released. This will be part of the stable framework APIs in onnxscript (see the sketch after this list).
  2. We can update the default opset version for each torch version.
  3. The opinfo unit tests should be migrated as well.
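To make considerations 1 and 2 concrete, here is a minimal sketch, assuming a plain dictionary-based lookup; every name in it (`_IN_TREE_TORCHLIB`, `_ONNXSCRIPT_OVERRIDES`, `resolve_decomp`, `DEFAULT_OPSET_BY_TORCH_VERSION`) is hypothetical, and the opset numbers are for illustration only, not the actual torch.onnx or onnxscript APIs.

```python
# Hypothetical sketch only; names and opset numbers are illustrative.
from typing import Callable, Optional

# Implementations shipped inside a given PyTorch release.
_IN_TREE_TORCHLIB: dict[str, Callable] = {}
# Overrides registered later through a stable onnxscript hook (consideration 1).
_ONNXSCRIPT_OVERRIDES: dict[str, Callable] = {}

# Consideration 2: each torch release can pick its own default opset.
DEFAULT_OPSET_BY_TORCH_VERSION = {"2.5": 18, "2.6": 22}


def resolve_decomp(aten_name: str) -> Optional[Callable]:
    """An onnxscript override wins over the in-tree copy, so new or fixed
    implementations can still ship after a torch version is released."""
    return _ONNXSCRIPT_OVERRIDES.get(aten_name, _IN_TREE_TORCHLIB.get(aten_name))
```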
@justinchuby justinchuby added module: onnx Related to torch.onnx triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Oct 30, 2024
@justinchuby justinchuby self-assigned this Oct 30, 2024
@justinchuby justinchuby changed the title [ONNX][RFC] Migrate torchlib from onnxscript [ONNX][RFC] Migrate torchlib from onnxscript (torch 2.6/2.7) Oct 30, 2024
@justinchuby justinchuby added this to the 2.6.0 milestone Oct 30, 2024
@justinchuby justinchuby modified the milestones: 2.6.0, 2.7.0 Nov 19, 2024
@justinchuby justinchuby changed the title [ONNX][RFC] Migrate torchlib from onnxscript (torch 2.6/2.7) [ONNX][RFC] Migrate torchlib from onnxscript Jan 23, 2025
@justinchuby justinchuby changed the title [ONNX][RFC] Migrate torchlib from onnxscript [ONNX] Migrate torchlib from onnxscript Feb 4, 2025
justinchuby (Collaborator, Author) commented Feb 15, 2025

Migrate the ATen op decomp library for ONNX ("torchlib") from ONNX Script with necessary changes in the onnx registry.

The migration

"torchlib" is what we the decomp library from aten ops to onnx in the onnxscript project. (name doesn't matter, can be changed.) It is the improved version of the "symbolic functions" in torch/onnx implemented using onnxscript, a graph builder for onnx. Since PyTorch 2.1, it has been a dependency for torch.onnx via onnxscript and has been powering the torch.onnx exporter.

torchlib was hosted in onnxscript for rapid evolution. However, it is time to migrate the logic into PyTorch because:

  1. The logic is developed for, and belongs to, torch.onnx; it is the equivalent of the ONNX "symbolic functions" for FX graphs
  2. Migrating to PyTorch decouples torch.onnx from logic in onnxscript, which is a good thing.
  3. Maintaining its compatibility across multiple PyTorch versions is becoming harder and harder. After migration we can evolve the logic with ATen operators without having to worry about backward compatibility for different PyTorch versions
  4. We can use newer opsets by default, again without having to worry about BC. The proposal is to upgrade to opset 21 (from opset 18, released 2 years ago) for torch 2.7. This makes it easier for developers and users to leverage new dtypes and new operators like the corrected GroupNormalization.

Test and infra impact

The tests leverage OpInfo and run only in an ONNX shard. The added tests finish within 40 seconds on a 10-core local machine; a sketch of the test structure is shown below.
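For illustration, an OpInfo-driven test is typically structured as follows; the class name, the chosen ops, and the assertion are placeholders (the actual migrated tests export to ONNX and compare numerics), so this is only a sketch of the test shape.

```python
# Sketch of an OpInfo-parametrized test; names and assertions are placeholders.
import torch
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests,
    ops,
)
from torch.testing._internal.common_methods_invocations import op_db
from torch.testing._internal.common_utils import TestCase, run_tests


class TestOnnxDecompSketch(TestCase):
    # Parametrize over a couple of ops from op_db; the real suite covers many more.
    @ops([op for op in op_db if op.name in ("abs", "add")], allowed_dtypes=(torch.float32,))
    def test_output_match(self, device, dtype, op):
        for sample in op.sample_inputs(device, dtype):
            expected = op(sample.input, *sample.args, **sample.kwargs)
            # The real test would export the op to ONNX and compare against
            # this eager result; here we only sanity-check that it runs.
            self.assertIsNotNone(expected)


instantiate_device_type_tests(TestOnnxDecompSketch, globals(), only_for="cpu")

if __name__ == "__main__":
    run_tests()
```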

No new dependencies are introduced. Packaging and test activities should remain the same.

State of the migrated code

The migrated code is lifted from https://github.com/microsoft/onnxscript/tree/main/onnxscript/function_libs/torch_lib. It has been reviewed by the same set of reviewers who own the torch.onnx component.

Next steps

The follow-up PRs will decouple the implementation from the ONNX Script type system to improve type checking, and bump the ONNX opset version used.

pytorchmergebot pushed a commit that referenced this issue Feb 19, 2025
This PR sets up the registry to accept onnx decomp functions to be moved into PyTorch (#139301).

The ops from onnx script are currently appended to the registry. When the ops are moved into PyTorch, the moved ops take precedence because they appear first in the registry list.

After the migration, the hooks for loading ops from onnx script will be removed.

1. Use a private field `_pt_onnx_signature` to store function signatures to avoid conflicts
2. Update the registry to record the signature in OnnxDecompMeta and update the dispatcher to leverage the data structure
3. Update the registry to prepare for onnx op registration, and update the onnx_impl decorator to support a no_compile option

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Pull Request resolved: #147396
Approved by: https://github.com/titaiwangms
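The precedence behavior described in this commit can be illustrated with a small sketch; the class and method names below are simplified stand-ins for the actual registry and `OnnxDecompMeta`, not the real torch.onnx code.

```python
# Simplified sketch of registry precedence; not the actual torch.onnx registry.
from dataclasses import dataclass
from typing import Callable


@dataclass
class DecompEntry:  # stand-in for OnnxDecompMeta
    onnx_function: Callable
    fx_target: str


class Registry:
    def __init__(self) -> None:
        self._registry: dict[str, list[DecompEntry]] = {}

    def register(self, target: str, fn: Callable, *, from_onnxscript: bool) -> None:
        entries = self._registry.setdefault(target, [])
        entry = DecompEntry(onnx_function=fn, fx_target=target)
        if from_onnxscript:
            entries.append(entry)     # onnxscript ops are appended (lower precedence)
        else:
            entries.insert(0, entry)  # ops migrated into PyTorch appear first

    def dispatch(self, target: str) -> Callable:
        # The dispatcher picks the first matching entry, so migrated ops win.
        return self._registry[target][0].onnx_function
```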
pytorchmergebot pushed a commit that referenced this issue Feb 19, 2025
Create scaffold for onnx op test data and common logic. This PR creates the scaffolding for new onnx decomp functions described in #139301. It adds two ops: abs and add, and enables the related tests.

#139301
Pull Request resolved: #147392
Approved by: https://github.com/titaiwangms
ghstack dependencies: #147396
albanD (Collaborator) commented Feb 27, 2025

Thanks for the details @justinchuby !
I wanted to share a bit more context on where we're coming from here to clarify our position. The bottom line here is really that we want to keep things stable (in our infra and code) and maintainable (code debt slowing down development is a growing concern).

That being said, this kind of major migration is sometimes needed, but such migrations are always disruptive to users and maintainers and so need care.
In particular, in principle, I would expect:

  • The older system that this is replacing must be deleted or at least moved out. It is not ok to say this will be done in the future. It has happened many times and "it will be done later" <=> "it will never be done".
  • The PR needs to be split to allow smooth integration within infra and review (each reviewer only having to look at the relevant piece).
  • There must be a clear articulation on how the whole code sent here is tested. Splitting the PR also helps this story significantly.

For the particular ask here, I would say:

  • I 100% agree that this new export system is great and we do want to find a way to get it into core.
  • I don't really buy the argument that "we did it out of core to move fast" and at the same time "it is reviewed by the same people with the same standard as core". If that were true, why do you need to be out of core to move fast?
  • I think the PR should be split into scaffolding + a couple of example ops + their testing, and then all the other ops. How to best build such scaffolding and integrate it into the testing infra would most likely benefit from a review by core maintainers. But a 20k-line PR makes that impossible.
  • The PR split should make it clear how everything being added is tested, to give everyone confidence that everything works and will continue working.
  • There needs to be a concrete plan to remove the logic that this replaces.

justinchuby (Collaborator, Author) commented Feb 27, 2025

Thanks! For the points you shared:

The bottom line here is really that we want to keep things stable (in our infra and code) and maintainable (code debt slowing down development is a growing concern).

I fully agree with the goals here.

The older system that this is replacing must be deleted or at least moved out. It is not ok to say this will be done in the future. It has happened many times and "it will be done later" <=> "it will never be done".

Currently it is part of the new ONNX exporter that works on FX, so the logic itself is not "replacing" anything. The logic, along with the whole export flow, will replace the jit/torchscript-based ONNX exporter. It is about as removable as TorchScript itself (i.e., not immediately, because of the vast number of legacy dependents), but we are encouraging users to migrate to the new flow.

The PR needs to be split to allow smooth integration within infra and review (each reviewer only having to look at the relevant piece).
There must be a clear articulation on how the whole code sent here is tested. Splitting the PR also helps this story significantly.

I think the PR should be split into scaffolding + a couple example ops + their testing and all the other ops.

Sure, that has been done according to your suggestion.

If that were true, why do you need to be out of core to move fast?

That is because the CI runtime in onnxscript is significantly smaller (on the order of 2-5 minutes), which allows faster iteration. We also had more frequent releases, so users were able to use the latest updates sooner than they could with PyTorch. In short, every one of the PRs was peer-reviewed with the same (high) standard we hold ourselves to, and can be traced in onnxscript. The logic is also used in production.

There needs to be a concrete plan to remove the logic that this replaces.

As noted above: the logic, along with the whole export flow, will replace the jit/torchscript-based ONNX exporter. It is about as removable as TorchScript itself (i.e., not immediately, because of the vast number of legacy dependents), but we are encouraging users to migrate to the new flow.

justinchuby (Collaborator, Author) commented

@albanD, let us know whether the info above addresses some of your concerns. The migration would also avoid issues like microsoft/onnxscript#2085, which affects development in core.

justinchuby (Collaborator, Author) commented Mar 10, 2025

Since there has not been much movement in the discussion and we are blocked on the migration, I am going to create split-up PRs and handle the migration piecemeal instead. cc @MaanavD @xadupre @gramalingam @titaiwangms

@justinchuby justinchuby modified the milestones: 2.7.0, 2.8.0 Mar 13, 2025