[ONNX] Add support for torch.cond/HOP in onnx exporter by xadupre · Pull Request #137428 · pytorch/pytorch · GitHub

[ONNX] Add support for torch.cond/HOP in onnx exporter #137428


Closed
wants to merge 47 commits

Conversation

Collaborator
@xadupre xadupre commented Oct 7, 2024

This PR implements the framework for supporting higher-order operators (HOPs) in the ONNX exporter. Refer to #140995 for the design.

  • Implement support for torch.cond
  • Refactor _add_nodes into _translate_fx_graph to handle nested subgraphs. To support building subgraphs as functions using the same logic, new handlers for placeholder and output nodes are added to register inputs and outputs on the ONNX function.
  • Functions are created under the pkg.torch.__subgraph__ domain.
  • Update the type promotion pass to run on nested subgraphs.
  • Implement torch.cond in _torchlib/ops/hop.py and update the registry to discover these ops.
  • Improve opset_import handling robustness with the add_opset_imports IR pass. To achieve this, the opset version is added to all nodes. Fixes [ONNX] Add opset version to individual nodes when building the graph #139503

Fixes #117655 Fixes #123972 Ref #93743 Closes #140995
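
As a rough illustration of what this enables (the model, shapes, and values below are made up for this description, not taken from the PR's tests), a torch.cond model can now go through the dynamo-based exporter:

```python
import torch


class CondModel(torch.nn.Module):
    def forward(self, x):
        def true_fn(x):
            return x.sin()

        def false_fn(x):
            return x.cos()

        # torch.cond is a higher-order op; the exporter lowers it to an ONNX If
        # node whose branches call functions in the pkg.torch.__subgraph__ domain.
        return torch.cond(x.sum() > 0, true_fn, false_fn, (x,))


onnx_program = torch.onnx.export(CondModel(), (torch.randn(3),), dynamo=True)
```

The branch subgraphs end up as ONNX functions rather than inlined nodes, which keeps the main graph readable and lets the same translation logic handle nested HOPs.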

pytorch-bot bot commented Oct 7, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/137428

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 3 New Failures, 1 Unrelated Failure

As of commit 9f081b8 with merge base 72943ba:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: onnx torch.onnx related changes that should show up in the release notes label Oct 7, 2024
@xadupre xadupre changed the title [ONNX] Add support for torch.cond in onnx exporter [ONNX] [WIP] Add support for torch.cond in onnx exporter Oct 7, 2024
@justinchuby justinchuby self-assigned this Oct 7, 2024
@titaiwangms titaiwangms added topic: bug fixes topic category module: onnx Related to torch.onnx labels Oct 7, 2024
@justinchuby
Collaborator

Function vs subgraph: Could you share more details on how the fx subgraphs are represented differently from ONNX subgraphs? This can inform us on what the best representation is.
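
(For context, a hedged sketch of the fx side; the module here is made up. In the ExportedProgram, torch.cond shows up as a call to torch.ops.higher_order.cond whose branches are nested fx.GraphModule attributes, whereas ONNX nests branches as graph attributes of an If node, or calls functions as in this PR.)

```python
import torch


class M(torch.nn.Module):
    def forward(self, x):
        def true_fn(x):
            return x + 1

        def false_fn(x):
            return x - 1

        return torch.cond(x.sum() > 0, true_fn, false_fn, (x,))


ep = torch.export.export(M(), (torch.randn(3),))
# The printed graph contains (roughly) a node like
#   torch.ops.higher_order.cond(gt, true_graph_0, false_graph_0, (x,))
# where true_graph_0 / false_graph_0 are nested GraphModules on ep.graph_module.
print(ep.graph_module.graph)
```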

Collaborator
@justinchuby justinchuby left a comment


Thanks!

  1. I think we should isolate the logic for handling a particular operator from the _core module. Any logic there should be generic for handling HOPs, and cond-specific logic can live in a separate location that can later be integrated into torchlib (when it is migrated over).
  2. In general, we should avoid any proto object manipulation in the exporter directory, as it introduces unnecessary overhead and creates inconsistencies in how we manipulate the ONNX graph.

@xadupre
Collaborator Author
xadupre commented Nov 4, 2024

Thanks!

  1. I think we should isolate the logic for handling a particular operator from the _core module. Any logic there should be generic for handling HOPs, and cond-specific logic can live in a separate location that can later be integrated into torchlib (when it is migrated over).
  2. In general, we should avoid any proto object manipulation in the exporter directory, as it introduces unnecessary overhead and creates inconsistencies in how we manipulate the ONNX graph.
  1. Let's do it in two steps: first this one, then HOPs.
  2. That would be better. I'll give it a try. My first goal was to check that it works.

@xadupre xadupre changed the title [ONNX] [WIP] Add support for torch.cond in onnx exporter [ONNX] Add support for torch.cond in onnx exporter Nov 5, 2024
@justinchuby
Collaborator

@xadupre and I talked and we have some ideas to simplify some of the function calls. I will play with it a little and update here if I make any progress

fx_graph = module.graph

graph_like: ir.Graph | ir.Function
if name == "":
Collaborator


Do you think torch.export would change this assumption? I am a bit concerned.

Collaborator


Do you mean the name? I think the root module always has an empty name, but I should double-check.

Collaborator


Asked in Slack

@justinchuby justinchuby added this to the 2.6.0 milestone Nov 20, 2024
@justinchuby
Collaborator
<
    ir_version=9,
    opset_imports={'pkg.onnxscript.torch_lib.common': 1, '': 18, 'pkg.torch.__subgraph__': 1, 'pkg.onnxscript.torch_lib': 1},
    producer_name='pytorch',
    producer_version='2.6.0a0+git3cd6dd5',
    domain=None,
    model_version=None,
>
graph(
    name=main_graph,
    inputs=(
        %"x"<INT64,[2]>
    ),
    outputs=(
        %"getitem"<FLOAT,[2]>
    ),
    initializers=(
        %"weight"<FLOAT,[1]>,
        %"submodule.weight"<FLOAT,[1]>
    ),
) {
    0 |  # node_ReduceSum_0
         %"sum_1"<INT64,[]> ⬅️ ::ReduceSum(%"x") {keepdims=False, noop_with_empty_axes=0}
    1 |  # node_Constant_1
         %"val_0"<?,?> ⬅️ ::Constant() {value=Tensor<INT64,[]>(array(0), name=None)}
    2 |  # node_Greater_2
         %"gt"<BOOL,[]> ⬅️ ::Greater(%"sum_1", %"val_0")
    3 |  # node_If_3
         %"getitem"<FLOAT,[2]> ⬅️ ::If(%"gt") {then_branch=
             graph(
                 name=true_graph_0,
                 inputs=(

                 ),
                 outputs=(
                     %"getitem_true_graph_0"<?,?>
                 ),
             ) {
                 0 |  # node_true_graph_0_0
                      %"getitem_true_graph_0"<?,?> ⬅️ pkg.torch.__subgraph__::true_graph_0(%"weight", %"x", %"submodule.weight")
                 return %"getitem_true_graph_0"<?,?>
             }, else_branch=
             graph(
                 name=false_graph_0,
                 inputs=(

                 ),
                 outputs=(
                     %"sub_false_graph_0"<?,?>
                 ),
             ) {
                 0 |  # node_false_graph_0_0
                      %"sub_false_graph_0"<?,?> ⬅️ pkg.torch.__subgraph__::false_graph_0(%"weight", %"x", %"submodule.weight")
                 return %"sub_false_graph_0"<?,?>
             }}
    return %"getitem"<FLOAT,[2]>
}

<
    opset_imports={'': 18},
>
def pkg.torch.__subgraph__::false_graph_0(
    inputs=(
        %"p_weight"<FLOAT,[1]>,
        %"x"<INT64,[2]>,
        %"p_submodule_weight"<FLOAT,[1]>
    ),
    outputs=(
        %"sub"<FLOAT,[2]>
    ),
) {
    0 |  # node_Cast_0
         %"convert_element_type_default"<FLOAT,[2]> ⬅️ ::Cast(%"x") {to=FLOAT}
    1 |  # node_Sub_1
         %"sub"<FLOAT,[2]> ⬅️ ::Sub(%"convert_element_type_default", %"p_weight")
    return %"sub"<FLOAT,[2]>
}

<
    opset_imports={'pkg.onnxscript.torch_lib': 1},
>
def pkg.torch.__subgraph__::true_graph_0__false_graph_0(
    inputs=(
        %"p_submodule_weight"<FLOAT,[1]>,
        %"sub"<FLOAT,[2]>
    ),
    outputs=(
        %"div"<FLOAT,[2]>
    ),
) {
    0 |  # node_aten_div_0
         %"div"<FLOAT,[2]> ⬅️ pkg.onnxscript.torch_lib::aten_div(%"sub", %"p_submodule_weight")
    return %"div"<FLOAT,[2]>
}

<
    opset_imports={'': 18},
>
def pkg.onnxscript.torch_lib::aten_div(
    inputs=(
        %"self"<?,?>,
        %"other"<?,?>
    ),
    outputs=(
        %"return_val"<?,?>
    ),
) {
    0 |  # n0
         %"return_val"<?,?> ⬅️ ::Div(%"self", %"other")
    return %"return_val"<?,?>
}

<
    opset_imports={'': 18},
>
def pkg.torch.__subgraph__::true_graph_0__true_graph_0(
    inputs=(
        %"p_submodule_weight"<FLOAT,[1]>,
        %"sub"<FLOAT,[2]>
    ),
    outputs=(
        %"mul"<FLOAT,[2]>
    ),
) {
    0 |  # node_Mul_0
         %"mul"<FLOAT,[2]> ⬅️ ::Mul(%"sub", %"p_submodule_weight")
    return %"mul"<FLOAT,[2]>
}

<
    opset_imports={'': 18, 'pkg.torch.__subgraph__': 1},
>
def pkg.torch.__subgraph__::true_graph_0(
    inputs=(
        %"p_weight"<FLOAT,[1]>,
        %"x"<INT64,[2]>,
        %"p_submodule_weight"<FLOAT,[1]>
    ),
    outputs=(
        %"getitem"<FLOAT,[2]>
    ),
) {
    0 |  # node_Cast_0
         %"convert_element_type_default"<FLOAT,[2]> ⬅️ ::Cast(%"x") {to=FLOAT}
    1 |  # node_Sub_1
         %"sub"<FLOAT,[2]> ⬅️ ::Sub(%"convert_element_type_default", %"p_weight")
    2 |  # node_ReduceSum_2
         %"sum_1"<FLOAT,[]> ⬅️ ::ReduceSum(%"sub") {keepdims=False, noop_with_empty_axes=0}
    3 |  # node_Constant_3
         %"val_0"<?,?> ⬅️ ::Constant() {value=Tensor<INT64,[]>(array(0), name=None)}
    4 |  # node_Cast_4
         %"scalar_tensor_default"<FLOAT,[]> ⬅️ ::Cast(%"val_0") {to=FLOAT}
    5 |  # node_LessOrEqual_5
         %"le"<BOOL,[]> ⬅️ ::LessOrEqual(%"sum_1", %"scalar_tensor_default")
    6 |  # node_If_6
         %"getitem"<FLOAT,[2]> ⬅️ ::If(%"le") {then_branch=
             graph(
                 name=true_graph_0__true_graph_0,
                 inputs=(

                 ),
                 outputs=(
                     %"mul_true_graph_0__true_graph_0"<?,?>
                 ),
             ) {
                 0 |  # node_true_graph_0__true_graph_0_0
                      %"mul_true_graph_0__true_graph_0"<?,?> ⬅️ pkg.torch.__subgraph__::true_graph_0__true_graph_0(%"p_submodule_weight", %"sub")
                 return %"mul_true_graph_0__true_graph_0"<?,?>
             }, else_branch=
             graph(
                 name=true_graph_0__false_graph_0,
                 inputs=(

                 ),
                 outputs=(
                     %"div_true_graph_0__false_graph_0"<?,?>
                 ),
             ) {
                 0 |  # node_true_graph_0__false_graph_0_0
                      %"div_true_graph_0__false_graph_0"<?,?> ⬅️ pkg.torch.__subgraph__::true_graph_0__false_graph_0(%"p_submodule_weight", %"sub")
                 return %"div_true_graph_0__false_graph_0"<?,?>
             }}
    return %"getitem"<FLOAT,[2]>
}

<
    opset_imports={'': 18},
>
def pkg.onnxscript.torch_lib.common::Rank(
    inputs=(
        %"input"<?,?>
    ),
    outputs=(
        %"return_val"<?,?>
    ),
) {
    0 |  # n0
         %"tmp"<?,?> ⬅️ ::Shape(%"input")
    1 |  # n1
         %"return_val"<?,?> ⬅️ ::Size(%"tmp")
    return %"return_val"<?,?>
}

<
    opset_imports={'': 18},
>
def pkg.onnxscript.torch_lib.common::IsScalar(
    inputs=(
        %"input"<?,?>
    ),
    outputs=(
        %"return_val"<?,?>
    ),
) {
    0 |  # n0
         %"tmp"<?,?> ⬅️ ::Shape(%"input")
    1 |  # n1
         %"tmp_0"<?,?> ⬅️ ::Size(%"tmp")
    2 |  # n2
         %"tmp_1"<?,?> ⬅️ ::Constant() {value_int=0}
    3 |  # n3
         %"return_val"<?,?> ⬅️ ::Equal(%"tmp_0", %"tmp_1")
    return %"return_val"<?,?>
}
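
For reference, a rough reconstruction (the class names and parameter values are assumptions inferred from the IR above, not the PR's actual test code) of a nested torch.cond module that would produce output of this shape:

```python
import torch


class Submodule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.tensor([2.0]))

    def forward(self, x):
        # Inner cond: becomes true_graph_0__true_graph_0 / true_graph_0__false_graph_0
        def true_fn(x):
            return x * self.weight

        def false_fn(x):
            return x / self.weight

        return torch.cond(x.sum() <= 0, true_fn, false_fn, (x,))


class NestedCondModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.tensor([1.0]))
        self.submodule = Submodule()

    def forward(self, x):
        # Outer cond: becomes true_graph_0 / false_graph_0 called from the main graph's If
        def true_fn(x):
            return self.submodule(x.to(torch.float32) - self.weight)

        def false_fn(x):
            return x.to(torch.float32) - self.weight

        return torch.cond(x.sum() > 0, true_fn, false_fn, (x,))


onnx_program = torch.onnx.export(
    NestedCondModel(), (torch.tensor([1, 2]),), dynamo=True
)
```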

@justinchuby
Collaborator

@pytorchbot merge -i

@justinchuby
Collaborator

Will address new comments in a follow-up PR

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 4 checks: pull / linux-focal-py3_9-clang9-xla / test (xla, 1, 1, lf.linux.12xlarge), pull / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, lf.linux.2xlarge), pull / linux-jammy-py3.9-gcc11 / test (docs_test, 1, 1, lf.linux.2xlarge), pull / linux-jammy-py3.9-gcc11 / test (backwards_compat, 1, 1, lf.linux.2xlarge)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

youssef62 pushed a commit to youssef62/pytorch that referenced this pull request Nov 23, 2024
This PR implements the framework for supporting HOP in the ONNX exporter. Refer to pytorch#140995 for the design.

- Implement support for torch.cond
- Refactor `_add_nodes` into `_translate_fx_graph` to handle nested subgraphs. To support building subgraphs as functions using the same logic, new handlers for `placeholder` and `output` nodes are added to register inputs and outputs on the onnx function.
- Functions are created under the domain of `pkg.torch.__subgraph__`
- Updated the type promotion pass to run on nested subgraphs.
- Implement torch.cond in `_torchlib/ops/hop.py`. Updated the registry to discover these ops.
- Improve opset_import handling robustness with `add_opset_imports` IR pass. To achieve this, we added opset version to all Nodes. Fixes pytorch#139503

Fixes pytorch#117655 Fixes pytorch#123972 Fixes pytorch#93743 Closes pytorch#140995

Pull Request resolved: pytorch#137428
Approved by: https://github.com/justinchuby

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Ryo-not-rio pushed a commit to Ryo-not-rio/pytorch that referenced this pull request Dec 2, 2024
pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged module: onnx Related to torch.onnx open source release notes: onnx torch.onnx related changes that should show up in the release notes topic: new features topic category triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Projects
None yet
7 participants