[invoke_subgraph] Force the output stride to be same as eager #152806
base: gh/anijain2305/753/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152806
Note: links to docs will display an error until the docs builds have completed.
❗ 1 Active SEV. If your PR is affected, please view it below.
❌ 1 New Failure, 3 Unrelated Failures as of commit ba8920b with merge base fdadda2.
NEW FAILURE - the following job has failed.
BROKEN TRUNK - the following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…ger" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I'll wait for someone from Inductor to review.
…ger" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
…ger" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
…ger" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
…ger" cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
example_stride = handle_sym_expr(fake_outputs[idx].stride())
new_outputs.append(cls.require_exact_strides(output, example_stride))
I'm not sure this is right. Can Inductor passes change the fake_outputs in a way that makes them differ from eager?
If so, we need to record the meta vals at trace time, before passes run, and then use the metadata on them.
I guess this applies to the inputs of invoke_subgraph as well, then. Currently we rely on the meta vals of the inputs of invoke_subgraph, which could differ from eager because of graph passes.
Can you remind me why we want to force the input and output strides to be the same as eager? If we were not doing invoke_subgraph, Inductor is allowed to change intermediates in the graph to have whatever strides it wants, with some exceptions.
This is to reduce compile time. We compile a subgraph once and then reuse the same compiled output code on subsequent calls. Since the input strides can differ across subgraph call sites, we restride the inputs to a fixed layout at the beginning of each subgraph.
This allows us to reuse the output code of a subgraph. This is very important for compile time; otherwise the major benefits of invoke_subgraph are not realized.
It is possible that the restriding is not to the eager strides but to strides produced after Inductor graph passes have run. Nevertheless, it is a fixed and valid set of input strides.
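A minimal sketch of the restriding idea (hedged: the helper name and the copy strategy are assumptions for illustration, not the actual Inductor implementation):

import torch

def restride_input(t: torch.Tensor, target_stride: tuple) -> torch.Tensor:
    # Copy `t` into a buffer with the fixed stride layout the compiled
    # subgraph was specialized for, so a single compiled artifact serves
    # every call site regardless of how its inputs happen to be strided.
    if t.stride() == target_stride:
        return t  # already in the canonical layout; no copy needed
    out = torch.empty_strided(t.size(), target_stride, dtype=t.dtype, device=t.device)
    out.copy_(t)
    return out

The copy is only paid when a call site's input layout deviates from the canonical one, which is the price of reusing one compiled subgraph body across call sites.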
We have some infrastructure to do this already (for inputs), check out
pytorch/torch/fx/experimental/proxy_tensor.py
Lines 1127 to 1134 in bc11afd
if _should_save_eager_input_vals(target, (args, kwargs)):
    # NOTE "eager_input_vals"
    # We save the original (args, kwargs) FakeTensor values for nodes
    # that have exact stride requirements. This is useful downstream.
    # We use this information inside Inductor to ensure that inputs to
    # stride-sensitive operators have the correct strides.
    arg_inp, kwarg_inp = torch.fx.node.map_aggregate((args, kwargs), map_fn)  # type: ignore[misc, arg-type]
    node.meta["eager_input_vals"] = (arg_inp, kwarg_inp)
Yea, let's use the above mechanism.
I can use this for the inputs. Is there anything for the output strides? The pointer is only for the inputs, but I also want to constrain the outputs.
@@ -7515,6 +7519,17 @@ def create_output(output: IRNode, ind: int):
    skip_size_stride_alignment_checks=True,
)

# Force the output strides to be same as the original strides
This needs a test at the very least. You can add an invoke_subgraph node, then do a graph pass that changes the outputs in the invoke_subgraph subgraph, and then check to make sure the strides are still what you expect.
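A rough sketch of the stride-checking skeleton for such a test (hedged: this only compares compiled output strides against eager via plain torch.compile; the invoke_subgraph wrapping and the output-mutating graph pass suggested above are not shown):

import torch

def f(x):
    # transpose makes the eager output non-contiguous, so a compiled
    # version that ignored eager output strides would be likely to
    # return a different (e.g. contiguous) layout
    return (x * 2).transpose(0, 1)

x = torch.randn(4, 8)
eager_out = f(x)
compiled_out = torch.compile(f)(x)
# the property under test: compiled output strides match eager
assert compiled_out.stride() == eager_out.stride()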
Yes, I was not able to get a test working.
I was looking at a regression when I wrapped the whole model in invoke_subgraph. When I diffed the output code, I saw an extra kernel after the invoke_subgraph call, even though there was no operation outside the invoke_subgraph call. So this PR was my attempt to make the output strides of invoke_subgraph the same as the eager output, to avoid that extra kernel. This fixed the regression. But after your comment about passes changing meta vals, I am not sure if this is correct (or what the solution should be to avoid the extra kernel).
Let me know when you want review! Stride issues can be very tricky and it's worth pushing on this, imo.
Definitely on my todo list... just need time to understand the Inductor codebase more to do this.
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov