Recheck autotune cache on static cuda launcher load #153565
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153565
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure) As of commit 1b0a78e with merge base d81217b.
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
What's up with the gloo change?
autotune_cache_info["only_config"] = triton_config_to_hashable(configs[0])

if disabled:
    autotune_cache_info["autotune_cache_state"] = "force_disabled"
This is just moved code, but in the case of len(configs) == 1 AND disabled, is it intentional to overwrite the autotune_cache_state?
Yes, force disable should override all the other logging
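For what it's worth, a minimal sketch of that precedence, assuming the field names from the diff above; the surrounding structure and the state labels other than "force_disabled" are hypothetical:

```python
from typing import Any


def build_autotune_cache_info(
    configs: list,
    disabled: bool,
    cache_hit: bool,
    config_to_hashable=repr,  # stand-in for triton_config_to_hashable
) -> dict[str, Any]:
    info: dict[str, Any] = {}
    # Earlier branches record whatever state applies (hypothetical labels).
    info["autotune_cache_state"] = "hit" if cache_hit else "miss"
    if len(configs) == 1:
        info["only_config"] = config_to_hashable(configs[0])
    if disabled:
        # Force-disable intentionally overrides any state recorded above.
        info["autotune_cache_state"] = "force_disabled"
    return info
```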
@@ -187,6 +187,55 @@ def _dump_launch_params(args, kwargs, launcher, kernel_name, grid):
    f.write(f"{kernel_name} | {args_str} | {grid!r}\n")


def check_autotune_cache(
    configs, filename, inductor_meta
Can we annotate these?
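A possible shape for the annotations, as a sketch only; `Config`, `AutotuneCache`, and the return shape are assumptions rather than the merged signature:

```python
from __future__ import annotations

from typing import Any, Optional


def check_autotune_cache(
    configs: list[Config],            # candidate triton configs
    filename: Optional[str],          # generated kernel source path
    inductor_meta: dict[str, Any],    # inductor metadata controlling caching
) -> tuple[list[Config], Optional[AutotuneCache], dict[str, Any]]:
    """Return the (possibly filtered) configs, the cache handle, and logging info."""
    ...
```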
@@ -298,6 +347,39 @@ def is_statically_launchable(self):
    isinstance(x, StaticTritonCompileResult) for x in self.compile_results
)

def recheck_autotune_cache(self, reload_kernel_from_src) -> None:
Annotate reload_kernel_from_src?
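For instance, assuming the callback rebuilds the autotuner from kernel source (which is how `reload_kernel_from_src().fn` is used in the diff below), the annotation might be:

```python
from __future__ import annotations

from typing import Callable


class CachingAutotuner:  # placeholder so the sketch stands alone
    def recheck_autotune_cache(
        self,
        reload_kernel_from_src: Callable[[], CachingAutotuner],
    ) -> None:
        ...
```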
+1 gloo, also have a couple questions
)
self.autotune_cache_info = autotune_cache_info
# I.e. there was an autotune cache hit
if len(cached_configs) == 1 and len(configs) > 1:
Why do we need to check len(configs) > 1? We'd still coordinate_descent_tune with a single config, right?
If len(configs) == 1, we don't need to prune the list of configs or compile results, so there's no need to loop through the list. If coordesc tuning is on, then we'll start coordesc tuning immediately.
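To make the guard concrete, a self-contained sketch of the pruning step under discussion; the helper shapes are assumptions modeled on the diff, not the actual implementation:

```python
from typing import Any, Callable, List


def prune_to_cached_config(
    compile_results: List[Any],
    cached_configs: List[Any],
    configs: List[Any],
    config_to_hashable: Callable[[Any], Any],
) -> List[Any]:
    # With a single candidate config there is nothing to prune, so the loop is
    # skipped; if coordesc tuning is enabled it simply starts right away.
    if not (len(cached_configs) == 1 and len(configs) > 1):
        return compile_results
    best = config_to_hashable(cached_configs[0])
    # Keep only the compile result whose config matches the cached best config.
    return [r for r in compile_results if config_to_hashable(r.config) == best]
```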
if best_config.found_by_coordesc:
    with dynamo_timed("CachingAutotuner.slow_precompile_config"):
        if self.fn.fn is None:
            self.fn = reload_kernel_from_src().fn
Is there a reason we need to rely on best_config.found_by_coordesc? Should this be an assert, or should we always be reloading?
I think this can be an assert, yes: if best_config isn't in the list of compiled configs, it should always be because of coordesc.
That said, I was a little scared that I might have missed a case where it's possible for best_config to be not in our list... and it doesn't seem particularly helpful to crash in that case? We should just re-autotune and pretend it was a cache miss IMO
for compile_result in self.compile_results:
    if triton_config_to_hashable(compile_result.config) == best_config_hash:
Just so I'm following: if we were coordinate descent tuning and happened to have the best config from the start, then these would be equal?
Correct: the only case where it falls out of this for loop is if coordesc tuning finds a best_config that wasn't one of the precompiled options. In that case, we haven't saved anything in the cache and need to recompile.
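In other words, roughly (a sketch with assumed helper shapes, not the merged code):

```python
from typing import Any, Callable, List, Optional


def find_matching_compile_result(
    compile_results: List[Any],
    best_config_hash: Any,
    config_to_hashable: Callable[[Any], Any],
) -> Optional[Any]:
    for compile_result in compile_results:
        if config_to_hashable(compile_result.config) == best_config_hash:
            # Coordesc confirmed one of the precompiled configs: the hashes
            # match and the existing compile result is reused.
            return compile_result
    # Falling out of the loop means coordesc found a config that was never
    # precompiled; nothing was cached for it, so the caller recompiles,
    # effectively treating it as a cache miss.
    return None
```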
Fixed gloo
@pytorchbot merge
@jamesjwu your PR has been successfully reverted.
This reverts commit 02af4e8. Reverted #153565 on behalf of https://github.com/malfet due to Looks like it broke ROCM, see https://hud.pytorch.org/hud/pytorch/pytorch/ee72c53c884ce5d0cbdd50641557df5c5783afbf/1?per_page=50&name_filter=rocm%20%2F%20linux&mergeEphemeralLF=true ([comment](#153565 (comment)))
Fixed ROCM
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: #153725 Approved by: https://github.com/oulgen, https://github.com/Mingming-Ding, https://github.com/jansel ghstack dependencies: #153565
This pull request was exported from Phabricator. Differential Revision: D75078076
@pytorchbot revert -c ghfirst -m "sorry, but your pr failed at internal tests
❌ 🤖 pytorchbot command failed:
@pytorchbot revert -c ghfirst -m sorry, but your pr failed at internal tests see D75078076
❌ 🤖 pytorchbot command failed:
Try
@pytorchbot successfully started a revert job. Check the current status here.
Reverting PR 153565 failed. Reason: Comment with id 2898685388 not found. Details for Dev Infra team: raised by workflow job.
Internally, the static cuda launcher isn't enabled, so we need to always enable it. Differential Revision: [D75146584](https://our.internmc.facebook.com/intern/diff/D75146584/) Pull Request resolved: #154035 Approved by: https://github.com/Skylion007 ghstack dependencies: #153565
Stack from ghstack (oldest at bottom):
When loading statically launchable triton kernels from FxGraphCache, we don't instantiate a CachingAutotuner the way we normally do, so we need to recheck the autotune cache based on the existing compile results. If we get a hit, we take the compile result whose config matches the best config.
Sometimes, the best config will have come from coordinate descent tuning. In this case, FxGraphCache today does not cache the resulting triton kernel, with or without static cuda launcher, because coordinate descent tuning happens at runtime and the best config it finds may not be one of the precompiled configs.
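A rough sketch of that flow, tying the review snippets together; `check_autotune_cache` and `triton_config_to_hashable` refer to the helpers quoted in the diffs above and are assumed importable, and the glue code here is an approximation rather than the merged implementation:

```python
class CachingAutotunerSketch:
    def recheck_autotune_cache(self, reload_kernel_from_src) -> None:
        configs = [r.config for r in self.compile_results]
        cached_configs, autotune_cache, autotune_cache_info = check_autotune_cache(
            configs, self.filename, self.inductor_meta
        )
        self.autotune_cache_info = autotune_cache_info
        # One surviving config among several candidates means the autotune
        # cache hit; otherwise there is nothing to prune.
        if len(cached_configs) == 1 and len(configs) > 1:
            best_hash = triton_config_to_hashable(cached_configs[0])
            for result in self.compile_results:
                if triton_config_to_hashable(result.config) == best_hash:
                    self.compile_results = [result]  # prune to the cached winner
                    return
            # The best config came from coordinate descent tuning at runtime
            # and was never precompiled: reload the kernel source so it can be
            # recompiled (the recompile itself is omitted from this sketch).
            if self.fn.fn is None:
                self.fn = reload_kernel_from_src().fn
```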
Test Plan:
New unit test that failed before
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov