Recheck autotune cache on static cuda launcher load by jamesjwu · Pull Request #153565 · pytorch/pytorch · GitHub

Recheck autotune cache on static cuda launcher load #153565


Closed
wants to merge 9 commits into from

Conversation

jamesjwu
Contributor
@jamesjwu jamesjwu commented May 14, 2025

Stack from ghstack (oldest at bottom):

When loading statically launchable triton kernels from FxGraphCache, since we don't instantiate a CachingAutotuner like we do normally, we need to recheck the autotune cache based on the existing compile results. If we get a hit, we take the compile result whose config matches the best config.

Sometimes, the best config will have been found by coordinate descent tuning. In that case, FxGraphCache today does not cache the resulting triton kernel, with or without the static cuda launcher, because coordinate descent tuning happens at runtime and the best config may not be one of the precompiled configs.
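As a rough illustration of the flow described above (a sketch, not the PR's actual cache-load code; `load_kernel_from_source` and `kernel_src` are hypothetical stand-ins for however the caller reloads the kernel source):

```python
# Hypothetical call site: when FxGraphCache returns a statically launchable
# kernel, re-consult the autotune cache from its existing compile results
# instead of building a fresh CachingAutotuner.
if kernel.is_statically_launchable():
    kernel.recheck_autotune_cache(
        # Only invoked if the cached best config was found by coordinate
        # descent tuning and that config was never precompiled.
        reload_kernel_from_src=lambda: load_kernel_from_source(kernel_src),
    )
```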

Test Plan:
New unit test that failed before

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

[ghstack-poisoned]
pytorch-bot bot commented May 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153565

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 1b0a78e with merge base d81217b:

BROKEN TRUNK - The following job failed but was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jamesjwu added a commit that referenced this pull request May 14, 2025
ghstack-source-id: 004eeac
Pull Request resolved: #153565
@jamesjwu added the topic: not user facing (topic category) label and removed the ciflow/inductor label May 14, 2025
@jamesjwu jamesjwu requested review from eellison, aorenste and oulgen and removed request for aorenste May 14, 2025 20:06
@jamesjwu jamesjwu requested a review from Mingming-Ding May 14, 2025 20:29
[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request May 14, 2025
ghstack-source-id: da1275b
Pull Request resolved: #153565
Contributor
@aorenste aorenste left a comment

What's up with the gloo change?

autotune_cache_info["only_config"] = triton_config_to_hashable(configs[0])

if disabled:
autotune_cache_info["autotune_cache_state"] = "force_disabled"
Contributor

This is just moved code, but in the case of len(configs)==1 AND disabled is it intentional to overwrite the autotune_cache_state?

Contributor Author

Yes, force disable should override all the other logging
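To make that ordering concrete, a minimal sketch of the moved code being discussed (the state strings other than "force_disabled" are assumptions, not necessarily the exact values in the diff):

```python
def _record_autotune_cache_state(configs, disabled, autotune_cache_info):
    # Sketch only: with a single config, record which config it was...
    if len(configs) == 1:
        autotune_cache_info["autotune_cache_state"] = "only one config"  # assumed string
        autotune_cache_info["only_config"] = triton_config_to_hashable(configs[0])
    # ...but a force-disabled cache unconditionally overwrites that state,
    # which is the intended "override all other logging" behavior.
    if disabled:
        autotune_cache_info["autotune_cache_state"] = "force_disabled"
```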

@@ -187,6 +187,55 @@ def _dump_launch_params(args, kwargs, launcher, kernel_name, grid):
f.write(f"{kernel_name} | {args_str} | {grid!r}\n")


def check_autotune_cache(
configs, filename, inductor_meta
Contributor

Can we annotate these?
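One way the signature could be annotated (a sketch; the concrete types, especially the return type, are inferred from how the values are used later in the diff and may not match the PR):

```python
from typing import Any, Optional

def check_autotune_cache(
    configs: list["Config"],            # triton.Config candidates (assumed type)
    filename: Optional[str],
    inductor_meta: dict[str, Any],
) -> tuple[list["Config"], Optional["AutotuneCache"], dict[str, Any]]:
    """Return the cached configs (if any), the cache handle, and logging info."""
    ...
```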

@@ -298,6 +347,39 @@ def is_statically_launchable(self):
isinstance(x, StaticTritonCompileResult) for x in self.compile_results
)

def recheck_autotune_cache(self, reload_kernel_from_src) -> None:
Contributor

Annotate reload_kernel_from_src?
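For example (a sketch; `CachingAutotuner` as the callback's return type is an assumption based on how `reload_kernel_from_src().fn` is used below):

```python
from typing import Callable

def recheck_autotune_cache(
    self, reload_kernel_from_src: Callable[[], "CachingAutotuner"]
) -> None:
    ...
```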

Contributor
@eellison eellison left a comment

+1 gloo, also have a couple questions

)
self.autotune_cache_info = autotune_cache_info
# I.e. there was an autotune cache hit
if len(cached_configs) == 1 and len(configs) > 1:
Contributor

why do we need to check len(configs) > 1 ? we'd still coordinate_descent_tune with a single config right ?

Contributor Author

If len(configs) == 1, we don't need to prune the list of configs or compile results, so there's no need to loop through the list. If coordesc tuning is on, then we'll start coordesc tuning immediately.
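In code form (a paraphrase of this reasoning, not the exact diff):

```python
# Cache hit with multiple precompiled candidates: prune compile_results down
# to the one whose config matches the cached best config.
if len(cached_configs) == 1 and len(configs) > 1:
    ...
# With len(configs) == 1 there is nothing to prune: the single compile result
# is already the one we would keep, and coordinate descent tuning (if enabled)
# simply starts from that config at run time.
```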

Comment on lines +377 to +380
if best_config.found_by_coordesc:
with dynamo_timed("CachingAutotuner.slow_precompile_config"):
if self.fn.fn is None:
self.fn = reload_kernel_from_src().fn
Contributor

Is there a reason we need to rely on best_config.found_by_coordesc ? should this be an assert, or should we always be reloading ?

Contributor Author

I think this can be an assert yes, because if best_config isn't in the list of compiled configs it should always be because of coordesc.

Contributor Author

That said, I was a little scared that I might have missed a case where it's possible for best_config to be not in our list... and it doesn't seem particularly helpful to crash in that case? We should just re-autotune and pretend it was a cache miss IMO
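A sketch of the "pretend it was a cache miss" fallback suggested here (illustrative; the PR's actual branch is the `found_by_coordesc` check shown above):

```python
if not best_config.found_by_coordesc:
    # Cached best config matches nothing we precompiled and didn't come from
    # coordinate descent tuning: treat it as a cache miss and re-autotune
    # rather than asserting.
    return
with dynamo_timed("CachingAutotuner.slow_precompile_config"):
    # Reload the kernel source if it was stripped before caching, then
    # compile just the coordesc-found config.
    if self.fn.fn is None:
        self.fn = reload_kernel_from_src().fn
```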

Comment on lines +369 to +370
for compile_result in self.compile_results:
if triton_config_to_hashable(compile_result.config) == best_config_hash:
Contributor

Just so I am following, if we were coordinate descent tuning, and happened to have the best config from start, then these would be equal ?

Contributor Author

Correct: the only case where it falls out of this for loop is if coordesc tuning finds a best_config that wasn't one of the precompiled options. In that case, we haven't saved anything in the cache and need to recompile.
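Putting the two review threads together, the pruning loop reads roughly like this (the body of the match branch is an assumption, not the verbatim diff):

```python
best_config_hash = triton_config_to_hashable(best_config)
for compile_result in self.compile_results:
    if triton_config_to_hashable(compile_result.config) == best_config_hash:
        # Best config was one of the precompiled options (including the case
        # where coordesc tuning converged on a precompiled config): keep it.
        self.compile_results = [compile_result]
        return
# Fell through: coordesc tuning found a config that was never precompiled,
# so nothing was cached for it and it has to be recompiled.
```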

jamesjwu added a commit that referenced this pull request May 15, 2025
ghstack-source-id: da1275b
Pull Request resolved: #153565
jamesjwu added a commit that referenced this pull request May 15, 2025
ghstack-source-id: da1275b
Pull Request resolved: #153565
jamesjwu added a commit that referenced this pull request May 15, 2025
ghstack-source-id: da1275b
Pull Request resolved: #153565
[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request May 15, 2025
ghstack-source-id: bff3400
Pull Request resolved: #153565
@jamesjwu
Contributor Author

Fixed gloo

[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request May 15, 2025
ghstack-source-id: 5d04a72
Pull Request resolved: #153565
[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request May 15, 2025
ghstack-source-id: d29ef5f
Pull Request resolved: #153565
@jamesjwu
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

@jamesjwu your PR has been successfully reverted.

[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request May 19, 2025
ghstack-source-id: fb86728
Pull Request resolved: #153565
@jamesjwu
Contributor Author

Fixed ROCM

@desertfire added the ciflow/rocm (Trigger "default" config CI on ROCm) label May 19, 2025
[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request May 19, 2025
ghstack-source-id: a90b646
Pull Request resolved: #153565
[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request May 19, 2025
ghstack-source-id: daba1d3
Pull Request resolved: #153565
@jamesjwu
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D75078076

@yangw-dev
Contributor

@pytorchbot revert -c ghfirst -m "sorry, but your pr failed at internal tests
see D75078076"

pytorch-bot bot commented May 21, 2025

❌ 🤖 pytorchbot command failed:

Got EOF while in a quoted string
Try `@pytorchbot --help` for more info.

@yangw-dev
Contributor

@pytorchbot revert -c ghfirst -m sorry, but your pr failed at internal testssee D75078076

pytorch-bot bot commented May 21, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: unrecognized arguments: but your pr failed at internal testssee D75078076

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

Try @pytorchbot --help for more info.
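For reference, the revert message needs to be passed as a single quoted argument, e.g. `@pytorchbot revert -c ghfirst -m "your PR failed internal tests, see D75078076"` (illustrative; the command that eventually triggered the revert is not shown on this page).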

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

Reverting PR 153565 failed

Reason: Comment with id 2898685388 not found

Details for Dev Infra team: raised by workflow job

pytorchmergebot pushed a commit that referenced this pull request May 21, 2025
Internally static cuda launcher isn't enabled, so we need to always enable it

Differential Revision: [D75146584](https://our.internmc.facebook.com/intern/diff/D75146584/)

Pull Request resolved: #154035
Approved by: https://github.com/Skylion007
ghstack dependencies: #153565
Labels
ci-no-td (Do not run TD on this PR), ciflow/inductor, ciflow/rocm (Trigger "default" config CI on ROCm), ciflow/trunk (Trigger trunk jobs on your pull request), fb-exported, Merged, module: inductor, Reverted, topic: not user facing (topic category)