Basic utilities to support remote autotuning by masnesral · Pull Request #153201 · pytorch/pytorch · GitHub
Basic utilities to support remote autotuning #153201


Closed
wants to merge 2 commits into from

Conversation

masnesral
Contributor
@masnesral masnesral commented May 8, 2025

Stack from ghstack (oldest at bottom):

Summary: Adds some new utilities in autotune_remote.py that will be used to construct remote autotuning requests (tensor metadata, extern kernel details, triton kernel details). For Triton, I reuse the existing TritonBenchmarkRequest / TritonTemplateCaller that we use for subprocess autotuning. For extern kernels, I provide the name of the kernel and expect to look it up on the other side of the wire. The new utilities also contain the methods to construct args from the serialized request and call the existing ("in process") benchmark helpers in AlgorithmSelectorCache.

Test Plan: New test case that runs all the test_select_algorithm tests, but routes the benchmark request through the remote utilities to make sure the remote request a) is serializable, and b) the choices in the deserialized object are benchmarkable.
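The flow the summary describes (capture tensor metadata and kernel identifiers, serialize the request, reconstruct benchmarkable args on the other side) can be sketched as below. All names here are illustrative stand-ins, not the actual autotune_remote.py API:

```python
import dataclasses
import pickle
from typing import Optional

@dataclasses.dataclass
class TensorMeta:
    # Illustrative stand-in for the tensor metadata carried in a request.
    dtype: str
    size: tuple
    stride: tuple
    device: str

@dataclasses.dataclass
class RemoteAutotuneRequest:
    # Extern kernels travel as a name to be looked up on the server side;
    # Triton choices would carry a benchmark request object instead.
    extern_kernel_name: Optional[str]
    inputs: list
    output: TensorMeta

def serialize(request: RemoteAutotuneRequest) -> bytes:
    return pickle.dumps(request)

def deserialize(payload: bytes) -> RemoteAutotuneRequest:
    return pickle.loads(payload)

# Round-trip: the deserialized request must still describe the same choices,
# which is exactly what the new test case checks.
req = RemoteAutotuneRequest(
    extern_kernel_name="addmm",
    inputs=[TensorMeta("float16", (128, 64), (64, 1), "cuda:0")],
    output=TensorMeta("float16", (128, 128), (128, 1), "cuda:0"),
)
roundtripped = deserialize(serialize(req))
```

The key design point from the summary is that extern kernels are serialized only by name, so the server resolves the callable locally instead of receiving code over the wire.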

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

pytorch-bot bot commented May 8, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153201

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

masnesral added a commit that referenced this pull request May 8, 2025

ghstack-source-id: 42798e2
Pull Request resolved: #153201
@masnesral masnesral added the topic: not user facing topic category label May 8, 2025
@masnesral masnesral marked this pull request as ready for review May 9, 2025 15:20
@masnesral masnesral requested review from eellison and aorenste May 9, 2025 15:20
Contributor
@eellison eellison left a comment


one question about symbolic shapes

Comment on lines 44 to 46
allocation_size: Sequence[sympy.Expr]
name: str
value: Optional[torch.Tensor] = None
Contributor


How does this work with sympy expressions? The remote server won't have the shape env. We'd want to at least pass hints in, I think.

Also, we probably want the invariant that the benchmarking won't add any guards.

Contributor


I don't know enough about autotuning - but can autotuning use sympy expressions? That's really only valid with FakeTensors, right? I would expect that autotuning would have to happen on real tensors since you're looking at real performance? Or am I missing something?

Contributor Author


Yeah, I guess I didn't actually think hard about what I was doing here -- the sympy.Expr was just to satisfy the typechecker when I linted before publishing the PR. @eellison at the time of autotuning can this actually be a symbolic expression?

Contributor Author


Synced offline. We're calling get_hints() to get the allocation_size, so this should be the same type as the size field.
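The resolution discussed here (calling get_hints() so allocation_size arrives as concrete integers rather than sympy expressions, since the remote server has no shape env) can be illustrated with a small stand-in. SymSize and resolve_sizes below are hypothetical, not Inductor's actual API:

```python
import dataclasses

@dataclasses.dataclass
class SymSize:
    # Stand-in for a symbolic dimension: carries the concrete hint that the
    # shape env recorded for it at compile time.
    expr: str
    hint: int

def resolve_sizes(sizes):
    """Mimic get_hints(): map possibly-symbolic sizes to plain ints so the
    serialized request never needs access to the shape env."""
    return tuple(s.hint if isinstance(s, SymSize) else int(s) for s in sizes)

# A shape with one symbolic dim (hinted at 128) and one static dim.
allocation_size = resolve_sizes([SymSize("s0", 128), 64])
```

Benchmarking against hints rather than symbolic expressions also sidesteps the guard concern raised above, since no symbolic reasoning happens on the remote side.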

return benchmark


class TestSelectAlgorithmRemote(TestSelectAlgorithm):
Contributor


Nice! It might be worth doing this on a subset of the test_max_autotune tests as well.
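The test-reuse pattern praised here, subclassing an existing test class so every case transparently exercises the remote path, could look roughly like this. SelectAlgorithmTests and its benchmark_choices hook are simplified stand-ins for the real test classes:

```python
import pickle
import unittest

class SelectAlgorithmTests(unittest.TestCase):
    # Stand-in for the existing in-process tests: they benchmark a set of
    # choices and only check that the best one comes back.
    def benchmark_choices(self, choices):
        return min(choices, key=lambda c: c["time_ms"])

    def test_picks_fastest(self):
        best = self.benchmark_choices(
            [{"name": "triton_mm", "time_ms": 1.2}, {"name": "addmm", "time_ms": 0.9}]
        )
        self.assertEqual(best["name"], "addmm")

class SelectAlgorithmRemoteTests(SelectAlgorithmTests):
    # Same tests, but every request round-trips through serialization first,
    # proving the choices are (a) serializable and (b) still benchmarkable.
    def benchmark_choices(self, choices):
        choices = pickle.loads(pickle.dumps(choices))
        return super().benchmark_choices(choices)

result = unittest.TextTestRunner(verbosity=0).run(
    unittest.defaultTestLoader.loadTestsFromTestCase(SelectAlgorithmRemoteTests)
)
```

Because the subclass only overrides the benchmarking hook, extending the same trick to a subset of the test_max_autotune tests would be one more small subclass rather than new test bodies.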



@dataclasses.dataclass
class TensorMeta:
Contributor


I wonder if we could/should reuse torch._subclasses.meta_utils.MetaTensorDesc

masnesral added a commit that referenced this pull request May 13, 2025

ghstack-source-id: bcfd4b3
Pull Request resolved: #153201

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Jul 12, 2025
@github-actions github-actions bot closed this Aug 12, 2025
3 participants