Basic utilities to support remote autotuning #153201
Summary: Adds new utilities in autotune_remote.py that will be used to construct remote autotuning requests (tensor metadata, extern kernel details, Triton kernel details). For Triton, I reuse the existing TritonBenchmarkRequest / TritonTemplateCaller that we use for subprocess autotuning. For extern kernels, I provide the name of the kernel and expect to look it up on the other side of the wire. The new utilities also contain the methods to construct args from the serialized request and call the existing ("in process") benchmark helpers in AlgorithmSelectorCache.

Test Plan: New test case that runs all the test_select_algorithm tests, but routes the benchmark request through the remote utilities to make sure the remote request (a) is serializable, and (b) the choices in the deserialized object are benchmarkable.
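As a rough illustration of the approach the summary describes (all names below are hypothetical stand-ins, not the PR's actual API), a remote autotuning request can be modeled as a picklable dataclass carrying tensor metadata plus either a serialized Triton kernel or an extern kernel name to be looked up on the server side:

```python
import dataclasses
import pickle
from typing import Optional

# Hypothetical sketch: tensor metadata kept as plain serializable values.
@dataclasses.dataclass
class RemoteTensorMeta:
    dtype: str      # serialized as a string, e.g. "torch.float32"
    size: tuple     # concrete sizes (hints), not symbolic expressions
    stride: tuple

# Hypothetical sketch of a benchmark request: either an extern kernel name
# (looked up remotely) or a serialized Triton kernel source travels over
# the wire alongside the tensor metadata.
@dataclasses.dataclass
class RemoteBenchmarkRequest:
    inputs: list                                # list[RemoteTensorMeta]
    output: RemoteTensorMeta
    extern_kernel_name: Optional[str] = None    # resolved by name remotely
    triton_source: Optional[str] = None         # serialized Triton kernel

def roundtrip(req: RemoteBenchmarkRequest) -> RemoteBenchmarkRequest:
    # The request must survive serialization to be usable remotely.
    return pickle.loads(pickle.dumps(req))

req = RemoteBenchmarkRequest(
    inputs=[RemoteTensorMeta("torch.float32", (64, 64), (64, 1))],
    output=RemoteTensorMeta("torch.float32", (64, 64), (64, 1)),
    extern_kernel_name="extern_kernels.mm",
)
# Dataclass equality confirms nothing was lost in the round trip.
assert roundtrip(req) == req
```

The round-trip check mirrors property (a) of the test plan; property (b) would additionally require benchmarking the deserialized choices.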
one question about symbolic shapes
torch/_inductor/autotune_remote.py (Outdated)

    allocation_size: Sequence[sympy.Expr]
    name: str
    value: Optional[torch.Tensor] = None
How does this work with sympy expressions? The remote server won't have the shape env. We'd want to at least pass hints in, I think.

Also, we probably want the invariant that the benchmarking won't add any guards.
I don't know enough about autotuning - but can autotuning use sympy expressions? That's really only valid with FakeTensors, right? I would expect that autotuning would have to happen on real tensors since you're looking at real performance? Or am I missing something?
Yeah, I guess I didn't actually think hard about what I was doing here -- the sympy.Expr was just to satisfy the typechecker when I linted before publishing the PR. @eellison, at the time of autotuning can this actually be a symbolic expression?
Synced offline. We're calling get_hints() to get the allocation_size, so this should be the same type as the size field.
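The resolution just described (symbolic sizes replaced by concrete hints before the request crosses the wire, since the remote server has no shape env) can be sketched roughly as follows. The `size_hints` helper and the hint mapping are illustrative assumptions, not the actual ShapeEnv API:

```python
import sympy

# A symbolic allocation size as it might appear before serialization.
s0 = sympy.Symbol("s0")
symbolic_size = (s0, s0 * 2)

# Stand-in for shape-env hints: example values for each free symbol.
hints = {s0: 64}

def size_hints(sizes, hints):
    # Substitute hint values and require every expression to become a
    # plain integer; anything still symbolic cannot be sent remotely.
    resolved = []
    for expr in sizes:
        value = sympy.sympify(expr).subs(hints)
        if not value.is_Integer:
            raise ValueError(f"unresolved symbolic size: {expr}")
        resolved.append(int(value))
    return tuple(resolved)

assert size_hints(symbolic_size, hints) == (64, 128)
```

After this step the serialized request carries only concrete integers, which also helps with the invariant that benchmarking should not add guards.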
    return benchmark

    class TestSelectAlgorithmRemote(TestSelectAlgorithm):
Nice! It might be worth doing this on a subset of the test_max_autotune tests as well.
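The subclassing trick being praised here can be sketched in miniature. The classes below are toy stand-ins for the real test_select_algorithm tests, but they show the pattern: the subclass inherits every test and only overrides the benchmark hook to round-trip the request through serialization first:

```python
import pickle
import unittest

# Toy stand-in for the existing in-process test suite.
class TestSelectAlgorithm(unittest.TestCase):
    def benchmark(self, request):
        # Stand-in for the real in-process benchmark helper.
        return sum(request["sizes"])

    def test_mm(self):
        self.assertEqual(self.benchmark({"sizes": [64, 64]}), 128)

# The remote variant reruns every inherited test, but serializes and
# deserializes the request before benchmarking -- proving both that the
# request is picklable and that the deserialized object still benchmarks.
class TestSelectAlgorithmRemote(TestSelectAlgorithm):
    def benchmark(self, request):
        remote_request = pickle.loads(pickle.dumps(request))
        return super().benchmark(remote_request)

# Run one inherited test through the remote-routing subclass.
result = TestSelectAlgorithmRemote("test_mm").run()
assert result.wasSuccessful()
```

Extending the same subclass to a slice of test_max_autotune would reuse this pattern unchanged; only the base class differs.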
    @dataclasses.dataclass
    class TensorMeta:
I wonder if we could/should reuse torch._subclasses.meta_utils.MetaTensorDesc
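For context on what such a dataclass carries, here is a minimal sketch of a TensorMeta-like structure. The field names echo the diff snippet above, but the dtype is kept as a plain string so the object stays trivially serializable; this is an illustrative assumption, not the PR's (or MetaTensorDesc's) actual definition:

```python
import dataclasses

# Hypothetical sketch: just enough metadata to rebuild an equivalent
# (random-valued) tensor on the remote side for benchmarking.
@dataclasses.dataclass
class TensorMeta:
    dtype: str
    size: tuple
    stride: tuple
    device: str = "cuda"

    def numel(self):
        # Number of elements implied by the size, independent of strides.
        n = 1
        for d in self.size:
            n *= d
        return n

meta = TensorMeta(dtype="float16", size=(128, 256), stride=(256, 1))
assert meta.numel() == 128 * 256
```

Reusing torch._subclasses.meta_utils.MetaTensorDesc, as suggested, would replace this hand-rolled structure with one that already captures dtype, sizes, strides, and device faithfully.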
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.