Basic utilities to support remote autotuning #153201
Summary: Adds new utilities in autotune_remote.py that will be used to construct remote autotuning requests (tensor metadata, extern kernel details, Triton kernel details). For Triton, I reuse the existing TritonBenchmarkRequest / TritonTemplateCaller that we use for subprocess autotuning. For extern kernels, I provide the name of the kernel and expect to look it up on the other side of the wire. The new utilities also contain the methods to construct args from the serialized request and call the existing ("in process") benchmark helpers in AlgorithmSelectorCache.

Test Plan: New test case that runs all the test_select_algorithm tests, but routes the benchmark request through the remote utilities to make sure the remote request (a) is serializable, and (b) the choices in the deserialized object are benchmarkable.
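As a rough illustration of the approach the summary describes (all names below are hypothetical stand-ins, not the PR's actual API), a remote autotuning request can be modeled as a picklable dataclass carrying tensor metadata plus either a serialized Triton kernel or an extern kernel name to be looked up on the server side:

```python
import dataclasses
import pickle
from typing import Optional

# Hypothetical sketch: tensor metadata kept as plain serializable values.
@dataclasses.dataclass
class RemoteTensorMeta:
    dtype: str      # serialized as a string, e.g. "torch.float32"
    size: tuple     # concrete sizes (hints), not symbolic expressions
    stride: tuple

# Hypothetical sketch of a benchmark request: either an extern kernel name
# (looked up remotely) or a serialized Triton kernel source travels over
# the wire alongside the tensor metadata.
@dataclasses.dataclass
class RemoteBenchmarkRequest:
    inputs: list                                # list[RemoteTensorMeta]
    output: RemoteTensorMeta
    extern_kernel_name: Optional[str] = None    # resolved by name remotely
    triton_source: Optional[str] = None         # serialized Triton kernel

def roundtrip(req: RemoteBenchmarkRequest) -> RemoteBenchmarkRequest:
    # The request must survive serialization to be usable remotely.
    return pickle.loads(pickle.dumps(req))

req = RemoteBenchmarkRequest(
    inputs=[RemoteTensorMeta("torch.float32", (64, 64), (64, 1))],
    output=RemoteTensorMeta("torch.float32", (64, 64), (64, 1)),
    extern_kernel_name="extern_kernels.mm",
)
# Dataclass equality confirms nothing was lost in the round trip.
assert roundtrip(req) == req
```

The round-trip check mirrors property (a) of the test plan; property (b) would additionally require benchmarking the deserialized choices.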
one question about symbolic shapes
torch/_inductor/autotune_remote.py (Outdated)

    allocation_size: Sequence[sympy.Expr]
    name: str
    value: Optional[torch.Tensor] = None
How does this work with sympy expressions? The remote server won't have the shape env. We'd want to at least pass hints in, I think.

Also, we probably want the invariant that the benchmarking won't add any guards.
I don't know enough about autotuning - but can autotuning use sympy expressions? That's really only valid with FakeTensors, right? I would expect that autotuning would have to happen on real tensors since you're looking at real performance? Or am I missing something?
Yeah, I guess I didn't actually think hard about what I was doing here -- the sympy.Expr was just to satisfy the typechecker when I linted before publishing the PR. @eellison, at the time of autotuning can this actually be a symbolic expression?
Synced offline. We're calling get_hints() to get the allocation_size, so this should be the same type as the size field.
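The resolution just described (symbolic sizes replaced by concrete hints before the request crosses the wire, since the remote server has no shape env) can be sketched roughly as follows. The `size_hints` helper and the hint mapping are illustrative assumptions, not the actual ShapeEnv API:

```python
import sympy

# A symbolic allocation size as it might appear before serialization.
s0 = sympy.Symbol("s0")
symbolic_size = (s0, s0 * 2)

# Stand-in for shape-env hints: example values for each free symbol.
hints = {s0: 64}

def size_hints(sizes, hints):
    # Substitute hint values and require every expression to become a
    # plain integer; anything still symbolic cannot be sent remotely.
    resolved = []
    for expr in sizes:
        value = sympy.sympify(expr).subs(hints)
        if not value.is_Integer:
            raise ValueError(f"unresolved symbolic size: {expr}")
        resolved.append(int(value))
    return tuple(resolved)

assert size_hints(symbolic_size, hints) == (64, 128)
```

After this step the serialized request carries only concrete integers, which also helps with the invariant that benchmarking should not add guards.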
    return benchmark

    class TestSelectAlgorithmRemote(TestSelectAlgorithm):
Nice! It might be worth doing this on a subset of the test_max_autotune tests as well.
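The subclassing trick being praised here can be sketched in miniature. The classes below are toy stand-ins for the real test_select_algorithm tests, but they show the pattern: the subclass inherits every test and only overrides the benchmark hook to round-trip the request through serialization first:

```python
import pickle
import unittest

# Toy stand-in for the existing in-process test suite.
class TestSelectAlgorithm(unittest.TestCase):
    def benchmark(self, request):
        # Stand-in for the real in-process benchmark helper.
        return sum(request["sizes"])

    def test_mm(self):
        self.assertEqual(self.benchmark({"sizes": [64, 64]}), 128)

# The remote variant reruns every inherited test, but serializes and
# deserializes the request before benchmarking -- proving both that the
# request is picklable and that the deserialized object still benchmarks.
class TestSelectAlgorithmRemote(TestSelectAlgorithm):
    def benchmark(self, request):
        remote_request = pickle.loads(pickle.dumps(request))
        return super().benchmark(remote_request)

# Run one inherited test through the remote-routing subclass.
result = TestSelectAlgorithmRemote("test_mm").run()
assert result.wasSuccessful()
```

Extending the same subclass to a slice of test_max_autotune would reuse this pattern unchanged; only the base class differs.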
    @dataclasses.dataclass
    class TensorMeta:
I wonder if we could/should reuse torch._subclasses.meta_utils.MetaTensorDesc
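For context on what such a dataclass carries, here is a minimal sketch of a TensorMeta-like structure. The field names echo the diff snippet above, but the dtype is kept as a plain string so the object stays trivially serializable; this is an illustrative assumption, not the PR's (or MetaTensorDesc's) actual definition:

```python
import dataclasses

# Hypothetical sketch: just enough metadata to rebuild an equivalent
# (random-valued) tensor on the remote side for benchmarking.
@dataclasses.dataclass
class TensorMeta:
    dtype: str
    size: tuple
    stride: tuple
    device: str = "cuda"

    def numel(self):
        # Number of elements implied by the size, independent of strides.
        n = 1
        for d in self.size:
            n *= d
        return n

meta = TensorMeta(dtype="float16", size=(128, 256), stride=(256, 1))
assert meta.numel() == 128 * 256
```

Reusing torch._subclasses.meta_utils.MetaTensorDesc, as suggested, would replace this hand-rolled structure with one that already captures dtype, sizes, strides, and device faithfully.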
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.