-
Notifications
You must be signed in to change notification settings - Fork 24.8k
Simplify _compute_symbolic_stride() #138844
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138844
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 3bee778 with merge base 82ce888 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
cc ezyang SherlockNoMad EikanWang jgong5 wenzhe-nrv [ghstack-poisoned]
cc ezyang SherlockNoMad EikanWang jgong5 wenzhe-nrv [ghstack-poisoned]
cc ezyang SherlockNoMad EikanWang jgong5 wenzhe-nrv [ghstack-poisoned]
cc ezyang SherlockNoMad EikanWang jgong5 wenzhe-nrv [ghstack-poisoned]
Rewrite _compute_symbolic_stride() to make it simpler and faster. The existing code involves several inner loops in an attempt to process the common case faster - but in reality this effort is actually slower than the simpler code. Testing: The initial version of this PR (which passed all tests) ran both the old algorithm and new algorithm and compared the results to make sure that results were substantially the same (they weren't the same simply because the algorithm allocates new dynamic symbols as part of it). I also measured the timing of both methods and from the cases I checked the simpler algorithm was generally about 30% faster (which was usually the "fast path" of the old algorithm). cc ezyang SherlockNoMad EikanWang jgong5 wenzhe-nrv [ghstack-poisoned]
I am deferring to Bob as he is also monkeying around with this code recently |
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Rewrite _compute_symbolic_stride() to make it simpler and faster. The existing code involves several inner loops in an attempt to process the common case faster - but in reality this effort is actually slower than the simpler code. Testing: The initial version of this PR (which passed all tests) ran both the old algorithm and new algorithm and compared the results to make sure that results were substantially the same (they weren't the same simply because the algorithm allocates new dynamic symbols as part of it). I also measured the timing of both methods and from the cases I checked the simpler algorithm was generally about 30% faster (which was usually the "fast path" of the old algorithm). Pull Request resolved: pytorch#138844 Approved by: https://github.com/bobrenjc93 ghstack dependencies: pytorch#138843
Rewrite _compute_symbolic_stride() to make it simpler and faster.
The existing code involves several inner loops in an attempt to process the common case faster - but in reality this effort is actually slower than the simpler code.
Testing:
The initial version of this PR (which passed all tests) ran both the old algorithm and new algorithm and compared the results to make sure that results were substantially the same (they weren't the same simply because the algorithm allocates new dynamic symbols as part of it).
I also measured the timing of both methods and from the cases I checked the simpler algorithm was generally about 30% faster (which was usually the "fast path" of the old algorithm).
Stack from ghstack (oldest at bottom):
cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv