-
Notifications
You must be signed in to change notification settings - Fork 24.4k
[reland] Add autograd hook for python rpc call #28312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1. currently if autograd context is valid, even tensors do not require grads and grads function are not attached. it still send rpc with autograd meta. This is not ideal. This diff makes some change to make sure rpc with autograd meta is sent only if autograd context is valid and tensors require grads 2. meanwhile create a utiliy to attach autograd info and functions as needed 3. add autograd send/recv functions for python rpc call 4. make changes to support nested python rpc calls 5. disallow nested dist autograd context (was landed in #27022) Differential Revision: [D18017554](https://our.internmc.facebook.com/intern/diff/D18017554/) [ghstack-poisoned]
1. currently if autograd context is valid, even tensors do not require grads and grads function are not attached. it still send rpc with autograd meta. This is not ideal. This diff makes some change to make sure rpc with autograd meta is sent only if autograd context is valid and tensors require grads 2. meanwhile create a utiliy to attach autograd info and functions as needed 3. add autograd send/recv functions for python rpc call 4. make changes to support nested python rpc calls 5. disallow nested dist autograd context (was landed in #27022) Differential Revision: [D18017554](https://our.internmc.facebook.com/intern/diff/D18017554/) [ghstack-poisoned]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, this is a reland and the potential flakiness of test_worker_ids_recorded
should be fixed by adding self._check_rpc_done
.
previous commit caused "test_worker_ids_recorded" to be flaky. added "self._check_rpc_done(1)" to fix the flakiness. |
hmm, "test_worker_ids_recorded" still timed out after adding _check_rpc_done, looking |
we still need dist.barrier() for this test case actually |
to simplify, I removed graph tests for tensors without grads from test_worker_ids_recorded |
1. currently if autograd context is valid, even tensors do not require grads and grads function are not attached. it still send rpc with autograd meta. This is not ideal. This diff makes some change to make sure rpc with autograd meta is sent only if autograd context is valid and tensors require grads 2. meanwhile create a utiliy to attach autograd info and functions as needed 3. add autograd send/recv functions for python rpc call 4. make changes to support nested python rpc calls 5. disallow nested dist autograd context (was landed in #27022) Differential Revision: [D18017554](https://our.internmc.facebook.com/intern/diff/D18017554/) [ghstack-poisoned]
Pull Request resolved: #28312 1. currently if autograd context is valid, even tensors do not require grads and grads function are not attached. it still send rpc with autograd meta. This is not ideal. This diff makes some change to make sure rpc with autograd meta is sent only if autograd context is valid and tensors require grads 2. meanwhile create a utiliy to attach autograd info and functions as needed 3. add autograd send/recv functions for python rpc call 4. make changes to support nested python rpc calls 5. disallow nested dist autograd context (was landed in #27022) ghstack-source-id: 92240367 Differential Revision: [D18017554](https://our.internmc.facebook.com/intern/diff/D18017554/)
This pull request has been merged in 56eb4f7. |
Summary: Pull Request resolved: pytorch#28312 1. currently if autograd context is valid, even tensors do not require grads and grads function are not attached. it still send rpc with autograd meta. This is not ideal. This diff makes some change to make sure rpc with autograd meta is sent only if autograd context is valid and tensors require grads 2. meanwhile create a utiliy to attach autograd info and functions as needed 3. add autograd send/recv functions for python rpc call 4. make changes to support nested python rpc calls 5. disallow nested dist autograd context (was landed in pytorch#27022) ghstack-source-id: 92240367 Test Plan: unit tests Differential Revision: D18017554 fbshipit-source-id: dbe79a5171063901a78a9b3322b9b31c159d098d
Stack from ghstack:
currently if autograd context is valid, even tensors do not require grads and grads function are not attached.
it still send rpc with autograd meta. This is not ideal.
This diff makes some change to make sure rpc with autograd meta is sent only if autograd context is valid and tensors require grads
meanwhile create a utiliy to attach autograd info and functions as needed
add autograd send/recv functions for python rpc call
make changes to support nested python rpc calls
disallow nested dist autograd context (was landed in Distributed Autograd - FAST mode backward pass implementation. #27022)
Differential Revision: D18017554