10000 [reland] Add autograd hook for python rpc call by zhaojuanmao · Pull Request #28312 · pytorch/pytorch · GitHub
[go: up one dir, main page]

Skip to content

[reland] Add autograd hook for python rpc call #28312

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

zhaojuanmao
Copy link
Contributor
@zhaojuanmao zhaojuanmao commented Oct 18, 2019

Stack from ghstack:

  1. currently if autograd context is valid, even tensors do not require grads and grads function are not attached.
    it still send rpc with autograd meta. This is not ideal.
    This diff makes some change to make sure rpc with autograd meta is sent only if autograd context is valid and tensors require grads

  2. meanwhile create a utiliy to attach autograd info and functions as needed

  3. add autograd send/recv functions for python rpc call

  4. make changes to support nested python rpc calls

  5. disallow nested dist autograd context (was landed in Distributed Autograd - FAST mode backward pass implementation. #27022)

Differential Revision: D18017554

1. currently if autograd context is valid, even tensors do not require grads and grads function are not attached.
it still send rpc with autograd meta. This is not ideal.
This diff makes some change to make sure rpc with autograd meta is sent only if autograd context is valid and tensors require grads

2. meanwhile create a utiliy to attach autograd info and functions as needed

3. add autograd send/recv functions for python rpc call

4. make changes to support nested python rpc calls

5. disallow nested dist autograd context (was landed in #27022)

Differential Revision: [D18017554](https://our.internmc.facebook.com/intern/diff/D18017554/)

[ghstack-poisoned]
1. currently if autograd context is valid, even tensors do not require grads and grads function are not attached.
it still send rpc with autograd meta. This is not ideal.
This diff makes some change to make sure rpc with autograd meta is sent only if autograd context is valid and tensors require grads

2. meanwhile create a utiliy to attach autograd info and functions as needed

3. add autograd send/recv functions for python rpc call

4. make changes to support nested python rpc calls

5. disallow nested dist autograd context (was landed in #27022)

Differential Revision: [D18017554](https://our.internmc.facebook.com/intern/diff/D18017554/)

[ghstack-poisoned]
Copy link
Member
@rohan-varma rohan-varma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, this is a reland and the potential flakiness of test_worker_ids_recorded should be fixed by adding self._check_rpc_done.

@zhaojuanmao
Copy link
Contributor Author

previous commit caused "test_worker_ids_recorded" to be flaky. added "self._check_rpc_done(1)" to fix the flakiness.

@zhaojuanmao
Copy link
Contributor Author

hmm, "test_worker_ids_recorded" still timed out after adding _check_rpc_done, looking

@zhaojuanmao
Copy link
Contributor Author

we still need dist.barrier() for this test case actually

@zhaojuanmao
Copy link
Contributor Author

to simplify, I removed graph tests for tensors without grads from test_worker_ids_recorded

1. currently if autograd context is valid, even tensors do not require grads and grads function are not attached.
it still send rpc with autograd meta. This is not ideal.
This diff makes some change to make sure rpc with autograd meta is sent only if autograd context is valid and tensors require grads

2. meanwhile create a utiliy to attach autograd info and functions as needed

3. add autograd send/recv functions for python rpc call

4. make changes to support nested python rpc calls

5. disallow nested dist autograd context (was landed in #27022)

Differential Revision: [D18017554](https://our.internmc.facebook.com/intern/diff/D18017554/)

[ghstack-poisoned]
zhaojuanmao added a commit that referenced this pull request Oct 19, 2019
Pull Request resolved: #28312

1. currently if autograd context is valid, even tensors do not require grads and grads function are not attached.
it still send rpc with autograd meta. This is not ideal.
This diff makes some change to make sure rpc with autograd meta is sent only if autograd context is valid and tensors require grads

2. meanwhile create a utiliy to attach autograd info and functions as needed

3. add autograd send/recv functions for python rpc call

4. make changes to support nested python rpc calls

5. disallow nested dist autograd context (was landed in #27022)
ghstack-source-id: 92240367

Differential Revision: [D18017554](https://our.internmc.facebook.com/intern/diff/D18017554/)
@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 56eb4f7.

@facebook-github-bot facebook-github-bot deleted the gh/zhaojuanmao/12/head branch October 28, 2019 22:23
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
Summary:
Pull Request resolved: pytorch#28312

1. currently if autograd context is valid, even tensors do not require grads and grads function are not attached.
it still send rpc with autograd meta. This is not ideal.
This diff makes some change to make sure rpc with autograd meta is sent only if autograd context is valid and tensors require grads

2. meanwhile create a utiliy to attach autograd info and functions as needed

3. add autograd send/recv functions for python rpc call

4. make changes to support nested python rpc calls

5. disallow nested dist autograd context (was landed in pytorch#27022)
ghstack-source-id: 92240367

Test Plan: unit tests

Differential Revision: D18017554

fbshipit-source-id: dbe79a5171063901a78a9b3322b9b31c159d098d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0