Distributed Autograd - FAST mode backward pass implementation. by pritamdamania87 · Pull Request #27022 · pytorch/pytorch · GitHub

Distributed Autograd - FAST mode backward pass implementation. #27022


Closed

Conversation

pritamdamania87 (Contributor) commented Sep 28, 2019

Stack from ghstack:

[test all] This change implements the "FAST" mode distributed autograd backward
pass as described in #23110.

At a high level the backward pass works as follows:

  1. We start by computing dependencies on the node that calls
    torch.distributed.backward.
  2. This node computes the dependencies starting from the root nodes provided in
    the backward call and all the 'send' functions present in the current autograd
    context. The "FAST" mode assumes all 'send' functions are part of the autograd
    computation.
  3. Once the dependency computation is done, the distributed autograd engine
    calls the local autograd engine to execute the autograd graph. Note that the
    autograd graph on a single node is not necessarily connected because of
    inter-node communication. As a result, we have special handling to ensure the
    local autograd engine executes the entire graph starting from the provided
    roots and all 'send' functions on the node.
  4. When the local autograd engine hits a 'recv' function, it performs an async
    RPC to send the gradients over to the appropriate node and stores a future in
    the autograd context to keep track of this RPC.
  5. On the destination node, the appropriate 'send' function is looked up and
    enqueued on the local autograd engine. If this is the first time the node is
    hearing about this autograd context id on the backward pass, then the node
    computes dependencies for the local autograd engine.
  6. As part of computing dependencies, the distributed autograd engine discovers
    all leaf nodes and ensures those are passed as 'outputs' to the local autograd
    engine. This avoids running the 'AccumulateGrad' function.
  7. The gradients computed for the leaf nodes are then actually accumulated in
    DistAutogradContext for the appropriate autograd context id.
  8. The distributed autograd engine waits for the local autograd engine
    to complete and also waits for all the 'Futures' (stored in step 4) for the
    respective RPCs to finish.
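
The sketch below shows the user-facing flow that exercises these eight steps end to end. It is a minimal illustration, assuming two RPC workers are already initialized; the module paths and the `context_id` argument to `backward` follow the later public `torch.distributed.autograd` API and may differ slightly from what this PR exposes (the description above refers to `torch.distributed.backward`).

```python
import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc

# Assumes RPC has already been initialized on this worker, e.g.:
#   rpc.init_rpc("worker0", rank=0, world_size=2)
# and that a peer named "worker1" is running.

with dist_autograd.context() as context_id:
    t1 = torch.rand((3, 3), requires_grad=True)
    t2 = torch.rand((3, 3), requires_grad=True)

    # Forward pass: the remote call records matching 'send'/'recv'
    # functions in the autograd graphs of both workers.
    loss = rpc.rpc_sync("worker1", torch.add, args=(t1, t2)).sum()

    # Distributed backward pass: dependencies are computed starting from
    # `loss` and from the 'send' functions in this context (FAST mode).
    dist_autograd.backward(context_id, [loss])

    # Gradients are accumulated per autograd context, not in .grad.
    grads = dist_autograd.get_gradients(context_id)
    print(grads[t1], grads[t2])
```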

We have made the following changes to the local autograd engine for this
purpose:

  1. Expose GraphTask and NodeTask so that the distributed autograd engine can
    use them.
  2. Expose an `execute_with_graph_task` API which allows the distributed engine
    to build a `GraphTask` and pass it to the local autograd engine.
  3. Expose an `enqueue_on_cpu` API, which allows the distributed engine to build
    a `NodeTask` for a 'send' function and enqueue it on the local autograd engine.
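
To make the seeding behavior concrete, here is a small, self-contained Python toy model (not the real C++ `GraphTask`/`NodeTask` code, and the node names are made up) of why execution is started from both the user-provided roots and the 'send' functions: the local graph may consist of disconnected islands, and dependency counts must cover all of them.

```python
from collections import defaultdict, deque

def compute_dependencies(edges, seeds):
    """edges maps node -> list of next nodes; seeds are roots plus 'send' functions."""
    dependencies = defaultdict(int)
    seen, queue = set(seeds), deque(seeds)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            dependencies[nxt] += 1  # one more task must finish before nxt can run
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return dependencies, seen

# Two disconnected islands on one worker: one reachable from the user's root,
# one only reachable from a 'send' function created by an incoming RPC.
edges = {
    "loss_grad_fn": ["mul_backward"], "mul_backward": ["leaf_t1"],
    "send1_backward": ["add_backward"], "add_backward": ["leaf_t2"],
}
deps, reachable = compute_dependencies(edges, seeds=["loss_grad_fn", "send1_backward"])
print(dict(deps))  # every non-seed node has exactly one pending dependency here
print(reachable)   # both islands are covered, unlike seeding from the root alone
```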

In addition to this, we made a few general improvements:

  1. Added a PropagateGradients RPC call for the 'recv' function to pass
    gradients to the appropriate node during the backward pass.
  2. Use IValues as much as possible in serialization for RpcWithAutograd.
  3. If `Future.wait()` receives a message of type EXCEPTION, we throw an appropriate
    exception instead of just returning the message. This is in line with what most
    `Future.wait()` APIs do, and was mostly done to ensure `Future.wait()` propagates
    errors correctly during the backward pass (see the Python-level sketch after this list).
  4. Added a get_gradients(context_id) API which allows users to retrieve a map
    from Tensor to respective gradient for the provided context_id on the local
    node.
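
The exception-propagation behavior in item 3 is easiest to see from the Python RPC layer, where waiting on a future re-raises errors from the remote worker rather than handing back an error message. A minimal sketch, assuming RPC is initialized and a peer named "worker1" exists (the failing function here is made up, and must also be importable on the remote worker):

```python
import torch.distributed.rpc as rpc

def will_fail():
    raise ValueError("boom on the remote worker")

# Assumes rpc.init_rpc(...) has already been called and "worker1" is reachable.
fut = rpc.rpc_async("worker1", will_fail)
try:
    fut.wait()  # the remote EXCEPTION response is re-raised here, not returned
except Exception as exc:
    print(f"propagated from worker1: {exc}")
```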

Differential Revision: D17652615

@pytorchbot added the caffe2, module: autograd, module: build, module: cpp, and oncall: distributed labels Sep 28, 2019
pritamdamania87 (Contributor, Author) commented:

I plan to add a few more unit tests I have in mind to this PR, but it is still in a good enough state for review.

pritamdamania87 pushed a commit that referenced this pull request Sep 28, 2019
Pull Request resolved: #27022

ghstack-source-id: 90989748

Differential Revision: [D17652615](https://our.internmc.facebook.com/intern/diff/D17652615/)
@pritamdamania87 added the module: rpc label Sep 28, 2019
ezyang (Contributor) commented Sep 30, 2019

I won't be able to review this today. Bug me again about it on Wednesday.

pritamdamania87 (Contributor, Author) commented:

cc @rohan-varma This PR has the changes for the default RPC agent.

pritamdamania87 (Contributor, Author) commented:

@ezyang Just a friendly reminder since it's Wednesday :)

@ezyang ezyang requested a review from albanD October 3, 2019 15:19
zhaojuanmao added a commit that referenced this pull request Oct 17, 2019
1. Currently, if the autograd context is valid, the RPC is sent with autograd metadata even when the tensors do not require grads and no grad functions are attached. This is not ideal. This diff makes sure an RPC carries autograd metadata only if the autograd context is valid and the tensors require grads.

2. Meanwhile, create a utility to attach autograd info and functions as needed.

3. Add autograd send/recv functions for Python RPC calls.

4. Make changes to support nested Python RPC calls.

5. Disallow nested dist autograd contexts (was landed in #27022).

Differential Revision: [D17819153](https://our.internmc.facebook.com/intern/diff/D17819153/)

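Item 5 in the commit above refers to a check (landed in this PR) that rejects nesting distributed autograd contexts on the same thread. A minimal sketch of what that looks like from Python, assuming RPC is initialized; the exact error type and message are assumptions:

```python
import torch.distributed.autograd as dist_autograd

# Assumes rpc.init_rpc(...) has already been called on this worker.
with dist_autograd.context() as outer_ctx:
    try:
        # Opening a second context while one is active is disallowed.
        with dist_autograd.context() as inner_ctx:
            pass
    except RuntimeError as exc:
        print(f"nested context rejected: {exc}")
```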
facebook-github-bot pushed a commit that referenced this pull request Oct 18, 2019
Summary:
Pull Request resolved: #27576

ghstack-source-id: 92154535

Test Plan: unit tests

Differential Revision: D17819153

fbshipit-source-id: 37d8a85855bf591f2f2da48d475a06e870a30ea1
facebook-github-bot pushed a commit that referenced this pull request Oct 19, 2019
Summary:
Pull Request resolved: #28312

ghstack-source-id: 92240367

Test Plan: unit tests

Differential Revision: D18017554

fbshipit-source-id: dbe79a5171063901a78a9b3322b9b31c159d098d
@facebook-github-bot facebook-github-bot deleted the gh/pritamdamania87/7/head branch October 28, 2019 22:18
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
Summary:
Pull Request resolved: pytorch#27022
ghstack-source-id: 91794926

Test Plan: unit tests.

Differential Revision: D17652615

fbshipit-source-id: 96f65c52adb2706ee29f4b49e1655afaa0a3bec3
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
Summary:
Pull Request resolved: pytorch#27576

ghstack-source-id: 92154535

Test Plan: unit tests

Differential Revision: D17819153

fbshipit-source-id: 37d8a85855bf591f2f2da48d475a06e870a30ea1
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
Summary:
Pull Request resolved: pytorch#28312

ghstack-source-id: 92240367

Test Plan: unit tests

Differential Revision: D18017554

fbshipit-source-id: dbe79a5171063901a78a9b3322b9b31c159d098d