Update on "Add Python RRef as args and return value" · pytorch/pytorch@46322c3 · GitHub

Commit 46322c3

Update on "Add Python RRef as args and return value"
See #23110 for model parallel design details, and #26759 for the RRef protocol. This commit adds support for using RRefs as Python UDF arguments and return values. RRefs can now be shared from owner to user, from user to owner, or from user to user.

Limitations:
1. No implicit type conversion yet.
2. No failure handling and retry.
3. UDF is not yet blocked until all RRefs are confirmed.
4. Internal RRef control messages are not idempotent yet.

Main changes:
1. Added `SCRIPT_REMOTE_CALL` and `PYTHON_REMOTE_CALL` to `Message.h` to represent `dist.remote` invocations.
2. Added `SCRIPT_RREF_FETCH_CALL`, `PYTHON_RREF_FETCH_CALL`, `RREF_USER_ACCEPT`, `RREF_USER_DELETE`, `RREF_CHILD_ACCEPT`, and `RREF_FORK_REQUEST` to `Message.h` as internal RRef control messages.
3. New message request handling code is added to `functions.cpp`, and the message formats are added in `script_remote_call.h`, `python_remote_call.h`, and `rref_proto.h`.
4. Added a `PyRRef` type in `py_rref.h` and `py_rref.cpp`, which holds a shared pointer to the C++ `RRef` type. `PyRRef` wraps the C++ API and also implements RRef pickling and unpickling. RRef fork-related control messages are sent during the RRef pickling/unpickling procedure.
5. Updated `RRef.h` and `RRef.cpp` accordingly to support `py::object` RRefs.
6. RRef context (reference count, etc.) is tracked in `rref_context.h` and `rref_context.cpp`.

Differential Revision: [D17184146](https://our.internmc.facebook.com/intern/diff/D17184146) [ghstack-poisoned]
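The two groups of message kinds listed in the commit message can be pictured with a small sketch. This is only an illustration: the enumerator names come from the commit message, but the `MessageType` enum layout and the helpers `isRemoteCall` and `isRRefControl` are hypothetical and are not taken from `Message.h`, whose real definitions may differ in order, values, and structure.

```cpp
// Hypothetical mirror of the message kinds named in the commit message.
// The actual definitions live in Message.h and may look different.
enum class MessageType {
  // Represent dist.remote invocations.
  SCRIPT_REMOTE_CALL,
  PYTHON_REMOTE_CALL,
  // Internal RRef control messages.
  SCRIPT_RREF_FETCH_CALL,
  PYTHON_RREF_FETCH_CALL,
  RREF_USER_ACCEPT,
  RREF_USER_DELETE,
  RREF_CHILD_ACCEPT,
  RREF_FORK_REQUEST,
};

// Hypothetical helper: true for the two kinds that carry a remote invocation.
inline bool isRemoteCall(MessageType type) {
  return type == MessageType::SCRIPT_REMOTE_CALL ||
      type == MessageType::PYTHON_REMOTE_CALL;
}

// Hypothetical helper: every other kind in this sketch is an RRef control
// message, following the grouping given in the commit message.
inline bool isRRefControl(MessageType type) {
  return !isRemoteCall(type);
}
```

Under this grouping, `isRemoteCall(MessageType::PYTHON_REMOTE_CALL)` is true, while `RREF_FORK_REQUEST` and both fetch calls count as control messages.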
2 parents 4a370ed + 48079e5 commit 46322c3

1 file changed: 1 addition, 0 deletions

torch/csrc/distributed/rpc/python_functions.cpp

Lines changed: 1 addition & 0 deletions
@@ -107,6 +107,7 @@ PyRRef pyRemoteBuiltin(
   auto op = matchBuiltinOp(opName, args, kwargs, stack);

   auto& ctx = RRefContext::getInstance();
+  // TODO: support creaing RRefs on a local object.
   TORCH_INTERNAL_ASSERT(
       ctx->getWorkerId() != dst.id_,
       "Does not support creating RRef on self yet.");

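The assertion in this hunk rejects a remote call whose destination is the calling worker itself. A standalone sketch of that guard, outside PyTorch, might look like the following. Everything here is an assumption made for illustration: `worker_id_t`, `WorkerInfoSketch`, and `checkRemoteDestination` are hypothetical names, and a plain exception stands in for `TORCH_INTERNAL_ASSERT`.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical worker id type; the real type in PyTorch may differ.
using worker_id_t = std::int16_t;

// Hypothetical stand-in for the destination argument `dst` in the diff.
struct WorkerInfoSketch {
  worker_id_t id_;
};

// Throws if the destination is the calling worker, mirroring the condition
// `ctx->getWorkerId() != dst.id_` checked in pyRemoteBuiltin.
inline void checkRemoteDestination(
    worker_id_t selfId,
    const WorkerInfoSketch& dst) {
  if (selfId == dst.id_) {
    throw std::logic_error("Does not support creating RRef on self yet.");
  }
}
```

Calling `checkRemoteDestination(0, WorkerInfoSketch{1})` returns normally, while passing the same id for both arguments throws. The TODO added by this commit suggests the restriction is meant to be lifted once RRefs can be created on a local object.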