Add Python RRef as args and return value · pytorch/pytorch@252d70b · GitHub


Commit 252d70b

Add Python RRef as args and return value
See #23110 for model parallel design details, and #26759 for the RRef protocol.

This commit adds support for using RRefs as Python UDF arguments and return values. RRefs can now be shared from owner to user, from user to owner, or from user to user.

Limitations:
1. No implicit type conversion yet.
2. No failure handling or retry.
3. The UDF is not yet blocked until all RRefs are confirmed.
4. Internal RRef control messages are not idempotent yet.
5. RRefs cannot be deleted correctly when there are circular dependencies.

Main changes:
1. Added SCRIPT_REMOTE_CALL and PYTHON_REMOTE_CALL to Message.h to represent dist.remote invocations.
2. Added SCRIPT_RREF_FETCH_CALL, PYTHON_RREF_FETCH_CALL, RREF_USER_ACCEPT, RREF_USER_DELETE, RREF_CHILD_ACCEPT, and RREF_FORK_REQUEST to Message.h as internal RRef control messages.
3. Added handling for the new message requests to functions.cpp; the message formats are defined in script_remote_call.h, python_remote_call.h, and rref_proto.h.
4. Added a PyRRef type in py_rref.h and py_rref.cpp, which holds a shared pointer to the C++ RRef type. PyRRef wraps the C++ API and also implements RRef pickling and unpickling. RRef fork-related control messages are sent during the pickling/unpickling procedure.
5. Updated RRef.h and RRef.cpp accordingly to support py::object RRefs.
6. RRef context (reference counts, etc.) is tracked in rref_context.h and rref_context.cpp.

ghstack-source-id: c7926a6
Pull Request resolved: #25499
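The reference-counting idea behind sharing RRefs can be illustrated with a small single-process sketch. Everything in it is hypothetical: MsgType, ToyOwnerContext, ToyUserRRef, remote_create and _rebuild_user_rref are illustrative names, not the classes in rref_context.h or py_rref.cpp. It also simplifies the protocol heavily: forks are registered synchronously at pickling time, and the confirmation messages (RREF_USER_ACCEPT, RREF_CHILD_ACCEPT) and cross-worker messaging are skipped. It only shows the core idea that pickling a user RRef forks a new reference the owner records, and the owner frees the value once every fork has been deleted.

```python
import enum
import pickle


class MsgType(enum.Enum):
    # Hypothetical: names borrowed from the commit message, values are arbitrary.
    RREF_FORK_REQUEST = "fork_request"
    RREF_USER_DELETE = "user_delete"


class ToyOwnerContext:
    """Single-process stand-in for an owner's RRef bookkeeping (hypothetical)."""

    def __init__(self):
        self.values = {}   # rref_id -> Python object owned by this worker
        self.forks = {}    # rref_id -> set of fork ids that users still hold
        self.log = []      # (msg, rref_id, fork_id) for every control message handled
        self._next_id = 0

    def new_id(self):
        self._next_id += 1
        return self._next_id

    def create(self, value):
        rref_id = self.new_id()
        self.values[rref_id] = value
        self.forks[rref_id] = set()
        return rref_id

    def handle(self, msg, rref_id, fork_id):
        # Dispatch on the control-message type, as an owner-side handler might.
        self.log.append((msg, rref_id, fork_id))
        if msg is MsgType.RREF_FORK_REQUEST:
            self.forks[rref_id].add(fork_id)
        elif msg is MsgType.RREF_USER_DELETE:
            self.forks[rref_id].discard(fork_id)
            if not self.forks[rref_id]:
                # No user holds a reference any more, so release the value.
                del self.values[rref_id]
                del self.forks[rref_id]


OWNER = ToyOwnerContext()


def _rebuild_user_rref(rref_id, fork_id):
    # Called by pickle.loads on the receiving side.
    return ToyUserRRef(rref_id, fork_id)


class ToyUserRRef:
    """User-side handle to a value that lives on OWNER (hypothetical)."""

    def __init__(self, rref_id, fork_id):
        self.rref_id = rref_id
        self.fork_id = fork_id

    def __reduce__(self):
        # Pickling shares the RRef: mint a new fork id and register it with the
        # owner before the receiver gets it, so the value cannot be freed early.
        fork_id = OWNER.new_id()
        OWNER.handle(MsgType.RREF_FORK_REQUEST, self.rref_id, fork_id)
        return (_rebuild_user_rref, (self.rref_id, fork_id))

    def to_here(self):
        return OWNER.values[self.rref_id]

    def release(self):
        OWNER.handle(MsgType.RREF_USER_DELETE, self.rref_id, self.fork_id)


def remote_create(value):
    # Hypothetical stand-in for dist.remote: an owner-side value plus one user fork.
    rref_id = OWNER.create(value)
    fork_id = OWNER.new_id()
    OWNER.handle(MsgType.RREF_FORK_REQUEST, rref_id, fork_id)
    return ToyUserRRef(rref_id, fork_id)


if __name__ == "__main__":
    a = remote_create([1, 2, 3])
    b = pickle.loads(pickle.dumps(a))  # "send" a to another user
    a.release()                        # value survives: b's fork is registered
    print(b.to_here())                 # [1, 2, 3]
    b.release()                        # last fork deleted: owner frees the value
    print(a.rref_id in OWNER.values)   # False
```

Registering the new fork before the receiver can use it is what keeps the owner from freeing the value when the sender drops its own reference first. In a real multi-worker setting that registration travels as control messages between processes, so ordering and confirmation are harder to guarantee than in this toy model, and a pure fork-counting scheme like this one cannot reclaim values that reference each other in a cycle.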
1 parent e33ec39 commit 252d70b

36 files changed: 2063 additions, 617 deletions

caffe2/CMakeLists.txt

Lines changed: 3 additions & 1 deletion
@@ -485,10 +485,12 @@ if (NOT INTERN_BUILD_MOBILE OR NOT BUILD_CAFFE2_MOBILE)
 ${TORCH_SRC_DIR}/csrc/distributed/autograd/utils.cpp
 ${TORCH_SRC_DIR}/csrc/distributed/rpc/future_message.cpp
 ${TORCH_SRC_DIR}/csrc/distributed/rpc/message.cpp
+${TORCH_SRC_DIR}/csrc/distributed/rpc/python_remote_call.cpp
+${TORCH_SRC_DIR}/csrc/distributed/rpc/rref_proto.cpp
 ${TORCH_SRC_DIR}/csrc/distributed/rpc/script_call.cpp
 ${TORCH_SRC_DIR}/csrc/distributed/rpc/script_remote_call.cpp
-${TORCH_SRC_DIR}/csrc/distributed/rpc/script_rref_proto.cpp
 ${TORCH_SRC_DIR}/csrc/distributed/rpc/script_ret.cpp
+${TORCH_SRC_DIR}/csrc/distributed/rpc/types.cpp
 ${TORCH_SRC_DIR}/csrc/jit/export.cpp
 ${TORCH_SRC_DIR}/csrc/jit/import_legacy.cpp
 ${TORCH_SRC_DIR}/csrc/jit/netdef_converter.cpp
