Update on "sync and async torch.distributed.rpc for builtin operators" · pytorch/pytorch@f0d6fa3 · GitHub

Commit f0d6fa3

Update on "sync and async torch.distributed.rpc for builtin operators"
Features:

* sync and async RPC for builtin operators
* RpcAgent API
* ProcessGroupAgent implementation

Goal:

This is the first PR for #23110, and there will be many follow-up ones, so let's focus on the overall API and code structure. Details like efficiency and error handling can be improved in future PRs.

* have a minimum working and testable RPC implementation.
* make sure the RpcAgent API is sufficient for future ThriftAgent and TensorPipeAgent implementations
  * The TensorPipe implementation might allocate multiple underlying communication channels of different types, and might also use streaming serialization/deserialization for large tensors. To support this requirement, the current implementation only converts a BuiltinOp into a Message, which contains a byte vector and a tensor table. It is up to the RpcAgent implementation to determine how it would like to serialize a Message object.
  * For ThriftAgent, as Thrift has its own request/response matching solution, the Message.id is no longer necessary. Hence the id can be dropped during serialization. All it needs to do is pass the response Message object to the Future returned by send(...).
* support blocking and non-blocking RequestCallback
  * blocking means the callback won't return before sending out the response
  * non-blocking can be achieved by enqueuing the `(from, request, RpcAgent&)` tuple and using a different thread to process them. That is why there is an `RpcAgent&` arg in the param list.

Differential Revision: [D15194693](https://our.internmc.facebook.com/intern/diff/D15194693/)
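The non-blocking RequestCallback idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Message`, `RpcAgent`, `RecordingAgent`, `QueuedCallback`, and `echo_roundtrip_id` are hypothetical stand-ins, tensors are modeled as plain float vectors, and the actual `RpcAgent` interface in the PR may look different. The callback only enqueues the `(from, request, RpcAgent&)` tuple, and a separate worker thread sends the response.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for Message: a byte vector, a tensor table, and an id.
struct Message {
  std::vector<char> payload;
  std::vector<std::vector<float>> tensors;
  int64_t id = 0;
};

// Hypothetical minimal agent interface; the real RpcAgent API may differ.
class RpcAgent {
 public:
  virtual ~RpcAgent() = default;
  virtual void send(const std::string& to, Message message) = 0;
};

// Test double that records outgoing messages instead of using a network.
class RecordingAgent : public RpcAgent {
 public:
  void send(const std::string& to, Message message) override {
    std::lock_guard<std::mutex> lock(mutex_);
    sent_.emplace_back(to, std::move(message));
  }
  std::vector<std::pair<std::string, Message>> sent() {
    std::lock_guard<std::mutex> lock(mutex_);
    return sent_;
  }

 private:
  std::mutex mutex_;
  std::vector<std::pair<std::string, Message>> sent_;
};

// Non-blocking callback: operator() only enqueues; a worker thread replies.
class QueuedCallback {
 public:
  QueuedCallback() : worker_([this] { run(); }) {}
  ~QueuedCallback() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_one();
    worker_.join();  // the worker drains the queue before exiting
  }
  void operator()(std::string from, Message request, RpcAgent& agent) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push_back(Item{std::move(from), std::move(request), &agent});
    }
    cv_.notify_one();
  }

 private:
  struct Item {
    std::string from;
    Message request;
    RpcAgent* agent;
  };
  void run() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
      if (queue_.empty()) return;  // done_ is set and nothing is left
      Item item = std::move(queue_.front());
      queue_.pop_front();
      lock.unlock();
      // Echo the request back as the response, keeping the id for matching.
      item.agent->send(item.from, std::move(item.request));
      lock.lock();
    }
  }
  std::mutex mutex_;
  std::condition_variable cv_;
  std::deque<Item> queue_;
  bool done_ = false;
  std::thread worker_;  // declared last so the other members exist first
};

// Runs the sketch end to end and returns the id of the echoed response.
int64_t echo_roundtrip_id(int64_t id) {
  RecordingAgent agent;
  {
    QueuedCallback callback;
    Message request;
    request.id = id;
    callback("worker0", std::move(request), agent);
  }  // the destructor joins the worker, so the reply has been sent by now
  auto sent = agent.sent();
  return sent.size() == 1 ? sent[0].second.id : -1;
}
```

Calling `echo_roundtrip_id(7)` from a `main` function exercises the whole path. A blocking callback would instead call `agent.send(...)` directly inside `operator()`, which is why passing the agent by reference covers both styles.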
1 parent 052ba85 commit f0d6fa3

2 files changed: +4 -4 lines changed

torch/csrc/distributed/rpc/ScriptCall.cpp

Lines changed: 4 additions & 0 deletions

@@ -11,6 +11,10 @@ using torch::jit::Unpickler;
 
 } // namespace
 
+static constexpr char BUILTIN_OP_NAMESPACE_[] = "torch.ops.aten.";
+static constexpr char ATEN_PREFIX_[] = "aten::";
+static constexpr int ATEN_PREFIX_LEN_ = 6;
+
 ScriptCall::ScriptCall(
     std::shared_ptr<Operator> op, std::vector<at::IValue>&& args)
     : op_(std::move(op)), stack_(args) {}
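
The diff does not show how these constants are used. As a rough sketch, assuming they translate between the `aten::` symbol form and the `torch.ops.aten.` Python-style name, a helper could look like the following. `toPythonOpName` is a hypothetical name introduced only for illustration; the three constants are copied from the diff.

```cpp
#include <cassert>
#include <string>

// Constants as they appear in the diff, at file scope in the .cpp file.
static constexpr char BUILTIN_OP_NAMESPACE_[] = "torch.ops.aten.";
static constexpr char ATEN_PREFIX_[] = "aten::";
static constexpr int ATEN_PREFIX_LEN_ = 6;

// sizeof counts the trailing '\0', so subtract one to get the string length.
static_assert(
    static_cast<int>(sizeof(ATEN_PREFIX_)) - 1 == ATEN_PREFIX_LEN_,
    "ATEN_PREFIX_LEN_ must match the length of ATEN_PREFIX_");

// Hypothetical helper (not from the commit): maps "aten::add" to
// "torch.ops.aten.add" and returns an empty string if the prefix is missing.
std::string toPythonOpName(const std::string& symbol) {
  if (symbol.compare(0, ATEN_PREFIX_LEN_, ATEN_PREFIX_) != 0) {
    return std::string();
  }
  return std::string(BUILTIN_OP_NAMESPACE_) + symbol.substr(ATEN_PREFIX_LEN_);
}
```

The `static_assert` keeps the hand-written length in sync with the prefix string, which is one way to guard a constant like `ATEN_PREFIX_LEN_` against drifting if the prefix ever changes.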

torch/csrc/distributed/rpc/ScriptCall.h

Lines changed: 0 additions & 4 deletions

@@ -32,10 +32,6 @@ class TORCH_API ScriptCall final {
   static std::shared_ptr<Operator> matchOperator(
       at::Symbol& symbol, const std::string& str_schema);
 
-  static constexpr char BUILTIN_OP_NAMESPACE_[] = "torch.ops.aten.";
-  static constexpr char ATEN_PREFIX_[] = "aten::";
-  static constexpr int ATEN_PREFIX_LEN_ = 6;
-
   // This field has value if this ScriptCall represents invocation of a builtin
   // operator.
   c10::optional<std::shared_ptr<Operator>> op_;
