Update on "sync and async torch.distributed.rpc for builtin operators" · pytorch/pytorch@207c1ff · GitHub

Commit 207c1ff

Update on "sync and async torch.distributed.rpc for builtin operators"
Features:

* sync and async RPC for builtin operators
* RpcAgent API
* ProcessGroupAgent implementation

Goal:

This is the first PR for #23110, and there will be many follow-up ones. So let's focus on the overall API and code structure here; details like efficiency and error handling can be improved in future PRs.

* Have a minimum working and testable RPC implementation.
* Make sure the RpcAgent API is sufficient for the future ThriftAgent and TensorPipeAgent implementations.
  * For the TensorPipe implementation, it might allocate multiple underlying communication channels with different types, and might also use streaming serialization/deserialization for large tensors. To support this requirement, the current implementation only converts a BuiltinOp into a Message, which contains a byte vector and a tensor table. It is up to the RpcAgent implementation to determine how it would like to serialize a Message object.
  * For ThriftAgent, as Thrift has its own request/response matching solution, the Message.id is no longer necessary. Hence the id can be dropped during serialization. All it needs to do is pass the response Message object to the Future returned by send(...).
* Support blocking and non-blocking RequestCallback.
  * Blocking means the callback won't return before sending out the response.
  * Non-blocking can be achieved by enqueuing the `(from, request, RpcAgent&)` tuple and using a different thread to process it. That is why there is an `RpcAgent&` arg in the param list.

Differential Revision: [D15194693](https://our.internmc.facebook.com/intern/diff/D15194693/)
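To make the described flow concrete, here is a minimal, self-contained Python sketch (not code from this PR) of the pattern the commit message outlines: a request is packaged as a message holding a byte payload plus a tensor table, the agent's send(...) returns a Future, and a request callback runs the builtin op off the caller's thread and completes that Future. The names ToyMessage, ToyAgent, and run_builtin_op are illustrative assumptions and do not correspond to the actual C++ classes added here.

```python
import pickle
import queue
import threading
from concurrent.futures import Future

import torch


class ToyMessage:
    """A request or response: opaque payload bytes plus a separate tensor table."""
    def __init__(self, payload, tensors, msg_id=0):
        self.payload = payload   # e.g. pickled op name and non-tensor args
        self.tensors = tensors   # tensors kept out of the byte payload
        self.msg_id = msg_id     # lets the agent match a response to a pending Future


class ToyAgent:
    """In-process stand-in for an RPC agent: send() returns a Future immediately."""
    def __init__(self, request_callback):
        self._cb = request_callback
        self._pending = {}
        self._inbox = queue.Queue()
        self._next_id = 0
        threading.Thread(target=self._serve, daemon=True).start()

    def send(self, request):
        self._next_id += 1
        request.msg_id = self._next_id
        fut = Future()
        self._pending[request.msg_id] = fut
        self._inbox.put(request)      # stands in for putting the message on the wire
        return fut

    def _serve(self):
        # Non-blocking callback style: requests are queued and handled on a
        # separate thread, which is why the callback also receives the agent.
        while True:
            req = self._inbox.get()
            resp = self._cb(req, self)
            self._pending.pop(req.msg_id).set_result(resp)


def run_builtin_op(request, agent):
    """Request callback: decode the builtin op, run it on the tensor table, reply."""
    op_name = pickle.loads(request.payload)
    result = getattr(torch, op_name)(*request.tensors)
    return ToyMessage(request.payload, [result], request.msg_id)


if __name__ == "__main__":
    agent = ToyAgent(run_builtin_op)
    fut = agent.send(ToyMessage(pickle.dumps("add"), [torch.ones(2), torch.ones(2)]))
    print(fut.result().tensors[0])   # tensor([2., 2.])
```

A blocking callback would simply build and return the response before the server-side send handling returns; the queue-plus-worker-thread arrangement above is the non-blocking variant described in the message.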
2 parents c0d4b14 + 7d9e69e commit 207c1ff

File tree: 110 files changed (+2782 / -717 lines)


.circleci/README.md

Lines changed: 22 additions & 24 deletions
@@ -81,7 +81,7 @@ A **binary configuration** is a collection of
 * MacOS
 * Windows - these are built on Azure pipelines
 * devtoolset version (gcc compiler version)
-* This only matters on Linux cause only Linux uses gcc. tldr is gcc made a backwards incompatible change from gcc 4.8 to gcc 5, because it had to change how it implemented std::vector and std::string
+* This only matters on Linux cause only Linux uses gcc. tldr is gcc made a backwards incompatible change from gcc 4.8 to gcc 5, because it had to change how it implemented std::vector and std::string
 
 ### Where are the binaries?
 
@@ -101,12 +101,12 @@ All binaries are built in CircleCI workflows. There are checked-in workflows (co
 
 # CircleCI structure of the binaries
 
-Some quick vocab:
+Some quick vocab:
 
-* A **workflow** is a CircleCI concept; it is a DAG of '**jobs**'. ctrl-f 'workflows' on https://github.com/pytorch/pytorch/blob/master/.circleci/config.yml to see the workflows.
+* A **workflow** is a CircleCI concept; it is a DAG of '**jobs**'. ctrl-f 'workflows' on https://github.com/pytorch/pytorch/blob/master/.circleci/config.yml to see the workflows.
 * **jobs** are a sequence of '**steps**'
 * **steps** are usually just a bash script or a builtin CircleCI command. *All steps run in new environments, environment variables declared in one script DO NOT persist to following steps*
-* CircleCI has a **workspace**, which is essentially a cache between steps of the *same job* in which you can store artifacts between steps.
+* CircleCI has a **workspace**, which is essentially a cache between steps of the *same job* in which you can store artifacts between steps.
 
 ## How are the workflows structured?
 
@@ -116,9 +116,9 @@ The nightly binaries have 3 workflows. We have one job (actually 3 jobs: build,
 1. every day midnight EST
 2. linux: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
 3. macos: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/macos-binary-build-defaults.yml
-4. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
+4. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
 1. binary_linux_conda_3.7_cpu_build
-1. Builds the build. On linux jobs this uses the 'docker executor'.
+1. Builds the build. On linux jobs this uses the 'docker executor'.
 2. Persists the package to the workspace
 2. binary_linux_conda_3.7_cpu_test
 1. Loads the package to the workspace
@@ -134,16 +134,16 @@ The nightly binaries have 3 workflows. We have one job (actually 3 jobs: build,
 3. See below for what these are for and why they're needed
 4. Three jobs that each examine the current contents of aws and the conda repo and update some html files in s3
 3. binarysmoketests
-1. every day
+1. every day
 2. https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/nightly-build-smoke-tests-defaults.yml
-3. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
+3. For each binary configuration, e.g. linux_conda_3.7_cpu there is a
 1. smoke_linux_conda_3.7_cpu
 1. Downloads the package from the cloud, e.g. using the official pip or conda instructions
 2. Runs the smoke tests
 
 ## How are the jobs structured?
 
-The jobs are in https://github.com/pytorch/pytorch/tree/master/.circleci/verbatim-sources . Jobs are made of multiple steps. There are some shared steps used by all the binaries/smokes. Steps of these jobs are all delegated to scripts in https://github.com/pytorch/pytorch/tree/master/.circleci/scripts .
+The jobs are in https://github.com/pytorch/pytorch/tree/master/.circleci/verbatim-sources . Jobs are made of multiple steps. There are some shared steps used by all the binaries/smokes. Steps of these jobs are all delegated to scripts in https://github.com/pytorch/pytorch/tree/master/.circleci/scripts .
 
 * Linux jobs: https://github.com/pytorch/pytorch/blob/master/.circleci/verbatim-sources/linux-binary-build-defaults.yml
 * binary_linux_build.sh
@@ -177,7 +177,7 @@ CircleCI creates a final yaml file by inlining every <<* segment, so if we were
 So, CircleCI has several executor types: macos, machine, and docker are the ones we use. The 'machine' executor gives you two cores on some linux vm. The 'docker' executor gives you considerably more cores (nproc was 32 instead of 2 back when I tried in February). Since the dockers are faster, we try to run everything that we can in dockers. Thus
 
 * linux build jobs use the docker executor. Running them on the docker executor was at least 2x faster than running them on the machine executor
-* linux test jobs use the machine executor and spin up their own docker. Why this nonsense? It's cause we run nvidia-docker for our GPU tests; any code that calls into the CUDA runtime needs to be run on nvidia-docker. To run a nvidia-docker you need to install some nvidia packages on the host machine and then call docker with the '--runtime nvidia' argument. CircleCI doesn't support this, so we have to do it ourself.
+* linux test jobs use the machine executor and spin up their own docker. Why this nonsense? It's cause we run nvidia-docker for our GPU tests; any code that calls into the CUDA runtime needs to be run on nvidia-docker. To run a nvidia-docker you need to install some nvidia packages on the host machine and then call docker with the '--runtime nvidia' argument. CircleCI doesn't support this, so we have to do it ourself.
 * This is not just a mere inconvenience. **This blocks all of our linux tests from using more than 2 cores.** But there is nothing that we can do about it, but wait for a fix on circleci's side. Right now, we only run some smoke tests (some simple imports) on the binaries, but this also affects non-binary test jobs.
 * linux upload jobs use the machine executor. The upload jobs are so short that it doesn't really matter what they use
 * linux smoke test jobs use the machine executor for the same reason as the linux test jobs
@@ -243,7 +243,7 @@ Every type of package has an entrypoint build script that handles the all the im
 
 Both Linux and MacOS use the same code flow for the conda builds.
 
-Conda packages are built with conda-build, see https://conda.io/projects/conda-build/en/latest/resources/commands/conda-build.html
+Conda packages are built with conda-build, see https://conda.io/projects/conda-build/en/latest/resources/commands/conda-build.html
 
 Basically, you pass `conda build` a build folder (pytorch-nightly/ above) that contains a build script and a meta.yaml. The meta.yaml specifies in what python environment to build the package in, and what dependencies the resulting package should have, and the build script gets called in the env to build the thing.
 tldr; on conda-build is
@@ -266,7 +266,7 @@ The entrypoint file `builder/conda/build_conda.sh` is complicated because
 
 ## Manywheels (linux pip and libtorch packages)
 
-Manywheels are pip packages for linux distros. Note that these manywheels are not actually manylinux compliant.
+Manywheels are pip packages for linux distros. Note that these manywheels are not actually manylinux compliant.
 
 `builder/manywheel/build_cpu.sh` and `builder/manywheel/build.sh` (for CUDA builds) just set different env vars and then call into `builder/manywheel/build_common.sh`
 
@@ -313,13 +313,12 @@ Libtorch packages are built in the wheel build scripts: manywheel/build_*.sh for
 All linux builds occur in docker images. The docker images are
 
 * soumith/conda-cuda
-* Has ALL CUDA versions installed. The script pytorch/builder/conda/switch_cuda_version.sh sets /usr/local/cuda to a symlink to e.g. /usr/local/cuda-8.0 to enable different CUDA builds
-* Also used for cpu builds
-* soumith/manylinux-cuda80
+* Has ALL CUDA versions installed. The script pytorch/builder/conda/switch_cuda_version.sh sets /usr/local/cuda to a symlink to e.g. /usr/local/cuda-10.0 to enable different CUDA builds
 * Also used for cpu builds
 * soumith/manylinux-cuda90
 * soumith/manylinux-cuda92
 * soumith/manylinux-cuda100
+* Also used for cpu builds
 
 The Dockerfiles are available in pytorch/builder, but there is no circleci job or script to build these docker images, and they cannot be run locally (unless you have the correct local packages/paths). Only Soumith can build them right now.
 
@@ -380,7 +379,7 @@ The advantage of this flow is that you can make new changes to the base commit a
 
 ### Linux
 
-You can build Linux binaries locally easily using docker.
+You can build Linux binaries locally easily using docker.
 
 ```
 # Run the docker
@@ -400,7 +399,7 @@ docker run \
 -v your/builder/repo:/builder \
 -v where/you/want/packages/to/appear:/final_pkgs \
 -it soumith/conda-cuda /bin/bash
-
+
 # Export whatever variables are important to you. All variables that you'd
 # possibly need are in .circleci/scripts/binary_populate_env.sh
 # You should probably always export at least these 3 variables
@@ -410,14 +409,14 @@ export DESIRED_CUDA=cpu
 
 # Call the entrypoint
 # `|& tee foo.log` just copies all stdout and stderr output to foo.log
-# The builds generate lots of output so you probably need this when
+# The builds generate lots of output so you probably need this when
 # building locally.
 /builder/conda/build_pytorch.sh |& tee build_output.log
 ```
 
 **Building CUDA binaries on docker**
 
-To build a CUDA binary you need to use `nvidia-docker run` instead of just `docker run` (or you can manually pass `--runtime=nvidia`). This adds some needed libraries and things to build CUDA stuff.
+To build a CUDA binary you need to use `nvidia-docker run` instead of just `docker run` (or you can manually pass `--runtime=nvidia`). This adds some needed libraries and things to build CUDA stuff.
 
 You can build CUDA binaries on CPU only machines, but you can only run CUDA binaries on CUDA machines. This means that you can build a CUDA binary on a docker on your laptop if you so choose (though it’s gonna take a loong time).
 
@@ -431,7 +430,7 @@ But if you want to try, then I’d recommend
 
 ```
 # Create a new terminal
-# Clear your LD_LIBRARY_PATH and trim as much out of your PATH as you
+# Clear your LD_LIBRARY_PATH and trim as much out of your PATH as you
 # know how to do
 
 # Install a new miniconda
@@ -462,11 +461,11 @@ export DESIRED_CUDA=cpu
 path/to/builder/wheel/build_wheel.sh
 ```
 
-N.B. installing a brand new miniconda is important. This has to do with how conda installations work. See the “General Python” section above, but tldr; is that
+N.B. installing a brand new miniconda is important. This has to do with how conda installations work. See the “General Python” section above, but tldr; is that
 
 1. You make the ‘conda’ command accessible by prepending `path/to/conda_root/bin` to your PATH.
-2. You make a new env and activate it, which then also gets prepended to your PATH. Now you have `path/to/conda_root/envs/new_env/bin:path/to/conda_root/bin:$PATH`
-3. Now say you (or some code that you ran) call python executable `foo`
+2. You make a new env and activate it, which then also gets prepended to your PATH. Now you have `path/to/conda_root/envs/new_env/bin:path/to/conda_root/bin:$PATH`
+3. Now say you (or some code that you ran) call python executable `foo`
 1. if you installed `foo` in `new_env`, then `path/to/conda_root/envs/new_env/bin/foo` will get called, as expected.
 2. But if you forgot to install `foo` in `new_env` but happened to previously install it in your root conda env (called ‘base’), then unix/linux will still find `path/to/conda_root/bin/foo`. This is dangerous, since `foo` can be a different version than you want; `foo` can even be for an incompatible python version!
 
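As an aside (not part of the diff), the PATH pitfall spelled out in the hunk above can be modeled with a few lines of Python; the directory names below are the hypothetical ones from the README text.

```python
# Toy model of shell executable lookup: the first PATH entry that contains
# an executable named `foo` wins.
def resolve(path_entries, dirs_containing_foo):
    for d in path_entries:
        if d in dirs_containing_foo:
            return d + "/foo"
    return None

new_env = "path/to/conda_root/envs/new_env/bin"
conda_root = "path/to/conda_root/bin"
PATH = [new_env, conda_root]  # the activated env is prepended ahead of the root

# foo only installed in the base env: the stale copy is silently picked up.
print(resolve(PATH, {conda_root}))            # path/to/conda_root/bin/foo
# foo installed in the new env: the expected copy shadows the base one.
print(resolve(PATH, {new_env, conda_root}))   # path/to/conda_root/envs/new_env/bin/foo
```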
@@ -475,4 +474,3 @@ Newer conda versions and proper python hygiene can prevent this, but just instal
 ### Windows
 
 Maybe @peterjc123 can fill this section in.
-
.circleci/cimodel/data/binary_build_data.py

Lines changed: 2 additions & 2 deletions
@@ -42,7 +42,7 @@ def get_processor_arch_name(cuda_version):
         "3.6m",
         "3.7m",
     ],
-    conda=dimensions.CONDA_PYTHON_VERSIONS,
+    conda=dimensions.STANDARD_PYTHON_VERSIONS,
     libtorch=[
         "2.7m",
     ],
@@ -52,7 +52,7 @@ def get_processor_arch_name(cuda_version):
     linux=(dimensions.CUDA_VERSIONS, LINUX_PACKAGE_VARIANTS),
     macos=([None], OrderedDict(
         wheel=dimensions.STANDARD_PYTHON_VERSIONS,
-        conda=dimensions.CONDA_PYTHON_VERSIONS,
+        conda=dimensions.STANDARD_PYTHON_VERSIONS,
         libtorch=[
             "2.7",
         ],

.circleci/cimodel/data/binary_build_definitions.py

Lines changed: 2 additions & 2 deletions
@@ -34,8 +34,8 @@ def gen_docker_image(self):
 
         docker_distro_prefix = miniutils.override(self.pydistro, docker_word_substitution)
 
-        # The cpu nightlies are built on the soumith/manylinux-cuda80 docker image
-        alt_docker_suffix = self.cuda_version or "80"
+        # The cpu nightlies are built on the soumith/manylinux-cuda100 docker image
+        alt_docker_suffix = self.cuda_version or "100"
         docker_distro_suffix = "" if self.pydistro == "conda" else alt_docker_suffix
         return miniutils.quote("soumith/" + docker_distro_prefix + "-cuda" + docker_distro_suffix)
 
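For reference (not part of the diff), the changed default can be seen in a standalone Python rendering of the image-name logic above; the word-substitution map and the helper behavior are assumptions, since only the changed lines are visible here.

```python
# Hypothetical stand-in for gen_docker_image: cuda_version of None means a cpu
# build, which after this change falls back to the cuda100 image suffix.
def gen_docker_image(pydistro, cuda_version):
    # Assumed substitution: 'manywheel' builds use the 'manylinux' images.
    docker_word_substitution = {"manywheel": "manylinux"}
    docker_distro_prefix = docker_word_substitution.get(pydistro, pydistro)
    alt_docker_suffix = cuda_version or "100"
    # conda builds all use the single soumith/conda-cuda image.
    docker_distro_suffix = "" if pydistro == "conda" else alt_docker_suffix
    return "soumith/" + docker_distro_prefix + "-cuda" + docker_distro_suffix

print(gen_docker_image("conda", None))       # soumith/conda-cuda
print(gen_docker_image("manywheel", None))   # soumith/manylinux-cuda100
print(gen_docker_image("manywheel", "90"))   # soumith/manylinux-cuda90
```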
.circleci/cimodel/data/dimensions.py

Lines changed: 0 additions & 6 deletions
@@ -15,9 +15,3 @@
     "3.6",
     "3.7",
 ]
-
-CONDA_PYTHON_VERSIONS = [
-    "2.7",
-    "3.6",
-    "3.7",
-]

0 commit comments
