Update on "[WIP] sync and async torch.distributed.rpc for builtin ope… · pytorch/pytorch@294038c · GitHub

Commit 294038c

Update on "[WIP] sync and async torch.distributed.rpc for builtin operators"
Features:
* sync and async RPC for builtin operators
* RpcAgent API
* ProcessGroupAgent implementation

Goal:
* have a minimum working and testable RPC implementation for #23110
* make sure the RpcAgent API is sufficient for future ThriftAgent and TensorPipeAgent implementations
  * For the tensor pipe implementation, it might allocate multiple underlying communication channels with different types, and might also use streaming serialization/deserialization for large tensors. To support this requirement, the current implementation only converts a BuiltinOp into a Message, which contains a byte vector and a tensor table. It is up to the RpcAgent implementation to determine how it would like to serialize a Message object.
  * For ThriftAgent, as Thrift has its own request/response matching solution, the Message.id is no longer necessary. Hence the id can be dropped during serialization. All it needs to do is to pass the response Message object to the Future returned by send(...).
* support blocking and non-blocking RequestCallback
  * blocking means the callback won't return before sending out the response
  * non-blocking can be achieved by enqueuing the `(from, request, RpcAgent&)` tuple and using a different thread to process them. That is why there is an `RpcAgent&` arg in the param list.

Differential Revision: [D15194693](https://our.internmc.facebook.com/intern/diff/D15194693/)
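
The difference between a blocking and a non-blocking RequestCallback can be illustrated with a small Python model. This is only a sketch of the idea described in the commit message, not the C++ API from this commit: `Message`, `RecordingAgent`, `handle`, `blocking_callback`, and `NonBlockingCallback` are hypothetical names, and the agent records responses in a list instead of sending them over a network.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import List


@dataclass
class Message:
    """Hypothetical stand-in for a Message: byte payload, tensor table, id."""
    payload: bytes
    tensors: List[object] = field(default_factory=list)
    id: int = 0


class RecordingAgent:
    """Hypothetical agent that records responses instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, to, message):
        self.sent.append((to, message))


def handle(request):
    # Toy "operator": upper-case the payload and keep the id so the
    # response can be matched to the request.
    return Message(request.payload.upper(), request.tensors, request.id)


def blocking_callback(sender, request, agent):
    # Blocking: does not return until the response has been handed to the agent.
    agent.send(sender, handle(request))


class NonBlockingCallback:
    """Non-blocking: enqueue (sender, request, agent); a worker thread replies."""

    def __init__(self):
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def __call__(self, sender, request, agent):
        # Return immediately; the agent travels with the request so the
        # worker thread knows where to send the response.
        self._queue.put((sender, request, agent))

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:  # shutdown sentinel
                break
            sender, request, agent = item
            agent.send(sender, handle(request))

    def shutdown(self):
        self._queue.put(None)
        self._worker.join()


agent = RecordingAgent()
blocking_callback("worker0", Message(b"add", id=1), agent)

callback = NonBlockingCallback()
callback("worker1", Message(b"mul", id=2), agent)
callback.shutdown()  # drain the queue before inspecting agent.sent
```

Carrying the agent inside the queued tuple mirrors the reason the commit message gives for the `RpcAgent&` parameter: the thread that eventually processes the request needs a handle to send the response.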
2 parents 9150589 + 071536f commit 294038c

278 files changed: +8229 -4008 lines changed

.circleci/README.md

Lines changed: 2 additions & 3 deletions
@@ -272,13 +272,13 @@ Manywheels are pip packages for linux distros. Note that these manywheels are no
 
 The entrypoint file `builder/manywheel/build_common.sh` is really really complicated because
 
-* This used to handle building for several different python versions at the same time. This is why there are loops everywhere
+* This used to handle building for several different python versions at the same time. The loops have been removed, but there's still unneccessary folders and movements here and there.
 * The script is never used this way anymore. This extra machinery could be removed.
 * This used to handle testing the pip packages too. This is why there’s testing code at the end that messes with python installations and stuff
 * The script is never used this way anymore. This extra machinery could be removed.
 * This also builds libtorch packages
 * This should really be separate. libtorch packages are c++ only and have no python. They should not share infra with all the python specific stuff in this file.
-* There is a lot of messing with rpaths. This is necessary, but could be made much much simpler if the loops for libtorch and separate python versions were removed.
+* There is a lot of messing with rpaths. This is necessary, but could be made much much simpler if the above issues were fixed.
 
 ## Wheels (MacOS pip and libtorch packages)
 
@@ -307,7 +307,6 @@ Libtorch packages are built in the wheel build scripts: manywheel/build_*.sh for
 * It’s confusinig. Most of those scripts deal with python specifics.
 * The extra conditionals everywhere severely complicate the wheel build scripts
 * The process for building libtorch is different from the official instructions (a plain call to cmake, or a call to a script)
-* For Linux specifically, the job is set up to build all libtorch varieties in a single go. This leads to 9+ hour builds times for CUDA 10.0 libtorch. This is more of a problem with the circleci setup though.
 
 ### Note on docker images / Dockerfiles

.circleci/cimodel/data/binary_build_data.py

Lines changed: 1 addition & 1 deletion
@@ -157,7 +157,7 @@ def get_children(self):
         package_format = self.find_prop("package_format")
         os_name = self.find_prop("os_name")
 
-        has_libtorch_variants = smoke and package_format == "libtorch" and os_name == "linux"
+        has_libtorch_variants = package_format == "libtorch" and os_name == "linux"
         linking_variants = LINKING_DIMENSIONS if has_libtorch_variants else []
 
         return [LinkingVariantConfigNode(self, v) for v in linking_variants]
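
The effect of dropping the `smoke` condition can be seen in a small standalone sketch. It is not the code from this file: `linking_variants` and its `require_smoke` switch are hypothetical, and the contents of `LINKING_DIMENSIONS` below are an assumption made for illustration.

```python
# Assumed values; the real LINKING_DIMENSIONS is not shown in this diff.
LINKING_DIMENSIONS = ["shared", "static"]


def linking_variants(package_format, os_name, smoke, require_smoke):
    """Return the linking variants a config node would expand into.

    require_smoke=True models the condition before this change,
    require_smoke=False models the condition after it.
    """
    has_libtorch_variants = package_format == "libtorch" and os_name == "linux"
    if require_smoke:
        has_libtorch_variants = smoke and has_libtorch_variants
    return LINKING_DIMENSIONS if has_libtorch_variants else []


# A non-smoke Linux libtorch build: no variants before, variants after.
before = linking_variants("libtorch", "linux", smoke=False, require_smoke=True)
after = linking_variants("libtorch", "linux", smoke=False, require_smoke=False)

# Other package formats are unaffected by the change.
wheel = linking_variants("manywheel", "linux", smoke=False, require_smoke=False)
```

Under these assumptions, regular (non-smoke) Linux libtorch configurations now fan out into one child node per linking variant, which is consistent with the additional per-variant jobs in `.circleci/config.yml`.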

.circleci/cimodel/data/binary_build_definitions.py

Lines changed: 4 additions & 4 deletions
@@ -46,10 +46,10 @@ def gen_build_name(self, build_or_test):
 
         parts = [self.get_name_prefix(), self.os] + self.gen_build_env_parms()
 
-        if self.smoke:
-            if self.libtorch_variant:
-                parts.append(self.libtorch_variant)
-        else:
+        if self.libtorch_variant:
+            parts.append(self.libtorch_variant)
+
+        if not self.smoke:
             parts.append(build_or_test)
 
         return "_".join(parts)
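
A standalone sketch of the naming logic before and after this hunk may help when reading the renamed jobs in `.circleci/config.yml`. The two functions below are hypothetical free-function versions of `gen_build_name`, and the `prefix` list is an assumed stand-in for what `get_name_prefix()`, `self.os`, and `gen_build_env_parms()` contribute.

```python
def build_name_old(prefix_parts, libtorch_variant, smoke, build_or_test):
    """Naming before the change: the variant is appended only for smoke builds."""
    parts = list(prefix_parts)
    if smoke:
        if libtorch_variant:
            parts.append(libtorch_variant)
    else:
        parts.append(build_or_test)
    return "_".join(parts)


def build_name_new(prefix_parts, libtorch_variant, smoke, build_or_test):
    """Naming after the change: the variant is appended whenever it is set."""
    parts = list(prefix_parts)
    if libtorch_variant:
        parts.append(libtorch_variant)
    if not smoke:
        parts.append(build_or_test)
    return "_".join(parts)


# Assumed prefix parts, chosen to match a job name that appears in config.yml.
prefix = ["binary", "linux", "libtorch", "2.7m", "cpu", "devtoolset7"]

old_name = build_name_old(prefix, "shared-with-deps", smoke=False, build_or_test="build")
new_name = build_name_new(prefix, "shared-with-deps", smoke=False, build_or_test="build")

print(old_name)
print(new_name)
```

With these inputs the old logic drops the variant from non-smoke job names, while the new logic places it before the `build`/`test` suffix.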

.circleci/config.yml

Lines changed: 229 additions & 21 deletions
@@ -100,13 +100,15 @@ pytorch_linux_build_defaults: &pytorch_linux_build_defaults
 
           # Push intermediate Docker image for next phase to use
           if [ -z "${BUILD_ONLY}" ]; then
-            # Note [namedtensor build image]
-            # The namedtensor build uses the same docker image as
+            # Note [Special build images]
+            # The namedtensor and xla builds use the same docker image as
             # pytorch-linux-trusty-py3.6-gcc5.4-build. In the push step, we have to
-            # distinguish between these two so the test can pick up the correct image.
+            # distinguish between them so the test can pick up the correct image.
             output_image=${DOCKER_IMAGE}-${CIRCLE_SHA1}
             if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
               export COMMIT_DOCKER_IMAGE=$output_image-namedtensor
+            elif [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
+              export COMMIT_DOCKER_IMAGE=$output_image-xla
             else
               export COMMIT_DOCKER_IMAGE=$output_image
             fi
@@ -132,11 +134,13 @@ pytorch_linux_test_defaults: &pytorch_linux_test_defaults
         no_output_timeout: "90m"
         command: |
           set -e
-          # See Note [namedtensor build image]
+          # See Note [Special build images]
           output_image=${DOCKER_IMAGE}-${CIRCLE_SHA1}
           if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
             export COMMIT_DOCKER_IMAGE=$output_image-namedtensor
             NAMED_FLAG="export BUILD_NAMEDTENSOR=1"
+          elif [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
+            export COMMIT_DOCKER_IMAGE=$output_image-xla
           else
             export COMMIT_DOCKER_IMAGE=$output_image
           fi
@@ -1681,23 +1685,98 @@ jobs:
       - image: "soumith/conda-cuda"
     <<: *binary_linux_build
 
-  binary_linux_libtorch_2.7m_cpu_devtoolset7_build:
+  binary_linux_libtorch_2.7m_cpu_devtoolset7_shared-with-deps_build:
     environment:
       BUILD_ENVIRONMENT: "libtorch 2.7m cpu devtoolset7"
+      LIBTORCH_VARIANT: "shared-with-deps"
     docker:
       - image: "soumith/manylinux-cuda80"
     <<: *binary_linux_build
 
-  binary_linux_libtorch_2.7m_cu92_devtoolset7_build:
+  binary_linux_libtorch_2.7m_cpu_devtoolset7_shared-without-deps_build:
+    environment:
+      BUILD_ENVIRONMENT: "libtorch 2.7m cpu devtoolset7"
+      LIBTORCH_VARIANT: "shared-without-deps"
+    docker:
+      - image: "soumith/manylinux-cuda80"
+    <<: *binary_linux_build
+
+  binary_linux_libtorch_2.7m_cpu_devtoolset7_static-with-deps_build:
+    environment:
+      BUILD_ENVIRONMENT: "libtorch 2.7m cpu devtoolset7"
+      LIBTORCH_VARIANT: "static-with-deps"
+    docker:
+      - image: "soumith/manylinux-cuda80"
+    <<: *binary_linux_build
+
+  binary_linux_libtorch_2.7m_cpu_devtoolset7_static-without-deps_build:
+    environment:
+      BUILD_ENVIRONMENT: "libtorch 2.7m cpu devtoolset7"
+      LIBTORCH_VARIANT: "static-without-deps"
+    docker:
+      - image: "soumith/manylinux-cuda80"
+    <<: *binary_linux_build
+
+  binary_linux_libtorch_2.7m_cu92_devtoolset7_shared-with-deps_build:
+    environment:
+      BUILD_ENVIRONMENT: "libtorch 2.7m cu92 devtoolset7"
+      LIBTORCH_VARIANT: "shared-with-deps"
+    docker:
+      - image: "soumith/manylinux-cuda92"
+    <<: *binary_linux_build
+
+  binary_linux_libtorch_2.7m_cu92_devtoolset7_shared-without-deps_build:
+    environment:
+      BUILD_ENVIRONMENT: "libtorch 2.7m cu92 devtoolset7"
+      LIBTORCH_VARIANT: "shared-without-deps"
+    docker:
+      - image: "soumith/manylinux-cuda92"
+    <<: *binary_linux_build
+
+  binary_linux_libtorch_2.7m_cu92_devtoolset7_static-with-deps_build:
+    environment:
+      BUILD_ENVIRONMENT: "libtorch 2.7m cu92 devtoolset7"
+      LIBTORCH_VARIANT: "static-with-deps"
+    docker:
+      - image: "soumith/manylinux-cuda92"
+    <<: *binary_linux_build
+
+  binary_linux_libtorch_2.7m_cu92_devtoolset7_static-without-deps_build:
     environment:
       BUILD_ENVIRONMENT: "libtorch 2.7m cu92 devtoolset7"
+      LIBTORCH_VARIANT: "static-without-deps"
     docker:
       - image: "soumith/manylinux-cuda92"
     <<: *binary_linux_build
 
-  binary_linux_libtorch_2.7m_cu100_devtoolset7_build:
+  binary_linux_libtorch_2.7m_cu100_devtoolset7_shared-with-deps_build:
+    environment:
+      BUILD_ENVIRONMENT: "libtorch 2.7m cu100 devtoolset7"
+      LIBTORCH_VARIANT: "shared-with-deps"
+    docker:
+      - image: "soumith/manylinux-cuda100"
+    <<: *binary_linux_build
+
+  binary_linux_libtorch_2.7m_cu100_devtoolset7_shared-without-deps_build:
+    environment:
+      BUILD_ENVIRONMENT: "libtorch 2.7m cu100 devtoolset7"
+      LIBTORCH_VARIANT: "shared-without-deps"
+    docker:
+      - image: "soumith/manylinux-cuda100"
+    <<: *binary_linux_build
+
+  binary_linux_libtorch_2.7m_cu100_devtoolset7_static-with-deps_build:
+    environment:
+      BUILD_ENVIRONMENT: "libtorch 2.7m cu100 devtoolset7"
+      LIBTORCH_VARIANT: "static-with-deps"
+    docker:
+      - image: "soumith/manylinux-cuda100"
+    <<: *binary_linux_build
+
+  binary_linux_libtorch_2.7m_cu100_devtoolset7_static-without-deps_build:
     environment:
       BUILD_ENVIRONMENT: "libtorch 2.7m cu100 devtoolset7"
+      LIBTORCH_VARIANT: "static-without-deps"
     docker:
       - image: "soumith/manylinux-cuda100"
     <<: *binary_linux_build
@@ -2066,19 +2145,76 @@ jobs:
       BUILD_ENVIRONMENT: "conda 3.7 cu100 devtoolset7"
     <<: *binary_linux_upload
 
-  binary_linux_libtorch_2.7m_cpu_devtoolset7_upload:
+  binary_linux_libtorch_2.7m_cpu_devtoolset7_shared-with-deps_upload:
     environment:
       BUILD_ENVIRONMENT: "libtorch 2.7m cpu devtoolset7"
+      LIBTORCH_VARIANT: "shared-with-deps"
     <<: *binary_linux_upload
 
-  binary_linux_libtorch_2.7m_cu92_devtoolset7_upload:
+  binary_linux_libtorch_2.7m_cpu_devtoolset7_shared-without-deps_upload:
+    environment:
+      BUILD_ENVIRONMENT: "libtorch 2.7m cpu devtoolset7"
+      LIBTORCH_VARIANT: "shared-without-deps"
+    <<: *binary_linux_upload
+
+  binary_linux_libtorch_2.7m_cpu_devtoolset7_static-with-deps_upload:
+    environment:
+      BUILD_ENVIRONMENT: "libtorch 2.7m cpu devtoolset7"
+      LIBTORCH_VARIANT: "static-with-deps"
+    <<: *binary_linux_upload
+
+  binary_linux_libtorch_2.7m_cpu_devtoolset7_static-without-deps_upload:
+    environment:
+      BUILD_ENVIRONMENT: "libtorch 2.7m cpu devtoolset7"
+      LIBTORCH_VARIANT: "static-without-deps"
+    <<: *binary_linux_upload
+
+  binary_linux_libtorch_2.7m_cu92_devtoolset7_shared-with-deps_upload:
     environment:
       BUILD_ENVIRONMENT: "libtorch 2.7m cu92 devtoolset7"
+      LIBTORCH_VARIANT: "shared-with-deps"
     <<: *binary_linux_upload
 
-  binary_linux_libtorch_2.7m_cu100_devtoolset7_upload:
+  binary_linux_libtorch_2.7m_cu92_devtoolset7_shared-without-deps_upload:
+    environment:
+      BUILD_ENVIRONMENT: "libtorch 2.7m cu92 devtoolset7"
+      LIBTORCH_VARIANT: "shared-without-deps"
+    <<: *binary_linux_upload
+
+  binary_linux_libtorch_2.7m_cu92_devtoolset7_static-with-deps_upload:
+    environment:
+      BUILD_ENVIRONMENT: "libtorch 2.7m cu92 devtoolset7"
+      LIBTORCH_VARIANT: "static-with-deps"
+    <<: *binary_linux_upload
+
+  binary_linux_libtorch_2.7m_cu92_devtoolset7_static-without-deps_upload:
+    environment:
+      BUILD_ENVIRONMENT: "libtorch 2.7m cu92 devtoolset7"
+      LIBTORCH_VARIANT: "static-without-deps"
+    <<: *binary_linux_upload
+
+  binary_linux_libtorch_2.7m_cu100_devtoolset7_shared-with-deps_upload:
+    environment:
+      BUILD_ENVIRONMENT: "libtorch 2.7m cu100 devtoolset7"
+      LIBTORCH_VARIANT: "shared-with-deps"
+    <<: *binary_linux_upload
+
+  binary_linux_libtorch_2.7m_cu100_devtoolset7_shared-without-deps_upload:
+    environment:
+      BUILD_ENVIRONMENT: "libtorch 2.7m cu100 devtoolset7"
+      LIBTORCH_VARIANT: "shared-without-deps"
+    <<: *binary_linux_upload
+
+  binary_linux_libtorch_2.7m_cu100_devtoolset7_static-with-deps_upload:
+    environment:
+      BUILD_ENVIRONMENT: "libtorch 2.7m cu100 devtoolset7"
+      LIBTORCH_VARIANT: "static-with-deps"
+    <<: *binary_linux_upload
+
+  binary_linux_libtorch_2.7m_cu100_devtoolset7_static-without-deps_upload:
     environment:
       BUILD_ENVIRONMENT: "libtorch 2.7m cu100 devtoolset7"
+      LIBTORCH_VARIANT: "static-without-deps"
     <<: *binary_linux_upload
 
   binary_macos_wheel_2.7_cpu_upload:
@@ -2719,11 +2855,11 @@ workflows:
             - setup
       # This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
       # - binary_linux_conda_3.6_cu90_devtoolset7_build
-      - binary_linux_libtorch_2.7m_cpu_devtoolset7_build:
+      - binary_linux_libtorch_2.7m_cpu_devtoolset7_static-without-deps_build:
           requires:
             - setup
       # TODO we should test a libtorch cuda build, but they take too long
-      # - binary_linux_libtorch_2.7m_cu90_devtoolset7_build
+      # - binary_linux_libtorch_2.7m_cu90_devtoolset7_static-without-deps_build
       - binary_macos_wheel_3.6_cpu_build:
           requires:
             - setup
@@ -2944,13 +3080,40 @@ workflows:
       - binary_linux_conda_3.7_cu100_devtoolset7_build:
           requires:
             - setup
-      - binary_linux_libtorch_2.7m_cpu_devtoolset7_build:
+      - binary_linux_libtorch_2.7m_cpu_devtoolset7_shared-with-deps_build:
+          requires:
+            - setup
+      - binary_linux_libtorch_2.7m_cpu_devtoolset7_shared-without-deps_build:
+          requires:
+            - setup
+      - binary_linux_libtorch_2.7m_cpu_devtoolset7_static-with-deps_build:
+          requires:
+            - setup
+      - binary_linux_libtorch_2.7m_cpu_devtoolset7_static-without-deps_build:
+          requires:
+            - setup
+      - binary_linux_libtorch_2.7m_cu92_devtoolset7_shared-with-deps_build:
+          requires:
+            - setup
+      - binary_linux_libtorch_2.7m_cu92_devtoolset7_shared-without-deps_build:
+          requires:
+            - setup
+      - binary_linux_libtorch_2.7m_cu92_devtoolset7_static-with-deps_build:
           requires:
             - setup
-      - binary_linux_libtorch_2.7m_cu92_devtoolset7_build:
+      - binary_linux_libtorch_2.7m_cu92_devtoolset7_static-without-deps_build:
           requires:
             - setup
-      - binary_linux_libtorch_2.7m_cu100_devtoolset7_build:
+      - binary_linux_libtorch_2.7m_cu100_devtoolset7_shared-with-deps_build:
+          requires:
+            - setup
+      - binary_linux_libtorch_2.7m_cu100_devtoolset7_shared-without-deps_build:
+          requires:
+            - setup
+      - binary_linux_libtorch_2.7m_cu100_devtoolset7_static-with-deps_build:
+          requires:
+            - setup
+      - binary_linux_libtorch_2.7m_cu100_devtoolset7_static-without-deps_build:
           requires:
             - setup
       - binary_macos_wheel_2.7_cpu_build:
@@ -3208,21 +3371,66 @@ workflows:
           requires:
             - setup
             - binary_linux_conda_3.7_cu100_devtoolset7_test
-      - binary_linux_libtorch_2.7m_cpu_devtoolset7_upload:
+      - binary_linux_libtorch_2.7m_cpu_devtoolset7_shared-with-deps_upload:
+          context: org-member
+          requires:
+            - setup
+            - binary_linux_libtorch_2.7m_cpu_devtoolset7_shared-with-deps_build
+      - binary_linux_libtorch_2.7m_cpu_devtoolset7_shared-without-deps_upload:
+          context: org-member
+          requires:
+            - setup
+            - binary_linux_libtorch_2.7m_cpu_devtoolset7_shared-without-deps_build
+      - binary_linux_libtorch_2.7m_cpu_devtoolset7_static-with-deps_upload:
+          context: org-member
+          requires:
+            - setup
+            - binary_linux_libtorch_2.7m_cpu_devtoolset7_static-with-deps_build
+      - binary_linux_libtorch_2.7m_cpu_devtoolset7_static-without-deps_upload:
+          context: org-member
+          requires:
+            - setup
+            - binary_linux_libtorch_2.7m_cpu_devtoolset7_static-without-deps_build
+      - binary_linux_libtorch_2.7m_cu92_devtoolset7_shared-with-deps_upload:
+          context: org-member
+          requires:
+            - setup
+            - binary_linux_libtorch_2.7m_cu92_devtoolset7_shared-with-deps_build
+      - binary_linux_libtorch_2.7m_cu92_devtoolset7_shared-without-deps_upload:
+          context: org-member
+          requires:
+            - setup
+            - binary_linux_libtorch_2.7m_cu92_devtoolset7_shared-without-deps_build
+      - binary_linux_libtorch_2.7m_cu92_devtoolset7_static-with-deps_upload:
+          context: org-member
+          requires:
+            - setup
+            - binary_linux_libtorch_2.7m_cu92_devtoolset7_static-with-deps_build
+      - binary_linux_libtorch_2.7m_cu92_devtoolset7_static-without-deps_upload:
+          context: org-member
+          requires:
+            - setup
+            - binary_linux_libtorch_2.7m_cu92_devtoolset7_static-without-deps_build
+      - binary_linux_libtorch_2.7m_cu100_devtoolset7_shared-with-deps_upload:
+          context: org-member
+          requires:
+            - setup
+            - binary_linux_libtorch_2.7m_cu100_devtoolset7_shared-with-deps_build
+      - binary_linux_libtorch_2.7m_cu100_devtoolset7_shared-without-deps_upload:
           context: org-member
           requires:
             - setup
-            - binary_linux_libtorch_2.7m_cpu_devtoolset7_build
-      - binary_linux_libtorch_2.7m_cu92_devtoolset7_upload:
+            - binary_linux_libtorch_2.7m_cu100_devtoolset7_shared-without-deps_build
+      - binary_linux_libtorch_2.7m_cu100_devtoolset7_static-with-deps_upload:
           context: org-member
           requires:
             - setup
-            - binary_linux_libtorch_2.7m_cu92_devtoolset7_build
-      - binary_linux_libtorch_2.7m_cu100_devtoolset7_upload:
+            - binary_linux_libtorch_2.7m_cu100_devtoolset7_static-with-deps_build
+      - binary_linux_libtorch_2.7m_cu100_devtoolset7_static-without-deps_upload:
           context: org-member
           requires:
             - setup
-            - binary_linux_libtorch_2.7m_cu100_devtoolset7_build
+            - binary_linux_libtorch_2.7m_cu100_devtoolset7_static-without-deps_build
       - binary_macos_wheel_2.7_cpu_upload:
           context: org-member
           requires:
