-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[NOT FOR LANDING] experimental NVSHMEM integration #146593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: gh/yifuwang/195/base
Are you sure you want to change the base?
Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/146593
Note: Links to docs will display an error until the docs builds have been completed. ❌ 7 New Failures, 2 Unrelated FailuresAs of commit bcd4a36 with merge base 07b9fe0 ( NEW FAILURES - The following jobs have failed:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o [ghstack-poisoned]
cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o [ghstack-poisoned]
| set_target_properties(nvshmem_extension PROPERTIES CUDA_SEPARABLE_COMPILATION ON) | ||
| target_compile_options(nvshmem_extension PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:-rdc=true>) | ||
| target_link_libraries(nvshmem_extension PRIVATE | ||
| ${NVSHMEM_LIB_DIR}/libnvshmem.a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically we dynamically link libnvshmem_host.so and statically link libnvshmem_device.a.
You also don't need to link the extension to nvshmem_bootstrap_uid.so. It will by dynamically opened by NVSHMEM.
Alternatively, if you are only using host APIs, you can forego linking to libnvshmem_device.a and dynamically load libnvshmem_host.so. Which would mean you wouldn't actually have any build-time NVSHMEM dependencies from your module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, if you are only using host APIs, you can forego linking to libnvshmem_device.a and dynamically load libnvshmem_host.so
@seth-howell I tried this but it didn't work. I might've done something wrong, but I got a dynamic linker error complaining about some missing symboI (I forgot the name). I tried building with the host compiler, including only the host header and only calling nvshmem_init but they didn't help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I probably should have provided a little more context. If you are dynamically loading the library you will want to use the nvshmemx_hostlib_init_attr and nvshmemx_hostlib_finalize APIs instead. https://docs.nvidia.com/nvshmem/api/gen/api/setup.html#nvshmemx-hostlib-init-attr
| target_compile_definitions(torch_cuda PRIVATE USE_NCCL) | ||
| endif() | ||
|
|
||
| # Use env var for these for now for prototyping purposes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, Outside of the prototyping phase, NVSHMEM does support cmake's find_package command.
ghstack-source-id: dc491e4 Pull Request resolved: pytorch#146593
cc H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o [ghstack-poisoned]
Adding NVSHMEM as a backend for `SymmetricMemory`, implementation of which is in `NVSHMEMSymmetricMemory.cu`. Moving some helper functions in `CUDASymmetricMemory.cu` to `CUDASymmetricMemoryUtils.cpp`, so that they can be shared by `NVSHMEMSymmetricMemory`. These functions are mostly side-band exchange helpers (`store_all_gather`, `IpcChannel`, etc). Adding `TORCH_SYMMEM` to control which implementation to use for CUDA tensors, currently support: `CUDA` (in-house impl), `NVSHMEM`. The NVSHMEM feature is gated by build-time flag: `USE_NVSHMEM=1`. And `NVSHMEM_HOME` setting is required (TODO). Ported most code from #146593. cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k [ghstack-poisoned]
Adding NVSHMEM as a backend for `SymmetricMemory`, implementation of which is in `NVSHMEMSymmetricMemory.cu`. Moving some helper functions in `CUDASymmetricMemory.cu` to `CUDASymmetricMemoryUtils.cpp`, so that they can be shared by `NVSHMEMSymmetricMemory`. These functions are mostly side-band exchange helpers (`store_all_gather`, `IpcChannel`, etc). Adding `TORCH_SYMMEM` to control which implementation to use for CUDA tensors, currently support: `CUDA` (in-house impl), `NVSHMEM`. The NVSHMEM feature is gated by build-time flag: `USE_NVSHMEM=1`. And `NVSHMEM_HOME` setting is required (TODO). Ported most code from #146593. cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k [ghstack-poisoned]
Adding NVSHMEM as a backend for `SymmetricMemory`, implementation of which is in `NVSHMEMSymmetricMemory.cu`. Moving some helper functions in `CUDASymmetricMemory.cu` to `CUDASymmetricMemoryUtils.cpp`, so that they can be shared by `NVSHMEMSymmetricMemory`. These functions are mostly side-band exchange helpers (`store_all_gather`, `IpcChannel`, etc). Adding `TORCH_SYMMEM` to control which implementation to use for CUDA tensors, currently support: `CUDA` (in-house impl), `NVSHMEM`. The NVSHMEM feature is gated by build-time flag: `USE_NVSHMEM=1`. And `NVSHMEM_HOME` setting is required (TODO). Ported most code from #146593. cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k [ghstack-poisoned]
Adding NVSHMEM as a backend for `SymmetricMemory`, implementation of which is in `NVSHMEMSymmetricMemory.cu`. Moving some helper functions in `CUDASymmetricMemory.cu` to `CUDASymmetricMemoryUtils.cpp`, so that they can be shared by `NVSHMEMSymmetricMemory`. These functions are mostly side-band exchange helpers (`store_all_gather`, `IpcChannel`, etc). Adding `TORCH_SYMMEM` to control which implementation to use for CUDA tensors, currently support: `CUDA` (in-house impl), `NVSHMEM`. The NVSHMEM feature is gated by build-time flag: `USE_NVSHMEM=1`. And `NVSHMEM_HOME` setting is required (TODO). Ported most code from #146593. cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k [ghstack-poisoned]
Adding NVSHMEM as a backend for `SymmetricMemory`, implementation of which is in `NVSHMEMSymmetricMemory.cu`. Moving some helper functions in `CUDASymmetricMemory.cu` to `CUDASymmetricMemoryUtils.cpp`, so that they can be shared by `NVSHMEMSymmetricMemory`. These functions are mostly side-band exchange helpers (`store_all_gather`, `IpcChannel`, etc). Adding `TORCH_SYMMEM` to control which implementation to use for CUDA tensors, currently support: `CUDA` (in-house impl), `NVSHMEM`. The NVSHMEM feature is gated by build-time flag: `USE_NVSHMEM=1`. And `NVSHMEM_HOME` setting is required (TODO). Ported most code from #146593. Pull Request resolved: #151261 Approved by: https://github.com/fegin, https://github.com/fduwjj
Stack from ghstack (oldest at bottom):
cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o