Add NVSHMEM to PYTORCH_EXTRA_INSTALL_REQUIREMENTS #154568
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/154568
Note: Links to docs will display an error until the doc builds have been completed.
✅ No Failures as of commit 51f8222 with merge base 241f8dc.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@Skylion007 @atalman do you mind having a look?
```diff
@@ -53,6 +53,7 @@
 "nvidia-cusolver-cu11==11.4.1.48; platform_system == 'Linux' and platform_machine == 'x86_64' | "
 "nvidia-cusparse-cu11==11.7.5.86; platform_system == 'Linux' and platform_machine == 'x86_64' | "
 "nvidia-nccl-cu11==2.21.5; platform_system == 'Linux' and platform_machine == 'x86_64' | "
+"nvidia-nvshmem-cu11==3.2.5; platform_system == 'Linux' and platform_machine == 'x86_64' | "
```
Thanks, removed it from the CUDA 11.8 requirements.
@Skylion007 thanks! Added the rpath.
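For readers following along, a sketch of how an rpath fix like this can be checked on an installed wheel. The library name and the $ORIGIN-relative path are assumptions modeled on how the other NVIDIA wheel dependencies (cublas, cudnn) are located, not taken verbatim from this PR:

```bash
# Locate the installed torch lib directory and print the RPATH/RUNPATH of the
# CUDA library; expect an $ORIGIN-relative entry that can reach
# site-packages/nvidia/nvshmem/lib (hypothetical exact path).
TORCH_LIB="$(python -c 'import torch, os; print(os.path.dirname(torch.__file__))')/lib"
readelf -d "$TORCH_LIB/libtorch_cuda.so" | grep -Ei 'rpath|runpath'
```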
Hi @kwen2501, I believe it needs to be added to the "Bundling with cudnn and cublas" section in .ci/manywheel/build_cuda.sh as well.
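For reference, a hypothetical sketch of what that bundling step could look like. The DEPS_LIST/DEPS_SONAME pattern mirrors how build_cuda.sh handles cudnn and cublas, but the exact library path and soname below are assumptions:

```bash
# Sketch only, not the actual change: append the NVSHMEM host library to the
# set of shared objects copied into the manywheel, next to the existing
# cudnn/cublas entries in .ci/manywheel/build_cuda.sh.
DEPS_LIST+=("/usr/local/lib/libnvshmem_host.so.3")
DEPS_SONAME+=("libnvshmem_host.so.3")
```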
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
NVSHMEM 3.2.5 (released Mar 2025) has both cu11 and cu12 builds. See: https://pypi.nvidia.com/nvidia-nvshmem-cu12/ https://pypi.nvidia.com/nvidia-nvshmem-cu11/
Pull Request resolved: pytorch#154568
Approved by: https://github.com/atalman
ghstack dependencies: pytorch#154538
Hmm, there are often scenarios where people patch and build NVSHMEM themselves. Would PyTorch bringing in a native NVSHMEM dependency break such usage? For example, vLLM has its own recipe for patching and building NVSHMEM.
@youkaichao thanks for raising the concern. Are those patches improvements/extensions to NVSHMEM? If so, would DeepEP be interested in upstreaming them to NVSHMEM? (It would be easier for DeepEP to maintain their codebase too.)
@kwen2501 I think NVSHMEM 3.3 has integrated these patches. I haven't fully understood yet what would happen if multiple versions/instances of NVSHMEM exist in the same program.
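One way to ground that question empirically is to check how many copies of libnvshmem are present on a machine and which one a live process actually mapped. A rough sketch (the PID is a placeholder):

```bash
# Wheel-provided copies installed via pip, if any.
pip list 2>/dev/null | grep -i nvshmem
# Copies registered with the system dynamic loader.
ldconfig -p | grep -i nvshmem
# Which copy a running process actually loaded; replace 12345 with a real PID.
grep -i nvshmem /proc/12345/maps
```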
We have already incorporated the changes done by DeepEP in NVSHMEM 3.3. There was one change about "receive queue support" that was ABI-breaking, but it was recently confirmed that they are not using that feature anymore and that it can be removed (deepseek-ai/DeepEP#147). We do need to create a patch for DeepEP to get rid of those changes and use upstreamed NVSHMEM directly instead. I am working on that. Once that is done, the PyTorch integration and DeepEP usage should be just fine. I will post a link to the PR I open to keep you updated.

Other than this, DeepEP carries their own version of the device-side ibgda_device.cu file, which uses an internal NVSHMEM IBGDA API to be able to do QP selection. In NVSHMEM 3.4, we are working on exposing a standard NVSHMEM API for doing QP selection. DeepEP will then be free to use the exposed API rather than their internal implementation. But this does not impact PyTorch integration.
FYI - Opened Friday: deepseek-ai/DeepEP#295
Stack from ghstack (oldest at bottom):
NVSHMEM 3.2.5 (released Mar 2025) has both cu11 and cu12 builds.
See:
https://pypi.nvidia.com/nvidia-nvshmem-cu12/
https://pypi.nvidia.com/nvidia-nvshmem-cu11/
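A minimal install sketch for trying these wheels directly from NVIDIA's index (the --extra-index-url approach is the standard way to consume them; the version pin matches this PR):

```bash
# Install the CUDA 12 build of NVSHMEM 3.2.5 from NVIDIA's package index.
pip install nvidia-nvshmem-cu12==3.2.5 --extra-index-url https://pypi.nvidia.com
```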