[BE]: Update CU128 cudnn to 9.8.0.87 #148963
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/148963
Note: Links to docs will display an error until the docs builds have been completed.
❌ 5 New Failures, 1 Unrelated Failure
As of commit 8b23833 with merge base f1787ee:
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@tinglvv You opened the most recent PR for updating cuDNN for 12.8; any reason we didn't also update for 12.6? We had a version split previously because of ABI compatibility concerns from the manylinux upgrade, but that shouldn't be an issue anymore.
```diff
@@ -5,7 +5,7 @@ if [[ -n "${CUDNN_VERSION}" ]]; then
     mkdir tmp_cudnn
     pushd tmp_cudnn
     if [[ ${CUDA_VERSION:0:4} == "12.8" ]]; then
-        CUDNN_NAME="cudnn-linux-x86_64-9.7.1.26_cuda12-archive"
+        CUDNN_NAME="cudnn-linux-x86_64-9.8.0.87_cuda12-archive"
     elif [[ ${CUDA_VERSION:0:4} == "12.6" ]]; then
         CUDNN_NAME="cudnn-linux-x86_64-9.5.1.17_cuda12-archive"
```
This should probably be merged with the 12.8 update too; there's no reason to keep 12.6 on an old cuDNN version when newer releases now include a lot of performance fixes that also apply to Hopper.
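For context, a rough sketch of how `CUDNN_NAME` is typically consumed further down in this install script; the download URL and copy destinations below are assumptions for illustration and are not part of this diff:

```bash
# Hypothetical continuation of the install script: fetch the archive named by
# CUDNN_NAME from NVIDIA's redist mirror and copy headers/libs into the CUDA tree.
# The exact URL and destination paths here are assumptions, not taken from this PR.
curl --retry 3 -OLs "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/${CUDNN_NAME}.tar.xz"
tar xf "${CUDNN_NAME}.tar.xz"
cp -a "${CUDNN_NAME}/include/"* /usr/local/cuda/include/
cp -a "${CUDNN_NAME}/lib/"* /usr/local/cuda/lib64/
popd
rm -rf tmp_cudnn
```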
632751a to 8b23833 (Compare)
@jansel Should we update CU126's libraries in this PR or another one?
I would consider a separate PR; the background is that 9.7+ is for Blackwell.
```diff
@@ -76,7 +76,7 @@
     "nvidia-cuda-nvrtc-cu12==12.8.61; platform_system == 'Linux' and platform_machine == 'x86_64' | "
     "nvidia-cuda-runtime-cu12==12.8.57; platform_system == 'Linux' and platform_machine == 'x86_64' | "
     "nvidia-cuda-cupti-cu12==12.8.57; platform_system == 'Linux' and platform_machine == 'x86_64' | "
-    "nvidia-cudnn-cu12==9.7.1.26; platform_system == 'Linux' and platform_machine == 'x86_64' | "
+    "nvidia-cudnn-cu12==9.8.0.87; platform_system == 'Linux' and platform_machine == 'x86_64' | "
```
Changing this may need a synchronization point: @atalman usually helps us by uploading the 9.8.0.87 nvidia-cudnn-cu12 wheel first. Or has this already been done?
https://pypi.org/project/nvidia-cudnn-cu12/ looks updated with 9.8.0.87, so I think we are good on that front.
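As an aside, a quick way to confirm what PyPI is serving is pip's `index versions` subcommand (still marked experimental in pip); a sketch:

```bash
# List the versions of nvidia-cudnn-cu12 that PyPI currently serves;
# 9.8.0.87 should appear in the output if the upload went through.
python -m pip index versions nvidia-cudnn-cu12
```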
We need to upload it to our S3 bucket, unfortunately.
@tinglvv for security reasons, all dependencies of torch need to live on https://download.pytorch.org/
Thanks for the explanation! Yes, then indeed we need 9.8.0.87 in https://download.pytorch.org/whl/nightly/nvidia-cudnn-cu12/
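Once the wheel is mirrored, something along these lines (a sketch, using the index URL referenced above) should resolve 9.8.0.87 from the PyTorch nightly index rather than PyPI:

```bash
# Try to fetch only the cuDNN wheel (no dependencies) from the PyTorch nightly index;
# a 403/404 or "No matching distribution" error means the mirror is not ready yet.
pip download nvidia-cudnn-cu12==9.8.0.87 --no-deps \
    --index-url https://download.pytorch.org/whl/nightly/ \
    -d /tmp/cudnn-wheel-check
```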
LGTM. Just had a question on uploading the PyPI cuDNN wheel to AWS S3.
LGTM, if the ciflow/binaries builds pass then we are good to merge.
Smaller PRs would be easier.
Thanks for uploading the binaries, @atalman, but it seems like the S3 bucket is returning a 403 error on the wheels.
@pytorchbot merge -i
LGTM. Thank you @Skylion007
Merge started. Your change will be merged while ignoring the following 6 checks: pull / linux-focal-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge), macos-arm64-binary-wheel / wheel-py3_10-cpu-build, macos-arm64-binary-wheel / wheel-py3_11-cpu-build, macos-arm64-binary-wheel / wheel-py3_13-cpu-build, macos-arm64-binary-wheel / wheel-py3_12-cpu-build, macos-arm64-binary-wheel / wheel-py3_9-cpu-build. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Also, CU126 is on an old cuDNN version; we may want to upgrade it for all the performance reasons, as I don't see a manylinux reason to stay back on the old 9.5 release. I might split that into its own PR. This one just updates CU128 to the latest and greatest.
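For anyone double-checking a resulting build, a simple sanity check is to print the cuDNN version the installed torch actually loads; a sketch, assuming cuDNN 9.x's usual MAJOR*10000 + MINOR*100 + PATCH encoding:

```bash
# Print the cuDNN version that the installed torch build loaded at runtime.
# Under the assumed encoding, a 9.8.0 build should report 90800
# (a 9.5.1 build would report 90501).
python -c "import torch; print(torch.backends.cudnn.version())"
```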