[BE]: Update CU128 cudnn to 9.8.0.87 by Skylion007 · Pull Request #148963 · pytorch/pytorch

[BE]: Update CU128 cudnn to 9.8.0.87 #148963


Closed

Conversation

Skylion007 (Collaborator) commented Mar 11, 2025

Also, cu12.6 is on an old cuDNN version; we may want to upgrade it too for all the performance reasons, as I don't see a manylinux reason to stay back on the old 9.5 release. I might split that into its own PR. This one just updates CU128 to the latest and greatest.
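For anyone verifying the bump locally, a quick sanity check (a minimal sketch, assuming a torch build with this cuDNN is installed in the current environment) is to print the versions the runtime actually reports:

# Print the CUDA and cuDNN versions the installed torch build reports;
# cuDNN 9.8.0 should show up as the integer 90800.
python -c "import torch; print(torch.version.cuda, torch.backends.cudnn.version())"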

@Skylion007 Skylion007 requested review from jansel, eqy, malfet and nWEIdia March 11, 2025 14:44
pytorch-bot bot commented Mar 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/148963

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 1 Unrelated Failure

As of commit 8b23833 with merge base f1787ee:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Mar 11, 2025
@Skylion007 Skylion007 requested a review from tinglvv March 11, 2025 14:46
Skylion007 (Collaborator, Author):

@tinglvv You opened the most recent PR updating CUDNN for 12.8; any reason we didn't also update 12.6? We had a version split previously due to ABI compatibility concerns from the manylinux upgrade, but that shouldn't be an issue anymore.

@@ -5,7 +5,7 @@ if [[ -n "${CUDNN_VERSION}" ]]; then
 mkdir tmp_cudnn
 pushd tmp_cudnn
 if [[ ${CUDA_VERSION:0:4} == "12.8" ]]; then
-CUDNN_NAME="cudnn-linux-x86_64-9.7.1.26_cuda12-archive"
+CUDNN_NAME="cudnn-linux-x86_64-9.8.0.87_cuda12-archive"
 elif [[ ${CUDA_VERSION:0:4} == "12.6" ]]; then
 CUDNN_NAME="cudnn-linux-x86_64-9.5.1.17_cuda12-archive"
Skylion007 (Collaborator, Author) commented on this diff:

This should probably be merged with 12.8 too; no reason to keep 12.6 on an old cuDNN version when there are a lot of performance fixes that apply to Hopper in newer releases now.
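For context on what that archive name feeds into, the install step typically looks roughly like the sketch below; the download URL layout and target paths follow NVIDIA's redist conventions and are assumptions here, not the exact script in this repo:

# Rough sketch of a cuDNN archive install keyed off CUDNN_NAME (URL and paths are assumptions).
CUDNN_NAME="cudnn-linux-x86_64-9.8.0.87_cuda12-archive"
curl -fsSLO "https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/${CUDNN_NAME}.tar.xz"
tar -xf "${CUDNN_NAME}.tar.xz"
cp -a "${CUDNN_NAME}"/include/* /usr/local/cuda/include/
cp -a "${CUDNN_NAME}"/lib/* /usr/local/cuda/lib64/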

@Skylion007 Skylion007 added the better-engineering Relatively self-contained tasks for better engineering contributors label Mar 11, 2025
@Skylion007 Skylion007 force-pushed the skylion007/update-cudnn-9-8-0-87 branch from 632751a to 8b23833 Compare March 11, 2025 15:10
@Skylion007 Skylion007 marked this pull request as ready for review March 11, 2025 15:45
@Skylion007 Skylion007 requested review from a team and jeffdaily as code owners March 11, 2025 15:45
Skylion007 (Collaborator, Author):

@jansel Should we update CU126's libraries in this PR or another one?

eqy (Collaborator) commented Mar 11, 2025

I would consider a separate PR; background is that 9.7+ is for Blackwell.
In the past we have not been super active in bumping cuDNN versions for older CUDA toolkit versions.

@@ -76,7 +76,7 @@
 "nvidia-cuda-nvrtc-cu12==12.8.61; platform_system == 'Linux' and platform_machine == 'x86_64' | "
 "nvidia-cuda-runtime-cu12==12.8.57; platform_system == 'Linux' and platform_machine == 'x86_64' | "
 "nvidia-cuda-cupti-cu12==12.8.57; platform_system == 'Linux' and platform_machine == 'x86_64' | "
-"nvidia-cudnn-cu12==9.7.1.26; platform_system == 'Linux' and platform_machine == 'x86_64' | "
+"nvidia-cudnn-cu12==9.8.0.87; platform_system == 'Linux' and platform_machine == 'x86_64' | "
Collaborator:

Changing this may need a synchronization point: @atalman usually helps us by uploading the 9.8.0.87 nvidia-cudnn-cu12 wheel first. Or has this already been done?

Collaborator:

https://pypi.org/project/nvidia-cudnn-cu12/ looks updated with 9.8.0.87, so I think we are good on that front.
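If anyone wants to double-check from the command line, pip can list what PyPI currently serves (the `pip index` subcommand exists but is still marked experimental):

# List the nvidia-cudnn-cu12 versions visible on the configured index; 9.8.0.87 should be among them.
pip index versions nvidia-cudnn-cu12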

Skylion007 (Collaborator, Author):

We need to upload it to our S3 bucket, unfortunately.

Collaborator:

@tinglvv for security reasons, all dependencies of torch need to live on https://download.pytorch.org/

Collaborator:

Thanks for the explanation! Yes, then indeed we need 9.8.0.87 in https://download.pytorch.org/whl/nightly/nvidia-cudnn-cu12/
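Once the wheel is mirrored there, both the index listing and the actual pin can be exercised directly; a hedged check along these lines (the index URL is inferred from the project page linked above):

# Confirm a 9.8.0.87 wheel is listed on the PyTorch nightly index (plain PEP 503 HTML page).
curl -s https://download.pytorch.org/whl/nightly/nvidia-cudnn-cu12/ | grep -o 'nvidia_cudnn_cu12-9\.8\.0\.87[^"]*\.whl' | sort -u
# Then install the exact pin from that index, mirroring what the requirement string above declares.
pip install "nvidia-cudnn-cu12==9.8.0.87" --index-url https://download.pytorch.org/whl/nightly/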

nWEIdia (Collaborator) left a review:

LGTM. Just had a question on uploading the PyPI cuDNN wheel to AWS S3.

@tinglvv tinglvv added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Mar 11, 2025
tinglvv (Collaborator) commented Mar 11, 2025

LGTM; if the ciflow/binaries jobs pass then we are good to merge.

@Skylion007 Skylion007 requested a review from atalman March 11, 2025 17:20
jansel (Contributor) commented Mar 11, 2025

> @jansel Should we update CU126's libraries in this PR or another one?

Smaller PRs would be easier.

Skylion007 (Collaborator, Author):

Thanks for uploading the binaries @atalman but it seems like the S3 bucket is returning a 403 error on the wheels.
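For reference, that kind of failure is easy to confirm with a HEAD request against the wheel URL; the filename below is illustrative, not the exact object key:

# A HEAD request surfaces the 403 directly if the S3 object permissions are wrong (filename is hypothetical).
curl -sI "https://download.pytorch.org/whl/nightly/nvidia-cudnn-cu12/nvidia_cudnn_cu12-9.8.0.87-py3-none-manylinux_2_27_x86_64.whl" | head -n 1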

Skylion007 (Collaborator, Author):

@pytorchbot merge -i

atalman (Contributor) left a review:

lgtm. Thank you @Skylion007

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 13, 2025
pytorchmergebot (Collaborator):

Merge started

Your change will be merged while ignoring the following 6 checks:
pull / linux-focal-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge)
macos-arm64-binary-wheel / wheel-py3_10-cpu-build
macos-arm64-binary-wheel / wheel-py3_11-cpu-build
macos-arm64-binary-wheel / wheel-py3_13-cpu-build
macos-arm64-binary-wheel / wheel-py3_12-cpu-build
macos-arm64-binary-wheel / wheel-py3_9-cpu-build

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Labels
better-engineering - Relatively self-contained tasks for better engineering contributors
ciflow/binaries - Trigger all binary build and upload jobs on the PR
ciflow/trunk - Trigger trunk jobs on your pull request
Merged
open source
topic: not user facing - topic category
8 participants