[NVIDIA] Full Family Blackwell Support codegen #145436
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145436
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit c359b91 with merge base 302b07f.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hello, I would like to install a working PyTorch on NVIDIA's Blackwell-architecture Thor platform, but it seems there is no public version yet. Can you please let me know whether this version can run on the Thor platform and how to compile it? When I pull the PyTorch source code from the main branch and compile it on the Thor platform, there are many errors. I don't know how to solve this problem. Can you help me?
We're waiting for the CUDA 12.8 public release.
Thank you very much for your answer. I am not sure when CUDA 12.8 will be released, and I urgently need a usable PyTorch on the Thor platform. Is it feasible to compile the PyTorch source code on the Thor platform?
CUDA 12.7 is B100 support; 12.8 is the full Blackwell family.
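For what it's worth, once any build is in place, a quick sanity check is to compare the GPU's compute capability against the architectures the build was compiled for (a minimal sketch; assumes a CUDA-enabled torch imports and a GPU is visible):

import torch

# Compare the device's compute capability against the arch list this
# PyTorch build was compiled for; a Blackwell part needs a matching
# (or PTX-forward-compatible) entry before kernels will launch.
major, minor = torch.cuda.get_device_capability(0)
print(f"Device compute capability: sm_{major}{minor}")
print(f"Build arch list: {torch.cuda.get_arch_list()}")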
cc @malfet |
@pytorchbot merge |
Merge failed
Reason: This PR needs a release notes: label. If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:. If not, please add the topic: not user facing label.
To add a label, you can comment to pytorchbot, for example: @pytorchbot label "topic: not user facing"
For more information, see the PyTorch AutoLabel Bot wiki page.
Details for Dev Infra team: raised by workflow job.
Labels, as always 😂
Well, Thor will be launched in the coming months, I suppose at GTC 2025.
list(APPEND CUDA_COMMON_GPU_ARCHITECTURES "12.0")
list(APPEND CUDA_COMMON_GPU_ARCHITECTURES "12.0a")
Are you sure 12.6 will support 12.0a, or is 12.8 needed for this one?
Also, building everything with 12.0a might be overkill, as it probably affects only 1-2 operators (flex attention, for example), as we are already doing.
All these new Blackwell features need the new CUDA 12.8.
Whitepaper: https://docs.nvidia.com/cuda/pdf/ptx_isa_8.7.pdf @malfet
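For reference, one way to confirm which sm_* targets the installed toolkit can actually emit (a minimal sketch; assumes nvcc is on PATH and that 12.8's nvcc lists sm_120a):

import subprocess

# nvcc's --list-gpu-code flag enumerates every sm_* code target the
# local CUDA toolkit can emit; sm_120a should appear under CUDA 12.8.
out = subprocess.run(["nvcc", "--list-gpu-code"],
                     capture_output=True, text=True, check=True).stdout
print("sm_120a supported:", "sm_120a" in out)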
Up to you!
Also, when I updated this, the name 12.8 hadn't been released yet, hence the 'greater than 12.6'. I think that's fine to keep as is.
@drisspg will those architectures be selected by default when a user builds with 12.8, or only for a few ops? There's no difference between 12.0 and 12.0a for matmul or anything like that, is there?
So nothing within PyTorch uses the 10.x 'a' features today. We do have one file that builds with sm90a, which is the rowwise kernel.
My understanding is that if you're building from source and don't set the CUDA arch list, we warn that you should, and then try to auto-detect which GPU you are on:
if("X${CUDA_ARCH_LIST}" STREQUAL "X")
Otherwise, yes, we would fall back to the list above, which adds them all.
I think it's fine either way; you can build from source with 10.0a today if you set it manually in the arch list.
The cpp_extensions path, though, needs to be updated, since it errors if the arch isn't in that list.
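A minimal sketch of the manual override described above, assuming you're running from a pytorch source checkout (TORCH_CUDA_ARCH_LIST is the real knob; the specific arch strings shown are examples from this thread):

import os
import subprocess

# Pin the target architectures explicitly so the build doesn't rely on
# auto-detection, then drive an in-place source build.
env = dict(os.environ, TORCH_CUDA_ARCH_LIST="10.0a;12.0a")
subprocess.run(["python", "setup.py", "develop"], env=env, check=True)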
cc @malfet PyTorch is building for my RTX 5090 with CUDA 12.8.
cc @malfet FYI: torch, torchaudio, and torchvision are compiling correctly with CUDA 12.8.
@johnnynunez do you mind running a matmul benchmark for your 5090, if you have the time? 🙏
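For anyone picking this up, a minimal matmul timing sketch along those lines (sizes, dtype, and iteration counts are arbitrary choices, not a standard benchmark):

import torch

# Warm up so cuBLAS selects its kernel, then time a large fp16 matmul
# with CUDA events and convert the mean latency to TFLOP/s.
n = 8192
a = torch.randn(n, n, device="cuda", dtype=torch.float16)
b = torch.randn(n, n, device="cuda", dtype=torch.float16)
for _ in range(10):
    a @ b
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(100):
    a @ b
end.record()
torch.cuda.synchronize()
ms = start.elapsed_time(end) / 100
print(f"{ms:.3f} ms/iter, {2 * n**3 / (ms * 1e-3) / 1e12:.1f} TFLOP/s")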
@pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Yeah! I have COSMOS running on the 5090 right now! It is working. Only FP4 support is missing, because Transformer Engine must be adapted to the new whitepaper.
Hello, you can now run on Jetson Thor. CUDA 12.8 is out.
cc @ptrblck @msaroufim @eqy @Fuzzkatt
More references:
https://github.com/NVIDIA/nccl