[NVIDIA] Full Family Blackwell Support codegen by johnnynunez · Pull Request #145436 · pytorch/pytorch · GitHub
[NVIDIA] Full Family Blackwell Support codegen #145436

Closed
wants to merge 11 commits into from

Conversation

johnnynunez
Contributor

pytorch-bot bot commented Jan 23, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145436

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c359b91 with merge base 302b07f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@wangxianggang1997

Hello, I would like to install a working PyTorch on the Thor platform, which uses NVIDIA's Blackwell architecture, but it seems there is no public build yet. Can you please let me know whether this version can run on Thor, and how to compile it? When I pull the PyTorch source code from the main branch and compile it on Thor, I get many errors, and I don't know how to solve them. Can you help me?

@johnnynunez
Contributor Author

We are waiting for CUDA 12.8 to be publicly released.

@wangxianggang1997

Thank you very much for your answer. I am not sure when CUDA 12.8 will be released, and I urgently need a working PyTorch on the Thor platform. Is it feasible to compile PyTorch from source on Thor?

@johnnynunez
Contributor Author

CUDA 12.7 covers B100 support; CUDA 12.8 covers the whole Blackwell family.
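The version rule stated in this reply can be expressed as a small pre-build check. This is a minimal sketch, not PyTorch's build logic: the `MIN_CUDA_FOR_ARCH` table and the `unsupported_archs` helper are hypothetical. Only the "Blackwell needs CUDA 12.8" rule comes from this thread; the Hopper rows and the 10.1 (Jetson Thor) numbering are assumptions based on NVIDIA's public release notes and should be checked there.

```python
"""Sketch: flag CUDA arch targets that a given toolkit version cannot build."""

# Hypothetical table: minimum public CUDA toolkit (major, minor) per arch entry.
# Blackwell rows follow this thread ("all these new Blackwell need CUDA 12.8");
# the Hopper rows and the 10.1 (Jetson Thor) row are assumptions, not PyTorch data.
MIN_CUDA_FOR_ARCH = {
    "9.0": (11, 8),    # Hopper
    "9.0a": (12, 0),   # Hopper, arch-specific features
    "10.0": (12, 8),   # Blackwell datacenter (B100/B200)
    "10.0a": (12, 8),
    "10.1": (12, 8),   # Jetson Thor, as numbered in CUDA 12.8 (assumption)
    "10.1a": (12, 8),
    "12.0": (12, 8),   # Blackwell consumer (RTX 50 series)
    "12.0a": (12, 8),
}


def parse_version(text: str) -> tuple[int, int]:
    """Turn "12.8" or "12.8.61" into (12, 8)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def unsupported_archs(cuda_version: str, archs: list[str]) -> list[str]:
    """Return requested archs the toolkit cannot target; unknown archs are flagged too."""
    have = parse_version(cuda_version)
    return [
        arch
        for arch in archs
        if arch not in MIN_CUDA_FOR_ARCH or have < MIN_CUDA_FOR_ARCH[arch]
    ]


if __name__ == "__main__":
    print(unsupported_archs("12.6", ["9.0", "10.0", "12.0a"]))
    print(unsupported_archs("12.8", ["9.0a", "10.1", "12.0"]))
```

With CUDA 12.6 the check would reject the Blackwell entries and keep Hopper, which matches the "wait for 12.8" advice above.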

@johnnynunez
Contributor Author

cc @malfet

@ezyang
Contributor
ezyang commented Jan 23, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 23, 2025
@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


@johnnynunez
Contributor Author

@pytorchbot merge

Labels as always😂

@pytorch-bot pytorch-bot bot temporarily deployed to upload-benchmark-results January 23, 2025 15:53 Inactive
@johnnynunez
Contributor Author

We are waiting for CUDA 12.8 to be publicly released.

Thank you very much for your answer. I am not sure when CUDA 12.8 will be released, and I urgently need a working PyTorch on the Thor platform. Is it feasible to compile PyTorch from source on Thor?

Well, Thor will launch in the coming months; I suppose at GTC 2025.

@pytorch-bot pytorch-bot bot temporarily deployed to upload-benchmark-results January 23, 2025 22:20 Inactive
list(APPEND CUDA_COMMON_GPU_ARCHITECTURES "12.0")
list(APPEND CUDA_COMMON_GPU_ARCHITECTURES "12.0a")
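These two lines add plain and architecture-specific ("a") Blackwell consumer targets to the CMake list. As a rough illustration of what such entries mean for the compiler, the sketch below turns arch strings into nvcc `-gencode` flags. The helper name is hypothetical and this is not PyTorch's CMake code; the `+PTX` suffix mirrors the `TORCH_CUDA_ARCH_LIST` convention, and the flag spelling follows nvcc's `arch=compute_XX,code=sm_XX` form. An "a" target such as sm_120a uses features that run only on exactly that architecture, which is why compiling every file that way may be more than is needed.

```python
"""Sketch: map arch-list entries such as "12.0", "12.0a" or "9.0+PTX" to nvcc flags."""


def gencode_flags(arch: str) -> list[str]:
    """Return nvcc -gencode flags for one arch entry (hypothetical helper)."""
    emit_ptx = arch.endswith("+PTX")
    if emit_ptx:
        arch = arch[: -len("+PTX")]
    suffix = "a" if arch.endswith("a") else ""  # arch-specific variant, e.g. sm_120a
    major, minor = arch.rstrip("a").split(".")
    num = f"{major}{minor}{suffix}"  # "12.0a" -> "120a"
    # Machine code (SASS) for exactly this GPU generation.
    flags = [f"-gencode=arch=compute_{num},code=sm_{num}"]
    if emit_ptx:
        # Also embed PTX so newer GPUs can JIT-compile the kernel.
        flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags


if __name__ == "__main__":
    for entry in ["12.0", "12.0a", "9.0+PTX"]:
        print(entry, gencode_flags(entry))
```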
Contributor

Are you sure CUDA 12.6 supports 12.0a, or is 12.8 needed for this one?
Also, building everything with 12.0a might be overkill, as it probably affects only 1-2 operators (flex attention, for example), as we are already doing
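The alternative raised here, enabling "a" targets only for the few sources that need them, can be sketched as per-file arch selection. The source file names and the helper are hypothetical (the thread only says that one rowwise kernel file currently builds with sm_90a); the rule that "a" variants exist only from 9.0 onward reflects the sm_90a, sm_100a and sm_120a targets we are aware of.

```python
"""Sketch: give arch-specific ("a") targets only to the sources that need them."""

BASE_ARCHS = ["8.0", "9.0", "10.0", "12.0"]

# Hypothetical file name: the only source that uses arch-specific instructions.
ARCH_SPECIFIC_SOURCES = {"rowwise_kernel.cu"}


def archs_for_source(filename: str, base_archs: list[str], specific: set[str]) -> list[str]:
    """Return the arch list one source file should be compiled for."""
    if filename not in specific:
        return list(base_archs)  # most files: portable targets only
    # "a" variants exist only for 9.0 and newer, so older entries stay unchanged.
    return [a + "a" if int(a.split(".")[0]) >= 9 else a for a in base_archs]


if __name__ == "__main__":
    for src in ["gemm.cu", "rowwise_kernel.cu"]:
        print(src, archs_for_source(src, BASE_ARCHS, ARCH_SPECIFIC_SOURCES))
```

Compared with appending "12.0a" to the global list, this keeps the extra compile work limited to the files that actually use the arch-specific features.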

Contributor Author
johnnynunez commented Jan 23, 2025

All of these new Blackwell targets need the new CUDA 12.8.
Reference (PTX ISA 8.7): https://docs.nvidia.com/cuda/pdf/ptx_isa_8.7.pdf @malfet

Contributor Author

Are you sure CUDA 12.6 supports 12.0a, or is 12.8 needed for this one? Also, building everything with 12.0a might be overkill, as it probably affects only 1-2 operators (flex attention, for example), as we are already doing

Up to you!

Contributor

Also, when I updated this, CUDA 12.8 hadn't been announced by name yet, hence the "greater than 12.6" check. I think that's fine to keep as is.

Contributor

@drisspg will those architectures be selected by default when a user builds with 12.8, or only for a few ops? There is no difference between 12.0 and 12.0a for matmul or anything like that, is there?

Contributor

So nothing within PyTorch uses the sm_100a-style ("a"-suffixed) features today. We do have one file that builds with sm_90a, which is the rowwise kernel.

So if you build from source, my understanding is that if you don't set the CUDA arch list, we warn that you should. We then try auto-detection:

if("X${CUDA_ARCH_LIST}" STREQUAL "X" )

to detect which GPU you are on.

Otherwise, yeah, we would fall back to the list above, which adds them all.

I think it's fine either way; you can already build from source with 10.0a today if you set it manually in the arch list.

cpp_extension, though, needs to be updated, since it errors if your arch isn't in that list.
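The selection order described in this comment (use an explicit arch list if one is given, otherwise warn and auto-detect the local GPU, otherwise fall back to every common arch), plus the cpp_extension-style rejection of unknown archs, can be sketched as follows. Everything here is a hypothetical stand-in: `COMMON_ARCHS` plays the role of CUDA_COMMON_GPU_ARCHITECTURES, `detected` would come from something like `torch.cuda.get_device_capability()`, and the error text only imitates the kind of error mentioned above. `TORCH_CUDA_ARCH_LIST` is the environment variable PyTorch uses for source and extension builds.

```python
"""Sketch: resolve which CUDA archs to build for, then validate them."""

import os
import warnings

# Hypothetical stand-in for the CMake CUDA_COMMON_GPU_ARCHITECTURES list.
COMMON_ARCHS = ["8.0", "8.6", "9.0", "10.0", "12.0"]
# Archs this hypothetical extension builder accepts ("a" variants included).
SUPPORTED = set(COMMON_ARCHS) | {"9.0a", "10.0a", "12.0a"}


def resolve_arch_list(env=None, detected=None):
    """Pick archs from TORCH_CUDA_ARCH_LIST, else detected GPUs, else all common archs."""
    env = os.environ if env is None else env
    raw = env.get("TORCH_CUDA_ARCH_LIST", "").strip()
    if raw:
        # Entries may be separated by semicolons or spaces, e.g. "9.0;12.0a".
        archs = [a for a in raw.replace(" ", ";").split(";") if a]
    else:
        warnings.warn("TORCH_CUDA_ARCH_LIST is not set; consider setting it explicitly")
        if detected:
            # detected: (major, minor) pairs; dict.fromkeys drops duplicates, keeps order.
            archs = list(dict.fromkeys(f"{major}.{minor}" for major, minor in detected))
        else:
            archs = list(COMMON_ARCHS)  # fallback: build for every common arch
    for arch in archs:
        if arch.removesuffix("+PTX") not in SUPPORTED:
            raise ValueError(f"Unknown CUDA arch ({arch}) or GPU not supported")
    return archs
```

Setting the list manually (for example `TORCH_CUDA_ARCH_LIST="10.0a"`) skips both detection and the fallback, which is the manual route the comment describes.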

@johnnynunez
Contributor Author

cc @malfet PyTorch builds for my RTX 5090 with CUDA 12.8.

@johnnynunez
Contributor Author
johnnynunez commented Jan 23, 2025

cc @malfet FYI: torch, torchaudio and torchvision compile correctly with CUDA 12.8.

@johnnynunez johnnynunez requested review from malfet and drisspg January 23, 2025 23:15
@drisspg drisspg added module: cuda Related to torch.cuda, and CUDA support in general release notes: build release notes category labels Jan 24, 2025
@drisspg drisspg requested review from eqy, syed-ahmed and ptrblck January 24, 2025 00:22
@wangxianggang1997

cc FYI: torch, torchaudio and torchvision compile correctly with CUDA 12.8

May I ask how you successfully compiled PyTorch for CUDA 12.8? Do you have a link to the source code and a description of the process? Looking forward to your answer.

@main-horse
Contributor

@johnnynunez do you mind running a matmul benchmark for your 5090, if you have the time? 🙏
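A matmul benchmark like the one requested usually reports achieved throughput. For an (M×K)·(K×N) product, each of the M·N outputs needs K multiply-adds, so the work is 2·M·N·K floating-point operations. The sketch below does only that arithmetic; the timings would come from the actual GPU run (in PyTorch, for example, timed with CUDA events after synchronizing), and the function names are hypothetical. Taking the median of several runs is our own choice to damp warm-up outliers.

```python
"""Sketch: convert matmul timings into achieved TFLOP/s."""

from statistics import median


def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs of one (m x k) @ (k x n) matmul: one multiply and one add per term."""
    return 2 * m * n * k


def achieved_tflops(m: int, n: int, k: int, seconds_per_run: list[float]) -> float:
    """Use the median run time so a slow warm-up iteration does not skew the result."""
    return matmul_flops(m, n, k) / median(seconds_per_run) / 1e12


if __name__ == "__main__":
    # Illustrative timings only, not measurements from any GPU.
    print(f"{achieved_tflops(8192, 8192, 8192, [0.012, 0.010, 0.010]):.1f} TFLOP/s")
```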

@ezyang
Contributor
ezyang commented Jan 24, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

@johnnynunez
Contributor Author

@johnnynunez do you mind running a matmul benchmark for your 5090, if you have the time? 🙏

Yeah! I'm running COSMOS on the 5090 right now, and it works. The only thing missing is FP4 support, because Transformer Engine must be adapted to the new whitepaper.

@johnnynunez
Contributor Author
johnnynunez commented Jan 24, 2025

We are waiting for CUDA 12.8 to be publicly released.

Thank you very much for your answer. I am not sure when CUDA 12.8 will be released, and I urgently need a working PyTorch on the Thor platform. Is it feasible to compile PyTorch from source on Thor?

Hello, you can run it on Jetson Thor now. CUDA 12.8 is out.

Labels
ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
module: cuda (Related to torch.cuda, and CUDA support in general)
open source
release notes: build (release notes category)
8 participants