[NVIDIA] Full Family Blackwell Support codegen #145436
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145436
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit c359b91 with merge base 302b07f.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hello, I would like to install a working PyTorch on NVIDIA's Blackwell-architecture Thor platform, but it seems there is no public version yet. Can you please let me know whether this version can run on the Thor platform and how to compile it? When I pull the PyTorch source code from the main branch and compile it on the Thor platform, there are many errors. I don't know how to solve this problem. Can you help me?
We're waiting for the CUDA 12.8 public release.
Thank you very much for your answer. I am not sure when CUDA 12.8 will be released, and I urgently need a usable PyTorch on the Thor platform. Is it feasible to compile the PyTorch source code on the Thor platform?
CUDA 12.7 is B100 support; 12.8 is the full Blackwell family.
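For what it's worth, once any build is in place, a quick sanity check is to compare the GPU's compute capability against the architectures the build was compiled for (a minimal sketch; assumes a CUDA-enabled torch imports and a GPU is visible):

import torch

# Compare the device's compute capability against the arch list this
# PyTorch build was compiled for; a Blackwell part needs a matching
# (or PTX-forward-compatible) entry before kernels will launch.
major, minor = torch.cuda.get_device_capability(0)
print(f"Device compute capability: sm_{major}{minor}")
print(f"Build arch list: {torch.cuda.get_arch_list()}")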
cc @malfet |
@pytorchbot merge |
Merge failed
Reason: This PR needs a release notes: label. If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:. If not, please add the topic: not user facing label.
To add a label, you can comment to pytorchbot, for example: @pytorchbot label "topic: not user facing"
For more information, see the PyTorch AutoLabel Bot wiki page.
Details for Dev Infra team: raised by workflow job.
Labels, as always 😂
Well, Thor will be launched in the coming months, I suppose at GTC 2025.
list(APPEND CUDA_COMMON_GPU_ARCHITECTURES "12.0")
list(APPEND CUDA_COMMON_GPU_ARCHITECTURES "12.0a")
Are you sure 12.6 will support 12.0a, or is 12.8 needed for this one?
Also, building everything with 12.0a might be overkill, as it probably affects only 1-2 operators (flex attention, for example), as we are already doing.
All these new Blackwell features need the new CUDA 12.8.
Whitepaper: https://docs.nvidia.com/cuda/pdf/ptx_isa_8.7.pdf @malfet
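For reference, one way to confirm which sm_* targets the installed toolkit can actually emit (a minimal sketch; assumes nvcc is on PATH and that 12.8's nvcc lists sm_120a):

import subprocess

# nvcc's --list-gpu-code flag enumerates every sm_* code target the
# local CUDA toolkit can emit; sm_120a should appear under CUDA 12.8.
out = subprocess.run(["nvcc", "--list-gpu-code"],
                     capture_output=True, text=True, check=True).stdout
print("sm_120a supported:", "sm_120a" in out)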
Up to you!
Also, when I updated this, the name 12.8 hadn't been released yet, hence the 'greater than 12.6'. I think that's fine to keep as is.
@drisspg will those architectures be selected by default when a user builds with 12.8, or only for a few ops? There's no difference between 12.0 and 12.0a for matmul or anything like that, is there?
So nothing within PyTorch uses the 10.x 'a' features today. We do have one file that builds with sm90a, which is the rowwise kernel.
My understanding is that if you're building from source and don't set the CUDA arch list, we warn that you should, and then try to auto-detect which GPU you are on:
if("X${CUDA_ARCH_LIST}" STREQUAL "X")
Otherwise, yes, we would fall back to the list above, which adds them all.
I think it's fine either way; you can build from source with 10.0a today if you set it manually in the arch list.
The cpp_extensions path, though, needs to be updated, since it errors if the arch isn't in that list.
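A minimal sketch of the manual override described above, assuming you're running from a pytorch source checkout (TORCH_CUDA_ARCH_LIST is the real knob; the specific arch strings shown are examples from this thread):

import os
import subprocess

# Pin the target architectures explicitly so the build doesn't rely on
# auto-detection, then drive an in-place source build.
env = dict(os.environ, TORCH_CUDA_ARCH_LIST="10.0a;12.0a")
subprocess.run(["python", "setup.py", "develop"], env=env, check=True)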
cc @malfet PyTorch is building for my RTX 5090 with CUDA 12.8.
cc @malfet FYI: torch, torchaudio, and torchvision are compiling correctly with CUDA 12.8.
@johnnynunez do you mind running a matmul benchmark for your 5090, if you have the time? 🙏
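For anyone picking this up, a minimal matmul timing sketch along those lines (sizes, dtype, and iteration counts are arbitrary choices, not a standard benchmark):

import torch

# Warm up so cuBLAS selects its kernel, then time a large fp16 matmul
# with CUDA events and convert the mean latency to TFLOP/s.
n = 8192
a = torch.randn(n, n, device="cuda", dtype=torch.float16)
b = torch.randn(n, n, device="cuda", dtype=torch.float16)
for _ in range(10):
    a @ b
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(100):
    a @ b
end.record()
torch.cuda.synchronize()
ms = start.elapsed_time(end) / 100
print(f"{ms:.3f} ms/iter, {2 * n**3 / (ms * 1e-3) / 1e12:.1f} TFLOP/s")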
@pytorchbot merge |
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Yeah! I have COSMOS running on the 5090 right now! It is working. Only FP4 support is missing, because Transformer Engine must be adapted to the new whitepaper.
Hello, you can now run on Jetson Thor. CUDA 12.8 is out.
cc @ptrblck @msaroufim @eqy @Fuzzkatt
More references:
https://github.com/NVIDIA/nccl