[CD][CUDA][Triton][PTXAS] Turn on BUILD_BUNDLE_PTXAS=1 for CUDA13 X86 Wheel Build #163972
Conversation
Turn on BUILD_BUNDLE_PTXAS=1 for the CUDA 13 x86 wheel build, because Triton by default ships the CUDA 12.8 ptxas.
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163972
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 6 Pending, as of commit 2979396 with merge base 5880996.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
LGTM, but please update test plan
And please propose a follow-up PR for aarch64.
Thanks @malfet! I updated the test plan and changed the title to reflect that this is x86 only. I will create a follow-up for the ARM CUDA 13 wheel build. Thanks for catching it!
…UDA13 Wheel Build. See also pytorch#163972
Considering the original issue, all reports seem to derive from ARM systems (e.g., THOR).
Closing, as the new ptxas that ships with CUDA 13 mostly deals with the addition of ARM-based GPUs, so bundling a CUDA 13 ptxas in the x86 CUDA binary has no clear benefit.
…63988) See also #163972, which was intended to be this PR. Triton (release/3.5.x) by default ships the CUDA 12.8 ptxas. This PR bundles a ptxas for CUDA 13 so that it can help #163801 when users run on new devices like THOR and Spark. Fixes #163801
Test Plan: Check the binary size increase against nightly or the v2.9 RC. Install the binary onto a working THOR and GB200/GH100 machine (reproducing the original issue on THOR first); with the binary built from this PR, the issue is expected to be gone without any additional user settings. Testing on GB200 is to ensure no regression.
Reference: #119750 and pytorch/builder@5c814e2
Note: with this PR, torch.compile on the PyTorch side is supposed to find ptxas via "torch/_inductor/runtime/compile_tasks.py" and "_set_triton_ptxas_path". Use cases that do not go through "_set_triton_ptxas_path" may not be able to use the CUDA 13 ptxas binary. As is, the Triton side does not know this new CUDA 13 ptxas exists; if a user assumes torch/bin/ptxas is sufficient and deletes the ptxas shipped with Triton, https://github.com/triton-lang/triton/blob/c6ad34f7eb42630533412d93ca2cc00a4b4f8f3c/python/triton/knobs.py#L216 will still complain that ptxas is not found, because Triton does not know the new one is available.
Pull Request resolved: #163988 Approved by: https://github.com/atalman
…63988) (cherry picked from commit 3b4ad4a)
…64236) [AARCH64][CD][CUDA13][Triton][PTXAS] Turn on BUILD_BUNDLE_PTXAS=1 (#163988) (cherry picked from commit 3b4ad4a) Co-authored-by: Wei Wang <weiwan@nvidia.com>
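The note above about "_set_triton_ptxas_path" describes the intended discovery path. A minimal sketch of that kind of logic follows; it assumes the wheel ships an executable at torch/bin/ptxas and that Triton honors a TRITON_PTXAS_PATH environment variable. Both are assumptions for illustration, and this is not the actual PyTorch implementation.

```python
# Hypothetical sketch, not the actual _set_triton_ptxas_path implementation.
# Assumes: (1) the wheel ships an executable at torch/bin/ptxas, and
# (2) Triton reads the TRITON_PTXAS_PATH environment variable (see knobs.py).
import os

import torch


def point_triton_at_bundled_ptxas() -> None:
    """If a bundled torch/bin/ptxas exists, expose it to Triton via the env."""
    bundled = os.path.join(os.path.dirname(torch.__file__), "bin", "ptxas")
    if os.path.isfile(bundled) and os.access(bundled, os.X_OK):
        # Do not clobber a ptxas the user has already chosen explicitly.
        os.environ.setdefault("TRITON_PTXAS_PATH", bundled)
```

The design point the note makes is that only code paths which run this kind of hook before Triton compiles anything would pick up the bundled CUDA 13 ptxas; Triton itself keeps using whatever its own packaging or environment provides.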
Triton (release/3.5.x) by default ships the CUDA 12.8 ptxas.
This PR bundles a ptxas for CUDA 13 so that it can help #163801 when users run on new devices like THOR and Spark.
Fixes #163801
Test Plan:
[Update: since the original ptxas issue appears to be ARM-specific (the THOR device is ARM-based), this PR is no longer appropriate.]
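For anyone reproducing the test plan, a hedged way to report which ptxas binaries a machine exposes is sketched below; the bundled torch/bin/ptxas path and the TRITON_PTXAS_PATH variable are assumptions used for illustration, not something this PR guarantees.

```python
# Hypothetical check for which ptxas binaries are visible on a machine.
import os
import shutil
import subprocess

import torch

candidates = {
    "TRITON_PTXAS_PATH": os.environ.get("TRITON_PTXAS_PATH"),
    "torch bundled": os.path.join(os.path.dirname(torch.__file__), "bin", "ptxas"),
    "on PATH": shutil.which("ptxas"),
}
for label, path in candidates.items():
    if path and os.path.isfile(path):
        out = subprocess.run([path, "--version"], capture_output=True, text=True)
        version = out.stdout.strip().splitlines()[-1] if out.stdout.strip() else "unknown"
        print(f"{label}: {path} ({version})")
    else:
        print(f"{label}: not found")
```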
Reference: #119750 and pytorch/builder@5c814e2
cc @ptrblck @eqy @tinglvv @atalman @malfet