[CD][CUDA][Triton][PTXAS] Turn on BUILD_BUNDLE_PTXAS=1 for CUDA13 X86 Wheel Build by nWEIdia · Pull Request #163972 · pytorch/pytorch · GitHub


@nWEIdia
Collaborator
@nWEIdia nWEIdia commented Sep 26, 2025

Triton (release/3.5.x) ships the CUDA 12.8 ptxas by default.
This PR bundles a CUDA 13 ptxas so that it can help with #163801 when users run on new devices such as THOR and Spark.

Fixes #163801

Test Plan:

  1. Check the binary size increase against nightly or v2.9RC.
  2. Install the existing binary on a working THOR machine and a B200/H100 machine (reproduce the original issue on THOR first), then install the binary built from this PR; the issue is expected to be gone without any additional user setting. Testing on B200 ensures there is no regression.
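Step 1 can be partly scripted. The following is a minimal sketch, not part of PyTorch's CI: it relies on a wheel being an ordinary zip archive, lists any members whose path ends in `bin/ptxas` (an assumed layout for the bundled binary), and reports the on-disk size delta between a baseline wheel and the wheel built from this PR. The helper names and wheel file names are hypothetical.

```python
import os
import zipfile
from typing import List


def bundled_ptxas_entries(wheel_path: str) -> List[str]:
    """Return wheel members that look like a bundled ptxas (assumed 'bin/ptxas' suffix)."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return [name for name in wheel.namelist() if name.endswith("bin/ptxas")]


def size_delta_bytes(baseline_wheel: str, candidate_wheel: str) -> int:
    """On-disk (compressed) size difference: candidate minus baseline."""
    return os.path.getsize(candidate_wheel) - os.path.getsize(baseline_wheel)


def report(baseline_wheel: str, candidate_wheel: str) -> None:
    """Print the size increase in MiB and any ptxas entries found in the candidate."""
    delta = size_delta_bytes(baseline_wheel, candidate_wheel)
    print(f"size delta: {delta / 2**20:+.1f} MiB")
    print("ptxas entries:", bundled_ptxas_entries(candidate_wheel) or "none")
```

For example, `report("torch-nightly.whl", "torch-pr.whl")` (hypothetical file names) prints the delta and the ptxas members found in the PR wheel. Comparing compressed wheel sizes matches what users download; the installed size will be larger.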

[Update: since the original ptxas issue appears to be ARM-specific (THOR is an ARM device), this PR may not be appropriate.]

Reference: #119750 and pytorch/builder@5c814e2

cc @ptrblck @eqy @tinglvv @atalman @malfet

builds because triton by default ships CUDA12.8 ptxas
@nWEIdia nWEIdia requested a review from a team as a code owner September 26, 2025 17:12
@pytorch-bot
pytorch-bot bot commented Sep 26, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163972

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 6 Pending

As of commit 2979396 with merge base 5880996:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@nWEIdia nWEIdia added the ciflow/binaries Trigger all binary build and upload jobs on the PR label Sep 26, 2025
@nWEIdia nWEIdia added the release notes: build release notes category label Sep 26, 2025
@atalman atalman moved this to Hi Priority in PyTorch + CUDA Sep 26, 2025
Contributor
@malfet malfet left a comment

LGTM, but please update the test plan
and propose a follow-up PR for aarch64.

@nWEIdia nWEIdia changed the title [CD][CUDA][Triton][PTXAS] Turn on BUILD_BUNDLE_PTXAS=1 for CUDA13 [CD][CUDA][Triton][PTXAS] Turn on BUILD_BUNDLE_PTXAS=1 for CUDA13 X86 Wheel Build Sep 26, 2025
@nWEIdia
Collaborator Author
nWEIdia commented Sep 26, 2025

LGTM, but please update the test plan and propose a follow-up PR for aarch64.

Thanks @malfet! I updated the test plan and changed the title to reflect that this PR is x86-only. I will create a follow-up for the ARM CUDA 13 wheel build. Thanks for catching it!

@nWEIdia nWEIdia self-assigned this Sep 26, 2025
nWEIdia added a commit to nWEIdia/pytorch that referenced this pull request Sep 26, 2025
@nWEIdia
Collaborator Author
nWEIdia commented Sep 26, 2025

Since the original issues all seem to stem from ARM systems (e.g. THOR),
we might consider dropping this PR in favor of #163988.

@nWEIdia
Collaborator Author
nWEIdia commented Sep 27, 2025

Closing, since the new ptxas that ships with CUDA 13 mostly concerns the addition of ARM-based GPUs, so bundling a CUDA 13 ptxas in the x86 CUDA binary has no clear benefit.
Prioritizing #163988 instead.

@nWEIdia nWEIdia closed this Sep 27, 2025
@github-project-automation github-project-automation bot moved this from Hi Priority to Done in PyTorch + CUDA Sep 27, 2025
pytorchmergebot pushed a commit that referenced this pull request Sep 30, 2025
…63988)

See also #163972, which was intended to be this PR.

Triton (release/3.5.x) ships the CUDA 12.8 ptxas by default.
This PR bundles a CUDA 13 ptxas so that it can help with #163801 when users run on new devices such as THOR and Spark.

Fixes #163801

Test Plan:

Check the binary size increase against nightly or v2.9RC.
Install the existing binary on a working THOR machine and a GB200/GH100 machine (reproduce the original issue on THOR first), then install the binary built from this PR; the issue is expected to be gone without any additional user setting. Testing on GB200 ensures there is no regression.
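When running the second step, it helps to confirm which ptxas release an environment actually uses before and after installing the PR build. A minimal sketch, assuming `ptxas --version` prints its usual `Cuda compilation tools, release X.Y, ...` line; the helper names are hypothetical and not part of PyTorch or Triton.

```python
import re
import subprocess
from typing import Optional, Tuple

# Matches the "release X.Y" fragment of `ptxas --version` output.
_RELEASE_RE = re.compile(r"release (\d+)\.(\d+)")


def parse_ptxas_release(version_text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `ptxas --version` output, or None if absent."""
    match = _RELEASE_RE.search(version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def ptxas_release(ptxas_path: str) -> Optional[Tuple[int, int]]:
    """Run the given ptxas binary with --version and parse its CUDA release."""
    result = subprocess.run(
        [ptxas_path, "--version"], capture_output=True, text=True, check=True
    )
    return parse_ptxas_release(result.stdout)
```

Calling `ptxas_release` on the binary an environment resolves to should report a major version of 13 once the bundled CUDA 13 ptxas is in use, and 12 while the Triton default is still being picked up.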
Reference: #119750 and pytorch/builder@5c814e2

Note: with this PR, PyTorch's torch.compile is expected to find ptxas via "_set_triton_ptxas_path" in "torch/_inductor/runtime/compile_tasks.py". Use cases that do not go through "_set_triton_ptxas_path" may not be able to use the CUDA 13 ptxas binary.
However, as things stand, Triton itself is not aware of this new CUDA 13 ptxas. So if a user assumes pytorch/bin/ptxas is already available and deletes the ptxas shipped with Triton, then https://github.com/triton-lang/triton/blob/c6ad34f7eb42630533412d93ca2cc00a4b4f8f3c/python/triton/knobs.py#L216 will still complain that ptxas is not found, because it does not know the new one exists.
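The interaction described in this note can be illustrated with a small model. This is a simplified, hypothetical sketch, not the actual code in `compile_tasks.py` or Triton's `knobs.py`. It assumes the PyTorch-side step communicates with Triton through the `TRITON_PTXAS_PATH` environment variable (which Triton honors as a ptxas override), exporting it only when `torch/bin/ptxas` exists and the variable is not already set. The Triton-side lookup prefers that variable and otherwise falls back to its own bundled copy, whose `backends/nvidia/bin/ptxas` location is an assumption here. Under this model, a process that never runs the PyTorch-side step, and whose Triton ptxas has been deleted, fails with "ptxas not found".

```python
import os
from pathlib import Path
from typing import MutableMapping


def export_bundled_ptxas(torch_root: Path, env: MutableMapping[str, str]) -> None:
    """Hypothetical PyTorch-side step: point Triton at torch/bin/ptxas if usable."""
    if env.get("TRITON_PTXAS_PATH"):
        return  # respect an explicit user setting
    candidate = torch_root / "bin" / "ptxas"
    if candidate.is_file() and os.access(candidate, os.X_OK):
        env["TRITON_PTXAS_PATH"] = str(candidate)


def resolve_ptxas(triton_root: Path, env: MutableMapping[str, str]) -> Path:
    """Hypothetical Triton-side lookup: env override first, then Triton's own copy."""
    override = env.get("TRITON_PTXAS_PATH")
    if override:
        return Path(override)
    bundled = triton_root / "backends" / "nvidia" / "bin" / "ptxas"  # assumed layout
    if bundled.is_file():
        return bundled
    raise RuntimeError("ptxas not found")
```

In this model, `resolve_ptxas` never looks under `torch_root` on its own; only the PyTorch-side export makes the CUDA 13 binary visible, which is why code paths that skip "_set_triton_ptxas_path" keep using (or failing to find) Triton's copy.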

Pull Request resolved: #163988
Approved by: https://github.com/atalman
pytorchbot pushed a commit that referenced this pull request Sep 30, 2025
…63988)

(cherry picked from commit 3b4ad4a)
atalman pushed a commit that referenced this pull request Sep 30, 2025
…64236)

[AARCH64][CD][CUDA13][Triton][PTXAS] Turn on BUILD_BUNDLE_PTXAS=1 (#163988)

(cherry picked from commit 3b4ad4a)

Co-authored-by: Wei Wang <weiwan@nvidia.com>

Labels

ciflow/binaries (Trigger all binary build and upload jobs on the PR)
open source
release notes: build (release notes category)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[CUDA][Triton][PTXAS] Triton Wheel Missing CUDA13 PTXAS - Breakage exists for the environment where CTK is not present

4 participants
