[ATen][CUDA][CUB] Implement changes to CCCL (CUB/Thrust/LibCUDACXX) usage in ATen #153373

Aidyn-A · 2025-05-12T13:20:42Z

A major release of CCCL 3.0.0 will introduce some bc-breaking changes. Namely iterators like TransformInputIterator and ConstantInputIterator were moved from CUB to Thrust, some operators like Max and Sum were moved to LibCUDACXX.

For the more info on changes please visit: https://nvidia.github.io/cccl/cccl/3.0_migration_guide.html

This is a follow up to PR #147493. A description from the original PR:

Several cub iterators have been deprecated and removed in the latest CCCL (cub) development NVIDIA/cccl#3831. This PR replaced the usage of those cub iterators with thrust iterators.

Some cub thread operators were also deprecated and removed in NVIDIA/cccl#3918. This PR replaced those operators with libcudacxx ops.

This might also affect ROCM usability a bit.

This patch is tested to work with CCCL commit at NVIDIA/cccl@82befb0

Tracking of CCCL/CUB deprecations in the most recent development NVIDIA/cccl#101

cc @ptrblck @msaroufim @eqy @jerryzh168 @manuelcandales @SherlockNoMad @angelayi @xwang233 @miscco

pytorch-bot · 2025-05-12T13:20:46Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153373

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 1 Pending, 1 Unrelated Failure

As of commit 793a956 with merge base d0cfa3e ():

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

pull / cuda12.8-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu, unstable) (gh) (#153987)
MISSING REGRESSION TEST

This comment was automatically generated by Dr. CI and updates every 15 minutes.

miscco

This is technically correct.

However, with the exception of cuda::maximum and cuda::minimumall the types are available unconditionally.

So we can considerably simplify the changes

aten/src/ATen/cuda/cub.cu

aten/src/ATen/cuda/cub.cuh

aten/src/ATen/test/cuda_cub_test.cu

aten/src/ATen/native/cuda/Embedding.cu

aten/src/ATen/cuda/cub.cuh

aten/src/ATen/cuda/cub.cu

Aidyn-A · 2025-05-28T15:15:07Z

@ngimel, can you please review?

cyyever

Nice

Aidyn-A · 2025-06-03T17:35:41Z

Thanks!

@pytorchbot merge

pytorchmergebot · 2025-06-03T17:37:30Z

Merge failed

Reason: Approvers from one of the following sets are needed:

superuser (pytorch/metamates)
Core Reviewers (mruberry, lezcano, Skylion007, ngimel, peterbell10, ...)
Core Maintainers (soumith, gchanan, ezyang, dzhulgakov, malfet, ...)

Details for Dev Infra team

Raised by workflow job

Failing merge rule: Core Maintainers

Aidyn-A · 2025-06-03T17:39:15Z

@ngimel, may I have your review please?

atalman

lgtm, fixes failure https://github.com/pytorch/pytorch/actions/runs/15904119151/job/44924693966
cc @ngimel and @malfet

Aidyn-A · 2025-06-27T23:37:06Z

@pytorchbot rebase

pytorchmergebot · 2025-06-27T23:38:44Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2025-06-27T23:38:48Z

Successfully rebased cccl_cub_v3_changes onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout cccl_cub_v3_changes && git pull --rebase)

cyyever · 2025-06-28T02:11:13Z

@pytorchbot merge

pytorchmergebot · 2025-06-28T02:13:01Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

Aidyn-A requested review from eqy and syed-ahmed as code owners May 12, 2025 13:20

pytorch-bot bot added the release notes: cuda release notes category label May 12, 2025

Aidyn-A added module: cuda Related to torch.cuda, and CUDA support in general topic: not user facing topic category module: core aten Related to change to the Core ATen opset labels May 12, 2025

Aidyn-A requested a review from ngimel May 12, 2025 13:22

pytorchbot added the open source label May 12, 2025

miscco reviewed May 12, 2025

View reviewed changes

xwang233 mentioned this pull request May 12, 2025

[CUDA] Replace deprecated usages of cub iterators and thread operators #147493

Closed

colesbury added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label May 13, 2025

Aidyn-A marked this pull request as draft May 14, 2025 09:12

Aidyn-A marked this pull request as ready for review May 15, 2025 12:42

eqy added the ciflow/trunk Trigger trunk jobs on your pull request label May 27, 2025

Aidyn-A marked this pull request as draft May 29, 2025 12:37

Aidyn-A force-pushed the cccl_cub_v3_changes branch from b332ead to 30512e7 Compare June 3, 2025 09:28

cyyever approved these changes Jun 3, 2025

View reviewed changes

cyyever added the ciflow/rocm Trigger "default" config CI on ROCm label Jun 3, 2025

Aidyn-A marked this pull request as ready for review June 3, 2025 17:35

pytorchmergebot added the merging label Jun 3, 2025

pytorchmergebot removed the merging label Jun 3, 2025

Aidyn-A requested a review from malfet June 9, 2025 16:24

atalman approved these changes Jun 27, 2025

View reviewed changes

Aidyn-A and others added 12 commits June 27, 2025 23:38

add changes to cccl usage

f0f8746

apply suggested changes

2d0b129

revert make_transform_iterator

021038a

use macros

2893966

fix typo

afa293a

fix namespaces

b8c8666

fix build for cuda 11.8

ffd0b8b

attempt to fix rocm build

02ce628

Fix rocm test

fa9223c

Add missing ROCM_HIPCUB

a050f27

Fix Win build

49cb6fa

Fix lint

793a956

pytorchmergebot force-pushed the cccl_cub_v3_changes branch from 1b77746 to 793a956 Compare June 27, 2025 23:38

pytorchmergebot added the merging label Jun 28, 2025

pytorchmergebot added the Merged label Jun 28, 2025

pytorchmergebot closed this in 51eb8e8 Jun 28, 2025

pytorchmergebot removed the merging label Jun 28, 2025

tinglvv mentioned this pull request Jul 2, 2025

Failure with cub::TransformInputIterator in 12.9 periodic CI test #157502

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[ATen][CUDA][CUB] Implement changes to CCCL (CUB/Thrust/LibCUDACXX) usage in ATen #153373

[ATen][CUDA][CUB] Implement changes to CCCL (CUB/Thrust/LibCUDACXX) usage in ATen #153373

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

[ATen][CUDA][CUB] Implement changes to CCCL (CUB/Thrust/LibCUDACXX) usage in ATen #153373

[ATen][CUDA][CUB] Implement changes to CCCL (CUB/Thrust/LibCUDACXX) usage in ATen #153373

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153373

⏳ 1 Pending, 1 Unrelated Failure

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Merge failed

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Merge started

Uh oh!

Uh oh!