[CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions by eqy · Pull Request #153101 · pytorch/pytorch

Closed
wants to merge 10 commits

Conversation

eqy
Collaborator
@eqy eqy commented May 7, 2025

@eqy eqy added module: cuda Related to torch.cuda, and CUDA support in general module: convolution Problems related to convolutions (THNN, THCUNN, CuDNN) open source topic: not user facing topic category labels May 7, 2025
pytorch-bot bot commented May 7, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153101

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 0a9c6da with merge base 89ebd29:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

# From the added test: x is a CPU input tensor, device is the CUDA device under test
c = nn.Conv2d(2, 2, kernel_size=3, stride=1, padding=1, groups=2)
yref = c(x)  # reference result computed on CPU
y = c.to(device=device)(x.to(device=device))  # same convolution on the device
self.assertEqual(yref, y)  # compares with dtype-dependent tolerances
Collaborator


Huh, is this really numerically stable enough for assertEqual instead of assertClose?

Collaborator Author
@eqy eqy May 8, 2025


assertEqual does use tolerances under the hood
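For reference, assertEqual-style checks compare with an rtol/atol rule rather than bitwise equality. A minimal standalone sketch of that tolerance rule (illustrative tolerance values, not PyTorch's exact per-dtype defaults):

```python
def within_tolerance(a, b, rtol=1.3e-6, atol=1e-5):
    # The standard closeness rule: |a - b| <= atol + rtol * |b|
    return abs(a - b) <= atol + rtol * abs(b)

# Two conv results that differ only by float32-scale rounding noise pass:
print(within_tolerance(1.0000001, 1.0))  # True
# A genuine numerical difference fails:
print(within_tolerance(1.01, 1.0))       # False
```

This is why a CPU-vs-CUDA comparison like the one in the test can use assertEqual safely: small backend-dependent rounding differences stay inside the tolerance band.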

@eqy
Collaborator Author
eqy commented May 12, 2025

@pytorchmergebot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased depthwisenchw64 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout depthwisenchw64 && git pull --rebase)

@eqy
Collaborator Author
eqy commented May 12, 2025

@pytorchmergebot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 12, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-rocm-py3.10 / test (default, 1, 2, linux.rocm.gpu.2)


@@ -467,8 +467,19 @@ struct ConvParams {
// always use cudnn_depthwise for channels_last format
return true;
}
static long cudnn_version = detail::getCUDAHooks().versionCuDNN();
// native kernel doesn't support 64-bit non-splittable case
if (needs_64bit_indexing_no_split(input, weight) && detail::getCUDAHooks().compiledWithCuDNN() && cudnn_enabled) {
Collaborator


Start with cudnn_enabled for efficient short circuiting here.
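The suggested reordering matters because `&&` evaluates left to right, so putting the cheap boolean flag first skips the expensive 64-bit-indexing check entirely when cuDNN is disabled. A small Python sketch of the same short-circuit idea (stand-in names, not the actual hooks API):

```python
calls = {"version": 0}

def version_cudnn():
    # Stand-in for an expensive backend query like
    # detail::getCUDAHooks().versionCuDNN()
    calls["version"] += 1
    return 90100

def use_cudnn(cudnn_enabled, compiled_with_cudnn, needs_64bit_no_split):
    # Cheap flags first: the expensive query only runs if they all pass.
    return (cudnn_enabled and compiled_with_cudnn and needs_64bit_no_split
            and version_cudnn() >= 8000)

use_cudnn(False, True, True)
print(calls["version"])  # 0: short-circuited, version never queried
use_cudnn(True, True, True)
print(calls["version"])  # 1
```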

@@ -467,8 +467,19 @@ struct ConvParams {
// always use cudnn_depthwise for channels_last format
return true;
}
static long cudnn_version = detail::getCUDAHooks().versionCuDNN();
Collaborator


This static variable is now initialized even if cuDNN is not enabled or compiled in; is this the source of the ROCm issues? It might not be able to stay static as a result (unless you can make it constexpr or something).

@eqy
Collaborator Author
eqy commented May 14, 2025

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@jeanschmidt
Contributor

@pytorchbot revert -m "Seems to have introduced breakages on main, tentative revert: https://github.com/pytorch/pytorch/actions/runs/15024667248/job/42224521705" -c nosignal

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.

@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels May 14, 2025
@eqy eqy added ciflow/mps Run MPS tests (subset of trunk) ciflow/rocm Trigger "default" config CI on ROCm labels May 14, 2025
@eqy
Collaborator Author
eqy commented May 20, 2025

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud


Failing merge rule: Core Maintainers

@eqy
Collaborator Author
eqy commented May 20, 2025

@pytorchmergebot rebase

@pytorchmergebot
Collaborator
pytorchmergebot commented May 20, 2025

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased depthwisenchw64 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout depthwisenchw64 && git pull --rebase)

@eqy
Collaborator Author
eqy commented May 20, 2025

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


Labels
ci-no-td Do not run TD on this PR ciflow/mps Run MPS tests (subset of trunk) ciflow/rocm Trigger "default" config CI on ROCm ciflow/trunk Trigger trunk jobs on your pull request Merged module: convolution Problems related to convolutions (THNN, THCUNN, CuDNN) module: cpu CPU specific problem (e.g., perf, algorithm) module: cuda Related to torch.cuda, and CUDA support in general open source Reverted topic: not user facing topic category
4 participants