[CUDA][avgpool2d] Fix backward launch bounds again for `sm100`, `sm120` #150640


Closed
eqy wants to merge 1 commit

Conversation

@eqy (Collaborator) commented on Apr 3, 2025

`__CUDA_ARCH__` is not visible in host code, which causes incorrect launch bounds and `too many resources requested for launch` on Blackwell.

CC @atalman @malfet as we would want this in 2.7 @nWEIdia

cc @ptrblck @msaroufim
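For context, here is a minimal standalone sketch of the failure mode, not the actual PyTorch kernel or the exact fix in this PR: the kernel name `dummy_backward`, the constant `kBlockThreads`, and the helper `block_threads_for_current_device` are hypothetical. It shows why a block-size constant guarded by `__CUDA_ARCH__` is fine inside `__launch_bounds__` (device compilation pass) but silently takes the wrong branch when the same constant is read by host-side launch code, and one host-safe alternative that queries the device at runtime instead.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __CUDA_ARCH__ is only defined during the device compilation passes.
// In the host pass this #if always falls through to the #else branch,
// so host code that reads kBlockThreads sees 1024 even on sm100/sm120.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 1000
constexpr int kBlockThreads = 256;   // tighter bound for Blackwell (sm100/sm120)
#else
constexpr int kBlockThreads = 1024;
#endif

// The device pass does see __CUDA_ARCH__, so the kernel itself is compiled
// with the per-arch launch bound (256 on sm100/sm120).
__global__ void __launch_bounds__(kBlockThreads) dummy_backward(float* out) {
  out[blockIdx.x * blockDim.x + threadIdx.x] = 0.f;
}

// Host-safe way to pick the block size: query the device at runtime rather
// than relying on a macro that is invisible to host code.
int block_threads_for_current_device() {
  int device = 0;
  cudaGetDevice(&device);
  cudaDeviceProp prop{};
  cudaGetDeviceProperties(&prop, device);
  // Blackwell reports compute capability 10.x / 12.x.
  return (prop.major >= 10) ? 256 : 1024;
}

int main() {
  float* out = nullptr;
  cudaMalloc(&out, 1024 * sizeof(float));

  // Buggy pattern: on Blackwell the host pass picked 1024 threads, but the
  // kernel was built with __launch_bounds__(256), so this launch would fail
  // with "too many resources requested for launch".
  // dummy_backward<<<1, kBlockThreads>>>(out);

  // Host-safe pattern: the requested block size matches the device code.
  dummy_backward<<<1, block_threads_for_current_device()>>>(out);
  printf("launch status: %s\n", cudaGetErrorString(cudaGetLastError()));

  cudaFree(out);
  return 0;
}
```

The actual change in this PR may use a different mechanism to size the avgpool2d backward launch; the sketch only illustrates why the host- and device-side views of an `__CUDA_ARCH__`-guarded constant can disagree.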

@eqy added labels on Apr 3, 2025: module: cuda (Related to torch.cuda, and CUDA support in general), open source, topic: bug fixes, topic: not user facing
@eqy requested a review from syed-ahmed as a code owner on April 3, 2025, 21:54
pytorch-bot (bot) commented on Apr 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150640

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 2ede186 with merge base 51da241:

BROKEN TRUNK - The following job failed but was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@eqy eqy added the ciflow/trunk Trigger trunk jobs on your pull request label Apr 3, 2025
@malfet (Contributor) left a comment

LGTM

@malfet (Contributor) commented on Apr 3, 2025

@pytorchbot merge

@pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job.

Failing merge rule: Core Maintainers

@atalman (Contributor) left a comment

lgtm

@atalman (Contributor) commented on Apr 4, 2025

@pytorchmergebot merge -f "failure already existing, lint is green"

@pytorchmergebot (Collaborator) commented

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort; instead, consider -i/--ignore-current to continue the merge while ignoring current failures. This allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@atalman (Contributor) commented on Apr 4, 2025

@pytorchbot cherry-pick --onto release/2.7 -c critical

pytorchbot pushed a commit that referenced this pull request Apr 4, 2025
[CUDA][avgpool2d] Fix backward launch bounds again for `sm100`, `sm120` (#150640)

`__CUDA_ARCH__` is not visible in host code, which causes incorrect launch bounds and `too many resources requested for launch` on blackwell

CC @atalman @malfet as we would want this in 2.7 @nWEIdia

Pull Request resolved: #150640
Approved by: https://github.com/malfet, https://github.com/drisspg, https://github.com/atalman

(cherry picked from commit 09c4da9)
@pytorchbot (Collaborator) commented

Cherry picking #150640

The cherry pick PR is at #150676 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:

Details for Dev Infra team: raised by workflow job.

malfet pushed a commit that referenced this pull request Apr 4, 2025
[CUDA][avgpool2d] Fix backward launch bounds again for `sm100`, `sm120` (#150676)

[CUDA][avgpool2d] Fix backward launch bounds again for `sm100`, `sm120` (#150640)

`__CUDA_ARCH__` is not visible in host code, which causes incorrect launch bounds and `too many resources requested for launch` on blackwell

Pull Request resolved: #150640
Approved by: https://github.com/malfet, https://github.com/drisspg, https://github.com/atalman

(cherry picked from commit 09c4da9)

Co-authored-by: Eddie Yan <eddiey@nvidia.com>
timocafe pushed a commit to timocafe/pytorch that referenced this pull request Apr 16, 2025
[CUDA][avgpool2d] Fix backward launch bounds again for `sm100`, `sm120` (pytorch#150640)

`__CUDA_ARCH__` is not visible in host code, which causes incorrect launch bounds and `too many resources requested for launch` on blackwell

CC @atalman @malfet as we would want this in 2.7 @nWEIdia

Pull Request resolved: pytorch#150640
Approved by: https://github.com/malfet, https://github.com/drisspg, https://github.com/atalman
amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
[CUDA][avgpool2d] Fix backward launch bounds again for `sm100`, `sm120` (pytorch#150640)

`__CUDA_ARCH__` is not visible in host code, which causes incorrect launch bounds and `too many resources requested for launch` on blackwell

CC @atalman @malfet as we would want this in 2.7 @nWEIdia

Pull Request resolved: pytorch#150640
Approved by: https://github.com/malfet, https://github.com/drisspg, https://github.com/atalman
Labels

ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: cuda (Related to torch.cuda, and CUDA support in general), open source, topic: bug fixes, topic: not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants