[64-bit][CUDA] Upsample2D 64-bit indexing fix attempt 2 #141923

eqy · 2024-12-02T23:56:15Z

#141831
Block/thread math requires a cast...

cc @ptrblck @msaroufim
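
To illustrate why the block/thread math needs a cast: in CUDA, `blockIdx.x`, `blockDim.x`, and `threadIdx.x` are 32-bit unsigned values, so the usual `blockIdx.x * blockDim.x + threadIdx.x` is evaluated in 32 bits and wraps once the flat index reaches 2^32 (and goes negative from 2^31 if stored in an `int`). The following is a minimal Python sketch that emulates that arithmetic on the host. It is not the kernel code from this PR; the function names and launch numbers are hypothetical and chosen only to trigger the wraparound.

```python
# Host-side emulation of CUDA's 32-bit index arithmetic.
# Not the PR's kernel code; all names and launch parameters are hypothetical.

UINT32_MASK = 0xFFFFFFFF


def flat_index_32bit(block_idx: int, block_dim: int, thread_idx: int) -> int:
    """Mimic `blockIdx.x * blockDim.x + threadIdx.x` evaluated in unsigned 32-bit."""
    return (block_idx * block_dim + thread_idx) & UINT32_MASK


def flat_index_64bit(block_idx: int, block_dim: int, thread_idx: int) -> int:
    """Mimic the same expression after casting the first operand to a 64-bit type."""
    return block_idx * block_dim + thread_idx


def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits as a signed int, as storing into `int` would."""
    value &= UINT32_MASK
    return value - (1 << 32) if value >= (1 << 31) else value


block_dim = 1024
thread_idx = 5

# 2**22 blocks of 1024 threads cover exactly 2**32 elements.
wrapped = flat_index_32bit(2**22, block_dim, thread_idx)
exact = flat_index_64bit(2**22, block_dim, thread_idx)
print(wrapped)  # 5
print(exact)    # 4294967301

# Past 2**31 elements, a signed 32-bit index is already negative.
signed = as_int32(flat_index_64bit(2**21, block_dim, thread_idx))
print(signed)   # -2147483643
```

With the 32-bit expression, the thread that should handle element 4294967301 would instead touch element 5, which is consistent with the silent failure described in #141831.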

pytorch-bot · 2024-12-02T23:56:19Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141923

✅ You can merge normally! (2 Unrelated Failures)

As of commit 170c003 with merge base 56f6289:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

linux-binary-manywheel / manywheel-py3_9-cuda12_4-test / test (gh) (trunk failure)
Process completed with exit code 1.
linux-binary-manywheel / manywheel-py3_9-cuda12_6-test / test (gh) (trunk failure)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

aten/src/ATen/native/cuda/UpSampleNearest2d.cu

eqy · 2024-12-17T19:30:10Z

@pytorchmergebot rebase

pytorchmergebot · 2024-12-17T19:31:49Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2024-12-17T19:31:55Z

Successfully rebased upsample2d64 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout upsample2d64 && git pull --rebase)

eqy · 2024-12-18T18:47:37Z

@pytorchmergebot merge

pytorchmergebot · 2024-12-18T18:49:28Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2024-12-18T19:37:27Z

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-focal-rocm6.2-py3.10 / test (default, 1, 2, linux.rocm.gpu)

Details for Dev Infra team

Raised by workflow job

eqy · 2024-12-31T18:58:15Z

@pytorchmergebot rebase

pytorchmergebot · 2024-12-31T18:59:47Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2024-12-31T18:59:49Z

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/141923/head returned non-zero exit code 1

Rebasing (1/6)
Auto-merging aten/src/ATen/native/cuda/UpSampleNearest2d.cu
CONFLICT (content): Merge conflict in aten/src/ATen/native/cuda/UpSampleNearest2d.cu
Auto-merging test/test_nn.py
error: could not apply 9c095b5919... check in
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Could not apply 9c095b5919... check in

Raised by https://github.com/pytorch/pytorch/actions/runs/12563012288

eqy · 2024-12-31T22:01:26Z

@pytorchmergebot merge

pytorchmergebot · 2024-12-31T22:03:09Z

Merge started

pytorchmergebot · 2025-01-01T04:01:47Z

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

eqy · 2025-01-01T04:12:59Z

@pytorchmergebot merge

pytorchmergebot · 2025-01-01T04:14:44Z

Merge started

pytorchmergebot · 2025-01-01T10:13:19Z

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

eqy · 2025-01-02T01:55:28Z

@pytorchmergebot merge

pytorchmergebot · 2025-01-02T01:55:45Z

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

pytorchmergebot · 2025-01-02T01:56:51Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2025-01-02T01:56:53Z

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/141923/head returned non-zero exit code 1

Rebasing (1/4)
Auto-merging aten/src/ATen/native/cuda/UpSampleNearest2d.cu
CONFLICT (content): Merge conflict in aten/src/ATen/native/cuda/UpSampleNearest2d.cu
Auto-merging test/test_nn.py
error: could not apply 9c095b5919... check in
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Could not apply 9c095b5919... check in

Raised by https://github.com/pytorch/pytorch/actions/runs/12576421498

pytorchmergebot · 2025-01-02T01:57:31Z

Merge started

pytorchmergebot · 2025-01-02T07:56:07Z

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

eqy · 2025-01-03T19:30:13Z

@pytorchmergebot merge

pytorchmergebot · 2025-01-03T19:31:59Z

Merge started

pytorchmergebot · 2025-01-04T01:30:33Z

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

eqy · 2025-01-04T02:28:38Z

@pytorchmergebot merge

pytorchmergebot · 2025-01-04T02:30:16Z

Merge started

eppaneamd · 2025-01-10T14:48:51Z

@eqy @ptrblck @msaroufim there is likely a similar issue with upsample_nearest3d, would it be possible for you to verify this? 🙏

[rank1]:     other_frames = F.interpolate(other_frames, scale_factor=self.upsample_factor, mode="nearest")
[rank1]:   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py", line 4538, in interpolate
[rank1]:     return torch._C._nn.upsample_nearest3d(input, output_size, scale_factors)
[rank1]: RuntimeError: upsample_nearest3d only supports output tensors with less than INT_MAX elements, but got [1, 256, 16, 720, 1280]
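
As a quick check of the error above, the element count of the reported output shape can be compared against the signed 32-bit limit. This is only an arithmetic sketch; the shape is copied from the error message, and the variable names are illustrative.

```python
import math

INT_MAX = 2**31 - 1  # largest value a signed 32-bit int can hold

# Output shape copied from the RuntimeError above.
output_shape = [1, 256, 16, 720, 1280]

numel = math.prod(output_shape)
print(numel)            # 3774873600
print(numel > INT_MAX)  # True
```

The output has about 3.77 billion elements, roughly 1.76 times INT_MAX, so any index held in a 32-bit `int` cannot address all of it.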

jataylo · 2025-01-13T09:30:41Z

test/test_nn.py

@@ -9961,7 +9961,8 @@ def test_upsamplingTrilinear3d(self, device, align_corners, memory_format):
            gradgradcheck(lambda x: F.interpolate(x, out_size, **kwargs), [input])

    @onlyCUDA
-    @dtypes(torch.half)
+    @skipCUDAIfRocm(msg="launch bounds error out on ROCM")


@eqy curious do we know if this was a regression on ROCm caused by this PR, or new failures from the dtypes change? cc: @jeffdaily

Fixes #144855. Follows the approach in #141923 of using int64 types to lift the INT_MAX limit. Pull Request resolved: #144865. Approved by: https://github.com/eqy

Fixes pytorch#144855. Follows the approach in pytorch#141923 of using int64 types to lift the INT_MAX limit. Pull Request resolved: pytorch#144865. Approved by: https://github.com/eqy (cherry picked from commit 082fab0)

…orch#144865) (#1869) Fixes pytorch#144855. Follows the approach in pytorch#141923 of using int64 types to lift the INT_MAX limit. Pull Request resolved: pytorch#144865. Approved by: https://github.com/eqy (cherry picked from commit 082fab0)

…orch#144865) (#1869) Fixes pytorch#144855. Follows the approach in pytorch#141923 of using int64 types to lift the INT_MAX limit. Pull Request resolved: pytorch#144865. Approved by: https://github.com/eqy (cherry picked from commit 082fab0) (cherry picked from commit 5d01868)

eqy added the module: cuda, open source, topic: bug fixes, and topic: not user facing labels Dec 2, 2024

eqy requested a review from syed-ahmed as a code owner December 2, 2024 23:56

eqy mentioned this pull request Dec 2, 2024

Nearest-neighbour upsampling with interpolate silently fails on CUDA when output size exceeds 2^31 #141831

Open

ngimel reviewed Dec 3, 2024

aten/src/ATen/native/cuda/UpSampleNearest2d.cu (outdated, resolved)

cpuhrsch added the triaged label Dec 3, 2024

pytorchmergebot force-pushed the upsample2d64 branch from da97aa0 to 2a08365 Compare December 17, 2024 19:31

ngimel approved these changes Dec 18, 2024

pytorch-bot bot added the ciflow/trunk label Dec 18, 2024

pytorchmergebot added the merging label Dec 18, 2024

pytorchmergebot removed the merging label Dec 18, 2024

pytorchmergebot added the merging label Dec 31, 2024

eqy added 4 commits January 3, 2025 18:58

check in (02f1615)
Update UpSampleNearest2d.cu (2a4c7b3)
Update test_nn.py (971f346)
Update test_nn.py (170c003)

eqy force-pushed the upsample2d64 branch from 4314872 to 170c003 Compare January 3, 2025 19:11

pytorchmergebot added the Merged label Jan 4, 2025

pytorchmergebot closed this in dbdda65 Jan 4, 2025

pytorchmergebot removed the merging label Jan 4, 2025

jataylo reviewed Jan 13, 2025

jataylo mentioned this pull request Jan 15, 2025

[64-bit] Int64 casting for UpSampleNearest3D #144865

Closed

jataylo mentioned this pull request Jan 31, 2025

[SWDEV-509031] [CP] [64-bit] Int64 casting for UpSampleNearest3D (#144865) ROCm/pytorch#1869

Merged
