[64-bit][CUDA] Upsample2D 64-bit indexing fix attempt 2 by eqy · Pull Request #141923 · pytorch/pytorch · GitHub

[64-bit][CUDA] Upsample2D 64-bit indexing fix attempt 2 #141923

Closed
eqy wants to merge 4 commits

Conversation

eqy
Collaborator
@eqy eqy commented Dec 2, 2024

#141831
Block/thread math requires a cast...
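
The description is terse, so here is a minimal sketch of why such a cast can matter. It assumes the kernel derives a flat element index in the common `blockIdx.x * blockDim.x + threadIdx.x` form; that is an assumption, since the PR's actual diff is not reproduced on this page. In CUDA those built-in values are 32-bit unsigned integers, so the product wraps modulo 2**32 unless one operand is widened to a 64-bit type first. The Python below only simulates that arithmetic with a mask, and the function names and launch values are hypothetical.

```python
# Simulates CUDA-style index arithmetic in plain Python (illustrative only).
U32_MASK = 0xFFFFFFFF  # blockIdx/blockDim/threadIdx components are 32-bit unsigned


def flat_index_32bit(block_idx: int, block_dim: int, thread_idx: int) -> int:
    """Index when every operand stays 32-bit: the result wraps modulo 2**32."""
    return (block_idx * block_dim + thread_idx) & U32_MASK


def flat_index_64bit(block_idx: int, block_dim: int, thread_idx: int) -> int:
    """Index when an operand is cast to a 64-bit type before the multiply."""
    return block_idx * block_dim + thread_idx  # Python ints do not wrap


# Hypothetical launch values: 512 threads per block, block number 2**23.
block_idx, block_dim, thread_idx = 2**23, 512, 5

wrapped = flat_index_32bit(block_idx, block_dim, thread_idx)
widened = flat_index_64bit(block_idx, block_dim, thread_idx)
print(wrapped)  # 5
print(widened)  # 4294967301
```

Without the widening, the thread in this example would address element 5 instead of element 4294967301, which is the kind of silent misindexing a 64-bit indexing fix is meant to rule out.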

cc @ptrblck @msaroufim

@eqy added the module: cuda, open source, topic: bug fixes, and topic: not user facing labels Dec 2, 2024
@eqy requested a review from syed-ahmed as a code owner December 2, 2024 23:56
pytorch-bot bot commented Dec 2, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141923

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 170c003 with merge base 56f6289:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@cpuhrsch added the triaged label Dec 3, 2024
@eqy
Collaborator Author
eqy commented Dec 17, 2024

@pytorchmergebot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict.

@pytorchmergebot
Collaborator

Successfully rebased `upsample2d64` onto `refs/remotes/origin/viable/strict`. Please pull locally before adding more changes (for example, via `git checkout upsample2d64 && git pull --rebase`).

@eqy
Collaborator Author
eqy commented Dec 18, 2024

@pytorchmergebot merge

@pytorch-bot added the ciflow/trunk label Dec 18, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 job has failed: trunk / linux-focal-rocm6.2-py3.10 / test (default, 1, 2, linux.rocm.gpu)

@eqy
Collaborator Author
eqy commented Dec 31, 2024

@pytorchmergebot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict.

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/141923/head returned non-zero exit code 1

Rebasing (1/6)
Auto-merging aten/src/ATen/native/cuda/UpSampleNearest2d.cu
CONFLICT (content): Merge conflict in aten/src/ATen/native/cuda/UpSampleNearest2d.cu
Auto-merging test/test_nn.py
error: could not apply 9c095b5919... check in
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Could not apply 9c095b5919... check in

Raised by https://github.com/pytorch/pytorch/actions/runs/12563012288

@eqy
Collaborator Author
eqy commented Dec 31, 2024

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job waited more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

@eqy
Collaborator Author
eqy commented Jan 1, 2025

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job waited more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

@eqy
Collaborator Author
eqy commented Jan 2, 2025

@pytorchmergebot merge

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job waited more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict.

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/141923/head returned non-zero exit code 1

Rebasing (1/4)
Auto-merging aten/src/ATen/native/cuda/UpSampleNearest2d.cu
CONFLICT (content): Merge conflict in aten/src/ATen/native/cuda/UpSampleNearest2d.cu
Auto-merging test/test_nn.py
error: could not apply 9c095b5919... check in
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Could not apply 9c095b5919... check in

Raised by https://github.com/pytorch/pytorch/actions/runs/12576421498

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job waited more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

@eqy
Collaborator Author
eqy commented Jan 3, 2025

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job waited more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information, see the pytorch-bot wiki.

@eqy
Collaborator Author
eqy commented Jan 4, 2025

@pytorchmergebot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@eppaneamd

@eqy @ptrblck @msaroufim There is likely a similar issue with `upsample_nearest3d`. Would it be possible for you to verify this? 🙏

[rank1]:     other_frames = F.interpolate(other_frames, scale_factor=self.upsample_factor, mode="nearest")
[rank1]:   File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/nn/functional.py", line 4538, in interpolate
[rank1]:     return torch._C._nn.upsample_nearest3d(input, output_size, scale_factors)
[rank1]: RuntimeError: upsample_nearest3d only supports output tensors with less than INT_MAX elements, but got [1, 256, 16, 720, 1280]
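
To make the error concrete, the short check below multiplies out the output shape quoted in the traceback and compares the element count against `INT_MAX`. It is a standalone arithmetic sketch, not PyTorch's actual guard; the helper name `needs_64bit_indexing` is hypothetical.

```python
import math

INT_MAX = 2**31 - 1  # largest value of a signed 32-bit int (2147483647)


def needs_64bit_indexing(shape) -> bool:
    """Return True when a tensor of this shape has more elements than INT_MAX."""
    return math.prod(shape) > INT_MAX


# Output shape reported in the traceback above.
output_shape = [1, 256, 16, 720, 1280]

numel = math.prod(output_shape)
print(numel)                               # 3774873600
print(needs_64bit_indexing(output_shape))  # True
```

The output tensor has roughly 3.77 billion elements, about 1.76 times `INT_MAX`, so any kernel that indexes it with a 32-bit integer cannot reach every element.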

@@ -9961,7 +9961,8 @@ def test_upsamplingTrilinear3d(self, device, align_corners, memory_format):
gradgradcheck(lambda x: F.interpolate(x, out_size, **kwargs), [input])

@onlyCUDA
@dtypes(torch.half)
@skipCUDAIfRocm(msg="launch bounds error out on ROCM")
Collaborator

@eqy Curious: do we know whether this was a regression on ROCm caused by this PR, or new failures from the dtypes change? cc: @jeffdaily

pytorchmergebot pushed a commit that referenced this pull request Jan 29, 2025
Fixes #144855

Follows approach in #141923 to use int64 types to increase INT_MAX limits
Pull Request resolved: #144865
Approved by: https://github.com/eqy
jataylo added a commit to jataylo/pytorch that referenced this pull request Jan 31, 2025
Fixes pytorch#144855

Follows approach in pytorch#141923 to use int64 types to increase INT_MAX limits
Pull Request resolved: pytorch#144865
Approved by: https://github.com/eqy

(cherry picked from commit 082fab0)
pruthvistony pushed a commit to ROCm/pytorch that referenced this pull request Jan 31, 2025
…orch#144865) (#1869)

Fixes pytorch#144855

Follows approach in pytorch#141923 to use int64 types to increase INT_MAX limits
Pull Request resolved: pytorch#144865
Approved by: https://github.com/eqy

(cherry picked from commit 082fab0)
jithunnair-amd pushed a commit to ROCm/pytorch that referenced this pull request Feb 18, 2025
…orch#144865) (#1869)

Fixes pytorch#144855

Follows approach in pytorch#141923 to use int64 types to increase INT_MAX limits
Pull Request resolved: pytorch#144865
Approved by: https://github.com/eqy

(cherry picked from commit 082fab0)
(cherry picked from commit 5d01868)
jithunnair-amd pushed a commit to ROCm/pytorch that referenced this pull request Feb 19, 2025
jithunnair-amd pushed a commit to ROCm/pytorch that referenced this pull request Feb 20, 2025
Labels
ciflow/trunk, Merged, module: cuda, open source, topic: bug fixes, topic: not user facing, triaged
6 participants