10000 Workaround for gather_out in MPS backend by jhavukainen · Pull Request #135543 · pytorch/pytorch · GitHub
[go: up one dir, main page]

Skip to content

Workaround for gather_out in MPS backend #135543

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

jhavukainen
Copy link
Collaborator
@jhavukainen jhavukainen commented Sep 9, 2024

Avoids an underlying issue in reshape op in MPS that gets triggered when the input has multiple dimensions but the shape can be squeezed into 1D. The underlying issue is going to get fixed eventually.

Fixes #135240

…st are ones. Avoids an underlying issue in reshape op in MPS.
Copy link
pytorch-bot bot commented Sep 9, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/135543

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0e65478 with merge base 4717cd1 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/mps Run MPS tests (subset of trunk) release notes: mps Release notes category labels Sep 9, 2024
@cpuhrsch cpuhrsch added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Sep 10, 2024
Copy link
Contributor
github-actions bot commented Nov 9, 2024

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Nov 9, 2024
@github-actions github-actions bot closed this Dec 9, 2024
@jhavukainen jhavukainen reopened this Dec 18, 2024
@jhavukainen
Copy link
Collaborator Author
jhavukainen commented Dec 18, 2024

@pytorchbot rebase

// If this op is failing and isMacos15_3 = 1, the OS level fix has not
// landed and we'll need to increment the version check until it does.
bool workaroundSingleDim = (self_arg.squeeze().sizes().size() == 1 && self_arg.sizes().size() > 1);
bool isMacos15_3 = is_macos_13_or_newer(MacOSVersion::MACOS_VER_15_3_PLUS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels wrong to me. We don't know what will or will not be fixed in macOS-15.3, so let's not condition this workaround. When OS is released we can add the check and run tests

@malfet
Copy link
Contributor
malfet commented Dec 18, 2024

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/135543/head returned non-zero exit code 1

Rebasing (1/12)
Auto-merging aten/src/ATen/mps/MPSDevice.h
CONFLICT (content): Merge conflict in aten/src/ATen/mps/MPSDevice.h
Auto-merging aten/src/ATen/mps/MPSDevice.mm
CONFLICT (content): Merge conflict in aten/src/ATen/mps/MPSDevice.mm
Auto-merging aten/src/ATen/native/mps/operations/ScatterGather.mm
Auto-merging test/test_mps.py
error: could not apply 84aaa77a594... Workaround for gather_out when there is only 1 dim of interest and rest are ones. Avoids an underlying issue in reshape op in MPS.
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Could not apply 84aaa77a594... Workaround for gather_out when there is only 1 dim of interest and rest are ones. Avoids an underlying issue in reshape op in MPS.

Raised by https://github.com/pytorch/pytorch/actions/runs/12402330979

Copy link
Contributor
@malfet malfet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I just run this test on 2.6.0 release candidate and it seems to be working just fine, so may be test needs to be expanded

@malfet
Copy link
Contributor
malfet commented Dec 18, 2024

@jhavukainen do you know if it got fixed on OS end by any chance?

@malfet malfet self-requested a review December 19, 2024 01:00
@jhavukainen
Copy link
Collaborator Author

@malfet I confirmed that the change landed in MacOS 15.2. I'll add the now accurate OS version check back

@malfet malfet added the topic: bug fixes topic category label Dec 19, 2024
@malfet
Copy link
Contributor
malfet commented Dec 19, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 19, 2024
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

@malfet
Copy link
Contributor
malfet commented Dec 19, 2024

@pytorchbot merge -f "Lint and MPS are green"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@github-actions github-actions bot deleted the dev/joona/take_along_dim branch January 19, 2025 02:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/mps Run MPS tests (subset of trunk) ciflow/trunk Trigger trunk jobs on your pull request Merged open source release notes: mps Release notes category Stale topic: bug fixes topic category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: MPS backend take_along_dim crash/assertion fail
5 participants
0