10BC0 [MPSHooks] Release pending command encoder by malfet · Pull Request #164093 · pytorch/pytorch · GitHub
[go: up one dir, main page]

Skip to content

Conversation

@malfet
8000 Copy link
Contributor
@malfet malfet commented Sep 29, 2025

Stack from ghstack (oldest at bottom):

Before returning a comand buffer, as subsequent calle are very likely to allocate their own encoder, which results in the following runtime error

 tryCoalescingPreviousComputeCommandEncoderWithConfig:nextEncoderClass:]:1090: failed assertion `A command encoder is already encoding to this command buffer'

Added regression test to test_mps_extension

Please note, that torch::mps::get_command_buffer() should be called with dispatch_queue held, both before and after this change, but many implementations skip that

Fixes #163721

Before returning a comand buffer, as subsequent calle are very likely to
allocate their own encoder, which results in the following runtime error
```
 tryCoalescingPreviousComputeCommandEncoderWithConfig:nextEncoderClass:]:1090: failed assertion `A command encoder is already encoding to this command buffer'
```

Fixes #163721

[ghstack-poisoned]
@malfet malfet requested a review from kulinseth as a code owner September 29, 2025 02:00
@pytorch-bot
Copy link
pytorch-bot bot commented Sep 29, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164093

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4331f79 with merge base 6ba83e0 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/mps Run MPS tests (subset of trunk) release notes: mps Release notes category labels Sep 29, 2025
malfet added a commit that referenced this pull request Sep 29, 2025
Before returning a comand buffer, as subsequent calle are very likely to
allocate their own encoder, which results in the following runtime error
```
 tryCoalescingPreviousComputeCommandEncoderWithConfig:nextEncoderClass:]:1090: failed assertion `A command encoder is already encoding to this command buffer'
```

Fixes #163721

ghstack-source-id: b214852
Pull Request resolved: #164093
@malfet malfet requested review from Skylion007 and dcci September 29, 2025 05:25
@malfet malfet added the topic: bug fixes topic category label Sep 29, 2025
malfet added a commit that referenced this pull request Sep 29, 2025
To avoid regression on MacOS 26

Fixes #164093

[ghstack-poisoned]
malfet added a commit that referenced this pull request Sep 29, 2025
To avoid regression on MacOS 26

Fixes #164093

ghstack-source-id: 65e18a6
Pull Request resolved: #164108

void mps_add_one_new_encoder(const at::Tensor& input) {
using namespace at::native::mps;
TORCH_CHECK(input.is_mps());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use TORCH_CHECK_VALUE for these two checks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not matter, as this is an internal test, I'm unsure why they contain those checks to begin with, as they are tests no examples

@malfet
Copy link
Contributor Author
malfet commented Sep 29, 2025
8000

@pytorchbot merge -f "Lint + MPS are green"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@Camyll
Copy link
Contributor
Camyll commented Oct 1, 2025

@pytorchbot cherry-pick --onto release/2.9 --c critical

pytorchbot pushed a commit that referenced this pull request Oct 1, 2025
Before returning a comand buffer, as subsequent calle are very likely to allocate their own encoder, which results in the following runtime error
```
 tryCoalescingPreviousComputeCommandEncoderWithConfig:nextEncoderClass:]:1090: failed assertion `A command encoder is already encoding to this command buffer'
```

Added regression test to `test_mps_extension`

Please note, that `torch::mps::get_command_buffer()` should be called with dispatch_queue held, both before and after this change, but many implementations skip that

Fixes #163721
Pull Request resolved: #164093
Approved by: https://github.com/atalman, https://github.com/Skylion007

(cherry picked from commit 8f32adc)
@pytorchbot
Copy link
Collaborator

Cherry picking #164093

The cherry pick PR is at #164365 and it is recommended to link a critical cherry pick PR with an issue. The following tracker issues are updated:

Details for Dev Infra team Raised by workflow job

Camyll pushed a commit that referenced this pull request Oct 1, 2025
[MPSHooks] Release pending command encoder (#164093)

Before returning a comand buffer, as subsequent calle are very likely to allocate their own encoder, which results in the following runtime error
```
 tryCoalescingPreviousComputeCommandEncoderWithConfig:nextEncoderClass:]:1090: failed assertion `A command encoder is already encoding to this command buffer'
```

Added regression test to `test_mps_extension`

Please note, that `torch::mps::get_command_buffer()` should be called with dispatch_queue held, both before and after this change, but many implementations skip that

Fixes #163721
Pull Request resolved: #164093
Approved by: https://github.com/atalman, https://github.com/Skylion007

(cherry picked from commit 8f32adc)

Co-authored-by: Nikita Shulga <nikita.shulga@gmail.com>
@github-actions github-actions bot deleted the gh/malfet/538/head branch November 1, 2025 02:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/mps Run MPS tests (subset of trunk) Merged release notes: mps Release notes category topic: bug fixes topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants

0