8000 [ROCm] check stream graph capture status in memcpy_and_sync inline function by naromero77amd · Pull Request #158165 · pytorch/pytorch · GitHub
[go: up one dir, main page]

Skip to content

Conversation

@naromero77amd
Copy link
Collaborator
@naromero77amd naromero77amd commented Jul 11, 2025

Check for stream graph capture when using hipMemcpyWithStream.

Fixes #155684, #155231

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang

@pytorch-bot
Copy link
pytorch-bot bot commented Jul 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158165

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 5eccaf2 with merge base 26807dc (image):

NEW FAILURE - The following job has failed:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the module: rocm AMD GPU support for Pytorch label Jul 11, 2025
@naromero77amd naromero77amd marked this pull request as draft July 11, 2025 23:27
@naromero77amd naromero77amd added the ciflow/rocm Trigger "default" config CI on ROCm label Jul 11, 2025
@pytorch-bot
Copy link
pytorch-bot bot commented Jul 11, 2025

To add the ciflow label ciflow/rocm please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/rocm Trigger "default" config CI on ROCm label Jul 11, 2025
@naromero77amd naromero77amd added topic: not user facing topic category ciflow/rocm Trigger "default" config CI on ROCm labels Jul 11, 2025
@pytorch-bot
Copy link
pytorch-bot bot commented Jul 11, 2025

To add the ciflow label ciflow/rocm please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/rocm Trigger "default" config CI on ROCm label Jul 11, 2025
@naromero77amd naromero77amd added the ciflow/rocm Trigger "default" config CI on ROCm label Jul 11, 2025
@pytorch-bot
Copy link
pytorch-bot bot commented Jul 11, 2025

To add the ciflow label ciflow/rocm please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/rocm Trigger "default" config CI on ROCm label Jul 11, 2025
@naromero77amd naromero77amd requested a review from jeffdaily July 12, 2025 00:07
@ngimel
Copy link
Collaborator
ngimel commented Jul 14, 2025

Previously AMD said that hipMemcpyWithStream has a much better perf than cudaStreamSynchronize, but if you think that's no longer important I'm fine with this change

@ngimel
Copy link
Collaborator
ngimel commented Jul 14, 2025

cc @houseroad

@pytorch-bot pytorch-bot bot added the ciflow/rocm Trigger "default" config CI on ROCm label Jul 14, 2025
@jeffdaily jeffdaily requested a review from zoranzhao July 14, 2025 18:49
@pytorch-bot pytorch-bot bot removed the ciflow/rocm Trigger "default" config CI on ROCm label Jul 15, 2025
@naromero77amd naromero77amd changed the title [ROCm] Fix tensor.item() for ROCm [ROCm] check stream graph capture status on memcpy_and_sync inline function Jul 15, 2025
@naromero77amd naromero77amd changed the title [ROCm] check stream graph capture status on memcpy_and_sync inline function [ROCm] check stream graph capture status in memcpy_and_sync inline function Jul 15, 2025
@naromero77amd
Copy link
Collaborator Author

@zoranzhao Can the workaround introduced in this PR be removed?
#133054

@naromero77amd
Copy link
Collaborator Author
naromero77amd commented Jul 15, 2025

Previously AMD said that hipMemcpyWithStream has a much better perf than cudaStreamSynchronize, but if you think that's no longer important I'm fine with this change

@ngimel Confirmed internally that this is still the case: hipMemcpyWithStream is more performant.

@pytorch-bot pytorch-bot bot added the ciflow/rocm Trigger "default" config CI on ROCm label Jul 15, 2025
@pytorch-bot pytorch-bot bot removed the ciflow/rocm Trigger "default" config CI on ROCm label Jul 15, 2025
@naromero77amd naromero77amd marked this pull request as ready for review July 15, 2025 16:45
@naromero77amd
Copy link
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased bug_rocm_item onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout bug_rocm_item && git pull --rebase)

@naromero77amd naromero77amd added the ciflow/rocm Trigger "default" config CI on ROCm label Jul 15, 2025
@jeffdaily
Copy link
Collaborator

@pytorchbot merge -f "rocm-only change inside an #ifdef USE_ROCM, rocm CI is fully passing"

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@naromero77amd naromero77amd deleted the bug_rocm_item branch July 16, 2025 22:29
pytorchmergebot pushed a commit that referenced this pull request Jul 23, 2025
pytorchmergebot pushed a commit that referenced this pull request Jul 23, 2025
…ured in a cudagraph (#158878)

Unit test for this PR: #158165

This unit test verifies that a runtime error is raised when tensor.item() operation is captured in a cudagraph. Equally valid for ROCm and CUDA.

Pull Request resolved: #158878
Approved by: https://github.com/jeffdaily, https://github.com/ngimel
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/rocm Trigger "default" config CI on ROCm Merged module: rocm AMD GPU support for Pytorch open source topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[rocm] HIP Graph (on AMD GPU) capture does not raise operation not permitted for illegal operation whereas CUDA Graph (Nvidia GPU) does

6 participants

0