[ROCm] check stream graph capture status in memcpy_and_sync inline function #158165
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158165
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Unrelated Failure as of commit 5eccaf2 with merge base 26807dc.
NEW FAILURE - The following job has failed:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
To add the ciflow label: this helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.
Previously AMD said that hipMemcpyWithStream has much better performance than cudaStreamSynchronize, but if you think that's no longer important, I'm fine with this change.
cc @houseroad |
@zoranzhao Can the workaround introduced in this PR be removed? |
@ngimel Confirmed internally that this is still the case: hipMemcpyWithStream is more performant. |
@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
This reverts commit 4bcc1ec1be177cbf100bda954bbd5f2ce54fa93f.
… are in a stream graph capture.
This reverts commit 16283094a1791ab8a3c47b72b74d60144608fa43.
Successfully rebased: 93c5481 to 5eccaf2 (Compare).
@pytorchbot merge -f "rocm-only change inside an #ifdef USE_ROCM, rocm CI is fully passing"

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Deleting unused workaround per discussion here: #158165 (comment). Pull Request resolved: #158486. Approved by: https://github.com/jeffdaily, https://github.com/houseroad
…ured in a cudagraph (#158878). Unit test for this PR: #158165. This unit test verifies that a runtime error is raised when a tensor.item() operation is captured in a cudagraph. Equally valid for ROCm and CUDA. Pull Request resolved: #158878. Approved by: https://github.com/jeffdaily, https://github.com/ngimel
Check for stream graph capture when using hipMemcpyWithStream.
Fixes #155684, #155231
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang