[ROCm] check stream graph capture status in memcpy_and_sync inline function #158165
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158165
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Unrelated Failure as of commit 5eccaf2 with merge base 26807dc.
NEW FAILURE - The following job has failed:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
To add the ciflow label: this helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.
Previously AMD said that hipMemcpyWithStream has much better performance than cudaStreamSynchronize, but if you think that's no longer important, I'm fine with this change.
cc @houseroad |
@zoranzhao Can the workaround introduced in this PR be removed? |
@ngimel Confirmed internally that this is still the case: hipMemcpyWithStream is more performant. |
@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
This reverts commit 4bcc1ec1be177cbf100bda954bbd5f2ce54fa93f.
… are in a stream graph capture.
This reverts commit 16283094a1791ab8a3c47b72b74d60144608fa43.
Successfully rebased: 93c5481 to 5eccaf2 (Compare).
@pytorchbot merge -f "rocm-only change inside an #ifdef USE_ROCM, rocm CI is fully passing"

Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Deleting unused workaround per discussion here: #158165 (comment). Pull Request resolved: #158486. Approved by: https://github.com/jeffdaily, https://github.com/houseroad
…ured in a cudagraph (#158878). Unit test for this PR: #158165. This unit test verifies that a runtime error is raised when a tensor.item() operation is captured in a cudagraph. Equally valid for ROCm and CUDA. Pull Request resolved: #158878. Approved by: https://github.com/jeffdaily, https://github.com/ngimel
Check for stream graph capture when using hipMemcpyWithStream.
Fixes #155684, #155231
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang