Fix DLPack stream logic. by ysiraichi · Pull Request #150217 · pytorch/pytorch · GitHub

Fix DLPack stream logic. #150217


Open · wants to merge 9 commits into base branch gh/ysiraichi/85/base

Conversation

@ysiraichi (Collaborator) commented Mar 28, 2025

Stack from ghstack (oldest at bottom):

This PR fixes the logic for dealing with CUDA and ROCm streams whenever
we are trying to create a DLPack capsule from a tensor.

In summary, this PR:

  • Uses the legacy default stream if tensor.__dlpack__(stream=None) is
    called for a CUDA tensor.
  • Errors if tensor.__dlpack__(stream=2) is called for a CUDA tensor:
    PyTorch doesn't support the per-thread default stream.
  • Errors if tensor.__dlpack__(stream=stream), where stream is 1 or
    2, is called for a CUDA tensor on a ROCm build.

For more details, see the documentation: https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack__.html
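A minimal usage sketch of the behavior described above, assuming a CUDA build of PyTorch (illustrative only, not code from this PR or its tests):

```python
import torch

x = torch.randn(4, device="cuda")

# stream=None: the producer now assumes the legacy default stream.
capsule = x.__dlpack__(stream=None)

# stream=1 is CUDA's legacy default stream sentinel and is still accepted.
capsule = x.__dlpack__(stream=1)

# stream=2 (the per-thread default stream) is rejected, since PyTorch
# doesn't support it.
try:
    x.__dlpack__(stream=2)
except BufferError as exc:
    print("rejected:", exc)
```

On a ROCm build, passing stream=1 or stream=2 is expected to error as well, since those sentinel values are CUDA-specific.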

pytorch-bot (bot) commented Mar 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150217

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 476b1ea with merge base 85bfaf8:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Divigroup-RAP pushed a commit to Divigroup-RAP/PYTORCH that referenced this pull request Apr 22, 2025
ghstack-source-id: cc0e31c
Pull Request resolved: pytorch/pytorch#150217
@albanD (Collaborator) left a comment:
Sounds ok even though this doesn't fix the multi-device case.

@@ -1717,23 +1717,37 @@ def __dlpack__(self, stream=None, max_version=None):
    # Stream pointers in CUDA/ROCm are uniquely numbered and can
    # be retrieved from their integer value.
    raise TypeError("stream must be ``int`` or ``none``")
-elif stream is not None and stream != -1:
+elif stream != -1:
    if self.device.type == "cuda":
        # NB: This logic handles the special case values for default
Collaborator: No update to dlpack.py ? :D

if is_cuda and stream == 2:
    raise BufferError("per-thread default stream is not supported.")

assert is_cuda or (is_rocm and stream not in (1, 2)), (
Collaborator: Shouldn't this be a BufferError like above instead of AssertionError?
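For illustration, a sketch of how both checks could raise BufferError, as suggested above; the helper name _check_stream_value and the is_cuda/is_rocm flags are hypothetical stand-ins for the flags used in the diff:

```python
def _check_stream_value(stream: int, is_cuda: bool, is_rocm: bool) -> None:
    # CUDA: value 2 denotes the per-thread default stream, which PyTorch
    # does not support.
    if is_cuda and stream == 2:
        raise BufferError("per-thread default stream is not supported.")
    # ROCm: the sentinel values 1 and 2 are CUDA-specific, so reject them
    # here as well (the PR itself uses an assert for this case).
    if is_rocm and stream in (1, 2):
        raise BufferError(
            "stream values 1 and 2 are CUDA-specific and not supported on ROCm."
        )
```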

    # Only synchronize on different streams
-   sync_stream = torch.cuda.current_stream()
-   if stream != sync_stream:
+   current_stream = torch.cuda.current_stream()
Collaborator: Do we care if self.device.index != torch.cuda.current_device() ?
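For reference, a minimal sketch of the event-based handoff this hunk performs between the producer's current stream and the consumer's stream; the function sync_for_dlpack and the consumer_stream parameter are illustrative, and querying the current stream on tensor.device (rather than the globally current device) is one way to address the device-index question above:

```python
import torch

def sync_for_dlpack(tensor: torch.Tensor, consumer_stream: torch.cuda.Stream) -> None:
    # Ask for the current stream on the tensor's own device, not whatever
    # torch.cuda.current_device() happens to be.
    current_stream = torch.cuda.current_stream(tensor.device)
    # Only synchronize when the two streams actually differ.
    if consumer_stream != current_stream:
        event = torch.cuda.Event()
        event.record(current_stream)       # mark the producer's pending work
        consumer_stream.wait_event(event)  # consumer waits before reading
```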
