
initialize device when pinning memory on this device, short circuit is_pinned if device is not initialized #145752


Closed
ngimel wants to merge 4 commits

Conversation

@ngimel (Collaborator) commented Jan 27, 2025

Short-circuit is_pinned if the device is not initialized.
Do not land
RFC
Potential fix for #144687

Now `.is_pinned(device="cuda")` does not initialize the device and thus doesn't poison the fork (but it warns about the `device` arg being deprecated). To avoid needing the `device=` arg, we'd need to fix `get_accelerator` to not initialize the device.
cc @malfet, @albanD
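
For context, a minimal sketch of the fork-poisoning scenario this is trying to avoid (not part of the PR itself; it assumes a Linux host with a CUDA build and uses the deprecated device= form mentioned above):

import torch
import torch.multiprocessing as mp

def child_uses_cuda():
    # raises "Cannot re-initialize CUDA in forked subprocess" if the parent
    # process already created a CUDA context before forking
    return torch.ones(1, device="cuda").item()

if __name__ == "__main__":
    x = torch.empty(16)                      # plain CPU tensor
    x.is_pinned(device="cuda")               # with this PR: returns False without touching CUDA
    assert not torch.cuda.is_initialized()   # the parent's context is still uninitialized

    p = mp.get_context("fork").Process(target=child_uses_cuda)
    p.start()
    p.join()                                 # the child is free to initialize CUDA itself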

@pytorch-bot (bot) commented Jan 27, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/145752

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit ad816b7 with merge base 30dea84:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@albanD (Collaborator) left a comment

SGTM !

Co-authored-by: albanD <albandes@fb.com>
@ngimel (Collaborator, Author) commented Jan 27, 2025

Hm, no, this actually initializes context on the 0th device (I think it shouldn't have, but I'll have to check):

In [1]: import torch

In [2]: torch.cuda.list_gpu_processes()
Out[2]: 'GPU:0\nno processes are running'

In [3]: x=torch.empty(1, pin_memory=True)

In [4]: torch.cuda.list_gpu_processes()
Out[4]: 'GPU:0\nprocess    1105514 uses     1188.000 MB GPU memory'

so won't work as is

@ngimel added the release notes: cuda label Jan 27, 2025
@ngimel (Collaborator, Author) commented Jan 28, 2025

The behavior above is caused by the pinning itself, not by initializing the context, so this diff is a strict improvement over the existing state.
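
A small sketch of that distinction (again not from the thread; it assumes a CUDA build, and that pinning goes through cudaHostAlloc, which needs a CUDA context):

import torch

assert not torch.cuda.is_initialized()    # no CUDA context in this process yet

x = torch.empty(1)
x.is_pinned(device="cuda")                # short-circuits while the device is uninitialized
print(torch.cuda.is_initialized())        # expected: False

y = torch.empty(1, pin_memory=True)       # pinning allocates page-locked host memory,
print(torch.cuda.is_initialized())        # which creates the context: expected True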

@albanD (Collaborator) left a comment

Sounds good!

@ngimel (Collaborator, Author) commented Jan 29, 2025

@pytorchbot merge

@pytorch-bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Jan 29, 2025
@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@ngimel (Collaborator, Author) commented Jan 30, 2025

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

mori360 pushed a commit to mori360/pytorch that referenced this pull request Feb 6, 2025
initialize device when pinning memory on this device, short circuit is_pinned if device is not initialized (pytorch#145752)

Pull Request resolved: pytorch#145752
Approved by: https://github.com/albanD
Co-authored-by: albanD <albandes@fb.com>
@github-actions bot deleted the ngimel/pinned_init branch March 2, 2025 02:10
BartlomiejStemborowski added a commit to BartlomiejStemborowski/pytorch that referenced this pull request Mar 12, 2025
PR pytorch#145752 added a check in isPinnedPtr to verify that the device is initialized before checking whether the tensor is pinned. That PR also added a lazy-initialization trigger when at::empty is called with the pinned param set to true. However, when a tensor is created first and then pinned in a separate pin_memory() call, lazy device init is not triggered, so is_pinned always returns false.

With this PR, the lazy initialization is moved to the getPinnedMemoryAllocator function, which ensures the device is initialized before we pin a tensor.
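
Roughly the repro for that regression, as I read the description above (a sketch, assuming a CUDA build):

import torch

x = torch.empty(4)      # CPU tensor; the CUDA device has not been initialized yet
y = x.pin_memory()      # pinning in a separate call, not torch.empty(..., pin_memory=True)
print(y.is_pinned())    # the regression made this return False; with this fix it is True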
pytorchmergebot pushed a commit to BartlomiejStemborowski/pytorch that referenced this pull request Mar 12, 2025
pytorchmergebot pushed a commit that referenced this pull request Mar 13, 2025
[regression] Fix pin_memory() when it is called before device lazy initialization. (#149033)

Fixes #149032

Pull Request resolved: #149033
Approved by: https://github.com/ngimel, https://github.com/albanD
pytorchbot pushed a commit that referenced this pull request Mar 14, 2025
[regression] Fix pin_memory() when it is called before device lazy initialization. (#149033)

(cherry picked from commit 420a9be)
atalman pushed a commit that referenced this pull request Mar 17, 2025
[regression] Fix pin_memory() when it is called before device lazy initialization. (#149183)

(cherry picked from commit 420a9be)

Co-authored-by: Bartlomiej Stemborowski <bstemborowskix@habana.ai>
Labels: ciflow/trunk (Trigger trunk jobs on your pull request), Merged, release notes: cuda (release notes category)
Projects: None yet
3 participants