Enable _lazy_clone between CPU and MPS #148408
base: gh/kurtamohler/32/base
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/148408
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit 3b1a2c7 with merge base 56e1c23: one job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Attention! native_functions.yaml was changed.
If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs: one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info.
virtual const void* get_cpu_ptr_from_device_ptr(const void* device_ptr) const;
virtual void* get_device_ptr_from_cpu_ptr(void* cpu_ptr) const;
virtual const void* get_device_ptr_from_cpu_ptr(const void* cpu_ptr) const;
virtual bool has_unified_memory() const;
Why do you need to add this new concept here?
I would expect that, in the context of MPS, a CPU tensor is pure CPU, while pinned-CPU and MPS tensors are unified.
When CPU and Metal operators access the same location in the shared memory space, they use different addresses. (I wonder if there's a way to make them use the same address space?) So when we lazy clone an MPS tensor to pinned CPU, we need a way to translate the MPS address into the CPU address space and set the output DataPtr to that. Likewise, we need to translate in the other direction for a pinned-CPU to MPS lazy clone. These functions provide an API for that.
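To make that concrete, here is a minimal, self-contained sketch of the translation idea. The virtual method names mirror the ones added in this diff, but the UnifiedMemoryHooks class, its offset-based implementation, and the surrounding code are hypothetical stand-ins for illustration, not the PR's actual implementation:

```cpp
// Hypothetical model of the address-translation hooks discussed above.
// In the PR these are virtual methods that a backend (e.g. MPS) would override;
// here the "device" address space is simulated as a fixed offset into the
// same buffer, purely to show how the two translations invert each other.
#include <cassert>
#include <cstddef>

struct UnifiedMemoryHooks {
  static constexpr std::ptrdiff_t kOffset = 16;  // stand-in for the real CPU<->device mapping

  virtual bool has_unified_memory() const { return true; }
  virtual const void* get_cpu_ptr_from_device_ptr(const void* device_ptr) const {
    return static_cast<const char*>(device_ptr) - kOffset;
  }
  virtual void* get_device_ptr_from_cpu_ptr(void* cpu_ptr) const {
    return static_cast<char*>(cpu_ptr) + kOffset;
  }
  virtual ~UnifiedMemoryHooks() = default;
};

int main() {
  UnifiedMemoryHooks hooks;
  char shared[64] = {};    // pretend this is the unified (shared) allocation
  void* cpu_ptr = shared;  // address the CPU-side DataPtr would hold

  if (hooks.has_unified_memory()) {
    // Pinned-CPU -> MPS lazy clone: the output DataPtr needs the device-visible alias.
    void* device_ptr = hooks.get_device_ptr_from_cpu_ptr(cpu_ptr);

    // MPS -> pinned-CPU lazy clone: translate back into the CPU address space.
    assert(hooks.get_cpu_ptr_from_device_ptr(device_ptr) == cpu_ptr);
  }
  return 0;
}
```

In the real code path, the translated pointer is what gets wrapped into the output tensor's DataPtr, which is why both directions of translation are needed.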
Adds a device arg to _lazy_clone to enable lazy cloning data from one device to another. At the moment, only certain cases between CPU and MPS are supported. This PR also adds support for pinned CPU tensors on MPS builds, which was not working properly before.
Stack from ghstack (oldest at bottom):
- Tensor.to between CPU and MPS #150569
- _lazy_clone between CPU and MPS #148408

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov