Allow `__array__` to automatically detach and move to CPU · Issue #36560 · pytorch/pytorch · GitHub

Allow __array__ to automatically detach and move to CPU #36560

Open

jni opened this issue Apr 14, 2020 · 15 comments
Labels
feature A request for a proper, new feature. module: cuda Related to torch.cuda, and CUDA support in general module: numpy Related to numpy support, and also numpy compatibility of our operators triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@jni
jni commented Apr 14, 2020

🚀 Feature

I would like __array__ to always implicitly detach and transfer to CPU before returning a numpy array, so that np.asarray(mytensor) is guaranteed to work.

Motivation

For good reasons detailed in this Discourse thread, a torch.Tensor with gradients needs to be .detach()ed before it is converted to NumPy, and further, if the Tensor is on the GPU it needs to be explicitly transferred back to the CPU. Specifically:

People not very familiar with requires_grad and cpu/gpu Tensors might go back and forth with numpy. For example doing pytorch -> numpy -> pytorch and backward on the last Tensor. This will backward without issue but not all the way to the first part of the code and won’t raise any error.

As someone not very familiar with requires_grad, I am fully on board with this. 😅

However, the purpose of __array__ is to allow functions written against a single API (NumPy) to work on arrays of other types without having to know anything about those arrays. Having to go through .detach()[.cpu()] breaks this assumption.
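
For concreteness, a minimal sketch of the status quo (error message paraphrased; exact wording varies between versions):

import numpy as np
import torch

t = torch.zeros(3, requires_grad=True)

# np.asarray(t) fails today, because __array__ refuses to detach implicitly
# (roughly: "Can't call numpy() on Tensor that requires grad...").
try:
    np.asarray(t)
except RuntimeError as err:
    print(err)

# The explicit dance that works today.
arr = t.detach().cpu().numpy()  # .cpu() is a no-op here, but needed for CUDA tensors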

Pitch

The specific use case I have in mind is viewing tensors. In napari, we pass input arrays through __getitem__ and then through np.asarray, which lets us lazily view n-dimensional arrays as long as they satisfy the __array__ API. This works for NumPy, dask, zarr, and Xarray, but I stumbled when I was trying to demo it with torch Tensors. You can see a brief demo in this noise2self notebook:

https://github.com/jni/noise2self/blob/napari/notebooks/Intro%20to%20Neural%20Nets.ipynb

(You need the usual suspects, plus pip install napari or conda install -c conda-forge napari for the viewer.)
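
For context, the lazy-viewing pattern boils down to something like the following hypothetical helper (not napari's actual internals, just the shape of the idea):

import numpy as np

def fetch_displayed_chunk(data, slices):
    # Slice lazily, materialize only the part being displayed.
    return np.asarray(data[slices])  # works for numpy, dask, zarr, xarray, ...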


Alternatives

Things could remain as is, and there are advantages (articulated in the Discourse post). However, this means that anyone else in the NumPy ecosystem that wants to play well with torch.Tensors then needs to include PyTorch-specific code in their code base, which is not super nice. (Or monkey-patch it, which is also not super nice!)

Additional context

original __array__ proposal: #2935
original __array__ implementation: #2945
possibly related issue: #9648

I couldn't find too much discussion on the issues or PRs about the decision not to .detach() automatically in __array__.

cc @ngimel

@jni
Author
jni commented Apr 14, 2020

If people are ok with this I'd love to submit a PR. The only question I have is whether we should warn when detaching/CPUing — I would favour yes.

@emcastillo
Collaborator

I agree that a warning should be explicitly shown in such a case.
Users might be breaking the computation graph without realizing.
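
To illustrate, a hypothetical sketch of what a warning-emitting __array__ could look like (this is not the real torch.Tensor.__array__, just the shape of the proposed behaviour):

import warnings

def __array__(self, dtype=None):
    t = self
    if t.requires_grad:
        warnings.warn("implicitly detaching tensor from the autograd graph")
        t = t.detach()
    if t.is_cuda:
        warnings.warn("implicitly copying tensor from GPU to CPU")
        t = t.cpu()
    arr = t.numpy()
    return arr if dtype is None else arr.astype(dtype, copy=False)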

@mrshenli mrshenli added feature A request for a proper, new feature. module: cuda Related to torch.cuda, and CUDA support in general module: numpy Related to numpy support, and also numpy compatibility of our operators triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Apr 14, 2020
@mrshenli
Contributor

cc @gchanan @ezyang

@ezyang
Contributor
ezyang commented Apr 15, 2020

Related #13568, #1100

Our previous stance has been that we will NOT silently convert tensors from cuda to cpu under any circumstances, because this can lead to silent and hard to debug performance problems. That being said, I am sympathetic to your desire to not have PyTorch specific code in your codebase.

cc @mruberry

@mruberry
Collaborator

I share @ezyang's thinking.

This sounds like a significant and scary change for PyTorch to make. While I appreciate the desire to keep your code base as elegant as possible, would it really be so bad to tell users to transfer their tensors to CPU or NumPy first? That way the device synchronization would be explicit.

@jni
Author
jni commented Apr 17, 2020

Thanks all for your fast responses!

If there were only PyTorch arrays and NumPy arrays, it would not be a big deal to add code or a note in the docs about how to deal with them. But there are also all the arrays mentioned above, plus Jax, CuPy, tensorflow, chainer, mxnet, pint Quantities, PyOpenCL, unyt, ... And more to come, I'm sure! In all these cases, converting to NumPy results in information loss, but it would be a nightmare to have to support all of the individual conversions. So the issue for me isn't really "how hard is it really to deal with PyTorch properly", but "how can we have some kind of conversion standards that all array libraries can agree to implement". For example, would adding a force=False flag to __array__, so that users would specifically have to type force=True, help you agree to the standard?
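
To sketch the idea (hypothetical signature; neither NumPy nor PyTorch accepts a force argument in __array__ today), the opt-in conversion could look roughly like this, shown on a wrapper rather than on torch.Tensor itself:

import numpy as np
import torch

class ForceConvertible:
    def __init__(self, tensor):
        self.tensor = tensor

    def __array__(self, dtype=None, force=False):
        t = self.tensor
        if (t.requires_grad or t.is_cuda) and not force:
            raise TypeError("lossy/expensive conversion; pass force=True")
        arr = t.detach().cpu().numpy()
        return arr if dtype is None else arr.astype(dtype, copy=False)

# A consumer that opts in would call obj.__array__(force=True) explicitly;
# plain np.asarray(obj) would keep today's strict behaviour.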

I'm curious whether @rgommers, @shoyer or @seberg would have comments here about the best way forward for the community as a whole.

One proposal that could alleviate concerns is to simultaneously add support for __array_function__ (NEP-18). This would ensure that calling np.min and other numpy functions on the Tensor would always call the pytorch version of the function. (Which is good because right now np.min() on a Tensor results in a super-obscure error message that I can't decipher (see below).) This way, the only way to get a NumPy array out would be to explicitly ask for one with np.array or np.asarray, and never through implicit conversion from a different NumPy function.
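
For reference, a minimal hypothetical sketch of NEP-18 dispatch (torch.Tensor itself does not implement __array_function__), which is what would let np.min route to torch.min instead of coercing:

import numpy as np
import torch

class DispatchingTensor:
    def __init__(self, tensor):
        self.tensor = tensor

    def __array_function__(self, func, types, args, kwargs):
        # Route np.min / np.amin to torch.min; anything unhandled is rejected
        # (NumPy then raises TypeError) rather than silently coerced.
        if func in (np.min, np.amin):
            return torch.min(self.tensor)
        return NotImplemented

np.min(DispatchingTensor(torch.zeros(2, 4)))  # calls torch.min under the hood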

I'd be happy to help (as much as I can, as a consumer of these libraries rather than a developer) push a solution forward that satisfies the PyTorch devs, but it should be a solution that is not PyTorch-specific.

My view is that implicit conversion with loud warnings would be fine, but I sense that that would be an uphill battle up a very steep hill. 😂

One last option that might be a reasonable intermediate alternative: a with torch.allow_implicit_detach() global switch and context manager that allows detaching with np.array within that specific context. (Note: I don't love this, just thinking out loud.)
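
A purely hypothetical sketch of that switch, for illustration only (no such API exists in PyTorch):

import contextlib
import threading

_state = threading.local()

def implicit_detach_allowed():
    return getattr(_state, "allowed", False)

@contextlib.contextmanager
def allow_implicit_detach():
    previous = implicit_detach_allowed()
    _state.allowed = True
    try:
        yield  # inside this block, __array__ could detach/copy silently
    finally:
        _state.allowed = previous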

Thanks again!

np.min on torch.Tensor error
>>> t = torch.zeros([2, 4])

>>> np.min(t)

TypeError                                 Traceback (most recent call last)
<ipython-input-13-b148c1b25541> in <module>
----> 1 np.min(t)

<__array_function__ internals> in amin(*args, **kwargs)

~/conda/envs/pip7/lib/python3.7/site-packages/numpy/core/fromnumeric.py in amin(a, axis, out, keepdims, initial, where)
   2791     """
   2792     return _wrapreduction(a, np.minimum, 'min', axis, None, out,
-> 2793                           keepdims=keepdims, initial=initial, where=where)
   2794
   2795

~/conda/envs/pip7/lib/python3.7/site-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     86                 return reduction(axis=axis, dtype=dtype, out=out, **passkwargs)
     87             else:
---> 88                 return reduction(axis=axis, out=out, **passkwargs)
     89
     90     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)

TypeError: min() received an invalid combination of arguments - got (out=NoneType, axis=NoneType, ), but expected one of:
 * ()
 * (name dim, bool keepdim)
      didn't match because some of the keywords were incorrect: out, axis
 * (Tensor other)
 * (int dim, bool keepdim)
      didn't match because some of the keywords were incorrect: out, axis

@shoyer
shoyer commented Apr 17, 2020

The problem is that there's way too much code out in the wild (e.g., in SciPy or Scikit-Learn) that starts by casting inputs with np.asarray(). This has become the standard way to coerce generic inputs like Python lists or memoryview objects into NumPy arrays, but I agree that it may not be appropriate when such conversions are expensive.

It would indeed be nice to have a standard way to "hard convert" GPU/sparse/distributed arrays into NumPy arrays, it just can't be the default behavior of __array__/asarray. Adding a force flag to this protocol sounds like a totally reasonable solution to me.

This would need to be blessed by discussion on the NumPy mailing list. Even better would be to write up a short NumPy Enhancement Proposal, with a discussion of the motivation and usage guidelines about when force=True is appropriate to use in a library (e.g., it's much safer for a visualization tool than a computational library). It's probably worth noting that some libraries (e.g., Dask, JAX, and probably TensorFlow?) already support __array__, but others (PyTorch, scipy.sparse, pydata/sparse) do not.

@seberg
seberg commented Apr 17, 2020

I agree with @shoyer; plotting libraries are indeed a nice case where something like a "forced cast" is a reasonable thing to do (since, e.g., you do not have a return value that might have the same type as the input).

I wouldn't mind exploring what happens if you do things like implement __array_function__, but always error out (until you do something sensible). If you reject __array__ you currently reject pretty much all of NumPy's API anyway. But, for libraries – until they move away from calling np.asarray() – that does not help.

In either case, a force flag seems simple enough and probably about as good an approach as we will find.

@rgommers
Collaborator

I think I agree with @ezyang and @mruberry here - there are advantages to being explicit, code does become clearer about what it does (e.g. scipy.sparse matrices have todense(), pytorch CUDA tensors have cpu(), etc.), and it's harder for the user to shoot themselves in the foot.

Forced casting sounds like a nice convenience for a visualization library; however, this can also be seen as a learning moment for users. If they have no idea whether their data lives on a GPU, how to get it back to the CPU, or what the cost of that is, there are bigger issues than having to type a few characters more ....

But there are also all the arrays mentioned above, plus Jax, CuPy, tensorflow, chainer, mxnet, pint Quantities, PyOpenCL, unyt, ... And more to come, I'm sure! In all these cases, converting to NumPy results in information loss, but it would be a nightmare to have to support all of the individual conversions.

Why do you (as a viz library author) have to support this? Is it a matter of saving a user typing .cpu() or .todense(), or is there more to it?

However, this means that anyone else in the NumPy ecosystem that wants to play well with torch.Tensors then needs to include PyTorch-specific code in their code base, which is not super nice.

I'm not sure about this statement. If you support CPU tensors, you are playing well with PyTorch.

(Or monkey-patch it, which is also not super nice!)

don't do that please :)

@jni
Author
jni commented Apr 20, 2020

@rgommers

Forced casting sounds like a nice convenience for a visualization library, however this can also be seen as a learning moment for users. If they have no idea about whether their data lives on a GPU or how to get it back to a CPU and what the cost of that is, there's bigger issues than having to type a few characters more ....

Why do you (as a viz library author) have to support this? Is it a matter of saving a user typing .cpu() or .todense(), or is there more to it?

Well, to me it comes down to the question of why we have protocols at all. Should the moment of visualisation be the moment at which every user learns the specifics of how to get a NumPy array out of the particular library they are using right then? torch_tensor.numpy(), dask_array.compute(), pandas_series.to_numpy(), tf.make_ndarray(tf_tensor), mxnet_array.asnumpy()...

I remember a few years ago, someone tweeted that adapting two FOSS libraries to work together was more akin to organ transplant than to snapping legos together. At the time, I remember thinking that I was so lucky to be working with scientific Python, where I didn't find this true at all. Because everybody spoke NumPy, I could plug scientific libraries together in minutes. It was not remotely like organ transplant. I started out in my career by mixing together NumPy, SciPy, scikit-image, scikit-learn, and NetworkX. It was just a brilliant experience.

Now if I want to work with PyTorch and napari, things don't just work. I should learn .detach().cpu(). Fine. But what if I'm comparing the performance of various pretrained networks from different publications on my data, plus two classical pipelines that produce dask arrays and PyOpenCL arrays? Now these things come in all flavours: tensorflow 1.0, tensorflow 2.0, pytorch, mxnet, and dask. If everyone provides a .__array__ method that Just Works, I can write:

v = napari.Viewer()
for result in results():
    v.add_layer(result)

In the non-__array__ world, I have to write a big if/else cascade to add something different depending on the type of the result. And of course if Apple then comes up with iTensors that use the syntax .dongle_to_numpy(), I have to go and modify my code again, instead of passing in some parameters pointing to the new network.
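
To spell out what that cascade looks like, here is an illustrative sketch (library detection by module name; the conversion calls are the commonly documented ones, but treat this as an approximation rather than napari code):

import numpy as np

def to_numpy(result):
    module = type(result).__module__.split(".")[0]
    if module == "torch":
        return result.detach().cpu().numpy()
    if module == "dask":
        return np.asarray(result.compute())
    if module == "mxnet":
        return result.asnumpy()
    if module == "tensorflow":
        return result.numpy()          # eager tensors only
    return np.asarray(result)          # numpy and anything else array-like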

I think this is not a small thing, it's a huge thing, and it's only going to get bigger as the array landscape continues to evolve.

Anyway, I hope that's a good enough justification to consider force=True. I'd be more than happy to start a discussion on the NumPy side and write my first NEP, but I'd like to get a sense of whether that would be convincing to the PyTorch folks. To be clear, the proposal is:

>>> np.asarray(torch_with_grad)
# ---> Error
>>> np.asarray(torch_on_gpu)
# ---> Error
>>> np.asarray(torch_with_grad, force=True)
# ---> array!
>>> np.asarray(torch_on_gpu, force=True)
# ---> array!

Thoughts?

@rgommers btw, I'm actually dismayed to find out that np.array(csr_mat) doesn't work! I have blown up my memory about a million times calling np.array() on a huge dask array, but not once have I wished dask didn't provide me that option! That can also be a learning opportunity. =) Maybe with force=True SciPy would consider dense instantiation? ;)

@mruberry
Collaborator

We would consider supporting a "force to NumPy" option in PyTorch. See #20778. Would that address your concern?

@seberg
seberg commented Apr 20, 2020

Well, I am not sure that adding an option to pytorch can really help with the issue Juan is pointing out. Let me try to unravel this:

  1. Many libraries do not have this problem: they are happy to be converted silently to a NumPy array, and np.asarray may be cheap or even no-copy.

    • They and their users use np.asarray a lot and are happy with it. "Things just work"
    • Some of these may be qualitatively different from an array. Say an image object, or a storage provider.
  2. Some libraries have a costly (and possibly confusing) coercion to arrays. For these, it is individually a common and probably good choice to shout at the user and make them aware of it by requiring a more explicit coercion (e.g. an arr.to_numpy() method).

    • This teaches the user more nicely than freezing up the computer due to a sparse conversion.
    • Some of these libraries pretend to be NumPy arrays, making the transition to NumPy much more subtle, and especially surprising when passing them into library functions which return a NumPy array.
  3. Visualization is different in the sense that the conversion is much less problematic and not prone to confusion due to having a numpy array returned.

So, should napari do force=True silently for the user? I do not know; it removes the teaching moment for the user and potentially freezes up their computer. So you can argue that the user should do this step.

But Juan's argument remains: even if you say that napari should never use a potential force=True argument, a user who works with multiple array-likes has no clear story.
We already agree that a force=True in some form is a good design choice; this is implicit in the fact that many libraries choose to provide that functionality. Aside from us not wanting to expand the __array__ protocol, the only real argument I can see against it would be that a single user working with PyTorch, Tensorflow, MxNet, etc. is already in so much pain and agony that a simpler conversion to NumPy frustrates them more because there is only one piece that fits instead of most ;p.

EDIT: To be fair, if we discourage napari from doing the conversion, the pressure to actually provide a protocol is much lower, so the argument that it adds annoying API to NumPy is much stronger.

@mruberry
Collaborator

Does this hypothetical user of multiple tensor libraries actually exist? It seems like we're trying to cater to the following niche:

  • A user who uses multiple tensor libraries plus a NumPy-compatible library
  • The user is familiar with the NumPy-compatible library but can't be bothered to learn how to convert tensors from the tensor libraries to NumPy arrays

Does it really take that much time to look up t.detach().cpu() or t.numpy(force=True)?

@rgommers
Collaborator

The discussion on adding a force keyword to __array__ is happening on the NumPy mailing list:
https://mail.python.org/pipermail/numpy-discussion/2020-April/080574.html

This discussion has more detail than gh-13568, so I'll close that as a duplicate.

Adding a force keyword to .numpy() is gh-20778, and that is actionable now, if anyone wants to open a PR.

@rgommers
Collaborator

The Tensor.numpy method now has a force=False keyword, see gh-78564. This missed the 1.12 branch cut, so it should become available in PyTorch 1.13.0.

That's as much as can be done here I think, until NumPy adds a force=False keyword to numpy.ndarray.__array__. If that happens, then it can easily also be added to torch.Tensor.__array__.
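
For completeness, a short usage sketch of that keyword (assuming PyTorch >= 1.13):

import torch

t = torch.ones(3, requires_grad=True)
if torch.cuda.is_available():
    t = t.cuda()  # now on the GPU and still attached to the autograd graph

# t.numpy() would raise here; force=True detaches, moves to CPU, and converts.
arr = t.numpy(force=True)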
