8000 API: Add ``device`` and ``to_device`` to ``numpy.ndarray`` [Array API] by mtsokol · Pull Request #25233 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

API: Add device and to_device to numpy.ndarray [Array API] #25233

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 18, 2024

Conversation

mtsokol
Copy link
Member
@mtsokol mtsokol commented Nov 23, 2023

Hi @rgommers @seberg,

This PR introduces device and to_device members to numpy.ndarray.

Additionally, device keyword-only arguments were added to:
numpy.asarray, numpy.arange, numpy.empty, numpy.empty_like,
numpy.eye, numpy.full, numpy.full_like, numpy.linspace,
numpy.ones, numpy.ones_like, numpy.zeros, and numpy.zeros_like.

@mtsokol mtsokol self-assigned this Nov 23, 2023
@mtsokol mtsokol changed the title API: Add device and to_device to numpy.ndarray API: Add device and to_device to numpy.ndarray [Array API] Nov 23, 2023
@mtsokol mtsokol force-pushed the numpy-device branch 2 times, most recently from 4a36606 to 5037b48 Compare November 23, 2023 13:35
@mtsokol mtsokol added this to the 2.0.0 release milestone Nov 29, 2023
@mtsokol mtsokol force-pushed the numpy-device branch 2 times, most recently from 4e99405 to 89184df Compare November 29, 2023 15:58
@mtsokol
Copy link
Member Author
mtsokol commented Nov 29, 2023

Hi @seberg,
I completed C-level changes: The new enum and keyword-only arguments are in place. I added the string interning optimization for "cpu", I did an experiment and object == npy_ma_str_cpu comparison works for device="cpu".
In the implementation I kept it as PyUnicode_Compare(object, npy_ma_str_cpu) because PyUnicode_Compare starts with an identity comparison. (Or should I keep it as ==?)

@ngoldbaum
Copy link
Member

I think there's enough code out there relying on the identity comparison fast path in PyUnicde_Compare that you can assume CPython won't remove it.

There's a real test failure though:

self = <numpy._core.tests.test_multiarray.TestDevice object at 0x000001c0cb8314b0>

    def test_device(self):
        arr = np.arange(5)
    
        assert arr.device == "cpu"
        with assert_raises_regex(
            AttributeError,
            "attribute 'device' of 'numpy.ndarray' objects is "
            "not writable"
        ):
>           arr.device = "other"
E           AttributeError: attribute 'device' of 'ndarray' objects is not writable

@mtsokol
Copy link
Member Author
mtsokol commented Dec 5, 2023

There's a real test failure though:

Right, I need to adjust the regex to conform to pypy's ndarray name.

@mtsokol
Copy link
Member Author
mtsokol commented Dec 6, 2023

Now it's ready.

@ngoldbaum
Copy link
Member

Just for the record - it would be nice to see a little justification here or in the eventual NEP for why we need this, despite the only allowed options being "cpu" or None. It's self-evident it adds benefits if you want to write code that will work correctly in numpy or pytorch generically (for example) - are there any other plusses for users who don't need array API compat? On the whole it's probably worth adding stuff to the API that isn't actually useful but does make it easier to write array API-compatible code, but that argument should be spelled out somewhere.

I would also add to all the docstrings for device as a keyword argument something like:

.. note:

    Only the ``"cpu"`` device is supported by NumPy.

@charris charris changed the title API: Add device and to_device to numpy.ndarray [Array API] API: Add device and to_device to numpy.ndarray [Array API] Dec 11, 2023
@ngoldbaum
Copy link
Member

@mtsokol
Copy link
Member Author
mtsokol commented Dec 14, 2023

Hmm, looks like those note admonitions don't show up in the docstrings: https://output.circle-artifacts.com/output/job/ce6f8552-de08-47c3-80bb-91eb2ad750a5/artifacts/0/doc/build/html/reference/generated/numpy.empty.html#numpy.empty

Ah, I missed one : in the note:: definition - now they're all visible.

@mtsokol mtsokol force-pushed the numpy-device branch 4 times, most recently from be8b328 to 5498f7f Compare December 21, 2023 16:50
asmeurer added a commit to asmeurer/numpy that referenced this pull request Jan 10, 2024
In accordance with the array API. See numpy#25233.
Copy link
Contributor
@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments mostly from the point of view that we should try to minimize any impact from an argument that for numpy is not useful. So I'd put device after like, and would try to minimize extra lines added and impact on code by passing it on as much as possible, so that there is only one place where the error gets raised (in the C code parsing device).

@@ -912,7 +912,7 @@

add_newdoc('numpy._core.multiarray', 'asarray',
"""
asarray(a, dtype=None, order=None, *, like=None)
asarray(a, dtype=None, order=None, *, device=None, like=None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below, why is device ahead of like? Since one cannot use it for anything useful, my tendency would be to put it at the end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why not just set the default to "cpu"? That seems clearer. (Still fine to accept None).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now the function dispatch requires like to be the last argument - here's a check that raises RuntimeError if a function doesn't comply with it:

last_arg = co.co_argcount + co.co_kwonlyargcount - 1
last_arg = co.co_varnames[last_arg]
if last_arg != "like" or co.co_kwonlyargcount == 0:
raise RuntimeError(
"__array_function__ expects `like=` to be the last "
"argument and a keyword-only argument. "
f"{implementation} does not seem to comply.")

Can I keep device before like or should we update the __array_function__ protocol?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, why not just set the default to "cpu"? That seems clearer. (Still fine to accept None).

That's true that "cpu" is the only supported value, but I would prefer to stick to the Array API standard here. It defines the default value for device as None: https://data-apis.org/array-api/latest/API_specification/generated/array_api.asarray.html

We're adding these keyword arguments for Array API compatibility only, so I would prefer to fully conform with it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it matters from the Array API perspective, it doesn't make NumPy non-compliant, it just makes it explicit about what that default is.

But, I'll give a very slight vote to just keep it None anyway. Just becuase I don't think the more explicit device="cpu" makes it clearer, since users should ignore it anyway (you only use it via xp = arr.__array_namespace__() as an Array API user, never directly as a NumPy user).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8000

The order doesn't matter much in principle, since it's keyword-only and like is only for other non-numpy libraries as well. However, there's a bigger problem here:

>>> import numpy as np
>>> import dask.array as da
>>> d = da.ones((1, 2))
>>> np.asarray([1, 2], device='cpu')
array([1, 2])
>>> np.asarray([1, 2], device='cpu', like=d)
Traceback (most recent call last):
  Cell In[24], line 1
    np.asarray([1, 2], device='cpu', like=d)
  File ~/code/tmp/dask/dask/array/core.py:1760 in __array_function__
    return da_func(*args, **kwargs)
  File ~/code/tmp/dask/dask/array/core.py:4581 in asarray
    return from_array(a, getitem=getter_inline, **kwargs)
TypeError: from_array() got an unexpected keyword argument 'device'

The device keyword probably should not be added to the dispatchers for any of these creation functions. It looks like the __array_function__ protocol wasn't implemented in a forward-looking way - I haven't looked much deeper yet, but at first sight it may be the case that we can never add a new keyword to dispatched functions anymore.

In this case, device is here to support writing cross-library-compatible code via the array API standard. The dispatching protocols kinda were earlier attempts to do something similar. So probably they should not interact anyway?

Copy link
Member
@rgommers rgommers Jan 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah wait, it's maybe not that extreme - np.asarray([1, 2], like=d) still works, so new keywords are fine, as long as they are not used (then they don't get forwarded). Of course that's a bit funky, but there's no way to avoid it really.

>>> np.asarray([1, 2], device='cpu')
array([1, 2])
>>> np.asarray([1, 2], like=d)
dask.array<array, shape=(2,), dtype=int64, chunksize=(2,), chunktype=numpy.ndarray>
>>> np.asarray([1, 2], like=d, device='cpu')
...
TypeError: from_array() got an unexpected keyword argument 'device'

This would have been much more problematic if libraries like SciPy and scikit-learn supported non-numpy array inputs via __array_ufunc/function__. But since they squash such inputs anyway with np.asarray, there shouldn't be too much of an issue in practice.

And it's fairly easy to add support for new keywords in Dask & co. They just need a new 2.0-compatible release that adds such support, and then device/like can be used together.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rgommers - I've gotten quite used to updating astropy's __array_function__ wrappers to remain compatible with numpy functions -- and we've got tests to check that our signatures remain consistent. Of course it doesn't happen that often. So, I don't think that's a worry, and overall I'd tend to not special-case any arguments (i.e., pass it on if present).

On the order: if like has to be last, then just keep it like that, sorry for not noticing myself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify: the check is only there to test habits, and probably mainly to make sure it is kwarg only.
It may have to be last in C (not sure), but it doesn't really matter anywhere here. But... it also doesn't matter a lot (especially since if someone cares, they can change it).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mhvk, that is useful context. A test that flag mismatches sounds like a great idea. 🤞🏼 most implementers have that.

@ngoldbaum
Copy link
Member

Merging this one as well since all comments have been addressed. Thanks @mtsokol!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0