NEP: Adjust NEP-35 to make it more user-accessible by pentschev · Pull Request #17093 · numpy/numpy

NEP: Adjust NEP-35 to make it more user-accessible #17093

Merged: 21 commits, Sep 7, 2020

Commits
7d7b46c
NEP: Adjust NEP-35 to make it more user-accessible
pentschev Aug 14, 2020
9b660e4
NEP: Simplify NEP-35 further with reviewer's suggestions
pentschev Aug 17, 2020
68fd054
Update doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
pentschev Aug 19, 2020
61dcb63
Update doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
pentschev Aug 19, 2020
3cf7b6b
Update doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
pentschev Aug 19, 2020
52d9c74
Update doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
pentschev Aug 19, 2020
615f19f
Update doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
pentschev Aug 19, 2020
69e3e71
NEP: Improve NEP-35 abstract per @mattip's suggestion
pentschev Aug 19, 2020
cde3543
Update doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
pentschev Aug 19, 2020
1017007
Update doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
pentschev Aug 19, 2020
a82cc4b
NEP: Move NumPy users comment to top of NEP-35 Usage and Impact
pentschev Aug 19, 2020
17620c2
Update doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
pentschev Aug 19, 2020
f1d1562
Update doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
pentschev Aug 19, 2020
57d6bab
Update doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
pentschev Aug 19, 2020
67c9733
Update doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
pentschev Aug 19, 2020
974c023
Update doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
pentschev Aug 19, 2020
b5f5577
NEP: Clarify NEP-35 C implementation details.
pentschev Aug 19, 2020
2e30534
NEP: Clarify Dask intent with `my_dask_pad` function name
pentschev Aug 19, 2020
3d527ea
NEP: Improve grammar on NEP-35 reference to Dask's objects
pentschev Aug 19, 2020
b6f2c16
NEP: Fix some grammar and formatting in NEP-35
pentschev Aug 19, 2020
57f78df
ENH: Clarifies meta_from_array function in NEP-35
pentschev Aug 19, 2020
307 changes: 234 additions & 73 deletions doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst
@@ -8,16 +8,196 @@ NEP 35 — Array Creation Dispatching With __array_function__
:Status: Draft
:Type: Standards Track
:Created: 2019-10-15
:Updated: 2020-08-06
:Updated: 2020-08-17
:Resolution:

Abstract
--------

We propose the introduction of a new keyword argument ``like=`` to all array
creation functions to permit dispatching of such functions by the
``__array_function__`` protocol, addressing one of the protocol shortcomings,
as described by NEP-18 [1]_.
creation functions to address one of the shortcomings of ``__array_function__``,
as described by NEP 18 [1]_. The ``like=`` keyword argument will create an
instance of the argument's type, enabling direct creation of non-NumPy arrays.
The target array type must implement the ``__array_function__`` protocol.

Motivation and Scope
--------------------

Many libraries implement the NumPy API, such as Dask for graph
computing, CuPy for GPGPU computing, xarray for N-D labeled arrays, etc. Underneath,
they have adopted the ``__array_function__`` protocol which allows NumPy to understand
and treat downstream objects as if they are the native ``numpy.ndarray`` object.
Hence the community, while using various libraries, still benefits from a unified
NumPy API. This not only brings great convenience for standardization but also
removes the burden of learning a new API and rewriting code for every new
object. In more technical terms, this mechanism of the protocol is called a
"dispatcher", which is the terminology we use from here onwards when referring
to that.


.. code:: python

    x = dask.array.arange(5) # Creates dask.array
    np.diff(x) # Returns dask.array

Note above how we called Dask's implementation of ``diff`` via the NumPy
namespace by calling ``np.diff``, and the same would apply if we had a CuPy
array or any other array from a library that adopts ``__array_function__``.
This allows writing code that is agnostic to the implementation library, thus
users can write their code once and still be able to use different array
implementations according to their needs.

Obviously, having a protocol in-place is useful if the arrays are created
elsewhere and let NumPy handle them. But still these arrays have to be started
in their native library and brought back. Instead if it was possible to create
these objects through NumPy API then there would be an almost complete
experience, all using NumPy syntax. For example, say we have some CuPy array
Comment on lines +50 to +54
Member

Suggested change
Obviously, having a protocol in-place is useful if the arrays are created
elsewhere and let NumPy handle them. But still these arrays have to be started
in their native library and brought back. Instead if it was possible to create
these objects through NumPy API then there would be an almost complete
experience, all using NumPy syntax. For example, say we have some CuPy array
The mechanism as described above covers cases where the input is already an array.
These arrays have to be created. NumPy provides `array creation routines` like
`np.ones` but how to use these to create a CuPy or Dask array? For example,
say we have some CuPy array

Contributor

This was one of the reasons why I got confused in the beginning. "Why should I even be able to create other arrays if I am already using those libs? I can create via CuPy or Dask whatever I need".

That detail is being lost with this compact narrative.

Contributor Author

This was suggested in #17093 (comment) . Given that the intent was to make the NEP clearer to users, I agree with @ilayn that the detail is getting lost.

``cp_arr``, and want a similar CuPy array holding an identity matrix. We could
still write the following:

.. code:: python

    x = cupy.identity(3)

Instead, the better way would be to use only the NumPy API, which could now
be achieved with:

.. code:: python

    x = np.identity(3, like=cp_arr)

As if by magic, ``x`` will also be a CuPy array, as NumPy was able to infer
that from the type of ``cp_arr``. Note that this last step would not be possible
without ``like=``, as it would be impossible for NumPy to know the user
expects a CuPy array based only on the integer input.

The new ``like=`` keyword proposed is solely intended to identify the downstream
library where to dispatch and the object is used only as reference, meaning that
no modifications, copies or processing will be performed on that object.
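
The dispatch described above can be tried without a GPU. The sketch below is
only an illustration: ``DuckArray`` is a hypothetical minimal class (not part
of NumPy, CuPy or Dask), and it assumes NumPy 1.20 or later, where ``like=``
is available.

```python
import numpy as np  # assumes NumPy >= 1.20, where ``like=`` is available


class DuckArray:
    """Hypothetical minimal array type implementing ``__array_function__``."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # NumPy calls this on the ``like=`` object. ``like`` itself is not
        # forwarded in ``kwargs``, so ``func`` can be called again to build
        # plain NumPy data, which is then wrapped in the downstream type.
        return DuckArray(func(*args, **kwargs))


ref = DuckArray([10, 20])
x = np.array([1, 2, 3], like=ref)  # dispatched to DuckArray

print(type(x).__name__)    # DuckArray
print(ref.data.tolist())   # the reference object is left untouched
```

Here ``ref`` plays the role of ``cp_arr``: it only selects the implementation
and is neither copied nor modified.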

We expect that this functionality will be mostly useful to library developers,
allowing them to create new arrays for internal usage based on arrays passed
by the user, preventing unnecessary creation of NumPy arrays that will
ultimately lead to an additional conversion into a downstream array type.
Contributor

This section is wonderful, thank you @pentschev for writing it. =) Definitely exactly what I was looking for. =) The only thing I think would be useful is to have a complete usage example, "eg here is scikit-learn k-means now — without any option to create like, even if it works with a CuPy array as input it will end up with a NumPy array as output. But after this NEP and modifying the code to look like below, it will work completely within CuPy, or dask, or other arrays implementing this protocol."

I was trying to rack my brain for examples in scikit-image that need this, but can't come up with any off the top of my head. Happy to go hunting for one, but I figure there might be existing examples that motivated this NEP in the first place and that would be simple enough to be included?

Update: my_pad below is sufficient. I wonder whether it belongs in this section rather than the next one, ie move up to line 113 (cupy's TypeError) into this section, then start the next section.

Contributor Author

This section is wonderful, thank you @pentschev for writing it. =) Definitely exactly what I was looking for. =)

To be fair this is not a new addition, it was already in the original text, it just got moved a bit. See https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html#usage-guidance .

Update: my_pad below is sufficient.

I'm not sure if you mean that my_pad already addresses the "complete usage example" you mentioned above or the strikethrough text. Are you confirming the existing example suffices?

I wonder whether it belongs in this section rather than the next one, ie move up to line 113 (cupy's TypeError) into this section, then start the next section.

I tried to follow the NEP X -- Template and Instructions. To my understanding, it should belong in the "Motivation and Scope" section, where it is now.

"Motivation and Scope: ... It should describe the existing problem, who it affects, what it is trying to solve, and why."

Contributor

Yes, my_pad is sufficient, and I agree, it belongs in motivation and scope, but it is currently in Usage, and I was asking whether it should be moved to motivation and scope.

Contributor Author

Yes, my_pad is sufficient, and I agree, it belongs in motivation and scope, but it is currently in Usage, and I was asking whether it should be moved to motivation and scope.

Are we still talking about the paragraph saying "We expect that this functionality will be mostly useful to library developers, ..."? That is in Motivation and Scope already, not in Usage and Impact.


Support for Python 2.7 has been dropped since NumPy 1.17, therefore we make use
of the keyword-only argument standard described in PEP-3102 [2]_ to implement
``like=``, thus preventing it from being passed by position.
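
The effect of the keyword-only rule can be sketched in plain Python. The
function ``create`` below is hypothetical and merely stands in for an array
creation routine; the only point is the bare ``*`` marker from PEP 3102.

```python
def create(data, *, like=None):
    """Hypothetical stand-in for an array creation function."""
    # Everything after the bare ``*`` can only be passed by keyword.
    target = list if like is None else type(like)
    return target(data)


# Passing ``like`` by keyword works and selects the container type.
result = create([1, 2, 3], like=(0,))

# Passing it by position is rejected by Python itself.
try:
    create([1, 2, 3], (0,))
    positional_accepted = True
except TypeError:
    positional_accepted = False
```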

.. _neps.like-kwarg.usage-and-impact:

Usage and Impact
----------------

NumPy users who don't use other arrays from downstream libraries can continue
to use array creation routines without a ``like=`` argument. Using
``like=np.ndarray`` will work as if no array was passed via that argument.
However, this will incur additional checks that will negatively impact
performance.

To understand the intended use for ``like=``, and before we move to more complex
cases, consider the following illustrative example consisting only of NumPy and
CuPy arrays:

.. code:: python

    import numpy as np
    import cupy

    def my_pad(arr, padding):
        padding = np.array(padding, like=arr)
        return np.concatenate((padding, arr, padding))
Comment on lines +107 to +109
Member

This code would work just fine without the second line. To motivate it, perhaps:

Suggested change
    def my_pad(arr, padding):
        padding = np.array(padding, like=arr)
        return np.concatenate((padding, arr, padding))
    def my_pad(arr, padding):
        # coerce `padding` just once, rather than letting `concatenate` do it twice
        padding = np.array(padding, like=arr)
        return np.concatenate((padding, arr, padding))

Contributor Author

This code would work just fine without the second line.

I don't get this comment, are you saying it works without the np.concatenate?

Member

The naive reader may not understand why the call to padding = np.array(padding, like=arr) is needed.

Member

.. but then it is explained in the text below.


    my_pad(np.arange(5), [-1, -1]) # Returns np.ndarray
    my_pad(cupy.arange(5), [-1, -1]) # Returns cupy.core.core.ndarray

Note in the ``my_pad`` function above how ``arr`` is used as a reference to
dictate what array type padding should have, before concatenating the arrays to
produce the result. On the other hand, if ``like=`` wasn't used, the NumPy case
would still work, but CuPy wouldn't allow this kind of automatic
conversion, ultimately raising a
``TypeError: Only cupy arrays can be concatenated`` exception.
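
For readers without a GPU, the same behaviour can be approximated with a
stand-in class. In the sketch below ``StrictArray`` and
``my_pad_without_like`` are hypothetical names introduced only for
illustration; ``StrictArray`` imitates the CuPy behaviour described above by
refusing to concatenate anything that is not already a ``StrictArray``. NumPy
1.20 or later is assumed.

```python
import numpy as np  # assumes NumPy >= 1.20


class StrictArray:
    """Hypothetical CuPy-like array that refuses implicit conversion."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.array:
            # Reached through ``like=``: build the data, then wrap it.
            return StrictArray(np.array(*args, **kwargs))
        if func is np.concatenate:
            arrays = args[0]
            if not all(isinstance(a, StrictArray) for a in arrays):
                raise TypeError("Only StrictArray objects can be concatenated")
            return StrictArray(np.concatenate([a.data for a in arrays], **kwargs))
        return NotImplemented


def my_pad(arr, padding):
    padding = np.array(padding, like=arr)
    return np.concatenate((padding, arr, padding))


def my_pad_without_like(arr, padding):
    # Same function, but ``padding`` stays a plain Python list.
    return np.concatenate((padding, arr, padding))


arr = StrictArray(np.arange(5))
padded = my_pad(arr, [-1, -1])  # a StrictArray

try:
    my_pad_without_like(arr, [-1, -1])
    failed = False
except TypeError:
    failed = True  # mirrors the CuPy error described above
```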

Now we should look at how a library like Dask could benefit from ``like=``.
Before we understand that, it's important to understand a bit about Dask basics
and how it ensures correctness with ``__array_function__``. Note that Dask can
perform computations on different sorts of objects, like dataframes, bags and
arrays; here we will focus strictly on arrays, which are the objects we can use
``__array_function__`` with.

Dask uses a graph computing model, meaning it breaks down a large problem into
many smaller problems and merges their results to reach the final result. To
break the problem down into smaller ones, Dask also breaks arrays into smaller
arrays that it calls "chunks". A Dask array can thus consist of one or more
chunks and they may be of different types. However, in the context of
``__array_function__``, Dask only allows chunks of the same type; for example,
a Dask array can be formed of several NumPy arrays or several CuPy arrays, but
not a mix of both.

To avoid mismatched types during computation, Dask keeps an attribute ``_meta`` as
part of its array throughout computation: this attribute is used to both predict
the output type at graph creation time, and to create any intermediary arrays
that are necessary within some function's computation. Going back to our
previous example, we can use ``_meta`` information to identify what kind of
array we would use for padding, as seen below:

.. code:: python

    import numpy as np
    import cupy
    import dask.array as da
    from dask.array.utils import meta_from_array

    def my_dask_pad(arr, padding):
Member @eric-wieser, Aug 19, 2020

This begs the question "Why can't I just use the my_pad from earlier on dask arrays?".

        padding = np.array(padding, like=meta_from_array(arr))
        return np.concatenate((padding, arr, padding))

    # Returns dask.array<concatenate, shape=(9,), dtype=int64, chunksize=(5,), chunktype=numpy.ndarray>
    my_dask_pad(da.arange(5), [-1, -1])

    # Returns dask.array<concatenate, shape=(9,), dtype=int64, chunksize=(5,), chunktype=cupy.ndarray>
    my_dask_pad(da.from_array(cupy.arange(5)), [-1, -1])

Note how ``chunktype`` in the return value above changes from
``numpy.ndarray`` in the first ``my_dask_pad`` call to ``cupy.ndarray`` in the
second. We have also renamed the function to ``my_dask_pad`` in this example
with the intent to make it clear that this is how Dask would implement such
functionality, should it need to do so, as it requires Dask's internal tools
that are not of much use elsewhere.

To enable proper identification of the array type we use Dask's utility function
``meta_from_array``, which was introduced as part of the work to support
``__array_function__``, allowing Dask to handle ``_meta`` appropriately. Readers
can think of ``meta_from_array`` as a special function that just returns the
type of the underlying Dask array, for example:

.. code:: python

    np_arr = da.arange(5)
    cp_arr = da.from_array(cupy.arange(5))

    meta_from_array(np_arr) # Returns a numpy.ndarray
    meta_from_array(cp_arr) # Returns a cupy.ndarray

Since the value returned by ``meta_from_array`` is a NumPy-like array, we can
just pass that directly into the ``like=`` argument.

The ``meta_from_array`` function is primarily targeted at the library's internal
usage to ensure chunks are created with correct types. Without the ``like=``
argument, it would be impossible to ensure ``my_pad`` creates a padding array
with a type matching that of the input array, which would cause CuPy to raise a
``TypeError`` exception, just as discussed above for the CuPy-only case.
Combining Dask's internal handling of meta arrays and the proposed
``like=`` argument, it now becomes possible to handle cases involving creation
of non-NumPy arrays, which is likely the heaviest limitation Dask currently
faces from the ``__array_function__`` protocol.

Backward Compatibility
----------------------

This proposal does not raise any backward compatibility issues within NumPy,
given that it only introduces a new keyword argument to existing array creation
functions with a default ``None`` value, thus not changing current behavior.

Detailed description
--------------------
@@ -28,36 +208,32 @@ did not -- and did not intend to -- address the creation of arrays by downstream
libraries, preventing those libraries from using such important functionality in
that context.

Other NEPs have been written to address parts of that limitation, such as the
introduction of the ``__duckarray__`` protocol in NEP-30 [2]_, and the
introduction of an overriding mechanism called ``uarray`` by NEP-31 [3]_.

The purpose of this NEP is to address that shortcoming in a simple and
straightforward way: introduce a new ``like=`` keyword argument, similar to how
the ``empty_like`` family of functions work. When array creation functions
receive such an argument, they will trigger the ``__array_function__`` protocol,
and call the downstream library's own array creation function implementation.
The ``like=`` argument, as its own name suggests, shall be used solely for the
purpose of identifying where to dispatch. In contrast to the way
``__array_function__`` has been used so far (the first argument identifies where
to dispatch), and to avoid breaking NumPy's API with regards to array creation,
the new ``like=`` keyword shall be used for the purpose of dispatching.

Usage Guidance
~~~~~~~~~~~~~~

The new ``like=`` keyword is solely intended to identify the downstream library
where to dispatch and the object is used only as reference, meaning that no
modifications, copies or processing will be performed on that object.

We expect that this functionality will be mostly useful to library developers,
allowing them to create new arrays for internal usage based on arrays passed
by the user, preventing unnecessary creation of NumPy arrays that will
ultimately lead to an additional conversion into a downstream array type.

Support for Python 2.7 has been dropped since NumPy 1.17, therefore we should
make use of the keyword-only argument standard described in PEP-3102 [4]_ to
implement the ``like=``, thus preventing it from being passed by position.
``__array_function__`` has been used so far (the first argument identifies the
target downstream library), and to avoid breaking NumPy's API with regards to
array creation, the new ``like=`` keyword shall be used for the purpose of
dispatching.

Downstream libraries will benefit from the ``like=`` argument without any
changes to their API, given that the argument is implemented exclusively in
NumPy. It will still be required that downstream libraries implement the
``__array_function__`` protocol, as described by NEP 18 [1]_, and appropriately
introduce the argument to their calls to NumPy array creation functions, as
exemplified in :ref:`neps.like-kwarg.usage-and-impact`.
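
The protocol requirement can be observed directly. In the sketch below
``NotAnArray`` is a hypothetical class that does not implement
``__array_function__``; with NumPy 1.20 or later, passing such an object to
``like=`` is expected to be rejected with a ``TypeError``.

```python
import numpy as np  # assumes NumPy >= 1.20


class NotAnArray:
    """Hypothetical object that does not implement ``__array_function__``."""


try:
    np.array([1, 2, 3], like=NotAnArray())
    rejected = False
except TypeError:
    # NumPy refuses reference objects that lack the protocol.
    rejected = True
```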

Related work
------------

Other NEPs have been written to address parts of the ``__array_function__``
protocol's limitations, such as the introduction of the ``__duckarray__``
protocol in NEP 30 [3]_, and the introduction of an overriding mechanism called
``uarray`` by NEP 31 [4]_.

Implementation
--------------
@@ -66,10 +242,10 @@ The implementation requires introducing a new ``like=`` keyword to all existing
array creation functions of NumPy. As examples of functions that would add this
new argument (but not limited to) we can cite those taking array-like objects
such as ``array`` and ``asarray``, functions that create arrays based on
numerical ranges such as ``range`` and ``linspace``, as well as the ``empty``
family of functions, even though that may be redundant, since there exists
already specializations for those with the naming format ``empty_like``. As of
the writing of this NEP, a complete list of array creation functions can be
numerical inputs such as ``arange`` and ``identity``, as well as the ``empty``
family of functions, even though that may be redundant, since specializations
for those already exist with the naming format ``empty_like``. As of the
writing of this NEP, a complete list of array creation functions can be
found in [5]_.

This newly proposed keyword shall be removed by the ``__array_function__``
@@ -135,60 +311,45 @@ There are two downsides to the implementation above for C functions:
2. To follow current implementation standards, documentation should be attached
directly to the Python source code.

Alternatively for C functions, the implementation of ``like=`` could be moved
into the C implementation itself. This is not the primary suggestion here due
to its inherent complexity which would be difficult too long to describe in its
entirety here, and too tedious for the reader. However, we leave that as an
option open for discussion.
The first version of this proposal suggested the implementation above as one
viable solution for NumPy functions implemented in C. However, due to the
downsides pointed out above we have decided to discard any changes on the Python
side and resolve those issues with a pure-C implementation. Please refer to
[implementation]_ for details.

Usage
-----
Alternatives
------------

The purpose of this NEP is to keep things simple. Similarly, we can exemplify
the usage of ``like=`` in a simple way. Imagine you have an array of ones
created by a downstream library, such as CuPy. What you need now is a new array
that can be created using the NumPy API, but that will in fact be created by
the downstream library, a simple way to achieve that is shown below.
Recently a new protocol to replace ``__array_function__`` entirely was proposed
by NEP 37 [6]_, which would require considerable rework by downstream libraries
that already adopt ``__array_function__``. Because of that, we still believe the
``like=`` argument is beneficial for NumPy and downstream libraries. However,
that proposal wouldn't necessarily be considered a direct alternative to the
present NEP, as it would replace NEP 18 entirely, upon which this builds.
Discussion of the details of that proposal, and of why it would require rework
by downstream libraries, is beyond the scope of the present proposal.

.. code:: python
Discussion
----------

x = cupy.ones(2)
np.array([1, 3, 5], like=x) # Returns cupy.ndarray
.. [implementation] `Implementation's pull request on GitHub <https://github.com/numpy/numpy/pull/16935>`_
.. [discussion] `Further discussion on implementation and the NEP's content <https://mail.python.org/pipermail/numpy-discussion/2020-August/080919.html>`_

As a second example, we could also create an array of evenly spaced numbers
using a Dask identity matrix as reference:
References
----------

.. code:: python
.. [1] `NEP 18 - A dispatch mechanism for NumPy's high level array functions <https://numpy.org/neps/nep-0018-array-function-protocol.html>`_.

x = dask.array.eye(3)
np.linspace(0, 2, like=x) # Returns dask.array
.. [2] `PEP 3102 — Keyword-Only Arguments <https://www.python.org/dev/peps/pep-3102/>`_.

.. [3] `NEP 30 — Duck Typing for NumPy Arrays - Implementation <https://numpy.org/neps/nep-0030-duck-array-protocol.html>`_.

Compatibility
-------------

This proposal does not raise any backward compatibility issues within NumPy,
given that it only introduces a new keyword argument to existing array creation
functions.

Downstream libraries will benefit from the ``like=`` argument automatically,
that is, without any explicit changes in their codebase. The only requirement
is that they already implement the ``__array_function__`` protocol, as
described by NEP-18 [2]_.

References and Footnotes
------------------------

.. [1] `NEP-18 - A dispatch mechanism for NumPy's high level array functions <https://numpy.org/neps/nep-0018-array-function-protocol.html>`_.

.. [2] `NEP 30 — Duck Typing for NumPy Arrays - Implementation <https://numpy.org/neps/nep-0030-duck-array-protocol.html>`_.

.. [3] `NEP 31 — Context-local and global overrides of the NumPy API <https://github.com/numpy/numpy/pull/14389>`_.

.. [4] `PEP 3102 — Keyword-Only Arguments <https://www.python.org/dev/peps/pep-3102/>`_.
.. [4] `NEP 31 — Context-local and global overrides of the NumPy API <https://github.com/numpy/numpy/pull/14389>`_.

.. [5] `Array creation routines <https://docs.scipy.org/doc/numpy-1.17.0/reference/routines.array-creation.html>`_.

.. [6] `NEP 37 — A dispatch protocol for NumPy-like modules <https://numpy.org/neps/nep-0037-array-module.html>`_.

Copyright
---------
