Merge pull request #17093 from pentschev/nep-35-template-rewrite · numpy/numpy@274d5d6 · GitHub

Commit 274d5d6

Merge pull request #17093 from pentschev/nep-35-template-rewrite
NEP: Adjust NEP-35 to make it more user-accessible
2 parents 8281acd + 57f78df commit 274d5d6

File tree

1 file changed

+234
-73
lines changed


doc/neps/nep-0035-array-creation-dispatch-with-array-function.rst

Lines changed: 234 additions & 73 deletions
@@ -8,16 +8,196 @@ NEP 35 — Array Creation Dispatching With __array_function__
88
:Status: Draft
99
:Type: Standards Track
1010
:Created: 2019-10-15
11-
:Updated: 2020-08-06
11+
:Updated: 2020-08-17
1212
:Resolution:
1313

1414
Abstract
1515
--------
1616

1717
We propose the introduction of a new keyword argument ``like=`` to all array
18-
creation functions to permit dispatching of such functions by the
19-
``__array_function__`` protocol, addressing one of the protocol shortcomings,
20-
as described by NEP-18 [1]_.
18+
creation functions to address one of the shortcomings of ``__array_function__``,
19+
as described by NEP 18 [1]_. The ``like=`` keyword argument will create an
20+
instance of the argument's type, enabling direct creation of non-NumPy arrays.
21+
The target array type must implement the ``__array_function__`` protocol.
22+
23+
Motivation and Scope
24+
--------------------
25+
26+
Many libraries implement the NumPy API, such as Dask for graph
27+
computing, CuPy for GPGPU computing, xarray for N-D labeled arrays, etc. Underneath,
28+
they have adopted the ``__array_function__`` protocol which allows NumPy to understand
29+
and treat downstream objects as if they are the native ``numpy.ndarray`` object.
30+
Hence the community while using various libraries still benefits from a unified
31+
NumPy API. This not only brings great convenience for standardization but also
32+
removes the burden of learning a new API and rewriting code for every new
33+
object. In more technical terms, this mechanism of the protocol is called a
34+
"dispatcher", which is the terminology we use from here onwards when referring
35+
to that.
36+
37+
38+
.. code:: python
39+
40+
x = dask.array.arange(5) # Creates dask.array
41+
np.diff(x) # Returns dask.array
42+
43+
Note above how we called Dask's implementation of ``diff`` via the NumPy
44+
namespace by calling ``np.diff``, and the same would apply if we had a CuPy
45+
array or any other array from a library that adopts ``__array_function__``.
46+
This allows writing code that is agnostic to the implementation library, thus
47+
users can write their code once and still be able to use different array
48+
implementations according to their needs.
49+
50+
Having the protocol in place is useful when arrays are created elsewhere and
51+
then handed to NumPy. However, those arrays still have to be created through
52+
their native library first. If these objects could instead be created through
53+
the NumPy API, the experience would be almost complete, using NumPy syntax
54+
throughout. For example, say we have some CuPy array ``cp_arr`` and want a
55+
CuPy array holding an identity matrix. Without ``like=``, we have to call
56+
CuPy directly:
57+
58+
.. code:: python
59+
60+
x = cupy.identity(3)
61+
62+
Instead, the better way would be to use only the NumPy API, which could now
63+
be achieved with:
64+
65+
.. code:: python
66+
67+
x = np.identity(3, like=cp_arr)
68+
69+
As if by magic, ``x`` will also be a CuPy array, as NumPy was able to infer
70+
that from the type of ``cp_arr``. Note that this last step would not be possible
71+
without ``like=``, as it would be impossible for NumPy to know the user
72+
expects a CuPy array based only on the integer input.
73+
74+
The new ``like=`` keyword proposed is solely intended to identify the downstream
75+
library where to dispatch and the object is used only as reference, meaning that
76+
no modifications, copies or processing will be performed on that object.
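
The dispatch described above can be sketched without CuPy installed. This is an illustration only, not the NEP's implementation: ``WrappedArray`` is a hypothetical duck array that exists in no library, and the snippet assumes a NumPy release that ships ``like=`` (1.20 or later).

```python
import numpy as np


class WrappedArray:
    """Hypothetical duck array used only to illustrate ``like=`` dispatch."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # NumPy strips ``like=`` before calling this hook, so ``func`` can be
        # run as a plain NumPy call and its result wrapped in our own type.
        return WrappedArray(func(*args, **kwargs))


ref = WrappedArray([1.0, 2.0])
x = np.identity(3, like=ref)   # dispatched through ref.__array_function__

print(type(x).__name__)        # WrappedArray
print(x.data.shape)            # (3, 3)
print(ref.data)                # the reference object is left untouched
```

Here ``ref`` only selects which type receives the call; its shape, dtype and contents play no role in the result.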
77+
78+
We expect that this functionality will be mostly useful to library developers,
79+
allowing them to create new arrays for internal usage based on arrays passed
80+
by the user, preventing unnecessary creation of NumPy arrays that will
81+
ultimately lead to an additional conversion into a downstream array type.
82+
83+
Support for Python 2.7 has been dropped since NumPy 1.17, therefore we make use
84+
of the keyword-only argument standard described in PEP-3102 [2]_ to implement
85+
``like=``, thus preventing it from being passed by position.
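
As a small sketch of what keyword-only means in practice, the hypothetical helper ``zeros_matching`` below (an assumption made for illustration, not a NumPy function) declares ``like`` after a bare ``*``, so Python itself rejects a positional value. It assumes NumPy 1.20 or later for the ``like=`` call.

```python
import numpy as np


def zeros_matching(shape, *, like=None):
    """Hypothetical helper: ``like`` can only be passed by keyword."""
    if like is None:
        return np.zeros(shape)
    return np.zeros(shape, like=like)


ok = zeros_matching(3, like=np.arange(2))   # accepted: passed by keyword

try:
    zeros_matching(3, np.arange(2))         # rejected: passed by position
except TypeError as exc:
    positional_error = exc

print(ok.shape)                             # (3,)
print(type(positional_error).__name__)      # TypeError
```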
86+
87+
.. _neps.like-kwarg.usage-and-impact:
88+
89+
Usage and Impact
90+
----------------
91+
92+
NumPy users who don't use other arrays from downstream libraries can continue
93+
to use array creation routines without a ``like=`` argument. Using
94+
``like=np.ndarray`` will work as if no array was passed via that argument.
95+
However, this will incur additional checks that will negatively impact
96+
performance.
97+
98+
To understand the intended use for ``like=``, and before we move to more complex
99+
cases, consider the following illustrative example consisting only of NumPy and
100+
CuPy arrays:
101+
102+
.. code:: python
103+
104+
import numpy as np
105+
import cupy
106+
107+
def my_pad(arr, padding):
108+
    padding = np.array(padding, like=arr)
109+
    return np.concatenate((padding, arr, padding))
110+
111+
my_pad(np.arange(5), [-1, -1]) # Returns np.ndarray
112+
my_pad(cupy.arange(5), [-1, -1]) # Returns cupy.core.core.ndarray
113+
114+
Note in the ``my_pad`` function above how ``arr`` is used as a reference to
115+
dictate what array type padding should have, before concatenating the arrays to
116+
produce the result. On the other hand, if ``like=`` wasn't used, the NumPy case
117+
would still work, but CuPy wouldn't allow this kind of automatic
118+
conversion, ultimately raising a
119+
``TypeError: Only cupy arrays can be concatenated`` exception.
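
The same behaviour can be reproduced without a GPU. In the sketch below, ``StrictArray`` is a hypothetical stand-in for a downstream array that, like CuPy, refuses to be concatenated with NumPy arrays, and ``use_like`` is a switch added only for the comparison. NumPy 1.20 or later is assumed.

```python
import numpy as np


class StrictArray:
    """Hypothetical downstream array that refuses implicit conversion."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.concatenate:
            arrays = args[0]
            if not all(isinstance(a, StrictArray) for a in arrays):
                raise TypeError("Only StrictArray objects can be concatenated")
            return StrictArray(np.concatenate([a.data for a in arrays], **kwargs))
        # Creation functions reached through like=: build with NumPy, then wrap.
        return StrictArray(func(*args, **kwargs))


def my_pad(arr, padding, use_like=True):
    # use_like exists only to compare both behaviours in this sketch.
    padding = np.array(padding, like=arr) if use_like else np.array(padding)
    return np.concatenate((padding, arr, padding))


padded = my_pad(StrictArray(np.arange(5)), [-1, -1])
print(type(padded).__name__, padded.data.tolist())

try:
    my_pad(StrictArray(np.arange(5)), [-1, -1], use_like=False)
except TypeError as exc:
    print("without like=:", exc)
```

With ``like=``, the padding is created as a ``StrictArray`` and the concatenation succeeds. Without it, the padding is a ``numpy.ndarray`` and the downstream type raises.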
120+
121+
Now we should look at how a library like Dask could benefit from ``like=``.
122+
Before doing so, it's important to understand a bit about Dask basics and how
123+
it ensures correctness with ``__array_function__``. Note that Dask can perform
124+
computations on different sorts of objects, like dataframes, bags and arrays;
125+
here we will focus strictly on arrays, which are the objects we can use
126+
``__array_function__`` with.
127+
128+
Dask uses a graph computing model, meaning it breaks down a large problem into
129+
many smaller problems and merges their results to reach the final result. To
130+
break the problem down into smaller ones, Dask also breaks arrays into smaller
131+
arrays that it calls "chunks". A Dask array can thus consist of one or more
132+
chunks and they may be of different types. However, in the context of
133+
``__array_function__``, Dask only allows chunks of the same type; for example,
134+
a Dask array can be formed of several NumPy arrays or several CuPy arrays, but
135+
not a mix of both.
136+
137+
To avoid mismatched types during computation, Dask keeps an attribute ``_meta`` as
138+
part of its array throughout computation: this attribute is used to both predict
139+
the output type at graph creation time, and to create any intermediary arrays
140+
that are necessary within some function's computation. Going back to our
141+
previous example, we can use ``_meta`` information to identify what kind of
142+
array we would use for padding, as seen below:
143+
144+
.. code:: python
145+
146+
import numpy as np
147+
import cupy
148+
import dask.array as da
149+
from dask.array.utils import meta_from_array
150+
151+
def my_dask_pad(arr, padding):
152+
    padding = np.array(padding, like=meta_from_array(arr))
153+
    return np.concatenate((padding, arr, padding))
154+
155+
# Returns dask.array<concatenate, shape=(9,), dtype=int64, chunksize=(5,), chunktype=numpy.ndarray>
156+
my_dask_pad(da.arange(5), [-1, -1])
157+
158+
# Returns dask.array<concatenate, shape=(9,), dtype=int64, chunksize=(5,), chunktype=cupy.ndarray>
159+
my_dask_pad(da.from_array(cupy.arange(5)), [-1, -1])
160+
161+
Note how ``chunktype`` in the return value above changes from
162+
``numpy.ndarray`` in the first ``my_dask_pad`` call to ``cupy.ndarray`` in the
163+
second. We have also renamed the function to ``my_dask_pad`` in this example
164+
with the intent to make it clear that this is how Dask would implement such
165+
functionality, should it need to do so, as it requires Dask's internal tools
166+
that are not of much use elsewhere.
167+
168+
To enable proper identification of the array type we use Dask's utility function
169+
``meta_from_array``, which was introduced as part of the work to support
170+
``__array_function__``, allowing Dask to handle ``_meta`` appropriately. Readers
171+
can think of ``meta_from_array`` as a special function that just returns the
172+
type of the underlying Dask array, for example:
173+
174+
.. code:: python
175+
176+
np_arr = da.arange(5)
177+
cp_arr = da.from_array(cupy.arange(5))
178+
179+
meta_from_array(np_arr) # Returns a numpy.ndarray
180+
meta_from_array(cp_arr) # Returns a cupy.ndarray
181+
182+
Since the value returned by ``meta_from_array`` is a NumPy-like array, we can
183+
just pass that directly into the ``like=`` argument.
184+
185+
The ``meta_from_array`` function is primarily targeted at the library's internal
186+
usage to ensure chunks are created with correct types. Without the ``like=``
187+
argument, it would be impossible to ensure ``my_pad`` creates a padding array
188+
with a type matching that of the input array, which would cause a ``TypeError``
189+
exception to be raised by CuPy, as discussed above for the CuPy-only case.
190+
Combining Dask's internal handling of meta arrays and the proposed
191+
``like=`` argument, it now becomes possible to handle cases involving creation
192+
of non-NumPy arrays, which is likely the heaviest limitation Dask currently
193+
faces from the ``__array_function__`` protocol.
194+
195+
Backward Compatibility
196+
----------------------
197+
198+
This proposal does not raise any backward compatibility issues within NumPy,
199+
given that it only introduces a new keyword argument to existing array creation
200+
functions with a default ``None`` value, thus not changing current behavior.
21201

22202
Detailed description
23203
--------------------
@@ -28,36 +208,32 @@ did not -- and did not intend to -- address the creation of arrays by downstream
28208
libraries, preventing those libraries from using such important functionality in
29209
that context.
30210

31-
Other NEPs have been written to address parts of that limitation, such as the
32-
introduction of the ``__duckarray__`` protocol in NEP-30 [2]_, and the
33-
introduction of an overriding mechanism called ``uarray`` by NEP-31 [3]_.
34-
35211
The purpose of this NEP is to address that shortcoming in a simple and
36212
straightforward way: introduce a new ``like=`` keyword argument, similar to how
37213
the ``empty_like`` family of functions work. When array creation functions
38214
receive such an argument, they will trigger the ``__array_function__`` protocol,
39215
and call the downstream library's own array creation function implementation.
40216
The ``like=`` argument, as its own name suggests, shall be used solely for the
41217
purpose of identifying where to dispatch. In contrast to the way
42-
``__array_function__`` has been used so far (the first argument identifies where
43-
to dispatch), and to avoid breaking NumPy's API with regards to array creation,
44-
the new ``like=`` keyword shall be used for the purpose of dispatching.
45-
46-
Usage Guidance
47-
~~~~~~~~~~~~~~
48-
49-
The new ``like=`` keyword is solely intended to identify the downstream library
50-
where to dispatch and the object is used only as reference, meaning that no
51-
modifications, copies or processing will be performed on that object.
52-
53-
We expect that this functionality will be mostly useful to library developers,
54-
allowing them to create new arrays for internal usage based on arrays passed
55-
by the user, preventing unnecessary creation of NumPy arrays that will
56-
ultimately lead to an additional conversion into a downstream array type.
57-
58-
Support for Python 2.7 has been dropped since NumPy 1.17, therefore we should
59-
make use of the keyword-only argument standard described in PEP-3102 [4]_ to
60-
implement the ``like=``, thus preventing it from being passed by position.
218+
``__array_function__`` has been used so far (the first argument identifies the
219+
target downstream library), and to avoid breaking NumPy's API with regards to
220+
array creation, the new ``like=`` keyword shall be used for the purpose of
221+
dispatching.
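
To make the comparison with the ``empty_like`` family concrete, the sketch below uses plain NumPy arrays as the reference (NumPy 1.20 or later assumed). It shows that a ``*_like`` function copies shape and dtype from its argument, whereas ``like=`` takes nothing but the array type from the reference.

```python
import numpy as np

ref = np.ones((4, 4), dtype=np.int32)

a = np.zeros_like(ref)      # shape and dtype are copied from ref
b = np.zeros(2, like=ref)   # only the array type comes from ref

print(a.shape, a.dtype)     # (4, 4) int32
print(b.shape, b.dtype)     # (2,) float64
```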
222+
223+
Downstream libraries will benefit from the ``like=`` argument without any
224+
changes to their API, given that the argument only needs to be implemented in
225+
NumPy. It will still be required that downstream libraries implement the
226+
``__array_function__`` protocol, as described by NEP 18 [1]_, and appropriately
227+
introduce the argument to their calls to NumPy array creation functions, as
228+
exemplified in :ref:`neps.like-kwarg.usage-and-impact`.
229+
230+
Related work
231+
------------
232+
233+
Other NEPs have been written to address parts of ``__array_function__``
234+
protocol's limitation, such as the introduction of the ``__duckarray__``
235+
protocol in NEP 30 [3]_, and the introduction of an overriding mechanism called
236+
``uarray`` by NEP 31 [4]_.
61237

62238
Implementation
63239
--------------
@@ -66,10 +242,10 @@ The implementation requires introducing a new ``like=`` keyword to all existing
66242
array creation functions of NumPy. As examples of functions that would add this
67243
new argument (but not limited to) we can cite those taking array-like objects
68244
such as ``array`` and ``asarray``, functions that create arrays based on
69-
numerical ranges such as ``range`` and ``linspace``, as well as the ``empty``
70-
family of functions, even though that may be redundant, since there exists
71-
already specializations for those with the naming format ``empty_like``. As of
72-
the writing of this NEP, a complete list of array creation functions can be
245+
numerical inputs such as ``arange`` and ``identity``, as well as the ``empty``
246+
family of functions, even though that may be redundant, since specializations
247+
for those already exist with the naming format ``empty_like``. As of the
248+
writing of this NEP, a complete list of array creation functions can be
73249
found in [5]_.
74250

75251
This newly proposed keyword shall be removed by the ``__array_function__``
@@ -135,60 +311,45 @@ There are two downsides to the implementation above for C functions:
135311
2. To follow current implementation standards, documentation should be attached
136312
directly to the Python source code.
137313

138-
Alternatively for C functions, the implementation of ``like=`` could be moved
139-
into the C implementation itself. This is not the primary suggestion here due
140-
to its inherent complexity which would be difficult too long to describe in its
141-
entirety here, and too tedious for the reader. However, we leave that as an
142-
option open for discussion.
314+
The first version of this proposal suggested the implementation above as one
315+
viable solution for NumPy functions implemented in C. However, due to the
316+
downsides pointed out above we have decided to discard any changes on the Python
317+
side and resolve those issues with a pure-C implementation. Please refer to
318+
[implementation]_ for details.
143319

144-
Usage
145-
-----
320+
Alternatives
321+
------------
146322

147-
The purpose of this NEP is to keep things simple. Similarly, we can exemplify
148-
the usage of ``like=`` in a simple way. Imagine you have an array of ones
149-
created by a downstream library, such as CuPy. What you need now is a new array
150-
that can be created using the NumPy API, but that will in fact be created by
151-
the downstream library, a simple way to achieve that is shown below.
323+
Recently a new protocol to replace ``__array_function__`` entirely was proposed
324+
by NEP 37 [6]_, which would require considerable rework by downstream libraries
325+
that already adopt ``__array_function__``. Because of that, we still believe the
326+
``like=`` argument is beneficial for NumPy and downstream libraries. However,
327+
that proposal wouldn't necessarily be considered a direct alternative to the
328+
present NEP, as it would replace NEP 18 entirely, upon which this builds.
329+
Discussion on details about this new proposal and why that would require rework
330+
by downstream libraries is beyond the scope of the present proposal.
152331

153-
.. code:: python
332+
Discussion
333+
----------
154334

155-
x = cupy.ones(2)
156-
np.array([1, 3, 5], like=x) # Returns cupy.ndarray
335+
.. [implementation] `Implementation's pull request on GitHub <https://github.com/numpy/numpy/pull/16935>`_
336+
.. [discussion] `Further discussion on implementation and the NEP's content <https://mail.python.org/pipermail/numpy-discussion/2020-August/080919.html>`_
157337
158-
As a second example, we could also create an array of evenly spaced numbers
159-
using a Dask identity matrix as reference:
338+
References
339+
----------
160340

161-
.. code:: python
341+
.. [1] `NEP 18 - A dispatch mechanism for NumPy's high level array functions <https://numpy.org/neps/nep-0018-array-function-protocol.html>`_.
162342
163-
x = dask.array.eye(3)
164-
np.linspace(0, 2, like=x) # Returns dask.array
343+
.. [2] `PEP 3102 — Keyword-Only Arguments <https://www.python.org/dev/peps/pep-3102/>`_.
165344
345+
.. [3] `NEP 30 — Duck Typing for NumPy Arrays - Implementation <https://numpy.org/neps/nep-0030-duck-array-protocol.html>`_.
166346
167-
Compatibility
168-
-------------
169-
170-
This proposal does not raise any backward compatibility issues within NumPy,
171-
given that it only introduces a new keyword argument to existing array creation
172-
functions.
173-
174-
Downstream libraries will benefit from the ``like=`` argument automatically,
175-
that is, without any explicit changes in their codebase. The only requirement
176-
is that they already implement the ``__array_function__`` protocol, as
177-
described by NEP-18 [2]_.
178-
179-
References and Footnotes
180-
------------------------
181-
182-
.. [1] `NEP-18 - A dispatch mechanism for NumPy's high level array functions <https://numpy.org/neps/nep-0018-array-function-protocol.html>`_.
183-
184-
.. [2] `NEP 30 — Duck Typing for NumPy Arrays - Implementation <https://numpy.org/neps/nep-0030-duck-array-protocol.html>`_.
185-
186-
.. [3] `NEP 31 — Context-local and global overrides of the NumPy API <https://github.com/numpy/numpy/pull/14389>`_.
187-
188-
.. [4] `PEP 3102 — Keyword-Only Arguments <https://www.python.org/dev/peps/pep-3102/>`_.
347+
.. [4] `NEP 31 — Context-local and global overrides of the NumPy API <https://github.com/numpy/numpy/pull/14389>`_.
189348
190349
.. [5] `Array creation routines <https://docs.scipy.org/doc/numpy-1.17.0/reference/routines.array-creation.html>`_.
191350
351+
.. [6] `NEP 37 — A dispatch protocol for NumPy-like modules <https://numpy.org/neps/nep-0037-array-module.html>`_.
352+
192353
Copyright
193354
---------
194355
