@@ -5,29 +5,41 @@ Interoperability with NumPy
NumPy's ndarray objects provide both a high-level API for operations on
array-structured data and a concrete implementation of the API based on
- :ref:`strided in-RAM storage <arrays>`.
- While this API is powerful and fairly general, its concrete implementation has
- limitations. As datasets grow and NumPy becomes used in a variety of new
- environments and architectures, there are cases where the strided in-RAM storage
- strategy is inappropriate, which has caused different libraries to reimplement
- this API for their own uses. This includes GPU arrays (CuPy_), Sparse arrays
- (`scipy.sparse`, `PyData/Sparse <Sparse_>`_) and parallel arrays (Dask_ arrays)
- as well as various NumPy-like implementations in deep learning frameworks, like
- TensorFlow_ and PyTorch_. Similarly, there are many projects that build on top
- of the NumPy API for labeled and indexed arrays (XArray_), automatic
- differentiation (JAX_), masked arrays (`numpy.ma`), physical units
- (astropy.units_, pint_, unyt_), among others that add additional functionality
- on top of the NumPy API.
+ :ref:`strided in-RAM storage <arrays>`. While this API is powerful and fairly
+ general, its concrete implementation has limitations. As datasets grow and NumPy
+ becomes used in a variety of new environments and architectures, there are cases
+ where the strided in-RAM storage strategy is inappropriate, which has caused
+ different libraries to reimplement this API for their own uses. This includes
+ GPU arrays (CuPy_), Sparse arrays (`scipy.sparse`, `PyData/Sparse <Sparse_>`_)
+ and parallel arrays (Dask_ arrays) as well as various NumPy-like implementations
+ in deep learning frameworks, like TensorFlow_ and PyTorch_. Similarly, there are
+ many projects that build on top of the NumPy API for labeled and indexed arrays
+ (XArray_), automatic differentiation (JAX_), masked arrays (`numpy.ma`),
+ physical units (astropy.units_, pint_, unyt_), among others that add additional
+ functionality on top of the NumPy API.

Yet, users still want to work with these arrays using the familiar NumPy API and
re-use existing code with minimal (ideally zero) porting overhead. With this
goal in mind, various protocols are defined for implementations of
- multi-dimensional arrays with high-level APIs matching NumPy.
+ multi-dimensional arrays with high-level APIs matching NumPy.

- Using arbitrary objects in NumPy
- --------------------------------
+ Broadly speaking, there are three groups of features used for interoperability
+ with NumPy:

- When NumPy functions encounter a foreign object, they will try (in order):
+ 1. Methods of turning a foreign object into an ndarray;
+ 2. Methods of deferring execution from a NumPy function to another array
+    library;
+ 3. Methods that use NumPy functions and return an instance of a foreign object.
+
+ We describe these features below.
+
+
+ 1. Using arbitrary objects in NumPy
+ -----------------------------------
+
+ The first set of interoperability features from the NumPy API allows foreign
+ objects to be treated as NumPy arrays whenever possible. When NumPy functions
+ encounter a foreign object, they will try (in order):

1. The buffer protocol, described :py:doc:`in the Python C-API documentation
   <c-api/buffer>`.
@@ -106,18 +118,22 @@ as the original object and any attributes/behavior it may have had, is lost.
To see an example of a custom array implementation including the use of
``__array__()``, see :ref:`basics.dispatch`.
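
As a rough sketch of this conversion path, a foreign object can opt in simply by
defining ``__array__`` (the ``Wrapper`` class below is hypothetical and only for
illustration):

>>> import numpy as np
>>> class Wrapper:
...     """Hypothetical container that exposes its data to NumPy."""
...     def __init__(self, data):
...         self._data = data
...     def __array__(self, dtype=None):
...         # called by NumPy whenever a plain ndarray view of the data is needed
...         return np.asarray(self._data, dtype=dtype)
>>> np.asarray(Wrapper([1.0, 2.0, 3.0]))
array([1., 2., 3.])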

- Operating on foreign objects without converting
- -----------------------------------------------
+
+ 2. Operating on foreign objects without converting
+ --------------------------------------------------
+
+ A second set of methods defined by the NumPy API allows us to defer the
+ execution from a NumPy function to another array library.

Consider the following function.

>>> import numpy as np
>>> def f(x):
...     return np.mean(np.exp(x))

- Note that `np.exp` is a :ref:`ufunc <ufuncs-basics>`, which means that it
- operates on ndarrays in an element-by-element fashion. On the other hand,
- `np.mean` operates along one of the array's axes.
+ Note that `np.exp <numpy.exp>` is a :ref:`ufunc <ufuncs-basics>`, which means
+ that it operates on ndarrays in an element-by-element fashion. On the other
+ hand, `np.mean <numpy.mean>` operates along one of the array's axes.
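
For a quick illustration of that difference:

>>> np.exp(np.zeros(3))                                    # applied element by element
array([1., 1., 1.])
>>> np.mean(np.array([[1.0, 2.0], [3.0, 4.0]]), axis=0)    # reduces along an axis
array([2., 3.])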

We can apply ``f`` to a NumPy ndarray object directly:
@@ -126,8 +142,7 @@ We can apply ``f`` to a NumPy ndarray object directly:
21.1977562209304

We would like this function to work equally well with any NumPy-like array
- object. Some of this is possible today with various protocol mechanisms within
- NumPy.
+ object.

NumPy allows a class to indicate that it would like to handle computations in a
custom-defined way through the following interfaces:
@@ -139,15 +154,15 @@ custom-defined way through the following interfaces:

As long as foreign objects implement the ``__array_ufunc__`` or
``__array_function__`` protocols, it is possible to operate on them without the
- need for explicit conversion.
+ need for explicit conversion.

The ``__array_ufunc__`` protocol
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A :ref:`universal function (or ufunc for short) <ufuncs-basics>` is a
“vectorized” wrapper for a function that takes a fixed number of specific inputs
and produces a fixed number of specific outputs. The output of the ufunc (and
- its methods) is not necessarily an ndarray, if not all input arguments are
+ its methods) is not necessarily a ndarray, if not all input arguments are
ndarrays. Indeed, if any input defines an ``__array_ufunc__`` method, control
will be passed completely to that function, i.e., the ufunc is overridden. The
``__array_ufunc__`` method defined on that (non-ndarray) object has access to
@@ -173,6 +188,36 @@ The semantics of ``__array_function__`` are very similar to ``__array_ufunc__``,
except the operation is specified by an arbitrary callable object rather than a
ufunc instance and method. For more details, see :ref:`NEP18`.
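
As a rough sketch of what such an override can look like (the ``Logged`` wrapper
below is hypothetical; it intercepts ufunc calls through ``__array_ufunc__``, and
``__array_function__`` can be implemented in the same spirit):

>>> import numpy as np
>>> class Logged:
...     """Hypothetical array-like that reports which ufunc was dispatched to it."""
...     def __init__(self, data):
...         self.data = np.asarray(data)
...     def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
...         print(f"dispatched: {ufunc.__name__}.{method}")
...         # unwrap Logged inputs, run the ufunc, and re-wrap the result
...         inputs = tuple(i.data if isinstance(i, Logged) else i for i in inputs)
...         return Logged(getattr(ufunc, method)(*inputs, **kwargs))
>>> np.exp(Logged([0.0, 1.0])).data
dispatched: exp.__call__
array([1.        , 2.71828183])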

+
+ 3. Returning foreign objects
+ ----------------------------
+
+ A third type of feature set is meant to use the NumPy function implementation
+ and then convert the return value back into an instance of the foreign object.
+ The ``__array_finalize__`` and ``__array_wrap__`` methods act behind the scenes
+ to ensure that the return type of a NumPy function can be specified as needed.
+
+ The ``__array_finalize__`` method is the mechanism that NumPy provides to allow
+ subclasses to handle the various ways that new instances get created. This
+ method is called whenever the system internally allocates a new array from an
+ object which is a subclass (subtype) of the ndarray. It can be used to change
+ attributes after construction, or to update meta-information from the “parent.”
+
+ The ``__array_wrap__`` method “wraps up the action” in the sense of allowing a
+ subclass to set the type of the return value and update attributes and metadata.
+ This can be seen as the opposite of the ``__array__`` method. At the end of
+ every ufunc, this method is called on the input object with the
+ highest *array priority*, or the output object if one was specified. The
+ ``__array_priority__`` attribute is used to determine what type of object to
+ return in situations where there is more than one possibility for the Python
+ type of the returned object. Subclasses may opt to use this method to transform
+ the output array into an instance of the subclass and update metadata before
+ returning the array to the user.
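
As a minimal sketch of these hooks in action (the ``LabeledArray`` subclass below
is hypothetical), ``__array_finalize__`` lets metadata survive so that results
come back as instances of the subclass:

>>> import numpy as np
>>> class LabeledArray(np.ndarray):
...     """Hypothetical ndarray subclass that carries a label through operations."""
...     def __new__(cls, input_array, label=None):
...         obj = np.asarray(input_array).view(cls)
...         obj.label = label
...         return obj
...     def __array_finalize__(self, obj):
...         # runs on explicit construction, view casting and new-from-template
...         self.label = getattr(obj, 'label', None)
>>> a = LabeledArray([1, 2, 3], label='counts')
>>> b = np.multiply(a, 2)
>>> type(b).__name__, b.label
('LabeledArray', 'counts')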
+
+ For more information on these methods, see :ref:`basics.subclassing` and
+ :ref:`specific-array-subtyping`.
+
+

Interoperability examples
-------------------------
@@ -218,6 +263,7 @@ We can even do operations with other ndarrays:
>>> type(result)
numpy.ndarray

+
Example: PyTorch tensors
~~~~~~~~~~~~~~~~~~~~~~~~
@@ -343,8 +389,11 @@ Further reading
- :ref:`basics.dispatch`
- :ref:`special-attributes-and-methods` (details on the ``__array_ufunc__`` and
  ``__array_function__`` protocols)
- - `NumPy roadmap: interoperability
-   <https://numpy.org/neps/roadmap.html#interoperability>`__
+ - :ref:`basics.subclassing` (details on the ``__array_wrap__`` and
+   ``__array_finalize__`` methods)
+ - :ref:`specific-array-subtyping` (more details on the implementation of
+   ``__array_finalize__``, ``__array_wrap__`` and ``__array_priority__``)
+ - :doc:`NumPy roadmap: interoperability <neps:roadmap>`
- `PyTorch documentation on the Bridge with NumPy
  <https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html#bridge-to-np-label>`__