@@ -10,7 +10,7 @@ NEP 18 — A dispatch mechanism for NumPy's high level array functions
 :Status: Provisional
 :Type: Standards Track
 :Created: 2018-05-29
-:Updated: 2019-04-11
+:Updated: 2019-05-25
 :Resolution: https://mail.python.org/pipermail/numpy-discussion/2018-August/078493.html

 Abstract
@@ -98,12 +98,15 @@ A prototype implementation can be found in

 .. note::

-   Dispatch with the ``__array_function__`` protocol has been implemented on
-   NumPy's master branch but is not yet enabled by default. In NumPy 1.16,
-   you will need to set the environment variable
-   ``NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1`` before importing NumPy to test
-   NumPy function overrides. We anticipate the protocol will be enabled by
-   default in NumPy 1.17.
+   Dispatch with the ``__array_function__`` protocol has been implemented but is
+   not yet enabled by default:
+
+   - In NumPy 1.16, you need to set the environment variable
+     ``NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=1`` before importing NumPy to test
+     NumPy function overrides.
+   - In NumPy 1.17, the protocol will be enabled by default, but can be disabled
+     with ``NUMPY_EXPERIMENTAL_ARRAY_FUNCTION=0``.
+   - Eventually, expect ``__array_function__`` to always be enabled.
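The opt-in step described in the note can be sketched as follows. This is a minimal sketch, assuming (as the note states) that the variable has to be set before NumPy is first imported in the process; the ``try``/``except`` is not part of the NEP and is only there so the snippet also runs where NumPy is not installed.

```python
import os

# Set the flag before the first ``import numpy`` in this process; per the
# note above, NumPy 1.16 only tests overrides when it is set in advance.
os.environ["NUMPY_EXPERIMENTAL_ARRAY_FUNCTION"] = "1"

try:
    import numpy as np
except ImportError:  # assumption: keep the sketch runnable without NumPy
    np = None
```

Under the same assumption, setting the value to ``"0"`` instead would be the way to disable the protocol in NumPy 1.17.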

 The interface
 ~~~~~~~~~~~~~
@@ -208,75 +211,6 @@ were explicitly used in the NumPy function call.
 be impossible to correctly override NumPy functions from another object
 if the operation also includes one of your objects.

-Avoiding nested ``__array_function__`` overrides
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-The special ``__skip_array_function__`` attribute found on NumPy functions that
-support overrides with ``__array_function__`` allows for calling these
-functions without any override checks.
-
-``__skip_array_function__`` always points back to the original NumPy-array
-specific implementation of a function. These functions do not check for
-``__array_function__`` overrides, and instead usually coerce all of their
-array-like arguments to NumPy arrays.
-
-.. note::
-
-   ``__skip_array_function__`` was not included as part of the initial
-   opt-in-only preview of ``__array_function__`` in NumPy 1.16.
-
-Defaulting to NumPy's coercive implementations
-''''''''''''''''''''''''''''''''''''''''''''''
-
-Some projects may prefer to default to NumPy's implementation, rather than
-explicitly defining implementing a supported API. This allows for incrementally
-overriding NumPy's API in projects that already support it implicitly by
-allowing their objects to be converted into NumPy arrays (e.g., because they
-implemented special methods such as ``__array__``). We don't recommend this
-for most new projects ("Explicit is better than implicit"), but in some cases
-it is the most expedient option.
-
-Adapting the previous example:
-
-.. code:: python
-
-    class MyArray:
-        def __array_function__(self, func, types, args, kwargs):
-            # It is still best practice to defer to unrecognized types
-            if not all(issubclass(t, (MyArray, np.ndarray)) for t in types):
-                return NotImplemented
-
-            my_func = HANDLED_FUNCTIONS.get(func)
-            if my_func is None:
-                return func.__skip_array_function__(*args, **kwargs)
-            return my_func(*args, **kwargs)
-
-        def __array__(self, dtype):
-            # convert this object into a NumPy array
-
-Now, if a NumPy function that isn't explicitly handled is called on
-``MyArray`` object, the operation will act (almost) as if MyArray's
-``__array_function__`` method never existed.
-
-Explicitly reusing NumPy's implementation
-'''''''''''''''''''''''''''''''''''''''''
-
-``__skip_array_function__`` is also convenient for cases where an explicit
-set of NumPy functions should still use NumPy's implementation, by
-calling ``func.__skip__array_function__(*args, **kwargs)`` inside
-``__array_function__`` instead of ``func(*args, **kwargs)`` (which would
-lead to infinite recursion). For example, to explicitly reuse NumPy's
-``array_repr()`` function on a custom array type:
-
-.. code:: python
-
-    class MyArray:
-        def __array_function__(self, func, types, args, kwargs):
-            ...
-            if func is np.array_repr:
-                return np.array_repr.__skip_array_function__(*args, **kwargs)
-            ...
-
 Necessary changes within the NumPy codebase itself
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -400,20 +334,18 @@ The ``__array_function__`` method on ``numpy.ndarray``

 The use cases for subclasses with ``__array_function__`` are the same as those
 with ``__array_ufunc__``, so ``numpy.ndarray`` also defines a
-``__array_function__`` method.
-
-``ndarray.__array_function__`` is a trivial case of the "Defaulting to NumPy's
-implementation" strategy described above: *every* NumPy function on NumPy
-arrays is defined by calling NumPy's own implementation if there are other
-overrides:
+``__array_function__`` method:

 .. code:: python

     def __array_function__(self, func, types, args, kwargs):
         if not all(issubclass(t, ndarray) for t in types):
             # Defer to any non-subclasses that implement __array_function__
             return NotImplemented
-        return func.__skip_array_function__(*args, **kwargs)
+
+        # Use NumPy's private implementation without __array_function__
+        # dispatching
+        return func._implementation(*args, **kwargs)

 This method matches NumPy's dispatching rules, so for the most part it is
 possible to pretend that ``ndarray.__array_function__`` does not exist.
@@ -427,9 +359,9 @@ returns ``NotImplemented``, NumPy's implementation of the function will be
 called instead of raising an exception. This is appropriate since subclasses
 are `expected to be substitutable <https://en.wikipedia.org/wiki/Liskov_substitution_principle>`_.

-Notice that the ``__skip_array_function__`` function attribute allows us
-to avoid the special cases for NumPy arrays that were needed in the
-``__array_ufunc__`` protocol.
+Note that the private ``_implementation`` attribute, defined below in the
+``array_function_dispatch`` decorator, allows us to avoid the special cases for
+NumPy arrays that were needed in the ``__array_ufunc__`` protocol.

 Changes within NumPy functions
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -441,9 +373,8 @@ but of fairly simple and innocuous code that should complete quickly and
 without effect if no arguments implement the ``__array_function__``
 protocol.

-In most cases, these functions should written using the
-``array_function_dispatch`` decorator. Error checking aside, here's what the
-core implementation looks like:
+To achieve this, we define an ``array_function_dispatch`` decorator to rewrite
+NumPy functions. The basic implementation is as follows:

 .. code:: python

@@ -457,25 +388,27 @@ core implementation looks like:
                 implementation, public_api, relevant_args, args, kwargs)
         if module is not None:
             public_api.__module__ = module
-        public_api.__skip_array_function__ = implementation
+        # for ndarray.__array_function__
+        public_api._implementation = implementation
         return public_api
     return decorator

     # example usage
-    def broadcast_to(array, shape, subok=None):
+    def _broadcast_to_dispatcher(array, shape, subok=None):
         return (array,)

-    @array_function_dispatch(broadcast_to, module='numpy')
+    @array_function_dispatch(_broadcast_to_dispatcher, module='numpy')
     def broadcast_to(array, shape, subok=False):
         ...  # existing definition of np.broadcast_to

 Using a decorator is great! We don't need to change the definitions of
 existing NumPy functions, and only need to write a few additional lines
-to define dispatcher function. We originally thought that we might want to
-implement dispatching for some NumPy functions without the decorator, but
-so far it seems to cover every case.
+for the dispatcher function. We could even reuse a single dispatcher for
+families of functions with the same signature (e.g., ``sum`` and ``prod``).
+For such functions, the largest change could be adding a few lines to the
+docstring to note which arguments are checked for overloads.
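The idea of sharing one dispatcher across a family of functions can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: the decorator is a simplified stand-in for NumPy's ``array_function_dispatch`` (no ``module`` argument, and the dispatch loop is a reduced version of what ``implement_array_function`` does), and ``my_sum``, ``my_prod``, ``_reduction_dispatcher`` and ``Doubled`` are hypothetical names, not part of NumPy.

```python
import functools
import math


def array_function_dispatch(dispatcher):
    """Simplified stand-in for NumPy's decorator (illustration only)."""
    def decorator(implementation):
        @functools.wraps(implementation)
        def public_api(*args, **kwargs):
            relevant_args = dispatcher(*args, **kwargs)
            overrides = [arg for arg in relevant_args
                         if hasattr(type(arg), "__array_function__")]
            if not overrides:
                # No argument opts in: run the ordinary implementation.
                return implementation(*args, **kwargs)
            types = tuple({type(arg) for arg in overrides})
            for arg in overrides:
                result = arg.__array_function__(public_api, types, args, kwargs)
                if result is not NotImplemented:
                    return result
            raise TypeError(
                f"no implementation found for {public_api.__name__!r}")
        # Keep the undispatched function reachable, as in the hunk above.
        public_api._implementation = implementation
        return public_api
    return decorator


def _reduction_dispatcher(a, axis=None):
    # Shared by every reduction with this signature: only ``a`` is checked.
    return (a,)


@array_function_dispatch(_reduction_dispatcher)
def my_sum(a, axis=None):
    """Sum of the elements of ``a`` (``axis`` is ignored in this sketch)."""
    return sum(a)


@array_function_dispatch(_reduction_dispatcher)
def my_prod(a, axis=None):
    """Product of the elements of ``a`` (``axis`` is ignored in this sketch)."""
    return math.prod(a)


class Doubled:
    """Hypothetical duck array that overrides ``my_sum`` only."""
    def __init__(self, values):
        self.values = list(values)

    def __array_function__(self, func, types, args, kwargs):
        if func is my_sum:
            return 2 * sum(self.values)
        return NotImplemented
```

In this sketch, plain lists go straight to the wrapped implementations, ``my_sum(Doubled([1, 2, 4]))`` is handled by the override, and ``my_prod(Doubled([1, 2, 4]))`` raises ``TypeError`` because the only override returns ``NotImplemented``.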

-Within NumPy's implementation, it's worth calling out the decorator's use of
+It's particularly worth calling out the decorator's use of
 ``functools.wraps``:

 - This ensures that the wrapped function has the same name and docstring as
@@ -489,14 +422,6 @@ Within NumPy's implementation, it's worth calling out the decorator's use of
 The example usage illustrates several best practices for writing dispatchers
 relevant to NumPy contributors:

-- We gave the "dispatcher" function ``broadcast_to`` the exact same name and
-  arguments as the "implementation" function. The matching arguments are
-  required, because the function generated by ``array_function_dispatch`` will
-  call the dispatcher in *exactly* the same way as it was called. The matching
-  function name isn't strictly necessary, but ensures that Python reports the
-  original function name in error messages if invalid arguments are used, e.g.,
-  ``TypeError: broadcast_to() got an unexpected keyword argument``.
-
 - We passed the ``module`` argument, which in turn sets the ``__module__``
   attribute on the generated function. This is for the benefit of better error
   messages, here for errors raised internally by NumPy when no implementation
@@ -600,36 +525,6 @@ concerned about performance differences measured in microsecond(s) on NumPy
 functions, because it's difficult to do *anything* in Python in less than a
 microsecond.

-For rare cases where NumPy functions are called in performance critical inner
-loops on small arrays or scalars, it is possible to avoid the overhead of
-dispatching by calling the versions of NumPy functions skipping
-``__array_function__`` checks available in the ``__skip_array_function__``
-attribute. For example:
-
-.. code:: python
-
-    dot = getattr(np.dot, '__skip_array_function__', np.dot)
-
-    def naive_matrix_power(x, n):
-        x = np.array(x)
-        for _ in range(n):
-            dot(x, x, out=x)
-        return x
-
-NumPy will use this internally to minimize overhead for NumPy functions
-defined in terms of other NumPy functions, but
-**we do not recommend it for most users**:
-
-- The specific implementation of overrides is still provisional, so the
-  ``__skip_array_function__`` attribute on particular functions could be
-  removed in any NumPy release without warning.
-  For this reason, access to ``__skip_array_function__`` attribute outside of
-  ``__array_function__`` methods should *always* be guarded by using
-  ``getattr()`` with a default value.
-- In cases where this makes a difference, you will get far greater speed-ups
-  rewriting your inner loops in a compiled language, e.g., with Cython or
-  Numba.
-
 Use outside of NumPy
 ~~~~~~~~~~~~~~~~~~~~

@@ -809,48 +704,60 @@ nearly every public function in NumPy's API. This does not preclude the future
 possibility of rewriting NumPy functions in terms of simplified core
 functionality with ``__array_function__`` and a protocol and/or base class for
 ensuring that arrays expose methods and properties like ``numpy.ndarray``.
+However, to work well this would require the possibility of implementing
+*some* but not all functions with ``__array_function__``, e.g., as described
+in the next section.

-Coercion to a NumPy array as a catch-all fallback
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Partial implementation of NumPy's API
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 With the current design, classes that implement ``__array_function__``
-to overload at least one function can opt-out of overriding other functions
-by using the ``__skip_array_function__`` function, as described above under
-"Defaulting to NumPy's implementation."
-
-However, this still results in different behavior than not implementing
-``__array_function__`` in at least one edge case. If multiple objects implement
-``__array_function__`` but don't know about each other NumPy will raise
-``TypeError`` if all methods return ``NotImplemented``, whereas if no arguments
-defined ``__array_function__`` methods it would attempt to coerce all of them
-to NumPy arrays.
-
-Alternatively, this could be "fixed" by writing a ``__array_function__``
-method that always calls ``__skip_array_function__()`` instead of returning
-``NotImplemented`` for some functions, but that would result in a type
-whose implementation cannot be overriden by over argumetns -- like NumPy
-arrays themselves prior to the introduction of this protocol.
-
-Either way, it is not possible to *exactly* maintain the current behavior of
-all NumPy functions if at least one more function is overriden. If preserving
-this behavior is important, we could potentially solve it by changing the
-handling of return values in ``__array_function__`` in either of two ways:
-
-1. Change the meaning of all arguments returning ``NotImplemented`` to indicate
-   that all arguments should be coerced to NumPy arrays and the operation
-   should be retried. However, many array libraries (e.g., scipy.sparse) really
-   don't want implicit conversions to NumPy arrays, and often avoid implementing
-   ``__array__`` for exactly this reason. Implicit conversions can result in
-   silent bugs and performance degradation.
+to overload at least one function implicitly declare an intent to
+implement the entire NumPy API. It's not possible to implement *only*
+``np.concatenate()`` on a type, but fall back to NumPy's default
+behavior of casting with ``np.asarray()`` for all other functions.
+
+This could present a backwards compatibility concern that would
+discourage libraries from adopting ``__array_function__`` in an
+incremental fashion. For example, currently most numpy functions will
+implicitly convert ``pandas.Series`` objects into NumPy arrays, behavior
+that assuredly many pandas users rely on. If pandas implemented
+``__array_function__`` only for ``np.concatenate``, unrelated NumPy
+functions like ``np.nanmean`` would suddenly break on pandas objects by
+raising TypeError.
+
+Even libraries that reimplement most of NumPy's public API sometimes rely upon
+using utility functions from NumPy without a wrapper. For example, both CuPy
+and JAX simply `use an alias <https://github.com/numpy/numpy/issues/12974>`_ to
+``np.result_type``, which already supports duck-types with a ``dtype``
+attribute.
+
+With ``__array_ufunc__``, it's possible to alleviate this concern by
+casting all arguments to numpy arrays and re-calling the ufunc, but the
+heterogeneous function signatures supported by ``__array_function__``
+make it impossible to implement this generic fallback behavior for
+``__array_function__``.
+
+We considered three possible ways to resolve this issue, but none were
+entirely satisfactory:
+
+1. Change the meaning of all arguments returning ``NotImplemented`` from
+   ``__array_function__`` to indicate that all arguments should be coerced to
+   NumPy arrays and the operation should be retried. However, many array
+   libraries (e.g., scipy.sparse) really don't want implicit conversions to
+   NumPy arrays, and often avoid implementing ``__array__`` for exactly this
+   reason. Implicit conversions can result in silent bugs and performance
+   degradation.

    Potentially, we could enable this behavior only for types that implement
    ``__array__``, which would resolve the most problematic cases like
    scipy.sparse. But in practice, a large fraction of classes that present a
    high level API like NumPy arrays already implement ``__array__``. This would
    preclude reliable use of NumPy's high level API on these objects.
+
 2. Use another sentinel value of some sort, e.g.,
-   ``np.NotImplementedButCoercible``, to indicate that a class implementing part
-   of NumPy's higher level array API is coercible as a fallback. If all
+   ``np.NotImplementedButCoercible``, to indicate that a class implementing
+   part of NumPy's higher level array API is coercible as a fallback. If all
    arguments return ``NotImplementedButCoercible``, arguments would be coerced
    and the operation would be retried.

@@ -863,10 +770,20 @@ handling of return values in ``__array_function__`` in either of two ways:
    logic an arbitrary number of times. Either way, the dispatching rules would
    definitely get more complex and harder to reason about.

-At present, neither of these alternatives looks like a good idea. Reusing
-``__skip_array_function__()`` looks like it should suffice for most purposes.
-Arguably this loss in flexibility is a virtue: fallback implementations often
-result in unpredictable and undesired behavior.
+3. Allow access to NumPy's implementation of functions, e.g., in the form of
+   a publicly exposed ``__skip_array_function__`` attribute on the NumPy
+   functions. This would allow for falling back to NumPy's implementation by
+   using ``func.__skip_array_function__`` inside ``__array_function__``
+   methods, and could also potentially be used to avoid the
+   overhead of dispatching. However, it runs the risk of potentially exposing
+   details of NumPy's implementations for NumPy functions that do not call
+   ``np.asarray()`` internally. See
+   `this note <https://mail.python.org/pipermail/numpy-discussion/2019-May/079541.html>`_
+   for a summary of the full discussion.
+
+These solutions would solve real use cases, but at the cost of additional
+complexity. We would like to gain experience with how ``__array_function__`` is
+actually used before making decisions that would be difficult to roll back.

 A magic decorator that inspects type annotations
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -965,8 +882,7 @@ There are two other arguments that we think *might* be important to pass to
 - Access to the non-dispatched implementation (i.e., before wrapping with
   ``array_function_dispatch``) in ``ndarray.__array_function__`` would allow
   us to drop special case logic for that method from
-  ``implement_array_function``. *Update: This has been implemented, as the
-  ``__skip_array_function__`` attributes.*
+  ``implement_array_function``.
 - Access to the ``dispatcher`` function passed into
   ``array_function_dispatch()`` would allow ``__array_function__``
   implementations to determine the list of "array-like" arguments in a generic