DOC: revert __skip_array_function__ from NEP-18 · numpy/numpy@a8ba10f

Commit a8ba10f

DOC: revert __skip_array_function__ from NEP-18
This reverts most of the changes from GH-13305, and adds a brief discussion of ``__skip_array_function__`` into the "Alternatives" section. We still use NumPy's implementation of the function internally inside ``ndarray.__array_function__``, but I've given it a new name in the NEP (``_implementation``) to indicate that it's a private API.
1 parent cf704e7 commit a8ba10f

1 file changed: +79 -166 lines changed


doc/neps/nep-0018-array-function-protocol.rst

Lines changed: 79 additions & 166 deletions
@@ -10,7 +10,7 @@ NEP 18 — A dispatch mechanism for NumPy's high level array functions
1010
:Status: Provisional
1111
:Type: Standards Track
1212
:Created: 2018-05-29
13-
:Updated: 2019-04-11
13+
:Updated: 2019-05-25
1414
:Resolution: https://mail.python.org/pipermail/numpy-discussion/2018-August/078493.html
1515

1616
Abstact
@@ -208,75 +208,6 @@ were explicitly used in the NumPy function call.
208208
be impossible to correctly override NumPy functions from another object
209209
if the operation also includes one of your objects.
210210

211-
Avoiding nested ``__array_function__`` overrides
212-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
213-
214-
The special ``__skip_array_function__`` attribute found on NumPy functions that
215-
support overrides with ``__array_function__`` allows for calling these
216-
functions without any override checks.
217-
218-
``__skip_array_function__`` always points back to the original NumPy-array
219-
specific implementation of a function. These functions do not check for
220-
``__array_function__`` overrides, and instead usually coerce all of their
221-
array-like arguments to NumPy arrays.
222-
223-
.. note::
224-
225-
``__skip_array_function__`` was not included as part of the initial
226-
opt-in-only preview of ``__array_function__`` in NumPy 1.16.
227-
228-
Defaulting to NumPy's coercive implementations
229-
''''''''''''''''''''''''''''''''''''''''''''''
230-
231-
Some projects may prefer to default to NumPy's implementation, rather than
232-
explicitly defining implementing a supported API. This allows for incrementally
233-
overriding NumPy's API in projects that already support it implicitly by
234-
allowing their objects to be converted into NumPy arrays (e.g., because they
235-
implemented special methods such as ``__array__``). We don't recommend this
236-
for most new projects ("Explicit is better than implicit"), but in some cases
237-
it is the most expedient option.
238-
239-
Adapting the previous example:
240-
241-
.. code:: python
242-
243-
class MyArray:
244-
def __array_function__(self, func, types, args, kwargs):
245-
# It is still best practice to defer to unrecognized types
246-
if not all(issubclass(t, (MyArray, np.ndarray)) for t in types):
247-
return NotImplemented
248-
249-
my_func = HANDLED_FUNCTIONS.get(func)
250-
if my_func is None:
251-
return func.__skip_array_function__(*args, **kwargs)
252-
return my_func(*args, **kwargs)
253-
254-
def __array__(self, dtype):
255-
# convert this object into a NumPy array
256-
257-
Now, if a NumPy function that isn't explicitly handled is called on
258-
``MyArray`` object, the operation will act (almost) as if MyArray's
259-
``__array_function__`` method never existed.
260-
261-
Explicitly reusing NumPy's implementation
262-
'''''''''''''''''''''''''''''''''''''''''
263-
264-
``__skip_array_function__`` is also convenient for cases where an explicit
265-
set of NumPy functions should still use NumPy's implementation, by
266-
calling ``func.__skip__array_function__(*args, **kwargs)`` inside
267-
``__array_function__`` instead of ``func(*args, **kwargs)`` (which would
268-
lead to infinite recursion). For example, to explicitly reuse NumPy's
269-
``array_repr()`` function on a custom array type:
270-
271-
.. code:: python
272-
273-
class MyArray:
274-
def __array_function__(self, func, types, args, kwargs):
275-
...
276-
if func is np.array_repr:
277-
return np.array_repr.__skip_array_function__(*args, **kwargs)
278-
...
279-
280211
Necessary changes within the NumPy codebase itself
281212
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
282213

@@ -400,20 +331,18 @@ The ``__array_function__`` method on ``numpy.ndarray``
400331

401332
The use cases for subclasses with ``__array_function__`` are the same as those
402333
with ``__array_ufunc__``, so ``numpy.ndarray`` also defines a
403-
``__array_function__`` method.
404-
405-
``ndarray.__array_function__`` is a trivial case of the "Defaulting to NumPy's
406-
implementation" strategy described above: *every* NumPy function on NumPy
407-
arrays is defined by calling NumPy's own implementation if there are other
408-
overrides:
334+
``__array_function__`` method:
409335

410336
.. code:: python
411337
412338
def __array_function__(self, func, types, args, kwargs):
413339
if not all(issubclass(t, ndarray) for t in types):
414340
# Defer to any non-subclasses that implement __array_function__
415341
return NotImplemented
416-
return func.__skip_array_function__(*args, **kwargs)
342+
343+
# Use NumPy's private implementation without __array_function__
344+
# dispatching
345+
return func._implementation(*args, **kwargs)
417346
418347
This method matches NumPy's dispatching rules, so for most part it is
419348
possible to pretend that ``ndarray.__array_function__`` does not exist.
@@ -427,9 +356,9 @@ returns ``NotImplemented``, NumPy's implementation of the function will be
427356
called instead of raising an exception. This is appropriate since subclasses
428357
are `expected to be substitutable <https://en.wikipedia.org/wiki/Liskov_substitution_principle>`_.
429358

430-
Notice that the ``__skip_array_function__`` function attribute allows us
431-
to avoid the special cases for NumPy arrays that were needed in the
432-
``__array_ufunc__`` protocol.
359+
Note that the private ``_implementation`` attribute, defined below in the
360+
``array_function_dispatch`` decorator, allows us to avoid the special cases for
361+
NumPy arrays that were needed in the ``__array_ufunc__`` protocol.
433362

434363
Changes within NumPy functions
435364
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -441,9 +370,8 @@ but of fairly simple and innocuous code that should complete quickly and
441370
without effect if no arguments implement the ``__array_function__``
442371
protocol.
443372

444-
In most cases, these functions should written using the
445-
``array_function_dispatch`` decorator. Error checking aside, here's what the
446-
core implementation looks like:
373+
To achieve this, we define a ``array_function_dispatch`` decorator to rewrite
374+
NumPy functions. The basic implementation is as follows:
447375

448376
.. code:: python
449377
@@ -457,25 +385,27 @@ core implementation looks like:
457385
implementation, public_api, relevant_args, args, kwargs)
458386
if module is not None:
459387
public_api.__module__ = module
460-
public_api.__skip_array_function__ = implementation
388+
# for ndarray.__array_function__
389+
public_api._implementation = implementation
461390
return public_api
462391
return decorator
463392
464393
# example usage
465-
def broadcast_to(array, shape, subok=None):
394+
def _broadcast_to_dispatcher(array, shape, subok=None):
466395
return (array,)
467396
468-
@array_function_dispatch(broadcast_to, module='numpy')
397+
@array_function_dispatch(_broadcast_to_dispatcher, module='numpy')
469398
def broadcast_to(array, shape, subok=False):
470399
... # existing definition of np.broadcast_to
471400
472401
Using a decorator is great! We don't need to change the definitions of
473402
existing NumPy functions, and only need to write a few additional lines
474-
to define dispatcher function. We originally thought that we might want to
475-
implement dispatching for some NumPy functions without the decorator, but
476-
so far it seems to cover every case.
403+
for the dispatcher function. We could even reuse a single dispatcher for
404+
families of functions with the same signature (e.g., ``sum`` and ``prod``).
405+
For such functions, the largest change could be adding a few lines to the
406+
docstring to note which arguments are checked for overloads.
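
Reviewer sketch (not part of the commit): the paragraph above suggests that one dispatcher can serve a family of functions with the same signature. The pure-Python toy below illustrates that idea together with the private ``_implementation`` attribute. It deliberately avoids importing NumPy; ``implement_array_function`` here is a greatly simplified stand-in for NumPy's real dispatch logic, and ``total``, ``product``, ``_reduction_dispatcher`` and ``Wrapped`` are hypothetical names chosen for illustration only.

```python
import functools


def implement_array_function(implementation, public_api, relevant_args,
                             args, kwargs):
    # Simplified stand-in for NumPy's dispatch logic (assumption: no
    # subclass ordering, no error checking).
    overloaded = [a for a in relevant_args
                  if hasattr(type(a), "__array_function__")]
    if not overloaded:
        # No argument opts in to the protocol: run the plain implementation.
        return implementation(*args, **kwargs)
    types = tuple(dict.fromkeys(type(a) for a in overloaded))
    for arg in overloaded:
        result = arg.__array_function__(public_api, types, args, kwargs)
        if result is not NotImplemented:
            return result
    raise TypeError("no implementation found for {}".format(public_api.__name__))


def array_function_dispatch(dispatcher, module=None):
    def decorator(implementation):
        @functools.wraps(implementation)
        def public_api(*args, **kwargs):
            relevant_args = dispatcher(*args, **kwargs)
            return implement_array_function(
                implementation, public_api, relevant_args, args, kwargs)
        if module is not None:
            public_api.__module__ = module
        # Private hook, mirroring the attribute named in the diff.
        public_api._implementation = implementation
        return public_api
    return decorator


# One dispatcher shared by two functions with the same signature.
def _reduction_dispatcher(a, axis=None):
    return (a,)


@array_function_dispatch(_reduction_dispatcher)
def total(a, axis=None):
    """Sum the items of an iterable (toy analogue of a reduction)."""
    return sum(a)


@array_function_dispatch(_reduction_dispatcher)
def product(a, axis=None):
    """Multiply the items of an iterable (toy analogue of a reduction)."""
    result = 1
    for item in a:
        result *= item
    return result


class Wrapped:
    """Hypothetical duck type that reuses the undispatched implementation."""

    def __init__(self, values):
        self.values = list(values)

    def __iter__(self):
        return iter(self.values)

    def __array_function__(self, func, types, args, kwargs):
        if not all(issubclass(t, Wrapped) for t in types):
            return NotImplemented
        # Calling func(...) here would recurse; _implementation does not.
        return func._implementation(*args, **kwargs)
```

Because ``functools.wraps`` copies the name and docstring, ``total.__name__`` is still ``"total"`` even though the public function is the generated wrapper.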
477407

478-
Within NumPy's implementation, it's worth calling out the decorator's use of
408+
It's particularly worth calling out the decorator's use of
479409
``functools.wraps``:
480410

481411
- This ensures that the wrapped function has the same name and docstring as
@@ -489,14 +419,6 @@ Within NumPy's implementation, it's worth calling out the decorator's use of
489419
The example usage illustrates several best practices for writing dispatchers
490420
relevant to NumPy contributors:
491421

492-
- We gave the "dispatcher" function ``broadcast_to`` the exact same name and
493-
arguments as the "implementation" function. The matching arguments are
494-
required, because the function generated by ``array_function_dispatch`` will
495-
call the dispatcher in *exactly* the same way as it was called. The matching
496-
function name isn't strictly necessary, but ensures that Python reports the
497-
original function name in error messages if invalid arguments are used, e.g.,
498-
``TypeError: broadcast_to() got an unexpected keyword argument``.
499-
500422
- We passed the ``module`` argument, which in turn sets the ``__module__``
501423
attribute on the generated function. This is for the benefit of better error
502424
messages, here for errors raised internally by NumPy when no implementation
@@ -600,36 +522,6 @@ concerned about performance differences measured in microsecond(s) on NumPy
600522
functions, because it's difficult to do *anything* in Python in less than a
601523
microsecond.
602524

603-
For rare cases where NumPy functions are called in performance critical inner
604-
loops on small arrays or scalars, it is possible to avoid the overhead of
605-
dispatching by calling the versions of NumPy functions skipping
606-
``__array_function__`` checks available in the ``__skip_array_function__``
607-
attribute. For example:
608-
609-
.. code:: python
610-
611-
dot = getattr(np.dot, '__skip_array_function__', np.dot)
612-
613-
def naive_matrix_power(x, n):
614-
x = np.array(x)
615-
for _ in range(n):
616-
dot(x, x, out=x)
617-
return x
618-
619-
NumPy will use this internally to minimize overhead for NumPy functions
620-
defined in terms of other NumPy functions, but
621-
**we do not recommend it for most users**:
622-
623-
- The specific implementation of overrides is still provisional, so the
624-
``__skip_array_function__`` attribute on particular functions could be
625-
removed in any NumPy release without warning.
626-
For this reason, access to ``__skip_array_function__`` attribute outside of
627-
``__array_function__`` methods should *always* be guarded by using
628-
``getattr()`` with a default value.
629-
- In cases where this makes a difference, you will get far greater speed-ups
630-
rewriting your inner loops in a compiled language, e.g., with Cython or
631-
Numba.
632-
633525
Use outside of NumPy
634526
~~~~~~~~~~~~~~~~~~~~
635527

@@ -809,48 +701,60 @@ nearly every public function in NumPy's API. This does not preclude the future
809701
possibility of rewriting NumPy functions in terms of simplified core
810702
functionality with ``__array_function__`` and a protocol and/or base class for
811703
ensuring that arrays expose methods and properties like ``numpy.ndarray``.
704+
However, to work well this would require the possibility of implementing
705+
*some* but not all functions with ``__array_function__``, e.g., as described
706+
in the next section.
812707

813-
Coercion to a NumPy array as a catch-all fallback
814-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
708+
Partial implementation of NumPy's API
709+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
815710

816711
With the current design, classes that implement ``__array_function__``
817-
to overload at least one function can opt-out of overriding other functions
818-
by using the ``__skip_array_function__`` function, as described above under
819-
"Defaulting to NumPy's implementation."
820-
821-
However, this still results in different behavior than not implementing
822-
``__array_function__`` in at least one edge case. If multiple objects implement
823-
``__array_function__`` but don't know about each other NumPy will raise
824-
``TypeError`` if all methods return ``NotImplemented``, whereas if no arguments
825-
defined ``__array_function__`` methods it would attempt to coerce all of them
826-
to NumPy arrays.
827-
828-
Alternatively, this could be "fixed" by writing a ``__array_function__``
829-
method that always calls ``__skip_array_function__()`` instead of returning
830-
``NotImplemented`` for some functions, but that would result in a type
831-
whose implementation cannot be overriden by over argumetns -- like NumPy
832-
arrays themselves prior to the introduction of this protocol.
833-
834-
Either way, it is not possible to *exactly* maintain the current behavior of
835-
all NumPy functions if at least one more function is overriden. If preserving
836-
this behavior is important, we could potentially solve it by changing the
837-
handling of return values in ``__array_function__`` in either of two ways:
838-
839-
1. Change the meaning of all arguments returning ``NotImplemented`` to indicate
840-
that all arguments should be coerced to NumPy arrays and the operation
841-
should be retried. However, many array libraries (e.g., scipy.sparse) really
842-
don't want implicit conversions to NumPy arrays, and often avoid implementing
843-
``__array__`` for exactly this reason. Implicit conversions can result in
844-
silent bugs and performance degradation.
712+
to overload at least one function implicitly declare an intent to
713+
implement the entire NumPy API. It's not possible to implement *only*
714+
``np.concatenate()`` on a type, but fall back to NumPy's default
715+
behavior of casting with ``np.asarray()`` for all other functions.
716+
717+
This could present a backwards compatibility concern that would
718+
discourage libraries from adopting ``__array_function__`` in an
719+
incremental fashion. For example, currently most numpy functions will
720+
implicitly convert ``pandas.Series`` objects into NumPy arrays, behavior
721+
that assuredly many pandas users rely on. If pandas implemented
722+
``__array_function__`` only for ``np.concatenate``, unrelated NumPy
723+
functions like ``np.nanmean`` would suddenly break on pandas objects by
724+
raising TypeError.
725+
726+
Even libraries that reimplement most of NumPy's public API sometimes rely upon
727+
using utility functions from NumPy without a wrapper. For example, both CuPy
728+
and JAX simply `use an alias <https://github.com/numpy/numpy/issues/12974>`_ to
729+
``np.result_type``, which already supports duck-types with a ``dtype``
730+
attribute.
731+
732+
With ``__array_ufunc__``, it's possible to alleviate this concern by
733+
casting all arguments to numpy arrays and re-calling the ufunc, but the
734+
heterogeneous function signatures supported by ``__array_function__``
735+
make it impossible to implement this generic fallback behavior for
736+
``__array_function__``.
737+
738+
We considered three possible ways to resolve this issue, but none were
739+
entirely satisfactory:
740+
741+
1. Change the meaning of all arguments returning ``NotImplemented`` from
742+
``__array_function__`` to indicate that all arguments should be coerced to
743+
NumPy arrays and the operation should be retried. However, many array
744+
libraries (e.g., scipy.sparse) really don't want implicit conversions to
745+
NumPy arrays, and often avoid implementing ``__array__`` for exactly this
746+
reason. Implicit conversions can result in silent bugs and performance
747+
degradation.
845748

846749
Potentially, we could enable this behavior only for types that implement
847750
``__array__``, which would resolve the most problematic cases like
848751
scipy.sparse. But in practice, a large fraction of classes that present a
849752
high level API like NumPy arrays already implement ``__array__``. This would
850753
preclude reliable use of NumPy's high level API on these objects.
754+
851755
2. Use another sentinel value of some sort, e.g.,
852-
``np.NotImplementedButCoercible``, to indicate that a class implementing part
853-
of NumPy's higher level array API is coercible as a fallback. If all
756+
``np.NotImplementedButCoercible``, to indicate that a class implementing
757+
part of NumPy's higher level array API is coercible as a fallback. If all
854758
arguments return ``NotImplementedButCoercible``, arguments would be coerced
855759
and the operation would be retried.
856760

@@ -863,10 +767,20 @@ handling of return values in ``__array_function__`` in either of two ways:
863767
logic an arbitrary number of times. Either way, the dispatching rules would
864768
definitely get more complex and harder to reason about.
865769

866-
At present, neither of these alternatives looks like a good idea. Reusing
867-
``__skip_array_function__()`` looks like it should suffice for most purposes.
868-
Arguably this loss in flexibility is a virtue: fallback implementations often
869-
result in unpredictable and undesired behavior.
770+
3. Allow access to NumPy's implementation of functions, e.g., in the form of
771+
a publicly exposed ``__skip_array_function__`` attribute on the NumPy
772+
functions. This would allow for falling back to NumPy's implementation by
773+
using ``func.__skip_array_function__`` inside ``__array_function__``
774+
methods, and could also potentially be used to be used to avoid the
775+
overhead of dispatching. However, it runs the risk of potentially exposing
776+
details of NumPy's implementations for NumPy functions that do not call
777+
``np.asarray()`` internally. See
778+
`this note <https://mail.python.org/pipermail/numpy-discussion/2019-May/079541.html>`_
779+
for a summary of the full discussion.
780+
781+
These solutions would solve real use cases, but at the cost of additional
782+
complexity. We would like to gain experience with how ``__array_function__`` is
783+
actually used before making decisions that would be difficult to roll back.
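
Reviewer sketch (not part of the commit): the backwards-compatibility concern described in this section can be reproduced with a toy dispatcher. The code below is an assumption-laden illustration in pure Python, not NumPy's implementation; ``dispatch``, ``concatenate``, ``mean``, ``PlainSeries`` and ``PartialSeries`` are hypothetical names. A type that only relies on coercion keeps working everywhere, while a type that overrides a single function starts raising ``TypeError`` for every function it does not handle.

```python
def concatenate(*sequences):
    # Toy "library" function operating on plain lists.
    return [item for seq in sequences for item in seq]


def mean(values):
    # Toy "library" function operating on a plain list.
    return sum(values) / len(values)


def dispatch(func, *args):
    # Simplified protocol: consult overrides first, coerce only if none exist.
    overloaded = [a for a in args if hasattr(type(a), "__array_function__")]
    if not overloaded:
        # Legacy behaviour: coerce every argument (here, to a list).
        return func(*[list(a) for a in args])
    types = tuple(dict.fromkeys(type(a) for a in overloaded))
    for arg in overloaded:
        result = arg.__array_function__(func, types, args, {})
        if result is not NotImplemented:
            return result
    raise TypeError("no implementation found for {}".format(func.__name__))


class PlainSeries:
    """Coercible container that does not implement the protocol."""

    def __init__(self, values):
        self.values = list(values)

    def __iter__(self):
        return iter(self.values)


class PartialSeries(PlainSeries):
    """Same container, but overriding only ``concatenate``."""

    def __array_function__(self, func, types, args, kwargs):
        if func is concatenate:
            return PartialSeries(item for seq in args for item in seq)
        # Every other function is unhandled, so dispatch ends in TypeError.
        return NotImplemented
```

With these definitions, ``dispatch(mean, PlainSeries([1, 2, 3]))`` succeeds through coercion, whereas ``dispatch(mean, PartialSeries([1, 2, 3]))`` raises ``TypeError``, which is the incremental-adoption hazard the text describes for a library such as pandas.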
870784

871785
A magic decorator that inspects type annotations
872786
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -965,8 +879,7 @@ There are two other arguments that we think *might* be important to pass to
965879
- Access to the non-dispatched implementation (i.e., before wrapping with
966880
``array_function_dispatch``) in ``ndarray.__array_function__`` would allow
967881
us to drop special case logic for that method from
968-
``implement_array_function``. *Update: This has been implemented, as the
969-
``__skip_array_function__`` attributes.*
882+
``implement_array_function``.
970883
- Access to the ``dispatcher`` function passed into
971884
``array_function_dispatch()`` would allow ``__array_function__``
972885
implementations to determine the list of "array-like" arguments in a generic
