DOC: Added explanation document on interoperability by melissawm · Pull Request #20185 · numpy/numpy · GitHub

DOC: Added explanation document on interoperability #20185

Merged
merged 5 commits into from
Feb 2, 2022
Addressing review comments
melissawm committed Nov 22, 2021
commit 7ce32d6188fcb76ad4790dd9679abdb3b7a6dacf
1 change: 1 addition & 0 deletions doc/source/reference/arrays.classes.rst
@@ -42,6 +42,7 @@ however, of why your subroutine may not be able to handle an arbitrary
subclass of an array is that matrices redefine the "*" operator to be
matrix-multiplication, rather than element-by-element multiplication.

.. _special-attributes-and-methods:

Special attributes and methods
==============================
32 changes: 19 additions & 13 deletions doc/source/reference/arrays.interface.rst
@@ -4,18 +4,18 @@

.. _arrays.interface:

*******************
The Array Interface
*******************
****************************
The array interface protocol
****************************

.. note::

This page describes the numpy-specific API for accessing the contents of
a numpy array from other C extensions. :pep:`3118` --
This page describes the NumPy-specific API for accessing the contents of
a NumPy array from other C extensions. :pep:`3118` --
:c:func:`The Revised Buffer Protocol <PyObject_GetBuffer>` introduces
similar, standardized API to Python 2.6 and 3.0 for any extension
module to use. Cython__'s buffer array support
uses the :pep:`3118` API; see the `Cython numpy
uses the :pep:`3118` API; see the `Cython NumPy
tutorial`__. Cython provides a way to write code that supports the buffer
protocol with Python versions older than 2.6 because it has a
backward-compatible implementation utilizing the array interface
@@ -81,7 +81,8 @@ This approach to the interface consists of the object having an
===== ================================================================
``t`` Bit field (following integer gives the number of
bits in the bit field).
``b`` Boolean (integer type where all values are only True or False)
``b`` Boolean (integer type where all values are only ``True`` or
``False``)
``i`` Integer
``u`` Unsigned integer
``f`` Floating point
@@ -141,11 +142,11 @@ This approach to the interface consists of the object having an
must be stored by the new object if the memory area is to be
secured.

**Default**: None
**Default**: ``None``

**strides** (optional)
Either ``None`` to indicate a C-style contiguous array or
a Tuple of strides which provides the number of bytes needed
a tuple of strides which provides the number of bytes needed
to jump to the next array element in the corresponding
dimension. Each entry must be an integer (a Python
:py:class:`int`). As with shape, the values may
@@ -156,26 +157,26 @@ This approach to the interface consists of the object having an
memory buffer. In this model, the last dimension of the array
varies the fastest. For example, the default strides tuple
for an object whose array entries are 8 bytes long and whose
shape is ``(10, 20, 30)`` would be ``(4800, 240, 8)``
shape is ``(10, 20, 30)`` would be ``(4800, 240, 8)``.

**Default**: ``None`` (C-style contiguous)

**mask** (optional)
None or an object exposing the array interface. All
``None`` or an object exposing the array interface. All
elements of the mask array should be interpreted only as true
or not true indicating which elements of this array are valid.
The shape of this object should be `"broadcastable"
<arrays.broadcasting.broadcastable>` to the shape of the
original array.

**Default**: None (All array values are valid)
**Default**: ``None`` (All array values are valid)

**offset** (optional)
An integer offset into the array data region. This can only be
used when data is ``None`` or returns a :class:`buffer`
object.

**Default**: 0.
**Default**: ``0``.

**version** (required)
An integer showing the version of the interface (i.e. 3 for
@@ -243,6 +244,11 @@ flag is present.
returning the :c:type:`PyCapsule`, and configure a destructor to decref this
reference.

.. note::

:obj:`__array_struct__` is considered legacy and should not be used for new
code. Use the :py:doc:`buffer protocol <c-api/buffer>` instead.
Member

Can we deprecate it? I think also ndarray.dlpack could be mentioned as an alternative, although I see it is not part of the documentation ...

Checking for uses of __array_struct__ I found only pygame, and opened pygame/pygame#2949 to ask about removing it

Member

Sounds like a good idea to me

Member Author

Should I add a deprecation note here? Or wait until it is actually marked for deprecation?

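The note above recommends the buffer protocol over ``__array_struct__``. As a minimal sketch using only NumPy and Python's built-in ``memoryview`` (the variable names are illustrative), an ndarray can export its memory layout through the buffer protocol, and NumPy can consume that buffer again without copying:

```python
import numpy as np

# A C-contiguous 2x3 array of float64 values (8 bytes each).
x = np.arange(6, dtype=np.float64).reshape(2, 3)

# Producer side: ndarrays export their memory through the buffer protocol.
m = memoryview(x)
print(m.format, m.shape, m.strides)  # d (2, 3) (24, 8)

# Consumer side: NumPy reads the layout from the buffer and creates a view.
y = np.asarray(m)
y[0, 0] = 42.0
print(x[0, 0])  # 42.0, because y shares x's memory
```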


Type description examples
=========================
122 changes: 65 additions & 57 deletions doc/source/user/basics.interoperability.rst
@@ -3,9 +3,9 @@
Interoperability with NumPy
***************************

NumPys ndarray objects provide both a high-level API for operations on
NumPy's ndarray objects provide both a high-level API for operations on
array-structured data and a concrete implementation of the API based on
`strided in-RAM storage <https://numpy.org/doc/stable/reference/arrays.html>`__.
:ref:`strided in-RAM storage <arrays>`.
While this API is powerful and fairly general, its concrete implementation has
limitations. As datasets grow and NumPy becomes used in a variety of new
environments and architectures, there are cases where the strided in-RAM storage
@@ -29,44 +29,39 @@ Using arbitrary objects in NumPy

When NumPy functions encounter a foreign object, they will try (in order):

1. The buffer protocol, described `in the Python C-API documentation
<https://docs.python.org/3/c-api/buffer.html>`__.
1. The buffer protocol, described :py:doc:`in the Python C-API documentation
<c-api/buffer>`.
2. The ``__array_interface__`` protocol, described
:ref:`in this page <arrays.interface>`. A precursor to Pythons buffer
:ref:`in this page <arrays.interface>`. A precursor to Python's buffer
protocol, it defines a way to access the contents of a NumPy array from other
C extensions.
3. The ``__array__`` protocol, which asks an arbitrary object to convert itself
into an array.
3. The ``__array__()`` method, which asks an arbitrary object to convert
itself into an array.
Member

Should we add the __dlpack__() method

Member

sorry, that is in a different section

Member Author

Do you mean adding a mention of __dlpack()__ in the arrays.interface doc? I can at least add a link to https://data-apis.org/array-api/latest/design_topics/data_interchange.html#dlpack-support

Member

Maybe leave it out for now. I thought we had a tracking issue for finishing dlpack, including documentation.

Member

So long we don't have first-class consumer API, it doesn't really make sense to expose it any case. This should be added as part of making that consumer API first-class.


For both the buffer and the ``__array_interface__`` protocols, the object
describes its memory layout and NumPy does everything else (zero-copy if
possible). If thats not possible, the object itself is responsible for
possible). If that's not possible, the object itself is responsible for
returning a ``ndarray`` from ``__array__()``.
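To illustrate the zero-copy path described above, here is a small sketch that assumes only the standard-library ``array`` module, whose ``array.array`` type implements the buffer protocol. ``np.asarray`` builds a view from the buffer's own description of its memory, while ``np.array`` copies by default:

```python
import array

import numpy as np

# array.array is a standard-library container that implements the buffer protocol.
buf = array.array("d", [1.0, 2.0, 3.0])

view = np.asarray(buf)  # NumPy reads shape and format from the buffer: no copy
copied = np.array(buf)  # np.array copies by default

view[0] = 10.0
print(buf[0])       # 10.0: the view writes through to buf
print(copied[0])    # 1.0: the copy is independent
print(view.dtype)   # float64 (buffer format "d")
```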

The array interface
~~~~~~~~~~~~~~~~~~~
The array interface protocol
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The :ref:`array interface <arrays.interface>` defines a protocol for array-like
objects to re-use each others data buffers. Its implementation relies on the
existence of the following attributes or methods:
The :ref:`array interface protocol <arrays.interface>` defines a way for
array-like objects to re-use each other's data buffers. Its implementation
relies on the existence of the following attributes or methods:

- ``__array_interface__``: a Python dictionary containing the shape, the
element type, and optionally, the data buffer address and the strides of an
array-like object;
- ``__array__()``: a method returning the NumPy ndarray view of an array-like
object;
- ``__array_struct__``: a ``PyCapsule`` containing a pointer to a
``PyArrayInterface`` C-structure.

The ``__array_interface__`` and ``__array_struct__`` attributes can be inspected
directly:
The ``__array_interface__`` attribute can be inspected directly:

>>> import numpy as np
>>> x = np.array([1, 2, 5.0, 8])
>>> x.__array_interface__
{'data': (94708397920832, False), 'strides': None, 'descr': [('', '<f8')], 'typestr': '<f8', 'shape': (4,), 'version': 3}
>>> x.__array_struct__
<capsule object NULL at 0x7f798800be40>

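The ``'strides': None`` entry above means the array is C-contiguous. As a sketch of the rule from the array interface reference (the last dimension varies fastest), the hypothetical helper ``c_contiguous_strides`` below computes those default strides, which can be checked against what NumPy reports. For a non-contiguous view such as a transpose, ``__array_interface__`` lists the strides explicitly:

```python
import numpy as np

def c_contiguous_strides(shape, itemsize):
    """Hypothetical helper: byte strides of a C-ordered (row-major) array."""
    strides = []
    step = itemsize  # moving along the last axis jumps one element
    for dim in reversed(shape):
        strides.append(step)
        step *= dim  # each outer axis jumps over the whole inner block
    return tuple(reversed(strides))

print(c_contiguous_strides((10, 20, 30), 8))   # (4800, 240, 8)

x = np.zeros((10, 20, 30), dtype=np.float64)
print(x.strides)                               # (4800, 240, 8)
print(x.__array_interface__["strides"])        # None (C-contiguous)
print(x.T.__array_interface__["strides"])      # (8, 240, 4800)
```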
The ``__array_interface__`` attribute can also be used to manipulate the object
data in place:
@@ -96,21 +91,20 @@ We can check that ``arr`` and ``new_arr`` share the same data buffer:
array([1000, 2, 3, 4])

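The same kind of memory sharing can be set up from the producer side. In this minimal sketch, ``InterfaceWrapper`` is a hypothetical class that only forwards another array's ``__array_interface__`` dictionary, so ``np.asarray`` can wrap the existing memory instead of copying it:

```python
import numpy as np

class InterfaceWrapper:
    """Hypothetical container exposing another array's memory via __array_interface__."""

    def __init__(self, arr):
        self._arr = arr  # holding a reference keeps the underlying buffer alive

    @property
    def __array_interface__(self):
        # Shape, typestr, data pointer and strides all come from the wrapped array.
        return self._arr.__array_interface__

base = np.array([1, 2, 3, 4])
wrapped = InterfaceWrapper(base)

view = np.asarray(wrapped)  # zero-copy: NumPy uses the advertised data pointer
view[0] = 1000
print(base.tolist())                  # [1000, 2, 3, 4]
print(np.shares_memory(view, base))   # True
```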

The ``__array__`` protocol
The ``__array__()`` method
~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``__array__`` protocol acts as a dispatch mechanism and ensures that any
NumPy-like object (an array, any object exposing the array interface, an object
whose ``__array__`` method returns an array or any nested sequence) that
implements it can be used as a NumPy array. If possible, this will mean using
``__array__`` to create a NumPy ndarray view of the array-like object.
Otherwise, this copies the data into a new ndarray object. This is not optimal,
as coercing arrays into ndarrays may cause performance problems or create the
need for copies and loss of metadata.
The ``__array__()`` method ensures that any NumPy-like object (an array, any
object exposing the array interface, an object whose ``__array__()`` method
returns an array or any nested sequence) that implements it can be used as a
NumPy array. If possible, this will mean using ``__array__()`` to create a NumPy
ndarray view of the array-like object. Otherwise, this copies the data into a
new ndarray object. This is not optimal, as coercing arrays into ndarrays may
cause performance problems or create the need for copies and loss of metadata,
as the original object and any attributes/behavior it may have had, is lost.

To see an example of a custom array implementation including the use of the
``__array__`` protocol, see `Writing custom array containers
<https://numpy.org/devdocs/user/basics.dispatch.html>`__.
To see an example of a custom array implementation including the use of
``__array__()``, see :ref:`basics.dispatch`.
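As a small sketch of the fallback case, the hypothetical ``ListBacked`` class below keeps its values in a Python list, so it cannot describe a memory buffer and instead builds an ndarray in ``__array__()``. The ``copy`` keyword follows NumPy 2's ``__array__`` convention (an assumption about the NumPy version in use; NumPy 1.x simply does not pass it). The last line also shows the loss of metadata mentioned above: the result is a plain NumPy value, not a ``ListBacked``:

```python
import numpy as np

class ListBacked:
    """Hypothetical container whose values live in a plain Python list."""

    def __init__(self, values):
        self.values = list(values)

    def __array__(self, dtype=None, copy=None):
        # A list is not a NumPy-compatible buffer, so a new ndarray is always built.
        # NumPy 2 may pass copy=False to request "no copy", which cannot be honored.
        if copy is False:
            raise ValueError("ListBacked cannot be converted without a copy")
        return np.array(self.values, dtype=dtype)

obj = ListBacked([1, 2, 3, 4])
arr = np.asarray(obj)
print(type(arr).__name__, arr.tolist())  # ndarray [1, 2, 3, 4]

# NumPy functions coerce obj through __array__ (result is about 21.2),
# but nothing ListBacked-specific survives the conversion.
print(np.mean(np.exp(obj)))
```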

Operating on foreign objects without converting
-----------------------------------------------
@@ -121,7 +115,11 @@ Consider the following function.
>>> def f(x):
... return np.mean(np.exp(x))

We can apply it to a NumPy ndarray object directly:
Note that `np.exp` is a :ref:`ufunc <ufuncs-basics>`, which means that it
operates on ndarrays in an element-by-element fashion. On the other hand,
`np.mean` operates along one of the array's axes.
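A quick sketch of the distinction drawn above between the two functions used in ``f`` (the input array is arbitrary):

```python
import numpy as np

x = np.array([[0.0, 1.0], [2.0, 3.0]])

# np.exp is a ufunc: applied element by element, the output keeps x's shape.
print(isinstance(np.exp, np.ufunc), np.exp(x).shape)  # True (2, 2)

# np.mean is not a ufunc: it reduces along an axis (or over all elements).
print(isinstance(np.mean, np.ufunc))  # False
print(np.mean(x, axis=0))             # [1. 2.]  (mean of each column)
print(np.mean(x))                     # 1.5  (mean of all elements)
```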

We can apply ``f`` to a NumPy ndarray object directly:

>>> x = np.array([1, 2, 3, 4])
>>> f(x)
@@ -149,9 +147,13 @@ The ``__array_ufunc__`` protocol
A :ref:`universal function (or ufunc for short) <ufuncs-basics>` is a
“vectorized” wrapper for a function that takes a fixed number of specific inputs
and produces a fixed number of specific outputs. The output of the ufunc (and
its methods) is not necessarily an ndarray, if all input arguments are not
its methods) is not necessarily an ndarray, if not all input arguments are
ndarrays. Indeed, if any input defines an ``__array_ufunc__`` method, control
will be passed completely to that function, i.e., the ufunc is overridden.
will be passed completely to that function, i.e., the ufunc is overridden. The
``__array_ufunc__`` method defined on that (non-ndarray) object has access to
the NumPy ufunc. Because ufuncs have a well-defined structure, the foreign
``__array_ufunc__`` method may rely on ufunc attributes like ``.at()``,
``.reduce()``, and others.
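As a rough sketch of such an override, the hypothetical ``Wrapped`` class below unwraps its inputs, forwards the call to the requested ufunc method (``method`` is a string such as ``"__call__"``, ``"reduce"`` or ``"at"``), and re-wraps the result. It is a simplification rather than a complete implementation; for example, it does not defer to other types that also override ufuncs:

```python
import numpy as np

class Wrapped:
    """Hypothetical duck array that keeps its values in a plain ndarray."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap Wrapped inputs (and outputs, if given) to plain ndarrays.
        args = [x.data if isinstance(x, Wrapped) else x for x in inputs]
        if "out" in kwargs:
            kwargs["out"] = tuple(
                o.data if isinstance(o, Wrapped) else o for o in kwargs["out"]
            )
        # Call the same ufunc method on the unwrapped arguments.
        result = getattr(ufunc, method)(*args, **kwargs)
        if method == "at":
            return None  # ufunc.at works in place and returns None
        if isinstance(result, tuple):  # ufuncs with several outputs, e.g. divmod
            return tuple(Wrapped(r) for r in result)
        return Wrapped(result)

w = Wrapped([1.0, 2.0, 3.0])
print(type(np.exp(w)).__name__)  # Wrapped: "__call__" went through the override
print(np.add.reduce(w).data)     # 6.0: the "reduce" method is overridden too
np.add.at(w, [0], 10.0)          # "at" updates w.data in place
print(w.data.tolist())           # [11.0, 2.0, 3.0]
```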

A subclass can override what happens when executing NumPy ufuncs on it by
overriding the default ``ndarray.__array_ufunc__`` method. This method is
@@ -169,9 +171,7 @@ is safe and consistent across projects.

The semantics of ``__array_function__`` are very similar to ``__array_ufunc__``,
except the operation is specified by an arbitrary callable object rather than a
ufunc instance and method. For more details, see `NEP 18
<https://numpy.org/neps/nep-0018-array-function-protocol.html>`__.

ufunc instance and method. For more details, see :ref:`NEP18`.
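As a sketch of the dispatch pattern described in NEP 18, the hypothetical ``MyArray`` class below overrides only ``np.mean``, using a registry dictionary and an ``implements`` decorator (both names are illustrative). Returning ``NotImplemented`` for any other function makes NumPy raise ``TypeError``:

```python
import numpy as np

HANDLED_FUNCTIONS = {}

def implements(numpy_function):
    """Hypothetical helper: register a MyArray implementation of a NumPy function."""
    def decorator(func):
        HANDLED_FUNCTIONS[numpy_function] = func
        return func
    return decorator

class MyArray:
    """Hypothetical duck array that overrides selected NumPy functions."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented  # NumPy then raises TypeError
        if not all(issubclass(t, MyArray) for t in types):
            return NotImplemented  # leave mixed types to their own overrides
        return HANDLED_FUNCTIONS[func](*args, **kwargs)

@implements(np.mean)
def mean(a, axis=None):
    # Compute on the wrapped ndarray and re-wrap the result.
    return MyArray(np.mean(a.data, axis=axis))

result = np.mean(MyArray([1.0, 2.0, 3.0]))
print(type(result).__name__, result.data)  # MyArray 2.0

try:
    np.sum(MyArray([1.0, 2.0]))  # not registered
except TypeError as exc:
    print("not handled:", exc)
```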

Interoperability examples
-------------------------
@@ -223,7 +223,7 @@ Example: PyTorch tensors

`PyTorch <https://pytorch.org/>`__ is an optimized tensor library for deep
learning using GPUs and CPUs. PyTorch arrays are commonly called *tensors*.
Tensors are similar to NumPys ndarrays, except that tensors can run on GPUs or
Tensors are similar to NumPy's ndarrays, except that tensors can run on GPUs or
other hardware accelerators. In fact, tensors and NumPy arrays can often share
the same underlying memory, eliminating the need to copy data.

@@ -251,13 +251,22 @@ explicit conversion:
Also, note that the return type of this function is compatible with the initial
data type.

**Note** PyTorch does not implement ``__array_function__`` or
``__array_ufunc__``. Under the hood, the ``Tensor.__array__()`` method returns a
NumPy ndarray as a view of the tensor data buffer. See `this issue
<https://github.com/pytorch/pytorch/issues/24015>`__ and the
`__torch_function__ implementation
<https://github.com/pytorch/pytorch/blob/master/torch/overrides.py>`__
for details.
.. admonition:: Warning

While this mixing of ndarrays and tensors may be convenient, it is not
recommended. It will not work for non-CPU tensors, and will have unexpected
behavior in corner cases. Users should prefer explicitly converting the
ndarray to a tensor.

.. note::

PyTorch does not implement ``__array_function__`` or ``__array_ufunc__``.
Under the hood, the ``Tensor.__array__()`` method returns a NumPy ndarray as
a view of the tensor data buffer. See `this issue
<https://github.com/pytorch/pytorch/issues/24015>`__ and the
`__torch_function__ implementation
<https://github.com/pytorch/pytorch/blob/master/torch/overrides.py>`__
for details.

Example: CuPy arrays
~~~~~~~~~~~~~~~~~~~~
@@ -271,7 +280,8 @@ with Python. CuPy implements a subset of the NumPy interface by implementing
>>> x_gpu = cp.array([1, 2, 3, 4])

The ``cupy.ndarray`` object implements the ``__array_ufunc__`` interface. This
enables NumPy ufuncs to be directly operated on CuPy arrays:
enables NumPy ufuncs to be applied to CuPy arrays (this will defer operation to
the matching CuPy CUDA/ROCm implementation of the ufunc):

>>> np.mean(np.exp(x_gpu))
array(21.19775622)
@@ -307,8 +317,7 @@ implements a subset of the NumPy ndarray interface using blocked algorithms,
cutting up the large array into many small arrays. This allows computations on
larger-than-memory arrays using multiple cores.

Dask supports array protocols like ``__array__`` and
``__array_ufunc__``.
Dask supports ``__array__()`` and ``__array_ufunc__``.

>>> import dask.array as da
>>> x = da.random.normal(1, 0.1, size=(20, 20), chunks=(10, 10))
@@ -317,8 +326,10 @@ Dask supports array protocols like ``__array__`` and
>>> np.mean(np.exp(x)).compute()
5.090097550553843

**Note** Dask is lazily evaluated, and the result from a computation isn’t
computed until you ask for it by invoking ``compute()``.
.. note::

Dask is lazily evaluated, and the result from a computation isn't computed
until you ask for it by invoking ``compute()``.

See `the Dask array documentation
<https://docs.dask.org/en/stable/array.html>`__
@@ -328,13 +339,10 @@ and the `scope of Dask arrays interoperability with NumPy arrays
Further reading
---------------

- `The Array interface
<https://numpy.org/doc/stable/reference/arrays.interface.html>`__
- `Writing custom array containers
<https://numpy.org/devdocs/user/basics.dispatch.html>`__.
- `Special array attributes
<https://numpy.org/devdocs/reference/arrays.classes.html#special-attributes-and-methods>`__
(details on the ``__array_ufunc__`` and ``__array_function__`` protocols)
- :ref:`arrays.interface`
- :ref:`basics.dispatch`
- :ref:`special-attributes-and-methods` (details on the ``__array_ufunc__`` and
``__array_function__`` protocols)
- `NumPy roadmap: interoperability
<https://numpy.org/neps/roadmap.html#interoperability>`__
- `PyTorch documentation on the Bridge with NumPy