10000 ENH: add machinery to support Array API by tupui · Pull Request #18668 · scipy/scipy · GitHub
[go: up one dir, main page]

Skip to content

ENH: add machinery to support Array API #18668

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 119 commits into from
Jul 12, 2023
Merged

ENH: add machinery to support Array API #18668

merged 119 commits into from
Jul 12, 2023

Conversation

tupui
Copy link
Member
@tupui tupui commented Jun 13, 2023

Reference issue

RFC: #18286
Initial PR with some discussions: tupui#24

What does this implement/fix?

This PR adds the first necessary bricks to support the Array API. And it demonstrates how to use these to add support to a module, here scipy.cluster which is fully covered.

The changes

It introduces the following.

Two global variables:

  • SCIPY_ARRAY_API: either a bool, "all", or a list of supported backend. For now these are "cupy", "pytorch", "numpy", "numpy.array_api". The global variable can be adjusted dynamically. This is an optional flag. By default the backend is numpy and it only affects functions you modified with the helpers bellow.
  • SCIPY_DEVICE: can be "cpu", "gpu", "mps" (something supported by the backend)

Three helpers:

  • array_namespace: detect the namespace based on input arrays and do some input validation (like refusing to work with masked Array, please see the RFC.)
  • as_xparray: a drop-in replacement for np.asarray with additional features like copy, check_finite.
  • as_xparray_namespace: does both, detect the namespace and convert arrays in one step.

Some testing fixture:

  • array_api_compatible: use a parametrisation to run a test on multiple Array backend.
  • skip_if_array_api: don't run a test if SCIPY_ARRAY_API is on.
  • skip_if_array_api_gpu: don't run a test if GPU is involved (also applies to PyTorch's MPS mode.)

A new runtime dependency is needed, array-api-compat. It could easily be vendored if we want, but for now it seems better to just pull it from PyPi. In the end, we would like to push as much as possible all of our "tricks" there so that we do the same things across all libraries.

And finally, there is a new GitHub action workflow which runs pytorch-cpu.

How to add support for a new function?

I will show an example. Let's say we have this function (yes I am making it up now and is completely random, so don't run it 😅):

def toto(a, b):
    a = np.asarray(a)
    b = np.asarray(b, copy=True)

    c = np.sum(a) - np.prod(b)

    # this is some C or Cython call
    d = cdist(c)

    return d

You would convert this like so:

def toto(a, b):
    xp = array_namespace(a, b)
    a = xp.asarray(a)
    b = as_xparray(b, copy=True)

    c = xp.sum(a) - xp.prod(b)

    # this is some C or Cython call
    c = np.asarray(c)
    d = cdist(c)
    d = xp.asarray(d)

    return d

The key is that, going to C or Cython requires to go back to NumPy. Meaning that it will raise if you are on a GPU as the conversion would be silent otherwise which is not good. For CPU interactions, it should be a transparent operation.

How to test?

Just use the decorator:

@array_api_compatible
def test_toto(self, xp):
    a = xp.asarray([1, 2, 3])
    b = xp.asarray([0, 2, 5])
    toto(a, b)

Then you can use dev.py with he new option -b or --array-api-backend:

python dev.py test -b numpy -b pytorch -s cluster

This automatically set SCIPY_ARRAY_API appropriately. And if you want to test a different device, set manually SCIPY_DEVICE. e.g. SCIPY_DEVICE=mps python dev.py ...

Additional information

Thank you to everyone who helped put this together 🚀 Here are some additional resources:

Quick started from #15395 and sklearn's https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/_array_api.py

Array API in sklearn: scikit-learn/scikit-learn#22352

sklearn implentation from PR scikit-learn/scikit-learn#22554 and scikit-learn/scikit-learn#25956

tupui added 30 commits April 26, 2023 21:28
Copy link
Member
@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now in quite good shape, and with several thumbs-up's it's time to merge this. Thanks for the hard work @tupui, @tylerjereddy and everyone else who contributed!

I went through all comments again, and here are the needed follow-ups and loose ends:

  • a better name for as_xparray
  • a TODO comment in hierarchy.py for rewriting a fancy indexing expression (may need xp.put to be standardized, see this comment)
  • there's a usage of xp.cov that needs changing (it's in numpy/cupy/pytorch, but not in the standard)
  • consider removing the skip_if_array_api marker (see this comment)
  • (after we drop setup.py) making the import of the vendored array_api_compat nicer (see this comment)

And then next steps:

  • open a tracking issue for the above issues and another one for overall coverage (I'll do that today)
  • work on other scipy submodules now that we have the patterns and CI/infra for this
  • reduce the number of test skips

@rgommers rgommers merged commit bd8ef0c into scipy:main Jul 12, 2023
@rgommers rgommers added this to the 1.12.0 milestone Jul 12, 2023
@rgommers rgommers added the array types Items related to array API support and input array validation (see gh-18286) label Jul 12, 2023
@asmeurer
Copy link
asmeurer commented Jul 13, 2023

I was planning to open a tracking issue with follow-up actions. If you already want to open a separate issue for a better name for as_xparray @h-vetinari, I think that would be helpful. Choosing names is hard, good ideas welcome:)

I think we decided that this specific wrapper was doing things that weren't appropriate for general inclusion in the compat library (c.f. data-apis/array-api-compat#42), but an option could be to monkeypatch array_api_compat.asarray, especially since you are vendoring it.

@rgommers
Copy link
Member

Thanks for the suggestion @asmeurer. I'll move your comment to the follow-up issue (gh-18866) now.

alugowski pushed a commit to alugowski/scipy that referenced this pull request Jul 16, 2023
See api-dev doc updates in this commit for details.

Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com>
Co-authored-by: Tyler Reddy <tyler.je.reddy@gmail.com>
@tupui tupui deleted the array_api branch July 24, 2023 09:10
1. Check for the global switch: SCIPY_ARRAY_API. This can also be accessed
dynamically through ``_GLOBAL_CONFIG['SCIPY_ARRAY_API']``.
2. `compliance_scipy` raise exceptions on known-bad subclasses. See
it's definition for more details.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: it's its

rgommers added a commit that referenced this pull request Sep 19, 2023
This PR extends the array API support of PR 18668 to also cover `scipy.fft`.

`fft` is an [array API standard extension module](https://data-apis.org/array-api/latest/extensions/fourier_transform_functions.html)
and has matching APIs in both CuPy and PyTorch.

For more context on the standard and the goal of SciPy adoption,
please see [the RFC](#18286).

For explanation of the machinery of which this PR makes use, please see
[the `cluster` PR](#18668).

---------

Co-authored-by: Tyler Reddy <tyler.je.reddy@gmail.com>
Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com>
B41A
@@ -995,7 +1002,8 @@ def linkage(y, method='single', metric='euclidean', optimal_ordering=False):
>>> dn = dendrogram(Z)
>>> plt.show()
"""
y = np.asarray(y, order='C', dtype=np.float64)
xp = array_namespace(y)
y = as_xparray(y, order='C', dtype=xp.float64, xp=xp)
Copy link
Contributor
@mdhaber mdhaber Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working on gh-21835 and this confused me, since order only works for NumPy arrays.

It looks like order='C' was added in 57e1ec7 to ensure that the data is contiguous before it goes into compiled code. But it seems like non-contiguous arrays from other libraries will get past this and not be forced to be contiguous when they are turned into NumPy arrays.

It this still important? Or at this point, does our machinery recognize that the arrays are not contiguous and copy the data to be contiguous (and in the appropriate order) before passing it into compiled code?

Also, should there be an error when the array library does not support options passed into the _asarray function? I see that it's explicitly documented that order will be silently ignored for non-NumPy arrays, but should it be?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
array types Items related to array API support and input array validation (see gh-18286) enhancement A new feature or improvement RFC Request for Comments; typically used to gather feedback for a substantial change proposal SciPEP SciPy Enhancement Proposal scipy.cluster scipy._lib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants
0