ENH: add machinery to support Array API #18668
Conversation
This is now in quite good shape, and with several thumbs-ups it's time to merge this. Thanks for the hard work @tupui, @tylerjereddy and everyone else who contributed!
I went through all comments again, and here are the needed follow-ups and loose ends:
- a better name for `as_xparray`
- a TODO comment in `hierarchy.py` for rewriting a fancy indexing expression (may need `xp.put` to be standardized, see this comment)
- there's a usage of `xp.cov` that needs changing (it's in numpy/cupy/pytorch, but not in the standard)
- consider removing the `skip_if_array_api` marker (see this comment)
- (after we drop `setup.py`) making the import of the vendored `array_api_compat` nicer (see this comment)
And then next steps:
- open a tracking issue for the above issues and another one for overall coverage (I'll do that today)
- work on other scipy submodules now that we have the patterns and CI/infra for this
- reduce the number of test skips
I think we decided that this specific wrapper was doing things that weren't appropriate for general inclusion in the compat library (cf. data-apis/array-api-compat#42), but an option could be to monkeypatch
See api-dev doc updates in this commit for details. Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com> Co-authored-by: Tyler Reddy <tyler.je.reddy@gmail.com>
1. Check for the global switch: SCIPY_ARRAY_API. This can also be accessed
   dynamically through ``_GLOBAL_CONFIG['SCIPY_ARRAY_API']``.
2. `compliance_scipy` raise exceptions on known-bad subclasses. See
   it's definition for more details.
typo: `it's` → `its`
This PR extends the array API support of PR 18668 to also cover `scipy.fft`. `fft` is an [array API standard extension module](https://data-apis.org/array-api/latest/extensions/fourier_transform_functions.html) and has matching APIs in both CuPy and PyTorch. For more context on the standard and the goal of SciPy adoption, please see [the RFC](#18286). For explanation of the machinery of which this PR makes use, please see [the `cluster` PR](#18668). --------- Co-authored-by: Tyler Reddy <tyler.je.reddy@gmail.com> Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com>
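As a rough illustration of what this enables (a hedged sketch, not code from the PR; it assumes PyTorch is installed and `SCIPY_ARRAY_API` is set), a call like the following should dispatch to the backend's own FFT and return an array of the same type:

```python
import torch
import scipy.fft

x = torch.arange(8, dtype=torch.float64)
y = scipy.fft.fft(x)  # expected to stay a torch.Tensor when SCIPY_ARRAY_API is enabled
```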
@@ -995,7 +1002,8 @@ def linkage(y, method='single', metric='euclidean', optimal_ordering=False):
     >>> dn = dendrogram(Z)
     >>> plt.show()
     """
-    y = np.asarray(y, order='C', dtype=np.float64)
+    xp = array_namespace(y)
+    y = as_xparray(y, order='C', dtype=xp.float64, xp=xp)
I'm working on gh-21835 and this confused me, since `order` only works for NumPy arrays.

It looks like `order='C'` was added in 57e1ec7 to ensure that the data is contiguous before it goes into compiled code. But it seems like non-contiguous arrays from other libraries will get past this and not be forced to be contiguous when they are turned into NumPy arrays.

Is this still important? Or at this point, does our machinery recognize that the arrays are not contiguous and copy the data to be contiguous (and in the appropriate order) before passing it into compiled code?

Also, should there be an error when the array library does not support options passed into the `_asarray` function? I see that it's explicitly documented that `order` will be silently ignored for non-NumPy arrays, but should it be?
Reference issue
RFC: #18286
Initial PR with some discussions: tupui#24
What does this implement/fix?
This PR adds the first necessary building blocks to support the Array API, and demonstrates how to use them to add support to a module, here `scipy.cluster`, which is fully covered.

The changes
It introduces the following.
Two global variables:

- `SCIPY_ARRAY_API`: either a bool, `"all"`, or a list of supported backends. For now these are `"cupy", "pytorch", "numpy", "numpy.array_api"`. The global variable can be adjusted dynamically. This is an optional flag. By default the backend is `numpy` and it only affects functions you modified with the helpers below.
- `SCIPY_DEVICE`: can be `"cpu", "gpu", "mps"` (something supported by the backend).
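For illustration, a minimal sketch of how the switch can be toggled (the `scipy._lib._array_api` import path for the config dict is an internal detail and an assumption here):

```python
import os

# Opt in before importing SciPy; by default the machinery is off and
# everything behaves as plain NumPy.
os.environ["SCIPY_ARRAY_API"] = "1"

import scipy.cluster  # noqa: E402

# The switch can also be read/adjusted dynamically at runtime.
from scipy._lib._array_api import _GLOBAL_CONFIG  # noqa: E402  (internal, path may change)
print(_GLOBAL_CONFIG["SCIPY_ARRAY_API"])
```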
Three helpers:

- `array_namespace`: detects the namespace based on input arrays and does some input validation (like refusing to work with masked arrays, please see the RFC).
- `as_xparray`: a drop-in replacement for `np.asarray` with additional features like `copy, check_finite`.
- `as_xparray_namespace`: does both, detects the namespace and converts arrays in one step.

Some testing fixtures:
- `array_api_compatible`: uses a parametrisation to run a test on multiple array backends.
- `skip_if_array_api`: don't run a test if `SCIPY_ARRAY_API` is on.
- `skip_if_array_api_gpu`: don't run a test if GPU is involved (also applies to PyTorch's MPS mode).

A new runtime dependency is needed, array-api-compat. It could easily be vendored if we want, but for now it seems better to just pull it from PyPI. In the end, we would like to push as many of our "tricks" as possible there so that we do the same things across all libraries.
And finally, there is a new GitHub Actions workflow which runs `pytorch-cpu`.

How to add support for a new function?
I will show an example. Let's say we have this function (yes, I am making it up now and it is completely random, so don't run it 😅):
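The author's original snippet is not reproduced here; as a hedged stand-in, a plain NumPy function of the kind being described might look like this (the function and its body are purely illustrative):

```python
import numpy as np
from scipy.linalg import svd  # stands in for "any routine backed by compiled code"


def some_function(arr):
    arr = np.asarray(arr, dtype=np.float64)
    centered = arr - arr.mean(axis=0)             # plain NumPy operations
    _, s, _ = svd(centered, full_matrices=False)  # compiled code: NumPy in, NumPy out
    return s / s.sum()
```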
You would convert this like so:
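And a hedged sketch of the converted version, following the pattern described here (the `scipy._lib._array_api` import location for the new helpers is an assumption):

```python
import numpy as np
from scipy.linalg import svd
from scipy._lib._array_api import array_namespace, as_xparray  # helpers added by this PR


def some_function(arr):
    xp = array_namespace(arr)                  # detect + validate the input's namespace
    arr = as_xparray(arr, dtype=xp.float64, xp=xp)
    centered = arr - xp.mean(arr, axis=0)      # pure array-API operations stay in xp
    # Compiled code only understands NumPy: convert explicitly. On GPU this
    # raises instead of silently copying data across devices.
    _, s, _ = svd(np.asarray(centered), full_matrices=False)
    s = xp.asarray(s)                          # hand the result back in the caller's namespace
    return s / xp.sum(s)
```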
The key point is that going into C or Cython code requires converting back to NumPy. This means it will raise if you are on a GPU, because otherwise the conversion would be silent, which is not good. For CPU backends, the conversion should be a transparent operation.
How to test?
Just use the decorator:
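A hedged sketch of what such a test might look like (the `scipy.conftest` import location for the fixture is an assumption):

```python
from scipy.conftest import array_api_compatible  # import location is an assumption


@array_api_compatible
def test_some_function(xp):
    # ``xp`` is injected by the fixture: NumPy by default, or whichever
    # backends SCIPY_ARRAY_API selects (CuPy, PyTorch, numpy.array_api, ...).
    a = xp.asarray([1.0, 2.0, 3.0])
    assert float(xp.sum(a)) == 6.0
```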
Then you can use `dev.py` with the new option `-b` or `--array-api-backend` (see the example command below). This automatically sets `SCIPY_ARRAY_API` appropriately. And if you want to test a different device, set `SCIPY_DEVICE` manually, e.g. `SCIPY_DEVICE=mps python dev.py ...`
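For example (the exact CLI form is from memory and may differ, so check `python dev.py test --help`): `python dev.py test -b pytorch -s cluster`, and `-b` can be repeated to exercise several backends in one run.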
Additional information
Thank you to everyone who helped put this together 🚀 Here are some additional resources:
- Quick-started from #15395 and sklearn's https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/_array_api.py
- Array API in sklearn: scikit-learn/scikit-learn#22352
- sklearn implementation from PRs scikit-learn/scikit-learn#22554 and scikit-learn/scikit-learn#25956