ENH Adds Array API support to LinearDiscriminantAnalysis by thomasjpfan · Pull Request #22554 · scikit-learn/scikit-learn · GitHub

ENH Adds Array API support to LinearDiscriminantAnalysis #22554


Merged · 66 commits merged into scikit-learn:main from thomasjpfan:array_api_lda_pr on Sep 21, 2022

Conversation

thomasjpfan (Member)

Reference Issues/PRs

Towards #22352

What does this implement/fix? Explain your changes.

This PR adds Array API support to LinearDiscriminantAnalysis. There is around a 14x runtime improvement when using Array API with CuPy on GPU.

The overall design principle is to use the Array API Specification as much as possible. In the short term, there will be an awkward transition while we need to support both NumPy and the Array API. In the long term, the most maintainable position for the code base is to use only the Array API specification.

I extended the Array API spec in _ArrayAPIWrapper where there is a feature we must have. In _NumPyApiWrapper, I added functions to the NumPy namespace to adapt it to the Array API spec.
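For context, here is a minimal sketch of the wrapper pattern described above (simplified; the PR's actual classes handle more cases):

import numpy

class _NumPyApiWrapper:
    """Sketch: make plain NumPy look like an Array API namespace by
    adding spec-style functions it lacks."""

    def __getattr__(self, name):
        return getattr(numpy, name)

    def astype(self, x, dtype, *, copy=True, casting="unsafe"):
        # The spec exposes astype() as a function; NumPy only has the
        # ndarray method, and `casting` is a NumPy-only extension.
        return x.astype(dtype, copy=copy, casting=casting)

class _ArrayAPIWrapper:
    """Sketch: pass attribute access through to a compliant namespace,
    extending it only where a feature we need is missing."""

    def __init__(self, namespace):
        self._namespace = namespace

    def __getattr__(self, name):
        # Anything the spec already defines is forwarded untouched.
        return getattr(self._namespace, name)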

Any other comments?

There is still the question of how to communicate the feature. For this PR, I only implemented it for solver="svd".

@thomasjpfan thomasjpfan marked this pull request as draft February 23, 2022 02:49
@thomasjpfan thomasjpfan marked this pull request as ready for review February 28, 2022 00:51
@jjerphan (Member) left a comment:

Thank you, @thomasjpfan.

I can get a similar ×2 speed-up ratio on a machine with one NVIDIA Quadro RTX 6000 using the provided notebook.

This PR, with the Array API dispatch:

CPU times: user 12 s, sys: 858 ms, total: 12.8 s
Wall time: 14 s

This PR, without the Array API dispatch:

CPU times: user 1min 20s, sys: 1min 6s, total: 2min 27s
Wall time: 23.3 s

To me, this PR is clear and does not introduce too much complexity.

Do you think we could (if it's worth it) come up with adaptors for the few API mismatches (e.g. add.at)?


self.intercept_ = -0.5 * np.sum(coef**2, axis=1) + np.log(self.priors_)
self.coef_ = np.dot(coef, self.scalings_.T)
self.intercept_ -= np.dot(self.xbar_, self.coef_.T)
rank = xp.sum(xp.astype(S > self.tol * S[0], xp.int32))
Member

Side-note: What is the intent of specifying xp.int32? Does it make the sum faster?

Member Author

The Array API is very strict when it comes to bools. S > self.tol returns a boolean array, which cannot be summed.

I suspect it is because there are no type promotion rules for bools.
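For illustration, a minimal sketch of the workaround with numpy.array_api (NumPy >= 1.22):

import numpy.array_api as xp

S = xp.asarray([3.0, 1.0, 1e-9])
mask = S > 1e-6                            # boolean array
# xp.sum(mask) raises TypeError: bool is not a numeric dtype here
rank = xp.sum(xp.astype(mask, xp.int32))   # cast first, then sum
print(int(rank))                           # 2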

Contributor

Yes, that sounds right. There are also no int-float mixed type casting rules for arrays, because erroring out is a valid design choice and something at least TensorFlow does (PyTorch also limits what it wants to allow without being explicit).

There could perhaps be a rule for Python bool to Python int, but there's probably little appetite for array dtype cross-kind casting rules.

@thomasjpfan (Member Author)

Are your timings reversed? It looks like Array API makes it slower.

@jjerphan (Member) commented Mar 7, 2022

You are right, I have just corrected it.

@ogrisel (Member) left a comment:

Still familiarizing myself with the Array API standard and the NumPy implementation, but here is a first pass of comments for this PR.

Comment on lines 27 to 30
pytest.importorskip("numpy", minversion="1.22", reason="Requires Array API")

X_np = numpy.asarray([[1, 2, 3]])
xp = pytest.importorskip("numpy.array_api")
Member

We can simplify the 2 importorskip into a single one, right? NumPy 1.21 did not expose a numpy.array_api submodule (I checked).

Suggested change
pytest.importorskip("numpy", minversion="1.22", reason="Requires Array API")
X_np = numpy.asarray([[1, 2, 3]])
xp = pytest.importorskip("numpy.array_api")
# This test requires NumPy 1.22 or later for its implementation of the
# Array API specification:
with warnings.catch_warnings():
warnings.simplefilter("ignore") # ignore experimental warning
xp = pytest.importorskip("numpy.array_api")
X_np = numpy.asarray([[1, 2, 3]])

return getattr(self._namespace, name)

def astype(self, x, dtype, *, copy=True, casting="unsafe"):
# support casting for NumPy
Member

Suggested change
# support casting for NumPy
# Extend Array API to support `casting` for NumPy containers

Member

Is there any issue to track the support of custom casting in the spec?

Member Author

I cannot find a discussion on astype & casting. Maybe @rgommers has a link?

Member Author

I suspect it's because other libraries do not really implement casting for astype. For example, cupy.astype does not support casting.

@thomasjpfan (Member Author) commented Mar 12, 2022

For this specific PR, we are using casting in check_array (which was added in #14872):

array = array.astype(dtype, casting="unsafe", copy=False)

I think we do not need to set casting here since the default is unsafe. For reference, the casting behavior of nans and infs is not specified in the Array API spec.

For example:

import numpy as np

np_float_arr = np.asarray([1, 2, np.nan], dtype=np.float32)
print(np_float_arr.astype(np.int32))
# On a x86 machine:
# [          1           2 -2147483648]
# But on a M1 mac:
# [1, 2, 0]

# CuPy casts to zeros.
import cupy

cp_float_arr = cupy.asarray([1, 2, cupy.nan], dtype=cupy.float32)
print(cp_float_arr.astype(cupy.int32))
# [1, 2, 0]

Contributor

For reference, the casting behavior of nans and infs are not specified in the ArrayAPI spec

That seems like something to specify - even if just to say it's undefined behavior. Which it is I think, as evidenced by the inconsistent NumPy results here.

I suspect it's because other libraries do not really implement casting for astype.

That is typically the reason. The PR that added astype lists all supported keywords across libraries, and only NumPy and Dask have casting.

A question is if the concept of casting modes is useful enough to include. I'm not sure to be honest (but I didn't think about it very hard yet). The default in numpy.ndarray.astype is unsafe anyway, which is the only reasonable choice probably - because code like astype(x_f64, float32) shouldn't raise as it is very explicit.

Member

Out of curiosity, I did a quick survey of our current use of the casting kwarg.

git grep "casting="
sklearn/preprocessing/_polynomial.py:                        casting="no",
sklearn/tree/_tree.pyx:        return n_classes.astype(expected_dtype, casting="same_kind")
sklearn/tree/_tree.pyx:        return value_ndarray.astype(expected_dtype, casting='equiv')
sklearn/tree/_tree.pyx:    return node_ndarray.astype(expected_dtype, casting="same_kind")
sklearn/tree/tests/test_tree.py:    return node_ndarray.astype(new_dtype, casting="same_kind")
sklearn/tree/tests/test_tree.py:    return node_ndarray.astype(new_dtype, casting="same_kind")
sklearn/tree/tests/test_tree.py:    new_n_classes = n_classes.astype(new_dtype, casting="same_kind")
sklearn/utils/validation.py:                    # inf (numpy#14412). We cannot use casting='safe' because
sklearn/utils/validation.py:                    array = array.astype(dtype, casting="unsafe", copy=False)

We can ignore the tree files because they are written in Cython and will never benefit from Array API compat.

So we just have sklearn/preprocessing/_polynomial.py with casting="no" and sklearn/utils/validation.py with casting="unsafe".

So it's probably indeed not worth exposing the casting argument in our xp wrapper; we can always use "unsafe".
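As a quick sanity check (a minimal sketch) that dropping the keyword changes nothing for NumPy:

import numpy as np

x = np.asarray([1.5, 2.5], dtype=np.float64)
# "unsafe" is already the default for ndarray.astype, so these agree:
assert np.array_equal(x.astype(np.int32), x.astype(np.int32, casting="unsafe"))
# A stricter mode would refuse this cross-kind cast:
# x.astype(np.int32, casting="safe")  # raises TypeError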

@ogrisel (Member) commented Mar 8, 2022

I can get a similar speed-up ratio on a machine with one NVIDIA Quadro RTX 6000 using the provided notebook.

You report a ~2x speed-up instead of the 14x speed-up in @thomasjpfan's notebook, though. I am not sure if this is expected or not.

@thomasjpfan (Member Author)

It could be because of different hardware. I ran my benchmarks using an NVIDIA 3090 and a single 5950X CPU (16 cores, 32 threads). It was also in a workstation environment where I can supply the GPU with 400 watts of power.

@jjerphan (Member)

You report a ~2x speed-up instead of a 14x speed-up in @thomasjpfan notebook though. I am not sure if this is expected or not.

Exactly, I made a mistake by comparing based on "total". Updated.

@adrinjalali (Member) left a comment:

This doesn't look too complicated, better than what I imagined.

I do think we need to test against something which is not numpy, and figure out the performance implications.

Comment on lines 834 to 836
X = xp.exp(X)
else:
np.exp(X, out=X)
Member

One of the things that worries me is the performance implications of not having in-place operations.

Member Author

Yeah, there is not a great way around this without extending our own version of the Array API further and special-casing NumPy.

The reasoning for not having out= is in https://github.com/scikit-learn/scikit-learn/pull/22554/files#r825086171

Comment on lines 490 to 505
X = np.sqrt(fac) * (Xc / std)
X = math.sqrt(fac) * (Xc / std)
Member

Are they strictly equivalent? Shouldn't this be xp.sqrt instead?

Member Author

xp.sqrt does not work on Python scalars such as fac. We would need to call xp.asarray(fac) before calling xp.sqrt.
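A minimal sketch of the two options:

import math
import numpy.array_api as xp

fac = 0.25  # plain Python scalar
# xp.sqrt(fac) raises TypeError: spec functions only accept arrays
via_array = xp.sqrt(xp.asarray(fac))  # go through a 0d array
via_math = math.sqrt(fac)             # stay in Python-scalar land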

Member

Oh I see. It makes sense, but it also means a developer would need to know when to use xp. and when to use math., which I guess can be confusing. Should the array API actually handle this?

Member Author

but it also means a developer would need to know when to use xp. and when to use math., which I guess can be confusing. Should the array API actually handle this?

From my understanding, this was by design for the Array API. Only NumPy has the concept of "array scalars", while all other array libraries use a 0d array. (np.sqrt(python_scalar) returns a NumPy scalar, while math.sqrt(python_scalar) is a Python scalar.)

In our case, the Array API forces us to think about using math for Python scalars. From a developer point of view, I agree it is one more thing to think about, but I think it's better to be more strict about these types.

A side benefit is that the Python scalar version is faster:

%%timeit
_ = np.log(431.456)
# 315 ns ± 5.46 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
_ = math.log(431.456)
# 68.5 ns ± 0.519 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

which can make a difference for code that runs in loops.

Member Author

Looking at this again, I think it's better to do xp.asarray() on the scalar, so we can use xp.sqrt on it.

Member

lolol, why? I was convinced by your last comment.

@thomasjpfan (Member Author) commented Mar 22, 2022

For me, the pros and cons of math vs xp.asarray on Python scalars are balanced. The argument for using xp.asarray is that it forces us to be in "array land" and not need to think about any Python scalar + Array interactions. Although, the Array API spec does state that python_scalar * array is the same as xp.asarray(python_scalar, dtype=dtype_of_array) * array.

REF: https://discuss.scientific-python.org/t/poll-future-numpy-behavior-when-mixing-arrays-numpy-scalars-and-python-scalars/202
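A small illustration of that rule with numpy.array_api (minimal sketch):

import numpy.array_api as xp

a = xp.asarray([1.0, 2.0], dtype=xp.float32)
# The Python scalar takes the array's dtype, so both results are float32:
b = 0.5 * a
c = xp.asarray(0.5, dtype=xp.float32) * a
print(b.dtype, c.dtype)  # float32 float32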

def test_lda_array_api(X, y):
"""Check that the array_api Array gives the same results as ndarrays."""
pytest.importorskip("numpy", minversion="1.22", reason="Requires Array API")
xp = pytest.importorskip("numpy.array_api")
Member

We kinda still have NumPy Array API specific code. I think we need something here which is not NumPy's Array API implementation to test against.

Member

We will be able to do this with PyTorch in CPU mode once PyTorch's compliance has improved.

Progress is tracked here:

pytorch/pytorch#58743

The other mature enough candidate is CuPy but this one requires maintaining a GPU CI runner. I would rather start with numpy only on our CI in the short term.

But we could improve this test to make it work with CuPy with a non-default parametrization:

@pytest.mark.parametrize("array_api_namespace", ["numpy.array_api", "cupy.array_api"])
@pytest.mark.parametrize("X, y", [(X, y), (X, y3)])
def test_lda_array_api(X, y, array_api_namespace):
    """Check that the array_api Array gives the same results as ndarrays."""
    xp = pytest.importorskip(array_api_namespace)
    ...

and this way it would be easy to run those compliance tests manually on a cuda enabled host.

Member

I think it would make sense for the Array API effort to include a reference implementation. It can use NumPy under the hood, but it should be a minimal implementation, and it'll be different from NumPy's own implementation since NumPy has certain considerations which make its implementation not minimal. Then all libraries could test against that implementation instead of some random other library's implementation.

Member Author

There is an Array API compliance test suite here: https://github.com/data-apis/array-api-tests which tests that an Array API implementation follows the spec. For a subset of operators, the test suite also tests for correctness.

I see numpy.array_api as the minimal implementation backed by NumPy. The idea behind testing with another library's implementation is that the numerical operations can return different results depending on hardware. For us to trust that our algorithms are correct using CuPy's or PyTorch's Array API implementation, we would still need to test it ourselves.

sum_prob = np.sum(X, axis=1).reshape((-1, 1))

if is_array_api:
# array_api does not have `out=`
@ogrisel (Member) commented Mar 11, 2022

Is there any plan or discussion for allowing this? Maybe as an optional API extension?

Contributor

No there isn't. The reason is twofold:

  1. out= doesn't make sense for all libraries - for example, JAX and TensorFlow have immutable data structures.
  2. Even for libraries that do have mutable arrays, out= is not a very nice API pattern. It lets users do manual optimizations that a compiler may be able to do better. There was also another argument and an alternative design presented in Preliminary discussion: Standardise support for reuse of buffers data-apis/consortium-feedback#5.

And maybe (3), IIRC NumPy and PyTorch semantics for out= aren't identical.

Member

Thanks for the feedback. I guess we will have to keep on using those if is_array_api conditions to protect our use of numpy's out= arguments for now.

I don't necessarily see that as a blocker for the adoption of the Array API in scikit-learn, but it does make the code look uglier... I don't really see a potential long-term fix for this.

Member

It lets users do manual optimizations that a compiler may be able to do better.

@rgommers I'm quite confused. Does Python actually do such optimizations?

Member

Here's what I get on my machine:

In [19]: def f2():
    ...:     a = np.random.rand(10000, 100000)
    ...:     a = np.exp(a)
    ...:     return a
    ...: 

In [20]: def f1():
    ...:     a = np.random.rand(10000, 100000)
    ...:     np.exp(a, out=a)
    ...:     return a
    ...: 
In [23]: %timeit f1()
15.6 s ± 2.53 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [24]: %timeit f2()
[1]    210906 killed     ipython

so the difference is quite significant (one of them gets killed 😁 )

Contributor

Does python actually do such optimizations?

No it doesn't - but Python doesn't have a compiler? I meant a JIT or AOT compiler like JAX's JIT or PyTorch's Torchscript. It is not strange to say that in principle X = xp.exp(X) can be rewritten to an inplace update (i.e., exp(X, out=X)) by a compiler transformation if and only if the memory backing X isn't used elsewhere, right?

This code looks fishy by the way for copy=False; the docs say nothing about inplace updating of the input array, which I'd consider a bug in library code if it were public. And to avoid this footgun, it defaults to copy=True which is always slow?

(one of them gets killed grin )

Ouch, that doesn't look like the expected result.

Member

No it doesn't - but Python doesn't have a compiler? I meant a JIT or AOT compiler like JAX's JIT or PyTorch's Torchscript. It is not strange to say that in principle X = xp.exp(X) can be rewritten to an inplace update (i.e., exp(X, out=X)) by a compiler transformation if and only if the memory backing X isn't used elsewhere, right?

Ok now that makes sense. But here's the issue. One benefit of using the Array API is that developers can learn it instead of NumPy's API and develop their estimators with it. It would also make it really easy to support the Array API even if they don't explicitly do so. But that raises the issue that in order to write efficient NumPy code, one needs to write NumPy-specific code, and for all other backends use the Array API, like here. I think in an ideal world, we wouldn't want to have these separate branches for the two APIs, do we?

Contributor

I think in an ideal world, we wouldn't want to have these separate branches for the two APIs, do we?

I think we indeed want to have separate branches in as few places as possible. A lot of the places that are being identified are functions that can be added to either the array API standard (e.g., take, moveaxis) or NumPy (e.g., unique_*). There's a few things that will remain though, because there's an inherent tension between portability and performance. out= and order= are the two that come to mind here. And some forms of (advanced) indexing.

We reviewed the use of out= and order=, and they're used less commonly than one would expect based on the amount of discussion around them. The amount of branching that will remain in the end is quite limited, and seems like a reasonable price to pay.

Member

Cool, I'm happy then.

Member

This PR now has a private helper (_asarray_with_order) to explicitly deal with the case where we want to enforce a specific order for numpy (e.g. when used in conjunction with Cython code) vs array API passthrough.
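The idea behind the helper, as a hedged sketch (the actual signature and checks in the PR may differ):

import numpy

def _asarray_with_order(array, dtype=None, order=None, xp=None):
    # Only NumPy-backed namespaces understand memory layout, so honor
    # order= there (e.g. for downstream Cython code)...
    if xp is None or xp.__name__ in {"numpy", "numpy.array_api"}:
        array = numpy.asarray(array, dtype=dtype, order=order)
        return array if xp is None else xp.asarray(array)
    # ...and pass through for other Array API namespaces, which have
    # no concept of C/F-contiguity.
    return xp.asarray(array, dtype=dtype)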

@thomasjpfan (Member Author)

I added a note in the user guide about CuPy and NumPy here: 055c6f8 (#22554)

Those are the only two libraries that fully adopted the Array API specification.

jitting might have additional requirements

This mostly has to do with data-dependent output shapes. In scikit-learn's case, we use unique in many places, which makes it harder to support a JAX or Dask backend.
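For example, the output shape of unique depends on the values, not just the input shape, which is exactly what static-shape JIT compilers cannot know ahead of time:

import numpy as np

print(np.unique(np.asarray([1, 1, 2])).shape)  # (2,)
print(np.unique(np.asarray([1, 2, 3])).shape)  # (3,)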

@fcharras (Contributor) commented Sep 4, 2022

I can confirm the speedup that has been reported, using the notebook provided by @thomasjpfan, a local desktop with a 1070Ti GPU (and a low-end CPU), and the latest CuPy docker image:

%%timeit for LinearDiscriminantAnalysis using 250000 samples

CPU on main:
CPU times: user 33.8 s, sys: 3.79 s, total: 37.6 s
Wall time: 16.9 s

CPU on thomasjpfan:array_api_lda_pr:
CPU times: user 32.6 s, sys: 3.74 s, total: 36.4 s
Wall time: 16 s

GPU on thomasjpfan:array_api_lda_pr:
CPU times: user 1.66 s, sys: 541 ms, total: 2.2 s
Wall time: 2.22 s

(I reduced the size of the data from 500000 to 250000 to fit in the 8GB VRAM of the GPU)

and it passes the tests for LDA with array_namespace=cupy.array_api.

@jjerphan (Member) left a comment:

Thank you for this work, @thomasjpfan!

I have mainly two remarks:

  • Should we mention somewhere in LinearDiscriminantAnalysis that there is support for the Array API when solver="svd"?
  • What is the behavior of LinearDiscriminantAnalysis if solver!="svd" and it is given a tensor whose namespace is not NumPy but supports the Array API? Should it fall back to solver="svd" in this case?

Also, here are some minor suggestions.

Comment on lines +14 to 15
import scipy.linalg
from scipy import linalg
Member

Now that xp.linalg can be used, should we use the fully qualified name for scipy.linalg to be explicit and remove any potential ambiguity?

Member Author

For this PR, I only used the full name where we had to select between xp.linalg and scipy.linalg, which I find more explicit:

https://github.com/thomasjpfan/scikit-learn/blob/5bc6f76f843d7ddf8e458021be0bf8e8f5ce1ad9/sklearn/discriminant_analysis.py#L484-L487

As for the other parts of the file, I would prefer to fully qualify linalg because I often need to look at the top of the file to see if it's numpy.linalg or scipy.linalg. I think this discussion and decision can be done independently of this PR.

@thomasjpfan (Member Author)

Should we mention somewhere in LinearDiscriminantAnalysis that there's a support of the Array API when solver="svd"?

Yes that makes sense. I added it in 671f4f2 (#22554)

What is the behavior of LinearDiscriminantAnalysis if solver!="svd" and if it is given a tensor whose namespace is not NumPy but supports the Array API? Should it fallback to solver="svd" in this case?

If np.asarray(some_array_api_array) works, then we cast to an ndarray and run the algorithm, which is the behavior on main. If solver!="svd" and np.asarray(some_array_api_array) does not work, then we error.

I do not think we should fall back to solver="svd" and I prefer users to explicitly change the solver to "svd" for Array API support.

@jjerphan (Member) left a comment:

LGTM! Thank you, @thomasjpfan.

Comment on lines +488 to +491
if is_array_api:
svd = xp.linalg.svd
else:
svd = scipy.linalg.svd
Member

I'm new to the array API and such, so there is a lot I don't know. But I'm interested to learn because I'd like to contribute to efforts like this.

When reading this PR to learn more I was expecting to mostly see np. be replaced with xp. and very little or no if statements that depend on what kind of array is being processed.

Why not use svd = xp.linalg.svd for numpy and "array API" inputs? Maybe because for a = np.array(...), a doesn't have an __array_namespace__? Which made me wonder why it doesn't have that attribute. Now it feels like "one question leading to another and to another, ..." - so my ultimate question: what is a good place to talk about these things? (It feels like this PR is somehow the wrong place but I don't know a better one)

Member

[...] what is a good place to talk about these things? (It feels like this PR is somehow the wrong place but I don't know a better one)

+1

The official organisation leading the Array API standard is data-apis. data-apis/array-api is an official repository versioning the static website of the standard, but apart from its issue tracker, I do not think there is a dedicated forum.

I think @scientific-python might have been a suitable organisation to host discussions (and the standard?) but it postdates the standard (IIRC) and the direction of the two initiatives is probably different. (Under @scientific-python, the Array API likely would have been a SPEC IMO.)

Contributor

I think https://github.com/data-apis/array-api/issues is a perfectly reasonable place to bring up these kinds of usability or "why was this choice made" discussions, and there are other such conversations there already. There's folks from many libraries participating there, so it's probably preferred over the issue tracker or mailing list of a single project. Issues are fine for now; maybe at some point the volume of such questions will grow to the point that a separate Discourse is helpful, but that is not the case today.

I think @scientific-python might have been a suitable organisation to host discussions (and the standard?) but it postdates the standard (IIRC) and probably the direction of both initiative is different. (Under @scientific-python, the Array API likely would have been a SPEC IMO.)

They're complementary. I don't think the direction is different, just the scopes. The Data API Consortium is squarely focused on array and dataframe APIs and interoperability features/protocols around those. Scientific Python has "best practices for Python projects" in general as its scope, and is not even limited to technical topics: also best practices around communities.

The whole API standard wouldn't have been a single SPEC, the scope and level of detail in that standard is way too much for a single SPEC. See a SPEC more like a PEP or a NEP, and the array API standard as a whole open source project.

Contributor

Why not use svd = xp.linalg.svd for numpy and "array API" inputs?

I think the point here is that in this case, scipy.linalg.svd is preferred over numpy.linalg.svd (it may be faster), and xp.linalg.svd would select the numpy function.

I'll also note that, as you noticed, numpy.ndarray does not yet have an __array_namespace__ method. The reason for that is that the main numpy namespace is not (yet) standard-compliant. Getting there takes time, due to backwards compatibility reasons. The new numpy.array_api namespace could be made compliant straight away. Over time, numpy aims to evolve its main namespace towards full compliance. The NumPy docs have a table with differences.
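A minimal sketch of how that detection plays out (as of NumPy 1.22):

import numpy
import numpy.array_api

x_np = numpy.asarray([1.0])
x_xp = numpy.array_api.asarray([1.0])

print(hasattr(x_np, "__array_namespace__"))  # False: main namespace not compliant yet
print(hasattr(x_xp, "__array_namespace__"))  # True
xp = x_xp.__array_namespace__()              # returns the compliant namespace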

@ogrisel (Member) commented Sep 20, 2022

For this specific PR, is the scipy SVD really better than the numpy SVD? Do they use different LAPACK drivers by default? Based on the documentation, both seem to use gesdd by default.

If my analysis is wrong, and the choice of the LAPACK driver happens to be important (e.g. to trade off speed for numerical stability), shall we suggest an extension to the Array API spec to allow driver hints that favor one SVD algorithm or the other, falling back to an implementation-specific default otherwise?

Contributor

It's the same driver routine underneath indeed, so I think it should be the same. The differences are:

  1. NumPy may have been built without a proper LAPACK dependency (using the f2c'd lapack_lite sources instead). This may still be fairly common when folks build from source.
  2. NumPy uses the CLAPACK interface, while SciPy uses the Fortran interface (this shouldn't matter all that much).

Member

I suspect that now fewer and fewer people compile numpy from source and instead use either wheels, conda packages or linux distro packages.

But let's be safe and keep the scipy exception in this PR for now.

@betatim would you like more time to review this PR? Otherwise I think we could merge it.

Member

I have no opinion on merge or not. It will take me a while to learn enough to have an opinion, and I can read the code here or when it is in main :D So go ahead and merge.

@GaelVaroquaux (Member) commented Sep 20, 2022 via email

@jjerphan jjerphan merged commit 2710a9e into scikit-learn:main Sep 21, 2022
@jjerphan (Member)

Thank you for this work, @thomasjpfan!

Comment on lines -474 to +494
n_classes = len(self.classes_)
n_classes = self.classes_.shape[0]
Contributor

It might be nice to allow len in the array spec to avoid needing this kind of change. Raised as an issue (data-apis/array-api#481).

Contributor

This was on purpose and was discussed fairly extensively. The new code is much clearer than the old one; len on a >1-D array is an anti-pattern imho.
