-
-
Notifications
You must be signed in to change notification settings - Fork 25.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add array API support for Nystroem approximation #29661
base: main
Are you sure you want to change the base?
Conversation
While working on this I kept running into this error @OmarManzoor do you happen to have any pointer on how to get around this when you worked on #29433 ? Thanks!
|
There is something fishy that happens right after sklearn/metrics/tests/test_common.py:1979: in check_array_api_metric_pairwise
check_array_api_metric(
X_np = array([[0.1, 0.2, 0.3],
[0.4, 0.5, 0.6]], dtype=float32)
Y_np = array([[0.2, 0.3, 0.4],
[0.5, 0.6, 0.7]], dtype=float32)
array_namespace = 'torch'
device = 'cpu'
dtype_name = 'float32'
metric = <function rbf_kernel at 0x125b99b20>
metric_kwargs = {}
sklearn/metrics/tests/test_common.py:1773: in check_array_api_metric
metric_xp = metric(a_xp, b_xp, **metric_kwargs)
a_np = array([[0.1, 0.2, 0.3],
[0.4, 0.5, 0.6]], dtype=float32)
a_xp = tensor([0.1000, 0.2000, 0.3000, 0.4000, 0.5000, 0.6000])
array_namespace = 'torch'
b_np = array([[0.2, 0.3, 0.4],
[0.5, 0.6, 0.7]], dtype=float32)
b_xp = tensor([0.2000, 0.3000, 0.4000, 0.5000, 0.6000, 0.7000])
|
Hi @EmilyXinyi I checked out your branch locally and ran the Also @ogrisel I think it might be useful to merge the PR #29475 as the current PR makes use of the other kernels. What do you think? |
Let me update the branch in this PR to check if the errors in the CI can be reproduced or if they were a transient event. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some feedback. In general, I think you should have a look at a merge PR that adds array API support to another unsupervised transformer (e.g. PCA):
This should give guidance on how to upgrade this estimator and test it.
if X.ndim < 2: | ||
X = xp.reshape(X, (1, -1)) | ||
if y is not None: | ||
y = xp.asarray(y, device=device(X)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why change y
? it is not used by this transformer (as this is an unsupervised estimator). There is no need to change it at all.
@@ -991,6 +991,11 @@ def fit(self, X, y=None): | |||
self : object | |||
Returns the instance itself. | |||
""" | |||
xp, _ = get_namespace(X) | |||
if X.ndim < 2: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not access X.ndim
before the call to self._validate_data(X, accept_sparse="csr")
a few lines below.
Furthermore, self._validate_data
itself is in charge of checking the shape of X
by passing appropriate values to **check_params
if needed.
However, all scikit-learn estimators are expected to raise a standardized error message when input data is 1D. self._validate_data
is in doing this automatically by default so there is no need to do it manually.
S = np.maximum(S, 1e-12) | ||
self.normalization_ = np.dot(U / np.sqrt(S), V) | ||
dtype = _find_matching_floating_dtype(basis_kernel, xp=xp) | ||
basis_kernel = xp.asarray(basis_kernel, dtype=dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be needed. If basis
is an xp array, then basis_kernel
should also be one. If not there is a problem. This might be fixed by #29475.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the main problem is that the _parallel_pairwise
function called by pairwise_kernels
should be upgraded to support array API.
Maybe we could have a PR dedicated to adding array API support to pairwise_kernels
first and pause this PR in the meantime to incrementally review PRs one after the other.
@pytest.mark.parametrize( | ||
"array_namespace, device, dtype_name", yield_namespace_device_dtype_combinations() | ||
) | ||
def test_nystroem_approximation(array_namespace, device, dtype_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather not change the existing tests that check the behavior with regular numpy array when array API dispatch is disabled.
Instead add a new test that compares the results of using the estimator on array API inputs with array API dispatch enabled (and also check the type and device of fitted attributes) as we do for other array API estimators (e.g. test_pca_array_api_compliance
) with non-default hyper-parameter values.
Note that you should also add "array_api_support": True
to the _more_tags
method of this estimator so that our common tests pick it up and run standard array API compliance tests with the default parameters.
Also use array_api
somewhere in the test name so that it's easy to run all array API related test of scikit-learn by doing pytest -k array_api sklearn
.
What does this implement/fix? Explain your changes.
Make Nystroem approximation array API compatible
To Do
array_api_dispatch=True
ANDarray_api_dispatch=False