8000 MinMaxScaler is not array API compliant if clip=True · Issue #29607 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

MinMaxScaler is not array API compliant if clip=True #29607

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
cbourjau opened this issue Aug 1, 2024 · 8 comments · Fixed by #29751
Closed

MinMaxScaler is not array API compliant if clip=True #29607

cbourjau opened this issue Aug 1, 2024 · 8 comments · Fixed by #29751

Comments

@cbourjau
Copy link
cbourjau commented Aug 1, 2024

Describe the bug

The MinMaxScaler is listed as array API compliant, but uses non compliant code under some configuration. Namely, it calls clip (link to standard) with the non-standard out kwarg here.

While this particular fix may be simple, I am a bit unsure how to best test for array API compliance. The array-api-strict package is only implemented up to version 2022.12 of the standard which didn't include clip yet. I actually discovered the bug using the lazy ndonnx package, but it does not serve well as a reproducer here since I also run into other, earlier, issues related to eager data validation.

Steps/Code to Reproduce

from sklearn.preprocessing import MinMaxScaler
import array_api_strict as xp
import numpy as np
import sklearn

def test_minmax_scaler_clip():
    feature_range = (0, 1)
    # test behaviour of the parameter 'clip' in MinMaxScaler
    # X = iris.data
    X = np.array([[-1, 2], [-0.5, 6], [0, 10], [1, 18]])
    scaler = MinMaxScaler(feature_range=feature_range, clip=True).fit(X)
    X_min, X_max = np.min(X, axis=0), np.max(X, axis=0)
    X_test = xp.asarray([np.r_[X_min[:2] - 10, X_max[2:] + 10]])
    X_transformed = scaler.transform(X_test)


with sklearn.config_context(array_api_dispatch=True):
    test_minmax_scaler_clip()

Expected Results

I expected that MinMaxScaler.transform would just work with a standard compliant implementation.

Actual Results

The array-api-strict package is lagging behind and fails with the notice that clip is not implemented, yet.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[6], line 2
      1 with sklearn.config_context(array_api_dispatch=True):
----> 2     test_minmax_scaler_clip()

Cell In[5], line 9, in test_minmax_scaler_clip()
      7 X_min, X_max = np.min(X, axis=0), np.max(X, axis=0)
      8 X_test = xp.asarray([np.r_[X_min[:2] - 10, X_max[2:] + 10]])
----> 9 X_transformed = scaler.transform(X_test)

File [~/micromamba/envs/strict-array-api/lib/python3.12/site-packages/sklearn/utils/_set_output.py:313](http://localhost:8889/lab/tree/~/micromamba/envs/strict-array-api/lib/python3.12/site-packages/sklearn/utils/_set_output.py#line=312), in _wrap_method_output.<locals>.wrapped(self, X, *args, **kwargs)
    311 @wraps(f)
    312 def wrapped(self, X, *args, **kwargs):
--> 313     data_to_wrap = f(self, X, *args, **kwargs)
    314     if isinstance(data_to_wrap, tuple):
    315         # only wrap the first output for cross decomposition
    316         return_tuple = (
    317             _wrap_data_with_container(method, data_to_wrap[0], X, self),
    318             *data_to_wrap[1:],
    319         )

File [~/micromamba/envs/strict-array-api/lib/python3.12/site-packages/sklearn/preprocessing/_data.py:546](http://localhost:8889/lab/tree/~/micromamba/envs/strict-array-api/lib/python3.12/site-packages/sklearn/preprocessing/_data.py#line=545), in MinMaxScaler.transform(self, X)
    544 X += self.min_
    545 if self.clip:
--> 546     xp.clip(X, self.feature_range[0], self.feature_range[1], out=X)
    547 return X

File [~/micromamba/envs/strict-array-api/lib/python3.12/site-packages/array_api_strict/_flags.py:355](http://localhost:8889/lab/tree/~/micromamba/envs/strict-array-api/lib/python3.12/site-packages/array_api_strict/_flags.py#line=354), in requires_api_version.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
    352 @functools.wraps(func)
    353 def wrapper(*args, **kwargs):
    354     if version > API_VERSION:
--> 355         raise RuntimeError(
    356             f"The function {func.__name__} requires API version {version} or later, "
    357             f"but the current API version for array-api-strict is {API_VERSION}"
    358         )
    359     return func(*args, **kwargs)

RuntimeError: The function clip requires API version 2023.12 or later, but the current API version for array-api-strict is 2022.12

Versions

System:
    python: 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:13:44) [Clang 16.0.6 ]
executable: /Users/c.bourjau/micromamba/envs/strict-array-api/bin/python
   machine: macOS-14.2.1-arm64-arm-64bit

Python dependencies:
      sklearn: 1.5.1
          pip: 24.2
   setuptools: 71.0.4
        numpy: 2.0.1
        scipy: 1.14.0
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/c.bourjau/micromamba/envs/strict-array-api/lib/libopenblas.0.dylib
        version: 0.3.27
threading_layer: openmp
   architecture: VORTEX

       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libomp
       filepath: /Users/c.bourjau/micromamba/envs/strict-array-api/lib/libomp.dylib
        version: None
@cbourjau cbourjau added Bug Needs Triage Issue requires triage labels Aug 1, 2024
@ogrisel
Copy link
Member
ogrisel commented Aug 2, 2024

Thanks for the report. Indeed, we need to change the code to use sklearn.utils._array_api._clip for the time being.

Any taker for a quick PR? @StefanieSenger @EmilyXinyi maybe?

One question though: I am not sure to which extent with want to test estimators for array API compat with non-default hparams. This can quickly lead to a combinatorial explosion of test cases.

@ogrisel ogrisel added module:preprocessing Array API and removed Needs Triage Issue requires triage labels Aug 2, 2024
@ogrisel ogrisel added this to Array API Aug 2, 2024
@cbourjau
Copy link
Author
cbourjau commented Aug 2, 2024

One question though: I am not sure to which extent with want to test estimators for array API compat with non-default hparams. This can quickly lead to a combinatorial explosion of test cases.

Indeed! I believe this may be most practically with static type checkers. Unfortunatelly, there is no Protocl for the API, yet. There is a very well develop yet seemingly dormant PR for that, though.

@StefanieSenger
Copy link
Contributor

I would have a look and try to make a PR.

@EmilyXinyi
Copy link
Contributor

Just adding a comment: I believe array-api-strict now uses the v2023.12 now and clip is supported (PR here, I think the documentation just wasn't updated ), you can update array-api-strict package using pip install array-api-strict==2.0.1. However, I think in the array-api-strict's of clip does not take in the argument out unlike numpy so that is why this error occurs.

for reference:
numpy's clip, array-api-strict's clip

@StefanieSenger
Copy link
Contributor

Thank you a lot @EmilyXinyi! That was very helpful. I didn't see that you have worked on _clip before, otherwise I would have let you go first. On the other hand, you're a perfect reviewer now. :)

@Siddharth-Latthe-07

This comment was marked as spam.

@betatim
Copy link
Member
betatim commented Aug 7, 2024

@Siddharth-Latthe-07 please don't go around posting replies on many issues indiscriminatingly. For example in this case there is already a Pull Request that addresses the issue.

@cbourjau
Copy link
Author
cbourjau commented Aug 7, 2024

Testing for Array API Compliance:- To ensure array API compliance, you can use the array-api-tests package and run tests against your code.

This does not apply here since the array-api-tests are testing implementations of the array API themselves, but that is not the issue here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
6 participants
0