set_output doesn't work for inverse_transform method · Issue #27843 · scikit-learn/scikit-learn · GitHub
Open
tvdboom opened this issue Nov 25, 2023 · 8 comments · May be fixed by #30376
Comments

@tvdboom
Contributor
tvdboom commented Nov 25, 2023

Describe the bug

With set_output(transform="pandas"), the StandardScaler's inverse_transform method doesn't return a pandas DataFrame.

Steps/Code to Reproduce

from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer

X, _ = load_breast_cancer(return_X_y=True, as_frame=True)

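# Request pandas output before fitting, as described above
scaler = StandardScaler().set_output(transform="pandas").fit(X)
Xt = scaler.transform(X)  # pandas DataFrame

# inverse_transform is not affected by set_output
print(scaler.inverse_transform(Xt))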

Expected Results

I expect a pd.DataFrame in return, just like with the transform method.

Actual Results

A numpy array.

[[1.799e+01 1.038e+01 1.228e+02 ... 2.654e-01 4.601e-01 1.189e-01]
 [2.057e+01 1.777e+01 1.329e+02 ... 1.860e-01 2.750e-01 8.902e-02]
 [1.969e+01 2.125e+01 1.300e+02 ... 2.430e-01 3.613e-01 8.758e-02]
 ...
 [1.660e+01 2.808e+01 1.083e+02 ... 1.418e-01 2.218e-01 7.820e-02]
 [2.060e+01 2.933e+01 1.401e+02 ... 2.650e-01 4.087e-01 1.240e-01]
 [7.760e+00 2.454e+01 4.792e+01 ... 0.000e+00 2.871e-01 7.039e-02]]

Versions

System:
    python: 3.11.2 (tags/v3.11.2:878ead1, Feb  7 2023, 16:38:35) [MSC v.1934 64 bit (AMD64)]
executable: C:\Users\Mavs\Documents\Python\ATOM\venv311\Scripts\python.exe
   machine: Windows-10-10.0.19045-SP0
Python dependencies:
      sklearn: 1.3.2
          pip: 23.3.1
   setuptools: 68.2.2
        numpy: 1.24.4
        scipy: 1.11.3
       Cython: 3.0.5
       pandas: 2.1.2
   matplotlib: 3.8.0
       joblib: 1.3.2
threadpoolctl: 3.2.0
Built with OpenMP: True
threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 16
         prefix: libopenblas
       filepath: C:\Users\Mavs\Documents\Python\ATOM\venv311\Lib\site-packages\numpy\.libs\libopenblas64__v0.3.21-gcc_10_3_0.dll
        version: 0.3.21
threading_layer: pthreads
   architecture: Zen
       user_api: openmp
   internal_api: openmp
    num_threads: 16
         prefix: vcomp
       filepath: C:\Users\Mavs\Documents\Python\ATOM\venv311\Lib\site-packages\sklearn\.libs\vcomp140.dll
        version: None
       user_api: blas
   internal_api: openblas
    num_threads: 16
         prefix: libopenblas
       filepath: C:\Users\Mavs\Documents\Python\ATOM\venv311\Lib\site-packages\scipy.libs\libopenblas_v0.3.20-571-g3dec11c6-gcc_10_3_0-c2315440d6b6cef5037bad648efc8c59.dll
        version: 0.3.21.dev
threading_layer: pthreads
   architecture: Zen
       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libiomp
       filepath: C:\Users\Mavs\Documents\Python\ATOM\venv311\Lib\site-packages\torch\lib\libiomp5md.dll
        version: None
       user_api: openmp
   internal_api: openmp
    num_threads: 1
         prefix: libiomp
       filepath: C:\Users\Mavs\Documents\Python\ATOM\venv311\Lib\site-packages\torch\lib\libiompstubs5md.dll
        version: None
@tvdboom added the Bug and Needs Triage labels Nov 25, 2023
@glemaitre
Member

We did not implement the inverse_transform_output so this is expected. This is not a bug but rather a feature request.

@glemaitre added the New Feature label and removed the Bug and Needs Triage labels Nov 25, 2023
@tvdboom
Contributor Author
tvdboom commented Nov 25, 2023

I see. Is it on the roadmap?

@sky-2002

@glemaitre I would like to take this. It would be great if you could point out where to start.

@RxndmForest
RxndmForest commented Dec 2, 2023

@sky-2002 I took a look into this, and my best guess is that you could add an as_frame parameter to sklearn.preprocessing.StandardScaler.inverse_transform() and return X as a DataFrame if as_frame is True. As I understand it, that can be done by adding to the code found here. Again, I only took a very quick glance at this, so correct me if I'm mistaken, @glemaitre.

UPDATE: Linked here is an implementation of as_frame for sklearn.datasets.load_breast_cancer.

@glemaitre
Member

No, that is not the right API. The as_frame API is only for the data loaders.
Here, we need to modify the _set_output.py file to enable inverse_transform, similar to what is done for transform: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/_set_output.py

@sky-2002
sky-2002 commented Dec 2, 2023

Okay, thanks @RxndmForest and @glemaitre. I am opening a PR for this: I have added inverse_transform to _set_output.py and tested it with the example above, and it now returns a DataFrame.

@kdekker-kdr4

Any update on this? It seems to have been stale for some time, but it is a helpful feature.

@elise1993

Seconding this. set_output should affect both transform and inverse_transform in the same way.

Projects
Status: Discussion
6 participants