Support inverse_transform in ColumnTransformer · Issue #11463 · scikit-learn/scikit-learn · GitHub

Open

jnothman opened this issue Jul 10, 2018 · 19 comments · May be fixed by #11639
@jnothman
Member
jnothman commented Jul 10, 2018

Since ColumnTransformer is primarily used to apply basic preprocessing, it would be extremely helpful if it (and FeatureUnion while we're at it) supported inverse_transform (where each of its constituent estimators does too). This also would replace some functionality we might otherwise support with categorical_features or ignored_features etc parameters (H/T @qinhanmin2014).

In order for it to support inverse_transform, it would need to have an index of which columns in the output came from which transformers, which is what I intended to solve in #1952. This is not hard as long as fit_transform, rather than fit, is called, or as long as all the constituent transformers support get_feature_names. Maybe we could force a run of transform when fit is called in order to check the output size.

(Alternatively, it might be nice to have a consistent API for a transformer to declare its output size, which n_components_ does for a handful of estimators.)

@thomasjpfan
Member

To implement ColumnTransformer.inverse_transform, all the dropped columns would have to be stored. For users who do not need this feature, this may inflate the size of the ColumnTransformer object. I think adding a parameter named needs_inverse to let users opt into this feature would be beneficial. What do you think?

@amueller
Member

I'm not sure I understand. Each input column can be used in multiple transformers, right? How would these be combined?

@thomasjpfan
Member

I didn't consider that use case! Since the current version of ColumnTransformer already supports this, I think tests should be added covering "each input column can be used in multiple transformers".

Regarding the issue with inverse_transform and combining different inputs, I see three options:

  1. When more than one transformer targets the same input column, the call to inverse_transform raises an error.
  2. Add another parameter to let users define how the inputs are aggregated.
  3. Do not support inverse_transform in ColumnTransformer.
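Option 1 above is cheap to implement at fit time. A hypothetical sketch (the helper name and the assumption that columns are given as index lists are mine, not scikit-learn API):

```python
from collections import Counter

def check_no_overlap(transformers):
    # transformers: list of (name, transformer, columns) tuples as passed to
    # ColumnTransformer's constructor; columns assumed to be lists of indices.
    counts = Counter(col for _, _, cols in transformers for col in cols)
    overlapping = sorted(col for col, n in counts.items() if n > 1)
    if overlapping:
        raise ValueError(
            f"inverse_transform would be ambiguous: columns {overlapping} "
            "are consumed by more than one transformer"
        )
```

This keeps the no-overlap case fully supported while failing loudly in the ambiguous one.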

@jnothman
Member Author
jnothman commented Jul 15, 2018 via email

@amueller
Member

With drop we could just fill it in with 0 or missing values, but I think we shouldn't try to support the overlap case. I'm OK with supporting the no-overlap case, though.
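The "fill dropped columns with missing values" idea could be sketched like this (a hypothetical helper; the `dropped_idx` bookkeeping is assumed to have been recorded at fit time):

```python
import numpy as np

def reinsert_dropped(inverted, dropped_idx, n_original):
    # After inverting the kept columns, reinsert the columns dropped during
    # fit as NaN placeholders so the result regains the original width.
    out = np.full((inverted.shape[0], n_original), np.nan)
    kept = [i for i in range(n_original) if i not in dropped_idx]
    out[:, kept] = inverted
    return out
```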

@arkanoid87

I ran into this too

@jnothman
Member Author
jnothman commented Jan 10, 2019 via email

@arkanoid87
arkanoid87 commented Jan 10, 2019

I need to apply a different scaling factor to every column of my dataset.
In particular I need to rescale the columns dt.dayofyear, dt.month, dt.quarter and dt.hour to sine and cosine for ML.
All my transform functions are reversible, so I tried (without reading the manual in depth) to apply ColumnTransformer + Pipeline, but then I realised that the former was not exposing the inverse_transform functions defined in the FunctionTransformers in my Pipelines.

So I coded this bare encoder, where I pass a list of encoders matching the column order in my dataset, and it seems to work:

import numpy as np
from sklearn.utils import check_array

class ColumnScaler():
    def __init__(self, encoders=None):
        self.encoders = encoders

    def fit(self, X):
        X = check_array(X)
        # Fit one encoder per column, iterating over the transposed data.
        for column, encoder in zip(X.T, self.encoders):
            encoder.fit(column.reshape(-1, 1))
        return self

    def transform(self, X):
        X = check_array(X)
        return np.column_stack([
            encoder.transform(column.reshape(-1, 1))
            for column, encoder in zip(X.T, self.encoders)
        ])

    def inverse_transform(self, X):
        X = check_array(X)
        return np.column_stack([
            encoder.inverse_transform(column.reshape(-1, 1))
            for column, encoder in zip(X.T, self.encoders)
        ])

used like this:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler

def cycle_pipeline():
    return Pipeline([
        ('MinMaxScaler', MinMaxScaler(feature_range=(0, 360))),
        ('deg2rad', FunctionTransformer(func=np.deg2rad, inverse_func=np.rad2deg, validate=True)),
        ('sin', FunctionTransformer(func=np.sin, inverse_func=np.arcsin, validate=True)),
    ])
    
encoders = [
    MinMaxScaler(),
    MinMaxScaler(),
    MinMaxScaler(),
    MinMaxScaler(),
    cycle_pipeline(),
    cycle_pipeline(),
    cycle_pipeline(),
    cycle_pipeline(),
    cycle_pipeline(),
    cycle_pipeline(),
    cycle_pipeline(),
    cycle_pipeline()
]
scaler = ColumnScaler(encoders)
scaler.fit(data_copy)

@jnothman
Member Author

Thanks. Does the proposed change at #11639 work for you, @arkanoid87?

@ogrisel
Member
ogrisel commented Sep 6, 2019

Another use case: being able to use ColumnTransformer to preprocess a multi-column target y using TransformedTargetRegressor. TransformedTargetRegressor requires its underlying transformer to implement inverse_transform to be able to make predictions.

This use case was first reported to me IRL by @trbedwards.
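For illustration, here is a runnable single-output sketch of why TransformedTargetRegressor needs an inverse (using its real func/inverse_func parameters; the multi-column case would swap in a ColumnTransformer, which is exactly what fails today):

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.uniform(1.0, 10.0, size=(50, 2))
y = np.exp(0.3 * X[:, 0] + 0.1 * X[:, 1])

# The regressor is fitted on func(y); at predict time inverse_func maps the
# regressor's output back to the original target scale, so an inverse is mandatory.
reg = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log, inverse_func=np.exp
)
reg.fit(X, y)
pred = reg.predict(X[:5])
```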

@thomasjpfan
Member

Thank you for your input @ogrisel and @arkanoid87. I'll update the PR with a motivating example and see if there are any improvements to make to the implementation.

@raels0
raels0 commented Sep 30, 2019

Question: since we have access to the underlying steps for each column (or group of columns), couldn't we check whether the pipeline consists of invertible steps and, if so, invert that column? Where that isn't possible for all columns, that would be an error case.
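The check suggested above could be sketched as follows (a hypothetical helper, not scikit-learn API; it only inspects the fitted `transformers_` list):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

def all_steps_invertible(fitted_ct):
    # 'drop' and 'passthrough' slots are trivially invertible and skipped;
    # every other fitted transformer must expose inverse_transform.
    return all(
        hasattr(trans, "inverse_transform")
        for _, trans, _ in fitted_ct.transformers_
        if trans not in ("drop", "passthrough")
    )

ct = ColumnTransformer([("scale", StandardScaler(), [0])], remainder="drop")
ct.fit(np.arange(6.0).reshape(3, 2))
```

When this returns False, inverse_transform would raise rather than silently produce garbage.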

@gilbertovilarunc
gilbertovilarunc commented Feb 27, 2022

Is this solved? Really need this feature!

@MehdiActable

This feature looks really nice and I need it too!

@fredmontet

Same here...

@ttamg
ttamg commented Nov 25, 2022

Me too! It would be very useful.

@ttamg
ttamg commented Nov 25, 2022

I see all the issues with taking the inverse of ColumnTransformer and lining up columns, missing values and so on.

Here is a simple fix to drop into code for anyone who wants the ColumnTransformer.inverse_transform(X) method and can guarantee that the right columns are in the right order in X.

import numpy as np
from sklearn.compose import ColumnTransformer

class InvertableColumnTransformer(ColumnTransformer):
    """
    Adds an inverse_transform method to the standard sklearn.compose.ColumnTransformer.

    Warning: this is flaky, use at your own risk. Validation only checks that the
    column count of `X` matches the fitted transformer; reordering of columns will
    break things!
    """

    def inverse_transform(self, X):
        if X.shape[1] != self.n_features_in_:
            raise ValueError(
                "X and the fitted transformer seem to have different numbers of columns."
            )

        arrays = []
        for name, indices in self.output_indices_.items():
            transformer = self.named_transformers_.get(name, None)
            arr = X[:, indices.start:indices.stop]

            # No fitted transformer under this name: pass the slice through unchanged.
            if transformer is not None:
                arr = transformer.inverse_transform(arr)

            arrays.append(arr)

        return np.concatenate(arrays, axis=1)

Use at your own risk.

@SamsTheGreatest

@ttamg some mods

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer

class InvertableColumnTransformer(ColumnTransformer):
    """
    Adds an inverse_transform method to the standard sklearn.compose.ColumnTransformer.

    Warning: this is flaky, use at your own risk. Validation only checks that the
    column count of `X` matches the fitted transformer; reordering of columns will
    break things!
    """

    def inverse_transform(self, X):
        if isinstance(X, pd.DataFrame):
            X = X.to_numpy()
        if X.shape[1] != self.n_features_in_:
            raise ValueError(
                "X and the fitted transformer seem to have different numbers of columns."
            )

        arrays = []
        for name, indices in self.output_indices_.items():
            transformer = self.named_transformers_.get(name, None)
            arr = X[:, indices.start:indices.stop]

            # `is 'passthrough'` compared string identity, which is wrong;
            # compare by equality instead.
            if transformer is not None and transformer != "passthrough":
                arr = transformer.inverse_transform(arr)

            arrays.append(arr)

        return np.concatenate(arrays, axis=1)

@bl3e967
bl3e967 commented Aug 11, 2023

@ttamg It's been a while since this was last addressed, but I have made some modifications to handle drop.

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer

class InvertableColumnTransformer(ColumnTransformer):
    """
    Adds an inverse_transform method to the standard sklearn.compose.ColumnTransformer.

    Warning: this is flaky, use at your own risk. Validation only checks that the
    column count of `X` matches the fitted transformer; reordering of columns will
    break things!
    """

    def inverse_transform(self, X):
        if isinstance(X, pd.DataFrame):
            X = X.to_numpy()

        arrays = []
        for name, indices in self.output_indices_.items():
            transformer = self.named_transformers_.get(name, None)
            arr = X[:, indices.start:indices.stop]

            # 'drop' and 'passthrough' slots (and a missing name) have nothing
            # to invert; their slice is kept as-is (empty for dropped columns).
            if transformer not in (None, "passthrough", "drop"):
                arr = transformer.inverse_transform(arr)

            arrays.append(arr)

        retarr = np.concatenate(arrays, axis=1)

        if retarr.shape[1] != X.shape[1]:
            raise ValueError(
                f"Received {X.shape[1]} columns but transformer expected {retarr.shape[1]}"
            )

        return retarr
