Support inverse_transform in ColumnTransformer · Issue #11463 · scikit-learn/scikit-learn · GitHub

Open

jnothman opened this issue Jul 10, 2018 · 19 comments · May be fixed by #11639
@jnothman
Member
jnothman commented Jul 10, 2018

Since ColumnTransformer is primarily used to apply basic preprocessing, it would be extremely helpful if it (and FeatureUnion while we're at it) supported inverse_transform (where each of its constituent estimators does too). This also would replace some functionality we might otherwise support with categorical_features or ignored_features etc parameters (H/T @qinhanmin2014).

In order for it to support inverse_transform, it would need to have an index of which columns in the output came from which transformers, which is what I intended to solve in #1952. This is not hard as long as fit_transform, rather than fit, is called, or as long as all the constituent transformers support get_feature_names. Maybe we could force a run of transform when fit is called in order to check the output size.

(Alternatively, it might be nice to have a consistent API for a transformer to declare its output size, which n_components_ does for a handful of estimators.)

@thomasjpfan
Member

To implement ColumnTransformer.inverse_transform, all the dropped columns would have to be stored. For users who do not need this feature, this may inflate the size of the ColumnTransformer object. I think adding a parameter named needs_inverse to let users opt into this feature would be beneficial. What do you think?

@amueller
Member

I'm not sure I understand. Each input column can be used in multiple transformers, right? How would these be combined?

@thomasjpfan
Member

I didn't consider that use case! Since the current version of ColumnTransformer already supports this, I think tests should be added covering "each input column can be used in multiple transformers".

Regarding the issue with inverse_transform and combining different inputs, I see three options:

  1. When more than one transformer targets the same input column, the call to inverse_transform raises an error.
  2. Add another parameter to let users define how the inputs are aggregated.
  3. Do not support inverse_transform in ColumnTransformer.
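Option 1 above is cheap to implement at fit time. A hypothetical sketch (the helper name and the assumption that columns are given as index lists are mine, not scikit-learn API):

```python
from collections import Counter

def check_no_overlap(transformers):
    # transformers: list of (name, transformer, columns) tuples as passed to
    # ColumnTransformer's constructor; columns assumed to be lists of indices.
    counts = Counter(col for _, _, cols in transformers for col in cols)
    overlapping = sorted(col for col, n in counts.items() if n > 1)
    if overlapping:
        raise ValueError(
            f"inverse_transform would be ambiguous: columns {overlapping} "
            "are consumed by more than one transformer"
        )
```

This keeps the no-overlap case fully supported while failing loudly in the ambiguous one.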

@jnothman
Member Author
jnothman commented Jul 15, 2018 via email

@amueller
Member

With drop we could just fill it in with 0 or missing values, but I think we shouldn't try to support the overlap case. I'm OK with supporting the no-overlap case, though.
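The "fill dropped columns with missing values" idea could be sketched like this (a hypothetical helper; the `dropped_idx` bookkeeping is assumed to have been recorded at fit time):

```python
import numpy as np

def reinsert_dropped(inverted, dropped_idx, n_original):
    # After inverting the kept columns, reinsert the columns dropped during
    # fit as NaN placeholders so the result regains the original width.
    out = np.full((inverted.shape[0], n_original), np.nan)
    kept = [i for i in range(n_original) if i not in dropped_idx]
    out[:, kept] = inverted
    return out
```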

@arkanoid87

I ran into this too

@jnothman
Member Author
jnothman commented Jan 10, 2019 via email

@arkanoid87
arkanoid87 commented Jan 10, 2019

I need to apply a different scaling factor to every column of my dataset.
In particular I need to rescale the columns dt.dayofyear, dt.month, dt.quarter and dt.hour to sine and cosine for ML.
All my transform functions are reversible, so I tried (without reading the manual in depth) to apply ColumnTransformer + Pipeline, but then I realised that the former was not exposing the inverse_transform functions defined in the FunctionTransformers in my Pipelines.

So I coded this bare encoder, where I pass a list of encoders matching the column order in my dataset, and it seems to work:

import numpy as np
from sklearn.utils import check_array

class ColumnScaler():
    def __init__(self, encoders=None):
        self.encoders = encoders

    def fit(self, X):
        X = check_array(X)
        # Fit one encoder per column, iterating over the transposed data.
        for column, encoder in zip(X.T, self.encoders):
            encoder.fit(column.reshape(-1, 1))
        return self

    def transform(self, X):
        X = check_array(X)
        return np.column_stack([
            encoder.transform(column.reshape(-1, 1))
            for column, encoder in zip(X.T, self.encoders)
        ])

    def inverse_transform(self, X):
        X = check_array(X)
        return np.column_stack([
            encoder.inverse_transform(column.reshape(-1, 1))
            for column, encoder in zip(X.T, self.encoders)
        ])

used like this:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, MinMaxScaler

def cycle_pipeline():
    return Pipeline([
        ('MinMaxScaler', MinMaxScaler(feature_range=(0, 360))),
        ('deg2rad', FunctionTransformer(func=np.deg2rad, inverse_func=np.rad2deg, validate=True)),
        ('sin', FunctionTransformer(func=np.sin, inverse_func=np.arcsin, validate=True)),
    ])
    
encoders = [
    MinMaxScaler(),
    MinMaxScaler(),
    MinMaxScaler(),
    MinMaxScaler(),
    cycle_pipeline(),
    cycle_pipeline(),
    cycle_pipeline(),
    cycle_pipeline(),
    cycle_pipeline(),
    cycle_pipeline(),
    cycle_pipeline(),
    cycle_pipeline()
]
scaler = ColumnScaler(encoders)
scaler.fit(data_copy)

@jnothman
Member Author

Thanks. Does the proposed change at #11639 work for you, @arkanoid87?

@ogrisel
Member
ogrisel commented Sep 6, 2019

Another use case: being able to use ColumnTransformer to preprocess a multi-column target y using TransformedTargetRegressor. TransformedTargetRegressor requires its underlying transformer to implement inverse_transform to be able to make predictions.

This use case was first reported to me IRL by @trbedwards.
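For illustration, here is a runnable single-output sketch of why TransformedTargetRegressor needs an inverse (using its real func/inverse_func parameters; the multi-column case would swap in a ColumnTransformer, which is exactly what fails today):

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.uniform(1.0, 10.0, size=(50, 2))
y = np.exp(0.3 * X[:, 0] + 0.1 * X[:, 1])

# The regressor is fitted on func(y); at predict time inverse_func maps the
# regressor's output back to the original target scale, so an inverse is mandatory.
reg = TransformedTargetRegressor(
    regressor=LinearRegression(), func=np.log, inverse_func=np.exp
)
reg.fit(X, y)
pred = reg.predict(X[:5])
```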

@thomasjpfan
Member

Thank you for your input @ogrisel and @arkanoid87. I'll update the PR with a motivating example and see if there are any improvements to make to the implementation.

@raels0
raels0 commented Sep 30, 2019

Question: since we have access to the underlying steps for each column (or group of columns), couldn't we check whether the pipeline consists of invertible steps and, if so, invert that column? Where that isn't possible for all columns, that would be an error case.
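The check suggested above could be sketched as follows (a hypothetical helper, not scikit-learn API; it only inspects the fitted `transformers_` list):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

def all_steps_invertible(fitted_ct):
    # 'drop' and 'passthrough' slots are trivially invertible and skipped;
    # every other fitted transformer must expose inverse_transform.
    return all(
        hasattr(trans, "inverse_transform")
        for _, trans, _ in fitted_ct.transformers_
        if trans not in ("drop", "passthrough")
    )

ct = ColumnTransformer([("scale", StandardScaler(), [0])], remainder="drop")
ct.fit(np.arange(6.0).reshape(3, 2))
```

When this returns False, inverse_transform would raise rather than silently produce garbage.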

@gilbertovilarunc
gilbertovilarunc commented Feb 27, 2022

Is this solved? Really need this feature!

@MehdiActable

This feature looks really nice and I need it too!

@fredmontet

Same here...

@ttamg
ttamg commented Nov 25, 2022

Me too! It would be very useful.

@ttamg
ttamg commented Nov 25, 2022

I see all the issues with taking the inverse of ColumnTransformer and lining up columns, missing values and so on.

Here is a simple fix to drop into code for anyone who wants the ColumnTransformer.inverse_transform(X) method and can guarantee that the right columns are in the right order in X.

import numpy as np
from sklearn.compose import ColumnTransformer

class InvertableColumnTransformer(ColumnTransformer):
    """
    Adds an inverse_transform method to the standard sklearn.compose.ColumnTransformer.

    Warning: this is flaky, use at your own risk. Validation only checks that the
    column count of `X` matches the fitted transformer; reordering of columns will
    break things!
    """

    def inverse_transform(self, X):
        if X.shape[1] != self.n_features_in_:
            raise ValueError(
                "X and the fitted transformer seem to have different numbers of columns."
            )

        arrays = []
        for name, indices in self.output_indices_.items():
            transformer = self.named_transformers_.get(name, None)
            arr = X[:, indices.start:indices.stop]

            # No fitted transformer under this name: pass the slice through unchanged.
            if transformer is not None:
                arr = transformer.inverse_transform(arr)

            arrays.append(arr)

        return np.concatenate(arrays, axis=1)

Use at your own risk.

@SamsTheGreatest

@ttamg some mods

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer

class InvertableColumnTransformer(ColumnTransformer):
    """
    Adds an inverse_transform method to the standard sklearn.compose.ColumnTransformer.

    Warning: this is flaky, use at your own risk. Validation only checks that the
    column count of `X` matches the fitted transformer; reordering of columns will
    break things!
    """

    def inverse_transform(self, X):
        if isinstance(X, pd.DataFrame):
            X = X.to_numpy()
        if X.shape[1] != self.n_features_in_:
            raise ValueError(
                "X and the fitted transformer seem to have different numbers of columns."
            )

        arrays = []
        for name, indices in self.output_indices_.items():
            transformer = self.named_transformers_.get(name, None)
            arr = X[:, indices.start:indices.stop]

            # `is 'passthrough'` compared string identity, which is wrong;
            # compare by equality instead.
            if transformer is not None and transformer != "passthrough":
                arr = transformer.inverse_transform(arr)

            arrays.append(arr)

        return np.concatenate(arrays, axis=1)

@bl3e967
bl3e967 commented Aug 11, 2023

@ttamg It's been a while since this was last addressed, but I have made some modifications to handle drop.

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer

class InvertableColumnTransformer(ColumnTransformer):
    """
    Adds an inverse_transform method to the standard sklearn.compose.ColumnTransformer.

    Warning: this is flaky, use at your own risk. Validation only checks that the
    column count of `X` matches the fitted transformer; reordering of columns will
    break things!
    """

    def inverse_transform(self, X):
        if isinstance(X, pd.DataFrame):
            X = X.to_numpy()

        arrays = []
        for name, indices in self.output_indices_.items():
            transformer = self.named_transformers_.get(name, None)
            arr = X[:, indices.start:indices.stop]

            # 'drop' and 'passthrough' slots (and a missing name) have nothing
            # to invert; their slice is kept as-is (empty for dropped columns).
            if transformer not in (None, "passthrough", "drop"):
                arr = transformer.inverse_transform(arr)

            arrays.append(arr)

        retarr = np.concatenate(arrays, axis=1)

        if retarr.shape[1] != X.shape[1]:
            raise ValueError(
                f"Received {X.shape[1]} columns but transformer expected {retarr.shape[1]}"
            )

        return retarr
