Support inverse_transform in ColumnTransformer #11463
Comments
To implement |
I'm not sure I understand. Each input column can be used in multiple transformers, right? How would these be combined? |
I didn't consider that use case! Since current version of Regarding the issue with
|
Oh, indeed, there are lots of troubles here.
We could support the case with no drop and no overlap.
|
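The "no drop and no overlap" condition could be checked before attempting an inverse. Below is a minimal sketch, assuming each transformer's column selection is given as a list of integer indices; `check_invertible_layout` is a hypothetical helper for illustration, not part of scikit-learn.

```python
def check_invertible_layout(transformers, n_features):
    """Return True if every input column is used exactly once and none is dropped.

    `transformers` is a list of (name, transformer, columns) tuples, where
    `columns` is a list of integer column indices (an assumption of this sketch).
    """
    seen = set()
    for name, transformer, columns in transformers:
        if isinstance(transformer, str) and transformer == "drop":
            return False  # dropped columns cannot be reconstructed
        for col in columns:
            if col in seen:
                return False  # this column feeds more than one transformer
            seen.add(col)
    # Columns that no transformer mentions are treated as missing here,
    # so the layout only passes when all n_features columns are covered.
    return seen == set(range(n_features))
```

A fitted ColumnTransformer exposes a list of (name, transformer, columns) tuples as `transformers_`; the sketch only handles the case where those column entries are integer lists.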
With |
I stepped on this too |
@arkanoid87 could you please describe your use cases for ColumnTransformer.inverse_transform? |
I need to apply a different scaling factor to every column of my dataset, so I coded this bare encoder: I pass it a list of encoders matching the column order in my dataset, and it seems to work like this |
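The snippet from the comment above is not preserved in this copy of the thread. As a rough illustration only, a per-column encoder of the kind described might look like the sketch below. Both `ScaleBy` and `PerColumnEncoder` are hypothetical names, and the code works on plain lists of rows rather than on the commenter's actual data structures.

```python
class ScaleBy:
    """Hypothetical one-column encoder: multiplies values by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, values):
        return [v * self.factor for v in values]

    def inverse_transform(self, values):
        return [v / self.factor for v in values]


class PerColumnEncoder:
    """Applies encoders[i] to column i; the list order must match the column order."""

    def __init__(self, encoders):
        self.encoders = encoders

    def _apply(self, rows, method):
        if rows and len(rows[0]) != len(self.encoders):
            raise ValueError("expected one encoder per column")
        columns = list(zip(*rows))  # row-major -> column-major
        out = [getattr(enc, method)(list(col))
               for enc, col in zip(self.encoders, columns)]
        return [list(row) for row in zip(*out)]  # back to row-major

    def transform(self, rows):
        return self._apply(rows, "transform")

    def inverse_transform(self, rows):
        return self._apply(rows, "inverse_transform")
```

Because each column has exactly one encoder, there is no overlap and nothing is dropped, so the inverse is simply each encoder's inverse applied to its own column.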
Thanks. Does the proposed change at #11639 work for you, @arkanoid87? |
Another use case: being able to use This use case was first reported to me IRL by @trbedwards. |
Thank you for your input, @ogrisel and @arkanoid87. I'll update the PR with a motivating example and see if there are any improvements to the implementation. |
Question: since we have access to the underlying steps for each column (or group of columns), couldn't we check whether the pipeline consists of reversible steps and, if so, invert that column? Where that isn't possible for all columns, it would be an error case. |
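One way to read this suggestion is as a duck-typing check on each transformer. The sketch below assumes that a pipeline-like object exposes its stages as a `steps` list of (name, estimator) pairs, as sklearn.pipeline.Pipeline does; `is_invertible` and `check_all_invertible` are hypothetical helpers, not scikit-learn API.

```python
def is_invertible(transformer):
    """Return True if `transformer` looks invertible, judged by duck typing."""
    if isinstance(transformer, str):
        # "passthrough" is trivially invertible; "drop" loses the data.
        return transformer == "passthrough"
    steps = getattr(transformer, "steps", None)
    if steps is not None:
        # A pipeline is invertible only if every one of its steps is.
        return all(is_invertible(est) for _, est in steps)
    return callable(getattr(transformer, "inverse_transform", None))


def check_all_invertible(transformers):
    """Raise if any (name, transformer, columns) entry cannot be inverted."""
    bad = [name for name, trans, _ in transformers if not is_invertible(trans)]
    if bad:
        raise ValueError(f"transformers without inverse_transform: {bad}")
```

Having the method is necessary but not sufficient: the check says nothing about overlapping or dropped columns, which still have to be ruled out separately.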
Is it solved? Really need this feature! |
This feature looks really nice, and I actually need it too! |
Same here... |
Me too! Will be very useful |
I see all the issues with using the inverse of Here is a simple fix to drop into code for someone wanting the

import numpy as np
from sklearn.compose import ColumnTransformer


class InvertableColumnTransformer(ColumnTransformer):
    """
    Adds an inverse transform method to the standard sklearn.compose.ColumnTransformer.
    Warning: this is flaky, use at your own risk. Validation checks that the column count in
    `transformers` matches your object `X` to be inverted. Reordering of columns will break things!
    """

    def inverse_transform(self, X):
        if X.shape[1] != self.n_features_in_:
            raise Exception("X and the fitted transformer seem to have different numbers of columns.")
        arrays = []
        # output_indices_ maps each transformer name to its slice of output columns.
        for name, indices in self.output_indices_.items():
            transformer = self.named_transformers_.get(name, None)
            arr = X[:, indices.start: indices.stop]
            if transformer is None:
                # No fitted transformer under this name: keep the columns as they are.
                pass
            else:
                arr = transformer.inverse_transform(arr)
            arrays.append(arr)
        return np.concatenate(arrays, axis=1)

Use at your own risk. |
@ttamg some mods class InvertableColumnTransformer(ColumnTransformer):
|
@ttamg A while since this was last addressed but have made some modifications to handle

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer


class InvertableColumnTransformer(ColumnTransformer):
    """
    Adds an inverse transform method to the standard sklearn.compose.ColumnTransformer.
    Warning: this is flaky, use at your own risk. Validation checks that the column count in
    `transformers` matches your object `X` to be inverted. Reordering of columns will break things!
    """

    def inverse_transform(self, X):
        # Work on a plain array so that positional slicing below is valid.
        if isinstance(X, pd.DataFrame):
            X = X.to_numpy()
        arrays = []
        # output_indices_ maps each transformer name to its slice of output columns.
        for name, indices in self.output_indices_.items():
            transformer = self.named_transformers_.get(name, None)
            arr = X[:, indices.start: indices.stop]
            if transformer in (None, "passthrough", "drop"):
                # Nothing to invert: keep the columns as they are.
                pass
            else:
                arr = transformer.inverse_transform(arr)
            arrays.append(arr)
        retarr = np.concatenate(arrays, axis=1)
        if retarr.shape[1] != X.shape[1]:
            raise ValueError(f"Received {X.shape[1]} columns but transformer expected {retarr.shape[1]}")
        return retarr |
Since ColumnTransformer is primarily used to apply basic preprocessing, it would be extremely helpful if it (and FeatureUnion while we're at it) supported inverse_transform (where each of its constituent estimators does too). This also would replace some functionality we might otherwise support with categorical_features or ignored_features etc parameters (H/T @qinhanmin2014).

In order for it to support inverse_transform, it would need to have an index of which columns in the output came from which transformers, which is what I intended to solve in #1952. This is not hard as long as fit_transform, rather than fit, is called, or as long as all the constituent transformers support get_feature_names. Maybe we could force a run of transform when fit is called in order to check the output size.

(Alternatively, it might be nice to have a consistent API for a transformer to declare its output size, which n_components_ does for a handful of estimators.)
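
The index of output columns described above can be sketched as a running offset over each transformer's output width. This is a minimal illustration, assuming the widths are already known (for example from the shape of each transformer's output during fit_transform); `build_output_slices` is a hypothetical helper name.

```python
def build_output_slices(widths):
    """Map each transformer name to the slice of output columns it produced.

    `widths` is an ordered list of (name, n_output_columns) pairs, in the
    order in which the transformer outputs are concatenated.
    """
    slices = {}
    start = 0
    for name, width in widths:
        slices[name] = slice(start, start + width)
        start += width
    return slices
```

The InvertableColumnTransformer snippets in this thread read an equivalent name-to-slice mapping from the fitted output_indices_ attribute.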