ColumnTransformer requires excluded columns · Issue #19168 · scikit-learn/scikit-learn · GitHub

Closed
david-cortes opened this issue Jan 13, 2021 · 1 comment

@david-cortes
Contributor

ColumnTransformer has an option remainder="drop" (the default) that drops every input column not handled by one of the transformers passed in its transformers (list) argument.

However, if the ColumnTransformer is fitted on a DataFrame with named columns, any columns that end up dropped because no transformer handles them are still required to be present in the DataFrame passed to transform, which should not be the case.

Example:

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    "col1" : ["a","b"],
    "col2" : ["c", "d"],
    "col3" : [1,3]
})

ct = ColumnTransformer([
    ("ohe_col1", OneHotEncoder(handle_unknown="ignore"), ["col1"]),
    ("ohe_col2", OneHotEncoder(), ["col2"])
], remainder="drop")
ct.fit(df, df["col3"])
ct.transform(df[["col1", "col2"]])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-3b20260b9d3f> in <module>
     14 ], remainder="drop")
     15 ct.fit(df, df["col3"])
---> 16 ct.transform(df[["col1", "col2"]])

~/ipython/del/sklearn/compose/_column_transformer.py in transform(self, X)
    555             X_feature_names = None
    556 
--> 557         self._check_n_features(X, reset=False)
    558         if (self._feature_names_in is not None and
    559             X_feature_names is not None and

~/ipython/del/sklearn/base.py in _check_n_features(self, X, reset)
    364         if n_features != self.n_features_in_:
    365             raise ValueError(
--> 366                 f"X has {n_features} features, but {self.__class__.__name__} "
    367                 f"is expecting {self.n_features_in_} features as input.")
    368 

ValueError: X has 2 features, but ColumnTransformer is expecting 3 features as input.

In this case, the transformer only uses columns col1 and col2, but still demands that the input contain col3, which is not used.
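One possible workaround, on a version that shows the error above, is to pad the new DataFrame back to the column layout seen at fit time before calling transform. The sketch below is only an illustration under assumptions: `pad_to_fit_columns` and `fit_columns` are hypothetical names, not part of scikit-learn, and the padded col3 is filled with NaN on the assumption that dropped columns are never inspected.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    "col1": ["a", "b"],
    "col2": ["c", "d"],
    "col3": [1, 3],
})

ct = ColumnTransformer([
    ("ohe_col1", OneHotEncoder(handle_unknown="ignore"), ["col1"]),
    ("ohe_col2", OneHotEncoder(), ["col2"]),
], remainder="drop")
ct.fit(df, df["col3"])

# Remember the column layout used at fit time (hypothetical bookkeeping,
# kept by the caller rather than read from the fitted transformer).
fit_columns = list(df.columns)


def pad_to_fit_columns(X, columns=fit_columns):
    """Hypothetical helper: return X with exactly the fit-time columns.

    Columns missing from X are added and filled with NaN; since they are
    dropped by remainder="drop", their values are assumed not to matter.
    """
    return X.reindex(columns=columns)


X_new = df[["col1", "col2"]]          # col3 is absent, as in the report
out = ct.transform(pad_to_fit_columns(X_new))
print(out.shape)
```

This only works around the shape and column-name check; it does not change what ColumnTransformer itself requires.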

@NicolasHug
Member

This is discussed in #14251; please refer to that issue.
