8000 Cannot get feature names after ColumnTransformer · Issue #12525 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content
Cannot get feature names after ColumnTransformer #12525
Closed
@pjgao

Description

@pjgao

When I use ColumnTransformer to preprocess different columns (include numeric, category, text) with pipeline, I cannot get the feature names of the final transformed data, which is hard for debugging.

Here is the code:

titanic_url = ('https://raw.githubusercontent.com/amueller/'
               'scipy-2017-sklearn/091d371/notebooks/datasets/titanic3.csv')

data = pd.read_csv(titanic_url)

target = data.pop('survived')

numeric_columns = ['age','sibsp','parch']
category_columns = ['pclass','sex','embarked']
text_columns = ['name','home.dest']

numeric_transformer = Pipeline(steps=[
    ('impute',SimpleImputer(strategy='median')),
    ('scaler',StandardScaler()
    )
])
category_transformer = Pipeline(steps=[
    ('impute',SimpleImputer(strategy='constant',fill_value='missing')),
    ('ohe',OneHotEncoder(handle_unknown='ignore'))
])
text_transformer = Pipeline(steps=[
    ('cntvec',CountVectorizer())
])

preprocesser = ColumnTransformer(transformers=[
    ('numeric',numeric_transformer,numeric_columns),
    ('category',category_transformer,category_columns),
    ('text',text_transformer,text_columns[0])
])

preprocesser.fit_transform(data)
  1. preprocesser.get_feature_names() will get error:
    AttributeError: Transformer numeric (type Pipeline) does not provide get_feature_names.
  2. In ColumnTransformertext_transformer can only process a string (eg 'Sex'), but not a list of string as text_columns

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      0