Closed
Description
When I use ColumnTransformer to preprocess different columns (include numeric, category, text) with pipeline, I cannot get the feature names of the final transformed data, which is hard for debugging.
Here is the code:
titanic_url = ('https://raw.githubusercontent.com/amueller/'
'scipy-2017-sklearn/091d371/notebooks/datasets/titanic3.csv')
data = pd.read_csv(titanic_url)
target = data.pop('survived')
numeric_columns = ['age','sibsp','parch']
category_columns = ['pclass','sex','embarked']
text_columns = ['name','home.dest']
numeric_transformer = Pipeline(steps=[
('impute',SimpleImputer(strategy='median')),
('scaler',StandardScaler()
)
])
category_transformer = Pipeline(steps=[
('impute',SimpleImputer(strategy='constant',fill_value='missing')),
('ohe',OneHotEncoder(handle_unknown='ignore'))
])
text_transformer = Pipeline(steps=[
('cntvec',CountVectorizer())
])
preprocesser = ColumnTransformer(transformers=[
('numeric',numeric_transformer,numeric_columns),
('category',category_transformer,category_columns),
('text',text_transformer,text_columns[0])
])
preprocesser.fit_transform(data)
preprocesser.get_feature_names()
will get error:
AttributeError: Transformer numeric (type Pipeline) does not provide get_feature_names.
- In
ColumnTransformer
,text_transformer
can only process a string (eg 'Sex'), but not a list of string astext_columns
Metadata
Metadata
Assignees
Labels
No labels