FunctionTransformer gets numpy instead of pandas after ColumnTransformer · scikit-learn/scikit-learn · Discussion #22297 · GitHub

FunctionTransformer gets numpy instead of pandas after ColumnTransformer #22297

Answered by glemaitre
tomateit asked this question in Q&A

FunctionTransformer is quite a flexible beast: with validate=False (the default), the input data is not validated and is passed to the function as-is.

Elsewhere in scikit-learn, input validation always happens, and pandas DataFrames are converted into NumPy arrays because scikit-learn is going to perform numerical operations on them. This is why ColumnTransformer outputs a NumPy array or a sparse matrix, and why a FunctionTransformer placed after it receives that array rather than the original DataFrame.
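To make the behaviour described above concrete, here is a minimal sketch, assuming default scikit-learn output settings. The helper `record_type` and the toy DataFrame are made up for illustration and are not part of the original question. It records the type the function receives when called directly and when placed after a ColumnTransformer:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

seen_types = []  # types observed by the wrapped function, in call order


def record_type(X):
    """Hypothetical helper: remember what kind of object we were given."""
    seen_types.append(type(X))
    return X


df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})

# Used on its own with validate=False (the default), the DataFrame is
# handed to the function untouched.
FunctionTransformer(record_type).fit_transform(df)

# Placed after a ColumnTransformer, the function sees whatever the
# ColumnTransformer produced, which by default is a NumPy array here.
ct = ColumnTransformer([("scale", StandardScaler(), ["a", "b"])])
pipe = make_pipeline(ct, FunctionTransformer(record_type))
pipe.fit_transform(df)

print(seen_types)
```

The first recorded type should be the pandas DataFrame and the second a NumPy array, which matches the symptom in the question.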

In the future, we will try to improve this part of the user experience by providing feature names at the different steps of a pipeline and, maybe at some point, by supporting types other than NumPy arrays at the intermediate stages of preprocessing.

tomateit Jan 27, 2022

Answer selected by tomateit