In a `Pipeline`, if I use `FunctionTransformer` after `ColumnTransformer`, the `FunctionTransformer` receives a NumPy array instead of the original DataFrame. How do I implement per-column transformations that keep my data as a DataFrame? Minimal example:
If you run it this way, the DataFrame is printed. If you uncomment `ColumnTransformer`, a NumPy array is printed instead. I do not suspect this is a bug. It seems I just cannot guess the right return format for `ColumnTransformer`.
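The behaviour described can be reproduced with a sketch along these lines. The toy DataFrame, the column names, and the `record_type` helper are assumptions made for illustration, not the original poster's code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Toy data; the column names are made up for illustration.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

seen_types = []  # records the type of input each function call receives


def record_type(X):
    # Hypothetical helper: remember the input type and return the data unchanged.
    seen_types.append(type(X))
    return X


# FunctionTransformer on its own: the DataFrame reaches the function untouched.
only_func = Pipeline([("record", FunctionTransformer(record_type))])
only_func.fit_transform(df)

# ColumnTransformer first: the function now receives ColumnTransformer's output.
with_ct = Pipeline(
    [
        (
            "columns",
            ColumnTransformer(
                [("scale", StandardScaler(), ["a"]), ("keep", "passthrough", ["b"])]
            ),
        ),
        ("record", FunctionTransformer(record_type)),
    ]
)
with_ct.fit_transform(df)

print(seen_types)
```

With default settings, the first recorded type should be a pandas DataFrame and the last a NumPy array, which matches the two outcomes described in the question.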
In case anyone stumbles upon this in the future, this can be fixed using
`FunctionTransformer` is quite a flexible beast: with `validate=False` (the default), the input data is not validated and is therefore passed to the function directly.

Elsewhere in scikit-learn, input validation always happens, and pandas DataFrames are always converted to NumPy arrays because scikit-learn is going to perform numerical operations on them. This is why `ColumnTransformer` outputs a NumPy array or a sparse matrix.

In the future, we will try to improve this part of the user experience by providing `feature_names` at different steps of a pipeline and maybe, at some point, by providing types other than NumPy arrays at the intermediate stages of preprocessing.