False positive warning in `FunctionTransformer` · Issue #26552 · scikit-learn/scikit-learn · GitHub
Closed · glemaitre opened this issue Jun 9, 2023 · 6 comments · Fixed by #26944
Comments

@glemaitre (Member) commented Jun 9, 2023

While looking at #26543, I found out that we raise a UserWarning that looks like a false positive to me:

from sklearn.preprocessing import FunctionTransformer, OneHotEncoder
from sklearn.pipeline import make_pipeline
import pandas as pd
import numpy as np


test_df = pd.DataFrame(
    np.random.randint(low=0, high=9, size=(10, 2)), columns=["None", "valid"]
)


def _drop_None_cols(df):
    col_names = [col for col in df.columns if "None" in col]
    if len(col_names):
        return df.drop(columns=col_names)
    return df


encoder_dropping_None = make_pipeline(
    OneHotEncoder(sparse_output=False),
    FunctionTransformer(_drop_None_cols),
).set_output(transform="pandas")


encoder_dropping_None.fit_transform(test_df)

This raises the warning:

/Users/glemaitre/Documents/packages/scikit-learn/sklearn/preprocessing/_function_transformer.py:345: UserWarning: With transform="pandas", `func` should return a DataFrame to follow the set_output API.
  warnings.warn(

Indeed, my function `func` returns a dataframe, and the output of the transform is indeed a dataframe, so the warning is surprising. Using the global config (i.e. `set_config`) will not raise this warning.

@github-actions github-actions bot added the Needs Triage Issue requires triage label Jun 9, 2023
@glemaitre (Member, Author)

Maybe @thomasjpfan already has an idea of what is wrong here before I investigate the problem.

@glemaitre (Member, Author)

OK, after looking at the code in `set_output`, I can silence the warning by setting the output only on the encoder:

encoder_dropping_None = make_pipeline(
    OneHotEncoder(sparse_output=False).set_output(transform="pandas"),
    FunctionTransformer(_drop_None_cols),
)
encoder_dropping_None.fit_transform(test_df)

Somehow, it is expected that I provide `feature_names_out`, but this is not really possible with what I want to do.

Looking at what is required, it makes some sense. However, I would not be surprised if a user sets the output on the pipeline instead of going through each estimator and deciding whether or not to rewrap the output.

I don't really know what is best here.

@glemaitre glemaitre removed the Needs Triage Issue requires triage label Jun 16, 2023
@ogrisel (Member) commented Jun 16, 2023

In the original code, the warning is raised when we call `.set_output` on the pipeline, not when we call `.fit_transform` on it.

@ogrisel (Member) commented Jun 16, 2023

We might want to delay raising the warning in `FunctionTransformer` to the end of its `.transform`, emitting it only if the concrete type of the return value is not consistent with the currently configured `set_output` policy, instead of raising it at the time `.set_output` is called on it.
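To illustrate the proposed delayed check (a hypothetical sketch, not the actual scikit-learn implementation; `_warn_if_output_mismatch` is an invented name): warn at the end of `transform` only when the concrete return type really mismatches the configured policy.

```python
import warnings

import numpy as np
import pandas as pd


def _warn_if_output_mismatch(output, configured_output):
    # Hypothetical helper: inspect the actual return value of `func`
    # instead of warning at set_output() configuration time.
    if configured_output == "pandas" and not isinstance(output, pd.DataFrame):
        warnings.warn(
            'With transform="pandas", `func` should return a DataFrame '
            "to follow the set_output API.",
            UserWarning,
        )
    return output


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # A DataFrame return matches the policy: no warning.
    _warn_if_output_mismatch(pd.DataFrame({"a": [1]}), "pandas")
    n_df = len(caught)
    # An ndarray return mismatches the policy: one warning.
    _warn_if_output_mismatch(np.zeros((2, 2)), "pandas")
    n_arr = len(caught)

print(n_df, n_arr)  # 0 1
```

This way, the pipeline in the original report (where `func` genuinely returns a DataFrame) would emit nothing.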

@Charlie-XIAO (Contributor)

It seems that `FunctionTransformer` does not have `get_feature_names_out` unless `feature_names_out` is provided, so `set_output` is not called and `_sklearn_output_config` is not set. It is then not feasible to use `_get_output_config` to get the output policy unless the global config is set. Am I misunderstanding you, @ogrisel?

@thomasjpfan (Member)

I agree the warning is a false positive. I opened #26944 to fix it.
