False positive warning in `FunctionTransformer` · Issue #26552 · scikit-learn/scikit-learn · GitHub
Closed · glemaitre opened this issue Jun 9, 2023 · 6 comments · Fixed by #26944
Comments

@glemaitre (Member) commented Jun 9, 2023

While looking at #26543, I found out that we raise a UserWarning that looks like a false positive to me:

from sklearn.preprocessing import FunctionTransformer, OneHotEncoder
from sklearn.pipeline import make_pipeline
import pandas as pd
import numpy as np


test_df = pd.DataFrame(
    np.random.randint(low=0, high=9, size=(10, 2)), columns=["None", "valid"]
)


def _drop_None_cols(df):
    col_names = [col for col in df.columns if "None" in col]
    if len(col_names):
        return df.drop(columns=col_names)
    return df


encoder_dropping_None = make_pipeline(
    OneHotEncoder(sparse_output=False),
    FunctionTransformer(_drop_None_cols),
).set_output(transform="pandas")


encoder_dropping_None.fit_transform(test_df)

This raises the warning:

/Users/glemaitre/Documents/packages/scikit-learn/sklearn/preprocessing/_function_transformer.py:345: UserWarning: With transform="pandas", `func` should return a DataFrame to follow the set_output API.
  warnings.warn(

Indeed, my function `func` returns a dataframe, and the output of the transform is indeed a dataframe, so the warning is surprising. Using the global config (i.e. `set_config`) will not raise this warning.

@github-actions github-actions bot added the Needs Triage Issue requires triage label Jun 9, 2023
@glemaitre (Member, Author)

Maybe @thomasjpfan already has an idea of what is wrong here before I investigate the problem.

@glemaitre (Member, Author)

OK, after looking at the code in `set_output`, I can silence the warning by setting the output only on the encoder:

encoder_dropping_None = make_pipeline(
    OneHotEncoder(sparse_output=False).set_output(transform="pandas"),
    FunctionTransformer(_drop_None_cols),
)
encoder_dropping_None.fit_transform(test_df)

Somehow, it is expected that I provide `feature_names_out`, but this is not really possible with what I want to do.

Looking at what is required, it makes some sense. However, I would not be surprised if a user sets the output on the pipeline instead of going through each estimator and deciding whether or not to rewrap the output.

I don't really know what is best here.

@glemaitre glemaitre removed the Needs Triage Issue requires triage label Jun 16, 2023
@ogrisel (Member) commented Jun 16, 2023

In the original code, the warning is raised when we call `.set_output` on the pipeline, not when we call `.fit_transform` on it.

@ogrisel (Member) commented Jun 16, 2023

We might want to delay raising the warning in `FunctionTransformer` to the end of its `.transform`, emitting it only if the concrete type of the return value is not consistent with the currently configured `set_output` policy, instead of raising it at the time `.set_output` is called on it.
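To illustrate the proposed delayed check (a hypothetical sketch, not the actual scikit-learn implementation; `_warn_if_output_mismatch` is an invented name): warn at the end of `transform` only when the concrete return type really mismatches the configured policy.

```python
import warnings

import numpy as np
import pandas as pd


def _warn_if_output_mismatch(output, configured_output):
    # Hypothetical helper: inspect the actual return value of `func`
    # instead of warning at set_output() configuration time.
    if configured_output == "pandas" and not isinstance(output, pd.DataFrame):
        warnings.warn(
            'With transform="pandas", `func` should return a DataFrame '
            "to follow the set_output API.",
            UserWarning,
        )
    return output


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # A DataFrame return matches the policy: no warning.
    _warn_if_output_mismatch(pd.DataFrame({"a": [1]}), "pandas")
    n_df = len(caught)
    # An ndarray return mismatches the policy: one warning.
    _warn_if_output_mismatch(np.zeros((2, 2)), "pandas")
    n_arr = len(caught)

print(n_df, n_arr)  # 0 1
```

This way, the pipeline in the original report (where `func` genuinely returns a DataFrame) would emit nothing.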

@Charlie-XIAO (Contributor)

It seems that `FunctionTransformer` does not have `get_feature_names_out` unless `feature_names_out` is provided, so `set_output` is not called and `_sklearn_output_config` is not set. It is then not feasible to use `_get_output_config` to get the output policy unless the global config is set. Am I misunderstanding you, @ogrisel?

@thomasjpfan (Member)

I agree the warning is a false positive. I opened #26944 to fix it.
