[RFC] Should scalers or other estimators warn when fit on constant features? #19547

Open · ogrisel opened this issue Feb 24, 2021 · 4 comments

ogrisel (Member) commented Feb 24, 2021

As discussed in #19527, fitting models on data with constant features can be surprising.

For instance, a StandardScaler(with_mean=False) fit on a column whose values are all equal to 1000. will let those values pass through unchanged, because the variance of the column is zero. This can be surprising, but is it a problem? Should we warn the user about the presence of such constant features, which are typically not predictive for machine learning models?
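
A minimal reproduction of this behavior (a sketch assuming the current handling of zero variances, which are replaced by a scale of 1 to avoid dividing by zero):

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.full(shape=(5, 1), fill_value=1000.0)  # a single constant feature
scaler = StandardScaler(with_mean=False).fit(X)
print(scaler.var_)    # [0.]: the column has zero variance
print(scaler.scale_)  # [1.]: zero variances are mapped to a scale of 1
print(scaler.transform(X).ravel())  # [1000. 1000. 1000. 1000. 1000.]: unchanged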

Which estimators should warn about such constant features? The scalers can naturally detect them when computing the scale_ attribute. The QuantileTransformer could probably also warn about this degenerate case.
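
To illustrate, a hypothetical sketch of the kind of check a scaler could run at the end of fit (warn_on_constant_features is an illustrative helper, not scikit-learn API):

import warnings
import numpy as np

def warn_on_constant_features(var_):
    """Illustrative helper: warn about zero-variance (constant) features."""
    constant_idx = np.flatnonzero(var_ == 0)
    if constant_idx.size > 0:
        warnings.warn(
            f"Features {constant_idx.tolist()} are constant and are typically "
            "not predictive; consider removing them.",
            UserWarning,
        )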

HistGradientBoosting* and KBinsDiscretizer can also do it efficiently when binning the feature values.

If we do so:

  • what should be the warning message? Should it be the same for all the models?
  • shall we add a standard constructor param to these estimators, constant_feature={'warn', 'drop', 'passthrough', 'zero', 'one'}, with "warn" as the default? (see the sketch after this list)
  • should we generalize this to all estimators? (ogrisel: probably not, because it would be an expensive and redundant input validation check, so we could restrict it to the estimators above, where the check is cheap)
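
To make the constant_feature proposal concrete, here is a hypothetical sketch of what the different options could mean for a single constant column (none of this is existing scikit-learn API):

import warnings
import numpy as np

def handle_constant_column(column, constant_feature="warn"):
    """Illustrative semantics for the proposed constant_feature parameter."""
    if constant_feature == "warn":
        warnings.warn("Constant feature detected.", UserWarning)
        return column                  # warn, then pass through unchanged
    if constant_feature == "passthrough":
        return column                  # leave the values untouched, silently
    if constant_feature == "zero":
        return np.zeros_like(column)   # map the constant column to all zeros
    if constant_feature == "one":
        return np.ones_like(column)    # map the constant column to all ones
    if constant_feature == "drop":
        return None                    # signal that the column should be removed
    raise ValueError(f"Unknown option: {constant_feature!r}")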

Are there legitimate cases where such a warning would be frequent and annoying? For instance, StandardScaler(with_mean=False) after one-hot encoding with dense output, in a cross-validation loop, on a categorical feature with one category significantly more frequent than the others? A similar problem could happen after ordinal encoding. But would StandardScaler(with_mean=False) actually make sense to use in those cases?
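
For illustration, here is one way such constant one-hot columns can arise in a cross-validation loop, sketched with toy folds (the explicit categories variant simulates a category that is known globally but absent from the fold):

import numpy as np
from sklearn.preprocessing import OneHotEncoder

# A training fold in which the feature takes only its dominant value "a":
X_fold = np.array([["a"], ["a"], ["a"], ["a"]])
print(OneHotEncoder(sparse_output=False).fit_transform(X_fold).ravel())
# [1. 1. 1. 1.]: a single, constant all-ones column

# The same effect with explicit categories: "c" is known globally but absent
# from this fold, so its one-hot column is constant (all zeros):
enc = OneHotEncoder(categories=[["a", "b", "c"]], sparse_output=False)
print(enc.fit_transform(np.array([["a"], ["b"], ["a"]]))[:, 2])  # [0. 0. 0.]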

List of estimators to consider:

  • scalers (such as StandardScaler, RobustScaler, MinMaxScaler, ...),
  • estimators that do feature binning: HistGradientBoosting* and KBinsDiscretizer,
  • feature selectors such as SelectKBest.
@ogrisel ogrisel changed the title [RFC] Should scalers or other estimators warn when fit on constant features [RFC] Should scalers or other estimators warn when fit on constant features? Feb 24, 2021
azihna (Contributor) commented Feb 26, 2021

My first reaction is to say "yes, it is a problem and the library should warn about this!", but thinking about it a bit more, I am more inclined to say it is the user's responsibility to know the data and remove these features beforehand. I think this warning would fire very frequently in practice; it has a similar feel to the chained assignment warning from pandas.

However, if it were to be added, then rather than adding a parameter to each estimator, a general option (again, thinking of the chained assignment warning) might be better. There can be cases where the data you are training on now is constant but you know that later data will change that, and you might have ColumnTransformers with different scalers on these columns that would each trigger the warning and have to be silenced individually. I can already see myself looking desperately for the one scaler whose warning I forgot to turn off, while all the useful information gets lost among the warnings in a cross-validation loop.

Micky774 (Contributor) commented

I just discovered this while responding to #26357, and I think it would be a helpful addition to StandardScaler. It is a common enough pitfall that scikit-learn would benefit its users by including an easily-suppressed warning. I think the added keyword approach would be a good way to control the prevalence of the warning.

I think StandardScaler is most likely the least controversial estimator for something like this, and a good place to start.

betatim (Member) commented May 16, 2023

Sounds like a good proposal @Micky774. I think I'd prefer not adding it to too many estimators, because that would lead to a flood of warnings, which in turn leads to no one looking at them anymore. This means adding it sparingly, to estimators where we repeatedly see people step into this trap, is a good way of selecting where to add it.

ogrisel (Member, Author) commented Jan 9, 2025

Another related data point: SelectKBest currently warns about constant features indirectly: the f_oneway function it calls issues a UserWarning with the integer indices (but not the feature names...) of all the constant features. Moreover, it issues a RuntimeWarning about "invalid value encountered in divide" in the presence of constant features.

This can be very verbose, in particular when feature selection is performed in a pipeline that models interactions between categorical variables:

from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest
from sklearn.preprocessing import PolynomialFeatures, OneHotEncoder
from sklearn.linear_model import RidgeCV

pipeline = make_pipeline(
    OneHotEncoder(sparse_output=True),
    PolynomialFeatures(interaction_only=True, include_bias=False),
    SelectKBest(k=500),
    RidgeCV(),
)

The output of PolynomialFeatures can be very high dimensional (in particular if the input has two or more high-cardinality categorical features), but it should be cheap to filter out the constant and otherwise non-informative generated features using SelectKBest.

However, this is very verbose, especially if you cross-validate or hyperparameter-tune such a pipeline.

EDIT: the current behavior of SelectKBest seems to be to warn without dropping the constant features ahead of time:

>>> import numpy as np
>>> from sklearn.feature_selection import SelectKBest
>>> X = np.ones(shape=(100, 10))
>>> y = np.random.choice(range(5), size=X.shape[0])
>>> feature_selector = SelectKBest(k=3).fit(X, y)
/Users/ogrisel/code/scikit-learn/sklearn/feature_selection/_univariate_selection.py:111: UserWarning: Features [0 1 2 3 4 5 6 7 8 9] are constant.
  warnings.warn("Features %s are constant." % constant_features_idx, UserWarning)
/Users/ogrisel/code/scikit-learn/sklearn/feature_selection/_univariate_selection.py:112: RuntimeWarning: invalid value encountered in divide
  f = msb / msw
>>> feature_selector.scores_
array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])
>>> feature_selector.pvalues_
array([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])
>>> feature_selector.get_feature_names_out()
array(['x7', 'x8', 'x9'], dtype=object)

The last k (constant) features are arbitrarily selected.

Dropping would also be an option, but it could be dangerous in the sense that it could produce an empty output feature set when all input features are constant. So maybe the current behavior is the less surprising default.
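
For reference, an existing way to get the dropping behavior explicitly today is to chain a VarianceThreshold step before SelectKBest (a sketch that assumes at least k non-constant features remain after thresholding):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import VarianceThreshold, SelectKBest

X = np.hstack([np.ones((100, 5)), np.random.rand(100, 5)])  # 5 constant + 5 varying features
y = np.random.choice(range(5), size=X.shape[0])

# VarianceThreshold (default threshold of 0.0) removes the constant columns
# before SelectKBest scores the remainder, avoiding the warnings above.
selector = make_pipeline(VarianceThreshold(), SelectKBest(k=3)).fit(X, y)
print(selector.get_feature_names_out())  # a subset of ['x5' ... 'x9']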
