Extend SequentialFeatureSelector example to demonstrate how to use negative tol · Issue #25525 · scikit-learn/scikit-learn

@jnsofini

Describe the bug

I use SequentialFeatureSelector for feature selection with direction="backward" and a negative tol. With a negative tolerance, features keep being removed as long as each removal lowers the metric (AUC in this case) by less than the absolute value of tol; selection stops once the drop exceeds it. Generally, adding features results in a higher AUC, but dropping some of them, especially correlated ones that contribute little, can produce a more parsimonious model with only a slightly lower AUC. The code worked as expected in sklearn 1.1.1, but after updating to sklearn 1.2.1 I get the error shown under Actual Results.
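
To make the intended stopping rule concrete, here is a minimal pure-Python sketch of it. It is a simplification of the behaviour described above, not scikit-learn's actual implementation; the function name and the AUC values are hypothetical and only there for illustration.

```python
def count_accepted_removals(baseline_score, scores_after_removal, tol):
    """Count how many backward-elimination steps pass the tolerance check.

    A removal is accepted while (new_score - old_score) >= tol, so a
    negative tol tolerates a small drop in score per removed feature.
    """
    accepted = 0
    old_score = baseline_score
    for new_score in scores_after_removal:
        if new_score - old_score < tol:
            break  # the score changed by less than tol: stop removing features
        accepted += 1
        old_score = new_score
    return accepted


# Hypothetical AUC values, for illustration only.
baseline = 0.9950
after_each_removal = [0.9948, 0.9945, 0.9940, 0.9900]

# The first three removals each cost less than 0.001 AUC; the fourth costs 0.004.
print(count_accepted_removals(baseline, after_each_removal, tol=-0.001))
```

With a positive tol the same loop would stop at the first removal, because every removal in this made-up sequence lowers the score. That is why backward selection in this use case needs a negative value.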

Steps/Code to Reproduce

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

X, y = load_breast_cancer(return_X_y=True)

TOL = -0.001  # negative tol: tolerate a small drop in AUC per removed feature
feature_selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select="auto",
    direction="backward",
    scoring="roc_auc",
    tol=TOL,
)


pipe = Pipeline(
    [("scaler", StandardScaler()),
     ("feature_selector", feature_selector),
     ("log_reg", LogisticRegression(max_iter=1000))]
)



if __name__ == "__main__":
    pipe.fit(X, y)
    print(pipe['log_reg'].coef_[0])

Expected Results

$ python sfs_tol.py 
[-2.0429818   0.5364346  -1.35765488 -2.85009904 -2.84603016]

Actual Results

$ python sfs_tol.py 
Traceback (most recent call last):
  File "/home/modelling/users-workspace/nsofinij/lab/open-source/sfs_tol.py", line 28, in <module>
    pipe.fit(X, y)
  File "/home/modelling/opt/anaconda3/envs/py310/lib/python3.10/site-packages/sklearn/pipeline.py", line 401, in fit
    Xt = self._fit(X, y, **fit_params_steps)
  File "/home/modelling/opt/anaconda3/envs/py310/lib/python3.10/site-packages/sklearn/pipeline.py", line 359, in _fit
    X, fitted_transformer = fit_transform_one_cached(
  File "/home/modelling/opt/anaconda3/envs/py310/lib/python3.10/site-packages/joblib/memory.py", line 349, in __call__
    return self.func(*args, **kwargs)
  File "/home/modelling/opt/anaconda3/envs/py310/lib/python3.10/site-packages/sklearn/pipeline.py", line 893, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/home/modelling/opt/anaconda3/envs/py310/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/home/modelling/opt/anaconda3/envs/py310/lib/python3.10/site-packages/sklearn/base.py", line 862, in fit_transform
    return self.fit(X, y, **fit_params).transform(X)
  File "/home/modelling/opt/anaconda3/envs/py310/lib/python3.10/site-packages/sklearn/feature_selection/_sequential.py", line 201, in fit
    self._validate_params()
  File "/home/modelling/opt/anaconda3/envs/py310/lib/python3.10/site-packages/sklearn/base.py", line 581, in _validate_params
    validate_parameter_constraints(
  File "/home/modelling/opt/anaconda3/envs/py310/lib/python3.10/site-packages/sklearn/utils/_param_validation.py", line 97, in validate_parameter_constraints
    raise InvalidParameterError(
sklearn.utils._param_validation.InvalidParameterError: The 'tol' parameter of SequentialFeatureSelector must be None or a float in the range (0, inf). Got -0.001 instead.
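
The last line of the traceback states the constraint that rejects the value: tol must be None or a float in the open interval (0, inf). The sketch below only mirrors that stated constraint in plain Python to show which values pass; the helper name is hypothetical and this is not scikit-learn's validation code.

```python
import math


def tol_satisfies_reported_constraint(tol):
    """Return True if tol is None or a float strictly inside (0, inf)."""
    if tol is None:
        return True
    # Open interval: 0 itself and infinity are both excluded.
    return isinstance(tol, float) and 0 < tol < math.inf


print(tol_satisfies_reported_constraint(-0.001))  # False: the value from this report
print(tol_satisfies_reported_constraint(0.001))   # True
print(tol_satisfies_reported_constraint(None))    # True
```

Under this constraint no negative tolerance can reach the selection loop, so the backward-selection use case from 1.1.1 is blocked before any fitting happens.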

Versions

System:
    python: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:26:04) [GCC 10.4.0]
executable: /home/modelling/opt/anaconda3/envs/py310/bin/python
   machine: Linux-4.14.301-224.520.amzn2.x86_64-x86_64-with-glibc2.26

Python dependencies:
      sklearn: 1.2.1
          pip: 23.0
   setuptools: 66.1.1
        numpy: 1.24.1
        scipy: 1.10.0
       Cython: None
       pandas: 1.5.3
   matplotlib: 3.6.3
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
         prefix: libgomp
       filepath: /home/modelling/opt/anaconda3/envs/py310/lib/python3.10/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None
    num_threads: 64

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/modelling/opt/anaconda3/envs/py310/lib/python3.10/site-packages/numpy.libs/libopenblas64_p-r0-15028c96.3.21.so
        version: 0.3.21
threading_layer: pthreads
   architecture: SkylakeX
    num_threads: 64

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/modelling/opt/anaconda3/envs/py310/lib/python3.10/site-packages/scipy.libs/libopenblasp-r0-41284840.3.18.so
        version: 0.3.18
threading_layer: pthreads
   architecture: SkylakeX
    num_threads: 64
