Percentile-based SelectFromModel · Issue #9613 · scikit-learn/scikit-learn · GitHub

Percentile-based SelectFromModel #9613
Closed
@nsheth12

Description

I am trying to use SelectFromModel to select the most important X% of features based on feature importance. However, SelectFromModel only allows the feature selection threshold to be set to the mean or median of the feature importances, to some multiple of the mean or median, or to an explicit value. I think it would be helpful for the API to also accept a percentile threshold, so that a given percentage of features is selected. This functionality is not currently present in scikit-learn.
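For reference, here is a minimal sketch of the threshold forms that are available today. The synthetic dataset, the forest settings, and the `fit_selector` helper are assumptions made only for this illustration; they are not part of scikit-learn or of my actual use case.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

# Synthetic data, assumed purely for illustration.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5, random_state=0)


def fit_selector(threshold):
    """Hypothetical helper: fit a SelectFromModel with the given threshold."""
    estimator = RandomForestRegressor(n_estimators=50, random_state=0)
    return SelectFromModel(estimator, threshold=threshold).fit(X, y)


by_median = fit_selector("median")          # median of the importances
by_scaled_mean = fit_selector("1.25*mean")  # a multiple of the mean
by_value = fit_selector(0.05)               # an explicit numeric cutoff

for name, selector in [("median", by_median),
                       ("1.25*mean", by_scaled_mean),
                       ("0.05", by_value)]:
    # threshold_ is the numeric cutoff that was actually applied.
    print(name, selector.threshold_, int(selector.get_support().sum()))
```

None of these forms lets me say "keep the top 10% of features" directly; the number of features kept depends on how the importances happen to be distributed.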

Steps/Code to Reproduce

Example:

from sklearn.feature_selection import SelectFromModel
help(SelectFromModel)

Desired API

I propose changing the SelectFromModel API so that the threshold parameter accepts percentile inputs in the format "X-percentile", which would select the top X% of features by importance.

from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestRegressor
# Proposed syntax, following the format of the mean/median scaling in the docs, e.g. "1.25*mean"
select = SelectFromModel(RandomForestRegressor(), threshold="10-percentile")
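Until something like this exists, the behaviour can be approximated by computing the cutoff by hand and passing it as a numeric threshold. This is only a sketch of a possible workaround, not a proposed implementation: `top_percent_threshold` is a hypothetical helper, the data is synthetic, and it assumes there are no ties at the cutoff importance.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel


def top_percent_threshold(importances, percent):
    """Hypothetical helper: importance cutoff that keeps the top `percent`% of features."""
    importances = np.asarray(importances, dtype=float)
    n_keep = max(1, int(np.ceil(importances.size * percent / 100.0)))
    # The n_keep-th largest importance. SelectFromModel keeps features whose
    # importance is greater than or equal to the threshold.
    return np.sort(importances)[-n_keep]


# Synthetic data, assumed purely for illustration.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5, random_state=0)
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

threshold = top_percent_threshold(forest.feature_importances_, percent=10)
# prefit=True reuses the already fitted forest instead of refitting it.
select = SelectFromModel(forest, threshold=threshold, prefit=True)
X_top = select.transform(X)
print(X_top.shape)
```

The drawback is that the estimator has to be fitted before the selector is built, which makes this awkward to use inside a Pipeline or a grid search. A string threshold such as "10-percentile" would avoid that.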

Versions

Darwin-16.6.0-x86_64-i386-64bit
Python 3.6.1 |Anaconda 4.4.0 (x86_64)| (default, May 11 2017, 13:04:09) [GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]
NumPy 1.12.1
SciPy 0.19.0
Scikit-Learn 0.18.1
