Description
Workflow
In its current state, RFECV only allows a single scoring metric. In my opinion, computing several scores for the models fitted at each feature-elimination step would be extremely valuable.
For example, to study how the precision and recall of a binary classifier evolve as fewer and fewer features are fed to the model, I currently have to run RFECV twice: once with scoring='precision' and again with scoring='recall'. This is inefficient: the costly feature-elimination procedure is repeated even though only the scoring changes.
The cv_results_ attribute of GridSearchCV already reports one set of scores (and one rank) per metric when several metrics are used to evaluate each hyperparameter combination. Replicating this behavior in RFECV would be extremely helpful.
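For context, this is roughly how GridSearchCV already exposes multi-metric results today; the estimator, data, and parameter grid below are placeholders chosen only for illustration.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(random_state=0)

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={'C': [0.1, 1.0, 10.0]},
    scoring=['precision', 'recall'],  # several metrics at once
    refit='precision',                # required when scoring is a list
    cv=5,
)
search.fit(X, y)

# cv_results_ now contains one set of columns per metric, e.g.
# 'mean_test_precision', 'std_test_precision', 'rank_test_precision',
# 'mean_test_recall', 'std_test_recall', 'rank_test_recall'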
Proposed solution
Notation
- K is the number of folds used for cross-validation.
- P is the total number of features available.
- p is the number of features tried at each step, i.e., an integer such that min_features_to_select <= p <= P.
- m is one of the M performance metrics passed by the user (e.g., 'precision').
Solution
The user can pass a list of strings naming M predefined scoring metrics. At each elimination step, the algorithm then stores all M metrics for each of the K models trained on the current subset of p <= P features.
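As a rough sketch of what a single elimination step could compute, the existing cross_validate helper already supports multi-metric scoring; the data, estimator, fold count, and support mask below are illustrative assumptions, not RFECV internals.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(n_features=20, random_state=0)
estimator = LogisticRegression(max_iter=1000)
K = 10

# Hypothetical state at one elimination step: a boolean mask selecting
# the p features still under consideration.
support_mask = np.ones(X.shape[1], dtype=bool)
support_mask[:5] = False  # pretend five features were already eliminated

# One multi-metric evaluation of the K folds on the current p features.
scores = cross_validate(estimator, X[:, support_mask], y, cv=K,
                        scoring=['precision', 'recall'])

# scores['test_precision'] and scores['test_recall'] each hold K values,
# which would become the 'split{k}_test_{m}' entries for this value of p.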
The cv_results_ attribute of the resulting RFECV would then include the following keys for each metric m and fold k:
- 'split{k}_test_{m}'
- 'mean_test_{m}'
- 'std_test_{m}'
- 'rank_test_{m}'
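To make the proposed layout concrete, here is a purely hypothetical sketch of what cv_results_ could contain after fitting with scoring=['precision', 'recall'] and cv=3; the key names mirror GridSearchCV and all values are invented placeholders.

# Hypothetical layout only; one entry per candidate number of features p.
proposed_cv_results = {
    'n_features':            [1, 2, 3],
    'split0_test_precision': [0.71, 0.78, 0.80],
    'split1_test_precision': [0.69, 0.77, 0.81],
    'split2_test_precision': [0.70, 0.79, 0.82],
    'mean_test_precision':   [0.70, 0.78, 0.81],
    'std_test_precision':    [0.01, 0.01, 0.01],
    'rank_test_precision':   [3, 2, 1],
    # ...and an analogous block of keys for 'recall'.
}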
Example
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()  # some classifier instance

rfecv = RFECV(
    estimator=clf,
    step=1,
    min_features_to_select=1,
    cv=10,
    scoring=['precision', 'recall', 'f1', 'roc_auc', 'accuracy'],  # proposed: a list of metrics
)
Considerations
It is likely that rank_test_{m1} will differ from rank_test_{m2} for a given pair of performance metrics m1 and m2. Hence, with multiple metrics RFECV could no longer automatically pick the best number of features, since the rankings may disagree from one metric to another; this part of the workflow would be left to the user.
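Under that assumption, selecting the number of features would be a one-liner on the user side. The snippet below assumes the fitted rfecv from the example above and the proposed keys, so it is a hypothetical sketch rather than working code today.

import numpy as np

# Pick the feature count that maximises mean cross-validated recall
# (hypothetical keys from the proposal; any metric could be used instead).
best_idx = np.argmax(rfecv.cv_results_['mean_test_recall'])
best_n_features = rfecv.cv_results_['n_features'][best_idx]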
Describe alternatives you've considered, if relevant
Running RFECV as many times as there are metrics.
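A sketch of that workaround with today's API (data and estimator are placeholders); note that the expensive feature-elimination path is recomputed once per metric.

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_features=20, random_state=0)

mean_scores = {}
for metric in ['precision', 'recall', 'f1', 'roc_auc', 'accuracy']:
    rfecv = RFECV(
        estimator=LogisticRegression(max_iter=1000),
        step=1,
        min_features_to_select=1,
        cv=10,
        scoring=metric,  # one metric per full RFECV run
    )
    rfecv.fit(X, y)
    mean_scores[metric] = rfecv.cv_results_['mean_test_score']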
Additional context
I asked this question on StackOverflow, and the consensus there was that the most viable approach today is to run one RFECV per performance metric to be evaluated.