Another input needed for the parameter `n_features_to_select` in SequentialFeatureSelector · Issue #21291 · scikit-learn/scikit-learn

Open
hellojinwoo opened this issue Oct 9, 2021 · 4 comments

Comments

@hellojinwoo

Describe the workflow you want to enable

Currently, to use SequentialFeatureSelector, you must supply the parameter n_features_to_select up front. However, according to the book 'Introduction to Statistical Learning', you can only know how many features are appropriate after fitting the best model of every size and comparing them.

[Figure: excerpt from ISLR showing model selection across all subset sizes]

This excerpt from ISLR shows that the appropriate number of features can only be determined after testing every number of predictors; it cannot be fixed beforehand.

Describe your proposed solution

I suggest adding options such as "best adjusted R squared" for the parameter n_features_to_select. With such an option, the selector would choose the number of features with the highest adjusted R squared, which cannot be known in advance.
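As a rough sketch of the proposed behavior using the existing API (dataset and estimator here are illustrative), one can fit a SequentialFeatureSelector for every candidate size and keep the size with the best cross-validated score:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Toy data: 8 features, only 3 of which carry signal.
X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       random_state=0)

best_k, best_score = None, -np.inf
# n_features_to_select as an int must be strictly less than n_features.
for k in range(1, X.shape[1]):
    sfs = SequentialFeatureSelector(
        LinearRegression(), n_features_to_select=k, cv=5
    ).fit(X, y)
    score = cross_val_score(LinearRegression(), sfs.transform(X), y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print(best_k, best_score)
```

The proposal would fold this outer loop into the selector itself, so the user does not have to refit the selection path once per candidate size.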

Describe alternatives you've considered, if relevant

No response

Additional context

No response

@bmreiniger
Contributor

@thomasjpfan
Member

Closing because this is a duplicate of #20137. Note that there is ongoing work at #20145 that will resolve the issue.

@bmreiniger
Contributor
bmreiniger commented Oct 10, 2021

@thomasjpfan this would be a little different if the scores-vs-number-of-features graph isn't convex: this proposal would order all of the features and then choose the best number, which might not be at the first turnaround. That said, it's not clear how common it would be that later additions/deletions would be sufficiently better to be worth the extra processing time.

With the work in #20145, a user could get this by setting tol=-np.inf and then manually setting the number of features (would changing the n_features_to_select_ learned attribute have the desired effect when transforming?). I'd imagine, if it were deemed worth it, we could add an option n_features_to_select="best" without too much trouble after #20145.

@thomasjpfan reopened this Oct 10, 2021
@hellojinwoo
Author
hellojinwoo commented Nov 2, 2021

@bmreiniger As you said, checking an estimated test score (e.g. adjusted R squared, AIC, BIC, etc.) for every number of features and only then deciding which number to use is not a duplicate of #20137. #20137 proposes something like "continue selecting features as long as AIC improves". The methodology here, by contrast, assumes you cannot rely on a single early-stopping criterion: feature selection keeps going (as long as training SSE decreases) until all features have been added, and the number of features is chosen afterwards by comparing several estimated test scores across all sizes.
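For reference, adjusted R squared is one such size-penalized score: it only rewards a larger model if the raw R squared improves enough to justify the extra predictors. A small illustrative helper (not part of scikit-learn):

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    Penalizes the plain R^2 for the number of predictors p used,
    so adding weak features can lower the score.
    """
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# A 20-feature model with only slightly better raw R^2 loses to a
# 5-feature model once the penalty is applied:
small = adjusted_r2(0.90, n_samples=100, n_features=5)
large = adjusted_r2(0.91, n_samples=100, n_features=20)
print(small, large)
```

AIC and BIC apply the same idea with different penalty terms, which is why comparing all sizes under several such scores, as described above, can give different answers than a single greedy stopping rule.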

R supports this kind of exhaustive subset comparison, so it would be great to see it in scikit-learn as well :)
