Another input needed for the parameter n_features_to_select in SequentialFeatureSelector #21291
Comments
@thomasjpfan this would be a little different if the scores-vs-number-of-features curve isn't convex: this proposal would order all of the features and then choose the best number, which might not be at the first turnaround. That said, it's not clear how common it would be for later additions/deletions to be sufficiently better to be worth the extra processing time. With the work in #20145, a user could get this by setting
@bmreiniger As you said, this checks the estimated test score (e.g. adjusted R squared, AIC, BIC, etc.) for every number of features and only then decides which number to use, so it is not a duplicate of #20137. #20137 describes something like "continuing to select features as long as AIC gets better". The methodology here, by contrast, is based on the assumption that you cannot rely on a single estimated test score alone to decide how many features should be selected. So as long as the training SSE decreases, feature selection keeps going until all features have been added, and only then is the number of features decided, by looking at several estimated test scores. R supports this functionality, so it would be great to see it in scikit-learn as well :)
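Until an option like this exists, the workflow can be emulated by hand. The sketch below (an illustration, not a proposed scikit-learn API) fits a SequentialFeatureSelector once per candidate subset size and compares the selected subsets with an estimated test score, here cross-validated R squared:

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)
est = LinearRegression()

# Fit a selector for every candidate subset size and score each
# selected subset with cross-validation.
cv_scores = {}
for n in range(1, X.shape[1]):
    sfs = SequentialFeatureSelector(est, n_features_to_select=n)
    X_sel = sfs.fit_transform(X, y)
    cv_scores[n] = cross_val_score(est, X_sel, y, cv=5).mean()

# Keep the size with the best cross-validated R^2.
best_n = max(cv_scores, key=cv_scores.get)
print("best number of features:", best_n)
```

Note this refits the whole selection path for every size, which is wasteful since forward selection is nested; a built-in option could reuse a single pass over the path.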
Describe the workflow you want to enable
Currently, to use the SequentialFeatureSelector, you need the parameter n_features_to_select. However, according to the book 'Introduction to Statistical Learning', you can only know how many variables are appropriate after fitting the best model for each number of predictors and comparing them. An excerpt from ISLR shows that the appropriate number of features can only be determined after testing all numbers of predictors; it cannot be figured out beforehand.
Describe your proposed solution
I suggest creating options like "the highest adjusted R squared" for the parameter n_features_to_select. By doing so, you can choose the number of features that yields the highest adjusted R squared, which cannot be known beforehand.

Describe alternatives you've considered, if relevant
No response
Additional context
No response
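The selection criterion the proposal describes can be sketched as follows. `adjusted_r2` is a hypothetical helper (scikit-learn has no such function); it applies the standard formula adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_r2(model, X, y):
    # Hypothetical helper, not part of scikit-learn:
    # adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    n, p = X.shape
    r2 = model.score(X, y)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# Only the first two columns actually matter.
y = 2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)

model = LinearRegression().fit(X, y)
print("adjusted R^2:", round(adjusted_r2(model, X, y), 3))
```

Under the proposal, this score would be computed for each candidate subset along the selection path, and the subset with the highest value would be returned.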