Using Several Parameters with GridSearchCV · Issue #8243 · scikit-learn/scikit-learn · GitHub
Using Several Parameters with GridSearchCV #8243

Open
retkowski opened this issue Jan 29, 2017 · 4 comments

@retkowski

When using (for example) the following transformers

  • CountVectorizer
  • TruncatedSVD
  • SelectKBest

with GridSearchCV, the search can pick a combination in which n_features, the number of features produced by CountVectorizer, is less than n_components for TruncatedSVD or k for SelectKBest.

This leads to an error: ValueError: n_components must be < n_features

For SelectKBest I found a temporary solution:

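from sklearn.feature_selection import SelectKBest

class SelectAtMostKBest(SelectKBest):
    # If k is outside 0..n_features, fall back to keeping all features.
    def _check_params(self, X, y):
        if not (self.k == "all" or 0 <= self.k <= X.shape[1]):
            self.k = "all"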

But there is no equivalent for TruncatedSVD.
Is this behaviour intended? If yes, what can I do about this?

jnothman added the API label Jan 30, 2017
@jnothman
Member

Yes, this is currently a limitation of our API. One option is to pass a list of parameter dicts as the parameter grid, where each dict contains only valid combinations. A more robust, forward-thinking solution might allow n_components and k to optionally be functions of X and y.
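
The list-of-dicts option could look roughly like the sketch below. The pipeline steps, step names (`vect`, `svd`, `clf`) and the concrete values are assumptions chosen for illustration, and `docs` / `labels` are placeholders for your own data.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.pipeline import Pipeline

# Illustrative pipeline; the step names are arbitrary.
pipe = Pipeline([
    ("vect", CountVectorizer()),
    ("svd", TruncatedSVD(random_state=0)),
    ("clf", LogisticRegression()),
])

# Each dict is expanded on its own, so n_components is only ever
# paired with a max_features value that is larger than it.
param_grid = [
    {"vect__max_features": [100], "svd__n_components": [10, 50]},
    {"vect__max_features": [1000], "svd__n_components": [10, 50, 500]},
]

# ParameterGrid shows exactly which combinations will be tried.
combos = list(ParameterGrid(param_grid))

search = GridSearchCV(pipe, param_grid, cv=3)
# search.fit(docs, labels)  # docs and labels are placeholders
```

Note that `max_features` is only an upper bound: if the corpus has a smaller vocabulary than that, the actual number of features can still fall below `n_components`.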

@retkowski
Author

@jnothman Thank you. Could you please elaborate a bit more on the function-based approach? I am also considering using RandomizedSearchCV, but I will probably face the same problem there, right?
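
One way to picture the function-based idea is a thin wrapper whose `n_components` may be either an int or a callable evaluated on X at fit time. This is only a sketch: `FlexibleSVD` and `at_most_10` are hypothetical names, and nothing like this is part of the scikit-learn API.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import TruncatedSVD


class FlexibleSVD(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper: n_components is an int or a callable of X."""

    def __init__(self, n_components=2, random_state=None):
        self.n_components = n_components
        self.random_state = random_state

    def fit(self, X, y=None):
        # Resolve the callable at fit time without overwriting the
        # hyperparameter itself, so cloning in a search stays safe.
        if callable(self.n_components):
            n = self.n_components(X)
        else:
            n = self.n_components
        self.svd_ = TruncatedSVD(n_components=n, random_state=self.random_state)
        self.svd_.fit(X)
        return self

    def transform(self, X):
        return self.svd_.transform(X)


def at_most_10(X):
    # Stay strictly below n_features, as the error message requires.
    return min(10, X.shape[1] - 1)


X_demo = np.random.RandomState(0).rand(20, 5)
reduced = FlexibleSVD(n_components=at_most_10, random_state=0).fit_transform(X_demo)
```

A named function is used instead of a lambda so that the estimator can still be pickled when a search runs with several jobs.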

@jnothman
Member
jnothman commented Jan 30, 2017 via email

@amueller
Member
amueller commented Mar 4, 2017

How about using SelectPercentile instead? I'm not sure whether TruncatedSVD lets you specify the variance to be covered, but it would be nice to have an option that is relative to the input feature size rather than absolute.
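
A small sketch of why SelectPercentile sidesteps the problem: the number of features kept scales with the input width, so it can never exceed it. The random count data, the `chi2` score function and the 20 percent setting are assumptions for illustration only.

```python
import numpy as np
from sklearn.feature_selection import SelectPercentile, chi2

rng = np.random.RandomState(0)
kept = {}

# The same selector setting is applied to a narrow and a wide input.
for n_features in (5, 50):
    X = rng.randint(0, 5, size=(30, n_features))  # non-negative counts for chi2
    y = rng.randint(0, 2, size=30)
    selector = SelectPercentile(chi2, percentile=20)
    kept[n_features] = selector.fit_transform(X, y).shape[1]

print(kept)
```

Because `percentile` is relative, the same grid value stays valid whatever `max_features` the vectorizer ends up with.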
