-
-
Notifications
You must be signed in to change notification settings - Fork 26k
FIX: make GridSearchCV work with precomputed kernels #649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX: make GridSearchCV work with precomputed kernels #649
Conversation
|
||
clf = SVC(kernel='precomputed') | ||
cv = GridSearchCV(clf, {'C': [0.1, 1.0]}) | ||
cv.fit(K, y_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you should also test the score?
Or at least / also the cv.best_score_?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By just checking the best_score_
attribute of cv
is defined, by checking its positive or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And add tests for the cases that raise expected exceptions with assert_raises
.
…matrix + some tests
Other than that I am 👍 for merging. |
Would be great to do the same for |
I am worried that this is a hack that special cases a problem to work around a design fault. Not to say that I am against merging, but we clearly cannot add a list of such hacks each time that a problem cannot be cross-validated by splitting X and y. This is why I mentioned 'an elegant solution' on the mailing list. |
The same problem arises in clustering algorithms with a similarity matrix. We could agree upon a flag (e.g. |
I am wondering if it wouldn't be better to have a dedicated API for kernel / similarity fitting models. Like:
That would make it possible to provide a |
It's funny because some time ago, I already made a similar pull request that yielded a similar discussion from the same people and the patch withered away... I completely agree with you about the fact that it's a wider design problem that needs a proper generic solution. However, I also believe that this particular use case (cross-validation with non-linear SVMs) is frequent (at least in my domain) and therefore this god-awful hack might still be of some use. Furthermore, I don't think that it hugely pollutes the code base. French people know that sometimes "le mieux est l'ennemi du bien" ("The best is the enemy of the good"). And as savvy pythonistas you also know that
Nevertheless, I agree that methods eating kernels or similarity matrices are somewhat problematic in scikit-learn, because of the breakage of the |
I like @ogrisel's idea. So maybe a combination of @ogrisel's idea and mine? Have a grid search tool that uses the fit_kernel function? |
As an afterthought: would this function be used for algorithms using similarity and using dissimilarity? that might be confusing. |
BTW, I'm +1 for merging this PR as a temporary fix until we figure out a nicer solution. |
Ok, let's merge :) |
I am also ok for merging this a temporary fix but we should really think on how to better improve the API for precomputed kernel / affinity / distances matrices (SVM, KernelPCA, SpectralClustering...). |
+1 on temporary fix and on improving long term. ----- Original message -----
|
FIX: make GridSearchCV work with precomputed kernels
Thanks for the contribution @daien :) |
You're very welcome :-) I hope I will be able to help more when I'm done with writing... And not just with hacks this time ;-) |
@daien: any contribution is welcomed. Thanks a lot. |
Wouldn't a simple use_kernel option in grid search and Cross validation do the stuff ? |
The same hack should be added to cross_validation.cross_val_score() |
@alexis-mignon This should be fixed in #803 but that hasn't been reviewed yet :-/ |
Simple fix (+test) based on checking whether
base_clf.kernel == 'precomputed'