FIX: make GridSearchCV work with precomputed kernels by daien · Pull Request #649 · scikit-learn/scikit-learn

Merged: 3 commits merged into scikit-learn:master on Mar 3, 2012

Conversation

@daien (Contributor) commented Feb 23, 2012

Simple fix (+test) based on checking whether base_clf.kernel == 'precomputed'


from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV

clf = SVC(kernel='precomputed')
cv = GridSearchCV(clf, {'C': [0.1, 1.0]})
cv.fit(K, y_)  # K: precomputed (n_samples, n_samples) kernel matrix, y_: labels
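
For context on why plain row-slicing of X is not enough here: with a precomputed kernel both axes index samples, so cross-validation has to take the train-vs-train block for fitting and the test-vs-train block for scoring. A minimal, self-contained sketch of that slicing on toy data (an illustration only, not the actual patch):

import numpy as np
from sklearn.svm import SVC

# Toy data: 20 samples, 5 features, two alternating classes
X = np.random.RandomState(0).rand(20, 5)
y = np.array([0, 1] * 10)
K = np.dot(X, X.T)  # linear kernel, precomputed as a (20, 20) Gram matrix

train, test = np.arange(15), np.arange(15, 20)
clf = SVC(kernel='precomputed', C=1.0)
clf.fit(K[np.ix_(train, train)], y[train])         # train-vs-train block
print(clf.score(K[np.ix_(test, train)], y[test]))  # test-vs-train block
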
Member

Maybe you should also test the score?
Or at least/also the cv.best_score_?

@daien (Contributor Author)

By just checking that the best_score_ attribute of cv is defined, by checking that it is positive, or something else?

Member

And add tests for the cases that raise expected exceptions with assert_raises.

@ogrisel (Member) commented Feb 23, 2012

Other than that I am 👍 for merging.

@ogrisel (Member) commented Feb 23, 2012

Would be great to do the same for sklearn.cross_validation.cross_val_score: maybe there is some code to factorize.

@GaelVaroquaux (Member)

I am worried that this is a hack that special cases a problem to work around a design fault.

Not to say that I am against merging, but we clearly cannot add a list of such hacks each time that a problem cannot be cross-validated by splitting X and y. This is why I mentioned 'an elegant solution' on the mailing list.

@mblondel (Member) commented Mar 2, 2012

The same problem arises in clustering algorithms with a similarity matrix. We could agree upon a flag (e.g. symmetric_X_ = True) that must be set by the estimator in fit.
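
Purely as an illustration of that convention (symmetric_X_ is the name proposed above, not an existing scikit-learn attribute):

import numpy as np

class SimilarityClusteringSketch(object):
    """Hypothetical estimator that consumes a precomputed similarity matrix."""

    def fit(self, S, y=None):
        # S is an (n_samples, n_samples) similarity matrix.
        self.symmetric_X_ = True  # proposed flag: CV tools should slice both axes of X
        self.affinity_matrix_ = np.asarray(S)
        return self

A cross-validation tool could then inspect that flag to decide whether to slice both axes of X.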

@ogrisel (Member) commented Mar 2, 2012

I am wondering if it wouldn't be better to have a dedicated API for kernel / similarity fitting models. Like:

clf.fit_kernel(K, y) instead.

That would make it possible to provide a fit(X, y) method that accepts a design matrix X and turns it into a kernel automatically (using the kneighbors_graph machinery) for unsuspecting users of, for instance, the SpectralClustering class.
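
A hypothetical sketch of what that dual entry point could look like (fit_kernel is the name suggested here, not an existing scikit-learn method; rbf_kernel merely stands in for whatever graph or kernel construction the estimator would use):

from sklearn.metrics.pairwise import rbf_kernel

class KernelEstimatorSketch(object):
    """Toy estimator exposing both entry points from the proposal."""

    def fit_kernel(self, K, y=None):
        # Expert users pass a precomputed kernel / similarity matrix directly.
        self.kernel_ = K
        # ... actual fitting on K would happen here ...
        return self

    def fit(self, X, y=None):
        # Unsuspecting users pass a plain (n_samples, n_features) design matrix;
        # the estimator builds the kernel itself and delegates to fit_kernel.
        return self.fit_kernel(rbf_kernel(X), y)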

@daien (Contributor Author) commented Mar 2, 2012

It's funny because some time ago, I already made a similar pull request that yielded a similar discussion from the same people and the patch withered away...

I completely agree with you about the fact that it's a wider design problem that needs a proper generic solution. However, I also believe that this particular use case (cross-validation with non-linear SVMs) is frequent (at least in my domain) and therefore this god-awful hack might still be of some use. Furthermore, I don't think that it hugely pollutes the code base.

French people know that sometimes "le mieux est l'ennemi du bien" ("The best is the enemy of the good"). And as savvy pythonistas you also know that

Special cases aren't special enough to break the rules.
Although practicality beats purity.

Nevertheless, I agree that methods that consume kernels or similarity matrices are somewhat problematic in scikit-learn, because they break the X = [n_samples, n_features] API convention. It surely needs a proper discussion, but, IMHO, it's not going to be trivial to solve while maintaining the beauty and simplicity of the existing API. If it were, you would have already done it ;-) But I'm hopeful!

@amueller (Member) commented Mar 3, 2012

I like @ogrisel's idea.
Another idea I just had (for this particular case) is to subclass GridSearchCV for the "precomputed kernel/distance" case.
A problem would then be, though, to ensure that people only call it with the right kind of estimators.

So maybe a combination of @ogrisel's idea and mine? Have a grid search tool that uses the fit_kernel function?
Btw, I would rather use a name without "kernel" in it. I would like something that contains "similarity" or "precomputed".
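
One possible shape for that combination, purely as a sketch (PrecomputedGridSearchCV is a hypothetical name; the guard mirrors the check this PR adds):

from sklearn.grid_search import GridSearchCV

class PrecomputedGridSearchCV(GridSearchCV):
    """Hypothetical grid search restricted to precomputed-kernel estimators."""

    def fit(self, K, y=None, **params):
        # Ensure callers only use this with the right kind of estimator.
        if getattr(self.estimator, 'kernel', None) != 'precomputed':
            raise ValueError("estimator must be constructed with kernel='precomputed'")
        return super(PrecomputedGridSearchCV, self).fit(K, y, **params)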

@amueller (Member) commented Mar 3, 2012

As an afterthought: would this function be used both for algorithms using similarity and for those using dissimilarity? That might be confusing.

@mblondel (Member) commented Mar 3, 2012

BTW, I'm +1 for merging this PR as a temporary fix until we figure out a nicer solution.

@amueller (Member) commented Mar 3, 2012

Ok, let's merge :)

@ogrisel (Member) commented Mar 3, 2012

I am also OK with merging this as a temporary fix, but we should really think about how to improve the API for precomputed kernel / affinity / distance matrices (SVM, KernelPCA, SpectralClustering...).

@GaelVaroquaux (Member)

+1 on temporary fix and on improving long term.


amueller added a commit that referenced this pull request Mar 3, 2012
FIX: make GridSearchCV work with precomputed kernels
@amueller merged commit da9528f into scikit-learn:master Mar 3, 2012
@amueller (Member) commented Mar 3, 2012

Thanks for the contribution @daien :)

@daien (Contributor Author) commented Mar 4, 2012

You're very welcome :-) I hope I will be able to help more when I'm done with writing... And not just with hacks this time ;-)

@GaelVaroquaux (Member)

@daien: any contribution is welcome. Thanks a lot.

@alexis-mignon (Contributor)

Wouldn't a simple use_kernel option in grid search and cross-validation do the trick?
Or maybe adding a kernel_data attribute to estimators?

@alexis-mignon (Contributor)

The same hack should be added to cross_validation.cross_val_score()

@amueller (Member) commented Jul 9, 2012

@alexis-mignon This should be fixed in #803 but that hasn't been reviewed yet :-/
