All kernel regression methods should accept precomputed Gram matrices #8445
Comments
They do support `kernel="precomputed"`.
The documentation for GPR does not say it accepts a precomputed Gram matrix. KRR does indeed accept a precomputed Gram matrix, but the documentation does not say so.
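For reference, the precomputed path in KernelRidge works like this (a minimal sketch; the RBF Gram matrix and the random data are just an illustration):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X_train, X_new = rng.rand(20, 3), rng.rand(5, 3)
y_train = rng.rand(20)

K = rbf_kernel(X_train)  # (20, 20) training Gram matrix
model = KernelRidge(kernel="precomputed", alpha=1.0).fit(K, y_train)

# For new data, pass the kernel between new and training samples.
pred = model.predict(rbf_kernel(X_new, X_train))
```

The key convention: at predict time, rows index the new samples and columns index the training samples.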
Any volunteer to make a PR, then?
Sorry for prematurely closing! I'm not actually sure if there is a way for GPR to use a precomputed Gram matrix. Maybe @jmetzen can say? For KRR it's a doc fix.
(good first issue is the doc fix only; there might be another issue for the GP)
I can do a PR for the doc fix for KRR precomputed kernels. I've used that option before (saw the option while examining the source). First sklearn issue for me.
To do GPR with a precomputed matrix, I use the attached subclass. Please feel free to use the code if you like it. It is directly determined by the nature of the problem to be solved, the conventions of the Python community, and the sklearn API, so I consider that I have no copyright in it.
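The attachment itself is not reproduced in this thread. As an illustration only, a subclass along those lines might look roughly like the sketch below; the names `Precomputed` and `GPRPrecomputed` are hypothetical, and the convention assumed is that fit receives the square training Gram matrix while predict receives the (n_test, n_train) cross-Gram matrix:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Kernel
from sklearn.metrics.pairwise import rbf_kernel

class Precomputed(Kernel):
    """Pass-through 'kernel': fit and predict receive Gram matrices directly."""
    def __init__(self):
        pass
    def __call__(self, X, Y=None, eval_gradient=False):
        K = np.asarray(X)  # X is already a (cross-)Gram matrix
        if eval_gradient:  # no hyperparameters, so an empty gradient
            return K, np.empty((K.shape[0], K.shape[1], 0))
        return K
    def diag(self, X):
        # Only meaningful for the square training Gram matrix; predict with
        # return_std=True on test data is therefore not supported here.
        return np.diag(np.asarray(X))
    def is_stationary(self):
        return False

class GPRPrecomputed(GaussianProcessRegressor):
    """GPR whose fit(K, y) takes the training Gram matrix and whose
    predict(K_cross) takes the (n_test, n_train) cross-Gram matrix."""
    def __init__(self, alpha=1e-10, normalize_y=False):
        super().__init__(kernel=Precomputed(), optimizer=None,
                         alpha=alpha, normalize_y=normalize_y)

rng = np.random.RandomState(0)
X_train, X_test = rng.rand(20, 3), rng.rand(5, 3)
y_train = rng.rand(20)
gpr = GPRPrecomputed(alpha=1e-6).fit(rbf_kernel(X_train), y_train)
pred = gpr.predict(rbf_kernel(X_test, X_train))
```

Since the pass-through kernel has no hyperparameters, the GP's internal optimizer is disabled; hyperparameter tuning has to happen outside, on the Gram matrix itself.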
@wesbarnett go for it! |
@chrishmorris I was hoping we could do it easier than that, maybe by using a precomputed kernel class for the kernel.
I'm not sure what that means. There is one advantage in what I have done. Computing the Gram matrix is expensive. After doing so, you can fit it multiple ways, using different models or cross-validating the parameters of one model. Maybe your approach can do this if the class is memoized.
* document "precomputed" kernel for KernelRidge (see #8445)
* fix doc for KernelRidge: use n_samples_fitted in score
Documentation is done, still need to work on supporting it in GP.
I would prefer if we could support precomputed kernels without having to adapt the GP classes, e.g. by adding a precomputed kernel class. @chrishmorris could you check if you could adapt your code in this way?
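One way such a precomputed kernel class might look is sketched below. This is not code from this thread: `GramKernel` and the convention of representing each sample by its integer row index into the Gram matrix are my own illustration, under the assumption that the unmodified `GaussianProcessRegressor` is used:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Kernel
from sklearn.metrics.pairwise import rbf_kernel

class GramKernel(Kernel):
    """Looks up entries of a precomputed Gram matrix; each 'sample' passed
    to the GP is just its integer row index into that matrix."""
    def __init__(self, gram):
        self.gram = gram
    def __call__(self, X, Y=None, eval_gradient=False):
        ix = np.asarray(X).astype(int).ravel()
        iy = ix if Y is None else np.asarray(Y).astype(int).ravel()
        K = self.gram[np.ix_(ix, iy)]
        if eval_gradient:  # no tunable hyperparameters
            return K, np.empty((K.shape[0], K.shape[1], 0))
        return K
    def diag(self, X):
        ix = np.asarray(X).astype(int).ravel()
        return self.gram[ix, ix]
    def is_stationary(self):
        return False

rng = np.random.RandomState(0)
X = rng.rand(30, 3)
y = rng.rand(30)
gram = rbf_kernel(X)                       # computed once, up front
idx = np.arange(30, dtype=float)[:, None]  # samples are row indices

gpr = GaussianProcessRegressor(kernel=GramKernel(gram), optimizer=None,
                               alpha=1e-6)
gpr.fit(idx[:20], y[:20])
pred, std = gpr.predict(idx[20:], return_std=True)
```

Because the kernel sees only indices, predict with return_std=True works for held-out samples too, unlike a pass-through design where the test diagonal is unavailable.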
I've been thinking about this, but I'm not sure what you are expecting. Computing a Gram matrix is expensive. When cross-validating, it is best to compute it once, then iterate through parameter values. So it seems right to me that, at present, the fit and predict methods can accept a Gram matrix.

Alternatively, these methods could accept a list of objects, and the kernel function could be memoized. A practical obstacle to this is that lists in Python are mutable. Maybe the lists (or Series, or numpy arrays) can be converted to tuples. Even then, the objects in the list may be mutable.

Are you suggesting a change of signature for fit() and predict()? If so, to what?
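The compute-once, cross-validate-many pattern described above can be sketched for KRR as follows (random data used purely for illustration):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import KFold

rng = np.random.RandomState(0)
X = rng.rand(40, 3)
y = rng.rand(40)
K = rbf_kernel(X)  # the expensive step, done exactly once

scores = []
for train, test in KFold(n_splits=4).split(X):
    model = KernelRidge(kernel="precomputed", alpha=0.1)
    model.fit(K[np.ix_(train, train)], y[train])
    # rows index the test samples, columns the training samples
    scores.append(model.score(K[np.ix_(test, train)], y[test]))
```

Each fold only slices the cached matrix, so no kernel evaluations are repeated across folds.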
Any updates for this issue? I am looking for a GPR which accepts a precomputed kernel as well.
But it doesn't work: a pairwise kernel wants X.shape[1] == y.shape[0], and this requirement cannot be met during grid-search cross-validation.
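For KRR with an explicitly precomputed Gram matrix (rather than a pairwise kernel callable), grid search does work: estimators with kernel="precomputed" are treated as pairwise, so the cross-validation splitter slices both rows and columns of the square matrix. A minimal sketch:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.rand(30, 3)
y = rng.rand(30)
K = rbf_kernel(X)  # square (30, 30) Gram matrix

# KernelRidge(kernel="precomputed") is marked pairwise, so each CV split
# receives K[train][:, train] for fitting and K[test][:, train] for scoring.
search = GridSearchCV(KernelRidge(kernel="precomputed"),
                      {"alpha": [0.1, 1.0]}, cv=3)
search.fit(K, y)
```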
The problem there is that the
A fix for that would be welcome. |
version: 0.22.1. FYI, there has not been an example of
As the above comments, [1] and [2], imply, the
If somebody tells me how to publish sample code onto the sklearn doc, I'll be glad to follow the instructions. Thank you.
Examples are published by adding a file under the examples directory in this repository. See our contributing guide.
Has the GPR class been updated to allow for a precomputed kernel matrix? This is quite clearly explained for KRR and SVR, but it's less obvious (if present) for GPR in the docs. Whereas in KRR and SVR you can pass `kernel="precomputed"`, do you have to specifically supply the kernel as an object for GPR? Drawing parallels with KRR and SVR, I'd imagine a similar `kernel="precomputed"` option would work.
See my comment of 24 May 2018. The attachment provides a subclass of GPR that accepts a precomputed Gram matrix.
Thanks @chrishmorris. Just saw that. Exactly what I was looking for. Thanks for sharing. |
Just a heads-up: the script by @chrishmorris has a bug in line 67. Anyway, about this issue in general, I agree with @chrishmorris and do not understand why a solution like his has not been adopted.
If we could allow a precomputed kernel for Gaussian processes, that would be highly appreciated, and I believe it would be very useful for a lot of people.
@chrishmorris Were you planning to do a PR on this or is there still discussion happening with @amueller concerning implementation? If no one is working on this anymore, I'd be happy to look into it and work on a PR for this. |
Go ahead - I'm not planning to work on a PR. Thanks to arosen93 for bug fixes. |
SVR has an option `kernel="precomputed"`.
If this is chosen, then the X array passed to the fit method is the Gram matrix of the training examples. This option should also be available for GPR and KRR.
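The SVR behavior described above can be sketched as follows (random data and a linear kernel used purely as an example):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import linear_kernel

rng = np.random.RandomState(0)
X_train, X_new = rng.rand(25, 3), rng.rand(5, 3)
y_train = rng.rand(25)

K = linear_kernel(X_train)  # (n_train, n_train) Gram matrix
svr = SVR(kernel="precomputed").fit(K, y_train)

# At predict time, supply the kernel between new and training samples.
pred = svr.predict(linear_kernel(X_new, X_train))
```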
Here is some motivation. Many machine learning projects begin by defining a feature set, and many algorithms, e.g. linear regression, intrinsically require a vector of real numbers for each sample.
Kernel methods take a different approach: the modeller supplies a function, the "kernel", which measures the similarity between two samples. This need not be expressed in terms of features. For example, it is easy to say what it means for two DNA sequences to be similar, but hard to reduce a DNA sequence to a vector of features. So SVR is correctly willing to build a model with `kernel="precomputed"`.
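To make the DNA-sequence point concrete, here is a toy sketch: the samples are strings, not feature vectors, and the Gram matrix is built from a position-wise match similarity (the `seq_kernel` function and the data are entirely illustrative, not a real biological kernel):

```python
import numpy as np
from sklearn.svm import SVR

seqs = ["ACGT", "ACGA", "TTGA", "ACGG"]  # samples that are not feature vectors
y = np.array([1.0, 0.9, 0.1, 0.8])

def seq_kernel(a, b):
    """Toy similarity: fraction of matching positions (illustration only)."""
    return sum(x == z for x, z in zip(a, b)) / max(len(a), len(b))

K = np.array([[seq_kernel(a, b) for b in seqs] for a in seqs])
svr = SVR(kernel="precomputed", C=10.0).fit(K, y)
pred = svr.predict(K)
```

A sum of per-position indicator kernels like this is positive semidefinite for equal-length sequences, so it is a valid kernel for this toy case.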
(My current project is a cheminformatic one.)
It would be great if the other kernel methods supported this. At present, they require a kernel function to which they pass the X values, after checking that X is an array of floats. So some of the value of kernel methods is lost.