Make return_std 20x faster in Gaussian Processes (includes solution) #9234
Complete solution, starting here: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/gaussian_process/gpr.py#L322

```python
if not hasattr(self, 'K_inv_'):
    L_inv = solve_triangular(self.L_.T, np.eye(self.L_.shape[0]))
    self.K_inv_ = L_inv.dot(L_inv.T)

# Compute variance of predictive distribution
y_var = self.kernel_.diag(X)
sum1 = np.dot(K_trans, self.K_inv_).T
y_var1 = y_var - np.einsum("ki,ik->k", K_trans, sum1)
# y_var2 = y_var - np.einsum("ki,kj,ij->k", K_trans, K_trans, self.K_inv_)
# assert np.all(np.abs(y_var1 - y_var2) < 1e-12)
y_var = y_var1
```
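The equivalence of the original three-operand einsum and the patched dot-plus-einsum form can be checked with a self-contained numpy sketch (the array names and shapes below are illustrative stand-ins, not scikit-learn's internals):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins for the quantities in the patch above:
# K_trans ~ kernel between m query points and n training points,
# K_inv   ~ inverse of the (regularized) training kernel matrix.
m, n = 50, 200
K_trans = rng.standard_normal((m, n))
A = rng.standard_normal((n, n))
K_inv = np.linalg.inv(A @ A.T + n * np.eye(n))  # SPD, then inverted

# Original formulation: one three-operand einsum,
# sum over i, j of K_trans[k, i] * K_trans[k, j] * K_inv[i, j].
slow = np.einsum("ki,kj,ij->k", K_trans, K_trans, K_inv)

# Patched formulation: BLAS-backed matmul, then a cheap two-operand einsum.
sum1 = np.dot(K_trans, K_inv).T          # shape (n, m)
fast = np.einsum("ki,ik->k", K_trans, sum1)

assert np.allclose(slow, fast)
```

The speedup comes from `np.dot` dispatching to BLAS, whereas a three-operand `einsum` (without `optimize=True`) runs as a naive nested loop.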
Thanks for the suggestion. Can you open a pull request with this change? Note that the first part was already solved in #8591, but not the second.
I'm sorry, I can't. I'm posting this from an environment where I can access the website, but none of the other GitHub tools.
This looks like it might consume some negligible extra memory but otherwise should only benefit.
I would like to help and make the changes if that's ok.
Go ahead.
Fixed in 6e01fef from #9236. Thanks @minghui-liu and @Andrewww.
BTW, I did not observe a 20x speedup in my tests. The speed stayed approximately the same on a generated 1000x5 dataset. Note: the second call to predict is significantly faster because of the cached `K_inv_`. @Andrewww I would be curious to know more about the kind of data where you observed the initially reported speedup.
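Since the reported timings here differ across platforms, a small numpy-only harness can compare the two formulations locally (sizes and names are illustrative assumptions, not the original benchmark; absolute timings will depend on the BLAS that numpy is linked against):

```python
import time
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes loosely matching the discussion: n training points,
# m query points. The feature count only affects building the kernel,
# not the variance computation being compared here.
n, m = 1000, 100
K_trans = rng.standard_normal((m, n))
B = rng.standard_normal((n, n))
K_inv = np.linalg.inv(B @ B.T + n * np.eye(n))

t0 = time.perf_counter()
old = np.einsum("ki,kj,ij->k", K_trans, K_trans, K_inv)
t1 = time.perf_counter()
new = np.einsum("ki,ik->k", K_trans, np.dot(K_trans, K_inv).T)
t2 = time.perf_counter()

print(f"three-operand einsum: {t1 - t0:.4f}s")
print(f"dot + einsum:         {t2 - t1:.4f}s")
assert np.allclose(old, new)
```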
Hmmm... This was literally a make-or-break change to the code for me, i.e., the code was so slow that I could not actually use it without this change (the change to the `.einsum()` call). The only thing I can think of is that I'm on Windows 7 x64 / Anaconda 3.1.4. Doesn't numpy sometimes behave differently on different platforms? Maybe the Windows build of numpy is the difference here.
Also, apparently it was already fixed in an earlier issue, #8591.
Is anyone still having problems with this? I have a model trained on 15,000 data points, and predictions with `return_std=True` are still very slow.
Two requests:

(1) Please replace this line:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/gaussian_process/gpr.py#L329

from this:

```python
y_var -= np.einsum("ki,kj,ij->k", K_trans, K_trans, K_inv)
```

to this:

```python
sum1 = np.dot(K_trans, K_inv).T
y_var -= np.einsum("ki,ik->k", K_trans, sum1)
```

For an input data set of size 800x1, the time difference is 12.7 seconds versus 0.2 seconds. I have validated that the results agree to within 1e-12 or better.

(2) Please cache the result of the `K_inv` computation. It depends only on the result of training, and it can be very costly to recompute on repeated calls.
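Request (2) amounts to computing the kernel-matrix inverse once from the Cholesky factor produced at fit time, then storing it on the model. A standalone sketch of that computation (the matrix below is an illustrative stand-in for K(X_train, X_train) + alpha * I, not scikit-learn's internals):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(0)

# Illustrative symmetric positive-definite "training kernel" matrix.
n = 300
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)

# One-time work at fit: Cholesky factor L with K = L @ L.T.
L = cholesky(K, lower=True)

# Invert once via a triangular solve, as in the patch:
# L_inv = L^{-T}, so L_inv @ L_inv.T = (L @ L.T)^{-1} = K^{-1}.
L_inv = solve_triangular(L.T, np.eye(n))
K_inv = L_inv @ L_inv.T

# K_inv depends only on the training data, so it can be stored on the
# fitted model and reused by every predict(..., return_std=True) call.
assert np.allclose(K @ K_inv, np.eye(n), atol=1e-6)
```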