Correctness issue in LassoLars for unluckily aligned values #2746
Comments
Hmm... do you have some time to investigate the issue? Stability of LARS.
I can't look into it right now, but the workaround is enough for me at the moment. I'm also not that familiar with the LARS algorithm.
So after reading through the LARS paper (Efron et al., 2003), it looks like adding jitter to the samples to avoid the many-regressors-at-a-time problem is actually an expected part of the algorithm (section 5, last paragraph, page 32). Given that, it seems like the LassoLars call in sklearn should probably add some small amount of jitter automatically. I'm not sure how to determine how much should be added, though.
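For illustration, a minimal sketch of that manual workaround as it stands today; the helper name and noise scale are my own choices, not sklearn API:

import numpy as np
from sklearn.linear_model import LassoLars

def fit_lassolars_with_jitter(X, y, alpha, scale=1e-5, seed=0):
    # Add small uniform noise to y to break exact ties in the
    # correlations, per section 5 of the LARS paper.
    rng = np.random.RandomState(seed)
    y_jittered = y + rng.uniform(0.0, scale, size=y.shape)
    model = LassoLars(alpha=alpha, fit_intercept=False)
    model.fit(X, y_jittered)
    return model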
We can add a parameter where the user can specify how much jitter to add to y. In the post above, the jitter is uniformly distributed, but I assume gaussian jitter would work too, in which case the user could specify its variance?
Wow, thanks for taking a look at this. Yes, I'd figure some kind of noise parameter with a sensible default (maybe 10e-5 or something for single-precision floats) should work. I guess it doesn't matter much, but uniform noise might be preferable to gaussian since the parameter could be a bound on the magnitude of the noise.
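For concreteness, the two noise choices under discussion might look like this; `scale` is an illustrative name, and note it is a hard bound for uniform noise but a standard deviation for gaussian noise:

import numpy as np

rng = np.random.RandomState(0)
y = np.array([-2.5, -2.5])  # example target from the report below
scale = 1e-5

# Uniform jitter: the perturbation magnitude is bounded by scale.
y_uniform = y + rng.uniform(-scale, scale, size=y.shape)

# Gaussian jitter: scale acts as a standard deviation, so the
# perturbation is unbounded (large deviations are just unlikely).
y_gaussian = y + rng.normal(0.0, scale, size=y.shape)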
✋ I'd be happy to work on this as my first sklearn PR. To summarize my understanding:
To get set up, I'll make sure to read through the
The new param should probably be added to the class constructor rather than fit, per our convention.
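To make that convention concrete, a hedged sketch of a constructor-level parameter; `JitteredLassoLars`, `jitter`, and `random_state` are illustrative names here, not the merged API:

import numpy as np
from sklearn.linear_model import LassoLars

class JitteredLassoLars(LassoLars):
    # Illustrative only: the jitter amount is a constructor parameter,
    # per sklearn convention, rather than an argument to fit().
    def __init__(self, alpha=1.0, fit_intercept=True, jitter=None,
                 random_state=None):
        super().__init__(alpha=alpha, fit_intercept=fit_intercept)
        self.jitter = jitter
        self.random_state = random_state

    def fit(self, X, y):
        if self.jitter is not None:
            rng = np.random.RandomState(self.random_state)
            y = y + rng.uniform(high=self.jitter, size=len(y))
        return super().fit(X, y)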
Resolved in #15179
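If I'm reading the merged PR correctly, the fix exposes this as a `jitter` parameter (with `random_state`) on the estimator itself, so usage on the original repro would look roughly like the following; check your scikit-learn version, as I believe this landed in 0.23:

import numpy as np
from sklearn.linear_model import LassoLars

A = np.array([[0.0, 0.0, 0.0, -1.0, 0.0],
              [0.0, -1.0, 0.0, 0.0, 0.0]])
b = np.array([-2.5, -2.5])

# jitter adds a small uniform perturbation to y before fitting,
# breaking the exact ties that trip up LARS.
lars = LassoLars(alpha=0.001, fit_intercept=False, jitter=1e-5,
                 random_state=0)
lars.fit(A, b)
print(lars.coef_)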
Thanks for fixing this!
I think I've found a correctness bug in sklearn.linear_model.LassoLars. For me it appears only for systems with some exactly aligned (i.e., non-general-position) values. It can be fixed by jiggling the RHS of the system with small random offsets. Here is a Python test script:
import numpy as np
import sklearn.linear_model as sklm

# A system with exactly aligned (non-general-position) values: both
# targets equal -2.5 and both active columns have coefficient -1.
A = np.array([[0.0, 0.0, 0.0, -1.0, 0.0],
              [0.0, -1.0, 0.0, 0.0, 0.0]])
b = np.array([-2.5, -2.5])

lars = sklm.LassoLars(alpha=0.001, fit_intercept=False)

# Fit on the exact (tied) right-hand side.
lars.fit(A, b)
w_nojiggle = lars.coef_

# Fit again after jiggling b with small random offsets to break the ties.
jiggle_b = b + np.random.rand(2) * 0.00001
lars.fit(A, jiggle_b)
w_jiggle = lars.coef_

print('without jiggle:', w_nojiggle, ', residual:', np.dot(A, w_nojiggle) - b)
print('with jiggle:', w_jiggle, ', residual:', np.dot(A, w_jiggle) - b)
For me, with the current Anaconda distribution (sklearn.__version__ == '0.14.1'), the output is:
without jiggle: [ 0. 4.998 0. 2.498 0. ] , residual: [ 2.00000000e-03 -2.49800000e+00]
with jiggle: [ 0. 2.49799528 0. 2.49799561 0. ] , residual: [ 0.00200439 0.00200472]
The jiggled version gives the expected result, whereas the no-jiggle version is wrong; presumably the exactly tied values trigger the many-regressors-at-a-time degeneracy discussed in the comments above.
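One way to verify which answer is correct, independent of the solver, is to evaluate the Lasso objective (1/(2·n_samples))·||Aw − b||² + alpha·||w||₁ for both coefficient vectors; the coefficient values below are copied from the output above:

import numpy as np

A = np.array([[0.0, 0.0, 0.0, -1.0, 0.0],
              [0.0, -1.0, 0.0, 0.0, 0.0]])
b = np.array([-2.5, -2.5])
alpha = 0.001

def lasso_objective(w):
    # Squared-error term (sklearn's scaling) plus the L1 penalty.
    resid = A.dot(w) - b
    return resid.dot(resid) / (2 * len(b)) + alpha * np.abs(w).sum()

w_nojiggle = np.array([0.0, 4.998, 0.0, 2.498, 0.0])
w_jiggle = np.array([0.0, 2.498, 0.0, 2.498, 0.0])

print(lasso_objective(w_nojiggle))  # ~1.57: clearly not the minimizer
print(lasso_objective(w_jiggle))    # ~0.005: the correct solution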