Correctness issue in LassoLars for unluckily aligned values · Issue #2746 · scikit-learn/scikit-learn · GitHub

Closed
fcole opened this issue Jan 13, 2014 · 9 comments
Labels
Bug · Easy (Well-defined and straightforward way to resolve)
Comments

@fcole commented Jan 13, 2014

I think I've found a correctness bug in sklearn.linear_model.LassoLars. For me it appears only for systems with some exactly aligned (i.e., non-general-position) values. It can be fixed by jiggling the RHS of the system with small random offsets. Here is Python test code:

import numpy as np
import sklearn.linear_model as sklm

A = np.array([[0.0, 0.0, 0.0, -1.0, 0.0],
              [0.0, -1.0, 0.0, 0.0, 0.0]])
b = np.array([-2.5, -2.5])

lars = sklm.LassoLars(alpha=0.001, fit_intercept=False)
lars.fit(A, b)
w_nojiggle = lars.coef_

jiggle_b = b + np.random.rand(2) * 0.00001

lars.fit(A, jiggle_b)
w_jiggle = lars.coef_

print('without jiggle: ', w_nojiggle, ', residual: ', np.dot(A, w_nojiggle) - b)
print('with jiggle: ', w_jiggle, ', residual: ', np.dot(A, w_jiggle) - b)

For me, with the current Anaconda distribution (sklearn.__version__ == '0.14.1'), the output is:

without jiggle: [ 0. 4.998 0. 2.498 0. ] , residual: [ 2.00000000e-03 -2.49800000e+00]
with jiggle: [ 0. 2.49799528 0. 2.49799561 0. ] , residual: [ 0.00200439 0.00200472]

The jiggled version has the expected result, whereas the no jiggle version is wrong.

@agramfort (Member) commented

hum... do you have some time to investigate the issue? stability of LARS can be a pain...

@fcole (Author) commented Jan 14, 2014

I can't look into it right now, but the workaround is enough for me at the moment. I'm also not that familiar with the LARS algorithm.

@fcole (Author) commented Jan 21, 2014

So after reading through the LARS paper (Efron et al., 2004), it looks like adding jitter to the samples to avoid the many-regressors-at-a-time problem is actually an expected part of the algorithm (section 5, last paragraph, page 32). Given that, it seems like the LassoLars call in sklearn should probably add some small amount of jitter automatically. I'm not sure how to determine how much should be added, though.
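The workaround from the original report can be wrapped up as a small helper that jitters the targets before fitting. This is only a sketch: the helper name, default jitter scale, and seeding are illustrative choices, not anything provided by scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LassoLars

def fit_lasso_lars_with_jitter(A, b, alpha=0.001, jitter_scale=1e-5, seed=0):
    """Fit LassoLars after adding small uniform jitter to the targets,
    breaking the exact ties among regressors described in this issue.
    (Hypothetical helper; not part of scikit-learn's API.)"""
    rng = np.random.RandomState(seed)
    b_jittered = b + rng.uniform(0.0, jitter_scale, size=b.shape)
    model = LassoLars(alpha=alpha, fit_intercept=False)
    model.fit(A, b_jittered)
    return model

# The degenerate system from the original report: both targets exactly equal.
A = np.array([[0.0, 0.0, 0.0, -1.0, 0.0],
              [0.0, -1.0, 0.0, 0.0, 0.0]])
b = np.array([-2.5, -2.5])

model = fit_lasso_lars_with_jitter(A, b)
print(model.coef_)
```

With the jitter in place, both active coefficients come out near 2.5 and the residual stays on the order of the shrinkage, matching the "with jiggle" output above.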

@amueller amueller added the Bug label Jan 23, 2015
@amueller amueller modified the milestone: 0.19 Sep 29, 2016
@jnothman jnothman modified the milestones: 0.20, 0.19 Jun 14, 2017
@glemaitre glemaitre modified the milestones: 0.20, 0.21 Jun 13, 2018
@jnothman jnothman added the Easy Well-defined and straightforward way to resolve label Apr 10, 2019
@jnothman (Member) commented

We can add a parameter where the user can specify how much jitter to add to y. In the post above, the jitter is uniformly distributed, but I assume Gaussian jitter would work too, in which case the user could specify its variance?
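The Gaussian variant can be sketched on the toy system from this issue; here the user-facing knob would be the noise's standard deviation (the 1e-5 scale is an illustrative choice, mirroring the uniform jiggle in the original report):

```python
import numpy as np
from sklearn.linear_model import LassoLars

A = np.array([[0.0, 0.0, 0.0, -1.0, 0.0],
              [0.0, -1.0, 0.0, 0.0, 0.0]])
b = np.array([-2.5, -2.5])

rng = np.random.RandomState(42)
# Gaussian jitter: the magnitude is controlled by the standard deviation
# rather than an upper bound, and can perturb in either direction.
b_gauss = b + rng.normal(scale=1e-5, size=b.shape)

model = LassoLars(alpha=0.001, fit_intercept=False)
model.fit(A, b_gauss)
print(model.coef_)
```

Either distribution serves the same purpose here: any tiny asymmetric perturbation breaks the exact tie between the two regressors.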

@jnothman jnothman modified the milestones: 0.21, 0.22 Apr 10, 2019
@fcole (Author) commented Apr 10, 2019

Wow, thanks for taking a look at this.

Yes, I'd figure some kind of noise parameter with a sensible default (maybe 10e-5 or something for single-precision floats) should work. I guess it doesn't matter much, but uniform noise might be preferable to gaussian since the parameter could be a bound on the magnitude of the noise.

@angelaambroz (Contributor) commented

✋ I'd be happy to work on this as my first sklearn PR. To summarize my understanding:

  • We need to add a noise kwarg to the LassoLars.fit() function.
  • Two options on the noise's distribution: uniform and bounded by the param; or Gaussian with the param being its variance. I can implement uniform to start with and make the PR, we can discuss further there.
  • Probably a test of some kind... I can take a look at how other .fit functions get tested.

To get set up, I'll make sure to read through the contributing docs, get my environment set up, and read Efron et al. 2003. 😅

@jnothman (Member) commented Oct 10, 2019 via email

@thomasjpfan (Member) commented
Resolved in #15179
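With the fix merged, the original system can be handled directly via the estimator's jitter option, assuming a scikit-learn release that includes #15179 (0.23 or later, where LassoLars gained jitter and random_state parameters):

```python
import numpy as np
from sklearn.linear_model import LassoLars

A = np.array([[0.0, 0.0, 0.0, -1.0, 0.0],
              [0.0, -1.0, 0.0, 0.0, 0.0]])
b = np.array([-2.5, -2.5])

# jitter is an upper bound on uniform noise added to y before fitting;
# random_state makes the perturbation reproducible.
model = LassoLars(alpha=0.001, fit_intercept=False,
                  jitter=1e-5, random_state=0)
model.fit(A, b)
print(model.coef_)
```

This is the built-in equivalent of the manual jiggle from the original report, so no pre-processing of b is needed.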

@fcole (Author) commented Apr 20, 2020 via email
