Adaptive lasso by henridwyer · Pull Request #4912 · scikit-learn/scikit-learn · GitHub
Adaptive lasso #4912


Closed · wants to merge 24 commits

Conversation

@henridwyer commented Jun 30, 2015

Implements the adaptive Lasso and resolves issue #555.



@amueller (Member) commented Jul 1, 2015

Cool. Can you maybe post the plot from the example? Have you done any real-world comparison / benchmarks?

@amueller (Member) commented Jul 1, 2015

Hm, your references are much older than the NIPS reference. What is the difference in the NIPS paper?

@henridwyer (Author)

I couldn't find the NIPS paper, so I am not sure. Do you have a link to it?

1/n * ||y - X Beta||^2_2 + alpha * ||w * Beta||_1

where w is a weight vector calculated at the previous stage by::

    w_j = alpha / (|Beta_j|^gamma + eps)
Review comment (Member):

you should clarify what cost function you're actually minimizing with this reweighting scheme.

see this old snippet of mine:

https://gist.github.com/agramfort/1610922
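A minimal sketch of the reweighting scheme under discussion, in the spirit of the linked gist: each outer step rescales the columns of X by 1 / w_j and solves an ordinary Lasso, which is equivalent to applying a weighted l1 penalty. The function name, the defaults, and the choice to fold alpha into the Lasso's alpha rather than into the weights are illustrative assumptions, not the PR's code.

import numpy as np
from sklearn.linear_model import Lasso

def adaptive_lasso(X, y, alpha=1.0, gamma=1.0, eps=1e-3, n_iter=5):
    # Iteratively reweighted l1: fold the weights into a column rescaling
    # of X so that every step is a plain Lasso fit.
    n_features = X.shape[1]
    weights = np.ones(n_features)
    coef = np.zeros(n_features)
    for _ in range(n_iter):
        lasso = Lasso(alpha=alpha, fit_intercept=False)
        lasso.fit(X / weights, y)      # penalty becomes alpha * sum_j w_j * |beta_j|
        coef = lasso.coef_ / weights   # map back to the original feature scale
        weights = 1.0 / (np.abs(coef) ** gamma + eps)   # w_j = 1 / (|beta_j|^gamma + eps)
    return coef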

@agramfort (Member)

can you please share a screenshot of the output of the example?

@henridwyer (Author)

Here is the example graph:
[Image: adaptive lasso example graph]

I clarified the objective in the docstring.

I updated the docstring to be compliant with pep257 (I think?). For the fit method, I had copied the docstring from the ElasticNet fit function so I updated that one too.

I also changed the references a bit.

The Adaptive Lasso and Its Oracle Properties
Journal of the American Statistical Association
"""
def __init__(self, n_lasso_iterations=2, gamma=1, alpha=1.0,
Review comment (Member):

n_lasso_iterations=2 seems small. Also is eps=1e-3 a good default? I personally don't use eps (set it to zero) and I discard features once they have been zeroed.

Reply (Author):

Well I was thinking that the adaptive lasso corresponds to 2 steps, but this could be increased to more (5?).

About eps: 0.001 was the value they used in another article (added ref), that's why I figured it was a sensible default.

Reply (Member):

> Well I was thinking that the adaptive lasso corresponds to 2 steps, but this could be increased to more (5?).

You should iterate until the cost function stops decreasing, up to a given tolerance. My experience with neuroscience data is that 5 to 10 iterations is enough.

> About eps: 0.001 was the value they used in another article (added ref), that's why I figured it was a sensible default.

hum ok...

@henridwyer (Author)

I added calculation of the objective function, and iteration now continues until the improvement of the objective falls below ada_tol.

@henridwyer (Author)

I changed n_lasso_iterations to max_lasso_iterations and set the default value to 20

Gasso, G., Rakotomamonjy, A., & Canu, S.
Recovering Sparse Signals With a Certain Family of Nonconvex
Penalties and DC Programming
IEEE Trans. Signal Process., 4686-4698.
Review comment (Member):

what is the year?

@agramfort (Member) commented Feb 23, 2016 via email

@henridwyer reopened this Mar 23, 2016
@henridwyer (Author)

I rewrote the class with a few major changes:

  • changed the parameters. In particular there is now a penalty parameter to define which penalty to use.
  • added the log and SCAD penalties as separate penalties.
  • calculated the loss at each iteration to check for convergence.
  • more complete testing: testing for sparser coefficients and that the loss decreases at each iteration, which it should according to the paper.

I had a bit of trouble with getting _p and _p_prime to work, since the pickling test would fail (originally I had a factory method). Is this the right way to implement those?
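On the pickling point: a common workaround is to implement the penalty derivatives as plain module-level functions, which pickle fine, unlike closures returned by a factory method. The sketch below assumes the usual textbook definitions of the log, lq and SCAD penalties from the cited references; the names, signatures and defaults are illustrative rather than the PR's:

import numpy as np

def _log_prime(t, alpha, eps=1e-3):
    # derivative of the log penalty alpha * log(t + eps), for t >= 0
    return alpha / (t + eps)

def _lq_prime(t, alpha, q=0.5, eps=1e-3):
    # derivative of the lq penalty alpha * t**q, for 0 < q <= 1
    return alpha * q / (t + eps) ** (1.0 - q)

def _scad_prime(t, alpha, a=3.7):
    # piecewise derivative of the SCAD penalty (Fan & Li), with the usual a = 3.7
    return np.where(t <= alpha, alpha, np.maximum(a * alpha - t, 0.0) / (a - 1.0))

Any of these can then supply the weights for the reweighted fit, in the same way as self._p_prime(np.abs(self.coef_)) in the loop quoted further down.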

@agramfort (Member)

Sleeping over it, I feel this should be a metaestimator that allows you to reweight all linear models that expose a .coef_ attribute. It would then work for MultiTaskLasso, sparse logistic regression, Elastic-Net, etc.
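A rough sketch of what such a meta-estimator might look like, assuming a 1-D coef_ and no intercept for brevity; the class name, parameters and the column-rescaling trick are illustrative assumptions, not code from this PR:

import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.linear_model import Lasso

class ReweightedL1(BaseEstimator, RegressorMixin):
    # Refit any linear estimator exposing coef_ on column-rescaled data,
    # updating the per-feature penalty weights between refits.
    def __init__(self, estimator=None, n_reweightings=5, gamma=1.0, eps=1e-3):
        self.estimator = estimator
        self.n_reweightings = n_reweightings
        self.gamma = gamma
        self.eps = eps

    def fit(self, X, y):
        base = self.estimator if self.estimator is not None else Lasso(fit_intercept=False)
        weights = np.ones(X.shape[1])
        for _ in range(self.n_reweightings):
            est = clone(base).fit(X / weights, y)
            self.coef_ = est.coef_ / weights        # coefficients on the original scale
            weights = 1.0 / (np.abs(self.coef_) ** self.gamma + self.eps)
        self.estimator_ = est
        return self

    def predict(self, X):
        return X @ self.coef_                       # intercept deliberately ignored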

@henridwyer (Author)

So something like RANSACRegressor?

I can change it a bit to take an estimator as a parameter. Should it stay in the same file?

@agramfort (Member) commented Mar 23, 2016 via email

@henridwyer (Author)

Thinking about it a bit more, it makes sense to be able to use a multitask lasso and logistic regression - but I'm not sure about other estimators (I've never seen reweighted elasticnet).

@henridwyer (Author)

@agramfort any update?

@howthebodyworks

Updated the reference request in the attached issue. FWIW, the key point here for me would be simply allowing a "weight" for each observation so that the user can choose their own reweighting scheme, be it the Adaptive Lasso or one of the many others. This is the approach taken in R's glmnet, and people regularly publish new papers testing out new and different weighting schemes.

The optimization objective for the AdaptiveLasso is::

(1 / (2 * n_samples)) * ||y - X Beta||^2_2
+ alpha * \sum_j p(|Beta|_j)
Review comment (Member):

I think that should be |Beta_j| not |Beta|_j

@henridwyer (Author)

@danmackinlay what API are you imagining? You would want more granularity than the penalty and q parameters?


for k in xrange(1, self.max_lasso_iterations):
    self.n_iter_ = k
    weights = self._p_prime(np.abs(self.coef_))

Review comment:

This weighted fitting feels like it could potentially live in the parent class, since weights are not specific to the Adaptive Lasso but occur in other variants of the Lasso; what makes the Adaptive Lasso unique is how and when it chooses its weights, not that it has weights.

@howthebodyworks

@henridwyer sorry, that may not have been clear. For AdaptiveLasso I think the proposed API is great; my concern is rather with the implementation, specifically the relationship between it and Lasso. If I had my brain engaged I would have been clearer. Trying again:

I believe weighted coefficient penalties are useful for many Lasso variants that could subclass Lasso in addition to AdaptiveLasso, e.g. custom robust fitting, or a-priori-given weights because I don't want to penalise some coefficients, etc.
So my suggestion is that the weight calculation naturally belongs in AdaptiveLasso, but the weights should be optionally selectable in the parent Lasso class (as with e.g. the non-adaptive penalty.factor in glmnet), and AdaptiveLasso could then pass those weights to Lasso; a rough sketch of that usage follows below.

(I also think we should allow observation weights, like glmnet, FWIW, but that's a separate question.)
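In the meantime, fixed a-priori penalty weights (the analogue of glmnet's penalty.factor) can already be emulated with the plain Lasso via the same column-rescaling trick as in the sketch further up; the data and weights below are made up purely for illustration, and the weights must be strictly positive for the rescaling to work:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(50, 4)
y = X @ np.array([3.0, 0.0, -2.0, 0.5]) + 0.1 * rng.randn(50)

w = np.array([0.1, 1.0, 1.0, 5.0])       # penalise feature 0 lightly, feature 3 heavily

lasso = Lasso(alpha=0.1, fit_intercept=False).fit(X / w, y)
coef = lasso.coef_ / w                    # coefficients on the original feature scale
print(coef)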

equivalent to a Lasso, solved by the :class:`Lasso`, and
max_lasso_iterations = 2 is equivalent to an adaptive Lasso.

penalty : 'scad' | 'log' | 'lq' (default='lq')

Review comment:

Inconsistency: in the actual constructor the default is 'log'

The Adaptive Lasso and Its Oracle Properties
Journal of the American Statistical Association, 2006.
"""
def __init__(self, max_lasso_iterations=30, penalty='log', q=None,

Review comment:

Given that this is called adaptive Lasso, the log penalty as the default might be surprising; it should probably default to parameters matching the original paper, e.g. penalty='lq', q=1.

@agramfort (Member)

I don't think it deserves to be a core estimator. It's simple to implement and could eventually be a simple example.

@agramfort closed this Jul 16, 2018
----------
coef_ : array, shape (n_features,) | (n_targets, n_features)
parameter vector (w in the cost function formula)

Review comment:

I think this should be (Beta in the cost function formula)

or self.max_lasso_iterations <= 0:
    raise ValueError("Maximum number of Lasso iterations must be"
                     " positive; got (max_iter=%r)" % self.max_iter)

Review comment:

The ValueError message reports the max_iter parameter, but the condition is on max_lasso_iterations.
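A fix would presumably reference the parameter actually being validated, along these lines (a sketch; the elided first half of the original condition is left out here):

if self.max_lasso_iterations <= 0:
    raise ValueError("Maximum number of Lasso iterations must be"
                     " positive; got (max_lasso_iterations=%r)"
                     % self.max_lasso_iterations)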
