Adaptive lasso #4912
Conversation
avoid mutating eps; change array division to use np.divide
Cool. Can you maybe post the plot from the example? Have you done any real-world comparison / benchmarks?
hm, your references are much older than the NIPS reference. What is the difference in the NIPS paper?
I couldn't find the NIPS paper, so I am not sure. Do you have a link to it?
    1/n * ||y - X Beta||^2_2 + alpha * w |Beta|_1

Where w is a weight vector calculated in the previous stage by::

    w_j = alpha / (|Beta_j|^gamma + eps)
you should clarify what cost function you're actually minimizing with this reweighting scheme.
see this old snippet of mine:
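(The snippet itself is not reproduced in this thread. As a rough stand-in, here is a minimal sketch of the two-stage scheme the docstring above describes, built on scikit-learn's `Lasso`; the function name, the `fit_intercept=False` choice, and the exact weight formula are illustrative assumptions, not the original snippet or the PR's code.)

```python
import numpy as np
from sklearn.linear_model import Lasso

def two_stage_adaptive_lasso(X, y, alpha=1.0, gamma=1.0, eps=1e-3):
    """Two-stage adaptive lasso sketch.

    Stage 1 fits an ordinary Lasso; its coefficients define per-feature
    weights.  Stage 2 then minimizes (up to solver tolerance)

        (1 / (2 * n_samples)) * ||y - X Beta||^2_2 + alpha * sum_j w_j |Beta_j|

    by folding the weights into the design matrix and reusing the plain
    Lasso solver.
    """
    first_stage = Lasso(alpha=alpha, fit_intercept=False).fit(X, y)
    weights = 1.0 / (np.abs(first_stage.coef_) ** gamma + eps)

    # Column rescaling trick: with X_w[:, j] = X[:, j] / w_j, an ordinary
    # L1 penalty on the rescaled coefficients equals the weighted L1
    # penalty on Beta.
    X_w = X / weights[np.newaxis, :]
    second_stage = Lasso(alpha=alpha, fit_intercept=False).fit(X_w, y)
    return second_stage.coef_ / weights
```

With more than two stages, the same rescale-and-refit step is simply repeated with weights recomputed from the latest coefficients.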
can you please share a screenshot of the output of the example?
force-pushed from f6f2338 to 14426ce (Compare)
The Adaptive Lasso and Its Oracle Properties
Journal of the American Statistical Association
"""
def __init__(self, n_lasso_iterations=2, gamma=1, alpha=1.0,
n_lasso_iterations=2 seems small. Also is eps=1e-3 a good default? I personally don't use eps (set it to zero) and I discard features once they have been zeroed.
Well I was thinking that the adaptive lasso corresponds to 2 steps, but this could be increased to more (5?).
About eps: 0.001 was the value they used in another article (added ref), that's why I figured it was a sensible default.
> Well I was thinking that the adaptive lasso corresponds to 2 steps, but this could be increased to more (5?).

you should iterate until the cost function stops decreasing up to a given tolerance. My experience with neuroscience data is that 5 to 10 iterations is enough.

> About eps: 0.001 was the value they used in another article (added ref), that's why I figured it was a sensible default.

hum ok...
I added calculation of the objective function, and iteration until the progress of the objective is below ada_tol.
I changed n_lasso_iterations to max_lasso_iterations and set the default value to 20.
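For illustration only, a sketch of how such a stopping rule might look; `ada_tol`, the surrogate objective, and the defaults below are guessed from the comments above, not copied from the PR:

```python
import numpy as np
from sklearn.linear_model import Lasso

def iterative_adaptive_lasso(X, y, alpha=1.0, gamma=1.0, eps=1e-3,
                             max_lasso_iterations=20, ada_tol=1e-4):
    """Reweighted Lasso that stops once the objective improves by < ada_tol."""
    n_samples, n_features = X.shape
    weights = np.ones(n_features)
    lasso = Lasso(alpha=alpha, fit_intercept=False)
    previous_obj = np.inf
    for n_iter_ in range(1, max_lasso_iterations + 1):
        lasso.fit(X / weights, y)                       # weighted fit via rescaling
        coef = lasso.coef_ / weights                    # back to the original scale
        # Objective of the weighted problem solved at this iteration.
        obj = (np.sum((y - X @ coef) ** 2) / (2 * n_samples)
               + alpha * np.sum(weights * np.abs(coef)))
        if previous_obj - obj < ada_tol:                # progress below tolerance
            break
        previous_obj = obj
        weights = 1.0 / (np.abs(coef) ** gamma + eps)   # reweight for next pass
    return coef, n_iter_
```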
Gasso, G., Rakotomamonjy, A., & Canu, S.
Recovering Sparse Signals With a Certain Family of Nonconvex
Penalties and DC Programming
IEEE Trans. Signal Process., 4686-4698.
what is the year?
Ok sounds good
I rewrote the class with a few major changes:
I had a bit of trouble with getting _p and _p_prime to work, since the pickling test would fail (originally I had a factory method). Is this the right way to implement those?
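For what it's worth, the usual reason a factory-built `_p` / `_p_prime` breaks pickling is that closures and dynamically created functions stored on the instance cannot be pickled; dispatching on the penalty name inside ordinary methods avoids that. A minimal sketch of that pattern (the class and formulas below are illustrative, not the PR's code):

```python
import pickle
import numpy as np

class PenaltyMixin:
    """Picklable penalty helpers: dispatch on the string stored in
    self.penalty instead of keeping a dynamically built function around."""

    def _p_prime(self, t):
        # Derivative of the penalty, used as the reweighting function.
        if self.penalty == 'log':
            return 1.0 / (t + self.eps)
        if self.penalty == 'lq':
            return self.q / (t + self.eps) ** (1.0 - self.q)
        raise ValueError("unknown penalty %r" % self.penalty)

class TinyAdaptive(PenaltyMixin):
    def __init__(self, penalty='log', q=0.5, eps=1e-3):
        self.penalty = penalty
        self.q = q
        self.eps = eps

est = pickle.loads(pickle.dumps(TinyAdaptive(penalty='lq')))  # round-trips fine
print(est._p_prime(np.array([0.0, 1.0, 2.0])))
```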
sleeping over it I feel this should be a metaestimator that allows you to reweight all linear models that expose a .coef_ parameter. It would then work for MultiTaskLasso, sparse logistic regression, Elastic-Net etc.
so something like RANSACRegressor ? I can change it a bit to take an estimator as a parameter. Should it stay in the same file?
yes but let's first maybe see what other core devs think?
we're quite slow adding new code in sklearn these days
Thinking about it a bit more, it makes sense to be able to use a multitask lasso and logistic regression - but I'm not sure about other estimators (I've never seen reweighted elasticnet).
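To make the metaestimator idea concrete, a rough sketch of what such a wrapper could look like (the class name `ReweightedL1`, its parameters, and the rescaling approach are assumptions for illustration, not code from the PR):

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.linear_model import Lasso

class ReweightedL1(BaseEstimator, RegressorMixin):
    """Refit any linear estimator exposing coef_ on a column-rescaled design
    matrix, in the spirit of how RANSACRegressor wraps a base estimator."""

    def __init__(self, base_estimator=None, gamma=1.0, eps=1e-3, n_reweights=2):
        self.base_estimator = base_estimator
        self.gamma = gamma
        self.eps = eps
        self.n_reweights = n_reweights

    def fit(self, X, y):
        est = (clone(self.base_estimator) if self.base_estimator is not None
               else Lasso(fit_intercept=False))
        weights = np.ones(X.shape[1])
        for _ in range(self.n_reweights):
            est.fit(X / weights, y)             # weighted L1 via column rescaling
            coef = est.coef_ / weights          # back to the original scale
            weights = 1.0 / (np.abs(coef) ** self.gamma + self.eps)
        self.coef_ = coef
        self.estimator_ = est
        return self

    def predict(self, X):
        # Intercept handling is omitted to keep the sketch short.
        return X @ self.coef_
```

Usage would then be something like `ReweightedL1(Lasso(alpha=0.1, fit_intercept=False), n_reweights=5).fit(X, y)`; a 2-D coef_ (e.g. MultiTaskLasso) would need an extra reduction step when forming the weights.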
@agramfort any update?
Updated the reference request in the attached issue. FWIW, the key point here for me would be simply allowing a "weight" to each observation so that the user can choose their own reweighting scheme, Adaptive Lasso or one of the many others. This is the approach taken with R's
The optimization objective for the AdaptiveLasso is::

    (1 / (2 * n_samples)) * ||y - X Beta||^2_2
        + alpha * \sum_j p(|Beta|_j)
I think that should be |Beta_j|
not |Beta|_j
@danmackinlay what API are you imagining? You would want more granularity than the penalty and q parameters?
for k in xrange(1, self.max_lasso_iterations):
    self.n_iter_ = k
    weights = self._p_prime(np.abs(self.coef_))
This weighted fitting feels like it could potentially live in the parent class, since weights are not specific to the Adaptive Lasso but occur in other variants of the Lasso; the Adaptive Lasso is unique in how and when it chooses weights, not in that it has weights.
@henridwyer sorry, that may not have been clear; I believe weighted coefficient penalties are useful for many Lasso variants that could subclass (I also think we should allow observation weights, like
equivalent to a Lasso, solved by the :class:`Lasso`, and
max_lasso_iterations = 2 is equivalent to an adaptive Lasso.

penalty : 'scad' | 'log' | 'lq' (default='lq')
Inconsistency: in the actual constructor the default is 'log'
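For context, the reweighting only ever needs the derivative p' of the chosen penalty. A sketch of what the three options might compute, following the usual forms in the literature (Fan & Li's SCAD derivative, a log penalty, and an lq penalty); these formulas are assumptions and may differ in detail from the PR:

```python
import numpy as np

def penalty_derivative(t, penalty='log', q=0.5, eps=1e-3, a=3.7, alpha=1.0):
    """p'(t) for common nonconvex penalties, evaluated at t = |Beta_j| >= 0.

    The returned values are the per-coefficient weights used in the next
    reweighted Lasso step.
    """
    t = np.asarray(t, dtype=float)
    if penalty == 'log':
        # p(t) = log(eps + t)          ->  p'(t) = 1 / (eps + t)
        return 1.0 / (eps + t)
    if penalty == 'lq':
        # p(t) = t^q with 0 < q < 1    ->  p'(t) = q * t^(q - 1), eps-smoothed
        return q / (eps + t) ** (1.0 - q)
    if penalty == 'scad':
        # SCAD derivative (Fan & Li), with threshold alpha and a > 2:
        #   p'(t) = alpha                             if t <= alpha
        #   p'(t) = max(a * alpha - t, 0) / (a - 1)   otherwise
        return np.where(t <= alpha, alpha,
                        np.maximum(a * alpha - t, 0.0) / (a - 1.0))
    raise ValueError("unknown penalty %r" % penalty)
```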
The Adaptive Lasso and Its Oracle Properties
Journal of the American Statistical Association, 2006.
"""
def __init__(self, max_lasso_iterations=30, penalty='log', q=None,
Given that this is called adaptive Lasso, the log penalty as default might be surprising; it should probably default to the parameters from the original lasso paper, e.g. penalty='lq', q=1
I don't think it deserves to be a core estimator. It's simple to implement and can eventually be a simple example.
----------
coef_ : array, shape (n_features,) | (n_targets, n_features)
    parameter vector (w in the cost function formula)
I think this should be (Beta in the cost function formula)
        or self.max_lasso_iterations <= 0:
    raise ValueError("Maximum number of Lasso iterations must be"
                     " positive; got (max_iter=%r)" % self.max_iter)
ValueError raised for the max_iter parameter value, but conditioned on max_lasso_iterations.
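A small sketch of the corrected check; the helper name is invented, the point is only that the error message should report max_lasso_iterations rather than max_iter:

```python
def _check_max_lasso_iterations(max_lasso_iterations):
    """Validate the outer reweighting loop count."""
    if not isinstance(max_lasso_iterations, int) or max_lasso_iterations <= 0:
        raise ValueError("Maximum number of Lasso iterations must be positive;"
                         " got (max_lasso_iterations=%r)" % max_lasso_iterations)
```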
Implements adaptive Lasso and resolves issue #555.