WIP: Implementation of Group Lasso model #947
Conversation
Python + Cython code + docstrings
    (1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * Sum(||w_j||_2)

where bwj is the coefficients of w in the j-th group. This model is sometimes ...
bwj?
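For concreteness, here is a minimal NumPy sketch of the objective quoted above (the list-of-index-arrays `groups` layout is illustrative only, not the PR's API):

```python
import numpy as np

def group_lasso_objective(X, y, w, groups, alpha):
    """Objective from the docstring: squared loss plus a sum of per-group l2 norms.

    `groups` is a list of integer index arrays, one per group (assumed layout).
    """
    n_samples = X.shape[0]
    loss = np.sum((y - X @ w) ** 2) / (2 * n_samples)
    penalty = alpha * sum(np.linalg.norm(w[g]) for g in groups)
    return loss + penalty
```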
Interesting work. Do you think this could be reused in the (MiniBatch)DictionaryLearning class to find more structured dictionary elements?
What other cool application could this be used for (maybe to be developed as an example)?
Maybe your implementation could be used to test @agramfort's multi-task lasso implementation: if you flatten the 2d coefficient array ...
You can make an assumption on the number of "topics" (to be used as groups by the sparse encoding step of the dict learning algorithm IIRC) so as to favor components that are enabled together. The global algorithm (minimizing the reconstruction error + the group-structured activation penalty) will come up with complementary topics that are somehow related. I guess that if the groups are further allowed to overlap one could get even more interpretable results. This paper by Jenatton et al. goes even further with a tree-structured sparsity penalty:
In some papers I've seen that the l2 norm of each group is multiplied by the square root of the cardinality of the group, so that the objective function is: loss + alpha * Sum(sqrt(p_j) * ||w_j||_2), with p_j the size of group j. I wonder if I should change it to this formulation.
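In terms of the objective sketch earlier in the thread, that weighting would only change the penalty term; a minimal illustration:

```python
import numpy as np

def weighted_group_penalty(w, groups, alpha):
    # Hypothetical sqrt-cardinality weighting of each group's l2 norm.
    return alpha * sum(np.sqrt(len(g)) * np.linalg.norm(w[g]) for g in groups)
```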
no opinion on this
Running the test suite with this PR breaks in ...
@@ -392,6 +393,97 @@ def __init__(self, alpha=1.0, fit_intercept=True, normalize=False,
                 max_iter=max_iter, tol=tol, warm_start=warm_start,
                 positive=positive)
PEP8 says 2 lines between class definitions.
This PR needs an example. I would suggest generating a non-linear regression problem with a bunch of irrelevant variables and a couple of relevant ones, and using a group lasso on a basis of polynomials to do prediction. Comparison with a lasso without group structure should show better prediction.
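A rough sketch of the suggested setup (illustrative only; the group layout is an assumption, and the `GroupLasso` call in the final comment refers to this PR's proposed estimator, so it is hypothetical):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
n_samples = 200

# Ten input variables: two relevant (one non-linearly), eight irrelevant.
X = rng.randn(n_samples, 10)
y = X[:, 0] ** 2 - 3 * X[:, 1] + 0.1 * rng.randn(n_samples)

# Expand each variable into its own group of polynomial basis features.
bases = [PolynomialFeatures(degree=3, include_bias=False).fit_transform(X[:, [j]])
         for j in range(X.shape[1])]
X_poly = np.hstack(bases)
sizes = [b.shape[1] for b in bases]
starts = np.cumsum([0] + sizes[:-1])
groups = [np.arange(s, s + k) for s, k in zip(starts, sizes)]

# A group lasso over `groups` should zero out whole irrelevant variables,
# while a plain Lasso only zeros individual polynomial terms, e.g.:
# GroupLasso(alpha=0.1, groups=groups).fit(X_poly, y)   # hypothetical API
```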
Why didn't I notice this PR before? Cool work, hope I have time to look at it in detail.
go go! @fabianp :)
@fabianp about the square root question: I haven't seen that before, though I'm certainly no expert in this.
Hey @amueller, I don't know about the theory behind that, I've just seen both formulations in different papers. I am however very disappointed with this approach, because for groups of large size (say > 10) the Newton step has to be done with high precision or else the method starts diverging. In practice the algorithm becomes very unstable even for groups of medium size (~50 features). This is currently not my priority, but I'd like to experiment with an alternate algorithm [1] before pushing this forward. That algorithm does gradient descent in place of the Newton method inside the coordinate update, and I want to see if that solves my stability problems.

[1] http://web.eecs.umich.edu/~hero/Preprints/manuscript_MSTO.pdf
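For context, the key operation in the gradient-based alternative is group soft-thresholding, the proximal operator of the group penalty; a minimal sketch (not the PR's code):

```python
import numpy as np

def group_soft_threshold(z, threshold):
    """Prox of threshold * ||.||_2: shrink the whole block toward zero,
    zeroing it entirely when its norm falls below the threshold."""
    norm_z = np.linalg.norm(z)
    if norm_z <= threshold:
        return np.zeros_like(z)
    return (1.0 - threshold / norm_z) * z
```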
Hey @fabianp, I was curious if this WIP was being continued somewhere? I'm looking for an implementation of group lasso mainly to compare against something I'm working on. |
@briancheung I did an implementation of the sparse group lasso from the paper "A Sparse-Group Lasso" by Noah Simon et al.:
Haven't used it much, though.
@fabianp Thanks, I'll take a look. |
Has Fabian or anyone else furthered this helpful work? Would love to see group lasso in sklearn. |
Even though I have personally been a heavy user of group lasso, it seems ...
lightning has some ...
@GaelVaroquaux Thanks for the info! Is it that rare that people use factor columns with the l1 norm? Figured this would come up fairly frequently if one wants built-in feature selection. @fabianp Thanks! Appreciate both of your work on these fine packages!
Hi all,
this is my implementation of the Group Lasso using block coordinate descent. Unlike the Lasso implementation, the Cython code is kept to a minimum and only the innermost coordinate descent loop is compiled code. The duality gap computation is kept outside the loop.
It should be reasonably fast for small group sizes. For large groups some time could be saved by translating the innermost loop of the Cython code into BLAS calls. However, recent developments [1] suggest that this innermost loop could be substituted with more efficient alternatives, something I am currently looking into.
I think the code as it is here might already be useful for others.
Feedback is welcome.
[1] http://www-stat.stanford.edu/~nsimon/SGLpaper.pdf
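For readers looking for the overall shape of the algorithm, here is a compact pure-NumPy sketch of block coordinate descent for this objective. It is illustrative only: it takes one proximal gradient step per group per sweep instead of the PR's Newton-based block solve, and uses a simple coefficient-change stopping rule in place of the duality gap.

```python
import numpy as np

def group_lasso_bcd(X, y, groups, alpha, n_sweeps=200, tol=1e-6):
    """Illustrative block coordinate descent for the group lasso objective.

    `groups` is a list of integer index arrays (assumed non-degenerate).
    Each sweep takes one proximal gradient step per group, i.e. a gradient
    step on the smooth loss followed by group soft-thresholding.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    residual = y.astype(float).copy()            # equals y - X @ w for w = 0
    # Per-group Lipschitz constants of the smooth part of the objective.
    lipschitz = [np.linalg.norm(X[:, g], 2) ** 2 / n_samples for g in groups]
    for _ in range(n_sweeps):
        max_delta = 0.0
        for g, L in zip(groups, lipschitz):
            X_g = X[:, g]
            residual += X_g @ w[g]               # partial residual without group g
            grad = -X_g.T @ (residual - X_g @ w[g]) / n_samples
            z = w[g] - grad / L                  # gradient step on this block
            norm_z = np.linalg.norm(z)
            thresh = alpha / L
            # Group soft-thresholding: zero the block or shrink it as a whole.
            w_new = np.zeros_like(z) if norm_z <= thresh else (1 - thresh / norm_z) * z
            max_delta = max(max_delta, np.max(np.abs(w_new - w[g])))
            w[g] = w_new
            residual -= X_g @ w[g]               # restore the full residual
        if max_delta < tol:
            break
    return w
```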
TODO: