WIP: Implementation of Group Lasso model #947
Conversation
Python + Cython code + docstrings
    (1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * Sum(||w_j||_2)

where bwj is the coefficients of w in the j-th group. This model is sometimes ...
bwj?
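For concreteness, here is a minimal NumPy sketch of the objective quoted above (the list-of-index-arrays `groups` layout is illustrative only, not the PR's API):

```python
import numpy as np

def group_lasso_objective(X, y, w, groups, alpha):
    """Objective from the docstring: squared loss plus a sum of per-group l2 norms.

    `groups` is a list of integer index arrays, one per group (assumed layout).
    """
    n_samples = X.shape[0]
    loss = np.sum((y - X @ w) ** 2) / (2 * n_samples)
    penalty = alpha * sum(np.linalg.norm(w[g]) for g in groups)
    return loss + penalty
```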
Interesting work. Do you think this could be reused in the (MiniBatch)DictionaryLearning class to find more structured dictionary elements?
What other cool application could this be used for (maybe to be developed as an example)?
Maybe your implementation could be used to test @agramfort's multi-task lasso implementation: if you flatten the 2d coefficient array ...
You can make an assumption on the number of "topics" (to be used as groups by the sparse encoding step of the dict learning algorithm IIRC) so as to favor components that are enabled together. The global algorithm (minimizing the reconstruction error + the group-structured activation penalty) will come up with complementary topics that are somehow related. I guess that if the groups are further allowed to overlap one could get even more interpretable results. This paper by Jenatton et al. goes even further with a tree-structured sparsity penalty:
In some papers I've seen that the l2 norm of each group is multiplied by the square root of the cardinality of the group, so that the objective function is: loss + alpha * Sum(sqrt(p_j) * ||w_j||_2), with p_j the size of group j. I wonder if I should change it to this formulation.
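In terms of the objective sketch earlier in the thread, that weighting would only change the penalty term; a minimal illustration:

```python
import numpy as np

def weighted_group_penalty(w, groups, alpha):
    # Hypothetical sqrt-cardinality weighting of each group's l2 norm.
    return alpha * sum(np.sqrt(len(g)) * np.linalg.norm(w[g]) for g in groups)
```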
no opinion on this
Running the test suite with this PR breaks in ...
@@ -392,6 +393,97 @@ def __init__(self, alpha=1.0, fit_intercept=True, normalize=False,
                 max_iter=max_iter, tol=tol, warm_start=warm_start,
                 positive=positive)
PEP8 says 2 lines between class definitions.
This PR needs an example. I would suggest generating a non-linear regression problem with a bunch of irrelevant variables and a couple of relevant ones, and using a group lasso on a basis of polynomials to do prediction. Comparison with a lasso without group structure should show better prediction.
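A rough sketch of the suggested setup (illustrative only; the group layout is an assumption, and the `GroupLasso` call in the final comment refers to this PR's proposed estimator, so it is hypothetical):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
n_samples = 200

# Ten input variables: two relevant (one non-linearly), eight irrelevant.
X = rng.randn(n_samples, 10)
y = X[:, 0] ** 2 - 3 * X[:, 1] + 0.1 * rng.randn(n_samples)

# Expand each variable into its own group of polynomial basis features.
bases = [PolynomialFeatures(degree=3, include_bias=False).fit_transform(X[:, [j]])
         for j in range(X.shape[1])]
X_poly = np.hstack(bases)
sizes = [b.shape[1] for b in bases]
starts = np.cumsum([0] + sizes[:-1])
groups = [np.arange(s, s + k) for s, k in zip(starts, sizes)]

# A group lasso over `groups` should zero out whole irrelevant variables,
# while a plain Lasso only zeros individual polynomial terms, e.g.:
# GroupLasso(alpha=0.1, groups=groups).fit(X_poly, y)   # hypothetical API
```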
Why didn't I notice this PR before? Cool work, hope I have time to look at it in detail.
go go! @fabianp :)
@fabianp about the square root question: I haven't seen that before, though I'm certainly no expert in this.
Hey @amueller, I don't know about the theory behind that, I've just seen both formulations in different papers. I am however very disappointed with this approach, because for groups of large size (say > 10) the Newton step has to be done with high precision or else the method starts diverging. In practice the algorithm becomes very unstable even for groups of medium size (~50 features). This is currently not my priority, but I'd like to experiment with an alternate algorithm [1] before pushing this forward. That algorithm does gradient descent in place of the Newton method inside the coordinate update, and I want to see if that solves my stability problems.

[1] http://web.eecs.umich.edu/~hero/Preprints/manuscript_MSTO.pdf
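For context, the key operation in the gradient-based alternative is group soft-thresholding, the proximal operator of the group penalty; a minimal sketch (not the PR's code):

```python
import numpy as np

def group_soft_threshold(z, threshold):
    """Prox of threshold * ||.||_2: shrink the whole block toward zero,
    zeroing it entirely when its norm falls below the threshold."""
    norm_z = np.linalg.norm(z)
    if norm_z <= threshold:
        return np.zeros_like(z)
    return (1.0 - threshold / norm_z) * z
```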
Hey @fabianp, I was curious if this WIP was being continued somewhere? I'm looking for an implementation of group lasso mainly to compare against something I'm working on. |
@briancheung I did an implementation of the sparse group lasso from the paper "A Sparse-Group Lasso" by Noah Simon et al.:
Haven't used it much, though.
@fabianp Thanks, I'll take a look. |
Has Fabian or anyone else furthered this helpful work? Would love to see group lasso in sklearn. |
Even though I have personally been a heavy user of group lasso, it seems ...
lightning has some ...
@GaelVaroquaux Thanks for the info! Is it that rare that people use factor columns with the l1 norm? Figured this would come up fairly frequently if one wants built-in feature selection. @fabianp Thanks! Appreciate both of your work on these fine packages!
Hi all,
this is my implementation of the Group Lasso using block coordinate descent. Unlike the Lasso implementation, the Cython code is kept to a minimum and only the innermost coordinate descent loop is compiled code. The duality gap computation is kept outside the loop.
It should be reasonably fast for small group sizes. For large groups some time could be saved by translating the innermost loop of the Cython code into BLAS calls. However, recent developments [1] suggest that this innermost loop could be substituted with more efficient alternatives, something I am currently looking into.
I think the code as it is here might already be useful for others.
Feedback is welcome.
[1] http://www-stat.stanford.edu/~nsimon/SGLpaper.pdf
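For readers looking for the overall shape of the algorithm, here is a compact pure-NumPy sketch of block coordinate descent for this objective. It is illustrative only: it takes one proximal gradient step per group per sweep instead of the PR's Newton-based block solve, and uses a simple coefficient-change stopping rule in place of the duality gap.

```python
import numpy as np

def group_lasso_bcd(X, y, groups, alpha, n_sweeps=200, tol=1e-6):
    """Illustrative block coordinate descent for the group lasso objective.

    `groups` is a list of integer index arrays (assumed non-degenerate).
    Each sweep takes one proximal gradient step per group, i.e. a gradient
    step on the smooth loss followed by group soft-thresholding.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    residual = y.astype(float).copy()            # equals y - X @ w for w = 0
    # Per-group Lipschitz constants of the smooth part of the objective.
    lipschitz = [np.linalg.norm(X[:, g], 2) ** 2 / n_samples for g in groups]
    for _ in range(n_sweeps):
        max_delta = 0.0
        for g, L in zip(groups, lipschitz):
            X_g = X[:, g]
            residual += X_g @ w[g]               # partial residual without group g
            grad = -X_g.T @ (residual - X_g @ w[g]) / n_samples
            z = w[g] - grad / L                  # gradient step on this block
            norm_z = np.linalg.norm(z)
            thresh = alpha / L
            # Group soft-thresholding: zero the block or shrink it as a whole.
            w_new = np.zeros_like(z) if norm_z <= thresh else (1 - thresh / norm_z) * z
            max_delta = max(max_delta, np.max(np.abs(w_new - w[g])))
            w[g] = w_new
            residual -= X_g @ w[g]               # restore the full residual
        if max_delta < tol:
            break
    return w
```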
TODO: