WIP: Implementation of Group Lasso model by fabianp · Pull Request #947 · scikit-learn/scikit-learn · GitHub
WIP: Implementation of Group Lasso model #947


Closed
fabianp wants to merge 1 commit

Conversation

@fabianp (Member) commented Jul 11, 2012

Hi all,

this is my implementation of the Group Lasso using block coordinate descent. Unlike the Lasso implementation, the Cython code is kept to a minimum: only the innermost coordinate descent loop is compiled code. The duality gap computation is kept outside the loop.

It should be reasonably fast for small group sizes. For large groups, some time could be saved by translating the innermost Cython loop into BLAS calls. However, recent developments [1] suggest that this innermost loop could be replaced with more efficient alternatives, something I am currently looking into.

I think the code as it is here might already be useful for others.

Feedback is welcome.

[1] http://www-stat.stanford.edu/~nsimon/SGLpaper.pdf
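As a companion to the description above, here is a minimal NumPy sketch of the group lasso objective being minimized by proximal gradient descent with group soft-thresholding. This is an illustration only, not the PR's Cython block coordinate descent code, and it assumes non-overlapping groups given as index arrays:

```python
import numpy as np

def group_soft_threshold(v, threshold):
    """Group soft-thresholding: shrink the whole block toward zero,
    zeroing it out entirely when its norm falls below the threshold."""
    norm = np.linalg.norm(v)
    if norm <= threshold:
        return np.zeros_like(v)
    return (1 - threshold / norm) * v

def group_lasso_ista(X, y, groups, alpha, n_iter=500):
    """Proximal gradient descent for
    (1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * Sum(||w_j||_2),
    with `groups` a list of non-overlapping index arrays."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    # Lipschitz constant of the gradient of the smooth part.
    L = np.linalg.norm(X, ord=2) ** 2 / n_samples
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n_samples
        z = w - grad / L                       # plain gradient step
        for g in groups:                       # then prox on each group
            z[g] = group_soft_threshold(z[g], alpha / L)
        w = z
    return w
```

Unlike coordinate-wise soft-thresholding in the plain Lasso, the prox here acts on whole blocks, so a group's coefficients enter or leave the model together.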

TODO:

  • example
  • implement without precomputed Gram

Python + Cython code + docstrings

(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * Sum(||w_j||_2)

where bwj is the coefficients of w in the j-th group. This model is sometimes
Review comment (Member):

bwj?

@ogrisel (Member) commented Jul 11, 2012

Interesting work. Do you think this could be reused in the (MiniBatch)DictionaryLearning class to find more structured dictionary elements?

@ogrisel (Member) commented Jul 11, 2012

What other cool application could this be used for (maybe to be developed as an example)?

@mblondel (Member) commented:

Maybe your implementation could be used to test @agramfort's multi-task lasso implementation: if you flatten the 2d coefficient array W and concatenate the features n_tasks times, you should be able to define per-task groups :)
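That construction can be sketched as follows (an illustration with a hypothetical helper name; it only builds the block-diagonal design and the per-feature groups, and assumes a group lasso solver that accepts a list of index arrays):

```python
import numpy as np

def multitask_as_group_lasso(Xs, ys):
    """Recast multi-task lasso data as one group lasso problem: stack
    the per-task designs block-diagonally, concatenate the targets, and
    put each feature's coefficients across all tasks into one group."""
    n_tasks = len(Xs)
    n_features = Xs[0].shape[1]
    n_rows = sum(X.shape[0] for X in Xs)
    X_big = np.zeros((n_rows, n_tasks * n_features))
    row = 0
    for t, X in enumerate(Xs):
        # Each task's samples see only that task's copy of the features.
        X_big[row:row + X.shape[0], t * n_features:(t + 1) * n_features] = X
        row += X.shape[0]
    y_big = np.concatenate(ys)
    # Group j gathers feature j's coefficient in every task, so the
    # group penalty couples the tasks exactly like a multi-task lasso.
    groups = [np.arange(n_tasks) * n_features + j for j in range(n_features)]
    return X_big, y_big, groups
```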

@mblondel (Member) commented:

On Wed, Jul 11, 2012 at 9:40 PM, Olivier Grisel wrote:

> Interesting work. Do you think this could be reused in the (MiniBatch)DictionaryLearning class to find more structured dictionary elements?

The groups must be given by the user, so that seems incompatible with an unsupervised learning setting, doesn't it?

@ogrisel (Member) commented Jul 11, 2012

You can make an assumption on the number of "topics" (to be used as groups by the sparse encoding step of the dict learning algo IIRC) so as to favor components that are enabled together. The global algo (minimizing the reconstruction error + the group structured activation penalty) will come up with complementary topics that are somehow related. I guess that if the groups are further allowed to overlap one could get even more interpretable results.

This paper by Jenatton et al. goes even further with a tree-structured sparsity penalty:

http://www.di.ens.fr/~fbach/icml2010a.pdf

@fabianp (Member, author) commented Jul 11, 2012

In some papers I've seen that the l2 norm of each group is multiplied by the square root of the cardinality of the group, so that the objective function is:

loss + alpha * Sum(sqrt(p_j) * ||w_j||_2)

with p_j the size of group j. I wonder if I should change it to this formulation.
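For concreteness, the two formulations differ only by the per-group weight sqrt(p_j); a quick numeric comparison with made-up coefficient values:

```python
import numpy as np

# Two groups of different sizes (illustrative values only).
w = [np.array([0.5, -0.5]), np.array([1.0, 1.0, 1.0, 1.0])]
alpha = 0.1

# Unweighted penalty: alpha * Sum(||w_j||_2).
plain = alpha * sum(np.linalg.norm(wj) for wj in w)
# Weighted variant: sqrt(p_j) compensates for group size, so larger
# groups pay proportionally more for the same per-coefficient magnitude.
weighted = alpha * sum(np.sqrt(len(wj)) * np.linalg.norm(wj) for wj in w)
```

With the sqrt(p_j) weights, a group cannot look cheap to keep merely because it is small relative to the others.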

@agramfort (Member) commented:

> with p_j the size of group j. I wonder if I should change it to this formulation.

no opinion on this

@GaelVaroquaux (Member) commented:

Running the test suite with this PR breaks in sklearn/tests/test_common.py(165), in test_regressors_train(), when testing the new estimator 'GroupLasso'. I suspect that this is because the groups are not defined. I think the right behavior would be to raise a warning and simply use a normal lasso to solve the problem in such a situation.
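The suggested fallback could look roughly like this (a hypothetical sketch with made-up names, not the PR's code; it assumes a `groups` parameter and reuses scikit-learn's `Lasso`):

```python
import warnings

import numpy as np
from sklearn.linear_model import Lasso


class GroupLassoSketch:
    """Illustrative sketch only: when no groups are given, warn and
    fall back to a plain Lasso, as suggested in the review."""

    def __init__(self, alpha=1.0, groups=None):
        self.alpha = alpha
        self.groups = groups

    def fit(self, X, y):
        if self.groups is None:
            warnings.warn("No groups specified; each feature forms its own "
                          "group, so the problem reduces to a plain Lasso.")
            self.coef_ = Lasso(alpha=self.alpha).fit(X, y).coef_
            return self
        # The real estimator would run block coordinate descent here.
        raise NotImplementedError("group-wise solver not sketched here")
```

This keeps the estimator usable under the common tests, which instantiate it without group information.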

@@ -392,6 +393,97 @@ def __init__(self, alpha=1.0, fit_intercept=True, normalize=False,
max_iter=max_iter, tol=tol, warm_start=warm_start,
positive=positive)

Review comment (Member):

PEP 8 says two blank lines between class definitions.

@GaelVaroquaux (Member) commented:

This PR needs an example. I would suggest generating a non-linear regression problem with a bunch of irrelevant variables and a couple of relevant ones, and using a group lasso on a basis of polynomials to do the prediction. Comparison with a lasso without group structure should show better prediction.
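The setup for such an example could be sketched like this (data generation and grouping only, with illustrative names; the estimator under review is left out):

```python
import numpy as np

rng = np.random.RandomState(42)
n_samples, n_vars, degree = 200, 10, 3

# Ten input variables, only two of which actually drive the target.
Z = rng.randn(n_samples, n_vars)
y = np.sin(Z[:, 0]) + Z[:, 1] ** 2 + 0.1 * rng.randn(n_samples)

# Polynomial basis expansion: columns are [Z, Z**2, Z**3]. One group
# per original variable, so a group lasso can drop an irrelevant
# variable together with all of its polynomial terms.
X = np.hstack([Z ** d for d in range(1, degree + 1)])
groups = [[v + d * n_vars for d in range(degree)] for v in range(n_vars)]
```

A plain Lasso on X would pick individual polynomial terms, while the group lasso keeps or discards a whole variable's basis at once, which is precisely the comparison such an example would plot.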

@amueller (Member) commented:

Why didn't I notice this PR before? Cool work, hope I have time to look at it in detail.
@vene just suggested a great example: use it together with the DictVectorizer! I think this example is in ESL, not sure though.

@agramfort (Member) commented:

go go ! @fabianp :)

@amueller (Member) commented Sep 1, 2012

@fabianp about the square root question: I haven't seen that before, though I'm certainly no expert in this. Is there some theory behind it?

@fabianp (Member, author) commented Sep 1, 2012

hey @amueller. I don't know about the theory behind that; I've just seen both formulations in different papers.

I am however very disappointed with this approach because for groups of large size (say > 10) the Newton step has to be done with high precision or else the method starts diverging. In practice the algorithm becomes very unstable even for groups of medium size (~50 features).

This is currently not my priority, but before pushing this forward I'd like to experiment with an alternate algorithm [1] that does gradient descent in place of the Newton method inside the coordinate update, and see if that solves my stability problems.

[0] http://web.eecs.umich.edu/~hero/Preprints/manuscript_MSTO.pdf
[1] http://www-stat.stanford.edu/~nsimon/SGLpaper.pdf
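The per-block update from [1] that replaces the inner Newton solve amounts to a single gradient step followed by group soft-thresholding. A hedged sketch (hypothetical function names; r is the current residual y - Xw and t a step size):

```python
import numpy as np

def block_update_grad(X_g, r, w_g, alpha, t, n_samples):
    """One proximal-gradient step on group g: a gradient step on the
    smooth squared loss, then group soft-thresholding, instead of an
    inner Newton iteration (in the spirit of Simon et al.)."""
    grad_g = -X_g.T @ r / n_samples      # gradient of the loss w.r.t. w_g
    z = w_g - t * grad_g                 # plain gradient step
    norm = np.linalg.norm(z)
    if norm <= t * alpha:                # whole group shrinks to zero
        return np.zeros_like(w_g)
    return (1 - t * alpha / norm) * z    # group soft-threshold
```

Because each step is a simple shrinkage rather than a high-precision inner solve, this kind of update avoids the divergence that an imprecise Newton step can trigger on large groups.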

@fabianp closed this Sep 13, 2012
@briancheung (Contributor) commented:

Hey @fabianp, I was curious if this WIP was being continued somewhere? I'm looking for an implementation of group lasso mainly to compare against something I'm working on.

@fabianp (Member, author) commented Apr 20, 2013

@briancheung I did an implementation of the sparse group lasso from the paper "A sparse-group lasso, Noah Simon et al.":

https://github.com/fabianp/group_lasso

Haven't used it much though

@briancheung (Contributor) commented:

@fabianp Thanks, I'll take a look.

@levithatcher commented:

Has Fabian or anyone else furthered this helpful work? Would love to see group lasso in sklearn.

https://github.com/fabianp/group_lasso

@GaelVaroquaux (Member) commented:

Even though I have personally been a heavy user of the group lasso, it seems to me that it is a bit of a niche model: it's not useful for many people. I have somewhat come to the conclusion that it couldn't be a priority for scikit-learn.

@fabianp (Member, author) commented:

lightning has some support for group lasso, but some help extending it would be welcome.


@levithatcher commented:

@GaelVaroquaux Thanks for the info! Is it really that rare for people to use factor columns with the l1 norm? I figured this would come up fairly frequently when one wants built-in feature selection.

@fabianp Thanks! Appreciate both of your work on these fine packages!
