[MRG+1] Fix to documentation and docstring of randomized lasso and randomized logistic regression #6498
Conversation
.. math:: \hat{w_I} = \mathrm{arg}\min_{w} \frac{1}{2n_I} \sum_{i \in I} (y_i - x_i^T w)^2 + \alpha \sum_{j=1}^p \frac{|w_j|}{s_j},

where :math:`s_j \in \{s, 1\}` are independent trials of a fair Bernoulli random variable.
Let's be consistent with the docstrings of the functions. `s` is described as `alpha` in the docstrings:

    scaling : float, optional, default=0.5
        The alpha parameter in the stability selection article used to
        randomly scale the features. Should be between 0 and 1.
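(For context, this is roughly how the parameter was exposed at the time. The snippet below is illustrative only; `RandomizedLasso` has since been deprecated and removed from scikit-learn, so treat the exact signature as an assumption.)

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RandomizedLasso  # existed around scikit-learn 0.17

# Toy regression data just to show where `scaling` plugs in.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5, random_state=0)

rlasso = RandomizedLasso(scaling=0.5, n_resampling=200, random_state=0).fit(X, y)
print(rlasso.scores_)  # per-feature selection frequencies (the stability scores)
```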
I thought about this but I am not sure. `alpha` is used throughout sklearn linear models as the penalty parameter. The unfortunate thing is that `alpha` is the scaling factor in the paper. Probably using `alpha` in the equations will be more confusing. Maybe the solution is to change the docstring for `scaling` and reference that this is `s` in the equation. What do you think?
Very good observation. Though, to stay true to the source code, I would say it's better to replace `alpha` with `C` and `s` with `alpha`. We would ideally want someone to look at the model's description and understand the model by looking at its objective function.
But you do make a good point about trying to stay true to the paper; I would put in parentheses what the variables `C` and `alpha` represent in the original paper.
(Making the point that `w` here is `beta` in the paper, and `s_j` here is `w_k` in the paper, seems like overkill.)
Yup, I agree with you. Staying internally consistent is better than matching the paper's notation. Though, there is an additional "issue": `alpha` and `C` in the code refer to the penalization in the Lasso and logistic regression models, respectively, and in the code the scaler is not `alpha` but a variable called `weights`. I just submitted a change that does not call the scaling parameter `alpha`, but just `s`. This might be better. What do you think?
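As an aside, the reason the docstring talks about "scaling the features" is that penalizing `|w_j| / s_j` is equivalent to rescaling column `j` by `s_j` before a plain Lasso fit and then mapping the coefficients back. Here is a minimal sketch of that reformulation (illustrative only, not the actual scikit-learn implementation; the helper name is made up):

```python
import numpy as np
from sklearn.linear_model import Lasso

def randomized_lasso_fit(X, y, alpha=1.0, scaling=0.5, random_state=None):
    """One randomized-penalty Lasso fit (sketch only)."""
    rng = np.random.RandomState(random_state)
    n_features = X.shape[1]
    # s_j in {scaling, 1}, independent fair Bernoulli trials (the `s` in the equation)
    s = np.where(rng.randint(2, size=n_features) == 1, scaling, 1.0)
    # Penalizing |w_j| / s_j is the same as a plain Lasso on columns rescaled by s_j ...
    lasso = Lasso(alpha=alpha).fit(X * s, y)
    # ... followed by mapping the coefficients back to the original scale.
    return s * lasso.coef_
```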
These are very good edits. Thanks for the clarification in the documentation, @clamus.
Thanks so much for the comments @hlin117! I fixed most of them, except the two things I mention now in the commit message.
@@ -267,9 +283,6 @@ of features non zero.
   Journal of the Royal Statistical Society, 72 (2010)
   http://arxiv.org/pdf/0809.2932

-* F. Bach, "Model-Consistent Sparse Estimation through the Bootstrap"
-  http://hal.inria.fr/hal-00354771/
why?
I think that Bach's procedure (the Bolasso) does things somewhat differently from what is implemented in sklearn. Also, what is implemented does match Meinshausen & Buhlmann's stability selection procedure.
Specifically, Bach's Bolasso gets bootstrap replicates of the data, by sampling data pairs with replacement or by resampling residuals of a lasso fit, then fits a regular lasso to each bootstrap replicate, and at last intersects the active sets across replicates. It also gives theoretical guidance on how to choose the regularization parameter for the lasso in both the high- and low-dimensional settings.
On the other hand, Meinshausen & Buhlmann's stability procedure gets several subsamples of the data pairs (each a sample of smaller size than the original, drawn without replacement), then fits the modified lasso where the penalty of a random subset of features is scaled, and then counts the number of times each feature is selected across subsamples for each value of the regularization parameter. This gives a score for each feature and regularization parameter, which is then thresholded. The paper then tells you to pick, for each feature, the highest score across the thresholded regularization parameters.
So, I think they are different. What do you think? I might have missed something.
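To make the contrast concrete, here is a rough sketch of the two schemes as described above (illustrative only, not the actual Bolasso or scikit-learn implementations; the function names and parameters are made up for this example):

```python
import numpy as np
from sklearn.linear_model import Lasso

def bolasso_support(X, y, alpha=1.0, n_bootstrap=32, random_state=None):
    """Bach's Bolasso (sketch): plain Lasso on bootstrap replicates,
    then intersect the active sets across replicates."""
    rng = np.random.RandomState(random_state)
    n_samples, n_features = X.shape
    support = np.ones(n_features, dtype=bool)
    for _ in range(n_bootstrap):
        idx = rng.randint(n_samples, size=n_samples)      # resample pairs with replacement
        coef = Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_
        support &= coef != 0                               # keep only always-selected features
    return support

def stability_scores(X, y, alpha=1.0, scaling=0.5, sample_fraction=0.75,
                     n_resampling=200, random_state=None):
    """Meinshausen & Buhlmann stability selection (sketch): subsample without
    replacement, randomly scale the penalty of a subset of features, and count
    how often each feature is selected."""
    rng = np.random.RandomState(random_state)
    n_samples, n_features = X.shape
    counts = np.zeros(n_features)
    for _ in range(n_resampling):
        idx = rng.permutation(n_samples)[:int(sample_fraction * n_samples)]
        s = np.where(rng.randint(2, size=n_features) == 1, scaling, 1.0)
        coef = Lasso(alpha=alpha).fit(X[idx] * s, y[idx]).coef_
        counts += coef != 0
    return counts / n_resampling   # selection frequencies, to be thresholded
```

The paper runs this over a grid of regularization values and thresholds the scores; the sketch uses a single alpha just to keep it short.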
That's a good point; now I think credit to Bach's Bolasso needs to be included. What about keeping the reference and adding a little note clarifying that what is implemented is the stability selection procedure, and that a similar randomization approach can be obtained via the Bolasso?
@agramfort, I added back the reference to Bach's Bolasso and explained in the documentation that the implementation follows the stability selection method.
CircleCI fails as http://apt.typesafe.com is not reachable. Uhm?
LGTM
@agramfort, do you have any opinions of what the scaling factor … We were also wondering whether …
In the stability selection method, a subsample of the data is fit to a
L1-penalized model where the penalty of a random subset of coefficients has
been scaled. Specifically, given a subsample of the data
:math:`(y_i, x_i), i \in I`, where :math:`I \subset \{1, 2, \ldots, n\}` is a
I think it's more consistent in scikit-learn to use `(x_i, y_i)` instead of `(y_i, x_i)`. (Citation / reference needed.)
Makes sense, will fix this.
Also, what citation is missing? I didn't see it.
When I said "Citation / reference needed", I meant that I needed to find a reference in the scikit-learn documentation that uses `(x_i, y_i)` instead of `(y_i, x_i)`.
You are right about the order. Here is an example: http://scikit-learn.org/stable/modules/sgd.html#mathematical-formulation
I think the current variable names for scaling, etc., are the most consistent with
@hlin117 and @agramfort, thanks for reviewing this!
@clamus: My pleasure. Thanks for working on this. Hope you contribute more to scikit-learn!
@agramfort, should I rebase to include the fixes in upstream/master that fixed the issue with Travis?
Yes, if you can, please.
Force-pushed from ae68dc6 to 3a83071.
Thanks for your contribution! Merging.
You are welcome!
Fixes #6493
@hlin117 and @agramfort, this is ready.
Todo:
`scaling` in relation to the equations.