[MRG+1] Fix to documentation and docstring of randomized lasso and randomized logistic regression #6498


Merged: 5 commits from clamus:rand-lasso-fix-6493 into scikit-learn:master on Mar 19, 2016

Conversation

@clamus (Author) commented Mar 7, 2016

Fixes #6493

@hlin117 and @agramfort, this is ready.

Todo:

  • Agree on the notation for scaling in relation to the equations.
  • Agree on whether "See Also" cross-references between the logistic and lasso sides need to be included.

@clamus changed the title from "[MRG] Fix to documentation and doctoring of randomized lasso and randomized logistic regression" to "[MRG] Fix to documentation and docstring of randomized lasso and randomized logistic regression" on Mar 7, 2016

.. math:: \hat{w_I} = \mathrm{arg}\min_{w} \frac{1}{2n_I} \sum_{i \in I} (y_i - x_i^T w)^2 + \alpha \sum_{j=1}^p \frac{|w_j|}{s_j},

where :math:`s_j \in \{s, 1\}` are independent trials of a fair Bernoulli random variable.
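For concreteness, dividing the penalty by :math:`s_j` as in the equation above is equivalent to fitting an ordinary Lasso on feature columns multiplied by :math:`s_j` and mapping the coefficients back. A minimal sketch with toy, illustrative data (not scikit-learn's internal code):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy data (illustrative only): y depends on the first feature.
rng = np.random.RandomState(0)
n, p, s = 100, 10, 0.5
X = rng.randn(n, p)
y = 3 * X[:, 0] + rng.randn(n)

# Draw s_j in {s, 1} with equal probability (the fair Bernoulli trials).
s_j = np.where(rng.rand(p) < 0.5, s, 1.0)

# Penalizing |w_j| / s_j is the same as fitting an ordinary Lasso on
# X with column j multiplied by s_j, then rescaling the coefficients.
lasso = Lasso(alpha=0.1).fit(X * s_j, y)
w = lasso.coef_ * s_j  # coefficients on the original feature scale
```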
@hlin117 (Contributor):

Let's be consistent with the docstrings of the functions. s is described as alpha in the docstrings:

scaling : float, optional, default=0.5
The alpha parameter in the stability selection article used to randomly scale the features. Should be between 0 and 1.

@clamus (Author):

I thought about this but I am not sure. alpha is used throughout sklearn linear models as the penalty parameter. The unfortunate thing is that alpha is the scaling factor in the paper. Probably using alpha in the equations will be more confusing. Maybe the solution is to change the docstring for scaling and reference that this is s in the equation. What do you think?

@hlin117 (Contributor):

Very good observation. Though, to stay true to the source code, I would say it's better to replace alpha with C and s with alpha. We would ideally want someone to look at the model's description and understand the model by looking at its objective function.

But you do make a good point about trying to stay true to the paper; I would put in parentheses what the variables C and alpha represent in the original paper.

(Making the point that w here is beta in the paper, and s_j here is w_k in the paper, seems like overkill.)

@clamus (Author):

Yup, agree with you. Staying internally consistent is better than matching the paper's notation. Though, there is an additional "issue": alpha and C in the code refer to the penalization in the Lasso and Logistic regression models, respectively. And in the code, alpha is not used as the scaler; a variable called weights is. I just submitted a change that does not call the scaling parameter alpha, but just s. This might be better. What do you think?

@hlin117 (Contributor) commented Mar 7, 2016

These are very good edits. Thanks for the clarification in the documentation, @clamus.

@clamus (Author) commented Mar 7, 2016

Thanks so much for the comments @hlin117! I fixed most of them, except the two things I mention in the commit messages.

@@ -267,9 +283,6 @@ of features non zero.
Journal of the Royal Statistical Society, 72 (2010)
http://arxiv.org/pdf/0809.2932

* F. Bach, "Model-Consistent Sparse Estimation through the Bootstrap"
http://hal.inria.fr/hal-00354771/
@agramfort (Member):

why?

@clamus (Author):

I think that Bach's procedure (the Bolasso) does things somewhat differently from what is implemented in sklearn. Also, what is implemented does match Meinshausen & Buhlmann's stability selection procedure.

Specifically, Bach's Bolasso gets bootstrap replicates of the data, by sampling data pairs with replacement (or residuals of a lasso fit), then fits a regular lasso to each bootstrap replicate data set, and at last intersects the active sets across replicates. It also gives theoretical guidance as to how to choose the regularization parameter for the lasso in both the high and low dimensional settings.

On the other hand, Meinshausen & Buhlmann's stability selection procedure gets several subsamples of the data pairs (each a sample of smaller size than the original, drawn without replacement), then fits the modified lasso where the penalty of a random subset of features is scaled, and then counts the number of times each feature is selected across subsamples for each value of the regularization parameter. This gives a score for each feature and regularization parameter, which is then thresholded. The paper then tells you to pick, for each feature, the highest score across the regularization parameters.

So, I think they are different. What do you think? I might have missed something.
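A minimal sketch of the subsample-and-rescale scoring loop just described, with toy, illustrative parameter names; this mirrors the procedure in outline, not scikit-learn's actual implementation:

```python
import numpy as np
from sklearn.linear_model import Lasso

def stability_scores(X, y, alpha=0.1, scaling=0.5,
                     sample_fraction=0.75, n_resampling=200, seed=0):
    """Fraction of subsamples in which each feature is selected."""
    rng = np.random.RandomState(seed)
    n, p = X.shape
    m = int(sample_fraction * n)
    counts = np.zeros(p)
    for _ in range(n_resampling):
        # Subsample without replacement (smaller than the original data).
        idx = rng.choice(n, size=m, replace=False)
        # Randomly scale the penalty of a random subset of features.
        s_j = np.where(rng.rand(p) < 0.5, scaling, 1.0)
        coef = Lasso(alpha=alpha).fit(X[idx] * s_j, y[idx]).coef_
        counts += np.abs(coef) > 1e-10  # was the feature in the active set?
    return counts / n_resampling
```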

@clamus (Author):

That's a good point; now I think credit to Bach's Bolasso needs to be included. What about keeping the reference and making a little note clarifying that what is implemented is the stability selection procedure, and that a similar randomization approach can be obtained via the Bolasso?

@clamus (Author):

@agramfort, I added back the reference to Bach's Bolasso and explained in the documentation that the implementation follows the stability selection method.

@clamus (Author) commented Mar 7, 2016

CircleCI fails as http://apt.typesafe.com is not reachable. Uhm?

@agramfort (Member):

LGTM

@agramfort changed the title from "[MRG] Fix to documentation and docstring of randomized lasso and randomized logistic regression" to "[MRG+1] Fix to documentation and docstring of randomized lasso and randomized logistic regression" on Mar 7, 2016
@hlin117 (Contributor) commented Mar 7, 2016

@agramfort, do you have any opinions on what the scaling factor s should be named in this line?
https://github.com/scikit-learn/scikit-learn/pull/6498/files#diff-07abcf7793e58d8b49fff9d1f8b03901R256
@clamus and I were discussing earlier whether we should rename s to alpha, to remain consistent with scikit-learn's naming conventions.

We were also wondering whether alpha should be renamed to C, because C acts as the penalty parameter in the scikit-learn implementation.

In the stability selection method, an L1-penalized model is fit to a
subsample of the data, where the penalty of a random subset of coefficients has
been scaled. Specifically, given a subsample of the data
:math:`(y_i, x_i), i \in I`, where :math:`I \subset \{1, 2, \ldots, n\}` is a random subset,
@hlin117 (Contributor):

I think it's more consistent in scikit-learn to use (x_i, y_i) instead of (y_i, x_i). (Citation / reference needed.)

@clamus (Author):

Makes sense, will fix this.

@clamus (Author):

Also, what citation is missing? I didn't see it.

@hlin117 (Contributor):

When I said "Citation / reference needed", I meant that I needed to find a reference in the scikit-learn documentation that uses (x_i, y_i) instead of (y_i, x_i).

@clamus (Author):

You are right about the order. Here is an example: http://scikit-learn.org/stable/modules/sgd.html#mathematical-formulation

@clamus (Author) commented Mar 7, 2016

I think the current variable names for scaling, etc., are the most consistent with sklearn. alpha and C throughout are regularization parameters. scaling is the parameter used to scale the penalty in the randomized lasso and logistic models, and calling this s in the equations makes sense.
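For illustration, a minimal usage sketch of how these names line up in the estimator this PR documents (toy data; the alpha and scaling values here are illustrative, not recommendations):

```python
import numpy as np
from sklearn.linear_model import RandomizedLasso

# Toy data (illustrative): only the first feature is informative.
rng = np.random.RandomState(0)
X = rng.randn(100, 10)
y = 3 * X[:, 0] + rng.randn(100)

# `alpha` is the penalty parameter in the equations; `scaling` is s
# (the paper's alpha), used to randomly rescale per-feature penalties.
rlasso = RandomizedLasso(alpha=0.1, scaling=0.5, n_resampling=200,
                         random_state=0).fit(X, y)
print(rlasso.scores_)  # per-feature selection frequencies
```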

@clamus (Author) commented Mar 7, 2016

@hlin117 and @agramfort, thanks for reviewing this!

@hlin117 (Contributor) commented Mar 8, 2016

@clamus: My pleasure. Thanks for working on this. Hope you contribute more to scikit-learn!

@clamus (Author) commented Mar 11, 2016

@agramfort, should I rebase to include the fixes in upstream/master that fixed the issue with travis?

@agramfort (Member) commented Mar 11, 2016 via email

@clamus force-pushed the rand-lasso-fix-6493 branch from ae68dc6 to 3a83071 on March 11, 2016
@glouppe (Contributor) commented Mar 19, 2016

Thanks for your contribution! Merging.

glouppe added a commit that referenced this pull request Mar 19, 2016
[MRG+1] Fix to documentation and docstring of randomized lasso and randomized logistic regression
@glouppe merged commit e2e6bde into scikit-learn:master on Mar 19, 2016
@clamus (Author) commented Mar 20, 2016

You are welcome!
