Algorithm description for RandomizedLogisticRegression and RandomizedLasso is inaccurate #6493
Comments
I thought this was confusing too. After looking briefly at the code, it seems that the algorithm actually randomizes in two ways: through the weights W_k and by subsampling the data. This "double" randomization is described in Remark 4 of Meinshausen & Bühlmann, and it is what they recommend. Maybe the documentation can be extended to mention the double randomization?
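To make the "double" randomization concrete, here is a minimal sketch of the idea. It is not the scikit-learn implementation. The helper name `randomized_lasso_frequencies`, its parameters, and the uniform distribution for W_k are all assumptions chosen for illustration. It relies on the fact that penalizing |beta_k| / W_k is equivalent to fitting a plain lasso on columns rescaled by W_k.

```python
import numpy as np
from sklearn.linear_model import Lasso


def randomized_lasso_frequencies(X, y, lam=0.1, weakness=0.5,
                                 n_resamples=100, sample_fraction=0.5, seed=0):
    """Hypothetical helper: per-feature selection frequencies under
    double randomization (row subsampling plus random feature weights)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    n_sub = max(2, int(sample_fraction * n_samples))
    counts = np.zeros(n_features)

    for _ in range(n_resamples):
        # Randomization 1: subsample rows without replacement.
        rows = rng.choice(n_samples, size=n_sub, replace=False)
        # Randomization 2: draw one weight W_k in [weakness, 1] per feature.
        # (Uniform sampling is an assumption; other distributions on
        # [weakness, 1] are possible.)
        w = rng.uniform(weakness, 1.0, size=n_features)
        # Substituting gamma_k = beta_k / W_k turns the weighted penalty into
        # a plain lasso on X with column k multiplied by W_k.
        # Note: scikit-learn's `alpha` is scaled differently from the
        # paper's lambda, so `lam` is not directly comparable to it.
        model = Lasso(alpha=lam).fit(X[rows] * w, y[rows])
        counts += (model.coef_ != 0)

    return counts / n_resamples


# Toy data: only the first two of ten features carry signal.
data_rng = np.random.default_rng(0)
X = data_rng.standard_normal((200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * data_rng.standard_normal(200)
freq = randomized_lasso_frequencies(X, y)
```

Features with a selection frequency close to 1 are the stable ones. Setting `sample_fraction=1.0` leaves only the weight randomization, and setting `weakness=1.0` leaves only the subsampling.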
I think that's a great idea, @clamus. Thanks for checking the source code. At the very least though, the documentation should mention the randomization of the weights, because that's what Meinshausen & Bühlmann defined as their "Randomized Lasso / Logistic Regression".
PR welcome
I'll do it if it is ok with @hlin117
@clamus Sure, be my guest. Could you embed the equation in the docs too, similar to what the lasso regression documentation does?
yep sure
The algorithm descriptions for RandomizedLogisticRegression and RandomizedLasso are as follows:
I don't think these descriptions are accurate. According to the original paper here, the description of the randomized lasso (and by association, the randomized logistic regression) is as follows:
(We would then find multiple values of beta-hat using randomly chosen values for W)
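For readers without the paper at hand, the randomized lasso estimator of Meinshausen & Bühlmann is usually written as below. Here lambda is the regularization parameter, p the number of features, and alpha in (0, 1] the weakness parameter that bounds the random weights W_k.

```latex
\hat{\beta}^{\lambda, W}
  = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p}}
    \; \lVert Y - X\beta \rVert_2^2
    + \lambda \sum_{k=1}^{p} \frac{\lvert \beta_k \rvert}{W_k},
\qquad W_k \in [\alpha, 1] \ \text{i.i.d.}
```

With W_k = 1 for every k, this reduces to the ordinary lasso.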
In other words, the algorithm randomly redraws the penalty weights of the features on each fit; it doesn't resample the training set and fit to those samples (i.e., it doesn't bootstrap).
As the documentation is currently written, it reads as if we're resampling the training set, as in a bootstrap approach. The documentation should instead clarify that we're reweighting each feature each time we fit Lasso / LogisticRegression to the data.
Thoughts, @agramfort, @GaelVaroquaux ?