Algorithm description for RandomizedLogisticRegression and RandomizedLasso is inaccurate · Issue #6493 · scikit-learn/scikit-learn · GitHub

Algorithm description for RandomizedLogisticRegression and RandomizedLasso is inaccurate #6493
Closed
@hlin117

Description


The algorithm descriptions for RandomizedLogisticRegression and RandomizedLasso are as follows:

Randomized Logistic Regression
Randomized Regression works by resampling the train data and computing a LogisticRegression on each resampling. In short, the features selected more often are good features. It is also known as stability selection.

Randomized Lasso.
Randomized Lasso works by resampling the train data and computing a Lasso on each resampling. In short, the features selected more often are good features. It is also known as stability selection.
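To make the contrast concrete, here is a minimal sketch of the procedure as the current descriptions read: refit a Lasso on random subsets of the training rows and count how often each feature receives a nonzero coefficient. It uses plain Python with a hand-rolled coordinate-descent Lasso so it has no dependencies. The helper names (`lasso_cd`, `selection_frequency_by_row_resampling`), the half-size subsamples, the toy data and all parameter values are hypothetical choices for illustration, not scikit-learn's API or implementation.

```python
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink value toward zero by threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Hypothetical helper: coordinate-descent Lasso (no intercept) minimizing
    (1 / (2n)) * ||y - X b||^2 + lam * sum_k |b_k|."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # y - X b, with b = 0 at the start
    col_sq = [sum(X[i][k] ** 2 for i in range(n)) / n for k in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for k in range(p):
            if col_sq[k] == 0.0:
                continue
            # Correlation of column k with the residual that excludes feature k.
            rho = sum(X[i][k] * (residual[i] + X[i][k] * b[k]) for i in range(n)) / n
            new_bk = soft_threshold(rho, lam) / col_sq[k]
            delta = new_bk - b[k]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][k] * delta
                b[k] = new_bk
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def selection_frequency_by_row_resampling(X, y, lam, n_resampling=50, fraction=0.5, seed=0):
    """Hypothetical helper: refit on random row subsets, count nonzero coefficients."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(1, int(fraction * n))
    counts = [0] * p
    for _ in range(n_resampling):
        rows = rng.sample(range(n), m)  # subsample rows without replacement
        coef = lasso_cd([X[i] for i in rows], [y[i] for i in rows], lam)
        for k, c in enumerate(coef):
            if c != 0.0:
                counts[k] += 1
    return [c / n_resampling for c in counts]


# Toy data (an assumption): only feature 0 drives y.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(40)]
y = [3.0 * row[0] + 0.1 * data_rng.gauss(0, 1) for row in X]
freq = selection_frequency_by_row_resampling(X, y, lam=0.5)
print(freq)
```

Here every fit sees a different subset of the rows while the penalty on each coefficient is the same for all features.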

I don't think these descriptions are accurate. According to the original paper, the randomized lasso (and, by extension, randomized logistic regression) is defined as follows:

[Screenshot (2016-03-05): the paper's definition of the randomized lasso estimator]

(We would then compute several estimates of beta-hat, each using a randomly drawn W.)
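Since the screenshot does not survive as text, here is the randomized lasso as it is usually written (our reconstruction, not a transcription of the screenshot), with random per-feature weights W_k, followed by the change of variables that turns it into an ordinary Lasso on rescaled columns. X_{·k} denotes the k-th column of X; the notation is an assumption.

```latex
\hat{\beta}^{\lambda, W}
  = \arg\min_{\beta \in \mathbb{R}^p}
    \Big( \lVert Y - X\beta \rVert_2^2
          + \lambda \sum_{k=1}^{p} \frac{\lvert \beta_k \rvert}{W_k} \Big),
  \qquad W_k \in [\alpha, 1], \ \alpha \in (0, 1]

% Substituting \beta_k = W_k \gamma_k:
\hat{\beta}^{\lambda, W}_k = W_k \hat{\gamma}_k,
  \qquad
\hat{\gamma}
  = \arg\min_{\gamma \in \mathbb{R}^p}
    \Big( \Big\lVert Y - \sum_{k=1}^{p} W_k X_{\cdot k} \gamma_k \Big\rVert_2^2
          + \lambda \sum_{k=1}^{p} \lvert \gamma_k \rvert \Big)
```

In this form the randomness sits in the penalty (equivalently, in the column scaling), not in which training rows are used.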

In other words, the algorithm randomly redraws a penalty weight for each feature on every fit; it does not resample the training set and fit to those samples (i.e., it does not bootstrap).
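By contrast, here is a minimal sketch of the reweighting reading: every fit uses all of the training rows, and only the per-feature weights W are redrawn. It relies on the fact that penalizing |b_k| / W_k is the same as fitting an ordinary Lasso on columns rescaled by W_k and mapping back with b_k = W_k * g_k. Drawing each W_k uniformly from [alpha, 1], the helper names (`lasso_cd`, `selection_frequency_by_reweighting`), the toy data and the parameter values are all hypothetical choices for illustration; this is not scikit-learn's code.

```python
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink value toward zero by threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Hypothetical helper: coordinate-descent Lasso (no intercept) minimizing
    (1 / (2n)) * ||y - X b||^2 + lam * sum_k |b_k|."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # y - X b, with b = 0 at the start
    col_sq = [sum(X[i][k] ** 2 for i in range(n)) / n for k in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for k in range(p):
            if col_sq[k] == 0.0:
                continue
            rho = sum(X[i][k] * (residual[i] + X[i][k] * b[k]) for i in range(n)) / n
            new_bk = soft_threshold(rho, lam) / col_sq[k]
            delta = new_bk - b[k]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][k] * delta
                b[k] = new_bk
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def selection_frequency_by_reweighting(X, y, lam, alpha=0.5, n_resampling=50, seed=0):
    """Hypothetical helper: keep all rows, redraw per-feature weights W each fit."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_resampling):
        # Assumption of this sketch: W_k drawn uniformly from [alpha, 1].
        w = [rng.uniform(alpha, 1.0) for _ in range(p)]
        # Penalizing |b_k| / W_k == plain Lasso on columns X_k * W_k, b_k = W_k * g_k.
        Xw = [[row[k] * w[k] for k in range(p)] for row in X]
        g = lasso_cd(Xw, y, lam)
        coef = [w[k] * g[k] for k in range(p)]
        for k, c in enumerate(coef):
            if c != 0.0:
                counts[k] += 1
    return [c / n_resampling for c in counts]


# Toy data (an assumption): only feature 0 drives y.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(40)]
y = [3.0 * row[0] + 0.1 * data_rng.gauss(0, 1) for row in X]
freq = selection_frequency_by_reweighting(X, y, lam=0.5)
print(freq)
```

With `alpha=1.0` every W_k equals 1, so each fit is the same plain Lasso and the frequencies can only be 0 or 1; a smaller `alpha` widens the range of weights and adds more randomization between fits.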

As the documentation is currently written, I think it sounds as though we resample the training set, as in a bootstrap. The documentation should instead make clear that we reweight each feature every time we fit Lasso / LogisticRegression to the data.

Thoughts, @agramfort, @GaelVaroquaux?

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
