Commit e2e6bde

Merge pull request #6498 from clamus/rand-lasso-fix-6493

[MRG+1] Fix to documentation and docstring of randomized lasso and randomized logistic regression

2 parents b64e992 + 3a83071
File tree

2 files changed: 60 additions & 31 deletions

doc/modules/feature_selection.rst

Lines changed: 33 additions & 14 deletions
@@ -173,8 +173,8 @@ L1-based feature selection
 sparse solutions: many of their estimated coefficients are zero. When the goal
 is to reduce the dimensionality of the data to use with another classifier,
 they can be used along with :class:`feature_selection.SelectFromModel`
-to select the non-zero coefficients. In particular, sparse estimators useful for
-this purpose are the :class:`linear_model.Lasso` for regression, and
+to select the non-zero coefficients. In particular, sparse estimators useful
+for this purpose are the :class:`linear_model.Lasso` for regression, and
 of :class:`linear_model.LogisticRegression` and :class:`svm.LinearSVC`
 for classification::
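The literal block that follows ``for classification::`` lies outside this hunk's context. A minimal sketch of the pattern it documents, assuming the iris data and an L1-penalized :class:`svm.LinearSVC` purely for illustration::

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectFromModel
    from sklearn.svm import LinearSVC

    # Fit an L1-penalized linear SVM: many coefficients are driven to zero.
    X, y = load_iris(return_X_y=True)
    lsvc = LinearSVC(C=0.01, penalty="l1", dual=False).fit(X, y)

    # Keep only the features whose fitted coefficients are non-zero.
    model = SelectFromModel(lsvc, prefit=True)
    X_new = model.transform(X)
    print(X.shape, "->", X_new.shape)  # the feature count shrinks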
@@ -234,15 +234,34 @@ Randomized sparse models

 .. currentmodule:: sklearn.linear_model

-The limitation of L1-based sparse models is that faced with a group of
-very correlated features, they will select only one. To mitigate this
-problem, it is possible to use randomization techniques, reestimating the
-sparse model many times perturbing the design matrix or sub-sampling data
-and counting how many times a given regressor is selected.
+In terms of feature selection, there are some well-known limitations of
+L1-penalized models for regression and classification. For example, it is
+known that the Lasso will tend to select an individual variable out of a group
+of highly correlated features. Furthermore, even when the correlation between
+features is not too high, the conditions under which L1-penalized methods
+consistently select "good" features can be restrictive in general.
+
+To mitigate this problem, it is possible to use randomization techniques such
+as those presented in [B2009]_ and [M2010]_. The latter technique, known as
+stability selection, is implemented in the module :mod:`sklearn.linear_model`.
+In the stability selection method, a subsample of the data is fit to an
+L1-penalized model where the penalty of a random subset of coefficients has
+been scaled. Specifically, given a subsample of the data
+:math:`(x_i, y_i), i \in I`, where :math:`I \subset \{1, 2, \ldots, n\}` is a
+random subset of the data of size :math:`n_I`, the following modified Lasso
+fit is obtained:
+
+.. math:: \hat{w}_I = \mathrm{arg}\min_{w} \frac{1}{2n_I} \sum_{i \in I} (y_i - x_i^T w)^2 + \alpha \sum_{j=1}^p \frac{\vert w_j \vert}{s_j},
+
+where :math:`s_j \in \{s, 1\}` are independent trials of a fair Bernoulli
+random variable, and :math:`0 < s < 1` is the scaling factor. By repeating this
+procedure across different random subsamples and Bernoulli trials, one can
+count the fraction of times the randomized procedure selected each feature,
+and use these fractions as scores for feature selection.

 :class:`RandomizedLasso` implements this strategy for regression
 settings, using the Lasso, while :class:`RandomizedLogisticRegression` uses the
-logistic regression and is suitable for classification tasks.  To get a full
+logistic regression and is suitable for classification tasks. To get a full
 path of stability scores you can use :func:`lasso_stability_path`.

 .. figure:: ../auto_examples/linear_model/images/plot_sparse_recovery_003.png
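As a rough illustration of the procedure the new paragraphs describe (a sketch on synthetic data, not the library's internal code), note that multiplying column j of the design matrix by :math:`s_j` is equivalent to dividing the penalty on :math:`w_j` by :math:`s_j`, so the modified fit reduces to an ordinary Lasso on rescaled features::

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.RandomState(0)
    n, p = 200, 10
    X = rng.randn(n, p)
    y = X[:, :3].sum(axis=1) + 0.1 * rng.randn(n)  # 3 informative features

    s, alpha, n_resampling = 0.5, 0.05, 200
    scores = np.zeros(p)
    for _ in range(n_resampling):
        I = rng.choice(n, size=int(0.75 * n), replace=False)  # random subsample
        s_j = np.where(rng.rand(p) < 0.5, s, 1.0)  # fair Bernoulli scaling trials
        # Rescaling the columns by s_j implements the modified penalty above.
        coef = Lasso(alpha=alpha).fit(X[I] * s_j, y[I]).coef_
        scores += coef != 0
    scores /= n_resampling  # fraction of runs in which each feature was selected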
@@ -263,12 +282,12 @@ of features non zero.

 .. topic:: References:

-   * N. Meinshausen, P. Buhlmann, "Stability selection",
-     Journal of the Royal Statistical Society, 72 (2010)
-     http://arxiv.org/pdf/0809.2932
+   .. [B2009] F. Bach, "Model-Consistent Sparse Estimation through the
+      Bootstrap." http://hal.inria.fr/hal-00354771/

-   * F. Bach, "Model-Consistent Sparse Estimation through the Bootstrap"
-     http://hal.inria.fr/hal-00354771/
+   .. [M2010] N. Meinshausen, P. Buhlmann, "Stability selection",
+      Journal of the Royal Statistical Society, 72 (2010)
+      http://arxiv.org/pdf/0809.2932

 Tree-based feature selection
 ----------------------------
@@ -324,4 +343,4 @@ Then, a :class:`sklearn.ensemble.RandomForestClassifier` is trained on the
 transformed output, i.e. using only relevant features. You can perform
 similar operations with the other feature selection methods and also
 classifiers that provide a way to evaluate feature importances of course.
-See the :class:`sklearn.pipeline.Pipeline` examples for more details.
\ No newline at end of file
+See the :class:`sklearn.pipeline.Pipeline` examples for more details.
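The pipeline this passage alludes to would look roughly like the following sketch; the iris data and the L1-penalized :class:`svm.LinearSVC` selector are assumptions for illustration::

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    X, y = load_iris(return_X_y=True)
    clf = Pipeline([
        # Step 1: keep the features with non-zero L1-penalized coefficients.
        ('feature_selection',
         SelectFromModel(LinearSVC(C=0.01, penalty="l1", dual=False))),
        # Step 2: train the forest on the reduced feature set only.
        ('classification', RandomForestClassifier()),
    ])
    clf.fit(X, y)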

sklearn/linear_model/randomized_l1.py

Lines changed: 27 additions & 17 deletions
@@ -187,9 +187,13 @@ def _randomized_lasso(X, y, weights, mask, alpha=1., verbose=False,
 class RandomizedLasso(BaseRandomizedLinearModel):
     """Randomized Lasso.

-    Randomized Lasso works by resampling the train data and computing
-    a Lasso on each resampling. In short, the features selected more
-    often are good features. It is also known as stability selection.
+    Randomized Lasso works by subsampling the training data and
+    computing a Lasso estimate where the penalty of a random subset of
+    coefficients has been scaled. By performing this double
+    randomization several times, the method assigns high scores to
+    features that are repeatedly selected across randomizations. This
+    is known as stability selection. In short, features selected more
+    often are considered good features.

     Read more in the :ref:`User Guide <randomized_l1>`.
@@ -201,8 +205,9 @@ class RandomizedLasso(BaseRandomizedLinearModel):
         article which is scaling.

     scaling : float, optional
-        The alpha parameter in the stability selection article used to
-        randomly scale the features. Should be between 0 and 1.
+        The s parameter used to randomly scale the penalty of different
+        features (see :ref:`User Guide <randomized_l1>` for details).
+        Should be between 0 and 1.

     sample_fraction : float, optional
         The fraction of samples to be used in each randomized design.
@@ -226,11 +231,11 @@ class RandomizedLasso(BaseRandomizedLinearModel):
         If True, the regressors X will be normalized before regression.
         This parameter is ignored when `fit_intercept` is set to False.
         When the regressors are normalized, note that this makes the
-        hyperparameters learnt more robust and almost independent of the number
-        of samples. The same property is not valid for standardized data.
-        However, if you wish to standardize, please use
-        `preprocessing.StandardScaler` before calling `fit` on an estimator
-        with `normalize=False`.
+        learned hyperparameters more robust and almost independent of
+        the number of samples. The same property does not hold for
+        standardized data. However, if you wish to standardize, please
+        use `preprocessing.StandardScaler` before calling `fit` on an
+        estimator with `normalize=False`.

     precompute : True | False | 'auto'
         Whether to use a precomputed Gram matrix to speed up
@@ -307,7 +312,7 @@ class RandomizedLasso(BaseRandomizedLinearModel):

     See also
     --------
-    RandomizedLogisticRegression, LogisticRegression
+    RandomizedLogisticRegression, Lasso, ElasticNet
     """
     def __init__(self, alpha='aic', scaling=.5, sample_fraction=.75,
                  n_resampling=200, selection_threshold=.25,
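For context, typical use of this estimator against the API at the time of this commit; the synthetic data is purely illustrative::

    import numpy as np
    from sklearn.linear_model import RandomizedLasso

    rng = np.random.RandomState(0)
    X = rng.randn(100, 5)
    y = X[:, 0] + 2 * X[:, 1] + 0.1 * rng.randn(100)

    # scaling is the s parameter; each of the 200 resamplings fits a Lasso
    # on 75% of the rows with randomly scaled penalties.
    rlasso = RandomizedLasso(alpha=0.05, scaling=0.5, sample_fraction=0.75,
                             n_resampling=200, random_state=0).fit(X, y)
    print(rlasso.scores_)  # per-feature selection frequencies in [0, 1]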
@@ -378,9 +383,13 @@ def _randomized_logistic(X, y, weights, mask, C=1., verbose=False,
 class RandomizedLogisticRegression(BaseRandomizedLinearModel):
     """Randomized Logistic Regression

-    Randomized Regression works by resampling the train data and computing
-    a LogisticRegression on each resampling. In short, the features selected
-    more often are good features. It is also known as stability selection.
+    Randomized Logistic Regression works by subsampling the training
+    data and fitting an L1-penalized LogisticRegression model where the
+    penalty of a random subset of coefficients has been scaled. By
+    performing this double randomization several times, the method
+    assigns high scores to features that are repeatedly selected across
+    randomizations. This is known as stability selection. In short,
+    features selected more often are considered good features.

     Read more in the :ref:`User Guide <randomized_l1>`.
@@ -390,8 +399,9 @@ class RandomizedLogisticRegression(BaseRandomizedLinearModel):
         The regularization parameter C in the LogisticRegression.

     scaling : float, optional, default=0.5
-        The alpha parameter in the stability selection article used to
-        randomly scale the features. Should be between 0 and 1.
+        The s parameter used to randomly scale the penalty of different
+        features (see :ref:`User Guide <randomized_l1>` for details).
+        Should be between 0 and 1.

     sample_fraction : float, optional, default=0.75
         The fraction of samples to be used in each randomized design.
@@ -484,7 +494,7 @@ class RandomizedLogisticRegression(BaseRandomizedLinearModel):

     See also
     --------
-    RandomizedLasso, Lasso, ElasticNet
+    RandomizedLasso, LogisticRegression
     """
    def __init__(self, C=1, scaling=.5, sample_fraction=.75,
                 n_resampling=200,
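The classification counterpart is used the same way; a sketch against the commit-era API, with made-up binary labels::

    import numpy as np
    from sklearn.linear_model import RandomizedLogisticRegression

    rng = np.random.RandomState(0)
    X = rng.randn(100, 5)
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # labels depend on two features

    rlog = RandomizedLogisticRegression(C=1, scaling=0.5, sample_fraction=0.75,
                                        n_resampling=200, random_state=0).fit(X, y)
    print(rlog.scores_)  # the first two features should score highest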
