[MRG+1] Fix to documentation and docstring of randomized lasso and randomized logistic regression · Pull Request #6498 · scikit-learn/scikit-learn (by clamus)

Merged · 5 commits · Mar 19, 2016

47 changes: 33 additions & 14 deletions doc/modules/feature_selection.rst
@@ -173,8 +173,8 @@ L1-based feature selection
sparse solutions: many of their estimated coefficients are zero. When the goal
is to reduce the dimensionality of the data to use with another classifier,
they can be used along with :class:`feature_selection.SelectFromModel`
to select the non-zero coefficients. In particular, sparse estimators useful for
this purpose are the :class:`linear_model.Lasso` for regression, and
to select the non-zero coefficients. In particular, sparse estimators useful
for this purpose are the :class:`linear_model.Lasso` for regression, and
of :class:`linear_model.LogisticRegression` and :class:`svm.LinearSVC`
for classification::

@@ -234,15 +234,34 @@ Randomized sparse models

.. currentmodule:: sklearn.linear_model

The limitation of L1-based sparse models is that faced with a group of
very correlated features, they will select only one. To mitigate this
problem, it is possible to use randomization techniques, reestimating the
sparse model many times perturbing the design matrix or sub-sampling data
and counting how many times a given regressor is selected.
In terms of feature selection, there are some well-known limitations of
L1-penalized models for regression and classification. For example, it is
known that the Lasso will tend to select an individual variable out of a group
of highly correlated features. Furthermore, even when the correlation between
features is not too high, the conditions under which L1-penalized methods
consistently select "good" features can be restrictive in general.

To mitigate this problem, it is possible to use randomization techniques such
as those presented in [B2009]_ and [M2010]_. The latter technique, known as
stability selection, is implemented in the module :mod:`sklearn.linear_model`.
In the stability selection method, a subsample of the data is fit with an
L1-penalized model where the penalty of a random subset of coefficients has
been scaled. Specifically, given a subsample of the data
:math:`(x_i, y_i), i \in I`, where :math:`I \subset \{1, 2, \ldots, n\}` is a
random subset of the data of size :math:`n_I`, the following modified Lasso
fit is obtained:

.. math:: \hat{w_I} = \mathrm{arg}\min_{w} \frac{1}{2n_I} \sum_{i \in I} (y_i - x_i^T w)^2 + \alpha \sum_{j=1}^p \frac{ \vert w_j \vert}{s_j},

where :math:`s_j \in \{s, 1\}` are independent trials of a fair Bernoulli

Contributor:
Let's be consistent with the docstrings of the functions. s is described as alpha in the docstrings:

    scaling : float, optional, default=0.5
        The alpha parameter in the stability selection article used to
        randomly scale the features. Should be between 0 and 1.

Author:
I thought about this, but I am not sure. alpha is used throughout sklearn linear models as the penalty parameter; the unfortunate thing is that alpha is the scaling factor in the paper. Using alpha in the equations would probably be more confusing. Maybe the solution is to change the docstring for scaling and note that it is s in the equation. What do you think?

Contributor:
Very good observation. Though, to stay true to the source code, I would say it's better to replace alpha with C and s with alpha. We would ideally want someone to look at the model's description and understand the model by looking at its objective function.

But you do make a good point about trying to stay true to the paper; I would put in parentheses what the variables C and alpha represent in the original paper.

(Making the point that w here is beta in the paper, and s_j here is w_k in the paper, seems like overkill.)

Author:
Yup, agreed: staying consistent internally matters more than matching the paper's notation. There is an additional "issue", though: alpha and C in the code refer to the penalization in the Lasso and logistic regression models, respectively, and in the code the scaler is not alpha but a variable called weights. I just submitted a change that does not call the scaling parameter alpha, but simply s. This might be better. What do you think?

random variable, and :math:`0<s<1` is the scaling factor. By repeating this
procedure across different random subsamples and Bernoulli trials, one can
count the fraction of times the randomized procedure selected each feature,
and use these fractions as scores for feature selection.
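
As a rough illustration of the procedure above, here is a minimal NumPy/scikit-learn
sketch. It is not the actual implementation in `sklearn/linear_model/randomized_l1.py`;
the helper name, the fixed `alpha` and the synthetic defaults are assumptions made
for the example::

    import numpy as np
    from sklearn.linear_model import Lasso

    def stability_scores(X, y, alpha=0.05, s=0.5, sample_fraction=0.75,
                         n_resampling=200, random_state=0):
        # Fraction of resamplings in which each feature gets a non-zero coefficient.
        rng = np.random.RandomState(random_state)
        n_samples, n_features = X.shape
        n_I = int(sample_fraction * n_samples)
        counts = np.zeros(n_features)
        for _ in range(n_resampling):
            I = rng.choice(n_samples, n_I, replace=False)        # random subsample
            s_j = np.where(rng.rand(n_features) < 0.5, s, 1.0)   # fair Bernoulli: s or 1
            # Scaling the penalty on w_j by 1/s_j is equivalent to fitting a plain
            # Lasso on columns multiplied by s_j (substitute w_j = s_j * v_j);
            # the non-zero pattern of the coefficients is unchanged.
            lasso = Lasso(alpha=alpha).fit(X[I] * s_j, y[I])
            counts += lasso.coef_ != 0
        return counts / n_resampling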

:class:`RandomizedLasso` implements this strategy for regression
settings, using the Lasso, while :class:`RandomizedLogisticRegression` uses the
logistic regression and is suitable for classification tasks. To get a full
path of stability scores you can use :func:`lasso_stability_path`.
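
For instance, a hedged usage sketch on synthetic data (the estimator and function
signatures follow the docstrings in this PR, but the dataset and the way the scores
are thresholded are purely illustrative)::

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import RandomizedLasso, lasso_stability_path

    X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                           noise=1.0, random_state=0)

    rlasso = RandomizedLasso(alpha='aic', scaling=0.5, sample_fraction=0.75,
                             n_resampling=200, selection_threshold=0.25,
                             random_state=0)
    rlasso.fit(X, y)
    selected = np.where(rlasso.scores_ >= rlasso.selection_threshold)[0]
    print(selected)  # indices of features with stable support

    # Full path of stability scores over a grid of penalties.
    alpha_grid, scores_path = lasso_stability_path(X, y, random_state=0)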

.. figure:: ../auto_examples/linear_model/images/plot_sparse_recovery_003.png
@@ -263,12 +282,12 @@ of features non zero.

.. topic:: References:

* N. Meinshausen, P. Buhlmann, "Stability selection",
Journal of the Royal Statistical Society, 72 (2010)
http://arxiv.org/pdf/0809.2932
.. [B2009] F. Bach, "Model-Consistent Sparse Estimation through the
Bootstrap." http://hal.inria.fr/hal-00354771/

* F. Bach, "Model-Consistent Sparse Estimation through the Bootstrap"
http://hal.inria.fr/hal-00354771/
.. [M2010] N. Meinshausen, P. Buhlmann, "Stability selection",
Journal of the Royal Statistical Society, 72 (2010)
http://arxiv.org/pdf/0809.2932

Tree-based feature selection
----------------------------
@@ -324,4 +343,4 @@ Then, a :class:`sklearn.ensemble.RandomForestClassifier` is trained on the
transformed output, i.e. using only relevant features. You can perform
similar operations with the other feature selection methods and also
classifiers that provide a way to evaluate feature importances of course.
See the :class:`sklearn.pipeline.Pipeline` examples for more details.
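
For example, a hedged sketch of that pattern (the particular selector, classifier and
synthetic data below are illustrative, not prescribed by the documentation)::

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                               random_state=0)
    clf = Pipeline([
        # L1-based selection keeps only features with non-zero coefficients ...
        ('feature_selection', SelectFromModel(LinearSVC(penalty="l1", dual=False))),
        # ... and the classifier is then trained on the reduced feature set.
        ('classification', RandomForestClassifier())
    ])
    clf.fit(X, y)
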
44 changes: 27 additions & 17 deletions sklearn/linear_model/randomized_l1.py
@@ -187,9 +187,13 @@ def _randomized_lasso(X, y, weights, mask, alpha=1., verbose=False,
class RandomizedLasso(BaseRandomizedLinearModel):
"""Randomized Lasso.

Randomized Lasso works by resampling the train data and computing
a Lasso on each resampling. In short, the features selected more
often are good features. It is also known as stability selection.
Randomized Lasso works by subsampling the training data and
computing a Lasso estimate where the penalty of a random subset of
coefficients has been scaled. By performing this double
randomization several times, the method assigns high scores to
features that are repeatedly selected across randomizations. This
is known as stability selection. In short, features selected more
often are considered good features.

Read more in the :ref:`User Guide <randomized_l1>`.

@@ -201,8 +205,9 @@ class RandomizedLasso(BaseRandomizedLinearModel):
article which is scaling.

scaling : float, optional
The alpha parameter in the stability selection article used to
randomly scale the features. Should be between 0 and 1.
The s parameter used to randomly scale the penalty of different
features (see :ref:`User Guide <randomized_l1>` for details).
Should be between 0 and 1.

sample_fraction : float, optional
The fraction of samples to be used in each randomized design.
@@ -226,11 +231,11 @@ class RandomizedLasso(BaseRandomizedLinearModel):
If True, the regressors X will be normalized before regression.
This parameter is ignored when `fit_intercept` is set to False.
When the regressors are normalized, note that this makes the
hyperparameters learnt more robust and almost independent of the number
of samples. The same property is not valid for standardized data.
However, if you wish to standardize, please use
`preprocessing.StandardScaler` before calling `f 8000 it` on an estimator
with `normalize=False`.
hyperparameters learned more robust and almost independent of
the number of samples. The same property is not valid for
standardized data. However, if you wish to standardize, please
use `preprocessing.StandardScaler` before calling `fit` on an
estimator with `normalize=False`.

precompute : True | False | 'auto'
Whether to use a precomputed Gram matrix to speed up
@@ -307,7 +312,7 @@ class RandomizedLasso(BaseRandomizedLinearModel):

See also
--------
RandomizedLogisticRegression, LogisticRegression
RandomizedLogisticRegression, Lasso, ElasticNet

Contributor:
I don't think it would hurt to mention LogisticRegression here. Yes, you could argue that LogisticRegression serves a different purpose than a feature selection method, but mentioning it might help users distinguish the use cases of these methods.

Author:
My rationale is based on the data type that needs to be modeled, so methods for categorical and metric responses are kind of far apart. This separation is maintained in the "See also" sections of the lasso and logistic regression estimators in sklearn, as they do not reference each other. I thought it would be good to keep this separation. What do you think?

Contributor:
@agramfort, do you have any opinions?

Member:
agreed. +1 for the change.

"""
def __init__(self, alpha='aic', scaling=.5, sample_fraction=.75,
n_resampling=200, selection_threshold=.25,
@@ -378,9 +383,13 @@ def _randomized_logistic(X, y, weights, mask, C=1., verbose=False,
class RandomizedLogisticRegression(BaseRandomizedLinearModel):
"""Randomized Logistic Regression

Randomized Regression works by resampling the train data and computing
a LogisticRegression on each resampling. In short, the features selected
more often are good features. It is also known as stability selection.
Randomized Logistic Regression works by subsampling the training
data and fitting an L1-penalized LogisticRegression model where the
penalty of a random subset of coefficients has been scaled. By
performing this double randomization several times, the method
assigns high scores to features that are repeatedly selected across
randomizations. This is known as stability selection. In short,
features selected more often are considered good features.

Read more in the :ref:`User Guide <randomized_l1>`.

@@ -390,8 +399,9 @@ class RandomizedLogisticRegression(BaseRandomizedLinearModel):
The regularization parameter C in the LogisticRegression.

scaling : float, optional, default=0.5
The alpha parameter in the stability selection article used to
randomly scale the features. Should be between 0 and 1.
The s parameter used to randomly scale the penalty of different
features (see :ref:`User Guide <randomized_l1>` for details).
Should be between 0 and 1.

sample_fraction : float, optional, default=0.75
The fraction of samples to be used in each randomized design.
@@ -484,7 +494,7 @@ class RandomizedLogisticRegression(BaseRandomizedLinearModel):

See also
--------
RandomizedLasso, Lasso, ElasticNet
RandomizedLasso, LogisticRegression
"""
def __init__(self, C=1, scaling=.5, sample_fraction=.75,
n_resampling=200,
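
As with RandomizedLasso above, a hedged usage sketch for the classification case (the
toy data and the thresholding are illustrative; the constructor arguments mirror the
defaults shown in this diff)::

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import RandomizedLogisticRegression

    X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                               random_state=0)

    rlogistic = RandomizedLogisticRegression(C=1, scaling=0.5, sample_fraction=0.75,
                                             n_resampling=200, random_state=0)
    rlogistic.fit(X, y)
    selected = np.where(rlogistic.scores_ >= rlogistic.selection_threshold)[0]
    print(selected)  # features repeatedly selected across randomizations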