diff --git a/doc/modules/feature_selection.rst b/doc/modules/feature_selection.rst
index 01e2d079c45a4..496b27daf447e 100644
--- a/doc/modules/feature_selection.rst
+++ b/doc/modules/feature_selection.rst
@@ -201,31 +201,36 @@ alpha parameter, the fewer features selected.
 
 .. _compressive_sensing:
 
-.. topic:: **L1-recovery and compressive sensing**
-
-   For a good choice of alpha, the :ref:`lasso` can fully recover the
-   exact set of non-zero variables using only few observations, provided
-   certain specific conditions are met. In particular, the number of
-   samples should be "sufficiently large", or L1 models will perform at
-   random, where "sufficiently large" depends on the number of non-zero
-   coefficients, the logarithm of the number of features, the amount of
-   noise, the smallest absolute value of non-zero coefficients, and the
-   structure of the design matrix X. In addition, the design matrix must
-   display certain specific properties, such as not being too correlated.
-
-   There is no general rule to select an alpha parameter for recovery of
-   non-zero coefficients. It can by set by cross-validation
-   (:class:`~sklearn.linear_model.LassoCV` or
-   :class:`~sklearn.linear_model.LassoLarsCV`), though this may lead to
-   under-penalized models: including a small number of non-relevant variables
-   is not detrimental to prediction score. BIC
-   (:class:`~sklearn.linear_model.LassoLarsIC`) tends, on the opposite, to set
-   high values of alpha.
-
-   **Reference** Richard G. Baraniuk "Compressive Sensing", IEEE Signal
+|details-start|
+**L1-recovery and compressive sensing**
+|details-split|
+
+For a good choice of alpha, the :ref:`lasso` can fully recover the
+exact set of non-zero variables using only a few observations, provided
+certain specific conditions are met. In particular, the number of
+samples should be "sufficiently large", or L1 models will perform at
+random, where "sufficiently large" depends on the number of non-zero
+coefficients, the logarithm of the number of features, the amount of
+noise, the smallest absolute value of non-zero coefficients, and the
+structure of the design matrix X. In addition, the design matrix must
+display certain specific properties, such as not being too correlated.
+
+There is no general rule to select an alpha parameter for recovery of
+non-zero coefficients. It can be set by cross-validation
+(:class:`~sklearn.linear_model.LassoCV` or
+:class:`~sklearn.linear_model.LassoLarsCV`), though this may lead to
+under-penalized models: including a small number of non-relevant variables
+is not detrimental to prediction score. BIC
+(:class:`~sklearn.linear_model.LassoLarsIC`) tends, on the contrary, to set
+high values of alpha.
+
+.. topic:: Reference
+
+   Richard G. Baraniuk "Compressive Sensing", IEEE Signal
    Processing Magazine [120] July 2007
    http://users.isr.ist.utl.pt/~aguiar/CS_notes.pdf
+|details-end|
 
 Tree-based feature selection
 ----------------------------
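
As a minimal sketch of the alpha-selection advice in the hunk above (illustrative only, not part of the patch): :class:`~sklearn.linear_model.LassoCV` picks alpha by cross-validation, and :class:`~sklearn.feature_selection.SelectFromModel` then keeps the features whose coefficients survive. The synthetic data shape, the ``cv=5`` setting, and the near-zero threshold below are assumptions made for the demonstration::

    from sklearn.datasets import make_regression
    from sklearn.feature_selection import SelectFromModel
    from sklearn.linear_model import LassoCV

    # Sparse ground truth: 5 informative features out of 50 (illustrative).
    X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                           noise=1.0, random_state=0)

    # LassoCV chooses alpha over a grid by cross-validation; SelectFromModel
    # then keeps the features whose fitted coefficients exceed the threshold.
    selector = SelectFromModel(LassoCV(cv=5, random_state=0), threshold=1e-5)
    selector.fit(X, y)
    print(selector.transform(X).shape)  # (200, n_selected), ideally close to 5

If the conditions discussed above hold (enough samples, weakly correlated design), the selected set should coincide with the informative features; with too few samples or a heavily correlated X it generally will not.
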
@@ -282,6 +287,10 @@ instead of starting with no features and greedily adding
 features, we start with *all* the features and greedily *remove* features from
 the set. The `direction` parameter controls whether forward or backward SFS is
 used.
 
+|details-start|
+**Details on Sequential Feature Selection**
+|details-split|
+
 In general, forward and backward selection do not yield equivalent results.
 Also, one may be much faster than the other depending on the requested number
 of selected features: if we have 10 features and ask for 7 selected features,
@@ -299,16 +308,18 @@ cross-validation requires fitting `m * k` models, while
 :class:`~sklearn.feature_selection.SelectFromModel` always just does a single
 fit and requires no iterations.
 
-.. topic:: Examples
-
-    * :ref:`sphx_glr_auto_examples_feature_selection_plot_select_from_model_diabetes.py`
-
-.. topic:: References:
+.. topic:: Reference
 
     .. [sfs] Ferri et al, `Comparative study of techniques for large-scale
        feature selection
       `_.
 
+|details-end|
+
+.. topic:: Examples
+
+    * :ref:`sphx_glr_auto_examples_feature_selection_plot_select_from_model_diabetes.py`
+
 Feature selection as part of a pipeline
 =======================================
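
To make the `direction` discussion above concrete, here is a minimal sketch, again illustrative rather than part of the patch. The diabetes dataset is assumed because it has exactly 10 features, matching the 10-features/7-selected example in the text, and :class:`~sklearn.linear_model.Ridge` is an arbitrary choice of estimator::

    from sklearn.datasets import load_diabetes
    from sklearn.feature_selection import SequentialFeatureSelector
    from sklearn.linear_model import Ridge

    X, y = load_diabetes(return_X_y=True)  # 10 features, as in the text

    # Forward SFS adds one feature per round (7 rounds to reach 7 features);
    # backward SFS starts from all 10 and removes one per round (3 rounds).
    forward = SequentialFeatureSelector(
        Ridge(), n_features_to_select=7, direction="forward").fit(X, y)
    backward = SequentialFeatureSelector(
        Ridge(), n_features_to_select=7, direction="backward").fit(X, y)

    print(forward.get_support())   # boolean mask of the 7 selected features
    print(backward.get_support())  # generally not the same mask

With `n_features_to_select=7`, the forward run performs seven rounds of additions while the backward run performs only three rounds of removals, which is the iteration-count asymmetry described in the hunk above; the two masks printed at the end need not agree.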