8000 [MRG+1] Ledoit-Wolf behavior explanation (#9500) · scikit-learn/scikit-learn@dacf9e3 · GitHub

Commit dacf9e3

GKjohns authored and amueller committed
[MRG+1] Ledoit-Wolf behavior explanation (#9500)
* DOC add explanation of unexpected behavior to ledoit-wolf functions and class
* DOC add explanation of unexpected ledoit-wolf behavior to module documentation
* fix line that's longer than 80 chars, pep8 issue
* fix documentation changes to Ledoit-Wolf behavior explanation
* change behavior explanation to a note in documentation
* remove unexpected behavior explanation from docstrings
* fix broken links in docs
1 parent 3c1e23a commit dacf9e3

File tree

2 files changed: +22, -6 lines


doc/modules/covariance.rst

Lines changed: 18 additions & 2 deletions
@@ -38,7 +38,7 @@ The empirical covariance matrix of a sample can be computed using the
 whether the data are centered or not, the result will be different, so
 one may want to use the ``assume_centered`` parameter accurately. More precisely
 if one uses ``assume_centered=False``, then the test set is supposed to have the
-same mean vector as the training set. If not so, both should be centered by the
+same mean vector as the training set. If not so, both should be centered by the
 user, and ``assume_centered=True`` should be used.
 
 .. topic:: Examples:
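
A minimal sketch of the centering advice in this hunk (an editor's illustration, not part of the commit; the shapes, seed, and mean shifts are made up) using scikit-learn's EmpiricalCovariance:

    import numpy as np
    from sklearn.covariance import EmpiricalCovariance

    rng = np.random.RandomState(0)
    X_train = rng.randn(500, 5) + 10.0  # training data with a nonzero mean
    X_test = rng.randn(500, 5) - 10.0   # test data with a *different* mean

    # With assume_centered=False, scoring reuses the training mean, which is
    # wrong here because the test set has a different mean vector.
    naive = EmpiricalCovariance(assume_centered=False).fit(X_train)
    print("naive log-likelihood:", naive.score(X_test))

    # Center each set by its own mean, then declare the data centered.
    X_train_c = X_train - X_train.mean(axis=0)
    X_test_c = X_test - X_test.mean(axis=0)
    centered = EmpiricalCovariance(assume_centered=True).fit(X_train_c)
    print("centered log-likelihood:", centered.score(X_test_c))

The second score should come out markedly higher, since the mismatch in mean vectors no longer inflates the apparent covariance of the test set.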
@@ -105,6 +105,23 @@ a sample with the :meth:`ledoit_wolf` function of the
 `sklearn.covariance` package, or it can be otherwise obtained by
 fitting a :class:`LedoitWolf` object to the same sample.
 
+.. note:: **Case when population covariance matrix is isotropic**
+
+   It is important to note that when the number of samples is much larger than
+   the number of features, one would expect that no shrinkage would be
+   necessary. The intuition behind this is that if the population covariance
+   is full rank, when the number of samples grows, the sample covariance will
+   also become positive definite. As a result, no shrinkage would be necessary
+   and the method should automatically do this.
+
+   This, however, is not the case in the Ledoit-Wolf procedure when the
+   population covariance happens to be a multiple of the identity matrix. In
+   this case, the Ledoit-Wolf shrinkage estimate approaches 1 as the number of
+   samples increases. This indicates that the optimal estimate of the
+   covariance matrix in the Ledoit-Wolf sense is a multiple of the identity.
+   Since the population covariance is already a multiple of the identity
+   matrix, the Ledoit-Wolf solution is indeed a reasonable estimate.
+
 .. topic:: Examples:
 
 * See :ref:`sphx_glr_auto_examples_covariance_plot_covariance_estimation.py` for
@@ -334,4 +351,3 @@ ____
 
 * - |robust_vs_emp|
   - |mahalanobis|
-
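
The isotropic behavior added in the note above is easy to reproduce. A short sketch (an editor's illustration, not part of the commit; sizes and seed are arbitrary) draws samples whose population covariance is a multiple of the identity and inspects the fitted shrinkage_ coefficient:

    import numpy as np
    from sklearn.covariance import LedoitWolf

    rng = np.random.RandomState(42)
    n_features = 20

    # Population covariance is 9 * I (isotropic). Rather than vanishing as
    # n_samples grows, the estimated shrinkage approaches 1: the identity
    # target already matches the true covariance up to scale.
    for n_samples in (100, 1000, 10000, 100000):
        X = 3.0 * rng.randn(n_samples, n_features)
        lw = LedoitWolf().fit(X)
        print(n_samples, lw.shrinkage_)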

sklearn/covariance/shrunk_covariance_.py

Lines changed: 4 additions & 4 deletions
@@ -486,10 +486,10 @@ class OAS(EmpiricalCovariance):
     The formula used here does not correspond to the one given in the
     article. It has been taken from the Matlab program available from the
     authors' webpage (http://tbayes.eecs.umich.edu/yilun/covestimation).
-    In the original article, formula (23) states that 2/p is multiplied by
-    Trace(cov*cov) in both the numerator and denominator, this operation is omitted
-    in the author's MATLAB program because for a large p, the value of 2/p is so
-    small that it doesn't affect the value of the estimator.
+    In the original article, formula (23) states that 2/p is multiplied by
+    Trace(cov*cov) in both the numerator and denominator; this operation is
+    omitted in the author's MATLAB program because for a large p, the value
+    of 2/p is so small that it doesn't affect the value of the estimator.
 
     Parameters
     ----------
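
To make the 2/p remark concrete, here is a hedged sketch (an editor's illustration; oas_shrinkage is a hypothetical helper, not scikit-learn API, and formula (23) is reconstructed from the description above) with a switch that drops the 2/p terms the way the authors' MATLAB program does:

    import numpy as np

    def oas_shrinkage(X, include_2_over_p=True):
        # Hypothetical helper illustrating formula (23) of the OAS article,
        # assuming the rows of X are already centered. With
        # include_2_over_p=False the 2/p terms are dropped, mirroring the
        # authors' MATLAB program.
        n, p = X.shape
        S = X.T @ X / n              # empirical covariance
        tr_S = np.trace(S)
        tr_S2 = np.trace(S @ S)      # Trace(cov * cov)
        c = 2.0 / p if include_2_over_p else 0.0
        num = (1.0 - c) * tr_S2 + tr_S ** 2
        den = (n + 1.0 - c) * (tr_S2 - tr_S ** 2 / p)
        return min(1.0, num / den)

    rng = np.random.RandomState(0)
    X = rng.randn(50, 200)           # p = 200, so 2/p is only 0.01
    print(oas_shrinkage(X, include_2_over_p=True))   # with the 2/p terms
    print(oas_shrinkage(X, include_2_over_p=False))  # MATLAB-style, 2/p omitted

For p this large the two printed shrinkage values are nearly identical, which is why the simplification is harmless.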

0 commit comments