Bug in LedoitWolf Shrinkage · Issue #6195 · scikit-learn/scikit-learn · GitHub
Bug in LedoitWolf Shrinkage #6195


Closed
GaelVaroquaux opened this issue Jan 20, 2016 · 6 comments

@GaelVaroquaux (Member) commented Jan 20, 2016

The estimate of the shrinkage in the Ledoit-Wolf estimator is pretty broken:

import numpy as np
from sklearn import covariance
np.random.seed(42)
signals = np.random.random(size=(75, 4))
print(covariance.ledoit_wolf(signals))

This outputs:

(array([[ 0.08626827,  0.        , -0.        , -0.        ],
       [ 0.        ,  0.08626827,  0.        ,  0.        ],
       [-0.        ,  0.        ,  0.08626827, -0.        ],
       [-0.        ,  0.        , -0.        ,  0.08626827]]), 1.0)

In other words, the estimator has deduced that there should be a shrinkage of 1: it is returning something proportional to the identity.

That shrinkage is given by "m_n" in Lemma 3.2 of "A well-conditioned estimator for large-dimensional covariance matrices" by Olivier Ledoit and Michael Wolf: "m_n = <S_n, I_n>", where "<., .>" is the canonical matrix inner product, I_n is the identity, and S_n is the data scatter matrix. As can be seen from this equation, m_n == 1 is possible only if the scatter matrix has unit scale. Hence this result is false. Not that I believed it anyway.
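For reference, here is a minimal numerical check of m_n on the same data. This is a sketch, assuming the paper's normalized inner product <A, B> = tr(A B^T) / p, so that m_n = <S_n, I_n> = tr(S_n) / p:

import numpy as np

np.random.seed(42)
signals = np.random.random(size=(75, 4))
# Scatter (empirical covariance) matrix S_n: center the data and divide
# by the number of samples, as scikit-learn's empirical_covariance does.
S = np.cov(signals, rowvar=False, bias=True)
# m_n = <S_n, I_n> = tr(S_n) / p is the scale of the shrinkage target m_n * I.
m_n = np.trace(S) / S.shape[0]
print(m_n)  # ~0.086 -- matches the diagonal of the matrix printed above

So m_n is about 0.086 here, not 1; note that the 1.0 returned by ledoit_wolf above is the shrinkage coefficient, while m_n shows up on the diagonal of the returned matrix as m_n * I.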

I know where the bug is (n_splits == 0). I just need to find a robust test so that these things don't happen again.

This is quite bad: we have had a broken Ledoit-Wolf for a few releases :(. Ledoit-Wolf is the most useful covariance estimator.

GaelVaroquaux added a commit to GaelVaroquaux/scikit-learn that referenced this issue Jan 20, 2016
Fixes scikit-learn#6195

Indeed, scikit-learn#6195 was not a bug: the code in scikit-learn is correct.
However, it is fairly hard to convince oneself that this is the case.

This commit adds tests that are easier to read and relate to the
publication.
@ogrisel (Member) commented Jan 27, 2016

I merged #6201. GitHub closed #6195 automatically, but this was not a bug in the end, right? Can you please summarize your analysis?

@GaelVaroquaux (Member, Author) commented Jan 27, 2016 via email

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this issue Feb 13, 2016
Fixes scikit-learn#6195
@clamus commented Mar 3, 2016

@GaelVaroquaux: I also found this result very counterintuitive. I read the paper again and kind of made sense of why this happens. Section 2.1 of Ledoit and Wolf defines the "population" optimal estimates. Using their notation, in your example \Sigma = I, therefore \mu = 1 and \alpha^2 = 0. Lemma 2.1 states that \alpha^2 + \beta^2 = \delta^2, so \alpha^2 = 0 gives \beta^2 = \delta^2. The shrinkage parameter (eq. 5) is then \beta^2 / \delta^2 = 1. This means that when the population covariance \Sigma is the identity, one should maximally shrink toward the identity. Kind of funny, but it works as intended.
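A quick numerical illustration of this property (a sketch, not part of the original thread; it only assumes scikit-learn's ledoit_wolf, used as in the snippet above):

import numpy as np
from sklearn.covariance import ledoit_wolf

rng = np.random.RandomState(0)
# Population covariance Sigma = I (independent, unit-variance features):
# the target mu * I is already optimal, so the estimated shrinkage
# should be (close to) 1.
X_iso = rng.normal(size=(2000, 4))
print(ledoit_wolf(X_iso)[1])  # close to 1

# Strongly anisotropic Sigma: shrinking all the way to the identity
# would destroy real structure, so the shrinkage stays well below 1.
X_aniso = X_iso * np.array([1.0, 2.0, 5.0, 10.0])
print(ledoit_wolf(X_aniso)[1])  # well below 1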

@GaelVaroquaux (Member, Author)

Indeed. I had to reread Ledoit and Wolf too to convince myself. Thanks for confirming.

@ogrisel (Member) commented Mar 3, 2016

> This means that when the population covariance \Sigma is the identity, one should maximally shrink toward the identity. Kind of funny, but it works as intended.

Maybe the user guide and the docstrings of the function / estimator class could be extended to give this intuition?

@clamus would you be interested in submitting a PR?

@clamus commented Mar 3, 2016

Yes, definitely! I'll submit a PR explaining this unexpected but correct property in the user guide and docstrings.

mannby pushed a commit to mannby/scikit-learn that referenced this issue Apr 22, 2016
Fixes scikit-learn#6195
TomDLT pushed a commit to TomDLT/scikit-learn that referenced this issue Oct 3, 2016
Fixes scikit-learn#6195