Bug in LedoitWolf Shrinkage · Issue #6195 · scikit-learn/scikit-learn · GitHub
Bug in LedoitWolf Shrinkage #6195


Closed
GaelVaroquaux opened this issue Jan 20, 2016 · 6 comments

@GaelVaroquaux (Member) commented Jan 20, 2016

The estimate of the shrinkage in the Ledoit-Wolf estimator is pretty broken:

import numpy as np
from sklearn import covariance
np.random.seed(42)
signals = np.random.random(size=(75, 4))
print(covariance.ledoit_wolf(signals))

This outputs:

(array([[ 0.08626827,  0.        , -0.        , -0.        ],
       [ 0.        ,  0.08626827,  0.        ,  0.        ],
       [-0.        ,  0.        ,  0.08626827, -0.        ],
       [-0.        ,  0.        , -0.        ,  0.08626827]]), 1.0)

In other words, the estimator has deduced that there should be a shrinkage of 1: it is returning something proportional to the identity.

That shrinkage is given by "m_n" in Lemma 3.2 of "A well-conditioned estimator for large-dimensional covariance matrices" by Olivier Ledoit and Michael Wolf: "m_n = <S_n, I_n>", where "<., .>" is the canonical matrix inner product, I_n is the identity, and S_n is the data scatter matrix. As can be seen from this equation, m_n == 1 is possible only if the scatter matrix has unit scale. Hence this result is false. Not that I believed it anyway.
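For reference, here is a minimal numerical check of m_n on the same data. This is a sketch, assuming the paper's normalized inner product <A, B> = tr(A B^T) / p, so that m_n = <S_n, I_n> = tr(S_n) / p:

import numpy as np

np.random.seed(42)
signals = np.random.random(size=(75, 4))
# Scatter (empirical covariance) matrix S_n: center the data and divide
# by the number of samples, as scikit-learn's empirical_covariance does.
S = np.cov(signals, rowvar=False, bias=True)
# m_n = <S_n, I_n> = tr(S_n) / p is the scale of the shrinkage target m_n * I.
m_n = np.trace(S) / S.shape[0]
print(m_n)  # ~0.086 -- matches the diagonal of the matrix printed above

So m_n is about 0.086 here, not 1; note that the 1.0 returned by ledoit_wolf above is the shrinkage coefficient, while m_n shows up on the diagonal of the returned matrix as m_n * I.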

I know where the bug is (n_splits == 0). I just need to find a robust test so that these things don't happen again.

This is quite bad: we have had a broken Ledoit-Wolf for a few releases :(. Ledoit-Wolf is the most useful covariance estimator.

GaelVaroquaux added a commit to GaelVaroquaux/scikit-learn that referenced this issue Jan 20, 2016
Fixes scikit-learn#6195

Indeed, scikit-learn#6195 was not a bug: the code in scikit-learn is correct.
However, it is fairly hard to convince oneself that this is the case.

This commit adds tests that are easier to read and relate to the
publication.
@ogrisel (Member) commented Jan 27, 2016

I merged #6201. GitHub closed #6195 automatically, but this was not a bug in the end, right? Can you please summarize your analysis?

@GaelVaroquaux (Member, Author) commented Jan 27, 2016 via email

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this issue Feb 13, 2016
Fixes scikit-learn#6195
@clamus commented Mar 3, 2016

@GaelVaroquaux: I also found this result very counterintuitive. I read the paper again and kind of made sense of why this happens. Section 2.1 of Ledoit and Wolf defines the "population" optimal estimates. Using their notation, in your example \Sigma = I, therefore \mu = 1 and \alpha^2 = 0. Lemma 2.1 states that \alpha^2 + \beta^2 = \delta^2, so \alpha^2 = 0 gives \beta^2 = \delta^2. The shrinkage parameter (eq. 5) is then \beta^2 / \delta^2 = 1. This means that when the population covariance \Sigma is the identity, one should maximally shrink toward the identity. Kind of funny, but it works as intended.
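A quick numerical illustration of this property (a sketch, not part of the original thread; it only assumes scikit-learn's ledoit_wolf, used as in the snippet above):

import numpy as np
from sklearn.covariance import ledoit_wolf

rng = np.random.RandomState(0)
# Population covariance Sigma = I (independent, unit-variance features):
# the target mu * I is already optimal, so the estimated shrinkage
# should be (close to) 1.
X_iso = rng.normal(size=(2000, 4))
print(ledoit_wolf(X_iso)[1])  # close to 1

# Strongly anisotropic Sigma: shrinking all the way to the identity
# would destroy real structure, so the shrinkage stays well below 1.
X_aniso = X_iso * np.array([1.0, 2.0, 5.0, 10.0])
print(ledoit_wolf(X_aniso)[1])  # well below 1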

@GaelVaroquaux (Member, Author)

Indeed. I had to reread Ledoit and Wolf too to convince myself. Thanks for confirming.

@ogrisel (Member) commented Mar 3, 2016

> This means that when the population covariance \Sigma is the identity, one should maximally shrink toward the identity. Kind of funny, but it works as intended.

Maybe the user guide and the docstrings of the function / estimator class could be extended to give this intuition?

@clamus would you be interested in submitting a PR?

@clamus commented Mar 3, 2016

Yes, definitely! I'll submit a PR explaining this unexpected but correct property in the user guide and docstrings.

mannby pushed a commit to mannby/scikit-learn that referenced this issue Apr 22, 2016
Fixes scikit-learn#6195
TomDLT pushed a commit to TomDLT/scikit-learn that referenced this issue Oct 3, 2016
Fixes scikit-learn#6195