Bug in LedoitWolf Shrinkage #6195
Fixes scikit-learn#6195. Indeed, scikit-learn#6195 was not a bug: the code in scikit-learn is correct. However, it is fairly hard to convince oneself that this is the case. This commit adds tests that are easier to read and relate to the publication.
not a bug in the end, right? Can you please summarize your analysis?
Not a bug indeed. Or rather a bug in my brain, not in scikit-learn.
@GaelVaroquaux I also found this result very counterintuitive. I read the paper again and kind of made sense of why this happens. Section 2.1 of Ledoit and Wolf defines the "population" optimal estimates. Using their notation, in your example \Sigma = I. Therefore \mu = 1 and \alpha^2 = 0. Using Lemma 2.1 gives \beta^2 = \delta^2. The shrinkage parameter (eq. 5) is \beta^2 / \delta^2 = 1. This kind of means that when the population covariance \Sigma is the identity, one should maximally shrink toward the identity. Kind of funny, but it works as intended.
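Spelling out those steps in the paper's notation (with the normalized inner product and norm from its Section 2, and Lemma 2.1 giving the decomposition of \delta^2):

$$
\mu = \langle \Sigma, I \rangle = 1, \qquad
\alpha^2 = \lVert \Sigma - \mu I \rVert^2 = 0,
$$

$$
\delta^2 = \alpha^2 + \beta^2 \;\Longrightarrow\; \beta^2 = \delta^2,
\qquad
\text{shrinkage} = \frac{\beta^2}{\delta^2} = 1.
$$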
Indeed. I had to reread Ledoit and Wolf too to convince myself. Thanks for confirming.
Maybe the documentation in the user guide and the docstring of the function / estimator class could be extended to give this intuition? @clamus, would you be interested in submitting a PR?
Yes, definitely! I'll submit a PR explaining this unexpected but correct property in the user guide and docstrings.
The estimate of the shrinkage in the LedoitWolf estimator is pretty broken:
This outputs:
In other words, the estimator has deduced that there should be a shrinkage of 1: it's returning something proportional to the identity.
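The original reproduction snippet and its output are not shown above; a minimal sketch of this kind of reproduction, assuming i.i.d. standard normal data (so the population covariance is exactly the identity) with illustrative dimensions and seed, could look like:

```python
import numpy as np
from sklearn.covariance import LedoitWolf

# Illustrative data: i.i.d. standard normals, so the population
# covariance matrix is exactly the identity.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 10))

lw = LedoitWolf().fit(X)
print(lw.shrinkage_)  # close to 1: the estimate is pulled almost fully toward the identity
```

As the rest of the thread establishes, a shrinkage near 1 is in fact the correct answer for this data, not a bug.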
That shrinkage is given by "m_n" in Lemma 3.2 of "A well-conditioned estimator for large-dimensional covariance matrices", Olivier Ledoit and Michael Wolf: "m_n = <S_n, I_n>", where "<., .>" is the canonical matrix inner product, I_n is the identity, and S_n is the data scatter matrix. As can be seen from this equation, m_n == 1 is possible only if the scatter matrix is 1. Hence this result is false. Not that I believed it at all.
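As a sanity check on these quantities, the paper's Section 3 estimates (m_n, d_n^2, b_n^2, and the resulting shrinkage) can be computed directly. The sketch below is a pure-Python illustration, not scikit-learn's implementation; the data-generating setup (zero-mean i.i.d. standard normals, so the population covariance is the identity) is an assumption chosen to match the discussion:

```python
import random

random.seed(0)
n, p = 500, 10
# Zero-mean i.i.d. standard normal data: population covariance is I_p.
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

# Sample scatter matrix S_n = (1/n) X'X (data are zero-mean by construction).
S = [[sum(X[k][i] * X[k][j] for k in range(n)) / n for j in range(p)]
     for i in range(p)]

# <A, B> = tr(A B') / p, the paper's normalized Frobenius inner product.
def inner(A, B):
    return sum(A[i][j] * B[i][j] for i in range(p) for j in range(p)) / p

I = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]

m_n = inner(S, I)                      # = tr(S_n) / p
D = [[S[i][j] - m_n * I[i][j] for j in range(p)] for i in range(p)]
d2 = inner(D, D)                       # d_n^2 = ||S_n - m_n I||^2

# b_n^2 = min( (1/n^2) * sum_k ||x_k x_k' - S_n||^2 , d_n^2 )
b2_bar = 0.0
for k in range(n):
    Dk = [[X[k][i] * X[k][j] - S[i][j] for j in range(p)] for i in range(p)]
    b2_bar += inner(Dk, Dk)
b2_bar /= n * n
b2 = min(b2_bar, d2)

shrinkage = b2 / d2
print(m_n, shrinkage)
```

With identity population covariance, m_n comes out close to 1 and the shrinkage close to 1, which is consistent with the resolution later in the thread: full shrinkage toward the identity is the intended behaviour here, not a broken estimate.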
I know where the bug is (n_splits == 0). I just need to find a robust test so that these things don't happen again.
This is quite bad: we have had a broken Ledoit Wolf for a few releases :(. Ledoit Wolf is the most useful covariance estimator.