I was comparing different shrinkage algorithms, and while looking at the sklearn implementation of the OAS estimator I found something in the definition of the shrinkage factor that looks strange, or at least is not clear to me. The original formula from Chen et al. 2010 (which is also wrong as printed in the paper, anyway) always uses the trace of the covariance matrix. Instead, this is the formula from the sklearn.covariance.OAS module:
mu = np.trace(emp_cov) / n_features
# formula from Chen et al.'s **implementation**
alpha = np.mean(emp_cov ** 2)
num = alpha + mu ** 2
den = (n_samples + 1.) * (alpha - (mu ** 2) / n_features)
shrinkage = 1. if den == 0 else min(num / den, 1.)
shrunk_cov = (1. - shrinkage) * emp_cov
shrunk_cov.flat[::n_features + 1] += shrinkage * mu
Here alpha is the mean of the squared entries of the covariance matrix rather than a trace, and mu is normalized by the number of features in the numerator as well, unlike what I found in the literature for the same formula.
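For reference, the snippet above can be rewritten in trace notation. This is only a sketch under stated assumptions: $S$ stands for `emp_cov` and is taken to be symmetric, $p$ is `n_features`, $n$ is `n_samples`, $\rho$ is `shrinkage`, and $\operatorname{tr}(S^2)$ means the trace of the matrix product $S S$. Because `emp_cov ** 2` is an elementwise square in NumPy, `np.mean(emp_cov ** 2)` averages all $p^2$ squared entries:

```latex
\mu = \frac{\operatorname{tr}(S)}{p},
\qquad
\alpha = \frac{1}{p^2} \sum_{i,j} S_{ij}^2 = \frac{\operatorname{tr}(S^2)}{p^2}

\rho
= \min\!\left( \frac{\alpha + \mu^2}{(n + 1)\left(\alpha - \mu^2 / p\right)},\ 1 \right)
= \min\!\left( \frac{\operatorname{tr}(S^2) + \operatorname{tr}(S)^2}
                    {(n + 1)\left(\operatorname{tr}(S^2) - \operatorname{tr}(S)^2 / p\right)},\ 1 \right)
```

The second equality follows from multiplying numerator and denominator by $p^2$.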
A transcription of what I found in the papers would be (dropping the factor 2/p, as sklearn does):
mu = np.trace(emp_cov)
alpha = np.trace(emp_cov ** 2)
num = alpha + mu ** 2
den = (n_samples + 1.) * (alpha - (mu ** 2) / n_features)
shrinkage = 1. if den == 0 else min(num / den, 1.)
shrunk_cov = (1. - shrinkage) * emp_cov + shrinkage * np.diag(np.diag(emp_cov))
Is this right? Or are the two forms equivalent in a way that I could not see?
Thank you in advance,
Assunta
Hey there @assuntaciarlo! Sorry for the long delay on this issue. After reading the original publication (I believe it has been corrected) and looking at the R implementation, I'm inclined to agree with you that the current implementation is off. In particular, the $\mu^2$ terms in the numerator and denominator resolve to $\operatorname{tr}(\hat S)^2 / M^2$. Should be a fairly quick fix. I opened a PR here: #23867
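One way to test whether the two forms agree is a quick numerical check. The sketch below is not sklearn code: the function names `oas_shrinkage_mean_form` and `oas_shrinkage_trace_form` are hypothetical, the data are synthetic, and it assumes that tr(S^2) in the paper means the trace of the matrix product `emp_cov @ emp_cov`. Note that `emp_cov ** 2` squares each entry, so `np.trace(emp_cov ** 2)` only sums the squared diagonal and is generally not the same quantity.

```python
import numpy as np


def oas_shrinkage_mean_form(emp_cov, n_samples):
    """Shrinkage factor computed as in the sklearn snippet quoted in this issue."""
    n_features = emp_cov.shape[0]
    mu = np.trace(emp_cov) / n_features
    alpha = np.mean(emp_cov ** 2)  # elementwise square, averaged over all p*p entries
    num = alpha + mu ** 2
    den = (n_samples + 1.0) * (alpha - (mu ** 2) / n_features)
    return 1.0 if den == 0 else min(num / den, 1.0)


def oas_shrinkage_trace_form(emp_cov, n_samples):
    """Shrinkage factor written with traces (2/p factor dropped).

    Assumption: tr(S^2) is the trace of the matrix product S @ S.
    """
    n_features = emp_cov.shape[0]
    tr_s = np.trace(emp_cov)
    tr_s2 = np.trace(emp_cov @ emp_cov)
    num = tr_s2 + tr_s ** 2
    den = (n_samples + 1.0) * (tr_s2 - tr_s ** 2 / n_features)
    return 1.0 if den == 0 else min(num / den, 1.0)


# Synthetic, correlated data purely for illustration.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 5
mixing = rng.normal(size=(n_features, n_features))
X = rng.normal(size=(n_samples, n_features)) @ mixing
emp_cov = np.cov(X, rowvar=False, bias=True)

rho_mean = oas_shrinkage_mean_form(emp_cov, n_samples)
rho_trace = oas_shrinkage_trace_form(emp_cov, n_samples)
print("mean form: ", rho_mean)
print("trace form:", rho_trace)

# Elementwise square vs. matrix product: these two traces differ in general.
tr_elementwise = np.trace(emp_cov ** 2)
tr_product = np.trace(emp_cov @ emp_cov)
print("trace(S ** 2):", tr_elementwise, " trace(S @ S):", tr_product)
```

Under these assumptions the two functions return the same value up to floating-point error, since the common factor 1/p^2 cancels between numerator and denominator. This check says nothing about the omitted 2/p term or about the choice of shrinkage target.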