-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG+2] Incorrect implementation of explained_variance_ in PCA #9105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
# X_rpca = rpca.transform(X) | ||
# assert_array_almost_equal(rpca.explained_variance_, | ||
# expected_result, decimal=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uncomment or remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@agramfort Thanks. From my perspective, I think the test is necessary because it is based on the original definition, so I uncomment it, but I would like to know your advise.
+1 for MRG when CIs are happy |
This is issue #7699 right? I thought we were ok with the current implementation? I guess we changed our opinion? I don't have a strong opinion. |
@qinhanmin2014 I think when looking at #7699 I checked the definition of PCA, and I don't think any of the books I looked at mentioned the bessel correction. In particular Elements of Statistical Learning, which is my standard reference. Can you provide a reference with your definition, other than prcomp? |
@amueller Thanks for your instruction, this issue is indeed related to #7699 (1)I think the problem is not whether to use bessel correction. The main point is that the variance is supposed to be equal to the eigenvalue of the covariance matrix. I propose an additional test to prove my implementation. What's more, from my perspective, even if we dont't find any strong evidence, it may be better to ensure that sklearn behave the same as others. |
Oh wow, I misread the covariance test. Never mind, that test is clearly correct. |
@qinhanmin2014 please update what's new and let's merge |
@amueller @agramfort Finished. I also revert #7843 since these statements in the document are no longer needed. Thanks. |
thanks :) |
…t-learn#9105) * fix pca explained_variance_ * fix fit_transform * fix test_whitening * fix IncrementalPCA * uncomment the test * improve test * make CI green * revert scikit-learn#7843 and add what's new * fix what's new
…t-learn#9105) * fix pca explained_variance_ * fix fit_transform * fix test_whitening * fix IncrementalPCA * uncomment the test * improve test * make CI green * revert scikit-learn#7843 and add what's new * fix what's new
…t-learn#9105) * fix pca explained_variance_ * fix fit_transform * fix test_whitening * fix IncrementalPCA * uncomment the test * improve test * make CI green * revert scikit-learn#7843 and add what's new * fix what's new
…t-learn#9105) * fix pca explained_variance_ * fix fit_transform * fix test_whitening * fix IncrementalPCA * uncomment the test * improve test * make CI green * revert scikit-learn#7843 and add what's new * fix what's new
…t-learn#9105) * fix pca explained_variance_ * fix fit_transform * fix test_whitening * fix IncrementalPCA * uncomment the test * improve test * make CI green * revert scikit-learn#7843 and add what's new * fix what's new
…t-learn#9105) * fix pca explained_variance_ * fix fit_transform * fix test_whitening * fix IncrementalPCA * uncomment the test * improve test * make CI green * revert scikit-learn#7843 and add what's new * fix what's new
…t-learn#9105) * fix pca explained_variance_ * fix fit_transform * fix test_whitening * fix IncrementalPCA * uncomment the test * improve test * make CI green * revert scikit-learn#7843 and add what's new * fix what's new
Reference Issue
Fixes #7699
Mentioned in #8541 (the 5th comment by IainStrachan on 11 May)
Mentioned in #8544 (the 6th comment by IainStrachan on 11 May)
What does this implement/fix? Explain your changes.
PCA.explained_variance_
is incorrectly implemented. The test is also wrong so that the mistake is not detected.The result of explained_variance_ is wrong.
result from python:



result from R:
result according to the definition:
Reason for the incorrect implementation.
(1)In the code,
explained_variance_ = (S ** 2) / n_samples
should beexplained_variance_ = (S ** 2) / (n_samples - 1)
. Because when using SVD to calculate PCA, we get the eigenvalue ofA'A
(A' means the transpose of A), but what we need is the eigenvalue ofA'A/(n_samples - 1)
, which is equivalent to the covariance matrix of A.(2)In the test, we simply use
np.var
to calculate the variance. That's incorrect. We should usenp.cor
and pick the elements on the diagonal or setddof=1
.Any other comments?
I provide two ways for the new test, one based on current version, another based on the definition.
Please take some time to consider it. Thanks.