8000 [MRG+2] Incorrect implementation of explained_variance_ in PCA by qinhanmin2014 · Pull Request #9105 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG+2] Incorrect implementation of explained_variance_ in PCA #9105

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 21, 2017
Merged

[MRG+2] Incorrect implementation of explained_variance_ in PCA #9105

merged 9 commits into from
Jun 21, 2017

Conversation

qinhanmin2014
Copy link
Member
@qinhanmin2014 qinhanmin2014 commented Jun 11, 2017

Reference Issue

Fixes #7699
Mentioned in #8541 (the 5th comment by IainStrachan on 11 May)
Mentioned in #8544 (the 6th comment by IainStrachan on 11 May)

What does this implement/fix? Explain your changes.

PCA.explained_variance_ is incorrectly implemented. The test is also wrong so that the mistake is not detected.

The result of explained_variance_ is wrong.

result from python:
04
result from R:
01
result according to the definition:
02

Reason for the incorrect implementation.

(1)In the code, explained_variance_ = (S ** 2) / n_samples should be explained_variance_ = (S ** 2) / (n_samples - 1). Because when using SVD to calculate PCA, we get the eigenvalue of A'A(A' means the transpose of A), but what we need is the eigenvalue of A'A/(n_samples - 1) , which is equivalent to the covariance matrix of A.
(2)In the test, we simply use np.var to calculate the variance. That's incorrect. We should use np.cor and pick the elements on the diagonal or set ddof=1 .

Any other comments?

I provide two ways for the new test, one based on current version, another based on the definition.
Please take some time to consider it. Thanks.

@qinhanmin2014 qinhanmin2014 changed the title [WIP] Incorrect implementation of explained_variance_ in PCA [MRG] Incorrect implementation of explained_variance_ in PCA Jun 11, 2017

# X_rpca = rpca.transform(X)
# assert_array_almost_equal(rpca.explained_variance_,
# expected_result, decimal=1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uncomment or remove

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agramfort Thanks. From my perspective, I think the test is necessary because it is based on the original definition, so I uncomment it, but I would like to know your advise.

@agramfort
Copy link
Member

+1 for MRG when CIs are happy

@agramfort agramfort changed the title [MRG] Incorrect implementation of explained_variance_ in PCA [MRG+1] Incorrect implementation of explained_variance_ in PCA Jun 12, 2017
@jnothman jnothman added this to the 0.19 milestone Jun 18, 2017
@amueller
Copy link
Member

This is issue #7699 right? I thought we were ok with the current implementation? I guess we changed our opinion? I don't have a strong opinion.

@amueller
Copy link
Member

@qinhanmin2014 I think when looking at #7699 I checked the definition of PCA, and I don't think any of the books I looked at mentioned the bessel correction. In particular Elements of Statistical Learning, which is my standard reference. Can you provide a reference with your definition, other than prcomp?

@qinhanmin2014
Copy link
Member Author
qinhanmin2014 commented Jun 19, 2017

@amueller Thanks for your instruction, this issue is indeed related to #7699

(1)I think the problem is not whether to use bessel correction. The main point is that the variance is supposed to be equal to the eigenvalue of the covariance matrix. I propose an additional test to prove my implementation.
(2)It seems that Elements of Statistical Learning simply use SVD to calculate PCA and say nothing about the covariance matrix and the calculation of variance.
(3)R, Matlab(princomp) implemented in the same way as the pull request.
(4)Many papers and books also implemented in the same way as the pull request.(e.g. https://arxiv.org/abs/1404.1100 (P11-P12), Machine Learning in Action (https://github.com/pbharrin/machinelearninginaction/blob/master/Ch13/pca.py)). but I currently can't find one which implemented in the same way as sklearn.

What's more, from my perspective, even if we dont't find any strong evidence, it may be better to ensure that sklearn behave the same as others.

@amueller
Copy link
Member

Oh wow, I misread the covariance test. Never mind, that test is clearly correct.
LGTM. Could you please add an entry to whats_new.rst in the bugfix section?

@amueller amueller changed the title [MRG+1] Incorrect implementation of explained_variance_ in PCA [MRG+2] Incorrect implementation of explained_variance_ in PCA Jun 20, 2017
@agramfort
Copy link
Member

@qinhanmin2014 please update what's new and let's merge

@qinhanmin2014
Copy link
Member Author

@amueller @agramfort Finished. I also revert #7843 since these statements in the document are no longer needed. Thanks.

@amueller amueller merged commit 2a36ff1 into scikit-learn:master Jun 21, 2017
@amueller
Copy link
Member

thanks :)

@qinhanmin2014 qinhanmin2014 deleted the my-feature-1 branch June 21, 2017 03:39
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
…t-learn#9105)

* fix pca explained_variance_

* fix fit_transform

* fix test_whitening

* fix IncrementalPCA

* uncomment the test

* improve test

* make CI green

* revert scikit-learn#7843 and add what's new

* fix what's new
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
…t-learn#9105)

* fix pca explained_variance_

* fix fit_transform

* fix test_whitening

* fix IncrementalPCA

* uncomment the test

* improve test

* make CI green

* revert scikit-learn#7843 and add what's new

* fix what's new
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
…t-learn#9105)

* fix pca explained_variance_

* fix fit_transform

* fix test_whitening

* fix IncrementalPCA

* uncomment the test

* improve test

* make CI green

* revert scikit-learn#7843 and add what's new

* fix what's new
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
…t-learn#9105)

* fix pca explained_variance_

* fix fit_transform

* fix test_whitening

* fix IncrementalPCA

* uncomment the test

* improve test

* make CI green

* revert scikit-learn#7843 and add what's new

* fix what's new
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
…t-learn#9105)

* fix pca explained_variance_

* fix fit_transform

* fix test_whitening

* fix IncrementalPCA

* uncomment the test

* improve test

* make CI green

* revert scikit-learn#7843 and add what's new

* fix what's new
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
…t-learn#9105)

* fix pca explained_variance_

* fix fit_transform

* fix test_whitening

* fix IncrementalPCA

* uncomment the test

* improve test

* make CI green

* revert scikit-learn#7843 and add what's new

* fix what's new
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
…t-learn#9105)

* fix pca explained_variance_

* fix fit_transform

* fix test_whitening

* fix IncrementalPCA

* uncomment the test

* improve test

* make CI green

* revert scikit-learn#7843 and add what's new

* fix what's new
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No Bessel correction in PCA
4 participants
0