PCA(whiten=True): unit variances != 1 (regression in 0.19) #11001

Closed
aldanor opened this issue Apr 20, 2018 · 8 comments

aldanor commented Apr 20, 2018

When passing whiten=True to PCA(), the component-wise variances are not unit (i.e. not exactly 1) as the documentation claims.

Unless I'm missing something, this is a regression presumably caused by #9105 (which appeared in v0.19)?

Example:

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, _ = make_classification(n_samples=1000, n_features=4, n_informative=3, n_redundant=0,
                           n_repeated=0, n_classes=2, random_state=10, shift=10., scale=10.)
print(1 - PCA(whiten=True).fit_transform(X).var(axis=0))  # ~0 for every component if variances are exactly 1

Under v0.18.2, this outputs

[ -4.44089210e-16   0.00000000e+00   1.11022302e-16   6.66133815e-16]

Under v0.19.0, this outputs

[ 0.001  0.001  0.001  0.001]
agramfort (Member) commented Apr 20, 2018 via email

aldanor commented Apr 20, 2018

Absolutely sure, same box, same environment, just different sklearn versions.

aldanor commented Apr 20, 2018

I’ll check the difference and report back, but I think it was something like 1e-7.

aldanor commented Apr 20, 2018

@agramfort I've updated the issue description. The variance is 1 - eps in the old version; in the new one it's 1 - 0.001, which doesn't seem right.

aldanor commented Apr 20, 2018

Looks like, more generally, the resulting component variance is now equal to 1 - 1 / n_samples.
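
For example (my own quick check, reusing X from the snippet above, where n_samples=1000):

from sklearn.decomposition import PCA

Xt = PCA(whiten=True).fit_transform(X)
print(Xt.var(axis=0))      # ~0.999 for each component under 0.19+
print(1 - 1 / X.shape[0])  # 0.999, i.e. 1 - 1/n_samples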

qinhanmin2014 (Member) commented
Thanks @aldanor for the issue. I think it depends on how we define the covariance matrix. Previously, we defined it as A'A/n_samples, so you got 1 with PCA(whiten=True).fit_transform(X).var(axis=0). Now we define it as A'A/(n_samples - 1), so you get 1 with PCA(whiten=True).fit_transform(X).var(axis=0, ddof=1). The change is clearly documented in what's new.
Also see #7699
Closing as a duplicate of #10137
Please reopen if you disagree (with more evidence, e.g. some references).
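
For reference, a minimal check of the ddof point above (my own sketch, reusing the X from the original example):

from sklearn.decomposition import PCA

# With the A'A/(n_samples - 1) definition, the ddof=1 variances are the
# ones that come out as 1 under 0.19+:
print(PCA(whiten=True).fit_transform(X).var(axis=0, ddof=1))  # ~[1. 1. 1. 1.]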

aldanor commented Apr 20, 2018

Ok, gotcha. I guess the docstring / the docs could be a bit clearer then, underlining the fact that it's normalized w.r.t. the ddof=1 variance.
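
Something along these lines, for instance (just a sketch of possible wording, not the actual docstring):

whiten : bool, optional (default False)
    When True, the transformed output has uncorrelated components with
    unit variance, where the variance is the unbiased estimate
    (normalized by n_samples - 1, i.e. ddof=1).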

qinhanmin2014 (Member) commented
A PR to improve the docstring is always welcome, but it might be better to wait for the outcome of #10137.
