-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
PPCA algorithm returns inconsistent noise variance depending on truncated vs full solution #8541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I might just have run into this in #8544. PR welcome. |
Yeah there's a high chance this issue is causing the score method not to work, which also happened to me recently. I believe the fix is really easy though as all that needs to be done in the truncated case is: self.noise_variance = (total_var.sum() - self.explained_variance_.sum())/(D - n_components) where D is the full dimensionality and n_components the reduced dimensionality. |
@polmorenoc you are more than welcome to open a PR with your proposed changes. |
If you're going to put in a change, I also ran into this problem and I noticed (not that it makes a huge difference) that the computation of the eigenvalues from the singular values is also incorrect (line 473 of pca.py) in that it divides the square of the singular values by n_samples, and it should be (n_samples-1). |
In the full solution the returned value is (pca.fit_full):
self.noise_variance = explained_variance_[n_components:].mean()
whereas in the truncated case (pca.fit_truncated) you do:
self.noise_variance = (total_var.sum() - self.explained_variance_.sum())
So in the second case the total unexplained variance is returned, not the average. This is problematic not only because of the inconsistency, but because certain code in sklearn.decomposition.PCA, e.g. the computation of the precision matrix assumes that the self.noise_variance is the average noise variance for it to work as per Tipping and Bishop (http://www.miketipping.com/papers/met-mppca.pdf), which is not the case in the truncated PCA. Let me know if I'm missing something.
The text was updated successfully, but these errors were encountered: