10000 FIX/ENH?: PCA and RandomizedPCA now have inconsistent meanings of ``.components_`` · Issue #3930 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

FIX/ENH?: PCA and RandomizedPCA now have inconsistent meanings of .components_ #3930

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dengemann opened this issue Dec 3, 2014 · 14 comments
Milestone

Comments

@dengemann
Copy link
Contributor

@eickenberg introduced a new API / meaning of the components attribute to PCA: 3f6c61f

The RandomizedPCA code still stores whitened components instead of unit scale components. This should be changed, right?

cc @agramfort @ogrisel

@eickenberg
Copy link
Contributor

From a consistency standpoint, definitely.

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/decomposition/pca.py#L599 makes the components have non-unit length, which is not the case in the current PCA anymore and so is inconsistent.
In the PR in which I changed this we had consensus on wanting unit norm components #3107 (comment) .

Now I am wondering if there are other places where this needs to be changed.

@eickenberg
Copy link
Contributor

(While we are at it: The output of the SVD should rather be U, S, VT = randomized_svd ... instead of U, S, V = randomized_svd :))

@ogrisel
Copy link
Member
ogrisel commented Dec 3, 2014

(While we are at it: The output of the SVD should rather be U, S, VT = randomized_svd ... instead of U, S, V = randomized_svd :))

I don't think it's worth breaking the backward compat for this, as long as the shape of the returned arrays is properly documented.

@ogrisel
Copy link
Member
ogrisel commented Dec 3, 2014

About @dengemann's report: does this mean that the inverse_transform of RandomizedPCA is broken as well? We should have more tests.

I am not sure what was the motivation of 3f6c61f anymore and I am not sure it was worth breaking the backward compat (if it was not required to solve a bug).

@larsmans you might want to have look at this.

@ogrisel
Copy link
Member
ogrisel commented Dec 3, 2014

Alright I remember now that I re-read the discussion at #3107. I agree that this is more intuitive that components_ should always form an orthonormal basis for *PCA models. We need to make sure that IncrementalPCA, RandomizedPCA and PCA are all consistent and we should properly document the change in the API change of whats_new.rst before the 0.16 release.

8000
@ogrisel ogrisel added this to the 0.16 milestone Dec 3, 2014
@giorgiop
Copy link
Contributor

Is this still open? I am happy to take a look at this since I am working of some of those classes.

@amueller amueller modified the milestones: 0.17, 0.16 Sep 11, 2015
@amueller
Copy link
Member

Yes, it is still open and needs a fix.
The problem is that I'm not sure what the deprecation path is for this. We probably don't want to just change the behavior.

@eickenberg
Copy link
Contributor

We probably don't want to just change the behavior.
+1

when I changed the behaviour of PCA, fixing what clearly was a bug, there
was feedback about how it broke a ZCA workflow ... So the deprecation needs
to be made very clear.

On Fri, Sep 11, 2015 at 5:46 PM, Andreas Mueller notifications@github.com
wrote:

Yes, it is still open and needs a fix.
The problem is that I'm not sure what the deprecation path is for this. We
probably don't want to just change the behavior.


Reply to this email directly or view it on GitHub
#3930 (comment)
.

@ogrisel
Copy link
Member
ogrisel commented Sep 22, 2015

Actually, if we add the randomized_svd solver of RandomizedPCA to the PCA class we can just deprecate the RandomizedPCA class and leave it unchanged: this way we do not introduce any backward compatibility issue.

@giorgiop
Copy link
Contributor

It depends if the new PCA will make it to next release. Should I focus on that then?

@giorgiop
Copy link
Contributor

I should have fixed this in #5299. Could anyone confirm this is all right now?

@amueller amueller modified the milestones: 0.17, 0.18 Nov 2, 2015
@giorgiop
Copy link
Contributor

I should have fixed this in #5299. Could anyone confirm this is all right now?

Changes are now in master. @dengemann @eickenberg can anyone confirm we can close this issue?

@dengemann
Copy link
Contributor Author

Great, will check it out.

@MechCoder
Copy link
Member

I confirm that we now store the unscaled components.

pca = PCA(svd_solver="full", n_components=3)
pca.fit(X)
np.sum(pca.components_**2, axis=1)
array([ 1.,  1.,  1.])
pca = PCA(svd_solver="randomized", n_components=3)
pca.fit(X)
np.sum(pca.components_**2, axis=1)
array([ 1.,  1.,  1.])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants
0