[MRG+1] Feature or Bug? Whitened PCA does not inverse_transform properly and is even documented that way by eickenberg · Pull Request #3107 · scikit-learn/scikit-learn · GitHub

[MRG+1] Feature or Bug? Whitened PCA does not inverse_transform properly and is even documented that way #3107


Closed

Conversation

eickenberg
Contributor

The PCA.inverse_transform docstring even explicitly states that the inverse transform after whitening is inexact, because the necessary rescaling to fall back into the right space is not done. Is there a specific reason for this? One would think that the expected behaviour of an inverse_transform should be that it maps to the closest possible point in input space given the incurred information loss. Since with (full rank) PCA one can keep all the information, the inverse transform should be the true inverse.

Any opinions on this?

My main concern for changing this is that this behaviour is documented and thus possibly expected by some users.
Making the PCA object do a true inverse is as easy as adding 2 lines to the inverse_transform as visible in the diff.
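To make the expectation concrete, here is a minimal round-trip sketch (illustrative only, with made-up data; not code from this PR):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(100, 5) * np.array([5.0, 3.0, 2.0, 1.0, 0.5])  # anisotropic toy data

pca = PCA(n_components=5, whiten=True).fit(X)
Y = pca.transform(X)               # whitened scores
X_back = pca.inverse_transform(Y)  # expected: map back to (approximately) X

# Pre-fix: the whitening is never undone, so this error is large.
# With a true inverse (what this PR proposes), the full-rank round trip is
# exact up to floating point.
print(np.abs(X - X_back).max())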

@@ -161,8 +161,7 @@ def test_pca_inverse():
     pca.fit(X)
     Y = pca.transform(X)
     Y_inverse = pca.inverse_transform(Y)
-    relative_max_delta = (np.abs(X - Y_inverse) / np.abs(X).mean()).max()
-    assert_almost_equal(relative_max_delta, 0.11, decimal=2)
+    assert_almost_equal(X, Y_inverse, decimal=3)
Contributor

would this one fail on master?

Contributor Author

yes, I wrote it before coding the fix, and it failed

Contributor Author

Or rather "fix", because I am still not sure whether the current behaviour can be useful or not

Contributor

I'm surprised about the 3 decimals of precision

Member

Or rather "fix", because I am still not sure whether the current behaviour can be useful or not

Suppose that you have computed OLS weights in a reduced space, as in a PCR (principal component regression), and you want to visualize them in the full space. That would be a use case.

I must admit that I would find it fishy to use whiten=True in such a scenario.
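A rough sketch of that PCR use case, with made-up data (not code from this PR): the coefficients learned on the reduced scores are mapped back to the input space through the components.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.randn(200, 20)
y = X @ rng.randn(20) + 0.1 * rng.randn(200)

pca = PCA(n_components=5)      # no whitening, as suggested above
Z = pca.fit_transform(X)       # reduced-space design matrix
ols = LinearRegression().fit(Z, y)

# Weights learned on the principal components, expressed back in the
# original feature space for visualization.
w_full = ols.coef_ @ pca.components_
print(w_full.shape)            # (20,)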

Contributor Author

I just copied the same assertion statement from above, where it is tested with whiten=False. It should work at higher precision

Contributor Author

But wouldn't transforming back into the full space imply an un-whitening as well?

Member

But wouldn't transforming back into the full space imply an un-whitening as well?

Hard to say. I would always write a PCR without whitening, so things are better posed.

@amueller
Member

I find the current behavior unintuitive, to say the least, and agree with @eickenberg. Ugly proposed solution: introduce a parameter that changes the behavior, make the default False and add a warning that the default will change to True. Then a release later change the default, and one more release to remove the parameter. Hurray?

Or we could just break it ;)
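For reference, the deprecation dance described above would look roughly like this (the parameter name unwhiten is hypothetical, purely for illustration; this is not scikit-learn code):

import warnings


class SomePCA:
    # Toy sketch of the proposed deprecation cycle.

    def __init__(self, unwhiten="warn"):
        self.unwhiten = unwhiten

    def inverse_transform(self, Y):
        unwhiten = self.unwhiten
        if unwhiten == "warn":
            warnings.warn(
                "The default of `unwhiten` will change from False to True "
                "in a future release; set it explicitly to silence this warning.",
                FutureWarning,
            )
            unwhiten = False  # keep the old behaviour for now
        # ... compute the inverse, undoing the whitening only if unwhiten is True
        return Y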

Notes
-----
If whitening is enabled, inverse_transform does not compute the
exact inverse operation as transform.
Member

This comment dates back to 2011, so it seems this behavior has always been there.

@GaelVaroquaux do you remember what was the rationale for this?

Member

@GaelVaroquaux do you remember what was the rationale for this?

No, and honestly if we cannot find one, I do not mind changing this.

Member

The question is: do we consider this a bug and just fix it and document it in what's new?

Contributor Author

I guess what makes everybody hesitant is that it was explicitly documented. Are there general notes on what inverse_transform should do? The rules I have been able to identify are something like inverse_transform(transform(point)) should be as close as possible to point for a metric appropriate to the situation and within a reasonable tradeoff of effort/closeness (in case this actually involves solving another optimization problem).

Member

Can you ping the mailing list? If everybody agrees it's a bug I would treat it this way.

Member

I think it's a bug as well.

@seanv507

Michael, could you do the same fix for randomized PCA?

sean

@eickenberg
Contributor Author

Yes, I can take care of changing this in all relevant modules if it is decided that this behaviour is wanted. In the meantime, I posted a fix for this on Stack Overflow which can be used without changing the scikit-learn code: http://stackoverflow.com/questions/23254700/inversing-pca-transform-with-sklearn-with-whiten-true/23263923#23263923
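The workaround boils down to undoing the whitening by hand. A sketch along those lines (it assumes unit-norm components_ and whitened scores equal to the projections divided by sqrt(explained_variance_), i.e. the convention this PR settles on; the linked answer targets the pre-fix internals, whose scaling differs):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(50, 4)

pca = PCA(whiten=True).fit(X)
Y = pca.transform(X)

# Rescale the scores by the per-component standard deviation, project back
# onto the principal axes, and re-add the mean.
X_back = (Y * np.sqrt(pca.explained_variance_)) @ pca.components_ + pca.mean_

print(np.allclose(X, X_back))  # True for a full-rank PCA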

@kastnerkyle
Member

In my Incremental PCA PR (#3285) there is an introduction of a _BasePCA module. Though the other PCA modules have not yet been converted to use this as the base class, that is the direction we discussed in the PR comments. This is a great chance to a) unify common functions and b) provide deprecation support for the whole whiten/unwhiten transform issue in one place.

@eickenberg
Contributor Author

Totally agree. Also, my question to the mailing list about whether there was any use case for not inverting the whitening didn't cause anybody to speak up and show one. In this sense, it would be good to push towards unification, and consequently, consistency.


@amueller
Member

So does this wait for #3285? Or do we want to merge this first?

@seanv507

One last question/request: currently whiten scales the components_ variable (in order to rescale the transformed data). Should this be done? Or should the components_ vectors always have norm 1 (whether we whiten or not)?

@GaelVaroquaux
Member

One last question/request: currently whiten scales the components_ variable (in order to rescale the transformed data). Should this be done? Or should the components_ vectors always have norm 1 (whether we whiten or not)?

I think that the components should always have norm 1. We expect them to be an orthonormal basis.

@eickenberg
Contributor Author

+1


@kastnerkyle
Member

If we merge my PR (#3285) then there will be some work to get more of the other PCA algorithms using the common methods of _BasePCA. That would be the ideal time to handle deprecations IMO.



@amueller amueller added this to the 0.15.1 milestone Jul 18, 2014
@ogrisel
Member
ogrisel commented Jul 29, 2014

One last question/request: currently whiten scales the components_ variable (in order to rescale the transformed data). Should this be done? Or should the components_ vectors always have norm 1 (whether we whiten or not)?

I think that the components should always have norm 1. We expect them to be an orthonormal basis.

+1 as well. This is another bug that should be tackled by this PR and properly documented. As this is quite an important semantic change, I would be in favor of not including this fix in 0.15.1 but rather scheduling it for 0.16.0 instead. Will change the milestone.

@ogrisel ogrisel closed this Jul 29, 2014
@ogrisel ogrisel reopened this Jul 29, 2014
@ogrisel
Member
ogrisel commented Jul 29, 2014

oops, wrong comment button again. Sorry.

@ogrisel ogrisel modified the milestones: 0.15.1, 0.16 Jul 29, 2014
@coveralls

Coverage Status

Coverage increased (+0.22%) when pulling 5f5ef35 on eickenberg:pca_whiten_inverse_transform into f7e9527 on scikit-learn:master.

@eickenberg
Contributor Author

Components are now always of unit length.
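A quick check of the new invariant (a sketch, not a test from this PR):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(50, 4)

for whiten in (False, True):
    pca = PCA(whiten=whiten).fit(X)
    # components_ are now the unit-norm principal axes regardless of whiten;
    # the scaling lives in transform/inverse_transform instead.
    print(whiten, np.linalg.norm(pca.components_, axis=1))  # all ~1.0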

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling 74e25ab on eickenberg:pca_whiten_inverse_transform into 25f72c4 on scikit-learn:master.

-        if self.whiten:
-            components_ = components_ * np.sqrt(exp_var[:, np.newaxis])
+        # if self.whiten:
+        #     components_ = components_ * np.sqrt(exp_var[:, np.newaxis])
Member

Please delete the commented out lines.

@ogrisel
Member
ogrisel commented Aug 1, 2014

This looks good to me; please add an entry in the what's new file. This change needs to be clearly documented.

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling b0ee936 on eickenberg:pca_whiten_inverse_transform into 25f72c4 on scikit-learn:master.

@coveralls

Coverage Status

Coverage increased (+0.0%) when pulling 7128600 on eickenberg:pca_whiten_inverse_transform into 25f72c4 on scikit-learn:master.

@eickenberg
Contributor Author

Do I need to squash these?

@ogrisel
Member
ogrisel commented Aug 1, 2014

Yes please. It helps to have a single commit per change that has a meaning on its own.

@eickenberg
Contributor Author

Sorry for being slow: All of them into one, or, as I just pushed, into semantically coherent subunits? (I'm getting better at rebase -i :))

@ogrisel
Member
ogrisel commented Aug 1, 2014

This is good, although you could prefix the commit messages with ENH FIX TST or DOC depending on their content.

@ogrisel
Member
ogrisel commented Aug 1, 2014

You need to rebase on top of current master though. GitHub says that you are based on an old master.

@eickenberg
Contributor Author

I was rebased on a master commit of earlier today. Now it should be the latest one. Also added the descriptors to the commits. Thanks for the patience!

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling 3d2be3b on eickenberg:pca_whiten_inverse_transform into 07560e4 on scikit-learn:master.

@ogrisel
Member
ogrisel commented Aug 1, 2014

We usually don't put square brackets in the commit messages (just in the PR headers for some reason), but this is fine. +1 for merge on my side.

@ogrisel ogrisel changed the title Feature or Bug? Whitened PCA does not inverse_transform properly and is even documented that way [MRG+1] Feature or Bug? Whitened PCA does not inverse_transform properly and is even documented that way Aug 1, 2014
@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling 443312d on eickenberg:pca_whiten_inverse_transform into bf43d5b on scikit-learn:master.

@agramfort
Member

Besides the rebase problem, LGTM.

The Travis failure is unrelated.

@ogrisel
Member
ogrisel commented Aug 1, 2014

@eickenberg do you want me to cherry-pick your commits and push them to master or do you want to try to rebase one more time to get rid of my "FIX #3485: class_weight='auto' on SGDClassifier" spurious commit in the middle?

@eickenberg
Contributor Author

Now there are only the 4 commits that I wrote. Unless I messed up badly, this should already be rebased on the latest master and correct in content.

@MechCoder
Member

The Travis failure should go away if this is rebased on master; there was a bug fix.

@coveralls

Coverage Status

Coverage decreased (-0.0%) when pulling 69604d5 on eickenberg:pca_whiten_inverse_transform into 22cafa6 on scikit-learn:master.

@kastnerkyle
Member

This looks pretty good to me - I am a definite +1. It will help for #3285 and future unification of the common code in PCA.

@arjoly
Member
arjoly commented Aug 6, 2014

Reviewers are happy. I am going to merge by rebase.

@arjoly
Member
arjoly commented Aug 6, 2014

Merged by rebase.

@arjoly arjoly closed this Aug 6, 2014
@arjoly
Member
arjoly commented Aug 6, 2014

Thanks @eickenberg. Is there an issue ticket for this?

@eickenberg
Contributor Author

I think I just used the PR as the issue ticket ...

Thanks everybody for all the helpful comments!


@eyaler
eyaler commented Mar 15, 2015

ZCA is a useful transformation defined as PCA + whitening + inverse rotation. This change breaks it.

@kastnerkyle
Member

ZCA is mathematically different though (bias term, etc.). So if ZCA is deemed useful it should probably be a separate class IMO instead of piggybacking on internals of PCA.


@eickenberg
Contributor Author

But this is PCA, not ZCA, and its inverse transform should bring the signal back as close as possible to the original (the measure here is the l2 norm). You can still make a ZCA out of this, though, by using the pca.explained_variance_ terms appropriately.


@kastnerkyle
Member

Agreed. I am not really sure how this change would break (broke?) a ZCA implementation, tbh, beyond the change of expected output. All the pieces are still there.


@eickenberg
Contributor Author

Well, the way it was before was that the PCA.inverse_transform forgot to unwhiten if whiten=True.

Also, it scaled the lengths of the components, although orthonormality is expected.

Exploiting this, you could do pca = PCA(whiten=True); pca.inverse_transform(pca.transform(data)), and it basically gives you ZCA. With the fix, it gives you the signal back, which is what is expected.

To make a ZCA using PCA, you can do pca = PCA(whiten=False); t = pca.fit_transform(data); zca = pca.inverse_transform(t / np.sqrt(pca.explained_variance_)), give or take an array shape and a square root, as sketched below.
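A sketch of that recipe (illustrative only; it relies on the post-fix convention of unit-norm components_, and whitens the scores rather than the raw data — using inverse_transform instead of the dot product below would additionally re-add pca.mean_):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(200, 10)

pca = PCA(whiten=False).fit(X)
T = pca.transform(X)  # scores in the PCA basis

# Whiten the scores, then rotate back into the original feature space:
# this is (mean-removed) ZCA whitening.
X_zca = (T / np.sqrt(pca.explained_variance_)) @ pca.components_

# The ZCA-whitened data has (approximately) the identity covariance matrix.
print(np.allclose(np.cov(X_zca, rowvar=False), np.eye(10)))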
