[MRG+2] Fix LDA predict_proba() #11796

agamemnonc · 2018-08-11T19:24:10Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Fixes the predict_proba() method of LinearDiscriminantAnalysis.
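An if statement is used to distinguish between the binary and multi-class cases, because the decision_function method implemented in the LinearClassifierMixin class returns a 1-D array of shape (n_samples,) in the binary case but a 2-D array of shape (n_samples, n_classes) in the multi-class case.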
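To make the different output formats of decision_function concrete, here is a minimal NumPy sketch. It assumes, as a simplification of `LinearClassifierMixin`, that scores are computed as `X @ coef.T + intercept` and flattened to 1-D when there is a single row of coefficients. `linear_decision` and `proba_from_decision` are hypothetical helper names, not scikit-learn API. The 1-D (binary) scores go through a logistic sigmoid and are stacked as `[1 - p, p]`; the 2-D (multi-class) scores go through a softmax.

```python
import numpy as np


def linear_decision(X, coef, intercept):
    """Hypothetical stand-in for a linear decision function.

    Returns shape (n_samples,) when coef has a single row (binary case)
    and shape (n_samples, n_classes) otherwise.
    """
    scores = X @ coef.T + intercept
    return scores.ravel() if scores.shape[1] == 1 else scores


def proba_from_decision(decision):
    """Turn decision scores into class probabilities (hypothetical helper)."""
    if decision.ndim == 1:
        # Binary case: one score per sample, read as the log-odds of the
        # positive class. exp(-logaddexp(0, -d)) is a numerically stable sigmoid.
        p = np.exp(-np.logaddexp(0.0, -decision))
        return np.column_stack([1.0 - p, p])
    # Multi-class case: softmax over the class axis, shifted by the row
    # maximum so that np.exp cannot overflow.
    shifted = decision - decision.max(axis=1, keepdims=True)
    expd = np.exp(shifted)
    return expd / expd.sum(axis=1, keepdims=True)


X = np.array([[0.0, 1.0], [2.0, -1.0]])

# Binary: one coefficient row, so the decision is 1-D.
d_bin = linear_decision(X, np.array([[1.0, 0.5]]), np.array([-0.5]))
# Multi-class: three coefficient rows, so the decision is 2-D.
d_multi = linear_decision(X, np.eye(3, 2), np.zeros(3))

print(d_bin.shape, proba_from_decision(d_bin).shape)
print(d_multi.shape, proba_from_decision(d_multi).shape)
```

Treating the binary score as the log-odds of the second class means a two-column softmax over `[0, d]` and the sigmoid of `d` give the same probabilities, so both branches agree.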

Any other comments?

Copying from #6848:
Do we perhaps want to include additional tests checking the output of predict_proba for LDA and QDA both for the binary and multi-class cases?

jnothman

Yes, this needs non-regression tests

agamemnonc · 2018-08-20T17:33:17Z

@jnothman do you mean a numerical test that checks that the output probabilities of a toy dataset are as expected (e.g. something similar to test_qda_store_covariance() in test_discriminant_analysis.py)?

jnothman · 2018-08-20T23:09:56Z

I suppose so. Something that fails at master, works in this PR, and is illustrative of what we think correct behaviour should be.

agamemnonc · 2018-08-21T13:22:21Z

Could you please let me know if the non-regression test looks OK?
If so, I will change the prefix to MRG.

agamemnonc · 2018-10-19T14:13:01Z

Can I suggest that this PR be prioritised, given that it fixes a bug (#6848) that yields wrong prediction outcomes for a fairly popular classifier?

taalexander · 2018-12-01T22:59:26Z

Hello, I would just like to note that I am running into this bug in practice and would really appreciate a fix.

jnothman · 2018-12-03T23:04:29Z

Thanks for the pings, @agamemnonc, @taalexander. I'll look soon.

jnothman

This LGTM. I only wonder whether this should be encapsulated in a logistic utility function.

jnothman · 2018-12-04T01:42:19Z

sklearn/discriminant_analysis.py

+            # up to a multiplicative constant.
+            likelihood = np.exp(prob - prob.max(axis=1)[:, np.newaxis])
+            # compute posterior probabilities
+            return likelihood / likelihood.sum(axis=1)[:, np.newaxis]


Why not continue to do this in place (/=)?
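For reference, a minimal sketch of an in-place variant of the snippet above: shift each row of scores by its maximum (which leaves the result unchanged, since the likelihood is only defined up to a multiplicative constant, and keeps `np.exp` from overflowing), exponentiate, and normalise with `/=` rather than allocating a new array for the quotient. `softmax_inplace` is a hypothetical name, and the sketch assumes a 2-D float array that the caller is happy to have overwritten.

```python
import numpy as np


def softmax_inplace(scores):
    """Row-wise softmax computed in place on a 2-D float array.

    The input array is overwritten and also returned for convenience.
    """
    # Subtracting the row maximum leaves the softmax unchanged but keeps
    # every exponent <= 0, so np.exp cannot overflow.
    scores -= scores.max(axis=1)[:, np.newaxis]
    np.exp(scores, out=scores)
    # Normalise each row so it sums to one, again without a new array.
    scores /= scores.sum(axis=1)[:, np.newaxis]
    return scores


decision = np.array([[1000.0, 1001.0, 999.0], [0.0, 0.0, 0.0]])
proba = softmax_inplace(decision)
print(proba)
```

Both forms give the same probabilities; the in-place one only avoids one temporary array of shape (n_samples, n_classes).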

agamemnonc · 2018-12-05T10:23:57Z

Thanks @jnothman .

Yes, you are right. Hopefully the most recent commit is much cleaner; it is also consistent with the predict_proba method of LogisticRegression.

I have also fixed a typo in test_discriminant_analysis.py.

jnothman · 2018-12-09T04:41:55Z

Nice!

agamemnonc · 2019-01-14T12:04:04Z

Given the already merged #12931, there is now a conflict. (BTW, I believe this could have been avoided if this PR had been merged in a timely manner: this PR was submitted on December 5th, whereas #12931 was submitted on January 6th.)

Anyway, I suggest that the changes in the most recent commit be overwritten by the current PR, since the code in this PR inherits the method from the parent class (LinearClassifierMixin) rather than re-implementing it.

jnothman · 2019-01-15T02:19:34Z

Apologies for the poor management of related pull requests on our part. Please resolve conflicts with master so we can see the benefits of this PR more clearly.

agamemnonc · 2019-01-15T12:02:57Z

OK, no problem.

I have now provided a fix, since the test introduced in this PR (test_lda_predict_proba) previously failed.

Moreover, the suggested fix reuses code by inheriting from the parent class rather than re-implementing the method when n_classes == 2.

~~Some checks have failed, not sure why.~~ This was due to a missing check_is_fitted check and has now been fixed.

jnothman · 2019-01-16T22:20:35Z

Please add a |Fix| entry to the change log at doc/whats_new/v0.21.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

agamemnonc · 2019-01-17T12:13:45Z

> Please add a |Fix| entry to the change log at doc/whats_new/v0.21.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

Done, thanks for the instructions.

jnothman · 2019-02-28T05:28:55Z

It would be good to get this in 0.21. Thanks @agamemnonc if you can get to it.

agamemnonc · 2019-02-28T07:33:30Z

Apologies for the delay; I have been very busy with a submission recently. I will try to deal with this by the end of this week.

jnothman · 2019-02-28T18:14:50Z

Thank you :)

agamemnonc · 2019-03-01T12:46:44Z

OK folks, I think I have now implemented everything we have agreed on.

If tests pass, @agramfort, @jnothman, @glemaitre could you please have a final look and merge if happy or let me know otherwise.

glemaitre

LGTM

amueller · 2019-03-01T17:08:50Z

sklearn/tests/test_discriminant_analysis.py

+        n_samples=90000, centers=blob_centers, covariances=blob_stds,
+        random_state=42
+    )
+    lda = LinearDiscriminantAnalysis(solver='lsqr').fit(X, y)


Do we want to test this for the other solvers as well? How long does the test take, given the number of samples?

Good point.

Including the two other solvers adds only 0.09 s.

The test passes for solver=svd, but fails when solver=eigen.
This is probably related to #11727.

~~@amueller Shall we only include svd and lsqr for now in the tests and take a note in that other PR to update the tests to also include eigen when a fix is submitted?~~

I will try to also provide a fix for the eigen solver in this PR.

OK, I think I have now fixed that other issue (#11727), which was due to bad normalisation of the eigenvectors and was causing incorrect probabilities for the eigen solver. I have updated the rst file accordingly.

Now all three solvers are tested in the non-regression test.
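One way to obtain the reference values such a non-regression test compares against is the closed-form LDA posterior when the class means, shared covariance, and priors are known. The sketch below illustrates that idea under those assumptions; it is not the test in this PR, and `lda_posterior` is a hypothetical helper. With a shared covariance `S`, the log-posterior of class `k` is, up to a constant, `x^T S^-1 mu_k - 0.5 mu_k^T S^-1 mu_k + log pi_k`, so the posterior is a softmax of these linear scores. On enough data, a fitted model from any solver would be expected to produce probabilities close to these.

```python
import numpy as np


def lda_posterior(X, means, cov, priors):
    """Closed-form class posteriors for Gaussian classes sharing one covariance.

    X: (n_samples, n_features); means: (n_classes, n_features);
    cov: (n_features, n_features); priors: (n_classes,).
    """
    precision = np.linalg.inv(cov)
    # Row k of coef is (S^-1 mu_k)^T, since the precision matrix is symmetric.
    coef = means @ precision                                   # (n_classes, n_features)
    # Intercept: -0.5 mu_k^T S^-1 mu_k + log pi_k.
    intercept = -0.5 * np.sum(coef * means, axis=1) + np.log(priors)
    scores = X @ coef.T + intercept                            # (n_samples, n_classes)
    # Softmax over classes, shifted by the row max for numerical stability.
    scores -= scores.max(axis=1, keepdims=True)
    proba = np.exp(scores)
    return proba / proba.sum(axis=1, keepdims=True)


means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
cov = np.array([[2.0, 0.5], [0.5, 1.0]])
priors = np.array([1 / 3, 1 / 3, 1 / 3])
points = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 2.0]])
print(lda_posterior(points, means, cov, priors).round(3))
```

A quick sanity check on the reference itself: with two equally likely classes, the point halfway between their means gets probability 0.5 for each.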

agamemnonc · 2019-03-06T09:40:45Z

@glemaitre @jnothman you might need to re-approve, as I have modified the code in _solve_eigen. Its eigenvector normalisation was causing issues with the probabilities when using the eigen solver (cf. #11727). I am not sure why this normalisation was introduced there.
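The effect of the normalisation can be illustrated with a minimal NumPy sketch. It rests on several assumptions: the eigen solver's directions `V` solve the generalized symmetric problem `Sb v = lambda Sw v` with the convention `V.T @ Sw @ V = I` (the convention `scipy.linalg.eigh(Sb, Sw)` uses), the coefficients are then built from `V @ V.T`, and the normalisation in question rescales each eigenvector to unit Euclidean norm. Under that convention, a full set of eigenvectors satisfies `V @ V.T == inv(Sw)`. Unit-normalising the columns breaks this identity, which would distort the decision scores and hence the probabilities. The scatter matrices are random stand-ins, and `generalized_eigh` is a hypothetical Cholesky-based helper used only to keep the example NumPy-only.

```python
import numpy as np


def generalized_eigh(A, B):
    """Solve A v = lambda B v for symmetric A and symmetric positive-definite B.

    Reduces to a standard symmetric problem via the Cholesky factor B = L L^T.
    The returned eigenvectors satisfy V.T @ B @ V = I.
    """
    L = np.linalg.cholesky(B)
    L_inv = np.linalg.inv(L)
    C = L_inv @ A @ L_inv.T          # equivalent standard symmetric problem
    evals, W = np.linalg.eigh(C)     # W has orthonormal columns
    V = L_inv.T @ W                  # map back: v = L^-T w
    return evals, V


rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
Sw = M @ M.T + 4 * np.eye(4)         # stand-in within-class scatter (SPD)
class_means = rng.normal(size=(3, 4))
Sb = class_means.T @ class_means     # stand-in between-class scatter (rank <= 3)

evals, V = generalized_eigh(Sb, Sw)
# Sw-orthonormal eigenvectors reproduce the inverse within-class scatter.
print(np.allclose(V @ V.T, np.linalg.inv(Sw)))
# Assumed problematic step: rescale each eigenvector to unit Euclidean norm.
V_unit = V / np.linalg.norm(V, axis=0)
print(np.allclose(V_unit @ V_unit.T, np.linalg.inv(Sw)))
```

The identity holds because `V @ V.T = L^-T (W W^T) L^-1 = (L L^T)^-1`. Any per-column rescaling inserts a diagonal factor between `V` and `V.T` and breaks it.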

sklearn/tests/test_discriminant_analysis.py

agramfort

Besides my nitpick, LGTM.

agramfort · 2019-03-07T16:44:23Z

thx @agamemnonc

glemaitre · 2019-03-09T18:39:21Z

A bit late but thanks a lot @agamemnonc

agamemnonc · 2019-03-10T09:42:03Z

> A bit late but thanks a lot @agamemnonc

My pleasure. Thank you all for your help and feedback.

* fix LDA predict_proba() to handle binary and multi-class case
* test_lda_predict_proba non-regression test
* pep8 fix
* lda predict_proba refactoring
* Typo fix
* flake8 fix
* predict_proba check_is_fitted check
* update what's new rst file
* rename prob to decision
* include additional tests for predict_proba
* use allclose vs. assert_array_almost_equal
* fix indent
* replace len with size
* explicit computation for binary case
* fix style whats_new rst
* predict_proba new regression test
* give credit for regression test
* fix bug for eigen solution
* include all three solvers in predict_proba regression test
* update whats_new rst file
* fix minor formatting issue
* use scipy.linalg instead of np.linalg

This reverts commit deea1e8.


fix LDA predict_proba() to handle binary and multi-class case

206f1de

agamemnonc mentioned this pull request Aug 11, 2018

LinearDiscriminantAnalysis predict probability bug #6848

Closed

agamemnonc changed the title ~~Fix LDA predict_proba()~~ [WIP] Fix LDA predict_proba() Aug 11, 2018

jnothman reviewed Aug 11, 2018

test_lda_predict_proba non-regression test

824e0cc

pep8 fix

6560db8

agamemnonc changed the title ~~[WIP] Fix LDA predict_proba()~~ [MRG] Fix LDA predict_proba() Aug 24, 2018

amueller added the Bug label Oct 22, 2018

jnothman reviewed Dec 4, 2018

jnothman approved these changes Dec 4, 2018

agamemnonc added 2 commits December 5, 2018 10:16

lda predict_proba refactoring

c6e2b62

Typo fix

d7a1226

Merge master & predict_proba() fix (test previously failed)

163683e

agamemnonc added 2 commits January 16, 2019 10:56

flake8 fix

85d45b0

predict_proba check_is_fitted check

15b59e7

jnothman previously approved these changes Jan 16, 2019

update what's new rst file

cac71cc

agamemnonc added 4 commits March 1, 2019 11:34

fix style whats_new rst

7320883

predict_proba new regression test

9613fea

give credit for regression test

55e3d2a

Merge branch 'master' into lda_predict_proba_fix

2c27e9b

glemaitre approved these changes Mar 1, 2019

agamemnonc changed the title ~~[MRG+1] Fix LDA predict_proba()~~ [MRG+2] Fix LDA predict_proba() Mar 1, 2019

amueller reviewed Mar 1, 2019

Merge branch 'master' into lda_predict_proba_fix

ce85441

agamemnonc mentioned this pull request Mar 1, 2019

Linear Discriminant Analysis eigen solver questionable implementation #11727

Closed

agamemnonc added 4 commits March 4, 2019 10:20

fix bug for eigen solution

9d198b1

include all three solvers in predict_proba regression test

bd4370e

update whats_new rst file

480e108

fix minor formatting issue

06e8572

agramfort reviewed Mar 6, 2019

sklearn/tests/test_discriminant_analysis.py (outdated)

agramfort approved these changes Mar 6, 2019

use scipy.linalg instead of np.linalg

625c3f6

agramfort merged commit 4140657 into scikit-learn:master Mar 7, 2019

agamemnonc deleted the lda_predict_proba_fix branch March 7, 2019 17:00

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "[MRG+2] Fix LDA predict_proba() (scikit-learn#11796)"

1f6bed6

This reverts commit deea1e8.

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "[MRG+2] Fix LDA predict_proba() (scikit-learn#11796)"

69690d7

This reverts commit deea1e8.
