LinearDiscriminantAnalysis predict_proba should use softmax #5149
Comments
AFAIK there are several variants of LDA in the multiclass case. @cle1109 @kazemakase Which one do we implement?
We actually didn't change anything in […]. Anyway, the code is identical to the one in logistic regression, so I guess we're doing the same thing here. Since @MechCoder is working on a fix, we could use it in LDA as well.
The way to compute predict_proba may depend on the variant of multiclass LDA we're using. Since there are several variants, the current code may or may not be correct. @amueller What makes you think softmax is the right way to compute predict_proba here? Any reference?
I see. I would have used […]. Regarding the LDA variant, we're computing discriminant functions for each class. The class with the highest value will be selected for a particular sample (this is inherited from […]).
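For illustration only (the function name `softmax_proba` and the array shapes are mine, not scikit-learn's API): a softmax over the per-class discriminant scores would look roughly like the sketch below. Note that it does not change which class has the highest score per sample, so predict is unaffected.

```python
import numpy as np

def softmax_proba(scores):
    """Turn (n_samples, n_classes) discriminant scores into probabilities
    with a numerically stable softmax. Illustrative sketch only."""
    scores = np.asarray(scores, dtype=float)
    scores = scores - scores.max(axis=1, keepdims=True)  # guard against overflow
    np.exp(scores, out=scores)                           # exponentiate
    scores /= scores.sum(axis=1, keepdims=True)          # rows sum to one
    return scores

# predict() would still be the argmax over the raw scores, e.g.
# y_pred = classes[np.argmax(scores, axis=1)]
```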
The point is to save a memory allocation by doing the operation in place, but I agree this is not a big deal.
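As an aside, a minimal illustration of the in-place pattern being referred to (the array below is made up):

```python
import numpy as np

scores = np.random.rand(4, 3)   # illustrative (n_samples, n_classes) array

probas = np.exp(scores)         # out of place: allocates a new result array
np.exp(scores, out=scores)      # in place: reuses the existing buffer,
                                # saving one temporary of the same shape
```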
From C. Bishop's book: "There are now many possible choices of criterion (Fukunaga, 1990)"
This sounds to me like the current code (compute each proba then normalize) is correct then.
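If I read the thread correctly, "compute each proba then normalize" amounts to something like the sketch below (illustrative names, not the actual implementation): each class score is passed through a logistic sigmoid independently, and the rows are then renormalized to sum to one.

```python
import numpy as np

def ovr_normalized_proba(scores):
    """OvR-style probabilities from (n_samples, n_classes) decision values:
    per-class sigmoid, then renormalize each row. Illustrative sketch only."""
    prob = 1.0 / (1.0 + np.exp(-np.asarray(scores, dtype=float)))  # per-class sigmoid
    prob /= prob.sum(axis=1, keepdims=True)                        # renormalize rows
    return prob
```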
I don't have my Bishop here; we should check the reference.
It uses an OVR normalization for multi-class for some unknown (to me) reason.
See #5134.