Non-standard density estimation scores #8327

rfeinman · 2017-02-09T19:04:54Z

There is a non-standard normalization step performed when computing the log probability scores for KernelDensity. Using the data dimensionality to normalize the partition function of the kernel is very ad-hoc. If the user's data has high dimensionality they will observe absurdly small log probability scores, and will be very confused. I have traced the issue back to _log_kernel_norm in neighbors/binary_tree.pxi:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/neighbors/binary_tree.pxi#L465-L493

At the very least, this should be explained in the documentation if the code is to remain as it currently stands.
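To make the dimensionality effect concrete, here is a minimal sketch, assuming the Gaussian kernel with Euclidean distance. It relies on the standard normalization of a d-dimensional Gaussian with bandwidth h, whose log is -0.5 * d * log(2π) - d * log(h). The helper name `gaussian_log_norm` is hypothetical and is not a scikit-learn function. Every training point sits at the origin, so the log-density at the origin reduces to that constant.

```python
import numpy as np
from sklearn.neighbors import KernelDensity


def gaussian_log_norm(d, h):
    """Log of the normalization constant of a d-dimensional Gaussian kernel
    with bandwidth h (hypothetical helper, for illustration only)."""
    return -0.5 * d * np.log(2 * np.pi) - d * np.log(h)


h = 1.0
log_density_at_origin = {}
for d in (1, 10, 100):
    # All training points are at the origin, so the estimated density at the
    # origin is just the peak value of one normalized kernel.
    X = np.zeros((5, d))
    kde = KernelDensity(kernel="gaussian", bandwidth=h).fit(X)
    log_density_at_origin[d] = kde.score_samples(np.zeros((1, d)))[0]
    print(d, log_density_at_origin[d], gaussian_log_norm(d, h))
```

The score drops roughly linearly with d even at the best possible query point, which is the behavior reported above.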

agramfort · 2017-02-10T08:10:33Z

@jakevdp any opinion? thx

jakevdp · Apr 4, 2017

Sorry for the belated response. I don't think it's ad-hoc to normalize a kernel density estimate so that the result is actually a density. But perhaps the docs could be clarified to say explicitly that the KDE is returning a probability density rather than a probability.
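As a rough illustration of the density-versus-probability distinction, the sketch below fits a 1-D KDE on tightly clustered synthetic data and integrates the estimate numerically. The data, bandwidth, and grid are arbitrary choices made for this example, not values from the issue. The integral should come out close to 1, while individual density values are free to exceed 1.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
# Tightly clustered 1-D data (synthetic, chosen only for illustration).
X = rng.normal(loc=0.0, scale=0.05, size=(200, 1))

kde = KernelDensity(kernel="gaussian", bandwidth=0.02).fit(X)

# score_samples returns the log-density, so exponentiate to get the density.
grid = np.linspace(-1.0, 1.0, 20001)
dx = grid[1] - grid[0]
density = np.exp(kde.score_samples(grid[:, None]))

# Riemann-sum approximation of the integral of the density over the grid.
total_mass = np.sum(density) * dx
peak_density = density.max()

print("integral of density:", total_mass)
print("largest density value:", peak_density)
```

A density value above 1 is not an error here: only the integral over a region is a probability.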

Gunkkk · 2019-08-26T15:18:11Z

> Sorry for the belated response. I don't think it's ad-hoc to normalize a kernel density estimate so that the result is actually a density. But perhaps the docs could be clarified to say explicitly that the KDE is returning a probability density rather than a probability.

Then is there a function that computes a probability, please?

jakevdp · 2019-08-26T16:21:27Z

The result is a probability density, which is the only probability measure that makes sense for a continuous field.
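For anyone who needs an actual probability, one option is to integrate the density over a region of interest. The sketch below does this in 1-D for an interval, assuming a Gaussian kernel. The function names `interval_probability_numeric` and `interval_probability_exact` are hypothetical helpers written for this example, not part of scikit-learn. The closed form uses the fact that a Gaussian KDE is an equal-weight mixture of normal distributions centered on the training points.

```python
import numpy as np
from scipy.stats import norm
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))  # synthetic 1-D sample
h = 0.3
kde = KernelDensity(kernel="gaussian", bandwidth=h).fit(X)


def interval_probability_numeric(kde, a, b, n=10001):
    """Approximate P(a < x < b) by trapezoidal integration of the density."""
    grid = np.linspace(a, b, n)
    density = np.exp(kde.score_samples(grid[:, None]))
    dx = grid[1] - grid[0]
    return dx * (density.sum() - 0.5 * (density[0] + density[-1]))


def interval_probability_exact(X, h, a, b):
    """P(a < x < b) for a 1-D Gaussian KDE: the average, over training
    points, of the normal probability mass each kernel puts on (a, b)."""
    x = X[:, 0]
    return np.mean(norm.cdf((b - x) / h) - norm.cdf((a - x) / h))


p_numeric = interval_probability_numeric(kde, -1.0, 1.0)
p_exact = interval_probability_exact(X, h, -1.0, 1.0)
print(p_numeric, p_exact)
```

In higher dimensions the same idea applies, but the integral over a region generally needs Monte Carlo or another numerical method.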

cmarmo · 2020-09-29T16:03:22Z

Fixed in #11275

amueller added the Easy, Documentation, and help wanted labels May 22, 2018

haroldfox mentioned this issue Jun 15, 2018

Kde doc fix #11275

Merged

cmarmo closed this as completed Sep 29, 2020

cmarmo removed the help wanted label Sep 29, 2020
