Non-standard density estimation scores · Issue #8327 · scikit-learn/scikit-learn · GitHub

Non-standard density estimation scores #8327


Closed
rfeinman opened this issue Feb 9, 2017 · 5 comments
Labels
Documentation, Easy (well-defined and straightforward way to resolve)

Comments

@rfeinman
rfeinman commented Feb 9, 2017

There is a non-standard normalization step performed when computing the log probability scores for KernelDensity. Using the data dimensionality to normalize the partition function of the kernel is very ad-hoc. If the user's data has high dimensionality they will observe absurdly small log probability scores, and will be very confused. I have traced the issue back to _log_kernel_norm in neighbors/binary_tree.pxi:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/neighbors/binary_tree.pxi#L465-L493

At the very least, this should be explained in the documentation if the code is to remain as it stands.
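To make the dimensionality effect concrete, here is a minimal sketch of the log normalization constant of a Gaussian kernel in `d` dimensions, assuming the standard formula `-(d/2) * log(2 * pi * h^2)` for bandwidth `h`. The helper name `gaussian_kernel_log_norm` is hypothetical and hand-written for illustration; it is not scikit-learn's `_log_kernel_norm`. With a single training point, this constant is also the log-density a Gaussian KDE would report at that point, so it shows how the score scales linearly with dimension.

```python
import math


def gaussian_kernel_log_norm(dim, bandwidth):
    """Log of the constant that makes a Gaussian kernel integrate to 1.

    Hypothetical helper for illustration: for an isotropic Gaussian kernel
    with bandwidth h in `dim` dimensions the constant is (2*pi*h^2)^(-dim/2).
    """
    return -0.5 * dim * math.log(2.0 * math.pi * bandwidth ** 2)


if __name__ == "__main__":
    # The log normalizer grows linearly in magnitude with the dimension,
    # and its sign depends on whether 2*pi*h^2 is above or below 1.
    for dim in (1, 10, 100):
        for bandwidth in (1.0, 0.1):
            value = gaussian_kernel_log_norm(dim, bandwidth)
            print(f"dim={dim:3d} bandwidth={bandwidth}: log norm = {value:.3f}")
```

Under these assumptions, a large negative (or large positive) log score in high dimensions follows from the normalization itself rather than from a bug: the value is a log-density in `d`-dimensional units, not a log-probability.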

@agramfort
Member

@jakevdp any opinion? thx

@jakevdp
Member
jakevdp commented Apr 4, 2017

Sorry for the belated response. I don't think it's ad-hoc to normalize a kernel density estimate so that the result is actually a density. But perhaps the docs could be clarified to say explicitly that the KDE is returning a probability density rather than a probability.
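The distinction between a density and a probability can be illustrated with a small hand-rolled one-dimensional Gaussian KDE. This is a sketch using only the standard library, not scikit-learn's implementation; the names `kde_density` and `integrate`, the sample values, and the bandwidth are all assumptions chosen for illustration. It shows that a properly normalized density can exceed 1 at a point while still integrating to approximately 1 over the real line.

```python
import math


def kde_density(x, samples, bandwidth):
    """Gaussian kernel density estimate at x (1-D, hand-rolled sketch)."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )


def integrate(f, lo, hi, steps=20000):
    """Trapezoid-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * width)
    return total * width


# Illustrative data: tightly clustered samples and a small bandwidth.
samples = [0.0, 0.01, 0.02]
bandwidth = 0.05

peak = kde_density(0.01, samples, bandwidth)
area = integrate(lambda x: kde_density(x, samples, bandwidth), -1.0, 1.0)

if __name__ == "__main__":
    print(f"density at the cluster center: {peak:.3f}")
    print(f"integral of the density over [-1, 1]: {area:.6f}")
```

A value above 1 at the cluster center is not an error: only the integral of the density is constrained to equal 1.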

@amueller added the Easy (well-defined and straightforward way to resolve), Documentation, and help wanted labels on May 22, 2018

@Gunkkk
Gunkkk commented Aug 26, 2019

> Sorry for the belated response. I don't think it's ad-hoc to normalize a kernel density estimate so that the result is actually a density. But perhaps the docs could be clarified to say explicitly that the KDE is returning a probability density rather than a probability.

Then is there a function for calculating the probability?

@jakevdp
Member
jakevdp commented Aug 26, 2019

The result is a probability density, which is the only probability measure that makes sense for a continuous field.
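A probability can still be obtained from a density by integrating it over a region. As a minimal sketch, assuming a one-dimensional Gaussian KDE, the probability of an interval has a closed form: the average over training points of the difference of normal CDFs. The helpers `normal_cdf` and `kde_interval_probability` are hypothetical names written for this illustration and are not part of scikit-learn.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kde_interval_probability(a, b, samples, bandwidth):
    """P(a < X < b) under a 1-D Gaussian KDE fitted to `samples`.

    Each training point contributes a Gaussian bump centered on it, so the
    interval probability is the mean of the per-bump probabilities.
    """
    total = sum(
        normal_cdf((b - s) / bandwidth) - normal_cdf((a - s) / bandwidth)
        for s in samples
    )
    return total / len(samples)


if __name__ == "__main__":
    # Illustrative data and bandwidth, chosen arbitrarily.
    samples = [-1.0, 0.0, 0.5, 2.0]
    p = kde_interval_probability(-0.5, 1.0, samples, bandwidth=0.3)
    print(f"P(-0.5 < X < 1.0) = {p:.4f}")
```

In higher dimensions or with other kernels no such closed form is generally available, and the integral would have to be approximated numerically, for example by sampling.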

@cmarmo
Contributor
cmarmo commented Sep 29, 2020

Fixed in #11275

6 participants