Non-standard density estimation scores #8327
Comments
@jakevdp any opinion? thx
Sorry for the belated response. I don't think it's ad-hoc to normalize a kernel density estimate so that the result is actually a density. But perhaps the docs could be clarified to say explicitly that the KDE is returning a probability density rather than a probability.
Then is there any function for calculating an actual probability, please?
The result is a probability density, which is the only probability measure that makes sense for a continuous field.
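To make the density-versus-probability distinction concrete, here is a minimal sketch in plain Python. It is not scikit-learn code: the helper names `gaussian_kde_log_density` and `gaussian_kde_interval_probability` are hypothetical, and it assumes a 1-D Gaussian kernel with a fixed bandwidth. A probability only appears once the density is integrated over a region, which for Gaussian kernels has a closed form through the normal CDF.

```python
import math


def gaussian_kde_log_density(x, data, bandwidth):
    """Log of a 1-D Gaussian kernel density estimate at x (hypothetical helper)."""
    # Normalizing constant so that the estimate integrates to 1 over the real line.
    norm = len(data) * bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)
    return math.log(total / norm)


def gaussian_kde_interval_probability(a, b, data, bandwidth):
    """P(a <= X <= b) under the same estimate, via the normal CDF (hypothetical helper)."""
    def cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    # Each kernel contributes the mass it places on [a, b]; average over the data.
    mass = sum(cdf((b - xi) / bandwidth) - cdf((a - xi) / bandwidth) for xi in data)
    return mass / len(data)


data = [0.0, 0.1, 0.2]   # made-up sample, for illustration only
h = 0.05                 # made-up bandwidth

density_at_mode = math.exp(gaussian_kde_log_density(0.1, data, h))
prob_near_mode = gaussian_kde_interval_probability(0.05, 0.15, data, h)
prob_everywhere = gaussian_kde_interval_probability(-10.0, 10.0, data, h)

print("density at 0.1:", density_at_mode)        # a density, may exceed 1
print("P(0.05 <= X <= 0.15):", prob_near_mode)   # a probability, between 0 and 1
print("P(-10 <= X <= 10):", prob_everywhere)     # essentially all of the mass
```

With a narrow bandwidth the density at a point can be larger than 1, while the probability of any interval stays between 0 and 1. In higher dimensions the same idea applies, but the integral over a region generally has to be approximated numerically.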
Fixed in #11275
There is a non-standard normalization step performed when computing the log probability scores for KernelDensity. Using the data dimensionality to normalize the partition function of the kernel is very ad-hoc. If the user's data has high dimensionality, they will observe absurdly small log probability scores and will be very confused. I have traced the issue back to _log_kernel_norm in neighbors/binary_tree.pxi:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/neighbors/binary_tree.pxi#L465-L493
At the very least, this should be explained in the documentation if the code is to remain as it currently stands.
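For readers trying to reproduce the effect, the sketch below shows how a dimension-dependent normalizer drives log scores down. It uses the textbook constant for a d-dimensional isotropic Gaussian kernel, (2πh²)^(-d/2); treating this as equivalent to what `_log_kernel_norm` computes is an assumption, and the function names here are hypothetical, not scikit-learn API.

```python
import math


def gaussian_kernel_log_norm(dim, bandwidth):
    """Log of the constant that makes a dim-dimensional Gaussian kernel integrate to 1.

    Textbook formula, assumed (not verified here) to match the library's behavior.
    """
    return -0.5 * dim * math.log(2.0 * math.pi * bandwidth ** 2)


def single_point_log_density(query, point, bandwidth):
    """Log density at `query` of a KDE fitted to one training point (hypothetical helper)."""
    dim = len(point)
    sq_dist = sum((q - p) ** 2 for q, p in zip(query, point))
    return gaussian_kernel_log_norm(dim, bandwidth) - 0.5 * sq_dist / bandwidth ** 2


# Evaluate the estimate exactly at the training point, where the density peaks.
for dim in (1, 10, 100):
    origin = [0.0] * dim
    print(dim, single_point_log_density(origin, origin, 1.0))
```

With the bandwidth fixed at 1, the peak log density scales linearly with the number of dimensions, so even the best possible score becomes a large negative number in high dimensions. That is a property of densities over high-dimensional spaces rather than a sign that a point is improbable relative to other points.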