8000 BUG (maybe) wrong node bound spread in KernelDensity · Issue #27186 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

BUG (maybe) wrong node bound spread in KernelDensity #27186

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Charlie-XIAO opened this issue Aug 28, 2023 · 2 comments
Open

BUG (maybe) wrong node bound spread in KernelDensity #27186

Charlie-XIAO opened this issue Aug 28, 2023 · 2 comments
Labels
Needs Investigation Issue requires investigation

Comments

@Charlie-XIAO
Copy link
Contributor
Charlie-XIAO commented Aug 28, 2023

Describe the bug

node_log_bound_spreads[i1] = (log(N1) +
compute_log_kernel(dist_LB_1,
h, kernel))

node_log_bound_spreads[i2] = (log(N2) +
compute_log_kernel(dist_LB_2,
h, kernel))

These lines do not seem to be correct. The right-hand side seems to be node log max bound but not node log bound spread. Should they instead be

node_log_bound_spreads[i1] = logsubexp(log(N1)+ compute_log_kernel(dist_LB_1, h, kernel),
                                       node_log_min_bounds[i1])
node_log_bound_spreads[i2] = logsubexp(log(N2)+ compute_log_kernel(dist_LB_2, h, kernel),
                                       node_log_min_bounds[i2])

I don't really find an unexpected result or something, just got confused when reading the code.

@Charlie-XIAO Charlie-XIAO added Bug Needs Triage Issue requires triage labels Aug 28, 2023
@glemaitre
Copy link
Member

@jakevdp Do you think that you could have a quick look at those lines?

@glemaitre glemaitre added Needs Investigation Issue requires investigation and removed Needs Triage Issue requires triage Bug labels Sep 21, 2023
@Charlie-XIAO
Copy link
Contributor Author
Charlie-XIAO commented Jul 23, 2024

In reponse to #27971 (comment) @glemaitre I'm providing some more intuition here:

node_log_min_bounds[i1] = (log(N1) +
compute_log_kernel(dist_UB_1,
h, kernel))
node_log_bound_spreads[i1] = (log(N1) +
compute_log_kernel(dist_LB_1,
h, kernel))

In the above snippet dist_UB_1 and dist_LB_1 are respectively the maximum and minimum distance between the point to node i1. The node lower bound should thus be node weight multiplied by the kernel computed from the maximum distance (because kernel function is decreasing) and if you take the logarithm you can see that the first part is correct. However the second part seems to be computing the node upper bound instead of the node bound spread (upper bound minus lower bound) since it is the same equation as when computing the node lower bound, except replacing maximum distance with minimum. I'm proposing that we should subtract by node lower bound (so logsubexp by node_log_min_bounds) here.

min_max_dist{{name_suffix}}(self, i1, pt, &dist_LB, &dist_UB)
child1_log_min_bound = log(N1) + compute_log_kernel(dist_UB, h,
kernel)
child1_log_bound_spread = logsubexp(log(N1) +
compute_log_kernel(dist_LB, h,
kernel),
child1_log_min_bound)

The above is one of the other places where node boundaries are updated for your reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Investigation Issue requires investigation
Projects
None yet
Development

No branches or pull requests

2 participants
0