-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
Simplify computation of radius to match BIRCH more closely #19251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kno10 , could you please leave comments for the next person easily understand the optimized code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a small renaming to and some inline comment to explain how the cluster radius is computed.
@kno10 @adrinjalali @jnothman I let you check the PR with my last commit. I think we can merge without changelog because it's unlikely to be of interest to the end users but it's definitely an improvement for the maintainers and people who read the code in general. |
Looks good to me. |
add guard against negative "variance".
Thanks @kno10 |
Note that dot_product = -2 * n * sq_norm_ here, hence we do not need to recompute it.
Effectively, the old code does squared_sum_ / n_samples_ - 2 * sq_norm_ + sq_norm_