10000 Simplify computation of radius to match BIRCH more closely by kno10 · Pull Request #19251 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Simplify computation of radius to match BIRCH more closely #19251

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 31, 2021

Conversation

kno10
Copy link
Contributor
@kno10 kno10 commented Jan 22, 2021

Note that dot_product = -2 * n * sq_norm_ here, hence we do not need to recompute it.
Effectively, the old code does squared_sum_ / n_samples_ - 2 * sq_norm_ + sq_norm_

8000
Copy link
Member
@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kno10 , could you please leave comments for the next person easily understand the optimized code?

Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a small renaming to and some inline comment to explain how the cluster radius is computed.

@ogrisel
Copy link
Member
ogrisel commented Jan 28, 2021

@kno10 @adrinjalali @jnothman I let you check the PR with my last commit. I think we can merge without changelog because it's unlikely to be of interest to the end users but it's definitely an improvement for the maintainers and people who read the code in general.

@kno10
Copy link
Contributor Author
kno10 commented Jan 29, 2021

Looks good to me.
I don't think the derivation can be found in the original paper; but it's "obvious" when you are used to this technique, and as far as I know it was used in the original source code. If you want to add a more detailed reference, the derivation is also found in https://arxiv.org/pdf/2006.12881.pdf, Appendix A for BIRCH. Then it might be an alternative to only add a shorter comment, "Computation according to Appendix A, https://arxiv.org/pdf/2006.12881.pdf".

add guard against negative "variance".
@jnothman jnothman merged commit 88be2ab into scikit-learn:main Jan 31, 2021
@jnothman
Copy link
Member

Thanks @kno10

@glemaitre glemaitre mentioned this pull request Apr 22, 2021
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0