Simplify computation of radius to match BIRCH more closely #19251

kno10 · 2021-01-22T23:30:21Z

Note that dot_product = -2 * n_samples_ * sq_norm_ here, hence we do not need to recompute it.
Effectively, the old code computes squared_sum_ / n_samples_ - 2 * sq_norm_ + sq_norm_, which simplifies to squared_sum_ / n_samples_ - sq_norm_.
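
A minimal sketch of the identity described above, assuming NumPy and a hypothetical random data set `X`. The variable names mirror the attributes mentioned in this PR (`squared_sum_`, `n_samples_`, `sq_norm_`), but the helper functions `cluster_features`, `old_radius` and `new_radius` are illustrative only and are not the actual scikit-learn code.

```python
import numpy as np


def cluster_features(X):
    """Return (n_samples, linear_sum, squared_sum) for the rows of X."""
    n_samples = X.shape[0]
    linear_sum = X.sum(axis=0)
    squared_sum = float((X ** 2).sum())
    return n_samples, linear_sum, squared_sum


def old_radius(n_samples, linear_sum, squared_sum):
    # Form with an explicit dot product term, as described for the old code.
    centroid = linear_sum / n_samples
    sq_norm = float(np.dot(centroid, centroid))
    dot_product = -2 * np.dot(linear_sum, centroid)  # equals -2 * n * sq_norm
    return np.sqrt((squared_sum + dot_product) / n_samples + sq_norm)


def new_radius(n_samples, linear_sum, squared_sum):
    # Simplified form: squared_sum / n_samples - sq_norm.
    centroid = linear_sum / n_samples
    sq_norm = float(np.dot(centroid, centroid))
    sq_radius = squared_sum / n_samples - sq_norm
    return np.sqrt(max(0.0, sq_radius))


# Hypothetical data, well spread so that rounding error is not an issue.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
cf = cluster_features(X)

# Reference value: root mean squared distance of the points to their centroid.
direct = np.sqrt(((X - X.mean(axis=0)) ** 2).sum(axis=1).mean())
```

On data like this, both forms should agree with the direct computation up to floating-point rounding; the simplified form just skips the redundant dot product.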

adrinjalali

Thanks @kno10, could you please leave comments so that the next person can easily understand the optimized code?

ogrisel

ogrisel left a comment

I pushed a small renaming and some inline comments to explain how the cluster radius is computed.

ogrisel · 2021-01-28T15:23:01Z

@kno10 @adrinjalali @jnothman I'll let you check the PR with my last commit. I think we can merge without a changelog entry because it's unlikely to be of interest to end users, but it's definitely an improvement for the maintainers and people who read the code in general.

kno10 · 2021-01-29T17:15:21Z

Looks good to me.
I don't think the derivation can be found in the original paper, but it's "obvious" when you are used to this technique, and as far as I know it was used in the original source code. If you want to add a more detailed reference, the derivation is also found in https://arxiv.org/pdf/2006.12881.pdf, Appendix A for BIRCH. In that case, an alternative would be to add only a shorter comment: "Computation according to Appendix A, https://arxiv.org/pdf/2006.12881.pdf".
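
For readers without the reference at hand, the step can be sketched as the standard expansion of the mean squared distance to the centroid. This is not quoted from the cited appendix; the symbols $n$ (number of samples), $LS = \sum_i x_i$ (linear sum), $SS = \sum_i \lVert x_i \rVert^2$ (squared sum) and $c = LS / n$ (centroid) are notation chosen here to match `n_samples_`, `linear_sum_`, `squared_sum_` and the centroid whose squared norm is `sq_norm_`.

```latex
\begin{aligned}
R^2 &= \frac{1}{n} \sum_{i=1}^{n} \lVert x_i - c \rVert^2 \\
    &= \frac{1}{n} \sum_{i=1}^{n} \lVert x_i \rVert^2
       - \frac{2}{n}\, c \cdot \sum_{i=1}^{n} x_i
       + \lVert c \rVert^2 \\
    &= \frac{SS}{n} - 2 \lVert c \rVert^2 + \lVert c \rVert^2
       \qquad \text{since } \sum_{i=1}^{n} x_i = LS = n\,c \\
    &= \frac{SS}{n} - \lVert c \rVert^2
\end{aligned}
```

The third line corresponds to squared_sum_ / n_samples_ - 2 * sq_norm_ + sq_norm_ from the PR description, and the last line to the simplified expression.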

sklearn/cluster/_birch.py

add guard against negative "variance".
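
A small sketch of why such a guard can matter, assuming the radius is computed as the square root of squared_sum_ / n_samples_ - sq_norm_. The function `guarded_radius` is a hypothetical helper, and the input values are hand-picked to mimic a rounding error, not taken from real data.

```python
from math import sqrt


def guarded_radius(squared_sum, n_samples, sq_norm):
    """Radius with a clamp so rounding error cannot make the argument negative."""
    sq_radius = squared_sum / n_samples - sq_norm
    return sqrt(max(0.0, sq_radius))


# Hand-picked values: sq_norm exceeds squared_sum / n_samples by one ulp,
# as could happen through rounding for a cluster of (near-)identical points.
squared_sum, n_samples, sq_norm = 1.0, 1, 1.0 + 2.0 ** -52

unguarded = squared_sum / n_samples - sq_norm  # slightly below zero
radius = guarded_radius(squared_sum, n_samples, sq_norm)  # clamped to 0.0
```

Without the clamp, `math.sqrt` raises a `ValueError` on the negative argument, while `numpy.sqrt` would return NaN.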

sklearn/cluster/_birch.py

jnothman · 2021-01-31T00:52:20Z

Thanks @kno10

Simplify computation of radius to match BIRCH more closely

4ac5e90

github-actions bot added the module:cluster label Jan 22, 2021

jnothman approved these changes Jan 23, 2021

adrinjalali reviewed Jan 23, 2021

rename new_norm to new_sq_norm + inline comment

70041f6

ogrisel approved these changes Jan 28, 2021

kno10 commented Jan 29, 2021

sklearn/cluster/_birch.py (outdated)

Update _birch.py

923729f

add guard against negative "variance".

kno10 mentioned this pull request Jan 29, 2021

TST Add test for numerical issues in BIRCH #19253

Open

ogrisel reviewed Jan 30, 2021

sklearn/cluster/_birch.py (outdated)

Update sklearn/cluster/_birch.py

4ba253f

jnothman merged commit 88be2ab into scikit-learn:main Jan 31, 2021

glemaitre mentioned this pull request Apr 22, 2021

Release 0.24.2 #19954

Merged

12 tasks
