Increase speed plot_birch_vs_minibatchkmeans.py #21703

Iglesys347 · 2021-11-18T14:41:35Z

Reference Issues/PRs

References #21598

What does this implement/fix? Explain your changes.

Reduced the number of samples (n_samples) in the make_blobs function.

Also changed the batch_size in MiniBatchKMeans. The documentation of MiniBatchKMeans says: "For faster computations, you can set the batch_size greater than 256 * number of cores to enable parallelism on all cores."

The purpose of these changes is to increase the execution speed.
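As a rough illustration of the batch_size rule quoted above, here is a minimal sketch. The dataset size, number of centers and n_init value are assumptions chosen for a quick run, not the values used in the example script; joblib's cpu_count is used to get the number of cores.

```python
from joblib import cpu_count
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# Illustrative values only (assumptions, not the ones from the example script).
n_samples = 10_000
n_centers = 10

X, _ = make_blobs(n_samples=n_samples, centers=n_centers, random_state=0)

# Rule of thumb from the MiniBatchKMeans documentation:
# use a batch size of at least 256 * number of cores.
batch_size = 256 * cpu_count()

mbk = MiniBatchKMeans(
    n_clusters=n_centers,
    batch_size=batch_size,
    n_init=3,
    random_state=0,
)
mbk.fit(X)

print(f"batch_size = {batch_size}")
print(f"cluster centers shape = {mbk.cluster_centers_.shape}")
```

The batch size therefore scales with the machine running the script, so the measured speed-up will likely differ from one machine to another.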

Here are the output and the time taken by the script before the changes (measured with the Unix command time):

BIRCH without global clustering as the final step took 3.27 seconds
n_clusters : 158
BIRCH with global clustering as the final step took 3.25 seconds
n_clusters : 100
Time taken to run MiniBatchKMeans 4.02 seconds

real    0m13,758s
user    0m14,533s
sys     0m1,728s

And here is the resulting plot:

Now the output and the time taken by the script after the changes:

BIRCH without global clustering as the final step took 1.06 seconds
n_clusters : 158
BIRCH with global clustering as the final step took 1.07 seconds
n_clusters : 100
Time taken to run MiniBatchKMeans 0.86 seconds

real    0m6,032s
user    0m7,936s
sys     0m2,008s
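To make the comparison between the two runs easier to read, the reported timings can be turned into speed-up ratios. This is only a small sketch using the numbers printed above; the dictionary keys are labels picked for illustration.

```python
# Timings copied from the two runs reported above, in seconds.
# The `time` output uses a comma as decimal separator (0m13,758s is 13.758 s).
before = {
    "BIRCH without global clustering": 3.27,
    "BIRCH with global clustering": 3.25,
    "MiniBatchKMeans": 4.02,
    "whole script (real)": 13.758,
}
after = {
    "BIRCH without global clustering": 1.06,
    "BIRCH with global clustering": 1.07,
    "MiniBatchKMeans": 0.86,
    "whole script (real)": 6.032,
}

# Speed-up ratio = time before / time after.
speedups = {name: before[name] / after[name] for name in before}

for name, ratio in speedups.items():
    print(f"{name}: {before[name]:.2f}s -> {after[name]:.2f}s ({ratio:.1f}x faster)")
```

The whole-script ratio is smaller than the per-estimator ratios because the wall-clock time also includes work that the change does not touch, such as imports and plotting.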

And the plot:

ogrisel · 2021-11-18T14:52:02Z

I think the point of this example is to compare algorithms that have no problem running on datasets with hundreds of thousands (or even millions) of data points. I think it's fine to have an example that lasts ~10s to demonstrate the scalability of estimators on data with a large-ish number of samples.

Therefore I would rather not change this example. WDYT @adrinjalali?

adrinjalali · 2021-11-18T15:05:18Z

I kinda agree with you @ogrisel. But I'd also be happy with this change, plus a note saying that the number of samples can be extended to a few hundred thousand without any issue, but not have it in the CI. WDYT?

ogrisel

LGTM! Let's wait for the CI to complete, just in case.

adrinjalali · 2021-11-18T17:07:25Z

@Iglesys347 could you please merge the latest main to make sure the docs are generated correctly?

Iglesys347 · 2021-11-18T19:06:59Z

@adrinjalali @ogrisel All good! Thank you both for your comments.

ogrisel · 2021-11-19T08:45:59Z

Runtime has been halved on the CI (from less than 8s to less than 4s). Thanks for the contribution.

Iglesys347 added 3 commits November 15, 2021 09:29

ENH Impoving execution speed of plot_pca_vs_fa_model_selection.py by …

16b3a8a

…dividing by 2 parameters n_samples, n_features, rank

ENH Reduced n_samples and batch_size optimisation to improve executio…

3b5a9ef

…n speed

ENH change in docstring to match with changes in code

1205666

adrinjalali changed the title ~~Increase speed plot birch vs minibatchkmeans~~ Increase speed plot_birch_vs_minibatchkmeans.py Nov 18, 2021

ogrisel reviewed Nov 18, 2021

examples/cluster/plot_birch_vs_minibatchkmeans.py

ogrisel reviewed Nov 18, 2021

examples/cluster/plot_birch_vs_minibatchkmeans.py (outdated)

adrinjalali mentioned this pull request Nov 18, 2021

Accelerate slow examples #21598

Closed

41 tasks

ogrisel reviewed Nov 18, 2021

examples/cluster/plot_birch_vs_minibatchkmeans.py (outdated)

Iglesys347 and others added 3 commits November 18, 2021 16:38

Apply @ogrisel 's suggestion

0ef18ca

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

Replacing os by joblib (@ogrisel 's suggestion)

d8810d1

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

ENH change name variable n_centre to n_centers

cc61407

adrinjalali approved these changes Nov 18, 2021

examples/cluster/plot_birch_vs_minibatchkmeans.py (outdated)

Apply @adrinjalali 's suggestion

0a9a7e9

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>

ogrisel approved these changes Nov 18, 2021

Iglesys347 and others added 2 commits November 18, 2021 19:24

Merge pull request #1 from scikit-learn/main

d4f8bfd

Retrieve latest chagement

Merge branch 'main' of github.com:Iglesys347/scikit-learn into increa…

7644bb1

…se_speed_plot_birch_vs_minibatchkmeans

ogrisel merged commit 5856205 into scikit-learn:main Nov 19, 2021

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Nov 22, 2021

Increase speed plot_birch_vs_minibatchkmeans.py (scikit-learn#21703)

12613aa

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Nov 29, 2021

Increase speed plot_birch_vs_minibatchkmeans.py (scikit-learn#21703)

59526ae

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>

samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021

Increase speed plot_birch_vs_minibatchkmeans.py (scikit-learn#21703)

27735b4

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Dec 24, 2021

Increase speed plot_birch_vs_minibatchkmeans.py (scikit-learn#21703)

8b78e0a

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>

glemaitre pushed a commit that referenced this pull request Dec 25, 2021

Increase speed plot_birch_vs_minibatchkmeans.py (#21703)

43d7c92

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
