Optimize ensure_spacing #5062

rfezzani · 2020-11-12T10:28:03Z

Description

Fixes #5061. This PR proposes processing the input points in batches. With this strategy, the running time is considerably reduced.
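As a rough illustration of the batch idea, here is a minimal sketch. It is not the code from this PR: the names `filter_spacing` and `ensure_spacing_batched` and the `batch_size` default of 50 are made up for the example. It assumes a 2-D `(n, d)` array of points already sorted by priority, and the Chebyshev norm.

```python
import numpy as np
from scipy.spatial import cKDTree


def filter_spacing(points, spacing):
    """Greedy filter: walk the points in input order and keep a point only
    if no previously kept point lies within `spacing` of it."""
    tree = cKDTree(points)
    # For each point, the indices of all points within `spacing` of it
    # (Chebyshev norm; the point itself is always included).
    neighbours = tree.query_ball_point(points, r=spacing, p=np.inf)
    rejected = set()
    for idx, near in enumerate(neighbours):
        if idx in rejected:
            continue
        # idx survives, so everything else close to it is dropped.
        rejected.update(j for j in near if j != idx)
    keep = [i for i in range(len(points)) if i not in rejected]
    return points[keep]


def ensure_spacing_batched(points, spacing, batch_size=50):
    """Hypothetical batched variant: each tree only holds the survivors so
    far plus one new batch, never the whole input at once."""
    points = np.asarray(points, dtype=float)
    kept = np.empty((0, points.shape[1]))
    for start in range(0, len(points), batch_size):
        batch = points[start:start + batch_size]
        # Survivors go first so that they win against the newcomers.
        kept = filter_spacing(np.vstack([kept, batch]), spacing)
    return kept


rng = np.random.default_rng(0)
pts = rng.random((200, 2))
single_pass = filter_spacing(pts, 0.1)
batched = ensure_spacing_batched(pts, 0.1, batch_size=16)
print(len(pts), len(single_pass), len(batched))
```

Because survivors are always placed ahead of the new batch, the greedy order is preserved and the batched result matches the single-pass result. The only difference is the size of the trees that get built.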

For reviewers

Check that the PR title is short, concise, and will make sense 1 year later.
Check that new functions are imported in corresponding __init__.py.
Check that new features, API changes, and deprecations are mentioned in doc/release/release_dev.rst.

jni

@rfezzani thanks for this, sorry for the long silence!

If I'm reading this correctly, this means that the cKDTree construction is quadratic, and you can get it cheaper by removing nearby points each time, so you never build a tree with all the points in it?

I do wonder how this affects the worst case, though: evenly spaced points just beyond min_distance away, so you don't actually remove anything in each batch...

One idea would be to instead double the batches starting from 10 up to the total size in the original space of points. So you build with the top 10, 20, 40, ... n peaks, removing points each time. That way, in the worst case, right before doing all the points you only do half the points, which is 1/4 the time, so you are protected from a dramatic blow-up of time in the worst case.

Thoughts?
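The doubling schedule suggested above could look roughly like the sketch below. It is a sketch under assumptions, not the implementation that was eventually merged: `drop_close_points`, `ensure_spacing_doubling` and the `start` parameter are hypothetical names, the input is assumed to be a 2-D `(n, d)` array sorted by priority, and the Chebyshev norm is used.

```python
import numpy as np
from scipy.spatial import cKDTree


def drop_close_points(points, spacing):
    """Greedy filter in input order, based on the set of close pairs."""
    pairs = cKDTree(points).query_pairs(r=spacing, p=np.inf)
    rejected = set()
    # Pairs are (i, j) with i < j; sorting by i means the fate of i is
    # already decided when its own pairs are visited.
    for i, j in sorted(pairs):
        if i not in rejected:
            rejected.add(j)
    mask = np.ones(len(points), dtype=bool)
    mask[np.array(list(rejected), dtype=int)] = False
    return points[mask]


def ensure_spacing_doubling(points, spacing, start=10):
    """Filter the first 10, 20, 40, ... n points, pruning at each step."""
    if start < 1:
        raise ValueError("start must be at least 1")
    points = np.asarray(points, dtype=float)
    kept = points[:0]
    done, target = 0, start
    while done < len(points):
        target = min(target, len(points))
        # Survivors so far, followed by the not-yet-seen points up to target.
        candidates = np.vstack([kept, points[done:target]])
        kept = drop_close_points(candidates, spacing)
        done, target = target, target * 2
    return kept


# Worst case from the discussion: evenly spaced points, nothing is removed.
worst = np.arange(200, dtype=float).reshape((-1, 2))
worst_out = ensure_spacing_doubling(worst, 0.2)

rng = np.random.default_rng(1)
pts = rng.random((300, 2))
doubled = ensure_spacing_doubling(pts, 0.1)
print(len(worst_out), len(doubled))
```

In the worst case nothing is pruned, so the trees hold roughly n, n/2, n/4, ... points. If construction cost really were quadratic, as conjectured above, the total would be bounded by n²(1 + 1/4 + 1/16 + ...) = (4/3)n², that is, at most about a third more than building the full tree once.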

rfezzani · 2020-11-20T08:26:34Z

No worries @jni, we are all so busy these days 😉.

You are absolutely right, it's an excellent suggestion! The strategy that you propose also has the advantage of removing the batch_size parameter 😄!

pep8speaks · 2020-12-08T16:24:15Z

Hello @rfezzani! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-12-09 10:39:58 UTC

jni

@rfezzani in the nick of time for 0.18.0, maybe! 😅 Do you want to check benchmarks with this new strategy both in a normal case and in the previous worst-case of evenly-spaced points? But I'm happy as is, thank you! 🎉

rfezzani · 2020-12-09T15:32:46Z

Hi @jni, thank you for your review! I measured the performance using:

In [1]: import numpy as np

In [2]: from skimage._shared.coord import ensure_spacing

In [3]: for f, x in [("random", np.random.rand(10000, 2)),
   ...:              ("worst", np.arange(20000).reshape((-1, 2)))]:
   ...:     print(f)
   ...:     for s in [None, 100, 500, 1000]:
   ...:         print(f"s = {s}")
   ...:         for n in range(500, 10001, 500):
   ...:             print(f"n = {n}")
   ...:             %timeit ensure_spacing(x[:n], 0.2, min_split_size=s)
   ...:         print(4*'\n')

and obtained

Unfortunately, the new approach is twice as slow as the previous approach on the random points, but it is ~3x faster in the worst case with min_split_size = 500, which I think should be the default value.

jni · 2020-12-10T07:11:12Z

Thanks @rfezzani! I can't say I know which approach is better to have, but we should have one of them and the newer one seems like an ok choice for now — we can tweak later. @scikit-image/core anyone else available to review+merge this?

alexdesiqueira · 2020-12-12T17:23:39Z

Thank you @rfezzani! I'll merge, based on @jni's arguments; I like it so far.

rfezzani · 2020-12-12T21:32:53Z

Thank you @alexdesiqueira!

jni · 2020-12-13T00:21:24Z

@meeseeksdev backport to v0.18.x

rfezzani added 12 commits May 22, 2020 09:41

Move trivial check to _get_peak_mask

db38358

Merge branch 'master' of https://github.com/scikit-image/scikit-image

2cafddd

Revert "Move trivial check to _get_peak_mask"

3f86cb5

This reverts commit db38358.

Merge branch 'master' of https://github.com/scikit-image/scikit-image

e9c41fa

Merge branch 'master' of https://github.com/scikit-image/scikit-image

2e5f49e

Merge branch 'master' of https://github.com/scikit-image/scikit-image

1fb3fbf

Merge branch 'master' of github.com:rfezzani/scikit-image

1373a9e

Merge branch 'master' of github.com:rfezzani/scikit-image

aef392b

Merge branch 'master' of https://github.com/scikit-image/scikit-image

9136ae3

Merge branch 'master' of github.com:rfezzani/scikit-image

7ed7a62

Merge branch 'master' of https://github.com/scikit-image/scikit-image

47eb9b8

Add batch processing strategy to ensure_spacing

f8d9a6c

rfezzani added the 📈 type: Performance label Nov 12, 2020

rfezzani added this to the 0.18 milestone Nov 12, 2020

rfezzani added 2 commits November 12, 2020 14:10

Modify default batch_size value

6d34f6e

Fix ensure_spacing output dtype

620fb08

mkcor mentioned this pull request Nov 16, 2020

2020's calendar of community management #4486

Closed

jni reviewed Nov 20, 2020

Implement @jni suggestion

57f511b

rfezzani added 2 commits December 8, 2020 17:25

Fix pep8

c1d3003

Fix UT

2697d90

jni approved these changes Dec 9, 2020

skimage/_shared/coord.py (outdated)

Update skimage/_shared/coord.py

c3fda18

Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>

alexdesiqueira merged commit 7cb0732 into scikit-image:master Dec 12, 2020

meeseeksmachine mentioned this pull request Dec 13, 2020

Backport PR #5062 on branch v0.18.x (Optimize ensure_spacing) #5135

Merged

meeseeksmachine pushed a commit to meeseeksmachine/scikit-image that referenced this pull request Dec 13, 2020

Backport PR scikit-image#5062: Optimize ensure_spacing

f1b7d68

jni pushed a commit that referenced this pull request Dec 13, 2020

Backport PR #5062 on branch v0.18.x (Optimize ensure_spacing) (#5135)

03ec655

Co-authored-by: Alexandre de Siqueira <alex.desiqueira@igdore.org>

rfezzani mentioned this pull request Dec 29, 2020

Performance regression in peak_local_max in version 0.18.0 #5161

Open

rfezzani mentioned this pull request Nov 10, 2021

Segfault in peak_local_max with large numbed of segments #6010

Closed
