8000 Optimize ensure_spacing by rfezzani · Pull Request #5062 · scikit-image/scikit-image · GitHub
[go: up one dir, main page]

Skip to content

Optimize ensure_spacing #5062

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Dec 12, 2020

Conversation

rfezzani
Copy link
Member

Description

Fixes #5061. This PR proposes to process input points by batch. Using this strategy timing is considerably reduced:

Figure_3

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.

@rfezzani rfezzani added this to the 0.18 milestone Nov 12, 2020
Copy link
Member
@jni jni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rfezzani thanks for this, sorry for the long silence!

If I'm reading this correctly, this means that the cKDTree construction is quadratic, and you can get it cheaper by removing nearby points each time, so you never build a tree with all the points in it?

I do wonder how this affects the worst case, though: evenly spaced points just beyond min_distance away, so you don't actually remove anything in each batch...

One idea would be to instead double the batches starting from 10 up to the total size in the original space of points. So you build with the top 10, 20, 40, ... n peaks, removing points each time. That way, in the worst case, right before doing all the points you only do half the points, which is 1/4 the time, so you are protected from a dramatic blow-up of time in the worst case.

Thoughts?

@rfezzani
Copy link
Member Author

No worry @jni, we all are so busy these days 😉.

You are absolutely right, it's an excellent suggestion! the strategy that you propose also has the advantage of removing the batch_size parameter 😄!

@pep8speaks
Copy link
pep8speaks commented Dec 8, 2020

Hello @rfezzani! Thanks for updating this PR. We checked the lines 8000 you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-12-09 10:39:58 UTC

Copy link
Member
@jni jni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rfezzani in the nick of time for 0.18.0, maybe! 😅 Do you want to check benchmarks with this new strategy both in a normal case and in the previous worst-case of evenly-spaced points? But I'm happy as is, thank you! 🎉

Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
@rfezzani
Copy link
Member Author
rfezzani commented Dec 9, 2020

Hi @jni, thank you for your review! I measured the performances using

In [1]: from skimage._shared.coord import ensure_spacing                                                 

In [2]: for f, x in [("random", np.random.rand(10000, 2)), 
   ...:              ("worst", np.arange(20000).reshape((-1, 2)))]: 
   ...:     print(f) 
   ...:     for s in [None, 100, 500, 1000]: 
   ...:         print(f"s = {s}") 
   ...:         for n in range(500, 10001, 500): 
   ...:             print(f"n = {n}") 
   ...:             %timeit ensure_spacing(x[:n], 0.2, min_split_size=s) 
   ...:         print(4*'\n')

and obtained

Figure_1

Unfortunately, the new approach is twice as slow as the previous approach on the random points, but it is ~3x faster on worst case min_split_size = 500 which should be the default parameter I think.

@jni
Copy link
Member
jni commented Dec 10, 2020

Thanks @rfezzani! I can't say I know which approach is better to have, but we should have one of them and the newer one seems like an ok choice for now — we can tweak later. @scikit-image/core anyone else available to review+merge this?

@alexdesiqueira
Copy link
Member

Thank you @rfezzani! I'll merge, based on @jni arguments; I like it so far.

@alexdesiqueira alexdesiqueira merged commit 7cb0732 into scikit-image:master Dec 12, 2020
@rfezzani
Copy link
Member Author

Thank you @alexdesiqueira !

@jni
Copy link
Member
jni commented Dec 13, 2020

@meeseeksdev backport to v0.18.x

meeseeksmachine pushed a commit to meeseeksmachine/scikit-image that referenced this pull request Dec 13, 2020
jni pushed a commit that referenced this pull request Dec 13, 2020
Co-authored-by: Alexandre de Siqueira <alex.desiqueira@igdore.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ensure_spacing function may be slow
4 participants
0