-
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Optimize ensure_spacing #5062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize ensure_spacing #5062
Conversation
This reverts commit db38358.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rfezzani thanks for this, sorry for the long silence!
If I'm reading this correctly, this means that the cKDTree construction is quadratic, and you can get it cheaper by removing nearby points each time, so you never build a tree with all the points in it?
I do wonder how this affects the worst case, though: evenly spaced points just beyond min_distance away, so you don't actually remove anything in each batch...
One idea would be to instead double the batches starting from 10 up to the total size in the original space of points. So you build with the top 10, 20, 40, ... n peaks, removing points each time. That way, in the worst case, right before doing all the points you only do half the points, which is 1/4 the time, so you are protected from a dramatic blow-up of time in the worst case.
Thoughts?
No worry @jni, we all are so busy these days 😉. You are absolutely right, it's an excellent suggestion! the strategy that you propose also has the advantage of removing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rfezzani in the nick of time for 0.18.0, maybe! 😅 Do you want to check benchmarks with this new strategy both in a normal case and in the previous worst-case of evenly-spaced points? But I'm happy as is, thank you! 🎉
Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
Hi @jni, thank you for your review! I measured the performances using In [1]: from skimage._shared.coord import ensure_spacing
In [2]: for f, x in [("random", np.random.rand(10000, 2)),
...: ("worst", np.arange(20000).reshape((-1, 2)))]:
...: print(f)
...: for s in [None, 100, 500, 1000]:
...: print(f"s = {s}")
...: for n in range(500, 10001, 500):
...: print(f"n = {n}")
...: %timeit ensure_spacing(x[:n], 0.2, min_split_size=s)
...: print(4*'\n') and obtained Unfortunately, the new approach is twice as slow as the previous approach on the random points, but it is ~3x faster on worst case |
Thanks @rfezzani! I can't say I know which approach is better to have, but we should have one of them and the newer one seems like an ok choice for now — we can tweak later. @scikit-image/core anyone else available to review+merge this? |
Thank you @alexdesiqueira ! |
@meeseeksdev backport to v0.18.x |
Co-authored-by: Alexandre de Siqueira <alex.desiqueira@igdore.org>
Description
Fixes #5061. This PR proposes to process input points by batch. Using this strategy timing is considerably reduced:
For reviewers
later.
__init__.py
.doc/release/release_dev.rst
.