Speed up skimage.feature.peak_local_max #3984

clementkng · 2019-06-30T08:47:03Z

Description

@brownmk's proposal to modify the peak_local_max algorithm to recursively pass a small image enclosing each object rather than an image of the same size, improving algorithm performance.

Fixes gh-3974

Checklist

Docstrings for all functions
~~- [ ] Gallery example in ./doc/examples (new features only)~~
Benchmark in ./benchmarks, if your changes aren't covered by an
existing benchmark
Unit tests
Clean style in the spirit of PEP8

For reviewers

Check that the PR title is short, concise, and will make sense 1 year
later.
Check that new functions are imported in corresponding __init__.py.
Check that new features, API changes, and deprecations are mentioned in
doc/release/release_dev.rst.
Consider backporting the PR with @meeseeksdev backport to v0.14.x

sciunto · 2019-07-01T18:49:33Z

@clementkng Thanks for submitting. Is it still a WIP? I see that all the tests are passing...

clementkng · 2019-07-01T18:58:05Z

@sciunto Hi, I believe it may still be a WIP b/c the algorithm has changed a bit since to pass the tests, so I'm not sure if the performance gain is still there. Also, I'm considering putting in new tests that @brownmk used initially to test the code. Or do you think there's enough coverage w/ the existing tests?

brownmk · 2019-07-01T19:09:35Z

Hi, the performance gain is still there. If you want to add my test case, use the function below:

import numpy as np
from scipy import ndimage as ndi
from peak import peak_local_max

def test_many_objects():
    mask=np.zeros([500,500], dtype=bool)
    x,y=np.indices((500,500))
    x_c=x//20*20+10
    y_c=y//20*20+10
    mask[(x-x_c)**2+(y-y_c)**2<8**2]=True
    # create a mask, label each disk, create distance image for peak searching
    labels, num_objs=ndi.label(mask)
    dist=ndi.distance_transform_edt(mask)
    local_max=peak_local_max(dist, min_distance=20, indices=True, exclude_border=False, labels=labels)
    assert(len(local_max) == 625)
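
For anyone wondering where 625 comes from, here is a minimal sketch using only numpy and scipy (it does not call the PR's code). A 500x500 image split into 20x20 cells gives 25 x 25 = 625 cells, and each cell holds one disk of radius under 8, so the disks never touch and one peak per disk is expected. The variable `cells_per_side` is introduced here for illustration only.

```python
import numpy as np
from scipy import ndimage as ndi

# Same mask construction as in the test case above.
x, y = np.indices((500, 500))
x_c = x // 20 * 20 + 10
y_c = y // 20 * 20 + 10
mask = (x - x_c) ** 2 + (y - y_c) ** 2 < 8 ** 2

# Each disk is its own connected component, so the number of labels
# should match the number of 20x20 cells.
labels, num_objs = ndi.label(mask)
cells_per_side = 500 // 20
print(num_objs, cells_per_side ** 2)
```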

sciunto

I left several comments (sorry if some of them cover things you already planned to work on before review). They are all minor and intended to improve code readability.

sciunto · 2019-07-01T19:04:53Z

skimage/feature/peak.py

+        else:
+            return out
+
+    threshold_abs = threshold_abs if threshold_abs is not None else image.min()


Minor: I would move the block "if type(exclude_border)..." a few lines above, just before or after this line, to gather the lines that share the same spirit.

sounds good.

Only this one to fix, and it will be good for me.

sciunto · 2019-07-01T19:05:05Z

skimage/feature/peak.py

+
+        for i, obj in enumerate(ndi.find_objects(labels)):
+            label = i + 1
+            # print("Label>", label)


This line can be removed.

sciunto · 2019-07-01T19:05:30Z

skimage/feature/peak.py

+
+def _exclude_border(mask, footprint, exclude_border):
+    """
+    Remove peaks round the borders


Suggested change

Remove peaks round the borders

Remove peaks round the borders.

sciunto · 2019-07-01T19:05:39Z

skimage/feature/peak.py

+def _get_peak_mask(image, min_distance, footprint, threshold_abs,
+                   threshold_rel):
+    """
+    Return the mask containing all peak candidates above thresholds


Suggested change

Return the mask containing all peak candidates above thresholds

Return the mask containing all peak candidates above thresholds.

sciunto · 2019-07-01T19:08:02Z

skimage/feature/peak.py

+                                         footprint, exclude_border)
+
+        for i, obj in enumerate(ndi.find_objects(labels)):
+            label = i + 1


This variable seems to be unused... (Am I correct?)

The variable label is used in the next line:

img=image[obj]*(labels[obj]==label)

sciunto · 2019-07-01T19:13:09Z

skimage/feature/peak.py

+            return out
+
+    threshold_abs = threshold_abs if threshold_abs is not None else image.min()
+


The comment just below, "# In the case of labels, recursively build and return an output operating on each label separately", seems incorrect to me now. There is no longer a recursive call; instead, it's a call to ndi on each label. Can you fix this please?

sciunto · 2019-07-01T19:14:08Z

skimage/feature/peak.py

+            inner_mask = _exclude_border(np.ones_like(labels, dtype=bool),
+                                         footprint, exclude_border)
+
+        for i, obj in enumerate(ndi.find_objects(labels)):


Perhaps a comment describing what this for loop is doing would help the reader.

@brownmk How would you describe this loop?

For each label, extract a smaller image enclosing the object of interest, identify num_peaks_per_label peaks and mark them in variable out.
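
To make that description concrete, here is a rough sketch of the per-label loop on a toy image. It is not the PR's implementation: the real code calls `_get_peak_mask` and `_get_high_intensity_peaks`, whereas this sketch uses a plain `np.argmax` as a stand-in peak search, and the toy data is made up for illustration.

```python
import numpy as np
from scipy import ndimage as ndi

# Toy image with two separate bright blobs (illustrative data only).
image = np.zeros((8, 8))
image[1:3, 1:3] = [[1, 2], [3, 4]]
image[5:7, 4:7] = [[5, 9, 6], [7, 8, 5]]
labels, num_objs = ndi.label(image > 0)

out = np.zeros_like(image, dtype=bool)
for i, obj in enumerate(ndi.find_objects(labels)):
    label = i + 1
    # Crop to the bounding box of this object and zero out other labels.
    image_object = image[obj] * (labels[obj] == label)
    # Stand-in for the real peak search: take the single brightest pixel.
    peak = np.unravel_index(np.argmax(image_object), image_object.shape)
    # out[obj] is a view, so this marks the peak in the full-size array.
    out[obj][peak] = True

print(np.argwhere(out))
```

The point of the crop is that each search only touches the pixels inside one bounding box instead of the whole image.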

sciunto · 2019-07-01T19:16:50Z

skimage/feature/peak.py

+        for i, obj in enumerate(ndi.find_objects(labels)):
+            label = i + 1
+            # print("Label>", label)
+            img = image[obj] * (labels[obj] == label)


Can we use a less generic variable name here?

@brownmk What would you recommend here?

call it image_object? Up to you.

pep8speaks · 2019-07-01T20:29:30Z

Hello @clementkng! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-07-25 02:51:42 UTC

sciunto · 2019-07-02T05:31:17Z

The test failure is due to #3985. Not blocking for this PR.

@jni @hmaarrfk would you like to have a look as well?

hmaarrfk · 2019-07-03T00:36:28Z

skimage/feature/peak.py

+        mask = mask.swapaxes(0, i)
+        remove = (footprint.shape[i] if footprint is not None
+                  else 2 * exclude_border)
+        mask[:remove // 2] = mask[-remove // 2:] = False


I really dislike swapping axes

mask[(slice(None),) * i + slice(None, remove // 2)] = False
mask[(slice(None),) * i + slice(-remove // 2, None)] = False

Sorry, are you referring to the line above or below or both?

Replace

mask = mask.swapaxes(0, i)
remove = (footprint.shape[i] if footprint is not None
          else 2 * exclude_border)
mask[:remove // 2] = mask[-remove // 2:] = False
mask = mask.swapaxes(0, i)

with what @hmaarrfk provided:

remove = (footprint.shape[i] if footprint is not None
          else 2 * exclude_border)
mask[(slice(None),) * i + slice(None, remove // 2)] = False
mask[(slice(None),) * i + slice(-remove // 2, None)] = False

The code actually should be:

def _exclude_border(mask, footprint, exclude_border):
    """
    Remove peaks round the borders
    """
    # zero out the image borders
    for i in range(mask.ndim):
        remove = (footprint.shape[i] if footprint is not None
                  else 2 * exclude_border)
        mask[(slice(None),) * i + (slice(None, remove // 2),)] = False
        mask[(slice(None),) * i + (slice(-remove // 2, None),)] = False
    return mask
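
For readers comparing the two approaches, the sketch below checks that indexing with a tuple of slices clears the same border as the swapaxes version. The function names and the fixed `width` argument are hypothetical simplifications (the real function derives the width from `footprint` or `exclude_border`). Note that the slice must be wrapped in a one-element tuple: adding a bare `slice` to a tuple raises a `TypeError`, which is what the corrected version above fixes.

```python
import numpy as np

def zero_border_slices(mask, width):
    # Clear `width` pixels at both ends of every axis (assumes width >= 1).
    for i in range(mask.ndim):
        lead = (slice(None),) * i  # full slices for the axes before axis i
        mask[lead + (slice(None, width),)] = False
        mask[lead + (slice(-width, None),)] = False
    return mask

def zero_border_swapaxes(mask, width):
    # Same effect, but bring axis i to the front before slicing.
    for i in range(mask.ndim):
        mask = mask.swapaxes(0, i)
        mask[:width] = mask[-width:] = False
        mask = mask.swapaxes(0, i)
    return mask

a = zero_border_slices(np.ones((6, 7, 5), dtype=bool), 1)
b = zero_border_swapaxes(np.ones((6, 7, 5), dtype=bool), 1)
print(np.array_equal(a, b), int(a.sum()))
```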

Thanks for making that super explicit

The comment should be ("round" was a typo):

Remove peaks near the borders

hmaarrfk · 2019-07-03T00:38:39Z

skimage/feature/peak.py

+    out = np.zeros_like(image, dtype=np.bool)
+
+    # no peak for a trivial image
+    if np.all(image == image.flat[0]):


Trivial images are definitely an "edge case"; is this check really necessary? The call to np.all and the equality check are actually quite expensive for large images. Since this is an "unlikely" scenario, would it be better to move this logic to the end or remove it altogether?

Moving the code to the end does not help, as it will still need to be checked before the final results are returned (potentially overwriting peak candidates).
I am okay with removing the logic, but we might need to document that there could be an incompatibility for edge cases.

Here is an example: where there was no peak before, there will now be 25 peaks. I agree this is a case users should probably check themselves, instead of the library checking it every time.

image = np.ones((5, 5))
# return 0, will return 25 if we remove the trivial case check
print(len(peak.peak_local_max(image, min_distance=0, threshold_abs=0, indices=True)))

OK, thanks for the explanation. I guess this can be left as a secondary issue.
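
For reference, the check under discussion compares every pixel with the first one, which is why its cost grows with image size. A minimal sketch of just that test, outside of `peak_local_max` (the helper name `is_constant` is made up here):

```python
import numpy as np

def is_constant(image):
    # True when every pixel equals the first one; touches all pixels once.
    return bool(np.all(image == image.flat[0]))

flat = np.ones((5, 5))
bumpy = np.ones((5, 5))
bumpy[2, 3] = 2.0
print(is_constant(flat), is_constant(bumpy))
```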

hmaarrfk · 2019-07-03T00:42:34Z

skimage/feature/peak.py


+        coordinates = _get_high_intensity_peaks(image, out, num_peaks)
        if indices is True:
            return coordinates
        else:
            nd_indices = tuple(coordinates.T)
            out[nd_indices] = True


coordinates is obtained from out. Do you need to set out again?

Good point. If num_peaks is provided, out may contain more than num_peaks candidates. It needs to be regenerated in that case.

I actually see a bug here. It was missing a statement out.fill(False).
The code review really helps!!

It should be:

if not indices and np.isinf(num_peaks):
    return out

coordinates = _get_high_intensity_peaks(image, out, num_peaks)
if indices:
    return coordinates
else:
    out.fill(False)
    nd_indices = tuple(coordinates.T)
    out[nd_indices] = True
    return out

Where did the first two lines of code come from?

The first two lines handle the case where the user does not provide num_peaks (it defaults to all, np.inf) and indices is False: out is already the answer and can be returned as is.
The code I provided (bold) sits within the context below:

    for i,obj in enumerate(ndi.find_objects(labels)):
        label=i+1
        #print("Label>", label)
        img=image[obj]*(labels[obj]==label)
        mask=_get_peak_mask(img, min_distance, footprint, threshold_abs, threshold_rel)
        if exclude_border:
            # remove peaks fall in the exclude region
            mask &= inner_mask[obj]
        coordinates =_get_high_intensity_peaks(img, mask, num_peaks_per_label)
        nd_indices = tuple(coordinates.T)
        mask.fill(False)
        mask[nd_indices] = True
        out[obj] += mask

    if not indices and np.isinf(num_peaks):
        return out

    coordinates = _get_high_intensity_peaks(image, out, num_peaks)
    if indices:
        return coordinates
    else:
        out.fill(False)
        nd_indices = tuple(coordinates.T)
        out[nd_indices] = True
        return out

# Non maximum filter
mask = _get_peak_mask(image, min_distance, footprint, threshold_abs, threshold_rel)
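
To illustrate why the `out.fill(False)` matters, here is a small sketch on a toy array. `top_peaks` is a hypothetical stand-in for `_get_high_intensity_peaks` (it simply keeps the brightest candidates) and is not the library function.

```python
import numpy as np

def top_peaks(image, mask, num_peaks):
    # Hypothetical stand-in: coordinates of the num_peaks brightest candidates.
    coords = np.argwhere(mask)
    order = np.argsort(image[mask])[::-1]
    return coords[order][:num_peaks]

image = np.array([[1., 5., 2.],
                  [7., 3., 9.]])
out = image > 2  # four candidates: 5, 7, 3 and 9
coordinates = top_peaks(image, out, num_peaks=2)

# Without a reset, the old candidates are still marked.
stale = out.copy()
stale[tuple(coordinates.T)] = True

# With the reset, only the selected peaks remain.
out.fill(False)
out[tuple(coordinates.T)] = True
print(int(stale.sum()), int(out.sum()))
```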

@brownmk are you sure you don't want to open your own PR and take it over from here? You seem to be going through this quite extensively. Github Desktop, and editors like Atom or VS Code make git much less painful.

It's less about git; the 500-line guideline seems rather intimidating. All issues are addressed now, and I will try to learn it next time.

https://github.com/scikit-image/scikit-image/blob/master/CONTRIBUTING.txt

That is a really fair point. I'll open an issue about this.

Most of that is a reference for us to point people to when they hit problems. We often forget how we setup our own environments....

@brownmk If you want, I can try guiding you through a setup process in exchange for allowing me to create a PR w/ your code.

hmaarrfk · 2019-07-03T00:44:00Z

Any way to add the benchmark code to the benchmarks dir?

clementkng · 2019-07-03T17:22:55Z

@hmaarrfk I can add the benchmark code (assuming you mean the code sample under Way to Produce in the original issue). I can extrapolate based off of other benchmarks how the benchmark for peak_local_max should be written, but since I would need conda to run asv, it would require some significant overhead on my end to get that set up. Would it be faster to push my benchmark code and have someone else confirm that it works, or should I proceed w/ a local install?

hmaarrfk · 2019-07-04T03:35:23Z

i don't think you need conda for asv do you?

clementkng · 2019-07-04T06:55:22Z

According to the install link, I need either virtualenv or conda. For now, the virtualenv solution is not working for me, possibly b/c I'm working on the Windows Subsystem for Linux.

hmaarrfk · 2019-07-04T15:13:31Z

Windows Subsystem for Linux

Having never used WSL it seems like a potential challenge. I feel like you are always going to hit these problems and never know if your code is wrong, or if it is a bug in WSL.

hmaarrfk · 2019-07-04T16:00:24Z

(skdev) mark2@xps ~/g/scikit-image speed-up-peak-local-max↑·1|+2
$ asv continuous -b PeakLocalMaxSuite -E conda:3.7 master speed-up-peak-local-max
· Creating environments
· Discovering benchmarks
· Running 2 total benchmarks (2 commits * 1 environments * 1 benchmarks)
[  0.00%] · For scikit-image commit 7cd12184 <master> (round 1/2):
[  0.00%] ·· Building for conda-py3.7-cython-numpy-scipy.
[  0.00%] ·· Benchmarking conda-py3.7-cython-numpy-scipy
[ 25.00%] ··· Running (benchmark_peak_local_max.PeakLocalMaxSuite.time_peak_local_max--).
[ 25.00%] · For scikit-image commit 5a06093b <speed-up-peak-local-max> (round 1/2):
[ 25.00%] ·· Building for conda-py3.7-cython-numpy-scipy.
[ 25.00%] ·· Benchmarking conda-py3.7-cython-numpy-scipy
[ 50.00%] ··· Running (benchmark_peak_local_max.PeakLocalMaxSuite.time_peak_local_max--).
[ 50.00%] · For scikit-image commit 5a06093b <speed-up-peak-local-max> (round 2/2):
[ 50.00%] ·· Benchmarking conda-py3.7-cython-numpy-scipy
[ 75.00%] ··· benchmark_peak_local_max.PeakLocalMaxSuite.time_peak_local_max                                                             40.2±0.9ms
[ 75.00%] · For scikit-image commit 7cd12184 <master> (round 2/2):
[ 75.00%] ·· Building for conda-py3.7-cython-numpy-scipy.
[ 75.00%] ·· Benchmarking conda-py3.7-cython-numpy-scipy
[100.00%] ··· benchmark_peak_local_max.PeakLocalMaxSuite.time_peak_local_max                                                                1.75±0s
       before           after         ratio
     [7cd12184]       [5a06093b]
     <master>         <speed-up-peak-local-max>
-         1.75±0s       40.2±0.9ms     0.02  benchmark_peak_local_max.PeakLocalMaxSuite.time_peak_local_max

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

hmaarrfk · 2019-07-04T16:10:57Z

Seems to work too

asv continuous -b PeakLocalMaxSuite -E virtualenv:3.7 master speed-up-peak-local-max

hmaarrfk · 2019-07-04T16:25:40Z

I think the test for the trivial case should be added back. Unless we decide that this is a behavioural change we want to make. Which I don't think this PR is advocating for.

You can test for it in a different way, that isn't O(N) in cost.

$ git diff skimage/                                                                                                                                
diff --git a/skimage/feature/peak.py b/skimage/feature/peak.py
index 7736bf70c..7aa6ae20a 100644
--- a/skimage/feature/peak.py
+++ b/skimage/feature/peak.py
@@ -190,6 +190,14 @@ def peak_local_max(image, min_distance=1, threshold_abs=None,
             out[obj] += mask
 
         coordinates = _get_high_intensity_peaks(image, out, num_peaks)
+
+        # Test for the case where the image is a constant
+        # Here, we decide to return to the user that there is no local maximum
+        if coordinates.shape[0] == image.size:
+            if indices is True:
+                return np.empty((0, 2), np.int)
+            else:
+                return np.zeros_like(image, dtype=bool)
         if indices is True:
             return coordinates
         else:
@@ -205,6 +213,14 @@ def peak_local_max(image, min_distance=1, threshold_abs=None,
     # Select highest intensities (num_peaks)
     coordinates = _get_high_intensity_peaks(image, mask, num_peaks)
 
+    # Test for the case where the image is a constant
+    # Here, we decide to return to the user that there is no local maximum
+    if coordinates.shape[0] == image.size:
+        if indices is True:
+            return np.empty((0, 2), np.int)
+        else:
+            return np.zeros_like(image, dtype=bool)
+
     if indices is True:
         return coordinates
     else:

A test like the following

In [1]: from skimage.feature.peak import peak_local_max

In [2]: import numpy as np

In [3]: image = np.ones((5, 5))                                                                                                                    

In [4]: print(len(peak_local_max(image, min_distance=0, threshold_abs=0, indices=True)))                                                           
0

should also be added to the test suite so that this behaviour doesn't magically disappear.

brownmk · 2019-07-04T17:09:02Z

The change is not right. One cannot determine if the image is trivial by relying on the number of peaks detected. The returned number of peaks depends on the settings of threshold and num_peaks. For instance one can set num_peaks to 1 for a trivial image. We should stick with the original code here.
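
A small sketch of that objection, under simplifying assumptions: the candidate list is built directly with `np.argwhere` rather than by the library, and `looks_trivial` is a made-up helper that mimics the proposed `coordinates.shape[0] == image.size` check.

```python
import numpy as np

image = np.ones((5, 5))
# With threshold_abs=0 every pixel of a constant image is a candidate.
candidates = np.argwhere(image >= 0)

def looks_trivial(num_peaks):
    # Keep at most num_peaks candidates, then apply the proposed check.
    limit = len(candidates) if np.isinf(num_peaks) else int(num_peaks)
    coordinates = candidates[:limit]
    return coordinates.shape[0] == image.size

print(looks_trivial(np.inf))  # all pixels returned, so the check fires
print(looks_trivial(1))       # only one row returned, so the check misses
```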

hmaarrfk · 2019-07-04T17:10:45Z

Ah, got it. Sorry for misunderstanding that. Yeah, for this PR, the original check should be kept.

clementkng · 2019-07-24T17:30:37Z

@hmaarrfk Is there any update in the background on this PR or the issue about the trivial check?

hmaarrfk · 2019-07-25T02:25:14Z

background? I think you had the PR mostly complete. I don't think the failures are due to your code.

hmaarrfk · 2019-07-25T02:26:34Z

skimage/feature/tests/test_peak.py

+        dist = ndi.distance_transform_edt(mask)
+        local_max = peak.peak_local_max(dist, min_distance=20, indices=True,
+                                        exclude_border=False, labels=labels)
+        assert (len(local_max) == 625)


Suggested change

assert (len(local_max) == 625)

assert len(local_max) == 625

just triggering a build

Maybe just apply this to trigger a fresh new build. I think builds are fixed.

hmaarrfk

Builds should be passing these days.

hmaarrfk · 2019-07-25T02:56:27Z

I think you can likely remove your WIP too.

sciunto · 2019-07-27T09:24:21Z

Thanks @clementkng That's a great contribution!

Fix peak finding for empty images when indices=True

#3984 adds an optimization where a flat image immediately returns. However, it returns an array where the shape is incorrectly set to always be 2D when indices=True. A test is added to catch this scenario, and a fix is introduced where we set the output array's shape based on the input array's shape.
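
As a rough illustration of the shape problem described in that commit message (the actual change in #4263 is not shown in this thread, so this is only a sketch of the idea): an empty coordinate array for an N-dimensional image should have N columns, not a hard-coded 2.

```python
import numpy as np

image = np.ones((4, 4, 4))  # a flat 3D image

# Hard-coded 2D shape, as in the early return discussed above.
hardcoded = np.empty((0, 2), dtype=int)
# Shape derived from the input: one column per image dimension.
by_ndim = np.empty((0, image.ndim), dtype=int)

print(hardcoded.shape, by_ndim.shape)
```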

clementkng added 2 commits June 30, 2019 00:58

Added faster peak_local_max implementation

1823e7c

Fix style issues

25eceaf

sciunto added 📈 type: Performance 🧐 Needs review labels Jul 1, 2019

sciunto added this to the 0.16 milestone Jul 1, 2019

sciunto reviewed Jul 1, 2019

View reviewed changes

clementkng added 3 commits July 1, 2019 12:49

Added test for many objects in peak_local_max

31caaef

Addressed some code feedback

88bd2c7

Addressed the rest of the feedback

6a6c302

PEP8 issue

a541b53

hmaarrfk reviewed Jul 3, 2019

View reviewed changes

Addressed inline feedback

6e24c3f

Addressed swapaxes call

7ce9787

Add a benchmark for peak local max

7d108a3

Fixup the use of a tab character instead of spaces in docs

fb34031

clementkng and others added 4 commits July 4, 2019 13:22

Addressed potential bug where out may contain more than num_peak candidates

05441e4

Added original check for trivial images back in

33e6700

Fixed PEP8 issues in benchmark

6a38880

Touchup comments a tiny bit for benchmark

a2bc6f7

hmaarrfk mentioned this pull request Jul 6, 2019

Trivial image check for peak_local_max #3990

Closed

hmaarrfk reviewed Jul 25, 2019

View reviewed changes

hmaarrfk approved these changes Jul 25, 2019

View reviewed changes

Modification to test to retrigger build

39c17db

clementkng changed the title ~~[WIP] Speed up skimage.feature.peak_local_max~~ Speed up skimage.feature.peak_local_max Jul 25, 2019

sciunto added action: mrg+1 and removed 🧐 Needs review labels Jul 25, 2019

sciunto approved these changes Jul 27, 2019

View reviewed changes

sciunto merged commit 7ec25d9 into scikit-image:master Jul 27, 2019

ttung mentioned this pull request Oct 21, 2019

Fix peak finding for empty images when indices=True #4263

Merged

9 tasks
