Hausdorff Distance (updated) #4382

clementkng · 2019-12-31T21:41:15Z

Description

This is my attempt to address the feedback on @josteinbf's Hausdorff distance PR and to add docs, examples, and extra tests. I've merged the separate methods into a single hausdorff_distance function to make the API clearer.

Existing discussion at gh-738 and gh-4005.

The Hausdorff distance can be used to measure how closely two objects resemble each other when one is superimposed on the other, for example in this paper.
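For readers who want the definition in concrete terms, here is a minimal brute-force NumPy sketch. It is not the implementation in this PR; `hausdorff_brute_force` and the example arrays are hypothetical names chosen for illustration, and both point sets are assumed to be non-empty. For each point it finds the distance to the closest point in the other set, then takes the largest such distance in either direction.

```python
import numpy as np


def hausdorff_brute_force(points_a, points_b):
    """Hausdorff distance between two non-empty (n, d) point arrays."""
    # Pairwise Euclidean distances, shape (len(points_a), len(points_b)).
    diff = points_a[:, np.newaxis, :] - points_b[np.newaxis, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    # One-sided distances: how far the worst-placed point of one set
    # is from its nearest neighbor in the other set.
    a_to_b = dists.min(axis=1).max()
    b_to_a = dists.min(axis=0).max()
    return max(a_to_b, b_to_a)


# Corners of a unit square, and the same square shifted by 3 along axis 0.
square = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
shifted = square + np.array([3., 0.])
result = hausdorff_brute_force(square, shifted)
print(result)  # 3.0
```

The pairwise distance matrix grows with the product of the two set sizes, which is why the thread goes on to discuss faster approaches.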

Gallery Example Output

Checklist

Docstrings for all functions
Gallery example in ./doc/examples (new features only)
Benchmark in ./benchmarks, if your changes aren't covered by an existing benchmark
Unit tests
Clean style in the spirit of PEP8

For reviewers

Check that the PR title is short, concise, and will make sense 1 year later.
Check that new functions are imported in corresponding __init__.py.
Check that new features, API changes, and deprecations are mentioned in doc/release/release_dev.rst.
Consider backporting the PR with @meeseeksdev backport to v0.14.x

emmanuelle · 2020-01-03T20:53:02Z

skimage/metrics/_set_metrics.pyx

+
+import numpy as np
+
+def hausdorff_distance_onesided(cnp.float64_t[:, ::1] points_sup,


One of my pet comments here: in terms of speed, how would this Cython implementation compare to using a cKDTree, in particular https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.cKDTree.query.html#scipy.spatial.cKDTree.query (the query method for one nearest neighbor, which directly returns the distance)?

My guesstimate is that for a small number of points the Cython version will be faster, but that for a large number of points the KDTree might win. Are you interested in testing this, @clementkng? If you don't have the time I might give it a try.

I guess it also depends on the application for which we expect this function to be used. Will it be rather for very sparse images or for more busy ones?
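As a quick illustration of the `query` method mentioned above, the following sketch builds a tree from two reference points and asks for the single nearest neighbor of one query point. The point coordinates are made up for this example; only `scipy.spatial.cKDTree` and its `query` method come from the comment.

```python
import numpy as np
from scipy.spatial import cKDTree

# Two reference points on a line (illustrative values).
reference = np.array([[0.0, 0.0], [10.0, 0.0]])
tree = cKDTree(reference)

# k=1 asks for one nearest neighbor; the call returns the distance
# to that neighbor and its row index in `reference`.
distance, index = tree.query([7.0, 0.0], k=1)
print(distance, index)  # 3.0 1
```

The distance comes back directly, so no separate norm computation is needed on the caller's side.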

Hi @emmanuelle, I can try to compare the cKDTree vs cython, but I'm not super familiar w/ how we'd test this. I'm guessing this is similar to the benchmarking process for the initial implementation of Hausdorff distance?

I expect the function to be used for reasonably busy images, as one of the applications is to find and recognize image shapes within a larger image.

I would just take a realistic example and see how the two methods compare, and also rescale the image of interest with several factors so that you can see if the fastest method depends on the number of non-zero pixels. Automatic benchmarks with asv are useful but here it's just to decide which method is better.

@emmanuelle to ensure reproducibility, should I push the code I use to compare the methods into the benchmark_metrics.py as well?

Not at this stage, because if one method is a clear winner we will not bother with the other. What I would do is write a Python script that calls the function on images of different sizes, run this script in two different branches (each branch with a different implementation), and paste the script and its results here. But really, whatever workflow works for you is fine :-).
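A script along these lines could look like the sketch below. It is only an assumption about what such a script might contain: `time_on_sizes` and `stand_in_distance` are hypothetical names, the image sizes and pixel density are arbitrary, and the stand-in uses `scipy.spatial.distance.directed_hausdorff` purely so the script runs on its own. In practice the stand-in would be replaced by the function under test in each branch.

```python
import time

import numpy as np
from scipy.spatial.distance import directed_hausdorff


def stand_in_distance(image0, image1):
    """Placeholder for the function under test (not this PR's code)."""
    a = np.transpose(np.nonzero(image0))
    b = np.transpose(np.nonzero(image1))
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])


def time_on_sizes(distance_fn, sizes=(32, 64, 128), density=0.05,
                  repeats=3, seed=0):
    """Return the best wall-clock time per image size, in seconds."""
    rng = np.random.default_rng(seed)
    timings = {}
    for size in sizes:
        # Random binary images with roughly `density` nonzero pixels.
        image0 = rng.random((size, size)) < density
        image1 = rng.random((size, size)) < density
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            distance_fn(image0, image1)
            best = min(best, time.perf_counter() - start)
        timings[size] = best
    return timings


timings = time_on_sizes(stand_in_distance)
for size, seconds in timings.items():
    print(f"{size}x{size}: {seconds * 1e3:.3f} ms")
```

Taking the best of a few repeats reduces noise from other processes; the numbers are only comparable when both branches run on the same machine.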

@soupault I'm not sure I've seen the surface-distance library used in a Jupyter notebook. I'm having a little trouble including it in the comparisons. Is there a code sample somewhere showing how to import the package and use it?

Are you asking how to install this library? You should be able to import compute_robust_hausdorff from surface_distance after following the install instructions in the readme.

So something like from surface_distance import compute_robust_hausdorff should work once I've installed the package, right? I believe I'm having some issues because surface-distance is not Python 3 compatible, but I need Python 3 to run/build scikit-image.

Here is a link to a first draft of benchmarks against cKDTree.

@clementkng From the issue you linked it looks like only that one import must be updated to be compatible with Python 3? You might succeed by applying that change to your local copy.

Thanks for the notebook! I've left a suggestion to make the comparison more meaningful.

emmanuelle · 2020-02-04T03:49:33Z

Thanks a lot @clementkng for these thorough benchmarks, it's a lot of work. So it seems that for a 256x256 image the cKDTree method is faster, which matches the intuition I had (cKDTree should be faster when images are large). Most images will be larger than 256^2, so I'd vote for the cKDTree. Thoughts from @lagru, @soupault or others? @clementkng would you still have the energy to make this change :-) ?

jni · 2020-02-04T04:45:32Z

@emmanuelle I agree that we should use the cKDTree version by default. Also, did you see my comment on the benchmark? It might be interesting to vendor the Cython cKDTree code so that we don't pay the function call overhead there. That could potentially accelerate the function even more!

lagru · 2020-02-04T12:11:04Z

it might be interesting to vendor the Cython cKDTree code

Definitely worth looking into! With this PR and #4165 we'd have at least two algorithms that might benefit. Although that should definitely happen in a separate PR and involve some actual benchmarks that demonstrate a significant speed-up (I strongly suspect that there will be one). Beforehand we should compare scipy's cKDTree with sklearn's KDTree implementation and others if there are any.

emmanuelle · 2020-02-04T19:46:37Z

@jni I saw your comment and commented too ;-). We might get away with a faster solution by calling tree.query on a whole array, then no need to vendor anything.
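To make the difference concrete, the sketch below contrasts a per-point loop with a single call on the whole array. The random point sets and variable names are illustrative and not taken from the benchmark notebook; both approaches return the same distances, but the batched call pays the Python-level call overhead only once.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
reference = rng.random((500, 2))
queries = rng.random((200, 2))
tree = cKDTree(reference)

# One Python-level call per query point.
looped = np.array([tree.query(point, k=1)[0] for point in queries])

# One call for the whole (n, 2) array of query points.
batched, _ = tree.query(queries, k=1)

# The one-sided distance is the largest nearest-neighbor distance.
onesided = batched.max()
```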

jni · 2020-02-05T03:38:49Z

@emmanuelle very nice catch! The new numbers are eye-popping! 🙌

clementkng · 2020-05-19T00:26:06Z

@clementkng can we help you with something to finish this PR :-) ?

Hi @emmanuelle and @rfezzani, I realized I accidentally assumed the PR was over when the benchmarks proved that the scipy.cKDTree significantly outperformed this implementation, so I'm not sure what finishing this PR would mean. Sorry for the late response and misunderstanding! Based on the benchmark, should I start using scipy.cKDTree in my implementation? What's the best way to proceed? 😅

pep8speaks · 2020-06-06T00:51:00Z

Hello @clementkng! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

In the file skimage/metrics/set_metrics.py:

Line 4:1: E302 expected 2 blank lines, found 1
Line 5:80: E501 line too long (81 > 79 characters)

Comment last updated at 2020-07-02 02:57:22 UTC

clementkng · 2020-06-06T01:00:08Z

@rfezzani thanks for the feedback 😄

clementkng · 2020-06-20T05:22:30Z

So I believe I'll need to rebase to resolve the conflict that is preventing tests from running, but last time I did that I wasn't able to resolve the merge conflict and as a result had to make a new branch and PR (which is this one). Would anyone be willing to help me rebase?

jni · 2020-06-22T08:16:10Z

Hi @clementkng! We usually squash and merge these days when there is a long and tortuous commit history, so you should be good to git merge master while on this branch. If you do want to rebase, then you can do:

git checkout hausdorff-distance-new
git rebase master
git status
# fix any merge conflicts
# use git add to mark resolved files
git rebase --continue
# repeat above until git status is clean
git push origin --force-with-lease

jni · 2020-06-22T08:17:09Z

skimage/measure/__init__.py

+           'shannon_entropy'
+           ]


You might not need a rebase if you revert this change!

jni · 2020-06-22T08:23:08Z

skimage/metrics/setup.py

+    from numpy.distutils.core import setup
+    setup(maintainer='scikit-image Developers',
+          maintainer_email='scikit-image@python.org',
+          description='Graph-based Image-processing Algorithms',


Description should be changed, e.g. "Metrics to compare images"

jni · 2020-06-22T08:25:36Z

@clementkng I had to go over the code and review the history to remind myself of the current status.

I realized I accidentally assumed the PR was over when the benchmarks proved that the scipy.cKDTree significantly outperformed this implementation, so I'm not sure what finishing this PR would mean. Sorry for the late response and misunderstanding! Based on the benchmark, should I start using scipy.cKDTree in my implementation? What's the best way to proceed?

Yes, I think the idea is to replace hausdorff_distance_onesided in this PR with the cKDTree implementation. Then it'll be ready to merge!
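A minimal sketch of what that replacement could look like is given below. It is an assumption for illustration rather than the code that was merged: `hausdorff_distance_kdtree` is a hypothetical name, the inputs are taken to be images whose nonzero pixels define the point sets, and both images are assumed to contain at least one nonzero pixel (empty inputs are not handled here).

```python
import numpy as np
from scipy.spatial import cKDTree


def hausdorff_distance_kdtree(image0, image1):
    """Hausdorff distance between nonzero pixels of two non-empty images."""
    # Coordinates of nonzero pixels, one row per point.
    a_points = np.transpose(np.nonzero(image0))
    b_points = np.transpose(np.nonzero(image1))
    # Largest nearest-neighbor distance in each direction.
    a_to_b = cKDTree(b_points).query(a_points, k=1)[0].max()
    b_to_a = cKDTree(a_points).query(b_points, k=1)[0].max()
    return max(a_to_b, b_to_a)


image0 = np.zeros((5, 5), dtype=bool)
image1 = np.zeros((5, 5), dtype=bool)
image0[0, 0] = True
image1[3, 4] = True
distance = hausdorff_distance_kdtree(image0, image1)
print(distance)  # 5.0
```

Each direction needs its own tree because the nearest-neighbor relation is not symmetric.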

clementkng · 2020-06-24T02:29:31Z

Thanks for the feedback @jni 😄

jni

@clementkng awesome, thank you for updating!!! This is almost ready, all that's left is a bit of nitpicking about the API 😅 Please let us know if you would like us to take over!

jni · 2020-06-24T05:28:10Z

skimage/metrics/set_metrics.py

+from ._set_metrics import hausdorff_distance_onesided
+
+
+def hausdorff_distance(a, b):


I think we can probably do better with these parameter names. For images we use image0, image1, or image_true, image_test so here I think we should use points0, points1 (my preference) or points_true, points_test, or even coords0, coords1. Any thoughts on this @clementkng @scikit-image/core ?

... Actually, I just saw that these are images, not actual points! So the docstring must be updated (suggestion below) and I would call these image0 and image1.

jni · 2020-06-24T05:34:22Z

skimage/metrics/set_metrics.py

+        The Hausdorff distance between sets ``a`` and ``b``, using
+        Euclidian distance to calculate the distance between points in ``a``
+        and ``b``.


Suggested change

-        The Hausdorff distance between sets ``a`` and ``b``, using
-        Euclidian distance to calculate the distance between points in ``a``
-        and ``b``.
+        The Hausdorff distance between coordinates of nonzero pixels in
+        ``image0`` and ``image1``, using the Euclidian distance.

jni · 2020-06-24T05:35:03Z

skimage/metrics/set_metrics.py

+
+    Parameters
+    ----------
+    a, b : ndarray, dtype=bool


I'd remove the dtype specification, since this works totally fine for integer arrays. It could be used to compare two segmentations, for example.
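The point about integer arrays can be checked with a small example. Assuming the point sets are extracted with `np.nonzero` (as in the nonzero-pixel description used elsewhere in this review), an integer label image and its boolean mask yield the same coordinates; the array below is made up for illustration.

```python
import numpy as np

# A tiny integer "segmentation" with two labeled pixels.
labels = np.array([[0, 0, 2],
                   [0, 1, 0],
                   [0, 0, 0]])
mask = labels.astype(bool)

# np.nonzero treats any nonzero value as "set", whatever the dtype.
coords_from_labels = np.transpose(np.nonzero(labels))
coords_from_mask = np.transpose(np.nonzero(mask))
print(coords_from_labels.tolist())  # [[0, 2], [1, 1]]
```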

jni · 2020-06-24T05:36:00Z

skimage/metrics/set_metrics.py

+    """
+    Calculate the Hausdorff distance [1]_ between two sets of points.
+
+    The Hausdorff distance is the maximum distance between any point on


Suggested change

-    """
-    Calculate the Hausdorff distance [1]_ between two sets of points.
-
-    The Hausdorff distance is the maximum distance between any point on
+    """
+    Calculate the Hausdorff distance between nonzero elements of given images.
+
+    The Hausdorff distance [1]_ is the maximum distance between any point on

jni · 2020-06-24T05:36:33Z

skimage/metrics/set_metrics.py

+    if a.dtype != np.bool or b.dtype != np.bool:
+        raise ValueError('Arrays must have dtype = \'bool\'')
+    if a.shape != b.shape:
+        raise ValueError('Array shapes must be identical')


Personally, I would remove these checks.

(certainly the first one!)

jni · 2020-06-24T05:39:07Z

skimage/metrics/set_metrics.py

+    # Handle empty sets properly
+    if a_points.shape[0] == 0 or b_points.shape[0] == 0:
+        if a_points.shape[0] == b_points.shape[0]:
+            # Both sets are empty and thus the distance is zero
+            return 0.
+        else:
+            # Exactly one set is empty; the distance is infinite
+            return np.inf


Here I would use len instead of .shape[0], as that is more readable as "if there are zero points".
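Written with `len`, the empty-set rules from the hunk above might read as follows. This is a sketch with a hypothetical helper name, `empty_case_distance`, which returns `None` when both sets have points so that the caller can go on to compute the actual distance.

```python
import numpy as np


def empty_case_distance(a_points, b_points):
    """Distance fixed by the empty-set rules, or None if neither set is empty."""
    if len(a_points) == 0 or len(b_points) == 0:
        if len(a_points) == len(b_points):
            # Both sets are empty and thus the distance is zero.
            return 0.0
        # Exactly one set is empty; the distance is infinite.
        return np.inf
    return None


no_points = np.empty((0, 2))
one_point = np.array([[1.0, 2.0]])
print(empty_case_distance(no_points, no_points))  # 0.0
print(empty_case_distance(no_points, one_point))  # inf
```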

jni · 2020-06-24T05:40:20Z

skimage/metrics/set_metrics.py

+    a_points = np.require(a_points, np.float64, ['C'])
+    b_points = np.require(b_points, np.float64, ['C'])


Personally I would solve that by adding .astype(np.float64) to the creation calls...

jni · 2020-06-25T01:35:17Z

skimage/metrics/set_metrics.py

+    """
+    Calculate the Hausdorff distance between nonzero elements of given images.


The docstring summary should start on the same line as the opening quotes (many UIs rely on this to render the first line, as per PEP 257).

Suggested change

-    """
-    Calculate the Hausdorff distance between nonzero elements of given images.
+    """Calculate the Hausdorff distance between nonzero elements of given images.

jni

@scikit-image/core this is ready imho! (except for a very minor documentation fix that I don't think should hold up merging.)

jni · 2020-06-29T15:35:54Z

@emmanuelle do you have time to review and 🤞 merge today?

jni · 2020-07-01T14:34:49Z

ping @scikit-image/core anyone have time to review/potentially merge this PR?

rfezzani

Since we decided to use cKDTree, I think that we no longer need Cython!
The hausdorff_distance_onesided function can now be defined in the set_metrics.py file (as a hidden function?) and the _set_metrics.pyx and setup.py files can be removed.
The requirement for contiguous arrays also becomes obsolete ;)

clementkng · 2020-07-02T00:22:07Z

@rfezzani Why would hausdorff_distance_onesided need to be a hidden function?

rfezzani · 2020-07-02T08:53:05Z

@rfezzani Why would hausdorff_distance_onesided need to be a hidden function?

Because it was just used internally by the hausdorff_distance function, but not defining the hausdorff_distance_onesided function as you did is a good option too ;)

jni · 2020-07-02T09:02:49Z

🎉 Thank you for your undeterred patience, @clementkng! It's so awesome to have this in!

rfezzani · 2020-07-02T09:06:11Z

🎉 thank you @clementkng!

clementkng · 2020-07-03T01:51:10Z

@jni @rfezzani thank you for your reviews! I'm glad I was able to revive this PR after being gone for so long.

josteinbf and others added 23 commits December 31, 2019 13:14

Add Haussdorff distance to measure subpackage.

633245f

Hausdorff: Handle empty sets properly.

8aa0535

Hausdorff: Add basic docstrings.

3471342

hausdorff: Test images with single-region pixels.

81da3a4

Hausdorff: Improve variable naming.

c1304ce

Hausdorff: Use numpy functions where possible.

da6b9ca

Updated documentation and tests, addressed some feedback

5a3005d

Removed hausdorff distance region since it was calling hausdorff distance, improved test coverage

f5b35f5

Added docstring example

24db998

PEP8

f791651

Added hausdorff benchmark

a7393dc

[WIP] Gallery example for hausdorff

59b4122

PEP8

898671a

Moved hausdorff distance to metrics, changed file names to be more indicative

6a3daf5

PEP8

becc805

Removed python2 statement and always false checks

0e5ec2c

Added 3D test

2b02fbb

Added failing test based on gallery

0c3a87b

Fixed bug that terminated the inner for loop too early, resulting in larger point distances getting through to the hausdorff distance

6540c11

Updated gallery example

6e0de1a

Changed tests to use classes to avoid global variable use

4e5b702

PEP8, removed unused constant

c79279f

Removed bento.info in attempt to clear Travis builds

c7db20f

clementkng mentioned this pull request Dec 31, 2019

[WIP] Hausdorff Distance (new) #4005

Closed

9 tasks

emmanuelle reviewed Jan 3, 2020

Remove redundant assertion and refactored test classes to test functions

972582b

PEP8

af7775c

jni reviewed Jun 22, 2020

Use faster scipy.spatial.cKDTree implementation, make setup description more description

a98dd64

jni requested changes Jun 24, 2020

Modify API descriptions, remove unnecessary checks, refactor np code to be clearer

15cfb67

jni reviewed Jun 25, 2020

jni approved these changes Jun 25, 2020

Move beginning of docstring to same line as quotes

17283f0

rfezzani suggested changes Jul 1, 2020

Remove Cython code now that we're using cKDTree

367e2a9

rfezzani reviewed Jul 2, 2020

rfezzani approved these changes Jul 2, 2020

jni merged commit 061a22e into scikit-image:master Jul 2, 2020

clementkng mentioned this pull request Jan 29, 2021

Add hausdorff points #5207

Merged

