-
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Hausdorff Distance (updated) #4382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ance, improved test coverage
…larger point distances getting through to the hausdorff distance
skimage/metrics/_set_metrics.pyx
Outdated
|
||
import numpy as np | ||
|
||
def hausdorff_distance_onesided(cnp.float64_t[:, ::1] points_sup, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of my pet comments here: in terms of speed, how would this Cython implementation compare to using a CKDTree, in particular https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.cKDTree.query.html#scipy.spatial.cKDTree.query (the query
method for one nearest-neighbor, which directly returns the distance)?
My guestimate is that for a small number of points the Cython will be faster, but that for a large number of points the KDTree might win. Are you interested in testing this @clementkng ? If you don't have the time I might give it a try.
I guess it also depends on the application for which we expect this function to be used. Will it be rather for very sparse images or for more busy ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @emmanuelle, I can try to compare the cKDTree vs cython, but I'm not super familiar w/ how we'd test this. I'm guessing this is similar to the benchmarking process for the initial implementation of Hausdorff distance?
I expect the function to be used for reasonably busy images, as one of the applications is to find and recognize image shapes within a larger image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just take a realistic example and see how the two methods compare, and also rescale the image of interest with several factors so that you can see if the fastest method depends on the number of non-zero pixels. Automatic benchmarks with asv are useful but here it's just to decide which method is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emmanuelle to ensure reproducibility, should I push the code I use to compare the methods into the benchmark_metrics.py as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not at this stage because if one method is a clear winner we will not bother with the other. What I would do is write a python script calling the function on images of different sizes, and run this script in two different branches (each branch with a different implementation), and paste the script and its result here. But really whatever workflow works for you is fine :-).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@soupault I'm not sure I've seen the surface-distance library used in an Jupyter notebook. I'm having a little trouble including it in the comparisons, is there a code sample somewhere of how to import the package and use it if possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you asking how to install this library? You should be able to import compute_robust_hausdorff from surface_distance
after following the install instructions in the readme.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So something like from surface_distance import compute_robust_hausdorff
should work once I've installed the package right? I believe I'm having some issues since surface distance is not Python3 compatible, but I need Python3 to run/build scikit-image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a link to a first draft of benchmarks against cKD tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clementkng From the issue you linked it looks like only that one import must be updated to be compatible with Python 3? You might succeed by applying that change to your local copy.
Thanks for the notebook! I've left a suggestion to make the comparison more meaningful.
Thanks a lot @clementkng for this thorough benchmarks, it's a lot of work. So it seems that for a 256x256 image the cKDTree method is faster, this is the intuition that I had (cKDTree should be faster when images are large). Most images will be larger than 256^2, so I'd vote for the cKDTree, Thoughts from @lagru, @soupault or others? @clementkng would you still have the energy to make this change :-) ? |
@emmanuelle I agree that we should use the cKDTree version by default. Also, did you see my comment on the benchmark: it might be interesting to vendor the Cython cKDTree code so that we don't pay the function call overhead there. That could potentially accelerate the function even much more! |
Definitely worth looking into! With this PR and #4165 we'd have at least two algorithms that might benefit. Although that should definitely happen in a separate PR and involve some actual benchmarks that demonstrate a significant speed-up (I strongly suspect that there will be one). Beforehand we should compare scipy's cKDTree with sklearn's KDTree implementation and others if there are any. |
@jni I saw you comment and commented too ;-). We might get away with a faster solution by calling |
@emmanuelle very nice catch! The new numbers are eye-popping! 🙌 |
Hi @emmanuelle and @rfezzani, I realized I accidentally assumed the PR was over when the benchmarks proved that the scipy.cKDTree significantly outperformed this implementation, so I'm not sure what finishing this PR would mean. Sorry for the late response and misunderstanding! Based on the benchmark, should I start using scipy.cKDTree in my implementation? What's the best way to proceed? 😅 |
Hello @clementkng! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2020-07-02 02:57:22 UTC |
@rfezzani thanks for the feedback 😄 |
So I believe I'll need to rebase to resolve the conflict that is preventing tests from running, but last time I did that I wasn't able to resolve the merge conflict and as a result had to make a new branch and PR (which is this one). Would anyone be willing to help me rebase? |
Hi @clementkng! We usually squash and merge these days when there is a long and tortuous commit history, so you should be good to
|
skimage/measure/__init__.py
Outdated
'shannon_entropy' | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might not need a rebase if you revert this change!
skimage/metrics/setup.py
Outdated
from numpy.distutils.core import setup | ||
setup(maintainer='scikit-image Developers', | ||
maintainer_email='scikit-image@python.org', | ||
description='Graph-based Image-processing Algorithms', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description should be changed, e.g. "Metrics to compare images"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@clementkng I had to go over the code and review the history to remind myself of the current status.
Yes, I think the idea is to replace |
…on more description
Thanks for the feedback @jni 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@clementkng awesome, thank you for updating!!! This is almost ready, all that's left is a bit of nitpicking about the API 😅 Please let us know if you would like us to take over!
skimage/metrics/set_metrics.py
Outdated
from ._set_metrics import hausdorff_distance_onesided | ||
|
||
|
||
def hausdorff_distance(a, b): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can probably do better with these parameter names. For images we use image0, image1
, or image_true, image_test
so here I think we should use points0, points1
(my preference) or points_true, points_test
, or even coords0, coords1
. Any thoughts on this @clementkng @scikit-image/core ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... Actually, I just saw that these are images, not actual points! So the docstring must be updated (suggestion below) and I would call these image0
and image1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
skimage/metrics/set_metrics.py
Outdated
The Hausdorff distance between sets ``a`` and ``b``, using | ||
Euclidian distance to calculate the distance between points in ``a`` | ||
and ``b``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Hausdorff distance between sets ``a`` and ``b``, using | |
Euclidian distance to calculate the distance between points in ``a`` | |
and ``b``. | |
The Hausdorff distance between coordinates of nonzero pixels in | |
``image0`` and ``image1``, using the Euclidian distance. |
skimage/metrics/set_metrics.py
Outdated
|
||
Parameters | ||
---------- | ||
a, b : ndarray, dtype=bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove the dtype specification, since this works totally fine for integer arrays. It could be used to compare two segmentations, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
skimage/metrics/set_metrics.py
Outdated
""" | ||
Calculate the Hausdorff distance [1]_ between two sets of points. | ||
|
||
The Hausdorff distance is the maximum distance between any point on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
""" | |
Calculate the Hausdorff distance [1]_ between two sets of points. | |
The Hausdorff distance is the maximum distance between any point on | |
""" | |
Calculate the Hausdorff distance between nonzero elements of given images. | |
The Hausdorff distance [1]_ is the maximum distance between any point on |
skimage/metrics/set_metrics.py
Outdated
if a.dtype != np.bool or b.dtype != np.bool: | ||
raise ValueError('Arrays must have dtype = \'bool\'') | ||
if a.shape != b.shape: | ||
raise ValueError('Array shapes must be identical') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I would remove these checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(certainly the first one!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
skimage/metrics/set_metrics.py
Outdated
# Handle empty sets properly | ||
if a_points.shape[0] == 0 or b_points.shape[0] == 0: | ||
if a_points.shape[0] == b_points.shape[0]: | ||
# Both sets are empty and thus the distance is zero | ||
return 0. | ||
else: | ||
# Exactly one set is empty; the distance is infinite | ||
return np.inf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I would use len
instead of .shape[0]
, as that is more readable as "if there are zero points".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
skimage/metrics/set_metrics.py
Outdated
a_points = np.require(a_points, np.float64, ['C']) | ||
b_points = np.require(b_points, np.float64, ['C']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I would solve that by adding .astype(np.float64)
to the creation calls...
skimage/metrics/set_metrics.py
Outdated
""" | ||
Calculate the Hausdorff distance between nonzero elements of given images. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring should be on the same line as the quotes (many UIs rely on this to render the first line as per PEP257).
""" | |
Calculate the Hausdorff distance between nonzero elements of given images. | |
"""Calculate the Hausdorff distance between nonzero elements of given images. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scikit-image/core this is ready imho! (except for a very minor documentation fix that I don't think should hold up merging.)
@emmanuelle do you have time to review and 🤞 merge today? |
ping @scikit-image/core anyone have time to review/potentially merge this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we decided to use cKDTree
, I think that we no more need to use Cython!
The hausdorff_distance_onesided
function can now be defined in the set_metrics.py
file (as a hidden function?) and the _set_metrics.pyx
and setup.py
files can be removed.
The requirements for contiguous arrays becomes also obsolete ;)
@rfezzani Why would |
Because it was just used internally by the |
🎉 Thank you for your undeterred patience, @clementkng! It's so awesome to have this in! |
🎉 thank you @clementkng! |
Description
Supersedes gh-742 and gh-4005
This is my attempt to fix the feedback on @josteinbf 's Hausdorff distance PR and add docs, examples, and extra tests. I've changed the methods to a single
hausdorff_distance
method to be more clear.Existing discussion at gh-738 and gh-4005.
Hausdorff distance can be used to determine the degree of resemblance two objects have between each other when superimposed on top of each other, for example in this paper.
Gallery Example Output
Checklist
./doc/examples
(new features only)./benchmarks
, if your changes aren't covered by anexisting benchmark
For reviewers
later.
__init__.py
.doc/release/release_dev.rst
.@meeseeksdev backport to v0.14.x