8000 Hausdorff Distance (updated) by clementkng · Pull Request #4382 · scikit-image/scikit-image · GitHub
[go: up one dir, main page]

Skip to content

Hausdorff Distance (updated) #4382

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
Jul 2, 2020

Conversation

clementkng
Copy link
Contributor
@clementkng clementkng commented Dec 31, 2019

Description

Supersedes gh-742 and gh-4005

This is my attempt to fix the feedback on @josteinbf 's Hausdorff distance PR and add docs, examples, and extra tests. I've changed the methods to a single hausdorff_distance method to be more clear.

Existing discussion at gh-738 and gh-4005.

Hausdorff distance can be used to determine the degree of resemblance two objects have between each other when superimposed on top of each other, for example in this paper.

Gallery Example Output

image

Checklist

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.
  • Consider backporting the PR with @meeseeksdev backport to v0.14.x

@clementkng clementkng mentioned this pull request Dec 31, 2019
9 tasks

import numpy as np

def hausdorff_distance_onesided(cnp.float64_t[:, ::1] points_sup,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of my pet comments here: in terms of speed, how would this Cython implementation compare to using a CKDTree, in particular https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.cKDTree.query.html#scipy.spatial.cKDTree.query (the query method for one nearest-neighbor, which directly returns the distance)?

My guestimate is that for a small number of points the Cython will be faster, but that for a large number of points the KDTree might win. Are you interested in testing this @clementkng ? If you don't have the time I might give it a try.

I guess it also depends on the application for which we expect this function to be used. Will it be rather for very sparse images or for more busy ones?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @emmanuelle, I can try to compare the cKDTree vs cython, but I'm not super familiar w/ how we'd test this. I'm guessing this is similar to the benchmarking process for the initial implementation of Hausdorff distance?

I expect the function to be used for reasonably busy images, as one of the applications is to find and recognize image shapes within a larger image.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just take a realistic example and see how the two methods compare, and also rescale the image of interest with several factors so that you can see if the fastest method depends on the number of non-zero pixels. Automatic benchmarks with asv are useful but here it's just to decide which method is better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emmanuelle to ensure reproducibility, should I push the code I use to compare the methods into the benchmark_metrics.py as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not at this stage because if one method is a clear winner we will not bother with the other. What I would do is write a python script calling the function on images of different sizes, and run this script in two different branches (each branch with a different implementation), and paste the script and its result here. But really whatever workflow works for you is fine :-).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@soupault I'm not sure I've seen the surface-distance library used in an Jupyter notebook. I'm having a little trouble including it in the comparisons, is there a code sample somewhere of how to import the package and use it if possible?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you asking how to install this library? You should be able to import compute_robust_hausdorff from surface_distance after following the install instructions in the readme.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So something like from surface_distance import compute_robust_hausdorff should work once I've installed the package right? I believe I'm having some issues since surface distance is not Python3 compatible, but I need Python3 to run/build scikit-image.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a link to a first draft of benchmarks against cKD tree.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clementkng From the issue you linked it looks like only that one import must be updated to be compatible with Python 3? You might succeed by applying that change to your local copy.

Thanks for the notebook! I've left a suggestion to make the comparison more meaningful.

@emmanuelle
Copy link
Member

Thanks a lot @clementkng for this thorough benchmarks, it's a lot of work. So it seems that for a 256x256 image the cKDTree method is faster, this is the intuition that I had (cKDTree should be faster when images are large). Most images will be larger than 256^2, so I'd vote for the cKDTree, Thoughts from @lagru, @soupault or others? @clementkng would you still have the energy to make this change :-) ?

@jni
Copy link
Member
jni commented Feb 4, 2020

@emmanuelle I agree that we should use the cKDTree version by default. Also, did you see my comment on the benchmark: it might be interesting to vendor the Cython cKDTree code so that we don't pay the function call overhead there. That could potentially accelerate the function even much more!

@lagru
Copy link
Member
lagru commented Feb 4, 2020

it might be interesting to vendor the Cython cKDTree code

Definitely worth looking into! With this PR and #4165 we'd have at least two algorithms that might benefit. Although that should definitely happen in a separate PR and involve some actual benchmarks that demonstrate a significant speed-up (I strongly suspect that there will be one). Beforehand we should compare scipy's cKDTree with sklearn's KDTree implementation and others if there are any.

@emmanuelle
Copy link
Member

@jni I saw you comment and commented too ;-). We might get away with a faster solution by calling tree.query on a whole array, then no need to vendor anything.

@jni
Copy link
Member
jni commented Feb 5, 2020

@emmanuelle very nice catch! The new numbers are eye-popping! 🙌

@clementkng
Copy link
Contributor Author
clementkng commented May 19, 2020

@clementkng can we help you with something to finish this PR :-) ?

Hi @emmanuelle and @rfezzani, I realized I accidentally assumed the PR was over when the benchmarks proved that the scipy.cKDTree significantly outperformed this implementation, so I'm not sure what finishing this PR would mean. Sorry for the late response and misunderstanding! Based on the benchmark, should I start using scipy.cKDTree in my implementation? What's the best way to proceed? 😅

@pep8speaks
Copy link
pep8speaks commented Jun 6, 2020

Hello @clementkng! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 4:1: E302 expected 2 blank lines, found 1
Line 5:80: E501 line too long (81 > 79 characters)

Comment last updated at 2020-07-02 02:57:22 UTC

@clementkng
Copy link
Contributor Author

@rfezzani thanks for the feedback 😄

@clementkng
Copy link
Contributor Author

So I believe I'll need to rebase to resolve the conflict that is preventing tests from running, but last time I did that I wasn't able to resolve the merge conflict and as a result had to make a new branch and PR (which is this one). Would anyone be willing to help me rebase?

@jni
Copy link
Member
jni commented Jun 22, 2020

Hi @clementkng! We usually squash and merge these days when there is a long and tortuous commit history, so you should be good to git merge master while on this branch. If you do want to rebase, then you can do:

git checkout hausdorff-distance-new
git rebase master
git status
# fix any merge conflicts
# use git add to mark resolved files
git rebase --continue
# repeat above until git status is clean
git push origin --force-with-lease

Comment on lines 51 to 52
'shannon_entropy'
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might not need a rebase if you revert this change!

from numpy.distutils.core import setup
setup(maintainer='scikit-image Developers',
maintainer_email='scikit-image@python.org',
description='Graph-based Image-processing Algorithms',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description should be changed, e.g. "Metrics to compare images"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jni
Copy link
Member
jni commented Jun 22, 2020

@clementkng I had to go over the code and review the history to remind myself of the current status.

I realized I accidentally assumed the PR was over when the benchmarks proved that the scipy.cKDTree significantly outperformed this implementation, so I'm not sure what finishing this PR would mean. Sorry for the late response and misunderstanding! Based on the benchmark, should I start using scipy.cKDTree in my implementation? What's the best way to proceed?

Yes, I think the idea is to replace hausdorff_distance_onesided in this PR with the cKDTree implementation. Then it'll be ready to merge!

@clementkng
Copy link
Contributor Author

Thanks for the feedback @jni 😄

Copy link
Member
@jni jni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clementkng awesome, thank you for updating!!! This is almost ready, all that's left is a bit of nitpicking about the API 😅 Please let us know if you would like us to take over!

from ._set_metrics import hausdorff_distance_onesided


def hausdorff_distance(a, b):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can probably do better with these parameter names. For images we use image0, image1, or image_true, image_test so here I think we should use points0, points1 (my preference) or points_true, points_test, or even coords0, coords1. Any thoughts on this @clementkng @scikit-image/core ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... Actually, I just saw that these are images, not actual points! So the docstring must be updated (suggestion below) and I would call these image0 and image1.

EED3
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 22 to 24
The Hausdorff distance between sets ``a`` and ``b``, using
Euclidian distance to calculate the distance between points in ``a``
and ``b``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The Hausdorff distance between sets ``a`` and ``b``, using
Euclidian distance to calculate the distance between points in ``a``
and ``b``.
The Hausdorff distance between coordinates of nonzero pixels in
``image0`` and ``image1``, using the Euclidian distance.


Parameters
----------
a, b : ndarray, dtype=bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove the dtype specification, since this works totally fine for integer arrays. It could be used to compare two segmentations, for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 7 to 10
"""
Calculate the Hausdorff distance [1]_ between two sets of points.

The Hausdorff distance is the maximum distance between any point on
Copy link
Member
1CF5

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""
Calculate the Hausdorff distance [1]_ between two sets of points.
The Hausdorff distance is the maximum distance between any point on
"""
Calculate the Hausdorff distance between nonzero elements of given images.
The Hausdorff distance [1]_ is the maximum distance between any point on

Comment on lines 43 to 46
if a.dtype != np.bool or b.dtype != np.bool:
raise ValueError('Arrays must have dtype = \'bool\'')
if a.shape != b.shape:
raise ValueError('Array shapes must be identical')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I would remove these checks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(certainly the first one!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 51 to 58
# Handle empty sets properly
if a_points.shape[0] == 0 or b_points.shape[0] == 0:
if a_points.shape[0] == b_points.shape[0]:
# Both sets are empty and thus the distance is zero
return 0.
else:
# Exactly one set is empty; the distance is infinite
return np.inf
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I would use len instead of .shape[0], as that is more readable as "if there are zero points".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 60 to 61
a_points = np.require(a_points, np.float64, ['C'])
b_points = np.require(b_points, np.float64, ['C'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I would solve that by adding .astype(np.float64) to the creation calls...

Comment on lines 7 to 8
"""
Calculate the Hausdorff distance between nonzero elements of given images.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring should be on the same line as the quotes (many UIs rely on this to render the first line as per PEP257).

Suggested change
"""
Calculate the Hausdorff distance between nonzero elements of given images.
"""Calculate the Hausdorff distance between nonzero elements of given images.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member
@jni jni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scikit-image/core this is ready imho! (except for a very minor documentation fix that I don't think should hold up merging.)

@jni
Copy link
Member
jni commented Jun 29, 2020

@emmanuelle do you have time to review and 🤞 merge today?

@jni
Copy link
Member
jni commented Jul 1, 2020

ping @scikit-image/core anyone have time to review/potentially merge this PR?

Copy link
Member
@rfezzani rfezzani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we decided to use cKDTree, I think that we no more need to use Cython!
The hausdorff_distance_onesided function can now be defined in the set_metrics.py file (as a hidden function?) and the _set_metrics.pyx and setup.py files can be removed.
The requirements for contiguous arrays becomes also obsolete ;)

@clementkng
Copy link
Contributor Author

@rfezzani Why would hausdorff_distance_onesided need to be a hidden function?

@rfezzani
Copy link
Member
rfezzani commented Jul 2, 2020

@rfezzani Why would hausdorff_distance_onesided need to be a hidden function?

Because it was just used internally by the hausdorff_distance function, but not defining the hausdorff_distance_onesided function as you did is a good option too ;)

@jni jni merged commit 061a22e into scikit-image:master Jul 2, 2020
@jni
Copy link
Member
jni commented Jul 2, 2020

🎉 Thank you for your undeterred patience, @clementkng! It's so awesome to have this in!

@rfezzani
Copy link
Member
rfezzani commented Jul 2, 2020

🎉 thank you @clementkng!

@clementkng
Copy link
Contributor Author

@jni @rfezzani thank you for your reviews! I'm glad I was able to revive this PR after being gone for so long.

@clementkng clementkng mentioned this pull request Jan 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants
0