FEAT: Flood fill in n-dimensions by JDWarner · Pull Request #3245 · scikit-image/scikit-image

FEAT: Flood fill in n-dimensions #3245


Merged: 26 commits merged on Feb 26, 2019

Conversation

@JDWarner (Contributor) commented on Jul 5, 2018

Flood fill has arrived!

Builds upon the excellent work in #3022 by @lagru.

I'm very excited to have this functionality in the package! Took a bit of extra time, but I believe this is now fully general; adding the tolerance options makes it more generally useful. Every option I can think of is included. It's quite fast; should be suitable for interactive use in GUI widgets or similar.

Closes #2876.

Hopefully this can be backported to 0.14.
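To illustrate the semantics the PR describes (a seed point, a new value, and a `tolerance` band around the seed's original value), here is a minimal pure-Python/NumPy sketch of flood fill. This is not the PR's Cython implementation; it uses only axis-aligned (full-connectivity options are omitted) neighbors and a simple BFS:

```python
from collections import deque
import numpy as np

def flood_fill_bfs(image, seed, new_value, tolerance=0):
    """Sketch of flood fill: BFS from seed, accepting pixels whose value
    lies within `tolerance` of the seed's original value."""
    image = image.copy()
    seed_value = image[seed]
    low, high = seed_value - tolerance, seed_value + tolerance
    filled = np.zeros(image.shape, dtype=bool)
    filled[seed] = True
    queue = deque([tuple(seed)])
    while queue:
        point = queue.popleft()
        for axis in range(image.ndim):
            for step in (-1, 1):
                neighbor = list(point)
                neighbor[axis] += step
                if not 0 <= neighbor[axis] < image.shape[axis]:
                    continue  # stay inside the array; no wrap-around
                neighbor = tuple(neighbor)
                if not filled[neighbor] and low <= image[neighbor] <= high:
                    filled[neighbor] = True
                    queue.append(neighbor)
    image[filled] = new_value
    return image
```

With `tolerance=0` only pixels exactly equal to the seed value join the fill; a nonzero tolerance widens the acceptance band, which is what makes the function useful on noisy or gradient data.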

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.

@pep8speaks commented on Jul 5, 2018

Hello @JDWarner! Thanks for updating the PR.

Line 90:17: E226 missing whitespace around arithmetic operator
Line 101:30: E226 missing whitespace around arithmetic operator
Line 103:57: E226 missing whitespace around arithmetic operator

Line 231:48: E226 missing whitespace around arithmetic operator

Line 176:29: E241 multiple spaces after ','
Line 176:34: E241 multiple spaces after ','
Line 176:39: E241 multiple spaces after ','
Line 176:44: E241 multiple spaces after ','
Line 176:49: E241 multiple spaces after ','
Line 187:27: E201 whitespace after '['
Line 187:31: E241 multiple spaces after ','
Line 187:36: E241 multiple spaces after ','
Line 187:41: E241 multiple spaces after ','
Line 187:46: E241 multiple spaces after ','
Line 187:51: E241 multiple spaces after ','
Line 188:51: E241 multiple spaces after ','
Line 189:51: E241 multiple spaces after ','
Line 190:51: E241 multiple spaces after ','
Line 191:51: E241 multiple spaces after ','
Line 202:44: E226 missing whitespace around arithmetic operator
Line 209:40: E226 missing whitespace around arithmetic operator

Line 151:50: W504 line break after binary operator

Comment last updated on February 25, 2019 at 03:10 UTC

@JDWarner (Contributor, Author) commented on Jul 5, 2018

Test failures appear entirely unrelated.

@codecov-io commented on Jul 6, 2018

Codecov Report

Merging #3245 into master will decrease coverage by 5.98%.
The diff coverage is 90.67%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3245      +/-   ##
==========================================
- Coverage      88%   82.01%   -5.99%     
==========================================
  Files         328      342      +14     
  Lines       27759    27431     -328     
==========================================
- Hits        24428    22498    -1930     
- Misses       3331     4933    +1602
Impacted Files Coverage Δ
skimage/util/__init__.py 100% <100%> (ø) ⬆️
skimage/util/setup.py 38.46% <38.46%> (ø)
skimage/util/_flood_fill.py 95.45% <95.45%> (ø)
skimage/util/tests/test_flood_fill.py 98.33% <98.33%> (ø)
skimage/viewer/tests/test_plugins.py 18.54% <0%> (-81.46%) ⬇️
skimage/viewer/tests/test_tools.py 19.46% <0%> (-80.54%) ⬇️
skimage/viewer/plugins/lineprofile.py 20.23% <0%> (-76.2%) ⬇️
skimage/viewer/canvastools/recttool.py 13% <0%> (-75.61%) ⬇️
skimage/viewer/tests/test_viewer.py 25.86% <0%> (-74.14%) ⬇️
skimage/viewer/canvastools/linetool.py 20.61% <0%> (-72.17%) ⬇️
... and 143 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2e37854...219e290. Read the comment docs.

@JDWarner (Contributor, Author) commented on Jul 6, 2018

@lagru You were right, removing the additional logic flagging checked points actually was beneficial - small but significant (10-15%) reduction in runtime by chucking the whole thing. Out it goes!

Also combined if-statements and some minor cleanup.

Edit: We've really got a problem with some combination of AppVeyor, SciPy and BLAS.

@JDWarner (Contributor, Author) commented on Jul 6, 2018

@lagru It also happens you were right about handling borders; there was a serious bug where the fill wrapped around. Returned to an approach similar to the one in extrema; regression tests added.

Edit: despite the padding it's actually now faster than before, because the boundary checks are no longer required.
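The wrap-around bug and the padding fix described above can be sketched in a few lines (illustrative NumPy, not the PR's Cython code):

```python
import numpy as np

# Why a raveled flood fill can wrap around: in flat (raveled) indexing,
# stepping "right" from the last pixel of a row lands on the first
# pixel of the next row.
image = np.zeros((3, 4), dtype=np.uint8)
end_of_row0 = np.ravel_multi_index((0, 3), image.shape)    # flat index 3
start_of_row1 = np.ravel_multi_index((1, 0), image.shape)  # flat index 4
assert end_of_row0 + 1 == start_of_row1

# Padding with a one-pixel border of a sentinel value (here 255, chosen
# so the fill criteria can never match it) means any raveled step off
# an edge hits the sentinel, so the hot loop needs no bounds checks.
padded = np.pad(image, 1, mode='constant', constant_values=255)
assert padded.shape == (5, 6)
```

Trading a one-time O(n) pad for a per-pixel bounds check in the inner loop is why the padded version ends up faster.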

@JDWarner force-pushed the flood_fill_nd branch 2 times, most recently from 35aea5c to 03bc00d on July 6, 2018 at 06:02
@JDWarner (Contributor, Author) commented on Jul 11, 2018

Making the Cython functions callable directly from Python is the only suggestion I've yet to implement, @lagru; is there any concern with changing the Cython functions from pure C to cpdef hybrids? I thought there was some benefit to a Cython hybrid intermediary to get to pure C, but I'm happy to do this if it would be beneficial.

The other point I'm unsure about when changing the architecture is clearing the queue & freeing memory properly.

@lagru (Member) commented on Jul 11, 2018

@JDWarner

The other point I'm unsure about when changing the architecture is clearing the queue & freeing memory properly.

For your use case it's pretty straightforward. You only need to follow the schema as detailed in the header of queue_with_history.pyx:

# QueueItem must be defined at this point
include "_queue_with_history.pxi"
# Allocate memory for queues state-storing structure
cdef QueueWithHistory queue
# Initialize state of queue and allocate buffer on heap 
# (initial size 64 * sizeof(QueueItem))
queue_init(&queue, 64)
try:
    ...  # Calls to queue_push, queue_pop, queue_restore, queue_clear
finally:
    # Ensure buffer on heap is deallocated
    queue_exit(&queue)
    # Don't use queue after this point.

You can safely move the cdef-statement and the try-finally-block into the two _flood_fill_x functions if you choose to remove the first function.

Making the Cython called directly from Python is the only suggestion I've yet to implement, @lagru; is there concern with changing the Cython functions from pure C to cpdef hybrids? I thought there was some benefit to a Cython hybrid intermediary to get to pure C, but happy to do this if it would be beneficial.

I think it depends on whether you want to anticipate the possibility that the _flood_fill_x function will be directly integrated in Cython-code elsewhere. In that case a cpdef-statement would be the way to go.
However right now declaring both functions as def should be enough because it can easily be changed to cpdef when it's actually needed.
I'm not sure about the actual performance differences between calling cpdef and cdef from Cython. You may want to look here (Cython Function Declarations) and here (How Fast are def cdef cpdef?).

@JDWarner (Contributor, Author) commented on Jul 11, 2018

@lagru thanks for the pointers (pun intended). Implemented, with cpdef. Tested, no significant change in execution time but it is now simpler.

@lagru (Member) left a review:

This looks really good already! I've left a few comments concerning the API, documentation and Cython usage but nothing major except for one thing.

@JDWarner (Contributor, Author) replied:

Thanks for the review, @lagru! Everything now implemented except the API suggestion (returning input image when inplace is True). I am not opposed to returning the inplace-modified image, if another couple sets of eyes from @skimage/core agree.

If anyone has ideas about alternatives for the lazy imports I'm all ears. Tried a couple things but no working alternatives yet.

@lagru (Member) commented on Jul 13, 2018

I don't want to push the scope of this PR, but an idea to keep around for the future might be to try modifying this into a scanline fill. Shouldn't be too hard to do in a successive PR and might yield an even faster function.

@JDWarner (Contributor, Author) commented:

Latest patch is small but pretty neat; Fortran and C-contiguous arrays are both now supported.
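Why memory layout matters for a raveled fill can be shown in a short sketch (assumption: the implementation computes neighbor offsets against the buffer's actual ravel order):

```python
import numpy as np

# The same 2x3 values in two layouts: flat neighbor offsets differ.
a = np.arange(6).reshape(2, 3)   # C-contiguous (row-major) buffer
f = np.asfortranarray(a)         # same values, column-major buffer

assert a.ravel(order='C').tolist() == [0, 1, 2, 3, 4, 5]
assert f.ravel(order='F').tolist() == [0, 3, 1, 4, 2, 5]
# In row-major layout, horizontal neighbors are 1 apart in the flat
# buffer; in column-major layout, vertical neighbors are 1 apart
# instead, so the offset table must be built per layout.
```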

@jni (Member) commented on Feb 15, 2019

@stefanv this is ready for another pass from you!

@lagru (Member) left a review:

I'd give the implementation of the two functions my approval but I think the test suite would profit from a few more tests. I don't want to hold this up further so I'd be willing to add those tests later if you don't have the time.

Is there a way to look at the rendered example (plot_floodfill.py) by means of CI artifacts? Likewise for the coverage (#3245 (comment) doesn't seem to be up to date)?

Also, it's a minor nitpick but I guess the bot is used for a reason. Could you have another quick look at #3245 (comment)?



def test_neighbors():
    # This test will only pass if the neighbors are exactly correct
A reviewer (Member) commented:

Is this statement true? My impression is that this only checks if neighbors are connected in the last dimension?

On connectivity and the structuring element: I guess a large part of that functionality is covered by the tests for watershed and local_maxima but ideally this module would have at least some basic tests for the parameters connectivity and selem. You could just copy from the tests of local_maxima.

@JDWarner (Contributor, Author) replied on Feb 17, 2019:

It is actually, it fills a column and then fills a row. Both directions are checked. Given the way the ravelling works, if it is going to break it will break in 2D.
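The raveling argument can be made concrete with the flat neighbor offsets for a C-contiguous 2-D array (a sketch; the PR's actual offset computation lives in the Cython module):

```python
import numpy as np

# For shape (rows, cols) in row-major order, the 4-connected neighbors
# of flat index i are i - cols, i - 1, i + 1, i + cols. Filling a
# column and then a row exercises both the +/-1 and +/-cols offsets.
shape = (4, 5)
offsets = np.array([-shape[1], -1, 1, shape[1]])
center = np.ravel_multi_index((2, 2), shape)  # flat index 12
neighbors = center + offsets                  # flat indices of (1,2), (2,1), (2,3), (3,2)
```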

@JDWarner (Contributor, Author) replied on Feb 17, 2019:

I'll admit I subscribe to a KISS style when it comes to testing in general... connectivity is checked in the docstring example and not replicated here, for example.

I'll add a basic test for selem but not directly testing selem exhaustively here, as it's entirely exported to the mechanisms in .extrema.py which are tested separately.

This is partially motivated by the length of our test suite, which has been an issue with CI builds in the past.

The reviewer (Member) replied:

I guess I just find that comment overly broad, with potential for confusion; e.g. the test doesn't check for diagonal connections. I'd actually prefer something explicit, like the examples from the docstring. Speaking of which, testing docstrings is a nice feature to ensure that they are correct, but I don't agree with the sentiment that they may replace tests. If someone changes a docstring in the future they won't consider test coverage while doing so. I'd prefer the test coverage of a function to be independent of its docstring.

But same as above, feel free to resolve this. This is not worth blocking the PR for.

@JDWarner (Contributor, Author) commented on Feb 17, 2019

Coverage bot is just broken, probably related to some of the rebasing along the way. This PR doesn't touch the viewer but it claims huge losses in coverage there.

I've made the relevant PEP 8 tweaks. I invoke PEP 8 itself against the remaining supposed failures (E226): per PEP 8, one is supposed to omit whitespace around higher-priority operators in compound expressions for readability. No PEP 8 checking algorithm is smart enough to get this right, which is why I blacklist E226 in my personal editors.
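The E226 disagreement in a nutshell, using PEP 8's own example of the recommended style:

```python
# PEP 8 suggests dropping spaces around the higher-priority operator to
# make precedence visible, which is exactly what pycodestyle's E226
# flags as an error.
x, y = 3.0, 4.0
hypot2 = x*x + y*y      # PEP 8-preferred form; E226 fires here
```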

I do not know of a way to capture the example outputs in the CI. Matplotlib has certain mechanisms to test that sort of thing, with PNG comparisons and the like, but we do not use them for the examples.


# Shortcut if neighbor is already part of fill
if flags[neighbor] == UNKNOWN:
    if image[neighbor] == seed_value:
A reviewer (Member) commented:

Can we combine these two conditions into one if? My head hurts because of the deep nesting.


# Only do comparisons on points not (yet) part of fill
if flags[neighbor] == UNKNOWN:
    if low_tol <= image[neighbor] <= high_tol:
A reviewer (Member) commented:

Is this syntax correct in Cython? Or is it doing ((low_tol <= image[neighbor]) <= high_tol)?

A member replied:

@hmaarrfk it's correct. Cython tries very hard to be a superset of Python.

A member replied:

ok.

I'll just leave another comment that I really don't like this syntax.
I know that working with lowest common denominators is "bad", but seeing as this causes many bugs in other languages, I think it is a good syntax "trick" to avoid.

A member replied:

@hmaarrfk I disagree, I think all other languages are broken in this regard and major advantages of Python should be celebrated. =)

Having said that, I don't strongly disagree because, tragically, NumPy arrays don't support this syntax (I guess because Python, as far as I know, doesn't provide a way to override this behaviour for classes), and that has bitten me a couple of times. Not hard enough that I introduced a bug, but hard enough that I had to rewrite something because I forgot about this not working.
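Both halves of this exchange can be demonstrated directly (scalar chaining works; array chaining raises):

```python
import numpy as np

# Scalars: a <= b <= c is evaluated as (a <= b) and (b <= c),
# with b evaluated only once.
assert (10 <= 15 <= 20) and not (10 <= 25 <= 20)

# NumPy arrays: the chained form raises, because the intermediate
# boolean array has no single truth value.
arr = np.array([5, 15, 25])
try:
    10 <= arr <= 20
    raised = False
except ValueError:
    raised = True

# The array-safe spelling uses explicit elementwise &:
mask = (10 <= arr) & (arr <= 20)
```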

A member replied:

Truthfully, my issue is with CPython. The fact that

for i in range(10):
    pass

can't be optimized is a travesty

A member replied:

I don't think this is the right forum for a chat about the Python language's weaknesses? =)

Anyway, "can't" is such a strong word... CPython is FOSS, so perhaps this can be your contribution to the language? ;)

@lagru (Member) commented on Feb 18, 2019

I do not know of a way to capture the example outputs in the CI. Matplotlib has certain mechanisms to test that sort of thing, with PNG comparisons and the like, but we do not use them for the examples.

@JDWarner Not sure if you misunderstood me. My question was more along the lines of whether it's possible to view the examples as they would appear on the website, without having to clone the PR and build the documentation locally.

@lagru (Member) left a review:

@JDWarner Just to be clear, I think this is great work and I'm fine with this as is.

@jni (Member) commented on Feb 19, 2019

I'd like to formally move to merge this and leave any future improvements to future PRs. The tests cover 100% of the new code — they may not be perfect but full coverage is good enough for me, especially when the PR has been in the works for more than six months. Things like minor syntax fixes, comments cleanup, and better tests can all occur in future PRs.

@stefanv the GitHub UI still records you as a dissenting voice, though that may have changed with @JDWarner's latest commits. =)

@soupault (Member) commented:

My question was more along the line of if it's possible to view the examples as they would appear in on the website without having to clone the PR and build the documentation locally.

@lagru Not yet, unfortunately. The relevant issue is here - #2269 .

@soupault (Member) left a review:

Looks great, thank you very much for the work! @JDWarner I've made several minor remarks. Let me know if you would like to address them, otherwise, I can take care of them.

-------
mask : ndarray
A Boolean array with the same shape as `image` is returned, with values
equal to 1 for areas connected to and equal (or within tolerance of)
A reviewer (Member) commented:

1 -> True, zero -> False.

@@ -350,7 +350,7 @@ def local_maxima(image, selem=None, connectivity=None, indices=False,
     considered as part of the neighborhood.
 connectivity : int, optional
     A number used to determine the neighborhood of each evaluated pixel.
-    Adjacent pixels whose squared distance from the center is larger or
+    Adjacent pixels whose squared distance from the center is less than or
A reviewer (Member) commented:

Should be fixed also in local_minima.


output = flood_fill(image, (7, 7, 3), 0, tolerance=379)

np.testing.assert_equal(output, expected)
A reviewer (Member) commented:

np.testing should not be used directly. We have all the necessary wrappers in https://github.com/scikit-image/scikit-image/blob/master/skimage/_shared/testing.py .

A member replied:

I thought that we'd reversed this policy? I remember the comment that having our own testing functions is one more thing to remember "just for skimage", and it resonated very strongly with me. We should use standard things except where it is an undue burden. I realise that the nose -> pytest transition was traumatic, but imho that is not sufficient justification to do our own thing, as we don't expect those transitions to be frequent.

The same member added:

(Apologies if the above comment comes like 18 months too late or so. =P)

A member replied:

Why would anyone want to revert that? 😲 Frankly speaking, I'm hearing about this for the first time.

To me, the thin wrappers in skimage._shared.testing aren't just for a potential convenient switch of the testing library. Besides that:

  1. It is a single place where all the utilised testing functions and classes are presented. Even after our transition to pytest we kept using a lot of functionality from numpy.testing, because neither library includes all the necessary functions and one is not a superset of the other. On top of this, numpy.testing was still using both nose and pytest under the hood, and for a skimage contributor it was a pain to understand which functions they could import from numpy. With these thin wrappers, a contributor can just import skimage._shared.testing and enjoy the coding;
  2. Single point of deprecation. When pytest or numpy.testing deprecates some functionality, it is much easier for us to rewrite a couple of lines in skimage._shared.testing than to change a few hundred of them all over the library;
  3. Almost zero maintenance cost.

If you have a strong argument why we should drop that, I'm happy to discuss, but, otherwise, I'd strongly prefer to keep things as is.
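The pattern under debate is a thin re-export shim. A hypothetical sketch (the module path and the selection of helpers are illustrative, not skimage's actual file):

```python
# _shared/testing.py-style shim (hypothetical): re-export exactly the
# helpers the test suite uses, so call sites import a single module and
# a future framework migration edits only this one file.
from numpy.testing import assert_equal, assert_allclose

# A test module would then write:
#     from skimage._shared.testing import assert_equal
assert_equal([1, 2, 3], [1, 2, 3])
assert_allclose(0.1 + 0.2, 0.3)
```

The trade-off both sides describe: one indirection layer versus the discoverability of a public, well-documented API like numpy.testing.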

A member replied:

@soupault I think I just got confused by the discussion here, specifically Mark's comment. But, my overall feeling remains, so I'll address your points:

  1. I get this, I just don't think that scikit-image should be that one place. np.testing is a much more logical home for it, especially since _shared is private, which means we can change things whenever we feel like it, which means that no outside project can use our wrappers.
  2. In contrast, np.testing is a public API that is much less likely to be deprecated.
  3. I would argue that postponing this PR (and countless others, because everyone in the SciPy ecosystem is trained to use np.testing and pytest.raises) is a maintenance cost.

Re (3), note that we don't have any reference to skimage._shared.testing in the contributing guide.

@JDWarner (Contributor, Author) replied on Feb 25, 2019:

Agree with @jni.

I think I was deeply mired in surgical intern year when the conversion to pytest happened. These shared wrappers were unknown to me. Looking around the package, our usage remains heterogeneous. I would have been 👎 at converting anything to using them, unless they bring new important functionality, and regret not being part of that discussion. In fact, I would argue that any of them which are simple pass-through wrappers should not exist at all.

I strongly dislike forcing use of a hidden test API when NumPy's is perfectly suitable. Where certain features may not exist at base in NumPy, we should publicly expose these (even if they are deeply nested, like skimage.util.testing or similar). If we need something NumPy doesn't provide, surely someone else would want to use this as well. The potential portability advantages are not significant enough to justify a package-specific reimplementation. This is the same reason we generally don't reimplement scipy.ndimage.

Imagine a new user making a first contribution. They generate a perfectly compliant test suite using guides they find online, via NumPy's suite. We tell them - despite technical correctness - that we won't accept it? Yikes. All this really does is provide additional barriers to contribution.

Barriers to contribution are high enough already, IMHO, and we should be looking to reduce them.

A member replied:

Background here: we switched before NumPy, so we had to take care of situations where they used nose-specific functionality. The idea behind the wrappers was that there would be a single place to modify things, instead of using pytest-specific decorators, etc.

But, yes, I don't see any reason why we need to maintain our own testing utilities, if the functionality overlaps almost identically with numpy.

@stefanv (Member) commented on Feb 25, 2019 via email

@jni (Member) commented on Feb 25, 2019

I don't think we ever do this. We just go and make those stylistic fixes on the PR, as described in the core developer guide.

Well, compliance with the above is haphazard, at best. We definitely need to do better here.

The question to be discussed is whether we want to have something pytest-specific in our test suite; and I'd argue you don't really want to see the underlying technology show through; we've already made one testing framework change.

In this case, again, we have tons of places where we use pytest fixtures, pytest xfails, and pytest skipifs (try git grep "import pytest"). This is the correct approach imho. Test framework migrations are painful but rare, and I disagree with @soupault that the overhead is minimal. For one, the wrappers are not documented in-place, so we still have to go check the pytest or numpy documentation. It's a leaky abstraction.

I think not allowing functions from np.testing to be used is overkill

Additionally, swapping np.testing.assert_* with an alternative throughout our codebase is quite easy these days with editors like PyCharm.

@jni (Member) commented on Feb 25, 2019

@JDWarner I was just about to prod you to fix the merge conflicts. =D Thanks!

np.testing.assert_equal(output, expected)


def test_basic_nd():
@JDWarner (Contributor, Author) commented:

Programmatic tests for 3d through 5d, just for @lagru ;)
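A hypothetical sketch of what such programmatic n-dimensional tests look like (not the PR's actual test code; shapes and the fill-stopping criterion are illustrative):

```python
import numpy as np

# Build, for each dimensionality, a hypercube of zeros with a 2-wide
# corner block of ones; any fill seeded in the zeros should stop at the
# block boundary, so one parametrised loop covers 3-D through 5-D.
def make_case(ndim):
    image = np.zeros((5,) * ndim, dtype=np.uint8)
    image[(slice(0, 2),) * ndim] = 1   # corner block of 2**ndim ones
    return image

for ndim in range(3, 6):
    image = make_case(ndim)
    assert image.sum() == 2 ** ndim
```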

@stefanv (Member) commented on Feb 25, 2019 via email

@jni (Member) commented on Feb 26, 2019

This now has three approvals and three comments, and the single Travis failure is related to #3767, so IMMA GONNA MERGE! 😀 Thanks @JDWarner, a headline feature for 0.15!

@jni jni merged commit ba67e1a into scikit-image:master Feb 26, 2019
@stefanv (Member) commented on Feb 27, 2019

Thank you, @JDWarner, great piece of work; this is a long-awaited addition!

@soupault (Member) commented:

Thanks @JDWarner for the very important addition! As usual, well executed.
