FEAT: Flood fill in n-dimensions #3245
Conversation
Hello @JDWarner! Thanks for updating the PR.
Comment last updated on February 25, 2019 at 03:10 UTC.
Test failures appear entirely unrelated.
Codecov Report

```diff
@@            Coverage Diff            @@
##           master    #3245     +/-  ##
=========================================
- Coverage      88%   82.01%   -5.99%
=========================================
  Files         328      342      +14
  Lines       27759    27431     -328
=========================================
- Hits        24428    22498    -1930
- Misses       3331     4933    +1602
```

Continue to review the full report at Codecov.
@lagru You were right: removing the additional logic flagging checked points actually was beneficial, with a small but significant (10-15%) reduction in runtime from chucking the whole thing. Out it goes!
Edit: we've really got a problem with some combination of AppVeyor, SciPy, and BLAS.
@lagru It also happens you were right about handling borders; there was a serious bug where the fill wrapped around. I returned to a similar approach as in extrema, and regression tests are added.
Edit: despite the padding it's actually now faster than before, because the boundary checks are no longer required.
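A rough illustration of the padding idea described above; this helper is hypothetical (the PR's actual implementation is in Cython) and just shows why a one-pixel border makes ravelled neighbor offsets safe without per-pixel bounds checks:

```python
import numpy as np

def _pad_and_offsets(image):
    # Hypothetical helper: with a one-pixel border, a ravelled index plus
    # any neighbor offset always lands inside the buffer, and the border
    # value can be chosen so it never satisfies the fill criteria; the
    # fill therefore cannot wrap around an edge.
    padded = np.pad(image, 1, mode='constant')
    # Face-neighbor steps expressed as offsets into padded.ravel().
    steps = np.array(padded.strides) // padded.itemsize
    return padded, np.concatenate([steps, -steps])
```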
Making the Cython callable directly from Python is the only suggestion I've yet to implement, @lagru; is there any concern with changing the Cython functions from pure C to Python-callable? The other point I'm unsure about when changing the architecture is clearing the queue and freeing memory properly.
For your use case it's pretty straightforward. You only need to follow the schema as detailed in the header of `_queue_with_history.pxi`:

```cython
# QueueItem must be defined at this point
include "_queue_with_history.pxi"

# Allocate memory for the queue's state-storing structure
cdef QueueWithHistory queue

# Initialize state of queue and allocate buffer on heap
# (initial size 64 * sizeof(QueueItem))
queue_init(&queue, 64)

try:
    ...  # Calls to queue_push, queue_pop, queue_restore, queue_clear
finally:
    # Ensure buffer on heap is deallocated
    queue_exit(&queue)
    # Don't use queue after this point
```

You can safely move the cdef-statement and the try-finally-block into the two functions.
I think it depends on whether you want to anticipate the possibility that the ...
@lagru thanks for the pointers (pun intended). Implemented, with ...
This looks really good already! I've left a few comments concerning the API, documentation and Cython usage but nothing major except for one thing.
Thanks for the review, @lagru! Everything is now implemented except the API suggestion (returning the input image when inplace is True). I am not opposed to returning the inplace-modified image if another couple sets of eyes from @skimage/core agree. If anyone has ideas about alternatives for the lazy imports, I'm all ears; I tried a couple of things but have no working alternatives yet.
I don't want to push the scope of this PR, but an idea to keep around for the future might be to try modifying this into a scanline fill. Shouldn't be too hard to do in a successive PR and might yield an even faster function.
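For reference, a rough pure-Python sketch of a scanline fill (illustrative only; not code from this PR or a commitment to its design), which fills whole horizontal runs at a time instead of pushing single pixels:

```python
from collections import deque

def scanline_fill(grid, seed, target, new):
    # Toy 2D scanline flood fill on a list-of-lists grid: recolor the
    # `target`-valued region connected to `seed` with `new`.
    rows, cols = len(grid), len(grid[0])
    if target == new or grid[seed[0]][seed[1]] != target:
        return
    stack = deque([seed])
    while stack:
        r, c = stack.pop()
        if grid[r][c] != target:
            continue  # already filled via an earlier run
        # Expand to the full horizontal run containing (r, c).
        lo, hi = c, c
        while lo > 0 and grid[r][lo - 1] == target:
            lo -= 1
        while hi < cols - 1 and grid[r][hi + 1] == target:
            hi += 1
        for cc in range(lo, hi + 1):
            grid[r][cc] = new
        # Seed candidate pixels in the rows above and below the run.
        for rr in (r - 1, r + 1):
            if 0 <= rr < rows:
                stack.extend((rr, cc) for cc in range(lo, hi + 1)
                             if grid[rr][cc] == target)
```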
The latest patch is small but pretty neat; Fortran- and C-contiguous arrays are both now supported.
@stefanv this is ready for another pass from you!
I'd give the implementation of the two functions my approval but I think the test suite would profit from a few more tests. I don't want to hold this up further so I'd be willing to add those tests later if you don't have the time.
Is there a way to look at the rendered example (plot_floodfill.py) by means of CI artifacts? Likewise for the coverage (#3245 (comment) doesn't seem to be up to date)?
Also, it's a minor nitpick but I guess the bot is used for a reason. Could you have another quick look at #3245 (comment)?
```python
def test_neighbors():
    # This test will only pass if the neighbors are exactly correct
```
Is this statement true? My impression is that this only checks if neighbors are connected in the last dimension?
On connectivity and the structuring element: I guess a large part of that functionality is covered by the tests for watershed and local_maxima, but ideally this module would have at least some basic tests for the parameters connectivity and selem. You could just copy from the tests of local_maxima.
It is, actually: it fills a column and then fills a row, so both directions are checked. Given the way the ravelling works, if it is going to break it will break in 2D.
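To make the ravelling argument concrete, a small standalone illustration (not code from the PR): in a ravelled C-contiguous 2D array, same-row neighbors sit at offsets of plus or minus one, and same-column neighbors at plus or minus the row length, so a 2D test exercises both kinds of offset.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
idx = np.ravel_multi_index((1, 2), a.shape)   # flat index of a[1, 2]
# Left/right neighbors differ by 1; up/down neighbors by the row length.
neighbors = [idx - 1, idx + 1, idx - a.shape[1], idx + a.shape[1]]
assert [a.flat[i] for i in neighbors] == [a[1, 1], a[1, 3], a[0, 2], a[2, 2]]
```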
I'll admit I subscribe to a KISS style when it comes to testing in general; connectivity is checked in the docstring example and not replicated here, for example. I'll add a basic test for selem, but won't test selem exhaustively here, as that is entirely exported to the mechanisms in .extrema.py, which are tested separately. This is partially motivated by the length of our test suite, which has been an issue with CI builds in the past.
I guess I just find that comment overly broad, with potential for confusion; e.g. the test doesn't check for diagonal connections. I'd actually prefer something explicit like the examples from the docstring. Speaking of which, testing docstrings is a nice feature to ensure that they are correct, but I don't agree with the sentiment that they may replace tests. If someone changes a docstring in the future they won't consider test coverage while doing so; I'd prefer the test coverage of a function to be independent of its docstring.
But same as above, feel free to resolve this. This is not worth blocking the PR for.
The coverage bot is just broken, probably related to some of the rebasing along the way. This PR doesn't touch the viewer, but the bot claims huge losses in coverage there.
I've made the relevant PEP 8 tweaks. I invoke PEP 8 itself against the remaining supposed failures (E226): per PEP 8, one is supposed to omit whitespace around higher-priority operators in more complex expressions for readability, and no PEP 8 checking algorithm is smart enough to get this right. I blacklist E226 in my personal editors for this reason.
I do not know of a way to capture the example outputs in CI. Matplotlib has mechanisms to test that sort of thing, with PNG comparisons and the like, but we do not use them for the examples.
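For context, these are the spacing patterns in question; the snippet below adapts examples from PEP 8 itself, which pycodestyle's E226 check nonetheless flags:

```python
# PEP 8 recommends tighter spacing around higher-priority operators for
# readability, yet E226 reports "missing whitespace around arithmetic
# operator" for exactly these lines.
x, y, a, b = 3.0, 4.0, 2.0, 1.0
hypot2 = x*x + y*y    # E226 would flag x*x and y*y
c = (a+b) * (a-b)     # likewise a+b and a-b
```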
```python
# Shortcut if neighbor is already part of fill
if flags[neighbor] == UNKNOWN:
    if image[neighbor] == seed_value:
```
Can we combine these two conditions into one `if`? My head hurts because of the deep nesting.
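A minimal sketch of the suggested flattening, reusing the names from the quoted snippet (the surrounding loop body is omitted):

```python
# `and` short-circuits, so image[neighbor] is only examined when the
# flag test passes; same behavior as the nested form, one level less.
if flags[neighbor] == UNKNOWN and image[neighbor] == seed_value:
    ...  # push neighbor onto the queue and mark it as part of the fill
```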
```python
# Only do comparisons on points not (yet) part of fill
if flags[neighbor] == UNKNOWN:
    if low_tol <= image[neighbor] <= high_tol:
```
Is this syntax correct in Cython? Or is it doing `((low_tol <= image) <= high_tol)`?
@hmaarrfk it's correct. Cython tries very hard to be a superset of Python.
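For readers unfamiliar with the construct, a quick standalone demonstration of the chained-comparison semantics (plain Python; Cython matches it):

```python
low_tol, value, high_tol = 10, 5, 20

# Chained form: equivalent to (low_tol <= value) and (value <= high_tol).
assert (low_tol <= value <= high_tol) is False   # 5 is below the range

# The nested reading gives a different, silently wrong answer:
# (low_tol <= value) is False, and False <= 20 compares 0 <= 20.
assert ((low_tol <= value) <= high_tol) is True
```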
OK. I'll just leave another comment that I really don't like this syntax. I know that working with lowest common denominators is "bad", but seeing as this causes many bugs in other languages, I think it is a good syntax "trick" to avoid.
@hmaarrfk I disagree, I think all other languages are broken in this regard and major advantages of Python should be celebrated. =)
Having said that, I don't strongly disagree because, tragically, NumPy arrays don't support this syntax (I guess because Python, as far as I know, doesn't provide a way to override this behaviour for classes), and that has bitten me a couple of times. It hasn't bitten me hard enough that I introduced a bug, but hard enough that I had to rewrite something because I forgot about this not working.
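To illustrate the NumPy caveat with a standalone example (not code from this PR):

```python
import numpy as np

a = np.arange(5)

# The chained form needs the truth value of the intermediate (0 <= a)
# for its implicit `and`, which is ambiguous for arrays, so NumPy raises
# ValueError instead of guessing.
try:
    mask = 0 <= a <= 3
except ValueError:
    mask = (0 <= a) & (a <= 3)   # the explicit elementwise conjunction

assert mask.tolist() == [True, True, True, True, False]
```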
Truthfully, my issue is with CPython. The fact that

```python
for i in range(10):
    pass
```

can't be optimized is a travesty.
I don't think this is the right forum for a chat about the Python language's weaknesses? =)
Anyway, "can't" is such a strong word... CPython is FOSS, so perhaps this can be your contribution to the language? ;)
@JDWarner Not sure if you misunderstood me. My question was more along the lines of whether it's possible to view the examples as they would appear on the website, without having to clone the PR and build the documentation locally.
@JDWarner Just to be clear, I think this is great work and I'm fine with this as is.
I'd like to formally move to merge this and leave any future improvements to future PRs. The tests cover 100% of the new code; they may not be perfect, but full coverage is good enough for me, especially when the PR has been in the works for more than six months. Things like minor syntax fixes, comment cleanup, and better tests can all occur in future PRs. @stefanv the GitHub UI still records you as a dissenting voice, though that may have changed with @JDWarner's latest commits. =)
Looks great, thank you very much for the work! @JDWarner I've made several minor remarks. Let me know if you would like to address them, otherwise, I can take care of them.
skimage/morphology/_flood_fill.py (outdated):

```
Returns
-------
mask : ndarray
    A Boolean array with the same shape as `image` is returned, with values
    equal to 1 for areas connected to and equal (or within tolerance of)
```
1 -> True, zero -> False.
```diff
@@ -350,7 +350,7 @@ def local_maxima(image, selem=None, connectivity=None, indices=False,
         considered as part of the neighborhood.
     connectivity : int, optional
         A number used to determine the neighborhood of each evaluated pixel.
-        Adjacent pixels whose squared distance from the center is larger or
+        Adjacent pixels whose squared distance from the center is less than or
```
Should also be fixed in local_minima.
```python
output = flood_fill(image, (7, 7, 3), 0, tolerance=379)

np.testing.assert_equal(output, expected)
```
np.testing should not be used directly. We have all the necessary wrappers in https://github.com/scikit-image/scikit-image/blob/master/skimage/_shared/testing.py.
I thought that we'd reversed this policy? I remember the comment that having our own testing functions is one more thing to remember "just for skimage", and it resonated very strongly with me. We should use standard things except where it is an undue burden. I realise that the nose -> pytest transition was traumatic, but imho that is not sufficient justification to do our own thing, as we don't expect those transitions to be frequent.
(Apologies if the above comment comes like 18 months too late or so. =P)
Why would anyone want to revert that? 😲 Frankly speaking, I'm hearing about this for the first time.
To me, the thin wrappers in skimage._shared.testing aren't just for a potentially convenient switch of the testing library. Besides that:

1. It is a single place where all the utilised testing functions and classes are presented. Even after our transition to pytest we kept using a lot of functionality from numpy.testing. This is because neither library includes all the necessary functions and one is not a superset of the other. On top of this, numpy.testing was still using both nose and pytest under the hood, and for a skimage contributor it was a pain to understand which functions he or she could import from numpy. With these thin wrappers, a contributor can just import skimage._shared.testing and enjoy the coding.
2. It is a single point of deprecation. When pytest or numpy.testing have some functionality deprecated, it is much easier for us to rewrite a couple of lines in skimage._shared.testing than to change a few hundred of them all over the library.
3. The maintenance cost is almost zero.

If you have a strong argument why we should drop that, I'm happy to discuss, but otherwise I'd strongly prefer to keep things as is.
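For readers following along, a rough sketch of the thin-wrapper pattern being debated; the specific re-exports below are illustrative assumptions, not the actual contents of skimage._shared.testing:

```python
# Hypothetical, simplified skimage/_shared/testing.py: one import point
# that hides whether a helper comes from numpy.testing or from pytest.
import pytest
from numpy.testing import (   # noqa: F401
    assert_equal,
    assert_almost_equal,
    assert_array_equal,
    assert_allclose,
)

# Re-export pytest helpers under stable local names.
raises = pytest.raises
warns = pytest.warns
fixture = pytest.fixture
parametrize = pytest.mark.parametrize
```

A test module would then import everything from this single place, e.g. `from skimage._shared.testing import assert_equal, raises`.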
@soupault I think I just got confused by the discussion here, specifically Mark's comment. But my overall feeling remains, so I'll address your points:

1. I get this, I just don't think that scikit-image should be that one place. np.testing is a much more logical choice, especially since _shared is private, which means we can change things whenever we feel like it, which means that no outside project can use our wrappers.
2. In contrast, np.testing is a public API that is much less likely to be deprecated.
3. I would argue that postponing this PR (and countless others, because everyone in the SciPy ecosystem is trained to use np.testing and with pytest.raises) is a maintenance cost.

Re (3), note that we don't have any reference to skimage._shared.testing in the contributing guide.
Agree with @jni.
I think I was deeply mired in surgical intern year when the conversion to pytest happened, so these shared wrappers were unknown to me. Looking around the package, our usage remains heterogeneous. I would have been 👎 on converting anything to use them unless they brought important new functionality, and I regret not being part of that discussion. In fact, I would argue that any of them which are simple pass-through wrappers should not exist at all.
I strongly dislike forcing use of a hidden test API when NumPy's is perfectly suitable. Where certain features do not exist in NumPy at base, we should expose ours publicly (even if they are deeply nested, like skimage.util.testing or similar); if we need something NumPy doesn't provide, surely someone else would want to use it as well. The potential portability advantages are not significant enough to justify a package-specific reimplementation. This is the same reason we generally don't reimplement scipy.ndimage.
Imagine a new user making a first contribution. They generate a perfectly compliant test suite, via NumPy's suite, using guides they find online. We tell them, despite technical correctness, that we won't accept it? Yikes. All this really does is raise the barriers to contribution, and barriers to contribution are high enough already, IMHO; we should be looking to reduce them.
Background here: we switched before NumPy, so we had to take care of situations where they used nose-specific functionality. The idea behind the wrappers was that there would be a single place to modify things, instead of using pytest-specific decorators, etc.
But, yes, I don't see any reason why we need to maintain our own testing utilities, if the functionality overlaps almost identically with numpy.
On Sun, 24 Feb 2019 17:58:12 -0800, Josh Warner wrote:

> Imagine a new user making a first contribution. They generate a perfectly compliant test suite using guides they find online, via NumPy's suite. We tell them - despite technical correctness - that we won't accept it? Yikes. All this really does is provide additional barriers to contribution.

I don't think we ever do this. We just go and make those stylistic fixes on the PR, as described in the core developer guide.

The question to be discussed is whether we want to have something pytest-specific in our test suite; and I'd argue you don't really want to see the underlying technology show through; we've already made one testing framework change.

So, having looked at our testing module a bit more carefully, I'll soften my stance from the previous comment, and say that I think it makes sense what we have done there. At the same time, I think *not* allowing functions from `np.testing` to be used is overkill: the contract of these functions is simple enough: check a condition and raise if not. No matter what testing framework we choose to use, they will have to support that.
@JDWarner I was just about to prod you to fix the merge conflicts. =D Thanks!
```python
np.testing.assert_equal(output, expected)


def test_basic_nd():
```
Programmatic tests for 3d through 5d, just for @lagru ;)
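A sketch of what such a programmatic n-dimensional test could look like; flood_fill and its (image, seed_point, new_value) signature come from this PR, while the test body and expected values are illustrative assumptions:

```python
import numpy as np
import pytest

from skimage.morphology import flood_fill  # added by this PR


@pytest.mark.parametrize("ndim", [3, 4, 5])
def test_basic_nd(ndim):
    # One interior voxel differs; everything else is connected background
    # that the fill should recolor.
    image = np.zeros((5,) * ndim, dtype=np.uint8)
    image[(2,) * ndim] = 1

    filled = flood_fill(image, (0,) * ndim, 7)

    assert filled[(2,) * ndim] == 1              # untouched by the fill
    assert (filled == 7).sum() == image.size - 1
```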
On Sun, 24 Feb 2019 19:12:08 -0800, Juan Nunez-Iglesias wrote:

> I don't think we ever do this. We just go and make those stylistic fixes on the PR, as described in the core developer guide.

Well, compliance with the above is haphazard, at best. We definitely need to do better here. Yes, we should keep reminding one another of what we agreed to as a team, and we should revise those guidelines when we realize a pattern described there is not a good fit.

In this case, again, we have tons of places where we use pytest fixtures, pytest xfails, and pytest skipifs (try `git grep "import pytest"`). This is the correct approach imho. Test framework migrations are painful *but rare*, and I disagree with @soupault that the overhead is minimal. For one, the wrappers are not documented in-place, so we still have to go check the pytest or numpy documentation. It's a leaky abstraction.

I think many contributors look at the existing test suite to guide them. So, as long as we make the entire codebase consistent, it should be fine. I don't feel particularly strongly about the issue. I think you can see why the decision was made to abstract these into a module (during the testing migration period, when everything was in flux all over the place), and you can see why people would prefer not to have it any longer.

Let's see how the change looks in a PR, then we can discuss it further with something tangible in front of us.
Thank you, @JDWarner, great piece of work; this is a long-awaited addition!
Thanks @JDWarner for the very important addition! As usual, well executed.
Flood fill has arrived!
Builds upon the excellent work in #3022 by @lagru.
I'm very excited to have this functionality in the package! It took a bit of extra time, but I believe this is now fully general, and adding the tolerance option makes it more broadly useful. Every option I can think of is included. It's quite fast and should be suitable for interactive use in GUI widgets or similar.
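A quick usage sketch based on this description (the array values are illustrative; the flood_fill signature and tolerance keyword are the ones this PR introduces):

```python
import numpy as np
from skimage.morphology import flood_fill

image = np.array([[0, 0, 1],
                  [0, 2, 2],
                  [1, 2, 5]], dtype=np.uint8)

# Recolor everything connected to the seed whose value lies within
# +/- tolerance of the seed value (2 here, so the accepted range is [1, 3]).
out = flood_fill(image, (1, 1), 9, tolerance=1)
# Expected under the default full connectivity:
# [[0, 0, 9],
#  [0, 9, 9],
#  [9, 9, 5]]
```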
Closes #2876.
Hopefully this can be backported to 0.14 later.
- `__init__.py`
- `doc/release/release_dev.rst`