8000 Restandardize handling of Multi-Image files by maaleske · Pull Request #2815 · scikit-image/scikit-image · GitHub
[go: up one dir, main page]

Skip to content

Restandardize handling of Multi-Image files #2815

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 41 commits into from
Dec 12, 2020

Conversation

maaleske
Copy link
Contributor
@maaleske maaleske commented Sep 28, 2017

Description

Checklist

[It's fine to submit PRs which are a work in progress! But before they are merged, all PRs should provide:]

References

Closes #2811

For reviewers

(Don't remove the checklist below.)

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.

8000
@stefanv
Copy link
Member
stefanv commented Oct 3, 2017

The tests are legitimately failing on Travis-CI for this PR.

@maaleske
Copy link
Contributor Author

I think I resolved the test failures by explicitly passing load_func=None in MultiImage to trigger the old behaviour. I could not find a reason for the special imread definition there, and removing it doesn't seem to affect tests. This feels slightly odd and also makes MultiImage quite redundant (all it does is set the _filename attribute).

The whole thing feel slightly fragile, since even specifying just load_func=imread automatically removes the flattening of multi-frame files. It seems no tests catch this with imread_collection, which does exactly that. I don't what the better way would be. Should there perhaps be an explicit parameter for enabling the flattening behaviour?

@stefanv
Copy link
Member
stefanv commented Oct 10, 2017

My intuition is that we should never flatten. Each file should map to one image (array).

We can build a special interface to load multi-images and unpack them, but as you suggest, that should be explicit.

@maaleske
Copy link
Contributor Author

I separated all the multi-image functionality from ImageCollection to a inherited class FrameCollection and implemented MultiImage on top of that. Now ImageCollection is completely frame-agnostic, and just sees each frame as its own image, with _files corresponding to the _frame_index. MultiImage is now just a simple passthrough to Framecollection, and I think it could probably be deprecated as it adds no real functionality (as was apparently discussed before: #1200). Any comments on the class organization?

There's still some niggles to fix before it's ready for merging:

  • The current _files should probably be renamed to _ids for clarity, since it doesn't consistently refer to files (possibly adding a property which always does?)
  • _frame_index needs to be updated when slicing FrameCollections (was in ImageCollection)
  • FrameCollection does not yet accept any special load_func

@maaleske maaleske changed the title [WIP] Fix inconsistency of _numframes in ImageCollection [WIP] Restandardize handling of Multi-Image files Oct 13, 2017
@maaleske
Copy link
Contributor Author

I added the load_func parameter for FrameCollection, and resolved the slicing behaviour of _frame_index by removing it altogether (since it was redundant with _files in the returned collection). The original list of searched files can now be accessed as _searched_files, which should be helpful for anyone trying to debug further issues with multi-images.

Existing functionality should now work as it was, just with FrameCollection instead of ImageCollection.

@stefanv
Copy link
Member
stefanv commented Oct 16, 2017

Doesn't FrameCollection and MultiImage now do the same when called on a multi-frame image?

@maaleske
Copy link
Contributor Author

@stefanv Yes, FrameCollection is just the existing MultiImage, that now also handles multiple images. MultiImage probably should be deprecated in favor of FrameCollection, with warnings before removing it completely?

@stefanv
Copy link
Member
stefanv commented Oct 16, 2017

Why not just replace MultiImage with the FrameCollection implementation? Do they have differing APIs?

@maaleske
Copy link
Contributor Author
maaleske commented Oct 16, 2017

The API is different in that FrameCollection handles multiple multi-images. I'm not sure what you mean by replacing, do you want to keep the MultiImage class name (even though it now handles multiple files as well)?

@stefanv
Copy link
Member
stefanv commented Oct 16, 2017

I think, since we only added functionality, and since the old calls will still work, we can probably keep the MultiImage name, yes.

@maaleske
Copy link
Contributor Author

I figured it would be more clear to make a new class that explicitly handles any sets of frames, but I guess keeping MultiImage as the name is not too confusing.

@stefanv
Copy link
Member
stefanv commented Oct 16, 2017

Another option is to add an unpack_frames keyword to ImageCollection, and deprecate MultiImage.

@maaleske
Copy link
Contributor Author

Adding the parameter would make for a lot of branching in ImageCollection. Having the FrameCollection/MultiImage just inherit ImageCollection makes the latter much cleaner.

@maaleske
Copy link
Contributor Author

There is the small API change of MultiImage having the filename property which doesn't make sense in a multiple file setting, and would be best removed (which I guess was why I thought about making the separate class in the first place).

@stefanv
Copy link
Member
stefanv commented Oct 16, 2017

OK, then let's keep MultiImage, and just set the filename parameter to the file pattern, for backward compatibility. We can make it a property, and warn that that property is being deprecated.

@stefanv
Copy link
Member
stefanv commented Oct 16, 2017

Or, just keep it, and have it contain the file pattern :) Since this is new functionality, no-one should notice.

Copy link
Member
@stefanv stefanv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks!

I left some minor nitpicks, but overall good to go.

Is your name in the CONTRIBUTORS file, by the way? Please add it.

load_pattern : str or list
Pattern glob or filenames to load. The path can be absolute or
relative. Multiple patterns should be separated by os.pathsep,
e.g. '/tmp/work/*.png:/tmp/other/*.jpg'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backticks for code, ``/tmp/work/*.png:/tmp/other/*.jpg``

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for ``os.pathsep`` above

The current implementation makes use of ``tifffile`` for Tiff files and
PIL otherwise.
By default, MultiImage builds a list of frames in each image
and calls ``imread(filename, img_num=n)`` to read the n:th frame in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some indication that it is being called for each n

@maaleske
Copy link
Contributor Author

Done. To summarize the current version as compared to master:

  • ImageCollection now always deals with single items (files, indices, etc.) set by load_pattern (and possibly load_func) and never flattens images.
  • MultiImage implementation is now inherited from ImageCollection and just passes it the _frame_index (as built previously by _find_images) and a wraps imread or load_func to accept the pairs as input.
  • As a resulting new feature, MultiImage can now handle multiple files with a special load_func (which must then accept the img_num parameter)

@stefanv
Copy link
Member
stefanv commented Oct 16, 2017

This looks great; thanks @maaleske!

@stefanv
Copy link
Member
stefanv commented May 1, 2018

@scikit-image/core This PR went through review and has been approved; if another person could take a look, rebase and merge, please.

@soupault soupault added this to the 0.14 milestone May 6, 2018
@soupault soupault changed the title [WIP] Restandardize handling of Multi-Image files Restandardize handling of Multi-Image files May 6, 2018
@mkcor
Copy link
Member
mkcor commented Nov 18, 2020

I have deleted my comment starting with "Sorry for the gross series of 'Remove extra lines' commits..." -- I thought I wouldn't be able to push from my local branch and, hence, had to work with the online editor (because I couldn't when I wasn't core, but anyway).

@mkcor
Copy link
Member
mkcor commented Nov 18, 2020

I have tried updating this PR in line with #3928 but I'm afraid I have lost the original spirit of this contribution along the way... 😞 There are now 3 failing tests in test_collection.py, which were added by the original author, i.e.:

  • FAILED io/tests/test_collection.py::TestImageCollection::test_custom_load - T...
    as expected since load_pattern = [(1, 'one'), (2, 'two')] is not a string nor a list of strings;

  • FAILED io/tests/test_collection.py::TestImageCollection::test_custom_pattern
    indeed, glob('PREFIX/.cache/scikit-image/master/data/b??ck.png') results in ['PREFIX/.cache/scikit-image/master/data/brick.png'] and not in ['PREFIX/.cache/scikit-image/master/data/block.png', 'PREFIX/.cache/scikit-image/master/data/brick.png'] as 'expected;'

  • FAILED io/tests/test_collection.py::TestImageCollection::test_custom_pattern_not_pil
    where pattern PREFIX/.cache/scikit-image/master/data/chessboard_*_U8.npz' doesn't get matched to the expected ['chessboard_GRAY_U8.npz', 'chessboard_RGB_U8.npz']

Note that function _find_files doesn't get used at this point.

@jni could you please have a look? Thanks 😬

@mkcor
Copy link
Member
mkcor commented Nov 19, 2020

Forgot to loop in @alexdesiqueira, especially when referring to #3928 🙏

@mkcor
Copy link
Member
mkcor commented Nov 19, 2020

I realize this PR is three years old, so it might be unrealistic to include it at this point... I thought I would give it a try, since it was tagged with action: mrg+1 as well as 0.18.

Note that function _find_files doesn't get used at this point.

Removed in fad090c.

* FAILED io/tests/test_collection.py::TestImageCollection::test_custom_load - T...
  as expected since `load_pattern = [(1, 'one'), (2, 'two')]` is not a string nor a list of strings;

Removed in 335be36.

@maaleske
Copy link
Contributor Author

Since the pattern parameter was changed, the tests I added are probably outdated since they were meant to check for differences w.r.t. the original behaviour. It's probably best to disregard them and just make sure that you test whatever is needed to check for the issues in #5069 and #2811 (if that one still happens).

@mkcor
Copy link
Member
mkcor commented Nov 19, 2020

@maaleske great to read you!! Sure. Specifically, we want test_custom_load to fail, because the current approach is to support only str and list of str patterns. That's why I removed it. But, in my understanding so far, I think we would want test_custom_pattern to pass... Good point anyway, about focusing on the issues which are currently open. Thank you!

@alexdesiqueira
Copy link
Member
alexdesiqueira commented Nov 20, 2020

@maaleske great to see you around, brother! Hope we can get a beer one of these days.
@mkcor I'm sorry, I didn't see we're working on this at #4859 — I told* @jni I would check but it slipped away. Could we check that one instead?

@mkcor
Copy link
Member
mkcor commented Nov 20, 2020

Oh, I wasn't aware of #4859... 🤦‍♀️ It's okay: I've spent enough time in this code by now that I would be fast at reworking either PR. Which one of the two PRs do we want to close, @jni? @alexdesiqueira I don't think we want to keep both in parallel, indeed.

My general comment remains the same: My understanding is that we want to remain in the spirit of #3928 and build from there. Notably, we want to accept loading patterns which are either strings or lists of strings (not lists of anything as in the original contribution). Besides, if we keep (as I suggest we do) the handling of multi-pattern brought by #3928, the detection of frames as implemented in the original contribution fails at splitting frames within a given (multi-image) file. Therefore, we can only keep a marginal part of this contribution on the functionality side. But we should keep most tests brought by this contribution (yay!) even if some need adapting.

@jni
Copy link
Member
jni commented Nov 21, 2020

@mkcor @alexdesiqueira I gotta say I don't understand this stuff anymore! 😅 To me it seems that either PR can go forward, #4859 was a simple rebase. Since @mkcor you have made some nice updates here, I think we can just close #4859 and keep going here!

@alexdesiqueira
Copy link
Member

@mkcor @jni I apologize, that was on me. Thank you for understanding Juan!

@mkcor
Copy link
Member
mkcor commented Nov 27, 2020

@jni I don't know if we can squeeze this fix into 0.18... Anyway, I'm happy to be done with this resurrected PR! 😌

@cosinus11 eeffe8a fixes #5069.

In the end, I respected the original vision outlined by @maaleske and @stefanv:

My intuition is that we should never flatten. Each file should map to one image (array).
We can build a special interface to load multi-images and unpack them, but as you suggest, that should be explicit.

Namely, an image collection of (multiple) multi-frame file(s) has a tree structure when specifying a loading function (load_func parameter): ic[0][0] is the first frame from the first file, ic[0][1] is the second frame from the first file, etc.

The changes brought by this PR don't overwrite those brought by #3928 -- still, it would be great if you could review, @alexdesiqueira! 😄

@mkcor
Copy link
Member
mkcor commented Nov 27, 2020

Additional notes:

@cosinus11
Copy link

@jni I don't know if we can squeeze this fix into 0.18... Anyway, I'm happy to be done with this resurrected PR! 😌

@cosinus11 eeffe8a fixes #5069.

In the end, I respected the original vision outlined by @maaleske and @stefanv:

My intuition is that we should never flatten. Each file should map to one image (array).
We can build a special interface to load multi-images and unpack them, but as you suggest, that should be explicit.

Namely, an image collection of (multiple) multi-frame file(s) has a tree structure when specifying a loading function (load_func parameter): ic[0][0] is the first frame from the first file, ic[0][1] is the second frame from the first file, etc.

The changes brought by this PR don't overwrite those brought by #3928 -- still, it would be great if you could review, @alexdesiqueira! 😄

Thank you very much for your attention and your reactivity, amazing!
For your information, I use ImageCollection for reading 3D images for tomography (one file for 2048x2048x2048 voxels), in connection with PIL I read this images directly in raw format provided by the reconstruction software.

@stefanv stefanv merged commit e0ad3f6 into scikit-image:master Dec 12, 2020
@stefanv
Copy link
Member
stefanv commented Dec 12, 2020

Thank you all for your feedback and work put into this PR; given that this has been reviewed by more eyes than most other PRs, I am putting it in. Thanks @maaleske !

@jni
Copy link
Member
jni commented Dec 12, 2020

@meeseeksdev backport to v0.18.x

@jni
Copy link
Member
jni commented Dec 12, 2020

Wow, thank you so much @maaleske for the original work, and @alexdesiqueira @mkcor for the resurrection, and @stefanv for the merge! Amazing to not have this looming over my head anymore! 😂

jni pushed a commit that referenced this pull request Dec 12, 2020
Co-authored-by: Matti Eskelinen <matti.a.eskelinen@gmail.com>
@alexdesiqueira
Copy link
Member
alexdesiqueira commented Dec 12, 2020

Thank you very much everyone! Sorry for not reviewing again* @mkcor, thank you for your work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
⏩ type: Enhancement Improve existing features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent behaviour of ImageCollection

0