8000 ImageCollection is able to receive lists of patterns by alexdesiqueira · Pull Request #3928 · scikit-image/scikit-image · GitHub
[go: up one dir, main page]

Skip to content

ImageCollection is able to receive lists of patterns #3928

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Sep 30, 2019

Conversation

alexdesiqueira
Copy link
Member

Description

Users can pass a list of patterns to ImageCollection, instead of a string.

Checklist

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.
  • Consider backporting the PR with @meeseeksdev backport to v0.14.x

@alexdesiqueira alexdesiqueira requested a review from stefanv May 24, 2019 19:45
@alexdesiqueira
Copy link
Member Author

OK, I'll consider that the implemented tests think on load_pattern as being images themselves, already.

@@ -104,6 +98,9 @@ class ImageCollection(object):

Notes
-----
Note that files are always stored in alphabetical order. Also note that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is false, it's alphanumeric order, im1.jpg will come before im10.jpg. My second contribution ever to scikit-image. =D

"""

def __init__(self, load_pattern, conserve_memory=True, load_func=None,
**load_func_kwargs):
"""Load and manage a collection of images."""
if isinstance(load_pattern, str):
try:
if not isinstance(load_pattern, str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check should be more stringent: it should not be a string, but it should be a sequence.

Also, try/except clauses should contain as little content as possible. I think this one encompasses too many statements.

Copy link
Member Author
@alexdesiqueira alexdesiqueira May 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to fix that with:

if isinstance(load_pattern, (list, tuple)):

IMHO this try/except is as good as it gets. It receives the three types of inputs we'd like to put there, and it's rather efficient when compared to an if isinstance(load_pattern, (str, list, tuple). That could also bring us issues in a more difficult type of input:

load_pattern = [(1, 'one'), (2, 'two')]
In []: isinstance(load_pattern, (list, tuple))                     
Out[]: True

I don't know... Is there something I'm missing here? How could we solve that other way? Thanks for your help Juan!

relative. Multiple patterns should be separated by os.pathsep,
e.g. '/tmp/work/*.png:/tmp/other/*.jpg'. Also see
implementation notes below.
relative. Also see implementation notes below.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should include all possible inputs here, including the pathsep. As far as I know we don't intend to deprecate that? We could keep the original message but include the statement about windows vs *nix, and also the option to pass in a list of patterns.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They still work; I tried to state that in the documentation. Is it good?

Copy link
Member
@jni jni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A great start! And thanks for tackling this! In addition to my minor comments, you should include a test for the new behaviour.

@alexdesiqueira alexdesiqueira requested a review from jni May 25, 2019 15:30

If multiple patterns are entered as a single string, they should be
separated by os.pathsep, e.g. '/tmp/work/*.png:/tmp/other/*.jpg' in *nix,
or '/tmp/work/*.png;/tmp/other/*.jpg' in Windows.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice!

@jni
Copy link
Member
jni commented May 26, 2019

@alexandrejaguar the problem with the TypeError syntax is that there's lots of statements there, any of which could potentially raise a TypeError that is not actually the error we're looking for.

There is also a design philosophy that I mostly subscribe to that "exceptions are for exceptional circumstances", and you shouldn't use them for "normal" and "expected" logic flow.

My suggestion: create a helper function, is_multipattern, that checks the following condition:

import typing

((isinstance(load_pattern, str) and os.pathsep in load_pattern) or
 (not isinstance(load_pattern, str) and
  isinstance(load_pattern, typing.Sequence) and
  all(isinstance(pattern, str) for pattern in load_pattern))
)

Then:

if is_multipattern(load_pattern):
    if isinstance(load_pattern, str):
        load_pattern = load_pattern.split(os.pathsep)
    ...
elif isinstance(load_pattern, str):
    ...  # single pattern logic
else:
    # Raise TypeError

This might also be a good opportunity for us to start using progressive typing:

from typing import Sequence, Union as U

def imread_collection(load_pattern : U[str, Sequence[str]], ...):

Then people's IDEs, at least, will tell them that [(1, 'one')] is invalid input for the function.

@alexdesiqueira
Copy link
Member Author

@jni sounds good, let's see what I got.

@stefanv
Copy link
Member
stefanv commented May 27, 2019 via email

@jni
Copy link
Member
jni commented May 27, 2019

It may be best to be honest: "For backward compatibility, we also accept a string of patterns delimited by os.pathsep."

I agree with this wording, thanks @stefanv! I would add only "... delimited by os.pathsep (which is ":" on Mac and Linux platforms, and ";" on Windows)."

I think it's important to be clear on this in case someone is trying to figure out what's wrong with the code that was written on one platform and is now broken on another.

@alexdesiqueira
Copy link
Member Author

It may be best to be honest: "For backward compatibility, we also accept a string of patterns delimited by os.pathsep."

I agree with this wording, thanks @stefanv! I would add only "... delimited by os.pathsep (which is ":" on Mac and Linux platforms, and ";" on Windows)."

I think it's important to be clear on this in case someone is trying to figure out what's wrong with the code that was written on one platform and is now broken on another.

I'm more into the wordy version as well, with a little modification: "... delimited by os.pathsep (";" on Windows, ":" on Mac and Linux)."

@jni
Copy link
Member
jni commented May 27, 2019

@alexandrejaguar nice job making it more concise! I'm all for it!

@alexdesiqueira
Copy link
Member Author

Then people's IDEs, at least, will tell them that [(1, 'one')] is invalid input for the function.

@jni I understand the concern with TypeError, and thanks for your example! Let's be clear then: we'll not accept entries as [(1, 'one'), (2, 'two')] in ImageCollection() anymore?
If not, we'll have to remove one of the tests as well.

@pep8speaks
Copy link
pep8speaks commented May 27, 2019

Hello @alexdesiqueira! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 91:42: W504 line break after binary operator

Comment last updated at 2019-09-17 00:54:07 UTC

@jni
Copy link
Member
jni commented May 27, 2019

Let's be clear then: we'll not accept entries as [(1, 'one'), (2, 'two')] in ImageCollection() anymore?

Wait, what? We accept those? What does it mean?

@alexdesiqueira
Copy link
Member Author
alexdesiqueira commented May 27, 2019 via email

@stefanv
Copy link
Member
stefanv commented May 27, 2019

If it is supported API with a test I would be very hesitant to remove it without further investigation. Was this perhaps to support "frame generators" that output frame number + image file combos? Or perhaps a way to provide explicit ordering? Is it used internally that way, without ever having been externally advertised? In that case, we could potentially just change it.

@alexdesiqueira
Copy link
Member Author

If it is supported API with a test I would be very hesitant to remove it without further investigation.

@stefanv @jni that's what I'm afraid of. This is the test:

================================== FAILURES ===================================
____________________ TestImageCollection.test_custom_load _____________________
self = <skimage.io.tests.test_collection.TestImageCollection testMethod=test_custom_load>
    def test_custom_load(self):
        load_pattern = [(1, 'one'), (2, 'two')]
    
        def load_fn(x):
            return x
    
>       ic = ImageCollection(load_pattern, load_func=load_fn)
python35-x64\lib\site-packages\skimage\io\tests\test_collection.py:83: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <skimage.io.collection.ImageCollection object at 0x000000B388F8FF60>
load_pattern = [(1, 'one'), (2, 'two')], conserve_memory = True
load_func = <function TestImageCollection.test_custom_load.<locals>.load_fn at 0x000000B3885900D0>
load_func_kwargs = {}
    def __init__(self, load_pattern, conserve_memory=True, load_func=None,
                 **load_func_kwargs):
        """Load and manage a collection of images."""
        self._files = []
        if _is_multipattern(load_pattern):
            if isinstance(load_pattern, str):
                load_pattern = load_pattern.split(os.pathsep)
            for pattern in load_pattern:
                self._files.extend(glob(pattern))
            self._files = sorted(self._files, key=alphanumeric_key)
            self._numframes = self._find_images()
        elif isinstance(load_pattern, str):
            self._files.extend(glob(load_pattern))
            self._files = sorted(self._files, key=alphanumeric_key)
            self._numframes = self._find_images()
        else:
>           raise TypeError('Invalid pattern as input.')
E           TypeError: Invalid pattern as input.
python35-x64\lib\site-packages\skimage\io\collection.py:184: TypeError

Is it used internally that way, without ever having been externally advertised?

I'll ckeck it. 🔍

@alexdesiqueira
Copy link
Member Author

Got it (skimage/io/tests/test_fits.py):

@testing.skipif(not pyfits_available, reason="pyfits not installed")
def test_imread_collection_single_MEF():
    io.use_plugin('fits')
    testfile = os.path.join(data_dir, 'multi.fits')
    ic1 = io.imread_collection(testfile)
    ic2 = io.ImageCollection(
        [(testfile, 1), (testfile, 2), (testfile, 3)],
        load_func=fplug.FITSFactory)
    assert _same_ImageCollection(ic1, ic2)


@testing.skipif(not pyfits_available, reason="pyfits not installed")
def test_imread_collection_MEF_and_simple():
    io.use_plugin('fits')
    testfile1 = os.path.join(data_dir, 'multi.fits')
    testfile2 = os.path.join(data_dir, 'simple.fits')
    ic1 = io.imread_collection([testfile1, testfile2])
    ic2 = io.ImageCollection([(testfile1, 1), (testfile1, 2),
                             (testfile1, 3), (testfile2, 0)],
                             load_func=fplug.FITSFactory)
    assert _same_ImageCollection(ic1, ic2)

OK, it seems that this was implemented only for FITS support.

@stefanv
Copy link
Member
stefanv commented Jul 13, 2019

Here is the relevant but extremely cryptic section of the docstring:

For files with multiple images, the images will be flattened into a list and added to the list of available images. In this case, load_func should accept the keyword argument img_num.

How does the image loader know that there are multiple frames inside of the image? Hrmmm... I feel like this functionality may need to be removed: too complex and confusing.

@@ -147,24 +160,28 @@ def imread_convert(f):
>>> coll[0].shape
(200, 200)

>>> ic = io.ImageCollection('/tmp/work/*.png:/tmp/other/*.jpg')
>>> ic = io.ImageCollection(['/tmp/work/*.png', '/tmp/other/*.jpg'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this should be the only supported way to setup multiple patterns in Scikit-Image 0.18.

To many options makes maintaining this kind of code super complicated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To do this, you would have to add a deprecation warning. In that case, I see this PR more as a "simplification" of the API, since lists of strings are much more intuitive for python users than a strange seperator. I would approve ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the comments, let's just soft-deprecate the old behaviour. This means removing the documentation for the old syntax, including the second paragraph above. @stefanv do you agree with this approach?

try:
ar = np.concatenate(all_images)
array_cat = np.concatenate(all_images)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is stack not used?

In [7]: np.stack([a, b])                                                                                 
Out[7]: 
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]])

In [8]: np.stack([a, b]).shape                                                                           
Out[8]: (2, 3, 3)

10000

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really was only refactoring. I didn't think about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stack did not exist when this code was written.

@hmaarrfk
Copy link
Member

I'm really the wrong person to ask about this PR. Personally, I don't think it should exist. imageio provides enough basic functionality. The "always sorted order" basically throws away a ton of metadata from the user. What if the user had images numbered zero through 10, but had 5 loaded images. They would never know which 5 were loaded using this function.

This function is "cute" for the examples you showed, but doesn't really extend well to any real world example I can think of where metadata is just as important for an image collection than the binary numbers themselves.

Because of the above, I don't think you will ever get a thumbs up from me. :( You shouldn't count me in as the 1/2 reviewers needed to merge this in.

That said, if you do want to push through with this, I strongly encourage you to deprecate the os.pathsep feature here, because:

  1. I feel it is really unpythonic, adding a special character to a string in order to later turn it into a list seems like a bashism.
  2. It puts undue complexity indefinitely in the scikit-image library.

For now, you still have a thumbs down since it seems to miss a test for the new functionality you added.

@alexdesiqueira
Copy link
Member Author

I'm really the wrong person to ask about this PR. Personally, I don't think it should exist.
(...)
Because of the above, I don't think you will ever get a thumbs up from me. :( You shouldn't count me in as the 1/2 reviewers needed to merge this in.

What happened with this @hmaarrfk from-the-past? 😄

IMO, this should be the only supported way to setup multiple patterns in Scikit-Image 0.18.
(...)
I would approve ;)

I'll take care of the other points.

@hmaarrfk
Copy link
Member

What happened with this @hmaarrfk from-the-past?

Maybe an other way to think about it is:

Loading and saving data imposes a certain permanent organization on experimental setups.

Since you don't provide a "save collection", I just don't see how this function helps anybody.

  1. Acquisition
  2. Saving and loading
  3. Analysis
  4. Visualization

Probably best to leave scikit image focused on the later two.

Whoever deals with acquisition can deal with saving and loading.

@jni
Copy link
Member
jni commented Aug 25, 2019

@hmaarrfk none of your points provide to me compelling evidence for deprecating working functionality. I'm perfectly ok with soft-deprecating it (removing any mention on it from the docs), but I think the functionality should remain, rather than cause unnecessary churn for downstream users.

@hmaarrfk
Copy link
Member

ok, dont't deprecate. A test definitely needs to be added.

@stefanv
Copy link
Member
stefanv commented Aug 26, 2019

@hmaarrfk It is true that we have a policy of not implementing one-liner's on the user's behalf. But ImageCollection is quite useful, unless you want to introduce a heavier dependency such as dask for managing delayed reading.

Use case: I have a directory with 2000 images in it (this happens surprisingly frequently). I want to load these up, take a look at some of them. Yes, I can go pick out filenames individually, but often it's handy to just load the whole bunch into a collection and to then grab items from that list-like object.

So, I'd be -1 on deprecating the ImageCollection, unless we provided explicit directions in the user documentation on how to do this with, e.g., dask.

@hmaarrfk
Copy link
Member

I'm not saying deprecating all of image collection. Scikit image is many voices and you should all push forward with this PR. I'm simply suggesting deprecating the behavior that uses split(os.pathsep).

@alexdesiqueira
Copy link
Member Author

@hmaarrfk:

Analysis
Visualization
Probably best to leave scikit image focused on the later two.

I'm humbly against you on that. In my mind we're supposed to be (or, at least, offer) a nice solution from the start (which is, of course, reading the image). I'm even against our policy that @stefanv brought up:

It is true that we have a policy of not implementing one-liner's on the user's behalf.

However, as you said, we're many voices.

@jni
Copy link
Member
jni commented Sep 6, 2019

I actually do agree with @hmaarrfk that we should "divest" skimage from IO things, and further, also visualisation things. But this is a very long term goal and should not affect us improving existing functionality and fixing bugs. I would just be very reluctant to add any IO functionality.

Copy link
Member
@jni jni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexdesiqueira if you add a test, and a note in the release notes, I'm happy to merge this as is. Soft-deprecating the old pattern is optional, as that can come later at any time.

@@ -104,6 +121,13 @@ class ImageCollection(object):

Notes
-----
Note that files are always stored in alphanumerical order. Also note
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Note that files are always stored in alphanumerical order. Also note
Note that files are always returned in alphanumerical order. Also note

@@ -147,24 +160,28 @@ def imread_convert(f):
>>> coll[0].shape
(200, 200)

>>> ic = io.ImageCollection('/tmp/work/*.png:/tmp/other/*.jpg')
>>> ic = io.ImageCollection(['/tmp/work/*.png', '/tmp/other/*.jpg'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in the comments, let's just soft-deprecate the old behaviour. This means removing the documentation for the old syntax, including the second paragraph above. @stefanv do you agree with this approach?

that their constituent arrays are equal.
"""
Ancillary function to compare two ImageCollection objects, checking that
their constituent arrays are equal.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the FITS plugin is now broken, this should definitely be noted in the release notes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I'll check

@jni
Copy link
Member
jni commented Sep 6, 2019

@alexdesiqueira LOL, timing! 👋

@alexdesiqueira
Copy link
Member Author

@jni I'm sprinting!

@alexdesiqueira
Copy link
Member Author

I actually do agree with @hmaarrfk that we should "divest" skimage from IO things, and further, also visualisation things. But this is a very long term goal and should not affect us improving existing functionality and fixing bugs. I would just be very reluctant to add any IO functionality.

I feel I should again express my concern of we taking things away from skimage. At some point we will restrict the package to a bunch of hard-coding people, instead of all nice people around that want something somewhat easier to use and to get things done with (myself kinda included here).
But it's only my 2 BRL cents.

@jni
Copy link
Member
jni commented Sep 6, 2019

I feel I should again express my concern of we taking things away from skimage. At some point we will restrict the package to a bunch of hard-coding people,

no no: the idea is not to stop supporting ease of use, but to separate concerns into different packages. That way, people who need only an image processing function don't need to download matplotlib, imageio, freetype, and a bunch of other things. scikit-image would retain its ease of use through its easy interoperability with the ecosystem.

Either way, I'm not advocating removing functionality, at least not for a long time. I'm just aiming to be careful to not add a bunch of extra functionality that is not precisely aligned with our vision.

slicing returns a new ImageCollection, *not* a view into the data.

class ImageCollection(object):
"""Load and manage a collection of image files.

Parameters
----------
load_pattern : str or list
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description is not complete.

Copy link
Member Author
@alexdesiqueira alexdesiqueira Sep 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stefanv I thought a little and I think you're right. Let's maintain the old behavior but without the documentation, to not break stuff for a while but not encouraging it.

implementation notes below.
load_pattern : str or list of str
Pattern string or list of strings to load. The filename path can be
absolute or relative.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexdesiqueira based on my reading of the notes from the last meeting, this description should include more detail, then we can merge!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll point out that my approval still stands, and @stefanv could merge when ready. =)

@stefanv stefanv merged commit e01560d into scikit-image:master Sep 30, 2019
@hmaarrfk
Copy link
Member
hmaarrfk commented Oct 1, 2019 via email

@alexdesiqueira alexdesiqueira deleted the improve_collections branch July 11, 2020 22:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0