-
-
Notifications
You must be signed in to change notification settings - Fork 2.3k
ImageCollection is able to receive lists of patterns #3928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ImageCollection is able to receive lists of patterns #3928
Conversation
OK, I'll consider that the implemented tests think on |
skimage/io/collection.py
Outdated
@@ -104,6 +98,9 @@ class ImageCollection(object): | |||
|
|||
Notes | |||
----- | |||
Note that files are always stored in alphabetical order. Also note that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is false, it's alphanumeric order, im1.jpg
will come before im10.jpg
. My second contribution ever to scikit-image. =D
skimage/io/collection.py
Outdated
""" | ||
|
||
def __init__(self, load_pattern, conserve_memory=True, load_func=None, | ||
**load_func_kwargs): | ||
"""Load and manage a collection of images.""" | ||
if isinstance(load_pattern, str): | ||
try: | ||
if not isinstance(load_pattern, str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this check should be more stringent: it should not be a string, but it should be a sequence.
Also, try/except
clauses should contain as little content as possible. I think this one encompasses too many statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to fix that with:
if isinstance(load_pattern, (list, tuple)):
IMHO this try/except
is as good as it gets. It receives the three types of inputs we'd like to put there, and it's rather efficient when compared to an if isinstance(load_pattern, (str, list, tuple)
. That could also bring us issues in a more difficult type of input:
load_pattern = [(1, 'one'), (2, 'two')]
In []: isinstance(load_pattern, (list, tuple))
Out[]: True
I don't know... Is there something I'm missing here? How could we solve that other way? Thanks for your help Juan!
skimage/io/collection.py
Outdated
relative. Multiple patterns should be separated by os.pathsep, | ||
e.g. '/tmp/work/*.png:/tmp/other/*.jpg'. Also see | ||
implementation notes below. | ||
relative. Also see implementation notes below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should include all possible inputs here, including the pathsep. As far as I know we don't intend to deprecate that? We could keep the original message but include the statement about windows vs *nix, and also the option to pass in a list of patterns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They still work; I tried to state that in the documentation. Is it good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A great start! And thanks for tackling this! In addition to my minor comments, you should include a test for the new behaviour.
skimage/io/collection.py
Outdated
|
||
If multiple patterns are entered as a single string, they should be | ||
separated by os.pathsep, e.g. '/tmp/work/*.png:/tmp/other/*.jpg' in *nix, | ||
or '/tmp/work/*.png;/tmp/other/*.jpg' in Windows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice!
@alexandrejaguar the problem with the TypeError syntax is that there's lots of statements there, any of which could potentially raise a TypeError that is not actually the error we're looking for. There is also a design philosophy that I mostly subscribe to that "exceptions are for exceptional circumstances", and you shouldn't use them for "normal" and "expected" logic flow. My suggestion: create a helper function, import typing
((isinstance(load_pattern, str) and os.pathsep in load_pattern) or
(not isinstance(load_pattern, str) and
isinstance(load_pattern, typing.Sequence) and
all(isinstance(pattern, str) for pattern in load_pattern))
) Then: if is_multipattern(load_pattern):
if isinstance(load_pattern, str):
load_pattern = load_pattern.split(os.pathsep)
...
elif isinstance(load_pattern, str):
... # single pattern logic
else:
# Raise TypeError This might also be a good opportunity for us to start using progressive typing: from typing import Sequence, Union as U
def imread_collection(load_pattern : U[str, Sequence[str]], ...): Then people's IDEs, at least, will tell them that [(1, 'one')] is invalid input for the function. |
@jni sounds good, let's see what I got. |
On Fri, 24 May 2019 18:12:33 -0700, Juan Nunez-Iglesias wrote:
> Parameters
----------
load_pattern : str or list
Pattern glob or filenames to load. The path can be absolute or
- relative. Multiple patterns should be separated by os.pathsep,
- e.g. '/tmp/work/*.png:/tmp/other/*.jpg'. Also see
- implementation notes below.
+ relative. Also see implementation notes below.
I think we should include all possible inputs here, including the
pathsep. As far as I know we don't intend to deprecate that? We could
keep the original message but include the statement about windows vs
*nix, and also the option to pass in a list of patterns.
This is a tricky point: on the one hand you want the docstring to give
an accurate description of the function, but on the other you want it to
encourage best behavior.
It may be best to be honest: "For backward compatibility, we also accept
a string of patterns delimited by `os.pathsep`."
|
I agree with this wording, thanks @stefanv! I would add only "... delimited by I think it's important to be clear on this in case someone is trying to figure out what's wrong with the code that was written on one platform and is now broken on another. |
I'm more into the wordy version as well, with a little modification: |
@alexandrejaguar nice job making it more concise! I'm all for it! |
@jni I understand the concern with |
Hello @alexdesiqueira! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2019-09-17 00:54:07 UTC |
Wait, what? We accept those? What does it mean? |
There's a test that check for input like that, maybe a list of previously input images... I was trying to deal with that actually =D
Yep, maybe it's time to stop considering that...
|
If it is supported API with a test I would be very hesitant to remove it without further investigation. Was this perhaps to support "frame generators" that output frame number + image file combos? Or perhaps a way to provide explicit ordering? Is it used internally that way, without ever having been externally advertised? In that case, we could potentially just change it. |
@stefanv @jni that's what I'm afraid of. This is the test:
I'll ckeck it. 🔍 |
Got it (
OK, it seems that this was implemented only for FITS support. |
Here is the relevant but extremely cryptic section of the docstring:
How does the image loader know that there are multiple frames inside of the image? Hrmmm... I feel like this functionality may need to be removed: too complex and confusing. |
@@ -147,24 +160,28 @@ def imread_convert(f): | |||
>>> coll[0].shape | |||
(200, 200) | |||
|
|||
>>> ic = io.ImageCollection('/tmp/work/*.png:/tmp/other/*.jpg') | |||
>>> ic = io.ImageCollection(['/tmp/work/*.png', '/tmp/other/*.jpg']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, this should be the only supported way to setup multiple patterns in Scikit-Image 0.18.
To many options makes maintaining this kind of code super complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To do this, you would have to add a deprecation warning. In that case, I see this PR more as a "simplification" of the API, since lists of strings are much more intuitive for python users than a strange seperator. I would approve ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in the comments, let's just soft-deprecate the old behaviour. This means removing the documentation for the old syntax, including the second paragraph above. @stefanv do you agree with this approach?
try: | ||
ar = np.concatenate(all_images) | ||
array_cat = np.concatenate(all_images) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is stack not used?
In [7]: np.stack([a, b])
Out[7]:
array([[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]]])
In [8]: np.stack([a, b]).shape
Out[8]: (2, 3, 3)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really was only refactoring. I didn't think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stack did not exist when this code was written.
I'm really the wrong person to ask about this PR. Personally, I don't think it should exist. imageio provides enough basic functionality. The "always sorted order" basically throws away a ton of metadata from the user. What if the user had images numbered zero through 10, but had 5 loaded images. They would never know which 5 were loaded using this function. This function is "cute" for the examples you showed, but doesn't really extend well to any real world example I can think of where metadata is just as important for an image collection than the binary numbers themselves. Because of the above, I don't think you will ever get a thumbs up from me. :( You shouldn't count me in as the 1/2 reviewers needed to merge this in. That said, if you do want to push through with this, I strongly encourage you to deprecate the
For now, you still have a thumbs down since it seems to miss a test for the new functionality you added. |
What happened with this @hmaarrfk from-the-past? 😄
I'll take care of the other points. |
Maybe an other way to think about it is: Loading and saving data imposes a certain permanent organization on experimental setups. Since you don't provide a "save collection", I just don't see how this function helps anybody.
Probably best to leave scikit image focused on the later two. Whoever deals with acquisition can deal with saving and loading. |
@hmaarrfk none of your points provide to me compelling evidence for deprecating working functionality. I'm perfectly ok with soft-deprecating it (removing any mention on it from the docs), but I think the functionality should remain, rather than cause unnecessary churn for downstream users. |
ok, dont't deprecate. A test definitely needs to be added. |
@hmaarrfk It is true that we have a policy of not implementing one-liner's on the user's behalf. But ImageCollection is quite useful, unless you want to introduce a heavier dependency such as dask for managing delayed reading. Use case: I have a directory with 2000 images in it (this happens surprisingly frequently). I want to load these up, take a look at some of them. Yes, I can go pick out filenames individually, but often it's handy to just load the whole bunch into a collection and to then grab items from that list-like object. So, I'd be -1 on deprecating the ImageCollection, unless we provided explicit directions in the user documentation on how to do this with, e.g., dask. |
I'm not saying deprecating all of image collection. Scikit image is many voices and you should all push forward with this PR. I'm simply suggesting deprecating the behavior that uses |
I'm humbly against you on that. In my mind we're supposed to be (or, at least, offer) a nice solution from the start (which is, of course, reading the image). I'm even against our policy that @stefanv brought up:
However, as you said, we're many voices. |
I actually do agree with @hmaarrfk that we should "divest" skimage from IO things, and further, also visualisation things. But this is a very long term goal and should not affect us improving existing functionality and fixing bugs. I would just be very reluctant to add any IO functionality. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexdesiqueira if you add a test, and a note in the release notes, I'm happy to merge this as is. Soft-deprecating the old pattern is optional, as that can come later at any time.
skimage/io/collection.py
Outdated
@@ -104,6 +121,13 @@ class ImageCollection(object): | |||
|
|||
Notes | |||
----- | |||
Note that files are always stored in alphanumerical order. Also note |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that files are always |
|
Note that files are always returned in alphanumerical order. Also note |
@@ -147,24 +160,28 @@ def imread_convert(f): | |||
>>> coll[0].shape | |||
(200, 200) | |||
|
|||
>>> ic = io.ImageCollection('/tmp/work/*.png:/tmp/other/*.jpg') | |||
>>> ic = io.ImageCollection(['/tmp/work/*.png', '/tmp/other/*.jpg']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in the comments, let's just soft-deprecate the old behaviour. This means removing the documentation for the old syntax, including the second paragraph above. @stefanv do you agree with this approach?
that their constituent arrays are equal. | ||
""" | ||
Ancillary function to compare two ImageCollection objects, checking that | ||
their constituent arrays are equal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the FITS plugin is now broken, this should definitely be noted in the release notes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I'll check
@alexdesiqueira LOL, timing! 👋 |
@jni I'm sprinting! |
I feel I should again express my concern of we taking things away from skimage. At some point we will restrict the package to a bunch of hard-coding people, instead of all nice people around that want something somewhat easier to use and to get things done with (myself kinda included here). |
no no: the idea is not to stop supporting ease of use, but to separate concerns into different packages. That way, people who need only an image processing function don't need to download matplotlib, imageio, freetype, and a bunch of other things. scikit-image would retain its ease of use through its easy interoperability with the ecosystem. Either way, I'm not advocating removing functionality, at least not for a long time. I'm just aiming to be careful to not add a bunch of extra functionality that is not precisely aligned with our vision. |
skimage/io/collection.py
Outdated
slicing returns a new ImageCollection, *not* a view into the data. | ||
|
||
class ImageCollection(object): | ||
"""Load and manage a collection of image files. | ||
|
||
Parameters | ||
---------- | ||
load_pattern : str or list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description is not complete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stefanv I thought a little and I think you're right. Let's maintain the old behavior but without the documentation, to not break stuff for a while but not encouraging it.
implementation notes below. | ||
load_pattern : str or list of str | ||
Pattern string or list of strings to load. The filename path can be | ||
absolute or relative. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexdesiqueira based on my reading of the notes from the last meeting, this description should include more detail, then we can merge!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll point out that my approval still stands, and @stefanv could merge when ready. =)
Curious to learn how people use this! Keep me updated!
…On Mon, Sep 30, 2019, 2:15 PM Stefan van der Walt ***@***.***> wrote:
Merged #3928 <#3928>
into master.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3928?email_source=notifications&email_token=AAAV7GBI2MP5TTL7EL3FSPLQMJT75A5CNFSM4HPSHF6KYY3PNVWWK3TUL52HS4DFWZEXG43VMVCXMZLOORHG65DJMZUWGYLUNFXW5KTDN5WW2ZLOORPWSZGOT5XM5HQ#event-2674839198>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAAV7GFHWE7VWD7JNT46733QMJT75ANCNFSM4HPSHF6A>
.
|
Description
Users can pass a list of patterns to ImageCollection, instead of a string.
Checklist
./doc/examples
(new features only)./benchmarks
, if your changes aren't covered by anexisting benchmark
For reviewers
later.
__init__.py
.doc/release/release_dev.rst
.@meeseeksdev backport to v0.14.x