add type annotations in some modules in skimage.segmentation #4794
base: main
Conversation
Hello @emmanuelle! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2020-07-26 15:56:09 UTC
```python
from ._array_like import ArrayLike
import numpy as np

from typing_extensions import Annotated
```
I would expect this to crash with py3.6, but apparently it does not?
(a) that's fascinating, and (b) who cares, apparently today is Py3.7 day! =D
Well, we do have py3.6 builds, and they pass (not the ones building the docs, but that is not an import problem).
what I'm saying is we can drop those builds so this is not really worth worrying about. =) See: https://numpy.org/neps/nep-0029-deprecation_policy.html#support-table
If we drop the builds, we accept de facto that future versions will not support py3.6. I can't believe that Google Colab still supports only py3.6, but I just checked and it seems to be the case. So we can make this decision, but it should be motivated by more than just this PR.
It's motivated by following NEP29. I'm super not interested in supporting Google Colab or other proprietary technologies.
I totally understand why you're not interested :-). Nevertheless I think a lot of people are using Colab for deep learning on images and scikit-image for pre/post-processing.
@scikit-image/core it's WIP, but comments are welcome to steer the future direction of this work. The CI is failing on the doc build (sphinx-gallery auto examples); I can try to solve this now, but the code structure may evolve.
```diff
@@ -4,7 +4,7 @@
 import numpy as np

 from . import _marching_cubes_lewiner_luts as mcluts
-from . import _marching_cubes_lewiner_cy
+from . import _marching_cubes_lewiner_cy  # type: ignore
```
Why do Cython modules need to be ignored?
See python/mypy#8575, mypy just does not read .pyx files.
Also reading python/mypy#7542 now.
skimage/measure/_regionprops.py
Outdated
```diff
@@ -160,12 +160,12 @@ def __init__(self, slice, label, label_image, intensity_image,
         self._cache = {}
         self._ndim = label_image.ndim

-    @property
+    @property  # type: ignore # type: ignore
```
Typo? Also, same as above: why do @property decorators need to be ignored? It seems like a pretty major issue that mypy would have had to address by now, surely?
See python/mypy#1362. People are still linking to this issue as recently as Dec 2019 so I'm not sure it's fixed. Let me take a closer look.
And sure, this was a search-and-replace typo.
Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
```python
skimage2ndimage.update({x: x for x in funcs})
funcs_binary = ('binary_erosion', 'binary_dilation', 'binary_opening',
                'binary_closing', 'black_tophat', 'white_tophat')
skimage2ndimage.update({x: x for x in funcs_binary})
```
I love this change. =)
Otherwise mypy was complaining that the length had changed...
```python
def random_walker(data: ImageArray, labels: LabelsArray, beta: float = 130,
                  mode: Literal['cg', 'cg_j', 'cg_mg', 'bf'] = 'cg_j',
                  tol: float = 1.e-3, copy: bool = True,
                  multichannel: bool = False, return_full_prob: bool = False,
                  spacing: Optional[Sequence] = None, *,
                  prob_tol: float = 1e-3) -> LabelsArray:
```
Suggested change:

```python
def random_walker(
    data: ImageArray,
    labels: LabelsArray,
    beta: float = 130,
    mode: Literal['cg', 'cg_j', 'cg_mg', 'bf'] = 'cg_j',
    tol: float = 1.e-3,
    copy: bool = True,
    multichannel: bool = False,
    return_full_prob: bool = False,
    spacing: Optional[Sequence] = None,
    *,
    prob_tol: float = 1e-3,
) -> LabelsArray:
```
Does Literal mean that I can't do:

```python
mode = get_mode_from_gui_button()
output = random_walker(image, labels, mode=mode)
```

?
Would you type your get_mode_from_gui_button or not? If you type it, the script above should pass mypy. But probably you see get_mode_from_gui_button as a more generic function, and then it would not be possible to type it with such a specific type as Literal['cg', 'cg_j', 'cg_mg', 'bf']?
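For what it's worth, a function whose declared return type is the same Literal does satisfy mypy; a minimal sketch (get_mode_from_gui_button, Mode, and random_walker_stub are hypothetical names for this thread, not scikit-image API):

```python
from typing import Literal  # typing_extensions on Python < 3.8

Mode = Literal['cg', 'cg_j', 'cg_mg', 'bf']

def get_mode_from_gui_button() -> Mode:
    # A stand-in for a GUI callback; mypy only cares about the declared
    # return type, so any function annotated with Mode composes cleanly.
    return 'cg_mg'

def random_walker_stub(mode: Mode = 'cg_j') -> str:
    return mode

mode = get_mode_from_gui_button()
print(random_walker_stub(mode=mode))  # type-checks: Mode flows through
```

The friction jni describes appears when the variable is typed as plain `str` (or untyped code is strict-checked); then mypy cannot prove the value is one of the four literals.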
@jni could you please explain in more detail what you meant here? Remember that type hints are not used at runtime, only by the type checker.
@emmanuelle yes, but that is perfectly valid code to run through the type checker! I meant with my code snippet to show an example in which mode= would be a variable, not always a literal string.
Also, side note: function type annotations are available at runtime in the __annotations__ attribute of the function object!
So I thought that Literal was a cool thing in particular for UI generation (a dropdown with the discrete values), but it is quite restrictive for type checking indeed. For example:

```python
from typing_extensions import Literal
from typing import Any

def my_func(x: Literal['a', 'b']) -> str:
    return x + x

x: Any = 'a'
my_func(x)  # ok
y: Any = 'c'
my_func(y)  # ok even if wrong value; with type Any it passes
z: str = 'a'
my_func(z)  # not ok because the type is str and not Literal
```
> (dropdown with the discrete values)

Perhaps another good use for Annotated? =)
hum yes, very smart! I must admit I don't completely understand how Annotated works (even after reading PEP 593). For example, in the following code it's not possible to write Annotated[str, 'a', 'b'] directly in the function signature, and I don't understand why. There are workarounds, like defining the annotated type outside of the signature, so it's not a big deal, but I'd prefer to understand this better...

```python
from typing_extensions import Annotated

def modes():
    return ['a', 'b']

# error: Name 'a' is not defined
# error: Name 'b' is not defined
def my_func(x: Annotated[str, "a", "b"]) -> str:
    return x + x

# ok
def my_other_func(x: Annotated[str, modes()]) -> str:
    return x + x

# ok
modes_type = Annotated[str, 'a', 'b']

# ok
def yet_another_func(x: modes_type) -> str:
    return x + x
```
I think you are mixing up some concepts here. Literal is a type, so it can also be the type of a variable. It just means only specific values are allowed. In this case it would mean that get_mode_from_gui_button() needs to have a return type of Literal['cg', 'cg_j', 'cg_mg', 'bf'] (or possibly a subset of those values) to guarantee that only those values will be passed to random_walker.
I think the most intuitive explanation I can come up with is that Literals are like Enums, but without the members or the enum itself having a name.
In the case of Literal['cg', 'cg_j', 'cg_mg', 'bf'] = 'cg_j', an Enum does seem like the more appropriate choice though. It has the advantage that the meaning of the values can be defined and documented in one place, and every place that uses them can then also be found (by simply looking for code that imports the enum).
The Annotated feature is something I haven't used myself yet. I think it is intended for third-party extensions for complex type checking, like dependent types (a feature that is essential to properly type numeric code, like anything that uses numpy, but that is a different topic). My intuition is that it is most likely not the solution to whatever problem you are trying to solve, but I don't fully understand the problem itself yet.
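To make the Enum alternative concrete, here is a minimal sketch (the Mode class and solve function are invented for illustration; the PR itself uses Literal):

```python
from enum import Enum

class Mode(Enum):
    """Solver modes; the meaning of each value is documented once, here."""
    CG = 'cg'
    CG_J = 'cg_j'
    CG_MG = 'cg_mg'
    BF = 'bf'

def solve(mode: Mode = Mode.CG_J) -> str:
    # .value recovers the original string where the old API expects one.
    return mode.value

print(solve(Mode.BF))
```

Call sites then import Mode, so "who uses these values" is answerable with a plain text search, at the cost of a slightly noisier API than bare strings.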
> I don't fully understand the problem itself yet

I'm glad you asked! 😂 We have two issues:

- NumPy arrays do double and triple duty in the library: they can represent an image (which can be 2D, 3D, nD, 2D+channels, ...), a set of coordinates, a set of spatial extents... So we need a way to distinguish between all these types, while still allowing bare NumPy arrays through no problem. Here NewType is indeed what I thought was the right solution, although it is not the right solution e.g. for distinguishing between multichannel and single-channel arrays.
- The other issue we have is that for many functions (and we hope to continue to increase how many), not just NumPy arrays are allowed, but actually any array-like, if they implement __array__ and maybe NEP-18 (__array_function__). In this case, NewType falls over because we actually need to use a protocol, and that protocol is defined by NumPy, and you can't use NewType with a protocol.
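A rough illustration of the protocol point; this SupportsArray is a simplified stand-in written for this sketch, not NumPy's actual protocol definition:

```python
from typing import Protocol, runtime_checkable  # Python 3.8+
import numpy as np

@runtime_checkable
class SupportsArray(Protocol):
    """Anything coercible to an ndarray via __array__: ndarray itself,
    but also dask arrays, image wrapper classes, etc."""
    def __array__(self) -> np.ndarray: ...

class MyImage:
    # A toy array-like: not an ndarray, but convertible to one.
    def __array__(self) -> np.ndarray:
        return np.zeros((2, 2))

def process(image: SupportsArray) -> np.ndarray:
    return np.asarray(image)

print(process(MyImage()).shape)
```

NewType('Image', SupportsArray) is rejected by mypy because NewType requires a proper class, which is exactly the tension jni describes.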
```diff
 if not isinstance(sigma, coll.Iterable):
     sigma = np.array([sigma, sigma, sigma], dtype=dtype)
-    sigma /= spacing.astype(dtype)
+    sigma /= spacing.astype(dtype)  # type: ignore
```
Why this ignore?
I could not get this to pass without the ignore. Basically, spacing initially has a sequence type, so this line would fail because a sequence has no astype method. I tried to use the cast function to cast spacing to np.ndarray, but it did not work (much to my confusion).
So, actually I think this is a case of mypy uncovering a potential bug in our code. If spacing is a sequence other than a tuple or list, we will get to here and have an error, because we only convert spacing (line 266 above) if it's a list or a tuple. So I in fact would suggest changing that elif to an else, and then maybe this will pass? 🤞
I tried, but it does not work. Specifically,

```python
if spacing is None:
    spacing = np.ones(3, dtype=dtype)
else:
    spacing = np.ascontiguousarray(spacing, dtype=dtype)
spacing = cast(np.ndarray, spacing)
reveal_type(spacing)
if not isinstance(sigma, coll.Iterable):
    sigma = np.array([sigma, sigma, sigma], dtype=dtype)
    sigma /= spacing.astype(dtype)
```

and when running mypy:

```
$ mypy --ignore-missing-imports skimage/segmentation/
skimage/segmentation/slic_superpixels.py:270: note: Revealed type is 'Union[typing.Sequence[Any], None]'
skimage/segmentation/slic_superpixels.py:273: error: Item "Sequence[Any]" of "Optional[Sequence[Any]]" has no attribute "astype"
skimage/segmentation/slic_superpixels.py:273: error: Item "None" of "Optional[Sequence[Any]]" has no attribute "astype"
Found 2 errors in 1 file (checked 26 source files)
```

It's as if the cast was completely ignored... Any expert of mypy around here? :-p @kne42 @chrisconlan
Thanks @emmanuelle! Well, the change should go in anyway. =)
done
I think the issue might be that the spacing variable is never declared as np.ndarray. The error that mypy seems to ignore for some reason is that spacing = cast(np.ndarray, spacing) is already a type error, because the right-hand side is an expression of type np.ndarray while the left-hand side is a variable of type Optional[Sequence].
I think the proper solution is to split the spacing variable into two: one with type Optional[Sequence], the other of type np.ndarray.
skimage/typing/_image_types.py
Outdated
```python
ImageArray = Annotated[np.ndarray, 'image',
                       'a 2D or 3D single or multichannel image']
Image2dArray = Annotated[np.ndarray, 'image_2d']
ImagendArray = Annotated[np.ndarray, 'image_nd',
                         'nd image']
```
I'd strongly prefer for Image to be nD, and Image2D and Image3D to be special cases.
👍
Also, Array may be unnecessary.
Most often we want to specify either "N-d" or "2/3D".
So we have:
- N-d
- 2/3D
- 2D only (for example segmentation.active_contour or segmentation.chan_vese)

Should we pick Image3d for 2/3D, or will it be confusing to people? Looking for a good name here.
We also have multichannel vs spatial to consider. Maybe Image2D3D? Your suggestion of Image3D for both can also work.
Looking at the above again, having Image3D double for both would probably be fine?
Are you aware of the NewType feature? On first glance it looks like this is what you actually want to use. Annotated is ignored by mypy itself and is supposed to be used for third-party extensions to mypy. NewType ensures that only something that is specifically typed as Image is passed, and not just any ndarray.
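A minimal runtime sketch of what that would look like (the Image and invert names are illustrative only; note that emmanuelle reports further down that mypy of the time errored on exactly this NewType over np.ndarray):

```python
from typing import NewType
import numpy as np

# At runtime NewType('Image', ...) is just an identity function;
# the distinction exists only for the type checker.
Image = NewType('Image', np.ndarray)

def invert(image: Image) -> Image:
    return Image(np.asarray(1.0 - image))

img = Image(np.zeros((4, 4)))  # callers must wrap explicitly
out = invert(img)
print(out.shape)
```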
Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
Note that until numpy 1.20 is released in 5-6 months, np.ndarray itself carries no type information, so the following passes mypy even though add_images is called with an int:

```python
import numpy as np
from typing_extensions import Annotated

Image = Annotated[np.ndarray, 'image']

def add_images(img1: Image, img2: Image) -> Image:
    return img1 + img2

im1: np.ndarray = np.random.random((10, 10))
im2: np.ndarray = np.random.random((10, 10))

res = add_images(3, im2)
```
With such types, introspecting types at runtime (for example for profiling or UI generation) would look like

```python
from skimage.typing import Image, Image2D, Image3D, Labels, Mask
from skimage import segmentation
import inspect
from typing_extensions import _AnnotatedAlias
from typing import _GenericAlias, Optional

image_types = [Image, Image2D, Image3D, Labels, Mask]
spec = inspect.getfullargspec(segmentation.watershed)
print(spec)
for arg in spec.args:
    arg_type = spec.annotations[arg]
    if type(arg_type) == _AnnotatedAlias:
        for im_type in image_types:
            if arg_type == im_type:
                print(arg, ":", im_type)
    elif type(arg_type) == _GenericAlias:
        for im_type in image_types:
            if arg_type == Optional[im_type]:
                print(arg, ":", im_type)
    else:
        print(arg, ":", arg_type)
```

which results in

```
FullArgSpec(args=['image', 'markers', 'connectivity', 'offset', 'mask', 'compactness', 'watershed_line'], varargs=None, varkw=None, defaults=(None, 1, None, None, 0, False), kwonlyargs=[], kwonlydefaults=None, annotations={'return': typing_extensions.Annotated[numpy.ndarray, 'labels', 'array of integers representing labels'], 'image': typing_extensions.Annotated[numpy.ndarray, 'image_nd', 'nd image'], 'markers': typing.Union[typing_extensions.Annotated[numpy.ndarray, 'labels', 'array of integers representing labels'], NoneType], 'connectivity': typing.Union[int, skimage.typing._array_like._SupportsArray], 'offset': typing.Union[numpy.ndarray, NoneType], 'mask': typing.Union[typing_extensions.Annotated[numpy.ndarray, 'mask', 'array of bools to be used as mask'], NoneType], 'compactness': <class 'float'>, 'watershed_line': <class 'bool'>})
image : typing_extensions.Annotated[numpy.ndarray, 'image_nd', 'nd image']
markers : typing_extensions.Annotated[numpy.ndarray, 'labels', 'array of integers representing labels']
mask : typing_extensions.Annotated[numpy.ndarray, 'mask', 'array of bools to be used as mask']
compactness : <class 'float'>
watershed_line : <class 'bool'>
```
CI seems to be passing, yeah! I added a checklist at the top of the PR with a few items to discuss.
After following this discussion I find the arguments in favor of type annotations in signatures slightly more convincing. Although, I really don't like two points that were already raised about signature annotations:
```python
        image_dim: int,
        connectivity: Optional[Union[int, _SupportsArray]],
        offset: Optional[Sequence[int]],
```
Suggested change (one indent level instead of two):

```python
    image_dim: int,
    connectivity: Optional[Union[int, _SupportsArray]],
    offset: Optional[Sequence[int]],
```
I'd be interested in your reasoning about adding two indent levels and deviating from the "norm" here? 😉
@lagru "the norm" is a choice that black made that I dislike. PEP8 encourages two levels to distinguish the arguments from the body of the function: https://www.python.org/dev/peps/pep-0008/#indentation
@fmagin I am so appreciative of the time you took to review the issue 🙏 I agree that

@lagru I hear you on the ever-growing list of things people need to know to contribute. Fortunately, since this is a standard language feature, I think many people will soon be familiar with it. Also, since typing can be done gradually, there is no pressure to immediately annotate all functions; we can indeed help out new developers with that process (hey, a week ago I had never even written a type annotation myself ;).

W.r.t. how these look: I agree with @jni that formatting/highlighting makes a big difference.
vs

```python
def foo(
    image: Image,
    mapping: Mapping[Text, Any] = ...
) -> Sequence[Text]:
```
Got around to actually reading the code changes and looking at the context. There are basically two kinds of problems/improvements I see:

1. Lack of NewType usages, both as a replacement for Annotated and also in other places where it is quite useful.
2. A common Python pattern that is actually incompatible with static typing, but easy to change: a function argument is of type A (like Sequence) and is then converted to type B (like np.ndarray) on its first use, with the result reassigned to the function argument again. This is technically a type error, because you can't assign a value of type B to a variable of type A.

Because mypy treats external library functions without type annotations as returning Any, the functions for creating an object of type np.ndarray (like np.array, np.ones, np.ascontiguousarray) are treated as returning Any. Without specific mypy flags, a value of type Any is valid in all places (even those with a more specific type) instead of being valid in no places (except where something is explicitly declared as Any).

The solution is simply to assign the function argument of type A to a new variable of type B after converting it to type B.
```diff
@@ -267,12 +267,12 @@ def __getattr__(self, attr):
             f"'{type(self)}' object has no attribute '{attr}'"
         )

-    @property
+    @property  # type: ignore
```
Why is the type ignored here? In this example np.sum seems annoying to properly type-annotate for all cases, but that is the numpy project's problem. In the specific case of a property that just wraps np.sum(self.image), I would expect the return type to be simply float or np.ndarray, or at least Union[float, np.ndarray], which is already a lot more information than a type: ignore. With type: ignore, I had to look up the type signature of np.sum myself to even know that this can actually return an array and not just a float.
The same idea probably applies to the other type: ignore annotations in the rest of the file.
See python/mypy#1362, which has a lot of people complaining that mypy does not support the property decorator :-).
You had me confused there for a minute and questioning my memories ;)
I have to admit though that this is a lot more complicated than I assumed when I wrote the comment: the linked issue is about a subtly more specific case. Mypy doesn't support decorated properties, but it does support the property decorator itself. So this specific instance of # type: ignore might actually be needed. But I am not even sure whether this issue applies only to "properties that are decorated further" or also to "some function that is decorated and then wrapped with a property".
At least some of the cases further down only use @property though, and I am fairly sure that it should work in that case by simply annotating the return type of the decorated method.
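For those plain-@property cases, a small sketch suggesting mypy is satisfied once the return type is annotated (Region is a made-up class, not the PR's RegionProperties):

```python
import numpy as np

class Region:
    def __init__(self, image: np.ndarray) -> None:
        self.image = image

    # An undecorated @property type-checks fine once its return type is
    # annotated; no `# type: ignore` is needed here.
    @property
    def area(self) -> int:
        return int(np.sum(self.image))

r = Region(np.ones((3, 4), dtype=bool))
print(r.area)
```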
@fmagin we flagged this above in this comment, which admittedly GitHub makes difficult to see, and the reason is found here:
```python
        image_dim: int,
        connectivity: Optional[Union[int, _SupportsArray]],
        offset: Optional[Sequence[int]],
) -> Tuple[np.ndarray, np.ndarray]:
```
This might also be something that can be improved even further by using NewType. The docstring states that one of the returned arrays is a "structuring element" in the form of an array of bool, and the other is an offset as an array of int. With NewType this can be made explicit, and this type can then also be used in the signature of e.g. _offsets_to_raveled_neighbors (which is where the return value of this function ends up in one case). Its docstring for the parameter states:

```
selem : ndarray
    A structuring element determining the neighborhood expressed as an
    n-D array of 1's and 0's.
```

If selem is annotated with the same NewType that is returned here, this is another example of documentation that can be formalized in the type system for both human and algorithmic use. Unless an "n-D array of 1's and 0's" is actually different from an "array of bool", which would be even more important to emphasize.
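A sketch of how that shared NewType could look; StructuringElement, square_selem, and count_neighbors are invented for illustration, not the PR's actual helpers:

```python
from typing import NewType
import numpy as np

# Bool array marking the neighborhood; the same name is then used by
# both the producer and every consumer of structuring elements.
StructuringElement = NewType('StructuringElement', np.ndarray)

def square_selem(radius: int) -> StructuringElement:
    size = 2 * radius + 1
    return StructuringElement(np.ones((size, size), dtype=bool))

def count_neighbors(selem: StructuringElement) -> int:
    # The annotation documents that only a structuring element belongs
    # here, replacing the "n-D array of 1's and 0's" prose.
    return int(selem.sum()) - 1  # exclude the center pixel

print(count_neighbors(square_selem(1)))
```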
This is a good example for the question I had above: how do we maintain parallel hierarchies of (1) NumPy arrays that are NumPy arrays but also a specific type of NumPy arrays, and (2) array-likes that might actually be dask arrays or OpenCL arrays or whatever.
```diff
 if not isinstance(sigma, Iterable):
     sigma = np.array([sigma, sigma, sigma], dtype=dtype)
-    sigma /= spacing.astype(dtype)
+    sigma /= spacing.astype(dtype)  # type: ignore
 elif isinstance(sigma, (list, tuple)):
     sigma = np.array(sigma, dtype=dtype)
```
Same problem as above: sigma is initially declared as the type Union[float, Sequence[float]], but here it is assigned the result of np.array. The type checker continues to treat sigma as its declared type, and because np.array is a function from an external untyped library and not a constructor, mypy just treats it as returning Any, which isn't considered an error by default. So it just assumes that this is correct and that sigma is still guaranteed to be of type Union[float, Sequence[float]], leading to type issues further down in the code.
Awesome, thank you for clarifying this for us! 🎉
@lagru thanks for your input. I should have added that there is a third option, "do not add type annotations", which we need to consider if we decide that the cost for users is too big (even if for developers it should have benefits). I really like @zpincus's idea that IDEs could have an option for whether to display type annotations. This is probably something that should be discussed on Twitter to reach devs of different IDEs. Also 👍 for having maintainers help along with type annotations: when a PR proposes a new function for the public API, adding type annotations is a small task for maintainers compared to the overall review work. One thing we could do is quickly release a new version of scikit-image with just one subpackage typed and see how people react. If we don't type the whole package, it's clear that it's still experimental.
Thanks @fmagin, all your comments are very valuable. I tried using Image = NewType('Image', np.ndarray) but then I got the following error when running mypy:

At the moment it is not possible to make a … I also see the following caveat with …
@fmagin if you don't mind, I have a couple of questions to submit to your expertise on type annotations.
That is a problem I wasn't aware of yet, thanks. I assumed it would at least work for directly importing some class. A possible workaround is using a type alias instead, i.e. simply declaring it as
I consider this a feature of
not without numpy supporting it, but I'd expect them to support it. Should be achievable with generics, like annotating a builtin python list as
That is somewhat complicated. Depending on what you want exactly, this might either be possible with Literal Types and Generics, or be a feature that would make type checking an undecidable problem. I am not familiar enough with the type theory details here and the uses of numpy you are imagining. Either way, that is something that numpy has to decide and support; not much can be done if you are just using numpy. I vaguely remember that one of the use cases of …
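The decidable slice of this is value constraints on flags; a sketch with `Literal` (the parameter and its values are illustrative, not a scikit-image API):

```python
from typing import Literal

# A type checker will reject pad_mode('refelct') or any string
# outside the declared set, while the runtime stays unchanged.
def pad_mode(mode: Literal['reflect', 'constant', 'edge']) -> str:
    return 'padding with ' + mode
```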
I have no idea, I didn't need protocols so far. My intuition is that it is possible, but I can't help you with the specifics. |
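For completeness, a structural `Protocol` for duck arrays might look like this (the member names are a minimal assumption about the array interface):

```python
import numpy as np
from typing import Protocol, Tuple, runtime_checkable

# Anything exposing `shape` and `__array__` counts as array-like,
# with no inheritance from ndarray required -- this is how a dask
# or cupy array could satisfy the contract.
@runtime_checkable
class ArrayLike(Protocol):
    shape: Tuple[int, ...]
    def __array__(self) -> np.ndarray: ...
```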
Just to add my two cents, I like inline annotations (type hints in function signatures) provided it's done with proper/conventional formatting/indentation. My viewpoint is the same as @zpincus's. |
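To make the formatting point concrete, a conventionally formatted inline-annotated signature might look like this (the names echo `random_walker`, but the defaults shown are a sketch, not the real API):

```python
import numpy as np

# One parameter per line keeps the annotated signature readable
# and the diff small when a parameter changes.
def random_walker_sketch(
    data: np.ndarray,
    labels: np.ndarray,
    beta: float = 130.0,
    mode: str = 'cg_j',
    tol: float = 1.0e-3,
) -> np.ndarray:
    ...
```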
Yes, I think pretty typical: people sometimes produce random test arrays (…). This is one of the things I'm wondering about. If an input argument is, say, a dict, then usually you can just write:
where … If not, please disregard! |
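One way to keep that dict flexibility in an annotation is `Mapping`, which mirrors duck typing for dictionaries; a small sketch:

```python
from collections import OrderedDict
from typing import Mapping

# Mapping[str, int] accepts plain dicts, OrderedDicts, or any
# read-only dict-like object -- the duck-typed equivalent of 'dict'.
def total(counts: Mapping[str, int]) -> int:
    return sum(counts.values())
```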
But do they do that regularly in code that also should pass a static type checker? That is the thing that somewhat confuses me.
I think TypedDict is appropriate for this, but this doesn't address your point. Both of these points are similar in a way: as far as I understand, they only apply if the user has actively decided that their code must pass their mypy configuration check. The runtime doesn't care. PyCharm just treats those type errors as a warning by default, and this is also configurable. I have it turned up to a full error, but you could also reduce it to the level of a linting issue. The third party library doesn't have to pass the … My personal experience is that I annotated a library I was using and strongly argued for using … |
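A minimal sketch of the `TypedDict` idea, with hypothetical keys loosely modelled on `slic` parameters:

```python
from typing import TypedDict

# Keys and value types are checked statically; at runtime the
# object is still an ordinary dict.
class SlicOptions(TypedDict, total=False):
    n_segments: int
    compactness: float

opts: SlicOptions = {'n_segments': 100, 'compactness': 10.0}
```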
Sorry I'm late to the party on some of this -- and note that I don't have a lot of skin in the game at this point, so I hope this isn't just bikeshedding, but @jni makes a good point:
The fundamental problem here is that duck-typing on array-like objects is a fantastic feature. Moreover, this goes counter to an idea that crops up earlier in the comments: that it's useful to have to specify when a specific array is "actually" an image. This seems quite wrongheaded to me. It's almost a category error to think that there exist arrays that are definitively "not images". If I have a dataset that I want to apply a skimage routine to (be it 1-d, 2-d, or 20-d), the data are certainly image enough (provided the skimage function can handle the dimensionality). But duck-typing on something as diverse as "anything that presents an array-like interface" is clearly very hard to square with type annotations, absent either runtime functions to cast things to … So, how much of a problem is it, really, if skimage functions just always accept … (Those who have been around since the early days of numpy will remember all the trouble that trying to use specialized subclasses of numpy arrays caused when doing numpy operations -- i.e. under what circumstances do you get the subclass back vs. a pure ndarray? It just turned into whack-a-mole after a while. And I wonder if this isn't going to wind up a lot like that.) And importantly, note that any validation function that takes … |
@zpincus (👋)
😂 love this. For simplicity below I'll talk about NumPy arrays only, but this applies to duck arrays. My idea with this type hierarchy is that a bare, unannotated NumPy array will always validate as an Image, because an Image is a NumPy array. What won't validate is something like a Coords, returned by … That validation is useful, but the annotation is even more useful for packages like napari which can, in the not too distant future I hope, use the annotation to determine whether to add an Image layer or a Points layer or a Vectors layer. Ultimately, types are ignored at runtime, and I think if you want to get creative with your array library mixing, you can take a bit of flak from the type checker. =P But, absolutely, "raw" arrays should 100% validate as "image" arrays. I would certainly not entertain a system that didn't pass this test. |
Thanks for that input and summary! As I said, my experience with numpy is limited to basically a 200 LoC implementation of a perceptron, so I never really used numpy much. There is a good chance that this will change in the next years, so I guess on some level I am just invested in making sure that by that time there is proper typing ;) The duck type argument is quite convincing and I now understand how NewType is fairly annoying in that case and don't think it is a good choice anymore. Thank you for your patience in explaining this! I would still expect that there is value (especially for new contributors) in some way to document the intended use of a numpy array though:
I think that will actually not work with Type Aliases. That could be a good use for … |
Hi all, I'm very excited about adding type annotations to scikit-image functions. If one of you can guide me, e.g. provide instructions like "all image input parameters should have type …", … Let me know if/how I can help! And no hurry though. Cheers, |
Thanks @haesleinhuepf, I think it would be great if you want to take this up again! I am not very experienced with typing, and haven't read through this issue recently, so I don't have much specific advice. Maybe @stefanv, @jni or @emmanuelle are more knowledgeable on static typing and could provide guidance on the best path forward? I know SciPy has started adding typing recently (see …). Perhaps something to consider for 1.0 would be making the accepted types in scikit-image functions a bit stricter, requiring that all image inputs be arrays and dropping support for sequences like list, tuple, etc. Use of lists as images is sometimes nice for small documentation examples or simple test cases, but probably not common otherwise. |
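If lists and tuples are to stay valid for small examples, `numpy.typing.ArrayLike` (added in numpy 1.20) is one middle ground; `threshold` here is an illustrative helper, not a real function:

```python
import numpy as np
from numpy.typing import ArrayLike

# ArrayLike covers ndarrays, scalars, and nested sequences, so doc
# examples with plain lists still type-check.
def threshold(image: ArrayLike, t: float) -> np.ndarray:
    return np.asarray(image) > t
```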
I think that @haesleinhuepf is talking about semantic typing: something that allows determining which parameter is an image, which one is a mask or a labeling, etc. This could make it simple to add support from a GUI (like napari) with proper semantic parameter handling. |
Not necessarily. But thanks for bringing this idea up! I like that idea. :-) |
If it cannot be done with type annotations, then maybe we could add some special field like … |
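`Annotated` (which this PR experiments with) can carry exactly that kind of semantic tag without a special field; the `'labels'` tag below is an assumption about what a GUI would look for:

```python
import numpy as np
from typing import Annotated  # from typing_extensions on python < 3.9

# The metadata rides along with the type: a GUI such as napari
# could read 'labels' from the signature, while the runtime type
# stays a bare ndarray.
LabelImage = Annotated[np.ndarray, 'labels']

def relabel(label_image: LabelImage) -> LabelImage:
    return np.asarray(label_image)
```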
Addresses (does not close) #3153.

Update: checklist of remaining items or things to discuss

- `Image` (… be the nD version?)
- `Literal` or not

I started playing a bit with mypy and started adding type annotations in the signature of a few functions in `skimage.segmentation` to see how it goes:

- `random_walker`
- `slic`
- `watershed`

This is very much WIP for various reasons. The `Annotated` object exists only for python 3.7; one would need to think how to replace it for py3.6.

Some small caveats: … `type: ignore` comments are needed for such imports.