add type annotations in some modules in skimage.segmentation by emmanuelle · Pull Request #4794 · scikit-image/scikit-image · GitHub

add type annotations in some modules in skimage.segmentation #4794


Draft
wants to merge 26 commits into base: main

Conversation

emmanuelle
Member
@emmanuelle emmanuelle commented Jun 18, 2020

Addresses (does not close) #3153.

Update: checklist of remaining items or things to discuss

  • validate the names of image types (should Image be the nD version?)
  • do we want to use Literal or not
  • what kind of tests can we add? Is it enough to have the mypy call in travis?
  • this is a proof of concept PR, but should I go ahead and type more and more functions? It's likely that we'll discover new things to discuss along the way; I'm just a bit afraid of the gigantic PR and reviewer fatigue.

I started playing a bit with mypy and started adding type annotations in the signature of a few functions in skimage.segmentation to see how it goes:

  • random_walker
  • slic
  • watershed

This is very much WIP for various reasons.

  • I vendored some numpy code which has just been merged into their master, but this should be explained in a clearer way
  • I expect the CI builds to fail at least for 3.6, because the Annotated object exists only from Python 3.7; one would need to think about how to replace it for py3.6
  • the types of array_like objects are a matter of debate
  • we should decide whether functions should return an annotated type or not: if we want to build pipelines using type annotations, output types must be compatible with the input types of other functions.

Some small caveats:

@pep8speaks
pep8speaks commented Jun 18, 2020

Hello @emmanuelle! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 270:14: E261 at least two spaces before inline comment

Line 9:1: E302 expected 2 blank lines, found 1

Line 104:5: E123 closing bracket does not match indentation of opening bracket's line
Line 104:5: E125 continuation line with same indent as next logical line

Line 10:80: E501 line too long (85 > 79 characters)
Line 105:5: E123 closing bracket does not match indentation of opening bracket's line
Line 105:5: E125 continuation line with same indent as next logical line
Line 277:5: E303 too many blank lines (2)

Line 15:9: E126 continuation line over-indented for hanging indent

Line 22:9: E704 multiple statements on one line (def)
Line 24:9: E704 multiple statements on one line (def)

Line 5:25: E127 continuation line over-indented for visual indent
Line 7:25: E127 continuation line over-indented for visual indent
Line 9:29: E127 continuation line over-indented for visual indent
Line 11:25: E127 continuation line over-indented for visual indent
Line 13:25: E127 continuation line over-indented for visual indent

Line 5:80: E501 line too long (122 > 79 characters)

Comment last updated at 2020-07-26 15:56:09 UTC

from ._array_like import ArrayLike
import numpy as np

from typing_extensions import Annotated
Member Author

I would expect this to crash with py3.6, but apparently it does not?

Member

(a) that's fascinating, and (b) who cares, apparently today is Py3.7 day! =D

Member Author

well we have py3.6 builds but they pass (not the ones building the docs but this is not an import problem)

Member

what I'm saying is we can drop those builds so this is not really worth worrying about. =) See: https://numpy.org/neps/nep-0029-deprecation_policy.html#support-table

Member Author

If we drop the builds we accept that de facto future versions will not support py3.6. Can't believe that Google Colab is still supporting only py3.6 but I just checked and it seems to be the case. So we can make this decision but it should be motivated by more than just this PR.

Member

It's motivated by following NEP29. I'm super not interested in supporting Google Colab or other proprietary technologies.

Member Author

I totally understand why you're not interested :-). Nevertheless I think a lot of people are using Colab for deep learning on images and scikit-image for pre/post-processing.

@emmanuelle emmanuelle mentioned this pull request Jun 18, 2020
@emmanuelle
Member Author

@scikit-image/core it's wip but comments are welcome to steer the future direction of this work. The CI is failing on the doc build (sphinx-gallery auto examples), I can try to solve this now but the code structure may evolve.

@@ -4,7 +4,7 @@
import numpy as np

from . import _marching_cubes_lewiner_luts as mcluts
from . import _marching_cubes_lewiner_cy
from . import _marching_cubes_lewiner_cy # type: ignore
Member

Why do Cython modules need to be ignored?

Member Author

See python/mypy#8575, mypy just does not read .pyx files.

Member Author

Also reading python/mypy#7542 now.

@@ -160,12 +160,12 @@ def __init__(self, slice, label, label_image, intensity_image,
self._cache = {}
self._ndim = label_image.ndim

@property
@property # type: ignore # type: ignore
Member

typo?

Also, same as above, why do @property decorators need to be ignored? It seems like a pretty major issue that MyPy would have had to have addressed by now, surely?

Member Author

See python/mypy#1362. People are still linking to this issue as recently as Dec 2019 so I'm not sure it's fixed. Let me take a closer look.

Member Author

and sure, this was a search-and-replace typo

Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
skimage2ndimage.update({x: x for x in funcs})
funcs_binary = ('binary_erosion', 'binary_dilation', 'binary_opening',
'binary_closing', 'black_tophat', 'white_tophat')
skimage2ndimage.update({x: x for x in funcs_binary})
Member

I love this change. =)

Member Author

Otherwise mypy was complaining that the length had changed...

Comment on lines 267 to 272
def random_walker(data: ImageArray, labels: LabelsArray, beta: float = 130,
mode: Literal['cg', 'cg_j', 'cg_mg', 'bf'] = 'cg_j',
tol: float = 1.e-3, copy: bool = True,
multichannel: bool = False, return_full_prob: bool = False,
spacing: Optional[Sequence] = None, *,
prob_tol: float = 1e-3) -> LabelsArray:
Member

Suggested change
def random_walker(data: ImageArray, labels: LabelsArray, beta: float = 130,
mode: Literal['cg', 'cg_j', 'cg_mg', 'bf'] = 'cg_j',
tol: float = 1.e-3, copy: bool = True,
multichannel: bool = False, return_full_prob: bool = False,
spacing: Optional[Sequence] = None, *,
prob_tol: float = 1e-3) -> LabelsArray:
def random_walker(
data: ImageArray,
labels: LabelsArray,
beta: float = 130,
mode: Literal['cg', 'cg_j', 'cg_mg', 'bf'] = 'cg_j',
tol: float = 1.e-3,
copy: bool = True,
multichannel: bool = False,
return_full_prob: bool = False,
spacing: Optional[Sequence] = None,
*,
prob_tol: float = 1e-3,
) -> LabelsArray:

Member

Does Literal mean that I can't do:

mode = get_mode_from_gui_button()
output = random_walker(image, labels, mode=mode)

?

Member Author

Would you type your get_mode_from_gui_button or not? If you type it the script above should pass mypy. But probably you see get_mode_from_gui_button as a more generic function, and then it would not be possible to type it with such a specific type as Literal['cg', 'cg_j', 'cg_mg', 'bf']?

Member Author

@jni could you please explain in more details what you meant here? Remember that type hints are not used at runtime but only by the type checker.

Member

@emmanuelle yes but that is perfectly valid code to run through the type checker! I meant with my code snippet to show an example in which mode= would be a variable, not always a literal string.

Also side note that function type annotations are available at runtime in the __annotations__ attribute of the function object!
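To illustrate that side note, a minimal sketch (the `threshold` function here is hypothetical, purely for demonstration):

```python
def threshold(value: float, cutoff: float = 0.5) -> bool:
    return value > cutoff

# Annotations are stored on the function object and can be read at runtime:
print(threshold.__annotations__)
# {'value': <class 'float'>, 'cutoff': <class 'float'>, 'return': <class 'bool'>}
```

`typing.get_type_hints()` can also be used to resolve string annotations to actual types.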

Member Author

So I thought that Literal was a cool thing in particular for UI generation (dropdown with the discrete values) but it is quite restrictive for type checking indeed. For example

from typing_extensions import Literal
from typing import Any


def my_func(x: Literal['a', 'b']) -> str:
    return x + x


x: Any = 'a'
my_func(x) # ok
y: Any = 'c'
my_func(y) # ok even if wrong value, but with type Any it passes
z: str = 'a'
my_func(z) # not ok because type is str and not Literal

Member

(dropdown with the discrete values)

Perhaps another good use for 'Annotated'? =)

Member Author

hum yes very smart! I must admit I don't completely understand how Annotated works (even after reading PEP 593). For example, in the following code it's not possible to write Annotated[str, 'a', 'b'] directly in the function signature, and I don't understand why. There are workarounds, like defining the annotated type outside of the signature, so it's not a big deal, but I'd prefer to understand this better...

from typing_extensions import Annotated

def modes():
    return ['a', 'b']

# error: Name 'a' is not defined
# error: Name 'b' is not defined
def my_func(x: Annotated[str, "a", "b"]) -> str:
    return x + x

# ok
def my_other_func(x: Annotated[str, modes()]) -> str:
    return x + x

# ok
modes_type = Annotated[str, 'a', 'b']
# ok
def yet_another_func(x: modes_type) -> str:
    return x + x


I think you are mixing up some concepts here. Literal is a type, so it can also be a type of a variable. It just means only specific values are allowed. In this case it would mean that get_mode_from_gui_button() needs to have a return type of Literal['cg', 'cg_j', 'cg_mg', 'bf'] (or possibly a subset of those values) to guarantee that only those values will be passed to random_walker

I think the most intuitive explanation I can come up with is that Literals are like Enums, but without the members having a name or the enum itself having a name.

In the case of Literal['cg', 'cg_j', 'cg_mg', 'bf'] = 'cg_j' an Enum does seem like the more appropriate choice though. Which has the advantage that the meaning of the values can be defined and documented in one place, and every place it is used in can then also be found (by simply just looking for code that imports this enum)

The Annotated feature is something I haven't used myself yet. I think the intention behind it is third-party extensions for complex type checking, like dependent types (a feature that is essential to properly type numeric code, like anything that uses numpy, but that is a different topic). My intuition is that it is most likely not the solution to whatever problem you are trying to solve, but I don't fully understand the problem itself yet.
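The Enum alternative suggested above could look like the following sketch (the `Mode` enum and `random_walker_sketch` names are hypothetical, not part of the PR):

```python
from enum import Enum

# Hypothetical enum replacing Literal['cg', 'cg_j', 'cg_mg', 'bf']:
class Mode(Enum):
    CG = 'cg'
    CG_J = 'cg_j'
    CG_MG = 'cg_mg'
    BF = 'bf'

def random_walker_sketch(mode: Mode = Mode.CG_J) -> str:
    # The meaning of each value is documented in one place, and every
    # usage site can be found by looking for code that imports Mode.
    return mode.value
```

A GUI helper would then simply be annotated as returning `Mode`, and the checker connects the two ends.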

Member

@fmagin

I don't fully understand the problem itself yet

I'm glad you asked! 😂 We have two issues:

  • NumPy arrays do double and triple duty in the library: they can represent an image (which can be 2D, 3D, nD, 2D+channels, ...), a set of coordinates, a set of spatial extents... So we need a way to distinguish between all these types, while still allowing bare NumPy arrays through no problem. Here NewType is indeed what I thought was the right solution, although it is not the right solution e.g. for distinguishing between multichannel and single channel arrays.
  • The other issue we have is that for many functions (and we hope to continue to increase how many), not just NumPy arrays are allowed, but actually any array-like, if they implement __array__ and maybe NEP-18 (__array_function__). In this case, NewType falls over because we actually need to use a protocol, and that protocol is defined by NumPy, and you can't use NewType with a protocol.
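The two issues above could be sketched roughly as follows (assuming Python 3.8+ for `typing.Protocol`; at the time of this PR it lived in `typing_extensions`; all names here are illustrative, not the PR's actual types):

```python
from typing import NewType, Protocol
import numpy as np

# (1) NewType to distinguish semantic roles of plain NumPy arrays:
Image = NewType('Image', np.ndarray)
Coordinates = NewType('Coordinates', np.ndarray)

# (2) a structural (protocol) type for anything implementing __array__,
# covering array-likes that are not NumPy arrays:
class SupportsArray(Protocol):
    def __array__(self) -> np.ndarray: ...

def as_image(a: SupportsArray) -> Image:
    # convert any array-like, then tag the result as an Image
    return Image(np.asarray(a))
```

As noted, the two mechanisms don't compose: `NewType` needs a concrete class as its base, so it cannot wrap the protocol.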

if not isinstance(sigma, coll.Iterable):
sigma = np.array([sigma, sigma, sigma], dtype=dtype)
sigma /= spacing.astype(dtype)
sigma /= spacing.astype(dtype) # type: ignore
Member Author

I could not get this to pass without the ignore. Basically, spacing initially has a sequence type, so this line would fail because a sequence has no astype method. I tried to use the cast function to cast spacing to np.ndarray but it did not work (much to my confusion).

Member

So, actually I think this is a case of mypy uncovering a potential bug in our code. If spacing is a sequence other than a tuple or list, we will get to here and have an error, because we only convert spacing (line 266 above) if it's a list or a tuple. So I in fact would suggest changing that elif to an else, and then maybe this will pass? 🤞

Member Author

I tried but it does not work. Specifically,

    if spacing is None:
        spacing = np.ones(3, dtype=dtype)
    else:
        spacing = np.ascontiguousarray(spacing, dtype=dtype)

    spacing = cast(np.ndarray, spacing)
    reveal_type(spacing)
    if not isinstance(sigma, coll.Iterable):
        sigma = np.array([sigma, sigma, sigma], dtype=dtype)
        sigma /= spacing.astype(dtype)

and when running mypy:

$ mypy --ignore-missing-imports skimage/segmentation/
skimage/segmentation/slic_superpixels.py:270: note: Revealed type is 'Union[typing.Sequence[Any], None]'
skimage/segmentation/slic_superpixels.py:273: error: Item "Sequence[Any]" of "Optional[Sequence[Any]]" has no attribute "astype"
skimage/segmentation/slic_superpixels.py:273: error: Item "None" of "Optional[Sequence[Any]]" has no attribute "astype"
Found 2 errors in 1 file (checked 26 source files)

It's as if the cast was completely ignored... Any expert of mypy around here? :-p @kne42 @chrisconlan

Member

Thanks @emmanuelle! Well, the change should go in anyway. =)

Member Author

done


I think the issue might be that the spacing variable is never declared as np.ndarray. The error that mypy seems to ignore for some reason is that spacing = cast(np.ndarray, spacing) is already a type error because the right hand side is an expression of type np.ndarray but the left hand side is a variable of type Optional[Sequence].

I think the proper solution is to split the spacing variable into two, one with type Optional[Sequence] the other of type np.ndarray.
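The "two variables" fix could look like this sketch (`normalize_spacing` is a hypothetical standalone function, not the actual slic code):

```python
from typing import Optional, Sequence
import numpy as np

def normalize_spacing(spacing: Optional[Sequence[float]] = None) -> np.ndarray:
    # Assign the converted value to a NEW name with the narrower type,
    # instead of reassigning the Optional[Sequence] argument:
    if spacing is None:
        spacing_arr = np.ones(3, dtype=np.float64)
    else:
        spacing_arr = np.ascontiguousarray(spacing, dtype=np.float64)
    # mypy can now infer that spacing_arr is an ndarray, so .astype is valid:
    return spacing_arr.astype(np.float64)
```

The original argument keeps its `Optional[Sequence]` type throughout, so no cast is needed.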

Comment on lines 6 to 10
ImageArray = Annotated[np.ndarray, 'image',
'a 2D or 3D single or multichannel image']
Image2dArray = Annotated[np.ndarray, 'image_2d']
ImagendArray = Annotated[np.ndarray, 'image_nd',
'nd image']
Member

I'd strongly prefer for Image to be nD, and Image2D and Image3D to be special cases.

Member

👍

Member

Also, Array may be unnecessary.

Member

Most often we want to specify either "N-d" or "2/3D".

Member Author

So we have

  • N-d
  • 2/3D
  • 2D only (for example segmentation.active_contour or segmentation.chan_vese)

Should we pick Image3D for 2/3D, or will it be confusing to people? Looking for a good name here.

Member

We also have multichannel vs spatial to consider.

Maybe Image2D3D? Your suggestion of Image3D for both can also work.

Member

Looking at the above again, having Image3D double for both would probably be fine?


Are you aware of the NewType feature? On first glance it looks like this is what you actually want to use. Annotated is ignored by mypy itself and supposed to be used for third party extensions to mypy. NewType ensures that only something that is specifically typed as Image is passed and not just any ndarray.
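A minimal sketch of what NewType buys here (illustrative names; at runtime the wrapper is a no-op identity function):

```python
from typing import NewType
import numpy as np

Image = NewType('Image', np.ndarray)

def invert(img: Image) -> Image:
    return Image(np.asarray(1.0 - img))

raw = np.zeros((4, 4))
ok = invert(Image(raw))  # accepted: explicitly tagged as Image
bad = invert(raw)        # mypy error: ndarray is not Image (runs fine, though)
```

Crucially, `Image` values are still plain ndarrays at runtime, so nothing changes for users who don't run a type checker.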

emmanuelle and others added 4 commits June 27, 2020 16:59
Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
Co-authored-by: Juan Nunez-Iglesias <juan.nunez-iglesias@monash.edu>
@emmanuelle
Member Author
emmanuelle commented Jun 28, 2020

Note that until numpy 1.20 is released in 5-6 months, the type np.ndarray will be considered as Any and hence static type validation will not be very useful. Maybe we could add a CI build running mypy and building skimage with numpy master. For example, running mypy on the code below will pass with numpy 1.19 but fail with numpy master (as it should)

import numpy as np
from typing_extensions import Annotated

Image = Annotated[np.ndarray, 'image']


def add_images(img1: Image, img2: Image) -> Image:
    return img1 + img2
    
    
im1: np.ndarray = np.random.random((10, 10))
im2: np.ndarray = np.random.random((10, 10))
res = add_images(3, im2)

@emmanuelle
Member Author

With such types, introspecting types at runtime (for example for profiling or UI generation) would look like

from skimage.typing import Image, Image2D, Image3D, Labels, Mask
from skimage import segmentation
import inspect
from typing_extensions import _AnnotatedAlias
from typing import _GenericAlias, Optional

image_types = [Image, Image2D, Image3D, Labels, Mask]

spec = inspect.getfullargspec(segmentation.watershed)
print(spec)
for arg in spec.args:
    arg_type = spec.annotations[arg]
    if type(arg_type) == _AnnotatedAlias:
        for im_type in image_types:
            if arg_type == im_type:
                print(arg, ":", im_type)
    elif type(arg_type) == _GenericAlias:
        for im_type in image_types:
            if arg_type == Optional[im_type]:
                print(arg, ":", im_type)
    else:
        print(arg, ":", arg_type)

Which results in

FullArgSpec(args=['image', 'markers', 'connectivity', 'offset', 'mask', 'compactness', 'watershed_line'], varargs=None, varkw=None, defaults=(None, 1, None, None, 0, False), kwonlyargs=[], kwonlydefaults=None, annotations={'return': typing_extensions.Annotated[numpy.ndarray, 'labels', 'array of integers representing labels'], 'image': typing_extensions.Annotated[numpy.ndarray, 'image_nd', 'nd image'], 'markers': typing.Union[typing_extensions.Annotated[numpy.ndarray, 'labels', 'array of integers representing labels'], NoneType], 'connectivity': typing.Union[int, skimage.typing._array_like._SupportsArray], 'offset': typing.Union[numpy.ndarray, NoneType], 'mask': typing.Union[typing_extensions.Annotated[numpy.ndarray, 'mask', 'array of bools to be used as mask'], NoneType], 'compactness': <class 'float'>, 'watershed_line': <class 'bool'>})
image : typing_extensions.Annotated[numpy.ndarray, 'image_nd', 'nd image']
markers : typing_extensions.Annotated[numpy.ndarray, 'labels', 'array of integers representing labels']
mask : typing_extensions.Annotated[numpy.ndarray, 'mask', 'array of bools to be used as mask']
compactness : <class 'float'>
watershed_line : <class 'bool'>

@emmanuelle
Member Author

CI seems to be passing yeah! I added a checklist at the top of the PR with a few items to discuss.

@lagru
Member
lagru commented Jul 15, 2020

but I'd be interested in having a variety of opinions here since we do have quite a variety of users. -- @emmanuelle

After following this discussion I find the arguments in favor of type annotations in signatures slightly more convincing. Although, I really don't like two points that were already raised about signature annotations:

  • It's another thing new contributors have to learn; I feel like that list is ever growing. What do you think about making type annotations optional for new contributors? Maintainers could do it themselves by pushing to the PR (the suggest feature might come in really handy), at least until it's more established in the ecosystem, which might increase the likelihood that contributors are more familiar.
  • I hate the look right now; this might be something that is alleviated over time with familiarity. Better syntax highlighting could be a huge help here to place more emphasis on the stuff that's actually code and not "just annotation" / differentiate parameter names, types and default values. Unfortunately annotations lie right in the middle of names and defaults and many editors are not there yet.

@lagru lagru added 🤔 Planning 💬 Discussion 📄 type: Documentation Updates, fixes and additions to documentation labels Jul 15, 2020
Comment on lines +10 to +12
image_dim: int,
connectivity: Optional[Union[int, _SupportsArray]],
offset: Optional[Sequence[int]],
Member

Suggested change
image_dim: int,
connectivity: Optional[Union[int, _SupportsArray]],
offset: Optional[Sequence[int]],
image_dim: int,
connectivity: Optional[Union[int, _SupportsArray]],
offset: Optional[Sequence[int]],

I'd be interested in your reasoning about adding two indent levels and deviating from the "norm" here? 😉

Member

@lagru "the norm" is a choice that black made that I dislike. PEP8 encourages two levels to distinguish the arguments from the body of the function: https://www.python.org/dev/peps/pep-0008/#indentation

@stefanv
Member
stefanv commented Jul 15, 2020

@fmagin I am so appreciative of the time you took to review the issue 🙏 I agree that NewType may be sufficient for our current use-case.

@lagru I hear you on the ever-growing list of things people need to know to contribute. Fortunately, since this is a standard language feature, I think many people will soon be familiar with it. Also, since typing can be done gradually, there is no pressure to immediately annotate all functions---we can indeed help out new developers with that process (hey, a week ago I had never even written a type annotation myself ;).

W.r.t. how these look: I agree with @jni that formatting/highlighting makes a big difference.

def foo(image: Image, mapping: Mapping[Text, Any] = ...) -> Sequence[Text]:

vs

def foo(
    image: Image,
    mapping: Mapping[Text, Any] = ...
) -> Sequence[Text]:

@fmagin
fmagin commented Jul 15, 2020

@lagru

making type annotations optional for new contributors

Could you give a more concrete example of what you mean? I don't know the form of typical new contributions here, but if they are fixing existing bugs that should not involve types much.
Mypy also has a ton of options, so you could e.g. just enforce type annotations on function signatures and object attributes, but allow variables to have the implicit Any type. Or you could only enforce that if a type annotation is present it must be correct, while leaving the annotation out is just a linter warning and not a hard CI failure.
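A gradual policy along those lines could be expressed in a mypy config roughly like this (a sketch only; the exact flag set would need to be decided by the project):

```ini
; mypy.ini / setup.cfg sketch
[mypy]
ignore_missing_imports = True
; require annotations on function signatures...
disallow_untyped_defs = True
disallow_incomplete_defs = True
; ...while unannotated local variables stay allowed (mypy's default);
; check_untyped_defs could be enabled later to tighten further.
```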

I think if someone contributes a new feature at least type signature of the functions should be present before it is merged, otherwise the implicit Any return will cause issues in the rest of the code. Not necessarily an overt error, but mypy silently accepting Any and propagating it.
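The silent Any propagation mentioned above is easy to demonstrate (hypothetical function names):

```python
def untyped_helper():  # no annotation: mypy infers the return type as Any
    return "actually a string"

def needs_int(x: int) -> int:
    return x * 2

# mypy silently accepts needs_int(untyped_helper()), because Any is
# compatible with every type; the mistake only surfaces at runtime.
value = untyped_helper()
```

With `--disallow-untyped-defs`, mypy would instead flag `untyped_helper` itself as missing annotations.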

I hate the look right now; this might be something that is alleviated over time with familiarity. Better syntax highlighting could be a huge help here to place more emphasis on the stuff that's actually code and not "just annotation" / differentiate parameter names, types and default values. Unfortunately annotations lie right in the middle of names and defaults and many editors are not there yet.

This just prompted me to find out how to do this in PyCharm, because I was curious myself:
You can simply go to Settings->Editor->Python and scroll down to "Type Annotations". Now you can change the styling of it, e.g. make it italic and a lower contrast to whatever your themes background color is.

My default looks like this (Solarized Light):
[screenshot]

but it is easily changeable to this (the color of a comment/docstring):
[screenshot]

or even marked more strongly as something that can be skipped (like the annotations for implicit parameter names that IntelliJ shows for some languages):
[screenshot]

So this is actually quite easy to do in most IDEs. I am actually considering this now, because in PyCharm most of the benefit for me is the tab completion and the added type info in the doc (accessed with Ctrl+Q when hovering over an identifier) anyway.

@fmagin fmagin left a comment

Got around to actually reading the code changes and looking at the context. There are basically two kinds of problems/improvements I see:

1. Lack of NewType usages.

Both as a replacement for Annotated and also in other places where it is quite useful.

2. A common Python pattern that is actually incompatible with static typing, but easy to change.

Namely that a function argument is of type A (like Sequence) and is then converted to type B (like np.ndarray) on its first use, with the result reassigned to the function argument. This is technically a type error because you can't assign a value of type B to a variable of type A.
Because mypy treats external library functions without type annotations as returning Any, the functions for creating an object of type np.ndarray (like np.array, np.ones, np.ascontiguousarray) are treated as returning Any. Without specific mypy flags, a value of type Any is valid in all places (even those with a more specific type) instead of being valid in no places (except those explicitly declared as Any).

The solution is simply to assign the function argument of Type A to a new variable of Type B after converting it to Type B.

@@ -267,12 +267,12 @@ def __getattr__(self, attr):
f"'{type(self)}' object has no attribute '{attr}'"
)

@property
@property # type: ignore

Why is the type ignored here? In this example np.sum seems annoying to properly type-annotate for all cases, but that is the problem of the numpy project. In the specific case of a property that just wraps np.sum(self.image), I would expect the return type to be simply float or np.ndarray, or at least Union[float, np.ndarray], which is already a lot more information than type: ignore. With type: ignore I had to look up the type signature of np.sum myself now to even know that this can actually return an array and not just a float.

The same idea probably applies to the other type: ignore annotations in the rest of the file.

Member Author

See python/mypy#1362, which has a lot of people complaining that mypy does not support the property decorator :-).

@fmagin fmagin Jul 15, 2020

You had me confused there for a minute and questioning my memories ;)
I have to admit though that this is a lot more complicated than I assumed when I wrote the comment:
The linked issue is about a subtly more specific case: mypy doesn't support decorated properties but it supports the property decorator. So this specific instance of # type: ignore might actually be needed. But I am not even sure if this issue applies for just "properties that are decorated further" or also for "some function that is decorated and then wrapped with a property".
At least some of the cases further down only use @property though, and I am fairly sure that it should work in that case by simply annotating the return type of the decorated method.
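A minimal sketch of that last point, with hypothetical names: for a plain @property it should be enough to annotate the wrapped method's return type, with no type: ignore required.

```python
import numpy as np


class Region:
    def __init__(self, image: np.ndarray) -> None:
        self.image = image

    @property
    def total(self) -> float:
        # With an undecorated @property, annotating the wrapped method's
        # return type is sufficient for mypy to infer the attribute type.
        return float(np.sum(self.image))
```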

Member

@fmagin we flagged this above in this comment, which admittedly GitHub makes difficult to see, and the reason is found here:

python/mypy#1362

image_dim: int,
connectivity: Optional[Union[int, _SupportsArray]],
offset: Optional[Sequence[int]],
) -> Tuple[np.ndarray, np.ndarray]:

This might also be something that can be improved even further by using NewType. The docstring states that one of the returned arrays is a "structuring element" in the form of an array of bool, and the other is an offset as an array of int.
With NewType this can be made explicit, and the type can then also be used in the signature of e.g. _offsets_to_raveled_neighbors (which is where the return value of this function ends up in one case).
Its docstring for the parameter states:

    selem : ndarray
        A structuring element determining the neighborhood expressed as an
        n-D array of 1's and 0's.

If selem is annotated with the same NewType that is returned here, this is another example of documentation that can be formalized in the type system for both human and algorithmic use. Unless an n-D array of 1's and 0's is actually different from an array of bool, which would be even more important to emphasize.
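A sketch of that suggestion with hypothetical names (the real scikit-image helpers differ, and note the NewType limitations with unstubbed numpy discussed further down in this thread):

```python
from typing import NewType, Tuple

import numpy as np

# Hypothetical names, for illustration only.
StructuringElement = NewType('StructuringElement', np.ndarray)
Offset = NewType('Offset', np.ndarray)


def full_connectivity(image_dim: int) -> Tuple[StructuringElement, Offset]:
    # NewType is an identity function at runtime; it only adds meaning
    # for the type checker and for readers of the signature.
    selem = StructuringElement(np.ones((3,) * image_dim, dtype=bool))
    offset = Offset(np.array([1] * image_dim))
    return selem, offset
```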

Member

This is a good example for the question I had above: how do we maintain parallel hierarchies of (1) NumPy arrays that are NumPy arrays but also a specific type of NumPy arrays, and (2) array-likes that might actually be dask arrays or OpenCL arrays or whatever.

if not isinstance(sigma, coll.Iterable):
sigma = np.array([sigma, sigma, sigma], dtype=dtype)
sigma /= spacing.astype(dtype)
sigma /= spacing.astype(dtype) # type: ignore

I think the issue might be that the spacing variable is never declared as np.ndarray. The error that mypy seems to ignore for some reason is that spacing = cast(np.ndarray, spacing) is already a type error because the right hand side is an expression of type np.ndarray but the left hand side is a variable of type Optional[Sequence].

I think the proper solution is to split the spacing variable into two, one with type Optional[Sequence] the other of type np.ndarray.
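A sketch of that split, with hypothetical names loosely modelled on the code above:

```python
from typing import Optional, Sequence

import numpy as np


def prepare_sigma(sigma: float, spacing: Optional[Sequence[float]]) -> np.ndarray:
    # `spacing` keeps its declared Optional[Sequence[float]] type; the
    # converted array lives in a second, ndarray-typed variable.
    spacing_arr: np.ndarray = (
        np.ones(3) if spacing is None else np.asarray(spacing, dtype=float)
    )
    sigma_arr: np.ndarray = np.array([sigma, sigma, sigma], dtype=float)
    sigma_arr /= spacing_arr
    return sigma_arr
```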

if not isinstance(sigma, Iterable):
sigma = np.array([sigma, sigma, sigma], dtype=dtype)
sigma /= spacing.astype(dtype)
sigma /= spacing.astype(dtype) # type: ignore
elif isinstance(sigma, (list, tuple)):
sigma = np.array(sigma, dtype=dtype)

Same problem as above: sigma is initially declared as the type Union[float, Sequence[float]], but here it is assigned the result of np.array. The type checker continues to treat sigma as its declared type, and because np.array is a function from an external untyped library and not a constructor, mypy just treats it as returning Any, which isn't considered an error by default. So it assumes that this is correct and that sigma is still guaranteed to be of type Union[float, Sequence[float]], leading to type issues further down in the code.

Member

Awesome, thank you for clarifying this for us! 🎉

@emmanuelle
Member Author

@lagru thanks for your input. I should have added that there is a third option, "do not add type annotations", which we need to consider if we decide that the cost for users is too big (even if it should have benefits for developers). I really like @zpincus's idea that IDEs could have an option for whether to display type annotations. This is probably something that should be discussed on Twitter to reach devs of different IDEs. Also 👍 for having maintainers help along with type annotations: when a PR proposes a new function for the public API, adding type annotations is a small task for maintainers compared to the overall review work.

One thing we could do is quickly release a new version of scikit-image with just one subpackage typed and see how people react. If we don't type the whole package, it's clear that it's still experimental.

@emmanuelle
Member Author

Thanks @fmagin all your comments are very valuable. I tried using NewType with

Image = NewType('Image', np.ndarray)

but then I got the following error when running mypy:

Argument 2 to NewType(...) must be subclassable (got "Any")

At the moment it is not possible to make a NewType from Any (although this has been discussed in python/mypy#6701), and as you pointed out numpy objects are considered Any in the absence of type annotations. This problem should go away with the next version of numpy, which should be typed (but that will be in about 5 months).

I also see the following caveat with NewType: if third-party libraries want to use a scikit-image function typed with an Image NewType, they can't just pass a numpy array if they want to check their types with mypy, since mypy requires explicit casting to the NewType from the origin type. What I liked with Annotated is that you just pass along some indications, but it's not more restrictive than the original np.ndarray object (if I understand correctly; I need to read again what you wrote about Annotated).
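A minimal sketch of the Annotated variant (the "image" metadata string is purely illustrative, and the fallback alias is an assumption for Pythons without typing.Annotated):

```python
import sys

import numpy as np

if sys.version_info >= (3, 9):
    from typing import Annotated

    # Metadata rides along with the type but does not restrict it:
    # any np.ndarray still type-checks where Image is expected.
    Image = Annotated[np.ndarray, "image"]
else:
    Image = np.ndarray  # plain alias fallback on older Pythons


def passthrough(image: Image) -> Image:
    return image
```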

@emmanuelle
Member Author

@fmagin if you don't mind, I'd have a couple of questions to submit to your expertise on type annotations.

  • is it possible to define a type accepting only numpy arrays of floats, or numpy arrays of dimension 2?
  • is it possible to define a type encompassing numpy arrays and other types of duck arrays (cupy arrays etc.)? I tried using the _SupportsArray protocol defined in numpy (which we vendor here before numpy is released), but when an object x is declared as _SupportsArray and x.astype or x.ndim is called, mypy raises an error because the _SupportsArray protocol only guarantees an __array__ method. Is there any way to work around this? You mentioned that mypy has a lot of options; is it possible, for example, to say that we allow any additional methods on _SupportsArray?

@fmagin
fmagin commented Jul 15, 2020

@emmanuelle

At the moment it is not possible to make a NewType from Any (although this has been discussed python/mypy#6701)

That is a problem I wasn't aware of yet, thanks. I assumed it would at least work for directly importing some class. A possible workaround is using a type alias instead, i.e. simply declaring it as Image = np.ndarray and then importing Image everywhere it is needed. As soon as numpy has stubs only this line needs to be changed to Image = NewType('Image', np.ndarray) again and you instantly have all the benefits.
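A minimal sketch of that alias workaround (invert_sketch is a hypothetical example function, not the scikit-image implementation):

```python
import numpy as np

# Plain type alias: to the type checker, Image IS np.ndarray, so any array
# is accepted everywhere an Image is expected; the name only documents intent.
Image = np.ndarray


def invert_sketch(image: Image) -> Image:
    # Illustrative only; the real skimage.util.invert handles more dtypes.
    return 1.0 - image
```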

I also see the following caveat with NewType: if third-party libraries want to use a scikit-image function using an Image NewType, they can't just pass a numpy array if they want to check their types with mypy, since mypy requires explicit casting to the NewType from the origin type.

I consider this a feature of NewType. Is it typical that third party libraries just create numpy arrays that are images? Or would they use scikit-image provided functions like converting a filename or pillow Image object to a numpy array?

is it possible to define a type accepting only numpy arrays of floats

Not without numpy supporting it, but I'd expect them to support it. It should be achievable with generics, like annotating a builtin Python list as List[float].

is it possible to define a type accepting only [...] numpy arrays of dimension 2

That is somewhat complicated. Depending on what you want exactly this might be either possible with Literal Types and Generics, or is a feature that would make type checking an undecidable problem. I am not familiar enough with the type theory details here and uses of numpy you are imagining. Either way that is something that numpy has to decide and support, not much can be done if you are just using numpy.

I vaguely remember that one of the use cases of Annotated is to provide the metadata for this so possible future mypy extensions could handle it. https://www.python.org/dev/peps/pep-0586/#true-dependent-types-integer-generics mentions this.

is it possible to define a type encompassing numpy arrays and other types of duck arrays (cupy arrays etc.)? I tried using the _SupportsArray protocol defined in numpy (which we vendor here before numpy is released),

I have no idea, I didn't need protocols so far. My intuition is that it is possible, but I can't help you with the specifics.

@mkcor
Member
mkcor commented Jul 15, 2020

Just to add my two cents, I like inline annotations (type hints in function signatures) provided it's done with proper/conventional formatting/indentation. My viewpoint is the same as @zpincus's.

@stefanv
Member
stefanv commented Jul 15, 2020

I consider this a feature of NewType. Is it typical that third party libraries just create numpy arrays that are images? Or would they use scikit-image provided functions like converting a filename or pillow Image object to a numpy array?

Yes, I think pretty typical: people sometimes produce random test arrays (np.random.random((50, 50))). The whole point of using arrays for images is that we want any array to be passable to skimage routines, and it should just work.

This is one of the things I'm wondering about. If an input argument is, say, a dict, then usually you can just write: my_func(config={'a': 'some_val'}). If you now say:

def my_func(config: Config) ...

where Config = NewType('Config', dict), are you still able to invoke it that way, or do you first have to define the input dictionary to be of type Config? If so, that would be annoying to the user.

If not, please disregard!
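For what it's worth, a minimal runnable sketch of exactly that situation (under NewType, mypy does require the wrap, which is a no-op at runtime):

```python
from typing import NewType

Config = NewType('Config', dict)


def my_func(config: Config) -> str:
    return config['a']


# mypy flags my_func({'a': 'some_val'}) as passing a dict where Config is
# expected; the caller has to wrap the literal (an identity call at runtime):
result = my_func(Config({'a': 'some_val'}))
```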

@fmagin
fmagin commented Jul 16, 2020

I consider this a feature of NewType. Is it typical that third party libraries just create numpy arrays that are images? Or would they use scikit-image provided functions like converting a filename or pillow Image object to a numpy array?

Yes, I think pretty typical: people sometimes produce random test arrays (np.random.random((50, 50))). The whole point of using arrays for images is that we want any array to be passable to skimage routines, and it should just work.

But do they do that regularly in code that also should pass a static type checker? That is the thing that somewhat confuses me.
The only required thing would be to simply import the Image type and write it as Image(np.random.random((50, 50))), if the Image type is in an importable file (i.e. not in a separate stub file). IMO that seems perfectly reasonable because the kind of code that is expected to pass a static type check would also want to make it clear that a random 2D array is intended to be treated as an Image. And compared to characters required to create the array in the first place it seems reasonable to also require the cast or conversion to Image at the point where it should be treated as an image.

This is one of the things I'm wondering about. If an input argument is, say, a dict, then usually you can just write: my_func(config={'a': 'some_val'}). If you now say:

def my_func(config: Config) ...

where Config = NewType('Config', dict), are you still able to invoke it that way, or do you first have to define the input dictionary to be of type Config? If so, that would be annoying to the user.

If not, please disregard!

I think TypedDict is appropriate for this, but this doesn't address your point.

Both of these points are similar in a way: as far as I understand, they only apply if the user has actively decided that their code must pass their mypy configuration check. The runtime doesn't care. PyCharm just treats those type errors as a warning by default, and this is also configurable: I have it turned up to a full error, but you could also reduce it to the level of a linting issue. The third party library doesn't have to pass the scikit-image mypy checks. I think they could even just disable type checks for certain scikit-image functions or the module if they want to pass their own mypy checks but don't want to use the types that a library they use requires. After the first call to any API typed as using Image, I would expect that any new array returned is typed as Image anyway, so this still applies only rarely.

My personal experience is that I annotated a library I was using and strongly argued for using NewType when the maintainers didn't want to include it at first, because I wanted my code to be properly typed and this was only really possible with NewType in that case. Additionally, my experience with even a small (maybe 200 LoC) script that used numpy was that I extensively used NewType to keep track of what the actual meaning of an array was. I can't really imagine the mindset of someone deciding their code must pass mypy and have strict typing, who is also annoyed by a library they are using that is using strict typing. But I have to admit that I usually work on codebases in the context of program analysis, so people there, including myself, are a lot more likely to have a deeper interest in type theory and programming language theory than a typical user of the scientific Python community, so my experience might not be representative of typical scikit-image users.

@zpincus
Contributor
zpincus commented Jul 16, 2020

Sorry I'm late to the party on some of this -- and note that I don't have a lot of skin in the game at this point, so I hope this isn't just bikeshedding, but @jni makes a good point:

This is a good example for the question I had above: how do we maintain parallel hierarchies of (1) NumPy arrays that are NumPy arrays but also a specific type of NumPy arrays, and (2) array-likes that might actually be dask arrays or OpenCL arrays or whatever.

The fundamental problem here is that duck-typing on array-like objects is a fantastic feature. Moreover, this goes counter to an idea that crops up earlier in the comments: that it's useful to have to specify when a specific array is "actually" an image. This seems quite wrongheaded to me. It's almost a category error to think that there exist arrays that are definitively "not images". If I have a dataset that I want to apply a skimage routine to (be it 1-d, 2-d, or 20-d), the data are certainly image enough (provided the skimage function can handle the dimensionality).

But duck-typing on something as diverse as "anything that presents an array-like interface" is clearly very hard to square with type annotations, absent either runtime functions to cast things to Image or related types (and maybe do some 2- / 3- / n-d validation), or a big whitelist of "acceptable" duck-types. Hence a big chunk of the discussion above.

So, how much of a problem is it, really, if skimage functions just always accept Any for image inputs? It weakens the type-checker a bit, but I would argue not fatally. I get that annotations are optional, but having to remember to re-cast my numpy array back to Image every time I do some pure-numpy operation to the image also removes a lot of the utility of the type checker... I'd say that this might be a case where perfect is the enemy of the good.

(Those who have been around since the early days of numpy will remember all the trouble that trying to use specialized subclasses of numpy arrays caused when doing numpy operations -- i.e. under what circumstances do you get the subclass back vs. a pure ndarray? It just turned into whack-a-mole after a while. And I wonder if this isn't going to wind up a lot like that).

And importantly, note that any validation function that takes Any and then gives back an Image if the input is properly 2D or whatever still turns a type-check problem into a runtime error. So there could be very little win over just accepting Any for images to begin with and doing run-time validation that the duck quacks right.

@jni
Member
jni commented Jul 16, 2020

@zpincus (👋)

It's almost a category error to think that there exist arrays that are definitively "not images".

😂 love this.

For simplicity below I'll talk about NumPy arrays only, but this applies to duck arrays.

My idea with this type hierarchy is that a bare, unannotated NumPy array will always validate as an Image, because an Image is a NumPy array. What won't validate is something like a Coords, returned by peak_local_max, for example.

That validation is useful, but the annotation is even more useful for packages like napari which can, in the not too distant future I hope, use the annotation to determine whether to add an Image layer or a Points layer or a Vectors layer.

Ultimately, types are ignored at runtime, and I think if you want to get creative with your array library mixing, you can take a bit of flak from the type checker. =P But, absolutely, "raw" arrays should 100% validate as "image" arrays. I would certainly not entertain a system that didn't pass this test.

@fmagin
fmagin commented Jul 16, 2020

@zpincus @jni

Thanks for that input and summary! As I said, my experience with numpy is limited to basically a 200 LoC implementation of a perceptron, so I never really used numpy much. There is a good chance that this will change in the next years, so I guess on some level I am just invested in making sure that by that time there is proper typing ;)

The duck type argument is quite convincing and I now understand how NewType is fairly annoying in that case and don't think it is a good choice anymore. Thank you for your patience in explaining this!

I would still expect that there is value (especially for new contributors) in some way to document the intended use of a numpy array though. I think the better solution is then various type aliases. They don't change the type checking (it's like declaring a variable zero with the value 0 and using that everywhere instead). The type checker will treat the alias identically to np.ndarray because it really is just the same as assigning the variable Image the value np.ndarray. What this allows is to still encode some intended use of an array.

In the example from earlier #4794 (comment), this would still allow someone unfamiliar with the code (like me in that case) who encounters a function returning something described as a "structuring element" to use the IDE navigation to find every other function that expects or returns something described as a "structuring element". I think of it as similar to naming constants, if that makes sense: if you are familiar with the code, (1 + 5 ** 0.5) / 2 or 1.618033988749895 might just be obviously the same to you, but as someone unfamiliar I would have appreciated it if you had used scipy.constants.golden instead. Then I could easily find all the other code using it by finding usages of that constant.

The type alias can also be whatever the type for "quacks like an array" ends up being, and has the advantage that it can be defined in one place; changing it from Image = np.ndarray to Image = quacks_like_array_type is only a one-line change. Multiple aliases also don't conflict, so it's still legal to pass something of type Image to something expecting StructuringElement, because for the type checker they are simply the same type. You are really just passing the same thing instead.

Annotated would also achieve this, but if the extra metadata isn't read you are effectively just using Type Aliases anyway.

That validation is useful, but the annotation is even more useful for packages like napari which can, in the not too distant future I hope, use the annotation to determine whether to add an Image layer or a Points layer or a Vectors layer.

I think that will actually not work with Type Aliases. That could be good use for Annotated then, but I am not quite sure. But I don't understand either the intended use case for Annotated nor this use case enough. So I think I'll just encourage you to try it out and see if this will work out.

@haesleinhuepf
Contributor

Hi all,

I'm very excited about adding type annotations to scikit-image functions. If one of you can guide me, and e.g. provide instructions like "all image input parameters should have type image:X and all output parameters should be output:Y=None and tests should pass :D", I'd be very happy to contribute and/or pay a student to volunteer ;-)

Let me know if/how I can help! And no hurry though.

Cheers,
Robert

@grlee77
Contributor
grlee77 commented Jul 22, 2021

Thanks @haesleinhuepf , I think it would be great if you want to take this up again! I am not very experienced with typing, and haven't read through this issue recently, so don't have much specific advice. Maybe @stefanv, @jni or @emmanuelle are more knowledgable on static typing and could provide guidance on the best path forward?

I know SciPy has started adding typing recently (see scipy.spatial for example) and conditionally imports numpy.typing which was introduced in NumPy 1.20 for use of their ArrayLike type.

Perhaps something to consider for 1.0 would be making the accepted types in scikit-image functions a bit stricter, requiring that all image inputs be arrays and dropping support for sequences like list, tuple, etc. Use of lists as images is sometimes nice for small documentation examples or simple test cases, but probably not common otherwise.
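A sketch of what such a stricter check could look like (purely illustrative; current scikit-image coerces sequence inputs with np.asarray instead of rejecting them):

```python
import numpy as np


def require_ndarray(image):
    # Hypothetical 1.0-style strictness: reject lists/tuples outright and
    # require a real ndarray instead of silently converting the input.
    if not isinstance(image, np.ndarray):
        raise TypeError(f"expected numpy.ndarray, got {type(image).__name__}")
    return image
```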

@Czaki
Contributor
Czaki commented Oct 4, 2021

Perhaps something to consider for 1.0 would be making the accepted types in scikit-image functions a bit stricter, requiring that all image inputs be arrays and dropping support for sequences like list, tuple, etc. Use of lists as images is sometimes nice for small documentation examples or simple test cases, but probably not common otherwise.

I think that @haesleinhuepf wrote about semantic typing: something which allows determining which parameter is an image, which one is a mask or a labeling, etc. This could allow GUIs (like napari) to easily add support, with parameters set according to their proper semantics.

@haesleinhuepf
Contributor

I think that @haesleinhuepf wrote about semantic typing.

Not necessarily. But thanks for bringing this idea up! I like that idea. :-)

@Czaki
Contributor
Czaki commented Oct 4, 2021

If it cannot be done with type annotations, then maybe we could add some special field like __semantic__ which would contain semantic annotations for special fields (like input/output arrays), plus a set of decorators to allow other creators to use it.

@soupault soupault marked this pull request as draft October 27, 2023 14:53
Labels
📄 type: Documentation Updates, fixes and additions to documentation 💬 Discussion 🤔 Planning