ENH: Add annotations to `ndarray` and `generic` · Issue #17368 · numpy/numpy · GitHub

ENH: Add annotations to ndarray and generic #17368

Closed
5 tasks done
BvB93 opened this issue Sep 23, 2020 · 6 comments

Comments

@BvB93
Member
BvB93 commented Sep 23, 2020

1.20 will be the first NumPy release featuring type hints, and while the typing of NumPy
is very much an ongoing process, it would be great if we could get ndarray and generic
fully typed before its release.

TODO

@BvB93
Member Author
BvB93 commented Oct 16, 2020

So it seems that annotating the comparison operations, == and != more specifically, will be problematic.

The problem here is that object.__eq__ is compatible with any object, resulting in a symmetry issue.
We can influence the output type of ndarray(...) == object(...), but can't affect the opposite case
(object(...) == ndarray(...)) in any way, shape or form. The latter's return type will always be inferred
as builtins.bool.

A (simplified) example:

from numpy.typing import ArrayLike

class ndarray:
    def __eq__(self, value: ArrayLike) -> ndarray: ...

array = ndarray()

# The good
reveal_type(array == 1)  # Revealed type is 'ndarray'

# The bad
reveal_type(1 == array)  # Revealed type is 'builtins.bool'
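
At runtime the reflected case does not produce a bool, which is why the inferred `builtins.bool` is wrong rather than just imprecise. A minimal sketch of the dispatch, assuming a hypothetical `FakeArray` stand-in (not NumPy's `ndarray`) so that it runs without NumPy: `int.__eq__` returns `NotImplemented` for an operand it does not know, and Python then retries with the right-hand operand's `__eq__`.

```python
class FakeArray:
    """Hypothetical stand-in for ndarray; not NumPy's implementation."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Element-wise comparison against a scalar: returns a new FakeArray
        # of booleans instead of a single bool.
        return FakeArray(v == other for v in self.values)


array = FakeArray([1, 2, 1])

# int.__eq__ does not know FakeArray, so it declines the comparison ...
assert (1).__eq__(array) is NotImplemented

# ... and Python falls back to the reflected call array.__eq__(1).
forward = array == 1
reflected = 1 == array

print(type(forward).__name__, forward.values)      # FakeArray [True, False, True]
print(type(reflected).__name__, reflected.values)  # FakeArray [True, False, True]
```

Both orderings end up in `FakeArray.__eq__`, yet only the first is visible to the type checker through the annotation on `FakeArray`.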

@NeilGirdhar
Contributor
NeilGirdhar commented Oct 29, 2020

In case you're interested, I ran into this issue and it motivated me to make a proposal to python-ideas, which Guido suggested I post to typing-sig. Please feel free to weigh in here.

@charris
Member
charris commented Nov 23, 2020

Pushing off remaining tasks to 1.21.

@bagrounds
bagrounds commented Apr 23, 2021

So it seems that annotating the comparison operations, == and != more specifically, will be problematic.

The problem here is that object.__eq__ is compatible with any object, resulting in a symmetry issue.
We can influence the output type of ndarray(...) == object(...), but can't affect the opposite case
(object(...) == ndarray(...)) in any way, shape or form. The latter's return type will always be inferred
as builtins.bool.

A (simplified) example:

from numpy.typing import ArrayLike

class ndarray:
    def __eq__(self, value: ArrayLike) -> ndarray: ...

array = ndarray()

# The good
reveal_type(array == 1)  # Revealed type is 'ndarray'

# The bad
reveal_type(1 == array)  # Revealed type is 'builtins.bool'

Ah, looks like the problem here is that == is assumed (by the type system) to return a boolean (which makes sense, as that is the typical use case for equality), but numpy sometimes returns other types, like an array of booleans (which allows convenient syntax when working with arrays).

What is the official Python stance on the type that == is allowed to return?

According to this:

eq(a, b) is equivalent to a == b .... Note that these functions can return any value, which may or may not be interpretable as a Boolean value. See Comparisons for more information about rich comparisons.

But then following the link on comparisons, we see:

Comparisons yield boolean values: True or False.

So the documentation seems to contradict itself.

Maybe we can try to clarify the official Python stance on what types == (and other comparison operators) can return.

If the official stance is that equality can return any type, the type system should probably be changed to reflect that. By the way, I know there are multiple static type checkers for Python. Do all of them exhibit the same behavior here?

If the official stance is that they can only return boolean values, perhaps numpy could consider introducing a new function to handle the special cases, and change the comparison operators to conform to the spec. Though I'm sure that would be a disruptive breaking change, so I'm not sure how willing the numpy community would be to adopt that change in order to gain static typing.

I wonder if a compromise solution would work in the short term. Maybe something like this?

class ndarray:
    def __eq__(self, value: ArrayLike) -> ndarray | bool: ...
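
To make the trade-off concrete, here is a rough sketch of what the union annotation would mean for calling code. `FakeArray` and `count_matches` are hypothetical names used only for illustration, and the `# type: ignore[override]` reflects my assumption that a checker such as mypy rejects a non-bool return type on `__eq__` as an incompatible override. The reflected case (`1 == array`) is not affected by this annotation at all.

```python
from __future__ import annotations


class FakeArray:
    """Hypothetical stand-in for ndarray; not NumPy's implementation."""

    def __init__(self, values: list[int]) -> None:
        self.values = values

    def __eq__(self, other: object) -> FakeArray | bool:  # type: ignore[override]
        if isinstance(other, int):
            # Element-wise result, stored as 0/1 integers.
            return FakeArray([int(v == other) for v in self.values])
        return NotImplemented


def count_matches(array: FakeArray, value: int) -> int:
    result = array == value
    # With a union return type the checker cannot tell which member came back,
    # so the caller has to narrow before touching array-only attributes.
    if isinstance(result, bool):
        return int(result)
    return sum(result.values)


print(count_matches(FakeArray([1, 2, 1]), 1))  # 2
```

Every caller pays for the `isinstance` check, even though this `__eq__` never actually returns a plain bool for an `int` operand.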

@seberg
Member
seberg commented Apr 23, 2021

You can do the ndarray | bool annotation, but does it change anything?

What is the official Python stance on the type that == is allowed to return?

While I expect some at Python may be discontent with how NumPy does comparison operators, there is not much point in discussing it, in my opinion.
NumPy uses it, and will continue to do so, since it is convenient and consistent in an array-programming context (even if that context is occasionally slightly at odds with "pure" Python). And array-programming tools, such as NumPy, are a huge chunk of the Python ecosystem... (Plus, it's not like array-programming is the only example. For example, missing values (NA) use Kleene logic: NA == NA -> NA.)
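
The NA case can be illustrated in a few lines. This is only a sketch with a hypothetical `NAType` class, not the implementation used by any particular library: comparing with an unknown value yields the unknown value again, and the result deliberately refuses to be coerced to a bool.

```python
class NAType:
    """Hypothetical missing-value marker, for illustration only."""

    def __eq__(self, other):
        # Under Kleene logic, comparing with an unknown value is itself unknown.
        return self

    def __ne__(self, other):
        return self

    def __bool__(self):
        # An unknown truth value cannot be reduced to True or False.
        raise TypeError("boolean value of NA is ambiguous")

    def __repr__(self):
        return "NA"


NA = NAType()

print(NA == NA)  # NA
print(1 == NA)   # NA, via the reflected NAType.__eq__
```

As with arrays, a static checker that assumes `==` returns `bool` would describe both of these results incorrectly.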

Long story short: I think this is clearly something that typing-sig must solve, there is nothing NumPy can do aside from trying to help with that process.

@BvB93
Member Author
BvB93 commented Apr 25, 2021

There are a number of discussions on the __eq__ issue scattered around the typing, typeshed and mypy repos (e.g. python/typeshed#3685). It is not entirely clear to me whether this should be fixed in typeshed or mypy in the first place.

You can do the ndarray | bool annotation, but does it change anything?

It's a question of whether or not we want to be incorrect 50% or ~100% of the time.
Honestly, for the sake of consistency I'm favoring the latter.

In any case, the only way I see this issue getting "fixed" in the short term is via a mypy plugin, though this will be rather invasive as we'd have to check every single __ne__ and __eq__ call for the presence of an ndarray. Even then, there are serious issues with generalizability, as I'm not sure how easily this band-aid can be applied to subclasses. And let's not even get started on array-likes that do not inherit from ndarray (e.g. the dask Array or pandas' DataFrame).

@jorenham jorenham added the 57 - Close? Issues which may be closable unless discussion continued label Aug 17, 2024
@jorenham jorenham removed the 57 - Close? Issues which may be closable unless discussion continued label Sep 3, 2024
@jorenham jorenham closed this as completed Sep 3, 2024