ENH: make typing module available at runtime by person142 · Pull Request #16558 · numpy/numpy

Merged
merged 7 commits into from
Jun 17, 2020

person142
Member

Closes #16550.

This makes np.typing.ArrayLike and np.typing.DtypeLike available
at runtime in addition to typing time. Some things to consider:

  • ArrayLike uses protocols, which are only in the standard library
    in 3.8+, but are backported in typing_extensions. This
    conditionally imports Protocol and sets _SupportsArray to Any
    at runtime if the module is not available to prevent NumPy from
    having a hard dependency on typing_extensions. Since e.g. mypy
    already includes typing_extensions as a dependency, anybody
    actually doing type checking will have it set correctly.
  • We are starting to hit the edges of "the fiction of the stubs". In
    particular, they could just cram everything into __init__.pyi and
    ignore the real structure of NumPy. But now that typing is available
    at runtime, we have to e.g. carefully import ndarray from numpy
    in the typing module and not from ..core.multiarray, because
    otherwise mypy will think you are talking about a different
    ndarray. We will probably need to do some shuffling of the stubs into
    more fitting locations to mitigate weirdness like this.

ArrayLike = Any
DtypeLike = Any
_SupportsArray = Any
from numpy.typing import ArrayLike, DtypeLike, _SupportsArray
Member Author

We also execute all the code in pass, so we're also testing here that you can really import these things at runtime.

@@ -0,0 +1,3 @@
from ._array_like import _SupportsArray, ArrayLike
Member Author

There isn't much code in this package, but since typing is so verbose it would be a little painful to keep Any, overload, ... out of the public namespace if we crammed everything in a typing.py file. So instead make a package and use this init to make sure we're only exporting exactly what we mean to.

from numpy import ndarray
from ._dtype_like import DtypeLike

if sys.version_info >= (3, 8):
Member Author
@person142 person142 Jun 10, 2020

This is the "no hard dependency on typing_extensions" dance mentioned in the PR description.

Member

Would it be an idea to issue an ImportWarning if HAVE_PROTOCOL = False?

Member Author

I'd worry about that because it's pretty common for projects to run their test suites with warnings turned into errors, and I could see e.g. a SciPy test run that doesn't have mypy installed erroring out because of the warning.

Member

Hmm, that might be a problem, yes.
Does NumPy have a logger where such information could be displayed? I suspect that silently setting _SupportsArray (and by extension ArrayLike) to Any could result in some unexpected issues, at least from an end-user perspective.

If not, then it should be mentioned in the documentation.

Member Author

No logger; when most of the action is happening in C code (potentially with the GIL released), logging doesn't work great. I added a big warning to the top of the documentation in 4a120f0.

Member

Ah, so be it.
At least it is documented now, which is the most important thing.

import sys
from typing import Any, overload, Sequence, Tuple, Union

from numpy import ndarray
Member Author

This is an example of what I was trying to describe in the PR description where if we do from numpy import ndarray, then mypy goes and looks in __init__.pyi, finds ndarray, and types it correctly. But if we were to do from ..core.multiarray import ndarray, then it would find no stubs for that file and fall back on treating ndarray as Any, which would be bad.

@person142
Member Author

Now that I think of it, there is a much simpler solution to this problem: keep the current typing.pyi and define a typing.py file that just contains:

from typing import Any as _Any

DtypeLike = _Any
ArrayLike = _Any

That is, the types are correctly defined at typing time and unconditionally defined to be Any at runtime. It's short, and it avoids both the "where do I import from" question and the "is typing_extensions installed" question.

The drawback is that runtime introspection of the types is now impossible, but we mainly intended this as a syntactic convenience, i.e. we wanted people to be able to from numpy.typing import ArrayLike unconditionally versus having to do that in a TYPE_CHECKING block.

What do people think?
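
For comparison, the TYPE_CHECKING pattern that users would otherwise need looks roughly like the sketch below. The helper first_element is hypothetical and exists only to carry the annotation; the import inside the guarded branch is never executed at runtime.

```python
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Seen only by the type checker; never executed at runtime.
    from numpy.typing import ArrayLike
else:
    # Runtime placeholder so the name exists when this module is imported.
    ArrayLike = Any


def first_element(data: ArrayLike) -> Any:
    # Hypothetical helper, present only to show the annotation in use.
    return data[0]
```

Making the names importable at runtime removes the need for this guard in user code.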

@BvB93
Member
BvB93 commented Jun 10, 2020

Now that I think of it, there is a much simpler solution to this problem-keep the current typing.pyi and define a typing.py file that just contains:

I like this solution, though I feel that if we ever decide to expose Protocol subclasses, those should also be available at run-time.
Namely, this would allow one to use them for performing isinstance() and issubclass() checks.

Nevertheless, this is not an issue if we're just exposing ArrayLike and DtypeLike.

@BvB93
Member
BvB93 commented Jun 10, 2020

Now that I think of it, there is a much simpler solution to this problem-keep the current typing.pyi and define a typing.py file that just contains:

Though I do feel a value more descriptive than Any would be useful.

What about a plain string?

DtypeLike = "numpy.typing.DtypeLike"
ArrayLike = "numpy.typing.ArrayLike"

@person142
Member Author

Though I do feel a value more descriptive than Any would be useful.

The reason I like Any is that it is very noncommittal, which hopefully screams "please don't use this at runtime for anything" (and additionally makes it really hard to use it at runtime for anything). Though yeah, it is potentially confusing for users... That could mean that the extra effort here to make things available at runtime is worth it, or maybe it means we need to be extra clear in the documentation what's happening.

@eric-wieser
Member

Though yeah, it is potentially confusing for users...

Right, using Any means that the type annotations given by Sphinx or help(...) are going to be useless.
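
A small sketch of that effect, assuming the "unconditional Any" proposal were adopted; asarray_sketch is a hypothetical function used only for illustration:

```python
import inspect
from typing import Any

ArrayLike = Any  # stand-in for the "unconditional Any" proposal


def asarray_sketch(a: ArrayLike):
    # Hypothetical function; only its signature matters here.
    return a


# The rendered signature mentions Any; the alias name ArrayLike is lost.
rendered = str(inspect.signature(asarray_sketch))
print(rendered)
```

Tools that build documentation from evaluated annotations would see the same thing, since the alias leaves no trace on the function object.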

@BvB93
Member
BvB93 commented Jun 10, 2020

Come to think of it, this non-run-time-only solution will also run into issues if ndarray (and thus by extension ArrayLike) becomes generic with respect to the data type.
This would mean that, in the future, syntax such as ArrayLike[np.integer] would be impossible, requiring the user to wrap the whole package in a string ("ArrayLike[np.integer]"). This brings us back to square one, where ArrayLike is only usable to its full extent during non-run-time.

@person142
Member Author

This would mean that, in the future, syntaxes such as ArrayLike[np.integer] would be impossible, requiring the user to wrap the whole package in a string

Hm yeah that’s absolutely convinced me that making this work at runtime is worth it.

HAVE_PROTOCOL = True

if HAVE_PROTOCOL:
    class _SupportsArray(Protocol):
Member

Note for the future:
If _SupportsArray ever becomes public we should make it useable for run-time isinstance() and issubclass() checks, similar to the likes of SupportsInt (ref).
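
A minimal sketch of what such run-time checks could look like, assuming Python 3.8+ and using typing.runtime_checkable. SupportsArraySketch and Wrapper are hypothetical names, not part of NumPy:

```python
from typing import Protocol, runtime_checkable  # Python 3.8+


@runtime_checkable
class SupportsArraySketch(Protocol):
    # Hypothetical public counterpart of _SupportsArray.
    def __array__(self): ...


class Wrapper:
    # Any class defining __array__ satisfies the protocol structurally.
    def __init__(self, values):
        self.values = values

    def __array__(self):
        return self.values


print(isinstance(Wrapper([1, 2]), SupportsArraySketch))
print(isinstance([1, 2], SupportsArraySketch))
```

Note that a run-time protocol check only looks for the presence of the method, not its signature or return type.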

@eric-wieser
Member

would be impossible, requiring the user to wrap the whole package in a string ("ArrayLike[np.integer]"). This bring us back to square one, where ArrayLike is only usable to its full extent during non-run-time.

This is the direction python is moving in anyway. from __future__ import annotations makes the quotes implicit.

@BvB93
Member
BvB93 commented Jun 10, 2020

This is the direction python is moving in anyway. from __future__ import annotations makes the quotes implicit.

Eh, kind of?
What you're saying is, strictly speaking, true but it's also a bit of a slippery slope: at that point there is seemingly less and less reason for numpy.typing to be available at run-time in the first place.

Nevertheless, as of now the from __future__ import annotations import is only available for python 3.7 and later anyway, which makes that solution less than ideal.
That, and it would still complicate assigning, for example, ArrayLike[int] to a variable, as the subscription will be executed during run-time.

>>> from __future__ import annotations
>>> from numpy.typing import ArrayLike

>>> def func(ar: ArrayLike[int]):  # This will work fine, even if ArrayLike is set to Any at runtime
...     pass

>>> type_alias = ArrayLike[int]  # This won't
Traceback (most recent call last):
  ...
TypeError: typing.Any is not subscriptable

@person142
Member Author

Ok, I've added initial documentation for the numpy.typing module in 70130f8.

Typing `ArrayLike` correctly relies on `Protocol`, so warn users that
they should be on 3.8+ or install `typing-extensions` if they want
everything to work as expected.
@person142
Member Author

After all the discussion above, how do people feel about the approach taken here?

@charris charris changed the title MAINT: make typing module available at runtime ENH: make typing module available at runtime Jun 13, 2020
@charris
Member
charris commented Jun 13, 2020

The release notes for typing work should probably go under New features rather than improvements and there should also be a release note for this PR. @seberg How does one label a release note fragment for two word section headers?

@WarrenWeckesser
Member
WarrenWeckesser commented Jun 13, 2020

How does one label a release note fragment for two word section headers?

According to README.rst in doc/release/upcoming_changes, it's #####.new_feature.rst.

@seberg
Member
seberg commented Jun 13, 2020

The list of available sections is in the pyproject.toml.

person142 added a commit to person142/numpy that referenced this pull request Jun 14, 2020
@person142
Member Author

Release note added in c88f5a2; opened #16603 to fix up the previous typing release note to be a new feature.

@mattip
Member
mattip commented Jun 15, 2020

A few questions, I am new to the world of typing. Feel free to point me to background reading if that is easier than answering the questions directly.

Does this slow down import time?

Could you give a higher-level picture of when availability of typing at runtime is desired, what use-case this answers? Is it typical for libraries (vs. user code) to provide such an ability?

@person142
Member Author

Does this slow down import time?

It shouldn't, in that I opted not to add any typing import to the top-level __init__.py under the assumption that (since Python typing is fairly verbose) people will generally be doing from numpy.typing import ArrayLike to keep the type annotations short. But that's maybe a somewhat disingenuous answer, so here's the breakdown I'm getting from

$ python -X importtime -c "import numpy.typing"

[Image: numpy-typing-import-time]

what use-case this answers?

Right, so it's important to be clear that this doesn't enable anything new*; see e.g. numpy/numpy-stubs#66 (comment) for a discussion of ways to use the things in numpy.typing without making them available at runtime. The argument from @shoyer in numpy/numpy-stubs#66 (comment) was that it might be confusing for users that they can't do things like

from numpy.typing import ArrayLike

x: ArrayLike = [1, 2, 3, 4]

*Though I will note that there are packages that do runtime introspection of types (pydantic); if anybody wanted to do something like that with ArrayLike then it would not be possible if it's only available at typing time.
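
To illustrate the kind of introspection meant here, the sketch below uses typing.get_type_hints from the standard library rather than pydantic. ArrayLikeSketch is a simplified, hypothetical stand-in for the real alias, and describe is a hypothetical function:

```python
from typing import Sequence, Union, get_type_hints

# Simplified stand-in for ArrayLike (the real alias also includes a protocol).
ArrayLikeSketch = Union[bool, int, float, complex, Sequence]


def describe(value: ArrayLikeSketch) -> str:
    # Hypothetical function; only its annotations matter here.
    return type(value).__name__


# Works only because the alias is a real object at runtime.
hints = get_type_hints(describe)
print(hints["value"])
```

If the alias existed only in a stub file, there would be nothing for such a library to inspect.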

Is it typical for libraries (vs. user code) to provide such an ability?

I think that it is atypical. It's hard to say why; some things that come to mind:

  • Not many packages have types yet, so there just aren't many examples to look at.
  • For more class-based packages the classes already are the types, so they don't need to do anything extra to export those types. If NumPy had some array-like ABC, then we would use that mechanism.
  • NumPy is a lot more foundational than a lot of other packages.

If so inclined, @ethanhs might be able to offer better insights on this question.

else:
    _SupportsArray = Any

ArrayLike = Union[bool, int, float, complex, _SupportsArray, Sequence]
Member

It would be nice to support buffer protocols, but Python's typing doesn't support that yet. In the meantime I would suggest adding memoryview and a comment referencing python/typing#593

Member Author

Since memoryviews are Sequences we do allow them currently, e.g.

import numpy as np

x = b'foobar'
v = memoryview(x)
np.array(v)

passes mypy. I will add the comment though.
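
The run-time counterpart of that claim can be checked without NumPy. This sketch only shows that memoryview is registered as a Sequence in collections.abc; it says nothing about how mypy's stubs declare it:

```python
from collections.abc import Sequence

x = b"foobar"
v = memoryview(x)

# memoryview is registered as a virtual subclass of Sequence.
print(isinstance(v, Sequence))

# Iterating a bytes-backed memoryview yields the byte values as ints.
print(list(v))
```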

Member Author

We should add a test explicitly for memoryviews though; I'll open an issue.

I'd highly recommend commenting on this B.P.O issue asking for support for a Buffer protocol in typing, as that is the better place than the typing issue: https://bugs.python.org/issue27501

Comment on lines 40 to 44
.. code-block:: python

    np.array(x**2 for x in range(10))

is valid NumPy code which will create an object array. The types will
Member

I think this example would be better motivated by showing the output, REPL style:

In [2]: np.array(x**2 for x in range(10))
   ...:
Out[2]: array(<generator object <genexpr> at 0x1118c5a20>, dtype=object)

Could also substitute "object array" -> "0-dimensional object array"

Most readers will probably not realize that this code works in an unexpected way otherwise!

Comment on lines 44 to 45
is valid NumPy code which will create an object array. The types will
complain about this usage however.
Member

The language "the types will complain" sounds a little weird to me.

Types don't complain, they just are :).

Instead, I would say "Type checkers will complain"


is valid NumPy code which will create an object array. The types will
complain about this usage however.

Member

Can we add a brief note on the suggested work-around?

The obvious way would be to add a comment disabling typing:

  np.array(x**2 for x in range(10))  # type: ignore

Are there other recommended options?

I think we've also discussed making checks less strict if dtype=object is specified, e.g.,

  np.array(x**2 for x in range(10), dtype=object)

I don't know if that works yet. If it does, perhaps we should mention it, too.

Member Author

Are there other recommended options?

The other way we test for:

https://github.com/numpy/numpy/blob/master/numpy/tests/typing/pass/array_like.py#L43

is adding an explicit Any annotation. I've added examples of both methods to the docs.

I don't know if that works yet. If it does, perhaps we should mention it, too.

Seems like @seberg would probably know the answer to that?

@shoyer
Member
shoyer commented Jun 15, 2020

Does this slow down import time?

It shouldn't in that it I opted not to add any typing import to the top level __init__.py under the assumption that (since Python typing is fairly verbose) people will generally be doing from numpy.typing import ArrayLike to keep the type annotations short.

The overhead of importing the standard library's typing module is quite small, and it's increasingly likely that almost any non-trivial Python program will import typing at runtime -- soon likely including NumPy itself when we add type annotations.

So I think it should be fine to import typing at the top level __init__.py, so np.typing.ArrayLike works.

@person142
Member Author

Thanks for the review @shoyer, most points should be addressed in c63f233 and 347a368. I didn't add the typing import to __init__.py yet though, since it seems likely other people will have opinions about that.

@emmatyping

Oh, weird, I wrote out a comment but I suppose I didn't hit the comment button.

Anyway, making types available at runtime should be the default in my opinion. There are a few reasons why:

  1. 3.6 support. Python 3.6 will not only be supported until the very end of 2021 (https://devguide.python.org/#status-of-python-branches), it is also the most popular Python version for package downloads off PyPI (https://pypistats.org/packages/numpy, scroll down to "Daily Download Quantity of numpy Package"). You can somewhat get around this issue if you have people put quotes around their types or put everything in stubs, but that isn't very ergonomic.
  2. Type aliases. AType = Union[int, str, ndarray]. These must be evaluated at runtime, or put in if typing.TYPE_CHECKING blocks, which is a bit ugly.
  3. As previously mentioned, it is also less confusing to people newer to typing.
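
Point 2 can be illustrated with a short sketch. NDArrayStandIn is a hypothetical placeholder for numpy.ndarray so that the example runs without NumPy installed:

```python
from typing import Union


class NDArrayStandIn:
    """Hypothetical placeholder for numpy.ndarray."""


# An alias is an ordinary assignment executed at import time, so every
# name on the right-hand side must exist at runtime.
AType = Union[int, str, NDArrayStandIn]


def describe(x: AType) -> str:
    return type(x).__name__


print(AType.__args__)
```

Quoting the annotation on describe would not help here: the assignment to AType itself is evaluated eagerly.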

@person142
Member Author

Thanks for that perspective @ethanhs.

Anybody else have strong opinions on importing typing in __init__? From the above graph it looks to be something like a 2.7% overhead (though as @shoyer noted, it's becoming increasingly likely that typing will have already been imported somewhere else, which would reduce that). I don't really have strong feelings on this point.

@mattip mattip added the triage review Issue/PR to be discussed at the next triage meeting label Jun 17, 2020
@mattip mattip merged commit 02883d8 into numpy:master Jun 17, 2020
@mattip
Member
mattip commented Jun 17, 2020

Thanks @person142

@mattip mattip added triaged Issue/PR that was discussed in a triage meeting and removed triage review Issue/PR to be discussed at the next triage meeting labels Jun 17, 2020
@charris charris mentioned this pull request Oct 10, 2020
Labels
01 - Enhancement 41 - Static typing triaged Issue/PR that was discussed in a triage meeting