[Doc]: Most functions in axes.Axes accept datetime objects as well as floats, docstrings should reflect that · Issue #25676 · matplotlib/matplotlib · GitHub

Open
ldorigo opened this issue Apr 13, 2023 · 14 comments

Comments

@ldorigo
ldorigo commented Apr 13, 2023

Documentation Link

No response

Problem

Related: microsoft/pylance-release#4238

Most functions of axes.Axes are marked as accepting floats, but datetime objects seem to work just fine. Docstrings should reflect this fact.

Suggested improvement

No response

@oscargus
Member

Thanks for reporting this and #25677!

Note that for Matplotlib 3.8 we will provide our own stubs, so I'll also add the types label to those issues (and most likely pylance will not update their stubs).

@tacaswell
Member

Due to the way the units handling works, the exact types that can be passed in are run-time configurable.

@jklymak
Member
jklymak commented Apr 13, 2023

By default Matplotlib also accepts lists of strings:

and as @tacaswell points out, it can accept anything so long as there is a converter set up for it:

https://matplotlib.org/stable/api/units_api.html#module-matplotlib.units

I don't know how we convey all of that in the docstring of each and every plotting method that has units. OTOH, it wouldn't be a terrible idea for us to try, particularly for methods that take deltas (e.g. errorbar with timedelta + datetime), which need extra explanation.

One issue with going through this effort is we are trying to overhaul data (and hence units) handling.
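The run-time converter lookup described above can be sketched with a toy, stdlib-only registry. This is illustrative only (`registry` and `to_axis_float` are hypothetical names, not matplotlib API), but it captures why the accepted types cannot be pinned down in a static signature:

```python
# A stripped-down sketch of the idea behind matplotlib.units: the set of
# accepted input types is whatever converters are registered at run time,
# not something fixed in a function signature.
import datetime

registry = {}  # maps a Python type to a callable converting values to floats

EPOCH = datetime.datetime(1970, 1, 1)
registry[datetime.datetime] = lambda d: (d - EPOCH).total_seconds()
registry[str] = lambda s: float(ord(s[0]))  # toy stand-in for category.py

def to_axis_float(value):
    """Convert one value to axis units, falling back to float()."""
    converter = registry.get(type(value), float)
    return converter(value)
```

A third-party library registering its own converter extends the accepted types after import time, which is exactly what defeats static checking.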

@jklymak
Member
jklymak commented Apr 13, 2023

Note that for Matplotlib 3.8 we will provide our own stubs

Should we let the Pylance folks know about this?

@debonte
debonte commented Apr 13, 2023

Note that for Matplotlib 3.8 we will provide our own stubs

Should we let the Pylance folks know about this?

I'm on the Pylance team. This is great news! When is 3.8 expected to ship?

@tacaswell
Member

We aim for a ~6mo release cadence so probably July/August. I am going to advocate for an RC by scipy, but we have not set dates yet.

There are nightly wheels up if you want to try now.

@Higgs32584
Contributor

OK, so I looked at the code, and this is more or less the standard docstring entry that would qualify as "functions in axes.Axes accept datetime objects as well as floats".

If we want to document something like this, we could go very specific and just list the datetime objects, but it appears that we accept SOME types from the datetime module and not others, which sounds like it could be a pain to document manually:

datetime --> YES
datetime_CAPI --> NO
time --> NO
timedelta --> NO
timezone --> YES
tzinfo --> NO

@Higgs32584
Contributor

Just a side point: I would not be opposed to adding support for time and timedelta, especially considering that datetime and timezone already have support in place for them to work.

time is a class that represents a time of day, with no date or time zone information. It has attributes for hour, minute, second, microsecond, and time zone offset.

timedelta is a class that represents a duration or difference between two dates or times. It can be used to perform arithmetic operations on datetime objects, such as adding or subtracting a certain number of days, hours, minutes, or seconds.
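For reference, the behavior of the two classes described above can be shown with the stdlib alone:

```python
import datetime

# datetime.time: a time of day with no date attached
t = datetime.time(hour=14, minute=30, second=5)

# datetime.timedelta: a duration; supports arithmetic with datetime objects,
# which is the kind of input an errorbar-with-deltas call would need
start = datetime.datetime(2023, 4, 13, 9, 0)
delta = datetime.timedelta(hours=2, minutes=15)
end = start + delta  # 2023-04-13 11:15
```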

@ksunden
Member
ksunden commented Apr 20, 2023

I'm going to primarily address the type checking side of this problem. The docstring side is an "easier" problem, because we can say whatever we need to there. Not trivial because we don't want to raise the barrier to entry for the simple case, but we have mechanisms to play with.

When doing #24976, it was an intentional choice to leave out some of the unit-specific behaviors/modifications to signatures (note, for instance, that units.py, category.py and dates.py have no pyi stub file yet). Not because I don't want to address it, but because it is nuanced enough that trying to ensure it was done correctly as part of a massive PR was untenable. Now that it is in and we have a basis to make perturbative changes from, I think this is more feasible.

As @tacaswell and @jklymak have pointed out, what the units system allows is actually dynamic behavior, so the type essentially reverts to Any, which means that no static type checking can actually be done on these values. But that is not a particularly satisfying answer... and I think there is a path to do better.

We use np.typing.ArrayLike in many places. In most of our usage (at least as far as plotting methods go) we usually actually mean "a sequence of scalar values that our unit system understands". Often the individual method has further restrictions on shape/ndim, sometimes it will be 2D, others 1D, etc.

In fact, ArrayLike does not specify all of that (at least not yet) and is a fairly simple union, but it does include scalar values, which for many of our purposes we don't want, actually. It is a handy short hand that covers 90% of usage, but is a bit more lax than we would even want it to be. (np.ndarray allows specifying shape/dtype to an extent, but that doesn't help when you wish to be accepting of things other than actual arrays) I think advocating for and exploring options to allow the ArrayLike type hint to be made generic over dtype/shape would be valuable, if challenging, but for now work with what we have.

ArrayLike falls short in two main places:

  • When we use units to provide values that don't fall into the nested sequence types numpy expects:
    • Of the default unit converters this is actually a relatively limited set of:
      • datetime.datetime (and, indirectly, datetime.timedelta)
      • datetime.date
      • decimal.Decimal
    • str, bytes, and the numpy versions of any of these types are already valid for ArrayLike
  • When we want a scalar value in axis units:
    • Currently we use float in most places, which is inaccurate
    • This same logic applies to axes.[xy]ticks, which are currently typed as Sequence[float], matching the docstring, but really do need to be expanded, possibly to simply ArrayLike
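The first point can be seen directly: datetime values do pass through numpy's array machinery, but only as object arrays, which is why the unit system must convert them to floats before any numeric work happens (a small illustration, not matplotlib code):

```python
import datetime
import numpy as np

# A list of datetimes is technically "array-like", but numpy can only hold
# it as an object array; matplotlib's dates.py converts such values to
# floats before the plotting math runs.
arr = np.asarray([datetime.datetime(2023, 1, 1), datetime.datetime(2023, 1, 2)])
# arr.dtype is object, not a numeric dtype
```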

So, my proposal for keeping utility in type hinting while expressing intent through type hints:

Part 1: Provide types to use and work with

  • I propose a type MplScalar (name negotiable, but don't want to occlude the possibility of np having a Scalar type... in fact they do have a private _ScalarLike already)
    • By default, this would include all of the above types plus the others that are valid but already included in ArrayLike such as float, str, etc.
    • This is the type hint (plus standardized language to use in docstrings) for when we expect a single value in axis units.
    • To aid new users, perhaps should be in docstrings as "float or matplotlib scalar" or some similar language. possibly in type hints as float | MplScalar as well, even though that is actually redundant.
  • We then extend np.ArrayLike with this scalar type (and nested sequences thereof) as MplArrayLike
    • this may require either vendoring or using private functionality of numpy to get the nested sequence type they use in ArrayLike... consult with them for recommendations on that front.
    • I'm tempted to remove the scalars here as we usually mean something which has a shape, but for consistency, lean towards simple expansion.

These two types, with consistent usage and docstring equivalents, should solve any problems for users of matplotlib who do not register new types to converters. Users who do so may need to still # type: ignore some mpl related code.

Part 2: But what about my units types?

Converters for types which extend the "Supports Array" protocol should already be sorted for at least the ArrayLike case, but not necessarily the scalar case. Notably, pint.Quantity passes as a float, but unyt.unyt_quantity does not (unyt generally has the problem of dynamically set attributes in standard usage, so statically checking it is usually not helpful).

Fundamentally, registering new types is dynamic behavior that cannot be statically type checked without external knowledge.
But, we can provide a mypy (and/or pyright, but I have looked into pyright ecosystem a little less, perhaps @debonte could comment more on that) plugin to at least enable interested users to type check, even though it requires some config and is not automated.

There is some precedent in the ecosystem for supplying a mypy plugin. Numpy provides one that does platform specific checks for available dtypes.

I've not implemented one, but the process doesn't seem too hard.
We could decide to distribute as a separate pypi package if we didn't want to tie it to our release cycle, for instance.

Not the cleanest solution, as it may require end users to configure their type checking, especially in the IDE case (for libraries, this can generally be included in the type checker configs, but if those are not found for further downstream code then it could lead to errors being found which shouldn't be)

Part 3: Make the configuration more automatic

This is admittedly a bigger change that would need further discussion before implementing.
We could use entrypoints as a new mechanism of specifying a unit converter registration. The type checking plugin could then perhaps inspect the entrypoints to expand the valid types at checking time without relying on dynamic behavior.
Implementing this way may carry the risk of prematurely importing things, which may not be desirable, but we may be able to do so lazily, at which point it may be fine.

@Higgs32584
Contributor
Higgs32584 commented Apr 20, 2023

I would not be opposed to that. I think what we would first need is a full list of valid objects for MplScalar. If we can agree on the uniformity of such an object, i.e. a set of universal objects that are accepted in a large number of use cases, it would make a lot more sense to specify the accepted types in one place.

@tacaswell
Member

If the converters are typed, is it possible to explain to the typing system that those are the types it needs to match? I am not sure how powerful the generics are (and I think there are open PEPs in this area, e.g. https://peps.python.org/pep-0695/).

We know a number of down-stream libraries register their own unit converters so I do not think part 2 is a corner case.

As pointed out above, we care as (or more) about the number of dimensions and shape (and if it matches between inputs) as the type which is not currently expressed in the type system.

At the end of the day, because we do have dynamic inputs, Any may in fact be the correct static typing (as far as can be expressed in the current type system). I think we need to be careful not to let the perfect be the enemy of the good here and avoid getting too far into an ontological quagmire.

My read is that we have to pick between giving false positives (the current state, where the type checkers flag something they should not) or giving false negatives (using Any so the type checkers are always happy). My sense is we should accept false negatives but do the simplest possible thing that will get lists of datetime to stop being flagged by the type checkers (is a bunch of overloads enough?).

I'm pretty 👎🏻 on vendoring chunks of numpy's typing or on using anything private.
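The "bunch of overloads" idea can be sketched with stdlib typing alone. The function name here is made up for illustration; real overloads would live in the pyi stubs for the plotting methods:

```python
import datetime
from typing import overload

# Each @overload stub narrows what a static checker accepts; the final,
# untyped definition is the single runtime implementation.
@overload
def plot_x(xs: list[float]) -> str: ...
@overload
def plot_x(xs: list[datetime.datetime]) -> str: ...
def plot_x(xs):
    if xs and isinstance(xs[0], datetime.datetime):
        return "datetime"
    return "float"
```

With stubs like this, a checker accepts both `plot_x([1.0, 2.0])` and `plot_x([datetime.datetime(2023, 1, 1)])` while still rejecting, say, a list of dicts.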

@ksunden
Member
ksunden commented Apr 21, 2023

To be clear, the chunk in particular for the vendoring route is not actually any real implementation, it is only their protocol class definition of a "Nested Sequence": https://github.com/numpy/numpy/blob/main/numpy/_typing/_nested_sequence.py
It is a standalone file with only stdlib imports, so risk is pretty minimal of that route. Essentially it would be "use theirs to avoid duplicating the effort ourselves"

The implementation would look something like:

MplScalar = float | datetime.datetime | datetime.date | datetime.timedelta | Decimal | str | bytes
# Could include int, bool explicitly, but both are actually included in float...
# left out complex for our purposes.

# To use np private apis:
from numpy._typing import _DualArrayLike
MplArrayLike = _DualArrayLike[np.dtype[Any], MplScalar]
# We _could_ restrict the dtype defined types, perhaps, but likely little to no reason to.

# To vendor Nested Sequence, but not rely on _DualArrayLike
MplArrayLike = ArrayLike | MplScalar | _NestedSequence[MplScalar]

# Might work without vendoring, but might have slightly different semantics than ArrayLike's implementation:
T = TypeVar("T")
NestedSequence: TypeAlias = Sequence[Union["NestedSequence[T]", T]]
MplArrayLike = ArrayLike | MplScalar | NestedSequence[MplScalar]

The last option there is close to what we do for "HashableList" for subplot_mosaic, and may be valid without using vendored or private code... its checking is just slightly different from what ArrayLike does.
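To make the nested-sequence semantics concrete, here is a small runtime mirror of what the proposed MplArrayLike alias would accept (illustrative only; the actual proposal is purely static and has no runtime cost):

```python
import datetime
from decimal import Decimal
from collections.abc import Sequence

# Runtime stand-in for MplScalar (int/bool ride along with float here).
SCALARS = (float, int, datetime.datetime, datetime.date,
           datetime.timedelta, Decimal, str, bytes)

def is_mpl_array_like(obj):
    """True if obj is a scalar or an arbitrarily nested sequence of scalars."""
    if isinstance(obj, SCALARS):
        return True  # str/bytes count as scalars, not as sequences
    if isinstance(obj, Sequence):
        return all(is_mpl_array_like(item) for item in obj)
    return False
```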

Overloads could work, with some caveats for pyplot generation (which maybe I should just bite the bullet and resolve anyway, should be possible, and it comes up often enough already...)

Generics are an interesting idea to play with... they would require that the units type (well, an upper bound of the units type, which could have a default) be known at axis(/axes) creation time, but could probably be made to default to Any, and so be opt-in to stricter checking: providing the power to those who want it and the lack of complaints to those who don't, while still giving us TypeVars with useful names to declare intent. May also have pyplot interface problems that would need to be dealt with, but shouldn't be impossible.

I do think that type hints for e.g. plt.subplots and other pyplot axes creation methods would make narrowing the type hard though...

@Higgs32584
Contributor
Higgs32584 commented Apr 21, 2023

In the meantime I can work on the docs. What do we want to say, specifically?

@ksunden
Member
ksunden commented Apr 21, 2023

@Higgs32584 your enthusiasm is appreciated, but not sure we have a full decision yet (and it may change with how we address the type checking side of things)

I actually think generics can work, playing with the idea a bit:

I added a generic over xunits yunits to Axes (don't actually use it yet, was just doing so for type checking instantiation)

diff --git a/lib/matplotlib/axes/_axes.pyi b/lib/matplotlib/axes/_axes.pyi
index 15dc1cf255..7c05d6e3f5 100644
--- a/lib/matplotlib/axes/_axes.pyi
+++ b/lib/matplotlib/axes/_axes.pyi
@@ -32,12 +32,15 @@ import matplotlib.streamplot as mstream
 import datetime
 import PIL
 from collections.abc import Callable, Sequence
-from typing import Any, Literal, overload
+from typing import Any, Literal, Generic, overload, TypeVar
 import numpy as np
 from numpy.typing import ArrayLike
 from matplotlib.typing import ColorType, MarkerType, LineStyleType
 
-class Axes(_AxesBase):
+XUnits = TypeVar("XUnits")
+YUnits = TypeVar("YUnits")
+
+class Axes(_AxesBase, Generic[XUnits, YUnits]):
     def get_title(self, loc: Literal["left", "center", "right"] = ...) -> str: ...
     def set_title(
         self,

With results:

# Decl separate from assignment is needed for plt.subplots because of tuple return
fig: Figure
ax_hinted_subplots: Axes[datetime.datetime, datetime.datetime]
fig, ax_hinted_subplots = plt.subplots()
ax_hinted: Axes[astropy.units.Quantity, astropy.units.Quantity] = plt.subplot()
ax_hinted_no_generic: Axes = plt.subplot()

fig1, ax_inferred_subplots = plt.subplots()
ax_inferred = plt.subplot()

reveal_type(ax_hinted)             # Axes[Quantity, Quantity] by pyright, Axes[Any, Any] by mypy (Astropy not typed, so doesn't check)
reveal_type(ax_hinted_subplots)    # Axes[datetime, datetime]
reveal_type(ax_hinted_no_generic)  # Axes[Unknown, Unknown] by pyright, Axes[Any, Any] by mypy
reveal_type(ax_inferred_subplots)  # Any
reveal_type(ax_inferred)           # Axes[Unknown, Unknown] by pyright, Any by mypy

No errors in any of this (not that that precludes problems further down the line)...

My gut feeling is this could work essentially perfectly for the OO API, at least. I think pyplot may be a little stuck still though, as the valid types are based on hidden state. I think we could just expand to "Any" in pyplot though and say "if you want stricter type checking, use the OO api, that we usually recommend for other reasons anyway" (if it appears more than once in the signature, we can keep the typevar to say "hand me the same kind of thing for each of these", perhaps... but if it only appears once, then it may even warn for the type checker...)
