TYP: datetime64.__sub__ overload order should be consistent with datetime.datetime #28257

Open
randolf-scholz opened this issue Jan 31, 2025 · 19 comments

Comments

@randolf-scholz
randolf-scholz commented Jan 31, 2025

Describe the issue:

datetime.datetime defines the overload order

class datetime(date):
    @overload  # type: ignore[override]
    def __sub__(self, value: Self, /) -> timedelta: ...
    @overload
    def __sub__(self, value: timedelta, /) -> Self: ...

numpy.datetime64 should follow this overload order and first list all overloads for subtracting timestamp-like right-hand side, and after that all overloads that subtract a timedelta-like right-hand side. Without having this order, a generic Protocol abstracting the datetime type algebraically like

class Timestamp[TD](Protocol):
    @overload
    def __sub__(self, other: Self, /) -> TD: ...
    @overload
    def __sub__(self, other: TD, /) -> Self: ...

will match datetime.datetime, but not numpy.datetime64.
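The algebra that such a Protocol abstracts can be checked at runtime with the standard library alone. This is a minimal sketch; the helper name `recover_reference` is hypothetical and only exercises the two subtraction forms listed in the overloads above.

```python
import datetime as dt


def recover_reference(t, ref):
    """Return `ref` using only the two subtraction forms of the Protocol."""
    elapsed = t - ref    # timestamp - timestamp -> duration
    return t - elapsed   # timestamp - duration  -> timestamp


now = dt.datetime(2025, 1, 31, 12, 0)
start = dt.datetime(2025, 1, 1)
elapsed = now - start          # a datetime.timedelta
restored = now - elapsed       # a datetime.datetime again
```

Any type that supports both forms consistently satisfies the identity `t - (t - ref) == ref`, which is what the generic `Timestamp[TD]` Protocol is meant to capture.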

Reproduce the code example:

import datetime as dt
from typing import Protocol, Self, overload, Generic, TypeVar

import numpy as np

TD = TypeVar("TD")

class Timestamp(Protocol, Generic[TD]):
    @overload
    def __sub__(self, other: Self, /) -> TD: ...
    @overload
    def __sub__(self, other: TD, /) -> Self: ...


py_dt = dt.datetime(year=2025, month=1, day=31)
foo: Timestamp = py_dt  # ✅
bar: Timestamp = np.datetime64(py_dt)  # ❌

Error message:

tmp.py:10: error: Overloaded function signatures 1 and 2 overlap with incompatible return types  [overload-overlap]
tmp.py:10: note: Flipping the order of overloads will fix this error
tmp.py:17: error: Incompatible types in assignment (expression has type "datetime64[datetime]", variable has type "Timestamp[Any]")  [assignment]
tmp.py:17: note: Following member(s) of "datetime64[datetime]" have conflicts:
tmp.py:17: note:     Expected:
tmp.py:17: note:         @overload
tmp.py:17: note:         def __sub__(self, datetime64[datetime], /) -> Any
tmp.py:17: note:         @overload
tmp.py:17: note:         def __sub__(self, Any, /) -> datetime64[datetime]
tmp.py:17: note:     Got:
tmp.py:17: note:         @overload
tmp.py:17: note:         def __sub__(self, int | integer[Any] | numpy.bool[builtins.bool], /) -> datetime64[datetime]
tmp.py:17: note:         @overload
tmp.py:17: note:         def __sub__(self, datetime, /) -> timedelta
tmp.py:17: note:         @overload
tmp.py:17: note:         def __sub__(self, timedelta64[int], /) -> datetime64[int]
tmp.py:17: note:         @overload
tmp.py:17: note:         def __sub__(self, timedelta64[timedelta], /) -> datetime64[datetime]
tmp.py:17: note:         @overload
tmp.py:17: note:         def __sub__(self, datetime64[int], /) -> timedelta64[int]
tmp.py:17: note:         @overload
tmp.py:17: note:         def __sub__(self, timedelta64[int], /) -> datetime64[date | int]
tmp.py:17: note:         @overload
tmp.py:17: note:         def __sub__(self, timedelta64[timedelta], /) -> datetime64[date]
tmp.py:17: note:         @overload
tmp.py:17: note:         def __sub__(self, datetime64[date], /) -> timedelta64[timedelta]
tmp.py:17: note:         @overload
tmp.py:17: note:         def __sub__(self, timedelta64[None], /) -> datetime64[None]
tmp.py:17: note:         @overload
tmp.py:17: note:         def __sub__(self, datetime64[None], /) -> timedelta64[None]
tmp.py:17: note:         @overload
tmp.py:17: note:         def __sub__(self, timedelta64[timedelta | int | None] | int | integer[Any] | builtins.bool | numpy.bool[builtins.bool], /) -> datetime64[date | int | None]
tmp.py:17: note:         @overload
tmp.py:17: note:         def __sub__(self, datetime64[date | int | None], /) -> timedelta64[timedelta | int | None]

Python and NumPy Versions:

2.2.2
3.13.1 (main, Dec 4 2024, 08:54:14) [GCC 11.4.0]

Type-checker version and settings:

mypy 1.4.1
pyright 1.1.393

Additional typing packages.

No response

@jorenham
Member

That seems like a good idea 👍🏻

@randolf-scholz
Author

@jorenham Why are the timedelta64 and datetime64 generic with upper bound None | int | datetime and None | int | timedelta respectively? I do not quite understand what this generic type is for.

@jorenham
Member

@jorenham Why are the timedelta64 and datetime64 generic with upper bound None | int | datetime and None | int | timedelta respectively? I do not quite understand what this generic type is for.

.item()

@jorenham
Member
jorenham commented Jan 31, 2025

To be a bit more specific, when the generic type is None it's a NaT.
And when it's an int, it (usually) means that it has a unit that cannot be expressed using the standard library timedelta or datetime types, e.g. attoseconds or months.
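The link between the type parameter and `.item()` can be observed at runtime. The following is a small sketch, assuming NumPy is installed; the variable names are illustrative only.

```python
import datetime as dt

import numpy as np

# datetime64: the type parameter mirrors what .item() returns.
item_datetime = np.datetime64(dt.datetime(2025, 1, 31, 1, 23, 45)).item()
item_date = np.datetime64(dt.date(2025, 1, 31)).item()
item_int = np.datetime64(100, "ns").item()   # "ns" does not fit dt.datetime
item_nat = np.datetime64("NaT").item()       # NaT has no Python equivalent

# timedelta64 follows the same pattern.
td_item = np.timedelta64(dt.timedelta(seconds=37)).item()
td_months = np.timedelta64(3, "M").item()    # months do not fit dt.timedelta
td_nat = np.timedelta64("NaT").item()
```

Here the values come back as `dt.datetime`, `dt.date`, `int`, and `None` for `datetime64`, and as `dt.timedelta`, `int`, and `None` for `timedelta64`.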

@jorenham
Member

I'm only able to reproduce this with mypy (both 1.14.1 and the current release-1.15 branch); pyright (1.1.393) accepts the bar assignment, and so does basedpyright.

Mypy also reports overlapping overloads, whereas pyright does not. But I don't think that's relevant here, because setting the bound of TD to dt.datetime | np.timedelta64 resolves that, while mypy still rejects the bar assignment.

# src/numpy_play/gh_28257.py
import datetime as dt
from typing import Any, Generic, Protocol, Self, TypeVar, overload

import numpy as np

TD = TypeVar("TD", bound=dt.datetime | np.timedelta64)

class Timestamp(Protocol, Generic[TD]):
    @overload
    def __sub__(self, other: Self, /) -> TD: ...
    @overload
    def __sub__(self, other: TD, /) -> Self: ...

py_dt = dt.datetime(year=2025, month=1, day=31)
foo: Timestamp[Any] = py_dt  # mypy: ✅, pyright: ✅
bar: Timestamp[Any] = np.datetime64(py_dt)  # mypy: ❌, pyright: ✅
full output
❯ uv run basedpyright src/numpy_play/gh_28257.py
0 errors, 0 warnings, 0 notes

❯ uv run pyright src/numpy_play/gh_28257.py
0 errors, 0 warnings, 0 informations 

❯ uv run mypy src/numpy_play/gh_28257.py
src/numpy_play/gh_28257.py:16: error: Incompatible types in assignment (expression has type "datetime64[datetime]", variable has type "Timestamp[Any]")  [assignment]
src/numpy_play/gh_28257.py:16: note: Following member(s) of "datetime64[datetime]" have conflicts:
src/numpy_play/gh_28257.py:16: note:     Expected:
src/numpy_play/gh_28257.py:16: note:         @overload
src/numpy_play/gh_28257.py:16: note:         def __sub__(self, datetime64[datetime], /) -> Any
src/numpy_play/gh_28257.py:16: note:         @overload
src/numpy_play/gh_28257.py:16: note:         def __sub__(self, Any, /) -> datetime64[datetime]
src/numpy_play/gh_28257.py:16: note:     Got:
src/numpy_play/gh_28257.py:16: note:         @overload
src/numpy_play/gh_28257.py:16: note:         def __sub__(self, int | integer[Any] | numpy.bool[builtins.bool], /) -> datetime64[datetime]
src/numpy_play/gh_28257.py:16: note:         @overload
src/numpy_play/gh_28257.py:16: note:         def __sub__(self, datetime, /) -> timedelta
src/numpy_play/gh_28257.py:16: note:         @overload
src/numpy_play/gh_28257.py:16: note:         def __sub__(self, timedelta64[int], /) -> datetime64[int]
src/numpy_play/gh_28257.py:16: note:         @overload
src/numpy_play/gh_28257.py:16: note:         def __sub__(self, timedelta64[timedelta], /) -> datetime64[datetime]
src/numpy_play/gh_28257.py:16: note:         @overload
src/numpy_play/gh_28257.py:16: note:         def __sub__(self, datetime64[int], /) -> timedelta64[int]
src/numpy_play/gh_28257.py:16: note:         @overload
src/numpy_play/gh_28257.py:16: note:         def __sub__(self, timedelta64[int], /) -> datetime64[date | int]
src/numpy_play/gh_28257.py:16: note:         @overload
src/numpy_play/gh_28257.py:16: note:         def __sub__(self, timedelta64[timedelta], /) -> datetime64[date]
src/numpy_play/gh_28257.py:16: note:         @overload
src/numpy_play/gh_28257.py:16: note:         def __sub__(self, datetime64[date], /) -> timedelta64[timedelta]
src/numpy_play/gh_28257.py:16: note:         @overload
src/numpy_play/gh_28257.py:16: note:         def __sub__(self, timedelta64[None], /) -> datetime64[None]
src/numpy_play/gh_28257.py:16: note:         @overload
src/numpy_play/gh_28257.py:16: note:         def __sub__(self, datetime64[None], /) -> timedelta64[None]
src/numpy_play/gh_28257.py:16: note:         @overload
src/numpy_play/gh_28257.py:16: note:         def __sub__(self, timedelta64[timedelta | int | None] | int | integer[Any] | builtins.bool | numpy.bool[builtins.bool], /) -> datetime64[date | int | None]
src/numpy_play/gh_28257.py:16: note:         @overload
src/numpy_play/gh_28257.py:16: note:         def __sub__(self, datetime64[date | int | None], /) -> timedelta64[timedelta | int | None]
Found 1 error in 1 file (checked 1 source file)

But even if mypy is reporting a false positive here (which I'm not sure of), if flipping the overloads like you suggested doesn't break anything and resolves this issue, then I don't see the harm in doing that.

@jorenham jorenham self-assigned this Jan 31, 2025
@jorenham
Member

Why are the timedelta64 and datetime64 generic with upper bound None | int | datetime and None | int | timedelta respectively?

The datetime64 type param is a bit broader actually, and also accepts dt.date.

@jorenham
Member
jorenham commented Jan 31, 2025

I did some investigating, and I think that mypy is behaving correctly here, and that Pyright isn't. But I also think that datetime64.__sub__ is behaving correctly here, because of this (11th) overload:

numpy/numpy/__init__.pyi

Lines 4604 to 4605 in 6bc9058

@overload
def __sub__(self: datetime64[dt.date], x: timedelta64[dt.timedelta], /) -> datetime64[dt.date]: ...

The required context is that dt.datetime <: dt.date, and that subtracting datetime64[dt.date] from itself, results in a dt.datetime. But if you subtract a timedelta64[dt.timedelta] from a datetime64[dt.date], then the result could be either a datetime64[dt.date] or a datetime64[dt.datetime]. So because datetime64[dt.date | dt.datetime] is equivalent to datetime64[dt.date], this overload is correct, and -> datetime64[dt.date] isn't assignable to -> Self.

So to conclude: np.datetime64 is broader than dt.datetime, and also broader than dt.date. It therefore isn't assignable to Timestamp[np.timedelta64].
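The subtype relationship behind this argument can be seen with the standard library alone. In this sketch the helper `shift_back` is hypothetical: it is annotated with the broader `dt.date`, so a type checker only knows that it returns a `dt.date`, even when the runtime value is a `dt.datetime`.

```python
import datetime as dt


def shift_back(value: dt.date, delta: dt.timedelta) -> dt.date:
    """Subtract a duration; the static return type is the broad dt.date."""
    return value - delta


one_day = dt.timedelta(days=1)
from_date = shift_back(dt.date(2025, 1, 31), one_day)
from_datetime = shift_back(dt.datetime(2025, 1, 31), one_day)

# dt.datetime is a subclass of dt.date, so both calls type-check,
# but the runtime result keeps the more specific class.
is_subclass = issubclass(dt.datetime, dt.date)
```

This mirrors why an overload that returns `datetime64[dt.date]` covers both cases yet cannot be treated as returning `Self`.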

@jorenham jorenham added the 57 - Close? Issues which may be closable unless discussion continued label Jan 31, 2025
@jorenham jorenham closed this as completed Feb 1, 2025
@jorenham jorenham removed the 57 - Close? Issues which may be closable unless discussion continued label Feb 1, 2025
@randolf-scholz
Author
randolf-scholz commented Feb 5, 2025

@jorenham Please reopen. I ran some more tests and there do seem to be some defects in the current overloads.

  1. datetime64[dt.datetime] - dt.timedelta supported at runtime, but raises [operator]
  2. datetime64[dt.date] - dt.timedelta supported at runtime, but raises [operator]
  3. datetime64[dt.date] - datetime64[int] infers timedelta64 (expected: timedelta64[int])
  4. datetime64[dt.date] - timedelta64[int] infers datetime64[date | int] (expected: datetime64[int])
  5. datetime64[int] - datetime64[None] infers timedelta64[int] (expected: timedelta64[None])
  6. datetime64[int] - timedelta64[None] infers datetime64[int] (expected: datetime64[None])

I was able to fix all of these issues, as well as my Protocol, by very carefully redesigning the overloads. The main issue was not the overload you pointed out, but indeed the order of overloads. The following works:

@overload  # timestamp[date] - timestamp[date] = duration[time]
def __sub__(self: datetime64[dt.date], x: datetime64[dt.date], /) -> timedelta64[dt.timedelta]: ...
@overload  # timestamp[date] - duration[time] = timestamp[date]
def __sub__(self: datetime64[_date], x: timedelta64[dt.timedelta], /) -> datetime64[_date]: ...
@overload  # timestamp[date] - py_timestamp[date] = py_duration
def __sub__(self: datetime64[_date], x: _date, /) -> dt.timedelta: ...
@overload  # timestamp[date] - py_duration = py_timestamp[date]
def __sub__(self: datetime64[_date], x: dt.timedelta, /) -> _date: ...
# ...

but if you were to swap, say 2 and 3, then you get:

tmp.py:173: note: Following member(s) of "datetime64[datetime]" have conflicts:
tmp.py:173: note:     Expected:
tmp.py:173: note:         @overload
tmp.py:173: note:         def __sub__(self, datetime64[datetime], /) -> Any
tmp.py:173: note:         @overload
tmp.py:173: note:         def __sub__(self, Any, /) -> datetime64[datetime]
tmp.py:173: note:     Got:
tmp.py:173: note:         @overload
tmp.py:173: note:         def __sub__(self, datetime64[date], /) -> timedelta64[timedelta]
tmp.py:173: note:         @overload
tmp.py:173: note:         def __sub__(self, datetime, /) -> timedelta
tmp.py:173: note:         @overload
tmp.py:173: note:         def __sub__(self, timedelta64[timedelta], /) -> datetime64[datetime]
tmp.py:173: note:         @overload
tmp.py:173: note:         def __sub__(self, timedelta, /) -> datetime

I am not exactly sure what is happening here, but I think it has to do with how mypy tries to match the Any. I'm not sure whether it is a bug or intended either.

I attached a test file (maybe we can integrate this as a unit test). The overloads I came up with that make it pass:

# Priority 1a: operations against numpy time types.
@overload  # timestamp[date] - timestamp[date] = duration[time]
def __sub__(self: datetime64[dt.date], x: datetime64[dt.date], /) -> timedelta64[dt.timedelta]: ...
@overload  # timestamp[date] - duration[time] = timestamp[date]
def __sub__(self: datetime64[_date], x: timedelta64[dt.timedelta], /) -> datetime64[_date]: ...
# Priority 1b: operations against python time types.
@overload  # timestamp[date] - py_timestamp[date] = py_duration
def __sub__(self: datetime64[_date], x: _date, /) -> dt.timedelta: ...
@overload  # timestamp[date] - py_duration = py_timestamp[date]
def __sub__(self: datetime64[_date], x: dt.timedelta, /) -> _date: ...

# Priority 2: (T - NaT) and (NaT - T) always result in NaT
@overload  # timestamp[nan] - timestamp[T] = duration[nan]
def __sub__(self: datetime64[None], x: datetime64, /) -> timedelta64[None]: ...
@overload  # timestamp[nan] - duration[unknown] = timestamp[nan]
def __sub__(self: datetime64[None], x: timedelta64, /) -> datetime64[None]: ...
@overload  # timestamp[unknown] - timestamp[nan] = duration[nan]
def __sub__(self, x: datetime64[None], /) -> timedelta64[None]: ...
@overload  # timestamp[unknown] - duration[nan] = timestamp[nan]
def __sub__(self, x: timedelta64[None], /) -> datetime64[None]: ...

# Priority 3: integer resolution of any operand causes upcasting
@overload  # timestamp[int] - timestamp[unknown] = duration[int]
def __sub__(self: datetime64[int], x: datetime64, /) -> timedelta64[int]: ...
@overload  # timestamp[int] - duration[unknown] = timestamp[int]
def __sub__(self: datetime64[int], x: timedelta64, /) -> datetime64[int]: ...
@overload  # timestamp[unknown] - timestamp[int] = duration[int]
def __sub__(self, x: datetime64[int], /) -> timedelta64[int]: ...
@overload  # timestamp[unknown] - duration[int] = timestamp[int]
def __sub__(self, x: timedelta64[int], /) -> datetime64[int]: ...

# Priority 4: other
@overload  # timestamp[unknown] - timestamp[date] = duration[time]
def __sub__(self, x: datetime64[dt.date], /) -> timedelta64[dt.timedelta]: ...
@overload  # timestamp[unknown] - duration[time] = timestamp[date]
def __sub__(self, x: timedelta64[dt.timedelta], /) -> datetime64[dt.date]: ...
@overload  # timestamp[T] - integer = timestamp[T]
def __sub__(self, x: int | integer[Any] | np.bool, /) -> Self: ...
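The priority rules in these overloads can be summarised as a small runtime model. This is only a toy sketch of the intended outcomes, not how a type checker resolves overloads; the function `result_kind` and the kind labels "std", "int", and "nat" are hypothetical names introduced for illustration.

```python
KINDS = ("std", "int", "nat")  # std = date/datetime/timedelta-backed


def result_kind(lhs: str, rhs: str) -> str:
    """Kind of the result's type parameter for `lhs - rhs`."""
    if lhs not in KINDS or rhs not in KINDS:
        raise ValueError(f"unknown kind: {lhs!r} or {rhs!r}")
    if "nat" in (lhs, rhs):
        return "nat"  # Priority 2: NaT on either side always gives NaT
    if "int" in (lhs, rhs):
        return "int"  # Priority 3: integer resolution causes upcasting
    return "std"      # Priority 1: both operands map to stdlib types


# A few combinations from the list of defects above.
date_minus_int = result_kind("std", "int")
int_minus_nat = result_kind("int", "nat")
date_minus_date = result_kind("std", "std")
```

The NaT check has to come before the int check, which is the same reason the NaT overloads are listed before the int overloads.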
TEST SUITE
import datetime as dt
from typing import Protocol, Self, assert_type, overload

import numpy as np

py_date = dt.date(year=2025, month=1, day=31)
py_dt = dt.datetime(year=2025, month=1, day=31, hour=1, minute=23, second=45)
py_td = dt.timedelta(seconds=37)

np_dt = np.datetime64(py_dt)
np_dt_date = np.datetime64(py_date)
np_dt_int = np.datetime64(100, "ns")
np_dt_nat = np.datetime64(None)

np_td = np.timedelta64(py_td)
np_td_int = np.timedelta64(100, "ns")
np_td_nat = np.timedelta64(None)

# static checks
assert_type(py_date, dt.date)
assert_type(py_dt, dt.datetime)
assert_type(py_td, dt.timedelta)
# np_datetime64
assert_type(np_dt, "np.datetime64[dt.datetime]")
assert_type(np_dt_date, "np.datetime64[dt.date]")
assert_type(np_dt_int, "np.datetime64[int]")
assert_type(np_dt_nat, "np.datetime64[None]")
# np_timedelta64
assert_type(np_td, "np.timedelta64[dt.timedelta]")
assert_type(np_td_int, "np.timedelta64[int]")
assert_type(np_td_nat, "np.timedelta64[None]")

# ----------- runtime checks -------------
# fmt: off
# py_date
assert type(py_date - py_td) is dt.date
assert type(py_date - py_date) is dt.timedelta
# py_dt
assert type(py_dt - py_td) is dt.datetime
assert type(py_dt - py_dt) is dt.timedelta
# np_date
assert type(np_dt_date - py_date)    is dt.timedelta
# assert type(np_dt_date - py_dt)      is dt.timedelta
assert type(np_dt_date - py_td)      is dt.date  # ❌ raises [operator]
assert type(np_dt_date - np_dt)      is np.timedelta64
assert type(np_dt_date - np_dt_date) is np.timedelta64
assert type(np_dt_date - np_dt_int)  is np.timedelta64
assert type(np_dt_date - np_dt_nat)  is np.timedelta64
assert type(np_dt_date - np_td)      is np.datetime64
assert type(np_dt_date - np_td_int)  is np.datetime64
assert type(np_dt_date - np_td_nat)  is np.datetime64
# np_dt
# assert type(np_dt - py_date)    is dt.timedelta
assert type(np_dt - py_dt)      is dt.timedelta
assert type(np_dt - py_td)      is dt.datetime  # ❌ raises [operator]
assert type(np_dt - np_dt)      is np.timedelta64
assert type(np_dt - np_dt_date) is np.timedelta64
assert type(np_dt - np_dt_int)  is np.timedelta64
assert type(np_dt - np_dt_nat)  is np.timedelta64
assert type(np_dt - np_td)      is np.datetime64
assert type(np_dt - np_td_int)  is np.datetime64
assert type(np_dt - np_td_nat)  is np.datetime64
# np_dt_int
# assert type(np_dt_int - py_date)    is dt.timedelta
# assert type(np_dt_int - py_dt)      is dt.timedelta
# assert type(np_dt_int - py_td)      is dt.datetime
assert type(np_dt_int - np_dt)      is np.timedelta64
assert type(np_dt_int - np_dt_date) is np.timedelta64
assert type(np_dt_int - np_dt_int)  is np.timedelta64
assert type(np_dt_int - np_dt_nat)  is np.timedelta64
assert type(np_dt_int - np_td)      is np.datetime64
assert type(np_dt_int - np_td_int)  is np.datetime64
assert type(np_dt_int - np_td_nat)  is np.datetime64
# np_nat
# assert type(np_dt_nat - py_date)    is dt.timedelta
# assert type(np_dt_nat - py_dt)      is dt.timedelta
# assert type(np_dt_nat - py_td)      is dt.datetime
assert type(np_dt_nat - np_dt)      is np.timedelta64
assert type(np_dt_nat - np_dt_date) is np.timedelta64
assert type(np_dt_nat - np_dt_int)  is np.timedelta64
assert type(np_dt_nat - np_dt_nat)  is np.timedelta64
assert type(np_dt_nat - np_td)      is np.datetime64
assert type(np_dt_nat - np_td_int)  is np.datetime64
assert type(np_dt_nat - np_td_nat)  is np.datetime64

# ---------- static checks ----------

# py_date
assert_type(py_date - py_td, dt.date)
assert_type(py_date - py_date, dt.timedelta)
# py_dt
assert_type(py_dt - py_td, dt.datetime)
assert_type(py_dt - py_dt, dt.timedelta)
# np_dt
# assert_type(np_dt - py_date,    dt.timedelta)
assert_type(np_dt - py_dt,      dt.timedelta)
assert_type(np_dt - py_td,      dt.datetime)  # ❌ raises [operator]
assert_type(np_dt - np_dt,      "np.timedelta64[dt.timedelta]")
assert_type(np_dt - np_dt_date, "np.timedelta64[dt.timedelta]")
assert_type(np_dt - np_dt_int,  "np.timedelta64[int]")
assert_type(np_dt - np_dt_nat,  "np.timedelta64[None]")
assert_type(np_dt - np_td,      "np.datetime64[dt.datetime]")
assert_type(np_dt - np_td_int,  "np.datetime64[int]")
assert_type(np_dt - np_td_nat,  "np.datetime64[None]")
# np_date
assert_type(np_dt_date - py_date,    dt.timedelta)
# assert_type(np_dt_date - py_dt,      dt.timedelta)
assert_type(np_dt_date - py_td,      dt.date)  # ❌ raises [operator]
assert_type(np_dt_date - np_dt,      "np.timedelta64[dt.timedelta]")
assert_type(np_dt_date - np_dt_date, "np.timedelta64[dt.timedelta]")
assert_type(np_dt_date - np_dt_int,  "np.timedelta64[int]")
assert_type(np_dt_date - np_dt_nat,  "np.timedelta64[None]")
assert_type(np_dt_date - np_td,      "np.datetime64[dt.date]")
assert_type(np_dt_date - np_td_int,  "np.datetime64[int]")
assert_type(np_dt_date - np_td_nat,  "np.datetime64[None]")
# np_dt_int
# assert_type(np_dt_int - py_date,    dt.timedelta)
# assert_type(np_dt_int - py_dt,      dt.timedelta)
# assert_type(np_dt_int - py_td,      dt.date)  # ❌ raises [operator]
assert_type(np_dt_int - np_dt,      "np.timedelta64[int]")
assert_type(np_dt_int - np_dt_date, "np.timedelta64[int]")
assert_type(np_dt_int - np_dt_int,  "np.timedelta64[int]")
assert_type(np_dt_int - np_dt_nat,  "np.timedelta64[None]")
assert_type(np_dt_int - np_td,      "np.datetime64[int]")
assert_type(np_dt_int - np_td_int,  "np.datetime64[int]")
assert_type(np_dt_int - np_td_nat,  "np.datetime64[None]")
# np_nat
assert_type(np_dt_nat - np_dt,      "np.timedelta64[None]")
assert_type(np_dt_nat - np_dt_date, "np.timedelta64[None]")
assert_type(np_dt_nat - np_dt_int,  "np.timedelta64[None]")
assert_type(np_dt_nat - np_dt_nat,  "np.timedelta64[None]")
assert_type(np_dt_nat - np_td,      "np.datetime64[None]")
assert_type(np_dt_nat - np_td_int,  "np.datetime64[None]")
assert_type(np_dt_nat - np_td_nat,  "np.datetime64[None]")
# fmt: on


class Timedelta(Protocol):
    def __add__(self, other: Self, /) -> Self: ...
    def __radd__(self, other: Self, /) -> Self: ...
    def __sub__(self, other: Self, /) -> Self: ...
    def __rsub__(self, other: Self, /) -> Self: ...


class Timestamp[TD: Timedelta](Protocol):
    @overload
    def __sub__(self, other: Self, /) -> TD: ...
    @overload
    def __sub__(self, other: TD, /) -> Self: ...


class SupportsSubSelf[TD: Timedelta](Protocol):
    def __sub__(self, other: Self, /) -> TD: ...


class SupportsSubTD[TD: Timedelta](Protocol):
    def __sub__(self, other: TD, /) -> Self: ...


td: Timedelta = np_td

_a1: SupportsSubTD = py_dt  # ✅
_a2: SupportsSubTD = np_dt  # ✅
_a3: SupportsSubTD[np.timedelta64] = np_dt  # ❌

_b1: SupportsSubSelf = py_dt  # ✅
_b2: SupportsSubSelf = np_dt  # ✅
_b3: SupportsSubSelf[np.timedelta64] = np_dt  # ❌

# w/o generic
_5: Timestamp = py_dt  # ✅
_6: Timestamp = np_dt  # ❌ (not fixed by reorder)
# w/ generic
_7: Timestamp[dt.timedelta] = py_dt  # ✅
_8: Timestamp[np.timedelta64] = np_dt  # ❌ (not fixed by reorder)
# w/ nested generic
_9: Timestamp[np.timedelta64[dt.timedelta]] = np_dt  # ❌ (fixed by reorder)


def infer_td_type[TD: Timedelta](x: Timestamp[TD]) -> Timestamp[TD]:
    return x

# mypy fails these but pyright passes
assert_type(infer_td_type(np_dt), "Timestamp[np.timedelta64[dt.timedelta]]")
assert_type(infer_td_type(np_dt_int), "Timestamp[np.timedelta64[int]]")
assert_type(infer_td_type(np_dt_nat), "Timestamp[np.timedelta64[None]]")

datetime_sub_test.txt

@randolf-scholz
Author

Here, _date = TypeVar("_date", bound=dt.date).

@jorenham jorenham reopened this Feb 5, 2025
@jorenham
Member
jorenham commented Feb 5, 2025

Good catch!
Could you submit this as a PR? Your test suite could go into numpy/typing/tests/data/reveal, either as a new .pyi or appended to e.g. arithmetic.pyi

@jorenham jorenham removed their assignment Feb 5, 2025
@jorenham
Member
jorenham commented Feb 8, 2025

I just tried out your suggestion @randolf-scholz. Once I added an additional overload at the bottom for the datetime64[?] - datetime64[?] case, the current tests all passed.

But mypy reports several overload-overlap errors, which are probably the same ones that you mentioned. But I noticed that Pyright doesn't report any errors here, so I'm guessing that they're false-positives.

    # Priority 1a: operations against numpy time types.
    @overload  # M[date] - M[date] -> m[timedelta]
    def __sub__(self: datetime64[dt.date], x: datetime64[dt.date], /) -> timedelta64[dt.timedelta]: ...
    @overload  # M[date] - m[timedelta] -> M[date]
    def __sub__(self: datetime64[_DateT], x: timedelta64[dt.timedelta], /) -> datetime64[_DateT]: ...

    # Priority 1b: operations against python time types.
    @overload  # M[datetime] - datetime -> timedelta
    def __sub__(self: datetime64[dt.datetime], x: dt.datetime, /) -> dt.timedelta: ...
    @overload  # M[date] - date -> timedelta
    def __sub__(self: datetime64[dt.date], x: dt.date, /) -> dt.timedelta: ...
    @overload  # M[date] - timedelta -> date
    def __sub__(self: datetime64[_DateT], x: dt.timedelta, /) -> _DateT: ...

    # Priority 2: (T - NaT) and (NaT - T) always result in NaT
    @overload  # M[NaT] - M[?] -> m[NaT]  # ❌ 6 overlaps 12 and 14
    def __sub__(self: datetime64[None], x: datetime64, /) -> timedelta64[None]: ...
    @overload  # M[NaT] - m[?] -> M[NaT]  # ❌ 7 overlaps 13 and 15
    def __sub__(self: datetime64[None], x: timedelta64, /) -> datetime64[None]: ...
    @overload  # M[?] - M[NaT] -> m[NaT]  # ❌ 8 overlaps 10
    def __sub__(self, x: datetime64[None], /) -> timedelta64[None]: ...
    @overload  # M[?] - m[NaT] -> M[NaT]  # ❌ 9 overlaps 11
    def __sub__(self, x: timedelta64[None], /) -> datetime64[None]: ...

    # Priority 3: integer resolution of any operand causes upcasting
    @overload  # M[int] - M[?] -> m[int]  # ❌ 10 overlaps 14
    def __sub__(self: datetime64[int], x: datetime64, /) -> timedelta64[int]: ...
    @overload  # M[int] - m[?] -> M[int]  # ❌ 11 overlaps 15
    def __sub__(self: datetime64[int], x: timedelta64, /) -> datetime64[int]: ...
    @overload  # M[?] - M[int] -> m[int]  # 12
    def __sub__(self, x: datetime64[int], /) -> timedelta64[int]: ...
    @overload  # M[?] - m[int] -> M[int]  # 13
    def __sub__(self, x: timedelta64[int], /) -> datetime64[int]: ...

    # Priority 4: unknown self type, but know outcome
    @overload  # M[?] - M[date] -> m[timedelta]  # 14
    def __sub__(self, x: datetime64[dt.date], /) -> timedelta64[dt.timedelta]: ...
    @overload  # M[?] - m[timedelta] -> M[date]  # 15
    def __sub__(self, x: timedelta64[dt.timedelta], /) -> datetime64[dt.date]: ...
    @overload  # M[T] - int-like -> M[T]
    def __sub__(self, x: _IntLike_co, /) -> Self: ...

    # Priority 5: both sides unknown
    @overload  # M[T] - M[?] -> m[?]
    def __sub__(self, x: datetime64, /) -> timedelta64: ...

I'll see if I can get your test suite to work as well

@jorenham
Member
jorenham commented Feb 8, 2025

I wasn't able to get it to work after all, @randolf-scholz. See https://github.com/jorenham/numpy/tree/typing/fix-28257 for the code. Feel free to fork or copy it, in case you wanna give it a try yourself.

@randolf-scholz
Author

Hm, I am looking at it and I notice two more things:

  1. timedelta64 is a subtype of np.integer, which needs to be accounted for in the overload self, other: int | np.integer | np.bool_
  2. What should happen when a more precise type meets an imprecise one?

So let's say we know x is a datetime64[dt.datetime], but we only know y is a np.timedelta64 (without given generic type)

Should x-y be inferred as either datetime64[dt.datetime] or datetime64[dt.date | int | None] or datetime64[Any]?

@randolf-scholz
Author
randolf-scholz commented Feb 9, 2025

This gets complicated by the fact that a default equal to the union type was added.

For example, a user might produce an x=np.datetime64[dt.datetime] and a y=np.timedelta64[dt.timedelta]. However, due to some missing type annotations, mypy might only be able to deduce np.timedelta64[Unknown], for which it then substitutes the default, which then down the line produces some false positives if x-y gets resolved as np.datetime64[dt.date | int | None].

On the other hand, if we resolve np.datetime64[dt.datetime] - np.timedelta64[Any] as np.datetime64[dt.datetime], this may result in some legitimate false negatives, for instance if y happens to produce NaT.

I would instead suggest resolving np.datetime64[dt.datetime] - np.timedelta64[Any] as np.datetime64[Any], and replacing the default value with Any.

@jorenham
Member
jorenham commented Feb 9, 2025
  1. timedelta64 is a subtype of np.integer, which needs to be accounted for in the overload self, other: int | np.integer | np.bool_

Only at runtime, but I plan on fixing this in the stubs (but in https://github.com/numpy/numtype first). So there's no need to take that into account (yet).

2. What should happen when a more precise type meets an imprecise one?

For me it helps to think of e.g. datetime64[int | None] as datetime64[int] | datetime64[None]. That way, the datetime64[int] | datetime64[None] minus e.g. timedelta64[int] will result in (datetime64[int] - timedelta64[int]) | (datetime64[None] - timedelta64[int]) -> datetime64[int] | datetime64[None] -> datetime64[int | None].
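This distribution over union members can be sketched as a toy model. It is an illustration under assumptions, not a description of any type checker: the names `sub_kind` and `distribute` and the labels "std", "int", and "nat" are hypothetical.

```python
from itertools import product


def sub_kind(lhs: str, rhs: str) -> str:
    """Result kind for a single pair of non-union operands."""
    if "nat" in (lhs, rhs):
        return "nat"
    if "int" in (lhs, rhs):
        return "int"
    return "std"


def distribute(lhs_kinds: set, rhs_kinds: set) -> set:
    """Apply the subtraction to every pair of members and union the results."""
    return {sub_kind(l, r) for l, r in product(lhs_kinds, rhs_kinds)}


# datetime64[int | None] - timedelta64[int] -> datetime64[int | None]
int_or_nat_minus_int = distribute({"int", "nat"}, {"int"})
```

Each member of the left-hand union is combined with each member of the right-hand union, and the set of outcomes forms the type parameter of the result.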

So let's say we know x is a datetime64[dt.datetime], but we only know y is a np.timedelta64 (without given generic type)

Should x-y be inferred as either datetime64[dt.datetime] or datetime64[dt.date | int | None] or datetime64[Any]?

At runtime, you can only subtract a timedelta64[None] or timedelta64[dt.timedelta] from a datetime64[dt.datetime]. So combining possible outcomes gives us datetime64[dt.datetime | None].

@jorenham
Member
jorenham commented Feb 9, 2025

I would suggest instead to revert to produce np.datetime64[dt.datetime] - np.timedelta64[Any] = np.datetime64[Any], and replacing the default value with Any.

I'm not a big fan of Any, because 1) it violates LSP and can therefore lead to type-unsafe situations, and 2) because the @overload behavior is unspecified, and can currently lead to different outcomes with different typecheckers.

If you encounter any annotations that lead to them being inferred as _[Unknown] in NumPy, then that's something we should fix. Opening an issue or submitting a PR for it will help a lot in that case.

@randolf-scholz
Author

What I meant with precise vs imprecise is the case when the type checker managed to resolve the generic type vs when it couldn't. Consider the following code (run against main branch):

import datetime as dt
from dataclasses import dataclass
from typing import assert_type, Any
import numpy as np

def my_untyped_fn():
    return dt.timedelta(seconds=2)

py_dt = dt.datetime(year=2025, month=2, day=1)
np_dt = np.datetime64(py_dt)  # resolved as np.datetime64[dt.datetime]
np_td = np.timedelta64(my_untyped_fn())  # resolved as np.timedelta64[Any]
np_td2: np.timedelta64 = np_td  # resolved as np.timedelta64

assert_type(np_dt, "np.datetime64[dt.datetime]")
assert_type(np_td, "np.timedelta64[Any]")
assert_type(np_td2, "np.timedelta64[dt.timedelta | int | None]")

@dataclass
class Measurement:
    timestamp: np.datetime64[dt.datetime]

Measurement(np_dt - np_td)  # ❌: got datetime64[int]
Measurement(np_dt - np_td2)  #  ❌: got datetime64[date | int | None]

Existing libraries might lack type hints, leading to timedelta64[Any] or might have imprecise type hints such as just annotating np.timedelta64 which yields the default value for the generics.

In both cases, this code produces false positive error warnings.


The point is by setting the default value as dt.date | int | None and dt.timedelta | int | None, it means that when a user annotates something as

value: np.datetime64

it's essentially equivalent to annotating as

value: np.datetime64[dt.date] | np.datetime64[int] | np.datetime64[None]

Which may be surprising to many users and will cause issues when they try to forward this variable to a function that is annotated more precisely, like the Measurement class above.

@randolf-scholz
Author

Generally with the python type system, given its gradual nature, my expectation (and I think that of many other users as well) is that removing type hints / making type hints less precise (like being lazy and just annotating np.timedelta64 instead of np.timedelta64[dt.timedelta], or not annotating at all) generally shouldn't cause the type checker to produce additional errors.

That's the rationale for Any.

@jorenham
Member
jorenham commented Feb 9, 2025

Those are pretty good points; it might indeed be better to have them default to Any.

But the thing I'm worried about is that this will lead to situations like #28240, #28193, and #28017, where types whose type argument is Any that are used in overloads will cause strange mypy behavior.

Such outcomes are difficult to predict, and could unintentionally lead to breaking changes (often only in mypy). I don't remember seeing anything like that being mentioned in the mypy 1.15.0 release notes either, so I expect this to still be the case. So it's probably better to change this in the numpy stubs of numtype instead, see numpy/numtype#95.
