Description
Motivation for this proposal: for full dtype support with nullable dtypes, we also need a nullable version of the datetime-like dtypes. For backwards compatibility, this requires a new dtype (as we did for the other nullable dtypes), and that is what the proposal below describes. And since we are creating a new dtype anyway, I think it is the perfect opportunity to choose a different default resolution (e.g. microsecond instead of nanosecond unit).
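One commonly cited reason to prefer a coarser default is range. Timestamps stored as signed 64-bit integer counts of the unit since the 1970 epoch (as NumPy's datetime64 does) cover a shorter span the finer the unit is. The rough check below assumes a 365.25-day year. For nanoseconds it gives the familiar ~292 years on either side of 1970 (roughly 1677 to 2262), while microseconds reach roughly 292,000 years:

```python
# Approximate span of a signed 64-bit integer timestamp, per unit,
# measured in years on either side of the 1970-01-01 epoch.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year, an approximation

TICKS_PER_SECOND = {
    "s": 1,
    "ms": 1_000,
    "us": 1_000_000,
    "ns": 1_000_000_000,
}


def span_years(unit: str) -> float:
    """Years representable on each side of the epoch with int64 ticks."""
    max_ticks = 2**63 - 1
    return max_ticks / TICKS_PER_SECOND[unit] / SECONDS_PER_YEAR


for unit in TICKS_PER_SECOND:
    print(f"{unit:>2}: +/- {span_years(unit):,.0f} years")
```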
Summary: This proposal puts forward a new TimestampDtype, a nullable extension dtype to hold timestamp data:
- A new timestamp data type that follows the pattern of the nullable dtypes (e.g. integer, boolean) with consistent missing value behaviour.
- A parameterized data type with support for multiple resolutions (from seconds through nanoseconds) and, optionally, a time zone (unifying the tz-naive and tz-aware dtypes into a single ExtensionDtype).
- The new data type can have a better default resolution (e.g. microseconds instead of nanoseconds).
- I suggest using "timestamp" for the dtype name, because 1) we need a different name to differentiate it from "datetime64" anyway, and 2) this is then internally consistent with our Timestamp scalar. But an alternative could also be "Datetime64" (capitalized).
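To make the parameterization concrete, here is a minimal pure-Python sketch of how a resolution and an optional time zone could combine into one dtype with a string alias. The class, the `from_string` helper and the alias grammar are assumptions inferred from the `timestamp[us]` / `timestamp[ns, tz=Europe/Brussels]` reprs in the snippet below; this is not pandas' ExtensionDtype API and holds no data.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Resolutions mentioned in the proposal: seconds through nanoseconds.
_UNITS = ("s", "ms", "us", "ns")

# Hypothetical alias grammar: "timestamp", "timestamp[us]" or
# "timestamp[ns, tz=Europe/Brussels]".
_ALIAS = re.compile(
    r"^timestamp(?:\[(?P<unit>s|ms|us|ns)(?:,\s*tz=(?P<tz>[^\]]+))?\])?$"
)


@dataclass(frozen=True)
class TimestampDtype:
    """Hypothetical sketch: one dtype for both tz-naive and tz-aware data."""

    unit: str = "us"  # proposed default resolution (instead of "ns")
    tz: Optional[str] = None  # None means tz-naive

    def __post_init__(self) -> None:
        if self.unit not in _UNITS:
            raise ValueError(f"unsupported unit {self.unit!r}")

    @property
    def name(self) -> str:
        # Render the alias in the same form as the reprs in the proposal.
        if self.tz is None:
            return f"timestamp[{self.unit}]"
        return f"timestamp[{self.unit}, tz={self.tz}]"

    @classmethod
    def from_string(cls, alias: str) -> "TimestampDtype":
        # Parse an alias back into its parameters; a bare "timestamp"
        # falls back to the default unit.
        match = _ALIAS.match(alias)
        if match is None:
            raise TypeError(f"cannot construct a TimestampDtype from {alias!r}")
        unit = match.group("unit") or "us"
        tz = match.group("tz")
        return cls(unit=unit, tz=tz.strip() if tz else None)
```

For example, `TimestampDtype().name` gives `"timestamp[us]"`, and `TimestampDtype.from_string("timestamp[ns, tz=Europe/Brussels]")` yields a dtype with `unit="ns"` and `tz="Europe/Brussels"`. Keeping `tz` as a plain string keeps the sketch dependency-free; a real implementation would presumably validate it against a time zone database.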
Full version at https://docs.google.com/document/d/1uCdxjlYAafdHD7f57kpkPsJV2Q9Oaxkg1a8V5steMBM/edit?usp=sharing
This would address #7307
Small illustrative code snippet:
```python
>>> s
0    2020-01-01 00:00:00
1                   <NA>
2    2020-01-01 02:00:00
dtype: timestamp[us]
>>> s_tz
0    2020-01-01 00:00:00+01:00
1    2020-01-01 01:00:00+01:00
2    2020-01-01 02:00:00+01:00
dtype: timestamp[ns, tz=Europe/Brussels]
```
Looking forward to your thoughts / comments.
cc @pandas-dev/pandas-core @pandas-dev/pandas-triage