-
-
Notifications
You must be signed in to change notification settings - Fork 18.7k
BUG/API: to_datetime preserves UTC offsets when parsing datetime strings #21822
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
ac5a3d1
b81a8e9
6bf46a8
678b337
1917148
581a33e
a1bc8f9
80042e6
7c4135e
0651416
bacb6e3
a2f4aad
d48f341
7efb25c
1d527ff
f89d6b6
d91c63f
1054e8b
d9cdc91
7d04613
031284c
749e62e
23cbf75
1e6f87a
7539bcf
c1f51cd
db75a24
4733ac5
e3db735
2fa681f
5f36c98
d7ff275
4ff7cb3
8463d91
dddc6b3
cca3983
e441be0
a8a65f7
75f9fd9
6052475
f916c69
807a251
1cbd9b9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
- Loading branch information
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -456,14 +456,15 @@ cpdef array_to_datetime(ndarray[object] values, errors='raise', | |
is encountered | ||
|
||
Also returns a pytz.FixedOffset if an array of strings with the same | ||
timezone offset if passed and utc=True is not passed | ||
timezone offset is passed and utc=True is not passed. Otherwise, None | ||
is returned | ||
|
||
Handles datetime.date, datetime.datetime, np.datetime64 objects, numeric, | ||
strings | ||
|
||
Returns | ||
------- | ||
(ndarray, timezone offset) | ||
tuple (ndarray, timezone offset) | ||
""" | ||
cdef: | ||
Py_ssize_t i, n = len(values) | ||
|
@@ -481,18 +482,12 @@ cpdef array_to_datetime(ndarray[object] values, errors='raise', | |
bint is_coerce = errors=='coerce' | ||
_TSObject _ts | ||
int out_local=0, out_tzoffset=0 | ||
# Can't directly create a ndarray[int] out_local, | ||
# since most np.array constructors expect a long dtype | ||
# while _string_to_dts expects purely int | ||
# maybe something I am missing? | ||
ndarray[int64_t] out_local_values | ||
ndarray[int64_t] out_tzoffset_vals | ||
|
||
# specify error conditions | ||
assert is_raise or is_ignore or is_coerce | ||
|
||
try: | ||
out_local_values = np.empty(n, dtype=np.int64) | ||
out_tzoffset_vals = np.empty(n, dtype=np.int64) | ||
result = np.empty(n, dtype='M8[ns]') | ||
iresult = result.view('i8') | ||
|
@@ -630,7 +625,6 @@ cpdef array_to_datetime(ndarray[object] values, errors='raise', | |
# No error raised by string_to_dts, pick back up | ||
# where we left off | ||
out_tzoffset_vals[i] = out_tzoffset | ||
out_local_values[i] = out_local | ||
value = dtstruct_to_dt64(&dts) | ||
if out_local == 1: | ||
seen_datetime_offset = 1 | ||
|
@@ -685,12 +679,12 @@ cpdef array_to_datetime(ndarray[object] values, errors='raise', | |
# object path where an array of datetimes | ||
# (with individual datutil.tzoffsets) are returned | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. typo datutil |
||
|
||
# Faster to compare integers than to compare objects | ||
# Faster to compare integers than to compare pytz objects | ||
is_same_offsets = (out_tzoffset_vals[0] == out_tzoffset_vals).all() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There may be a perf tradeoff here, specifically in the case where we have all-strings, all of which are ISO, but that don't have matching timezones. Going through the The various paths (including |
||
if not is_same_offsets: | ||
raise TypeError | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This (pre-existing) pattern is pretty opaque to a first-time reader. What if instead of raising |
||
else: | ||
tz_out = pytz.FixedOffset(out_tzoffset_vals[0]) | ||
tz_out = pytz.FixedOffset(out_tzoffset) | ||
|
||
return result, tz_out | ||
except OutOfBoundsDatetime: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle other tzinfos could be returned, specifically if it falls back to dateutil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is specifying that
array_to_datetime
function itself can return apytz.FixedOffset
orNone
from the C parser output. If it goes through the dateutil parser in the non-ValueError
branch, I don't think there's a way to recover the timezone?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When
parse_datetime_string
is called if there's a timezone it should return a tz-awaredatetime
object, so the tzinfo can be pulled off that can't it?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right that's true, good catch. Sure I will try to add some tests and functionality to hit the dateutil parser before the object branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I am using a
set
to store the parsed timezone offsets (which should be more performant that using an array in theory / I was having some trouble using an array due to duplicates), howeverdateutil.tz.tzoffset
s cannot be hashed: dateutil/dateutil#792So instead, I am saving the
total_seconds()
of the dateutil tzoffset in the set instead and reconstructing the offsets aspytz.FixedOffset
sThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add Parameters here