8000 REF/ENH: Constructors for DatetimeArray/TimedeltaArray by jbrockmendel · Pull Request #23493 · pandas-dev/pandas · GitHub
[go: up one dir, main page]

Skip to content

REF/ENH: Constructors for DatetimeArray/TimedeltaArray #23493

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 20 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
implement basic constructor tests, fix just enough of the comparisons…
… to pass them
  • Loading branch information
jbrockmendel committed Nov 4, 2018
commit 5f92cfa55865dc571dccb5dc8849565d1410e76e
9 changes: 7 additions & 2 deletions pandas/core/arrays/datetimes.py
10000
Original file line number Diff line number Diff line change
Expand Up @@ -119,7 +119,8 @@ def wrapper(self, other):
if isinstance(other, list):
# FIXME: This can break for object-dtype with mixed types
other = type(self)(other)
elif not isinstance(other, (np.ndarray, ABCIndexClass, ABCSeries)):
elif not isinstance(other, (np.ndarray, ABCIndexClass, ABCSeries,
DatetimeArrayMixin)):
# Following Timestamp convention, __eq__ is all-False
# and __ne__ is all True, others raise TypeError.
return ops.invalid_comparison(self, other, op)
Expand Down Expand Up @@ -204,6 +205,7 @@ def __new__(cls, values, freq=None, tz=None, dtype=None, copy=False):
# e.g. DatetimeArray, DatetimeIndex
tz = values.tz

# TODO: what about if freq == 'infer'?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then we should also get the freq from the values as a "cheap" inference? Or would there be cases were an inferred frequency can be different than the actual frequency?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then we should also get the freq from the values as a "cheap" inference?

That's what I'm thinking, yah

if freq is None and hasattr(values, "freq"):
# i.e. DatetimeArray, DatetimeIndex
freq = values.freq
Expand All @@ -219,7 +221,10 @@ def __new__(cls, values, freq=None, tz=None, dtype=None, copy=False):
.format(cls=cls.__name__, data=repr(values)))
elif isinstance(values, ABCSeries):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would get out the _values, and then treat that the same as directly passing a DatetimeIndex/DatetimeArray ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll see if there is a graceful way to do this in the next pass (if I ever manage to catch up with all these comments!)

# extract nanosecond unix timestamps
values = values._values.asi8
if tz is None:
# TODO: Try to do this in just one place
tz = values.dt.tz
values = np.array(values.view('i8'))
elif isinstance(values, DatetimeArrayMixin):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DatetimeArrayMixin -> cls?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And you don't need to get the tz in this case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DatetimeArrayMixin -> cls?

No. For the moment we are still using inheritance, so this would mess up for DatetimeIndex == DatetimeArray. When we change to composition this check will have to become isinstance(values, (DatetimeArray, ABCDatetimeIndex))

# extract nanosecond unix timestamps
values = values.asi8
Expand Down
74 changes: 74 additions & 0 deletions pandas/tests/arrays/test_datetimes.py
A42E
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
"""
Tests for DatetimeArray
"""
import pytest
import numpy as np

from pandas.core.arrays import DatetimeArrayMixin as DatetimeArray

import pandas as pd
import pandas.util.testing as tm




class TestDatetimeArrayConstructors(object):

def test_init_from_object_dtype(self, tz_naive_fixture):
tz = tz_naive_fixture
if tz is not None:
pytest.xfail(reason="Casting DatetimeIndex to object-dtype raises "
"for pd.Index and is incorrect for np.array; "
"GH#24391")

# arbitrary DatetimeIndex; this should work for any DatetimeIndex
# with non-None freq
dti = pd.date_range('2016-01-1', freq='MS', periods=9, tz=tz)
expected = DatetimeArray(dti)

# Fails because np.array(dti, dtype=object) incorrectly returns Longs
result = DatetimeArray(np.array(dti, dtype=object), freq='infer')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I mentioned above, but I don't think DatetimeArray constructor should handle this case?

That is what we discussed in #23212 (although, @jbrockmendel , you didn't really react there)

Copy link
Member Author
@jbrockmendel jbrockmendel Nov 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully I've answered this enough times in this thread. I see no reason not to handle object dtype in the DatetimeArray constructor. Series and Index and DataFrame all handle object-dtype and lists; I find it counter-intuitive that we would have a small subset of classes that don't.

But in the short-run, we need to handle these cases if we want to share the extant tests (without significant overhaul) (which we do)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #23493 (comment) for my answer to a previous related comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elsewhere in this thread you've said you're fine with bikeshedding this after we have a working implementation. Has this changed in the last few minutes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you are referring to #23493 (comment) ? Indeed, I was somewhat inconsistent here, but I think the other comments are quite clear in that I am at least asking why you are adding or keeping handling object dtypes?

If the answer is: because I think that is the way it should be, then let's discuss that and try to get to a consensus (#23212). If the answer is: because of practical reasons for now, then let's discuss the practical reasons and see if that is OK to defer to a follow-up, or if there is an easy way to overcome the practical reason.

tm.assert_equal(pd.DatetimeIndex(result), dti)

# Fails because `pd.Index(dti, dtype=object) raises incorrectly
result = DatetimeArray(pd.Index(dti, dtype=object), freq='infer')
tm.assert_equal(pd.DatetimeIndex(result), dti)

# NB: for now we re-wrap in DatetimeIndex to use assert_index_equal
# once assert_datetime_array_equal is in place, this will be changed
def test_init_only_freq_infer(self, tz_naive_fixture):
# just pass data and freq='infer' if relevant; no other kwargs
tz = tz_naive_fixture

# arbitrary DatetimeIndex; this should work for any DatetimeIndex
# with non-None freq
dti = pd.date_range('2016-01-1', freq='MS', periods=9, tz=tz)
expected = DatetimeArray(dti)
assert expected.freq == dti.freq
assert expected.tz == dti.tz

# broken until ABCDatetimeArray and isna is fixed
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be done in the forthcoming PR mentioned in the OP.

# assert (dti == expected).all()
# assert (expected == dti).all()

result = DatetimeArray(expected)
tm.assert_equal(pd.DatetimeIndex(result), dti)

result = DatetimeArray(list(dti), freq='infer')
tm.assert_equal(pd.DatetimeIndex(result), dti)

result = DatetimeArray(tuple(dti), freq='infer')
tm.assert_equal(pd.DatetimeIndex(result), dti)

if tz is None:
result = DatetimeArray(np.array(dti), freq='infer')
tm.assert_equal(pd.DatetimeIndex(result), dti)

result = DatetimeArray(np.array(dti).astype('M8[s]'), freq='infer')
tm.assert_equal(pd.DatetimeIndex(result), dti)

result = DatetimeArray(pd.Series(dti), freq='infer')
tm.assert_equal(pd.DatetimeIndex(result), dti)

result = DatetimeArray(pd.Series(dti, dtype=object), freq='infer')
tm.assert_equal(pd.DatetimeIndex(result), dti)
0