REF: Simplify Datetimelike constructor dispatching #23140
Closed
Changes from 1 commit (of 22 commits)

Commits (all by jbrockmendel):
f13cc58 Avoid non-public constructors
4188ec7 simplify and de-duplicate _generate_range
7804f1b Check for invalid axis kwarg
a4775f4 Move some EA properties up to mixins
8ee34fa implement basic TimedeltaArray tests
78943c1 clean up PeriodArray constructor, with tests
aa71383 make PeriodArray.__new__ more grown-up
eae8389 Remove unused kwargs from TimedeltaArray.__new__
e871733 revert change that broke tests
7840f91 Fixup whitespace
ec50b0b helper function for axis validation
eb7a6b6 suggested clarifications
32c6391 Merge branch 'dlike8' of https://github.com/jbrockmendel/pandas into …
c903917 Merge branch 'master' of https://github.com/pandas-dev/pandas into dl…
b97ec96 move axis validation to nv
11db555 Merge branch 'master' of https://github.com/pandas-dev/pandas into dl…
147de57 revert some removals
7c4d281 Merge branch 'master' of https://github.com/pandas-dev/pandas into dl…
b90f421 catch too-negative values
dc4f474 Roll validate_minmax_axis into existing validate functions
46d5e64 fixup typo
b5827c7 Merge branch 'master' of https://github.com/pandas-dev/pandas into dl…
clean up PeriodArray constructor, with tests
commit 78943c17e2dcab4f9cf1098c18980d06c9dbfe9e
@@ -18,6 +18,7 @@

from pandas.core.dtypes.common import (
    is_integer_dtype, is_float_dtype, is_period_dtype,
    is_object_dtype,
    is_datetime64_dtype)
from pandas.core.dtypes.dtypes import PeriodDtype
from pandas.core.dtypes.generic import ABCSeries

@@ -124,15 +125,19 @@ def freq(self, value):
    def __new__(cls, values, freq=None, **kwargs):
        if is_period_dtype(values):
            # PeriodArray, PeriodIndex
            if freq is not None and values.freq != freq:
                raise IncompatibleFrequency(freq, values.freq)
            freq = values.freq
            freq = dtl.validate_dtype_freq(values.dtype, freq)
            values = values.asi8

        elif is_datetime64_dtype(values):
            # TODO: what if it has tz?
            values = dt64arr_to_periodarr(values, freq)

        elif is_object_dtype(values) or isinstance(values, (list, tuple)):
            # e.g. array([Period(...), Period(...), NaT])
            values = np.array(values)

Comment: What if this is an int array? Or is that prohibited (except via _from_ordinals)?
Comment: Then it gets passed through _simple_new unchanged.

            if freq is None:
                freq = libperiod.extract_freq(values)
            values = libperiod.extract_ordinals(values, freq)

        return cls._simple_new(values, freq=freq, **kwargs)

    @classmethod

@@ -175,6 +180,8 @@ def _from_ordinals(cls, values, freq=None, **kwargs):

    @classmethod
    def _generate_range(cls, start, end, periods, freq, fields):
        periods = dtl.validate_periods(periods)

        if freq is not None:
            freq = Period._maybe_convert_freq(freq)
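To make the dispatch in `__new__` concrete, here is a minimal stdlib-only sketch of the same branching. It is a model under stated assumptions, not pandas code: `Period`, `NaT`, `OrdinalArray`, `extract_freq`, `extract_ordinals`, and `to_ordinals` are hypothetical stand-ins (the two `extract_*` helpers only mirror the roles of `libperiod.extract_freq` and `libperiod.extract_ordinals`), and the datetime64 branch is omitted. It also reflects the answer in the inline comment: integer ordinals fall through unchanged.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Missing-value sentinel; pandas stores NaT as the minimum int64 (iNaT).
NAT_ORDINAL = -2**63
NaT = None  # hypothetical stand-in for pandas.NaT


@dataclass(frozen=True)
class Period:
    """Hypothetical stand-in for pandas.Period: an integer ordinal plus a freq."""
    ordinal: int
    freq: str


@dataclass
class OrdinalArray:
    """Hypothetical stand-in for an existing PeriodArray/PeriodIndex."""
    ordinals: List[int]
    freq: str


def extract_freq(values: Sequence[object]) -> str:
    # Plays the role of libperiod.extract_freq: use the first Period's freq.
    for v in values:
        if isinstance(v, Period):
            return v.freq
    raise ValueError("freq not specified and cannot be inferred")


def extract_ordinals(values: Sequence[object], freq: str) -> List[int]:
    # Plays the role of libperiod.extract_ordinals: Period -> ordinal, NaT -> sentinel.
    out = []
    for v in values:
        if v is NaT:
            out.append(NAT_ORDINAL)
        elif isinstance(v, Period):
            if v.freq != freq:
                raise ValueError(f"incompatible freq: {v.freq} != {freq}")
            out.append(v.ordinal)
        else:
            raise TypeError(f"expected Period or NaT, got {type(v).__name__}")
    return out


def to_ordinals(values, freq: Optional[str] = None) -> Tuple[List[int], Optional[str]]:
    """Dispatch on the kind of input, mirroring the branches of __new__."""
    if isinstance(values, OrdinalArray):
        # Already period data: check freq, then reuse the stored ordinals.
        if freq is not None and freq != values.freq:
            raise ValueError(f"incompatible freq: {freq} != {values.freq}")
        return list(values.ordinals), values.freq
    values = list(values)  # materialize once, like np.array(values)
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        # Integer ordinals fall through unchanged (the _simple_new path).
        return values, freq
    # Period objects (possibly with NaT): infer freq if needed, then convert.
    if freq is None:
        freq = extract_freq(values)
    return extract_ordinals(values, freq), freq
```

For example, `to_ordinals([Period(5, "M"), NaT])` infers `"M"` and returns `([5, NAT_ORDINAL], "M")`, while `to_ordinals([1, 2])` returns the integers untouched.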
Comment: Shouldn't this be is_list_like? (for the isinstance check)
Comment: This is specifically for object dtype (actually, I need to add dtype=object to the np.array call below), since we're calling libperiod.extract_ordinals, which expects object dtype.

Comment: Specifically, what happens if other non-ndarray list-likes hit this path? Do they need handling?
Comment: They do need handling, but we're not there yet. The thought process behind implementing these constructors piece by piece is:
a) The DatetimeIndex/TimedeltaIndex/PeriodIndex constructors are overgrown; let's avoid that in the Array subclasses.
b) Avoid letting the implementations get too far ahead of the tests.
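The list-like question can be illustrated without pandas. The sketch below uses a rough approximation of `pandas.api.types.is_list_like` (assumed here to mean "any iterable except str/bytes"; the real function also rejects zero-dimensional numpy arrays and has an `allow_sets` flag) to show which inputs the `isinstance(values, (list, tuple))` check misses, such as `range` objects and generators. `is_list_like_approx`, `current_check`, and `normalize` are hypothetical names. Relatedly, `np.array(values)` without `dtype=object` infers a dtype from the contents (a list of ints becomes an integer array, an empty list becomes float64), which is why the `dtype=object` fix matters before calling `extract_ordinals`.

```python
from collections.abc import Iterable


def is_list_like_approx(obj) -> bool:
    """Rough approximation of pandas.api.types.is_list_like (assumption).

    Treats any iterable except str/bytes as list-like. The real pandas function
    also rejects zero-dimensional numpy arrays and has an ``allow_sets`` flag;
    neither is modeled here.
    """
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def current_check(obj) -> bool:
    # The isinstance half of the condition in the diff.
    return isinstance(obj, (list, tuple))


def normalize(obj) -> list:
    """Materialize a list-like exactly once before any further passes.

    A generator can only be consumed once, so freq inference followed by
    ordinal extraction would see an exhausted iterator the second time.
    """
    if not is_list_like_approx(obj):
        raise TypeError(f"expected a list-like, got {type(obj).__name__}")
    return list(obj)


if __name__ == "__main__":
    for candidate in ([1, 2], (1, 2), range(2), {1, 2}, (i for i in range(2))):
        print(type(candidate).__name__,
              current_check(candidate), is_list_like_approx(candidate))
```

Under this approximation, lists and tuples pass both checks, while `range`, `set`, and generator inputs pass only the list-like check.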
Comment: Other question: where was this handled previously?
Comment: It's hard for me to say what's better in the abstract.
From the WIP PeriodArray PR, I found that having to think carefully about what type of data I had forced some clarity in the code. I liked having to explicitly reach for that _from_periods constructor. Regardless, I think our two goals with the array constructors should be
If you think we're likely to end up in a situation where being able to pass an array of objects to the main __init__ will make things easier, then by all means.
Comment: I am a bit puzzled why you would handle lists and ndarrays differently (Tom and Joris); these are clearly doing the same thing, and we have very similar handling for list-likes throughout pandas.
Separating these is a non-starter; even having a separate constructor is not very friendly. pandas does inference on construction, which is one of its big selling points. Trying to change this, especially at the micro level, is a huge mental disconnect.
If you want to propose something like that, please do it in a separate issue.
Comment: I don't think we are.
But my only argument was
If that's not persuasive, then I'm not going to argue against handling them in __init__.
Comment: +1
Yes, I think we should be pretty forgiving about what gets accepted into __init__ (for all three of the Period/Datetime/Timedelta Arrays). We definitely don't want the start, end, periods arguments that currently live in the Index subclass constructors. I think by excluding those we'll keep these constructors fairly straightforward.
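The position that `__init__` should be forgiving about data but should not take `start`/`end`/`periods` can be sketched as a data-only constructor plus a separate range factory, similar in spirit to the `_generate_range` classmethod in the diff. `SimplePeriodArray` and `validate_periods` are hypothetical: `validate_periods` is modeled on what `dtl.validate_periods` appears to do (floats truncated to int, other non-integers rejected), and the "exactly two of start, end, periods" rule mirrors `pandas.period_range`. Ordinals are plain ints here.

```python
from typing import List, Optional


def validate_periods(periods):
    """Hypothetical model of dtl.validate_periods: normalize `periods`."""
    if periods is None:
        return None
    if isinstance(periods, float):
        return int(periods)  # floats are truncated to int
    if isinstance(periods, bool) or not isinstance(periods, int):
        raise TypeError(f"periods must be a number, got {periods!r}")
    return periods


class SimplePeriodArray:
    """Hypothetical: data-only constructor plus a separate range factory."""

    def __init__(self, values, freq: Optional[str] = None):
        # Forgiving about the data itself, but no start/end/periods arguments,
        # so stray range keywords fail with a TypeError from Python itself.
        self.ordinals: List[int] = [int(v) for v in values]
        self.freq = freq

    @classmethod
    def _generate_range(cls, start: Optional[int] = None, end: Optional[int] = None,
                        periods=None, freq: str = "D") -> "SimplePeriodArray":
        periods = validate_periods(periods)
        # Simplified rule, as in pandas.period_range: exactly two of the three.
        if sum(x is not None for x in (start, end, periods)) != 2:
            raise ValueError("exactly two of start, end, periods must be given")
        if periods is None:
            ordinals = range(start, end + 1)  # inclusive of end
        elif end is None:
            ordinals = range(start, start + periods)
        else:
            ordinals = range(end - periods + 1, end + 1)
        return cls(ordinals, freq=freq)
```

Keeping range generation out of `__init__` means the constructor only has to answer "what kind of data is this?", which is the simplification the comment is after.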
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not about lists vs arrays, it's about arrays of Period objects vs arrays of ordinal integers, which is something very different.
Being forgiving is exactly what lead to the complex Period/DatetimeIndex constructors. I think we should not make the same choice for our Array classes.
Of course it doesn't need to be that complex, as I think there are here EDBE two main usecases discussed: an array of scalar objects (eg Periods or Timestamps), or an array of the underlying storage type (eg datetime64 or ordinal integers).
I personally also think it makes the code clearer to even separate those two concepts (basically what we also did with IntegerArray), but maybe let's open an issue to further discuss that instead of here in a hidden review comment thread? (i can only open one later today )
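The alternative argued for in the last comment, separating an array of scalar objects from an array of the underlying storage, could look like the sketch below: `__init__` accepts only integer ordinals, and a `from_periods` classmethod (loosely analogous to the `_from_periods` constructor mentioned earlier in the thread) handles Period scalars. All names are hypothetical, `Period` is a stand-in dataclass, and `None` stands in for NaT.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Missing-value sentinel; pandas stores NaT as the minimum int64 (iNaT).
NAT_ORDINAL = -2**63


@dataclass(frozen=True)
class Period:
    """Hypothetical stand-in for pandas.Period."""
    ordinal: int
    freq: str


class OrdinalPeriodArray:
    """Hypothetical: __init__ takes storage (ints); scalars go via from_periods."""

    def __init__(self, ordinals: Sequence[int], freq: str):
        ordinals = list(ordinals)  # materialize before validating
        if not all(isinstance(o, int) and not isinstance(o, bool) for o in ordinals):
            raise TypeError("__init__ expects integer ordinals; use from_periods")
        self.ordinals: List[int] = ordinals
        self.freq = freq

    @classmethod
    def from_periods(cls, scalars: Sequence[Optional[Period]],
                     freq: Optional[str] = None) -> "OrdinalPeriodArray":
        """Build from scalar objects (Periods, with None meaning NaT)."""
        if freq is None:
            freqs = {p.freq for p in scalars if p is not None}
            if len(freqs) != 1:
                raise ValueError("cannot infer a single freq from the scalars")
            freq = freqs.pop()
        ordinals = []
        for p in scalars:
            if p is None:
                ordinals.append(NAT_ORDINAL)
            elif p.freq != freq:
                raise ValueError(f"incompatible freq: {p.freq} != {freq}")
            else:
                ordinals.append(p.ordinal)
        return cls(ordinals, freq)
```

The trade-off is the one debated in the thread: the explicit path makes the input type obvious at the call site, while a forgiving `__init__` spares users from choosing a constructor.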