-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
BUG: DataFrame.append with timedelta64 #39574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
<
8000
a href="/pandas-dev/pandas/pull/39574/commits/b3fb477074af15590d383b439ac87b1712977178" class="Link--secondary">b3fb477
pandas/core/dtypes/concat.py
Outdated
@@ -97,7 +97,7 @@ def _cast_to_common_type(arr: ArrayLike, dtype: DtypeObj) -> ArrayLike: | |||
return arr.astype(dtype, copy=False) | |||
|
|||
|
|||
def concat_compat(to_concat, axis: int = 0): | |||
def concat_compat(to_concat, axis: int = 0, pretend_axis1: bool = False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't you just allow axis=None and call it that case. this is very odd naming here (appreciate the de-duplication that this allows); so i guess a better question is this temporary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't you just allow axis=None and call it that case. this is very odd naming here
I want the naming to be very clear that this is a dont-try-this-at-home kludge (need to update the docstring to that effect)
so i guess a better question is this temporary?
inasmuch as it wont be needed once we have 2D EAs, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok to the extent this is not removed anytime soon, can you come up with a better argument name or another way of doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed + green
Can you summarize the behaviour that is changed / bug that is fixed? |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am personally a bit hesitant to change the all-NaN special case. That's longstanding behaviour (also for practical reasons), that will break code I think.
pandas/core/dtypes/concat.py
Outdated
@@ -72,6 +72,8 @@ def concat_compat(to_concat, axis: int = 0): | |||
---------- | |||
to_concat : array of arrays | |||
axis : axis to provide concatenation | |||
ea_compat_axis : bool, default False | |||
For ExtensionArray compat, behave as if axis == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify what this means in practice to behave "as if axis=1"? Because I assume the arrays are still concatenated with axis=0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just matters for the dropping-empty check; will edit to clarify
@jorisvandenbossche helpful comments, thank you. i found a nice way to retain the None/nan behavior while fixing the original bug: supplementing JoinUnit.is_na with a check that it is a compatible NA. |
pandas/core/internals/concat.py
Outdated
if self.dtype == object: | ||
values = self.block.values | ||
return all( | ||
is_valid_nat_for_dtype(x, dtype) for x in values.ravel(order="K") | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not only required for object dtype, I think. Also float NaN is considered "all NaN" when it comes to ignoring the dtype in concatting dataframes (and other dtypes as well I think):
In [39]: pd.concat([pd.DataFrame({'a': [np.nan, np.nan]}), pd.DataFrame({'a': [pd.Timestamp("2012-01-01")]})])
Out[39]:
a
0 NaT
1 NaT
0 2012-01-01
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the non-object case is handled below on L245-246. or do you have something else in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does my snippet above work with this PR?
(if so, then I don't fully understand why the changes to test_append_empty_frame_to_series_with_dateutil_tz
are needed)
@jorisvandenbossche i think your comments have all been addressed, lmk if i missed anything |
All reactions
Sorry, something went wrong.
@jorisvandenbossche gentle ping before your day ends |
All reactions
Sorry, something went wrong.
Can you add a whatsnew? |
All reactions
Sorry, something went wrong.
will do, thanks |
All reactions
Sorry, something went wrong.
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
tm.assert_frame_equal(result_b, expected) | ||
|
||
# column order is different | ||
expected = expected[["c", "d", "date", "a", "b"]] | ||
result = df.append([s, s], ignore_index=True) | ||
dtype = Series([date]).dtype | ||
expected["date"] = expected["date"].astype(dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still needed? (might be a left-over from astyping it to object before)
Sorry, something went wrong.
All reactions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right, updated
Sorry, something went wrong.
All reactions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last minor comment at https://github.com/pandas-dev/pandas/pull/39574/files#r575450869
Nice clean-up!
Sorry, something went wrong.
All reactions
jreback
jorisvandenbossche
Successfully merging this pull request may close these issues.
AFAICT several of the existing tests are just wrong, xref #39122 cc @jorisvandenbossche
Fixes (but havent yet added a test for) #39037 (comment)