BUG: DataFrame.append with timedelta64 #39574

jbrockmendel · 2021-02-03T04:16:46Z

closes #xxxx
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

AFAICT several of the existing tests are just wrong, xref #39122 cc @jorisvandenbossche

Fixes (but haven't yet added a test for) #39037 (comment)

jreback · 2021-02-03T13:36:03Z

pandas/core/dtypes/concat.py

@@ -97,7 +97,7 @@ def _cast_to_common_type(arr: ArrayLike, dtype: DtypeObj) -> ArrayLike:
    return arr.astype(dtype, copy=False)


-def concat_compat(to_concat, axis: int = 0):
+def concat_compat(to_concat, axis: int = 0, pretend_axis1: bool = False):


why don't you just allow axis=None and call it that case. this is very odd naming here (appreciate the de-duplication that this allows); so i guess a better question is this temporary?

> why don't you just allow axis=None and call it that case. this is very odd naming here

I want the naming to be very clear that this is a dont-try-this-at-home kludge (need to update the docstring to that effect)

> so i guess a better question is this temporary?

inasmuch as it won't be needed once we have 2D EAs, yes.

ok to the extent this is not removed anytime soon, can you come up with a better argument name or another way of doing this?

renamed + green

jorisvandenbossche · 2021-02-03T14:33:34Z

Can you summarize the behaviour that is changed / bug that is fixed?

jbrockmendel · 2021-02-03T15:18:37Z

> Can you summarize the behaviour that is changed / bug that is fixed?

import numpy as np
import pandas as pd
from pandas import DataFrame
df = DataFrame({"a": pd.array([1], dtype="Int64")})
other = DataFrame({"a": [np.timedelta64("NaT", "ns")]})
result = df.append(other, ignore_index=True)

>>> result
      a
0     1
1  <NA>
>>> result.dtypes
a    Int64
dtype: object

df = DataFrame(columns=["a"]).astype("Int64")
other = DataFrame({"a": [np.timedelta64(1, "ns")]})

>>> result = df.append(other, ignore_index=True)
ValueError: Wrong number of dimensions. values.ndim != ndim [1 != 2]

jorisvandenbossche

I am personally a bit hesitant to change the all-NaN special case. That's longstanding behaviour (also for practical reasons), that will break code I think.

jorisvandenbossche · 2021-02-08T23:10:57Z

pandas/core/dtypes/concat.py

@@ -72,6 +72,8 @@ def concat_compat(to_concat, axis: int = 0):
    ----------
    to_concat : array of arrays
    axis : axis to provide concatenation
+    ea_compat_axis : bool, default False
+        For ExtensionArray compat, behave as if axis == 1


Can you clarify what this means in practice to behave "as if axis=1"? Because I assume the arrays are still concatenated with axis=0?

just matters for the dropping-empty check; will edit to clarify
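
A rough illustration of the reply above, not the actual pandas code: going by that reply, the flag only affects whether empty inputs are filtered out before the result dtype is worked out. The name `select_inputs` and the use of plain lists in place of 1D arrays are assumptions made for this sketch.

```python
from typing import List, Sequence


def select_inputs(to_concat: Sequence[list], axis: int = 0,
                  ea_compat_axis: bool = False) -> List[list]:
    """Return the inputs that survive the dropping-empty check.

    Hypothetical stand-in: plain lists play the role of 1D arrays.
    """
    non_empties = [arr for arr in to_concat if len(arr) > 0]
    # Empties are dropped only for a genuine axis-0 concat. With
    # ea_compat_axis=True we behave as if axis == 1 and keep them.
    if non_empties and axis == 0 and not ea_compat_axis:
        return non_empties
    return list(to_concat)


print(select_inputs([[], [1, 2]]))                       # empty input dropped
print(select_inputs([[], [1, 2]], ea_compat_axis=True))  # empty input kept
```

Keeping the empty input matters when its dtype should still take part in choosing the result dtype, which is presumably why the flag exists for the ExtensionArray case.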

pandas/tests/frame/methods/test_append.py

pandas/tests/indexing/test_partial.py

pandas/tests/reshape/concat/test_append.py

jbrockmendel · 2021-02-09T01:52:03Z

@jorisvandenbossche helpful comments, thank you. i found a nice way to retain the None/nan behavior while fixing the original bug: supplementing JoinUnit.is_na with a check that it is a compatible NA.
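
To make "compatible NA" concrete, here is a small sketch under simplifying assumptions. It is not the pandas implementation: `NAT_TIMEDELTA`, `NAT_DATETIME`, `is_compatible_na`, and `unit_is_na` are hypothetical names, and dtypes are represented by short strings.

```python
import math

# Hypothetical sentinels standing in for timedelta64("NaT") and datetime64("NaT").
NAT_TIMEDELTA = object()
NAT_DATETIME = object()


def is_compatible_na(value, target: str) -> bool:
    """True if `value` is a missing value that `target` can hold as-is."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        # In this sketch None / NaN are generic NAs, valid for any target.
        return True
    if value is NAT_TIMEDELTA:
        return target == "timedelta64"
    if value is NAT_DATETIME:
        return target == "datetime64"
    return False


def unit_is_na(values, target: str) -> bool:
    """A unit counts as all-NA only if every value is a compatible NA."""
    return all(is_compatible_na(v, target) for v in values)


print(unit_is_na([None, float("nan")], "datetime64"))  # generic NAs
print(unit_is_na([NAT_TIMEDELTA], "Int64"))            # typed NaT, wrong target
```

Under this rule a column of None/NaN can still be ignored when picking the result dtype, while a timedelta NaT is no longer treated as ignorable next to an unrelated dtype.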

jorisvandenbossche · 2021-02-10T14:43:13Z

pandas/core/internals/concat.py

+        if self.dtype == object:
+            values = self.block.values
+            return all(
+                is_valid_nat_for_dtype(x, dtype) for x in values.ravel(order="K")
+            )


This is not only required for object dtype, I think. Also float NaN is considered "all NaN" when it comes to ignoring the dtype in concatting dataframes (and other dtypes as well I think):

In [39]: pd.concat([pd.DataFrame({'a': [np.nan, np.nan]}), pd.DataFrame({'a': [pd.Timestamp("2012-01-01")]})])
Out[39]:
           a
0        NaT
1        NaT
0 2012-01-01

the non-object case is handled below on L245-246. or do you have something else in mind?

Does my snippet above work with this PR?
(if so, then I don't fully understand why the changes to test_append_empty_frame_to_series_with_dateutil_tz are needed)

> Does my snippet above work with this PR?

yes it does

> (if so, then I don't fully understand why the changes to test_append_empty_frame_to_series_with_dateutil_tz are needed)

I think that's driven by something sketchy-looking in get_reindexed_values, will see if that can be addressed.
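
For context on the all-NaN rule discussed in this thread, a toy sketch of choosing a result dtype while skipping all-NaN inputs. Everything here is an assumption for illustration: `result_dtype` is a hypothetical helper, dtypes are plain strings, and falling back to "object" on a mismatch is a simplification of real dtype resolution.

```python
import math
from typing import List, Sequence, Tuple

Unit = Tuple[str, Sequence[object]]  # (dtype name, values)


def _all_nan(values: Sequence[object]) -> bool:
    # Non-empty and every value is a float NaN.
    return len(values) > 0 and all(
        isinstance(v, float) and math.isnan(v) for v in values
    )


def result_dtype(units: List[Unit]) -> str:
    """Pick a dtype, skipping units whose values are all NaN."""
    dtypes = [dtype for dtype, values in units if not _all_nan(values)]
    if not dtypes:
        # Every unit was all-NaN: fall back to the first unit's dtype.
        return units[0][0]
    unique = list(dict.fromkeys(dtypes))  # de-duplicate, keep order
    return unique[0] if len(unique) == 1 else "object"


nan = float("nan")
print(result_dtype([("float64", [nan, nan]), ("datetime64[ns]", ["2012-01-01"])]))
```

With the all-NaN float column skipped, only the datetime dtype remains, which mirrors the NaT-filled result in the In [39] snippet.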

jbrockmendel · 2021-02-11T18:41:29Z

@jorisvandenbossche i think your comments have all been addressed, lmk if i missed anything

jbrockmendel · 2021-02-12T18:15:21Z

@jorisvandenbossche gentle ping before your day ends

jorisvandenbossche · 2021-02-12T18:17:26Z

Can you add a whatsnew?

jbrockmendel · 2021-02-12T18:17:50Z

will do, thanks

pandas/core/internals/concat.py

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

jorisvandenbossche · 2021-02-12T18:46:31Z

pandas/tests/reshape/concat/test_append.py

        tm.assert_frame_equal(result_b, expected)

        # column order is different
        expected = expected[["c", "d", "date", "a", "b"]]
-        result = df.append([s, s], ignore_index=True)
+        dtype = Series([date]).dtype
+        expected["date"] = expected["date"].astype(dtype)


Is this still needed? (might be a left-over from astyping it to object before)

you're right, updated

jorisvandenbossche

Last minor comment at https://github.com/pandas-dev/pandas/pull/39574/files#r575450869

Nice clean-up!

BUG: DataFrame.append with timedelta64

b3fb477

jreback requested changes Feb 3, 2021

jreback added the Dtype Conversions (unexpected or buggy dtype conversions) and Reshaping (concat, merge/join, stack/unstack, explode) labels Feb 3, 2021

jbrockmendel added 2 commits February 7, 2021 09:21

Merge branch 'master' into bug-concat-4

8083939

pretend_axi1 -> ea_compat_axis

512a50c

jorisvandenbossche requested changes Feb 8, 2021

jbrockmendel added 2 commits February 8, 2021 16:20

Merge branch 'master' into bug-concat-4

fa6d8a4

check compatibility in JoinUnit.is_na

1c63c05

jbrockmendel mentioned this pull request Feb 9, 2021

[ArrayManager] REF: Implement concat with reindexing #39612

Merged

Merge branch 'master' into bug-concat-4

d75a950

jorisvandenbossche requested changes Feb 10, 2021

jbrockmendel added 2 commits February 10, 2021 08:51

Merge branch 'master' into bug-concat-4

5e35e31

avoid object dtype

7de3800

jreback added this to the 1.3 milestone Feb 10, 2021

jbrockmendel added 2 commits February 11, 2021 18:41

Merge branch 'master' into bug-concat-4

dbee2bc

REF: re-use helper func

dbb59e7

jbrockmendel added 2 commits February 12, 2021 10:17

Merge branch 'master' into bug-concat-4

f58d791

whatsnew

f40cf7c

jorisvandenbossche reviewed Feb 12, 2021

pandas/core/internals/concat.py (outdated)

Update pandas/core/internals/concat.py

7315004

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

jorisvandenbossche reviewed Feb 12, 2021

pandas/core/internals/concat.py

Update pandas/core/internals/concat.py

faf6c35

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

jorisvandenbossche reviewed Feb 12, 2021

jorisvandenbossche approved these changes Feb 12, 2021

jbrockmendel added 2 commits February 12, 2021 11:09

revert no-longer necessary

3ed534c

Merge branch 'bug-concat-4' of github.com:jbrockmendel/pandas into bu…

d1c9872

…g-concat-4

jreback approved these changes Feb 12, 2021

jreback merged commit bcf2406 into pandas-dev:master Feb 12, 2021

jbrockmendel deleted the bug-concat-4 branch February 12, 2021 20:28

jorisvandenbossche mentioned this pull request Apr 12, 2021

API: value-dependent behaviour in concat with all-NA data #40893

Closed
