Index._concat (used by Index.append) is a thin wrapper around concat_compat. It is overridden by CategoricalIndex so that CategoricalDtype is retained more often than it is in concat_compat. We should make these match.
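A minimal sketch of the mismatch, using only public API. It assumes a pandas version in which CategoricalIndex still overrides _concat and in which pd.concat on Series goes through concat_compat; the sample data are made up, and the printed dtypes may differ between versions.

```python
import pandas as pd

# Made-up sample data: every value in `other` is already one of ci's categories.
ci = pd.CategoricalIndex(list("aabbca"), categories=list("cab"))
other = pd.Index(["c", "a"])

# Path 1: Index.append, which dispatches to the CategoricalIndex._concat override.
via_append = ci.append(other)

# Path 2: pd.concat on Series holding the same values (assumed to use concat_compat).
via_concat = pd.concat([pd.Series(ci), pd.Series(other)], ignore_index=True)

# The values are the same either way; only the resulting dtype may differ.
print("append:", via_append.dtype)
print("concat:", via_concat.dtype)
```

If the two printed dtypes differ, that is the inconsistency described above.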
If we just rip out CategoricalIndex._concat, we break 6 tests, all of which boil down to:
def test_append_category_objects(self, ci):
    # with objects
    result = ci.append(Index(["c", "a"]))
    expected = CategoricalIndex(list("aabbcaca"), categories=ci.categories)
>   tm.assert_index_equal(result, expected, exact=True)
If we go the other way and change concat_compat, we break 6 different tests, all of which involve all-empty arrays or arrays that can be losslessly cast to the Categorical's dtype, e.g. (edited for legibility):
def test_concat_empty_series_dtype_category_with_array(self):
    # GH#18515
    left = Series(np.array([]), dtype="category")
    right = Series(dtype="float64")
    result = concat([left, right])
>   assert result.dtype == "float64"

def test_concat_categorical_coercion(self):
    # GH 13524
    # category + not-category => not-category
    s1 = Series([1, 2, np.nan], dtype="category")
    s2 = Series([2, 1, 2])
    exp = Series([1, 2, np.nan, 2, 1, 2], dtype="object")
>   tm.assert_series_equal(pd.concat([s1, s2], ignore_index=True), exp)
E   AssertionError: Attributes of Series are different
E
E   Attribute "dtype" are different
E   [left]:  CategoricalDtype(categories=[1, 2], ordered=False)
E   [right]: object
Changing concat_compat results in much more convenient behavior, but it is textbook "values-dependent behavior" that in general we want to avoid (cc @jorisvandenbossche)
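
To make the "values-dependent" concern concrete, here is a toy model in plain Python. It is not pandas code: both function names and the string dtype labels are hypothetical, chosen only to contrast a rule that looks at input dtypes with one that also inspects the values.

```python
def dtype_only_rule(left_dtype: str, right_dtype: str) -> str:
    """Hypothetical rule: the result dtype depends only on the input dtypes."""
    return left_dtype if left_dtype == right_dtype else "object"


def values_dependent_rule(categories: list, right_values: list) -> str:
    """Hypothetical rule: keep 'category' only if every value fits the categories."""
    fits = all(v is None or v in categories for v in right_values)
    return "category" if fits else "object"


categories = ["a", "b", "c"]

# The input dtypes are identical in both calls (category + object),
# so the dtype-only rule always gives the same answer.
stable = dtype_only_rule("category", "object")

# The values-dependent rule flips depending on what the data happen to contain.
fits_all = values_dependent_rule(categories, ["c", "a"])
has_new = values_dependent_rule(categories, ["c", "z"])

print(stable, fits_all, has_new)
```

Under the second rule, code downstream of a concat cannot know the result dtype without looking at the data, which is the trade-off against the added convenience.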