(feat): Support for `pandas` `ExtensionArray` by ilan-gold · Pull Request #8723 · pydata/xarray · GitHub

(feat): Support for pandas ExtensionArray #8723

Merged
merged 101 commits into from
Apr 18, 2024
b2712f1
(feat): first pass supporting extension arrays
ilan-gold Feb 2, 2024
47bddd2
(feat): categorical tests + functionality
ilan-gold Feb 2, 2024
dc8b788
(feat): use multiple dispatch for unimplemented ops
ilan-gold Feb 5, 2024
75524c8
(feat): implement (not really) broadcasting
ilan-gold Feb 5, 2024
c9ab452
(chore): add more `groupby` tests
ilan-gold Feb 5, 2024
1f3d0fa
(fix): fix more groupby incompatibility
ilan-gold Feb 5, 2024
8a70e3c
(bug): fix unused categories
ilan-gold Feb 5, 2024
f5a6505
(chore): refactor dispatched methods + tests
ilan-gold Feb 5, 2024
08a4feb
(fix): shared type should check for extension arrays first and then f…
ilan-gold Feb 6, 2024
d5b218b
(refactor): tests moved
ilan-gold Feb 6, 2024
00256fa
(chore): more higher level tests
ilan-gold Feb 6, 2024
b7ddbd6
(feat): to/from dataframe
ilan-gold Feb 8, 2024
a165851
(chore): check for plum import
ilan-gold Feb 8, 2024
a826edd
(fix): `__setitem__`/`__getitem__`
ilan-gold Feb 8, 2024
fde19ea
(chore): disallow stacking
ilan-gold Feb 8, 2024
4c55707
(fix): `pyproject.toml`
ilan-gold Feb 8, 2024
58ba17d
(fix): `as_shared_type` fix
ilan-gold Feb 8, 2024
a255310
(chore): add variable tests
ilan-gold Feb 8, 2024
4e78b7e
(fix): dask + categoricals
ilan-gold Feb 8, 2024
d9cedf5
(chore): notes/docs
ilan-gold Feb 8, 2024
426664d
(chore): remove old testing file
ilan-gold Feb 8, 2024
22ca77d
(chore): remove commented out code
ilan-gold Feb 8, 2024
f32cfdf
Merge branch 'main' into extension_arrays
ilan-gold Feb 8, 2024
60f8927
(fix): import plum dispatch
ilan-gold Feb 8, 2024
ff22d76
Merge branch 'extension_arrays' of github.com:ilan-gold/xarray into e…
ilan-gold Feb 8, 2024
2153e81
Merge branch 'main' into extension_arrays
ilan-gold Feb 9, 2024
b6d0b31
(refactor): use `is_extension_array_dtype` as much as possible
ilan-gold Feb 9, 2024
d285871
Merge branch 'extension_arrays' of github.com:ilan-gold/xarray into e…
ilan-gold Feb 9, 2024
d847277
Merge branch 'main' into extension_arrays
ilan-gold Feb 9, 2024
8238c64
(refactor): `extension_array`->`array` + move to `indexing`
ilan-gold Feb 10, 2024
1260cd4
Merge branch 'extension_arrays' of github.com:ilan-gold/xarray into e…
ilan-gold Feb 10, 2024
b04ef98
(refactor): change order of classes
ilan-gold Feb 10, 2024
b9937bf
(chore): add small pyarrow test
ilan-gold Feb 12, 2024
0bba03f
(fix): fix some mypy issues
ilan-gold Feb 12, 2024
b714549
(fix): don't register unregisterable method
ilan-gold Feb 12, 2024
a3a678c
(fix): appease mypy
ilan-gold Feb 12, 2024
e521844
(fix): more sensible default implementations allow most use without `p…
ilan-gold Feb 12, 2024
2d3e930
(fix): handling `pyarrow` tests
ilan-gold Feb 12, 2024
04c9969
(fix): actually do import correctly
ilan-gold Feb 12, 2024
5514539
Merge branch 'main' into extension_arrays
ilan-gold Feb 12, 2024
bedfa5c
(fix): `reduce` condition
ilan-gold Feb 13, 2024
e6c2690
Merge branch 'main' into extension_arrays
ilan-gold Feb 13, 2024
82dbda9
(fix): column ordering for dataframes
ilan-gold Feb 13, 2024
12217ed
(refactor): remove encoding business
ilan-gold Feb 13, 2024
dd5b87d
(refactor): raise error for dask + extension array
ilan-gold Feb 13, 2024
761a874
Merge branch 'extension_arrays' of github.com:ilan-gold/xarray into e…
ilan-gold Feb 13, 2024
52cabc8
Merge branch 'main' into extension_arrays
ilan-gold Feb 13, 2024
e0d58fa
(fix): only wrap `ExtensionDuckArray` that has a `.array` which is a …
ilan-gold Feb 15, 2024
c1e0e64
(fix): use duck array equality method, not pandas
ilan-gold Feb 15, 2024
17e3390
(refactor): bye plum!
ilan-gold Feb 15, 2024
dd2ef39
Merge branch 'main' into extension_arrays
ilan-gold Feb 15, 2024
c8e6bfe
(fix): `and` to `or` for casting to `ExtensionDuckArray`
ilan-gold Feb 15, 2024
b2a9517
(fix): check for class, not type
ilan-gold Feb 16, 2024
f5e1bd0
Merge branch 'main' into extension_arrays
ilan-gold Feb 16, 2024
407fad1
(fix): only support native endianness
ilan-gold Feb 19, 2024
3a47f09
Merge branch 'extension_arrays' of github.com:ilan-gold/xarray into e…
ilan-gold Feb 19, 2024
fdd3de4
Merge branch 'main' into extension_arrays
ilan-gold Feb 19, 2024
6b23629
Merge branch 'main' into extension_arrays
ilan-gold Feb 20, 2024
1c9047f
(refactor): no need for superfluous checks in `_maybe_wrap_data`
ilan-gold Feb 22, 2024
9be6b03
Merge branch 'extension_arrays' of github.com:ilan-gold/xarray into e…
ilan-gold Feb 22, 2024
d9304f1
(chore): clean up docs to no longer reference `plum`
ilan-gold Feb 22, 2024
6ec6725
(fix): no longer allow `ExtensionDuckArray` to wrap `ExtensionDuckArray`
ilan-gold Feb 22, 2024
bc9ac4c
(refactor): move `implements` logic to `indexing`
ilan-gold Feb 22, 2024
1e906db
Merge branch 'main' into extension_arrays
ilan-gold Feb 29, 2024
6fb8668
(refactor): `indexing.py` -> `extension_array.py`
ilan-gold Feb 29, 2024
8f034b4
(refactor): `ExtensionDuckArray` -> `PandasExtensionArray`
ilan-gold Feb 29, 2024
90a6de6
Merge branch 'main' into extension_arrays
dcherian Mar 3, 2024
2bd422a
Merge branch 'main' into extension_arrays
ilan-gold Mar 18, 2024
ff67943
Merge branch 'main' into extension_arrays
ilan-gold Mar 25, 2024
661d9f2
(fix): add writeable property
ilan-gold Mar 25, 2024
caee1c6
(fix): don't check writeable for `PandasExtensionArray`
ilan-gold Mar 25, 2024
1d12f5e
(fix): move check earlier
ilan-gold Mar 25, 2024
31dfbb5
Merge branch 'main' into extension_arrays
ilan-gold Mar 26, 2024
23b347f
Merge branch 'main' into extension_arrays
ilan-gold Mar 28, 2024
902c74b
(refactor): correct guard clause
ilan-gold Mar 28, 2024
0b64506
(chore): remove unnecessary `AttributeError`
ilan-gold Mar 28, 2024
0c7e023
(feat): singleton wrapped as array
ilan-gold Mar 28, 2024
dd7fe98
(feat): remove shared dtype casting
ilan-gold Mar 28, 2024
f0df768
(feat): loop once over `dataframe.items`
ilan-gold Mar 28, 2024
e2f0487
(feat): add `__len__` attribute
ilan-gold Mar 28, 2024
1eb6741
(fix): ensure constructor receives `pd.Categorical`
ilan-gold Mar 28, 2024
2a7300a
Merge branch 'extension_arrays' of github.com:ilan-gold/xarray into e…
ilan-gold Mar 28, 2024
9cceadc
Update xarray/core/extension_array.py
ilan-gold Mar 28, 2024
f2588c1
Update xarray/core/extension_array.py
ilan-gold Mar 28, 2024
a0a63bd
(fix): drop condition for categorical corrected
ilan-gold Mar 28, 2024
5bb2bde
Merge branch 'main' into extension_arrays
ilan-gold Mar 28, 2024
f85f166
Merge branch 'main' into extension_arrays
ilan-gold Apr 3, 2024
7ecdeba
Merge branch 'main' into extension_arrays
ilan-gold Apr 4, 2024
6bc40fc
Merge branch 'main' into extension_arrays
ilan-gold Apr 11, 2024
e9dc53f
Apply suggestions from code review
dcherian Apr 13, 2024
4791799
(chore): test `chunk` behavior
ilan-gold Apr 16, 2024
c649362
Merge branch 'extension_arrays' of github.com:ilan-gold/xarray into e…
ilan-gold Apr 16, 2024
fc60dcf
Merge branch 'main' into extension_arrays
ilan-gold Apr 16, 2024
0374086
Update xarray/core/variable.py
dcherian Apr 16, 2024
b9515a6
Merge branch 'main' into extension_arrays
dcherian Apr 16, 2024
72bf807
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 16, 2024
63b6c42
(fix): bring back error
ilan-gold Apr 17, 2024
1d18439
(chore): add test for dropping cat for mean
ilan-gold Apr 17, 2024
17f05da
Update whats-new.rst
dcherian Apr 17, 2024
c906c81
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 17, 2024
e6db83b
Merge branch 'main' into extension_arrays
ilan-gold Apr 18, 2024
(refactor): tests moved
ilan-gold committed Feb 6, 2024
commit d5b218bc05f047d29fd11bf3e7efe8ead7a0b752
103 changes: 103 additions & 0 deletions xarray/tests/test_duck_array_ops.py
@@ -2,6 +2,7 @@

import datetime as dt
import warnings
from collections.abc import Sequence

import numpy as np
import pandas as pd
@@ -11,6 +12,10 @@
from xarray import DataArray, Dataset, cftime_range, concat
from xarray.core import dtypes, duck_array_ops
from xarray.core.duck_array_ops import (
    ExtensionDuckArray,
    __extension_duck_array__broadcast,
    __extension_duck_array__concatenate,
    __extension_duck_array__where,
    array_notnull_equiv,
    concatenate,
    count,
@@ -38,11 +43,44 @@
    requires_bottleneck,
    requires_cftime,
    requires_dask,
    requires_plum,
)

dask_array_type = array_type("dask")


@pytest.fixture
def categorical1():
    return pd.Categorical(["cat1", "cat2", "cat2", "cat1", "cat2"])


@pytest.fixture
def categorical2():
    return pd.Categorical(["cat2", "cat1", "cat2", "cat3", "cat1"])


@pytest.fixture
def int1():
    return pd.arrays.IntegerArray(
        np.array([1, 2, 3, 4, 5]), np.array([True, False, False, True, True])
    )


@pytest.fixture
def int2():
    return pd.arrays.IntegerArray(
        np.array([6, 7, 8, 9, 10]), np.array([True, True, False, True, False])
    )


@__extension_duck_array__concatenate.dispatch
def _(arrays: Sequence[pd.arrays.IntegerArray], axis: int = 0, out=None):
    values = np.concatenate(arrays)
    mask = np.isnan(values)
    values = values.astype("int8")
    return pd.arrays.IntegerArray(values, mask)


class TestOps:
    @pytest.fixture(autouse=True)
    def setUp(self):
@@ -932,3 +970,68 @@ def test_push_dask():
                dask.array.from_array(array, chunks=(1, 2, 3, 2, 2, 1, 1)), axis=0, n=n
            )
        np.testing.assert_equal(actual, expected)


@requires_plum
def test_where_all_categoricals(categorical1, categorical2):
    assert (
        __extension_duck_array__where(
            np.array([True, False, True, False, False]), categorical1, categorical2
        )
        == pd.Categorical(["cat1", "cat1", "cat2", "cat3", "cat1"])
    ).all()


@requires_plum
def test_where_drop_categoricals(categorical1, categorical2):
    assert (
        __extension_duck_array__where(
            np.array([False, True, True, False, True]), categorical1, categorical2
        ).remove_unused_categories()
        == pd.Categorical(["cat2", "cat2", "cat2", "cat3", "cat2"])
    ).all()


@requires_plum
def test_broadcast_to_categorical(categorical1):
    with pytest.raises(NotImplementedError):
        __extension_duck_array__broadcast(categorical1, (5, 2))


@requires_plum
def test_broadcast_to_same_categorical(categorical1):
    assert (__extension_duck_array__broadcast(categorical1, (5,)) == categorical1).all()


@requires_plum
def test_concategorical_categorical(categorical1, categorical2):
    assert (
        __extension_duck_array__concatenate([categorical1, categorical2])
        == type(categorical1)._concat_same_type((categorical1, categorical2))
    ).all()


@requires_plum
def test_integer_array_register_concatenate(int1, int2):
    assert (
        __extension_duck_array__concatenate([int1, int2])
        == type(int1)._concat_same_type((int1, int2))
    ).all()


def test_duck_extension_array_equality(categorical1, int1):
    int_duck_array = ExtensionDuckArray(int1)
    categorical_duck_array = ExtensionDuckArray(categorical1)
    assert (int_duck_array != categorical_duck_array).all()
    assert (categorical_duck_array == categorical1).all()
    assert (int1[0:2] == int_duck_array[0:2]).all()


def test_duck_extension_array_repr(int1):
    int_duck_array = ExtensionDuckArray(int1)
    assert repr(int1) in repr(int_duck_array)


def test_duck_extension_array_attr(int1):
    int_duck_array = ExtensionDuckArray(int1)
    assert (~int_duck_array.fillna(10)).all()
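
Note on the last three tests: they check that the wrapper compares equal to the array it wraps, embeds the wrapped array's repr, and forwards unknown attributes such as `fillna`. A minimal sketch of a wrapper with that behaviour might look like the following. The class name `WrappedExtensionArray` and all of its methods are hypothetical and written only for illustration; this is not the `ExtensionDuckArray` code from the PR.

```python
import pandas as pd
from pandas.api.extensions import ExtensionArray


class WrappedExtensionArray:
    """Hypothetical wrapper, for illustration only (not xarray's class)."""

    def __init__(self, array):
        if not isinstance(array, ExtensionArray):
            raise TypeError(
                f"expected a pandas ExtensionArray, got {type(array).__name__}"
            )
        self.array = array

    def __getattr__(self, name):
        # Only reached when normal lookup fails; forward to the wrapped array.
        if name == "array":
            # Guard against infinite recursion if `array` was never set.
            raise AttributeError(name)
        return getattr(self.array, name)

    def __getitem__(self, key):
        item = self.array[key]
        # Slices stay wrapped; scalar results are returned as-is.
        return type(self)(item) if isinstance(item, ExtensionArray) else item

    def __eq__(self, other):
        # Unwrap the other operand so pandas performs the comparison.
        if isinstance(other, WrappedExtensionArray):
            other = other.array
        return self.array == other

    def __repr__(self):
        return f"WrappedExtensionArray(array={self.array!r})"


ints = pd.array([1, 2, None], dtype="Int64")
wrapped = WrappedExtensionArray(ints)

print(repr(ints) in repr(wrapped))            # repr embeds the wrapped repr
print((wrapped[0:2] == ints[0:2]).all())      # element-wise equality via pandas
print(wrapped.isna().sum())                   # `isna` is forwarded by __getattr__
```

Forwarding through `__getattr__` keeps the wrapper thin, at the cost of hiding which pandas methods are actually relied upon.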
110 changes: 0 additions & 110 deletions xarray/tests/test_extension_array.py

This file was deleted.
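
Note on the categorical tests moved by this commit: `test_where_all_categoricals` expects a `where` over two categoricals with different category sets to yield values drawn from both. One way to get that result with public pandas and NumPy calls is sketched below. The helper `where_categorical` is hypothetical, assumes unordered categoricals, and is not the implementation dispatched in the PR.

```python
import numpy as np
import pandas as pd
from pandas.api.types import union_categoricals


def where_categorical(cond, x: pd.Categorical, y: pd.Categorical) -> pd.Categorical:
    """Hypothetical helper: take x where cond is True, otherwise y."""
    # Re-code both inputs against one shared category list so that
    # their integer codes refer to the same labels.
    categories = x.categories.union(y.categories)
    x_codes = x.set_categories(categories).codes
    y_codes = y.set_categories(categories).codes
    codes = np.where(np.asarray(cond, dtype=bool), x_codes, y_codes)
    return pd.Categorical.from_codes(codes, categories=categories)


cat_a = pd.Categorical(["cat1", "cat2", "cat2", "cat1", "cat2"])
cat_b = pd.Categorical(["cat2", "cat1", "cat2", "cat3", "cat1"])
cond = np.array([True, False, True, False, False])

result = where_categorical(cond, cat_a, cat_b)
print(list(result))

# Concatenation across differing category sets, using the public helper.
combined = union_categoricals([cat_a, cat_b])
print(len(combined))
```

Working on integer codes avoids converting to object arrays, but it only makes sense once both inputs share a category list, which is why the union step comes first.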
