API: Uses pd.NA in IntegerArray by TomAugspurger · Pull Request #29964 · pandas-dev/pandas · GitHub
Merged 57 commits on Dec 30, 2019
1eec965
API: Uses pd.NA in IntegerArray
TomAugspurger Dec 2, 2019
f5f61ea
Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer…
TomAugspurger Dec 2, 2019
c569562
wip
TomAugspurger Dec 2, 2019
a8261a4
wip
TomAugspurger Dec 3, 2019
c8ff04f
Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer…
TomAugspurger Dec 3, 2019
cddc9df
fixup value counts
TomAugspurger Dec 3, 2019
9488d34
fixed to_numpy
TomAugspurger Dec 3, 2019
0d5aab8
doc
TomAugspurger Dec 3, 2019
fa61a6d
wip
TomAugspurger Dec 3, 2019
de2c6c6
wip
TomAugspurger Dec 3, 2019
60d7663
wip
TomAugspurger Dec 3, 2019
a4c4618
fixup extension
TomAugspurger Dec 3, 2019
0a500be
Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer…
TomAugspurger Dec 4, 2019
1c716f3
update tests
TomAugspurger Dec 4, 2019
67c8d51
Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer…
TomAugspurger Dec 4, 2019
22a2bc7
Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer…
TomAugspurger Dec 4, 2019
34de18e
updates
TomAugspurger Dec 4, 2019
78944d1
Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer…
TomAugspurger Dec 5, 2019
ffbe299
wip
TomAugspurger Dec 5, 2019
7abf40e
API: Handle pow & rpow special cases
TomAugspurger Dec 5, 2019
36d403d
move
TomAugspurger Dec 6, 2019
f6b4062
Merge remote-tracking branch 'upstream/master' into na-pow
TomAugspurger Dec 6, 2019
945e8cd
revert
TomAugspurger Dec 6, 2019
04546f3
Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer…
TomAugspurger Dec 6, 2019
a493965
Merge remote-tracking branch 'upstream/master' into na-pow
TomAugspurger Dec 6, 2019
8fc8b3a
fixup
TomAugspurger Dec 6, 2019
a49aa65
handle negative
TomAugspurger Dec 6, 2019
8ad166d
Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer…
TomAugspurger Dec 6, 2019
dd745c3
Merge branch 'na-pow' into NA-scalar+IntegerArray
TomAugspurger Dec 6, 2019
88fa412
expand test
TomAugspurger Dec 6, 2019
0902eef
wip
TomAugspurger Dec 6, 2019
721a1ea
Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer…
TomAugspurger Dec 9, 2019
c658307
fixup
TomAugspurger Dec 9, 2019
4f9d775
exceptions
TomAugspurger Dec 9, 2019
1244ef4
wip
TomAugspurger Dec 9, 2019
4a34b45
Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer…
TomAugspurger Dec 9, 2019
5293d87
fixup
TomAugspurger Dec 9, 2019
39f225a
arrow
TomAugspurger Dec 9, 2019
ea19b2d
update
TomAugspurger Dec 9, 2019
fe2d98e
fixup
TomAugspurger Dec 10, 2019
68fe155
update
TomAugspurger Dec 10, 2019
f27a5c2
fixup
TomAugspurger Dec 10, 2019
b97450b
Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer…
TomAugspurger Dec 16, 2019
5d62af8
updates
TomAugspurger Dec 16, 2019
2bf57d6
test, repr
TomAugspurger Dec 16, 2019
2f4e1cd
Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer…
TomAugspurger Dec 17, 2019
021dc7b
fixup
TomAugspurger Dec 17, 2019
197f18b
enable
TomAugspurger Dec 17, 2019
259b779
fixup
TomAugspurger Dec 17, 2019
c0cfef9
Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer…
TomAugspurger Dec 18, 2019
3183d53
ints
TomAugspurger Dec 18, 2019
4986d84
restore comment
TomAugspurger Dec 18, 2019
76806e9
Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer…
TomAugspurger Dec 30, 2019
64b4ccc
Merge remote-tracking branch 'upstream/master' into NA-scalar+Integer…
TomAugspurger Dec 30, 2019
b39dc60
docs
TomAugspurger Dec 30, 2019
800158d
docs
TomAugspurger Dec 30, 2019
e5d6832
fixup
TomAugspurger Dec 30, 2019
updates
TomAugspurger committed Dec 4, 2019
commit 34de18e76616513b0499867fb556f13d95f43a2b
26 changes: 16 additions & 10 deletions pandas/core/arrays/integer.py
@@ -671,16 +671,22 @@ def cmp_method(self, other):
                 if len(self) != len(other):
                     raise ValueError("Lengths must match to compare")

-            # numpy will show a DeprecationWarning on invalid elementwise
-            # comparisons, this will raise in the future
Member

See previous question about this. Is this comment no longer relevant or correct? Or why was it removed?

Contributor Author

I'm not sure, do you know how this is actually hit? If NumPy is going to raise in the future, shouldn't they be seeing that warning?

Member

I think this is about the warning you get with comparisons with objects / non-broadcastable arrays. Eg:

In [29]: np.array([1, 2]) == "b"   
/home/joris/miniconda3/envs/dev/bin/ipython:1: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  #!/home/joris/miniconda3/envs/dev/bin/python
Out[29]: False

In [30]: pd.array([1, 2]) == "b" 
Out[30]: array([False, False])

(it seems IntegerArray already handles this fine, not sure there is an explicit test for that)

Contributor Author

> it seems IntegerArray already handles this fine,

Gotcha. It's silencing the same warning from NumPy, and falling back to invalid_comparison, which returns the expected result. I'll restore the comment.

Contributor Author

Actually... the comment is incorrect. NumPy will perform elementwise comparison in the future, not raise. If they were to raise on that in the future the implementation would be incorrect.

Though I'm still a bit confused, as the NumPy op is returning NotImplemented since we're calling it directly. Will that continue to return NotImplemented? Or will the elementwise result be different?
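
To make the behavior discussed in this thread concrete, here is a minimal sketch of the "silence the warning, then fall back to all-False" pattern. The helper name `compare_or_all_false` is hypothetical and is not pandas' `invalid_comparison`; it only assumes that, depending on the NumPy version, `ndarray == "b"` either returns a scalar `False` with a warning or an elementwise array.

```python
import warnings

import numpy as np


def compare_or_all_false(values, other):
    """Compare elementwise; fall back to all-False if NumPy returns a scalar.

    Hypothetical helper for illustration, not part of pandas or NumPy.
    """
    with warnings.catch_warnings():
        # Older NumPy warns "elementwise comparison failed; ..." here.
        # The message filter is a regex matched at the start of the message.
        warnings.filterwarnings("ignore", "elementwise")
        result = values == other
    if not isinstance(result, np.ndarray):
        # NumPy gave up and returned a scalar: treat every element as unequal.
        result = np.zeros(values.shape, dtype=bool)
    return result


print(compare_or_all_false(np.array([1, 2]), "b"))
print(compare_or_all_false(np.array([1, 2]), 2))
```

Either NumPy behavior leads to the same elementwise result here, which is why the fallback stays correct if NumPy switches to elementwise comparison.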

-            with warnings.catch_warnings():
-                warnings.filterwarnings("ignore", "elementwise", FutureWarning)
-                with np.errstate(all="ignore"):
-                    result = op(self._data, other)
+            if other is libmissing.NA:
+                # numpy does not handle pd.NA well as "other" scalar (it returns
+                # a scalar False instead of an array)
+                result = np.zeros(self._data.shape, dtype="bool")
+                mask = np.ones(self._data.shape, dtype="bool")
+            else:
+                # numpy will show a DeprecationWarning on invalid elementwise
+                # comparisons, this will raise in the future
+                with warnings.catch_warnings():
+                    warnings.filterwarnings("ignore", "elementwise", FutureWarning)
+                    with np.errstate(all="ignore"):
+                        result = op(self._data, other)

             # nans propagate
             if mask is None:
-                mask = self._mask
+                mask = self._mask.copy()
             else:
                 mask = self._mask | mask
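
The hunk above can be modeled with plain NumPy arrays. This is a simplified sketch, not the pandas implementation: `NA` is a hypothetical sentinel standing in for `pd.NA`, and `masked_eq` is an illustrative name. It shows the two ideas in the change: a comparison with NA masks every position, and the returned mask is a copy, so it does not alias the input mask.

```python
import numpy as np

NA = object()  # hypothetical stand-in for pd.NA, for illustration only


def masked_eq(data, mask, other):
    """Return (result, mask) for `data == other` under a missing-value mask."""
    if other is NA:
        # Comparing with NA is unknown everywhere: mask every position.
        result = np.zeros(data.shape, dtype=bool)
        new_mask = np.ones(data.shape, dtype=bool)
    else:
        result = data == other
        # Copy so the result does not share its mask with the input.
        new_mask = mask.copy()
    return result, new_mask


data = np.array([1, 0, 1])
mask = np.array([False, False, True])  # True marks a missing value

na_result, na_mask = masked_eq(data, mask, NA)
eq_result, eq_mask = masked_eq(data, mask, 1)

eq_mask[0] = True  # mutating the result's mask leaves the input untouched
print(na_mask, eq_result, mask)
```

Avoiding shared state is one plausible reason for the switch to `self._mask.copy()`; the diff itself does not state the motivation.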

@@ -747,6 +753,7 @@ def _create_arithmetic_method(cls, op):

         @unpack_zerodim_and_defer(op.__name__)
         def integer_arithmetic_method(self, other):
+            # nans propagate

             mask = None

@@ -771,15 +778,14 @@ def integer_arithmetic_method(self, other):
                 if not (is_float(other) or is_integer(other)):
                     raise TypeError("can only perform ops with numeric values")

-            # nans propagate
             if mask is None:
-                mask = self._mask
+                mask = self._mask.copy()
             else:
                 mask = self._mask | mask

             # 1 ** np.nan is 1. So we have to unmask those.
             if op_name == "pow":
-                mask = np.where(self == 1, False, mask)
+                mask = np.where(self._data == 1, False, mask)

             elif op_name == "rpow":
                 mask = np.where(other == 1, False, mask)
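
The `pow` special case can be tried out on plain arrays. The sketch below is an illustration under assumptions, not the pandas code: `base`, `base_mask`, and `exp_mask` are made-up inputs, and the second variant adds a guard (`& ~base_mask`) that is not part of this commit. The guard matters only if a missing base happens to store the placeholder value 1 in its underlying data.

```python
import numpy as np

base = np.array([1, 2, 1, 3])
base_mask = np.array([False, False, True, False])  # third base is missing
exp_mask = np.array([True, True, False, False])    # first two exponents missing

# Missing values propagate from either operand.
mask = base_mask | exp_mask

# As in the diff: 1 ** anything is 1, so unmask wherever the base data is 1.
unmasked = np.where(base == 1, False, mask)

# Assumed refinement: only unmask where the base is 1 *and* actually present.
guarded = np.where((base == 1) & ~base_mask, False, mask)

print(unmasked)
print(guarded)
```

Comparing the raw data (`self._data == 1`) yields a plain boolean ndarray, which `np.where` handles directly; comparing the masked array itself would yield a masked boolean result once comparisons return NA.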
16 changes: 14 additions & 2 deletions pandas/tests/arrays/test_integer.py
@@ -160,7 +160,7 @@ def _check_op(self, s, op_name, other, exc=None):

         # 1 ** na is na, so need to unmask those
         if op_name == "__pow__":
-            mask = np.where(s == 1, False, mask)
+            mask = np.where(s.to_numpy() == 1, False, mask)

         elif op_name == "__rpow__":
             mask = np.where(other == 1, False, mask)
@@ -265,25 +265,34 @@ def test_arith_integer_array(self, data, all_arithmetic_operators):
         rhs = pd.Series([1] * len(data), dtype=data.dtype)
         rhs.iloc[-1] = np.nan

+        if op in {"__pow__", "__rpow__"}:
+            pytest.skip("TODO")
+
         self._check_op(s, op, rhs)

     def test_arith_series_with_scalar(self, data, all_arithmetic_operators):
         # scalar
         op = all_arithmetic_operators
+        if op in {"__pow__", "__rpow__"}:
+            pytest.skip("TODO")

         s = pd.Series(data)
         self._check_op(s, op, 1, exc=TypeError)

     def test_arith_frame_with_scalar(self, data, all_arithmetic_operators):
         # frame & scalar
         op = all_arithmetic_operators
+        if op in {"__pow__", "__rpow__"}:
+            pytest.skip("TODO")

         df = pd.DataFrame({"A": data})
         self._check_op(df, op, 1, exc=TypeError)

     def test_arith_series_with_array(self, data, all_arithmetic_operators):
         # ndarray & other series
         op = all_arithmetic_operators
+        if op in {"__pow__", "__rpow__"}:
+            pytest.skip("TODO")

         s = pd.Series(data)
         other = np.ones(len(s), dtype=s.dtype.type)
@@ -292,6 +301,9 @@ def test_arith_series_with_array(self, data, all_arithmetic_operators):
     def test_arith_coerce_scalar(self, data, all_arithmetic_operators):

         op = all_arithmetic_operators
+        if op in {"__pow__", "__rpow__"}:
+            pytest.skip("TODO")
+
         s = pd.Series(data)

         other = 0.01
@@ -405,7 +417,7 @@ def test_scalar(self, other, all_compare_operators):
         result = op(a, other)

         if other is pd.NA:
-            expected = pd.array([None, None, None], dtype="Int64")
+            expected = pd.array([None, None, None], dtype="boolean")
         else:
             values = op(a._data, other)
             expected = pd.arrays.BooleanArray(values, a._mask, copy=True)
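
As a quick check of what this test encodes, the sketch below compares an `Int64` array with `pd.NA` using only public pandas API. It assumes a pandas release that includes this change; the variable names are illustrative.

```python
import pandas as pd

a = pd.array([1, 0, None], dtype="Int64")

# Comparing with pd.NA gives a nullable boolean array that is entirely missing.
na_cmp = a == pd.NA

# Comparing with an ordinary scalar keeps only the original missing position.
one_cmp = a == 1

print(na_cmp.dtype, list(na_cmp.isna()))
print(one_cmp.dtype, list(one_cmp.isna()))
```

The result dtype is `boolean` rather than `Int64`, which is what the corrected `expected` value in the test reflects.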