10000 BUG: Fix array_equal for numeric and non-numeric scalar types by eendebakpt · Pull Request #27275 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: Fix array_equal for numeric and non-numeric scalar types #27275

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 26, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

10000
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
BUG: Fix array_equal for numeric and non-numeric scalar types
  • Loading branch information
eendebakpt committed Aug 23, 2024
commit 2211bc9c20e69552c936c41d177a849b7b205338
16 changes: 8 additions & 8 deletions numpy/_core/numeric.py
Original file line number Diff line number Diff line change
Expand Up @@ -2554,24 +2554,24 @@ def array_equal(a1, a2, equal_nan=False):
if a1.shape != a2.shape:
return False
if not equal_nan:
return builtins.bool((a1 == a2).all())
cannot_have_nan = (_dtype_cannot_hold_nan(a1.dtype)
and _dtype_cannot_hold_nan(a2.dtype))
if cannot_have_nan:
if a1 is a2:
return True
return builtins.bool((a1 == a2).all())
return builtins.bool((asarray(a1 == a2)).all())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this changes behaviour for MaskedArray, since it returns a masked array of bool, and here you strip the mask. In principle, not bad conceptually, since the underlying value is False if one array element is masked and the other is not, and True if both are masked. But definitely a change in API that might catch people by surprise: it will no longer be just (a1 == a2).all(), which I think is what most people would expect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.s. I did not check the reasoning for the asarray - asanyarray would solve the problem with MaskedArray - I think that may be good enough to solve the problem here too? Alternatively, replace (...).all() with np.all(...) - that will deal with python bools fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, maybe np.all is best, then. The asarray was restoring old behavior, but hand't though about it potentially making a regression for masked (where removing it was maybe an improvement)...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have this vague feeling that I indeed removed to ensure MaskedArray worked. But it may have been assert_array_equal... Unfortunately, MaskedArray tests do not really cover the case with regular numpy functions much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using asanyarray preserves the masked array and solves issue #27271, so I updated the PR to use that.

@mhvk Does that address your point? As far as I understand it, the problem with asarray is that it creates an unnecessary copy of the masked array (and hence uses more memory).

I am also looking into updating the scalar comparison to solve #27271 by making np.array(1) == np.array('s') return a numpy bool instead of a python bool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is not the use of memory, but rather than np.asarray makes any array subclass into a regular ndarray. So, np.asanyarray solves that!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the second line of np.array_equal all input arrays are transformed with asarray, so I am not sure any masked arrays can arrive at the lines we are discussing now. We could change the second line into a1, a2 = asanyarray(a1), asanyarray(a2) so that masked arrays are handled.

To make it a bit more concrete here is a minimal example showing the behaviour of masked arrays.

import numpy as np
x = np.ma.array([1], mask=[1])
y = np.ma.array([1], mask=[1])
w = np.ma.array([0], mask=[1])
z = x == y

bool(z.all()), bool(np.asanyarray(z).all()), bool(np.asarray(z).all()) # False, False, True
np.array_equal(x, y) # what should this be?
np.array_equal(x, w) # what should this be?

On numpy 1.26 and 2.0 the output of np.array_equal(x, y) is True and the output of np.array_equal(x, w) is False. But since the input arrays are fully masked, one could argue that the output of both np.array_equal(x, y) and np.array_equal(x, w) should be the same (but I would not be sure whether the result should then be True or False).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I am not sure any masked arrays can arrive at the lines we are discussing now

Yes, you are right. But I guess asanyarray is just as well, anyway, it just shouldn't make a difference.


if a1 is a2:
# nan will compare equal so an array will compare equal to itself.
return True

cannot_have_nan = (_dtype_cannot_hold_nan(a1.dtype)
and _dtype_cannot_hold_nan(a2.dtype))
if cannot_have_nan:
return builtins.bool(asarray(a1 == a2).all())

# Handling NaN values if equal_nan is True
a1nan, a2nan = isnan(a1), isnan(a2)
# NaN's occur at different locations
if not (a1nan == a2nan).all():
return False
# Shapes of a1, a2 and masks are guaranteed to be consistent by this point
return builtins.bool((a1[~a1nan] == a2[~a1nan]).all())
return builtins.bool(asarray(a1[~a1nan] == a2[~a1nan]).all())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this one might not be needed because we can only end up here with numeric types (the calls isnan(a1) and isnan(a2) must be valid).

Copy link
Member
@seberg seberg Aug 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, all numeric types can be compared. So I think you can skip it if you like.
(That is part of it, the other is that the index result is always 1-D)



def _array_equiv_dispatcher(a1, a2):
Expand Down
7 changes: 7 additions & 0 deletions numpy/_core/tests/test_numeric.py
Original file line number Diff line number Diff line change
Expand Up @@ -2192,6 +2192,13 @@ def test_array_equal_equal_nan(self, bx, by, equal_nan, expected):
assert_(res is expected)
assert_(type(res) is bool)

def test_array_equal_different_scalar_types(self):
# https://github.com/numpy/numpy/issues/27271
a = np.array("foo")
b = np.array(1)
assert not np.array_equal(a, b)
assert not np.array_equiv(a, b)

def test_none_compares_elementwise(self):
a = np.array([None, 1, None], dtype=object)
assert_equal(a == None, [True, False, True])
Expand Down
Loading
0