-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
BUG: Fix array_equal for numeric and non-numeric scalar types #27275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
numpy/_core/numeric.py
Outdated
# Handling NaN values if equal_nan is True | ||
a1nan, a2nan = isnan(a1), isnan(a2) | ||
# NaN's occur at different locations | ||
if not (a1nan == a2nan).all(): | ||
return False | ||
# Shapes of a1, a2 and masks are guaranteed to be consistent by this point | ||
return builtins.bool((a1[~a1nan] == a2[~a1nan]).all()) | ||
return builtins.bool(asarray(a1[~a1nan] == a2[~a1nan]).all()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this one might not be needed because we can only end up here with numeric types (the calls isnan(a1)
and isnan(a2)
must be valid).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, all numeric types can be compared. So I think you can skip it if you like.
(That is part of it, the other is that the index result is always 1-D)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I think this is the right thing for backporting.
Fixing the situation long-term, could be done in the ==
special path (not sure why we return a bool
) and also by changing the ufunc to return 0-D arrays if any input is 0-D. A change I had prepared for a while.
Approving, it is fair with or without unsing that second asarray
call.
numpy/_core/numeric.py
Outdated
# Handling NaN values if equal_nan is True | ||
a1nan, a2nan = isnan(a1), isnan(a2) | ||
# NaN's occur at different locations | ||
if not (a1nan == a2nan).all(): | ||
return False | ||
# Shapes of a1, a2 and masks are guaranteed to be consistent by this point | ||
return builtins.bool((a1[~a1nan] == a2[~a1nan]).all()) | ||
return builtins.bool(asarray(a1[~a1nan] == a2[~a1nan]).all()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, all numeric types can be compared. So I think you can skip it if you like.
(That is part of it, the other is that the index result is always 1-D)
numpy/_core/numeric.py
Outdated
if a1 is a2: | ||
return True | ||
return builtins.bool((a1 == a2).all()) | ||
return builtins.bool((asarray(a1 == a2)).all()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note this changes behaviour for MaskedArray
, since it returns a masked array of bool, and here you strip the mask. In principle, not bad conceptually, since the underlying value is False
if one array element is masked and the other is not, and True
if both are masked. But definitely a change in API that might catch people by surprise: it will no longer be just (a1 == a2).all()
, which I think is what most people would expect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.s. I did not check the reasoning for the asarray
- asanyarray
would solve the problem with MaskedArray
- I think that may be good enough to solve the problem here too? Alternatively, replace (...).all()
with np.all(...)
- that will deal with python bools fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, maybe np.all
is best, then. The asarray
was restoring old behavior, but hand't though about it potentially making a regression for masked (where removing it was maybe an improvement)...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have this vague feeling that I indeed removed to ensure MaskedArray
worked. But it may have been assert_array_equal
... Unfortunately, MaskedArray
tests do not really cover the case with regular numpy functions much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using asanyarray
preserves the masked array and solves issue #27271, so I updated the PR to use that.
@mhvk Does that address your point? As far as I understand it, the problem with asarray
is that it creates an unnecessary copy of the masked array (and hence uses more memory).
I am also looking into updating the scalar comparison to solve #27271 by making np.array(1) == np.array('s')
return a numpy bool instead of a python bool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is not the use of memory, but rather than np.asarray
makes any array subclass into a regular ndarray
. So, np.asanyarray
solves that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the second line of np.array_equal
all input arrays are transformed with asarray
, so I am not sure any masked arrays can arrive at the lines we are discussing now. We could change the second line into a1, a2 = asanyarray(a1), asanyarray(a2)
so that masked arrays are handled.
To make it a bit more concrete here is a minimal example showing the behaviour of masked arrays.
import numpy as np
x = np.ma.array([1], mask=[1])
y = np.ma.array([1], mask=[1])
w = np.ma.array([0], mask=[1])
z = x == y
bool(z.all()), bool(np.asanyarray(z).all()), bool(np.asarray(z).all()) # False, False, True
np.array_equal(x, y) # what should this be?
np.array_equal(x, w) # what should this be?
On numpy 1.26 and 2.0 the output of np.array_equal(x, y)
is True
and the output of np.array_equal(x, w)
is False
. But since the input arrays are fully masked, one could argue that the output of both np.array_equal(x, y)
and np.array_equal(x, w)
should be the same (but I would not be sure whether the result should then be True
or False
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I am not sure any masked arrays can arrive at the lines we are discussing now
Yes, you are right. But I guess asanyarray
is just as well, anyway, it just shouldn't make a difference.
Because this is tagged for backport, please squash when merging. |
Thanks @eendebakpt let me put this in for backport, since I think for that it should be good. |
Backport of numpy#27275 Mitigates numpy#27271. The underlying issue (an array comparison returning a python bool instead of a numpy bool) is not addressed. The order of statements is slightly reordered, so that the if `a1 is a2:` check can be done before the calculation of `cannot_have_nan` Closes numpygh-27271
Backport of numpy#27275 Mitigates numpy#27271. The underlying issue (an array comparison returning a python bool instead of a numpy bool) is not addressed. The order of statements is slightly reordered, so that the if `a1 is a2:` check can be done before the calculation of `cannot_have_nan` Closes numpygh-27271
…27275) Mitigates numpy#27271. The underlying issue (an array comparison returning a python bool instead of a numpy bool) is not addressed. The order of statements is slightly reordered, so that the if a1 is a2: check can be done before the calculation of cannot_have_nan
…27275) Mitigates numpy#27271. The underlying issue (an array comparison returning a python bool instead of a numpy bool) is not addressed. The order of statements is slightly reordered, so that the if a1 is a2: check can be done before the calculation of cannot_have_nan
Mitigates #27271. The underlying issue (an array comparison returning a python bool instead of a numpy bool) is not addressed.
The order of statements is slightly reordered, so that the
if a1 is a2:
check can be done before the calculation ofcannot_have_nan
EDIT,NOTE(seberg): Should be backported to both 2.0.2 and 2.1
Closes gh-27271