8000 BUG: Fix array_equal for numeric and non-numeric scalar types by eendebakpt · Pull Request #27275 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: Fix array_equal for numeric and non-numeric scalar types #27275

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 26, 2024

Conversation

eendebakpt
Copy link
Contributor
@eendebakpt eendebakpt commented Aug 23, 2024

Mitigates #27271. The underlying issue (an array comparison returning a python bool instead of a numpy bool) is not addressed.

The order of statements is slightly reordered, so that the if a1 is a2: check can be done before the calculation of cannot_have_nan

EDIT,NOTE(seberg): Should be backported to both 2.0.2 and 2.1

Closes gh-27271

@ngoldbaum ngoldbaum added the 09 - 8000 Backport-Candidate PRs tagged should be backported label Aug 23, 2024
# Handling NaN values if equal_nan is True
a1nan, a2nan = isnan(a1), isnan(a2)
# NaN's occur at different locations
if not (a1nan == a2nan).all():
return False
# Shapes of a1, a2 and masks are guaranteed to be consistent by this point
return builtins.bool((a1[~a1nan] == a2[~a1nan]).all())
return builtins.bool(asarray(a1[~a1nan] == a2[~a1nan]).all())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this one might not be needed because we can only end up here with numeric types (the calls isnan(a1) and isnan(a2) must be valid).

Copy link
Member
@seberg seberg Aug 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, all numeric types can be compared. So I think you can skip it if you like.
(That is part of it, the other is that the index result is always 1-D)

@eendebakpt eendebakpt changed the title Draft: BUG: Fix array_equal for numeric and non-numeric scalar types BUG: Fix array_equal for numeric and non-numeric scalar types Aug 23, 2024
Copy link
Member
@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think this is the right thing for backporting.

Fixing the situation long-term, could be done in the == special path (not sure why we return a bool) and also by changing the ufunc to return 0-D arrays if any input is 0-D. A change I had prepared for a while.

Approving, it is fair with or without unsing that second asarray call.

# Handling NaN values if equal_nan is True
a1nan, a2nan = isnan(a1), isnan(a2)
# NaN's occur at different locations
if not (a1nan == a2nan).all():
return False
# Shapes of a1, a2 and masks are guaranteed to be consistent by this point
return builtins.bool((a1[~a1nan] == a2[~a1nan]).all())
return builtins.bool(asarray(a1[~a1nan] == a2[~a1nan]).all())
Copy link
Member
@seberg seberg Aug 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, all numeric types can be compared. So I think you can skip it if you like.
(That is part of it, the other is that the index result is always 1-D)

@charris charris added this to the 2.0.2 release milestone Aug 24, 2024
if a1 is a2:
return True
return builtins.bool((a1 == a2).all())
return builtins.bool((asarray(a1 == a2)).all())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note this changes behaviour for MaskedArray, since it returns a masked array of bool, and here you strip the mask. In principle, not bad conceptually, since the underlying value is False if one array element is masked and the other is not, and True if both are masked. But definitely a change in API that might catch people by surprise: it will no longer be just (a1 == a2).all(), which I think is what most people would expect.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.s. I did not check the reasoning for the asarray - asanyarray would solve the problem with MaskedArray - I think that may be good enough to solve the problem here too? Alternatively, replace (...).all() with np.all(...) - that will deal with python bools fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, maybe np.all is best, then. The asarray was restoring old behavior, but hand't though about it potentially making a regression for masked (where removing it was maybe an improvement)...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have this vague feeling that I indeed removed to ensure MaskedArray worked. But it may have been assert_array_equal... Unfortunately, MaskedArray tests do not really cover the case with regular numpy functions much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using asanyarray preserves the masked array and solves issue #27271, so I updated the PR to use that.

@mhvk Does that address your point? As far as I understand it, the problem with asarray is that it creates an unnecessary copy of the masked array (and hence uses more memory).

I am also looking into updating the scalar comparison to solve #27271 by making np.array(1) == np.array('s') return a numpy bool instead of a python bool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is not the use of memory, but rather than np.asarray makes any array subclass into a regular ndarray. So, np.asanyarray solves that!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the second line of np.array_equal all input arrays are transformed with asarray, so I am not sure any masked arrays can arrive at the lines we are discussing now. We could change the second line into a1, a2 = asanyarray(a1), asanyarray(a2) so that masked arrays are handled.

To make it a bit more concrete here is a minimal example showing the behaviour of masked arrays.

import numpy as np
x = np.ma.array([1], mask=[1])
y = np.ma.array([1], mask=[1])
w = np.ma.array([0], mask=[1])
z = x == y

bool(z.all()), bool(np.asanyarray(z).all()), bool(np.asarray(z).all()) # False, False, True
np.array_equal(x, y) # what should this be?
np.array_equal(x, w) # what should this be?

On numpy 1.26 and 2.0 the output of np.array_equal(x, y) is True and the output of np.array_equal(x, w) is False. But since the input arrays are fully masked, one could argue that the output of both np.array_equal(x, y) and np.array_equal(x, w) should be the same (but I would not be sure whether the result should then be True or False).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I am not sure any masked arrays can arrive at the lines we are discussing now

Yes, you are right. But I guess asanyarray is just as well, anyway, it just shouldn't make a difference.

@charris
Copy link
Member
charris commented Aug 25, 2024

Because this is tagged for backport, please squash when merging.

@seberg
Copy link
Member
seberg commented Aug 26, 2024

Thanks @eendebakpt let me put this in for backport, since I think for that it should be good.

@seberg seberg merged commit 10533ca into numpy:main Aug 26, 2024
66 checks passed
charris pushed a commit to charris/numpy that referenced this pull request Aug 26, 2024
Backport of numpy#27275

Mitigates numpy#27271. The underlying issue (an array comparison returning a
python bool instead of a numpy bool) is not addressed.

The order of statements is slightly reordered, so that the if
`a1 is a2:` check can be done before the calculation of
`cannot_have_nan`

Closes numpygh-27271
charris pushed a commit to charris/numpy that referenced this pull request Aug 26, 2024
Backport of numpy#27275

Mitigates numpy#27271. The underlying issue (an array comparison returning a
python bool instead of a numpy bool) is not addressed.

The order of statements is slightly reordered, so that the if
`a1 is a2:` check can be done before the calculation of
`cannot_have_nan`

Closes numpygh-27271
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Aug 26, 2024
@charris charris removed this from the 2.0.2 release milestone Aug 26, 2024
DimitriPapadopoulos pushed a commit to DimitriPapadopoulos/numpy that referenced this pull request Aug 27, 2024
…27275)

Mitigates numpy#27271. The underlying issue (an array comparison returning a python bool instead of a numpy bool) is not addressed.

The order of statements is slightly reordered, so that the if a1 is a2: check can be done before the calculation of cannot_have_nan
ArvidJB pushed a commit to ArvidJB/numpy that referenced this pull request Nov 1, 2024
…27275)

Mitigates numpy#27271. The underlying issue (an array comparison returning a python bool instead of a numpy bool) is not addressed.

The order of statements is slightly reordered, so that the if a1 is a2: check can be done before the calculation of cannot_have_nan
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Array comparison between numerical and non-numerical returns builtin bool
5 participants
0