10000 BUG: Fixed an issue wherein certain `nan<x>` functions could fail for object arrays by BvB93 · Pull Request #19821 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: Fixed an issue wherein certain nan<x> functions could fail for object arrays #19821

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Sep 5, 2021

Conversation

BvB93
Copy link
Member
@BvB93 BvB93 commented Sep 3, 2021

Previously there were a number of cases wherein the nan<x> functions (e.g. nanmedian) could fail for object arrays,
be it either due to a call to isnan or assuming that array reduction would always return a generic subclass,
the latter being in the possession of a dtype attribute.

@BvB93 BvB93 added the 00 - Bug label Sep 3, 2021
@BvB93 BvB93 marked this pull request as draft September 3, 2021 14:25
@BvB93 BvB93 marked this pull request as ready for review September 3, 2021 15:02
np.uint16, np.uint32, np.uint64)
@pytest.mark.parametrize(
"dtype",
np.typecodes["AllInteger"] + np.typecodes["AllF 10000 loat"] + "O",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.typecodes['AllFloat'][1:] would remove 'e', but is not quite as robust.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, as of fffcb6e the tests seem to work fine with np.float16.

# Precaution against reduced object arrays
try:
return a.dtype.type(a / b)
except AttributeError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this faster than checking for the attribute as you do below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During a quick local test it shaves off about 100 ns (~400 vs ~300 ns) for the non-object case.
Would you feel this performance gain is worthwhile enough to stick to the current try/except block?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see it making a difference in the big picture. There might be a small difference in clarity, but with one liners that is debatable ...

@charris
Copy link
Member
charris commented Sep 4, 2021

LGTM, couple of nits. I assume the main fix here is for the case of non-array objects being returned.

@@ -160,8 +160,12 @@ def _remove_nan_1d(arr1d, overwrite_input=False):
True if `res` can be modified in place, given the constraint on the
input
"""
if arr1d.dtype == object:
# object arrays do not support `isnan` (gh-9009), so make a guess
c = np.not_equal(arr1d, arr1d, dtype=bool)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clever.

@charris charris merged commit b3a66e8 into numpy:main Sep 5, 2021
@charris
Copy link
Member
charris commented Sep 5, 2021

Thanks Bas.

The circleci failure may be ignored, it would be fixed by a rebase. It might be worth testing some of these function with 0-D arrays, but that is for another PR.

@BvB93 BvB93 deleted the nanfunctions branch September 6, 2021 07:12
@seberg seberg added the 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. label Sep 7, 2021
@seberg
Copy link
Member
seberg commented Sep 7, 2021

Should we add a release note for this? (A brief ping to the mailing list may also be nice, it is an API change after all.)

I also wonder if we want to do this for the nan functions, whether we should actually upgrade isnan to use this assumption? (I am fine with either though, defining a != a as the definition for NaN in nan<x> can very much be independent of isnan.

I am fine with it, though. Just thought it might be good to add a note and ping the list.

seberg added a commit to seberg/numpy that referenced this pull request Sep 7, 2021
The test needs to be updated, since this PR adds the correct checks
for floating point errors to accumulations.
@BvB93
Copy link
Member Author
BvB93 commented Sep 8, 2021

Should we add a release note for this? (A brief ping to the mailing list may also be nice, it is an API change after all.)

I would argue that this is really just a bug fix; namelly I'd reasonably expect the <x> and nan<x> functions to produce the same result if no nan-values are involved. The only thing that might potentially warrant a ping is the a != a approach for finding object-based nans, and even this is based on a similar approach used in a prior PR (#9013).

@seberg
Copy link
Member
seberg commented Sep 8, 2021

Ah OK, I did not realize we already have this logic in other places.

@BvB93
Copy link
Member Author
BvB93 commented Sep 9, 2021

@charris turns out 0d arrays are very much a worthwhile test case here: #19854.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
00 - Bug 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0