-
-
Notifications
You must be signed in to change notification settings - Fork 11k
BUG,MAINT: Ensure masked elements can be tested against nan and inf. #11122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG,MAINT: Ensure masked elements can be tested against nan and inf. #11122
Conversation
numpy/testing/_private/utils.py
Outdated
def chk_same_position(x_id, y_id, hasval='nan'): | ||
"""Handling nan/inf: check that x and y have the nan/inf at the same | ||
locations.""" | ||
def func_same_pos(func, x, y, hasval='nan'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a full docstring. I think the original name is a bit better. Maybe change the signature to (x, y, func, hasval='nan')
. Do we really need both hasval
and func
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func
does not always have a name (used with a lambda
below to test for +/- inf). I'll change to func_chk_same_pos
.
numpy/testing/_private/utils.py
Outdated
8000 | x = x[~flag] | |
if y.shape: | ||
y = y[~flag] | ||
return x, y |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this function - can't it just be written return x[~flag], y[~flag]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, x
or y
can be a scalar while the other is an array. This is why the flags were treated separately before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue here is avoiding arr_0d[bool_nd]
, right? Or is it about avoiding non_generic_scalar[bool_0d]
?
Would x, y = np.broadcast_arrays(x, y)
be off the table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not crazy, but I'd rather stick with what I have here...
numpy/testing/_private/utils.py
Outdated
else: | ||
x, y = remove_flagged(x, y, xy_isnan) | ||
elif xy_isinf is not False: | ||
x, y = remove_flagged(x, y, xy_isinf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like micro-optimization here - can't we just write in place of these 7 lines:
to_remove = xy_isinf | xy_isnan
x = x[~to_remove]
y = y[~to_remove]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't work for the same reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem was that xy_isinf
could just be False
, in which case the flags would not broadcast correctly. But of course, this is easily checked in remove_flagged
, so I did that now.
4237e61
to
d324b60
Compare
@charris, @eric-wieser - thanks for the comments; I made some changes which I think make the use clearer. |
numpy/testing/_private/utils.py
Outdated
if flag is False: | ||
return x, y | ||
if flag.all(): | ||
return array(()), array(()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'd use []
here for a size-0 array - I'm used to seeing ()
for a 0dim array, so this is a little jarring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
numpy/testing/_private/utils.py
Outdated
return x_id if x_id.shape else y_id | ||
|
||
def remove_flagged(x, y, flag): | ||
"""Remove elements in x and y, noting that either may be scalar.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
python or numpy scalar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpy, I'll add to the comment.
numpy/testing/_private/utils.py
Outdated
def chk_same_position(x_id, y_id, hasval='nan'): | ||
"""Handling nan/inf: check that x and y have the nan/inf at the same | ||
locations.""" | ||
def func_chk_same_pos(x, y, func=isnan, hasval='nan'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we spell this check
instead of chk
? Or even assert_same_predicate_pos
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, assert_same_pos
d324b60
to
c1544bf
Compare
numpy/testing/_private/utils.py
Outdated
@@ -706,6 +711,20 @@ def chk_same_position(x_id, y_id, hasval='nan'): | |||
% (hasval), verbose=verbose, header=header, | |||
names=('x', 'y'), precision=precision) | |||
raise AssertionError(msg) | |||
return x_id if x_id.shape else y_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is backward - you should return the scalar boolean mask here, not the array one (which would at this point be equal to the scalar at every position anyway). That allows you to remove all of your special cases in remove_flagged
(and replace it with the one-liner I suggest below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed, you are completely right - that gets rid of remove_flagged
altogether. Now implemented.
numpy/testing/_private/utils.py
Outdated
x = x[~x_isinf] | ||
y = y[~y_isinf] | ||
xy_isinf = func_assert_same_pos(x, y, | ||
func=lambda xy: xy == +inf, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.isposinf
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Learn something new every day...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But doesn't work for complex numbers...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does what you have here work for complex numbers?
c1544bf
to
46324d1
Compare
numpy/testing/_private/utils.py
Outdated
# flag as it everywhere, so we should return the scalar flag. | ||
return x_id if not x_id.shape else y_id | F438||
|
||
def remove_flagged(x, y, flag): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to remove this function now
numpy/testing/_private/utils.py
Outdated
if any(x_isnat) or any(y_isnat): | ||
x = x[~x_isnat] | ||
y = y[~y_isnat] | ||
if getattr(flagged, 'shape', ()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this attribute guaranteed to exist? As long as x and y are numpy scalars, then you don't need this guard at all - scalar[bool_scalar]
works just fine.
Plus then x
and y
are always 1d, so you can get rid of the isinstance(val, bool)
and ravel()
below too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it can be False
-- initialized as such. I tried changing, but cannot get rid of the if flagged
stanza, nor of the isinstance
- I suggest we leave this to another PR...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just initialize it to flagged = np.False_
, to at least eliminates the getattr
). I'd then spell that if flagged.ndim != 0
for clarity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that, but I do feel that I'm then decreasing the clarity of the rest of the code - at least in my emacs, False
is coloured very nicely, and false_
is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue that keeping a variable type consistent is more important than having the color look like in your editor.
I tried changing, but cannot get rid of the if flagged stanza
I'm gonna do a checkout to see what I'm missing here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about bool_(False)
, which meets both requirements?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I see why the 0d case still has special handling - I'd forgotten that arr[bool_0d].shape == (?,) + arr.shape
, and is not 1D.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or an alternate spelling - np.ndim(flagged) == 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I like the ndim
- that is much clearer why we do it. Pushed up.
Does the inf-checking behave on complex numbers? |
46324d1
to
ceb6e5d
Compare
On the |
Complex number checking is out of scope here - that's fine. Maybe open an issue about it if you can produce a test-case |
Cannot immediately produce one. Partially because of an oddity:
But even accounting for that, no, I cannot seem to break it! |
Interesting oddity there. I suppose you have to use |
numpy/testing/_private/utils.py
Outdated
@@ -706,6 +711,9 @@ def chk_same_position(x_id, y_id, hasval='nan'): | |||
% (hasval), verbose=verbose, header=header, | |||
names=('x', 'y'), precision=precision) | |||
raise AssertionError(msg) | |||
# If there is a scalar, then here we know the array has the same | |||
# flag as it everywhere, so we should return the scalar flag. | |||
return x_id if not x_id.shape else y_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think this is clearer as x_id.ndim == 0
rather than not x_id.shape
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done.
Just the nits about |
ceb6e5d
to
66d22aa
Compare
The removal of nan and inf from arrays that are compared using test routines like assert_array_equal treated the two arrays separately, which for masked arrays meant that some elements would not be removed when they should have been. This PR corrects this.
66d22aa
to
5718b33
Compare
Took the liberty of adding and explicit |
numpy/testing/_private/utils.py
Outdated
x_id = func(x) | ||
y_id = func(y) | ||
if not any(x_id) and not any(y_id): | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these two lines needed? Aren't they covered by the below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'd think not, but numerous recursion failures if I leave them out...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, what happens is that it does recurse, but the if
statement prevents it from doing it more than once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which is simple to solve - just use any
- will push fix up shortly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't work either, but passing in equal_[nan|inf]=False
does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it doesn't...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment saying that this is needed to prevent recursion or something?
Good to go either way
21dd5aa
to
34bc491
Compare
What a pain! But finally got it working with a more sensible implementation. Turns out I was partially misled by test that failed because they used subclasses that were not written correctly (one by me...). |
numpy/testing/_private/utils.py
Outdated
def func_assert_same_pos(x, y, func=isnan, hasval='nan'): | ||
"""Handling nan/inf: combine results of running func on x and y, | ||
checking that they are True at the same locations.""" | ||
# Both the cast to bool_ and the != True comparison are done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move the cast to bool to the scalar case below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed to cast both x_id
and y_id
, but that can indeed be done at the end. That does look a bit nicer, so done.
@@ -55,6 +55,10 @@ def __array_wrap__(self, obj, context=None): | |||
obj.metadata = self.metadata | |||
return obj | |||
|
|||
def __array_finalize__(self, obj): | |||
self.metadata = getattr(obj, 'metadata', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe in the commit message why these changes were needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. This was indeed a non-trivial one - I had to look again to remind me why it was needed. I wondered about the use of stating it in the commit message until I recalled that with git blame
one can find why this was added.
34bc491
to
7eb1533
Compare
numpy/testing/_private/utils.py
Outdated
# If there is a scalar, then here we know the array has the same | ||
# flag as it everywhere, so we should return the scalar flag. | ||
return (bool_(x_id) if x_id.ndim == 0 else | ||
(bool_(y_id) if y_id.ndim == 0 else y_id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably clearer as just a normal if / else chain at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed; done.
if x.size == 0: | ||
return | ||
elif flagged: | ||
# no sense doing comparison if everything is flagged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I agree with this - what if x
and y
are quantities of different dimensions? The test should still compare the empty arrays so that the correct error is raised, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the same case as the if x.size == 0
stanza above, but for the case where x
and y
are both scalar. In principle, I agree the comparison should work for empty elements, but in practice, this is not the case (e.g., matrix
changing its shape being one problem...)
Since this was here before, I propose to leave it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, i agree this is out of scope for the patch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, it sadly remains a chunk of rather fragile code...
Sorry for dropping the ball on this one. Would be great to get this into 1.15. |
This brought to light two bugs in tests, which are fixed here, viz., that a sample ndarray subclass that tested propagation of an added parameter was incomplete, in that in propagating the parameter in __array_wrap__ it assumed it was there on self, but that assumption could be broken when a view of self was taken (as is done by x[~flagged] in the test routine), since there was no __array_finalize__ defined. The other subclass bug counted, incorrectly, on only needing to provide one type of comparison, the __lt__ being explicitly tested. But flags are compared with __eq__ and those flags will have the same subclass.
7eb1533
to
3ad49aa
Compare
Out of curiosity, how many of the affected tests are already shadowed in |
Thanks Marten. |
Wow thanks @mhvk and the rest, this indeed was not trivial! |
# compared usefully, and for which .all() yields masked. | ||
x_id = func(x) | ||
y_id = func(y) | ||
if (x_id == y_id).all() != True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm,
In [10]: ma.masked != True
Out[10]: masked
In [11]: ma.masked == True
Out[11]: masked
In [12]: bool(ma.masked)
Out[12]: False
@mhvk The camparison to True seems to achieve nothing here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose is to ensure that the stanza is not executed if x_id
or y_id
is masked
, yet is executed if they are different.
The removal of nan and inf from arrays that are compared using test routines like assert_array_equal treated the two arrays separately, which for masked arrays meant that some elements would not be removed when they should have been. This PR corrects this.
fixes #11121