8000 BUG,MAINT: Ensure masked elements can be tested against nan and inf. by mhvk · Pull Request #11122 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG,MAINT: Ensure masked elements can be tested against nan and inf. #11122

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 7, 2018

Conversation

mhvk
Copy link
Contributor
@mhvk mhvk commented May 19, 2018

The removal of nan and inf from arrays that are compared using test routines like assert_array_equal treated the two arrays separately, which for masked arrays meant that some elements would not be removed when they should have been. This PR corrects this.

fixes #11121

def chk_same_position(x_id, y_id, hasval='nan'):
"""Handling nan/inf: check that x and y have the nan/inf at the same
locations."""
def func_same_pos(func, x, y, hasval='nan'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a full docstring. I think the original name is a bit better. Maybe change the signature to (x, y, func, hasval='nan'). Do we really need both hasval and func?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func does not always have a name (used with a lambda below to test for +/- inf). I'll change to func_chk_same_pos.

8000
x = x[~flag]
if y.shape:
y = y[~flag]
return x, y
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this function - can't it just be written return x[~flag], y[~flag]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, x or y can be a scalar while the other is an array. This is why the flags were treated separately before.

Copy link
Member
@eric-wieser eric-wieser May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue here is avoiding arr_0d[bool_nd], right? Or is it about avoiding non_generic_scalar[bool_0d]?

Would x, y = np.broadcast_arrays(x, y) be off the table?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not crazy, but I'd rather stick with what I have here...

else:
x, y = remove_flagged(x, y, xy_isnan)
elif xy_isinf is not False:
x, y = remove_flagged(x, y, xy_isinf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like micro-optimization here - can't we just write in place of these 7 lines:

to_remove = xy_isinf | xy_isnan
x = x[~to_remove]
y = y[~to_remove]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't work for the same reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem was that xy_isinf could just be False, in which case the flags would not broadcast correctly. But of course, this is easily checked in remove_flagged, so I did that now.

@eric-wieser eric-wieser added the component: numpy.ma masked arrays label May 25, 2018
@mhvk mhvk force-pushed the assert-array-comparison-with-masked branch from 4237e61 to d324b60 Compare May 25, 2018 12:51
@mhvk
Copy link
Contributor Author
mhvk commented May 25, 2018

@charris, @eric-wieser - thanks for the comments; I made some changes which I think make the use clearer.

if flag is False:
return x, y
if flag.all():
return array(()), array(())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd use [] here for a size-0 array - I'm used to seeing () for a 0dim array, so this is a little jarring

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

return x_id if x_id.shape else y_id

def remove_flagged(x, y, flag):
"""Remove elements in x and y, noting that either may be scalar."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python or numpy scalar?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numpy, I'll add to the comment.

def chk_same_position(x_id, y_id, hasval='nan'):
"""Handling nan/inf: check that x and y have the nan/inf at the same
locations."""
def func_chk_same_pos(x, y, func=isnan, hasval='nan'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we spell this check instead of chk? Or even assert_same_predicate_pos?

Copy link
8000
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, assert_same_pos

@mhvk mhvk force-pushed the assert-array-comparison-with-masked branch from d324b60 to c1544bf Compare May 25, 2018 18:57
@@ -706,6 +711,20 @@ def chk_same_position(x_id, y_id, hasval='nan'):
% (hasval), verbose=verbose, header=header,
names=('x', 'y'), precision=precision)
raise AssertionError(msg)
return x_id if x_id.shape else y_id
Copy link
Member
@eric-wieser eric-wieser May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is backward - you should return the scalar boolean mask here, not the array one (which would at this point be equal to the scalar at every position anyway). That allows you to remove all of your special cases in remove_flagged (and replace it with the one-liner I suggest below)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed, you are completely right - that gets rid of remove_flagged altogether. Now implemented.

x = x[~x_isinf]
y = y[~y_isinf]
xy_isinf = func_assert_same_pos(x, y,
func=lambda xy: xy == +inf,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.isposinf?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Learn something new every day...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But doesn't work for complex numbers...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does what you have here work for complex numbers?

@mhvk mhvk force-pushed the assert-array-comparison-with-masked branch from c1544bf to 46324d1 Compare May 25, 2018 20:07
F438
# flag as it everywhere, so we should return the scalar flag.
return x_id if not x_id.shape else y_id

def remove_flagged(x, y, flag):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to remove this function now

if any(x_isnat) or any(y_isnat):
x = x[~x_isnat]
y = y[~y_isnat]
if getattr(flagged, 'shape', ()):
Copy link
Member
@eric-wieser eric-wieser May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this attribute guaranteed to exist? As long as x and y are numpy scalars, then you don't need this guard at all - scalar[bool_scalar] works just fine.

Plus then x and y are always 1d, so you can get rid of the isinstance(val, bool) and ravel() below too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it can be False -- initialized as such. I tried changing, but cannot get rid of the if flagged stanza, nor of the isinstance - I suggest we leave this to another PR...

Copy link
Member
@eric-wieser eric-wieser May 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just initialize it to flagged = np.False_, to at least eliminates the getattr). I'd then spell that if flagged.ndim != 0 for clarity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but I do feel that I'm then decreasing the clarity of the rest of the code - at least in my emacs, False is coloured very nicely, and false_ is not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that keeping a variable type consistent is more important than having the color look like in your editor.

I tried changing, but cannot get rid of the if flagged stanza

I'm gonna do a checkout to see what I'm missing here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about bool_(False), which meets both requirements?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see why the 0d case still has special handling - I'd forgotten that arr[bool_0d].shape == (?,) + arr.shape, and is not 1D.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or an alternate spelling - np.ndim(flagged) == 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I like the ndim - that is much clearer why we do it. Pushed up.

@eric-wieser
Copy link
Member

Does the inf-checking behave on complex numbers?

@mhvk mhvk force-pushed the assert-array-comparison-with-masked branch from 46324d1 to ceb6e5d Compare May 27, 2018 00:48
@mhvk
Copy link
Contributor Author
mhvk commented May 27, 2018

On the inf checking on complex numbers: I really don't want to go there right now... Am having too much fun with the gufunc signature stuff..

@eric-wieser
Copy link
Member

Complex number checking is out of scope here - that's fine. Maybe open an issue about it if you can produce a test-case

@mhvk
Copy link
Contributor Author
mhvk commented May 27, 2018

Cannot immediately produce one. Partially because of an oddity:

1.+1j*np.inf
# (nan+infj)

But even accounting for that, no, I cannot seem to break it!

@eric-wieser
Copy link
Member
eric-wieser commented May 27, 2018

Interesting oddity there. I suppose you have to use complex(1, np.inf). I guess complex number infinities are not very useful unless in polar coordinates.

@@ -706,6 +711,9 @@ def chk_same_position(x_id, y_id, hasval='nan'):
% (hasval), verbose=verbose, header=header,
names=('x', 'y'), precision=precision)
raise AssertionError(msg)
# If there is a scalar, then here we know the array has the same
# flag as it everywhere, so we should return the scalar flag.
return x_id if not x_id.shape else y_id
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think this is clearer as x_id.ndim == 0 rather than not x_id.shape

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done.

@eric-wieser
Copy link
Member
eric-wieser commented May 27, 2018

Just the nits about ndim and shape remain - you're right that it's harder to remove the special cases than I'd thought

@mhvk mhvk force-pushed the assert-array-comparison-with-masked branch from ceb6e5d to 66d22aa Compare May 27, 2018 12:08
The removal of nan and inf from arrays that are compared using
test routines like assert_array_equal treated the two arrays
separately, which for masked arrays meant that some elements would
not be removed when they should have been. This PR corrects this.
@mhvk mhvk force-pushed the assert-array-comparison-with-masked branch from 66d22aa to 5718b33 Compare May 27, 2018 12:09
@eric-wieser
Copy link
Member

Took the liberty of adding and explicit != 0 comparison, since that's normally how we spell ndim comparisons

x_id = func(x)
y_id = func(y)
if not any(x_id) and not any(y_id):
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these two lines needed? Aren't they covered by the below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'd think not, but numerous recursion failures if I leave them out...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, what happens is that it does recurse, but the if statement prevents it from doing it more than once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which is simple to solve - just use any - will push fix up shortly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't work either, but passing in equal_[nan|inf]=False does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it doesn't...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment saying that this is needed to prevent recursion or something?

Good to go either way

@mhvk mhvk force-pushed the assert-array-comparison-with-masked branch from 21dd5aa to 34bc491 Compare May 27, 2018 17:16
@mhvk
Copy link
Contributor Author
mhvk commented May 27, 2018

What a pain! But finally got it working with a more sensible implementation. Turns out I was partially misled by test that failed because they used subclasses that were not written correctly (one by me...).

def func_assert_same_pos(x, y, func=isnan, hasval='nan'):
"""Handling nan/inf: combine results of running func on x and y,
checking that they are True at the same locations."""
# Both the cast to bool_ and the != True comparison are done
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move the cast to bool to the scalar case below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed to cast both x_id and y_id, but that can indeed be done at the end. That does look a bit nicer, so done.

@@ -55,6 +55,10 @@ def __array_wrap__(self, obj, context=None):
obj.metadata = self.metadata
return obj

def __array_finalize__(self, obj):
self.metadata = getattr(obj, 'metadata', None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you describe in the commit message why these changes were needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. This was indeed a non-trivial one - I had to look again to remind me why it was needed. I wondered about the use of stating it in the commit message until I recalled that with git blame one can find why this was added.

@mhvk mhvk force-pushed the assert-array-comparison-with-masked branch from 34bc491 to 7eb1533 Compare May 29, 2018 13:26
# If there is a scalar, then here we know the array has the same
# flag as it everywhere, so we should return the scalar flag.
return (bool_(x_id) if x_id.ndim == 0 else
(bool_(y_id) if y_id.ndim == 0 else y_id))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably clearer as just a normal if / else chain at this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed; done.

if x.size == 0:
return
elif flagged:
# no sense doing comparison if everything is flagged.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I agree with this - what if x and y are quantities of different dimensions? The test should still compare the empty arrays so that the correct error is raised, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the same case as the if x.size == 0 stanza above, but for the case where x and y are both scalar. In principle, I agree the comparison should work for empty elements, but in practice, this is not the case (e.g., matrix changing its shape being one problem...)

Since this was here before, I propose to leave it as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, i agree this is out of scope for the patch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it sadly remains a chunk of rather fragile code...

@eric-wieser
Copy link
Member

Sorry for dropping the ball on this one. Would be great to get this into 1.15.

@mhvk mhvk added this to the 1.15.0 release milestone Jun 4, 2018
This brought to light two bugs in tests, which are fixed here, viz.,
that a sample ndarray subclass that tested propagation of an added
parameter was incomplete, in that in propagating the parameter in
__array_wrap__ it assumed it was there on self, but that assumption
could be broken when a view of self was taken (as is done by
x[~flagged] in the test routine), since there was no
__array_finalize__ defined.

The other subclass bug counted, incorrectly, on only needing to provide
one type of comparison, the __lt__ being explicitly tested.  But flags
are compared with __eq__ and those flags will have the same subclass.
@mhvk mhvk force-pushed the assert-array-comparison-with-masked branch from 7eb1533 to 3ad49aa Compare June 4, 2018 13:38
@charris
Copy link
Member
charris commented Jun 4, 2018

Out of curiosity, how many of the affected tests are already shadowed in numpy/ma/testutils.py?

@charris charris merged commit 5cbb982 into numpy:master Jun 7, 2018
@charris
Copy link
Member
charris commented Jun 7, 2018

Thanks Marten.

@mhvk mhvk deleted the assert-array-comparison-with-masked branch June 7, 2018 20:16
@taldcroft
Copy link

Wow thanks @mhvk and the rest, this indeed was not trivial!

# compared usefully, and for which .all() yields masked.
x_id = func(x)
y_id = func(y)
if (x_id == y_id).all() != True:
Copy link
Member
@charris charris Aug 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm,

In [10]: ma.masked != True
Out[10]: masked

In [11]: ma.masked == True
Out[11]: masked

In [12]: bool(ma.masked)
Out[12]: False

@mhvk The camparison to True seems to achieve nothing here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose is to ensure that the stanza is not executed if x_id or y_id is masked, yet is executed if they are different.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect behavior of np.testing.assert_array_almost_equal on linux, windows
4 participants
0