BUG,MAINT: Ensure masked elements can be tested against nan and inf. #11122

mhvk · 2018-05-19T16:25:03Z

The removal of nan and inf from arrays that are compared using test routines like assert_array_equal treated the two arrays separately, which for masked arrays meant that some elements would not be removed when they should have been. This PR corrects this.

fixes #11121

charris · 2018-05-21T21:25:48Z

numpy/testing/_private/utils.py

-    def chk_same_position(x_id, y_id, hasval='nan'):
-        """Handling nan/inf: check that x and y have the nan/inf at the same
-        locations."""
+    def func_same_pos(func, x, y, hasval='nan'):


Needs a full docstring. I think the original name is a bit better. Maybe change the signature to (x, y, func, hasval='nan'). Do we really need both hasval and func?

func does not always have a name (used with a lambda below to test for +/- inf). I'll change to func_chk_same_pos.

eric-wieser · 2018-05-25T05:01:29Z

numpy/testing/_private/utils.py

+            x = x[~flag]
+        if y.shape:
+            y = y[~flag]
+        return x, y


I don't understand this function - can't it just be written return x[~flag], y[~flag]?

No, x or y can be a scalar while the other is an array. This is why the flags were treated separately before.

The issue here is avoiding arr_0d[bool_nd], right? Or is it about avoiding non_generic_scalar[bool_0d]?

Would x, y = np.broadcast_arrays(x, y) be off the table?

Not crazy, but I'd rather stick with what I have here...

eric-wieser · 2018-05-25T05:04:43Z

numpy/testing/_private/utils.py

+                else:
+                    x, y = remove_flagged(x, y, xy_isnan)
+            elif xy_isinf is not False:
+                x, y = remove_flagged(x, y, xy_isinf)


Seems like micro-optimization here - can't we just write in place of these 7 lines:

to_remove = xy_isinf | xy_isnan x = x[~to_remove] y = y[~to_remove]

Doesn't work for the same reason.

The problem was that xy_isinf could just be False, in which case the flags would not broadcast correctly. But of course, this is easily checked in remove_flagged, so I did that now.

mhvk · 2018-05-25T12:52:46Z

@charris, @eric-wieser - thanks for the comments; I made some changes which I think make the use clearer.

eric-wieser · 2018-05-25T16:49:03Z

numpy/testing/_private/utils.py

+        if flag is False:
+            return x, y
+        if flag.all():
+            return array(()), array(())


nit: I'd use [] here for a size-0 array - I'm used to seeing () for a 0dim array, so this is a little jarring

eric-wieser · 2018-05-25T16:50:40Z

numpy/testing/_private/utils.py

+        return x_id if x_id.shape else y_id
+
+    def remove_flagged(x, y, flag):
+        """Remove elements in x and y, noting that either may be scalar."""


python or numpy scalar?

numpy, I'll add to the comment.

eric-wieser · 2018-05-25T16:51:09Z

numpy/testing/_private/utils.py

-    def chk_same_position(x_id, y_id, hasval='nan'):
-        """Handling nan/inf: check that x and y have the nan/inf at the same
-        locations."""
+    def func_chk_same_pos(x, y, func=isnan, hasval='nan'):


Can we spell this check instead of chk? Or even assert_same_predicate_pos?

OK, assert_same_pos

eric-wieser · 2018-05-25T19:32:33Z

numpy/testing/_private/utils.py

@@ -706,6 +711,20 @@ def chk_same_position(x_id, y_id, hasval='nan'):
                                % (hasval), verbose=verbose, header=header,
                                names=('x', 'y'), precision=precision)
            raise AssertionError(msg)
+        return x_id if x_id.shape else y_id


I think this is backward - you should return the scalar boolean mask here, not the array one (which would at this point be equal to the scalar at every position anyway). That allows you to remove all of your special cases in remove_flagged (and replace it with the one-liner I suggest below)

Yes, indeed, you are completely right - that gets rid of remove_flagged altogether. Now implemented.

eric-wieser · 2018-05-25T19:36:12Z

numpy/testing/_private/utils.py

-                x = x[~x_isinf]
-                y = y[~y_isinf]
+                xy_isinf = func_assert_same_pos(x, y,
+                                                func=lambda xy: xy == +inf,


np.isposinf?

Learn something new every day...

But doesn't work for complex numbers...

Does what you have here work for complex numbers?

eric-wieser · 2018-05-25T20:53:01Z

numpy/testing/_private/utils.py

+        # flag as it everywhere, so we should return the scalar flag.
+        return x_id if not x_id.shape else y_id
+
+    def remove_flagged(x, y, flag):


Don't forget to remove this function now

eric-wieser · 2018-05-25T20:53:51Z

numpy/testing/_private/utils.py

-                if any(x_isnat) or any(y_isnat):
-                    x = x[~x_isnat]
-                    y = y[~y_isnat]
+        if getattr(flagged, 'shape', ()):


Isn't this attribute guaranteed to exist? As long as x and y are numpy scalars, then you don't need this guard at all - scalar[bool_scalar] works just fine.

Plus then x and y are always 1d, so you can get rid of the isinstance(val, bool) and ravel() below too

No, it can be False -- initialized as such. I tried changing, but cannot get rid of the if flagged stanza, nor of the isinstance - I suggest we leave this to another PR...

Just initialize it to flagged = np.False_, to at least eliminates the getattr). I'd then spell that if flagged.ndim != 0 for clarity

I thought about that, but I do feel that I'm then decreasing the clarity of the rest of the code - at least in my emacs, False is coloured very nicely, and false_ is not.

I'd argue that keeping a variable type consistent is more important than having the color look like in your editor.

I tried changing, but cannot get rid of the if flagged stanza

I'm gonna do a checkout to see what I'm missing here

How about bool_(False), which meets both requirements?

Ok, I see why the 0d case still has special handling - I'd forgotten that arr[bool_0d].shape == (?,) + arr.shape, and is not 1D.

Or an alternate spelling - np.ndim(flagged) == 0

Yes, I like the ndim - that is much clearer why we do it. Pushed up.

eric-wieser · 2018-05-26T00:42:19Z

Does the inf-checking behave on complex numbers?

mhvk · 2018-05-27T00:54:28Z

On the inf checking on complex numbers: I really don't want to go there right now... Am having too much fun with the gufunc signature stuff..

eric-wieser · 2018-05-27T00:56:42Z

Complex number checking is out of scope here - that's fine. Maybe open an issue about it if you can produce a test-case

mhvk · 2018-05-27T01:26:42Z

Cannot immediately produce one. Partially because of an oddity:

1.+1j*np.inf
# (nan+infj)

But even accounting for that, no, I cannot seem to break it!

eric-wieser · 2018-05-27T05:27:18Z

Interesting oddity there. I suppose you have to use complex(1, np.inf). I guess complex number infinities are not very useful unless in polar coordinates.

eric-wieser · 2018-05-27T05:45:57Z

numpy/testing/_private/utils.py

@@ -706,6 +711,9 @@ def chk_same_position(x_id, y_id, hasval='nan'):
                                % (hasval), verbose=verbose, header=header,
                                names=('x', 'y'), precision=precision)
            raise AssertionError(msg)
+        # If there is a scalar, then here we know the array has the same
+        # flag as it everywhere, so we should return the scalar flag.
+        return x_id if not x_id.shape else y_id


nit: I think this is clearer as x_id.ndim == 0 rather than not x_id.shape

eric-wieser · 2018-05-27T06:09:28Z

eric-wieser commented

May 27, 2018

•

The removal of nan and inf from arrays that are compared using test routines like assert_array_equal treated the two arrays separately, which for masked arrays meant that some elements would not be removed when they should have been. This PR corrects this.

eric-wieser · 2018-05-27T14:48:11Z

Took the liberty of adding and explicit != 0 comparison, since that's normally how we spell ndim comparisons

eric-wieser · 2018-05-27T14:49:35Z

numpy/testing/_private/utils.py

+        x_id = func(x)
+        y_id = func(y)
+        if not any(x_id) and not any(y_id):
+            return False


Are these two lines needed? Aren't they covered by the below?

You'd think not, but numerous recursion failures if I leave them out...

So, what happens is that it does recurse, but the if statement prevents it from doing it more than once.

which is simple to solve - just use any - will push fix up shortly.

Doesn't work either, but passing in equal_[nan|inf]=False does.

No, it doesn't...

Maybe add a comment saying that this is needed to prevent recursion or something?

Good to go either way

mhvk · 2018-05-27T17:17:26Z

What a pain! But finally got it working with a more sensible implementation. Turns out I was partially misled by test that failed because they used subclasses that were not written correctly (one by me...).

eric-wieser · 2018-05-29T07:39:03Z

numpy/testing/_private/utils.py

+    def func_assert_same_pos(x, y, func=isnan, hasval='nan'):
+        """Handling nan/inf: combine results of running func on x and y,
+        checking that they are True at the same locations."""
+        # Both the cast to bool_ and the != True comparison are done


Can you move the cast to bool to the scalar case below?

I needed to cast both x_id and y_id, but that can indeed be done at the end. That does look a bit nicer, so done.

eric-wieser · 2018-05-29T07:40:22Z

numpy/lib/tests/test_ufunclike.py

@@ -55,6 +55,10 @@ def __array_wrap__(self, obj, context=None):
                obj.metadata = self.metadata
                return obj

+            def __array_finalize__(self, obj):
+                self.metadata = getattr(obj, 'metadata', None)


Can you describe in the commit message why these changes were needed?

Done. This was indeed a non-trivial one - I had to look again to remind me why it was needed. I wondered about the use of stating it in the commit message until I recalled that with git blame one can find why this was added.

eric-wieser · 2018-06-04T01:32:43Z

numpy/testing/_private/utils.py

+        # If there is a scalar, then here we know the array has the same
+        # flag as it everywhere, so we should return the scalar flag.
+        return (bool_(x_id) if x_id.ndim == 0 else
+                (bool_(y_id) if y_id.ndim == 0 else y_id))


Probably clearer as just a normal if / else chain at this point.

Agreed; done.

eric-wieser · 2018-06-04T08:11:52Z

numpy/testing/_private/utils.py

+            if x.size == 0:
+                return
+        elif flagged:
+            # no sense doing comparison if everything is flagged.


Not sure I agree with this - what if x and y are quantities of different dimensions? The test should still compare the empty arrays so that the correct error is raised, right?

It is the same case as the if x.size == 0 stanza above, but for the case where x and y are both scalar. In principle, I agree the comparison should work for empty elements, but in practice, this is not the case (e.g., matrix changing its shape being one problem...)

Since this was here before, I propose to leave it as is.

Sure, i agree this is out of scope for the patch

Overall, it sadly remains a chunk of rather fragile code...

eric-wieser · 2018-06-04T08:12:30Z

Sorry for dropping the ball on this one. Would be great to get this into 1.15.

This brought to light two bugs in tests, which are fixed here, viz., that a sample ndarray subclass that tested propagation of an added parameter was incomplete, in that in propagating the parameter in __array_wrap__ it assumed it was there on self, but that assumption could be broken when a view of self was taken (as is done by x[~flagged] in the test routine), since there was no __array_finalize__ defined. The other subclass bug counted, incorrectly, on only needing to provide one type of comparison, the __lt__ being explicitly tested. But flags are compared with __eq__ and those flags will have the same subclass.

charris · 2018-06-04T17:02:30Z

Out of curiosity, how many of the affected tests are already shadowed in numpy/ma/testutils.py?

charris · 2018-06-07T20:04:07Z

Thanks Marten.

taldcroft · 2018-06-07T20:25:16Z

Wow thanks @mhvk and the rest, this indeed was not trivial!

charris · 2018-08-17T18:58:08Z

numpy/testing/_private/utils.py

+        # compared usefully, and for which .all() yields masked.
+        x_id = func(x)
+        y_id = func(y)
+        if (x_id == y_id).all() != True:


Hmm,

In [10]: ma.masked != True Out[10]: masked In [11]: ma.masked == True Out[11]: masked In [12]: bool(ma.masked) Out[12]: False

@mhvk The camparison to True seems to achieve nothing here.

The purpose is to ensure that the stanza is not executed if x_id or y_id is masked, yet is executed if they are different.

mhvk added 00 - Bug component: numpy.testing 03 - Maintenance labels May 19, 2018

charris reviewed May 21, 2018

View reviewed changes

eric-wieser reviewed May 25, 2018

View reviewed changes

eric-wieser added the component: numpy.ma masked arrays label May 25, 2018

mhvk force-pushed the assert-array-comparison-with-masked branch from 4237e61 to d324b60 Compare May 25, 2018 12:51

eric-wieser reviewed May 25, 2018

View reviewed changes

mhvk force-pushed the assert-array-comparison-with-masked branch from d324b60 to c1544bf Compare May 25, 2018 18:57

eric-wieser reviewed May 25, 2018

View reviewed changes

mhvk force-pushed the assert-array-comparison-with-masked branch from c1544bf to 46324d1 Compare May 25, 2018 20:07

eric-wieser reviewed May 25, 2018

View reviewed changes

mhvk force-pushed the assert-array-comparison-with-masked branch from 46324d1 to ceb6e5d Compare May 27, 2018 00:48

eric-wieser reviewed May 27, 2018

View reviewed changes

mhvk force-pushed the assert-array-comparison-with-masked branch from ceb6e5d to 66d22aa Compare May 27, 2018 12:08

mhvk force-pushed the assert-array-comparison-with-masked branch from 66d22aa to 5718b33 Compare May 27, 2018 12:09

eric-wieser approved these changes May 27, 2018

View reviewed changes

eric-wieser reviewed May 27, 2018

View reviewed changes

mhvk force-pushed the assert-array-comparison-with-masked branch from 21dd5aa to 34bc491 Compare May 27, 2018 17:16

eric-wieser reviewed May 29, 2018

View reviewed changes

mhvk force-pushed the assert-array-comparison-with-masked branch from 34bc491 to 7eb1533 Compare May 29, 2018 13:26

eric-wieser reviewed Jun 4, 2018

View reviewed changes

mhvk added this to the 1.15.0 release milestone Jun 4, 2018

mhvk force-pushed the assert-array-comparison-with-masked branch from 7eb1533 to 3ad49aa Compare June 4, 2018 13:38

eric-wieser approved these changes Jun 4, 2018

View reviewed changes

charris merged commit 5cbb982 into numpy:master Jun 7, 2018

mhvk deleted the assert-array-comparison-with-masked branch June 7, 2018 20:16

charris mentioned this pull request Aug 15, 2018

np.testing.assert_almost_equal Issue in Python 3.5 for numpy==1.15.0 #11743

Closed

charris reviewed Aug 17, 2018

View reviewed changes

Uh oh!

BUG,MAINT: Ensure masked elements can be tested against nan and inf. #11122

BUG,MAINT: Ensure masked elements can be tested against nan and inf. #11122

Uh oh!

Conversation

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment