Inconsistent behavior for ufuncs in Numpy v1.10.X

jsolbrig · 2016-01-26T17:50:08Z

I am copying this from a discussion on the mailing list.

I am attempting to subclass numpy.ma.MaskedArray. I am currently using Python v2.7.10. The problem discussed below does not occur in Numpy v1.9.2, but does occur in all versions of Numpy v1.10.x including v1.10.4.

Using mathematical operators on my subclass behaves differently than using the analogous ufunc. When using the ufunc directly (e.g. np.subtract(arr1,
arr2)), array_prepare, array_finalize, and array_wrap are all called as expected, however, when using the symbolic operator (e.g. arr1-arr2) only array_finalize is called. As a consequence, I lose any information stored in arr._optinfo when a mathematical operator is used.
Here is a code snippet that illustrates the issue.

#!/bin/env python

import numpy as np
from numpy.ma import MaskedArray, nomask

class InfoArray(MaskedArray):
    def __new__(cls, info=None, data=None, mask=nomask, dtype=None, 
                copy=False, subok=True, ndmin=0, fill_value=None,
                keep_mask=True, hard_mask=None, shrink=True, **kwargs):
        obj = super(InfoArray, cls).__new__(cls, data=data, mask=mask,
                      dtype=dtype, copy=copy, subok=subok, ndmin=ndmin, 
                      fill_value=fill_value, hard_mask=hard_mask,
                      shrink=shrink, **kwargs)
        obj._optinfo['info'] = info
        return obj

    def __array_prepare__(self, out, context=None):
        print '__array_prepare__'
        return super(InfoArray, self).__array_prepare__(out, context)

    def __array_wrap__(self, out, context=None):
        print '__array_wrap__'
        return super(InfoArray, self).__array_wrap__(out, context)

    def __array_finalize__(self, obj):
        print '__array_finalize__'
        return super(InfoArray, self).__array_finalize__(obj)

if __name__ == "__main__":
    arr1 = InfoArray('test', data=[1,2,3,4,5,6])
    arr2 = InfoArray(data=[0,1,2,3,4,5])

    diff1 = np.subtract(arr1, arr2)
    print diff1._optinfo

    diff2 = arr1-arr2
    print diff2._optinfo

If run, the output looks like this:

$ python test_ma_sub.py 
#Call to np.subtract(arr1, arr2) here
__array_finalize__
__array_finalize__
__array_prepare__
__array_finalize__
__array_wrap__
__array_finalize__
{'info': 'test'}
#Executing arr1-arr2 here
__array_finalize__
{}

Currently I have simply downgraded to 1.9.2 to solve the problem for myself, but have been having difficulty figuring out where the difference lies between 1.9.2 and 1.10.0.

The text was updated successfully, but these errors were encountered:

charris · 2016-01-26T18:08:59Z

Looks like 3c6b6ba is the problem commit.

charris · 2016-01-26T18:15:09Z

@mhvk Thoughts?

mhvk · 2016-01-26T18:35:07Z

@charris - I'll need to look in more detail, which will not be before next week (graduate admissions...). I suspect #5864 more, though, since that directly touched the binary operations (CC @njsmith)

charris · 2016-01-26T19:03:22Z

@mhvk I did a bisection and 3c6b6ba is guilty ;) #5864 was my first thought also.

jsolbrig · 2016-01-26T19:27:29Z

This is getting a little deep for me at this point, though I will try to help if I can.

While reverting to v1.9.3 solves my current issues, it seems to me that the behavior is incorrect even in the 1.9.3 release. Calling np.subtract(arr1, arr2) correctly enters both array_prepare and array_wrap. Using the subtraction operator (arr1-arr2) skips this functionality, which goes counter to the numpy documentation on subclassing ndarray which states that array_prepare is called "before every ufunc" and array_wrap is called "at the end of every ufunc". This is a problem for code that attempts to properly use array_prepare and/or array_wrap to maintain metadata.

One issue compounded the other leaving me very confused I think.

charris · 2016-01-26T19:33:08Z

@jsolbrig Binary ops are handled a differently from ufuncs as they interact with Python, i.e., Python makes the call to the subtract method. The proper way to handle that is a current topic of ~~discussion~~ contention.

jsolbrig · 2016-01-26T19:39:25Z

@charris Wouldn't that mean that the documentation is incorrect, then? The subtraction operator still calls a ufunc (albeit an overridden one), but array_prepare and array_wrap are not called as specified in the documentation. I would expect the behavior of the operator and the function to be exactly the same from the standpoint of subclassing from MaskedArray.

seberg · 2016-01-26T19:42:22Z

The thing is that ufunc wrapping works on the ndarray level, but masked arrays do not get passed into the ufuncs, instead they have their own set of functions. MaskedArrays are not part of the core numpy API and as such they neither have to, nor ever did call those methods. Of course you are right to say that it would be nice if they would behave like a real numpy array, but they are unfortunatly just a tag on module, with all the problems that come with it.

EDIT: Of course you can pass them into the ufuncs, but their operators don't, they have their own API in this regard (and it basically means that subclasses trying to do a lot of things need to do even more magic probably). But I think subclasses were somewhat anticipated, so I am not quite sure.

jsolbrig · 2016-01-26T19:46:34Z

Okay, yeah, that makes sense. It does make me wonder, though, if array_prepare and array_wrap should be called inside the call method of _MaskedBinaryOperation and 8000 other similar classes in order to keep consistent behavior for subclassing and make it easier for others in the future.

njsmith · 2016-01-26T21:02:27Z

Sure, maybe? It's basically impossible to fix ndarray subclassing to work well, and then MaskedArray has a giant pile of hacks to make it kinda-sorta-work as an ndarray subclass, and then you're trying to subclass that. So... if there's an additional hack that will make your particular use case work, then might as well try it; it's unlikely to make things worse :-). (And the accidental regression versus 1.9 is certainly not OK; sorry about that.) As a piece of general advice though I'd strongly recommend trying to find some way to solve your problem that doesn't involve subclassing any of these things :-)

jsolbrig · 2016-01-26T21:30:07Z

Yep, I've basically come to the same conclusion that subclassing MaskedArray is probably the wrong solution. I'm going to have to think about how best to go about it, though. Using MaskedArray the way I have been has been so convenient other than this problem. I now have a workaround for this problem and everything seems to work the way I would hope, but it's a little worrisome that it was so difficult to achieve.

After looking at it a bit more, though, it does appear that if called in the appropriate location in _MaskedBinaryOperation, array_prepare and array_wrap would actually work the way they are supposed to. Do you think it would be worth me looking into how to implement the numpy-consistent behavior?

njsmith · 2016-01-26T21:34:20Z

My personal policy is to leave np.ma maintenance to those who care enough to fix things. So really it's up to you to decide whether it's a worthwhile use of your time :-). We would probably merge a pull request that fixed the regression, at least.

mhvk · 2016-01-26T21:38:47Z

@jsolbrig - just as a background: the PR that broke things for you tried to fix the case where the MaskedArray contains an ndarray subclass, so we should try to keep that intact (which I think should not be a problem; and anyway, it has test cases...). Anyway, am quite happy to look at possible solutions.

mhvk · 2016-01-27T16:03:13Z

Just so it doesn't get lost, in the thread on numpy-dev, @seberg wrote:

Well, there was definitely a change in that code that will cause this, i.e. the code block:

        if isinstance(b, MaskedArray):
            if isinstance(a, MaskedArray):
                result._update_from(a)
            else:
                result._update_from(b)
        elif isinstance(a, MaskedArray):
            result._update_from(a)

was changed to something like:

        masked_result._update_from(result)

charris · 2016-02-09T20:37:47Z

@mhvk Ping...

mhvk · 2016-02-09T21:07:36Z

Overwhelmed by my day job right now, in particular graduate admission, but a break is at least in sight... Hopefully still this week.

charris · 2016-02-20T17:20:13Z

@mhvk Ping! I'd like to get another beta out this weekend, what does your schedule look like?

mhvk · 2016-02-20T20:29:06Z

The break took longer in coming than expected, but now do have some time. Hopefully this weekend or Monday.

charris · 2016-02-20T21:15:52Z

The break took longer in coming than expected

They always do ;)

mhvk · 2016-02-21T01:19:55Z

@jsolbrig - as noted by @seberg and @njsmith, in general one cannot expect a np.ufunc call to do exactly the same as a masked array method -- those are all overwritten to do (in principle) the same as np.ma.ufunc (I think this is a mistake, but that is with the benefit of hindsight and knowing that something like __numpy_ufunc__ could resolve this much more elegantly). Anyway, it is also clear that in my original PR to get masked arrays holding ndarray subclasses to work better, I went too far, not quite understanding what I was doing. Hopefully, this will be resolved in #7296... Note that I used your sample above as a test case (really useful to have it!); still, please check it resolves your larger issue as well.

jsolbrig · 2016-02-22T18:36:22Z

@mhvk - Thanks for looking at this. In my time looking at this I had started wondering why __numpy_ufunc__ wasn't used here, but it seems that the answer might simply be legacy at this point. I may look at this now and then to see if I can figure out how to clean some of this up. Using __numpy_ufunc__ seems much more appropriate and cleaner that what is currently done and would make MaskedArray significantly easier to maintain in the long-run. Thankfully things like _MaskedBinaryOperation are well isolated, so testing a new solution shouldn't require much modification to the main body of MaskedArray. Do you see any reason that trying to re-implement the ufuncs through __numpy_ufunc__ wouldn't work?

mhvk · 2016-02-22T18:42:54Z

@jsolbrig - yes, definitely, __numpy_ufunc__ would be the way to go! Though with all the discussion about how exactly to use it (#5844), it may be good to wait a little before spending serious time (that said, the essence is not going to change).

jsolbrig · 2016-02-22T21:27:18Z

@mhvk - Thanks for pointing out that thread. I'll work through it and start participating in the conversation at minimum. Hopefully I can help create a decent solution, too. Finding a good solution for this issue would be very helpful for one of my current projects and I wouldn't mind spending some time on it extra time presents i 8000 tself.

mhvk · 2016-02-22T21:30:40Z

That's a very long one -- though in the mailing list we had concluded that, really, to make progress on __numpy_ufunc__ it would be very helpful if someone summarized the issues, so in that respect an independent view might be quite useful. (But definitely don't feel obliged...)

jsolbrig · 2016-02-22T21:54:52Z

I'll see what I can do. It would be nice to get involved in one of the
projects that I use so frequently and making a summary would be a good
introduction to the issues surrounding the issues. I'm not feeling any
obligation beyond wanting to see that what I see as a problem gets fixed
and realizing that I am probably one of a very small set of people who care
enough to have even bothered to dig this deep.

On Mon, Feb 22, 2016 at 2:30 PM Marten van Kerkwijk <
notifications@github.com> wrote:

That's a very long one -- though in the mailing list we had concluded
that, really, to make progress on numpy_ufunc it would be very
helpful if someone summarized the issues, so in that respect an independent
view might be quite useful. (But definitely don't feel obliged...)

—
Reply to this email directly or view it on GitHub
#7122 (comment).

mhvk · 2016-02-22T22:00:24Z

That's how it started for me too... And __numpy_ufunc__ is a really nice idea.

charris added this to the 1.11.0 release milestone Jan 26, 2016

charris added 06 - Regression component: numpy.ma masked arrays labels Jan 26, 2016

charris mentioned this issue Feb 11, 2016

Operation on masked_array changes fill_value #3762

Open

mhvk mentioned this issue Feb 21, 2016

Revert part of #3907 which incorrectly propogated MaskedArray info. #7296

Merged

charris closed this as completed in #7296 Feb 21, 2016

charris mentioned this issue Feb 21, 2016

Backport 7296, Revert part of #3907 which incorrectly propagated MaskedArray info. #7298

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Inconsistent behavior for ufuncs in Numpy v1.10.X #7122

Inconsistent behavior for ufuncs in Numpy v1.10.X #7122

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Inconsistent behavior for ufuncs in Numpy v1.10.X #7122

Inconsistent behavior for ufuncs in Numpy v1.10.X #7122

Comments

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!