8000 Inconsistent behavior for ufuncs in Numpy v1.10.X · Issue #7122 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

Inconsistent behavior for ufuncs in Numpy v1.10.X #7122

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jsolbrig opened this issue Jan 26, 2016 · 26 comments
Closed

Inconsistent behavior for ufuncs in Numpy v1.10.X #7122

jsolbrig opened this issue Jan 26, 2016 · 26 comments

Comments

@jsolbrig
Copy link

I am copying this from a discussion on the mailing list.

I am attempting to subclass numpy.ma.MaskedArray. I am currently using Python v2.7.10. The problem discussed below does not occur in Numpy v1.9.2, but does occur in all versions of Numpy v1.10.x including v1.10.4.

Using mathematical operators on my subclass behaves differently than using the analogous ufunc. When using the ufunc directly (e.g. np.subtract(arr1,
arr2)), array_prepare, array_finalize, and array_wrap are all called as expected, however, when using the symbolic operator (e.g. arr1-arr2) only array_finalize is called. As a consequence, I lose any information stored in arr._optinfo when a mathematical operator is used.
Here is a code snippet that illustrates the issue.

#!/bin/env python

import numpy as np
from numpy.ma import MaskedArray, nomask

class InfoArray(MaskedArray):
    def __new__(cls, info=None, data=None, mask=nomask, dtype=None, 
                copy=False, subok=True, ndmin=0, fill_value=None,
                keep_mask=True, hard_mask=None, shrink=True, **kwargs):
        obj = super(InfoArray, cls).__new__(cls, data=data, mask=mask,
                      dtype=dtype, copy=copy, subok=subok, ndmin=ndmin, 
                      fill_value=fill_value, hard_mask=hard_mask,
                      shrink=shrink, **kwargs)
        obj._optinfo['info'] = info
        return obj

    def __array_prepare__(self, out, context=None):
        print '__array_prepare__'
        return super(InfoArray, self).__array_prepare__(out, context)

    def __array_wrap__(self, out, context=None):
        print '__array_wrap__'
        return super(InfoArray, self).__array_wrap__(out, context)

    def __array_finalize__(self, obj):
        print '__array_finalize__'
        return super(InfoArray, self).__array_finalize__(obj)

if __name__ == "__main__":
    arr1 = InfoArray('test', data=[1,2,3,4,5,6])
    arr2 = InfoArray(data=[0,1,2,3,4,5])

    diff1 = np.subtract(arr1, arr2)
    print diff1._optinfo

    diff2 = arr1-arr2
    print diff2._optinfo

If run, the output looks like this:

$ python test_ma_sub.py 
#Call to np.subtract(arr1, arr2) here
__array_finalize__
__array_finalize__
__array_prepare__
__array_finalize__
__array_wrap__
__array_finalize__
{'info': 'test'}
#Executing arr1-arr2 here
__array_finalize__
{}

Currently I have simply downgraded to 1.9.2 to solve the problem for myself, but have been having difficulty figuring out where the difference lies between 1.9.2 and 1.10.0.

@charris
Copy link
Member
charris commented Jan 26, 2016

Looks like 3c6b6ba is the problem commit.

@charris
Copy link
Member
charris commented Jan 26, 2016

@mhvk Thoughts?

@mhvk
Copy link
Contributor
mhvk commented Jan 26, 2016

@charris - I'll need to look in more detail, which will not be before next week (graduate admissions...). I suspect #5864 more, though, since that directly touched the binary operations (CC @njsmith)

@charris
Copy link
Member
charris commented Jan 26, 2016

@mhvk I did a bisection and 3c6b6ba is guilty ;) #5864 was my first thought also.

@charris charris added this to the 1.11.0 release milestone Jan 26, 2016
@jsolbrig
Copy link
Author

This is getting a little deep for me at this point, though I will try to help if I can.

While reverting to v1.9.3 solves my current issues, it seems to me that the behavior is incorrect even in the 1.9.3 release. Calling np.subtract(arr1, arr2) correctly enters both array_prepare and array_wrap. Using the subtraction operator (arr1-arr2) skips this functionality, which goes counter to the numpy documentation on subclassing ndarray which states that array_prepare is called "before every ufunc" and array_wrap is called "at the end of every ufunc". This is a problem for code that attempts to properly use array_prepare and/or array_wrap to maintain metadata.

One issue compounded the other leaving me very confused I think.

@charris
Copy link
Member
charris commented Jan 26, 2016

@jsolbrig Binary ops are handled a differently from ufuncs as they interact with Python, i.e., Python makes the call to the subtract method. The proper way to handle that is a current topic of discussion contention.

@jsolbrig
Copy link
Author

@charris Wouldn't that mean that the documentation is incorrect, then? The subtraction operator still calls a ufunc (albeit an overridden one), but array_prepare and array_wrap are not called as specified in the documentation. I would expect the behavior of the operator and the function to be exactly the same from the standpoint of subclassing from MaskedArray.

@seberg
Copy link
Member
seberg commented Jan 26, 2016

The thing is that ufunc wrapping works on the ndarray level, but masked arrays do not get passed into the ufuncs, instead they have their own set of functions. MaskedArrays are not part of the core numpy API and as such they neither have to, nor ever did call those methods. Of course you are right to say that it would be nice if they would behave like a real numpy array, but they are unfortunatly just a tag on module, with all the problems that come with it.

EDIT: Of course you can pass them into the ufuncs, but their operators don't, they have their own API in this regard (and it basically means that subclasses trying to do a lot of things need to do even more magic probably). But I think subclasses were somewhat anticipated, so I am not quite sure.

@jsolbrig
Copy link
Author

Okay, yeah, that makes sense. It does make me wonder, though, if array_prepare and array_wrap should be called inside the call method of _MaskedBinaryOperation and 8000 other similar classes in order to keep consistent behavior for subclassing and make it easier for others in the future.

@njsmith
Copy link
Member
njsmith commented Jan 26, 2016

Sure, maybe? It's basically impossible to fix ndarray subclassing to work well, and then MaskedArray has a giant pile of hacks to make it kinda-sorta-work as an ndarray subclass, and then you're trying to subclass that. So... if there's an additional hack that will make your particular use case work, then might as well try it; it's unlikely to make things worse :-). (And the accidental regression versus 1.9 is certainly not OK; sorry about that.) As a piece of general advice though I'd strongly recommend trying to find some way to solve your problem that doesn't involve subclassing any of these things :-)

@jsolbrig
Copy link
Author

Yep, I've basically come to the same conclusion that subclassing MaskedArray is probably the wrong solution. I'm going to have to think about how best to go about it, though. Using MaskedArray the way I have been has been so convenient other than this problem. I now have a workaround for this problem and everything seems to work the way I would hope, but it's a little worrisome that it was so difficult to achieve.

After looking at it a bit more, though, it does appear that if called in the appropriate location in _MaskedBinaryOperation, array_prepare and array_wrap would actually work the way they are supposed to. Do you think it would be worth me looking into how to implement the numpy-consistent behavior?

@njsmith
Copy link
Member
njsmith commented Jan 26, 2016

My personal policy is to leave np.ma maintenance to those who care enough to fix things. So really it's up to you to decide whether it's a worthwhile use of your time :-). We would probably merge a pull request that fixed the regression, at least.

@mhvk
Copy link
Contributor
mhvk commented Jan 26, 2016

@jsolbrig - just as a background: the PR that broke things for you tried to fix the case where the MaskedArray contains an ndarray subclass, so we should try to keep that intact (which I think should not be a problem; and anyway, it has test cases...). Anyway, am quite happy to look at possible solutions.

@mhvk
Copy link
Contributor
mhvk commented Jan 27, 2016

Just so it doesn't get lost, in the thread on numpy-dev, @seberg wrote:

Well, there was definitely a change in that code that will cause this, i.e. the code block:

        if isinstance(b, MaskedArray):
            if isinstance(a, MaskedArray):
                result._update_from(a)
            else:
                result._update_from(b)
        elif isinstance(a, MaskedArray):
            result._update_from(a)

was changed to something like:

        masked_result._update_from(result)

@charris
Copy link
Member
charris commented Feb 9, 2016

@mhvk Ping...

@mhvk
Copy link
Contributor
mhvk commented Feb 9, 2016

Overwhelmed by my day job right now, in particular graduate admission, but a break is at least in sight... Hopefully still this week.

@charris
Copy link
Member
charris commented Feb 20, 2016

@mhvk Ping! I'd like to get another beta out this weekend, what does your schedule look like?

@mhvk
Copy link
Contributor
mhvk commented Feb 20, 2016

The break took longer in coming than expected, but now do have some time. Hopefully this weekend or Monday.

@charris
Copy link
Member
charris commented Feb 20, 2016

The break took longer in coming than expected

They always do ;)

@mhvk
Copy link
Contributor
mhvk commented Feb 21, 2016

@jsolbrig - as noted by @seberg and @njsmith, in general one cannot expect a np.ufunc call to do exactly the same as a masked array method -- those are all overwritten to do (in principle) the same as np.ma.ufunc (I think this is a mistake, but that is with the benefit of hindsight and knowing that something like __numpy_ufunc__ could resolve this much more elegantly). Anyway, it is also clear that in my original PR to get masked arrays holding ndarray subclasses to work better, I went too far, not quite understanding what I was doing. Hopefully, this will be resolved in #7296... Note that I used your sample above as a test case (really useful to have it!); still, please check it resolves your larger issue as well.

@jsolbrig
Copy link
Author

@mhvk - Thanks for looking at this. In my time looking at this I had started wondering why __numpy_ufunc__ wasn't used here, but it seems that the answer might simply be legacy at this point. I may look at this now and then to see if I can figure out how to clean some of this up. Using __numpy_ufunc__ seems much more appropriate and cleaner that what is currently done and would make MaskedArray significantly easier to maintain in the long-run. Thankfully things like _MaskedBinaryOperation are well isolated, so testing a new solution shouldn't require much modification to the main body of MaskedArray. Do you see any reason that trying to re-implement the ufuncs through __numpy_ufunc__ wouldn't work?

@mhvk
Copy link
Contributor
mhvk commented Feb 22, 2016

@jsolbrig - yes, definitely, __numpy_ufunc__ would be the way to go! Though with all the discussion about how exactly to use it (#5844), it may be good to wait a little before spending serious time (that said, the essence is not going to change).

@jsolbrig
Copy link
Author

@mhvk - Thanks for pointing out that thread. I'll work through it and start participating in the conversation at minimum. Hopefully I can help create a decent solution, too. Finding a good solution for this issue would be very helpful for one of my current projects and I wouldn't mind spending some time on it extra time presents i 8000 tself.

@mhvk
Copy link
Contributor
mhvk commented Feb 22, 2016

That's a very long one -- though in the mailing list we had concluded that, really, to make progress on __numpy_ufunc__ it would be very helpful if someone summarized the issues, so in that respect an independent view might be quite useful. (But definitely don't feel obliged...)

@jsolbrig
Copy link
Author

I'll see what I can do. It would be nice to get involved in one of the
projects that I use so frequently and making a summary would be a good
introduction to the issues surrounding the issues. I'm not feeling any
obligation beyond wanting to see that what I see as a problem gets fixed
and realizing that I am probably one of a very small set of people who care
enough to have even bothered to dig this deep.

On Mon, Feb 22, 2016 at 2:30 PM Marten van Kerkwijk <
notifications@github.com> wrote:

That's a very long one -- though in the mailing list we had concluded
that, really, to make progress on numpy_ufunc it would be very
helpful if someone summarized the issues, so in that respect an independent
view might be quite useful. (But definitely don't feel obliged...)


Reply to this email directly or view it on GitHub
#7122 (comment).

@mhvk
Copy link
Contributor
mhvk commented Feb 22, 2016

That's how it started for me too... And __numpy_ufunc__ is a really nice idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants
0