8000 BUG difference in behaviour for subclass output in ufuncs by mhvk · Pull Request #4753 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG difference in behaviour for subclass output in ufuncs #4753

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 5, 2015

Conversation

mhvk
Copy link
Contributor
@mhvk mhvk commented Jun 18, 2015

If one wants to have the output of a ufunc passed to a ndarray subclass, but the input is a regular array, the behaviour differs depending on whether one passes the output as a positional or as a keyword argument: if passed as positional argument, __numpy_ufunc__ is called, while if passed as a keyword argument, __array_prepare__ and __array_wrap__ are called. I would have expected for __numpy_ufunc__ to be called in both cases.

p.s. If the input is a subclass, it always works as expected.

Example script:

import numpy as np


class MyArray(np.ndarray):
    def __array_prepare__(self, obj, context=None):
        print('prepare:', self, obj, context)
        return super(MyArray, self).__array_prepare__(obj, context)

    def __array_wrap__(self, obj, context):
        print('wrap:', self, obj, context)
        return super(MyArray, self).__array_wrap__(obj, context)

    def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs):
        print('ufunc:', self, ufunc, method, i, inputs, kwargs)
        values = [np.asarray(arg) for arg in inputs]
        if 'out' in kwargs:
            kwargs['out'] = kwargs['out'].view(np.ndarray)
        result = getattr(ufunc, method)(*values, **kwargs) / 2.
        return result.view(MyArray)

a = np.array([-1., -1.]).view(MyArray)
print(repr(np.sin(np.arange(2.), a)))
a = np.array([-1., -1.]).view(MyArray)
print(repr(np.sin(np.arange(2.), out=a)))

@pv
Copy link
Member
pv commented Jun 2, 2014

Is this something that was not addressed in gh-4626?

EDIT: apparently yes. this is an oversight

@pv
Copy link
Member
pv commented Jun 2, 2014

@cowlicks: overlooked case. Also, what should i be when self is one of the output arrays (probably past the inputs, which is likely ok...)?

@juliantaylor juliantaylor added this to the 1.9 blockers milestone Jun 2, 2014
@juliantaylor juliantaylor modified the milestones: 1.9 blockers, 1.10 blockers Aug 26, 2014
@mhvk
Copy link
Contributor Author
mhvk commented Jan 19, 2015

Checking 1.10 blockers brought up by my __numpy_ufunc__ trials: In numpy 1.10.0.dev-256d256, this is still an issue.

@cowlicks
Copy link
Contributor

I'll take a look this weekend.

@cowlicks
Copy link
Contributor

Okay so what currently happens is PyUFunc_CheckOverride is called at every ufunc call. It sets result as the result of the override if one exists, if no override is found result is NULL. Inside PyUFunc_CheckOverride we check positional arguments for __numpy_ufunc__ attribute. If it is not there, set result = NULL and return 0. Clearly we need to check out here, but it will add a little overhead to every ufunc call since we are now checking a dictionary too. I think we need to insert this logic here

@njsmith
Copy link
Member
njsmith commented Jan 26, 2015

Surely there is somewhere in the ufunc logic where the arguments are
normalized. (Or if not there ought to be ;-).) Can't we put this checking
after that happens?

...okay reading the code you linked to I see that the numpy_ufunc code
is apparently already has its own separate reimplementation of ufunc
argument handling. I can see how we might not want to run two separate
copies of the same argument normalization on every ufunc call :-(

On Mon, Jan 26, 2015 at 4:00 AM, Blake Griffith notifications@github.com
wrote:

Okay so what currently happens is PyUFunc_CheckOverride is called at
every ufunc call. It sets result as the result of the override if one
exists, if no override is found result is NULL. Inside
PyUFunc_CheckOverride we check positional arguments for numpy_ufunc
attribute. If it is not there, set result = NULL and return 0. Clearly we
need to check out here, but it will add a little overhead to every ufunc
call since we are now checking a dictionary too. I think we need to insert
this logic here
https://github.com/numpy/numpy/blob/master/numpy/core/src/private/ufunc_override.h#L224


Reply to this email directly or view it on GitHub
#4753 (comment).

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

@cowlicks
Copy link
Contributor

They are only normalized (to what __numpy_ufunc__ needs) If there is a __numpy_ufunc__ attribute present. I was reluctant to normalize first because that would add even more overhead to every ufunc call. I'm testing my prototype (based on my last comment) now. I can try both approaches and see if the timing vs code complexity tradeoff is significant.

@njsmith
Copy link
Member
njsmith commented Jan 26, 2015

I know it's more work and may spill over into cleaning up other people's
messes, but I think it's worth looking to see if you can move the checks
for the numpy_ufunc attribute to somewhere after the regular ufunc
argument processing, which already has to deal with positional versus
kwargs.
On 26 Jan 2015 04:22, "Blake Griffith" notifications@github.com wrote:

They are only normalized (to what numpy_ufunc needs) If there is a
numpy_ufunc attribute present. I was reluctant to normalize first
because that would add even more overhead to every ufunc call. I'm testing
my prototype (based on my last comment) now. I can try both approaches and
see if the timing vs code complexity tradeoff is significant.


Reply to this email directly or view it on GitHub
#4753 (comment).

@cowlicks
Copy link
Contributor

Yeah, that is definitely the right way to do this. I'm not sure why I did not do it that way originally. I'll try to find where that happens and check it out... After I get this working.

@cowlicks
Copy link
Contributor

@njsmith I found some arg parsing stuff here, but it is just for multiarray stuff? I don't think it is used for ufuncs. I need to look more into this.

My prototype is here (based on my first comment). But it is breaking np.dot and idk why.

@mhvk
Copy link
Contributor Author
mhvk commented Jan 26, 2015

Had a look at the prototype and while it looks good generally, I worry it would still break for ufuncs with two outputs (where out is a tuple; see your work in #4171)

@cowlicks
Copy link
Contributor
cowlicks commented Feb 1, 2015

Little update: I rebased on current master and the segfault went away... So it wasn't my fault I guess? That had me stumped for a while. Working on a fix taking @mhvk's comments into consideration.

@cowlicks
Copy link
Contributor
cowlicks commented Feb 1, 2015

This seems to fix it. But I have not refactored to use the general arg processing as suggested by @njsmith. I have not assessed the difficulty of that yet. If it seems hard I'll make an issue with a plan of action.

I think there is still cleaning up to do. And I have not checked that I did not create any memory leaks. I'm not confident in my C coding. So I'll double check this then make a PR.

@mhvk
Copy link
Contributor Author
mhvk commented Feb 2, 2015

looks good to me, but am definitely even less of an expert.

@cowlicks
Copy link
Contributor
cowlicks commented Feb 2, 2015

There is a bit of a problem. The __numpy_ufunc__ protocol specifies it has the signature:

def __numpy_ufunc__(self, ufunc, method, i, inputs, **kwargs)

Where "i is the index of self in inputs" but output arguments get put into kwargs. So what should the i index be in this case? (See the NEP here).

Basically __numpy_ufunc__ does not account for the output arguments being able to override the ufunc. I'm still thinking about the best way to approach this. Suggestions? @pv

@mhvk
Copy link
Contributor Author
mhvk commented Feb 2, 2015

FWIW: I think the approach you took, of just continuing to add to i, is the most logical one. It is important for an output to be able to have __numpy_ufunc__ and thus to tell that it cannot handle the result.

@cowlicks
Copy link
Contributor
cowlicks commented Feb 2, 2015

@mhvk But it is breaking this test.

Basically the test uses the "defective" behavior.

@mhvk
Copy link
Contributor Author
mhvk commented Feb 2, 2015

Hmm, fortunately, __numpy_ufunc__ has not been part of a release, so still in time to get it right! It seems obvious the implementer will have to be able to deal with either i is None or i >= len(inputs). The latter would carry more information, so seems preferred.

@ewmoore
Copy link
Contributor
ewmoore commented Feb 2, 2015

If it hasn't been released perhaps the signature should be changed. Maybe add an o parameter or make i be input_index and add an output_index and require that one or the other always be None. This seems much clearer than indexing into a list of output arrays if i >= len(inputs).

Is it documented anywhere that passing an output array can effect the ufunc dispatch? Seems kind of surprising that it work this way.

@mhvk
Copy link
Contributor Author
mhvk commented Feb 2, 2015

@ewmoore - but the output should not just be treated as an arbitrary ndarray either (indeed, that is what the current __array_prepare__ and __array_wrap__ do well).

@pv
Copy link
Member
pv commented Feb 2, 2015

Only the option i >= len(inputs) is compatible with code in Scipy.
That it's out there is somewhat non-ideal situation, but what is done is
done and needs to be considered now. If a different interface is
desired, then __numpy_ufunc__ needs to be renamed to
__numpy_override__ or something similar.

@mhvk
Copy link
Contributor Author
mhvk commented Feb 2, 2015

I'd also think i>=len(inputs) is most logical conceptually in the sense that it now effectively indexes *args in the original call for the case that no out keyword was present.

@cowlicks
Copy link
Contributor
cowlicks commented Feb 5, 2015

Okay. I'll correct the test that is failing to check for i >= len(inputs), do some cleanup, and make a PR.

@mhvk
Copy link
Contributor Author
mhvk commented Jun 11, 2015

@cowlicks - did you ever get around to fixing this? You had #5566, but you closed that intending to make a different solution...

@cowlicks
Copy link
Contributor

No I never got around it. There is a lot of discussion around __numpy_ufunc__ and I have not had the bandwidth to follow it. It seemed like a rewrite was being proposed in #5844? But I don't really have time to read that whole thread.

6D40

@mhvk
Copy link
Contributor Author
mhvk commented Jun 18, 2015

@cowlicks - I think this bug has little to do with the (long) discussion about what to do with ndarray binops, but rather affects __numpy_ufunc__ more generally. Since we're close to release, I went ahead and reworked your earlier pull request, in a way that I think will address all previous comments. Please have a look. (My own sense: OK, but needs a few tests that i is indeed what it should be.) cc @pv, @njsmith, @jaimefrio, @charris, since they all looked at your original PR (#5566).

…tely

for the case where self is among outputs but not among inputs. Ensure it
works both out passed on as an argument and with out in a keyword argument.
@mhvk
Copy link
Contributor Author
mhvk commented Jun 19, 2015

OK, I added further tests that the index passed on to __numpy_ufunc__ is in fact correct in all cases. I also rebased and merged all commits into one. I note that there was some discussion that it would have been more logical to do this test after normalising the arguments, but that doesn't really matter for the functionality, and at this late stage it seems best to just fix the bug rather than try to overhaul the order.

@njsmith
Copy link
Member
njsmith commented Jun 19, 2015

OK, going to say something unpopular. (Certainly I'm annoyed at myself for saying it, but nonetheless...) I've been debating this with myself for a while, and I've come to the conclusion that we should bite the bullet and drop the index argument from __numpy_ufunc__'s signature. This will require some care to avoid breaking already released versions of scipy.sparse (lesson learned: we should never have released scipy.sparse with __numpy_ufunc__ before it came out in numpy), but that's better than leaving it forever.

Rationale:

  • the index argument is never useful. The only time it's useful at all is for classes that want to cast self to ndarray and recurse, and even in those cases they can just as well do a loop and handle all objects of their type at once. It'll be faster too. None of the example classes we've written during the binop discussion have used it at all.
  • the impossibility of having a useful value for output arguments makes there index even more useless. (Sure we can set it to
    n > nin, but what's the use of this? Writing code to check for this and do something sensible is more complicated than just finding the array by hand.)
  • there are almost certainly other arguments we should dispatch on, like where=. What do we set the index to in this case?

Sorry. I'm the one who suggested adding it in the first place, but it was a bad idea :-/

Also while we're at it, we should make the out= argument unconditionally a tuple, rather than sometimes a single ndarray and sometimes a tuple of them. This would simplify argument processing (see: all the times we wrote if not isinstance(out, tuple): out = (out,) on the example binop implementations).

@mhvk
Copy link
Contributor Author
mhvk commented Jun 19, 2015

@njsmith - I made your point a new issue #5986, as I think it deserves better discussion than it might get here. Also, this PR is really not so much about what to do with the index as about ensuring that out arguments are able to override __numpy_ufunc__, whether given as an argument or as a keyword argument.

@charris charris modified the milestones: 1.10.0 release, 1.10 blockers Jun 21, 2015
@charris
Copy link
Member
charris commented Jun 27, 2015

@mhvk, @njsmith So where does this sit now?

@njsmith
Copy link
Member
njsmith commented Jun 28, 2015

I think the status is: this is an uncontroversial bug, and @mhvk has a plausible patch here, which hasn't been converted into a PR yet.

This isn't necessarily a 1.10 blocker (it would be annoying to release with a bug in __numpy_ufunc__, but it would just be a bug; we could fix it in 1.10.1 if we had to).

As a practical matter it would probably be best to get #6001 and #5986 sorted out first before addressing this, to avoid collisions.

@mhvk
Copy link
Contributor Author
mhvk commented Jun 28, 2015

@njsmith - note that my patch did get converted to a PR -- this issue itself has been converted to a PR [1]. I actually don't think it impacts any other the other PR too much. If we remove index, it would effectively be removed again, but that's fine.

[1] See http://docs.astropy.org/en/latest/development/workflow/additional_git_topics.html#converting-a-github-issue-to-a-pull-request

@charris charris modified the milestones: 1.10.0 release, 1.11.0 release Aug 3, 2015
@charris
Copy link
Member
charris commented Aug 3, 2015

Moved this to the 1.11.0 release with the rest of the __numpy_ufunc__ stuff.

@charris
Copy link
Member
charris commented Oct 3, 2015

@mhvk What need to be done to bring this to completion. If we need to change the interface, now would be a good time to get it decided. @pv IIRC, we are also going to need a rename for scipy sparse.

@mhvk
Copy link
Contributor Author
mhvk commented Oct 5, 2015

@charris - I think this PR is all OK for the present __numpy_ufunc__. If we do decide to change that, including removing the index from the call signature, then most of the actual code will likely get removed, but some of the tests will guard against this problem recurring (specifically, https://github.com/numpy/numpy/pull/4753/files#diff-bc1e8d40b8b7758facc5e143be016f99R2515).

So, my feeling is to merge this, in order to let the test case rather than this open PR guard against the problem recurring.

charris added a commit that referenced this pull request Oct 5, 2015
BUG difference in behaviour for subclass output in ufuncs
@charris charris merged commit 68e61c2 into numpy:master Oct 5, 2015
@charris
Copy link
Member
charris commented Oct 5, 2015

@mhvk OK, merging. Thanks.

@mhvk mhvk deleted the bug-4753 branch October 5, 2015 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants
0