8000 BUG np.where half-initializes subclass of output · Issue #5095 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG np.where half-initializes subclass of output #5095

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mhvk opened this issue Sep 21, 2014 · 23 comments · Fixed by #5105
Closed

BUG np.where half-initializes subclass of output #5095

mhvk opened this issue Sep 21, 2014 · 23 comments · Fixed by #5105

Comments

@mhvk
Copy link
Contributor
mhvk commented Sep 21, 2014

In numpy 1.9, the behaviour of np.where(condition, first, second) changed, in that the output no longer always is an ndarray instance but an instance of the subclass of first. However, __array_finalize__(self, obj) is called with obj=None, and hence the subclass can not properly initialize itself (according to the documentation [1], None is reserved for the case where a new instance is created via __new__). This change in behaviour has caused some problems (e.g., astropy/astropy#2958, also discussion in astropy/astropy#1274).

I should add that I do not necessarily believe the old behaviour was preferable, but it would be good if the current behaviour is documented.

Going forward, an option which I think would be ideal (though not necessarily practical...), is if np.where would do the equivalent of

out = first.copy()
out[condition] = second[condition]

In this way, subclasses can check if the second argument is consistent with the first (e.g., for our Quantity class, check the units and possibly convert). This might be done most easily if, somehow, np.where could be turned into a ufunc and passed through the __numpy_ufunc__ mechanism.

[1] http://docs.scipy.org/doc/numpy/user/basics.subclassing.html

@juliantaylor
Copy link
Contributor

where now uses the iterator to allocate the output, but when it allocates the result it propagates the subtype but calls the constructor without an object, like a direct __new__ call

hacking it in using the input object with the hightest priority should fix this and the testsuite succeeds.
But I am a skeptical if this is safe for all cases, our testsuite is leaky in regards to these.
Do we need a new iterator flag to call PyArray_NewFromDescr with non-null obj argument from the iterator output allocation?

@juliantaylor
Copy link
Contributor

branch with the changes:
https://github.com/juliantaylor/numpy/tree/iter-subobj

@mhvk can you try if that branch fixes your issues?

@juliantaylor
Copy link
Contributor

on second though where is currently completely subclass unsafe as there is no way to preserve extra information like masks.
should we revert back to 1.8 dropping of subclasses? or change it to use indexing so __getitem__ __setitem__ are used as suggested?

@mhvk
Copy link
Contributor Author
mhvk commented Sep 22, 2014

@juliantaylor - I agree that it would remain unsafe for subclasses. I wondered whether my suggestion was much slower, but it turns out it is not too bad, at least for large arrays:

import numpy as np
def my_where(condition, first, second):
    condition, first, second = np.broadcast_arrays(condition, first, second)
    condition = ~condition
    result = first.copy()
    result[condition] = second[condition]
    return result

a = np.random.uniform(size=100000)
condition = a > 0.5
%timeit np.where(condition, a, 0.)
# 100 loops, best of 3: 2.07 ms per loop
%timeit my_where(condition, a, 0.)
# 100 loops, best of 3: 2.36 ms per loop

For smaller arrays, it is worse. E.g., for size=100, one becomes dominated by all the initialisation: 13.5 and 96.7 micro-sec, respectively.

@njsmith
Copy link
Member
njsmith commented Sep 22, 2014

Note that implementing where() via indexing requires a special case for 0d
inputs.

where(True, 5, 10)

works, but

result[condition] = ...

will fail because of how 0d indexing works.

@juliantaylor
Copy link
Contributor

where can be double as fast as indexing if the condition is more friendly to branch prediction (dense or sparse) but that should not be an issue, we can add a wrapper python function that does indexing on subclasses and calls to the existing C function for ndarrays
for 1.9.1 we should probably just disable the broken subclass keeping as it brakes code.

@seberg
Copy link
Member
seberg commented Sep 23, 2014

result[np.asarray(condition)] is a workaround until our deprecation is all done and we can change it in master.

juliantaylor added a commit to juliantaylor/numpy that referenced this issue Sep 23, 2014
the C implementation cannot preserve subclass attributes, this needs a
python wrapper using indexing for subclasses.
closes numpygh-5095
@mhvk
Copy link
Contributor Author
8000
mhvk commented Sep 23, 2014

@juliantaylor - I saw in #5105 that you hope to produce a version that handles subclasses correctly. Let me know if I can help in any way. Also, shall I raise a new issue reminding us of this?

@juliantaylor
Copy link
Contributor

too bad broadcast_arrays doesn't preserve subtypes, else this would probably already be it:

        a = np.asanyarray(a)
        b = np.asanyarray(b)
        cond, a, b = np.broadcast_arrays(cond, a, b)
        res = b.copy()
        res[cond] = a[cond]
        return res

I wonder if thats fixable

@mhvk
Copy link
Contributor Author
mhvk commented Sep 23, 2014

See #4622! (about to be merged by @seberg)

@mhvk
Copy link
Contributor Author
mhvk commented Sep 23, 2014

p.s. This is indeed why I thought of my example. Note that broadcast_arrays does upgrade to an array, so I think one could even omit the asanyarray.

@juliantaylor
Copy link
Contributor

oh very nice, I'll have a look at supporting subclasses when its merged
should be something along the lines of this:
https://github.com/juliantaylor/numpy/tree/where-subclass
but it still needs some work to succeed the tests with special float values

@WeatherGod
Copy link
Contributor

Sorry for resurrecting an old thread, but I finally tracked down the source of an obscure bug in one of our production systems, and I think it relates to this. The line (np.where(x > 224, 224, x)/25.4).filled(0) where x was a masked array, used to work fine because np.where() returned a masked array that had a method filled(). I know I can use np.ma.where(), and I will going forward, but changing the behavior of np.where() did break legitimate code.

@charris charris added this to the 1.10.0 release milestone Jul 29, 2015
@charris
Copy link
Member
charris commented Jul 29, 2015

I've put a preliminary 1.10 milestone on this so that it gets looked at.

@mhvk
Copy link
Contributor Author
mhvk commented Jul 29, 2015

@charris - be sure to reopen the issue, otherwise it probably won't get looked at at all! (@juliantaylor -- a proper solution would still be very welcome!)

@njsmith
Copy link
Member
njsmith commented Jul 29, 2015

Fwiw would be comfortable with not considering this a 1.10 blocker and
instead letting it sit until someone who has a specific interest in
subclasses shows up with a proposal+fix. Right now it sounds like no one is
sure what a proper fix would even be. And while the behavior in master
might not be optimal, it's not clear that there's even any back compat gain
to be had, given that master matches 1.9 and so everyone who wants to
support 1.9 already has to adapt regardless. Leaving it in 1.10 won't
create any new additional burden.
On Jul 29, 2015 10:36 AM, "Charles Harris" notifications@github.com wrote:

I've put a preliminary 1.10 milestone on this so that it gets looked at.


Reply to this email directly or view it on GitHub
#5095 (comment).

@njsmith njsmith reopened this Jul 29, 2015
@shoyer
Copy link
Member
shoyer commented Jul 29, 2015

np.where works like a ufunc, though I know it isn't implemented that way internally. So I think the right solution to this is probably to simply use __numpy_ufunc__ and let subclasses/duck-types override the functionality that way.

@mhvk's proposed solution is interesting, but it won't work for all duck-array types:

out = first.copy()
out[condition] = second[condition]

The problem is that some of the duck-array types for which where is most useful are actually immutable (e.g., dask.array). These types need their own solution, which they can implement with __numpy_ufunc__.

@WeatherGod
Copy link
Contributor

I should also point out that my code example used to work because the condition was a masked array. Depending on the type of first would have been misleading in my case as my example has just an integer there.

@WeatherGod
Copy link
Contributor

...and looking over my company's internal bug tracker, this may be the source of another bug that was customer-facing, causing our shipped netCDF files to have very negative values in places that should have been masked, and our contours to look a bit silly. We never figured out the exact cause of the issue, but I am now going to see if an np.where() was used.

@charris
Copy link
Member
charris commented Jul 29, 2015

Perhaps we should raise an error until the subclass issue is fixed. @juliantaylor Have you had any time to pursue your solution?

@WeatherGod
Copy link
Contributor

I think I would be more comfortable with a warning. The sneaky nature of
this problem indicates that the results are more like a degraded result
rather than "wrong", especially in the face of duck-typing elsewhere. A
warning would have pulled me to this spot in the code faster, and I would
have reported it sooner as such. It is quite possible that other people
have encountered similar problems and merely shrugged and worked around it,
or didn't figure out the source of it.

On Wed, Jul 29, 2015 at 4:00 PM, Charles Harris notifications@github.com
wrote:

Perhaps we should raise an error until the subclass issue is fixed.
@juliantaylor https://github.com/juliantaylor Have you had any time to
pursue your solution?


Reply to this email directly or view it on GitHub
#5095 (comment).

@charris charris modified the milestones: 1.11.0 release, 1.10.0 release Oct 13, 2015
@charris
Copy link
Member
charris commented Jan 21, 2016

So this is still up in the air. Pushing off to 1.12.

@mhvk
Copy link
Contributor Author
mhvk commented Jun 17, 2019

Closing since one can now override np.where with __array_function__. Hopefully, eventually where becomes a proper ufunc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants
0