BUG np.where half-initializes subclass of output #5095

mhvk · 2014-09-21T20:00:29Z

In numpy 1.9, the behaviour of np.where(condition, first, second) changed, in that the output no longer always is an ndarray instance but an instance of the subclass of first. However, __array_finalize__(self, obj) is called with obj=None, and hence the subclass can not properly initialize itself (according to the documentation [1], None is reserved for the case where a new instance is created via __new__). This change in behaviour has caused some problems (e.g., astropy/astropy#2958, also discussion in astropy/astropy#1274).

I should add that I do not necessarily believe the old behaviour was preferable, but it would be good if the current behaviour is documented.

Going forward, an option which I think would be ideal (though not necessarily practical...), is if np.where would do the equivalent of

out = first.copy()
out[condition] = second[condition]

In this way, subclasses can check if the second argument is consistent with the first (e.g., for our Quantity class, check the units and possibly convert). This might be done most easily if, somehow, np.where could be turned into a ufunc and passed through the __numpy_ufunc__ mechanism.

[1] http://docs.scipy.org/doc/numpy/user/basics.subclassing.html

The text was updated successfully, but these errors were encountered:

juliantaylor · 2014-09-22T21:24:51Z

where now uses the iterator to allocate the output, but when it allocates the result it propagates the subtype but calls the constructor without an object, like a direct __new__ call

hacking it in using the input object with the hightest priority should fix this and the testsuite succeeds.
But I am a skeptical if this is safe for all cases, our testsuite is leaky in regards to these.
Do we need a new iterator flag to call PyArray_NewFromDescr with non-null obj argument from the iterator output allocation?

juliantaylor · 2014-09-22T21:26:58Z

branch with the changes:
https://github.com/juliantaylor/numpy/tree/iter-subobj

@mhvk can you try if that branch fixes your issues?

juliantaylor · 2014-09-22T21:31:51Z

on second though where is currently completely subclass unsafe as there is no way to preserve extra information like masks.
should we revert back to 1.8 dropping of subclasses? or change it to use indexing so __getitem__ __setitem__ are used as suggested?

mhvk · 2014-09-22T22:44:21Z

@juliantaylor - I agree that it would remain unsafe for subclasses. I wondered whether my suggestion was much slower, but it turns out it is not too bad, at least for large arrays:

import numpy as np
def my_where(condition, first, second):
    condition, first, second = np.broadcast_arrays(condition, first, second)
    condition = ~condition
    result = first.copy()
    result[condition] = second[condition]
    return result

a = np.random.uniform(size=100000)
condition = a > 0.5
%timeit np.where(condition, a, 0.)
# 100 loops, best of 3: 2.07 ms per loop
%timeit my_where(condition, a, 0.)
# 100 loops, best of 3: 2.36 ms per loop

For smaller arrays, it is worse. E.g., for size=100, one becomes dominated by all the initialisation: 13.5 and 96.7 micro-sec, respectively.

njsmith · 2014-09-22T23:23:16Z

Note that implementing where() via indexing requires a special case for 0d
inputs.

where(True, 5, 10)

works, but

result[condition] = ...

will fail because of how 0d indexing works.

juliantaylor · 2014-09-23T07:10:02Z

where can be double as fast as indexing if the condition is more friendly to branch prediction (dense or sparse) but that should not be an issue, we can add a wrapper python function that does indexing on subclasses and calls to the existing C function for ndarrays
for 1.9.1 we should probably just disable the broken subclass keeping as it brakes code.

seberg · 2014-09-23T07:59:31Z

result[np.asarray(condition)] is a workaround until our deprecation is all done and we can change it in master.

the C implementation cannot preserve subclass attributes, this needs a python wrapper using indexing for subclasses. closes numpygh-5095

mhvk · 2014-09-23T20:31:21Z

@juliantaylor - I saw in #5105 that you hope to produce a version that handles subclasses correctly. Let me know if I can help in any way. Also, shall I raise a new issue reminding us of this?

juliantaylor · 2014-09-23T20:47:31Z

too bad broadcast_arrays doesn't preserve subtypes, else this would probably already be it:

        a = np.asanyarray(a)
        b = np.asanyarray(b)
        cond, a, b = np.broadcast_arrays(cond, a, b)
        res = b.copy()
        res[cond] = a[cond]
        return res

I wonder if thats fixable

mhvk · 2014-09-23T20:57:43Z

See #4622! (about to be merged by @seberg)

mhvk · 2014-09-23T20:58:47Z

p.s. This is indeed why I thought of my example. Note that broadcast_arrays does upgrade to an array, so I think one could even omit the asanyarray.

juliantaylor · 2014-09-23T21:36:22Z

oh very nice, I'll have a look at supporting subclasses when its merged
should be something along the lines of this:
https://github.com/juliantaylor/numpy/tree/where-subclass
but it still needs some work to succeed the tests with special float values

WeatherGod · 2015-07-29T17:22:41Z

Sorry for resurrecting an old thread, but I finally tracked down the source of an obscure bug in one of our production systems, and I think it relates to this. The line (np.where(x > 224, 224, x)/25.4).filled(0) where x was a masked array, used to work fine because np.where() returned a masked array that had a method filled(). I know I can use np.ma.where(), and I will going forward, but changing the behavior of np.where() did break legitimate code.

charris · 2015-07-29T17:36:21Z

I've put a preliminary 1.10 milestone on this so that it gets looked at.

mhvk · 2015-07-29T17:50:00Z

@charris - be sure to reopen the issue, otherwise it probably won't get looked at at all! (@juliantaylor -- a proper solution would still be very welcome!)

njsmith · 2015-07-29T17:53:06Z

Fwiw would be comfortable with not considering this a 1.10 blocker and
instead letting it sit until someone who has a specific interest in
subclasses shows up with a proposal+fix. Right now it sounds like no one is
sure what a proper fix would even be. And while the behavior in master
might not be optimal, it's not clear that there's even any back compat gain
to be had, given that master matches 1.9 and so everyone who wants to
support 1.9 already has to adapt regardless. Leaving it in 1.10 won't
create any new additional burden.
On Jul 29, 2015 10:36 AM, "Charles Harris" notifications@github.com wrote:

I've put a preliminary 1.10 milestone on this so that it gets looked at.

—
Reply to this email directly or view it on GitHub
#5095 (comment).

shoyer · 2015-07-29T18:11:47Z

np.where works like a ufunc, though I know it isn't implemented that way internally. So I think the right solution to this is probably to simply use __numpy_ufunc__ and let subclasses/duck-types override the functionality that way.

@mhvk's proposed solution is interesting, but it won't work for all duck-array types:

out = first.copy()
out[condition] = second[condition]

The problem is that some of the duck-array types for which where is most useful are actually immutable (e.g., dask.array). These types need their own solution, which they can implement with __numpy_ufunc__.

WeatherGod · 2015-07-29T18:15:16Z

I should also point out that my code example used to work because the condition was a masked array. Depending on the type of first would have been misleading in my case as my example has just an integer there.

WeatherGod · 2015-07-29T18:35:29Z

...and looking over my company's internal bug tracker, this may be the source of another bug that was customer-facing, causing our shipped netCDF files to have very negative values in places that should have been masked, and our contours to look a bit silly. We never figured out the exact cause of the issue, but I am now going to see if an np.where() was used.

charris · 2015-07-29T20:00:00Z

Perhaps we should raise an error until the subclass issue is fixed. @juliantaylor Have you had any time to pursue your solution?

WeatherGod · 2015-07-29T20:09:21Z

I think I would be more comfortable with a warning. The sneaky nature of
this problem indicates that the results are more like a degraded result
rather than "wrong", especially in the face of duck-typing elsewhere. A
warning would have pulled me to this spot in the code faster, and I would
have reported it sooner as such. It is quite possible that other people
have encountered similar problems and merely shrugged and worked around it,
or didn't figure out the source of it.

On Wed, Jul 29, 2015 at 4:00 PM, Charles Harris notifications@github.com
wrote:

Perhaps we should raise an error until the subclass issue is fixed.
@juliantaylor https://github.com/juliantaylor Have you had any time to
pursue your solution?

—
Reply to this email directly or view it on GitHub
#5095 (comment).

charris · 2016-01-21T17:26:11Z

So this is still up in the air. Pushing off to 1.12.

mhvk · 2019-06-17T15:47:11Z

Closing since one can now override np.where with __array_function__. Hopefully, eventually where becomes a proper ufunc.

This was referenced Sep 21, 2014

Monkey-patch non-ufunc Numpy functions astropy/astropy#1274

Closed

doctest failure in time.core with --remote-data astropy/astropy#2958

Merged

juliantaylor mentioned this issue Sep 23, 2014

BUG: don't propagate subtypes from np.where #5105

Merged

juliantaylor added a commit to juliantaylor/numpy that referenced this issue Sep 23, 2014

BUG: don't propagate subtypes from np.where

8dd2e47

the C implementation cannot preserve subclass attributes, this needs a python wrapper using indexing for subclasses. closes numpygh-5095

charris closed this as completed in #5105 Sep 23, 2014

mhvk mentioned this issue Sep 30, 2014

Table MaskedColumn updating issues astropy/astropy#2997

Closed

mhvk mentioned this issue Oct 18, 2014

Quantities become arrays using np.choose astropy/astropy#3035

Closed

charris added the 06 - Regression label Jul 29, 2015

charris added this to the 1.10.0 release milestone Jul 29, 2015

njsmith reopened this Jul 29, 2015

ndjensen mentioned this issue Aug 3, 2015

Inconsistent dtype behavior of numpy.where() #6155

Closed

charris modified the milestones: 1.11.0 release, 1.10.0 release Oct 13, 2015

charris modified the milestones: 1.12.0 release, 1.11.0 release Jan 21, 2016

rgommers modified the milestone: 1.12.0 release Feb 15, 2017

mhvk mentioned this issue Feb 27, 2017

np.where does not work with Quantity astropy/astropy#5842

Closed

eric-wieser mentioned this issue Apr 26, 2017

ENH: make np.where a ufunc #8994

Open

mhvk mentioned this issue Apr 20, 2018

Functions select and where don't preserve subclasses #10933

Closed

mhvk closed this as completed Jun 17, 2019

domsmrz mentioned this issue Apr 20, 2024

BUG: np.where called with ps.Series returns np.array instead of pd.Series pandas-dev/pandas#58329

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

BUG np.where half-initializes subclass of output #5095

BUG np.where half-initializes subclass of output #5095

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

BUG np.where half-initializes subclass of output #5095

BUG np.where half-initializes subclass of output #5095

Comments

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!