add extended axis and keepdims support to percentile and median #3908

juliantaylor · 2013-10-13T16:46:53Z

No description provided.

juliantaylor · 2013-10-13T16:47:13Z

An issue is the new median doesn't call mean anymore, so it will break astropy

juliantaylor · 2013-10-13T16:57:12Z

using percentile to implement median makes it twice as slow for small arrays (< 10000 elements)
maybe we can keep the explicit median code?

njsmith · 2013-10-13T17:40:10Z

What about making percentile faster, can we do that? :-)

juliantaylor · 2013-10-13T18:45:11Z

improved percentile a bit, only 50% slower now for an even number of elements, about the same speed for 5k element arrays with an odd number of elements.

charris · 2013-10-14T23:47:36Z

numpy/lib/function_base.py

+    """
+    call func with a as first argument swapping axis to use extended axis
+    on functions that don't support it natively
+    returns result and a.shape with axis dims set to 1


Needs a summary line, parameter documentation, etc. In other words, the usual documentation. Also a blank line before the closing """.

Could also use more consistent line length.

charris · 2013-10-16T18:22:07Z

> An issue is the new median doesn't call mean anymore, so it will break astropy

Does astropy overload mean?

juliantaylor · 2013-10-16T18:24:12Z

yes see #3851

charris · 2013-10-16T18:46:09Z

Right, forgot about that. @astrofrog I wonder if there is a solution that doesn't involve numpy contortions.

astrofrog · 2013-10-16T20:03:31Z

@charris - I really hope there is a better solution, as I am aware that what we use right now is fairly hacky. I know that __numpy_ufunc__ is going to be in 1.9, but that doesn't solve the problem of how to ensure that mean/median/etc also get processed by the sub-class since those aren't ufuncs. It would be great if these functions were made (in Numpy) to call some kind of finalize/wrap method on sub-classes once they are done (similarly to ufuncs). Have there been any related proposals in the past?

pv · 2013-10-17T08:30:08Z

mean, var, std, min, max, all, any are all implemented via ufuncs, so __numpy_ufunc__ should capture them.

median uses partition, which however cannot be captured.
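A minimal sketch of the point about partition, for a flat array only. The helper name `median_1d` is hypothetical and this is not NumPy's implementation; it only illustrates that the selection step goes through `np.partition`, and that averaging the two middle values is the one place where a mean call could come in.

```python
import numpy as np


def median_1d(a):
    """Hypothetical helper: median of a flat array via np.partition."""
    a = np.asarray(a, dtype=float).ravel()
    n = a.size
    if n == 0:
        raise ValueError("median of an empty array is undefined here")
    half = n // 2
    if n % 2:
        # Odd length: the element at position `half` in sorted order.
        return np.partition(a, half)[half]
    # Even length: select both middle elements, then average them.
    part = np.partition(a, [half - 1, half])
    return part[half - 1:half + 1].mean()


print(median_1d([3, 1, 2]), median_1d([4, 1, 3, 2]))
```

The selection itself never goes through a ufunc, so a hook that only intercepts ufunc calls would not see it.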

pv · 2013-10-17T08:35:43Z

However, I don't see real obstacles for implementing mean et al. as gufuncs. This might also be a performance increase, as these methods currently use temporary arrays etc.

astrofrog · 2013-10-17T09:00:08Z

@pv - so just so I understand, is there a reason why partition can't be made to behave like mean et al? (in terms of __numpy_ufunc__)

seberg · 2013-10-17T10:09:01Z

@astrofrog, since, at least normally, partition uses a weighted calculation a simple mean just can't do it. Which would make it a bit of a kludge to use a mean.

seberg · 2013-12-05T14:09:04Z

numpy/lib/function_base.py

-    if (q < 0).any() or (q > 1).any():
-        raise ValueError(
-            "Percentiles must be in the range [0,100]")
+    # faster than np.any(q < 0) or np.any(q < 1), relevant for small arrays


Hmmm, not sure, but large arrays may be plausible, too. I suppose it wouldn't be here if it wasn't a big difference (and yeah, 2-3 or so elements are probably common). But maybe we should make a threshold?

reductions are so freakishly slow :(
the any() code takes 12 us, while a median of 5000 doubles takes 55us in total (including all the weighted partition stuff)

using count_nonzero is better for large q, but still slower than the python loop for len(q) < 10
nonzero wins if len(q) > 10, while median does not get much more expensive, I guess a threshold could make sense

In [3]: q = np.array([0.5])

In [4]: %timeit ((q < 0).any() or (q > 1).any())
100000 loops, best of 3: 12.1 µs per loop

In [5]: %timeit ((q < 0) | (q > 1)).any()
100000 loops, best of 3: 8.79 µs per loop

In [6]: %timeit np.count_nonzero((q < 0) | (q > 1))
100000 loops, best of 3: 4.8 µs per loop

In [7]: def f(q):
   ...:     for i in range(q.size):
   ...:         if q[i] < 0. or q[i] > 100.:
   ...:             pass
   ...:

In [8]: %timeit f(q)
1000000 loops, best of 3: 841 ns per loop
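The threshold idea could look roughly like the sketch below. The function name `check_percentiles` and the default cutoff of 10 are assumptions taken from the numbers quoted in this comment, not from the merged code: a plain Python loop for short `q`, and `np.count_nonzero` otherwise.

```python
import numpy as np


def check_percentiles(q, threshold=10):
    """Hypothetical helper: ensure every value in q lies in [0, 100]."""
    q = np.asarray(q, dtype=float).ravel()
    if q.size < threshold:
        # For a handful of values a Python loop avoids reduction overhead.
        for i in range(q.size):
            if q[i] < 0. or q[i] > 100.:
                raise ValueError("Percentiles must be in the range [0,100]")
    else:
        # For longer q a single vectorized count is cheaper than the loop.
        if np.count_nonzero((q < 0.) | (q > 100.)):
            raise ValueError("Percentiles must be in the range [0,100]")
    return q


check_percentiles([25, 50, 75])          # short q, loop branch
check_percentiles(np.linspace(0, 100, 50))  # long q, vectorized branch
```

Where the crossover actually lies depends on the machine and the NumPy version, so the cutoff is best treated as a tunable value.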

seberg · 2013-12-05T14:22:21Z

Didn't try it out live, but the tests look quite extensive anyway. @astrofrog how important is that mean call for you? If it is important, we can maybe just add a check whether partition returned a base class array, and if not call the (no-op) .mean()

astrofrog · 2013-12-05T15:04:16Z

@seberg - what's important for us is that running np.mean and np.median continue to call .mean() for numpy array sub-classes. Since partition is a new function, we don't really support it currently so it doesn't matter as much if it doesn't end up calling a method on the sub-class. Does this make sense?
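A minimal sketch of the dispatch being relied on here. The class `Tagged` is hypothetical and merely stands in for a unit-carrying subclass; the sketch shows only that `np.mean` hands off to an overridden `.mean()` method when given a non-base-class array.

```python
import numpy as np

calls = []  # records which overridden methods were invoked


class Tagged(np.ndarray):
    """Hypothetical ndarray subclass standing in for a unit-carrying array."""

    def mean(self, axis=None, dtype=None, out=None, **kwargs):
        calls.append("mean")
        # Do the numerical work on a plain ndarray view of the same data.
        plain = self.view(np.ndarray)
        return plain.mean(axis=axis, dtype=dtype, out=out, **kwargs)


x = np.array([1.0, 2.0, 3.0]).view(Tagged)
result = np.mean(x)  # dispatches to Tagged.mean
print(calls, result)
```

Whether `np.median` also reaches the override depends on how median is implemented in a given NumPy version, which is exactly what is being negotiated in this thread.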

charris · 2013-12-22T01:03:20Z

This actually looks to have passed. Ping Travis to make it official.

charris · 2013-12-22T01:16:50Z

@astrofrog @seberg What to do here? I really don't like working around subclasses, numpy was not designed as a base class, should not be used as such, and we end up like Gulliver tangled up and paralysed in zillions of tiny hassles. I'd really like to see astropy not depend on other routines calling mean and median methods when available. At the very least, astropy should implement its own median if numpy should be calling it. Forcing median to call mean just doesn't make sense to me. Grrr...

astrofrog · 2013-12-24T08:41:10Z

@charris - I would disagree with the statement regarding sub-classing ndarray - it is a clearly documented feature in the Numpy Basics section (http://docs.scipy.org/doc/numpy/user/basics.subclassing.html). Having said that, I completely agree that relying on np.median to call np.mean is a hack that we should transition away from as soon as possible. Which is why np.median and np.partition should ideally support ndarray sub-classes (without calling np.mean). Why can median and partition not be made to call __array_wrap__/__array_finalize__/__numpy_ufunc__ or another equivalent sub-class method?

seberg · 2013-12-24T12:40:42Z

Calling __array_wrap__ seems reasonable to me. Frankly, we could probably call it much more often and do that more generally, but the state of all this subclassing isn't great anyway, and I doubt that alone would fix it.

juliantaylor · 2014-01-05T15:44:29Z

updated according to seberg's comments. I don't see a reasonable way to call mean in this variant: just calling r.reshape(-1, 1).mean(axis=-1) on the result won't work because the result might not be an ndarray anymore but a scalar.
Similarly, __array_wrap__ won't work on scalars.
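The scalar case can be seen with plain arrays. This is only an illustration of the return types, assuming a base-class `ndarray` input; it does not reproduce the code in this branch.

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)

full = np.median(a)             # reduce over every axis
partial = np.median(a, axis=0)  # reduce over one axis only

# A full reduction hands back a NumPy scalar, not an ndarray,
# so array-only hooks have nothing to attach to.
print(type(full).__name__, isinstance(full, np.ndarray))
print(type(partial).__name__, isinstance(partial, np.ndarray))
```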

juliantaylor · 2014-02-15T11:34:17Z

any ideas what to do with this?
played a bit more with __array_wrap__ and saw that neither quantities nor astropy would work with it, except when we pretend percentile is an add ufunc

juliantaylor · 2014-03-12T19:56:24Z

ok I reverted back to a separate median, can this now be looked at? it's blocking the nan median PRs

charris · 2014-03-13T00:20:15Z

numpy/lib/function_base.py

+            for i, s in enumerate(sorted(keep)):
+                a = a.swapaxes(i, s)
+            # merge reduced axis
+            a = a.reshape(a.shape[:nkeep] + (np.prod(a.shape[nkeep:]),))


Could use (-1,) instead of (np.prod(a.shape[nkeep:]),).
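A rough sketch of the axis-merging technique under review, using the `(-1,)` form suggested above. The helper name `reduce_over_axes` is hypothetical and it uses `transpose` rather than the `swapaxes` loop in the diff, so it should be read as an illustration of the idea rather than the code in this PR.

```python
import numpy as np


def reduce_over_axes(func, a, axis, **kwargs):
    """Hypothetical helper: apply a single-axis reduction over several axes.

    Returns the result and a.shape with the reduced dims set to 1.
    """
    a = np.asanyarray(a)
    nd = a.ndim
    reduced = []
    for ax in axis:
        if not -nd <= ax < nd:
            raise ValueError("axis %d is out of bounds" % ax)
        reduced.append(ax % nd)  # normalize negative axes
    if len(set(reduced)) != len(reduced):
        raise ValueError("duplicate value in axis")
    keep = [i for i in range(nd) if i not in reduced]
    # Move the kept axes to the front and the reduced axes to the back.
    moved = a.transpose(keep + sorted(reduced))
    # Merge all reduced axes into one trailing axis; -1 infers its length.
    merged = moved.reshape(moved.shape[:len(keep)] + (-1,))
    result = func(merged, axis=-1, **kwargs)
    keepdims_shape = tuple(1 if i in reduced else n
                           for i, n in enumerate(a.shape))
    return result, keepdims_shape


a = np.arange(24.0).reshape(2, 3, 4)
r, k = reduce_over_axes(np.median, a, (0, 2))
print(r.shape, k, r.reshape(k).shape)
```

Reshaping the result to the returned shape gives the same layout that `keepdims=True` produces for ufunc reductions.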

charris · 2014-03-13T01:05:05Z

LGTM, modulo nitpicks. @astrofrog Does this work for you?

There are two things that it would be nice to have implemented for ndarrays, missing values and units. We tried the first and ended up deadlocked, and I'm not sure how we could deal with the second in a reasonable way... :( That said, we need to draw the line somewhere in dealing with ndarray subclasses, or at least come to some conclusion about general policy. Going far down this road leads to madness.

njsmith · 2014-03-13T02:11:46Z

@charris: bit pattern NAs and units are both easy if we have parametrized
dtypes + base class ndarrays.

astrofrog · 2014-03-13T12:55:25Z

@charris thanks for the heads up! Just checking to see if it works for us.

astrofrog · 2014-03-13T13:06:23Z

@charris - all tests in Astropy pass with this version of the Numpy branch.

Thanks for being willing to compromise to not break our subclassing. I wonder whether it would be worth having a BoF at SciPy 2014 to discuss subclassing Numpy arrays and discuss a long-term solution? We use it for unit support and I think some other packages do.

charris · 2014-03-13T19:06:36Z

numpy/lib/function_base.py

+    -------
+    result : tuple
+        Result of func(a, **kwargs) and a.shape with axis dims set to 1
+        suiteable to use to archive the same result as the ufunc keepdims


suiteable <- suitable.

archive <- achieve?

fixed by reformulating the sentence

Merging median and percentile would break astropy and quantities as we don't call mean anymore. These packages rely on overriding mean to add their own median behavior.

charris · 2014-03-13T22:34:26Z

Finally ;) Thanks Julian.

charris reviewed Oct 14, 2013

charris mentioned this pull request Oct 19, 2013

BUG: core: ensure __r*__ has precedence over __numpy_ufunc__ #3856

Merged

seberg reviewed Dec 5, 2013

charris closed this Dec 22, 2013

charris reopened this Dec 22, 2013

juliantaylor mentioned this pull request Feb 13, 2014

Closes issue #586. Updates handling of nan behaviour in numpy.lib.function_base.median. #4287

Closed

juliantaylor added the Needs decision label Feb 24, 2014

empeeu mentioned this pull request Mar 8, 2014

ENH: Added proper handling of nans for numpy.lib.function_base.median #4460

Closed

juliantaylor removed the Needs decision label Mar 12, 2014

charris reviewed Mar 13, 2014

ENH: add extended axis and keepdims support to median and percentile

eea1a9c

charris reviewed Mar 13, 2014

MAINT: revert back to separate median implementation

7d53c81

Merging median and percentile would break astropy and quantities as we don't call mean anymore. These packages rely on overriding mean to add their own median behavior.

charris added a commit that referenced this pull request Mar 13, 2014

Merge pull request #3908 from juliantaylor/median-percentile

48c77a6

add extended axis and keepdims support to percentile and median

charris merged commit 48c77a6 into numpy:master Mar 13, 2014

charris mentioned this pull request Mar 27, 2014

ENH: added functionality nanmedian to numpy #4307

Merged

rkern mentioned this pull request Nov 17, 2023

DOC: Unexpected values from np.median when passing a sequence of ints for axis= #25174

Closed
