-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH: rewrite ma.median to improve poor performance for multiple dimensions #4760
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'd like this in 1.9 but it still needs some iterations, e.g. using partition instead of sort for larger dimensions. |
@@ -6188,7 +6188,7 @@ def sort(a, axis= -1, kind='quicksort', order=None, endwith=True, fill_value=Non | |||
else: | |||
filler = fill_value | |||
# return | |||
indx = np.indices(a.shape).tolist() | |||
indx = [x for x in np.indices(a.shape)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seberg the testsuite works but I'm not sure if that is really the same for large dimensions
if not possibly a argument to tolist that does not convert to python longs would be good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand, if anything, the old code could misbehave (but it won't unless you plug in a 32 dimensional array)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so when indexing whether the input is a tuple or array only matters for the first dimension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is wonky. But since you have a a list of arrays (or lists), yes, unless the first dimension has more then 32 entries (which probably is a bug for 32 dimensional input in the old code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nvm. about the bug. I doubt it is there, but you are not changing that part. Would have to pass in a tuple to make it a reliable error. (doesn't hurt anyway, only is a tiny bit slower because tuples parse so slow -- old keyword arguments being slow thing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually here a sparse meshgrid seems to be the same, and uses significantly less memory, I'll update to that
added usage of masked median to nanmedian at a cutoff of 400 elements in the axis. on larger arrays the faster partition gains more than the apply_along_axis costs as sorting nans has the same properties a masked sort (moving them to the end) one could probably duplicate the masked median code to squeeze some more performance as the masked sort is heavy on indexing and memory. I'm probably also not going to do the same for percentile for 1.9 anymore either. |
def _nanmedian_small(a, axis=None, out=None, overwrite_input=False): | ||
""" | ||
sort + indexing median, faster for small medians along multiple dimensions | ||
due to the high overhead of apply_along_axis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might mention nanmedian for parameter documentation.
Not sure of all the details, I'll leave that to the tests. |
tolist() converts numpy integers to python integers which are converted back to numpy integers by the indexing. meshgrid(indexing='ij') returns the indices wanted here as the right type. triples performance of sorting a size=(200, 200, 50) array along axis 2 and reduces memory usage by almost 40%.
…sions many masked median along a small dimension is extremely slow due to the usage of apply_along_axis which iterates fully in python. The unmasked median is about 1000x faster. Work around this issue by using indexing to select the median element instead of apply_along_axis. Further improvements are possible, e.g. using the current np.nanmedian approach for masked medians along large dimensions so partition is used instead of sort or to extend partition to allow broadcasting over multiple elements. Closes numpygh-4683.
How much gain does this bring to the small array performance? It might make sense to push this off to the next release so you can spend more time with it. |
easily a factor 500, the performance of many small medians is abysmal as the pure python logic in apply_along_axis is very slow compared to the median. |
Well, that's significant ;) Anything we could do to speed up apply_along_axis? |
in the linked issue I listed the ideas I had to solve this issue. |
updated fixing the minor style issue and added an explicit testcase for the two nanmedian code paths |
ENH: rewrite ma.median to improve poor performance for multiple dimensions
Thanks Julian. |
Fixes numpy#5969. Performance fix numpy#4760 had caused wrong shaped results in the 1D case. This fix restores the original 1D behavior.
Fixes numpy#5969. Performance fix numpy#4760 had caused wrong shaped results in the 1D case. This fix restores the original 1D behavior.
many masked median along a small dimension is extremely slow due to the
usage of apply_along_axis which iterates fully in python. The unmasked
median is about 1000x faster.
Work around this issue by using indexing to select the median element
instead of apply_along_axis.
Further improvements are possible, e.g. using the current np.nanmedian
approach for masked medians along large dimensions so partition is used
instead of sort or to extend partition to allow broadcasting over
multiple elements.
Closes gh-4683.