10000 ENH: rewrite ma.median to improve poor performance for multiple dimensions by juliantaylor · Pull Request #4760 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

ENH: rewrite ma.median to improve poor performance for multiple dimensions #4760

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 2, 2014

Conversation

juliantaylor
Copy link
Contributor

many masked median along a small dimension is extremely slow due to the
usage of apply_along_axis which iterates fully in python. The unmasked
median is about 1000x faster.

Work around this issue by using indexing to select the median element
instead of apply_along_axis.

Further improvements are possible, e.g. using the current np.nanmedian
approach for masked medians along large dimensions so partition is used
instead of sort or to extend partition to allow broadcasting over
multiple elements.

Closes gh-4683.

@juliantaylor
Copy link
Contributor Author

I'd like this in 1.9 but it still needs some iterations, e.g. using partition instead of sort for larger dimensions.
this technique should also be used to the new nanmedian and nanpercentile.
posted now for comments, maybe I'm overlooking some neat indexing trick that would make this nicer.

@@ -6188,7 +6188,7 @@ def sort(a, axis= -1, kind='quicksort', order=None, endwith=True, fill_value=Non
else:
filler = fill_value
# return
indx = np.indices(a.shape).tolist()
indx = [x for x in np.indices(a.shape)]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seberg the testsuite works but I'm not sure if that is really the same for large dimensions
if not possibly a argument to tolist that does not convert to python longs would be good

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand, if anything, the old code could misbehave (but it won't unless you plug in a 32 dimensional array)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so when indexing whether the input is a tuple or array only matters for the first dimension?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is wonky. But since you have a a list of arrays (or lists), yes, unless the first dimension has more then 32 entries (which probably is a bug for 32 dimensional input in the old code).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nvm. about the bug. I doubt it is there, but you are not changing that part. Would have to pass in a tuple to make it a reliable error. (doesn't hurt anyway, only is a tiny bit slower because tuples parse so slow -- old keyword arguments being slow thing)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually here a sparse meshgrid seems to be the same, and uses significantly less memory, I'll update to that

@juliantaylor
Copy link
Contributor Author

added usage of masked median to nanmedian at a cutoff of 400 elements in the axis. on larger arrays the faster partition gains more than the apply_along_axis costs

as sorting nans has the same properties a masked sort (moving them to the end) one could probably duplicate the masked median code to squeeze some more performance as the masked sort is heavy on indexing and memory.
The masked median could probably also benefit from the threshold to use partition, but that involves quite extensive refactoring which would probably take too long to still go into 1.9.

I'm probably also not going to do the same for percentile for 1.9 anymore either.

def _nanmedian_small(a, axis=None, out=None, overwrite_input=False):
"""
sort + indexing median, faster for small medians along multiple dimensions
due to the high overhead of apply_along_axis
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might mention nanmedian for parameter documentation.

8000
@charris
Copy link
Member
charris commented Jun 2, 2014

Not sure of all the details, I'll leave that to the tests.

tolist() converts numpy integers to python integers which are converted
back to numpy integers by the indexing.
meshgrid(indexing='ij') returns the indices wanted here as the right type.
triples performance of sorting a size=(200, 200, 50) array along axis 2
and reduces memory usage by almost 40%.
…sions

many masked median along a small dimension is extremely slow due to the
usage of apply_along_axis which iterates fully in python. The unmasked
median is about 1000x faster.

Work around this issue by using indexing to select the median element
instead of apply_along_axis.

Further improvements are possible, e.g. using the current np.nanmedian
approach for masked medians along large dimensions so partition is used
instead of sort or to extend partition to allow broadcasting over
multiple elements.

Closes numpygh-4683.
@charris
Copy link
Member
charris commented Jun 2, 2014

How much gain does this bring to the small array performance? It might make sense to push this off to the next release so you can spend more time with it.

@juliantaylor
Copy link
Contributor Author

easily a factor 500, the performance of many small medians is abysmal as the pure python logic in apply_along_axis is very slow compared to the median.
I have seen many stumble over this issue, its pretty common in astronomy where you often stack a bunch of images using a median, e.g. 1000x1000x100 stacks. It also came up on the mailing list at least once.
the only existing workaround is using bottleneck or writing this yourself which not trivial.

@charris
Copy link
Member
charris commented Jun 2, 2014

Well, that's significant ;) Anything we could do to speed up apply_along_axis?

@juliantaylor
Copy link
Contributor Author

in the linked issue I listed the ideas I had to solve this issue.
improving apply_along_axis should be possible and worthwhile by moving it to C code using npyiter, possibly one could also cythonize it

@juliantaylor
Copy link
Contributor Author

updated fixing the minor style issue and added an explicit testcase for the two nanmedian code paths

charris added a commit that referenced this pull request Jun 2, 2014
ENH: rewrite ma.median to improve poor performance for multiple dimensions
@charris charris merged commit 14cc717 into numpy:master Jun 2, 2014
@charris
Copy link
Member
charris commented Jun 2, 2014

Thanks Julian.

AmitAronovitch added a commit to AmitAronovitch/numpy that referenced this pull request May 21, 2016
Fixes numpy#5969.
Performance fix numpy#4760 had caused wrong shaped results in the 1D case.
This fix restores the original 1D behavior.
charris pushed a commit to charris/numpy that referenced this pull request May 22, 2016 8284
Fixes numpy#5969.
Performance fix numpy#4760 had caused wrong shaped results in the 1D case.
This fix restores the original 1D behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

poor performance of multidimensional masked median
3 participants
0