Memory bloat using numpy.ma.median (Py 2.7.4, Np 1.7.1) · Issue #4814 · numpy/numpy · GitHub

Closed
durack1 opened this issue Jun 19, 2014 · 6 comments

@durack1
durack1 commented Jun 19, 2014

I've got a large (7000,180,3600) masked array that I've been sending to numpy.ma.median.
Before the function call, memory used on the system is 20 GB (I have a bunch of other variables I'm also dealing with). Once the call starts, memory quickly climbs to 200 and then 300 GB without ever returning a result (I kill the job before it completes).

Is this expected behaviour?

I've tried using the overwrite_input=True option, however this doesn't seem to resolve anything.

I'm using python 2.7.4 and numpy 1.7.1

durack1 changed the title from "Memory bloat using numpy.ma.median" to "Memory bloat using numpy.ma.median (Py 2.7.4, Np 1.7.1)" on Jun 19, 2014
@durack1
Author
durack1 commented Jun 19, 2014

Looks like this might be a similar issue to #4760 and #4683 - though specifically I'm more concerned about the memory usage than the function performance

@juliantaylor
Contributor

They are related: prior to 1.9, np.ma.sort creates an index array with 8-byte indices as large as the original array. #4760 also improves that and reduces the memory by 40% or more if you are using float arrays.
The new nanmedian should use even less memory if the array shape allows it to take the faster partition code path (median across the first or last dimension, but not the second).
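
To put rough numbers on the index-array overhead described above, here is a back-of-the-envelope sketch. It assumes 8-byte (float64) data, since the thread only says "float", and a one-byte-per-element boolean mask; the helper name `estimate_bytes` is made up for illustration. It counts only the data, the mask, and one index array of the same shape, so it should not be read as a full account of the 200 to 300 GB that was observed.

```python
import numpy as np


def estimate_bytes(shape, dtype):
    """Return the number of bytes needed for a dense array of this shape and dtype."""
    n_elements = 1
    for dim in shape:
        n_elements *= int(dim)
    return n_elements * np.dtype(dtype).itemsize


shape = (7000, 180, 3600)  # shape reported in the issue

data_bytes = estimate_bytes(shape, np.float64)  # assumption: float64 data
mask_bytes = estimate_bytes(shape, np.bool_)    # boolean mask, 1 byte per element
index_bytes = estimate_bytes(shape, np.int64)   # 8-byte indices, same shape as the data

for label, nbytes in [("data", data_bytes), ("mask", mask_bytes), ("index array", index_bytes)]:
    print("%-12s %6.1f GB" % (label, nbytes / 1e9))
```

Under these assumptions the index array alone is as large as the data itself, which is consistent with the comment that avoiding it saves a large fraction of the memory for float arrays.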


@durack1
Author
durack1 commented Jun 19, 2014

Yeah, so my call was numpy.ma.median(var,overwrite_input=True,axis=0). So theoretically, jumping up to 1.9.0b1 should give me a big improvement in both performance and memory usage?

And yep, var is a float..

@juliantaylor
Contributor

Yes. If you can replace the masked values with NaNs, nanmedian should be even better along axis 0.
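
A minimal sketch of that suggestion, on a small synthetic array rather than the real data. The array contents, shape, and mask fraction here are invented for illustration; `np.nanmedian` requires NumPy 1.9 or later. Note that `filled` returns a new array, so this step itself needs room for one extra copy of the data.

```python
import numpy as np

# Small stand-in for the large masked array in the issue (values are made up).
rng = np.random.RandomState(0)
data = rng.normal(size=(50, 4, 6))
mask = rng.uniform(size=data.shape) < 0.2  # roughly 20% of entries masked
var = np.ma.masked_array(data, mask=mask)

# Replace masked entries with NaN (float dtypes only), then take the
# NaN-aware median along axis 0.
filled = var.filled(np.nan)
result = np.nanmedian(filled, axis=0)

# The masked-array median along the same axis, for comparison.
reference = np.ma.median(var, axis=0)

print(result.shape)
print(np.allclose(result, np.ma.filled(reference, np.nan), equal_nan=True))
```

If a column is entirely masked, `np.nanmedian` returns NaN for it (with a warning), so the result may need to be re-masked afterwards, for example with `np.ma.masked_invalid`.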

@durack1
Author
durack1 commented Jun 20, 2014

Wow, you're not kidding. On a different machine using Py 2.7.7/Np 1.9.0b1, the numpy.ma.median calculation took 6 s (4 GB), whereas with Py 2.7.4/Np 1.7.1 it took 305 s (60 GB). This is on a much smaller (7305,22,3600) float array.

Nice performance jump!

@juliantaylor
Contributor

Closing, as it is now much better. There might still be some room for improvement, but it's reasonably usable now.
