Memory bloat using numpy.ma.median (Py 2.7.4, Np 1.7.1) #4814
Comments
They are related: prior to 1.9, np.ma.sort creates an index array with 8-byte indices as large as the original array. #4760 also improves that and reduces the memory by 40% or more if you are using float arrays.
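As a rough back-of-the-envelope illustration (not from the thread itself), the index array alone is substantial for the shape reported below:

```python
# Rough estimate of the extra memory an int64 index array adds for the
# (7000, 180, 3600) array from this report; dtype assumptions are illustrative.
n_elements = 7000 * 180 * 3600        # ~4.5e9 elements
index_bytes = n_elements * 8          # one 8-byte index per element (pre-1.9 np.ma.sort)
data_bytes = n_elements * 4           # the float32 data itself, ~18 GB

print(index_bytes / 1e9)              # ~36 GB just for the indices
print(data_bytes / 1e9)               # ~18 GB for the data
```

On a float32 array the temporary index array is thus roughly twice the size of the data itself, which is consistent with the large savings mentioned above.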
Yeah, so my call was numpy.ma.median(var, overwrite_input=True, axis=0), so theoretically jumping up to 1.9.0b1 should give me a big improvement in performance and memory usage too? And yep, var is a float array.
Yes. If you can replace the masked values with NaNs, nanmedian should be even better along axis 0.
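A minimal sketch of that workaround, assuming replacing masked values with NaN is acceptable for the data (the small array here is a stand-in purely for illustration):

```python
import numpy as np

# Small stand-in for the large (time, y, x) masked float array in the thread.
var = np.ma.masked_greater(np.random.rand(100, 22, 36), 0.9)

# Fill masked entries with NaN, then use nanmedian (NumPy >= 1.9),
# which avoids the large temporary index arrays used by np.ma.median.
filled = var.filled(np.nan)
med = np.nanmedian(filled, axis=0)
```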
Wow, you're not kidding. On a different machine using Py 2.7.7/Np 1.9.0b1, the numpy.ma.median calculation took 6 s (4 GB), whereas with Py 2.7.4/Np 1.7.1 it took 305 s (60 GB). This is on a much smaller (7305, 22, 3600) float array. Nice performance jump!
Closing, as it is now much better. There might still be some room for improvement, but it's reasonably usable now.
I've got a large (7000, 180, 3600) masked array that I've been passing to numpy.ma.median.
Prior to the function call, memory used on the system is 20 GB (I have a bunch of other variables I'm also dealing with), and upon the function call memory quickly increases to 200 and then 300 GB without even returning a result (I kill the job before it completes).
Is this expected behaviour?
I've tried using the overwrite_input=True option, but this doesn't seem to resolve anything.
I'm using Python 2.7.4 and NumPy 1.7.1.
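For reference, a minimal sketch of the kind of call described in the report, with the shape scaled down so it actually runs; the float32 dtype and random mask are assumptions for illustration only:

```python
import numpy as np

# Reduced stand-in for the (7000, 180, 3600) masked array in the report.
data = np.random.rand(700, 18, 360).astype(np.float32)
mask = np.random.rand(700, 18, 360) > 0.95
var = np.ma.MaskedArray(data, mask=mask)

# Median over the first axis, allowing the input to be overwritten
# to (in principle) save memory, as in the original call.
result = np.ma.median(var, axis=0, overwrite_input=True)
```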