Fix memory leak in concatenate. #373
Conversation
Temporary array was not being freed after use.
```diff
@@ -602,6 +602,7 @@
     for (iarrays = 0; iarrays < narrays; ++iarrays) {
         Py_DECREF(arrays[iarrays]);
     }
+    PyArray_free(arrays);
```
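For context, a minimal sketch of the cleanup pattern the patch completes (a simplified illustration, not the actual numpy source; in PyArray_Concatenate the `arrays` table is obtained from PyArray_malloc earlier in the function):

```c
#include <Python.h>
#include <numpy/arrayobject.h>

/* Sketch only: release a temporary C array of PyArrayObject pointers.
 * The loop drops the reference held on each element; PyArray_free then
 * releases the pointer table itself. That final call is the one this
 * patch adds; without it the table leaked on every concatenate. */
static void
release_temp_arrays(PyArrayObject **arrays, npy_intp narrays)
{
    npy_intp iarrays;

    for (iarrays = 0; iarrays < narrays; ++iarrays) {
        Py_DECREF(arrays[iarrays]);   /* drop each owned reference */
    }
    PyArray_free(arrays);             /* free the container as well */
}
```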
The PyDataMem_* functions might be preferable here, though that might also be a bit of overkill...
I'm certainly no expert, but I know a memory leak when I see one, and I attempted to plug it in the simplest/quickest/least-invasive way I could find. I really have no preference as to how it gets fixed, as long as it gets fixed (preferably soon). [grin]
Do you have (or could you quickly throw together) an alternative patch that uses the method you propose?
I think the patch looks fine, but there was a recent commit that allows tracking memory usage with the PyDataMem_NEW/PyDataMem_FREE functions, so I'd like some input from the patch author as to which allocations we should be tracking and which we shouldn't worry about. You can see the routines themselves; they are in the same code unit.
The PyDataMem_* functions are for allocating array data. This is allocating a C array of pointers to array objects; I don't think it would be helpful to track it in the same way.
PyArray_malloc/free seem like the right approach.
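To make the distinction concrete, here is a hedged sketch; the allocator names are numpy's own, but `allocator_roles`, its parameters, and the surrounding logic are hypothetical illustrations rather than numpy source:

```c
#include <Python.h>
#include <numpy/arrayobject.h>

/* Hypothetical example of where each allocator is meant to be used:
 * PyDataMem_NEW/PyDataMem_FREE for the array *data* buffer (the calls
 * hooked by the new memory-usage tracking), and PyArray_malloc/
 * PyArray_free for small internal bookkeeping such as a temporary
 * table of object pointers. */
static int
allocator_roles(npy_intp nbytes, npy_intp nobjs)
{
    char *data;                /* array data: candidate for tracking */
    PyArrayObject **scratch;   /* internal scratch: not array data */

    data = PyDataMem_NEW(nbytes);
    scratch = PyArray_malloc(nobjs * sizeof(scratch[0]));
    if (data == NULL || scratch == NULL) {
        PyArray_free(scratch);
        PyDataMem_FREE(data);
        PyErr_NoMemory();
        return -1;
    }

    /* ... use the buffers ... */

    PyArray_free(scratch);
    PyDataMem_FREE(data);
    return 0;
}
```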
Well, PyArray_malloc optionally uses the Python allocator. My own preference would probably be plain malloc/free if we don't want to track memory usage in these things.
@thouis Could you comment on where you think the new memory allocation functions should be used?
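For reference, "optionally uses the Python allocation" refers (as far as I recall) to PyArray_malloc and friends being thin macros that compile to either the Python memory API or plain malloc/free depending on a build-time flag. The following is paraphrased from memory, so check ndarraytypes.h for the authoritative definitions:

```c
/* Paraphrased sketch of the dispatch, not copied verbatim from numpy. */
#if NPY_USE_PYMEM == 1
#define PyArray_malloc  PyMem_Malloc
#define PyArray_free    PyMem_Free
#define PyArray_realloc PyMem_Realloc
#else
#define PyArray_malloc  malloc
#define PyArray_free    free
#define PyArray_realloc realloc
#endif
```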
The patch looks correct to me as well. Regarding PyArray_malloc versus PyDataMem_NEW, I don't think @thouis's patch changes anything directly: there are a ton of existing usages of both, with PyArray_malloc used for random small chunks of memory and PyDataMem_NEW used for array data attributes. We might well want to revisit whether this distinction makes sense, which things exactly should be tracked, etc., but the current usage in PyArray_Concatenate is consistent with the other ~100 calls to PyArray_malloc I see in the source tree.
I'll put this in. The question of memory management needs to be clarified at some point, what with the multiplicity of methods we have at the moment, but I suppose we can do that at some future date.
Fix memory leak in concatenate.
As @njsmith says, my patch shouldn't change anything directly. My understanding is that PyDataMem_* should be used for array data, and probably also for anything that grows linearly with the array data, e.g. temporaries that are the same size as the input arrays, as (I think) is the case in some of the npysort functions (though I don't believe this is currently done everywhere, particularly in the sort functions). That said, tracking all allocations wouldn't necessarily be a bad idea.

While working on the patch, I wondered whether there was a good reason for using malloc/free directly rather than the Python allocator. Is there a good reason not to just use the Python allocator for everything? The one concern I can think of is that calling the Python allocator might cause arbitrary Python code to run, due to the GC. That would be something else to consider while memory management in numpy is being clarified.
Hi Thouis. When Enthought ported Numpy to IronPython and .NET, they needed to do their own memory management; that is one of the reasons I tend to favor separating Numpy internals from Python. Whether that is worth the effort is another question. I'll be curious to see what Mark has been working on and how he deals with that issue.