-
-
Notifications
You must be signed in to change notification settings - Fork 11k
Use PyMem_RawMalloc on Python 3.4 and newer #7404
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Extract of the gdb traceback where PyMem_Malloc() is currently called with the GIL released:
|
PyMem_Malloc() is called in 3 other places:
fortranobject.c: It looks like fortran_doc() is called with the GIL held. (PyMem_Free is also called with the GIL held in this file.) numpy/random/mtrand/mtrand.pyx: It looks like PyMem_Malloc() is also called with the GIL held, but I don't know enough Cython to double check :-/ (Same for PyMem_Free(), it also looks to be correctly used.) |
It appears this does not fix the issue for Python <= 3.3 (in particular 2.7)? |
In practice, PyMem_Malloc() can be called without the GIL being held until Python 3.6. But you may have issues if you play with the tracemalloc debugger or other debug tools. It may change in Python 3.6 with my patch: With my patch, calling PyMem_Malloc() without the GIL being held can fail badly. |
Do we do this normally? ufunc.at is not in the best state unfortunately, I would be surprised if we really call PyMem_Malloc without the GIL in many places, though wouldn't rule it out. If it is always safe with this patch, I guess we do not have to worry about it though. If it can fail badly with the new behaviour, isn't there some reason why we should try to prefer using the new behaviour as well? Holding the GIL a little longer in ufunc.at is probably not an issue as well. Just curious. |
I don't see anything wrong with the patch, just asking what the best thing for us to do here is. |
"There are some places where memory is acquired without the GIL, but I think we use malloc in those places. @Haypo What is the advantage of using PyMem_RawMalloc()?" PyMem_RawMalloc() calls malloc() in practice. The advantage of calling PyMem_RawMalloc() instead of calling directly malloc() is that you can use tracemalloc to track memory leaks: The overhead of calling of PyMem_RawMalloc() instead of malloc() is negligible: By the way, I'm working on an extension of the tracemalloc module for numpy: |
@Haypo That sounds good. I guess the question come down to whether it is better to fix the places where |
How do you plan to "fix" the code? It's not trivial to lock/unlock the GIL. If you know that in some cases, the GIL can be released, always use PyMem_RawMalloc(). You might also replace direct calls to malloc() with PyMem_RawMalloc() (on Python 3.4+) to be able to track all memory allocated by numpy using tracemalloc. |
You can use the PyGILState_* functions to reacquire GIL. Of course, this has some performance penalty (and I don't remember whether it bombs subinterpreters, but we don't support those in any case). But it may be better to just use malloc or PyMem_RawMalloc instead. |
Yeah, it's likely to kill performances :-)
In this case, I suggest to use PyMem_RawMalloc on Python 3.4+ or malloc() on older Python version. I see that numpy has a PY_USE_PYMEM define (you can see it in my change). |
Let me rephrase the question: is there an advantage to using PyMem_Malloc() when the GIL is not a problem? |
For debuggers like tracemalloc, the tracemalloc doesn't have to acquire the GIL, so it's faster. If my change https://bugs.python.org/issue26249 is accepted, PyMem_Malloc() will be faster than PyMem_RawMalloc() (faster than malloc()?) for allocations smaller or equal than 512 bytes. |
@Haypo Thanks for the clarification. I ask because, IIRC, we started using more PyMem_Malloc to take advantage of the memory pool for small objects. OTOH, 512 bytes = 8 doubles, which is not a very large array. @juliantaylor Thoughts? |
Do you mean the pymalloc memory allocator optimized for small objects? PyMem_Malloc() doesn't use pymalloc, only PyObject_Malloc(). My change https://bugs.python.org/issue26249 proposes to modify PyMem_Malloc() to use pymalloc too. |
PyArray_malloc should not be used anywhere where it is really relevant, we use PyObject_malloc for objects and our own small data allocator for small arrays. We also have our own variant of tracemalloc so thats not so relevant either. It has been discussed in the past to match that with pythons but pythons api is not sufficiently flexible for that. |
Did you see my recent issue #26530 that I wrote for numpy? I wasn't aware that numpy has its own tool to track memory allocations. Interesting. |
Well, it has its own interface for hooking the memory allocation to track it. AFAIK no-one actually uses these hooks though, so tracemalloc is interesting because it lets us free-ride on the larger python ecosystem :-) |
Just so. There is a definite advantage to using standard tools that people will be more familiar with and probably expect. |
Could use a mention in the |
@charris: 512 bytes is 64 doubles, not 8. |
My pull request fixes support for Python 3.6. Can it be merged? I suggest to defer changes to hold the GIL and to call PyMem_Malloc() rather than PyMem_RawMalloc(). PyMem_Malloc() and PyMem_RawMalloc() use exactly the same memory allocator in Python 3.4 and 3.5. My change to optimize the PyMem_Malloc() is not merged in CPython 3.6 yet, so it still uses the same allocator. |
@Haypo: sure, it sounds like maybe we should think about revisiting how If PyMem_Malloc and PyMem_RawMalloc are the same in 3.4 and earlier, then why does your patch have an |
My patch uses PyMem_RawMalloc on Python >= 3.4. Extract of the patch:
It cannot be used on older Python versions since the function was introduced in Python 3.4 ;-) |
ah, right, silly me :-) Can you expand the comment a bit, to note that (a) numpy sometimes calls |
Change PyArray_malloc() macro to use PyMem_RawMalloc() on Python 3.4 and newer. This macro can be called indirectly from ufunc_at() which releases the GIL, whereas PyMem_Malloc() requires the GIL to be held: https://docs.python.org/dev/c-api/memory.html#memory-interface PyMem_RawMalloc() can be called without the GIL.
Done. Python 3.6 has a new PYTHONMALLOC=debug env variable which will now fail with a fatal error in PyMem_Malloc() if the function is called without holding the GIL. And I'm now more confident to push my change modifying PyMem_Malloc() to reuse PyObject_Malloc() allocator, and so get the fast allocator for memory blocks <= 512 bytes. |
@njsmith: By the way, what's the status of tracemalloc support in numpy? It's not the first time that someone joined #python-fr help channel and had a question about the memory consumption of an application using numpy. |
@njsmith: Oh, I now recall that I was waiting for your feedback on https://bugs.python.org/issue26530 to be able to track more memory allocations in numpy! |
LGTM, thanks -- and sorry about the slowness |
Thank you. You didn't reply to my tracemalloc questions ;-) |
Change PyArray_malloc() macro to use PyMem_RawMalloc() on Python 3.4
and newer. This macro can be called indirectly from ufunc_at() which
releases the GIL, whereas PyMem_Malloc() requires the GIL to be held:
https://docs.python.org/dev/c-api/memory.html#memory-interface
PyMem_RawMalloc() can be called without the GIL.