ENH: allow using aligned memory allocation, or exposing an API for memory management #17467
Comments
I am not opposed; for me it was mainly something that other people were always more interested in (e.g. Julian). One issue I see glancing at the PR is that currently we support a user providing the memory block (and NumPy taking ownership), and that usage is broken by gh-5470. So I wonder if we would need an allocation-strategy flag on the array, or something similar? Other than that, the code seems simple enough that it is not an issue, unless there is some negative performance implication. One more question: you want such higher-aligned memory, but you do not actually care about knowing whether an array provides that alignment? So this doesn't affect the question in gh-17359, right?
Right, it is orthogonal.
As envisioned in the PR, the functions are part of the `PyArrayObject`, so each ndarray object would carry the information with it. The PR would have to make sure we use the correct functions consistently in all the right places: everywhere the array's data memory is allocated, reallocated, or freed.
Ah, I missed that it was stored on the array object. That seems like a good solution for being able to change things safely (I am a bit worried that the cache must only be used if it is the current default allocator; maybe that should be part of the allocator struct, but that is a small thing). So maybe this had also stalled because it required extending the array struct? I think I am in favor of being open to that in any case; it is also useful for cleaning up buffer handling (and who knows what else).
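To make the per-array idea discussed above concrete, here is a rough illustration (not NumPy's actual code) of why storing the handler on the array matters at deallocation time. `arr_get_stored_handler` is a hypothetical accessor, and the handler layout follows the `PyDataMem_Handler` struct shape from the PR/NEP 49 draft.

```c
/* Illustration only: data must be released by the same routines that
 * allocated it, so the free path consults the handler remembered on the
 * array instead of whatever the global handler happens to be at that time. */
static void
free_array_data(PyArrayObject *arr, void *data, size_t nbytes)
{
    /* hypothetical accessor for the handler the PR stores on each ndarray */
    PyDataMem_Handler *handler = arr_get_stored_handler(arr);
    handler->allocator.free(handler->allocator.ctx, data, nbytes);
}
```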
Speaking of memory management, it would be great if NumPy provided an API in either Python (preferred) or C (we can still make it work) for swapping in a memory manager implemented by third parties. This need comes from GPU libraries: we use something called pinned/page-locked memory to speed up host-device transfers, and in CuPy, for example, there is a custom memory manager to handle this. But it's not very convenient when we want every NumPy array to be backed by our memory pool, and it would be better if NumPy offered an API to do so:

```python
# very rough sketch just to illustrate the idea
import numpy as np
import cupy as cp

np.set_allocator(cp.cuda.alloc_pinned_memory)
a = np.empty(100)  # use pinned memory
```

This would make our life a lot easier (rel: cupy/cupy#3625). cc possibly interested people @jakirkham @kmaehashi @gmarkall
cc @pentschev @hmaarrfk (also likely interested)
Thanks Matti! 😄 There can be some value to overriding
I think
Thank you @mattip for considering our need! The rendered doc looks great -- It seems the memory hooks would allow us to hack something together so that a
As far as I can tell, this is a valid observation. If it's hard to override
Yeah, I think it is possible to handle
I agree with the comments from @jakirkham and @leofang, it would be useful to override
@pentschev I am not sure what you meant by "using the hooks". How would you detect that a chunk of memory was modified? Just to be clear: I am reasonably sure I correctly identified the malloc/realloc/free call sites inside NumPy and replaced them with function calls that use the hooks. However, I did not touch the ~250 uses of memcpy inside NumPy. This is a large task, and one I would prefer to defer. My question was whether people contemplating using the malloc/realloc/free hooks foresee problems due to glibc's memcpy being used internally by NumPy.
Sorry @mattip, I think I confused myself in several ways above. When I wrote about hooks, I was thinking of memory allocations, internal
Yeah, I don't think we are worried about when NumPy uses memcpy internally.
Fixes gh-17467. Adds a public struct to hold memory manipulation routines, `PyDataMem_Handler`, and two new API functions: `PyDataMem_SetHandler` to replace the current routines with new ones, and `PyDataMem_GetHandlerName` to get the string name of the current routines (either globally or for a specific ndarray object). This also changes the size of the ndarray object, so it can hold the `PyDataMem_Handler` that was active when it was created and subsequent actions on its data memory will remain consistent. Tests and documentation are included. Along the way, I found some places in the code where the current policy is inconsistent (all data memory handling should have gone through `npy_*_cache`, not `PyDataMem_*`), so even if this is rejected it might improve the cache handling. The `PyDataMem_Handler` has fields to override memcpy; these are currently not implemented: memcpy in the code base is untouched. I think this PR is invasive enough as-is; if desired, memcpy can be handled in a follow-up PR.

* ENH: add and use global configurable memory routines
* ENH: add tests and a way to compile c-extensions from tests
* fix allocation/free exposed by tests
* DOC: document the new APIs (and some old ones too)
* BUG: return void from FREE, also some cleanup
* MAINT: changes from review
* fixes from linter
* setting ndarray->descr on 0d or scalars mess with FREE
* make scalar allocation more consistent wrt np_alloc_cache
* change formatting for sphinx
* remove memcpy variants
* update to match NEP 49
* ENH: add a python-level get_handler_name
* ENH: add core.multiarray.get_handler_name
* Allow closure-like definition of the data mem routines
* Fix incompatible pointer warnings
* Note PyDataMemAllocator and PyMemAllocatorEx differentiation (Co-authored-by: Matti Picus <matti.picus@gmail.com>)
* Redefine default allocator handling
* Always allocate new arrays using the current_handler
* Search for the mem_handler name of the data owner
* Sub-comparisons don't need a local mem_handler
* Make the default_handler a valid PyDataMem_Handler
* Fix PyDataMem_SetHandler description (NEP discussion)
* Pass the allocators by reference
* Implement allocator context-locality
* Fix documentation, make PyDataMem_GetHandler return const
* remove import of setuptools==49.1.3, doesn't work on python3.10
* Fix refcount leaks
* fix function signatures in test
* Return early on PyDataMem_GetHandler error (VOID_compare)
* Add context/thread-locality tests, allow testing custom policies
* try to fix cygwin extension building
* YAPF mem_policy test
* Less empty lines, more comments (tests)
* Apply suggestions from code review (set an exception) (Co-authored-by: Matti Picus <matti.picus@gmail.com>)
* skip test on cygwin
* update API hash for changed signature
* TST: add gc.collect to make sure cycles are broken
* Implement thread-locality for PyPy (Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>)
* Update numpy/core/tests/test_mem_policy.py (Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>)
* fixes from review
* update circleci config
* fix test
* make the connection between OWNDATA and having an allocator handle more explicit
* improve docstring, fix flake8 for tests
* update PyDataMem_GetHandler() from review
* Implement allocator lifetime management
* update NEP and add best-effort handling of error in PyDataMem_UserFREE
* ENH: fix and test for blindly taking ownership of data
* Update doc/neps/nep-0049.rst (Co-authored-by: Elias Koromilas <elias.koromilas@gmail.com>)
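As a reader's aid, here is a minimal sketch of what installing a custom handler could look like from a C extension. The struct fields and the capsule-based `PyDataMem_SetHandler` call follow the API as it eventually landed with NEP 49 (NumPy 1.22); intermediate revisions of this PR may differ, and the plain `malloc`/`free` bodies are placeholders for whatever a downstream library (e.g. a pinned-memory pool) would plug in.

```c
#include <stdlib.h>
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <numpy/arrayobject.h>  /* import_array() must have been called */

static void *my_malloc(void *ctx, size_t size) {
    return malloc(size);            /* e.g. hand out pinned memory instead */
}
static void *my_calloc(void *ctx, size_t nelem, size_t elsize) {
    return calloc(nelem, elsize);
}
static void *my_realloc(void *ctx, void *ptr, size_t new_size) {
    return realloc(ptr, new_size);
}
static void my_free(void *ctx, void *ptr, size_t size) {
    free(ptr);
}

static PyDataMem_Handler my_handler = {
    "my_allocator",   /* name reported by get_handler_name() */
    1,                /* handler version */
    {
        NULL,         /* ctx, passed back to every callback */
        my_malloc,
        my_calloc,
        my_realloc,
        my_free,
    },
};

/* Install the handler: arrays allocated afterwards use these routines, and
 * each array keeps a reference to the handler it was created with. */
static int install_my_handler(void) {
    PyObject *capsule = PyCapsule_New(&my_handler, "mem_handler", NULL);
    if (capsule == NULL) {
        return -1;
    }
    PyObject *old = PyDataMem_SetHandler(capsule);
    Py_DECREF(capsule);
    if (old == NULL) {
        return -1;
    }
    Py_DECREF(old);
    return 0;
}
```

From Python, `numpy.core.multiarray.get_handler_name()` (added by this PR) should then report `"my_allocator"` for arrays allocated after the switch.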
Submitting this here since it is easier to link to all the issues and PRs dealing with memory management. I will submit it to the mailing list after a first discussion here if needed.
There are three allocations involved in instantiating a `PyArrayObject`: the Python allocation of the `PyObject`, the use of `PyArray_malloc`/`PyArray_realloc`/`PyArray_free` to allocate the array dims and strides, and the `PyDataMem_NEW`/`PyDataMem_NEWZEROED`/`PyDataMem_FREE` calls for the array data.

I would like to make the case for an API for overloading internal use of `PyDataMem_NEW`, `PyDataMem_NEWZEROED` and `PyDataMem_FREE`. Work on this was begun in gh-5470 and referred to in issue gh-5312 (asking for an aligned allocator). In addition there is PR gh-5457, which provides a global interface for specifying aligned allocations. As mentioned in the issue, alignment can be crucial for some applications, but in general it is just extra overhead, so it should be configurable by the user/app.
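For a concrete picture of the aligned-allocation use case (gh-5312 / gh-5457), replacement data-memory routines might look like the sketch below; the 64-byte alignment and `posix_memalign` are illustrative choices on POSIX systems, not something this issue prescribes.

```c
#include <stdlib.h>
#include <string.h>

#define DATA_ALIGNMENT 64  /* illustrative: cache-line / SIMD friendly */

/* aligned replacement for a PyDataMem_NEW-style allocation */
static void *aligned_data_new(size_t size)
{
    void *p = NULL;
    if (posix_memalign(&p, DATA_ALIGNMENT, size ? size : DATA_ALIGNMENT) != 0) {
        return NULL;
    }
    return p;
}

/* aligned replacement for a PyDataMem_NEWZEROED-style allocation */
static void *aligned_data_new_zeroed(size_t nelem, size_t elsize)
{
    size_t size = nelem * elsize;  /* a real version would check for overflow */
    void *p = aligned_data_new(size);
    if (p != NULL) {
        memset(p, 0, size);
    }
    return p;
}

/* matching free: posix_memalign memory is released with free() */
static void aligned_data_free(void *p)
{
    free(p);
}
```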
I think I would prefer providing an API hook as in gh-5470. That PR actually had two parts: configurable `PyDataMem*` overrides and a hook mechanism. The hook mechanism was merged (does anyone use that?), but the part I am interested in would require an API change.

A fallback proposal would be to allow specifying a default allocation alignment, aka gh-5457, perhaps using `PyArray_malloc_aligned` and friends as part of NumPy with the random API refactor.