ENH: Configurable allocator by mattip · Pull Request #17582 · numpy/numpy · GitHub

ENH: Configurable allocator #17582

Merged on Oct 25, 2021

83 commits
55f2f6c
ENH: add and use global configurable memory routines
mattip Oct 11, 2020
23da73e
ENH: add tests and a way to compile c-extensions from tests
mattip Oct 16, 2020
94b9f25
fix allocation/free exposed by tests
mattip Oct 18, 2020
81b45fd
DOC: document the new APIs (and some old ones too)
mattip Oct 18, 2020
fc32c2f
BUG: return void from FREE, also some cleanup
mattip Oct 19, 2020
de22327
MAINT: changes from review
mattip Oct 19, 2020
38274a4
fixes from linter
mattip Mar 11, 2021
59b520a
setting ndarray->descr on 0d or scalars mess with FREE
mattip Apr 16, 2021
5264019
make scalar allocation more consistent wrt np_alloc_cache
mattip Apr 16, 2021
7c396d7
change formatting for sphinx
mattip Apr 17, 2021
de9001c
remove memcpy variants
mattip Apr 19, 2021
9e7c3ed
update to match NEP 49
mattip May 3, 2021
953cc88
ENH: add a python-level get_handler_name
mattip May 6, 2021
ad9329b
ENH: add core.multiarray.get_handler_name
mattip May 6, 2021
6d10fdb
Allow closure-like definition of the data mem routines
eliaskoromilas Jun 30, 2021
18bea05
Fix incompatible pointer warnings
eliaskoromilas Jun 30, 2021
4368023
Note PyDataMemAllocator and PyMemAllocatorEx differentiation
eliaskoromilas Jul 1, 2021
d243313
Redefine default allocator handling
eliaskoromilas Jul 1, 2021
c9b6854
Always allocate new arrays using the current_handler
eliaskoromilas Jul 5, 2021
a8fd378
Search for the mem_handler name of the data owner
eliaskoromilas Jul 5, 2021
a17565b
Sub-comparisons don't need a local mem_handler
eliaskoromilas Jul 5, 2021
2ec5912
Make the default_handler a valid PyDataMem_Handler
eliaskoromilas Jul 14, 2021
227c4b8
Fix PyDataMem_SetHandler description (NEP discussion)
eliaskoromilas Jul 14, 2021
fb8135d
Pass the allocators by reference
eliaskoromilas Jul 14, 2021
7291484
Implement allocator context-locality
eliaskoromilas Aug 8, 2021
c7a9c22
Fix documentation, make PyDataMem_GetHandler return const
eliaskoromilas Aug 8, 2021
99f8250
remove import of setuptools==49.1.3, doesn't work on python3.10
mattip Aug 9, 2021
b43c1fe
Fix refcount leaks
eliaskoromilas Aug 9, 2021
a7a5435
fix function signatures in test
mattip Aug 9, 2021
e3723df
Return early on PyDataMem_GetHandler error (VOID_compare)
eliaskoromilas Aug 9, 2021
144acc6
Add context/thread-locality tests, allow testing custom policies
eliaskoromilas Aug 9, 2021
8539f5f
Merge branch 'configurable_allocator' into configurable_allocator
eliaskoromilas Aug 9, 2021
6ab00d0
ENH: add and use global configurable memory routines
mattip Oct 11, 2020
e7e8754
ENH: add tests and a way to compile c-extensions from tests
mattip Oct 16, 2020
5f08532
fix allocation/free exposed by tests
mattip Oct 18, 2020
7266029
DOC: document the new APIs (and some old ones too)
mattip Oct 18, 2020
5d547ff
BUG: return void from FREE, also some cleanup
mattip Oct 19, 2020
4617c50
MAINT: changes from review
mattip Oct 19, 2020
8f739c4
fixes from linter
mattip Mar 11, 2021
90205b6
setting ndarray->descr on 0d or scalars mess with FREE
mattip Apr 16, 2021
5c0d3f9
make scalar allocation more consistent wrt np_alloc_cache
mattip Apr 16, 2021
3b385d9
change formatting for sphinx
mattip Apr 17, 2021
e6e12a3
remove memcpy variants
mattip Apr 19, 2021
048552d
update to match NEP 49
mattip May 3, 2021
ad6f8ad
ENH: add a python-level get_handler_name
mattip May 6, 2021
50f8b93
ENH: add core.multiarray.get_handler_name
mattip May 6, 2021
c7438f5
Allow closure-like definition of the data mem routines
eliaskoromilas Jun 30, 2021
f823ba4
Fix incompatible pointer warnings
eliaskoromilas Jun 30, 2021
ad13161
Note PyDataMemAllocator and PyMemAllocatorEx differentiation
eliaskoromilas Jul 1, 2021
0a08acd
Redefine default allocator handling
eliaskoromilas Jul 1, 2021
1f0301d
Always allocate new arrays using the current_handler
eliaskoromilas Jul 5, 2021
3d56aa0
Search for the mem_handler name of the data owner
eliaskoromilas Jul 5, 2021
8ea6818
Sub-comparisons don't need a local mem_handler
eliaskoromilas Jul 5, 2021
fb2af4d
Make the default_handler a valid PyDataMem_Handler
eliaskoromilas Jul 14, 2021
f05a1c6
Fix PyDataMem_SetHandler description (NEP discussion)
eliaskoromilas Jul 14, 2021
660e0a4
Pass the allocators by reference
eliaskoromilas Jul 14, 2021
a4f8d71
remove import of setuptools==49.1.3, doesn't work on python3.10
mattip Aug 9, 2021
d7b1a1d
fix function signatures in test
mattip Aug 9, 2021
ab1a0eb
try to fix cygwin extension building
mattip Aug 9, 2021
b92e36c
YAPF mem_policy test
eliaskoromilas Aug 9, 2021
76cda3a
Merge branch 'configurable_allocator' into configurable_allocator
eliaskoromilas Aug 9, 2021
3eadf2f
Less empty lines, more comments (tests)
eliaskoromilas Aug 10, 2021
dbe9d73
Apply suggestions from code review (set an exception and)
eliaskoromilas Aug 10, 2021
1bfb870
Merge pull request #57 from eliaskoromilas/configurable_allocator
mattip Aug 11, 2021
0511820
skip test on cygwin
mattip Aug 11, 2021
23c4bc0
update API hash for changed signature
mattip Aug 13, 2021
ed8649b
TST: add gc.collect to make sure cycles are broken
mattip Aug 13, 2021
9aacefa
Implement thread-locality for PyPy
eliaskoromilas Aug 12, 2021
79712fa
Update numpy/core/tests/test_mem_policy.py
mattip Aug 25, 2021
a2ae4c0
fixes from review
mattip Aug 25, 2021
09b9c0d
update circleci config
mattip Aug 25, 2021
efb3c77
fix test
mattip Aug 25, 2021
2945c64
make the connection between OWNDATA and having a allocator handle mor…
mattip Aug 25, 2021
3a97d9a
improve docstring, fix flake8 for tests
mattip Aug 26, 2021
1df805c
update PyDataMem_GetHandler() from review
mattip Aug 26, 2021
ef607bd
Implement allocator lifetime management
eliaskoromilas Aug 27, 2021
8bdc9a1
Merge pull request #59 from eliaskoromilas/configurable_allocator
mattip Aug 29, 2021
a3256e5
update NEP and add best-effort handling of error in PyDataMem_UserFREE
mattip Aug 29, 2021
4d6ea65
merge main into branch
mattip Aug 30, 2021
5941d7c
merge main into branch
mattip Oct 17, 2021
522c368
Merge branch 'main' into configurable_allocator
mattip Oct 25, 2021
442b0e1
ENH: fix and test for blindly taking ownership of data
mattip Oct 25, 2021
8ca8b54
Update doc/neps/nep-0049.rst
seberg Oct 25, 2021
15 changes: 15 additions & 0 deletions doc/TESTS.rst.txt
@@ -139,6 +139,21 @@ originally written without unit tests, there are still several modules
 that don't have tests yet. Please feel free to choose one of these
 modules and develop tests for it.

+Using C code in tests
+---------------------
+
+NumPy exposes a rich :ref:`C-API<c-api>` . These are tested using c-extension
+modules written "as-if" they know nothing about the internals of NumPy, rather
+using the official C-API interfaces only. Examples of such modules are tests
+for a user-defined ``rational`` dtype in ``_rational_tests`` or the ufunc
+machinery tests in ``_umath_tests`` which are part of the binary distribution.
+Starting from version 1.21, you can also write snippets of C code in tests that
+will be compiled locally into c-extension modules and loaded into python.
+
+.. currentmodule:: numpy.testing.extbuild
+
+.. autofunction:: build_and_import_extension
+
 Labeling tests
 --------------

27 changes: 14 additions & 13 deletions doc/neps/nep-0049.rst
@@ -93,19 +93,21 @@ High level design

 Users who wish to change the NumPy data memory management routines will use
 :c:func:`PyDataMem_SetHandler`, which uses a :c:type:`PyDataMem_Handler`
-structure to hold pointers to functions used to manage the data memory.
+structure to hold pointers to functions used to manage the data memory. In
+order to allow lifetime management of the ``context``, the structure is wrapped
+in a ``PyCapsule``.

 Since a call to ``PyDataMem_SetHandler`` will change the default functions, but
 that function may be called during the lifetime of an ``ndarray`` object, each
-``ndarray`` will carry with it the ``PyDataMem_Handler`` struct used at the
-time of its instantiation, and these will be used to reallocate or free the
-data memory of the instance. Internally NumPy may use ``memcpy`` or ``memset``
-on the pointer to the data memory.
+``ndarray`` will carry with it the ``PyDataMem_Handler``-wrapped PyCapsule used
+at the time of its instantiation, and these will be used to reallocate or free
+the data memory of the instance. Internally NumPy may use ``memcpy`` or
+``memset`` on the pointer to the data memory.

 The name of the handler will be exposed on the python level via a
 ``numpy.core.multiarray.get_handler_name(arr)`` function. If called as
 ``numpy.core.multiarray.get_handler_name()`` it will return the name of the
-global handler that will be used to allocate data for the next new `ndarrray`.
+handler that will be used to allocate data for the next new `ndarrray`.

 NumPy C-API functions
 =====================
@@ -150,20 +152,19 @@ NumPy C-API functions
 15780_ and 15788_ but has not yet been resolved. When it is this NEP should
 be revisited.

-.. c:function:: const PyDataMem_Handler * PyDataMem_SetHandler(PyDataMem_Handler *handler)
+.. c:function:: PyObject * PyDataMem_SetHandler(PyObject *handler)

 Sets a new allocation policy. If the input value is ``NULL``, will reset
-the policy to the default. Returns the previous policy, ``NULL`` if the
-previous policy was the default. We wrap the user-provided functions
+the policy to the default. Return the previous policy, or
+return NULL if an error has occurred. We wrap the user-provided
 so they will still call the Python and NumPy memory management callback
 hooks. All the function pointers must be filled in, ``NULL`` is not
 accepted.

-.. c:function:: const PyDataMem_Handler * PyDataMem_GetHandler(PyArrayObject *obj)
+.. c:function:: const PyObject * PyDataMem_GetHandler()

-Return the ``PyDataMem_Handler`` used by the
-``PyArrayObject``. If ``NULL``, return the handler
-that will be used to allocate data for the next ``PyArrayObject``.
+Return the current policy that will be used to allocate data for the
+next ``PyArrayObject``. On failure, return ``NULL``.

 ``PyDataMem_Handler`` thread safety and lifetime
 ================================================
119 changes: 119 additions & 0 deletions doc/source/reference/c-api/data_memory.rst
@@ -0,0 +1,119 @@
Memory management in NumPy
==========================

The `numpy.ndarray` is a python class. It requires additional memory allocations
to hold `numpy.ndarray.strides`, `numpy.ndarray.shape` and
`numpy.ndarray.data` attributes. These attributes are specially allocated
after creating the python object in `__new__`. The ``strides`` and
``shape`` are stored in a piece of memory allocated internally.

The ``data`` allocation used to store the actual array values (which could be
pointers in the case of ``object`` arrays) can be very large, so NumPy has
provided interfaces to manage its allocation and release. This document details
how those interfaces work.

Historical overview
-------------------

Since version 1.7.0, NumPy has exposed a set of ``PyDataMem_*`` functions
(:c:func:`PyDataMem_NEW`, :c:func:`PyDataMem_FREE`, :c:func:`PyDataMem_RENEW`)
which are backed by `alloc`, `free`, `realloc` respectively. In that version
NumPy also exposed the `PyDataMem_EventHook` function described below, which
wrap the OS-level calls.

Since those early days, Python also improved its memory management
capabilities, and began providing
various :ref:`management policies <memoryoverview>` beginning in version
3.4. These routines are divided into a set of domains, each domain has a
:c:type:`PyMemAllocatorEx` structure of routines for memory management. Python also
added a `tracemalloc` module to trace calls to the various routines. These
tracking hooks were added to the NumPy ``PyDataMem_*`` routines.

NumPy added a small cache of allocated memory in its internal
``npy_alloc_cache``, ``npy_alloc_cache_zero``, and ``npy_free_cache``
functions. These wrap ``alloc``, ``alloc-and-memset(0)`` and ``free``
respectively, but when ``npy_free_cache`` is called, it adds the pointer to a
short list of available blocks marked by size. These blocks can be re-used by
subsequent calls to ``npy_alloc*``, avoiding memory thrashing.
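
The caching idea in the paragraph above can be illustrated with a small free-list keyed by block size. This is only a sketch under stated assumptions: every name here (`demo_alloc_cache`, `demo_free_cache`, `DemoBucket`) and every constant is hypothetical, and NumPy's actual `npy_alloc_cache` implementation is not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names and sizes below are hypothetical; NumPy's real cache differs. */
#define DEMO_NBUCKETS 4   /* cache requests of 1..4 units */
#define DEMO_NCACHE 2     /* blocks retained per bucket */
#define DEMO_UNIT 16      /* bytes per unit */

typedef struct {
    int available;             /* number of cached blocks in this bucket */
    void *ptrs[DEMO_NCACHE];   /* the cached blocks themselves */
} DemoBucket;

static DemoBucket demo_cache[DEMO_NBUCKETS];

/* Hand out a cached block of the requested size if one exists. */
static void *demo_alloc_cache(size_t nunits)
{
    if (nunits >= 1 && nunits <= DEMO_NBUCKETS) {
        DemoBucket *b = &demo_cache[nunits - 1];
        if (b->available > 0) {
            return b->ptrs[--b->available];
        }
    }
    return malloc(nunits * DEMO_UNIT);
}

/* Keep the block for reuse if its bucket has room, otherwise release it. */
static void demo_free_cache(void *p, size_t nunits)
{
    if (p != NULL && nunits >= 1 && nunits <= DEMO_NBUCKETS) {
        DemoBucket *b = &demo_cache[nunits - 1];
        if (b->available < DEMO_NCACHE) {
            b->ptrs[b->available++] = p;
            return;
        }
    }
    free(p);
}

/* Returns 1 when a freed block is handed back by the next same-size request. */
static int demo_cache_reuses_block(void)
{
    void *first = demo_alloc_cache(2);
    void *second;
    int reused;

    assert(first != NULL);
    demo_free_cache(first, 2);      /* goes into the bucket, not back to the OS */
    second = demo_alloc_cache(2);   /* served from the bucket */
    reused = (second == first);
    demo_free_cache(second, 2);
    return reused;
}
```

Because the free routine needs the block size to pick a bucket, a cache of this shape only works when the caller can report the size at release time.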

Configurable memory routines in NumPy (NEP 49)
----------------------------------------------

Users may wish to override the internal data memory routines with ones of their
own. Since NumPy does not use the Python domain strategy to manage data memory,
it provides an alternative set of C-APIs to change memory routines. There are
no Python domain-wide strategies for large chunks of object data, so those are
less suited to NumPy's needs. User who wish to change the NumPy data memory
management routines can use :c:func:`PyDataMem_SetHandler`, which uses a
:c:type:`PyDataMem_Handler` structure to hold pointers to functions used to
manage the data memory. The calls are still wrapped by internal routines to
call :c:func:`PyTraceMalloc_Track`, :c:func:`PyTraceMalloc_Untrack`, and will
use the :c:func:`PyDataMem_EventHookFunc` mechanism. Since the functions may
change during the lifetime of the process, each ``ndarray`` carries with it the
functions used at the time of its instantiation, and these will be used to
reallocate or free the data memory of the instance.

.. c:type:: PyDataMem_Handler

A struct to hold function pointers used to manipulate memory

.. code-block:: c

typedef struct {
Member

It would be good to reference the builtin PyMemAllocatorEx somewhere in this section and compare the two (https://docs.python.org/3/c-api/memory.html#customize-memory-allocators).

Member Author

Will add. The original npy_*_cache set of functions uses the PyDataMem_* functions, which seem to have preceded these more sophisticated interfaces and directly used malloc/calloc/free.

Member Author

Documented. The Python ones do not use our non-documented-but-public PyDataMem_EventHookFunc callbacks. So I could see a path where we deprecate PyDataMem_EventHookFunc and move to the Python memory management strategies, although that would mean

  • if someone by chance implemented a PyDataMem_EventHookFunc callback it would no longer work.
  • in order to override data allocations, a user would also override other random memory allocations

char name[128]; /* multiple of 64 to keep the struct aligned */
PyDataMemAllocator allocator;
} PyDataMem_Handler;

where the allocator structure is

.. code-block:: c

/* The declaration of free differs from PyMemAllocatorEx */
typedef struct {
void *ctx;
void* (*malloc) (void *ctx, size_t size);
void* (*calloc) (void *ctx, size_t nelem, size_t elsize);
void* (*realloc) (void *ctx, void *ptr, size_t new_size);
void (*free) (void *ctx, void *ptr, size_t size);
} PyDataMemAllocator;

.. c:function:: PyObject * PyDataMem_SetHandler(PyObject *handler)

Set a new allocation policy. If the input value is ``NULL``, will reset the
policy to the default. Return the previous policy, or
return ``NULL`` if an error has occurred. We wrap the user-provided functions
so they will still call the python and numpy memory management callback
hooks.

.. c:function:: PyObject * PyDataMem_GetHandler()

Return the current policy that will be used to allocate data for the
next ``PyArrayObject``. On failure, return ``NULL``.

For an example of setting up and using the PyDataMem_Handler, see the test in
:file:`numpy/core/tests/test_mem_policy.py`
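
As a standalone illustration of the allocator layout documented above, the following sketch fills in the five fields with routines that count live blocks through `ctx`. It assumes nothing from the NumPy headers: `DemoAllocator` copies the field layout of `PyDataMemAllocator` from this diff under a different name, and `DemoStats` and the `demo_*` functions are hypothetical. It does not register anything with NumPy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Same field layout as PyDataMemAllocator in the diff, renamed so the
 * sketch compiles without the NumPy headers. */
typedef struct {
    void *ctx;
    void *(*malloc)(void *ctx, size_t size);
    void *(*calloc)(void *ctx, size_t nelem, size_t elsize);
    void *(*realloc)(void *ctx, void *ptr, size_t new_size);
    void (*free)(void *ctx, void *ptr, size_t size);
} DemoAllocator;

/* Hypothetical bookkeeping reached through the ctx pointer. */
typedef struct {
    size_t live_blocks;   /* allocations not yet freed */
    size_t bytes_freed;   /* sum of the sizes reported to free */
} DemoStats;

static void *demo_malloc(void *ctx, size_t size)
{
    DemoStats *s = ctx;
    void *p = malloc(size);
    if (p != NULL) s->live_blocks++;
    return p;
}

static void *demo_calloc(void *ctx, size_t nelem, size_t elsize)
{
    DemoStats *s = ctx;
    void *p = calloc(nelem, elsize);
    if (p != NULL) s->live_blocks++;
    return p;
}

static void *demo_realloc(void *ctx, void *ptr, size_t new_size)
{
    DemoStats *s = ctx;
    void *p = realloc(ptr, new_size);
    /* realloc(NULL, n) behaves like malloc(n), so it creates a new block */
    if (p != NULL && ptr == NULL) s->live_blocks++;
    return p;
}

/* Unlike PyMemAllocatorEx, free also receives the size of the block. */
static void demo_free(void *ctx, void *ptr, size_t size)
{
    DemoStats *s = ctx;
    if (ptr != NULL) {
        s->live_blocks--;
        s->bytes_freed += size;
    }
    free(ptr);
}

/* Allocate and release two blocks; returns the number still live. */
static size_t demo_run(DemoStats *stats)
{
    DemoAllocator a = {stats, demo_malloc, demo_calloc, demo_realloc, demo_free};
    void *p = a.malloc(a.ctx, 64);
    void *q = a.calloc(a.ctx, 4, 8);

    assert(p != NULL && q != NULL);
    a.free(a.ctx, p, 64);
    a.free(a.ctx, q, 4 * 8);
    return stats->live_blocks;
}
```

The size argument to `free` is what lets the counter report bytes released without keeping a side table of block sizes.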

.. c:function:: void PyDataMem_EventHookFunc(void *inp, void *outp, size_t size, void *user_data);

This function will be called during data memory manipulation

.. c:function:: PyDataMem_EventHookFunc * PyDataMem_SetEventHook(PyDataMem_EventHookFunc *newhook, void *user_data, void **old_data)

Sets the allocation event hook for numpy array data.

Returns a pointer to the previous hook or ``NULL``. If old_data is
non-``NULL``, the previous user_data pointer will be copied to it.

If not ``NULL``, hook will be called at the end of each ``PyDataMem_NEW/FREE/RENEW``:

.. code-block:: c

result = PyDataMem_NEW(size) -> (*hook)(NULL, result, size, user_data)
PyDataMem_FREE(ptr) -> (*hook)(ptr, NULL, 0, user_data)
result = PyDataMem_RENEW(ptr, size) -> (*hook)(ptr, result, size, user_data)

When the hook is called, the GIL will be held by the calling
thread. The hook should be written to be reentrant, if it performs
operations that might cause new allocation events (such as the
creation/destruction numpy objects, or creating/destroying Python
objects which might cause a gc)
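
The hook calling convention described above can be sketched without NumPy. Everything here is a hypothetical stand-in: `DemoEventHook` mirrors the documented `PyDataMem_EventHookFunc` signature, `demo_set_hook` mirrors the shape of `PyDataMem_SetEventHook`, and `demo_new`/`demo_free` play the role of the `PyDataMem_NEW/FREE` wrappers. One deliberate difference, chosen for this sketch only: the free event is reported just before the block is released, so the hook never receives a pointer value that has already been freed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Same argument list as the documented PyDataMem_EventHookFunc. */
typedef void (DemoEventHook)(void *inp, void *outp, size_t size, void *user_data);

static DemoEventHook *demo_hook = NULL;
static void *demo_hook_data = NULL;

/* Install a new hook; return the previous one and optionally its user data. */
static DemoEventHook *demo_set_hook(DemoEventHook *newhook, void *user_data,
                                    void **old_data)
{
    DemoEventHook *prev = demo_hook;
    if (old_data != NULL) {
        *old_data = demo_hook_data;
    }
    demo_hook = newhook;
    demo_hook_data = user_data;
    return prev;
}

/* Allocation event: (*hook)(NULL, result, size, user_data) */
static void *demo_new(size_t size)
{
    void *result = malloc(size);
    if (demo_hook != NULL) {
        (*demo_hook)(NULL, result, size, demo_hook_data);
    }
    return result;
}

/* Free event: (*hook)(ptr, NULL, 0, user_data) */
static void demo_free(void *ptr)
{
    if (demo_hook != NULL) {
        (*demo_hook)(ptr, NULL, 0, demo_hook_data);
    }
    free(ptr);
}

/* Hypothetical user data: simple event counters. */
typedef struct {
    int allocs;
    int frees;
    size_t bytes;
} DemoCounts;

static void demo_count_hook(void *inp, void *outp, size_t size, void *user_data)
{
    DemoCounts *c = user_data;
    if (inp == NULL && outp != NULL) {
        c->allocs++;
        c->bytes += size;
    }
    else if (inp != NULL && outp == NULL) {
        c->frees++;
    }
}

/* Trace one allocation and one free, then restore the previous hook. */
static DemoCounts demo_traced_run(void)
{
    DemoCounts counts = {0, 0, 0};
    void *old_data = NULL;
    DemoEventHook *prev = demo_set_hook(demo_count_hook, &counts, &old_data);
    void *p = demo_new(16);

    demo_free(p);
    demo_set_hook(prev, old_data, NULL);
    return counts;
}
```

Restoring the previous hook together with its saved user data is what makes nested tracing scopes safe in this pattern.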
1 change: 1 addition & 0 deletions doc/source/reference/c-api/index.rst
@@ -49,3 +49,4 @@ code.
 generalized-ufuncs
 coremath
 deprecations
+data_memory
10 changes: 10 additions & 0 deletions numpy/core/_add_newdocs.py
@@ -4727,6 +4727,16 @@
 and then throwing away the ufunc.
 """)

+add_newdoc('numpy.core.multiarray', 'get_handler_name',
+"""
+get_handler_name(a: ndarray) -> str,None
+
+Return the name of the memory handler used by `a`. If not provided, return
+the name of the memory handler that will be used to allocate data for the
+next `ndarray` in this context. May return None if `a` does not own its
+memory, in which case you can traverse ``a.base`` for a memory handler.
+""")
+
 add_newdoc('numpy.core.multiarray', '_set_madvise_hugepage',
 """
 _set_madvise_hugepage(enabled: bool) -> bool
4 changes: 3 additions & 1 deletion numpy/core/code_generators/cversions.txt
@@ -56,5 +56,7 @@
 # DType related API additions.
 # A new field was added to the end of PyArrayObject_fields.
 # Version 14 (NumPy 1.21) No change.
-# Version 14 (NumPy 1.22) No change.
 0x0000000e = 17a0f366e55ec05e5c5c149123478452
+
+# Version 15 (NumPy 1.22) Configurable memory allocations
+0x0000000f = 0c420aed67010594eb81f23ddfb02a88
9 changes: 6 additions & 3 deletions numpy/core/code_generators/numpy_api.py
@@ -76,9 +76,9 @@
 # End 1.6 API
 }

-#define NPY_NUMUSERTYPES (*(int *)PyArray_API[6])
-#define PyBoolArrType_Type (*(PyTypeObject *)PyArray_API[7])
-#define _PyArrayScalar_BoolValues ((PyBoolScalarObject *)PyArray_API[8])
+# define NPY_NUMUSERTYPES (*(int *)PyArray_API[6])
+# define PyBoolArrType_Type (*(PyTypeObject *)PyArray_API[7])
+# define _PyArrayScalar_BoolValues ((PyBoolScalarObject *)PyArray_API[8])

 multiarray_funcs_api = {
 'PyArray_GetNDArrayCVersion': (0,),
@@ -350,6 +350,9 @@
 'PyArray_ResolveWritebackIfCopy': (302,),
 'PyArray_SetWritebackIfCopyBase': (303,),
 # End 1.14 API
+'PyDataMem_SetHandler': (304,),
+'PyDataMem_GetHandler': (305,),
+# End 1.21 API
 }

 ufunc_types_api = {
38 changes: 32 additions & 6 deletions numpy/core/include/numpy/ndarraytypes.h
@@ -355,12 +355,10 @@ struct NpyAuxData_tag {
 #define NPY_ERR(str) fprintf(stderr, #str); fflush(stderr);
 #define NPY_ERR2(str) fprintf(stderr, str); fflush(stderr);

-/*
- * Macros to define how array, and dimension/strides data is
- * allocated.
- */
-
-/* Data buffer - PyDataMem_NEW/FREE/RENEW are in multiarraymodule.c */
+/*
+ * Macros to define how array, and dimension/strides data is
+ * allocated. These should be made private
+ */

 #define NPY_USE_PYMEM 1

@@ -666,6 +664,24 @@ typedef struct _arr_descr {
     PyObject *shape; /* a tuple */
 } PyArray_ArrayDescr;

+/*
+ * Memory handler structure for array data.
+ */
+/* The declaration of free differs from PyMemAllocatorEx */
+typedef struct {
+    void *ctx;
+    void* (*malloc) (void *ctx, size_t size);
+    void* (*calloc) (void *ctx, size_t nelem, size_t elsize);
+    void* (*realloc) (void *ctx, void *ptr, size_t new_size);
+    void (*free) (void *ctx, void *ptr, size_t size);
+} PyDataMemAllocator;
+
+typedef struct {
+    char name[128]; /* multiple of 64 to keep the struct aligned */
+    PyDataMemAllocator allocator;
+} PyDataMem_Handler;
+
+
 /*
  * The main array object structure.
  *
@@ -716,6 +732,10 @@ typedef struct tagPyArrayObject_fields {
     /* For weak references */
     PyObject *weakreflist;
     void *_buffer_info; /* private buffer info, tagged to allow warning */
+    /*
+     * For malloc/calloc/realloc/free per object
+     */
+    PyObject *mem_handler;
 } PyArrayObject_fields;

 /*
@@ -1659,6 +1679,12 @@ PyArray_CLEARFLAGS(PyArrayObject *arr, int flags)
     ((PyArrayObject_fields *)arr)->flags &= ~flags;
 }

+static NPY_INLINE NPY_RETURNS_BORROWED_REF PyObject *
+PyArray_HANDLER(PyArrayObject *arr)
Member

Should this be public API? Otherwise can we move it or at least hide it behind the "internal" define to be clear about it?

I actually also wonder if we should call it arr->_mem_handler to at least ask people to not access it directly (more guard possible, but probably not necessary/useful)?

Contributor

PyArray_HANDLER is currently the only API to get (not set) an array's mem_handler. Exposing it in the public API will make advanced users' lives easier (don't make them access it directly :). It's no more risky than e.g. PyArray_BASE.

Actual use-cases:

  1. Find the "true" array allocator (the one that allocated the underlying data).
while (arr != NULL && PyArray_Check(arr)) {
	if (PyArray_CHKFLAGS((PyArrayObject *) arr, NPY_ARRAY_OWNDATA)) {
		PyObject *_handler_ = PyArray_HANDLER((PyArrayObject *) arr);
		if (!_handler_) {
			PyErr_SetString(PyExc_RuntimeError, "no memory handler found but OWNDATA flag set");
			return -1;
		}
		PyDataMem_Handler *handler = (PyDataMem_Handler *) PyCapsule_GetPointer(_handler_, "mem_handler");
		if (!handler) {
			return -1;
		}
		printf("### %s ###", handler->name);
		return 0;
	}
	arr = PyArray_BASE((PyArrayObject *) arr);
}
PyErr_SetString(PyExc_ValueError, "argument must be an ndarray");
return -1;
  2. Allocate a new array with another's allocator.
PyObject *a_arr_handler = PyArray_HANDLER(a_arr);
if (!a_arr_handler) {
	PyErr_SetString(PyExc_RuntimeError, "no memory handler found");
	return -1;
}
PyObject *old_handler = PyDataMem_SetHandler(a_arr_handler);
Py_DECREF(a_arr_handler);
if (!old_handler) {
	return -1;
}

// construct b_arr

if (!PyDataMem_SetHandler(old_handler)) {
	Py_DECREF(old_handler);
	return -1;
}
Py_DECREF(old_handler);

// a_arr and b_arr are handled by the same allocator

return 0;
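
The base-walking logic of the first use case can be tried out in isolation with plain structs. This is a sketch only: `DemoArray`, its fields, and `demo_owner_handler_name` are hypothetical stand-ins for `PyArrayObject`, `NPY_ARRAY_OWNDATA`, `PyArray_BASE` and the handler name, and no Python or NumPy API is called.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an ndarray: only the fields the walk needs. */
typedef struct DemoArray {
    struct DemoArray *base;     /* array this one is a view of, or NULL */
    int owns_data;              /* plays the role of NPY_ARRAY_OWNDATA */
    const char *handler_name;   /* NULL when the array does not own its data */
} DemoArray;

/* Follow base pointers until an array that owns its data is found.
 * Returns that array's handler name, or NULL if no owner was reached. */
static const char *demo_owner_handler_name(const DemoArray *arr)
{
    while (arr != NULL) {
        if (arr->owns_data) {
            return arr->handler_name;
        }
        arr = arr->base;
    }
    return NULL;
}

/* A view of a view still resolves to the owner's handler; returns 1 on success. */
static int demo_view_chain(void)
{
    DemoArray owner = {NULL, 1, "demo_handler"};
    DemoArray view = {&owner, 0, NULL};
    DemoArray view_of_view = {&view, 0, NULL};
    const char *name = demo_owner_handler_name(&view_of_view);

    return name != NULL && strcmp(name, "demo_handler") == 0;
}
```

In the real C-API the chain can also end in a non-array base object, which is why the comment above checks `PyArray_Check` on every step.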

Member
@seberg, Sep 15, 2021

It's no more risky than e.g. PyArray_BASE

EDIT: Sorry forgot to add :). I agree it is no different, but we cannot discuss any modifications to that, while we could maybe fathom modifications here?

OK, yeah, we do want to be able to fetch the allocator. I guess the question I have is then whether we want to define the PyObject * that is returned as actually opaque? Do we consider the fact that this is currently a capsule to be stable ABI? I suppose it is stable enough, in the sense that it is not terrible if users get the content, since it will at least fail graciously if things change.

Contributor

Based on my other comment:

To summarize the proposed API looks like this:

  • PyObject *PyDataMem_GetHandler() # current allocator getter (new reference)

  • PyObject *PyDataMem_SetHandler(PyObject *) # current allocator setter, return the previous allocator (new reference)

  • PyObject *PyArray_HANDLER(PyArrayObject *) # Return the memory handler of an array, or NULL if no memory handler available, which implies that the array doesn't own its own data and the user should have searched for a base that does (borrowed reference)

I think it's clean enough.

Contributor
@eliaskoromilas, Sep 15, 2021

OK, yeah, we do want to be able to fetch the allocator. I guess the question I have is then whether we want to define the PyObject * that is returned as actually opaque? Do we consider the fact that this is currently a capsule to be stable ABI? I suppose it is stable enough, in the sense that it is not terrible if users get the content, since it will at least fail graciously if things change.

I guess, since this is the same PyObject representation that PyDataMem_GetHandler/PyDataMem_SetHandler also return/accept, it has to be stable ABI. The PyDataMem_Handler struct that is encapsulated, on the other hand, could be "versioned". Is that what you mean?

If yes, a nice way to do it could be through the capsule name. For example:

user:

handler_capsule = PyCapsule_New(my_handler, "v1", destructor);

numpy:

if (PyCapsule_IsValid(arr->mem_handler, "v1")) {
   // get pointer and cast it as PyDataMem_Handler_v1
} else if (PyCapsule_IsValid(arr->mem_handler, "v2")) {
  // get pointer and cast it as PyDataMem_Handler_v2
} else {
  // unknown version
}

+{
+    return ((PyArrayObject_fields *)arr)->mem_handler;
+}
+
 #define PyTypeNum_ISBOOL(type) ((type) == NPY_BOOL)

 #define PyTypeNum_ISUNSIGNED(type) (((type) == NPY_UBYTE) || \
4 changes: 2 additions & 2 deletions numpy/core/multiarray.py
@@ -31,8 +31,8 @@
 'count_nonzero', 'c_einsum', 'datetime_as_string', 'datetime_data',
 'dot', 'dragon4_positional', 'dragon4_scientific', 'dtype',
 'empty', 'empty_like', 'error', 'flagsobj', 'flatiter', 'format_longfloat',
-'frombuffer', 'fromfile', 'fromiter', 'fromstring', 'inner',
-'interp', 'interp_complex', 'is_busday', 'lexsort',
+'frombuffer', 'fromfile', 'fromiter', 'fromstring', 'get_handler_name',
Member
@seberg, Aug 24, 2021

Suggested change
'frombuffer', 'fromfile', 'fromiter', 'fromstring', 'get_handler_name',
'frombuffer', 'fromfile', 'fromiter', 'fromstring', 'get_handler_name',

I think we should maybe add an additional word to make "handler" explicit, since this is top-level.

EDIT: sorry, meant to make a "review" not individual comment

Member

Never mind, it is only in core, so I don't mind the name (although maybe "allocator" or so would be clearer anyway).

Member Author

Let's leave this for a future PR? I am really bad with names. I didn't want to use "allocator" since it is really an "allocator/free policy handler", and ended up with "handler".

+'inner', 'interp', 'interp_complex', 'is_busday', 'lexsort',
'matmul', 'may_share_memory', 'min_scalar_type', 'ndarray', 'nditer',
'nested_iters', 'normalize_axis_index', 'packbits',
'promote_types', 'putmask', 'ravel_multi_index', 'result_type', 'scalar',
4 changes: 2 additions & 2 deletions numpy/core/setup_common.py
@@ -43,8 +43,8 @@
 # 0x0000000d - 1.19.x
 # 0x0000000e - 1.20.x
 # 0x0000000e - 1.21.x
-# 0x0000000e - 1.22.x
-C_API_VERSION = 0x0000000e
+# 0x0000000f - 1.22.x
+C_API_VERSION = 0x0000000f

 class MismatchCAPIWarning(Warning):
     pass