8000 BUG: array interface PyCapsule reference by burlen · Pull Request #20674 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: array interface PyCapsule reference #20674

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 26, 2022

Conversation

burlen
Copy link
Contributor
@burlen burlen commented Dec 29, 2021

With this change numpy holds a reference to the PyCapsule by which the
PyArrayInterface is returned in addition to the object which supplied
it. This lets numpy signal (via the PyCapsule destructor) when it is
finished with the data, which may not be managed by the object which
provided the PyCapsule instance. The old behavior of holding a reference
only to the object which provided the PyCapsule is kept to ensure backward
compatibility.

closes gh-20673

@burlen
Copy link
Contributor Author
burlen commented Dec 29, 2021

there is a small self contained example I've used for testing this change here
https://github.com/burlen/test_array_interface

/* a tuple to hold references */
PyObject *refs = PyTuple_New(2);
if (!refs) {
PyErr_SetString(PyExc_MemoryError, "Failed to allocate a Tuple of size 2");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error should already be set, and since it is an allocation error the less done after that the better.

Suggested change
PyErr_SetString(PyExc_MemoryError, "Failed to allocate a Tuple of size 2");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK. Thanks for pointing this out. I made the suggested change.

@mattip
Copy link
Member
mattip commented Dec 29, 2021

What is printed for the a.base before and after this change?

@mattip
Copy link
Member
mattip commented Dec 29, 2021

The sample seems small enough it could be added as a test, using testing.extbuild to build a c-extension. This should probably be marked with @pytest.mark.slow.

With this change numpy holds a reference to the PyCapsule by which the
PyArrayInterface is returned in addition to the object which supplied
it. This lets numpy signal (via the PyCapsule destructor) when it is
finished with the data, which may not be managed by the object which
provided the PyCapsule instance. The old behavior of holding a reference
only to the object which provided the PyCapsule is kept to ensure backward
compatibility.

closes numpygh-20673
@burlen burlen force-pushed the array_interface_reference branch from 5ed9183 to a2d854b Compare December 29, 2021 19:53
@burlen
Copy link
Contributor Author
burlen commented Dec 29, 2021

What is printe 8000 d for the a.base before and after this change?

before:

nparr.base = <buffer.buffer; proxy of <Swig Object of type 'buffer *' at 0x7f1186c5a180> >

after:

nparr.base = (<buffer.buffer; proxy of <Swig Object of type 'buffer *' at 0x7f4183e1a870> >, <capsule object NULL at 0x7f4183e1a900>)

The above output was obtained from the example I linked to above.

The sample seems small enough it could be added as a test

Thanks for the pointers. I added a simplified test based on the example. The test is in a second commit to make it easy to experiment before/after the bug fix patch.

The test exercises data shared through the c-struct method of the array
interface protocol.
@burlen burlen force-pushed the array_interface_reference branch from a2d854b to 5fa7168 Compare December 29, 2021 20:15
@mattip
Copy link
Member
mattip commented Dec 29, 2021

Cool. The extension building code is rather new, so suggestions to make the workflow easier would be appreciated.

@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Dec 30, 2021
PyObject *ret = PyArray_NewFromDescrAndBase(
&PyArray_Type, thetype,
inter->nd, inter->shape, inter->strides, inter->data,
inter->flags, NULL, input);
Py_DECREF(attr);
inter->flags, NULL, refs);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not much of an opinion on whether this should be backported). But what do you think about my proposal to change this to:

ctx = PyCapsule_GetContext(attr)
if (ctx == NULL && PyErr_Occurred()) {
    /* error cleanup */
    return NULL;
}
/* ctx should not be NULL, but do not rely on it for now: */ 
ref = ctx != NULL ? ctx : input;

I feel like that may be a nice way to achieve the same thing, and makes me more comfortable with the change.

The copy=NEVER champions could look into adding something a ctx == input check if desired.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about my proposal to change this to

Two things come to mind.

Note that the type of the PyCapsule's context field is void *, and that the base argument to PyArray_NewFromDescrAndBase is type PyObject*.

PyArray_NewFromDescrAndBase(
PyTypeObject *subtype, PyArray_Descr *descr,
int nd, npy_intp const *dims, npy_intp const *strides, void *data,
int flags, PyObject *obj, PyObject *base)

The proposed change would make memory management of resources other than PyObject's more complicated because it would require the capsule's context field be a PyObject but in data shared with numpy from C/C++ the most common use case would be management of temporary buffers allocated with new or malloc. See the self contained example I've been using for testing linked to above for an example. With the proposed change, the capsule's context field could no longer be used to track such resources directly, and the capsule's destructor could no longer be used to directly release them. Such resources would instead need to be managed by a secondary object. This could be accomplished, by setting the context field to a second PyCapsule, but it is more complicated to do it that way.

There's also a potential type safety issue introduced here because thePyCapsule's context is type void * . It would be easy to misuse this design by inadvertently passing something other than a PyObject as the context. Such accidental misuse would lead to an invalid memory access, and would potentially crash with a segv. There's not a safe way in C to check if the memory pointed to by the void* context is in fact a PyObject* before numpy uses it. One could argue that it's the responsibility of the sharer of the data, but it's worth noting that this issue isn't present in the original proposal.

I don't know that these two things are necessarily show stoppers for your suggestion, but they might be worth considering.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the type of the PyCapsule's context field is void , and that the base argument to PyArray_NewFromDescrAndBase is type PyObject.

Hmm, missed that. I guess the docs specify it must be an object, but this was never really enforced, so it is unclear that we should do it.
I can also see the point that it would limit your flexibility compared to this solution, have to think a bit more about this, but if others want to move forward, please do not worry too much... I suppose in the end we have "flags" we can use to tag on new information to add new information (not ideal but OK).

A small problem would be if there are consumers other than NumPy, but that would only affect your type actually using this.

@mattip mattip added triage review Issue/PR to be discussed at the next triage meeting and removed triage review Issue/PR to be discussed at the next triage meeting labels Jan 11, 2022
@mattip
Copy link
Member
mattip commented Jan 12, 2022

In the latest developer meeting we discussed:

  • we should hold on to the PyCapsule reference.
  • we should deprecate __array_struct__, there are too many interop protocols. See PR DOC: Added explanation document on interoperability #20185 to document them.
  • find a way to allow interop with user-defined dtypes
  • find a way to have an interop with data that we can be sure is never copied

@seberg correct me if I misrepresented the discussion. The bottom line is to polish any rough edges in this PR and accept it.

@seberg
Copy link
Member
seberg commented Jan 12, 2022

Yes, I still don't love it, but that is fine :). I am unsure if we should backport this, as I think it largly extends the protocol rather than fixing a bug per-se and it changes arr.base from something that may often be meaningful, into something hard to make sense of (a tuple).

For "rough edges" beyond my "why :)?": It would be nice to just use PyTuple_Pack(2, ...) which will be much shorter code.

@charris
Copy link
Member
charris commented Jan 12, 2022

it changes arr.base

@seberg Could you expand on this?

@mattip
Copy link
Member
mattip commented Jan 12, 2022

@charris: before this PR, a.base was a reference to the object inside the PyCapsule. After this PR, it is a tuple of (ref-to-object, PyCapsule). See this comment above

@seberg
Copy link
Member
seberg commented Jan 12, 2022

Previously, if I had an object that had an array struct (say JAX implemented it, though they do not):

jax_arr = jax.numpy.Array([1, 2, 3, 4])
numpy_arr = np.asarray(jax_arr)

Then, before:

numpy_arr.base is jax_arr

while now we will have:

numpy_arr.base == (jax_arr, some_capsule)

@charris
Copy link
Member
charris commented Jan 12, 2022

numpy_arr = np.asarray(jax_arr)

Ah, so jax_arr is exporting a PyCapsule to np.asarray?

@seberg
Copy link
Member
seberg commented Jan 12, 2022

Yes, it would export a capsule containing the same information that __array_interface__ would contain. That capsule should keep the data alive (i.e. have co-ownership). The proposal is to store the capsule as well now. This enables the capsule to be the sole owner of the data.

This makes the protocol safer, since before passing around an __array_struct__ might remove it from the actual owner (__array_interface__ has the problem as well).

I agree with this, and I am starting to agree that from my "long term" thinking, I just shouldn't care about this :). Long term __array__() is probably the only good protocol, and the easiest to expand with additional functionality.

@mattip
Copy link
Member
mattip commented Jan 26, 2022

Thanks @burlen

@mattip
Copy link
Member
mattip commented Jan 26, 2022

Next thing to do is to deprecate __array_struct__ ...

@charris
Copy link
Member
charris commented Jan 26, 2022

I marked for backport, but the changed return type worries me. Is there a good reason to backport this, or should it wait on 1.23?

@seberg
Copy link
Member
seberg commented Jan 26, 2022

I tend to not backporting, I view this more as an enhancement rather than a bug-fix to begin with.

@mattip
Copy link
Member
mattip commented Jan 26, 2022

I agree with @seberg

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Jan 26, 2022
@charris
Copy link
Member
charris commented Jan 26, 2022

Thanks. I unmarked it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: array interface c-struct memory management, the wrong reference is held
4 participants
0