PoC,WIP: Proof of concept for extending buffer protocol string by seberg · Pull Request #50 · seberg/numpy · GitHub

PoC,WIP: Proof of concept for extending buffer protocol string #50

Open · wants to merge 3 commits into base: main

@seberg (Owner) commented Mar 4, 2025

This is a proof of concept that allows round-tripping code such as:

```python
import numpy as np

a = np.arange(100000).astype("T")  # "T" is the StringDType character code (NumPy >= 2.0)
print(memoryview(a).format)  # just to see it
via_memview_string = np.asarray(memoryview(a))

a = np.arange(1000).astype("m8[ms]")  # or even struct "m8[...],i"
print(memoryview(a).format)  # just to see it
via_memview_timedelta = np.asarray(memoryview(a))
```
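For comparison, a small helper can show which dtypes the installed NumPy can already round-trip through the buffer protocol. On NumPy releases without this change, exporting a timedelta (or StringDType) array is expected to fail, while this branch aims to make it work. The helper name `roundtrip_via_memoryview` is hypothetical, and the exact exception raised for unsupported dtypes is an assumption, so several types are caught.

```python
import numpy as np


def roundtrip_via_memoryview(a):
    """Export `a` through the buffer protocol and rebuild an array from it.

    Hypothetical helper. Returns (format, array), or (None, None) when the
    installed NumPy cannot express the dtype as a PEP 3118 format string.
    """
    try:
        view = memoryview(a)
    except (ValueError, TypeError, BufferError):
        # Assumption: unsupported dtypes raise one of these on export.
        return None, None
    return view.format, np.asarray(view)


ints = np.arange(1000)
fmt, back = roundtrip_via_memoryview(ints)
print(fmt, back.dtype)  # an integer format code; same dtype, no copy made

td = np.arange(1000).astype("m8[ms]")
fmt, back = roundtrip_via_memoryview(td)
print(fmt)  # None without this change; a format string with it
```

Since `np.asarray` wraps the exported buffer instead of copying it, the rebuilt array shares memory with the original whenever the export succeeds.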

It also includes a PoC for basic Cython exposure as a typed memoryview (see below).

I opted to simply store the `descr` pointer directly. That seems fine, because the descr owns the side-car buffer. One could still export it to non-Python consumers (to some extent), since accessing that side-car buffer ultimately doesn't need the Python API, only struct layout information.
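The ownership argument can be illustrated with a toy model in plain Python. This is a conceptual sketch, not NumPy's actual StringDType layout (which, among other things, stores short strings inline): the names `ToyStringDescr`, `build` and `load`, and the `(offset, length)` entry format, are all hypothetical. It shows the array buffer holding only fixed-size entries, the descriptor owning the side-car bytes, and element access needing nothing beyond the struct layout.

```python
import struct

# Hypothetical entry layout: (offset, length) as two native-size unsigned
# 64-bit ints, i.e. 16 bytes per array element.
ENTRY = struct.Struct("=QQ")


class ToyStringDescr:
    """Hypothetical stand-in for a descr that owns the side-car buffer."""

    def __init__(self):
        self.arena = bytearray()  # side-car storage for the string bytes

    def pack(self, s):
        data = s.encode("utf-8")
        offset = len(self.arena)
        self.arena += data
        return ENTRY.pack(offset, len(data))


def build(strings):
    """Return (descr, main_buffer) for a list of Python strings."""
    descr = ToyStringDescr()
    main_buffer = b"".join(descr.pack(s) for s in strings)
    return descr, main_buffer


def load(main_buffer, arena, index):
    # Only layout knowledge is used here (unpack + slice), so the same
    # logic could run in C given pointers to the two buffers.
    offset, size = ENTRY.unpack_from(main_buffer, index * ENTRY.size)
    return bytes(arena[offset:offset + size]).decode("utf-8")


descr, buf = build(["0", "1", "hello"])
print(len(buf))                   # 48: three 16-byte entries
print(load(buf, descr.arena, 2))  # hello
```

The main buffer alone is useless without the arena, which is why exporting the pointer to whatever owns the arena (here the toy descr) is enough for a consumer that knows the layout.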

For the time types (datetime/timedelta), I opted to store two hex ints. That is awkward to consume from Python, but our time-unit enum is public anyway. (We could also store unit strings such as `ns` instead.)
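A sketch of the two options, assuming the two ints are a unit code and a multiplier (the pair `np.datetime_data` returns). The `UNITS` table, the `<prefix>:<payload>` layout, and the helper names are hypothetical; the table is deliberately not claimed to match the values of NumPy's `NPY_DATETIMEUNIT` enum. That is the awkwardness in a nutshell: a Python consumer of the hex form needs the enum mapping, while the string form can be handed straight back to `np.dtype`.

```python
import numpy as np

# Hypothetical unit table for illustration; NOT guaranteed to match the
# integer values of NumPy's public NPY_DATETIMEUNIT enum.
UNITS = ["Y", "M", "W", "D", "h", "m", "s", "ms", "us", "ns", "ps", "fs", "as"]


def encode_time_dtype(dt, as_hex=True):
    """Encode a datetime64/timedelta64 dtype (hypothetical format)."""
    unit, count = np.datetime_data(dt)  # e.g. ("ms", 1) or ("s", 25)
    prefix = f"{dt.char}8"              # "m8" for timedelta, "M8" for datetime
    if as_hex:
        # Option 1: two hex ints (unit code, multiplier).
        return f"{prefix}:{UNITS.index(unit):x},{count:x}"
    # Option 2: a readable unit string such as "25s".
    return f"{prefix}:{count}{unit}"


def decode_time_dtype(encoded, as_hex=True):
    """Rebuild the dtype from either encoding."""
    prefix, payload = encoded.split(":")
    if as_hex:
        code, count = (int(part, 16) for part in payload.split(","))
        payload = f"{count}{UNITS[code]}"  # needs the unit table to decode
    return np.dtype(f"{prefix}[{payload}]")


dt = np.dtype("m8[25s]")
print(encode_time_dtype(dt))                # m8:6,19
print(encode_time_dtype(dt, as_hex=False))  # m8:25s
```

The generic unit (`np.dtype("m8")`) is not in the table and would raise here; a real encoding would have to decide how to represent it.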


Note: This branch also includes a minimal PoC extension for accessing NumPy string arrays via Cython (these are particularly difficult, because one needs the allocator to access elements).
Because of this, it requires the Cython branch da-woods/cython#5.

seberg added 3 commits March 3, 2025 16:55
This closes numpygh-28190 and fixes another issue in the initial code
that triggered the regression.

Note that we may still want to avoid this, since it does lead to
constructing (view-compatible) structured dtypes unnecessarily here.

It would also compactify the dtype.  To avoid building unnecessary
dtypes, the better solution may be to introduce a "canonical" flag
on the dtypes (now that we have the space).