ENH: Provide PyArray_CastBuffer for numpy 1.x compatibility #27624


Closed

sergiopasra opened this issue Oct 23, 2024 · 7 comments

@sergiopasra

Proposed new feature or change:

PyArray_GetCastFunc was removed in numpy 2.0. In the documentation of the change it is mentioned that:

In case you require an alternative, please let us know so we can create new API such as PyArray_CastBuffer() which could use old or new cast functions depending on the NumPy version.

So I'm requesting the creation of such an API, PyArray_CastBuffer(), so that the functionality of PyArray_GetCastFunc can still be used in numpy 2.

Context

I'm the maintainer of several processing pipelines for the 10-meter telescope GTC at the La Palma Observatory, Spain.

Sometimes we need to operate over ~100 2000x2000 arrays. I have implemented the operations using PyArrayIter in C, and it works well, but our current code doesn't compile against numpy 2.0, in part because PyArray_GetCastFunc has been removed.
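For context, here is a minimal sketch of the kind of NumPy 1.x pattern this refers to; the function name, the output buffer, and the contiguity assumption are illustrative, not our actual pipeline code:

```c
#include <numpy/arrayobject.h>

/* NumPy 1.x only: look up the legacy cast function for src's dtype and
 * apply it to n contiguous elements, writing native doubles to out. */
static int
cast_contiguous_to_double(PyArrayObject *src, double *out, npy_intp n)
{
    PyArray_VectorUnaryFunc *castfunc =
            PyArray_GetCastFunc(PyArray_DESCR(src), NPY_DOUBLE);
    if (castfunc == NULL) {
        return -1;  /* no cast available; a Python error is set */
    }
    /* The trailing array arguments matter only for a few dtypes and
     * are commonly passed as the source array and NULL. */
    castfunc(PyArray_DATA(src), out, n, (void *)src, NULL);
    return 0;
}
```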

I have reimplemented our code using NpyIter_MultiNew; it works well in numpy 2 and the code is cleaner, but we can't use it because we routinely exceed the NPY_MAXARGS limit (#4398). We would need at least NPY_MAXARGS=256 to recover the functionality we have today.
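For reference, a minimal sketch of that NpyIter_MultiNew approach, assuming read-only same-shape inputs combined element-wise as native doubles (the plain sum and the names are illustrative):

```c
#include <numpy/arrayobject.h>

/* Combine nop read-only arrays element-wise into out (a plain sum here);
 * nop must stay below NPY_MAXARGS, which is exactly the limit we hit. */
static int
sum_frames(PyArrayObject **ops, int nop, double *out)
{
    npy_uint32 op_flags[NPY_MAXARGS];
    PyArray_Descr *dtypes[NPY_MAXARGS];
    PyArray_Descr *dt = PyArray_DescrFromType(NPY_DOUBLE);

    for (int i = 0; i < nop; i++) {
        op_flags[i] = NPY_ITER_READONLY;
        /* Requesting a native-endian double descriptor makes the
         * buffered iterator byteswap and cast each operand for us. */
        dtypes[i] = dt;
    }
    NpyIter *iter = NpyIter_MultiNew(nop, ops, NPY_ITER_BUFFERED,
            NPY_KEEPORDER, NPY_SAFE_CASTING, op_flags, dtypes);
    Py_DECREF(dt);
    if (iter == NULL) {
        return -1;
    }
    NpyIter_IterNextFunc *iternext = NpyIter_GetIterNext(iter, NULL);
    if (iternext == NULL) {
        NpyIter_Deallocate(iter);
        return -1;
    }
    char **dataptrs = NpyIter_GetDataPtrArray(iter);
    npy_intp k = 0;
    do {
        /* Without NPY_ITER_EXTERNAL_LOOP each step yields exactly one
         * element per operand, already converted to double. */
        double acc = 0.0;
        for (int i = 0; i < nop; i++) {
            acc += *(double *)dataptrs[i];
        }
        out[k++] = acc;
    } while (iternext(iter));
    NpyIter_Deallocate(iter);
    return 0;
}
```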

@ngoldbaum
Member

Ping @seberg - maybe we could change something about NpyIter_MultiNew to avoid needing the new API? I don't have enough context to usefully weigh in here.

@seberg
Member
seberg commented Oct 24, 2024

Removing the maxargs limit is possible; I am not sure how hard it will be. It is one of those things that is probably not terribly hard, but needs someone to work on it in a concentrated way for a while.

I'll try to look at the code that uses this function. The removed API was very odd (most of the code was unused in NumPy!).

I suppose a "fix" might be to go back to iterating all arrays individually (with the cast included in the new iterator).
But I agree that isn't pretty and presumably a decent amount slower (you will also need to ensure identical iteration order and iterator shape to manually broadcast).

I don't mind adding that function, although in this use-case I suspect even that individual iterator setup would probably perform better and be clearer.

@sergiopasra
Author

Thank you @seberg and @ngoldbaum for looking into this.

> Removing the maxargs limit is possible; I am not sure how hard it will be. It is one of those things that is probably not terribly hard, but needs someone to work on it in a concentrated way for a while.

I've been searching in the code a little bit. In some cases it seems that the static arrays can be replaced by a malloc/free pair. NewNpyArrayIterObject_tag also contains variables with size maxargs (readflags, writeflags); in this case I don't know where the fields could be allocated. I must say that I'm not familiar with the life cycle of numpy objects.

> I'll try to look at the code that uses this function. The removed API was very odd (most of the code was unused in NumPy!).

> I suppose a "fix" might be to go back to iterating all arrays individually (with the cast included in the new iterator). But I agree that isn't pretty and presumably a decent amount slower (you will also need to ensure identical iteration order and iterator shape to manually broadcast).

I initially tried that, but I couldn't find information in the docs about the casting. Furthermore, astronomical images in FITS format are always stored in big-endian byte order, so I have to byteswap the data before casting.
I was honestly very happy when I realized that NpyIter_MultiNew performs the byteswap and the casting.

> I don't mind adding that function, although in this use-case I suspect even that individual iterator setup would probably perform better and be clearer.

It would work for me, but I don't know how to perform the byteswap and casting with the new iterators. Is it in the public API?

@seberg
Member
seberg commented Oct 25, 2024

> NewNpyArrayIterObject_tag also contains variables with size maxargs (readflags, writeflags).

There should be a few places where we have static arrays in functions. It would be nice to replace all of them, but I suspect we do need to make sure not to litter the code with actual malloc/PyMem_Malloc calls.
(Which, in C, probably means a little dance: use a small static array if the number of arrays is small, and otherwise allocate.)
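A sketch of that dance, with an arbitrary, illustrative stack threshold:

```c
#include <numpy/arrayobject.h>

#define SMALL_NOP 8  /* illustrative threshold, not NumPy's actual choice */

static int
with_op_flags(int nop)
{
    npy_uint32 flags_stack[SMALL_NOP];
    npy_uint32 *op_flags = flags_stack;

    /* Common case: a small static array on the stack; heap-allocate
     * only for an unusually large number of operands. */
    if (nop > SMALL_NOP) {
        op_flags = PyMem_Malloc(nop * sizeof(npy_uint32));
        if (op_flags == NULL) {
            PyErr_NoMemory();
            return -1;
        }
    }

    /* ... fill and use op_flags[0 .. nop-1] ... */

    if (op_flags != flags_stack) {
        PyMem_Free(op_flags);
    }
    return 0;
}
```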

I suspect the NpyIter object itself should be (almost?) fine with it, but if it needs tweaks, they will be a bit hard to figure out indeed.

So if you have some time, I would definitely review all parts of this, and it would be nice to make progress even if we don't finish it.

> but I don't know how to perform the byteswap and casting with the new iterators. Is it in the public API?

Yes, of course it can do all of that for you! Please don't hesitate to ask (or maybe even join the NumPy slack) if you need more pointers.

The path to iterating the arrays individually would be to just use NpyIter_New. Its API is a little bit limited in case you need to do broadcasting (you can't pass the shape), but you can always use MultiNew or AdvancedNew with a single array.

You do need to pass the order (unless you know for sure it matches), and you need to know that all arrays have the same shape ahead of time; otherwise, manually broadcast them (or use the advanced version, which allows passing a shape).
You should set a few flags, probably including Buffered (but I suspect you already have that).

The main downside of not using the multi-iter is that advancing the iterators is just more work. Typically there is the "external loop" optimization, which allows you to do the inner-most iteration in a simple C for-loop (with strides).
With many iterators, I am not sure you can make it work (although it may in practice), because the inner-most loops are in theory not the same size (but in practice maybe they are!).

Anyway, I think you will still gain performance, because buffered casting should offset the downsides (compared to the old code).
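To make that concrete, a minimal sketch (names illustrative) of iterating a single array with NpyIter_New: the requested descriptor plus NPY_ITER_BUFFERED makes the iterator byteswap and cast, and NPY_ITER_EXTERNAL_LOOP gives the plain inner C loop described above:

```c
#include <numpy/arrayobject.h>

/* Sum one array as native-endian doubles, whatever its dtype or byte
 * order. NPY_CORDER fixes the traversal order, so several such
 * iterators over same-shape arrays could be kept in sync. */
static int
sum_one(PyArrayObject *arr, double *result)
{
    PyArray_Descr *dt = PyArray_DescrFromType(NPY_DOUBLE);
    NpyIter *iter = NpyIter_New(arr,
            NPY_ITER_READONLY | NPY_ITER_BUFFERED | NPY_ITER_EXTERNAL_LOOP,
            NPY_CORDER, NPY_SAFE_CASTING, dt);
    Py_DECREF(dt);
    if (iter == NULL) {
        return -1;
    }
    NpyIter_IterNextFunc *iternext = NpyIter_GetIterNext(iter, NULL);
    if (iternext == NULL) {
        NpyIter_Deallocate(iter);
        return -1;
    }
    char **dataptr = NpyIter_GetDataPtrArray(iter);
    npy_intp *strideptr = NpyIter_GetInnerStrideArray(iter);
    npy_intp *sizeptr = NpyIter_GetInnerLoopSizePtr(iter);

    double acc = 0.0;
    do {
        /* The external loop hands us (pointer, stride, count) for the
         * inner-most dimension; the hot loop is plain C. */
        char *data = *dataptr;
        npy_intp stride = *strideptr;
        for (npy_intp i = 0; i < *sizeptr; i++, data += stride) {
            acc += *(double *)data;
        }
    } while (iternext(iter));

    NpyIter_Deallocate(iter);
    *result = acc;
    return 0;
}
```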

@seberg
Member
seberg commented Nov 7, 2024

@sergiopasra did you ever make progress here or should we discuss/look into it further? (The request to make it public is still good of course, even if I don't really think it would be a great solution here as such.)

@sergiopasra
Author

Hi, I think I have a workaround for my problem by using a list of NpyIter objects created with NpyIter_MultiNew.

Each of them has fewer than maxargs operands. I have to manage the iters in sync, but it seems to work nicely for my use case. I think I will not need the compatibility API, so this issue can be closed. Thank you very much for your suggestions; they helped me a lot.
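For anyone hitting the same limit, a sketch of that chunking workaround under some simplifying assumptions: same-shape read-only inputs, C order forced so all iterators traverse identically, illustrative chunk bounds, and error cleanup trimmed.

```c
#include <numpy/arrayobject.h>

#define CHUNK      16  /* illustrative; must stay below NPY_MAXARGS */
#define MAX_CHUNKS 32  /* illustrative bound on nop / CHUNK */

static int
sum_many(PyArrayObject **ops, int nop, double *out)
{
    int nchunks = (nop + CHUNK - 1) / CHUNK;
    NpyIter *iters[MAX_CHUNKS];
    NpyIter_IterNextFunc *nexts[MAX_CHUNKS];
    char **ptrs[MAX_CHUNKS];
    PyArray_Descr *dt = PyArray_DescrFromType(NPY_DOUBLE);

    if (nchunks > MAX_CHUNKS) {
        PyErr_SetString(PyExc_ValueError, "too many inputs for this sketch");
        return -1;
    }
    for (int c = 0; c < nchunks; c++) {
        int n = (c == nchunks - 1) ? nop - c * CHUNK : CHUNK;
        npy_uint32 op_flags[CHUNK];
        PyArray_Descr *dtypes[CHUNK];
        for (int i = 0; i < n; i++) {
            op_flags[i] = NPY_ITER_READONLY;
            dtypes[i] = dt;  /* byteswap + cast to native double */
        }
        /* No EXTERNAL_LOOP: each step yields one element per operand,
         * and NPY_CORDER pins the traversal order, so same-shape
         * iterators stay in lockstep by construction. */
        iters[c] = NpyIter_MultiNew(n, ops + c * CHUNK, NPY_ITER_BUFFERED,
                NPY_CORDER, NPY_SAFE_CASTING, op_flags, dtypes);
        if (iters[c] == NULL) {
            return -1;  /* cleanup of earlier iterators trimmed */
        }
        nexts[c] = NpyIter_GetIterNext(iters[c], NULL);
        ptrs[c] = NpyIter_GetDataPtrArray(iters[c]);
    }
    Py_DECREF(dt);

    npy_intp k = 0;
    int more;
    do {
        double acc = 0.0;
        for (int c = 0; c < nchunks; c++) {
            int n = (c == nchunks - 1) ? nop - c * CHUNK : CHUNK;
            for (int i = 0; i < n; i++) {
                acc += *(double *)ptrs[c][i];
            }
        }
        out[k++] = acc;
        more = 1;
        for (int c = 0; c < nchunks; c++) {
            more &= nexts[c](iters[c]);  /* advance every chunk once */
        }
    } while (more);

    for (int c = 0; c < nchunks; c++) {
        NpyIter_Deallocate(iters[c]);
    }
    return 0;
}
```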

@seberg
Member
seberg commented Dec 31, 2024

@sergiopasra maybe not that helpful unfortunately, but NumPy 2.3 should remove the NPY_MAXARGS limitation from the iterator.
