ENH: Introduce multiple pair parameters in the 'repeat' function by PLameiras · Pull Request #23937 · numpy/numpy · GitHub

ENH: Introduce multiple pair parameters in the 'repeat' function #23937

Open: PLameiras wants to merge 5 commits into main

@PLameiras
Contributor
PLameiras commented Jun 14, 2023

This feature lets NumPy's repeat function accept pairs of parameters: the 'repeats' parameter can now take a tuple of arguments, and the 'axis' parameter can now take a sequence of integers. Data types are checked. Flattened output arrays (when no axis is specified) and repeats broadcast to the size of the paired axis are also handled. For instance, if two pairs of arguments are used and the second one doesn't have an axis specified, a flat output array is returned. This feature was first suggested in #21435.

Moreover, the multiple repeats are processed in ascending order: the repeats that produce the smallest axis size in the intermediate output array are processed first. This reduces processing time by approximately 50% for large repeats (i.e. over 100 repeats per axis).
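
To illustrate why the order can matter, here is a minimal sketch that uses only the existing np.repeat, not the code from this PR. The array shape and repeat counts are arbitrary assumptions chosen for illustration. Both orders give the same result, but the intermediate array is much smaller when the smaller repeat is applied first.

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

# Smaller repeat first: the intermediate array stays small.
small_first = np.repeat(a, 2, axis=0)            # shape (20, 10)
result_a = np.repeat(small_first, 100, axis=1)   # shape (20, 1000)

# Larger repeat first: the intermediate array is 50 times bigger.
large_first = np.repeat(a, 100, axis=1)          # shape (10, 1000)
result_b = np.repeat(large_first, 2, axis=0)     # shape (20, 1000)

# Same final array either way; only the intermediate size differs.
print(small_first.size, large_first.size)
print(np.array_equal(result_a, result_b))
```

This sketch only shows the difference in intermediate size; it makes no claim about the actual speedup, which depends on the implementation and the machine.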

This enhancement makes the repeat function more versatile and elegant. The more dimensions of the input array that need to be repeated along an axis, the more useful this feature is.

Usage example:

>>> x = np.array([[1, 2], [3, 4]])
>>> np.repeat(x, ([3, 3], [1, 2]), (1, 0))
array([[1, 1, 1, 2, 2, 2],
       [3, 3, 3, 4, 4, 4],
       [3, 3, 3, 4, 4, 4]])

>>> np.repeat(x, (3, [1, 2], 1), (1, 0))
array([1, 1, 1, 2, 2, 2,
       3, 3, 3, 4, 4, 4,
       3, 3, 3, 4, 4, 4])
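
For comparison, the same two results can be obtained with the current np.repeat by chaining one call per (repeats, axis) pair. This is a sketch of the equivalent existing usage, not the PR's implementation; the variable names are illustrative.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

# First example: repeat along axis 1, then along axis 0.
step1 = np.repeat(x, [3, 3], axis=1)       # shape (2, 6)
first = np.repeat(step1, [1, 2], axis=0)   # shape (3, 6)

# Second example: same two steps, then a final repeat with no axis,
# which makes np.repeat work on the flattened array.
step2 = np.repeat(np.repeat(x, 3, axis=1), [1, 2], axis=0)
second = np.repeat(step2, 1)               # shape (18,)

print(first)
print(second)
```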

Co-authored-by: Paulo Almeida <paulocesaralmeida@tecnico.ulisboa.pt>
@rkern
Member
rkern commented Jun 14, 2023

This seems to me to be better as a separate function that will use np.repeat() as a primitive rather than extending and complicating the semantics of np.repeat() itself.

Either way, this is the kind of expansion of the API that needs to be discussed on the mailing list first.
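
A separate helper of the kind suggested above might look roughly like the following sketch. The name repeat_axes and its signature are hypothetical and do not exist in NumPy. It leaves np.repeat untouched and simply applies it once per (repeats, axis) pair; it does not implement the flattening or ascending-order processing described in the PR, and it assumes each axis appears at most once.

```python
import numpy as np


def repeat_axes(a, repeats, axes):
    """Hypothetical helper: apply np.repeat once per (repeats, axis) pair."""
    if len(repeats) != len(axes):
        raise ValueError("repeats and axes must have the same length")
    out = np.asarray(a)
    for rep, axis in zip(repeats, axes):
        # Each step uses the existing np.repeat semantics unchanged.
        out = np.repeat(out, rep, axis=axis)
    return out


x = np.array([[1, 2], [3, 4]])
result = repeat_axes(x, ([3, 3], [1, 2]), (1, 0))
print(result)
```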

@mattip
Member
mattip commented Jun 14, 2023

Expanding a bit on @rkern's comment:

There is some discussion of adding repeat to the Array API in a future revision, along with a number of other commonly-used APIs. Extending the signature would move NumPy further away from the signature used in other array-processing libraries, and would need to be considered carefully.

@PLameiras
Contributor Author

Thank you for your input!
I would like to point out that this feature can halve the processing time for repeats that produce a very large array along the repeated dimension, by processing the multiple repeats in ascending order (smallest resulting axis size first).
Here are some of the improvements that were consistently observed locally:

image

There is no problem in creating a new separate function - 'repeats' - that uses 'repeat' and preserves all the new functionalities that were implemented.
I can either bring forward those changes or keep the current feature as it is.
Either way, any input is greatly appreciated.

Projects
Status: Second contribution
3 participants