-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
memcpy-based fast, typed shuffle. #6933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Once you have this working, could you add a note to |
a063f4c
to
25612bd
Compare
☔ The latest upstream changes (presumably #6932) made this pull request unmergeable. Please resolve the merge conflicts. |
fcfaa98
to
62be39a
Compare
# Fast, statically typed path: shuffle the underlying buffer. | ||
# Only for non-empty, 1d objects of class ndarray (subclasses such | ||
# as MaskedArrays may not support this approach). | ||
x_ptr = <char*><size_t>x.ctypes.data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the cast to size_t
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because Cython interprets <char*>python_object
as casting a bytes object (not an object supporting the buffer protocol) to a char*
, but <char*>c_object
as a C-style cast. (If you try it you'll get "TypeError: expected bytes, int (or whatever you pass in) found")
da0288c
to
dbb2e80
Compare
(switched back to non-generative test) |
☔ The latest upstream changes (presumably #6453) made this pull request unmergeable. Please resolve the merge conflicts. |
Kindly bumping this issue and hoping it'll make it for 1.11.0. |
You have conflicts? |
It's the 3rd time (including the previous PR) I'm merging trivial conflicts in the release notes so I'm only going to do it if someone tells me this will be merged right after. |
Only for 1d-ndarrays exactly, as subtypes (e.g. masked arrays) may not allow direct shuffle of the underlying buffer (in fact, the old implementation destroyed the underlying values of masked arrays while shuffling). Also handles struct-containing-object 1d ndarrays properly. See numpy#6776 for an earlier, less general (but even faster: ~6x) improvement attempt, numpy#5514 for the original issue.
# as MaskedArrays may not support this approach). | ||
x_ptr = <char*><size_t>x.ctypes.data | ||
stride = x.strides[0] | ||
nbytes = x[:1].nbytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This confused me for a while. I think the more conventional way to write this would be x.dtype.itemsize
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I wrote this back when I hoped to be able to handle nd-arrays there too (this can't work because the sub-arrays are not necessarily contiguous). Changing it back.
One nitpick, but otherwise LGTM. @charris, any objections to merging for 1.11? |
@njsmith I haven't reviewed it, but I'm happy for any help in that regard. We have 123 PRs pending, so if it looks good to you, put it in. It might be another day or two before the 1.11 branch. |
Apparently gcc only specializes one branch (the last one) so I went for another 33% performance increase (matching numpy#6776) in what's likely the most common use case.
8c00233
to
309fdd4
Compare
Let's try it out then. Thanks Antony! |
memcpy-based fast, typed shuffle.
Only for ndarrays exactly, as subtypes (e.g. masked arrays) may not
like directly shuffling the underlying buffer (in fact, the old
implementation destroyed the underlying values of masked arrays while
shuffling).
Also handles struct-containing-object 1d ndarrays properly.
See #6776 for an earlier, less general (but even faster: ~6x)
improvement attempt, #5514 for the original issue.