8000 memcpy-based fast, typed shuffle. by anntzer · Pull Request #6933 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

memcpy-based fast, typed shuffle. #6933

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 17, 2016
Merged

Conversation

anntzer
Copy link
Contributor
@anntzer anntzer commented Jan 4, 2016

Only for ndarrays exactly, as subtypes (e.g. masked arrays) may not
like directly shuffling the underlying buffer (in fact, the old
implementation destroyed the underlying values of masked arrays while
shuffling).

Also handles struct-containing-object 1d ndarrays properly.

See #6776 for an earlier, less general (but even faster: ~6x)
improvement attempt, #5514 for the original issue.

@charris
Copy link
Member
charris commented Jan 4, 2016

Once you have this working, could you add a note to doc/release/1.11.0-notes.rst? The commit message should also begin MAINT: as you are improving existing functionality.

@anntzer anntzer force-pushed the fastshuffle-memcpy branch from a063f4c to 25612bd Compare January 4, 2016 07:03
@anntzer
Copy link
Contributor Author
anntzer commented Jan 4, 2016

Fixed.
Note that while this approach is more general than the one in #6776 and provides 4x speedup, the specializations in #6776 were even faster (~6x speedup). Perhaps it is possible to coerce gcc into generating specializations of shuffle for common nbytes sizes (1/2/4/8)?

@homu
Copy link
Contributor
homu commented Jan 6, 2016

☔ The latest upstream changes (presumably #6932) made this pull request unmergeable. Please resolve the merge conflicts.

@anntzer anntzer force-pushed the fastshuffle-memcpy branch 2 times, most recently from fcfaa98 to 62be39a Compare January 7, 2016 00:05
# Fast, statically typed path: shuffle the underlying buffer.
# Only for non-empty, 1d objects of class ndarray (subclasses such
# as MaskedArrays may not support this approach).
x_ptr = <char*><size_t>x.ctypes.data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the cast to size_t?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because Cython interprets <char*>python_object as casting a bytes object (not an object supporting the buffer protocol) to a char*, but <char*>c_object as a C-style cast. (If you try it you'll get "TypeError: expected bytes, int (or whatever you pass in) found")

@anntzer anntzer force-pushed the fastshuffle-memcpy branch from da0288c to dbb2e80 Compare January 12, 2016 18:35
@anntzer
Copy link
Contributor Author
anntzer commented Jan 13, 2016

(switched back to non-generative test)

@homu
Copy link
Contributor
homu commented Jan 17, 2016

☔ The latest upstream changes (presumably #6453) made this pull request unmergeable. Please resolve the merge conflicts.

@anntzer
Copy link
Contributor Author
anntzer commented Jan 17, 2016

Kindly bumping this issue and hoping it'll make it for 1.11.0.

@njsmith
Copy link
Member
njsmith commented Jan 17, 2016

You have conflicts?

@anntzer
Copy link
Contributor Author
anntzer commented Jan 17, 2016

It's the 3rd time (including the previous PR) I'm merging trivial conflicts in the release notes so I'm only going to do it if someone tells me this will be merged right after.

Only for 1d-ndarrays exactly, as subtypes (e.g. masked arrays) may not
allow direct shuffle of the underlying buffer (in fact, the old
implementation destroyed the underlying values of masked arrays while
shuffling).

Also handles struct-containing-object 1d ndarrays properly.

See numpy#6776 for an earlier, less general (but even faster: ~6x)
improvement attempt, numpy#5514 for the original issue.
# as MaskedArrays may not support this approach).
x_ptr = <char*><size_t>x.ctypes.data
stride = x.strides[0]
nbytes = x[:1].nbytes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This confused me for a while. I think the more conventional way to write this would be x.dtype.itemsize.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I wrote this back when I hoped to be able to handle nd-arrays there too (this can't work because the sub-arrays are not necessarily contiguous). Changing it back.

@njsmith
Copy link
Member
njsmith commented Jan 17, 2016

One nitpick, but otherwise LGTM. @charris, any objections to merging for 1.11?

@charris
Copy link
Member
charris commented Jan 17, 2016

@njsmith I haven't reviewed it, but I'm happy for any help in that regard. We have 123 PRs pending, so if it looks good to you, put it in. It might be another day or two before the 1.11 branch.

Apparently gcc only specializes one branch (the last one) so I went for
another 33% performance increase (matching numpy#6776) in what's likely the
most common use case.
@anntzer anntzer force-pushed the fastshuffle-memcpy branch from 8c00233 to 309fdd4 Compare January 17, 2016 04:57
@njsmith
Copy link
Member
njsmith commented Jan 17, 2016

Let's try it out then. Thanks Antony!

njsmith added a commit that referenced this pull request Jan 17, 2016
memcpy-based fast, typed shuffle.
@njsmith njsmith merged commit afd1174 into numpy:master Jan 17, 2016
@anntzer anntzer deleted the fastshuffle-memcpy branch January 17, 2016 06:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0