memcpy-based fast, typed shuffle. #6933

anntzer · 2016-01-04T01:35:14Z

Only for ndarrays exactly, as subtypes (e.g. masked arrays) may not
like directly shuffling the underlying buffer (in fact, the old
implementation destroyed the underlying values of masked arrays while
shuffling).

Also handles struct-containing-object 1d ndarrays properly.

See #6776 for an earlier, less general (but even faster: ~6x)
improvement attempt, #5514 for the original issue.
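The idea of shuffling the underlying buffer can be sketched in pure Python. This is not the Cython code from this PR, only a minimal illustration under stated assumptions: `shuffle_buffer` is a hypothetical name, a `bytearray` stands in for the ndarray's memory, and byte-slice assignment plays the role of `memcpy`. Each item is `itemsize` bytes wide and consecutive items start `stride` bytes apart.

```python
import random
import struct


def shuffle_buffer(buf, n, itemsize, stride, rng=random):
    """Fisher-Yates shuffle of n fixed-size items stored in a bytearray.

    Hypothetical helper for illustration only: items are swapped as raw
    bytes, so the element type never has to be known.
    """
    for i in reversed(range(1, n)):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        if i == j:
            continue
        a, b = i * stride, j * stride
        tmp = bytes(buf[a:a + itemsize])           # copy item i aside
        buf[a:a + itemsize] = buf[b:b + itemsize]  # item j -> slot i
        buf[b:b + itemsize] = tmp                  # saved item i -> slot j


# Ten 4-byte integers packed back to back, so stride == itemsize.
values = list(range(10))
buf = bytearray(struct.pack("<10i", *values))
shuffle_buffer(buf, n=10, itemsize=4, stride=4, rng=random.Random(0))
shuffled = list(struct.unpack("<10i", buf))
```

Because only bytes are moved, the same loop would work for any fixed-size element, which is the sense in which the byte-copy approach is more general than per-type specializations.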

charris · 2016-01-04T02:26:49Z

Once you have this working, could you add a note to doc/release/1.11.0-notes.rst? The commit message should also begin with MAINT:, as you are improving existing functionality.

anntzer · 2016-01-04T07:19:52Z

Fixed.
Note that while this approach is more general than the one in #6776 and provides a 4x speedup, the specializations in #6776 were even faster (~6x speedup). Perhaps it is possible to coerce gcc into generating specializations of shuffle for common nbytes sizes (1/2/4/8)?
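The trade-off between a generic byte copy and a size-specific path can be illustrated in pure Python. The sketch below is an assumption-laden stand-in, not the C that gcc generates or the code in either PR: `shuffle_contiguous` and `_swap_generic` are hypothetical names, and only the 8-byte case gets a specialized branch.

```python
import random
import struct


def _swap_generic(buf, i, j, itemsize):
    # Generic path: move itemsize raw bytes, whatever the element type is.
    a, b = i * itemsize, j * itemsize
    buf[a:a + itemsize], buf[b:b + itemsize] = (
        buf[b:b + itemsize], buf[a:a + itemsize])


def shuffle_contiguous(buf, itemsize, rng, specialize=True):
    """Fisher-Yates shuffle of a contiguous bytearray of fixed-size items."""
    n = len(buf) // itemsize
    if specialize and itemsize == 8:
        # Specialized path: view the bytes as 8-byte words and swap whole
        # words instead of copying byte ranges.
        words = memoryview(buf).cast("Q")
        for i in reversed(range(1, n)):
            j = rng.randint(0, i)
            words[i], words[j] = words[j], words[i]
    else:
        for i in reversed(range(1, n)):
            j = rng.randint(0, i)
            _swap_generic(buf, i, j, itemsize)


data = struct.pack("<8q", *range(8))
fast = bytearray(data)
slow = bytearray(data)
shuffle_contiguous(fast, 8, random.Random(1))                    # word path
shuffle_contiguous(slow, 8, random.Random(1), specialize=False)  # byte path
```

Both paths draw the same random indices, so with the same seed they produce the same permutation; only the cost of each swap differs.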

homu · 2016-01-06T23:31:12Z

☔ The latest upstream changes (presumably #6932) made this pull request unmergeable. Please resolve the merge conflicts.

charris · 2016-01-12T17:44:20Z

numpy/random/mtrand/mtrand.pyx

+            # Fast, statically typed path: shuffle the underlying buffer.
+            # Only for non-empty, 1d objects of class ndarray (subclasses such
+            # as MaskedArrays may not support this approach).
+            x_ptr = <char*><size_t>x.ctypes.data


Why the cast to size_t?

Because Cython interprets <char*>python_object as converting a bytes object (not an arbitrary object supporting the buffer protocol) to a char*, whereas <char*>c_object is a plain C-style cast. x.ctypes.data is a Python int, so it has to be converted to a C integer (size_t) first; casting it directly fails with "TypeError: expected bytes, int found" (or whatever type you pass in).
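The same distinction shows up from plain Python: a buffer address is an ordinary integer object and only becomes a pointer after an explicit integer-to-pointer conversion. A minimal sketch, using the standard-library `array` and `ctypes` modules as stand-ins for the ndarray and the Cython cast (neither is used in this PR):

```python
import array
import ctypes

# Stand-in for a 1d ndarray: a contiguous buffer of C doubles.
buf = array.array("d", [1.0, 2.0, 3.0])

# buffer_info() returns (address, item_count). The address is a plain
# Python int, much like ndarray.ctypes.data.
address, count = buf.buffer_info()
address_is_int = isinstance(address, int)

# Explicit integer -> pointer conversion, loosely analogous to
# <char*><size_t>address in Cython.
char_ptr = ctypes.cast(address, ctypes.POINTER(ctypes.c_char))

# Reading itemsize * count bytes starting at that address yields the
# raw contents of the buffer.
raw = ctypes.string_at(address, count * buf.itemsize)
same_bytes = raw == buf.tobytes()
```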

anntzer · 2016-01-13T01:38:39Z

(switched back to non-generative test)

homu · 2016-01-17T04:06:08Z

☔ The latest upstream changes (presumably #6453) made this pull request unmergeable. Please resolve the merge conflicts.

anntzer · 2016-01-17T04:19:38Z

Kindly bumping this issue and hoping it'll make it for 1.11.0.

njsmith · 2016-01-17T04:20:54Z

You have conflicts?

anntzer · 2016-01-17T04:22:51Z

It's the 3rd time (including the previous PR) I'm merging trivial conflicts in the release notes so I'm only going to do it if someone tells me this will be merged right after.

Only for 1d-ndarrays exactly, as subtypes (e.g. masked arrays) may not allow direct shuffle of the underlying buffer (in fact, the old implementation destroyed the underlying values of masked arrays while shuffling). Also handles struct-containing-object 1d ndarrays properly. See numpy#6776 for an earlier, less general (but even faster: ~6x) improvement attempt, numpy#5514 for the original issue.

njsmith · 2016-01-17T04:47:36Z

numpy/random/mtrand/mtrand.pyx

+            # as MaskedArrays may not support this approach).
+            x_ptr = <char*><size_t>x.ctypes.data
+            stride = x.strides[0]
+            nbytes = x[:1].nbytes


This confused me for a while. I think the more conventional way to write this would be x.dtype.itemsize.

I think I wrote this back when I hoped to be able to handle nd-arrays there too (this can't work because the sub-arrays are not necessarily contiguous). Changing it back.
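For a 1d buffer the two spellings give the same number. The sketch below checks this with the standard-library `memoryview`, which exposes `itemsize`, `nbytes`, and `strides` attributes with the same meaning as on ndarray. Using it as a stand-in for a NumPy array is an assumption made here for illustration.

```python
import array

# Stand-in for a 1d ndarray of 8-byte floats.
view = memoryview(array.array("d", [1.0, 2.0, 3.0, 4.0]))

# Size in bytes of a one-element slice: the idiom originally in the diff.
one_element_nbytes = view[:1].nbytes

# The conventional spelling of the same quantity.
itemsize = view.itemsize

# Byte distance between consecutive elements. It equals itemsize for a
# contiguous buffer but grows for a strided slice, which is why the
# shuffle tracks stride and item size separately.
stride = view.strides[0]
every_other_stride = view[::2].strides[0]
```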

njsmith · 2016-01-17T04:49:53Z

One nitpick, but otherwise LGTM. @charris, any objections to merging for 1.11?

charris · 2016-01-17T04:55:42Z

@njsmith I haven't reviewed it, but I'm happy for any help in that regard. We have 123 PRs pending, so if it looks good to you, put it in. It might be another day or two before the 1.11 branch.

Apparently gcc only specializes one branch (the last one) so I went for another 33% performance increase (matching numpy#6776) in what's likely the most common use case.

njsmith · 2016-01-17T05:24:30Z

Let's try it out then. Thanks Antony!

memcpy-based fast, typed shuffle.

anntzer mentioned this pull request Jan 4, 2016

Faster random.shuffle via static typing. #6776

Closed

charris added component: numpy.random 03 - Maintenance labels Jan 4, 2016

anntzer force-pushed the fastshuffle-memcpy branch from a063f4c to 25612bd Compare January 4, 2016 07:03

anntzer force-pushed the fastshuffle-memcpy branch 2 times, most recently from fcfaa98 to 62be39a Compare January 7, 2016 00:05

charris reviewed Jan 12, 2016

anntzer force-pushed the fastshuffle-memcpy branch from da0288c to dbb2e80 Compare January 12, 2016 18:35

njsmith reviewed Jan 17, 2016

anntzer added 2 commits January 16, 2016 20:56

Top shuffle speed for machine-sized ints/floats.

b8cf7f9

Apparently gcc only specializes one branch (the last one) so I went for another 33% performance increase (matching numpy#6776) in what's likely the most common use case.

Revert to non-generative test.

309fdd4

anntzer force-pushed the fastshuffle-memcpy branch from 8c00233 to 309fdd4 Compare January 17, 2016 04:57

njsmith added a commit that referenced this pull request Jan 17, 2016

Merge pull request #6933 from anntzer/fastshuffle-memcpy

afd1174

memcpy-based fast, typed shuffle.

njsmith merged commit afd1174 into numpy:master Jan 17, 2016

anntzer deleted the fastshuffle-memcpy branch January 17, 2016 06:07
