RandomState.shuffle has a terrible overhead for NumPy arrays #5514
This patch modifies random.shuffle so that (when working on an ndarray) an array of indices is shuffled and the elements are then gathered from the original array with take() in that order. This allows the inner loop to be statically typed (it turns out it is not so easy to write generic shuffling code using Cython fused types) and thus much faster (~6x for me), at the expense of a threefold increase in memory use (I guess take() needs to create a copy, and an additional array of indices is created). See numpy#5514.
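A rough pure-Python sketch of that index-shuffle-then-take() idea (the helper name is hypothetical; this is not the Cython patch itself):

```python
import numpy as np

def shuffle_via_take(rng, x):
    # Only a statically typeable array of intp indices is shuffled element
    # by element; the payload is then gathered in one vectorized take()
    # and copied back over the original buffer.
    idx = np.arange(len(x), dtype=np.intp)
    rng.shuffle(idx)           # cheap: a 1-D array of C integers
    x[:] = x.take(idx)         # take() builds a shuffled copy; copy-back is in place

rng = np.random.RandomState(12345)
a = np.arange(10.0)
shuffle_via_take(rng, a)
```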
This applies only to exact 1-D ndarrays, as subtypes (e.g. masked arrays) may not allow shuffling the underlying buffer directly (in fact, the old implementation destroyed the underlying values of masked arrays while shuffling). It also handles 1-D ndarrays of structs containing objects properly. See numpy#6776 for an earlier, less general (but even faster: ~6x) improvement attempt, and numpy#5514 for the original issue.
Can this be closed? The 1-D case has been fixed; for the multidimensional case I would suggest either shuffling the indices and making a copy using take, or a special indexing class that uses shuffled indices. The latter won't work for slices, however.
I suppose it would be faster in the multidimensional case to make a shuffled copy, then copy back. The function should be reentrant as long as the index shuffling does not take place under the lock.
I suppose one could also shuffle a 1-D object array of views in the multidimensional case, but the result would not act like a true multidimensional array.
The nd implementation uses a bounce buffer just big enough to hold one "row", which should be close to optimal.
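A hedged Python-level illustration of that bounce-buffer scheme (the function name is made up; the real loop is statically typed Cython inside mtrand):

```python
import numpy as np

def shuffle_first_axis(rng, x):
    # Fisher-Yates shuffle of the first axis using a single row-sized
    # bounce buffer, so no full-size temporary is ever created.
    buf = np.empty_like(x[0])
    for i in range(len(x) - 1, 0, -1):
        j = rng.randint(0, i + 1)   # j uniform on 0..i inclusive
        if i == j:
            continue
        buf[...] = x[j]             # stash row j
        x[j] = x[i]                 # move row i into slot j
        x[i] = buf                  # drop the stashed row into slot i

rng = np.random.RandomState(0)
m = np.arange(20).reshape(5, 4)
shuffle_first_axis(rng, m)
```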
There are still some minor issues with the single-dimensional case. The loop on line 5100 should be over … The multidimensional case I am not sure about. An ndarray is generally not the appropriate data structure for shuffling more than one dimension; I would prefer to use a ragged array in my own code. Another option would be to shuffle a 1-D array of indices and then use it for fancy indexing. I don't think it is worth the effort to try to fix something that is inherently unfixable because the data structure is unsuitable for the task.
I think you need to keep the GIL to maintain repeatability. Does Cython deal with …
Cython does optimize …
We don't need to keep the GIL because we have acquired the lock on the RandomState object. Line 5073.
Yes, Cython does deal with …
Note that you need to be careful with releasing the GIL. Yes, you can do it, but only if the underlying array has no object references.
We are just calling …
You should also set the array flags to non-writable before entering the loop.
Even in the case of object references we do not need it because we are not changing refcounts.
Setting the non-writable flag temporarily would be safer, but an idiot could just undo the flag.
It doesn't mean we shouldn't at least try to avoid data corruption by reasonably behaved multithreaded programs...
I agree that setting non-writable is a good idea.
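As a small illustration of what the temporary non-writable flag buys (a standalone sketch, not the shuffle code itself): a well-behaved Python-level writer that touches the array mid-shuffle fails loudly instead of silently corrupting data, though, as noted, nothing stops a caller from flipping the flag back.

```python
import numpy as np

a = np.arange(5)
a.flags.writeable = False      # lock out writes for the duration of the loop
try:
    a[0] = 99                  # a concurrent Python-level write now raises
except ValueError as exc:
    print("blocked:", exc)
a.flags.writeable = True       # restore the flag once the loop is done
```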
I doubt it helps too much, but maybe a bit... Basically you have to be careful in any case, since you will get undefined behaviour even if you read. But if you read from an object array, you might well blow up your reference counts and crash Python. Frankly, I just don't think it is worth the trouble to think about releasing the GIL for object arrays.
Ok, so let us just special-case them. It is just an extra line of code.
That was my thought; I hoped you would open a PR, since you seem to have an exact plan, and point that out ;).
Will do, once I get home. No coding from my cell phone 😉
Has this been fixed in master? There is a special path now.
Yeah, I will close it. One could probably try to do similar optimizations for higher-dimensional arrays, but there are a lot of traps here and in most cases it is probably not very useful. Doesn't mean we can't do it, but no need to keep an issue around.
The `shuffle` function for `numpy.random.RandomState` is implemented like this:
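(The report quoted the Cython source here; the following is a rough, non-verbatim sketch of that untyped Fisher-Yates loop, reconstructed from the discussion — `rk_interval`, `self.lock`, and the exact details are assumed from the mtrand sources.)

```cython
# Approximate shape of the old implementation (sketch, not verbatim source;
# rk_interval draws an integer on [0, i] inclusive).
def shuffle(self, object x):
    cdef npy_intp i, j

    i = len(x) - 1
    # Fisher-Yates, adapted from random.shuffle(); Cython only knows x as a
    # generic Python object, so every x[i] is a full Python indexing call.
    with self.lock:
        while i > 0:
            j = rk_interval(i, self.internal_state)
            x[i], x[j] = x[j], x[i]
            i = i - 1
```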
This implementation is close to optimal for Python lists, as it would just swap `PyObject*` pointers. We could probably do a little better by declaring `x` a `list`, but not much.

However, for NumPy arrays this implementation is awful. First, the indexing is implemented as Python function calls, as Cython does not know that `x` is an ndarray. Along with it comes all the overhead like bounds checking, etc. Second, the swapping statement `x[i], x[j] = x[j], x[i]` continuously creates and destroys dtype scalars (Python objects). In the multidimensional case it creates temporary ndarrays instead. All in all, this is as bad as it gets.
In the single-dimensional case we should have the shuffling loop as a C function, with specialisations depending on the dtype (e.g. a switch statement on the typenum). Another possibility, to avoid C, is to use "fused types" in Cython to code this generically.
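A hedged sketch of what the fused-types variant could look like (the function name and the type list are illustrative only; as noted in the patch description above, this turned out to be awkward in practice):

```cython
# Requires the usual numpy cimport setup when built as an extension module.
cimport cython
cimport numpy as cnp

ctypedef fused element_t:
    cnp.int32_t
    cnp.int64_t
    cnp.float32_t
    cnp.float64_t

@cython.boundscheck(False)
@cython.wraparound(False)
def fisher_yates_typed(element_t[::1] x, rng):
    # Cython generates one statically typed swap loop per listed dtype.
    cdef Py_ssize_t i, j
    cdef element_t tmp
    for i in range(x.shape[0] - 1, 0, -1):
        j = rng.randint(0, i + 1)   # stand-in for the C-level rk_interval call
        tmp = x[i]
        x[i] = x[j]
        x[j] = tmp
```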
In the multidimensional case we could shuffle an array of void* pointers to the subarrays, and use this to fill in the output array.
This would kill an enormous amount of Python overhead in Monte Carlo simulations where we need to shuffle, e.g. in permutation tests.
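For context, a toy permutation test of the kind meant here (made-up data), where shuffle() is the hot call in the inner Monte Carlo loop:

```python
import numpy as np

rng = np.random.RandomState(0)
a = rng.normal(0.0, 1.0, size=200)
b = rng.normal(0.3, 1.0, size=200)

observed = a.mean() - b.mean()
pooled = np.concatenate([a, b])

n_iter = 10000
count = 0
for _ in range(n_iter):
    rng.shuffle(pooled)                            # one shuffle per iteration
    diff = pooled[:200].mean() - pooled[200:].mean()
    if abs(diff) >= abs(observed):
        count += 1
p_value = (count + 1.0) / (n_iter + 1.0)
```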