ENH: Alternative to random.shuffle, with an axis argument. #5173
Comments
Don't see why this would need to be an alternative -- why not just add an On Sat, Oct 11, 2014 at 9:36 PM, Warren Weckesser notifications@github.com
Nathaniel J. Smith |
The current behavior of
You can interpret that as being For a 2-D array, you can shuffle
In It would be fine if the alternative shuffling was implemented by adding appropriate arguments to |
Perhaps two arguments could be added to
When |
Oh, ugh, I just assumed that it was more consistent with analogous I'm +1 on a version of shuffle that has calling conventions that match (Maybe "scramble"?) On Sat, Oct 11, 2014 at 10:31 PM, Warren Weckesser <notifications@github.com
Nathaniel J. Smith |
Ah, describing the desired behavior as an analog of |
I was surprised, too, and based on the comments on the stackoverflow question, at least two other experienced numpy users were surprised. I'll start a discussion on the mailing list. |
I guess if the average user is currently getting it wrong then it's worth On Sat, Oct 11, 2014 at 11:00 PM, Warren Weckesser <notifications@github.com
Nathaniel J. Smith |
We need a function named Sue. |
Just wanted to +1 this feature, as I too expected it to exist, analogously to sort(axis=N). Was there any decision made on the mailing list? |
The mailing list thread is here: |
This would be really useful! |
I also would appreciate that! According to https://stackoverflow.com/a/35647011/3401634, for multi-dimensional arrays
is the same as
So why not implement
as
with the default |
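The suggestion above amounts to drawing a single permutation of the indices along the chosen axis and applying it to the whole array. A minimal sketch, assuming a hypothetical helper name `shuffle_over_axis` (not a NumPy function) and returning a copy rather than writing into the input:

```python
import numpy as np

def shuffle_over_axis(x, axis=0, rng=None):
    """Hypothetical helper: reorder the sub-arrays of `x` along `axis`
    using ONE shared random permutation. Returns a new array."""
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(x.shape[axis])  # one permutation of the indices
    return np.take(x, order, axis=axis)

x = np.arange(12).reshape(3, 4)
y = shuffle_over_axis(x, axis=1)  # columns are rearranged as whole units
```

Note that this moves entire slices together; it does not shuffle each row or column independently, which is what this issue asks for.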
Any news on this? I was surprised this functionality doesn't exist. For now I'm using |
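One commonly used workaround (not necessarily the one meant in the comment above) is to sort random keys along the axis and use the resulting indices to reorder the data. A sketch, assuming a hypothetical helper name `independent_shuffle`:

```python
import numpy as np

def independent_shuffle(x, axis=-1, rng=None):
    """Hypothetical helper: return a copy of `x` in which every 1-d slice
    along `axis` has been shuffled with its own random permutation."""
    rng = np.random.default_rng() if rng is None else rng
    # Random keys with the same shape as x; argsort along `axis` yields an
    # independent permutation of indices for each 1-d slice.
    order = np.argsort(rng.random(x.shape), axis=axis)
    return np.take_along_axis(x, order, axis=axis)

x = np.arange(12).reshape(3, 4)
y = independent_shuffle(x, axis=1)  # each row is shuffled separately
```

This costs a sort per slice and allocates a full array of random keys, so it is a convenience rather than an efficient implementation.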
Can this be closed now because of #13829? |
(Note that while working on the examples here, I found a bug in the new shuffle code. In what follows, I am using the fix proposed in #14662, which has been merged.) @wkschwartz, the change in #13829 is useful, but it is not the enhancement requested here. The axis added in #13829 still treats the array as a 1-d sequence to be shuffled. The new axis argument allows the user to specify which axis is viewed as the 1-d axis, but it does not do an independent shuffle within the axis. For example,
You can see that the rows have not been independently shuffled. The columns have been rearranged, but the values within each column are the same. The behavior requested in this issue is to shuffle independently, as in the
|
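The distinction can be illustrated with a short sketch. It assumes NumPy 1.20 or later, where `Generator.permuted` (the method that eventually closed this issue) is available; the seed and array are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(12345)  # arbitrary seed, for illustration only
x = np.arange(20).reshape(4, 5)

# shuffle(axis=1) applies ONE permutation to the columns, so each
# column keeps exactly the values it had before.
over = x.copy()
rng.shuffle(over, axis=1)

# permuted(axis=1) draws a separate permutation for every row.
along = rng.permuted(x, axis=1)

# In both results each row still holds the same set of values ...
rows_ok_over = (np.sort(over, axis=1) == x).all()
rows_ok_along = (np.sort(along, axis=1) == x).all()

# ... but only shuffle is guaranteed to keep whole columns intact.
cols_intact = {tuple(c) for c in over.T} == {tuple(c) for c in x.T}
```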
I would like to float this again, maybe also for Wednesday's meeting. We just added higher dimensional capabilities to All of these use the current shuffle logic which is But, in almost all occasions, |
As noted in the github cross-reference above, I have a work-in-progress PR at #15121. I got some good feedback after submitting the PR, but I haven't made time to address all the issues that were brought up. |
@WarrenWeckesser that is cool; what I am personally more urgently concerned about is that we expanded the "over" meaning in the new API, and recently at that. I am probably just overreacting right now, because I am a bit annoyed that I missed this or did not think it to the end before... But I honestly think the current logic is very dangerous. It is easy to miss that it does not provide the expected "along" meaning. And it is not the meaning that |
@seberg, thanks for poking this issue. I think we still need to reach consensus on the API. I'll try to give a brief summary of past ideas here. I'll follow your convention of using "over" and "along" for two interpretations of At the end of the mailing list discussion several years ago, I ended up thinking the solution was to not change the APIs of With the new functions, we would have the following related
(The methods that operate "over" the axis, Instead of two new methods, it has been suggested that we have just one, with a parameter that controls the in-place vs. copy behavior. Two suggestions have been floated for this: (a) Add an The main alternative to creating new methods is to add a new parameter to the existing methods that changes how
(Editorial digression: Inevitably in discussions like this, the issue of growing the namespace (in this case, the Having said all that, here are two additions to the existing signature of (1) I think I've covered all the various API ideas that have come up. If anyone knows of others, let us know. |
I am happy with increasing the API here. I am not sure there is much reason to be against it:
I suppose what is going on here is that |
I find I do think While it seems simple to extend the existing API, I think @rkern's point about not having keywords that radically change behavior is the best path. |
I suppose for in-place vs. not-in-place, we have the alternative |
I'm getting back to this issue (and the related PR at #15121). Back when I created the original issue, and tried to describe the problem with the current It would be great if we could truly replicate the |
Without a singleton generator I think this would be impossible to achieve. |
@bashtage wrote
This is what the mailing list discussion (sort of) converged to back in 2014. Here's a link to Nathaniel's suggestion: https://mail.python.org/pipermail/numpy-discussion/2014-October/071364.html His If we add
which has a nice consistency. In this version of the API, none of the methods have an Over in #15121, I recently added another method, with the ungainly and obviously temporary name
But if we are going to introduce an
Note that the That's my summary of the two main contenders for the change. We have the What do folks think? |
Of the three scenarios you listed, in order, I would rank them 1, 3, and quite far behind 2. The 2 permutations that are doing quite radically different things seems like a big source of confusion. My personal preference is to avoid the mandatory use of out to access a feature; I always think of out as a performance choice that can make sense in some scenarios. I would not really like to teach students to use out just to access a feature. I would also assume that in case 3 |
But shuffling in place is a performance choice, isn't it? |
In-place can also be a coding style choice, when available. Perhaps a confusing, and maybe error-prone, one. My personal take is that f(x, out=x) always feels a bit magical, since it is sometimes used as a very non-obvious way to achieve something quick. f(x, inplace=True), despite not looking like anything else, seems much clearer (looks a bit like an old pandas pattern that has mostly been removed). |
True, but it is a coding style choice that in NumPy seems typically spelled using I admit it's a bit magical and an While possibly a bit tedious to write, and somewhat less quick to read, |
A hypothetical question for those advocating the use of |
|
Yes, I think I am not sure about the "with no need to implement the method |
That was part of my API thought experiment. I didn't mean to imply that there is anything wrong with what we have now. I was just saying that, if we started from scratch--and I'll add to my hypothetical premises that we aren't concerned with matching the Python API for lists--then the preferred API for sorting would be |
Another question, not so hypothetical: if using |
Yes, in my opinion adding an |
The method permuted(x, axis=None, out=None) shuffles an array. Unlike the existing shuffle method, it shuffles the slices along the given axis independently. Closes numpygh-5173.
* ENH: random: Make _shuffle_raw and _shuffle_int standalone functions.
* ENH: random: Add the method `permuted` to Generator. The method permuted(x, axis=None, out=None) shuffles an array. Unlike the existing shuffle method, it shuffles the slices along the given axis independently. Closes gh-5173.
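A brief usage sketch of the `permuted(x, axis=None, out=None)` signature described above, assuming NumPy 1.20 or later. It shows the copy and in-place forms that were debated earlier in the thread:

```python
import numpy as np

rng = np.random.default_rng()
x = np.arange(12).reshape(3, 4)
original = x.copy()

# Default form: returns a shuffled copy and leaves x untouched.
y = rng.permuted(x, axis=1)
x_unchanged = (x == original).all()

# In-place form: pass the input array as `out`.
z = rng.permuted(x, axis=1, out=x)
returned_out = z is x

# Either way, each row keeps its own values, only reordered.
rows_preserved = (np.sort(x, axis=1) == original).all()
```

With `axis=None` (the default), the flattened array is shuffled instead of individual slices.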
It would be nice to have an alternative to numpy.random.shuffle that accepts an axis argument, and that independently shuffles the one-dimensional slices. Here's an implementation that I'll call disarrange. It works, but it would be nice to have a more efficient C implementation.
Example:
This request was motivated by this question on stackoverflow: http://stackoverflow.com/questions/26310346/quickly-calculate-randomized-3d-numpy-array-from-2d-numpy-array/
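For reference, a function with the behavior described above could be sketched in pure Python as follows. This is an illustrative reconstruction under stated assumptions, not necessarily the code originally posted: it reuses the name `disarrange` from the issue text, the `rng` parameter is an addition for this sketch, and it loops over every 1-d slice in Python, which is why a C implementation would be faster.

```python
import numpy as np

def disarrange(a, axis=-1, rng=None):
    """Shuffle `a` in place, independently along each 1-d slice on `axis`.

    Illustrative sketch; `rng` is an assumed optional Generator argument.
    """
    rng = np.random.default_rng() if rng is None else rng
    b = np.swapaxes(a, axis, -1)          # view with the target axis last
    for idx in np.ndindex(b.shape[:-1]):  # visit every 1-d slice
        rng.shuffle(b[idx])               # b[idx] is a 1-d view into `a`

a = np.arange(24).reshape(2, 3, 4)
before = a.copy()
disarrange(a, axis=1)  # each length-3 slice along axis 1 is shuffled separately
```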