-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH: random: Add the method permuted
to Generator.
#15121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I guess it needs some unit tests, too. 😧 |
I added the WIP prefix to the title. I definitely need feedback on both the API and the implementation. The implementation of Be sure to peruse the old mailing list thread to see the motivation for the API. |
Perhaps one of:
Then |
Another name that I considered was +1 for making
Creating just one new function and using |
I'm really sorry if I missed it, but what about adding an |
@matthew-brett, is the idea that one gives I should also mention another API alternative from the email thread. I originally proposed adding the boolean parameter
|
Yes, my idea, to the extent I had thought it through, was to have I did see your |
Namespace pollution with these seems pretty bad. I think these only make sense if there is a move to remove shuffle/permutation. If these can't or shouldn't but removed, then I think it would be better to fix this with a cleverly chosen keyword argument. Using I also think having both is not a good idea. Why not just
I also think ca. 2014 the compatibility guarantee was so strong that it almost dictated to have a new function. I think in Of the names you have mentioned, I would prefer Maybe I'm missing something, but how does this differ from the version in master Edit: I get it now. It is sort of like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some things that could be improved.
|
I'd just like to humbly beg for the I know these are trivial extensions, but the same is true of the Python built-in "sorted", and I'm sure y'all have the same experience that I do, of finding it extremely useful, so that I almost invariably use I would also find it very useful for teaching because:
|
Another thought about a keyword argument. This feels like how these differ from permutation. The existing seems like an outer permutation since it permutes entire slices. This permutation seems like an inner since it permutes the elements within a slice (or the slice's .flat). This suggests something like |
@bashtage, thanks for the review! I probably won't get back to this until after the holidays. |
@bashtage, @matthew-brett and @eric-wieser, thanks again for the review and suggestions last December. This dropped down a few levels on my "to do" list, but @seberg commented over in gh-5173, so it is time to bump this back up the list. I'd like to continue the discussion of the API over in gh-5173 before getting back to the PR. I just added a comment in that issue with my attempt to summarize the API discussions so far. Let's use that issue to finish up the API design. This PR might not survive the discussion. 😮 |
84bb935
to
d0cd357
Compare
I am confused about using "permute" and "shuffle" in the docstrings: do they mean the same or different things? The docstring for permutation and docstring for shufle do not clear up my confusion. It does seem that
should have an |
"randomly permute" and "shuffle" mean the same thing: reorder the elements randomly. When typeset as code,
In my latest update to this PR, I added a new method, |
Sorry, my bad. So if I am a first-time user, how do I know which to use: |
As for the api, I find permuted with out to be very confusing. and I still think add |
b401776
to
57ca9bb
Compare
permuted
to Generator.
57ca9bb
to
ab230cb
Compare
shuffle of the columns. | ||
|
||
In-place vs. copy | ||
~~~~~~~~~~~~~~~~~ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be the moved up, the axis
difference is secondary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a table summarizing this lengthy description or some other kind of TL;DR summary:
method | copy/inplace | axis handling |
---|---|---|
shuffle | inplace | as-if 1d |
permutation | copy | as-if 1d |
permute | either copy or inplace | axis independent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also perhaps worth noting that shuffle
works on non-numpy sequences.
I'm updating the PR to ensure that the GIL is held when shuffling object arrays. This affects the existing
Can anyone explain that "trick"? The calls to |
That's exactly the point. This strategy is used repeatedly by ufunc loops too. You can repeat identical code multiple times, and have the compiler optimize it a different way each time based on the if statement that surrounds it. Doing it this way is safer and faster than trying to push the optimization through yourself. |
@eric-wieser, thanks. In |
@WarrenWeckesser I just wondered if it was possible, and I think you can do |
@seberg, thanks, I wasn't aware of that option. I don't think it helps here, because according to the Cython docs, the argument to |
From Sebastian and from some further reading: yes, gcc can, in fact, replace a |
@bashtage wrote
In the current version of the PR, if |
In the latest version, I think I've addressed all the comments. To cut down on the amount of repeated code, I made a thin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I am happy, the only real change here is the except -1
to make sure that errors during WriteBackIfCopy
(which I guess are possible) are floated correctly. I wonder if our __init__.pyd
really needs those excepts in many places, but that is a different thing.
I am OK with this approach, its not super pretty but works. I believe that this was mentioned a few times on the mailing list before, so my plan will be to merge soon and then we post a heads-up there just in case someone is unhappy with the new API (now we have nice docs to point to as well).
@@ -28,6 +29,13 @@ from ._common cimport (POISSON_LAM_MAX, CONS_POSITIVE, CONS_NONE, | |||
validate_output_shape | |||
) | |||
|
|||
cdef extern from "numpy/arrayobject.h": | |||
int PyArray_ResolveWritebackIfCopy(np.ndarray) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int PyArray_ResolveWritebackIfCopy(np.ndarray) | |
int PyArray_ResolveWritebackIfCopy(np.ndarray) except -1 |
self.shuffle(out) | ||
return out | ||
|
||
ax = normalize_axis_index(axis, np.ndim(out)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ax = normalize_axis_index(axis, np.ndim(out)) | |
ax = normalize_axis_index(axis, out.ndim) |
LGTM. Clearly lots of hard work and thinking about the edge cases. |
Thanks @WarrenWeckesser |
Note: updated on 15-July-2020.
In 2014, I created a github issue [1]_ and started a mailing list
discussion [2]_ about a limitation of the functions
shuffle
andpermutation
innumpy.random
. The problem is that those functionstreat the input as 1-d sequence, and only apply the shuffle or
permutation to that 1-d input. There is no function to, say, shuffle
each row of a 2-d array, with the permutation of each row being
independent of the others.
At the time, most folks were in favor of adding new functions, although
there was still some debate about the names. This PR is an implementation
of the Generator method
permuted
with anout
parameter as proposedby @eric-wieser below, and as favored by @seberg in gh-5173. The full
signature is
When axis=None, the operation applies to the flattened array.
(The return value of
permuted
in that case still has thesame shape as
x
.) Otherwise, random permutations are appliedindependently along the given axis.
Currently
permuted
is the only new function that is implemented.For a consistent API, we would might eventually deprecate
shuffle
and
permutation
(as proposed in gh=5173), and replace them withThis would handle
axis
likeshuffle
and permutation currentlyhandle it. I can add that to this PR if we want to make that change now.
Closes gh-5173.
[1] #5173
[2] https://mail.python.org/pipermail/numpy-discussion/2014-October/071340.html