-
-
Notifications
You must be signed in to change notification settings - Fork 11.6k
ENH: Allow size=0 in numpy.random.choice, regardless of array #8717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
numpy/random/mtrand/mtrand.pyx
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't testing the right thing at all. The code already worked when size
had zero-entries, and this breaks np.random.choice(['a', 'b'], size=(3, 0, 4))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test:
assert_equal(np.random.choice(['a', 'b'], size=(3, 0, 4)).shape, (3, 0, 4))
numpy/random/tests/test_random.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only test that fails on master, all the others already pass
Also while we're here, it would be nice if |
Hi eric, thanks for your comments. I didn't quite think it through. I wanted |
@MareinK: I don't think you should be special casing zeros size here. Remove the bit that throws the error, and then fix the bit that fails next. Fixing the randint thing will make |
Aaah I see what you mean, I will be working on it. |
I made a new attempt. I'm not sure about the changes to Current solution: if size is None:
rng = <npy_{{npy_udt}}>(high - low)
rk_random_{{npy_udt}}(off, rng, 1, &buf, state)
return np.{{np_dt}}(<npy_{{npy_dt}}>buf)
elif np.prod(size) == 0:
return <ndarray>np.empty(size, np.{{np_dt}})
else:
array = <ndarray>np.empty(size, np.{{np_dt}})
cnt = PyArray_SIZE(array)
array_data = <npy_{{npy_udt}} *>PyArray_DATA(array)
rng = <npy_{{npy_udt}}>(high - low)
with nogil:
rk_random_{{npy_udt}}(off, rng, cnt, array_data, state)
return array First alternative: merge the case for if size is None:
rng = <npy_{{npy_udt}}>(high - low)
rk_random_{{npy_udt}}(off, rng, 1, &buf, state)
return np.{{np_dt}}(<npy_{{npy_dt}}>buf)
else:
array = <ndarray>np.empty(size, np.{{np_dt}})
if np.prod(size) != 0:
cnt = PyArray_SIZE(array)
array_data = <npy_{{npy_udt}} *>PyArray_DATA(array)
rng = <npy_{{npy_udt}}>(high - low)
with nogil:
rk_random_{{npy_udt}}(off, rng, cnt, array_data, state)
return array Second alternative: keep the definition for if np.prod(size) != 0:
rng = <npy_{{npy_udt}}>(high - low)
if size is None:
rk_random_{{npy_udt}}(off, rng, 1, &buf, state)
return np.{{np_dt}}(<npy_{{npy_dt}}>buf)
elif np.prod(size) == 0:
return <ndarray>np.empty(size, np.{{np_dt}})
else:
array = <ndarray>np.empty(size, np.{{np_dt}})
cnt = PyArray_SIZE(array)
array_data = <npy_{{npy_udt}} *>PyArray_DATA(array)
with nogil:
rk_random_{{npy_udt}}(off, rng, cnt, array_data, state)
return array A third possibility combines these two options. Do any of these seem preferable? I'm looking forward to any feedback! |
AppVeyor failed with
Not sure what caused this? |
It has just finished #8669, not sure what is going on. |
Appveyor just tested the merge of #8669, with the same changeset as this. Not sure what is going on there. Anyway, I restarted the tests. |
Can you just make |
And why can't you compute the subtraction outside the if statement? Surely doing a subtraction alone won't cause an error. |
The subtraction ( As for cdef npy_{{npy_udt}} off, rng, buf
cdef npy_{{npy_udt}} *out
cdef ndarray array "arrayObject"
cdef npy_intp cnt
cdef rk_state *state = <rk_state *>PyCapsule_GetPointer(rngstate, NULL)
off = <npy_{{npy_udt}}>(<npy_{{npy_dt}}>low)
if low <= high:
rng = <npy_{{npy_udt}}>(high - low)
if size is None:
rk_random_{{npy_udt}}(off, rng, 1, &buf, state)
return np.{{np_dt}}(<npy_{{npy_dt}}>buf)
else:
array = <ndarray>np.empty(size, np.{{np_dt}})
cnt = PyArray_SIZE(array)
array_data = <npy_{{npy_udt}} *>PyArray_DATA(array)
with nogil:
rk_random_{{npy_udt}}(off, rng, cnt, array_data, state)
return array I'll add tests for |
|
Yeah, to me the only important case is low==high, which just works here. |
Hmmm, true, you can take nothing out of an empty set I guess. A bit funny, but probably right. |
I mean, that's the entire point of this PR! Although Wikipedia argues that [a,b) is a valid but empty set even if b < a. So maybe the current patch is correct in allowing that. |
Getting |
Lots of good points there. I'm generally pretty happy with this patch as it is, once you add the other tests. A test of |
Tests added. |
Tests look good to me.
Still confused about this. If this is true, then how does this line work:
|
<
8000
p dir="auto">Do you mean, why does this line not throw the same error when low<0 ? It seems to be because of the additional cast to npy_{{npy_dt}} . Indeed, adding this cast to the high-low subtraction resolves the 'negative value' error, but it introduces an error in one test and causes another to fail.
Although, I don't really understand how the |
Nope, |
Aha, and then random numbers are generated based on that, and then interpreted as signed again... I suppose. How about the issue with |
@MareinK: I've added a commit to fix that - there's actually a bug in the subtraction |
numpy/random/mtrand/mtrand.pyx
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be after include "numpy.pxd"
to have access to the numpy types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May as well enforce casting here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
high - low
can produce a result that doesn't fit in npy_dt
, and cause signed overflow (which is undefined in C - not sure about Cython). Unsigned overflow, however, is well-defined, and does the right thing.
numpy/random/mtrand/mtrand.pyx
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps "a cannot be empty unless no samples are taken"
I made the proposed changes. I noticed that for each error that is raised, the I also noticed that now, there are some interesting effects with multi dimensional arrays, such as: np.random.choice([[[[]]]],(3,0,4)) = array([], shape=(3, 0, 4, 1, 1, 0), dtype=float64) Not sure if that is desirable. An error can be raised instead by removing just a single |
numpy/random/mtrand/mtrand.pyx
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be here.
numpy/random/mtrand/mtrand.pyx
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message probably wants fixing to be consistent with the others
In case my previous comment didn't convey it, Arguably you could extend it to work over the |
Maybe for another time then :) It's illegal again with my latest commit ( |
@seberg, I think I've had too much of a hand in this to give an impartial review - can you take a quick look? |
☔ The latest upstream changes (presumably #8649) made this pull request unmergeable. Please resolve the merge conflicts. |
``np.random.choice`` raised an error when the arguments described an empty | ||
distribution. This has been fixed so that e.g. | ||
``np.random.choice([],0) == np.array([],dtype=float64)``. | ||
Bundled version of LAPACK is now 3.2.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Newline needed here
☔ The latest upstream changes (presumably #8885) made this pull request unmergeable. Please resolve the merge conflicts. |
#9576 affects |
See also #7810, which adds multidimensional array support to |
Allow randint to handle zero size arrays to allow choice to function on empty inputs xref numpy#8717
This was approved but now needs a rebase/merge to resolve conflicts |
I'd like to continue work on this but I've since deleted the fork that this request was based on. Is the best solution to create a new fork and a new pull request? Perhaps as described here? Thanks. |
@MareinK The PR is still available. I can put it up as a new PR if you don't have the code any longer. |
Indeed I don't have the code available anymore. As I said I can reconstruct it, but I don't know what is the best way to go about it. |
Replaced by #11383 |
Fixes #8311. Includes tests.