ENH: Allow size=0 in numpy.random.choice, regardless of array#8717
Conversation
numpy/random/mtrand/mtrand.pyx
Outdated
There was a problem hiding this comment.
This isn't testing the right thing at all. The code already worked when size had zero-entries, and this breaks np.random.choice(['a', 'b'], size=(3, 0, 4))
There was a problem hiding this comment.
Test:
assert_equal(np.random.choice(['a', 'b'], size=(3, 0, 4)).shape, (3, 0, 4))
numpy/random/tests/test_random.py
Outdated
There was a problem hiding this comment.
This is the only test that fails on master, all the others already pass
|
Also while we're here, it would be nice if |
|
Hi eric, thanks for your comments. I didn't quite think it through. I wanted |
|
@MareinK: I don't think you should be special casing zeros size here. Remove the bit that throws the error, and then fix the bit that fails next. Fixing the randint thing will make |
|
Aaah I see what you mean, I will be working on it. |
|
I made a new attempt. I'm not sure about the changes to Current solution: if size is None:
rng = <npy_{{npy_udt}}>(high - low)
rk_random_{{npy_udt}}(off, rng, 1, &buf, state)
return np.{{np_dt}}(<npy_{{npy_dt}}>buf)
elif np.prod(size) == 0:
return <ndarray>np.empty(size, np.{{np_dt}})
else:
array = <ndarray>np.empty(size, np.{{np_dt}})
cnt = PyArray_SIZE(array)
array_data = <npy_{{npy_udt}} *>PyArray_DATA(array)
rng = <npy_{{npy_udt}}>(high - low)
with nogil:
rk_random_{{npy_udt}}(off, rng, cnt, array_data, state)
return arrayFirst alternative: merge the case for if size is None:
rng = <npy_{{npy_udt}}>(high - low)
rk_random_{{npy_udt}}(off, rng, 1, &buf, state)
return np.{{np_dt}}(<npy_{{npy_dt}}>buf)
else:
array = <ndarray>np.empty(size, np.{{np_dt}})
if np.prod(size) != 0:
cnt = PyArray_SIZE(array)
array_data = <npy_{{npy_udt}} *>PyArray_DATA(array)
rng = <npy_{{npy_udt}}>(high - low)
with nogil:
rk_random_{{npy_udt}}(off, rng, cnt, array_data, state)
return arraySecond alternative: keep the definition for if np.prod(size) != 0:
rng = <npy_{{npy_udt}}>(high - low)
if size is None:
rk_random_{{npy_udt}}(off, rng, 1, &buf, state)
return np.{{np_dt}}(<npy_{{npy_dt}}>buf)
elif np.prod(size) == 0:
return <ndarray>np.empty(size, np.{{np_dt}})
else:
array = <ndarray>np.empty(size, np.{{np_dt}})
cnt = PyArray_SIZE(array)
array_data = <npy_{{npy_udt}} *>PyArray_DATA(array)
with nogil:
rk_random_{{npy_udt}}(off, rng, cnt, array_data, state)
return arrayA third possibility combines these two options. Do any of these seem preferable? I'm looking forward to any feedback! |
|
AppVeyor failed with
Not sure what caused this? |
|
It has just finished #8669, not sure what is going on. |
|
Appveyor just tested the merge of #8669, with the same changeset as this. Not sure what is going on there. Anyway, I restarted the tests. |
|
Can you just make |
|
And why can't you compute the subtraction outside the if statement? Surely doing a subtraction alone won't cause an error. |
|
The subtraction ( As for cdef npy_{{npy_udt}} off, rng, buf
cdef npy_{{npy_udt}} *out
cdef ndarray array "arrayObject"
cdef npy_intp cnt
cdef rk_state *state = <rk_state *>PyCapsule_GetPointer(rngstate, NULL)
off = <npy_{{npy_udt}}>(<npy_{{npy_dt}}>low)
if low <= high:
rng = <npy_{{npy_udt}}>(high - low)
if size is None:
rk_random_{{npy_udt}}(off, rng, 1, &buf, state)
return np.{{np_dt}}(<npy_{{npy_dt}}>buf)
else:
array = <ndarray>np.empty(size, np.{{np_dt}})
cnt = PyArray_SIZE(array)
array_data = <npy_{{npy_udt}} *>PyArray_DATA(array)
with nogil:
rk_random_{{npy_udt}}(off, rng, cnt, array_data, state)
return arrayI'll add tests for |
|
|
|
Yeah, to me the only important case is low==high, which just works here. |
|
Hmmm, true, you can take nothing out of an empty set I guess. A bit funny, but probably right. |
|
I mean, that's the entire point of this PR! Although Wikipedia argues that [a,b) is a valid but empty set even if b < a. So maybe the current patch is correct in allowing that. |
|
Getting |
|
Lots of good points there. I'm generally pretty happy with this patch as it is, once you add the other tests. A test of |
|
Tests added. |
|
Tests look good to me.
Still confused about this. If this is true, then how does this line work: |
|
Do you mean, why does this line not throw the same error when Although, I don't really understand how the |
Nope, |
|
Aha, and then random numbers are generated based on that, and then interpreted as signed again... I suppose. How about the issue with |
|
@MareinK: I've added a commit to fix that - there's actually a bug in the subtraction |
numpy/random/mtrand/mtrand.pyx
Outdated
There was a problem hiding this comment.
This needs to be after include "numpy.pxd" to have access to the numpy types
There was a problem hiding this comment.
May as well enforce casting here
There was a problem hiding this comment.
high - low can produce a result that doesn't fit in npy_dt, and cause signed overflow (which is undefined in C - not sure about Cython). Unsigned overflow, however, is well-defined, and does the right thing.
numpy/random/mtrand/mtrand.pyx
Outdated
There was a problem hiding this comment.
Perhaps "a cannot be empty unless no samples are taken"
|
I made the proposed changes. I noticed that for each error that is raised, the I also noticed that now, there are some interesting effects with multi dimensional arrays, such as: np.random.choice([[[[]]]],(3,0,4)) = array([], shape=(3, 0, 4, 1, 1, 0), dtype=float64)Not sure if that is desirable. An error can be raised instead by removing just a single |
numpy/random/mtrand/mtrand.pyx
Outdated
numpy/random/mtrand/mtrand.pyx
Outdated
There was a problem hiding this comment.
This message probably wants fixing to be consistent with the others
|
In case my previous comment didn't convey it, Arguably you could extend it to work over the |
|
Maybe for another time then :) It's illegal again with my latest commit ( |
|
@seberg, I think I've had too much of a hand in this to give an impartial review - can you take a quick look? |
|
☔ The latest upstream changes (presumably #8649) made this pull request unmergeable. Please resolve the merge conflicts. |
| ``np.random.choice`` raised an error when the arguments described an empty | ||
| distribution. This has been fixed so that e.g. | ||
| ``np.random.choice([],0) == np.array([],dtype=float64)``. | ||
| Bundled version of LAPACK is now 3.2.2 |
|
☔ The latest upstream changes (presumably #8885) made this pull request unmergeable. Please resolve the merge conflicts. |
|
#9576 affects |
|
See also #7810, which adds multidimensional array support to |
Allow randint to handle zero size arrays to allow choice to function on empty inputs xref numpy#8717
|
This was approved but now needs a rebase/merge to resolve conflicts |
|
I'd like to continue work on this but I've since deleted the fork that this request was based on. Is the best solution to create a new fork and a new pull request? Perhaps as described here? Thanks. |
|
@MareinK The PR is still available. I can put it up as a new PR if you don't have the code any longer. |
|
Indeed I don't have the code available anymore. As I said I can reconstruct it, but I don't know what is the best way to go about it. |
|
Replaced by #11383 |
Fixes #8311. Includes tests.