-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH: Allow size=0 in numpy.random.choice, regardless of array #8717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
numpy/random/mtrand/mtrand.pyx
Outdated
@@ -1094,6 +1094,8 @@ cdef class RandomState: | |||
|
|||
# Format and Verify input | |||
a = np.array(a, copy=False) | |||
if np.prod(size) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't testing the right thing at all. The code already worked when size
had zero-entries, and this breaks np.random.choice(['a', 'b'], size=(3, 0, 4))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test:
assert_equal(np.random.choice(['a', 'b'], size=(3, 0, 4)).shape, (3, 0, 4))
numpy/random/tests/test_random.py
Outdated
s = (0,) | ||
a = np.array([1.5, 2.5, 3.5]) | ||
p = [0.1, 0.4, 0.5] | ||
assert_equal(np.random.choice([], s).shape, s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only test that fails on master, all the others already pass
Also while we're here, it would be nice if |
Hi eric, thanks for your comments. I didn't quite think it through. I wanted |
@MareinK: I don't think you should be special casing zeros size here. Remove the bit that throws the error, and then fix the bit that fails next. Fixing the randint thing will make |
Aaah I see what you mean, I will be working on it. |
I made a new attempt. I'm not sure about the changes to Current solution: if size is None:
rng = <npy_{{npy_udt}}>(high - low)
rk_random_{{npy_udt}}(off, rng, 1, &buf, state)
return np.{{np_dt}}(<npy_{{npy_dt}}>buf)
elif np.prod(size) == 0:
return <ndarray>np.empty(size, np.{{np_dt}})
else:
array = <ndarray>np.empty(size, np.{{np_dt}})
cnt = PyArray_SIZE(array)
array_data = <npy_{{npy_udt}} *>PyArray_DATA(array)
rng = <npy_{{npy_udt}}>(high - low)
with nogil:
rk_random_{{npy_udt}}(off, rng, cnt, array_data, state)
return array First alternative: merge the case for if size is None:
rng = <npy_{{npy_udt}}>(high - low)
rk_random_{{npy_udt}}(off, rng, 1, &buf, state)
return np.{{np_dt}}(<npy_{{npy_dt}}>buf)
else:
array = <ndarray>np.empty(size, np.{{np_dt}})
if np.prod(size) != 0:
cnt = PyArray_SIZE(array)
array_data = <npy_{{npy_udt}} *>PyArray_DATA(array)
rng = <npy_{{npy_udt}}>(high - low)
with nogil:
rk_random_{{npy_udt}}(off, rng, cnt, array_data, state)
return array Second alternative: keep the definition for if np.prod(size) != 0:
rng = <npy_{{npy_udt}}>(high - low)
if size is None:
rk_random_{{npy_udt}}(off, rng, 1, &buf, state)
return np.{{np_dt}}(<npy_{{npy_dt}}>buf)
elif np.prod(size) == 0:
return <ndarray>np.empty(size, np.{{np_dt}})
else:
array = <ndarray>np.empty(size, np.{{np_dt}})
cnt = PyArray_SIZE(array)
array_data = <npy_{{npy_udt}} *>PyArray_DATA(array)
with nogil:
rk_random_{{npy_udt}}(off, rng, cnt, array_data, state)
return array A third possibility combines these two options. Do any of these seem preferable? I'm looking forward to any feedback! |
AppVeyor failed with
Not sure what caused this? |
It has just finished #8669, not sure what is going on. |
Appveyor just tested the merge of #8669, with the same changeset as this. Not sure what is going on there. Anyway, I restarted the tests. |
Can you just make |
And why can't you compute the subtraction outside the if statement? Surely doing a subtraction alone won't cause an error. |
The subtraction ( As for cdef npy_{{npy_udt}} off, rng, buf
cdef npy_{{npy_udt}} *out
cdef ndarray array "arrayObject"
cdef npy_intp cnt
cdef rk_state *state = <rk_state *>PyCapsule_GetPointer(rngstate, NULL)
off = <npy_{{npy_udt}}>(<npy_{{npy_dt}}>low)
if low <= high:
rng = <npy_{{npy_udt}}>(high - low)
if size is None:
rk_random_{{npy_udt}}(off, rng, 1, &buf, state)
return np.{{np_dt}}(<npy_{{npy_dt}}>buf)
else:
array = <ndarray>np.empty(size, np.{{np_dt}})
cnt = PyArray_SIZE(array)
array_data = <npy_{{npy_udt}} *>PyArray_DATA(array)
with nogil:
rk_random_{{npy_udt}}(off, rng, cnt, array_data, state)
return array I'll add tests for |
|
Yeah, to me the only important case is low==high, which just works here. |
Hmmm, true, you can take nothing out of an empty set I guess. A bit funny, but probably right. |
I mean, that's the entire point of this PR! Although Wikipedia argues that [a,b) is a valid but empty set even if b < a. So maybe the current patch is correct in allowing that. |
Getting |
Lots of good points there. I'm generally pretty happy with this patch as it is, once you add the other tests. A test of |
Tests added. |
Tests look good to me.
Still confused about this. If this is true, then how does this line work:
|
Do you mean, why does this line not throw the same error when Although, I don't really understand how the |
Nope, |
Aha, and then random numbers are generated based on that, and then interpreted as signed again... I suppose. How about the issue with |
@MareinK: I've added a commit to fix that - there's actually a bug in the subtraction |
include "numpy.pxd" | ||
include "randint_helpers.pxi" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be after include "numpy.pxd"
to have access to the numpy types
@@ -23,7 +23,7 @@ def get_dispatch(dtypes): | |||
|
|||
{{for npy_dt, npy_udt, np_dt in get_dispatch(dtypes)}} | |||
|
|||
def _rand_{{npy_dt}}(low, high, size, rngstate): | |||
def _rand_{{npy_dt}}(npy_{{npy_dt}} low, npy_{{npy_dt}} high, size, rngstate): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be d F438 isplayed to describe this comment to others. Learn more.
May as well enforce casting here
rng = <npy_{{npy_udt}}>(high - low) | ||
off = <npy_{{npy_udt}}>(<npy_{{npy_dt}}>low) | ||
off = <npy_{{npy_udt}}>(low) | ||
rng = <npy_{{npy_udt}}>(high) - <npy_{{npy_udt}}>(low) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
high - low
can produce a result that doesn't fit in npy_dt
, and cause signed overflow (which is undefined in C - not sure about Cython). Unsigned overflow, however, is well-defined, and does the right thing.
@@ -1106,8 +1106,6 @@ cdef class RandomState: | |||
raise ValueError("a must be 1-dimensional") | |||
else: | |||
pop_size = a.shape[0] | |||
if pop_size is 0: | |||
raise ValueError("a must be non-empty") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps "a cannot be empty unless no samples are taken"
I made the proposed changes. I noticed that for each error that is raised, the I also noticed that now, there are some interesting effects with multi dimensional arrays, such as: np.random.choice([[[[]]]],(3,0,4)) = array([], shape=(3, 0, 4, 1, 1, 0), dtype=float64) Not sure if that is desirable. An error can be raised instead by removing just a single |
numpy/random/mtrand/mtrand.pyx
Outdated
raise ValueError("a must be greater than 0") | ||
elif a.ndim != 1: | ||
elif a.ndim != 1 and np.prod(size) != 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be here.
numpy/random/mtrand/mtrand.pyx
Outdated
@@ -1100,14 +1100,14 @@ cdef class RandomState: | |||
pop_size = operator.index(a.item()) | |||
except TypeError: | |||
raise ValueError("a must be 1-dimensional or an integer") | |||
if pop_size <= 0: | |||
if pop_size <= 0 and np.prod(size) != 0: | |||
raise ValueError("a must be greater than 0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message probably wants fixing to be consistent with the others
In case my previous comment didn't convey it, Arguably you could extend it to work over the |
Maybe for another time then :) It's illegal again with my latest commit ( |
@seberg, I think I've had too much of a hand in this to give an impartial review - can you take a quick look? |
☔ The latest upstream changes (presumably #8649) made this pull request unmergeable. Please resolve the merge conflicts. |
Even when no elements needed to be drawn, ``np.random.randint`` and | ||
``np.random.choice`` raised an error when the arguments described an empty | ||
distribution. This has been fixed so that e.g. | ||
``np.random.choice([],0) == np.array([],dtype=float64)``. | ||
Bundled version of LAPACK is now 3.2.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Newline needed here
☔ The latest upstream changes (presumably #8885) made this pull request unmergeable. Please resolve the merge conflicts. |
#9576 affects |
See also #7810, which adds multidimensional array support to |
Allow randint to handle zero size arrays to allow choice to function on empty inputs xref numpy#8717
This was approved but now needs a rebase/merge to resolve conflicts |
I'd like to continue work on this but I've since deleted the fork that this request was based on. Is the best solution to create a new fork and a new pull request? Perhaps as described here? Thanks. |
@MareinK The PR is still available. I can put it up as a new PR if you don't have the code any longer. |
Indeed I don't have the code available anymore. As I said I can reconstruct it, but I don't know what is the best way to go about it. |
Replaced by #11383 |
Fixes #8311. Includes tests.