ENH: Allow size=0 in numpy.random.choice by mattip · Pull Request #11383 · numpy/numpy · GitHub

ENH: Allow size=0 in numpy.random.choice #11383

Merged: 2 commits merged on Jun 24, 2018.

6 changes: 6 additions & 0 deletions doc/release/1.16.0-notes.rst
@@ -34,6 +34,12 @@ New Features
Improvements
============

``randint`` and ``choice`` now work on empty distributions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Previously, ``np.random.randint`` and ``np.random.choice`` raised an error
when the arguments described an empty distribution, even when no elements
needed to be drawn. This has been fixed so that, e.g.,
``np.random.choice([], 0)`` now returns ``np.array([], dtype=np.float64)``.

Changes
=======
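A short usage sketch of the behavior this release note describes. It assumes NumPy 1.16 or later and uses only the legacy module-level `np.random.choice` and `np.random.randint` functions that this PR touches.

```python
import numpy as np

# An empty population is accepted as long as nothing is drawn; the result
# keeps the dtype of np.array([]), which is float64.
empty = np.random.choice([], 0)
print(empty.shape, empty.dtype)  # (0,) float64

# An empty integer range (low >= high) is likewise accepted when the
# requested shape contains a zero-length dimension.
r = np.random.randint(10, 10, size=(3, 0))
print(r.shape)  # (3, 0)

# Drawing one or more samples from an empty population is still an error.
try:
    np.random.choice([], 10)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```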
24 changes: 12 additions & 12 deletions numpy/random/mtrand/mtrand.pyx
@@ -22,8 +22,8 @@
# SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

include "Python.pxi"
include "randint_helpers.pxi"
include "numpy.pxd"
include "randint_helpers.pxi"
include "cpython/pycapsule.pxd"

from libc cimport string
@@ -988,9 +988,9 @@ cdef class RandomState:
raise ValueError("low is out of bounds for %s" % dtype)
if ihigh > highbnd:
raise ValueError("high is out of bounds for %s" % dtype)
if ilow >= ihigh:
raise ValueError("low >= high")

if ilow >= ihigh and np.prod(size) != 0:
raise ValueError("Range cannot be empty (low >= high) unless no samples are taken")
Member

I was surprised for a bit here, but I guess we treat it like a Python range and allow strange ranges as empty ranges; seems fine to me.

with self.lock:
ret = randfunc(ilow, ihigh - 1, size, self.state_address)
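The patched `randint` condition can be sketched in plain Python, including the Python `range` analogy from the review comment. `randint_args_ok` is a hypothetical helper, not part of NumPy, and it assumes `size` is an int or a tuple of ints (the `size=None` case is left out); `math.prod` plays the role of `np.prod`.

```python
import math

def randint_args_ok(low, high, size):
    """Hypothetical mirror of the patched randint range check.

    Assumes size is an int or a tuple of ints; size=None is not handled.
    """
    dims = size if isinstance(size, tuple) else (size,)
    n_samples = math.prod(dims)
    # Like range(low, high), the half-open interval is empty when low >= high;
    # that is only an error if at least one sample is requested.
    return not (low >= high and n_samples != 0)

print(len(range(0, -10)))                  # 0: a "strange" range is just empty
print(randint_args_ok(0, -10, 0))          # True
print(randint_args_ok(10, 10, (3, 0, 4)))  # True
print(randint_args_ok(10, 10, 5))          # False
print(randint_args_ok(0, 10, 5))           # True
```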

@@ -1114,15 +1114,15 @@ cdef class RandomState:
# __index__ must return an integer by python rules.
pop_size = operator.index(a.item())
except TypeError:
raise ValueError("a must be 1-dimensional or an integer")
if pop_size <= 0:
raise ValueError("a must be greater than 0")
raise ValueError("'a' must be 1-dimensional or an integer")
if pop_size <= 0 and np.prod(size) != 0:
Member

It's a bit unintuitive that this works for None, but I guess it's OK; we could also check `size is None` explicitly.

Member

That would also have the advantage of skipping np.prod, which probably adds a bit of overhead, though that may be negligible in any case.

raise ValueError("'a' must be greater than 0 unless no samples are taken")
Member

I somewhat thought the convention was backticks ;). This is fine; I do not think we or Python have serious guidelines for error messages.

elif a.ndim != 1:
raise ValueError("a must be 1-dimensional")
raise ValueError("'a' must be 1-dimensional")
Contributor

I don't think it is normal to quote argument names in error messages; see, e.g.,

https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L988

Member

My guess is that you can find examples of everything in both numpy and the standard library. Personally, I think quotes make it slightly more discoverable what a refers to, so it sounds good to me. (E.g., a message saying axis is invalid may not refer to an axis argument; one saying `axis` is invalid certainly does.)

Anyway, I think the PR looks good and you guys can put it in if you like.

else:
pop_size = a.shape[0]
if pop_size is 0:
raise ValueError("a must be non-empty")
if pop_size is 0 and np.prod(size) != 0:
raise ValueError("'a' cannot be empty unless no samples are taken")

if p is not None:
d = len(p)
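A sketch of the reviewers' suggestion to test `size is None` explicitly instead of relying on how `np.prod(None)` compares with 0. `choice_population_ok` is hypothetical, uses `math.prod` in place of `np.prod`, and models only the population-size check for an integer or 1-D `a`.

```python
import math

def choice_population_ok(pop_size, size=None):
    """Hypothetical check of choice()'s population size.

    Tests `size is None` first, as suggested in review, so the product of
    the dimensions is only computed when a shape was actually given.
    """
    if size is None:
        # A single scalar draw always needs a non-empty population.
        return pop_size > 0
    dims = size if isinstance(size, tuple) else (size,)
    # An empty (or non-positive) population is fine if nothing is drawn.
    return pop_size > 0 or math.prod(dims) == 0

print(choice_population_ok(0, 0))          # True
print(choice_population_ok(0, (3, 0, 4)))  # True
print(choice_population_ok(0, 10))         # False
print(choice_population_ok(0))             # False
print(choice_population_ok(6))             # True
```

Like the patched condition `pop_size <= 0 and np.prod(size) != 0`, this check on its own also lets a negative integer `a` through when no samples are taken.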
@@ -1136,9 +1136,9 @@ cdef class RandomState:
pix = <double*>PyArray_DATA(p)

if p.ndim != 1:
raise ValueError("p must be 1-dimensional")
raise ValueError("'p' must be 1-dimensional")
if p.size != pop_size:
raise ValueError("a and p must have same size")
raise ValueError("'a' and 'p' must have same size")
if np.logical_or.reduce(p < 0):
raise ValueError("probabilities are not non-negative")
if abs(kahan_sum(pix, d) - 1.) > atol:
Member

I think if you test an empty probabilities array, you will see that this check also fails, so we might as well allow that too?
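The reviewer's point about an empty `p` follows from simple arithmetic: an empty probability vector sums to 0, so `abs(sum - 1.) > atol` always rejects it. The sketch below is hypothetical. `probabilities_ok` is not a NumPy function, `math.fsum` stands in for NumPy's internal `kahan_sum` (it is a different, correctly rounded summation), and the `atol` default is an assumed value. It shows one way the empty case could be allowed when nothing is drawn.

```python
import math

def probabilities_ok(p, n_samples, atol=1e-8):
    """Hypothetical validation of a probability vector for choice().

    atol is an assumed tolerance; math.fsum stands in for kahan_sum.
    """
    if any(x < 0 for x in p):
        return False  # "probabilities are not non-negative"
    total = math.fsum(p)  # math.fsum([]) == 0.0
    if abs(total - 1.0) > atol:
        # An empty p always lands here because its sum is 0.0. Allowing it
        # when nothing is drawn would mirror the relaxed checks on 'a'.
        return len(p) == 0 and n_samples == 0
    return True

print(probabilities_ok([], 5))          # False: empty p, samples requested
print(probabilities_ok([], 0))          # True: empty p, nothing drawn
print(probabilities_ok([0.5, 0.5], 3))  # True
print(probabilities_ok([0.2, 0.2], 1))  # False: sums to 0.4
```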

6 changes: 3 additions & 3 deletions numpy/random/mtrand/randint_helpers.pxi.in
@@ -23,7 +23,7 @@ def get_dispatch(dtypes):

{{for npy_dt, npy_udt, np_dt in get_dispatch(dtypes)}}

def _rand_{{npy_dt}}(low, high, size, rngstate):
def _rand_{{npy_dt}}(npy_{{npy_dt}} low, npy_{{npy_dt}} high, size, rngstate):
"""
_rand_{{npy_dt}}(low, high, size, rngstate)

@@ -60,8 +60,8 @@ def _rand_{{npy_dt}}(low, high, size, rngstate):
cdef npy_intp cnt
cdef rk_state *state = <rk_state *>PyCapsule_GetPointer(rngstate, NULL)

rng = <npy_{{npy_udt}}>(high - low)
off = <npy_{{npy_udt}}>(<npy_{{npy_dt}}>low)
off = <npy_{{npy_udt}}>(low)
rng = <npy_{{npy_udt}}>(high) - <npy_{{npy_udt}}>(low)

if size is None:
rk_random_{{npy_udt}}(off, rng, 1, &buf, state)
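One reading of the `randint_helpers.pxi.in` change (our interpretation; the PR does not spell it out): `randint` passes `ihigh - 1` as `high`, so `randint(10, 10, size=0)` reaches the helper with `high - low == -1`. A range-checked conversion of that negative Python int to an unsigned C type fails, whereas typing `low`/`high` as signed C integers and casting each to unsigned before subtracting wraps modulo 2**bits, as C does. The Python sketch below emulates both shapes of the computation; all helper names are hypothetical.

```python
def to_unsigned(value, bits):
    # Reinterpret a signed integer as unsigned, the way a C cast wraps
    # (two's complement, modulo 2**bits).
    return value & ((1 << bits) - 1)

def checked_to_unsigned(value, bits):
    # Hypothetical stand-in for a range-checked conversion of a Python int
    # to an unsigned C type, which refuses values it cannot represent.
    if not 0 <= value < (1 << bits):
        raise OverflowError("value out of range for unsigned type")
    return value

def rng_old(low, high, bits=64):
    # Pre-patch shape: subtract first, then convert the (possibly negative) result.
    return checked_to_unsigned(high - low, bits)

def rng_new(low, high, bits=64):
    # Post-patch shape: convert each bound, then subtract with unsigned wraparound.
    return (to_unsigned(high, bits) - to_unsigned(low, bits)) % (1 << bits)

# Ordinary case: both versions give high - low.
print(rng_old(-5, 4, bits=8), rng_new(-5, 4, bits=8))  # 9 9

# randint(10, 10, size=0) passes high = ihigh - 1 = 9 with low = 10.
print(rng_new(10, 9) == 2**64 - 1)  # True; nothing is drawn, so the value is unused
try:
    rng_old(10, 9)
except OverflowError as exc:
    print("old:", exc)
```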
9 changes: 9 additions & 0 deletions numpy/random/tests/test_random.py
@@ -440,6 +440,15 @@ def test_choice_return_shape(self):
assert_equal(np.random.choice(6, s, replace=False, p=p).shape, s)
assert_equal(np.random.choice(np.arange(6), s, replace=True).shape, s)

# Check zero-size
assert_equal(np.random.randint(0, 0, size=(3, 0, 4)).shape, (3, 0, 4))
Member

A test for randint(10, 10, size=0) might be good too, just to check that we didn't special-case 0 somehow.

Member Author

done

assert_equal(np.random.randint(0, -10, size=0).shape, (0,))
assert_equal(np.random.randint(10, 10, size=0).shape, (0,))
assert_equal(np.random.choice(0, size=0).shape, (0,))
assert_equal(np.random.choice([], size=(0,)).shape, (0,))
assert_equal(np.random.choice(['a', 'b'], size=(3, 0, 4)).shape, (3, 0, 4))
assert_raises(ValueError, np.random.choice, [], 10)

def test_bytes(self):
np.random.seed(self.seed)
actual = np.random.bytes(10)
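In the spirit of the review request to make sure 0 is not special-cased, a possible extra check (not part of this PR) loops over several empty ranges and integer dtypes. It relies only on the public `np.random.randint` function and its `dtype` argument; the chosen `low` values and shape are arbitrary.

```python
import numpy as np

# Zero-size draws from an empty range [low, low) should work for any low
# value and any integer dtype, and keep the requested shape and dtype.
for dtype in (np.int8, np.int16, np.int32, np.int64,
              np.uint8, np.uint16, np.uint32, np.uint64):
    for low in (0, 5, 10):
        out = np.random.randint(low, low, size=(2, 0), dtype=dtype)
        assert out.shape == (2, 0)
        assert out.dtype == dtype
print("ok")
```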