10000 BUG: Buttress handling of extreme values in randint by gfyoung · Pull Request #8846 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: Buttress handling of extreme values in randint #8846

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 9, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 16 additions & 5 deletions numpy/random/mtrand/mtrand.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -964,20 +964,31 @@ cdef class RandomState:
high = low
low = 0

# '_randint_type' is defined in
# 'generate_randint_helpers.py'
key = np.dtype(dtype).name
if not key in _randint_type:
if key not in _randint_type:
raise TypeError('Unsupported dtype "%s" for randint' % key)

lowbnd, highbnd, randfunc = _randint_type[key]

if low < lowbnd:
# TODO: Do not cast these inputs to Python int
#
# This is a workaround until gh-8851 is resolved (bug in NumPy
# integer comparison and subtraction involving uint64 and non-
# uint64). Afterwards, remove these two lines.
ilow = int(low)
ihigh = int(high)

if ilow < lowbnd:
raise ValueError("low is out of bounds for %s" % (key,))
if high > highbnd:
if ihigh > highbnd:
raise ValueError("high is out of bounds for %s" % (key,))
if low >= high:
if ilow >= ihigh:
raise ValueError("low >= high")

with self.lock:
ret = randfunc(low, high - 1, size, self.state_address)
ret = randfunc(ilow, ihigh - 1, size, self.state_address)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we claim that everything should work if comparisons get fixed, then this line doesn't need to change - right?

Copy link
Contributor Author
@gfyoung gfyoung Apr 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct. The following lines will not be needed:

ilow = int(ilow)
ihigh = int(high)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me be more clear: I think we should continue to pass low and high on this line - there is no reason to change to ilow and ihigh here.

Copy link
Contributor Author
@gfyoung gfyoung Apr 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not the case:

>>> uint64_max = np.uint64(np.iinfo(np.uint64).max)
>>> uint64_max - 1
1.8446744073709552e+19

This PR cannot demonstrate any clearer just how fragile numpy's operators are when handling large uint64 numbers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Darn, you're right. In that case, the comment above saying "remove these lines" should also point out that we need them to work around subtraction as well. Or just a comment to that effect next to the subtraction

Alternatively, np.subtract(high, 1, dtype=dtype) would probably be safe here too.

Copy link
Contributor Author
@gfyoung gfyoung Apr 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. In fact, even your suggestion is not safe:

>>> uint64_max = np.uint64(np.iinfo(np.uint64).max)
>>>
# correct but bad for randint
>>> np.subtract(uint64_max, 1, dtype=np.int64)
-2
>>>
>>> np.subtract(uint64_max, 1, dtype=None)   # oops
1.8446744073709552e+19

Copy link
Member
@eric-wieser eric-wieser Apr 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you meant by # correct but bad for randint. You're in for a bad time if you ask for an upper bound that doesn't come close to fitting in the container type. What does passing dtype=None into randint do normally?

Copy link
Contributor Author
@gfyoung gfyoung Apr 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, looking at those examples again, I realize that those aren't valid in the context of randint. The only time that we could do this subtraction is if dtype=np.uint64, in which case the subtraction actually works correctly.

So ignore everything I just said in the previous comment above. I still would rather prefer to use high - 1 (as the current workaround) over np.subtract, as that will make it clear to us later that we need to patch this once #8851 is fixed.

dtype=None is not valid (per the docs) and will error as an invalid dtype.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.subtract has the benefit of extending to broadcasting more naturally.

I'm happy with the current workaround, provided there's a comment indicating that the subtraction too is a workaround

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment has been appended to the TODO I wrote earlier.


if size is None:
if dtype in (np.bool, np.int, np.long):
Expand Down
45 changes: 44 additions & 1 deletion numpy/random/tests/test_random.py
Original file line number Diff line number Diff line change
Expand Up @@ -157,22 +157,42 @@ def test_rng_zero_and_extremes(self):
for dt in self.itype:
lbnd = 0 if dt is np.bool_ else np.iinfo(dt).min
ubnd = 2 if dt is np.bool_ else np.iinfo(dt).max + 1

tgt = ubnd - 1
assert_equal(self.rfunc(tgt, tgt + 1, size=1000, dtype=dt), tgt)

tgt = lbnd
assert_equal(self.rfunc(tgt, tgt + 1, size=1000, dtype=dt), tgt)

tgt = (lbnd + ubnd)//2
assert_equal(self.rfunc(tgt, tgt + 1, size=1000, dtype=dt), tgt)

def test_full_range(self):
# Test for ticket #1690

for dt in self.itype:
lbnd = 0 if dt is np.bool_ else np.iinfo(dt).min
ubnd = 2 if dt is np.bool_ else np.iinfo(dt).max + 1
Copy link
Member
@eric-wieser eric-wieser Mar 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would np.iinfo(np.bool_) be a reasonable thing for numpy to contain? (EDIT: #8849)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, as it does not work for the moment:

>>> np.iinfo(np.bool_)
...
ValueError: Invalid integer data type.


try:
self.rfunc(lbnd, ubnd, dtype=dt)
except Exception as e:
raise AssertionError("No error should have been raised, "
"but one was with the following "
"message:\n\n%s" % str(e))
Copy link
Member
@eric-wieser eric-wieser Apr 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little weird - better to leave the raw exception so that the tests produce a useful stack trace, I think

Copy link
Contributor Author
@gfyoung gfyoung Apr 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this because I wanted to know which element in the for-loop, if any, failed. No stack-trace is going to tell you that AFAICT.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need an assert_no_exception or some such.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think assert_no_exception would be all the useful - wouldn't that go on every single function call otherwise? Based on @gfyoung's reasining, parameterized test cases would solve this.

I did this because I wanted to know which element in the for-loop, if any, failed.

In what way does this do a better job of doing that, given that dt is not used in the message?


def test_in_bounds_fuzz(self):
# Don't use fixed seed
np.random.seed()

for dt in self.itype[1:]:
for ubnd in [4, 8, 16]:
vals = self.rfunc(2, ubnd, size=2**16, dtype=dt)
assert_(vals.max() < ubnd)
assert_(vals.min() >= 2)
vals = self.rfunc(0, 2, size=2**16, dtype=np.bool)

vals = self.rfunc(0, 2, size=2**16, dtype=np.bool_)

assert_(vals.max() < 2)
assert_(vals.min() >= 0)

Expand Down Expand Up @@ -209,6 +229,29 @@ def test_repeatability(self):
res = hashlib.md5(val).hexdigest()
assert_(tgt[np.dtype(np.bool).name] == res)

def test_int64_uint64_corner_case(self):
# When stored in Numpy arrays, `lbnd` is casted
# as np.int64, and `ubnd` is casted as np.uint64.
# Checking whether `lbnd` >= `ubnd` used to be
# done solely via direct comparison, which is incorrect
# because when Numpy tries to compare both numbers,
# it casts both to np.float64 because there is
# no integer superset of np.int64 and np.uint64. However,
# `ubnd` is too large to be represented in np.float64,
# causing it be round down to np.iinfo(np.int64).max,
# leading to a ValueError because `lbnd` now equals
# the new `ubnd`.

dt = np.int64
tgt = np.iinfo(np.int64).max
lbnd = np.int64(np.iinfo(np.int64).max)
ubnd = np.uint64(np.iinfo(np.int64).max + 1)

# None of these function calls should
# generate a ValueError now.
actual = np.random.randint(lbnd, ubnd, dtype=dt)
assert_equal(actual, tgt)

def test_respect_dtype_singleton(self):
# See gh-7203
for dt in self.itype:
Expand Down
9 changes: 0 additions & 9 deletions numpy/random/tests/test_regression.py
Original file line number Diff line number Diff line change
Expand Up @@ -55,15 +55,6 @@ def test_permutation_longs(self):
b = np.random.permutation(long(12))
assert_array_equal(a, b)

def test_randint_range(self):
# Test for ticket #1690
lmax = np.iinfo('l').max
lmin = np.iinfo('l').min
try:
random.randint(lmin, lmax)
except:
raise AssertionError

def test_shuffle_mixed_dimension(self):
# Test for trac ticket #2074
for t in [[1, 2, 3, None],
Expand Down
0