Silent overflow error in `numpy.random.default_rng.negative_binomial` · Issue #18997 · numpy/numpy · GitHub

Silent overflow error in numpy.random.default_rng.negative_binomial #18997


Closed
SV-97 opened this issue May 12, 2021 · 6 comments · Fixed by #21005

Comments

@SV-97
SV-97 commented May 12, 2021

There still seem to be overflow/underflow problems for negative_binomial even in the "new" rng-API (see also issue 1494).

I'd expect the behaviour to be similar to poisson:

>>> import numpy as np
>>> np.random.default_rng(1).poisson(2**64)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "_generator.pyx", line 3088, in numpy.random._generator.Generator.poisson
  File "_common.pyx", line 815, in numpy.random._common.disc
  File "_common.pyx", line 374, in numpy.random._common.check_constraint
ValueError: lam value too large

but there are no errors.

In particular, the "tipover" point after which it overflows depends on p. For p=0.5 it's at 9223372037481356287 (so default_rng(1).negative_binomial(9223372037481356287, 0.5) > 0 and default_rng(1).negative_binomial(9223372037481356287+1, 0.5) < 0), a value which still works fine for p=0.9. For p=0 (which shouldn't even be allowed and should raise an error) I've previously seen it at exactly 2**63, but I can't reproduce that now; on my current machine, lambda n: np.random.default_rng(1).negative_binomial(n, 0) appears to be identically equal to -2**63 for all n.

Reproducing code example:

In [1]: from numpy.random import default_rng                                    

In [2]: default_rng(1).negative_binomial(1859539395891982336, 0.1)              
Out[2]: -9223372036854775808
...
In [14]: default_rng(seed=1).negative_binomial(10e20, 0.1)                      
Out[14]: -9223372036854775808

In [15]: default_rng(seed=1).negative_binomial(10e20, 0.5)                    
Out[15]: -9223372036854775808

import numpy as np
print(np.random.default_rng(1).negative_binomial(2**64, 0.1))

Error message:

The code samples don't raise an error regardless of np.seterr(all="raise").

NumPy/Python version information:

In [5]: import sys, numpy; print(numpy.__version__, sys.version)
1.18.5 3.8.3 (default, Jul  2 2020, 16:21:59) 
[GCC 7.3.0]

This has been verified on Python 3.8.3 with NumPy 1.18.5 and 1.17.4 on Linux Mint 20, as well as on Python 3.8.5 with NumPy 1.19.2 on macOS Big Sur (11.3).
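A hypothetical probe for the p-dependent tipover point described above; it assumes a single negative draw reliably signals the overflow (only approximately true right at the boundary), and on NumPy versions that contain the fix from #21005 these calls raise instead of overflowing:

import numpy as np

def overflows(n, p, seed=1):
    # A valid negative binomial draw can never be negative, so a negative
    # return value is taken as the overflow signal.
    return np.random.default_rng(seed).negative_binomial(n, p) < 0

def tipover(p, lo=1, hi=2**64):
    # Bisect for the smallest n at which the draw comes back negative.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if overflows(mid, p):
            hi = mid
        else:
            lo = mid
    return hi

print(tipover(0.5))  # around 9.2e18, as reported above
print(tipover(0.1))  # noticeably smaller, since the mean n*(1-p)/p is nine times larger at the same n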

@seberg
Member
seberg commented May 12, 2021

IIRC, there is probably a NaN or inf created somewhere in the algorithm, and casting that to integer creates the weird value. In any case, I suppose it should be an error or at least a warning.

Making floating-point error checking work here (so that np.seterr applies) would probably also be enough to at least give an "invalid value" warning.
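A minimal illustration of the suspected mechanism, assuming a typical x86-64 build: casting a non-finite double to int64 yields INT64_MIN, which is exactly the -9223372036854775808 returned above.

import numpy as np

# Non-finite doubles cast to int64 are undefined behaviour in C; on x86-64
# the result is INT64_MIN, matching the "weird value" in the report.
print(np.array([np.inf, np.nan]).astype(np.int64))
# [-9223372036854775808 -9223372036854775808]
# (depending on the NumPy version, an "invalid value" RuntimeWarning may also be emitted)
print(np.iinfo(np.int64).min)
# -9223372036854775808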

@bashtage
Contributor

I'm not surprised. There has not been systematic testing to determine the breakdown point of many of the algorithms. Many of the ones that aren't simple transformations of normals or uniforms will probably also have incorrect distributions for parameters near these points.

@bashtage
Contributor

For p=0 (which shouldn't even be allowed and raise an error)

I don't agree with this. Both 0 and 1 are in the support of p.

@SV-97
Author
SV-97 commented May 20, 2021

For p=0 (which shouldn't even be allowed and raise an error)

I don't agree with this. Both 0 and 1 are in the support of p.

I'm no expert in probability theory and was merely citing my resources (German Wikipedia states p ∈ (0,1), my university's resources (0,1]; I just checked some more, and English Wikipedia states [0,1] while Wolfram Alpha gives (0,1)). Just looking at the pmf, I'd guess that allowing 0 would leave the distribution unnormalized, since the pmf would be identically zero, but it's also very well possible that I'm missing something here. Allowing 1 seems fine to me. Maybe it's a formal detail that people like to ignore? Or it depends on which of the several alternative definitions one uses.
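For reference, in the parameterization NumPy documents (number of failures before the n-th success), the pmf is

P(X = k \mid n, p) = \frac{\Gamma(k + n)}{k!\, \Gamma(n)}\, p^{n} (1 - p)^{k}, \qquad k = 0, 1, 2, \ldots

so at p = 0 the factor p^n makes every term vanish (the masses sum to 0 rather than 1), while at p = 1 all of the mass lands on k = 0.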

@bashtage
Contributor

p=0 cleanly limits to infinity. However, here we have integer values, so 0 and parameters close to 0 should be excluded, since infinity cannot be represented (p<=0 is excluded, but the real exclusion range should depend on both p and n). p=1 cleanly limits to 0, since there are never any failures before n successes.
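A sketch of the kind of check this suggests, assuming the gamma-Poisson mixture that backs negative_binomial; the helper name and the exact threshold are illustrative, not NumPy's:

import numpy as np

INT64_MAX = np.iinfo(np.int64).max

def check_negative_binomial(n, p):
    # The sampler draws Poisson(Gamma(n, scale=(1 - p) / p)), whose mean is
    # n * (1 - p) / p, so the pair (n, p) is only usable when that mean is
    # finite and representable as an int64.
    mean = n * (1.0 - p) / p
    if not np.isfinite(mean) or mean > INT64_MAX:
        raise ValueError("n * (1 - p) / p is too large for int64")

check_negative_binomial(1859539395891982336, 0.5)  # passes
check_negative_binomial(1859539395891982336, 0.1)  # raises: the mean is ~1.7e19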

@rkern
Member
rkern commented May 25, 2021

FWIW, I like to take a pragmatic approach for these domain endpoints. I rarely take any textbook's pronouncement of the open-/closed-ness of the domains as a given. Those are typically the most variable things between sources. Some sources like to exclude some endpoints because it would make the general formula (that otherwise applies cleanly to the interior) more complicated. And other sources will include them because their treatment of the resulting infinities is obvious from the context (or they just don't care).

In most cases, there is no thought given at all to the pragmatic concerns of numerical computation in floating point or finite-integer arithmetic, and that's what really concerns me. I'm happy to modify the definition somewhat to handle these extreme cases somewhat felicitously.

The p=0 or p~0 case takes some thought, though. Every probability distribution that we implement isn't actually the ideal probability distribution. We're subject to constraints in that we simply can't represent all of the possible random values for most distributions. So, one way or the other, we're modifying the definition of the distribution in some way. There are two main approaches, but only one is workable here, I think. The one that doesn't work is to just reject anything above the largest integer that you can represent. That redefines the distribution to one that cuts off at the largest representable integer and renormalizes the remainder. The problem in this case is that you'd be rejecting forever.

The other approach is to permit that probability mass, but just reassign it to the last possible value. That is, if X ~ NegBin(n, p), then X' ~ min(NegBin(n, p), BIGGEST_INT), more or less. We could implement this relatively cleanly in random_poisson_ptrs() (which is where the overflow is happening), I think.

https://github.com/numpy/numpy/blob/main/numpy/random/src/distributions/distributions.c#L562

We could just clip the result of the floor() before casting it, I think. I think all of the rest of the logic would continue to work. That would also let us relax the limit that we place on the lam parameter of the Generator.poisson() method in Cython.
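A Python-level sketch of that clipping idea, purely illustrative (the real change would live in random_poisson_ptrs in distributions.c):

import numpy as np

INT64_MAX = np.iinfo(np.int64).max

def saturating_cast(x):
    # Mimic "clip the result of the floor() before casting": floor first,
    # then clamp anything not strictly below 2**63 (which also catches +inf
    # and NaN) to the largest representable int64 before the cast.
    x = np.floor(x)
    if not (x < 2.0**63):
        return np.int64(INT64_MAX)
    return np.int64(x)

print(saturating_cast(1.5e19))   # 9223372036854775807 instead of wrapping
print(saturating_cast(np.inf))   # 9223372036854775807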
