BUG: Add parameter check to negative_binomial #21005

Androp0v · 2022-02-06T18:06:58Z

Hi!

First contribution to numpy here, hope I did it right. Searched the repo to see if this had been solved, but I couldn't find any other PR regarding this. PR marked as draft since it has no tests. Will add tests if you deem this approach to solving #18997 acceptable.

Add parameter check to negative_binomial to avoid internally calling poisson distribution generator with an invalid lam parameter.

Fixes #18997, noticed that it Numpy 1.22.2, macOS 12.3, Arm64 (Apple Silicon), Python 3.8.5, calling the issue's sample code:

default_rng(1).negative_binomial(2**62, 0.1)

No longer makes a silent overflow, but instead enters an infinite loop. To solve it:

~~Modified the internal disc function to support parameter checks acting on pairs of parameters (the value that caused an overflow / infinite loop was dependant on both the n and p values).~~
~~Added extra parameter constraint type CONS_NEGATIVE_BINOMIAL_POISSON. This parameter check:~~
- Computes the mean value of the gamma distribution used as an intermediate step in building the negative binomial distribution.
- Computes the standard deviation of that gamma distribution.
- If a value within 10 standard deviations of the mean of such gamma function would invoke the poisson generator with a value of λ (lam) that would be above the (already present) max safe value of lam (POISSON_LAM_MAX), raise a ValueError.

Sample code

After the fix, the sample code (above), returns the following:

>>> np.random.default_rng(1).negative_binomial(2.0**62, 0.1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "_generator.pyx", line 3051, in numpy.random._generator.Generator.negative_binomial
ValueError: n too large or p too small

Notes:

The parameter check is very conservative (maybe too much?). Since the POISSON_LAM_MAX is already conservative (same 10-std deviation margin) and I added another 10-std margin on top of that for the negative_binomial generator (the gamma function would have to get to a sample value 10-stds above the mean, and then call the poisson function with that value and get another 10-std above the (poisson) mean to cause a numerical overflow. Maybe the threshold of the negative binomial check could be brought down a bit (5-stds for the gamma function max, for example, but that would mean that occasionally the poisson generator could be called with a lam value above what is possible by directly calling np.random.default_rng(1).poisson.

Thoughts?

bashtage · 2022-02-06T22:56:02Z

numpy/random/mtrand.pyx

@@ -3510,10 +3510,11 @@ cdef class RandomState:
        ...    print(i, "wells drilled, probability of one success =", probability)

        """
-        out = disc(&legacy_negative_binomial, &self._aug_state, size, self.lock, 2, 0,
+        out = disc(&legacy_negative_binomial, &self._aug_state, size, self.lock, 2, 0, 0,


Changes to mtrand aren't allowed in general if they would change the stream. All that could be done here would be to protect against values that lead to infinite loops.

Could you clarify what do you mean with changing the stream? Is changing the signature of disc not allowed?

Otherwise, note that I didn't change the output of any of the functions in mtrand at all (not even to protect against the infinite loops, calling np.random.negative_binomial(2**64, 0.1) would still result in an infinite loop, the checks only apply to the calls to default_rng().negative_binomial()), I just updated the function calls here since the new disc signature has an extra parameter, which is set to an ignored value for all functions in mtrand.

If a value slightly above lam max in your code would eventually return, then you have now changed the stream. I haven't reasoned carefully about this but as a general rule mtrand only changes for trivial things like doc string fixes and to keep to compiling.

See https://numpy.org/neps/nep-0019-rng-policy.html

Understood your point about the stream now (thanks!). However, upon reading NEP-19 on your link, it's still unclear to me if this applies to the new PRNGs too (i.e. are changes to np.random.default_rng().negative_binomial's stream allowed?).

Calls to np.random.negative_binomial (the one in mtrand) in this PR do not raise an error even if they would result in a value of lam too large to ever return, so the stream is not affected for any function in mtrand. Only the negative_binomial distribution that is created when calling a Generator method is affected.

Changes and improvements to Generator are definitely allowed. Changes like this that prevent infinite loops, which are really bugs, are definitely in bounds. I think it would help if you removed any changes to mtrand here so it would be easier to think about the changes to the checks independently from the NEP issues.

bashtage

I think the check shoudl be pulled up into _generator so that there is no need to alter _common. It would only make sense to alter _common if there were more examples of this type. The flow will be to determine if both inputs are scalar, and then use a scalar check. If either is not a scalar, then you should convert to arrays using the C-api using the correct conversion types, and check the elements there.

bashtage · 2022-02-15T18:35:53Z

numpy/random/_common.pyx

@@ -408,9 +408,18 @@ cdef int check_array_constraint(np.ndarray val, object name, constraint_type con
            raise ValueError("{0} value too large".format(name))
        elif not np.all(np.greater_equal(val, 0.0)):
            raise ValueError("{0} < 0 or {0} contains NaNs".format(name))
+    elif cons == CONS_NEGATIVE_BINOMIAL_POISSON:


< 8000 p dir="auto">Isn't this redundant with CONS_POISSON. Seems like the only difference is possibly in the exception.

bashtage · 2022-02-15T18:38:28Z

numpy/random/_common.pyx

+cdef int check_constraint_pair(double val0, double val1, object name_0, object name_1, constraint_type cons) except -1:
+    cdef double max_lam
+    if cons == CONS_NEGATIVE_BINOMIAL_POISSON:
+        max_lam = (1 - val1) / val1 * (val0 + 10 * np.sqrt(np.array(val0).astype(np.double)))


Why cast double values to array? This will be quite slow. Just use sqrt from libc on doubles.

Androp0v · 2022-02-17T12:23:58Z

I think the check shoudl be pulled up into _generator so that there is no need to alter _common. It would only make sense to alter _common if there were more examples of this type. The flow will be to determine if both inputs are scalar, and then use a scalar check. If either is not a scalar, then you should convert to arrays using the C-api using the correct conversion types, and check the elements there.

Thanks for taking the time to review this. I put the checks in _common as I thought all checks were being performed there, but upon further research I've found some functions in _generator also perform changes there. I'll change the implementation to match your suggestions.

…loop for large values

bashtage

I think it can be simplified a bit more.

numpy/random/_generator.pyx

…gative_binomial

bashtage · 2022-02-17T16:52:08Z

numpy/random/_generator.pyx

+            if np.any(np.greater(max_lam_arr, POISSON_LAM_MAX)):
+                raise ValueError("n too large or p too small")
+        else:
+            _dp = PyFloat_AsDouble(p)


In the scalar path, think you can use something like

cdef double *_dp cdef int64_t *_in _in = <int64_t*>np.PyArray_DATA(n_arr)[0] _dp = <double*>np.PyArray_DATA(p_arr)[0] check_constraint(<double>_in[0], 'n', CONS_POSITIVE_NOT_NAN) check_constraint(_dp[0], 'p', CONS_BOUNDED_GT_0_1)

since you already have paid the price for the conversion above to arrays, so can just grab the first element form each when scalar.

Okay, this surfaced some issues with my implementation: while numpy's binomial accepts floating point values for the n argument, they're truncated to integer values. On the other hand, negative_binomial used to accept floating point arguments without truncation, so the n array should actually be of type double.

Done now, hopefully!

bashtage · 2022-02-18T16:39:09Z

numpy/random/_generator.pyx

+            # call to the poisson distribution function with a value of lam too large.
+            max_lam_arr = (1 - p_arr) / p_arr * (n_arr + 10 * np.sqrt(n_arr))
+            if np.any(np.greater(max_lam_arr, POISSON_LAM_MAX)):
+                raise ValueError("n too large or p too small")


Could you add a bit to the Notes section of the docstring explaining the cross-parameter restriction?

Ideally should reference the note in the exception so that users know precisely what has happened

bashtage · 2022-02-18T16:40:11Z

numpy/random/_generator.pyx

+            check_constraint(_dp[0], 'p', CONS_BOUNDED_GT_0_1)
+            # Check that the choice of negative_binomial parameters won't result in a
+            # call to the poisson distribution function with a value of lam too large.
+            _dmax_lam = (1 - _dp[0]) / _dp[0] * (_dn[0] + 10 * np.sqrt(_dn[0]))


cdef double _dmax_lam above

bashtage · 2022-02-18T16:41:08Z

numpy/random/_generator.pyx

+            check_constraint(_dp[0], 'p', CONS_BOUNDED_GT_0_1)
+            # Check that the choice of negative_binomial parameters won't result in a
+            # call to the poisson distribution function with a value of lam too large.
+            _dmax_lam = (1 - _dp[0]) / _dp[0] * (_dn[0] + 10 * np.sqrt(_dn[0]))


use libc.math.sqrt

from libc.math cimport sqrt

bashtage · 2022-02-18T16:41:38Z

A few easy fixes that looks ready.

bashtage · 2022-02-18T17:23:29Z

numpy/random/_generator.pyx

+        within 10 sigma of the max acceptable value of the ``Generator.poisson``
+        method. This happens when :math:`p` is too low (a lot of failures happen
+        for every success) and :math:`n` is too big (a lot of sucesses are
+        allowed).


Can you add in the mathematical formula for computing the max?

bashtage · 2022-02-18T17:24:09Z

I think my last was the final change needed before we invite a committer to review.

numpy/random/_generator.pyx

bashtage · 2022-02-21T09:51:39Z

@rkern @mattip @charris This looks correct and sensible to me. It would be good if you could take a look.

rkern

LGTM! Thanks @Androp0v!

numpy/random/_generator.pyx

charris · 2022-03-13T19:16:32Z

Thanks @Androp0v .

Androp0v changed the title ~~Add parameter check to negative_binomial~~ BUG: Add parameter check to negative_binomial Feb 6, 2022

github-actions bot added the 00 - Bug label Feb 6, 2022

bashtage reviewed Feb 6, 2022

View reviewed changes

bashtage suggested changes Feb 15, 2022

View reviewed changes

Androp0v force-pushed the negative_binomial-checks branch 2 times, most recently from 464f302 to b916807 Compare February 17, 2022 15:25

Add parameter check in negative_binomial generator to avoid infinite …

f3d450d

…loop for large values

Androp0v force-pushed the negative_binomial-checks branch from b916807 to f3d450d Compare February 17, 2022 15:54

bashtage suggested changes Feb 17, 2022

View reviewed changes

numpy/random/_generator.pyx Outdated Show resolved Hide resolved

numpy/random/_generator.pyx Outdated Show resolved Hide resolved

Simplify check and add test for invalid combinations of p and n in ne… 8000

c1a95ff

…gative_binomial

Androp0v force-pushed the negative_binomial-checks branch from e29bf7a to c1a95ff Compare February 17, 2022 16:44

Androp0v marked this pull request as ready for review February 17, 2022 16:45

bashtage reviewed Feb 17, 2022

View reviewed changes

Androp0v marked this pull request as draft February 18, 2022 10:07

Performance improvements for negative_binomial checks

cc2ddfc

Androp0v force-pushed the negative_binomial-checks branch from 3d5ecd1 to cc2ddfc Compare February 18, 2022 12:08

Androp0v marked this pull request as ready for review February 18, 2022 12:10

bashtage reviewed Feb 18, 2022

View reviewed changes

bashtage reviewed Feb 18, 2022 F438

View reviewed changes

Androp0v force-pushed the negative_binomial-checks branch from fc084ea to ba31f5a Compare February 18, 2022 17:50

bashtage reviewed Feb 18, 2022

View reviewed changes

numpy/random/_generator.pyx Outdated Show resolved Hide resolved

Androp0v force-pushed the negative_binomial-checks branch from ba31f5a to c320949 Compare February 18, 2022 18:01

rkern approved these changes Feb 21, 2022

View reviewed changes

bashtage reviewed Feb 21, 2022

View reviewed changes

numpy/random/_generator.pyx Outdated Show resolved Hide resolved

Improved negative_binomial checks documentation

2686f44

Androp0v force-pushed the negative_binomial-checks branch from c320949 to 2686f44 Compare February 21, 2022 17:00

charris merged commit 55f31b2 into numpy:main Mar 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

BUG: Add parameter check to negative_binomial #21005

BUG: Add parameter check to negative_binomial #21005

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

BUG: Add parameter check to negative_binomial #21005

BUG: Add parameter check to negative_binomial #21005

Uh oh!

Conversation

Uh oh!

Sample code

Notes:

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!