BUG: Add parameter check to negative_binomial #21005
Conversation
numpy/random/mtrand.pyx
Outdated
@@ -3510,10 +3510,11 @@ cdef class RandomState:
         ... print(i, "wells drilled, probability of one success =", probability)

         """
-        out = disc(&legacy_negative_binomial, &self._aug_state, size, self.lock, 2, 0,
+        out = disc(&legacy_negative_binomial, &self._aug_state, size, self.lock, 2, 0, 0,
Changes to mtrand aren't allowed in general if they would change the stream. All that could be done here would be to protect against values that lead to infinite loops.
Could you clarify what you mean by changing the stream? Is changing the signature of `disc` not allowed?
Otherwise, note that I didn't change the output of any of the functions in mtrand at all (not even to protect against the infinite loops: calling `np.random.negative_binomial(2**64, 0.1)` would still result in an infinite loop; the checks only apply to calls to `default_rng().negative_binomial()`). I just updated the function calls here since the new `disc` signature has an extra parameter, which is set to an ignored value for all functions in mtrand.
If a value slightly above lam max in your code would eventually return, then you have now changed the stream. I haven't reasoned carefully about this, but as a general rule mtrand only changes for trivial things like docstring fixes and to keep it compiling.
Understood your point about the stream now (thanks!). However, upon reading NEP 19 at your link, it's still unclear to me whether this applies to the new PRNGs too (i.e. are changes to the stream of `np.random.default_rng().negative_binomial` allowed?).
Calls to `np.random.negative_binomial` (the one in mtrand) in this PR do not raise an error even if they would result in a value of lam too large to ever return, so the stream is not affected for any function in mtrand. Only the `negative_binomial` distribution that is created when calling a `Generator` method is affected.
Changes and improvements to `Generator` are definitely allowed. Changes like this that prevent infinite loops, which are really bugs, are definitely in bounds. I think it would help if you removed any changes to mtrand here so it would be easier to think about the changes to the checks independently from the NEP issues.
I think the check should be pulled up into `_generator` so that there is no need to alter `_common`. It would only make sense to alter `_common` if there were more examples of this type. The flow will be to determine if both inputs are scalar, and then use a scalar check. If either is not a scalar, then you should convert to arrays using the C-API with the correct conversion types, and check the elements there.
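A plain-Python illustration of that flow (the real implementation goes through the Cython/NumPy C-API; the helper name and the `POISSON_LAM_MAX` stand-in below are assumptions for the sketch, not actual numpy code):

```python
import numpy as np

POISSON_LAM_MAX = 9.2e18  # stand-in for numpy's internal limit, not the exact value

def _check_negative_binomial_params(n, p):
    """Sketch of the scalar/array flow described above, not the actual numpy code."""
    if np.isscalar(n) and np.isscalar(p):
        # Scalar path: plain float arithmetic, no array allocation.
        max_lam = (1 - p) / p * (n + 10 * np.sqrt(n))
        if max_lam > POISSON_LAM_MAX:
            raise ValueError("n too large or p too small")
    else:
        # Array path: convert to double arrays and check every element.
        n_arr = np.asarray(n, dtype=np.double)
        p_arr = np.asarray(p, dtype=np.double)
        max_lam = (1 - p_arr) / p_arr * (n_arr + 10 * np.sqrt(n_arr))
        if np.any(max_lam > POISSON_LAM_MAX):
            raise ValueError("n too large or p too small")
```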
numpy/random/_common.pyx
Outdated
@@ -408,9 +408,18 @@ cdef int check_array_constraint(np.ndarray val, object name, constraint_type con
            raise ValueError("{0} value too large".format(name))
        elif not np.all(np.greater_equal(val, 0.0)):
            raise ValueError("{0} < 0 or {0} contains NaNs".format(name))
    elif cons == CONS_NEGATIVE_BINOMIAL_POISSON:
Seems like the only difference from `CONS_POISSON` is possibly in the exception.
numpy/random/_common.pyx
Outdated
cdef int check_constraint_pair(double val0, double val1, object name_0, object name_1, constraint_type cons) except -1:
    cdef double max_lam
    if cons == CONS_NEGATIVE_BINOMIAL_POISSON:
        max_lam = (1 - val1) / val1 * (val0 + 10 * np.sqrt(np.array(val0).astype(np.double)))
Why cast double values to array? This will be quite slow. Just use `sqrt` from libc on doubles.
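A minimal sketch of that suggestion, reusing the names from the diff above and assuming the module's existing declarations (`constraint_type`, `CONS_NEGATIVE_BINOMIAL_POISSON`, `POISSON_LAM_MAX`) are in scope; the error message is illustrative, not the final wording:

```cython
from libc.math cimport sqrt

cdef int check_constraint_pair(double val0, double val1, object name_0,
                               object name_1, constraint_type cons) except -1:
    cdef double max_lam
    if cons == CONS_NEGATIVE_BINOMIAL_POISSON:
        # val0 is n, val1 is p; libc's sqrt works on C doubles directly,
        # avoiding the temporary numpy array and the Python-level round trip.
        max_lam = (1 - val1) / val1 * (val0 + 10 * sqrt(val0))
        if max_lam > POISSON_LAM_MAX:
            raise ValueError("{0} too large or {1} too small".format(name_0, name_1))
    return 0
```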
Thanks for taking the time to review this. I put the checks in …
I think it can be simplified a bit more.
numpy/random/_generator.pyx
Outdated
            if np.any(np.greater(max_lam_arr, POISSON_LAM_MAX)):
                raise ValueError("n too large or p too small")
        else:
            _dp = PyFloat_AsDouble(p)
In the scalar path, I think you can use something like
cdef double *_dp
cdef int64_t *_in
_in = <int64_t*>np.PyArray_DATA(n_arr)
_dp = <double*>np.PyArray_DATA(p_arr)
check_constraint(<double>_in[0], 'n', CONS_POSITIVE_NOT_NAN)
check_constraint(_dp[0], 'p', CONS_BOUNDED_GT_0_1)
since you have already paid the price for the conversion to arrays above, you can just grab the first element from each when scalar.
Okay, this surfaced some issues with my implementation: while numpy's `binomial` accepts floating point values for `n` by converting them to integers, `negative_binomial` used to accept floating point arguments without truncation, so the `n` array should actually be of type double.
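For context, a quick illustration of the non-integer `n` behaviour being discussed (seed and values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(12345)
# Generator.negative_binomial documents n as a float > 0, so non-integer
# values are valid and must not be truncated by the parameter check.
print(rng.negative_binomial(2.5, 0.5, size=3))
```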
Done now, hopefully!
numpy/random/_generator.pyx
Outdated
            # call to the poisson distribution function with a value of lam too large.
            max_lam_arr = (1 - p_arr) / p_arr * (n_arr + 10 * np.sqrt(n_arr))
            if np.any(np.greater(max_lam_arr, POISSON_LAM_MAX)):
                raise ValueError("n too large or p too small")
Could you add a bit to the Notes section of the docstring explaining the cross-parameter restriction?
Ideally the exception should reference the note so that users know precisely what has happened.
numpy/random/_generator.pyx
Outdated
            check_constraint(_dp[0], 'p', CONS_BOUNDED_GT_0_1)
            # Check that the choice of negative_binomial parameters won't result in a
            # call to the poisson distribution function with a value of lam too large.
            _dmax_lam = (1 - _dp[0]) / _dp[0] * (_dn[0] + 10 * np.sqrt(_dn[0]))
Declare `cdef double _dmax_lam` above.
numpy/random/_generator.pyx
Outdated
            check_constraint(_dp[0], 'p', CONS_BOUNDED_GT_0_1)
            # Check that the choice of negative_binomial parameters won't result in a
            # call to the poisson distribution function with a value of lam too large.
            _dmax_lam = (1 - _dp[0]) / _dp[0] * (_dn[0] + 10 * np.sqrt(_dn[0]))
use libc.math.sqrt
from libc.math cimport sqrt
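Putting these suggestions together (the `cdef double _dmax_lam` declaration with the other locals, and libc's `sqrt` in place of `np.sqrt`), the scalar path might end up looking roughly like this sketch rather than the exact final diff:

```cython
        # _dn and _dp point at the first elements of the converted double arrays
        check_constraint(_dn[0], 'n', CONS_POSITIVE_NOT_NAN)
        check_constraint(_dp[0], 'p', CONS_BOUNDED_GT_0_1)
        # Check that the choice of negative_binomial parameters won't result in a
        # call to the poisson distribution function with a value of lam too large.
        _dmax_lam = (1 - _dp[0]) / _dp[0] * (_dn[0] + 10 * sqrt(_dn[0]))
        if _dmax_lam > POISSON_LAM_MAX:
            raise ValueError("n too large or p too small")
```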
A few easy fixes and this looks ready.
numpy/random/_generator.pyx
Outdated
        within 10 sigma of the max acceptable value of the ``Generator.poisson``
        method. This happens when :math:`p` is too low (a lot of failures happen
        for every success) and :math:`n` is too big (a lot of successes are
        allowed).
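For reference, the cross-parameter bound being described (it is the quantity computed in the diffs above) can be written as

```math
\lambda_{\max} = \frac{1 - p}{p}\,\bigl(n + 10\sqrt{n}\bigr) \le \mathrm{POISSON\_LAM\_MAX}
```

Since the internal gamma draw has mean :math:`n(1-p)/p` and standard deviation :math:`\sqrt{n}\,(1-p)/p`, this is the gamma mean plus ten of its standard deviations, i.e. the check rejects parameter pairs whose sampled ``lam`` could plausibly exceed the Poisson limit.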
Can you add in the mathematical formula for computing the max?
Done!
I think my last commit was the final change needed before we invite a committer to review.
LGTM! Thanks @Androp0v!
Thanks @Androp0v.
Hi!
First contribution to numpy here, hope I did it right.
Searched the repo to see if this had been solved, but I couldn't find any other PR regarding this. PR marked as draft since it has no tests. Will add tests if you deem this approach to solving #18997 acceptable.

Add parameter check to `negative_binomial` to avoid internally calling the poisson distribution generator with an invalid lam parameter.

Fixes #18997. Noticed that on NumPy 1.22.2, macOS 12.3, Arm64 (Apple Silicon), Python 3.8.5, calling the issue's sample code no longer makes a silent overflow, but instead enters an infinite loop. To solve it:

- Modified the internal `disc` function to support parameter checks acting on pairs of parameters (the value that caused an overflow / infinite loop was dependent on both the `n` and `p` values).
- Added an extra parameter constraint type, `CONS_NEGATIVE_BINOMIAL_POISSON`. This parameter check verifies that the largest lam value that could be passed to the poisson generator stays below the maximum allowed (`POISSON_LAM_MAX`); if it does not, raise a `ValueError`.

Sample code

After the fix, the sample code (above) returns the following:

Notes:

The parameter check is very conservative (maybe too much?), since `POISSON_LAM_MAX` is already conservative (same 10-std-deviation margin) and I added another 10-std margin on top of that for the `negative_binomial` generator: the gamma function would have to produce a sample value 10 stds above the mean, and then the poisson function would have to be called with that value and land another 10 stds above the (poisson) mean, to cause a numerical overflow. Maybe the threshold of the negative binomial check could be brought down a bit (5 stds for the gamma function max, for example), but that would mean that occasionally the poisson generator could be called with a lam value above what is possible by directly calling `np.random.default_rng(1).poisson`.

Thoughts?
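For illustration only (this is not the issue's original snippet), a parameter combination that trips the new check behaves roughly like this on a NumPy build that includes the change:

```python
import numpy as np

rng = np.random.default_rng(1)
try:
    # The implied Poisson lam for this pair is far above POISSON_LAM_MAX,
    # so the check raises immediately instead of looping forever.
    rng.negative_binomial(2**62, 1e-10)
except ValueError as exc:
    print(exc)
```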
.Thoughts?