BUG: random: dirichlet(alpha) can return nans in some cases.
#24210
Comments
Don't call the C function random_beta() with both parameters `a` and `b` set to 0. In the case where this would occur, we know that the remaining values in the random vector being generated must be 0, so we set the remaining values to 0 and exit the loop that is generating the vector. This change also disallows alpha being a vector of all zeros. Closes numpygh-24210.
Don't call the C function random_beta() with both parameters `a` and `b` set to 0. In the case where this would occur, we know that the remaining values in the random vector being generated must be 0, so we set the remaining values to 0 and exit the loop that is generating the vector. Closes numpygh-24210.
I proposed a fix in #24220. In that pull request, I disallowed an input that is all zeros. However, I'm now wondering if instead of an error, the return value in that case should just be a vector of zeros. Any opinions?
I think that returning a vector of zeros would be more consistent with #23440
The only thing I don't like about returning a vector of zeros is that it breaks the invariant that the sum of the vector is 1. But if the invariant is stated more carefully as "the sum of the variate components corresponding to nonzero
Unless anyone brings up other points to consider, I'll update the PR later today or tomorrow to return all zeros instead of raising an exception.
Don't call the C function random_beta() with both parameters `a` and `b` set to 0. In the case where this would occur, we know that the remaining values in the random vector being generated must be 0, so we can break out of the loop early. After this change, when alpha is all zero, the random variates will also be all zero. Closes numpygh-24210.
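A rough pure-Python sketch of the early-exit idea described in that commit message, for anyone who wants to see the control flow without reading the Cython. It is not the NumPy implementation: the function name `dirichlet_small_alpha` is hypothetical, `random.betavariate` stands in for the C function, and treating `a == 0` as a draw of 0 and `b == 0` as a draw of 1 is an assumption about how a single zero parameter is handled.

```python
import random


def dirichlet_small_alpha(alpha, rng=random):
    """Stick-breaking sketch that never asks for a beta(0, 0) draw (hypothetical helper)."""
    k = len(alpha)
    # tail[i] = alpha[i] + alpha[i + 1] + ... + alpha[k - 1]; tail[k] = 0
    tail = [0.0] * (k + 1)
    for i in range(k - 1, -1, -1):
        tail[i] = tail[i + 1] + alpha[i]

    out = [0.0] * k
    remaining = 1.0
    for i in range(k - 1):
        a, b = alpha[i], tail[i + 1]
        if a == 0.0 and b == 0.0:
            # Both parameters are zero: every remaining component must be 0,
            # so stop here instead of sampling.
            return out
        if a == 0.0:
            v = 0.0  # assumption: beta(0, b) is a point mass at 0
        elif b == 0.0:
            v = 1.0  # assumption: beta(a, 0) is a point mass at 1
        else:
            v = rng.betavariate(a, b)
        out[i] = remaining * v
        remaining *= 1.0 - v
    out[k - 1] = remaining
    return out


print(dirichlet_small_alpha([0.05, 0.02, 0.0, 0.0]))
print(dirichlet_small_alpha([0.0, 0.0, 0.0]))
```

With this sketch, trailing zeros in `alpha` produce zeros rather than `nan`, and an all-zero `alpha` of length two or more produces all zeros. A length-1 `alpha` skips the loop entirely and yields `[1.0]`, which is the one-dimensional behaviour questioned later in the thread.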
I was looking at the PR and was going to suggest simplifying the branch a bit and noting that subnormals may need to be accounted for. Which led me to this:
which returns a mix of
To me this looks like it isn't correct to return all zeros? Dirichlet doesn't have all dimensions converge towards 0 in the limit of
Even more obvious: a single-dimensional dirichlet always returns 1 right now. (The subnormal problem is that within
Yeah, and to be clear, the handling of subnormals by the C function
Right, I first thought you could just use
Another quick note:
I created a separate issue about
I'll submit a PR with a fix for that issue.
Describe the issue:
When all the values in `alpha` are less than 0.1, and `alpha` ends in two or more zeros, the components of the variates returned by `dirichlet(alpha)` corresponding to those final zeros will be `nan`.

For example,
When all the values in `alpha` are less than 0.1, `dirichlet` uses the algorithm that is based on the beta distribution. The problem occurs because `dirichlet` ends up calling the C function `random_beta` with both parameters `a` and `b` being 0, which results in `random_beta` returning `nan`. Currently, the public API for `beta` requires both `a` and `b` to be positive; this is checked before the `beta` method calls the C function `random_beta`. The `dirichlet` code calls `random_beta` directly, so that validation is bypassed.

It looks like `random_beta` handles one parameter being 0 in a manner consistent with the reasoning that allows `dirichlet` to have zeros in `alpha`. That's why `nan`s are produced only when there are two or more zeros at the end of `alpha`, because that is the only case where `dirichlet` will call `random_beta` with both parameters being 0.

It shouldn't be too difficult to fix `dirichlet` to handle the case where all the values in `alpha` are less than 0.1, and two or more values at the end of `alpha` are 0.

But there is a remaining question that was not brought up in #22547 or #23440: how should an input that is all zeros be handled? Some options (ordered by my preference):
`len(alpha) - 1` zeros and single 1 at a random position in the vector). (On second thought, there probably isn't any reasonable justification for this.)

Runtime information:
Context for the issue:
No response
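The diagnosis in the description can be checked with a small pure-Python sketch. It assumes the beta-based loop pairs each `alpha[i]` with the sum of the entries after it, which is the usual stick-breaking construction; the helper names `beta_parameter_pairs` and `has_double_zero_call` are made up for illustration and are not part of NumPy.

```python
def beta_parameter_pairs(alpha):
    """List the (a, b) pairs a stick-breaking loop would hand to the beta sampler."""
    pairs = []
    for i in range(len(alpha) - 1):
        a = alpha[i]
        b = sum(alpha[i + 1:])  # total of the entries after position i
        pairs.append((a, b))
    return pairs


def has_double_zero_call(alpha):
    """True if some step would request a beta draw with a == 0 and b == 0."""
    return any(a == 0.0 and b == 0.0 for a, b in beta_parameter_pairs(alpha))


for alpha in ([0.05, 0.02, 0.0], [0.05, 0.0, 0.0], [0.0, 0.0, 0.05]):
    print(alpha, beta_parameter_pairs(alpha), has_double_zero_call(alpha))
```

Under that assumption, a single trailing zero or zeros at the front of `alpha` never produce an `(a, b) = (0, 0)` pair; only two or more zeros at the end do, which matches the pattern reported above.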