-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
BUG: Add np.random.dirichlet2 #5872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Add np.random.dirichlet2 #5872
Conversation
np.random.dirichlet fails for small alpha parameters (see numpy#5851). The new implmentation np.random.dirichlet2 switches to generation via beta RVs (stick breaking approach) whenever all alpha parameters are smaller than one.
|
||
diric = np.zeros(shape, np.float64) | ||
val_arr = <ndarray>diric | ||
val_data= <double*>PyArray_DATA(val_arr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hate the general look of this alignment scheme, but I understand you are just replicating what is elsewhere in this module. In any case, there should be a space before =
, even if there isn't in the equivalent line of dirichlet
.
If you want to add unit tests, it would make sense to at least add tests analogous to The first test The second test |
When I mentioned an analogy to http://en.wikipedia.org/wiki/Pairwise_summation in response to @jaimefrio's concerns about stability, I had in mind something like the following: from __futur
8000
e__ import print_function, division
import numpy as np
def mybeta(a, b):
# allow one or both of (a, b) to be zero
if a < 0 or b < 0:
raise ValueError
if a == 0 and b == 0:
return 0, 0
if a == 0:
return 0, 1
if b == 0:
return 1, 0
c = np.random.beta(a, b)
return c, 1-c
def _pairwise(a, start, stop, out):
# update `out` in-place and return the sub-sequence sum
n = stop - start
if n == 1:
return a[start]
else:
m = start + n // 2
if n == 2:
s0 = a[start]
s1 = a[m]
else:
s0 = _pairwise(a, start, m, out)
s1 = _pairwise(a, m, stop, out)
c0, c1 = mybeta(s0, s1)
out[start:m] *= c0
out[m:stop] *= c1
return s0 + s1
def _dirichlet2(a):
a = np.asarray(a, dtype=float)
out = np.ones_like(a)
_pairwise(a, 0, len(a), out)
return out
def dirichlet2(a, size=None):
if size is None:
return _dirichlet2(a)
else:
return np.array([_dirichlet2(a) for _ in range(size)])
def main():
for a in (
[1, 2, 3, 4],
[1e-20, 1e-18, 0.1],
[0, 0, 0, 42, 0],
[0, 0.1, 0, 0, 0, 1e-20, 0],
):
x = dirichlet2(a, size=10000)
print(x[0])
print(x.mean(axis=0))
print()
main()
This is Edit: updated to allow inputs that are exactly zero |
Anyone still interested in this? |
Yes, I think it is still an important issue. But there seems to be little concensus on how to proceed. |
How would that be done in practice. The new implementation would contain a code path, which would break numpy`strict backward compatibility requirement for random numbers. Would we have two different dirichlet implementation, a legacy one attached to the legacy random state instance, and a new attached to the new random state instances? |
Yes, that's right. Two different versions in different classes. This is
already the case for normals, everything that depends on normals, gammas,
and exponentials.
…On Fri, Apr 12, 2019, 08:02 Benjamin Trendelkamp-Schroer < ***@***.***> wrote:
How would that be done in practice. The new implementation would contain a
code path, which would break numpy`strict backward compatibility
requirement for random numbers.
Would we have two different dirichlet implementation, a legacy one
attached to the legacy random state instance, and a new attached to the new
random state instances?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#5872 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AFU5RaOIFfOnSFQ8cNY2vGwg6grAVWstks5vgC-ggaJpZM4EZXYz>
.
On Fri, Apr 12, 2019, 08:02 Benjamin Trendelkamp-Schroer < ***@***.***> wrote:
How would that be done in practice. The new implementation would contain a
code path, which would break numpy`strict backward compatibility
requirement for random numbers.
Would we have two different dirichlet implementation, a legacy one
attached to the legacy random state instance, and a new attached to the new
random state instances?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#5872 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AFU5RaOIFfOnSFQ8cNY2vGwg6grAVWstks5vgC-ggaJpZM4EZXYz>
.
|
#13163 was merged, you can rebase off HEAD. |
@trendelkampschroer If you have time to work on this, you can replace |
Now would be a good time to change the |
@trendelkampschroer, thanks for getting this started back in 2015 (!). @bashtage pinged you in June about resuming this work, but we haven't had a response. Here's another ping. I'd like to get this PR, or something similar, into NumPy to fix the problem with small As a reminder for anyone else, here's the problem that can occur with
Now that NEP 19 is implemented, we can change the stream of variates produced by a distribution. The legacy code in For what it's worth, here's a Python+NumPy implementation of the "stick-breaking" algorithm:
It handles the small values where the old code fails:
Here's a quick check that it is not generating any
|
Thanks a ton for looking into this and sorry for not getting back to you. I have started an implementation of the stick breaking approach as cython code, but was not able to compile it. I'll dig it up and paste the relevant code snippet. I am sorry, but I can't seem to find the necessary time to see that through, although I absolutely would love to contribute. Please feel free to take this up from here. Thanks so much for putting so much effort into a redesign of numpy random and into the numpy package in general! |
@WarrenWeckesser @mattip While developing this I have found that the beta distribution is suffering from slowness when parameters a, b have small values. I have added tests that demonstrate this. This affects the stick-breaking implementation as the small a, b case is the one that we wanted to fix in the first place. In order to make the stick breaking approach robust for cases with small alpha parameters we first need to adress the slowness in the beta generator. Please let me know If I need to use a different base branch than master for my PR. I just wanted to finally bring this into the world to give this some momentum. |
Closing. There is now #14924. |
np.random.dirichlet fails for small alpha parameters (see #5851).
The new implmentation np.random.dirichlet2 switches to generation
via beta RVs (stick breaking approach) whenever all alpha parameters
are smaller than one.