-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
BUG: Add np.random.dirichlet2 #5872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4602,6 +4602,141 @@ cdef class RandomState: | |
|
||
return diric | ||
|
||
def dirichlet2(self, object alpha, size=None): | ||
""" | ||
dirichlet2(alpha, size=None) | ||
|
||
Draw samples from the Dirichlet distribution. | ||
|
||
Draw `size` samples of dimension k from a Dirichlet distribution. A | ||
Dirichlet-distributed random variable can be seen as a multivariate | ||
generalization of a Beta distribution. Dirichlet pdf is the conjugate | ||
prior of a multinomial in Bayesian inference. | ||
|
||
Parameters | ||
---------- | ||
alpha : array | ||
Parameter of the distribution (k dimension for sample of | ||
dimension k). | ||
size : int or tuple of ints, optional | ||
Output shape. If the given shape is, e.g., ``(m, n, k)``, then | ||
``m * n * k`` samples are drawn. Default is None, in which case a | ||
single value is returned. | ||
|
||
Returns | ||
------- | ||
samples : ndarray, | ||
The drawn samples, of shape (size, alpha.ndim). | ||
|
||
Notes | ||
----- | ||
.. math:: X \\approx \\prod_{i=1}^{k}{x^{\\alpha_i-1}_i} | ||
|
||
Uses the following property for computation: for each dimension, | ||
draw a random sample y_i from a standard gamma generator of shape | ||
`alpha_i`, then | ||
:math:`X = \\frac{1}{\\sum_{i=1}^k{y_i}} (y_1, \\ldots, y_n)` is | ||
Dirichlet distributed. | ||
|
||
References | ||
---------- | ||
.. [1] David McKay, "Information Theory, Inference and Learning | ||
Algorithms," chapter 23, | ||
http://www.inference.phy.cam.ac.uk/mackay/ | ||
.. [2] Wikipedia, "Dirichlet distribution", | ||
http://en.wikipedia.org/wiki/Dirichlet_distribution | ||
|
||
Examples | ||
-------- | ||
Taking an example cited in Wikipedia, this distribution can be used if | ||
one wanted to cut strings (each of initial length 1.0) into K pieces | ||
with different lengths, where each piece had, on average, a designated | ||
average length, but allowing some variation in the relative sizes of | ||
the pieces. | ||
|
||
>>> s = np.random.dirichlet((10, 5, 3), 20).transpose() | ||
|
||
>>> plt.barh(range(20), s[0]) | ||
>>> plt.barh(range(20), s[1], left=s[0], color='g') | ||
>>> plt.barh(range(20), s[2], left=s[0]+s[1], color='r') | ||
>>> plt.title("Lengths of Strings") | ||
|
||
""" | ||
|
||
#================= | ||
# Pure python algo | ||
#================= | ||
#alpha = N.atleast_1d(alpha) | ||
#k = alpha.size | ||
|
||
#if n == 1: | ||
# val = N.zeros(k) | ||
# for i in range(k): | ||
# val[i] = sgamma(alpha[i], n) | ||
# val /= N.sum(val) | ||
#else: | ||
# val = N.zeros((k, n)) | ||
# for i in range(k): | ||
# val[i] = sgamma(alpha[i], n) | ||
# val /= N.sum(val, axis = 0) | ||
# val = val.T | ||
|
||
#return val | ||
|
||
cdef npy_intp k | ||
cdef npy_intp totsize | ||
cdef ndarray alpha_arr, val_arr | ||
cdef double *alpha_data | ||
cdef double *val_data | ||
cdef npy_intp i, j | ||
cdef double acc, invacc | ||
|
||
cdef double alpha_max | ||
cdef double alpha_sum | ||
|
||
k = len(alpha) | ||
alpha_arr = <ndarray>PyArray_ContiguousFromObject(alpha, NPY_DOUBLE, 1, 1) | ||
alpha_data = <double*>PyArray_DATA(alpha_arr) | ||
|
||
shape = _shape_from_size(size, k) | ||
|
||
diric = np.zeros(shape, np.float64) | ||
val_arr = <ndarray>diric | ||
val_data= <double*>PyArray_DATA(val_arr) | ||
|
||
i = 0 | ||
totsize = PyArray_SIZE(val_arr) | ||
alpha_max = alpha_arr.max() | ||
|
||
if alpha_max < 1.0: | ||
alpha_sum = alpha_arr.sum() | ||
"""Generation via beta-RVs""" | ||
with self.lock, nogil: | ||
while i < totsize: | ||
acc = 1.0 | ||
invacc = alpha_sum | ||
for j from 0<= j < k-1: | ||
invacc -= alpha_data[j] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this numerically stable? Or is there the risk that floating point rounding will have us end with a negative
and index into it to retrieve the partial sums. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To answer my own question:
So we need to do something about this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes thats true of course. I had also thought about storing the cumsum of the alphas, but was hesitant to introduce another array. I think we also need to guard against underflow of acc. Let me think about this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe use something recursive like http://en.wikipedia.org/wiki/Pairwise_summation? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's only a mitigation, not a fix. Consider the case of an failure with On Wed, May 13, 2015 at 2:42 PM, argriffing notifications@github.com
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If there is a failure with only two elements in the array then I'm not understanding something because the 2-element case is the same as the beta distribution. I'm assuming we can sample from a beta distribution. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This shouldn't stop the technical discussion, but FTR I'm -1 on merging
|
||
val_data[i+j] = acc * rk_beta(self.internal_state, alpha_data[j], invacc) | ||
acc = acc - val_data[i+j] | ||
val_data[i+(k-1)] = acc | ||
i = i+k | ||
|
||
else: | ||
with self.lock, nogil: | ||
while i < totsize: | ||
acc = 0.0 | ||
for j from 0 <= j < k: | ||
val_data[i+j] = rk_standard_gamma(self.internal_state, | ||
alpha_data[j]) | ||
acc = acc + val_data[i+j] | ||
invacc = 1/acc | ||
for j from 0 <= j < k: | ||
val_data[i+j] = val_data[i+j] * invacc | ||
i = i + k | ||
|
||
return diric | ||
|
||
# Shuffling and permutations: | ||
def shuffle(self, object x): | ||
""" | ||
|
@@ -4755,6 +4890,7 @@ logseries = _rand.logseries | |
multivariate_normal = _rand.multivariate_normal | ||
multinomial = _rand.multinomial | ||
dirichlet = _rand.dirichlet | ||
dirichlet2 = _rand.dirichlet2 | ||
|
||
shuffle = _rand.shuffle | ||
permutation = _rand.permutation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hate the general look of this alignment scheme, but I understand you are just replicating what is elsewhere in this module. In any case, there should be a space before
=
, even if there isn't in the equivalent line ofdirichlet
.