Description
This issue is motivated by ongoing work in SciPy to improve the existing multivariate distributions and add new ones. We'd like to be sure that, if possible, enhancements that we make remain consistent with the current and future behavior of NumPy's multivariate distributions.
The specific issue is how the size
parameter interacts with broadcasting of the distribution parameters, some of which are 1-d or 2-d arrays rather than scalars.
I'm going to refer to gufunc broadcasting, so it will be helpful to have the gufunc signatures of the existing multivariate distributions (even though they are not actually implemented as gufuncs):
Name Parameters Gufunc signature
--------------------------- -------------- ----------------
dirichlet alpha (n) -> (n)
multinomial n, pvals (),(n) -> (n)
multivariate_normal mean, cov (n),(n,n) -> (n)
multivariate_hypergeometric colors, nsample (n),() -> (n)
@bashtage has a pull request to add broadcasting to multinomial
. As an example, suppose I have four values of n
, say 2, 5, 10 and 20, and I have two discrete probability distributions of length 3, say [0.1, 0.2, 0.7]
and [0.8, 0.2, 0.0]
, and I want to generate 10 multinomial variates for each combination of those inputs. With the code in @bashtage's PR, we can write:
In [85]: n = np.array([2, 5, 10, 20]).reshape(-1, 1)
In [86]: pvals = np.array([[0.1, 0.2, 0.7], [0.8, 0.2, 0.0]])
In [87]: rng = np.random.default_rng()
In [88]: x = rng.multinomial(n, pvals, size=(10, 4, 2))
In [89]: x.shape
Out[89]: (10, 4, 2, 3)
The shape of the output has three sources: (10,) is the number of variates that we wanted for each combination of input parameters; (4, 2) is the broadcast shape of the input parameters, without the dimension associated with the gufunc signature; (3,) is the dimension associated with each variate.
Note that size
is not the shape of the output array--size
does not include the dimension associated with the variates themselves.
The question is how to adapt this to the other multivariate distribution methods. The following is my attempt to generalize the behavior of @bashtage's PR, and document how the size
parameter must be set when broadcasting the distribution parameters.
Edit: the rest of this comment is obsolete; I'll keep the text here to preserve the context of existing responses, but the latest version of this is in a comment below.
Old comment, preserved for context.
- Let S be the shape of the result of broadcasting the distribution parameters according to gufunc broadcasting. (In the example above, this is (4, 2, 3); (4, 2) comes from the "actual" broadcasting (i.e. forming all the pairwise combinations), and (3,) is the dimension associated with the elementary operation of the gufunc.)
- If we want just one variate per broadcast combination of the distribution parameters, we can use
size=None
, and the shape of the output will be S. - Otherwise, drop from S the trailing dimension associated with the output term of the gufunc signature, and call this shape S1. (Continuing the example, this gives S1 = (4, 2).)
- Prepend to S1 the number of variates desired for each broadcast combination, say
m
, to get thesize
:size = (m,) + S1
. (Finishing the example: we wanted 10 variates per broadcast combination, so we havesize=(10, 4, 2)
.) (Note that the desired number of variates could be a tuple of integers instead of an integer. In the example, if we want the 10 variates arranged as a 2-d array with shape (2, 5), we could usesize=(2, 5, 4, 2)
.)
Here's how that looks for an example of broadcasting applied to the multivariate_normal
method. Suppose I want to generate random variates from the multivariate normal distribution. I have four different means, say [0, 0, 0], [1, -1, 1], [0, 0, 5] and [3, 3, 2], and two covariance matrices, each with shape (3, 3). For each combination of mean and covariance matrix, I want to generate 10 variates. Let's say I define mean
and cov
6B2D
as follows:
In [55]: mean = np.array([[0, 0, 0], [1, -1, 1], [0, 0, 5], [3, 3, 2]])
In [56]: cov = np.array([np.eye(3), 4*np.eye(3)])
so mean
has shape (4, 3) and cov
has shape (2, 3, 3). These variables are not compatible for broadcasting in a gufunc with signature (n),(n,n)->(n), so we'll reshape mean
to have shape (4, 1, 3) when we call multivariate_normal
. The gufunc broadcasting then gives a shape of (4, 2, 3). To get the correct size
parameter, we drop the final (3,), and prepend (10,), to get size=(10, 4, 2)
. The output will have shape (10, 4, 2, 3).
The call to multivariate_normal
would be
x = rng.multivariate_normal(mean[:, None, :], cov, size=(10, 4, 2))
Now the question is, is this the API that we want? If not, what alternatives should be considered? [0]
I think it is essential to have the desired API explicitly defined before proceeding with @bashtage's PR for the multinomial
method.
multinomial
method.[0] Edit: Ignore this. I have a radically different idea, but it would mean changing the API of all the existing distribution methods, both univariate and multivariate. The idea is to add a new parameter, say num_variates
, that gives the number of variates to generate for each broadcast combination. (The code could be written so that if num_variates
is given, size
is ignored.) The function can figure out the rest of the output shape from the shapes of the distribution parameters. With this API, the multinomial
example shown above becomes
x = rng.multinomial(n, pvals, num_variates=10)
and the multivariate_normal
example becomes
x = rng.multivariate_normal(mean[:, None, :], cov, num_variates=10)
I think this makes using broadcasting and setting the size much simpler, but I'm not sure there is interest in such a significant change.