8000 ENH: Support array-like `n` in random.random.multinomial by ColCarroll · Pull Request #9710 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

ENH: Support array-like n in random.random.multinomial #9710

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

ColCarroll
Copy link

This changes the numpy.random.multinomial API to be more similar to the API for numpy.random.binomial. In particular, something like this:

>>> np.random.binomial([10, 20], 0.2)
array([2, 7])

>>> np.random.multinomial([10, 20], [0.2, 0.8])
array([[ 1,  9],
       [ 4, 16]])

I did some light benchmarking, and it seems like the current use case is equally fast, while this provides at least some speed up over np.array([np.random.multinomial(n, pvals) for n in nvals]).

@@ -4549,7 +4549,7 @@ cdef class RandomState:

Parameters
----------
n : int
n : int or array_like of ints
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs .. versionchanged:: 1.14.0 with a note that allowing array_like here is new. You can find examples in the file.

mnix[i+d-1] = dn

i = i + d
for k from 0 <= k < m:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is outdated syntax for Pyrex compatibility

@eric-wieser
Copy link
Member
eric-wieser commented Sep 19, 2017

I'm wary of adding another (m)(n)->(m,n) function instead of trying to give normal broadcasting rules.

Can we make it work as:

>>> np.random.multinomial([[10, 20]], [0.2, 0.8])
array([[ 1,  9],
       [ 4, 16]])

I'm also wondering if accepting an ND p array would make sense, with out[i,j,k] = count_with(p[i,j,k]) (but broadcasting) - but that doesn't need to be dealt with now

@ColCarroll
Copy link
Author

I removed what I think was the pyrex compatibility, but I'm not sure I understand your point about broadcasting? Would you guess the right approach is (in case n and pvals are both arrays) to use <broadcast>PyArray_MultiIterNew? Still learning some of this Cython...

@ColCarroll
Copy link
Author

I tried a bit to use np.broadcasting, but it gets funny since the multinomial naturally wants to associate each integer n with a vector of p_vals. I'm happy to keep trying to update, but might need an example of a similar pattern (where the broadcasting was against all but the last dimension).

@eric-wieser
Copy link
Member
eric-wieser commented Oct 2, 2017

it gets funny since the multinomial naturally wants to associate each integer n with a vector of p_vals

This maybe feels like a feature? Force the user to make the integers n and the p_vals to be along different axes?

Essentially, there are two different ways we could go with broadcasting. Here I'm using A<x, y> as a shorthand for an array A with shape x,y, and mn for `multinomial:

  1. No broadcasting, concatenate dimensions

    • mn1(N<a>, P<b>) -> M<a,b> where M[a,b] is the count of event b (probability P[b]) occured given N[a] samples
    • mn1(N<a,b>, P<c,d>) -> M<a,b,c,d> where M[a,b,c,d] is the count of event c,d (probability P[c,d]) occured given N[a,b] samples
  2. Broadcast across all dimensions

    • mn2(N<a,1>, P<1,b>) -> M<a,b> where M[a,b] is the count of event b (probability P[0,b]) occured given N[a,0] samples
    • mn2(N<1,a>, P<b,1>) -> M<b,a> where M[b,a] is the count of event b (probability P[b,0]) occured given N[0,a] samples
    • mn2(N<1,b,1,d>, P<a,1,c,1>) -> M<a,b,c,d> where M[a,b,c,d] is the count of event (a, c) (probability P[a,0,c,0]) occured given N[0,b,0,d] samples.
    • extension: mn2(N<a, 1>, P<a, b>, axis=1) -> M<a,b> where M[b,a] is the count of event b given N[a,0] samples from the set of events (a,b) | a

Version 2 is more complex, but it does preserve the meanings of dimensions, which is often handy

@eric-wieser
Copy link
Member

PyArray_MultiIterNew is an outdated API - the replacement is NpyIter_MultiNew

@eric-wieser
Copy link
Member

Looking at the nditer API, it seems my suggestion above might be totally unreasonable...

@eric-wieser
Copy link
Member
eric-wieser commented Oct 2, 2017

Actually this is totally possible, there's just a quirk in the python-side interface to nditer ( #9808) that makes it non-obvious. Here's what I believe is a python implementation of what I'm describing.

Caveat: I have no idea what I'm doing with nditer, and would appreciate comments from those who do.

import numpy as np
from numpy.lib.stride_tricks import as_strided
from numpy.random import multinomial as m_old

def multinomial(n, p):
    it = np.nditer(
        (n, p, None),
        flags=[
            "zerosize_ok", 'multi_index', 'reduce_ok'  # no idea if these are needed
        ],
        op_flags=[
            ['readonly'],
            ['readonly'],
            ['writeonly', 'allocate']
        ]
    )
    n, p, out = it.operands

    p_shape = np.array(p.shape)
    n_shape = np.array(p.shape)
    out_shape = np.array(out.shape)

    # axes containing probabilities, counted from the end
    p_axs = p_shape != 1  # todo: let this be passed in
    p_axs, = p_axs.nonzero()
    p_axs -= p.ndim
    if not np.all(n_shape[p_axs] == 1):
        raise ValueError("Axes of n must be length 1 where probabilities are not")

    # don't iterate over these
    for ax_i in p_axs:
        it.remove_axis(it.ndim + ax_i)

    it.remove_multi_index()  # I think this just makes things faster?
    # it.enable_external_loop()  # TODO: maybe a speedup to be had by using this

    # convert to arrays so we can index these
    p_strides = np.array(p.strides)
    out_strides = np.array(out.strides)

    while not it.finished:
        ni, pi, outi = it.value
        # workaround for gh-9808
        pi = as_strided(pi, shape=p_shape[p_axs], strides=p_strides[p_axs])
        outi = as_strided(outi, shape=out_shape[p_axs], strides=out_strides[p_axs])

        outi[...] = m_old(ni, pi.ravel()).reshape(p.shape)
        it.iternext()

    return out

@ColCarroll
Copy link
Author

Digging this up and asking if there is anything I could do to help this along?

@mattip
Copy link
Member
mattip commented May 29, 2019

Closing, np.random.Generator().multinomial([10, 20], [0.2, 0.8]) works with the new API. Please reopen if needed.

@mattip mattip closed this May 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0