Hi team,
I've been experiencing a multinomial sampling problem, first described here, and then surfacing again here. The issue I get is that the `pvals` sum to greater than 1 with float32 precision.

As I've dug around, I found that this is a floating-point precision issue in `numpy`'s multinomial: internally, it casts everything to float64. It is discussed in this issue on the `numpy` repository.
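
For concreteness, here's a minimal sketch of the failure mode as I understand it (the repro code is mine, not from the linked threads):

```python
import numpy as np

# A probability vector normalized in float32 arithmetic.
p = np.random.rand(10).astype('float32')
p /= p.sum()

# Summed in float64 (the precision numpy.random.multinomial uses
# internally), the same values can land slightly above 1.0.
print(p.astype('float64').sum())

# When that happens, multinomial rejects the input with an error like:
#   ValueError: sum(pvals[:-1]) > 1.0
np.random.multinomial(100, p)
```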
I'm thinking of submitting a very small PR that builds upon @junpenglao's previous PR on `distributions/multinomial.py`.
```python
def _random(self, n, p, size=None):
    # numpy.random.multinomial casts pvals to float64 internally,
    # so cast explicitly up front.
    p = p.astype('float64')
    # Re-normalize each row in float64 precision, so values that were
    # normalized in float32 cannot sum to more than 1 after the cast.
    p = p / p.sum(axis=1, keepdims=True)
    if size == p.shape:
        ...
```
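
For anyone who wants to check the idea outside of PyMC3, here's a self-contained sketch of the same renormalization trick (the helper name is hypothetical, not part of the proposed PR):

```python
import numpy as np

def renormalize_float64(p):
    # Cast to float64 first, then renormalize each row, so the row sums
    # are exact in the precision numpy.random.multinomial uses internally.
    p = np.asarray(p, dtype='float64')
    return p / p.sum(axis=-1, keepdims=True)

# Rows normalized in float32, as they might come out of a float32 model.
p32 = np.random.rand(4, 6).astype('float32')
p32 /= p32.sum(axis=1, keepdims=True)

safe = renormalize_float64(p32)
draws = np.array([np.random.multinomial(100, row) for row in safe])
print(draws.shape)  # (4, 6)
```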
I have run one test using the same notebook in which I first discovered the problem, and the errors now go away. Empirically, the performance of the multinomial classification model is identical to what I got with the `+ 1E6` hack previously described.
I wanted to pitch this here first to see if there's something I'm missing before taking the time to put in the PR. Or should I put in the PR first and solicit code review?