Discuss: PR to fix multinomial precision issues · Issue #2469 · pymc-devs/pymc

@ericmjl


Hi team,

I've been experiencing a multinomial sampling problem, first described here, and then surfacing again here. The issue I run into is that the pvals sum to greater than 1 under float32 precision.

As I dug around, I found that this is a floating point precision issue in numpy's multinomial, which internally casts everything to float64. The problem is discussed in this issue on the numpy repository.
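To illustrate the failure mode, here is a small standalone sketch (not PyMC code; the vector size and Dirichlet source are just illustrative). Each float32 value casts to float64 exactly, so a vector that looks normalized in float32 can have a true (float64) sum slightly above 1:

```python
import numpy as np

# Search for a float32 probability vector whose exact (float64) sum
# exceeds 1 -- the situation that trips numpy's multinomial check
# ("sum(pvals[:-1]) > 1.0"). Roughly half of all rounded vectors land
# on the "above 1" side, so this finds one almost immediately.
rng = np.random.default_rng(0)
for _ in range(10_000):
    p = rng.dirichlet(np.ones(5)).astype("float32")
    if p.astype("float64").sum() > 1.0:
        break

# p's exact sum now exceeds 1 by a few float32 ulps.
```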

I'm thinking of submitting a very small PR that builds on @junpenglao's previous PR to distributions/multinomial.py.

def _random(self, n, p, size=None):
    # Cast p to float64, the precision numpy's multinomial uses internally.
    p = p.astype('float64')
    # Re-normalize so each row of p sums to 1 at float64 precision.
    p = p / (p.sum(axis=1, keepdims=True))
    if size == p.shape:
        ...

I ran one test using the same notebook in which I first discovered the problem, and the errors now go away. Empirically, the performance of the multinomial classification model is identical to what I got with the + 1E6 hack described previously.
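Outside of PyMC, the proposed fix amounts to the following minimal sketch (the vector size and trial count are just placeholders): upcast the float32 probabilities to float64 and renormalize before handing them to numpy's sampler.

```python
import numpy as np

# A float32 probability vector, standing in for model output.
p = np.random.dirichlet(np.ones(5)).astype("float32")

# Upcast and renormalize so the pvals sum to 1 at float64 precision,
# matching what numpy's multinomial sees internally.
p64 = p.astype("float64")
p64 = p64 / p64.sum(axis=-1, keepdims=True)

draw = np.random.multinomial(100, p64)
assert draw.sum() == 100  # counts always sum to the number of trials
```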

I wanted to pitch this here first to see if there's something I'm missing before taking the time to put in the PR. Or should I put in the PR first and solicit code review?
