Hi team,
I've been experiencing a multinomial sampling problem, first described here, and then surfacing again here. The issue I get is that the `pvals` sum to greater than 1 with float32 precision.

As I've dug around, I found that this is a floating-point precision issue in `numpy`'s multinomial: internally, it casts everything to float64. It is discussed in this issue on the `numpy` repository.
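
For concreteness, here's a minimal sketch of the failure mode as I understand it (the repro code is mine, not from the linked threads):

```python
import numpy as np

# A probability vector normalized in float32 arithmetic.
p = np.random.rand(10).astype('float32')
p /= p.sum()

# Summed in float64 (the precision numpy.random.multinomial uses
# internally), the same values can land slightly above 1.0.
print(p.astype('float64').sum())

# When that happens, multinomial rejects the input with an error like:
#   ValueError: sum(pvals[:-1]) > 1.0
np.random.multinomial(100, p)
```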
I'm thinking of submitting a very small PR that builds upon @junpenglao's previous PR on `distributions/multinomial.py`.
```python
def _random(self, n, p, size=None):
    # numpy.random.multinomial casts pvals to float64 internally,
    # so cast explicitly up front.
    p = p.astype('float64')
    # Re-normalize each row in float64 precision, so values that were
    # normalized in float32 cannot sum to more than 1 after the cast.
    p = p / p.sum(axis=1, keepdims=True)
    if size == p.shape:
        ...
```
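
For anyone who wants to check the idea outside of PyMC3, here's a self-contained sketch of the same renormalization trick (the helper name is hypothetical, not part of the proposed PR):

```python
import numpy as np

def renormalize_float64(p):
    # Cast to float64 first, then renormalize each row, so the row sums
    # are exact in the precision numpy.random.multinomial uses internally.
    p = np.asarray(p, dtype='float64')
    return p / p.sum(axis=-1, keepdims=True)

# Rows normalized in float32, as they might come out of a float32 model.
p32 = np.random.rand(4, 6).astype('float32')
p32 /= p32.sum(axis=1, keepdims=True)

safe = renormalize_float64(p32)
draws = np.array([np.random.multinomial(100, row) for row in safe])
print(draws.shape)  # (4, 6)
```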
I have run one test using the same notebook in which I first discovered the problem, and the errors now go away. Empirically, the performance of the multinomial classification model is identical to what I got with the `+ 1E6` hack previously described.
I wanted to pitch this here first to see if there's something I'm missing before taking the time to put in the PR. Or should I put in the PR first and solicit code review?