multinomial casts input to np.float64 · Issue #8317 · numpy/numpy · GitHub

multinomial casts input to np.float64 #8317


Closed
QCaudron opened this issue Nov 25, 2016 · 15 comments · Fixed by #18482

Comments

@QCaudron

np.random.multinomial seems to cast its second argument pvals to an array of dtype float64. Whilst this isn't an issue in and of itself, I've come across an interesting scenario where I have an array of dtype float32 whose sum is 0.99999994, but when its elements are cast to float64 and re-summed, the total becomes 1.0000000222053895, which causes np.random.multinomial to raise ValueError: sum(pvals[:-1]) > 1.0.

For a little context, I have an array of dtype float32 because it's come off the GPU, where single precision calculations are a great deal faster.
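The reported behaviour can be reproduced with a small sketch (the values below are illustrative, not the reporter's actual data): a float32 pvals with a couple of dominant entries and several tiny ones sums to exactly 1.0 in float32, yet the float64 sum of the same elements creeps above 1.

```python
import numpy as np

# Hypothetical pvals in the shape the report describes: a couple of
# dominant terms plus many tiny ones, stored as float32.
pvals = np.array([0.5, 0.5] + [1e-9] * 8, dtype=np.float32)

# In float32 the tiny terms vanish during accumulation (each is far
# below half an ulp of 1.0), so the sum is exactly 1.0.
print(pvals.sum())  # 1.0

# Cast to float64, every term counts, and the sum exceeds 1.
print(pvals.astype(np.float64).sum())  # slightly greater than 1.0

# multinomial casts pvals to float64 internally, so it sees the larger
# sum and rejects the input.
try:
    np.random.multinomial(1, pvals)
except ValueError as e:
    print(e)  # exact wording varies by numpy version
```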

@charris
Member
charris commented Nov 25, 2016

Hmm, I suspect the float32 sum is off due to rounding. Not by much, though

In [18]: np.nextafter(np.float32(1), np.float32(0))
Out[18]: 0.99999994

The question is, what is pvals[-1]? If it is large I suspect another problem somewhere. How many values are in pvals?

@QCaudron
Author

pvals[-1] is typically also very small, along with most of the other 64 terms in pvals (one or two terms tend to dominate).

Happy to upload a .npy if that'd help.

@charris
Member
charris commented Nov 26, 2016

Happy to upload a .npy if that'd help.

This definitely sounds like a roundoff problem in computing the sum, and maybe in normalizing also. Not sure what your best bet for a fix is, probably depends on the details of your application.

@QCaudron
Author

I'm dealing with it application-side; I just thought it might be worth noting, as this is somewhat surprising behaviour - it certainly took me a while to work out what was going on. Is there any reason a float32 array wouldn't be acceptable to np.random.multinomial?

@charris
Member
charris commented Nov 26, 2016

Is there any reason a float32 array wouldn't be acceptable to np.random.multinomial?

No. But this is a classic setup for floating point roundoff errors: lots of small values and a couple of big ones. All the small values lose precision when they are added to the big ones. After this you won't be surprised, you will expect problems ;)
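That precision loss can be seen in a two-line sketch (the magnitudes are chosen for illustration; any small value below half an ulp of the large one behaves the same way):

```python
import numpy as np

# Near 1e8 the gap between adjacent float32 values is 8, so adding 1
# rounds straight back to the original value: the small term is lost.
big = np.float32(1e8)
small = np.float32(1.0)
print(big + small == big)  # True

# In float64 the same addition is exact and the sum changes.
print(np.float64(1e8) + np.float64(1.0) == np.float64(1e8))  # False
```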

@ericmjl
ericmjl commented Jul 28, 2017

Many apologies for resurrecting an old thread here, but I noticed that PyMC3 runs into issues with numpy's multinomial function when working with float32s returned from the GPU. Is there a reason for explicit casting to float64, rather than simply letting the floats be what they are?

@ericmjl
ericmjl commented Aug 2, 2017

@QCaudron may I ask, what's the fix that you've been working with to get around the multinomial probability rounding problems?

@QCaudron
Author
QCaudron commented Aug 2, 2017

@ericmjl I'm afraid I don't remember the context here, and thus can't provide a workaround 😉 My intuition says I probably cast pvals to float64, normalised the array to sum to one, and passed that in.
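For anyone landing here later, a minimal sketch of that workaround (the variable names are illustrative, and the float32 pvals here is randomly generated to stand in for GPU output):

```python
import numpy as np

# Hypothetical float32 probabilities, e.g. fresh off a GPU.
pvals32 = np.random.rand(64).astype(np.float32)
pvals32 /= pvals32.sum()

# Workaround: cast to float64 first, then renormalise in double
# precision so the sum multinomial sees is as close to 1 as possible.
pvals64 = pvals32.astype(np.float64)
pvals64 /= pvals64.sum()

draw = np.random.multinomial(1, pvals64)
print(draw.sum())  # 1
```

The key point is that the normalisation happens in the same precision multinomial uses internally, so the cast can no longer push the sum above 1.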

@leachim
leachim commented Nov 7, 2018

I happened to run into this issue today. Wondering whether there is any reason for casting the array to float64? This should probably be fixed in numpy itself.

@escorciav

Similar to leachim, I faced this bug recently. Is there a way to get this fixed?

@bashtage
Contributor

Could someone post an example of an array that reproduces this issue? I haven't been able to with random arrays.

@escorciav
escorciav commented Apr 11, 2019

The following snippet should break within a reasonable amount of time (< 3 mins). You can take the tricky case from there 😅 . Please lemme know if that helps.

import numpy as np

# Draw random normalized float32 pvals until one triggers the ValueError.
while True:
    x = np.random.rand(100).astype(np.float32)
    x /= x.sum()
    neg_video_ind = np.random.multinomial(1, x)

I replicated the (buggy) behavior on my machine (numpy=1.14.3) and on google-colab. Go for this if you only need a 100-element vector to replicate the bug.

@bashtage
Contributor

This allowed it to be easily replicated -- hadn't realized that I needed a relatively large pval array.

@bashtage
Contributor

This always triggers

import numpy as np

x = np.array([9.9e-01, 9.9e-01, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09,
       1.0e-09, 1.0e-09, 1.0e-09], dtype=np.float32)
y = x / x.sum()
np.random.multinomial(1, y)
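A quick diagnostic (my annotation, not part of the original comment) of why that particular array always trips the check:

```python
import numpy as np

x = np.array([9.9e-01, 9.9e-01, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09, 1.0e-09,
              1.0e-09, 1.0e-09, 1.0e-09], dtype=np.float32)
y = x / x.sum()

# The float32 sum rounds to exactly 1.0: the eight ~5e-10 entries are far
# below half an ulp of 1.0 (~6e-8) and vanish during accumulation.
print(y.sum())  # 1.0

# Summed in float64, those entries no longer vanish, so the total that
# multinomial's internal check sees exceeds 1.
print(y.astype(np.float64).sum())  # slightly greater than 1.0
```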

bashtage added a commit to bashtage/numpy that referenced this issue Feb 18, 2021
Add additional check when original input is an array that does not have dtype double

closes numpy#8317
bashtage added a commit to bashtage/numpy that referenced this issue Feb 26, 2021
Improve error message when the sum of pvals is larger than 1
when the input data is an ndarray

closes numpy#8317
xref numpy#16732
@olfMombach

I don't really see how that fixes the problem
