np.sum returns wrong answers #21769
Comments
I am including the code and its output again; there were a few typos in my first submission. Sorry! Code:
Output:
@seberg Sebastian, thanks for reviewing this issue. When you get a few minutes, I would appreciate it if you could clarify why you don't consider this a bug. The computation is clearly returning a mathematically wrong answer. To clarify, given the stochastic dominance relationship between the three Poisson distributions, the tail probabilities should be monotonically increasing (higher rates should yield higher tail probabilities), yet the value returned for the smaller rate, 1.11e-16, is larger than the 0 returned for the bigger rate. Also, what is curious is that the answer is correct for 7.0 * 10**-6 and 8.0 * 10**-6 but incorrect for 7.5 * 10**-6. This happens only for 7.5; for example, if I change 7.5 to 7.3 or 7.8, I get 0.0, which is the correct answer (the true value is of the order of 10**-54, which is essentially zero). From my perspective this is a high-priority bug. If you cannot count on NumPy summing numbers correctly, you cannot count on anything it computes. Thanks for your attention.
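(The reproduction script did not survive in this copy of the issue. As a rough illustration only, the kind of computation being described might look like the sketch below; the cutoff k and the exact formula are assumptions, not the reporter's code.)

```python
import math
import numpy as np

def upper_tail(rate, k):
    """P(X >= k) for X ~ Poisson(rate), computed naively as 1 - P(X < k)."""
    pmf = np.array([math.exp(-rate) * rate**i / math.factorial(i)
                    for i in range(k)])
    # Two summation strategies over the same terms:
    return 1.0 - np.sum(pmf), 1.0 - sum(pmf)

for rate in (7.0e-6, 7.5e-6, 8.0e-6):
    via_np, via_builtin = upper_tail(rate, k=10)
    print(f"{rate:.1e}  np.sum: {via_np!r}  builtin sum: {via_builtin!r}")
```

Depending on how the partial sum of the PMF terms rounds, the final subtraction may return exactly 0.0 or the residue 2**-53, which is about 1.11e-16, the spacing of float64 just below 1.0, and matches the value quoted above.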
@nc10 the typical precision computers use is "double precision", i.e. Python's float. You can reproduce the same behaviour in plain Python, and there may be tricks available, but no, it is clearly not a NumPy issue. It is rather a case of too-high expectations of typical computer math. For more details, see for example the many existing write-ups on floating-point arithmetic.
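A quick illustration of those double-precision limits (a minimal sketch, not from the original comment):

```python
import numpy as np

# Machine epsilon for float64: the spacing between 1.0 and the next float up.
print(np.finfo(np.float64).eps)       # 2.220446049250313e-16

# 0.1, 0.2 and 0.3 are not exactly representable in binary floating point.
print(0.1 + 0.2 == 0.3)               # False
print(0.1 + 0.2 - 0.3)                # 5.551115123125783e-17

# The spacing just below 1.0 is 2**-53, i.e. 1.1102230246251565e-16,
# the same magnitude as the residue reported in this issue.
print(1.0 - np.nextafter(1.0, 0.0))   # 1.1102230246251565e-16
```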
See also #8786. There are several other discussions of the problem.
I understand your point of view. But this is not about precision, it is about consistency. NumPy is returning a nonzero value while returning zero for a theoretically larger quantity. I actually don't need this level of precision. If a larger quantity falls outside of float64 resolution and is returned as zero, a smaller quantity should also fall outside of that resolution and be returned as zero. As an additional piece of information, the expression evaluates to the correct value (zero) if I use numba and compile it to C (i.e., it evaluates consistently). This suggests that the problem is not about how the numerical system represents small numbers but about how NumPy carries out the calculation. Does this make sense?
Out of curiosity, are you running on a 32 bit platform? |
FWIW, if you want actually accurate tail values, you should use the survival function (sf) of the corresponding scipy.stats distribution rather than summing the PMF terms yourself.
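A sketch of that approach, assuming the function meant here is `scipy.stats.poisson.sf` (the exact call did not survive in the quoted comment):

```python
from scipy.stats import poisson

# Survival function: sf(k, mu) = P(X > k) for X ~ Poisson(mu), computed with
# numerically careful special-function routines instead of a naive sum.
for rate in (7.0e-6, 7.5e-6, 8.0e-6):
    print(rate, poisson.sf(9, mu=rate))   # tail probability P(X >= 10)
```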
A bit of theory: the problem is related to infinite sums and absolutely summable sequences. In numerical computation, the order in which the terms of a sequence are added up matters, even though from the mathematical perspective it does not (as you pointed out, this is because of finite precision). In short, to get an accurate result when a sequence is absolutely summable, one should first sort the values in increasing order of absolute value and then add them up from small to large, as in the sketch below.
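A minimal sketch of that small-to-large strategy (illustrative values, not code from this thread):

```python
def small_to_large_sum(values):
    """Sort by absolute value, then accumulate from smallest to largest."""
    total = 0.0
    for v in sorted(values, key=abs):   # the O(n log n) sort mentioned above
        total += v
    return total

# One large term plus a million tiny ones.
data = [1.0] + [1e-16] * 1_000_000

naive = 0.0
for v in data:                          # left-to-right accumulation
    naive += v                          # each 1e-16 is absorbed by the 1.0

print(naive)                            # 1.0: the tiny terms are lost
print(small_to_large_sum(data))         # ~1.0000000001: they are kept
```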
@charris No, I am using a 64-bit platform (Ubuntu 18.04)
I ask because the FPU may be different on 32-bit and 64-bit platforms, even on the same hardware. Results also vary depending on which hardware features the compiler uses and how it organizes the computation.
@rkern I don't need accuracy at the level of 1e-16 for my numerical computation (1e-6 is far more than enough). What causes a problem (at least for me) is inconsistent evaluation. If 1e-18 is represented as zero, then 1e-19 should also be represented as zero. Also, don't forget that this happens only for 7.5e-6; both 7.4e-6 and 7.6e-6 evaluate to zero!
@charris
No one is "representing 1e-18 as zero". That's not what's going on here. Forget about the properties of Poisson distributions like stochastic dominance (if you need something closer to those ideal properties, you must use the much more accurate implementation mentioned above).
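A quick check of that point (not from the original comment):

```python
import numpy as np

# 1e-18 is a perfectly ordinary double; nothing "represents it as zero".
x = np.float64(1e-18)
print(x == 0.0)              # False
print(x)                     # 1e-18

# The limitation is *relative* precision: adding 1e-18 to 1.0 cannot change
# the result, because the spacing of doubles near 1.0 is about 2.2e-16.
print(1.0 + 1e-18 == 1.0)    # True
print(np.spacing(1.0))       # 2.220446049250313e-16
```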
Now, it does happen to be the case that np.sum and Python's built-in sum add the terms up in different orders (np.sum uses pairwise summation), so the two can round differently and disagree in the last couple of bits of the result.
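A small, self-contained illustration of how the grouping alone changes a floating-point sum (illustrative values, not the ones from this issue; np.sum's pairwise grouping and the built-in sum's left-to-right loop are simply two different orderings):

```python
import math

a, b, c = 1e16, 1.0, -1e16

# The same three numbers, grouped two different ways:
print((a + c) + b)            # 1.0: the large terms cancel first, 1.0 survives
print((a + b) + c)            # 0.0: the 1.0 is absorbed into 1e16 and lost

# Left-to-right accumulation vs. an exactly rounded sum:
print(sum([a, b, c]))         # 0.0
print(math.fsum([a, b, c]))   # 1.0
```

math.fsum (and, for arrays, the sorted or compensated approaches mentioned elsewhere in this thread) gives the correctly rounded result regardless of the order of the inputs.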
Thanks for the explanation. I just made up this example to demonstrate the problem I am facing in my actual numerical computation, but your detailed description helped me better understand the issue. I guess with your clarification I should be able to devise a workaround. Thanks again!
Changing the order can get you different results, sometimes preferable for certain purposes.
Yes, you are correct. From my experience, the most reliable approach is to sort the sequence in increasing order (in absolute-value terms) and then sum from small to large. This of course makes it an O(n log n) algorithm. This entire issue came up because I was testing my computation by checking its continuity: I have closed-form expressions for the ratio of two tail probabilities for zero rate and non-zero rate, and I wanted to numerically verify my computation by checking whether the limit (as the rate goes to zero) matches my direct computation. In any case, your comment was very helpful. Thanks again!
Describe the issue:
When I calculate the tail probability of a Poisson distribution, I get a wrong value with "np.sum". "sum" by itself returns a correct value.
Reproduce the code example:
Error message:
NumPy/Python version information:
import sys, numpy; print(numpy.__version__, sys.version) -> 1.22.4 3.8.8 (default, Apr 13 2021, 19:58:26) [GCC 7.3.0]