numpy.sum not stable enough sometimes (Kahan, math.fsum) #8786


Open · nschloe opened this issue Mar 15, 2017 · 29 comments

@nschloe (Contributor) commented Mar 15, 2017

Background

To approximate an integral over the reference interval [-1, 1], one arranges points x_i and weights w_i such that the number

sum(f(x_i) * w_i)

approximates the integral of a given function f reasonably well (typically tested on a set of polynomials). Sophisticated schemes have many points and weights, and naturally the more weights there are, the smaller each individual weight must be.

Bottom line: You'll have to sum up many small values.
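For concreteness, a minimal example of such a scheme, using NumPy's built-in Gauss-Legendre rule (np.polynomial.legendre.leggauss):

import numpy as np

# 20-point Gauss-Legendre rule on the reference interval [-1, 1]
x, w = np.polynomial.legendre.leggauss(20)

# approximate the integral of exp over [-1, 1]; the exact value is e - 1/e
approx = np.sum(np.exp(x) * w)
print(abs(approx - (np.e - 1 / np.e)))  # error near machine precision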

When numpy.sum doesn't cut it

I noticed that numpy.sum produces round-off errors large enough for the tests to fail with as few as 7 summands (-44.34121805, -15.72356145, -0.04563372, 0.0, 0.04563372, 15.72356145, 44.34121805; not shown in full precision). math.fsum always worked fine, but I cannot use it anymore since I'd like to compute the sums for many intervals at once.

Work so far

Kahan's summation

import numpy

def kahan_sum(a, axis=0):
    """Kahan (compensated) summation along the given axis."""
    s = numpy.zeros(a.shape[:axis] + a.shape[axis+1:])
    c = numpy.zeros(s.shape)  # running compensation for lost low-order bits
    for i in range(a.shape[axis]):
        # index the i-th slice along `axis`, cf. http://stackoverflow.com/a/42817610/353337
        y = a[(slice(None),) * axis + (i,)] - c
        t = s + y
        c = (t - s) - y  # recovers the part of y that was lost in s + y
        s = t.copy()
    return s

does the trick. Note, however, the use of a Python loop here, which leads to a performance degradation if the number of points and weights is large.

math.fsum uses a different algorithm, which may or may not be better suited here than Kahan's.

See here for a previous discussion of the topic.

@eric-wieser (Member)

I don't think that copy is needed there.

This seems like an improvement that could be built directly into the sum function if #8773 were implemented.

@juliantaylor (Contributor)

sum uses pairwise summation, which is reasonably accurate without a performance impact.
A Kahan sum could already be implemented now but is significantly slower.

How exactly are you summing? Pairwise summation is unfortunately not used when you are summing along a strided axis, again for performance reasons.

Implementing an even more accurate sum is interesting, maybe via a new function or some precision context manager, though the use cases need to be clear. Kahan is easy to implement and acceptable in performance, but not as good as the algorithm math.fsum uses, which is significantly slower and more complicated to implement.
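(For reference, a minimal Python sketch of pairwise summation on a 1-D float array; NumPy's C implementation additionally falls back to an unrolled naive loop below a fixed block size for speed:)

def pairwise_sum(a):
    # recursively halve the array and add the two partial sums; the
    # worst-case round-off grows like O(log n) instead of O(n)
    n = len(a)
    if n <= 8:  # small base case; NumPy uses a larger unrolled block here
        s = 0.0
        for x in a:
            s += x
        return s
    return pairwise_sum(a[:n // 2]) + pairwise_sum(a[n // 2:])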

@nschloe (Contributor, Author) commented Mar 15, 2017

A Kahan sum could already be implemented now but is significantly slower.

And more precise.

How exactly are you summing? Pairwise summation is unfortunately not used when you are summing along a strided axis, again for performance reasons.

Interesting. I'm using numpy.sum(a, axis=0), so that shouldn't be a problem.

@juliantaylor (Contributor)

That might be the problem: the fast axis in numpy is by default the last one (C order).
How does it fare when you have your arrays in Fortran order?

@nschloe (Contributor, Author) commented Mar 15, 2017

When transpose()ing and summing over axis=-1, I see the same deficiencies.

@eric-wieser (Member)

I see the same deficiencies.

Yes, that's because you're doing exactly the same operation over memory, and just calling the axes different things

np.copy(c, order='f').sum(axis=0) is the relevant test.

@seberg (Member) commented Mar 15, 2017

You would have to sprinkle in a copy there after the transpose, or do copy("F") and not transpose.

@nschloe (Contributor, Author) commented Mar 15, 2017

Thanks for the clarification. Unfortunately,

numpy.copy(gamma, order='f').sum(axis=0)

has the same lack of precision as the original statement.

@seberg (Member) commented Mar 15, 2017

Well, you can't expect much difference here, it's just a few numbers after all. And you can always get catastrophic annihilation, even in pairwise summation; pairwise summation is just much better because in typical scenarios it makes sure that the two numbers being added are of similar size.

E.g. if you restructured your numbers to always have the first, last, second, second-to-last, etc., you would of course get a good result (and maybe even better for pairwise).
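(A sketch of that reordering, using the seven summands from above, which are already sorted:)

import numpy as np

a = np.array([-44.34121805, -15.72356145, -0.04563372, 0.0,
              0.04563372, 15.72356145, 44.34121805])

mix = np.empty_like(a)
half = (a.size + 1) // 2
mix[0::2] = a[:half]        # first, second, ...
mix[1::2] = a[half:][::-1]  # last, second-to-last, ...
# consecutive entries of `mix` now largely cancel pairwise
print(mix.sum())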

@njsmith (Member) commented Mar 15, 2017

Another quick hack to get better accuracy is to sort your array before summing.

It'd certainly be nice to have better summation algorithms available as an option, but np.sum is unlikely to ever use them by default given the performance cost.

@seberg (Member) commented Mar 15, 2017

Right, silly of me. The pairwise stuff could maybe actually have helped, since it might sum all the negative and then all the positive ones one after another (the example actually was sorted, but basically symmetric about zero, which invites annihilation)?

@nschloe (Contributor, Author) commented Mar 15, 2017

np.sum is unlikely to ever use them by default given the performance cost.

Definitely not, I agree.

It'd be nice to have more options, though, for example numpy.kahan_sum with the same signature as numpy.sum.

@charris (Member) commented Mar 15, 2017

The same algorithm used to generate a Huffman code could also be adapted: always summing the two numbers of smallest absolute value. It could be implemented as a priority queue. I'm not recommending that, I have just wondered now and then how well it would work...
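(A quick illustration of that idea with Python's heapq; smallest_first_sum is a hypothetical helper, and the O(n log n) cost alone rules it out for numpy:)

import heapq

def smallest_first_sum(values):
    # always combine the two entries of smallest absolute value,
    # like building a Huffman tree over the summands
    heap = [(abs(v), v) for v in values]
    heapq.heapify(heap)
    while len(heap) > 1:
        _, a = heapq.heappop(heap)
        _, b = heapq.heappop(heap)
        heapq.heappush(heap, (abs(a + b), a + b))
    return heap[0][1] if heap else 0.0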

@njsmith (Member) commented Mar 15, 2017

Sorry, yeah, you would want to sort by absolute value. (I've only used this trick for summing probabilities, where this doesn't come up, so I forgot :-).)

@nschloe (Contributor, Author) commented Mar 15, 2017

always summing the two numbers of smallest absolute value.

You can do this when summing floats, but not when summing arrays.

@njsmith (Member) commented Mar 15, 2017

@nschloe: you have to sort each array separately, yes. It's awkward but certainly possible, as an option.

@wuisawesome commented Mar 20, 2017

Would a kahan_sum/fsum be a feature others think is worth implementing?

@asmeurer (Member)

A stable numpy sum would be awesome. Sorting is a PITA since you have to do it by a key (abs), and if you have an array with more than one dimension you have to sort each subarray independently.
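(A sketch of what such an option could look like in modern NumPy; abs_sorted_sum is a hypothetical name, not a NumPy API, and np.take_along_axis does the per-slice reordering:)

import numpy as np

def abs_sorted_sum(a, axis=0):
    # sort each 1-D slice along `axis` by increasing magnitude, then sum
    order = np.argsort(np.abs(a), axis=axis)
    return np.take_along_axis(a, order, axis=axis).sum(axis=axis)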

@nschloe (Contributor, Author) commented May 12, 2018

For reference, I implemented some stable sums and dot products in accupy. Pretty slow in comparison to np.sum though.

@cdeil commented Jun 20, 2019

We would also like to have a more precise sum, for a likelihood that's the sum of ~ a billion numbers.

@nschloe - Very nice comparison in accupy, thanks!

So your conclusion is that the precision of kahan sum isn't much better, so not worth exposing in Numpy?

(kahan is actually already here for a different application, not for numpy.sum)

fsum accuracy seems amazing. How slow is it?
The speed comparison in accupy isn't fair since it's using a Python for loop, no?
(See here).
Is a fsum available (in C or Cython or Numba or ...) somewhere that works on Numpy arrays?

@seberg (Member) commented Jun 20, 2019

@cdeil Python uses Shewchuk summation; my guess is you can just look at their code or google a C implementation at the very least. Adapting it into short Cython code is probably less work than digging for very long.

Now that we support the axis argument for gufuncs, it could actually be a nice little exercise to implement a gufunc for stable summation. Not sure it will be ideal performance-wise, but Cython solutions are unlikely to be any better (unless they bloat up a lot).
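(For reference, the well-known pure-Python cookbook version of Shewchuk's algorithm, as referenced in the math.fsum docs; exact barring intermediate overflow:)

def shewchuk_sum(values):
    # maintain a list of non-overlapping partial sums; each addition is
    # split exactly into a high and a low part (an error-free transform)
    partials = []
    for x in values:
        i = 0
        for y in partials:
            if abs(x) < abs(y):
                x, y = y, x
            hi = x + y
            lo = y - (hi - x)
            if lo:
                partials[i] = lo
                i += 1
            x = hi
        partials[i:] = [x]
    return sum(partials, 0.0)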

@nschloe (Contributor, Author) commented Jun 20, 2019

So your conclusion is that the precision of kahan sum isn't much better, so not worth exposing in Numpy?

Not my conclusion but a known property of Kahan: it's just not robust when it comes to ill-conditioned sums. The best thing would indeed be to have fsum (aka Shewchuk), but there are faster solutions too which work in k-fold machine precision (see accupy).

You can actually try accupy, though, and see if it fits your needs. If yes, perhaps we can tweak it for speed.

The speed comparison in accupy isn't fair since it's using a Python for loop, no?

It does indeed. If you can get it done in C++, super!

@lorentzenchr (Contributor) commented Feb 18, 2024

To add a simple example:

import numpy as np

def naive_sum(a):
    # naive left-to-right summation, accumulating in the array's own dtype
    result = np.zeros_like(a, shape=(1,))
    for i in range(a.shape[0]):
        result[0] += a[i]
    return result[0]

a = np.linspace(-100, 100, 1_000_000, dtype=np.float32)
np.random.default_rng(42).shuffle(a)

sum(a), naive_sum(a), np.sum(a)
# (0.0, -0.5259266, -0.008300781)

This shows that numpy.sum does better than the naive summation.

And for reference (not implying numpy should do the same!), Python's sum meanwhile uses (Kahan-) Neumaier summation, see python/cpython#100425.
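(For illustration, a minimal Python sketch of that Neumaier variant: like Kahan, but the compensation also handles the case where the incoming term is larger in magnitude than the running sum:)

def neumaier_sum(values):
    s = 0.0
    c = 0.0  # running compensation for lost low-order bits
    for x in values:
        t = s + x
        if abs(s) >= abs(x):
            c += (s - t) + x  # low-order bits of x were lost
        else:
            c += (x - t) + s  # low-order bits of s were lost
        s = t
    return s + c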

@nschloe (Contributor, Author) commented Feb 22, 2024

For those interested, here is an example with a badly conditioned sum:

import math
import sys

import numpy as np

def naive_sum(lst):
    s = 0.0
    for val in lst:
        s += val
    # NB: the original post returned `val` (the last list element) here,
    # which is what the naive_sum row in the output below reflects
    return s

# A badly conditioned sum, condition number ~2.188e+14
p = [
    -0.41253261766461263,
    41287272281118.43,
    -1.4727977348624173e-14,
    5670.3302557520055,
    2.119245229045646e-11,
    -0.003679264134906428,
    -6.892634568678797e-14,
    -0.0006984744181630712,
    -4054136.048352595,
    -1003.101760720037,
    -1.4436349910427172e-17,
    -41287268231649.57,
]

exact = -0.377392919181026


print(f"Python {sys.version}")
print(f"NumPy {np.__version__}\n")
for sumfun in [math.fsum, sum, np.sum, naive_sum]:
    s = sumfun(p)
    err = abs(s - exact)
    name = f"{sumfun.__module__}.{sumfun.__name__}"
    print(f"{name:<18}: {s:10.4e}, error {err:.2e} ({err / abs(s)*100:.1f} %)")
Output:

Python 3.12.4 (main, Jun  7 2024, 06:33:07) [GCC 14.1.1 20240522]
NumPy 2.1.0

math.fsum         : -3.7739e-01, error 0.00e+00 (0.0 %)
builtins.sum      : -3.7739e-01, error 0.00e+00 (0.0 %)
numpy.sum         : -3.8281e-01, error 5.42e-03 (1.4 %)
__main__.naive_sum: -4.1287e+13, error 4.13e+13 (100.0 %)

fsum() computes the sum without error, and since Python 3.12 (Neumaier summation) so does the built-in sum(); np.sum() has an error on the order of magnitude 10^{-3}. We don't speak of naive_sum().

@asmaier commented May 3, 2024

Using Python 3.12.3 on Mac OS 14.1.1 with Apple M3 Pro I get the following results from the script by @nschloe:

math.fsum():  -0.377392919181026   0.0
sum():        -0.377392919181026   0.0
np.sum():     -0.3828125           0.005419580818973979
naive_sum():  -41287268231649.57   41287268231649.195

The reason why math.fsum() and sum() now yield the same result might be that since Python 3.12 the algorithm for sum() was improved: "sum() now uses Neumaier summation to improve accuracy and commutativity when summing floats or mixed ints and floats. ", see docs.python.org/3/whatsnew/3.12.html

@NimaSarajpoor

I am not sure whether I should open a new issue or not. Please let me know.

Issue:
In the following example, I've created two 1D numpy arrays. Both have the same non-zero elements, but at different positions. They have different sums!

Example:

import numpy as np

non_zero_elems = """
0.024861743494183196  0.024861743494183196  0.024861743494183196
 0.021792470064670937  0.021792470064670937  0.018949939845733486
 0.018949939845733486  0.0015757541842979532 0.0015757541842979532
 0.0015757541842979532 0.0015757541842979532 0.0015757541842979532
 0.0015757541842979532 0.0015757541842979532 0.0015757541842979532
 0.04033997886552635   0.04033997886552635   0.04033997886552635
 0.04033997886552635  
 """
non_zero_elems = np.array(non_zero_elems.split()).astype(np.float64)
IDX1 = np.array([28, 29, 30, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,
       53, 54])

IDX2 = np.array([ 8,  9, 10, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,
       53, 54])

# `x` and `y` are two arrays with the same non-zero elements but at different positions
x = np.zeros(55, dtype=np.float64)
y = np.zeros(55, dtype=np.float64)

x[IDX1] = non_zero_elems.copy()
y[IDX2] = non_zero_elems.copy()

print(np.sum(x) == np.sum(y))  # False!! 
print(np.sum(x[x!=0]) == np.sum(y[y!=0]))  # True

@seberg (Member) commented Apr 22, 2025

In the following example, I've created two 1D numpy arrays. Both have the same non-zero elements but at different positions. They have different sum !

It's not an issue, it is a basic property of numerical floating point math that re-ordering even the simplest operations can lead to different results. More stable summation may do that less, but I don't think there is any reasonable version that is completely exact.
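(The classic demonstration that floating-point addition is not associative:)

print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False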

@lorentzenchr (Contributor)
@seberg What do numpy maintainers intend to do about this issue? I see several options:

  1. Close the issue as a "non issue" or "won't be implemented".
  2. Add an option to the existing sum (what about mean and average?)
  3. Add a new function numpy.fsum that simply promotes python's fsum to a ufunc.
  4. Do nothing.
  5. Yet another option.

@seberg (Member) commented Apr 22, 2025

Well, I think there is nothing to do about mean/sum directly. We could think about a new function, or maybe about a new option to sum/mean (but that is harder).
So I think it's a reasonable request, but a hard one. A new function would probably be the simplest thing.
(I don't see anything to do about documentation.)
