8000 ENH: Stable summation method for cumsum (add.reduce) · Issue #14909 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

ENH: Stable summation method for cumsum (add.reduce) #14909

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
robparrishqc opened this issue Nov 15, 2019 · 7 comments
Open

ENH: Stable summation method for cumsum (add.reduce) #14909

robparrishqc opened this issue Nov 15, 2019 · 7 comments

Comments

@robparrishqc
Copy link

numpy (superb library - no slight intended here!) does not seem to support Kahan or butterfly summation in cumsum, which is critically important for accuracy in large arrays of numbers of similar magnitude.

Reproducing code example:

import numpy as np; print(np.cumsum(np.ones((2**28,), dtype=np.float32))[-1] - 2**28)

import numpy as np
print(np.cumsum(np.ones((2**28,), dtype=np.float32))[-1] - 2**28)
# Should be zero

Note that sum obviously uses butterfly summation:

print(np.sum(np.ones((2**28,), dtype=np.float32)) - 2**28)
# Is zero on any machine I can find
@seberg
Copy link
Member
seberg commented Nov 17, 2019

Sum uses (partial) pairwise summation (in many cases). What is butterfly summation? There should be a few open issues about including stable summation methods in numpy, could you cross reference these here and comment there?

@robparrishqc
Copy link
Author

I think butterfly summation == pairwise summation == parallel sum scan (just different names for the same thing). If there is any remaining doubt, the technique I am referring to is described extensively for CUDA here: https://developer.nvidia.com/gpugems/GPUGems3/gpugems3_ch39.html - this same approach should work quite well and be reasonably efficient for CPU, but some care will have to be taken to make sure the algorithm uses the cache structure efficiently (or you'll end up with something that scales like log2 memcpy ops). np.sum is clearly using something like this, but np.cumsum is clearly not (see example above). This is particularly problematic when summing large arrays of similar magnitude numbers in low precision (e.g., float32).

@seberg
Copy link
Member
seberg commented Nov 17, 2019

Yes, the issue is that np.cumsum uses np.add.accumulate which is a bit more limited in implementing special logic (at least right now).

If you can help out, I would prefer to move this discussion to an existing issue and close this one, although it may be that the existing stable summation issues are not specific about cumsum.

@robparrishqc
Copy link
Author

Certainly - which open issue should I look at?

@charris
Copy link
Member
charris commented Nov 17, 2019

I suppose one easy fix for float32 would be to accumulate the running sum in double precision.

@seberg
Copy link
Member
seberg commented Nov 18, 2019

True about float64 making the float32 issue less bad. xref gh-8786 I do not mind moving this there as well, but can also stay its own issue.

@seberg seberg changed the title Butterfly summation needed in cumsum ENH: Stable summation method for cumsum (add.reduce) Nov 18, 2019
@nschloe
Copy link
Contributor
nschloe commented Dec 4, 2019

Just an FTI: I once created accupy for this purpose. It's slow though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants
0