-
-
Notifications
You must be signed in to change notification settings - Fork 11k
ENH: Stable summation method for cumsum (add.reduce) #14909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Sum uses (partial) pairwise summation (in many cases). What is butterfly summation? There should be a few open issues about including stable summation methods in numpy, could you cross reference these here and comment there? |
I think butterfly summation == pairwise summation == parallel sum scan (just different names for the same thing). If there is any remaining doubt, the technique I am referring to is described extensively for CUDA here: https://developer.nvidia.com/gpugems/GPUGems3/gpugems3_ch39.html - this same approach should work quite well and be reasonably efficient for CPU, but some care will have to be taken to make sure the algorithm uses the cache structure efficiently (or you'll end up with something that scales like log2 memcpy ops). |
Yes, the issue is that If you can help out, I would prefer to move this discussion to an existing issue and close this one, although it may be that the existing stable summation issues are not specific about cumsum. |
Certainly - which open issue should I look at? |
I suppose one easy fix for float32 would be to accumulate the running sum in double precision. |
True about float64 making the float32 issue less bad. xref gh-8786 I do not mind moving this there as well, but can also stay its own issue. |
Just an FTI: I once created accupy for this purpose. It's slow though. |
numpy
(superb library - no slight intended here!) does not seem to support Kahan or butterfly summation incumsum
, which is critically important for accuracy in large arrays of numbers of similar magnitude.Reproducing code example:
import numpy as np; print(np.cumsum(np.ones((2**28,), dtype=np.float32))[-1] - 2**28)
Note that
sum
obviously uses butterfly summation:The text was updated successfully, but these errors were encountered: