BUG: Improve the accuracy of the FFT implementation #10625
Conversation
Nice. There is also a way to improve the twiddle factor computation using two doubles, one to accumulate errors. It's been a couple or three decades and I don't recall where I saw it. IIRC, one of the problems with big FFTs is that the cos is nearly one near the beginning and there is an advantage to factoring it out. I may have some very old code sitting around someplace where I did this; it did improve the accuracy of the FFT. Looks like FFTW is advantageous for prime sizes and small FFTs, the latter probably due to better cache usage.
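For what it's worth, one way such a two-double scheme could look (my own sketch, not the old code mentioned above) is to accumulate the angle Kahan-style, carrying the rounding error of each step in a second double:

```python
import cmath
import math

def twiddles_compensated(n):
    """Sketch: exp(2j*pi*k/n) for k = 0..n-1, accumulating the angle
    in two doubles (running value + error term) instead of a naive
    running sum, whose rounding errors would otherwise build up."""
    step = 2.0 * math.pi / n
    angle, err = 0.0, 0.0
    out = []
    for _ in range(n):
        out.append(complex(math.cos(angle), math.sin(angle)))
        # Kahan compensated addition: recover the rounding error of
        # (angle + step) and subtract it from the next step.
        y = step - err
        t = angle + y
        err = (t - angle) - y
        angle = t
    return out
```

Note that this only controls the accumulation error; the representation error of `step = 2*pi/n` itself remains, which is one reason a direct per-index computation of the twiddle factors can still be preferable.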
Looks like a good modern review of methods for computing twiddle factors is here.
You can fix the CircleCI error by rebasing on current master.
Previously, the numerical constants in the FFT code were not provided at full double precision, which led to a loss of accuracy in the FFT operation. Additionally, this commit improves the accuracy of the twiddle factor calculation by reducing the argument of exp(2j*pi*m/n) to the first octant before calling the library function. On average, the commit lowers the remaining numerical error in the FFT by a factor of ten.
I rebased, improved the commit message and looked at the C coding style. Unfortunately, the whole file violates the style guide.

Thanks for pointing out the review. It basically says that using recurrence relations for the calculation of twiddle factors is bad. This is the same message as publicized by the FFTW authors (see http://www.fftw.org/accuracy/comments.html and their papers, in which they cite Tasche & Zeuner). Fortunately, the numpy code does not do that (except in one case, see below). The twiddle factors are precomputed using accurate library functions and stored in a large (size N) array. My patch increases the accuracy of this direct twiddle factor computation while still using double precision. The comparison to the extended-precision calculation (which generates correctly rounded double-precision twiddle factors) shows that further improvements to this calculation (e.g. by accumulating errors using two double values) will not have any significant influence on the accuracy of the FFT. The accuracy of the FFT is now limited by other factors (choice of algorithm, implementation of the radix passes, etc.).

Several issues remain with the code; one is critical to the accuracy.
Both issues closely relate to the future of the numpy.fft implementation. I remember there was some discussion about this on the mailing list some time back (https://mail.scipy.org/pipermail/numpy-discussion/2014-October/071521.html and https://mail.scipy.org/pipermail/numpy-discussion/2014-October/071545.html). There is the issue of the two equivalent implementations in numpy and scipy, and apparently there is no nice drop-in replacement for the current implementation that fits feature- and license-wise and is not overly complex. Now probably no one (me included) wants to spend a lot of time fixing, maintaining and improving the current implementation when it probably should be replaced. Additionally, any of those fixes would have to be mirrored in the scipy implementation (e.g. the twiddle factor calculation).
Yeah, don't worry about the C style. There is other old, generated C code that also violates the style guide. The duplication of functionality between NumPy and SciPy here is indeed unfortunate. Part of that is due to the NumPy policy of having no Fortran dependencies, and part of it is the API. I happen to like the NumPy API better, at least for representing the results of real FFTs. If FFTW had a compatible licence we could use that with a fallback to FFTPACK. @matthew-brett Maybe we could put FFTPACK wheels up in PyPI or some such? Maybe even revisit the no-Fortran policy. I believe we now have the machinery to do that.
Thanks @Lodomir . |
The previous numpy FFT implementation did not provide the numerical constants at full double precision, as it was a direct conversion of single-precision Fortran code. This commit fixes that and consequently improves accuracy. I took the constants from the DFFTPACK source code but checked them first using gmpy2.
Additionally, I took a look at the calculation of the twiddle factors `exp(2j*pi*m/n)` in the implementation and found that their accuracy could also be improved in two ways:

1. `m/n` can be reduced to the first octant `[0, 1/8]` prior to the multiplication with `2*pi`. Due to the symmetry of the exponential function, the values for all octants can be obtained by simple symmetry operations. This improves the accuracy of the twiddle factor, as less accuracy is lost in the multiplication inside the argument.
2. The calculation can be done in extended precision.

It turned out that the first option increases the accuracy substantially while having only a small performance impact. Doing the calculation in extended precision provides almost no further increase in accuracy but doubles the twiddle factor calculation time on my computer (only for the first FFT; afterwards the twiddle factors are cached). Thus, the PR currently proposes not to use extended-precision calculations, but it can easily be changed. One could also argue that one usually only cares about the runtime if FFTs of the same size are repeated very often; in this case the time for creating the twiddle factors would not really matter.
I did benchmarks on the accuracy and speed of the FFT implementations in numpy ("numpy-1.13.3", this PR with double precision "numpy-fixed-double", this PR with extended precision "numpy-fixed-extended"), scipy and pyfftw. The results are shown below; the code can be found in the following repo: https://github.com/Lodomir/benchmark-ffts
Accuracy
Above is the forward error (L2 error) of a complex-valued double-precision FFT compared against a high-precision calculation of the DFT for several array sizes (powers of two, regular numbers (prime factors <= 5), primes). Additionally, the identity error, i.e. the error when comparing `x` and `ifft(fft(x))`, is shown for larger array sizes. One can see that the fixed numpy FFT performs even better than scipy (which uses DFFTPACK and is equivalent to numpy with accurate constants). It is also on par with FFTW, except for prime sizes, where the lack of a prime-size FFT algorithm and the use of an N^2 algorithm leads to large error propagation.

The same for real-valued FFTs. Curiously, the FFTPACK implementation performs really badly here for non-factorizable input sizes. We should probably document this.
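As a concrete illustration of the identity-error metric (not the benchmark code itself; that is in the linked repo), it can be computed with a naive O(n^2) DFT standing in for the library under test:

```python
import cmath
import math
import random

def dft(x, sign):
    # Naive O(n^2) DFT; sign = -1 is the forward, +1 the inverse kernel.
    n = len(x)
    return [sum(x[k] * cmath.exp(sign * 2j * math.pi * j * k / n)
                for k in range(n)) for j in range(n)]

def identity_error(x):
    # Relative L2 error of ifft(fft(x)) against x; the inverse
    # transform carries the usual 1/n normalization.
    n = len(x)
    y = [v / n for v in dft(dft(x, -1), +1)]
    num = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(y, x)))
    den = math.sqrt(sum(abs(a) ** 2 for a in x))
    return num / den

random.seed(0)
x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(64)]
err = identity_error(x)  # a small multiple of machine epsilon
```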
Speed
The speed of the FFT itself is not impacted by the PR. However, the calculation of the twiddle factors is made slower due to the additional branching in front of the sin/cos computation. If we decide to use extended precision, the performance impact depends on the math library used. I tested this with the glibc implementation, which leads to a factor of two in the runtime for the twiddle factor calculation.
I tried to benchmark the twiddle factor creation time by measuring the runtime of the FFT and then destroying the FFT cache manually. The runtime for twiddle factor calculation is the difference between the non-cached FFT run and the cached one!
The results for complex-valued FFTs.
The results for real-valued FFTs.
PS: I realize just now that I forgot reading the conventions on commit messages and the C code. Sorry for that and hopefully I was not too far off... I will fix it when I have time.