BUG: Improve the accuracy of the FFT implementation #10625
Conversation
Nice. There is also a way to improve the twiddle factor computation using two doubles, one to accumulate errors. It's been a couple or three decades and I don't recall where I saw it. IIRC, one of the problems with big FFTs is that the cos is nearly one near the beginning and there is an advantage to factoring it out. I may have some very old code sitting around someplace where I did this; it did improve the accuracy of the FFT. Looks like FFTW is advantageous for prime sizes and small FFTs, the latter probably due to better cache usage.
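For what it's worth, one way such a two-double scheme could look (my own sketch, not the old code mentioned above) is to accumulate the angle Kahan-style, carrying the rounding error of each step in a second double:

```python
import cmath
import math

def twiddles_compensated(n):
    """Sketch: exp(2j*pi*k/n) for k = 0..n-1, accumulating the angle
    in two doubles (running value + error term) instead of a naive
    running sum, whose rounding errors would otherwise build up."""
    step = 2.0 * math.pi / n
    angle, err = 0.0, 0.0
    out = []
    for _ in range(n):
        out.append(complex(math.cos(angle), math.sin(angle)))
        # Kahan compensated addition: recover the rounding error of
        # (angle + step) and subtract it from the next step.
        y = step - err
        t = angle + y
        err = (t - angle) - y
        angle = t
    return out
```

Note that this only controls the accumulation error; the representation error of `step = 2*pi/n` itself remains, which is one reason a direct per-index computation of the twiddle factors can still be preferable.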
Looks like a good modern review of methods for computing twiddle factors is here.
You can fix the CircleCI error by rebasing on current master.
Previously, the numerical constants in the FFT code were not provided at full double precision, which led to a loss of accuracy in the FFT operation. Additionally, this commit improves the accuracy of the twiddle factor calculation by reducing the argument of exp(2j*pi*m/n) to the first octant before calling the library function. On average, the commit lowers the remaining numerical error in the FFT by a factor of ten.
I rebased, improved the commit message and looked at the C coding style. Unfortunately, the whole file violates the style guide.

Thanks for pointing out the review. It basically says that using recurrence relations for the calculation of twiddle factors is bad. This is the same message as publicized by the FFTW authors (see http://www.fftw.org/accuracy/comments.html and their papers, in which they cite Tasche & Zeuner). Fortunately, the numpy code does not do that (except in one case, see below). The twiddle factors are precomputed using accurate library functions and stored in a large (size N) array. My patch increases the accuracy of this direct twiddle factor computation while still using double precision. The comparison to the extended-precision calculation (which generates correctly rounded double-precision twiddle factors) shows that further improvements to this calculation (e.g. by accumulating errors using two double values) will not have any significant influence on the accuracy of the FFT. The accuracy of the FFT is now limited by other factors (choice of algorithm, implementation of the radix passes, etc.).

Several issues remain with the code; one is critical to the accuracy.
Both issues closely relate to the future of the numpy.fft implementation. I remember there was some discussion about this on the mailing list some time back (https://mail.scipy.org/pipermail/numpy-discussion/2014-October/071521.html and https://mail.scipy.org/pipermail/numpy-discussion/2014-October/071545.html). There is the issue of the two equivalent implementations in numpy and scipy, and apparently there is no nice drop-in replacement for the current implementation that fits feature- and license-wise and is not overly complex. Now probably no one (me included) wants to spend a lot of time fixing, maintaining and improving the current implementation when it probably should be replaced. Additionally, any of those fixes would have to be mirrored in the scipy implementation (e.g. the twiddle factor calculation).
Yeah, don't worry about the C style. There is other old, generated C code that also violates the style guide. The duplication of functionality between NumPy and SciPy here is indeed unfortunate. Part of that is due to the NumPy policy of having no Fortran dependencies, and part of it is the API. I happen to like the NumPy API better, at least for representing the results of real FFTs. If FFTW had a compatible licence we could use that with a fallback to FFTPACK. @matthew-brett Maybe we could put FFTPACK wheels up in PyPI or some such? Maybe even revisit the no-Fortran policy. I believe we now have the machinery to do that.
Thanks @Lodomir . |
The previous numpy FFT implementation did not provide the numerical constants at full double precision, as it was a direct conversion of single-precision Fortran code. This commit fixes that and consequently improves accuracy. I took the constants from the DFFTPACK source code but checked them first using gmpy2.
Additionally, I took a look at the calculation of the twiddle factors `exp(2j*pi*m/n)` in the implementation and found that their accuracy could also be improved in two ways:

1. `m/n` can be reduced to the first octant `[0, 1/8]` prior to the multiplication with `2*pi`. Due to the symmetry of the exponential function, the values for all octants can be obtained by simple symmetry operations. This improves the accuracy of the twiddle factor, as less accuracy is lost in the multiplication inside the argument.
2. The calculation can be done in extended precision.

It turned out that the first option increases the accuracy substantially while having only a small performance impact. Doing the calculation in extended precision provides almost no further increase in accuracy but doubles the twiddle factor calculation time on my computer (only for the first FFT; afterwards the twiddle factors are cached). Thus, the PR currently proposes not to use extended-precision calculations, but it can easily be changed. One could also argue that one usually only cares about the runtime if FFTs of the same size are repeated very often; in this case the time for creating the twiddle factors would not really matter.
I did benchmarks on the accuracy and speed of the FFT implementations in numpy ("numpy-1.13.3", this PR with double precision "numpy-fixed-double", this PR with extended precision "numpy-fixed-extended"), scipy and pyfftw. The results are shown below; the code can be found in the following repo: https://github.com/Lodomir/benchmark-ffts
Accuracy
Above is the forward error (L2 error) of a complex-valued double-precision FFT compared against a high-precision calculation of the DFT for several array sizes (powers of two, regular numbers (prime factors <= 5), primes). Additionally, the identity error, i.e. the error when comparing `x` and `ifft(fft(x))`, is shown for larger array sizes. One can see that the fixed numpy FFT performs even better than scipy (which uses DFFTPACK and is equivalent to numpy with accurate constants). It is also on par with FFTW, except for prime sizes, where the lack of a prime-size FFT algorithm and the use of an N^2 algorithm leads to large error propagation.

The same for real-valued FFTs. Curiously, the FFTPACK implementation performs really badly here for non-factorizable input sizes. We should probably document this.
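As a concrete illustration of the identity-error metric (not the benchmark code itself; that is in the linked repo), it can be computed with a naive O(n^2) DFT standing in for the library under test:

```python
import cmath
import math
import random

def dft(x, sign):
    # Naive O(n^2) DFT; sign = -1 is the forward, +1 the inverse kernel.
    n = len(x)
    return [sum(x[k] * cmath.exp(sign * 2j * math.pi * j * k / n)
                for k in range(n)) for j in range(n)]

def identity_error(x):
    # Relative L2 error of ifft(fft(x)) against x; the inverse
    # transform carries the usual 1/n normalization.
    n = len(x)
    y = [v / n for v in dft(dft(x, -1), +1)]
    num = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(y, x)))
    den = math.sqrt(sum(abs(a) ** 2 for a in x))
    return num / den

random.seed(0)
x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(64)]
err = identity_error(x)  # a small multiple of machine epsilon
```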
Speed
The speed of the FFT itself is not impacted by the PR. However, the calculation of the twiddle factors is made slower due to the additional branching in front of the sin/cos computation. If we decide to use extended precision, the performance impact depends on the math library used. I tested this with the glibc implementation, which leads to a factor of two in the runtime for the twiddle factor calculation.
I tried to benchmark the twiddle factor creation time by measuring the runtime of the FFT and then destroying the FFT cache manually. The runtime for twiddle factor calculation is the difference between the non-cached FFT run and the cached one!
The results for complex-valued FFTs.
The results for real-valued FFTs.
PS: I realize just now that I forgot reading the conventions on commit messages and the C code. Sorry for that and hopefully I was not too far off... I will fix it when I have time.