A New Representation of FFT Algorithms
A New Representation of FFT Algorithms
Abstract—In this paper we propose a new representation for rotators than the radix-2 ones and less adders than the radix-4
FFT algorithms called the triangular matrix representation. This ones. Later, radix-23 [9], [10], radix-24 [11] and in general
representation is more general than the binary tree representation radix-2k [9], [12], [13] algorithms were proposed.
and, therefore, it introduces new FFT algorithms that were not
discovered before. Furthermore, the new representation has the Nowadays we can talk about a large number of FFT algo-
advantage that it is simple and easy to understand, as each FFT rithms. This is thanks to the binary tree representation [14],
algorithm only consists of a triangular matrix. Besides, the new [15]. This representation is a generalization of the Cooley-
representation allows for obtaining the exact twiddle factor values Tukey algorithm, which allows for decimating an FFT at any
in the FFT flow graph easily. This facilitates the design of FFT stage in each iteration of the decimation. The binary tree repre-
hardware architectures. As a result, the triangular matrix repre-
sentation is an excellent alternative to represent FFT algorithms sentation can represent all the algorithms mentioned previously.
and it opens new possibilities in the exploration and understanding A comparison of FFT algorithms can be found in [16].
of the FFT. In this paper we propose a new representation of the FFT
Index Terms—Binary tree, Cooley-Tukey, fast Fourier trans- algorithms. It is called triangular matrix, as the FFT algorithms
form (FFT). are represented by a triangular matrix. The triangular matrix
representation has four main advantages. First, the representa-
tion is simple and easy to understand. Second, it support any
I. I NTRODUCTION
radix, including mixed radix algorithms. Third, it is a more
C. Binary-Tree Representation
In both decompositions, the N -point DFT is transformed into
two N/2-point DFTs. By applying the procedure iteratively, The binary tree representation [14], [15] is a generalization
each step halves the number of points of the DFTs, which of the Cooley-Tukey algorithm. The difference among them
finally leads to 2-point DFTs. lies in the fact that the binary tree representation allows for
decimating an FFT of any size at each of the iterations in the
decomposition, not only a 2-point DFT.
B. FFT Representation Using Flow Graphs A binary tree diagram is an effective way to represent various
In order to understand the operations performed by an FFT, FFT algorithms [14]. Fig. 2 shows all the possible binary tree
the FFT is represented most times by its flow graph. An diagrams for N = 16. In binary tree representation, each node
example of an FFT flow graph is shown in Fig. 1. The numbers is assigned a value m to represent a 2m -point DFT. Each node
at the input represent the indexes of the input sequence, x[n], has at most two leaves which represent the inner and outer
whereas those at the output are the frequencies, k, of the output DFTs.
signal X[k]. It can be observed that the inputs are ordered, and Mathematically, the binary tree decomposition is described as
the outputs are provided in the so-called bit-reversed order [19].
The flow graph consists of a series of n stages, s ∈ {1 . . . n}, X[k1 + P k2 ] =
where additions, subtractions and complex rotations are cal- P −1
Q−1 t1 k1 t2 k1
culated. Additions and subtractions come in pairs, forming a = x[Qt1 + t2 ]WP WN WQt2 k2 (4)
structure called butterfly. The flow graph in Fig. 1 assumes t2 =0 t1 =0
that the lower edge of each butterfly is always multiplied by
−1. These −1 are usually not depicted in order to simplify the where 0 ≤ k1 ≤ P − 1 and 0 ≤ k2 ≤ Q − 1.
graphs. Here, an N -point DFT is decomposed into Q DFTs of
The rotations are represented by the numbers between the P -point and P DFTs of Q-point. These are named as in-
stages, φ. Each of these values corresponds to a rotation by the ner and outer DFTs, respectively. Each output of the inner
twiddle factor DFT is multiplied by a twiddle factor, which has a resolution
N = 2p+q and index t2 k1 .
WNφ = e−j N φ .
2π
(2) The decimation of an N point DFT is done by choosing the
values P and Q. Here, it is important to realized that N =
As general criterion, throughout the paper, it is considered that P · Q. Thus, the first iteration of the binary tree decomposition
the rotations of each stage s are those placed to the right of the allows for splitting an N -point DFT in multiple different ways,
butterflies of that stage. which correspond to the possible combinations of P and Q
Whereas WNφ represents a single rotation value, WL (without whose product is equal to N .
exponent) represents a set of rotations, where After the first iteration, the remaining P and Q-point DFTs
are again divided into smaller DFTs by using the same proce-
WL = WLk , k = 0, . . . , L − 1. (3) dure. Again, these DFTs can be divided arbitrarily by factors
whose product is equal to the corresponding DFT size. The
Rotations by the angles 0◦ , 90◦ , 180◦ , and 270◦ , are called procedure continues until all the branches have been reduced to
trivial rotations. These rotations correspond to multiplications radix-2 DFTs. When the binary tree is finalized, all leaf nodes
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
This number comes from all the possible selections that can be
done at each iteration. For instance, for N = 16, n = log2 N = 4.
According to (5) this leads to 5 different algorithms, as shown
in Fig. 2.
The twiddle factors at each stages are directly obtained from
the binary tree. This is done by going across the binary tree
from left to right. In this process, the final leafs of the tree,
represented by 1, correspond to radix-2 DFT operations. For
example, the sequence of numbers in the binary tree in Fig. 2(a)
is 2, 3, 4. Given this sequence, the number of angles of the cor-
responding twiddle factors is the power of these numbers. Thus,
for this example the twiddle factors are W4 , W8 , and W16 .
This corresponds to the DIT decomposition.
D. Matrix Factorization
Another way to represent the FFT is by using the matrix
factorization [12], [20]. This method represents the FFT in Fig. 4. 16-point radix-22 DIF FFT flow graph.
an algebraic way as a product of matrices. For instance, the
equation purpose is to represent and compare FFT algorithms, the binary
tree representation is the most attractive representation so far.
(r) (r)
TN = PN · Ir ⊗ T nr · DN · Tr ⊗ I N (6)
r
III. M OVING FFT ROTATIONS
is a matricial representation of the recursion used in the Cooley-
Tukey algorithm [12]. In the equation, ⊗ represents a tensor A. Basic Movement of Twiddle Factors
product or Kronecker product and the matrices represent per- In an FFT, rotations can be moved among various stages.
mutations and twiddle factor multiplications. Fig. 3 shows how rotations are moved in a flow graph. Fig. 3(a)
considers the rotations previous to a butterfly, φx = a and
φy = b, and those after the butterfly, φX = c and φY = d. The
E. Comparison of Representations
equation of this structure is
Each FFT representation has its utility. The Cooley-Tukey
algorithm and the matrix factorization approaches represent X = WNc · x · WNa + y · WNb
the FFT algorithm algebraically. The matrix factorization is a Y = WNd · x · WNa − y · WNb . (7)
powerful mathematical and computational tool to represent not
only the FFT algorithms but also the permutations involved in If we divide the inputs by the factor WNq , we can pass this
FFT architectures. Although they are useful approaches for the factor to the outputs, leading to the equation
generation of FFTs, in terms of visualization and representation
the Cooley-Tukey algorithm and the matrix factorization are X = WNc+q · x · WNa−q + y · WNb−q
difficult to visualize and understand. This makes it difficult to
observe different algorithms and draw conclusions about them. Y = WNd+q · x · WNa−q − y · WNb−q . (8)
If we look for a representation of FFT algorithms we want
to get an idea of the algorithm from its visualization. From This equation is represented in Fig. 3(b). As a result, q is
this perspective, the representation using flow graphs provides subtracted from the φ values at the input and is added to the
more clarity than the Cooley-Tukey and the matrix factorization φ values at the output.
approaches. The representation using flow graphs is excellent
for FFTs of small sizes. However, for large FFTs the size of the
B. Moving Rotations in the FFT Flow Graph
graph increases drastically and it becomes cumbersome.
Finally, the binary tree representation is clear, intuitive, If we apply the movement of rotations to the flow graph in
compact and can represent small and large FFTs easily. If the Fig. 1, we obtain new flow graphs. The flow graph in Fig. 4
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Fig. 5. 16-point radix-2 DIT FFT flow graph with inputs in natural order. Fig. 7. 16-point radix-4 DIT FFT flow graph.
Fig. 5 the inputs are in natural order and the outputs are in bit-
reversed order. It can be easily checked that both figures are the
same algorithm, i.e., the operations in both of them are exactly
the same. This refutes the common belief that the inputs of the
DIF decomposition are in natural order and those of the DIT
decomposition are in bit-reversed order. In fact, the order of the
inputs and outputs is always independent of the FFT algorithm:
Each FFT algorithm can be represented with any input and
output orders.
Fig. 6. 16-point radix-2 DIT FFT flow graph. C. Radix-4, Radix-8, etc. Algorithms
Fig. 7 shows the 16-point radix-4 DIF FFT flow graph. Each
represents a 16-point radix-22 FFT. If we compare Figs. 1 and 4, dot in the middle of the stages with four inputs and four outputs
we observe that the radix-22 FFT is the result of moving the represents a radix-4 butterfly, like the one shown in Fig. 8. If
factor WN1 from stage 1 to stage 2 in the butterflies with we substitute the dots by the corresponding butterfly, we obtain
odd twiddle factors. In general for any N , the radix-22 FFT the flow graph shown in Fig. 9. By comparing it to the 16-point
N/4 radix-22 DIF FFT flow graph in Fig. 4 we observe that the radix-
algorithm is the result of moving all the factors except WN
22 and radix-4 algorithms are indeed the same algorithm, i.e.,
from each odd stage to the next even stage in a radix-2 flow
N/4 the operations in both of them are exactly the same. The same
graph. The twiddle factor WN is a trivial rotation that remains holds for radix-8 and radix-23, for radix-16 and radix-24, etc.
in odd stages.
If we move twiddle factors in Fig. 4 from stage 2 to stage 3,
we obtain the flow graph in Fig. 5. This flow graph is a 16-point IV. T HE T RIANGULAR M ATRIX R EPRESENTATION
radix-2 DIT FFT. Therefore, by moving twiddle factors we can
A. Rotations in an FFT Flow Graph
obtain different FFT algorithms. Indeed, the FFT algorithms
only differ on the values of the twiddle factors. The movements presented in previous section can be clas-
A more usual representation of the 16-point radix-2 DIT FFT sified into sets. In any FFT algorithm there is one set of WN1
is shown in Fig. 6. In this case, the inputs are depicted in bit- rotations that can be move in all stages except the last one.
reversed order and the outputs in natural order [18], whereas in There are also two set of WN2 . One of them can be moved
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Fig. 9. 16-point radix-4 DIT FFT flow graph with detailed butterflies.
Fig. 11. 16-point radix-2 DIF FFT flow graph with decomposed twiddle B. The Triangular Matrix Representation
factors.
The triangular matrix representation for FFT algorithms is
shown in Fig. 12 for the case of N = 16 points. For any N ,
between stages 1 and n−2, and the other one between stages 2 the triangular matrix representation has a number of rows and
and n − 1. And in general, there are m sets of WN2
m−1
. Each columns equal to n − 1. The upper right corner is the rotation
set p, p = 0 . . . m − 1 can be moved between stages 1 + p and WN1 , which represents the smallest rotation angle. Starting from
n − m + p. This is shown in Fig. 10 for the case of N = 16 the corner, each diagonal corresponds to a twiddle factor. The
points. second diagonal is the factor WN2 . The third diagonal is the
In case of the 16-point FFT flow graph, Fig. 11 shows factor WN4 and so on. The last diagonal is the lowest one and
N/4
the flow graph of the FFT with decomposed rotations. The corresponds to the trivial rotations WN .
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
TABLE I
N UMBER OF FFT A LGORITHMS OF THE
T RIANGULAR M ATRIX R EPRESENTATION
Fig. 15. (a) 256-point radix-2 DIF FFT. (b) 256-point radix-2 DIT FFT. (c) 256-point radix-22 DIF FFT. (d) 256-point mixed radix-25 -23 DIF FFT. (e) 256-point
radix-24 FFT (balanced binary tree). (f) 256-point FFT algorithm not representable with a binary tree.
TABLE II
A LGORITHMS W ITH THE M INIMUM N UMBER
OF N ON -T RIVIAL R OTATIONS
Fig. 17. Triangular matrix representation of the 16-point radix-2 DIF FFT
algorithm with the bits of the index. For radix-22 the twiddle factor calculation is reduced to two
equations, one for odd stages and one for even stages. For
where the sum is for all the elements of the matrix xij whose radix-22 DIF the twiddle factors at odd stages are
stage is equal to s, b(i) is the bit in row i and b(j) is the bit in
φs (I) ≡ bn−s · bn−s−1 · 2n−2 (16)
column j. We can also represent equation (10) in binary as
and those at even stages are
φs (I) ≡ . . 0 ] · [b(j) 0 .
[b(i) 0 . . . 0 ]. (11)
xij =s n−1−i j φs (I) ≡ [bn−s bn−s+1 ] · [bn−s−1 . . . b0 ] · 2s−2 . (17)
For the triangular matrix representation in Fig. 17 this leads to Finally, for the radix-22 DIT algorithm, twiddle factors at odd
stages are
φ1 (I) ≡ b3 · [b2 00] + b3 · [b1 0] + b3 · b0 = b3 · [b2 b1 b0 ]
φ2 (I) ≡ [b2 0] · [b1 0] + [b2 0] · [b0 ] = [b2 0] · [b1 b0 ] φs (I) ≡ bn−s · bn−s−1 · 2n−2 (18)
φ3 (I) ≡ [b1 00] · [b0 ]. (12) and twiddle factors at even stages are
It can be checked that the values of φs (I) correspond to the φs (I) ≡ [bn−s−1 bn−s−2 ] · [bn−s . . . bn−1 ] · 2n−s−2 . (19)
twiddle factors in Fig. 16. For instance φ1 (9) ≡ b3 · [b2 b1 b0 ] =
1 · [001] ≡ 1 and φ2 (15) ≡ [b2 0] · [b1 b0 ] = [10] · [11] = 110 ≡ 6.
V. A PPLICATION TO FFT I MPLEMENTATION
In the same way, the twiddle factors for the balanced binary
tree in Fig. 15(e) are A. Minimum Number of Non-Trivial Rotations
φ1 (I) ≡ b7 · [b6 000000] The algorithms with the minimum number of non-trivial ro-
φ2 (I) ≡ [b6 b7 ] · [b5 b4 0000] tations are relevant for both software implementation and direct
implementation in hardware. In software, these algorithms lead
φ3 (I) ≡ [b5 00] · [b4 0000]
to the smallest number of operations. In a direct hardware
φ4 (I) ≡ [b4 b5 b6 b7 ] · [b3 b2 b1 b0 ] implementation these algorithms lead to the implementation
φ5 (I) ≡ [b3 0000] · [b2 00] with the smallest number of rotators.
φ6 (I) ≡ [b2 b3 0000] · [b1 b0 ] The algorithms with the minimum number of non-trivial
rotations are shown in Table II for FFT sizes from 16 to 256.
φ7 (I) ≡ [b1 000000] · b0 . (13)
For even powers of two the best alternative is to decompose the
Note that stages 1, 3, 5, and 7 only take the values 0 and 64, FFT in groups of radix-22 FFTs, while trying to make the tree
N/4
where WN = WN64 is a trivial rotation. And stage 4 is the as balanced as possible. For odd powers of two, mixed radix is
stage where most rotation sets are allocated. used, where one of the radices is 23 and the other radix includes
In this way it is very easy to obtain the twiddle factors of any the rest of stages. These stages are then decomposed as in the
FFT algorithm. case of even powers of two.
Based on the procedure in previous sections, we can calculate The proposed triangular matrix representation has direct
the twiddle factors of different FFT algorithms. For some com- application to the design of FFT hardware architectures. In
mon radices, the twiddle factor calculations can be represented this section we present the example of a single-delay feedback
by simple equations. For radix-2 DIF, the twiddle factors at any (SDF) architecture, shown in Fig. 18(a). This architecture re-
stage s and for any n are ceives input samples in natural order and provides the outputs
in bit-reversed order. The architecture is the same for any FFT
φs (I) ≡ bn−s · 2s−1 · [bn−s−1 . . . b0 ]. (14) algorithm. The only difference among algorithms is the content
of the memories M 1, M 2, and M 3. The memory at each stage,
Thus, a single equation defines all the twiddle factors of
s, includes the rotations by φs (I), as obtained in the previous
the FFT.
section. For instance, in order to implement the radix-2 DIT
Analogously, for the radix-2 DIT, the twiddle factors are
FFT the twiddle factors in the memories will be those defined
obtained by
by equation (15), and for a radix-22 DIF the memories must
φs (I) ≡ bn−s−1 · 2n−s−1 [bn−s . . . bn−1 ]. (15) store the twiddle factors according to equations (16) and (17).
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
R EFERENCES
[1] J. Cooley and J. W. Tukey, “An algorithm for the machine calculation of
complex Fourier series,” Math. Comput., vol. 19, pp. 297–301, Apr. 1965.
[2] G. Bergland, “A guided tour of the fast fourier transform,” IEEE
Spectrum, vol. 6, no. 7, pp. 41–52, Jul. 1969.
[3] W. M. Gentleman and G. Sande, “Fast Fourier transforms for fun and
profit,” in Proc. Joint Comput. Conf., vol. 29, Nov. 1966, pp. 563–578.
[4] J. Cooley, P. Lewis, and P. Welch, “Historical notes on the fast Fourier
transform,” Proc. IEEE, vol. 55, no. 10, pp. 1675–1677, Oct. 1967.
[5] R. C. Singleton, “An algorithm for computing the mixed radix fast
Fourier transform,” IEEE Trans. Audio Electroacoust., vol. AU-17, no. 2,
pp. 93–103, Jun. 1969.
[6] Y. Jin Moon and Y. Il Kim, “A mixed-radix 4-2 butterfly with simple
bit reversing for ordering the output sequences,” in Proc. Int. Conf. Adv.
Commun. Technol., Feb. 2006, vol. 3, pp. 1771–1774.
[7] S. He and M. Torkelson, “A new approach to pipeline FFT processor,” in
Proc. Parallel Process. Symp., Apr. 1996, pp. 766–770.
[8] S. He and M. Torkelson, “Design and implementation of a 1024-point
pipeline FFT processor,” in Proc. IEEE Custom Integr. Circuits Conf.,
May 1998, pp. 131–134.
[9] S. He and M. Torkelson, “Designing pipeline FFT processor for OFDM
(de)modulation,” in Proc. URSI Int. Symp. Signals, Syst., Electron.,
Sep. 1998, pp. 257–262.
Fig. 18. Single delay feedback FFT architectures. (a) For inputs in natural [10] T. Adiono, M. Irsyadi, Y. Hidayat, and A. Irawan, “64-point fast efficient
order and outputs in bit-reversed order. (b) For inputs in bit-reversed order and FFT architecture using radix-2 3 single path delay feedback,” in Proc. Int.
outputs in natural order. Conf. Electr. Eng. Informat., Aug. 2009, vol. 2, pp. 654–658.
[11] C.-P. Fan, M.-S. Lee, and G.-A. Su, “A low multiplier and multiplica-
tion costs 256-point FFT implementation with simplified radix-24 SDF
In general, the order of the twiddle factors in memory must architecture,” in Proc. IEEE Asia-Pacific Conf. Circuits Syst., Dec. 2006,
respect the order of the data input to the multiplier. In a second pp. 1935–1938.
[12] A. Cortés, I. Vélez, and J. F. Sevillano, “Radix r k FFTs: Matricial rep-
example, we may want an architecture in which inputs are in resentation and SDC/SDF pipeline implementation,” IEEE Trans. Signal
bit-reversed order and outputs in natural order, as shown in Process., vol. 57, no. 7, pp. 2824–2839, Jul. 2009.
Fig. 18(b). In this architecture, the inputs to the multipliers are [13] M. Garrido, J. Grajal, M. A. Sánchez, and O. Gustafsson, “Pipelined
in bit-reversed order. According to this, the twiddle factors in radix-2k feedforward FFT architectures,” IEEE Trans. VLSI Syst., vol. 21,
pp. 23–32, Jan. 2013.
the memories must be stored in the memories or read from [14] H.-Y. Lee and I.-C. Park, “Balanced binary-tree decomposition for area-
the memories in bit-reversed order. Again, any FFT algorithm efficient pipelined FFT processing,” IEEE Trans. Circuits Syst. I, Reg.
representable by triangular matrices can be mapped to the archi- Papers, vol. 54, no. 4, pp. 889–900, Apr. 2007.
[15] F. Qureshi and O. Gustafsson, “Generation of all radix-2 fast Fourier
tecture, being the only difference the content of the memories. transform algorithms using binary trees,” in Proc. Eur. Conf. Circuit
In order to achieve optimized FFT architectures, several Theory Des., Aug. 2011, pp. 677–680.
previous works [21], [22] show how to proceed. The main idea [16] F. Qureshi, “Optimization of rotations in FFTs,” Ph.D. dissertation, Dept.
Elect. Eng., Linköping Univ., Mar. 2012.
behind the design is to combine the algorithm selection and [17] J. Cooley and J. Tukey, “An algorithm for the machine calculation of
the rotator implementation. This leads to simpler rotators at complex Fourier series,” Math. Comput., vol. 19, pp. 297–301, 1965.
different FFT stages. [18] A. Oppenheim and R. Schafer, Discrete-Time Signal Processing.
Engelwood Cliffs, NJ, USA: Prentice-Hall, 1989.
[19] L. R. Rabiner and B. Gold, Theory and Application of Digital Signal
VI. C ONCLUSION Processing. Engelwood Cliffs, NJ, USA: Prentice-Hall, 1975.
[20] C. V. Loan, Computational Frameworks for the Fast Fourier Transform.
This paper has presented the triangular matrix representation. Philadelphia, PA, USA: SIAM, 1992.
It is based on considering groups of rotations and moving [21] M. Garrido, R. Andersson, F. Qureshi, and O. Gustafsson, “Multiplierless
unity-gain SDF FFTs,” IEEE Trans. VLSI Syst., 2016, to be published.
them among FFT stages. As the name of the representation [22] R. Andersson, “FFT hardware architectures with reduced twiddle factor
suggests, FFT algorithms are represented by a triangular matrix, sets,” M.S. thesis, Dept. Elect. Eng., Linköping Univ., Jun. 2014.
where the elements of the matrix indicate in which FFT stage
the rotations are allocated. The triangular matrix representa-
tion adds degrees of freedom with respect to the binary tree
representation. As a result, it can represent significantly more Mario Garrido (M’07) received the M.S. degree
algorithms than the binary tree representation. The triangular in electrical engineering and the Ph.D. degree from
matrix representation has direct application to the implemen- the Technical University of Madrid (UPM), Madrid,
Spain, in 2004 and 2009, respectively. In 2010 he
tation of the FFT in software and hardware. The algorithms moved to Sweden to work as a Postdoctoral Re-
with the minimum number of operations are highlighted in searcher at the Department of Electrical Engineering
the paper. And in order to obtain optimized hardware designs, at Linköping University, Sweden. Since 2012 he is
Associate Professor at the same department.
the triangular matrix representation can be combined with the His research focuses on optimized hardware de-
rotator implementation. sign for signal processing applications. This includes
the design of hardware architectures for the calcu-
lation of transforms, such as the fast Fourier transform (FFT), circuits for
ACKNOWLEDGMENT data management, the CORDIC algorithm, and circuits to calculate statistical
and mathematical operations. His research covers high-performance circuits
The author wants to thank Dr. O. Gustafsson for the valuable for real-time computation, as well as designs for small area and low power
discussions about this work. consumption.