[go: up one dir, main page]

0% found this document useful (0 votes)
9 views9 pages

A New Representation of FFT Algorithms

This paper introduces a new representation for FFT algorithms using triangular matrices, which simplifies understanding and allows for the generation of new algorithms. The triangular matrix representation is more general than the binary tree representation, supporting any radix and facilitating the design of FFT hardware architectures. The paper discusses the advantages of this representation and compares it with existing methods, highlighting its potential for further exploration of FFT algorithms.

Uploaded by

Pradeep K
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
9 views9 pages

A New Representation of FFT Algorithms

This paper introduces a new representation for FFT algorithms using triangular matrices, which simplifies understanding and allows for the generation of new algorithms. The triangular matrix representation is more general than the binary tree representation, supporting any radix and facilitating the design of FFT hardware architectures. The paper discusses the advantages of this representation and compares it with existing methods, highlighting its potential for further exploration of FFT algorithms.

Uploaded by

Pradeep K
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 9

This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS 1

A New Representation of FFT Algorithms


Using Triangular Matrices
Mario Garrido, Member, IEEE

Abstract—In this paper we propose a new representation for rotators than the radix-2 ones and less adders than the radix-4
FFT algorithms called the triangular matrix representation. This ones. Later, radix-23 [9], [10], radix-24 [11] and in general
representation is more general than the binary tree representation radix-2k [9], [12], [13] algorithms were proposed.
and, therefore, it introduces new FFT algorithms that were not
discovered before. Furthermore, the new representation has the Nowadays we can talk about a large number of FFT algo-
advantage that it is simple and easy to understand, as each FFT rithms. This is thanks to the binary tree representation [14],
algorithm only consists of a triangular matrix. Besides, the new [15]. This representation is a generalization of the Cooley-
representation allows for obtaining the exact twiddle factor values Tukey algorithm, which allows for decimating an FFT at any
in the FFT flow graph easily. This facilitates the design of FFT stage in each iteration of the decimation. The binary tree repre-
hardware architectures. As a result, the triangular matrix repre-
sentation is an excellent alternative to represent FFT algorithms sentation can represent all the algorithms mentioned previously.
and it opens new possibilities in the exploration and understanding A comparison of FFT algorithms can be found in [16].
of the FFT. In this paper we propose a new representation of the FFT
Index Terms—Binary tree, Cooley-Tukey, fast Fourier trans- algorithms. It is called triangular matrix, as the FFT algorithms
form (FFT). are represented by a triangular matrix. The triangular matrix
representation has four main advantages. First, the representa-
tion is simple and easy to understand. Second, it support any
I. I NTRODUCTION
radix, including mixed radix algorithms. Third, it is a more

I T WAS 50 years ago when Cooley and Tukey proposed the


fast Fourier transform (FFT) algorithm [1]. The FFT is a way
to compute the discrete Fourier transform (DCT) that reduces
general representation and allows for representing significantly
more algorithms than the binary tree one. Third, it allows for
generating the twiddle factors at each stage of the FFT easily.
the amount of computations from O(N 2 ) to O(N log N ) [2]. As a result, the triangular matrix representation opens new
The Cooley-Tukey algorithm led to two main algorithms: possibilities in the research of the FFT.
Radix-2 decimation in frequency (DIF) and radix-2 decimation The paper is organized as follows. Section II reviews pre-
in time (DIT). The radix-2 DIF FFT is a decomposition in vious FFT representations. Section III explains how to move
which the output sequence is separated into even and odd sam- rotations among FFT stages. This section also refutes some
ples iteratively. Analogously, the radix-2 DIT FFT separates the common beliefs related to the FFT that are shown to be wrong.
input sequence into even and odd samples iteratively. Section IV presents the triangular matrix representation, relates
After the Cooley-Tukey algorithm, new FFT algorithms were it to the binary tree representation and shows how to obtain the
presented. The radix-4 FFT algorithm [3] reduces the rotations twiddle factors. Section V shows the application to FFT imple-
of the radix-2 algorithm. The same occurs with radix-8 and mentation. Finally, Section VI provides the main conclusions of
higher radices. For further information about the first FFT algo- the paper.
rithms, [4] provides historical notes on the first FFT algorithms.
Later, the radices were combined in mixed radix algorithms
[5], [6]. Mixed radix is generally used when the number of II. R EVIEW OF P REVIOUS FFT R EPRESENTATIONS
points of the FFT, N , is not a power of the radix. For instance, A. The Cooley-Tukey Algorithm
N = 128 is not a power of 4 and, therefore, it cannot be
implemented using radix-4. However, it can be implemented The Cooley-Tukey algorithm [17], [18] is based on the obser-
with mixed radix-4 and 2, being all the stages of radix-4 except vation that multiple operations can be shared when calculating
the last one. the output frequencies of the FFT. This is done by decomposing
An evolution of the FFT algorithms happened with radix-22 the equation of the DFT
FFT [7], [8]. This radix led to new FFT architectures with less

N −1
x[n]e−j N nk .

X[k] = (1)
n=0
Manuscript received February 8, 2016; revised April 29, 2016 and June 15,
2016; accepted June 16, 2016. This work was supported by the the Swedish
ELLIIT Program. This paper was recommended by Associate Editor J. Dolecek. The most common decompositions are decimation in time
The author is with the Department of Electrical Engineering, Linköping Uni- (DIT) and decimation in frequency (DIF) [18]. The DIT de-
versity, SE-581 83 Linköping, Sweden (e-mail: mario.garrido.galvez@liu.se). composition separates the sequence x[n] into its even and odd
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org. samples, whereas the DIF decomposition is applied on the
Digital Object Identifier 10.1109/TCSI.2016.2587822 output sequence X[k].
1549-8328 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

2 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS

Fig. 2. Binary tree diagram of all possible algorithm for N = 16.

by 1, j, −1 and −j, respectively. They are called trivial rota-


tions because they can be easily implemented by interchang-
ing the real and imaginary parts and/or changing the sign of
kN/4
the data. Note that the twiddle factor W4 = W4k = WN ,
k ∈ {0, . . . , 3} only includes trivial rotations, whereas larger
Fig. 1. 16-point radix-2 DIF FFT flow graph.
twiddle factor include other rotations as well.

C. Binary-Tree Representation
In both decompositions, the N -point DFT is transformed into
two N/2-point DFTs. By applying the procedure iteratively, The binary tree representation [14], [15] is a generalization
each step halves the number of points of the DFTs, which of the Cooley-Tukey algorithm. The difference among them
finally leads to 2-point DFTs. lies in the fact that the binary tree representation allows for
decimating an FFT of any size at each of the iterations in the
decomposition, not only a 2-point DFT.
B. FFT Representation Using Flow Graphs A binary tree diagram is an effective way to represent various
In order to understand the operations performed by an FFT, FFT algorithms [14]. Fig. 2 shows all the possible binary tree
the FFT is represented most times by its flow graph. An diagrams for N = 16. In binary tree representation, each node
example of an FFT flow graph is shown in Fig. 1. The numbers is assigned a value m to represent a 2m -point DFT. Each node
at the input represent the indexes of the input sequence, x[n], has at most two leaves which represent the inner and outer
whereas those at the output are the frequencies, k, of the output DFTs.
signal X[k]. It can be observed that the inputs are ordered, and Mathematically, the binary tree decomposition is described as
the outputs are provided in the so-called bit-reversed order [19].
The flow graph consists of a series of n stages, s ∈ {1 . . . n}, X[k1 + P k2 ] =
where additions, subtractions and complex rotations are cal-  P −1  

Q−1  t1 k1 t2 k1
culated. Additions and subtractions come in pairs, forming a = x[Qt1 + t2 ]WP WN WQt2 k2 (4)
structure called butterfly. The flow graph in Fig. 1 assumes t2 =0 t1 =0
that the lower edge of each butterfly is always multiplied by
−1. These −1 are usually not depicted in order to simplify the where 0 ≤ k1 ≤ P − 1 and 0 ≤ k2 ≤ Q − 1.
graphs. Here, an N -point DFT is decomposed into Q DFTs of
The rotations are represented by the numbers between the P -point and P DFTs of Q-point. These are named as in-
stages, φ. Each of these values corresponds to a rotation by the ner and outer DFTs, respectively. Each output of the inner
twiddle factor DFT is multiplied by a twiddle factor, which has a resolution
N = 2p+q and index t2 k1 .
WNφ = e−j N φ .

(2) The decimation of an N point DFT is done by choosing the
values P and Q. Here, it is important to realized that N =
As general criterion, throughout the paper, it is considered that P · Q. Thus, the first iteration of the binary tree decomposition
the rotations of each stage s are those placed to the right of the allows for splitting an N -point DFT in multiple different ways,
butterflies of that stage. which correspond to the possible combinations of P and Q
Whereas WNφ represents a single rotation value, WL (without whose product is equal to N .
exponent) represents a set of rotations, where After the first iteration, the remaining P and Q-point DFTs
are again divided into smaller DFTs by using the same proce-
WL = WLk , k = 0, . . . , L − 1. (3) dure. Again, these DFTs can be divided arbitrarily by factors
whose product is equal to the corresponding DFT size. The
Rotations by the angles 0◦ , 90◦ , 180◦ , and 270◦ , are called procedure continues until all the branches have been reduced to
trivial rotations. These rotations correspond to multiplications radix-2 DFTs. When the binary tree is finalized, all leaf nodes
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

GARRIDO: A NEW REPRESENTATION OF FFT ALGORITHMS USING TRIANGULAR MATRICES 3

correspond to the required radix and internal nodes correspond


to twiddle factor multiplication stages, which connect the two
decomposed DFTs.
The number of radix-2 2n -point FFT algorithms generated
using binary trees is [15]
Fig. 3. Movement of rotations in the FFT. (a) Original structure. (b) Structure
after movement.
(2(n − 1))!
. (5)
n!(n − 1)!

This number comes from all the possible selections that can be
done at each iteration. For instance, for N = 16, n = log2 N = 4.
According to (5) this leads to 5 different algorithms, as shown
in Fig. 2.
The twiddle factors at each stages are directly obtained from
the binary tree. This is done by going across the binary tree
from left to right. In this process, the final leafs of the tree,
represented by 1, correspond to radix-2 DFT operations. For
example, the sequence of numbers in the binary tree in Fig. 2(a)
is 2, 3, 4. Given this sequence, the number of angles of the cor-
responding twiddle factors is the power of these numbers. Thus,
for this example the twiddle factors are W4 , W8 , and W16 .
This corresponds to the DIT decomposition.

D. Matrix Factorization
Another way to represent the FFT is by using the matrix
factorization [12], [20]. This method represents the FFT in Fig. 4. 16-point radix-22 DIF FFT flow graph.
an algebraic way as a product of matrices. For instance, the
equation purpose is to represent and compare FFT algorithms, the binary
tree representation is the most attractive representation so far.
(r)   (r)
TN = PN · Ir ⊗ T nr · DN · Tr ⊗ I N (6)
r
III. M OVING FFT ROTATIONS
is a matricial representation of the recursion used in the Cooley-
Tukey algorithm [12]. In the equation, ⊗ represents a tensor A. Basic Movement of Twiddle Factors
product or Kronecker product and the matrices represent per- In an FFT, rotations can be moved among various stages.
mutations and twiddle factor multiplications. Fig. 3 shows how rotations are moved in a flow graph. Fig. 3(a)
considers the rotations previous to a butterfly, φx = a and
φy = b, and those after the butterfly, φX = c and φY = d. The
E. Comparison of Representations
equation of this structure is
Each FFT representation has its utility. The Cooley-Tukey
algorithm and the matrix factorization approaches represent X = WNc · x · WNa + y · WNb
the FFT algorithm algebraically. The matrix factorization is a Y = WNd · x · WNa − y · WNb . (7)
powerful mathematical and computational tool to represent not
only the FFT algorithms but also the permutations involved in If we divide the inputs by the factor WNq , we can pass this
FFT architectures. Although they are useful approaches for the factor to the outputs, leading to the equation
generation of FFTs, in terms of visualization and representation 
the Cooley-Tukey algorithm and the matrix factorization are X = WNc+q · x · WNa−q + y · WNb−q
difficult to visualize and understand. This makes it difficult to 
observe different algorithms and draw conclusions about them. Y = WNd+q · x · WNa−q − y · WNb−q . (8)
If we look for a representation of FFT algorithms we want
to get an idea of the algorithm from its visualization. From This equation is represented in Fig. 3(b). As a result, q is
this perspective, the representation using flow graphs provides subtracted from the φ values at the input and is added to the
more clarity than the Cooley-Tukey and the matrix factorization φ values at the output.
approaches. The representation using flow graphs is excellent
for FFTs of small sizes. However, for large FFTs the size of the
B. Moving Rotations in the FFT Flow Graph
graph increases drastically and it becomes cumbersome.
Finally, the binary tree representation is clear, intuitive, If we apply the movement of rotations to the flow graph in
compact and can represent small and large FFTs easily. If the Fig. 1, we obtain new flow graphs. The flow graph in Fig. 4
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

4 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS

Fig. 5. 16-point radix-2 DIT FFT flow graph with inputs in natural order. Fig. 7. 16-point radix-4 DIT FFT flow graph.

Fig. 8. Radix-4 butterfly.

Fig. 5 the inputs are in natural order and the outputs are in bit-
reversed order. It can be easily checked that both figures are the
same algorithm, i.e., the operations in both of them are exactly
the same. This refutes the common belief that the inputs of the
DIF decomposition are in natural order and those of the DIT
decomposition are in bit-reversed order. In fact, the order of the
inputs and outputs is always independent of the FFT algorithm:
Each FFT algorithm can be represented with any input and
output orders.

Fig. 6. 16-point radix-2 DIT FFT flow graph. C. Radix-4, Radix-8, etc. Algorithms
Fig. 7 shows the 16-point radix-4 DIF FFT flow graph. Each
represents a 16-point radix-22 FFT. If we compare Figs. 1 and 4, dot in the middle of the stages with four inputs and four outputs
we observe that the radix-22 FFT is the result of moving the represents a radix-4 butterfly, like the one shown in Fig. 8. If
factor WN1 from stage 1 to stage 2 in the butterflies with we substitute the dots by the corresponding butterfly, we obtain
odd twiddle factors. In general for any N , the radix-22 FFT the flow graph shown in Fig. 9. By comparing it to the 16-point
N/4 radix-22 DIF FFT flow graph in Fig. 4 we observe that the radix-
algorithm is the result of moving all the factors except WN
22 and radix-4 algorithms are indeed the same algorithm, i.e.,
from each odd stage to the next even stage in a radix-2 flow
N/4 the operations in both of them are exactly the same. The same
graph. The twiddle factor WN is a trivial rotation that remains holds for radix-8 and radix-23, for radix-16 and radix-24, etc.
in odd stages.
If we move twiddle factors in Fig. 4 from stage 2 to stage 3,
we obtain the flow graph in Fig. 5. This flow graph is a 16-point IV. T HE T RIANGULAR M ATRIX R EPRESENTATION
radix-2 DIT FFT. Therefore, by moving twiddle factors we can
A. Rotations in an FFT Flow Graph
obtain different FFT algorithms. Indeed, the FFT algorithms
only differ on the values of the twiddle factors. The movements presented in previous section can be clas-
A more usual representation of the 16-point radix-2 DIT FFT sified into sets. In any FFT algorithm there is one set of WN1
is shown in Fig. 6. In this case, the inputs are depicted in bit- rotations that can be move in all stages except the last one.
reversed order and the outputs in natural order [18], whereas in There are also two set of WN2 . One of them can be moved
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

GARRIDO: A NEW REPRESENTATION OF FFT ALGORITHMS USING TRIANGULAR MATRICES 5

Fig. 9. 16-point radix-4 DIT FFT flow graph with detailed butterflies.

algorithm represented in the figure is the radix-2 DIF FFT. Note


that the first stage includes the set of WN1 , one of the 2 sets of
WN2 and one of the 3 sets of WN4 . The second stage includes
the second set of WN2 and one set of WN4 . And the third stage
includes one set of WN4 .
Fig. 10. Rotations in a 16-point FFT.
It can be observed that the set of WN1 can be placed in the first
three stages of the FFT by just moving it through the butterflies.
The set WN2 in the first stage can be placed in stages 1 and 2,
but not in stage 3. The set WN2 in the second stage can be placed
in stages 2 and 3. And the sets WN4 cannot be moved. Note that
these sets of rotations are formed according to their movement
possibilities.
Note also that the radix-2 DIF FFT algorithm is the case in
which the sets of rotations are placed at the earliest possible
stage. If we move the rotations to the most left stage we obtain
the radix-2 DIT FFT algorithm, as shown in Fig. 5. And if we
move the rotations by WN1 and WN2 in the first stage to the
second one, we obtain the radix-22 FFT, as shown in Fig. 4.
Apart from the radix-2 DIF, radix-2 DIT and radix-22 algo-
rithms, we can perform other movements in the flow graph to
obtain new FFT algorithms. For instance, a new algorithm is
obtained when we move the set of WN2 in the first stage to the
second one and keep the set of WN1 in the first stage.
As a result, the number of FFT algorithms that can be
represented is large and we need a representation that includes
all of them. This representation is presented in the next section.

Fig. 11. 16-point radix-2 DIF FFT flow graph with decomposed twiddle B. The Triangular Matrix Representation
factors.
The triangular matrix representation for FFT algorithms is
shown in Fig. 12 for the case of N = 16 points. For any N ,
between stages 1 and n−2, and the other one between stages 2 the triangular matrix representation has a number of rows and
and n − 1. And in general, there are m sets of WN2
m−1
. Each columns equal to n − 1. The upper right corner is the rotation
set p, p = 0 . . . m − 1 can be moved between stages 1 + p and WN1 , which represents the smallest rotation angle. Starting from
n − m + p. This is shown in Fig. 10 for the case of N = 16 the corner, each diagonal corresponds to a twiddle factor. The
points. second diagonal is the factor WN2 . The third diagonal is the
In case of the 16-point FFT flow graph, Fig. 11 shows factor WN4 and so on. The last diagonal is the lowest one and
N/4
the flow graph of the FFT with decomposed rotations. The corresponds to the trivial rotations WN .
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

6 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS

TABLE I
N UMBER OF FFT A LGORITHMS OF THE
T RIANGULAR M ATRIX R EPRESENTATION

Fig. 12. Triangular matrix representation for FFT algorithms.

Fig. 13. Triangular matrix representation of typical 16-point FFT algorithms.


(a) Radix-2 DIF. (b) Radix-2 DIT. (c) Radix-22 DIF.
Fig. 14. Number of FFT algorithm as a function of the number of FFT stages.

The number of the column represents the highest FFT stage


where the rotation can be moved, whereas the number of the alternatives for each element in the matrix, the total number
row indicates the lowest FFT stage where the rotation can be of algorithms as a function of n is obtained as
moved. According to this, WN1 can be placed between stages 1
and n − 1. There is one set of rotations by WN2 that can be 
n−1
Number of algorithms = (n − k)k . (9)
placed in stages 1 and 2, and another set that can be placed
k=1
in stages 2 and 3. Finally, for the trivial rotations in the main
diagonal the number of the column and the number of the row The total number of algorithms as a function of the FFT size
are the same. This means that each set of trivial rotations is is shown in Table I. Note that the number grows extremely fast,
placed in an specific stage and cannot be moved. All of this achieving more than 34 thousands for N = 64 and more than
agrees with the explanation in Section IV-A. 24 million for N = 128.
The radix-2 DIF FFT is the case where all the rotations are Fig. 14 compares the number of algorithms representable
in the lowest possible stage. This means that the values inside by binary trees as shown in equation (5) with the number
the matrix are equal to the row in which they are. This is of algorithms representable by triangular matrix as shown in
shown in Fig. 13(a). Analogously, the radix-2 DIT FFT is the equation (9). Fig. 14 shows that the number of algorithms
case where all the rotations are in the highest possible stage. representable by triangular matrix is significantly larger than
This means that the values inside the matrix are equal to the that of binary tree. In fact, the binary tree representation is a
column in which they are. This is shown in Fig. 13(b). Finally, subset of the triangular matrix representation. This is explained
The radix-22 DIF FFT is the case in which the rotations of the more in detail in the next section.
radix-2 DIF algorithm in odd stages are moved to the next even
stage. This is shown in Fig. 13(c).
D. Equivalence to the Binary Tree Representation
Fig. 15 shows various 256-point FFT algorithms in the
C. Number of Representations
triangular matrix representation, as well as its equivalent binary
The number of algorithm of the triangular matrix represen- tree. Fig. 15(a) is the 256-point radix-2 DIF FFT. As mentioned
tation can be calculated easily. As we know the number of before, it is characterized by the fact that each element of the
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

GARRIDO: A NEW REPRESENTATION OF FFT ALGORITHMS USING TRIANGULAR MATRICES 7

Fig. 15. (a) 256-point radix-2 DIF FFT. (b) 256-point radix-2 DIT FFT. (c) 256-point radix-22 DIF FFT. (d) 256-point mixed radix-25 -23 DIF FFT. (e) 256-point
radix-24 FFT (balanced binary tree). (f) 256-point FFT algorithm not representable with a binary tree.

matrix is equal to its row. Fig. 15(b) is the 256-point radix-2


DIT FFT. It is characterized by the fact that each element of
the matrix is equal to its column. Fig. 15(c) is the 256-point
radix-22 DIF FFT. It is characterized by the fact that elements
in even rows are equal to the row number and elements in odd
rows are equal to the next row, except for the trivial twiddle fac-
N/4
tors WN . Fig. 15(d) is the 256-point mixed radix-25-23 DIF
FFT. Fig. 15(e) is the 256-point radix-24 FFT. For a 256-point
FFT, radix-24 forms a balanced binary tree. In a balanced binary
tree the DIF and DIT decompositions are the same algorithm.
The balanced binary tree is characterized by the fact that it
is formed by squares of numbers. It includes a big square of
n/2 × n/2 elements for the stage n/2, a square of n/4 × n/4
elements for stages n/4 and 3n/4, and so on. And it ends with
squares of 1 elements for odd stages.
Finally, Fig. 15(f) shows an algorithm that is not repre-
sentable by a binary tree. The criterion to know if a triangular
matrix representation can also be represented by a binary tree
is simple: A binary tree is an algorithm in which the triangular
Fig. 16. 16-point radix-2 DIF FFT flow graph with the bits of the index
matrix representation consists of n − 1 rectangles or squares depicted.
of numbers, one for each stage. If a shape in the triangular
matrix representation is not a rectangle or a square, or if the define the index we always consider the inputs in natural order
number of them is larger than n − 1, then this representation and the outputs in bit-reversed order. According to this, φs (I)
cannot be represented as a binary tree. Note that, indeed, is the value of the rotation at stage s and index I. For instance,
most algorithms cannot be represented as a binary tree. This in Fig. 16 φ1 (9) = 1 and φ2 (15) = 6.
highlights the fact that the triangular matrix representation can The values of φs (I) can be obtained directly from the bits
represent significantly more algorithms than the binary tree of the index bn−1 bn−2 . . . b1 b0 . For this purpose, we include
representation. the bits of the index in the triangular matrix representation, as
shown in Fig. 17.
E. Obtaining the Twiddle Factor Values According to this, the values of the twiddle factors are
obtained from the triangular matrix representation as
Fig. 16 shows the flow graph of a 16-point radix-2 DIF 
FFT that includes the index I ≡ bn−1 bn−2 . . . b1 b0 , where (≡) φs (I) = b(i) · 2n−1−i · b(j) · 2j (10)
relates the decimal and binary representations of the index. To xij =s
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

8 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS

TABLE II
A LGORITHMS W ITH THE M INIMUM N UMBER
OF N ON -T RIVIAL R OTATIONS

Fig. 17. Triangular matrix representation of the 16-point radix-2 DIF FFT
algorithm with the bits of the index. For radix-22 the twiddle factor calculation is reduced to two
equations, one for odd stages and one for even stages. For
where the sum is for all the elements of the matrix xij whose radix-22 DIF the twiddle factors at odd stages are
stage is equal to s, b(i) is the bit in row i and b(j) is the bit in
φs (I) ≡ bn−s · bn−s−1 · 2n−2 (16)
column j. We can also represent equation (10) in binary as
 and those at even stages are
φs (I) ≡ . . 0 ] · [b(j) 0 .
[b(i) 0 . . . 0 ]. (11)
xij =s n−1−i j φs (I) ≡ [bn−s bn−s+1 ] · [bn−s−1 . . . b0 ] · 2s−2 . (17)
For the triangular matrix representation in Fig. 17 this leads to Finally, for the radix-22 DIT algorithm, twiddle factors at odd
stages are
φ1 (I) ≡ b3 · [b2 00] + b3 · [b1 0] + b3 · b0 = b3 · [b2 b1 b0 ]
φ2 (I) ≡ [b2 0] · [b1 0] + [b2 0] · [b0 ] = [b2 0] · [b1 b0 ] φs (I) ≡ bn−s · bn−s−1 · 2n−2 (18)
φ3 (I) ≡ [b1 00] · [b0 ]. (12) and twiddle factors at even stages are

It can be checked that the values of φs (I) correspond to the φs (I) ≡ [bn−s−1 bn−s−2 ] · [bn−s . . . bn−1 ] · 2n−s−2 . (19)
twiddle factors in Fig. 16. For instance φ1 (9) ≡ b3 · [b2 b1 b0 ] =
1 · [001] ≡ 1 and φ2 (15) ≡ [b2 0] · [b1 b0 ] = [10] · [11] = 110 ≡ 6.
V. A PPLICATION TO FFT I MPLEMENTATION
In the same way, the twiddle factors for the balanced binary
tree in Fig. 15(e) are A. Minimum Number of Non-Trivial Rotations
φ1 (I) ≡ b7 · [b6 000000] The algorithms with the minimum number of non-trivial ro-
φ2 (I) ≡ [b6 b7 ] · [b5 b4 0000] tations are relevant for both software implementation and direct
implementation in hardware. In software, these algorithms lead
φ3 (I) ≡ [b5 00] · [b4 0000]
to the smallest number of operations. In a direct hardware
φ4 (I) ≡ [b4 b5 b6 b7 ] · [b3 b2 b1 b0 ] implementation these algorithms lead to the implementation
φ5 (I) ≡ [b3 0000] · [b2 00] with the smallest number of rotators.
φ6 (I) ≡ [b2 b3 0000] · [b1 b0 ] The algorithms with the minimum number of non-trivial
rotations are shown in Table II for FFT sizes from 16 to 256.
φ7 (I) ≡ [b1 000000] · b0 . (13)
For even powers of two the best alternative is to decompose the
Note that stages 1, 3, 5, and 7 only take the values 0 and 64, FFT in groups of radix-22 FFTs, while trying to make the tree
N/4
where WN = WN64 is a trivial rotation. And stage 4 is the as balanced as possible. For odd powers of two, mixed radix is
stage where most rotation sets are allocated. used, where one of the radices is 23 and the other radix includes
In this way it is very easy to obtain the twiddle factors of any the rest of stages. These stages are then decomposed as in the
FFT algorithm. case of even powers of two.

F. Twiddle Factors of Common Algorithms B. SDF FFTs

Based on the procedure in previous sections, we can calculate The proposed triangular matrix representation has direct
the twiddle factors of different FFT algorithms. For some com- application to the design of FFT hardware architectures. In
mon radices, the twiddle factor calculations can be represented this section we present the example of a single-delay feedback
by simple equations. For radix-2 DIF, the twiddle factors at any (SDF) architecture, shown in Fig. 18(a). This architecture re-
stage s and for any n are ceives input samples in natural order and provides the outputs
in bit-reversed order. The architecture is the same for any FFT
φs (I) ≡ bn−s · 2s−1 · [bn−s−1 . . . b0 ]. (14) algorithm. The only difference among algorithms is the content
of the memories M 1, M 2, and M 3. The memory at each stage,
Thus, a single equation defines all the twiddle factors of
s, includes the rotations by φs (I), as obtained in the previous
the FFT.
section. For instance, in order to implement the radix-2 DIT
Analogously, for the radix-2 DIT, the twiddle factors are
FFT the twiddle factors in the memories will be those defined
obtained by
by equation (15), and for a radix-22 DIF the memories must
φs (I) ≡ bn−s−1 · 2n−s−1 [bn−s . . . bn−1 ]. (15) store the twiddle factors according to equations (16) and (17).
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

GARRIDO: A NEW REPRESENTATION OF FFT ALGORITHMS USING TRIANGULAR MATRICES 9

R EFERENCES
[1] J. Cooley and J. W. Tukey, “An algorithm for the machine calculation of
complex Fourier series,” Math. Comput., vol. 19, pp. 297–301, Apr. 1965.
[2] G. Bergland, “A guided tour of the fast fourier transform,” IEEE
Spectrum, vol. 6, no. 7, pp. 41–52, Jul. 1969.
[3] W. M. Gentleman and G. Sande, “Fast Fourier transforms for fun and
profit,” in Proc. Joint Comput. Conf., vol. 29, Nov. 1966, pp. 563–578.
[4] J. Cooley, P. Lewis, and P. Welch, “Historical notes on the fast Fourier
transform,” Proc. IEEE, vol. 55, no. 10, pp. 1675–1677, Oct. 1967.
[5] R. C. Singleton, “An algorithm for computing the mixed radix fast
Fourier transform,” IEEE Trans. Audio Electroacoust., vol. AU-17, no. 2,
pp. 93–103, Jun. 1969.
[6] Y. Jin Moon and Y. Il Kim, “A mixed-radix 4-2 butterfly with simple
bit reversing for ordering the output sequences,” in Proc. Int. Conf. Adv.
Commun. Technol., Feb. 2006, vol. 3, pp. 1771–1774.
[7] S. He and M. Torkelson, “A new approach to pipeline FFT processor,” in
Proc. Parallel Process. Symp., Apr. 1996, pp. 766–770.
[8] S. He and M. Torkelson, “Design and implementation of a 1024-point
pipeline FFT processor,” in Proc. IEEE Custom Integr. Circuits Conf.,
May 1998, pp. 131–134.
[9] S. He and M. Torkelson, “Designing pipeline FFT processor for OFDM
(de)modulation,” in Proc. URSI Int. Symp. Signals, Syst., Electron.,
Sep. 1998, pp. 257–262.
Fig. 18. Single delay feedback FFT architectures. (a) For inputs in natural [10] T. Adiono, M. Irsyadi, Y. Hidayat, and A. Irawan, “64-point fast efficient
order and outputs in bit-reversed order. (b) For inputs in bit-reversed order and FFT architecture using radix-2 3 single path delay feedback,” in Proc. Int.
outputs in natural order. Conf. Electr. Eng. Informat., Aug. 2009, vol. 2, pp. 654–658.
[11] C.-P. Fan, M.-S. Lee, and G.-A. Su, “A low multiplier and multiplica-
tion costs 256-point FFT implementation with simplified radix-24 SDF
In general, the order of the twiddle factors in memory must architecture,” in Proc. IEEE Asia-Pacific Conf. Circuits Syst., Dec. 2006,
respect the order of the data input to the multiplier. In a second pp. 1935–1938.
[12] A. Cortés, I. Vélez, and J. F. Sevillano, “Radix r k FFTs: Matricial rep-
example, we may want an architecture in which inputs are in resentation and SDC/SDF pipeline implementation,” IEEE Trans. Signal
bit-reversed order and outputs in natural order, as shown in Process., vol. 57, no. 7, pp. 2824–2839, Jul. 2009.
Fig. 18(b). In this architecture, the inputs to the multipliers are [13] M. Garrido, J. Grajal, M. A. Sánchez, and O. Gustafsson, “Pipelined
in bit-reversed order. According to this, the twiddle factors in radix-2k feedforward FFT architectures,” IEEE Trans. VLSI Syst., vol. 21,
pp. 23–32, Jan. 2013.
the memories must be stored in the memories or read from [14] H.-Y. Lee and I.-C. Park, “Balanced binary-tree decomposition for area-
the memories in bit-reversed order. Again, any FFT algorithm efficient pipelined FFT processing,” IEEE Trans. Circuits Syst. I, Reg.
representable by triangular matrices can be mapped to the archi- Papers, vol. 54, no. 4, pp. 889–900, Apr. 2007.
[15] F. Qureshi and O. Gustafsson, “Generation of all radix-2 fast Fourier
tecture, being the only difference the content of the memories. transform algorithms using binary trees,” in Proc. Eur. Conf. Circuit
In order to achieve optimized FFT architectures, several Theory Des., Aug. 2011, pp. 677–680.
previous works [21], [22] show how to proceed. The main idea [16] F. Qureshi, “Optimization of rotations in FFTs,” Ph.D. dissertation, Dept.
Elect. Eng., Linköping Univ., Mar. 2012.
behind the design is to combine the algorithm selection and [17] J. Cooley and J. Tukey, “An algorithm for the machine calculation of
the rotator implementation. This leads to simpler rotators at complex Fourier series,” Math. Comput., vol. 19, pp. 297–301, 1965.
different FFT stages. [18] A. Oppenheim and R. Schafer, Discrete-Time Signal Processing.
Engelwood Cliffs, NJ, USA: Prentice-Hall, 1989.
[19] L. R. Rabiner and B. Gold, Theory and Application of Digital Signal
VI. C ONCLUSION Processing. Engelwood Cliffs, NJ, USA: Prentice-Hall, 1975.
[20] C. V. Loan, Computational Frameworks for the Fast Fourier Transform.
This paper has presented the triangular matrix representation. Philadelphia, PA, USA: SIAM, 1992.
It is based on considering groups of rotations and moving [21] M. Garrido, R. Andersson, F. Qureshi, and O. Gustafsson, “Multiplierless
unity-gain SDF FFTs,” IEEE Trans. VLSI Syst., 2016, to be published.
them among FFT stages. As the name of the representation [22] R. Andersson, “FFT hardware architectures with reduced twiddle factor
suggests, FFT algorithms are represented by a triangular matrix, sets,” M.S. thesis, Dept. Elect. Eng., Linköping Univ., Jun. 2014.
where the elements of the matrix indicate in which FFT stage
the rotations are allocated. The triangular matrix representa-
tion adds degrees of freedom with respect to the binary tree
representation. As a result, it can represent significantly more Mario Garrido (M’07) received the M.S. degree
algorithms than the binary tree representation. The triangular in electrical engineering and the Ph.D. degree from
matrix representation has direct application to the implemen- the Technical University of Madrid (UPM), Madrid,
Spain, in 2004 and 2009, respectively. In 2010 he
tation of the FFT in software and hardware. The algorithms moved to Sweden to work as a Postdoctoral Re-
with the minimum number of operations are highlighted in searcher at the Department of Electrical Engineering
the paper. And in order to obtain optimized hardware designs, at Linköping University, Sweden. Since 2012 he is
Associate Professor at the same department.
the triangular matrix representation can be combined with the His research focuses on optimized hardware de-
rotator implementation. sign for signal processing applications. This includes
the design of hardware architectures for the calcu-
lation of transforms, such as the fast Fourier transform (FFT), circuits for
ACKNOWLEDGMENT data management, the CORDIC algorithm, and circuits to calculate statistical
and mathematical operations. His research covers high-performance circuits
The author wants to thank Dr. O. Gustafsson for the valuable for real-time computation, as well as designs for small area and low power
discussions about this work. consumption.

You might also like