CN105718424B

CN105718424B - A kind of parallel Fast Fourier Transform processing method

Info

Publication number: CN105718424B
Application number: CN201610052233.5A
Authority: CN
Inventors: 禹霁阳; 汪路元; 李欣; 徐轲; 郭丽明; 冯国平; 徐勇; 李珂
Original assignee: Beijing Institute of Spacecraft System Engineering
Current assignee: Beijing Institute of Spacecraft System Engineering
Priority date: 2016-01-26
Filing date: 2016-01-26
Publication date: 2018-11-02
Anticipated expiration: 2036-01-26
Also published as: CN105718424A

Abstract

The present invention provides a parallel Fast Fourier Transform processing method, which divides the data sequence x(n) whose number of points is N=r ^S into vr secondary data blocks, and then calculates each secondary data by using radix r FFT in the block The FFT result of data, wherein: v=r ^Z , r and S are arbitrary integers, Z=0, 1, ... or S 2, so the number of data points N of the present invention has more values, can be selected in these Select the scheme with the least zero padding in the value, thereby reducing the occupation of storage space and calculation time; and the present invention adopts multi-butterfly parallel computing, and the number of parallel butterfly computing units v=r ^Z , so the present invention can be based on hardware The degree of parallelism can be selected according to the allocation of resources, which has greater flexibility.

Description

A Parallel Fast Fourier Transform Processing Method

技术领域technical field

本发明涉及FFT处理器设计技术领域，特别涉及一种并行快速傅立叶变换处理方法。The invention relates to the technical field of FFT processor design, in particular to a parallel fast Fourier transform processing method.

背景技术Background technique

快速傅立叶变换是通信、雷达系统中数字信号处理过程的关键组成，用来实现数字信号在时域和频域之间的变换。它在实际应用中能够实现诸如频谱分析、数字滤波、相关处理等功能。Fast Fourier transform is a key component of digital signal processing in communication and radar systems, and it is used to realize the transformation of digital signals between time domain and frequency domain. It can realize functions such as spectrum analysis, digital filtering, and correlation processing in practical applications.

当前，多蝶形并行计算快速傅立叶变换基本采用原位存储下的基2、基4结构实现，主要包括存储器访问地址生成、无冲突数据交换、蝶形计算等步骤。无冲突数据交换过程中，如果采用加入额外缓存的方法，会使得缓存的占用量随着点数的增加而增加，同时大量缓存的读写操作会增加整个计算的时间。目前多蝶形并行计算快速傅立叶变换中，多采用存储器无冲突访问地址生成模块，由此抵消掉蝶形计算前数据访问过程中产生的冲突，在无冲突访问存储器的条件下进行多蝶形并行计算。At present, the multi-butterfly parallel computing fast Fourier transform is basically implemented using the radix-2 and radix-4 structures under in-situ storage, which mainly includes steps such as memory access address generation, conflict-free data exchange, and butterfly calculation. In the process of conflict-free data exchange, if the method of adding an additional cache is adopted, the cache usage will increase with the increase of the number of points, and at the same time, a large number of cache read and write operations will increase the entire calculation time. At present, in the fast Fourier transform of multi-butterfly parallel computing, the memory non-conflict access address generation module is mostly used, thereby offsetting the conflicts generated during the data access process before butterfly computing, and performing multi-butterfly parallelism under the condition of conflict-free access to memory calculate.

北京理工大学CN101504638号专利公开了一种可变点数流水线FFT处理器，该发明是一种可变点数流水线FFT处理器，包括第一1024点可变FFT处理模块、旋转因子处理模块、第二1024点可变FFT处理模块、选择与控制模块。上述四个模块与处理器外部的中间数据存储模块共同完成大点数FFT的二维处理。中兴通讯股份有限公司CN101847986A号专利公开了一种实现FFT/IFFT变换的电路及方法，方法为：确定迭代次数、第一和第二RAM的深度、ROM存储器的深度；将待变换的输入数据的前和后半部分分别存入第二和第一RAM；进行迭代蝶形运算：第首次迭代中，读取第一和第二RAM时采用倒位序读取，偶数次蝶形运算结果写入第一RAM，奇数次蝶形运算结果写入第二RAM；在其他迭代中，采用正常位序读取第一和第二RAM，写回RAM的方式与第一次相同；最后一次迭代采用正常顺序来访问存储器。The CN101504638 patent of Beijing Institute of Technology discloses a variable-point pipeline FFT processor, which is a variable-point pipeline FFT processor, including a first 1024-point variable FFT processing module, a twiddle factor processing module, and a second 1024 Point-variable FFT processing module, selection and control module. The above four modules and the intermediate data storage module outside the processor jointly complete the two-dimensional processing of the large-point FFT. Zhongxing Communications Co., Ltd. CN101847986A patent discloses a circuit and method for realizing FFT/IFFT transformation. The method is as follows: determine the number of iterations, the depth of the first and second RAMs, and the depth of the ROM memory; The first and second halves are respectively stored in the second and first RAMs; iterative butterfly operation is performed: in the first iteration, when reading the first and second RAMs, the inverted order is used to read, and even times of butterfly operation results are written In the first RAM, the results of the odd number of butterfly operations are written to the second RAM; in other iterations, the first and second RAMs are read using the normal bit order, and the way to write back to the RAM is the same as the first time; the last iteration uses the normal sequential access to memory.

上述方法主要存在以下问题：The above method mainly has the following problems:

(1)基数的单一性影响了实际应用中傅里叶变换点数的选择(1) The singleness of the base affects the selection of Fourier transform points in practical applications

现有技术中的快速傅里叶变换在基2、基4相对应的点数选择时，最多需要补足一倍的零以此来达到长度为2的幂次方的条件。这就使得在计算过程中增加一倍的存储空间和一倍以上的计算时间。In the fast Fourier transform in the prior art, when the number of points corresponding to base 2 and base 4 is selected, at most double zeros are required to meet the condition that the length is a power of 2. This doubles the storage space and more than doubles the computing time in the calculation process.

(2)基数的单一性影响了实际应用中蝶形计算单元并行度的选择(2) The singleness of the base affects the selection of the parallelism of the butterfly computing unit in practical applications

现有技术中，由于基数是2的幂次方，蝶形计算单元的并行度也要是2的幂次方才能达到计算效率最高。而在实际应用中硬件资源可能达不到这样的要求，这种情况下采用降低并行度满足设计要求则会显著降低系统计算的性能。In the prior art, since the base number is a power of 2, the parallelism of the butterfly computing unit must also be a power of 2 to achieve the highest calculation efficiency. However, in practical applications, hardware resources may not meet such requirements. In this case, reducing the degree of parallelism to meet the design requirements will significantly reduce the performance of system computing.

发明内容Contents of the invention

本发明的目的在于克服现有技术的不足，提供一种并行快速傅立叶变换处理器及处理方法，可以实现。The purpose of the present invention is to overcome the deficiencies of the prior art, to provide a parallel fast Fourier transform processor and processing method, which can be realized.

本发明的上述目的通过以下方案实现：Above-mentioned purpose of the present invention is achieved through the following scheme:

一种并行快速傅立叶变换处理方法，包括如下步骤：A parallel fast Fourier transform processing method, comprising the steps of:

(1)、对接收到的数据序列x(n)进行第一级数据分组，即将N个数据划分为v个一级数据块，每个所述一级数据块包括个数据；其中，第n₂个一级数据块中第n₁个数据为x′(n₁,n₂)＝x(n₁v+n₂)，n2＝0、1、…、v-1；N＝r^S，v＝r^Z，Z＝0、1、…或S-2，S和r为整数；(1), performing first-level data grouping on the received data sequence x(n), that is, dividing N data into v first-level data blocks, and each of the first-level data blocks includes data; wherein, the n _1th data in the n _2th primary data block is x′(n ₁ ,n ₂ )=x(n ₁ v+n ₂ ), n2=0, 1,..., v-1; N=r ^S , v=r ^Z , Z=0, 1,... or S-2, S and r are integers;

(2)、对步骤(1)划分的各一级数据块进行第二级数据分组，即将每个一级数据块划分为r个二级数据块，每个所述二级数据块包括个数据；其中，第n₂个一级数据块划分的第n'₂个二级数据块中的第n'₁个数据x″(n'₁,n'₂,n₂)＝x′(n'₁r+n'₂,n₂)，n'₂＝0、1、…、r-1，n₂＝0、1、…、v-1；(2), each first-level data block that step (1) is divided carries out second-level data grouping, is about to divide each first-level data block into r secondary data blocks, and each described second-level data block includes data; wherein, the n' _1th data x"(n' ₁ ,n' ₂ ,n ₂ ₎ =x' ₍ n' ₁ r+n' ₂ ,n ₂ ), n' ₂ =0, 1, ..., r-1, n ₂ =0, 1, ..., v-1;

(3)、对每个二级数据块中的个数据进行点FFT计算；其中第n₂个一级数据块划分的第n'₂个二级数据块中的个数据FFT计算结果为其中，n'₂＝0、1、…、r-1，n₂＝0、1、…、v-1， (3), for each secondary data block data Point FFT calculation; wherein the n'th ₂ secondary data blocks divided by n' ₂ primary data blocks The FFT calculation result of each data is in, n' ₂ =0, 1, ..., r-1, n ₂ =0, 1, ..., v-1,

(4)、将每个一级数据块中的r个二级数据块的FFT计算结果进行合并，得到每个一级数据块的FFT计算结果；其中，第n₂个一级数据块中的个数据的FFT计算结果为其中，n₂＝0、1、…、v-1， (4), the FFT calculation results of r secondary data blocks in each primary data block are merged to obtain the FFT calculation results of each primary data block; wherein, in the n ₂ primary data blocks The FFT calculation result of the data is in, n ₂ =0, 1, . . . , v-1,

(5)、将v个一级数据块的FFT计算结果进行合并，得到数据序列x(n)的N点FFT计算结果其中，_k＝0、1、…、N-1，W_N＝e^-j2π/N。(5) Merge the FFT calculation results of v first-level data blocks to obtain the N-point FFT calculation results of the data sequence x(n) Wherein, _k =0, 1, ..., N-1, W _N =e ^-j2π/N .

上述的并行快速傅立叶变换处理方法，在步骤(2)中，采用vr个点的双口存储器存放vr个二级数据块的数据；所述vr个双口存储器分为v个存储器组，对应于v个一级数据块；每个所述存储器组中包括r个双口存储器，对应于1个一级数据块中的r个二级数据块；其中采用一个S位r进制计数器得到序列x(n)中每个数据的存储地址，即确定序列x(n)中第n个数据保存在第Group_Id(n)个存储器组中的第Ram_Id(n)个存储器中的存储地址序号Add_ID(n)，n＝0、1、…、N-1，具体实现方法如下：The above-mentioned parallel fast Fourier transform processing method, in step (2), adopts vr The dual-port memory of point stores the data of vr secondary data blocks; the vr dual-port memory is divided into v memory groups, corresponding to v primary data blocks; each of the memory groups includes r dual-port Memory, corresponding to r secondary data blocks in one primary data block; wherein an S-bit r-ary counter is used to obtain the storage address of each data in the sequence x(n), that is, to determine the The nth data is stored in the storage address serial number Add_ID (n) in the Ram_Id (n) memory in the Group_Id (n) memory group, n=0, 1, ..., N-1, and the specific implementation method is as follows:

所述计数器的计数值n的r进制数值为(a_S-1a_S-2…a_Z+1a_Za_Z-1…a₁a₀)_r，其中a₀～a_S-1为所述r进制数值第1位～第S位的数字，所述数字的取值范围为0～r-1，则存储器组序号Group_Id(n)的r进制数值为(a_Z-1…a₁a₀)_r，存储器序号Ram_Id(n)的r进制数值为(a_Z)_r，存储地址序号Add_ID(n)的r进制数值为(a_S-1a_S-2…a_Z+1)_r。The r-ary value of the count value n of the counter is (a _S-1 a _S-2 ...a _Z+1 a _Z a _Z-1 ...a ₁ a ₀ ) _r , where a ₀ to a _S-1 are The r-ary value of the 1st to S-th digits of the r-ary value ranges from 0 to r-1, and the r-ary value of the memory group serial number Group_Id(n) is (a _Z-1 … a ₁ a ₀ ) _r , the r-ary value of the memory serial number Ram_Id(n) is (a _Z ) _r , the r-ary value of the storage address serial number Add_ID(n) is (a _S-1 a _S-2 …a _{Z +1} ) _r .

上述的并行快速傅立叶变换处理方法，在步骤(3)中，采用基r FFT计算得到每个二级数据块中的个数据的FFT结果；在步骤(4)中采用基r FFT计算对每个一级数据块中的r个二级数据块的FFT计算结果进行合并；在步骤(5)中采用基v FFT计算对v个一级数据块的FFT计算结果进行合并。In the above-mentioned parallel fast Fourier transform processing method, in step (3), the radix r FFT calculation is used to obtain the FFT results of data; in step (4), adopt radix r FFT calculation to merge the FFT calculation results of r secondary data blocks in each primary data block; use radix v FFT calculation in step (5) Merge the FFT calculation results of the v first-level data blocks.

上述的并行快速傅立叶变换处理方法，在步骤(3)中，采用v个基r蝶形单元对vr个二级数据块进行基r FFT计算，每个一级数据块中的r个二级数据块通过时分复用系统共用1个基r蝶形计算单元；所述时分复用系统包括r个串并转换模块、第一级选通控制单元、基r蝶形计算单元、第二级选通控制单元和r个并串转换模块，其中：In the above-mentioned parallel fast Fourier transform processing method, in step (3), v base r butterfly units are used to perform base r FFT calculations on vr secondary data blocks, and r secondary data blocks in each primary data block The block shares a base-r butterfly computing unit through a time-division multiplexing system; A control unit and r parallel-to-serial conversion modules, wherein:

r个串并转换模块：与一个一级数据块中的r个二级数据块一一对应；分别从r个二级数据块的存储器中读取数据，得到r路串行数据，其中每路串行数据包括个数据点；然后每个串并转换模块对相应的串行数据进行串并转换，将串行的个数据点转换为r路并行数据，每路并行数据包括个数据点；r serial-to-parallel conversion modules: one-to-one correspondence with r secondary data blocks in a primary data block; read data from the memories of r secondary data blocks respectively to obtain r channels of serial data, wherein each channel Serial data includes data points; then each serial-to-parallel conversion module performs serial-to-parallel conversion on the corresponding serial data, and converts the serial data points are converted into r parallel data, and each parallel data includes data points;

第一级选通控制单元：对r个串并转换模块输出的并行数据进行选通操作，每次选通其中1个串并转换模块输出的r路并行数据，然后将所述r路并行数据输出到基r蝶形计算单元；The first-level gating control unit: performs gating operation on the parallel data output by r serial-to-parallel conversion modules, selects the r-way parallel data output by one of the serial-to-parallel conversion modules each time, and then transfers the r-way parallel data output to the base r butterfly computing unit;

基r蝶形计算单元：接收r路并行数据进行基r FFT计算，输出r路并行计算结果到第二级通道选通控制单元；Base-r butterfly calculation unit: receive r-channel parallel data to perform base-r FFT calculation, and output r-channel parallel calculation results to the second-stage channel gating control unit;

第二级选通控制单元：在r个并串转换模块之间进行选通，将接收到的r路并行FFT计算结果输出到其中1个并串转换模块，选通的并串转换模块序号与第一级选通控制单元选通的串并转换模块序号一致；The second-level gating control unit: gating among r parallel-to-serial conversion modules, outputting the received r parallel FFT calculation results to one of the parallel-to-serial conversion modules, and the sequence number of the selected parallel-to-serial conversion module is the same as The serial numbers of the serial-to-parallel conversion modules gated by the first-level strobe control unit are the same;

r个并串转换模块：与一个一级数据块中的r个二级数据块一一对应；经第二级选通控制单元选通后的并串转换模块，接收r路并行FFT计算结果，进行并串变换1路串行数据，将所述串行数据保存在对应的二级数据块的存储器内，存储位置与串并转换模块读取数据的位置一致，即实现原位存储。r parallel-serial conversion modules: one-to-one correspondence with r secondary data blocks in a first-level data block; the parallel-serial conversion module gated by the second-level gating control unit receives r parallel FFT calculation results, Perform parallel-to-serial conversion of one channel of serial data, store the serial data in the memory of the corresponding secondary data block, and the storage location is consistent with the location where the serial-to-parallel conversion module reads the data, that is, realize in-situ storage.

本发明与现有技术相比，具有以下优点：Compared with the prior art, the present invention has the following advantages:

(1)、在现有技术中限定FFT变换的数据点数N为2的幂次方，而本发明采用基r快速傅立叶变换，进行FFT变换的数据点数N＝r^S，其中r和S为任意整数，因此本发明的数据点数N具有更多的取值，可以在这些取值中选择补零最少的方案，从而减少对存储空间和计算时间的占用；(1), in the prior art, the number of data points N that limits FFT transformation is the power of 2, and the present invention adopts base r fast Fourier transform, carries out the number of data points N=r ^S of FFT transformation, wherein r and S are arbitrary Integer, so the number of data points N of the present invention has more values, and the scheme with the least zero padding can be selected in these values, thereby reducing the occupation of storage space and computing time;

(2)、本发明采用多蝶形并行计算，且并行蝶形计算单元的个数v＝r^z，其中z＝0、1、…、S-2，r和S为任意整数，因此本发明可以根据硬件资源的配置来选择并行度，具有较大的灵活性；(2), the present invention adopts multi-butterfly parallel computing, and the number of parallel butterfly computing units v=r ^z , wherein z=0, 1, ..., S-2, r and S are arbitrary integers, so the present invention The degree of parallelism can be selected according to the configuration of hardware resources, which has great flexibility;

(3)、本发明在进行N＝r^S点FFT变换时，采用v＝r^z个并行蝶形计算单元进行计算，整个计算周期为r^S-z-1(S-z-r)+2r(S-z-1)。(3), the present invention uses v=r ^z parallel butterfly computing units to perform calculations when performing N=r ^S -point FFT transformation, and the entire calculation cycle is r ^Sz-1 (Szr)+2r(Sz-1).

附图说明Description of drawings

图1为本发明的并行快速傅立叶变换处理方法流程图；Fig. 1 is the flow chart of parallel fast Fourier transform processing method of the present invention;

图2为本发明无冲突访问地址产生示意图；Fig. 2 is a schematic diagram of generation of conflict-free access addresses in the present invention;

图3为基r蝶形计算单元对单个双口存储器内数据进行处理的时序图；Fig. 3 is the sequence diagram that base r butterfly computing unit processes data in a single dual-port memory;

图4为时分复用基r蝶形计算单元对多个双口存储器内数据进行处理的时序图；Fig. 4 is the time sequence diagram that time division multiplexing base r-butterfly computing unit processes data in a plurality of dual-port memories;

图5为本发明的基r蝶形计算单元时分复用系统的原理框图。FIG. 5 is a functional block diagram of the time-division multiplexing system of the base-r-butterfly computing unit of the present invention.

图6a为本发明中r个存储器时分复用1个基r蝶形计算单元进行r组点FFT计算时的数据读取和存储示意图；Fig. 6a shows that r memories are time-division multiplexed with 1 base r butterfly computing unit in the present invention to perform r groups Schematic diagram of data reading and storage during point FFT calculation;

图6b为本发明中r个存储器时分复用1个基r蝶形计算单元进行r组点FFT计算结果合并时的数据读取和存储示意图；Fig. 6b shows that r memories are time-division multiplexed with 1 base r butterfly computing unit in the present invention to perform r groups Schematic diagram of data reading and storage when point FFT calculation results are merged;

图6c为本发明中v个存储器组时分复用1个基v蝶形计算单元进行v组点FFT计算结果合并时的数据读取和存储示意图。Fig. 6c shows that in the present invention, v memory groups are time-division multiplexed with 1 basic v-butterfly computing unit to perform v groups Schematic diagram of data reading and storage when point FFT calculation results are merged.

具体实施方式Detailed ways

下面结合附图和具体实例对本发明作进一步详细的描述：Below in conjunction with accompanying drawing and specific example the present invention is described in further detail:

(一)、并行FFT计算的理论推导(1) Theoretical derivation of parallel FFT calculation

1.1N点FFT的分解过程1.1 Decomposition process of N-point FFT

对接收数据序列x(n)进行N点FFT计算得到X(k)，计算公式如下所示：Perform N-point FFT calculation on the received data sequence x(n) to obtain X(k). The calculation formula is as follows:

其中，W_N＝e^-j2π/N，k＝0、1、…、N-1，n＝0、1、…、N-1。Wherein, W _N =e ^-j2π/N , k=0, 1, ..., N-1, n = 0, 1, ..., N-1.

定义v为能被N整除的正整数，则n和k可分解为：Define v as a positive integer divisible by N, then n and k can be decomposed into:

其中，n₂＝0、1、…、v-1，k₂＝0、1、…、v-1。式(2)将在[0,N-1]区间内的一维向量n、k映射到[0,v-1]×[0,(N/v)-1]的二维向量(n₁,n₂)和(k₁,k₂)，则式(1)可重写为：in, n ₂ =0, 1, . . . , v-1, k ₂ =0, 1, . . . , v-1. Equation (2) maps the one-dimensional vector n and k in the interval [0, N-1] to the two-dimensional vector (n ₁ ,n ₂ ) and (k ₁ ,k ₂ ), then formula (1) can be rewritten as:

其中，X'(k₁,n₂)＝FFT[x(n₁,n₂),(N/v)]。Wherein, X'(k ₁ ,n ₂ )=FFT[x(n ₁ ,n ₂ ),(N/v)].

式(3)表明一个N点FFT可以分解为v点FFT和N/v点FFT。同样，N/v点FFT可以进一步分解为更小点数的FFT，定义r为能被N整除的正整数，n₁和k₁可分解为：Equation (3) shows that an N-point FFT can be decomposed into v-point FFT and N/v-point FFT. Similarly, N/v point FFT can be further decomposed into FFT with smaller points, defining r as a positive integer divisible by N, n ₁ and k ₁ can be decomposed into:

其中，n'₂＝0、1、…、r-1，k'₂＝0、1、…、r-1。式(4)将在区间内的一维向量n₁和k₁映射为[0,r-1]×[0,N/(vr)-1]的二维向量。in, n' ₂ =0, 1, ..., r-1, k' ₂ =0, 1, . . . , r−1. Equation (4) will be in The one-dimensional vectors n ₁ and k ₁ in the interval are mapped to [0,r-1]×[0,N/(vr)-1] two-dimensional vectors.

按照式(4)的分解公式，将X'(k₁,n₂)整理如下：According to the decomposition formula of formula (4), arrange X'(k ₁ ,n ₂ ) as follows:

其中，X”(k'₁,n'₂,n₂)＝FFT[x(n'₁,n'₂,n₂),N/(vr)]。Wherein, X"(k' ₁ ,n' ₂ ,n ₂ )=FFT[x(n' ₁ ,n' ₂ ,n ₂ ),N/(vr)].

1.2N点FFT的并行计算过程1. Parallel calculation process of 2N-point FFT

根据以上推导的N点FFT的分解过程，本发明采用如图1所示的并行快速傅立叶变换处理方法流程，对输入数据序列x(n)进行N点FFT计算得到X(k)：According to the decomposition process of the N-point FFT deduced above, the present invention adopts the parallel fast Fourier transform processing method flow process as shown in Figure 1 to carry out N-point FFT calculation to the input data sequence x(n) to obtain X(k):

(b1)、对接收到的数据序列x(n)进行第一级数据分组，即将N个数据划分为v个一级数据块，每个一级数据块包括个数据；其中，第n₂个一级数据块中第n₁个数据x′(n₁,n₂)＝x(n₁v+n₂)，n₂＝0、1、…、v-1；N＝r^S，v＝r^Z，Z＝0、1、…、S-2，S和r为整数；(b1), performing first-level data grouping on the received data sequence x(n), that is, dividing N data into v first-level data blocks, and each first-level data block includes data; wherein, the n _1th data x′(n ₁ ,n ₂ )=x(n ₁ v+n ₂ ) in the n _2th primary data block, n ₂ =0, 1, ..., v-1; N = r ^S , v = r ^Z , Z = 0, 1, ..., S-2, S and r are integers;

(b2)、对步骤(b1)划分的各一级数据进行第二级数据分组，即将每个一级数据块划分为r个二级数据块，每个二级数据块包括个数据；其中，第n₂个一级数据块划分的第n'₂个二级数据块中的第n'₁个数据x″(n'₁,n'₂,n₂)＝x′(n'₁r+n'₂,n₂)，n'₂＝0、1、…、r-1，n₂＝0、1、…、v-1；(b2), carry out second-level data grouping to each first-level data that step (b1) divides, be about to divide each first-level data block into r second-level data blocks, and each second-level data block includes data; wherein, the n' _1th data x"(n' ₁ ,n' ₂ ,n ₂ ₎ =x' ₍ n' ₁ r+n' ₂ ,n ₂ ), n' ₂ =0, 1, ..., r-1, n ₂ =0, 1, ..., v-1;

(b3)、对每个二级数据块中的个数据进行点FFT计算；其中第n₂个一级数据块划分的第n'₂个二级数据块中的个数据FFT计算结果为其中，n'₂＝0、1、…、r-1，n₂＝0、1、…、v-1， (b3), for each secondary data block data Point FFT calculation; wherein the n'th ₂ secondary data blocks divided by n' ₂ primary data blocks The FFT calculation result of each data is in, n' ₂ =0, 1, ..., r-1, n ₂ =0, 1, ..., v-1,

(b4)、将每个一级数据块中的r个二级数据块的FFT计算结果进行合并，得到每个一级数据块的FFT计算结果；其中，第n₂个一级数据块中的个数据的FFT计算结果为其中，n₂＝0、1、…、v-1， (b4), the FFT calculation results of r secondary data blocks in each primary data block are merged to obtain the FFT calculation results of each primary data block; wherein, the n ₂ primary data blocks The FFT calculation result of the data is in, n ₂ =0, 1, . . . , v-1,

(b5)、将v个一级数据块的FFT计算结果进行合并，得到数据序列x(n)的N点FFT计算结果其中，_k＝0、1、…、N-1，W_N＝e^-j2π/N。(b5), merge the FFT calculation results of v first-level data blocks to obtain the N-point FFT calculation results of the data sequence x(n) Wherein, _k =0, 1, ..., N-1, W _N =e ^-j2π/N .

在以上处理过程中，步骤(b3)采用基r FFT计算得到每个二级数据块中的个数据的FFT结果。在步骤(b4)～(b5)中，由X”(k'₁,n'₂,n₂)得到X'(k₁,n₂)、再由X'(k₁,n₂)得到X(k)，这包括一级v个点基r FFT和一级N点基v FFT，也就是说在步骤(b4)中采用基r FFT计算对每个一级数据块中的r个二级数据块的FFT计算结果进行合并，在步骤(b5)中采用基v FFT计算对v个一级数据块的FFT计算结果进行合并。In the above process, step (b3) uses radix r FFT calculation to obtain the FFT results of the data. In steps (b4)~(b5), X'(k 1 ,n 2 ) is obtained from X"(k' ₁ ,n' ₂ ,n ₂ ), and X'(k _{1 ,n 2 ) is obtained from X'(k 1} _, _n ₂ ) (k), which includes a level of v Point-based r FFT and one-level N-point basis v FFT, that is to say, in step (b4), the basis r FFT calculation is used to combine the FFT calculation results of r secondary data blocks in each primary data block, and in In step (b5), basal v FFT calculations are used to combine the FFT calculation results of the v primary data blocks.

(二)、并行FFT计算结构设计(2) Parallel FFT calculation structure design

按照以上所述的步骤(b1)和(b2)，将点数为N＝r^S(S为整数)的数据序列x(n)划分为vr个二级数据块，然后可以采用基r FFT计算得到每个二级数据块中的个数据的FFT结果。由于以上所述的vr个二级数据块之间彼此独立，因此各二级数据块进行并行FFT计算，从而提高序列x(n)的N＝r^S点FFT计算速度。而且本发明提出了一种基r蝶形单元时分复用方法，可以实现1个一级数据块内的r个二级数据块时分复用1个基r蝶形单元，实现r个二级数据块的FFT计算，因此整个系统仅采用v个基r蝶形单元就可以实现vr个二级数据块的FFT并行计算，然后再通过一级基r和一级基v FFT即可得到N＝r^S点数据的FFT计算结果。According to the steps (b1) and (b2) described above, the data sequence x(n) whose number of points is N=r ^S (S is an integer) is divided into vr secondary data blocks, which can then be calculated by using radix r FFT in each secondary data block FFT results of the data. Since the aforementioned vr secondary data blocks are independent of each other, parallel FFT calculations are performed on each secondary data block, thereby improving the N=r ^S -point FFT calculation speed of the sequence x(n). And the present invention proposes a kind of basic r butterfly unit time division multiplexing method, can realize the time division multiplexing of r secondary data blocks in 1 primary data block with 1 basic r butterfly unit, realize r secondary data Therefore, the entire system can realize the FFT parallel calculation of vr secondary data blocks by using only v base r butterfly units, and then through the first-level base r and the first-level base v FFT, N=r can be obtained The FFT calculation result of point ^S data.

2.1、输入数据存储分配2.1. Input data storage allocation

本发明采用vr个点的双口存储器存放步骤(b1)～(b2)划分的vr个二级数据块。其中本发明将该vr个双口存储器分为v个存储器组，这v个存储器组对应于v个一级数据块；每个存储器组中包括r个双口存储器，这r个双口存储器对应于1个一级数据块中的r个二级数据块。The present invention adopts vr The point's dual-port memory stores the vr secondary data blocks divided by steps (b1)-(b2). Wherein, the present invention divides the vr dual-port memories into v memory groups, and these v memory groups correspond to v first-level data blocks; each memory group includes r dual-port memories, and these r dual-port memories correspond to r secondary data blocks in one primary data block.

本发明采用一个S位r进制计数器得到序列x(n)中每个数据的存储地址，即确定序列中第n个数据x(n)保存在第Group_Id(n)个存储器组中的第Ram_Id(n)个存储器中的存储地址序号Add_ID(n)，n＝0、1、…、N-1，具体实现方法如下：The present invention adopts an S-bit r-ary counter to obtain the storage address of each data in the sequence x(n), that is, to determine that the nth data x(n) in the sequence is stored in the Ram_Id in the Group_Id(n) memory group (n) storage address sequence number Add_ID(n) in the memory storage, n=0, 1, ..., N-1, the specific implementation method is as follows:

该计数器的计数值n的r进制数值为(a_S-1a_S-2…a_Z+1a_Za_Z-1…a₁a₀)_r，其中a₀～a_S-1为所述r进制数值第1位～第S位的数字，所述数字的取值范围为0～r-1，则:存储器组序号Group_Id(n)的_r进制数值为(a_Z-1…a₁a₀)_r，存储器序号Ram_Id(n)的r进制数值为(a_Z)_r，存储地址序号Add_ID(n)的r进制数值为(a_S-1a_S-2…a_Z+1)_r。The r-ary value of the count value n of the counter is (a _S-1 a _S-2 ...a _Z+1 a _Z a _Z-1 ...a ₁ a ₀ ) _r , where a ₀ ~ a _S-1 are all The 1st to the Sth digits of the r-ary system value, the value range of the number is 0 to r-1, then: the _r -ary value of the memory group serial number Group_Id(n) is (a _Z-1 ... a ₁ a ₀ ) _r , the r-ary value of the memory serial number Ram_Id(n) is (a _Z ) _r , the r-ary value of the storage address serial number Add_ID(n) is (a _S-1 a _S-2 …a _{Z +1} ) _r .

2.2、r个二级数据块的点FFT并行计算2.2, r secondary data blocks Point FFT Parallel Computing

根据上述的输入数据存储分配方法，每个二级数据块独立存放于1个存储器中，每个存储器中的点数据进行独立的FFT计算。以下给出并行FFT计算时，各存储器中数据访问地址的产生方法，以及r个二级数据块时分复用一个基r蝶形单元的方法。According to the above-mentioned input data storage allocation method, each secondary data block is independently stored in a memory, and the Independent FFT calculations are performed on the point data. The method for generating data access addresses in each memory and the method for time-division multiplexing r secondary data blocks into one basic r butterfly unit are given below during parallel FFT calculation.

2.2.1、每个存储器的点数据访问地址产生方法2.2.1, each memory Point data access address generation method

每个存储器中的点数据进行独立的FFT计算，这点数据序列x(n')的FFT定义如下：in each memory point data for independent FFT calculations, which The FFT of a point data sequence x(n') is defined as follows:

以下将时域地址标识n'和频域地址标识k'用M位r进制数表示：The time-domain address identifier n' and the frequency-domain address identifier k' are represented by M-bit r-ary numbers as follows:

其中，M＝S-z-1。根据式(7)对式(6)进行r进制改写，则：Wherein, M=S-z-1. Rewrite formula (6) in r-ary system according to formula (7), then:

从式(8)可以看出，N'点FFT计算可以分解为M次迭带计算，其中，第m次迭代公式为：It can be seen from formula (8) that the N'-point FFT calculation can be decomposed into M-time iterative calculations, where the m-th iteration formula is:

其中，m＝1,2,…,M，旋转因子的相位项从式(9)可以看出，在第m次迭代过程中，数据访问地址分为三个部分：Among them, m=1,2,...,M, the phase term of the rotation factor It can be seen from formula (9) that during the mth iteration, the data access address is divided into three parts:

其中，Addr(m,1)＝k_m-1， in, Addr(m,1)=k _m-1 ,

定义旋转因子各不相同的多个连续蝶形单元最大集合为一个蝶形集合，通过式(10)可知Addr(m,0)表示第m级中不重叠的蝶形集合的标识，Addr(m,1)表示每个蝶形单元中操作数的标识，Addr(m,2)则表示每个蝶形集合中所包含的蝶形单元的标识。而对于第m级的数据访问地址可以看成是三者的移位相加组合，因此本发明采用一个M位的r进制计数器C＝(c_M-1c_M-2…c₁c₀)_r，通过对C的截取来生成数据访问地址。Define the maximum set of multiple continuous butterfly units with different rotation factors as a butterfly set. From formula (10), it can be known that Addr(m,0) represents the identity of the non-overlapping butterfly set in the mth level, and Addr(m ,1) represents the identity of the operand in each butterfly unit, and Addr(m,2) represents the identity of the butterfly unit contained in each butterfly set. And for the data access address of the mth level, it can be regarded as the shift-add combination of the three, so the present invention adopts an M-bit r-ary counter C=(c _M-1 c _M-2 ... c ₁ c ₀ ) _r , generate data access address by intercepting C.

在对每个存储器中的点数据进行FFT计算时，需要对蝶形计算单元内的蝶形集合、蝶形集合内的蝶形单元以及蝶形单元内的r个操作数分别进行顺序处理。这样可以用(c_M-1c_M-2…c_M-m+2c_M-m+1)_r映射第m级中不重叠的蝶形集合的标识Addr(m,0)，(c₀)_r来映射每个蝶形单元中操作数的标识Addr(m,1)，(c_M-m-1c_M-m-2…c₁)_r映射每个蝶形集合中所包含的蝶形单元的标识Addr(m,2)，则每一级的访问地址和计数值C的关系如式(11)所示，第m级访问地址AcesAddr(m)可以表示如下：in each memory When performing FFT calculation on point data, it is necessary to sequentially process the butterfly set in the butterfly computing unit, the butterfly unit in the butterfly set, and the r operands in the butterfly unit respectively. In this way, (c _M-1 c _M-2 ... c _M-m+2 c _M-m+1 ) _r maps the identity Addr(m,0) of the non-overlapping butterfly set in the mth level, (c ₀ ) _r to map the identity of the operand in each butterfly unit Addr(m,1), (c _Mm-1 c _Mm-2 ...c ₁ ) _r maps the identity of the butterfly unit contained in each butterfly set Addr(m, 2), the relationship between the access address of each level and the count value C is shown in formula (11), and the access address of the mth level AcesAddr(m) can be expressed as follows:

每个存储器中各级FFT计算的数据访问地址生成过程如图2所示。The data access address generation process of each level of FFT calculation in each memory is shown in Figure 2.

2.2.2、基r蝶形计算单元计算过程2.2.2. Calculation process of base-r butterfly computing unit

对单个双口存储器进行N'点FFT计算，基r蝶形计算单元需要完成两部分操作：一是与旋转因子相乘的过程；二是与r点DFT矩阵相乘。前者可以通过从存储器顺序读取数据后经流水复数乘法器完成，后者则可通过寄存器延时来对即时顺序取出的r个操作数并行输入蝶形单元。To perform N'-point FFT calculation on a single dual-port memory, the basic r-butterfly calculation unit needs to complete two operations: one is the process of multiplying the twiddle factor; the other is multiplying the r-point DFT matrix. The former can be completed by sequentially reading data from the memory and passing through the pipeline complex multiplier, and the latter can input the r operands fetched in real-time order in parallel to the butterfly unit through register delay.

整个基r蝶形计算单元的计算过程如下所述：The calculation process of the whole basic r-butterfly computing unit is as follows:

(A)、按照上一节推导的数据访问地址由存储器的读总线Data_load取出数据，每r个数据代表蝶形单元的r个操作数据，依次标记为Op(i)，取出数据的同时和旋转因子相乘，结果标记为Op'(i)，其中i＝0,1,…,r-1；(A), According to the data access address deduced in the previous section, the data is fetched from the read bus Data _load of the memory. Each r data represents the r operation data of the butterfly unit, which are marked as Op(i) in turn, and the data is fetched at the same time as The twiddle factors are multiplied, and the result is marked as Op'(i), where i=0,1,...,r-1;

(B)、记蝶形单元的r个输入寄存器为Bf_reg_in(i)，输出寄存器为Bf_reg_out(i)，每从总线上取出r个数据后，令Bf_reg_in(i)＝Op'(i)，并进行蝶形计算。蝶形单元采用并行流水计算方式，每次计算完成蝶形单元有r-1个空闲时间，计算结果表示为Op”(i)，令Bf_reg_out(i)＝Op”(i)；(B), the r input registers of the butterfly unit are Bf_reg _in (i), and the output registers are Bf_reg _out (i). After each r data is taken out from the bus, Bf_reg _in (i)=Op'(i ), and carry out the butterfly calculation. The butterfly unit adopts a parallel pipeline calculation method, and the butterfly unit has r-1 idle time for each calculation, and the calculation result is expressed as Op”(i), so that Bf_reg _out (i)=Op”(i);

(C)、对Bf_reg_out(i)中的r个数据依次赋给存储器的写总线Data_store上，完成本次蝶形单元的计算过程。(C), sequentially assign r data in Bf_reg _out (i) to the write bus Data _store of the memory, and complete the calculation process of the butterfly unit this time.

上述过程的时序图如图3所示，从中可以看出在单个基r FFT计算过程中，蝶形单元每r个时钟周期都会有r-1个处于空闲状态。The timing diagram of the above process is shown in Figure 3, from which it can be seen that during the calculation process of a single radix r FFT, r-1 butterfly units will be in an idle state every r clock cycles.

2.2.3、时分复用基r蝶形计算单元完成二级数据块FFT计算2.2.3. The time-division multiplexing base r-butterfly calculation unit completes the FFT calculation of the secondary data block

在N点FFT计算的步骤(b3)中，需要对vr个二级数据块进行基r FFT计算，如果每个二级数据块都采用1个基r蝶形单元进行FFT计算，则vr个二级数据块需要vr个基r蝶形单元实现并行计算。但是由于基r蝶形单元采用并行流水计算方式，基r蝶形单元对单个数据块进行计算时，每r个时钟周期会有r-1个处于空闲状态，因此可以利用一个蝶形单元对r个N'点数据进行FFT。此时，只需要控制各N'点数据取数的起始时间即可完成蝶形单元的时分复用。将第f(f＝0,1,…,r-1)个N'点数据的r个操作数标记为Op_f(i)(i＝0,1,…,r-1)，将存储器总线读取和旋转因子相乘之后的结果记为Op_f'(i)，依次输入蝶形单元这r个N'点数据的操作数据，蝶形计算结果记为Op_f”(i)，并根据f的值存入相应的存储器，采用该时分复用方法后的处理时序如图4所示。In the step (b3) of N-point FFT calculation, it is necessary to perform radix-r FFT calculation on vr secondary data blocks. If each secondary data block uses 1 radix-r butterfly unit for FFT calculation, then vr Level data blocks need vr basic r butterfly units to realize parallel computation. However, since the base-r butterfly unit adopts the parallel pipeline computing method, when the base-r butterfly unit calculates a single data block, there will be r-1 in the idle state every r clock cycles, so one butterfly unit can be used for r Perform FFT on N' point data. At this time, the time-division multiplexing of the butterfly unit can be completed only by controlling the start time of each N'-point data fetching. Mark the r operands of the f(f=0,1,...,r-1)th N' point data as Op _f (i)(i=0,1,...,r-1), the memory bus The result after reading and multiplied by the twiddle factor is recorded as Op _f '(i), and the operation data of the r N' point data of the butterfly unit is input in turn, and the calculation result of the butterfly is recorded as Op _f ”(i), and according to The value of f is stored in the corresponding memory, and the processing sequence after adopting the time division multiplexing method is shown in Fig. 4 .

因此，本发明为了提高系统的资源利用率，采用v个基r蝶形单元对vr个二级数据块进行基r FFT计算，其中每个一级数据块中r个二级数据块通过时分复用系统共用1个基r蝶形计算单元。如图5所示，本发明采用的基r蝶形单元时分复用系统包括r个串并转换模块、第一级选通控制单元、基r蝶形计算单元、第二级选通控制单元和r个并串转换模块，其中：Therefore, in order to improve the resource utilization rate of the system, the present invention uses v radix-r butterfly units to perform radix-r FFT calculations on vr secondary data blocks, wherein r secondary data blocks in each primary data block are time-division multiplexed Use the system to share a base r butterfly computing unit. As shown in Figure 5, the base-r butterfly unit time-division multiplexing system adopted in the present invention includes r serial-to-parallel conversion modules, a first-level gating control unit, a base-r-butterfly calculation unit, a second-level gating control unit, and r parallel-to-serial conversion modules, wherein:

r个串并转换模块与一个一级数据块中的r个二级数据块一一对应。这r个串并转换模块分别从r个二级数据块的存储器中读取数据，得到r路串行数据，其中每路串行数据包括个数据点；然后每个串并转换模块对相应的串行数据进行串并转换，将串行的个数据点转换为r路并行数据，每路并行数据包括个数据点。The r serial-to-parallel conversion modules are in one-to-one correspondence with the r secondary data blocks in a primary data block. The r serial-to-parallel conversion modules respectively read data from the memories of the r secondary data blocks to obtain r channels of serial data, wherein each channel of serial data includes data points; then each serial-to-parallel conversion module performs serial-to-parallel conversion on the corresponding serial data, and converts the serial data points are converted into r parallel data, and each parallel data includes data points.

第一级选通控制单元对r个串并转换模块输出的并行数据进行选通操作，每次选通其中1个串并转换模块输出的r路并行数据，然后将所述r路并行数据输出到基r蝶形计算单元。The first-level gating control unit performs a gating operation on the parallel data output by r serial-to-parallel conversion modules, selects the r-way parallel data output by one of the serial-to-parallel conversion modules each time, and then outputs the r-way parallel data to the base r butterfly computing unit.

基r蝶形计算单元接收经第一级选通控制单元选通的r路并行数据，进行基r FFT计算，输出r路并行计算结果到第二级通道选通控制单元。The radix-r butterfly calculation unit receives the r-channel parallel data gated by the first-stage gating control unit, performs radix-r FFT calculation, and outputs the r-channel parallel calculation results to the second-stage channel gating control unit.

第二级选通控制单元在r个并串转换模块之间进行选通，将基r蝶形计算单元输出的r路并行FFT计算结果输出到其中1个并串转换模块，其中，选通的并串转换模块序号与第一级选通控制单元选通的串并转换模块序号一致。The second-level gating control unit gates among the r parallel-to-serial conversion modules, and outputs the r-way parallel FFT calculation results output by the base r butterfly calculation unit to one of the parallel-to-serial conversion modules, wherein the gating The serial number of the parallel-to-serial conversion module is consistent with the serial number of the serial-to-parallel conversion module gated by the first-level gate control unit.

r个并串转换模块与一个一级数据块中的r个二级数据块一一对应。经第二级选通控制单元选通后的并串转换模块，接收r路并行FFT计算结果，进行并串变换1路串行数据，将所述串行数据保存在对应的二级数据块的存储器内，存储位置与串并转换模块读取数据的位置一致，即实现原位存储。The r parallel-to-serial conversion modules are in one-to-one correspondence with the r secondary data blocks in one primary data block. The parallel-to-serial conversion module gated by the second-level strobe control unit receives r parallel FFT calculation results, performs parallel-to-serial conversion of 1-way serial data, and stores the serial data in the corresponding secondary data block In the memory, the storage location is consistent with the location where the data is read by the serial-to-parallel conversion module, that is, in-situ storage is realized.

以上时分复用计算过程中，各存储器中的数据读取和存储方式如图6a所示。During the above time-division multiplexing calculation process, the data reading and storage methods in each memory are shown in FIG. 6a.

2.2.4、时分复用基r蝶形计算单元完成FFT计算结果合并2.2.4. The time-division multiplexing base r-butterfly calculation unit completes the merging of FFT calculation results

在步骤(b4)～(b5)中，由X”(k'₁,n'₂,n₂)得到X'(k₁,n₂)、再由X'(k₁,n₂)得到X(k)，这包括一级v个点基r FFT和一级N点基v FFT，也就是说在步骤(b4)中采用基r FFT计算对每个一级数据块中的r个二级数据块的FFT计算结果进行合并，在步骤(b5)中采用基vFFT计算对v个一级数据块的FFT计算结果进行合并。为了提高计算效率，在进行以上FFT计算结果合并时，同样可以采用如图5所示的时分复用系统。In steps (b4)~(b5), X'(k 1 ,n 2 ) is obtained from X"(k' ₁ ,n' ₂ ,n ₂ ), and X'(k _{1 ,n 2 ) is obtained from X'(k 1} _, _n ₂ ) (k), which includes a level of v Point-based r FFT and one-level N-point basis v FFT, that is to say, in step (b4), the basis r FFT calculation is used to combine the FFT calculation results of r secondary data blocks in each primary data block, and in In step (b5), the basic vFFT calculation is used to combine the FFT calculation results of the v primary data blocks. In order to improve calculation efficiency, when combining the above FFT calculation results, a time division multiplexing system as shown in FIG. 5 may also be used.

其中，在步骤(b3)对每个二级数据块中的个数据进行点FFT计算时，存储器组的处理过程如图6a所示；在步骤(b4)中采用基r FFT计算对每个一级数据块中的r个二级数据块的FFT计算结果进行合并时，存储器组的处理过程如图6b所示；在步骤(b5)中采用基rFFT计算对每个一级数据块中的r个二级数据块的FFT计算结果进行合并时，存储器组的处理过程如图6c所示。Among them, in step (b3) for each secondary data block data During point FFT calculation, the processing process of the memory bank is as shown in Figure 6a; in step (b4), when the FFT calculation results of r secondary data blocks in each primary data block are merged by using radix r FFT calculation, The processing process of the memory group is shown in Figure 6b; when the FFT calculation results of r secondary data blocks in each primary data block are combined by using the basic rFFT calculation in step (b5), the processing process of the memory group is as follows Figure 6c shows.

实施例：Example:

在本实施例中，采用4个基2蝶形单元实现1024点数据的FFT计算，即：N＝1024，r＝2，v＝4，S＝10，Z＝2。In this embodiment, four radix-2 butterfly units are used to realize the FFT calculation of 1024 point data, namely: N=1024, r=2, v=4, S=10, Z=2.

按照本发明的处理方法，计算步骤如下：According to the processing method of the present invention, the calculation steps are as follows:

(1)、输入数据存储分配(1), input data storage allocation

在本实施例中，采用8个128点的双口存储器进行输入数据存储。其中，将这8个存储器划分为4个存储器组，每个存储器组包括2个存储器，每个存储器具有128个存储位置。In this embodiment, eight 128-point dual-port memories are used for input data storage. Wherein, the 8 memories are divided into 4 memory groups, each memory group includes 2 memories, and each memory has 128 storage locations.

首先对输入的1024点数据进行顺序编址，则数据x(n)的地址标识n的2进制表示为(n₉n₈…n₁n₀)₂。其中，第n个数据保存在第Group_Id(n)＝(n₁n₀)个存储器组中的第Ram_Id(n)＝(n₂)个存储器中，在该存储器中的存储地址为(n₉n₈…n₄n₃)₂。Firstly, sequentially address the input 1024-point data, then the binary representation of the address identifier n of the data x(n) is (n ₉ n ₈ …n ₁ n ₀ ) ₂ . Wherein, the nth data is stored in the Ram_Id(n)=(n ₂ )th memory in the Group_Id(n)=(n ₁ n ₀ )th memory group, and the storage address in the memory is (n ₉ n ₈ ... n ₄ n ₃ ) ₂ .

按照如上的存储分配方法，1024点数据在8个存储器中的存储情况如表1所示。According to the storage allocation method above, the storage situation of 1024 points of data in 8 memories is shown in Table 1.

表1输入数据在各存储器中的Table 1 Input data in each memory

(2)、无冲突地址访问实现8个128点并行FFT计算(2) Conflict-free address access to realize 8 128-point parallel FFT calculations

当数据存入相应存储器之后，开始对8个存储器内的数据同时进行128点基2FFT处理。对于128点基2FFT，在7级FFT计算过程中，通过对7位2进制计数器C＝(c₆c₅…c₁c₀)₂的截取生成数据访问地址。各级FFT计算的数据访问地址如表2所示。After the data is stored in the corresponding memory, the 128-point radix 2FFT processing is started on the data in the 8 memories at the same time. For 128-point base 2FFT, in the 7-level FFT calculation process, the data access address is generated by intercepting the 7-bit binary counter C=(c ₆ c ₅ ...c ₁ c ₀ ) ₂ . Table 2 shows the data access addresses for FFT calculations at all levels.

采用如图5所示的时分复用系统，每2个存储器的数据共用1个基2蝶形计算单元，从而用4个基2蝶形计算单元完成8个128点并行FFT计算，整个计算时间需要924个时钟周期。Using the time-division multiplexing system shown in Figure 5, the data of every two memories shares one radix-2 butterfly computing unit, so that 8 128-point parallel FFT calculations are completed with four radix-2 butterfly computing units, and the entire calculation time It takes 924 clock cycles.

表2每个存储器在各级FFT计算中的数据访问地址Table 2 The data access address of each memory in FFT calculation at all levels

(3)、第一级结果合成(3), first-level result synthesis

根据式(5)对每个存储器组分别进行一级256点基2FFT计算，由于有4个基2蝶形单元，且蝶形的2个操作数据分别存在不同的2个存储器当中，这样同一时刻能够从8个存储器中同时访问8个数据，仅用128个时钟周期就可完成该计算。According to the formula (5), each memory group is calculated with a 256-point radix-2 FFT. Since there are 4 radix-2 butterfly units, and the two operation data of the butterfly are stored in two different memories, so at the same time Can access 8 data simultaneously from 8 memorizers, can finish this calculation with only 128 clock cycles.

(4)、第二级结果合成(4), second-level result synthesis

对4组256点数据按照公式(3)采用一级基4FFT计算从而得到1024点的FFT结果。此时蝶形单元的操作数据分别属于不同的存储器组，因此可以对4个操作数进行并行访问。因此，完成该计算只需要256个时钟周期。Four groups of 256-point data are calculated according to the formula (3) using the first-order basis 4FFT to obtain the FFT result of 1024 points. At this time, the operation data of the butterfly unit belong to different memory banks, so the four operands can be accessed in parallel. Therefore, only 256 clock cycles are required to complete the calculation.

采用如上处理步骤，并行计算1024点基2FFT共需要1308个时钟周期，相对于与现有技术，可以有效降低计算时间损耗。Using the above processing steps, a total of 1308 clock cycles are required for parallel calculation of 1024-point base 2FFT, which can effectively reduce calculation time consumption compared with the prior art.

以上所述，仅为本发明一个具体实施方式，但本发明的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本发明揭露的技术范围内，可轻易想到的变化或替换，都应涵盖在本发明的保护范围之内。The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person skilled in the art can easily think of changes or substitutions within the technical scope disclosed in the present invention. All should be covered within the protection scope of the present invention.

本发明说明书中未作详细描述的内容属于本领域专业技术人员的公知技术。The content that is not described in detail in the specification of the present invention belongs to the well-known technology of those skilled in the art.

Claims

1. A parallel fast Fourier transform processing method, characterized in that comprising the steps:

(1), carry out first-level data grouping to the received data sequence x(n), n=0, 1, ..., N-1, be about to divide N data into v first-level data blocks, each described Level 1 data blocks include data; wherein, the n _1th data in the n _2th primary data block is x′(n ₁ ,n ₂ )=x(n ₂ v+n ₁ ), n ₂ =0, 1, ..., v-1; N = r ^S , v = r ^Z , Z = 0, 1, ..., S-2, S and r are integers;

(2), each first-level data block that step (1) is divided carries out second-level data grouping, is about to divide each first-level data block into r secondary data blocks, and each described second-level data block includes data; wherein, the n' _1th data x"(n' ₁ ,n' ₂ ,n ₂ ₎ =x' ₍ n' ₂ r+n' ₁ ,n ₂ ), n' ₂ =0, 1, ..., r-1, n ₂ =0, 1, ..., v-1;

(3), for each secondary data block data Point FFT calculation; wherein the n'th ₂ secondary data blocks divided by n' ₂ primary data blocks The FFT calculation result of each data is in, n' ₂ =0, 1, ..., r-1, n ₂ =0, 1, ..., v-1, Using v base-r butterfly units to perform base-r FFT calculations on vr secondary data blocks, r secondary data blocks in each first-level data block share one base-r butterfly calculation unit through a time-division multiplexing system; The time-division multiplexing system includes r serial-to-parallel conversion modules, a first-level gating control unit, a base r butterfly calculation unit, a second-level gating control unit, and r parallel-to-serial conversion modules, wherein:

r serial-to-parallel conversion modules: one-to-one correspondence with r secondary data blocks in a primary data block; read data from the memories of r secondary data blocks respectively to obtain r channels of serial data, wherein each channel Serial data includes data points; then each serial-to-parallel conversion module performs serial-to-parallel conversion on the corresponding serial data, and converts the serial data points are converted into r parallel data, and each parallel data includes data points;

The first-level gating control unit: performs gating operation on the parallel data output by r serial-to-parallel conversion modules, selects the r-way parallel data output by one of the serial-to-parallel conversion modules each time, and then transfers the r-way parallel data output to the base r butterfly computing unit;

Base-r butterfly calculation unit: receive r-channel parallel data to perform base-r FFT calculation, and output r-channel parallel calculation results to the second-stage channel gating control unit;

The second-level gating control unit: gating among r parallel-to-serial conversion modules, outputting the received r parallel FFT calculation results to one of the parallel-to-serial conversion modules, and the sequence number of the selected parallel-to-serial conversion module is the same as The serial numbers of the serial-to-parallel conversion modules gated by the first-level strobe control unit are the same;

r parallel-serial conversion modules: one-to-one correspondence with r secondary data blocks in a first-level data block; the parallel-serial conversion module gated by the second-level gating control unit receives r parallel FFT calculation results, Perform parallel-to-serial conversion of 1-way serial data, store the serial data in the memory of the corresponding secondary data block, the storage location is consistent with the location where the serial-to-parallel conversion module reads the data, that is, realize in-situ storage;

(4), the FFT calculation results of r secondary data blocks in each primary data block are merged to obtain the FFT calculation results of each primary data block; wherein, in the n ₂ primary data blocks The FFT calculation result of the data is in, n ₂ =0, 1, . . . , v-1,

(5) Merge the FFT calculation results of v first-level data blocks to obtain the N-point FFT calculation results of the data sequence x(n) Wherein, k=0, 1, ..., N-1, W _N =e ^-j2π/N .

2. a kind of parallel fast Fourier transform processing method according to claim 1, is characterized in that: in step (2), adopt vr The dual-port memory of point stores the data of vr secondary data blocks; the vr dual-port memory is divided into v memory groups, corresponding to v primary data blocks; each of the memory groups includes r dual-port Memory, corresponding to r secondary data blocks in one primary data block; wherein an S-bit r-ary counter is used to obtain the storage address of each data in the sequence x(n), that is, to determine the The nth data is stored in the storage address serial number Add_ID (n) in the Ram_Id (n) memory in the Group_Id (n) memory group, n=0, 1, ..., N-1, and the specific implementation method is as follows:

The r-ary value of the count value n of the counter is (a _S-1 a _S-2 ...a _Z+1 a _Z a _Z-1 ...a ₁ a ₀ ) _r , where a ₀ to a _S-1 are The r-ary value of the 1st to S-th digits of the r-ary value ranges from 0 to r-1, and the r-ary value of the memory group serial number Group_Id(n) is (a _Z-1 … a ₁ a ₀ ) _r , the r-ary value of the memory serial number Ram_Id(n) is (a _Z ) _r , the r-ary value of the storage address serial number Add_ID(n) is (a _S-1 a _S-2 …a _{Z +1} ) _r .

3. a kind of parallel Fast Fourier Transform processing method according to claim 1, is characterized in that: in step (3), adopt radix r FFT to calculate and obtain in each secondary data block The FFT result of data; Adopt base rFFT calculation in step (4) to merge the FFT calculation results of r secondary data blocks in each primary data block; Adopt base v FFT calculation in step (5) to The FFT calculation results of the v first-level data blocks are combined.