
CN108762719A - A kind of parallel broad sense inner product reconfigurable controller - Google Patents


Info

Publication number
CN108762719A
Authority
CN
China
Prior art keywords
address
intermediate result
inner product
calculation module
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810497969.2A
Other languages
Chinese (zh)
Other versions
CN108762719B (en)
Inventor
李丽
祁鹏展
鲍贤亮
宋文清
李伟
何书专
潘红兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201810497969.2A priority Critical patent/CN108762719B/en
Publication of CN108762719A publication Critical patent/CN108762719A/en
Application granted granted Critical
Publication of CN108762719B publication Critical patent/CN108762719B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/06Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0646Configuration or reconfiguration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Nonlinear Science (AREA)
  • Logic Circuits (AREA)
  • Complex Calculations (AREA)

Abstract

The parallel generalized inner product reconfigurable controller of the invention comprises: an intermediate result calculation module, which receives source data, calculates an intermediate result vector Y_L from the source data, generates the address of the vector Y_L, and stores it in a bank; each time the calculation of one intermediate result vector Y_L is completed, it generates a completion signal and sends it to the final result calculation module as a start signal; a final result calculation module, which reads data into a complex multiply-accumulator to calculate the final result, obtaining the L-th element Z_L of the result matrix, generates the address of Z_L, and stores it in a bank; and a data storage address processing module, which performs data selection according to the ping-pong operation select signal and generates the correct bank address signals. Beneficial effects: computation time is short and storage utilization is high, so the controller can meet the high real-time requirement for obtaining test statistics when performing non-homogeneity detection in many signal detection scenarios.

Description

A Parallel Generalized Inner Product Reconfigurable Controller

Technical Field

The invention belongs to the technical field of non-homogeneity detection, and in particular relates to a parallel generalized inner product reconfigurable controller.

Background

Space-time adaptive processing (STAP) is a technique for detecting moving targets. In conventional STAP algorithms, the clutter covariance matrix must be estimated. When secondary data are used for this estimate, they must be independent and identically distributed (i.i.d.) to limit the performance loss.

In practice, the received signal echoes are contaminated not only by natural clutter but also by man-made non-homogeneous interference, so the i.i.d. condition is often not satisfied.

To deal with interfering targets in the samples, Melvin first proposed the idea of the non-homogeneity detector (NHD), which suppresses their influence on the clutter covariance matrix estimate by removing the samples that contain interfering targets. The basic idea of NHD is to set a test statistic, based on the difference in statistical properties between samples contaminated by interfering targets and the other samples, that distinguishes the two kinds of sample.

For the choice of NHD test statistic, Gerlach et al. of the US Naval Research Laboratory proposed two criteria: the generalized inner product (GIP) and the adaptive power residue. Let X_L denote the L-th sample in the initial sample set and T the clutter-plus-noise covariance matrix associated with its autocorrelation matrix; a sample covariance matrix is formed from the L samples, and a GIP value is computed for each sample. Based on the GIP value of each sample, interfering targets can be removed effectively.
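For orientation, the GIP statistic is commonly written in the STAP literature in the form below. This is a sketch under assumptions: the symbols $\hat{R}$ and $K$ and the superscript $H$ (conjugate transpose) are notation introduced here rather than taken from this document, whose own formulas may use a different normalization.

```latex
\hat{R} = \frac{1}{K}\sum_{k=1}^{K} X_k X_k^{H},
\qquad
\mathrm{GIP}_L = X_L^{H}\,\hat{R}^{-1}\,X_L
```

In this form, a sample whose GIP value deviates markedly from that of the other samples is treated as contaminated and excluded from the covariance estimate.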

The clutter suppression capability of the GIP non-homogeneity detection method depends on the number of samples: the more samples, the more faithful the clutter covariance matrix estimate and the stronger the clutter suppression. A software implementation of the method suffers from low precision and long computation time when processing a large number of samples, which makes it difficult to meet the high real-time requirements of practical non-homogeneity detection.

Summary of the Invention

The purpose of the invention is to overcome the shortcomings of the background art described above by providing a parallel generalized inner product reconfigurable controller that better meets the high real-time and large-data-size computation requirements of practical applications. This is achieved through the following technical solution:

The parallel generalized inner product reconfigurable controller comprises:

An intermediate result calculation module, which receives source data, calculates the intermediate result vector Y_L from the source data, generates the address of the vector Y_L, and stores it in a bank; each time the calculation of one intermediate result vector Y_L is completed, it generates a completion signal and sends it to the final result calculation module as a start signal;

A final result calculation module, which uses an address generator to continuously generate the addresses of the elements of column X_L of the matrix X and of the corresponding intermediate result vector Y_L, reads the data into a complex multiply-accumulator to obtain the L-th element Z_L of the result matrix Z_{1×N}, generates the address of Z_L, and stores it in a bank;

A data storage address processing module, which performs data selection according to the ping-pong operation select signal and, at the same time, processes the signals from the intermediate result calculation module and the final result calculation module that target the same bank, generating the correct bank address signals.

In a further design of the hardware implementation of the parallel generalized inner product operation, the calculation of Y_L is a multiply-accumulate of X_L with each column of the square matrix T, where the number of rows and columns of T equals the number of columns of the matrix X; the multiply-accumulate is carried out by multi-way parallel computation.

In a further design, the intermediate result calculation module is implemented four ways in parallel.

In a further design, the source data of the intermediate result calculation module are stored as follows: the matrix T is stored by column in bank0-bank3 and, once these are full, continues by column in bank4-bank7; the matrix X is stored by column in bank8-bank11.

In a further design, the intermediate results of the intermediate result calculation module are stored as follows: odd-indexed terms go to bank12 and even-indexed terms go to bank13.

In a further design, the intermediate result calculation module proceeds as follows: in one operation, the address generator first generates the addresses of one column X_L of X and of four columns of T; the corresponding matrix element data are fetched at the same time and fed into the complex multiply-accumulators to obtain the intermediate result Y_L; the address generator then generates the storage address of the intermediate result, which is written to the bank.

In a further design, the final result calculation module proceeds as follows: when it receives the intermediate result completion signal, the address generator continuously generates the addresses of the elements of column X_L of the matrix X and of the corresponding intermediate result vector Y_L; the data are fed into the complex multiply-accumulator to obtain the final result Z_L; the address generator then generates the storage address of the final result, which is written to the bank.

In a further design, the complex multipliers are all pipelined single-precision floating-point units with a latency of 4 clock cycles, and the memory access latency of the complex multipliers is set to 6 cycles.

In a further design, there are five complex multiply-accumulators: four compute the intermediate results four ways in parallel, and the fifth computes the final result concurrently.

In a further design, each complex multiply-accumulator consists of one complex multiplier and three complex adders, and its area after DC synthesis in a 40 nm CMOS process is 19993.56 μm².

Advantages of the Invention

The parallel generalized inner product reconfigurable controller provided by the invention computes one final result element immediately after each intermediate result, so the time to compute Z_{L-1} is hidden within the time to compute Y_L; computation time is short and storage utilization is high. The controller can meet the high real-time requirement for obtaining test statistics when performing non-homogeneity detection in many signal detection scenarios.

Description of the Drawings

Figure 1 is a schematic diagram of the architecture of the parallel generalized inner product reconfigurable controller.

Figure 2 is a schematic diagram of the data storage for the parallel generalized inner product.

Figure 3 is a schematic diagram of the calculation flow of the parallel generalized inner product algorithm.

Detailed Description

The invention is described in detail below with reference to the drawings and a specific embodiment.

As shown in Figure 1, the parallel generalized inner product reconfigurable controller of this embodiment takes four-way parallelism as an example and consists of three sub-modules: the intermediate result calculation module, the final result calculation module, and the data storage address processing module. The intermediate result calculation module computes the intermediate results; the final result calculation module computes the final results; the data storage address processing module handles bank addresses and related signals.

The intermediate result calculation module computes the intermediate result vector Y_L in a fully pipelined manner. It generates the addresses of the elements of column X_L, performs an inner-product multiply-accumulate of column X_L with each column of the square matrix T_{M×M} to obtain the intermediate result vector Y_L, generates the address of Y_L, and stores it in a bank. Each time the calculation of one Y_L is completed, it sends a completion signal to the final result calculation module, which uses it as the start signal for one of its calculations.
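The multiply-accumulate performed by the intermediate result calculation module can be sketched in software as follows. This is a minimal illustration, not the patented hardware: the function name `intermediate_vector`, the list-of-columns layout for T, and the absence of complex conjugation are assumptions, since the source does not specify them. The outer loop walks T four columns at a time to mirror the four parallel multiply-accumulators.

```python
def intermediate_vector(x_col, t_cols, lanes=4):
    """Compute Y_L[j] = sum_i X_L[i] * T[i][j] for every column j of T.

    x_col  : list of complex, one column X_L of X (length M)
    t_cols : list of the M columns of T, each a list of M complex values
    lanes  : number of T columns processed together (4 in the embodiment)
    """
    m = len(x_col)
    if len(t_cols) != m or any(len(col) != m for col in t_cols):
        raise ValueError("T must be an M x M matrix matching len(x_col)")
    y = [0j] * m
    for base in range(0, m, lanes):
        group = range(base, min(base + lanes, m))
        acc = {j: 0j for j in group}          # one accumulator per lane
        for i in range(m):                    # stream X_L once per group
            for j in group:
                acc[j] += x_col[i] * t_cols[j][i]
        for j in group:
            y[j] = acc[j]
    return y
```

In hardware the inner `for j in group` loop would run in the same clock cycle across the four units; here it is serialized for clarity.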

The final result calculation module uses an address generator to continuously generate the addresses of the elements of column X_L of the matrix X and of the corresponding intermediate result vector Y_L, reads the data into a complex multiply-accumulator to obtain the L-th element Z_L of the result matrix Z_{1×N}, generates the address of Z_L, and stores it in a bank.

The data storage address processing module performs data selection according to the ping-pong operation select signal and, at the same time, processes the signals from the intermediate result calculation module and the final result calculation module that target the same bank, generating the correct bank address and related signals.
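One way to picture the ping-pong selection is the routing sketch below. It is an illustration under stated assumptions, not the module's actual logic: the function name `route_y_banks` is hypothetical, the select signal is assumed to be the parity of L, and bank12/bank13 are taken as the ping-pong pair because odd-indexed and even-indexed intermediate results are stored there.

```python
Y_BANKS = (12, 13)  # bank12 holds odd-indexed Y_L, bank13 even-indexed Y_L

def route_y_banks(l_index, write_addr, read_addr):
    """Return {bank: (operation, address)} for one ping-pong phase.

    l_index    : 1-based index L of the vector Y_L being written
    write_addr : address from the intermediate result module (writes Y_L)
    read_addr  : address from the final result module (reads Y_{L-1})
    """
    if l_index < 1:
        raise ValueError("L is 1-based")
    sel = l_index % 2  # assumed ping-pong select signal: parity of L
    write_bank = Y_BANKS[0] if sel == 1 else Y_BANKS[1]
    read_bank = Y_BANKS[1] if sel == 1 else Y_BANKS[0]
    routes = {write_bank: ("write", write_addr)}
    if l_index > 1:  # no Y_0 exists, so nothing is read in the first phase
        routes[read_bank] = ("read", read_addr)
    return routes
```

Because the writer and the reader always land on different banks in a given phase, the two modules never contend for the same intermediate result bank in this sketch.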

As shown in Figure 1, the storage unit comprises 15 banks: the matrix T is stored in bank0-bank7, the matrix X in bank8-bank11, the intermediate results Y_L in bank12 and bank13, and the final generalized inner product matrix in bank14. The arithmetic unit comprises five complex multiply-accumulators: multiply-accumulators 0-3 compute the intermediate results four ways in parallel, and multiply-accumulator 4 computes the final result at the same time.

Figure 2 is a schematic diagram of the data storage for the parallel generalized inner product. The source data are stored as follows: the matrix T is stored by column in bank0-bank3 and, once these are full, continues by column in bank4-bank7; the matrix X is stored by column in bank8-bank11. This layout makes four-way parallel operation convenient when computing the intermediate result Y_L and also simplifies the design of the corresponding DMA module. For the intermediate results Y_L, the odd-indexed terms Y_1, Y_3, ... are stored in bank12 (each overwriting the previous one), and the even-indexed terms Y_2, Y_4, ... are stored in bank13 (likewise overwriting the previous one). The final generalized inner product matrix is stored in bank14.
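The storage layout can be summarized as a small mapping from data item to bank index. The sketch below is hypothetical in several respects, all of which are assumptions not stated in the source: consecutive columns are interleaved across the four banks of a group (column modulo 4), the per-bank capacity `cols_per_bank` is a free parameter, and the function names are invented for illustration.

```python
Z_BANK = 14  # final generalized inner product results

def t_column_bank(col, cols_per_bank):
    """Bank index for 0-based column `col` of T (assumed interleaving)."""
    if col < 0 or cols_per_bank < 1:
        raise ValueError("col must be >= 0 and cols_per_bank >= 1")
    group_capacity = 4 * cols_per_bank   # columns that fit in bank0-bank3
    if col < group_capacity:
        return col % 4                   # bank0-bank3
    if col < 2 * group_capacity:
        return 4 + col % 4               # bank4-bank7 once the first group is full
    raise ValueError("T does not fit in bank0-bank7")

def x_column_bank(col):
    """Bank index for 0-based column `col` of X (assumed interleaving)."""
    if col < 0:
        raise ValueError("col must be >= 0")
    return 8 + col % 4                   # bank8-bank11

def y_bank(l_index):
    """Bank index for the 1-based intermediate result Y_L."""
    if l_index < 1:
        raise ValueError("L is 1-based")
    return 12 if l_index % 2 == 1 else 13
```

With interleaving of this kind, any four consecutive columns of T sit in four different banks, which is what allows them to be read in the same cycle.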

As shown in Figure 3, the intermediate result calculation in the parallel generalized inner product algorithm proceeds as follows: in one operation, address generator 1 first generates the addresses of one column X_L of X and of four columns of T; the corresponding matrix element data are fetched at the same time and fed into the complex multiply-accumulators to obtain the intermediate result Y_L; address generator 2 then generates the storage address of the intermediate result, which is written to the bank.

The final result calculation proceeds in the same way: in one operation, when the module receives the intermediate result completion signal, address generator 1 continuously generates the addresses of the elements of column X_L of the matrix X and of the corresponding intermediate result vector Y_L. The data are fed into the complex multiply-accumulator to obtain the final result Z_L; address generator 2 then generates the storage address of the final result, which is written to the bank.

One complete calculation by the hardware implementation of the parallel generalized inner product algorithm of the invention comprises the following steps:

Step 1) Set L = 1, starting the calculation from the first column of the matrix X;

Step 2) Calculate the intermediate result Y_L.

Calculating the intermediate result Y_L comprises the following sub-steps:

Step 2-1) According to the addresses generated by the address generator sub-module, fetch the elements of X_L and of (T_1 T_2 T_3 T_4) in turn and feed them into the multiply-accumulate sub-module for complex multiply-accumulation, obtaining (Y_L1 Y_L2 Y_L3 Y_L4);

Step 2-2) According to the addresses generated by the address generator sub-module, write (Y_L1 Y_L2 Y_L3 Y_L4) in order into the intermediate result bank while fetching the next group of four columns of T together with X_L; repeat steps 2-1) and 2-2) until the calculation of Y_L is complete;

Step 3) Calculate the final result. This runs concurrently with steps 2-1) and 2-2): if Y_{L-1} has already been produced, fetch the elements of X_{L-1} and Y_{L-1} in turn according to the addresses generated by the address generator and perform complex multiply-accumulation to obtain Z_{L-1}; then write the final result into the final result bank according to the address generated by the address generator;

Step 4) If L < N, set L = L + 1 and jump to step 2);

Step 5) Fetch the elements of X_N and Y_N in turn and perform complex multiply-accumulation to obtain Z_N; store it in the bank, completing the inner product operation.
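Steps 1) to 5) can be traced with the short software model below. It is a sketch under assumptions rather than a description of the hardware: the names `mac` and `generalized_inner_product` are hypothetical, no complex conjugation is applied because the source does not mention one, and the work that the hardware overlaps in time is executed sequentially here.

```python
def mac(a, b):
    """Complex multiply-accumulate: sum of element-wise products."""
    acc = 0j
    for p, q in zip(a, b):
        acc += p * q
    return acc

def generalized_inner_product(x_cols, t_cols):
    """Return [Z_1, ..., Z_N] following steps 1) to 5).

    x_cols : the N columns of X, each of length M
    t_cols : the M columns of the M x M matrix T
    """
    n = len(x_cols)
    z = [0j] * n
    y_prev = None                      # Y_{L-1}, held in the other ping-pong bank
    for l in range(1, n + 1):          # steps 1) and 4): L = 1 .. N
        x_l = x_cols[l - 1]
        # Step 2): Y_L; the hardware handles four T columns per pass.
        y_l = [mac(x_l, t_col) for t_col in t_cols]
        # Step 3): Z_{L-1}; in hardware this overlaps with step 2).
        if y_prev is not None:
            z[l - 2] = mac(x_cols[l - 2], y_prev)
        y_prev = y_l
    # Step 5): Z_N has no later Y computation to overlap with.
    if y_prev is not None:
        z[n - 1] = mac(x_cols[n - 1], y_prev)
    return z
```

Only two intermediate vectors are alive at any time (`y_prev` and `y_l`), which corresponds to the two intermediate result banks of the embodiment.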

The complex multipliers and complex adders used in the parallel generalized inner product reconfigurable controller of this embodiment are all pipelined single-precision floating-point units with a latency of 4 clock cycles; the memory access latency is 6 cycles. Using EDA simulation and synthesis tools, the operating frequency reaches 1 GHz.

The controller of this embodiment uses five complex multiply-accumulators in total: four compute the intermediate results four ways in parallel, and the fifth computes the final result concurrently. Each complex multiply-accumulator consists of one complex multiplier and three complex adders, and its area after DC synthesis in a 40 nm CMOS process is 19993.56 μm².

The controller of this embodiment computes one final result element immediately after each intermediate result, so the time to compute Z_{L-1} is hidden within the time to compute Y_L. Compared with computing the complete intermediate result first and then computing the final results in parallel, this takes less computation time and achieves higher storage utilization.
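A rough cycle count shows why the final result calculation can be hidden. The estimate below is a simplification under assumptions that are not taken from the source: each multiply-accumulator retires one multiply-accumulate per cycle, pipeline and memory latencies are ignored, and the baseline computes the final results one at a time on a single unit after all intermediate results. The function name `cycle_estimates` is hypothetical.

```python
import math

def cycle_estimates(m, n, lanes=4):
    """Return (overlapped, sequential) multiply-accumulate cycle counts.

    m     : length of each column X_L (T is m x m)
    n     : number of columns of X
    lanes : parallel accumulators used for Y_L (4 in the embodiment)
    """
    if m < 1 or n < 1 or lanes < 1:
        raise ValueError("m, n and lanes must be positive")
    y_cycles = math.ceil(m / lanes) * m    # m MACs per column, `lanes` columns per pass
    z_cycles = m                           # one m-element inner product on the fifth unit
    overlapped = n * y_cycles + z_cycles   # every Z except the last hides behind a Y
    sequential = n * (y_cycles + z_cycles)
    return overlapped, sequential
```

Since `z_cycles` never exceeds `y_cycles`, each Z_{L-1} finishes before Y_L does; for m = 16 and n = 100 the model gives 6416 overlapped cycles against 8000 sequential cycles.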

The parallel generalized inner product reconfigurable controller of this embodiment is characterized by fast computation, a flexibly variable number of points, and high storage utilization. It can meet the high real-time requirement for obtaining test statistics when performing non-homogeneity detection in digital signal processing with large data volumes, for example in real-time signal detection scenarios.

The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of the claims.

Claims (10)

1.一种并行广义内积重构控制器,其特征在于:包括:1. A parallel generalized inner product reconstruction controller, characterized in that: comprising: 中间结果计算模块,接收源数据并根据源数据计算中间结果向量YL,生成向量YL的地址,存入bank;每完成一个中间结果向量YL的计算生成一个完成信号,并将所述完成信号发送至最终结果计算模块,作为启动信号;The intermediate result calculation module receives the source data and calculates the intermediate result vector Y L according to the source data, generates the address of the vector Y L , and stores it in the bank; every time the calculation of an intermediate result vector Y L is completed, a completion signal is generated, and the completed The signal is sent to the final result calculation module as a start signal; 最终结果计算模块,通过地址生成器连续生成矩阵X的列XL元素的地址和相应中间结果向量YL元素的地址,读数据进入复数乘累加器进行最终结果计算得到结果矩阵Z1xN第L个元素ZL,生成向量ZL的地址,存入bank;The final result calculation module continuously generates the address of the column X L element of matrix X and the address of the corresponding intermediate result vector Y L element through the address generator, and reads the data into the complex multiplication accumulator for final result calculation to obtain the result matrix Z 1xNth L The element Z L generates the address of the vector Z L and stores it in the bank; 数据存储地址处理模块,根据乒乓操作选择信号进行数据选择,同时对来自中间结果计算模块和最终结果计算模块的针对同一个bank的信号进行处理,生成正确的bank地址信号。The data storage address processing module selects data according to the ping-pong operation selection signal, and simultaneously processes signals for the same bank from the intermediate result calculation module and the final result calculation module to generate correct bank address signals. 2.根据权利要求1所述的并行广义内积运算的硬件实现方法,其特征在于:计算YL的过程是XL和方阵T,每一列乘累加的过程,所述方阵T的行列数与矩阵X的列数相等,该乘累加的过程通过多路并行计算实现。2. 
The parallel generalized inner product reconstruction controller according to claim 1, characterized in that: Y_L is calculated by multiplying and accumulating X_L with each column of the square matrix T, the number of rows and columns of the square matrix T is equal to the number of columns of the matrix X, and the multiply-accumulate process is realized by multi-channel parallel computing.
3. The parallel generalized inner product reconstruction controller according to claim 2, characterized in that: the intermediate result calculation module is implemented in a four-way parallel manner.
4. The parallel generalized inner product reconstruction controller according to claim 3, characterized in that: the source data of the intermediate result calculation module are stored as follows: the matrix T is stored by column in bank0-bank3 and, once these are full, continues by column in bank4-bank7; the matrix X is stored by column in bank8-bank11.
5. The parallel generalized inner product reconstruction controller according to claim 3, characterized in that: the intermediate results of the intermediate result calculation module are stored as follows: odd-numbered items are stored in bank12, and even-numbered items are stored in bank13.
6. The parallel generalized inner product reconstruction controller according to claim 1, characterized in that: the intermediate result calculation module calculates the intermediate result as follows: in one operation, the address generator first generates the addresses of one column of elements X_L of X and of four columns of elements of the matrix T, the corresponding matrix element data are fetched at the same time and input to the complex multiply-accumulators to obtain the intermediate result Y_L; the address generator then generates the intermediate result storage address, and the intermediate result is stored in the bank.
7. The parallel generalized inner product reconstruction controller according to claim 1, characterized in that: the final result calculation module calculates the final result as follows: when the final result calculation module receives the signal that the intermediate result calculation is complete, the address generator continuously generates the addresses of the elements of column X_L of the matrix X and the addresses of the elements of the corresponding intermediate result vector Y_L; these elements are input together to the complex multiply-accumulator to obtain the final result Z_L; the address generator generates the final result storage address, and the final result is stored in the bank.
8. The parallel generalized inner product reconstruction controller according to claim 1, characterized in that: the complex multipliers are all pipelined single-precision floating-point arithmetic units with a latency of 4 clock cycles, and the memory access latency of the complex multipliers is set to 6 cycles.
9. The parallel generalized inner product reconstruction controller according to claim 1, characterized in that: there are five complex multiply-accumulators, four of which are used to calculate the intermediate results in four parallel channels, and the other is used to calculate the final result synchronously.
10. The parallel generalized inner product reconstruction controller according to claim 1, characterized in that: each complex multiply-accumulator consists of one complex multiplier and three complex adders, and its area after DC synthesis in a 40 nm CMOS process is 19993.56 μm².
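The two-stage data flow recited in the claims (a four-way parallel intermediate result Y_L, odd/even storage in bank12 and bank13, then a single multiply-accumulate for Z_L) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the function names are hypothetical, T is taken to be n x n with n the length of one column X_L (which the products require dimensionally), "odd-numbered items" is read with 1-based numbering, and no complex conjugation is applied because the claims shown here do not mention one.

```python
from typing import List, Tuple

LANES = 4  # four-way parallel intermediate-result computation


def intermediate_result(x_col: List[complex], t: List[List[complex]]) -> List[complex]:
    """Y_L[j] = sum_i x_col[i] * t[i][j], four columns of T per pass."""
    n = len(x_col)
    assert len(t) == n and all(len(row) == n for row in t), "T must be n x n"
    y = [0j] * n
    for base in range(0, n, LANES):
        lanes = range(base, min(base + LANES, n))
        acc = {j: 0j for j in lanes}      # one accumulator per lane
        for i in range(n):                # X_L is streamed once per pass
            for j in lanes:               # concurrent in hardware, a loop here
                acc[j] += x_col[i] * t[i][j]
        for j in lanes:
            y[j] = acc[j]
    return y


def split_odd_even(y: List[complex]) -> Tuple[List[complex], List[complex]]:
    """Odd-numbered items (1st, 3rd, ...) go to bank12, even-numbered to bank13.

    1-based item numbering is an assumption of this sketch.
    """
    return y[0::2], y[1::2]


def final_result(x_col: List[complex], bank12: List[complex],
                 bank13: List[complex]) -> complex:
    """Z_L = sum_j Y_L[j] * x_col[j], with Y_L read back from the two banks."""
    z = 0j
    for j, xj in enumerate(x_col):
        bank = bank12 if j % 2 == 0 else bank13
        z += bank[j // 2] * xj
    return z


def generalized_inner_products(x: List[List[complex]],
                               t: List[List[complex]]) -> List[complex]:
    """Return Z_L for every column L of x (x is given as a list of rows)."""
    n, k = len(x), len(x[0])
    results = []
    for col in range(k):
        x_col = [x[i][col] for i in range(n)]
        y = intermediate_result(x_col, t)
        bank12, bank13 = split_odd_even(y)
        results.append(final_result(x_col, bank12, bank13))
    return results
```

Calling `generalized_inner_products(x, t)` with `x` as a list of rows returns one value Z_L per column of `x`. In the claimed controller the four lanes run concurrently and the final-result unit works alongside them; the sequential loops above only mirror the order of the arithmetic, not the timing.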
CN201810497969.2A 2018-05-21 2018-05-21 Parallel generalized inner product reconstruction controller Active CN108762719B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810497969.2A CN108762719B (en) 2018-05-21 2018-05-21 Parallel generalized inner product reconstruction controller

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810497969.2A CN108762719B (en) 2018-05-21 2018-05-21 Parallel generalized inner product reconstruction controller

Publications (2)

Publication Number Publication Date
CN108762719A (en) 2018-11-06
CN108762719B (en) 2023-06-06

Family

ID=64004919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810497969.2A Active CN108762719B (en) 2018-05-21 2018-05-21 Parallel generalized inner product reconstruction controller

Country Status (1)

Country Link
CN (1) CN108762719B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110795687A (en) * 2019-10-29 2020-02-14 Nanjing Ningqi Intelligent Computing Chip Research Institute Co., Ltd. Hierarchical segmentation system and method for autocorrelation algorithm
CN110796193A (en) * 2019-10-29 2020-02-14 Nanjing Ningqi Intelligent Computing Chip Research Institute Co., Ltd. Reconfigurable KNN algorithm-based hardware implementation system and method
CN111045965A (en) * 2019-10-25 2020-04-21 Nanjing University Hardware implementation method for multi-channel conflict-free splitting, computer equipment and readable storage medium for operating method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5276902A (en) * 1988-11-07 1994-01-04 Fujitsu Limited Memory access system for vector data processed or to be processed by a vector processor
CN104794002A (en) * 2014-12-29 2015-07-22 Nanjing University Multi-channel parallel dividing method based on specific resources and hardware architecture of multi-channel parallel dividing method based on specific resources
CN106855618A (en) * 2017-03-06 2017-06-16 Xidian University Interference sample elimination method based on the generalized inner product for a general array
CN106940815A (en) * 2017-02-13 2017-07-11 Xi'an Jiaotong University Programmable convolutional neural network coprocessor IP core

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5276902A (en) * 1988-11-07 1994-01-04 Fujitsu Limited Memory access system for vector data processed or to be processed by a vector processor
CN104794002A (en) * 2014-12-29 2015-07-22 Nanjing University Multi-channel parallel dividing method based on specific resources and hardware architecture of multi-channel parallel dividing method based on specific resources
CN106940815A (en) * 2017-02-13 2017-07-11 Xi'an Jiaotong University Programmable convolutional neural network coprocessor IP core
CN106855618A (en) * 2017-03-06 2017-06-16 Xidian University Interference sample elimination method based on the generalized inner product for a general array

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
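Zhang Duoli et al.: "High-speed implementation of the two-dimensional high-precision MUSIC algorithm", Journal of Hefei University of Technology (Natural Science) *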

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111045965A (en) * 2019-10-25 2020-04-21 Nanjing University Hardware implementation method for multi-channel conflict-free splitting, computer equipment and readable storage medium for operating method
CN110795687A (en) * 2019-10-29 2020-02-14 Nanjing Ningqi Intelligent Computing Chip Research Institute Co., Ltd. Hierarchical segmentation system and method for autocorrelation algorithm
CN110796193A (en) * 2019-10-29 2020-02-14 Nanjing Ningqi Intelligent Computing Chip Research Institute Co., Ltd. Reconfigurable KNN algorithm-based hardware implementation system and method

Also Published As

Publication number Publication date
CN108762719B (en) 2023-06-06

Similar Documents

Publication Publication Date Title
JP7566105B2 (en) Vector calculation unit in neural network processor
CN108133270B (en) Convolutional Neural Network Acceleration Method and Device
CN108205519A (en) Matrix multiply-add operation device and method
CN108762719B (en) Parallel generalized inner product reconstruction controller
CN103136165B FPGA-based method for adaptive sidelobe cancellation weights
CN109144469B (en) Pipeline structure neural network matrix operation architecture and method
CN103955447A (en) FFT accelerator based on DSP chip
Shiri et al. An FPGA implementation of singular value decomposition
JP7435602B2 (en) Computing equipment and computing systems
Mohanty et al. Design and performance analysis of fixed-point jacobi svd algorithm on reconfigurable system
Pathan et al. FPGA Based performance analysis of multiplier policies for FIR filter
CN102129419B Processor based on fast Fourier transform
CN103699355B (en) Variable-order pipeline serial multiply-accumulator
CN109446478A Complex covariance matrix computing system based on iterative and reconfigurable modes
CN111008697B (en) Convolutional neural network accelerator implementation architecture
CN105893333B Hardware circuit for calculating the covariance matrix in the MUSIC algorithm
Zhao et al. An fpga-based hardware accelerator of ransac algorithm for matching of images feature points
CN104460444B (en) FPGA operational circuit based on generalized correlation coefficients
CN114244460B (en) Heterogeneous accelerated multi-path channel signal real-time generation method
CN104598199B (en) The data processing method and system of a kind of Montgomery modular multipliers for smart card
Anuradha et al. Implementation of high speed 64-bit MAC unit using FPGA
Sotiropoulos et al. A fast parallel matrix multiplication reconfigurable unit utilized in face recognitions systems
RU188978U1 UNIFIED RECONFIGURED SCHEME OF COMMUTATION OF FAST FOURIER TRANSFORMATION
CN204143432U Multiplier-divider
CN113592075A (en) Convolution operation device, method and chip

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant