CN108762719A - Parallel generalized inner product reconstruction controller - Google Patents
- Publication number
- CN108762719A (application CN201810497969.2A)
- Authority
- CN
- China
- Prior art keywords
- address
- intermediate result
- inner product
- calculation module
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/06—Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
- G06F12/0646—Configuration or reconfiguration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7867—Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/483—Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Nonlinear Science (AREA)
- Logic Circuits (AREA)
- Complex Calculations (AREA)
Abstract
The parallel generalized inner product reconstruction controller of the present invention comprises: an intermediate result calculation module, which receives source data, computes the intermediate result vector Y_L from it, generates the address of Y_L, and stores it in a bank; each time the computation of one intermediate result vector Y_L completes, this module generates a completion signal and sends it to the final result calculation module as a start signal; a final result calculation module, which reads data into a complex multiply-accumulator to compute the L-th element Z_L of the result matrix, generates the address of Z_L, and stores it in a bank; and a data storage address processing module, which selects data according to the ping-pong operation select signal and generates the correct bank address signals. Beneficial effects: computation time is short and storage resource utilization is high, which meets the high real-time requirement for obtaining test statistics when performing non-homogeneity detection in many signal detection scenarios.
Description
Technical field
The invention belongs to the technical field of non-homogeneity detection, and in particular relates to a parallel generalized inner product reconstruction controller.
Background
Space-time adaptive processing (STAP) is a technique for detecting moving targets. Conventional STAP algorithms require an estimate of the clutter covariance matrix. When secondary data are used to estimate the clutter covariance matrix, the secondary data must be independent and identically distributed (i.i.d.) to limit the performance loss.
In practice, the received signal echoes are contaminated not only by natural clutter but also by man-made non-homogeneous interference, so the i.i.d. condition is often not satisfied.
To deal with interfering targets in the samples, Melvin first proposed the idea of the non-homogeneity detector (NHD), which suppresses their influence on the clutter covariance matrix estimate by discarding the samples that contain interfering targets. The basic idea of NHD is to exploit the difference in statistical properties between samples contaminated by interfering targets and the remaining samples, and to define a test statistic that separates the two kinds of samples.
For the choice of NHD test statistic, Gerlach et al. of the U.S. Naval Research Laboratory proposed two criteria: the generalized inner product (GIP) and the adaptive power residue. Let X_L denote the L-th sample of the initial sample set; its corresponding autocorrelation matrix is then given by [formula not reproduced in the source text], where T is the clutter-plus-noise covariance matrix. Let [symbol not reproduced in the source text] denote the sample covariance matrix formed from L samples; the GIP value of each sample can then be expressed as [formula not reproduced in the source text]. Interfering targets can be effectively discarded according to the GIP value of each sample.
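The formulas in the paragraph above did not survive in this text. For orientation only, the generalized inner product is customarily written in the STAP literature as follows; the symbols R_L and \hat{R} and the normalization by 1/L are conventions assumed here, not recovered from the patent, and how they map onto the matrix T used by the controller is likewise an assumption.

```latex
% Customary GIP definitions (assumed notation, not taken from the patent text)
R_L = X_L X_L^{H}, \qquad
\hat{R} = \frac{1}{L} \sum_{l=1}^{L} X_l X_l^{H}, \qquad
\mathrm{GIP}_L = X_L^{H} \, \hat{R}^{-1} X_L
```

Read this way, the two-stage computation described later (an intermediate vector Y_L formed from X_L and a square matrix, then a scalar Z_L formed from X_L and Y_L) has the same shape as evaluating a quadratic form in X_L.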
The clutter suppression capability of the GIP non-homogeneity detection method depends on the number of samples: the more samples there are, the more faithful the clutter covariance matrix data and the stronger the clutter suppression. A software implementation of the GIP non-homogeneity detection method suffers from low precision and long run time when processing a large number of samples, and therefore struggles to meet the high real-time requirements of practical non-homogeneity detection.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings of the background art described above by proposing a parallel generalized inner product reconstruction controller that better meets the high real-time and large-point-count computation demands of practical applications. This is achieved through the following technical solution:
The parallel generalized inner product reconstruction controller comprises:
An intermediate result calculation module, which receives source data, computes the intermediate result vector Y_L from it, generates the address of Y_L, and stores it in a bank; each time the computation of one intermediate result vector Y_L completes, it generates a completion signal and sends it to the final result calculation module as a start signal;
A final result calculation module, which uses an address generator to continuously generate the addresses of the elements of column X_L of matrix X and of the corresponding intermediate result vector Y_L, reads the data into a complex multiply-accumulator to obtain the L-th element Z_L of the 1×N result matrix Z, generates the address of Z_L, and stores it in a bank;
A data storage address processing module, which selects data according to the ping-pong operation select signal, handles signals from the intermediate result calculation module and the final result calculation module that target the same bank, and generates the correct bank address signals.
In a further design of the hardware implementation of the parallel generalized inner product operation, computing Y_L is a multiply-accumulate of X_L with each column of the square matrix T, where the number of rows and columns of T equals the number of columns of matrix X; this multiply-accumulate is carried out by multi-way parallel computation.
In a further design of the hardware implementation of the parallel generalized inner product operation, the intermediate result calculation module is implemented with four-way parallelism.
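The four-way multiply-accumulate described above can be modeled in software roughly as follows. This is a minimal sketch, not the patented circuit: the function name `intermediate_vector`, the row-major layout `t[i][j]`, and the use of plain (unconjugated) complex products are assumptions made for illustration, since the text only says that X_L is multiply-accumulated against each column of T.

```python
from typing import Dict, List

Matrix = List[List[complex]]  # t[i][j]: row i, column j (layout is an assumption)


def intermediate_vector(x_col: List[complex], t: Matrix, lanes: int = 4) -> List[complex]:
    """Compute Y_L[j] = sum_i X_L[i] * T[i][j], processing `lanes` columns of T at a time.

    Each group of `lanes` columns stands in for the four complex
    multiply-accumulators that work in parallel in the hardware.
    """
    m = len(x_col)
    y: List[complex] = [0j] * m
    for base in range(0, m, lanes):
        group = range(base, min(base + lanes, m))
        # One accumulator per lane, cleared at the start of each column group.
        acc: Dict[int, complex] = {j: 0j for j in group}
        for i in range(m):
            # The same element of X_L is shared by every lane in this step.
            for j in group:
                acc[j] += x_col[i] * t[i][j]
        for j in group:
            y[j] = acc[j]
    return y
```

Sharing one element of X_L across all lanes in each step is what lets a single read of X feed four accumulators, which is presumably why T is spread over several banks while X is read once per step.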
In a further design of the hardware implementation of the parallel generalized inner product operation, the source data of the intermediate result calculation module are stored as follows: matrix T is stored column by column in bank0-bank3 and, once these are full, continues column by column in bank4-bank7; matrix X is stored column by column in bank8-bank11.
In a further design of the hardware implementation of the parallel generalized inner product operation, the intermediate results of the intermediate result calculation module are stored as follows: odd-indexed results are stored in bank12 and even-indexed results in bank13.
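The odd/even placement of intermediate results amounts to a two-bank ping-pong buffer. The sketch below illustrates only that rule; the helper name `intermediate_bank` is hypothetical, and the 1-based index L follows the numbering used in the text.

```python
def intermediate_bank(l: int) -> int:
    """Return the bank that holds intermediate result Y_L (L is 1-based).

    Odd-indexed results go to bank12 and even-indexed ones to bank13,
    as stated in the text.
    """
    if l < 1:
        raise ValueError("L is 1-based")
    return 12 if l % 2 == 1 else 13


def banks_in_use(l: int) -> tuple:
    """Banks touched while Y_L is written and Y_(L-1) is read back (L >= 2)."""
    return intermediate_bank(l), intermediate_bank(l - 1)
```

Because consecutive indices always map to different banks, the write of Y_L and the read of Y_(L-1) never target the same bank, so each new result can overwrite the one stored two steps earlier.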
In a further design of the hardware implementation of the parallel generalized inner product operation, the intermediate result calculation module computes an intermediate result as follows: in one operation, the address generator first generates the addresses of the elements of one column X_L of X and of four columns of matrix T, the corresponding matrix element data are fetched and fed into the complex multiply-accumulators to obtain the intermediate result Y_L; the address generator then generates the storage address of the intermediate result, and the intermediate result is stored in a bank.
In a further design of the hardware implementation of the parallel generalized inner product operation, the final result calculation module computes a final result as follows: when the final result calculation module receives the intermediate result completion signal, the address generator continuously generates the addresses of the elements of column X_L of matrix X and of the corresponding intermediate result vector Y_L; these data are fed into the complex multiply-accumulator to obtain the final result Z_L, the address generator generates the storage address of the final result, and the final result is stored in a bank.
In a further design of the hardware implementation of the parallel generalized inner product operation, the complex multipliers are pipelined single-precision floating-point units with a latency of 4 clock cycles, and the memory access latency of the complex multipliers is set to 6 cycles.
In a further design of the hardware implementation of the parallel generalized inner product operation, there are five complex multiply-accumulators, four of which compute intermediate results in four-way parallel and one of which computes final results concurrently.
In a further design of the hardware implementation of the parallel generalized inner product operation, each complex multiply-accumulator consists of one complex multiplier and three complex adders, and its area after DC synthesis in a 40 nm CMOS process is 19993.56 μm².
Advantages of the invention
The parallel generalized inner product reconstruction controller provided by the present invention computes one final result element immediately after each intermediate result, so the time to compute Z_(L-1) is hidden within the time to compute Y_L; computation time is short and storage resource utilization is high. The controller meets the high real-time requirement for obtaining test statistics when performing non-homogeneity detection in many signal detection scenarios.
Description of drawings
Figure 1 is a schematic diagram of the architecture of the parallel generalized inner product reconstruction controller.
Figure 2 is a schematic diagram of the data storage for the parallel generalized inner product.
Figure 3 is a schematic diagram of the computation flow of the parallel generalized inner product algorithm.
Detailed description
The present invention is described in detail below with reference to the drawings and a specific implementation example.
As shown in Figure 1, the parallel generalized inner product reconstruction controller of this embodiment takes four-way parallelism as an example and consists mainly of three sub-modules: the intermediate result calculation module, the final result calculation module, and the data storage address processing module. The intermediate result calculation module computes intermediate results; the final result calculation module computes final results; the data storage address processing module handles bank addresses and related signals.
The intermediate result calculation module computes the intermediate result vector Y_L in a fully pipelined manner. It generates the addresses of the elements of column X_L, performs an inner-product multiply-accumulate between the elements of column X_L and each column of the M×M square matrix T to obtain the intermediate result vector Y_L, generates the address of Y_L, and stores it in a bank. Each time the computation of one Y_L completes, it sends a completion signal to the final result calculation module, which uses it as the start signal for one of its computations.
The final result calculation module uses an address generator to continuously generate the addresses of the elements of column X_L of matrix X and of the corresponding intermediate result vector Y_L, reads the data into a complex multiply-accumulator to obtain the L-th element Z_L of the 1×N result matrix Z, generates the address of Z_L, and stores it in a bank.
The data storage address processing module selects data according to the ping-pong operation select signal, handles signals from the intermediate result calculation module and the final result calculation module that target the same bank, and generates the correct bank address and related signals.
As shown in Figure 1, the storage unit comprises 15 banks: matrix T is stored in bank0-bank7, matrix X in bank8-bank11, the intermediate results Y_L in bank12 and bank13, and the final parallel generalized inner product matrix in bank14. The arithmetic unit comprises 5 complex multiply-accumulators: complex multiply-accumulators 0-3 compute intermediate results in four-way parallel, and complex multiply-accumulator 4 computes final results at the same time.
Figure 2 shows the data storage for the parallel generalized inner product. The source data are stored as follows: matrix T is stored column by column in bank0-bank3 and, once these are full, continues column by column in bank4-bank7; matrix X is stored column by column in bank8-bank11. This layout makes it convenient to run the 4-way parallel computation of the intermediate result Y_L and also simplifies the design of the corresponding DMA module. Among the intermediate results Y_L, the odd-indexed results Y_1, Y_3, ... are stored in bank12 (each overwriting the previous one), and the even-indexed results Y_2, Y_4, ... are stored in bank13 (each overwriting the previous one). The final generalized inner product matrix is stored in bank14.
As shown in Figure 3, the parallel generalized inner product algorithm computes an intermediate result as follows: in one operation, address generator 1 first generates the addresses of the elements of one column X_L of X and of four columns of matrix T, the corresponding matrix element data are fetched and fed into the complex multiply-accumulators to obtain the intermediate result Y_L; address generator 2 then generates the storage address of the intermediate result, and the intermediate result is stored in a bank.
Similarly, the parallel generalized inner product algorithm computes a final result as follows: in one operation, when the module receives the intermediate result completion signal, address generator 1 continuously generates the addresses of the elements of column X_L of matrix X and of the corresponding intermediate result vector Y_L. These data are fed into the complex multiply-accumulator to obtain the final result Z_L; address generator 2 then generates the storage address of the final result, and the final result is stored in a bank.
One complete computation in the hardware implementation of the parallel generalized inner product algorithm of the present invention comprises the following steps:
Step 1) Set L = 1 and start from the first column of matrix X.
Step 2) Compute the intermediate result Y_L.
Computing the intermediate result Y_L comprises the following sub-steps:
Step 2-1) Using the addresses generated by the address generator sub-module, fetch the elements of X_L and (T_1, T_2, T_3, T_4) in turn and feed them to the multiply-accumulate sub-module for complex multiply-accumulation, obtaining (Y_L1, Y_L2, Y_L3, Y_L4);
Step 2-2) Using the addresses generated by the address generator sub-module, write (Y_L1, Y_L2, Y_L3, Y_L4) in order into the intermediate result bank while fetching the next group of 4 columns of matrix T together with X_L; repeat sub-steps 2-1) and 2-2) until the computation of Y_L is complete.
Step 3) Compute the final result. This runs concurrently with sub-steps 2-1) and 2-2): if Y_(L-1) has already been produced, fetch the elements of X_(L-1) and Y_(L-1) in turn using the addresses generated by the address generator, perform complex multiply-accumulation to obtain Z_(L-1), and write the final result into the final result bank using the address generated by the address generator.
Step 4) If L < N, set L = L + 1 and jump to step 2).
Step 5) Fetch the elements of X_N and Y_N in turn and perform complex multiply-accumulation to obtain Z_N, store it in a bank, and complete the inner product operation.
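Steps 1) to 5) can be followed in a small sequential model such as the one below. It is a sketch under stated assumptions rather than the hardware itself: the function name `generalized_inner_products`, the list-of-columns input format, and the plain (unconjugated) products are illustrative choices, and the concurrency of step 3) with step 2) is only mimicked by computing Z_(L-1) inside the iteration that produces Y_L.

```python
from typing import Dict, List, Optional

Vector = List[complex]
Matrix = List[List[complex]]  # t[i][j]: row i, column j (assumed layout)


def generalized_inner_products(x_cols: List[Vector], t: Matrix) -> Vector:
    """Return [Z_1, ..., Z_N] for the columns X_1..X_N given in x_cols."""
    n, m = len(x_cols), len(t)
    z: Vector = [0j] * n
    # Two-entry ping-pong store for Y: bank12 for odd L, bank13 for even L.
    y_banks: Dict[int, Optional[Vector]] = {12: None, 13: None}

    def bank(l: int) -> int:
        return 12 if l % 2 == 1 else 13

    for l in range(1, n + 1):                      # steps 1) and 4)
        x = x_cols[l - 1]
        # Step 2): Y_L[j] = sum_i X_L[i] * T[i][j], written to its own bank.
        y_banks[bank(l)] = [
            sum((x[i] * t[i][j] for i in range(m)), 0j) for j in range(m)
        ]
        # Step 3): Z_(L-1) is read from the other bank, so in hardware it can
        # proceed while Y_L is still being produced.
        if l > 1:
            y_prev = y_banks[bank(l - 1)]
            z[l - 2] = sum((a * b for a, b in zip(x_cols[l - 2], y_prev)), 0j)
    # Step 5): the last element has no later Y to hide behind.
    y_last = y_banks[bank(n)]
    z[n - 1] = sum((a * b for a, b in zip(x_cols[n - 1], y_last)), 0j)
    return z
```

For example, with a 2×2 identity matrix for T and the columns [1, 2] and [3j, 1], the model returns the plain sums of squares of each column's elements.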
The complex multipliers and complex adders used in the parallel generalized inner product reconstruction controller of this embodiment are all pipelined single-precision floating-point units with a latency of 4 clock cycles, and the memory access latency is 6 cycles. Evaluated with EDA simulation and synthesis tools, the operating frequency reaches 1 GHz.
The parallel generalized inner product reconstruction controller of this embodiment uses five complex multiply-accumulators in total, four of which compute intermediate results in four-way parallel and one of which computes final results concurrently. Each complex multiply-accumulator consists of one complex multiplier and three complex adders, and its area after DC synthesis in a 40 nm CMOS process is 19993.56 μm².
The parallel generalized inner product reconstruction controller of this embodiment computes one final result element immediately after each intermediate result, so the time to compute Z_(L-1) is hidden within the time to compute Y_L. Compared with a method that first computes the complete set of intermediate results and then computes the final results in parallel, computation time is shorter and storage resource utilization is higher.
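The effect of hiding Z_(L-1) behind Y_L can be estimated with a rough cycle model. Everything numeric here is an assumption: one multiply-accumulate per lane per cycle, pipeline fill and memory latency ignored, and a simple non-overlapped baseline that is not the comparison method named in the text. The function names `mac_cycles` and `total_cycles` are hypothetical.

```python
import math
from typing import Tuple


def mac_cycles(m: int, lanes: int = 4) -> Tuple[int, int]:
    """Idealized cycle counts for one Y_L (M x M products on `lanes` MACs)
    and one Z_L (M products on a single MAC)."""
    t_y = m * math.ceil(m / lanes)
    t_z = m
    return t_y, t_z


def total_cycles(n: int, t_y: int, t_z: int, overlapped: bool) -> int:
    """Total cycles for N samples, with or without overlapping Z_(L-1) and Y_L."""
    if n < 1:
        raise ValueError("need at least one sample")
    if overlapped:
        # Y_1 alone, then N-1 iterations where Z_(L-1) runs beside Y_L,
        # then Z_N alone at the end.
        return t_y + (n - 1) * max(t_y, t_z) + t_z
    # Baseline: each Z_L starts only after its Y_L, with nothing overlapped.
    return n * (t_y + t_z)
```

Under these assumptions, whenever the Z computation is no longer than the Y computation, only the final Z_N adds to the total, which is the sense in which the text says the time for Z_(L-1) is hidden.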
The parallel generalized inner product reconstruction controller of this embodiment is characterized by high computation speed, a flexible and variable number of points, and high storage resource utilization. It meets the high real-time requirement for obtaining test statistics when performing non-homogeneity detection in digital signal processing with large data volumes, for example in real-time signal detection scenarios.
The above is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of the claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810497969.2A CN108762719B (en) | 2018-05-21 | 2018-05-21 | Parallel generalized inner product reconstruction controller |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108762719A (en) | 2018-11-06 |
CN108762719B (en) | 2023-06-06 |
Family
ID=64004919
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810497969.2A Active CN108762719B (en) | 2018-05-21 | 2018-05-21 | Parallel generalized inner product reconstruction controller |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108762719B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110795687A (en) * | 2019-10-29 | 2020-02-14 | 南京宁麒智能计算芯片研究院有限公司 | Hierarchical segmentation system and method for autocorrelation algorithm |
CN110796193A (en) * | 2019-10-29 | 2020-02-14 | 南京宁麒智能计算芯片研究院有限公司 | Reconfigurable KNN algorithm-based hardware implementation system and method |
CN111045965A (en) * | 2019-10-25 | 2020-04-21 | 南京大学 | Hardware implementation method for multi-channel conflict-free splitting, computer equipment and readable storage medium for operating method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5276902A (en) * | 1988-11-07 | 1994-01-04 | Fujitsu Limited | Memory access system for vector data processed or to be processed by a vector processor |
CN104794002A (en) * | 2014-12-29 | 2015-07-22 | 南京大学 | Multi-channel parallel dividing method based on specific resources and hardware architecture of multi-channel parallel dividing method based on specific resources |
CN106855618A (en) * | 2017-03-06 | 2017-06-16 | 西安电子科技大学 | Based on the interference sample elimination method under broad sense inner product General Cell |
CN106940815A (en) * | 2017-02-13 | 2017-07-11 | 西安交通大学 | A kind of programmable convolutional neural networks Crypto Coprocessor IP Core |
CN106855618A (en) * | 2017-03-06 | 2017-06-16 | 西安电子科技大学 | Based on the interference sample elimination method under broad sense inner product General Cell |
Non-Patent Citations (1)
Title |
---|
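Zhang Duoli et al.: "High-speed implementation of the two-dimensional high-precision MUSIC algorithm", Journal of Hefei University of Technology (Natural Science) * |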
Also Published As
Publication number | Publication date |
---|---|
CN108762719B (en) | 2023-06-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7566105B2 (en) | Vector calculation unit in neural network processor | |
CN108133270B (en) | Convolutional Neural Network Acceleration Method and Device | |
CN108205519A (en) | The multiply-add arithmetic unit of matrix and method | |
CN108762719B (en) | Parallel generalized inner product reconstruction controller | |
CN103136165B (en) | A kind of method of the Adaptive Sidelobe Canceling weights based on FPGA | |
CN109144469B (en) | Pipeline structure neural network matrix operation architecture and method | |
CN103955447A (en) | FFT accelerator based on DSP chip | |
Shiri et al. | An FPGA implementation of singular value decomposition | |
JP7435602B2 (en) | Computing equipment and computing systems | |
Mohanty et al. | Design and performance analysis of fixed-point jacobi svd algorithm on reconfigurable system | |
Pathan et al. | FPGA Based performance analysis of multiplier policies for FIR filter | |
CN102129419B (en) | Based on the processor of fast fourier transform | |
CN103699355B (en) | Variable-order pipeline serial multiply-accumulator | |
CN109446478A (en) | A kind of complex covariance matrix computing system based on iteration and restructural mode | |
CN111008697B (en) | Convolutional neural network accelerator implementation architecture | |
CN105893333B (en) | A kind of hardware circuit for calculating covariance matrix in MUSIC algorithms | |
Zhao et al. | An fpga-based hardware accelerator of ransac algorithm for matching of images feature points | |
CN104460444B (en) | FPGA operational circuit based on generalized correlation coefficients | |
CN114244460B (en) | Heterogeneous accelerated multi-path channel signal real-time generation method | |
CN104598199B (en) | The data processing method and system of a kind of Montgomery modular multipliers for smart card | |
Anuradha et al. | Implementation of high speed 64-bit MAC unit using FPGA | |
Sotiropoulos et al. | A fast parallel matrix multiplication reconfigurable unit utilized in face recognitions systems | |
RU188978U1 (en) | UNIFIED RECONFIGURED SCHEME OF COMMUTATION OF FAST FURIET TRANSFORMATION | |
CN204143432U (en) | A kind of multiplier-divider | |
CN113592075A (en) | Convolution operation device, method and chip |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||