CN109800468B

CN109800468B - Register retiming-based multi-pipeline-stage sequential circuit packing operation method

Info

Publication number: CN109800468B
Application number: CN201811587501.9A
Authority: CN
Inventors: Li Peng (李鹏); Li Yundi (李运娣); Guo Xiaobo (郭小波)
Original assignee: Henan University of Science and Technology
Current assignee: Henan University of Science and Technology
Priority date: 2018-12-25
Filing date: 2018-12-25
Publication date: 2022-09-30
Anticipated expiration: 2038-12-25
Also published as: CN109800468A

Abstract

The invention proposes a register-retiming-based packing method for multi-pipeline-stage sequential circuits. A packing algorithm first packs the look-up table (LUT) circuit. The type of each packed pipeline-stage sequential circuit is then determined from the topology of its critical path, and the critical path delay of the packed circuit is computed with the matching post-packing calculation method. Finally, the intermediate registers in the circuit are retimed according to that critical path delay. Register retiming spreads the delay as evenly as possible across the pipeline stages. This raises the throughput of pipelined applications, so the system can process more data per unit time. Because registers packed into logic blocks can still be moved, the method reduces the critical path delay of the whole sequential-circuit pipeline.

Description

A multi-pipeline sequential circuit packing operation method based on register retiming

Technical Field

The invention relates to the technical field of the FPGA design flow. In particular, it relates to a register-retiming-based packing method for multi-pipeline-stage sequential circuits.

Background Art

The FPGA design flow proceeds as follows, as shown in Figure 1:
- Logic synthesis turns the hardware description language (HDL) code written by the designer into a gate-level netlist (a NAND-gate netlist).
- Mapping turns the gate-level netlist into a look-up table (LUT) circuit.
- Packing places the LUT circuit into the larger logic blocks of the FPGA.
- Placement and routing produce a bitstream that can be downloaded to the FPGA.

Going from the designer's HDL to the netlist loaded into the chip therefore passes through logic synthesis, mapping, packing, and placement and routing. The packing stage is the one that assigns the LUT circuit produced by mapping to FPGA logic blocks.

Packing absorbs most of the nets of the LUT circuit into logic blocks.
- [Betz V, Rose J. VPR: a new packing, placement and routing tool for FPGA research [C]. Proceedings of the International Workshop on Field-Programmable Logic and Applications, London, UK, 1997: 213-222.] reduces the number of external connections by packing tightly connected nets together.
- [Marquardt A, Betz V, Rose J. Using cluster-based logic blocks and timing-driven packing to improve FPGA speed and density [C]. Proceedings of the International ACM Symposium on Field-Programmable Gate Arrays, Monterey, California, USA, 1999: 37-46.] also weighs how strongly nets are connected, and tries to pack the critical path inside logic blocks to reduce the critical path delay.
- [Li Peng et al. FPGA packing algorithm based on net absorption and port occupancy analysis. Journal of Computer-Aided Design & Computer Graphics, Vol. 23, No. 3, September 2011.] considers both net absorption and port occupancy, and thereby improves the routability of the subsequent circuit more effectively.

A connection packed inside a logic block has a much smaller delay than an external connection, so packing changes path delays substantially. None of the above approaches accounts for this. After packing, the critical path of the pipeline changes and the stage delays become unequal, yet the whole pipeline is still bound by a single critical path delay constraint. With the existing methods, the best critical path delay achievable after packing a sequential circuit is therefore set by the longest path delay among all pipeline stages.

Summary of the Invention

Existing methods have two shortcomings: they ignore the large change in circuit path delays caused by packing, and they leave the entire pipeline bound by a single critical path delay constraint. To address this, the invention proposes a register-retiming-based packing method for multi-pipeline-stage sequential circuits. The method exploits the differences among the path delays of the pipeline stages and uses register retiming to equalize the stage delays. This effectively reduces the common critical path delay of the pipelined sequential circuit and further improves system throughput.

To achieve this purpose, the technical solution of the invention is a register-retiming-based packing method for multi-pipeline-stage sequential circuits, comprising the following steps:

Step 1: Use the FPGA design flow to process the user's HDL design through the logic synthesis and mapping stages, producing a look-up table circuit;

Step 2: Pack the look-up table circuit with a packing algorithm;

Step 3: Determine the type of each packed pipeline-stage sequential circuit from the topology of its critical path, and compute its critical path delay with the post-packing critical path delay calculation method;

Step 4: Retime the intermediate registers in the circuit according to the critical path delay computed in Step 3.

In a pipeline-stage sequential circuit, the combinational logic lies between each pair of registers. There are three types of pipeline-stage sequential circuit:
- standard pipeline sequential circuits;
- branch pipeline sequential circuits, which branch at the register level;
- sequential circuits with an output port inside a pipeline stage.

The critical path delay of the packed sequential circuit is calculated as follows:

(1) Standard pipeline sequential circuit: compute the longest path delay of each pipeline stage between adjacent registers. Sum these longest path delays and divide the sum by the number of pipeline stages. The result is the critical path delay of this sequential circuit;

(2) Branch pipeline sequential circuit: use the standard-pipeline method to compute the critical path delay of each register chain that forms a pipeline. The largest of these values is the critical path delay of the whole sequential circuit;

(3) Sequential circuit with an output port inside a pipeline stage: compute two values and take the larger as the critical path delay of the whole sequential circuit.
- A: use the standard-pipeline method to compute the critical path delay of the pipeline with the output port disregarded.
- B: use the standard-pipeline method to compute the average delay of all pipeline stages up to the stage containing the intermediate output port. The end of that stage is placed at the output of the intermediate LUT that drives the port.

In Step 4, the retiming direction of each register is determined from the critical path delay as follows.
- Using the critical path delay computed in Step 3, divide the total path delay of the pipeline evenly among the pipeline stages.
- Mark the demarcation point of each stage in the circuit.
- Compare each inter-stage register with its demarcation point. If the demarcation point lies in front of the register (toward the circuit input), retime the register toward the circuit input until it reaches the demarcation point. If the demarcation point lies behind the register (toward the circuit output), retime the register toward the circuit output until it reaches the demarcation point.

Each register to be retimed is handled in two stages:
- The logic-block register retiming method decides whether the move is between logic blocks or within one logic block, and selects the target logic block.
- The post-packing BLE register retiming method moves the register into the corresponding basic logic element (BLE) inside the target logic block, toward the front or back end of the circuit.

Logic-block register retiming: the BLEs are ultimately packed into FPGA logic blocks. From the perspective of the logic blocks, register retiming therefore falls into two cases:

(1) Inter-block retiming: a register in a BLE of logic block I moves into a BLE of logic block II;

(2) Intra-block retiming: a register in one BLE of a logic block moves into another BLE of the same logic block.
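The two cases above depend only on whether the source and destination BLEs were packed into the same logic block. The sketch below assumes a hypothetical `BLESite` record that stores each BLE's block index and slot after packing. It is not part of the invention's description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BLESite:
    """Hypothetical post-packing location of a BLE."""
    block_id: int   # index of the logic block the BLE was packed into
    ble_index: int  # slot of the BLE inside that logic block


def classify_retiming(src: BLESite, dst: BLESite) -> str:
    """Classify a register move from BLE `src` to BLE `dst`.

    Returns "intra-block" when both BLEs sit in the same logic block and
    "inter-block" when the register crosses into a different logic block.
    """
    if src == dst:
        raise ValueError("source and destination BLE are the same; nothing to move")
    return "intra-block" if src.block_id == dst.block_id else "inter-block"


# Usage: slot 1 -> slot 3 inside block 0, then block 0 -> block 2.
print(classify_retiming(BLESite(0, 1), BLESite(0, 3)))  # intra-block
print(classify_retiming(BLESite(0, 1), BLESite(2, 0)))  # inter-block
```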

In Step 2, the packing algorithm packs the circuit into FPGA logic blocks as follows:

(a) Pack the LUT circuit netlist into BLEs, configuring the 2:1 multiplexer of each BLE according to the actual LUT circuit;

(b) Select one BLE as a seed and place it in the target FPGA logic block;

(c) Select further BLEs by attraction-function value and add them to the target logic block. Stop when its BLE slots are full or its external ports reach the physical limit of the logic block;

(d) Select a new seed BLE for the next, still-empty logic block and repeat until all BLEs are packed.

Step (a) of the packing algorithm covers three cases:
- A lone LUT: the BLE's register is left idle, and the multiplexer is configured to output the LUT output.
- A LUT output that drives a register: both the LUT and the register of the BLE are used, and the multiplexer is configured to output the register output.
- A LUT output that drives two paths, so that the LUT circuit has two outputs at once: because a BLE has only one output, the circuit must occupy two BLEs. In the first BLE, the LUT is used, the register is idle, and the multiplexer selects the LUT output. In the second BLE, the LUT is configured as a direct pass-through, the register is used, and the multiplexer selects the register output.
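The three BLE configurations of step (a) can be written as a small mapping. This is a minimal sketch. The `BLEConfig` record and the `pack_lut` function are hypothetical names, and only the cases described above are modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BLEConfig:
    lut: str         # "logic": LUT implements a function; "wire": LUT passes its input through
    register: bool   # True if the BLE's register (flip-flop) is used
    mux_select: str  # "lut": 2:1 mux outputs the LUT output; "reg": it outputs the register output


def pack_lut(drives_register: bool, also_drives_comb_path: bool) -> List[BLEConfig]:
    """Map one LUT (and the register it may drive) onto BLEs per the cases of step (a)."""
    if not drives_register:
        # Lone LUT: the register stays idle and the mux selects the LUT output.
        return [BLEConfig(lut="logic", register=False, mux_select="lut")]
    if not also_drives_comb_path:
        # LUT -> register: both resources are used and the mux selects the register output.
        return [BLEConfig(lut="logic", register=True, mux_select="reg")]
    # The LUT output feeds a register and a second path, but a BLE has only one
    # output, so two BLEs are needed.
    return [
        BLEConfig(lut="logic", register=False, mux_select="lut"),  # combinational output
        BLEConfig(lut="wire", register=True, mux_select="reg"),    # pass-through LUT, registered output
    ]
```

Each BLE exposes a single output through its 2:1 multiplexer, so any LUT that needs both a registered and an unregistered output is split across two BLEs.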

Post-packing BLE register retiming: the main resource of a pipeline-stage sequential circuit is the LUT circuit, so most BLE registers are idle. This makes post-packing register retiming feasible. Where two registers of a pipeline stage are directly connected, they can be retimed one after the other.

Register retiming falls into two cases by direction:

(1) Retiming a BLE register toward the circuit front end: the register in the second BLE needs to be retimed into the first BLE.
- If the register of the first BLE is idle, move the register of the second BLE into the first BLE. Configure the multiplexer of the first BLE to output the register output, and the multiplexer of the second BLE to output the LUT output.
- If the register of the first BLE is occupied, first retime that register further toward the front end. This frees the resource for the register coming from the back end;

(2) Retiming BLE registers toward the circuit back end: the registers in the first and second BLEs need to be retimed into the third BLE.
- If the register of the third BLE is idle, move the registers of the first and second BLEs into the third BLE. Configure the multiplexer of the third BLE to output the register output, and the multiplexers of the first and second BLEs to output their LUT outputs.
- If the register of the third BLE is occupied, first retime that register further toward the back end. This frees the resource for the registers coming from the front end.
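The "free the target first, then move" rule above can be sketched on a simplified model. In this model, a single chain of BLEs is represented only by whether each BLE's register is occupied. The model is an assumption made for illustration. It does not capture the fan-in merge of case (2), where two registers collapse into one, and it does not check the functional correctness that real retiming must preserve. The names `retime` and `mux_settings` are hypothetical.

```python
from typing import List


def retime(reg_used: List[bool], k: int, step: int) -> List[bool]:
    """Move the register of BLE k one BLE along the chain.

    step = -1 moves it toward the circuit front end (input); step = +1 moves it
    toward the back end (output). If the target BLE's register is occupied,
    that register is first pushed one BLE further in the same direction,
    recursively. Returns a new occupancy list and leaves the input unchanged.
    """
    if step not in (-1, 1):
        raise ValueError("step must be -1 (toward input) or +1 (toward output)")
    if not reg_used[k]:
        raise ValueError(f"BLE {k} holds no register to move")
    target = k + step
    if not 0 <= target < len(reg_used):
        raise ValueError("no BLE left in that direction to receive the register")
    occ = list(reg_used)
    if occ[target]:
        occ = retime(occ, target, step)  # free the target BLE first
    occ[target], occ[k] = True, False
    return occ


def mux_settings(reg_used: List[bool]) -> List[str]:
    """Each BLE's 2:1 mux outputs the register when it is used, else the LUT."""
    return ["reg" if used else "lut" for used in reg_used]


# Usage: BLE 2's register moves toward the input; BLE 1's register first moves to BLE 0.
after = retime([False, True, True], 2, -1)
print(after, mux_settings(after))  # [True, True, False] ['reg', 'reg', 'lut']
```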

Beneficial effects of the invention: registers packed into logic blocks can still be moved, and the invention uses this to reduce the critical path delay of the entire sequential-circuit pipeline. In a pipelined circuit, the critical path delay (CPD) is the delay of the longest stage among all stages of the pipeline. The CPD determines system throughput: the smaller the CPD, the higher the throughput. Register retiming distributes the stage delays as evenly as possible, which improves the throughput of pipelined applications. Higher throughput means the system processes more data per unit time.
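The following illustrates the throughput argument. The stage delays are hypothetical and not taken from the invention, and register setup and clock-to-output overheads are ignored. A pipeline clocked at its CPD completes one result per cycle, so

$$\mathrm{CPD} = \max_{k} D_k, \qquad \text{throughput} = \frac{1}{\mathrm{CPD}}$$

where $D_k$ is the delay of stage $k$. Take two stages of 6 ns and 2 ns. The CPD is 6 ns and the throughput is about 166.7 million results per second. Retiming the middle register so that both stages take 4 ns lowers the CPD to 4 ns, giving 250 million results per second. That is a 1.5× gain for the same total delay of 8 ns.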

Description of Drawings

The drawings used in the description of the embodiments and the prior art are briefly introduced below to explain them more clearly. The drawings described below show only some embodiments of the invention. A person of ordinary skill in the art can derive other drawings from them without creative effort.

Figure 1 shows the FPGA design flow.

Figure 2 is a flowchart of the invention.

Figure 3 shows the structure of an FPGA logic block: (a) the FPGA, (b) a logic block, (c) a basic logic element.

Figure 4 shows examples of packing a LUT circuit into BLEs: (a) a lone LUT, (b) a LUT whose output drives a register, (c) a LUT whose output drives two paths.

Figure 5 shows a standard pipeline LUT circuit.

Figure 6 shows a branch pipeline LUT circuit.

Figure 7 shows a LUT circuit with an output port inside a pipeline stage.

Figure 8 is a schematic of retiming a BLE register of the packed LUT circuit toward the circuit front end according to the invention.

Figure 9 is a schematic of retiming a BLE register of the packed LUT circuit toward the circuit back end according to the invention.

Figure 10 is a schematic of logic-block register retiming according to the invention.

Figure 11 is a flowchart for computing the critical path delay (CPD) and timing slack of a single pipeline stage.

Figure 12 is a schematic of determining the retiming direction of inter-stage registers.

Detailed Description of the Embodiments

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.

As shown in Figure 2, the register-retiming-based packing method for multi-pipeline-stage sequential circuits comprises the following steps:

Step 1: Use the FPGA design flow to process the user's HDL design through the logic synthesis and mapping stages, producing a look-up table circuit.

As shown in Figure 1, the FPGA design flow processes the user's HDL design into a LUT circuit:
- logic synthesis turns the HDL into a gate-level netlist (a NAND-gate netlist);
- mapping turns the gate-level netlist into a look-up table (LUT) circuit;
- the LUT circuit is checked to confirm that it meets the requirements of a multi-pipeline-stage sequential circuit.

Step 2: Pack the look-up table circuit with a packing algorithm.

The FPGA logic block structure, shown in Figure 3, has two levels: logic blocks at the first level and basic logic elements (BLEs) at the second.
- The FPGA consists of logic blocks, as shown in Figure 3(a).
- Each logic block consists of several BLEs that share routing resources, as shown in Figure 3(b).
- A single BLE contains one LUT, one register and one 2:1 multiplexer, as shown in Figure 3(c).

The LUT and register of a BLE are normally configured according to the specific circuit.

The conventional steps for packing into FPGA logic blocks are as follows:

(1) Pack the look-up table (LUT) circuit netlist into BLEs

When the LUT circuit is packed into BLEs, the 2:1 multiplexer in each BLE is configured according to the actual circuit.

The configurations are:
- As shown in Figure 4(a), when a lone LUT is packed into a BLE, the BLE's register is left idle and the multiplexer is configured to output the LUT output.
- As shown in Figure 4(b), when a LUT output drives a register, both the LUT and the register of the BLE are used, and the multiplexer is configured to output the register output.
- As shown in Figure 4(c), when a LUT output drives two paths, the LUT circuit has two outputs at once. A BLE has only one output, so the circuit must occupy two BLEs. In BLE1, the LUT is used, the register is idle, and the multiplexer selects the LUT output. In BLE2, the LUT is configured as a direct pass-through, the register is used, and the multiplexer selects the register output.

(2) Select one BLE as a seed and place it in the target FPGA logic block;

(3) Select further BLEs by attraction-function value and add them to the target FPGA logic block;

The attraction function is computed with the formula proposed in the document cited in the Background [Li Peng et al. FPGA packing algorithm based on net absorption and port occupancy analysis. Journal of Computer-Aided Design & Computer Graphics, Vol. 23, No. 3, September 2011.]. BLEs keep being added until the logic block's BLE slots are full or its external ports reach the physical limit of the FPGA logic block;

(4) Select a new seed BLE for the next, still-empty logic block and repeat until all BLEs are packed.
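The seed-and-attract loop of steps (2) to (4) can be sketched as a greedy clustering routine. This is only an assumption-laden illustration.
- The cited net-absorption and port-occupancy attraction formula is not reproduced. A placeholder counts the nets a BLE shares with the current block.
- The external-port limit is approximated by a cap on the number of distinct nets the block touches.
- The seed is the unpacked BLE with the most nets.
- All function names are hypothetical.

```python
from typing import Dict, List, Set


def attraction(ble: str, cluster_nets: Set[str], ble_nets: Dict[str, Set[str]]) -> int:
    # Placeholder attraction: number of nets the BLE shares with the block.
    # The cited net-absorption / port-occupancy formula is NOT reproduced here.
    return len(ble_nets[ble] & cluster_nets)


def greedy_pack(ble_nets: Dict[str, Set[str]], max_bles: int, max_nets: int) -> List[List[str]]:
    """Seed-based greedy packing of BLEs into logic blocks.

    max_bles: BLE slots per logic block.
    max_nets: crude stand-in for the block's external-port limit, applied to
              the number of distinct nets the block touches.
    """
    unpacked = set(ble_nets)
    clusters: List[List[str]] = []
    while unpacked:
        # Seed: the unpacked BLE with the most nets (ties broken by name).
        seed = max(sorted(unpacked), key=lambda b: len(ble_nets[b]))
        cluster, nets = [seed], set(ble_nets[seed])
        unpacked.remove(seed)
        while len(cluster) < max_bles:
            # Only BLEs that keep the block within its port budget are eligible.
            fits = [b for b in sorted(unpacked) if len(nets | ble_nets[b]) <= max_nets]
            if not fits:
                break
            best = max(fits, key=lambda b: attraction(b, nets, ble_nets))
            cluster.append(best)
            nets |= ble_nets[best]
            unpacked.remove(best)
        clusters.append(cluster)
    return clusters


# Usage: four BLEs, two BLE slots and at most four nets per block.
nets = {"A": {"n1", "n2"}, "B": {"n2", "n3"}, "C": {"n4"}, "D": {"n4", "n5"}}
print(greedy_pack(nets, max_bles=2, max_nets=4))  # [['A', 'B'], ['D', 'C']]
```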

Step 3: Determine the type of each packed pipeline-stage sequential circuit from the topology of its critical path, and compute its critical path delay with the post-packing critical path delay calculation method.

Computing the critical path delay (CPD) and timing slack of a single pipeline stage:

The CPD and timing slack of a single pipeline stage are obtained with the method shown in Figure 11, as follows:

(a) Compute the delay of each edge between the registers;

(b) Set T_arrival of the stage's input register nodes to 0;

(c) Compute T_arrival for every other node:

$$T_{\mathrm{arrival}}(j) = \max_{i \in \mathrm{fanin}(j)} \left\{ T_{\mathrm{arrival}}(i) + \mathrm{delay}(i,j) \right\}$$

Here i and j are the start and end nodes of any connection in the pipeline, and T_arrival(i) and T_arrival(j) are the signal arrival times at nodes i and j. fanin(j) denotes any node driving node j, and delay(i, j) is the delay of connection (i, j);

(d) Set T_required of every output register of the pipeline stage to the critical path delay, i.e. the largest T_arrival among the stage's output registers:

$$T_{\mathrm{required}}(\mathrm{register}_{\mathrm{out}}) = \mathrm{CPD}$$

Here register_out is the register of any output port;

(e) Compute T_required for every other node:

$$T_{\mathrm{required}}(i) = \min_{j \in \mathrm{fanout}(i)} \left\{ T_{\mathrm{required}}(j) - \mathrm{delay}(i,j) \right\}$$

Here fanout(i) denotes any node driven by node i, and T_required(i) and T_required(j) are the latest allowed signal arrival times at nodes i and j;

(f) Compute the timing slack of any connection in the circuit:

$$\mathrm{slack}(i,j) = T_{\mathrm{required}}(j) - T_{\mathrm{arrival}}(i) - \mathrm{delay}(i,j)$$
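Steps (a) to (f) are a forward pass and a backward pass over the stage's timing graph, the usual arrival/required/slack form of static timing analysis. The sketch below makes several assumptions.
- The stage is given as a DAG of edge delays keyed by hypothetical node names.
- The output-register CPD is taken as the largest arrival time.
- Nodes with no fan-in are treated as inputs, and nodes with no fan-out are treated as outputs.
- It uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+).

```python
from collections import defaultdict
from graphlib import TopologicalSorter
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def stage_timing(delays: Dict[Edge, float], inputs: Iterable[str], outputs: Iterable[str]):
    """Steps (a)-(f) for one pipeline stage. Returns (cpd, arrival, required, slack)."""
    inputs, outputs = set(inputs), set(outputs)
    preds, succs = defaultdict(list), defaultdict(list)
    for (i, j) in delays:
        preds[j].append(i)
        succs[i].append(j)
    nodes = set(preds) | set(succs)
    # graphlib expects {node: predecessors}; static_order() yields sources first.
    order = list(TopologicalSorter({n: preds[n] for n in nodes}).static_order())

    arrival: Dict[str, float] = {}
    for n in order:  # forward pass: steps (b) and (c)
        if n in inputs or not preds[n]:
            arrival[n] = 0.0
        else:
            arrival[n] = max(arrival[i] + delays[(i, n)] for i in preds[n])

    cpd = max(arrival[r] for r in outputs)  # largest arrival at an output register

    required: Dict[str, float] = {}
    for n in reversed(order):  # backward pass: steps (d) and (e)
        if n in outputs or not succs[n]:
            required[n] = cpd
        else:
            required[n] = min(required[j] - delays[(n, j)] for j in succs[n])

    # Step (f): slack of every connection.
    slack = {(i, j): required[j] - arrival[i] - d for (i, j), d in delays.items()}
    return cpd, arrival, required, slack


# Usage: register ra -> LUTs l1, l2 -> register rb, with a shortcut ra -> l2.
edges = {("ra", "l1"): 1.0, ("l1", "l2"): 2.0, ("ra", "l2"): 1.0, ("l2", "rb"): 1.0}
cpd, arrival, required, slack = stage_timing(edges, ["ra"], ["rb"])
print(cpd, slack[("ra", "l2")])  # 4.0 2.0
```

Critical connections end up with zero slack. The shortcut ra → l2 has 2.0 units of slack because the path through l1 dominates.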

Computing the critical path delay (CPD) of the pipelined circuit:

In a pipeline-stage circuit, the combinational logic lies between the registers. Pipeline-stage sequential circuits comprise standard pipeline sequential circuits, branch pipeline sequential circuits, and sequential circuits with an output port inside a pipeline stage. By the topology of their paths, pipeline stages fall into the following types:

(1) Standard pipeline type

As shown in Figure 5, registers a, b and c divide the circuit into two consecutive pipeline stages. The critical path delay of this sequential circuit is computed as follows:
- Use the single-stage CPD and slack procedure to compute the longest path delay from register a to b and from register b to c.
- Add the two delays and divide by the number of stages, 2.

Retiming register b then lets both stages meet this critical path delay. The standard-pipeline calculation is the basis for computing the critical path delay of the other circuit types.
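The standard-pipeline rule is the average of the per-stage longest path delays. The Figure 5 delays below are hypothetical. The average is an ideal target, because in practice a register can only sit at a LUT boundary, so the achievable balance depends on where those boundaries fall.

```python
from typing import Sequence


def standard_pipeline_cpd(stage_delays: Sequence[float]) -> float:
    """Target CPD of a standard pipeline: the sum of the per-stage longest
    path delays divided by the number of stages."""
    if not stage_delays:
        raise ValueError("need at least one pipeline stage")
    return sum(stage_delays) / len(stage_delays)


# Hypothetical Figure 5 numbers: a->b takes 6 ns, b->c takes 2 ns.
stages = [6.0, 2.0]
print(max(stages))                    # 6.0  CPD before retiming (slowest stage)
print(standard_pipeline_cpd(stages))  # 4.0  target CPD after retiming register b
```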

(2) Branch pipeline type

A branch pipeline sequential circuit branches at the register level. As shown in Figure 6, registers a, c, e, f and registers b, d, e, f form two pipelines. They start at registers a and b respectively and both end at register f. Compute the critical path delay of each of the two pipelines with the standard-pipeline method of Figure 5, and take the larger value as the critical path delay of the whole sequential circuit of Figure 6. Retiming registers c, d and e then lets every stage on both branches meet this critical path delay.
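For the branch type, each register chain is averaged separately and the maximum is kept. The stage delays below are hypothetical. The shared stage e → f carries the same value in both chains because both branches pass through registers e and f.

```python
from typing import Sequence


def standard_pipeline_cpd(stage_delays: Sequence[float]) -> float:
    """Sum of per-stage longest path delays divided by the number of stages."""
    return sum(stage_delays) / len(stage_delays)


def branch_pipeline_cpd(chains: Sequence[Sequence[float]]) -> float:
    """CPD of a branch pipeline: the largest standard-pipeline CPD among the
    register chains (each chain listed from its start register to its end)."""
    return max(standard_pipeline_cpd(chain) for chain in chains)


# Hypothetical Figure 6 delays in ns; the shared stage e->f (4 ns) is in both chains.
chain_acef = [3.0, 5.0, 4.0]  # a->c, c->e, e->f
chain_bdef = [2.0, 3.0, 4.0]  # b->d, d->e, e->f
print(branch_pipeline_cpd([chain_acef, chain_bdef]))  # 4.0
```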

(3) Output port inside a pipeline stage

In this type of circuit, an output port lies inside a pipeline stage. As shown in Figure 7, within the stage between registers b and c, a LUT output drives two paths: one to register c at the end of the stage and one to an output port. Because of this output port, the register behind it cannot be retimed toward the circuit front end past that point. The critical path delay of this sequential circuit is therefore computed in the following steps:

- Compute the critical path delay A of the pipeline formed by registers a, b, c and d with the standard-pipeline method of Figure 5.
- Compute the average delay B of all pipeline stages up to the stage containing the intermediate output port, with the end of that stage placed at the output of the intermediate LUT.
- Take the larger of A and B as the critical path delay of the whole sequential circuit of Figure 7.

Retiming registers b and c then lets every stage of the circuit meet this critical path delay.
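The A/B rule for a stage with an internal output port is sketched below. All names and delay values are hypothetical. The parameter `delay_to_port` is the delay from the stage's start register to the output of the LUT that drives the port, which is the truncated end of that stage used for B.

```python
from typing import Sequence


def average(delays: Sequence[float]) -> float:
    return sum(delays) / len(delays)


def output_port_cpd(stage_delays: Sequence[float], port_stage: int, delay_to_port: float) -> float:
    """CPD of a pipeline whose stage `port_stage` contains an output port.

    stage_delays:  per-stage longest path delays of the full chain (e.g. a->b, b->c, c->d).
    port_stage:    index of the stage that contains the intermediate output port.
    delay_to_port: delay from that stage's start register to the LUT driving the port.
    """
    if delay_to_port > stage_delays[port_stage]:
        raise ValueError("the port cannot lie beyond the end of its stage")
    a = average(stage_delays)                                        # A: whole chain
    b = average(list(stage_delays[:port_stage]) + [delay_to_port])  # B: stages up to the port
    return max(a, b)


# Hypothetical Figure 7 delays: a->b 2 ns, b->c 6 ns (port 5 ns after b), c->d 1 ns.
print(output_port_cpd([2.0, 6.0, 1.0], port_stage=1, delay_to_port=5.0))  # 3.5
```

In this example A is 3.0 ns. The port fixes B at 3.5 ns, so the output port, not the overall average, limits the achievable CPD.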

Steps for determining the register retiming direction from the critical path delay:

Using the critical path delay calculated in Step 3, distribute the path delay evenly across the pipeline stages of the entire pipelined circuit. Then mark the boundary point of each pipeline stage in the circuit. As shown in Figure 12, the stage boundary points computed from the critical path delay are A and B.
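One way to mark the boundary points is sketched below. The pipelined path is modeled as a chain of LUT delays, and boundary k is placed at the cut between LUTs whose delay from the circuit input is closest to k times the critical path delay. The "closest cut" rule, the function name, and the numbers are assumptions for illustration. The sketch also does not force the cuts to be distinct.

```python
from itertools import accumulate
from typing import List, Sequence


def stage_boundaries(lut_delays: Sequence[float], critical_delay: float,
                     num_stages: int) -> List[int]:
    """Mark the boundary points that split a LUT chain into num_stages stages.

    Cut position p lies between lut_delays[p - 1] and lut_delays[p]; boundary k
    is the cut whose delay from the circuit input is closest to k * critical_delay.
    """
    if len(lut_delays) < num_stages:
        raise ValueError("need at least one LUT per stage")
    cum = [0.0, *accumulate(lut_delays)]  # cum[p]: delay from the input to cut p
    bounds = []
    for k in range(1, num_stages):
        goal = k * critical_delay
        cut = min(range(1, len(lut_delays)), key=lambda p: abs(cum[p] - goal))
        bounds.append(cut)
    return bounds


# Hypothetical chain of five LUTs (ns) split into three stages with a 4 ns target.
print(stage_boundaries([2.0, 2.0, 2.0, 3.0, 3.0], critical_delay=4.0, num_stages=3))
```

Register positions are discrete, so an exact even split is not always possible. Here the cuts at positions 2 and 4 give stage delays of 4, 5, and 3 ns.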

Each register between pipeline stages is then compared against the boundary points to determine its retiming direction. If a boundary point lies in front of the corresponding register (on its input side), the register is retimed toward the circuit input until it reaches the boundary point. For example, in Figure 12, boundary point A lies in front of register b, so register b is retimed toward the circuit input until it reaches A. If a boundary point lies behind the corresponding register (on its output side), the register is retimed toward the circuit output until it reaches the boundary point. For example, in Figure 12, boundary point B lies behind register c, so register c is retimed toward the circuit output until it reaches B.
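The direction rule reduces to a comparison of two positions on the chain, where positions grow from the circuit input toward the circuit output. The sketch below also reports how many LUTs the register has to cross. The positions used for registers b and c are hypothetical.

```python
from typing import Tuple


def retiming_direction(register_cut: int, boundary_cut: int) -> Tuple[str, int]:
    """Compare a register's current cut position with its stage boundary.

    Positions grow from the circuit input toward the circuit output. Returns the
    direction and the number of LUTs the register has to cross.
    """
    if boundary_cut < register_cut:      # boundary lies in front of the register
        return "toward input", register_cut - boundary_cut
    if boundary_cut > register_cut:      # boundary lies behind the register
        return "toward output", boundary_cut - register_cut
    return "stay", 0


# Hypothetical Figure 12 positions: register b sits at cut 3 with boundary A at
# cut 2; register c sits at cut 4 with boundary B at cut 5.
print(retiming_direction(3, 2))
print(retiming_direction(4, 5))
```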

Post-packing BLE register retiming: The main resource used by a pipelined sequential circuit is the LUT, so most register resources in the BLEs are idle. This makes register retiming after packing possible. In practical circuits, two registers of a pipelined circuit are rarely connected directly. If they are, the registers can be retimed one after another, so post-packing register retiming can still be carried out.

Post-packing BLE register retiming falls into two types, according to the retiming direction:

(1) BLE register retimed toward the circuit front end:

As shown in Figure 8, the register in BLE2 needs to be retimed into BLE1. If the register resource inside BLE1 is idle, the register in BLE2 can be retimed into BLE1. The 2:1 multiplexer in BLE1 is then configured to select the register output, and the 2:1 multiplexer in BLE2 is configured to select the lookup table output. If the register resource inside BLE1 is occupied, that register can first be retimed toward the circuit front end. This frees the resource for the register coming from the circuit back end.
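The front-end move can be modeled with a toy BLE that has one register slot and a flag for the 2:1 multiplexer. The `BLE` class and `retime_backward` are hypothetical names. The sketch assumes a linear chain in which each BLE's LUT is fed only by the previous BLE, and that previous BLE has no other fan-out. In the general case, retiming backward across a multi-input LUT needs a register on every LUT input.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BLE:
    """Hypothetical BLE model: one LUT, one register slot, and a 2:1 output mux."""
    name: str
    reg: Optional[str] = None          # register occupying the slot, if any
    mux_selects_reg: bool = False      # True: BLE output = register, False: LUT

    def take(self, reg: str) -> None:
        self.reg, self.mux_selects_reg = reg, True

    def release(self) -> Optional[str]:
        reg, self.reg, self.mux_selects_reg = self.reg, None, False
        return reg


def retime_backward(chain: List[BLE], i: int) -> None:
    """Move the register of chain[i] into chain[i - 1] (one BLE closer to the input).

    If the register slot of chain[i - 1] is occupied, that register is first
    pushed further toward the input, as the text describes.
    """
    if i < 1:
        raise ValueError(f"{chain[i].name} has no BLE on its input side")
    if chain[i].reg is None:
        raise ValueError(f"{chain[i].name} holds no register")
    if chain[i - 1].reg is not None:
        retime_backward(chain, i - 1)
    chain[i - 1].take(chain[i].release())


chain = [BLE("BLE1"), BLE("BLE2", reg="r2", mux_selects_reg=True)]
retime_backward(chain, 1)
print(chain)
```

After the call, BLE1 holds r2 and its multiplexer selects the register output, while BLE2's multiplexer falls back to the LUT output.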

(2) BLE register retimed toward the circuit back end:

As shown in Figure 9, the registers in BLE1 and BLE2 need to be retimed into BLE3. If the register resource inside BLE3 is idle, the registers in BLE1 and BLE2 can be retimed into BLE3. The 2:1 multiplexer in BLE3 is then configured to select the register output, and the 2:1 multiplexers in BLE1 and BLE2 are configured to select their lookup table outputs. If the register resource inside BLE3 is occupied, that register can first be retimed toward the circuit back end. This frees the resource for the registers coming from the circuit front end.
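The back-end move can be sketched with the same kind of toy BLE. Forward retiming across a LUT is only valid when every LUT input is registered. Those input registers are removed, and a single register is placed on the LUT output. The sketch assumes that BLE1 and BLE2 feed only BLE3, and it ignores register initial values. The class, the function, and the "+"-joined register name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BLE:
    """Hypothetical BLE model: one LUT, one register slot, and a 2:1 output mux."""
    name: str
    reg: Optional[str] = None
    mux_selects_reg: bool = False

    def take(self, reg: str) -> None:
        self.reg, self.mux_selects_reg = reg, True

    def release(self) -> Optional[str]:
        reg, self.reg, self.mux_selects_reg = self.reg, None, False
        return reg


def retime_forward(sources: List[BLE], dst: BLE) -> bool:
    """Move the registers of all BLEs feeding dst's LUT into dst.

    Returns False when dst's register slot is occupied; that register must
    then first be pushed further toward the circuit output.
    """
    if not sources or any(s.reg is None for s in sources):
        raise ValueError("every LUT input of dst must be registered")
    if dst.reg is not None:
        return False
    merged = "+".join(s.release() for s in sources)  # hypothetical naming
    dst.take(merged)
    return True


ble1 = BLE("BLE1", "r1", True)
ble2 = BLE("BLE2", "r2", True)
ble3 = BLE("BLE3")
print(retime_forward([ble1, ble2], ble3), ble3)
```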

Logic unit block register retiming:

Basic logic units (BLEs) are ultimately packed into the logic unit blocks of the FPGA. Viewed at the logic unit block level, register retiming therefore falls into two cases.

(1) Retiming between logic unit blocks: As shown in Figure 10, moving the register in BLE C of logic unit block A into BLE D of logic unit block B is register retiming between logic unit blocks. The resulting changes inside the BLEs are as described under post-packing BLE register retiming.

(2) Retiming within a logic unit block: As shown in Figure 10, moving the register in BLE A of logic unit block B into BLE B of the same logic unit block B is register retiming within a logic unit block. The resulting changes inside the BLEs are as described under post-packing BLE register retiming.
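Telling the two cases apart only requires knowing which logic unit block each BLE was packed into. This is the first step of the flow stated in claim 4, before the BLE-level move is applied. The sketch below uses the Figure 10 placement described above. The function name is hypothetical.

```python
from typing import Dict


def classify_retiming(src_ble: str, dst_ble: str, block_of: Dict[str, str]) -> str:
    """Return whether moving a register between two BLEs crosses a logic unit block."""
    if block_of[src_ble] == block_of[dst_ble]:
        return "intra-block"
    return "inter-block"


# Placement from Figure 10 (BLE name -> logic unit block).
block_of = {"BLE C": "block A", "BLE D": "block B", "BLE A": "block B", "BLE B": "block B"}
print(classify_retiming("BLE C", "BLE D", block_of))
print(classify_retiming("BLE A", "BLE B", block_of))
```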

The above descriptions are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A multi-pipeline sequential circuit packing operation method based on register retiming, characterized in that it comprises the following steps:

Step 1: Using the FPGA design flow, pass the hardware description language design written by the user through the logic synthesis and mapping stages to generate a look-up table circuit;

Step 2: Use the packing algorithm to pack the look-up table circuit;

Step 3: Determine the type of the packed pipelined sequential circuit according to the direction of its critical path, and calculate the critical path delay of the packed pipelined sequential circuit using the post-packing critical path delay calculation method;

The critical path delay calculation method of the sequential circuit after packing is as follows:

(1) Standard pipeline sequential circuit: calculate the longest path delay of each pipeline stage between adjacent registers, sum these longest path delays, and divide the sum by the number of pipeline stages to obtain the critical path delay of the pipelined sequential circuit;

(2) Branch pipeline sequential circuit: use the standard pipeline method to calculate the critical path delay of each pipeline formed by the registers, and take the largest value as the critical path delay of the entire pipelined sequential circuit;

(3) Sequential circuit with an output port inside a pipeline stage: use the standard pipeline method to calculate the critical path delay A of each pipeline without an output port; use the standard pipeline method to calculate the average delay B of all pipeline stages up to the stage containing the intermediate output port, with the end of that stage placed at the output of the intermediate lookup table; take the larger of A and B as the critical path delay of the entire pipelined sequential circuit;

Step 4: Retime the intermediate registers in the circuit according to the critical path delay calculated in Step 3;

The post-packing BLE register retiming method is as follows: since the main resource used by a pipelined sequential circuit is the look-up table circuit, most register resources in the BLEs are idle, which makes post-packing register retiming possible; where two registers of the pipelined circuit are directly connected, the registers can be retimed one after another;

According to the register retiming direction, this method is divided into the following two types:

(1) BLE register retimed toward the circuit front end: the register in the second BLE needs to be retimed into the first BLE. If the register resource inside the first BLE is idle, the register in the second BLE can be retimed into the first BLE, with the 2:1 multiplexer in the first BLE configured to select the register output and the 2:1 multiplexer in the second BLE configured to select the lookup table output. If the register resource inside the first BLE is occupied, that register can first be retimed toward the circuit front end to free the resource for the register retimed from the circuit back end;

(2) BLE register retimed toward the circuit back end: the registers in the first BLE and the second BLE need to be retimed into the third BLE. If the register resource inside the third BLE is idle, the registers in the first and second BLEs can be retimed into the third BLE, with the 2:1 multiplexer in the third BLE configured to select the register output and the 2:1 multiplexers in the first and second BLEs configured to select the lookup table outputs. If the register resource inside the third BLE is occupied, that register can first be retimed toward the circuit back end to free the resource for the registers retimed from the circuit front end.

2. The multi-pipeline sequential circuit packing operation method based on register retiming according to claim 1, characterized in that combinational circuits are located between the registers of the pipelined sequential circuit; the pipelined sequential circuits include standard pipeline sequential circuits, branch pipeline sequential circuits, and sequential circuits with an output port inside a pipeline stage; a branch pipeline sequential circuit has branches at the register level, and a circuit of the output-port type has an output port inside a pipeline stage.

3. The multi-pipeline sequential circuit packing operation method based on register retiming according to claims 1 and 2, characterized in that, in Step 4, the register retiming direction is determined from the critical path delay as follows: according to the critical path delay calculated in Step 3, the path delay is evenly distributed across the pipeline stages of the entire pipelined circuit, and the boundary point of each pipeline stage is marked in the circuit; the registers between pipeline stages are then compared against the boundary points to determine their retiming direction: if a boundary point lies in front of the corresponding register, the register is retimed toward the circuit input until it reaches the boundary point; if a boundary point lies behind the corresponding register, the register is retimed toward the circuit output until it reaches the boundary point.

4. The multi-pipeline sequential circuit packing operation method based on register retiming according to claim 3, characterized in that, for each register that needs retiming, the logic unit block register retiming method is first used to determine whether the retiming is between logic unit blocks or within a logic unit block, and the target logic unit block is selected; the post-packing BLE register retiming method is then used to retime the register, toward the circuit front end or back end, into the corresponding basic logic unit inside the target logic unit block.

5. The multi-pipeline sequential circuit packing operation method based on register retiming according to claim 4, characterized in that the logic unit block register retiming method is as follows: since basic logic units are ultimately packed into the logic unit blocks of the FPGA, register retiming falls into two cases from the perspective of the logic unit block:

(1) Retiming between logic unit blocks: a register in a BLE of logic unit block I is moved into a BLE of logic unit block II;

(2) Retiming within a logic unit block: a register in one BLE of a logic unit block is moved into another BLE of the same logic unit block.

6. The multi-pipeline sequential circuit packing operation method based on register retiming according to claim 1 or 5, characterized in that the packing algorithm in Step 2 packs FPGA logic unit blocks as follows:

(a) Pack the lookup table circuit netlist into BLEs, configuring the 2:1 multiplexer in each BLE according to the actual lookup table circuit;

(b) Select a BLE as a seed to load into the target FPGA logic unit block;

(c) Select BLEs according to the attraction function value and continue loading them into the target FPGA logic unit block until the BLE resources inside the logic unit block are full or the external ports reach the physical limit of the FPGA logic unit block;

(d) Select a new seed BLE and continue packing into a new logic unit block until all BLEs are packed.
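Steps (b) to (d) describe seed-based greedy clustering. The claim does not define the attraction function or the seed rule. This sketch therefore assumes a simple attraction, namely the number of nets a candidate BLE shares with the cluster. It picks the seed as the unpacked BLE with the most inputs, and it checks only the input-pin limit of the block. All names and the small netlist are hypothetical.

```python
from typing import Dict, List, Set


def cluster_inputs(cluster: List[str], inputs: Dict[str, Set[str]],
                   output: Dict[str, str]) -> Set[str]:
    """Nets the cluster needs from outside: BLE inputs not produced inside it."""
    produced = {output[b] for b in cluster}
    return set().union(*(inputs[b] for b in cluster)) - produced


def attraction(ble: str, cluster: List[str], inputs: Dict[str, Set[str]],
               output: Dict[str, str]) -> int:
    """Hypothetical attraction: number of nets ble shares with the cluster."""
    cluster_nets = set().union(*(inputs[b] | {output[b]} for b in cluster))
    return len((inputs[ble] | {output[ble]}) & cluster_nets)


def pack(inputs: Dict[str, Set[str]], output: Dict[str, str],
         capacity: int, max_inputs: int) -> List[List[str]]:
    unpacked = list(inputs)
    clusters = []
    while unpacked:
        # (b)/(d): seed = unpacked BLE with the most inputs (assumed rule)
        seed = max(unpacked, key=lambda b: len(inputs[b]))
        unpacked.remove(seed)
        cluster = [seed]
        # (c): add the most attractive BLE that still fits the input-pin limit
        while len(cluster) < capacity:
            fits = [b for b in unpacked
                    if len(cluster_inputs(cluster + [b], inputs, output)) <= max_inputs]
            if not fits:
                break
            best = max(fits, key=lambda b: attraction(b, cluster, inputs, output))
            unpacked.remove(best)
            cluster.append(best)
        clusters.append(cluster)
    return clusters


inputs = {"b1": {"x", "y"}, "b2": {"n1", "z"}, "b3": {"p", "q"}}
output = {"b1": "n1", "b2": "n2", "b3": "n3"}
print(pack(inputs, output, capacity=2, max_inputs=4))
```

BLE b2 consumes net n1 produced by b1, so it is the most attractive candidate and joins b1's block. BLE b3 then seeds a second block.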

7. The multi-pipeline sequential circuit packing operation method based on register retiming according to claim 6, characterized in that, in step (a) of the packing algorithm: a LUT on its own is packed into a BLE whose register resource is idle, with the 2:1 multiplexer configured to select the LUT output; a LUT whose output drives a register is packed into a BLE that uses both the LUT and register resources, with the 2:1 multiplexer configured to select the register output; if one LUT output drives two paths, that is, the lookup table circuit has two outputs at the same time, then, because a BLE can provide only one output, the circuit must be packed into two BLEs: in one BLE, the LUT resource is configured, the register resource is idle, and the 2:1 multiplexer selects the LUT output; in the other BLE, the LUT resource is configured as a direct wire connection, the register resource is used, and the 2:1 multiplexer selects the register output.
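The three BLE configuration cases of claim 7 can be sketched as a small decision function. Each LUT is described by two flags: whether its output drives a register, and whether it also drives another path such as an output port. The `BLEConfig` record and the string labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BLEConfig:
    lut: str             # "logic" or "wire" (LUT used as a direct connection)
    register_used: bool
    mux: str             # 2:1 multiplexer selects "LUT" or "REG"


def pack_lut(drives_register: bool, drives_other: bool) -> List[BLEConfig]:
    """Return the BLE configuration(s) for one LUT, following claim 7."""
    if drives_register and drives_other:
        # Two outputs are needed but a BLE has one: split across two BLEs.
        return [BLEConfig("logic", False, "LUT"), BLEConfig("wire", True, "REG")]
    if drives_register:
        return [BLEConfig("logic", True, "REG")]
    return [BLEConfig("logic", False, "LUT")]


print(pack_lut(drives_register=True, drives_other=True))
```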