CN115422876A

CN115422876A - High-Level Synthesis Process Layout Method

Info

Publication number: CN115422876A
Application number: CN202211039147.2A
Authority: CN
Inventors: 王自鑫; 张仕杰; 陈弟虎; 何晓曦
Original assignee: Sun Yat Sen University
Current assignee: Sun Yat Sen University
Priority date: 2022-08-29
Filing date: 2022-08-29
Publication date: 2022-12-02
Also published as: WO2024045435A1

Abstract

The invention provides a flow layout method for high-level synthesis, comprising the following steps: obtaining a circuit description of a target circuit and constructing, from the circuit description, a control data flow graph corresponding to the target circuit, the control data flow graph being a directed graph representing the operations of the target circuit; partitioning the control data flow graph with a floorplanning algorithm to obtain layout constraints; scheduling the control data flow graph and binding the scheduling result to obtain a register-transfer-level description; and obtaining a target netlist from the register-transfer-level description under the layout constraints, and determining the flow layout of the target circuit from the target netlist. By acting at the high-level synthesis stage, the method reduces placement-and-routing congestion and also reduces the added delay caused by crossing FPGA block boundaries during routing. The method is widely applicable in the technical field of circuit simulation.

Description

High-Level Synthesis Process Layout Method

Technical Field

The invention relates to the technical field of circuit simulation, and in particular to a flow layout method for high-level synthesis.

Background

High-level synthesis (HLS) is the process of automatically converting a logic structure described in a high-level language into a circuit model described in a language at a lower level of abstraction. HLS tools are fast and efficient: they shorten design time for hardware engineers and also allow software engineers to complete hardware designs.

However, in the related art, a major reason for the quality gap between HLS designs and hand-written designs is that interconnect delay is difficult to estimate accurately at the HLS level, so a good global physical layout is hard to obtain. In field-programmable gate array (FPGA) placement and routing in particular, the physical synthesizer tends to use resources that are close to one another, which causes placement-and-routing congestion, increases the total wire length, and in turn lowers the throughput of the circuit. In addition, FPGA resources are arranged as configurable logic blocks (CLBs), and the resources available in different Blocks may differ; interconnect that crosses Block boundaries without regard for them tends to incur longer wires and larger delays.

Summary of the Invention

In view of this, and to at least partially solve one of the above technical problems or defects, an object of the embodiments of the present invention is to provide a flow layout method for high-level synthesis that can effectively reduce placement-and-routing congestion and lower delay.

The technical solution of this application provides a flow layout method for high-level synthesis, comprising the following steps:

obtaining a circuit description of a target circuit, and constructing, from the circuit description, a control data flow graph corresponding to the target circuit, the control data flow graph being a directed graph representing the operations of the target circuit;

partitioning the control data flow graph with a floorplanning algorithm to obtain layout constraints;

scheduling the control data flow graph, and binding the scheduling result to obtain a register-transfer-level description;

obtaining a target netlist from the register-transfer-level description according to the layout constraints, and determining the flow layout of the target circuit according to the target netlist.

In a feasible embodiment of the solution of this application, partitioning the control data flow graph with a floorplanning algorithm to obtain layout constraints comprises:

obtaining the FPGA architecture of the target circuit from the circuit description;

compiling the functions corresponding to the dataflow processes in the control data flow graph to obtain register-transfer-level circuit modules;

partitioning the FPGA architecture according to the circuit modules, and determining a cost function of the partitioning result;

iteratively splitting the partitioning result of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and outputting the layout constraints.

In a feasible embodiment of the solution of this application, scheduling the control data flow graph and binding the scheduling result to obtain a register-transfer-level description comprises:

scheduling the subgraphs of the control data flow graph;

inserting pipeline stages on the connecting lines between nodes in the control data flow graph and performing delay balancing;

mathematically integrating the scheduling result with the delay-balanced result, and binding the integrated result to the target circuit to obtain the register-transfer-level description.

In a feasible embodiment of the solution of this application, the layout constraints include at least one of timing constraints or physical constraints, and the target netlist includes at least one of a synthesized netlist or a placed-and-routed netlist; obtaining the target netlist from the register-transfer-level description according to the layout constraints and determining the flow layout of the target circuit according to the target netlist comprises:

constructing a first input from the timing constraints and/or the physical constraints;

constructing a second input from the register-transfer-level description;

processing the first input and the second input with the FPGA physical synthesizer to output a synthesized netlist;

performing placement and routing with the FPGA physical synthesizer according to the first input and the second input to output a placed-and-routed netlist;

determining the flow layout of the target circuit according to the synthesized netlist and the placed-and-routed netlist.

In a feasible embodiment of the solution of this application, the cost function characterizes the number of wires crossing the partition boundaries of the FPGA architecture; the cost function is:
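One form consistent with the symbols defined for this cost function, given here as an assumption rather than as the published expression, weights the Manhattan distance between the partition coordinates of each pair of connected nodes by the bit width of the connecting FIFO channel:

```latex
C = \sum_{e_{ij} \in E} e_{ij}.\mathrm{width} \times \left( \left| v_i.\mathrm{row} - v_j.\mathrm{row} \right| + \left| v_i.\mathrm{col} - v_j.\mathrm{col} \right| \right)
```

Under this reading, an edge whose endpoints share a partition contributes nothing, and each partition boundary an edge crosses adds its full bit width to C.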

where C is the cost value; v_i and v_j denote nodes in the control data flow graph, with i = 1, 2, 3, ..., n and j = 1, 2, 3, ..., n, n being a positive integer; E is the set of FIFO channels between nodes; e_ij is the connecting line between v_i and v_j; row denotes the row number, col denotes the column number, and width denotes the data bit width.

In a feasible embodiment of the solution of this application, the resource constraint is expressed as follows:

where v_d denotes the partition space assigned to node v, v_area denotes the resources required by the node, r_v denotes the set of nodes held by the current partition r, and (r_child)_area denotes the amount of resources in each partition.

In a feasible embodiment of the solution of this application, iteratively splitting the partitioning result of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and outputting the layout constraints, comprises:

obtaining the first coordinates of the nodes in the control data flow graph before a splitting iteration, determining a coordinate transformation according to the splitting mode, and transforming the first coordinates into second coordinates according to the coordinate transformation;

the splitting mode being a split in the horizontal direction or a split in the vertical direction.

In a feasible embodiment of the solution of this application, the coordinate transformation is expressed as follows:

where v.row is the row coordinate of the second coordinates, v.col is the column coordinate of the second coordinates, (v.row)_prev is the row coordinate of the first coordinates, (v.col)_prev is the column coordinate of the first coordinates, v_d denotes the partition space assigned to node v, "vertical partition" denotes a split in the horizontal direction, and "horizontal partition" denotes a split in the vertical direction.

In a feasible embodiment of the solution of this application, when pipeline stages are inserted on the connecting lines between nodes in the control data flow graph and delay balancing is performed, the delay balance is expressed as follows:

$$e_{ij}.\mathrm{balance} = S_i - S_j - e_{ij}.\mathrm{lat}$$

where S_i is the time step of node v_i, S_j is the time step of node v_j, and S_i - S_j is the maximum latency over all paths between node v_i and node v_j; e_ij.lat is the additional latency already present before pipeline insertion; e_ij.balance is the balancing latency produced after pipeline insertion.

In a feasible embodiment of the solution of this application, the step of inserting pipeline stages on the connecting lines between nodes in the control data flow graph and performing delay balancing comprises:

constructing an objective function for the area overhead from the balancing latency, the objective function being:
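A form consistent with the stated symbols, offered as an assumption rather than as the published expression, minimizes the total number of balancing register bits:

```latex
\min \sum_{e_{ij} \in E} e_{ij}.\mathrm{balance} \times e_{ij}.\mathrm{width}
```

Under this reading, each unit of balancing latency on an edge costs one register stage of e_ij.width bits, so the sum approximates the area spent on balancing.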

where e_ij.width is the maximum data bit width of the pipeline between node v_i and node v_j.

The advantages and beneficial effects of the present invention are set out in part in the description below; the remainder can be understood from the specific embodiments of the invention:

The technical solution of this application proposes a full-flow layout method in which high-level synthesis, based on a floorplanning algorithm, guides the physical layout constraints of the FPGA. The method constructs a control data flow graph from the circuit description of the target circuit and partitions it with a floorplanning algorithm to obtain layout constraints; on the basis of the layout constraints, it schedules and binds the resources corresponding to the target circuit to obtain a register-transfer-level description, and then performs synthesis and placement and routing to obtain the netlist of the target circuit. Acting at the high-level synthesis stage reduces placement-and-routing congestion and also reduces the added delay caused by crossing FPGA block boundaries during routing.

Brief Description of the Drawings

To explain the technical solutions in the embodiments of this application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of this application; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flow chart of the steps of the flow layout method for high-level synthesis provided in the technical solution of this application;

Fig. 2 is a schematic diagram of the control data flow graph in the technical solution of this application;

Fig. 3 is a schematic diagram of the iterative partitioning process in the technical solution of this application;

Fig. 4(a) is the first schematic diagram of delay balancing in the technical solution of this application;

Fig. 4(b) is the second schematic diagram of delay balancing in the technical solution of this application;

Fig. 5 is a schematic diagram of the FIFO pipeline in the technical solution of this application.

Detailed Description

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings, in which identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting it. The step numbers in the following embodiments are provided only for ease of explanation and do not restrict the order of the steps; the execution order of the steps in the embodiments may be adapted as understood by those skilled in the art.

In the current related art, and in FPGA placement and routing in particular, the physical synthesizer tends to use resources that are close to one another, which easily causes placement-and-routing congestion, increases the total wire length, and in turn lowers the throughput of the circuit. To address these technical defects, the technical solution of this application proposes a full-flow layout method in which a high-level synthesis tool, based on a floorplanning algorithm, guides the physical layout constraints of the FPGA.

In a first aspect, as shown in Fig. 1, the technical solution of this application provides a flow layout method for high-level synthesis; the method comprises steps S100-S400:

S100. Obtain a circuit description of the target circuit, and construct the control data flow graph corresponding to the target circuit from the circuit description;

As shown in Fig. 2, the control data flow graph is a directed graph representing the operations of the target circuit; in Fig. 2, solid lines denote data dependencies, dashed lines denote control dependencies, and triangles denote branch operations. In the embodiment, the circuit description includes, but is not limited to, a description in VHDL or Verilog of the signal inputs of the target circuit, its components, and the logic operations those components perform; the target circuit may be a hardware circuit in a real-world scenario. Specifically, the embodiment first obtains the input circuit description and constructs the control data flow graph. The control data flow graph (CDFG) is a directed graph G = <V, E>, where V is the set of all nodes, each node representing one operation in the target circuit, and E is the set of all directed connecting lines, each directed edge between two nodes representing a data or control dependency between the two corresponding operations. Control-dependency edges reflect the control dependencies of the circuit description, and data-dependency edges reflect its data dependencies. To construct the graph, the embodiment first uses the compiler front end to generate intermediate code from the behavioral-level high-level-language source code of the circuit description; the compiler back end then maps the variables in the circuit description to nodes and the control and data dependencies to directed edges, yielding the control data flow graph.
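As an illustration of the graph structure described for step S100, the following minimal sketch stores a CDFG as a node table plus a list of edges tagged as data or control dependencies. The class name `CDFG`, its methods, and the small if/else example are hypothetical and are not taken from the embodiment or from any compiler.

```python
from dataclasses import dataclass, field


@dataclass
class CDFG:
    """Directed graph G = <V, E>; each edge is tagged 'data' or 'control'."""
    nodes: dict = field(default_factory=dict)  # node id -> operation text
    edges: list = field(default_factory=list)  # (src, dst, kind) triples

    def add_op(self, node_id, op):
        self.nodes[node_id] = op

    def add_dep(self, src, dst, kind):
        if kind not in ("data", "control"):
            raise ValueError(f"unknown dependency kind: {kind}")
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((src, dst, kind))

    def successors(self, node_id, kind=None):
        """Nodes reached from node_id, optionally filtered by edge kind."""
        return [d for s, d, k in self.edges
                if s == node_id and (kind is None or k == kind)]


def build_example():
    # Hypothetical source: if (a < b) r = a + b; else r = a - b;
    g = CDFG()
    g.add_op("cmp", "a < b")
    g.add_op("br", "branch")
    g.add_op("add", "a + b")
    g.add_op("sub", "a - b")
    g.add_dep("cmp", "br", "data")      # branch consumes the comparison result
    g.add_dep("br", "add", "control")   # taken side of the branch
    g.add_dep("br", "sub", "control")   # not-taken side of the branch
    return g
```

Keeping the dependency kind on the edge mirrors the solid and dashed lines of Fig. 2, so later passes can walk only data edges or only control edges.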

S200. Partition the control data flow graph with a floorplanning algorithm to obtain layout constraints;

The floorplanning algorithm in the embodiment may be solved with a cutting-plane algorithm; the layout in the embodiment may be defined as a set of physical constraints that control how logic is placed in the model. Specifically, the embodiment first determines, from the FPGA architecture corresponding to the target circuit, the number of partitions, the resources, and the maximum resource utilization; it then compiles the function corresponding to each dataflow process in the control data flow graph into a register-transfer-level (RTL) module and places it in the initial partition; using the cutting-plane algorithm, it splits the current partition in two, horizontally or vertically, evaluates the candidates, and selects the one with the smallest cost function; from the selected scheme it determines the layout constraints of the target circuit in the FPGA architecture.

In some feasible implementations, step S200 of partitioning the control data flow graph with a floorplanning algorithm to obtain layout constraints may include steps S210-S240:

S210. Obtain the FPGA architecture of the target circuit from the circuit description;

S220. Compile the functions corresponding to the dataflow processes in the control data flow graph to obtain register-transfer-level circuit modules;

S230. Partition the FPGA architecture according to the circuit modules, and determine a cost function of the partitioning result;

S240. Iteratively split the partitioning result of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and output the layout constraints;

Specifically, in the embodiment, the FPGA architecture contains multiple Blocks, and the resources of different Blocks may differ. A good layout helps reduce routing congestion and improves the quality of results (QoR) achievable for timing in the design. In the embodiment, the layout constraints use Pblock constraints to assign resources to partitions. Sizing a Pblock along clock-region boundaries, rather than by ranges of SLICE, BRAM, DSP and similar sites, helps limit clock skew and helps overall clock placement in the design. Partitioning of the HLS design on the basis of the control data flow graph relies on the dataflow programming style: the HLS design is streaming, and its structure is described as a directed graph. Nodes in the directed graph represent units that perform computation, and the connecting lines between nodes describe data transfer paths. Adjacent nodes transfer data over the connecting lines; a node consumes data to compute and writes the data it produces to the input/output queue, where it serves as the input of the next computation unit.
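The text does not name a toolchain. As a minimal sketch, assuming a Vivado-style XDC flow in which Pblocks are created with the Tcl commands `create_pblock`, `resize_pblock` and `add_cells_to_pblock`, a partition result could be turned into clock-region-aligned constraints as follows. The helper name `pblock_xdc`, the Pblock name, the cell names and the clock-region coordinates are all hypothetical.

```python
def pblock_xdc(name, cells, x0, y0, x1, y1):
    """Return XDC lines that place `cells` in a Pblock spanning the
    clock regions X{x0}Y{y0} through X{x1}Y{y1} (assumed Vivado syntax)."""
    region = f"CLOCKREGION_X{x0}Y{y0}:CLOCKREGION_X{x1}Y{y1}"
    lines = [
        f"create_pblock {name}",
        # Size the Pblock by clock regions rather than SLICE/BRAM/DSP ranges.
        f"resize_pblock [get_pblocks {name}] -add {{{region}}}",
    ]
    for cell in cells:
        lines.append(
            f"add_cells_to_pblock [get_pblocks {name}] [get_cells {cell}]"
        )
    return lines


# Hypothetical usage: two RTL modules assigned to the lower-left partition.
example = pblock_xdc("pblock_r0c0", ["u_func2", "u_func3"], 0, 0, 1, 1)
```

Writing the lines of `example` to a constraints file would give the physical synthesizer one Pblock per partition, which is the role the layout constraints play in the later synthesis and placement steps.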

For example, the HLS design in the embodiment uses a dataflow programming model in which each function corresponds to one dataflow process and one RTL module, and modules communicate through FIFOs. A graph G = <V, E> is then constructed, where V is the set of dataflow processes, each node representing one function, and E is the set of FIFO channels between vertices.

S300. Schedule the control data flow graph, and bind the scheduling result to obtain a register-transfer-level description;

Specifically, in the embodiment, the subgraphs of the control data flow graph are scheduled and the control data flow graph is delay-balanced; accordingly, in some feasible implementations, step S300 may include steps S310-S330:

S310. Schedule the subgraphs of the control data flow graph;

S320. Insert pipeline stages on the connecting lines between nodes in the control data flow graph and perform delay balancing;

S330. Mathematically integrate the scheduling result with the delay-balanced result, and bind the integrated result to the target circuit to obtain the register-transfer-level description.

Specifically, in the embodiment, the subgraphs of the control data flow graph are scheduled in the default manner of the high-level synthesis tool; pipeline stages are then inserted on the cut edges of the control data flow graph and the delays are balanced; the scheduling results obtained in steps S310-S320 are mathematically integrated to obtain a combined scheduling result; and the combined scheduling result is bound to the resources of the FPGA architecture corresponding to the target circuit to obtain the register-transfer-level description.

S400. Obtain a target netlist from the register-transfer-level description according to the layout constraints, and determine the flow layout of the target circuit according to the target netlist;

Specifically, in the embodiment, using the layout constraints obtained in step S200 and the register-transfer-level description obtained in step S300, synthesis and placement-and-routing operations are performed on the resources of the FPGA architecture to obtain the corresponding target netlist, which determines the control flow layout of the target circuit.

In some feasible implementations, the layout constraints in the embodiment include at least one of timing constraints or physical constraints, and the target netlist includes at least one of a synthesized netlist or a placed-and-routed netlist; accordingly, step S400 of obtaining the target netlist from the register-transfer-level description according to the layout constraints and determining the flow layout of the target circuit according to the target netlist may include steps S410-S450:

S410. Construct a first input from the timing constraints and/or the physical constraints;

S420. Construct a second input from the register-transfer-level description;

S430. Process the first input and the second input with the FPGA physical synthesizer to output a synthesized netlist;

S440. Perform placement and routing with the FPGA physical synthesizer according to the first input and the second input to output a placed-and-routed netlist;

S450. Determine the flow layout of the target circuit according to the synthesized netlist and the placed-and-routed netlist.

Specifically, in the embodiment, the layout constraints obtained in step S200, together with the timing constraints and physical constraints produced by the high-level synthesis tool itself, are supplied as the set of constraints for synthesis and implementation; this is the first input. The register-transfer-level description obtained in step S300 is supplied as the RTL input for synthesis and implementation; this is the second input. The embodiment then runs the FPGA physical synthesizer to perform synthesis and placement and routing, obtains the post-synthesis netlist and the post-place-and-route netlist, and determines the flow layout of the target circuit from these netlists.

In the embodiment, step S230 partitions the FPGA architecture according to the circuit modules and determines the cost function of the partitioning result; the physical meaning of the cost function is the total number of wires crossing partition boundaries. The cost function in the embodiment is:
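As a minimal sketch, assuming the cost is the bit-width-weighted Manhattan distance between the partition coordinates of connected nodes (a reading consistent with the row, col and width symbols and with the stated physical meaning, but not a confirmed transcription of the expression), it could be evaluated as follows. The function name, the coordinate convention (row, col) and the unit bit widths in the example are assumptions.

```python
def crossing_cost(coords, edges):
    """coords: node -> (row, col) partition coordinates.
    edges: iterable of (i, j, width) FIFO channels.
    Returns the width-weighted Manhattan distance summed over all edges."""
    total = 0
    for i, j, width in edges:
        (ri, ci), (rj, cj) = coords[i], coords[j]
        total += width * (abs(ri - rj) + abs(ci - cj))
    return total


# Five-module example: 1 feeds 2, 3, 4; 2, 3, 4 feed 5 (all widths assumed 1).
EDGES = [(1, 2, 1), (1, 3, 1), (1, 4, 1), (2, 5, 1), (3, 5, 1), (4, 5, 1)]

# Placement on a 2 x 2 grid: 2 and 3 upper left, 1 upper right,
# 4 lower left, 5 lower right.
COORDS = {1: (0, 1), 2: (0, 0), 3: (0, 0), 4: (1, 0), 5: (1, 1)}
```

With these inputs `crossing_cost(COORDS, EDGES)` evaluates to 9: the two diagonal hops and the 1-to-4 hop each cross two boundaries, while the remaining edges cross one.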

In the embodiment, step S240 iteratively splits the partitioning result of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and outputs the layout constraints; the resource constraint is expressed as follows:

In the embodiment, step S240 may include steps S241-S242:

S241. Obtain the first coordinates of the nodes in the control data flow graph before a splitting iteration, determine a coordinate transformation according to the splitting mode, and transform the first coordinates into second coordinates according to the coordinate transformation;

S242. The splitting mode is a split in the horizontal direction or a split in the vertical direction;

Specifically, as shown in Fig. 3, partitioning can be viewed as repeatedly splitting a partition in two until the cost function is minimal or the constraints can no longer be satisfied, after which pipeline FIFOs are added. In the first step, all functions are mapped to RTL modules and placed in a single partition, called the initial partition; the dependencies are that 1 points to 2, 3 and 4, and 2, 3 and 4 point to 5, and modules 2 and 3 occupy relatively few resources. In the second step, the partition is split in two in the vertical direction, with 1, 2 and 3 placed in the upper half and 4 and 5 in the lower half. In the third step, each partition is split in two in the horizontal direction, so that 2 and 3 lie at the upper left, 1 at the upper right, 4 at the lower left, and 5 at the lower right. In the last step, FIFO pipeline stages are added to the wires that cross block boundaries to preserve the throughput of the circuit design.
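One splitting iteration can be sketched as below. This is not the cutting-plane solver of the embodiment: it substitutes an exhaustive search that is only practical for a handful of modules, and it assumes the resource constraint means that the summed area of the nodes sent to each child must not exceed that child's capacity. The function name, the area values and the capacity are hypothetical.

```python
from itertools import product


def best_bipartition(nodes, edges, area, capacity):
    """Exhaustively split `nodes` into two children (d = 0 or d = 1).

    Minimises the total width of edges whose endpoints land in different
    children, subject to each child's summed area staying within `capacity`.
    Returns (cost, assignment), or None if no split is feasible.
    """
    best = None
    for bits in product((0, 1), repeat=len(nodes)):
        d = dict(zip(nodes, bits))
        used = [sum(area[v] for v in nodes if d[v] == side) for side in (0, 1)]
        if used[0] > capacity or used[1] > capacity:
            continue  # resource constraint violated for one child
        cost = sum(w for i, j, w in edges
                   if i in d and j in d and d[i] != d[j])
        if best is None or cost < best[0]:
            best = (cost, d)
    return best


# Five modules with 1 -> {2, 3, 4} -> 5; modules 2 and 3 are small.
NODES = [1, 2, 3, 4, 5]
EDGES = [(1, 2, 1), (1, 3, 1), (1, 4, 1), (2, 5, 1), (3, 5, 1), (4, 5, 1)]
AREA = {1: 2, 2: 1, 3: 1, 4: 2, 5: 2}   # hypothetical resource units

result = best_bipartition(NODES, EDGES, AREA, capacity=4)
```

For these numbers the search returns a cut of cost 3 that keeps modules 1, 2 and 3 in one child and 4 and 5 in the other, matching the first split described for Fig. 3. Applying the same function to each child in turn gives the recursive scheme; a real flow would replace the enumeration with an integer-programming or cutting-plane solve.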

Further, in the embodiment the coordinate transformation relation is expressed as follows:

In an embodiment, step S320 inserts pipeline stages on the connections between nodes in the control data flow graph and performs delay balancing.

Specifically, in the embodiment, given a partitioned and pipelined data flow graph G<V,E>, each vertex v∈V represents a function in the data flow design and each edge e∈E represents a FIFO channel between functions. The width e.width is the bit width of the edge, the delay e.lat is the extra delay inserted in the previous pipelining step, and the balance delay e.balance is the delay added in the current step. For each edge e∈E, the total delay of each path can be expressed as:

where {p₁, p₂} denotes a pair of reconvergent paths. Further, in delay balancing, for each edge (connection) e it can be taken that S_i ≥ S_j + e_ij.lat, and the additional balance delay can be expressed as:

$$e_{ij}.\mathrm{balance} = S_i - S_j - e_{ij}.\mathrm{lat}$$

where S_i is the time step of node v_i, S_j is the time step of node v_j, and S_i − S_j is the maximum delay over all paths between node v_i and node v_j; e_ij.lat is the extra delay inserted by the previous pipelining step on the longest path between vertices v_i and v_j; and e_ij.balance is the extra delay inserted by the current pipelining step on the longest path between vertices v_i and v_j.

Exemplarily, as shown in Figure 4, Figure 4(a) shows cut-set1 in the delay balancing process, and Figure 4(b) shows cut-set2 and cut-set3. Edges e13, e37, and e27 are pipelined according to the floorplan partition, so each of them carries 1 unit of inserted delay. Assume also that e14 has a bit width of 2 and every other edge has a bit width of 1. In the delay balancing step, the optimal solution adds 2 units of delay to each of e47, e57, and e67, and 1 unit of delay to e12. Note that e27 and e37 can belong to the same cut-set.
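The balance formula can be checked against a reduced version of this example. The Python sketch below keeps only nodes 1, 2, 3, 4, and 7 and the edges e12, e13, e14, e27, e37, and e47; nodes 5 and 6 are left out because the text does not name their incoming edges. The edge directions, the time-step values in `steps`, and the helper names `balance_delays` and `path_delay` are assumptions chosen for illustration.

```python
def balance_delays(lat, steps):
    """Compute e_ij.balance = S_i - S_j - e_ij.lat for every edge.

    lat:   {(i, j): delay already inserted on edge e_ij by floorplan pipelining}
    steps: {node: S}, the time step of each node (hypothetical values)
    """
    balance = {}
    for (i, j), e_lat in lat.items():
        extra = steps[i] - steps[j] - e_lat
        if extra < 0:
            # The text requires S_i >= S_j + e_ij.lat on every edge.
            raise ValueError(f"constraint violated on edge e{i}{j}")
        balance[(i, j)] = extra
    return balance


# e13, e27, e37 each carry 1 unit of inserted delay; the others carry none.
lat = {(1, 2): 0, (1, 3): 1, (1, 4): 0, (2, 7): 1, (3, 7): 1, (4, 7): 0}

# ASSUMED time steps, with node 7 (the reconvergence point) at step 0.
steps = {1: 2, 2: 1, 3: 1, 4: 2, 7: 0}

balance = balance_delays(lat, steps)


def path_delay(path):
    """Total delay of a path: inserted delay plus balance delay on each edge."""
    return sum(lat[e] + balance[e] for e in path)


paths = [[(1, 2), (2, 7)], [(1, 3), (3, 7)], [(1, 4), (4, 7)]]
print(balance)
print([path_delay(p) for p in paths])
```

With these time steps, e12 receives 1 unit and e47 receives 2 units of balance delay, which agrees with the solution stated for those two edges, and all three reconvergent paths from node 1 to node 7 end up with the same total delay.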

As shown in Figure 5, after partitioning, the partitions are connected through FIFOs so that the connections can be pipelined. With FIFOs, matching interface signals can be scheduled directly without affecting functionality, and the circuit functions can operate in parallel.

In some feasible implementations, step S320, which inserts pipeline stages on the connections between nodes in the control data flow graph and performs delay balancing, may further include step S321:

S321. Construct an objective function for the area overhead according to the balance delay;

Specifically, in the embodiment, the optimization goal of delay balancing is to minimize the total area overhead while taking the bit-width overhead of each edge into account. The objective function in the embodiment is:

where e_ij.width is the maximum data bit width of the pipeline between node v_i and node v_j.
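To make the role of the bit width concrete, the Python sketch below scores time-step assignments on the reduced Figure 4 graph (nodes 1, 2, 3, 4, and 7 only). It assumes the area overhead is the sum of e_ij.balance × e_ij.width over all edges, which is one reading of the description above rather than a quoted formula. The exhaustive search in `best_assignment` is a stand-in for whatever solver an implementation would use, and all names in the block are hypothetical.

```python
from itertools import product

# (i, j) -> (e_ij.lat, e_ij.width); e14 is the only edge with bit width 2.
EDGES = {
    (1, 2): (0, 1), (1, 3): (1, 1), (1, 4): (0, 2),
    (2, 7): (1, 1), (3, 7): (1, 1), (4, 7): (0, 1),
}


def area_cost(steps):
    """ASSUMED objective: sum of balance * width; None if a constraint fails."""
    total = 0
    for (i, j), (lat, width) in EDGES.items():
        bal = steps[i] - steps[j] - lat  # e_ij.balance
        if bal < 0:
            return None  # violates S_i >= S_j + e_ij.lat
        total += bal * width
    return total


def best_assignment(max_step=3):
    """Brute-force the time steps of nodes 1-4, with node 7 fixed at step 0."""
    free = [1, 2, 3, 4]
    best_cost, best_steps = None, None
    for values in product(range(max_step + 1), repeat=len(free)):
        steps = dict(zip(free, values))
        steps[7] = 0
        cost = area_cost(steps)
        if cost is not None and (best_cost is None or cost < best_cost):
            best_cost, best_steps = cost, steps
    return best_cost, best_steps


best_cost, best_steps = best_assignment()
cost_delay_on_e14 = area_cost({1: 2, 2: 1, 3: 1, 4: 0, 7: 0})  # balance placed on e14
cost_delay_on_e47 = area_cost({1: 2, 2: 1, 3: 1, 4: 2, 7: 0})  # balance placed on e47
print(best_cost, best_steps)
print(cost_delay_on_e14, cost_delay_on_e47)
```

Under this assumed objective, placing the 2 units of balance delay on the narrow edge e47 instead of the 2-bit edge e14 lowers the cost, which is consistent with the example adding delay to e47.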

From the specific implementation described above, it can be concluded that the technical solution provided by the present invention has the following advantages over the prior art:

The present invention can be applied to the high-level synthesis design of medium and large data flow programming models. It exploits the rapid-prototyping capability of high-level synthesis tools while extending the design flow from high-level language to hardware description language and on to physical layout. A full-flow design method in which the high-level synthesis tool guides physical layout can further improve the placement and routing of high-level synthesis designs, reduce layout congestion, and preserve the overall throughput of the circuit.

For example, when a RISC-V CPU design is built once with default HLS and once with layout-constrained HLS, the layout-constrained HLS design can clearly be seen to partition and pipeline the modules with the highest resource utilization (the CPU module, FFT module, USB1 module, and USB2 module) and to place them in adjacent but different blocks. With the timing constraints met, the timing margin is essentially unchanged, but congestion is greatly improved and the runtime of physical synthesis is also reduced.

In some alternative embodiments, the functions/operations noted in the block diagrams may occur out of the order noted in the operational diagrams. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functions/operations involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to give a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logical flow presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.

Furthermore, although the invention has been described in the context of functional modules, it should be understood that, unless stated to the contrary, one or more of the functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for understanding the present invention. Rather, given the attributes, functions, and internal relationships of the various functional modules in the devices disclosed herein, the actual implementation of the modules will be within the routine skill of an engineer. Accordingly, those skilled in the art can, using ordinary skill, implement the present invention set forth in the claims without undue experimentation. It is also to be understood that the particular concepts disclosed are illustrative only and are not intended to limit the scope of the invention, which is to be determined by the full scope of the appended claims and their equivalents.

The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered an ordered listing of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them).

In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principle and spirit of the present invention. The scope of the invention is defined by the claims and their equivalents.

The preferred implementations of the present invention have been described in detail above, but the present invention is not limited to these embodiments. Those skilled in the art can make various equivalent variations or substitutions without departing from the spirit of the present invention, and all such equivalent variations or substitutions fall within the scope defined by the claims of the present application.

Claims

1. A high-level synthesis process layout method, characterized by comprising the following steps:

obtaining a circuit description of a target circuit, and constructing a control data flow graph corresponding to the target circuit according to the circuit description, the control data flow graph being a directed graph representing the operation of the target circuit;

partitioning the control data flow graph by a floorplanning algorithm to obtain layout constraints;

scheduling the control data flow graph, and binding the scheduling result to obtain a register-transfer level description;

and obtaining a target netlist from the register-transfer level description according to the layout constraints, and determining the process layout of the target circuit according to the target netlist.

2. The high-level synthesis process layout method of claim 1, wherein partitioning the control data flow graph by a floorplanning algorithm to obtain layout constraints comprises:

acquiring the FPGA architecture of the target circuit from the circuit description;

compiling the functions corresponding to the data flow processes in the control data flow graph to obtain register-transfer level circuit modules;

partitioning the FPGA architecture according to the circuit modules, and determining a cost function of the partitioning result;

and iteratively splitting the partitioning result of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and outputting the layout constraints.

3. The high-level synthesis process layout method of claim 1, wherein scheduling the control data flow graph and binding the scheduling result to obtain a register-transfer level description comprises:

scheduling subgraphs of the control data flow graph;

inserting pipeline stages on the connections between nodes in the control data flow graph and performing delay balancing;

and mathematically combining the scheduling result and the delay-balanced result, and binding the combined result to the target circuit to obtain the register-transfer level description.

4. The high-level synthesis process layout method of claim 1, wherein the layout constraints comprise at least one of timing constraints or physical constraints; the target netlist comprises at least one of a synthesized netlist or a place-and-route netlist; and obtaining a target netlist from the register-transfer level description according to the layout constraints and determining the process layout of the target circuit according to the target netlist comprises:

constructing a first input according to the timing constraints and/or the physical constraints;

constructing a second input according to the register-transfer level description;

performing synthesis on the first input and the second input with an FPGA physical synthesis tool and outputting the synthesized netlist;

performing placement and routing on the first input and the second input with the FPGA physical synthesis tool and outputting the place-and-route netlist;

and determining the process layout of the target circuit according to the synthesized netlist and the place-and-route netlist.

5. The high-level synthesis process layout method of claim 2, wherein the cost function characterizes the number of wires crossing the partition boundaries of the FPGA architecture; the cost function is:

wherein C is the cost value; v_i and v_j denote nodes in the control data flow graph, with i = 1, 2, 3, …, n and j = 1, 2, 3, …, n, n being a positive integer; E denotes the set of FIFO channels between nodes; e_ij is the connection between v_i and v_j; and row, col, and width denote the row number, the column number, and the data bit width.

6. The high-level synthesis process layout method of claim 2, wherein the resource constraint is expressed as follows:

wherein v_d denotes the partition space assigned to node v, v_area denotes the resources required by the node, r_v denotes the set of nodes held by the current partition r, and (r_child)_area denotes the amount of resources in each partition.

7. The high-level synthesis process layout method of claim 2, wherein iteratively splitting the partitioning result of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and outputting the layout constraints, comprises:

obtaining first coordinates of the nodes in the control data flow graph before the split iteration, determining a coordinate transformation relation according to the split mode, and transforming the first coordinates into second coordinates according to the coordinate transformation relation;

the split mode comprising a horizontal split or a vertical split.

8. The high-level synthesis process layout method of claim 7, wherein the coordinate transformation relation is expressed as follows:

wherein v.row denotes the row coordinate in the second coordinates, v.col denotes the column coordinate in the second coordinates, (v.row)_prev denotes the row coordinate in the first coordinates, (v.col)_prev denotes the column coordinate in the first coordinates, v_d denotes the partition space assigned to node v, vertical denotes a split in the horizontal direction, and horizontal denotes a split in the vertical direction.

9. The high-level synthesis process layout method of claim 3, wherein, in inserting pipeline stages on the connections between nodes in the control data flow graph and performing delay balancing, the delay balance is expressed as follows:

$$e_{ij}.\mathrm{balance} = S_i - S_j - e_{ij}.\mathrm{lat}$$

wherein S_i denotes the time step of node v_i, S_j denotes the time step of node v_j, and S_i − S_j denotes the maximum delay over all paths between node v_i and node v_j; e_ij.lat denotes the extra delay that exists before the pipeline is inserted; and e_ij.balance denotes the balance delay obtained after the pipeline is inserted.

10. The high-level synthesis process layout method of claim 9, wherein inserting pipeline stages on the connections between nodes in the control data flow graph and performing delay balancing comprises:

constructing an objective function for the area overhead according to the balance delay, wherein the objective function is as follows:

wherein e_ij.width is the maximum data bit width of the pipeline between node v_i and node v_j.