WO2024045435A1 - Process layout method for high-level synthesis - Google Patents
- Publication number
- WO2024045435A1 (application PCT/CN2022/141006)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data flow
- control data
- flow graph
- partition
- layout
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/30—Circuit design
- G06F30/39—Circuit design at the physical level
- G06F30/392—Floor-planning or layout, e.g. partitioning or placement
- G06F30/394—Routing
- G06F30/398—Design verification or optimisation, e.g. using design rule check [DRC], layout versus schematics [LVS] or finite element methods [FEM]
Definitions
- The invention relates to the technical field of circuit simulation, and in particular to a process layout method for high-level synthesis.
- High-level synthesis (HLS) refers to the process of automatically converting logic described in a high-level language into a circuit model described in a lower-abstraction-level language. HLS tools are fast and efficient: they shorten the design time of hardware engineers and also allow software engineers to complete hardware designs.
- The purpose of embodiments of the present invention is to provide a process layout method for high-level synthesis that effectively reduces placement-and-routing congestion and latency.
- The control data flow graph is a directed graph representing the computation operations of the target circuit.
- The register transfer level description is processed according to the layout constraints to obtain a target netlist, and the process layout of the target circuit is determined according to the target netlist.
- Segmenting the control data flow graph through a floorplanning algorithm to obtain layout constraints includes:
- iterating until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and outputting the layout constraints.
- Scheduling the control data flow graph and binding the scheduling results to obtain a register transfer level description includes:
- mathematically integrating the scheduling results and the delay-balanced results, and binding the integrated results to the target circuit to obtain the register transfer level description.
- The layout constraints include at least one of timing constraints or physical constraints.
- The target netlist includes at least one of a synthesized netlist or a placed-and-routed netlist. Processing the register transfer level description according to the layout constraints to obtain a target netlist, and determining the process layout of the target circuit based on the target netlist, includes:
- integrating the first input and the second input to output a synthesized netlist;
- performing placement and routing according to the first input and the second input to obtain a placed-and-routed netlist;
- determining the process layout of the target circuit according to the synthesized netlist and the placed-and-routed netlist.
- The cost function characterizes the number of wires at the partition boundaries of the FPGA architecture; the cost function is:
- C is the cost value
- v_i and v_j represent nodes in the control data flow graph
- i = 1, 2, 3, ..., n
- j = 1, 2, 3, ..., n
- n is a positive integer
- E represents the set of FIFO channels between nodes
- e_ij is the connection line between v_i and v_j
- row represents the number of rows
- col represents the number of columns
- width represents the data bit width.
- v_d represents the partition space allocated to node v
- v_area represents the resources required by the node
- r_v represents the set of nodes accommodated by the current partition r
- (r_child)_area represents the number of resources in each partition.
- The layout constraints include:
- The segmentation method is either horizontal segmentation or vertical segmentation.
- v.row represents the row coordinate in the second coordinate
- v.col represents the column coordinate in the second coordinate
- (v.row)_prev represents the row coordinate in the first coordinate
- (v.col)_prev represents the column coordinate in the first coordinate
- v_d represents the partition space allocated to node v
- "vertical partition" denotes segmentation along the horizontal direction (a horizontal cut line, giving an upper and a lower part)
- "horizontal partition" denotes segmentation along the vertical direction (a vertical cut line, giving a left and a right part).
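The coordinate update for one segmentation step can be sketched as follows. The transformation formula itself is not reproduced in this text, so the doubling rule used here (new coordinate = 2 × previous coordinate + v_d, with v_d assumed to be 0 or 1 depending on which half a node is assigned to) is an assumption chosen to fit the symbol definitions above, and the function name `split_coords` is hypothetical.

```python
def split_coords(row_prev, col_prev, v_d, partition):
    """Return (row, col) of a node after one bisection step.

    Assumption (not taken from the source): each bisection doubles the grid
    resolution along one axis, and v_d in {0, 1} selects the half.
    "vertical" splits a region into upper/lower halves (changes the row);
    "horizontal" splits it into left/right halves (changes the column).
    """
    if v_d not in (0, 1):
        raise ValueError("v_d must be 0 or 1")
    if partition == "vertical":
        return 2 * row_prev + v_d, col_prev
    if partition == "horizontal":
        return row_prev, 2 * col_prev + v_d
    raise ValueError("partition must be 'vertical' or 'horizontal'")


# A node starting at (0, 0), sent to the lower half and then to the right half.
row, col = split_coords(0, 0, 1, "vertical")
row, col = split_coords(row, col, 1, "horizontal")
print(row, col)  # 1 1
```

Under this assumed rule, the coordinates after k bisections identify one cell of the refined grid, which is what a distance-based wiring cost needs.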
- Pipelines are inserted on the connection lines between nodes in the control data flow graph to perform delay balancing.
- Delay balancing is as follows:
- S_i represents the time step of node v_i
- S_j represents the time step of node v_j
- S_i − S_j represents the maximum delay over all paths between node v_i and node v_j
- e_ij.lat indicates the additional delay that exists before the pipeline is inserted
- e_ij.balance indicates the balance delay generated after the pipeline is inserted.
- The step of inserting pipelines on the connection lines between nodes in the control data flow graph to perform delay balancing includes:
- An objective function for area overhead is constructed from the balance delay, and the objective function is:
- e_ij.width is the maximum data bit width of the pipeline between node v_i and node v_j.
- The technical solution of this application proposes a full-process layout method in which a high-level synthesis tool, using a floorplanning algorithm, guides the physical layout constraints of the FPGA.
- The method constructs a control data flow graph from the circuit description of the target circuit and segments the control data flow graph with the floorplanning algorithm to obtain layout constraints. On the basis of the layout constraints, the resources corresponding to the target circuit are scheduled and bound to obtain a register transfer level description, which is then synthesized and placed and routed to obtain the netlist of the target circuit. Guiding the layout from high-level synthesis reduces placement-and-routing congestion and also reduces the extra delay caused by crossing FPGA block boundaries during routing.
- Figure 1 is a flow chart of the steps of the process layout method for high-level synthesis provided in the technical solution of this application;
- Figure 2 is a schematic diagram of the control data flow graph in the technical solution of this application;
- Figure 3 is a schematic diagram of the iterative partitioning process in the technical solution of this application;
- Figure 4(a) is the first schematic diagram of delay balancing in the technical solution of this application;
- Figure 4(b) is the second schematic diagram of delay balancing in the technical solution of this application;
- Figure 5 is a schematic diagram of the FIFO pipeline in the technical solution of this application.
- The technical solution of this application provides a process layout method for high-level synthesis; the method includes steps S100-S400:
- The control data flow graph is a directed graph that represents the computation operations of the target circuit.
- The solid lines in Figure 2 represent data dependencies, the dotted lines represent control dependencies, and the triangle symbols represent branch operations.
- The circuit description includes, but is not limited to, a description in VHDL or Verilog of the signal inputs, the components, and the logical operations performed by the components of the target circuit; the target circuit may be a hardware circuit in a real scenario.
- The input circuit description is first obtained, and a control data flow graph is constructed from it.
- The control dependency edges of the control data flow graph reflect the control dependences of the circuit description, and the data dependency edges reflect its data dependences.
- The embodiment uses the compiler front end to generate intermediate code from the behavioral-level high-level-language source code in the circuit description;
- the embodiment uses the compiler back end to map variables in the circuit description to nodes, and to map control and data dependencies to directed edges, building the control data flow graph.
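As a minimal sketch of the data structure described above, a control data flow graph can be held as a set of nodes plus directed edges tagged as data or control dependencies. The class name `CDFG`, its methods, and the toy source fragment are illustrative assumptions, not the representation used by any particular HLS compiler.

```python
from collections import defaultdict


class CDFG:
    """Minimal control data flow graph: nodes plus typed directed edges."""

    def __init__(self):
        self.nodes = set()
        self.succ = defaultdict(list)  # node -> [(successor, kind)]

    def add_edge(self, src, dst, kind):
        if kind not in ("data", "control"):
            raise ValueError("kind must be 'data' or 'control'")
        self.nodes.update((src, dst))
        self.succ[src].append((dst, kind))

    def edges(self, kind=None):
        """List (src, dst, kind) triples, optionally filtered by kind."""
        return [(s, d, k) for s in sorted(self.succ)
                for d, k in self.succ[s] if kind in (None, k)]


# Toy source fragment: t = a + b; if (t > 0) y = t * c;
g = CDFG()
g.add_edge("a", "add", "data")
g.add_edge("b", "add", "data")
g.add_edge("add", "cmp", "data")
g.add_edge("add", "mul", "data")
g.add_edge("c", "mul", "data")
g.add_edge("cmp", "mul", "control")  # the multiply runs only if the branch is taken

print(g.edges("control"))  # [('cmp', 'mul', 'control')]
```

Keeping the edge kind on each edge lets later passes treat data edges (which become wires or FIFO channels) differently from control edges.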
- The floorplanning algorithm in the embodiment can use the cutting plane algorithm to solve the planning problem;
- the layout in the embodiment can be defined as a set of physical constraints that control the placement of logic in the model.
- The register transfer level (RTL) circuit modules are placed in the initial partition; based on the cutting plane algorithm, the current partition is divided in two horizontally or vertically, and the solution with the smallest cost function is computed and selected. From the solution obtained, the layout constraints of the target circuit in the FPGA architecture are determined.
- Step S200 of segmenting the control data flow graph through a floorplanning algorithm to obtain layout constraints may include steps S210-S240:
- The resources of different blocks may differ.
- A good layout helps reduce routing congestion and improves the timing quality of results (QoR) achievable in the design;
- the layout constraints use Pblock directives to specify the resource partitions, and Pblock boundaries may follow clock region boundaries. Sizing a Pblock by clock regions instead of by SLICE, BRAM, DSP, etc. ranges helps limit clock skew and aids the overall clock placement of the design.
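To make the Pblock constraint concrete, the sketch below generates constraint lines for one partition, assuming the target tool is Xilinx Vivado and the constraints are written as XDC (Tcl) commands. The helper name `pblock_xdc`, the Pblock name, and the cell name `fft_inst` are hypothetical; the clock-region range is only an example.

```python
def pblock_xdc(name, cells, x0, y0, x1, y1):
    """Emit XDC lines that assign `cells` to a clock-region-aligned Pblock.

    Assumes Vivado-style commands (create_pblock, resize_pblock,
    add_cells_to_pblock); the region spans clock regions X{x0}Y{y0}..X{x1}Y{y1}.
    """
    region = f"CLOCKREGION_X{x0}Y{y0}:CLOCKREGION_X{x1}Y{y1}"
    lines = [
        f"create_pblock {name}",
        f"resize_pblock [get_pblocks {name}] -add {{{region}}}",
    ]
    for cell in cells:
        lines.append(
            f"add_cells_to_pblock [get_pblocks {name}] [get_cells {cell}]"
        )
    return "\n".join(lines)


xdc = pblock_xdc("pblock_r0c0", ["fft_inst"], 0, 0, 1, 1)
print(xdc)
```

One such group of lines would be emitted per partition produced by the floorplanning step, with each RTL module instance added to the Pblock of the partition it was assigned to.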
- The embodiment partitions the HLS design based on the data flow programming style; that is, the HLS design is streaming, and the design structure is described as a directed graph. Nodes in the directed graph represent the processing units that perform operations, and the connection lines between nodes describe the data transmission paths. In the directed graph, adjacent nodes transmit data over a number of wires; a node consumes the data for computation and writes the data it generates to the input-output queue, where it becomes the input of the next computing unit.
- Step S300 may include steps S310-S330:
- S330: mathematically integrate the scheduling results and the delay-balanced results, and bind the integrated results to the target circuit to obtain the register transfer level description.
- The subgraphs of the control data flow graph are scheduled using the default method of the high-level synthesis tool; pipelines are then inserted on the cut edges of the control data flow graph to balance delay; the results obtained in steps S310-S320 are mathematically integrated to give the combined scheduling result; and the combined scheduling result is bound to the resources in the FPGA architecture corresponding to the target circuit to obtain the register transfer level description.
- According to the layout constraints obtained in step S200 and the register transfer level description obtained in step S300, the resources in the FPGA architecture are synthesized and placed and routed to obtain the corresponding target netlist, thereby determining the process layout of the target circuit.
- The layout constraints in the embodiment include at least one of timing constraints or physical constraints, and the target netlist includes at least one of a synthesized netlist or a placed-and-routed netlist. Further, in the embodiment, the register transfer level description is processed according to the layout constraints to obtain a target netlist.
- Step S400 of determining the process layout of the target circuit based on the target netlist may include steps S410-S450:
- S450: determine the process layout of the target circuit according to the synthesized netlist and the placed-and-routed netlist.
- The layout constraints obtained in step S200, together with the timing constraints and physical constraints produced by the high-level synthesis tool itself, form the constraint set for synthesis and implementation, that is, the first input. The register transfer level description obtained in step S300 is used as the RTL input for synthesis and implementation, that is, the second input. The embodiment then runs the FPGA physical synthesizer to perform synthesis and placement and routing, obtains the post-synthesis netlist and the post-layout netlist, and determines the process layout of the target circuit from these netlists.
- In step S230, the FPGA architecture is partitioned according to the circuit modules, and a cost function of the partition result is determined; the physical meaning of the cost function is the total number of wires that cross partition boundaries. The cost function in the embodiment uses the symbols C, v_i, v_j, E, e_ij, row, col and width as defined above.
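A wire-crossing cost of this kind can be evaluated as in the sketch below. The exact formula is not reproduced in this text; the form used here (bit width of each FIFO edge multiplied by the Manhattan distance between the row/column positions of its two endpoints, summed over all edges) is an assumption that fits the listed symbols. The function name `wire_cost` and the sample numbers are illustrative.

```python
def wire_cost(edges, place):
    """Sum of width * Manhattan distance over all FIFO edges.

    edges: iterable of (i, j, width); place: node -> (row, col).
    Assumed form of the cost C; nodes in the same partition contribute 0.
    """
    total = 0
    for i, j, width in edges:
        (ri, ci), (rj, cj) = place[i], place[j]
        total += width * (abs(ri - rj) + abs(ci - cj))
    return total


edges = [(1, 2, 32), (1, 3, 32), (2, 4, 64)]
place = {1: (0, 0), 2: (0, 1), 3: (0, 0), 4: (1, 1)}
print(wire_cost(edges, place))  # 32*1 + 32*0 + 64*1 = 96
```

Because edges between nodes in the same partition cost nothing, minimizing this quantity pushes heavily connected, wide-interface modules into the same or neighbouring partitions.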
- Step S240 iteratively segments the partition results of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and then outputs the layout constraints. The resource constraint is expressed with the symbols v_d, v_area, r_v and (r_child)_area as defined above.
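The resource constraint can be checked along the following lines. The constraint expression is not reproduced in this text, so this sketch assumes its simplest reading: the resources required by the nodes assigned to a partition must not exceed the resources available in that partition. A single scalar resource, the `<=` comparison, the function name `fits`, and all numbers are assumptions; a real design would check LUT, FF, BRAM and DSP counts separately.

```python
def fits(assignment, area, capacity):
    """Check an assignment of nodes to partitions against capacities.

    assignment: node -> partition id
    area:       node -> resources the node needs
    capacity:   partition id -> resources the partition offers
    Returns True if no partition is over-subscribed.
    """
    used = {p: 0 for p in capacity}
    for node, part in assignment.items():
        used[part] += area[node]
    return all(used[p] <= capacity[p] for p in capacity)


area = {1: 40, 2: 10, 3: 10, 4: 40, 5: 40}
capacity = {"top": 60, "bottom": 80}
ok = fits({1: "top", 2: "top", 3: "top", 4: "bottom", 5: "bottom"}, area, capacity)
bad = fits({1: "top", 2: "top", 3: "top", 4: "top", 5: "bottom"}, area, capacity)
print(ok, bad)  # True False
```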
- Step S240 of iteratively segmenting the partition results of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and outputting the layout constraints, may include steps S241-S242:
- The segmentation method is either horizontal segmentation or vertical segmentation.
- The partitioning process can be seen as repeatedly dividing a partition in two until the cost function is minimal or the constraints can no longer be met, after which the pipeline FIFOs are added;
- the first step is to map all functions to RTL modules and place them in a single partition, called the initial partition, in which module 1 points to modules 2, 3 and 4, modules 2, 3 and 4 point to module 5, and modules 2 and 3 occupy fewer resources;
- the second step is to divide the partition in two vertically, with modules 1, 2 and 3 placed at the top and modules 4 and 5 at the bottom;
- the third step is to divide each partition in two horizontally, so that modules 2 and 3 are at the upper left, module 1 is at the upper right, module 4 is at the lower left, and module 5 is at the lower right;
- the last step is to add a FIFO pipeline to the wires that cross block boundaries, to preserve the throughput of the circuit design.
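The first bisection of this five-module example can be reproduced with a small exhaustive search. This is a brute-force stand-in for the floorplanning solver, not the cutting plane method the text names, and it is only practical for a handful of nodes. The edge list follows the dependencies stated above; the edge widths, module areas and per-half capacity are invented for illustration, chosen so that modules 2 and 3 are small.

```python
from itertools import product


def bisect(nodes, edges, area, cap):
    """Exhaustively split `nodes` into halves 0 and 1.

    Minimizes the total width of edges crossing the cut, subject to each
    half holding at most `cap` resources. The first node is pinned to
    half 0 to rule out the mirror-image solution.
    """
    best, best_cost = None, None
    first, rest = nodes[0], nodes[1:]
    for bits in product((0, 1), repeat=len(rest)):
        side = {first: 0, **dict(zip(rest, bits))}
        load = [0, 0]
        for n in nodes:
            load[side[n]] += area[n]
        if max(load) > cap:
            continue  # violates the resource constraint
        cost = sum(w for i, j, w in edges if side[i] != side[j])
        if best_cost is None or cost < best_cost:
            best, best_cost = side, cost
    return best, best_cost


nodes = [1, 2, 3, 4, 5]
# (source, destination, assumed bit width)
edges = [(1, 2, 2), (1, 3, 2), (1, 4, 1), (2, 5, 1), (3, 5, 1), (4, 5, 1)]
area = {1: 4, 2: 1, 3: 1, 4: 4, 5: 4}  # assumed resource usage
side, cost = bisect(nodes, edges, area, cap=8)
top = [n for n in nodes if side[n] == 0]
bottom = [n for n in nodes if side[n] == 1]
print(top, bottom, cost)  # [1, 2, 3] [4, 5] 3
```

With these assumed numbers the search places modules 1, 2 and 3 in one half and modules 4 and 5 in the other, matching the second step of the example; applying the same routine to each half would give the next level of the partition.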
- Step S320 inserts pipelines on the connection lines between nodes in the control data flow graph to perform delay balancing.
- each vertex v ∈ V represents a function in the data flow design
- each edge e ∈ E represents a FIFO channel between functions.
- the width e.width represents the bit width of the edge
- the delay e.lat represents the additional delay inserted in the previous pipeline step
- the balance delay e.balance represents the balance delay in the current step.
- {p_1, p_2} represents a pair of reconvergent paths. Furthermore, for each edge (connection line) e_ij in delay balancing, S_i ≥ S_j + e_ij.lat holds, and the additional balance delay can be expressed as:
- S_i represents the time step of node v_i
- S_j represents the time step of node v_j
- S_i − S_j represents the maximum delay over all paths between node v_i and node v_j
- e_ij.lat represents the extra delay inserted by the previous pipeline step in the longest path between vertices v_i and v_j
- e_ij.balance represents the additional delay inserted by the current pipeline step in the longest path between vertices v_i and v_j.
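Reading the inequality above as a constraint with slack, the balance delay on an edge is the slack left over once the delay already inserted on that edge is accounted for. The expression is not reproduced in this text; the form below is a reconstruction from the symbol definitions and should be checked against the published formula.

```latex
e_{ij}.\mathrm{balance} = S_i - S_j - e_{ij}.\mathrm{lat} \;\ge\; 0
```

For instance, if S_i − S_j = 2 and one unit of delay was already inserted on e_ij, one further unit of balance delay is needed on that edge.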
- Figure 4(a) shows cut set cut-set1 in the delay balancing process
- Figure 4(b) shows cut sets cut-set2 and cut-set3 in the delay balancing process.
- Edges e13, e37 and e27 are pipelined according to the floorplan partition, so each of these edges carries 1 unit of inserted delay.
- The bit width of e14 is 2, and the bit width of every other edge is 1.
- The optimal solution is to add 2 units of delay to each of e47, e57 and e67, and 1 unit of delay to e12.
- e27 and e37 can belong to the same cut set.
- After partitioning, connections are FIFO-based, so they can be pipelined.
- Signals on matching FIFO interfaces can be scheduled directly without affecting functionality, which provides parallelism among circuit functions.
- Step S320 of inserting pipelines on the connection lines between nodes in the control data flow graph to perform delay balancing may also include step S321:
- The optimization goal of delay balancing is to minimize the total area overhead, taking into account the bit width of each edge.
- The objective function in the embodiment is:
- e_ij.width is the maximum data bit width of the pipeline between node v_i and node v_j.
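The sketch below shows one way to obtain a valid, though not necessarily minimal, set of balance delays and to evaluate their area overhead. It assumes that S_i is the longest inserted latency from node v_i to a sink, that the balance delay of edge i -> j is S_i − S_j − e_ij.lat, and that area overhead is the sum of balance × width over all edges; these readings, the function name `balance`, and the small diamond-shaped graph are assumptions, and the graph is not the one in Figure 4.

```python
from graphlib import TopologicalSorter


def balance(edges):
    """edges: list of (i, j, lat, width) for a DAG with edges i -> j.

    Returns (S, bal, area): S[v] is the longest inserted latency from v to
    any sink, bal[(i, j)] = S[i] - S[j] - lat >= 0, and area is the sum of
    bal * width. This longest-path choice equalizes reconvergent paths but
    is not guaranteed to minimize area.
    """
    must_precede = {}
    for i, j, _, _ in edges:
        must_precede.setdefault(i, set()).add(j)  # visit j before i
        must_precede.setdefault(j, set())
    S = {}
    for v in TopologicalSorter(must_precede).static_order():
        outs = [(j, lat) for i, j, lat, _ in edges if i == v]
        S[v] = max((S[j] + lat for j, lat in outs), default=0)
    bal = {(i, j): S[i] - S[j] - lat for i, j, lat, _ in edges}
    area = sum(bal[(i, j)] * w for i, j, _, w in edges)
    return S, bal, area


# Diamond 1 -> {2, 3} -> 4; edge 1->2 was pipelined by one unit, edge 1->3 is 2 bits wide.
edges = [(1, 2, 1, 1), (2, 4, 0, 1), (1, 3, 0, 2), (3, 4, 0, 1)]
S, bal, area = balance(edges)
print(bal, area)
```

Here the longest-path choice puts the one unit of balance delay on the 2-bit edge 1->3, for an overhead of 2, whereas placing it on the 1-bit edge 3->4 would balance the same paths for an overhead of 1. That gap is the reason the balance delays are chosen by minimizing a width-weighted objective rather than by scheduling alone.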
- The invention can be applied to high-level synthesis designs of medium and large data flow programming models. It takes advantage of the rapid prototyping offered by high-level synthesis tools while extending the design flow from high-level language through hardware description language to physical layout.
- In this full-process design method, the high-level synthesis tool guides the physical layout, which further improves the placement and routing of high-level synthesis designs, reduces layout congestion, and preserves the overall throughput of the circuit.
- In the HLS design with the layout constraints applied, the CPU, FFT, USB1 and USB2 modules, which have the highest resource usage, are partitioned and pipelined, and are placed in adjacent but different blocks.
- The timing constraints are met and the timing margin is essentially unchanged, while congestion is greatly reduced and the run time of physical synthesis is also shorter.
- the functions/operations noted in the block diagrams may occur out of the order noted in the operational illustrations.
- two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending on the functionality/operations involved.
- the embodiments presented and described in the flow diagrams of the present invention are provided by way of example for the purpose of providing a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logical flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of a larger operation are performed independently.
- The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered a sequenced list of executable instructions for implementing the logical functions, and may be embodied in any computer-readable medium for use by, or in combination with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus or device and execute them).
Landscapes
- Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- Geometry (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Architecture (AREA)
- Design And Manufacture Of Integrated Circuits (AREA)
Abstract
Provided in the present invention is a process layout method for high-level synthesis. The method comprises the following steps: acquiring a circuit description for a target circuit, and performing construction according to the circuit description, so as to obtain a control data flow graph corresponding to the target circuit, wherein the control data flow graph is a directed graph that represents computation operations in the target circuit; partitioning the control data flow graph by means of a plane planning algorithm, so as to obtain layout constraints; scheduling the control data flow graph, and binding a scheduling result, so as to obtain a register transfer level description; and processing the register transfer level description according to the layout constraints, so as to obtain a target netlist, and determining a process layout of the target circuit according to the target netlist. The method reduces the congestion situations during layout wiring by means of high-level synthesis, can also reduce increases in the delay caused by the crossing of the boundary of an FPGA block during the wiring process, and can be widely applied to the technical field of circuit simulation.
Description
The invention relates to the technical field of circuit simulation, and in particular to a process layout method for high-level synthesis.
High-level synthesis (HLS) refers to the process of automatically converting logic described in a high-level language into a circuit model described in a lower-abstraction-level language. HLS tools are fast and efficient: they shorten the design time of hardware engineers and also allow software engineers to complete hardware designs.
However, in related technical solutions, a major reason for the quality gap between HLS designs and manual designs is that it is difficult to estimate interconnect delay accurately at the HLS level, and therefore difficult to obtain a good global physical layout. In field programmable gate array (FPGA) placement and routing in particular, the physical synthesizer tends to use resources that are close together, which causes placement-and-routing congestion, increases the overall wire length, and in turn reduces circuit throughput. In addition, FPGA resources are arranged as configurable logic blocks (CLBs), and the resources of different blocks may differ; when interconnect crosses block boundaries without regard to those boundaries, longer wiring and delay easily result.
Contents of the invention
In view of this, to at least partially solve one of the above technical problems or defects, the purpose of embodiments of the present invention is to provide a process layout method for high-level synthesis that effectively reduces placement-and-routing congestion and latency.
The technical solution of this application provides a process layout method for high-level synthesis, including the following steps:
Obtain the circuit description of the target circuit, and construct a control data flow graph corresponding to the target circuit from the circuit description; the control data flow graph is a directed graph representing the computation operations of the target circuit;
Segment the control data flow graph through a floorplanning algorithm to obtain layout constraints;
Schedule the control data flow graph, and bind the scheduling results to obtain a register transfer level description;
Process the register transfer level description according to the layout constraints to obtain a target netlist, and determine the process layout of the target circuit according to the target netlist.
In a feasible embodiment of the solution of this application, segmenting the control data flow graph through a floorplanning algorithm to obtain layout constraints includes:
Obtain the FPGA architecture of the target circuit from the circuit description;
Compile the functions corresponding to the data flow processes in the control data flow graph to obtain register transfer level circuit modules;
Partition the FPGA architecture according to the circuit modules, and determine the cost function of the partition result;
Iteratively segment the partition results of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and output the layout constraints.
In a feasible embodiment of the solution of this application, scheduling the control data flow graph and binding the scheduling results to obtain a register transfer level description includes:
Schedule the subgraphs of the control data flow graph;
Insert pipelines on the connection lines between nodes in the control data flow graph to perform delay balancing;
Mathematically integrate the scheduling results and the delay-balanced results, and bind the integrated results to the target circuit to obtain the register transfer level description.
In a feasible embodiment of the solution of this application, the layout constraints include at least one of timing constraints or physical constraints, and the target netlist includes at least one of a synthesized netlist or a placed-and-routed netlist. Processing the register transfer level description according to the layout constraints to obtain a target netlist, and determining the process layout of the target circuit according to the target netlist, includes:
Construct the first input from the timing constraints and/or the physical constraints;
Construct the second input from the register transfer level description;
Through the FPGA physical synthesizer, synthesize the first input and the second input to output a synthesized netlist;
Through the FPGA physical synthesizer, perform placement and routing according to the first input and the second input to output a placed-and-routed netlist;
Determine the process layout of the target circuit according to the synthesized netlist and the placed-and-routed netlist.
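The two-input flow above can be sketched as a generated tool script. The example assumes Xilinx Vivado in non-project mode, which the text does not name; the helper `vivado_flow`, the file names, the checkpoint names and the part string are placeholders chosen for illustration.

```python
def vivado_flow(top, rtl_files, xdc_files, part):
    """Return a non-project-mode Vivado Tcl script as a string.

    rtl_files are the RTL description (second input); xdc_files carry the
    timing and physical constraints (first input).
    """
    tcl = [f"read_verilog {f}" for f in rtl_files]
    tcl += [f"read_xdc {f}" for f in xdc_files]
    tcl += [
        f"synth_design -top {top} -part {part}",
        "write_checkpoint -force post_synth.dcp",  # synthesized netlist
        "opt_design",
        "place_design",
        "route_design",
        "write_checkpoint -force post_route.dcp",  # placed-and-routed netlist
    ]
    return "\n".join(tcl)


script = vivado_flow(
    top="top",
    rtl_files=["top.v"],
    xdc_files=["timing.xdc", "floorplan.xdc"],
    part="xcu250-figd2104-2L-e",  # example part; substitute the real target device
)
print(script)
```

The two checkpoints correspond to the two netlists named in the steps above, so the layout can be inspected both after synthesis and after placement and routing.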
在本申请方案的一种可行的实施例中,所述成本函数用于表征所述FPGA架构的分区边界的导线数量;所述成本函数为:In a feasible embodiment of the solution of this application, the cost function is used to characterize the number of wires at the partition boundary of the FPGA architecture; the cost function is:
where $C$ is the cost value; $v_i$ and $v_j$ denote nodes in the control data flow graph, with $i = 1, 2, 3, \ldots, n$, $j = 1, 2, 3, \ldots, n$, and $n$ a positive integer; $E$ denotes the set of FIFO channels between nodes; $e_{ij}$ is the connection line between $v_i$ and $v_j$; row denotes the row number, col denotes the column number, and width denotes the data bit width.
In a feasible embodiment of the present application, the resource constraint is expressed as follows:
where $v_d$ denotes the partition space assigned to node $v$, $v_{area}$ denotes the resources required by the node, $r_v$ denotes the set of nodes contained in the current partition $r$, and $(r_{child})_{area}$ denotes the amount of resources in each partition.
In a feasible embodiment of the present application, performing split iterations on the partition result of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and outputting the layout constraints, includes:
obtaining first coordinates of the nodes in the control data flow graph before a split iteration, determining a coordinate transformation relationship according to the split mode, and transforming the first coordinates into second coordinates according to the coordinate transformation relationship;
the split mode includes a horizontal split or a vertical split.
In a feasible embodiment of the present application, the coordinate transformation relationship is expressed as follows:
where $v.row$ denotes the row coordinate of the second coordinates, $v.col$ denotes the column coordinate of the second coordinates, $(v.row)_{prev}$ denotes the row coordinate of the first coordinates, $(v.col)_{prev}$ denotes the column coordinate of the first coordinates, $v_d$ denotes the partition space assigned to node $v$, "vertical partition" denotes a split in the horizontal direction, and "horizontal partition" denotes a split in the vertical direction.
In a feasible embodiment of the present application, in inserting pipeline stages on the connection lines between nodes in the control data flow graph to perform delay balancing, the delay balance is expressed as follows:
$$e_{ij}.balance = S_i - S_j - e_{ij}.lat$$
where $S_i$ denotes the time step of node $v_i$, $S_j$ denotes the time step of node $v_j$, and $S_i - S_j$ denotes the maximum delay over all paths between node $v_i$ and node $v_j$; $e_{ij}.lat$ denotes the extra latency that exists before the pipeline is inserted; and $e_{ij}.balance$ denotes the balancing latency produced after the pipeline is inserted.
In a feasible embodiment of the present application, the step of inserting pipeline stages on the connection lines between nodes in the control data flow graph to perform delay balancing includes:
constructing an objective function of the area overhead from the balancing latency, the objective function being:
where $e_{ij}.width$ is the maximum data bit width of the pipeline between node $v_i$ and node $v_j$.
The advantages and beneficial effects of the present invention are set forth in part in the description below; other parts can be learned through the specific embodiments of the invention:
The technical solution of the present application proposes a full-flow layout method in which high-level synthesis, based on a floorplanning algorithm, guides the physical layout constraints of an FPGA. The method constructs a control data flow graph from the circuit description of the target circuit and partitions the control data flow graph with the floorplanning algorithm to obtain layout constraints. On the basis of the layout constraints, the resources of the target circuit are scheduled and bound to obtain a register transfer level description, which is then synthesized, placed, and routed to obtain the netlist of the target circuit. Guiding the layout from high-level synthesis reduces place-and-route congestion and also reduces the delay increase caused by crossing FPGA block boundaries during routing.
To explain the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a flow chart of the steps of the process layout method for high-level synthesis provided in the technical solution of the present application;
Figure 2 is a schematic diagram of the control data flow graph in the technical solution of the present application;
Figure 3 is a schematic diagram of the iterative partitioning process in the technical solution of the present application;
Figure 4(a) is the first schematic diagram of delay balancing in the technical solution of the present application;
Figure 4(b) is the second schematic diagram of delay balancing in the technical solution of the present application;
Figure 5 is a schematic diagram of the FIFO pipeline in the technical solution of the present application.
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting it. The step numbers in the following embodiments are provided only for ease of explanation and do not limit the order of the steps; the execution order of the steps in the embodiments can be adapted according to the understanding of a person skilled in the art.
In current related solutions, particularly in FPGA placement and routing, the physical synthesizer tends to use resources that are close together, which easily causes place-and-route congestion, increases the overall wire length, and thus reduces the throughput of the circuit. To address these defects, the technical solution of the present application proposes a full-flow layout method in which a high-level synthesis tool, based on a floorplanning algorithm, guides the physical layout constraints of an FPGA.
In a first aspect, as shown in Figure 1, the technical solution of the present application provides a process layout method for high-level synthesis; the method includes steps S100-S400:
S100. Obtain a circuit description of a target circuit, and construct a control data flow graph corresponding to the target circuit from the circuit description;
As shown in Figure 2, the control data flow graph is a directed graph that represents the operations of the target circuit. In Figure 2, solid lines denote data dependencies, dashed lines denote control dependencies, and triangle symbols denote branch operations. In the embodiment, the circuit description includes, but is not limited to, a description in VHDL or Verilog of the signal inputs, the components, and the logic operations performed by the components of the target circuit; the target circuit may be a hardware circuit in a real scenario. Specifically, the embodiment first obtains the input circuit description and constructs the control data flow graph. The control data flow graph (Control Data Flow Graph, CDFG) is a directed graph G=<V,E>, where V denotes the set of all nodes, each node representing one operation in the target circuit, and E denotes the set of all directed connection lines, each directed edge between two nodes representing a data or control dependency between the two corresponding operations. The control-dependency edges reflect the control dependencies of the circuit description, and the data-dependency edges reflect its data dependencies. To construct the graph, the embodiment first uses a compiler front end to generate intermediate code from the behavioral-level high-level-language code in the circuit description; it then uses a compiler back end to map the variables in the circuit description to nodes and the control and data dependencies to directed edges, thereby building the control data flow graph.
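The graph G=<V,E> described above can be sketched as a small data structure. This is a minimal illustration only: the class `CDFG`, its methods, and the example behaviour in `build_example` are hypothetical names chosen here, not part of the application.

```python
from dataclasses import dataclass, field


@dataclass
class CDFG:
    """Directed graph G = <V, E>; every edge is tagged 'data' or 'control'."""
    nodes: dict = field(default_factory=dict)  # node id -> operation name
    edges: list = field(default_factory=list)  # (src, dst, kind) triples

    def add_op(self, node_id, op):
        self.nodes[node_id] = op

    def add_dep(self, src, dst, kind):
        if kind not in ("data", "control"):
            raise ValueError(f"unknown dependency kind: {kind}")
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges.append((src, dst, kind))

    def successors(self, node_id, kind=None):
        """Targets of edges leaving node_id, optionally filtered by kind."""
        return [d for s, d, k in self.edges
                if s == node_id and (kind is None or k == kind)]


def build_example():
    # Hypothetical behaviour: t = a + b; if (t > 0) y = t * c; else y = t - c
    g = CDFG()
    for node_id, op in [("add", "+"), ("cmp", ">"), ("br", "branch"),
                        ("mul", "*"), ("sub", "-")]:
        g.add_op(node_id, op)
    g.add_dep("add", "cmp", "data")    # solid lines in Figure 2
    g.add_dep("cmp", "br", "data")
    g.add_dep("add", "mul", "data")
    g.add_dep("add", "sub", "data")
    g.add_dep("br", "mul", "control")  # dashed lines in Figure 2
    g.add_dep("br", "sub", "control")
    return g
```

Tagging each edge with its kind keeps data and control dependencies in a single edge set E, matching the definition in the text, while still letting later passes filter by dependency type.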
S200. Partition the control data flow graph with a floorplanning algorithm to obtain layout constraints;
The floorplanning algorithm in the embodiment may be solved with a cutting-plane algorithm; a layout in the embodiment can be defined as a set of physical constraints that control how logic is placed in the model. Specifically, the embodiment first determines, from the FPGA architecture corresponding to the target circuit, the number of partitions, the resources, and the maximum resource utilization. It then compiles the function corresponding to each dataflow process in the control data flow graph into a register transfer level (RTL) module and places the modules in an initial partition. Based on the cutting-plane algorithm, the current partition is split in two horizontally or vertically, the cost function is evaluated, and the scheme with the lowest cost is selected. The layout constraints of the target circuit in the FPGA architecture are determined from the resulting scheme.
In some feasible implementations, step S200 of partitioning the control data flow graph with a floorplanning algorithm to obtain layout constraints may include steps S210-S240:
S210. Obtain the FPGA architecture of the target circuit from the circuit description;
S220. Compile the functions corresponding to the dataflow processes in the control data flow graph to obtain register transfer level circuit modules;
S230. Partition the FPGA architecture according to the circuit modules, and determine the cost function of the partition result;
S240. Perform split iterations on the partition result of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and output the layout constraints;
Specifically, the FPGA architecture contains multiple blocks, and the resources of different blocks may differ. A good floorplan helps reduce routing congestion and improves the achievable quality of results (QoR) for timing. In the embodiment, the layout constraints use Pblock constraints to specify resource partitions; Pblock boundaries can be defined by clock-region boundaries rather than by SLICE, BRAM, or DSP ranges, which helps limit clock skew and aids overall clock placement in the design. Based on the control data flow graph, the embodiment partitions the HLS design on the basis of the dataflow programming style: the HLS design is streaming, and its structure is described as a directed graph. In the directed graph, nodes represent units that perform computation, and the connection lines between nodes describe data transfer paths; adjacent nodes transfer data over the connection lines, each node consumes data to compute, and the data it produces is written to an input/output queue as the input of the next computation unit.
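To make the Pblock-based layout constraints concrete, the sketch below turns a module-to-partition assignment into constraint lines. It assumes a Xilinx Vivado-style flow, where `create_pblock`, `resize_pblock`, and `add_cells_to_pblock` are the Tcl/XDC commands for this purpose; the application itself does not name a tool, and the function name, cell names, and clock-region ranges here are hypothetical.

```python
def pblock_constraints(assignment, regions):
    """Return XDC/Tcl lines that place each module's cells in its Pblock.

    assignment: module (cell) name -> partition id
    regions:    partition id -> clock-region range string
    """
    lines = []
    for part in sorted(regions):
        name = f"pblock_{part}"
        lines.append(f"create_pblock {name}")
        # Size the Pblock by clock-region boundaries, as the text suggests.
        lines.append(
            f"resize_pblock [get_pblocks {name}] -add {{{regions[part]}}}")
        for cell in sorted(c for c, p in assignment.items() if p == part):
            lines.append(
                f"add_cells_to_pblock [get_pblocks {name}] [get_cells {cell}]")
    return lines


# Hypothetical two-module design split across two clock-region groups.
example = pblock_constraints(
    {"u_load": 0, "u_compute": 1},
    {0: "CLOCKREGION_X0Y0:CLOCKREGION_X1Y1",
     1: "CLOCKREGION_X0Y2:CLOCKREGION_X1Y3"},
)
```

The resulting lines would be written to a constraints file and passed to the physical synthesizer together with the timing constraints in step S400.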
For example, the HLS design in the example uses a dataflow programming model in which each function corresponds to one dataflow process and one RTL module, and modules communicate through FIFOs. A graph G=<V,E> is then constructed, where V denotes the set of dataflow processes, each node representing one function, and E denotes the set of FIFO channels between vertices.
S300. Schedule the control data flow graph, and bind the scheduling result to obtain a register transfer level description;
Specifically, the subgraphs of the control data flow graph are scheduled, and delay balancing is performed on the control data flow graph; in some feasible embodiments, step S300 may include steps S310-S330:
S310. Schedule the subgraphs of the control data flow graph;
S320. Insert pipeline stages on the connection lines between nodes in the control data flow graph to perform delay balancing;
S330. Mathematically integrate the scheduling result and the delay-balanced result, and bind the integrated result to the target circuit to obtain the register transfer level description.
Specifically, the subgraphs of the control data flow graph are scheduled using the default method of the high-level synthesis tool; pipeline stages are then inserted on the cut edges of the control data flow graph to balance delay; the results obtained in steps S310-S320 are mathematically integrated to obtain an overall scheduling result; and the overall scheduling result is bound to the resources of the FPGA architecture corresponding to the target circuit to obtain the register transfer level description.
S400. Obtain a target netlist from the register transfer level description according to the layout constraints, and determine the process layout of the target circuit according to the target netlist;
Specifically, based on the layout constraints obtained in step S200 and the register transfer level description obtained in step S300, synthesis and place-and-route operations are performed on the resources of the FPGA architecture to obtain the corresponding target netlist, from which the process layout of the target circuit is determined.
In some feasible implementations, the layout constraints include at least one of timing constraints or physical constraints, and the target netlist includes at least one of a synthesized netlist or a placed-and-routed netlist; step S400 of obtaining the target netlist from the register transfer level description according to the layout constraints and determining the process layout of the target circuit according to the target netlist may then include steps S410-S450:
S410. Construct a first input from the timing constraints and/or the physical constraints;
S420. Construct a second input from the register transfer level description;
S430. Synthesize the first input and the second input through an FPGA physical synthesizer to output the synthesized netlist;
S440. Perform placement and routing according to the first input and the second input through the FPGA physical synthesizer to output the placed-and-routed netlist;
S450. Determine the process layout of the target circuit according to the synthesized netlist and the placed-and-routed netlist.
Specifically, the layout constraints obtained in step S200, together with the timing constraints and physical constraints produced by the high-level synthesis tool itself, are supplied as the set of constraints for synthesis and implementation; this is the first input. The register transfer level description obtained in step S300 is supplied as the RTL input; this is the second input. The embodiment then runs the FPGA physical synthesizer to perform synthesis and placement and routing, obtaining the post-synthesis netlist and the post-place-and-route netlist, and determines the process layout of the target circuit from these netlists.
In the embodiment, step S230 partitions the FPGA architecture according to the circuit modules and determines the cost function of the partition result; the physical meaning of the cost function is the total number of wires that cross partition boundaries. The cost function in the embodiment is:
where $C$ is the cost value; $v_i$ and $v_j$ denote nodes in the control data flow graph, with $i = 1, 2, 3, \ldots, n$, $j = 1, 2, 3, \ldots, n$, and $n$ a positive integer; $E$ denotes the set of FIFO channels between nodes; $e_{ij}$ is the connection line between $v_i$ and $v_j$; row denotes the row number, col denotes the column number, and width denotes the data bit width.
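The formula itself is not reproduced in this text. One form consistent with the symbols defined above and with the stated meaning (wires crossing partition boundaries) weights each FIFO channel's bit width by the row and column distance between its endpoints. The following is an assumed reconstruction, not the application's own formula:

```latex
C = \sum_{e_{ij} \in E} e_{ij}.width \times
    \left( \lvert v_i.row - v_j.row \rvert + \lvert v_i.col - v_j.col \rvert \right)
```

Under this form, a channel whose two modules sit in the same partition contributes nothing, and a wide channel that spans several partitions contributes the most.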
In the embodiment, step S240 performs split iterations on the partition result of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and outputs the layout constraints; the resource constraint is expressed as follows:
where $v_d$ denotes the partition space assigned to node $v$, $v_{area}$ denotes the resources required by the node, $r_v$ denotes the set of nodes contained in the current partition $r$, and $(r_{child})_{area}$ denotes the amount of resources in each partition.
In the embodiment, step S240 of performing split iterations on the partition result of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and outputting the layout constraints, may include steps S241-S242:
S241. Obtain first coordinates of the nodes in the control data flow graph before a split iteration, determine a coordinate transformation relationship according to the split mode, and transform the first coordinates into second coordinates according to the coordinate transformation relationship;
S242. The split mode includes a horizontal split or a vertical split;
Specifically, as shown in Figure 3, partitioning can be viewed as iteratively splitting a region in two until the cost function is minimal or the constraints can no longer be satisfied, after which pipeline FIFOs are added. The first step maps all functions to RTL modules and places them in a single partition, called the initial partition; here module 1 points to modules 2, 3, and 4, modules 2, 3, and 4 point to module 5, and modules 2 and 3 use comparatively few resources. The second step splits the partition in two vertically, with modules 1, 2, and 3 in the upper half and modules 4 and 5 in the lower half. The third step splits each partition in two horizontally, leaving modules 2 and 3 at the upper left, module 1 at the upper right, module 4 at the lower left, and module 5 at the lower right. The last step adds FIFO pipelines to the wires that cross block boundaries to preserve the throughput of the circuit design.
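A single split of the Figure 3 walk-through can be sketched as an exhaustive search over the five modules. This is an illustration under stated assumptions: the areas, bit widths, and per-half capacity below are hypothetical numbers (the text says only that modules 2 and 3 are small), and brute-force enumeration stands in for the cutting-plane solver named in the application.

```python
from itertools import product

# Hypothetical data for the five modules of the Figure 3 walk-through.
AREA = {1: 4, 2: 1, 3: 1, 4: 4, 5: 4}          # modules 2 and 3 are small
FIFOS = {(1, 2): 2, (1, 3): 2, (1, 4): 1,      # (src, dst) -> bit width
         (2, 5): 1, (3, 5): 1, (4, 5): 1}
CAPACITY = 8                                    # resources available per half


def cut_cost(side):
    """Total bit width of FIFO channels whose endpoints lie in different halves."""
    return sum(w for (i, j), w in FIFOS.items() if side[i] != side[j])


def fits(side):
    """Assumed resource constraint: each half holds at most CAPACITY."""
    used = [0, 0]
    for v, d in side.items():
        used[d] += AREA[v]
    return max(used) <= CAPACITY


def best_bipartition():
    """Try every assignment v -> {0, 1} and keep the cheapest feasible one."""
    best = None
    nodes = sorted(AREA)
    for bits in product((0, 1), repeat=len(nodes)):
        side = dict(zip(nodes, bits))
        # Pin module 1 to half 0 so mirrored assignments are not counted twice.
        if side[1] != 0 or not fits(side):
            continue
        cost = cut_cost(side)
        if best is None or cost < best[0]:
            best = (cost, side)
    return best
```

With these assumed numbers the search places modules 1, 2, and 3 in one half and modules 4 and 5 in the other at a cost of 3, which agrees with the second step described above. A real flow would repeat the split on each half and replace the enumeration with an integer-programming solver, since the search space doubles with every module.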
Further, the coordinate transformation relationship in the embodiment is expressed as follows:
where $v.row$ denotes the row coordinate of the second coordinates, $v.col$ denotes the column coordinate of the second coordinates, $(v.row)_{prev}$ denotes the row coordinate of the first coordinates, $(v.col)_{prev}$ denotes the column coordinate of the first coordinates, $v_d$ denotes the partition space assigned to node $v$, "vertical partition" denotes a split in the horizontal direction, and "horizontal partition" denotes a split in the vertical direction.
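The transformation formula is not reproduced in this text. A common rule for recursive bipartitioning, assumed here purely for illustration, doubles the coordinate along the split axis and adds the child index $v_d \in \{0, 1\}$, leaving the other coordinate unchanged. The function name and the pairing of "vertical" with the row index are likewise assumptions.

```python
def refine_coordinates(prev_row, prev_col, d, partition):
    """Return (row, col) after one bipartition step.

    d is 0 or 1: the child region the node was assigned to.
    partition is 'vertical' (row index refined, by assumption) or
    'horizontal' (column index refined, by assumption).
    """
    if d not in (0, 1):
        raise ValueError("d must be 0 or 1")
    if partition == "vertical":
        return prev_row * 2 + d, prev_col
    if partition == "horizontal":
        return prev_row, prev_col * 2 + d
    raise ValueError(f"unknown partition kind: {partition}")
```

Under this rule, a node that starts at (0, 0) and is sent to child 1 in a vertical partition moves to (1, 0); a following horizontal partition with child 1 moves it to (1, 1). Each split adds one bit of resolution to one coordinate, so the coordinates remain directly usable in the row and column distances of the cost function.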
In the embodiment, step S320 inserts pipeline stages on the connection lines between nodes in the control data flow graph to perform delay balancing.
Specifically, given a dataflow graph G<V,E> that has already been partitioned and pipelined, each vertex v∈V represents a function in the dataflow design and each edge e∈E represents a FIFO channel between functions; the width e.width is the bit width of the edge, the latency e.lat is the extra latency inserted in the previous pipelining step, and the balancing latency e.balance is the balancing latency added in the current step. For each edge e∈E, the total latency of each path can be expressed as:
where $\{p_1, p_2\}$ denotes a pair of reconvergent paths. Further, for each edge (connection line) $e_{ij}$ in delay balancing, $S_i \ge S_j + e_{ij}.lat$ holds, and the additional balancing latency can be expressed as:
$$e_{ij}.balance = S_i - S_j - e_{ij}.lat$$
where $S_i$ denotes the time step of node $v_i$, $S_j$ denotes the time step of node $v_j$, and $S_i - S_j$ denotes the maximum delay over all paths between node $v_i$ and node $v_j$; $e_{ij}.lat$ denotes the extra latency inserted by the previous pipelining step on the longest path between vertices $v_i$ and $v_j$; and $e_{ij}.balance$ denotes the extra latency inserted by the current pipelining step on the longest path between vertices $v_i$ and $v_j$.
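The path-latency condition for a pair of reconvergent paths $\{p_1, p_2\}$ is not reproduced in this text. Since the purpose of balancing is for both paths to deliver data after the same number of cycles, one form consistent with the definitions of e.lat and e.balance is the following (an assumed reconstruction):

```latex
\sum_{e \in p_1} \left( e.lat + e.balance \right)
  = \sum_{e \in p_2} \left( e.lat + e.balance \right)
```

Substituting $e_{ij}.balance = S_i - S_j - e_{ij}.lat$ makes each term equal to $S_i - S_j$, so both sums telescope to the difference between the time steps of the shared start and end nodes, and the condition holds automatically for any valid assignment of time steps.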
For example, as shown in Figure 4, Figure 4(a) shows cut-set 1 in the delay-balancing process, and Figure 4(b) shows cut-set 2 and cut-set 3. Edges e13, e37, and e27 are pipelined according to the floorplan partition, so each carries 1 unit of inserted latency. Assume further that e14 has a bit width of 2 and all other edges have a bit width of 1. In the delay-balancing step, the optimal solution adds 2 units of latency to each of e47, e57, and e67, and 1 unit of latency to e12. Note that e27 and e37 can belong to the same cut-set.
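The Figure 4 example can be checked numerically with a small sketch. The topology is an assumption: node 1 fans out to nodes 2 through 6, which all converge on node 7. Only the latencies of e13, e27, and e37 and the width of e14 come from the text; edges e15 and e16 and every other value are a hypothetical completion. Exhaustive search stands in for whatever solver the embodiment uses.

```python
from itertools import product

# (i, j) -> (lat, width). Assumed topology; see the lead-in for what is given.
EDGES = {
    (1, 2): (0, 1), (1, 3): (1, 1), (1, 4): (0, 2), (1, 5): (0, 1),
    (1, 6): (0, 1), (2, 7): (1, 1), (3, 7): (1, 1), (4, 7): (0, 1),
    (5, 7): (0, 1), (6, 7): (0, 1),
}


def balances(steps):
    """e_ij.balance = S_i - S_j - e_ij.lat per edge; None if any is negative."""
    out = {}
    for (i, j), (lat, _width) in EDGES.items():
        b = steps[i] - steps[j] - lat
        if b < 0:
            return None
        out[(i, j)] = b
    return out


def area_cost(bal):
    """Sum of balance * width over all edges (the area overhead)."""
    return sum(bal[e] * EDGES[e][1] for e in EDGES)


def min_area_cost(max_step=2):
    """Search all integer time steps in 0..max_step with S_7 fixed to 0."""
    best = None
    for s in product(range(max_step + 1), repeat=6):
        steps = dict(zip(range(1, 7), s))
        steps[7] = 0
        bal = balances(steps)
        if bal is not None:
            cost = area_cost(bal)
            best = cost if best is None else min(best, cost)
    return best


# Time steps that reproduce the solution stated in the text.
TEXT_STEPS = {1: 2, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 0}
TEXT_BALANCE = balances(TEXT_STEPS)
```

Under the assumed topology, `TEXT_BALANCE` gives 1 unit on e12 and 2 units on each of e47, e57, and e67, for an area cost of 7, which equals the minimum found by the search. The balancing latency for the path through node 4 lands on e47 rather than e14 because e14 is twice as wide, so delaying it would cost twice the area.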
As shown in Figure 5, after partitioning, the partitions are connected through FIFOs so that the connections can be pipelined. With FIFOs, matching interface signals can be scheduled directly without affecting functionality, and the circuit functions can run in parallel.
In some feasible implementations, step S320 of inserting pipeline stages on the connection lines between nodes in the control data flow graph to perform delay balancing may further include step S321:
S321. Construct an objective function of the area overhead from the balancing latency;
Specifically, the optimization goal of delay balancing is to minimize the total area overhead, taking the bit-width cost of each edge into account. The objective function in the embodiment is:
where $e_{ij}.width$ is the maximum data bit width of the pipeline between node $v_i$ and node $v_j$.
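The objective function is not reproduced in this text. Given that the goal is to minimize total area overhead while accounting for each edge's bit width, one consistent form weights every edge's balancing latency by its width. This is an assumed reconstruction:

```latex
\min \sum_{e_{ij} \in E} e_{ij}.balance \times e_{ij}.width
\quad \text{subject to} \quad S_i \ge S_j + e_{ij}.lat
\quad \text{for every } e_{ij} \in E
```

Because $e_{ij}.balance = S_i - S_j - e_{ij}.lat$ is linear in the time steps, this form is a linear program over the variables $S_i$.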
From the specific implementation described above, it can be concluded that the technical solution provided by the present invention has the following advantages over the prior art:
The invention can be applied to the high-level synthesis design of medium and large dataflow programming models. It exploits the rapid-prototyping capability of high-level synthesis tools while extending the design flow from high-level language to hardware description language and on to physical layout. A full-flow design method in which the high-level synthesis tool guides physical layout can further improve the placement and routing of high-level synthesis designs, reduce layout congestion, and preserve the overall throughput of the circuit.
For example, when a RISC-V CPU design is built once with default HLS and once with layout-constrained HLS, the layout-constrained design can be seen to partition and pipeline the modules with the highest resource usage (the CPU module, FFT module, USB1 module, and USB2 module) and to distribute them across adjacent but distinct blocks. With the timing constraints met, the timing margin is essentially unchanged, but congestion is greatly improved and the run time of physical synthesis is also reduced.
在一些可选择的实施例中,在方框图中提到的功能/操作可以不按照操作示图提到的顺序发生。例如,取决于所涉及的功能/操作,连续示出的两个方框实际上可以被大体上同时地执行或所述方框有时能以相反顺序被执行。此外,在本发明的流程图中所呈现和描述的实施例以示例的方式被提供,目的在于提供对技术更全面的理解。所公开的方法不限于本文所呈现的操作和逻辑流程。可选择的实施例是可预期的,其中各种操作的顺序被改变以及其中被描述为较大操作的一部分的子操作被独立地执行。In some alternative embodiments, the functions/operations noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending on the functionality/operations involved. Furthermore, the embodiments presented and described in the flow diagrams of the present invention are provided by way of example for the purpose of providing a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logical flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of a larger operation are performed independently.
Furthermore, although the invention has been described in the context of functional modules, it should be understood that, unless stated to the contrary, one or more of the functions and/or features may be integrated into a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be understood that a detailed discussion of the actual implementation of each module is not necessary for understanding the invention. Rather, given the attributes, functions, and internal relationships of the various functional modules in the apparatus disclosed herein, the actual implementation of those modules will be within the routine skill of an engineer. Therefore, a person skilled in the art, using ordinary skill, can implement the invention set forth in the claims without undue experimentation. It will also be understood that the specific concepts disclosed are merely illustrative and are not intended to limit the scope of the invention, which is to be determined by the full scope of the appended claims and their equivalents.
The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them).
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, such references do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will appreciate that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the invention. The scope of the invention is defined by the claims and their equivalents.
The foregoing is a detailed description of preferred embodiments of the present invention, but the invention is not limited to these embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the invention, and all such equivalent modifications or substitutions fall within the scope defined by the claims of this application.
Claims (10)
- A process layout method for high-level synthesis, characterized by comprising the following steps: obtaining a circuit description of a target circuit, and constructing from the circuit description a control data flow graph corresponding to the target circuit, the control data flow graph being a directed graph representing the operations of the target circuit; partitioning the control data flow graph by a floorplanning algorithm to obtain layout constraints; scheduling the control data flow graph, and binding the scheduling result to obtain a register-transfer-level description; and obtaining a target netlist from the register-transfer-level description according to the layout constraints, and determining the process layout of the target circuit according to the target netlist.
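As an illustration of the first step only, the sketch below models a control data flow graph as a directed graph in Python. The class `Cdfg`, its methods, and the node names are hypothetical and do not come from the patent text. The topological ordering assumes an acyclic graph, a simplification that a real control data flow graph containing loops would not satisfy.

```python
from dataclasses import dataclass, field


@dataclass
class Cdfg:
    """Directed graph: nodes are operations, edges carry data between them."""
    nodes: set = field(default_factory=set)
    edges: dict = field(default_factory=dict)  # (src, dst) -> data bit width

    def add_edge(self, src, dst, width):
        self.nodes.update((src, dst))
        self.edges[(src, dst)] = width

    def successors(self, node):
        return sorted(dst for (src, dst) in self.edges if src == node)

    def topological_order(self):
        # Kahn's algorithm; raises if the graph is not acyclic.
        indegree = {n: 0 for n in self.nodes}
        for (_, dst) in self.edges:
            indegree[dst] += 1
        ready = sorted(n for n, d in indegree.items() if d == 0)
        order = []
        while ready:
            node = ready.pop(0)
            order.append(node)
            for succ in self.successors(node):
                indegree[succ] -= 1
                if indegree[succ] == 0:
                    ready.append(succ)
        if len(order) != len(self.nodes):
            raise ValueError("graph has a cycle")
        return order


# Hypothetical four-operation circuit.
g = Cdfg()
g.add_edge("load", "mul", 32)
g.add_edge("load", "add", 32)
g.add_edge("mul", "add", 64)
g.add_edge("add", "store", 64)
order = g.topological_order()
print(order)
```

An ordering of this kind is one possible starting point for the scheduling step, since every operation appears after the operations that feed it.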
- The process layout method for high-level synthesis according to claim 1, characterized in that partitioning the control data flow graph by a floorplanning algorithm to obtain layout constraints comprises: obtaining the FPGA architecture of the target circuit from the circuit description; compiling the functions corresponding to the data flow processes in the control data flow graph to obtain register-transfer-level circuit modules; partitioning the FPGA architecture according to the circuit modules, and determining a cost function of the partition result; and iteratively splitting the partition result of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the threshold of a resource constraint, and outputting the layout constraints.
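A single splitting step of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed algorithm: it enumerates every two-way assignment by brute force (the text does not say how the minimum is searched for), it takes the cost to be the total bit width of the channels cut by the split, and the function name `best_bipartition`, the area figures, and the capacity are all hypothetical. Only the module names are borrowed from the RISC-V example in the description.

```python
from itertools import product


def best_bipartition(areas, edges, capacity):
    """Try every 0/1 assignment; keep the feasible one with the smallest cut.

    areas:    {node: resource units the module needs}   (hypothetical units)
    edges:    {(src, dst): bit width of the channel}
    capacity: resource units available in each half of the region
    """
    nodes = sorted(areas)
    best = None
    for bits in product((0, 1), repeat=len(nodes)):
        side = dict(zip(nodes, bits))
        used = [0, 0]
        for n in nodes:
            used[side[n]] += areas[n]
        if max(used) > capacity:
            continue  # violates the resource constraint of one half
        cut = sum(w for (a, b), w in edges.items() if side[a] != side[b])
        if best is None or cut < best[0]:
            best = (cut, side)
    return best


areas = {"cpu": 6, "fft": 5, "usb1": 2, "usb2": 2}
edges = {
    ("cpu", "fft"): 64,
    ("cpu", "usb1"): 8,
    ("cpu", "usb2"): 8,
    ("usb1", "usb2"): 32,
}
cut, side = best_bipartition(areas, edges, capacity=9)
print(cut, side)
```

In this toy input the wide `usb1`/`usb2` channel stays inside one half, and only the narrower channels and the unavoidable `cpu`/`fft` channel cross the boundary. Repeating the step on each half would give the iterative refinement the claim describes.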
- The process layout method for high-level synthesis according to claim 1, characterized in that scheduling the control data flow graph and binding the scheduling result to obtain a register-transfer-level description comprises: scheduling subgraphs of the control data flow graph; inserting pipelines on the connections between nodes in the control data flow graph to balance latency; and mathematically combining the scheduling result with the latency-balanced result, and binding the combined result to the target circuit to obtain the register-transfer-level description.
- The process layout method for high-level synthesis according to claim 1, characterized in that the layout constraints include at least one of timing constraints or physical constraints; the target netlist includes at least one of a synthesized netlist or a place-and-route netlist; and obtaining a target netlist from the register-transfer-level description according to the layout constraints and determining the process layout of the target circuit according to the target netlist comprises: constructing a first input from the timing constraints and/or the physical constraints; constructing a second input from the register-transfer-level description; combining the first input and the second input in an FPGA physical synthesizer to output the synthesized netlist; performing placement and routing in the FPGA physical synthesizer according to the first input and the second input to output the place-and-route netlist; and determining the process layout of the target circuit according to the synthesized netlist and the place-and-route netlist.
- The process layout method for high-level synthesis according to claim 2, characterized in that the cost function characterizes the number of wires crossing the partition boundaries of the FPGA architecture; the cost function is: [equation not reproduced in the source text] where C is the cost value; v_i and v_j denote nodes in the control data flow graph, with i = 1, 2, 3, …, n, j = 1, 2, 3, …, n, and n a positive integer; E denotes the set of FIFO channels between nodes; e_ij is the connection between v_i and v_j; row denotes the row index; col denotes the column index; and width denotes the data bit width.
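The equation itself is missing from the text, so the following Python sketch is only one plausible reading of the symbols listed: each FIFO channel contributes its bit width multiplied by the Manhattan distance, in rows and columns, between the partitions holding its two endpoints. That form, the function name `wire_cost`, and the sample coordinates are assumptions, not the patent's stated formula.

```python
def wire_cost(coords, edges):
    """Width-weighted Manhattan distance summed over all channels.

    coords: {node: (row, col)} partition coordinates of each node
    edges:  {(v_i, v_j): width} bit width of the FIFO channel e_ij
    ASSUMPTION: this form is inferred from the symbol list, not quoted.
    """
    total = 0
    for (src, dst), width in edges.items():
        r1, c1 = coords[src]
        r2, c2 = coords[dst]
        total += width * (abs(r1 - r2) + abs(c1 - c2))
    return total


# Hypothetical 2x2 placement of four modules.
coords = {"cpu": (0, 0), "fft": (0, 1), "usb1": (1, 0), "usb2": (1, 1)}
edges = {("cpu", "fft"): 64, ("cpu", "usb2"): 8, ("usb1", "usb2"): 32}
cost = wire_cost(coords, edges)
print(cost)
```

Under this reading, a channel whose endpoints share a partition costs nothing, and wide channels that span several partitions dominate the total, which is consistent with a cost meant to count boundary-crossing wires.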
- The process layout method for high-level synthesis according to claim 2, characterized in that the resource constraint is expressed as follows: [equation not reproduced in the source text] where v_d denotes the partition space assigned to node v, v_area denotes the resources the node requires, r_v denotes the set of nodes held by the current partition r, and (r_child)_area denotes the amount of resources in each partition.
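The expression is missing from the text. One form consistent with the symbols listed, offered purely as an assumption, is that the resources required by the nodes assigned to a child partition must not exceed the resources that child provides:

```latex
\[
  \sum_{\substack{v \in r_v \\ v_d = \mathit{child}}} v_{\mathit{area}}
  \;\le\; (r_{\mathit{child}})_{\mathit{area}}
  \qquad \text{for each child partition of } r .
\]
```

Here the condition $v_d = \mathit{child}$ selects the nodes of $r_v$ that the split assigns to that child. The index set and the inequality direction are inferred, not quoted.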
- The process layout method for high-level synthesis according to claim 2, characterized in that iteratively splitting the partition result of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the threshold of the resource constraint, and outputting the layout constraints, comprises: obtaining first coordinates of a node in the control data flow graph before a splitting iteration, determining a coordinate transformation according to the splitting mode, and transforming the first coordinates into second coordinates according to the coordinate transformation; wherein the splitting mode is a horizontal split or a vertical split.
- The process layout method for high-level synthesis according to claim 7, characterized in that the coordinate transformation is expressed as follows: [equation not reproduced in the source text] where v.row denotes the row coordinate of the second coordinates, v.col denotes the column coordinate of the second coordinates, (v.row)_prev denotes the row coordinate of the first coordinates, (v.col)_prev denotes the column coordinate of the first coordinates, v_d denotes the partition space assigned to node v, "vertical partition" denotes a split along the horizontal direction, and "horizontal partition" denotes a split along the vertical direction.
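Because the transformation itself is missing from the text, the Python sketch below shows only one plausible form: a split doubles the resolution along one axis, so the affected coordinate becomes twice its previous value plus the half (0 or 1) the node was assigned to, while the other coordinate is unchanged. The doubling rule, the function name `refine`, and the mode names `"rows"` and `"cols"` are assumptions. The mode names are deliberately neutral because the text maps "vertical partition" and "horizontal partition" to directions in a way this sketch does not try to resolve.

```python
def refine(prev_row, prev_col, v_d, split):
    """Return (row, col) after one split.

    v_d is 0 or 1: the half of the split region assigned to the node.
    split is "rows" when the split doubles the number of rows and
    "cols" when it doubles the number of columns.
    ASSUMPTION: the doubling rule is inferred, not quoted from the claim.
    """
    if v_d not in (0, 1):
        raise ValueError("v_d must be 0 or 1")
    if split == "rows":
        return prev_row * 2 + v_d, prev_col
    if split == "cols":
        return prev_row, prev_col * 2 + v_d
    raise ValueError("split must be 'rows' or 'cols'")


# A node starts at (0, 0) and is refined by three successive splits.
pos = (0, 0)
pos = refine(*pos, v_d=1, split="rows")
pos = refine(*pos, v_d=1, split="cols")
pos = refine(*pos, v_d=0, split="rows")
print(pos)
```

With this rule, coordinates computed after any number of splits remain unique grid positions, so distances between nodes can be recomputed after each iteration.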
- The process layout method for high-level synthesis according to claim 3, characterized in that, in inserting pipelines on the connections between nodes in the control data flow graph to balance latency, the latency balance is expressed as $e_{ij}.balance = S_i - S_j - e_{ij}.lat$, where S_i denotes the time step of node v_i, S_j denotes the time step of node v_j, and S_i − S_j denotes the maximum latency over all paths between node v_i and node v_j; e_ij.lat denotes the extra latency already present before the pipeline is inserted; and e_ij.balance denotes the balancing latency produced after the pipeline is inserted.
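The balance expression can be evaluated on a small graph as follows. The expression `S_i - S_j - lat` is taken from the claim. How the time steps are obtained is not stated, so this sketch assumes that a node's time step is the longest accumulated latency from that node to a single sink, on an acyclic graph in which every node reaches the sink. The function name `balance_latencies` and the example graph are hypothetical.

```python
def balance_latencies(edges, sink):
    """Compute e_ij.balance = S_i - S_j - e_ij.lat for every edge.

    edges: {(v_i, v_j): lat} latency already present on each connection
    sink:  the single node that every other node eventually reaches
    ASSUMPTION: S is the longest-path latency to the sink.
    """
    succs = {}
    for (src, dst), lat in edges.items():
        succs.setdefault(src, []).append((dst, lat))

    memo = {sink: 0}

    def time_step(node):
        if node not in memo:
            memo[node] = max(lat + time_step(dst) for dst, lat in succs[node])
        return memo[node]

    return {
        (i, j): time_step(i) - time_step(j) - lat
        for (i, j), lat in edges.items()
    }


# Two reconvergent paths from "a" to "d": a-b-d has latency 3, a-c-d has 0.
edges = {("a", "b"): 2, ("b", "d"): 1, ("a", "c"): 0, ("c", "d"): 0}
balance = balance_latencies(edges, sink="d")
print(balance)
```

In this example only the `a` to `c` connection receives extra stages, three of them, which brings the short path up to the latency of the long one so that data on both paths arrives at `d` in the same time step.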
- The process layout method for high-level synthesis according to claim 9, characterized in that the step of inserting pipelines on the connections between nodes in the control data flow graph to balance latency comprises: constructing an objective function for area overhead from the balancing latency, the objective function being: [equation not reproduced in the source text] where e_ij.width is the maximum data bit width of the pipeline between node v_i and node v_j.
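The objective function is missing from the text. Since each inserted pipeline stage on a connection costs roughly one register per bit, a form consistent with the symbols defined in claims 9 and 10, given here only as an assumption, is the width-weighted sum of the balancing latencies:

```latex
\[
  \min \sum_{e_{ij} \in E} e_{ij}.\mathit{balance} \times e_{ij}.\mathit{width}
\]
```

Here $E$ is taken to be the set of connections between nodes, as in the cost function of claim 5. For instance, three balancing stages on a 32-bit connection would contribute $3 \times 32 = 96$ to the sum under this reading.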
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211039147.2 | 2022-08-29 | ||
CN202211039147.2A CN115422876A (en) | 2022-08-29 | 2022-08-29 | High-Level Synthesis Process Layout Method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024045435A1 true WO2024045435A1 (en) | 2024-03-07 |
Family
ID=84199444
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/141006 WO2024045435A1 (en) | 2022-08-29 | 2022-12-22 | Process layout method for high-level synthesis |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115422876A (en) |
WO (1) | WO2024045435A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118350338A (en) * | 2024-04-19 | 2024-07-16 | 深圳市瑞凯鸿辰科技有限公司 | Chip design method and chip for optimizing space utilization |
CN119294328A (en) * | 2024-12-13 | 2025-01-10 | 中科亿海微电子科技(苏州)有限公司 | FPGA layout method, storage medium and electronic device based on data flow planning |
CN119378461A (en) * | 2024-12-31 | 2025-01-28 | 北京开源芯片研究院 | Design method, device, electronic device and readable storage medium of network on chip |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115422876A (en) * | 2022-08-29 | 2022-12-02 | 中山大学 | High-Level Synthesis Process Layout Method |
CN117236253B (en) * | 2023-11-10 | 2024-02-02 | 苏州异格技术有限公司 | FPGA wiring method and device, computer equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102419789A (en) * | 2011-12-16 | 2012-04-18 | 中山大学 | A high-level synthesis method and system |
CN104239595A (en) * | 2013-06-24 | 2014-12-24 | 阿尔特拉公司 | Method and Apparatus for Implementing a System-Level Design Tool for Design Planning and Architecture Exploration |
CN107086218A (en) * | 2016-02-12 | 2017-08-22 | 格罗方德半导体公司 | Placement and routing methods for implementing back-biasing in FDSOI |
CN115422876A (en) * | 2022-08-29 | 2022-12-02 | 中山大学 | High-Level Synthesis Process Layout Method |
-
2022
- 2022-08-29 CN CN202211039147.2A patent/CN115422876A/en active Pending
- 2022-12-22 WO PCT/CN2022/141006 patent/WO2024045435A1/en unknown
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118350338A (en) * | 2024-04-19 | 2024-07-16 | 深圳市瑞凯鸿辰科技有限公司 | Chip design method and chip for optimizing space utilization |
CN118350338B (en) * | 2024-04-19 | 2025-02-18 | 厦门锐信图芯科技有限公司 | Chip design method and chip for optimizing space utilization |
CN119294328A (en) * | 2024-12-13 | 2025-01-10 | 中科亿海微电子科技(苏州)有限公司 | FPGA layout method, storage medium and electronic device based on data flow planning |
CN119378461A (en) * | 2024-12-31 | 2025-01-28 | 北京开源芯片研究院 | Design method, device, electronic device and readable storage medium of network on chip |
Also Published As
Publication number | Publication date |
---|---|
CN115422876A (en) | 2022-12-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22957245 Country of ref document: EP Kind code of ref document: A1 |