WO2024045435A1

WO2024045435A1 - Process layout method for high-level synthesis

Info

Publication number: WO2024045435A1
Application number: PCT/CN2022/141006
Authority: WO
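Inventors: 粟涛 (Su Tao); 周沁茗 (Zhou Qinming); 黄静雯 (Huang Jingwen); 郭家怡 (Guo Jiayi); 陈弟虎 (Chen Dihu); 王自鑫 (Wang Zixin)
Original assignee: 中山大学 (Sun Yat-sen University)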
Priority date: 2022-08-29
Filing date: 2022-12-22
Publication date: 2024-03-07
Also published as: CN115422876A

Abstract

Provided in the present invention is a process layout method for high-level synthesis. The method comprises the following steps: acquiring a circuit description of a target circuit and constructing from it a control data flow graph corresponding to the target circuit, wherein the control data flow graph is a directed graph that represents the computation operations in the target circuit; partitioning the control data flow graph by means of a floorplanning algorithm to obtain layout constraints; scheduling the control data flow graph and binding the scheduling result to obtain a register transfer level description; and processing the register transfer level description according to the layout constraints to obtain a target netlist, and determining a process layout of the target circuit according to the target netlist. By acting at the high-level synthesis stage, the method reduces congestion during placement and routing and also reduces the additional delay caused by wires crossing FPGA block boundaries during routing. It can be widely applied in the technical field of circuit simulation.

Description

Process layout method for high-level synthesis

Technical field

The invention relates to the technical field of circuit simulation, and in particular to a process layout method for high-level synthesis.

Background art

High-level synthesis (HLS) is the process of automatically converting logic structures described in a high-level language into circuit models described in a lower-abstraction language. HLS tools are efficient and fast: they shorten the design time of hardware engineers and also allow software engineers to complete hardware designs.

However, in related technical solutions, a major reason for the quality gap between HLS designs and manual designs is that it is difficult to estimate interconnect delay accurately at the HLS level and to obtain a good global physical layout. In field programmable gate array (Field Programmable Gate Array, FPGA) placement and routing in particular, physical synthesizers tend to use resources that are close together, which causes placement and routing congestion, increases the overall wire length, and thus reduces circuit throughput. In addition, FPGA resources are organized as Configurable Logic Blocks (CLBs), and different blocks may offer different resources. If interconnections that cross block boundaries are ignored, longer wires and higher delay easily result.

Summary of the invention

In view of this, and in order to at least partially solve one of the above technical problems or defects, embodiments of the present invention aim to provide a process layout method for high-level synthesis that effectively reduces placement and routing congestion and reduces delay.

The technical solution of this application provides a process layout method for high-level synthesis, including the following steps:

Obtain the circuit description of the target circuit, and construct a control data flow graph corresponding to the target circuit based on the circuit description; the control data flow graph is a directed graph representing the operations of the target circuit;

Partition the control data flow graph by means of a floorplanning algorithm to obtain layout constraints;

Schedule the control data flow graph, and bind the scheduling results to obtain a register transfer level description;

Process the register transfer level description according to the layout constraints to obtain a target netlist, and determine the process layout of the target circuit according to the target netlist.

In a feasible embodiment of the solution of this application, partitioning the control data flow graph by means of a floorplanning algorithm to obtain layout constraints includes:

Obtain the FPGA architecture of the target circuit described in the circuit description;

Compile the functions corresponding to the dataflow processes in the control data flow graph to obtain register transfer level (RTL) circuit modules;

Partition the FPGA architecture according to the circuit modules, and determine the cost function of the partitioning result;

Iteratively bisect the partitions of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and output the resulting layout constraints.

In a feasible embodiment of the solution of this application, scheduling the control data flow graph and binding the scheduling results to obtain a register transfer level description includes:

Schedule subgraphs of the control data flow graph;

Insert pipeline stages on the connection lines between nodes in the control data flow graph to perform delay balancing;

Merge the scheduling results with the delay-balancing results, and bind the merged results to the target circuit to obtain the register transfer level description.

In a feasible embodiment of the solution of the present application, the layout constraints include at least one of timing constraints or physical constraints, and the target netlist includes at least one of a synthesized netlist or a placement and routing netlist. Processing the register transfer level description according to the layout constraints to obtain a target netlist, and determining the process layout of the target circuit based on the target netlist, includes:

Construct a first input from the timing constraints and/or the physical constraints;

Construct a second input from the register transfer level description;

Using the FPGA physical synthesizer, synthesize the first input and the second input to output a synthesized netlist;

Using the FPGA physical synthesizer, perform placement and routing according to the first input and the second input to obtain a placement and routing netlist;

Determine the process layout of the target circuit according to the synthesized netlist and the placement and routing netlist.

In a feasible embodiment of the solution of this application, the cost function characterizes the number of wires crossing the partition boundaries of the FPGA architecture; the cost function is:

where C is the cost value; v_i and v_j denote nodes in the control data flow graph, with i = 1, 2, 3, ..., n and j = 1, 2, 3, ..., n, where n is a positive integer; E denotes the set of FIFO channels between nodes; e_ij is the connection line between v_i and v_j; row denotes the row number, col denotes the column number, and width denotes the data bit width.
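The expression for C did not survive extraction. As a minimal sketch, assume it sums, over every FIFO channel e_ij in E, the channel width multiplied by the Manhattan distance between the row and column coordinates of v_i and v_j; on a grid of partitions that distance is the fewest boundaries a wire must cross, so the product counts boundary-crossing wires. This form is an assumption consistent with the variables listed above, not the text's own formula, and the node names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str    # node v_i
    dst: str    # node v_j
    width: int  # data bit width of FIFO channel e_ij


def boundary_cost(edges, row, col):
    """Assumed cost C: sum of width * Manhattan distance between partitions.

    row/col map each node to the row and column of the partition it occupies.
    An edge inside one partition contributes 0; each partition boundary it
    must cross adds `width` wires.
    """
    return sum(
        e.width * (abs(row[e.src] - row[e.dst]) + abs(col[e.src] - col[e.dst]))
        for e in edges
    )


# Hypothetical three-node design on a 2 x 2 grid of partitions.
edges = [Edge("a", "b", 32), Edge("b", "c", 8), Edge("a", "c", 1)]
row = {"a": 0, "b": 0, "c": 1}
col = {"a": 0, "b": 1, "c": 1}
print(boundary_cost(edges, row, col))  # 32*1 + 8*1 + 1*2 = 42
```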

In a feasible embodiment of the solution of this application, the expression of the resource constraint is as follows:

where v_d denotes the partition space to which node v is allocated, v_area denotes the resources required by the node, r_v denotes the set of nodes accommodated by the current partition r, and (r_child)_area denotes the amount of resources in each partition.
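The resource-constraint expression was also lost. A minimal sketch, assuming it requires that, for every partition r, the summed v_area of the nodes in r_v not exceed the partition's (r_child)_area, optionally scaled by the maximum resource utilization mentioned in step S200. The utilization factor, the per-resource-type dictionaries, and the figures below are assumptions for illustration.

```python
def fits(capacity, node_areas, max_util=1.0):
    """Assumed resource constraint for one partition r.

    capacity:   resource type -> amount available in r, i.e. (r_child)_area
    node_areas: one dict per node v in r_v, resource type -> v_area
    max_util:   assumed utilization ceiling (1.0 means the whole partition)
    """
    kinds = set(capacity)
    for area in node_areas:
        kinds |= set(area)
    for kind in kinds:
        used = sum(area.get(kind, 0) for area in node_areas)
        # A resource type missing from `capacity` counts as zero capacity.
        if used > max_util * capacity.get(kind, 0):
            return False
    return True


region = {"LUT": 1000, "DSP": 10}
nodes = [{"LUT": 400, "DSP": 4}, {"LUT": 300, "DSP": 5}]
print(fits(region, nodes))                # True: 700 LUT and 9 DSP fit
print(fits(region, nodes, max_util=0.8))  # False: 9 DSP exceed 0.8 * 10
```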

In a feasible embodiment of the solution of the present application, the step of iteratively bisecting the partitions of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and outputting the layout constraints, includes:

Obtain the first coordinates of the nodes in the control data flow graph before the segmentation iteration, determine the coordinate transformation relationship according to the segmentation method, and transform the first coordinates according to the coordinate transformation relationship to obtain the second coordinates;

The segmentation method includes horizontal segmentation or vertical segmentation.

In a feasible embodiment of the solution of this application, the expression of the coordinate transformation relationship is as follows:

where v.row denotes the row coordinate in the second coordinates, v.col the column coordinate in the second coordinates, (v.row)_prev the row coordinate in the first coordinates, (v.col)_prev the column coordinate in the first coordinates, and v_d the partition space allocated to node v; "vertical partition" denotes horizontal division, and "horizontal partition" denotes vertical division.
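The transformation itself is missing from the text. A minimal sketch, assuming each bisection doubles the previous index along the split direction and adds v_d in {0, 1} for the half chosen, so v.row = 2(v.row)_prev + v_d for a split that refines rows, and likewise for columns. The doubling form and the `split` argument naming are assumptions; the node assignments in the example follow the Figure 3 walkthrough described later in the embodiment.

```python
def refine(coords, side, split):
    """Assumed coordinate update after one bisection step.

    coords: node -> (row, col) before the step (the first coordinates)
    side:   node -> v_d in {0, 1}, the half the node is assigned to
    split:  "row" if this step refines the row index, "col" for the column
    Returns node -> (row, col) after the step (the second coordinates).
    """
    if split not in ("row", "col"):
        raise ValueError("split must be 'row' or 'col'")
    out = {}
    for v, (r, c) in coords.items():
        d = side[v]
        if d not in (0, 1):
            raise ValueError(f"v_d must be 0 or 1, got {d!r} for node {v!r}")
        out[v] = (2 * r + d, c) if split == "row" else (r, 2 * c + d)
    return out


start = {v: (0, 0) for v in (1, 2, 3, 4, 5)}
step1 = refine(start, {1: 0, 2: 0, 3: 0, 4: 1, 5: 1}, "row")  # 1,2,3 top; 4,5 bottom
step2 = refine(step1, {1: 1, 2: 0, 3: 0, 4: 0, 5: 1}, "col")  # left/right halves
print(step2)  # {1: (0, 1), 2: (0, 0), 3: (0, 0), 4: (1, 0), 5: (1, 1)}
```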

In a feasible embodiment of the solution of the present application, pipeline stages are inserted on the connection lines between nodes in the control data flow graph to perform delay balancing. The delay-balancing expression is:

$$e_{ij}.\mathrm{balance} = S_i - S_j - e_{ij}.\mathrm{lat}$$

where S_i denotes the time step of node v_i and S_j the time step of node v_j; S_i - S_j is the maximum delay over all paths between node v_i and node v_j; e_ij.lat denotes the additional delay that already exists before the pipeline is inserted; and e_ij.balance denotes the balance delay introduced after the pipeline is inserted.

In a feasible embodiment of the solution of the present application, the step of inserting pipeline stages on the connection lines between nodes in the control data flow graph to perform delay balancing includes:

Construct an area-overhead objective function based on the balance delays; the objective function is:

where e_ij.width is the maximum data bit width of the pipeline between node v_i and node v_j.

The advantages and beneficial effects of the present invention will be partially given in the following description, and other parts can be understood through the specific implementation of the present invention:

The technical solution of this application proposes an end-to-end layout method in which high-level synthesis, guided by a floorplanning algorithm, generates physical layout constraints for the FPGA. The method constructs a control data flow graph from the circuit description of the target circuit and partitions it with the floorplanning algorithm to obtain layout constraints; on the basis of these constraints, the corresponding resources of the target circuit are scheduled and bound to obtain a register transfer level description, and synthesis followed by placement and routing then yields the netlist of the target circuit. Acting at the high-level synthesis stage in this way reduces placement and routing congestion and also reduces the increase in delay caused by wires crossing FPGA block boundaries during routing.

Description of drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments will be briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can also be obtained based on these drawings without exerting creative efforts.

Figure 1 is a flowchart of the steps of the process layout method for high-level synthesis provided by the technical solution of this application;

Figure 2 is a schematic diagram of the control data flow graph in the technical solution of this application;

Figure 3 is a schematic diagram of the iterative partitioning process in the technical solution of this application;

Figure 4(a) is a first schematic diagram of delay balancing in the technical solution of this application;

Figure 4(b) is a second schematic diagram of delay balancing in the technical solution of this application;

Figure 5 is a schematic diagram of the FIFO pipeline in the technical solution of this application.

Detailed description of the embodiments

The embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, where the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present invention, and are not to be construed as limiting it. The step numbers in the following embodiments are set only for convenience of explanation and do not restrict the order of the steps in any way; the execution order of the steps can be adjusted as appropriate by those skilled in the art.

In current related technical solutions, especially in FPGA placement and routing, physical synthesizers tend to use resources that are close together, which easily causes placement and routing congestion, increases the overall wire length, and thus reduces circuit throughput. In view of these defects, the technical solution of this application proposes an end-to-end layout method in which a high-level synthesis tool uses a floorplanning algorithm to guide FPGA physical layout constraints.

In the first aspect, as shown in Figure 1, the technical solution of this application provides a process layout method for high-level synthesis; the method includes steps S100-S400:

S100. Obtain the circuit description of the target circuit, and construct a control data flow graph corresponding to the target circuit based on the circuit description;

As shown in Figure 2, the control data flow graph is a directed graph that represents the operations of the target circuit. In Figure 2, solid lines represent data dependencies, dotted lines represent control dependencies, and triangle symbols represent branch operations. In the embodiment, the circuit description includes, but is not limited to, a description in VHDL or Verilog of the signal inputs, the components, and the logical operations performed by the components in the target circuit; the target circuit may be a hardware circuit in a real scenario. Specifically, the embodiment first obtains the input circuit description and constructs a control data flow graph from it. The control data flow graph (Control Data Flow Graph, CDFG) is a directed graph G = <V, E>, where V is the set of all nodes in the graph, each node representing an operation in the target circuit, and E is the set of all directed edges, each edge connecting two nodes and representing a data or control dependency between the two corresponding operations. The control dependency edges of the CDFG reflect the control dependencies of the circuit description, and the data dependency edges reflect its data dependencies.
Based on these properties, the embodiment constructs the CDFG as follows: first, a compiler front end generates intermediate code from the behavioral-level, high-level-language source code in the circuit description; then, a compiler back end maps the variables in the circuit description to nodes and maps the control and data dependencies to directed edges, yielding the control data flow graph.
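The back-end step can be pictured with a small sketch. It assumes a hypothetical three-address intermediate form in which each instruction defines at most one variable, reads some operands, and may be guarded by a branch predicate; each instruction becomes a node, each read of a previously defined variable becomes a data edge, and each guard becomes a control edge. The IR format and names are illustrative only, not the compiler used in the embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class CDFG:
    """G = <V, E>: nodes maps node ids to operations, edges holds typed edges."""
    nodes: dict = field(default_factory=dict)  # node id -> operation name
    edges: set = field(default_factory=set)    # (src, dst, "data" | "control")


def build_cdfg(ir):
    """Turn a list of (dest, op, operands, guard) instructions into a CDFG.

    dest is the variable the instruction defines (or None), operands are the
    variables it reads, and guard names the branch predicate it depends on.
    """
    g = CDFG()
    producer = {}  # variable -> id of the node that most recently defined it
    for idx, (dest, op, operands, guard) in enumerate(ir):
        node = f"n{idx}"
        g.nodes[node] = op
        for var in operands:
            if var in producer:  # primary inputs have no producing node
                g.edges.add((producer[var], node, "data"))
        if guard is not None:  # a KeyError here means the guard is undefined
            g.edges.add((producer[guard], node, "control"))
        if dest is not None:
            producer[dest] = node
    return g


ir = [
    ("t0", "add", ["a", "b"], None),
    ("c0", "lt", ["t0", "k"], None),
    ("p0", "br", ["c0"], None),        # branch operation (triangle in Figure 2)
    ("t1", "mul", ["t0", "a"], "p0"),
    ("t2", "sub", ["t0", "b"], "p0"),
]
g = build_cdfg(ir)
print(sorted(e for e in g.edges if e[2] == "control"))
# [('n2', 'n3', 'control'), ('n2', 'n4', 'control')]
```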

S200. Partition the control data flow graph by means of a floorplanning algorithm to obtain layout constraints;

The floorplanning algorithm in the embodiment may use a cutting-plane algorithm to solve the planning problem; the floorplan in the embodiment can be defined as a set of physical constraints that control where logic is placed in the model. Specifically, the embodiment first determines the number of partitions, the resources, and the maximum resource utilization for the target circuit from the corresponding FPGA architecture; then, the function corresponding to each dataflow process in the control data flow graph is compiled into a register transfer level (Register Transfer Level, RTL) module, and the modules are placed in the initial partition; based on the cutting-plane algorithm, the current partition is divided into two, horizontally or vertically, and the solution with the smallest cost function is computed and selected. From the obtained solution, the layout constraints of the target circuit in the FPGA architecture are determined.

In some feasible implementations, step S200 of partitioning the control data flow graph by means of a floorplanning algorithm to obtain layout constraints may include steps S210-S240:

S210. Obtain the FPGA architecture of the target circuit described in the circuit description;

S220. Compile the functions corresponding to the dataflow processes in the control data flow graph to obtain register transfer level circuit modules;

S230. Partition the FPGA architecture according to the circuit modules, and determine the cost function of the partitioning result;

S240. Iteratively bisect the partitions of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and output the layout constraints;

Specifically, in the embodiment, the FPGA architecture contains multiple blocks, and different blocks may have different resources. A good floorplan helps reduce routing congestion and improves the achievable quality of results (QoR), including timing. In the embodiment, the layout constraints use Pblocks to specify resource partitions, and Pblock boundaries may be defined by clock region boundaries; sizing a Pblock by clock regions rather than by SLICE, BRAM, DSP, or similar site ranges helps limit clock skew and aids the overall clock placement of the design. Based on the control data flow graph, the embodiment partitions the HLS design according to the dataflow programming style: the HLS design is streaming, and its structure is described as a directed graph in which nodes represent the processing units that perform operations and the edges between nodes describe the data transfer paths. In this directed graph, adjacent nodes exchange data over channels: each node consumes incoming data, computes on it, and writes the results to an output queue that serves as the input of the next computation unit.
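As a sketch of how a partitioning result might be turned into Pblock constraints, the helper below emits Vivado XDC lines using the `create_pblock`, `resize_pblock`, and `add_cells_to_pblock` commands, bounding each Pblock by a clock-region range as described above. The instance paths, the (row, col) to clock-region mapping, and the Pblock naming scheme are hypothetical and device-specific; this is not the embodiment's own generator.

```python
def pblock_xdc(assignment, region_of):
    """Emit Vivado Pblock constraints for a floorplan.

    assignment: cell path -> (row, col) partition chosen by the floorplanner
    region_of:  (row, col) -> clock-region range such as
                "CLOCKREGION_X0Y0:CLOCKREGION_X1Y3" (device-specific)
    """
    lines = []
    for rc in sorted(set(assignment.values())):
        name = f"pblock_r{rc[0]}_c{rc[1]}"
        cells = " ".join(sorted(c for c, p in assignment.items() if p == rc))
        lines.append(f"create_pblock {name}")
        # Bound the Pblock by whole clock regions, not SLICE/BRAM/DSP ranges.
        lines.append(f"resize_pblock [get_pblocks {name}] -add {{{region_of[rc]}}}")
        lines.append(f"add_cells_to_pblock [get_pblocks {name}] [get_cells {{{cells}}}]")
    return lines


# Hypothetical instance names and clock-region ranges.
assignment = {"top/cpu_inst": (0, 0), "top/fft_inst": (0, 1)}
region_of = {
    (0, 0): "CLOCKREGION_X0Y0:CLOCKREGION_X1Y3",
    (0, 1): "CLOCKREGION_X2Y0:CLOCKREGION_X3Y3",
}
print("\n".join(pblock_xdc(assignment, region_of)))
```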

For example, the HLS design in the embodiment adopts a dataflow programming model in which each function corresponds to a dataflow process and to an RTL module, and modules communicate through FIFOs. A graph G = <V, E> is then constructed, where V is the set of dataflow processes, each node representing a function, and E is the set of FIFO channels between vertices.

S300. Schedule the control data flow graph, and bind the scheduling results to obtain a register transfer level description;

Specifically, in the embodiment, the subgraphs of the control data flow graph are scheduled and delay balancing is performed on the control data flow graph; further, in some feasible embodiments, step S300 may include steps S310-S330:

S310. Schedule the subgraphs of the control data flow graph;

S320. Insert pipeline stages on the connection lines between nodes in the control data flow graph to perform delay balancing;

S330. Merge the scheduling results with the delay-balancing results, and bind the merged results to the target circuit to obtain the register transfer level description.

Specifically, in the embodiment, the subgraphs of the control data flow graph are scheduled with the default method of the high-level synthesis tool; pipeline stages are then inserted on the cut edges of the control data flow graph to balance delay; the results of steps S310 and S320 are merged into a combined scheduling result, which is bound to the resources of the FPGA architecture corresponding to the target circuit to obtain the register transfer level description.

S400. Process the register transfer level description according to the layout constraints to obtain a target netlist, and determine the process layout of the target circuit according to the target netlist;

Specifically, in the embodiment, synthesis and placement and routing are performed on the resources of the FPGA architecture according to the layout constraints obtained in step S200 and the register transfer level description obtained in step S300, yielding the corresponding target netlist, from which the process layout of the target circuit is determined.

In some feasible implementations, the layout constraints in the embodiment include at least one of timing constraints or physical constraints, and the target netlist includes at least one of a synthesized netlist or a placement and routing netlist; step S400 may accordingly include steps S410-S450:

S410. Construct the first input from the timing constraints and/or the physical constraints;

S420. Construct the second input from the register transfer level description;

S430. Using the FPGA physical synthesizer, synthesize the first input and the second input to obtain a synthesized netlist;

S440. Using the FPGA physical synthesizer, perform placement and routing according to the first input and the second input to obtain a placement and routing netlist;

S450. Determine the process layout of the target circuit according to the synthesized netlist and the placement and routing netlist.

Specifically, in the embodiment, the layout constraints obtained in step S200, together with the timing constraints and physical constraints produced by the high-level synthesis tool itself, are supplied as the constraint set for synthesis and implementation, that is, the first input; the register transfer level description obtained in step S300 is supplied as the RTL input for synthesis and implementation, that is, the second input. The embodiment then runs the FPGA physical synthesizer to perform synthesis and placement and routing, obtaining the post-synthesis netlist and the post-layout netlist, and determines the process layout of the target circuit from these netlists.
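Assuming the FPGA physical synthesizer is AMD Vivado run in non-project mode (the text does not name the tool), steps S410-S450 could be driven by a script like the one generated below: the XDC files carry the first input, the RTL files the second, and checkpoints capture the post-synthesis and post-route netlists. File names and the part number are example values; substitute the target device, and use `read_vhdl` for VHDL sources.

```python
def vivado_flow_tcl(rtl_files, xdc_files, top, part, out_dir="build"):
    """Build a non-project Vivado Tcl script covering steps S410-S450."""
    lines = [f"file mkdir {out_dir}"]
    lines += [f"read_verilog {f}" for f in rtl_files]  # second input: RTL from HLS
    lines += [f"read_xdc {f}" for f in xdc_files]      # first input: timing + Pblocks
    lines += [
        f"synth_design -top {top} -part {part}",
        f"write_checkpoint -force {out_dir}/post_synth.dcp",  # synthesized netlist
        "opt_design",
        "place_design",
        "route_design",
        f"write_checkpoint -force {out_dir}/post_route.dcp",  # placed-and-routed netlist
        f"report_timing_summary -file {out_dir}/timing.rpt",
        f"report_design_analysis -congestion -file {out_dir}/congestion.rpt",
    ]
    return "\n".join(lines)


script = vivado_flow_tcl(["rtl/top.v"], ["hls_timing.xdc", "floorplan.xdc"],
                         top="top", part="xcu250-figd2104-2L-e")
print(script)
```

The resulting text would typically be saved as a `.tcl` file and run with `vivado -mode batch -source <file>`.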

In the embodiment, in step S230, the FPGA architecture is partitioned according to the circuit modules and the cost function of the partitioning result is determined; physically, the cost function is the total number of wires crossing partition boundaries. The cost function in the embodiment is:

In the embodiment, step S240 iteratively bisects the partitions of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and outputs the layout constraints; the resource constraint is expressed as follows:

In an embodiment, step S240 of iteratively bisecting the partitions of the FPGA architecture until the cost function reaches its minimum or the resources in a partition reach the critical value of the resource constraint, and outputting the layout constraints, may include steps S241-S242:

S241. Obtain the first coordinates of the nodes in the control data flow graph before the segmentation iteration, determine the coordinate transformation relationship according to the segmentation method, and transform the first coordinates according to the coordinate transformation relationship to obtain the second coordinates;

S242. The segmentation method includes horizontal segmentation or vertical segmentation;

Specifically, in the embodiment, as shown in Figure 3, the partitioning process can be viewed as repeatedly bisecting the design until the cost function is minimal or the constraints can no longer be satisfied, after which pipeline FIFOs are added. In the first step, all functions are mapped to RTL modules and placed in a single partition, called the initial partition; node 1 feeds nodes 2, 3, and 4, nodes 2, 3, and 4 feed node 5, and nodes 2 and 3 use fewer resources. In the second step, the partition is divided vertically into two, with nodes 1, 2, and 3 at the top and nodes 4 and 5 at the bottom. In the third step, each partition is divided horizontally into two, placing nodes 2 and 3 at the upper left, node 1 at the upper right, node 4 at the lower left, and node 5 at the lower right. In the final step, FIFO pipelines are added to the wires that cross block boundaries to preserve the throughput of the circuit design.
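One bisection step of this process can be sketched by exhaustive search on the Figure 3 topology. The sketch assumes every edge has width 1 and uses hypothetical resource figures (nodes 2 and 3 small) and a per-half capacity; it enumerates every split, discards those violating the resource cap, and keeps the one with the fewest wires crossing the new boundary. Enumeration is only viable for tiny graphs; the embodiment names a cutting-plane algorithm for this step.

```python
from itertools import combinations


def best_bisection(nodes, edges, area, cap):
    """Exhaustively search one two-way split (small graphs only).

    edges: (u, v, width) FIFO channels; area: node -> resources needed;
    cap: resources available in each half. Returns (cut, half0, half1) for
    the feasible split with the fewest wires crossing the new boundary,
    or None if no split satisfies the resource cap.
    """
    nodes = sorted(nodes)
    best = None
    for k in range(len(nodes) + 1):
        for chosen in combinations(nodes, k):
            half0 = set(chosen)
            half1 = set(nodes) - half0
            if sum(area[v] for v in half0) > cap or sum(area[v] for v in half1) > cap:
                continue  # violates the resource constraint
            cut = sum(w for u, v, w in edges if (u in half0) != (v in half0))
            if best is None or cut < best[0]:
                best = (cut, half0, half1)
    return best


# Figure 3 topology; unit widths and the areas below are hypothetical.
edges = [(1, 2, 1), (1, 3, 1), (1, 4, 1), (2, 5, 1), (3, 5, 1), (4, 5, 1)]
area = {1: 4, 2: 1, 3: 1, 4: 4, 5: 4}
print(best_bisection(area.keys(), edges, area, cap=8))
```

With these numbers the search returns the split {1, 4} / {2, 3, 5}, which cuts three wires, the same count as the figure's {1, 2, 3} / {4, 5}; the text does not say how such ties are broken, so a real implementation would need an additional rule.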

Further, the expression of the coordinate transformation relationship in the embodiment is as follows:

In an embodiment, step S320 inserts pipeline stages on the connection lines between nodes in the control data flow graph to perform delay balancing.

Specifically, in the embodiment, consider a dataflow graph G = <V, E> that has already been partitioned and pipelined: each vertex v ∈ V represents a function in the dataflow design, and each edge e ∈ E represents a FIFO channel between functions. The width e.width is the bit width of the edge, the delay e.lat is the additional delay inserted by the previous pipelining step, and the balance delay e.balance is the delay added in the current step. The total delay of each path can be expressed as:

where {p₁, p₂} denotes a pair of reconvergent paths. Furthermore, during delay balancing it can be assumed for every edge (connection line) e that S_i ≥ S_j + e_ij.lat, and the additional balance delay can be expressed as:

$$e_{ij}.\mathrm{balance} = S_i - S_j - e_{ij}.\mathrm{lat}$$

where S_i denotes the time step of node v_i and S_j the time step of node v_j; S_i - S_j is the maximum delay over all paths between node v_i and node v_j; e_ij.lat is the extra delay inserted by the previous pipelining step on the longest path between vertices v_i and v_j; and e_ij.balance is the additional delay inserted by the current pipelining step on the longest path between vertices v_i and v_j.
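A short derivation shows why this formula balances reconvergent paths. On every edge, e_ij.lat + e_ij.balance = S_i - S_j, so summing along any path from node a to node b telescopes to S_a - S_b, which depends only on the endpoints. Both paths of a reconvergent pair {p₁, p₂} therefore receive the same total delay, which appears to be what the lost path-delay expression states. The sketch below checks this on a hypothetical graph; it reads the edge key (i, j) as data flowing from v_i to v_j, so under the S_i ≥ S_j + e_ij.lat convention S decreases along the flow (an interpretive assumption).

```python
def balance_delays(lat, S):
    """lat: (i, j) -> e_ij.lat already on edge e_ij.
    S:   node -> time step, with S_i >= S_j + e_ij.lat on every edge.
    Returns (i, j) -> e_ij.balance = S_i - S_j - e_ij.lat."""
    balance = {}
    for (i, j), l in lat.items():
        b = S[i] - S[j] - l
        if b < 0:
            raise ValueError(f"S violates S_i >= S_j + lat on edge {(i, j)}")
        balance[(i, j)] = b
    return balance


def path_delay(path, lat, balance):
    """Total delay along consecutive nodes of `path`: sum of lat + balance."""
    return sum(lat[e] + balance[e] for e in zip(path, path[1:]))


# Hypothetical reconvergent pair 1->2->4 (e_24 already pipelined) and 1->3->4.
lat = {(1, 2): 0, (2, 4): 1, (1, 3): 0, (3, 4): 0}
S = {1: 1, 2: 1, 3: 1, 4: 0}
bal = balance_delays(lat, S)
print(bal)  # {(1, 2): 0, (2, 4): 0, (1, 3): 0, (3, 4): 1}
print(path_delay([1, 2, 4], lat, bal), path_delay([1, 3, 4], lat, bal))  # 1 1
```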

For example, as shown in Figure 4, Figure 4(a) shows cut set cut-set1 in the delay-balancing process, and Figure 4(b) shows cut sets cut-set2 and cut-set3. Edges e13, e37, and e27 are pipelined according to the floorplan partition, so each of them carries 1 unit of inserted delay. Assume also that the bit width of e14 is 2 and that all other edges have a bit width of 1. In the delay-balancing step, the optimal solution adds 2 units of delay to each of e47, e57, and e67, and 1 unit of delay to e12. Note that e27 and e37 can belong to the same cut set.

As shown in Figure 5, after partitioning, the connections are implemented as FIFOs so that they can be pipelined. With FIFOs, the matching interface signals can be scheduled directly without affecting functionality, and the circuit functions can operate in parallel.
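Why pipelining a FIFO connection leaves functionality intact can be illustrated with a cycle-level behavioural model: adding register stages to a channel delays the stream but preserves its order, so a consumer that only acts on valid data sees the same sequence. The model is a simplification for illustration; it treats None as an idle cycle and ignores back-pressure, which a real FIFO handshake handles.

```python
from collections import deque


def pipelined_channel(stream, stages):
    """Cycle-level model of a channel with `stages` register slices.

    Each cycle one item (None = idle cycle) enters, and the item that
    entered `stages` cycles earlier leaves. Back-pressure is not modelled.
    """
    if stages < 0:
        raise ValueError("stages must be non-negative")
    regs = deque([None] * stages)
    out = []
    for item in list(stream) + [None] * stages:  # trailing idles flush the pipe
        regs.append(item)
        out.append(regs.popleft())
    return out


print(pipelined_channel(["a", "b", "c"], 2))  # [None, None, 'a', 'b', 'c']
```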

In some feasible implementations, step S320 of inserting pipeline stages on the connection lines between nodes in the control data flow graph to perform delay balancing may also include step S321:

S321. Construct an objective function of area overhead according to the balance delays;

Specifically, in the embodiment, the optimization goal of delay balancing is to minimize the total area overhead while accounting for the bit-width cost of each edge. The objective function in the embodiment is:

where e_ij.width is the maximum data bit width of the pipeline between node v_i and node v_j.
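The objective function was not preserved. A minimal sketch, assuming it is the width-weighted sum of balance delays, Σ e_ij.width × e_ij.balance, minimized subject to S_i ≥ S_j + e_ij.lat. Because the constraints are difference constraints, a practical tool would hand this to an LP or min-cost-flow solver; the exhaustive search below, on a hypothetical four-node graph, is only meant to make the trade-off visible.

```python
from itertools import product


def min_area_balance(nodes, edges, anchor, horizon):
    """Exhaustively minimise sum(width * balance) on a tiny graph.

    edges:   (i, j) -> (width, lat) for each FIFO edge e_ij
    anchor:  node pinned to time step 0; shifting every S by a constant
             leaves every balance unchanged, so one node can be fixed
    horizon: largest time step tried for each remaining node
    Returns (area, S) for the cheapest feasible schedule, or None.
    """
    free = [v for v in nodes if v != anchor]
    best = None
    for steps in product(range(horizon + 1), repeat=len(free)):
        S = dict(zip(free, steps))
        S[anchor] = 0
        if any(S[i] - S[j] < lat for (i, j), (_, lat) in edges.items()):
            continue  # violates S_i >= S_j + e_ij.lat
        area = sum(w * (S[i] - S[j] - lat) for (i, j), (w, lat) in edges.items())
        if best is None or area < best[0]:
            best = (area, S)
    return best


# Hypothetical reconvergent pair 1->2->4 (e_24 carries one stage) and 1->3->4,
# with bit widths 4, 1, 2, 1.
edges = {(1, 2): (4, 0), (2, 4): (1, 1), (1, 3): (2, 0), (3, 4): (1, 0)}
area, S = min_area_balance([1, 2, 3, 4], edges, anchor=4, horizon=3)
print(area, S)  # 1 {1: 1, 2: 1, 3: 1, 4: 0}
```

The single unit of balance delay lands on e_34 (width 1) rather than e_13 (width 2), which is the effect weighting by bit width is meant to achieve.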

From the above implementation process, it can be concluded that, compared with the existing technology, the technical solution provided by the present invention has the following advantages:

The invention can be applied to high-level synthesis of medium- and large-scale designs that use the dataflow programming model. It exploits the rapid prototyping enabled by high-level synthesis tools while extending the design flow from a high-level language through a hardware description language to the physical layout. This end-to-end design method, in which the high-level synthesis tool guides the physical layout, further improves the placement and routing of high-level synthesis designs, reduces layout congestion, and preserves the overall throughput of the circuit.

For example, when the default HLS flow and the HLS flow with applied layout constraints are each used for a RISC-V CPU design, it can be seen that the HLS design with layout constraints assigns the CPU module, FFT module, USB1 module, and USB2 module, which have the highest resource usage, to separate pipelined partitions located in adjacent but different blocks. With the timing constraints met, the timing margin is essentially unchanged, but congestion is greatly improved and the run time of physical synthesis is also reduced.

In some alternative embodiments, the functions/operations noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending on the functionality/operations involved. Furthermore, the embodiments presented and described in the flow diagrams of the present invention are provided by way of example for the purpose of providing a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logical flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of a larger operation are performed independently.

Furthermore, although the invention has been described in the context of functional modules, it should be understood that, unless stated otherwise, one or more of the functions and/or features may be integrated into a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be understood that a detailed discussion of the actual implementation of each module is not necessary to understand the invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be within the ordinary skill of an engineer, taking into account the properties, functions, and internal relationships of the modules. Therefore, a person skilled in the art can, using ordinary skill, implement the invention set forth in the claims without undue experimentation. It will also be understood that the specific concepts disclosed are illustrative only and are not intended to limit the scope of the invention, which is to be determined by the full scope of the appended claims and their equivalents.

The logic and/or steps represented in the flowcharts or otherwise described herein may, for example, be considered a sequenced list of executable instructions for implementing the logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus or device and execute them).

In the description of this specification, reference to the terms "one embodiment," "some embodiments," "an example," "specific examples," "some examples" or the like means that a specific feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although the embodiments of the present invention have been shown and described, those of ordinary skill in the art will appreciate that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and purposes of the invention. The scope of the invention is defined by the claims and their equivalents.

The above is a detailed description of the preferred implementations of the present invention, but the present invention is not limited to the above embodiments. Those skilled in the art can also make various equivalent modifications or substitutions without departing from the spirit of the present invention, and all such equivalent modifications or substitutions are included within the scope defined by the claims of this application.

Claims

1. A process layout method for high-level synthesis, characterized by comprising the following steps:

Obtain the circuit description of the target circuit, and construct a control data flow graph corresponding to the target circuit based on the circuit description, the control data flow graph being a directed graph representing the computation operations of the target circuit;

Partition the control data flow graph by means of a floorplanning algorithm to obtain layout constraints;

Schedule the control data flow graph, and bind the scheduling results to obtain a register transfer level description;

Process the register transfer level description according to the layout constraints to obtain a target netlist, and determine the process layout of the target circuit according to the target netlist.
2. The process layout method for high-level synthesis according to claim 1, characterized in that partitioning the control data flow graph by means of a floorplanning algorithm to obtain layout constraints comprises:

Obtain the FPGA architecture of the target circuit described in the circuit description;

Compile the function corresponding to each data flow process in the control data flow graph to obtain register transfer level circuit modules;

Partition the FPGA architecture according to the circuit modules, and determine the cost function of the partitioning result;

Iteratively split the partitions of the FPGA architecture until the cost function reaches its minimum value or the resources in a partition reach the critical value of the resource constraint, and output the resulting layout constraints.
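The split-and-iterate loop of claim 2 can be sketched as recursive bisection. This is a minimal sketch under stated assumptions, not the claimed algorithm: the helper names `best_bipartition` and `recursive_partition` are hypothetical, each split halves the region capacity, the split cost is simplified to the total bit width of channels cut by that split (rather than the row/column-based cost of claim 5), and the optimum is found by exhaustive search, which only works for tiny graphs.

```python
from itertools import product


def best_bipartition(nodes, area, edges, child_capacity):
    """Hypothetical helper: exhaustively assign each node to child 0 or 1.

    Returns (assignment, cut_cost), where cut_cost is the total bit width of
    edges whose endpoints land in different children, or None if no
    assignment keeps both children within child_capacity.
    Exponential in len(nodes), so only suitable for tiny illustrative graphs.
    """
    best = None
    for bits in product((0, 1), repeat=len(nodes)):
        assign = dict(zip(nodes, bits))
        used = [0, 0]
        for v in nodes:
            used[assign[v]] += area[v]
        if max(used) > child_capacity:
            continue  # this split would overflow a child partition
        cost = sum(w for a, b, w in edges
                   if a in assign and b in assign and assign[a] != assign[b])
        if best is None or cost < best[1]:
            best = (assign, cost)
    return best


def recursive_partition(nodes, area, edges, capacity, depth):
    """Hypothetical driver: split a region into halves, `depth` times.

    Returns a dict mapping each node to its path of 0/1 split decisions.
    Recursion stops when the depth budget is used up, when a region holds at
    most one node, or when no split satisfies the halved capacity.
    """
    if depth == 0 or len(nodes) <= 1:
        return {v: () for v in nodes}
    result = best_bipartition(nodes, area, edges, capacity / 2)
    if result is None:
        return {v: () for v in nodes}  # resource limit reached: stop splitting
    assign, _ = result
    placement = {}
    for side in (0, 1):
        sub = [v for v in nodes if assign[v] == side]
        for v, path in recursive_partition(sub, area, edges, capacity / 2, depth - 1).items():
            placement[v] = (side,) + path
    return placement


if __name__ == "__main__":
    area = {"a": 1, "b": 1, "c": 1, "d": 1}
    edges = [("a", "b", 32), ("c", "d", 32), ("b", "c", 1)]
    print(recursive_partition(["a", "b", "c", "d"], area, edges, 4, 1))
```

With a capacity of 4 and one split level, the two wide channels stay inside their halves and only the 1-bit channel `b`-`c` crosses the boundary. A production floorplanner would replace the exhaustive search with an ILP or heuristic solver.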
3. The process layout method for high-level synthesis according to claim 1, characterized in that scheduling the control data flow graph and binding the scheduling results to obtain a register transfer level description comprises:

Schedule subgraphs of the control data flow graph;

Insert pipeline stages into the connection lines between nodes in the control data flow graph to perform delay balancing;

Combine the scheduling result with the delay-balancing result, and bind the combined result to the target circuit to obtain the register transfer level description.
4. The process layout method for high-level synthesis according to claim 1, wherein the layout constraints include at least one of timing constraints or physical constraints; the target netlist includes at least one of a synthesis netlist or a placement and routing netlist; and processing the register transfer level description according to the layout constraints to obtain a target netlist, and determining the process layout of the target circuit according to the target netlist, comprises:

Construct a first input according to the timing constraints and/or the physical constraints;

Construct a second input according to the register transfer level description;

Synthesize the first input and the second input with the FPGA physical synthesizer to output a synthesis netlist;

Perform placement and routing with the FPGA physical synthesizer according to the first input and the second input to obtain a placement and routing netlist;

Determine the process layout of the target circuit according to the synthesis netlist and the placement and routing netlist.
5. The process layout method for high-level synthesis according to claim 2, characterized in that the cost function characterizes the number of wires crossing the partition boundaries of the FPGA architecture; the cost function is:

where $C$ is the cost value; $v_i$ and $v_j$ denote nodes in the control data flow graph, $i = 1, 2, 3, \ldots, n$, $j = 1, 2, 3, \ldots, n$, and $n$ is a positive integer; $E$ denotes the set of FIFO channels between nodes; $e_{ij}$ is the connection line between $v_i$ and $v_j$; row denotes the row index, col denotes the column index, and width denotes the data bit width.
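The expression of the cost function is not given in the text above. A minimal sketch, assuming the cost is each FIFO channel's bit width multiplied by the Manhattan (row plus column) distance between the partitions of its two end nodes, shows how such a cost counts boundary-crossing wires: every unit of row or column distance means the channel's `width` bits cross one more partition boundary. The names `Placement` and `boundary_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    row: int  # row index of the partition holding the node (assumed meaning)
    col: int  # column index of the partition holding the node (assumed meaning)


def boundary_cost(pos, edges):
    """Assumed cost C: sum over FIFO channels e_ij of width * Manhattan distance.

    pos: dict mapping node -> Placement; edges: iterable of (v_i, v_j, width).
    """
    return sum(
        width * (abs(pos[vi].row - pos[vj].row) + abs(pos[vi].col - pos[vj].col))
        for vi, vj, width in edges
    )


if __name__ == "__main__":
    pos = {"v1": Placement(0, 0), "v2": Placement(0, 1), "v3": Placement(1, 1)}
    edges = [("v1", "v2", 32), ("v2", "v3", 16), ("v1", "v3", 8)]
    print(boundary_cost(pos, edges))  # 32*1 + 16*1 + 8*2
```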
6. The process layout method for high-level synthesis according to claim 2, characterized in that the resource constraint is expressed as follows:

where $v_d$ denotes the partition space allocated to node $v$, $v_{area}$ denotes the resources required by the node, $r_v$ denotes the set of nodes accommodated by the current partition $r$, and $(r_{child})_{area}$ denotes the amount of resources in each partition $r_{child}$.
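The constraint expression itself is missing from the text above. A minimal sketch, assuming the constraint requires that, for every child partition $r_{child}$ of $r$, the summed $v_{area}$ of the nodes in $r_v$ whose $v_d$ is that child does not exceed $(r_{child})_{area}$, checked separately per FPGA resource type. The function name `satisfies_resource_constraint` and the resource keys (`"LUT"`, `"DSP"`) are illustrative assumptions.

```python
def satisfies_resource_constraint(r_v, v_d, v_area, child_area):
    """Assumed check: no child partition of r is over-subscribed.

    r_v: iterable of node ids accommodated by the current partition r.
    v_d: dict node -> child partition id allocated to that node.
    v_area: dict node -> {resource type: amount required}.
    child_area: dict child id -> {resource type: capacity available}.
    """
    for child, capacity in child_area.items():
        for res, cap in capacity.items():
            used = sum(v_area[v].get(res, 0) for v in r_v if v_d[v] == child)
            if used > cap:  # "reaching the critical value" modeled as used <= cap
                return False
    return True


if __name__ == "__main__":
    v_area = {"a": {"LUT": 100}, "b": {"LUT": 50, "DSP": 2}}
    child_area = {0: {"LUT": 120, "DSP": 0}, 1: {"LUT": 60, "DSP": 4}}
    print(satisfies_resource_constraint(["a", "b"], {"a": 0, "b": 1}, v_area, child_area))
    print(satisfies_resource_constraint(["a", "b"], {"a": 0, "b": 0}, v_area, child_area))
```

The second call fails because placing both nodes in child 0 needs 150 LUTs against a capacity of 120.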
7. The process layout method for high-level synthesis according to claim 2, characterized in that iteratively splitting the partitions of the FPGA architecture until the cost function reaches its minimum value or the resources in a partition reach the critical value of the resource constraint, and outputting the resulting layout constraints, comprises:

Obtain the first coordinates of the nodes in the control data flow graph before the splitting iteration, determine a coordinate transformation relationship according to the splitting method, and transform the first coordinates according to the coordinate transformation relationship to obtain second coordinates;

wherein the splitting method includes horizontal splitting or vertical splitting.
8. The process layout method for high-level synthesis according to claim 7, characterized in that the coordinate transformation relationship is expressed as follows:

where v.row denotes the row coordinate in the second coordinates, v.col denotes the column coordinate in the second coordinates, $(v.row)_{prev}$ denotes the row coordinate in the first coordinates, $(v.col)_{prev}$ denotes the column coordinate in the first coordinates, and $v_d$ denotes the partition space allocated to node $v$; the case "vertical partition" denotes horizontal division, and the case "horizontal partition" denotes vertical division.
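Since the transformation expression is not reproduced above, the sketch below assumes a common binary-split form: $v_d \in \{0, 1\}$ selects one of the two halves, a "vertical partition" (a vertical cut line, i.e. horizontal division) doubles the column index, and a "horizontal partition" doubles the row index. The function name `transform` and the string labels are hypothetical.

```python
def transform(prev_row, prev_col, v_d, cut):
    """Assumed mapping from first coordinates to second coordinates.

    v_d: 0 or 1, the half of the split region allocated to node v.
    cut: "vertical" (columns double) or "horizontal" (rows double).
    """
    if v_d not in (0, 1):
        raise ValueError("v_d is assumed to be 0 or 1 for a binary split")
    if cut == "vertical":
        return prev_row, 2 * prev_col + v_d
    if cut == "horizontal":
        return 2 * prev_row + v_d, prev_col
    raise ValueError(f"unknown cut direction: {cut!r}")


if __name__ == "__main__":
    # Node in region (1, 1) placed in the right half of a vertical cut.
    print(transform(1, 1, 1, "vertical"))
```

Doubling the index keeps coordinates unique across iterations: the two children of region column c become columns 2c and 2c + 1.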
9. The process layout method for high-level synthesis according to claim 3, characterized in that, when pipeline stages are inserted into the connection lines between nodes in the control data flow graph to perform delay balancing, the delay balancing is expressed as follows:

$$e_{ij}.\mathrm{balance} = S_i - S_j - e_{ij}.\mathrm{lat}$$

where $S_i$ denotes the time step of node $v_i$ and $S_j$ denotes the time step of node $v_j$; $S_i - S_j$ denotes the maximum delay over all paths between node $v_i$ and node $v_j$; $e_{ij}.\mathrm{lat}$ denotes the additional delay that exists before the pipeline is inserted; and $e_{ij}.\mathrm{balance}$ denotes the balancing delay produced after the pipeline is inserted.
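The balancing formula can be illustrated on a small graph. This sketch assumes that time steps come from an as-soon-as-possible (longest-path) schedule and that $v_i$ is the downstream end of each channel, so the source's $S_i - S_j$ becomes `S[dst] - S[src]`; the function name `balance_delays` is hypothetical, and the scheduler in the claims may compute $S$ differently.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def balance_delays(edges):
    """edges: list of (src, dst, lat) channels, where lat plays the role of e_ij.lat.

    Returns (S, balance): an ASAP time step per node and, per channel, the extra
    pipeline delay S[dst] - S[src] - lat needed to equalize converging paths.
    """
    preds = {}
    for src, dst, lat in edges:
        preds.setdefault(src, [])
        preds.setdefault(dst, []).append((src, lat))
    order = TopologicalSorter({v: {p for p, _ in ps} for v, ps in preds.items()})
    S = {}
    for v in order.static_order():
        # Longest-path schedule: a node starts once its slowest input has arrived.
        S[v] = max((S[p] + lat for p, lat in preds[v]), default=0)
    balance = {(src, dst): S[dst] - S[src] - lat for src, dst, lat in edges}
    return S, balance


if __name__ == "__main__":
    # a -> b -> c has two cycles of latency; the bypass a -> c has none.
    S, balance = balance_delays([("a", "b", 1), ("b", "c", 1), ("a", "c", 0)])
    print(S, balance)
```

Here the bypass channel `a`->`c` needs two cycles of balancing delay so that both inputs of `c` arrive in the same time step, while the chain channels need none.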
10. The process layout method for high-level synthesis according to claim 9, characterized in that inserting pipeline stages into the connection lines between nodes in the control data flow graph to perform delay balancing comprises:

constructing an objective function of area overhead based on the balancing delay, the objective function being:

where $e_{ij}.\mathrm{width}$ is the maximum data bit width of the pipeline between node $v_i$ and node $v_j$.
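The objective itself is not reproduced above. A minimal sketch, assuming the area overhead is the sum over channels of $e_{ij}.\mathrm{width} \times e_{ij}.\mathrm{balance}$, treats each cycle of balancing delay as one pipeline register stage of `width` bits; the function name `area_overhead` is hypothetical.

```python
def area_overhead(edges, S):
    """Assumed objective: sum over channels of width * balancing delay.

    edges: iterable of (src, dst, lat, width); S: dict node -> time step.
    Each unit of balancing delay is modeled as one `width`-bit register stage.
    """
    total = 0
    for src, dst, lat, width in edges:
        balance = S[dst] - S[src] - lat
        if balance < 0:
            raise ValueError(f"schedule violates latency on {src}->{dst}")
        total += width * balance
    return total


if __name__ == "__main__":
    S = {"a": 0, "b": 1, "c": 2}
    edges = [("a", "b", 1, 32), ("b", "c", 1, 32), ("a", "c", 0, 16)]
    print(area_overhead(edges, S))  # only the 16-bit bypass needs 2 stages
```

Under this assumption, a scheduler would choose the time steps $S$ to minimize the sum subject to a non-negative balancing delay on every channel, for example with an integer linear program; that solver step is not shown here.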