[go: up one dir, main page]

CN102054109B - Lower hardware mapping method of integrated circuit, and data control flow generation method and device - Google Patents

Lower hardware mapping method of integrated circuit, and data control flow generation method and device Download PDF

Info

Publication number
CN102054109B
CN102054109B CN201010622446.XA CN201010622446A CN102054109B CN 102054109 B CN102054109 B CN 102054109B CN 201010622446 A CN201010622446 A CN 201010622446A CN 102054109 B CN102054109 B CN 102054109B
Authority
CN
China
Prior art keywords
operator
integrated circuit
data
statement
mapped
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201010622446.XA
Other languages
Chinese (zh)
Other versions
CN102054109A (en
Inventor
王新安
胡子一
安辉耀
王腾
谢峥
张兴
周生明
赵秋奇
马芝
孙亚春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Shenzhen Graduate School
Original Assignee
Peking University Shenzhen Graduate School
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Shenzhen Graduate School filed Critical Peking University Shenzhen Graduate School
Priority to CN201010622446.XA priority Critical patent/CN102054109B/en
Publication of CN102054109A publication Critical patent/CN102054109A/en
Application granted granted Critical
Publication of CN102054109B publication Critical patent/CN102054109B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Design And Manufacture Of Integrated Circuits (AREA)

Abstract

本发明公开了一种集成电路下层硬件映射方法及装置,通过对描述集成电路算方法的计算机语言程序进行分析,并将其映射为描述集成电路算法的数据控制流图,再转换为相应的算子时空图,并对数据控制流图进行时序约束,从而根据时序标注对算子时空图进行聚类压缩,再生成集成电路下层硬件电路逻辑描述,从而创造了一种从计算机语言到集成电路下层硬件电路逻辑描述的映射工具,标准化地实现了集成电路从C或MATLAB等语言生成下层硬件的过程,实现起来方便快捷。本发明公开的数据控制流生成方法及装置通过对计算机语言程序分析得到其相应的数据相关性、数据可并行性和相应控制信息等,从而生成相应的数据控制流图,帮助硬件工程师进行硬件设计。

The invention discloses a method and device for mapping lower-layer hardware of an integrated circuit. By analyzing a computer language program describing an integrated circuit algorithm, mapping it into a data control flow graph describing an integrated circuit algorithm, and then converting it into a corresponding algorithm Sub-space-time diagrams, and time sequence constraints on the data control flow graph, so as to cluster and compress the operator space-time diagram according to the timing annotations, and then generate the logic description of the hardware circuit of the lower layer of the integrated circuit, thus creating a system from computer language to the lower layer of integrated circuits The mapping tool for hardware circuit logic description standardizes the process of generating lower-level hardware from languages such as C or MATLAB, which is convenient and quick to implement. The method and device for generating data control flow disclosed by the present invention obtains its corresponding data correlation, data parallelism and corresponding control information by analyzing computer language programs, thereby generating corresponding data control flow diagrams and helping hardware engineers to design hardware .

Description

Lower hardware mapping method of integrated circuit, Data Control stream generating method and device
Technical field
The present invention relates to integrated circuit (IC) design field, especially a kind of lower hardware mapping method of integrated circuit, Data Control stream generating method and device.
Background technology
In integrated circuit fields, the design rate of integrated circuit lags behind the speed of development of integrated circuit fabrication process conventionally.Especially, after the manufacturing process of integrated circuit enters nanoscale, the design rate of integrated circuit has lagged far behind the speed of development of integrated circuit fabrication process.Therefore,, for integrated circuit (IC) design field, improving design rate is one of current the most urgent problem.As shown in Figure 1, in prior art, the design of integrated circuit generally includes two parts: first is the description from the arthmetic statement based on C language or MATLAB language to RTL level; Second portion is from rtl description to standard block ASIC structure or the implementation procedure of gate array existing (or other S-ASIC structure) or FPGA structure.The wherein realization of the second portion instrument support of existing comparative maturity at present, its implementation procedure meets the requirements such as efficient, quick substantially.Therefore, the key point that improves design rate has dropped in the realization of first, namely from the arthmetic statement of C language or MATLAB language etc. to the description of RTL level, this can be referred to as to the mapping method of integrated circuit lower hardware or High Level Synthesis or structural level comprehensive.
But because the realization of first is mainly the understanding to C language or MATLAB language according to self by technician, be converted into artificially the description of RTL level.That is to say, the realization of first is subject to the impact of technician's self experience and knowledge level, and for different technician, the time of realization exists larger difference.Implementation for first, some external companies have launched corresponding research and have pushed away some implementation tools, such as the AutoPilot of Catapult C, AutoESL of Mentor is, the SPARK of the Cynthesizer of Forte Design System, UC San Diego etc.
Summary of the invention
The main technical problem to be solved in the present invention is, a kind of lower hardware mapping method of integrated circuit and device are provided, and can improve the design rate of integrated circuit.
The present invention also provides a kind of Data Control stream generating method and device, and comprising data dependence, data can concurrency and the analysis of corresponding control information, can ancillary hardware circuit designer carry out circuit design.
For solving the problems of the technologies described above, the technical solution used in the present invention is as follows:
A lower hardware mapping method of integrated circuit, comprises step:
Process analysis step, for reading the computer language procedure of describing integrated circuit algorithm, and therefrom identifies mapped execution object and parameter object;
Data Control flow graph generates step, for the execution object identifying and parameter object being mapped to the respective nodes of the Data Control flow graph of describing integrated circuit algorithm;
Operator spacetime diagram generates step, for the function treatment of carrying out according to each node of Data Control flow graph, from the operator cell library of setting up in advance, obtain at least one operator unit of corresponding function, Data Control flow graph is converted to the operator spacetime diagram being formed by operator unit;
Temporal constraint step, for determining total temporal constraint according to the requirement of user specification requirement and target integrated circuit technology, each the operator unit label time in operator spacetime diagram, carries out temporal constraint to each level of operator spacetime diagram;
Spacetime diagram compression step, for according to time-labeling, operator spacetime diagram being carried out to the cluster compression on space, makes overall algorithm execution time close to total temporal constraint;
Lower hardware mapping step, for generating integrated circuit lower hardware logical description according to the operator spacetime diagram after cluster compression.
Method based on above-mentioned, the present invention also provides a kind of integrated circuit lower hardware mapping device, comprising:
Process analysis module, for reading the computerese of describing integrated circuit algorithm, and therefrom identifies mapped execution object and parameter object;
Data Control flow graph generation module, for being mapped to the execution object identifying and parameter object the respective nodes of the Data Control flow graph of describing integrated circuit algorithm;
Operator spacetime diagram generation module, for the function treatment of carrying out according to each node of Data Control flow graph, from the operator cell library of setting up in advance, take out at least one operator unit of corresponding function, Data Control flow graph is converted to the operator spacetime diagram being formed by operator unit;
Temporal constraint module, for determining total temporal constraint according to the requirement of user specification demand and target integrated circuit technology, each the operator unit label time in operator spacetime diagram, carries out temporal constraint to each level of operator spacetime diagram;
Spacetime diagram compression module, for according to time-labeling, spacetime diagram being carried out to the cluster compression on space, and makes it overall algorithm execution time close to total temporal constraint;
Lower hardware mapping block, generates integrated circuit lower hardware logical description according to the spacetime diagram after cluster compression.
The present invention also provides a kind of Data Control stream generating method, comprises step:
Process analysis step, for reading the computer language procedure of describing integrated circuit algorithm, according to the rule of this kind of computerese, from described computer language procedure, identify mapped execution object and parameter object, described execution object comprises operational order and/or steering order, and described parameter object comprises at least one in input data, output data, intermediate data;
Data Control flow graph generates step, for the execution object identifying and parameter object being mapped to the respective nodes of the Data Control flow graph of describing integrated circuit algorithm.
Further, at described Data Control flow graph, generate in step, described operation table command mappings, for processing block diagram, is mapped as to the control stream for identification-state, state transitions condition and state control signal by described steering order, described parameter object is mapped as to the memory node in data stream.
Wherein, described process analysis step comprises: according to the rule of this kind of computerese, from described computer language procedure, find out function, computing in the input and output of described function and function is resolved, identify and carry out object and parameter object, in resolving, if current resolved function is the upper layer functions that comprises lower layer functions, continue lower layer functions to resolve.
Further, described steering order is the recursion instruction in loop statement, and described loop statement comprises quiet cycle statement and dynamic circulation statement, and described Data Control flow graph generation step comprises:
When loop statement is quiet cycle statement, according to cycle index, loop body is launched, after loop body launches, bring parameter object into and obtain new operation expression, operational order in operation expression is mapped as to processing block diagram, the parameter object in operation expression is mapped as to the memory node in data stream;
When loop statement is while can be changed into the dynamic circulation statement of quiet cycle statement, according to the different occasion change state loop statements that call, it is quiet cycle statement, its expansion is obtained to new operation expression, recursion instruction in operation expression is mapped as to processing block diagram, the parameter object in operation expression is mapped as to the memory node in data stream;
When loop statement is individual layer dynamic circulation statement, the content map of loop statement, for processing block diagram, is mapped as to state machine by recursion instruction;
When loop statement is multilayer dynamic circulation statement, by the content map of outer loop statement and interior loop statement, be respectively that the first processing block diagram and second is processed block diagram, outer recursion instruction is mapped as to the first state machine, by interior loop command mappings, be the second state machine, and described second process block diagram and the second state machine and is mapped in described first and processes in block diagram; Or by the content map of outer loop statement and interior loop statement, be respectively that the first processing block diagram and second is processed block diagram, recursion instruction is mapped as to state machine, and described second processes block diagram is mapped in described first and processes in block diagram, and the cycle index that the status number of described state machine equals described circulation adds one.
Further, described steering order is the steering order in branch's control statement, described Data Control flow graph generates step and comprises: described branch steering order is mapped as to MUX, the content map of branch's control statement is that the input end of MUX is processed block diagram, and the controlled condition of branch's control statement is mapped as the control end of MUX and processes block diagram.
Further, described branch control statement is nested branch control statement, Ze Jiang upper strata branch steering order is mapped as the first MUX, the content map of described upper strata branch control statement is that the input end of described the first MUX is processed block diagram, and the controlled condition of described upper strata branch control statement is mapped as the control end of described the first MUX and processes block diagram; Branch of lower floor steering order is mapped as to the second MUX, the content map of branch of described lower floor control statement is that the input end of described the second MUX is processed block diagram, the controlled condition of branch of described lower floor control statement is mapped as the control end of described the second MUX and processes block diagram, and the output of described the first MUX is as the input of described the second MUX.
Method based on above-mentioned, the present invention also provides a kind of Data Control stream generating apparatus, comprising:
Process analysis module, for reading the computer language procedure of describing integrated circuit algorithm, according to the rule of this kind of computerese, from described computer language procedure, identify mapped execution object and parameter object, described execution object comprises operational order and/or steering order, and described parameter object comprises at least one in input data, output data, intermediate data;
Data Control flow graph generation module, for being mapped to the execution object identifying and parameter object the respective nodes of the Data Control flow graph of describing integrated circuit algorithm.
Further, described Data Control flow graph generation module comprises: carry out object map module, for the operational order of described execution object is mapped as to processing block diagram, and for the steering order of described execution object being mapped as to the control stream of identification-state, state transitions condition and state control signal;
Parameter object mapping block, for being mapped as parameter object the memory node in data stream.
The invention has the beneficial effects as follows: by the analysis to former c program or MATLAB program, identify execution object and the parameter object of mapping; And the execution object identifying and parameter object are become to Data Control flow graph, this Data Control flow graph can represent the algorithm of integrated circuit; Then each node in Data Control flow graph is substituted to generating operator spacetime diagram with operator; The operator spacetime diagram generating, through cluster compression, makes the overall execution time of the spacetime diagram after compression close to total temporal constraint; Spacetime diagram after compression is generated to the lower hardware circuit of integrated circuit.Thereby created a kind of mapping tool from computerese to integrated circuit lower hardware circuit, realized to standardization the process of integrated circuit from language generation lower hardwares such as C or MATLAB, implemented convenient and swift.
The present invention also can concurrency and corresponding control information etc. by computer language procedure analysis being obtained to its corresponding data dependence, data, thereby according to the corresponding Data Control flow graph of these Information generations, and wherein controlling expression formula conversion method is the core of all linguistic expression's conversions, and its conversion efficiency directly affects the data volume of generating algorithm Data Control flow graph.The inventive method is taken concurrency factor, hardware configuration into account in advance such as state machine, MUX etc., can help to a greater extent Hardware Engineer to carry out hardware design.
Accompanying drawing explanation
Fig. 1 is method of designing integrated circuit process flow diagram of the prior art;
Fig. 2 is ADDS Operator structure schematic diagram;
Fig. 3 is ADDS operator function figure;
Fig. 4 is the universal architecture schematic diagram of storage class operator;
Fig. 5 is the universal architecture schematic diagram of class of paths operator;
Fig. 6 is for controlling the universal architecture schematic diagram of class operator;
Fig. 7 a and Fig. 7 b are respectively the structured flowchart of a kind of embodiment of lower hardware mapping process flow diagram of the present invention and integrated circuit lower hardware mapping device;
Fig. 8 is the process flow diagram of in a kind of embodiment of Data Control stream drawing generating method of the present invention, function X264_me_search being analyzed;
Fig. 9 is the process flow diagram that Fig. 8 function X264_me_search and pixel_sad_16 * 16 generated data are controlled flow graph;
Figure 10 is the mapping structure figure of the individual layer dynamic circulation statement in a kind of embodiment of the present invention;
Figure 11 is a kind of mapping structure figure of the multilayer dynamic circulation statement in a kind of embodiment of the present invention;
Figure 12 is another mapping structure figure of the multilayer dynamic circulation statement in a kind of embodiment of method of the present invention;
Figure 13 is the structural drawing of the branch's control statement mapping in a kind of embodiment of the present invention;
Figure 14 is the structural drawing of the nested branch control statement mapping in a kind of embodiment of the present invention;
Figure 15 is the Data Control flow graph of the function X264_me_search of the present embodiment;
Figure 16 is the Data Control flow graph of function pixel_sad_16 * 16 of the present embodiment;
Figure 17 is the structured flowchart of a kind of embodiment of Data Control flow graph generating apparatus of the present invention;
Figure 18 is the structured flowchart that the Data Control of Figure 17 flows a kind of embodiment of generation module;
Figure 19 is the structural drawing of a kind of embodiment of order related data flow expansion of the present invention;
Figure 20 is the feedback data stream that exists of the present invention, by operator, expands into local flow's line structure schematic diagram;
Figure 21 is the structural drawing of a kind of embodiment of parallel data stream expansion of the present invention;
Figure 22 is the L0 of function x264_me_search and the operator spacetime diagram of L1 logic generation of the present embodiment;
Figure 23 a and Figure 23 b be the first of the logic L3 of function x264_me_search and the operator spacetime diagram of second portion generation respectively;
Figure 24 is the operator spacetime diagram that the logic L5 of the function X264_me_search of the present embodiment generates;
Figure 25 a and Figure 25 b are respectively the function x264_me_search of the present embodiment and the operator spacetime diagram of letter pixel_sad_16 * 16;
Figure 26 a and Figure 26 b are respectively the function X264_me_search of the present embodiment and the temporal constraint schematic diagram of function pixel_sad_16 * 16;
Figure 27 is the spacetime diagram after function pixel_sad_16 * 16 cluster of the present embodiment is compressed;
Figure 28 is depicted as a kind of embodiment schematic diagram that solidifies customization;
Figure 29 is the comparison diagram after the compression of X264_me_search cluster.
Embodiment
Below by embodiment, by reference to the accompanying drawings the present invention is described in further detail.
Look back the development course that method of designing integrated circuit is learned, can see: when integrated circuit fabrication process enters epoch of 1um, occurred classifying as with gate array the method for designing of elementary cell; When integrated circuit fabrication process enters epoch of 0.5um, there is take the method for designing that standard block is elementary cell; When integrated circuit fabrication process enters epoch of 0.18um, there is take the method for designing that IP kernel is elementary cell.This shows: the design methodology of integrated circuit is along with the development of integrated circuit fabrication process on the one hand, and the unit granularity of the elementary cell (door, standard block, IP kernel) that method of designing integrated circuit is used in learning on the other hand constantly increases.Meanwhile, the appearance of each new elementary cell, all indicates the revolutionary progress of method of designing integrated circuit.Therefore, what can rationally predict is, along with the progress at full speed of integrated circuit fabrication process nearly ten years, especially integrated circuit fabrication process enters after nanoscale, to there is and open the new situation of integrated circuit (IC) design in the elementary cell of coarsegrain more, to adapt to the develop rapidly of integrated circuit fabrication process.
Operator is as the elementary cell in integrated circuit building block, its granularity is greater than the granularity of standard block, the computer programming language that is conducive to describe integrated circuit algorithm shines upon to lower hardware, therefore the present invention adopts the method for designing integrated circuit based on operator, made to accelerate the design rate of integrated circuit, to adapt to the progress of integrated circuit fabrication process.
In the present invention, conventional operator has five classes, is respectively computing class operator, storage class operator, class of paths operator, controls class operator and clock class operator.
1, computing class operator.
Computing operator (AU) is for realizing the elementary cell of logical operation or arithmetical operation or the hybrid operation of logical and arithmetic.It comprises arithmetic logical unit and computing configuration register, computing configuration register is for receiving and storage computing configuration-direct, the arithmetical logic operation that different computing configuration-directs is corresponding different, that is to say, by computing configuration-direct, can make same computing operator realize multiple different function.Below, the ADDS operator of take describes computing operator as example.
Fig. 2 is the structural representation of ADDS operator, it comprises for realizing and adds the ADD unit of reducing and for realizing < </> > unit of shifting function, by the parameter value of control bit X is set, can make ADDS operator realize multiple different function, such as, the form of Fig. 3 shows the corresponding relation of different control bit X values and different operating in a kind of embodiment.As this operator that can realize multiple difference in functionality by control bit X of ADDS, be called restructural operator, restructural operator, because abundant application function can be used in different scenes, has reduced the operator number storing in operator cell library.And restructural operator can also be realized dynamic reconstruct by changing the mode of control bit in its implementation.
2, storage class operator.
Be illustrated in figure 4 the basic structure schematic diagram (in figure, CU represents to control operator) of storage class operator (MU).Storage operators comprises stored configuration register (MU configuration register) and storage unit, and storage unit comprises address-generation unit, data-carrier store, data generation unit and data output control unit.Stored configuration register can be by data output control unit configuration store operator (MU) memory bank (various storage mediums: writing and/or the playback mode MEM such as register, RAM), the working method of can also config memory corresponding address-generation unit.Into precalculated position is directly stored input data in the address generating according to address-generation unit, and the data of needs are exported from deposit position.
3, class of paths operator.
As shown in Figure 5, be the universal architecture schematic diagram of class of paths operator (LU).Class of paths operator LU comprises routing configuration register (LU configuration register) and forms alteration switch and the data register (REG) of Route Selection unit, wherein, routing configuration register is controlled the control of operator CU, controls alteration switch and realize the connection between nonidentity operation operator AU according to the mode of expectation under the control action of controlling operator CU.Data register is for the inputoutput data of temporary computing class operator LU and storage class operator M U.
4, control class operator.
As shown in Figure 6, for controlling the universal architecture schematic diagram of class operator (CU).The effect of controlling class operator is mainly that configuration information is sent to corresponding configuration register, and configuration computing operator AU, storage operators MU and path operator LU realize predetermined function.The form of controlling operator CU comprises three kinds of counter, state machine and micro-orders.Wherein micro-order structure comprises code translator, programmable counter, command memory and Pipeline control module etc.Control operator CU and to each functional unit, send configuration information by carrying out simple configuration-direct, seldom, so order register capacity is little in the instruction of supporting due to CU, and code translator is very simple.
5, clock class operator.
Clock operator is for the clock control signal of computing class operator, storage class operator, class of paths operator and control class operator, and clock signal comprises the signal of controlling clock start-stop and controlling clock frequency, and clock signal can configure according to the mode of expectation.
Above five class operators are the bases of realizing following embodiment, be understandable that, above-mentioned to according to function, the operator for integrated circuit (IC) design being divided into five large classes not exclusive dividing mode, can also carry out targetedly according to actual conditions the division of wide region more or thinner scope.
In an embodiment of the present invention, a kind of mapped system from computerese to integrated circuit lower hardware circuit is provided, be the lower hardware mapping method of integrated circuit of this system as shown in Figure 7a, comprise the following steps: (in the present embodiment, " spacetime diagram " is equal to " operator spacetime diagram ", and the former is only for being called for short.)
Step S1, analyzes program, reads the computer language procedure of describing integrated circuit algorithm, identifies mapped execution object and parameter object according to the rule of this computerese from described computer language procedure.Special IC, for realizing specific agreement or function, and first these functions and agreement are described by computer language procedure conventionally, and computerese wherein adopts C language or MATLAB language etc. conventionally.The computer language procedure of writing is input in mapped system of the present invention, this mapped system identifies mapped execution object and parameter object from described computer language procedure according to the rule of coding computerese used again.
In present embodiment, this execution object comprises operational order and/or steering order, and this parameter object comprises at least one in input data, output data, intermediate data.Operational order in the present embodiment comprises and adding, subtracts, takes advantage of and the computing such as displacement.
Step S2, generated data is controlled flow graph, and the execution object identifying and parameter object are mapped to the respective nodes in the Data Control flow graph of describing integrated circuit algorithm.Described operational order is mapped as to processing block diagram, described steering order is mapped as to the control stream for identification-state, state transitions condition and state control signal, described input data, output data and intermediate data are mapped as to the memory node in data stream.
Step S3, operator spacetime diagram generates step, for the processing capacity of carrying out according to each node of Data Control flow graph, from the operator cell library of setting up in advance, take out at least one operator unit of corresponding function, Data Control flow graph is converted to the operator spacetime diagram being formed by operator unit.First Data Control flow graph is launched according to its data flow dependency, then convert each node after launching to can complete this nodal function operator unit.With the combination of one or more operators unit, replace each node in Data Control flow graph, the combination of one or more operators unit can complete the function identical with each node.For how Data Control flow graph being launched, include but not limited to following several mode: if the data stream in Data Control flow graph is for order related data flow structure, adopt the mode of streamline to launch order related data flow; If there is feedback in the data stream in Data Control flow graph, this data stream is a circulation, and this data stream exists data dependence, so this data stream can not be converted into flowing structure; If but while there is not data dependence between this data stream internal data, described internal data is not existed each data stream of data dependence to adopt the mode of local flow's waterline to launch; If there is not data dependence between the data stream in Data Control flow graph, adopt parallel mode to launch this data stream, and convert the operator spacetime diagram being formed by operator unit to.
Step S4, temporal constraint step, for determining total temporal constraint according to the requirement of user specification requirement and target integrated circuit technology, to each the operator unit label time in operator spacetime diagram.On the other hand, from operator cell library, can extract operator time sequence information, operator spacetime diagram is done to sequential mark, form the object of temporal constraint.Thereby according to data flow characteristic can be by temporal constraint each level specific to operator spacetime diagram, realize each level of operator spacetime diagram carried out to temporal constraint.Because operator can form different operator function pieces, and then form different operator function groups, each operator function group is an operator level.
If described data flow architecture is parallel data stream, total temporal constraint is divided equally to each the operator level being given in corresponding spacetime diagram, and divided the temporal constraint of each operator level equally each operator unit in this operator level.Using the basic sequential unit of the total operator of corresponding each operator level of each node of serial in Data Control stream as overall temporal constraint, the ratio that accounts for the sequential summation that operator unit that in each operator level, the longest arithmetic path shines upon is corresponding according to the sequential of the computing operator that in each operator level, the longest arithmetic path shines upon is distributed the sequential of each operator level.
Step S5, spacetime diagram compression step, for according to time-labeling, spacetime diagram being carried out the cluster compression in space (being on hardware resource or area), and makes it overall algorithm execution time close to total temporal constraint.
In one embodiment, spacetime diagram is compressed and comprised the following steps: in operator spacetime diagram, find out computing class operator and/or the identical storage class operator of memory attribute that attribute is identical; Then according to time-labeling, the identical computing class operator of operational attribute is spatially merged compression and/or the identical storage class operator of memory attribute is spatially merged to compression; Then introduce and control class operator, the computing class operator after compression and/or storage class computing operator are generated to corresponding configuration instruction, realize the multiplexing of computing class operator and/or storage class operator.
The step of cluster compression step and generation restructural operator function piece, all can produce not only a kind of result.The same subfunction of different function calls, because confinement time is different, the cluster result producing is also different.Therefore need to be optimized according to parameters such as time, area, power consumptions, by performance (execution time) discharge order, just the cluster result that meets time-constrain represents that its hardware realizes Least-cost, therefore selects overall algorithm execution time close to completing the optimum results of the spacetime diagram of the needed total temporal constraint of integrated circuit algorithm as cluster compression.
Step S6, lower hardware mapping step, generates integrated circuit lower hardware logical description according to the spacetime diagram after cluster compression.
Lower hardware mapping method of integrated circuit based on above-mentioned, the invention also discloses a kind of integrated circuit lower hardware mapping device, please refer to Fig. 7 b, and the integrated circuit lower hardware mapping device of the present embodiment comprises:
Process analysis module 1 for reading the computer language procedure of describing integrated circuit algorithm, identifies mapped execution object and parameter object from described computer language procedure according to the rule of this computerese; Data Control flow graph generation module 2, for being mapped to by the execution object identifying and parameter object the Data Control flow graph of describing integrated circuit algorithm; Operator spacetime diagram generation module 3, for the processing capacity of carrying out according to each node of Data Control flow graph, from the operator cell library of setting up in advance, take out at least one operator unit of corresponding function, thereby Data Control flow graph is converted to the operator spacetime diagram being formed by operator unit; Temporal constraint module 4, for determining total temporal constraint according to the requirement of user specification demand and target integrated circuit technology, each the operator unit label time in operator spacetime diagram, carries out temporal constraint to each level of operator spacetime diagram; Spacetime diagram compression module 5, for carry out the cluster compression on space during to spacetime diagram according to time-labeling, and makes it overall algorithm execution time close to total temporal constraint; Lower hardware mapping block 6, generates integrated circuit lower hardware circuit according to the spacetime diagram after cluster compression.
Below in conjunction with specific embodiment, lower hardware mapping method of integrated circuit of the present invention and device are described.
H.264 be the common digital video coding standard of formulating of the common joint video team (JVT) of setting up of International Telecommunication Association (ITU-T) and ISO (International Standards Organization) (ISO).The X264_me_search function of the H.264 C language description of standard of take in the present embodiment is example, and method of the present invention is described in more detail.
As shown in Figure 8, for function X264me search is analyzed and comprises step:
S11, read computer language procedure, and search function in this computer language procedure.In the present embodiment, first the fetch program, go forward side by side lang method and lexical analysis, obtain function X264_me_search.
S12, this function is resolved, obtain function calling relationship and the parameter object of this function, this parameter object comprises input data, output data, the input constant of this function, and the intermediate data of layer functions on this, and each parameter object of this function is carried out corresponding data dependence, shares the mark of the information such as storage, distributed store.In the present embodiment, function x264_me_search is analyzed and obtains its input variable, output variable, input constant and output constant is as shown in table 1:
Table 1:
Signal name Data type Direction Explanation
i_pixel Int IN ∥PIXEL WxH
Continued 1:
Figure GDA0000146368240000091
In the present embodiment, to this function x264me search, internal analysis obtains its built-in variable and constant, as shown in table 2:
Table 2:
Signal name Data type Explanation
i_pixel 1nt ∥ pixel WxH
bcost 1nt The interim optimum estimate point of ∥
bmx 1nt The interim motion vector x of ∥ component
bmy 1nt The interim motion vector y of ∥ component
p_ffef uint8t * ∥ reference frame address
i_iter 1nt ∥ loop variable
The current resolved function of function calling relationship judgement of the current resolved function of S13, basis comprises lower layer functions, and search in this way lower layer functions, and perform step S14, otherwise end operation.In the present embodiment, if do not comprise lower layer functions in current resolved function, this function is also bottom function.This function is being carried out in resolving, function pixel_sad_16 * 16 that obtained this function call, function x264_me_search is upper layer functions, function pixel_sad_16 * 16 are lower layer functions.In the present embodiment, if do not comprise lower layer functions in current resolved function, this function is also bottom function.
S14, lower layer functions is analyzed, obtained the parameter object of this lower layer functions, comprise input data and output data, and each parameter object of this lower layer functions is carried out data dependence, shares the mark of the information such as storage, distributed store.In the present embodiment, lower layer functions pixel_sad_16 * 16 are analyzed, obtained the input and output of these lower layer functions pixel_sad_16 * 16, as shown in table 3:
Table 3:
Signal name Data type Direction Explanation
pix1 uint8t * IN input pixel value 1
i_stride_pix1 1nt IN pixel value 1 storage line width
pix2 uint8t * IN input pixel value 2
i_stride_pix2 1nt IN pixel value 1 storage line width
i_sum 1nt OUT ∥ SAD result of calculation
In present embodiment, data dependence refers to analyzes the relation being associated between the variable draw and/or constant, comprises that computing is relevant relevant with storage.Wherein, computing is relevant is that input and output are relevant, and output variable or the output signal of process computing operator are relevant to its input variable or input signal; Storage is relevant comprises that write-read is relevant, read-write is relevant and write relevant, wherein, write-read is relevant refers to that variable proper operation for same memory address is sequentially for first writing and read, and reads variable and is relevant to and writes variable, if traffic error occurs write-after-read in operation; Read-write is relevant refers to that variable proper operation for same memory address is sequentially for first reading and write, and writes variable and is relevant to and reads variable, if write-then-read is that read data is capped and makes a mistake in operation; Writing and relevant referring to that the variable for same memory address will write according to proper operation order, writing and between variable, have correlativity.
In present embodiment, shared storage refers to be connected between the modules of same storage and can mutually access, and the data between them can exchange by sharing storage mode.And distributed store refers to that each module has the storage of its independent allocation, the storage of each other module of module inaccessible, the data communication between them can only be mutual by communication port.In hardware design, the shared storage distributed store that compares has increased extra interconnect resources, therefore when generating algorithm Data Control flow graph, will separate and share storage and distributed store according to its content.
As shown in Figure 9, the execution object and the parameter object that according to analysis, obtain, generated data is controlled flow graph and is comprised step:
S21, the parameter object that identification is obtained are mapped as the memory node in data stream, and distinguish sharing storage and distributed store according to the storage information of mark.
S22, according to mark data dependence, the operational order in function is mapped as to the processing block diagram in Data Control flow graph.The computings such as the operational order in the present embodiment comprises and adding, subtracts, displacement.
S23, according to the data dependence of mark, the steering order in function is mapped as in Data Control flow graph to the control stream for identification-state, state transitions condition and state control signal.
In the present embodiment, steering order comprises call relation between function, recursion instruction etc.Loop statement comprises quiet cycle and dynamic circulation statement, and wherein dynamic circulation statement comprises again dynamic circulation, individual layer dynamic circulation and the multilayer dynamic circulation that can be changed into quiet cycle.Quiet cycle refers to that loop variable is constant; The dynamic circulation that can be changed into quiet cycle refers to that cycle index is variable, once but when the occasion of its application, determine, its loop variable is also just determined, in this circulation, the occasion of application is determined, its cycle index also just becomes constant, thereby becomes quiet cycle from dynamic circulation; Individual layer dynamic circulation refers to that cycle index is variable, and there is no nested other circulations; Multilayer dynamic circulation refers to that cycle index is variable, and is nested with interior loop.
In the present embodiment, when loop statement is quiet cycle, by this quiet cycle, is mapped as control stream and comprises step:
S2411, according to cycle index, loop body is launched, obtain the new loop body with cycle index equivalent number.Each loop body comprises operation expression, and has common parameter object between each operation expression.
In the present embodiment, in function static void predict_16x16_dc, include three loop statements:
Figure GDA0000146368240000101
Figure GDA0000146368240000111
Thus, known first loop unrolling is obtained to new operation expression, is respectively:
dc=dc+src[-1];dc=dc+src[-i_stride];dc=dc+src[-1+i_stride];dc=dc+src[1-i_stride];......dc=dc+src[-1+15*i_stride];dc=dc+src[15-i_stride];dc=(dc+16)>>5。
Second and the 3rd loop unrolling obtain respectively operation expression and are:
src[0]=dc;...src[15]=dc;
src[0+i_stride]=dc;...src[15+i_stride]=dc;...src[0+15*i_stride]=dc;...src[15+15*i_stride]=dc;
S2412, new expression formula expansion being obtained according to parameter object are carried out iteration, thereby are obtained a new operation expression.In the present embodiment, the new operation expression being obtained from first above-mentioned loop unrolling, carries out iteration according to index dc by these operation expressions, obtains a new expression formula about dc:
dc=(0+src[-1]+src[-i_stride]+src[-1+i_stride]+src[1-i_stride]+...+src[-1+15*i_stride]+src[15-i_stride]+16)>>5。
S2413, the operational order in this new operation expression is mapped as to processing block diagram, the parameter object in operation expression is mapped as to the memory node in data stream.In the present embodiment, when loop statement is while can be changed into the dynamic circulation statement of quiet cycle, once because the environment of its application is determined, its loop variable just becomes constant, and its corresponding mapping step is identical with the mapping step of quiet cycle statement.
In the present embodiment, when loop statement is the mapping of individual layer dynamic circulation, the generation step of controlling stream comprises:
S2421a, circulating content is mapped as to processing block diagram, recursion instruction is mapped as to state machine.The present embodiment, by respectively circulating content being mapped as to processing block diagram, being mapped as recursion instruction the mode of state machine, thereby data stream is divided into two states: order status (sequence) A and recurrent state (loop) A, as shown in figure 10.
In the present embodiment, when circulation is multilayer dynamic circulation, the generation method of controlling stream comprises:
S2421b, respectively by the content map of interior loop statement and outer loop statement for processing block diagram A and processing block diagram B, by outer cyclic mapping, be state machine B respectively, interior loop is mapped as to state machine A, and this processing block diagram A and state machine A are mapped in and process in block diagram B, as shown in figure 11.That is to say in the present embodiment, adopt each circulation in multilayer dynamic circulation to be mapped as state machine, and the control that the nested mode of carrying out state machine generates corresponding Data Control stream is flowed.
Certainly in the present embodiment, it is the mode of a unified state machine that this multilayer dynamic circulation also can adopt interior loop and outer cyclic mapping, as shown in figure 12, if but nested circulation is the circulation of N layer, circulating content is mapped as to processing block diagram, by cyclic mapping, be a unified state machine, and the status number of state machine is N+1.That is to say in the present embodiment, when multilayer dynamic circulation adopts unified state machine, the status number of state machine equals cycle index and adds one.
In the present embodiment, according to the control statement in computer language procedure, comprise branch's control statement, and branch's control statement comprises single branch control statement and nested branch control statement.
In the present embodiment, during control statement Wei Dan branch of Dang Gai branch control statement:
S2421b, branch's steering order is mapped as to MUX.
The input end that S2422b is MUX by control statement content map is processed block diagram.
S2423b, the control end that controlled condition is mapped as to MUX are processed block diagram, finally obtain the structural drawing of this branch's control statement, as shown in figure 13.In the present embodiment, if this control statement content is or/and controlled condition is variable, be mapped as memory node.
In the present embodiment, when Dang Gai branch control statement is nested branch control statement, owing to function being analyzed before, therefore obtain upper strata control statement and the bottom control statement of this nested branch control statement:
S2421c, upper strata branch steering order is mapped as to MUX 1, the input end that is MUX 1 by the content map of upper strata branch control statement is processed block diagram, and the control end that the controlled condition in upper strata branch control statement is mapped as to this MUX 1 is processed block diagram.
S2422c, branch of lower floor steering order is mapped as to MUX 2, the content map of branch of lower floor control statement is that the input end of MUX 2 is processed block diagram, and the control end that the controlled condition in branch of lower floor control statement is mapped as to MUX 2 is processed block diagram.
S2423c, selects 1 output as the input of MUX 2 multichannel, thereby obtains the structural drawing of this nested branch control statement, as shown in figure 14.
Certainly in the present embodiment, if this control statement content or/and controlled condition is variable, is mapped as corresponding memory node in data stream.
Method based on above-mentioned, thus obtain respectively corresponding to function X264_me_search and function pixel_sad_16 * 16, Data Control flow graph, as shown in Figure 15 and Figure 16.
Control expression formula conversion method in the present embodiment is the core of all linguistic expression's conversions, and its conversion efficiency directly affects the data volume of generating algorithm Data Control flow graph.The conversion method target that the present embodiment proposes is hardware realization, therefore in advance concurrency factor, hardware configuration is taken into account as state machine, MUX etc., can help to a greater extent Hardware Engineer to carry out hardware design.
Please refer to Figure 17, Data Control stream map generalization method based on above-mentioned, the present invention also provides a kind of Data Control flow graph generating apparatus, comprise: process analysis module 1, for reading the computer language procedure of describing integrated circuit algorithm, according to the rule of this kind of computerese, from computer language procedure, identify mapped execution object and parameter object, wherein carry out object and comprise operational order and/or steering order, parameter object comprises at least one in input data, output data and intermediate data; Data Control flow graph generation module 2, for being mapped to by the execution object identifying and parameter object the Data Control flow graph of describing integrated circuit algorithm.
Please refer to Figure 18, in the present embodiment, Data Control flow graph generation module 2 comprises: carry out object map submodule 21, for the operational order of carrying out object is mapped as to processing block diagram, and for the steering order of carrying out object being mapped as to the control stream of identification-state, state transitions condition and state control signal; Parameter object mapping submodule 22, for being mapped as parameter object the memory node in data stream.
The Data Control flow graph generating based on said method, is described in more detail generating operator space-time drawing generating method below in conjunction with specific embodiment.
Transfer principle by Data Control flow graph generating operator spacetime diagram is: computing comprises 1 with storage implementation structure, and same storer is shared in input and output, arithmetic section restructural, i.e. the data stream form based on processor; 2 pipeline data stream forms; 3 parallel data stream forms.Wherein first kind form is general form, and its data stream form is serial in time, and will be according to sequential demand by its parallelization in from higher level lanquage to hardware conversion.
Therefore the principle of, in the present embodiment, data stream being launched is:
I, order related data flow structure: utilize operator unit to launch, this expansion can be implemented streamline, and flow beat was calculated according to the longest processing time, by storer, as data between streamline, is stored, and adjusts streamline beat.For example, the execution time of FunA, Func B and Func C is respectively 3,4 and 5 unit interval; The input bandwidth of Func A is 3data/cycle, output bandwidth is 2data/cycle, the input bandwidth of FuncB is 5data/cycle, output bandwidth is 4data/cycle, the input bandwidth of Func C is 7data/cycle, output bandwidth is 6data/cycle, and raw data A_OUT, B_IN, B_OUT, C_IN, C_OUT and A_IN are shared to same storage, and arithmetic section is reconstruct.Data stream is closed and to be: A_OUT=B_IN, and B_OUT=C_IN, and during C_OUT ≠ A_IN, be that the output data of Func A are as the input of Func B, the output data of Func B are as the input of Func C, and do not have the feedback of data, and this data stream can be converted into pipeline organization.Wherein, flow beat was calculated according to the longest processing time, by storage operators, as interrupting, adjusted streamline beat, as shown in figure 19.
The data stream of II, existence feedback: when data stream is a circulation, for there being the data stream of data dependence can not be converted into pipeline organization.If there is data dependence but be stored between each batch data of same shared storage, and each batch data inside is not while there is no data dependence, can realize the inside flowing water of processing each batch data, can reduce like this bandwidth of storage.For example, data A_IN is by A_IN_0, A_IN_1 and A_IN_2 form, although A_IN integral body depends on the output of C_OUT, A_IN_0, between A_IN_1 and A_IN_2, there is no data dependence, therefore can utilize inner flowing water by A_IN_0, A_IN_1 and A_IN_2 do in batches, obtain being Func A after complete C_OUT again, thereby obtain local flow's waterline formal transformation structure, as shown in figure 20.
III, parallel data flow structure: owing to there is no inputoutput data correlativity, the hardware independent expansion that can walk abreast.For example, as A_IN ≠ A_OUT ≠ B_OUT ≠ C_OUT, B_IN ≠ A_OUT ≠ B_OUT ≠ C_OUT, during C_IN ≠ A_OUT ≠ B_OUT ≠ C_OUT, be Func A, Func B, any input of Func C is all uncorrelated with their output, thereby data stream is expanded into parallel form, as shown in figure 21.
Principle based on above-mentioned, is converted to corresponding operator spacetime diagram by Data Control flow graph and specifically comprises step:
S31, set up operator cell library in advance.
S32, according to mark shared storage information and distributed store information, inputoutput data and intermediate data are stored.
S33, in Data Control flow graph, existence order related data flow, expands into pipeline organization by this data stream.
S34, in operator cell library, take out with launch after the operator unit of each node corresponding function.When processing capacity that the node in Data Control flow graph wherein carries out is simple, only need an operator or two operators in corresponding operator cell library, when yet the processing of carrying out when its functional module is more complicated, need to be corresponding to a plurality of operators in operator cell library, and replace corresponding node in Data Control flow graph with the combination of the plurality of operator.In the present embodiment, arithmetic logic L0 and the L1 of function x264me search are respectively: L0 logic: bmx=x264_clip3 ((m-> mvp[0]+2) > > 2,-m-> i_mv_range, m-> i_mv_range); L1 logic: bmy=x264_clip3 ((m-> mvp[1]+2) > > 2,-m-> i_mv_range, m-> i_mv_range).
In the present embodiment, logic L0 and logic L1 are order related data flow structure, are therefore expanded into operator spacetime diagram as shown in figure 22, wherein, the configuration signal that X0 and X1 produce for controlling operator, it is controlled stream and dots.Rectangle in figure is processed computing operator or the storage operators of block diagram representative mapping, and their interconnection completes by linking operator.
In the present embodiment, between data stream, not only there is alphabetic data dependency structure, also may there is the feedback of data stream, perform step: S35, according to storage information, judgement is stored between the data of each batch data inside of same shared storage whether have data dependence, if there is no data dependence, can realize the inside flowing water of processing this batch data, thereby reduce the bandwidth of storage.
In the present embodiment, the arithmetic logic L3 of function x264me search comprises two parts, is respectively: first:
bcost=h->pixf.sad[i_pixel](m->p_fenc,m->i_stride,p_fref,(m->i_stride)*5);
for(i_iter=0;i_iter<16;i_iter++)
{int best=0;int cost[4];
(cost[0])=h->pixf.sad[i_pixel](m->p_fenc,
m->i_stride,&p_fref[(-1)*m->i_stride*5+(0)],m->i_stride*5)+m->lm*(bs_size_se(((bmx+(0))<<2)-m->mvp[0])+bs_size_se(((bmy+(-1))<<2)-m->mvp[1]));
(cost[1])=h->pixf.sad[i_pixel](m->p_fenc,
m->i_stride,&p_fref[(1)*m->i_stride*5+(0)],m->i_stride*5)+m->lm*(bs_size_se(((bmx+(0))<<2)-m->mvp[0])+bs_size_se(((bmy+(1))<<2)-m->mvp[1]));
(cost[2])=h->pixf.sad[i_pixel](m->p_fenc,
m->i_stride,&p_fref[(0)*m->i_stride*5+(-1)],m->i_stride*5)+m->lm*(bs_size_se(((bmx+(-1))<<2)-m->mvp[0])+bs_size_se(((bmy+(0))<<2)-m->mvp[1]));
(cost[3])=h->pixf.sad[i_pixel](m->p_fenc,
m->i_stride,&p_fref[(0)*m->i_stride*5+(1)],m->i_stride*5)+m->lm*(bs_size_se(((bmx+(1))<<2)-m->mvp[0])+bs_size_se(((bmy+(0))<<2)-m->mvp[1]));
Second portion:
if(cost[1]<cost[0])best=1;
if(cost[2]<cost[best])best=2;
if(cost[3]<cost[best])best=3;
if(bcost<=cost[best])
break;
bcost=cost[best];
Wherein, first is order related data flow, existence order related data flow in each batch data in second portion, and there is feedback, first generate inner streamline, feeding back conversion, two parts are converted to the spacetime diagram being comprised of operator unit, respectively as shown in Figure 23 a and Figure 23 b.
In the present embodiment, be stored between each batch data stream in same shared storage and also may do not have data dependence, separate between each batch data stream, by this data stream parallel expansion.
S36, carries out parallel pipeline by data stream.In the present embodiment, the arithmetic logic L5 of function x264_me_search is: m-> mv[0]=bmx < < 2; M-> mv[1]=bmy < < 2; This logic L5 is parallel pipeline, and its generating operator spacetime diagram as shown in figure 24.
Through mentioned above principle, the operator spacetime diagram of being changed by the Data Control flow graph of function x264_me_search and function pixel_sad_16 * 16, respectively as shown in Figure 25 a and Figure 25 b.Wherein, function pixel_sad_16 * 16 in original program due to function X264_me_search, have been called, so in Figure 25 a, clearly do not indicate the concrete operator spacetime diagram of function pixel_sad_16 * 16, the operator spacetime diagram of function pixel_sad_16 * 16 is as shown in Figure 25 b.
The Data Control flow graph that method based on above-mentioned generates, the method that this Data Control flow graph is applied to temporal constraint below in conjunction with specific embodiment describes in detail.
Data flow diagram is applied to the method for temporal constraint, is divided into two stages:
I, the requirement specification definition of determining algorithm and target integrated circuit technology, and adopt the basic sequential unit of different operators according to different process.II, take function as unit carries out the mark of temporal constraint to data flow diagram, according to its data dependence and variable storage information, can obtain from the temporal constraint of the downward layer functions of top layer.
Based on mentioned above principle, in the present embodiment, the method that Data Control flow graph is applied to temporal constraint comprises step:
S41, according to the requirement of user specification demand and target integrated circuit technology, determine total temporal constraint.In the present embodiment, the c program of algorithm is H.264 described to X264 code, if algorithm requirements specification is defined as
Figure GDA0000146368240000151
(video resolution is 1280 * 720 pixels, processes 60 frame video datas p.s.).If target integrated circuit technology is 130nm technique, call operator technology library, can obtain standard unit's sequential is 5ns, utilizing operator technology library can generative circuit dominant frequency be 200MHz.If target integrated circuit technology is 65nm technique, call operator technology library, can obtain standard unit's sequential is 2.5ns, utilizing operator technology library can generative circuit dominant frequency be 400MHz.
In the present embodiment, according to step S41, obtain after total temporal constraint, data flow architectures different in Data Control flow graph is carried out to temporal constraint and comprise two kinds of methods, be respectively according to parallel data stream and carry out temporal constraint; According to serial data stream, carry out temporal constraint.
When the data stream in Data Control flow graph is parallel data stream, step S42 comprises step:
S421a, divide total temporal constraint equally each operator level in corresponding spacetime diagram, and divide the temporal constraint of each operator level equally each operator unit in this operator level.The H.264 algorithm of take in the present embodiment is example, and in frame level is processed, its constraint comes from algorithm requirements specification and is defined as 720p@60fps, i.e. processing 60 frames per second.If selecting technique is 130nm technique, utilizing operator technology library can generative circuit dominant frequency be 200MHz, and the temporal constraint that obtains this level is 200MHz/60 frame=3.33M cycle/ frame.At macroblock level Data processing, because each macro block processing sequence is serial processing, therefore Timing Constraints distributes according to serial, be that every frame is comprised of 80 * 45=3600 macro block, the processing power that requires each macro block is 1280 * 720 * 60/256=216000MB/s, when each macro block is processed required operator unit, ordinal number is 200MHz/216000MB/s=926cycle/MB, and the sequential that is about to the computing of frame level is divided equally to each macro block: 3.33M cycle/ frame/3600=926cycle/MB.Because the macro block in video encoder is divided into inter prediction, infra-frame prediction, transform and quantization, entropy coding, block elimination filtering etc., these macro blocks are processed a kind of parallel form that is operating as, therefore the temporal constraint of each macro block is also 926cycle/MB, thereby has generated successively the temporal constraint of bottom layer treatment module downwards.
In the present embodiment, when serial data is carried out to temporal constraint, step S42 comprises step:
S421b, using the basic sequential unit of the total operator of corresponding each operator level of each node of serial in Data Control stream as overall temporal constraint, the ratio that accounts for the sequential summation that operator unit that in each operator level, the longest arithmetic path shines upon is corresponding according to the sequential of the computing operator that in each operator level, the longest arithmetic path shines upon is distributed the sequential of each operator level.For example, when two operator level A and the pass of B in Data Control flow graph are serial, and their overall temporal constraint be them N the basic sequential of operator with.Sequential corresponding to computing operator that the longest arithmetic path was shone upon in operator level A and B of take is benchmark, in this benchmark, show that ratio distributes N the basic sequential unit of operator.The computing operator number that for example in operator level A, the longest arithmetic path shines upon is Ma operator, the basic sequential unit of a corresponding Na operator; The computing operator number that in operator level B, the longest arithmetic path shines upon is Mb operator, the basic sequential unit of a corresponding Nb operator, and the temporal constraint being assigned to so on operator level A is: the temporal constraint being assigned on operator level B is:
Figure GDA0000146368240000162
In the present embodiment, computing operator can be that many basic sequential units carry out, so operator number is not necessarily consistent with the basic sequential unit of operator.
S422b, the sign that serial data stream contains parallel data in Data Control flow graph, revises temporal constraint in step S421b according to the sign of this parallel data.For example, modules A can obtain information p-x-n-m, and wherein p is concurrency sign, and x is concurrency kind, comprises the types such as pipeline parallel method, local pipeline parallel method, data parallel; N is parallel position, walks abreast n processing stage; M is and line number.
According to above information, the temporal constraint of modules A can be adjusted into
Figure GDA0000146368240000163
β wherein nfor the empirical parameter in stage n parallelization, N pnfor the basic sequential unit of processing operator in stage n.
Method based on above-mentioned, the structure during to structure shown in Figure 25 a and Figure 25 b after row sequential mark is as shown in Figure 26 a and Figure 26 b.
Because structure shown in Figure 25 a has been called the Operator structure of function pixel_sad_16 * 16, so first structure shown in Figure 25 b is compressed, the result after compression as shown in figure 27; And then the row compression during to structure shown in Figure 25 a according to the method that the spacetime diagram shown in Figure 27 is compressed, thereby the spacetime diagram of the function X264_me_search after being compressed.
When structure shown in 25a and Figure 25 b being carried out to spacetime diagram compression, mainly follow following principle:
1, the computing class operator that in operator spacetime diagram, operational attribute is identical is carried out to cluster compression.Such as, in spacetime diagram, two parallel add operation operators can be compressed into an add operation operator, the mode of simultaneously controlling operator by introducing realizes the multiplexing of addition operator after compression, complete with compress before two functions that addition operator is identical.As can be seen here, after operator spacetime diagram is compressed, the number of operator can significantly reduce, thereby has saved the area of integrated circuit, and correspondingly, the operator after compression is realized multiplexing by control operator, has increased the execution time of integrated circuit overall algorithm.Be understandable that, to the cluster compression of computing class operator, must cause that storage class operator, control class operator, class of paths operator and clock class operator also correspondingly change, so can also do corresponding cluster compression further to save integrated circuit area according to actual conditions to above-mentioned operator, storage class operator especially wherein.
2, when introducing control operator, generate corresponding configuration-direct, described configuration-direct is worked according to predetermined mode for controlling the operator of generation, thus the identical function of realization and compression pre-operator.
3, for the possible cluster compression result of same operator spacetime diagram, have multiple.Therefore, in compression process, the spacetime diagram that after selection compression, the spacetime diagram overall algorithm execution time approaches confinement time is most as final compression result, the spacetime diagram that the selection overall algorithm execution time approaches confinement time is most as compression result, can, in the situation that guaranteeing to meet sequential condition, save the area of integrated circuit the largelyst.Be the integrated circuit maximum execution time calculating according to the performance index of user's proposition confinement time.
Above-mentioned to after the compression of spacetime diagram cluster, can reduce area and the power consumption of integrated circuit.And the operator generating after cluster compression has certain regularity.
Above-mentioned spacetime diagram is carried out after cluster compression, in the time of can also be to some operator wherein, row be optimized, and a kind of mode of optimization is that row solidifies customization during to some operator.Such as, in Figure 28, the left side is the computing class operator after a kind of compression, due to not use of logic unit wherein, so obtain the Operator structure shown in the right in Figure 28 after the logic unit of this operator can being removed, dwindle further the area of operator.Again such as, for the ADDS operator shown in Fig. 2, owing to passing through to change the value of control bit X, can make this operator realize different functions, and in a certain concrete integrated circuit, in fact only use the addition shifting function of this operator, the value of the control bit X of this operator can be fixed as to 000, thereby met the power consumption of having saved integrated circuit under the prerequisite of functional requirement.By solidifying the method for customization, can in circuit, only occur realizing the minimal hardware that identical function operates, then these minimal hardware be carried out to full Custom Design, make it there is no other expanded function.Like this, both can guarantee the correct execution of algorithm, can optimize again area and the power consumption of integrated circuit.
After spacetime diagram before and after the compression of X264_me_search function cluster is analyzed, can obtain the form shown in Figure 29.From this form, can find out, after cluster compression, operator number used reduces to 139 by 1724, although 500 cycles that correspondingly sequential was increased to from 292 cycles, 500 cycles are still less than the constraint cycle 600 of X264_me_search.As can be seen here, after cluster compression, in the execution time that guarantees overall algorithm, meet under the condition of temporal constraint, can also reduce significantly operator number used, thereby save area and the power consumption of integrated circuit.
Method based on above-mentioned, by the X264_me_search function to the H.264 C language description of standard, carry out process analysis, and after shining upon, obtain the Data Control flow graph of function X264_me_search as shown in figure 15, by this Data Control flow graph, be converted to the operator spacetime diagram as shown in Figure 25 a and Figure 25 b, according to information such as data dependences, Data Control flow graph is carried out to temporal constraint and obtain the temporal constraint figure as shown in Figure 26 a and Figure 26 b, then according to time-labeling, spacetime diagram is carried out to cluster and compress the spacetime diagram after the cluster compression that obtains function pixel_sad_16 * 16 as shown in figure 27, last again by the spacetime diagram generation integrated circuit lower hardware circuit after this compression.Due in to computer language procedure analytic process, obtaining data dependence, data can concurrency and corresponding control information, and this program is mapped as to Data Control flow graph, can concurrency and corresponding control information thereby the Data Control flow graph that makes to obtain comprises data dependence, data, thus ancillary hardware circuit designer is carried out circuit design effectively.And it is Data Control flow graph that the device of the present embodiment can be realized computer language procedure automatic mapping, thereby be converted to again operator spacetime diagram, greatly promote integrated circuit Design of Hardware Architecture efficiency.
Method from computer language procedure generated data control flow graph provided by the invention is not only adapted to above-described embodiment, also can be used for other and by computer language procedure, is generated the mapping process of integrated circuit lower hardware logical description.
Above content is in conjunction with concrete embodiment further description made for the present invention, can not assert that specific embodiment of the invention is confined to these explanations.For general technical staff of the technical field of the invention, without departing from the inventive concept of the premise, can also make some simple deduction or replace, all should be considered as belonging to protection scope of the present invention.

Claims (8)

1. lower hardware mapping method of integrated circuit, is characterized in that comprising:
Process analysis step for reading the computer language procedure of describing integrated circuit algorithm, identifies mapped execution object and parameter object from described computer language procedure according to the rule of this computerese;
Data Control flow graph generates step, and for the execution object map identifying being become to describe processing block diagram or the control stream of the data flow diagram of integrated circuit algorithm, parameter object is mapped to the memory node in the Data Control flow graph of describing integrated circuit algorithm;
Operator spacetime diagram generates step, for the function treatment of carrying out according to each node of Data Control flow graph, from the operator cell library of setting up in advance, take out at least one operator unit of corresponding function, Data Control flow graph is converted to the operator spacetime diagram being formed by operator unit;
Temporal constraint step, for determining total temporal constraint according to the requirement of user specification demand and target integrated circuit technology, each the operator unit label time in operator spacetime diagram, carries out temporal constraint to each level of operator spacetime diagram;
Operator spacetime diagram compression step, for according to time-labeling, operator spacetime diagram being carried out to the cluster compression on space, and makes it overall algorithm execution time close to total temporal constraint;
Lower hardware mapping step, generates integrated circuit lower hardware logical description according to the operator spacetime diagram after cluster compression.
2. lower hardware mapping method of integrated circuit as claimed in claim 1, is characterized in that, described execution object comprises operational order and/or steering order, and described parameter object comprises at least one in input data, output data and intermediate data.
3. lower hardware mapping method of integrated circuit as claimed in claim 2, it is characterized in that, described process analysis step comprises: according to the rule of this kind of computerese, from described computer language procedure, find out function, computing in the input and output of described function and function is resolved, identify and carry out object and parameter object, in resolving, if current resolved function is the upper layer functions that comprises lower layer functions, continue lower layer functions to resolve.
4. lower hardware mapping method of integrated circuit as claimed in claim 3, it is characterized in that, described steering order is the recursion instruction in loop statement, and described loop statement comprises quiet cycle statement and dynamic circulation statement, and described Data Control flow graph generates step and comprises:
When loop statement is quiet cycle statement, according to cycle index, loop body is launched, after loop body launches, bring parameter object into and obtain new operation expression, operational order in described operation expression is mapped as to processing block diagram, the parameter object in described operation expression is mapped as to the memory node in data stream;
When loop statement is while can be changed into the dynamic circulation statement of quiet cycle statement, according to the different occasion change state loop statements that call, it is quiet cycle statement, and its expansion is obtained to new operation expression, operational order in described operation expression is mapped as to processing block diagram, the parameter object in described operation expression is mapped as to the memory node in data stream;
When loop statement is individual layer dynamic circulation statement, the content map of loop statement, for processing block diagram, is mapped as to state machine by recursion instruction;
When loop statement is multilayer dynamic circulation statement, by the content map of outer loop statement and interior loop statement, be respectively that the first processing block diagram and second is processed block diagram, outer recursion instruction is mapped as to the first state machine, by interior loop command mappings, be the second state machine, and described second process block diagram and the second state machine and is mapped in described first and processes in block diagram; Or by the content map of outer loop statement and interior loop statement, be respectively that the first processing block diagram and second is processed block diagram, recursion instruction is mapped as to state machine, and described second processes block diagram is mapped in described first and processes in block diagram, and the cycle index that the status number of described state machine equals described circulation adds one.
5. lower hardware mapping method of integrated circuit as claimed in claim 4, is characterized in that, described steering order is the steering order in branch's control statement, and described Data Control flow graph generates step and comprises:
Described branch steering order is mapped as to MUX, and the input end that the content map of branch's control statement is MUX is processed block diagram, and the control end that the controlled condition of branch's control statement is mapped as to described MUX is processed block diagram.
6. lower hardware mapping method of integrated circuit as claimed in claim 5, it is characterized in that, described branch control statement is nested branch control statement, Ze Jiang upper strata branch steering order is mapped as the first MUX, the content map of described upper strata branch control statement is that the input end of described the first MUX is processed block diagram, and the controlled condition of described upper strata branch control statement is mapped as the control end of described the first MUX and processes block diagram; Branch of lower floor steering order is mapped as to the second MUX, described in the content of branch of described lower floor control statement, the input end of the second MUX is processed block diagram, the controlled condition of branch of described lower floor control statement is mapped as the control end of described the second MUX and processes block diagram, and the output of described the first MUX is as the input of described the second MUX.
7. integrated circuit lower hardware mapping device, is characterized in that comprising:
Process analysis module for reading for describing the computer language procedure of integrated circuit algorithm, identifies mapped execution object and parameter object from described computer language procedure according to the rule of this computerese;
Data Control flow graph generation module, for the execution object map identifying being become to describe processing block diagram or the control stream of the data flow diagram of integrated circuit algorithm, parameter object is mapped to the memory node in the Data Control flow graph of describing integrated circuit algorithm;
Operator spacetime diagram generation module, for the function treatment of carrying out according to each node of Data Control flow graph, from the operator cell library of setting up in advance, take out at least one operator unit of corresponding function, Data Control flow graph is converted to the operator spacetime diagram being formed by operator unit;
Temporal constraint module, for determining total temporal constraint according to the requirement of user specification demand and target integrated circuit technology, each the operator unit label time in operator spacetime diagram, carries out temporal constraint to each level of operator spacetime diagram;
Operator spacetime diagram compression module, for according to time-labeling, operator spacetime diagram being carried out to the cluster compression on space, and makes it overall algorithm execution time close to total temporal constraint;
Lower hardware mapping block, generates integrated circuit lower hardware logical description according to the operator spacetime diagram after cluster compression.
8. integrated circuit lower hardware mapping device as claimed in claim 7, is characterized in that,
Described execution object comprises operational order and/or steering order, and described parameter object comprises at least one in input data, output data and intermediate data.
CN201010622446.XA 2010-12-31 2010-12-31 Lower hardware mapping method of integrated circuit, and data control flow generation method and device Expired - Fee Related CN102054109B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010622446.XA CN102054109B (en) 2010-12-31 2010-12-31 Lower hardware mapping method of integrated circuit, and data control flow generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010622446.XA CN102054109B (en) 2010-12-31 2010-12-31 Lower hardware mapping method of integrated circuit, and data control flow generation method and device

Publications (2)

Publication Number Publication Date
CN102054109A CN102054109A (en) 2011-05-11
CN102054109B true CN102054109B (en) 2014-03-19

Family

ID=43958422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010622446.XA Expired - Fee Related CN102054109B (en) 2010-12-31 2010-12-31 Lower hardware mapping method of integrated circuit, and data control flow generation method and device

Country Status (1)

Country Link
CN (1) CN102054109B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102043886B (en) * 2010-12-31 2012-10-24 北京大学深圳研究生院 Underlying hardware mapping method for integrated circuit as well as time sequence constraint method and device for data control flow
CN102054107B (en) * 2010-12-31 2013-11-06 北京大学深圳研究生院 Lower hardware mapping method of integrated circuit, and space-time diagram generation method and device
US20130097568A1 (en) * 2011-10-14 2013-04-18 William W. Yang Global clock handler object for hdl environment
CN102693340B (en) * 2012-05-19 2014-04-09 孙凌宇 Large scale integrated circuit partitioning method on basis of multilevel partitioning method and weighted hypergraph
CN103647522A (en) * 2013-11-19 2014-03-19 吕晓兰 Four-mold remainder system based FIR filter and design method thereof
CN108170957B (en) * 2017-12-28 2022-01-04 佛山中科芯蔚科技有限公司 Method and system for generating data control flow diagram and integrated circuit design method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7530047B2 (en) * 2003-09-19 2009-05-05 Cadence Design Systems, Inc. Optimized mapping of an integrated circuit design to multiple cell libraries during a single synthesis pass
CN101901161A (en) * 2010-07-21 2010-12-01 四川大学 A Hierarchical Control Data Flow Graph Modeling Method Oriented to Partitioning Energy-related Software/Hardware
CN102043886A (en) * 2010-12-31 2011-05-04 北京大学深圳研究生院 Underlying hardware mapping method for integrated circuit as well as time sequence constraint method and device for data control flow
CN102054107A (en) * 2010-12-31 2011-05-11 北京大学深圳研究生院 Lower hardware mapping method of integrated circuit, and space-time diagram generation method and device
CN102054108A (en) * 2010-12-31 2011-05-11 北京大学深圳研究生院 Lower hardware mapping method of integrated circuit, and time-space diagram compression method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4397744B2 (en) * 2004-06-25 2010-01-13 パナソニック株式会社 Method for high-level synthesis of semiconductor integrated circuit

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7530047B2 (en) * 2003-09-19 2009-05-05 Cadence Design Systems, Inc. Optimized mapping of an integrated circuit design to multiple cell libraries during a single synthesis pass
CN101901161A (en) * 2010-07-21 2010-12-01 四川大学 A Hierarchical Control Data Flow Graph Modeling Method Oriented to Partitioning Energy-related Software/Hardware
CN102043886A (en) * 2010-12-31 2011-05-04 北京大学深圳研究生院 Underlying hardware mapping method for integrated circuit as well as time sequence constraint method and device for data control flow
CN102054107A (en) * 2010-12-31 2011-05-11 北京大学深圳研究生院 Lower hardware mapping method of integrated circuit, and space-time diagram generation method and device
CN102054108A (en) * 2010-12-31 2011-05-11 北京大学深圳研究生院 Lower hardware mapping method of integrated circuit, and time-space diagram compression method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
牛亚文.HCDFG-II-面向C语言系统描述的控制/数据流图表示.《计算机辅助设计与图形学学报》.2004,第16卷(第11期),全文. *
王新安.算子设计方法缩小IC设计与制造间的"剪刀差".《集成电路应用》.2010,(第7期),全文.

Also Published As

Publication number Publication date
CN102054109A (en) 2011-05-11

Similar Documents

Publication Publication Date Title
CN102043886B (en) Underlying hardware mapping method for integrated circuit as well as time sequence constraint method and device for data control flow
CN102054109B (en) Lower hardware mapping method of integrated circuit, and data control flow generation method and device
Mohaidat et al. A survey on neural network hardware accelerators
Li et al. Dynamic dataflow scheduling and computation mapping techniques for efficient depthwise separable convolution acceleration
Catthoor et al. Application-specific architectural methodologies for high-throughput digital signal and image processing
US7305649B2 (en) Automatic generation of a streaming processor circuit
CN102054108B (en) Lower hardware mapping method of integrated circuit, and time-space diagram compression method and device
Hoffmann et al. A survey on CNN and RNN implementations
CN102054107B (en) Lower hardware mapping method of integrated circuit, and space-time diagram generation method and device
CN113158599B (en) Quantum informatics-based chip and chip-based EDA device
Xu et al. CMSA: Configurable multi-directional systolic array for convolutional neural networks
Chiu et al. Flexibility: FPGAs and CAD in deep learning acceleration
Sau et al. Automated design flow for coarse-grained reconfigurable platforms: An RVC-CAL multi-standard decoder use-case
CN113157638B (en) In-memory computing processor with low power consumption and processing operation method
Corvino et al. Design space exploration in application-specific hardware synthesis for multiple communicating nested loops
Potocnik et al. Optimizing foundation model inference on a many-tiny-core open-source risc-v platform
Muñoz-Martínez et al. A novel network fabric for efficient spatio-temporal reduction in flexible DNN accelerators
CN103136162B (en) Cloud framework and the method for designing based on this framework in ASIC sheet
CN102075763A (en) Intra-frame sub-block predictor circuit for video encoder and method for implementing same
US20230168892A1 (en) Risc-v-based 3d interconnected multi-core processor architecture and working method thereof
CN116483633A (en) Data augmentation method and related device
Vemparala et al. Hw-flow: A multi-abstraction level hw-cnn codesign pruning methodology
Goldbrunner et al. Memory access pattern profiling for streaming applications based on MATLAB models
Shee et al. Architectural exploration of heterogeneous multiprocessor systems for JPEG
Mal et al. Dynamically Reconfigurable Perception using Dataflow Parameterization of Channel Attention

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140319

Termination date: 20211231