CN107239260A - A kind of control of many predicates and compiling optimization method towards digital signal processor - Google Patents
A kind of control of many predicates and compiling optimization method towards digital signal processor Download PDFInfo
- Publication number
- CN107239260A CN107239260A CN201710328842.3A CN201710328842A CN107239260A CN 107239260 A CN107239260 A CN 107239260A CN 201710328842 A CN201710328842 A CN 201710328842A CN 107239260 A CN107239260 A CN 107239260A
- Authority
- CN
- China
- Prior art keywords
- predicate
- control
- many
- register file
- physics
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 14
- 238000005457 optimization Methods 0.000 title claims abstract description 14
- 238000005206 flow analysis Methods 0.000 claims description 5
- 230000001186 cumulative effect Effects 0.000 claims description 3
- 238000011156 evaluation Methods 0.000 abstract description 3
- 238000006243 chemical reaction Methods 0.000 description 6
- 239000000203 mixture Substances 0.000 description 6
- 230000008030 elimination Effects 0.000 description 4
- 238000003379 elimination reaction Methods 0.000 description 4
- 238000004458 analytical method Methods 0.000 description 2
- 238000010586 diagram Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
- G06F9/30101—Special purpose registers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/443—Optimisation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
The present invention relates to a kind of many predicates control towards digital signal processor and compiling optimization method, many predicate control instruction forms are:(p (1), p (2) ..., p (n)) Rs=Rm op Rn, implication is p (1), p (2) ..., p (n), n>When=1, n control predicates are true, instruction Rs=Rm op Rn are normally performed, and otherwise Rs=Rm op Rn cancel;Wherein, Rs=Rm op Rn reference computationses or access instruction, p (1), p (2) ..., p (n) is virtual predicate register file;Rm, Rn, Rs are general register.The present invention more flexibly and efficiently can support multiple conditions to be converted to predicate using many predicate control forms, and the code efficiency of generation is higher;Many predicate forms eliminate the phenomenon that predicate defines nesting, and many predicates control to cause the Predicate evaluation overall situationization, flattening.
Description
Technical field
The present invention relates to digital signal processor architecture design and optimization field, especially a kind of face
The control of many predicates and compiling optimization method to digital signal processor.
Background technology
It is the basic obstacle for carrying out instruction-level exploitation that branch, which redirects, and predicated execution is the machine that a kind of effective elimination branch redirects
System, it is that program is controlled to the conversion relied on to data dependence.The elimination that branch redirects can improve the feasibility of program
Can, from hardware architecture, the elimination that branch redirects can reduce hardware spending caused by branch prediction failure;From compiling
Aspect says that the elimination that branch redirects can expand scheduling scope, it is allowed to the overlapping parallel execution of multiple condition routing instructions, digs
The instruction-level parallelism across multiple Program paths is dug.
General classical predicate form is:
P1, p2=Rm cond Rn
(p1)op1
(p2)op2
If condition calculates Rm cond Rn and sets up (wherein cond compares calculating for certain, is greater than), then p1 is 0,
P2 is 1, so as to instruct op1 to cancel, op2 is normally run;, whereas if condition Rm cond Rn are invalid, then p1 is that 1, p2 is
0, so as to instruct op1 normally to run, op2 cancels.
This predicate way of realization has stronger versatility, also allows for compiling optimization and supports, but it has certain office
It is sex-limited:First, it is impossible to eliminate the conditional jump sentence of predicate control;Second, it is possible to cause longer Predicate evaluation to rely on road
Footpath.
The content of the invention
The primary and foremost purpose of the present invention, which is that offer is a kind of, both can efficiently utilize the abundant of digital signal processor offer
Logical operation resource, many predicates towards digital signal processor that the execution efficiency of conditional branching can be improved again are controlled and compiled
Translate optimization method.
To achieve the above object, present invention employs following technical scheme:A kind of many meanings towards digital signal processor
Word is controlled and compiling optimization method, and many predicate control instruction forms are:(p (1), p (2) ..., p (n)) Rs=Rm op Rn, contain
Justice be p (1), p (2) ..., p (n), n>When=1, n control predicates are true, instruction Rs=Rm op Rn are normally performed, otherwise
Rs=Rm op Rn cancel;Wherein, Rs=Rm op Rn refer to computations or access instruction, p (1), p (2) ..., and p (n) is
Virtual predicate register file;Rm, Rn, Rs are general register;
The execution step of this many predicate control formats is:
(1) predicate register file be designed as 0 of 32 physical register Pred, 1,2 ..., 31, be 1 when 0
When, it is 1 to represent physics predicate register file p0, and physics predicate register file p1 is 0, when being 0 for 0, represents physics predicate register file
P0 is 0, and physics predicate register file p1 is 1;When being 1 for 1, it is 1, physics predicate register file p3 to represent physics predicate register file p2
For 0, when being 0 for 1, it is 0 to represent physics predicate register file p2, and physics predicate register file p3 is 1;The like;Wherein p0,
P2 ..., p30 be referred to as true physics predicate register file, p1, p3 ..., p31 be referred to as false physics predicate register file;
(2) multiple control predicates of present instruction are read, i.e. p (1), p (2) ..., p (n) are raw according to above corresponding relation
In pairs should 32 physical registers Mask and C, wherein Mask be present instruction the corresponding physical register of control predicate
Sum of correspondence position binary weights, C for many predicates corresponding physical register correspondence position of present instruction weight with it is corresponding very
The value product of predicate register file cumulative and;
(3) whether Pred [Mask]==C for judging many predicate control instructions is true, if true, then present instruction is being just
Often perform, if vacation, then present instruction is cancelled.
The instruction format for defining many predicates is:P (1), p (2)=Rm cond Rn, wherein cond are comparison condition, including
It is more than, is more than or equal to, is equal to, is not equal to, is less than, is less than or equal to, when condition is true, p (1) is false, and p (2) is true;Conversely, p
(1) it is true, p (2) is false.
In compiling, its step is as follows:
Step one:Optimize rear end in compiling, handled successively according to region, calculate predicate switching cost, select suitable area
Domain carries out predicate conversion;
Step 2:By control flow analysis, the local predicate of each basic block is recognized;
Step 3:Current region controlling stream is traveled through, the absolute control predicate path for belonging to each basic block is calculated, along control
System stream direction is calculated successively, and the process is iteration, until finding whole control predicate paths of each basic block;
Step 4:Merge basic block, be that correspondence predicate is placed in the instruction of basic block, multiple basic blocks merge into one substantially
Block, ultimately generates predicate intermediate code;Merging the basic block stage, whole conditions is calculated and places the forward of merging basic block
Position, and effective many predicate control instructions are placed on the position rearward for merging basic block, ultimately form the calculating of many predicates
Form.
As shown from the above technical solution, the advantage of the invention is that:First, many predicate control forms can be more flexible high
Effect ground supports multiple conditions to be converted to predicate, and the code efficiency of generation is higher;Second, many predicate forms eliminate predicate define it is embedding
The phenomenon of set, many predicates control to cause the Predicate evaluation overall situationization, flattening;3rd, many predicate control forms eliminate predicate it
Between dependence, simplify predicate data dependence analysis so that compiling optimization it is highly efficient.
Brief description of the drawings
Fig. 1 is classical predicate compiler framework;
Fig. 2 is that many predicates control compiler framework;
Fig. 3 is the local predicate of identification;
The absolute path that Fig. 4 controls for identification predicate;
Fig. 5 is that many predicates control basic block.
Embodiment
A kind of control of many predicates and compiling optimization method towards digital signal processor, many predicate control instruction forms
For:(p (1), p (2) ..., p (n)) Rs=Rm op Rn, implication is p (1), p (2) ..., p (n), n>=1, n control predicates
When being true, instruction Rs=Rm op Rn are normally performed, and otherwise Rs=Rm op Rn cancel;Wherein, Rs=Rm op Rn are referred to
Computations or access instruction, p (1), p (2) ..., p (n) is virtual predicate register file;Rm, Rn, Rs are general register;
The execution step of this many predicate control formats is:
(1) predicate register file be designed as 0 of 32 physical register Pred, 1,2 ..., 31, be 1 when 0
When, it is 1 to represent physics predicate register file p0, and physics predicate register file p1 is 0, when being 0 for 0, represents physics predicate register file
P0 is 0, and physics predicate register file p1 is 1;When being 1 for 1, it is 1, physics predicate register file p3 to represent physics predicate register file p2
For 0, when being 0 for 1, it is 0 to represent physics predicate register file p2, and physics predicate register file p3 is 1;The like;Wherein p0,
P2 ..., p30 be referred to as true physics predicate register file, p1, p3 ..., p31 be referred to as false physics predicate register file;
(2) multiple control predicates of present instruction are read, i.e. p (1), p (2) ..., p (n) are raw according to above corresponding relation
In pairs should 32 physical registers Mask and C, wherein Mask be present instruction the corresponding physical register of control predicate
Sum of correspondence position binary weights, C for many predicates corresponding physical register correspondence position of present instruction weight with it is corresponding very
The value product of predicate register file cumulative and;
(3) whether Pred [Mask]==C for judging many predicate control instructions is true, if true, then present instruction is being just
Often perform, if vacation, then present instruction is cancelled.
The instruction format for defining many predicates is:P (1), p (2)=Rm cond Rn, wherein cond are comparison condition, including
It is more than, is more than or equal to, is equal to, is not equal to, is less than, is less than or equal to, when condition is true, p (1) is false, and p (2) is true;Conversely, p
(1) it is true, p (2) is false.
In compiling, its step is as follows:
Step one:Optimize rear end in compiling, handled successively according to region, calculate predicate switching cost, select suitable area
Domain carries out predicate conversion;
Step 2:By control flow analysis, the local predicate of each basic block is recognized;
Step 3:Current region controlling stream is traveled through, the absolute control predicate path for belonging to each basic block is calculated, along control
System stream direction is calculated successively, and the process is iteration, until finding whole control predicate paths of each basic block;
Step 4:Merge basic block, be that correspondence predicate is placed in the instruction of basic block, multiple basic blocks merge into one substantially
Block, ultimately generates predicate intermediate code;Merging the basic block stage, whole conditions is calculated and places the forward of merging basic block
Position, and effective many predicate control instructions are placed on the position rearward for merging basic block, ultimately form the calculating of many predicates
Form.
Fig. 1 is classical predicate compiler framework, first selects to be adapted to the region of predicate conversion, then carries out data-flow analysis,
The predicate of each basic block is recognized, according to the dependence between basic block predicate, corresponding predicate definition statement is inserted, now
The predicate definition instruction or jump instruction of predicate control can be introduced, predicate control deposit is placed in the finally instruction for each basic block
Device, and basic block merging is carried out, generate predicate intermediate code.
Fig. 2 is many predicate recognition frameworks, first selects to be adapted to the region of predicate conversion, by control flow analysis, identification is every
The control predicate of individual basic block a, basic block has been possible to multiple Partial controll predicates;Then the exhausted of each basic block is calculated
To control predicate path, calculated successively along controlling stream direction, the process is iteration, until finding the whole of each control block
Control predicate path;The absolute control predicate path for finding each basic block is to adapt to flattening predicate form.Merging
The basic block stage, it is possible to whole conditions is calculated and places the forward position for merging basic block, and effective many predicate controls
System instruction is placed on the position rearward for merging basic block, ultimately forms many predicate control intermediate code forms.
Fig. 3 is the example of the local predicate of an identification, and control block BB2, BB5 Partial controll predicate are respectively true physics meaning
Word register p2, false physics predicate register file p5;And control block BB3, BB4 then have multiple Partial controll predicates, wherein BB3 two
Partial controll predicate is that false physics predicate register file p1, false physics predicate register file p3, BB4 two Partial controll predicates are true
Physics predicate register file p4, true physics predicate register file p6.
Fig. 4 is the schematic diagram that this all controls predicate path, and control block BB2 absolute control predicate path is that have one
True physics predicate register file p2 compositions, and control block BB3, BB4, BB5 then have multiple absolute control paths.BB3 absolute control
Predicate path, to there is two, is the first paths of false physics predicate register file p1 compositions, false physics predicate register file p3 respectively
The second paths constituted with true physics predicate register file p2, it is meant that otherwise the control predicate of basic block BB3 instructions is false
Physics predicate register file p1, otherwise it is that (many predicate controls refer to false physics predicate register file p3 and true physics predicate register file p2
Make).BB4 absolute control predicate path is three, is true physics predicate register file p4 and true physics predicate register file p2 respectively
First paths of composition, true physics predicate register file p6 and false physics predicate register file p1 compositions the second paths, true thing
Manage predicate register file p6, false physics predicate register file p3, the third path of true physics predicate register file p2 compositions.BB5's is exhausted
Two are included to control predicate path, is the first of false physics predicate register file p5 and false physics predicate register file p1 compositions respectively
Paths, the Article 2 that false physics predicate register file p5, vacation physics predicate register file p3 are constituted with true physics predicate register file p2
Path.
Fig. 5 is many predicate control routines, first dos command line DOS:
P1, p2=a>B | | p3, p4=c>D | | p5, p6=a>c
Defined for many predicates, define p1, p3, the true physics predicate register files of p5 tri-, p2, p4, the false physics meanings of p6 tri-
Word register.
Second dos command line DOS:
(p4, p2) r8=2 | | (p6, p1) r8=2 | | (p6, p3, p2) r8=2 | | (p5, p1) re=3 | | (p5, p3, p2)
Re=3
Control to go for many predicates, otherwise wherein the dos command line DOS represents that re is equal under the conditions of being really at (p2, p4) or (p6)
2, otherwise it is equal to 3 under the conditions of being really at (p1, p5) or (p2, p3, p5).
In summary, many predicate control forms that the present invention is used more flexibly and efficiently can support multiple condition conversions
For predicate, the code efficiency of generation is higher;Many predicate forms eliminate the phenomenon that predicate defines nesting, and many predicates control to cause meaning
Word calculates the overall situationization, flattening;Many predicate control forms eliminate the dependence between predicate, simplify predicate data dependence
Analysis so that compiling optimization is highly efficient.
Claims (3)
1. a kind of control of many predicates and compiling optimization method towards digital signal processor, it is characterised in that:
Many predicate control instruction forms are:(p (1), p (2) ..., p (n)) Rs=Rm op Rn, implication is p (1), p (2) ..., p
(n),n>When=1, n control predicates are true, instruction Rs=Rm op Rn are normally performed, and otherwise Rs=Rm op Rn cancel;
Wherein, Rs=Rm op Rn reference computationses or access instruction, p (1), p (2) ..., p (n) is virtual predicate register file;Rm、
Rn, Rs are general register;
The execution step of this many predicate control formats is:
(1) predicate register file be designed as 0 of 32 physical register Pred, 1,2 ..., 31, when being 1 for 0, table
It is 1 to show physics predicate register file p0, and physics predicate register file p1 is 0, when being 0 for 0, and it is 0 to represent physics predicate register file p0,
Physics predicate register file p1 is 1;When being 1 for 1, it is 1 to represent physics predicate register file p2, and physics predicate register file p3 is 0, when
1 when being 0, it is 0 to represent physics predicate register file p2, and physics predicate register file p3 is 1;The like;Wherein p0, p2 ...,
P30 is referred to as true physics predicate register file, p1, p3 ..., p31 be referred to as false physics predicate register file;
(2) multiple control predicates of present instruction, i.e. p (1), p (2) ..., p (n), according to above corresponding relation, generation pair are read
Should 32 physical registers Mask and C, wherein Mask for present instruction control predicate corresponding physical register correspondence
The sum of position binary weights, C is the weight and corresponding true predicate of the corresponding physical register correspondence position of many predicates of present instruction
The value product of register cumulative and;
(3) whether Pred [Mask]==C for judging many predicate control instructions is true, and if true, then present instruction is normally held
OK, if vacation, then present instruction is cancelled.
2. the control of many predicates and compiling optimization method according to claim 1 towards digital signal processor, its feature
It is:The instruction format for defining many predicates is:P (1), p (2)=Rm cond Rn, wherein cond are comparison condition, including big
In, be more than or equal to, be equal to, be not equal to, be less than, be less than or equal to, when condition is true, p (1) is false, and p (2) is true;Conversely, p (1)
It is true, p (2) is false.
3. the control of many predicates and compiling optimization method according to claim 1 towards digital signal processor, its feature
It is:In compiling, its step is as follows:
Step one:Optimize rear end in compiling, handled successively according to region, calculate predicate switching cost, select suitable region to enter
Row predicate is changed;
Step 2:By control flow analysis, the local predicate of each basic block is recognized;
Step 3:Current region controlling stream is traveled through, the absolute control predicate path for belonging to each basic block is calculated, along controlling stream
Direction is calculated successively, and the process is iteration, until finding whole control predicate paths of each basic block;
Step 4:Merge basic block, be that correspondence predicate is placed in the instruction of basic block, multiple basic blocks merge into a basic block,
Ultimately generate predicate intermediate code;Merging the basic block stage, whole conditions is calculated and places the forward position for merging basic block
Put, and effective many predicate control instructions are placed on the position rearward for merging basic block, ultimately form the calculating shape of many predicates
Formula.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710328842.3A CN107239260B (en) | 2017-05-11 | 2017-05-11 | Multi-predicate control and compiling optimization method for digital signal processor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710328842.3A CN107239260B (en) | 2017-05-11 | 2017-05-11 | Multi-predicate control and compiling optimization method for digital signal processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107239260A true CN107239260A (en) | 2017-10-10 |
CN107239260B CN107239260B (en) | 2020-07-24 |
Family
ID=59985047
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710328842.3A Active CN107239260B (en) | 2017-05-11 | 2017-05-11 | Multi-predicate control and compiling optimization method for digital signal processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107239260B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020124400A1 (en) * | 2018-12-19 | 2020-06-25 | 华为技术有限公司 | Multi-branch skip processing apparatus and method, and processor |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5859999A (en) * | 1996-10-03 | 1999-01-12 | Idea Corporation | System for restoring predicate registers via a mask having at least a single bit corresponding to a plurality of registers |
CN102830954A (en) * | 2012-08-24 | 2012-12-19 | 北京中科信芯科技有限责任公司 | Method and device for instruction scheduling |
CN103617049A (en) * | 2013-12-19 | 2014-03-05 | 中国科学院声学研究所 | Code moving method based on complementary predicates |
US20150089189A1 (en) * | 2013-09-24 | 2015-03-26 | Apple Inc. | Predicate Vector Pack and Unpack Instructions |
CN105045646A (en) * | 2015-08-06 | 2015-11-11 | 中国电子科技集团公司第三十八研究所 | Partial predicate implementation of clustering structure and compilation optimization method |
-
2017
- 2017-05-11 CN CN201710328842.3A patent/CN107239260B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5859999A (en) * | 1996-10-03 | 1999-01-12 | Idea Corporation | System for restoring predicate registers via a mask having at least a single bit corresponding to a plurality of registers |
CN102830954A (en) * | 2012-08-24 | 2012-12-19 | 北京中科信芯科技有限责任公司 | Method and device for instruction scheduling |
US20150089189A1 (en) * | 2013-09-24 | 2015-03-26 | Apple Inc. | Predicate Vector Pack and Unpack Instructions |
CN103617049A (en) * | 2013-12-19 | 2014-03-05 | 中国科学院声学研究所 | Code moving method based on complementary predicates |
CN105045646A (en) * | 2015-08-06 | 2015-11-11 | 中国电子科技集团公司第三十八研究所 | Partial predicate implementation of clustering structure and compilation optimization method |
Non-Patent Citations (1)
Title |
---|
樊永朝 等: "BWDSP10x上地址和数据谓词执行的编译优化", 《计算机系统应用》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020124400A1 (en) * | 2018-12-19 | 2020-06-25 | 华为技术有限公司 | Multi-branch skip processing apparatus and method, and processor |
CN113168327A (en) * | 2018-12-19 | 2021-07-23 | 华为技术有限公司 | A multi-branch jump processing device and method, and a processor |
Also Published As
Publication number | Publication date |
---|---|
CN107239260B (en) | 2020-07-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP2834171B2 (en) | Compilation method | |
US8024718B2 (en) | System and method for optimizing source code | |
US20080141229A1 (en) | Processor, program conversion apparatus, program conversion method, and computer program | |
US20170322786A1 (en) | Methods and systems to vectorize scalar computer program loops having loop-carried dependences | |
US5361354A (en) | Optimization of alternate loop exits | |
US5450585A (en) | Compiler with delayed conditional branching | |
JP2002116916A (en) | Method for optimizing program and compiler using the same | |
JPH07114473A (en) | Compiler instruction string optimization method | |
US5854933A (en) | Method for optimizing a computer program by moving certain load and store instructions out of a loop | |
US6611956B1 (en) | Instruction string optimization with estimation of basic block dependence relations where the first step is to remove self-dependent branching | |
CN103645930A (en) | Method for establishing assemble level cross-file scheduling framework | |
CN107239260A (en) | A kind of control of many predicates and compiling optimization method towards digital signal processor | |
US20090019431A1 (en) | Optimised compilation method during conditional branching | |
Reissmann et al. | Efficient control flow restructuring for GPUs | |
CN103617049B (en) | code moving method based on complementary predicate | |
US20050149912A1 (en) | Dynamic online optimizer | |
US20030145190A1 (en) | Compiler algorithm to implement speculative stores | |
US20170206068A1 (en) | Program optimization based on directives for intermediate code | |
JPH05265769A (en) | Instruction scheduling processing method in compiler | |
CN101779192A (en) | Data processing with protection against soft errors | |
JP3757825B2 (en) | Inter-processor communication reduction method, parallel compiler apparatus and program | |
KR100655275B1 (en) | How to Compile Conditional Branching Instructions | |
Chang et al. | A compilation method for zero overhead loop in DSPs with VLIW | |
US11829754B2 (en) | Compile device, compile method, and non-transitory computer readable medium for increasing a speed of a program | |
JP2701246B2 (en) | Compiler vectorization method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |