[go: up one dir, main page]

CN107239260A - A kind of control of many predicates and compiling optimization method towards digital signal processor - Google Patents

A kind of control of many predicates and compiling optimization method towards digital signal processor Download PDF

Info

Publication number
CN107239260A
CN107239260A CN201710328842.3A CN201710328842A CN107239260A CN 107239260 A CN107239260 A CN 107239260A CN 201710328842 A CN201710328842 A CN 201710328842A CN 107239260 A CN107239260 A CN 107239260A
Authority
CN
China
Prior art keywords
predicate
control
many
register file
physics
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710328842.3A
Other languages
Chinese (zh)
Other versions
CN107239260B (en
Inventor
王向前
王昊
项利萍
郭二辉
洪一
孙立宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 38 Research Institute
Original Assignee
CETC 38 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 38 Research Institute filed Critical CETC 38 Research Institute
Priority to CN201710328842.3A priority Critical patent/CN107239260B/en
Publication of CN107239260A publication Critical patent/CN107239260A/en
Application granted granted Critical
Publication of CN107239260B publication Critical patent/CN107239260B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements
    • G06F9/30101Special purpose registers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/44Encoding
    • G06F8/443Optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The present invention relates to a kind of many predicates control towards digital signal processor and compiling optimization method, many predicate control instruction forms are:(p (1), p (2) ..., p (n)) Rs=Rm op Rn, implication is p (1), p (2) ..., p (n), n>When=1, n control predicates are true, instruction Rs=Rm op Rn are normally performed, and otherwise Rs=Rm op Rn cancel;Wherein, Rs=Rm op Rn reference computationses or access instruction, p (1), p (2) ..., p (n) is virtual predicate register file;Rm, Rn, Rs are general register.The present invention more flexibly and efficiently can support multiple conditions to be converted to predicate using many predicate control forms, and the code efficiency of generation is higher;Many predicate forms eliminate the phenomenon that predicate defines nesting, and many predicates control to cause the Predicate evaluation overall situationization, flattening.

Description

A kind of control of many predicates and compiling optimization method towards digital signal processor
Technical field
The present invention relates to digital signal processor architecture design and optimization field, especially a kind of face The control of many predicates and compiling optimization method to digital signal processor.
Background technology
It is the basic obstacle for carrying out instruction-level exploitation that branch, which redirects, and predicated execution is the machine that a kind of effective elimination branch redirects System, it is that program is controlled to the conversion relied on to data dependence.The elimination that branch redirects can improve the feasibility of program Can, from hardware architecture, the elimination that branch redirects can reduce hardware spending caused by branch prediction failure;From compiling Aspect says that the elimination that branch redirects can expand scheduling scope, it is allowed to the overlapping parallel execution of multiple condition routing instructions, digs The instruction-level parallelism across multiple Program paths is dug.
General classical predicate form is:
P1, p2=Rm cond Rn
(p1)op1
(p2)op2
If condition calculates Rm cond Rn and sets up (wherein cond compares calculating for certain, is greater than), then p1 is 0, P2 is 1, so as to instruct op1 to cancel, op2 is normally run;, whereas if condition Rm cond Rn are invalid, then p1 is that 1, p2 is 0, so as to instruct op1 normally to run, op2 cancels.
This predicate way of realization has stronger versatility, also allows for compiling optimization and supports, but it has certain office It is sex-limited:First, it is impossible to eliminate the conditional jump sentence of predicate control;Second, it is possible to cause longer Predicate evaluation to rely on road Footpath.
The content of the invention
The primary and foremost purpose of the present invention, which is that offer is a kind of, both can efficiently utilize the abundant of digital signal processor offer Logical operation resource, many predicates towards digital signal processor that the execution efficiency of conditional branching can be improved again are controlled and compiled Translate optimization method.
To achieve the above object, present invention employs following technical scheme:A kind of many meanings towards digital signal processor Word is controlled and compiling optimization method, and many predicate control instruction forms are:(p (1), p (2) ..., p (n)) Rs=Rm op Rn, contain Justice be p (1), p (2) ..., p (n), n>When=1, n control predicates are true, instruction Rs=Rm op Rn are normally performed, otherwise Rs=Rm op Rn cancel;Wherein, Rs=Rm op Rn refer to computations or access instruction, p (1), p (2) ..., and p (n) is Virtual predicate register file;Rm, Rn, Rs are general register;
The execution step of this many predicate control formats is:
(1) predicate register file be designed as 0 of 32 physical register Pred, 1,2 ..., 31, be 1 when 0 When, it is 1 to represent physics predicate register file p0, and physics predicate register file p1 is 0, when being 0 for 0, represents physics predicate register file P0 is 0, and physics predicate register file p1 is 1;When being 1 for 1, it is 1, physics predicate register file p3 to represent physics predicate register file p2 For 0, when being 0 for 1, it is 0 to represent physics predicate register file p2, and physics predicate register file p3 is 1;The like;Wherein p0, P2 ..., p30 be referred to as true physics predicate register file, p1, p3 ..., p31 be referred to as false physics predicate register file;
(2) multiple control predicates of present instruction are read, i.e. p (1), p (2) ..., p (n) are raw according to above corresponding relation In pairs should 32 physical registers Mask and C, wherein Mask be present instruction the corresponding physical register of control predicate Sum of correspondence position binary weights, C for many predicates corresponding physical register correspondence position of present instruction weight with it is corresponding very The value product of predicate register file cumulative and;
(3) whether Pred [Mask]==C for judging many predicate control instructions is true, if true, then present instruction is being just Often perform, if vacation, then present instruction is cancelled.
The instruction format for defining many predicates is:P (1), p (2)=Rm cond Rn, wherein cond are comparison condition, including It is more than, is more than or equal to, is equal to, is not equal to, is less than, is less than or equal to, when condition is true, p (1) is false, and p (2) is true;Conversely, p (1) it is true, p (2) is false.
In compiling, its step is as follows:
Step one:Optimize rear end in compiling, handled successively according to region, calculate predicate switching cost, select suitable area Domain carries out predicate conversion;
Step 2:By control flow analysis, the local predicate of each basic block is recognized;
Step 3:Current region controlling stream is traveled through, the absolute control predicate path for belonging to each basic block is calculated, along control System stream direction is calculated successively, and the process is iteration, until finding whole control predicate paths of each basic block;
Step 4:Merge basic block, be that correspondence predicate is placed in the instruction of basic block, multiple basic blocks merge into one substantially Block, ultimately generates predicate intermediate code;Merging the basic block stage, whole conditions is calculated and places the forward of merging basic block Position, and effective many predicate control instructions are placed on the position rearward for merging basic block, ultimately form the calculating of many predicates Form.
As shown from the above technical solution, the advantage of the invention is that:First, many predicate control forms can be more flexible high Effect ground supports multiple conditions to be converted to predicate, and the code efficiency of generation is higher;Second, many predicate forms eliminate predicate define it is embedding The phenomenon of set, many predicates control to cause the Predicate evaluation overall situationization, flattening;3rd, many predicate control forms eliminate predicate it Between dependence, simplify predicate data dependence analysis so that compiling optimization it is highly efficient.
Brief description of the drawings
Fig. 1 is classical predicate compiler framework;
Fig. 2 is that many predicates control compiler framework;
Fig. 3 is the local predicate of identification;
The absolute path that Fig. 4 controls for identification predicate;
Fig. 5 is that many predicates control basic block.
Embodiment
A kind of control of many predicates and compiling optimization method towards digital signal processor, many predicate control instruction forms For:(p (1), p (2) ..., p (n)) Rs=Rm op Rn, implication is p (1), p (2) ..., p (n), n>=1, n control predicates When being true, instruction Rs=Rm op Rn are normally performed, and otherwise Rs=Rm op Rn cancel;Wherein, Rs=Rm op Rn are referred to Computations or access instruction, p (1), p (2) ..., p (n) is virtual predicate register file;Rm, Rn, Rs are general register;
The execution step of this many predicate control formats is:
(1) predicate register file be designed as 0 of 32 physical register Pred, 1,2 ..., 31, be 1 when 0 When, it is 1 to represent physics predicate register file p0, and physics predicate register file p1 is 0, when being 0 for 0, represents physics predicate register file P0 is 0, and physics predicate register file p1 is 1;When being 1 for 1, it is 1, physics predicate register file p3 to represent physics predicate register file p2 For 0, when being 0 for 1, it is 0 to represent physics predicate register file p2, and physics predicate register file p3 is 1;The like;Wherein p0, P2 ..., p30 be referred to as true physics predicate register file, p1, p3 ..., p31 be referred to as false physics predicate register file;
(2) multiple control predicates of present instruction are read, i.e. p (1), p (2) ..., p (n) are raw according to above corresponding relation In pairs should 32 physical registers Mask and C, wherein Mask be present instruction the corresponding physical register of control predicate Sum of correspondence position binary weights, C for many predicates corresponding physical register correspondence position of present instruction weight with it is corresponding very The value product of predicate register file cumulative and;
(3) whether Pred [Mask]==C for judging many predicate control instructions is true, if true, then present instruction is being just Often perform, if vacation, then present instruction is cancelled.
The instruction format for defining many predicates is:P (1), p (2)=Rm cond Rn, wherein cond are comparison condition, including It is more than, is more than or equal to, is equal to, is not equal to, is less than, is less than or equal to, when condition is true, p (1) is false, and p (2) is true;Conversely, p (1) it is true, p (2) is false.
In compiling, its step is as follows:
Step one:Optimize rear end in compiling, handled successively according to region, calculate predicate switching cost, select suitable area Domain carries out predicate conversion;
Step 2:By control flow analysis, the local predicate of each basic block is recognized;
Step 3:Current region controlling stream is traveled through, the absolute control predicate path for belonging to each basic block is calculated, along control System stream direction is calculated successively, and the process is iteration, until finding whole control predicate paths of each basic block;
Step 4:Merge basic block, be that correspondence predicate is placed in the instruction of basic block, multiple basic blocks merge into one substantially Block, ultimately generates predicate intermediate code;Merging the basic block stage, whole conditions is calculated and places the forward of merging basic block Position, and effective many predicate control instructions are placed on the position rearward for merging basic block, ultimately form the calculating of many predicates Form.
Fig. 1 is classical predicate compiler framework, first selects to be adapted to the region of predicate conversion, then carries out data-flow analysis, The predicate of each basic block is recognized, according to the dependence between basic block predicate, corresponding predicate definition statement is inserted, now The predicate definition instruction or jump instruction of predicate control can be introduced, predicate control deposit is placed in the finally instruction for each basic block Device, and basic block merging is carried out, generate predicate intermediate code.
Fig. 2 is many predicate recognition frameworks, first selects to be adapted to the region of predicate conversion, by control flow analysis, identification is every The control predicate of individual basic block a, basic block has been possible to multiple Partial controll predicates;Then the exhausted of each basic block is calculated To control predicate path, calculated successively along controlling stream direction, the process is iteration, until finding the whole of each control block Control predicate path;The absolute control predicate path for finding each basic block is to adapt to flattening predicate form.Merging The basic block stage, it is possible to whole conditions is calculated and places the forward position for merging basic block, and effective many predicate controls System instruction is placed on the position rearward for merging basic block, ultimately forms many predicate control intermediate code forms.
Fig. 3 is the example of the local predicate of an identification, and control block BB2, BB5 Partial controll predicate are respectively true physics meaning Word register p2, false physics predicate register file p5;And control block BB3, BB4 then have multiple Partial controll predicates, wherein BB3 two Partial controll predicate is that false physics predicate register file p1, false physics predicate register file p3, BB4 two Partial controll predicates are true Physics predicate register file p4, true physics predicate register file p6.
Fig. 4 is the schematic diagram that this all controls predicate path, and control block BB2 absolute control predicate path is that have one True physics predicate register file p2 compositions, and control block BB3, BB4, BB5 then have multiple absolute control paths.BB3 absolute control Predicate path, to there is two, is the first paths of false physics predicate register file p1 compositions, false physics predicate register file p3 respectively The second paths constituted with true physics predicate register file p2, it is meant that otherwise the control predicate of basic block BB3 instructions is false Physics predicate register file p1, otherwise it is that (many predicate controls refer to false physics predicate register file p3 and true physics predicate register file p2 Make).BB4 absolute control predicate path is three, is true physics predicate register file p4 and true physics predicate register file p2 respectively First paths of composition, true physics predicate register file p6 and false physics predicate register file p1 compositions the second paths, true thing Manage predicate register file p6, false physics predicate register file p3, the third path of true physics predicate register file p2 compositions.BB5's is exhausted Two are included to control predicate path, is the first of false physics predicate register file p5 and false physics predicate register file p1 compositions respectively Paths, the Article 2 that false physics predicate register file p5, vacation physics predicate register file p3 are constituted with true physics predicate register file p2 Path.
Fig. 5 is many predicate control routines, first dos command line DOS:
P1, p2=a>B | | p3, p4=c>D | | p5, p6=a>c
Defined for many predicates, define p1, p3, the true physics predicate register files of p5 tri-, p2, p4, the false physics meanings of p6 tri- Word register.
Second dos command line DOS:
(p4, p2) r8=2 | | (p6, p1) r8=2 | | (p6, p3, p2) r8=2 | | (p5, p1) re=3 | | (p5, p3, p2) Re=3
Control to go for many predicates, otherwise wherein the dos command line DOS represents that re is equal under the conditions of being really at (p2, p4) or (p6) 2, otherwise it is equal to 3 under the conditions of being really at (p1, p5) or (p2, p3, p5).
In summary, many predicate control forms that the present invention is used more flexibly and efficiently can support multiple condition conversions For predicate, the code efficiency of generation is higher;Many predicate forms eliminate the phenomenon that predicate defines nesting, and many predicates control to cause meaning Word calculates the overall situationization, flattening;Many predicate control forms eliminate the dependence between predicate, simplify predicate data dependence Analysis so that compiling optimization is highly efficient.

Claims (3)

1. a kind of control of many predicates and compiling optimization method towards digital signal processor, it is characterised in that:
Many predicate control instruction forms are:(p (1), p (2) ..., p (n)) Rs=Rm op Rn, implication is p (1), p (2) ..., p (n),n>When=1, n control predicates are true, instruction Rs=Rm op Rn are normally performed, and otherwise Rs=Rm op Rn cancel; Wherein, Rs=Rm op Rn reference computationses or access instruction, p (1), p (2) ..., p (n) is virtual predicate register file;Rm、 Rn, Rs are general register;
The execution step of this many predicate control formats is:
(1) predicate register file be designed as 0 of 32 physical register Pred, 1,2 ..., 31, when being 1 for 0, table It is 1 to show physics predicate register file p0, and physics predicate register file p1 is 0, when being 0 for 0, and it is 0 to represent physics predicate register file p0, Physics predicate register file p1 is 1;When being 1 for 1, it is 1 to represent physics predicate register file p2, and physics predicate register file p3 is 0, when 1 when being 0, it is 0 to represent physics predicate register file p2, and physics predicate register file p3 is 1;The like;Wherein p0, p2 ..., P30 is referred to as true physics predicate register file, p1, p3 ..., p31 be referred to as false physics predicate register file;
(2) multiple control predicates of present instruction, i.e. p (1), p (2) ..., p (n), according to above corresponding relation, generation pair are read Should 32 physical registers Mask and C, wherein Mask for present instruction control predicate corresponding physical register correspondence The sum of position binary weights, C is the weight and corresponding true predicate of the corresponding physical register correspondence position of many predicates of present instruction The value product of register cumulative and;
(3) whether Pred [Mask]==C for judging many predicate control instructions is true, and if true, then present instruction is normally held OK, if vacation, then present instruction is cancelled.
2. the control of many predicates and compiling optimization method according to claim 1 towards digital signal processor, its feature It is:The instruction format for defining many predicates is:P (1), p (2)=Rm cond Rn, wherein cond are comparison condition, including big In, be more than or equal to, be equal to, be not equal to, be less than, be less than or equal to, when condition is true, p (1) is false, and p (2) is true;Conversely, p (1) It is true, p (2) is false.
3. the control of many predicates and compiling optimization method according to claim 1 towards digital signal processor, its feature It is:In compiling, its step is as follows:
Step one:Optimize rear end in compiling, handled successively according to region, calculate predicate switching cost, select suitable region to enter Row predicate is changed;
Step 2:By control flow analysis, the local predicate of each basic block is recognized;
Step 3:Current region controlling stream is traveled through, the absolute control predicate path for belonging to each basic block is calculated, along controlling stream Direction is calculated successively, and the process is iteration, until finding whole control predicate paths of each basic block;
Step 4:Merge basic block, be that correspondence predicate is placed in the instruction of basic block, multiple basic blocks merge into a basic block, Ultimately generate predicate intermediate code;Merging the basic block stage, whole conditions is calculated and places the forward position for merging basic block Put, and effective many predicate control instructions are placed on the position rearward for merging basic block, ultimately form the calculating shape of many predicates Formula.
CN201710328842.3A 2017-05-11 2017-05-11 Multi-predicate control and compiling optimization method for digital signal processor Active CN107239260B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710328842.3A CN107239260B (en) 2017-05-11 2017-05-11 Multi-predicate control and compiling optimization method for digital signal processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710328842.3A CN107239260B (en) 2017-05-11 2017-05-11 Multi-predicate control and compiling optimization method for digital signal processor

Publications (2)

Publication Number Publication Date
CN107239260A true CN107239260A (en) 2017-10-10
CN107239260B CN107239260B (en) 2020-07-24

Family

ID=59985047

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710328842.3A Active CN107239260B (en) 2017-05-11 2017-05-11 Multi-predicate control and compiling optimization method for digital signal processor

Country Status (1)

Country Link
CN (1) CN107239260B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020124400A1 (en) * 2018-12-19 2020-06-25 华为技术有限公司 Multi-branch skip processing apparatus and method, and processor

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5859999A (en) * 1996-10-03 1999-01-12 Idea Corporation System for restoring predicate registers via a mask having at least a single bit corresponding to a plurality of registers
CN102830954A (en) * 2012-08-24 2012-12-19 北京中科信芯科技有限责任公司 Method and device for instruction scheduling
CN103617049A (en) * 2013-12-19 2014-03-05 中国科学院声学研究所 Code moving method based on complementary predicates
US20150089189A1 (en) * 2013-09-24 2015-03-26 Apple Inc. Predicate Vector Pack and Unpack Instructions
CN105045646A (en) * 2015-08-06 2015-11-11 中国电子科技集团公司第三十八研究所 Partial predicate implementation of clustering structure and compilation optimization method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5859999A (en) * 1996-10-03 1999-01-12 Idea Corporation System for restoring predicate registers via a mask having at least a single bit corresponding to a plurality of registers
CN102830954A (en) * 2012-08-24 2012-12-19 北京中科信芯科技有限责任公司 Method and device for instruction scheduling
US20150089189A1 (en) * 2013-09-24 2015-03-26 Apple Inc. Predicate Vector Pack and Unpack Instructions
CN103617049A (en) * 2013-12-19 2014-03-05 中国科学院声学研究所 Code moving method based on complementary predicates
CN105045646A (en) * 2015-08-06 2015-11-11 中国电子科技集团公司第三十八研究所 Partial predicate implementation of clustering structure and compilation optimization method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
樊永朝 等: "BWDSP10x上地址和数据谓词执行的编译优化", 《计算机系统应用》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020124400A1 (en) * 2018-12-19 2020-06-25 华为技术有限公司 Multi-branch skip processing apparatus and method, and processor
CN113168327A (en) * 2018-12-19 2021-07-23 华为技术有限公司 A multi-branch jump processing device and method, and a processor

Also Published As

Publication number Publication date
CN107239260B (en) 2020-07-24

Similar Documents

Publication Publication Date Title
JP2834171B2 (en) Compilation method
US8024718B2 (en) System and method for optimizing source code
US20080141229A1 (en) Processor, program conversion apparatus, program conversion method, and computer program
US20170322786A1 (en) Methods and systems to vectorize scalar computer program loops having loop-carried dependences
US5361354A (en) Optimization of alternate loop exits
US5450585A (en) Compiler with delayed conditional branching
JP2002116916A (en) Method for optimizing program and compiler using the same
JPH07114473A (en) Compiler instruction string optimization method
US5854933A (en) Method for optimizing a computer program by moving certain load and store instructions out of a loop
US6611956B1 (en) Instruction string optimization with estimation of basic block dependence relations where the first step is to remove self-dependent branching
CN103645930A (en) Method for establishing assemble level cross-file scheduling framework
CN107239260A (en) A kind of control of many predicates and compiling optimization method towards digital signal processor
US20090019431A1 (en) Optimised compilation method during conditional branching
Reissmann et al. Efficient control flow restructuring for GPUs
CN103617049B (en) code moving method based on complementary predicate
US20050149912A1 (en) Dynamic online optimizer
US20030145190A1 (en) Compiler algorithm to implement speculative stores
US20170206068A1 (en) Program optimization based on directives for intermediate code
JPH05265769A (en) Instruction scheduling processing method in compiler
CN101779192A (en) Data processing with protection against soft errors
JP3757825B2 (en) Inter-processor communication reduction method, parallel compiler apparatus and program
KR100655275B1 (en) How to Compile Conditional Branching Instructions
Chang et al. A compilation method for zero overhead loop in DSPs with VLIW
US11829754B2 (en) Compile device, compile method, and non-transitory computer readable medium for increasing a speed of a program
JP2701246B2 (en) Compiler vectorization method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant