CN1961285B - TLB correlated branch predictor and method for use therof - Google Patents
TLB correlated branch predictor and method for use therof Download PDFInfo
- Publication number
- CN1961285B CN1961285B CN2004800432090A CN200480043209A CN1961285B CN 1961285 B CN1961285 B CN 1961285B CN 2004800432090 A CN2004800432090 A CN 2004800432090A CN 200480043209 A CN200480043209 A CN 200480043209A CN 1961285 B CN1961285 B CN 1961285B
- Authority
- CN
- China
- Prior art keywords
- branch
- shift register
- translation look
- history
- aside buffer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Links
- 238000000034 method Methods 0.000 title claims abstract description 25
- 230000002596 correlated effect Effects 0.000 title description 6
- 238000013519 translation Methods 0.000 claims abstract description 85
- 230000006378 damage Effects 0.000 claims description 18
- 230000015654 memory Effects 0.000 claims description 18
- 230000004807 localization Effects 0.000 claims description 17
- 238000012423 maintenance Methods 0.000 claims description 2
- 230000004044 response Effects 0.000 claims description 2
- 238000010586 diagram Methods 0.000 description 8
- 230000002093 peripheral effect Effects 0.000 description 6
- 230000000295 complement effect Effects 0.000 description 3
- 230000008878 coupling Effects 0.000 description 3
- 238000010168 coupling process Methods 0.000 description 3
- 238000005859 coupling reaction Methods 0.000 description 3
- 230000008569 process Effects 0.000 description 3
- 238000002266 amputation Methods 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 230000006870 function Effects 0.000 description 2
- 230000008676 import Effects 0.000 description 2
- 229920006395 saturated elastomer Polymers 0.000 description 2
- 230000003139 buffering effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 230000001276 controlling effect Effects 0.000 description 1
- 230000000875 corresponding effect Effects 0.000 description 1
- 230000007812 deficiency Effects 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 230000004069 differentiation Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000004088 simulation Methods 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 238000012549 training Methods 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3842—Speculative instruction execution
- G06F9/3844—Speculative instruction execution using dynamic branch prediction, e.g. using branch history tables
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
Abstract
Embodiments of the present invention relate to an apparatus and method t enable efficient branch prediction in super-scalar and other branching-enabled processors. In accordance with an embodiment of the present invention, a branch predictor may be include a branch prediction circuit to predict a branch outcome in an executing instruction in a processor using an input form a translation look-aside buffer.
Description
Technical field
Embodiments of the invention relate to high-performance processor, relate in particular to the instruction branch predictor that uses translation look-aside buffer input and dynamic length global branch history.
Background technology
Along with the increase that branch instruction is sent the speed and the instruction pipelining degree of depth, branch prediction accurately is to the potential performance that realizes the superscale out-of-order processors ever more important that becomes.The branch predictor of some prior art or be implemented as the branch predictor of no global history, or be implemented as the two-stage branch predictor that has global history.
In some branch predictor, global history is made up of m closest branch, and each has all write down in the m position global shift register of whether taking branch and realizes therein.Unfortunately, existing global shift register only writes down the global history of regular length.Yet nearest research is pointed out, by using the global history of different length, may make from the different instruction of distinct program and experience better forecasting accuracy.
Fig. 1 is the circuit block diagram of a kind of branch predictor known in the art.In Fig. 1, the historical shift register 110 in m position comprises the single place shift input at m on the throne place and the single place shift output at 1 place on the throne, and wherein this single place shift input receives the indication of the branch that whether takes specific instruction.For example, value " 1 " is used for indication and takes branch, and " 0 " is used for indication and does not take this branch.Historical shift register 110 is used to store regular length, and (that is, the m bit length) global branch prediction history shifts out the highest significant position value, promptly primary value, and will export the whole m position global branch prediction history value that will store.
In Fig. 1, historical shift register 110 is coupled to XOR (EXCLUSIVE-OR) door 120, and historical shift register 110 value that will be stored in the m position global branch prediction history in the historical shift register 110 exports first input of XOR gate 120 to.XOR gate 120 also is coupled to the branch address register 130 that m position branch address is exported to XOR gate 120 second inputs.If from the m position branch address of branch address register 130 inputs and the m position global history coupling of importing from historical shift register 110, then XOR gate 120 exports m position global history to pattern history table 140.Should be noted that from the m position branch address of branch address register 130 input before being output, can be shifted, expansion or amputation, with the figure place of coupling from historical shift register 110 outputs.The result is, even the length of global branch prediction history value changes to some extent, still always is complementary with position from the global branch predicted value of historical shift register 110 inputs from the figure place of the m position branch address bit string of branch address register 130 outputs.
In Fig. 1, pattern history table 140 is by 2
mIndividual clauses and subclauses are formed, and wherein each clauses and subclauses in the form all comprise " local history ".Local history information is stored in 2 saturated branch predictors usually.Be used for selecting then clauses and subclauses from pattern history table 140 from the m position global history of XOR gate 120 outputs, these clauses and subclauses are used for carrying out prediction subsequently.By this design, use firm predicted entry to store effective historical information that wherein different branch instructions is relative to each other.
In Fig. 1,2 branch predictors are safeguarded one 2 digit counter.It will be based on its oneself content output branch prediction when being cited.For example, if " 10 " are 2 contents of fallout predictor (that is, the pattern history table clauses and subclauses) of distributing to a branch, then this branch prediction " is taked ".In a certain moment after a while, this content is updated after true directions is known.For example, if this branch " by taking ", then " 10 " are updated to " 11 ", and if this branch " is not taked ", then are updated to " 01 ".Generally speaking, when 2 bit counter value more than or equal to its peaked half, promptly 2
2-1, predict that then this branch is not taked at=2 o'clock.On the contrary, if 2 bit counter value, predict then that this branch is not taked less than 2.In other words,, predict that then this branch is taked if this 2 digit counter comprises " 10 " (that is, 2) or " 11 " (that is, 3), if but 2 digit counters comprise " 00 " (that is, 0) or " 01 " (that is, 1), predict that then this branch is not taked.
Local history means that the output of branch will depend on its oneself history, and global history means that then the output of branch depends on other branch histories.In following short code example, if first branch output " taking ", then second branch also exports " taking ".Then, one independently 2 branch predictors (, taking to have the pattern history entries of global history) corresponding to the d==0 of branch will be used to keep this to have the information of this global history and 2 grades of branch's prediction scheme.
If (d==0) if // d=0
D=1; // d=1 then is set
If (d==1) if // d=1
... // then continued the conditional order of d=1
Unfortunately, because the global history register among Fig. 1 110 all only writes down the global history of regular length in all cases, so enough not good based on the accuracy of the branch prediction of regular length global history.For example, can not distinguish the previous branch instruction relevant exactly based on the branch prediction of regular length global history with current branch instruction.Similarly, be not only and use the global history of regular length can't always predict incoherent other branch instructions exactly, and relevantly exist in some cases and under other situation that they should exist, but do not exist.For example, in following example code, if memory operand is X, then Y can be owing to data locality has consecutive value.Branch predictor can be carried out aforesaid operations.Yet this relation will be destroyed along with the forfeiture of data locality.
If (d==0) if // d=0
D=X; // d=X then is set
If (d==Y) if // d=Y
// then continue the conditional order of d=Y
This kind situation shows overall being correlated with and not only depends on global history or branch address sometimes, also depends on data locality.As shown in the example above, when d in second instruction is configured to equate with X, and d is judged as when being not equal to Y in the 3rd instruction, data locality can take place lose.As a result, may not carry out the conditional order of d=Y.This can damage global history too.Therefore, a kind of branch predictor that can avoid above-mentioned deficiency of expectation.
Summary of the invention
Embodiments of the invention comprise a kind of branch prediction circuit, be used for using input from translation look-aside buffer to come branch outcome in the execution command of prediction processor, this branch prediction circuit comprises: pattern history table, and each clauses and subclauses in the described pattern history table all comprise a local history; And the historical shift register that is coupled to described pattern history table and described translation look-aside buffer, be used to store the dynamic length global branch history that is used to execute instruction; Be coupled to the storer of described historical shift register, described storer is delivered to described historical shift register with a reset values when the miss signal that receives from described translation look-aside buffer; Wherein, described historical shift register when receiving in response to the miss signal of described translation look-aside buffer from reset values that storer sends with himself zero clearing, with the destruction of the global branch history avoiding being caused by the non-data localization that the translation look-aside buffer miss data causes.
Other embodiment of the present invention comprise a kind of branch predictor, and this branch predictor comprises branch prediction circuit.This branch prediction circuit comprises: pattern history table, and each clauses and subclauses in the described pattern history table all comprise a local history; Be coupled to the historical shift register of a described pattern history table and a translation look-aside buffer, be used to store the dynamic length global branch history that is used to execute instruction; Be coupled to the storer of the historical shift register in described translation look-aside buffer and the described branch prediction circuit, the described historical shift register of zero clearing during the miss indication of described storer in receiving described translation look-aside buffer is with the destruction of the global branch history avoiding being caused by the non-data localization that the translation look-aside buffer miss data causes; And the feedback loop that is coupled to described branch prediction circuit, described feedback loop keeps the value of highest significant position in the described branch prediction circuit when the length of described global branch history equals m-1 be value 1.
Other embodiment of the present invention comprise a kind of processor, and this processor comprises a translation look-aside buffer and a branch prediction circuit.This branch prediction circuit comprises: pattern history table, and each clauses and subclauses in the described pattern history table all comprise a local history; Be coupled to the historical shift register of described pattern history table and described translation look-aside buffer, be used to store the dynamic length global branch history that is used to execute instruction; Be coupled to the storer of the historical shift register in described translation look-aside buffer and the described branch prediction circuit, the described historical shift register of zero clearing during the miss indication of described storer in receiving described translation look-aside buffer is with the destruction of the global branch history avoiding being caused by the non-data localization that the translation look-aside buffer miss data causes; And the feedback loop that is coupled to described branch prediction circuit, described feedback loop keeps the value of highest significant position in the described branch prediction circuit when the length of described global branch history equals m-1 be value 1.
Other embodiment of the present invention comprises a kind of computing system.This computing system comprises storer and processor.Described processor comprises translation look-aside buffer and branch prediction circuit.This branch prediction circuit comprises: pattern history table, and each clauses and subclauses in the described pattern history table all comprise a local history; Be coupled to the historical shift register of described pattern history table and described translation look-aside buffer, described historical shift register is receiving miss indication from described translation look-aside buffer with himself zero clearing; Be coupled to the latched memory of the historical shift register in described translation look-aside buffer and the described branch prediction circuit, the described historical shift register of zero clearing during the miss indication of described latched memory in receiving described translation look-aside buffer is with the destruction of the global branch history avoiding being caused by the non-data localization that the translation look-aside buffer miss data causes; And the feedback loop that is coupled to described branch prediction circuit, described feedback loop keeps the value of highest significant position in the described branch prediction circuit when the length of described global branch history equals m-1 be value 1.
Other embodiment of the present invention comprises a kind of method that is used for branch prediction.This method can be used the branch outcome of coming many execution commands in the prediction processor from the input of a translation look-aside buffer, the branch outcome of each bar of described many execution commands that maintenance is predicted; And receive about with the translation look-aside buffer of one of described many execution commands data that are associated in when miss indication takes place with the global branch history zero clearing, with the destruction of the global branch history avoiding being caused by the non-data localization that the translation look-aside buffer miss data causes.
Other embodiment of the present invention comprises a kind of equipment that is used for branch prediction.This equipment comprises and is used to use the device that comes the branch outcome of many execution commands in the prediction processor from the input of translation look-aside buffer.Be used to use input to come the device of the branch outcome of many execution commands in the prediction processor to comprise: the device of branch outcome that is used to predict each bar of described many execution commands from translation look-aside buffer; Be used to keep the device of branch outcome of each bar of described many execution commands predicted; And be used for receive about in when, with the translation look-aside buffer of one of described many execution commands data that are associated miss indication taking place with the global branch history zero clearing, with the device of the destruction of the global branch history avoiding being caused by the non-data localization that the translation look-aside buffer miss data causes.
Other embodiment of the present invention comprises a kind of method that is used for branch prediction.This method is used and is selected a predicted entry from the input of translation look-aside buffer; Predict based on described predicted entry and described input whether a branch can be taked; Reception about described branch whether by the actual information of taking; Whether use about described branch by the described predicted entry of the actual information updating of taking; Whether the global history value of upgrading in the historical shift register is taked by actual to indicate described branch, the described historical shift register of zero clearing during the miss indication in receiving described translation look-aside buffer wherein is with the destruction of the global branch history avoiding being caused by the non-data localization that the translation look-aside buffer miss data causes; And take out next branch instruction.
Other embodiment of the present invention comprises a kind of equipment that is used for branch prediction.This equipment comprises: be used to use the device of selecting a predicted entry from the input of translation look-aside buffer; Be used for predicting the device whether a branch can be taked based on described predicted entry and described input; Whether be used to receive about described branch by the device of the actual information of taking; Whether be used to about described branch by the device of the actual described predicted entry of taking of information updating; Whether the global history value that is used for upgrading historical shift register to indicate described branch by the actual device of taking, wherein during the miss indication in receiving described translation look-aside buffer the described historical shift register of zero clearing with the destruction of the global branch history avoiding being caused by the non-data localization that the translation look-aside buffer miss data causes; And the device that is used to take out next branch instruction.
Description of drawings
Fig. 1 is the circuit block diagram of branch predictor known in the art.
Fig. 2 is the circuit block diagram that is used for the translation look-aside buffer correlated branch predictor of processor according to an embodiment of the invention.
Fig. 3 is the process flow diagram of method according to an embodiment of the invention.
Fig. 4 is the block diagram that comprises the computer system of the one or more processors that use according to one embodiment of the invention and storer.
Embodiment
Embodiments of the invention relate to a kind of apparatus and method that are used for the relevant branch prediction of translation look-aside buffer, include but not limited to, have or do not have the relevant branch predictor and/or the relevant branch predictor of two-stage translation look-aside buffer of global history translation look-aside buffering of distance to go branch history.For example, according to one embodiment of present invention, processor can comprise relevant branch predictor, and this branch predictor has the incoming line from translation look-aside buffer to the global branch history shift register.In the indication translation look-aside buffer when miss incoming line taking place can be used for the zero clearing of global branch history shift register.Because the global branch history that is stored in the global branch history shift register can be by the training of data part, so when translation look-aside buffer is miss, the zero clearing of global branch history shift register helped avoid the destruction of the global branch history that the non-data localization that caused by the translation look-aside buffer miss data caused.
Fig. 2 is the circuit block diagram that is used for the translation look-aside buffer correlated branch predictor of processor according to an embodiment of the invention.In Fig. 2, processor 200 can comprise the historical shift register 210 in m position, this history shift register 210 can comprise first single place shift input (being similar to the single place shift input among Fig. 1), the input of second single place shift and single place shift output (being similar to the single place shift output among Fig. 1), and wherein the input of first single place shift receives the indication whether branch about specific instruction is taked.Historical shift register 210 can be used for storing the dynamic length global branch history that is used to execute instruction.Generally speaking, can use value for the highest significant position of " 1 " identifies effective history length, for example, if the highest effectively " 1 " on the 5th of m bit shift register, then global history can be confirmed as long m-5 position.As a result, whether " 1 " of highest significant position does not indicate branch to occur.According to one embodiment of present invention, " 1 " value can be used as the permission signal that indication branch is taked, and " 0 " can be used as the non-permission signal that indication branch is not taked.Historical shift register 210 can be used for storing the dynamic length global branch prediction history that maximum length is the m-1 position, and the value of output highest significant position, the i.e. value of m-1 position.Therefore, string " 0000...01 " can indicating length be zero global history, this can indicate recently from historical shift register 210 flush global history.Similarly, according to one embodiment of present invention, think that string " 0000...00 " is meaningless, because it can indicate non-existent global history length, and think that string " 1X...Y " (wherein X and Y can be equal to " 0 " or " 1 ") comprises the longest possibility global history length, the i.e. length of m-1 position that register can comprise.
In Fig. 2, historical shift register 210 can be coupled to XOR gate 220, and historical shift register 210 can export first input of XOR gate 220 to being stored in m position global branch prediction history value in the historical shift register 210.XOR gate 220 also can be coupled to the branch address register 230 that m position branch address is exported to XOR gate 220 second inputs.If from the m position branch address of branch address register 230 inputs and the m position global history coupling of importing from historical shift register 210, then XOR gate 220 can export m position global history to pattern history table 240.Should be noted that m position branch address from branch address register 230 can be shifted before being output, expansion or amputation, be complementary with output figure place with historical shift register 210.As a result, even the length of global branch prediction history value changes to some extent, always be complementary with position from the global branch predicted value of historical shift register 210 inputs from the figure place of the m position branch address bit string of branch address register 230 output is still general.
In Fig. 2, pattern history table 240 can be made up of 2m clauses and subclauses, and wherein each clauses and subclauses in the table all can comprise one " local history ".Local history information is stored in 2 saturated branch predictors usually.Can be used for selecting then clauses and subclauses from the m position global history of XOR gate 220 outputs from pattern history table 240, these clauses and subclauses can be used for carrying out prediction.By this design, can use firm predicted entry to be stored in effective historical information that wherein different branch instructions is relative to each other.
Generally speaking, in Fig. 2, historical shift register 210 can be shifted as shown in Figure 1, but 2 exceptions are arranged, that is, when global branch history will be by flush and when the global history string value equals " 1XYZ...; " the time, wherein X, Y and Z can be equal to " 0 " or " 1 ".At first, in Fig. 2, if historical shift register 210 will be by flush, the global branch history string in the then historical shift register 210 can be cleared and be set to equal " 0000...01 ".Secondly, when historical shift register 210 comprises the long global branch history in m-1 position, mean that promptly " 1 " can be stored in the highest significant position of historical shift register 210 (that is position 1), " 1 " value of storing on the throne 1 can be held, and the place value in the position 2 can be shifted out.
In Fig. 2, feedback circuit 270 can be coupled to 1 position, position and 2 positions, position in the historical shift register 210.Feedback circuit 270 can comprise with historical shift register 210 be coupled with the highest significant position that receives output and with or (OR) door 290 that be coupled with (AND) door 280, or door 290 can be coupled with 1 position, position and 2 positions, position in the historical shift register 210.Feedback circuit 270 can be used for keeping 1 value of the highest significant position in the position, m-1 position in the historical shift register 210.More specifically, can be coupled to the output of historical shift register 210 with first input 281 of door 280.Can receive " 1 " value with second input of door 280, this value can be carried out and computing with the output valve of historical shift register 210, thus obtain via output 287 from export to door 280 or door 290 first import 291 with value.Or second input 293 of door 290 can be coupled to 2 positions, position in the historical shift register 210 and receive value from this position.Or door 290 output 297 can be coupled in the historical shift register 210 1 position, position and will or value export this position to.Because have one group " 1 " input, so only have two kinds of possible input combinations, i.e. (0,1) and (1,1) with second input 283 of door 280.In any case, all have only two kinds with the possible output valve of door 280.That is, if the output valve of position, m-1 position also is " 1 " in the historical shift register 210, then can from door 280 outputs " 1 ", and if the output valve of position, m-1 position is " 0 " in the historical shift register 210, then can from door 280 outputs " 0 ".Similarly, though or door 290 also only has two identical possibility output valves (promptly, " 0 " or " 1 "), but because or first input, 291 and second input of door 290 293 all be not limited to single value, may import combination so the gained result can come from four, promptly (0,0), (0,1), (1,0) and (1,1).As seen in can the logical OR table from table 1, " 1 " can be used as four results' outputs of three in may the input values combination.Therefore, because always export " 1 " with position 1 value of door 280 in historical shift register 210 when " 1 ", so just can see, feedback circuit 270 is worth " 1 " in holding position 1 position, up to historical shift register 210 by the miss zero clearing of TLB.
Table 1
Embodiments of the invention can be realized in out-of-order processors, in this processor, get finger/decoding unit and can take out such as instructions such as macro instructions from memory locations such as for example instruction cache, and these instructions are decoded.For complex instruction set computer (CISC) (CISC) framework, get finger/decoding unit and complicated order can be decoded into one or more micro-instructions/operations.Generally speaking, these micro-orders have defined loading-store type architecture, can be to realizing such as other frameworks such as Reduced Instruction Set Computer (RISC) or very long instruction word (VLIW) framework so that relate to the micro-order of storage operation.
In typical R ISC framework, do not decode the instruction into micro-order.Because the present invention can realize RISC framework and CISC framework, so unless otherwise prescribed, otherwise just need not between instruction and micro-instructions/operations, to make differentiation, and be simply referred to as instruction.
Fig. 3 is the process flow diagram of method according to an embodiment of the invention.In Fig. 3, can use from the input of TLB from for example selecting a predicted entry (310) the pattern history table 240, and predict dynamically based on selected predicted entry and TLB input whether a branch is taked (320).Whether this method can receive about this branch by the actual information of taking (330), and whether is taked to upgrade predicted entry by actual based on this branch, for example upgrades in pattern history table 240 (340).Whether indicated branch whether by actual global history value of taking and pattern history table 240 in the renewable for example historical shift register 210 based on branch by actual taking; And can take out next branch instruction (360).Generally speaking, this method only just stops when processor cuts out or do not have extra instruction process to carry out.
In an alternative embodiment that does not clearly illustrate of the present invention, if extra branch instruction is not available immediately, then the method among Fig. 3 can stop and wait for more branch instruction.
Though the method among Fig. 3 may contain the particular order of carrying out this method, embodiments of the invention should not be limited on this order.In fact, can conceive wherein the embodiments of the invention that can carry out some or all elements of this method with any order, include but not limited in unordered (" OOO ") processor for example executed in parallel whole or in part.Similarly, though, the method among Fig. 3 is simplified to only reflects a branch at every turn, also can conceive wherein and can handle the embodiments of the invention of a plurality of branches simultaneously, and these embodiment are subjected to the dependent restriction of any available data natch for ease of illustrating.
Below the false code of Jian Huaing partly shows the operation of the realization of TLB correlated global history branch predictor according to an embodiment of the invention.
check_and_initialize_predictor(argc,argv,&inTrace,&aPredictor);
while(!inTrace->EndOfTrace()){
aPredictor->SelectPredictionEntry(inTrace->GetAddress(),inTrace->TLBMissOrNot());
// be TLB information here
bool?pr-taken=aPredictor->prediction(inTrace->ForwardBranchOrNot());
// permission static prediction
aPredictor->UpdatePredictor(inTrace->TakenOrNot(),pr_taken);
// after knowing the real goal of branch, upgrade pattern history table and the global register that is shifted
InTrace->read_trace (); // read next branch instruction in this simulation
}
aPredictor->Show?AccuracyQ;
For example, in above false code, can see that fallout predictor upgrades these predictions in instruction term of execution operation with the result that predicts each branch in this instruction and after learning realistic objective.Though above-mentioned pseudo-code example may hint the serial execution, it has just illustrated general conception, and can conceive the optional embodiment that the wherein parallel and/or unordered execution of each branch can take place according to any data dependency constrained each other natch.
Fig. 4 is the block diagram according to the computer system that comprises one or more processors and storer of one embodiment of the invention use.In Fig. 4, computer system 400 can comprise the one or more processors 410 (l) that are coupled to memory bus 420 to 410 (n), and memory bus 420 can be coupled to system logic 430.One or more processors 410 (l) each to 410 (n) can be the N bit processor, and can comprise demoder (not shown) and one or more N bit register (not shown).System logic 430 can be coupled to system storage 440 by bus 450, and can be coupled to nonvolatile memory 470 and one or more peripherals 480 (l) to 480 (m) by peripheral bus 460.Peripheral bus 460 can represent for example to meet one or more peripheral component interconnect (pci) bus of PCI Local BusSpecification (PCI local bus specification) revised edition 2.2 of the PCI Special Interest Group (SIG) that announced on Dec 18th, 1998; The ISA(Industry Standard Architecture) bus; The EISA Specification (EISA standard) 3.12 editions that meets the BCPR Services Inc that announced in 1992,1992 expansion ISA (EISA) bus; The USB (universal serial bus) (USB) that meets the USB Specification (USB standard) that announced on September 23rd, 1,998 1.1 editions; And similar peripheral bus.Nonvolatile memory 470 can be such as sram devices such as ROM (read-only memory) (ROM) or flash memories.Peripherals 480 (l) can comprise for example keyboard to 480 (m); Mouse or other pointing devices; Such as mass-memory units such as hard disk drive, compact-disc (CD) driver, CD and digital video disc (DVD) drivers; Display or the like.
Though disclose the present invention in detail, should be appreciated that, also can make various changes, replacement and change at this.In addition, though described the software and hardware of controlling some function, these functions can use the combination of software, hardware or software and hardware to realize as known in the art like that.Equally, in claims, term " instruction " can comprise instruction in the RISC framework or the instruction in the CISC framework, and the instruction of using in other computer architectures.Those of ordinary skills can easily determine and make other examples under the prerequisite that does not deviate from the spirit and scope of the present invention that defined by appended claims.
Claims (30)
1. branch predictor comprises:
Branch prediction circuit, its uses input from translation look-aside buffer to come branch outcome in the execution command in the prediction processor, comprising:
Pattern history table, each clauses and subclauses in the described pattern history table all comprise a local history; And
Be coupled to the historical shift register of described pattern history table and described translation look-aside buffer, be used to store the dynamic length global branch history that is used to execute instruction;
Be coupled to the storer of described historical shift register, described storer is delivered to described historical shift register with a reset values when the miss signal that receives from described translation look-aside buffer;
Wherein, described historical shift register when receiving in response to the miss signal of described translation look-aside buffer from reset values that storer sends with himself zero clearing, with the destruction of the global branch history avoiding being caused by the non-data localization that the translation look-aside buffer miss data causes.
2. branch predictor as claimed in claim 1 is characterized in that, described storer is:
Three-state buffer.
3. branch predictor as claimed in claim 1 is characterized in that, described branch prediction circuit also comprises:
Be coupled to the feedback loop of described historical shift register, described feedback loop keeps the value of highest significant position in the described historical shift register.
4. branch predictor as claimed in claim 3 is characterized in that described feedback loop remains 1 with the value of described highest significant position.
5. branch predictor as claimed in claim 3 is characterized in that, the highest effective 1 value place bit position has determined to be stored in the length of the global branch history in the described historical shift register in described historical shift register.
6. branch predictor as claimed in claim 5 is characterized in that, the length that is stored in the described global branch history in the described historical shift register is by the highest described effective 1 value place bit position definition.
7. branch predictor as claimed in claim 3 is characterized in that, described feedback loop comprises:
Be coupled to described historical shift register with door, describedly receive the output place value of described historical shift register and allow signal with door; And
Be coupled to described and door and described historical shift register or door, described or door receive from described with first input value and from second input value of described historical shift register, and export a new place value to described historical shift register.
8. branch predictor as claimed in claim 1 is characterized in that described historical shift register contains the global branch history of distance to go.
9. branch predictor as claimed in claim 1 is characterized in that, described historical shift register comprises the m position, and exports a m bit pattern history value to described pattern history table via XOR gate.
10. branch predictor as claimed in claim 9 is characterized in that, described XOR gate receives a described m bit pattern history value and a m position branch address value, and exports a m bit pattern history value to described pattern history table.
11. a branch predictor comprises:
Branch prediction circuit comprises:
Pattern history table, each clauses and subclauses in the described pattern history table all comprise a local history;
Be coupled to the historical shift register of a described pattern history table and a translation look-aside buffer, be used to store the dynamic length global branch history that is used to execute instruction;
Be coupled to the storer of the historical shift register in described translation look-aside buffer and the described branch prediction circuit, the described historical shift register of zero clearing during the miss indication of described storer in receiving described translation look-aside buffer is with the destruction of the global branch history avoiding being caused by the non-data localization that the translation look-aside buffer miss data causes; And
Be coupled to the feedback loop of described branch prediction circuit, described feedback loop keeps the value of highest significant position in the described branch prediction circuit when the length of described global branch history equals m-1 be value 1.
12. branch predictor as claimed in claim 11 is characterized in that, described branch prediction circuit further comprises:
The branch address storer, storage is used for the address of each indicated branch of described historical shift register.
13. branch predictor as claimed in claim 11 is characterized in that, described storer comprises:
Three-state buffer.
14. branch predictor as claimed in claim 11 is characterized in that, described feedback loop comprises:
Be coupled to described historical shift register with door, describedly receive the output place value of described historical shift register and allow signal with door; And
Be coupled to described and door and described historical shift register or door, described or door receive from described with first input value and from second input value of described historical shift register, and export a new place value to described historical shift register.
15. a processor comprises:
Translation look-aside buffer;
Branch prediction circuit comprises:
Pattern history table, each clauses and subclauses in the described pattern history table all comprise a local history;
Be coupled to the historical shift register of described pattern history table and described translation look-aside buffer, be used to store the dynamic length global branch history that is used to execute instruction;
Be coupled to the storer of the historical shift register in described translation look-aside buffer and the described branch prediction circuit, the described historical shift register of zero clearing during the miss indication of described storer in receiving described translation look-aside buffer is with the destruction of the global branch history avoiding being caused by the non-data localization that the translation look-aside buffer miss data causes; And
Be coupled to the feedback loop of described branch prediction circuit, described feedback loop keeps the value of highest significant position in the described branch prediction circuit when the length of described global branch history equals m-1 be value 1.
16. processor as claimed in claim 15 is characterized in that, described branch prediction circuit further comprises:
The branch address storer, storage is used for the address of each indicated branch of described historical shift register.
17. processor as claimed in claim 15 is characterized in that, described storer comprises:
Three-state buffer.
18. processor as claimed in claim 15 is characterized in that, described feedback loop comprises:
Be coupled to described historical shift register with door, describedly receive the output place value of described historical shift register and allow signal with door; And
Be coupled to described and door and described historical shift register or door, described or door receive from described with first input value and from second input value of described historical shift register, and export a new place value to described historical shift register.
19. a computing system comprises:
Storer;
Be coupled to the processor of described storer, described processor comprises
Translation look-aside buffer;
Branch prediction circuit comprises:
Pattern history table, each clauses and subclauses in the described pattern history table all comprise a local history;
Be coupled to the historical shift register of described pattern history table and described translation look-aside buffer, described historical shift register is receiving miss indication from described translation look-aside buffer with himself zero clearing;
Be coupled to the latched memory of the historical shift register in described translation look-aside buffer and the described branch prediction circuit, the described historical shift register of zero clearing during the miss indication of described latched memory in receiving described translation look-aside buffer is with the destruction of the global branch history avoiding being caused by the non-data localization that the translation look-aside buffer miss data causes; And
Be coupled to the feedback loop of described branch prediction circuit, described feedback loop keeps the value of highest significant position in the described branch prediction circuit when the length of described global branch history equals m-1 be value 1.
20. computing system as claimed in claim 19 is characterized in that, described branch prediction circuit further comprises:
The branch address storer, storage is used for the address of each indicated branch of described historical shift register.
21. a method that is used for branch prediction comprises:
Use comes the branch outcome of many execution commands in the prediction processor from the input of a translation look-aside buffer, wherein uses the input from translation look-aside buffer to come the branch outcome of many execution commands in the prediction processor to comprise:
Predict the branch outcome of each bar of described many execution commands;
The branch outcome of each bar of described many execution commands that maintenance is predicted; And
Receive about with the translation look-aside buffer of one of described many execution commands data that are associated in when miss indication takes place with the global branch history zero clearing, with the destruction of the global branch history avoiding being caused by the non-data localization that the translation look-aside buffer miss data causes.
22. method as claimed in claim 21 is characterized in that, described global branch history zero clearing is comprised receiving when miss indication takes place in translation look-aside buffer:
Replace described global branch history with a predetermined clear value.
23. an equipment that is used for branch prediction comprises:
Be used to use the device that comes the branch outcome of many execution commands in the prediction processor from the input of translation look-aside buffer, wherein be used to use input to come the device of the branch outcome of many execution commands in the prediction processor to comprise from translation look-aside buffer:
Be used to predict the device of branch outcome of each bar of described many execution commands;
Be used to keep the device of branch outcome of each bar of described many execution commands predicted; And
Be used for receive about in when, with the translation look-aside buffer of one of described many execution commands data that are associated miss indication taking place with the global branch history zero clearing, with the device of the destruction of the global branch history avoiding being caused by the non-data localization that the translation look-aside buffer miss data causes.
24. equipment as claimed in claim 23 is characterized in that, is used for receiving about when miss indication takes place for described translation look-aside buffer the device of described global branch history zero clearing being comprised:
Be used for replacing the device of described global branch history with a predetermined clear value.
25. a method that is used for branch prediction comprises:
Use is selected a predicted entry from the input of translation look-aside buffer;
Predict based on described predicted entry and described input whether a branch can be taked;
Reception about described branch whether by the actual information of taking;
Whether use about described branch by the described predicted entry of the actual information updating of taking;
Whether the global history value of upgrading in the historical shift register is taked by actual to indicate described branch, the described historical shift register of zero clearing during the miss indication in receiving described translation look-aside buffer wherein is with the destruction of the global branch history avoiding being caused by the non-data localization that the translation look-aside buffer miss data causes; And
Take out next branch instruction.
26. method as claimed in claim 25 is characterized in that, uses and selects predicted entry to comprise from the input of translation look-aside buffer:
Use is selected a predicted entry from the input of described translation look-aside buffer from pattern history table.
27. method as claimed in claim 25 is characterized in that, upgrades described predicted entry and comprises:
Upgrade the described predicted entry in the pattern history table.
28. an equipment that is used for branch prediction comprises:
Be used to use the device of selecting a predicted entry from the input of translation look-aside buffer;
Be used for predicting the device whether a branch can be taked based on described predicted entry and described input;
Whether be used to receive about described branch by the device of the actual information of taking;
Whether be used to about described branch by the device of the actual described predicted entry of taking of information updating;
Whether the global history value that is used for upgrading historical shift register to indicate described branch by the actual device of taking, wherein during the miss indication in receiving described translation look-aside buffer the described historical shift register of zero clearing with the destruction of the global branch history avoiding being caused by the non-data localization that the translation look-aside buffer miss data causes; And
Be used to take out the device of next branch instruction.
29. equipment as claimed in claim 28 is characterized in that, is used to use from the input of translation look-aside buffer select the device of predicted entry to comprise:
Be used for using the device of selecting a predicted entry from the input of described translation look-aside buffer from pattern history table;
Be used to upgrade the global history value to indicate described branch whether by the actual device of taking; And
Be used to take out the device of next bar branch instruction.
30. equipment as claimed in claim 28 is characterized in that, the device that is used to upgrade described predicted entry comprises:
Be used for upgrading the device of the predicted entry of described pattern history table.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2004/000583 WO2005119428A1 (en) | 2004-06-02 | 2004-06-02 | Tlb correlated branch predictor and method for use therof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1961285A CN1961285A (en) | 2007-05-09 |
CN1961285B true CN1961285B (en) | 2011-05-25 |
Family
ID=35463053
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2004800432090A Expired - Fee Related CN1961285B (en) | 2004-06-02 | 2004-06-02 | TLB correlated branch predictor and method for use therof |
Country Status (4)
Country | Link |
---|---|
JP (1) | JP4533432B2 (en) |
CN (1) | CN1961285B (en) |
DE (1) | DE112004002877T5 (en) |
WO (1) | WO2005119428A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7984279B2 (en) * | 2006-11-03 | 2011-07-19 | Qualcomm Incorporated | System and method for using a working global history register |
US10534613B2 (en) * | 2017-04-28 | 2020-01-14 | Intel Corporation | Supporting learned branch predictors |
GB2577708B (en) * | 2018-10-03 | 2022-09-07 | Advanced Risc Mach Ltd | An apparatus and method for monitoring events in a data processing system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5938761A (en) * | 1997-11-24 | 1999-08-17 | Sun Microsystems | Method and apparatus for branch target prediction |
US6427206B1 (en) * | 1999-05-03 | 2002-07-30 | Intel Corporation | Optimized branch predictions for strongly predicted compiler branches |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04337833A (en) * | 1991-05-15 | 1992-11-25 | Koufu Nippon Denki Kk | Data processor |
IE940855A1 (en) * | 1993-12-20 | 1995-06-28 | Motorola Inc | Data processor with speculative instruction fetching and¹method of operation |
US6079003A (en) * | 1997-11-20 | 2000-06-20 | Advanced Micro Devices, Inc. | Reverse TLB for providing branch target address in a microprocessor having a physically-tagged cache |
US6546481B1 (en) * | 1999-11-05 | 2003-04-08 | Ip - First Llc | Split history tables for branch prediction |
US6681345B1 (en) * | 2000-08-15 | 2004-01-20 | International Business Machines Corporation | Field protection against thread loss in a multithreaded computer processor |
-
2004
- 2004-06-02 CN CN2004800432090A patent/CN1961285B/en not_active Expired - Fee Related
- 2004-06-02 JP JP2007513656A patent/JP4533432B2/en not_active Expired - Fee Related
- 2004-06-02 WO PCT/CN2004/000583 patent/WO2005119428A1/en active Application Filing
- 2004-06-02 DE DE112004002877T patent/DE112004002877T5/en not_active Withdrawn
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5938761A (en) * | 1997-11-24 | 1999-08-17 | Sun Microsystems | Method and apparatus for branch target prediction |
US6427206B1 (en) * | 1999-05-03 | 2002-07-30 | Intel Corporation | Optimized branch predictions for strongly predicted compiler branches |
Non-Patent Citations (3)
Title |
---|
Kenneth L.Short.Embedded Microprocessor Systems Design:An Introduction Using the Intel 80C188EB 1.Prentice Hall,1998,185-188. |
Kenneth L.Short.Embedded Microprocessor Systems Design:An Introduction Using the Intel 80C188EB 1.Prentice Hall,1998,185-188. * |
Tse-Yu Yeh,Yale N.Patt.Alternative Implementations of Two-Level Adaptive BranchPrediction.The 19th Annual International Symposium on Computer Architecture.1992,124-134. * |
Also Published As
Publication number | Publication date |
---|---|
WO2005119428A1 (en) | 2005-12-15 |
DE112004002877T5 (en) | 2007-05-03 |
CN1961285A (en) | 2007-05-09 |
JP4533432B2 (en) | 2010-09-01 |
JP2008501166A (en) | 2008-01-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI507980B (en) | Optimizing register initialization operations | |
JP6849274B2 (en) | Instructions and logic to perform a single fused cycle increment-comparison-jump | |
CN102792265B (en) | Instruction based on machine state cracks | |
CN104756090B (en) | The caching for providing extension replaces status information | |
US6502188B1 (en) | Dynamic classification of conditional branches in global history branch prediction | |
CN104050012A (en) | Instruction Emulation Processors, Methods, And Systems | |
CN101449238A (en) | Local and global branch prediction information storage | |
TW201704991A (en) | Backtracking compatibility by algorithm matching, deactivating features, or limiting performance | |
US20080072024A1 (en) | Predicting instruction branches with bimodal, little global, big global, and loop (BgGL) branch predictors | |
CN104834503A (en) | Processor with granular add immediates capability & methods | |
US10664280B2 (en) | Fetch ahead branch target buffer | |
CN107111550A (en) | Conversion is omitted by selective page and prefetches conversion omission time delay in concealing program Memory Controller | |
WO2017172256A1 (en) | Processors, methods, and systems to allocate load and store buffers based on instruction type | |
RU2639695C2 (en) | Processors, methods and systems for gaining access to register set either as to number of small registers, or as to integrated big register | |
CN105453030A (en) | Mode dependent partial width load to wider register processors, methods, and systems | |
JP5941488B2 (en) | Convert conditional short forward branch to computationally equivalent predicate instruction | |
JPH10228377A (en) | Information processor for predicting branch | |
CN105247479B (en) | Instruction order implement instruction to, processor, method and system | |
US5784589A (en) | Distributed free register tracking for register renaming using an availability tracking register associated with each stage of an execution pipeline | |
KR20010033300A (en) | Branch prediction with return selection bits to categorize type of branch prediction | |
US6516410B1 (en) | Method and apparatus for manipulation of MMX registers for use during computer boot-up procedures | |
JP6253706B2 (en) | Hardware device | |
CN1961285B (en) | TLB correlated branch predictor and method for use therof | |
US10069512B2 (en) | Systems, methods, and apparatuses for decompression using hardware and software | |
US20060015706A1 (en) | TLB correlated branch predictor and method for use thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20110525 Termination date: 20150602 |
|
EXPY | Termination of patent right or utility model |