CN101604235B

CN101604235B - A Method for Embedded Processor Branch Prediction

Info

Publication number: CN101604235B
Application number: CN2009101006080A
Authority: CN
Inventors: 郑秋华; 吴国华; 张祯; 王玉娟; 方美娥
Original assignee: Hangzhou Electronic Science and Technology University
Current assignee: NANTONG LIWANG MACHINE TOOL Co Ltd
Priority date: 2009-07-10
Filing date: 2009-07-10
Publication date: 2012-03-28
Anticipated expiration: 2029-07-10
Also published as: CN101604235A

Abstract

The invention relates to a branch prediction method for embedded processors. The method improves on the backward branch prediction mechanism, specifically its storage management strategy and pointer control, so that the mechanism need not immediately clear its existing records when it encounters complex nested loops. Instead, it can make full use of its storage space to retain different nested loop structures and accurately select the corresponding branch record for prediction. Targeting the specific application environment of embedded processors, the invention improves the branch prediction method of the backward branch prediction mechanism and combines it with a customized branch target buffer to provide a dynamic branch prediction mechanism. Based on a global indexing scheme, the mechanism redesigns the structure of the backward branch prediction mechanism and achieves accurate prediction of backward branch instructions in loop logic.

Description

A Method for Embedded Processor Branch Prediction

Technical Field

The invention belongs to the field of computer technology and relates to a backward branch prediction method for microprocessors.

Background

Branch prediction has long been an important technique for improving the performance of general-purpose processors. In essence, it overcomes control dependences between instructions and increases instruction-level parallelism, thereby improving processor performance. Academia and industry have both carried out extensive research and practice in this area. Most current general-purpose processors use deep pipelines and wide issue, and branch prediction is a key enabling technology for both. Raising the clock frequency through deeper pipelines has never been entirely satisfactory, yet without a certain pipeline depth the clock frequency cannot be high, and high performance cannot be reached. Pipeline depths of typical processors currently range from 10 to 30 stages. Yale Patt has predicted that wide issue will be the main approach for single chips integrating one billion transistors, and wide issue is also one of the important means of improving single-chip processor performance. Intel's recent Conroe processor, for example, moved from the three-issue design of the earlier Pentium to four-issue, improving processor performance by 30%. Branch prediction is used not only in high-performance general-purpose processors but also widely in embedded processors.

Summary of the Invention

The object of the invention is to provide a branch prediction method that has low hardware complexity, effectively improves prediction accuracy, and reduces the performance loss that branch instructions cause during pipeline execution.

The method improves on the backward branch prediction mechanism, specifically its storage management strategy and pointer control, so that the mechanism need not immediately clear its existing records when it encounters complex nested loops. Instead, it can make full use of its storage space to retain different nested loop structures and accurately select the corresponding branch record for prediction.

The method is implemented as follows:

Step (1). The computer fetches an instruction and compares it with the record pointed to by the stack head pointer of the mechanism's storage, then proceeds according to the result:

a. If the fetched instruction is one recorded in the improved backward branch prediction mechanism, the target address of the predicted branch instruction is output as the address for the next instruction fetch.

b. If the fetched instruction is not recorded in the improved backward branch prediction mechanism, the mechanism takes no specific action and the pipeline executes normally.
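
The fetch-stage behavior of step (1) can be sketched in a few lines of Python. This is only an illustrative model, not the patented hardware: the names `BranchRecord`, `predict_next_fetch`, `slots`, `head` and the fixed 4-byte instruction size are assumptions introduced here and do not appear in the source.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BranchRecord:
    """Hypothetical record layout: one stored backward branch."""
    branch_pc: int  # address of the recorded backward branch
    target_pc: int  # its jump target (target_pc < branch_pc)


def predict_next_fetch(
    fetch_pc: int,
    slots: List[Optional[BranchRecord]],
    head: int,
    instr_size: int = 4,  # assumed fixed instruction width
) -> Tuple[int, bool]:
    """Return (next fetch address, whether a prediction was made)."""
    record = slots[head] if 0 <= head < len(slots) else None
    if record is not None and record.branch_pc == fetch_pc:
        # Case a: the fetched instruction matches the record at the
        # stack head pointer, so fetch continues at the recorded target.
        return record.target_pc, True
    # Case b: no match, the pipeline simply fetches sequentially.
    return fetch_pc + instr_size, False
```

Only the slot selected by the head pointer is compared in this sketch, which mirrors the wording of step (1); a design that searched every slot in parallel would be a different trade-off.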

Step (2). The instruction fetched in step (1) is processed as follows:

c. If the fetched instruction is a previously executed backward jump instruction for which a branch prediction was made in step a, then: if the instruction agrees with what is recorded in the improved backward branch prediction mechanism, the prediction is correct; if it does not, the prediction is wrong, the instructions fetched on the mechanism's prediction are flushed, and the pipeline resumes fetching from the correct instruction address.

d. If the instruction is not the backward branch instruction currently being predicted from the records of the improved backward branch prediction mechanism, and decoding identifies it as a branch instruction, then: if it is a backward branch and the jump is taken, the mechanism determines whether its jump target address forms a nest with the jump record referenced by the outermost nest record pointer. If no nest is formed, execution has left the nested loop recorded by the mechanism. In both cases (nest formed or not), the corresponding storage record of the mechanism is updated in the instruction execution stage. In all other cases, nothing is done.
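
The two checks in step (2) can be illustrated as follows. The source does not define precisely when two backward branches "form a nest"; the sketch assumes that each backward branch spans the address range from its target to its own address, and that a nest exists when one range contains the other. The function names `forms_nest` and `resolve` are hypothetical.

```python
from typing import Optional, Tuple


def forms_nest(new_branch_pc: int, new_target_pc: int,
               rec_branch_pc: int, rec_target_pc: int) -> bool:
    """Assumed nest test: one loop's address range contains the other's."""
    new_inside_rec = rec_target_pc <= new_target_pc and new_branch_pc <= rec_branch_pc
    rec_inside_new = new_target_pc <= rec_target_pc and rec_branch_pc <= new_branch_pc
    return new_inside_rec or rec_inside_new


def resolve(predicted_target: int, taken: bool,
            actual_target: int, fallthrough_pc: int) -> Tuple[bool, Optional[int]]:
    """Case c: compare the prediction with the executed outcome.

    Returns (prediction_correct, redirect_pc). redirect_pc is None when
    the prediction was correct, otherwise it is the address from which
    the pipeline must resume fetching after the flush.
    """
    actual_next = actual_target if taken else fallthrough_pc
    if actual_next == predicted_target:
        return True, None
    return False, actual_next
```

For example, a new branch at 0x140 jumping to 0x0F0 encloses a recorded loop spanning 0x100 to 0x120, so the two form a nest, whereas a loop spanning 0x200 to 0x220 does not.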

Step (3). The instruction is executed according to the result of step (2), and the pointers into the storage slots are adjusted as follows:

e. When a backward branch instruction recorded in the improved backward branch prediction mechanism is executed again, is predicted taken, and is in fact taken, the prediction is correct. If no other branch instruction changes the program flow, the storage slot returns to the innermost level of the nested loop built by the mechanism: after comparing the branch's jump address with the existing records to identify the innermost level of the nested loop, the mechanism sets the stack head pointer (the read pointer used to read out the predicted address) to that innermost level and leaves the other pointers unchanged.

f. When a backward branch instruction recorded in the improved backward branch prediction mechanism is executed again and is predicted taken but is not taken because the branch condition is not met, the stack head pointer (the read pointer used to read out the predicted address) advances to the next storage slot, that is, to the next loop level, for prediction; the other pointers remain unchanged.

g. When the instruction decode stage confirms that the instruction is a taken backward branch that is not present in the record slots of the improved backward branch prediction mechanism, and address comparison shows that it forms a nested loop with the instructions stored in those slots, then, if the mechanism has not yet stored up to its last slot and wrapped around, the branch instruction is stored in the next consecutive slot as indicated by the stack tail pointer (the write pointer). The stack head pointer (the read pointer) is moved to the innermost level of the nest, the current pointer points to the newly created branch slot, and the stack tail pointer moves down by one slot.

h. When the instruction decode stage confirms that the instruction is a taken backward branch that is not present in the record slots of the improved backward branch prediction mechanism, and address comparison shows that it forms a nested loop with the instructions stored in those slots, then, if the mechanism has already stored up to its last slot and wrapped around, it overwrites the existing records starting from the first slot.

i. If the instruction is a forward branch instruction whose jump address does not exceed the PC range of the backward branches stored in the improved backward branch prediction mechanism, the mechanism compares the forward jump address with each storage slot and then controls the stack head and stack tail, adjusting whichever pointers are affected to the corresponding address.

j. If the instruction is a forward branch instruction whose jump address exceeds the PC range of the backward branches stored in the improved backward branch prediction mechanism, or is any other non-branch instruction, the mechanism remains unchanged, because such behavior does not alter the program flow.
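
The pointer adjustments of cases e, f, g and h can be modeled with a small circular buffer. This is a sketch under stated assumptions rather than the patented circuit: the class `LoopSlots`, its method names, and the rule used to find the "innermost level" (the enclosed recorded loop whose branch address is lowest, i.e. the one reached first after jumping back) are all hypothetical. The separate "current pointer" mentioned in case g and the forward-branch cases i and j are omitted, and the caller is assumed to have already verified that a new branch forms a nest.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BranchRecord:
    branch_pc: int  # address of a backward branch
    target_pc: int  # its jump target (target_pc < branch_pc)


class LoopSlots:
    """Hypothetical model of the storage slots and their two pointers."""

    def __init__(self, size: int) -> None:
        self.slots: List[Optional[BranchRecord]] = [None] * size
        self.head = 0  # read pointer: slot used for the next prediction
        self.tail = 0  # write pointer: slot that receives the next record

    def _innermost_within(self, rec: BranchRecord) -> Optional[int]:
        """Index of the recorded loop inside rec's range that is reached first."""
        best: Optional[int] = None
        for i, r in enumerate(self.slots):
            if r is not None and rec.target_pc <= r.target_pc and r.branch_pc <= rec.branch_pc:
                if best is None or r.branch_pc < self.slots[best].branch_pc:
                    best = i
        return best

    def on_predicted_taken(self) -> None:
        """Case e: correct taken prediction, read pointer returns to the innermost loop."""
        rec = self.slots[self.head]
        if rec is not None:
            idx = self._innermost_within(rec)
            if idx is not None:
                self.head = idx

    def on_predicted_not_taken(self) -> None:
        """Case f: loop exit, read pointer advances to the next slot (next loop level)."""
        self.head = (self.head + 1) % len(self.slots)

    def on_new_nested_branch(self, rec: BranchRecord) -> None:
        """Cases g and h: store a new taken backward branch at the write pointer.

        The modulo wrap makes the write pointer overwrite from the first
        slot once the last slot has been used (case h).
        """
        self.slots[self.tail] = rec
        self.tail = (self.tail + 1) % len(self.slots)
        idx = self._innermost_within(rec)
        if idx is not None:
            self.head = idx
```

In a two-level nest the inner loop's branch is taken first and lands in slot 0; when the inner loop exits, the read pointer moves to slot 1, where the outer loop's branch is then recorded, and the read pointer returns to slot 0 for the next pass through the inner loop. The existing records are never cleared, which is the property the method relies on.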

Targeting the specific application environment of embedded processors, the invention improves the branch prediction method of the backward branch prediction mechanism and combines it with a customized branch target buffer to provide a dynamic branch prediction mechanism. Based on a global indexing scheme, the mechanism redesigns the structure of the backward branch prediction mechanism and achieves accurate prediction of backward branch instructions in loop logic.

Description of the Drawings

Fig. 1 is a diagram of a large nested loop that includes subroutine calls;

Fig. 2 is a diagram of a large nested loop formed by six groups of six-way branches.

Detailed Description

The branch prediction method for embedded processors according to the invention is further described below with reference to the drawings and an example.

The specific steps of the branch prediction method for embedded processors are:

Step (1). The computer fetches an instruction and compares it with the record pointed to by the stack head pointer of the storage in the improved backward branch prediction mechanism, then proceeds according to the result:

b. If the fetched instruction is not recorded in the mechanism, the backward branch prediction mechanism takes no specific action and the pipeline executes normally.

c. If the fetched instruction is a previously executed backward jump instruction for which a branch prediction was made in step a, then: if the instruction agrees with what is recorded in the improved backward branch prediction mechanism, the prediction is correct; if it does not, the prediction is wrong, the instructions fetched on the mechanism's prediction are flushed, and the pipeline resumes fetching from the correct instruction address.

d. If the instruction is not the backward branch instruction currently being predicted from the records of the improved backward branch prediction mechanism, and decoding identifies it as a branch instruction, then: if it is a backward branch and the jump is taken, the mechanism determines whether its jump target address forms a nest with the jump record referenced by the outermost nest record pointer. If no nest is formed, execution has left the nested loop recorded by the mechanism. In both cases (nest formed or not), the corresponding storage record of the mechanism is updated in the instruction execution stage. In all other cases, nothing is done.

g. When the instruction decode stage confirms that the instruction is a taken backward branch that is not present in the record slots of the improved backward branch prediction mechanism, and address comparison shows that it forms a nested loop with the instructions stored in those slots, then, if the mechanism has not yet stored up to its last slot and wrapped around, the branch instruction must be stored in the next consecutive slot as indicated by the stack tail pointer (the write pointer). The stack head pointer (the read pointer) is moved to the innermost level of the nest, the current pointer points to the newly created branch slot, and the stack tail pointer moves down by one slot.

Simulation shows that, for the large nested loop of Fig. 1, the prediction accuracies of the original backward branch prediction mechanism and the improved mechanism are 89.4% and 90.9%, respectively. Taking Fig. 2 as an example of interference from other complex structures such as subroutine calls: within the three groups of two-level nested loops in main program S1, the instruction BL S2 calls subroutine S2, which contains further nested loop structures. In this situation the original backward branch prediction mechanism must immediately clear its existing branch records whether or not free storage slots remain, which degrades performance. The prediction accuracies of the original and the improved mechanism then become 75.7% and 90.8%, respectively, a significant difference.
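
To put the quoted accuracies in terms of mispredictions, which are what cost pipeline cycles, the figures can be converted as below. The helper names are hypothetical, and the calculation uses only the four percentages stated above; it is simple arithmetic, not an additional measurement.

```python
def misprediction_rate(accuracy_percent: float) -> float:
    """Percentage of branches predicted wrongly."""
    return 100.0 - accuracy_percent


def relative_reduction(baseline_accuracy: float, improved_accuracy: float) -> float:
    """Fraction by which mispredictions fall when moving to the improved mechanism."""
    base = misprediction_rate(baseline_accuracy)
    new = misprediction_rate(improved_accuracy)
    return 1.0 - new / base


if __name__ == "__main__":
    cases = [
        ("plain nested loop (89.4% vs 90.9%)", 89.4, 90.9),
        ("with subroutine call (75.7% vs 90.8%)", 75.7, 90.8),
    ]
    for label, base, new in cases:
        print(f"{label}: {relative_reduction(base, new):.1%} fewer mispredictions")
```

With these inputs, the misprediction rate falls from 10.6% to 9.1% in the first case (about 14% fewer mispredictions) and from 24.3% to 9.2% in the subroutine-call case (about 62% fewer).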

Claims

1. A branch prediction method for an embedded processor, characterized in that the method comprises the following steps:

Step (1). The computer fetches an instruction and compares the fetched instruction with the record pointed to by the stack head pointer of the mechanism's storage, then proceeds according to the result:

a. If the fetched instruction is one recorded in the improved backward branch prediction mechanism, the target address of the predicted branch instruction is output as the address for the next instruction fetch;

b. If the fetched instruction is not recorded in the improved backward branch prediction mechanism, the improved backward branch prediction mechanism takes no specific action and the pipeline executes normally;

Step (2). The instruction fetched in step (1) is processed as follows:

c. If the fetched instruction is a previously executed backward jump instruction for which a branch prediction was made in step a, then: if the instruction agrees with what is recorded in the improved backward branch prediction mechanism, the prediction of the improved backward branch prediction mechanism is correct; if it does not agree, the prediction is wrong, the instructions fetched on the mechanism's prediction are flushed, and the pipeline resumes fetching from the correct instruction address;

d. If the instruction is not the currently predicted backward branch instruction recorded in the improved backward branch prediction mechanism, and the executed instruction is identified by decoding as a branch instruction, then, if the branch instruction is a backward branch and the jump is taken, it is determined whether its jump target address forms a nest with the jump record corresponding to the outermost nest record pointer in the improved backward branch prediction mechanism; if no nest is formed, execution has left the nested loop recorded by the improved backward branch prediction mechanism; whether or not the jump record forms a nest, the corresponding storage record of the improved backward branch prediction mechanism is updated in the execution stage; in all other cases, no processing is performed;

Step (3). The instruction is executed according to the result of step (2), and the pointers into the storage slots are adjusted as follows:

e. When a backward branch instruction recorded in the improved backward branch prediction mechanism is executed again, is predicted taken, and is in fact taken, the prediction of the improved backward branch prediction mechanism is correct; if no other branch instruction changes the program flow, the storage slot returns to the innermost level of the nested loop built by the improved backward branch prediction mechanism; after comparing the branch's jump address with the existing records to identify the innermost level of the nested loop, the improved backward branch prediction mechanism sets the stack head pointer, which is the read pointer used to read out the predicted address, to the innermost level of the nested loop, and the other pointers remain unchanged;

f. When a backward branch instruction recorded in the improved backward branch prediction mechanism is executed again and is predicted taken but is not taken because the branch condition is not met, the stack head pointer, which is the read pointer used to read out the predicted address, advances to the next storage slot for prediction, and the other pointers remain unchanged;

g. When the instruction decode stage confirms that the instruction is a taken backward branch instruction that is not present in the record slots of the improved backward branch prediction mechanism, and address comparison shows that the instruction forms a nested loop with the instructions stored in those slots, then, if the improved backward branch prediction mechanism has not yet stored up to its last slot and wrapped around, the branch instruction is stored in the next consecutive slot as indicated by the stack tail pointer, which is the write pointer; the stack head pointer, which is the read pointer, is moved to the innermost level of the nest, the current pointer points to the newly created branch slot, and the stack tail pointer moves down by one slot;

h. When the instruction decode stage confirms that the instruction is a taken backward branch instruction that is not present in the record slots of the improved backward branch prediction mechanism, and address comparison shows that the instruction forms a nested loop with the instructions stored in those slots, then, if the improved backward branch prediction mechanism has already stored up to its last slot and wrapped around, it overwrites the existing records starting from the first slot;

i. If the instruction is a forward branch instruction whose jump address does not exceed the PC range of the backward branches stored in the improved backward branch prediction mechanism, the improved backward branch prediction mechanism compares the forward jump address with each storage slot and then controls the stack head and stack tail, adjusting whichever pointers are affected to the corresponding address;

j. If the instruction is a forward branch instruction whose jump address exceeds the PC range of the backward branches stored in the improved backward branch prediction mechanism, or is any other non-branch instruction, the improved backward branch prediction mechanism remains unchanged, because such behavior does not alter the program flow.