JPH06161748A

JPH06161748A - Subroutine return prediction mechanism

Info

Publication number: JPH06161748A
Application number: JP2403242A
Authority: JP
Inventors: Simon C. Steely Jr.; David J. Sager
Original assignee: Digital Equipment Corp
Current assignee: Digital Equipment Corp
Priority date: 1989-12-18
Filing date: 1990-12-18
Publication date: 1994-06-10
Also published as: US5179673A; EP0433709B1; DE69033339T2; KR940009378B1; EP0433709A2; KR910012914A; EP0433709A3; CA2032320A1; DE69033339D1

Abstract

(57) [Abstract]
[Object] To provide a mechanism that supplies return addresses and eliminates bubbles in the pipeline of a pipelined computer without requiring the complicated procedures needed when a stack is used.
[Constitution] A mechanism for generating a predicted subroutine return address in response to the entry of a subroutine return instruction into a computer pipeline has a ring pointer counter and a ring buffer coupled to it. The ring pointer held in the ring pointer counter is changed whenever either a subroutine call instruction or a subroutine return instruction enters the pipeline. When a subroutine call instruction enters the pipeline, the value presented at the ring buffer input is stored in the buffer location designated by the ring pointer. When a subroutine return instruction enters the pipeline, the ring buffer supplies the value from the buffer location designated by the ring pointer, and this value is the predicted subroutine return address.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to changing the flow of control in pipelined computers and, more particularly, to the processing of subroutine-related instructions in a highly pipelined computer.

[0002]

[Prior Art] The concept of pipelining instructions in a computer is well known. The processing of a single instruction is performed in a number of different stages, such as fetch, decode and execute. In a pipelined computer, each stage works on a different instruction at the same time. For example, if the pipeline is only three stages long, the first instruction to have entered the pipeline is being operated on by the third stage while the second instruction is being operated on by the second stage and the third instruction is being operated on by the first stage. Pipelining is a much more efficient way to process instructions than waiting for one instruction to be processed completely before beginning the next. In the normal flow of a computer program, it is easy to know which instruction enters the pipeline next. In most cases it is the next instruction in sequence, so that, for example, instruction 101 enters the pipeline after instruction 100.
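
The overlap of stages described above can be sketched as a small simulation. This is only an illustrative sketch: the stage names, the function name `pipeline_trace`, and the assumption that one instruction enters per cycle are choices made for the example and are not taken from the patent.

```python
# Illustrative sketch: a three-stage pipeline with one instruction entering per cycle.
STAGES = ["fetch", "decode", "execute"]  # assumed stage names


def pipeline_trace(instructions):
    """Return, for each cycle, a dict mapping stage name -> instruction (or None)."""
    trace = []
    for cycle in range(len(instructions) + len(STAGES) - 1):
        occupancy = {}
        for depth, stage in enumerate(STAGES):
            idx = cycle - depth  # instruction idx entered the first stage at cycle idx
            occupancy[stage] = instructions[idx] if 0 <= idx < len(instructions) else None
        trace.append(occupancy)
    return trace


trace = pipeline_trace([100, 101, 102])
for cycle, occupancy in enumerate(trace):
    print(cycle, occupancy)
```

In the third cycle of this sketch all three stages are busy with different instructions, which is the situation the paragraph describes.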

[0003] One exception to this normal flow of control is the subroutine. A subroutine is a program or sequence of instructions that can be called to perform the same task at different points in a program, or even in different programs. For example, instruction 100 may call a subroutine that begins executing at instruction 200. The subroutine executes instructions 200-202 and then returns to the main flow at instruction 102. Furthermore, the same subroutine, consisting of instructions 200-202, may be called from many different places in the main flow and may return to different places in the main flow.

[0004] Subroutines pose a problem for heavily pipelined computers (those with many stages in the pipeline). The instruction that calls a subroutine contains enough information to determine which instruction enters the pipeline next (namely, the first instruction of the called subroutine), but the return instruction in the subroutine contains no such information. Instead, a return instruction must pass through all of the stages of the pipeline before the return address is known from it. If the computer waited for the return instruction to pass through the pipeline before admitting another instruction, a bubble containing no instructions would form behind the return instruction, degrading the performance of the computer.

[0005] To avoid such bubbles, a mechanism known as a stack has been used. Basically, a stack stores the return address when a subroutine is called; when the subroutine completes and control is returned to the main flow by a return instruction, this return address is looked up in the stack and supplied to the pipeline. The pipeline can then return control to the main flow by admitting the proper instruction. Keeping a stack of return addresses, and using them to locate the next instruction on return from a subroutine, eliminates the bubbles in the pipeline.
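
The stack behavior described above can be illustrated with a minimal sketch. The class name `ReturnStack`, its method names, and the example addresses are assumptions made for illustration; a Python list is unbounded, so the sketch deliberately ignores the overrun and underrun handling that a fixed-size hardware stack would need.

```python
# Illustrative sketch of a return-address stack (last in, first out).
class ReturnStack:
    def __init__(self):
        self._addresses = []

    def call(self, return_address):
        """On a subroutine call, remember where the caller resumes."""
        self._addresses.append(return_address)

    def ret(self):
        """On a return, supply the most recently stored address."""
        return self._addresses.pop()


stack = ReturnStack()
stack.call(102)      # outer call: resume at 102 (example address)
stack.call(201)      # hypothetical nested call made inside the subroutine
inner = stack.ret()  # innermost return is served first
outer = stack.ret()
print(inner, outer)
```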

[0006]

[Problems to Be Solved by the Invention] The problem with the stack mechanism is that the size of the stack is limited, and the procedures for handling stack overruns and underruns when many subroutines have been called are complicated. In other words, if the stack has twelve locations, only twelve subroutine calls can be outstanding at one time without resorting to the complicated procedures for stack overrun.

[0007] There is therefore a need for a mechanism that supplies return addresses and eliminates bubbles in the pipeline, but does not require the complicated procedures needed when a stack is used.

[0008]

[Means for Solving the Problems] This and other needs are met by the present invention, which provides a ring buffer that simulates the characteristics of a stack mechanism in order to predict subroutine return addresses. Each time a subroutine is called (that is, each time a call instruction enters the pipeline), the ring buffer stores a predicted return address in one of its locations. When a subroutine return instruction enters the pipeline, the predicted return address is supplied from the ring buffer to the pipeline, so that the proper instruction from the main flow can enter the pipeline. In this way, bubbles in the pipeline are eliminated.

[0009] The ring buffer used in the present invention is of limited size and has, for example, eight locations for storing eight different return addresses. If more than eight subroutines are called without any returns, then, in ring buffer fashion, the earliest stored return addresses are overwritten by the more recent ones. Eventually, when the subroutine associated with an overwritten return address completes its passage through the pipeline and the flow of control changes back to the main flow, the predicted return address in the ring buffer will be incorrect. The actual return address of a return instruction becomes known at the end of the pipeline, once the return instruction has passed all the way through it. At that time the actual return address is compared with the predicted return address from the ring buffer. If the comparison shows the predicted return address to be wrong, the pipeline is flushed and restarted at the instruction at the actual return address.

[0010] For well-behaved programs, return addresses are predicted correctly over 90% of the time.

[0011]

[Embodiment] FIG. 1 is a block diagram showing the flow of control of a computer program. Instructions 100-107 make up the main flow of instructions 10. A second flow of instructions, 200-202, constitutes subroutine 12. In the example of FIG. 1, subroutine 12 is called from one of two instructions, 101 and 104. When subroutine 12 is called from instruction 101, for example, the computer executes instructions 200 and 201 and returns to main flow 10 at instruction 202. Execution of main flow 10 resumes at instruction 102. If, however, subroutine 12 was called from instruction 104, subroutine 12 must return the flow of instructions to main flow 10 at instruction 105.

[0012] As the above description shows, main flow 10 can be re-entered from subroutine 12 at either of two places. In a larger program, returns to the main flow may be made at any number of places. When instructions are executed in a pipelined computer, a number of operations are carried out in successive stages of the pipeline. Computer pipelining is well known and involves operating on separate instructions in separate stages of the pipeline simultaneously. For example, if the pipeline has five stages, a different instruction is in each of the five stages at the same time, and each stage performs a different operation on its instruction. Pipelining instructions is a much more efficient way to process them than waiting for each instruction to complete before beginning to process the next.

[0013] A pipeline constructed in accordance with the present invention is shown in FIG. 2 and is designated by reference numeral 20. Pipeline 20 has a program counter buffer 21, which buffers the program count, or "PC", to instruction cache 22. Instruction cache 22 holds a number of instructions at any given time. When instruction cache 22 receives a PC from program counter buffer 21, the coded instruction is sent to instruction fetch decoder 24. As its name implies, decoder 24 decodes the instruction from instruction cache 22 that was designated by program counter buffer 21. In accordance with pipelining, decoder 24 decodes a first instruction in the pipeline while a second instruction is being looked up in instruction cache 22.

[0014] The decoded instruction is sent from decoder 24 to instruction buffer 26, which merely buffers decoded instructions. Instruction buffer 26 is followed by issue logic 28, which performs a scheduling-type function. The remaining stages in the pipeline are indicated by reference numeral 30 and may be any number of stages. The final stage of pipeline 20 is execution stage 32, in which the instruction is executed. Because pipeline 20 of FIG. 2 has six stages, up to six instructions can be processed simultaneously.

[0015] Normally, after one instruction, for example instruction 100, has been fetched from instruction cache 22, the next instruction fetched from the cache is the next in sequence, for example instruction 101. In the case of a call instruction such as instruction 101, the decoded instruction itself supplies the PC of the next instruction to be executed. In this case the next instruction to execute after the call is the first instruction of the subroutine, for example instruction 200. Thus, by using either the next instruction in sequence (PC+1) or the instruction designated by a decoded call instruction, pipeline 20 is kept full of instructions. Pipeline 20 is used inefficiently whenever one or more of its stages contains a bubble (no instruction). As described below, such bubbles can potentially arise with subroutine return instructions such as instruction 202.

[0016] A bubble can arise in pipeline 20 with a subroutine return instruction because the actual place to return to in the main flow (the actual return address) is not known until the return instruction has been executed in execution stage 32. If no further measures are taken, stages 22-30 will be empty behind the subroutine return instruction, because the address of the next instruction is not known. As noted earlier, this bubble represents inefficient use of pipeline 20.

[0017] To prevent bubbles in pipeline 20, there must be a mechanism for predicting which instruction is to be processed after a return instruction in pipeline 20. Some computer architectures allow the use of a stack that stores each return address whenever a subroutine is called. The stack stores these return addresses in last-in, first-out fashion, so that when the next return instruction is decoded in the pipeline, the last return address stored on the stack is the first to be issued from it. In some computer architectures, however, a stack cannot be used.

[0018] The present invention performs the basic function of a stack without actually using a stack device. (Although the invention is useful in architectures without stacks, it is also useful in architectures that do use stacks.) Instead, as shown in FIG. 2, a ring buffer 34 is used. Ring buffer 34 holds a relatively small number of predicted return addresses. As explained in detail below, a predicted return address held in ring buffer 34 is used to fetch the next instruction from instruction cache 22 when instruction fetch decoder 24 decodes a return instruction. Pipeline 20 then continues to operate on that next instruction without a bubble forming. If the predicted return address eventually proves to be wrong, the instruction sequence is aborted and execution continues with the correct sequence of instructions.

[0019] Ring buffer 34 is, for example, eight locations deep, with a predicted return address stored in each location. As described above, instruction cache 22 receives the program count (PC) from program counter buffer 21 in order to look up an instruction. The PC is also sent to adder 38, which adds the value 1 to it. The resulting value, PC+1, is sent to ring buffer 34 and to multiplexer 36. During a normal sequence of instructions, with no subroutines or branches, one instruction follows another in sequence, so that PC = PC+1. If the program count of the first instruction is PC, the program count of the next instruction is PC+1 and that of the third is PC+2. In other embodiments the increment is something other than 1; for example, the address of the instruction following instruction 300 may be 304, and the one after that 308. In that case the PC changes by 4 each time, so that PC = PC+4.

[0020] Multiplexer 36 also receives inputs from instruction fetch decoder 24. One input is the called subroutine address (CSA) contained in a call instruction that has been received from instruction cache 22 and decoded by decoder 24. The called subroutine address is selected by multiplexer 36 as the PC for instruction cache 22 whenever the decoded instruction is a call instruction. If the instruction decoded by instruction fetch decoder 24 is a return instruction rather than a call instruction, a signal is sent to ring buffer 34 through RP counter 40. RP counter 40 holds the ring pointer (RP), which indexes ring buffer 34. Ring buffer 34 sends the predicted return address (PRA) designated by RP to multiplexer 36. Under control of a signal from instruction fetch decoder 24, multiplexer 36 selects the PRA as the PC for instruction cache 22. The operation of ring buffer 34 is described below.

[0021] Three inputs to multiplexer 36 for selecting the PC have now been described. The first is PC+1, which the multiplexer selects when the instruction decoded by instruction fetch decoder 24 is neither a call nor a return instruction. The second input to multiplexer 36 is the called subroutine address (CSA), which is contained in the decoded instruction and sent to multiplexer 36 by instruction fetch decoder 24. The CSA is used when decoder 24 has decoded a call instruction. The third input is the predicted return address (PRA), which ring buffer 34 sends to multiplexer 36 when a return instruction has been decoded by decoder 24. The operation of ring buffer 34 is described below.
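
The three-way selection performed by multiplexer 36 can be sketched as a single function. This is a minimal sketch under stated assumptions: the function name `next_pc`, the string labels for the decoder's control signal, and the `increment` parameter are illustrative, and the input carrying the actual return address is left out of this example.

```python
def next_pc(pc, kind, csa=None, pra=None, increment=1):
    """Select the next program count in the manner described for multiplexer 36.

    kind is "call", "return", or "other" (assumed labels for the decoder signal).
    """
    if kind == "call":
        return csa         # called subroutine address taken from the decoded call
    if kind == "return":
        return pra         # predicted return address supplied by the ring buffer
    return pc + increment  # sequential case: PC+1 (or PC+4 in other embodiments)


sequential = next_pc(100, "other")
call_target = next_pc(101, "call", csa=200)
predicted = next_pc(202, "return", pra=102)
by_four = next_pc(300, "other", increment=4)
print(sequential, call_target, predicted, by_four)
```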

[0022] As mentioned, ring buffer 34 has a limited number of buffer locations in which predicted return addresses are stored. The buffer locations of ring buffer 34 are indexed by the ring pointer (RP) held in RP counter 40. The operation of RP counter 40 is simple. When RP counter 40 receives a signal from decoder 24 indicating that the decoded instruction is a call instruction, RP counter 40 is incremented by one so that it points to the next higher buffer location. Expressed as an equation, RP = RP+1 on a subroutine call. When the instruction decoded by decoder 24 is a return instruction, RP is decremented by one, giving RP = RP-1 on a subroutine return.
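
A minimal sketch of the RP counter follows. The class and method names are hypothetical, and the modulo wrap-around at the ends of the buffer is an assumption drawn from the term "ring"; the text itself gives only RP = RP+1 and RP = RP-1.

```python
class RingPointer:
    """Sketch of the RP counter; wrap-around by modulo is an assumption."""

    def __init__(self, size=8, start=0):
        self.size = size
        self.value = start % size

    def on_call(self):
        self.value = (self.value + 1) % self.size  # RP = RP + 1
        return self.value

    def on_return(self):
        self.value = (self.value - 1) % self.size  # RP = RP - 1
        return self.value


rp = RingPointer(size=8, start=3)
after_call = rp.on_call()
after_return = rp.on_return()
wrapped_up = RingPointer(size=8, start=7).on_call()
wrapped_down = RingPointer(size=8, start=0).on_return()
print(after_call, after_return, wrapped_up, wrapped_down)
```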

[0023] The value placed in the location of ring buffer 34 designated by RP when a subroutine call instruction is decoded is the next sequential address after the address of the call instruction, that is, PC+1. This value, PC+1, becomes the PRA. The PRA is sent by ring buffer 34 to multiplexer 36 when a subroutine return instruction is decoded by decoder 24. Both the loading of PC+1 into the location designated by RP and the reading of the PRA from it are carried out in response to ring buffer control signals generated by decoder 24.

[0024] An example of the operation of ring buffer 34 follows. The first instruction to enter pipeline 20 is instruction 100. It is looked up in instruction cache 22 using PC = 100. The instruction is decoded in decoder 24 and, since it is neither a subroutine call nor a return, the control signal from decoder 24 selects PC+1 as the next PC. In this case PC+1 is 101, so instruction 101 is sent from instruction cache 22 to decoder 24.

[0025] Instruction 101 is a subroutine call (see FIG. 1), so decoder 24 sends a signal to RP counter 40. The RP in RP counter 40 is incremented, for example from RP = 3 to RP = 4, and now points to a new buffer location RP(4) in ring buffer 34. The value of PC+1, in this case 101+1 = 102, is stored in ring buffer 34 at location RP(4).

[0026] On decoding call instruction 101, decoder 24 sends a control signal to multiplexer 36 and also sends the called subroutine address (CSA) to be used as the next PC. Because the CSA is contained in the decoded subroutine call instruction 101, decoder 24 is able to send it to multiplexer 36.

[0027] Subroutine 12 is then executed: instruction 200 is looked up in instruction cache 22 and decoded. Since instruction 200 is neither a call nor a return instruction, instruction 201 is fetched from instruction cache 22 (PC = PC+1 = 201). Similarly, instruction 202 follows instruction 201 into the pipeline. Instruction 202, however, is a subroutine return instruction.

[0028] When subroutine return instruction 202 is decoded by decoder 24, the next instruction could potentially be either instruction 102 or instruction 105 (see FIG. 1). Using ring buffer 34, the correct return address can usually be supplied. On decoding the return instruction, decoder 24 sends a signal to RP counter 40. The PRA designated by the RP held in RP counter 40 is sent to multiplexer 36. In this example, the PRA stored at RP(4), which is the address of instruction 102, is sent to multiplexer 36. Decoder 24 sends a control signal to multiplexer 36 so that it selects the PRA as the next PC. Using the supplied PRA, instruction 102 is then sent from instruction cache 22 and decoded by decoder 24. The RP in RP counter 40 is also decremented and now points to RP(3).
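
The walk-through above (instructions 100, 101, 200, 201, 202 and then 102) can be reproduced with a short sketch. The names `ReturnPredictor`, `PROGRAM` and `run` are hypothetical, the modulo wrap-around is an assumption, and the model follows only the fetch order; it does not model the pipeline stages themselves.

```python
class ReturnPredictor:
    """Ring buffer of predicted return addresses indexed by a ring pointer (RP)."""

    def __init__(self, size=8, rp=0):
        self.buf = [None] * size
        self.rp = rp

    def on_call(self, pc):
        self.rp = (self.rp + 1) % len(self.buf)  # RP = RP + 1
        self.buf[self.rp] = pc + 1               # store PC+1 as the PRA

    def on_return(self):
        pra = self.buf[self.rp]                  # read the PRA at RP
        self.rp = (self.rp - 1) % len(self.buf)  # RP = RP - 1
        return pra


# Control-flow instructions of FIG. 1 used here; all other addresses are sequential.
PROGRAM = {101: ("call", 200), 202: ("return", None)}


def run(start, steps, predictor):
    """Return the sequence of program counts fetched."""
    pc, fetched = start, []
    for _ in range(steps):
        fetched.append(pc)
        kind, target = PROGRAM.get(pc, ("other", None))
        if kind == "call":
            predictor.on_call(pc)
            pc = target
        elif kind == "return":
            pc = predictor.on_return()
        else:
            pc += 1
    return fetched


predictor = ReturnPredictor(size=8, rp=3)
fetched = run(100, 7, predictor)
print(fetched, predictor.rp)
```

Starting with RP = 3 as in the text, the call at 101 stores 102 at RP(4), and the return at 202 reads it back and leaves RP at 3.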

[0029] The operation of pipeline 20 when the PRA is correct has now been described. In some circumstances, however, the PRA is incorrect. This can happen, for example, when a subroutine returns to a place in main flow 10 other than the instruction immediately following the call instruction. That the PRA is wrong is not known until the return instruction has passed completely through pipeline 20 and been executed. Only at that point is the actual return address for the return instruction known. In the meantime, a number of instructions following the return instruction have been fetched and are in the various stages of pipeline 20. Pipeline 20 must recognize that the actual return address differs from the predicted return address and take corrective action.

[0030] The actual return address (ARA) is sent from the end of the pipeline to compare unit 42. At the same time that compare unit 42 receives the ARA, it also receives the PRA. The PRA reaches compare unit 42 through a series of delay latches 44. The PRA and the ARA are compared. If they do not match, a miscompare signal is sent to decoder 24 and issue logic 28. The miscompare signal causes all instructions that entered pipeline 20 after the return instruction to be flushed from pipeline 20. Multiplexer 36 receives the ARA at its fourth input, so that the PC sent to instruction cache 22 becomes the ARA once the miscompare has been recognized. A new sequence of instructions, beginning with the instruction at the actual return address (ARA), is then processed by pipeline 20.
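
The comparison and recovery step can be sketched as follows. The function name `resolve_return` and its return convention are assumptions made for illustration; the delay latches are represented simply by passing the PRA in as an argument alongside the ARA.

```python
def resolve_return(pra, ara, younger_instructions):
    """Compare the predicted and actual return addresses at the end of the pipeline.

    Returns (redirect_pc, surviving_instructions, miscompare).
    redirect_pc is None when the prediction was correct.
    """
    if pra == ara:
        # Prediction held: the instructions fetched after the return stay in flight.
        return None, list(younger_instructions), False
    # Miscompare: flush everything fetched after the return and refetch from the ARA.
    return ara, [], True


ok = resolve_return(102, 102, [102, 103, 104])
bad = resolve_return(102, 105, [102, 103, 104])
print(ok)
print(bad)
```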

[0031] Ring buffer 34 has only a limited number of buffer locations. This can cause ring buffer 34 to supply a mispredicted return address. Assume, for example, that ring buffer 34 has eight locations, RP(0)-RP(7). If eight subroutines are called with no returns, ring buffer 34 is full. When a ninth subroutine call is made, the first ring buffer location, RP(0), is overwritten in standard ring buffer fashion. The PRA previously held in RP(0) is essentially lost when the new PRA is written over it. Thus, if nine subroutine returns are then issued, the correct return address for the first subroutine called will no longer be found in RP(0). Instead, the new value of the PRA that was stored in RP(0) is issued as the PRA. That this PRA is wrong is eventually determined by the comparison of the PRA with the ARA. When the miscompare is recognized, the appropriate corrective action described above is taken.
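
The nine-call case can be checked with a short sketch. The call addresses are hypothetical, the starting value of RP is chosen so that the first call lands in RP(0), and the modulo wrap-around is an assumption.

```python
SIZE = 8
buf = [None] * SIZE
rp = SIZE - 1  # assumed starting point so that the first call writes RP(0)

call_sites = [100 + 10 * i for i in range(9)]  # nine hypothetical nested calls
for pc in call_sites:
    rp = (rp + 1) % SIZE
    buf[rp] = pc + 1  # the ninth call overwrites RP(0)

# True return addresses, innermost subroutine first.
actual = [pc + 1 for pc in reversed(call_sites)]

predicted = []
for _ in call_sites:
    predicted.append(buf[rp])
    rp = (rp - 1) % SIZE

mispredicted = [(p, a) for p, a in zip(predicted, actual) if p != a]
print(mispredicted)
```

In this sketch the first eight returns are predicted correctly and only the ninth, which belongs to the first subroutine called, reads the overwritten entry.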

[0032] While a pipelined computer is executing a program, events can occur that require the pipeline to abort all of the instructions in all of its stages. This is also called "flushing" the pipeline. All of the work in progress in pipeline 20 is then discarded. The discarded instructions are called the "shadow". Typically, the front end of the pipeline then begins executing instructions at some error-handling address.

[0033] For purposes of explanation, an event that flushes the pipeline is called a "trap". Examples of traps are mispredicted branches, virtual memory page faults, resource-busy conditions, and hardware parity errors. To improve the performance of the subroutine return prediction mechanism, the ring buffer 34 needs to be backed up to the point where it was when the instruction that caused the trap passed through the ring buffer stage of the pipeline 20. Keeping a copy of every change made to the ring buffer 34 until it is known that no trap will occur is too expensive an approach.

[0034] In the embodiment of FIG. 3, this problem is solved by providing another ring pointer, the "confirmation ring pointer" or CRP, held in a CRP counter 45. The CRP changes in the same way as the RP: it is incremented when a subroutine call instruction is seen and decremented when a subroutine return instruction is seen. The difference is that the CRP counter 45 monitors the instructions that reach the final stage of the pipeline 20. Changes are made to the CRP based on the instructions in the final stage. When a trap occurs, the RP in the RP counter 40 is set to the value of the CRP in the CRP counter 45.

[0035] This keeps the RP correctly synchronized when new instructions begin executing after the trap. Some subroutine call instructions in the shadow of the trap may have written entries into the ring buffer 34, so that subsequent predictions for subroutine return instructions will be wrong. However, the RP remains synchronized, and as soon as the RP has passed the incorrect entries, the ring buffer 34 is in a good state again.
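The interplay of the RP and the CRP across a trap can be sketched in software as follows. This is an illustrative model under stated assumptions, not the circuit of FIG. 3: the class `PredictorWithCRP`, its method names, the function `demo`, and the addresses are hypothetical, and "fetch" and "retire" stand in for an instruction passing the ring buffer stage and reaching the final stage, respectively.

```python
class PredictorWithCRP:
    """Toy model of the RP, the CRP, and the ring buffer (hypothetical names)."""

    def __init__(self, size=8):
        self.size = size
        self.buf = [None] * size
        self.rp = size - 1   # ring pointer, updated at the ring buffer stage
        self.crp = size - 1  # confirmation ring pointer, updated at the final stage

    def fetch_call(self, call_address):
        # Assumption: increment RP, then store call address + 1 as the PRA.
        self.rp = (self.rp + 1) % self.size
        self.buf[self.rp] = call_address + 1

    def fetch_return(self):
        # Assumption: read the PRA at RP, then decrement RP.
        pra = self.buf[self.rp]
        self.rp = (self.rp - 1) % self.size
        return pra

    def retire_call(self):
        self.crp = (self.crp + 1) % self.size

    def retire_return(self):
        self.crp = (self.crp - 1) % self.size

    def trap(self):
        # On a trap, the RP is set to the value of the CRP.
        self.rp = self.crp


def demo():
    p = PredictorWithCRP()
    # Two nested calls that reach the final stage.
    for address in (100, 200):
        p.fetch_call(address)
        p.retire_call()
    # Instructions in the shadow of a trap: fetched, but never reach the final stage.
    p.fetch_return()
    p.fetch_call(300)  # overwrites the entry that held 201
    p.fetch_call(400)
    rp_before = p.rp
    p.trap()
    rp_after = p.rp
    # Execution resumes with the two real returns (ARAs 201, then 101).
    first = p.fetch_return()
    p.retire_return()
    second = p.fetch_return()
    p.retire_return()
    return rp_before, rp_after, first, second


if __name__ == "__main__":
    print(demo())
```

In this run the trap moves the RP from 2 back to 1. The first return after the trap is mispredicted (301 instead of 201) because a shadow call overwrote that entry, and the second return is predicted correctly (101), since the RP has by then passed the incorrect entry.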

[0036] The present invention is not limited to the embodiments described above, and can be applied to pipelines of various sizes having ring buffers of various sizes.

[Brief description of drawings]

FIG. 1 is a block diagram of a program having a main flow and a subroutine.

FIG. 2 is a block diagram of a pipeline using an embodiment of the present invention.

FIG. 3 is a block diagram of a pipeline using another embodiment of the present invention.

[Explanation of symbols]

10 ... main flow of instructions, 12 ... subroutine, 100-107, 200-202 ... instructions, 20 ... pipeline, 21 ... program counter buffer, 22 ... instruction cache, 24 ... instruction fetch decoder, 26 ... instruction buffer, 28 ... issue logic circuit, 32 ... execution stage, 34 ... ring buffer, 36 ... multiplexer, 40 ... RP counter.

Claims

[Claims]

1. A structure for generating an expected subroutine return address in response to entry of a subroutine return instruction into a computer pipeline, comprising: a ring pointer counter containing a ring pointer that is incremented when a subroutine call instruction enters the computer pipeline and decremented when a subroutine return instruction enters the computer pipeline; and a ring buffer connected to the ring pointer counter and having buffer locations, an input, and an output, wherein the ring buffer stores the value at the input into the buffer location pointed to by the ring pointer when a subroutine call instruction enters the computer pipeline, and provides at the output the value from the buffer location pointed to by the ring pointer when a subroutine return instruction enters the computer pipeline, the value at the output being the expected subroutine return address.

2. The structure of claim 1, further comprising a comparison unit connected to the ring buffer for comparing the actual return address generated by the computer pipeline in response to processing a return instruction with the expected return address for that return instruction, the comparison unit having an output that provides a comparison mismatch signal when the actual return address is not the same as the expected return address.

3. A computer pipeline comprising: an instruction cache that stores coded instructions and has an input for receiving a program count that indexes the coded instructions and an output at which the indexed coded instruction is provided; an instruction fetch decoder that has an input connected to the output of the instruction cache, decodes the coded instruction, and has as outputs a subroutine call address when the coded instruction is a subroutine call instruction, a multiplexer control signal, a ring pointer counter control signal, and the decoded instruction; an execution stage connected to the instruction fetch decoder for executing the decoded instruction; a program counter connected to the input of the instruction cache, the program counter having an input and an output that provides the program count to the input of the instruction cache; a multiplexer having a plurality of inputs, a control input connected to the multiplexer control signal output of the instruction fetch decoder, and an output connected to the input of the program counter; an adder having an input connected to the output of the program counter and an output connected to one of the multiplexer inputs, the adder output providing a value equal to the program count plus one; a ring pointer counter having an input connected to the instruction fetch decoder to receive the ring pointer counter control signal, the ring pointer counter containing a ring pointer that points to a buffer location in response to the ring pointer counter control signal, the ring pointer being incremented when the instruction fetch decoder decodes a subroutine call instruction and decremented when the instruction fetch decoder decodes a subroutine return instruction; and a ring buffer having an input connected to the output of the adder, a plurality of buffer locations, and an output connected to one of the multiplexer inputs, wherein the ring buffer stores that value into the buffer location pointed to by the ring pointer when a subroutine call instruction is decoded, and provides at the ring buffer output the value from the buffer location pointed to by the ring pointer when a subroutine return instruction is decoded, the value at the ring buffer output being the expected subroutine return address.

4. The pipeline of claim 3, further comprising a comparison unit connected to the ring buffer for comparing the actual return address generated by the computer pipeline in response to processing a return instruction with the expected return address for that return instruction, the comparison unit having an output at which a comparison mismatch signal is provided when the actual return address is not the same as the expected return address.

5. The pipeline of claim 4, further comprising means for processing the correct sequence of instructions, starting with the instruction pointed to by the actual return address, when the comparison mismatch signal indicates a mismatch between the expected return address and the actual return address.

6. The pipeline of claim 3, further comprising a confirmation ring pointer counter connected to the execution stage and to the ring pointer counter, the confirmation ring pointer counter containing a confirmation ring pointer that is incremented when the execution stage receives a subroutine call instruction and decremented when the execution stage receives a subroutine return instruction, the confirmation ring pointer being provided to the ring pointer counter to replace the ring pointer when a trap occurs.

7. A method for predicting a subroutine return address in a pipelined computer, comprising: in response to a call instruction, storing a value equal to the address of the call instruction plus one in one of a plurality of buffer locations in a ring buffer; pointing to the buffer location containing the most recently stored value; in response to a return instruction, providing the most recently stored value as an output, the output being the expected subroutine return address; and then pointing to the buffer location containing the next most recently stored value.