JPH07319693A

JPH07319693A - Pipeline processing method

Info

Publication number: JPH07319693A
Application number: JP11621394A
Authority: JP
Inventors: Yuji Suzuki; 裕司鈴木; Kakuji Saitou; 拡二斎藤; Genichi Takeda; 元一武田; Kenji Matsubara; 健二松原
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1994-05-30
Filing date: 1994-05-30
Publication date: 1995-12-08

Abstract

(57) [Abstract]

[Purpose] In a pipeline processing apparatus that executes instructions in a pipelined manner, to shorten the operation wait time of the next instruction when there is a data dependency between a load instruction and the instruction to be executed next after it.

[Structure] Before the cache hit determination for the load instruction is known, the data from the cache is sent to the instruction executed next after the load instruction, which then performs its operation ahead of time. If the load instruction results in a cache miss, the next instruction is returned to a stage preceding the operation stage. This return is performed during the block transfer caused by the cache miss of the load instruction.

[Effect] According to the present invention, the operation wait time of the instruction following a load instruction is shortened, so the processing speed of the pipeline processing apparatus can be improved. Moreover, even when the data read stage and the stage at which the hit determination becomes known are separated, the penalty does not increase, so a drop in processing speed can be prevented.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a pipeline processing apparatus that divides each instruction into a plurality of stages for pipelined execution, and in particular to one that passes data read by a load instruction to other instructions at high speed for processing.

[0002]

[Prior Art] In a conventional pipeline processing apparatus, each instruction is processed in six stages: an instruction read stage (IF stage), an instruction decode stage (D stage), an instruction operation stage (E stage), a memory access stage (A stage), an instruction cancel stage (N stage), and a write stage (W stage).

[0003] FIG. 8 shows the configuration of a conventional pipeline processing apparatus.

[0004] Each stage has holding circuits 2, 3, 4, 5, 6, 7, 14, 16, 19, 10, and 26 that hold the address, data, and instruction of that stage. The IF stage has a program counter 8 that counts the program; the D stage has a detection circuit 24 that detects data dependencies on preceding instructions; the E stage has an arithmetic unit 15 that performs instruction operations; and the N stage has a determination circuit 20 that performs the hit determination for the data cache 17. Instructions are read from the instruction cache memory 9 in the IF stage, and data is read from the data cache memory 17 in the A stage. The register file 21 is read in the D stage and written in the W stage. In addition, a bypass 119 that carries data read from the data cache 17 is provided between the N stage and the D stage.

[0005] In the instruction sequence of FIG. 2, the first instruction, a load instruction, adds the data in register x and the data in register b, uses the sum as the data cache read address, and reads the data at that address into register r1. The second instruction, an add instruction, then adds the data in register r1 and the data in register r2 and writes the sum to register t. The first and second instructions therefore have a data dependency through register r1.
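The dependency through register r1 can be illustrated with a minimal Python sketch. The `Instr` class, its field names, and `has_raw_dependency` are hypothetical names chosen for illustration; the patent describes a hardware detection circuit, not software, and only the register names (x, b, r1, r2, t) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: one destination, several sources."""
    op: str
    dest: str
    srcs: tuple


def has_raw_dependency(first: Instr, second: Instr) -> bool:
    """True if `second` reads the register that `first` writes."""
    return first.dest in second.srcs


# The two-instruction sequence described for FIG. 2.
load = Instr("load", dest="r1", srcs=("x", "b"))   # r1 <- mem[x + b]
add = Instr("add", dest="t", srcs=("r1", "r2"))    # t  <- r1 + r2

print(has_raw_dependency(load, add))
```

In this sketch the check succeeds because the add instruction lists r1, the load's destination, among its sources.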

[0006] FIG. 9 is a time chart for the prior-art pipeline processing apparatus processing the instruction sequence of FIG. 2 when the first instruction, the load instruction, hits in the cache, and FIG. 10 is the time chart when the load instruction misses in the cache.

[0007] The specific operation of the conventional pipeline processing apparatus is described below with reference to the drawings, taking the instruction sequence of FIG. 2 as an example.

[0008] First, the load instruction (the first instruction) is read from the instruction cache 9 in the IF stage. Next, in the D stage, the load instruction reads the data of register x and register b from the register file 21 onto data lines 131 and 110, respectively; selection circuits 12 and 13 select data lines 131 and 110 and output them to data lines 111 and 112. At the same time, the add instruction (the second instruction) is read from the instruction cache 9 in the IF stage. In the E stage, the load instruction performs an addition in the arithmetic unit 15 using data 111 and 112 and outputs the data cache read address 115. Meanwhile, the add instruction, in the D stage, reads the data of register r1 and register r2 from the register file 21 onto data lines 131 and 110, respectively. As FIG. 2 shows, the load instruction and the add instruction have a data dependency through register r1, and the detection circuit 24 detects it. At this point the load instruction has not yet updated register r1, so data line 131, which carries the contents of register r1, holds stale data. The add instruction is therefore not sent to the E stage (the instruction operation stage) but is held in the preceding D stage by the stage lock signal 129. The load instruction continues to be processed regardless of the data dependency: in the A stage it takes the E-stage output 115 as the data cache read address 116, reads data 117 from the data cache 17, selects data 117 in the selection circuit 27, and sends it to the N-stage data holding circuit 19. It also reads physical addresses 118 and 133 from the data address tag 18 and the data TLB 28, respectively, and sends them to the hit determination circuit 20, which performs the cache hit determination for the load instruction.

[0009] If the hit determination circuit 20 finds in the N stage that the load instruction is a cache hit, the stage lock signal 129 is released, and the data read from the cache memory 17 into the N stage is valid. The load instruction immediately sends this valid data from the N stage to the D stage through the bypass 119. The load instruction also writes the valid data 119 to register r1 in the W stage. When the valid data 119 corresponding to register r1 arrives from the N stage, the add instruction selects it in the selection circuit 12 and outputs it to data line 111, and selects the data 110 of register r2, already read from the register file 21 in the D stage, in the selection circuit 13 and outputs it to data line 112. In the E stage, the add instruction performs the addition in the arithmetic unit 15 using the D-stage outputs 111 and 112 and outputs the sum 115. In the N stage, the selection circuit 27 selects data 116, which is the sum, and in the W stage the sum is written to register t.

[0010] In the prior-art pipeline processing apparatus, when the load instruction (the first instruction) hits in the cache, the add instruction (the second instruction) incurs a two-cycle penalty.

[0011] If, on the other hand, the load instruction is found in the N stage to be a cache miss, the data read from the cache memory 17 into the N stage is invalid. The load instruction therefore performs a block transfer to bring valid data into the N stage, and is held in the N stage by the stage lock signal 126 during the block transfer. The add instruction remains held in the D stage by the stage lock signal 129 until the block transfer brings valid data into the N stage. When the block transfer ends, several cycles after the cache miss is detected, the stage lock signals 126 and 129 are released, and valid data has been read into the N stage, where the load instruction resides. The load instruction then sends the valid data from the N stage to the D stage through the bypass 119, and writes the valid data 119 to register r1 in the W stage. When the valid data 119 arrives at the D stage from the N stage, the add instruction selects it in the selection circuit 12 and outputs it to data line 111, and selects the data 110 of register r2, already read from the register file 21 in the D stage, in the selection circuit 13 and outputs it to data line 112. In the E stage, the add instruction performs the addition in the arithmetic unit 15 using the D-stage outputs 111 and 112 and outputs the sum 115. In the N stage, the selection circuit 27 selects data 116, which is the sum, and in the W stage the sum is written to register t.

[0012] In the prior-art pipeline processing apparatus, when the load instruction (the first instruction) misses in the cache, the add instruction (the second instruction) incurs a penalty of two cycles plus the block transfer cycles.

[0013] In general, the data cache hit determination is slower than the data cache data read. For example, a direct-mapped data cache 17 reads data 117 and, at the same time, reads from the data address tag 18 the address 118 at which data 117 resided, and uses address 118 for the hit determination. Because the hit determination circuit 20 takes time to output the hit determination result 123, the hit determination 123 of the data cache 17 becomes available later than the data read 117.

[0014] The conventional example described above shows the case in which the hit determination is one cycle later than the data read.

[0015] In recent years, however, even faster pipeline processing apparatuses have been in demand. As the apparatus is made faster, the cycle time shortens and the delay budget of the hit determination circuit becomes tighter; the hit determination result may no longer be produced within one cycle and may become known two or more cycles late. If the hit determination is two cycles later than the data read, the penalty of the add instruction (the second instruction) simply increases by one cycle.

[0016] FIG. 13 shows the time chart for a cache hit in the prior-art pipeline processing apparatus when the hit determination is two cycles later than the data read. As FIG. 13 shows, the add instruction (the second instruction) then incurs a three-cycle penalty. FIG. 14 shows the time chart for a cache miss. As FIG. 14 shows, the add instruction then incurs a penalty of three cycles plus the block transfer cycles.
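The prior-art penalties stated so far (two cycles when the hit determination lags the data read by one cycle, three cycles when it lags by two, plus the block transfer cycles on a miss) fit a simple arithmetic pattern. The following Python sketch encodes that pattern. The function name, its parameters, and the generalization to "one cycle plus the hit-determination lag" are assumptions drawn from those two data points, not a formula given in the patent.

```python
def conventional_penalty(hit_delay: int, cache_hit: bool,
                         block_transfer_cycles: int = 0) -> int:
    """Penalty (in cycles) of the dependent add in the prior-art pipeline.

    hit_delay: cycles by which the hit determination lags the data read
               (1 in the base example, 2 in the FIG. 13/14 case).
    Assumed model: 1 + hit_delay on a hit, plus the block transfer on a miss.
    """
    base = 1 + hit_delay
    return base if cache_hit else base + block_transfer_cycles


# Hit cases described in the text: 2 cycles and 3 cycles.
print(conventional_penalty(1, cache_hit=True))
print(conventional_penalty(2, cache_hit=True))
# Miss case with a hypothetical 8-cycle block transfer.
print(conventional_penalty(1, cache_hit=False, block_transfer_cycles=8))
```

The 8-cycle block transfer is an arbitrary illustrative value; the patent only says the transfer takes multiple cycles.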

[0017]

[Problems to Be Solved by the Invention] For an instruction sequence such as that of FIG. 2, the penalty on a cache hit matters most for performance. In the conventional example above, the cache-hit penalty is two cycles, and it needs to be reduced further.

[0018] Furthermore, in the conventional example, if faster processing or the like moves the stage that reads data from the data cache and the stage that performs the hit determination further apart, the cache-hit penalty increases and performance degrades.

[0019] The present invention was made in view of the circumstances described above, and its object is to provide a pipeline processing apparatus that eliminates excessive holding of the second instruction.

[0020]

[Means for Solving the Problems] In the invention of claim 1, when there is a data dependency between a load instruction (the first instruction) and the second instruction to be executed next, the data read from the data cache by the load instruction is sent to the stage preceding the operation stage from a stage earlier than the one at which the load instruction's cache hit determination becomes known. Specifically, the configuration comprises a bypass that sends the data read by the load instruction to the stage preceding the operation stage before the cache hit determination is known, and control means that forwards the data received through the bypass to the operation stage and causes the operation to be performed.

[0021] In the invention of claim 2, when the load instruction (the first instruction) turns out to be a cache miss, the load instruction has already sent data to the second instruction through the bypass, so the second instruction, which has advanced to the operation stage or beyond, is re-executed from a stage preceding the operation stage. Specifically, the configuration of claim 1 is supplemented with a path that returns the second instruction to a stage preceding the operation stage when the load instruction misses in the cache, and control means that causes the second instruction returned through that path to be re-executed.

[0022] In the invention of claim 3, during the block transfer caused by the cache miss of the load instruction (the first instruction), the second instruction is returned to a stage preceding the operation stage and held in the stage before the operation stage, so that when the block transfer ends the second instruction can immediately receive the valid data that was read and perform its operation.

[0023]

[Operation] With the configuration of claim 1, when there is a data dependency between the load instruction (the first instruction) and the second instruction, the data read from the cache is sent to the stage preceding the operation stage before the cache hit determination is known. The second instruction can therefore perform its operation ahead of time, and excessive holding of the second instruction on a cache hit is eliminated.

[0024] With the configuration of claim 2, when the load instruction (the first instruction) misses in the cache, the second instruction that has advanced to the operation stage or beyond can be executed again, so correct operation is guaranteed.
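The combined effect of claims 1 and 2 (operate early on the forwarded cache data, then redo the operation if the load turns out to miss) can be sketched in software terms. This is only a behavioural illustration under stated assumptions: the function name, its parameters, and the idea of returning a retry flag are hypothetical, and the patent realizes the behaviour with bypasses, a return path, and control circuits rather than code.

```python
def execute_dependent_add(cache_data: int, hit: bool,
                          refill_data: int, r2: int):
    """Model the dependent add under early forwarding with retry.

    cache_data:  value forwarded before the hit determination is known
    hit:         outcome of the hit determination, known only afterwards
    refill_data: valid value available once the block transfer completes
    Returns (sum written to the destination, whether a retry was needed).
    """
    early_sum = cache_data + r2          # operation performed ahead of time
    if hit:
        return early_sum, False          # early result is valid, keep it
    # Miss: the forwarded data was invalid, so the add is executed again
    # with the valid data obtained by the block transfer.
    return refill_data + r2, True


print(execute_dependent_add(cache_data=5, hit=True, refill_data=9, r2=1))
print(execute_dependent_add(cache_data=5, hit=False, refill_data=9, r2=1))
```

On a hit the early sum is kept unchanged; on a miss it is discarded and recomputed from the refilled data, which is what keeps the result correct in this model.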

[0025] With the invention of claim 3, the steps from returning the second instruction to a stage preceding the operation stage through holding it in the stage before the operation stage are carried out during the block transfer of the load instruction (the first instruction), and the operation of the second instruction is executed immediately when the block transfer ends. The operation of claim 2 therefore does not increase the penalty.

[0026]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

[0027] FIG. 5 shows, as a first embodiment, the configuration of a pipeline processing apparatus according to the inventions of claims 1 and 2.

[0028] In the pipeline processing apparatus of the first embodiment, each stage has holding circuits 2, 3, 4, 5, 6, 7, 14, 16, 19, 10, and 26 that hold the address, data, and instruction of that stage. The IF stage has a program counter 8 that counts the program; the D stage has a detection circuit 24 that detects data dependencies on preceding instructions; the E stage has an arithmetic unit 15 that performs instruction operations; and the N stage has a determination circuit 20 that performs the data cache hit determination. Instructions are read from the instruction cache memory 9 in the IF stage, and data is read from the data cache memory 17 in the A stage. The register file 21 is read in the D stage and written in the W stage. In addition, bypasses 119 and 117 that carry data read from the data cache 17 are provided from the N stage to the D stage and from the A stage to the D stage, respectively, and a path 104 that carries an address is provided from the E stage to the IF stage.

[0029] FIG. 6 is a time chart for the pipeline processing apparatus of the first embodiment processing the instruction sequence of FIG. 2 when the first instruction, the load instruction, hits in the cache, and FIG. 7 is the time chart when the load instruction misses in the cache.

[0030] The specific operation of the pipeline processing apparatus of the first embodiment is described below, taking the instruction sequence of FIG. 2 as an example.

[0031] First, the load instruction (the first instruction) is read from the instruction cache 9 in the IF stage. Next, in the D stage, the load instruction reads the data of register x and register b from the register file 21 onto data lines 131 and 110, respectively; selection circuits 12 and 13 select data lines 131 and 110 and output them to data lines 111 and 112. At the same time, the add instruction (the second instruction) is read from the instruction cache 9 in the IF stage. In the E stage, the load instruction performs an addition in the arithmetic unit 15 using data 111 and 112 and outputs the data cache read address 115. Meanwhile, the add instruction, in the D stage, reads the data of register r1 and register r2 from the register file 21 onto data lines 131 and 110, respectively. As FIG. 2 shows, the load instruction and the add instruction have a data dependency through register r1, and the detection circuit 24 detects it. Once the dependency is detected, data line 131, which carries the contents of register r1, holds stale data, so the add instruction is not sent to the E stage (the instruction operation stage) but is held in the preceding D stage by the stage lock signal 129. The load instruction continues to be processed regardless of the data dependency: in the A stage it takes the E-stage output 115 as the data cache read address 116 and reads data 117 from the data cache 17. The data read in the A stage is sent to the D stage through the bypass 117, and at this point the stage lock signal 129 that was holding the add instruction is released. When data 117 corresponding to register r1 arrives from the A stage, the add instruction selects it in the selection circuit 12 and outputs it to data line 111, and selects the data 110 of register r2, already read from the register file 21 in the D stage, in the selection circuit 13 and outputs it to data line 112. In the E stage, the add instruction uses the D-stage outputs 111 and 112 to perform the addition in the arithmetic unit 15 ahead of time, before the cache hit determination for the load instruction is known, and outputs the sum 115. Meanwhile, in the A stage the load instruction also selects data 117 in the selection circuit 27 and sends it to the N-stage data holding circuit 29; it also uses the data cache read address 116 to read physical addresses 118 and 133 from the data address tag 18 and the data TLB 28, respectively, and sends them to the hit determination circuit 20, which performs the cache hit determination for the load instruction.

[0032] If the hit determination circuit 20 finds in the N stage that the load instruction (the first instruction) is a cache hit, the data 117 read in the A stage of the first instruction is valid. The load instruction therefore writes the valid data 119, which was sent on from the A stage and is output in the N stage, to register r1 in the W stage. The advance operation of the add instruction (the second instruction) is also valid, so the add instruction simply continues: in the N stage the selection circuit 27 selects the sum 116 of the add instruction, and in the W stage the sum is written to register t.

[0033] In the pipeline processing apparatus of the first embodiment, when the load instruction (the first instruction) hits in the cache, the add instruction (the second instruction) incurs a one-cycle penalty.
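To make the one-cycle versus two-cycle difference concrete, the following Python sketch reconstructs the cache-hit timing from the prose description alone (the figures themselves are not reproduced here, so this is an assumed reading, not a copy of FIG. 6 or FIG. 9). The helper names `schedule` and `chart`, the cycle numbering, and the row labels are all hypothetical.

```python
def schedule(start_cycle: int, d_hold: int = 0) -> dict:
    """Return {cycle: stage} for one instruction held `d_hold` extra cycles in D."""
    stages = ["IF", "D"] + ["D"] * d_hold + ["E", "A", "N", "W"]
    return {start_cycle + i: stage for i, stage in enumerate(stages)}


def chart(rows: dict) -> str:
    """Render one text line per instruction, one column per cycle."""
    last = max(max(sched) for sched in rows.values())
    lines = []
    for name, sched in rows.items():
        cells = [f"{sched.get(c, '-'):>2}" for c in range(1, last + 1)]
        lines.append(f"{name:<14}" + " ".join(cells))
    return "\n".join(lines)


load = schedule(1)                      # the load is never held on a hit
add_prior = schedule(2, d_hold=2)       # prior art: waits for the N-stage bypass 119
add_embodiment = schedule(2, d_hold=1)  # embodiment: waits for the A-stage bypass 117

print(chart({"load": load,
             "add (prior)": add_prior,
             "add (embod.)": add_embodiment}))
```

In this reconstruction the add would reach the E stage in cycle 4 with no dependency; it reaches E in cycle 6 in the prior art and in cycle 5 in the first embodiment, matching the two-cycle and one-cycle penalties stated in the text.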

[0034] On the other hand, when the load instruction (the first instruction) is found in the N stage to be a cache miss, the data read in the A stage of the first instruction is invalid. The load instruction therefore performs a block transfer to read valid data, and during the block transfer it is held in the N stage by the stage lock signal 126. The add instruction (the second instruction) is likewise held in the E stage by the stage lock signal 126 during the block transfer. When the block transfer ends, several cycles after the cache miss occurred, the stage lock signal 126 is released, and valid data has been read into the N stage, where the load instruction resides. The load instruction writes the valid data 119 to register r1 in the W stage. The add instruction, however, has already completed its addition by the time the block transfer ends, and because the data 117 sent from the A stage to the D stage by the load instruction was invalid, that addition is an invalid operation and the add instruction must be executed again. Since the add instruction is in the E stage when the block transfer of the load instruction ends, the address of the add instruction in the E stage is sent back through path 104, the selection circuit 1 selects address 104, and the address is returned to the IF stage. Using the returned address 102, the add instruction is read again from the instruction cache 9. Here, the process of returning an instruction to a stage before the operation stage and re-executing it is called a retry. By the time the add instruction has been processed up to the D stage, the data read by the load instruction has already been written to register r1. In the D stage, the add instruction reads the data of register r1 and the data of register r2 from the register file 21 onto data lines 131 and 110, respectively; the selection circuits 12 and 13 select data 131 and 110, respectively, and output them to data lines 111 and 112. In the E stage, the add instruction uses the output data 111 and 112 from the D stage to perform the addition again in the arithmetic unit 15; in the A stage, the selection circuit 27 selects the addition result 116; and in the W stage, the addition result is written to register t.

[0035] In the pipeline processing device of the first embodiment, when the load instruction (the first instruction) misses in the cache, the add instruction (the second instruction) incurs a penalty of 2 cycles + the block transfer cycles + the retry processing cycles.

[0036] The second embodiment adds the invention of claim 3 to the pipeline processing device of the first embodiment.

[0037] FIG. 1 shows the configuration of the second embodiment.

[0038] FIG. 3 shows a time chart for the pipeline processing device of the second embodiment when the instruction sequence of FIG. 2 is processed and the load instruction (the first instruction) hits in the cache, and FIG. 4 shows a time chart for the case where the load instruction misses in the cache.

[0039] The specific operation of the second embodiment is described below with reference to the drawings.

[0040] First, the load instruction (the first instruction) is read from the instruction cache 9 in the IF stage. Next, in the D stage, the load instruction reads the data of register x and the data of register b from the register file 21 onto data lines 131 and 110, respectively; the selection circuits 12 and 13 select data lines 131 and 110, respectively, and output them to data lines 111 and 112. At this time, the add instruction (the second instruction) is read from the instruction cache 9 in the IF stage. In the E stage, the load instruction performs an addition in the arithmetic unit 15 using data 111 and 112 and outputs the data cache read address 115. At this time, the add instruction is in the D stage, reading the data of register r1 and the data of register r2 from the register file 21 onto data lines 131 and 110, respectively. As shown in FIG. 2, there is a data dependency between the load instruction and the add instruction through register r1, so the detection circuit 24 detects the dependency. Once the dependency is detected, the data line 131, onto which register r1 has been read, carries stale data; the add instruction is therefore not sent to the E stage, which is the operation stage, but is held in the immediately preceding D stage by the stage lock signal 129. The load instruction continues to be processed regardless of whether a data dependency exists: in the A stage it takes the output 115 of the E stage as the data cache read address 116 and reads data 117 from the data cache 17. The data read in the A stage is sent to the D stage through the bypass 117, at which point the stage lock signal 129 that was holding the add instruction is released. When data 117, which corresponds to register r1, arrives from the A stage, the add instruction selects it with the selection circuit 12 and outputs it to data line 111, and selects the data 110 of register r2, already read from the register file 21 in the D stage, with the selection circuit 13 and outputs it to data line 112. In the E stage, the add instruction uses the output data 111 and 112 from the D stage to perform the addition in the arithmetic unit 15 in advance, before the cache hit determination of the load instruction is known, and outputs the addition result 115. In the A stage, the load instruction also selects data 117 with the selection circuit 27 and sends it to the N-stage data holding circuit 29; in addition, using the data cache read address 116, it reads the physical addresses 118 and 133 from the data address tag 18 and the data TLB 28, respectively, and sends them to the hit determination circuit 20, which performs the cache hit determination for the load instruction.
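As a rough illustration of the check performed by the detection circuit 24, the following Python sketch tests whether the second instruction reads the register that the load writes. The `Instr` record and the function name are hypothetical; the register names follow the instruction sequence described above (the load reads registers x and b and writes r1; the add reads r1 and r2 and writes t). The source describes a hardware circuit, so this is only a software analogy.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this sketch."""
    op: str                 # e.g. "load" or "add"
    dest: str               # destination register
    srcs: Tuple[str, ...]   # source registers


def load_use_dependency(first: Instr, second: Instr) -> Optional[str]:
    """Return the register through which `second` depends on a load `first`,
    or None when there is no such dependency."""
    if first.op == "load" and first.dest in second.srcs:
        return first.dest
    return None


load = Instr(op="load", dest="r1", srcs=("x", "b"))
add = Instr(op="add", dest="t", srcs=("r1", "r2"))

print(load_use_dependency(load, add))
```

When the function returns a register name, the second instruction would be held in the D stage until the bypassed data arrives; when it returns None, no hold is needed.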

[0041] First, when the hit determination circuit 20 finds in the N stage that the load instruction (the first instruction) is a cache hit, the data 117 read in the A stage of the first instruction is valid. The load instruction therefore writes the valid data 119, which was sent from the A stage and is output in the N stage, to register r1 in the W stage. The preceding operation of the add instruction (the second instruction) is also valid, so processing of the add instruction continues: in the N stage, the selection circuit 27 selects the addition result 116 of the add instruction, and in the W stage the addition result is written to register t.

[0042] In the pipeline processing device of the second embodiment, when the load instruction (the first instruction) hits in the cache, the operation is the same as in the first embodiment, and the add instruction (the second instruction) incurs a one-cycle penalty.
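The one-cycle penalty on a cache hit can be traced with a small Python sketch that lays out which stage each instruction occupies in each cycle. The stage order IF, D, E, A, N, W follows the description; the function name, the cycle numbering, and the `d_hold` parameter are assumptions made for illustration, and the layout is a reconstruction from the text rather than a reproduction of FIG. 3.

```python
from typing import Dict

STAGES = ["IF", "D", "E", "A", "N", "W"]  # six-stage pipeline from the description


def timeline(first_cycle: int, d_hold: int = 0) -> Dict[int, str]:
    """Return {cycle: stage} for one instruction.

    d_hold is the number of extra cycles spent in the D stage
    (hypothetical stand-in for the stage lock signal).
    """
    sequence = []
    for stage in STAGES:
        sequence.append(stage)
        if stage == "D":
            sequence.extend(["D"] * d_hold)
    return {first_cycle + i: stage for i, stage in enumerate(sequence)}


load = timeline(first_cycle=1)           # load enters IF in cycle 1
add = timeline(first_cycle=2, d_hold=1)  # add waits one extra cycle in D for data 117

# Print one row per cycle: cycle number, stage of the load, stage of the add.
for cycle in range(1, max(add) + 1):
    print(cycle, load.get(cycle, "-"), add.get(cycle, "-"))
```

In this layout the add instruction is in the E stage in the same cycle in which the load is in the N stage, where the hit determination is made, and it reaches the W stage one cycle later than it would without the hold.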

[0043] On the other hand, when the load instruction (the first instruction) is found in the N stage to be a cache miss, the data read in the A stage of the first instruction is invalid. The load instruction therefore performs a block transfer to read valid data, and during the block transfer it is held in the N stage by the stage lock signal 126. The preceding operation of the add instruction (the second instruction) has used invalid data and is therefore an invalid operation, so the add instruction must be executed again. Since the add instruction is in the E stage when the load instruction is found to be a cache miss, the address of the add instruction in the E stage is sent back through path 104, the selection circuit 1 selects address 104, and the address is returned to the IF stage. Using the returned address 102, the add instruction is read again from the instruction cache 9, and it is then held once more by the stage lock signal 128 in the D stage, the stage before the E stage, until valid data arrives as a result of the block transfer of the load instruction. In the first embodiment the retry of the second instruction is executed after the block transfer of the first instruction, whereas in the second embodiment the retry of the second instruction is performed during the block transfer and is completed before the block transfer ends. When the block transfer ends, several cycles after the cache miss occurred, the stage lock signals 126 and 128 are released, and valid data has been read into the N stage, where the load instruction resides. The load instruction sends the valid data read into the N stage, which corresponds to register r1, from the N stage to the D stage through the bypass 119. The load instruction also writes the valid data 119 to register r1 in the W stage. When the valid data 119 arrives at the D stage from the N stage, the add instruction selects it with the selection circuit 12 and outputs it to data line 111, and selects the data 110 of register r2, already read from the register file 21 in the D stage, with the selection circuit 13 and outputs it to data line 112. In the E stage, the add instruction uses the output data 111 and 112 from the D stage to perform the addition again in the arithmetic unit 15 and outputs the addition result 115. Then, in the N stage, the selection circuit 27 selects data 116, the addition result, and in the W stage the addition result is written to register t.
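The choice made by the selection circuit 12 for the r1 operand in the D stage of the second embodiment can be summarized in a short Python sketch. The enum, the function name, and the two boolean flags are hypothetical; only the three data sources (register file data 131, A-stage bypass data 117, N-stage bypass data 119) come from the description.

```python
from enum import Enum


class OperandSource(Enum):
    REGISTER_FILE = 131  # data line fed by register file 21
    BYPASS_A = 117       # data read from the data cache in the A stage
    BYPASS_N = 119       # valid data in the N stage after the block transfer


def select_r1_source(depends_on_load: bool, retry_after_miss: bool) -> OperandSource:
    """Pick where the dependent instruction takes its r1 operand from."""
    if not depends_on_load:
        # No dependency: the register file value is already current.
        return OperandSource.REGISTER_FILE
    if retry_after_miss:
        # Retried instruction waits in D and takes the valid data from N.
        return OperandSource.BYPASS_N
    # First attempt: use the A-stage data before the hit result is known.
    return OperandSource.BYPASS_A


print(select_r1_source(depends_on_load=True, retry_after_miss=False).name)
print(select_r1_source(depends_on_load=True, retry_after_miss=True).name)
```

The first call corresponds to the preceding operation that uses data 117, and the second to the re-execution that uses data 119 once the block transfer has ended.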

[0044] In the pipeline processing device of the second embodiment, when the load instruction (the first instruction) misses in the cache, the add instruction (the second instruction) incurs a penalty of 2 cycles + the block transfer cycles.
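The miss penalties stated for the two embodiments can be compared with a few lines of Python. The function names and the example cycle counts (8 cycles for the block transfer, 3 for the retry) are assumptions chosen only for illustration; the source gives no concrete values. The second function also assumes, as the description states, that the retry finishes before the block transfer ends.

```python
def miss_penalty_first_embodiment(block_transfer: int, retry: int) -> int:
    """2 cycles + block transfer cycles + retry cycles (retry runs after the transfer)."""
    return 2 + block_transfer + retry


def miss_penalty_second_embodiment(block_transfer: int, retry: int) -> int:
    """2 cycles + block transfer cycles (retry overlaps the transfer)."""
    if retry > block_transfer:
        # This case is outside what the source describes, so it is not modeled.
        raise ValueError("retry is assumed to finish within the block transfer")
    return 2 + block_transfer


bt, rt = 8, 3  # hypothetical cycle counts
first = miss_penalty_first_embodiment(bt, rt)
second = miss_penalty_second_embodiment(bt, rt)
print(first, second, first - second)
```

Under these assumptions the saving equals the retry cycles, because the retry is hidden behind the block transfer.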

[0045] The first and second embodiments above show the case where the hit determination of the data cache occurs one cycle later than the data read from the data cache.

[0046] In a third embodiment, the number of pipeline stages of the second embodiment is not limited to six.

[0047] FIG. 11 shows a time chart for a cache hit in the pipeline processing device of the third embodiment when, for example because of higher processing speed, the hit determination occurs two cycles later than the data read. As shown in FIG. 11, the add instruction (the second instruction) incurs a one-cycle penalty in this case. FIG. 12 shows the time chart for a cache miss. As shown in FIG. 12, the add instruction incurs a penalty of 3 cycles + the block transfer cycles in this case.
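The two cases given in the description (hit determination one cycle or two cycles after the data read) suggest a simple pattern, sketched below in Python. The miss formula `hit_delay + 1 + block_transfer` is an extrapolation from those two cases only; the source does not state it for other delays. The function and parameter names are hypothetical.

```python
def dependent_instruction_penalty(hit_delay: int, cache_hit: bool,
                                  block_transfer: int = 0) -> int:
    """Penalty cycles for the instruction that depends on the load.

    hit_delay: cycles between the data read and the hit determination.
    Matches the stated values for hit_delay = 1 and hit_delay = 2;
    other values are an assumption, not a claim of the source.
    """
    if hit_delay < 1:
        raise ValueError("hit_delay must be at least 1")
    if cache_hit:
        return 1  # the hit penalty stays at one cycle in both stated cases
    return hit_delay + 1 + block_transfer


for delay in (1, 2):
    print(delay,
          dependent_instruction_penalty(delay, cache_hit=True),
          dependent_instruction_penalty(delay, cache_hit=False, block_transfer=8))
```

The block transfer length of 8 cycles is an arbitrary example value. The point of the sketch is that the hit penalty does not grow with the delay, while the miss penalty grows by one cycle per additional cycle of delay in the two stated cases.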

[0048] In a fourth embodiment, the data returned in the retry of the second instruction in the third embodiment is not limited to the address alone.

[0049] In a fifth embodiment, the destination to which the second instruction is returned in the retry of the fourth embodiment is not limited to the IF stage, provided that it is a stage before the operation stage.

[0050]

[Effects of the Invention] As described above, according to the invention of claim 1, when there is a data dependency between the load instruction (the first instruction) and the second instruction, the data read from the cache is sent to the stage before the operation stage before the cache hit determination is known, so the second instruction can perform its operation in advance. On a cache hit, unnecessary holding is eliminated; that is, the penalty is reduced, and the processing speed of the pipeline processing device can be improved. Furthermore, even if the stage in which data is read from the data cache and the stage in which the cache hit determination becomes known move further apart, the second instruction can still perform its operation in advance, so unnecessary holding on a cache hit is eliminated and the cache-hit penalty does not increase, which prevents a decrease in the processing speed of the pipeline processing device.

[0051] With the configuration of the invention of claim 2, even when the load instruction (the first instruction) misses in the cache, a second instruction that has already been sent to the operation stage or beyond can be executed again, so correct operation is guaranteed.

[0052] According to the invention of claim 3, the operation of returning the second instruction to a stage before the operation stage and holding it in the stage before the operation stage is performed during the block transfer of the load instruction (the first instruction), and when the block transfer ends the second instruction can immediately receive the valid data that was read and execute its operation. The operation of claim 2 therefore does not add to the penalty, and a decrease in the processing speed of the pipeline processing device can be prevented.

[Brief description of drawings]

FIG. 1 is a configuration diagram of the second embodiment of the pipeline processing device according to the present invention.

FIG. 2 shows an instruction sequence with a data dependency.

FIG. 3 is a time chart for a cache hit in the second embodiment.

FIG. 4 is a time chart for a cache miss in the second embodiment.

FIG. 5 is a configuration diagram of the first embodiment of the pipeline processing device according to the present invention.

FIG. 6 is a time chart for a cache hit in the first embodiment.

FIG. 7 is a time chart for a cache miss in the first embodiment.

FIG. 8 is a configuration diagram of a pipeline processing device according to the prior art.

FIG. 9 is a time chart for a cache hit in the prior art.

FIG. 10 is a time chart for a cache miss in the prior art.

FIG. 11 is a time chart for a cache hit in the second embodiment when the data read and the cache hit determination are two cycles apart.

FIG. 12 is a time chart for a cache miss in the second embodiment when the data read and the cache hit determination are two cycles apart.

FIG. 13 is a time chart for a cache hit in the prior art when the data read and the cache hit determination are two cycles apart.

FIG. 14 is a time chart for a cache miss in the prior art when the data read and the cache hit determination are two cycles apart.

[Explanation of symbols]

1, 12, 13, 27 ... selection circuits; 2, 3, 4, 5, 6, 7, 10, 14, 16, 19, 26 ... stage holding circuits; 8 ... program counter; 9 ... instruction cache memory; 11 ... register selection circuit; 15 ... arithmetic unit; 17 ... data cache memory; 18 ... data address tag; 20 ... hit determination circuit; 21 ... register file; 22 ... re-execution control device; 23 ... control device; 24 ... dependency detection circuit; 25 ... preceding-operation control device; 28 ... data TLB.

Continuation of front page: (72) Inventor: Kenji Matsubara, General-Purpose Computer Division, Hitachi, Ltd., 1 Horiyamashita, Hadano, Kanagawa Prefecture

Claims

[Claims]

1. A pipeline processing device that divides a first instruction and a second instruction, which is to be executed after the first instruction, into a plurality of stages and processes them in a pipeline, wherein, when the first instruction is a load instruction and the second instruction is an instruction that uses the data read by the load instruction, the device comprises: a bypass that sends the read data to the operation means of the stage that executes the operation of the second instruction before the cache hit determination of the load instruction is known; and control means that uses the bypass to control the sending of the read data to the operation means.

2. The pipeline processing device according to claim 1, further comprising, for the case where the second instruction is an instruction that uses the data read by the load instruction (the first instruction) and the load instruction causes a cache miss: a path for returning the second instruction from the operation stage or a later stage to a stage before the operation stage; and control means that uses the path to return the second instruction to the stage before the operation stage and controls the second instruction so that it is executed again.

3. The pipeline processing device according to claim 2, further comprising control means that, during the block transfer caused by the cache miss of the load instruction (the first instruction), returns the second instruction to a stage before the operation stage and holds it in the stage before the operation stage, wherein, when the block transfer ends, the second instruction immediately receives the valid data that was read and executes its operation.