JPH0512110A

JPH0512110A - Information processing equipment

Info

Publication number: JPH0512110A
Application number: JP3327282A
Authority: JP
Inventors: Masao Inoue; 雅夫井上
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1990-12-25
Filing date: 1991-12-11
Publication date: 1993-01-22

Abstract

(57) [Abstract]

[Purpose] To realize high-speed parallel execution of instructions in an information processing apparatus having a two-level cache between a central processing unit and a main memory.

[Configuration] Address translation units (MMUa, MMUb) 2 and 3 for translating logical addresses into physical addresses are provided between a central processing unit (CPU) 1, which sends logical addresses for memory access onto two address buses 6 and 8 in parallel, and first-level and second-level caches (L1-CACHE, L2-CACHE) 4 and 5, respectively. When an access to the L1-CACHE 4 misses, the address sent to the L1-CACHE 4 is transferred to the L2-CACHE 5 through a bus selector 10. This enables high-speed parallel execution of two consecutive load instructions.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an information processing apparatus comprising a central processing unit capable of executing a plurality of instructions in parallel and a multi-level cache.

[0002]

[Prior Art] FIG. 7 is a block diagram showing an example of a conventional information processing apparatus having a two-level cache between a central processing unit and a main memory (not shown). In the figure, 101 is a central processing unit (CPU) capable of executing two instructions in parallel, 103 is a first-level cache (L1-CACHE) located at the upper level, close to the CPU 101, and 104 is a second-level cache (L2-CACHE) located at a lower level than the L1-CACHE 103. The L1-CACHE 103 has the faster access speed, while the L2-CACHE 104 has the larger storage capacity. Further, 102 is an address translation unit (MMU) for passing a physical address to the L1-CACHE 103, 105 and 107 are address buses, 106 and 108 are data buses, and 109 is a miss-hit signal line for notifying the CPU 101 when an access to the L1-CACHE 103 has missed.

[0003] The L1-CACHE 103 employs a write-through coherency protocol. Write-through is a coherency protocol for guaranteeing the consistency of data across multiple cache levels. In this example, when data in the L1-CACHE 103 is updated by a store instruction or the like, the data at the same address in the L2-CACHE 104 is updated at the same time.
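The write-through behaviour described here can be sketched in a few lines of Python. This is an illustrative model only, not the circuitry of the L1-CACHE 103: the class name `WriteThroughHierarchy` and the use of dictionaries as caches are assumptions made for the example.

```python
class WriteThroughHierarchy:
    """Two cache levels kept consistent by write-through (illustrative model)."""

    def __init__(self):
        self.l1 = {}  # address -> data held in the upper-level cache
        self.l2 = {}  # address -> data held in the lower-level cache

    def store(self, address, data):
        # Write-through: the same address is updated in L2 together with L1.
        self.l1[address] = data
        self.l2[address] = data


hierarchy = WriteThroughHierarchy()
hierarchy.store(0x100, 42)
print(hierarchy.l1[0x100] == hierarchy.l2[0x100])
```

After any store, both levels hold the same data at the stored address, which is the consistency property the protocol is meant to guarantee.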

[0004] In the information processing apparatus of FIG. 7, when the CPU 101 executes a load instruction that requires memory access, the memory address of the data to be loaded is sent from the CPU 101 onto the address bus 105 as a logical address. Part of the logical address on the address bus 105 is translated into a physical address by the MMU 102 and, combined with the remaining part of the logical address, is used to access the L1-CACHE 103. If the access hits, the data is fetched from the L1-CACHE 103 into the CPU 101 through the data bus 106. If the access to the L1-CACHE 103 misses, the same address that was used to access the L1-CACHE 103 is passed to the L2-CACHE 104 through the address bus 107 between the L1-CACHE and the L2-CACHE, and the desired data is fetched from the L2-CACHE 104 into the CPU 101 through the data buses 108 and 106. If the access also misses in the L2-CACHE 104, the data is fetched from the main memory, but the configuration for that is omitted from the figure.
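The load path of FIG. 7 can be sketched as follows. This is a minimal model under stated assumptions: the 4 KiB page size (`PAGE_BITS`), the page-table dictionary, and the example addresses are all hypothetical, since the source gives none of them. Only the order L1, then L2 with the same address, then main memory comes from the description.

```python
PAGE_BITS = 12  # assumed page size of 4 KiB; the source does not specify one


def translate(logical_address, page_table):
    """Translate the upper part of the address and keep the lower part as is."""
    page_number = logical_address >> PAGE_BITS
    offset = logical_address & ((1 << PAGE_BITS) - 1)
    return (page_table[page_number] << PAGE_BITS) | offset


def load(logical_address, page_table, l1, l2, main_memory):
    """Return (data, source), trying L1, then L2, then main memory."""
    physical = translate(logical_address, page_table)
    if physical in l1:
        return l1[physical], "L1"
    if physical in l2:  # on an L1 miss the same address is passed on to L2
        return l2[physical], "L2"
    return main_memory[physical], "main memory"


page_table = {0x1: 0x8}  # logical page 1 -> physical page 8 (made-up mapping)
l1 = {0x8004: "a"}
l2 = {0x8004: "a", 0x8008: "b"}
main_memory = {0x8004: "a", 0x8008: "b", 0x800C: "c"}

print(load(0x1004, page_table, l1, l2, main_memory))
print(load(0x1008, page_table, l1, l2, main_memory))
print(load(0x100C, page_table, l1, l2, main_memory))
```

The three calls exercise an L1 hit, an L1 miss that hits in L2, and a miss in both levels.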

[0005] FIG. 8 is a block diagram showing the internal configuration of the CPU 101 in FIG. 7. In FIG. 8, 111 is an instruction buffer (IB) for holding a plurality of instructions to be executed sequentially, 112 is a first instruction decoder (DECa) for decoding an instruction fetched from the IB 111, 113 is a second instruction decoder (DECb) for decoding the next instruction, likewise fetched from the IB 111, 114 is a first arithmetic unit (ALUa) for executing the instruction decoded by the DECa 112, 115 is a second arithmetic unit (ALUb) for executing the instruction decoded by the DECb 113, 116 is a register file (RF) having, for example, 32 registers r0 to r31, 117 is an address selector, and 118 is a control circuit (CTRL) for switching the address selector 117. Further, 119 and 120 are fetch signal lines, 121 and 122 are decode signal lines, 123 and 124 are signal lines for transmitting addresses or data, 125 is a selection signal line from the CTRL 118 to the address selector 117, and 126 and 127 are signal lines for transmitting data from the RF 116 to the ALUa 114 and the ALUb 115. The address bus 105 is connected to the output side of the address selector 117, the data bus 106 to the RF 116, and the miss-hit signal line 109 to the CTRL 118. Even when the address selector 117 has been switched to the ALUb 115 side, the CTRL 118 switches it to the ALUa 114 side when the CTRL 118 receives, through the miss-hit signal line 109, a signal indicating that an access to the L1-CACHE 103 has missed.

[0006] Next, the operation of the conventional information processing apparatus configured as above will be described with reference to FIGS. 9 and 10.

[0007] FIG. 9 is a timing diagram for the execution of a program consisting of four consecutive instructions, namely one load instruction S1 that requires memory access and three arithmetic instructions S2 to S4 that do not. It shows the case where the access to the L1-CACHE 103 for the load instruction S1 hits. The load (ld) instruction S1 in the figure requests that the data at the memory address indicated by register r0 in the RF 116 be loaded into register r10. The add instruction S2 requests that the result of adding the immediate value 8 to the contents of register r1 be stored in register r2; the subtract (sub) instruction S3 requests that the result of subtracting the immediate value 8 from the contents of register r3 be stored in register r3; and the add instruction S4 requests that the result of adding the contents of register r5 to the contents of register r4 be stored in register r4. D denotes the decode stage (D stage) for fetching and decoding the instruction code and fetching operand data, E the execution stage (E stage), M the memory stage (M stage) for accessing the cache, and S the store stage (S stage) for storing data into a register.

[0008] The CPU 101 has a function of executing the load (ld) instruction S1 and the add instruction S2 in parallel.

[0009] To describe this operation in detail, cycle 1 is the D stage for each of the instructions S1 and S2. That is, the fetch of the load (ld) instruction S1 from the IB 111 into the DECa 112 and its decoding are performed simultaneously with the fetch of the add instruction S2 from the IB 111 into the DECb 113 and its decoding. For the load (ld) instruction S1, operand data for calculating the memory address to be loaded from is additionally fetched from the IB 111.

[0010] Cycle 2 is the E stage for each of the instructions S1 and S2. For the load (ld) instruction S1, the operand data is passed from the DECa 112 to the ALUa 114, which performs the address calculation. In parallel, the ALUb 115 receives the decoded result of the add instruction S2 from the DECb 113, accesses the RF 116, and performs the addition. Meanwhile, the decoded results of the DECa 112 and the DECb 113 are also sent to the CTRL 118, and the address selector 117 is switched so that the signal line 123 on the ALUa 114 side, which is needed for the memory access, is connected to the address bus 105. As a result, the address calculated by the ALUa 114 is sent onto the address bus 105 through the address selector 117. The MMU 102 translates the logical address on the address bus 105 into a physical address. In addition, a tag for lookup is read from the L1-CACHE 103.

[0011] Cycle 3 is the M stage for the load (ld) instruction S1. In this M stage, the tag read from the L1-CACHE 103 is compared with the tag of the physical address obtained by the MMU 102, and the data to be loaded is then read from the L1-CACHE 103 onto the data bus 106. FIG. 9 shows the case where the L1-CACHE 103 hits for the load (ld) instruction S1, so the instruction proceeds directly from this M stage to the next S stage. The add instruction S2, on the other hand, waits for one cycle after its E stage (cycle 2), corresponding to the M stage of the load (ld) instruction S1, and then proceeds to the S stage (cycle 4) together with the load (ld) instruction S1.

[0012] In cycle 4, the data on the data bus 106 for the load (ld) instruction S1 is stored into the RF 116 and, at the same time, the result of the addition by the ALUb 115 for the add instruction S2 is also stored into the RF 116 (S stage).

[0013] The next two instructions, the subtract (sub) instruction S3 and the add instruction S4, are executed simultaneously in cycles 2 to 5, using both the ALUa 114 and the ALUb 115, in parallel with the execution of the load (ld) instruction S1 and the add instruction S2 described above.

[0014] FIG. 10 is a timing diagram for the case where, in executing a program consisting of the same four instructions as in FIG. 9, namely the load (ld) instruction S1, the add instruction S2, the subtract (sub) instruction S3, and the add instruction S4, the access to the L1-CACHE 103 for the load (ld) instruction S1 misses in the M stage of cycle 3. E, E1, and E2 in the figure all denote execution stages. D, M, and S have the same meanings as in FIG. 9.

[0015] If the access to the L1-CACHE 103 misses in the M stage of cycle 3 for the load (ld) instruction S1, cycle 4 becomes the E1 stage for this load (ld) instruction S1. In the E1 stage, a tag for lookup is read from the L2-CACHE 104. In the next E2 stage (cycle 5), the tag read from the L2-CACHE 104 is compared with the tag of the physical address obtained by the MMU 102. Following this tag comparison, in the next M stage (cycle 6), the data to be loaded is read from the L2-CACHE 104 onto the data buses 108 and 106. During this time, the address selector 117 in the CPU 101 remains switched to the ALUa 114 side in response to the signal on the miss-hit signal line 109, and the same address that was given from the CPU 101 to the L1-CACHE 103 through the address bus 105 is given to the L2-CACHE 104 through the address bus 107 between the L1-CACHE 103 and the L2-CACHE 104. Because the L2-CACHE 104 is slower and larger than the L1-CACHE 103, an access to the L2-CACHE 104 requires three cycles (the E1, E2, and M stages), whereas an access to the L1-CACHE 103 takes only two cycles (the E and M stages). Then, in the S stage of cycle 7, the CPU 101 stores the data read onto the data bus 106 into the RF 116.
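The stage sequences of a single load in FIGS. 9 and 10 can be written out as a small Python sketch. The function name `load_timeline` and the list representation are assumptions for illustration; the stage names and cycle numbers follow the description above.

```python
# Stage sequence of one load when L1 hits (FIG. 9) and when L1 misses (FIG. 10).
HIT_STAGES = ["D", "E", "M", "S"]
MISS_STAGES = ["D", "E", "M", "E1", "E2", "M", "S"]


def load_timeline(l1_hit, first_cycle=1):
    """Return a list of (cycle, stage) pairs for a single load instruction."""
    stages = HIT_STAGES if l1_hit else MISS_STAGES
    return [(first_cycle + i, stage) for i, stage in enumerate(stages)]


for cycle, stage in load_timeline(l1_hit=False):
    print(cycle, stage)
```

On a hit the S stage falls in cycle 4. On a miss the three extra stages of the L2 access (E1, E2, and the second M) push it to cycle 7.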

[0016] When the instruction fetched into the DECa 112 is an arithmetic instruction that does not require memory access and the instruction fetched into the DECb 113 at the same time is a load instruction that does, the ALUb 115 of the CPU 101 sends the address onto the address bus 105 through the address selector 117.
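The choice made by the CTRL 118 when it sets the address selector 117 can be sketched as a small decision function. This is a simplified model, not the actual control logic: the function name and boolean flags are hypothetical, and giving the ALUa priority when both decoded instructions are loads is an assumption made for the sketch.

```python
def select_address_source(deca_is_load, decb_is_load):
    """Return which ALU drives the single address bus, or None if neither does."""
    if deca_is_load:
        return "ALUa"  # the load decoded by DECa uses the address bus
    if decb_is_load:
        return "ALUb"  # arithmetic in DECa, load in DECb: ALUb sends the address
    return None        # no memory access in this pair of instructions


print(select_address_source(deca_is_load=False, decb_is_load=True))
```

Because the function returns a single source, it also makes the limitation visible: only one address can be placed on the address bus 105 at a time.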

[0017]

[Problems to Be Solved by the Invention] In the conventional information processing apparatus described above, the CPU 101 can send out only one address through the address selector 117, and the L2-CACHE 104 is accessed only when an access to the L1-CACHE 103 has missed. Consequently, two instructions that require memory access cannot be executed in parallel at high speed. This problem is described below with reference to FIGS. 11 and 12.

[0018] FIG. 11 is a timing diagram showing the operation of the conventional information processing apparatus when it executes a program consisting of four consecutive load instructions S1 to S4. It shows the case where the access to the L1-CACHE 103 hits for every load instruction. The first load (ld) instruction S1 in the figure requests that the data at the memory address indicated by register r0 be loaded into register r10. Similarly, the second load (ld) instruction S2 requests that the data at the memory address indicated by register r1 be loaded into register r11, the third load (ld) instruction S3 requests that the data at the memory address indicated by register r2 be loaded into register r12, and the fourth load (ld) instruction S4 requests that the data at the memory address indicated by register r3 be loaded into register r13.

[0019] The CPU 101 also has a function of executing the two load (ld) instructions S1 and S2 in parallel.

[0020] To describe this operation in detail, cycle 1 is the D stage for each of the instructions S1 and S2. That is, the fetch of the first load (ld) instruction S1 from the IB 111 into the DECa 112 and its decoding are performed simultaneously with the fetch of the second load (ld) instruction S2 from the IB 111 into the DECb 113 and its decoding. For each of the load (ld) instructions S1 and S2, operand data for calculating the memory address is additionally fetched from the IB 111.

[0021] Cycle 2 is the cycle that should be the E stage for each of the load (ld) instructions S1 and S2. For the first load (ld) instruction S1, the ALUa 114 performs the address calculation, the result is sent onto the address bus 105 through the address selector 117, the MMU 102 translates the logical address into a physical address, and the tag is read from the L1-CACHE 103. While this is in progress, however, the ALUb 115 cannot send an address onto the address bus 105 through the address selector 117. Therefore, although cycle 2 is the E stage for the first load (ld) instruction S1, the second load (ld) instruction S2 is in a wait state, indicated by "hold" in the figure.

[0022] In cycle 3, the CPU 101 moves to the M stage for the first load (ld) instruction S1, and the tag comparison for the L1-CACHE 103 and the reading of data onto the data bus 106 take place. At this point the exclusive use of the address selector 117, the address bus 105, and the MMU 102 by the first load (ld) instruction S1 is released, so the CPU 101 can switch the address selector 117 to the ALUb 115 side and move to the E stage for the second load (ld) instruction S2. That is, cycle 3 is the M stage for the first load (ld) instruction S1 and the E stage for the second load (ld) instruction S2. In other words, the E stage of the second load (ld) instruction S2 is delayed by one cycle relative to the E stage of the first load (ld) instruction S1.

[0023] This relationship, in which the E stage of one load instruction is delayed relative to the E stage of the other when two load instructions are to be executed in parallel, also holds between the second load (ld) instruction S2 and the third load (ld) instruction S3, and between the third load (ld) instruction S3 and the fourth load (ld) instruction S4. Therefore, at least one "hold" cycle is inserted for each of the second to fourth load (ld) instructions S2 to S4. As a result, executing the program consisting of the four consecutive load (ld) instructions S1 to S4 takes at least seven cycles, as shown in FIG. 11.
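The seven-cycle count can be reproduced with a short Python sketch of the all-hit case. It assumes that instructions are decoded two per cycle (so S3 and S4 have their D stage in cycle 2, as in FIG. 9) and that the single address path admits one E stage per cycle. FIG. 11 itself is not reproduced here, so the exact placement of the "hold" cycles is an assumption.

```python
def all_hit_timeline(num_loads=4):
    """Return one {cycle: stage} dict per load, assuming every L1 access hits."""
    rows = []
    for i in range(num_loads):
        decode_cycle = 1 + i // 2  # assumed: two instructions decoded per cycle
        execute_cycle = 2 + i      # one E stage per cycle on the single address bus
        row = {decode_cycle: "D"}
        for cycle in range(decode_cycle + 1, execute_cycle):
            row[cycle] = "hold"    # waiting for the address path to become free
        row[execute_cycle] = "E"
        row[execute_cycle + 1] = "M"
        row[execute_cycle + 2] = "S"
        rows.append(row)
    return rows


rows = all_hit_timeline()
for name, row in zip(["S1", "S2", "S3", "S4"], rows):
    print(name, [row.get(cycle, "-") for cycle in range(1, 8)])
```

In this model the last S stage falls in cycle 7, matching the count given for FIG. 11, and each of S2 to S4 contains at least one "hold".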

[0024] The situation becomes even worse when the L1-CACHE 103 misses for one of the load instructions.

[0025] FIG. 12 is a timing diagram for the case where, in executing the same program consisting of four consecutive load (ld) instructions S1 to S4 as in FIG. 11, only the access to the L1-CACHE 103 for the first load (ld) instruction S1 misses.

[0026] If the access to the L1-CACHE 103 misses in the M stage of cycle 3 for the first load (ld) instruction S1, the CTRL 118 in the CPU 101 receives, through the miss-hit signal line 109, a signal indicating that the access to the L1-CACHE 103 has missed. The CTRL 118 then switches the address selector 117, which had been temporarily switched to the ALUb 115 side for the E stage of the second load (ld) instruction S2, back to the ALUa 114 side for the first load (ld) instruction S1. As a result, the memory address for the first load (ld) instruction S1 is sent onto the address bus 105, as in the E stage of cycle 2. Therefore, as in the case of FIG. 10, for the first load (ld) instruction S1, cycle 4 in FIG. 12 is the E1 stage for the translation of the logical address into a physical address by the MMU 102 and the reading of the tag from the L2-CACHE 104; cycle 5 is the E2 stage for comparing the tag read from the L2-CACHE 104 with the tag of the physical address obtained by the MMU 102; cycle 6 is the M stage for reading data from the L2-CACHE 104 onto the data buses 108 and 106; and cycle 7 is the S stage for storing the data into the RF 116.

[0027] During this time, the M stage of the second load (ld) instruction S2, in which the tag comparison and data read for the L1-CACHE 103 take place, is made to wait until cycle 7, after the M stage of the first load (ld) instruction S1 has completed. That is, the second load (ld) instruction S2 is in a wait state during cycles 4 to 6, each indicated by "hold" in the figure. Likewise, the E stage for the L1-CACHE 103 access of the third load (ld) instruction S3 and the E stage for the L1-CACHE 103 access of the fourth load (ld) instruction S4 are made to wait until cycle 7 and cycle 8, respectively. As a result, as shown in FIG. 12, when the access to the L1-CACHE 103 misses for the first load (ld) instruction S1 during execution of the program consisting of the four consecutive load (ld) instructions S1 to S4, executing the program takes 10 cycles.
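The cycle counts of FIGS. 11 and 12 can be checked with a small Python model of the conventional single-address-path pipeline. The function below is a sketch under stated assumptions: the name `store_stage_cycles` is hypothetical, and the model has only been matched against the two cases the description covers (all hits, and a miss on the first load only). Its behaviour for other miss patterns is an extrapolation.

```python
L2_ACCESS_CYCLES = 3  # E1, E2 and M stages of the L2 access after an L1 miss


def store_stage_cycles(l1_hits):
    """Return the cycle of the S stage for each consecutive load instruction."""
    s_cycles = []
    e_cycle = 2  # first load: D stage in cycle 1, E stage in cycle 2
    stall = 0    # cycles a load is held after its E stage by the previous miss
    for hit in l1_hits:
        m_cycle = e_cycle + 1 + stall          # L1 tag compare and data read
        penalty = 0 if hit else L2_ACCESS_CYCLES
        s_cycles.append(m_cycle + penalty + 1)
        e_cycle = m_cycle  # the next load's E stage overlaps this load's L1 M stage
        stall = penalty
    return s_cycles


print(store_stage_cycles([True, True, True, True]))
print(store_stage_cycles([False, True, True, True]))
```

The cycle of the last S stage is the total execution time of the program. It is 7 cycles when every load hits and 10 cycles when only the first load misses, in agreement with the figures.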

[0028] An object of the present invention is to speed up the parallel execution of instructions in an information processing apparatus having a multi-level cache between a main memory and a central processing unit capable of executing a plurality of instructions in parallel.

[0029]

[Means for Solving the Problems] To achieve the above object, the present invention provides access means for accessing multiple cache levels simultaneously when a plurality of instructions are executed in parallel.

[0030] Specifically, the invention of claim 1 employs a configuration comprising: execution means for executing a plurality of instructions in parallel; a multi-level cache ranging from the highest level, closest to the execution means, to the lowest level, farthest from it; and access means for accessing the multiple cache levels simultaneously when the execution means executes a plurality of instructions in parallel.

[0031] To realize high-speed parallel execution of a plurality of load instructions, in the invention of claim 2 the access means has a function of, when the execution means executes in parallel a plurality of instructions each requiring memory access, simultaneously starting accesses to the respective cache levels using mutually different addresses based on the respective instructions.

[0032] To guarantee high-speed parallel execution of a plurality of load instructions even when a cache access miss occurs, the access means of the invention of claim 3 has the following function. When the access to an upper-level cache for one of a plurality of instructions requiring memory access hits, the access to that upper-level cache is treated as valid. When that access misses, the access to a cache at a level lower than that upper-level cache is started for the instruction that missed, in parallel with the execution of the cache access for another instruction requiring memory access.

[0033] To realize high-speed parallel execution of a load instruction and an instruction other than a load instruction even when a cache access miss occurs, in the invention of claim 4 the access means has the following function when the execution means executes in parallel an instruction that requires memory access and another instruction that does not. It simultaneously starts accesses to the multiple cache levels using the same address, based on the instruction requiring memory access. When the access to the upper-level cache for that instruction hits, the access to the upper-level cache is treated as valid. When the access to the upper-level cache misses, the access to the lower-level cache, which was started for that instruction at the same time as the access to the upper-level cache, is treated as valid.
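The selection rule of claim 4 can be sketched as follows. This is a functional model only: the function name `parallel_lookup` and the dictionary caches are assumptions, and the code cannot express the timing benefit (the L2 access is already under way when the L1 miss is detected). It shows only which result is treated as valid.

```python
def parallel_lookup(address, l1, l2):
    """Probe both cache levels with the same address and pick the valid result."""
    # Both probes are started together, before the L1 outcome is known.
    l1_hit = address in l1
    l2_hit = address in l2
    if l1_hit:
        return l1[address], "L1"  # L1 hit: the upper-level access is valid
    if l2_hit:
        return l2[address], "L2"  # L1 miss: the L2 access started earlier is valid
    return None, "miss"           # both levels missed


l1 = {0x10: "x"}
l2 = {0x10: "x", 0x20: "y"}
print(parallel_lookup(0x10, l1, l2))
print(parallel_lookup(0x20, l1, l2))
```

The contrast with the conventional apparatus is that there the L2 access begins only after the L1 miss has been detected.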

[0034] For convenient translation from logical addresses to physical addresses, in the invention of claim 5 the access means comprises address translation units for translating the plurality of logical addresses used for simultaneous access to the multiple cache levels into physical addresses suited to each of the cache levels.

[0035] To embody an information processing apparatus having a two-level cache, the invention of claim 6 employs a configuration having two cache levels, an upper first-level cache and a lower second-level cache, with an inter-cache address bus provided between the two levels for transferring the address used to access the first-level cache to the second-level cache, and further comprises the following execution means and access means. The execution means has an instruction buffer for holding a plurality of instructions to be executed sequentially, two instruction decoders for respectively decoding two instructions fetched simultaneously from the instruction buffer, two arithmetic units for respectively executing the instructions decoded by the two instruction decoders, and a register file for storing the results of instruction execution by the two arithmetic units. Each of the two arithmetic units has a function of, for an instruction requiring memory access, calculating the logical address for the memory access and outputting that logical address. The access means has first and second address selectors for routing the logical addresses output from the two arithmetic units to the respective cache levels, a first address bus connected to the output side of the first address selector, a second address bus connected to the output side of the second address selector, a first address translation unit for translating the logical address on the first address bus into a physical address for access to the first-level cache, a second address translation unit for translating the logical address on the second address bus into a physical address for access to the second-level cache, a bus selector for selecting either the inter-cache address bus or the second address bus and connecting the selected address bus to the second-level cache, and a control circuit for controlling the switching of each of the first and second address selectors and the bus selector.

[0036] To coordinate the caches with one another, in the invention of claim 7 an upper-level cache among the plurality of cache levels has a smaller storage capacity than each cache below it and completes an access in fewer cycles.

[0037] To guarantee data consistency between the caches, in the invention of claim 8 the plurality of cache levels have a function whereby, when the data at an address in an upper-level cache is rewritten, the data at the same address in each cache below it is rewritten with the same data as in the upper-level cache. A write-through coherency protocol is thereby realized for the upper-level cache.
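As a minimal sketch of the write-through behavior described for claim 8, the following Python model uses two dictionaries to stand in for the upper-level and lower-level caches. The class and method names are hypothetical and are not taken from the patent; capacity, replacement and timing are ignored.

```python
class WriteThroughHierarchy:
    """Toy two-level cache; dict keys are addresses, values are data."""

    def __init__(self):
        self.upper = {}  # stands in for the upper-level (L1) cache
        self.lower = {}  # stands in for the lower-level (L2) cache

    def store(self, address, data):
        # Write-through: whenever the upper level is rewritten, the same
        # address in the lower level is rewritten with the same data.
        self.upper[address] = data
        self.lower[address] = data

    def is_consistent(self):
        # Every address held in the upper level must match the lower level.
        return all(self.lower.get(a) == d for a, d in self.upper.items())


caches = WriteThroughHierarchy()
caches.store(0x100, 7)
caches.store(0x100, 9)  # rewrite the same address
```

Because every store updates both levels, a later read that misses in the upper level can be served from the lower level without a separate write-back step.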

【００３８】[0038]

[Operation] According to the invention of claim 1, caches at multiple levels are accessed simultaneously when a plurality of instructions are executed in parallel. Unlike the conventional configuration, in which a lower-level cache is accessed only when an access to an upper-level cache misses, waiting for cache access can be eliminated and the instruction execution speed improves.

[0039] According to the invention of claim 2, when a plurality of instructions each requiring memory access are executed in parallel, accesses to the multiple cache levels, each based on a different address, are started simultaneously, so high-speed parallel execution of a plurality of load instructions can be realized.

[0040] According to the invention of claim 3, even if an access miss occurs in an upper-level cache during the parallel execution of a plurality of instructions each requiring memory access, the access to the lower-level cache for the instruction that missed is started in parallel with the cache access for another instruction requiring memory access, so high-speed parallel execution of a plurality of load instructions can be guaranteed. In other words, whereas conventionally a cache miss during execution of the first load instruction imposed an extremely large penalty on the subsequent load instruction, that penalty can be greatly reduced.

[0041] According to the invention of claim 4, when an instruction requiring memory access is executed in parallel with another instruction that does not require memory access, the access to the lower-level cache for the memory-access instruction is started in advance, at the same time as the access to the upper-level cache. Even if an access miss occurs in the upper-level cache, the access to the lower-level cache can therefore be completed early by treating the lower-level access that was already started as valid. In other words, the miss penalty is smaller than when the access to the lower-level cache is started only after the miss in the upper-level cache has been confirmed, and high-speed parallel execution of a load instruction and a non-load instruction can be realized even when a cache access miss occurs.

[0042] According to the invention of claim 5, the address translation device translates a plurality of logical addresses in parallel into physical addresses for the respective cache levels, so caches at multiple levels that must be accessed with physical addresses can be accessed simultaneously.

[0043] According to the specific configuration of the information processing apparatus of the invention of claim 6, the two cache levels are accessed simultaneously when two instructions are executed in parallel, so waiting for cache access can be eliminated and the execution speed improves. The penalty for an access miss is also reduced.

[0044] According to the invention of claim 7, an upper-level cache, which has a smaller storage capacity than a lower-level cache, completes an access in fewer cycles, so caches at multiple levels with different access speeds can be coordinated.

[0045] According to the invention of claim 8, when the data at an address in an upper-level cache is rewritten, the data at the same address in the lower-level cache is rewritten with the same data as in the upper-level cache, so data consistency among the multiple cache levels can be guaranteed.

【００４６】[0046]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings.

[0047] FIG. 1 is a block diagram of an information processing apparatus according to an embodiment of the present invention, which has a two-level cache between a central processing unit (CPU) and a main memory (not shown). In the figure, 1 is a CPU capable of executing two instructions in parallel, 4 is a first-level cache (L1-CACHE), and 5 is a second-level cache (L2-CACHE). L1-CACHE 4 has the faster access speed, and L2-CACHE 5 has the larger storage capacity. Further, 2 is a first address translation device (MMUa) for passing a physical address to L1-CACHE 4, 3 is a second address translation device (MMUb) for passing a physical address to L2-CACHE 5, 6, 8 and 12 are address buses, and 7, 9 and 11 are data buses. 10 is a bus selector for selecting either the address bus 12 between L1-CACHE 4 and L2-CACHE 5, which carries the same address as the one on the first address bus 6 used to access L1-CACHE 4, or the second address bus 8, through which CPU 1 accesses L2-CACHE 5 directly without going through L1-CACHE 4. Further, 13 is a miss-hit signal line for notifying CPU 1 when an access to L1-CACHE 4 misses, and 14 is a bus selection signal line from CPU 1 to the bus selector 10.

[0048] An access to L1-CACHE 4 is a two-cycle pipelined operation in which the tag read takes one cycle and the tag comparison together with the data read takes one cycle. An access to L2-CACHE 5 is a three-cycle pipelined operation in which the tag read, the tag comparison and the data read each take one cycle. That is, because L2-CACHE 5 is slower and larger than L1-CACHE 4, an access to L1-CACHE 4 takes two cycles (the E and M stages), whereas an access to L2-CACHE 5 takes three cycles (the E1, E2 and M stages), as in the case of FIG. 7. L1-CACHE 4 adopts a write-through coherency protocol: when data in L1-CACHE 4 is updated by a store instruction or the like, the data at the same address in L2-CACHE 5 is updated at the same time.
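The two access pipelines can be written down as stage lists. The sketch below is only an illustration: the stage names come from the text, while the dictionary layout and the function name `load_timeline` are assumptions, and the D and S stages surrounding the cache access are taken from the stage definitions used in the timing diagrams.

```python
# Stage names follow the text; the data layout is an assumption.
ACCESS_STAGES = {
    "L1": ["E", "M"],         # tag read | tag compare + data read
    "L2": ["E1", "E2", "M"],  # tag read | tag compare | data read
}


def load_timeline(decode_cycle, cache):
    """Map cycle number -> stage for a load that hits in `cache`."""
    stages = ["D"] + ACCESS_STAGES[cache] + ["S"]
    return {decode_cycle + i: stage for i, stage in enumerate(stages)}


l1_load = load_timeline(1, "L1")  # {1: 'D', 2: 'E', 3: 'M', 4: 'S'}
l2_load = load_timeline(1, "L2")  # {1: 'D', 2: 'E1', 3: 'E2', 4: 'M', 5: 'S'}
```

A load served by L2-CACHE therefore finishes one cycle later than a load served by L1-CACHE when both are decoded in the same cycle.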

[0049] In the information processing apparatus of FIG. 1, when CPU 1 executes in parallel two load instructions that require memory access, CPU 1 sends a different logical address onto each of the two mutually independent address buses 6 and 8. A part of the logical address sent from CPU 1 over the first address bus 6 is translated into a physical address by MMUa 2 and, combined with the remaining part, is used to access L1-CACHE 4. If that access hits, the data is taken into CPU 1 from L1-CACHE 4 over the first data bus 7. At the same time, a part of the other logical address, sent from CPU 1 over the second address bus 8, is translated into a physical address by MMUb 3 and, combined with the remaining part, is used to access L2-CACHE 5 via the bus selector 10. If that access hits, the data is taken into CPU 1 from L2-CACHE 5 over the second data bus 9. If the access to L1-CACHE 4 misses, the selector 10 is switched to the address bus 12 between the L1 and L2 caches, the same address that was used to access L1-CACHE 4 is passed to L2-CACHE 5, and the data is taken into CPU 1 from L2-CACHE 5 over the data buses 11 and 7. If L2-CACHE 5 also misses, the data is fetched from the main memory, but the configuration for that is not shown.
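The routing just described can be sketched as follows. This is a simplified model under stated assumptions: the function names are hypothetical, addresses are treated as already translated (MMUa and MMUb are not modeled), the caches are plain dictionaries, and an L2 miss simply returns `None` because the main-memory path is not shown in the source.

```python
def l2_address(l1_missed, bus12_address, bus8_address):
    """Model of bus selector 10: choose which address reaches L2-CACHE."""
    # On an L1 miss, forward the address that was used for L1 (bus 12);
    # otherwise let the CPU address L2 directly over the second bus (bus 8).
    return bus12_address if l1_missed else bus8_address


def access(l1, l2, addr_a, addr_b):
    """Two parallel loads: addr_a goes to L1, addr_b goes to L2."""
    data_b = l2.get(l2_address(False, addr_a, addr_b))
    if addr_a in l1:
        data_a = l1[addr_a]  # hit: data returns over data bus 7
    else:
        # Miss: the same address is retried in L2 via address bus 12.
        data_a = l2.get(l2_address(True, addr_a, addr_b))
    return data_a, data_b


l1 = {0x10: "a"}
l2 = {0x10: "a", 0x20: "b", 0x30: "c"}
both_hit = access(l1, l2, 0x10, 0x20)  # ('a', 'b')
l1_miss = access(l1, l2, 0x30, 0x20)   # ('c', 'b')
```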

[0050] FIG. 2 is a block diagram showing the internal configuration of CPU 1 in FIG. 1. In FIG. 2, 21 is an instruction buffer (IB) for holding a plurality of instructions to be executed sequentially, 22 is a first instruction decoder (DECa) for decoding an instruction fetched from IB 21, 23 is a second instruction decoder (DECb) for decoding the next instruction fetched from IB 21, 24 is a first arithmetic unit (ALUa) for executing the instruction decoded by DECa 22, 25 is a second arithmetic unit (ALUb) for executing the instruction decoded by DECb 23, 26 is a register file (RF) having, for example, 32 registers r0 to r31, 27 and 28 are address selectors, 29 is a data selector, and 30 is a control circuit (CTRL) for switching the two address selectors 27 and 28, the data selector 29 and the bus selector 10. Further, 31 and 32 are fetch signal lines, 33 and 34 are decode signal lines, 35 and 36 are signal lines for transmitting addresses or data, 37 is an address selection signal line from CTRL 30 to the two address selectors 27 and 28, 38 is a data selection signal line from CTRL 30 to the data selector 29, 39 is a signal line for transmitting data from the data selector 29 to RF 26, and 40 and 41 are signal lines for transmitting data from RF 26 to ALUa 24 and ALUb 25. The first address bus 6 is connected to the output side of the first address selector 27, the second address bus 8 to the output side of the second address selector 28, the first and second data buses 7 and 9 to the input side of the data selector 29, and the miss-hit signal line 13 and the bus selection signal line 14 to CTRL 30.

[0051] Next, the operation of the information processing apparatus of this embodiment, configured as described above, is explained with reference to FIGS. 3 to 6.

[0052] FIG. 3 is a timing diagram for executing a program consisting of four consecutive load (ld) instructions S1 to S4, the same as in FIGS. 11 and 12, when neither L1-CACHE 4 nor L2-CACHE 5 has an access miss for any of the load (ld) instructions. In the figure, D denotes the decode stage for fetching and decoding the instruction code and fetching operand data, E, E1 and E2 denote execution stages, M denotes the memory stage for accessing a cache, and S denotes the store stage for storing data in a register.

[0053] First, the parallel execution of the first load (ld) instruction S1 and the second load (ld) instruction S2 by CPU 1 is described.

[0054] Cycle 1 in FIG. 3 is the D stage for both instructions S1 and S2. That is, the first load (ld) instruction S1 is fetched from IB 21 into DECa 22 and decoded while, at the same time, the second load (ld) instruction S2 is fetched from IB 21 into DECb 23 and decoded. For each of the load (ld) instructions S1 and S2, the operand data for calculating the memory address is also fetched from IB 21.

[0055] Cycle 2 is the E stage for the first load (ld) instruction S1 and the E1 stage for the second load (ld) instruction S2. For the first load (ld) instruction S1, the operand data is passed from DECa 22 to ALUa 24 and the address calculation is executed; for the second load (ld) instruction S2, the operand data is passed from DECb 23 to ALUb 25 and the address calculation is executed. Meanwhile, the decoding results of DECa 22 and DECb 23 are each also sent to CTRL 30. The first address selector 27 is switched so as to connect the signal line 35 on the ALUa 24 side to the first address bus 6, and the second address selector 28 is switched so as to connect the signal line 36 on the ALUb 25 side to the second address bus 8. The bus selector 10 is switched to the second address bus 8 side. As a result, the address calculated by ALUa 24 is sent onto the first address bus 6 via the first address selector 27 and, at the same time, the address calculated by ALUb 25 is sent onto the second address bus 8 via the second address selector 28, so that L2-CACHE 5 can be accessed directly from CPU 1. MMUa 2 translates the logical address on the first address bus 6 into a physical address, while MMUb 3 translates the logical address on the second address bus 8 into a physical address. In addition, the tag used for the lookup is read from each of L1-CACHE 4 and L2-CACHE 5.

[0056] Cycle 3 is the M stage for the first load (ld) instruction S1 and the E2 stage for the second load (ld) instruction S2. For the first load (ld) instruction S1, the tag read from L1-CACHE 4 is compared with the tag of the physical address obtained in MMUa 2, and the data to be loaded is then read from L1-CACHE 4 onto the first data bus 7. For the second load (ld) instruction S2, only the comparison between the tag read from L2-CACHE 5 and the tag of the physical address obtained in MMUb 3 is executed. Because L2-CACHE 5 is slower and larger than L1-CACHE 4, L1-CACHE 4 can complete the tag comparison and the data read in a single cycle, whereas L2-CACHE 5 needs one cycle for the tag comparison alone.

[0057] Cycle 4 is the S stage for the first load (ld) instruction S1 and the M stage for the second load (ld) instruction S2. FIG. 3 shows the case in which L1-CACHE 4 hits for the first load (ld) instruction S1: the instruction moves directly from the M stage to the S stage, and the data on the first data bus 7 corresponding to the first load (ld) instruction S1 is stored in RF 26 via the data selector 29. For the second load (ld) instruction S2, the data is read from L2-CACHE 5 onto the second data bus 9.

[0058] In cycle 5, the data selector 29 is switched from the first data bus 7 side to the second data bus 9 side, and the data on the second data bus 9 corresponding to the second load (ld) instruction S2 is stored in RF 26.

[0059] The third and fourth load (ld) instructions S3 and S4 are executed in the same way as the first and second load (ld) instructions S1 and S2, from cycle 2 to cycle 6. The third load (ld) instruction S3 accesses L1-CACHE 4, and the fourth load (ld) instruction S4 accesses L2-CACHE 5. In cycle 3, the tag comparison in the E2 stage of the second load (ld) instruction S2 and the tag read in the E1 stage of the fourth load (ld) instruction S4 overlap in L2-CACHE 5. This causes no problem, because an access to L2-CACHE 5 is a three-cycle pipelined operation consisting of the tag read, the tag comparison and the data read.

[0060] As described above, a program consisting of four consecutive load (ld) instructions S1 to S4 conventionally required seven cycles, as shown in FIG. 11, even when no cache had an access miss. In this embodiment, which adopts a configuration that accesses L1-CACHE 4 and L2-CACHE 5 in parallel for the load instructions, the program completes in six cycles, as shown in FIG. 3.
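The six-cycle figure can be reproduced from the stage sequences given in the text. The following Python sketch is an illustration only: the table layout and function names are assumptions, every access is assumed to hit, and the conflict check covers only the cache pipeline stages (the shared D and S stages are not modeled as resources).

```python
from collections import Counter

STAGES = {"L1": ["D", "E", "M", "S"], "L2": ["D", "E1", "E2", "M", "S"]}

# (name, cycle of the D stage, cache accessed), as read from the text.
PROGRAM = [("S1", 1, "L1"), ("S2", 1, "L2"), ("S3", 2, "L1"), ("S4", 2, "L2")]


def schedule(program):
    """Return {name: {cycle: stage}} assuming every access hits."""
    return {
        name: {start + i: st for i, st in enumerate(STAGES[cache])}
        for name, start, cache in program
    }


def total_cycles(timeline):
    """Last cycle in which any instruction is still in a stage."""
    return max(max(cycles) for cycles in timeline.values())


def cache_stage_conflicts(program, timeline):
    """List (cycle, cache, stage) slots claimed by more than one load."""
    cache_of = {name: cache for name, _, cache in program}
    use = Counter(
        (cycle, cache_of[name], stage)
        for name, cycles in timeline.items()
        for cycle, stage in cycles.items()
        if stage not in ("D", "S")
    )
    return [slot for slot, n in use.items() if n > 1]


timeline = schedule(PROGRAM)
cycles_needed = total_cycles(timeline)               # 6
conflicts = cache_stage_conflicts(PROGRAM, timeline)  # []
```

In cycle 3, S2 occupies the E2 stage of L2-CACHE while S4 occupies its E1 stage; because the two loads are in different pipeline stages, the conflict list stays empty, which matches the overlap the text describes as harmless.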

[0061] FIG. 4 is a timing diagram for executing the same program of four consecutive load (ld) instructions S1 to S4 as in FIG. 3, when only the access to L1-CACHE 4 for the first load (ld) instruction S1 misses.

[0062] If the access to L1-CACHE 4 misses in the M stage of the first load (ld) instruction S1 in cycle 3, CTRL 30 in CPU 1 receives, over the miss-hit signal line 13, a signal indicating that the access to L1-CACHE 4 has missed. In response, CTRL 30 switches the bus selector 10, which had been set to the address bus 8 side, to the address bus 12 between L1-CACHE 4 and L2-CACHE 5 for the first load (ld) instruction S1. As a result, for the first load (ld) instruction S1, cycle 4 becomes the E1 stage for reading the tag from L2-CACHE 5, cycle 5 becomes the E2 stage for comparing the tag read from L2-CACHE 5 with the tag of the physical address obtained in MMUa 2 in cycle 4, cycle 6 becomes the M stage for reading the data from L2-CACHE 5 onto the data buses 11 and 7, and cycle 7 becomes the S stage for storing the data in RF 26.

[0063] Meanwhile, for the second load (ld) instruction S2, the data is read from L2-CACHE 5 onto the second data bus 9 in cycle 4 (M stage), and the store of the data into RF 26 (S stage) completes in cycle 5.

[0064] As in FIG. 3, the third and fourth load (ld) instructions S3 and S4 are executed in parallel with the first and second load (ld) instructions S1 and S2, from cycle 2 to cycle 6. The third load (ld) instruction S3 accesses L1-CACHE 4, and the fourth load (ld) instruction S4 attempts to access L2-CACHE 5. However, CTRL 30, having received in cycle 3 the signal on the miss-hit signal line 13 indicating that the access to L1-CACHE 4 has missed, immediately invalidates the E2 stage of the fourth load (ld) instruction S4 and treats cycle 3 as a "hold" for that instruction. Then, in cycle 4, it switches the two address selectors 27 and 28 so that the address calculated by ALUb 25 is output over the first address bus 6. As a result, the third and fourth load (ld) instructions S3 and S4 both access L1-CACHE 4. That is, for the third load (ld) instruction S3, cycle 3 is the E stage for reading the tag from L1-CACHE 4 based on the address calculated by ALUa 24, cycle 4 is the M stage for the tag comparison and the data read from L1-CACHE 4, and cycle 5 is the S stage for storing the data. For the fourth load (ld) instruction S4, cycle 4 is the E stage for reading the tag from L1-CACHE 4 based on the address calculated by ALUb 25, cycle 5 is the M stage for the tag comparison and the data read from L1-CACHE 4, and cycle 6 is the S stage for storing the data.

[0065] As described above, the execution of four consecutive load (ld) instructions S1 to S4 conventionally required ten cycles when there was a cache access miss, as shown in FIG. 12, whereas in this embodiment it completes in seven cycles, as shown in FIG. 4.
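The seven-cycle result for the FIG. 4 case can be checked with a small model of the stage sequences described above. This is a sketch under stated assumptions: the function names and the `hold_cycles` parameter are hypothetical, and the per-instruction inputs (S1 misses in L1-CACHE, S4 is held for one cycle and redirected to L1-CACHE) are transcribed from the text rather than derived by the model.

```python
def stages_for_load(first_cache, l1_hit=True, hold_cycles=0):
    """Stage list for one load, starting at its D stage."""
    if first_cache == "L2":
        return ["D", "E1", "E2", "M", "S"]
    access = ["hold"] * hold_cycles + ["E", "M"]
    if not l1_hit:
        access += ["E1", "E2", "M"]  # retry in L2 over address bus 12
    return ["D"] + access + ["S"]


def last_cycle(decode_cycle, stages):
    """Cycle in which the final (S) stage of the load executes."""
    return decode_cycle + len(stages) - 1


# FIG. 4 case as described in the text: S1 misses in L1; S4 is held for
# one cycle and redirected from L2 to L1.
fig4 = {
    "S1": last_cycle(1, stages_for_load("L1", l1_hit=False)),
    "S2": last_cycle(1, stages_for_load("L2")),
    "S3": last_cycle(2, stages_for_load("L1")),
    "S4": last_cycle(2, stages_for_load("L1", hold_cycles=1)),
}
program_cycles = max(fig4.values())  # 7
```

The program length is set by S1, whose L2-CACHE retry adds three cycles after the L1-CACHE miss, while S2 to S4 all finish by cycle 6.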

[0066] Next, the case of executing a program consisting of four consecutive instructions, namely one load instruction S1 that requires memory access and three arithmetic instructions S2 to S4 that do not, is described.

[0067] FIG. 5 is a timing diagram for executing the same four-instruction program as in FIGS. 9 and 10, namely a load (ld) instruction S1, an addition (add) instruction S2, a subtraction (sub) instruction S3 and an addition (add) instruction S4, when L1-CACHE 4 hits for the load instruction S1.

[0068] In FIG. 5, cycle 1 is the D stage for both instructions S1 and S2. That is, the load (ld) instruction S1 is fetched from IB 21 into DECa 22 and decoded while, at the same time, the addition (add) instruction S2 is fetched from IB 21 into DECb 23 and decoded. For the load (ld) instruction S1, the operand data for calculating the memory address to be loaded is also fetched from IB 21.

[0069] Cycle 2 is the E stage for both instructions S1 and S2. For the load (ld) instruction S1, however, it is also the E1 stage. When the decoding results of DECa 22 and DECb 23 are sent to CTRL 30 and one instruction requires memory access while the other does not, CTRL 30 switches the two address selectors 27 and 28 so that the same logical address is sent onto both the first and second address buses 6 and 8. In this example, the instruction decoded by DECa 22, one of the two instruction decoders DECa 22 and DECb 23, is the one requiring memory access. The first address selector 27 is therefore switched so as to connect the signal line 35 on the ALUa 24 side to the first address bus 6, and the second address selector 28 is switched so as to connect the same signal line 35 on the ALUa 24 side to the second address bus 8 as well. The bus selector 10 is switched to the second address bus 8 side. As a result, the address calculated in ALUa 24 for the load (ld) instruction S1 is used to access both L1-CACHE 4 and L2-CACHE 5 simultaneously. In other words, cycle 2 is both the E stage and the E1 stage for the load (ld) instruction S1: MMUa 2 translates the logical address on the first address bus 6 into a physical address, while MMUb 3 translates the same logical address on the second address bus 8 into a physical address. In addition, the tag used for the lookup is read from each of L1-CACHE 4 and L2-CACHE 5. In parallel with this E and E1 stage processing for the load (ld) instruction S1, ALUb 25 receives the decoding result of the addition (add) instruction S2 from DECb 23, accesses RF 26 and executes the addition. The data selector 29 in CPU 1 is switched to the first data bus 7 side.
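The address-selector decision made by CTRL 30 can be summarized in a few lines. The sketch below is hypothetical: the function name and argument names are not from the patent, and the behavior when neither instruction needs memory access (both buses left idle, shown as `None`) is an assumption, since the text does not describe that case.

```python
def route_addresses(a_needs_memory, b_needs_memory, alu_a_out, alu_b_out):
    """Return the contents of (first address bus 6, second address bus 8)."""
    if a_needs_memory and b_needs_memory:
        # Two memory-access instructions: one address per cache level.
        return alu_a_out, alu_b_out
    if a_needs_memory:
        # Only the ALUa instruction needs memory: same address to L1 and L2.
        return alu_a_out, alu_a_out
    if b_needs_memory:
        # Only the ALUb instruction needs memory: same address to L1 and L2.
        return alu_b_out, alu_b_out
    return None, None  # assumed: no cache access is started


two_loads = route_addresses(True, True, 0x10, 0x20)      # (0x10, 0x20)
load_and_add = route_addresses(True, False, 0x10, 0x99)  # (0x10, 0x10)
```

With a single load, both caches receive the same address in the same cycle, which is what lets the L2-CACHE lookup begin before the L1-CACHE result is known.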

[0070] Cycle 3 is both the M stage and the E2 stage for the load (ld) instruction S1. That is, the M-stage operation, which consists of comparing the tag read from L1-CACHE 4 with the tag of the physical address obtained in MMUa 2 and reading data from L1-CACHE 4 onto the first data bus 7, proceeds in parallel with the E2-stage operation, which consists only of comparing the tag read from L2-CACHE 5 with the tag of the physical address obtained in MMUb 3. FIG. 5 shows the case where L1-CACHE 4 hits for the load (ld) instruction S1; in this case the result obtained in the E2 stage is invalidated and the load proceeds directly from the M stage to the next S stage. The add instruction S2, on the other hand, waits one cycle after its E stage (cycle 2), namely the cycle that serves as the M and E2 stages for the load (ld) instruction S1, and then proceeds to the next S stage (cycle 4) together with the load (ld) instruction S1.
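
The parallel tag comparison of cycle 3 can be sketched as follows. The specification does not give the cache organization, so the direct-mapped layout, the line and index widths, the dictionary-based cache model, and all function names here are assumptions made purely for illustration.

```python
LINE_BITS = 4    # assumed: 16-byte cache lines
INDEX_BITS = 3   # assumed: 8 direct-mapped entries per cache


def split_address(phys_addr):
    """Split a physical address into (tag, index) under the assumed layout."""
    index = (phys_addr >> LINE_BITS) & ((1 << INDEX_BITS) - 1)
    tag = phys_addr >> (LINE_BITS + INDEX_BITS)
    return tag, index


def tag_matches(cache, phys_addr):
    """Compare the stored tag at the indexed entry with the address tag."""
    tag, index = split_address(phys_addr)
    entry = cache.get(index)  # entry is (stored_tag, data) or None
    return entry is not None and entry[0] == tag


def cycle3(l1_cache, l2_cache, phys_addr_from_mmua, phys_addr_from_mmub):
    """M stage (L1 compare) and E2 stage (L2 compare) evaluated together."""
    l1_hit = tag_matches(l1_cache, phys_addr_from_mmua)
    l2_hit = tag_matches(l2_cache, phys_addr_from_mmub)
    if l1_hit:
        # L1 hit: the E2 result is discarded.
        return {"l1_hit": True, "e2_result_valid": False}
    # L1 miss: the E2 result computed in the same cycle is used.
    return {"l1_hit": False, "e2_result_valid": True, "l2_hit": l2_hit}


addr = 0x1234
tag, index = split_address(addr)
l2 = {index: (tag, "data from L2")}
print(cycle3({index: (tag, "data from L1")}, l2, addr, addr))
print(cycle3({}, l2, addr, addr))
```

The point of the sketch is that the L2 comparison is computed regardless of the L1 outcome, so no extra cycle is spent starting it after an L1 miss is detected.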

[0071] In cycle 4, the data placed on the first data bus 7 in response to the load (ld) instruction S1 is stored in the RF 26 through the data selector 29, and at the same time the result of the addition performed by the ALUb 25 in response to the add instruction S2 is also stored in the RF 26 (S stage).

[0072] The next two instructions, the subtract (sub) instruction S3 and the add instruction S4, are executed simultaneously using both ALUa 24 and ALUb 25 in cycles 2 to 5, in parallel with the execution of the load (ld) instruction S1 and the add instruction S2. Execution of the program consisting of the four instructions S1 to S4 therefore completes in five cycles, the same as in the case of FIG. 9.

[0073] FIG. 6 is a timing diagram for the case where, in executing a program consisting of the same four instructions as in FIG. 5, namely the load (ld) instruction S1, the add instruction S2, the subtract (sub) instruction S3, and the add instruction S4, the access to L1-CACHE 4 for the load (ld) instruction S1 misses in the M stage of cycle 3.

[0074] In this case, the access to L2-CACHE 5 made as the E2 stage in cycle 3 becomes valid, and cycle 4 becomes the M stage for the load (ld) instruction S1. That is, based on the result of the tag comparison for L2-CACHE 5 in cycle 3, the data to be loaded is read from L2-CACHE 5 onto the second data bus 9 in cycle 4. The data selector 29 in the CPU 1 has been switched to the second data bus 9 side by the CTRL 30, which received a signal reporting the access miss from L1-CACHE 4 through the mishit signal line 13. In cycle 5, the data placed on the second data bus 9 in response to the load (ld) instruction S1 is stored in the RF 26 through the data selector 29 (S stage).
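
The stage sequence of the load (ld) instruction S1 in the hit case (FIG. 5) and the miss case (FIG. 6) can be summarized in a short sketch. The function name and the stage labels are hypothetical; only cycles 2 onward are modelled, since those are the cycles described in this passage.

```python
def load_timeline(l1_hit):
    """Return ({cycle: stage}, data bus feeding RF 26) for the load S1."""
    timeline = {
        2: "E + E1",  # address calculation, translation, tag read in both caches
        3: "M + E2",  # L1 tag compare and data read; L2 tag compare only
    }
    if l1_hit:
        timeline[4] = "S"  # store the L1 data into RF 26
        data_bus = "first data bus 7"
    else:
        timeline[4] = "M (L2 data read)"  # E2 result valid, read L2 data
        timeline[5] = "S"                 # store the L2 data into RF 26
        data_bus = "second data bus 9"
    return timeline, data_bus


for hit in (True, False):
    stages, bus = load_timeline(hit)
    print("L1 hit" if hit else "L1 miss", stages, "via", bus)
```

In this model the miss delays the store of the loaded data by only one cycle, from cycle 4 to cycle 5.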

[0075] For the other three arithmetic instructions S2 to S4, parallel execution proceeds so that processing completes by cycle 5, as in the case of FIG. 5.

[0076] As described above, executing a program of four consecutive instructions, one load instruction S1 and three arithmetic instructions S2 to S4, conventionally required seven cycles when there was an L1-CACHE access miss, as shown in FIG. 10. In this embodiment, by adopting a configuration in which L1-CACHE 4 and L2-CACHE 5 are accessed in parallel for the single load (ld) instruction S1, execution completes in five cycles, as shown in FIG. 6. In other words, the penalty caused by an access miss in L1-CACHE 4 can be reduced.
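
The cycle counts quoted in the description can be tabulated to make the miss penalty explicit. The figures are taken from the text (FIGS. 5, 6, 9 and 10); the dictionary layout and the function name are illustrative assumptions.

```python
# Total cycles for the four-instruction program S1 to S4, as stated in the text.
CYCLES = {
    ("conventional", "hit"): 5,   # FIG. 9
    ("conventional", "miss"): 7,  # FIG. 10
    ("embodiment", "hit"): 5,     # FIG. 5
    ("embodiment", "miss"): 5,    # FIG. 6
}


def miss_penalty(design):
    """Extra cycles caused by an L1 miss, relative to the L1-hit case."""
    return CYCLES[(design, "miss")] - CYCLES[(design, "hit")]


print("conventional penalty:", miss_penalty("conventional"))
print("embodiment penalty:", miss_penalty("embodiment"))
```

For this particular program the extra cycle of the load is hidden behind the arithmetic instructions, which finish in cycle 5 in either case.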

[0077] The above description concerned an information processing apparatus provided with two address translation devices (MMUa 2 and MMUb 3), each of which translates a logical address into a physical address. Needless to say, an information processing apparatus with the same function can also be configured using a single address translation device that has a plurality of read ports and can translate a plurality of logical addresses into physical addresses simultaneously. Furthermore, if L1-CACHE 4 and L2-CACHE 5 can be accessed in parallel using the logical addresses on the first and second address buses 6 and 8 output from the CPU 1 as they are, MMUa 2 and MMUb 3 need not be provided.

【００７８】[0078]

[Effects of the Invention] As described above, according to the invention of claim 1, a configuration is adopted in which caches of a plurality of levels are accessed simultaneously when a plurality of instructions are executed in parallel. Unlike the conventional configuration, in which a lower-level cache is accessed only when the access to the upper-level cache misses, waiting for cache access can be eliminated and the instruction execution speed improves.

[0079] According to the invention of claim 2, a configuration is adopted in which, when a plurality of instructions each requiring memory access are executed in parallel, accesses to the caches of the plurality of levels, each based on a different address, are started simultaneously. Execution is thereby accelerated, for example, for a program having a plurality of consecutive load instructions.

[0080] According to the invention of claim 3, when a plurality of instructions each requiring memory access are executed in parallel and an access miss occurs in an upper-level cache, the access to a lower-level cache for the instruction that caused the miss is started in parallel with the cache access for another instruction requiring memory access. High-speed parallel execution of a plurality of load instructions can therefore be ensured even when there is a cache access miss.

[0081] According to the invention of claim 4, when an instruction requiring memory access and another instruction not requiring memory access are executed in parallel, the access to the lower-level cache for the instruction requiring memory access is started in advance, simultaneously with the access to the upper-level cache. The penalty incurred when an access miss occurs in the upper-level cache is thereby reduced, and high-speed parallel execution of a load instruction and a non-load instruction can be realized.

[0082] According to the invention of claim 5, an address translation device is provided that translates a plurality of logical addresses in parallel into physical addresses for the respective caches of the plurality of levels. Simultaneous access to caches of a plurality of levels that must be accessed with physical addresses can therefore be realized.

[0083] According to the invention of claim 6, in an information processing apparatus having a two-level cache, execution means for executing two instructions in parallel is formed by providing two instruction decoders and two arithmetic units between the instruction buffer and the register file, and access means for accessing the two levels of cache simultaneously is formed by providing, between the execution means and the two-level cache, two address buses that are switched as needed and two address translation devices. Waiting for cache access can therefore be eliminated, and both an improvement in instruction execution speed and a reduction in the access-miss penalty can be realized.

[0084] According to the invention of claim 7, the upper-level cache, which has a smaller storage capacity than the lower-level cache, completes an access in a shorter cycle. Caches of a plurality of levels with different access speeds can therefore be made to work in coordination.

[0085] According to the invention of claim 8, a write-through coherency protocol is adopted in which, when data at an address in an upper-level cache is rewritten, the data at the same address in a lower-level cache is rewritten with the same data as in the upper-level cache. Consistency of data among the caches of the plurality of levels can therefore be guaranteed.
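
The write-through behaviour described for claim 8 can be sketched as below. This is a minimal model under stated assumptions: each cache is represented as a plain dictionary from address to data, the function name is hypothetical, and main memory, line granularity, and allocation policy are deliberately left out.

```python
def write_through(levels, address, value):
    """Rewrite `address` in the upper-level cache and in every lower level.

    `levels[0]` is the upper-level cache; later entries are lower levels.
    """
    levels[0][address] = value
    for lower in levels[1:]:
        # Propagate the same data to the same address in each lower level.
        lower[address] = value


l1 = {0x100: "old"}
l2 = {0x100: "old", 0x200: "other"}
write_through([l1, l2], 0x100, "new")
print(l1[0x100], l2[0x100], l2[0x200])
```

After the call both levels hold the same data at the written address, while unrelated entries in the lower level are left untouched.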

[Brief description of drawings]

FIG. 1 is a block diagram of an information processing apparatus according to an embodiment of the present invention.

FIG. 2 is a block diagram showing the internal configuration of the CPU in FIG. 1.

FIG. 3 is a timing diagram showing the operation when a program consisting of four consecutive load instructions is executed in the information processing apparatus of FIG. 1, for the case where neither of the two levels of cache incurs an access miss for any of the load instructions.

FIG. 4 is a diagram similar to FIG. 3, for the case where only the access to the first-level cache for the first load instruction misses.

FIG. 5 is a timing diagram showing the operation when a program consisting of four consecutive instructions, one load instruction and three arithmetic instructions, is executed in the information processing apparatus of FIG. 1, for the case where the access to the first-level cache for the load instruction hits.

FIG. 6 is a diagram similar to FIG. 5, for the case where the access to the first-level cache for the load instruction misses.

FIG. 7 is a block diagram showing an example of a conventional information processing apparatus.

FIG. 8 is a block diagram showing the internal configuration of the CPU in FIG. 7.

FIG. 9 is a timing diagram, corresponding to FIG. 5, for the information processing apparatus of FIG. 7.

FIG. 10 is a timing diagram, corresponding to FIG. 6, for the information processing apparatus of FIG. 7.

FIG. 11 is a timing diagram showing the operation when a program consisting of four consecutive load instructions is executed in the information processing apparatus of FIG. 7, for the case where the access to the first-level cache hits for every load instruction.

FIG. 12 is a diagram similar to FIG. 11, for the case where the access to the first-level cache misses only for the first load instruction.

[Explanation of symbols]

1 Central processing unit (CPU)
2 First address translation device (MMUa)
3 Second address translation device (MMUb)
4 First-level cache (L1-CACHE)
5 Second-level cache (L2-CACHE)
6 First address bus
7 First data bus
8 Second address bus
9 Second data bus
10 Bus selector
11 Data bus between L1-CACHE and L2-CACHE
12 Address bus between L1-CACHE and L2-CACHE
13 Mishit signal line
14 Bus selection signal line
21 Instruction buffer (IB)
22 First instruction decoder (DECa)
23 Second instruction decoder (DECb)
24 First arithmetic unit (ALUa)
25 Second arithmetic unit (ALUb)
26 Register file (RF)
27 First address selector
28 Second address selector
29 Data selector
30 Control circuit (CTRL)

Claims

[Claims]

1. An information processing apparatus comprising: execution means for executing a plurality of instructions in parallel; caches of a plurality of levels, from the highest level closest to the execution means to the lowest level farthest from the execution means; and access means for simultaneously accessing the caches of the plurality of levels when the execution means executes a plurality of instructions in parallel.

2. The information processing apparatus according to claim 1, wherein the access means has a function of, when the execution means executes in parallel a plurality of instructions each requiring memory access, simultaneously starting accesses to the respective caches of the plurality of levels using mutually different addresses based on the respective instructions.

3. The information processing apparatus according to claim 2, wherein the access means has a function of, when an access to an upper-level cache for one of the plurality of instructions requiring memory access hits, validating that access to the upper-level cache, and, when the access to the upper-level cache misses, starting an access to a cache at a level lower than that upper-level cache in parallel with the execution of a cache access for another instruction requiring memory access.

4. The information processing apparatus according to claim 1, wherein the access means has a function of, when the execution means executes in parallel an instruction requiring memory access and another instruction not requiring memory access, simultaneously starting accesses to the caches of the plurality of levels using the same address based on the instruction requiring memory access, validating the access to the upper-level cache when the access to the upper-level cache for that instruction hits, and validating the access to the lower-level cache, started simultaneously with the access to the upper-level cache, when the access to the upper-level cache misses.

5. The information processing apparatus according to claim 1, wherein the access means comprises an address translation device for translating a plurality of logical addresses for simultaneous access to the caches of the plurality of levels into physical addresses suited to the respective caches of the plurality of levels.

6. The information processing apparatus according to claim 1, wherein: the caches of the plurality of levels consist of two levels of cache, an upper first-level cache and a lower second-level cache; an inter-cache address bus for transferring an address for accessing the first-level cache to the second-level cache is provided between the two levels of cache; the execution means comprises an instruction buffer for holding a plurality of instructions, two instruction decoders for decoding two instructions fetched simultaneously from the instruction buffer, two arithmetic units for respectively executing the instructions decoded by the two instruction decoders, and a register file for storing the results of instruction execution by the two arithmetic units, each of the two arithmetic units having a function of calculating and outputting a logical address for memory access for an instruction requiring memory access; and the access means comprises first and second address selectors for directing the logical addresses output from the two arithmetic units to the respective caches of the two levels, a first address bus connected to the output side of the first address selector, a second address bus connected to the output side of the second address selector, a first address translation device for translating a logical address on the first address bus into a physical address for accessing the first-level cache, a second address translation device for translating a logical address on the second address bus into a physical address for accessing the second-level cache, a bus selector for selecting either the inter-cache address bus or the second address bus and connecting the selected address bus to the second-level cache, and a control circuit for controlling switching of each of the first and second address selectors and the bus selector.

7. The information processing apparatus according to claim 1, wherein, among the caches of the plurality of levels, an upper-level cache has a smaller storage capacity than a cache at a level lower than that upper-level cache and completes an access in a shorter cycle.

8. The information processing apparatus according to claim 1, wherein the caches of the plurality of levels have a function of, when data at an address in an upper-level cache is rewritten, rewriting the data at the same address in each cache at a level lower than that upper-level cache with the same data as in the upper-level cache.