JPH07200404A

JPH07200404A - Cache memory using DRAM

Info

Publication number: JPH07200404A
Application number: JP5303685A
Authority: JP
Inventors: Shigenori Shimizu; 茂則清水
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1993-12-03
Filing date: 1993-12-03
Publication date: 1995-08-04

Abstract

PURPOSE: To build, using DRAM, a secondary cache that can operate with zero wait states relative to the processor, in a system with small area and low power consumption. CONSTITUTION: The data memory of the secondary cache is built from DRAM and integrated on a single chip together with the control logic and the tag memory. In addition, the four words that are accessed consecutively are interleaved and stored in different rows of the data memory. As a result, DRAM access appears to operate at very high speed.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to speeding up a secondary cache memory connected to a high-performance microprocessor, and in particular to means for realizing such a cache using DRAM.

[0002]

[Prior Art] Access from a high-performance microprocessor to main memory is generally much slower than the processor's operating (clock) speed. For example, the access time of DRAM main memory as seen from the main clock is about 120 to 150 ns, whereas the processor clock is often 66 MHz (a 15 ns period) or faster. Leaving this speed gap unaddressed lowers processor utilization and clearly degrades system performance. It has therefore long been common practice to place a faster cache memory between the processor and main memory and to serve frequently used data from the cache, shortening data access time so that the processor can perform to its full potential. Realizing a fast cache memory is thus a key factor in achieving the processor's full performance.

[0003] For this reason, it has become common in recent years to implement the cache memory on the same chip as the processor to increase speed. However, chip area limits how much cache can be placed on the same chip: a large cache is difficult to implement, and about 4 to 32 kB is currently feasible. Implementation on the same chip is called "on-chip", and a cache memory implemented and connected directly to the processor in this way is called a primary cache.

[0004] To overcome the capacity limit of the on-chip cache, caches are increasingly organized in roughly two levels according to how often data is accessed: the most frequently accessed data is stored in the primary cache (L1), and the next most frequently accessed data is stored in the secondary cache (L2). As shown in FIG. 1, a system is typically configured with a primary cache of about 4 to 32 kB on the same chip as the microprocessor (CPU) and an external secondary cache of 128 to 512 kB. FIG. 1 shows an example of a two-way set-associative cache with a cache line length of 128 bits, assuming 18 bits of tag information per cache line.

[0005] To bring out the full performance of the processor, the secondary cache must also use fast memory whose access time is comparable to the processor's cycle time. For a processor operating at 50 MHz, the secondary cache is generally built from high-speed SRAM with an access time of about 10 to 20 ns.

[0006] A cache memory consists of two parts: a cache tag memory and a cache data memory. The contents read from the tag memory are processed by a logic circuit to determine a cache hit or miss. The tag memory therefore needs to be even faster than the cache data memory, but because the information used for this determination consists only of the address bits and a few status bits for each cache line, a relatively small memory suffices. The data memory, by contrast, actually stores the data and requires a large capacity. Because of these differing requirements, either the tag memory is integrated into the cache controller and the data memory is implemented with several separate high-speed SRAM chips, or the tag memory is also implemented with separate, even faster SRAM chips.

[0007] In that case, a 256 kB cache, for example, uses eight 32 kB high-speed SRAM chips for the cache data memory.

[0008] Using multiple SRAM chips in this way raises several problems. The first is mounting area. This approach requires one cache controller, which needs as many I/O pins as the CPU, and four to eight high-speed SRAMs. Together these often occupy several times the mounting area of the CPU itself.

[0009] The second problem is power consumption. Many high-speed SRAMs are active at the same time and consume a large amount of power, and the resulting heat is also a problem.

[0010] The third problem is the limit on the number of ways. A common technique for raising the cache hit rate without increasing capacity is to assign several ways to the same address (a set-associative cache); the more ways, the higher the hit rate. However, to realize a 4-way set-associative cache, for example, four times as many data bits as the processor's data width must be accessible simultaneously. Achieving this with SRAM means either adopting SRAM chips with a wide bit width or using many small-capacity chips, and both options are costly.

[0011] In addition, wiring delay and signal skew in the cache area must be accounted for with a corresponding timing margin. Because these effects effectively slow SRAM operation, compensating for them calls for SRAM with an access time several nanoseconds faster than the original 15 ns target. The design must also minimize wiring delay and signal skew, which is very difficult.

[0012] Finally, the cost problem is unavoidable: using multiple high-speed SRAM chips makes the cache very expensive.

[0013] Next, the conditions required of a zero-wait secondary cache are described in terms of its operation. Here, "zero wait" means that consecutive accesses are possible without consuming any idle wait clock cycles. Because the processor's capability is fully used when the cache operates this way, this is the goal of the present invention. FIG. 2 shows the operation for a 486DX2/66 MHz. When a CPU read access misses in the L1 cache, data is read from outside into the on-chip primary cache, whose line length is 128 bits (16 bytes), in a burst of four 32-bit transfers. For the secondary cache to supply this data to the L1 cache with zero wait states, the first word transfer must complete in two bus clocks and each of the following three word transfers in one bus clock. That is, the former (the lead-off cycle) must complete in 60 ns and each of the latter (the burst cycles) in 30 ns.

[0014] Note that these values are not themselves the required access times. The lead-off cycle must also cover the signal delay at the start of the CPU address cycle and the data setup time at the CPU, and each burst cycle must cover the data setup time at the CPU. The access times that must actually be achieved are therefore about those shown in Table 1.

[Table 1] (unit: ns)

                    Cycle time    Access time
Lead-off cycle          60            30
Burst cycle             30            15
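The timing requirements above can be restated as a small calculation. The following is only a sketch: the function and constant names are illustrative assumptions, the 33 MHz bus clock is rounded to a 30 ns period as the text does, and the required access times are taken directly from Table 1 rather than derived, since the source does not break down the address-delay and setup-time overheads.

```python
# Zero-wait burst timing for a 4-word line fill (illustrative sketch).
BUS_CLOCK_NS = 30        # 33 MHz bus clock, rounded to 30 ns as in the text
LEAD_OFF_CLOCKS = 2      # the first word must arrive within 2 bus clocks
BURST_CLOCKS = 1         # each following word within 1 bus clock
WORDS_PER_LINE = 4       # 128-bit line / 32-bit transfers

# Required access times from Table 1 (ns). The rest of each cycle is consumed
# by address-signal delay and CPU data setup time.
REQUIRED_ACCESS_NS = {"lead_off": 30, "burst": 15}


def zero_wait_cycle_times(bus_clock_ns=BUS_CLOCK_NS):
    """Return (lead_off_ns, burst_ns, line_fill_ns) for a zero-wait line fill."""
    lead_off = LEAD_OFF_CLOCKS * bus_clock_ns
    burst = BURST_CLOCKS * bus_clock_ns
    line_fill = lead_off + (WORDS_PER_LINE - 1) * burst
    return lead_off, burst, line_fill


def meets_zero_wait(first_word_ns, page_mode_ns):
    """True if a memory's access times fit within the Table 1 requirements."""
    return (first_word_ns <= REQUIRED_ACCESS_NS["lead_off"]
            and page_mode_ns <= REQUIRED_ACCESS_NS["burst"])
```

With these assumptions, `zero_wait_cycle_times()` gives 60 ns, 30 ns and a 150 ns line fill. `meets_zero_wait(60, 30)` is false for the general-purpose DRAM figures quoted in this description, and `meets_zero_wait(30, 15)` is true for the on-chip DRAM block figures.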

[0015] One constraint is thus that the required access time differs between cycles. If the secondary cache were built from SRAM to meet these requirements, the high-speed SRAM would have to be selected for the faster access time, 15 ns. Given the access time needed for the lead-off cycle, this means accepting an over-specified part.

[0016] DRAM can be considered as a way to avoid the waste caused by this difference in access time between the lead-off cycle and the burst cycles. In a DRAM chip, one row of data selected by the row address is buffered in the sense amplifiers, and subsequent words are accessed simply by changing the column address within that row. The first word access therefore takes relatively long, but subsequent accesses that only change the column address are very short. This style of access is called page access. Because it matches the ratio between the lead-off and burst access times in Table 1 very well, DRAM appears to have a significant advantage over SRAM.

[0017] Using DRAM chips, however, raises two problems. First, the absolute access time of a DRAM chip is too long to meet the requirements: the access time for the first word and the subsequent page access time are about 60 ns and 30 ns respectively, roughly twice the required values. Second, there is the data width problem. With general-purpose DRAM, many modules must be placed in parallel even to realize a cache with only about two ways. The resulting drawbacks in mounting area, power consumption and so on have already been described.

[0018] Thus, simply using DRAM chips with previously known techniques cannot yield a practical secondary cache with zero-wait access.

[0019]

[Problems to Be Solved by the Invention] As is clear from the above, the object of the present invention is to realize a secondary cache that allows the processor to deliver its full performance.

[0020] The method of realizing it must also be practical. That is, the mounting area required for the cache data memory should be greatly reduced, and its power consumption and heat generation should be small.

[0021]

[Means for Solving the Problems] In view of these problems, the present invention integrates DRAM on a logic chip. It exploits the small area and low power consumption of DRAM, together with the fact that the cycle time of a row-address access differs from that of a page-mode access, and it improves the absolute access time seen from the CPU, which was the problem when general-purpose DRAM chips were simply used. The result is a secondary cache capable of zero-wait operation.

[0022] Cache memory originally emerged to compensate for the slowness of main memory built from DRAM. Given this history, there had been no thought of building the cache memory itself from DRAM, the same (generally considered slow) technology used for main memory. The present invention takes note of the advantages of DRAM, such as small area and low cost, and applies it to cache memory; in this respect it reverses the thinking of the prior art and departs from it entirely.

[0023] Furthermore, in a set-associative cache, data is interleaved in units of the CPU data bit width among the cache lines selected as the same set. This improves the apparent access time and cycle time of the DRAM without increasing the number of banks in the cache data memory, and at the same time hides the DRAM precharge time.

[0024]

[Embodiment] FIG. 3 shows an embodiment of the present invention. In this embodiment, the cache tag memory consists of four SRAM blocks organized as 4k x 36 b, and the cache data memory consists of four DRAM blocks organized as 64k x 18 b. This configuration realizes a 512 KB, two-way set-associative secondary cache with a line length of 128 bits.
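The stated organization can be checked with simple arithmetic. The sketch below assumes that each stored 36-bit unit carries 32 data bits plus 4 parity bits (as described for the data memory layout) and that the tag memory bits are divided evenly among all cache lines; the function names are illustrative and not from the source.

```python
# Capacity check for the embodiment's memory organization (illustrative sketch).
KIB = 1024


def data_capacity_bytes(blocks=4, addresses=64 * KIB, bits_per_address=18,
                        data_bits=32, stored_bits=36):
    """Data bytes held by the DRAM blocks, excluding parity bits."""
    total_bits = blocks * addresses * bits_per_address
    data_only_bits = total_bits * data_bits // stored_bits
    return data_only_bits // 8


def cache_geometry(capacity_bytes, line_bytes=16, ways=2):
    """Return (number_of_lines, number_of_sets) for the given capacity."""
    lines = capacity_bytes // line_bytes
    return lines, lines // ways


def tag_bits_per_line(lines, blocks=4, addresses=4 * KIB, bits_per_address=36):
    """Tag-memory bits available per cache line, assuming an even split."""
    return blocks * addresses * bits_per_address // lines
```

Under these assumptions the four 64k x 18 b blocks hold 512 KB of data (plus parity), which corresponds to 32,768 lines in 16,384 sets, and the four 4k x 36 b tag blocks provide 18 bits per line.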

[0025] FIGS. 4-A and 4-B show the contents of the tag memory and the data memory, respectively. As shown in FIG. 4-A, the tag information for the two cache lines of the same set is placed at the same address in the tag memory SRAM. As shown in FIG. 4-B, the two cache lines of the same set are each placed at four consecutive addresses of the DRAM blocks in units of 36 bits (32 data bits plus 4 parity bits). These four consecutive addresses physically lie in the same DRAM row, so they can be accessed at high speed in page mode. Page mode is assumed because it matches the access time pattern of the lead-off and burst cycles shown in Table 1.
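One way to picture this layout is as an address decomposition. The sketch below is hypothetical: the source does not give the bit assignment, so the split into tag, set index and word offset, the mapping of a set to DRAM addresses `set_index * 4 .. set_index * 4 + 3`, and the figure of 64 columns per row (64k addresses spread over an assumed 1024 rows) are all assumptions chosen only to be consistent with a 512 KB, two-way cache with 16-byte lines.

```python
# Hypothetical address decomposition for a 512 KB, 2-way, 16-byte-line cache.
# The bit assignment is an assumption; the source only states that the four
# words of a line occupy four consecutive DRAM addresses within one row.
WORD_BYTES = 4
WORDS_PER_LINE = 4
NUM_SETS = 16 * 1024
COLUMNS_PER_ROW = 64  # assumed: 64k addresses / 1024 rows


def split_address(addr):
    """Split a byte address into (tag, set_index, word_in_line)."""
    word = (addr // WORD_BYTES) % WORDS_PER_LINE
    set_index = (addr // (WORD_BYTES * WORDS_PER_LINE)) % NUM_SETS
    tag = addr // (WORD_BYTES * WORDS_PER_LINE * NUM_SETS)
    return tag, set_index, word


def dram_addresses_for_set(set_index):
    """The four consecutive DRAM addresses holding the words of one set."""
    base = set_index * WORDS_PER_LINE
    return [base + w for w in range(WORDS_PER_LINE)]


def row_and_column(dram_addr):
    """Split a DRAM address into (row, column) under the assumed row size."""
    return divmod(dram_addr, COLUMNS_PER_ROW)
```

Because the assumed row size is a multiple of four, the four addresses of any set always share a row, so the three accesses after the first only change the column address and can use page mode.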

[0026] Consider the case where, with a secondary cache configured in this way, an access misses in the primary cache and hits in the secondary cache. The hit data is then line-filled into the primary cache by a burst transfer of 32 bits x 4. FIG. 5 shows the timing chart. In this example the processor bus clock is assumed to be 33 MHz and the secondary cache operating clock 66 MHz. As the figure shows, access to the tag memory and the cache data memory starts in the second half of the processor's T1 bus cycle. The cache data memory reads the relevant word position of both cache lines of the same set simultaneously; in this example, the first words DA1 and DB1 of cache lines A and B are read. Once it is determined which cache line hit, the remaining words of that line (DA2, DA3, DA4) are read from the DRAM blocks in page mode. In this example the access time of the DRAM blocks is 30 ns for the first word and 15 ns for subsequent words, which meets the requirements.

[0027] In this way, implementing the DRAM on the chip makes it possible to realize a fast secondary cache that meets the requirements. For example, integrating the DRAM on the same chip eliminates the chip-to-chip signal delay that would be incurred if the DRAM were on a separate chip. The address and control signals supplied to the DRAM serving as cache data memory come from the control circuit; in a separate-chip configuration, the delay of the control circuit's I/O driver (for example, about 5 ns) and the delay of the DRAM chip's I/O receiver (for example, about 2 ns) are added to these signals. When everything is integrated on the same chip these delays disappear, so fast access becomes possible.

[0028] This point is explained concretely with reference to FIG. 8 for the case of reading data from the secondary cache. Consider first a read from a conventional cache that uses DRAM chips, that is, one whose data memory is not on the same chip. The control logic first generates the address and command for the read. These are passed to the DRAM data cache through an I/O driver and an I/O receiver. The data is then found in the DRAM data cache and returns to the logic circuit, again through an I/O driver and an I/O receiver. Finally, the data is output to the CPU through another I/O driver. According to the present invention, when the cache is integrated on the same chip, these five passes through I/O drivers and receivers are reduced to the single final I/O driver. Given the delay times above, a delay of 10 to 20 ns is eliminated.
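The size of this saving can be tallied from the example delays quoted earlier (about 5 ns per I/O driver and about 2 ns per I/O receiver). The sketch below simply sums the buffer delays along each read path; the path lists and function names are illustrative, and real delays would depend on the process and package.

```python
# I/O buffer delay along the read path (illustrative sketch).
DRIVER_NS = 5    # example I/O driver delay quoted in the text
RECEIVER_NS = 2  # example I/O receiver delay quoted in the text

# Off-chip DRAM: address/command out (driver + receiver), data back
# (driver + receiver), then the final driver toward the CPU.
OFF_CHIP_PATH = ["driver", "receiver", "driver", "receiver", "driver"]
# On-chip DRAM: only the final driver toward the CPU remains.
ON_CHIP_PATH = ["driver"]


def path_delay_ns(path):
    """Sum the buffer delays of every stage on a path."""
    delays = {"driver": DRIVER_NS, "receiver": RECEIVER_NS}
    return sum(delays[stage] for stage in path)


def saved_delay_ns():
    """Delay removed by moving the DRAM onto the logic chip."""
    return path_delay_ns(OFF_CHIP_PATH) - path_delay_ns(ON_CHIP_PATH)
```

With the example values, the off-chip path accumulates 19 ns of buffer delay and the on-chip path 5 ns, a saving of 14 ns, which falls within the 10 to 20 ns range stated above.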

[0029] Another reason DRAM access performance improves on the same chip is that DRAM integrated on a logic chip is not subject to capacity and chip-area constraints as severe as those of general-purpose DRAM chips. For example, even when a 16 Mbit DRAM is ultimately built, it can be implemented in blocks of 1 Mb or less. This reduces the load on the word line drivers, sense amplifiers and so on, so a considerable improvement over the characteristics of the original general-purpose DRAM can be expected.

[0030] The bit width problem of general-purpose DRAM is also solved by integrating the DRAM on the logic chip. With on-chip integration, not all data bits need to be brought out through I/O cells; only the number of bits needed for data transfer with the processor has to go through I/O cells. As a result a fairly wide bit width can be used freely, which makes set-associative caches and the like easy to realize.

[0031] Furthermore, if the data memory is not on the same chip, many bits must be driven through I/O drivers, which is difficult because of power supply noise constraints. In addition, for general-purpose DRAM chips, package cost makes an unnecessarily wide data bit width impractical.

[0032] Next, a technique is described for making effective use of the processor when the DRAM access time is slower than required to realize the L2 cache. In this case a special data arrangement in the cache data memory is used, as shown in FIG. 6. Placing data alternately in this way is called interleaving. In this embodiment, the two cache lines of the same set are interleaved word by word, so that two adjacent words of the same cache line are placed in independent DRAM blocks that can be accessed simultaneously.

[0033] This arrangement is particularly effective when the DRAM access time is adequate for the lead-off cycle of the first word but the page-mode access time is too slow to perform the burst cycles for the second and subsequent words without wait states.

[0034] Specifically, in the lead-off cycle of the first word, the two words DA1 and DB1 are accessed from the DRAM. Assuming a cache hit in set A, the three words DA2, DA3 and DA4 must be accessed in the subsequent burst cycles, and DA3 can be accessed in parallel with DA2. The effect is the same as shortening the page-mode access time of the DRAM.
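The interleaved arrangement can be illustrated with a toy layout. FIG. 6 is not reproduced in this text, so the bank formula below is a hypothetical choice that merely satisfies the two properties stated above: the first words of lines A and B sit in different banks, and adjacent words of the same line sit in different banks. The names `bank_of`, `layout` and `can_overlap` are illustrative.

```python
# Hypothetical word-level interleave of two ways (A, B) across two DRAM banks.
# The bank formula is an assumption consistent with the described behaviour,
# not a reproduction of FIG. 6.
WAYS = ("A", "B")
WORDS_PER_LINE = 4


def bank_of(way, word):
    """Bank holding word `word` (0-based) of cache line `way`."""
    return (WAYS.index(way) + word) % len(WAYS)


def layout():
    """Map each bank to the words it stores, labelled like 'DA1'."""
    banks = {0: [], 1: []}
    for word in range(WORDS_PER_LINE):
        for way in WAYS:
            banks[bank_of(way, word)].append(f"D{way}{word + 1}")
    return banks


def can_overlap(way, word_a, word_b):
    """Two words of one line can be accessed in parallel if their banks differ."""
    return bank_of(way, word_a) != bank_of(way, word_b)
```

In this sketch bank 0 holds DA1, DB2, DA3, DB4 and bank 1 holds DB1, DA2, DB3, DA4. The lead-off access can therefore fetch DA1 and DB1 together, and after a hit on A the access to DA3 in bank 0 can start while DA2 is still being read from bank 1, using only the two banks that the two-way organization already needs.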

[0035] Interleaving itself is a long-known technique and is easily realized by providing as many memory banks as there are interleave ways. The particular merit of this scheme is that it interleaves among the cache lines belonging to the same set using the multiple memory banks already provided for the set-associative cache, so no new division into memory banks is needed.

[0036] This scheme also solves the problem of precharge time, which is inherent to DRAM. Precharge time is the idle period required after a DRAM access before the next access can be accepted. When the processor accesses the secondary cache continuously, the precharge time becomes idle time for the whole cache and prevents the processor from performing to its full potential. With this scheme the apparent access time becomes shorter, so words can be accessed ahead of time, and the precharge time can be secured while the previously accessed words are being returned to the processor.

[0037] A drawback of using DRAM rather than SRAM is that DRAM must be refreshed. A refresh is typically required for each memory row about every 16 ms and takes about 100 ns. Because refresh is performed row by row, a DRAM with 1024 rows performs 1024 refresh operations in sequence within each 16 ms. The secondary cache cannot serve the processor while refreshing, so the cumulative unavailable time is about 100 ns x 1024 = 0.1 ms, or about 0.6% of the 16 ms refresh period. Such refresh operations do not necessarily degrade processor performance by the full fraction of the refresh period that they occupy. However, since they may cause some loss of processor performance when the secondary cache is accessed frequently, it is desirable to prevent this effectively.
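The refresh overhead quoted here follows from a one-line calculation, sketched below with the figures from the text (1024 rows, about 100 ns per row refresh, 16 ms refresh period). The function name is illustrative.

```python
# Worst-case refresh overhead if no refresh is hidden (illustrative sketch).
def refresh_overhead(rows=1024, refresh_ns=100, period_ms=16):
    """Return (busy_ns, fraction_of_period) spent refreshing per period."""
    busy_ns = rows * refresh_ns
    period_ns = period_ms * 1_000_000
    return busy_ns, busy_ns / period_ns
```

With the default values this gives 102,400 ns (about 0.1 ms) of refresh time per period, or 0.64% of the 16 ms period, which the text rounds to about 0.6%.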

[0038] The present invention solves this drawback of DRAM by performing refresh operations during periods in which it is guaranteed that no activity will occur on the secondary cache. Concretely, a "period in which no activity occurs" is a situation in which both the processor and the secondary cache are waiting for a completion signal from outside, such as during a secondary cache miss or an I/O access.

[0039] FIG. 7 shows a timing diagram for a refresh operation performed while the secondary cache has missed and data is being accessed from main memory. A data access to main memory takes a considerable time, and on completion the main memory issues an RDY signal to the processor. As shown, a refresh cycle is scheduled in the interval between the start of the main memory access and the RDY signal. Because refresh is performed per memory row, a refresh address counter is provided to identify the row to be refreshed; it is incremented by one each time a row refresh completes. If the counter has not reached the last row when the 16 ms refresh period has elapsed, some rows have not been refreshed, and only then are the remaining rows refreshed together. This scheme hides the refresh cycles and so minimizes the loss of processor performance.
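The counter-based scheme can be sketched as a small state machine. This is a minimal model, not the patented circuit: the class and method names are hypothetical, and it assumes that exactly one row is refreshed per idle window (for example, per secondary cache miss) and that the counter restarts from row 0 at the start of each refresh period.

```python
class HiddenRefresh:
    """Sketch of refresh scheduling with a row counter (names are hypothetical)."""

    def __init__(self, rows=1024):
        self.rows = rows
        self.next_row = 0   # refresh address counter
        self.forced = 0     # rows refreshed in end-of-period catch-up so far

    def on_idle_window(self):
        """Call when the cache is guaranteed idle, e.g. during an L2 miss.

        Refreshes one row and increments the counter. Returns False when
        every row has already been refreshed in the current period.
        """
        if self.next_row < self.rows:
            self.next_row += 1
            return True
        return False

    def on_period_end(self):
        """Call once per refresh period; refresh any rows still pending.

        Returns the number of rows that had to be refreshed in the catch-up,
        which is the only refresh work visible to the processor.
        """
        remaining = self.rows - self.next_row
        self.forced += remaining
        self.next_row = 0
        return remaining
```

In this model the processor only ever waits for the rows returned by `on_period_end()`. If at least `rows` idle windows occur in a period, that number is zero and refresh is fully hidden.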

[0040] The following calculation shows how far the refresh cycles can be hidden by this scheme. When the secondary cache line-fills the primary cache with zero wait states according to the present invention, one primary cache line fill completes in 150 ns. The system is therefore capable of about 166,000 primary cache line fills during the 16 ms refresh period. Even assuming that accesses from the primary cache to the secondary cache occur at only about one fifth of this maximum rate (a very conservative assumption), there will be about 30,000 such accesses during the 16 ms refresh period. If 1024 or more secondary cache misses occur among these 30,000 accesses, all memory rows can be refreshed within the refresh period. The hit rate of a secondary cache is normally about 90% at most, so about 3,000 secondary cache misses occur during the refresh period, far more than the 1024 required. With this scheme, therefore, the refresh cycles can be hidden almost completely.
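The estimate above can be reproduced with a few lines of arithmetic. The sketch below is an illustration only: the function names are hypothetical, and it assumes, as the text implies, that each secondary cache miss gives room to refresh one row. The line-fill count is passed in as a parameter because dividing 16 ms by 150 ns directly gives roughly 106,667 rather than the 166,000 quoted in the text, and the sketch lets either figure be checked.

```python
def line_fills_per_period(period_s, line_fill_s):
    """Maximum number of line fills that fit in one refresh period."""
    return period_s / line_fill_s


def refresh_hiding_margin(max_line_fills, access_fraction, hit_rate, rows):
    """Return (accesses, misses, misses per row) for one refresh period.

    Assumes one row can be refreshed per secondary cache miss.
    """
    accesses = max_line_fills * access_fraction
    misses = accesses * (1.0 - hit_rate)
    return accesses, misses, misses / rows


# Figures quoted in the text: 166,000 line fills, 1/5 utilisation, 90% hits.
accesses, misses, margin = refresh_hiding_margin(166_000, 0.2, 0.90, 1024)
print(round(accesses), round(misses), round(margin, 2))

# Same estimate starting from the direct division 16 ms / 150 ns.
direct = line_fills_per_period(16e-3, 150e-9)
print(refresh_hiding_margin(direct, 0.2, 0.90, 1024)[2] > 1.0)
```

With the quoted figures the model gives about 33,200 accesses and about 3,300 misses, which the text rounds to 30,000 and 3,000. With the direct division it gives about 2,100 misses. In both cases the miss count exceeds the 1024 rows, so the conclusion that refresh can be hidden is unchanged.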

[0041] Focusing on the point that the cache memory is built from DRAM, another conceivable arrangement applies to, for example, a remote processing system in which the delay caused by the large physical distance between a central large-capacity memory and each processor node is a problem: a private cache memory that stores the data frequently exchanged with each processor node is built from DRAM, separately from the central large-capacity memory, and installed near each processor node. In this case most of the delay arises from the physical distance between the central large-capacity memory and the processor node, so the DRAM private cache can improve access speed even if it is not particularly fast. Of course, using the cache of the present invention for this purpose would clearly improve access speed further.
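The argument that even a modestly fast private cache helps when the remote delay dominates can be illustrated with a simple average-access-time model. Everything in the sketch is an assumption made for illustration: the function name, the model (a miss pays the local lookup plus the remote access), and all numeric values are hypothetical and do not come from the specification.

```python
def average_access_ns(local_ns, remote_ns, hit_rate):
    """Average access time when a miss pays the local lookup plus the remote access."""
    return local_ns + (1.0 - hit_rate) * remote_ns


# Hypothetical values: 2000 ns to reach the central memory, 90% local hit rate.
remote_ns = 2000.0
for local_ns in (50.0, 150.0):  # a fast cache versus a slower DRAM cache
    avg = average_access_ns(local_ns, remote_ns, 0.9)
    print(local_ns, round(avg, 1))
```

Under these assumed values both caches bring the average far below the 2000 ns remote access, and the gap between the fast and the slow cache is small compared with that saving, which is the point the paragraph makes.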

【００４２】[0042]

[Effects of the Invention] According to the present invention, a cache memory, which conventionally could not be built from DRAM, can be built using DRAM. This should make it possible to make the cache memory very small and inexpensive.

[0043] Further, according to the present invention, a secondary cache that operates with zero wait states with respect to the processor can be realized with DRAM, which has a small area and low power consumption. When page mode is used, interleaving the storage in the cache data memory across a plurality of cache lines belonging to the same set enables zero-wait operation using DRAM that is substantially slower than the processor cycle time. In addition, refreshing the DRAM during periods in which it is guaranteed that no activity occurs with respect to the secondary cache almost completely eliminates the refresh overhead inherent to DRAM. Combining these makes it possible to build a secondary cache that lets the processor's performance be used to the full.

[Brief description of drawings]

FIG. 1 is a diagram showing a general connection between a processor, a primary cache, and a secondary cache.

FIG. 2 is a timing diagram of a line fill to the primary cache in zero-wait operation.

FIG. 3 shows one embodiment of a secondary cache implementation.

FIG. 4 is a diagram showing the arrangement of data in the tag memory and the data memory of the secondary cache.

FIG. 5 is a timing diagram of secondary cache access in zero-wait operation according to the present invention.

FIG. 6 shows an embodiment in which data is stored in an interleaved arrangement across cache lines of the same set.

FIG. 7 is a timing diagram of DRAM refresh according to the present invention.

FIG. 8 is a diagram showing a case in which the DRAM is external and data is exchanged via an I/O driver and an I/O receiver.

Claims

[Claims]

1. A data processing system including a processor; a primary cache that is connected to the processor and exchanges data with it; and a secondary cache that has a larger capacity than the primary cache, is connected to the primary cache, and is accessed when data requested by the processor does not exist in the primary cache, wherein the secondary cache comprises a control logic circuit that controls the secondary cache, a tag memory that stores address information of the secondary cache, and a data memory that stores data requested by the processor, and wherein the data memory is a DRAM memory and is integrated on the same chip as the control logic circuit and the tag memory.

2. The data processing system according to claim 1, wherein the data memory is composed of cells of a unit of 2 Mb or less.

3. The data processing system according to claim 1, wherein the data memory stores, in an interleaved arrangement, several words that are read consecutively.

4. The data processing system according to claim 1, wherein the data memory performs a refresh operation by selecting a time when there is no external activity to the secondary cache.

5. The data processing system according to claim 4, wherein the refresh operation is performed as directed by a counter that identifies the address row to be refreshed, and, when the counter does not indicate the last address row after the refresh period has elapsed, all memory rows whose refresh has not been completed are refreshed.

6. A cache memory connected between a processor and a main memory, in which at least a control logic circuit that controls the cache memory, a tag memory that stores address information, and a data memory that stores data requested by the processor are integrated on the same chip, wherein the data memory comprises a DRAM.

7. A remote data processing system in which, separately from a central large-capacity main memory, a private cache memory that uses a DRAM as its cache data memory and stores frequently exchanged data is installed in the vicinity of each processor node.

8. The remote data processing system according to claim 7, wherein the private cache memory is configured by integrating, on the same chip, at least a control logic circuit that controls the private cache memory, a tag memory that stores address information, and a data memory that stores data requested by a processor.