JP4904802B2 - Cache memory and processor

Info

Publication number: JP4904802B2
Application number: JP2005366569A
Authority: JP
Inventors: 晃成轟
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2005-02-01
Filing date: 2005-12-20
Publication date: 2012-03-28
Anticipated expiration: 2025-12-20
Also published as: JP2006244460A

Description

The present invention relates to a cache memory and a processor, and more particularly to a cache memory provided in a processor that executes a plurality of processes in parallel, such as a multithread processor, and to a processor including such a cache memory.

In recent years, multiprocessors (multicore processors) and multithread processors, which execute a plurality of threads or tasks in parallel (referred to uniformly as threads in Embodiment 1), have attracted attention. Such processors are collectively called multiprocessor systems. Some multiprocessor systems include a cache memory that holds, out of the data once read from an external memory, the data likely to be used in processing, in order to make access to the external memory more efficient. Patent Document 1, for example, is cited as prior art having such a configuration.

In a multiprocessor having a cache memory as described in Patent Document 1, the data used for processing must be kept consistent (coherent) among the plurality of processors. Conventional processors often employ bus snooping to maintain data coherency. Bus snooping is a function that observes transactions on the memory interface bus shared among the processors and detects whether a transaction has occurred that involves data held in the cache memory allocated to the observing processor.

When a transaction occurs that involves data in the cache memory allocated to it, the processor updates the corresponding cache entry so that the contents of the data held in the cache memory areas of the processors in the multiprocessor system are unified. Bus snooping has many implementations, such as the write-once and Berkeley protocols.
Patent Document 1: JP 2004-178571 A

However, the invention of Patent Document 1 divides the storage area of the cache memory into independent areas and assigns them to the threads executed simultaneously in the multiprocessor system. With this arrangement, the hit rate, that is, the proportion of accesses that find the required data stored in the cache memory and succeed, decreases.

Furthermore, when coherency is maintained with the bus snoop function in the configuration of Patent Document 1, the circuit that monitors the bus enlarges the hardware of the multiprocessor system. In addition, constantly monitoring the bus raises power consumption, and because each independent area of the cache memory must be accessed separately to rewrite data, the coherency-maintaining operation is inefficient.

The present invention has been made in view of the above points. Its object is to provide a cache memory that can efficiently maintain data coherency among processors in a multiprocessor system without enlarging the device configuration or increasing power consumption, and a processor including this cache memory.

To solve the above problems, the cache memory of the present invention caches at least part of the data read from a storage device by a plurality of processors and supplies at least part of the cached data to the processors. The cache memory comprises: data storage means for storing the data read from the storage device; address management means for collectively managing the addresses of the data stored in the data storage means, for all of the data that the data storage means stores; hit detection means for comparing the address of data requested by a processor with the addresses managed by the address management means and detecting whether the requested data can be read from the data storage means; and data supply means for supplying the detected data to the processor when the hit detection means detects that the data can be read.

According to this invention, the addresses in the data storage means of the data cached by the plurality of processors are managed collectively, so the data cached by the plurality of processors is in effect stored in a single data storage means, and inconsistency of data within the data storage means is eliminated. A cache memory that efficiently maintains data coherency among processors can therefore be provided. Moreover, because no additional circuit is needed to maintain data coherency, neither the device configuration of the cache memory nor its power consumption increases.

The cache memory of the present invention further comprises buffer means for temporarily storing at least one of data read from the data storage means and data to be written to the data storage means.
According to this invention, the number of accesses to the data management means and the data storage means is reduced, the access speed to the cache memory increases, and the processing speed of a processor employing the cache memory improves.

In the cache memory of the present invention, when the hit detection means detects that data can be read, the data supply means supplies the detected data to the processor and also supplies data that includes the data contiguous with the supplied data to the buffer means that temporarily stores data read from the data storage means.

According to this invention, the data following the data supplied to the processor is stored in the buffer in advance, so the data management means and the data storage means need not be accessed the next time this data is requested. The number of accesses to the data management means and the data storage means is therefore reduced, the access speed to the cache memory increases, and the processing speed of a processor employing the cache memory improves.

The cache memory of the present invention further comprises prefetch data storage means for caching data that the processor is expected to request.
According to this invention, the number of accesses to the data management means and the data storage means is reduced, the access speed to the cache memory increases, and the processing speed of a processor employing the cache memory improves.

In the cache memory of the present invention, the data management means manages the addresses of the data storage means as a plurality of ways, assigns to each way a storage priority, which is the priority with which the data stored in the data storage means is retained there, and determines the storage priority of each way based on the state of access to the data managed by that way.

According to this invention, the LRU (least recently used) method can be adopted to raise the cache hit rate even when data is managed in a plurality of ways.
In the cache memory of the present invention, at least one of the data storage means and the data management means is a multiport memory.
According to this invention, a plurality of processors can access the data memory and the tag memory at high speed, which improves the processing capability of the multiprocessor.
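The per-way storage priority described above can be sketched as an LRU ordering inside one cache set. This is a minimal illustration, not the patented circuit: the class name `LruSet`, the four-way default, and the list-based ordering are all assumptions introduced here.

```python
class LruSet:
    """One cache set whose ways carry a storage priority (LRU order).

    Hypothetical model: names and the four-way default are assumptions.
    """

    def __init__(self, num_ways=4):
        self.tags = [None] * num_ways      # tag currently held by each way
        # order[0] is the least recently used way (lowest storage priority)
        self.order = list(range(num_ways))

    def _touch(self, way):
        # An access raises the way's storage priority to the highest.
        self.order.remove(way)
        self.order.append(way)

    def access(self, tag):
        """Return True on a hit; on a miss, replace the lowest-priority way."""
        if tag in self.tags:
            self._touch(self.tags.index(tag))
            return True
        victim = self.order[0]
        self.tags[victim] = tag
        self._touch(victim)
        return False
```

With two ways, accessing A, B, A and then C replaces B rather than A, because the second access to A raised its priority.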

The processor of the present invention includes a cache memory that caches at least part of the data read from a storage device by a plurality of processors and supplies at least part of the cached data to the processors. The cache memory comprises: data storage means for storing the data read from the storage device; address management means for collectively managing the addresses of the data stored in the data storage means, for all of the data that the data storage means stores; hit detection means for comparing the address of data requested by a processor with the addresses managed by the address management means and detecting whether the requested data can be read from the data storage means; and data supply means for supplying the detected data to the processor when the hit detection means detects that the data can be read.

According to this invention, the addresses in the data storage means of the data cached by the plurality of processors are managed collectively, so the data cached by the plurality of processors is in effect stored in a single data storage means, and inconsistency of data within the data storage means is eliminated. A processor that efficiently maintains data coherency among processors can therefore be provided. Moreover, because no additional circuit is needed to maintain data coherency, neither the device configuration of the processor nor its power consumption increases.

In the cache memory of the present invention, each of the plurality of processors executes processing thread by thread and can switch the thread it is executing to another thread during processing.
According to this invention, data coherency among processors can be maintained efficiently even in a multithread processor, in which individual processors are likely to access common data.

Embodiments 1 and 2 of the cache memory according to the present invention and of a processor including this cache memory are described below with reference to the drawings.

(Embodiment 1)
FIG. 1 shows a multithread processor 101 having a cache memory, common to Embodiments 1 and 2 of the present invention. The multithread processor 101 reads data from an external memory 105 and writes the results of executed threads and the like to the external memory 105.

The multithread processor 101 also includes a cache memory 109 and reads data from and writes data to the external memory 105 via the cache memory 109. The cache memory 109 stores data cached by the plurality of processors included in the multithread processor 101 and supplies at least part of the cached data to any of those processors. The multithread processor 101 can therefore obtain much of its data without accessing the external memory 105.

The cache memory 109 is generally configured so that a processor can access it faster than the external memory 105. Obtaining data from the cache memory 109 therefore speeds up data reads and writes by the multithread processor 101, reduces the number of accesses to the external memory 105, and raises the processing speed and efficiency of the multithread processor 101.

The multithread processor 101 does not assign threads one-to-one to the processors that execute them. Instead, each of the processors executes processing thread by thread and can switch the thread it is executing to another thread during processing. Such a multithread processor 101 runs under a multithread OS.

That is, in the multithread processor 101, the processors dynamically change the threads they execute according to thread priority. FIG. 2 illustrates this operation. In the example of FIG. 2, the multithread processor 101 includes four processors, processor 0 through processor 3. In each processor, a higher-priority thread interrupts the thread being executed, and processing switches to the interrupting thread.

When threads are switched, the multithread processor 101 saves the state and results (context) of the thread running immediately before the switch and sets the context of the next thread in the processor.
In the multithread processor 101, the OS can run on any of processors 0 through 3 and control the other processors. A multithread processor of this kind, in which the processors share processing on an equal footing, is also called a symmetric multiprocessor (SMP).

The cache memory 109 includes a cache memory unit 107 used for storing data and a cache control unit 103 that controls the storage of data in the cache memory unit 107. As illustrated later, the cache memory unit 107 includes a tag memory that manages the addresses and states attached to data and a data memory that stores the data itself.

FIG. 3 shows the configuration of the cache memory 109 in more detail. In Embodiment 1, the cache memory 109 is connected to four processors, processor 0 through processor 3. It receives data requests from the four processors, and data received from the four processors can be written to it. A data request that a processor issues to the cache memory 109 is hereinafter called a read command in Embodiment 1.

As shown in FIG. 1, the cache memory 109 includes the cache control unit 103, the cache memory unit 107, and a hit detection unit 208. The cache memory unit 107 has a tag memory 206 and a data memory 207. The data memory 207 stores data read from the external memory 105. The tag memory 206 collectively manages the addresses in the data memory 207, which stores the data read from the external memory 105, and in a read buffer described later.

The tag memory 206 is a memory that stores, for example as a table, information associating data in the external memory 105 with the address at which that data is currently stored. Because the data may be stored in the data memory 207 as well as in the external memory 105, the address at which the data is currently stored may be an address in the data memory 207.

In Embodiment 1, the tag memory 206 manages the state (status) of the data in addition to its address. The status here is information indicating, for example, whether the data is valid or invalid and whether it is dirty (changed after being read from the external memory 105).
The cache control unit 103 includes an address control unit 201, a buffer management unit 202, a write buffer 203, and a read buffer 205.

The address control unit 201 obtains the address of the requested data from the read command input from a processor, converts it into an address for accessing the tag memory 206 and the data memory 207, and outputs it to the tag memory 206. When data cached in the data memory 207 is read, it outputs the address of the data to be read to the tag memory 206 and the data memory 207. When data not cached in the data memory 207 is read from the external memory 105, it generates an address in the external memory 105 and outputs it to the external memory 105.
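The text does not spell out how the address control unit 201 converts an external-memory address. One common arrangement, assumed here purely for illustration, splits the address into a tag, an index that selects a tag/data memory entry, and an offset within the entry. The line size, the number of sets, and the function names below are hypothetical.

```python
LINE_BYTES = 16   # assumed entry (line) size in bytes
NUM_SETS = 64     # assumed number of entries selected by the index


def split_address(ext_addr):
    """Split an external-memory address into (tag, index, offset)."""
    offset = ext_addr % LINE_BYTES
    index = (ext_addr // LINE_BYTES) % NUM_SETS
    tag = ext_addr // (LINE_BYTES * NUM_SETS)
    return tag, index, offset


def join_address(tag, index, offset=0):
    """Rebuild the external-memory address, e.g. to fetch data on a miss."""
    return (tag * NUM_SETS + index) * LINE_BYTES + offset
```

Under these assumed sizes, the index would address the tag memory and data memory, while the stored tag is what gets compared against the request.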

The cache control unit 103 controls access to the tag memory 206 and the data memory 207 based on the address generated by the address control unit 201. The cache control unit 103 can also reduce power consumption, for example by supplying the memory access clock only when the tag memory 206 and the data memory 207 are actually accessed.

The write buffer 203 temporarily stores (buffers) data to be written to the data memory 207 and includes buffers 203a, 203b, 203c, and 203d, which correspond to processors 0 through 3 respectively. The read buffer 205 buffers data read from the data memory 207 and, like the write buffer 203, includes buffers 205a, 205b, 205c, and 205d, which correspond to processors 0 through 3 respectively. The write buffer 203 and the read buffer 205 are provided to adjust the timing of writes to and reads from the data memory 207.

The cache control unit 103 further includes the buffer management unit 202. The buffer management unit 202 keeps the data in the write buffer 203 and the read buffer 205 consistent. Specifically, it compares the data stored in the write buffer 203 with the data stored in the read buffer 205 and, when it detects a mismatch between data that should be identical, makes the two agree, for example by updating or deleting the copy held in the read buffer 205.
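The reconciliation performed by the buffer management unit 202 can be sketched as follows. This is a simplified software model under stated assumptions: buffers are represented as Python lists and dictionaries keyed by address, and the function name `reconcile` and its `update` flag are introduced here for illustration only.

```python
def reconcile(write_buffer, read_buffers, update=True):
    """Make read-buffer entries agree with pending writes.

    write_buffer: list of (address, data) pairs in arrival order (assumed model).
    read_buffers: one dict {address: data} per processor (assumed model).
    update: refresh the stale copy if True, delete it if False.
    """
    for address, data in write_buffer:
        for buf in read_buffers:
            if address in buf and buf[address] != data:
                if update:
                    buf[address] = data   # update the read-buffer copy
                else:
                    del buf[address]      # or drop the read-buffer copy
```

Either choice leaves no read-buffer entry that contradicts a pending write; deleting simply forces the next read to go to the tag memory and data memory.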

The hit detection unit 208 compares the address of the data requested by a processor with the addresses managed in the tag memory 206 and detects whether the requested data is present in the data memory 207. In Embodiment 1, when the requested data is detected, the hit detection unit 208 also supplies the detected data to the multithread processor 101.

Next, the operation of the configuration described above is explained for reads from and writes to the cache memory 109.

(Read operation)
The multithread processor 101 outputs a read command from one of its processors, for example processor 0. The read command includes the address in the external memory 105 of the requested data (the read address) and a signal instructing the read (the read control signal). The address control unit 201 checks for data stored in buffer 205a, the part of the read buffer 205 that corresponds to processor 0.

If data is stored in buffer 205a, the tag address attached to the stored data is compared with the read address. If the read address matches the tag address of the data stored in buffer 205a of the read buffer 205, the data stored in buffer 205a is output to processor 0 and the read completes.

If no data with a tag address corresponding to the read address is stored in the read buffer 205, the address control unit 201 accesses the tag memory 206 and checks the read address against it. The processor uses an address in the external memory 105 as the read address. In Embodiment 1, the address control unit 201 converts the address in the external memory 105 into an address for accessing the data memory 207.

The tag memory 206 stores the address in the data memory 207 of the data to be read in association with its status. If the check against the tag memory 206 shows that the data specified by the read address is in the data memory 207, the tag memory 206 outputs the status of the data to the hit detection unit 208. The result of the check is also output to the data memory 207, and the data to be read by processor 0 is output from the data memory 207 to the hit detection unit 208. When the hit detection unit 208 determines from the status that the data can be read, that is, that a cache hit has occurred, it outputs the hit data to processor 0.

At the same time as it outputs the data, the hit detection unit 208 transfers one entry's worth of data associated with the output data to the read buffer 205. As a result, if the transferred data is read at the next access, it can be read without the cache control unit 103 accessing the tag memory 206 and the data memory 207.
On the other hand, if the data cannot be read (a cache miss), for example because the data corresponding to the read address is not in the tag memory or because its status is invalid, the cache control unit 103 reads the data from the external memory 105 into the data memory, transfers it to the read buffer 205, and then outputs it to processor 0. As in the case of a cache hit, one entry's worth of data is transferred to the read buffer 205.
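The read operation described above (read buffer check, tag lookup, hit, and refill on a miss) can be summarized in a small software model. It is a sketch under simplifying assumptions, not the hardware of Embodiment 1: each address stands for a whole entry, the tag memory is a dictionary holding only a "valid" status, and the class name `SharedCache` and the `array_accesses` counter are hypothetical.

```python
class SharedCache:
    """Simplified model: one tag/data memory shared by all processors."""

    def __init__(self, external_memory, num_processors=4):
        self.external = external_memory      # dict: address -> data
        self.tag_memory = {}                 # address -> status string
        self.data_memory = {}                # address -> data (single shared copy)
        self.read_buffers = [dict() for _ in range(num_processors)]
        self.array_accesses = 0              # tag/data memory lookups, for illustration

    def read(self, processor, address):
        buf = self.read_buffers[processor]
        if address in buf:                   # 1. served from the read buffer
            return buf[address]
        self.array_accesses += 1
        if self.tag_memory.get(address) == "valid":
            data = self.data_memory[address]     # 2. cache hit
        else:
            data = self.external[address]        # 3. cache miss: refill
            self.data_memory[address] = data
            self.tag_memory[address] = "valid"
        buf[address] = data                  # keep a copy for the next access
        return data
```

In this model a repeated read by the same processor does not touch the tag or data memory, and a read of the same address by another processor hits the single shared copy instead of creating a second one.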

In Embodiment 1 as described above, the read buffer 205 includes buffers 205a to 205d, one for each of the processors. However, Embodiment 1 is not limited to this configuration. A function can be added so that, when data whose tag address matches the read address of a read command from processor 0 is stored in a buffer other than buffer 205a, that data is read from the other buffer.

In a multithread processor in which a plurality of processors execute processing while dynamically switching threads, the same data may be used repeatedly by different processors. Therefore, if the processor that issued a read command is allowed to access any of buffers 205a to 205d in the read buffer 205, as described above, processor 0, for example, can read data that was transferred to the read buffer 205 during processing by another processor.
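The optional cross-buffer lookup described above might look like the following sketch. The search order (own buffer first, then the others in index order) and the function name `lookup_read_buffers` are assumptions; the text only says that buffers other than the requester's own may be read.

```python
def lookup_read_buffers(read_buffers, processor, address):
    """Search the requester's own read buffer first, then the others.

    read_buffers: one dict {address: data} per processor (assumed model).
    Returns (owner_index, data) on a match, or None if no buffer holds it.
    """
    others = [p for p in range(len(read_buffers)) if p != processor]
    for p in [processor] + others:
        if address in read_buffers[p]:
            return p, read_buffers[p][address]
    return None
```

A result of `None` corresponds to falling through to the tag memory lookup.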

With this processing, the cache control unit 103 accesses the tag memory 206 and the data memory 207 less often, and the efficiency of data reads by the multithread processor 101 improves.
Embodiment 1 as described above can also maintain data coherency among processors efficiently. In a configuration that provides a tag memory and a data memory for each processor, for example, data with the same tag address is stored in several different data memories. Data then becomes inconsistent among processors when, for instance, only some of the copies held in the data memories are updated.

In Embodiment 1, however, the tag memory 206 collectively manages the addresses in the data memory 207 of the data read from the external memory 105, so the read data is in effect stored in a single data memory, and data mismatches in the cache memory are eliminated. For this reason, Embodiment 1 does not need to monitor the bus, and neither a bus monitoring circuit nor the power it would consume is required. Maintaining data coherency among processors therefore does not enlarge the device configuration of the multiprocessor system or increase its power consumption.

Furthermore, the multithread processor of Embodiment 1 is particularly advantageous when applied to a configuration in which data mismatches occur easily, such as a multithread OS in which one processor dynamically switches among and executes a plurality of threads, as shown in FIG. 2.
In Embodiment 1, when hit data is read, one entry's worth of data associated with it is also transferred to the read buffer 205. This reduces the number of times a processor accesses the tag memory 206 and the data memory 207 and lightens the load that data reads place on the multithread processor.

In other words, the data memory that stores the data to be read generally has a longer access time than a buffer, and it tends to become a bottleneck to improving processor performance. Providing a read buffer and a write buffer in front of the data memory makes it possible to effectively hide the memory access latency, and as a result the performance of the processor can be improved.
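The effect of the read buffer described above can be illustrated with a minimal Python sketch. It is not the patented circuit: the class names, the entry size of four words (ENTRY_WORDS), and the access counter are assumptions introduced only for illustration. On a buffer miss the whole entry is fetched once, and later reads inside the same entry are served from the buffer without touching the shared tag or data memory.

```python
ENTRY_WORDS = 4  # assumed number of words per cache entry (not stated in the source)


class SharedCache:
    """Hypothetical stand-in for the shared tag memory 206 and data memory 207."""

    def __init__(self, contents):
        self.contents = contents  # address -> word
        self.accesses = 0         # number of tag/data memory accesses performed

    def read_entry(self, addr):
        """Return the base address and all words of the entry containing addr."""
        self.accesses += 1
        base = addr - addr % ENTRY_WORDS
        return base, [self.contents[base + i] for i in range(ENTRY_WORDS)]


class ReadBuffer:
    """Hypothetical one-entry read buffer belonging to a single processor."""

    def __init__(self, cache):
        self.cache = cache
        self.base = None
        self.words = []

    def read(self, addr):
        in_buffer = (
            self.base is not None
            and self.base <= addr < self.base + ENTRY_WORDS
        )
        if not in_buffer:
            # Fetch the whole entry once; later reads in the entry skip the cache.
            self.base, self.words = self.cache.read_entry(addr)
        return self.words[addr - self.base]


cache = SharedCache({a: a * 10 for a in range(16)})
buf = ReadBuffer(cache)
values = [buf.read(a) for a in range(8)]  # eight sequential reads
print(values, cache.accesses)
```

In this sketch eight sequential reads cost only two accesses to the shared memories, which is the kind of reduction the paragraph above attributes to the read buffer.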

(Write operation)
Next, the write operation of the multithread processor of the first embodiment will be described. The write operation described below takes write-back as an example, but it can also be applied to write-through.
In the multithread processor 101, one of the plurality of processors, for example processor 0, outputs an instruction to write data (a write instruction). The write instruction includes the address in the external memory 105 of the data to be written (the write address) and a signal instructing the write (the write control signal). In a write operation, the data to be written (the write data) is also sent from the processor together with the write instruction.

In the cache memory 109, the write data is first stored in the write buffer 203 together with the write address. The write buffer 203 is a FIFO (first in, first out) memory, and the data written to it is written to the data memory 207 in the order in which it arrived.
To write the data to the data memory 207, the cache control unit 103 first compares the write address with the tag addresses in the tag memory 206 and detects the status of the data. If it determines as a result that the data can be written to the data memory 207, that is, that a cache hit has occurred, it writes the write data to the data memory 207. It also sets the flag in the tag memory that indicates the status of the data to "dirty".

If the cache control unit 103 determines that the data cannot be written to the data memory 207, that is, that a cache miss has occurred, it reads the data corresponding to the write data from the external memory 105 into the data memory 207. It then writes the data held in the write buffer 203 and updates the tag memory 206.
Because the first embodiment uses a FIFO memory for the write buffer 203, the processor can write to the write buffer 203 one item after another until the write buffer 203 is full. The data written to the write buffer 203 is written to the data memory 207 in idle time, arbitrated against other processing according to the access status of the tag memory 206 and the data memory 207.
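The write path described above can be sketched as follows. This is a simplified Python model under stated assumptions, not the actual hardware: each entry holds a single word, the data memory has unlimited capacity, and the names WriteBackCache, processor_write, and drain_one are hypothetical. It shows the FIFO ordering of the write buffer, the fetch from external memory on a miss, and the "dirty" status set on write-back.

```python
from collections import deque


class WriteBackCache:
    """Hypothetical model of write buffer 203 feeding data memory 207 (write-back)."""

    def __init__(self, external):
        self.external = external     # stand-in for external memory 105
        self.data = {}               # stand-in for data memory 207: addr -> value
        self.dirty = set()           # addresses whose status flag is "dirty"
        self.write_buffer = deque()  # FIFO of (write address, write data)

    def processor_write(self, addr, value):
        """The processor only queues the write; it does not wait for the cache."""
        self.write_buffer.append((addr, value))

    def drain_one(self):
        """Cache control: move the oldest buffered write into the data memory."""
        addr, value = self.write_buffer.popleft()
        if addr not in self.data:
            # Cache miss: first read the corresponding data from external memory.
            self.data[addr] = self.external[addr]
        self.data[addr] = value
        self.dirty.add(addr)         # external memory is updated only on write-out


external = {0: 1, 1: 2}
cache = WriteBackCache(external)
for addr, value in [(0, 99), (1, 5), (0, 7)]:
    cache.processor_write(addr, value)
while cache.write_buffer:
    cache.drain_one()
print(cache.data, sorted(cache.dirty), external)
```

Because the buffer is drained in arrival order, the later write to address 0 is the one that remains in the data memory, while the external memory stays unchanged until a write-out occurs.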

The buffer management unit 202 arbitrates writes to the read buffer 205 in the read operation described above, and writes to and reads from the write buffer 203 in the write operation. It also guarantees data consistency (coherency) between the write buffer 203 and the read buffer 205.
Data consistency between the write buffer 203 and the read buffer 205 becomes an issue when, before data written to the write buffer 203 has been written to the data memory 207, a read access occurs to the same data held in the read buffer 205. In such a case, the buffer management unit 202 updates the contents of the read buffer 205 to the written data. Alternatively, the data in the read buffer 205 may first be invalidated, the write data written from the write buffer 203 to the data memory 207, and the written data then read.

From the viewpoint of processing efficiency, it is preferable to update the contents of the read buffer 205 to the written data.
When the read buffer 205 is updated in this way, even if processor 0 accessed only its own portion of the read buffer 205, the portions of the read buffer 205 corresponding to the other processors must also be updated. That is, because the same data may be used by a plurality of threads, data consistency must be guaranteed between a write by processor 0 and reads and writes by processors other than processor 0.
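The preferred update policy can be illustrated with a short Python sketch. The class BufferManager and its fields are hypothetical names, and each read-buffer portion is modeled as a plain address-to-value mapping, which is an assumption made for brevity. When any processor writes, every processor's read-buffer portion that holds the same address is updated, not only the writer's own.

```python
class BufferManager:
    """Hypothetical model of buffer management unit 202."""

    def __init__(self, num_processors):
        # One read-buffer portion per processor: addr -> buffered value.
        self.read_buffers = [dict() for _ in range(num_processors)]
        self.write_buffer = []  # pending (addr, value) not yet in data memory

    def write(self, processor, addr, value):
        """Queue a write from `processor` and keep all read buffers consistent."""
        self.write_buffer.append((addr, value))
        for portion in self.read_buffers:
            if addr in portion:
                # Update stale copies in every processor's portion,
                # including those of processors other than the writer.
                portion[addr] = value


mgr = BufferManager(4)
mgr.read_buffers[0][0x10] = 1
mgr.read_buffers[2][0x10] = 1
mgr.read_buffers[3][0x20] = 5
mgr.write(0, 0x10, 42)          # processor 0 writes address 0x10
print(mgr.read_buffers)
```

The alternative mentioned in the text, invalidating the buffered copy and re-reading after the write reaches the data memory, would replace the update in the loop with a deletion of the entry, at the cost of an extra data memory access.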

According to the first embodiment described above, the write data is written to the data memory 207, whose addresses are collectively managed by the tag memory 206. The data is therefore stored in what is effectively a single data memory, and data mismatches in the cache memory can be eliminated. For this reason, the first embodiment does not need to monitor the bus, and requires neither a bus-monitoring circuit nor the power such a circuit would consume. Consequently, maintaining data coherency between processors does not enlarge the device configuration of the multiprocessor system or increase its power consumption.

FIG. 4 is a flowchart for explaining the data read or write operation executed in the cache memory of the first embodiment described above. FIG. 5 is a flowchart, provided for comparison with FIG. 4, for explaining the data read operation executed in a conventional cache memory.
As shown in FIG. 4, when one of the plurality of processors (referred to as processor k) requests access to the cache memory 109, the cache memory of the first embodiment performs processing to detect a cache hit in the tag memory 206 (S401). If a cache hit is detected (S402: Yes), the data memory 207 is accessed and the target data is read (S406). The status of the data in the tag memory 206 is then updated (S407).

If the cache memory 109 detects a cache miss in step S402 (S402: No), it determines which of the data stored in the data memory 207 is to be replaced (S403). It then writes the dirty data out of the data memory 207 (S404) and reads the new data stored in the external memory 105 into the data memory 207 (S405).
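The flow of FIG. 4 can be sketched in Python as below. Several points are assumptions rather than statements from the source: the replacement choice in S403 is shown as "oldest inserted entry", each entry holds one word, the miss path is assumed to continue to the access step after the refill, and the class and method names are hypothetical. What the sketch is meant to show is that a single shared tag structure serves every processor, so no per-processor copies need to be reconciled.

```python
class SharedTagCache:
    """Hypothetical model of one tag memory 206 / data memory 207 shared by all processors."""

    def __init__(self, external, capacity):
        self.external = external  # stand-in for external memory 105
        self.capacity = capacity
        self.lines = {}           # addr -> {"value": ..., "dirty": bool}

    def access(self, addr, write_value=None):
        hit = addr in self.lines                          # S401, S402
        if not hit:
            if len(self.lines) >= self.capacity:
                victim = next(iter(self.lines))           # S403 (assumed policy)
                line = self.lines.pop(victim)
                if line["dirty"]:
                    self.external[victim] = line["value"]  # S404: write out dirty data
            self.lines[addr] = {"value": self.external[addr], "dirty": False}  # S405
        line = self.lines[addr]                           # S406: access data memory
        if write_value is not None:
            line["value"] = write_value
            line["dirty"] = True                          # S407: update status
        return hit, line["value"]


external = {0: 10, 1: 11, 2: 12}
cache = SharedTagCache(external, capacity=2)
print(cache.access(0))                  # miss, filled from external memory
print(cache.access(0, write_value=99))  # hit, entry becomes dirty
print(cache.access(1))                  # miss
print(cache.access(2))                  # miss, dirty entry 0 is written out first
print(external[0])
```

Any processor calling `access` sees the value most recently written by any other processor, which is why this arrangement needs no bus monitoring in the sketch.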

The conventional cache memory whose processing is shown in FIG. 5 differs from the cache memory of the first embodiment in that each of the k processors has its own independent tag memory and data memory. Consequently, following the determination in step S502, the conventional cache memory accesses the data memory corresponding to the processor k that requested access (S506) and updates the tag memory corresponding to processor k (S507).

It further determines whether the access by processor k is a write (S508). If it is (S508: Yes), the data in the data memories and tag memories corresponding to the processors other than processor k is also updated, so that data coherency is maintained among the data memories.
The cache memory of the present embodiment is not limited to the configuration described above, and may also be given a data prefetch function. FIG. 6 shows a configuration in which the cache memory of the first embodiment is also applied to an instruction cache and configured as a prefetch cache.

The configuration shown in FIG. 6 further includes prefetch buffers, which are prefetch data storage means that cache data (instructions) the processors are expected to request. Because the processors connected to the cache memory each access different programs independently, as many prefetch buffers are needed as there are processors. The address control unit 601 of the prefetch cache generates the address of the next tag memory entry each time an entry boundary is crossed, in order to detect address continuity for prefetching, or in order to access the tag memory and the data memory in units of cache entries.

Conventionally, the purpose of a prefetch buffer has been to reduce power consumption by reducing the number of accesses to the tag memory and the data memory. When the prefetch function is applied to a multithread processor, however, it reduces the number of accesses to the data memory in the same way as the read buffer 205 and the write buffer 203 shown in FIG. 3, which removes the memory access bottleneck and improves processing speed.

(Embodiment 2)
Next, a second embodiment of the present invention will be described.
The cache memory of the second embodiment has the configuration of FIG. 3 described in the first embodiment. The illustration and part of the description of the cache memory configuration are therefore omitted here. In the cache memory of the second embodiment, at least one of the data memory 207 and the tag memory 206 is a multi-port memory. If the data memory 207 is a multi-port memory, it needs a number of ports equal to the number of processors multiplied by the number of ways. If the tag memory 206 is a multi-port memory, it needs as many ports as there are processors.
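As a small worked example of the port counts stated above, the following Python sketch simply evaluates the two rules. The function name required_ports is hypothetical, and the figures of four processors and two ways are taken as an assumed example configuration.

```python
def required_ports(num_processors, num_ways):
    """Port counts implied by the rules above for multi-port memories."""
    return {
        "data_memory": num_processors * num_ways,  # processors x ways
        "tag_memory": num_processors,              # one port per processor
    }


print(required_ports(num_processors=4, num_ways=2))
```

For this assumed configuration the data memory would need eight ports and the tag memory four.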

The second embodiment describes the configuration of the cache memory relating to data write-out (the operation of writing data from the data memory 207 to the external memory 105).
The second embodiment adopts the so-called LRU (Least Recently Used) algorithm for read and write-out operations. LRU removes from the cache memory, among the cached data, the data for which the longest time has elapsed since a processor last accessed it, so the data to be removed is determined from the state of the processors' supply requests for the data. With this method, data that the processors request frequently can always be kept cached in the data memory 207, which raises the processing efficiency of a configuration that uses a cache memory.

There are various cache memory schemes. In the second embodiment, the cache memory 109 is assumed to be a 2-way (ways A and B) set-associative cache memory. In the set-associative scheme, the cache memory is divided into a plurality of areas (ways), and each way stores data from different addresses on the memory device, which improves the hit rate.

In the cache memory of the second embodiment, the tag memory 206 manages the addresses of the data memory 207 as a plurality of ways. A storage priority, which is the priority with which the data stored in the data memory 207 is retained there, is assigned to each way, and the storage priority assigned to each way is determined based on the state of access to the data managed by that way. In the second embodiment, the access state on which the storage priority is based is the number of accesses during a predetermined period relatively close to the present.
First, the configuration of the tag memory 206 and the data memory 207 of the second embodiment will be described in detail. FIGS. 7(a), 7(b), and 7(c) are diagrams for explaining the structure of the data stored in the tag memory 206 and the data memory 207. FIG. 7(a) shows the status flags managed by the tag memory 206.

The flags are stored in the tag memory 206 for each of processors 0 to 3. In the second embodiment, the status of the data is indicated by three flags: the Valid flag, the Dirty flag, and the Used flag. The Valid flag indicates whether the data is valid. The Dirty flag indicates that the cached data has been updated from the value that was read in (dirty data), and the Used flag indicates the write-out priority.
In the second embodiment, the write-out priority of each way is managed in a table (the Used table) based on the Used flag. The Used table is described with reference to FIG. 8.
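One way to picture the three status flags is as bit fields, sketched below in Python. The bit positions, the single-bit Used flag, and the helper needs_write_back are all assumptions for illustration. The source states only that the three flags exist and what each indicates, and that the write-out priority itself is kept in the Used table.

```python
from enum import IntFlag


class Status(IntFlag):
    """Hypothetical encoding of the status flags kept in tag memory 206."""
    VALID = 1  # the cached data is valid
    DIRTY = 2  # the cached data was updated after being read in
    USED = 4   # relates to the write-out priority (simplified to one bit here)


def needs_write_back(status):
    """Only valid, dirty data has to be written out before being replaced."""
    return bool(status & Status.VALID) and bool(status & Status.DIRTY)


line_status = Status.VALID | Status.DIRTY
print(needs_write_back(line_status))
print(needs_write_back(Status.VALID))
```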

FIG. 7(b) is a diagram for explaining the data structure of the tag memory 206, and FIG. 7(c) is a diagram for explaining the data structure of the data memory 207. In the second embodiment, which adopts the 2-way set-associative scheme, each of processors 0 to 3 has two ways, and the tag memory 206 manages the data memory 207 as a total of eight ways (U0 to U7).

The data stored in the tag memory 206 (tag information) is used to detect data hits and misses, and consists of 16 bits of the address of the accessed data. The data read out based on the tag information is cached in the data memory with one word being 32 bits.
FIGS. 8(a) and 8(b) are diagrams for explaining the LRU processing of the second embodiment, and show the Used table before and after data is read. In the second embodiment, the Used table is stored in the tag memory 206 and updated by the cache control unit 103.

Data read by processors 0 to 3 is cached in one of the ways U0 to U7 of the data memory 207. Once the ways U0 to U7 hold as much data as they can cache and further data needs to be cached, the cache control unit 103 writes one of the items currently cached in the ways U0 to U7 out to the external memory 105. The newly cached data is then stored in the area that held the data written out.

In the second embodiment, which has a cache memory with eight ways in total, which of the data cached in the ways U0 to U7 is written out is determined based on the state of the processors' supply requests to the ways. The data with the fewest accesses in the way so determined is written out.
In the second embodiment, if the period over which the number of accesses is judged is limited to a short time, the way to be written out can also be determined by whether or not it was accessed immediately beforehand.

The write-out priority is recorded in the Used table as the LRU rank shown in the tables of FIGS. 8(a) and 8(b). The LRU rank in the second embodiment is a value from 0 to 7, where 0 is the highest LRU rank and 7 is the lowest. The data in the way with LRU rank 7 is written out to the external memory 105 in preference to the data in the other ways at the next cache miss.

FIG. 8(a) is a diagram for explaining how the LRU ranks are determined when the data requested by a processor misses in the data memory 207. Because the requested data is not among the data managed by the tag memory 206, the processor accesses the external memory 105, reads the data, and caches it in the data memory 207. At this time, the cache control unit 103 consults the Used table for the LRU ranks of the ways U0 to U7.

In the case of FIG. 8(a), way U6 has the lowest LRU rank, 7, so the cache control unit 103 writes out the data cached in way U6, that is, the data for which the longest time has elapsed since a processor accessed it. The newest data is then cached in way U6, the LRU rank of way U6 is set to 0, and the LRU ranks of the other ways are each lowered by one.

FIG. 8(b) is a diagram for explaining how the LRU ranks are determined when the data requested by a processor hits data managed in the data memory 207. If the requested data hits the data in way U4, the cache control unit 103 reads this data and supplies it to the processor. No data needs to be written out in this case, but the LRU rank of way U4, which has just been hit, is updated to 0, and the LRU ranks that were higher than way U4's rank before the hit (4) are updated accordingly.
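The rank updates for the miss and hit cases can be sketched in Python as follows. The starting table is an assumed example, chosen only so that way U6 has rank 7 and way U4 has rank 4 as in the description, because the actual contents of FIG. 8 are not reproduced in the text. The function names are hypothetical, and the sketch assumes that each rank that was higher than the accessed way's old rank is lowered by exactly one.

```python
def update_on_hit(ranks, way):
    """Give `way` rank 0 and lower every way that previously outranked it by one."""
    old_rank = ranks[way]
    for other, rank in ranks.items():
        if rank < old_rank:        # numerically smaller means higher LRU rank
            ranks[other] = rank + 1
    ranks[way] = 0


def update_on_miss(ranks):
    """Pick the way with the lowest LRU rank as the victim, then treat it as just used."""
    victim = max(ranks, key=ranks.get)
    update_on_hit(ranks, victim)   # all other ways outranked the victim
    return victim


# Assumed example Used table: way -> LRU rank (0 = highest, 7 = lowest).
initial = {"U0": 2, "U1": 5, "U2": 0, "U3": 6, "U4": 4, "U5": 1, "U6": 7, "U7": 3}

after_miss = dict(initial)
victim = update_on_miss(after_miss)   # U6 is written out and refilled
print(victim, after_miss)

after_hit = dict(initial)
update_on_hit(after_hit, "U4")        # only ranks 0 to 3 are shifted
print(after_hit)
```

In both cases the ranks remain a permutation of 0 to 7, so exactly one way always holds rank 7 and is the next candidate for write-out.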

FIG. 9 is a flowchart for explaining the cache control of the second embodiment described above. The cache control unit 103 receives a data access request from a processor and checks the tag memory 206 for a hit on the requested data (S701). It determines whether the data managed for the processors has been hit (S702) and, if a hit is detected (S702: Yes), determines whether the access is a request to write data (S707). If a write is requested (S707: Yes), the data is written to the way (Way(n)) of the data memory corresponding to the tag (S710).

If it is determined in step S707 that the access is not a write (S707: No), the data cached in Way(n) is read and supplied to the processor that made the access (S708). The cache control unit 103 then updates the information in the tag memory 206 that indicates the access history and the like of this data, in accordance with LRU (S709).

If, on the other hand, it is determined in step S702 that the data does not hit (S702: No), the cache control unit 103 detects the way with the fewest accesses (Way(n)) and then uses the LRU algorithm to detect, within that way, the data for which the longest time has elapsed since a processor accessed it (S703). It then determines whether the detected data is dirty (S704). If the data is dirty (S704: Yes), it is written out of the data memory 207 (S705), and data is read from the external memory 105 into the area that was written out (S706).

FIG. 10 is a flowchart for explaining, within the processing shown in FIG. 9, the processing that changes the LRU ranks in the Used table. The cache control unit 103 looks up the data requested by a processor in the tag memory 206 and determines whether the requested data has hit in any way (S801). If it determines that a hit has occurred (S801: Yes), it updates the LRU rank of the hit way to 0 (S806).

Next, the cache control unit 103 sets a variable s to 0 (S807) and updates, in sequence, the LRU rank s of the ways of the plurality of processors to s+1 (S808). This update continues until the updated LRU rank reaches the LRU rank that the hit way had immediately before the hit (S809).
If, on the other hand, the data requested by processor k does not hit in the cache (S801: No), the cache control unit 103 detects the way with the lowest LRU rank and updates the LRU rank of that way to 0 (S802). The data read from the external memory 105 is cached in the way whose LRU rank has just been updated to 0.

The cache control unit 103 then sets the variable s to 0 (S808) and updates, in sequence, the LRU rank s of the ways of the plurality of processors to s+1 (S809). This update is performed for all ways.
According to the second embodiment described above, an LRU scheme suited to a multiprocessor with a plurality of ways can be realized and the cache hit rate can be raised. In addition, because the data memory and the tag memory are multi-port memories, a plurality of processors can access the cache memory simultaneously. The second embodiment can therefore improve the processing capability of the multiprocessor.

When a plurality of processors access the cache memory simultaneously, the LRU ranks can be updated, for example, by assigning priorities to the processors in advance and performing the updates in the order given by those priorities.
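The priority-ordered update for simultaneous accesses might look like the following Python sketch. The function name, the convention that a smaller priority number is served first, and the four-way example table are all assumptions. The source states only that the updates follow a pre-assigned processor priority.

```python
def apply_simultaneous_hits(ranks, requests, priority):
    """Apply LRU updates for hits that arrive in the same cycle.

    ranks:    way -> LRU rank (0 = highest)
    requests: list of (processor, way) pairs that hit simultaneously
    priority: processor -> priority number (smaller is served first, assumed)
    """
    order = sorted(requests, key=lambda req: priority[req[0]])
    for _processor, way in order:
        old_rank = ranks[way]
        for other in ranks:
            if ranks[other] < old_rank:
                ranks[other] += 1   # lower each way that outranked the hit way
        ranks[way] = 0
    return [processor for processor, _way in order]


ranks = {"U0": 0, "U1": 1, "U2": 2, "U3": 3}
served = apply_simultaneous_hits(
    ranks,
    requests=[(1, "U3"), (0, "U2")],
    priority={0: 0, 1: 1},
)
print(served, ranks)
```

Serializing the updates this way keeps the ranks a valid permutation even when two processors hit different ways at the same time. Under the assumed convention, the way accessed by the processor served last ends up with rank 0.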

FIG. 1 is a diagram showing a multithread processor provided with the cache memory of the first and second embodiments of the present invention.
FIG. 2 is a diagram for explaining the operation in which a plurality of processors dynamically change the threads they execute according to priority.
FIG. 3 is a diagram showing the configuration of the cache memory shown in FIG. 1 in more detail.
FIG. 4 is a flowchart for explaining the data read or write operation executed in the cache memory of the first embodiment.
FIG. 5 is a flowchart, provided for comparison with FIG. 4, for explaining the data read operation executed in a conventional cache memory.
FIG. 6 is a diagram showing the cache memory of the first embodiment also applied to an instruction cache and configured as a prefetch cache.
FIG. 7 is a diagram for explaining the structure of the data stored in the tag memory and the data memory in the second embodiment of the present invention.
FIG. 8 is a diagram for explaining the LRU processing of the second embodiment.
FIG. 9 is a flowchart for explaining the cache control of the second embodiment.
FIG. 10 is a flowchart for explaining the LRU update within the cache control of the second embodiment.

Explanation of symbols

101 multithread processor, 103 cache control unit, 105 external memory, 107 cache memory unit, 109 cache memory, 201 address control unit, 202 buffer management unit, 203 write buffer, 205 read buffer, 206 tag memory, 207 data memory, 208 hit detection unit

Claims

A cache memory that caches at least part of the data read from a storage device by a plurality of processors and supplies at least part of the cached data to the processors, the cache memory comprising:
data storage means in which the data cached by the plurality of processors is stored;
address management means for collectively managing, for all of the data stored in the data storage means, the addresses of that data in the data storage means;
hit detection means for comparing the address of data that a processor requests to be supplied with the addresses managed by the address management means, and detecting whether the requested data can be read from the data storage means;
data supply means for supplying the detected data to the processor when the hit detection means detects that the data can be read;
a read buffer for temporarily storing only data read from the data storage means until the data is output to the processor;
a write buffer for temporarily storing only data sent from the processor until the data is written to the data storage means; and
a cache control unit that compares the data stored in the read buffer with the data stored in the write buffer and, when a mismatch is detected between data that should be identical, makes the data stored in the read buffer and the data stored in the write buffer consistent, either by updating the data stored in the read buffer to the data written to the data storage means, or by once invalidating the data stored in the read buffer, writing the data from the write buffer to the data storage means, and then reading the written data.

The cache memory according to claim 1, wherein, when the hit detection means detects that the data can be read, the data supply means supplies the detected data to the processor and also supplies data including data contiguous with the data supplied to the processor to a buffer that temporarily stores data read from the data storage means.

The cache memory according to claim 1, further comprising prefetch data storage means for caching data that the processor is expected to request.

The cache memory according to any one of claims 1 to 3, wherein the data management unit manages the addresses of the data storage means as a plurality of ways, assigns to each way a storage priority, which is the priority with which the data stored in the data storage means is retained there, and determines the storage priority assigned to each way based on the state of access to the data managed by that way.

The cache memory according to any one of claims 1 to 4, wherein at least one of the data storage means and the data management unit is a multi-port memory.

A processor comprising a cache memory that caches at least part of the data read from a storage device by a plurality of processors and supplies at least part of the cached data to the processors,
wherein the cache memory comprises:
data storage means in which the data cached by the plurality of processors is stored;
address management means for collectively managing, for all of the data stored in the data storage means, the addresses of that data in the data storage means;
hit detection means for comparing the address of data that a processor requests to be supplied with the addresses managed by the address management means, and detecting whether the requested data can be read from the data storage means;
data supply means for supplying the detected data to the processor when the hit detection means detects that the data can be read;
a read buffer for temporarily storing only data read from the data storage means until the data is output to the processor;
a write buffer for temporarily storing only data sent from the processor until the data is written to the data storage means; and
a cache control unit that compares the data stored in the read buffer with the data stored in the write buffer and, when a mismatch is detected between data that should be identical, makes the data stored in the read buffer and the data stored in the write buffer consistent, either by updating the data stored in the read buffer to the data written to the data storage means, or by once invalidating the data stored in the read buffer, writing the data from the write buffer to the data storage means, and then reading the written data.

The processor according to claim 6, wherein each of the plurality of processors executes processing thread by thread and can replace the thread being executed with another thread during execution of the processing.