JP4307604B2

JP4307604B2 - Computer circuit system and method using partial cache cleaning

Info

Publication number: JP4307604B2
Application number: JP37702898A
Authority: JP
Inventors: Gerard Chauvel; Serge Lasserre; Dominique D'Inverno
Original assignee: Texas Instruments Incorporated
Priority date: 1998-12-07
Filing date: 1998-12-07
Publication date: 2009-08-05
Anticipated expiration: 2018-12-07
Also published as: JP2000172563A

Description

【０００１】
【発明の属する技術分野】
本実施例は、１つまたはそれ以上のキャッシュメモリを実現する計算機使用環境に関する。
【０００２】
【発明が解決しようとする課題】
キャッシュ回路は、（たとえば、マイクロプロセッサなど）現代の計算システムにおいて、情報にアクセスするために必要な時間の潜在的な長さを短縮することによって、システムの性能を高めるために頻繁に使用される重要な構成要素である。普通、キャッシュ回路には、一般にランダムアクセスメモリ（RAM）であるタグメモリなど、各種の構成要素が含まれている。タグＲＡＭは、一般に個別のキャッシュ・データＲＡＭに格納されたキャッシュデータに対応する、いわゆるタグ情報を格納する。タグ情報には（たとえば、外部メモリ構造体のような）何らかの別のメモリデバイス内でキャッシュに入れられたデータ（cached data）を見つけることができる実際のアドレスのように、キャッシュに入れられたデータに対応する各種の特長が含まれている。回路の他の構成要素は、タグＲＡＭに関連しているヒット検出回路である。（Ｎウエイ・セット・アソシアティヴ・キャッシュ回路（N-way set associative cache circuit）にはこの回路がｎ個ある）ヒット検出回路は、タグ情報の一部として格納されている実際のアドレスと、着信アドレス（incoming address）を比較する。この比較が一致すると、キャッシュ回路にヒットがあるという。つまり、キャッシュ・データＲＡＭから、この着信アドレスで探索されるデータを直接検索することができ、反対にこの比較が一致しなければ、キャッシュ回路でミスがあるという。つまり、この着信アドレスで探索されるデータはキャッシュ・データＲＡＭの中に位置していないか、何らかの別の理由であてにできない。キャッシュ・ミスの場合は、メイン・メモリ（つまり外部メモリ）など、メモリ階層の上位メモリか、システムの上位レベルに位置している別のキャッシュメモリからデータを検索しなければならない。したがって、キャッシュ・ミスの後のデータアクセスは、キャッシュ・ヒットの場合のアクセス時間に比較して、非常に長い時間が必要になる。実際、外部メモリから検索するためのアクセスの場合、キャッシュ・ヒットが発生したときのアクセス時間に比較すると、所要時間はかなり長くなるであろう。
【０００３】
上述の説明は、キャッシュ・メモリが有利であると一般に考えられていることを示しているが、計算機および計算機使用環境がさらに複雑になると、キャッシュ動作をもっと詳細に精査して、さらに能率が追加するか否かを決定する必要がある。この点に関し、本発明者は、キャッシュ回路のある種の動作のコンテクストにおいて、いくつかのクロックサイクルを減少させることができることを確認している。キャッシュ動作に関連して使用されるクロックサイクルを減少させると、システムの速度が改善する。またこのクロックサイクルの減少は、携帯形コンピュータなど、多くの現代のシステムで大きな問題になっているシステム全体の電力消費を減少させることにもなる。
【０００４】
【発明が解決しようとする課題】
１つの好適実施例の中に、計算システムを動作させる方法がある。この計算システムにはキャッシュ・メモリが含まれており、このキャッシュ・メモリには所定の数のキャッシュ・ラインがある。最初にこの方法は、複数の書込みアドレスに対して、その複数の書込みアドレスのそれぞれに対応する位置にあるキャッシュ・メモリにデータを書込む。次にこの方法は、キャッシュ・メモリ内の選択した数のラインをクリーニングする。クリーニングするステップは、選択した数のラインのそれぞれに対して、そのラインにおけるデータに対応するダーティ・インジケータを評価して、ダーティ・インジケータがそのライン内のデータがダーティであることを示している場合は、そのラインから他のメモリにデータをコピーする。最終的に、選択されたクリーニングするラインの数は、所定のキャッシュ・ラインの数より少ない。また、その他の回路、システムおよび方法が開示されているとともに請求の範囲に明記されている。
【０００５】
【発明の実施の形態】
図１は、一般的な無線データプラットホーム１０の好適実施例を示しており、この無線データプラットホームの中で、この本明細書中に説明されている各種のキャッシュの実施例を実現することができるとともに、この無線データプラットホームは、たとえば、スマートホン（Smartphone）または携帯コンピュータの実現に使用することができる。無線データ・プラットホーム１０には、それぞれ対応する命令メモリ管理ユニット（ＭＭＵ）１２ｃ、１２ｄを備えた命令キャッシュ１２ａおよびデータキャッシュ１２ｂを有する汎用（ホスト）プロセッサ１２が含まれており、バッファ回路１２ｅおよび動作コア１２ｆも示されているが、これらの回路はすべてシステムバスＳＢＵＳを使用して交信する。ＳＢＵＳには、データＳＢＵＳｄ、アドレスＳＢＵＳａおよび制御ＳＢＵＳｃの導線が含まれている。（示されていない）自身の内部キャッシュを有するディジタル信号プロセッサ（ＤＳＰ）１４ａおよび周辺装置インターフェース１４ｂがＳＢＵＳに接続されている。示されていないが、ディジタル・アナログ変換器（DAＣ）またはネットワークインターフェースを含む各種周辺装置を周辺装置インターフェース１４ｂに接続することができる。ＤＳＰ１４ａおよび周辺装置インターフェース１４ｂは、ＤＭＡインターフェース１６に接続されており、ＤＭＡインターフェース１６はさらにＤＭＡコントローラ１８に接続されている。またＤＭＡコントローラ１８は、ＬＣＤまたはビデオディスプレー２２と交信するビデオまたはＬＣＤコントローラ２０と同様、ＳＢＵＳに接続されている。ＤＭＡコントローラ１８は、アドレスバス２４ａ、データバス２４ｄおよび制御バス２４ｃを介してメイン・メモリに接続されているが、この好適実施例のメインメモリは同期ダイナミックランダムアクセスメモリ（ＳＤＲＡＭ）２４である。同様にＤＭＡコントローラ１８は、アドレスバス２６ａ、データバス２６ｄおよび制御バス２６ｃを介して、１つ（または複数）のフラッシュメモリ２６に接続されている。
【０００６】
無線データ・プラットホーム１０の一般的動作の側面は、無線データ・プラットホーム１０が汎用プロセッサ１２とＤＳＰ１４ａとの双方を使用していることを説明することによって、本発明の概念と関連して理解される。このように、１つのメモリを共用する複数のコアがあるので、後で説明する本発明の方法は、マルチコアシステムなどのシステム性能に各種の改善をもたらすことが理解できるであろう（これは、無線データ・プラットホーム１０とは別のシステムの場合であってもよい）。また、以下に説明する本発明の側面の多くは、単一プロセッサシステムの動作を改善することもできることに注意されたい。
【０００７】
次に本好適実施例のキャッシュの側面に注意を向けると、図２は、例として、図１の汎用プロセッサ１２のデータキャッシュ１２ｂのアーキテクチャを示している。これの構造を詳細に説明する前に、本発明の各種の教示は、命令キャッシュ１２ａ、ＤＳＰ１４ａの１つまたは両方のキャッシュ、または（たとえば、一体化されたキャッシュのような）プラットホーム１０内のさらに別のキャッシュなど、他のキャッシュと関連して実現されうることが理解されるはずである。また、以下に説明する本発明の各種教示は、キャッシュ・メモリの恩恵を受けるであろうスマートホン、ＰDA、パームトップコンピュータ、ノートブックコンピュータ、デスクトップコンピュータなどを含む処理装置とともに使用されうる。最後に、データキャッシュ１２ｂに関して各種の詳細が以下に示されるが、（たとえば、セット・アソシエーション、アレイ・サイズ、アドレスおよび記憶装置の長さなど）これらの詳細の多くは、説明を判りやすくする目的にすぎないことに注意されたい。
【０００８】
次に図２に示すデータ・キャッシュ１２ｂの詳細に注目すると、データ・キャッシュ１２ｂには、メモリアドレスを受信するキャッシュ制御装置２８が含まれているが、この場合のメモリアドレスは３２ビットのデータアドレスDA[31:0]の一部分であり、この受信される部分には、メモリアドレスを受信するキャッシュ制御装置２８が、３２ビットのアドレスのビット「4」からビット「11」を受信することを示すビット「[11:4]」と、これと同様、３２ビットのアドレスのビット「0」からビット「1」を受信することを示すビットDA[1:0]とが含まれている。キャッシュ制御装置２８は、バーチャル・タグ・アレイ３０vに接続されており、データ・アレイ３２のラインに対応するタグを格納する。この点と後で行う考察の説明に関しては、バーチャル・タグ・アレイ３０vはデータ・アレイ３２の中の各ラインのダーティ・ビットを格納するが、この場合、ダーティ・データの表示は、データ・アレイ３２にもってこられたデータが変更されたが、メモリシステム（たとえば、メイン・メモリ）の上位メモリに、その変更されたコピーが出力されていないことを表していることは、キャッシュ技術において公知である。またデータ・アレイ３２の各ラインに対応して含まれている表示は、ＬＲＵインジケータ・アレイ３４内のＬＲＵビット（最近少しも使用されていないことを示すビット）および有効性インジケータ・アレイ３６内の有効性ビットである。
【０００９】
好適実施例においては、データ・キャッシュ１２ｂは２ウエイセットアソシアティブキャッシュ（２way set associative cache）として配列されているので、タグ・アレイ３０vには、それぞれ２つのプレイン３０ａv、３０ｂvがある。同様にデータ・アレイ３２には、２つのメモリプレイン３２ａ、３２ｂがある。図示の実施例においては、各プレイン３２ａ、３２ｂは１０２４×３２ビット（すなわち４バイト）であるから、１６バイトのラインを形成するためには４つの連続したアドレスを必要とする。プレイン３０ａv、３０ｂvの出力は、それぞれの比較器３８ａ、３８ｂに出力される。DA[31:12]も両比較器３８ａ、３８ｂに接続されている。各比較器３８ａ、３８ｂは、それぞれウエイ１ヒット（Hit way1）およびウエイ２ヒット（Hit way2）と命名された１ビットの出力を発生させる。ウエイ１ヒット信号およびウエイ２ヒット信号は、それぞれの転送ゲート４０ａ、４０ｂの制御入力に接続され、各転送ゲートは、出力としてアドレスが指定されたデータ・アレイ３２ビットのデータDD[31:0]を与える。
【００１０】
セットアソシアティブキャッシュの動作は当技術では公知であるから、キャッシュ・クリーニング・プロセスに、より詳細に関連して後で行う考察で理解できる詳細な説明のコンテクストを与えるために、ここでは読取り動作だけについて要約されている。読取り動作に注目すると、メモリアクセスのためにアドレスDA[31:0]が受信されると、アドレスビットDA[11:4]は、バーチャル・タグ・アレイ３０vの各プレインプレイン３０ａv、３０ｂvへのアドレスとして使用される。各プレイン３０ａv、３０ｂvは、そのアドレスに対応して、タグ・ビットＴａｇ＿DA[31:12]を出力するが、この場合、そのタグにはデータ・アレイ３２に格納されるデータのアドレスの表示が含まれている。次にビットDA[31:12]は、一致（すなわちヒット）するか否かを決定するため比較器３８ａ、３８ｂを介してそのタグと比較され、一致する場合は、比較器３８ａ、３８ｂのどちらかの出力が、それぞれウエイ１ヒット信号またはウエイ２ヒット信号を動作可能にする。この同じプロセス中に、この例ではビットDA[11:4]になっているアドレスのインデックス部がデータ・アレイのプレイン３２ａ、３２ｂに適用される。したがって、両プレインが、そのインデックスからの情報を出力して、ウエイ１ヒット信号またはウエイ２ヒット信号が動作可能になると、出力データDD[31:0]としてこれらのプレインのどれか１つの出力が現れる。いうまでもなく、キャッシュミスが発生すると（つまり、ウエイ１ヒット信号もウエイ２ヒット信号も動作可能にならないと）、メモリ階層におけるキャッシュ１２ｂより上位のメモリからアドレス指定された情報が探索される。最後にタグメモリ内の各メモリアドレスに、アレイ・インジケータ３６内の対応する有効性ビットがあるようになることを想起されたい。これらのビットは、キャッシュ内の対応する位置にあるデータが有効であるか否かを表示する。ＬＲＵアレイ・インジケータ３４内のビットは、キャッシュミスの後、プレイン３２ａ、３２ｂのどのラインを更新するか決定する。
【００１１】
データ・キャッシュ１２ｂにはキャッシュ・クリーン機能も含まれているが、この機能は、次に図３のブロック図によって最初に機能的に詳細に説明するように、キャッシュ動作の能率を大幅に改善することができる。特に図３は、キャッシュ・クリーン機能に関する限り、キャッシュ制御装置２８を非常に詳細に示している。キャッシュ制御装置２８には、I_MAXと命名されたアドレスの値を格納するためのアドレス・レジスタ４２が含まれており、後で判るように、このアドレス・レジスタ４２は、次に説明する追加回路によって制御され、データアドレス・インデックス（つまりDA[11:4]）のいくつかのコピーを格納する。アドレス・レジスタ４２のアドレス入力は転送ゲート（passgate）４４の出力に接続されており、このゲートには、アドレス・インデックスDA[11:4]を受信するために接続されているデータ入力がある。また、アドレス・インデックスDA[11:4]は比較器４６の入力に接続されており、さらに比較器４６はアドレス・レジスタ４２に格納されたI_MAXの値を受信するように接続されている。以下、詳細に説明する理由から、データ・キャッシュ１２ｂへの書込みに応答してキャッシュヒットが発生すると、比較器４６は、I_MAXの値が着信アドレス・インデックスDA[11:4]より大きいか否かを決定し、大きい場合は、転送ゲート４４の制御入力を動作可能にするので、その時の着信アドレス・インデックスDA[11:4]がアドレス・レジスタ４２にコピーされて、I_MAXの値を更新する。
【００１２】
図３について補足すると、キャッシュ制御装置２８にはキャッシュ・クリーン処理回路４８が含まれており、この回路４８は、バーチャル・タグ・アレイ３０vから与えられる１つの入力としてダーティ・ビットを受信するために接続されているとともに、以下、詳細に説明する機能を動作可能にするＣＡＣＨＥ＿ＣＬＥＡＮ信号を受信するために接続されている。実際にＣＡＣＨＥ＿ＣＬＥＡＮ信号は、レジスタ４２内のI_MAXの値をクリアーするために接続されることに注意されたい。またレジスタ４２内のI_MAXの値は、キャッシュ・クリーン処理回路４８に対する入力である。キャッシュ・クリーン処理回路４８の構造は、次に述べる図４、５の説明から判るように、キャッシュ・クリーン処理回路４８の機能が与えられている各種の代替回路から当業者によって選択されうる。
【００１３】
図４は、全体を参照番号５０で示す方法の流れ図を示しており、この流れ図はデータ・アレイ３２の書込みに関するキャッシュ制御装置２８の好適動作を説明しているが、かかる方法の大部分は、図３に示す回路ブロックの動作を介して達成される。方法５０はステップ５２で開始し、ここで（レジスタ４２内の）I_MAXの値がゼロにクリアーされる。好適実施例においては、ＣＡＣＨＥ＿ＣＬＥＡＮ信号を立ち上げることによって、このステップを達成することができることに注意されたい。さらにこの点について、本実施例が、コンテクストスイッチを含む動作に関連して改善をもたらしていることは、方法５０の考察の結論によって理解できるであろう。示されていないが、実際にステップ５２は、第１のコンテクストスイッチに対する応答に伴うデータ・キャッシュ１２ｂの初期化の重要な役割であろう。コンテクストスイッチは当業者には公知の用語であるにしても、これらの代替方法を説明した後では、コンテクストスイッチの意味を吟味することは、本明細書の読者のために役立つであろう。コンテクストスイッチは、オペレーティングシステムによって頻繁に実行される外部割り込みまたはクロックタイマーの満了など、いろいろな事象（events）に応答して発生する。このスイッチがプロセスの変更に関連していることは、プラットホーム１０または各種動作が複数のプロセスに分かれている、他のプロセッサに制御されるシステムの中で判ることである。各プロセスは各種の事項（matters）によって定義されており、しかもこれらの事項には、プロセスによって使用されるメモリの領域、プロセスの入出力マッピング、アドレス変換のようなプロセスのメモリ管理、および通常は汎用レジスタに格納される値を特長づけるプロセスが含まれることが多い。コンテクストスイッチは、現在のプロセスが新しいプロセスに変更される場合に発生する。このため、次のプロセス（または、いくつかの別のプロセス）が正しく動作するように、この最新のプロセスに関するこれらの側面のそれぞれを説明する情報を格納する必要がある場合、現在の最新プロセスであることが、もう一度切り替えられると、そのプロセスが再び最新プロセスになるように、この情報を検索することができる。
【００１４】
次に方法５０に戻ると、ステップ５２の後、流れはステップ５４に続く。ステップ５４は、データ・キャッシュ１２ｂを含むメモリシステムに書込みアドレスが発行されることを表している。したがって、図１を簡単に振り返ってみると、ＳＤＲＡＭ２４にデータを書込むためコア１２ｆがアドレスを発行するとステップ５２の実例が発生するのであるから、ＳＤＲＡＭ２４は、下位レベルにあるデータ・キャッシュ１２ｂを含むメモリシステムの中では上位にあることに注意されたい。次に方法５０はステップ５４からステップ５６に続く。
【００１５】
ステップ５６は、ステップ５４で発行された書込みアドレスに応答して、データ・キャッシュ１２ｂでヒットが発生するか否かを決定する。キャッシュヒットが発生しないと（つまりキャッシュミスが発生すると）、方法５０はステップ５６からステップ５８に続く。逆に、キャッシュヒットが発生すると、方法５０はステップ５６からステップ６０に続く。これらの代替経路のそれぞれを以下に説明する。
【００１６】
ステップ５８の場合に注目するとともに、キャッシュミスに応答してステップ５８が発生することを認識すると、ステップ５８は、キャッシュ技術で公知の方法と同様に単独で動作する。特にステップ５８は、データ・キャッシュ１２ｂ以外の記憶回路の中のアドレス位置にデータを書込む。たとえばプラットホーム１０においては、この書込みはＳＤＲＡＭ２４内の適切なアドレスに対して行われる。
【００１７】
キャッシュヒットが発生すると実行されるステップ６０の場合に注目すると、ステップ６０は、現在のアドレス・インデックスDA[11:4]の値がI_MAXの値より大きいか否かを決定する。しばらく図３に戻ると、ステップ６０の動作は比較器４６によって達成されうることに注意されたい。アドレス・インデックスDA[11:4]の値がI_MAXの値より大きい場合、方法５０はステップ６０からステップ６２に続き、そうでない場合は、方法５０はステップ６４に進むが、まずステップ６２の動作を検討した後、これについて以下に説明する。アドレス・インデックスDA[11:4]の値がI_MAXの値より大きいため、すでに到達しているステップ６２は、新しいI_MAXの値として、最新のアドレス・インデックスDA[11:4]を格納する。この点について、２つの事項に注意されたい。第１に、I_MAXの値がステップ５２でクリアーされてから、初めてステップ６０に到達し、かつアドレス・インデックスDA[11:4]の値がゼロでない場合、ステップ６０は、この方法の流れをステップ６２に移しているはずであるから、I_MAXの値は、最新のアドレス・インデックスまで増加している。第２に再び図３に戻ると、ステップ６２は、比較器４６の出力と、転送ゲート４４に対する比較器４６の制御とによって実行される。特にステップ６０を実行する際、比較器４６が、DA[11:4]がI_MAXの値以上になっていると決定すると、レジスタ４２にDA[11:4]がコピーされて、DA[11:4]が、I_MAXの新しい値になるように、比較器４６の出力が転送ゲート４４を動作可能にする。次に方法５０はステップ６２からステップ６４に続く。
【００１８】
ステップ６４は、ステップ５４で指定されたアドレスで、データ・アレイ３２に問題のデータを書込む。また、バーチャル・タグ・アレイ３０v内のダーティ・ビットと書込まれたデータに対応するキャッシュ・ラインとは、ダーティの状態に設定される。次に方法５０はステップ６４からステップ６６に続く。ステップ６６は待ち合わせ状態を表しており、ここで、方法５０は２つの事象のうちの１つを待ち合わせるが、その２つの事象は、別の書込みアドレスまたはコンテクストスイッチの発行である。別の書込みアドレスが発行されると、方法５０はステップ６６からステップ５４に戻る。そのとき、先行ステップが再び発生し、新しく発行された書込みアドレスのインデックスがI_MAXの最新の値より大きい場合、そのインデックスがI_MAXの新しい値になるであろうことは、当業者には理解できるであろう。実際にこのループ動作は、連続する多数の書込みに対して発生し、そのたびに先行ステップが動作するため、I_MAXは増加するであろう。次に最新のコンテクストスイッチの効果に注目すると、方法５０は、先行ステップが続き、ステップ６６からステップ６８に続くので、この時のI_MAXの値は、最後のコンテクストスイッチ以降かつ最新のコンテクストスイッチの前に書込まれているアドレス・インデックスの最大値を表すことは、上述のことから理解されるはずである。
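[0019]
Step 68 represents the cache clean process which, as shown below, can dramatically improve performance and efficiency compared with the prior art. Specifically, step 68 represents a loop over a value L that starts at I_MAX and decreases until L equals 0; for each value of L, step 70 occurs and the cache line whose address equals L is cleaned. In other words, L starts at I_MAX and is decremented each time the flow reaches step 68; the flow continues to the cleaning operation of step 70 and then loops back to step 68 for the next iteration, until L equals 0. Turning to step 70, cleaning a cache line is known in the art and includes evaluating the tag (or tags) of the line to determine whether any data in the line is dirty. In the present embodiment this operation is controlled by cache clean processing circuit 48 of FIG. 3, which is enabled by the CACHE_CLEAN signal. The process determines whether the line contains dirty data and, if so, writes that data (or the entire line) to a higher memory. Conversely, if the dirty bit (or dirty bits) for a given line indicates that the entire line is clean, the data line corresponding to that dirty bit (or those dirty bits) is not written out to the higher memory.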
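The loop of steps 68 and 70 can be sketched in Python as follows. This is a minimal illustration rather than the circuit of FIG. 3: the dictionary layout of a line, the name `clean_down_from`, and the clearing of the dirty flag after write-back are assumptions introduced here.

```python
def clean_down_from(lines, i_max, upper_memory):
    """Clean cache lines I_MAX, I_MAX-1, ..., 0 and return the number written back.

    lines: list of dicts with keys "tag", "data" and "dirty" (assumed layout).
    upper_memory: dict standing in for the higher memory.
    """
    written_back = 0
    for l in range(i_max, -1, -1):            # step 68: L counts down from I_MAX to 0
        line = lines[l]
        if line["dirty"]:                     # step 70: evaluate the dirty indicator
            upper_memory[(line["tag"], l)] = line["data"]
            line["dirty"] = False             # assumption: the line is clean afterwards
            written_back += 1
    return written_back


# Sixteen lines, of which lines 2, 5 and 9 are dirty; I_MAX is 5.
lines = [{"tag": 0, "data": i, "dirty": i in (2, 5, 9)} for i in range(16)]
memory = {}
count = clean_down_from(lines, 5, memory)
```

Line 9 lies above I_MAX in this toy setup, so the loop never visits it; in the method itself a dirty line above I_MAX cannot arise, because every write hit raises I_MAX to at least its own index.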
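[0020]
From the above, and in particular from the effect of I_MAX and steps 68 and 70, one skilled in the art will appreciate that data cache 12b is cleaned after a context switch, but that the cleaning process covers only cache address 0 through the highest cache address written before the context switch (that is, the address stored in I_MAX). The approach is best understood through an example. Assume that after step 52 there are five successive cache writes to index addresses 0, 2, 4, 6 and 8, respectively, and that these writes are followed by a context switch. At that point I_MAX equals 8, so steps 68 and 70, in response to the context switch, clean data array 32 only from address 0 through address 8. Note that this operation differs sharply from the prior art. In the prior art the entire cache is cleaned in response to a context switch: every cache line is evaluated to determine whether its contents are dirty and, if so, the dirty contents are written out to a higher memory. Given this difference, one skilled in the art will appreciate that the present embodiment is considerably more efficient. For illustration, and returning to the example of five successive addresses, assume the cache contains addresses up to 255. In that case the prior art spends additional time evaluating and cleaning each of addresses 9 through 255, which follows because there is one clock cycle for each addressable line. In contrast, the preferred embodiment stops the cleaning operation at some point short of the entire cache; in the embodiment just described, the stopping point is reached once the highest address written before the context change has been cleaned (address 8 in the current example). The total number of clock cycles required for the cleaning operation can therefore be reduced substantially, and the reduction in clock cycles also lowers overall power consumption. Moreover, in an environment where context switches occur frequently, as is the case in practice for platform 10, the efficiency gain of the preferred embodiment accumulates with every context switch. The overall gain is especially large when only a few cache writes occur between context switches.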
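The saving in the worked example can be put into numbers. The short Python sketch below assumes, as the description states, one clock cycle of evaluation per addressable line; it ignores the cost of the write-backs themselves, and all variable names are illustrative.

```python
TOTAL_LINES = 256                 # addresses 0 through 255
writes = [0, 2, 4, 6, 8]          # index addresses written before the context switch

i_max = max(writes)               # value held in I_MAX at the switch
partial_clean = i_max + 1         # lines 0..I_MAX evaluated by the embodiment
full_clean = TOTAL_LINES          # lines evaluated when the whole cache is cleaned
saved = full_clean - partial_clean
```

Under these assumptions the partial clean evaluates 9 lines instead of 256, so 247 line evaluations are avoided at this one context switch.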
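[0021]
Note also, from the above discussion of step 68, that step 68 carries the cleaning process down to address 0. This approach is desirable because it is independent of the cache size. In every case, therefore, it is assumed for this approach that once the cleaning operation down to address 0 is complete, the lines holding dirty data have been written out to main memory. Given this observation, however, note that two alternative approaches are available for cases where data at or near address 0 is not expected to have changed. Each of these two alternatives is described below.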
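[0022]
In the first alternative to the I_MAX approach discussed above, a second address value is established to determine the lowest address index that produces a cache hit for a given context, and this second value is used together with the value of I_MAX. Designating this second value I_MIN, it may initially be set to a large value (for example, the highest address of the cache) and then reduced to the lowest address index that produces a cache hit during the given context. As an example, assume that the highest index address of the cache is 255, that there are five successive cache writes to index addresses 8, 16, 24, 32 and 40, respectively, and that these writes are followed by a context switch. In this example I_MAX initially equals 0 and I_MIN initially equals 255. Over the five accesses, I_MAX increases with each access until it equals 40. In contrast, the first access, to address 8, reduces I_MIN to 8, while the remaining accesses are to index addresses higher than the updated value of I_MIN and therefore do not affect it. To conclude this alternative, step 68 would be modified so that step 70 cleans every line from the address in I_MIN through the address in I_MAX, inclusive, so that once again fewer lines than the total number of lines in the cache are cleaned.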
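The first alternative can be sketched in Python as below. It is a minimal model under stated assumptions: the function names `track_bounds` and `lines_to_clean` are hypothetical, and the write indices are taken to be cache hits so that each one updates both bounds.

```python
def track_bounds(hit_indices, top_index=255):
    """Return (I_MIN, I_MAX) after a sequence of write hits.

    I_MIN starts at the highest cache index and only moves down;
    I_MAX starts at 0 and only moves up.
    """
    i_min, i_max = top_index, 0
    for index in hit_indices:
        if index < i_min:
            i_min = index
        if index > i_max:
            i_max = index
    return i_min, i_max


def lines_to_clean(i_min, i_max):
    """Indices visited by the modified cleaning loop, I_MIN through I_MAX inclusive."""
    return range(i_min, i_max + 1)


i_min, i_max = track_bounds([8, 16, 24, 32, 40])
window = lines_to_clean(i_min, i_max)
```

For the example in the text the window runs from 8 to 40, which is 33 lines. A side effect of the chosen initial values is that with no write hits at all the range is empty, so nothing is cleaned.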
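[0023]
In the second alternative to the I_MAX approach discussed above, a different address value is established to determine the lowest address index that produces a cache hit for a given context, and this value is used alone, with cleaning proceeding up to the highest address of the cache. Put simply, this is the reverse of the process that uses I_MAX. Again designating this value I_MIN, it may initially be set to a large value (for example, the highest address of the cache) and then reduced to the lowest address index that produces a cache hit during the given context. When step 68 is executed, however, it would be modified so that step 70 cleans every line from the address value in I_MIN up to the top of the cache, that is, the highest address of the cache.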
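To compare the three ranges described so far, the following Python sketch counts how many lines each one visits. The strategy labels and the function name `lines_cleaned` are made up for this illustration, and a cache with index addresses 0 through 255 is assumed.

```python
def lines_cleaned(strategy, i_min, i_max, top_index=255):
    """Number of cache lines visited by the cleaning loop for a given strategy."""
    if strategy == "i_max_only":        # address 0 through I_MAX
        return i_max + 1
    if strategy == "window":            # I_MIN through I_MAX
        return max(0, i_max - i_min + 1)
    if strategy == "i_min_to_top":      # I_MIN through the highest cache address
        return top_index - i_min + 1
    raise ValueError(f"unknown strategy: {strategy}")


# Writes clustered near the bottom of the cache (the example from the text).
low = {s: lines_cleaned(s, 8, 40) for s in ("i_max_only", "window", "i_min_to_top")}

# Writes clustered near the top of the cache (a made-up second example).
high = {s: lines_cleaned(s, 200, 250) for s in ("i_max_only", "window", "i_min_to_top")}
```

The counts suggest when each variant pays off: cleaning from I_MIN to the top is the poorest choice for writes near address 0 and a good one for writes near the top, while the I_MIN to I_MAX window is never worse than either single-bound variant.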
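[0024]
Because the value of I_MAX bounds the loop operation, the scope of the invention also includes another embodiment, shown in FIG. 5. In this alternative, designated method 50a, the function of comparator 46 of FIG. 3 is not used; instead, an address value that is associated with the current context and maintained by the operating system is used to determine the value of I_MAX at the time of the context switch. This difference is described below with reference to step 72; the remaining steps shown in FIG. 5 are the same as those shown in FIG. 4.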
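[0025]
Referring to method 50a, steps 52 through 66 were discussed above and are not described again in detail. Turning to step 72, it sets the value of I_MAX, but here the value is based on a value that the operating system can access during execution. In particular, some operating systems maintain a maximum cache line value for a given context. Thus, once the condition of step 66 is satisfied, the operating system has available the highest cache line address corresponding to the context that is ending (that is, the context being switched away from). Step 72 sets I_MAX equal to this highest cache line address. Then, when method 50a continues to step 68, and provided that this highest cache line address is less than the total number of cache lines, the loop formed by steps 68 and 70 again cleans data cache lines such that the number of lines cleaned is less than the total number of cache lines.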
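Method 50a can be sketched as follows. The description does not say how an operating system stores the per-context value, so the `context_table` dictionary, the context identifiers and both function names are purely hypothetical stand-ins.

```python
def i_max_from_os(context_table, outgoing_context):
    """Step 72: take I_MAX from a value the operating system keeps per context.

    context_table: hypothetical mapping of context id -> highest cache line address.
    """
    return context_table[outgoing_context]


def clean_on_switch(lines, context_table, outgoing_context, upper_memory):
    """Clean lines I_MAX..0 for the context being switched away from."""
    i_max = i_max_from_os(context_table, outgoing_context)
    for l in range(i_max, -1, -1):              # steps 68 and 70
        if lines[l]["dirty"]:
            upper_memory[l] = lines[l]["data"]
            lines[l]["dirty"] = False           # assumption: line is clean afterwards
    return i_max + 1                            # number of lines evaluated


lines = [{"data": i, "dirty": i == 3} for i in range(256)]
table = {"context_a": 15}
memory = {}
evaluated = clean_on_switch(lines, table, "context_a", memory)
```

In this sketch only 16 of the 256 lines are evaluated, without any comparator tracking the writes, which is the trade the alternative makes: less hardware, but a bound that is only as tight as the value the operating system maintains.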
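[0026]
From the above, it will be appreciated that the foregoing embodiments reduce the number of clock cycles associated with the cache cleaning operation that follows a context switch, and provide various improvements over the prior art. Although the present embodiments have been described in detail, note that various substitutions, modifications or alterations can be made to the description above without departing from the scope of the invention. For example, in the preferred embodiment the occurrence of a context switch triggers the reset of I_MAX, and a subsequent context switch bounds the value of I_MAX; one skilled in the art can ascertain, however, that some other first event could reset the value of I_MAX, a second event could end the upward adjustment of I_MAX, and the cache could thereafter be cleaned from some minimum address up to the last saved value of I_MAX. As another example, FIGS. 4 and 5 illustrate generally sequential methods by way of flow charts, but it should be understood that various circuits, such as a state machine that performs these steps, can be used to implement the operations, so that the flow may pass from each state to alternative states rather than proceed sequentially as the flow charts show. As yet another example, data cache 12b has been used to illustrate the various aspects, but many of the present teachings also apply to various other cache architectures. As a final example, platform 10 is illustrative only; it should be understood that platform 10 can be further modified and that many of the inventive aspects can be implemented in other systems having one or more cache memories. Accordingly, the foregoing description, these examples, and other matters ascertainable by one skilled in the art having the present teachings should serve to illustrate the scope of the invention as defined by the claims.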
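[Brief description of the drawings]
[FIG. 1] A block diagram of a wireless data platform in which the present embodiments can be implemented.
[FIG. 2] A block diagram of a cache architecture that can be used in the platform of FIG. 1 and in other processing devices.
[FIG. 3] A block diagram of the portions of the cache controller of FIG. 2 that are used in the cache cleaning method of the preferred embodiment.
[FIG. 4] A flow chart of a first embodiment that reduces the clock cycles required for the cache clean that occurs during a context switch of the general purpose processor of FIG. 1, in which the extent of the cache clean is determined by the highest address written to the cache before the context switch.
[FIG. 5] A flow chart of a second embodiment that reduces the clock cycles required for the cache clean that occurs during a context switch of the general purpose processor of FIG. 1, in which the extent of the cache clean is determined by the highest address available to the operating system for writing to the cache before the context switch.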
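[Explanation of reference numerals]
10 Data platform
12 Processor
12a Instruction cache
12b Data cache
12c, 12d Instruction memory management units (MMU)
12e Input buffer circuit
12f Operating core
14a DSP core
14b Peripheral interface
16 DMA interface
18 DMA controller
18a FIFO
18b Storage device
18c Timer
20 Video or LCD controller
22 LCD or video display
24 Main memory (synchronous DRAM)
24a, 26a Address bus
24b, 26b Data bus
24c, 26c Control bus
26 Flash memory
28 Cache controller
30av, 30bv Memory planes of the virtual tag array
30v Virtual tag array
32 Data array
32a, 32b Memory planes of the data array
34 LRU indicator array
36 Validity indicator array
38a, 38b Comparators
40a, 40b, 44 Transfer gates
42 Address register
46 Comparator
48 Cache clean processing circuit
50 Method
64 Step of writing data to the cache memory and setting the state to dirty
68 Step of cleaning selected lines of the cache memory
[0001]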
[Technical field of the invention]
The present embodiments relate to computing environments that implement one or more cache memories.
[0002]
[Problems to be solved by the invention]
Cache circuits are important components that are frequently used in modern computing systems (for example, microprocessors) to increase system performance by reducing the potential length of time required to access information. A cache circuit typically includes various components, such as a tag memory, which is generally a random access memory (RAM). The tag RAM stores so-called tag information corresponding to the cache data, which is generally stored in a separate cache data RAM. The tag information includes various characteristics of the cached data, such as the actual address at which the cached data can be found in some other memory device (for example, an external memory structure). Another component of the circuit is a hit detection circuit associated with the tag RAM (an N-way set associative cache circuit has N such circuits). The hit detection circuit compares the actual address stored as part of the tag information with an incoming address. If the comparison matches, there is said to be a hit in the cache circuit; that is, the data sought by the incoming address can be retrieved directly from the cache data RAM. Conversely, if the comparison does not match, there is said to be a miss in the cache circuit; that is, the data sought by the incoming address is not located in the cache data RAM or cannot be relied upon for some other reason. On a cache miss, the data must be retrieved from a memory higher in the memory hierarchy, such as main memory (that is, external memory), or from another cache memory located at a higher level of the system. Data access after a cache miss therefore takes much longer than access on a cache hit. Indeed, an access that retrieves data from external memory may take considerably longer than an access that hits in the cache.
[0003]
The above shows that cache memory is generally considered beneficial, but as computers and computing environments become more complex, cache operation must be examined in greater detail to determine whether further efficiency can be gained. In this regard, the present inventors have recognized that some clock cycles can be eliminated in the context of certain cache circuit operations. Reducing the clock cycles used for cache operations improves system speed. It also reduces overall system power consumption, which is a major concern in many modern systems such as portable computers.
[0004]
[Means for solving the problems]
One preferred embodiment is a method of operating a computing system. The computing system includes a cache memory having a predetermined number of cache lines. First, for a plurality of write addresses, the method writes data to the cache memory at a location corresponding to each of the write addresses. Next, the method cleans a selected number of lines in the cache memory. For each of the selected lines, the cleaning step evaluates a dirty indicator corresponding to the data in the line and, if the dirty indicator indicates that the data in the line is dirty, copies the data from the line to another memory. Finally, the selected number of lines cleaned is less than the predetermined number of cache lines. Other circuits, systems, and methods are also disclosed and claimed.
[0005]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 shows a preferred embodiment of a general wireless data platform 10 in which the various cache embodiments described in this document can be implemented; the platform can be used, for example, to implement a smartphone or a portable computer. Wireless data platform 10 includes a general purpose (host) processor 12 having an instruction cache 12a and a data cache 12b, each with a corresponding instruction memory management unit (MMU) 12c, 12d, and also shows a buffer circuit 12e and an operating core 12f; all of these circuits communicate over a system bus SBUS. SBUS includes data SBUSd, address SBUSa and control SBUSc conductors. A digital signal processor (DSP) 14a having its own internal cache (not shown) and a peripheral interface 14b are connected to SBUS. Although not shown, various peripheral devices, including a digital-to-analog converter (DAC) or a network interface, can be connected to peripheral interface 14b. DSP 14a and peripheral interface 14b are connected to a DMA interface 16, which is in turn connected to a DMA controller 18. DMA controller 18 is also connected to SBUS, as is a video or LCD controller 20 that communicates with an LCD or video display 22. DMA controller 18 is connected through an address bus 24a, a data bus 24d and a control bus 24c to main memory, which in this preferred embodiment is a synchronous dynamic random access memory (SDRAM) 24. Similarly, DMA controller 18 is connected through an address bus 26a, a data bus 26d and a control bus 26c to one (or more) flash memories 26.
[0006]
The general operational aspects of wireless data platform 10 are understood in connection with the inventive concepts by noting that platform 10 uses both a general purpose processor 12 and a DSP 14a. Because multiple cores thus share a single memory, it will be appreciated that the inventive methods described below provide various improvements in the performance of systems such as multi-core systems (which may be systems other than wireless data platform 10). Note also that many of the inventive aspects described below can improve the operation of single-processor systems as well.
[0007]
Turning now to the cache aspects of the preferred embodiment, FIG. 2 shows, by way of example, the architecture of data cache 12b of general purpose processor 12 of FIG. 1. Before this structure is described in detail, it should be understood that the various present teachings can be implemented in connection with other caches, such as instruction cache 12a, one or both caches of DSP 14a, or yet another cache within platform 10 (for example, a unified cache). The teachings described below can also be used with processing devices that would benefit from cache memory, including smartphones, PDAs, palmtop computers, notebook computers, desktop computers and the like. Finally, although various details of data cache 12b are given below, note that many of these details (for example, set associativity, array size, and address and storage lengths) are provided only to make the explanation easier to follow.
[0008]
Turning to the details of data cache 12b shown in FIG. 2, data cache 12b includes a cache controller 28 that receives a memory address. The memory address is a portion of a 32-bit data address DA[31:0]; the received portion includes bits DA[11:4], indicating that cache controller 28 receives bits 4 through 11 of the 32-bit address, and likewise bits DA[1:0], indicating that it receives bits 0 through 1 of the 32-bit address. Cache controller 28 is connected to a virtual tag array 30v, which stores tags corresponding to the lines of a data array 32. In this regard, and for the later discussion, virtual tag array 30v also stores a dirty bit for each line in data array 32; as is known in the cache art, an indication of dirty data means that data brought into data array 32 has been modified but that the modified copy has not yet been output to a higher memory in the memory system (for example, main memory). Further indications corresponding to each line of data array 32 are an LRU (least recently used) bit in an LRU indicator array 34 and a validity bit in a validity indicator array 36.
[0009]
In the preferred embodiment, data cache 12b is arranged as a two-way set associative cache, so tag array 30v has two planes 30av and 30bv. Similarly, data array 32 has two memory planes 32a and 32b. In the illustrated embodiment each plane 32a, 32b is 1024 × 32 bits (that is, 1024 entries of 4 bytes), so four consecutive addresses are required to form a 16-byte line. The outputs of planes 30av and 30bv are provided to respective comparators 38a and 38b. DA[31:12] is also connected to both comparators 38a and 38b. Comparators 38a and 38b generate 1-bit outputs designated Hit way 1 and Hit way 2, respectively. The Hit way 1 and Hit way 2 signals are connected to the control inputs of respective transfer gates 40a and 40b, and each transfer gate provides as its output the 32 bits of data DD[31:0] from the addressed data array plane.
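The plane geometry stated above can be checked with a little arithmetic. The following Python sketch is only an illustration; the constant and variable names are assumptions introduced here, and only the figures (1024 entries of 32 bits per plane, 16-byte lines) come from the description.

```python
# Geometry of one data-array plane as given in the description.
ENTRIES_PER_PLANE = 1024      # 1024 entries per plane
BYTES_PER_ENTRY = 4           # each entry is 32 bits
BYTES_PER_LINE = 16           # one cache line

entries_per_line = BYTES_PER_LINE // BYTES_PER_ENTRY        # consecutive addresses per line
lines_per_plane = ENTRIES_PER_PLANE // entries_per_line     # cache lines in one plane
index_bits = lines_per_plane.bit_length() - 1               # bits needed to select a line
plane_bytes = ENTRIES_PER_PLANE * BYTES_PER_ENTRY           # bytes covered by one plane
tag_low_bit = plane_bytes.bit_length() - 1                  # lowest bit of the tag field
```

The result of 256 lines per plane agrees with the 8-bit index DA[11:4], and the 4096 bytes per plane agree with a tag field that starts at bit 12, that is, DA[31:12].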
[0010]
Because the operation of set associative caches is known in the art, only the read operation is summarized here, to provide context for the more detailed discussion of the cache cleaning process that follows. For a read, when address DA[31:0] is received for a memory access, address bits DA[11:4] are used as the address into each of planes 30av and 30bv of virtual tag array 30v. In response, each plane 30av, 30bv outputs tag bits Tag_DA[31:12], where the tag contains an indication of the address of the data stored in data array 32. Bits DA[31:12] are then compared with the tags by comparators 38a and 38b to determine whether there is a match (that is, a hit); on a match, the output of comparator 38a or 38b asserts the Hit way 1 or Hit way 2 signal, respectively. During this same process the index portion of the address, bits DA[11:4] in this example, is applied to planes 32a and 32b of the data array. Both planes therefore output the information at that index, and when the Hit way 1 or Hit way 2 signal is asserted, the output of the corresponding plane appears as output data DD[31:0]. If a cache miss occurs (that is, neither hit signal is asserted), the addressed information is sought from a memory above cache 12b in the memory hierarchy. Finally, recall that each memory address in the tag memory has a corresponding validity bit in validity indicator array 36. These bits indicate whether the data at the corresponding location in the cache is valid. The bits in LRU indicator array 34 determine which line of planes 32a, 32b is updated after a cache miss.
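The read lookup summarized above can be modeled in a few lines of Python. This is a behavioral sketch, not the hardware of FIG. 2: the `Line` class, the function names, and the decision to require the validity bit for a hit are assumptions made for illustration, and the two ways are numbered 0 and 1 here rather than way 1 and way 2.

```python
from dataclasses import dataclass

NUM_SETS = 256    # one entry per value of the index DA[11:4]
NUM_WAYS = 2      # two-way set associative


@dataclass
class Line:
    valid: bool = False
    dirty: bool = False
    tag: int = 0
    data: bytes = bytes(16)


def split_address(da):
    """Return (tag, index), that is (DA[31:12], DA[11:4])."""
    return (da >> 12) & 0xFFFFF, (da >> 4) & 0xFF


def lookup(ways, da):
    """Return the number of the hitting way, or None on a miss."""
    tag, index = split_address(da)
    for way in range(NUM_WAYS):
        line = ways[way][index]
        if line.valid and line.tag == tag:    # role of comparators 38a and 38b
            return way
    return None


ways = [[Line() for _ in range(NUM_SETS)] for _ in range(NUM_WAYS)]
tag, index = split_address(0x12345678)
ways[1][index] = Line(valid=True, tag=tag)    # pretend this line was filled earlier
```

With this setup, `lookup(ways, 0x12345678)` reports a hit in the second way, while an address with the same index but a different tag misses.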
[0011]
Data cache 12b also includes a cache clean function, which can greatly improve the efficiency of cache operation and is first described functionally with reference to the block diagram of FIG. 3. In particular, FIG. 3 shows cache controller 28 in greater detail insofar as the cache clean function is concerned. Cache controller 28 includes an address register 42 for storing an address value designated I_MAX; as shown later, register 42 is controlled by the additional circuitry described next so that it stores copies of certain values of the data address index (that is, DA[11:4]). The input of address register 42 is connected to the output of a transfer gate (passgate) 44, whose data input is connected to receive address index DA[11:4]. Address index DA[11:4] is also connected to an input of a comparator 46, which is further connected to receive the value of I_MAX stored in address register 42. For reasons detailed below, when a cache hit occurs in response to a write to data cache 12b, comparator 46 determines whether the incoming address index DA[11:4] is greater than the value of I_MAX and, if so, enables the control input of transfer gate 44, so that the incoming address index DA[11:4] is copied into address register 42 and thereby updates the value of I_MAX.
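The interplay of register 42, comparator 46 and transfer gate 44 amounts to keeping a running maximum of the write-hit index. A minimal Python model is shown below; the class and method names are hypothetical, and the strict greater-than comparison is an assumption consistent with the update being a no-op when the two values are equal.

```python
class IMaxRegister:
    """Behavioral model of address register 42 with comparator 46 and gate 44."""

    def __init__(self):
        self.i_max = 0

    def clear(self):
        """Effect of the CACHE_CLEAN signal on the register: I_MAX returns to zero."""
        self.i_max = 0

    def on_write_hit(self, da):
        """Update I_MAX for a write that hits in the cache at data address da."""
        index = (da >> 4) & 0xFF          # address index DA[11:4]
        if index > self.i_max:            # comparator 46 enables transfer gate 44
            self.i_max = index            # register 42 latches the new index
        return self.i_max


reg = IMaxRegister()
for index in (2, 8, 4):                   # three write hits, in this order
    reg.on_write_hit(index << 4)
```

After the three writes the register holds 8: the later write to index 4 does not lower it, because the gate only opens for a larger index.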
[0012]
Continuing with FIG. 3, cache controller 28 includes a cache clean processing circuit 48, which is connected to receive the dirty bit from virtual tag array 30v as one input and to receive a CACHE_CLEAN signal that enables the functions detailed below. Note that the CACHE_CLEAN signal is also connected to clear the value of I_MAX in register 42. The value of I_MAX in register 42 is likewise an input to cache clean processing circuit 48. Given the functional description of circuit 48 in the discussion of FIGS. 4 and 5 below, one skilled in the art can select its structure from various alternative circuits.
[0013]
FIG. 4 shows a flow chart of a method, designated generally at 50, that describes the preferred operation of cache controller 28 with respect to writes to data array 32; most of the method is accomplished through the operation of the circuit blocks shown in FIG. 3. Method 50 begins at step 52, where the value of I_MAX (in register 42) is cleared to zero. Note that in the preferred embodiment this step can be accomplished by asserting the CACHE_CLEAN signal. As will become apparent by the end of the discussion of method 50, the present embodiment provides improvements in connection with operations that involve context switches. Although not shown, step 52 may in practice be part of the initialization of data cache 12b that accompanies the response to a first context switch. Although the term context switch is known to those skilled in the art, a brief review of its meaning may help the reader. A context switch occurs in response to various events that the operating system frequently handles, such as an external interrupt or the expiration of a clock timer. The switch relates to a change of process in platform 10 or in another processor-controlled system in which operations are divided into multiple processes. Each process is defined by various matters, which often include the region of memory used by the process, the process's input/output mapping, the process's memory management such as address translation, and process-characterizing values that are typically stored in general purpose registers. A context switch occurs when the current process is replaced by a new process. Information describing each of these aspects of the current process must therefore be stored so that the next process (or several other processes) can run correctly, and so that the information can be retrieved when execution later switches back and the process again becomes the current process.
[0014]
Returning to method 50, after step 52 the flow continues to step 54. Step 54 represents the issuance of a write address to the memory system that includes data cache 12b. Looking back briefly at FIG. 1, an instance of step 54 occurs when core 12f issues an address to write data to SDRAM 24; note that SDRAM 24 is at a higher level of the memory system, which includes data cache 12b at a lower level. Method 50 then continues from step 54 to step 56.
[0015]
Step 56 determines whether a hit occurs in the data cache 12b in response to the write address issued in step 54. If no cache hit occurs (ie, a cache miss occurs), the method 50 continues from step 56 to step 58. Conversely, method 50 continues from step 56 to step 60 when a cache hit occurs. Each of these alternative paths is described below.
[0016]
Turning to step 58, which is reached in response to a cache miss, step 58 operates on its own in the same manner as is known in the cache art. Specifically, step 58 writes the data to the addressed location in a storage circuit other than data cache 12b. In platform 10, for example, the write goes to the appropriate address in SDRAM 24.
[0017]
Turning to step 60, which is executed when a cache hit occurs, step 60 determines whether the value of the current address index DA[11:4] is greater than the value of I_MAX. Returning briefly to FIG. 3, note that the operation of step 60 may be accomplished by the comparator 46. If the value of the address index DA[11:4] is greater than the value of I_MAX, method 50 continues from step 60 to step 62; otherwise, method 50 proceeds to step 64, which is discussed below after first considering step 62. Step 62, which is reached because the value of the address index DA[11:4] is greater than the value of I_MAX, stores the current address index DA[11:4] as the new value of I_MAX. Two points are worth noting here. First, when step 60 is reached for the first time after I_MAX is cleared in step 52, and the value of the address index DA[11:4] is not zero, step 60 passes the flow to step 62, so that the value of I_MAX is raised to the current address index. Second, returning again to FIG. 3, step 62 is carried out by the output of the comparator 46 and its control of the transfer gate 44. In particular, if the comparator 46 determines in step 60 that DA[11:4] is greater than the value of I_MAX, the output of the comparator 46 enables the transfer gate 44, so that DA[11:4] is copied to the register 42 and becomes the new value of I_MAX. Method 50 then continues from step 62 to step 64.
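The comparison and update of steps 60 and 62 can be sketched in software as follows. This is only an illustrative sketch, not the circuit of FIG. 3: it assumes that DA[11:4] denotes bits 11 through 4 of the data address, and the function names `address_index` and `update_i_max` are hypothetical.

```python
def address_index(da: int) -> int:
    """Return the 8-bit field DA[11:4] (assumed: bits 11..4 of the address)."""
    return (da >> 4) & 0xFF


def update_i_max(i_max: int, da: int) -> int:
    """Model steps 60 and 62: raise I_MAX only when the index exceeds it."""
    index = address_index(da)
    if index > i_max:      # step 60: the role played by comparator 46
        return index       # step 62: the index becomes the new I_MAX
    return i_max           # otherwise I_MAX is left unchanged


i_max = 0                              # step 52: I_MAX cleared to zero
for da in (0x0080, 0x0040, 0x00C0):    # address indices 8, 4 and 12
    i_max = update_i_max(i_max, da)
print(i_max)                           # prints 12
```

Only the first and third writes raise I_MAX in this run; the write to index 4 leaves it at 8.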
[0018]
Step 64 writes the data in question to the data array 32 at the address specified in step 54. In addition, the dirty bit in the virtual tag array 30v that corresponds to the cache line of the written data is set to the dirty state. Method 50 then continues from step 64 to step 66. Step 66 represents a wait state in which method 50 waits for one of two events: the issuance of another write address or a context switch. If another write address is issued, method 50 returns from step 66 to step 54. Those skilled in the art will understand that the preceding steps then repeat and that, if the index of the newly issued write address is greater than the current value of I_MAX, that index becomes the new value of I_MAX. In practice, this loop is likely to repeat for many consecutive writes, with I_MAX updated in this manner each time. Turning now to the case where a context switch occurs, method 50 continues from step 66 to step 68. It should be understood from the foregoing that the value of I_MAX at this time represents the highest address index written after the previous context switch and before the current one.
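The write path of steps 54 through 64 can be illustrated with a toy software model. This is a sketch under stated assumptions rather than the described hardware: the class `ToyCache` is hypothetical, it is direct-mapped with a single value per line, the index is taken as bits 11 through 4 of the address, the tag is assumed to be the remaining upper bits, and a dictionary stands in for the SDRAM 24.

```python
NUM_LINES = 256  # assumed: one line per 8-bit address index


class ToyCache:
    """Toy direct-mapped model; one value per line stands in for line data."""

    def __init__(self):
        self.tags = [None] * NUM_LINES     # None means nothing is cached here
        self.data = [None] * NUM_LINES
        self.dirty = [False] * NUM_LINES
        self.i_max = 0                     # step 52: I_MAX cleared
        self.higher_memory = {}            # stands in for SDRAM 24

    def write(self, da: int, value) -> bool:
        """Handle one write address (steps 54 to 64); return True on a hit."""
        index = (da >> 4) & 0xFF           # assumed meaning of DA[11:4]
        tag = da >> 12                     # assumed tag: remaining upper bits
        if self.tags[index] != tag:        # step 56: cache miss
            self.higher_memory[da] = value # step 58: write outside the cache
            return False
        if index > self.i_max:             # step 60
            self.i_max = index             # step 62
        self.data[index] = value           # step 64: write the data array
        self.dirty[index] = True           # step 64: mark the line dirty
        return True


cache = ToyCache()
cache.tags[8] = 0                      # pretend line 8 already caches tag 0
hit = cache.write(0x0080, "a")         # index 8, tag 0: hit
miss = cache.write(0x1040, "b")        # index 4, tag 1: miss
```

In this run only the hit updates I_MAX and sets a dirty bit; the miss bypasses the cache and leaves I_MAX at 8.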
[0019]
Step 68 represents a cache clean process which, as will be seen below, can substantially improve performance and efficiency compared with conventional techniques. More specifically, step 68 represents a loop over a value L, from L equal to I_MAX until L equals 0, in which step 70 cleans each cache line whose address equals L. In other words, the value of L starts at I_MAX and decreases on each pass through step 68 until it reaches zero; on each pass, the flow continues to the cleaning operation of step 70 and then returns to step 68 for the next iteration. Turning to step 70, cache line cleaning is known to those skilled in the art and includes evaluating the tag (or tags) of the line to determine which data in the line is dirty. In this embodiment, this operation is controlled by the cache clean processing circuit 48 of FIG. 3, which is enabled by the CACHE_CLEAN signal. The process determines whether the line contains dirty data and, if so, writes that data (or the entire line) to higher-level memory. Conversely, if the dirty bit (or bits) for a given line indicates that the entire line is clean, the data in that line is not written to higher-level memory.
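The loop of steps 68 and 70 can be sketched as follows. This is an illustrative sketch only: the function name `clean_lines` is hypothetical, the lists and the dictionary stand in for the dirty bits, the data array and the higher-level memory, and clearing the dirty bit after the copy is an assumption that the text does not state explicitly.

```python
def clean_lines(dirty, data, i_max, higher_memory):
    """Model steps 68 and 70: visit lines L = I_MAX down to 0."""
    evaluated = 0
    for line in range(i_max, -1, -1):          # step 68: L counts down to 0
        evaluated += 1
        if dirty[line]:                        # step 70: evaluate the dirty bit
            higher_memory[line] = data[line]   # copy the dirty data out
            dirty[line] = False                # assumption: line is now clean
    return evaluated


dirty = [False] * 256
data = [None] * 256
for line in (0, 2, 4, 6, 8):                   # five dirty lines
    data[line] = f"value{line}"
    dirty[line] = True

higher_memory = {}
evaluated = clean_lines(dirty, data, 8, higher_memory)
print(evaluated, sorted(higher_memory))        # prints 9 [0, 2, 4, 6, 8]
```

Nine lines are evaluated but only the five dirty ones are copied out; lines above I_MAX are never visited.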
[0020]
From the above description, and in particular from the effect of I_MAX and of steps 68 and 70, those skilled in the art should understand that the data cache 12b is cleaned after a context switch, but that the cleaning covers only cache address 0 through the highest cache address written before the context switch (i.e., the address stored in I_MAX). An example makes this clearer. Assume that, after step 52, there are five consecutive cache writes to index addresses 0, 2, 4, 6, and 8, and that a context switch follows these writes. The value of I_MAX at this time is therefore 8, and in response to the context switch steps 68 and 70 clean the data array 32 only from address 0 to address 8. Note that this operation differs considerably from the prior art. In the prior art, the entire cache is cleaned in response to a context switch: every cache line is evaluated to determine whether its contents are dirty and, if so, the contents of the dirty line are written to higher-level memory. Those skilled in the art should understand that this difference makes the present embodiments considerably more efficient. For purposes of explanation, return to the example of five consecutive writes and assume that the cache has index addresses up to 255. In that case, the prior art would spend additional time evaluating, and possibly cleaning, each of addresses 9 through 255, which would naturally take at least one clock cycle for each addressable line. In contrast, the preferred embodiment stops the cleaning operation short of the entire cache; in the embodiment just described, cleaning is complete once the range up to the highest address written before the context switch (address 8 in the present example) has been cleaned. The total number of clock cycles required for the cleaning operation can therefore be greatly reduced, and the overall power consumption is reduced along with it. Moreover, in an environment where context switches occur frequently, as is the case on the platform 10, the savings of the preferred embodiment accumulate with each context switch. The overall benefit is particularly significant when there are only a few cache writes between context switches.
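The arithmetic behind this example can be checked with a short calculation. The sketch below assumes one evaluation per cache line, as the text suggests; the function name `lines_evaluated` is hypothetical.

```python
def lines_evaluated(first: int, last: int) -> int:
    """Number of cache lines visited when cleaning first..last inclusive."""
    return last - first + 1


writes = [0, 2, 4, 6, 8]                 # index addresses written before the switch
i_max = max(writes)                      # 8

partial = lines_evaluated(0, i_max)      # preferred embodiment: lines 0..8
full = lines_evaluated(0, 255)           # prior art: the entire cache
saved = full - partial                   # lines 9..255 are skipped

print(partial, full, saved)              # prints 9 256 247
```

Under this assumption the partial clean visits 9 lines instead of 256, which is the saving attributed above to skipping addresses 9 through 255.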
[0021]
Note also from the above discussion of step 68 that the cleaning process proceeds down to address 0. This approach is desirable because it is independent of the size of the cache. In every case, therefore, once the cleaning operation down to address 0 is complete, the lines holding dirty data have been written to the main memory. Given this observation, however, note that two alternative approaches may be used when the data at or near address 0 is unlikely to have changed. Each of these two alternatives is described below.
[0022]
In a first alternative to the I_MAX approach discussed above, a second address value is established to track the lowest address index that causes a cache hit for a given context, and this second value is used together with the value of I_MAX. If this second value is named I_MIN, it is initially set to a large value (e.g., the highest address of the cache) and is then lowered to the lowest address index that causes a cache hit before a given context switch. By way of example, assume that the highest index address of the cache is 255, that there are five consecutive cache writes to index addresses 8, 16, 24, 32, and 40, and that a context switch follows. In this example, I_MAX is initially 0 and I_MIN is initially 255. Over the five accesses, I_MAX increases with each access until it equals 40. The first access, to address 8, reduces I_MIN to 8, and the remaining accesses are to index addresses higher than the updated value of I_MIN and so do not change it. To complete this alternative, step 68 would be modified so that step 70 cleans all lines from the I_MIN address through the I_MAX address, inclusive, so that once again fewer lines are cleaned than the total number of lines in the cache.
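The I_MIN and I_MAX bookkeeping of this first alternative can be sketched as follows, using the same five writes as the example. The function names `track_window` and `window_lines` are hypothetical, and the sketch assumes every write in the list is a cache hit.

```python
def track_window(indices, top_index=255):
    """Track the lowest and highest address index written (I_MIN, I_MAX)."""
    i_min, i_max = top_index, 0          # initial values before any write
    for index in indices:
        if index > i_max:
            i_max = index                # raise I_MAX
        if index < i_min:
            i_min = index                # lower I_MIN
    return i_min, i_max


def window_lines(i_min, i_max):
    """Lines cleaned by the modified loop: I_MIN through I_MAX inclusive."""
    return list(range(i_min, i_max + 1))


i_min, i_max = track_window([8, 16, 24, 32, 40])
lines = window_lines(i_min, i_max)
print(i_min, i_max, len(lines))          # prints 8 40 33
```

With these writes the window covers 33 lines, compared with 41 lines for a clean from address 0 to I_MAX and 256 lines for a full clean.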
[0023]
In a second alternative to the I_MAX approach discussed above, a different address value is established to track the lowest address index that causes a cache hit for a given context, and cleaning proceeds from that value up to the highest address of the cache. Simply put, this is the reverse of the process that uses the value of I_MAX. Again naming this address value I_MIN, it is initially set to a large value (e.g., the highest address of the cache) and is then lowered to the lowest address index that causes a cache hit before a given context switch. When step 68 is executed, however, it would be modified so that step 70 cleans all lines from the address value of I_MIN to the top of the cache, that is, the highest address of the cache.
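A sketch of this second alternative follows. The write pattern (indices 200, 230 and 250) is a hypothetical example chosen for illustration, not one given in the text, and the function names are likewise assumptions.

```python
def track_i_min(indices, top_index=255):
    """Lower I_MIN from the top index to the lowest address index written."""
    i_min = top_index
    for index in indices:
        if index < i_min:
            i_min = index
    return i_min


def lines_min_to_top(i_min, top_index=255):
    """Lines cleaned by the modified loop: I_MIN through the top address."""
    return range(i_min, top_index + 1)


writes = [200, 230, 250]                 # hypothetical writes near the top
i_min = track_i_min(writes)
cleaned = len(lines_min_to_top(i_min))   # lines 200..255
from_zero = max(writes) + 1              # lines 0..I_MAX, for comparison
print(i_min, cleaned, from_zero)         # prints 200 56 251
```

For writes clustered near the top of the cache, cleaning from I_MIN upward visits far fewer lines here than cleaning from address 0 up to I_MAX.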
[0024]
Having described limiting the loop operation according to the value of I_MAX, note that a further embodiment, shown in FIG. 5, is also within the scope of the present invention. In this alternative, shown as method 50a, the function of the comparator 46 of FIG. 3 is not used; instead, an address value that the operating system holds in association with the current context is used to determine the value of I_MAX at the time of the context switch. This difference is described below with respect to step 72; the remaining steps shown in FIG. 5 are the same as the corresponding steps shown in FIG. 4.
[0025]
Referring to method 50a, steps 52 through 66 have been discussed above and are not described again in detail here. Turning to step 72, this step sets the value of I_MAX, here based on a value that is accessible to the operating system during its execution. In particular, certain operating systems maintain, for a given context, the value of the highest cache line. Thus, when the context-switch condition of step 66 is satisfied, the operating system has available the highest cache line address corresponding to the context that is ending (i.e., the context being switched away from). Step 72 sets the value of I_MAX equal to this highest cache line address. Thus, when method 50a continues to step 68, and provided that this highest cache line address is less than the total number of cache lines, the loop formed by steps 68 and 70 again cleans the lines of the data cache such that the number of lines cleaned is less than the total number of cache lines.
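Step 72 can be sketched as a lookup in a per-context table kept by the operating system. The table, the context names, and the function `lines_to_clean_on_switch` are all hypothetical; the text does not specify how the operating system stores this value.

```python
def lines_to_clean_on_switch(os_highest_line, outgoing_context, num_lines=256):
    """Return the lines visited by steps 68 and 70 after step 72 sets I_MAX."""
    i_max = os_highest_line[outgoing_context]   # step 72: value held by the OS
    if not 0 <= i_max < num_lines:
        raise ValueError("highest line must be a valid cache line index")
    return list(range(i_max, -1, -1))           # L = I_MAX down to 0


# Hypothetical table: highest cache line recorded for each context.
os_highest_line = {"ctx_a": 3, "ctx_b": 40}

order = lines_to_clean_on_switch(os_highest_line, "ctx_a")
print(order)                                    # prints [3, 2, 1, 0]
```

No comparator is involved in this variant; the cleaning range depends only on the value recorded for the context being switched away from.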
[0026]
It will be appreciated from the above description that the above embodiments reduce the number of clock cycles associated with cache cleaning operations after a context switch and provide various improvements over the prior art. Although the present embodiments have been described in detail, various substitutions, modifications, or alterations may be made to the above description without departing from the scope of the present invention. For example, in the preferred embodiment a context switch triggers the reset of I_MAX, and a subsequent context switch ends the adjustment of I_MAX; however, one skilled in the art may ascertain other events such that a first event resets the value of I_MAX and a second event ends the upward adjustment of I_MAX, after which the cache is again cleaned from the lowest address up to the last stored value of I_MAX. As another example, although FIGS. 4 and 5 show generally sequential methods by way of flowcharts, various circuits may be used to implement these operations, such as a state machine that performs these steps, in which case the flow may pass from each state to various alternative states rather than proceeding sequentially as shown in the flowcharts. As yet another example, although the data cache 12b has been used to illustrate various aspects, many of the present teachings apply equally to various other cache architectures. As a final example, the platform 10 is shown by way of illustration only; it may be modified further, and many of the inventive aspects may be implemented in other systems having one or more cache memories. Accordingly, the foregoing description, these examples, and others ascertainable by one of ordinary skill in the art having the benefit of the present teachings should serve to illustrate the scope of the invention, which is defined by the claims.
[Brief description of the drawings]
FIG. 1 is a block diagram of a wireless data platform that can implement the present embodiment.
FIG. 2 is a block diagram of a cache architecture that can be used in the platform and other processing devices of FIG.
FIG. 3 is a block diagram of portions of the cache controller of FIG. 2, which are used in the method of cleaning the cache of this preferred embodiment.
FIG. 4 is a flow diagram of a first embodiment that reduces the clock cycles required for the cache clean that occurs at a context switch of the general-purpose processor of FIG., the reduction being determined by the highest address written to the cache before the context switch.
FIG. 5 is a flow diagram of a second embodiment that reduces the clock cycles required for the cache clean that occurs at a context switch of the general-purpose processor of FIG., the reduction being determined by the highest address that, according to the operating system, can be used for writing to the cache before the context switch.
[Explanation of symbols]
10 Data platform
12 processor
12a instruction cache
12b data cache
12c, 12d Instruction memory management unit (MMU)
12e input buffer circuit
12f operating core
14a DSP core
14b Peripheral device interface
16 DMA interface
18 DMA controller
18a FIFO
18b storage device
18c timer
20 Video or LCD controller
22 LCD or video
24 Main memory (synchronous DRAM)
24a, 26a Address bus
24b, 26b Data bus
24c, 26c Control bus
26 Flash memory
28 Cache controller
30av, 30bv virtual tag array memory plane
30v virtual tag array
32 Data array
32a, 32b Data array memory plane
34 LRU indicator array
36 Valid indicator array
38a, 38b comparator
40a, 40b, 44 Transfer gate
42 Address register
46 Comparator
48 Cache Clean Processing Circuit
50 Method
64 Writing data to the cache memory and setting the state to dirty
68 Cleaning the selected line of the cache memory

Claims

A method of operating a computing system including a cache memory having a predetermined number of cache lines, comprising:
First, for a plurality of write addresses, writing data to a location in the cache memory corresponding to each of the plurality of write addresses;
Second, cleaning a selected number of lines of the cache memory in response to a context switch by the computing system;
For the selected number of lines, the cleaning step comprises:
Evaluating a dirty indicator corresponding to the data in the line;
Copying the data from the line to another memory if the dirty indicator indicates that the data in the line is dirty;
The number of selected lines is less than a predetermined number of cache lines, and the method of operating the computing system comprises:
Holding a value in the address indicator;
In response to each of the plurality of write addresses, comparing an index indicating the location in the cache memory corresponding to the write address with the value held in the address indicator and, if the index is greater than the value held in the address indicator, setting the value of the address indicator to the same value as the index, wherein, when the comparing and setting for the plurality of write addresses are complete, the value of the address indicator represents a latest value and the number of selected lines corresponds to the latest value;
Method.

A method of operating a computing system including a cache memory having a predetermined number of cache lines, comprising:
Each time write data of a write request is written to the cache memory, comparing an index indicating the position of the cache line to which the write data is written with a value held in an address indicator;
When the comparing step finds that the index indicating the position of the cache line to which the write data is written is greater than the value held in the address indicator, setting that index as the new value held in the address indicator;
When cleaning the cache memory, evaluating the dirty indicator of every cache line whose index is greater than or equal to 0 and less than or equal to the value held in the address indicator;
Copying to another memory the data of each cache line whose evaluated dirty indicator indicates that the line is dirty;
Method.

A computing system,
A cache memory having a predetermined number of cache lines;
A circuit for writing data to a location in the cache memory corresponding to each of the plurality of write addresses in response to the plurality of write addresses;
A circuit for cleaning a selected number of lines of the cache memory, the circuit for cleaning including:
A circuit for evaluating, for each of the selected number of lines, a dirty indicator corresponding to the data in the line;
A circuit for copying the data from the line to another memory if the dirty indicator indicates that the data in the line is dirty;
The number of selected lines is less than a predetermined number of cache lines;
Furthermore, the circuit for cleaning includes:
An address indicator to hold the value;
A circuit for comparing, in response to each of the plurality of write addresses, an index indicating the location in the cache memory corresponding to the write address with the value of the address indicator and, when the index is smaller than the value held in the address indicator, setting the value of the address indicator to the same value as the index, wherein the value of the address indicator represents a latest value when the comparing and setting for the plurality of write addresses are complete;
Wherein the number of selected lines depends on the latest value;
Computing system.

The method of claim 1, wherein the cleaning step includes cleaning all lines between the first address of the cache memory and the latest value.

The method of claim 4, wherein the first address is cache memory address 0.

The computing system according to claim 3, wherein the circuit for cleaning includes a circuit for cleaning all lines from the latest value through the last address of the cache memory, inclusive.

The computing system according to claim 6, wherein the last address is the highest address of the cache memory.

A method of operating a computing system including a cache memory having a predetermined number of cache lines, comprising:
First, for a plurality of write addresses, writing data to a location in the cache memory corresponding to each of the plurality of write addresses;
Second, cleaning a selected number of lines of the cache memory, wherein the number of selected lines is less than the predetermined number of cache lines;
For each of the selected number of lines, the cleaning step includes:
Evaluating a dirty indicator corresponding to the data in the line;
Copying the data from the line to another memory if the dirty indicator indicates that the data in the line is dirty;
Further, the cleaning step includes
Holding a value in the first address indicator;
Holding a value in the second address indicator;
In response to each of the plurality of write addresses, comparing an index indicating the location in the cache memory corresponding to the write address with the value of the first address indicator and, if the index is greater than the value held in the first address indicator, setting the value of the first address indicator to the same value as the index, wherein the value of the first address indicator represents a first latest value when the comparing and setting for the plurality of write addresses are complete;
In response to each of the plurality of write addresses, comparing the index indicating the location in the cache memory corresponding to the write address with the value of the second address indicator and, if the index is smaller than the value held in the second address indicator, setting the value of the second address indicator to the same value as the index, wherein the value of the second address indicator represents a second latest value when the comparing and setting for the plurality of write addresses are complete;
The number of selected lines depends on the first and second latest values;
Method.

The method of claim 8, wherein the cleaning step includes cleaning all lines of the cache memory from the second latest value through the first latest value, inclusive.

A computing system,
A cache memory having a predetermined number of cache lines;
A circuit for writing data to a location in the cache memory corresponding to each of the plurality of write addresses in response to the plurality of write addresses;
A circuit for cleaning a selected number of lines of the cache memory, wherein the number of selected lines is less than the predetermined number of cache lines, the circuit for cleaning including, for each of the selected number of lines:
A circuit for evaluating a dirty indicator corresponding to the data in the line;
A circuit for copying the data from the line to another memory when the dirty indicator indicates that the data in the line is dirty;
Furthermore, the circuit for cleaning includes:
A first address indicator holding a value;
A second address indicator holding a value;
A circuit for comparing, in response to each of the plurality of write addresses, an index indicating the location in the cache memory corresponding to the write address with the value of the first address indicator and, when the index is greater than the value held in the first address indicator, setting the value of the first address indicator to the same value as the index, wherein the value of the first address indicator represents a first latest value when the comparing and setting for the plurality of write addresses are complete;
A circuit for comparing, in response to each of the plurality of write addresses, the index indicating the location in the cache memory corresponding to the write address with the value of the second address indicator and, when the index is smaller than the value held in the second address indicator, setting the value of the second address indicator to the same value as the index, wherein the value of the second address indicator represents a second latest value when the comparing and setting for the plurality of write addresses are complete;
The number of selected lines depends on the first and second latest values;
Computing system.

The computing system of claim 10, wherein the circuit for cleaning includes a circuit for cleaning all lines of the cache memory from the second latest value through the first latest value, inclusive.