JPH05508497A

JPH05508497A - Method and apparatus for non-sequential resource access

Info

Publication number: JPH05508497A
Application number: JP3514424A
Authority: JP
Inventors: ウイルソン，ジミー　アール．; ビアード，ダグラス　アール．; チェン，スティーブ　エス．; エッカート，ロジャー　イー．; ヘッセル，リチャード　イー．; フェルプス，アンドルー　イー．; シルベイ，アレクサンダー　エイ．; バンダウォーン，ブライアン　ディー．
Original assignee: クレイ、リサーチ、インコーポレーテッド
Priority date: 1990-06-11
Filing date: 1991-06-10
Publication date: 1993-11-25
Also published as: US5208914A; AU8447491A; WO1991020038A1

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Method and Apparatus for Non-Sequential Resource Access

Technical Field

The present invention relates to memory systems and memory management for computers and electronic logic systems. More specifically, the present invention relates to a method and apparatus for non-sequential memory access capable of issuing requests to shared resources, particularly main memory, for which the responses may be returned out of order relative to the order in which the requests were issued.

Prior Art

Conventional methods and apparatus in multiple-requester systems for accessing shared hardware resources, particularly main memory, require that requests to the shared resource maintain a time-ordered or sequential relationship with one another, in order to prevent the system from overwriting or incorrectly accessing data or instructions. In fact, the possibility of out-of-order access to a shared resource is usually regarded as an incorrect access to data and/or instructions, and is referred to in the prior art as a memory access hazard.

In the prior art, one technique used to control access to shared resources in a multiprocessor system is to maintain a central control mechanism that handles requests to the shared resources by means of a global decision algorithm. This method is effective when the multiprocessor system is physically small and has only a few requesters and a few shared resources. In larger multiprocessor systems with many requesters and many shared resources, the central control approach using a global decision algorithm becomes unwieldy, and the decision time begins to affect the overall processing performance of the multiprocessor.

Another prior-art technique is to interlock the various decisions with time tags, so that a given decision is made only if the previous decisions have cleared. The problem with this scheme is that the transit time needed for signals to travel through the system and update those time tags becomes prohibitive, and again overall system performance is adversely affected.

In essence, the problems of resource lockout and shared resource access have been managed in prior-art supercomputers by using a central control mechanism that schedules each resource sequentially across the entire multiprocessor system. This approach effectively avoids the problem of memory access hazards, but at the expense of system performance.

In designing a new method and system for accessing shared resources in a multiprocessor system, there are four issues to consider. The first is how to maximize the throughput of the shared resources. The second is how to maximize the bandwidth between the processors and the shared resources. The third is how to minimize the access time between the processors and the shared resources. The last is how to avoid access hazards so that resource requests produce predictable results. If a computer processing system cannot achieve an optimal solution to all of these issues, the performance of that system is effectively limited.

For example, consider the problem of a processor that makes three requests to main memory. Assume that each request is directed to a different section of main memory and is therefore handled by one of three separate logic mechanisms. In the prior art, each of these requests must be sequential, and a subsequent request cannot begin until the preceding request has completed. In this three-request example, the requirement of sequential access effectively leaves 2/3 of the logic associated with main memory idle. This limitation is particularly damaging in high-performance systems, whose aim is to keep processing elements and shared resources continuously busy by supplying them with a steady stream of work, as is done in pipelined operation.
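The cost of the sequential constraint in the three-request example can be illustrated with a small timing model. This is only a sketch under assumptions that are not in the original text: every request occupies its memory section for a fixed `service_time`, requests can be issued one per `issue_gap` cycles, and the function names are hypothetical.

```python
def sequential_cycles(num_requests: int, service_time: int) -> int:
    """Cycles needed when each request must complete before the next begins."""
    return num_requests * service_time


def overlapped_cycles(num_requests: int, service_time: int, issue_gap: int = 1) -> int:
    """Cycles needed when every request targets its own section and requests
    are issued back to back, one every `issue_gap` cycles (assumed model)."""
    return (num_requests - 1) * issue_gap + service_time


def idle_fraction(num_sections: int, busy_sections: int = 1) -> float:
    """Fraction of the section logic that is idle when only `busy_sections`
    of the `num_sections` sections can be working at any one time."""
    return (num_sections - busy_sections) / num_sections
```

With three sections and strictly sequential access, `idle_fraction(3)` gives the 2/3 figure quoted above. Under the assumed timing model, three overlapped requests with a ten-cycle service time finish in 12 cycles instead of 30.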

In an effort to increase the processing speed and flexibility of such high-performance computer processing systems, often referred to as supercomputers, a patent application filed previously in relation to the present invention, PCT application No. PCT/US90/07665, entitled "Cluster Architecture for a Highly Parallel Scalar/Vector Multiprocessor System", provides an architecture for supercomputers in which multiple processors and external interfaces can make multiple, simultaneous requests to a common set of shared resources such as main memory, global registers or interrupt mechanisms. The prior-art technique of using a central control mechanism to schedule each resource sequentially across the multiprocessor system is unacceptable for this type of cluster architecture for highly parallel multiprocessors. Consequently, a new method and apparatus for memory access are needed that increase performance in a multiprocessor system by ensuring equal and democratic access for all requesters across all shared resources, and by allowing each shared resource to process data concurrently at an independent rate while avoiding access hazards.

Summary of the Invention

The present invention provides a method and apparatus for non-sequential access to shared resources in a multiple-requester system.

To accomplish this, the present invention uses various tags to effectively re-order the data at its destination. In its simplest form, a tag directs the switching logic either to a location in a buffer where another tag holding routing information can be found, or to a location in a buffer or processor (register) where the response associated with that tag is to be placed. For example, loading data from memory requires that the requester supply a request signal, an address and a request tag. The request signal validates the address and the request tag. The address specifies the location of the requested data in memory. The request tag specifies where to put the data when it is returned to the processor.
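The load example above can be sketched in a few lines. This is an illustrative model, not the patented hardware: the class names, the dictionary standing in for memory and the use of a register index as the request tag are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LoadRequest:
    valid: bool    # request signal: marks the address and tag as valid
    address: int   # location of the requested data in memory
    tag: int       # where to put the data in the requester (assumed: register index)


@dataclass(frozen=True)
class Response:
    tag: int       # request tag carried back unchanged
    data: int


def service(memory: Dict[int, int], request: LoadRequest) -> Optional[Response]:
    """Service one load; an invalid request produces no response."""
    if not request.valid:
        return None
    return Response(tag=request.tag, data=memory[request.address])


def deliver(registers: List[Optional[int]], response: Response) -> None:
    """Place returned data at the destination named by its tag."""
    registers[response.tag] = response.data
```

Because each response carries its own destination, the order in which `deliver` is called does not matter: delivering the responses in reverse order leaves the registers in the same state as delivering them in request order.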

The apparatus for non-sequential shared resource access according to the present invention is used in a multiprocessor system having a plurality of processors and a plurality of input/output interfaces, both of which can access the shared resources of the multiprocessor system. The shared resources in the preferred embodiment include main memory, global registers and interrupt mechanisms. For non-sequential access within a cluster of tightly coupled processors, the present invention comprises: request generation means for generating a plurality of resource requests from the processors; switching means, operably connected to the request generation means, for receiving the resource requests in the time order in which they were generated and routing them to the shared resources; and shared resource means for servicing each resource request as the requested resource becomes available. Each resource request contains the address of the requested shared resource and a request tag that specifies the location within the requester to which the response to that request is to be returned.

The switching logic associated with the switching means includes a tag queue for storing the request tags associated with the resource requests, logic means for associating each request tag from the tag queue with a resource response, and means for returning the resource response and its request tag to the processor. The switching logic associated with the shared resources includes switching means for routing requests to and from the shared resources, control logic for routing the requests correctly, logic for handling multiple decision requests, and logic for storing or retrieving the final data entity being requested.
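One way to picture the tag queue is as a FIFO that sits in front of a single shared resource. The sketch below rests on an assumption the text does not state: that an individual resource answers its requests in the order it received them, so pairing each response with the oldest queued tag is sufficient. The class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class ResourcePort:
    """Switching logic in front of one shared resource (illustrative sketch)."""

    def __init__(self) -> None:
        self.tag_queue: Deque[int] = deque()

    def send(self, request_tag: int) -> None:
        """Forward a request to the resource and remember its request tag."""
        self.tag_queue.append(request_tag)

    def receive(self, data: int) -> Tuple[int, int]:
        """Pair the next response from the resource with the oldest queued tag
        (assumes this one resource responds in arrival order)."""
        return self.tag_queue.popleft(), data
```

Under this assumption, out-of-order behaviour arises between resources that run at different rates, while each port only needs a simple queue.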

In another embodiment, the present invention also permits non-sequential access to shared resources outside a tightly coupled cluster of processors. In this embodiment, a remote cluster adapter is provided with logic means for attaching additional routing information to the request tag to form a new tag, called the cluster tag. The cluster tag is passed to the remote cluster adapter of the target cluster, where it is stripped from the request and stored in a tag buffer. A new request tag is generated for use within the target cluster. When the response is returned to the remote cluster adapter of the target cluster, the returned request tag is used to locate the corresponding cluster tag in the tag buffer. The response and the cluster tag are then returned to the requesting cluster. At the remote cluster adapter of the requesting cluster, the cluster tag is broken down into its parts: the additional return routing information and the request tag. The return routing portion is used to return the response and the request tag portion to the requesting switching means.
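The cluster tag round trip described above can be sketched as follows. The field names, the choice of the tag-buffer slot index as the new local request tag, and the class `TargetAdapter` are assumptions made for illustration; the text only states that a new request tag is generated and later used to locate the stored cluster tag.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ClusterTag:
    return_route: int   # additional return routing information (hypothetical encoding)
    request_tag: int    # request tag originally supplied by the requester


def make_cluster_tag(return_route: int, request_tag: int) -> ClusterTag:
    """Requesting-side adapter: attach routing information to the request tag."""
    return ClusterTag(return_route, request_tag)


def split_cluster_tag(tag: ClusterTag) -> Tuple[int, int]:
    """Requesting-side adapter: break the returned cluster tag into its parts."""
    return tag.return_route, tag.request_tag


class TargetAdapter:
    """Remote cluster adapter of the target cluster (illustrative sketch)."""

    def __init__(self, buffer_size: int) -> None:
        self._free: List[int] = list(range(buffer_size))  # unused tag-buffer slots
        self._tag_buffer: Dict[int, ClusterTag] = {}

    def accept(self, cluster_tag: ClusterTag) -> int:
        """Strip the cluster tag, store it, and return a new local request tag.
        Raises IndexError when the buffer is full; real hardware would stall."""
        local_tag = self._free.pop(0)
        self._tag_buffer[local_tag] = cluster_tag
        return local_tag

    def complete(self, local_tag: int) -> ClusterTag:
        """Use the returned local tag to find the stored cluster tag and free the slot."""
        cluster_tag = self._tag_buffer.pop(local_tag)
        self._free.append(local_tag)
        return cluster_tag
```

Because the lookup is keyed by the local tag, responses inside the target cluster may complete in any order and each still finds its way back to the right requester.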

A particular consequence of this system is that control is handled locally, and decisions concerning access to shared resources are made quickly and only at the time necessary to ensure full utilization of the resource. By maintaining the highest possible system throughput coupled with very high system bandwidth, the present invention ensures that requesters are steadily supplied with data to process with minimal access time, thereby increasing the overall performance of the multiprocessor system for a given set of system bandwidth and throughput parameters.

An object of the present invention is to provide a method and apparatus for non-sequential memory access capable of issuing requests to shared resources for which the responses may be returned out of order relative to the time order in which the requests were issued.

A second object of the present invention is to provide a method and apparatus for memory access that can increase performance in an interleaved shared resource system by ensuring that each component of the shared resource system can operate concurrently and at potentially different rates.

A third object of the present invention is to provide a method and apparatus for a memory access system for a multiprocessor system that delivers high bandwidth, high throughput and low latency when the multiprocessor system is configured with many requesters and many shared resources.

These and other objects of the present invention will become apparent from the drawings, the detailed description of the preferred embodiment and the appended claims.

Description of the Drawings

Figures 1a, 1b, 1c and 1d are illustrations of the pipelined request/response memory access techniques of the prior art and of the present invention.

Figure 2 is a block diagram of a single multiprocessor cluster of the preferred embodiment of the present invention.

Figures 3a and 3b are a block diagram of a four-cluster implementation of the preferred embodiment of the present invention.

Figure 4 is a block diagram of a single multiprocessor cluster showing the arbitration node means of the preferred embodiment.

Figures 5a and 5b are a detailed block diagram of the input/output interface in the preferred embodiment of the present invention.

Figures 6a and 6b are a detailed block diagram of the ports associated with the input/output interface within the processor.

Figures 7a, 7b, 7c, 7d and 7e are illustrations of the various request tags of the present invention.

Figure 8 is a block diagram of the external interface ports.

Figure 9 shows the correspondence between the request tag and the command block word.

Figure 10 is a detailed block diagram of the NRCA means in the preferred embodiment of the present invention.

Figures 11a and 11b are a detailed block diagram of the MRCA means in the preferred embodiment of the present invention.

Description of the Preferred Embodiment

Referring first to Figures 1a to 1d, the pipelined, out-of-order access mechanism of the present invention is described in comparison with the prior art. These figures are applicable to request/response operations at each level of the memory/shared resource architecture. Because memory is understood to be the shared resource most commonly accessed in a multiple-requester system, the preferred embodiment of the present invention is described in terms of access to memory; however, the present invention contemplates access to any type of shared hardware resource. In this sense, shared hardware resources include memory, global registers and interrupt mechanisms, as well as ports, paths, functional units, registers, queues, banks and the like.

Figure 1a shows how a stream of requests and responses is handled in a prior-art system. Because there is no capability for out-of-order access or streaming at all, each successive request must wait for its associated response to complete before the next request can begin. Referring to Figure 1b, some prior-art vector processors support the ability to issue successive requests to load or write a vector register without waiting for each response to return. The limited pipelining technique shown in Figure 1b has been applied to vector processors accessing main memory, but has not been applied to other system resources.

In contrast, Figure 1c shows an interleaved series of requests and responses in which all requests and their associated responses are in time order, but response 1 may be returned before request n is issued. Figure 1d illustrates the full capability of the out-of-order access mechanism of the present invention, in which requests and their associated responses occur with no particular relationship to time order. In the memory access system shown in this figure, response 2 may be returned before response 1. The pipelining technique illustrated in Figure 1d has not been applied in the prior art.

The description of the preferred embodiment that follows begins with the preferred embodiment of the multiprocessor system. It then starts from the various requesters and their associated ports, proceeds to the shared resources of the multiprocessor system, and describes the method and apparatus for non-sequential access.

[Multiprocessor system]

Referring to Figure 2, the architecture of a single multiprocessor cluster of the preferred embodiment of the multiprocessor system used with the present invention is described. This preferred cluster architecture for a highly parallel scalar/vector multiprocessor system can support a plurality of high-speed processors 10 sharing a large set of shared resources 12 (for example, main memory 14, global registers 16 and interrupt mechanisms 18). The processors 10 are capable of both vector and scalar parallel processing and are connected to the shared resources 12 through arbitration node means 20. Also connected through the arbitration node means 20 are a plurality of external interface ports 22 and input/output concentrators (IOC) 24, which are in turn connected to a variety of external data sources 26. These external data sources 26 may include a secondary memory system (SMS) 28 linked to the input/output concentrator 24 by a high-speed channel 30.

The external data sources 26 may also include a variety of other peripheral devices and interfaces 32 linked to the input/output concentrator 24 by one or more standard channels 34. These peripheral devices and interfaces 32 may include disk storage systems, tape storage systems, printers, external processors and communication networks. Together, the processors 10, shared resources 12, arbitration node means 20 and external interface ports 22 comprise a single multiprocessor cluster 40 for a highly parallel multiprocessor system in accordance with the preferred embodiment of the present invention.

The preferred embodiment of the multiprocessor cluster 40 overcomes the direct-connection interface problems of present shared-memory supercomputers by physically organizing the processors 10, shared resources 12, arbitration node means 20 and external interface ports 22 into one or more clusters 40. In the preferred embodiment shown in Figures 3a and 3b, there are four clusters: 40a, 40b, 40c and 40d. Each of the clusters 40a, 40b, 40c and 40d physically has its own set of processors 10a, 10b, 10c and 10d, shared resources 12a, 12b, 12c and 12d, and external interface ports 22a, 22b, 22c and 22d associated with that cluster. The clusters 40a, 40b, 40c and 40d are interconnected through a remote cluster adapter 42 that is a logical part of each arbitration node means 20a, 20b, 20c and 20d. Although the clusters 40a, 40b, 40c and 40d are physically separated, the logical organization of the clusters and the physical interconnection through the remote cluster adapters 42 enable the desired symmetrical access to all of the shared resources 12a, 12b, 12c and 12d across all of the clusters 40a, 40b, 40c and 40d.

Referring now to Figure 4, the preferred embodiment of the arbitration node means 20 for a single cluster 40 is described. At a conceptual level, the arbitration node means 20 comprises a plurality of crossbar switches that symmetrically interconnect the processors 10 and external interface ports 22 with the shared resources 12 in the same cluster 40 and, through the remote cluster adapter 42, with the shared resources 12 in other clusters 40. Typically, a full crossbar switch would allow each requester to connect to each resource. In the present invention, the arbitration node means 20 allows a result similar to that of a full crossbar switch to be achieved in situations where there are more requesters than resources. In the preferred embodiment, the arbitration node means 20 comprises sixteen arbitration nodes 44 and the remote cluster adapter means 42. The remote cluster adapter means 42 is divided into a node remote cluster adapter (NRCA) means 46 and a memory remote cluster adapter (MRCA) means 48.

The NRCA means 46 allows the arbitration nodes 44 to access the remote cluster adapter means 42 of all other multiprocessor clusters 40. Similarly, the MRCA means 48 controls access to the shared resources 12 of this cluster 40 from the remote cluster adapter means 42 of all other multiprocessor clusters 40.

In this embodiment, the sixteen arbitration nodes 44 interconnect thirty-two processors 10 and thirty-two external interface ports 22 with the main memory 14, global registers 16 and interrupt mechanisms 18, and with the NRCA means 46. Each arbitration node 44 is connected to the main memory 14 by eight bidirectional parallel paths 50. A single parallel bidirectional path 52 connects each arbitration node 44 to the NRCA means 46. In the preferred embodiment, the same path 52 from each arbitration node 44 is also used to connect the arbitration node 44 with the global registers 16 and the interrupt mechanisms 18, although it will be understood that separate paths could be used for this interconnection.

Like each arbitration node 44, the MRCA means 48 is connected to the main memory 14 by eight bidirectional parallel paths 54. Similarly, a single parallel bidirectional path 56 connects the MRCA means 48 with the global registers 16 and the interrupt mechanisms 18. In the preferred embodiment, a total of six parallel bidirectional paths 58 are used to interconnect the clusters 40. For example, cluster 40a has two paths 58 connecting to each of clusters 40b, 40c and 40d. In this manner, the MRCA means 48 allows the other clusters 40 to have direct access to the shared resources 12 of this cluster 40.

As shown in Figures 5a and 5b, the arbitration node 44 includes arbitration networks 303 and 306 for each of the memory ports 310, the NRCA port 312, and each of the processor ports 314, 315, 316 and 317. The arbitration node 44 also includes an input port queue 301, crossbars 302 and 307, a tag queue 304 and a data queue 305.

As described in detail herein, the arbitration networks 303 and 306 use a first-come, first-served, multiple-requester toggling scheme to ensure that the oldest reference is processed first. Where there are multiple old references of the same age, a fairness algorithm ensures equal access to the ports 310 and 312 and to the ports 315, 316 and 317 controlled by the arbitration network 303 or 306, respectively.
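The arbitration policy described above, oldest reference first with a fairness rule for ties, can be sketched as a small arbiter. The text does not give the actual toggling logic, so the rotating priority pointer used here to break ties is an assumption, as are the class name and the representation of pending references as a mapping from requester id to age.

```python
from typing import Dict, Optional


class Arbiter:
    """Oldest-first arbiter with rotating tie-break (illustrative sketch)."""

    def __init__(self, num_requesters: int) -> None:
        self.num_requesters = num_requesters
        self.next_priority = 0  # requester favoured on the next tie (assumed scheme)

    def grant(self, ages: Dict[int, int]) -> Optional[int]:
        """`ages` maps requester id to the age of its pending reference.
        Returns the requester granted this cycle, or None if none is pending."""
        if not ages:
            return None
        oldest = max(ages.values())
        tied = [r for r, age in ages.items() if age == oldest]
        # Among equally old references, pick the first one at or after the pointer.
        winner = min(tied, key=lambda r: (r - self.next_priority) % self.num_requesters)
        if len(tied) > 1:
            # Move the pointer past the winner so the other requester wins the next tie.
            self.next_priority = (winner + 1) % self.num_requesters
        return winner
```

With this tie-break, two requesters that repeatedly present references of the same age are granted alternately, while a strictly older reference always wins regardless of the pointer.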

In other supercomputers, memory returns come back in the same order in which the requests were sent out. Consequently, there is no ambiguity about where to put the data when it returns, and the processor's memory return logic is simple.

However, restricting memory returns to be in order also sacrifices performance, because the ordering constraint forces the shared resources to wait until they are clear of any ordering violation, which reduces the amount of concurrent activity. An early return is a return that, because of the non-uniform latency of the memory system, comes back with a shorter latency than a previously requested return. If returns are not inherently constrained to come back in the order in which the requests were sent, the memory subsystem must provide a sorting mechanism to guarantee that order. This is a considerable burden when there are multiple ports requesting data and multiple memory sections returning it.

In the present invention, memory data returns may come back out of order with respect to the order of the requests. It is a characteristic of the queues and arbitration networks throughout the multiprocessor that, whenever requests accumulate in a queue, the responses may be returned out of order with respect to the relative time at which the requests were first entered into the queue. This means that data for an earlier request may return later than data for a later request. However, because of ordering restrictions in the processor, it may not be possible to use data that arrives early.

For example, data involved in arithmetic operations must be used in the same order specified in the original program if reproducible results are desired. In the present invention, therefore, memory return data is placed in its final destination (a vector register, scalar register, L register, instruction cache buffer, or input/output buffer) regardless of whether it can actually be used yet. Whether the data can be used depends on the response status of previously issued requests.

The present invention provides a method and apparatus for shared resource access in which all system resources can be accessed using all of the pipeline techniques shown in FIGS. 1b to 1d. To achieve this, the invention uses tags and queues to keep track of requests and responses and, in effect, to reorder the data at its destination.

In its simplest form, a tag tells the logic either where in a buffer to locate another tag that holds routing information, or where in a buffer or processor (register) to place the response associated with that tag.

For example, requesting data from memory requires the requester to supply a request signal, an address, and a request tag. The request signal validates the address and the request tag. The address specifies the location of the requested data in main memory. The request tag specifies where to put the data when it is returned to the processor.

Although the preferred embodiment is described in the context of a multiprocessor system in which each processor has multiple ports for accessing shared resources, it will be evident that the invention is equally applicable to a multiprocessor system in which each processor has only a single port for accessing memory, and to a single-processor system having multiple ports for accessing memory.

[Processor]

The following sections describe in detail how the processor 10 of the preferred embodiment manages scalar, vector, and instruction requests. There are four vector load ports, one scalar load port, and one instruction load port. Each port has a different mechanism for handling out-of-order responses.

Referring to FIGS. 6a and 6b, vector load ports 724, 726, 728 and 730 supply a request signal, an address, and a request tag for memory accesses whose data is destined for the vector registers 718. There are four vector load ports, and each load port supports two outstanding vector loads at any given time. A vector load port makes variable-size requests for memory returns to a particular vector register 718. Each request group can consist of from 1 to 64 individual requests. Of the two possible outstanding vector loads for each port, only one can be issuing requests at a time. However, because memory returns are non-sequential, it is conceivable that all of the returns for the second load could arrive before any of the returns for the first load. The vector load port control mechanism 732 must ensure that the data is placed in the correct register and that it is not used until all previously issued data for that register has been used.

As shown in FIG. 7a, the request tag that accompanies each memory return and address from a vector port is large enough to indicate the destination of the memory return: that is, which of the two possible vector registers receives the data, and which location in that vector register is to be used. When a request is issued, the vector load port control mechanism ensures that the correct tag accompanies the request and the address.

The first component of the vector tag is the destination register identifier bit. In the preferred embodiment, instead of sending the destination register number as part of the tag, the processor simply sends a single bit that indicates one of two registers. The particular register number is established at instruction issue time and can be different each time a load instruction is issued to the port. Because the tag contains only a single destination register identifier bit, only two loads can be outstanding at any given time. Tag collisions within a port (reuse of the same destination register bit) are prevented by waiting for all of the responses for a register to return before reusing its tag bit. In addition, successive loads from each port always use different destination register tag bits. Multiple outstanding loads to the same vector register (but at different vector load ports) are avoided by waiting for the earlier vector register load to complete.

The second part of the vector tag is the vector register element number. It simply specifies which word of the given register is to be updated with the memory data.

The third and final part of the vector tag is a "cancel" indicator, which effectively discards the load request as far as the processor 10 is concerned. To the memory system, however, a cancelled request is treated exactly like a request that has not been cancelled, with one minor exception: a cancelled out-of-cluster request is redirected within the cluster for performance reasons. This ability to cancel a request after it has been issued reduces effective memory latency, because it removes some time-critical handshaking between the portions of the processor 10 and arbitration node 44 that handle requests and the portions that handle addresses. In this embodiment, the cancel bit allows address validation and request arbitration to overlap and to be performed in parallel within the processor 10 and the arbitration node 44. For example, the arbitration node 44 proceeds with arbitration of a memory request while, at the same time, the address control logic in the processor 10 determines whether the address of that memory request is valid, that is, whether it lies within the correct address range. If the processor 10 determines that the address is out of range, the cancel bit is set and the request is directed to an in-cluster address.
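The three parts of the vector tag described above can be illustrated with a minimal sketch. The bit positions used here (element number in the low six bits, then the register-select bit, then the cancel bit) and the function names are assumptions made for illustration; the source states only that the tag carries these three fields.

```python
from typing import Tuple

# Assumed layout, not taken from the source:
#   bits 5..0 = element number (0..63), bit 6 = register select, bit 7 = cancel
ELEMENT_BITS = 6
ELEMENT_MASK = (1 << ELEMENT_BITS) - 1
REG_SELECT_SHIFT = ELEMENT_BITS
CANCEL_SHIFT = ELEMENT_BITS + 1


def pack_vector_tag(reg_select: int, element: int, cancel: bool = False) -> int:
    """Build a vector request tag from its three fields."""
    if reg_select not in (0, 1):
        raise ValueError("only two loads can be outstanding per port")
    if not 0 <= element <= ELEMENT_MASK:
        raise ValueError("element number must fit in six bits")
    return (int(cancel) << CANCEL_SHIFT) | (reg_select << REG_SELECT_SHIFT) | element


def unpack_vector_tag(tag: int) -> Tuple[int, int, bool]:
    """Recover (register select, element number, cancel) from a returned tag."""
    element = tag & ELEMENT_MASK
    reg_select = (tag >> REG_SELECT_SHIFT) & 1
    cancel = bool((tag >> CANCEL_SHIFT) & 1)
    return reg_select, element, cancel
```

Because the tag travels with the request and comes back with the data, the port needs no other state to decide where a returning word belongs.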

For certain kinds of operations, this cancel feature is beneficial because it permits bottom loading of data in loops. In this software technique, the effects of memory latency are avoided by prefetching data before it is known whether the address is valid.

In the preferred multiprocessor system, a program may turn off mapping exceptions and access memory at random addresses (even addresses outside its data space) without harm, and the cancel mechanism ensures that the program cannot access forbidden data.

A program can therefore access memory before it checks the addresses it uses, and load instructions can be placed at the bottom of a loop rather than at the top ("bottom loading").

Referring again to FIGS. 6a and 6b, the scalar request tag will now be described. A single scalar port 722 to memory provides the path for memory data to the S registers 714 and the L registers 716. To guarantee correct data ordering, the scalar port, like the vector ports, attaches a request tag to each memory request and address.

As shown in FIG. 7b, the scalar request tag indicates the destination register type and register number. The first part of the scalar tag is the register type, which distinguishes L register responses from S register responses. The second part of the scalar tag is the register number, which indicates the register number (L or S) into which the return data is to be placed. Tag collisions are prevented by allowing only one outstanding reference per register or, in the case of the L registers, only one outstanding reference per register group. The third and final part of the scalar tag is a "cancel" indicator, which prevents user programs from accessing data outside their permitted address space. This mechanism is described elsewhere in this application.

As shown in FIGS. 6a and 6b, the scalar load port 722 receives returns destined for either the L registers 716 or the S registers 714. The L/S bit of the return tag determines which destination register type is written when the data returns from memory.

When an S register is written, the request tag performs two functions: it causes the register to be unreserved, and it forms the write address for the S register file. When an L register is written, the tag performs the same two functions. The L registers, however, are reserved and unreserved in blocks rather than as individual registers.

The request tag scheme outlined for scalar port responses allows an L register load to be interrupted by a higher-priority load, such as an S register load. Because no ordering is imposed on the outgoing addresses and none at all on the incoming data stream, one set of register loads (for example, the L registers) can be treated as a background activity while the other set (the S registers) is treated as a high-priority foreground activity. With proper software support, the L registers can then be treated as a software-managed cache accessed by block loads and stores.

Referring again to FIGS. 6a and 6b, the instruction request tag will now be described. The instruction and input/output port 720 provides the path from memory to the instruction cache 710 and the input/output buffer 712. To guarantee correct instruction ordering, the instruction port, like the other ports, supplies a request tag along with the request and the address. As shown in FIG. 7c, the instruction request tag indicates the buffer number and the element number within that buffer for the instruction response.

Like the vector port register number, the instruction port buffer number indicator is encoded in a single tag bit. This limits the number of outstanding buffer requests to two, but it simplifies the control and data paths involved in instruction loads. Instruction tag collisions are avoided by waiting for the oldest buffer fill to complete before allowing a new fill to start. Multiple outstanding loads to the same buffer are prohibited by the cache replacement policy. The encoding of the tag indicates whether the return destination is the instruction cache buffer 712 or the input/output response buffer 710.

Each vector load port 724, 726, 728 and 730 maintains two sets of "back" bits, one for each possible outstanding vector register for that port. As a return comes back from memory, the back bit for that register and element number is set to 1, and the data is written into the vector register. However, the control mechanism 732 prevents that data from being used until all elements up to and including the previous element have been returned from memory. This guarantees that all elements of a vector register are used in order, even though the responses may return out of order.

When all elements of a vector register have been returned from memory, the back bits for that side are marked "not back", and that set of back bits becomes available for another load instruction. Vector and scalar load data whose tag indicates that the request was cancelled is discarded and replaced by a non-signaling NaN (Not a Number). This is how the preferred embodiment protects memory data from access by programs that have not enabled operand mapping exceptions. When instruction data is returned from memory 14, the processor 10 uses the return tag to determine where to put it. Unlike data returns, instruction returns do not use the cancel feature, because instruction mapping exceptions are always enabled in mapped mode.
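The back-bit bookkeeping can be sketched as follows. This is an illustrative software model only, not the hardware of control mechanism 732; the class name `VectorLoadSlot` and its methods are hypothetical. It records a back bit per element, lets consumers see only the contiguous run of returned elements starting at element 0, and substitutes a NaN for cancelled returns.

```python
import math


class VectorLoadSlot:
    """Hypothetical model of one of a port's two outstanding vector loads."""

    def __init__(self, length):
        self.data = [None] * length
        self.back = [False] * length  # one "back" bit per element
        self.next_usable = 0          # elements below this index may be used

    def on_return(self, element, value, cancelled=False):
        # A cancelled request delivers a NaN instead of the memory word.
        self.data[element] = math.nan if cancelled else value
        self.back[element] = True
        # Release the contiguous run of returned elements, in order.
        while self.next_usable < len(self.back) and self.back[self.next_usable]:
            self.next_usable += 1

    def usable(self):
        """Elements that may be consumed so far, in program order."""
        return self.data[:self.next_usable]

    def complete(self):
        """True once every element has returned and the slot can be reused."""
        return self.next_usable == len(self.back)
```

In this model an element that arrives early is written immediately but stays invisible to consumers until every lower-numbered element has also arrived.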

To avoid the hazards associated with a non-coherent memory system, the processor 10 and the external interface port 22 require the arbitration node 44 to supply information in addition to the tags.

This information relates to the ordering of requests and to when those requests are committed for processing by a particular shared resource. In the preferred embodiment, the technique used to guarantee coherency is referred to as the "data mark mechanism". The data mark mechanism is disclosed in greater detail in the parent patent application.

[External interface port]

In addition to requests issued by the processor 10, the present invention can also service resource requests from peripheral devices issued through the external interface port 22. Data fetched from main memory 14 or the global registers 16 by the external interface port 22 may return in an order different from the order in which it was requested. To accomplish this non-sequential access, the IOC 24 and the SMS 28 associated with the external interface port 22 attach request tags to the resource requests made through the external interface port 22.

The external interface port 22 of the preferred embodiment is described in detail with reference to FIG. 8. The external interface port 22 accepts packets of command and data words from the main memory 14 over a memory port cable (not shown) that physically connects to the cluster channel interface (CCI) 120. Command words are placed in the command buffer 350, and data is routed to the data FIFO buffer 360. When a command is present in the command buffer 350, the control logic 370 of the external interface port 22 requests access to memory 14 through the arbitration node 44. The data from the word count, command, address, and mtag fields of the command word is loaded into the respective registers 382, 384, 386 and 388 in preparation for delivery to the arbitration node 44 when the request is acknowledged. A new request tag and address must be computed for every word request made.

For a fetch request, no data is sent, but an address and a request tag are sent for a number of requests equal to the contents of the command word count field. The request tags are computed starting with the lower six bits set to zero and incrementing the contents of that field until the correct number of tags has been sent. Similarly, the addresses of the requests are computed starting with the address presented in the command word and incrementing it as each request is acknowledged.
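The per-word tag and address generation for a fetch command can be sketched as a simple generator. The function name and the `address_step` parameter are assumptions; the source says only that the address is incremented for each acknowledged request and does not give the increment size.

```python
def fetch_request_stream(start_address, word_count, address_step=1):
    """Yield (address, low six tag bits) for each word of a fetch command.

    address_step is an assumed word-granular increment, not from the source.
    """
    if not 0 <= word_count <= 64:
        raise ValueError("a six-bit word index can cover at most 64 requests")
    for word in range(word_count):
        # The tag's low six bits start at zero and count up per request.
        yield start_address + word * address_step, word & 0x3F
```

For example, `list(fetch_request_stream(0x100, 3))` produces three address and tag pairs whose tag field runs 0, 1, 2.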

For a store request, the next word in the data FIFO 360 is presented along with the address, tag, and command information. The word count value is decremented after each request.

When the word count value reaches zero, no further requests are made. The FIFOs 350 and 360 are used to hold commands and data so that, whenever possible, commands and data are always available at the external interface port 22 to keep the arbitration node 44 constantly busy.

Fetched data returns from the shared resource through the transfer register 390. The request tag that was issued when the request was made is returned with the data. The output of the transfer register 390 is connected to the main memory 14. A control line associated with the data lines of the cable connecting the external interface port 22 is asserted to indicate that a valid data word is on the bus of the CCI 120.

The input/output request tag is sent with the command word that precedes the data packet as it travels through the IOC 24. The tag contains a four-bit field that indicates which buffer of the IOC 24 is to receive the data, and a six-bit field that indicates the location within that buffer where the data is to be stored. The four-bit buffer select field is decoded as shown in FIG. 9. Code 1011 directs data to one of the two SMTC command buffers. The remaining six tag bits indicate which buffer and which location in that buffer is to receive the response data word. Code 1111 is not used by the IOC 24; it is reserved for the processor's instruction cache fetch operations. The IOC 24 shares a memory port with the instruction cache.

The six-bit field is generated at the external interface port for each individual data word request as the request is made. Requests are made in order, starting with the lowest address. Referring to FIG. 7d, the input/output request tag, consisting of the six-bit word identifier and the four-bit destination identifier, is placed in the ten-bit tag field. This tag travels with the request through the memory system and is returned to the IOC 24 with each word of data. The request tag is then used by the IOC 24 to direct the data word to the appropriate buffer and to the appropriate location in that buffer.

Because the tags are generated in sequential order as the requests are made, using the tags to address locations in the destination buffer guarantees that the data is always loaded into the buffer in the correct order, no matter in what order it returns. Reading the data from the buffer in sequential order therefore guarantees that the data is returned to its destination in the correct order.
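The reordering effect of addressing the destination buffer by tag can be shown in a short sketch. Placing the four-bit buffer select field above the six-bit word identifier is an assumption about the ten-bit layout, and the function names are hypothetical.

```python
WORD_BITS = 6
WORD_MASK = (1 << WORD_BITS) - 1


def make_io_tag(buffer_select, word):
    """Combine a 4-bit buffer select and a 6-bit word index (assumed layout)."""
    if not 0 <= buffer_select <= 0xF or not 0 <= word <= WORD_MASK:
        raise ValueError("field out of range")
    return (buffer_select << WORD_BITS) | word


def split_io_tag(tag):
    """Return (buffer select, word index) from a ten-bit tag."""
    return tag >> WORD_BITS, tag & WORD_MASK


def load_buffers(returns, buffers):
    """Store each returned (tag, data) pair at the slot its tag names."""
    for tag, data in returns:
        buffer_select, word = split_io_tag(tag)
        buffers[buffer_select][word] = data
```

Whatever order the `(tag, data)` pairs arrive in, reading `buffers[n]` from index 0 upward yields the words in the order they were requested.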

[Arbitration node]

Referring again to FIGS. 5a and 5b, from the point of view of the shared resources (in this case, paths 50 and 52), each incoming request is arbitrated by the request arbitration network 303. A similar response arbitration network 306 arbitrates the response data returning to the respective processor ports 315, 316 or 317. For incoming requests, the input queue 301 holds up to 16 requests that are waiting to be passed on by the request arbitration network 303. For returning responses, the data queue 305 holds up to 64 responses waiting for the response arbitration network 306 to return the response data to the destination port 315, 316 or 317. Each of these queues 301 and 305 is strategically sized to cover any control latency as data flows between the requesters and the shared resources. When data is returned from a memory section, its associated tag is retrieved from the tag queue 304 and re-attached before the data and tag are loaded into the data queue 305. For responses over the NRCA path 52, the data and its associated tag are already paired, and the NRCA and MRCA means handle the associated data and tags.

If the request arbitration network 303 determines that an incoming request is requesting an available resource, has the highest priority, and is destined for a memory section, the address and data components of the request are placed on path 50 and routed to the correct memory section. If the request arbitration network 303 determines that an incoming request is requesting an available resource, has the highest priority, and is destined for the NRCA 46, the address, data, and tag components of the request are placed on path 52 and routed to the correct shared resource 12. It should be noted that the arbitration network 303 is in effect controlling access to the interconnect wires 50 and 52. Subsequent arbitration networks further along the request path control access to the other shared resources. Data may be returned to the requesting ports 315, 316 and 317 in an order different from the order in which it was requested.

The arbitration node 44 receives a set of tags with each load address and queues them for later reference. When data is returned from main memory, the tags are re-attached to the corresponding data words, and both data and tags are returned to the requesting port. The processor 10 uses these tags to place the data in the correct destination and to ensure that it is used in the correct order.

[Main memory]

Referring again to FIGS. 5a and 5b, for memory references, the switching logic 400 of each memory section includes, for each memory subsection, a subsection catch queue 401 that collects all incoming requests to that particular memory subsection from a given arbitration node 44. Each arbitration node 44 has its own set of catch queues in each memory section 14. The bank request arbitration network 405 arbitrates, on each cycle, among the group of subsection catch queues 401 that have pending requests for that bank 403. Once a request is selected, it is issued to its destination bank 403. If the request is a store, the address and data are issued to the bank. If the request is a load, only the address is issued to the bank. If the request is a load-and-flag, the address and data are issued to the bank. For loads and load-and-flags, the response data from the bank 403 is held in the hold queue 406 until the response arbitration network 407 grants the outgoing response from the memory section.
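The rule for which components of a selected request are issued to the bank can be summarized in a small sketch. The `Op` enumeration and the function names are hypothetical; only the store, load, and load-and-flag behavior is taken from the text above.

```python
from enum import Enum


class Op(Enum):
    STORE = "store"
    LOAD = "load"
    LOAD_AND_FLAG = "load_and_flag"


def bank_issue(op, address, data=None):
    """Return the fields issued to the bank for a selected request."""
    if op is Op.LOAD:
        return {"address": address}  # a load sends the address only
    if data is None:
        raise ValueError("store and load-and-flag requests carry data")
    return {"address": address, "data": data}


def produces_response(op):
    """Loads and load-and-flags return data that waits in the hold queue."""
    return op in (Op.LOAD, Op.LOAD_AND_FLAG)
```

A load-and-flag is the one case that both sends data to the bank and produces response data coming back.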

[Remote cluster adapter]

Referring now to FIG. 10, in the NRCA means 46, an input queue 610 or 630 collects the incoming requests. The input queue 610 holds references destined for the external interface. The arbitration network 612 arbitrates, on each cycle, among the group of input queues 610 that have pending requests for the external resource. Once a request is selected, it is issued to its destination resource along with the address, the data, and a new tag, referred to as the cluster tag, which consists of the request tag and additional information placed on path 58 (see FIG. 7e). The input queue 630 holds references destined for the interrupt mechanism, the global registers, or the SETN registers. The arbitration network 634 arbitrates, on each cycle, among the group of input queues 630 that have pending requests for the resources 620, 632 or 633. Once a request is granted, it is issued to its destination resource. When data is to be returned to the arbitration node 44 from the global registers 633 or the SETN registers 632, a high-priority request is presented to the output arbitration network 615, which causes the output path back to the arbitration node to be cleared.

Data and tags returning from the MRCA means 48 through port 58 are received in the queue 614. Data from the global registers 633 or the SETN registers 632 is pre-arbitrated and is placed immediately on the return path 52 along with its associated tag. On each clock cycle, the response arbitration network 615 arbitrates for the return data path of port 52 or 56. The appropriate data is selected from the data queue 614, the global registers 633, or the SETN registers 632 and returned to the appropriate port 52 or 56.

Referring now to FIGS. 11a and 11b, the MRCA means 48 has six ports 520 through which store and load operations from other clusters are received. Each of these ports consists of a receive queue 500, a tag buffer 502, a return queue 504, and port control logic 501 and 503. Each port operates independently of all the other ports. Every operation that arrives at a port from another cluster is returned to that cluster; this includes stores as well as loads. The queues and arbitration of the MRCA means 48 (506, 507, 508, 509, 510, 511, and 512) operate in essentially the same way as the queues and arbitration of the arbitration node 44 (301, 302, 303, 304, 305, 306, and 307, respectively).

When an operation arrives at an MRCA port 520 from an external cluster, its data, address, and tag information are written into the receive queue 500, which is 64 locations deep. After a valid operation has been written into the receive queue 500, the port control logic 501 performs a resource check to determine whether the operation can be passed to the MRCA means 48. Three resources are checked.

The first resource checked is tag availability. When an operation is passed to the MRCA means 48, the original cluster tag that arrived with the request is written into the tag buffer 502, and a new 8-bit request tag is generated by the tag generator 501. The address of the tag buffer 502 location into which the original tag is written becomes the new request tag. Because this request tag must be unique, once a new tag has been generated and passed to the MRCA means 48, it cannot be reused until that operation has been returned from the MRCA means 48 to the port. The implementation of this logic requires that request tags be generated in order. If the next request tag to be generated is still outstanding in the MRCA means 48, the port cannot issue its next operation from the receive queue 500. The tag buffer 502 is 256 locations deep.
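The tag availability check described above can be sketched as follows. This is a minimal illustrative model, not the patented circuit: the class name `TagBuffer` and its methods are hypothetical, and only the 256-location depth, the 8-bit tag width, the in-order generation rule, and the use of the buffer address as the request tag come from the text.

```python
class TagBuffer:
    """Hypothetical software model of tag buffer 502 (256 locations, 8-bit tags)."""

    DEPTH = 256  # from the text: the tag buffer is 256 locations deep

    def __init__(self):
        self.cluster_tags = [None] * self.DEPTH  # saved original cluster tags
        self.busy = [False] * self.DEPTH         # True while a request tag is outstanding
        self.next_tag = 0                        # request tags are generated in order

    def can_issue(self):
        # The port stalls if the next sequential request tag is still outstanding.
        return not self.busy[self.next_tag]

    def allocate(self, cluster_tag):
        # The buffer address where the cluster tag is stored becomes the request tag.
        if not self.can_issue():
            raise RuntimeError("next request tag is still outstanding")
        tag = self.next_tag
        self.cluster_tags[tag] = cluster_tag
        self.busy[tag] = True
        self.next_tag = (tag + 1) % self.DEPTH   # wrap around at 8 bits
        return tag

    def release(self, request_tag):
        # Called when the operation returns; frees the tag and yields the cluster tag.
        if not self.busy[request_tag]:
            raise KeyError(request_tag)
        self.busy[request_tag] = False
        return self.cluster_tags[request_tag]
```

Under this model, a port that has all 256 tags outstanding remains stalled even after some later tag is released; it can issue again only once the specific tag that is next in sequence has returned.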

The second resource that must be checked before an operation can be issued to the MRCA means 48 is the availability of a location in the return queue 504. Because the MRCA means 48 has no mechanism for making the arbitration node 44 hold a returning operation, the MRCA means 48 must guarantee that there is always a location in the return queue 504 in which to store any operation returning from the arbitration node 44.

The return queue 504 is 128 locations deep. Once all of the locations in the return queue 504 have been allocated, no further operation can be issued to the MRCA means 48 until a location becomes available.

The port queue 506 within the MRCA means 48 is the third resource that must be checked before an operation from the receive queue 500 can be issued. The port control logic 501 keeps a running count of the operations currently in the port queue 506. When the port queue 506 becomes full, the port control logic 501 must hold issue until a location becomes available.
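The three checks above can be combined into a single issue condition. The following is a rough sketch under stated assumptions: `PortState`, `can_issue`, and the field names are hypothetical, and the depth of the port queue 506 is not given in the text, so it is left as a parameter. Only the 128-location return queue depth is taken from the description.

```python
from dataclasses import dataclass

RETURN_QUEUE_DEPTH = 128  # from the text: return queue 504 is 128 locations deep


@dataclass
class PortState:
    """Hypothetical snapshot of the counters kept by the port control logic."""
    next_tag_outstanding: bool   # check 1: is the next sequential request tag still in use?
    return_slots_allocated: int  # check 2: return queue locations already reserved
    port_queue_count: int        # check 3: operations currently in the port queue
    port_queue_depth: int        # assumption: depth is not stated in the text


def can_issue(state: PortState) -> bool:
    """Return True only if all three resources are available."""
    tag_free = not state.next_tag_outstanding
    return_slot_free = state.return_slots_allocated < RETURN_QUEUE_DEPTH
    port_queue_free = state.port_queue_count < state.port_queue_depth
    return tag_free and return_slot_free and port_queue_free
```

Reserving the return queue location at issue time, rather than at return time, is what lets the port accept every returning operation without needing back-pressure on the return path.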

When an operation returns from the MRCA means 48, its data, if any, is stored directly in the return queue 504. The request tag returned with the operation is used to access the tag buffer and recover the original cluster tag 503. This original cluster tag is extracted from the tag buffer 502 and stored in the return queue 504 together with the data. The port control logic 501 then performs a resource check on the cluster at the far end of the inter-cluster path 520. If the far-end cluster has locations available in its receive queue, the return queue 504 is unloaded. If no locations are available, the data is held until a queue location becomes available.
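The return path can be illustrated with a small sketch. It is a simplified model under assumptions: the tag buffer is represented as a plain dictionary keyed by request tag, the function names `handle_return` and `drain` are hypothetical, and entries are assumed to leave the return queue only while the far-end receive queue reports free locations.

```python
from collections import deque


def handle_return(tag_buffer, return_queue, request_tag, data=None):
    """Recover the original cluster tag and queue it with any returned data."""
    cluster_tag = tag_buffer.pop(request_tag)  # also frees the request tag for reuse
    return_queue.append((cluster_tag, data))   # stores carry no data, loads do


def drain(return_queue, far_end_free_slots):
    """Send entries while the far-end receive queue has room; hold the rest."""
    sent = []
    while return_queue and far_end_free_slots > 0:
        sent.append(return_queue.popleft())
        far_end_free_slots -= 1
    return sent


# Operations may return in a different order than they were issued.
tag_buffer = {0: "cluster-A", 1: "cluster-B", 2: "cluster-C"}
return_queue = deque()
handle_return(tag_buffer, return_queue, 2, data="load result")
handle_return(tag_buffer, return_queue, 0)  # a store: nothing but the tag comes back
```

Because the request tag is simply the tag buffer address, the lookup on return is a direct index rather than a search, regardless of the order in which operations come back.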

Although a description of the preferred embodiment has been presented, it is contemplated that various changes may be made without departing from the spirit of the invention. Accordingly, it is intended that the scope of the invention be defined by the appended claims rather than by the description of the preferred embodiment.

[Drawings: FIG. 2; FIGS. 3a, 3b; FIG. 5; FIG. 11a]

[Abstract]

A method and apparatus for non-sequential access to shared resources (12) in a multiple-requester system uses a variety of tags to effectively reorder the data at its destination. In its simplest form, the tag directs the switching logic either to the location in a buffer where another tag holding direction information can be found, or to the location in a buffer or processor (register) where the response associated with the tag is to be placed. For example, loading data from memory (14) requires that the requester supply a request signal, an address, and a request tag.

The request signal validates the address and the request tag. The address specifies the location in memory (14) of the requested data. The request tag specifies where the data is to be placed when it is returned to the processor. The switching logic (44) for the requester includes a tag queue for storing the request tags associated with the resource requests, logic means for associating each request tag from the tag queue with a resource response, and means for returning the resource response and its request tag to the requester. The switching logic (400) associated with the memory (14) includes control logic for routing requests to and from the shared resource, logic for handling multiple decision requests, and logic for storing or retrieving the ultimate data entity being requested.
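The role of the request tag in reordering data at its destination can be illustrated with a toy model. This is a sketch only: the function `run_loads`, the register names, and the use of a random shuffle to stand in for out-of-order servicing are all assumptions made for illustration and do not appear in the source.

```python
import random


def run_loads(memory, loads, seed=0):
    """Issue loads in order, service them in an arbitrary order, place results by tag.

    loads is a list of (address, request_tag) pairs; each request tag names the
    destination (here, a register name) for the returned data.
    """
    registers = {}
    in_flight = list(loads)                 # requests issued in program order
    random.Random(seed).shuffle(in_flight)  # the resource may service them in any order
    for address, request_tag in in_flight:
        # The tag, not the arrival order, decides where the response is placed.
        registers[request_tag] = memory[address]
    return registers


memory = {0x10: 7, 0x20: 9, 0x30: 11}
loads = [(0x10, "r1"), (0x20, "r2"), (0x30, "r3")]
result = run_loads(memory, loads)
```

Whatever order the responses arrive in, each register ends up with the value from its own address, which is the property that makes strict time ordering of requests unnecessary.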

International search report

Claims

1. An apparatus for non-sequential shared resource access in a multiprocessor system having a plurality of processors, wherein the shared resources include main memory, global registers, and an interrupt mechanism, the apparatus comprising: request generation means, operably connected to each of the processors, for generating a plurality of resource requests from said processor, each of said resource requests comprising an address of the requested shared resource and a request tag specifying the storage location within the processor to which the resource request is to be returned; switching means, operably connected to the request generation means, for receiving the resource requests in the time order in which they were generated and routing the resource requests to the shared resources, said switching means including a tag queue for storing the request tags, logic means for associating each request tag from the tag queue with a resource response, and means for returning the resource request and each request tag to the processor; and means, operably connected to the switching means and the shared resources, for servicing the resource requests as the requested resources become available and returning said resource responses to said switching means in the order in which they are serviced; whereby the resource responses can be returned out of order with respect to the time order in which the resource requests were issued.

2. The apparatus of claim 1, wherein the switching means further comprises arbitration node means for arbitrating among the resource requests before they are routed to said shared resources in a given clock cycle of the multiprocessor system.

3. The apparatus of claim 2, wherein the logic means for associating each of said request tags further includes cancel logic means for receiving a cancel indication from the request generation means and, in response to the cancel indication, canceling the resource request before the resource request is routed to the shared resource.

4. The apparatus of claim 3, wherein said cancel logic means indicates that the resource request has been canceled by returning a non-numeric value in response to the resource request.

5. The apparatus of claim 1, wherein the switching means further includes address verification means for verifying the validity of the address of the resource request.

6. The apparatus of claim 5, wherein the logic means for associating each of said request tags further includes cancel logic means for receiving a cancel indication from the request generation means and, in response to the cancel indication, canceling the resource request before the resource request is routed to the shared resource.

7. An apparatus for non-sequential shared resource access in a multiprocessor system having a plurality of requesters, wherein the requesters include processors and an external interface and the shared resources include main memory, global registers, and an interrupt mechanism, the apparatus comprising: request generation means, operably connected to each requester, for generating a plurality of resource requests from said requester, each of the resource requests comprising an address of the requested shared resource and a request tag specifying the storage location within the requester to which the resource request is to be returned; switching means, operably connected to the request generation means, for receiving the resource requests in the time order in which they were generated and routing the resource requests to the shared resources, said switching means including a tag queue for storing the request tags, logic means for associating each request tag from said tag queue with a resource response, and means for returning the resource request and each request tag to the requester; and means, operably connected to the switching means and the shared resources, for servicing the resource requests as the requested resources become available and returning said resource responses to said switching means in the order in which they are serviced; whereby the resource responses can be returned out of order with respect to the time order in which the resource requests were issued.

8. The apparatus of claim 7, wherein the switching means further comprises: arbitration node means for arbitrating among the resource requests before they are routed to said shared resources in a given clock cycle of the multiprocessor system; and address verification means for verifying the validity of the address of the resource request.

9. The apparatus of claim 8, wherein the logic means for associating each of said request tags further includes cancel logic means for receiving a cancel indication from the request generation means and, in response to the cancel indication, canceling the resource request before the resource request is routed to the shared resource.

10. The apparatus of claim 7, wherein the requesters and the shared resources are organized into a plurality of clusters, and wherein said switching means further includes remote cluster adapter means, associated with each cluster, for receiving resource requests from the requesters of that cluster that are directed to shared resources of a remote cluster, forwarding those requests to the remote cluster adapter means of said remote cluster, and receiving resource responses from the remote cluster, and for receiving, from the remote cluster adapter means of said remote cluster, resource requests directed to shared resources of said cluster and returning the resource responses to the remote cluster.

11. A multiprocessor system comprising: one or more requester means for making requests to resources; means for establishing queuing and pipelining of the requests between the one or more requesters and the resources; and one or more resource means for responding to said requests, such that the requests may be serviced in an order different from the time order in which the requests were made.

12. A method for accessing shared resources in a multiprocessor system having a plurality of processors, the shared resources including main memory, global registers, and an interrupt mechanism, the method comprising the steps of: generating resource requests, each of said resource requests comprising an address of the requested shared resource and a request tag specifying a storage location within the processor to which the resource request is to be returned; presenting the resource requests, in the time order in which the resource requests were issued, to a switching mechanism associated with the shared resources; storing said request tag associated with each resource request in a tag queue of said switching means; servicing the resource requests as the requested resources become available, to generate resource responses; returning the resource responses to the switching means in the order in which the resource requests are serviced; associating each of the request tags from the tag queue with a resource response; and returning the resource request and each request tag to the processor; whereby the resource responses can be returned out of order with respect to the time order in which the resource requests were issued.

13. The method of claim 12, wherein the step of servicing the resource requests comprises: arbitrating among the resource requests to be routed to the shared resources in a given clock cycle of the multiprocessor system; and verifying the validity of the address of the resource request.

14. The method of claim 13, wherein the step of servicing the resource requests further comprises: inspecting a cancel indication associated with the resource request; and, in response to the cancel indication, canceling the resource request before the resource request is routed to the shared resource.

15. The method of claim 14, wherein the step of canceling the resource request indicates that the resource request has been canceled by returning a non-numeric value in response to the cancel indication.

16. A method for accessing shared resources in a multi-requester system having a plurality of requesters, the requesters including processors and an external interface and the shared resources including main memory, global registers, and an interrupt mechanism, the method comprising the steps of: generating a resource request from one of said requesters, each resource request comprising an address of the requested shared resource and a request tag specifying a storage location within the requester to which the resource request is to be returned; presenting the resource requests, in the time order in which the resource requests were issued, to a switching mechanism associated with the shared resources; storing said request tag associated with each resource request in a tag queue of said switching means; servicing the resource requests as the requested resources become available, to generate resource responses; returning the resource responses to the switching means in the order in which the resource requests are serviced; associating each of the request tags from the tag queue with a resource response; and returning the resource request and each request tag to the processor; whereby the resource responses can be returned out of order with respect to the time order in which the resource requests were issued.

17. The method of claim 16, wherein the step of servicing the resource requests comprises: arbitrating among the resource requests to be routed to the shared resources in a given clock cycle of the multiprocessor system; and verifying the validity of the address of the resource request.

18. The method of claim 17, wherein the step of servicing the resource requests further comprises: inspecting a cancel indication associated with the resource request; and, in response to the cancel indication, canceling the resource request before the resource request is routed to the shared resource.

19. The method of claim 16, wherein the requesters and the shared resources are organized into clusters, and wherein the step of servicing the resource requests further comprises: receiving a remote resource request from a requester of said cluster that is directed to a shared resource of a remote cluster; forwarding the remote resource request to the remote cluster; and receiving a resource response from the remote cluster.

20. The method of claim 19, wherein the step of servicing the resource requests further comprises: receiving a remote resource request from the remote cluster that is directed to a shared resource of the cluster; and returning the resource response for the remote request to the remote cluster.