JPH0652126A

JPH0652126A - Message passing device of interconnected node network and computer system thereof

Info

Publication number: JPH0652126A
Application number: JP5119335A
Authority: JP
Inventors: Norman Barker Thomas; トーマス・ノーマン・バーカー; Charles Dap Michael; ミッチェル・チャールス・ダップ; Walren Difenderfer James; ジェイムス・ワーレン・ディフェンダファ; Mitchell Resmeesta Donald; ドナルド・ミッチェル・レスミースタ; Richard Edward Nier; リチャード・エドワード・ニーア; Eugene Letter Eric; エリック・ユージーン・レター; List Richardson Robert; ロバート・リースト・リチャードソン; John Smooral Vincent; ビンセント・ジョーン・スモーラル
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1992-05-22
Filing date: 1993-04-23
Publication date: 1994-02-25

Abstract

PURPOSE: To rapidly transfer the I/O messages passed between the interconnected processing elements of a massively parallel computer system. CONSTITUTION: A mechanism is provided that eliminates the I/O setup processing and the memory bandwidth otherwise spent storing a message in a node's main memory and then passing it out again, so that the message passes through without using the node's processing resources. Control and communication paths are provided that dynamically connect the input and output data paths, allowing a message to pass through the node over the dynamically connected input/output path without using the node's processing, memory bandwidth, or storage resources. When there is no request for memory or for the input data, the node 20 lets the message pass over the dynamically connected input/output path; it also provides a function for processing memory and input data concurrently using the node's full memory bandwidth.

Description

Detailed Description of the Invention

[0001]

FIELD OF THE INVENTION
This invention relates to multiprocessor parallel computing systems, and in particular to an architecture capable of rapidly transferring I/O messages between massively parallel processors and processing elements. The invention can switch dynamically between single instruction multiple data (SIMD) and multiple instruction multiple data (MIMD) modes of operation.

[0002]

PRIOR ART
Cross-Reference to Related Patent Applications
This application is a continuation-in-part of the following related patent applications and claims their priority.
U.S. patent application Ser. No. 611,594, filed November 13, 1990, by J. Dieffenderfer et al., entitled "Parallel Associative Processor System".
U.S. patent application Ser. No. 798,788, filed November 27, 1991, by P. M. Kogge, entitled "Dynamic Multi-Mode Parallel Processor Array Architecture".

[0003] In addition, this application is related to the following patent applications filed concurrently with it.
U.S. patent application filed May 22, 1992, by P. A. Wilkinson et al., entitled "Instructions within a SIMD Processing Element".
U.S. patent application filed May 22, 1992, by P. A. Wilkinson et al., entitled "Floating Point Implementation on a SIMD Machine".
U.S. patent application filed May 22, 1992, by P. A. Wilkinson et al., entitled "Grouping of SIMD Pickets".
U.S. patent application filed May 22, 1992, by P. A. Wilkinson et al., entitled "Slide Network for an Array Processor".
U.S. patent application filed May 22, 1992, by P. A. Wilkinson et al., entitled "Picket Autonomy on a SIMD Machine".
U.S. patent application filed May 22, 1992, by P. A. Wilkinson et al., entitled "Array Processor Communication Network Based on H-DOTS".
U.S. patent application filed May 22, 1992, by R. R. Richardson et al., entitled "Control Functions for a SIMD/MIMD Machine".

[0004] This application is also related to the following patent applications.
U.S. patent application filed May 22, 1992, by T. Barker et al., entitled "Advanced Parallel Array Processor".
U.S. patent application filed May 22, 1992, by T. Barker et al., entitled "SIMD/MIMD Processing Memory Element".
U.S. patent application filed May 22, 1992, by T. Barker et al., entitled "PME Store and Forward/Circuit Switched Modes".
U.S. patent application filed May 22, 1992, by T. Barker et al., entitled "Fully Distributed Processing Memory Element".
U.S. patent application filed May 22, 1992, by T. Barker et al., entitled "N-Dimensional Modified Hypercube".

[0005] U.S. patent application filed May 22, 1992, by M. Dapp et al., entitled "Advanced Parallel Processor Array Director".
U.S. patent application filed May 22, 1992, by M. Dapp et al., entitled "APAP Mechanical Packaging".
U.S. patent application filed May 22, 1992, by M. Dapp et al., entitled "APAP I/O Programmable Router".
U.S. patent application filed in March 1992, by T. Barker et al., entitled "APAP I/O Zipper Connection".

[0006] This application and the co-pending applications listed above are assigned to and owned by International Business Machines Corporation of Armonk, New York. The descriptions in those co-pending applications are incorporated herein by reference.

[0007] Other Cross-Referenced Patent Applications
Other cross-referenced patent applications, assigned to the same assignee as this application and owned by it at the time of filing, include the following.
U.S. patent application Ser. No. 07/250,595, filed September 27, 1988, now abandoned in favor of its continuation application Ser. No. 07/519,332, filed May 4, 1990, by James L. Taylor, entitled "SIMD Array Processor" (first published as EPO application No. 88307855/88-A on May 3, 1989).

[0008] U.S. patent application Ser. No. 07/193,990, filed May 13, 1988, by H. Li, entitled "Circuit and Method for Implementing an Arbitrary Graph on a Polymorphic Mesh".
U.S. patent application Ser. No. 07/426,140, filed October 24, 1989, by R. Jaffe et al., entitled "Two-Dimensional Input/Output Scheme for Massively Parallel SIMD Computers".
U.S. patent application Ser. No. 07/439,758, filed November 21, 1989, by W. C. Dietrich, Jr. et al., entitled "Apparatus and Method for Performing Memory Protection Operations in a Parallel Processor System".

[0009] U.S. patent application Ser. No. 07/698,866, filed May 13, 1991, by David B. Rolfe, entitled "Interconnected Processing Element System and Interconnection Method".
All of the co-pending applications referenced above are assigned to and owned by the same assignee as this application, International Business Machines Corporation. The descriptions in those applications are incorporated herein by reference.

[0010] Glossary of Terms
・ALU
ALU is the arithmetic logic unit of a processor.
・Array
An array is an arrangement of components or elements in one or more dimensions. An array can contain an ordered set of data items (array elements) that, in languages such as Fortran, is identified by a single name. In other languages, the name of such an ordered set of data items refers to an ordered collection or set of data elements that all have identical attributes.

[0011] A program array has its dimensions specified, generally by a number or a dimension attribute. In some languages the declarator of the array can also specify the size of each dimension. In some languages an array is an arrangement of elements in a table. In the hardware sense, an array is a collection of structures (functional elements) that are generally identical within a massively parallel architecture. Array elements in data-parallel computing are elements that can be assigned operations and that, when parallel, can each execute the required operations independently and in parallel. In general, an array can be thought of as a grid of processing elements. Sections of the array can be assigned sectional data so that the data can be moved around in a regular grid pattern. However, data can also be indexed or assigned to an arbitrary location in the array.

[0012] ・Array Director
An array director is a unit programmed as the controller for an array. It performs the function of a master controller for a grouping of functional elements arranged in an array.

[0013] ・Array Processor
There are two principal types of array processor: multiple instruction multiple data (MIMD) and single instruction multiple data (SIMD). In a MIMD array processor, each processing element in the array executes its own instruction stream. In a SIMD array processor, the processing elements of the array are restricted to the same instruction, supplied through a common instruction stream, but the data associated with each processing element is unique. The preferred array processor of this invention has other characteristics. It is called the Advanced Parallel Array Processor, and the acronym APAP is used for it.

[0014] ・Asynchronous
Asynchronous means without a regular time relationship: the execution of one function cannot be predicted from the execution of another, because it occurs without a regular or predictable time relationship to other function executions. In control situations, when data is waiting for an idle element that is being addressed, the controller addresses a location to which control is passed. This lets operations stay in sequence even though they do not coincide in time with any other event.

[0015] ・BOPS/GOPS
BOPS and GOPS are acronyms with the same meaning: billions of operations per second. See GOPS.

[0016] ・Circuit Switched/Store and Forward
These terms refer to two mechanisms for moving data packets through a network of nodes. In store and forward, each intermediate node receives the data packet, stores it in its memory, and then forwards it toward its destination. In circuit switching, an intermediate node is commanded to logically connect its input port to an output port, so that the data packet passes directly through the node toward its destination without entering the intermediate node's memory.
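
The difference between the two mechanisms can be illustrated with a minimal Python sketch. It is not taken from the patent: the `Node` class, the `stores` counter, and both function names are hypothetical, and the model ignores timing, port arbitration, and hardware detail. It only shows that store and forward touches the memory of every intermediate node, while circuit switching bypasses it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical network node with a local memory (illustration only)."""
    name: str
    memory: list = field(default_factory=list)
    stores: int = 0  # counts writes into this node's memory while acting as an intermediate


def store_and_forward(path, packet):
    """Move a packet along `path`, buffering it in every intermediate node."""
    for node in path[1:-1]:
        node.memory.append(packet)   # packet is written into node memory
        node.stores += 1
        packet = node.memory.pop()   # then read back out and forwarded
    path[-1].memory.append(packet)   # delivery at the destination


def circuit_switch(path, packet):
    """Move a packet along `path` with input ports wired to output ports.

    Intermediate nodes only pass the packet through, so their memories
    are never touched.
    """
    path[-1].memory.append(packet)   # delivery at the destination
```

For a four-node path A, B, C, D, the first function records one memory write in each of B and C, while the second records none; in both cases the packet ends up in D's memory.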

[0017] ・Cluster
A cluster is a station (or terminal, or functional unit) consisting of a control unit (the cluster controller) and the hardware attached to it, which may be terminals, functional units, or virtual components. The cluster in this application includes an array of PMEs, also called a node array. A cluster usually has 512 PMEs. The entire PME node array of this application consists of a set of clusters, each supported by a cluster controller (CC).

[0018] ・Cluster Controller
A cluster controller is a device that controls input/output (I/O) operations for one or more devices or functional units connected to it. A cluster controller is usually controlled by a program stored and executed in the unit, as in the IBM 3601 Finance Communication Controller, but it can also be controlled entirely by hardware, as in the IBM 3272 Control Unit.

[0019] ・Cluster Synchronizer
A cluster synchronizer is a functional unit that manages the operations of all or part of a cluster in order to keep its elements operating synchronously, so that the functional units maintain a particular time relationship with the execution of a program.

[0020] ・Controller
A controller is a device that directs the transmission of data and instructions over the links of an interconnection network. Its operation is controlled by a program executed by a processor to which the controller is connected, or by a program executed within the device itself.

[0021] ・CMOS
CMOS is an acronym for complementary metal-oxide semiconductor technology. It is commonly used to manufacture dynamic random access memories (DRAMs). NMOS is another technology used to manufacture DRAMs. CMOS is preferred, but the technology used to manufacture the APAP is not intended to be limited to the range of semiconductor technologies currently in use.

[0022] ・Dotting
Dotting refers to joining three or more leads by physically connecting them together. Most backpanel buses share this connection approach. The term relates to the OR DOT of times past, but here it is used to identify multiple data sources that can be combined onto a bus by a very simple protocol.

[0023] The I/O zipper concept of this invention (described later) can be used to implement the idea that the port into a node may be driven either by the port out of a node or by data coming from the system bus. Conversely, data leaving a node is available both as input to another node and as input to the system bus. Data is not output to the system bus and to another node simultaneously; the two outputs occur in different cycles.

[0024] Dotting is used in the H-DOT discussions, where two-ported PEs, PMEs, or pickets can be used in arrays of various organizations by taking advantage of dotting. Several topologies are discussed, including 2D and 3D meshes, the base 2 N-cube, the sparse base 4 N-cube, and the sparse base 8 N-cube.

[0025] ・DRAM
DRAM is an acronym for dynamic random access memory, the storage commonly used for the main memory of computers. The term DRAM also applies to use as a cache or as a memory other than main memory.

[0026] ・Floating-Point
A floating-point number is expressed in two parts: a fixed-point or fraction part, and an exponent part to some assumed radix or base. The exponent indicates the actual placement of the decimal point. In a typical floating-point representation, the real number 0.0001234 is represented as 0.1234-3, where 0.1234 is the fixed-point part and -3 is the exponent.

[0027] In this example the floating-point radix or base is 10. Here 10 is the implicit, fixed, positive integer base, greater than unity, that is raised to the power explicitly denoted by the exponent (or represented by the characteristic) of the floating-point representation and then multiplied by the fixed-point part to determine the real number represented. Numeric literals, as well as real numbers, can be expressed in floating-point notation.
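
The decomposition described in this entry can be written out as a short Python sketch. The function name `to_fraction_exponent` is hypothetical, and the sketch assumes base 10 and a fraction normalized so that its first digit after the decimal point is non-zero, as in the 0.0001234 example; it relies on Python's standard `decimal` module to avoid binary rounding.

```python
from decimal import Decimal


def to_fraction_exponent(text):
    """Split a decimal literal into (fraction, exponent) with base 10.

    The fraction is normalized so that 0.1 <= |fraction| < 1, which
    matches the representation of 0.0001234 as 0.1234 with exponent -3.
    """
    value = Decimal(text)
    if value == 0:
        return Decimal(0), 0
    # adjusted() is the exponent of the most significant digit,
    # e.g. -4 for 0.0001234 (1.234E-4); shift by one for a 0.xxxx fraction.
    exponent = value.adjusted() + 1
    fraction = value.scaleb(-exponent)
    return fraction, exponent


def from_fraction_exponent(fraction, exponent):
    """Rebuild the real number: fraction times base (10) to the exponent."""
    return fraction * Decimal(10) ** exponent
```

Calling `to_fraction_exponent("0.0001234")` yields the fraction 0.1234 and the exponent -3, and `from_fraction_exponent` multiplies the fraction by 10 raised to that exponent to recover the original value.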

[0028] ・FLOPS
This term denotes floating-point instructions per second. Floating-point operations include addition (ADD), subtraction (SUB), multiplication (MPY), division (DIV), and many others. The floating-point-instructions-per-second figure is often calculated using add or multiply instructions and can generally be assumed to reflect a 50/50 mix. An operation includes generating the exponent and the fraction and performing any required fraction normalization. This embodiment can address 32-bit or 48-bit floating-point formats (longer formats are possible but were not counted in the mix). A floating-point operation implemented with fixed-point instructions requires multiple instructions. Some use a 10-to-1 ratio when estimating performance, while certain studies have shown a ratio of 6.25 to be more appropriate. Different architectures have different ratios.
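
The ratios mentioned in this entry convert a fixed-point instruction rate into an equivalent floating-point rate by simple division. The sketch below illustrates that arithmetic only; the function name and the sample rate of one billion instructions per second are assumptions chosen for illustration, not figures from the patent.

```python
def equivalent_flops(fixed_point_ips, instructions_per_flop):
    """Estimate floating-point operations per second for a machine that
    builds each floating-point operation out of fixed-point instructions.

    fixed_point_ips       -- fixed-point instructions executed per second
    instructions_per_flop -- assumed ratio, e.g. 10 or 6.25
    """
    if instructions_per_flop <= 0:
        raise ValueError("ratio must be positive")
    return fixed_point_ips / instructions_per_flop


# Hypothetical machine executing 1e9 fixed-point instructions per second.
SAMPLE_IPS = 1e9
ESTIMATE_AT_10_TO_1 = equivalent_flops(SAMPLE_IPS, 10)      # 100 MFLOPS
ESTIMATE_AT_6_25_TO_1 = equivalent_flops(SAMPLE_IPS, 6.25)  # 160 MFLOPS
```

The choice of ratio changes the estimate by a factor of 1.6, which is why the entry notes that different architectures have different ratios.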

[0029] ・Functional Unit
A functional unit is an entity of hardware, software, or both that is capable of accomplishing a purpose.

[0030] ・Gbytes
Gbytes refers to a billion (10⁹) bytes. Gbytes/s is a billion bytes per second.

[0031] ・GIGAFLOPS
This term means 10⁹ floating-point instructions per second.

[0032] ・GOPS and PETAOPS
GOPS and BOPS have the same meaning: 10⁹ operations per second. PETAOPS means 10¹² operations per second, a potential of the current machine. For the APAP machine of this embodiment these figures are just about the same as BIPs/GIPs, meaning 10⁹ instructions per second. In some machines one instruction can cause two or more operations (for example, both an add and a multiply), but this invention does not do that. Instead, it may take many instructions to perform one operation. For example, this invention uses multiple instructions to perform 64-bit arithmetic. In counting operations, however, log operations were not counted. GOPS may be the preferred way to describe performance, but the term has not been used consistently. One sees MIPS/MOPS, then BIPS/BOPS, and MegaFLOPS/GigaFLOPS/TeraFLOPS/PetaFLOPS.

[0033] ・ISA
ISA means instruction set architecture.

[0034] ・Link
A link is an element that may be physical or logical. A physical link is the physical connection joining elements or units. In computer programming, a link is an instruction or address that passes control and parameters between separate portions of a program. In multisystems, a link is the connection between two systems, which may be specified by program code that identifies the link by a real or virtual address. In general, then, a link includes the physical medium, any protocol, and the associated devices and programming; it is both logical and physical.

[0035] ・MFLOPS
This term means 10⁶ floating-point instructions per second.

[0036] ・MIMD
MIMD refers to a processor array architecture in which each processor in the array has its own instruction stream. There are thus multiple instruction streams, executing on multiple data streams located one per processing element.

[0037] ・Module
A module is a discrete, identifiable program unit, or a functional unit of hardware designed for use with other components. A collection of PEs contained in a single electronic chip is also called a module.

[0038] ・Node
In general, a node is the junction of links. In a generic array of PEs, one PE can be a node. A node can also contain a collection of PEs called a module. In this embodiment a node is formed of an array of PMEs, and the set of PMEs is referred to as a node. A node preferably consists of 8 PMEs.

[0039] ・Node Array
A collection of modules made up of PMEs is often called a node array, that is, an array of nodes made up of modules. A node array usually contains more than a few PMEs, but the term encompasses any plurality.

[0040] ・PDE
PDE means partial differential equation.

[0041] ・PDE Relaxation Solution Process
The PDE relaxation solution process is a way to solve a PDE (partial differential equation). Solving PDEs consumes most of the supercomputing power in general use, so it is a good example of the relaxation process. There are many ways to solve a PDE, and more than one of the numerical methods includes the relaxation process. For example, when a PDE is solved by the finite element method, relaxation consumes the bulk of the computing time.

[0042] Consider an example from the world of heat transfer. Given hot gas inside a chimney and a cold wind blowing outside, how will the temperature gradient within the chimney bricks develop? By treating the bricks as tiny segments and writing an equation that says how heat flows between segments as a function of their temperature difference, the heat transfer PDE is converted into a finite element problem. If all elements except those on the inside and outside are then taken to be at room temperature, while the boundary segments are at the hot gas and cold wind temperatures, the problem is set up to begin relaxation.

[0043] The computer program then models time by updating the temperature variable of each segment according to the amount of heat that flows into or out of that segment. Processing all the segments of the model takes many cycles before the set of temperature variables across the chimney relaxes to represent the actual temperature distribution that would occur in the physical chimney.

[0044] If the objective were to model gas cooling in the chimney, the elements would have to extend to gas equations, the boundary conditions on the inside would be linked to another finite element model, and the process would continue. Note that the heat flow depends on the temperature difference between a segment and its neighbors. The inter-PE communication paths are therefore used to distribute the temperature variables. It is this near-neighbor communication pattern that makes PDE relaxation so well suited to parallel computing.
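
The chimney example can be sketched as a one-dimensional relaxation in Python. This is an illustrative simplification rather than the patent's method: the wall is reduced to a single row of segments, the function name `relax_wall`, the conduction coefficient `k`, the tolerance, and the sample temperatures are all assumptions. Each update uses only the two neighboring segments, which is the near-neighbor pattern that would map onto inter-PE communication paths.

```python
def relax_wall(hot, cold, room, segments, k=0.25, tol=1e-9, max_cycles=100_000):
    """Relax the temperatures of a row of wall segments.

    The first and last entries are boundary segments held at the hot gas
    and cold wind temperatures; interior segments start at room temperature.
    Returns the relaxed temperatures and the number of cycles used.
    """
    temps = [hot] + [room] * segments + [cold]
    for cycle in range(1, max_cycles + 1):
        new = temps[:]
        for i in range(1, len(temps) - 1):
            # Heat flowing in from each neighbor is proportional to the
            # temperature difference with that neighbor (k <= 0.5 for stability).
            flow = k * (temps[i - 1] - temps[i]) + k * (temps[i + 1] - temps[i])
            new[i] = temps[i] + flow
        change = max(abs(a - b) for a, b in zip(new, temps))
        temps = new
        if change < tol:
            return temps, cycle
    return temps, max_cycles
```

With `relax_wall(400.0, 0.0, 20.0, 9)`, the interior temperatures settle into a straight-line gradient between the two boundary values after many cycles, and the initial room temperature no longer influences the result.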

[0045] ・Picket
A picket is one element of the array of elements that make up an array processor. It consists of the data flow (ALU REGS), memory, control, and the portion of the communication matrix associated with the element. The unit refers to 1/nth of an array processor made up of parallel processor and memory elements together with their control and their portion of the array intercommunication mechanism. A picket is a form of processor memory element, or PME. The PME chip design processor logic of this invention can implement the picket logic of the related applications, and it has the logic for a processor array formed as nodes.

[0046] The term picket is similar to the commonly used array term PE, for processing element. A picket is an element of the processing array, preferably a combined processing element and local memory, that processes bit-parallel bytes of information in one clock cycle. The preferred embodiment consists of a byte-wide data flow processor, 32 Kbytes or more of memory, primitive controls, and ties to communications with other pickets.

[0047] The term picket comes from Tom Sawyer and his white fence, although functionally the analogy of a military picket line also fits quite well.

[0048] ・Picket Chip
A picket chip contains a plurality of pickets on a single silicon chip.

[0049] ・Picket Processor System (or Subsystem)
A picket processor consists of an array of pickets, a communication network, an input/output (I/O) system, and a SIMD controller made up of a microprocessor, a scan routine processor, and a micro-controller that runs the array.

[0050] ・Picket Architecture
The picket architecture is the preferred embodiment of a SIMD architecture, with features that accommodate several different kinds of problems, including:
- set associative processing
- parallel numerically intensive processing
- physical array processing similar to images

[0051] ・Picket Array
A picket array is a collection of pickets arranged in a geometric order, that is, a regular array.

[0052] ・PME or Processor Memory Element
PME stands for processor memory element. The term PME refers to a single processor, memory, and I/O-capable system element or unit that forms one of the parallel array processors of this invention. Processor memory element is a term that encompasses a picket. A processor memory element is 1/nth of a processor array and comprises a processor, its associated memory, a control interface, and a portion of the array communication network mechanism. The element can have the connectivity of a regular array, as in a picket processor, or can be part of a subarray, as in the multi-processor-memory-element node described here.

【００５３】・経路指定経路指定はメッセージを受信先に到達させるまでの物理
的経路の指定である。経路指定はソース又は発信元と受
信先とを有する。これら要素又はアドレスは一時的関係
又は類縁性を有する。メッセージの経路指定は屡指定表
を参照して得られるキーに基づき行われる。ネットワー
クにおいて、受信先はリンクを識別する経路制御アドレ
スに従い送信される情報の受信先としてアドレスされた
如何なる端末、ステーション、又はネットワーク・アド
レス可能な装置でよい。受信先フィールドはメッセージ
・ヘッダに置かれ、その受信先コードによって受信先が
識別される。Routing: Routing is the assignment of the physical path by which a message reaches its destination. A routing has a source (or origin) and a destination. These elements or addresses have a temporary relationship or affinity. Message routing is often based on a key obtained by reference to an assignment table. In a network, a destination may be any terminal, station, or network-addressable device that is addressed as the destination of information transmitted according to a routing address that identifies the link. A destination field is placed in the message header, and the destination is identified by its destination code.

【００５４】・ＳＩＭＤＳＩＭＤはアレイの全プロセッサが単一の命令ストリー
ムから指令されて、処理要素当り１つ配置されている複
数データ・ストリームを遂行するようにしたプロセッサ
・アレイ・アーキテクチャである。SIMD SIMD is a processor array architecture in which all processors in the array are commanded from a single instruction stream to perform multiple data streams, one per processing element.

【００５５】・ＳＩＭＤＭＩＭＤ又はＳＩＭＤ／ＭＩＭＤこの用語はある複雑な命令を処理する期間、ＭＩＭＤか
らＳＩＭＤに切換えることができる二重機能を有する。
すなわち、２モードを有する機械を示す用語である。シ
ンキング・マシーンズ社の接続機型式ＣＭ−２は、ＭＩ
ＭＤ機の前端又は後端として置かれた場合、オペレーシ
ョンが異なる問題部分を遂行するため異なるモードを実
行すること（屡々二重モードという）を可能とする。SIMDMIMD or SIMD/MIMD: This term refers to a machine with a dual function, able to switch from MIMD to SIMD for a period of time in order to handle some complex instruction; that is, a machine with two modes. The Thinking Machines Connection Machine model CM-2, when placed as a front end or back end of a MIMD machine, allows different modes to be run for executing different parts of a problem (often referred to as dual modes).

【００５６】これらの機械はイリアク（Illiac）以来存
在し、マスタＣＰＵと他のプロセッサとを相互接続する
バスを使用した。マスタ制御プロセッサは他のＣＰＵの
処理の割込能力を有する。他のＣＰＵは独立のプログラ
ム・コードを走行することができる。割込み中チェック
ポイント機能（制御されるプロセッサの現行状況の閉鎖
及び保管）に寄与する機能がなければならない。These machines have existed since Illiac and used buses that interconnect the master CPU and other processors. The master control processor has the ability to interrupt the processing of other CPUs. Other CPUs can run independent program code. There must be a function that contributes to the checkpoint function during interruption (closing and saving the current state of the controlled processor).

【００５７】・ＳＩＭＩＭＤこの用語はアレイの全プロセッサが単一の命令ストリー
ムから指令されて、処理要素当り１つ配置されている複
数データ・ストリームを遂行するようにしたプロセッサ
・アレイ・アーキテクチャである。この構造内で、命令
遂行を指定する各ピケット内のデータ従属オペレーショ
ンはＳＩＭＤ命令ストリームによって制御される。SIMIMD This term is a processor array architecture in which all processors in the array are commanded from a single instruction stream to perform multiple data streams, one arranged per processing element. Within this structure, the data dependent operations within each picket that specify instruction performance are controlled by the SIMD instruction stream.

【００５８】これはＳＩＭＤ命令ストリームを使用して
複数データ・ストリーム（ピケット当り１つ）を操作す
る複数命令ストリーム（ピケット当り１つ）の逐次能力
を有する単一命令ストリーム機である。This is a single instruction stream machine with the sequential capability of multiple instruction streams (one per picket) to operate multiple data streams (one per picket) using the SIMD instruction stream.

【００５９】・ＳＩＳＤＳＩＳＤは単一命令単一データの頭字語である。SISD SISD is an acronym for single instruction, single data.

【００６０】・スワッピングスワッピングとは、ある記憶区域のデータ内容を他の記
憶区域のデータ内容と交換することである。Swapping Swapping is the exchange of the data content of one storage area with the data content of another storage area.

【００６１】・同期オペレーションＭＩＭＤ機の同期オペレーションは、各活動が事象（通
常クロック）に関連する動作モードである。それはプロ
グラム・シーケンスで正規に発生する指定事象であるこ
とができる。オペレーションは独立して機能を実行する
よう多数のＰＥにディスパッチされる。制御はオペレー
ションが終了するまでコントローラに戻されない。機能
装置のアレイにオペレーション命令があった場合、その
要求は、制御がコントローラに戻されるまで、それらオ
ペレーションを終了しなければならない各アレイの要素
に対しコントローラから発生する。Synchronous operation: Synchronous operation of a MIMD machine is a mode of operation in which each action is related to an event (usually a clock). The event can be a specified one that occurs regularly in the program sequence. An operation is dispatched to a number of PEs, each of which performs its function independently. Control is not returned to the controller until the operation is complete. If the request is to an array of functional units, the controller issues it to each element of the array, and every element must complete its operation before control is returned to the controller.

【００６２】・ＴＥＲＡＦＬＯＰＳこの用語はＴＥＲＡ（テラ）とFLOPS （前述）との結合
語であり、秒当り（１０）^**１２浮動小数点メモリーを
意味する。TERAFLOPS: This term combines TERA with FLOPS (defined above) and means 10^12 floating-point operations per second.

【００６３】・ＶＬＳＩＶＬＳＩは集積回路に対して使用された場合における超
大規模集積の頭字語である。VLSI VLSI is an acronym for Very Large Scale Integration when used for integrated circuits.

【００６４】・ジッパ（Zipper）ジッパは新たに与えられた機能である。それはアレイ構
造の正規な相互接続の外部にある装置から接続されるべ
きリンクを考慮するものである。使用用語集（追加）回路切替：アレイのＰＭＥ間のデータ転送方法。
そこで、中間ＰＭＥは中間ＰＭＥにおいて処理を加える
ことなく、中間ＰＭＥを通してメッセージを最終受信先
の方へ通過させるよう、入力ポートを出力ポートに論理
的に接続する。Zipper: A zipper is a newly provided function. It allows links to be made from devices that are external to the regular interconnection of the array structure.
Glossary of terms (additional)
Circuit switching: A method of transferring data between PMEs in the array, in which an intermediate PME logically connects an input port to an output port so that a message passes through the intermediate PME toward its final destination without any additional handling by that PME.

【００６５】記憶及び転送：アレイのＰＭＥ間のデー
タ転送方法。そこで、中間ＰＭＥは各メッセージを受信
して、そのメッセージを最終受信先の方へ再送信しなけ
ればならない。Store and transfer (store and forward): A method of transferring data between PMEs in the array, in which each intermediate PME must receive each message and then retransmit it toward its final destination.

【００６６】入力転送終了割込：Ｉ／Ｏメッセージ・
ワードを転送終了タグと共に受信したときに発生するＰ
ＭＥプログラム・コンテキスト切替要求。Input transfer complete interrupt: A request for a PME program context switch that occurs when an I/O message word is received with a transfer complete tag.

【００６７】直接メモリー・アクセス（ＤＭＡ）：Ｉ
／Ｏポートをコンピュータ・メモリー・システム及び自
己管理データ転送に直接接続するようにした機構。Direct Memory Access (DMA): A mechanism that allows an I/O port to interface directly to the computer memory system and to self-manage data transfers.

【００６８】ブレーク・イン：プロセッサを透過コン
テキスト切替にし、プロセッサを自己管理データ転送機
能に対するデータ・フロー及び制御経路指定に使用する
ようにした機構。Break-in: A mechanism that causes a context switch transparent to the processor and uses the processor's data flow and control paths for self-managed data transfer.

【００６９】実行時間ソフトウェア：処理要素で遂行
するソフトウェア。それは、オペレーティング・システ
ム、遂行プログラム、アプリケーション・プログラム、
サービス・プログラム等を含む。Run-time software: Software that executes on a processing element. It includes operating systems, executive programs, application programs, service programs, and so on.

【００７０】メモリー・リフレッショ：現行情報が再
書込みされている間、メモリーの使用を中断するように
した動的ＲＡＭ（ＤＲＡＭ）技術の必須機能。Memory refresh: A feature required by dynamic RAM (DRAM) technology, in which use of the memory is suspended while the current information is rewritten.

【００７１】ウエイポイントＰＭＥ：メッセージの発
信元でも受信先でもないＰＭＥのネットワークを通るメ
ッセージ経路におけるＰＭＥ。Waypoint PME: A PME in a message path through a network of PMEs that is neither the originator nor recipient of the message.

【００７２】背景技術より高速なコンピュータに対する終りなき探求におい
て、今日の機械を困惑させる複雑な問題を克服するた
め、分割して数百及び数千のロー・コスト・マイクロプ
ロセッサを並列に連結することによりスーパ・スーパコ
ンピュータを作成するようにしてきた。かかる機械は大
規模並列と呼ばれる。大規模並列システムを作成するた
め、本発明者は新たな方法を作成した。本発明者が行っ
た多くの改良に対しては他の多くの業績の背景が考慮さ
れなければならない。 Background Art: In the never-ending quest for faster computers, hundreds and even thousands of low-cost microprocessors are being linked in parallel to create super-supercomputers that divide up and conquer the complex problems that stump today's machines. Such machines are called massively parallel. The present inventors have created a new way to build massively parallel systems. The many improvements they have made should be considered against the background of much other work.

【００７３】技術分野の要約において他の出願に対し参
照が行われた。その点については、本発明者による並列
連想プロセッサ・システム（米国特許出願第６０１，５
９４号）と、拡張並列アレイ・プロセッサ（ＡＰＡＰ）
に対する関連出願を参照するとよい。特定のアプリケー
ションに最もよく適合するアーキテクチャを選出するた
めにシステム交換が要求されるが、一つの解決法も満足
するものはなかった。そして、本発明者の思想は解決を
与えることを容易にした。The summary of the technical field referred to other applications. In that regard, see the present inventors' Parallel Associative Processor System (U.S. Patent Application No. 601,594) and the related applications for the Advanced Parallel Array Processor (APAP). System trade-offs are required to select the architecture that best suits a particular application, but no single solution has been satisfactory. The inventors' ideas make it easier to provide a solution.

【００７４】[0074]

【発明が解決しようとする課題】以上記述した従来技術
においては、データ経路の中間ノードを透過する使用は
提案されておらず、又その接続構造は非常に複雑である
ばかりでなく、各ノードにおいて処理と通信とを分離し
て行う必要があった。The prior art described above does not propose transparent use of the intermediate nodes in a data path. Moreover, its connection structures are very complex, and processing and communication had to be handled separately at each node.

【００７５】従って、本発明の目的は、大規模並列アプ
リケーションに使用することができるコンピュータ・シ
ステムの相互接続ノード・ネットワークを通してメッセ
ージを通過させる能力を有する大規模並列コンピュータ
・システムを提供することである。Accordingly, it is an object of the present invention to provide a massively parallel computer system having the ability to pass messages through the interconnected node network of a computer system that can be used for massively parallel applications.

【００７６】[0076]

【課題を解決するための手段】本発明者は新規な概念に
より設計されたシステム及び新たな“チップ”を作成す
ることにより大規模並列プロセッサ及び他のコンピュー
タ・システムを作成する新たな方法を創作した。本出願
はかかるシステムに向けられる。本願及び関連出願にお
いて、開示されるべき各種概念はそれら出願において見
ることができる。各出願に記述されている構成要素はこ
のシステムに組合わされて新たなシステムとすることが
できる。それらは現行技術と組合わせることもできる。The present inventors have devised a new way to build massively parallel processors and other computer systems by creating a new "chip" and systems designed around a novel concept. This application is directed to such systems. The various concepts to be disclosed can be found in this application and the related applications. The components described in each application can be combined in this system to form a new system. They can also be combined with existing technology.

【００７７】本願及び関連出願において、拡張並列アレ
イ・プロセッサ（ＡＰＡＰ）と称するピケット・プロセ
ッサを考案した。ピケット・プロセッサはＰＭＥを使用
することができるということに注意するべきである。ピ
ケット・プロセッサは、非常にコンパクトなアレイ・プ
ロセッサを希望するような軍の適用に特に有益であるか
もしれない。その点に関し、このピケット・プロセッサ
は、幾分、本願の拡張並列アレイ・プロセッサ（ＡＰＡ
Ｐ）に対する好ましい本実施例とは異なるかもしれな
い。しかし、共通性は在り、本実施例のある面及び機能
は、異なる機械に適用することができる。In this application and the related applications, a picket processor referred to as the Advanced Parallel Array Processor (APAP) has been devised. Note that the picket processor can use a PME. The picket processor may be especially useful in military applications where a very compact array processor is desired. In that regard, the picket processor may differ somewhat from the preferred embodiment of the Advanced Parallel Array Processor (APAP) of this application. However, there is commonality, and certain aspects and features of this embodiment can be applied to different machines.

【００７８】用語ピケットは、プロセッサ及びメモリー
から成るアレイ・プロセッサの第１／第ｎの要素と、ア
レイ相互間通信に適用可能な、そこに含まれている通信
要素とを含む。ピケットの概念は、又、第１／第ｎのＡ
ＰＡＰ処理アレイにも適用可能である。The term picket refers to 1/nth of an array processor, made up of a processor and memory together with the included communication elements applicable to intercommunication within the array. The picket concept is also applicable to 1/nth of an APAP processing array.

【００７９】ピケットの概念は、データ幅、メモリー・
サイズ、及びレジスタの数においてＡＰＡＰとは異なり
うるが、ピケットはＡＰＡＰの代替である大規模並列実
施例においては、第１／第ｎの正規のアレイに対し接続
可能に構成されるのに対し、ＡＰＡＰのＰＭＥはサブア
レイの一部であるという点において異なる。両システム
共、ＳＩＭＩＭＤを遂行することはできるが、ピケット
・プロセッサは、ＰＥのＭＩＭＤを有するＳＩＭＤ機と
して構成されるので、ＳＩＭＩＭＤを直接遂行すること
ができるのに対し、MIMD APAP 構造は、ＳＩＭＤをシミ
ュレートするよう制御されたMIMD PE を使用することに
よってＳＩＭＩＭＤを遂行する。又、両機械共ＰＭＥを
使用する。The picket concept can differ from the APAP in data width, memory size, and number of registers. It also differs in that, in the massively parallel embodiment that is an alternative to the APAP, a picket is configured to be connectable as 1/nth of a regular array, whereas an APAP PME is part of a subarray. Both systems can execute SIMIMD. However, because the picket processor is configured as a SIMD machine with MIMD in the PE, it can execute SIMIMD directly, whereas the MIMD APAP structure executes SIMIMD by using MIMD PEs that are controlled to simulate SIMD. Both machines use a PME.

【００８０】両システム共並列アレイ・プロセッサとし
て構成することができ、アレイ通信ネットワークと相互
接続された“Ｎ”要素を有するアレイに対するアレイ処
理装置から成る。その第１／第Ｎのプロセッサ・アレイ
は処理要素と、その関連メモリーと、制御バス・インタ
ーフェースと、アレイ通信ネットワークの一部とから成
る。Both systems can be configured as a parallel array processor, comprising an array processing unit for an array having "N" elements interconnected by an array communication network. Each 1/Nth of the processor array comprises a processing element, its associated memory, a control bus interface, and a portion of the array communication network.

【００８１】並列アレイ・プロセッサは２重オペレーシ
ョン・モード機能を持ち、そこで、処理装置はどちらか
のモード又は２つのモードで指令され、ＳＩＭＤオペレ
ーション及びＭＩＭＤオペレーションに対する、これら
２つのモード間を自由に移動することができる。ＳＩＭ
Ｄがその組織のモードである場合には、処理装置はＳＩ
ＭＩＭＤモードで自己の命令を遂行するよう各要素を指
令するべき能力を持ち、ＭＩＭＤが処理装置の組織に対
する実施モードである場合は、処理装置はＭＩＭＤの遂
行をシミュレートするようアレイの選ばれた要素を同期
化するべき能力を持つ（これをMIMD-SIMD と称する）。The parallel array processor has a dual operation mode capability: the processing unit can be commanded to operate in either of two modes, and can move freely between these two modes for SIMD and MIMD operation. When SIMD is the mode of its organization, the processing unit has the ability to command each element to execute its own instructions in SIMIMD mode. When MIMD is the implementation mode for the organization of the processing unit, it has the ability to synchronize selected elements of the array to simulate MIMD execution (this is called MIMD-SIMD).

【００８２】両アレイの並列アレイ・プロセッサはアレ
イの要素間で情報を通過させ、通り抜けさせる経路を持
つアレイ通信ネットワークを提供する。情報の移動は２
つの方法のいずれかによって制御することができる。第
１の方法としては、移動データが受信先を規定せず、全
メッセージを同時且つ同一方向に移動するようアレイ・
コントローラが指示する。第２の方法としては、各メッ
セージがその開始位置に受信先を規定したヘッダを持
ち、自己経路指定するものである。The parallel array processor in both systems provides an array communication network with paths for passing information between elements of the array. The movement of information can be controlled in either of two ways. In the first, the array controller directs that all messages move at the same time and in the same direction, so the data being moved does not define its destination. In the second, each message is self-routed, with a header at its beginning that defines its destination.

【００８３】複数のアレイ・プロセッサのアレイのセグ
メントは単一半導体チップ上に設けられている処理装置
の複数のコピーを持ち、そのアレイ・セグメントの各コ
ピーは、アレイ通信ネットワークを拡張するため、その
セグメント及びバッファと、ドライバと、マルチプレク
サと、そのアレイ・セグメントがアレイの他のセグメン
トと一体的に接続可能にする制御とに接続されたアレイ
通信ネットワークの一部を含む。コントローラからの制
御バス又は経路は、アレイの各要素及びそれらの活動の
制御機能まで延長するよう各処理装置に配設される。A segment of the array of a multi-processor array has multiple copies of the processing unit on a single semiconductor chip. Each copy of the array segment includes a portion of the array communication network associated with that segment, together with the buffers, drivers, multiplexers, and control that allow the array segment to be connected seamlessly with other segments of the array in order to extend the array communication network. A control bus or path from the controller is provided to each processing unit so that control extends to each element of the array and its activities.

【００８４】並列アレイの各処理要素セグメントはプロ
セッサ・メモリー要素の複数のコピーを含み、それは単
一半導体チップの限界内に含まれ、アレイ・セグメント
は、チップに含まれているアレイ・セグメントに対する
制御機能の通信を支援するため、アレイ制御バス及びレ
ジスタ・バッファの一部を含む。Each processing element segment of the parallel array contains multiple copies of the processor memory element, which are contained within the limits of a single semiconductor chip, and the array segment controls the array segments contained on the chip. It includes a portion of the array control bus and register buffers to support communication of functions.

【００８５】両方共メッシュ移動又は経路指定移動を実
現することができる。通常、ＡＰＡＰは、チップ上に一
方法で相互関係する８要素を持ち、チップは他の方法で
相互関係するようにした２重相互接続構造を実現する。
チップのプログラマブル経路指定は、一般に、上記のＰ
ＭＥ間にリンクを設定して行われるが、ノードは他の方
法で接続してもよく、通常他の方法で接続される。チッ
プ上で、正規のＡＰＡＰ構造は本質的に、２×２メッシ
ュであり、そのノード相互接続は経路指定された疎８進
Ｎ−立方（３次元）であることができる。両システム
共、マトリックスを点対点経路で構成可能にするＰＥ
（ＰＭＥ）の間にＰＥ間相互接続経路を有する。Both can implement mesh moves or routed moves. Normally, the APAP implements a dual interconnect structure in which the eight elements on a chip are interrelated in one way, while the chips are interrelated in another way. Programmable routing on the chip generally establishes the links between the PMEs as described above, but the nodes can be, and normally are, connected in another way. On the chip, the normal APAP structure is essentially a 2×2 mesh, and the node interconnection can be a routed sparse octal N-cube (three-dimensional). Both systems have inter-PE interconnection paths between the PEs (PMEs) that allow a matrix to be made up of point-to-point paths.

【００８６】更に、本発明においては、上記の課題を解
決するため、次に述べるように構成した。本発明による
大規模並列コンピュータ・システムは、大規模並列オペ
レーションにおいて使用することができるコンピュータ
・システムの相互接続ネットワークを通してメッセージ
を通過（又は透過）送信させる能力を有する。Further, to solve the above problems, the present invention is configured as follows. A massively parallel computer system according to the present invention has the ability to pass messages through (transparently across) the interconnection network of a computer system that can be used in massively parallel operations.

【００８７】それは、Ｉ／Ｏ構造のセットアップ処理及
び主メモリー帯域幅の使用を除き、ノードの資産処理を
使用せずに、各ノードの主メモリーから出力メッセージ
を送信する機構、及びメモリーに対する機構を持つ。そ
して、データ経路は通信経路に設けられ、ノードの入力
データ経路の１つをその出力経路の１つに動的且つ論理
的に接続するよう制御する。システムはノードの記憶資
産、メモリー帯域幅、又は処理を使用せずに、動的且つ
論理的に接続された入力経路及び出力経路を使用して、
ノードを通しメッセージを通過又は透過する。The system has a mechanism for sending output messages from the main memory of each node, and a mechanism for transfers into that memory, without using the node's processing resources other than for setting up the I/O configuration and using main memory bandwidth. Control and data paths are provided in the communication path to dynamically and logically connect one of a node's input data paths to one of its output paths. The system passes messages through a node transparently, using the dynamically and logically connected input and output paths, without using the node's storage resources, memory bandwidth, or processing.

【００８８】ノードはノードのメモリー帯域幅を全部使
用して、メモリーと入力データとを同時に処理する能力
を有するが、メッセージがメモリー又は入力データの処
理が論理的に接続されている経路の使用を要求しない限
り、動的に接続されている入力及び出力経路を通してそ
のメッセージを通過させる。そして本発明は論理的に接
続された入力及び出力経路にあるメッセージの終端を認
識する。経路に沿った将来の入力メッセージはノードの
処理資産及び記憶装置を使用しうるように、接続されて
いる入力経路を出力経路から動的且つ論理的に遮断す
る。A node can use its full memory bandwidth to process memory and input data simultaneously, yet it passes a message through the dynamically connected input and output paths as long as no processing of memory or input data requires use of the logically connected path. The invention recognizes the end of a message on the logically connected input and output paths. It then dynamically and logically disconnects the connected input path from the output path, so that future input messages along the path can use the node's processing resources and storage.
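
The pass-through and disconnect behavior described above can be sketched as follows. This is a minimal illustration, not the patented hardware: the class name `WaypointNode`, its fields, and the `end_of_message` flag are hypothetical names standing in for the logical input-to-output connection and the end-of-message tag.

```python
class WaypointNode:
    """Toy model of a node with one input path and one output path."""

    def __init__(self):
        self.connected = False   # is the input logically connected to the output?
        self.forwarded = []      # words passed through without touching node memory
        self.memory = []         # words that used the node's own storage

    def connect(self):
        """Logically connect the input path to the output path."""
        self.connected = True

    def on_input(self, word, end_of_message=False):
        if self.connected:
            # Transparent pass-through: no node memory or processing is used.
            self.forwarded.append(word)
            if end_of_message:
                # End of message recognized: break the logical connection.
                self.connected = False
        else:
            # With no connection, the input uses the node's storage.
            self.memory.append(word)


node = WaypointNode()
node.connect()
node.on_input(0x0001)
node.on_input(0x0002, end_of_message=True)
node.on_input(0x0003)   # arrives after the connection was torn down
```

After this sequence the first two words have been forwarded and the third has landed in the node's own memory, which mirrors the rule that later messages on the same path may use the node's resources.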

【００８９】[0089]

【実施例】以下、添付図面に基づき本発明の好ましい実
施例を詳細に説明する。最初、本願に関連する“並列連
想プロセッサ・システム”（米国特許出願第６１１，５
９４号）について説明する。それはコンピュータ・メモ
リーの集積及び単一チップ内の制御ロジックと、チップ
内の組合せの複製と、単一チップの複製から成るプロセ
ッサ・システムの構築との考え方を記述する。この方法
はチップの境界超過及び配線長を減少することによって
実行性能を拡張又は強化するものではあるが、単一チッ
プ・タイプのみの開発及び製造の犠牲により大規模並列
処理能力を提供するシステムに導入される。DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT: A preferred embodiment of the present invention will now be described in detail with reference to the accompanying drawings. First, the "Parallel Associative Processor System" related to this application (U.S. Patent Application No. 611,594) is described. It sets out the idea of integrating computer memory and control logic within a single chip, replicating that combination within the chip, and building a processor system out of replications of the single chip. This approach extends or enhances performance by reducing chip crossings and wire lengths, and it leads to a system that provides massively parallel processing capability at the cost of developing and manufacturing only a single chip type.

【００９０】上記米国特許出願第６１１，５９４号はチ
ップ内のその構造に対して接続された複数のＳＩＭＤ処
理メモリー要素（ＰＭＥ）を持つ１次元Ｉ／Ｏ構造の使
用方法を記述している。本発明は１より大きい次元にそ
の概念を拡張し、データ転送及びプログラム割込みの両
方を持つ完全Ｉ／Ｏシステムを含むよう入念に完成した
ことを特徴とする。The above-referenced US patent application Ser. No. 611,594 describes the use of a one-dimensional I / O structure with multiple SIMD processing memory elements (PMEs) connected to that structure in a chip. The invention is characterized by extending the concept to a dimension greater than one and by elaborating it to include a complete I / O system with both data transfer and program interrupts.

【００９１】関連する米国出願は、異なる他の面におい
て、先の発明を拡張し、入念に考案した。以下に記述す
る本発明の実施例はチップ当り８つのＳＩＭＤ／ＭＩＭ
ＤＰＭＥを持つ４次元Ｉ／Ｏ構造によって説明するが、
その説明より大きい次元及び多いＰＭＥに拡張すること
もできる。The related U.S. applications extend and elaborate on the earlier invention in other respects. The embodiment of the invention described below uses a four-dimensional I/O structure with eight SIMD/MIMD PMEs per chip, but it can be extended to more dimensions and more PMEs than described.

【００９２】本発明はプロセッサ間通信から外部入力／
出力機能まで本発明の概念を拡張する上、更に処理アレ
イの制御に必要なインターフェース及び要素を提供す
る。要約すると、それら３つのタイプのＩ／Ｏは、
（イ）プロセッサ間、（ロ）外部に対する（又は、から
の）プロセッサ、及び（ハ）同報通信／制御等である。
大規模並列処理システムはこれらすべてのタイプのＩ／
Ｏ帯域幅を要求し、プロセッサ計算能力と平衡すること
を要求する。The present invention extends the concept from inter-processor communication to external input/output functions, and further provides the interfaces and elements needed to control the processing array. In summary, the three types of I/O are (a) inter-processor, (b) processor to or from the outside, and (c) broadcast/control. A massively parallel processing system requires I/O bandwidth of all these types, balanced against processor computing power.

【００９３】アレイ内におけるこれらの要求は、ＰＭＥ
と称し、非常に高速な割込み状態スワッピング能力によ
り増補された１６ビット命令セット・アーキテクチャ・
コンピュータの複製によって充足させることができる。
ＰＭＥの特性は他の大規模並列処理機の処理要素には存
在しない。本発明による機械は処理、経路指定、記憶、
及びＩ／Ｏオペレーションを完全に分散させることがで
きる。この特性は他の設計のいずれにも存在しない。These requirements within the array can be satisfied by replicating a 16-bit instruction set architecture computer, called a PME, that is augmented with a very fast interrupt state-swapping capability. The characteristics of the PME are not found in the processing elements of other massively parallel machines. A machine according to the invention can completely distribute processing, routing, storage, and I/O operations. This characteristic is not found in any other design.

【００９４】次に、図１に基づき、詳細に説明する。図
１は、特に“拡張並列アレイ・プロセッサ”（ＡＰＡ
Ｐ）（上記参照の米国特許出願の１つ）の主な要素及び
ホスト・プロセッサに対するＡＰＡＰインターフェース
を示すＡＰＡＰのブロック図である。図１において、１
はこのＡＰＡＰが付属するホスト・プロセッサであり、
このホスト・プロセッサのプログラム遂行によりデータ
及びコマンドが発行される。３はこれらデータ及びコマ
ンドを受信して解釈するアレイ・ディレクタのアプリケ
ーション・プログラム・インターフェース（ＡＰＩ）で
ある。A detailed description follows with reference to FIG. 1. FIG. 1 is a block diagram of the Advanced Parallel Array Processor (APAP), the subject of one of the U.S. patent applications referenced above, showing its main elements and the APAP interface to a host processor. In FIG. 1, reference numeral 1 denotes the host processor to which the APAP is attached; data and commands are issued by programs executing on this host processor. Reference numeral 3 denotes the application program interface (API) of the array director, which receives and interprets these data and commands.

【００９５】ＡＰＩはクラスタ・シンクロナイザ４及び
クラスタ・コントローラ５を経由してデータ及びコマン
ドをクラスタ６まで通過させる。クラスタはＡＰＡＰの
メモリー及び並列処理を提供する。クラスタ・シンクロ
ナイザ及びクラスタ・コントローラから供給される機能
はデータ及びコマンドを適切なクラスタに経路指定し、
クラスタ間に平衡なロードを与えることである。アレイ
・ディレクタに対するより詳細な説明は、“拡張並列プ
ロセッサ・アレイ・ディレクタ”と称する米国特許出願
記述されている。The API passes data and commands to the clusters 6 through the cluster synchronizer 4 and the cluster controllers 5. The clusters provide the memory and parallel processing of the APAP. The functions provided by the cluster synchronizer and cluster controllers are to route data and commands to the proper cluster and to balance the load among the clusters. A more detailed description of the array director is given in the U.S. patent application entitled "Advanced Parallel Processor Array Director."

【００９６】クラスタは変更ハイパーキューブとして相
互接続された多数のＰＭＥから成る。ハイパーキューブ
における各セルは、その隣りのどのセルのアドレスもど
の単一ビット位置においても異なるというようにアドレ
スすることができる。リング内のどのセルもそれらのア
ドレスが±１異なる２つの隣り合うセルとしてアドレス
することができる。ＡＰＡＰに対して使用される変更ハ
イパーキューブはリング外にハイパーキューブを構築す
ることによってこれらアプローチを組合わせる。リング
の交点はノードと定義される。A cluster consists of a number of PMEs interconnected as a modified hypercube. In a hypercube, each cell can address as a neighbor any cell whose address differs from its own in any single bit position. In a ring, any cell can address as its neighbors the two cells whose addresses differ from its own by ±1. The modified hypercube used for the APAP combines these approaches by building hypercubes out of rings. The intersection of rings is defined as a node.
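
The two addressing rules, and their combination, can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, a node is represented as a tuple of per-dimension ring positions such as (w, x, y, z), and rings are assumed to wrap around at their ends.

```python
def hypercube_neighbors(addr: int, bits: int) -> list[int]:
    """Cells whose address differs from addr in exactly one bit position."""
    return [addr ^ (1 << b) for b in range(bits)]


def ring_neighbors(addr: int, size: int) -> tuple[int, int]:
    """The two cells whose addresses differ by -1 and +1 (wrapping assumed)."""
    return ((addr - 1) % size, (addr + 1) % size)


def modified_hypercube_neighbors(node: tuple, ring_size: int = 8) -> list[tuple]:
    """Neighbors of a node when every dimension is a ring of ring_size nodes.

    Each dimension contributes the two ring neighbors (-1 and +1), so a
    four-dimensional node has eight neighboring nodes.
    """
    neighbors = []
    for dim in range(len(node)):
        for step in (-1, 1):
            coord = list(node)
            coord[dim] = (coord[dim] + step) % ring_size
            neighbors.append(tuple(coord))
    return neighbors


cube = hypercube_neighbors(0b000, 3)
ring = ring_neighbors(0, 8)
mod = modified_hypercube_neighbors((0, 0, 0, 0))
```

Under these assumptions a four-dimensional node has two neighbors per dimension, eight in all, which matches the eight external ports (+W, -W, +X, -X, +Y, -Y, +Z, -Z) named later in the description.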

【００９７】次に、図２に基づき本発明の好ましい実施
例について詳細に説明する。図２に例示した好ましい実
施例のノードは、“ＳＩＭＤ／ＭＩＭＤ処理メモリー要
素”と称する米国特許出願においては、２ｎのＰＭＥ２
０と同報通信及び制御インターフェース（ＢＣＩ）部２
１とから成る。ＰＭＥは２×ｎアレイとしてノード内で
構成される。アレイを特徴付けるアレイのリング又は次
元数は“ｎ”で表わされ、チップのＩ／Ｏポートの数、
従って物理的なチップ・パッケージによって制限される
が、本実施例ではｎ＝４である。チップ技術の改良によ
り、“ｎ”が増加した場合チップ内のアレイの次元を高
くすることができる。Next, a preferred embodiment of the present invention is described in detail with reference to FIG. 2. The node of the preferred embodiment illustrated in FIG. 2, which is described in the U.S. patent application entitled "SIMD/MIMD Processing Memory Element," consists of 2n PMEs 20 and a broadcast and control interface (BCI) section 21. The PMEs are organized within the node as a 2×n array. Here "n" denotes the number of rings, or dimensions, that characterize the array. It is limited by the number of I/O ports on the chip, and hence by the physical chip package; in this embodiment n = 4. As chip technology improves and "n" increases, the dimensionality of the array within the chip can be raised.

【００９８】図６及び図７はＰＭＥからアレイに対する
発展を示す。８つのＰＭＥがノード１５１を形成するよ
う相互接続される。８個のノード・グループはＸ次元リ
ング（１６ＰＭＥ）で相互接続され、オーバーラップす
る８個のノード・グループはＹ次元リング１５２で相互
接続される。これはノードの８×８アレイ（５１２ＰＭ
Ｅ）を持つ単一２次元クラスタを形成する。FIGS. 6 and 7 show the build-up from the PME to the array. Eight PMEs are interconnected to form a node 151. A group of eight nodes is interconnected in an X-dimension ring (16 PMEs), and an overlapping group of eight nodes is interconnected in a Y-dimension ring 152. This forms a single two-dimensional cluster with an 8×8 array of nodes (512 PMEs).

【００９９】クラスタは４次元アレイ要素１５３を形成
するよう８×８アレイまで組合わされる。アレイ要素を
横切る各８個のノード・グループはＷ次元及びＺ次元の
両方で相互接続される。４次元全べての単一ノードに対
する相互接続経路は１５４で示す。アレイは規則的な又
は直角のいずれかである必要はないということに注意す
るべきである。特定のアプリケーション又は構成は全べ
て又はどの次元においてもノードの数を再定義すること
ができる。Clusters are combined in up to an 8×8 array to form a four-dimensional array element 153. Each group of eight nodes across the array element is interconnected in both the W and Z dimensions. The interconnection paths for a single node in all four dimensions are shown at 154. Note that the array need not be either regular or orthogonal. A particular application or configuration can redefine the number of nodes in any or all dimensions.
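
The sizes quoted above can be checked with a short calculation. The figures for a node, a ring, and a cluster come from the text; the total for a full array is derived here on the assumption of a complete 8×8 array of clusters and is not stated in the source. The constant names are illustrative.

```python
PMES_PER_NODE = 8            # eight PMEs are interconnected to form a node
NODES_PER_RING = 8           # eight nodes per ring in each dimension
PMES_PER_NODE_PER_RING = 2   # a +/- pair of PMEs serves each dimension

pmes_per_ring = NODES_PER_RING * PMES_PER_NODE_PER_RING   # 16 PMEs on an X ring
nodes_per_cluster = NODES_PER_RING ** 2                   # 8 x 8 nodes
pmes_per_cluster = nodes_per_cluster * PMES_PER_NODE      # 512 PMEs per cluster

# Assumption: a fully populated 8 x 8 array of clusters.
clusters_per_array = NODES_PER_RING ** 2
pmes_per_array = clusters_per_array * pmes_per_cluster

print(pmes_per_ring, pmes_per_cluster, pmes_per_array)
```

Because the text allows the number of nodes in any dimension to be redefined, the last figure is an upper bound for this particular 8×8×8×8 layout rather than a fixed property of the design.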

【０１００】各ＰＭＥはノードの１リング２６のみに存
在する（図２）。それらリングはＷ，Ｘ，Ｙ，及びＺで
示す。チップ内のＰＭＥ２０は１つがノードのリングに
沿い、外部で時計方向にデータを移動するように対にさ
れる（すなわち、＋Ｗ，−Ｗ）。その他のものはノード
のリング２３，２６に沿い、外部で反時計方向にデータ
を移動して、各ノードの外部ポートにＰＭＥを接続す
る。各リングの２つのＰＭＥはそれら外部のＩ／Ｏポー
ト（＋Ｗ，−Ｗ，＋Ｘ，−Ｘ，＋Ｙ，−Ｙ，＋Ｚ，−
Ｚ）によって指定される。ノード内には、又、４＋ｎの
ＰＭＥと４−ｎのＰＭＥとを相互に接続する２つのリン
グ２２がある。Each PME exists in only one ring 26 of nodes (FIG. 2). The rings are denoted W, X, Y, and Z. The PMEs 20 within a chip are paired (for example, +W and -W): one PME of a pair moves data externally clockwise along a ring of nodes, and the other moves data externally counterclockwise along the ring of nodes 23, 26, so that a PME is dedicated to each external port of the node. The two PMEs in each ring are designated by their external I/O ports (+W, -W, +X, -X, +Y, -Y, +Z, -Z). Within the node there are also two rings 22 that interconnect the four +n PMEs and the four -n PMEs.

【０１０１】これら内部リングは外部リング間でメッセ
ージを移動する経路を提供する。ＡＰＡＰは４次元直交
アレイ１５１〜１５４と考えられるので、内部リングは
全次元のアレイを通してメッセージを移動することがで
きる。これは、どのＰＭＥでも、ノードそれ自体のリン
グのＰＭＥかそのノード内の隣接ＰＭＥをアドレスする
ことによりメッセージを目的の方へステップすることが
できる。These inner rings provide a path for moving messages between outer rings. The APAP is considered a four-dimensional orthogonal array 151-154 so that the inner ring can move messages through the array in all dimensions. This allows any PME to step the message towards the destination by addressing the PME of the node's own ring or a neighboring PME within that node.

【０１０２】図８は回路切替経路を示す転送インターフ
ェースのブロック図である。図８において、各ＰＭＥは
４入力ポート及び４出力ポートを有する（左８５，９
２、右８６，９５、垂直９３，９４、及び外部８０，８
１）。３つの入力ポートと３つの出力ポートはチップの
他のＰＭＥに対し全二重点対点接続を与える。第４のポ
ートはオフチップＰＭＥに対し全二重点対点接続を与え
る。ピン及び電力がこの好ましい実施例の物理的パッケ
ージに制限があるため、実際のＩ／Ｏインターフェース
は、図９に例示するように、ＰＭＥ間データ・ワード９
６，１００の４ニブルの多重化に使用される４ビット幅
経路９７，９８，９９である。FIG. 8 is a block diagram of the transfer interface showing the circuit-switched paths. In FIG. 8, each PME has four input ports and four output ports (left 85, 92; right 86, 95; vertical 93, 94; and external 80, 81). Three of the input ports and three of the output ports provide full-duplex point-to-point connections to other PMEs on the chip. The fourth port provides a full-duplex point-to-point connection to an off-chip PME. Because pins and power are limited in the physical package of this preferred embodiment, the actual I/O interfaces are 4-bit-wide paths 97, 98, 99 that are used to multiplex the four nibbles of the inter-PME data word 96, 100, as illustrated in FIG. 9.
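
Multiplexing a data word over a 4-bit path amounts to splitting it into four nibbles on the sending side and reassembling them on the receiving side. The sketch below assumes a 16-bit word (four 4-bit nibbles) and sends the most significant nibble first; the nibble order and the function names are assumptions, since the text does not specify them.

```python
def word_to_nibbles(word: int) -> list[int]:
    """Split a 16-bit word into four 4-bit nibbles, most significant first."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("word must fit in 16 bits")
    return [(word >> shift) & 0xF for shift in (12, 8, 4, 0)]


def nibbles_to_word(nibbles: list[int]) -> int:
    """Reassemble four 4-bit nibbles (most significant first) into a word."""
    if len(nibbles) != 4:
        raise ValueError("expected exactly four nibbles")
    word = 0
    for nibble in nibbles:
        if not 0 <= nibble <= 0xF:
            raise ValueError("nibble must fit in 4 bits")
        word = (word << 4) | nibble
    return word


sent = word_to_nibbles(0xBEEF)      # four transfers over the 4-bit path
received = nibbles_to_word(sent)    # word rebuilt at the receiving PME
```

The trade-off is pin count against transfer time: each word takes four transfers over the narrow path instead of one over a 16-bit path.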

【０１０３】各ＰＭＥは、その４入力ポート８０，８
５，８６，９３の１つを、ＰＭＥの主データ・フロー８
７，８８，８９にデータを入力することなく、その４出
力ポート８１，９２，９４，９５の１つに直接切替える
ことができるＩ／Ｏの回路切替モードを有する。回路切
替における発信元１６２と受信先１６１（図４）の選択
はＰＭＥのソフトウェアの遂行による制御によって行わ
れる。他の３入力ポートは、第４の入力が出力ポートに
切替えられている間、ＰＭＥの主記憶に対するアクセス
が続けられる。Each PME has an I/O circuit-switched mode in which one of its four input ports 80, 85, 86, 93 can be switched directly to one of its four output ports 81, 92, 94, 95 without the data entering the PME main data flow 87, 88, 89. The source 162 and destination 161 (FIG. 4) of a circuit switch are selected under the control of software executing on the PME. The other three input ports continue to have access to PME main memory while the fourth input is switched to an output port.

【０１０４】ＰＭＥ入力ポートのデータは局所ＰＭＥに
向けるか、又はリングを更に下がるＰＭＥに向けるか予
め定めておくことができる。リングを更に下がるＰＭＥ
に対してデータを予め定めておくと、そのデータは局所
ＰＭＥ主メモリーに記憶され、その後目標ＰＭＥ（記憶
及び転送）８４，８７，８８，８９，９０，９１の方に
局所ＰＭＥから転送することができる。又は、局所入力
ポートはデータが目標ＰＭＥに対する途上の局所ＰＭＥ
８４，９０を通して“透過的”に通過するように、特定
の局所出力ポート（回路切替）に対し論理的に接続する
ことができる。Data at a PME input port may be destined for the local PME or for a PME further down the ring. Data destined for a PME further down the ring may be stored in local PME main memory and then forwarded by the local PME toward the target PME (store and transfer) 84, 87, 88, 89, 90, 91. Alternatively, the local input port may be logically connected to a particular local output port (circuit switched) so that the data passes "transparently" through the local PME 84, 90 on its way to the target PME.

【０１０５】局所ＰＭＥソフトウェアは、局所ＰＭＥが
４入力及び４出力のいずれかに対し記憶及び転送モード
にするか、又は回路切替モードにするか否かを動的に制
御する（１７０）。回路切替モードにおいて、ＰＭＥは
回路切替に関連する機能を除く全べての機能を同時に処
理する。記憶及び転送モードにおいては、ＰＭＥは全べ
て他の処理機能を中断してＩ／Ｏ転送処理を開始する。Local PME software dynamically controls (170) whether the local PME is in store and transfer mode or in circuit-switched mode for any of the four inputs and four outputs. In circuit-switched mode, the PME concurrently processes all functions except those involved in the circuit switch. In store and transfer mode, the PME suspends all other processing functions to begin the I/O transfer process.

【０１０６】この好ましい実施例において、ＰＭＥＩ
／Ｏの設計は下記の３つのＩ／Ｏオペレーティング・モ
ードを提供する。In this preferred embodiment, the PME I/O design provides the following three I/O operating modes.

【０１０７】１．正規のモード：２つの隣接するＰＭＥ
間におけるデータ転送に使用される。このデータの転送
はＰＭＥソフトウェアによって開始される。隣接ＰＭＥ
以外のＰＭＥに対して予め定められたデータは隣接ＰＭ
Ｅによって受信されて後、それがその隣接ＰＭＥから発
信するかの如くに送信しなければならない。例えば、＋
Ｗで発信され、−Ｘに向けて予め定められた２進ハイパ
ーキューブ１５１データは＋Ｗから＋Ｚに転送される。
その後、それは＋Ｚから＋Ｘに転送される。最終的に、
それは＋Ｗから＋Ｚに転送される。この処理は、ノード
内のＰＭＥと、異なるノードのＰＭＥの両方に適用され
る。1. Normal mode: Used to transfer data between two adjacent PMEs. The transfer is initiated by PME software. Data destined for a PME other than an adjacent PME must be received by the adjacent PME and then sent on as if it had originated there. For example, in the binary hypercube 151, data originating at +W and destined for -X is transferred from +W to +Z, then from +Z to +X, and finally from +X to -X. This process applies both to PMEs within a node and to PMEs in different nodes.

[0108] 2. Circuit switching mode: Allows data and control to pass through a PME. This enables fast communication between PMEs that are not immediate neighbors. In the example of item 1 above, +W sends a message to +Z requesting a circuit switched path to -X. +Z forwards the message to +X and then sets the circuit switch that connects its +W input to its +X output.

[0109] +X sets the circuit switch path that connects its +Z input to its -X output. +W then transfers the data directly to -X through +Z. FIG. 7 shows this state. In FIG. 8, the +X PME connects the left (LEFT) input 85 to the vertical (VERTICAL) output 94 via multiplexer (MUX) 84, MUX 90, and demultiplexer (DEMUX) 91. The +Z PME sets up a similar path, except that it connects the left input 85 to the right output 95.
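
The path setup in the +W to -X example can be sketched in Python as a toy model. It does not represent the patented hardware: the `switches` dictionary and the `trace_path` helper are names assumed here purely for illustration.

```python
# Toy model (assumed, not from the patent): once software has set a circuit
# switch, each waypoint PME maps the neighbour on its input side to the
# neighbour on its output side.
def trace_path(source, first_hop, switches):
    """Follow circuit-switched connections from `source` until reaching a PME
    that has no switch set for the arriving input (the destination)."""
    path = [source, first_hop]
    prev, here = source, first_hop
    while prev in switches.get(here, {}):
        prev, here = here, switches[here][prev]
        path.append(here)
    return path


switches = {
    "+Z": {"+W": "+X"},  # +Z connects its +W input to its +X output
    "+X": {"+Z": "-X"},  # +X connects its +Z input to its -X output
}
print(trace_path("+W", "+Z", switches))  # prints ['+W', '+Z', '+X', '-X']
```

With no switches set, the same call stops at the first hop, which corresponds to the adjacent PME receiving the data itself.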

[0110] This method or process applies equally to intra-node and inter-node transfers. An inter-node circuit switched transfer is slightly slower than an intra-node circuit switched transfer, because the output buffer register 83 and the input buffer register 82 must be loaded in the two nodes involved, which costs an additional clock cycle at each node traversed. Even so, the network diameter is still reduced for each circuit switched PME.

[0111] 3. Zipper mode: An I/O transfer mode used by the array controller to read data from, or load data into, the nodes of a cluster. Zipper mode uses standard functions and the circuit switching mode to rapidly transfer data to and from the array of PMEs on a cluster card. The preferred zipper of this embodiment is the one described in the US patent application entitled "APAP I/O Zipper Connection".

[0112] In addition to the three basic modes (I/O transfer mode, normal mode, and circuit switching mode), a splitter submode is provided. In the normal data transfer submode, once a data path is established, all of the data is transferred from the source 162 to a single specified destination 161 (FIG. 4). In splitter submode, both a primary destination 161 and an alternate destination 167 are specified, along with a word count 168. Data words of the message are forwarded to the primary destination until the splitter word count reaches zero. The splitter then switches to the alternate destination, forwards the next group of data words until the splitter word count again reaches zero, and then returns to the primary destination. This process continues until the entire message has been sent.
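
The alternation performed in splitter submode can be illustrated with a short Python sketch. The function name `split_words` and the destination labels are hypothetical, and reloading the counter with the same value after each switch is an assumption; the text only says that the count reaches zero again.

```python
from typing import Iterable, Iterator, Tuple


def split_words(words: Iterable[int], word_count: int,
                primary: str = "primary",
                alternate: str = "alternate") -> Iterator[Tuple[str, int]]:
    """Yield (destination, word) pairs, switching destination every
    `word_count` words and starting with the primary destination."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    current, other = primary, alternate
    remaining = word_count          # splitter word count, counts down to zero
    for word in words:
        yield current, word
        remaining -= 1
        if remaining == 0:          # count exhausted: switch destinations
            current, other = other, current
            remaining = word_count  # assumed: the count is reloaded


for dest, word in split_words(range(5), word_count=2):
    print(dest, word)
```

With a word count of 2, words 0 and 1 go to the primary destination, words 2 and 3 to the alternate, and word 4 to the primary again.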

[0113] Conceptually, data is transferred from the main memory of the source PME to the main memory of the destination PME. In this embodiment, two memory locations are allocated for each interface to hold the starting address 230 of the output data block and the number of words 231 contained in the block. In addition, PME control register 1 (FIG. 4) controls the mode and destination of the data output.

[0114] PME run-time software loads the allocated memory locations to define the output data block. It loads PME control register 1 to define the split word count 168 (if any), the destination 161, the nodes 163, 165, and the alternate destination 167; to specify whether the defined block completes the message; and to specify whether the PME is to finish transferring the data block before continuing its run-time program 165. To perform an output operation, the PME run-time software first loads the address and count into the designated memory locations. Next, it loads PME control register 1. Finally, it executes an output instruction that starts the data transmit sequence.

[0115] To send each data word, the transmit sequence decrements the count 231, increments the starting address 230, and reads the data word from memory 41. The data word is loaded into the transmit register 47, 96 and sent to the selected PME 97, 161. The transmit sequence cycle-steals access to memory 41 and ALU 42 from normal PME processing in order to update the I/O address and count fields and to load the transmit register 47, 96. The CX bit 165 of PME control register 1 determines whether PME processing continues to interleave instructions during the transmit sequence or idles until the transmit sequence completes. The sequence continues until the count reaches zero.
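
The per-word bookkeeping of the transmit sequence can be sketched as follows. This is a software illustration only: the list-based `memory`, the `transmit_block` name and the `send` callback are assumptions, and the sketch assumes the word is read from the address held before the increment, which the text does not state explicitly.

```python
from typing import Callable, List


def transmit_block(memory: List[int], addr_loc: int, count_loc: int,
                   send: Callable[[int], None]) -> None:
    """Send words until the count stored at `count_loc` reaches zero."""
    while memory[count_loc] > 0:
        memory[count_loc] -= 1        # decrement the word count (231)
        addr = memory[addr_loc]
        memory[addr_loc] = addr + 1   # increment the block address (230)
        send(memory[addr])            # read the word and hand it to the sender


# Location 0 holds the block address, location 1 the word count.
memory = [4, 2, 0, 0, 0xAAAA, 0xBBBB]
sent: List[int] = []
transmit_block(memory, addr_loc=0, count_loc=1, send=sent.append)
print([hex(w) for w in sent])  # prints ['0xaaaa', '0xbbbb']
```

Keeping the address and count in ordinary memory locations mirrors the two reserved locations per interface described in the text; in hardware these updates are done by cycle stealing rather than by a software loop.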

[0116] Because the data transfer interface is four bits wide 97, each 16-bit data word 220 is sent as 4-bit pieces (nibbles). A tag nibble 221 and a parity nibble 222 are sent along with the data. The transfer format is shown at 223.
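
As a rough illustration of nibble-wise transfer, the sketch below splits a 16-bit word into four nibbles and adds a tag nibble and a parity nibble. The ordering (tag first, data most significant nibble first, parity last) and the parity rule (bitwise XOR of the tag and data nibbles) are assumptions made for this example; the text does not define them, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def frame_word(word: int, tag: int) -> List[int]:
    """Return six 4-bit values: tag, four data nibbles, parity (assumed layout)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("word must fit in 16 bits")
    if not 0 <= tag <= 0xF:
        raise ValueError("tag must fit in 4 bits")
    data = [(word >> shift) & 0xF for shift in (12, 8, 4, 0)]
    parity = tag
    for nibble in data:
        parity ^= nibble              # assumed parity rule: XOR of all nibbles
    return [tag, *data, parity]


def unframe_word(nibbles: Sequence[int]) -> Tuple[int, int]:
    """Check the assumed parity and rebuild (tag, word) from six nibbles."""
    if len(nibbles) != 6:
        raise ValueError("expected six nibbles")
    tag, *data, parity = nibbles
    check = tag
    for nibble in data:
        check ^= nibble
    if check != parity:
        raise ValueError("parity error")
    word = 0
    for nibble in data:
        word = (word << 4) | nibble
    return tag, word


print(frame_word(0x1234, tag=0x8))  # prints [8, 1, 2, 3, 4, 12]
```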

[0117] FIG. 11 shows the format of the transmit sequence. The sending PME on an interface issues a request 225 to the receiving PME. When it receives the acknowledgment 226, the sending PME begins the data transmission and can generate the next transmit sequence. The next transmit sequence is not issued until an acknowledgment has been received.

[0118] When the TC tag bit 164 in PME control register 1 is set, the TC bit 224 is set in the tag field of the last data word transferred. This bit informs the receiving PME that the data transfer is complete.

[0119] In normal mode, a PME can receive data from the L 85, V 93, R 86 and X 80 interfaces. Before the PME can receive data from an interface, an input buffer in memory must be set up for that interface. Two memory locations are allocated to hold the starting address 232 of each input data buffer and the number of words 233 the buffer contains. In addition, PME control register 2 contains mask bits that both enable the input interfaces 173 and permit I/O interrupts 172.

[0120] PME run-time software loads the allocated memory locations to define the input data buffer and loads PME control register 2 to enable input data transfer. Once the input buffer is defined and enabled, the PME run-time software continues with other processing tasks and waits for an I/O interrupt.

[0121] When one of the interfaces detects a request 240, the receiving PME sends an acknowledgment 241 and loads the data into the input register 87. The receive sequence is then started: it fetches and decrements the count 233, fetches and increments the input buffer address 232, and stores the data word in PME memory 41. The receive sequence is similar to the transmit sequence. It cycle-steals access to memory 41 and ALU 42 from normal PME processing in order to update the I/O address and count fields and to load the input data word into memory 41.

[0122] The PME processor continues to interleave instructions during the receive sequence regardless of the state of the CX bit 165 of PME control register 1. The sequence continues until either the count reaches zero or a TC tag is received, at which point the corresponding bit of the input interrupt register is set 171 with an interrupt code 190 indicating "input buffer full" or "transfer complete".
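
The receive sequence and its two termination conditions can be sketched as follows. The list-based `memory`, the `receive_block` name and the `(word, tc)` tuples are assumptions for illustration. The sketch also assumes that a TC tag takes precedence when it arrives on the word that exhausts the count, a case the text does not address.

```python
from typing import Iterable, List, Optional, Tuple


def receive_block(memory: List[int], addr_loc: int, count_loc: int,
                  incoming: Iterable[Tuple[int, bool]]) -> Optional[str]:
    """Store incoming (word, tc) pairs and return the interrupt condition
    that ends the sequence, or None if the input simply stops."""
    for word, tc in incoming:
        memory[count_loc] -= 1        # fetch and decrement the count (233)
        addr = memory[addr_loc]
        memory[addr_loc] = addr + 1   # fetch and increment the address (232)
        memory[addr] = word           # store the data word in PME memory
        if tc:
            return "transfer complete"
        if memory[count_loc] == 0:
            return "input buffer full"
    return None


# Location 0 holds the buffer address, location 1 the buffer word count.
memory = [3, 2, 0, 0, 0]
print(receive_block(memory, 0, 1, [(0x11, False), (0x22, True)]))
# prints transfer complete
```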

[0123] The PME generates an acknowledgment 226 in response to a request if all of the following conditions are met:
1. The input register 87, 100 is free.
2. Requests are not inhibited 174.
3. No interrupt 182 is pending for the requesting input.
4. The requesting input is not circuit switched.
5. The requesting input has the highest priority among the current requests.
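
The five acknowledgment conditions can be expressed as a small selection function. This is a sketch under assumptions: the `InterfaceState` fields and the numeric `priority` values are hypothetical (the text does not give the priority order here), and the sketch interprets condition 5 as choosing the highest-priority request among those that pass conditions 2 to 4.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class InterfaceState:
    requesting: bool = False
    inhibited: bool = False          # condition 2 (174)
    interrupt_pending: bool = False  # condition 3 (182)
    circuit_switched: bool = False   # condition 4
    priority: int = 0                # assumed: lower value = higher priority


def select_acknowledge(interfaces: Dict[str, InterfaceState],
                       input_register_free: bool) -> Optional[str]:
    """Return the interface to acknowledge, or None if no request qualifies."""
    if not input_register_free:      # condition 1: input register must be free
        return None
    eligible = [name for name, s in interfaces.items()
                if s.requesting and not s.inhibited
                and not s.interrupt_pending and not s.circuit_switched]
    if not eligible:
        return None
    # condition 5: highest priority among the qualifying requests
    return min(eligible, key=lambda name: interfaces[name].priority)


state = {
    "L": InterfaceState(requesting=True, interrupt_pending=True, priority=0),
    "V": InterfaceState(requesting=True, priority=1),
    "R": InterfaceState(requesting=True, priority=2),
    "X": InterfaceState(priority=3),
}
print(select_acknowledge(state, input_register_free=True))  # prints V
```

In this usage example the L interface is skipped because an interrupt is pending for it, so the request from V is acknowledged.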

[0124] The input register 87, 100 is busy from the time the acknowledgment 226 is generated until the receive sequence stores the data word in memory. While the input register is busy, acknowledgments are inhibited. The busy state prevents the input register from being overwritten before the receive sequence occurs (since the receive sequence may be delayed by memory refresh or other PME operations).

[0125] When the TC tag bit 224 is sent from the sending PME, the I/O interrupt latch is set 171 for that interface. No further acknowledgment 226 is generated on that interface until the interrupt latch is reset by PME run-time software. All other interfaces remain active. For example, if the TC tag bit 224 is set on a data transfer from the left interface 85, further requests from the left are inhibited until the L interrupt is taken and the L interrupt latch is reset. Requests 225 from V 93, R 86 and X 80 can be accepted normally.

[0126] Since there is a single path 84, 99 to the common input register 87, 100, the acknowledgment 226 priority is as follows:

[0127] When the receiving PME generates an acknowledgment 226, the input multiplexer 84, 99 is set to the correct input source. At the end of the word transfer, the input register 87, 100 is locked. The input register remains locked until the receive sequence occurs and the data is transferred to the correct memory buffer. During this time the input register is busy, so all acknowledgments are inhibited.

[0128] If a data word is transferred with the TC tag 224 bit set and the receiving PME is in normal mode, an I/O interrupt is generated 171 for the associated interface. In addition, if the buffer count reaches zero before a TC tag is sent from the sending PME, an I/O interrupt is generated for the associated interface with the interrupt code set to indicate an input buffer overflow.

[0129] Normal mode processing is used for data transfer between adjacent PMEs. The PME run-time software protocol inserts information into the header of the message that tells the receiving PME software whether it is the final destination for the data or merely a waypoint. If the PME run-time software determines that it is a waypoint, it must set up a transmit sequence to move the data toward its final destination.

[0130] This process is known as store and forward. Store and forward moves the data toward its final destination. However, this data movement incurs I/O overhead, because PME memory bandwidth is used both to load and to unload the I/O buffer area at each waypoint. It also adds CPU processing overhead to process the header information and to initialize a transmit sequence for each message passing through the PME.

[0131] In circuit switching mode, the source PME run-time software and the final destination PME run-time software use normal mode processing. The circuit switch mechanism is invoked by the waypoint PME run-time software in response to a software protocol message sent (in normal mode) from the source PME through each waypoint PME to the final destination PME.

[0132] If a waypoint PME is unable to invoke circuit switching mode, that waypoint stores and forwards the protocol message and the data message. The remaining waypoint PMEs on the receiving side of the store-and-forward waypoint PME still invoke circuit switching mode when they receive the software protocol message.

[0133] The circuit switch path is dynamically controlled by local PME run-time software. The PME run-time software determines whether the PME should perform a circuit switch 191 and loads PME control register 1 shown in FIG. 4. The circuit switch is enabled with CS 170, the input source is selected with RA 162, and the output destination is selected with TA 161. In addition, the software can enable the splitter submode 166, select an alternate destination 167, and use the splitter word count 168.

[0134] The circuit switch path connects the PME input paths 82, 85, 86, 93 directly to the output paths 83, 92, 94, 95 without passing through the intervening PME processing or main storage resources 87, 88, 89. The circuit switch passes the tag 221, the data 220 and the parity 222. The waypoint PME monitors the tags for the control tag TC 224.

[0135] When the PME detects the control tag TC, it sets the appropriate interrupt request 171. When the PME run-time software takes the interrupt, it processes the required software protocol, such as resetting the circuit switch 170 and split 166, and sends a message on the alternate destination path 167 to terminate the circuit switch at other waypoint PMEs.
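
The control-register fields that software uses to set up and tear down a circuit switch can be modeled with a small data class. The field names follow the labels in the text (CS, RA, TA, split, alternate destination, word count), but the `ControlRegister1` class, its helper functions and the use of strings for port names are assumptions; the actual bit layout of PME control register 1 is not given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlRegister1:
    cs: bool = False             # CS 170: circuit switch enable
    ra: Optional[str] = None     # RA 162: input source
    ta: Optional[str] = None     # TA 161: output destination
    split: bool = False          # 166: splitter submode enable
    alt: Optional[str] = None    # 167: alternate destination
    split_count: int = 0         # 168: splitter word count


def enable_circuit_switch(reg: ControlRegister1, source: str, dest: str,
                          alt: Optional[str] = None,
                          split_count: int = 0) -> None:
    """Connect `source` to `dest`; optionally enable the splitter submode."""
    reg.cs, reg.ra, reg.ta = True, source, dest
    if alt is not None:
        if split_count <= 0:
            raise ValueError("splitter needs a positive word count")
        reg.split, reg.alt, reg.split_count = True, alt, split_count


def release_circuit_switch(reg: ControlRegister1) -> None:
    """Reset circuit switch and split, as done after the TC interrupt."""
    reg.cs = False
    reg.split = False


reg = ControlRegister1()
enable_circuit_switch(reg, source="L", dest="V")
release_circuit_switch(reg)
print(reg.cs, reg.split)  # prints False False
```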

[0136] While a PME is in circuit switching mode, it cannot initiate a data transfer, because the output multiplexer MUX 90 is part of the circuit switch path. However, it can receive data from any input except the circuit switched input 99. Because neither the memory 41, the ALU 40, nor any other processing function 88 uses the circuit switch path, the PME can also perform all processing that does not require immediate data output.

[0137] For example, if the L 85 input is circuit switched to one of the four outputs 81, 92, 94, 95, input data transfers can still be received from the V 93, R 86 or X 80 inputs. These input data transfers cause the receive sequencer to run sequences that load the input data into PME memory. In addition, computation within the PME continues while circuit switching mode is enabled.

[0138] The advantage of circuit switching for the source and destination PMEs is that it eliminates the delay associated with waypoint PMEs storing, processing and retransmitting the data message. The communication diameter of the array is thus effectively reduced, since that delay is eliminated for each waypoint PME operating in circuit switching mode.
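
The effect on communication diameter can be made concrete with a small calculation. The helper below is an illustrative assumption, not a formula from the text: it counts each circuit switched waypoint as removing one store-and-forward hop, in line with the statement in claim 2 that the diameter is reduced by 1 for each such node.

```python
def effective_hops(links: int, circuit_switched_waypoints: int) -> int:
    """Store-and-forward hops remaining on a path of `links` links when some
    of its waypoints are circuit switched (each removes one hop)."""
    waypoints = links - 1
    if links < 1 or not 0 <= circuit_switched_waypoints <= waypoints:
        raise ValueError("invalid path description")
    return links - circuit_switched_waypoints


# +W -> +Z -> +X -> -X has three links and two waypoints (+Z and +X).
print(effective_hops(3, 0))  # prints 3
print(effective_hops(3, 2))  # prints 1
```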

[0139] The advantage of circuit switching for a waypoint PME is that its I/O does not consume memory bandwidth to store and retransmit the data passing through it. In addition, the waypoint PME run-time software does not incur the overhead required to retransmit the data message. The net effect of circuit switching is higher processing bandwidth for waypoint PMEs and a shorter communication diameter for source and destination PMEs.

[0140] Although the preferred embodiment of the present invention has been described in detail above, it will be apparent that the invention is not limited to it and that further improvements, modifications and extensions may be made, now and in the future, within the scope of the invention.

[0141]

[Effects of the Invention] As described above, the present invention allows a message to pass through dynamically connected input and output paths whenever memory and input data do not require those paths, and lets the node simultaneously process memory and input data using the node's full memory bandwidth. It can therefore provide a multiprocessor parallel computer system capable of rapidly transferring I/O messages between parallel processors and processing elements.

[Brief description of drawings]

FIG. 1 is a functional block diagram of a typical APAP, showing in particular the main elements of the Advanced Parallel Array Processor (APAP) and the APAP interface to a host processor.

FIG. 2 is a functional block diagram of an embodiment of a processing memory element node according to the present invention, showing in particular the interconnection of the various elements that make up such a node.

FIG. 3 is a block diagram showing the data flow of an embodiment of a processing memory element (PME); the principal parts of the data flow include main memory, general purpose registers, the ALU and its registers, and part of the interconnection mesh.

FIG. 4 is a diagram illustrating the PME control registers that support the implementation of interrupts and the interconnection network.

FIG. 5 is a diagram showing the PME status register and the interrupt level priority assignments.

FIG. 6 is a diagram illustrating a modified binary hypercube.

FIG. 7 is a diagram illustrating a modified binary hypercube.

FIG. 8 is a data flow diagram of the transfer interface showing the circuit switch path.

FIG. 9 is a diagram illustrating the PE I/O data flow and its tie-in to the interconnection network.

FIG. 10 is a diagram illustrating the tag, parity and data words transferred between PME I/Os.

FIG. 11 is a diagram illustrating the output interface sequencing between PME I/Os.

FIG. 12 is a diagram illustrating the input interface sequencing between PME I/Os.

FIG. 13 is a diagram showing the reserved storage locations for interrupts and I/O processing.

[Explanation of symbols]

1 Host processor
2 Host memory
3 Application processor interface
4 Cluster synchronizer
5 Cluster controller
6 Cluster
20 Processing memory element (PME)
21 Broadcast and control interface (BCI)
41 Main memory
42 ALU
47 Transmit register

Continuation of the front page
(72) Inventor: Mitchell Charles Dapp, 1130 Ivon Avenue, Endwell, New York 13760, United States
(72) Inventor: James Warren Defenderfa, 396 Front Street, Owego, New York 13827, United States
(72) Inventor: Donald Mitchell Resmista, 108A Collins Hill Road, Vestal, New York 13850, United States
(72) Inventor: Richard Edward Nieer, 109 Forest Hill Road, Apalachin, New York 13732, United States
(72) Inventor: Eric Eugene Letter, HCR 34, Box 29B, Warren Center, Pennsylvania 18851, United States
(72) Inventor: Robert Least Richardson, R.D. #2, Box 81, Marson Road (no street number), Vestal, New York 13850, United States
(72) Inventor: Vincent Joan Smallal, 812 Skirain Terrace, Endwell, New York 13760, United States

Claims

[Claims]

1. A message passing device for passing messages through a network of interconnected nodes, wherein input messages are stored in the main memory of each node, the device comprising: means for sending an output message from the main memory of each node without using the node's processing resources, except for I/O configuration setup processing and the use of main memory bandwidth; means for dynamically and logically connecting one of the node's input data paths to one of its output paths; means for passing a message through the node over the dynamically and logically connected input and output paths without using the node's processing, memory bandwidth or storage resources; means by which the node can simultaneously process memory and input data using the full node memory bandwidth while a message is passing through the dynamically connected input and output paths, as long as that processing does not require the memory or the logically connected input data path; means for recognizing the end of a message on the logically connected input and output paths; and means for dynamically and logically disconnecting the connected input path from the output path so that subsequent incoming messages use the node's storage and processing resources.

2. The message passing device according to claim 1, wherein the logical connection of the input path to the output path reduces the network communication diameter by 1 for each node of such a configuration.

3. The message passing device according to claim 1, further comprising means for alternating the output path among a plurality of nearest-neighbor paths such as R, S, T and U, so that the first n words of a message pass to R, the next n words to S, the next n words to T, the next n words to U, and the next n words to R again.

4. The message passing device according to claim 1, wherein the node is a processing memory element (PME).

5. A computer system comprising: a plurality of processing memory elements (PMEs), each having means for internal data flow and control, means for data and control communication, and a significant amount of local storage; and a PME-to-PME communication network having a circuit switching mode and a store-and-forward mode, the network passing data and distributing control among the processing memory elements.

6. The computer system according to claim 5, further comprising means for dynamically switching, within the processing memory elements, between SIMD and MIMD operating modes, with store-and-forward and circuit switching functions that enable the transfer of messages and data.

7. The computer system according to claim 5, wherein the memory of each processing memory element serves as the data destination both for messages targeted at that processing memory element and for messages being moved in store-and-forward mode, and wherein, when no blocking occurs, messages not targeted at that particular processing memory element are sent directly to the requested output port, providing the circuit switching mode.

8. The computer system according to claim 5, wherein the memory of each processing memory element serves as the data destination both for messages targeted at that processing memory element and for messages being moved in store-and-forward mode; wherein, when no blocking occurs, messages not targeted at that particular processing memory element are sent directly to the requested output port, providing the circuit switching mode; and wherein software control of the processing memory element dynamically selects between the circuit switching mode and the store-and-forward mode, thereby controlling the routing decision and the choice of transmission mode.

9. The computer system according to claim 5, wherein the processing memory elements support direct memory addressing and communication with dynamic selection between the circuit switching mode and the store-and-forward mode, and wherein data on a path circuit switched to an internal output port is transferred within the chip without clocking, allowing single-cycle data transfer, and the occurrence of a chip crossing is detected.

10. The computer system according to claim 5, wherein the processing memory elements support direct memory addressing and communication with dynamic selection between the circuit switching mode and the store-and-forward mode, and wherein forward and backward data paths are used for all data transfers operating in transparent mode, providing means for probing through several stages to detect acknowledgments from processing memory elements performing direct memory addressing and off-chip transfers.