JP2840294B2

JP2840294B2 - Data communication system for parallel computer

Info

Publication number: JP2840294B2
Application number: JP1153276A
Authority: JP
Inventors: 宏喜三浦
Original assignee: Sanyo Denki Co Ltd
Current assignee: Sanyo Denki Co Ltd
Priority date: 1989-06-15
Filing date: 1989-06-15
Publication date: 1998-12-24
Anticipated expiration: 2013-12-24
Also published as: JPH0318961A

Description

[Detailed Description of the Invention]

(A) Field of Industrial Application

The present invention relates to a data communication system for computers, and in particular to a data communication system for a parallel computer in which data is sent and received among a plurality of processors according to the processor number assigned to each processor.

(B) Prior Art

In recent years, research toward practical parallel processing computers has been advancing. In particular, with progress in semiconductor technology, many studies implement a communication control unit and a data processing unit together as a one-chip element processor LSI and connect a large number of these element processor LSIs to form a parallel processor.

For example, in the parallel processing computer disclosed on pages 181 to 218 of Nikkei Electronics, issued April 9, 1984, a data communication system is formed by connecting a plurality of one-chip element processors called Impp in a ring to send and receive data.

In addition, as disclosed in paper 2T-2 of the 38th (first half of 1989) proceedings of the Information Processing Society of Japan, the present inventor is developing EDDEN (Enhanced Data Driven ENgine), a large-scale parallel data-driven computer in which up to 1024 one-chip element processor LSIs are connected.

(C) Problems to Be Solved by the Invention

In parallel computers such as those described above, a commonly used method is to identify each element processor by a processor number and to attach the destination processor number to each item of communication data; each processor then compares the processor number of arriving communication data with its own processor number and selectively inputs the data to its own data processing unit. A function is also needed for outputting result data processed inside the parallel computer to the outside and passing it to another data processing device such as a host computer. To realize this function, the usual method is to assign a processor number to each processor, assign a unique processor number to the host computer as well, and attach the host computer's processor number to the result data as the destination number, thereby identifying the data as bound for the outside of the parallel computer.

For example, in the processor called Impp mentioned above, as shown on page 206 of the Nikkei Electronics issue cited above, module number "0" out of the 16 module numbers representable in 4 bits is assigned to the host computer, and an interface unit inserted in the ring bus detects data carrying module number "0" and outputs it to the host computer.

However, in a computer such as EDDEN, in which many element processors are interconnected by a network with a two- or three-dimensional geometric topology, assigning a processor number to the host computer in this way means that some processors are missing from a topology that should consist of uniform processors. This lowers the processing capability of the parallel computer and makes it extremely difficult to program.

Furthermore, in a computer such as EDDEN, many different routes exist when result data is transferred from an element processor toward the host computer, so it is desirable that result data bound for the host computer can be transferred along the shortest route.

Accordingly, an object of the present invention is to provide a data communication system for a parallel computer that can control the transfer of result data toward the host computer along the shortest route without disturbing the processor interconnection topology of the parallel computer.

(D) Means for Solving the Problems

The present invention is a data communication system for a parallel computer in which a plurality of processors, each having a plurality of communication ports for connection to adjacent processors, are connected to perform data communication among element processors, and it is characterized as follows.

Each processor is identified by a unique processor number. The communication data holds at least the processor number of its destination and an external flag indicating whether the data should be output to the outside of the parallel computer or processed inside it.

When communication data arrives at a processor through any of the communication ports, the processor compares the destination processor number held by the data with its own processor number. If the two processor numbers do not match, the processor selectively outputs the data to one of the communication ports so that the data reaches the destination processor by the shortest route. If the two processor numbers match and the external flag indicates that the data should be processed inside the parallel computer, the processor processes the data itself. If the two processor numbers match and the external flag indicates that the data should be output to the outside of the parallel computer, the processor does not process the data and outputs it to a predetermined one of the communication ports.

Communication data to be output to the outside of the parallel computer carries an external flag indicating that it should be output to the outside and carries, as its destination processor number, the processor number of the processor nearest to the outside of the parallel computer.
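The three-way decision described above can be sketched as follows. This is only an illustration under stated assumptions: the names `Packet`, `Action`, and `decide` are hypothetical, processor numbers are modeled as (row, column) pairs, and the choice of the forwarding port is left abstract.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Action(Enum):
    FORWARD = "forward toward the destination by the shortest route"
    PROCESS = "process the data in this processor"
    EXIT_PORT = "output to the predetermined port without processing"


@dataclass(frozen=True)
class Packet:
    dest: Tuple[int, int]  # destination processor number (assumed (row, column) form)
    external: bool         # external flag: True means "output outside the parallel computer"


def decide(own: Tuple[int, int], pkt: Packet) -> Action:
    """Decision made by a processor when a packet arrives at any of its ports."""
    if pkt.dest != own:
        return Action.FORWARD
    # Processor numbers match: the external flag selects between the two cases.
    return Action.EXIT_PORT if pkt.external else Action.PROCESS
```

Because the external flag is a separate field, no processor number has to be reserved for the host computer in this sketch.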

(E) Operation

According to the data communication system of the present invention, when an element processor of the parallel computer outputs result data toward the host computer, the source element processor only needs to attach to the result data the processor number of the element processor nearest to the host computer and an external flag indicating that the data is to be output to the outside of the parallel computer. At each processor along the route, the processor's own number does not match the packet's processor number, so the processor selectively outputs the packet to one of its communication ports such that the packet reaches the processor with the number held by the packet over the shortest distance. When the result data packet arrives at the processor whose number it holds, that processor detects that its own processor number matches the packet's and that the external flag indicates external output, and the packet is output to the predetermined communication port of that processor. Therefore, if the host computer is connected to that predetermined communication port through an output interface or the like, the result data packet is output toward the host computer.

In this way, a result data packet issued by any element processor is transferred toward the host computer over the shortest distance, without being input to the data processing unit of any processor along the way.

Moreover, because result data packets are identified by an external flag that is separate from the processor number, the interconnection topology of the parallel computer is not disturbed.

It also follows from the above that data packets other than result data packets are transferred along the shortest route until they reach the destination processor, just as result data packets are, and are then input to the data processing unit of the destination processor for processing.

(F) Embodiment

FIG. 1 shows the system of a highly parallel data-driven computer as an embodiment of the present invention, and FIG. 2 shows the configuration of an element processor.

The element processor (PE) of FIG. 2 basically consists of a program storage unit (PS), a firing control and color management unit (FCCM), an instruction execution unit (EXE), and a queue memory (Q) connected in a circular pipeline (ring) structure.

The program storage unit (PS) updates node numbers, attaches constants, and copies results. The firing control and color management unit (FCCM) handles the matching of left and right operands and manages the acquisition and release of colors. The instruction execution unit (EXE) executes instructions such as floating-point and integer arithmetic, condition tests, and branches. The queue (Q) is a buffer memory that absorbs all fluctuations in data flow on the ring.

The vector operation control unit (VC) controls the execution of vector operation instructions and external data memory access instructions. The external data memory (EDM) stores structures, vector data, and the like.

The communication control unit (NC) has four communication ports (east, west, south, and north) and performs routing control based on a torus network of up to 1024 processors (PE). The input control unit (IC) handles the input of data packets from the communication control unit to the ring. The output control unit (OC) handles the output of data packets from the ring to the communication control unit. Bypass lines for structure (vector) data communication are provided between the vector operation control unit (VC) and the input control unit (IC) and output control unit (OC).

The basic configuration of EDDEN, which uses many such element processors (PE), connects n × n element processors by a torus network, as shown in FIG. 1. In a torus network, many processors are arranged in a matrix; a plurality of vertical communication lines cyclically connect the processors in each column, and a plurality of horizontal communication lines cyclically connect the processors in each row, enabling data communication between any pair of processors.
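The cyclic row and column connections of a torus network can be illustrated with a short sketch. The function name `torus_neighbors` and the 0-based (row, column) indexing, with rows increasing from north to south and columns from west to east, are assumptions made for illustration only.

```python
from typing import Dict, Tuple


def torus_neighbors(y: int, x: int, p: int, q: int) -> Dict[str, Tuple[int, int]]:
    """Neighbors of position (y, x) in a p-row by q-column torus.

    The modulo wraps the edges, so every position has exactly four neighbors.
    """
    return {
        "N": ((y - 1) % p, x),
        "S": ((y + 1) % p, x),
        "W": (y, (x - 1) % q),
        "E": (y, (x + 1) % q),
    }
```

For a 4 × 4 array, the upper-left position (0, 0) has (3, 0) as its north neighbor and (0, 3) as its west neighbor, which is the wrap-around that distinguishes a torus from a plain grid.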

In such a system, data is exchanged between the network and the outside by inserting a network interface (NIF).

The data packets used in the data-driven computer configured as above fall broadly into execution packets, used for program execution, and non-execution packets, used for other purposes; FIGS. 4(a) to 4(e) show examples. Except for packets holding structure data, packets have a fixed length: 33 bits × 2 words on the pipeline ring inside the processor (PE), and 18 bits × 4 words on the network (in the communication control unit).

The contents of each field in the packet formats of FIG. 4 are described below.

HD (1 bit): Identifies the first word (header) and the second word (tail) of a two-word packet; "1" for the header.

EX (1 bit): Flag identifying a packet to be output from the pipeline ring to the communication control unit.

MODE (2 bits): Identification code for the packet type, such as execution packet or non-execution packet.

S-CODE: Identification code that, together with MODE, specifies the processing to be applied to the packet.

OPCODE-M (5 bits) and OPCODE-S (6 bits): Instruction codes identifying the type of instruction.

NODE# (11 bits): Node number in the data flow graph.

COLOR (4 bits): Color. An identification number that distinguishes environments when the same data flow graph is executed multiple times, for example when a program is shared through subroutine calls.

DATA (32 bits): Numeric data such as an integer or floating-point number.

HT (1 bit): Flag that distinguishes the header and tail words of a packet on the network from the words between them.

RQ (1 bit): Flag attached to packets transferred on the network. Its value is inverted each time one word of data is transferred on the network, so the presence of a word can be recognized. The inversion also serves as the transfer request signal for moving the packet forward. In combination with the HT flag, it distinguishes the header from the tail.

ADDRESS (16 bits): Holds a memory address for operations such as loading or dumping each memory.
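The RQ flag behaves like a toggle handshake: a receiver sees a new word whenever RQ differs from the value it saw last. The sketch below models only that idea. The class `WordReceiver`, its `sample` method, and the initial RQ level of 0 are hypothetical; the text does not specify the idle level or the sampling interface.

```python
from typing import List


class WordReceiver:
    """Detects words on a link where RQ is inverted once per transferred word."""

    def __init__(self, initial_rq: int = 0) -> None:
        self.last_rq = initial_rq      # assumed idle level, not given in the text
        self.words: List[int] = []

    def sample(self, rq: int, word: int) -> bool:
        """Sample the link once; return True if a new word was accepted."""
        if rq == self.last_rq:
            return False               # RQ unchanged: no new word is present
        self.last_rq = rq              # RQ inverted: a new word has been transferred
        self.words.append(word)
        return True
```

Sampling the same link state twice accepts the word only once, since the second sample sees no change in RQ.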

The input control unit (IC) on the pipeline ring has a processor number register that stores the processor's own number. FIG. 6 shows the structure of the processor number register. PE number X is the PE number in the horizontal (east-west) direction (column number), and PE number Y is the PE number in the vertical (north-south) direction (row number). Together they form the processor number that uniquely identifies each processor. The flag bit called PEACT indicates whether the processor number has already been set: it is "0" if the number is not set and becomes "1" when the number is set.

The communication control unit (NC) receives packets such as those of FIGS. 4(c) and 4(e) through its communication ports.

In the special operation mode (PEACT = 0), the communication control unit treats every data packet entering from any of the east, west, south, and north ports as addressed to itself, inputs it to the pipeline ring, and has the processing indicated by the identification code carried out. If, in this mode, a non-execution packet whose identification code indicates a load into the processor number register shown in FIG. 6 arrives at any of the four ports, the communication control unit inputs it to the input control unit (IC) on the pipeline ring, where the specified processor number is loaded into the processor number register and the PEACT flag is set to "1". Once the PEACT flag is set to "1", the communication control unit (NC) of that processor operates in the normal operation mode.

In the normal operation mode (PEACT = 1), the communication control unit compares PE# (the destination processor number of the packet), found in the first word of the arriving packet, with its own processor number held in its processor number register. Only when the two match does it input the packet to the pipeline ring; otherwise it outputs the packet to one of the east, west, south, and north ports according to a predetermined routing algorithm, forwarding it to the adjacent processor.

FIG. 5 shows the packet types identified by MODE. As shown there, a packet holding MODE = 00 is identified as a result packet to be output to the host computer.

The main feature of the computer system of the present invention is that the communication control unit of each processor handles a packet differently depending on whether its MODE identifier is MODE = 00.

To explain this, the operation of the communication control unit is described in more detail. FIG. 3 schematically shows the configuration of the communication control unit (NC). In the figure, (RWI) and (RWO) are the self-synchronous input shift register and output shift register that form the west (W) input/output port, each consisting of four stages of 18-bit registers. Likewise, (REI) and (REO) form the east (E) input/output port, (RNI) and (RNO) the north (N) input/output port, and (RSI) and (RSO) the south (S) input/output port. The symbol ○ denotes a merging circuit and ◎ denotes a branch circuit.

The routing algorithm in the communication control unit is described with reference to FIG. 3. M1 to M5 are packet merging circuits; each merges arriving packets with priorities in the order of the numbers shown in the figure (number 1 has the highest priority). R1 to R5 are packet branch circuits, which operate according to the following algorithm.

§I. Let the processor's own processor number (row number, column number) be (y, x), the array size of the network be p × q (p: vertical, q: horizontal), and the destination processor number of the packet be (Y, X). Define

Δx ≡ (X − x) mod q, |Δx| ≤ q/2
Δy ≡ (Y − y) mod p, |Δy| ≤ p/2

where mod denotes the modulo operation.

§II. Processor numbers are assigned as y = 0, 1, 2, …, p in order from N to S, and x = 0, 1, 2, …, q in order from W to E.
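One way to read the definition of Δx and Δy is as a signed displacement along a ring, reduced to the representative of smallest magnitude. The sketch below is an assumption-laden illustration: the name `shortest_delta` is hypothetical, indices are taken as 0-based, and when the size is even and the two directions tie (magnitude exactly size/2) the positive direction is chosen, which the text does not specify.

```python
def shortest_delta(dest: int, own: int, size: int) -> int:
    """Signed displacement from own to dest on a ring of the given size.

    The result d satisfies d ≡ (dest - own) mod size and |d| <= size / 2.
    """
    d = (dest - own) % size   # Python's % yields a value in 0 .. size-1
    if d > size // 2:
        d -= size             # going the other way around the ring is shorter
    return d
```

For a ring of size 4, moving from position 0 to position 3 gives −1 (one step in the negative direction) rather than +3.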

§III. MODE denotes the value of the packet's MODE field; MODE = 00 means that the packet is bound for the host computer.

The branch conditions are as follows.

(1) R1
When MODE ≠ 00 and (PEACT = 0 or Δy = 0), output the packet to P.
When MODE = 00 and PEACT = 1 and Δy = 0, output the packet to E.
Otherwise, output the packet to S.

(2) R2
When MODE ≠ 00 and (PEACT = 0 or (Δx = 0 and Δy = 0)), output the packet to P.
When PEACT = 1 and Δx = 0 and Δy > 0, output the packet to S.
When PEACT = 1 and Δx = 0 and Δy < 0, output the packet to N.
Otherwise, output the packet to W.

(3) R3
When MODE ≠ 00 and (PEACT = 0 or (Δx = 0 and Δy = 0)), output the packet to P.
Otherwise, output the packet to E.

(4) R4
When MODE ≠ 00 and (PEACT = 0 or Δy = 0), output the packet to P.
When MODE = 00 and PEACT = 1 and Δy = 0, output the packet to E.
Otherwise, output the packet to N.

(5) R5
When Δx = 0 and Δy > 0, output the packet to S.
When Δx = 0 and Δy < 0, output the packet to N.
When Δx < 0, output the packet to W.
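The branch conditions listed above translate directly into predicates. The following sketch transcribes them as written; the function names `r1` to `r5` and the encoding of MODE as an integer (0 for MODE = 00) are assumptions. For R5 the list gives no rule for the remaining cases, so the sketch returns `None` there rather than guessing.

```python
from typing import Optional


def r1(mode: int, peact: int, dy: int) -> str:
    if mode != 0 and (peact == 0 or dy == 0):
        return "P"
    if mode == 0 and peact == 1 and dy == 0:
        return "E"
    return "S"


def r2(mode: int, peact: int, dx: int, dy: int) -> str:
    if mode != 0 and (peact == 0 or (dx == 0 and dy == 0)):
        return "P"
    if peact == 1 and dx == 0 and dy > 0:
        return "S"
    if peact == 1 and dx == 0 and dy < 0:
        return "N"
    return "W"


def r3(mode: int, peact: int, dx: int, dy: int) -> str:
    if mode != 0 and (peact == 0 or (dx == 0 and dy == 0)):
        return "P"
    return "E"


def r4(mode: int, peact: int, dy: int) -> str:
    if mode != 0 and (peact == 0 or dy == 0):
        return "P"
    if mode == 0 and peact == 1 and dy == 0:
        return "E"
    return "N"


def r5(dx: int, dy: int) -> Optional[str]:
    if dx == 0 and dy > 0:
        return "S"
    if dx == 0 and dy < 0:
        return "N"
    if dx < 0:
        return "W"
    return None  # no rule is listed for the remaining cases
```

Here "P" stands for the pipeline ring and "N", "S", "E", "W" for the four ports, following the labels used in the conditions.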

As the above shows, in the normal operation mode (PEACT = 1), with packet destination (Y, X) and processor number (y, x), each processor's communication control unit forwards the packet from W to E or from E to W as long as X ≠ x. Once X = x, it forwards the packet from N to S or from S to N as long as Y ≠ y. Furthermore, when a packet is transferred from the W or E port to the N or S port, or from inside the pipeline ring to any of the W, E, N, and S ports, the modulo operation selects the direction that reduces the inter-processor distance, so a packet communication control function that always uses the shortest distance (a self-routing function) is realized.
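The behavior summarized above (move east or west until the column matches, then north or south until the row matches, always in the shorter direction) can be traced with a small simulation. This is a sketch under assumptions, not the circuit of FIG. 3: `route` and `shortest_delta` are hypothetical names, indices are 0-based, and ties on an even-sized ring are broken toward the positive direction.

```python
from typing import List, Tuple


def shortest_delta(dest: int, own: int, size: int) -> int:
    """Signed ring displacement with |d| <= size / 2 (ties go positive)."""
    d = (dest - own) % size
    return d - size if d > size // 2 else d


def route(src: Tuple[int, int], dest: Tuple[int, int], p: int, q: int) -> List[Tuple[int, int]]:
    """Positions (row, column) visited from src to dest on a p x q torus."""
    y, x = src
    path = [(y, x)]
    while (y, x) != dest:
        dx = shortest_delta(dest[1], x, q)
        dy = shortest_delta(dest[0], y, p)
        if dx != 0:
            x = (x + (1 if dx > 0 else -1)) % q   # east if dx > 0, west if dx < 0
        else:
            y = (y + (1 if dy > 0 else -1)) % p   # south if dy > 0, north if dy < 0
        path.append((y, x))
    return path
```

On a 4 × 4 torus, a packet from (0, 0) to (0, 3) takes a single westward hop across the wrap-around link instead of three eastward hops.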

When the packet reaches the destination processor and Δx = Δy = 0 is detected, a packet with MODE ≠ 00 is input to the pipeline ring of the destination processor and processed, whereas a packet with MODE = 00, bound for the host computer, is not input to the pipeline ring and is output to a specific communication port (the E port for every packet except those that arrived at the E port).

The above is one example of a routing algorithm; the invention is not limited to it.

Next, FIG. 7 shows an example of the overall configuration of the computer EDDEN in which element processors (PE) with the communication control unit described above are connected in a 4 × 4 array (p = q = 4), and the embodiment is described in further detail.

In FIG. 7, data output from the host computer is converted by the host interface into the format of FIG. 4(c) or 4(e) and is input to the group of 4 × 4 processors through the input line (IN) and the network interface (NIF).

When the computer in the figure is powered on, the value of the processor number register in each PE is undefined. When the hardware reset signal (RST) is then supplied to all PEs, the PEACT flags of all PEs are cleared to 0 and the communication control units of all PEs are placed in the special operation mode. Next, when the host computer issues a load packet for the processor number register (data value = 00) along the route described above, the packet reaches the W port of the upper-left PE (row 1, column 1). Branch circuit R3 in that PE's communication control unit detects PEACT = 0 and forwards the packet to P (into the pipeline ring); the number 00 is set in the PE's processor number register and its PEACT flag is set to "1". From then on this PE is identified as PE00 (the left digit is the row number and the right digit is the column number). Next, when a load packet for the processor number register (PE# = 01, data value = 01) is issued, it likewise reaches the W port of PE00, but because that processor now has PEACT = 1 and Δx ≠ 0 at R3 of PE00, the packet is output to the E port and reaches the W port of the PE in row 1, column 2. That PE has PEACT = 0, so the number 01 is set in it and its PEACT is set to 1. In the same way, the load packet (PE# = 02, data value 02) sets PE02 and the load packet (PE# = 03, data value 03) sets PE03. Next, when the load packet (PE# = 10, data value 10) reaches the W port of PE00, R3 of that PE detects Δx = 0 and Δy > 0, the packet is output to the S port, and the PE in row 2, column 1 is set as PE10. When numbers have been set in the same way for all PEs (PE11, PE12, …, PE33), the PEACT flags of all PEs are "1" and the whole computer operates in the normal operation mode.
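The numbering sequence described above can be replayed in a simplified simulation. The sketch makes several assumptions that go beyond the text: the function name `assign_numbers` is hypothetical, load packets are issued in row-major order, every load packet enters at the upper-left PE, and an already-numbered PE forwards a packet east while the column differs and south once the column matches. It does not model the branch circuits or shortest-direction selection.

```python
from typing import List, Optional, Tuple

Number = Optional[Tuple[int, int]]


def assign_numbers(p: int, q: int) -> List[List[Number]]:
    """Replay the power-on numbering of a p x q array.

    number[y][x] is None while the PE at physical position (y, x) has PEACT = 0.
    """
    number: List[List[Number]] = [[None] * q for _ in range(p)]
    for dest_y in range(p):
        for dest_x in range(q):
            y, x = 0, 0                       # packet arrives at the upper-left PE
            while number[y][x] is not None:   # PEACT = 1: this PE forwards the packet
                _, own_x = number[y][x]
                if dest_x != own_x:
                    x = (x + 1) % q           # column differs: forward to the east
                else:
                    y = (y + 1) % p           # column matches: forward to the south
            number[y][x] = (dest_y, dest_x)   # PEACT = 0: load the number, set PEACT
    return number
```

Under these assumptions each PE ends up holding the number that matches its physical row and column, because every load packet stops at the first PE on its path that has not yet been numbered.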

In the normal operation mode that follows, result packets output to the host computer, such as the result packet indicating the end of program execution and result packets produced by dumping parts of a processor, are issued by the source processor with destination processor number 03 and MODE = 00.

As the algorithm above shows, such a result packet is first transferred toward PE03 over the shortest distance. When the result packet arrives at PE03, the condition Δx = Δy = 0 and MODE = 00 holds.

Consequently, every result packet arriving at the W, N, or S port of PE03 is output to the E port of PE03 and enters the network interface (NIF). A result packet heading for the E port of PE03 by way of the W port of PE00 enters the network interface (NIF) before it reaches PE03.

The network interface (NIF) constantly monitors the MODE field of incoming packets. If MODE ≠ 00, it passes the packet through unchanged, from right to left or from left to right in the figure. If MODE = 00, it outputs the incoming packet to the host computer through the output line (OUT).
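The filtering role of the network interface can be sketched as a split of the packet stream by MODE. The function name `nif_split` and the dictionary representation of a packet with a `"mode"` key are hypothetical and used only for illustration.

```python
from typing import Dict, List, Tuple

PacketDict = Dict[str, int]


def nif_split(packets: List[PacketDict]) -> Tuple[List[PacketDict], List[PacketDict]]:
    """Split packets into (sent to host via OUT, passed through unchanged)."""
    to_host: List[PacketDict] = []
    passed: List[PacketDict] = []
    for pkt in packets:
        if pkt["mode"] == 0:      # MODE = 00: result packet bound for the host
            to_host.append(pkt)
        else:                     # MODE != 00: continues on the network
            passed.append(pkt)
    return to_host, passed
```

Only the MODE field is examined, so the interface needs no processor number of its own.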

With these functions, a result packet issued by any element processor is transferred toward the host computer over the shortest distance and output. Moreover, this is achieved without disturbing the lattice (torus) processor interconnection topology of the parallel computer in any way.

(G) Effects of the Invention
As is clear from the above description, the present invention makes it possible to control the transfer of result data so that it travels to the host computer by the shortest path, without disturbing the processor connection topology of the parallel computer.

[Brief description of the drawings]

FIG. 1 is a system diagram showing the data communication system of the present invention; FIG. 2 is a block diagram showing the schematic configuration of a processor of the present invention; FIG. 3 is a schematic diagram of the main part of the processor of the present invention; FIGS. 4(a) to 4(e) are diagrams showing the structure of packets; FIG. 5 is a correspondence table showing part of the packet identification codes; FIG. 6 is a diagram showing the configuration of the processor number register inside the processor of the present invention; and FIG. 7 is a more detailed system diagram showing the data communication system of the present invention.
(PE): processor; (PS): program storage; (EXE): instruction execution unit; (NC): communication control unit; (NIF): network interface.

Continuation of the front page
(56) References: JP-A-62-187958 (JP, A); JP-A-63-240663 (JP, A); JP-A-63-113659 (JP, A); JP-A-61-208561 (JP, A); JP-A-60-241155 (JP, A); Information Processing Society of Japan, Proceedings of the 38th National Convention (first half of 1989), (III), pp. 1408-1409
(58) Fields searched (Int. Cl.⁶, DB name): G06F 15/16; G06F 15/80; JICST scientific and technical literature database

Claims

(57) [Claims]

1. A data communication system for a parallel computer in which a plurality of processors, each having a plurality of communication ports for connection to adjacent processors, are connected so as to perform data communication between the processors, wherein: each processor is identified by a unique processor number; communication data holds at least the processor number of the destination of the data and an external flag indicating whether the data is to be output to the outside of the parallel computer or processed inside the parallel computer; when communication data arrives at a processor via any of the plurality of communication ports, the processor compares the destination processor number held by the communication data with its own processor number; if the two processor numbers do not match, the processor outputs the data to one of the plurality of communication ports so that the data arrives at the destination processor in the shortest time; when the two processor numbers match and the external flag held by the data indicates that the data is to be processed inside the parallel computer, the processor processes the data itself; when the two processor numbers match and the external flag held by the data indicates that the data is to be output to the outside of the parallel computer, the processor outputs the data to a specified one of the plurality of communication ports without processing the data itself; and communication data to be output to the outside of the parallel computer has an external flag indicating that the data is to be output to the outside of the parallel computer and has, as its destination processor number, the processor number of the processor closest to the outside of the parallel computer.

2. A data communication system for a parallel computer in which a plurality of processors are arranged in rows and columns and data communication is performed between the processors over a plurality of vertical communication lines, each cyclically connecting the processors of one column, and a plurality of horizontal communication lines, each cyclically connecting the processors of one row, the system having an input/output interface that is inserted into at least one of the plurality of vertical communication lines or the plurality of horizontal communication lines and that transmits and receives data to and from the outside of the parallel computer, wherein: each processor includes at least a data processing unit and a communication control unit; the communication control unit has four adjacent communication ports for connection to the four adjacent processors in the row and column directions, and an internal communication port for connection to the data processing unit; when communication data arrives at the communication control unit of a processor via any of the communication ports, the communication control unit compares the destination processor number of the communication data with its own processor number; if the two processor numbers do not match, the communication control unit outputs the data to whichever of the four adjacent communication ports gives the shortest distance to the destination processor; when the two processor numbers match and the external flag held by the data indicates that the data is to be processed inside the parallel computer, the communication control unit inputs the data to the data processing unit of its own processor through the internal communication port for data processing; when the two processor numbers match and the external flag held by the data indicates that the data is to be output to the outside of the parallel computer, the communication control unit outputs the data to a specified one of the four adjacent communication ports without processing the data in the data processing unit of its own processor; when communication data arrives at the input/output interface, the input/output interface passes the data through unchanged if the external flag held by the data indicates that the data is to be processed inside the parallel computer, and outputs the data to the outside of the parallel computer if the external flag indicates that the data is to be output to the outside of the parallel computer; and communication data to be output to the outside of the parallel computer has an external flag indicating that the data is to be output to the outside of the parallel computer and has, as its destination processor number, the processor number of the processor closest to the outside of the parallel computer.