JP3209630B2

JP3209630B2 - Data transfer device and multiprocessor system

Info

Publication number: JP3209630B2
Application number: JP02297194A
Authority: JP
Inventors: 順二西川
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1993-03-09
Filing date: 1994-02-22
Publication date: 2001-09-17
Anticipated expiration: 2016-09-17
Also published as: JPH076125A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a data transfer device that transfers data between processors, and between processors and the outside, in a multiprocessor system in which many processors perform processing in parallel.

[0002]

[Prior Art] In recent years, information processing has grown in scale, and high-performance computer systems are required. A parallel computer improves performance by sharing the processing among many processors. In parallel processing, data generally has to be transferred between processors as processing advances in the individual processors. Various interconnection networks (data transfer networks) have been proposed for such inter-processor communication. Among these, the crossbar network is a fully connected network in which communication between any pair of processors can be achieved with a single data transfer.

[0003] FIG. 22 shows the configuration of a conventional multiprocessor system in which a plurality of processor elements (PEs) are connected by a crossbar network. In the figure, 50 denotes a transmitting-side processor element, 55 a receiving-side processor element, 60 a transmitting-side data transfer control device, 65 a receiving-side data transfer control device, and 70 a buffer unit serving as a data transfer channel (connection node).

[0004] How the crossbar network represented logically in FIG. 22 is realized in actual hardware differs from one parallel computer to another. The procedure for accessing the network as processing advances also depends on the processing content. FIG. 23 shows an example in which each buffer unit 70 is a first-in first-out FIFO buffer 71. The transmitting-side processor element 50 can send data to a channel as long as the FIFO buffer 71 of that channel has free space. The receiving-side processor element 55 reads data from any FIFO buffer 71 that is not empty. Therefore, unlike the case in which the connection nodes are simple switches, the connection state of the entire network does not have to be set.

[0005] FIGS. 24 and 25 show an example of generating the buffer unit address (channel number) that designates the data transfer destination in the configuration of FIGS. 22 and 23. FIG. 24 shows the configuration of a conventional address generation circuit built into the transmitting-side data transfer control device 60. In FIG. 24, the address generation circuit 61 includes an n-bit pointer 62 and a +1 adder 63. With this configuration, when data is sent from the transmitting-side processor element 50 to the receiving-side processor elements 55, the buffer unit address A held in the n-bit pointer 62 is incremented by the address update request signal CNT at every data transfer, so that the channel numbers of the data transfer destinations can be designated in sequence, as shown in FIG. 25. This is called "burst transfer". Because this transfer method needs no extra time for channel designation and only requires a pulse on the CNT signal, its data transfer rate is high.

[0006]

[Problems to Be Solved by the Invention] The address generation circuit 61 of FIG. 24 in the conventional data transfer device merely counts through a range of size 2^n; for example, if the n-bit pointer 62 has 2 bits, it counts 0, 1, 2, 3, 0, 1, 2, 3, and so on. The range over which the address changes is fixed, and address generation is limited to sequential addresses.
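
The limitation described above can be illustrated with a minimal Python sketch of the conventional circuit. The function name and parameters are hypothetical and chosen only for illustration; the sketch assumes the n-bit pointer 62 simply wraps modulo 2^n on each CNT pulse.

```python
def conventional_addresses(n_bits, count, start=0):
    """Model of the n-bit pointer 62 with the +1 adder 63 (illustrative only).

    The pointer can hold only n_bits bits, so it wraps modulo 2**n_bits.
    """
    size = 1 << n_bits
    addr = start % size
    out = []
    for _ in range(count):
        out.append(addr)
        addr = (addr + 1) % size  # one CNT pulse increments the pointer
    return out


print(conventional_addresses(2, 8))  # [0, 1, 2, 3, 0, 1, 2, 3]
```

In this model the only adjustable quantity is the pointer width, which is why the address range is tied to a power of 2 and cannot be confined to an arbitrary subset of buffer units.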

[0007] For this reason, in a multiprocessor system using a data transfer network of fixed size, the network size cannot be optimized to match the processing content, so parallelization efficiency drops. In addition, when the processor elements are to be grouped and each group is to perform data transfers for a different purpose, burst transfer cannot be used, and another data transfer method with a lower transfer rate has to be adopted.

[0008] Furthermore, a multiprocessor system using a data transfer network that can generate only sequential addresses, as described above, cannot efficiently process, for example, three-dimensional array data. This is because non-contiguous address values frequently appear when data is distributed and collected in a multiprocessor system.

[0009] An object of the present invention is to raise the processing efficiency of a multiprocessor system by adopting a programmable address generation mechanism.

[0010]

[Means for Solving the Problems] To achieve the above object, the present invention makes it possible to restrict the upper and lower limits of the buffer unit address and to change its increment.

[0011]

[Operation] According to the present invention, restricting the upper and lower limits of the buffer unit address allows the network to be partitioned, which raises the processing efficiency of the multiprocessor system. Limiting the range of address change also makes it possible to select freely between data transfer among processor elements and data transfer between processor elements and data input/output devices.

[0012] In addition, changing the increment allows non-contiguous address values to be generated simply and at high speed, so that network transfer destination addresses or transfer source addresses suited to data structures of three or more dimensions can be generated efficiently.

[0013]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

[0014] (Embodiment 1) FIG. 1 shows the configuration of a data transfer device according to an embodiment of the present invention. A data transfer control device 10 and K buffer units (BU) 20 are interconnected by a data transfer bus 16. The data transfer control device 10 includes an address register 12, an address counter 13, a comparator 14, a network interface 6, an internal bus 15, and an address control circuit 11 that controls them. Of these, the address register 12, the address counter 13, and the comparator 14 constitute an address generation circuit 7. R/W is a register setting signal that controls address setting from a processor element (not shown) to the address register 12; SET and CNT are, respectively, an address setting signal and an address update request signal supplied to the address counter 13; C is a match signal from the comparator 14; and A is the buffer unit address sent out onto the data transfer bus 16.

[0015] FIG. 2 shows the detailed internal configuration of the address generation circuit 7. The address register 12 holds three addresses: an initial address, a lower limit address, and an upper limit address. The address counter 13 includes a +1 adder 17, a selector 18, and an address latch 19. The selector 18 selects, according to the address setting signal SET, one of three addresses: the initial address and the lower limit address from the address register 12, and the address incremented by the +1 adder 17. The output of the selector 18 is input to the address latch 19, and the value A held there is output as the address to the buffer units 20. The output A of the address latch 19 and the upper limit address from the address register 12 are input to the comparator 14. The comparator 14 compares these two values and returns the result to the address control circuit 11 as the match signal C.

[0016] Next, the operation of the data transfer device configured as above is described. FIG. 3 shows an example of address setting. First, the initial address in the address register 12 (for example, 17) is selected by the selector 18 and stored in the address latch 19, so that the buffer unit address A is set to the initial address. Data is then transferred to the buffer unit 20 designated by this address A. Next, the selector 18 is switched to the output of the +1 adder 17, and the address update request signal CNT is applied to the address latch 19 at every data transfer, so that the address held in the address latch 19 is incremented step by step. In other words, when data is transferred to the buffer units 20 one item after another, the CNT signal changes the address in sequence and a burst transfer is performed. When the address held in the address latch 19 reaches the upper limit address (for example, 25), the comparator 14 outputs the match signal C. At this point, the address setting signal SET switches the selector 18 to the lower limit address (for example, 10), and the lower limit address is stored in the address latch 19. As a result, after the data transfer for the upper limit address, the address held in the address latch 19 returns to the lower limit address. In this way, as shown in FIG. 3, addresses are generated that start from the initial address and then cycle between the lower limit address and the upper limit address.
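
The sequence described in this paragraph can be sketched in Python as follows. This is a behavioral model only, not the circuit itself: the class and function names are hypothetical, and the selector 18, address latch 19, and comparator 14 are reduced to a few lines of control flow.

```python
class AddressGenerator:
    """Behavioral sketch of the address generation circuit 7 (names are illustrative)."""

    def __init__(self, initial, lower, upper):
        self.lower = lower
        self.upper = upper
        self.latch = initial  # SET selects the initial address first

    def match(self):
        # Comparator 14: match signal C when the latch holds the upper limit.
        return self.latch == self.upper

    def advance(self):
        # One CNT pulse: reload the lower limit on a match, otherwise add 1.
        if self.match():
            self.latch = self.lower
        else:
            self.latch += 1


def burst(initial, lower, upper, count):
    """Return the first `count` buffer unit addresses of a burst transfer."""
    gen = AddressGenerator(initial, lower, upper)
    out = []
    for _ in range(count):
        out.append(gen.latch)
        gen.advance()
    return out


print(burst(17, 10, 25, 12))
# [17, 18, 19, 20, 21, 22, 23, 24, 25, 10, 11, 12]
```

With the example values from the text (initial 17, lower limit 10, upper limit 25), the address runs from 17 up to 25 and then wraps to 10, so only buffer units 10 through 25 are ever addressed after the first pass.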

[0017] As described above, because this data transfer device includes the address generation circuit 7, which accesses buffer units in order within designated upper and lower limits of the buffer unit address, burst transfer to a subset (one group) of the K buffer units 20 serving as data transfer destinations becomes possible. For buffer units in a different group, rewriting the addresses in the address register 12 enables data transfer to and from the buffer units belonging to that group, so the transfer destination is easy to switch.

[0018] As shown in FIG. 4, an address switching circuit 8, switched by a selection signal SEL from a control register 9, may be added to the address register 12. The K buffer units 20 are divided into several groups, a set of initial, lower limit, and upper limit addresses is stored in the address register 12 for each group, and one of these sets is selected; the buffer units targeted by data transfer can thus be switched easily.
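
As a rough illustration of the variant with the address switching circuit 8, the following Python sketch stores several (initial, lower limit, upper limit) sets and picks one with a selection value. The table contents and all names are assumptions made for this example; the patent text does not give concrete register values for FIG. 4.

```python
# Hypothetical register contents: one (initial, lower, upper) set per group.
REGISTER_SETS = [
    (0, 0, 1),  # group 0: buffer units 0..1
    (2, 2, 4),  # group 1: buffer units 2..4
]


def addresses(sel, count):
    """Generate `count` addresses using the register set chosen by SEL."""
    initial, lower, upper = REGISTER_SETS[sel]
    addr = initial
    out = []
    for _ in range(count):
        out.append(addr)
        # Wrap to the lower limit after the upper limit, as in Embodiment 1.
        addr = lower if addr == upper else addr + 1
    return out


print(addresses(0, 4))  # [0, 1, 0, 1]
print(addresses(1, 5))  # [2, 3, 4, 2, 3]
```

Switching groups in this model only changes the index `sel`; no register has to be rewritten, which is the convenience the paragraph above describes.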

[0019] The address update request signal CNT may instead be generated each time a plurality of data items has been sent, which makes it possible to support packet transfer. Although the +1 adder 17 is used in the address counter 13, replacing it with a −1 adder also enables burst transfer in which the address decreases step by step from the upper limit toward the lower limit.

[0020] Next, the configuration of a multiprocessor system using the above data transfer device is described with reference to FIG. 5. In FIG. 5, a crossbar data transfer network is formed by arranging 5 × 8 (= 40) buffer units (BU) 20 and 5 + 8 (= 13) data transfer control devices 10, each configured as shown in FIG. 1. The five row buses 16a are each connected to a first data transfer port 31 of a processor element (PE) 30. The eight column buses 16b are connected, one each, to the second data transfer ports 32 of the processor elements 30 and to the I/O devices 40.

[0021] FIG. 6 shows the configuration of one buffer unit (BU) 20 located at the intersection of a row bus 16a and a column bus 16b. The connections between the FIFO memory 21 and the two ports 22 and 23, which are connected to the row bus 16a and the column bus 16b respectively, provide paths that transfer data in both directions.

[0022] Next, the data transfer operation in the multiprocessor system of this configuration is described. Each processor element 30 transfers data to the buffer units 20 in the range designated by the address register 12 (FIGS. 1 and 2). When every processor element 30 generates addresses from 0 to 4, the five processor elements 30 transfer data among one another. When the address settings are varied per group of processor elements 30, data is transferred mutually within each group defined by the configured address range. For example, two processor elements (PE0 and PE1) set the lower limit address to 0 and the upper limit address to 1, and the remaining three processor elements (PE2, PE3, and PE4) set the lower limit address to 2 and the upper limit address to 4. The processor elements 30 are then divided into two groups, and mutual data transfer is carried out within each group. If each processor element (PE0 to PE4) sets the lower limit address to 5 and the upper limit address to 7, and the three I/O devices 40 set the lower limit address to 0 and the upper limit address to 4, data transfer between the processor elements 30 and the I/O devices 40 becomes possible.
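
The grouping example can be checked with a small Python sketch. It assumes, consistent with the address ranges quoted above but not stated explicitly in this section, that column addresses 0 to 4 lead to the five processor elements and column addresses 5 to 7 lead to the three I/O devices; the function name and the labels are hypothetical.

```python
NUM_PE = 5  # assumption: column addresses 0..4 reach PE0..PE4, 5..7 reach I/O devices


def targets(lower, upper):
    """List the endpoints reachable with the given lower and upper limit addresses."""
    result = []
    for column in range(lower, upper + 1):
        if column < NUM_PE:
            result.append("PE%d" % column)
        else:
            result.append("IO%d" % (column - NUM_PE))
    return result


# Per-PE (lower, upper) settings from the two-group example in the text.
settings = {0: (0, 1), 1: (0, 1), 2: (2, 4), 3: (2, 4), 4: (2, 4)}
for pe, (lower, upper) in settings.items():
    print("PE%d ->" % pe, targets(lower, upper))

print(targets(5, 7))  # ['IO0', 'IO1', 'IO2']
```

Under these assumptions PE0 and PE1 reach only each other, PE2 to PE4 reach only each other, and a range of 5 to 7 directs all transfers to the I/O side.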

[0023] This kind of address generation allows the number of processor elements and the number of I/O devices to be expanded freely, and changing the data transfer destination only requires changing the address range in software. In addition, the buffer unit 20 shown in FIG. 6 allows bidirectional data transfer.

[0024] As described above, because this multiprocessor system includes the address generation circuit 7 (FIG. 1), which accesses addresses in order within a designated range, the processor elements 30 can be grouped and mutual data transfer can be performed within each group. The data transfer network can therefore be partitioned into sizes best suited to the processing content, and parallel processing can be performed efficiently. Data transfer between the processor elements 30 and the I/O devices 40 is also easy to perform. Moreover, because destinations are designated by address range, the network is easy to expand.

[0025] Alternatively, eight row buses 16a and five column buses 16b may be provided, with an I/O device 40 connected to each of three of the eight row buses 16a via a data transfer control device 10. A configuration in which both the number of row buses 16a and the number of column buses 16b exceed the number of processor elements 30 is also possible; in that case, data can be transferred to and from the I/O devices 40 through either of the two ports 31 and 32 of a processor element 30.

[0026] The data transfer control device 10 shown in FIG. 7 adds to the configuration of FIG. 1 a common bus interface 42 for connecting the internal bus 15 to a common bus 41. As shown in FIG. 8, for example, three data transfer control devices 10 can be connected to one common bus 41, and five I/O devices 40 can be attached to that common bus 41. The configuration of FIG. 8 makes it easy to add I/O devices without adding buffer units to the network.

[0027] (Embodiment 2) FIG. 9 shows another example of the address generation circuit in a data transfer control device used in a multiprocessor system. The address counter 13 of the address generation circuit of FIG. 9 includes two +1 adders 17a and 17b, an upper selector 18a, a lower selector 18b, a 2-bit upper address latch 19a, and a 2-bit lower address latch 19b. In response to the SET1 signal, the upper selector 18a selects one of the upper 2 bits of the initial address in the address register 12, the upper 2 bits of the lower limit address in the address register 12, and the output of the +1 adder 17a, and supplies it to the upper address latch 19a. Likewise, in response to the SET2 signal, the lower selector 18b selects one of the lower 2 bits of the initial address, the lower 2 bits of the lower limit address, and the output of the +1 adder 17b, and supplies it to the lower address latch 19b. The upper address latch 19a and the lower address latch 19b are updated independently by the CNT1 and CNT2 signals. The output A1 of the upper address latch 19a and the output A2 of the lower address latch 19b are combined into a 4-bit buffer unit address A.
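
A minimal Python sketch of the split counter may help to see why independent updates of the two halves matter. The class and method names are hypothetical. The sketch models only the two +1 paths and the concatenation of A1 and A2; the reload of the lower limit address through the selectors is omitted, and each 2-bit latch is assumed to wrap naturally.

```python
class SplitAddressCounter:
    """Sketch of the address counter 13 of FIG. 9 (illustrative names only)."""

    def __init__(self, initial):
        self.a1 = (initial >> 2) & 0b11  # upper address latch 19a (2 bits)
        self.a2 = initial & 0b11         # lower address latch 19b (2 bits)

    def cnt1(self):
        # CNT1 pulse: +1 adder 17a updates only the upper latch.
        self.a1 = (self.a1 + 1) & 0b11

    def cnt2(self):
        # CNT2 pulse: +1 adder 17b updates only the lower latch.
        self.a2 = (self.a2 + 1) & 0b11

    @property
    def address(self):
        # A1 and A2 are combined into the 4-bit buffer unit address A.
        return (self.a1 << 2) | self.a2


def burst(initial, pulse, count):
    """Return `count` addresses, applying the named pulse after each transfer."""
    counter = SplitAddressCounter(initial)
    out = []
    for _ in range(count):
        out.append(counter.address)
        getattr(counter, pulse)()
    return out


print(burst(4, "cnt2", 4))  # [4, 5, 6, 7]
print(burst(1, "cnt1", 4))  # [1, 5, 9, 13]
```

Pulsing only CNT2 yields consecutive addresses, whereas pulsing only CNT1 yields addresses with a stride of 4, which a single +1 counter cannot produce.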

[0028] Next, the inter-processor data transfer operation for a three-dimensional array using the address generation circuit of FIG. 9 is described with reference to FIGS. 10 to 16. The total number of processor elements 30 is n × n, where n is a constant that is a power of 2. In the following description, n = 4.

[0029] FIG. 10 shows an example of a multiprocessor system with 16 processor elements 30. The 16 processor elements 30 are connected to 16 × 16 buffer units 20 through row buses 16a and column buses 16b. In parallel processing for numerical computation such as fluid analysis, three-dimensional array data representing the state of the fluid is distributed to the processor elements 30, and the parallel computation proceeds while the processor elements 30 exchange data with one another.

[0030] The method of dividing the data when three-dimensional array data is divided among the 16 processor elements 30 for parallel processing is described next.

[0031] As shown in FIG. 11(a), a 4 × 4 × 4 three-dimensional array is written A(1:4, 1:4, 1:4). There are three ways to divide this three-dimensional array A into 16 one-dimensional arrays. FIG. 11(b) shows the division in the X direction, A(1:4, /1:4, 1:4/); FIG. 11(c) shows the division in the Y direction, A(1:4/, 1:4, /1:4); and FIG. 11(d) shows the division in the Z direction, A(/1:4, 1:4/, 1:4). In every case, the 16 one-dimensional arrays are assigned one each to the 16 processor elements 30 in FIG. 10.

[0032] For the division in the X direction, a two-dimensional virtual processor element array 30a as shown in FIG. 12 is assumed. For example, the one-dimensional array A(1:4, /4, 1/) is assigned to virtual processor element /4, 1/ of the virtual processor element array 30a. Because the actual processor elements 30 in FIG. 10 (that is, the physical processor elements) are arranged in one dimension, the virtual processor elements must in turn be assigned to physical processor elements 30b according to FIG. 13, which shows the correspondence between virtual and physical processor element numbers. For the X-direction division A(1:4, /J, K/) (J = 1:4, K = 1:4), virtual processor element /J, K/ is assigned to physical processor element number L, where L = (J − 1) × 4 + K − 1 (1 ≤ J, K ≤ 4).
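
The mapping L = (J − 1) × 4 + K − 1 can be tabulated with a short Python sketch. The function name is hypothetical; J and K are 1-based as in the text, and the physical numbers run from 0 to 15.

```python
def physical_pe_x(j, k, n=4):
    """Physical PE number L for virtual PE /J, K/ in the X-direction division."""
    return (j - 1) * n + (k - 1)


# Print the virtual-to-physical table for n = 4, one row per value of J.
for j in range(1, 5):
    print([physical_pe_x(j, k) for k in range(1, 5)])
# [0, 1, 2, 3]
# [4, 5, 6, 7]
# [8, 9, 10, 11]
# [12, 13, 14, 15]
```

For instance, virtual processor element /4, 1/ maps to physical number 12, and /1, 2/ maps to physical number 1.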

[0033] When the three-dimensional array assigned as above is processed in parallel, a single computation on the one-dimensional array within each processor element 30 (30b) is not enough to obtain the change in state of the entire spatial region. To compute over the entire three-dimensional spatial region, the computation must be carried out while changing the division direction in turn. The ADI (Alternating Direction Implicit) method, used when solving partial differential equations by the finite difference method, is one such computation method.

[0034] FIG. 14 shows the data transfer among the processor elements 30 (30b) when switching from the X-direction division to the Y-direction division. For the Y-direction division A(I/, 1:4, /K) (I = 1:4, K = 1:4), virtual processor element /K, I/ is assigned to physical processor element number M, where M = (K − 1) × 4 + I − 1 (1 ≤ I, K ≤ 4).

[0035] First, the operation of transferring data from the 16 processor elements 30 (30b), each holding a one-dimensional array divided in the X direction, to the buffer units 20 via the data transfer control devices 10 and the row buses 16a is described. For example, the one-dimensional array A(1:4, /1, 2/) held by physical processor element 30b number 1 consists of four elements: A(1, /1, 2/), A(2, /1, 2/), A(3, /1, 2/), and A(4, /1, 2/). The address generation circuit (FIG. 9) of the data transfer control device 10 connected to physical processor element 30b number 1 holds the content of the upper address latch 19a fixed at 1 and updates only the lower address latch 19b in sequence, thereby generating 4, 5, 6, and 7 in turn as the buffer unit address A. The four elements of the one-dimensional array A(1:4, /1, 2/) are thus distributed over four buffer units 20. Data transfers from the other physical processor elements 30b to the buffer units 20 are carried out in the same manner, with the result that all element data of the three-dimensional array A(1:4, 1:4, 1:4) is distributed over 64 buffer units 20.
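
The send-side addresses can be derived from the two mapping formulas: element A(I, /J, K/) held by X-division PE number L = (J − 1) × 4 + K − 1 has to reach Y-division PE number M = (K − 1) × 4 + I − 1, so the upper part of the address is K − 1 and the lower part is I − 1. The Python sketch below encodes that derivation; the function name is hypothetical, and the derivation itself is an inference from the formulas and the worked example, not a statement quoted from the patent.

```python
def scatter_addresses(pe, n=4):
    """Buffer unit addresses generated on the row bus of X-division PE number `pe`."""
    k = pe % n + 1   # recover K from L = (J - 1) * n + K - 1
    upper = k - 1    # held fixed in the upper address latch 19a
    # The lower address latch 19b steps through I - 1 = 0 .. n - 1.
    return [upper * n + lower for lower in range(n)]


print(scatter_addresses(1))   # [4, 5, 6, 7]
print(scatter_addresses(12))  # [0, 1, 2, 3]
```

For physical processor element number 1 this reproduces the addresses 4, 5, 6, and 7 given in the text.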

[0036] Next, the operation of transferring data from the 64 buffer units 20, each holding one element, to the 16 processor elements 30 (30b) via the data transfer control devices 10 and the column buses 16b is described. FIG. 15 shows the data transfer to physical processor element 30b number 4, to which the one-dimensional array A(1/, 1:4, /2) divided in the Y direction is assigned. The one-dimensional array to be transferred to physical processor element 30b number 4 consists of element A(1, /1, 2/) sent from physical processor element 30b number 1, element A(1, /2, 2/) sent from number 5, element A(1, /3, 2/) sent from number 9, and element A(1, /4, 2/) sent from number 13. The address generation circuit (FIG. 9) of the data transfer control device 10 connected to physical processor element 30b number 4 therefore holds the content of the lower address latch 19b fixed at 1 and updates only the upper address latch 19a in sequence, generating 1, 5, 9, and 13 in turn as the buffer unit address A, as shown in FIG. 16. The element data that had been distributed over four buffer units 20 is thereby collected into physical processor element 30b number 4. Data transfers from the buffer units 20 to the other physical processor elements 30b are carried out in the same manner, with the result that the three-dimensional array A(1:4, 1:4, 1:4) is distributed over the 16 physical processor elements 30b in its Y-direction division.
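
The whole change from the X-direction division to the Y-direction division can be simulated in a few lines of Python. This is a sketch under stated assumptions: the 16 × 16 crossbar is modeled as a dictionary keyed by (sending PE, receiving PE), each FIFO holds a single element, and all function names are hypothetical.

```python
from itertools import product

N = 4


def pe_x(j, k):
    """Physical PE holding A(1:4, /J, K/) in the X-direction division."""
    return (j - 1) * N + (k - 1)


def pe_y(i, k):
    """Physical PE holding A(I/, 1:4, /K) in the Y-direction division."""
    return (k - 1) * N + (i - 1)


def redistribute():
    # Scatter: each sending PE writes element (I, J, K) into the buffer unit
    # on its row bus whose column address is the receiving PE number.
    buffers = {}
    for j, k in product(range(1, N + 1), repeat=2):
        for i in range(1, N + 1):
            buffers[(pe_x(j, k), pe_y(i, k))] = (i, j, k)

    # Gather: each receiving PE fixes the lower part of the address at K - 1
    # and steps the upper part, reading one buffer unit per step.
    received = {}
    for m in range(N * N):
        lower = m // N  # equals K - 1 for receiving PE number m
        sources = [upper * N + lower for upper in range(N)]
        received[m] = [buffers[(src, m)] for src in sources]
    return buffers, received


buffers, received = redistribute()
print(len(buffers))  # 64
print(received[4])   # [(1, 1, 2), (1, 2, 2), (1, 3, 2), (1, 4, 2)]
```

In this model 64 of the 256 buffer units are used, and physical processor element number 4 reads from rows 1, 5, 9, and 13 to collect A(1, 1:4, 2), matching the example in the text.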

[0037] As described above, with the address generation circuit of FIG. 9, splitting the address into upper and lower parts allows a change in the division direction of a three-dimensional array in a multiprocessor system to be carried out by burst transfer, enabling high-speed data transfer.

[0038] When the array to be handled is larger than the number of processor elements 30 (30b), one physical processor element 30b takes charge of two or more virtual processor elements. In that case, data transfer among the processor elements 30 (30b) can be carried out by using the address generation circuit of FIG. 9 repeatedly.

[0039] (Embodiment 3) In the multiprocessor system of FIG. 10, the square root n of the total number of processor elements 30 is a power of 2, so the buffer unit address A could be formed by assigning the output of the upper address latch 19a in the address generation circuit of FIG. 9 to the upper bits and the output of the lower address latch 19b to the lower bits.

[0040] Next, an example of an address generation circuit that is also applicable when the square root n of the total number of processor elements is not a power of 2 will be described with reference to FIG. 17. The address counter 13 of the address generation circuit of FIG. 17 includes a first adder 17c, a +1 adder 17b, an upper selector 18a, a lower selector 18b, a 4-bit upper address latch 19a, a 2-bit lower address latch 19b, and a second adder 43. The address register 12 holds an increment address in addition to the initial, lower limit, and upper limit addresses. The first adder 17c outputs the sum of the output A1 of the upper address latch 19a and the increment address from the address register 12. The second adder 43 outputs the sum of the output A1 of the upper address latch 19a and the output A2 of the lower address latch 19b as the buffer unit address A. The output A of the second adder 43 and the upper limit address from the address register 12 are input to the first comparator 14, and the output A2 of the lower address latch 19b and the increment address from the address register 12 are input to the second comparator 14b. The first and second comparators 14 and 14b output coincidence signals C and Cb, respectively.

[0041] For example, for data transfer to the three buffer units 20 at addresses 6, 7, and 8, the increment address in the address register 12 is set to 3, and the contents of the lower address latch 19b are changed by 1 at a time through 0, 1, and 2 while the contents of the upper address latch 19a are fixed at 6. Also, as shown in FIG. 18, when generating addresses for data transfer from the five buffer units 20 at addresses 1, 4, 7, 10, and 13, the increment address in the address register 12 is set to 3, and the contents of the upper address latch 19a are changed by 3 at a time through 0, 3, 6, 9, and 12 while the contents of the lower address latch 19b are fixed at 1.
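The two address patterns in this example can be modeled with a short Python sketch of the FIG. 17 arrangement, in which the buffer unit address is the sum A = A1 + A2 rather than a bit concatenation. This is an illustrative model only: the function names, the starting value 0 of the upper latch, and the assumption that the lower latch counts from 0 to one less than the increment address are not taken from the specification.

```python
def consecutive_addresses(upper, increment):
    """Fix the upper latch (19a); step the lower latch (19b) by 1.

    Assumption: the lower latch counts 0 .. increment - 1.
    """
    return [upper + lower for lower in range(increment)]  # A = A1 + A2


def strided_addresses(lower, increment, count):
    """Fix the lower latch (19b); step the upper latch (19a) by `increment`."""
    upper = 0  # assumed starting value of the upper latch
    addresses = []
    for _ in range(count):
        addresses.append(upper + lower)  # second adder 43: A = A1 + A2
        upper += increment               # first adder 17c: A1 + increment
    return addresses


print(consecutive_addresses(upper=6, increment=3))       # [6, 7, 8]
print(strided_addresses(lower=1, increment=3, count=5))  # [1, 4, 7, 10, 13]
```

Using addition instead of assigning the latches to fixed bit fields is what lets a stride of 3, which is not a power of 2, be handled in the same way as the stride of 4 in the earlier embodiment.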

[0042] As described above, according to the address generation circuit of FIG. 17, providing the two address adders 17c and 43 makes high-speed burst transfer possible even when the square root of the total number of processor elements is not a power of 2.

[0043] (Embodiment 4) Next, an example of a multiprocessor system having a two-dimensional arrangement of processor elements will be described. FIG. 19 shows only the buffer unit portion of the system, which has a configuration in which five sets of the buffer unit 20 configuration of FIG. 5 are stacked (a 5 × 8 × 5 configuration). 5 × 5 processor elements are connected to the 5 × 5 row buses and to 5 × 5 of the column buses. 5 × 3 I/O devices are connected to the remaining 5 × 3 column buses.

[0044] FIG. 20 shows the connection relationship between the processor elements via the 5 × 8 buffer units 20 of the s-th plane (0 ≦ s ≦ 4), and the connection relationship between the processor elements and the I/O devices. In this system, the first port of each of the five processor elements (0, s) to (4, s) in the s-th plane is connected, via the row bus 16a, the data transfer control device 10, the buffer units 20, and the column bus 16b of the s-th plane, to the second port of each of the processor elements (s, 0) to (s, 4) of the five planes. The three I/O devices (s, 0), (s, 1), and (s, 2) in the s-th plane are connected to the processor elements via the buffer units 20 in the same plane.

[0045] The multiprocessor system configured as above can also perform data transfer between the processor elements and data transfer between the processor elements and the I/O devices. If the total number of processor elements is n × n (5 × 5 = 25 in the above example), the total number of buffer units required for data transfer between the processor elements is n × n × n (5 × 5 × 5 = 125 in the above example), so the system can be realized with considerably fewer buffer units than the total of n × n × n × n (for example, 5 × 5 × 5 × 5 = 625) required by a conventional crossbar connection. I/O devices can also be connected easily.
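The buffer unit counts quoted above follow directly from the two formulas and can be checked with a few lines of Python. The function names are hypothetical; only the formulas n × n × n and n × n × n × n come from the text.

```python
def buffer_units_stacked(n):
    """Buffer units for n x n processor elements in the stacked arrangement."""
    return n ** 3


def buffer_units_crossbar(n):
    """Buffer units for n x n processor elements with a full crossbar."""
    return n ** 4


n = 5
print(buffer_units_stacked(n))   # 125
print(buffer_units_crossbar(n))  # 625
```

The ratio between the two counts is n, so the saving grows with the side length of the processor element arrangement.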

[0046] Next, an example in which a common bus for I/O devices as shown in FIG. 8 is introduced into the multiprocessor system shown in FIGS. 19 and 20 will be described. As shown in FIG. 21, the 5 × 3 column buses used for connecting I/O devices are numbered (s, t) (0 ≦ s ≦ 4 and 0 ≦ t ≦ 2). The first common bus 41a is connected to the first group of column buses (0, 0) to (4, 0), the second common bus 41b to the second group of column buses (0, 1) to (4, 1), and the third common bus 41c to the third group of column buses (0, 2) to (4, 2). One I/O device 40 (I/O₀, I/O₁, I/O₂) is connected to each of the three common buses 41a, 41b, and 41c. For example, I/O₀ is connected to all row buses of all planes via the first common bus 41a, so every processor element can transfer data to and from I/O₀ via a row bus. The same applies to I/O₁ and I/O₂. In addition, I/O devices can be added to each of the common buses 41a, 41b, and 41c independently, simply by connecting further devices to the bus as needed.
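The grouping of the I/O column buses onto common buses can be sketched as a simple mapping in Python. This is only an illustration of the numbering scheme: the function name and the use of the index t as the common bus identifier (0 for 41a, 1 for 41b, 2 for 41c) are assumptions made here, not part of the specification.

```python
from collections import defaultdict


def group_io_column_buses(planes=5, io_columns=3):
    """Map each I/O column bus (s, t) to the common bus with index t."""
    groups = defaultdict(list)
    for s in range(planes):
        for t in range(io_columns):
            groups[t].append((s, t))  # column bus (s, t) joins common bus t
    return dict(groups)


groups = group_io_column_buses()
print(groups[0])  # [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
```

Each common bus collects one column bus from every plane, which is why a single I/O device on that bus is reachable from all planes.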

[0047] As described above, introducing common buses for I/O devices as shown in FIG. 21 allows all processor elements to share one I/O device. In addition, because the buses are divided into groups, if a failure occurs in an I/O device on one common bus, data input/output can still be performed by the I/O devices on the other common buses, which reduces the influence of the failure.

[0048]

[Effects of the Invention] As described above, according to the present invention, the range over which the buffer unit address changes can be limited arbitrarily, so data transfer destinations can be grouped, which in turn improves the efficiency of parallelization. Furthermore, by limiting the buffer unit address, it is possible to select between arbitrary data transfer between processor elements and arbitrary data transfer between a processor element and a data input/output device.
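The range limitation described here can be illustrated with a minimal Python model of the address counter. It assumes that the counter is loaded with an initial address, is incremented by 1 on each address update request (CNT), and is reloaded with the lower limit address (SET) once the comparator reports a match with the upper limit address. The function name and parameters are hypothetical.

```python
def generate_addresses(initial, lower, upper, count):
    """Return `count` buffer unit addresses confined to lower..upper."""
    address = initial  # initial address set before the first update
    addresses = []
    for _ in range(count):
        addresses.append(address)
        if address == upper:
            address = lower   # coincidence signal: reload the lower limit (SET)
        else:
            address += 1      # address update request (CNT)
    return addresses


print(generate_addresses(initial=5, lower=4, upper=7, count=6))
# [5, 6, 7, 4, 5, 6]
```

Choosing different lower and upper limits confines the transfers to a different group of buffer units, which corresponds to the grouping of transfer destinations mentioned above.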

[0049] In addition, because discrete (non-consecutive) address values can be generated simply and at high speed, a network suited to data structures of three or more dimensions can be realized.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of a data transfer device according to an embodiment of the present invention.

FIG. 2 is a configuration diagram of the address generation circuit in FIG. 1.

FIG. 3 is a diagram showing an example of buffer unit address generation by the address generation circuit of FIG. 2.

FIG. 4 is a configuration diagram showing a modification of the address generation circuit of FIG. 2.

FIG. 5 is a configuration diagram of a multiprocessor system using the configuration of FIG. 1.

FIG. 6 is a configuration diagram of one buffer unit in FIG. 5.

FIG. 7 is a configuration diagram showing a modification of the data transfer control device in FIG. 1.

FIG. 8 is a configuration diagram of a multiprocessor system using the data transfer control device of FIG. 7.

FIG. 9 is a configuration diagram showing another modification of the address generation circuit of FIG. 2.

FIG. 10 is a configuration diagram of a multiprocessor system in which the address generation circuit configuration of FIG. 9 is used in the data transfer control devices.

FIG. 11(a) shows a three-dimensional array to be processed by the multiprocessor system of FIG. 10, FIG. 11(b) shows division of the three-dimensional array in the X direction, FIG. 11(c) shows its division in the Y direction, and FIG. 11(d) shows its division in the Z direction.

FIG. 12 is a diagram showing the two-dimensional arrangement of virtual processor elements when the three-dimensional array of FIG. 11(a) is processed by the multiprocessor system of FIG. 10.

FIG. 13 is a diagram showing the correspondence between the physical processor element numbers and the virtual processor element numbers in FIG. 12.

FIG. 14 is a conceptual diagram showing how data is distributed and collected in the multiprocessor system of FIG. 10.

FIG. 15 is a conceptual diagram showing details of data collection by one processor element in FIG. 14.

FIG. 16 is a diagram showing an example of buffer unit address generation by the address generation circuit of FIG. 9.

FIG. 17 is a configuration diagram showing still another modification of the address generation circuit of FIG. 2.

FIG. 18 is a diagram showing an example of buffer unit address generation by the address generation circuit of FIG. 17.

FIG. 19 is a configuration diagram showing an example in which the two-dimensional arrangement of buffer units in FIG. 5 is extended to a three-dimensional arrangement.

FIG. 20 is a diagram showing the connection of the buffer units of the s-th plane in a multiprocessor system using the three-dimensionally arranged buffer units of FIG. 19.

FIG. 21 is a diagram showing an example in which a common bus similar to that of FIG. 8 is introduced into the multiprocessor system shown in FIGS. 19 and 20.

FIG. 22 is a configuration diagram of a conventional multiprocessor system.

FIG. 23 is a configuration diagram of one buffer unit in FIG. 22.

FIG. 24 is a configuration diagram of the address generation circuit built into the transmitting-side data transfer control device in FIG. 22.

FIG. 25 is a diagram showing an example of buffer unit address (channel number) generation by the address generation circuit of FIG. 24.

[Explanation of symbols]

6: Network interface
7: Address generation circuit
8: Address switching circuit
9: Control register
10: Data transfer control device
11: Address control circuit (control means)
12: Address register (storage means)
13: Address counter (updating means)
14: Comparator (comparing means)
14b: Comparator
15: Internal bus
16: Data transfer bus
17, 17a, 17b: +1 adder
17c: Adder
18: Selector
18a: Upper selector
18b: Lower selector
19: Address latch
19a: Upper address latch
19b: Lower address latch
20: Buffer unit
21: FIFO memory
30: Processor element
16a: Row bus (first data transfer bus)
16b: Column bus (second data transfer bus)
40: I/O device (data input/output device)
41, 41a to 41c: Common bus
42: Common bus interface
43: Adder
A: Buffer unit address (channel number)
C: Coincidence signal
CNT: Address update request signal
SET: Address setting signal

Continued from the front page: (58) Fields searched (Int. Cl.⁷, DB name): G06F 13/14, 13/28, 13/36; G06F 15/16, 15/173; H04L 11/20

Claims

(57) [Claims]

1. A data transfer device comprising: a plurality of buffer units commonly connected to a data transfer bus; and a data transfer control device that outputs a buffer unit address for sequentially selecting an arbitrary number of buffer units from among the plurality of buffer units and controls data transfer between the buffer unit selected by the buffer unit address and the data transfer bus, wherein the data transfer control device comprises: storage means for holding first and second addresses that specify a range of the buffer unit address; updating means for holding the first address given from the storage means and outputting it as the buffer unit address while sequentially updating the held first address; comparing means for outputting a coincidence signal when the buffer unit address output from the updating means coincides with the second address given from the storage means; and control means for sequentially giving an address update request signal to the updating means so as to cause the updating means to execute an address update and, upon receiving the coincidence signal from the comparing means, giving an address setting signal to the updating means so as to set the first address held by the storage means in the updating means.

2. The data transfer device according to claim 1, wherein the storage means further has a function of holding a third address, different from the first and second addresses, as an initial address, and the control means further has a function of setting the initial address held by the storage means in the updating means before outputting an address update request signal to the updating means.

3. The data transfer device according to claim 1, wherein the updating means has a function of independently updating an upper part and a lower part of the held first address.

4. The data transfer device according to claim 1, wherein the updating means includes means for adding an increment address to the held first address.

5. The data transfer device according to claim 1, wherein the data transfer control device further comprises common bus interface means interposed between the data transfer bus and a common bus to which a data transfer control device of at least one other data transfer device and at least one data input/output device are commonly connected.

6. A multiprocessor system comprising: N² buffer units arranged at two-dimensional lattice points represented by coordinates (x, y), x, y = 0 to N-1 (N ≧ 2); N first data transfer buses each commonly connected to N buffer units arranged in the x direction among the N² buffer units; N second data transfer buses each commonly connected to N buffer units arranged in the y direction among the N² buffer units; 2N data transfer control devices connected to respective ends of the first and second data transfer buses; and N processor elements each having a first data transfer port connected to one of the N data transfer control devices, among the 2N data transfer control devices, that are connected to the ends of the first data transfer buses, and a second data transfer port connected to one of the N data transfer control devices, among the 2N data transfer control devices, that are connected to the ends of the second data transfer buses, wherein each of the 2N data transfer control devices comprises: storage means for holding first and second addresses that specify a range of buffer unit addresses for sequentially selecting an arbitrary number of buffer units from among the N buffer units, of the N² buffer units, that are commonly connected to that data transfer control device; updating means for holding the first address given from the storage means and outputting it as the buffer unit address while sequentially updating the held first address; comparing means for outputting a coincidence signal when the buffer unit address output from the updating means coincides with the second address given from the storage means; and control means for sequentially giving an address update request signal to the updating means so as to cause the updating means to execute an address update and, upon receiving the coincidence signal from the comparing means, giving an address setting signal to the updating means so as to set the first address held by the storage means in the updating means.

7. The multiprocessor system according to claim 6, wherein the storage means further has a function of holding a third address, different from the first and second addresses, as an initial address, and the control means further has a function of setting the initial address held by the storage means in the updating means prior to outputting an address update request signal to the updating means.

8. The multiprocessor system according to claim 6, wherein the updating means has a function of independently updating an upper part and a lower part of the held first address.

9. The multiprocessor system according to claim 6, wherein said updating means includes means for adding an increment address to said held first address.

10. The multiprocessor system according to claim 6, further comprising: N × K additional buffer units, K (K ≧ 1) of which are connected to each of the N first or second data transfer buses; K third data transfer buses each commonly connected to N additional buffer units arranged in the y direction or the x direction among the N × K additional buffer units; and K additional data transfer control devices connected to respective ends of the third data transfer buses.

11. The multiprocessor system according to claim 10, further comprising at least one data input / output device connected to said K additional data transfer control devices.

12. The multiprocessor system according to claim 10, further comprising: L common buses respectively connected to L (L ≧ 2) additional data transfer control device groups obtained by dividing the K additional data transfer control devices; and M (M ≧ L) data input/output devices, at least one of which is connected to each of the L common buses.

13. The multiprocessor system according to claim 10, wherein each of the K additional data transfer control devices comprises: additional storage means for holding first and second addresses that specify a range of buffer unit addresses for sequentially selecting an arbitrary number of additional buffer units from among the N additional buffer units, of the N × K additional buffer units, that are commonly connected to that additional data transfer control device; additional updating means for holding the first address given from the additional storage means and outputting it as the buffer unit address while sequentially updating the held first address; additional comparing means for outputting a coincidence signal when the buffer unit address output from the additional updating means coincides with the second address given from the additional storage means; and additional control means for sequentially giving an address update request signal to the additional updating means so as to cause the additional updating means to execute an address update and, when the coincidence signal is output from the additional comparing means, giving an address setting signal to the additional updating means so as to set the first address held by the additional storage means in the additional updating means.

14. The multiprocessor system according to claim 13, wherein the additional storage means further has a function of holding a third address, different from the first and second addresses, as an initial address, and the additional control means further has a function of setting the initial address held by the additional storage means in the additional updating means prior to outputting an address update request signal to the additional updating means.

15. The multiprocessor system according to claim 13, wherein the additional updating means has a function of independently updating an upper part and a lower part of the held first address.

16. The multiprocessor system according to claim 13, wherein the additional updating means includes means for adding an increment address to the held first address.

17. The multiprocessor system according to claim 13, wherein each of the K additional data transfer control devices further comprises common bus interface means interposed between one of the third data transfer buses and a common bus to which at least one other additional data transfer control device and at least one data input/output device are commonly connected.

18. A multiprocessor system comprising: N³ buffer units arranged at three-dimensional lattice points represented by coordinates (x, y, z), x, y, z = 0 to N-1 (N ≧ 2); N² first data transfer buses each commonly connected to N buffer units arranged in the x direction among the N³ buffer units; N² second data transfer buses each commonly connected to N buffer units arranged in the y direction among the N³ buffer units; 2N² data transfer control devices connected to respective ends of the first and second data transfer buses; and N² processor elements each having a first data transfer port connected to one of the N² data transfer control devices, among the 2N² data transfer control devices, that are connected to the ends of the first data transfer buses, and a second data transfer port connected to one of the N² data transfer control devices, among the 2N² data transfer control devices, that are connected to the ends of the second data transfer buses, wherein each of the 2N² data transfer control devices comprises: storage means for holding first and second addresses that specify a range of buffer unit addresses for sequentially selecting an arbitrary number of buffer units from among the N buffer units, of the N³ buffer units, that are commonly connected to that data transfer control device; updating means for holding the first address given from the storage means and outputting it as the buffer unit address while sequentially updating the held first address; comparing means for outputting a coincidence signal when the buffer unit address output from the updating means coincides with the second address given from the storage means; and control means for sequentially giving an address update request signal to the updating means so as to cause the updating means to execute an address update and, upon receiving the coincidence signal from the comparing means, giving an address setting signal to the updating means so as to set the first address held by the storage means in the updating means.

19. The multiprocessor system according to claim 18, wherein the storage means further has a function of holding a third address, different from the first and second addresses, as an initial address, and the control means further has a function of setting the initial address held by the storage means in the updating means prior to outputting an address update request signal to the updating means.

20. The multiprocessor system according to claim 18, wherein the updating means has a function of independently updating an upper part and a lower part of the held first address.

21. The multiprocessor system according to claim 18, wherein the updating means includes means for adding an increment address to the held first address.

22. The multiprocessor system according to claim 18, further comprising: N² × K additional buffer units, K (K ≧ 1) of which are connected to each of the N² first or second data transfer buses; N × K third data transfer buses each commonly connected to N additional buffer units arranged in the y direction or the x direction among the N² × K additional buffer units; and N × K additional data transfer control devices connected to respective ends of the third data transfer buses.

23. The multiprocessor system according to claim 22, further comprising at least one data input/output device connected to the N × K additional data transfer control devices.

24. The multiprocessor system according to claim 22, further comprising: L common buses respectively connected to L (L ≧ 2) additional data transfer control device groups obtained by dividing the N × K additional data transfer control devices; and M (M ≧ L) data input/output devices, at least one of which is connected to each of the L common buses.

25. The multiprocessor system according to claim 22, wherein each of the N × K additional data transfer control devices comprises: additional storage means for holding first and second addresses that specify a range of buffer unit addresses for sequentially selecting an arbitrary number of additional buffer units from among the N additional buffer units, of the N² × K additional buffer units, that are commonly connected to that additional data transfer control device; additional updating means for holding the first address given from the additional storage means and outputting it as the buffer unit address while sequentially updating the held first address; additional comparing means for outputting a coincidence signal when the buffer unit address output from the additional updating means coincides with the second address given from the additional storage means; and additional control means for sequentially giving an address update request signal to the additional updating means so as to cause the additional updating means to execute an address update and, when the coincidence signal is output from the additional comparing means, giving an address setting signal to the additional updating means so as to set the first address held by the additional storage means in the additional updating means.

26. The multiprocessor system according to claim 25, wherein the additional storage means further has a function of holding a third address, different from the first and second addresses, as an initial address, and the additional control means further has a function of setting the initial address held by the additional storage means in the additional updating means prior to outputting an address update request signal to the additional updating means.

27. The multiprocessor system according to claim 25, wherein the additional updating means has a function of independently updating an upper part and a lower part of the held first address.

28. The multiprocessor system according to claim 25, wherein the additional updating means includes means for adding an increment address to the held first address.

29. The multiprocessor system according to claim 25, wherein each of the N × K additional data transfer control devices further comprises common bus interface means interposed between one of the third data transfer buses and a common bus to which at least one other additional data transfer control device and at least one data input/output device are commonly connected.