JPH05274279A

JPH05274279A - Parallel processing apparatus and method

Info

Publication number: JPH05274279A
Application number: JP4073702A
Authority: JP
Inventors: Takatoshi Kodaira; 高敏小平
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1992-03-30
Filing date: 1992-03-30
Publication date: 1993-10-22

Abstract

(57) [Abstract] [Purpose] To improve the data transmission speed and efficiency between the processors of a parallel processing apparatus. [Constitution] A sending processor and a receiving processor are provided with FIFO memories for data transmission and data reception, respectively, and these FIFO memories are linked to each other, which raises the effective data transmission rate between the processors. In addition, the receiving processor is provided with a reception memory organized as alternating buffers, and data received in the reception FIFO memory is transferred by DMA to one of the reception buffers. The destination buffer of the DMA transfer from the FIFO memory is alternated at every processing phase, which eliminates the loss of processor throughput caused by inter-processor data transmission.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a parallel processing apparatus and method, and more particularly to a parallel processing apparatus and method designed with high speed and flexibility in mind.

[0002]

[Prior Art] Various proposals have been made for building a parallel arithmetic unit from a plurality of processors in order to achieve high-speed processing. For example, Japanese Patent Application Laid-Open No. 3-174646 discloses a method of connecting a plurality of processors with dedicated coupling lines.

[0003]

[Problems to Be Solved by the Invention] To achieve high-speed processing, improving the data transmission speed and efficiency between processors is important for performance, and flexibility in the structure of the inter-processor data transmission paths is important for keeping the arithmetic unit general-purpose. In the prior art described above, however, there was a strong demand for both a higher inter-processor data transmission rate and a more flexible connection structure.

[0004] It is an object of the present invention to provide a parallel processing apparatus and method that improve the data transmission rate and efficiency between processors.

[0005] Another object of the present invention is to provide a parallel processing apparatus and method that give flexibility to the structure of the inter-processor data transmission paths in order to keep the arithmetic unit general-purpose.

[0006]

[Means for Solving the Problems] The features of the present invention for achieving the above objects are as follows.

[0007] (1) To improve the effective speed of inter-processor data transmission, the sending processor and the receiving processor are provided with FIFO (first-in, first-out) memories for data transmission and data reception, respectively, and these FIFO memories are linked to each other, which raises the effective data transmission rate between the processors.

[0008] (2) In addition, the receiving processor is provided with a reception memory organized as alternating buffers, and data received in the reception FIFO memory is transferred to one of the reception buffers by DMA (direct memory access). The destination buffer of the DMA transfer from the FIFO memory is alternated at every processing phase, which eliminates the loss of processor throughput caused by inter-processor data transmission.

[0009] (3) A switch network whose connections can be changed arbitrarily is inserted in the connections between the plurality of processors, so that the structure of the parallel processing apparatus can be changed freely.

[0010] (4) The connection structure between the plurality of processors is made hierarchical, with every level having the same structure, so that the number of connected processors can be increased practically without limit by adding levels to the hierarchy.

[0011] These and other features of the present invention will become clearer from the following description.

[0012]

[Operation] According to the present invention, the two processors that exchange data are provided with a transmission FIFO memory and a reception FIFO memory, respectively, and the two FIFO memories are linked. The receiving processor is further provided with a received-data memory area organized as alternating buffers, and data is transferred from the reception FIFO memory to a reception buffer by DMA. In one processing phase, data is transferred from the reception FIFO memory into one reception buffer while the contents of the other reception buffer are read out and used for computation. In the next processing phase the buffers are swapped, so that the data previously written from the reception FIFO memory can be read out for computation.
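
The alternating-buffer scheme can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `ReceiveSide`, its method names, and the use of a Python `deque` and lists in place of the hardware FIFO, DMA channel, and buffer memories are hypothetical and do not come from the patent.

```python
from collections import deque


class ReceiveSide:
    """Toy model of a receiving processor: one reception FIFO, two buffers."""

    def __init__(self):
        self.fifo = deque()          # stands in for the reception FIFO memory
        self.buffers = [[], []]      # the two alternating reception buffers
        self.fill = 0                # index of the buffer the DMA writes into

    def receive(self, word):
        """The sending side pushes one word into the reception FIFO."""
        self.fifo.append(word)

    def dma_drain(self):
        """Model the DMA channel: move everything from the FIFO to the fill buffer."""
        while self.fifo:
            self.buffers[self.fill].append(self.fifo.popleft())

    def compute_buffer(self):
        """Buffer the processor reads during the current phase."""
        return self.buffers[1 - self.fill]

    def next_phase(self):
        """Swap roles at a phase boundary; the new fill buffer starts empty."""
        self.fill = 1 - self.fill
        self.buffers[self.fill] = []


rx = ReceiveSide()
for word in (10, 11, 12):            # phase 1: data arrives and is DMA-transferred
    rx.receive(word)
rx.dma_drain()
phase1_visible = list(rx.compute_buffer())   # nothing to compute on yet
rx.next_phase()                      # phase 2: the filled buffer becomes readable
phase2_visible = list(rx.compute_buffer())
```

In this model the processor never touches the buffer that is being filled, which is the property the text relies on to keep data transfer from stalling computation.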

[0013] Because data is transmitted between the two processors in this way, the processor one step downstream in the data flow can refer, by ordinary memory access, to the result that the upstream processor computed in the preceding processing phase. When pipelined parallel processing is performed along the direction of the data flow, the only time required for inter-processor data transmission is the hardware access time of the high-speed memory.

[0014] As a result, the amount of information that can be transferred between processors is 10 to 100 times greater than with conventional schemes, and the processing capability of the parallel processing apparatus improves.

[0015] Furthermore, a switch network for exchanging connections is inserted between the transmission and reception FIFO memories of the connected processors, and its connection state can be changed arbitrarily. The connections among the processors that make up the parallel processing apparatus can therefore be set to an efficient structure suited to the processing at hand, and general-purpose computation can be carried out.

[0016] In addition, because the connection structure between the processors is hierarchical and every level has the same structure, the number of connected processors can be increased or decreased as desired by increasing or decreasing the number of levels, so that computing capability matched to the purpose can be obtained.

[0017]

[Embodiments] The present invention is suited to pipeline processing in which, as in signal processing of sound, images, video, and the like, a final result is obtained by applying a sequence of processing steps to a coherent set of data. Such processing deals with data that has a two-dimensional or three-dimensional extent in the natural world, and it lends itself to parallel processing as well as pipeline processing. For processing targets that admit both, the present invention provides an apparatus and method that process the target with the parallel structure, pipeline structure, or combination of the two that suits it best (hereinafter "parallel/pipeline processing"), and thereby achieves high-speed computation. The invention is thus characterized not only by high processing capability as a parallel processing apparatus but also by the ability to change the parallel/pipeline processing structure arbitrarily to match the target. Embodiments of the present invention are described below with reference to the drawings. In what follows, a processor element is called a PE, a group of processor elements a PE group, and an aggregate of processor element groups a PE group aggregate.

[0018] FIG. 1 shows a configuration example of a PE group aggregate, a building block of the parallel processing apparatus. The PE groups ((1) to (2^m - 1)) 4 are the sets of processors that carry out the parallel/pipeline processing, and each consists of a plurality of PEs. The CPU 1 supervises the overall operation of this PE group aggregate; an ordinary microprocessor board can be used for it. The memory 2 stores the processing program and data of the CPU 1 and serves as its work area. The external interface unit 3 exchanges data when the PE group aggregate is used in conjunction with an external control computer or the like, and the external interface signal line 8 is an industry-standard link such as Ethernet. The switch network/synchronization control unit 6 controls the processing start timing and the inter-PE connections of the PE groups 4 that perform parallel/pipeline processing, via the PE connection control network control signal line 12. The processing start timing is derived either from the internal processing state of the PE group aggregate or from external synchronization supplied on the external synchronization signal line 15. The PE group connection exchange switch network 7 can flexibly change the connection topology among the PE groups 4 attached to it by the inter-PE-group linkage signal lines 11, realizing a parallel/pipeline processing structure in which the groups are interconnected in the desired parallel arrangement, pipeline arrangement, or combination of the two. The switch network/synchronization control unit 6 controls the internal connection state of the PE group connection exchange switch network 7, via the PE group connection exchange switch network control signal line 13, so as to realize the inter-group topology determined for the algorithm being processed. The CPU bus 9 is the data path by which the CPU 1 controls the devices attached to it, such as the PE group linkage units 5 and the switch network/synchronization control unit 6, according to the progress of processing or commands from outside; an industry-standard bus may be used. Each PE group linkage unit 5 is connected to a PE group 4 via a CPU-PE group linkage signal line 10. The PE group aggregate of FIG. 1 is connected to other PE group aggregates via the inter-aggregate linkage signal line 14.

[0019] FIG. 2 is a block diagram illustrating the structure of a PE group 4 of FIG. 1. A plurality of PEs are connected to the PE connection exchange switch network 18 by the inter-PE linkage signal lines 17.

[0020] FIG. 3 shows an example of the internal configuration of each PE ((1) to (2^n - 1)) 16 in FIG. 2. Memory bus A 28, memory bus B 29, and DMA channels (1 to N) 24 are connected to the arithmetic processing unit 19. An input FIFO memory 25, an output FIFO memory 26, and a communication port control unit 27 are connected to memory bus A 28, and the port control unit 27 links the PE to other PEs through the inter-PE linkage signal line 30. Several communication ports, each with its input and output FIFO memories, can be attached to memory bus A 28 so that the PE can be linked to several other PEs. A program memory 20 and a data/work memory 21 are connected to memory bus B 29, together with one pair of buffers, input buffer A 22 and input buffer B 23, for each communication port. A DMA channel 24 is provided for each communication port. Under control signals sent from the arithmetic processing unit 19 over the DMA channel control signal line 31, it transfers the data received in the input FIFO memory to input buffer A 22 or input buffer B 23 by DMA, without placing any processing load on the arithmetic processing unit 19.

[0021] FIG. 4 shows another example of the internal configuration of a PE, in which memory bus B 29 of FIG. 3 is split into two independent buses, memory bus B 29 and memory bus C 32. In this configuration, input buffer A 22 and input buffer B 23 of each communication port are attached to different memory buses. In a processing phase in which the arithmetic processing unit 19 accesses the contents of input buffer A 22 via memory bus B 29, DMA data transfer from the input FIFO memory 25 to input buffer B 23 proceeds via memory bus A 28 and memory bus C 32; the input buffers are then swapped at each subsequent processing phase. The input buffer accesses by the arithmetic processing unit 19 and the DMA transfers therefore use separate memory buses and can proceed in parallel, which prevents the loss of processing capability that bus contention would cause.
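
The bus separation of FIG. 4 can be checked with a very small model. The sketch below is illustrative only: the dictionary `BUFFER_BUS`, the function `bus_usage`, and the convention that odd phases compute on input buffer A are assumptions, and only input-buffer reads by the processor are modeled, not its other bus traffic.

```python
# Bus attachment taken from the FIG. 4 description; the dictionary layout
# and the function name are illustrative assumptions.
BUFFER_BUS = {"input_buffer_A": "memory_bus_B", "input_buffer_B": "memory_bus_C"}


def bus_usage(phase):
    """Return (buses used by the processor, buses used by the DMA transfer)."""
    # Assumption: odd phases compute on buffer A, even phases on buffer B.
    compute_buf = "input_buffer_A" if phase % 2 == 1 else "input_buffer_B"
    dma_buf = "input_buffer_B" if compute_buf == "input_buffer_A" else "input_buffer_A"
    cpu_buses = {BUFFER_BUS[compute_buf]}
    # The DMA path runs from the input FIFO on memory bus A to the other buffer.
    dma_buses = {"memory_bus_A", BUFFER_BUS[dma_buf]}
    return cpu_buses, dma_buses


# Buses claimed by both the processor and the DMA transfer, per phase.
conflicts = [bus_usage(p)[0] & bus_usage(p)[1] for p in range(1, 5)]
```

Every entry of `conflicts` is an empty set, which corresponds to the claim that input buffer access and DMA transfer never compete for the same bus in this configuration.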

[0022] The PE connection exchange switch network 18 of FIG. 2 connects the inter-PE linkage signal lines 30, which lead to the communication ports of FIG. 3 or FIG. 4, for each communication port and each PE. The PE connection exchange switch network 18 consists of as many identical structures as the PEs have communication ports, one per port. FIG. 5 shows a configuration example of one of these identical structures.

[0023] As shown in FIG. 5, the function of the connection exchange switch network is to let the PEs that make up a PE group be interconnected freely. The switch network unit 33 provides switched connections between the connection exchange switch network input signal lines 37 and the connection exchange switch network output signal lines 38. The input signal lines 37 are connected to the inter-PE linkage signal lines 30 of the PEs, and the output signal lines 38 are connected to other inter-PE linkage signal lines 30 of the PEs. The switch network unit 33 is built by interconnecting switch elements 36. The number of input signal lines 37 provided is the number of PEs plus one channel for external connection, and this number is preferably chosen to be a power of two to make connection switching efficient. The configuration in FIG. 5 is an example in which eight PEs can be connected. The switching section of the switch network unit 33 consists of rows PE0 through PE7, corresponding to the destination PEs, and columns for stage 0 through stage 3. By controlling the state of each switch element, a PE on an input signal line 37 can be connected to any PE on an output signal line 38.
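
The sizing rule can be written out as a short calculation. This is a sketch, not part of the patent: the function name `switch_dimensions` is hypothetical, and rounding up to the next power of two is an assumption based on the stated preference for a power-of-two line count. The column count of n + 1 follows the stage 0 to stage 3 layout described for FIG. 5.

```python
def switch_dimensions(num_pes):
    """Return (ports, n, columns) for a network serving num_pes PEs."""
    needed = num_pes + 1             # one extra channel for external connection
    n = 0
    while (1 << n) < needed:         # round up to the next power of two
        n += 1
    return 1 << n, n, n + 1          # columns are stages 0 through n


dims_for_seven = switch_dimensions(7)
```

For seven PEs this gives eight ports, n = 3, and four columns, which matches the eight-row, four-stage arrangement described for FIG. 5.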

[0024] In general, when there are 2^n PEs, the bus switches of stage 0 through stage n are arranged in a matrix with 2^n rows and n + 1 columns, and the connection destinations of the bus switch elements in the matrix are determined by the following rules (1), (2), and (3).

[0025] (1) Assign each PE an ordinal number i from 0 to 2^n - 1.

[0026] (2) Expressed in binary, i is an n-digit number with bit positions from 2^(n-1) down to 2^0.

[0027] (3) When the 2^k bit of i in (2) is 0, connect the element in row i, column k + 1 to the element in row i + 2^k, column k, and connect the element in row i, column k + 1 to the element in row i, column k.

[0028] When the 2^k bit of i in (2) is 1, connect the element in row i, column k + 1 to the element in row i - 2^k, column k, and connect the element in row i, column k + 1 to the element in row i, column k.

[0029] Steps (2) and (3) are carried out for k = 0 to n - 1 and for i = 0 to 2^n - 1, in steps of 1.
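
Rules (1) to (3) can be sketched in code as follows. The function names `butterfly_links` and `reachable_rows` are hypothetical, and representing each switch element as a `(row, column)` tuple is an assumption made for illustration. Adding 2^k when bit k is 0 and subtracting it when bit k is 1 is the same as flipping bit k, which is what the XOR does.

```python
def butterfly_links(n):
    """List the element-to-element links given by rules (1) to (3) for 2**n PEs."""
    links = []
    for k in range(n):                       # k = 0 .. n-1
        for i in range(2 ** n):              # i = 0 .. 2**n - 1
            partner = i ^ (1 << k)           # i + 2**k if bit k is 0, else i - 2**k
            links.append(((i, k + 1), (partner, k)))   # diagonal link
            links.append(((i, k + 1), (i, k)))         # straight link
    return links


def reachable_rows(n, start_row):
    """Rows of column 0 reachable from (start_row, column n) along the links."""
    adjacency = {}
    for src, dst in butterfly_links(n):
        adjacency.setdefault(src, []).append(dst)
    frontier = {(start_row, n)}
    for _ in range(n):                       # step from column n down to column 0
        frontier = {dst for src in frontier for dst in adjacency[src]}
    return sorted(row for row, _column in frontier)


rows_from_pe5 = reachable_rows(3, 5)
```

With n = 3 every starting row reaches all eight rows of column 0, which is consistent with the statement that any PE on an input line can be connected to any PE on an output line.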

[0030] Because the connection paths between switch elements in columns n - 1 through 1 may be needed by more than one connection at the same time, depending on how the PEs are connected, they are multiplexed as necessary. The connection state of the switch network unit 33 is controlled by switching the connection state of each switch element 36. The switching control unit 34 is controlled by the CPU 1 via the PE connection exchange switch network control signal line 12. The selectable connection states are stored in the connection state memory 35 in advance, and a selection signal from the CPU 1 selects a connection pattern, which sets the states of the switch elements 36.

[0031] The switching control unit 34 stores the switch element states of the matrix in the connection state memory 35 via the PE connection exchange switch network control line 12. When the switch network/synchronization control unit 6 issues a connection switching command, the switching control unit 34 first receives the command and then sets the switch element states according to the switch connection state selected from the contents of the connection state memory 35. The CPU-PE linkage signal line 10 can be connected to each PE by the switch elements 36 of the fourth or fifth stage and is used for initial program loading and data transfer from the CPU 1.
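
The store-then-select behavior of the switching control unit and the connection state memory can be modeled roughly as below. The class `SwitchingControl`, its methods, the pattern names, and the encoding of a switch state as 0 or 1 are all assumptions for illustration; the patent does not specify a data format.

```python
class SwitchingControl:
    """Toy model of switching control unit 34 with connection state memory 35."""

    def __init__(self, rows, columns):
        self.rows, self.columns = rows, columns
        self.state_memory = {}                     # pattern id -> switch states
        self.switch_states = [[0] * columns for _ in range(rows)]

    def store_pattern(self, pattern_id, states):
        """Preload one selectable connection pattern (done in advance)."""
        if len(states) != self.rows or any(len(r) != self.columns for r in states):
            raise ValueError("pattern does not match the switch matrix size")
        self.state_memory[pattern_id] = [list(r) for r in states]

    def select(self, pattern_id):
        """Apply a stored pattern in response to a selection signal."""
        self.switch_states = [list(r) for r in self.state_memory[pattern_id]]


ctrl = SwitchingControl(rows=8, columns=4)
ctrl.store_pattern("straight", [[0] * 4 for _ in range(8)])
ctrl.store_pattern("all_crossed", [[1] * 4 for _ in range(8)])
ctrl.select("all_crossed")
```

Keeping the patterns in memory ahead of time means that a switch-over only needs a short selection signal rather than a full description of every switch element.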

[0032] FIG. 6 shows an example of the function of each switch element 36. The element exchanges connections between two bus inputs, arriving on the switch element input signal lines 39, and two bus outputs, leaving on the switch element output signal lines 40, under an external control signal; a command on the switch signal control signal line 41 puts it in either state (a) or state (b).

[0033] FIG. 7 shows an example of the construction of the switch element 36. It can be realized by combining AND gates 42, OR gates 43, and NOT gates 44.
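
A two-input, two-output exchange element of this kind can be expressed with AND, OR, and NOT operations alone. The sketch below is one plausible arrangement, not the circuit of FIG. 7: the function name `switch_element`, the single-bit signals, and the reading of states (a) and (b) as "straight" and "crossed" are assumptions.

```python
def switch_element(in0, in1, cross):
    """2x2 exchange switch built only from AND, OR, and NOT operations."""
    straight = not cross                           # NOT gate on the control line
    out0 = (in0 and straight) or (in1 and cross)   # two AND gates, one OR gate
    out1 = (in1 and straight) or (in0 and cross)   # two AND gates, one OR gate
    return out0, out1


straight_result = switch_element(True, False, cross=False)
crossed_result = switch_element(True, False, cross=True)
```

With the control line low the inputs pass straight through, and with it high they are exchanged; a bus-wide element would repeat the same logic for every bit of the bus.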

[0034] FIG. 8 is a block diagram used to explain the related operation of two adjacently connected PEs, PE (a) 45 and PE (b) 53. PE (a) 45 includes an arithmetic processor (a) 46, input buffer a-A 47, input buffer a-B 48, input FIFO (a) 49, output FIFO (a) 50, and DMA channel (a) 51. PE (b) 53 includes an arithmetic processor (b) 54, input buffer b-A 55, input buffer b-B 56, input FIFO (b) 57, output FIFO (b) 58, and DMA channel (b) 59, and is connected to PE (a) 45 via the communication port output (a) 52, the connection exchange switch network 62, and the communication port input (b) 60. PE (b) 53 is in turn connected to a further PE via the communication port output (b) 61.

[0035] FIG. 9 is a time chart showing an example of the operation of PE (a) 45 and PE (b) 53 connected adjacently as in FIG. 8. Here, one coherent unit of work subject to parallel/pipeline processing is called one phase of processing.

[0036] In phase 1, the arithmetic processor (a) 46 stores its computation results in the output FIFO memory (a) 50. Viewed between one pair of connected PEs, the logical operation of the PE connection exchange switch network 18 of FIG. 2 reduces to the connection exchange switch network 62 of FIG. 8. In phase 1, the result data stored in the output FIFO memory (a) 50 is transferred immediately, via the PE connection switch network 62, to the input FIFO memory (b) 57 of PE (b) 53. The data in the input FIFO memory (b) 57 is then transferred by the DMA channel (b) 59 to input buffer b-A 55.

[0037] In phase 2, input buffer b-A 55 is connected to the arithmetic processor (b) 54, and data from the input FIFO memory (b) 57 is DMA-transferred to input buffer b-B 56. The results of phase 1 remain in input buffer b-A 55, where the arithmetic processor (b) 54 can access them during phase 2 and carry out the next stage of processing on them in pipeline fashion. Meanwhile, during phase 2, the arithmetic processor (a) 46 performs the computation that follows phase 1, and its results are stored in input buffer b-B 56 via the PE connection exchange switch network 62.
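
The phase-by-phase behavior of PE (a) and PE (b) can be traced with a short simulation. It is a sketch under assumptions: the function `run_pipeline`, the two stage functions, and the use of `None` to mark a buffer that has not been filled yet are hypothetical, and the FIFO and DMA hops are collapsed into a single assignment.

```python
def run_pipeline(inputs, stage_a, stage_b):
    """Return the output of PE(b) for each phase; None while the pipe fills."""
    buffers = [None, None]     # input buffers b-A and b-B of PE(b)
    fill = 0                   # buffer receiving PE(a)'s result this phase
    outputs = []
    for value in inputs:
        ready = buffers[1 - fill]                    # written in the previous phase
        outputs.append(None if ready is None else stage_b(ready))
        buffers[fill] = stage_a(value)               # PE(a) result lands via FIFO + DMA
        fill = 1 - fill                              # alternate buffers each phase
    return outputs


timeline = run_pipeline([1, 2, 3, 4], stage_a=lambda x: x * 10, stage_b=lambda y: y + 1)
```

The resulting `timeline` is `[None, 11, 21, 31]`: PE (b) has nothing to process in phase 1, and in each later phase it works on what PE (a) produced one phase earlier.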

[0038] FIGS. 10(a) and 10(b) show example connections of a parallel processing apparatus made up of seven PEs. Any connection between PEs can be realized by switching the connections of the PE connection exchange switch network 18 of FIG. 2. In this case the PE connection exchange switch network 18 is able to switch among eight PEs, but one PE's worth of the connection exchange switch network input signal lines 37 and output signal lines 38 shown in FIG. 5 is left unassigned and reserved for external connection, so seven PEs are connected. The connections drawn as single lines 63, double lines 64, and dotted lines 65 are each made by a separate PE connection exchange switch network. FIG. 11 is a connection function diagram of PE connection exchange switch network (a, part 1). FIG. 12 is a connection function diagram of PE connection exchange switch network (a, part 2). FIG. 13 is a connection function diagram of PE connection exchange switch network (a, part 3). FIG. 14 is a connection function diagram of PE connection exchange switch network (b, part 1). FIG. 15 is a connection function diagram of PE connection exchange switch network (b, part 2). FIG. 16 is a connection function diagram of PE connection exchange switch network (b, part 3).

[0039] The example of FIG. 10(a) uses three of the PE connection exchange switch networks of FIG. 5. It is realized by setting the first network so that the bold paths in the connection function diagram of network (a, part 1) (FIG. 11) are connected, giving the single-line connections 63; setting the second so that the bold paths in the diagram of network (a, part 2) (FIG. 12) are connected, giving the double-line connections 64; and setting the third so that the bold paths in the diagram of network (a, part 3) (FIG. 13) are connected, giving the dotted-line connections 65.

[0040] Likewise, the example of FIG. 10(b) uses three of the PE connection exchange switch networks of FIG. 5. It is realized by setting the first network so that the bold paths in the connection function diagram of network (b, part 1) (FIG. 14) are connected, giving the single-line connections 63; setting the second so that the bold paths in the diagram of network (b, part 2) (FIG. 15) are connected, giving the double-line connections 64; and setting the third so that the bold paths in the diagram of network (b, part 3) (FIG. 16) are connected, giving the dotted-line connections 65. The connections in FIG. 10 are only examples: the CPU 1 sets the PE connection pattern to match the parallel/pipeline structure of the processing to be performed and issues a command to the switch network/synchronization control unit 6. The PE connection command may specify the connection pattern just once, at the start of a computation, and keep the same connections until the computation finishes, or, where necessary, the connections may be changed partway through the computation, before the start of a processing phase.

[0041] For each PE of each PE group 4 to operate according to the intended program and data, the program and data must be loaded into each PE at the required time. The PE group linkage unit 5 of FIG. 1 is provided for each PE group 4 for this purpose; one side of the PE group linkage unit 5 is connected to the CPU bus 9 and the other side to the PE group 4.

【００４２】図１７は、図１のＰＥ群リンケージ部５の
構成例であり、演算処理部６６にメモリバスＡ７５とメ
モリバスＢ７６が接続され、さらにＤＭＡチャネル制御
線７９を介してＤＭＡチャネル７０が接続されている。
メモリバスＢ７６には入力ＦＩＦＯメモリ７１，出力Ｆ
ＩＦＯメモリ７２、およびポート制御部７３より構成さ
れる通信ポートが接続される。一方、メモリバスＡ７５
にはプログラムメモリ７８，データ／ワークメモリ６
７，デュアルポートメモリ６８、及びバッファメモリ６
９が接続されている。ＣＰＵ１とはＣＰＵバス９，ＣＰ
Ｕバスインタフェース信号線７４経由でデュアルポート
メモリ６８が接続され、ＰＥにローディングすべきプロ
グラム及びデータをＣＰＵ１からデュアルポートメモリ
７４に書き込む。ＰＥ群４を構成する各ＰＥ１６とはＰ
Ｅリンケージ信号線７７を経由して連接されている。デ
ュアルポートメモリ６８に書き込まれた情報は、ＤＭＡ
チャネル７０の制御により出力ＦＩＦＯメモリ７２に書
き込まれポート制御部７３より連接先の各ＰＥに伝送さ
れる。ＰＥ群リンケージ部５の主目的は効率よく各ＰＥ
にＣＰＵ１より情報を伝送することであるから、ＰＥに
連接する通信ポートは情報伝送能力が許すかぎり可能な
だけ多くすることが好ましい。この目的でＤＭＡチャネ
ル７０が設けられ、演算処理部６６に負荷をかけること
なくデュアルポートメモリ６８または、バッファ６９に
転送格納された情報を接続先ＰＥに配分伝送する。バッ
ファ６９は接続先のＰＥごとに設け、伝送データの格納
バッファエリアとして用いてもよく、またＰＥリンケー
ジ信号線７７，入力ＦＩＦＯメモリ経由で接続先ＰＥよ
り情報を取り込む際のＤＭＡチャネル７０の転送先とし
て用いてもよい。FIG. 17 shows a configuration example of the PE group linkage unit 5 of FIG. 1. A memory bus A 75 and a memory bus B 76 are connected to the arithmetic processing unit 66, and a DMA channel 70 is further connected via a DMA channel control line 79. A communication port consisting of an input FIFO memory 71, an output FIFO memory 72, and a port control unit 73 is connected to the memory bus B 76. A program memory 78, a data/work memory 67, a dual port memory 68, and a buffer memory 69 are connected to the memory bus A 75. The dual port memory 68 is connected to the CPU 1 via the CPU bus 9 and the CPU bus interface signal line 74, and the CPU 1 writes the programs and data to be loaded into the PEs to the dual port memory 68. The unit is connected to each PE 16 constituting the PE group 4 via the PE linkage signal line 77. Information written to the dual port memory 68 is written to the output FIFO memory 72 under control of the DMA channel 70 and transmitted from the port control unit 73 to each connected PE. Because the main purpose of the PE group linkage unit 5 is to transmit information efficiently from the CPU 1 to each PE, it is preferable to provide as many communication ports connected to the PEs as the information transmission capability allows. The DMA channel 70 is provided for this purpose: it distributes the information stored in the dual port memory 68 or the buffer 69 to the connected PEs without placing a load on the arithmetic processing unit 66. A buffer 69 may be provided for each connected PE and used as a storage area for transmission data; it may also be used as the transfer destination of the DMA channel 70 when information is fetched from a connected PE via the PE linkage signal line 77 and the input FIFO memory.
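The load path of this paragraph (CPU 1 to dual port memory, DMA channel to output FIFO, port control unit to each connected PE) can be illustrated with a minimal Python sketch. It is not the patented circuit: the class `LinkagePort`, the function `dma_distribute`, and the FIFO depth of 2 are names and values assumed purely for illustration, and the hardware DMA transfer is modelled as an ordinary loop.

```python
from collections import deque


class LinkagePort:
    """One communication port: an output FIFO feeding a single connected PE (hypothetical model)."""

    def __init__(self, depth):
        self.depth = depth      # assumed FIFO depth in words
        self.fifo = deque()     # stands in for the output FIFO memory
        self.delivered = []     # words the connected PE has received

    def drain(self):
        # Port control: forward every queued word to the connected PE.
        while self.fifo:
            self.delivered.append(self.fifo.popleft())


def dma_distribute(dual_port_memory, ports):
    """Copy each PE's block from dual-port memory to its port, one FIFO-depth burst at a time."""
    for pe_index, block in dual_port_memory.items():
        port = ports[pe_index]
        for start in range(0, len(block), port.depth):
            port.fifo.extend(block[start:start + port.depth])
            port.drain()


# The CPU has written one block per destination PE into the dual-port memory.
memory = {0: [10, 11, 12, 13, 14], 1: [20, 21]}
ports = {0: LinkagePort(depth=2), 1: LinkagePort(depth=2)}
dma_distribute(memory, ports)
```

In this sketch the arithmetic processing unit never touches the data; only the DMA loop and the port drain do, which mirrors the stated aim of not loading the arithmetic processing unit 66.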

【００４３】図１におけるＰＥ接続交換スイッチ網７は
各ＰＥ群４に含まれる各ＰＥ間の接続を任意に制御して
データ伝送を行なうための接続交換スイッチ網である。
各ＰＥ群４とはＰＥ群間リンケージ信号線１１で接続さ
れるが、最終的には同じく図２に示されるように、ＰＥ
群間リンケージ信号線１１はＰＥ接続交換スイッチ網１
２を介して各ＰＥに交換接続される。The PE group connection exchange switch network 7 in FIG. 1 is a connection exchange switch network for transmitting data while arbitrarily controlling the connections between the PEs contained in the PE groups 4. It is connected to each PE group 4 by an inter-PE-group linkage signal line 11; ultimately, as also shown in FIG. 2, the inter-PE-group linkage signal line 11 is switch-connected to each PE through the PE connection exchange switch network 18.

【００４４】図１８に、ＰＥ接続交換スイッチ網７の構
造を示す。ＰＥ群接続交換スイッチ網７の構造は図２に
おけるＰＥ接続交換スイッチ網１８と同一の構造を持
つ。ただしＰＥ群接続交換スイッチ網７の場合にはＰＥ
接続交換スイッチ網１８と異なり、ＣＰＵ１より各ＰＥ
にプログラム及びデータを転送する必要はないから、Ｃ
ＰＵ−ＰＥ間リンケージ信号線１０は省略してよい。Ｐ
Ｅ群接続交換スイッチ網７の接続状態制御は、ＰＥ群接
続交換スイッチ網制御信号線１３を経由してスイッチ網
／同期制御部６よりＣＰＵ１の指示に従い実施される。
図１８の装置をもってすると、図２のＰＥ群において各
ＰＥに対して実施した接続交換制御と同一の制御をＰＥ
群に対して実行することができる。すなわち、図１に示
されたＰＥ群集合体はその内部に２階層に階層化された
接続制御可能なＰＥのグループを保有しているのであ
る。図１８のＰＥ群接続交換スイッチ網では、ＰＥ群を
接続して複数のＰＥ群より構成される図１のごときＰＥ
群集合体を構築すると同時にＰＥ群集合体間リンケージ
信号線１４を介してさらに複数のＰＥ群集合体を接続す
ることができる。ＰＥ群内におけるＰＥ間の接続は今日
の集積技術、実装技術をもってすれば、同一のプリント
板上で実現可能であるが、ＰＥ群，ＰＥ群集合体、さら
に複数のＰＥ群集合体により構成される並列処理装置と
ＰＥグループの階層が上がるに従い、接続交換スイッチ
網に接続する信号線の距離が長くなる。図１８における
ＥＯ／ＯＥ変換部８０は接続距離のかかる増大に対処す
るために長距離伝送部を光伝送に変換し、伝送速度の低
下を防止するためのものである。FIG. 18 shows the structure of the PE group connection exchange switch network 7. The PE group connection exchange switch network 7 has the same structure as the PE connection exchange switch network 18 in FIG. 2. Unlike the PE connection exchange switch network 18, however, the PE group connection exchange switch network 7 does not need to transfer programs and data from the CPU 1 to each PE, so the CPU-PE linkage signal line 10 may be omitted. The connection state of the PE group connection exchange switch network 7 is controlled by the switch network/synchronization control unit 6, via the PE group connection exchange switch network control signal line 13, in accordance with instructions from the CPU 1. With the device of FIG. 18, the same connection exchange control that is applied to the individual PEs within the PE group of FIG. 2 can be applied to the PE groups. In other words, the PE group aggregate shown in FIG. 1 internally holds groups of PEs whose connections can be controlled at two hierarchical levels. The PE group connection exchange switch network of FIG. 18 connects PE groups to build a PE group aggregate composed of a plurality of PE groups, as in FIG. 1, and at the same time allows further PE group aggregates to be connected via the inter-PE-group-aggregate linkage signal line 14. With today's integration and mounting technology, the connections between PEs within a PE group can be realized on a single printed board; however, as the hierarchy of PE grouping rises from the PE group to the PE group aggregate and then to a parallel processing device composed of a plurality of PE group aggregates, the signal lines connected to the connection exchange switch network become longer. The EO/OE conversion unit 80 in FIG. 18 addresses this increase in connection distance by converting the long-distance transmission section to optical transmission, thereby preventing a drop in transmission speed.

【００４５】図１９に示すのは、ＰＥ群集合体を複数接
続して構成した並列演算装置（または並列演算機構）の
構成例である。ＰＥ群集合体（（１）〜（２のｌ乗−
１））８１の詳細は図１に示す通りであり、外部インタ
ーフェイス部３を介して外部インターフェイス信号線８
でリンケージバス８６と接続する。ここで、前記ｌは英
文字エルの小文字を表す。制御計算機８２はＰＥ群集合
体内のＣＰＵ１に指令を出すほか、ビデオ端末８４を介
してプログラム開発、マンマシンコミュニケーション、
並列演算装置全体の動作状況表示を実施する。また、制
御計算機８２の動作記録をプリンタ８３に記録させるこ
ともできる。ＰＥ群集合体８１間はＰＥ群集合体間リン
ケージ信号線１４を経由してＰＥ群集合体接続交換スイ
ッチ網８５により接続交換される。FIG. 19 shows a configuration example of a parallel computing device (or parallel computing mechanism) formed by connecting a plurality of PE group aggregates. The details of the PE group aggregates ((1) to (2^l - 1)) 81 are as shown in FIG. 1; each is connected to the linkage bus 86 by the external interface signal line 8 via the external interface unit 3. Here, l denotes the lowercase letter L. The control computer 82 issues commands to the CPU 1 in each PE group aggregate and, through the video terminal 84, supports program development, man-machine communication, and display of the operating status of the entire parallel computing device. The operation record of the control computer 82 can also be output to the printer 83. The PE group aggregates 81 are switch-connected to one another by the PE group aggregate connection exchange switch network 85 via the inter-PE-group-aggregate linkage signal lines 14.

【００４６】図２０に、図１９のＰＥ群集合体接続交換
スイッチ網８５の構成例を示す。ＰＥ群集合体接続交換
スイッチ網８８の構造は、図２におけるＰＥ接続交換ス
イッチ網１８、及び図１８におけるＰＥ群接続交換スイ
ッチ網７と同一構造であり、同一の接続交換制御機能を
有するものである。ＰＥ群集合体接続交換スイッチ網制
御信号線８９は図１８におけるＰＥ群接続交換スイッチ
網制御信号線と同一のものである。従って、図１９に示
すＰＥ群集合体８１のいずれかに接続し、該ＰＥ群集合
体に対するＰＥ接続交換スイッチ網１８と同様に制御す
ればよい。ＰＥ群集合体接続交換スイッチ網８５に接続
される並列演算装置間リンケージ信号線８７は、複数の
ＰＥ群集合体よりなる図１９のごとき並列演算装置をさ
らに連接交換するために使用する。並列演算装置間リン
ケージ信号線８７はＰＥ群集合体間リンケージ信号線１
４と論理的にまったく同一の構造を有する。なおＥＯ／
ＯＥ変換部はＰＥ群集合体間の信号伝送の距離が長くな
るため、伝送性能の劣化を防ぐために光信号による伝送
に変換するための変換器である。FIG. 20 shows a configuration example of the PE group aggregate connection exchange switch network 85 of FIG. 19. The PE group aggregate connection exchange switch network 88 has the same structure as the PE connection exchange switch network 18 in FIG. 2 and the PE group connection exchange switch network 7 in FIG. 18, and has the same connection exchange control function. The PE group aggregate connection exchange switch network control signal line 89 is the same kind of line as the PE group connection exchange switch network control signal line in FIG. 18. It may therefore be connected to any one of the PE group aggregates 81 shown in FIG. 19 and controlled in the same way as the PE connection exchange switch network 18 for that PE group aggregate. The inter-parallel-computing-device linkage signal line 87 connected to the PE group aggregate connection exchange switch network 85 is used to further switch-connect parallel computing devices, each composed of a plurality of PE group aggregates as in FIG. 19. The inter-parallel-computing-device linkage signal line 87 is logically identical in structure to the inter-PE-group-aggregate linkage signal line 14. The EO/OE conversion unit is a converter that converts signals to optical transmission to prevent degradation of transmission performance, since the signal transmission distance between PE group aggregates is long.

【００４７】図２１に、図１におけるスイッチ網／同期
制御部６の同期制御機能を示した。本発明になる並列演
算装置は、処理フェーズ毎に同期しながら演算を行なう
ことを特徴としているので、各ＰＥは同期して処理フェ
ーズを開始する必要があり、ＰＥ同期指令９３をスイッ
チ網／同期制御部６がフェーズ開始時点毎に各ＰＥに対
して送信する。同期信号の発生方法は、外部同期信号線
９６の信号をそのまま用いてもよく、また同期タイマー
設定値９１によりプログラマブルタイマー９４を設定
し、周期的にＰＥ同期指令９３を発生してもよい。ここ
で同期タイマー設定値９１は、各ＰＥが１フェーズの処
理を終了するに必要な最も長い時間以上にＣＰＵ１より
指定することができる。とくに各ＰＥのフェーズ毎の処
理時間が変動する場合には、各ＰＥの処理終了信号９０
をＡＮＤ論理９５に入力し、すべてのＰＥの処理終了が
成り立った時点でＰＥ同期指令９３を出力する必要があ
る。図２１では、上記３種類の同期方法を同期方式選択
信号９２により選択可能としているが、３種類の同期方
法のうち１種または２種のみをスイッチ網／同期制御部
６に持たせてもよい。なお同期方式選択信号９２はＣＰ
Ｕ１または制御計算機８２より設定してもよい。FIG. 21 shows the synchronization control function of the switch network/synchronization control unit 6 in FIG. 1. Because the parallel computing device of the present invention performs its computation in synchronization at every processing phase, all PEs must start each processing phase together; the switch network/synchronization control unit 6 therefore sends a PE synchronization command 93 to each PE at the start of every phase. The synchronization signal may be generated by using the signal on the external synchronization signal line 96 as it is, or by setting the programmable timer 94 with the synchronization timer setting value 91 so that the PE synchronization command 93 is generated periodically. The CPU 1 specifies the synchronization timer setting value 91 to be at least as long as the longest time any PE needs to finish one phase of processing. In particular, when the processing time of the PEs varies from phase to phase, the processing end signal 90 of each PE must be input to the AND logic 95 so that the PE synchronization command 93 is output when all PEs have finished processing. In FIG. 21 these three synchronization methods are selectable by the synchronization method selection signal 92, but the switch network/synchronization control unit 6 may provide only one or two of them. The synchronization method selection signal 92 may be set from the CPU 1 or the control computer 82.
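The three synchronization methods of this paragraph can be sketched in Python as a single decision function. This is only an illustrative software model of the selection logic, not the circuit of FIG. 21; the names `SyncMode`, `sync_command`, and `minimum_timer_setting`, and the use of plain numbers for time, are assumptions made for the example.

```python
from enum import Enum


class SyncMode(Enum):
    """Hypothetical encoding of the synchronization method selection signal."""
    EXTERNAL = 1   # pass the external synchronization signal through
    TIMER = 2      # fire periodically from a programmable timer
    ALL_DONE = 3   # fire when every PE reports completion (AND logic)


def sync_command(mode, *, external=False, elapsed=0, timer_setting=0, done_flags=()):
    """Return True when a PE synchronization command should be issued."""
    if mode is SyncMode.EXTERNAL:
        return bool(external)
    if mode is SyncMode.TIMER:
        return elapsed >= timer_setting
    if mode is SyncMode.ALL_DONE:
        # AND of all processing end signals; an empty set of PEs never fires.
        return len(done_flags) > 0 and all(done_flags)
    raise ValueError(f"unknown mode: {mode!r}")


def minimum_timer_setting(phase_times):
    """The timer setting must be at least the longest per-PE phase time."""
    return max(phase_times)
```

The timer method is safe only if the setting covers the slowest PE, which is why `minimum_timer_setting` takes the maximum; the completion-signal method needs no such bound and therefore suits phases whose duration varies.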

【００４８】図２２に、各ＰＥの処理終了信号９０によ
り同期指令９３を発生する場合のタイムチャートを示
す。ＰＥ０からＰＥｎまでの全ＰＥの処理が終了する
と、スイッチ網／同期制御部６がＣＰＵ１に報告する。
この報告に基づき交換接続スイッチ網の接続変更が必要
な場合にはスイッチ網／同期制御部６に変更指令を発す
る。接続変更がない場合には直に次の処理フェーズの開
始を各ＰＥに指令する。FIG. 22 shows a time chart for the case where the synchronization command 93 is generated from the processing end signals 90 of the PEs. When all PEs from PE0 to PEn have finished processing, the switch network/synchronization control unit 6 reports this to the CPU 1. If, based on this report, the connections of the connection exchange switch network need to be changed, the CPU 1 issues a change command to the switch network/synchronization control unit 6. If no connection change is needed, each PE is immediately commanded to start the next processing phase.
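The phase sequence of this paragraph (all PEs finish, the CPU decides whether to reconnect the switch network, the next phase starts) can be sketched as a simple loop. The function `run_phases`, the event names in the log, and the pattern labels "mesh" and "ring" are hypothetical and serve only to make the ordering visible; the sketch checks for a connection change before each phase, which is equivalent to deciding after the preceding phase ends.

```python
def run_phases(phase_plans, initial_pattern):
    """Run processing phases, reconfiguring the switch network only when a plan asks for it.

    phase_plans: one entry per phase, either a connection pattern or None to
    keep the current one. Returns a log of (event, detail) tuples in order.
    """
    log = []
    pattern = initial_pattern
    for phase, requested in enumerate(phase_plans):
        # Change the connection before the phase starts, and only if needed.
        if requested is not None and requested != pattern:
            pattern = requested
            log.append(("reconfigure", pattern))
        log.append(("start_phase", phase))
        # Stands in for the AND of all PE processing end signals.
        log.append(("all_pe_done", phase))
    return log


events = run_phases([None, "ring", "ring"], initial_pattern="mesh")
```

Here only the second phase triggers a reconfiguration; the third phase requests the pattern already in place and starts immediately.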

【００４９】図２３は、図２のＰＥ群のプリント板１０
０上への実装例を示したものである。プロセッサ９７は
高速処理能力のあるものがよく、例えばＤＳＰ（ディジ
タル・シグナル・プロセッサを採用する。図３に示すＰ
Ｅのうち、演算処理部１９，複数のＤＭＡチャネル２
４，入力ＦＩＦＯ２５，出力ＦＩＦＯ２６、及び制御ポ
ート２７より構成される複数の通信ポートを１チップに
集積化したＤＳＰをプロセッサ９７として使用すること
によりＰＥ間のリンケージに関わる周辺回路を大幅に省
略することができる。メモリ９８は、高集積かつ高速で
あるものが適しており、現在の技術水準では４Ｍビット
ＳＲＡＭが好ましい。スイッチ網素子９９はプロセッサ
間のリンケージ信号線を最短距離で結び、かつ自由に接
続を外部よりプログラムで制御できなくてはならない。
この目的から図５に対応する接続交換スイッチ網を集積
回路化し、スイッチ部の信号の伝搬遅延を最小化する。
また集積化により同時に回路の小型化を実現して１枚の
プリント板上に極力多数のＰＥを配置することにより、
ＰＥ相互間のリンケージ性能を向上させる。プロセッサ
９７の配置は相互間の距離が最短になるように、スイッ
チ網素子９９を取り囲んでプリント板中央部に集中させ
ている。プリント板接続部１０１は、図１２のＣＰＵ−
ＰＥ群間リンケージ信号線１０，ＰＥ群間リンケージ信
号線１１、及びＰＥ接続交換スイッチ網制御信号線１２
をプリント板外に引きだし、他のプリント板と接続する
ためのものである。ここで言う他のプリント板とは、自
分以外のＰＥ群，ＰＥ群リンケージ部，ＰＥ群接続交換
スイッチ網、及びスイッチ網／同期制御部である。FIG. 23 shows an example of mounting the PE group of FIG. 2 on a printed board 100. The processor 97 should have high-speed processing capability; for example, a DSP (digital signal processor) is adopted. By using as the processor 97 a DSP that integrates on one chip, out of the PE shown in FIG. 3, the arithmetic processing unit 19, the plurality of DMA channels 24, and a plurality of communication ports each consisting of an input FIFO 25, an output FIFO 26, and a control port 27, the peripheral circuits involved in linkage between PEs can be largely omitted. The memory 98 should be highly integrated and fast; at the current state of the art, 4-Mbit SRAM is preferable. The switch network element 99 must connect the linkage signal lines between processors over the shortest distance and must allow the connections to be freely controlled by program from outside. For this purpose, the connection exchange switch network corresponding to FIG. 5 is implemented as an integrated circuit to minimize the signal propagation delay of the switch section. Integration also reduces the circuit size, so that as many PEs as possible can be placed on one printed board, which improves the linkage performance between PEs. The processors 97 are concentrated in the central part of the printed board, surrounding the switch network element 99, so that the distances between them are minimized. The printed board connecting portion 101 brings the CPU-PE group linkage signal line 10, the inter-PE-group linkage signal line 11, and the PE connection exchange switch network control signal line 12 of FIG. 12 out of the printed board for connection to other printed boards. The other printed boards referred to here are the other PE groups, the PE group linkage unit, the PE group connection exchange switch network, and the switch network/synchronization control unit.

【００５０】図２５は、図１８のＰＥ群接続交換スイッ
チ網及び図２０のＰＥ群集合体接続交換スイッチ網のプ
リント板への実装例を示したものである。ＰＥ群及びＰ
Ｅ群集合体の外部に対するリンケージ信号線の論理構造
は、ＰＥの場合と同一であり、ＦＩＦＯ間の接続であ
る。ＰＥ群及びＰＥ群集合体の外部から見た論理仕様で
はＰＥの場合と同一となるため、図２のＰＥ接続交換ス
イッチ網と同一のプリント板上に実装を行なうことがで
きる。すなわち、図２３のスイッチ網素子９９をそのま
ま適用し、プロセッサ９７及びメモリ素子９８を除去す
ると同時にプリント板の外部との信号インターフェース
による電気信号レベルの低下を保証するために必要に応
じてドライバ部１０２をプリント板外部との信号接続を
行なうプリント板接続部１０１との間に設置する。ＰＥ
群集合体接続交換スイッチ網の場合には、ユニット構造
をとるＰＥ群集合体間の接続であるため距離が長くな
る。必要に応じプリント板接続部１０１の先にＥ／Ｏ変
換器，Ｏ／Ｅ変換器を接続してもよい。プリント板接続
部１０１は、ＰＥ群間リンケージ信号線１１あるいはＰ
Ｅ群集合体間リンケージ信号線１４、及びＰＥ接続交換
スイッチ網制御信号線１２をプリント板外に引きだし、
他のプリント板またはユニットと接続するためのもので
ある。ここで言う他のプリント板またはユニットとは、
接続先のＰＥ群または、接続先のＰＥ群集合体である。FIG. 25 shows an example of mounting the PE group connection exchange switch network of FIG. 18 and the PE group aggregate connection exchange switch network of FIG. 20 on a printed board. The logical structure of the linkage signal lines from a PE group or PE group aggregate to the outside is the same as for a PE: a connection between FIFOs. Because the logical specification of a PE group or PE group aggregate as seen from outside is the same as that of a PE, these switch networks can be mounted on the same printed board as that used for the PE connection exchange switch network of FIG. 2. That is, the switch network element 99 of FIG. 23 is used as it is, the processors 97 and memory elements 98 are removed, and, where necessary, a driver unit 102 is placed between the switch network element and the printed board connecting portion 101, which makes the signal connections to the outside of the board, to compensate for the drop in electrical signal level caused by the signal interface with the outside of the board. In the case of the PE group aggregate connection exchange switch network, the distances are longer because the connections run between PE group aggregates built as separate units. Where necessary, E/O and O/E converters may be connected beyond the printed board connecting portion 101. The printed board connecting portion 101 brings the inter-PE-group linkage signal line 11 or the inter-PE-group-aggregate linkage signal line 14, together with the PE connection exchange switch network control signal line 12, out of the printed board for connection to another printed board or unit. The other printed board or unit referred to here is the PE group or PE group aggregate at the connection destination.

【００５１】図２４は、本発明になる並列演算装置に含
まれるＰＥ間の関係を概念的に示した図である。ＰＥは
３段階に階層化され、ＰＥを相互に連接したものがＰＥ
群であり、１つのＰＥ群は全体として１つのＰＥと同様
の外部リンケージ信号線を持っている。ＰＥ群を複数集
め、相互に連接したものがＰＥ群集合体である。さらに
ＰＥ群集合体を複数集め、相互に連接したものが並列演
算装置（または並列演算機構とも称す）である。FIG. 24 conceptually shows the relationship among the PEs contained in the parallel computing device of the present invention. The PEs are organized in a three-level hierarchy. PEs connected to one another form a PE group, and a PE group as a whole has the same external linkage signal lines as a single PE. A plurality of PE groups connected to one another form a PE group aggregate. A plurality of PE group aggregates connected to one another in turn form the parallel computing device (also called the parallel computing mechanism).

【００５２】すなわち、並列演算装置に対するＰＥ群集
合体の論理的関係，ＰＥ群集合体に対するＰＥ群の論理
的関係，ＰＥ群に対するＰＥの論理的関係はすべて同一
構造である。この関係（階層レベル／接続する要素／要
素間リンケージ）は、１）並列演算装置／ＰＥ群集合体／ＰＥ群集合体間リン
ケージ信号線２）ＰＥ群集合体／ＰＥ群／ＰＥ群間リンケージ信号線３）ＰＥ群／ＰＥ／ＰＥ間リンケージ信号線となる。このような論理構造を持たすことにより、複数
のＰＥを３階層を越えて積み重ねて接続することにつき
論理的制約はなく、事実上、任意の規模の並列演算装置
を実現することが可能となる。とくに、本発明になる並
列演算装置ではＰＥ間のリンケージにつき、上位階層を
経る度合が少ないほどリンケージ信号線の実装距離、及
び経由する論理ゲート段数が少なく、高速大容量のデー
タ伝送が可能であり、論理的に密接な関係のＰＥ間から
論理的に疎遠な関係のＰＥ間まで段階的に情報伝送能力
を割り当てることが可能である。That is, the logical relationship of the PE group aggregates to the parallel computing device, of the PE groups to a PE group aggregate, and of the PEs to a PE group all have the same structure. This relationship (hierarchy level / connected elements / linkage between elements) is:
1) parallel computing device / PE group aggregates / inter-PE-group-aggregate linkage signal lines
2) PE group aggregate / PE groups / inter-PE-group linkage signal lines
3) PE group / PEs / inter-PE linkage signal lines
Because of this logical structure, there is no logical restriction on stacking PEs beyond three hierarchical levels, so a parallel computing device of virtually any scale can be realized. In particular, in the parallel computing device of the present invention, the fewer upper hierarchy levels a linkage between PEs passes through, the shorter the mounted length of the linkage signal line and the fewer logic gate stages it traverses, so high-speed, large-capacity data transmission is possible. Information transmission capability can thus be allocated in graded steps, from PEs that are logically closely related to PEs that are logically distant.
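The self-similar hierarchy of this paragraph can be illustrated with a short Python sketch that reports how far up the hierarchy a link between two PEs must travel. The addressing scheme (a tuple of indices, outermost level first) and the function name `highest_level_crossed` are assumptions made for the example; the text does not define an addressing format.

```python
def highest_level_crossed(pe_a, pe_b):
    """Return the highest switch-network level a link between two PEs passes through.

    Each PE is addressed as a tuple, outermost level first, for example
    (aggregate, group, pe). With three-element addresses the result means:
    0 = same PE, 1 = PE switch network within a group,
    2 = PE group switch network within an aggregate,
    3 = PE group aggregate switch network.
    Longer tuples model hierarchies deeper than three levels.
    """
    if len(pe_a) != len(pe_b):
        raise ValueError("addresses must have the same depth")
    depth = len(pe_a)
    for i, (a, b) in enumerate(zip(pe_a, pe_b)):
        if a != b:
            # The first differing index decides which switch network is needed.
            return depth - i
    return 0
```

A lower return value corresponds to a shorter signal path and fewer gate stages, so a scheduler built on such a function could, under these assumptions, place heavily communicating tasks on PEs whose addresses share the longest common prefix.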

【００５３】以上述べたように、複数のプロセッサ間を
ＦＩＦＯにより接続し、かつ交替バッファ構造の受信用
メモリ領域と受信用ＦＩＦＯ間をＤＭＡ転送とするの
で、送受信両プロセッサともプロセッサ間の転送データ
についてメモリサイクルで読み書きが可能となり、プロ
セッサ間のデータ伝送速度，効率が向上する。この結
果、該並列演算装置の処理能力が向上する。さらに、プ
ロセッサ間の接続状態を接続交換スイッチ網により任意
に変更できるので、広汎な用途の処理内容に適応して効
率のよい演算を実施することができる。As described above, the processors are connected through FIFOs, and data is moved by DMA transfer between the receiving FIFO and a receiving memory area with an alternating buffer structure. Both the sending and the receiving processor can therefore read and write inter-processor transfer data in memory cycles, which improves the data transmission speed and efficiency between processors and, as a result, the processing capability of the parallel computing device. Furthermore, because the connection state between processors can be changed arbitrarily by the connection exchange switch network, efficient computation can be carried out for a wide range of applications.
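The alternating-buffer reception summarized in this paragraph can be modelled in a few lines of Python. This is a software analogy under stated assumptions, not the hardware: the class `AlternatingReceiver` and its methods are hypothetical names, the DMA transfer is an ordinary loop, and the swap happens when `start_phase` is called.

```python
from collections import deque


class AlternatingReceiver:
    """Receive side: a FIFO drained by 'DMA' into one of two buffers, swapped each phase."""

    def __init__(self):
        self.fifo = deque()       # stands in for the input FIFO
        self.buffers = [[], []]   # input buffer A and input buffer B
        self.fill = 0             # index of the buffer DMA is currently filling

    def receive(self, word):
        # A word arrives from the sender's output FIFO.
        self.fifo.append(word)

    def dma_transfer(self):
        # Move everything queued in the FIFO into the buffer being filled.
        while self.fifo:
            self.buffers[self.fill].append(self.fifo.popleft())

    def start_phase(self):
        """Swap buffers and return the data the processor works on during this phase."""
        self.dma_transfer()
        ready = self.fill
        self.fill = 1 - self.fill
        self.buffers[self.fill].clear()   # reuse the other buffer for new arrivals
        return self.buffers[ready]


rx = AlternatingReceiver()
rx.receive(1)
rx.receive(2)
first = list(rx.start_phase())    # processor reads one buffer ...
rx.receive(3)
rx.dma_transfer()                 # ... while new data lands in the other
second = list(rx.start_phase())
```

Because the processor only ever reads the buffer that is no longer being filled, incoming transfers do not interrupt its computation, which is the effect the paragraph attributes to the alternating buffer structure.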

【００５４】[0054]

【発明の効果】本発明によれば、プロセッサ間のデータ
伝送速度及び効率を向上させる並列処理装置及び方法を
提供することができる。According to the present invention, it is possible to provide a parallel processing apparatus and method for improving the data transmission rate and efficiency between processors.

【００５５】また、本発明によれば、演算装置の汎用性
を確保するためにプロセッサ間のデータ伝送路構造に柔
軟性を付与した並列処理装置及び方法を提供することが
できる。Further, according to the present invention, it is possible to provide a parallel processing apparatus and method in which flexibility is imparted to a data transmission path structure between processors in order to ensure versatility of an arithmetic unit.

[Brief description of drawings]

【図１】本発明の一実施例のＰＥ群集合体の構成図であ
る。FIG. 1 is a configuration diagram of a PE group aggregate according to an embodiment of the present invention.

【図２】本発明の一実施例のＰＥ群の構成図である。FIG. 2 is a configuration diagram of a PE group according to an embodiment of the present invention.

【図３】本発明の一実施例のＰＥの内部構成図である。FIG. 3 is an internal configuration diagram of a PE according to an embodiment of the present invention.

【図４】本発明の一実施例のＰＥの内部構成の他の例を
示した図である。FIG. 4 is a diagram showing another example of the internal configuration of the PE according to the embodiment of the present invention.

【図５】本発明の一実施例のＰＥ接続交換スイッチ網の
構成図である。FIG. 5 is a configuration diagram of a PE connection exchange switch network according to an embodiment of the present invention.

【図６】図５のスイッチ素子の機能の一例を示した図で
ある。FIG. 6 is a diagram showing an example of the function of the switch element of FIG. 5.

【図７】図５のスイッチ素子の構成の一例を示した図で
ある。FIG. 7 is a diagram showing an example of the configuration of the switch element of FIG. 5.

【図８】本発明の一実施例の隣接したＰＥ間の接続構成
図である。FIG. 8 is a connection configuration diagram between adjacent PEs according to an embodiment of the present invention.

【図９】図８に示された構成の動作の一例を示したタイ
ムチャートである。FIG. 9 is a time chart showing an example of the operation of the configuration shown in FIG. 8.

【図１０】本発明の一実施例のＰＥの接続実現例を示し
た図である。FIG. 10 is a diagram showing a connection implementation example of a PE according to an embodiment of the present invention.

【図１１】本発明の一実施例のＰＥ接続交換スイッチ網
(ａその１)の接続機能図である。FIG. 11 is a connection function diagram of the PE connection exchange switch network (a-1) according to an embodiment of the present invention.

【図１２】本発明の一実施例のＰＥ接続交換スイッチ網
(ａその２)の接続機能図である。FIG. 12 is a connection function diagram of the PE connection exchange switch network (a-2) according to an embodiment of the present invention.

【図１３】本発明の一実施例のＰＥ接続交換スイッチ網
(ａその３)の接続機能図である。FIG. 13 is a connection function diagram of the PE connection exchange switch network (a-3) according to an embodiment of the present invention.

【図１４】本発明の一実施例のＰＥ接続交換スイッチ網
(ｂその１)の接続機能図である。FIG. 14 is a connection function diagram of the PE connection exchange switch network (b-1) according to an embodiment of the present invention.

【図１５】本発明の一実施例のＰＥ接続交換スイッチ網
(ｂその２)の接続機能図である。FIG. 15 is a connection function diagram of the PE connection exchange switch network (b-2) according to an embodiment of the present invention.

【図１６】本発明の一実施例のＰＥ接続交換スイッチ網
(ｂその３)の接続機能図である。FIG. 16 is a connection function diagram of the PE connection exchange switch network (b-3) according to an embodiment of the present invention.

【図１７】本発明の一実施例のＰＥ群リンケージ部の接
続構成図である。FIG. 17 is a connection configuration diagram of a PE group linkage unit according to an embodiment of the present invention.

【図１８】本発明の一実施例のＰＥ群接続交換スイッチ
網の構成図である。FIG. 18 is a configuration diagram of a PE group connection exchange switch network according to an embodiment of the present invention.

【図１９】本発明の一実施例の並列演算装置の構成図で
ある。FIG. 19 is a configuration diagram of a parallel arithmetic device according to an embodiment of the present invention.

【図２０】図１９のＰＥ群集合体接続交換スイッチ網の
構成例を示す図である。FIG. 20 is a diagram showing a configuration example of the PE group aggregate connection exchange switch network of FIG. 19.

【図２１】本発明の一実施例の同期制御機能の一例を示
す図である。FIG. 21 is a diagram showing an example of a synchronization control function according to an embodiment of the present invention.

【図２２】図２１の同期制御のタイムチャートの一例を
示す図である。FIG. 22 is a diagram showing an example of a time chart of the synchronization control of FIG. 21.

【図２３】本発明の一実施例におけるＰＥ群のプリント
板への実装例を示す図である。FIG. 23 is a diagram showing an example of mounting a PE group on a printed board according to an embodiment of the present invention.

【図２４】本発明の一実施例の並列演算装置の階層構造
例を示す概念図である。FIG. 24 is a conceptual diagram showing an example of a hierarchical structure of a parallel arithmetic device in one embodiment of the present invention.

【図２５】本発明の一実施例におけるＰＥ群接続交換ス
イッチ網及びＰＥ群集合体接続交換スイッチ網のプリン
ト板への実装例を示す図である。FIG. 25 is a diagram showing an example of mounting a PE group connection exchange switch network and a PE group aggregate connection exchange switch network on a printed board according to an embodiment of the present invention.

[Explanation of symbols]

１…ＣＰＵ、２…メモリ、３…外部インターフェイス
部、４…ＰＥ群、５…ＰＥ群リンケージ部、６…スイッ
チ網／同期制御部、７…ＰＥ群接続交換スイッチ網、８
…外部インターフェイス信号線、９…ＣＰＵバス、１０
…ＣＰＵ−ＰＥ群間リンケージ信号線、１１…ＰＥ群間
リンケージ信号線、１２…ＰＥ接続交換スイッチ網制御
信号線、１３…ＰＥ群接続交換スイッチ網制御信号線、
１４…ＰＥ群集合体間リンケージ信号線、１５…外部同
期信号線、１６…ＰＥ、１７…ＰＥ間リンケージ信号
線、１８…ＰＥ接続交換スイッチ網、１９…演算処理
部、２０…プログラムメモリ、２１…データ／ワークメ
モリ、２２…入力バッファＡ、２３…入力バッファＢ、
２４…ＤＭＡチャネル、２５…入力ＦＩＦＯ、２６…出
力ＦＩＦＯ、２７…入出力ポート制御部、２８…メモリ
バスＡ、２９…メモリバスＢ、３０…プロセッサエレメ
ント間リンケージ信号線、３１…ＤＭＡチャネル制御信
号線、３２…メモリバスＣ、３３…スイッチ網部、３４
…切り替え制御部、３５…接続状態メモリ、３６…スイ
ッチ素子、３７…接続交換スイッチ網入力信号線、３８
…接続交換スイッチ網出力信号線、３９…スイッチ素子
入力信号線、４０…スイッチ素子出力信号線、４１…ス
イッチ素子制御信号線、４２…ＡＮＤゲート、４３…Ｏ
Ｒゲート、４４…ＮＯＴゲート、４５…ＰＥ（ａ）、４
６…演算プロセッサ（ａ）、４７…入力バッファａ−
Ａ、４８…入力バッファａ−Ｂ、４９…入力ＦＩＦＯ
（ａ）、５０…出力ＦＩＦＯ（ａ）、５１…ＤＭＡチャ
ネル（ａ）、５２…通信ポート出力（ａ）、５３…ＰＥ
（ｂ）、５４…演算プロセッサ（ｂ）、５５…入力バッ
ファｂ−Ａ、５６…入力バッファｂ−Ｂ、５７…入力Ｆ
ＩＦＯ（ｂ）、５８…出力ＦＩＦＯ（ｂ）、５９…ＤＭ
Ａチャネル（ｂ）、６０…通信ポート入力（ｂ）、６１
…通信ポート出力（ｂ）、６２…接続交換スイッチ網、
６３…一重線、６４…二重線、６５…点線、６６…演算
処理部、６７…データ／ワークメモリ、６８…デュアル
ポートメモリ、６９…バッファメモリ、７０…ＤＭＡチ
ャネル、７１…入力ＦＩＦＯ、７２…出力ＦＩＦＯ、７
３…ポート制御部、７４…ＣＰＵバスインタフェース信
号線、７５…メモリバスＡ、７６…メモリバスＢ、７７
…ＰＥリンケージ信号線、７８…プログラムメモリ、７
９…ＤＭＡチャネル制御線、８０…ＥＯ／ＯＥ変換部、
８１…ＰＥ群集合体、８２…制御計算機、８３…プリン
タ、８４…ビデオ端末、８５…ＰＥ群集合体接続交換ス
イッチ網、８６…リンケージバス、８７…並列演算装置
間リンケージ信号線、８８…ＰＥ群集合体接続交換スイ
ッチ網、８９…ＰＥ群集合体接続交換スイッチ網制御信
号線、９０…ＰＥ処理終了信号、９１…同期タイマー設
定値、９２…同期方式選択信号、９３…ＰＥ同期指令、
９４…プログラマブルタイマ、９５…ＡＮＤ論理、９６
…外部同期信号線、９７…プロセッサ、９８…メモリ素
子、９９…スイッチ網素子、１００…プリント板、１０
１…プリント板接続部、１０２…スイッチ網ドライバ
部。1 ... CPU, 2 ... memory, 3 ... external interface unit, 4 ... PE group, 5 ... PE group linkage unit, 6 ... switch network/synchronization control unit, 7 ... PE group connection exchange switch network, 8 ... external interface signal line, 9 ... CPU bus, 10 ... CPU-PE group linkage signal line, 11 ... inter-PE-group linkage signal line, 12 ... PE connection exchange switch network control signal line, 13 ... PE group connection exchange switch network control signal line, 14 ... inter-PE-group-aggregate linkage signal line, 15 ... external synchronization signal line, 16 ... PE, 17 ... inter-PE linkage signal line, 18 ... PE connection exchange switch network, 19 ... arithmetic processing unit, 20 ... program memory, 21 ... data/work memory, 22 ... input buffer A, 23 ... input buffer B, 24 ... DMA channel, 25 ... input FIFO, 26 ... output FIFO, 27 ... input/output port control unit, 28 ... memory bus A, 29 ... memory bus B, 30 ... inter-processor-element linkage signal line, 31 ... DMA channel control signal line, 32 ... memory bus C, 33 ... switch network section, 34 ... switching control unit, 35 ... connection state memory, 36 ... switch element, 37 ... connection exchange switch network input signal line, 38 ... connection exchange switch network output signal line, 39 ... switch element input signal line, 40 ... switch element output signal line, 41 ... switch element control signal line, 42 ... AND gate, 43 ... OR gate, 44 ... NOT gate, 45 ... PE (a), 46 ... arithmetic processor (a), 47 ... input buffer a-A, 48 ... input buffer a-B, 49 ... input FIFO (a), 50 ... output FIFO (a), 51 ... DMA channel (a), 52 ... communication port output (a), 53 ... PE (b), 54 ... arithmetic processor (b), 55 ... input buffer b-A, 56 ... input buffer b-B, 57 ... input FIFO (b), 58 ... output FIFO (b), 59 ... DMA channel (b), 60 ... communication port input (b), 61 ... communication port output (b), 62 ... connection exchange switch network, 63 ... single line, 64 ... double line, 65 ... dotted line, 66 ... arithmetic processing unit, 67 ... data/work memory, 68 ... dual port memory, 69 ... buffer memory, 70 ... DMA channel, 71 ... input FIFO, 72 ... output FIFO, 73 ... port control unit, 74 ... CPU bus interface signal line, 75 ... memory bus A, 76 ... memory bus B, 77 ... PE linkage signal line, 78 ... program memory, 79 ... DMA channel control line, 80 ... EO/OE conversion unit, 81 ... PE group aggregate, 82 ... control computer, 83 ... printer, 84 ... video terminal, 85 ... PE group aggregate connection exchange switch network, 86 ... linkage bus, 87 ... inter-parallel-computing-device linkage signal line, 88 ... PE group aggregate connection exchange switch network, 89 ... PE group aggregate connection exchange switch network control signal line, 90 ... PE processing end signal, 91 ... synchronization timer setting value, 92 ... synchronization method selection signal, 93 ... PE synchronization command, 94 ... programmable timer, 95 ... AND logic, 96 ... external synchronization signal line, 97 ... processor, 98 ... memory element, 99 ... switch network element, 100 ... printed board, 101 ... printed board connecting portion, 102 ... switch network driver unit.

Claims

[Claims]

1. A parallel processing device comprising a processor element group that comprises a plurality of processor elements, each processor element having an arithmetic processing unit, an input FIFO memory provided on a signal input side, and an output FIFO memory provided on a signal output side, wherein data is transmitted between the processor elements via a processor element connection exchange switch network.

2. The parallel processing device according to claim 1, wherein the processor element further comprises: a first bus connecting the arithmetic processing unit with the input FIFO memory and/or the output FIFO memory; an input buffer connected to the arithmetic processing unit by a second bus different from the first bus; and means for performing DMA transfer of data between the input buffer and the FIFO memory.

3. A parallel processing device comprising a processor element group aggregate that comprises a plurality of the processor element groups according to claim 1, wherein data is transmitted between the processor element groups via a processor element group connection exchange switch network.

4. The parallel processing device according to claim 1, wherein the processor element group aggregate has a management processor for controlling the processor element group connection switching network.

5. A parallel processing device comprising a plurality of the processor element group aggregates according to claim 4, wherein the processor element group aggregates are connected to one another by linkage signal lines between the processor element group aggregates for transmitting data between the processor element group aggregates.

6. The parallel processing apparatus according to claim 1, wherein the processor element connection exchange switch network has a structure that enables connection between arbitrary processor elements.

7. The parallel processing device according to claim 4, wherein the management processor can control a connection state of the processor element connection exchange switch network via a switch network connection control unit in accordance with a processing target.

8. The parallel processing device according to claim 4, wherein the management processor can control the connection state of the processor element connection exchange switch network via a switch network connection control unit in accordance with the progress of processing.

9. A parallel processing device, wherein a plurality of processor elements, each including a computing unit having a plurality of transmission channels and a main memory, and a switch network capable of arbitrarily controlling, under external control, the connection of some or all of the transmission channels are arranged on the same printed board, and the plurality of processors on the same printed board process information in cooperation with one another.

10. A parallel processing device, wherein n processors (n is a natural number), each having 2m transmission channels (m is a natural number), are arranged on the same printed board; m sets of switch networks are provided, each connecting, from each processor, one channel that can serve as an input channel and one channel that can serve as an output channel; and the input channel of an arbitrary processor and the output channel of an arbitrary processor can be connected in one-to-one correspondence under external control.
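As an illustration only, and not part of the claims, the one-to-one connection of output channels to input channels in claim 10 can be modeled as a small crossbar. The class name `SwitchNetwork`, its methods, and the ring pattern used in the usage note are hypothetical; the sketch models one of the m switch networks and assumes each processor contributes exactly one output channel and one input channel to it.

```python
class SwitchNetwork:
    """Hypothetical model of one switch network connecting n processors.

    Each processor has one output channel and one input channel on this
    network. The externally supplied routing must be one-to-one.
    """

    def __init__(self, n):
        self.n = n
        self.route = {}  # source processor index -> destination processor index

    def connect(self, src, dst):
        """Connect the output channel of src to the input channel of dst."""
        if not (0 <= src < self.n and 0 <= dst < self.n):
            raise ValueError("processor index out of range")
        if src in self.route:
            raise ValueError(f"output channel of processor {src} already connected")
        if dst in self.route.values():
            raise ValueError(f"input channel of processor {dst} already connected")
        self.route[src] = dst

    def configure(self, mapping):
        """Replace the whole connection state (the 'external control')."""
        self.route = {}
        for src, dst in mapping.items():
            self.connect(src, dst)

    def transmit(self, outputs):
        """Deliver outputs[src] to the input of route[src]; unconnected inputs get None."""
        inputs = [None] * self.n
        for src, dst in self.route.items():
            inputs[dst] = outputs[src]
        return inputs


# Usage: four processors connected as a ring, 0 -> 1 -> 2 -> 3 -> 0.
ring = SwitchNetwork(4)
ring.configure({0: 1, 1: 2, 2: 3, 3: 0})
ring_result = ring.transmit(["a", "b", "c", "d"])
```

With m such networks, each processor would have m independent output and input channels, which matches the 2m channels per processor stated in the claim.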

11. The parallel processing device according to claim 10, wherein each switch network between the plurality of processors is formed, from its input side to its output side, on the same integrated circuit; and, as required, the input and output channels of each of the m sets of switch networks are divided into groups of a specific number of bits, and one set of switch networks is divided into a plurality of bit slices corresponding to the groups, each integrated in an integrated circuit.
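The bit-slice division in claim 11 can be pictured with a short sketch, offered under stated assumptions rather than as the claimed circuit. The 32-bit channel width, the 8-bit slice width, and all function names are hypothetical. Each slice of a word is switched separately with the same routing, as if each slice passed through its own integrated circuit, and the slices are recombined at the destination.

```python
def split_into_slices(word, width, slice_bits):
    """Split a width-bit word into slices of slice_bits each, lowest bits first."""
    if width % slice_bits:
        raise ValueError("width must be a multiple of slice_bits")
    mask = (1 << slice_bits) - 1
    return [(word >> shift) & mask for shift in range(0, width, slice_bits)]


def join_slices(slices, slice_bits):
    """Recombine slices (lowest bits first) into one word."""
    word = 0
    for index, part in enumerate(slices):
        word |= part << (index * slice_bits)
    return word


def route_bit_sliced(words, route, width=32, slice_bits=8):
    """Deliver words[src] to route[src], switching every bit slice separately.

    Slice k of every channel is handled by 'slice chip' k; all slice chips
    are given the same one-to-one routing.
    """
    n_slices = width // slice_bits
    delivered = [[0] * n_slices for _ in words]
    for src, dst in route.items():
        parts = split_into_slices(words[src], width, slice_bits)
        for k, part in enumerate(parts):
            delivered[dst][k] = part
    return [join_slices(parts, slice_bits) for parts in delivered]


# Usage: two processors exchange one 32-bit word each.
swapped = route_bit_sliced([0x11223344, 0xAABBCCDD], {0: 1, 1: 0})
```

Because every slice carries the same routing, narrower switch circuits can be combined to serve a wide channel without changing the connection logic.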

12. The parallel processing device according to claim 10, wherein n is selected to be a power of two, 2^k; a switch network capable of connecting n processors is provided, but only n-1 processors are arranged and one processor is left unarranged; the input channel group for the unarranged processor is connected one-to-one to the output channel group for the unarranged processor of another parallel arithmetic device of the same structure, and the output channel group for the unarranged processor is connected one-to-one to the input channel group for the unarranged processor of the other parallel arithmetic device, so that the processors in the two parallel arithmetic devices can be connected to one another.

13. The parallel processing device according to claim 10, wherein n is selected to be a power of two, 2^k; a switch network capable of connecting n processors is provided, but only n-1 processors are arranged and one processor is left unarranged, the input channel group and the output channel group for the unarranged processor being used for external connection; and, for an aggregate of n such parallel arithmetic devices, the n sets of external-connection input/output channel groups are connected by a switch network of the same structure as that connecting the processors within each parallel arithmetic device, so that the processors in the n parallel arithmetic devices can be connected to one another.

14. The parallel processing device according to claim 13, wherein, in the switch network that connects the n parallel arithmetic devices, n-1 parallel arithmetic devices are connected and one parallel arithmetic device is left unconnected, the input channel group and the output channel group for the unconnected parallel arithmetic device being used for external connection to form a parallel arithmetic device aggregate; and, for a collection of up to n such aggregates, the n sets of external-connection input/output channel groups are connected by a switch network of the same structure as that connecting the processors within each parallel arithmetic device, so that the processors in the n parallel arithmetic device aggregates can be connected to one another.

15. A parallel processing method for an information processing apparatus having processor elements, each processor element including: an arithmetic processing unit; an input FIFO memory provided on a signal input side; an output FIFO memory provided on a signal output side; a first bus connecting the arithmetic processing unit and the FIFO memories; an input buffer connected to the arithmetic processing unit by a second bus different from the first bus; and means for performing DMA transfer of data between the input buffer and the FIFO memory, wherein, when data is transferred between a plurality of processor elements, the transmitting-side arithmetic processing unit writes data directly to the output FIFO memory and the information is transmitted to the receiving-side input FIFO memory; a pair of receiving-side input buffers serving as DMA transfer destinations from the receiving-side input FIFO memory is prepared for each receiving-side input FIFO memory and configured as alternate buffer memories; in a first phase, one of the pair of alternate buffer memories is used as the DMA transfer destination from the input FIFO memory and the other is used as the processing target data area of the receiving-side arithmetic processing unit; and in a second phase, the other of the alternate buffer memories is used as the DMA transfer destination from the input FIFO memory and the one is used as the processing target data area of the receiving-side arithmetic processing unit.
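The alternation of buffers in claim 15 can be sketched as a small software simulation. This is an illustrative model under stated assumptions, not the claimed hardware: the names `ReceivingElement` and `run_phases` are hypothetical, the input FIFO is modeled with a `deque`, and the DMA transfer and the arithmetic processing run one after the other here, whereas in the apparatus they would proceed concurrently because they touch different buffers over different buses.

```python
from collections import deque


class ReceivingElement:
    """Hypothetical model of a receiving processor element with alternate buffers."""

    def __init__(self):
        self.input_fifo = deque()   # receiving-side input FIFO memory
        self.buffers = [[], []]     # the pair of alternate buffer memories
        self.dma_target = 0         # index of the buffer currently receiving DMA data

    def dma_transfer(self):
        """Stand-in for DMA: drain the input FIFO into the current DMA target."""
        target = self.buffers[self.dma_target]
        while self.input_fifo:
            target.append(self.input_fifo.popleft())

    def process(self, func):
        """The arithmetic unit works only on the buffer NOT used by DMA."""
        work = self.buffers[1 - self.dma_target]
        result = [func(item) for item in work]
        work.clear()
        return result

    def next_phase(self):
        """Swap the roles of the two buffers at the phase boundary."""
        self.dma_target = 1 - self.dma_target


def run_phases(batches, func):
    """Feed one batch per phase; return the results produced in each phase."""
    element = ReceivingElement()
    results = []
    for batch in list(batches) + [[]]:       # one extra phase drains the last batch
        element.input_fifo.extend(batch)     # sender writes into the FIFO link
        results.append(element.process(func))  # data received in the previous phase
        element.dma_transfer()               # data arriving in this phase
        element.next_phase()
    return results


# Usage: two batches arrive in consecutive phases.
phase_results = run_phases([[1, 2], [3, 4]], lambda x: x * 10)
```

Data received in one phase is processed in the next, so the arithmetic unit never reads a buffer that the DMA transfer is still filling.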

16. The parallel processing method according to claim 15, wherein, when the plurality of processor elements share a group of processing tasks, synchronization signal generating means is provided that outputs to the processor elements a synchronization signal generated at intervals longer than the longest time required to execute one phase of processing; and each processor element starts the processing of one phase when the synchronization signal is generated by the synchronization signal generating means and starts the processing of the next phase when the next synchronization signal is generated.

17. The parallel processing method according to claim 15, wherein the synchronization signal generating means detects that all the processors of the processor group have completed the processing of one phase and generates the synchronization signal based on the result.

18. The parallel processing method according to claim 15, wherein the synchronization signal generating means generates the synchronization signal at a cycle longer than the longest time required for all the processors of the processor group to complete the processing of one phase.
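The two ways of generating the synchronization signal, a fixed cycle as in claim 18 and completion detection as in claim 17, can be compared with a short calculation. The sketch below is illustrative only: the function names, the `margin` parameter, the abstract time units, and the sample durations are all assumptions. `durations[k][p]` is the time processor `p` needs for phase `k`.

```python
def fixed_period_starts(durations, margin=1):
    """Phase start times when the signal uses one fixed cycle.

    The cycle is the longest single-phase time over all processors and
    phases plus a margin, so it is longer than any phase (claim 18 style).
    """
    period = max(max(phase) for phase in durations) + margin
    return [k * period for k in range(len(durations) + 1)]


def completion_detected_starts(durations):
    """Phase start times when the signal fires once all processors finish.

    Each phase lasts exactly as long as its slowest processor
    (claim 17 style).
    """
    starts = [0]
    for phase in durations:
        starts.append(starts[-1] + max(phase))
    return starts


# Usage: three phases on three processors (hypothetical time units).
sample = [[3, 5, 4], [2, 2, 6], [1, 1, 1]]
fixed = fixed_period_starts(sample)
detected = completion_detected_starts(sample)
```

In this example the fixed cycle gives start times 0, 7, 14, 21, while completion detection gives 0, 5, 11, 12. The fixed cycle needs no completion detection but waits out the worst case in every phase.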

19. The parallel processing method according to claim 15, wherein said synchronization signal generating means can change the generation period of said synchronization signal in response to a command from said management processor.