JPS63124163A - Processor array

Info

Publication number: JPS63124163A
Application number: JP61270244A
Authority: JP
Inventors: Takeshi Oki; 健大木; Teiji Nishizawa; 西澤　貞次
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1986-11-13
Filing date: 1986-11-13
Publication date: 1988-05-27

Abstract

PURPOSE: To process complicated arithmetic operations involving many variables at high speed, without increasing the number of input/output pins or arithmetic units of each processor element, by using a timing generating circuit and a plurality of processor elements. CONSTITUTION: In accordance with a synchronizing signal SYNC, a timing generating circuit 1 outputs a timing signal T0 to each processor element 2. Element PE1 reads the input data for t0, performs the corresponding operation, and outputs the result to the following element PE2. The circuit 1 then outputs a timing signal T1, and PE1 reads the input data for t1, operates on it, and outputs the result to PE2, while PE2 operates on the result it received from PE1 and outputs it to element PE3. The timing signal T2 is handled in the same way. Thus one data item is input to PE1 over the period T0 to T2 and is processed by the elements 2 in pipeline fashion, each element lagging the preceding one by one timing, and the results are output successively from element PE12 at the right end. In this way complicated arithmetic operations are carried out at high speed with a simple configuration, with each element 2 operating in time division.

Description

DETAILED DESCRIPTION OF THE INVENTION

Field of the Invention

The present invention relates to a processor array having a one-dimensional array structure, for use in the field of information processing.

Background Art

A known conventional processor array is the systolic array proposed by H. T. Kung of Carnegie Mellon University in the United States (see, for example, C. Mead and L. Conway, "Introduction to VLSI Systems", Japanese edition translated under the supervision of Takuo Sugano and Hiroyuki Sakaki, June 30, 1981, Baifukan, pp. 299-318).

FIG. 3 shows an example of this conventional processor array. FIG. 3(a) is the formula for multiplying a band matrix of bandwidth 4 by a vector, and FIG. 3(b) is a configuration diagram of a processor array that performs this multiplication, consisting of four processor elements 3 connected in a one-dimensional array.

The operation of the conventional processor array configured as described above is explained below.

Each processor element has three registers, R_A, R_x, and R_y, which hold elements of the band matrix A, the vector x, and the vector y, respectively. Initially, all registers contain 0. Within each processor element, register R_A first receives a new element from the band of matrix A, register R_x receives the contents of register R_x of the processor element to its left, and register R_y receives the contents of register R_y of the processor element to its right. The element then computes R_y + R_A × R_x and stores the result in register R_y. In this way A moves downward, x moves to the right, and y moves to the left through the processor elements, and the product y of the band matrix and the vector is output sequentially from the leftmost processor element, as shown in FIG. 4.
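The data movement described above can be sketched as a small cycle-by-cycle simulation. This is an illustrative sketch, not code from the cited reference: the function name, the parameters `p` and `q` (number of band diagonals on or below, and on or above, the main diagonal, so the number of processor elements is `p + q - 1`), the injection of x and y on alternating cycles, and the use of index tags to look up the element of A that would arrive from above are all assumptions made for the simulation. Empty registers are marked with `None` rather than 0.

```python
def systolic_band_matvec(A, x, p, q):
    """Simulate a linear systolic array computing y = A x for a band matrix A.

    A is a full n-by-n list of lists that is zero outside the band
    -(q-1) <= i-j <= p-1 (assumed convention). x tokens travel right,
    y tokens travel left, and each cell performs R_y <- R_y + R_A * R_x.
    """
    n = len(x)
    w = p + q - 1                 # number of processor elements (band width)
    c = q - p                     # cycle offset of the y stream relative to x
    x_regs = [None] * w           # each entry: (j, x_j) or None when empty
    y_regs = [None] * w           # each entry: (i, partial y_i) or None
    y = [None] * n
    for t in range(min(0, c), 2 * n + w + abs(c) + 1):
        # A finished y value leaves the leftmost element.
        leaving = y_regs[0]
        if leaving is not None:
            y[leaving[0]] = leaving[1]
        # x shifts one element to the right, y one element to the left.
        x_regs = [None] + x_regs[:-1]
        y_regs = y_regs[1:] + [None]
        # New x enters on the left and new y (initially 0) on the right,
        # each on every second cycle.
        if t % 2 == 0 and 0 <= t // 2 < n:
            x_regs[0] = (t // 2, x[t // 2])
        if (t - c) % 2 == 0 and 0 <= (t - c) // 2 < n:
            y_regs[-1] = ((t - c) // 2, 0)
        # Inner-product step wherever an x and a y occupy the same element;
        # A[i][j] stands in for the band element arriving from above.
        for k in range(w):
            if x_regs[k] is not None and y_regs[k] is not None:
                j, xv = x_regs[k]
                i, yv = y_regs[k]
                y_regs[k] = (i, yv + A[i][j] * xv)
    return y
```

With this schedule, x_j and y_i meet in element i - j + q - 1 exactly when a_ij lies inside the band, so every processor element repeats the same single operation on every cycle. That uniformity is the property the following section identifies as a limitation.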

Problems to Be Solved by the Invention

In the configuration described above, however, the kinds of input/output data and the operation performed within each processor element are the same in every cycle. As a result, the number of input/output pins between processor elements and the number of arithmetic units within each processor element must grow with the number of kinds of data to be processed.

In view of this problem, an object of the present invention is to provide a processor array that can be built from processor elements having only a small number of input/output pins and arithmetic units, even when there are many kinds of data to be processed.

Means for Solving the Problems

The present invention is a processor array comprising: a timing generation circuit that generates timing signals (T0 to Tm) from an input clock and in which the position of T0 can be set by a synchronization signal; and n processor elements cascaded in one dimension and in one direction, each of which reads input data of the same kind from the preceding stage one timing later than the preceding stage, performs a different arithmetic operation at each timing, and outputs the result to the next stage.

Operation

With the configuration described above, each processor element can read a different kind of data and perform a different arithmetic operation at each timing. Complex processing involving many variables can therefore be performed at high speed without increasing the number of input/output pins or arithmetic units of each processor element.

Embodiment

FIG. 1 shows the configuration of a processor array according to an embodiment of the present invention for the case m = 2, n = 12. In FIG. 1, reference numeral 1 denotes a timing generation circuit that generates timing signals T0 to T2 from an input clock CLK and in which the position of T0 can be set by a synchronization signal SYNC, and reference numeral 2 denotes a processor element that performs a different arithmetic operation at each of its internal timings t0 to t2.
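The behaviour of the timing generation circuit can be sketched as a modulo-(m + 1) counter that SYNC resets. This is a minimal sketch under assumptions the text does not settle: the function name is hypothetical, the cycle in which SYNC is asserted is taken to be T0 itself, and the counter is assumed to start at T0 before any SYNC arrives.

```python
from typing import Iterable, Iterator


def timing_generator(sync: Iterable[bool], m: int = 2) -> Iterator[int]:
    """Yield, for each CLK cycle, the index k of the active timing signal T_k.

    sync holds one boolean per clock cycle; True means SYNC is asserted.
    """
    slot = 0                      # assumed power-on state: T0
    for asserted in sync:
        if asserted:
            slot = 0              # SYNC fixes the position of T0 (assumption)
        yield slot
        slot = (slot + 1) % (m + 1)   # cycle through T0 .. Tm


# Usage: SYNC asserted on cycles 0 and 5 of a 7-cycle window.
slots = list(timing_generator([True, False, False, False, False, True, False]))
```

In this sketch the sequence runs T0, T1, T2, T0, T1 and is then pulled back to T0 by the second SYNC pulse, which is one way to read "the position of T0 can be set".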

The operation of the processor array of this embodiment configured as described above is explained below.

First, in accordance with the synchronization signal SYNC, the timing generation circuit 1 generates the timing signal T0 and outputs it to each processor element 2. Because each processor element 2 introduces a delay of one timing between receiving its input data and outputting its result, the internal timing signals are also connected so that each is delayed by one timing relative to the preceding element.

Consequently, as shown in FIG. 2, when the timing generation circuit 1 generates the timing signal T0, the first processor element PE1 reads the input data for t0, applies the operation for t0, and outputs the result to the next processor element PE2. Next, when the timing generation circuit 1 generates the timing signal T1, PE1 reads the input data for t1, applies the operation for t1, and outputs the result to PE2; at the same time, PE2 takes in the result that PE1 computed at t0, applies the operation for t0, and outputs the result to PE3. When the timing generation circuit 1 then generates the timing signal T2, PE1 operates on the data for t2, PE2 on the data for t1, and PE3 on the data for t0.

In this way one data item is input to PE1 over the period T0 to T2 and, as shown in FIG. 2, is processed in pipeline fashion by the processor elements 2, each lagging the preceding one by one timing, while the results are output successively from PE12 at the right end.
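The staggered schedule described above can be sketched as a short simulation. It is an illustration under stated assumptions rather than the circuit of FIG. 1: the function name and the per-slot operations in `ops` are hypothetical, every processor element is assumed to apply the same operation in a given slot, one value is read per timing, and `None` marks a pipeline bubble.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def run_pipeline(
    inputs: Sequence[float],
    ops: Sequence[Callable[[float], float]],
    n: int = 12,
) -> List[Tuple[int, int, float]]:
    """Return (clock, slot, value) for every result leaving the last element.

    inputs[c] is the value PE1 reads at clock c, where clock 0 is T0.
    ops[s] is the operation an element applies at internal timing t_s.
    """
    period = len(ops)                         # m + 1 timing slots
    latches: List[Optional[float]] = [None] * n   # output register of each PE
    results = []
    for clock in range(len(inputs) + n - 1):
        head = inputs[clock] if clock < len(inputs) else None
        stage_in = [head] + latches[:-1]      # each PE reads its predecessor
        for k, value in enumerate(stage_in):
            # Internal timing of element k+1 lags the array timing by k clocks.
            slot = (clock - k) % period
            latches[k] = None if value is None else ops[slot](value)
        if latches[-1] is not None:
            results.append((clock, (clock - (n - 1)) % period, latches[-1]))
    return results


# Usage with three hypothetical operations for t0, t1 and t2.
example_ops = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
trace = run_pipeline([1, 2, 3, 4, 5, 6], example_ops, n=12)
```

Because the internal timing of each element lags its predecessor by exactly one clock, a value read by PE1 in slot t0 is handled in slot t0 by every later element as well. A single arithmetic unit and a single input path per element can therefore serve all three kinds of data in turn.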

As described above, according to this embodiment, the timing signals generated by the timing generation circuit are connected to the processor elements with a shift of one timing per element. With this simple system configuration, each processor element operates in time division, and complex processing involving many variables can be performed.

In this embodiment the results are obtained from the rightmost processor element PE12, but a register for storing the result may instead be provided in each processor element, with the results output from each processor element through a global output bus.

Effects of the Invention

As described above, according to the present invention, complex processing involving many variables can be performed at high speed without increasing the number of input/output pins or arithmetic units of each processor element, and the practical effect is significant.

[Brief explanation of the drawing]

FIG. 1 is a configuration diagram of a processor array according to an embodiment of the present invention; FIG. 2 is a diagram explaining the operation of the embodiment; FIG. 3 shows the configuration of a conventional processor array together with the multiplication formula it can solve; and FIG. 4 is a diagram explaining the operation of the conventional example.

1: timing generation circuit; 2: processor element.

Agent: Patent attorney Toshio Nakao and one other

Claims

[Claims]

A processor array comprising: a timing generation circuit that generates timing signals (T_0 to T_m) from an input clock and in which the position of T_0 of the timing signals can be set by a synchronization signal; and n processor elements cascaded in one dimension and in one direction, each of which reads input data of the same kind from the preceding stage one timing later than the preceding stage, performs different arithmetic processing at each timing, and outputs the result to the next stage.