JP2553728B2

JP2553728B2 - Arithmetic unit

Info

Publication number: JP2553728B2
Application number: JP2043009A
Authority: JP
Inventors: 宏西川; 高志浜田; 基宏三沢; 和生佐久嶋; 美和吹野
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1990-02-23
Filing date: 1990-02-23
Publication date: 1996-11-13
Anticipated expiration: 2011-11-13
Also published as: JPH03245223A

Description

[Detailed description of the invention]

[Field of industrial application]

The present invention relates to an arithmetic unit that has a plurality of arithmetic units in a processor, identifies the data dependencies among a plurality of instructions, and operates the arithmetic units either simultaneously or sequentially.

[Prior art]

Conventionally, an architecture called a VLIW (Very Long Instruction Word) computer has been known as an arithmetic unit provided with a plurality of arithmetic units.

FIG. 4 is a block diagram of this conventional architecture. The register file 401 has data read ports rs1a and rs2a for supplying data to the first arithmetic unit 402, and a port rda for writing the operation result of the first arithmetic unit 402. The register file 401 also has data read ports rs1b and rs2b for supplying data to the second arithmetic unit 403, and a port rdb for writing the operation result of the second arithmetic unit 403. The instruction register 404 is long enough to hold two instructions so that the two arithmetic units 402 and 403 can operate at the same time, and data is stored in the instruction register 404 via the instruction bus 405. The two instructions in the instruction register 404 become commands to the first arithmetic unit 402 and the second arithmetic unit 403 via the control signals 406 and 407, respectively.

With such an architecture, up to two instructions can be executed in parallel inside the processor, which makes it possible to speed up instructions for which vectorization is not effective.

FIG. 5 shows an example of the contents of the instruction storage memory for the conventional processor. Basically, the memory stores two instructions, one in the first column and one in the second column, for each instruction word. The processor sequentially fetches the instruction word designated by the instruction pointer, stores it in the instruction register 404, and executes the instructions stored there. Taking the instruction R0 = R1 + R2 as an example, execution proceeds as follows. The contents of registers R1 and R2 read from the register file 401 are sent to the first arithmetic unit 402, where R1 and R2 are added. The operation result R0 is then input to the register file 401 and stored.

FIG. 5 also shows a state in which instructions that cannot be executed simultaneously are stored. The instruction R3 = R0 - R3 in the fourth entry of the first column cannot be executed until the instruction R0 = R1 - R2 in the third entry of the same column has been executed. Likewise, the instruction R5 = R3 - R2 in the fifth entry of the first column cannot be executed until the instruction R3 = R0 - R3 in the fourth entry of the same column has been executed. Simultaneous execution is impossible because the operation result of one instruction is used by the next instruction: until the result has been stored in the register file 401, the next instruction cannot read it from the register file 401.

Consequently, as shown in FIG. 5, nop instructions are stored in the instruction storage memory, and memory is wasted.
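The nop padding described above can be illustrated with a minimal Python sketch. It assumes a two-slot instruction word and a greedy, in-order packer that checks only whether the second instruction reads the result of the first; the names `Instr`, `reads_result_of` and `pack_two_slot` are hypothetical and do not come from the patent, and the packer is not claimed to be the compiler actually used with the conventional processor.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    dst: str   # destination register
    op: str    # "+" or "-"
    src1: str  # first source register
    src2: str  # second source register


def reads_result_of(later: Instr, earlier: Instr) -> bool:
    """True if `later` uses the register written by `earlier` as a source."""
    return earlier.dst in (later.src1, later.src2)


def pack_two_slot(program: List[Instr]) -> List[Tuple[Instr, Optional[Instr]]]:
    """Pack a sequential program into two-slot words; None stands for a nop slot."""
    words = []
    i = 0
    while i < len(program):
        first = program[i]
        second = program[i + 1] if i + 1 < len(program) else None
        if second is not None and not reads_result_of(second, first):
            words.append((first, second))  # both slots are used
            i += 2
        else:
            words.append((first, None))    # second slot is padded with a nop
            i += 1
    return words


# The dependent chain quoted from FIG. 5 in the text.
chain = [
    Instr("R0", "-", "R1", "R2"),
    Instr("R3", "-", "R0", "R3"),
    Instr("R5", "-", "R3", "R2"),
]
words = pack_two_slot(chain)
nop_slots = sum(1 for _, second in words if second is None)
```

For this chain every word carries one useful instruction and one empty slot, which is the memory waste the paragraph above refers to.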

FIG. 6 is a timing chart of the processor of FIG. 4 executing the instructions shown in FIG. 5. Reference numerals 10, 12 ... 16 correspond to the instructions stored in the first to fifth entries of the first column of FIG. 5, and 11, 13 ... 17 correspond to the instructions stored in the first to fifth entries of the second column. Here, "I" denotes the time taken to read an instruction from the instruction storage memory and store it in the instruction register 404, and "E" denotes the time taken to read data from registers in the register file 401, perform the operation, and store the operation result in the register file 401.

As can be seen from this timing chart, when the first and second arithmetic units 402 and 403 are provided, parallel operation can be expressed by lengthening the instruction word. The major feature of the VLIW computer is that it achieves higher performance than executing instructions one at a time.

[Problems to be solved by the invention]

In the conventional VLIW computer described above, the presence of nop instructions means that the instruction storage memory cannot be filled with instructions at the highest possible density. This leads to the following problems. A processor with two arithmetic units and a processor with one arithmetic unit are not compatible: compiled code for a program that runs on the one-unit processor does not run correctly on the two-unit processor, so the program must be recompiled. In addition, because the pipeline has only two stages, instruction fetch and execution, the execution time of nop instructions accounts for a large share of the total instruction execution time.

The present invention has been made in view of these conventional problems. Its object is to provide an arithmetic unit that is a processor having a plurality of arithmetic units, can also run code compiled for a single arithmetic unit without recompilation, allows the code to be compressed, and speeds up arithmetic operation.

[Means for solving the problems]

To achieve the above object, the arithmetic unit of the present invention is built on the following configuration: a register file having a plurality of data read ports and a plurality of write ports for storing operation results, all of which can be operated independently and simultaneously; input latches that store the data read from the individual ports of the register file; a plurality of arithmetic units, each of which takes the data stored in two of the input latches as inputs and performs on those two data an operation determined by a control signal; and output latches that store the output of each arithmetic unit and are connected to the write ports of the register file. The arithmetic unit has an instruction register that can hold a plurality of instruction codes, each of which specifies the input data to an arithmetic unit by register number, the storage location of the operation result by register number, and the type of operation by a control code.

Execution proceeds in a four-stage pipeline: a fetch stage that fetches instructions stored in memory into the instruction register; a decode stage that decodes the contents of the instruction register and reads the contents of the specified registers into the input latches of the arithmetic units; an execution stage in which the arithmetic units operate on these data and store the output data in the output latches; and a write stage that stores the contents of the output latches in the registers.

For two instructions that are stored in the instruction register and executed one after the other, a first determination mechanism detects that a register number specified as a storage location by the instruction codes executed simultaneously first is identical to a register number specified as holding operation input data by the instruction codes executed simultaneously next. A selector is provided that, based on this detection result, allows the data held in the output latch of an arithmetic unit to be used as the input to an arithmetic unit.

A second determination mechanism detects that a data dependency exists among the instruction codes in the instruction register when they are to be executed simultaneously. When a data dependency within the instruction codes stored in the instruction register is detected in the decode stage of the pipeline, the instruction code that must be executed first proceeds through the stages in order, while the subsequent dependent instruction code performs no operation in the execution stage and enters the execution phase at the next timing; this is applied repeatedly, in order, to each instruction code that has a data dependency. The instruction codes to be executed after this instruction are moved to the decode phase when the last-executed instruction code of this instruction enters the execution stage.

[Operation]

With such an arithmetic unit, when a plurality of instructions cannot be executed simultaneously, performance equal to that of pipelined execution one instruction at a time can be guaranteed.

[Embodiment]

An embodiment of the present invention is described in detail below with reference to FIGS. 1 to 3.

This embodiment describes the case in which two instructions are executed simultaneously, but simultaneous execution of three or more instructions can be realized by extending the following ideas.

FIG. 1 is a block diagram of an arithmetic unit according to the present invention that uses two arithmetic units. The register file 101 has four independent data read ports, rs1a, rs2a, rs1b and rs2b, and two independent write ports, rda and rdb. The input latches 102, 103, 108 and 109 store data read from the register file 101. The operation results of the first and second arithmetic units 106 and 112 are stored in the output latches 107 and 113, respectively. The output of the output latch 107 is connected to the selectors 104, 105, 110 and 111, which select the inputs of the first and second arithmetic units 106 and 112, and also to the port rda of the register file 101. Similarly, the output of the output latch 113 is connected to the selectors 104, 105, 110 and 111 and to the port rdb of the register file 101.

The instruction bus 116 is wide enough to operate the first and second arithmetic units 106 and 112 simultaneously, and the result of reading the instruction storage memory is stored in the instruction register 114. The two instructions stored in the instruction register 114 become commands for the first arithmetic unit 106 through the decode signal 118 and for the second arithmetic unit 112 through the decode signal 117. The data dependence determination circuit 115 determines whether the instructions stored in the instruction register 114 can be executed simultaneously and, if they cannot, causes them to be executed sequentially.

FIG. 2 shows an example of the contents of the instruction storage memory, and FIG. 3 is a timing chart of the execution of these instructions. In the timing chart, "I" is the time taken to store instructions from the instruction storage memory in the instruction register 114; "D" is the data transfer time taken to read data from the register file 101 and transfer it to the input latches 102, 103, 108 and 109 of the first and second arithmetic units 106 and 112; "E" is the time from the input of data to the first and second arithmetic units 106 and 112 until the operation results are stored in the output latches 107 and 113; and "W" is the time taken to store the contents of the output latches 107 and 113 in the register file 101.

The instructions I0 (R0 = R1 + R2) and I1 (R3 = R1 - R2) read from the instruction storage memory are executed as follows. First, instructions I0 and I1 are stored in the instruction register 114 at time t1. Next, the contents of registers R1 and R2 in the register file 101, the operation sources of instruction I0, are read and held in the input latches 102 and 103 of the first arithmetic unit 106. Similarly, the contents of R1 and R2, the operation sources of instruction I1, are held in the input latches 108 and 109 of the second arithmetic unit 112. This work corresponds to the D phase and ends at time t2. In the E phase, the first and second arithmetic units 106 and 112 perform the addition and the subtraction simultaneously, and the results are stored in the output latches 107 and 113 at time t3. In the W phase, the contents of the output latches 107 and 113 are written simultaneously to R0 and R3 of the register file 101; this ends at time t4. The two instructions can be executed simultaneously because the destination register of I0 differs from the source registers of I1.
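The D, E and W phases for an independent pair such as I0 and I1 can be traced with a small Python sketch. It is a behavioural model only: the function name `run_word`, the tuple layout and the register values are assumptions chosen for illustration, and no timing or selector logic is modelled.

```python
import operator

# Operations the sketch supports; the text only uses addition and subtraction.
OPS = {"+": operator.add, "-": operator.sub}


def run_word(regs, word):
    """Run one instruction word whose instructions are mutually independent.

    regs: dict mapping register name to value (stands in for the register file).
    word: list of (dst, src1, op, src2) tuples, one per arithmetic unit.
    Returns a new dict holding the register contents after the W phase.
    """
    # D phase: each arithmetic unit receives its two operands in its input latches.
    input_latches = [(regs[s1], regs[s2]) for _, s1, _, s2 in word]
    # E phase: every unit computes at the same time and fills its output latch.
    output_latches = [
        OPS[op](a, b) for (_, _, op, _), (a, b) in zip(word, input_latches)
    ]
    # W phase: the output latches are written back through separate write ports.
    new_regs = dict(regs)
    for (dst, _, _, _), value in zip(word, output_latches):
        new_regs[dst] = value
    return new_regs


# Hypothetical initial register contents.
regs = {"R0": 0, "R1": 5, "R2": 3, "R3": 0}
after = run_word(regs, [("R0", "R1", "+", "R2"),   # I0
                        ("R3", "R1", "-", "R2")])  # I1
```

With these assumed values, `after["R0"]` is 8 and `after["R3"]` is 2, and both results are produced in the same E phase because neither instruction reads the other's destination.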

Similarly, I2 (R4 = R1 + R2) and I3 (R5 = R3 + R2) can be executed simultaneously. However, I3 uses as source data the contents of register R3, which is updated by instruction I1, and the E phase of I3 proceeds at the same time as the W phase of I1. Immediately before the E phase of I3, therefore, the contents of register R3 have not yet been updated. The data dependence determination circuit 115 of FIG. 1 therefore operates so that, in the D phase of I3, the input data used in the subsequent E phase is obtained from the output latch 113 of the second arithmetic unit 112, which holds the data to be written to R3. In this way, not only can instructions I2 and I3 be executed simultaneously, but they can also be executed without disturbing the instruction execution pipeline with respect to the immediately preceding instructions I0 and I1.

The function of this data dependence determination circuit can be realized by copying the contents of the instruction register into the circuit and detecting a match between the source register numbers in the two copied instructions and the destination register numbers in the instructions of the new instruction register.
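The register-number comparison between consecutive instruction words can be sketched in Python as follows. The sketch follows the first determination mechanism as stated in the means section, comparing the destinations of the earlier word with the sources of the new word; the names `Instr` and `bypass_selects`, and the encoding of a selector setting as an arithmetic-unit index or `None`, are assumptions made for illustration and not the circuit of FIG. 1.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    dst: str   # destination register
    src1: str  # first source register
    src2: str  # second source register


def bypass_selects(
    prev_word: List[Instr], new_word: List[Instr]
) -> List[Tuple[Optional[int], Optional[int]]]:
    """Choose, for each operand of each new instruction, where its value comes from.

    An entry is the index of the arithmetic unit whose output latch holds the
    value (0 = first unit, 1 = second unit), or None when the register file
    already holds the correct value.
    """
    selects = []
    for ins in new_word:
        row = []
        for src in (ins.src1, ins.src2):
            hit = None
            for unit, prev in enumerate(prev_word):
                if prev.dst == src:
                    hit = unit  # assumption: the later slot wins if both write src
            row.append(hit)
        selects.append((row[0], row[1]))
    return selects


prev_word = [Instr("R0", "R1", "R2"), Instr("R3", "R1", "R2")]  # I0, I1
new_word = [Instr("R4", "R1", "R2"), Instr("R5", "R3", "R2")]   # I2, I3
selects = bypass_selects(prev_word, new_word)
```

Here `selects` is `[(None, None), (1, None)]`: only the first operand of I3 is taken from the output latch of the second arithmetic unit, which matches the I1 to I3 case described in the text.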

For the next instructions, I4 (R0 = R1 - R2) and I5 (R3 = R0 - R3), I5 uses the result of I4, so the two cannot be executed simultaneously. This can be detected in the D phase using the data dependence determination circuit 115. Specifically, the E phase of I5 is delayed by one stage, so that it takes place between time t5 and time t6. This one-stage delay is realized by not storing operation result data in the output latch 113 of FIG. 1; in FIG. 3 this operation is written as nop. Next, in the interval t5-t6, I4 is turned into a nop, which means that it does not store to the register file. When such a data dependency arises, making both t4-t5 and t5-t6 E phases can be realized with a circuit that continues the E phase for two consecutive periods.

From time t5 onward, execution returns to the normal phases. As a result, the following instructions I6, I7, I8 and I9 are affected, because execution has been delayed in the pipeline by one cycle. In FIG. 3, the dependency is discovered when the D phase of I4 and I5 is executed, so the operation performed for I6 and I7 in the interval t4-t5 is a nop. At this point I6 and I7 are already stored in the instruction register, and the nop is realized by inhibiting the storage of data in the input latches of the arithmetic units that would otherwise result from the D phase.

In this example, I6 (R5 = R3 - R2) and I7 (R6 = R1 - R) can be executed simultaneously, so they cause no pipeline delay. For instructions I8 and I9, the start of the I phase is suppressed, which is realized by shifting the start of the memory fetch to the next timing. In this way the disturbance of the pipeline is corrected.

Through this processing, instructions stored consecutively in the instruction register, and multiple instructions stored within the instruction register, are executed with their data dependencies satisfied even when a data dependency prevents simultaneous execution. In that case, the execution time of the dependent instructions equals that of sequential execution, and no wasted cycles occur.
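The timing behaviour described for FIG. 3 can be approximated with a short Python sketch that assigns an E-phase cycle to each instruction. It assumes that cycle n spans the interval from t(n) to t(n+1), that the first word enters its E phase in cycle 2, and that a word occupies the E stage for as many cycles as its longest internal dependency chain. The names `Instr`, `chain_positions` and `execute_cycles` are hypothetical, and the partner of I6 is a stand-in instruction because the listing of I7 is incomplete in the source.

```python
from typing import Dict, List, NamedTuple


class Instr(NamedTuple):
    name: str
    dst: str
    src1: str
    src2: str


def chain_positions(word: List[Instr]) -> List[int]:
    """Position of each instruction in the dependency chain inside one word.

    0 means the instruction runs in the word's first E cycle, 1 means it waits
    one cycle for an earlier slot of the same word, and so on.
    """
    positions: List[int] = []
    for i, ins in enumerate(word):
        pos = 0
        for j in range(i):
            if word[j].dst in (ins.src1, ins.src2):
                pos = max(pos, positions[j] + 1)
        positions.append(pos)
    return positions


def execute_cycles(words: List[List[Instr]], first_e_cycle: int = 2) -> Dict[str, int]:
    """Return the cycle in which each instruction performs its E phase."""
    e_cycle: Dict[str, int] = {}
    start = first_e_cycle
    for word in words:
        positions = chain_positions(word)
        for ins, pos in zip(word, positions):
            e_cycle[ins.name] = start + pos
        start += max(positions) + 1  # the next word waits until this one leaves E
    return e_cycle


words = [
    [Instr("I0", "R0", "R1", "R2"), Instr("I1", "R3", "R1", "R2")],
    [Instr("I2", "R4", "R1", "R2"), Instr("I3", "R5", "R3", "R2")],
    [Instr("I4", "R0", "R1", "R2"), Instr("I5", "R3", "R0", "R3")],
    # I6 is taken from the text; "X" is a hypothetical independent stand-in.
    [Instr("I6", "R5", "R3", "R2"), Instr("X", "R7", "R1", "R2")],
]
cycles = execute_cycles(words)
```

Under these assumptions I4 executes in cycle 4 (t4-t5), I5 in cycle 5 (t5-t6), and the following word in cycle 6, one cycle later than it would without the dependency. Inter-word dependencies such as I1 to I3 cost no extra cycle in this model because they are assumed to be resolved through the output-latch bypass.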

[Effects of the invention]

As described above, according to the present invention, even an object compiled for a single arithmetic unit can run on a processor having a plurality of arithmetic units, so recompilation is unnecessary. Moreover, the compiled code need not contain instructions that turn the instruction field of an arithmetic unit that cannot be executed in parallel into a nop, so the code can be compressed. Finally, because a plurality of arithmetic units operate simultaneously, arithmetic operation is faster than in a processor with a single arithmetic unit.

[Brief description of drawings]

FIG. 1 is a block diagram of an arithmetic unit according to the present invention; FIG. 2 is a conceptual diagram showing an example of the contents of the instruction storage memory; FIG. 3 is a timing chart of the instruction execution pipeline; FIG. 4 is a block diagram of a conventional arithmetic unit; FIG. 5 is a conceptual diagram showing an example of the contents of the instruction storage memory; and FIG. 6 is a timing chart of instruction execution.

101: register file; 102, 103, 108, 109: input latches; 106: first arithmetic unit; 112: second arithmetic unit; 107, 113: output latches; 114: instruction register; 115: data dependence determination circuit.

Continuation of the front page: (72) Inventor: Kazuo Sakushima, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture. (72) Inventor: Miwa Fukino, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma City, Osaka Prefecture. (56) References: Japanese Patent Laid-Open No. 62-55736 (JP, A)

Claims

(57) [Claims]

1. An arithmetic unit built on a configuration comprising: a register file having a plurality of data read ports and a plurality of write ports for storing operation results, the ports being operable independently and simultaneously; input latches that store data read from the individual ports of the register file; a plurality of arithmetic units, each having the function of taking the data stored in two of the input latches as inputs and performing on those two data an operation determined by a control signal; and output latches that store the output of each arithmetic unit and are connected to the write ports of the register file;
the arithmetic unit comprising an instruction register capable of holding a plurality of instruction codes, each of which specifies the input data to an arithmetic unit by register number, the storage location of the operation result by register number, and the operation type by a control code;
wherein execution proceeds in a four-stage pipeline consisting of a fetch stage that fetches instructions stored in memory into the instruction register, a decode stage that decodes the contents of the instruction register and reads the contents of the specified registers into the input latches of the arithmetic units, an execution stage in which the arithmetic units operate on these data and store the output data in the output latches, and a write stage that stores the contents of the output latches in the registers;
wherein, for two instructions stored and executed consecutively in the instruction register, the arithmetic unit has a first determination mechanism that detects that a register number specified as a storage location by the instruction codes executed simultaneously first is identical to a register number specified as holding operation input data by the instruction codes executed simultaneously next, and a selector that, based on the detection result, allows the data stored in the output latch of an arithmetic unit to be used as the input to an arithmetic unit;
wherein the arithmetic unit further has a second determination mechanism that detects that a data dependency exists among the instruction codes in the instruction register when they are to be executed simultaneously; and
wherein, when a data dependency within the instruction codes stored in the instruction register is detected in the decode stage of the pipeline, the instruction code to be executed first proceeds through the stages in order, the subsequent dependent instruction code performs no operation in the execution stage and enters the execution phase at the next timing, this operation being applied repeatedly, in order, to each instruction code having a data dependency, and the instruction codes to be executed after this instruction are moved to the decode phase when the last-executed instruction code of this instruction enters the execution stage, thereby enabling simultaneous execution of instructions.