JPH01295366A

JPH01295366A - Vector processing device

Info

Publication number: JPH01295366A
Application number: JP12485688A
Authority: JP
Inventors: Tomoo Aoyama; 青山　智夫; Hiroshi Murayama; 浩村山
Original assignee: Hitachi Ltd; Hitachi Computer Engineering Co Ltd
Current assignee: Hitachi Ltd; Hitachi Computer Engineering Co Ltd
Priority date: 1988-05-24
Filing date: 1988-05-24
Publication date: 1989-11-29

Abstract

(57) [Summary] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Industrial Application Field] The present invention relates to a vector processing device and, in particular, to a vector processing device that has a hierarchically structured storage unit and can transfer data between hierarchy levels without placing a burden on the program.

[Conventional technology]

In recent years, demand for large-scale technical computation has grown enormously, and vector processing devices have been developed to meet it. The main factor determining the performance of such a device is memory throughput, so designing a faster vector processing device has required devoting a great deal of effort to the logic that controls the main storage. Achieving high memory throughput also requires high-speed storage elements, which tends to make the storage unit of a vector processing device more expensive than that of a general-purpose computer. To avoid this cost, recent vector processing devices have come to be built with a hierarchically structured storage unit. One known prior-art example of a vector processing device with such a storage unit is the technique described in IEEE, CH2216-0 (1985), pp. 301-309.

In a vector processing device of this prior-art type, the storage unit comprises a main storage and a local memory, and when vector data is to be loaded into a vector register, the data must first be transferred from the main storage to the local memory. This data transfer is specified by the program. For example, the declarations REAL JN, COMMON /ONE/ JN(780), and REGFILE ONE in a program specify that the array JN is allocated to a storage unit other than the main storage.

This method makes the user responsible for memory management and places a heavy burden on programming. Demand paging, as implemented on general-purpose computers and relatively slow vector processing devices, solves this problem by having hardware control the data transfer. With current technology, however, it is difficult to achieve a data transfer rate of ten-odd GB/s or more under paging control.

There is therefore strong demand for a vector processing device that delivers several GFLOPS or more of processing performance, has a hierarchically structured storage unit, and transfers data between the levels of that hierarchy with as little programming burden as possible.

[Problem to be solved by the invention]

A prior-art vector processing device that has a hierarchically structured storage unit and processing performance on the order of several GFLOPS requires the program to specify data transfers between storage hierarchy levels. Many data transfer instructions must therefore be mixed into the program, which makes the program hard to read and places a heavy burden on the programmer.

An object of the present invention is to solve the above problems of the prior art and to provide a vector processing device in which hardware detects the transfer conditions and performs data transfers between storage hierarchy levels without lowering memory throughput, thereby reducing the burden on the program.

[Means to solve the problem]

According to the present invention, the above object is achieved by providing the following configuration and functions.

(1) A storage unit having a hierarchical structure that includes a main storage and a local memory.

(2) An instruction processing unit consisting of a processing unit that executes vector processing and a processing unit that executes prefetch processing, which transfers data from the main storage to the local memory in advance.

(3) A synchronization control mechanism provided between these two processing units.

(4) A compiler function that analyzes a program and attaches, at each logical partition point of the program, a tag that the instruction decoder can process.

A logical partition point of the program is a position at which control passes to another routine, at which the amount of vector data moved to the vector registers by vector load instructions may exceed the capacity of the local memory, or at which data in a vector register is written to main storage by vector processing and the same data is then loaded back into a vector register by a vector load instruction. By identifying these points, the compiler separates the program into n logical partitions.

(5) A compiler function that analyzes the program divided into n partitions and generates the code that drives the prefetch processing unit. That is, it deletes or nullifies vector instructions other than vector load instructions, keeps the address register setup instructions and similar instructions needed to execute the vector load instructions, and deletes or nullifies floating-point arithmetic instructions and other instructions unrelated to vector load processing.

(6) A function that adds, at each tag position, information identifying whether the next prefetch processing may be executed unconditionally or must wait for vector store processing in the vector processing unit to complete.

This additional information is processed by the instruction decoder.

(7) A compiler function that analyzes the program divided into n partitions and generates the code that drives the vector processing unit.

This conversion is the same as conventional vector code generation.

(8) A function that adds, at each tag position, an instruction that determines whether to release the prefetch processing unit from its wait state.

This release information is generated by checking whether the result of a vector store is read by a vector load.
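Compiler functions (5), (6), and (8) above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the compiler described in this publication: the `Instr` record, the mnemonics (`VLOAD`, `VSTORE`, `SETUP`, `NOP`, and so on), and both function names are hypothetical, and the dependency check compares array names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str            # hypothetical mnemonic, e.g. "VLOAD", "VADD", "SETUP"
    operand: str = ""  # hypothetical array name the instruction touches


# Assumption: only vector loads and their setup instructions survive in the
# prefetch code; everything else is nullified (function 5).
KEPT_FOR_PREFETCH = {"VLOAD", "SETUP"}


def make_prefetch_code(partition):
    """Derive the prefetch-side code for one logical partition."""
    return [ins if ins.op in KEPT_FOR_PREFETCH else Instr("NOP")
            for ins in partition]


def must_wait_for_store(prev_partition, next_partition):
    """Tag information (functions 6 and 8): the next prefetch must wait if it
    loads an array that the previous partition stores."""
    stored = {i.operand for i in prev_partition if i.op == "VSTORE"}
    loaded = {i.operand for i in next_partition if i.op == "VLOAD"}
    return bool(stored & loaded)


# Two hypothetical partitions: p1 reloads array B, which p0 stores.
p0 = [Instr("SETUP", "A"), Instr("VLOAD", "A"), Instr("VADD"), Instr("VSTORE", "B")]
p1 = [Instr("SETUP", "B"), Instr("VLOAD", "B"), Instr("FADD")]

prefetch_p0 = make_prefetch_code(p0)
wait_before_p1 = must_wait_for_store(p0, p1)
```

In this sketch the tag between `p0` and `p1` would carry "wait for vector store", because `p1` loads the array that `p0` stores.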

[Operation]

A prior-art vector processing device processes vector instructions, which specify vector processing, and scalar instructions, which specify scalar processing. Instructions that perform setup for vector instructions, in particular those that define how vector load/store instructions reference main storage, are called setup instructions and are classified as a kind of scalar instruction. Vector load/store instructions activate the memory requester and control data transfer between main storage and the vector registers. They are divided into those whose main storage reference address is given explicitly in the instruction operand and those whose address is specified implicitly. The implicitly addressed instructions are called indexed vector load/store instructions.

In the vector processing device of the present invention, the vector processing logic controls data transfer between the local memory and the vector registers with vector load instructions, and controls data transfer between main storage and the vector registers with vector store instructions and indexed vector load/store instructions. The prefetch processing logic processes vector load instructions and controls data transfer between main storage and the local memory.
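The routing just described can be summarized as a small lookup table. The sketch below is only an illustration: the unit names, the mnemonics, and the `transfer_route` function are hypothetical labels introduced here, not identifiers from this publication.

```python
MAIN = "main storage"
LOCAL = "local memory"
VREG = "vector register"

# (processing logic, instruction kind) -> (source, destination)
ROUTES = {
    ("vector", "VLOAD"): (LOCAL, VREG),          # plain vector load reads the local memory
    ("vector", "VSTORE"): (VREG, MAIN),          # vector store goes straight to main storage
    ("vector", "VLOAD_INDEXED"): (MAIN, VREG),   # indexed load bypasses the local memory
    ("vector", "VSTORE_INDEXED"): (VREG, MAIN),  # indexed store goes to main storage
    ("prefetch", "VLOAD"): (MAIN, LOCAL),        # prefetch side stages data in advance
}


def transfer_route(unit, op):
    """Return (source, destination) for an instruction, or None if the given
    processing logic does not handle that instruction kind."""
    return ROUTES.get((unit, op))
```

The same vector load instruction therefore appears in both code streams: the prefetch side moves the data from main storage to the local memory, and the vector side later moves it from the local memory to a vector register.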

Setup instructions act on both the vector processing logic and the prefetch processing logic.

The tags and other information attached at the logical partition points of the program act on the instruction decoders of the vector processing logic and the prefetch processing logic and synchronize the two processing units. The tags and synchronization information are a means of executing the program but do not themselves take part in the data processing the program specifies. For this reason, they are not referred to as "instructions" below.

The compiler converts a program into two kinds of object code: code for the vector processing logic and code for the prefetch processing logic. The OS treats the two kinds of object code as a single user job and gives them a single logical address space. When it is time to allocate CPU resources to the user job, the OS instructs the scalar processing unit of the vector processing device to start processing from the first address of the prefetch object code. It does so by rewriting the NIA field of the program status word in the scalar processing unit. In addition to scalar processing, the scalar processing unit processes setup instructions to set up the addressing registers in the vector processing logic and the prefetch processing logic. Once this setup is complete in each processing logic, the scalar processing unit starts the prefetch processing logic and then the vector processing logic. Each of the two processing logics decides, from the tag information at the head of its object code, whether to decode instructions or to idle, and acts accordingly. Initially, the prefetch processing logic always enters instruction decoding, while the vector processing logic decides from the tag information whether to idle.

When the prefetch processing logic detects a vector load instruction, data is transferred from main storage to the local memory. When the prefetch processing logic detects a tag, it generates a release signal, and this signal counts up the semaphore in the synchronization control mechanism between the vector processing logic and the prefetch processing logic.

This semaphore is counted down when the vector processing unit moves from idling to instruction decoding.

If the tag information at a tag position in the prefetch object code indicates that the next prefetch processing may be executed unconditionally, the prefetch processing unit goes on to process the next logical partition of the object code. At that point, the scalar processing unit must already have finished setting up the address registers in the prefetch processing unit. The compiler guarantees this. The address registers in the prefetch processing unit may be duplicated as two banks. Without such duplicated hardware, software must use different register numbers to ensure that the setup processing of the scalar unit does not destroy the contents of registers that prefetch processing is still referencing.

If the tag information at a tag position in the prefetch object code indicates that the next prefetch processing must wait for vector processing, the prefetch processing unit enters the idling state. To release it, when the vector processing unit reaches a wait-release indication for the prefetch processing unit in its object code, the instruction decoder on the vector processing side sends a release signal to the instruction decoder on the prefetch processing side. This release signal does not operate the semaphore described above.

Prefetch processing and vector processing are executed in an overlapped manner in their respective processing units when there is no dependency between them on the vector data being loaded.

FIG. 2 illustrates how prefetch processing and vector processing are executed, and is described below.

In FIG. 2, PFn (n = 0, 1, 2, ...) denotes the execution of prefetch processing and VPn (n = 0, 1, ...) denotes the execution of vector processing.

The numbers in the row labeled Semaphore are the contents of the semaphore, that is, its count value, and Δ is the signal propagation time between the prefetch processing unit and the vector processing unit. This propagation time is drawn much longer than it actually is.

In FIG. 2, prefetch processing PF0 is executed first, and when it completes at the tag position, the semaphore is incremented by 1. The vector processing unit tests the semaphore value and, if it is positive, starts decoding vector instructions. After PF0 completes, if the tag information does not call for waiting for a release signal from the vector processing unit, prefetch processing PF1 starts in the next cycle. Processing continues in the same way through PF2; suppose that the tag after the completion of PF2 indicates waiting for vector processing. The prefetch processing unit then enters the wait state according to that tag information. When vector processing VP0 completes, the wait state is released; the prefetch processing unit executes the next prefetch processing, PF3, and the vector processing unit executes vector processing VP1. Throughout, the semaphore is incremented by 1 on completion of each prefetch processing and decremented by 1 at the start of each vector processing, giving the values shown in FIG. 2.
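The semaphore behavior described above (count up when a prefetch partition completes at a tag, count down when the vector processing unit leaves idling) can be sketched as a small model. This is an illustration under stated assumptions: the class, the event names `PF_DONE` and `VP_TRY`, and the event order are hypothetical, and the resulting counts are not the values drawn in FIG. 2.

```python
class Semaphore:
    """Counter in the synchronization control mechanism (hypothetical model)."""

    def __init__(self):
        self.count = 0

    def release(self):
        # Prefetch side completed a partition at a tag: count up.
        self.count += 1

    def try_acquire(self):
        # Vector side tests the value and leaves idling only if it is positive.
        if self.count > 0:
            self.count -= 1
            return True
        return False


def run_trace(events):
    """Replay a list of events and record (event, started, count) per step.

    started is True/False for a vector-side attempt and None otherwise.
    """
    sem = Semaphore()
    trace = []
    for event in events:
        if event == "PF_DONE":
            sem.release()
            started = None
        elif event == "VP_TRY":
            started = sem.try_acquire()
        else:
            raise ValueError(f"unknown event: {event}")
        trace.append((event, started, sem.count))
    return trace


# The vector side tries too early, then starts once a prefetch has completed.
trace = run_trace(["VP_TRY", "PF_DONE", "VP_TRY", "PF_DONE", "PF_DONE", "VP_TRY"])
counts = [count for _, _, count in trace]
```

The model leaves out the separate wait-release signal from the vector side to the prefetch side, which, as stated earlier, does not operate the semaphore.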

The above is the general operation of the vector processing device of the present invention. By pipelining prefetch processing and vector processing, the invention allows vector load processing to overlap vector arithmetic processing. The degree of overlap is thus extended from the overlap of instruction execution stages achieved by chaining control in prior-art vector processing devices to the overlap of whole logical partitions of a program.

Because prefetch processing involves less work and takes less time than vector processing, prefetch processing for several jobs can be performed during multijob execution. The OS therefore needs to perform additional address management for prefetch processing.

[Example]

An embodiment of the vector processing device according to the present invention is described in detail below with reference to the drawings.

In this embodiment, the local memory has the same structure as the vector registers in the vector processing unit, and the prefetch processing unit manages the access rights to each area corresponding to a vector register. An area becomes "data confirmed" when prefetch processing into it completes, and becomes "empty" when the vector processing unit has read the data from the area and written it to a vector register.

Because prefetch processing runs ahead of vector processing, when the prefetch processing unit is about to execute a vector load instruction, the local memory area into which the vector data read from main storage is to be written may still be in the "data confirmed" state. In that case, the instruction decoder in the prefetch processing unit suppresses the start of the vector load instruction. The suppression is not lifted until the vector processing unit has read that area of the local memory.
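The two-state management of a local memory area and the suppression of a prefetch-side vector load can be sketched as follows. This is a minimal model, not the circuit of the embodiment: the class and method names are hypothetical, and returning `False` stands in for the suppression of the vector load.

```python
from enum import Enum


class AreaState(Enum):
    EMPTY = "empty"
    DATA_CONFIRMED = "data confirmed"


class LocalMemoryArea:
    """One local memory area corresponding to a vector register (hypothetical model)."""

    def __init__(self):
        self.state = AreaState.EMPTY
        self.data = None

    def prefetch_load(self, data):
        # Prefetch side: a load into an area that still holds unread data
        # is suppressed (modeled here by returning False).
        if self.state is AreaState.DATA_CONFIRMED:
            return False
        self.data = list(data)
        self.state = AreaState.DATA_CONFIRMED
        return True

    def vector_read(self):
        # Vector side: reading the data out makes the area empty again.
        if self.state is AreaState.EMPTY:
            return None
        data, self.data = self.data, None
        self.state = AreaState.EMPTY
        return data


area = LocalMemoryArea()
first = area.prefetch_load([1.0, 2.0])   # accepted: the area was empty
second = area.prefetch_load([3.0, 4.0])  # suppressed: data not yet read
read_back = area.vector_read()           # vector side consumes the data
third = area.prefetch_load([3.0, 4.0])   # accepted again
```

In hardware the suppressed load would simply be held until the area is released rather than retried by software; the retry in the usage lines is only a convenience of the model.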

As described above, this embodiment manages each local memory area with the two states "data confirmed" and "empty", but it could also be extended to support chaining control in which writing and reading proceed simultaneously, as with a vector register.

FIG. 1 is a block diagram showing the schematic configuration of a vector processing device according to an embodiment of the present invention, FIG. 3 is a block diagram of the prefetch processing unit, FIGS. 4 and 5 are block diagrams of the local memory state management unit, and FIG. 6 is a block diagram of the data transfer processing unit that controls data transfer between the local memory and the vector registers in the vector processing unit. In FIG. 1 and FIGS. 3 to 6, 1 is the prefetch decoder, 2 is the vector instruction decoder, 3 is the instruction read logic for prefetch processing, 4 is the vector address generation logic for prefetch processing, 5 is the semaphore, 6 is the vector instruction read logic, 7 is the address generation logic for vector processing, 8 is the data transfer processing unit between the local memory and the vector registers, 9 is the address table for local memory access, 10 is the storage control unit, 11 is the main storage, 15 is the switching logic, 16 is the local memory, 17 is the local memory state management unit, 18 is the vector register, 19 is the vector register state management unit, 112, 123, 306, and 411 are adders, 114 is a counter, 115 and 400 are comparison circuits, 202 is a priority circuit, and 403 is an encoder.

In FIG. 1, areas 12 to 14 of the main storage 11 hold the instruction sequence for prefetch processing, the instruction sequence for vector processing, and the vector data, respectively.

The vector processing device shown in FIG. 1 first reads the instructions for prefetch processing using the instruction read logic 3. The instruction read logic 3 receives the instructions read from the main storage 11 through the storage control unit 10 over path 50 and sends them to the prefetch decoder 1 over path 51. In this operation, fetch addresses and data are carried on path 50, and directions from the decoder 1 and fetch data from the instruction read logic 3 are carried on path 51. Paths 50 and 51 each consist of multiple signal lines but are drawn as single lines in FIG. 1 to simplify the drawing.

When the prefetch decoder 1 decodes the fetched data and detects a vector load instruction for prefetch processing, it activates the vector address generation logic 4 over path 52. The vector address generation logic 4 generates the addresses for reading the vector data from the main storage 11 and sends them onto path 53; at the same time, it sets the base address and the address increment for that vector data in the address table 9 entry for the destination local memory area. The vector processing unit uses these addresses to read the local memory 16. The storage control unit 10 reads the vector data from the main storage 11 using the addresses sent on path 53. The vector data read out is written to the local memory 16 by way of the switching logic 15 and path 54. The switching logic 15 is operated by the sink information that the instruction read logics 3 and 6 and the address generation logics 4 and 7 attach to each address. In this embodiment the local memory 16 has the same data layout as the vector register 18, but it is not limited to this structure. The state management unit 17 of the local memory 16 consists of a set of flip-flops that track, for each area of the local memory 16, whether its data has been confirmed. Each flip-flop is set when data is confirmed in the corresponding area and reset when the area is released by an access from the vector processing unit. When data has been written to every area of the local memory 16 and no further writing is possible, the state management unit 17 sends an inhibit signal over path 55 to the vector address generation logic 4 for prefetch processing.

In response to this inhibit signal, the vector address generation logic 4 suspends address generation for prefetch.
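The flip-flop set in the state management unit 17 and the inhibit signal on path 55 can be modeled with a bitmask, one bit per local memory area. The sketch below is only an illustration: the class name, method names, and the number of areas are assumptions introduced here.

```python
class LocalMemoryState:
    """Bitmask model of the per-area flip-flops (hypothetical)."""

    def __init__(self, n_areas):
        self.n_areas = n_areas
        self.bits = 0  # bit i set means area i is in the "data confirmed" state

    def set_confirmed(self, area):
        # Set when prefetched data is confirmed in the area.
        self.bits |= 1 << area

    def release(self, area):
        # Reset when the vector processing unit has read the area.
        self.bits &= ~(1 << area)

    def is_confirmed(self, area):
        return bool((self.bits >> area) & 1)

    def inhibit(self):
        # Inhibit signal: asserted while every area holds unread data.
        return self.bits == (1 << self.n_areas) - 1


state = LocalMemoryState(n_areas=2)  # two areas is an arbitrary choice
state.set_confirmed(0)
inhibit_with_one_area = state.inhibit()
state.set_confirmed(1)
inhibit_when_full = state.inhibit()
state.release(0)
inhibit_after_release = state.inhibit()
```

Address generation for prefetch would be held off only while `inhibit()` is true, and would resume as soon as any area is released.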

When the prefetch decoder 1 detects a tag in the prefetch instruction sequence, it counts up the semaphore 5 over path 56.

If the next prefetch processing must wait for the processing of a vector instruction sequence to complete, the prefetch decoder 1 sets a flip-flop inside itself and stops decoding instructions. This wait state continues until a reset signal is sent on path 57 from the address generation logic 7 for vector processing.

The vector instruction decoder 2 uses paths 59 and 60 and the vector instruction read logic 6 to read the vector processing instruction sequence from the main storage 11 through the storage control unit 10. Whether the vector processing instruction sequence may be read is decided inside the vector instruction decoder 2 by reading the value of the semaphore 5 over path 58. Like paths 50 and 51, paths 59 and 60 consist of multiple signal lines. When the vector instruction decoder 2 detects a vector load instruction in the instruction sequence it has read, it activates, over path 62, the data transfer processing unit 8 between the local memory 16 and the vector register 18. When it detects a vector access instruction other than a vector load instruction, it activates the address generation logic 7 for vector processing over path 61.

The data transfer processing unit 8 accesses the address table 9 over path 63 to determine which area of the local memory 16 holds the vector data to be accessed and, at the same time, checks the busy status of the destination vector register with the vector register state management unit 19 over path 64.

The data transfer processing unit 8 also checks, with the local memory state management unit 17 over path 69, whether vector data has been written to the relevant area of the local memory 16. If the vector register 18 can be written, the data transfer processing unit 8 accesses that area of the local memory 16 over path 65, sending the addresses for reading the vector data. The data read from the local memory 16 is written to the vector register 18 by way of path 66 and the selector 20. The selection information for the selector 20 is generated by the vector instruction decoder 2 from the operation code of the instruction. When reading of the local memory 16 is complete, the data transfer processing unit 8 sends an area reset signal for the local memory over path 67. The state of the vector register 18 is managed by the state management unit 19. When a resource in the vector processing unit has used the vector register 18, that resource generates a register-free signal, which is sent over path 68 and resets the flip-flop in the vector register state management unit 19 that holds the state of the corresponding register area of the vector register 18. The resources in the vector processing unit are omitted from FIG. 1.

FIG. 3 is a block diagram showing details of the prefetch decoder 1, the instruction read logic unit 3 for prefetch processing, the vector address generation logic unit 4 for prefetch processing, and the address table 9 for local memory access in the vector processing device shown in FIG. 1, and is described below. In FIG. 3, the same reference numerals as in FIG. 1 denote the same components.

In FIG. 3, an instruction for prefetch processing is sent from the instruction read logic unit 3 shown in FIG. 1 via path 51b and is set in register 100.

Decoder 101 decodes the operation code portion of the instruction set in register 100.

The prefetch processing unit decodes vector load instructions to be prefetched, address register setup instructions for those vector load instructions, and tags. Here a tag is also decoded as a kind of instruction, and all of these are decoded by decoder 101.

When decoder 101 detects an address register setup instruction, it transmits a selection signal to switching circuit 102 through path 150. As a result, the address information in the operand of the instruction in register 100 is sent to registers 103 to 105. Registers 103 to 105 are assumed to hold the vector length in words, the vector base address, and the vector increment address, respectively.

Although the address information is described here as being obtained directly from the instruction operand, it is not necessarily limited to the immediate type.

When decoder 101 decodes a tag, it counts up the semaphore 5 via path 56. The semaphore 5 is implemented as an up/down counter. If the tag that was read indicates a wait for vector processing, decoder 101 sets flip-flop 106. The output of flip-flop 106 is inverted by inverter 107 and input to AND circuit 108. Flip-flop 106 is not cleared until a reset signal is sent via path 57 from the address generation logic unit 7 of the vector processing unit.
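The tag handling described above can be sketched in software as follows. This is only a minimal behavioral model, not the circuit itself: the class and method names are hypothetical, and the point at which the semaphore counts down is an assumption, since the text here only states that the semaphore is an up/down counter.

```python
class PrefetchSync:
    """Toy model of semaphore 5 and flip-flop 106; all names are illustrative."""

    def __init__(self) -> None:
        self.semaphore = 0       # semaphore 5, an up/down counter
        self.wait_flag = False   # flip-flop 106

    def decode_tag(self, wait_for_vector: bool) -> None:
        # Decoder 101 counts the semaphore up for every tag it decodes.
        self.semaphore += 1
        if wait_for_vector:
            self.wait_flag = True

    def vector_side_reset(self) -> None:
        # Reset signal on path 57 from address generation logic unit 7.
        self.wait_flag = False
        # Assumption: the vector side counts the semaphore down at this point.
        if self.semaphore > 0:
            self.semaphore -= 1

    def load_may_start(self) -> bool:
        # Inverter 107 feeding AND circuit 108: a set wait flag blocks the load.
        return not self.wait_flag
```

In this model a prefetch vector load is held back from the moment a waiting tag is decoded until the vector side sends its reset.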

When decoder 101 decodes a vector load instruction, it sends a start signal onto path 152. This start signal is applied to register 110 as its set signal via AND circuit 108 and OR circuit 109. The start signal that has passed through AND circuit 108 is simultaneously applied to selector 111 via path 153.

As a result, selector 111 sends the contents of register 104 to adder 112, and the contents of register 104 pass through adder 112 and are stored in register 110. When the start signal on path 153 turns off, selector 111 sends the contents of register 105 to adder 112. A release signal arrives on path 53c from the storage control unit 10 shown in FIG. 1. This release signal indicates that the storage control unit 10 has processed a request, and it is applied to register 110 as a set signal via AND circuit 113 and OR circuit 109. Consequently, each time a release signal arrives on path 53c, the value in register 105 is added to the value in register 110. That is, the base address from register 104, with the vector increment successively added to it, is sent out on path 53a. The signals on path 53c and path 153 count up counter 114, whose output is compared by comparison circuit 115 with the vector length in register 103. The comparison result is "1" when the two match and "0" otherwise, and is sent out on path 155. A "1" on path 155 resets counter 114, serves as a set signal for register 100, and is input to OR circuit 117. The same signal is also latched by flip-flop 116, inverted by inverter 118, and input to AND circuit 113 via path 154. This stops vector address generation.
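The address sequence produced by this logic can be illustrated with a short sketch. It assumes that addresses, increments, and the vector length are plain integers and that the vector length is at least 1; the function name is hypothetical. The comments map each step to the circuit elements named above.

```python
from typing import Iterator


def prefetch_addresses(length: int, base: int, increment: int) -> Iterator[int]:
    """Yield the addresses sent out on path 53a for one vector load."""
    if length < 1:
        raise ValueError("vector length must be at least 1")
    address = base   # register 110 loaded from register 104 by the start signal
    count = 1        # counter 114 counts the start signal on path 153
    yield address
    while count != length:        # comparison circuit 115 against register 103
        address += increment      # adder 112 adds register 105 on each release
        count += 1                # counter 114 counts the release on path 53c
        yield address


# Usage: a 4-element vector at 0x1000 with an increment of 8.
addresses = list(prefetch_addresses(4, 0x1000, 8))
```

The sequence is simply base, base + increment, base + 2 × increment, and so on, ending once as many addresses as the vector length have been generated.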

With the above operation, address generation for the vector load instruction is complete, and at that point a signal is sent out on path 155. In addition, when decoder 101 detects an address setup instruction, it sends a signal onto path 51a. The signals on paths 155 and 51a are ORed in OR circuit 117, and the result is ANDed in AND circuit 119 with the inverted output of flip-flop 106. The output of AND circuit 119 indicates completion of the instruction decoded by decoder 101. Tags, however, are not counted as instructions here. This instruction completion information is ORed in OR circuit 120 with the reset signal on path 57 and sent out on path 156 as the set signal for register 121.

Register 122 holds a preset instruction word length. Adder 123 adds the preset instruction word length in register 122 to the value in register 121, and the result is sent onto path 50 at the set timing when the signal on path 156 is "1". Sink information is sent to the storage control unit by widening the signal lines and sending a predetermined extra signal value whenever a signal is sent onto paths 50 and 53a.

FIGS. 4 and 5 show details of the local memory 16, the local memory state management circuit 17, the switching logic unit 15, and the address table 9 shown in FIG. 1, and are described below. Because these figures are related to FIG. 3, some of the same logic appears in more than one figure. Identical paths are given identical reference numerals.

FIG. 4 mainly shows the configuration that controls whether a local memory area is writable. FIG. 5 shows the configuration that controls writing data into that area and reading data from that area by the vector processing unit.

In FIG. 4, the flip-flops 200 control the writable state of the individual areas of the local memory 16, with "0" indicating writable. When all the flip-flops 200 are "1", the output of AND circuit 201 is "1" and the signal on path 162 is "0". This "0" on path 162 acts on AND circuit 108 shown in FIG. 3 and suppresses the start of the vector load instruction.
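Taken together with the wait flag described for FIG. 3, the condition under which a prefetch vector load may start can be written as a small boolean function. This is a sketch under the assumption that AND circuit 108 combines exactly these three inputs; the function and parameter names are hypothetical.

```python
from typing import Sequence


def load_start_enabled(load_decoded: bool,
                       wait_flag: bool,
                       area_busy_flags: Sequence[int]) -> bool:
    """Model of the gating at AND circuit 108 (illustrative names)."""
    # AND circuit 201 outputs 1 only when every flip-flop 200 is 1;
    # path 162 carries the inverse, so 1 means at least one area is writable.
    any_area_writable = not all(area_busy_flags)
    # Start signal on path 152, inverted flip-flop 106, and path 162.
    return bool(load_decoded and not wait_flag and any_area_writable)
```

For example, with busy flags `[1, 0, 1]` and no pending wait, a decoded vector load starts, whereas with `[1, 1, 1]` it is held until an area is released.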

When more than one of the flip-flops 200 is "0", priority circuit 202 determines their priority order. The result is encoded by encoder 203 and sent out on path 250. This code acts on switching circuit 204 and controls the transfer of the vector base value and vector increment value in registers 104 and 105 to tables 9a and 9b in the address table 9. The vector base value and the vector increment value are thereby set in address tables 9a and 9b, respectively. The code on path 250 is at the same time decoded by decoder 205, which sets the flip-flop 200 corresponding to the local memory area to "1" to indicate that the area is not writable. The flip-flop 200 is reset via path 252 once the vector processing unit has read the local memory 16 and the area is no longer needed. The code on path 250 is also set in register 206 and then sent via path 53b to register 303 in FIG. 5, described later. This information is used for switching when data is written into the local memory area.
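The allocation of a local memory area described above can be modeled roughly as follows. The sketch assumes that the priority circuit picks the lowest-numbered writable area, which the text does not specify, and the class and method names are hypothetical.

```python
from typing import List, Optional, Tuple


class LocalMemoryAreas:
    """Toy model of flip-flops 200 and address tables 9a and 9b."""

    def __init__(self, n_areas: int) -> None:
        self.busy: List[int] = [0] * n_areas   # flip-flops 200, 0 = writable
        self.table: List[Optional[Tuple[int, int]]] = [None] * n_areas

    def allocate(self, base: int, increment: int) -> Optional[int]:
        if all(self.busy):
            return None                 # no writable area: load start suppressed
        # Priority circuit 202 and encoder 203.
        # Assumption: the lowest-numbered writable area wins.
        area = self.busy.index(0)
        self.busy[area] = 1             # decoder 205 marks the area not writable
        self.table[area] = (base, increment)   # tables 9a and 9b
        return area

    def release(self, area: int) -> None:
        self.busy[area] = 0             # reset via path 252 after the read
```

The returned area number plays the role of the code on path 250, which later selects the area that fetched data is written into.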

Next, in FIG. 5, vector data read from the main storage unit 11 is stored in register 300 via path 350. Similarly, sink information and advance information arrive via paths 351 and 352. The data stored in register 300 is distributed to the request source according to the sink information, which passes through register 301 and acts on the switching logic unit 15. Path 54 is the path that distributes fetched data to the local memory. Meanwhile, as described above, information from encoder 203 in FIG. 4 specifying the local memory area to be written propagates on path 53b.

This information acts on switching circuit 302 via register 303, which provides signal delay, and controls the writing of the data on path 54 into the individual areas of the local memory 16. In FIG. 5, the areas of the local memory 16 are shown as areas 0 to n.

Register 305 is initially set to "0", and its value is incremented by 1 by adder 306 each time data is read from the main storage unit 11. The value of register 305 therefore indicates the address within the local memory area to be written. This value is distributed by switching circuit 306 for access to the individual areas of the local memory 16.

The address of the local memory to be read is sent from the vector processing unit on path 65a.

Similarly, information used to select the area of the local memory 16 to be read is sent from the vector processing unit on path 65b. Based on these pieces of information, the vector data read from the local memory 16 is sent to the vector register 18 via path 66.

As described with reference to FIG. 3, a vector address generation completion signal is sent from comparison circuit 115 on path 155. This completion signal passes through register 304, which provides signal delay, and is stored by switching circuit 307 in the one of the registers 308 that corresponds to the area of the local memory 16 being written. The registers 308 as a whole, together with the flip-flops 200 shown in FIG. 4 as a whole, constitute the local memory state management unit 17 shown in FIG. 1.

FIG. 6 is a block diagram showing details of the address generation logic unit 7 for vector processing and the data transfer processing unit 8 shown in FIG. 1, and is described below.

In FIG. 6, as already described for FIG. 4, register 9a holds the vector base address and register 9b holds the vector increment address.

Paths 450 and 451 carry the base address and increment address needed to process a vector instruction that references main storage and is processed by the vector processing unit. The source of this address data is the address data that setup instructions in the scalar processing unit have set in registers in the vector processing unit.

When the logic circuit shown in FIG. 6 is started by the vector instruction decoder 2 shown in FIG. 1 via path 452, comparison circuit 400 checks whether the prefetched vector data is present in the local memory 16. Specifically, comparison circuit 400 checks whether the base address and the increment address each match the stored values, and AND circuit 401 checks that both addresses match.
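The lookup just described amounts to matching a (base address, increment) pair against the entries of the address table. A minimal sketch follows, assuming the table is represented as a list with one optional (base, increment) tuple per local memory area; this representation and the function name are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def find_prefetched_area(address_table: Sequence[Optional[Tuple[int, int]]],
                         base: int,
                         increment: int) -> Optional[int]:
    """Return the local memory area holding the requested vector, if any."""
    for area, entry in enumerate(address_table):
        if entry is None:
            continue                                   # no vector recorded here
        stored_base, stored_increment = entry
        base_match = stored_base == base               # comparison circuit 400
        increment_match = stored_increment == increment
        if base_match and increment_match:             # AND circuit 401
            return area                                # encoded by encoder 403
    return None
```

A vector is treated as already prefetched only when both values match, so a load with the same base address but a different increment does not hit.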

For a vector load instruction the output of OR circuit 402 is "1", and for a vector store instruction it is "0". The output of OR circuit 402 is sent via path 67a to the local memory state management circuit 17 shown in FIG. 1. The output of AND circuit 401 is encoded by encoder 403 and sent through path 65b to the local memory 16 of FIG. 1.

The sink of path 65b is selector 311 shown in FIG. 5.

Register 404 holds the vector length. Logic circuit 405 is a counter that increments the data in register 407 by 1 every cycle. The output of register 407 is sent via path 65a to the local memory 16 of FIG. 5 and serves as the local memory reference address. At the same time, the values of registers 404 and 407 are compared by comparison circuit 406, and the result is sent to the storage control unit 10 via path 70b.
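The read side can be pictured as a simple counting loop over one local memory area. The sketch below assumes that register 407 starts at 0 and that the read stops when its value equals the vector length; neither detail is stated explicitly in the text, and the function name is hypothetical.

```python
from typing import List, Sequence


def read_area(area_words: Sequence[int], vector_length: int) -> List[int]:
    """Copy vector_length words from one local memory area to a vector register."""
    vector_register: List[int] = []
    address = 0                                  # register 407 (assumed to start at 0)
    while address != vector_length:              # comparison circuit 406 vs register 404
        vector_register.append(area_words[address])   # reference address on path 65a
        address += 1                             # counter 405, one step per cycle
    return vector_register
```

For instance, `read_area([10, 20, 30, 40], 3)` copies the first three words of the area.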

For a vector store instruction, selector 410 first sets the base address on path 450 in register 412 through adder 411 and then sends the vector increment address on path 451 to the adder.

The result of this addition is stored in register 412 and sent to the storage control unit 10 through path 70a. For an indexed vector instruction, the index value on path 455 is used instead of the vector increment value on path 451.

The operation of selector 410 is directed by the vector instruction decoder 2 shown in FIG. 1.

[Effect of the invention]

As described above, according to the present invention, the vector data fetch operation, which contributes most to performance in vector processing, can be overlapped with vector arithmetic processing in units of the logical sections of a program.

The present invention thereby makes it possible to overlap processing over a wider range than the overlap of instruction execution stages achieved by chaining control in prior-art vector processing devices. In particular, in a vector processing device having a hierarchical storage device, overlapping vector load processing with vector arithmetic processing speeds up the reading of vector data. Consequently, according to the present invention, the main storage unit of a vector processing device can be built from relatively slow storage elements, and the processing device as a whole can be built at low cost.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the schematic configuration of a vector processing device according to an embodiment of the present invention; FIG. 2 is a diagram illustrating how prefetch processing and vector processing are executed; FIG. 3 is a block diagram of the prefetch processing unit; FIGS. 4 and 5 are block diagrams of the local memory state management unit; and FIG. 6 is a block diagram of the data transfer processing unit.

1: prefetch decoder; 2: vector instruction decoder; 3: instruction read logic unit for prefetch processing; 4: vector address generation logic unit for prefetch processing; 5: semaphore; 6: vector instruction read logic unit; 7: address generation logic unit for vector processing; 8: data transfer processing unit; 9: address table for local memory access; 10: storage control unit; 11: main storage unit; 15: switching logic unit; 16: local memory; 17: local memory state management unit; 18: vector register; 19: vector register state management unit.

Claims

[Claims]

1. A vector processing device comprising: a storage unit consisting of a plurality of hierarchy levels; a plurality of logic units that decode vector instructions that reference the main storage unit; a logic unit that performs synchronization control among the plurality of logic units that decode the vector instructions; and a data path provided between any hierarchy level of the storage unit and a vector register.

2. A vector processing device comprising: a storage unit consisting of a plurality of hierarchy levels; a plurality of logic units that decode vector instructions that reference the main storage unit; a logic unit that performs synchronization control among the plurality of logic units that decode the vector instructions; a logic unit that stores an index of the vector data held in the storage unit; a logic unit that manages whether the vector data held in the storage unit can be accessed; and a data path provided between any hierarchy level of the storage unit and a vector register.