JP4789269B2 - Vector processing apparatus and vector processing method

Info

Publication number: JP4789269B2
Application number: JP2008102198A
Authority: JP
Inventors: 尊博内田
Original assignee: NEC Computertechno Ltd
Current assignee: NEC Computertechno Ltd
Priority date: 2008-04-10
Filing date: 2008-04-10
Publication date: 2011-10-12
Anticipated expiration: 2028-04-10
Also published as: JP2009252133A

Description

The present invention relates to vector operation processing.

In general, a vector processing device includes a plurality of vector operation registers, which hold vector data loaded from main memory and intermediate results produced during vector operations, and a vector operation unit that operates on the vector data held in those registers.

The time from issuing a memory access until the read data returns is called the memory access TAT (turnaround time). In recent vector computers, as operating clocks have become faster, the memory access TAT has tended to grow relative to instruction processing time.

Patent Document 1 describes a vector processing device that, in order to speed up loads from main memory into the vector operation registers, includes a load buffer between the main memory and the vector operation registers for temporarily storing vector data. The load buffer holds the data until two conditions are both met and the vector data is transferred to the vector operation register: all elements of the vector data have arrived, and the destination vector operation register resource is free.

Patent Document 2 describes a technique whose aim is to provide a vector processing device that avoids an increase in the hardware size of the vector data buffer and improves its utilization. This vector processing device is characterized by including array system count holding means, which holds the number of array systems of the array data specified by an instruction, and division control means, which divides the vector data buffer among the array systems according to the held value and controls the input and output of the array data.

Patent Document 1: JP 2005-25693 A
Patent Document 2: JP-A-6-274526

In recent vector computers, as operating clocks have become faster, the time from accessing memory until the read data returns (the memory access TAT) has tended to grow relative to instruction processing time.

The load buffer hides the memory access TAT by temporarily buffering the load data returned from memory in response to vector load instructions, which are issued ahead of the arithmetic instructions so that the arithmetic units are used effectively. Hiding a relatively longer memory access TAT requires a larger load buffer. To make good use of limited chip area, the load buffer must be used efficiently so that growth in its capacity is restrained.

A vector processing device with a load buffer starts the transfer from the load buffer to the vector operation register only after confirming that all elements specified by the vector load instruction have been stored in the load buffer. With this approach, a load from addresses for which memory interleaving is ineffective, or a vector load instruction with a large number of elements, takes a long time before every element is stored in the load buffer. As a result, the transfer from the load buffer to the vector operation register is delayed, and so is the execution of subsequent instructions that use the load data.

An object of the present invention is to provide a system that improves overall system performance by improving the utilization efficiency of the load buffer, a structure characteristic of vector operation units.

A vector processing device according to the present invention includes: a memory access control unit that reads vector data from a main memory based on a received instruction; a load buffer that stores the vector data read by the memory access control unit; a vector processing unit that has a vector operation register and performs vector processing on the vector data transferred from the load buffer to the vector operation register; and a vector instruction issuing unit that divides the plurality of elements constituting the vector data into a plurality of element groups and performs control so that transfer from the load buffer to the vector processing unit is started for each element group all of whose elements have been read from the main memory by the memory access control unit.

A vector processing method according to the present invention includes: a step of reading vector data from a main memory based on a received instruction; a step of storing the vector data read by the memory access control unit in a load buffer; a step of performing vector processing on the vector data transferred from the load buffer to a vector operation register; and a step of dividing the plurality of elements constituting the vector data into a plurality of element groups and performing control so that transfer from the load buffer to the vector processing unit is started for each element group all of whose elements have been read from the main memory by the memory access control unit.

According to the present invention, subsequent vector operation instructions that use the data of a vector load instruction can be executed earlier. This improves the utilization of the vector operation units and thereby the total performance of the system.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 shows a schematic configuration of one embodiment of the present invention. The vector processing device of this embodiment includes an instruction decoding unit 1, a vector load request processing unit 2, a load data alignment determination unit 3, a vector instruction issuing unit 4, a vector instruction processing unit 5, a memory access control unit 6, and a main memory 7.

The instruction decoding unit 1 decodes the input instruction stream. If a decoded instruction is a vector instruction (a category that includes vector load instructions), the instruction decoding unit 1 sends the vector instruction 12 and its accompanying information to the vector instruction issuing unit 4. If the decoded instruction is a vector load instruction, it also sends the vector load instruction 11 and its accompanying information to the vector load request processing unit 2. When the instruction decoding unit 1 receives a busy signal 10 from the vector load request processing unit 2, it temporarily stops sending vector load instructions.

On receiving the vector load instruction 11 and its accompanying information from the instruction decoding unit 1, the vector load request processing unit 2 reserves a load buffer for storing the vector data loaded by that instruction. This reservation proceeds as follows.

Based on the number of elements in the vector data loaded by the received instruction, the number of sub-load buffers needed to store that vector data is calculated (for example, 64-bit sub-load buffers obtained by dividing a 256-bit load buffer into four). From among the multiple load buffers, one that has at least the calculated number of free sub-load buffers is designated as the reserved load buffer, that is, the destination of the load request. From the free sub-load buffers of the reserved load buffer, the number needed to store the vector data are then designated as reserved sub-load buffers. Load buffer reservation number notifications 13 and 23, which contain the reserved load buffer number identifying the reserved load buffer and the reserved sub-load buffer numbers identifying the reserved sub-load buffers, are sent to the load data alignment determination unit 3 and the vector instruction issuing unit 4, respectively.
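The reservation step above can be sketched as follows. This is a minimal illustration, not the patented circuit: the function name `reserve_sub_buffers`, the `free_map` representation, and the first-fit, lowest-number-first selection policy are assumptions, and the 64-elements-per-sub-buffer figure is borrowed from the embodiment described later in the text.

```python
import math

# Assumption: 64 elements per sub-load buffer, as in the later embodiment.
ELEMENTS_PER_SUB_BUFFER = 64


def reserve_sub_buffers(free_map, num_elements):
    """Reserve sub-load buffers for one vector load.

    free_map[b][s] is True when sub-load buffer s of load buffer b is free
    (hypothetical representation). Returns (load_buffer_no, [sub_buffer_nos]),
    or None when no load buffer has enough free sub-load buffers, which
    corresponds to raising the busy signal.
    """
    if num_elements < 1:
        raise ValueError("a vector load needs at least one element")
    needed = math.ceil(num_elements / ELEMENTS_PER_SUB_BUFFER)
    for buffer_no, subs in enumerate(free_map):
        free = [s for s, is_free in enumerate(subs) if is_free]
        if len(free) >= needed:
            chosen = free[:needed]      # assumption: take the lowest numbers
            for s in chosen:
                subs[s] = False         # mark as reserved
            return buffer_no, chosen
    return None
```

A `None` result models the case where the request cannot be satisfied and the decoder must hold back further vector load instructions.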

The vector load request processing unit 2 generates a tag and address 18 and sends it to the memory access control unit 6. The address indicates the locations in the main memory 7 where the elements of the vector data to be loaded are stored. The tag contains the reserved load buffer number and the reserved sub-load buffer number, which together identify the destination in the load buffer where the loaded vector data will be stored. If the required number of load buffers cannot be reserved, the vector load request processing unit 2 sends the busy signal 10 to the instruction decoding unit 1.

The load data alignment determination unit 3 stores in advance, for each sub-load buffer, the number of vector data elements to be loaded into it, and decrements that number each time a vector element is sent from the main memory 7 to the sub-load buffer. By monitoring the remaining count in this way, it performs alignment determination: recognizing, per sub-load buffer, that data has been sent to all of its register areas.

More specifically, the load data alignment determination unit 3 sets the alignment determination counter corresponding to each reserved sub-load buffer number specified in the received reservation number notification 13 to the number of elements expected in reply. When the memory access control unit 6 reads an element of the vector data from the main memory 7 and sends it toward the load buffer, it also sends the destination to the load data alignment determination unit 3 as a tag 19. The load data alignment determination unit 3 decodes the tag 19, totals the arrivals per sub-load buffer number, and subtracts each total from the value in the corresponding alignment determination counter. When the result reaches 0, it judges that all elements expected in reply have arrived and sends an alignment notification 15 to the vector instruction issuing unit 4.
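The counting scheme described above can be sketched for a single sub-load buffer as follows. The class and method names are hypothetical, and the guard against surplus replies is an added assumption that the text does not describe.

```python
class AlignmentCounter:
    """Remaining-element counter for one reserved sub-load buffer (sketch)."""

    def __init__(self, expected_elements):
        # Set when the reservation number notification is received.
        self.remaining = expected_elements

    def on_reply(self, arrived=1):
        """Subtract the elements that arrived for this sub-load buffer.

        Returns True when every expected element has arrived, which
        corresponds to sending the alignment notification.
        """
        if arrived > self.remaining:
            # Assumption: surplus replies are treated as an error here.
            raise ValueError("more replies than expected")
        self.remaining -= arrived
        return self.remaining == 0
```

For example, a counter created with `AlignmentCounter(3)` reports alignment only once three elements in total have been reported through `on_reply`.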

The vector instruction issuing unit 4 performs vector instruction issue processing and load buffer release processing as follows. It receives vector instructions 12, which include vector operation instructions and vector load instructions, from the instruction decoding unit 1. When it receives a vector operation instruction, it checks as needed the busy status of resources such as the vector operation registers and the consistency between instructions, and then sends a vector operation start instruction 17 to the vector instruction processing unit 5 at the appropriate time. When it receives a vector load instruction, it receives the load buffer reservation number notification 23 corresponding to that instruction, checks the alignment notification 15 for the corresponding load buffer, the busy status of the destination vector operation register area, and the consistency between instructions, and then sends a load buffer transfer start instruction 16 to the vector instruction processing unit 5 at the appropriate time. At that point it sends the number of the load buffer whose transfer has started to the vector load request processing unit 2 as a load buffer release notification 14.

The vector instruction processing unit 5 includes a load buffer 5-1 and a vector operation register 5-2, and performs vector operations on the data stored in the vector operation register 5-2. It processes vector instructions as follows. When it receives the load buffer transfer start instruction 16, it reads the load data from the designated area of the load buffer 5-1 and stores it in the designated vector operation register area. When it receives the vector operation start instruction 17, it reads data from the designated vector operation register area, performs the specified vector operation, and stores the result in the designated vector operation register area.

The memory access control unit 6 receives the tag and address 18. The address part of the tag and address 18 gives the data read address 21, which identifies the address in the main memory 7 of each element of the vector data to be read. Based on the received tag and address 18, the data read address 21 is sent to the main memory 7. The memory access control unit 6 then receives read data 22 from the main memory 7. A tag and data 20, consisting of the received read data 22 and its destination as given by the tag part of the tag and address 18, is sent to the vector instruction processing unit 5. In addition, the tag 19 corresponding to the read data 22 is sent to the load data alignment determination unit 3. The memory access control unit 6 has an interleaving function that lets it access the divided areas of the main memory 7 in parallel.

The main memory 7 stores programs and vector data. When it receives a data read address 21 from the memory access control unit 6, it reads the data from the memory element at that address and sends it to the memory access control unit 6 as read data 22. To support an interleaved configuration, the main memory 7 has an access port for each of its divided memory areas.

FIG. 2 shows the load data alignment determination unit 3 in detail. The element count setting unit 301 identifies which load buffer and sub-load buffers are in use from the reserved load buffer number and reserved sub-load buffer numbers contained in the received reservation number notification 13. To distribute the elements of the vector data loaded by the received vector load instruction 11 across the reserved sub-load buffers, the element count setting unit 301 calculates, for each reserved sub-load buffer, the number of elements it is to store (the scheduled sub-load buffer element count 351) and sends it to the element count subtraction counter unit 302.

In this embodiment, the maximum number of elements in the vector data is 256, and each sub-load buffer, the unit into which vector data is divided for loading, holds 64 elements. Vector data with 1 to 64 elements uses one division unit, 65 to 128 elements use two, 129 to 192 elements use three, and 193 to 256 elements use four.

Data read from the main memory 7 is transferred to a load buffer 5-1 in units of sub-load buffers, which are sub-areas obtained by dividing the area of that load buffer 5-1 by a fixed division unit. The sub-load buffers belonging to one load buffer 5-1 are each identified by a sub-load buffer number. The element count setting unit 301 generates the scheduled sub-load buffer element counts 351 by splitting the number of elements of the vector data to be loaded and assigning the parts to the sub-load buffers specified by the reservation number notification 13. The assignment fills the reserved sub-load buffers 64 elements at a time, starting from the lowest sub-load buffer number. If the number of elements is not divisible by 64, the remainder is assigned to the highest-numbered reserved sub-load buffer.
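As a worked illustration of the assignment rule above, the following sketch computes the scheduled element count for each reserved sub-load buffer. The function name and the dictionary return type are assumptions made for illustration; the 64-element unit and the lowest-number-first filling order come from the text.

```python
import math


def plan_sub_buffer_counts(num_elements, reserved_sub_nos, unit=64):
    """Return {sub_buffer_no: scheduled element count} for one vector load.

    Sub-load buffers are filled `unit` elements at a time from the lowest
    number upward; any remainder lands in the highest-numbered one.
    """
    if num_elements < 1:
        raise ValueError("a vector load needs at least one element")
    if len(reserved_sub_nos) != math.ceil(num_elements / unit):
        raise ValueError("reserved sub-load buffers do not match element count")
    plan = {}
    left = num_elements
    for sub_no in sorted(reserved_sub_nos):
        take = min(unit, left)
        plan[sub_no] = take
        left -= take
    return plan
```

With 150 elements and reserved sub-load buffers 0, 1, and 2, the plan is 64, 64, and 22 elements respectively.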

The element count subtraction counter unit 302 has as many alignment determination counters as there are sub-load buffers, along with a valid flag for each counter. It sets the valid flag to "0" for each unreserved sub-load buffer and to "1" for each reserved sub-load buffer. For each alignment determination counter whose valid flag is "1", it loads the scheduled sub-load buffer element count 351.

The load buffer number sorting unit 303 receives a plurality of tags 19. Each tag 19 contains the reserved load buffer number and reserved sub-load buffer number that identify where in the load buffer 5-1 an element read from the main memory 7 is to be stored. The load buffer number sorting unit 303 decodes each tag 19, sorts the tags by sub-load buffer number, and passes them to the element counter 304.

The element counter 304 has one counter for each of the sub-load buffers contained in the load buffers 5-1. Each time the load buffer number sorting unit 303 receives tags 19, each counter counts the number of times 352 that its sub-load buffer was the sorting destination and sends that count to the element count subtraction counter unit 302. If N is the number of tags 19 that the load buffer number sorting unit 303 can receive simultaneously (that is, if tags for N elements of the vector data can be received at once), the count is N when all the tags 19 are addressed to the same sub-load buffer number, so N is the maximum count.
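The sorting and counting of simultaneously received tags might be modeled as below. This is a sketch only: representing a tag as a `(load_buffer_no, sub_buffer_no)` tuple, keeping the remaining counts in a dictionary, and the function names are all assumptions.

```python
from collections import Counter


def tally_tags(tags):
    """Count, per destination, the tags received in one batch.

    Each tag is a hypothetical (load_buffer_no, sub_buffer_no) tuple.
    If all N tags share one destination, its count is N.
    """
    return Counter(tags)


def apply_tally(remaining, tally):
    """Subtract the tallied arrivals from the remaining-element counts.

    Returns the destinations whose count reached zero in this batch,
    i.e. the sub-load buffers that have just become aligned.
    """
    for dest, count in tally.items():
        remaining[dest] -= count
    return [dest for dest in tally if remaining[dest] == 0]
```

Tallying before subtracting mirrors the text: several elements for the same sub-load buffer can arrive together, so the counter is reduced by the batch total instead of by one.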

The alignment determination unit 305 determines that the scheduled number of load data elements (equal to the scheduled sub-load buffer element count 351) has been stored in each sub-load buffer. The determination holds when the value of the corresponding alignment determination counter in the element count subtraction counter unit 302 is "0" and its valid flag is "1". When this condition holds, the alignment determination unit 305 sends the vector instruction issuing unit 4 an alignment notification 15 indicating that all scheduled elements have been loaded into the sub-load buffer.

FIG. 3 shows the vector instruction issuing unit 4. The vector instruction issuing unit 4 includes two vector operation instruction issue wait buffers 410 and 430 and two vector load instruction issue wait buffers 420 and 440. The vector instruction buffer unit 401 receives vector instructions 12, including vector load instructions, from the instruction decoding unit 1 and controls their storage: vector operation instructions go to the vector operation instruction issue wait buffers 410 and 430, and vector load instructions go to the vector load instruction issue wait buffers 420 and 440.

When an instruction cannot be stored because the vector operation instruction issue wait buffers 410 and 430 or the vector load instruction issue wait buffers 420 and 440 are occupied by instructions waiting to issue, the vector instruction buffer unit 401 buffers the vector instructions in the order received (first in, first out). If even one bit of the unissued element identification flag 412 or the unissued element identification flag 422 is "1", an unprocessed element number band remains, and the corresponding vector operation instruction issue wait buffer 410 or 430 or vector load instruction issue wait buffer 420 or 440 is judged to be in use.

The vector operation unit busy management unit 402 manages a busy flag indicating that the vector operation unit (not shown) is in use. The vector operation unit busy information is sent to the vector operation instruction issue check unit 415 and used in the vector operation instruction issue check. This check ensures that, while the vector operation unit is in use, no vector operation for another element number band or for another instruction is started.

The load buffer transfer path busy management unit 403 manages a busy flag indicating that the load buffer transfer path is in use. This is the path used to transfer load data from the load buffer 5-1 to the vector operation register 5-2 within the vector processing unit 5. The busy information for the load buffer transfer path is used in the issue check performed when starting a load buffer transfer. This check ensures that, while a load buffer transfer is in progress, no load buffer transfer for another element number band of the same instruction or for another vector load instruction is started.

The vector operation instruction issue wait buffers 410 and 430 each include a vector operation instruction information buffer 411, an unissued element identification flag 412, an inter-instruction consistency maintenance flag 413, an inter-instruction consistency maintenance flag check unit 414, and a vector operation instruction issue check unit 415.

The vector load instruction issue wait buffers 420 and 440 each include a vector load instruction information buffer 421, an unissued element identification flag 422, an inter-instruction consistency maintenance flag 423, a load buffer number 424, a load buffer use location indication flag 425, an inter-instruction consistency maintenance flag check unit 426, a specific load buffer transfer condition confirmation unit 427, and a vector load instruction issue check unit 428.

The vector operation instruction information buffer 411 stores the operation type and information on the vector operation registers read and written by the vector operation instruction.

The unissued element identification flag 412 has one bit for each of the four element number bands formed by dividing 256-element vector data into units of 64 elements. The bit for an element number band is held at "1" while the operation start instruction for that band has not yet been issued.

The inter-instruction consistency maintenance flag 413 is provided for each of the four element number bands formed with 64 elements as the division unit, and for each of the instruction issue wait buffers other than 410, namely 420, 430, and 440. When instruction execution order must be preserved relative to a preceding instruction for data consistency (that is, when there is a vector operation register conflict), the corresponding flag value is set valid.

The inter-instruction consistency maintenance flag check unit 414 performs inter-instruction consistency processing as follows. When the flag information of the inter-instruction consistency maintenance flag 413 is invalid, it sends an instruction issue permission signal for the corresponding division unit to the vector operation instruction issue check unit 415.

When the flag information of the inter-instruction consistency maintenance flag 413 is valid, it indicates that the destination vector operation register area of a preceding vector load instruction, stored in the vector load instruction issue wait buffer 420 and awaiting execution, matches the vector operation register area that the subsequent vector operation instruction reads or writes, so the instruction issue order must be preserved to maintain data consistency. Accordingly, the subsequent vector operation instruction stored in the vector operation instruction issue wait buffer 410 is not issued until the preceding vector load instruction has been issued. In this way, when the vector operation register used by a vector load instruction still in progress conflicts with other vector processing, execution of the subsequent vector operation instruction can be held back.

本実施の形態ではベクトルデータを構成する２５６の要素を６４要素を分割単位として分割することにより４つの要素番号帯が形成される。例えばベクトルロード命令と後続のベクトル演算命令のデータ整合性を維持する必要があった場合、最初の６４要素に対する先行のベクトルロード命令が発行されると最初の６４要素に対応する命令間整合性維持フラグ４１３が無効となり、ベクトル演算命令の最初の６４要素に対応する命令発行許可信号がベクトル演算命令発行チェック部４１５に対して送出される。 In this embodiment, four element number bands are formed by dividing 256 elements constituting vector data using 64 elements as a division unit. For example, when it is necessary to maintain the data consistency between the vector load instruction and the subsequent vector operation instruction, when the preceding vector load instruction for the first 64 elements is issued, the inter-instruction consistency corresponding to the first 64 elements is maintained. The flag 413 becomes invalid, and an instruction issue permission signal corresponding to the first 64 elements of the vector operation instruction is sent to the vector operation instruction issue check unit 415.
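
The per-band handling of the inter-instruction consistency maintaining flag 413 described above can be sketched as follows. This is only an illustrative software model, not the circuit of the embodiment: the function names are hypothetical, and list index 0 is assumed to stand for the element number band covering elements 0 to 63.

```python
NUM_BANDS = 4  # 256 elements / 64 elements per band (from the text)

def permit_signals(consistency_flags):
    """Issue permission per band: permitted (1) where the flag is invalid (0)."""
    return [1 - flag for flag in consistency_flags]

def on_preceding_issue(consistency_flags, band):
    """Invalidate the flag of one band once the preceding instruction
    has been issued for that band (hypothetical helper)."""
    consistency_flags[band] = 0

# A dependent vector operation instruction starts with all four flags valid.
flags = [1] * NUM_BANDS
on_preceding_issue(flags, 0)      # preceding load issued for elements 0-63
first_band_permits = permit_signals(flags)
```

In this model only the first band becomes eligible, so the dependent instruction can start on elements 0 to 63 while the other three bands still wait.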

ベクトル演算命令発行チェック部４１５は、ベクトル演算器ビジー管理部４０２からのビジー信号と、命令間整合性維持フラグチェック部４１４からの命令発行許可信号と、未発行要素識別フラグ４１２からの未発行要素識別フラグ信号と、ベクトル演算命令情報バッファ４１１からの命令情報を受ける。 The vector operation instruction issue check unit 415 receives a busy signal from the vector operation unit busy management unit 402, an instruction issue permission signal from the inter-instruction consistency maintaining flag check unit 414, an unissued element identification flag signal from the unissued element identification flag 412, and instruction information from the vector operation instruction information buffer 411.

ベクトル演算命令発行チェック部４１５は、ベクトル演算命令発行の条件をチェックする。この条件は、ベクトル演算器ビジーでないという条件と、６４要素を分割単位として分割することにより得られた４つの要素番号帯に対応する４ｂｉｔの未発行要素識別フラグ４１２とそれに対応する４ｂｉｔの命令間整合性維持フラグチェック部４１４の信号との桁毎の論理積を取った４ｂｉｔの信号が有効“１”である桁が存在するという条件の両方を満たすという条件である。この条件が満たされた場合、６４要素を分割単位として分割することにより得られた４つの要素番号帯の中から条件が満たされた１つの要素番号帯に対するベクトル演算開始指示１７がベクトル命令処理部５に送出される。 The vector operation instruction issue check unit 415 checks the condition for issuing a vector operation instruction. The condition requires both of the following: the vector operation unit is not busy, and at least one digit is valid (“1”) in the 4-bit signal obtained by taking the digit-by-digit logical AND of the 4-bit unissued element identification flag 412, which corresponds to the four element number bands obtained by dividing with 64 elements as the division unit, and the corresponding 4-bit signal from the inter-instruction consistency maintaining flag check unit 414. When the condition is satisfied, a vector operation start instruction 17 for one element number band that satisfies the condition, chosen from the four element number bands, is sent to the vector instruction processing unit 5.

４つの要素番号帯について複数の要素番号帯に対する発行条件が整うケースもある。その場合は複数の中から１つの要素番号帯に対するベクトル演算命令実行開始指示が出される。また、ベクトル演算命令の実行開始指示を出すと同時にベクトル演算器ビジー管理部４０２に対してベクトル演算器ビジーフラグを２クロック間（＝６４要素／［１クロック当たり処理スピード３２要素］）点灯するよう指示を出す。 The issue condition may be satisfied for more than one of the four element number bands at the same time. In that case, a vector operation instruction execution start instruction is issued for one of those bands. At the same time as the execution start instruction is issued, the vector operation unit busy management unit 402 is instructed to keep the vector operation unit busy flag on for 2 clocks (= 64 elements / [processing speed of 32 elements per clock]).
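
The issue check performed by unit 415 can be modeled in a few lines. The sketch below is a hedged illustration only: the function names are hypothetical, and because the text does not say which band is chosen when several are ready, the lowest-numbered ready band is assumed here.

```python
BAND_SIZE = 64            # elements per element number band (from the text)
ELEMENTS_PER_CLOCK = 32   # processing speed stated in the text

def select_band(unit_busy, unissued, permit):
    """Return the index of the band to issue, or None if nothing can issue.

    unissued and permit are lists of four 0/1 values, one per band.
    """
    if unit_busy:
        return None
    # Digit-by-digit logical AND of the two 4-bit signals.
    ready = [u & p for u, p in zip(unissued, permit)]
    for band, bit in enumerate(ready):
        if bit:
            return band  # assumption: lowest-numbered ready band is chosen
    return None

def busy_clocks():
    """Clocks the busy flag stays on after one band has been issued."""
    return BAND_SIZE // ELEMENTS_PER_CLOCK
```

For example, with all four bands unissued and permission granted only for the second and third bands, this model selects band index 1 and then holds the operation unit busy for 2 clocks.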

ベクトルロード命令情報バッファ４２１は、ベクトルロード命令転送先のベクトル演算レジスタ情報を格納する。 The vector load instruction information buffer 421 stores vector operation register information of a vector load instruction transfer destination.

未発行要素識別フラグ４２２は、６４要素を分割単位として形成された４つの要素番号帯に対応して設置される。未だロードバッファ転送開始指示が発行されていない要素単位のフラグ値は“１”となるよう制御される。 The unissued element identification flag 422 is set corresponding to four element number bands formed with 64 elements as division units. The flag value for each element for which the load buffer transfer start instruction has not yet been issued is controlled to be “1”.

命令間整合性維持フラグ４２３は、６４要素を分割単位として形成された４つの要素番号帯に対応し、かつベクトルロード命令発行待ちバッファ４２０以外の命令発行待ちバッファ４１０、４３０、４４０に対応して設置される。先行命令とベクトル演算レジスタ競合関係がありデータ整合性を保つ必要性がある場合には対応するフラグ値が“１”（有効）となるよう制御される。 The inter-instruction consistency maintaining flag 423 corresponds to four element number bands formed with 64 elements as division units, and corresponds to instruction issue waiting buffers 410, 430, and 440 other than the vector load instruction issue waiting buffer 420. Installed. When there is a vector operation register conflict relationship with the preceding instruction and there is a need to maintain data consistency, the corresponding flag value is controlled to be “1” (valid).

ロードバッファ番号４２４には、ベクトルロード命令転送元のロードバッファ番号が格納される。 The load buffer number 424 stores the load buffer number of the vector load instruction transfer source.

ロードバッファ使用箇所指示フラグ４２５は、サブロードバッファ使用箇所指示フラグを格納する。ベクトルロード命令転送元のロードバッファ番号内の複数のサブロードバッファ番号に対応するフラグが有効となるように設定される。 The load buffer usage location instruction flag 425 stores a sub load buffer usage location indication flag. A flag corresponding to a plurality of sub load buffer numbers in the load buffer number of the vector load instruction transfer source is set to be valid.

命令間整合性維持フラグチェック部４２６は、命令間整合性維持フラグ４２３のフラグ情報が無効の場合、該当する分割単位に対応する命令発行許可信号をベクトルロード命令発行チェック部４２８に対して送出する機能を持つ。命令間整合性維持フラグ４２３のフラグ情報が有効であった場合は、先行してベクトル演算命令発行待ちバッファ４１０に格納され実行待ちであるベクトル演算命令が使用するベクトル演算レジスタ領域と、後続のベクトルロード命令転送先のベクトル演算レジスタ領域とが一致していて命令発行の順番を守らなければならないことが示されている。この場合、先行のベクトル演算命令が発行されない限りベクトルロード命令発行待ちバッファ４２０に格納された後続のベクトルロード命令は発行しないよう制御する。 When the flag information of the inter-instruction consistency maintaining flag 423 is invalid, the inter-instruction consistency maintaining flag check unit 426 sends an instruction issue permission signal for the corresponding division unit to the vector load instruction issue check unit 428. When the flag information of the inter-instruction consistency maintaining flag 423 is valid, it indicates that the vector operation register area used by a preceding vector operation instruction, stored in the vector operation instruction issue waiting buffer 410 and awaiting execution, coincides with the vector operation register area that is the transfer destination of the subsequent vector load instruction, so the instruction issue order must be preserved. In this case, control is performed so that the subsequent vector load instruction stored in the vector load instruction issue waiting buffer 420 is not issued until the preceding vector operation instruction has been issued.

本実施の形態では、ベクトルデータを構成する複数（２５６個）の要素を複数の要素群（６４要素を分割単位として分割することにより形成される４つの要素番号帯）に分ける。例えばベクトル演算命令と後続のベクトルロード命令とのデータ整合性を維持する必要があった場合、最初の６４要素に対する先行のベクトル演算命令が発行されると命令間整合性維持フラグ４２３の最初の６４要素に対応するフラグが無効となり、最初の６４要素に対応するベクトルロード命令の発行許可（ロードバッファ転送開始許可）信号がベクトルロード命令発行チェック部４２８に対して送出される。この処理により、ベクトルデータを構成する要素が全て揃っていない場合でも、複数の要素群のうちで全ての要素がメインメモリ７から読み出された要素群から先にベクトル演算レジスタ５−２に転送することが可能となる。 In the present embodiment, the plurality of (256) elements constituting the vector data are divided into a plurality of element groups (four element number bands formed by dividing with 64 elements as the division unit). For example, when data consistency must be maintained between a vector operation instruction and a subsequent vector load instruction, once the preceding vector operation instruction has been issued for the first 64 elements, the flag of the inter-instruction consistency maintaining flag 423 corresponding to the first 64 elements becomes invalid, and a vector load instruction issue permission (load buffer transfer start permission) signal for the first 64 elements is sent to the vector load instruction issue check unit 428. With this processing, even when not all elements of the vector data have arrived, any element group whose elements have all been read from the main memory 7 can be transferred to the vector operation register 5-2 ahead of the others.

特定ロードバッファ転送条件確認部４２７は、ロードデータ整列判定部３から送られてきた整列通知１５と、ベクトルロード命令発行待ちバッファ４２０に格納されているロードバッファ番号４２４と、ロードバッファ使用箇所指示フラグ４２５に格納されているサブロードバッファ使用箇所指示フラグの情報とをサブロードバッファ毎の整列通知信号と比較する。比較の結果、一致したサブロードバッファがあった場合は、そのサブロードバッファに対応する要素番号帯のベクトルロード命令発行許可（ロードバッファ転送開始許可）をベクトルロード命令発行チェック部４２８に送出する。 The specific load buffer transfer condition confirmation unit 427 takes the alignment notification 15 sent from the load data alignment determination unit 3 and compares its per-sub-load-buffer alignment notification signals against the load buffer number 424 stored in the vector load instruction issue waiting buffer 420 and the sub load buffer usage location indication flags stored in the load buffer usage location instruction flag 425. If the comparison finds a matching sub load buffer, the unit sends a vector load instruction issue permission (load buffer transfer start permission) for the element number band corresponding to that sub load buffer to the vector load instruction issue check unit 428.

本実施の形態において、この時ロードバッファ使用箇所指示フラグ４２５と、６４要素を分割単位として分割することにより形成された４つの要素番号帯別の未発行要素識別フラグ４２２と、命令間整合性維持フラグ４２３との対応は、ロードバッファ使用箇所指示フラグ４２５の若い順に順番に対応しているものとしている。 In this embodiment, the load buffer usage location instruction flag 425 is assumed to correspond to the unissued element identification flags 422 and the inter-instruction consistency maintaining flags 423 of the four element number bands (formed with 64 elements as the division unit) in ascending order of the valid bits of the load buffer usage location instruction flag 425.

例えば２５６要素のベクトルロード命令が４２０にバッファリングされているケースでロードバッファ使用箇所指示フラグ４２５には“１１１０１０００”というパタンが格納されているケースを考える。この場合、ロードデータ要素は以下の対応が成立するようにロードバッファ５−１へ格納されるよう制御が行われる。要素０〜６３に関しては未発行要素識別フラグ４２２の１番目のフラグとロードバッファ使用箇所指示フラグ４２５の１ｂｉｔ目が対応する。要素６４〜１２７に関しては未発行要素識別フラグ４２２の２番目のフラグとロードバッファ使用箇所指示フラグ４２５の２ｂｉｔ目が対応する。要素１２８〜１９１は未発行要素識別フラグ４２２の３番目のフラグとロードバッファ使用箇所指示フラグ４２５の３ｂｉｔ目が対応する。要素１９２〜２５５は未発行要素識別フラグ４２２の４番目のフラグとロードバッファ使用箇所指示フラグ４２５の５ｂｉｔ目のフラグが対応する。 For example, consider a case where a 256 element vector load instruction is buffered at 420 and the load buffer usage location instruction flag 425 stores the pattern “11101000”. In this case, control is performed so that the load data element is stored in the load buffer 5-1 so that the following correspondence is established. Regarding the elements 0 to 63, the first flag of the unissued element identification flag 422 corresponds to the first bit of the load buffer usage location instruction flag 425. Regarding the elements 64 to 127, the second flag of the unissued element identification flag 422 corresponds to the second bit of the load buffer usage location instruction flag 425. The elements 128 to 191 correspond to the third flag of the unissued element identification flag 422 and the third bit of the load buffer usage location instruction flag 425. The elements 192 to 255 correspond to the fourth flag of the unissued element identification flag 422 and the fifth bit flag of the load buffer usage location instruction flag 425.
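
The correspondence in this example can be reproduced with a short sketch. It is an illustration under stated assumptions: the function name is hypothetical, the leftmost character of the pattern is taken as the 1st bit, and both bands and sub load buffers are numbered from 0 in the code (so the "5th bit" of the text is index 4).

```python
def band_to_sub_buffer(usage_pattern):
    """Map each element number band to the sub load buffer that holds it.

    usage_pattern is the 8-character usage location flag string, for
    example "11101000". Valid bits are paired with bands in ascending order.
    """
    used = [i for i, bit in enumerate(usage_pattern) if bit == "1"]
    return {band: sub for band, sub in enumerate(used)}

mapping = band_to_sub_buffer("11101000")
# Band 3 (elements 192-255) lands in sub load buffer index 4,
# which is the 5th bit of the pattern in the 1-based wording of the text.
```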

ベクトルロード命令発行チェック部４２８は、ロードバッファ転送パスビジー管理部４０３からのロードバッファ転送パスビジー信号と、命令間整合性維持フラグチェック部４２６からの命令発行許可信号と、特定ロードバッファ転送条件確認部４２７からのベクトルロード命令発行許可信号と、未発行要素識別フラグ４２２からの未発行要素識別フラグ信号と、ベクトルロード命令情報バッファ４２１からの命令情報を受ける。 The vector load instruction issue check unit 428 receives a load buffer transfer path busy signal from the load buffer transfer path busy management unit 403, an instruction issue permission signal from the inter-instruction consistency maintaining flag check unit 426, a vector load instruction issue permission signal from the specific load buffer transfer condition confirmation unit 427, an unissued element identification flag signal from the unissued element identification flag 422, and instruction information from the vector load instruction information buffer 421.

ベクトルロード命令発行チェック部４２８は、以下の３条件が満たされたときに、６４要素を分割単位として分割することにより形成された４つの要素番号帯の中から１つの要素番号帯に対するベクトル演算開始指示１７をベクトル命令処理部５に送出する。
（１）ロードバッファ転送パスがビジーでない、
（２）ある要素番号帯に着目した時に４ｂｉｔの未発行要素識別フラグ４２２とそれに対応する４ｂｉｔの命令間整合性維持フラグチェック部４２６の信号との桁毎の論理積を取った４ｂｉｔの信号の桁が“１”である、
（３）ロードバッファ転送条件確認部４２７において４つの要素番号帯のいずれかのロードデータが揃ったことが確認された。
この条件が満たされた場合に、６４要素を分割単位として形成された４つの要素番号帯の中から１つの要素番号帯に対するベクトル演算開始指示１７がベクトル命令処理部５に送出される。
When the following three conditions are satisfied, the vector load instruction issue check unit 428 sends a vector operation start instruction 17 for one of the four element number bands, formed by dividing with 64 elements as the division unit, to the vector instruction processing unit 5.
(1) The load buffer transfer path is not busy.
(2) For the element number band of interest, the corresponding digit is “1” in the 4-bit signal obtained by taking the digit-by-digit logical AND of the 4-bit unissued element identification flag 422 and the corresponding 4-bit signal from the inter-instruction consistency maintaining flag check unit 426.
(3) The load buffer transfer condition confirmation unit 427 has confirmed that the load data of one of the four element number bands is complete.
When these conditions are satisfied, a vector operation start instruction 17 for one of the four element number bands formed with 64 elements as the division unit is sent to the vector instruction processing unit 5.

４つの要素番号帯について複数の要素番号帯に対する発行条件が整うケースもある。その場合は複数の中から１つの要素番号帯に対するベクトルロード命令実行開始指示（ロードバッファ転送開始指示１６）をベクトル命令発行部５に送出すると同時に、ロードバッファ使用中フラグリセット信号生成部２０４にロードバッファ解放通知１４が送出される。また、ロードバッファ転送開始指示を出すと同時にロードバッファ転送パスビジー管理部４０３に対してビジーフラグを２クロック間（＝６４要素／［１クロック当たり処理スピード３２要素］）点灯するよう指示を出す。 There are cases where the issuing conditions for a plurality of element number bands are established for the four element number bands. In this case, a vector load instruction execution start instruction (load buffer transfer start instruction 16) for one element number band from among a plurality of elements is sent to the vector instruction issuing unit 5 and simultaneously loaded to the load buffer in-use flag reset signal generating unit 204. A buffer release notification 14 is sent out. At the same time when the load buffer transfer start instruction is issued, the load buffer transfer path busy management unit 403 is instructed to turn on the busy flag for 2 clocks (= 64 elements / [processing speed 32 elements per clock]).

図４は、ベクトルロードリクエスト処理部２を示す。アドレス変換部２０１は、命令デコード部１より受け取ったベクトルロード命令１１を解読する。アドレス変換部２０１は、ベクトルロード命令１１に含まれるロード開始アドレス、要素間アドレスディスタンス、および要素数に基づいて、ベクトルデータの要素数分のアドレスを生成する。このアドレスは、使用するベクトルデータを構成する複数の要素が格納されているメインメモリ７上の位置を示す。アドレス変換部２０１は、生成したアドレスをタグ及びアドレス１８の一部としてメモリアクセス制御部６に送出することにより、１つのベクトルロード命令に基づいて要素数分のロード指示を行う。 FIG. 4 shows the vector load request processing unit 2. The address conversion unit 201 decodes the vector load instruction 11 received from the instruction decoding unit 1. The address conversion unit 201 generates addresses for the number of elements of the vector data based on the load start address, the inter-element address distance, and the number of elements included in the vector load instruction 11. This address indicates a position on the main memory 7 where a plurality of elements constituting vector data to be used are stored. The address conversion unit 201 sends the generated address as a part of the tag and the address 18 to the memory access control unit 6 and performs a load instruction for the number of elements based on one vector load instruction.
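
As a rough illustration of the address generation performed by the address conversion unit 201, the sketch below expands one vector load instruction into one address per element. The function and parameter names are hypothetical, and a constant inter-element address distance is assumed.

```python
def element_addresses(start_address, distance, element_count):
    """Generate one main-memory address per vector element.

    start_address: load start address from the vector load instruction
    distance:      inter-element address distance (stride)
    element_count: number of elements to load
    """
    return [start_address + i * distance for i in range(element_count)]

# Four elements starting at 0x1000 with a distance of 8.
addrs = element_addresses(0x1000, 8, 4)
```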

使用ロードバッファ決定部２０２は、以下のように、ベクトルデータをロードするロードバッファ５−１上の場所を決定する。まず、命令デコード部１より受け取ったベクトルロード命令１１を解読する。ベクトルロード命令１１には、ロードするベクトルデータの要素数を示す要素数情報が含まれる。その要素数情報に基づいて、ベクトルデータの全要素を格納するために必要な必要サブロードバッファ数を算出する。必要サブロードバッファ数分の空きサブロードバッファがあるロードバッファ５−１が選択される。選択されたロードバッファ５−１の中で未使用のサブロードバッファを必要サブロードバッファ数だけ確保する。確保済みサブロードバッファは、選択されたロードバッファ５−１を特定する確保済みロードバッファ番号と、確保されたサブロードバッファを特定する確保済みサブロードバッファ番号とによって特定される。以上の処理により、あるベクトルロード命令によって使用されるロードバッファ５−１と複数のサブロードバッファが決定され、確保番号通知１３、２３によって通知される。 The use load buffer determination unit 202 determines a place on the load buffer 5-1 to load vector data as follows. First, the vector load instruction 11 received from the instruction decoding unit 1 is decoded. The vector load instruction 11 includes element number information indicating the number of elements of vector data to be loaded. Based on the number-of-elements information, the necessary number of subload buffers necessary for storing all the elements of the vector data is calculated. A load buffer 5-1 having empty subload buffers corresponding to the number of necessary subload buffers is selected. In the selected load buffer 5-1, an unused sub load buffer is secured by the number of necessary sub load buffers. The reserved sub load buffer is specified by a reserved load buffer number that specifies the selected load buffer 5-1, and a reserved sub load buffer number that specifies the reserved sub load buffer. Through the above processing, the load buffer 5-1 and the plurality of sub load buffers used by a certain vector load instruction are determined and notified by the allocation number notifications 13 and 23.

使用ロードバッファ決定部２０２は、確保番号通知１３、２３と同時に、ロードバッファ使用中フラグ２０３に対してロードバッファ使用中フラグセット信号を送出して、新たに確保したロードバッファ５−１上の場所を示す確保済みロードバッファ番号と確保済みサブロードバッファ番号とに対応するフラグに“１”をセットする。使用ロードバッファ決定部２０２は更に、タグ生成部２０５に対して、確保済みロードバッファ番号と確保済みサブロードバッファ番号とロードするベクトルデータの要素数とを伝達する。 The used load buffer determining unit 202 sends a load buffer in-use flag set signal to the load buffer in-use flag 203 at the same time as the reservation number notifications 13 and 23, and a location on the newly reserved load buffer 5-1 "1" is set in a flag corresponding to the reserved load buffer number indicating the reserved sub-buffer number. The use load buffer determination unit 202 further transmits to the tag generation unit 205 the reserved load buffer number, the reserved sub load buffer number, and the number of elements of vector data to be loaded.

使用ロードバッファ決定部２０２は、全てのロードバッファ番号において必要数のサブロードバッファ（ロードバッファ分割単位）が確保出来なかった場合は、ビジー信号１０を命令デコード部１に送ることで後続のベクトルロード命令の送出を抑止する。 When the required number of sub load buffers (load buffer division units) cannot be secured for all the load buffer numbers, the used load buffer determination unit 202 sends the busy signal 10 to the instruction decode unit 1 to perform subsequent vector loading. Suppress sending instructions.

本実施の形態においては最大要素数を２５６として、ロードバッファ分割する単位の要素数を６４としている。要素数が１〜６４までの場合は１個の分割単位を、要素数が６５〜１２８までの場合は２個の分割単位を、要素数が１２９〜１９２までの場合は３個の分割単位を、要素数が１９３〜２５６までの場合は４個の分割単位を使用する。 In the present embodiment, the maximum number of elements is 256, and the number of elements in the unit for dividing the load buffer is 64. When the number of elements is 1 to 64, one division unit is used. When the number of elements is 65 to 128, two division units are used. When the number of elements is 129 to 192, three division units are used. When the number of elements is 193 to 256, 4 division units are used.
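
The rule for the number of division units, together with the reservation and busy behavior of the use load buffer determination unit 202, can be sketched as follows. This is a simplified model under assumptions: the names are hypothetical, each load buffer is assumed to hold eight sub load buffers (as the 8-bit usage patterns suggest), and free sub load buffers are taken in ascending order.

```python
SUB_BUFFER_ELEMENTS = 64   # elements per load buffer division unit
MAX_ELEMENTS = 256         # maximum number of vector elements

def required_sub_buffers(element_count):
    """Ceiling of element_count / 64: 1-64 -> 1, 65-128 -> 2, and so on."""
    if not 1 <= element_count <= MAX_ELEMENTS:
        raise ValueError("element count must be between 1 and 256")
    return -(-element_count // SUB_BUFFER_ELEMENTS)

def reserve(in_use_flags, element_count):
    """Reserve sub load buffers in the first load buffer with enough room.

    in_use_flags: one list of 0/1 in-use flags per load buffer.
    Returns (load_buffer_no, [sub_buffer_nos]), or None to signal busy.
    """
    need = required_sub_buffers(element_count)
    for lb_no, flags in enumerate(in_use_flags):
        free = [i for i, f in enumerate(flags) if f == 0]
        if len(free) >= need:
            chosen = free[:need]
            for i in chosen:
                flags[i] = 1   # set the in-use flag
            return lb_no, chosen
    return None  # no load buffer has enough free sub load buffers

# Two load buffers of eight sub load buffers, five 256-element loads.
flags = [[0] * 8, [0] * 8]
results = [reserve(flags, 256) for _ in range(5)]
```

In this model the first four loads fill both load buffers and the fifth returns None, which corresponds to the case where the busy signal 10 would be raised.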

ロードバッファ使用中フラグ２０３は、全てのロードバッファ分割単位毎のフラグを有している。即ち、全てのサブロードバッファ番号に対応するフラグを有している。これらのフラグは、使用ロードバッファ決定部２０２からのセット信号及びロードバッファ使用中フラグリセット信号生成部２０４からのリセット信号に基づいてセットおよびリセットされる。 The load buffer in-use flag 203 has a flag for every load buffer division unit. That is, it has flags corresponding to all subload buffer numbers. These flags are set and reset based on a set signal from the used load buffer determining unit 202 and a reset signal from the load buffer in-use flag reset signal generating unit 204.

ロードバッファ使用中フラグリセット信号生成部２０４は、ロードバッファ転送開始指示が発行されるのに合わせて送出されるロードバッファ解放通知１４をベクトル命令発行部４より受け取る。ロードバッファ解放通知１４は、空きとなったサブロードバッファを特定するサブロードバッファ番号を含んでいる。ロードバッファ使用中フラグリセット信号生成部２０４は、このサブロードバッファ番号に対応するフラグがリセットされるようリセット信号をロードバッファ使用中フラグ２０３に送出する。 The load buffer busy flag reset signal generation unit 204 receives from the vector instruction issue unit 4 a load buffer release notification 14 that is sent when a load buffer transfer start instruction is issued. The load buffer release notification 14 includes a sub load buffer number that identifies a sub load buffer that has become empty. The load buffer busy flag reset signal generation unit 204 sends a reset signal to the load buffer busy flag 203 so that the flag corresponding to the sub load buffer number is reset.

タグ生成部２０５は、使用ロードバッファ決定部２０２が確保したロードバッファ番号と複数のサブロードバッファ番号とロードするベクトルデータの要素数の情報から、ロードするベクトルデータの要素の各々の格納先ロードバッファアドレスをタグとして生成し、アドレス変換部２０１で個別に生成されたメインメモリ７上のアドレスであるロードアドレスに１対１で対応するようにタグ及びアドレス１８をメモリアクセス制御部６に送出する。 The tag generation unit 205 uses the load buffer number secured by the use load buffer determination unit 202, a plurality of subload buffer numbers, and information on the number of elements of vector data to be loaded to store each load destination load buffer of the vector data elements. The address is generated as a tag, and the tag and address 18 are sent to the memory access control unit 6 so as to correspond one-to-one to the load address that is an address on the main memory 7 generated individually by the address conversion unit 201.

タグの情報は、確保済みロードバッファ番号と複数の確保済みサブロードバッファ番号とロードデータ各要素の格納先ロードバッファアドレスの情報から成る。格納先ロードバッファアドレスを複数の確保済みサブロードバッファ番号の小さい方から順番に割り当てるように決めておくことで、格納先ロードバッファアドレスをタグとしてメモリアクセス制御部６に送出すれば必要な情報が揃うためインタフェース削減可能となり、望ましい構成となる。 The tag information includes information on a reserved load buffer number, a plurality of reserved sub load buffer numbers, and a storage load buffer address of each element of load data. By determining that the storage destination load buffer addresses are allocated in order from the smallest of the plurality of reserved sub load buffer numbers, if the storage destination load buffer address is sent to the memory access control unit 6 as a tag, the necessary information can be obtained. As a result, the number of interfaces can be reduced, resulting in a desirable configuration.
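
The tag assignment can be pictured with the following sketch, which hands out storage destinations in ascending order of the reserved sub load buffer numbers. The tuple encoding (load buffer number, sub load buffer number, offset within the sub load buffer) is a hypothetical stand-in for the storage destination load buffer address; the text does not specify the actual bit layout.

```python
SUB_BUFFER_ELEMENTS = 64  # elements per sub load buffer (from the text)

def make_tags(load_buffer_no, sub_buffer_nos, element_count):
    """Return one tag per element, in element order.

    Elements fill the reserved sub load buffers starting from the
    smallest sub load buffer number.
    """
    subs = sorted(sub_buffer_nos)
    tags = []
    for element in range(element_count):
        sub = subs[element // SUB_BUFFER_ELEMENTS]
        offset = element % SUB_BUFFER_ELEMENTS
        tags.append((load_buffer_no, sub, offset))
    return tags

# 256 elements spread over sub load buffers 0, 1, 2 and 4 of load buffer 0.
tags = make_tags(0, [4, 0, 1, 2], 256)
```

Each tag would be paired one-to-one with the load address generated for the same element.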

以上に説明した本実施の形態の構成により、以下のようなベクトル処理装置が構成される。 The configuration of the present embodiment described above provides the following vector processing apparatus.
第１に、使用ロードバッファ決定部２０２、要素数セット部３０１、要素数減算カウンタ部３０２、整列判定部３０５、ベクトル演算命令発行待ちバッファ４１０、及びベクトル演算命令発行待ちバッファ４３０により、次のベクトル処理装置が実現される。 First, the use load buffer determination unit 202, the element number setting unit 301, the element number subtraction counter unit 302, the alignment determination unit 305, the vector operation instruction issue waiting buffer 410, and the vector operation instruction issue waiting buffer 430 realize the following vector processing apparatus.

ベクトル処理装置は、メインメモリ７とベクトル演算レジスタ５−２との間に、メインメモリ７から読み出されたベクトルデータを一時的に格納するロードバッファ５−１を備える。メインメモリ７から読み出されたベクトルロード命令の解読時に、ロードバッファ５−１の使用領域を確保するとともにメインメモリ７からロードバッファ５−１へのベクトルデータの読み出しが起動される。起動後、ロードバッファ５−１にベクトルデータの要素が格納されたことと、ベクトルロード命令で使用するベクトル演算レジスタ領域が先行するベクトル命令で使用するベクトル演算レジスタ領域と競合しないことを条件に、ロードバッファ５−１からベクトル演算レジスタ５−２へのベクトルデータの転送が開始される。こうしたベクトル処理装置において、最大ベクトル要素数を予め決定した特定の大きさに分割し、その分割した単位でロードされたデータ要素が格納されたことを確認した後に、ロードバッファ５−１からベクトル演算レジスタ５−２へベクトルデータの転送を開始するよう制御が行われる。 The vector processing apparatus includes a load buffer 5-1 for temporarily storing vector data read from the main memory 7 between the main memory 7 and the vector calculation register 5-2. When the vector load instruction read from the main memory 7 is decoded, a use area of the load buffer 5-1 is secured and reading of vector data from the main memory 7 to the load buffer 5-1 is started. After starting, on condition that the element of vector data is stored in the load buffer 5-1, and that the vector operation register area used in the vector load instruction does not conflict with the vector operation register area used in the preceding vector instruction, Transfer of vector data from the load buffer 5-1 to the vector operation register 5-2 is started. In such a vector processing apparatus, after dividing the maximum number of vector elements into a predetermined size and confirming that the data elements loaded in the divided units are stored, the vector operation is performed from the load buffer 5-1. Control is performed to start transfer of vector data to the register 5-2.

第２に、上記構成に加えて、ベクトル演算命令発行待ちバッファ４１０、４３０、ベクトルロード命令発行待ちバッファ４２０、４４０が連携して動作することにより、次の機能を有するベクトル処理装置が実現される。 Second, in addition to the above configuration, the vector operation instruction issue waiting buffers 410 and 430 and the vector load instruction issue wait buffers 420 and 440 operate in cooperation to realize a vector processing device having the following functions. .

ベクトル処理装置において、命令解読順番におけるデータ整合性を損なわないように、上記の特定の大きさに分割した単位でチェックを行い、データ整合性に問題が無い場合には命令解読順番によらずに命令を発行するよう制御が行われる。 In the vector processing device, in order to avoid losing data consistency in the instruction decoding order, a check is performed in units divided into the above specific sizes, and if there is no problem in data consistency, regardless of the instruction decoding order Control is performed to issue an instruction.

第３に、上記構成に加えて、使用ロードバッファ決定部２０２、要素数セット部３０１、及び要素数減算カウンタ部３０２が連携して動作することにより、次の機能を有するベクトル処理装置が実現される。 Third, in addition to the above configuration, the use load buffer determination unit 202, the element number setting unit 301, and the element number subtraction counter unit 302 operate in cooperation, thereby realizing a vector processing device having the following functions. The

ベクトル処理装置において、上記の分割した単位のロードバッファを１つまたは複数のグループにまとめて、その中からベクトルロード要素数に応じてロードバッファ分割単位を必要な数だけ確保するとともに、確保した箇所のロードバッファ分割単位をベクトル命令発行部に伝達するよう制御が行われる。 In the vector processing apparatus, the load buffers in the divided units are grouped into one or a plurality of groups, and the necessary number of load buffer dividing units are secured according to the number of vector load elements. Control is performed so that the load buffer division unit is transmitted to the vector instruction issuing unit.

第４に、上記構成に加えて、使用ロードバッファ決定部２０２、ロードバッファ使用中フラグ２０３、ロードバッファ使用中フラグリセット信号生成部２０４、ベクトル演算命令発行待ちバッファ４１０、４３０が連携して動作することにより、次の機能を有するベクトル処理装置が実現される。 Fourth, in addition to the above configuration, the used load buffer determining unit 202, the load buffer in-use flag 203, the load buffer in-use flag reset signal generating unit 204, and the vector operation instruction issue waiting buffers 410 and 430 operate in cooperation. Thus, a vector processing device having the following functions is realized.

ベクトル処理装置において、上記の特定の大きさに分割した単位でロードバッファ５−１からベクトル演算レジスタ５−２へのベクトルデータの転送を開始するのに合わせて分割した単位に対応するロードバッファの解放を行い、後続のベクトルロード命令で使用するよう制御が行われる。 In the vector processing device, the load buffer corresponding to the unit divided in accordance with the start of transfer of vector data from the load buffer 5-1 to the vector operation register 5-2 in the unit divided into the specific size. Release and control to use in subsequent vector load instructions.

［動作の説明］ [Description of operation]
次に本実施の形態の動作を図５のタイムチャートと図７の説明用命令列例を使用して説明する。 Next, the operation of the present embodiment will be described using the time chart of FIG. 5 and the example instruction sequence of FIG. 7.

図７は説明用命令列例で、番号１〜７の順でデコードされるものとする。ＡＤＤ−ＡはＬＤ−Ａで演算レジスタ５−２にロードしたデータを使ってＡＤＤ演算する命令と定義する。ＡＤＤ−ＢはＬＤ−Ｂで演算レジスタ５−２にロードしたデータを使ってＡＤＤ演算する命令と定義する。１番と２番、３番と４番の命令以外は使用するベクトル演算レジスタ領域が異なるため、データ整合性の観点からは命令間の発行順番依存関係は無い。ベクトルデータは全て要素数２５６であると定義する。 FIG. 7 shows an example of an instruction sequence for explanation, and it is assumed that decoding is performed in the order of numbers 1-7. ADD-A is defined as an instruction for performing an ADD operation using data loaded to the operation register 5-2 by the LD-A. ADD-B is defined as an instruction for performing an ADD operation using data loaded to the operation register 5-2 by LD-B. Since the vector operation register areas to be used are different except for the first, second, third and fourth instructions, there is no issue order dependency between instructions from the viewpoint of data consistency. All vector data are defined to have 256 elements.

ＡＤＤ演算は通常２つのオペランドデータを使って演算処理を行うが、本実施の形態ではロードデータと対となるデータが予め別のベクトル演算レジスタに格納されていることを動作説明の前提とする。また、ロードバッファ５−１からベクトル演算レジスタ５−２への書き込みパスは１つのみである構成とする。 The ADD operation is normally performed using two operand data, but in this embodiment, it is assumed that the data paired with the load data is stored in a separate vector operation register in advance. Further, there is only one write path from the load buffer 5-1 to the vector operation register 5-2.

図５は、図７に示された命令列が実行されたケースのタイムチャートを示している。以下、図の上部に記載されたクロック数１−３３を参照しながら、動作を説明する。ベクトルロード命令ＬＤ−Ａが命令デコード部１でデコードされてベクトルロード命令１１およびベクトル命令１２としてベクトルロードリクエスト処理部２とベクトル命令発行部４に出力されたタイミングをクロック１とする。 FIG. 5 shows a time chart of a case where the instruction sequence shown in FIG. 7 is executed. Hereinafter, the operation will be described with reference to the clock number 1-33 described in the upper part of the figure. The timing at which the vector load instruction LD-A is decoded by the instruction decoding unit 1 and output to the vector load request processing unit 2 and the vector instruction issuing unit 4 as the vector load instruction 11 and the vector instruction 12 is set as a clock 1.

図７で示した命令列が順次デコードされると、それぞれの命令列で読み込むベクトルデータを格納するために必要な分のサブロードバッファを確保したのち、タグ及びアドレス１８がメモリアクセス制御部６に送出される。 When the instruction sequence shown in FIG. 7 is sequentially decoded, the tag and address 18 are stored in the memory access control unit 6 after securing the subload buffer for storing vector data read by each instruction sequence. Sent out.

ベクトル命令処理部５は、＃０と＃１で特定される２つのロードバッファ５−１を備えるものとする。命令列の１行目のベクトルロード命令ＬＤ−Ａが使用ロードバッファ決定部２０２で処理されると、ロードバッファ＃０の中のサブロードバッファ０，１，２，３（図２、図４、図６ではＬＤ−Ｂｕｆ＃０−０，＃０−１，＃０−２，＃０−３と、ロードバッファ番号を示す＃０の後に枝番を付けることによって記載されている）の４つが確保される。確保番号通知１３、２３の使用箇所情報には“１１１１００００”という値が出力される。 The vector instruction processing unit 5 is assumed to include two load buffers 5-1, identified as #0 and #1. When the vector load instruction LD-A in the first row of the instruction sequence is processed by the use load buffer determination unit 202, four sub load buffers 0, 1, 2, and 3 in load buffer #0 are reserved (in FIGS. 2, 4, and 6 they are written as LD-Buf#0-0, #0-1, #0-2, and #0-3, that is, with a branch number appended after the load buffer number #0). The value “11110000” is output as the usage location information of the reservation number notifications 13 and 23.

同様に命令列の２行目のＬＤ−Ｂが使用ロードバッファ決定部２０２で処理されると、ロードバッファ＃０のサブロードバッファ４，５，６，７が確保されて使用箇所情報には“００００１１１１”という値が出力される。命令列の５行目のＬＤ−Ｃが使用ロードバッファ決定部２０２で処理されるとロードバッファ＃１のサブロードバッファ０，１，２，３が確保され、ＬＤ−Ｄが使用ロードバッファ決定部２０２で処理されるとロードバッファ＃１のサブロードバッファ４，５，６，７が確保される。この時点でロードバッファ＃０，＃１は全て使用中となってしまうため、後続のベクトルロード命令ＬＤ−Ｅは処理待ちの状態となり、命令デコード部１にはビジー信号１０が出力される。 Similarly, when LD-B in the second row of the instruction sequence is processed by the use load buffer determination unit 202, sub load buffers 4, 5, 6, and 7 of load buffer #0 are reserved and the value “00001111” is output as the usage location information. When LD-C in the fifth row of the instruction sequence is processed by the use load buffer determination unit 202, sub load buffers 0, 1, 2, and 3 of load buffer #1 are reserved, and when LD-D is processed, sub load buffers 4, 5, 6, and 7 of load buffer #1 are reserved. At this point load buffers #0 and #1 are both fully in use, so the subsequent vector load instruction LD-E waits for processing and a busy signal 10 is output to the instruction decoding unit 1.

要素数セット部３０１は、使用ロードバッファ決定部２０２から確保番号通知１３を受け取り、要素数減算カウンタ部３０２のＬＤ−Ｂｕｆ＃０−０，ＬＤ−Ｂｕｆ＃０−１，ＬＤ−Ｂｕｆ＃０−２，ＬＤ−Ｂｕｆ＃０−３の整列判定カウンタにそれぞれ値６４をセットするとともに、対応する有効フラグに“１”をセットする。 The element number setting unit 301 receives the reservation number notification 13 from the use load buffer determining unit 202, and LD-Buf # 0-0, LD-Buf # 0-1, LD-Buf # 0- of the element number subtraction counter unit 302. 2, the value 64 is set in each of the alignment determination counters of LD-Buf # 0-3, and “1” is set in the corresponding valid flag.

４０１のベクトル命令バッファ部は、ベクトル命令１２として、図７で示すような命令列を順次、受け取る。まずベクトルロード命令ＬＤ−Ａはベクトルロード命令発行待ちバッファ４２０に格納され、次にベクトル演算命令ＡＤＤ−Ａがベクトル演算命令発行待ちバッファ４１０に格納される。続いてＬＤ−Ｂがベクトルロード命令発行待ちバッファ４４０に格納され、ＡＤＤ−Ｂがベクトル演算命令発行待ちバッファ４３０に格納される。ＬＤ−ＣとＬＤ−Ｄはベクトルロード命令発行待ちバッファ４２０またはベクトルロード命令発行待ちバッファ４４０が空くまでベクトル命令バッファ４０１内でバッファリングされる。 The vector instruction buffer unit 401 sequentially receives an instruction sequence as shown in FIG. First, the vector load instruction LD-A is stored in the vector load instruction issuance waiting buffer 420, and then the vector operation instruction ADD-A is stored in the vector operation instruction issuance waiting buffer 410. Subsequently, LD-B is stored in the vector load instruction issuance waiting buffer 440, and ADD-B is stored in the vector operation instruction issuance waiting buffer 430. LD-C and LD-D are buffered in the vector instruction buffer 401 until the vector load instruction issue wait buffer 420 or the vector load instruction issue wait buffer 440 becomes empty.

To prevent ADD-A from overtaking LD-A and breaking data consistency, "1111" is set in the inter-instruction consistency maintenance flag 413 for LD-A, which was stored earlier in the vector load instruction issue wait buffer 420, when ADD-A is stored in the vector operation instruction issue wait buffer 410. Therefore, even if the vector operation register area used by the ADD-A instruction is not busy, all bits of the inter-instruction consistency maintenance flag 413 are "1", so none of the issue permission signals for the four element number bands, each formed with 64 elements as the division unit, becomes valid ("1"). As a result, the ADD-A instruction waits in the vector operation instruction issue wait buffer 410.

Because LD-A is the first instruction, there is no need to maintain data consistency with a preceding instruction, so "0000" is set in the inter-instruction consistency maintenance flag 423. Consequently, as soon as the data of sub load buffer LD-Buf#0-0, LD-Buf#0-1, LD-Buf#0-2, or LD-Buf#0-3 is confirmed to be complete, transfer of the data in the load buffer 5-1 to the operation register 5-2 can start immediately. This is the state at clock 14, which is abbreviated in the time chart as the load data alignment wait state. FIG. 6(a) shows the load buffer usage state at this time.

At clock 15, all 64 elements assigned to LD-Buf#0-0, the lowest-numbered sub load buffer reserved for the vector load instruction LD-A, have been received from the memory access control unit 6, so the alignment determination counter for LD-Buf#0-0 in the element count subtraction counter unit 302 reaches "0". The alignment determination unit 305 judges that all load elements of sub load buffer LD-Buf#0-0 have arrived.
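The countdown used for the alignment determination can be illustrated with a minimal Python sketch. The class name `AlignmentCounter` and its method names are hypothetical; the sketch only assumes what the text states, namely that a counter is preset to the number of expected elements with its valid flag set, is decremented as elements arrive, and signals alignment when it reaches zero.

```python
class AlignmentCounter:
    """Hypothetical model of one alignment determination counter."""

    def __init__(self):
        self.count = 0
        self.valid = False

    def reserve(self, expected=64):
        # Set when the sub load buffer is reserved for a vector load.
        self.count = expected
        self.valid = True

    def receive(self, n=1):
        # Called when n load elements return from memory.
        # Returns True once every expected element has arrived.
        if not self.valid or n > self.count:
            raise ValueError("unexpected load data")
        self.count -= n
        return self.count == 0


counter = AlignmentCounter()
counter.reserve(64)
arrivals = [counter.receive() for _ in range(64)]
```

Only the last of the 64 arrivals reports alignment, which corresponds to the moment the alignment notification 15 can be issued for that sub load buffer.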

At clock 16, the specific load buffer transfer condition confirmation unit 427 compares the load buffer number 424 and the load buffer usage location 425 of LD-A with the alignment notification 15 received from the alignment determination unit 305. From this comparison it recognizes that alignment of LD-Buf#0-0, which is reserved for LD-A, is complete. Because this is the element number band of the lowest-numbered reserved sub load buffer, it also identifies that alignment of element numbers 00 to 63 is complete, and it reports this to the vector load instruction issue check unit 428.
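The mapping from a sub load buffer to an element number band can be sketched as follows. The function name `element_band` and the string form of the usage location information are assumptions made for illustration; the rule itself follows the text: the lowest-numbered reserved sub load buffer holds the first 64 elements, the next reserved one holds the following 64, and so on.

```python
def element_band(usage_mask, sub_index, sub_size=64):
    """Return (first, last) element numbers held by a reserved sub load buffer.

    usage_mask is the usage location information as a bit string whose
    leftmost bit corresponds to sub load buffer 0 (hypothetical encoding
    matching the examples in the text).
    """
    reserved = [i for i, bit in enumerate(usage_mask) if bit == "1"]
    if sub_index not in reserved:
        raise ValueError("sub load buffer is not reserved for this load")
    k = reserved.index(sub_index)  # position among the reserved buffers
    return k * sub_size, (k + 1) * sub_size - 1


band_a0 = element_band("11110000", 0)  # LD-Buf#0-0 of a load using 0 to 3
band_b0 = element_band("00001111", 4)  # LD-Buf#0-4 of a load using 4 to 7
```

Because the band depends on the position among the reserved sub load buffers rather than on the absolute number, a non-contiguous reservation such as "11101000" still maps cleanly onto bands of 64 elements.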

The vector load instruction issue check unit 428 confirms that the transfer destination vector operation register area of LD-A is not busy, and issues, for sub load buffer LD-Buf#0-0, the load buffer transfer start instruction 16 to the vector operation register area specified by the instruction. At the same time, it sends the load buffer release notification 14 for LD-Buf#0-0 to the load buffer in-use flag reset signal generation unit 204 and resets the bit of the unissued element identification flag 422 that corresponds to element numbers 00 to 63. It also issues an instruction to reset to "0" the bit for element numbers 00 to 63 of the inter-instruction consistency maintenance flag 413 for LD-A held in the vector load instruction issue wait buffer 420. As a result, the unissued element identification flag 422 becomes "0111" and the inter-instruction consistency maintenance flag 413 becomes "0111". In addition, because the configuration has only one write path to the vector operation register 5-2 that is the transfer destination of LD-A, the unit instructs the load buffer transfer path busy management unit 403 to keep the busy flag lit for the two clocks until the transfer finishes.

In this embodiment, it is also identified at clock 16 that all elements of sub load buffer LD-Buf#0-1 have arrived.

At clock 17, on receiving the instruction to transfer sub load buffer LD-Buf#0-0 to the vector operation register 5-2, the vector instruction processing unit 5 starts transferring load data from sub load buffer LD-Buf#0-0 to the vector operation register 5-2. Because the configuration can transfer 32 elements of load data per clock, the load buffer transfer path is used for two clocks. Accordingly, the load buffer transfer path busy management unit 403 is instructed to keep the busy flag lit for two clocks.

At this timing, the specific load buffer transfer condition confirmation unit 427 receives the alignment determination signal for sub load buffer LD-Buf#0-1, compares it with the load buffer number 424 and the load buffer usage location indication flag 425, and informs the vector load instruction issue check unit 428 that the load data of element numbers 64 to 127 can be transferred. Because the busy flag in the vector operation unit busy management unit 402 is lit, no transfer start instruction is issued for sub load buffer LD-Buf#0-1.

Meanwhile, the inter-instruction consistency maintenance flag 413 for LD-A is "0111". The inter-instruction consistency maintenance flag check unit 414 therefore judges that the ADD operation on element numbers 00 to 63 can be executed, and sends the instruction issue permission signal "1000" to the vector operation instruction issue check unit 415. From the value "1111" of the unissued element identification flag 412, the instruction issue permission signal "1000" from the inter-instruction consistency maintenance flag check unit 414, and the fact that the vector operation unit is not busy, the vector operation instruction issue check unit 415 judges that an ADD operation start instruction can be issued for the element number band 00 to 63, and issues it as the vector operation start instruction 17. At the same time, it instructs the vector operation unit busy management unit 402 to light, for two clocks, the busy flag of the vector operation register area that stores the ADD-A execution result, and it resets the bit of the unissued element identification flag 412 that corresponds to element numbers 00 to 63. As a result, the unissued element identification flag 412 becomes "0111".
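The issue check for the operation instruction can be summarized by a small Python sketch. The function names and the bit-string representation are hypothetical, and choosing the lowest eligible band first is an assumption; the logic follows the text: a band may be issued when its unissued bit is "1", its consistency bit has been cleared to "0" (so the permission bit is "1"), and the vector operation unit is not busy.

```python
def issue_permission(consistency_flags):
    # Permission is the bitwise inverse of the consistency flags.
    return "".join("0" if bit == "1" else "1" for bit in consistency_flags)


def select_band(unissued, consistency_flags, unit_busy):
    """Return the index of the band to issue next, or None if none can issue."""
    if unit_busy:
        return None
    permission = issue_permission(consistency_flags)
    for band, (u, p) in enumerate(zip(unissued, permission)):
        if u == "1" and p == "1":
            return band  # assumption: lowest eligible band is issued first
    return None


def clear_bit(flags, band):
    # Reset one bit of a flag string after the band has been issued.
    return flags[:band] + "0" + flags[band + 1:]


band = select_band("1111", "0111", unit_busy=False)
unissued_after = clear_bit("1111", band)
```

With the flag values quoted above, the sketch selects band 0 (elements 00 to 63) and leaves the unissued flag at "0111"; with consistency flags still at "1111" it selects nothing, which is the waiting state of ADD-A.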

At clock 18, the busy flag of the load buffer transfer path busy management unit 403 goes out. The vector load instruction issue check unit 428 therefore checks the issue conditions and sends to the vector instruction processing unit 5 the load buffer transfer start instruction 16 for sub load buffer LD-Buf#0-1, which holds the load data of element numbers 64 to 127.
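The two-clock spacing between successive transfer start instructions follows from simple arithmetic, sketched below. The function names are hypothetical, and the model assumes only what the text states: one 64-element sub load buffer is transferred at 32 elements per clock over a single transfer path, so the path stays busy for two clocks after each transfer start instruction.

```python
import math


def transfer_clocks(band_elements=64, elements_per_clock=32):
    # Number of clocks the transfer path is occupied by one sub load buffer.
    return math.ceil(band_elements / elements_per_clock)


def next_issue_clock(issue_clock, band_elements=64, elements_per_clock=32):
    # Earliest clock at which the next transfer start instruction can issue,
    # assuming a single transfer path that is busy for the whole transfer.
    return issue_clock + transfer_clocks(band_elements, elements_per_clock)


busy = transfer_clocks()
following = next_issue_clock(16)
```

A transfer start instruction issued at clock 16 therefore allows the next one at clock 18. Providing more transfer paths or a wider path would shorten this interval.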

At this timing, the load buffer release notification 14 for LD-Buf#0-1 is sent to the load buffer in-use flag reset signal generation unit 204, the bit of the unissued element identification flag 422 that corresponds to element numbers 64 to 127 is reset, and a reset instruction is issued for the bit for element numbers 64 to 127 of the inter-instruction consistency maintenance flag 413 for LD-A held in the vector load instruction issue wait buffer 420. As a result, the unissued element identification flag 422 becomes "0011" and the inter-instruction consistency maintenance flag 413 becomes "0011". In addition, because the configuration has only one write path to the vector operation register 5-2 that is the transfer destination of LD-A, the load buffer transfer path busy management unit 403 is instructed to keep the busy flag lit for the two clocks during which the load buffer transfer path is used.

At clock 19, on receiving the instruction to transfer sub load buffer LD-Buf#0-1 to the vector operation register 5-2, the vector instruction processing unit 5 starts transferring load data from sub load buffer LD-Buf#0-1 to the vector operation register 5-2. In the time chart of FIG. 5, each ADD operation completes in two clocks. This is because the configuration can perform the ADD operation on 32 elements per clock.

At this timing, the specific load buffer transfer condition confirmation unit 427 receives the alignment determination signal for sub load buffer LD-Buf#0-2, compares it with the load buffer number 424 and the load buffer usage location indication flag 425, and informs the vector load instruction issue check unit 428 that the load data of element numbers 128 to 191 can be transferred. However, because the busy flag of the load buffer transfer path busy management unit 403 is lit, no transfer start instruction is issued for sub load buffer LD-Buf#0-2.

Meanwhile, the inter-instruction consistency maintenance flag 413 for LD-A is "0011". The inter-instruction consistency maintenance flag check unit 414 therefore judges that the ADD operation on element numbers 64 to 127 can be executed, and an instruction issue permission signal is sent to the vector operation instruction issue check unit 415.

Focusing on element numbers 64 to 127, the vector operation instruction issue check unit 415 checks that the second bit of the unissued element identification flag 412 is "1", that the second bit of the inter-instruction consistency maintenance flag 413 is "1", and that the vector operation unit is not busy, and then outputs an ADD operation start instruction for element numbers 64 to 127 to the vector instruction processing unit 5 as the vector operation start instruction 17. At the same time, it instructs the vector operation unit busy management unit 402 to light the busy flag for two clocks and resets the bit of the unissued element identification flag 412 that corresponds to element numbers 64 to 127. As a result, the unissued element identification flag 412 becomes "0011".

By maintaining data consistency between instructions in units of 64 elements, into which the 256-element load data is divided, the subsequent ADD operation can be executed efficiently.

At clock 20, a transfer start instruction and a load buffer release instruction are issued for sub load buffer LD-Buf#0-2, which holds element numbers 128 to 191. At the same time, the bit of the inter-instruction consistency maintenance flag 413 that corresponds to element numbers 128 to 191 is reset to "0". The vector load instruction LD-E has 256 elements and must reserve four load buffer division units. At this point only three sub load buffers are free in load buffer #0, so the busy signal to the instruction decode unit 1 remains asserted.

At clock 21, the bit of the inter-instruction consistency maintenance flag 413 that corresponds to element numbers 128 to 191 is "0", so the vector operation start instruction is issued for the element number band 128 to 191 of the ADD-A instruction.

At clock 22, a transfer start instruction and a load buffer release instruction are issued for sub load buffer LD-Buf#0-4, which holds element numbers 00 to 63 of the LD-B instruction. The load buffer in-use flag reset signal generation unit 204 receives the load buffer release notification 14 from the vector load instruction issue check unit 428 and resets the load buffer in-use flag that corresponds to the indicated sub load buffer number.

At clock 23, an instruction to execute the ADD operation on element numbers 00 to 63 of the ADD-B instruction is issued. FIG. 6(b) shows the load buffer usage state at this point. Four sub load buffers are now free in load buffer #0, so the used load buffer determination unit 202 can reserve sub load buffers for the vector load instruction LD-E, and the vector load processing of LD-E can proceed. The load buffer in-use flags corresponding to the four reserved sub load buffers are set to "1". When LD-E is processed by the used load buffer determination unit 202, the four sub load buffers 0, 1, 2, and 4 in load buffer #0 are reserved, and the value "11101000" is output as the usage location information of the load buffer reservation number notifications 13 and 23.

Thereafter, for each division unit of the remaining vector load instructions LD-A-3, LD-B-1, LD-B-2, LD-B-3, LD-C, LD-D, and LD-E, all elements are loaded and a data transfer instruction to the vector operation register 5-2 is issued. Operation start instructions are then issued for each division unit of the operation instructions ADD-A-2, ADD-A-3, and ADD-B while data consistency is maintained, and execution of the instruction sequence of this embodiment completes. The remaining steps are omitted because describing them would be redundant.

[Modification]
Modifications of the embodiment of the present invention are described below. The basic configuration is as described above, but a configuration in which the maximum number of vector elements is not 256 is also possible. The maximum number of vector elements is not particularly limited as long as it is at least four times the number of elements processed per clock, and it can be chosen through a trade-off between the amount of hardware (HW) and performance. For example, if four elements are processed per clock, a configuration whose maximum number of vector elements is 16, four times that value, allows the above configuration to function effectively and improves performance.

In this embodiment, one division unit of the load buffer is 64 elements, which is the maximum number of vector elements divided by 4. The division unit can be set to the maximum number of vector elements divided by any integer of 2 or more.

Furthermore, this embodiment divides the load buffer, which is provided for 1024 elements, in two stages. First, load buffer numbers are assigned to units of twice the maximum vector length (= 512 elements). Each load buffer number is then divided into eight units of 64 elements (= 256/4), that is, the maximum number of vector elements divided by 4. Alternatively, the load buffer provided for 1024 elements can be divided into sub load buffer units in a single stage, for example into 16 units of 64 elements each. In that case the load buffer number field in the tag information of this embodiment disappears, and the 8-bit sub load buffer usage location field is extended to 16 bits, one bit per division.
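The relationship between the division scheme and the width of the usage location field can be checked with a short calculation. The function name `buffer_layout` and the dictionary keys are hypothetical; the sketch assumes one usage location bit per sub load buffer within a numbered load buffer, as in the embodiment.

```python
def buffer_layout(capacity, buffer_size, sub_size):
    """Derive the load buffer layout from element counts (hypothetical helper).

    capacity    : total load buffer capacity in elements
    buffer_size : elements covered by one load buffer number
    sub_size    : elements in one sub load buffer
    """
    if capacity % buffer_size or buffer_size % sub_size:
        raise ValueError("sizes must divide evenly")
    subs = buffer_size // sub_size
    return {
        "load_buffers": capacity // buffer_size,
        "subs_per_buffer": subs,
        "usage_bits": subs,  # one usage location bit per sub load buffer
    }


two_stage = buffer_layout(1024, 512, 64)   # the embodiment
one_stage = buffer_layout(1024, 1024, 64)  # the single-stage variation
```

For the embodiment this gives two load buffers with an 8-bit usage location field, and for the single-stage variation it gives one undivided group with a 16-bit field, which matches the description above.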

In this embodiment the load buffer capacity is 1024 elements, but a larger capacity is also possible. The larger the load buffer capacity, the further ahead of the operation instructions the vector load instructions can be issued, which improves total performance. The load buffer capacity can be set to an optimum amount through a trade-off between the amount of HW and performance.

In this embodiment, load buffer numbers are assigned to units obtained by dividing the load buffer by twice the maximum vector length (= 512 elements), but load buffer numbers may be assigned to units of any size that is at least twice the maximum vector length. For example, the load buffer may be divided by an irregular number of elements such as 896 = 256 * 3 + 128. However, usage efficiency is higher when the division unit is an integer multiple of the maximum vector length.

In this embodiment, two vector load instruction issue wait buffers 420 and 440 and two vector operation instruction issue wait buffers 410 and 430 are provided, but the number of each kind of instruction issue wait buffer can be increased by also increasing the inter-instruction consistency maintenance flags accordingly.

In this embodiment, there is only one data transfer path from the load buffer to the vector operation register, but a plurality of data transfer paths may be provided to improve performance.

This embodiment also illustrates the breakdown of the interface signals between functional blocks, but signals with a different breakdown may be used as long as the necessary information can be conveyed.

The effects achieved by the vector processing device and the vector loading method according to this embodiment are described below.

The first effect is that a subsequent vector operation instruction that uses the data of a vector load instruction can be executed earlier, which improves the utilization of the vector operation unit and the total performance of the system.
The reason is that the vector load instruction and the subsequent vector operation instruction can be managed per divided element group while data consistency is maintained. Even when not all elements have arrived, once the elements of one divided unit have arrived, control starts the transfer from the load buffer to the vector operation register and the vector operation instruction that uses that data.

The second effect is that the load buffer is used more efficiently, so processing of a vector load instruction can start earlier. This lowers the probability of waiting for load data from memory and improves the total performance of the system.
The reason is that, even when not all elements of a vector load instruction have arrived, once the elements of one divided unit have arrived, the transfer from the load buffer to the vector operation register starts and the load buffer is released in the same divided unit. Thus, even before all load buffer transfer start instructions of a given vector load instruction have been issued, processing of a subsequent vector load instruction starts as soon as at least the number of division units it needs has been released.

Schematic configuration diagram
Block diagram of the alignment determination unit
Vector instruction issuing unit
Vector load request processing unit
Time chart of the configuration of the invention
Transition of the load buffer usage state
Example instruction sequence used in the description

Explanation of symbols

1 Instruction decode unit
2 Vector load request processing unit
3 Load data alignment determination unit
4 Vector instruction issuing unit
5 Vector instruction processing unit
6 Memory access control unit
7 Main memory
10 Busy signal
11 Vector load instruction
12 Vector instruction
13 Reservation number notification
14 Load buffer release notification
15 Alignment notification
16 Load buffer transfer start instruction
17 Vector operation start instruction
18 Tag and address
19 Tag
20 Tag and data
21 Data read address
22 Read data
201 Address conversion unit
202 Used load buffer determination unit
203 Load buffer in-use flag
204 Load buffer in-use flag reset signal generation unit
205 Tag generation unit
301 Element count setting unit
302 Element count subtraction counter unit
303 Load buffer number distribution unit
304 Element count counter
305 Alignment determination unit
351 Number of elements scheduled to be stored in sub load buffer
401 Vector instruction buffer unit
402 Vector operation unit busy management unit
403 Load buffer transfer path busy management unit
410, 430 Vector operation instruction issue wait buffer
411 Vector operation instruction information buffer
412 Unissued element identification flag
413 Inter-instruction consistency maintenance flag
414 Inter-instruction consistency maintenance flag check unit
415 Vector operation instruction issue check unit
420, 440 Vector load instruction issue wait buffer
421 Vector load instruction information buffer
422 Unissued element identification flag
423 Inter-instruction consistency maintenance flag
424 Load buffer number
425 Load buffer usage location indication flag
426 Inter-instruction consistency maintenance flag check unit
427 Specific load buffer transfer condition confirmation unit
428 Vector load instruction issue check unit

Claims

A memory access control unit that reads vector data from the main memory based on the received command;
A load buffer for storing the vector data read by the memory access control unit;
A vector processing unit comprising a vector operation register, and vector processing the vector data transferred from the load buffer to the vector operation register;
The load buffer necessary for storing all the elements of the vector data processed in the received instruction is secured in units of sub load buffers obtained by dividing the load buffer into predetermined elements. A vector load request processing unit for generating a reservation notification indicating the sub load buffer;
A load data alignment determination unit that divides a plurality of elements constituting the vector data into a plurality of element groups each having the sub load buffer as a unit, and that issues an alignment notification for an element group when all elements of that element group have been read from the main memory by the memory access control unit; and
A vector instruction for controlling to start transfer from the subload buffer corresponding to the alignment notification to the vector processing unit when the alignment notification corresponding to the subload buffer indicated in the reservation notification is issued A vector processing device comprising an issuing unit.

The vector processing device according to claim 1, wherein, when the vector data is divided into the plurality of element groups and transferred, the vector instruction issuing unit performs control to start the transfer from the load buffer to the vector processing unit on condition that the vector operation register used by the vector processing executed in accordance with the received instruction does not conflict with a preceding vector instruction that is in progress.

The vector processing device according to claim 2, wherein when the transfer is started, the vector instruction issuing unit performs control to release the load buffer in which the transfer has been started so that it can be used in a subsequent vector load instruction.

  Reading vector data from main memory based on received instructions;
  Storing the vector data read in the reading step in a load buffer;
  Vector processing the vector data transferred from the load buffer to a vector operation register;
  The load buffer necessary for storing all the elements of the vector data processed in the received instruction is secured in units of sub load buffers obtained by dividing the load buffer by a predetermined number of elements. Generating an allocation notification indicating a subload buffer;
  A plurality of elements constituting the vector data are divided into a plurality of element groups each having the subload buffer as a unit, and all elements of the plurality of element groups are read from the main memory by the memory access control unit. Issuing an alignment notification for the set of elements;
  Controlling to start transfer from the subload buffer corresponding to the alignment notification to the vector processing unit when the alignment notification corresponding to the subload buffer indicated in the reservation notification is issued;
  A vector processing method comprising: