JP2016091362A - Information processing device, information processing method, and program

Info

Publication number: JP2016091362A
Application number: JP2014226144A
Authority: JP
Inventors: 尊博内田; Takahiro Uchida
Original assignee: NEC Platforms Ltd
Current assignee: NEC Platforms Ltd
Priority date: 2014-11-06
Filing date: 2014-11-06
Publication date: 2016-05-23

Abstract

PROBLEM TO BE SOLVED: To provide an information processing device, an information processing method, and a program that can execute instructions efficiently with a relatively small amount of hardware even when there are many arithmetic resources.

SOLUTION: For an instruction in an instruction issue standby buffer 210, a first time is calculated until the timing at which the influence of a scheduled instruction that uses the same resource disappears, and the first time is stored in an inter-instruction dependency counter 213. Counter set value generation means 232 calculates a second time until the timing at which no path conflict occurs in the arithmetic unit pipeline, and stores the second time in a path conflict counter 214. Scheduling-confirmed instruction influence reflection means 231 calculates a preceding-instruction influence reflection value indicating the influence that an instruction whose execution has been confirmed has on the instruction selected immediately afterward. Conflict arbitration means 221 compares the value of the inter-instruction dependency counter 213, the value of the path conflict counter 214, and the preceding-instruction influence reflection value to calculate the shortest execution waiting time for the instruction.

SELECTED DRAWING: Figure 3

Description

The present invention relates to an information processing apparatus, an information processing method, and a program therefor.

Patent Document 1 discloses a technique that provides busy management counters corresponding to arithmetic resources and registers. When determining whether an instruction waiting to be issued can be executed, the busy management counter values of the relevant arithmetic resources are input, the earliest timing at which the instruction can be issued is calculated from those values, and the issue waiting time is determined. This allows a subsequent instruction in an R→W (read→write) relationship to be issued at the earliest possible timing.

Patent Document 2 discloses a vector processing device that uses busy flags and can issue an instruction ahead of preceding instructions (overtaking issue).

Patent Document 3 discloses an information processing apparatus that can continue processing without lowering the utilization of a pipelined arithmetic unit even when the update of a busy flag is delayed.

Patent Document 1: JP 2012-173755 A
Patent Document 2: JP 2008-269067 A
Patent Document 3: JP 2011-060048 A

Patent Document 1 assumes a busy management counter for each arithmetic resource, so the required amount of hardware grows when there are many arithmetic resources. In addition, as the number of arithmetic resources grows, it becomes difficult to implement the function that selects and compares values from the corresponding busy management counters in a way that keeps up with a high-speed clock.

Patent Document 2 uses busy flags and has the same problems as Patent Document 1.

Patent Document 3 likewise uses busy flags and has the same problems as Patent Document 1.

Thus, the above patent documents require as many busy management counters as there are arithmetic resources and registers, so the amount of hardware grows as the number of managed resources grows. Because the number of busy management counters grows with the number of managed resources, the values of many busy management counters must also be input and compared when determining whether an instruction can be executed. This increases circuit delay and makes it difficult to support a high-speed clock.

An object of the present invention is therefore to solve the problem described above: as the number of arithmetic resources to be managed grows, the amount of hardware also grows and it becomes difficult to support a high-speed clock.

An information processing apparatus according to the present invention is an information processing apparatus in which pipeline control means controls pipeline processing execution means that executes pipeline processing of an arithmetic unit pipeline. The pipeline control means includes: counter set value generation means that calculates, for an instruction in an instruction issue standby buffer, a first time until the timing at which the influence of a scheduled instruction that uses the same resource disappears, stores a counter value corresponding to the first time in an inter-instruction dependency counter, calculates a second time until the timing at which no path conflict occurs in the arithmetic unit pipeline, and stores a counter value corresponding to the second time in a path conflict counter; scheduling-confirmed instruction influence reflection means that calculates a preceding-instruction influence reflection value indicating the influence that an instruction whose execution has been confirmed has on the instruction selected immediately afterward; and conflict arbitration means that compares the counter value of the inter-instruction dependency counter, the counter value of the path conflict counter, and the preceding-instruction influence reflection value to calculate the shortest execution waiting time for the instruction.

An information processing method according to the present invention calculates, for an instruction in an instruction issue standby buffer, a first time until the timing at which the influence of a scheduled instruction that uses the same resource disappears, and stores a counter value corresponding to the first time in an inter-instruction dependency counter; calculates a second time until the timing at which no path conflict occurs in the arithmetic unit pipeline, and stores a counter value corresponding to the second time in a path conflict counter; calculates a preceding-instruction influence reflection value indicating the influence that an instruction whose execution has been confirmed has on the instruction selected immediately afterward; and compares the counter value of the inter-instruction dependency counter, the counter value of the path conflict counter, and the preceding-instruction influence reflection value to calculate the shortest execution waiting time for the instruction.

A computer program according to the present invention causes a computer to execute: a process of calculating, for an instruction in an instruction issue standby buffer, a first time until the timing at which the influence of a scheduled instruction that uses the same resource disappears, storing a counter value corresponding to the first time in an inter-instruction dependency counter, calculating a second time until the timing at which no path conflict occurs in the arithmetic unit pipeline, and storing a counter value corresponding to the second time in a path conflict counter; a process of calculating a preceding-instruction influence reflection value indicating the influence that an instruction whose execution has been confirmed has on the instruction selected immediately afterward; and a process of comparing the counter value of the inter-instruction dependency counter, the counter value of the path conflict counter, and the preceding-instruction influence reflection value to calculate the shortest execution waiting time for the instruction.

The present invention makes it possible to execute instructions efficiently with a relatively small amount of hardware even when there are many arithmetic resources.

FIG. 1 is a block diagram illustrating an example of the configuration of the information processing apparatus according to the first embodiment.
FIG. 2 is a diagram illustrating an example of the configuration of the pipeline processing execution unit.
FIG. 3 is a diagram illustrating an example of the configuration of the pipeline control unit.
FIG. 4 is a time chart showing the operation of the information processing apparatus.
FIG. 5 shows the decode table (counter value decode correspondence table) used to decode counter values.
FIG. 6 is a diagram showing details of the operation of FIG. 4 for each clock.
FIG. 7 is a diagram showing details of the operation of FIG. 4 for each clock.
FIG. 8 is a diagram showing details of the operation of FIG. 4 for each clock.
FIG. 9 is a diagram showing details of the operation of FIG. 4 for each clock.
FIG. 10 is a diagram showing details of the operation of FIG. 4 for each clock.
FIG. 11 is a time chart showing the operation of the information processing apparatus in a configuration with only four scheduled instruction buffers.
FIG. 12 is a time chart showing the operation of the information processing apparatus when a different instruction sequence is used.
FIG. 13 is a block diagram illustrating an example of the configuration of the information processing apparatus according to the second embodiment.
FIG. 14 is a block diagram illustrating an example of the configuration of the information processing apparatus according to the third embodiment.

A first embodiment for carrying out the invention will be described in detail with reference to the drawings.

FIG. 1 is a block diagram illustrating an example of the configuration of the information processing apparatus 1 according to the first embodiment.

The information processing apparatus 1 includes a pipeline processing execution unit 10 and a pipeline control unit 20.

FIG. 2 is a diagram illustrating an example of the configuration of the pipeline processing execution unit 10.

The pipeline processing execution unit 10 contains the arithmetic pipeline that is the target of control in the present embodiment.

When the issue of a vector instruction is confirmed, the arithmetic pipe control unit 160 sends out vector pipe control signals.

The arithmetic registers V0 to V3 (100 to 103) read and write data continuously for the VL (vector length, which can be specified per instruction) specified by the instruction, in accordance with commands (register read commands and register store commands) from the arithmetic pipe control unit 160. The arithmetic registers V0 to V3 (100 to 103) support continuous data access; in this example, up to four elements can be read or written continuously over four cycles. The number of consecutive elements can be specified per instruction.

The operand selectors 110, 111, 112, and 113 are provided one per arithmetic unit input. In accordance with a command (operand selector command) from the arithmetic pipe control unit 160, each selects the arithmetic register output designated by the arithmetic pipe control unit 160.

The product-sum arithmetic unit 120 executes operations for VL elements in accordance with a command (operation execution command) from the arithmetic pipe control unit 160. The product-sum arithmetic unit 120 outputs the operation result to the crossbar (X-bar) 106 via the signal line 900.

The logical arithmetic unit 130 executes operations for VL elements in accordance with a command (operation execution command) from the arithmetic pipe control unit 160. The logical arithmetic unit 130 outputs the operation result to the crossbar 106 via the signal line 901.

The crossbar 106 selects the data to be written to the arithmetic registers 100, 101, 102, and 103 in accordance with a command (X-bar select command) from the arithmetic pipe control unit 160, and outputs it continuously for VL elements.

The store data selector 105 selects read data from the arithmetic registers 100, 101, 102, and 103 in accordance with a command (StoreSelect command) from the arithmetic pipe control unit 160, and outputs the data to the store path to main memory.

To use the arithmetic registers 100 to 103 efficiently, the load buffers 150 and 151 hold the data read from memory by a load instruction, based on a load buffer read command, until the buffer has been filled (Fill). Then, in accordance with a command to execute the transfer to the arithmetic registers 100 to 103, they transfer the specified number of elements continuously.

FIG. 3 is a diagram illustrating an example of the configuration of the pipeline control unit 20.

The instruction buffer 201 buffers instructions and sends an instruction out when the instruction issue standby buffer 210 has a free entry.

The inter-instruction dependency check unit 202 checks the dependencies between an instruction about to be stored in the instruction issue standby buffer 210 and the preceding instructions stored in the instruction issue standby buffer 210 and the scheduled instruction buffer 240. If there is a dependency between the instruction about to be stored and an instruction already stored in the instruction issue standby buffer 210, the unit sets the inter-instruction dependency flag 212 to a value indicating a dependency. If dependency and path conflict relationships exist between the instruction about to be stored and an instruction in the scheduled instruction buffer 240, the unit calculates (counts) the time until each relationship is resolved (the first time and the second time). It then sets values corresponding to the calculated first and second times (also called counter values) in the inter-instruction dependency counter 213 and the path conflict counter 214.

The load buffer Fill determination unit 203 confirms that the number of elements specified by a load instruction has been loaded from memory into the load buffer 204 and, once all the elements needed to execute the load instruction are present, sets the Filled flag 215 to a predetermined flag value.

The instruction issue standby buffer 210 consists of the instruction storage buffer 211, the inter-instruction dependency flags 212, the inter-instruction dependency counters 213, the path conflict counters 214, and the Filled flags 215. In FIG. 3, entries are stored in order so that the buffers and counters toward the bottom of the figure hold the oldest instructions and counter values. No inter-instruction dependency flag 212 is provided for the bottom position, because the instruction stored in the instruction storage buffer 211 at that position is the oldest instruction and is already determined to have the highest priority.

The instruction storage buffer 211 is the buffer within the instruction issue standby buffer 210 that stores the instructions themselves.

The inter-instruction dependency flag 212 indicates dependencies between instructions stored in the instruction storage buffer 211. A stored value of "1" indicates that the two instructions have a dependency and that the instruction scheduling order must be preserved. A stored value of "0" indicates no dependency, so depending on the situation the instruction may be scheduled ahead of an instruction received earlier. An older instruction stored in a lower entry of the instruction storage buffer 211 should be scheduled without being constrained by newer instructions stored above it; in other words, the dependency of an older instruction on a newer one need not be held. As FIG. 3 shows, a newer instruction therefore has dependencies on more older instructions, and as a result the bit width of the inter-instruction dependency flag 212 differs from entry to entry.
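The triangular flag layout described above can be sketched as follows. This is only an illustration: the `Instr` type, the register-set representation, and the rule that any W→R, W→W, or R→W register overlap counts as a dependency are assumptions made for the sketch, not details taken from the patent.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: name plus registers read and written."""
    name: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]


def depends_on(newer: Instr, older: Instr) -> bool:
    """Assumed rule: any W->R, W->W, or R->W register overlap is a dependency."""
    return bool(
        (older.writes & newer.reads)
        or (older.writes & newer.writes)
        or (older.reads & newer.writes)
    )


def dependency_flags(buffer: List[Instr]) -> List[List[int]]:
    """buffer[0] is the oldest entry. Entry i gets one flag per older entry,
    so the oldest entry has no flags and newer entries have wider flag fields."""
    return [
        [int(depends_on(buffer[i], buffer[j])) for j in range(i)]
        for i in range(len(buffer))
    ]


buffer = [
    Instr("vld  V0", frozenset(), frozenset({"V0"})),
    Instr("vadd V1 <- V0, V2", frozenset({"V0", "V2"}), frozenset({"V1"})),
    Instr("vand V3 <- V2, V2", frozenset({"V2"}), frozenset({"V3"})),
]
flags = dependency_flags(buffer)
print(flags)  # [[], [1], [0, 0]]
```

In this sketch the third instruction has all flags at 0, so it could be scheduled ahead of the second one if its other conditions are met.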

The inter-instruction dependency counter 213 holds a value corresponding to the first time, that is, the time until the dependency on an instruction whose issue has been scheduled is resolved. The value is calculated and stored by the inter-instruction dependency check unit 202. At the timing when an instruction is stored in the instruction storage buffer 211, or when the issue of a preceding instruction is confirmed, the counter is set to the smallest value that satisfies all of the W→R, W→W, and R→W relationships (W: write, R: read). The smallest value may be selected by a circuit that compares magnitudes, by expanding the values into bit sequences and taking their logical AND, or by a mixture of the two. The inter-instruction dependency counter 213 is decremented by 1 every cycle and stops decrementing when it reaches 0.
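A minimal software sketch of this counter behavior is shown below. It assumes that each hazard imposes a simple lower bound on the wait, so that the smallest value satisfying all of them is the largest individual wait; the class name, function name, and example wait values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class DownCounter:
    """Saturating down-counter: decrements once per cycle and holds at 0."""
    value: int = 0

    def load(self, cycles: int) -> None:
        self.value = max(0, cycles)

    def tick(self) -> None:
        if self.value > 0:
            self.value -= 1


def combined_wait(waits: Dict[str, int]) -> int:
    """Smallest wait that satisfies every hazard, assuming each hazard is a
    lower bound: this is the largest of the individual waits."""
    return max(waits.values(), default=0)


counter = DownCounter()
counter.load(combined_wait({"W->R": 3, "W->W": 1, "R->W": 2}))

history = []
for _ in range(5):
    history.append(counter.value)
    counter.tick()
print(history)  # [3, 2, 1, 0, 0]
```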

The path conflict counter 214 holds a value corresponding to the second time, that is, the time until an instruction can be issued without conflicting, on a data transfer path such as an arithmetic unit or the store path, with an instruction whose issue has been scheduled. The value is calculated and stored by the inter-instruction dependency check unit 202. The path conflict counter 214 is set at the timing when an instruction is stored in the instruction storage buffer 211, or when the issue of a preceding instruction is confirmed. Depending on the configuration of the arithmetic pipeline, conflicts may occur at several points on the data transfer path; in that case the path conflict counter 214 is set to the smallest value that avoids all of them. The smallest value may be selected by a circuit that compares magnitudes, by expanding the values into bit sequences and taking their logical AND, or by a mixture of the two. The path conflict counter 214 is decremented by 1 every cycle and stops decrementing when it reaches 0. In a device operated with a high-speed clock, where there are many conflict check points and the delay of the magnitude comparison circuit becomes a problem, a configuration with multiple path conflict counters 214 is also conceivable.
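The "expand into bits and AND" alternative to a magnitude comparator can be illustrated with the sketch below. The encoding is an assumption made for illustration (bit i is 1 when waiting i cycles is long enough, over an assumed 8-cycle window); the actual decode table of FIG. 5 is not reproduced here.

```python
from typing import Iterable, Optional

WINDOW = 8  # assumed window of candidate waits: 0..7 cycles


def expand(wait: int, window: int = WINDOW) -> int:
    """Expand a non-negative wait into a bit pattern: bit i is 1 when i >= wait."""
    if wait >= window:
        return 0  # no candidate inside the window is long enough
    full = (1 << window) - 1
    return full & ~((1 << wait) - 1)


def smallest_common_wait(waits: Iterable[int], window: int = WINDOW) -> Optional[int]:
    """AND the expanded patterns, then take the lowest set bit.
    Returns None when no wait inside the window avoids every conflict."""
    mask = (1 << window) - 1
    for wait in waits:
        mask &= expand(wait, window)
    if mask == 0:
        return None
    return (mask & -mask).bit_length() - 1


print(format(expand(3), "08b"))         # 11111000
print(smallest_common_wait([3, 5, 1]))  # 5
```

Under this encoding the AND of the patterns gives the same answer as taking the maximum with comparators, but it needs only bitwise gates followed by a priority selection.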

The Filled flag 215 indicates that all the elements needed to execute a load instruction are present. When an instruction unrelated to loading, such as an arithmetic instruction or a store instruction, is set in the instruction storage buffer 211, the Filled flag 215 is set to "1".

The instruction execution scheduling unit 220 selects a schedulable instruction from among the instructions held in the instruction storage buffer 211 and calculates the earliest timing at which it can be executed while preserving consistency between instructions.

An instruction is schedulable when, for example, (1) all of its inter-instruction dependency flags 212 are 0, (2) the value of its inter-instruction dependency counter 213 is 7 or less, (3) the value of its path conflict counter 214 is 7 or less, and (4) the value of its Filled flag 215 is 1.

The values in (2) and (3) are those used in the present embodiment; other thresholds may be used. The instruction execution scheduling unit 220 selects the oldest of the instructions that satisfy all of these conditions and reads the counter values of the inter-instruction dependency counter 213 and the path conflict counter 214 corresponding to that instruction from the instruction issue standby buffer 210. It then converts the counter values it has read into decoded values based on the table in FIG. 5. Further, it takes into account the dependency between the selected instruction and the instruction whose issue was confirmed immediately before it (the immediately preceding instruction), and determines the earliest execution wait timing. If scheduling becomes impossible because of a dependency on, or a path conflict with, the immediately preceding instruction (that is, when no "1" exists in the decoded bit pattern), the instruction execution scheduling unit 220 cancels the scheduling and does not output a valid instruction.
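The four example conditions and the oldest-first selection can be sketched in software as follows. The `Entry` type and its field names are hypothetical, and the dependency flags are packed into an integer purely for brevity; the threshold of 7 is the example value given in the text.

```python
from dataclasses import dataclass
from typing import List, Optional

THRESHOLD = 7  # example threshold from the embodiment; other values are possible


@dataclass
class Entry:
    """Hypothetical view of one entry of the instruction issue standby buffer."""
    name: str
    dep_flags: int     # one bit per older entry; 0 means no ordering constraint
    dep_counter: int   # inter-instruction dependency counter value
    path_counter: int  # path conflict counter value
    filled: bool       # Filled flag


def schedulable(entry: Entry) -> bool:
    """Conditions (1) to (4) from the text."""
    return (
        entry.dep_flags == 0
        and entry.dep_counter <= THRESHOLD
        and entry.path_counter <= THRESHOLD
        and entry.filled
    )


def select_oldest(entries: List[Entry]) -> Optional[Entry]:
    """entries[0] is the oldest; return the oldest schedulable entry, if any."""
    for entry in entries:
        if schedulable(entry):
            return entry
    return None


entries = [
    Entry("vld", 0b0, 0, 0, filled=False),   # load data not yet filled
    Entry("vadd", 0b1, 0, 0, filled=True),   # depends on an older entry
    Entry("vand", 0b00, 2, 9, filled=True),  # path conflict counter too large
    Entry("vmul", 0b000, 7, 3, filled=True),
]
chosen = select_oldest(entries)
print(chosen.name if chosen else None)  # vmul
```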

The conflict arbitration unit 221 compares the counter value of the inter-instruction dependency counter 213, the counter value of the path conflict counter 214, and the preceding-instruction influence reflection value (described below), and calculates the shortest execution waiting time for the instruction. The conflict arbitration unit 221 is, for example, a fixed-priority arbiter circuit that gives priority to the lowest number.
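One way to picture this arbitration is sketched below, assuming that all three inputs have already been decoded into bit patterns in which bit i set to 1 means "issuing after i cycles is acceptable". That encoding, the 8-bit width, and the function name are assumptions for illustration; the patent's FIG. 5 decode table is not reproduced here.

```python
from typing import Optional


def arbitrate(dep_bits: int, path_bits: int, prev_bits: int, width: int = 8) -> Optional[int]:
    """Fixed-priority, lowest-number-first arbitration over decoded patterns.

    A slot is usable only if all three patterns allow it. The lowest usable
    slot is the shortest execution waiting time. None means that no '1'
    remains, in which case scheduling would be cancelled for this cycle.
    """
    candidates = dep_bits & path_bits & prev_bits & ((1 << width) - 1)
    for slot in range(width):  # slot 0 has the highest priority
        if (candidates >> slot) & 1:
            return slot
    return None


print(arbitrate(0b11111100, 0b11110000, 0b11111110))  # 4
print(arbitrate(0b00000011, 0b11110000, 0b11111111))  # None
```

Because the inputs are plain bit patterns, the same selection also works when a pattern is not contiguous, for example when a path is free in some slots and busy in others.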

The scheduling-confirmed instruction influence reflection unit 231 calculates the influence that an instruction whose scheduling has been confirmed has on the instruction selected immediately afterward, determines the earliest timing at which that influence disappears, decodes it, and sends the result (the preceding-instruction influence reflection value) to the instruction execution scheduling unit 220.

When the resources used by an instruction whose scheduling has been confirmed match the resources used by an instruction in the instruction issue standby buffer 210, so that the latter is affected, the counter set value generation unit 232 calculates the time until the earliest timing at which that influence disappears and sets the value corresponding to that time in the counter of the instruction issue standby buffer 210 (the inter-instruction dependency counter 213). For example, when both a W→R relationship and an R→W relationship exist, the counter set value generation unit 232 calculates the earliest timing at which both influences have disappeared. Likewise for the path conflict counter 214: when multiple paths are affected, the unit calculates the time until the earliest timing at which the influence has disappeared on all paths and sets it in the counter of the instruction issue standby buffer 210 (the path conflict counter 214).

The scheduled instruction buffer 240 holds an instruction whose scheduling has been confirmed by the instruction execution scheduling unit 220 until it no longer affects subsequent instructions. The scheduled instruction storage register 241 stores the instruction whose scheduling has been confirmed. The counter (entry period counter) 242 corresponds to the scheduled instruction storage register 241 and is set to a value corresponding to the time until the instruction no longer affects subsequent instructions. The counter 242 is decremented every cycle and is reset when it reaches a value at which subsequent instructions are not affected ("1" in this example).
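The lifetime of one scheduled instruction buffer entry might be modeled as in the sketch below. The exact cycle in which the reset takes effect is not stated in the text, so the choice here (release in the same tick in which the counter reaches 1) is an assumption, as are the class and method names.

```python
from dataclasses import dataclass
from typing import Optional

RELEASE_VALUE = 1  # example value at which later instructions are unaffected


@dataclass
class ScheduledEntry:
    """Hypothetical model of a scheduled instruction register and its counter."""
    instruction: Optional[str] = None
    counter: int = 0

    def hold(self, instruction: str, cycles: int) -> None:
        self.instruction = instruction
        self.counter = cycles

    def tick(self) -> None:
        """Decrement once per cycle; free the entry when the release value is reached."""
        if self.instruction is None:
            return
        self.counter -= 1
        if self.counter <= RELEASE_VALUE:
            self.instruction = None
            self.counter = 0


entry = ScheduledEntry()
entry.hold("vadd V1 <- V0, V2", 3)

occupied = []
for _ in range(3):
    occupied.append(entry.instruction is not None)
    entry.tick()
print(occupied)  # [True, True, False]
```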

The execution standby buffer 250 stores an instruction whose scheduling has been confirmed by the instruction execution scheduling unit 220 and issues a command to execute the instruction after the designated time has elapsed. The execution standby instruction storage register 251 stores the instruction waiting for execution. The execution standby instruction counter 252 stores the timing at which the instruction is to be executed; when the counter reaches a predetermined value, for example "0", an execution command for the corresponding instruction is output to the arithmetic pipe control unit 160. If the counter values of multiple instructions reach "0" at the same time, those instructions start executing at the same time.
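A small cycle-by-cycle sketch of this behavior follows. The list-of-tuples representation and the `step` function are hypothetical; the sketch only shows that entries whose counters are 0 in the same cycle are issued together.

```python
from typing import List, Tuple

Waiting = List[Tuple[str, int]]  # (instruction name, remaining wait in cycles)


def step(waiting: Waiting) -> Tuple[List[str], Waiting]:
    """Advance one cycle: issue every entry whose counter is 0 and
    decrement the counters of the entries that keep waiting."""
    issued = [name for name, count in waiting if count == 0]
    remaining = [(name, count - 1) for name, count in waiting if count > 0]
    return issued, remaining


waiting: Waiting = [("vmul", 0), ("vadd", 2), ("vand", 0)]
timeline = []
while waiting:
    issued, waiting = step(waiting)
    timeline.append(issued)
print(timeline)  # [['vmul', 'vand'], [], ['vadd']]
```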

Here, the store data selector 105, the crossbar 106, the operand selectors 110, 111, 112, and 113, the product-sum arithmetic unit 120, the logical arithmetic unit 130, the arithmetic pipe control unit 160, the inter-instruction dependency check unit 202, the load buffer Fill determination unit 203, the conflict arbitration unit 221, the scheduling-confirmed instruction influence reflection unit 231, and the counter set value generation unit 232 are implemented, for example, as hardware circuits such as logic circuits.

The operation registers 100, 101, 102, and 103, the load buffers 150 and 151, the instruction buffer 201, the instruction issue standby buffer 210, the instruction storage buffer 211, the inter-instruction dependency flag 212, the inter-instruction dependency counter 213, the path contention counter 214, the Filled flag 215, the execution-confirmed instruction register 222, the execution waiting time 223, the scheduled instruction buffer 240, the scheduled instruction storage register 241, the counter 242, the execution standby buffer 250, the execution standby instruction storage register 251, and the execution standby instruction counter 252 are each implemented by, for example, a storage device such as a semiconductor memory. The information processing apparatus 1 may also be realized by a computer. In that case, the store data selector 105, the crossbar 106, the operand selectors 110, 111, 112, and 113, the multiply-add arithmetic unit 120, the logical arithmetic unit 130, the operation pipe control unit 160, the inter-instruction dependency check unit 202, the load buffer Fill determination unit 203, the contention arbitration unit 221, the scheduling-confirmed instruction influence reflection unit 231, and the counter set value generation unit 232 may be realized by a processor of the information processing apparatus 1, which is a computer, executing a program held in a memory (not shown). The program may be stored in a nonvolatile memory.

FIG. 4 is a time chart showing the operation of the information processing apparatus 1.

FIGS. 6 to 10 show the details of the operation of FIG. 4 clock by clock.

Here, it is assumed that the following instruction sequence is executed. All of the instructions have a vector length of 4 (VL = 4). Instructions 1) to 5) are as follows.
1) ADD V0 ← V0 + V3
2) AND V3 ← V0 & V1
3) LD V0 ← M
4) LD V1 ← M
5) AND V2 ← V0 & V1
Here,
ADD, +: addition
AND, &: logical AND
LD, M: load
V0 to V3: the operation registers 100, 101, 102, and 103 of the pipeline processing execution unit 10.
The arrow "←" is written as "<-" within the instruction sequence frame in FIG. 4.

In the instruction sequence, the right-hand and left-hand sides of the arrow are paired and together describe the operation of each instruction. For example, in instruction 1), "+" is paired with "ADD"; likewise, in instructions 3) and 4), "M" (meaning a memory access or the like) is paired with "LD".

FIG. 5 shows the decode table (counter value decode correspondence table) used to decode counter values.

The decode table gives the decode value corresponding to each counter value listed on the left. Each position at which the decode value contains "1" is a timing at which issue can be scheduled. In this example, a counter value of "8" or greater means, for instance, that scheduling is not possible. The decode table of FIG. 5 is used throughout the following description.
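The decode table can be illustrated with a minimal Python sketch. FIG. 5 itself is not reproduced in the text, so the mapping below is inferred from the worked examples in this description (counter values 0, 1, 2, and 5 decode to "11111111", "01111111", "00111111", and "00000111"). The function name `decode_counter`, the 8-bit width, and the all-zero encoding for "scheduling not possible" are assumptions made for illustration.

```python
WIDTH = 8  # assumed decode width; counter values 0..7 are schedulable in this example


def decode_counter(value: int, width: int = WIDTH) -> str:
    """Decode a counter value into a bit string of `value` zeros followed by ones.

    A "1" at position k (counted from the left, starting at 0) marks a timing,
    k cycles from now, at which issue can be scheduled.
    """
    if value < 0:
        raise ValueError("counter value must be non-negative")
    if value >= width:
        # Assumption: "scheduling not possible" is represented as all zeros.
        return "0" * width
    return "0" * value + "1" * (width - value)


if __name__ == "__main__":
    for v in (0, 1, 2, 5, 8):
        print(v, decode_counter(v))
```

Because every decoded value is a run of zeros followed by a run of ones, several constraints can later be combined with a plain bitwise AND.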

The operation of the information processing apparatus 1 is described below in clock-count order, following the passage of time, with reference to the time chart of FIG. 4 and FIGS. 6 to 10.

Clock 1:
Instruction 1) is stored in the instruction buffer 201. At this time no instruction is stored in the instruction issue standby buffer 210 (instruction storage buffer 211), so the output of the inter-instruction dependency check unit 202 (hereinafter simply "the output") is "000000", which indicates that there is no dependency on any preceding instruction.

The output indicates, as 1 or 0, whether the instruction stored in the instruction buffer 201 has a dependency on each instruction in the instruction storage buffer 211. For example, from the left end, the bits indicate in order the dependencies on 211#0, 211#1, 211#2, and so on.

Clock 2:
Instruction 1) is stored in the instruction storage buffer 211#0, and instruction 2) enters the instruction buffer 201. Because there is a W→R relationship between instruction 1) and instruction 2), the output of the inter-instruction dependency check unit 202 is "100000".

When an instruction is stored in the instruction storage buffer 211, the Filled flag 215 is set to "1" for any instruction other than a load instruction, on the grounds that its conditions are already met.

Instruction 1), stored in the instruction storage buffer 211#0, is selected as the highest-priority schedulable instruction. The values of the corresponding inter-instruction dependency counter 213 and path contention counter 214, both "0", are read out and decoded. Each decodes to "11111111" according to the decode table of FIG. 5.

Because no instruction has had its schedule confirmed immediately before, the instruction execution scheduling unit 220 receives "11111111" (the immediately preceding instruction influence reflection value), meaning no influence, from the scheduling-confirmed instruction influence reflection unit 231. It ANDs all of these values, supplies the resulting "11111111" to the leading-0 circuit, and obtains the waiting time "10000000". In "10000000" the leftmost bit is "1", which indicates that the instruction can be issued after 0 cycles.
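The AND-then-leading-0 step described above can be sketched in Python as follows. This is only an illustration of the bit-string behavior shown in the examples, not the circuit itself; the function names `and_bits`, `leading0`, and `wait_cycles` are hypothetical.

```python
def and_bits(*bit_strings: str) -> str:
    """Bitwise AND of equal-length bit strings such as "00000111"."""
    width = len(bit_strings[0])
    if any(len(b) != width for b in bit_strings):
        raise ValueError("all bit strings must have the same length")
    return "".join(
        "1" if all(b[i] == "1" for b in bit_strings) else "0"
        for i in range(width)
    )


def leading0(bits: str) -> str:
    """Keep only the leftmost "1" of the input (one-hot output)."""
    pos = bits.find("1")
    if pos < 0:
        return "0" * len(bits)  # no schedulable timing in the window
    return "0" * pos + "1" + "0" * (len(bits) - pos - 1)


def wait_cycles(one_hot: str):
    """Number of cycles to wait, i.e. the position of the single "1"."""
    pos = one_hot.find("1")
    return None if pos < 0 else pos


if __name__ == "__main__":
    combined = and_bits("11111111", "11111111", "11111111")
    print(leading0(combined), wait_cycles(leading0(combined)))
```

With the three all-ones inputs of Clock 2, the sketch yields "10000000", matching the 0-cycle waiting time stated in the text.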

Clock 3:
Instruction 1) is confirmed for execution. Because its waiting time was "0", no decrement is applied, and it is stored in the execution-confirmed instruction register 222 with a waiting time of "0". Instruction 2) is stored in the instruction storage buffer 211#1, and instruction 3) enters the instruction buffer 201. Instruction 3) has both a W→W and an R→W relationship with instruction 1) and an R→W relationship with instruction 2), so the output is "110000", indicating a dependency on both.

Instruction 2) is not a load instruction either, so the Filled flag 215 is set to "1".

Because execution of instruction 1) has been confirmed, the instruction execution scheduling unit 220 selects instruction 2), the highest-priority instruction in the instruction storage buffer 211 other than instruction 1), and schedules it.

The inter-instruction dependency counter 213 and the path contention counter 214 both hold "0", so each decodes to "11111111" according to FIG. 5.

Next, because execution of instruction 1) was confirmed immediately before, the scheduling-confirmed instruction influence reflection unit 231 calculates the influence of instruction 1) on instruction 2), which was selected immediately after it. Instructions 1) and 2) use different arithmetic units, so only two relationships matter: the W→R relationship on register V0 and the R→W relationship on register V3. For a W→R relationship, the value is calculated as (waiting time + TAT of the preceding instruction), giving 0 + 5 = 5. For an R→W relationship, it is calculated as (waiting time − TAT of the following instruction + 1), giving 0 − 3 + 1 = −2; values of 0 or less are treated as 0. Of the resulting "5" and "0", "5" is selected as the earliest execution time that preserves data consistency (that is, as long as one constraint is "5", the instruction must wait for the count of "5" to complete even if the other is "0"). The scheduling-confirmed instruction influence reflection unit 231 decodes this "5" according to FIG. 5 to obtain "00000111". The instruction execution scheduling unit 220 (contention arbitration unit 221) then ANDs the three values "11111111", "11111111", and "00000111" and inputs the resulting "00000111" to the leading-0 circuit to obtain "00000100". This means that the execution waiting time is 5, that is, the instruction can be issued after 5 cycles.
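The two waiting-time formulas used at Clock 3 can be written out as a short Python sketch. The function names are hypothetical; the formulas, the clamp to 0, and the TAT values (5 for ADD, 3 for the following instruction) are taken from the worked example in the text.

```python
def wait_for_write_then_read(prev_wait: int, prev_tat: int) -> int:
    """W->R: the follower reads what the predecessor writes.

    waiting time = predecessor's waiting time + predecessor's TAT
    """
    return prev_wait + prev_tat


def wait_for_read_then_write(prev_wait: int, next_tat: int) -> int:
    """R->W: the follower overwrites what the predecessor reads.

    waiting time = predecessor's waiting time - follower's TAT + 1,
    with results of 0 or less treated as 0.
    """
    return max(0, prev_wait - next_tat + 1)


def required_wait(*constraints: int) -> int:
    """The follower must satisfy every constraint, so the largest one wins."""
    return max(constraints)


if __name__ == "__main__":
    # Instruction 1) ADD (wait 0, TAT 5) followed by instruction 2) AND (TAT 3)
    w_r = wait_for_write_then_read(0, 5)   # V0: W->R
    r_w = wait_for_read_then_write(0, 3)   # V3: R->W
    print(w_r, r_w, required_wait(w_r, r_w))
```

Taking the maximum corresponds to the statement that the "5" must be honored even though the other constraint is "0".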

Clock 4:
Instruction 2), whose execution waiting time was calculated at Clock 3, is stored in the execution-confirmed instruction register 222 together with "00001000", which represents an execution waiting time of 5 − 1 = 4.

Instruction 1), which was held in the execution-confirmed instruction register 222 at Clock 3, moves to the execution standby buffer 250 (strictly, the execution standby instruction storage register 251; the buffer name is used hereinafter). Its execution waiting time is encoded, and the resulting "0" is set in the execution standby instruction counter 252. Because the value of the execution standby instruction counter 252 is "0", execution of the corresponding instruction 1) starts at the next clock, Clock 5.

Instruction 1) is also stored in the scheduled instruction buffer 240 (strictly, the scheduled instruction storage register 241; the buffer name is used hereinafter). At the same time, the corresponding counter 242 is set to 0 + 5 + 4 = 9, calculated as (execution waiting time + operation TAT (of ADD) + VL).
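The set value of the entry period counter 242 follows directly from the formula above. A minimal Python sketch, with the hypothetical function name `entry_period_set_value`, reproduces the value for instruction 1) here and the value given for instruction 2) at Clock 5 (4 + 3 + 4 = 11).

```python
def entry_period_set_value(wait: int, tat: int, vector_length: int) -> int:
    """Value set in counter 242: execution waiting time + operation TAT + VL."""
    return wait + tat + vector_length


if __name__ == "__main__":
    print(entry_period_set_value(0, 5, 4))  # instruction 1): ADD, VL = 4
    print(entry_period_set_value(4, 3, 4))  # instruction 2): AND, VL = 4
```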

Instruction 3) is stored in the instruction storage buffer 211. Because it has an R→W dependency on instruction 2), the corresponding bit would normally be set to "1"; however, since instruction 2) has been stored in the execution-confirmed instruction register 222, this is cancelled and "0" is set in the inter-instruction dependency flag 212.

Now that execution of instruction 1) has been confirmed, the inter-instruction dependency counter 213 corresponding to instruction 2) is set to "4", which is the "5" calculated as 0 + 5 = 5 adjusted for the per-cycle decrement.

Instruction 3) is stored in the instruction issue standby buffer 210 (instruction storage buffer 211). Instruction 3) is a load instruction, and the Filled flag 215 would normally not be set to "1" until all load data elements from memory are available. To simplify this description, however, all load data elements are assumed to be present in the load buffer already, so the Filled flag 215 is set to "1".

Instruction 3) therefore has no outstanding dependency and is selected as schedulable.

Instruction 3) has an R→W dependency on register V0 with instruction 2), which was stored in the execution-confirmed instruction register 222 immediately before. Applying the R→W formula (waiting time − TAT of the following instruction + 1) gives 4 − 3 + 1 = 2, which decodes to "00111111" (the immediately preceding instruction influence reflection value). The contention arbitration unit 221 ANDs this with the values decoded from the counters (inter-instruction dependency counter 213: "11111111"; path contention counter 214: "11111111") and inputs the result to the leading-0 circuit, obtaining "00100000".

Only the characteristic operations are described from here on.

Clock 5:
Instruction 1) is executed and removed from the execution standby buffer 250; at the same time, instruction 2) is set there with an execution waiting time of "4".

At the same time, instruction 2) is set in the scheduled instruction buffer 240 together with the value (execution waiting time + operation TAT (of AND) + VL) = 4 + 3 + 4 = 11.

Instruction 3) is stored in the execution-confirmed instruction register 222 together with the calculated "1".

When instruction 4) is stored in the instruction issue standby buffer 210, "1", calculated from its dependency on instruction 2), whose execution has been confirmed, is set in the inter-instruction dependency counter 213.

Instruction 3), the instruction held in the execution-confirmed instruction register 222 immediately before, has no dependency or other influence on instruction 4); for example, its transfer path from the load buffer is different (the decoded value of the path contention counter 214 is "11111111"). The immediately preceding instruction influence reflection value is therefore "11111111". ANDing this with "01111111", which represents the value "1" read from the inter-instruction dependency counter 213, and inputting the result to the leading-0 circuit yields "01000000". Instruction 3) is stored in the execution-confirmed instruction register 222 together with the decremented value "1".

Clock 6 (stored-data diagram omitted):
As shown in FIG. 4, instruction 3) is stored in the execution standby buffer 250 together with the value "1". Instruction 4) is stored in the execution-confirmed instruction register 222 together with the decremented value "0".

Clock 7 (stored-data diagram omitted):
The value of the execution standby instruction counter 252 corresponding to instruction 3) is decremented to "0". Instruction 4) is stored in the execution standby buffer 250 with a counter value of "0" in the execution standby instruction counter 252. Because the counter values for instructions 3) and 4) are both "0", the two instructions can be executed simultaneously in the next cycle.

The remaining steps are omitted, but as shown above, the instruction issue control scheme of this embodiment can perform scheduling that issues a plurality of instructions simultaneously.

Next, the operation when the same instruction sequence as in FIG. 4 is executed on a configuration with only four scheduled instruction buffers 240 is described with reference to FIG. 11.

FIG. 11 is a time chart showing the operation of the information processing apparatus 1 in a configuration with only four scheduled instruction buffers 240.

The description below focuses on the characteristic points that differ from the operation of FIG. 4.

At Clock 6, instruction 5) is read from the instruction issue standby buffer 210. At this time three of the four scheduled instruction buffers 240 are in use, and because instruction 4) is held in the execution-confirmed instruction register 222, it is already certain that all four will be used. Instruction 5) is therefore not stored in the execution-confirmed instruction register 222 at the next clock, Clock 7.

From Clock 8 through Clock 10, all four scheduled instruction buffers 240 remain in use.

At Clock 11, the counter 242 corresponding to instruction 1) in the scheduled instruction buffer 240 shows "2" (detailed diagram omitted). Because a counter is known to be reset in the cycle after its value is "1", instruction 5) is stored in the execution-confirmed instruction register 222 in the next cycle.

At Clock 12, instruction 5) is stored in the execution-confirmed instruction register 222.

At Clock 13, instruction 5) is stored in the execution standby buffer 250 together with the value "0", and its execution starts at the next clock, Clock 14.

In this way, the information processing apparatus 1 can issue instructions while maintaining register consistency even when the number of buffers is limited.
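The timing in this example can be cross-checked with a small Python sketch of the entry period counter 242 for instruction 1). It assumes, based on the Clock 4 description, that the counter is set to 9 at Clock 4, is decremented once per cycle, and is reset in the cycle after it shows "1". The function name `counter_trace` and the use of `None` to represent a reset (free) entry are illustrative assumptions.

```python
RESET_AT = 1  # value after which the entry is reset ("1" in this example)


def counter_trace(set_value: int, set_clock: int, last_clock: int) -> dict:
    """Return {clock: counter value}; None means the entry has been reset."""
    trace = {}
    value = set_value
    for clock in range(set_clock, last_clock + 1):
        trace[clock] = value
        if value is None:
            continue  # entry stays free once it has been reset
        value = None if value == RESET_AT else value - 1
    return trace


if __name__ == "__main__":
    for clock, value in counter_trace(9, 4, 13).items():
        print(f"Clock {clock}: {value}")
```

Under these assumptions the counter shows "2" at Clock 11 and "1" at Clock 12, and the entry is free at Clock 13, which is consistent with instruction 5) entering the execution-confirmed instruction register 222 at Clock 12.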

Next, the operation of scheduling an instruction sequence different from the one above is described with reference to FIG. 12.

FIG. 12 is a time chart showing the operation of the information processing apparatus 1 when a different instruction sequence is used.

Here, it is assumed that the following instruction sequence is executed. All of the instructions have a vector length of 4 (VL = 4). Instructions 1) to 4) are as follows.
1) LD V1 ← M
2) MPY V2 ← V0 * V1
3) LD V0 ← M
4) AND V1 ← V0 & V3
Here,
MPY, *: multiplication
The arrow "←" is written as "<-" within the instruction sequence frame in FIG. 12. Symbols that also appear in the earlier instruction sequence are not explained again.

What is characteristic of this instruction sequence is that the register read by instruction 2) is reused by instruction 3) as the destination of load data and is then used by the AND of instruction 4). As in this example, even when there is an R→W dependency on register V0 between instructions 2) and 3), instruction 3) can be scheduled to start execution before instruction 2) so that the arithmetic units are used efficiently.

In FIG. 12, this behavior appears as instruction 2) starting execution at Clock 8 while instruction 3) starts one cycle earlier, at Clock 7.

As described above, this embodiment provides the counters 213 and 214 for each instruction waiting to be issued. Consider, for example, a configuration with 32 operation registers and 16 issue standby instruction buffers. A conventional design must manage busy state for at least register consistency, the read path, and the write path, and therefore needs at least 32 × 3 = 96 busy management counters. This embodiment needs only 16 × 2 = 32 counters, so the hardware for 64 counters is saved.
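The counter-count comparison is simple arithmetic; the following Python sketch merely parameterizes it so that other configurations can be tried. The function names and default arguments are illustrative assumptions that mirror the numbers in the text (three busy kinds per register, two counters per waiting instruction).

```python
def per_register_counters(num_registers: int, kinds_per_register: int = 3) -> int:
    """Busy counters when managed per register (consistency, read path, write path)."""
    return num_registers * kinds_per_register


def per_instruction_counters(num_waiting_entries: int, counters_per_entry: int = 2) -> int:
    """Counters when managed per waiting instruction (counters 213 and 214)."""
    return num_waiting_entries * counters_per_entry


if __name__ == "__main__":
    conventional = per_register_counters(32)
    proposed = per_instruction_counters(16)
    print(conventional, proposed, conventional - proposed)
```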

In addition, counter values are decoded according to the decode table, and the ANDed bit string is input to the leading-0 circuit to calculate the shortest waiting time that does not violate data consistency. This makes highly efficient issue control possible at high clock rates. A characteristic feature of this embodiment is the decode table, which produces bit strings to which the AND operation and the leading-0 circuit can be applied; this is realized mainly through cooperation among the instruction execution scheduling unit 220, the contention arbitration unit 221, the scheduling-confirmed instruction influence reflection unit 231, and the counter set value generation unit 232.

The information processing apparatus 1 according to this embodiment provides the following effect.

The effect is that instructions can be executed efficiently with a relatively small amount of hardware even when there are many arithmetic resources.

The reason is that a counter managing the issue suppression period relative to preceding instructions is provided for each instruction waiting to be issued, and that a mechanism is provided that uses an inter-instruction dependency counter, which manages dependencies between instructions over time, a path contention counter, which manages the busy state of the arithmetic unit pipeline, and a waiting time that accounts for the influence of the instruction whose execution was confirmed immediately before, and compares these three values to calculate the earliest timing at which execution is possible.
<Second Embodiment>
Next, a second best mode for carrying out the present invention is described in detail with reference to the drawings.

FIG. 13 is a block diagram showing an example of the configuration of the information processing apparatus 30 according to the second embodiment.

The information processing apparatus 30 has the same configuration as the information processing apparatus 1 of the first embodiment but differs in the arrangement of the data stored in the load buffers 150 and 151.

When load data is stored in the load buffers 150 and 151, the data arrangement may be changed as shown in FIG. 13 to improve performance. With the load data arranged as in FIG. 13, data is read alternately from the load buffer 150 and the load buffer 151.

In this case, with the load of L0-0 to L0-3 already scheduled at some timing, the bit string used to schedule the instruction for L1-0 to L1-3 is decoded as, for example, "00010101". This allows the operation registers 100 to 103 (crossbar 106) to handle alternating reads as well (each "1" marks a timing that is free).

The above improves performance when load data is stored in the plurality of load buffers 150 and 151, but performance can be improved in the same way when a plurality of arithmetic units is used.

For example, in a configuration with two arithmetic units of the same function, where whichever is free is chosen, this can be handled by taking the logical OR of the bit strings of the two arithmetic units.

For example, the logical OR of "00111111" and "00001111" is "00111111". This shows that the arithmetic unit whose resource becomes free earlier can be scheduled.
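The OR combination for two functionally identical arithmetic units can be sketched in Python as below. The function names `or_bits` and `first_free_cycle` are hypothetical; the bit strings are the ones used in the example above.

```python
def or_bits(a: str, b: str) -> str:
    """Bitwise OR of two equal-length availability bit strings."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return "".join("1" if x == "1" or y == "1" else "0" for x, y in zip(a, b))


def first_free_cycle(bits: str):
    """Earliest cycle offset at which at least one unit is free, or None."""
    pos = bits.find("1")
    return None if pos < 0 else pos


if __name__ == "__main__":
    unit_a = "00111111"  # free from 2 cycles ahead
    unit_b = "00001111"  # free from 4 cycles ahead
    either = or_bits(unit_a, unit_b)
    print(either, first_free_cycle(either))
```

The OR result keeps the earlier of the two free timings, so the combined string can be used in the same AND and leading-0 flow as a single unit's bit string.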

The information processing apparatus 30 according to this embodiment provides the following effect.

The effect is that the performance of pipeline processing can be improved.

The reason is that data is read alternately from the two load buffers.
<Third Embodiment>
FIG. 14 is a block diagram showing an example of the configuration of the information processing apparatus 40 according to the third embodiment.

The information processing apparatus 40 includes a pipeline control unit 50 and a pipeline processing execution unit 60.

The pipeline control unit 50 controls the pipeline processing execution unit 60.

The pipeline processing execution unit 60 executes pipeline processing of an arithmetic unit pipeline.

The pipeline control unit 50 includes a counter set value generation unit 51, a scheduling-confirmed instruction influence reflection unit 52, and a contention arbitration unit 53. The counter set value generation unit 51 calculates, for an instruction in the instruction issue standby buffer, a first time until the timing at which the influence of a scheduled instruction that uses the same resource disappears, and stores a counter value corresponding to the first time in the inter-instruction dependency counter; it also calculates a second time until the timing at which path contention in the arithmetic unit pipeline no longer occurs, and stores a counter value corresponding to the second time in the path contention counter. The scheduling-confirmed instruction influence reflection unit 52 calculates an immediately preceding instruction influence reflection value, which indicates the influence that an instruction whose execution has been confirmed has on the instruction selected immediately after it. The contention arbitration unit 53 compares the counter value of the inter-instruction dependency counter, the counter value of the path contention counter, and the immediately preceding instruction influence reflection value to calculate the shortest execution waiting time of the instruction.

The information processing apparatus 40 according to this embodiment provides the following effect.

The reason is that a counter managing the issue suppression period relative to preceding instructions is provided for each instruction waiting to be issued, and that a mechanism is provided that uses an inter-instruction dependency counter, which manages dependencies between instructions over time, a path contention counter, which manages the busy state of the arithmetic unit pipeline, and a waiting time that accounts for the influence of the instruction whose execution was confirmed immediately before, and compares these three values to calculate the earliest timing at which execution is possible.

Embodiments of the present invention have been described above with reference to the drawings, but the present invention is not limited to these embodiments. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

DESCRIPTION OF SYMBOLS
1 Information processing apparatus
10 Pipeline processing execution unit
100, 101, 102, 103 Operation register
105 Store data selector
106 Crossbar
110, 111, 112, 113 Operand selector
120 Multiply-add arithmetic unit
130 Logical arithmetic unit
150, 151 Load buffer
160 Operation pipe control unit
20 Pipeline control unit
201 Instruction buffer
202 Inter-instruction dependency check unit
203 Load buffer Fill determination unit
210 Instruction issue standby buffer
211 Instruction storage buffer
212 Inter-instruction dependency flag
213 Inter-instruction dependency counter
214 Path contention counter
215 Filled flag
220 Instruction execution scheduling unit
221 Contention arbitration unit
222 Execution-confirmed instruction register
223 Execution waiting time
231 Scheduling-confirmed instruction influence reflection unit
232 Counter set value generation unit
240 Scheduled instruction buffer
241 Scheduled instruction storage register
242 Counter
250 Execution standby buffer
251 Execution standby instruction storage register
252 Execution standby instruction counter
30 Information processing apparatus
40 Information processing apparatus
50 Pipeline control unit
51 Counter set value generation unit
52 Scheduling-confirmed instruction influence reflection unit
53 Contention arbitration unit
60 Pipeline processing execution unit

Claims

An information processing apparatus in which pipeline control means controls pipeline processing execution means for executing pipeline processing of an arithmetic unit pipeline,
wherein the pipeline control means comprises:
counter set value generating means for calculating a first time until the timing at which the influence of a scheduled instruction using the same resource on an instruction held in an instruction issue waiting buffer is eliminated, storing a counter value corresponding to the first time in an inter-instruction dependency counter, calculating a second time until a timing at which no path contention occurs in the arithmetic unit pipeline, and storing a counter value corresponding to the second time in a path contention counter;
scheduling fixed instruction influence reflecting means for calculating an immediately preceding instruction influence reflection value indicating the influence that an instruction determined to be executed has on the instruction selected immediately afterward; and
contention arbitration means for comparing the counter value of the inter-instruction dependency counter, the counter value of the path contention counter, and the immediately preceding instruction influence reflection value to calculate the shortest instruction execution waiting time.

The information processing apparatus according to claim 1, wherein the contention arbitration means calculates the shortest instruction execution waiting time using values obtained by decoding the counter value of the inter-instruction dependency counter, the counter value of the path contention counter, and the immediately preceding instruction influence reflection value.

The information processing apparatus according to claim 1 or 2, wherein the contention arbitration means calculates the shortest instruction execution waiting time by calculating a logical product of the counter value of the inter-instruction dependency counter, the counter value of the path contention counter, and the immediately preceding instruction influence reflection value.

The information processing apparatus according to claim 2, wherein the decoded value is leading0.

An information processing method comprising:
calculating a first time until the timing at which the influence of a scheduled instruction using the same resource on an instruction held in an instruction issue waiting buffer is eliminated, storing a counter value corresponding to the first time in an inter-instruction dependency counter, calculating a second time until a timing at which no path contention occurs in an arithmetic unit pipeline, and storing a counter value corresponding to the second time in a path contention counter;
calculating an immediately preceding instruction influence reflection value indicating the influence that an instruction determined to be executed has on the instruction selected immediately afterward; and
comparing the counter value of the inter-instruction dependency counter, the counter value of the path contention counter, and the immediately preceding instruction influence reflection value to calculate the shortest instruction execution waiting time.

The information processing method according to claim 5, wherein the shortest instruction execution waiting time is calculated using values obtained by decoding the counter value of the inter-instruction dependency counter, the counter value of the path contention counter, and the immediately preceding instruction influence reflection value.

The information processing method according to claim 5 or 6, wherein the shortest instruction execution waiting time is calculated by calculating a logical product of the counter value of the inter-instruction dependency counter, the counter value of the path contention counter, and the immediately preceding instruction influence reflection value.

A program causing a computer to execute:
a process of calculating a first time until the timing at which the influence of a scheduled instruction using the same resource on an instruction held in an instruction issue waiting buffer is eliminated, storing a counter value corresponding to the first time in an inter-instruction dependency counter, calculating a second time until a timing at which no path contention occurs in an arithmetic unit pipeline, and storing a counter value corresponding to the second time in a path contention counter;
a process of calculating an immediately preceding instruction influence reflection value indicating the influence that an instruction determined to be executed has on the instruction selected immediately afterward; and
a process of comparing the counter value of the inter-instruction dependency counter, the counter value of the path contention counter, and the immediately preceding instruction influence reflection value to calculate the shortest instruction execution waiting time.

The program according to claim 8, causing the computer to execute a process of calculating the shortest instruction execution waiting time using values obtained by decoding the counter value of the inter-instruction dependency counter, the counter value of the path contention counter, and the immediately preceding instruction influence reflection value.

The program according to claim 8 or 9, causing the computer to execute a process of calculating the shortest instruction execution waiting time by calculating a logical product of the counter value of the inter-instruction dependency counter, the counter value of the path contention counter, and the immediately preceding instruction influence reflection value.
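
For illustration only, the decoding and logical product recited in the claims might be sketched in Python as follows. This is one possible reading, not the claimed implementation. It assumes that each counter value is decoded into a bit mask with that many leading zeros (bit for cycle 0 is the most significant bit, and a 1 means issue is allowed in that cycle), that the immediately preceding instruction influence reflection value is already available as such a mask, and that the shortest waiting time is the number of leading zeros of the combined mask. The names WIDTH, decode_leading0 and shortest_wait are hypothetical and do not appear in the specification.

```python
from typing import Optional

WIDTH = 8  # hypothetical scheduling window, in cycles (an assumption)


def decode_leading0(wait: int, width: int = WIDTH) -> int:
    """Decode a wait count into a mask with `wait` leading zero bits.

    Bit for cycle 0 is the most significant bit of the `width`-bit mask;
    a 1 bit means the instruction may issue in that cycle.
    """
    wait = min(max(wait, 0), width)  # clamp to the window
    return (1 << (width - wait)) - 1


def shortest_wait(dep_counter: int, path_counter: int, prev_mask: int,
                  width: int = WIDTH) -> Optional[int]:
    """Return the earliest cycle allowed by all three constraints.

    The logical product (bitwise AND) keeps only cycles in which no
    dependency, path contention, or preceding-instruction influence
    remains. Returns None if no cycle in the window is free.
    """
    combined = (decode_leading0(dep_counter, width)
                & decode_leading0(path_counter, width)
                & prev_mask & ((1 << width) - 1))
    if combined == 0:
        return None
    # Leading-zero count of the combined mask within the window.
    return width - combined.bit_length()


if __name__ == "__main__":
    # Dependency clears after 3 cycles, path after 1, no preceding influence.
    print(shortest_wait(3, 1, 0b11111111))
    # Preceding instruction blocks only cycle 1; dependency blocks cycle 0.
    print(shortest_wait(1, 0, 0b10111111))
```

Under these assumptions the AND of purely leading-zero masks yields the largest of the individual waits, while a preceding-instruction mask with an isolated blocked cycle can push the result later still.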