JP5704012B2

JP5704012B2 - Processor and control method of processor

Info

Publication number: JP5704012B2
Application number: JP2011168522A
Authority: JP
Inventors: 建司西川
Original assignee: Fujitsu Semiconductor Ltd
Current assignee: Fujitsu Semiconductor Ltd
Priority date: 2011-08-01
Filing date: 2011-08-01
Publication date: 2015-04-22
Anticipated expiration: 2031-08-01
Also published as: JP2013033350A

Description

The present invention relates to a processor and a method of controlling the processor.

A vector processor performs high-speed vector operations by executing two kinds of instructions in parallel on a plurality of pipelines: load/store instructions that read data from a data memory, and arithmetic instructions that operate on the read data. Methods have been proposed for avoiding conflicts when a plurality of load/store instructions access the same memory; Patent Documents 1 to 3, for example, describe avoiding contention in memory access. In one example, access to the memory banks is controlled by bank interleaving. In bank interleaving, a storage area with consecutive addresses is divided into a plurality of banks, and accesses to consecutive addresses are made to the banks at staggered times.

However, even with bank interleaving, bank conflicts still occur when a plurality of load/store instructions compete for access to the same bank. One proposed way to avoid such conflicts is to delay (stall) the issue of one of the competing instructions.

Patent Document 1: JP-A-6-162064
Patent Document 2: JP-A-9-305487
Patent Document 3: JP 2008-135813 A

However, frequent stalls are a problem because they reduce processing efficiency.

Accordingly, an object of the present invention is to provide a processor that has a plurality of pipelines and can avoid bank conflicts without reducing processing efficiency, together with a method of controlling such a processor.

To achieve this object, a processor in one embodiment includes three units:
- a first processing unit that accesses a plurality of banks of a memory in a first bank access order;
- a second processing unit that, after the first processing unit starts its access, starts accessing the plurality of banks in a second bank access order;
- a control unit that acts when the accesses of the first and second processing units to the plurality of banks conflict. In that case, it rearranges the second bank access order into a third bank access order in which no conflict occurs, and has the second processing unit access the plurality of banks in the third order.

According to the embodiment described below, a processor having a plurality of pipelines can avoid bank conflicts without reducing processing efficiency.

FIG. 1 is a diagram illustrating the configuration of the processor in this embodiment.
FIG. 2 is a diagram schematically showing the configuration of the data memory.
FIG. 3 is a diagram explaining the vector register.
FIG. 4 is a diagram illustrating the operation of the vector pipeline that executes load/store instructions.
FIG. 5 is a diagram explaining bank conflicts.
FIG. 6 is a configuration diagram of the rearrangement control unit 1144.
FIG. 7 is a diagram illustrating memory access orders and related orders.
FIG. 8 is a flowchart showing the procedure for rearranging the bank access order.
FIG. 9 is a diagram showing an example in which load/store instructions are executed by four slots.
FIG. 10 is a diagram explaining the rearrangement of the register access order.
FIG. 11 is a diagram illustrating the operation of the access order control unit 602.
FIG. 12 is a diagram showing examples of values written to the rearrangement management flags.
FIG. 13 is a diagram explaining issue timing detection.
FIG. 14 is a diagram showing a sequence in which an arithmetic instruction depends on two load/store instructions.
FIG. 15 is a flowchart showing the operating procedure for rearranging the register access order.

Embodiments of the present invention are described below with reference to the drawings. However, the technical scope of the present invention is not limited to these embodiments and extends to the matters recited in the claims and their equivalents.

FIG. 1 is a diagram illustrating the configuration of the processor in this embodiment, using a vector processor 100 as the example. In the vector processor 100, a vector pipeline 104 follows the instructions stored in an instruction memory 102. It reads data stored in a data memory 106 and performs various operations on that data.

In addition to the instruction memory 102, the vector pipeline 104, and the data memory 106, the vector processor 100 includes:
- a decoder 108;
- a vector register 110;
- a scalar register 112;
- a bank conflict detection unit 113;
- a control unit 114.

The vector processor 100 is, for example, a signal-processing LSI (Large Scale Integrated circuit). The data memory 106 may instead be provided outside the vector processor 100.

The instruction memory 102 stores the instructions for the vector pipeline 104. These include load/store instructions, which read data from the data memory 106 into the vector register 110, and arithmetic instructions, which operate on the data stored in the vector register 110. The instruction memory 102 is, for example, an SRAM (Static Random Access Memory).

The decoder 108 reads instructions from the instruction memory 102, decodes them, and inputs the decoded instructions to the vector pipeline 104. For a load/store instruction, it also reads from the scalar register 112 the address of the data memory 106 to be accessed, and inputs that address to the vector pipeline 104.

The data memory 106 is a large-capacity memory such as a DRAM (Dynamic Random Access Memory). Consecutive addresses are assigned to its storage area. The storage area is divided into a plurality of banks, each with its own input/output port, and is accessed by bank interleaving.

The vector pipeline 104 has slots 1040, 1041, 1042, and 1043, each of which executes load/store instructions and arithmetic instructions in a pipelined manner. The slots 1040 to 1043 contain sequencers seq0 to seq3 and arithmetic units prc0 to prc3, respectively. In each slot, the arithmetic unit executes instructions under the control of the slot's sequencer. For each instruction, the sequencers seq0 to seq3 decide which slot executes it and when. The slots 1040 to 1043 are examples of the processing units in this embodiment.

In the vector processor 100, an instruction passes through the following stages, for example: instruction fetch (IF), instruction decode (ID), execution (EX), memory access (MEM), and write-back of the result to a register (WB). The IF and ID stages are executed in the same cycle. In the ID stage, the decoded instruction is input to the sequencers seq0 to seq3 of the slots 1040 to 1043. In each of the EX to WB stages, the slots 1040 to 1043 of the vector pipeline 104 perform the processing the instruction requires. In the EX stage, the sequencers seq0 to seq3 transfer the instruction to the arithmetic units prc0 to prc3, which then execute the corresponding processing.

For a load/store instruction, for example, the instruction is read from the instruction memory 102 in the IF stage. In the ID stage, the decoder 108 decodes it and inputs it to the slots 1040 to 1043. Each slot then works through the remaining stages:
- EX: it accesses the data memory 106;
- MEM: it reads a data element from the data memory 106;
- WB: it writes that data element to the vector register 110.

For an arithmetic instruction, the instruction is read from the instruction memory 102 in the IF stage, decoded in the ID stage, and input to the slots 1040 to 1043. Each slot reads data elements from the vector register 110 and executes the operation in the EX stage, then writes the result to the vector register 110 in the WB stage.

FIG. 2 schematically shows the configuration of the data memory 106. The data memory 106 has a storage area 1060 in which consecutive addresses are assigned along the rows and columns. Each cell of the storage area 1060 holds 16 bits of data, which is one data element; the number in a cell gives the element's order.

The storage area 1060 is divided into four banks BK0 to BK3, each with its own input/output port. Four banks are used as the example below, but the number of banks may differ. Each bank is 128 bits wide, which corresponds to 8 data elements. For example, row R1 holds:
- data elements 0 to 7 in bank BK0;
- data elements 8 to 15 in bank BK1;
- data elements 16 to 23 in bank BK2;
- data elements 24 to 31 in bank BK3.

A single access to any of the banks BK0 to BK3, given one read address, reads 8 data elements.
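To make the layout concrete, the following Python sketch maps a data element number to its row, its bank, and its position within that bank's 128-bit word. It assumes exactly the FIG. 2 parameters: 16-bit elements, four banks, and 8 elements per bank. The function name `locate_element` and the 0-based row index are illustrative choices, not part of the patent.

```python
# Hypothetical model of the FIG. 2 layout: 16-bit data elements,
# four banks BK0-BK3, 128-bit (8-element) bank width.
NUM_BANKS = 4
ELEMENTS_PER_BANK = 8                              # 128 bits / 16 bits
ELEMENTS_PER_ROW = NUM_BANKS * ELEMENTS_PER_BANK   # 32 elements per row


def locate_element(index: int) -> tuple[int, int, int]:
    """Return (row, bank, lane) for data element number `index`.

    `row` is 0-based (row R1 in FIG. 2 is row 0); `lane` is the element's
    position inside the 128-bit word returned by one bank access.
    """
    row, within_row = divmod(index, ELEMENTS_PER_ROW)
    bank, lane = divmod(within_row, ELEMENTS_PER_BANK)
    return row, bank, lane


if __name__ == "__main__":
    for i in (0, 7, 8, 16, 24, 31, 32):
        row, bank, lane = locate_element(i)
        print(f"element {i:2d}: row {row}, BK{bank}, lane {lane}")
```

Element 16 falls in BK2 and element 31 in BK3, matching the interleaved row R1 described above.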

FIG. 3 is a diagram explaining the vector register 110. The vector register 110 temporarily stores data elements read from the data memory 106 so that the slots 1040 to 1043 can process them. As shown in FIG. 3(A), the storage area 1100 of the vector register 110 is 128 bits wide, so one row address holds 8 data elements. Each cell in the storage area 1100 is a 16-bit storage location, and the number in each cell is the data element number.

FIG. 3(B) shows examples of logical vector register (VR) numbers and physical vector register (VR) numbers in the vector register 110. A logical vector register number gives the position of data in the vector register 110 as specified in a load/store or arithmetic instruction. A physical vector register number gives the physical position of the data in the vector register 110. Logical vector register numbers are written vr0, vr1, vr2, ..., and the corresponding physical vector register numbers are written vr[0], vr[1], vr[2], .... The physical vector register numbers correspond, for example, to the data element numbers in FIG. 3(A), and are an example of the plurality of registers in this embodiment.

FIG. 4 is a diagram illustrating the operation of the vector pipeline 104 when it executes load/store instructions. FIG. 4(A) shows the address arrangement in the data memory 106, in which addresses increase from left to right and from top to bottom. In bank BK0, the base addresses of the rows are 0x00, 0x40, 0x80, 0xC0, .... Banks BK1 to BK3 begin at offsets of 0x10, 0x20, and 0x30 from the base address, respectively.
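The same layout can also be expressed in byte addresses. The sketch below assumes byte addressing, a 0x10-byte bank width, and a 0x40-byte row (four banks), and splits an address into its bank, row base, and offset. `decode_address` is a hypothetical helper.

```python
# Hypothetical model of the FIG. 4(A) address map: byte addresses,
# 16-byte (128-bit) bank width, four banks, so each row spans 0x40 bytes.
NUM_BANKS = 4
BANK_WIDTH_BYTES = 0x10
ROW_BYTES = NUM_BANKS * BANK_WIDTH_BYTES  # 0x40


def decode_address(addr: int) -> tuple[int, int, int]:
    """Split a byte address into (bank, row base address, offset from the base)."""
    row_base = addr - addr % ROW_BYTES
    offset = addr - row_base
    bank = offset // BANK_WIDTH_BYTES
    return bank, row_base, offset


if __name__ == "__main__":
    for a in (0x00, 0x10, 0x20, 0x30, 0x40, 0x110, 0x140):
        bank, base, off = decode_address(a)
        print(f"{a:#05x}: BK{bank}, base {base:#05x}, offset {off:#04x}")
```

For example, 0x110 decodes to bank BK1 in the row whose base address is 0x100.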

FIG. 4(B) shows the processing sequence of load/store instructions, with processing cycles C1, C2, C3, ... on the horizontal axis, for an example in which two slots execute load/store instructions.
- Load/store instruction LS1 is executed by slot 1040. This instruction, "vldh sr2, vr0", reads data sequentially into logical vector register number vr0, starting from logical address "sr2" of the data memory 106 (for example, "0x00" of bank BK0).
- Load/store instruction LS2 is executed by slot 1041. This instruction, "vldh sr3, vr1", reads data into logical vector register number vr1, starting from logical address "sr3" of the data memory 106 (for example, "0x100" of bank BK0).

FIG. 4(B) shows the bank and address accessed in each processing cycle. Each processing cycle corresponds to the EX stage of the pipeline shown in FIG. 1. Here, four processing cycles correspond to one instruction cycle (the same applies hereinafter).

Load/store instruction LS1 is executed starting from processing cycle C1, for example. Slot 1040 accesses:
- address "0x00" of bank BK0 in cycle C1;
- address "0x10" of bank BK1 in cycle C2;
- address "0x20" of bank BK2 in cycle C3;
- address "0x30" of bank BK3 in cycle C4.

It then continues with address "0x40" of bank BK0 in cycle C5, "0x50" of bank BK1 in cycle C6, "0x60" of bank BK2 in cycle C7, and "0x70" of bank BK3 in cycle C8. The addresses of the data memory 106 accessed by slot 1040 are marked with the hatching "/" in FIG. 4(A).

Load/store instruction LS2, on the other hand, is executed starting from processing cycle C2. Slot 1041 accesses:
- address "0x100" of bank BK0 in cycle C2;
- address "0x110" of bank BK1 in cycle C3;
- address "0x120" of bank BK2 in cycle C4;
- address "0x130" of bank BK3 in cycle C5.

It then continues with address "0x140" of bank BK0 in cycle C6, "0x150" of bank BK1 in cycle C7, "0x160" of bank BK2 in cycle C8, and "0x170" of bank BK3 in cycle C9. The addresses of the data memory 106 accessed by slot 1041 are marked with the hatching "\" in FIG. 4(A).

In FIG. 4(B), slots 1040 and 1041 never access the same bank in the same cycle, so no bank conflict occurs. If slots 1040 and 1041 did access the same bank in the same cycle, a bank conflict would occur. In that case, the control unit 114 rearranges the order of the instruction's accesses, which avoids the bank conflict without reducing processing efficiency.
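A minimal cycle-level model of FIG. 4(B) follows. It assumes that each slot issues one 128-bit access per processing cycle to consecutive addresses (stride 0x10). It lists the per-cycle accesses and reports the cycles in which both slots hit the same bank. The helper names are hypothetical.

```python
# Hypothetical cycle-level model of FIG. 4(B): each slot issues one 128-bit
# access per processing cycle to consecutive addresses (a 0x10 stride).
NUM_BANKS = 4
BANK_WIDTH_BYTES = 0x10


def bank_of(addr: int) -> int:
    return (addr // BANK_WIDTH_BYTES) % NUM_BANKS


def schedule(start_cycle: int, start_addr: int, n_accesses: int) -> dict[int, tuple[int, int]]:
    """Map processing cycle -> (bank, address) for a sequential load/store."""
    sched = {}
    for k in range(n_accesses):
        addr = start_addr + k * BANK_WIDTH_BYTES
        sched[start_cycle + k] = (bank_of(addr), addr)
    return sched


def conflicting_cycles(a: dict, b: dict) -> list[int]:
    """Cycles in which both schedules touch the same bank."""
    return sorted(c for c in a.keys() & b.keys() if a[c][0] == b[c][0])


ls1 = schedule(1, 0x000, 8)  # slot 1040: C1..C8
ls2 = schedule(2, 0x100, 8)  # slot 1041: C2..C9

if __name__ == "__main__":
    for c in range(1, 10):
        cells = [f"BK{s[c][0]}@{s[c][1]:#05x}" if c in s else "-" for s in (ls1, ls2)]
        print(f"C{c}: " + "  ".join(cells))
    print("conflicts:", conflicting_cycles(ls1, ls2))
```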

FIG. 5 shows a case in which a bank conflict occurs. FIG. 5(A) shows the same address layout of the data memory 106 as FIG. 4(A), and FIGS. 5(B) to 5(D) show load/store processing sequences. In FIG. 5(A), the addresses of the memory 106 accessed in FIGS. 5(B) to 5(D) are marked with the hatching "/" for load/store instruction LS1 and "\" for LS2.

In FIG. 5(B), load/store instruction LS2, executed by slot 1041, starts its access to the data memory 106 at a different address than in FIG. 4(B). Slot 1041 starts with address "0x110" of bank BK1 in cycle C2 and then makes one access per cycle:
- "0x120" of bank BK2;
- "0x130" of bank BK3;
- "0x140" of bank BK0;
- "0x150" of bank BK1;
- "0x160" of bank BK2;
- "0x170" of bank BK3;
- finally, "0x180" of bank BK0.

In other words, the accesses follow a so-called wrap (wrap-around) pattern. As a result, bank conflicts occur in cycles C2 to C8.
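Running the same kind of model with LS2 starting at 0x110 reproduces the conflicts of FIG. 5(B). This self-contained sketch uses the same assumptions as before (one access per cycle, a 0x10 stride, four banks) and tracks only bank numbers.

```python
# Hypothetical model of FIG. 5(B): LS2 starts at 0x110 (bank BK1) and wraps to BK0.
NUM_BANKS = 4
BANK_WIDTH_BYTES = 0x10


def schedule(start_cycle: int, start_addr: int, n_accesses: int) -> dict[int, int]:
    """Map processing cycle -> bank for a sequential 128-bit access stream."""
    return {
        start_cycle + k: ((start_addr + k * BANK_WIDTH_BYTES) // BANK_WIDTH_BYTES) % NUM_BANKS
        for k in range(n_accesses)
    }


def conflicting_cycles(a: dict[int, int], b: dict[int, int]) -> list[int]:
    """Cycles in which both streams access the same bank."""
    return sorted(c for c in a.keys() & b.keys() if a[c] == b[c])


ls1 = schedule(1, 0x000, 8)  # slot 1040: BK0, BK1, BK2, BK3, BK0, ...
ls2 = schedule(2, 0x110, 8)  # slot 1041: BK1, BK2, BK3, BK0, BK1, ... (wrap pattern)

if __name__ == "__main__":
    print("conflicting cycles:", [f"C{c}" for c in conflicting_cycles(ls1, ls2)])
```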

FIG. 5(C) shows what happens when the bank conflict is avoided by stalling. Stalling load/store instruction LS2 for one cycle does avoid the conflict, but it also delays the completion of LS2 until cycle C10.
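The stall approach of FIG. 5(C) can be sketched as a search for the smallest whole-cycle delay of LS2 that removes every conflict. The search strategy and the names are illustrative assumptions; the patent states only that a one-cycle stall avoids the conflict and pushes completion to C10.

```python
# Hypothetical sketch of conflict avoidance by stalling (FIG. 5(C)).
NUM_BANKS = 4
BANK_WIDTH_BYTES = 0x10


def schedule(start_cycle: int, start_addr: int, n_accesses: int) -> dict[int, int]:
    """Map processing cycle -> bank for a sequential 128-bit access stream."""
    return {
        start_cycle + k: ((start_addr + k * BANK_WIDTH_BYTES) // BANK_WIDTH_BYTES) % NUM_BANKS
        for k in range(n_accesses)
    }


def has_conflict(a: dict[int, int], b: dict[int, int]) -> bool:
    return any(a[c] == b[c] for c in a.keys() & b.keys())


def stall_until_free(ls1, start_cycle, start_addr, n_accesses, max_stall=NUM_BANKS):
    """Delay the second stream by the fewest whole cycles that removes every conflict."""
    for stall in range(max_stall + 1):
        candidate = schedule(start_cycle + stall, start_addr, n_accesses)
        if not has_conflict(ls1, candidate):
            return stall, candidate
    raise RuntimeError("no conflict-free stall found")


ls1 = schedule(1, 0x000, 8)
stall, ls2 = stall_until_free(ls1, 2, 0x110, 8)

if __name__ == "__main__":
    print(f"stall {stall} cycle(s); LS2 finishes in C{max(ls2)}")
```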

This embodiment therefore takes a different approach. Suppose slot 1040 (the first processing unit) starts executing the first load/store instruction LS1, and slot 1041 (the second processing unit) then starts executing the second load/store instruction LS2. The control unit 114 rearranges the order in which LS2 accesses the banks BK0 to BK3 into a bank access order in which the accesses of slots 1040 and 1041 do not conflict (hereinafter, the conflict-avoidance bank access order). It then has slot 1041 execute LS2 in that order.

Specifically, as shown in FIG. 5(D), the accesses of LS2 are moved as follows:
- The access to address "0x140" of bank BK0, which corresponded to cycle C5 in FIG. 5(A), is executed in cycle C2 (arrow 51). The accesses to "0x110" of bank BK1, "0x120" of bank BK2, and "0x130" of bank BK3, which corresponded to cycles C2 to C4, are each delayed by one cycle.
- Likewise, the access to address "0x180" of bank BK0, which corresponded to cycle C9 in FIG. 5(A), is executed in cycle C6 (arrow 52). The accesses to "0x150" of bank BK1, "0x160" of bank BK2, and "0x170" of bank BK3, which corresponded to cycles C6 to C8, are each delayed by one cycle.

Within each group of four cycles, this is equivalent to stalling the first to third accesses by one cycle and executing the fourth access first, so the bank conflict is avoided. LS2 is executed in this rearranged conflict-avoidance bank access order. LS2 still completes in cycle C9, as in FIG. 5(B), rather than in C10. The bank conflict is therefore avoided without stalling LS2, that is, without reducing processing efficiency.
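The rearrangement of FIG. 5(D) amounts to rotating each group of four accesses right by one position, so that the last access of the group runs first. The sketch below assumes the groups of four are aligned with the start of LS2, since four processing cycles make one instruction cycle. It also checks that the result is conflict-free. The helper names are hypothetical.

```python
# Hypothetical sketch of the FIG. 5(D) rearrangement for LS2.
NUM_BANKS = 4
BANK_WIDTH_BYTES = 0x10


def bank_of(addr: int) -> int:
    return (addr // BANK_WIDTH_BYTES) % NUM_BANKS


def rotate_groups(addrs: list[int], group: int = NUM_BANKS) -> list[int]:
    """Within each group of `group` accesses, move the last access to the front."""
    out: list[int] = []
    for i in range(0, len(addrs), group):
        chunk = addrs[i:i + group]
        out += chunk[-1:] + chunk[:-1]
    return out


def to_schedule(start_cycle: int, addrs: list[int]) -> dict[int, tuple[int, int]]:
    """Map processing cycle -> (bank, address), one access per cycle."""
    return {start_cycle + k: (bank_of(a), a) for k, a in enumerate(addrs)}


ls1 = to_schedule(1, [0x000 + BANK_WIDTH_BYTES * k for k in range(8)])   # C1..C8
ls2_original = [0x110 + BANK_WIDTH_BYTES * k for k in range(8)]         # 0x110 ... 0x180
ls2 = to_schedule(2, rotate_groups(ls2_original))                       # C2..C9

if __name__ == "__main__":
    for c in sorted(ls2):
        bank, addr = ls2[c]
        other = f"BK{ls1[c][0]}" if c in ls1 else "-"
        print(f"C{c}: LS1 {other:4} LS2 BK{bank} @ {addr:#05x}")
```

In this model LS2 finishes in C9, one cycle earlier than the stalled schedule.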

Returning to FIG. 1, and with further reference to FIG. 6, the control unit 114 that performs this control is now described. The control unit 114 includes a dependency detection unit 1142, a rearrangement control unit 1144, and a rearrangement address generation unit 1146. As shown in FIG. 6, the rearrangement control unit 1144 includes:
- an access order control unit 602;
- a register management unit 604;
- an issue timing detection unit 606;
- rearrangement management flags 608, 610, 612, and 614;
- register management flags 616, 618, 620, and 622.

Of the components shown in FIGS. 1 and 6, those involved in rearranging the bank access order are described first.

The control unit 114 receives the decoded load/store instructions LS1 and LS2 from the decoder 108. The bank conflict detection unit 113 receives from the scalar register 112 the addresses of the data memory 106 that the decoded instructions LS1 and LS2 will access. It analyzes the bank access orders of LS1 and LS2 and detects bank conflicts between them. A bank conflict is detected when, if each load/store instruction were executed at its scheduled timing, both instructions would access the same bank in the same cycle. On detecting a bank conflict, the bank conflict detection unit 113 notifies the rearrangement control unit 1144. In response, the rearrangement control unit 1144 rearranges the bank access order of LS2 into the conflict-avoidance bank access order.
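The detection rule can be sketched by describing each instruction with three values: its first cycle, its bank access order (repeated every four cycles), and its number of accesses. The `LoadStore` type and the `detect_bank_conflict` function are hypothetical; the patent does not specify how the bank conflict detection unit 113 is implemented.

```python
# Hypothetical model of the bank conflict detection unit 113.
from typing import NamedTuple


class LoadStore(NamedTuple):
    name: str
    start_cycle: int
    bank_order: tuple[int, ...]   # e.g. (1, 2, 3, 0) for BA2, repeated every 4 cycles
    n_accesses: int

    def bank_at(self, cycle: int):
        """Bank accessed in `cycle`, or None if the instruction is not accessing then."""
        k = cycle - self.start_cycle
        if 0 <= k < self.n_accesses:
            return self.bank_order[k % len(self.bank_order)]
        return None


def detect_bank_conflict(a: LoadStore, b: LoadStore) -> list[int]:
    """Cycles in which a and b, run at their scheduled timing, hit the same bank."""
    first = min(a.start_cycle, b.start_cycle)
    last = max(a.start_cycle + a.n_accesses, b.start_cycle + b.n_accesses)
    return [c for c in range(first, last)
            if a.bank_at(c) is not None and a.bank_at(c) == b.bank_at(c)]


ls1 = LoadStore("LS1", 1, (0, 1, 2, 3), 8)   # BA1 from C1
ls2 = LoadStore("LS2", 2, (1, 2, 3, 0), 8)   # BA2 from C2

if __name__ == "__main__":
    cycles = detect_bank_conflict(ls1, ls2)
    print("bank conflict" if cycles else "no conflict", cycles)
```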

In the rearrangement control unit 1144, the access order control unit 602 determines the bank access orders of load/store instructions LS1 and LS2. Each of LS1 and LS2 has a per-cycle order of accesses to addresses of the data memory 106 (hereinafter, the memory access order), as shown for example in FIG. 7(A). Here each memory access order is given as offsets from the base address, where 0x0, 0x10, 0x20, and 0x30 correspond to banks BK0, BK1, BK2, and BK3. The memory access orders are:
- MA1: 0x0, 0x10, 0x20, 0x30;
- MA2: 0x10, 0x20, 0x30, 0x0;
- MA3: 0x20, 0x30, 0x0, 0x10;
- MA4: 0x30, 0x0, 0x10, 0x20.

From MA1 to MA4, the access order control unit 602 determines the corresponding bank access orders shown in FIG. 7(B):
- BA1 (from MA1): BK0, BK1, BK2, BK3;
- BA2 (from MA2): BK1, BK2, BK3, BK0;
- BA3 (from MA3): BK2, BK3, BK0, BK1;
- BA4 (from MA4): BK3, BK0, BK1, BK2.
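The conversion from a memory access order to a bank access order amounts to dividing each offset by the 0x10 bank width. The sketch below assumes the offsets listed for FIG. 7(A); the table and function names are illustrative.

```python
# Hypothetical sketch: memory access orders (FIG. 7(A)) -> bank access orders (FIG. 7(B)).
BANK_OFFSET = 0x10  # offset step between consecutive banks within a row

MEMORY_ACCESS_ORDERS = {
    "MA1": (0x00, 0x10, 0x20, 0x30),
    "MA2": (0x10, 0x20, 0x30, 0x00),
    "MA3": (0x20, 0x30, 0x00, 0x10),
    "MA4": (0x30, 0x00, 0x10, 0x20),
}


def bank_access_order(offsets: tuple[int, ...]) -> tuple[int, ...]:
    """Convert offsets from the row base address into the sequence of bank numbers."""
    return tuple(off // BANK_OFFSET for off in offsets)


BANK_ACCESS_ORDERS = {f"BA{name[2:]}": bank_access_order(offs)
                      for name, offs in MEMORY_ACCESS_ORDERS.items()}

if __name__ == "__main__":
    for name, order in BANK_ACCESS_ORDERS.items():
        print(name, ", ".join(f"BK{b}" for b in order))
```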

In the example of FIG. 5(B), the bank access order of LS1 is determined to be BA1 and that of LS2 to be BA2. The access order control unit 602 therefore rearranges BA2 into a conflict-avoidance access order that causes no bank conflict with LS1. The conflict-avoidance access order moves the bank accessed last to the front; here it is BA1, the same bank access order as LS1.

The access order control unit 602 then transfers LS1 and the rearranged LS2 to the rearrangement address generation unit 1146. That unit generates the addresses of the data memory 106 to be accessed by LS1 and LS2 and inputs them to slots 1040 and 1041, respectively. Slots 1040 and 1041 then access banks BK0 to BK3 in sequence according to LS1 and LS2, as shown in FIG. 5(D).
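The rule for the conflict-avoidance order can be sketched together with one possible way for the rearrangement address generation unit 1146 to re-emit addresses in that order. `avoid_order` implements the stated rule (the last bank moves to the front). `rearranged_addresses` is a hypothetical helper that assumes each group of four consecutive accesses touches each bank at most once.

```python
# Hypothetical sketch of the conflict-avoidance rule and address re-generation.
NUM_BANKS = 4
BANK_WIDTH_BYTES = 0x10


def bank_of(addr: int) -> int:
    return (addr // BANK_WIDTH_BYTES) % NUM_BANKS


def avoid_order(bank_order: tuple[int, ...]) -> tuple[int, ...]:
    """Conflict-avoidance order: the bank accessed last is moved to the front."""
    return bank_order[-1:] + bank_order[:-1]


def rearranged_addresses(addrs: list[int], new_order: tuple[int, ...]) -> list[int]:
    """Re-emit each group of four accesses so that its banks follow `new_order`."""
    out: list[int] = []
    for i in range(0, len(addrs), NUM_BANKS):
        by_bank = {bank_of(a): a for a in addrs[i:i + NUM_BANKS]}
        out += [by_bank[b] for b in new_order if b in by_bank]
    return out


BA2 = (1, 2, 3, 0)                  # LS2's bank access order in FIG. 5(B)
new_order = avoid_order(BA2)        # (0, 1, 2, 3), i.e. BA1
ls2_addrs = [0x110 + BANK_WIDTH_BYTES * k for k in range(8)]

if __name__ == "__main__":
    print("new order:", new_order)
    print([hex(a) for a in rearranged_addresses(ls2_addrs, new_order)])
```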

The access order control unit 602 also writes the bank access order of each load/store instruction, before and after rearrangement, to the rearrangement management flags 608 to 614. For example, flags 608 to 614 hold the bank access orders of the load/store instructions executed in slots 1040 to 1043, respectively. The bank access order BA1 of LS1 is therefore written to flag 608. The bank access orders of LS2 before rearrangement (BA2) and after rearrangement (BA1) are written to flag 610.
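The rearrangement management flags can be modelled minimally as follows. The sketch assumes that flags 608, 610, 612, and 614 belong to slots 1040 to 1043 in that order, and that each flag stores the bank access orders before and after rearrangement. The class names and storage layout are hypothetical.

```python
# Hypothetical model of the rearrangement management flags 608-614.
from dataclasses import dataclass
from typing import Optional

FLAG_FOR_SLOT = {1040: 608, 1041: 610, 1042: 612, 1043: 614}  # assumed mapping


@dataclass
class ReorderFlag:
    before: tuple[int, ...]   # bank access order as decoded
    after: tuple[int, ...]    # bank access order actually executed

    @property
    def reordered(self) -> bool:
        return self.before != self.after


class ReorderFlags:
    def __init__(self) -> None:
        self._flags: dict[int, ReorderFlag] = {}

    def write(self, slot: int, before, after=None) -> None:
        """Record a slot's bank access order before and after rearrangement."""
        after = before if after is None else after
        self._flags[FLAG_FOR_SLOT[slot]] = ReorderFlag(tuple(before), tuple(after))

    def read(self, flag_id: int) -> Optional[ReorderFlag]:
        return self._flags.get(flag_id)


flags = ReorderFlags()
flags.write(1040, (0, 1, 2, 3))                  # LS1: BA1, not rearranged
flags.write(1041, (1, 2, 3, 0), (0, 1, 2, 3))    # LS2: BA2 -> BA1

if __name__ == "__main__":
    for flag_id in (608, 610):
        print(flag_id, flags.read(flag_id))
```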

In addition, the register management unit 604 receives the execution state of load/store instructions from the decoder 108. For each stage, such as EX, MEM, and WB, the execution state indicates that the corresponding instruction has been decoded. The register management unit 604 records and manages these execution states in the register management flags 616 to 622. This operation is described in detail later.

FIG. 8 is a flowchart of the procedure for rearranging the bank access order. The procedure is executed, for example, each time the instructions for one instruction cycle are fetched. First, the decoder 108 decodes a load/store instruction (S802). The bank conflict detection unit 113 then determines whether a bank conflict exists (S804).

If no bank conflict is detected (No in S804), the load/store instruction is executed in the vector pipeline 104 (S812). For example, LS1 causes no bank conflict with any preceding instruction, so the slot 1040 executes LS1 using the addresses generated by the rearrangement address generation unit 1146. The rearrangement control unit 1144 then manages the registers of the vector register 110 to which data has been written (S814). For example, when the slot 1040 completes the WB stage of LS1, information indicating this is written to the register management flag 616, one of the register management flags 616 to 622. As described in detail later, because LS1 specifies its destination register, tracking the completion of the WB stage tells the controller when processing of that register is complete.

If a bank conflict is detected (Yes in S804), the rearrangement control unit 1144 determines the conflict-avoidance bank access order (S806). For example, a bank conflict is detected between LS2 and LS1, so the access order control unit 602 determines BA1 as the conflict-avoidance bank access order of LS2. The rearrangement address generation unit 1146 then generates rearranged addresses corresponding to the conflict-avoidance bank access order (S808), and the access order control unit 602 manages the bank access order (S810); for example, the bank access orders of LS2 before and after rearrangement are written to the rearrangement management flag 610, one of the rearrangement management flags 608 to 614. The load/store instruction is then executed in the vector pipeline 104 (S812); for example, the slot 1041 executes LS2. Finally, the rearrangement control unit 1144 manages the register to which the data has been written (S814).
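Steps S804 to S812 can be sketched as a single issue function. This is a simplified model under stated assumptions: the `LoadStore` record, the `issue` function, and the use of slot numbers as keys of a dictionary standing in for the rearrangement management flags are all hypothetical, and S814 (register management) is omitted.

```python
from dataclasses import dataclass

@dataclass
class LoadStore:
    name: str
    slot: int
    order: tuple  # bank access order, e.g. (1, 2, 3, 0) for BA2
    start: int    # cycle of the first bank access
    length: int   # number of bank accesses

    def bank_at(self, cycle):
        return self.order[(cycle - self.start) % len(self.order)]

def conflicts(a, b):
    """True if the two instructions access the same bank in any shared cycle."""
    lo = max(a.start, b.start)
    hi = min(a.start + a.length, b.start + b.length)
    return any(a.bank_at(c) == b.bank_at(c) for c in range(lo, hi))

def issue(new, in_flight, reorder_flags):
    before = new.order
    if any(conflicts(new, old) for old in in_flight):  # S804: Yes
        new.order = before[-1:] + before[:-1]          # S806/S808: last bank first
        reorder_flags[new.slot] = (before, new.order)  # S810: record before/after
    else:                                              # S804: No
        reorder_flags[new.slot] = (before,)
    in_flight.append(new)                              # S812: execute in the slot
    return new.order

flags, running = {}, []
issue(LoadStore("LS1", 1040, (0, 1, 2, 3), 1, 4), running, flags)
issue(LoadStore("LS2", 1041, (1, 2, 3, 0), 2, 4), running, flags)
print(flags)  # {1040: ((0, 1, 2, 3),), 1041: ((1, 2, 3, 0), (0, 1, 2, 3))}
```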

With this procedure, bank conflicts are avoided without the loss of processing efficiency that stalling would cause.

FIG. 9 shows an example in which load/store instructions are executed by four slots. FIGS. 9A to 9C show sequences of the load/store instructions LS1 to LS4 executed by the slots 1040 to 1043.

FIG. 9A shows the sequence when bank conflicts occur. The sequence of LS1 and LS2 in the slots 1040 and 1041 is the same as in FIG. 5B: the slot 1040, which processes LS1, reads data by accessing the data memory 106 in bank access order BA1 (BK0, BK1, BK2, BK3), and the slot 1041, which processes LS2, reads data by accessing the data memory 106 in bank access order BA2 (BK1, BK2, BK3, BK0).

In addition, the slot 1042 executes load/store instruction LS3, "vldh sr4, vr2", which reads data sequentially from the logical address "sr4" of the data memory 106 (for example, "0x230" in bank BK3) into logical vector register vr2. Accordingly, the slot 1042 accesses the data memory 106 in bank access order BA4 (BK3, BK0, BK1, BK2): address "0x230" in bank BK3 in cycle C3, "0x240" in BK0 in cycle C4, "0x250" in BK1 in cycle C5, and "0x260" in BK2 in cycle C6, followed by "0x270" in BK3 in cycle C7, "0x280" in BK0 in cycle C8, "0x290" in BK1 in cycle C9, and "0x2A0" in BK2 in cycle C10.

Likewise, the slot 1043 executes load/store instruction LS4, "vldh sr5, vr3", which reads data sequentially from the logical address "sr5" of the data memory 106 (for example, "0x320" in bank BK2) into logical vector register vr3. Accordingly, the slot 1043 accesses the data memory 106 in bank access order BA3 (BK2, BK3, BK0, BK1): address "0x320" in bank BK2 in cycle C4, "0x330" in BK3 in cycle C5, "0x340" in BK0 in cycle C6, and "0x350" in BK1 in cycle C7, followed by "0x360" in BK2 in cycle C8, "0x370" in BK3 in cycle C9, "0x380" in BK0 in cycle C10, and "0x390" in BK1 in cycle C11.
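The example addresses for LS3 and LS4 are consistent with a four-bank interleave in which each bank holds one 16-byte block, so that bank = (address / 0x10) mod 4. The patent does not state the block size; the `BLOCK` stride below is an assumption inferred from those addresses, and the function names are hypothetical.

```python
NUM_BANKS = 4
BLOCK = 0x10  # assumed: consecutive 16-byte blocks map to consecutive banks

def bank_of(addr):
    return (addr // BLOCK) % NUM_BANKS

def access_trace(base, count, first_cycle):
    """(cycle, address, bank) for `count` consecutive block accesses."""
    return [(first_cycle + i, base + i * BLOCK, bank_of(base + i * BLOCK))
            for i in range(count)]

for cycle, addr, bank in access_trace(0x230, 8, 3):  # LS3 in slot 1042
    print(f"C{cycle}: 0x{addr:X} -> BK{bank}")
```

Running the same function with base 0x320 and first cycle 4 reproduces the LS4 trace, ending at address 0x390 in bank BK1 in cycle C11.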

In FIG. 9A, bank conflicts occur between the slots 1040 and 1041 in cycles C2 to C8. If LS2 to LS4 are each stalled in turn until their bank conflicts with the preceding load/store instructions are avoided, the result is the sequence shown in FIG. 9B.
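The conflict pattern of FIG. 9A can be checked with a small cycle-level model. It assumes each instruction makes eight bank accesses and starts in the cycle stated in the text (LS1 in C1 to LS4 in C4); the names `ORDERS` and `conflict_map` are hypothetical.

```python
from itertools import combinations

ORDERS = {"BA1": (0, 1, 2, 3), "BA2": (1, 2, 3, 0), "BA3": (2, 3, 0, 1), "BA4": (3, 0, 1, 2)}
LENGTH = 8  # each instruction makes eight bank accesses in FIG. 9

def bank_at(order, start, cycle):
    """Bank used in `cycle`, or None when the instruction is not accessing memory."""
    if start <= cycle < start + LENGTH:
        return order[(cycle - start) % len(order)]
    return None

def conflict_map(program, last_cycle=20):
    """{(name_a, name_b): [cycles]} for every pair that hits the same bank."""
    result = {}
    for (name_a, order_a, start_a), (name_b, order_b, start_b) in combinations(program, 2):
        cycles = [c for c in range(1, last_cycle + 1)
                  if bank_at(order_a, start_a, c) is not None
                  and bank_at(order_a, start_a, c) == bank_at(order_b, start_b, c)]
        if cycles:
            result[(name_a, name_b)] = cycles
    return result

fig_9a = [("LS1", ORDERS["BA1"], 1), ("LS2", ORDERS["BA2"], 2),
          ("LS3", ORDERS["BA4"], 3), ("LS4", ORDERS["BA3"], 4)]
print(conflict_map(fig_9a))  # {('LS1', 'LS2'): [2, 3, 4, 5, 6, 7, 8]}
```

Only the LS1/LS2 pair collides, and it does so in every overlapping cycle, which matches the C2 to C8 range stated in the text.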

As shown in FIG. 9B, stalling LS2 by one instruction cycle, LS3 by three, and LS4 by six avoids the bank conflicts. Doing so, however, delays the completion of LS1 to LS4 until cycle C17.
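The stall counts of FIG. 9B (1, 3, and 6 cycles) and the finish in C17 can be reproduced by a simple greedy model. The model rests on two assumptions that the patent does not state: instructions start in program order, at most one per cycle, and each is delayed to the first cycle in which it conflicts with no instruction already scheduled. The function names are hypothetical.

```python
ORDERS = {"BA1": (0, 1, 2, 3), "BA2": (1, 2, 3, 0), "BA3": (2, 3, 0, 1), "BA4": (3, 0, 1, 2)}
LENGTH = 8  # bank accesses per instruction in FIG. 9

def bank_at(order, start, cycle):
    if start <= cycle < start + LENGTH:
        return order[(cycle - start) % len(order)]
    return None

def collides(order, start, placed):
    """True if the candidate hits the same bank as any placed (order, start) in some cycle."""
    return any(bank_at(order, start, c) == bank_at(o, s, c)
               for o, s in placed for c in range(start, start + LENGTH))

def schedule_with_stalls(program):
    """Delay each instruction until it no longer conflicts; returns (stalls, last cycle)."""
    placed, stalls, previous_start = [], [], 0
    for order, nominal_start in program:
        start = max(nominal_start, previous_start + 1)  # assumed: in order, one start per cycle
        while collides(order, start, placed):
            start += 1
        stalls.append(start - nominal_start)
        placed.append((order, start))
        previous_start = start
    return stalls, max(s for _, s in placed) + LENGTH - 1

fig_9a = [(ORDERS["BA1"], 1), (ORDERS["BA2"], 2), (ORDERS["BA4"], 3), (ORDERS["BA3"], 4)]
print(schedule_with_stalls(fig_9a))  # ([0, 1, 3, 6], 17)
```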

Therefore, in the present embodiment, as shown in FIG. 9C, when the slot 1041 starts executing LS2 after the slot 1040 has started executing LS1, the control unit 114 rearranges the bank access order of LS2 into a conflict-avoidance bank access order under which the bank accesses of the slots 1040 and 1041 do not conflict (arrows 91 and 92), and has the slot 1041 execute LS2 in that order. The conflict-avoidance access order is the order in which the bank that would be accessed last is accessed first. For example, moving BK0, the last bank in bank access order BA2, to the front yields bank access order BA1; that is, LS2 here uses the same bank access order as LS1. This avoids bank conflicts among the slots 1040, 1041, and 1042.

Further, when the slot 1043 starts executing LS4 after the slot 1042 has started executing LS3, the control unit 114 rearranges the bank access order BA3 of LS4 into a conflict-avoidance bank access order under which the bank accesses of the slots 1042 and 1043 do not conflict (arrows 93 and 94), and has the slot 1043 execute LS4 in that order. Again, the bank that would be accessed last is accessed first: moving BK1, the last bank in bank access order BA3, to the front yields bank access order BA2. This avoids bank conflicts among all of the slots 1040 to 1043.
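The effect of FIG. 9C can be checked with the same kind of cycle model, now keeping every start cycle and rotating an instruction's bank order once (last bank first) whenever it would collide with an instruction already scheduled. The helper names are hypothetical, and a single rotation is not claimed to resolve conflicts in general; the test confirms only that it does so for this example.

```python
ORDERS = {"BA1": (0, 1, 2, 3), "BA2": (1, 2, 3, 0), "BA3": (2, 3, 0, 1), "BA4": (3, 0, 1, 2)}
LENGTH = 8  # bank accesses per instruction in FIG. 9

def bank_at(order, start, cycle):
    if start <= cycle < start + LENGTH:
        return order[(cycle - start) % len(order)]
    return None

def collides(order, start, placed):
    return any(bank_at(order, start, c) == bank_at(o, s, c)
               for o, s in placed for c in range(start, start + LENGTH))

def rotate_last_to_front(order):
    return order[-1:] + order[:-1]

def schedule_with_reordering(program):
    """Keep every start cycle; rotate the bank order once if it would conflict."""
    placed = []
    for order, start in program:
        if collides(order, start, placed):
            order = rotate_last_to_front(order)
        placed.append((order, start))
    return placed

fig_9a = [(ORDERS["BA1"], 1), (ORDERS["BA2"], 2), (ORDERS["BA4"], 3), (ORDERS["BA3"], 4)]
fig_9c = schedule_with_reordering(fig_9a)
names = {order: name for name, order in ORDERS.items()}
print([names[o] for o, _ in fig_9c])           # ['BA1', 'BA1', 'BA4', 'BA2']
print(max(s for _, s in fig_9c) + LENGTH - 1)  # 11
```

LS2 becomes BA1 and LS4 becomes BA2, as in the text, and all four instructions finish in C11 instead of C17.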

In FIG. 9C, LS1 to LS4 finish in cycle C11, six cycles earlier than the C17 of FIG. 9B.

Thus, according to the present embodiment, bank conflicts among a plurality of slots can be avoided without reducing processing efficiency.

Each load/store instruction also specifies the register of the vector register 110 to which the data read from the data memory 106 is written. For example, LS1 in FIGS. 9A to 9C specifies logical vector register vr0 and writes the data elements read from the data memory 106 to the corresponding physical vector registers vr[0] to vr[63]. Similarly, LS2 specifies vr1 (physical registers vr[64] to vr[127]), LS3 specifies vr2 (vr[128] to vr[191]), and LS4 specifies vr3 (vr[192] to vr[255]). Consequently, rearranging the bank access order of a load/store instruction as described above also changes the order in which its destination registers in the vector register 110 are written. This affects the start timing of any arithmetic instruction that reads that data from the vector register 110.
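The logical-to-physical mapping above is a fixed 64-element window per logical register. The sketch below expresses it, together with the 8-element blocks that one bank access fills in FIGS. 10 and 11B; the constant and function names are hypothetical.

```python
ELEMENTS_PER_VR = 64     # vr0 -> vr[0]..vr[63], vr1 -> vr[64]..vr[127], ...
ELEMENTS_PER_WRITE = 8   # one bank access fills eight elements (FIGS. 10 and 11B)

def physical_range(logical):
    first = logical * ELEMENTS_PER_VR
    return first, first + ELEMENTS_PER_VR - 1

def write_blocks(logical, count):
    """First `count` 8-element blocks of a logical register, in ascending order."""
    first, _ = physical_range(logical)
    return [(first + i * ELEMENTS_PER_WRITE, first + (i + 1) * ELEMENTS_PER_WRITE - 1)
            for i in range(count)]

print([physical_range(n) for n in range(4)])
# [(0, 63), (64, 127), (128, 191), (192, 255)]
print(write_blocks(1, 4))
# [(64, 71), (72, 79), (80, 87), (88, 95)]
```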

Therefore, in the present embodiment, when the slot 1042 executes an arithmetic instruction that reads data written by the slot 1041 and accesses the registers of the vector register 110 in a register access order different from that of the slot 1041, the control unit 114 rearranges the register access order of the slot 1042 so that the registers are accessed in the order in which the slot 1041 finishes writing the data to be read, and has the slot 1042 execute the arithmetic instruction in that order.

FIG. 10 illustrates the rearrangement of the register access order. FIG. 10A shows the sequence when the slots 1040, 1041, and 1042 execute load/store instructions LS1 and LS2 and arithmetic instruction AL3. The slots 1040 and 1041 execute LS1 and LS2, writing data elements to the vector register 110. The slot 1042 executes AL3, which reads the data written to the vector register 110 by the slot 1041. AL3 is "vaddh vr1, vr2, vr3", an instruction that adds the data elements in logical vector registers vr1 and vr2 and writes the result to vr3.

The sequences of LS1 and LS2 show the banks accessed by the slots 1040 and 1041 and the physical vector register numbers of the vector register 110 to which the read data elements are written. LS2 is shown already rearranged into its conflict-avoidance bank access order to avoid bank conflicts between the slots 1040 and 1041.

For example, the slot 1040 writes the data read from bank BK0 in cycle C1 to physical vector registers vr[0]-[7] of the vector register 110, the data read from BK1 in cycle C2 to vr[8]-[15], the data read from BK2 in cycle C3 to vr[16]-[23], and the data read from BK3 in cycle C4 to vr[24]-[31]. Meanwhile, the slot 1041 writes the data read from bank BK0 in cycle C2 to vr[88]-[95], the data read from BK1 in cycle C3 to vr[64]-[71], the data read from BK2 in cycle C4 to vr[72]-[79], and the data read from BK3 in cycle C5 to vr[80]-[87].
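The register blocks written by LS2 follow from one assumption: block j of the destination register holds the data stored in the j-th bank of the instruction's original order (BA2 for LS2). Reordering the bank accesses then reorders the register writes. The sketch below, with a hypothetical `write_schedule` function, reproduces the slot 1041 column of FIG. 10A.

```python
def write_schedule(base_reg, original_order, new_order, first_cycle, elems=8):
    """(cycle, bank, (first_reg, last_reg)) for each access in the new bank order."""
    block_of_bank = {bank: j for j, bank in enumerate(original_order)}
    schedule = []
    for i, bank in enumerate(new_order):
        first = base_reg + block_of_bank[bank] * elems
        schedule.append((first_cycle + i, bank, (first, first + elems - 1)))
    return schedule

# LS2: destination vr1 (vr[64] onward), original order BA2, rearranged to BA1, first access C2.
for row in write_schedule(64, (1, 2, 3, 0), (0, 1, 2, 3), 2):
    print(row)
```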

The sequence of AL3 shows the physical vector register numbers of the vector register 110 that the slot 1042 accesses to read data. AL3 accesses the vector register 110 in ascending order of physical vector register number, so the registers written by LS2 are read in the order vr[64]-[71], vr[72]-[79], vr[80]-[87], and vr[88]-[95]. This register access order corresponds to the register access order of LS2 before its bank access order was changed.

The access of LS2 to the data memory 106 is executed in the "EX" stage shown in FIG. 1; the data is read from the data memory 106 in the "MEM" stage and written to the vector register 110 in the "WB" stage. In FIG. 10A, therefore, each write by LS2 to the vector register 110 completes two cycles after the cycle in which the data memory 106 is accessed, and the data can be read from the vector register 110 from the third cycle after that access. For example, data can be read from vr[64]-[71] starting in cycle C6, three cycles after cycle C3. Following AL3's ascending order, the slot 1042 therefore reads vr[64]-[71] in cycle C6, vr[72]-[79] in cycle C7, vr[80]-[87] in cycle C8, and vr[88]-[95] in cycle C9, so AL3 finishes in cycle C9.

Consider now the read of vr[88]-[95]. Because the bank access order of LS2 was rearranged, the write to vr[88]-[95] completes in cycle C4, so these data elements can be read from cycle C5. In the present embodiment, therefore, as shown in FIG. 10B, the control unit 114 rearranges the register access order of AL3 so that registers are accessed in the order in which LS2 finishes writing them, for example so that vr[88]-[95] is accessed first (arrow 1000). AL3 then finishes in cycle C8, shortening the processing time.

For comparison, FIG. 10C shows the case where the rearrangement of LS2's bank access order is not tracked and AL3 is instead stalled, which prevents the failure that would result from reading a data element before its write to the vector register 110 is complete. Here, AL3 starts only after the data accessed in LS2's last processing cycle, C5, has been written, that is, from cycle C8, and therefore finishes in cycle C11. Compared with FIG. 10C, the present embodiment shown in FIG. 10B finishes processing three cycles earlier.
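The three timings of FIG. 10 follow from a small model, assuming (as stated above) that a block accessed in cycle c can be read from cycle c + 3, that AL3 reads one 8-element block per cycle, and that a read never precedes its block's write. Blocks are keyed by their first physical register; the names are hypothetical.

```python
READ_AFTER = 3  # EX access in cycle c, WB completes in c + 2, readable from c + 3
BLOCKS = 4      # four 8-element blocks per instruction in FIG. 10

# First physical register of each block written by LS2 -> cycle of its bank access.
ls2_access = {64: 3, 72: 4, 80: 5, 88: 2}
ready = {blk: c + READ_AFTER for blk, c in ls2_access.items()}

def read_schedule(order, ready):
    """Read one block per cycle in `order`, never before that block is ready."""
    cycle, plan = 0, []
    for blk in order:
        cycle = max(cycle + 1, ready[blk])
        plan.append((blk, cycle))
    return plan

ascending = sorted(ready)                     # FIG. 10A: vr[64] first
by_completion = sorted(ready, key=ready.get)  # FIG. 10B: earliest-written block first
finish_a = read_schedule(ascending, ready)[-1][1]
finish_b = read_schedule(by_completion, ready)[-1][1]
finish_c = max(ready.values()) + BLOCKS - 1   # FIG. 10C: start after the last write
print(by_completion, finish_a, finish_b, finish_c)  # [88, 64, 72, 80] 9 8 11
```

Sorting by write-completion cycle yields the same order as arrow 1000 in FIG. 10B, and the three finish cycles match C9, C8, and C11.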

The operation of the control unit 114 in rearranging the register access order is now described with reference again to FIGS. 1 and 6.

In the control unit 114, load/store instruction LS2 and arithmetic instruction AL3 are input from the instruction memory 102 to the dependency detection unit 1142. The dependency detection unit 1142 analyzes LS2 and AL3 and detects whether AL3 depends on LS2. A dependency is detected when a logical vector register number to which LS2 writes data coincides with a logical vector register number from which AL3 reads data.
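The dependency test is a read-after-write check on logical vector register numbers. The sketch below applies it to the two instruction strings quoted in this section. The operand conventions are assumptions inferred from those examples: the last operand is the destination, and a `vld*` load reads memory rather than vector registers. LS2's own assembly text is not given here, so LS3 ("vldh sr4, vr2", which writes vr2) stands in. All function names are hypothetical.

```python
def parse(text):
    """Split e.g. 'vaddh vr1, vr2, vr3' into ('vaddh', ['vr1', 'vr2', 'vr3'])."""
    mnemonic, _, rest = text.strip().partition(" ")
    return mnemonic, [op.strip() for op in rest.split(",")]

def vector_regs(ops):
    return {op for op in ops if op.startswith("vr")}

def writes(text):
    _, ops = parse(text)
    return vector_regs(ops[-1:])   # assumed: last operand is the destination

def reads(text):
    mnemonic, ops = parse(text)
    if mnemonic.startswith("vld"):
        return set()               # assumed: a load reads memory, not vector registers
    return vector_regs(ops[:-1])   # assumed: the other operands are sources

def depends_on(consumer, producer):
    return bool(writes(producer) & reads(consumer))

AL3 = "vaddh vr1, vr2, vr3"
print(depends_on(AL3, "vldh sr4, vr2"))  # True: LS3 writes vr2, which AL3 reads
print(depends_on(AL3, "vldh sr5, vr3"))  # False: vr3 is only AL3's destination
```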

The dependency detection unit 1142 transfers the detection result to the rearrangement control unit 1144. When a dependency is detected, the rearrangement control unit 1144 rearranges the register access order of AL3 in addition to rearranging the bank access order of LS2 as described above. The rearrangement control unit 1144 also determines the issue timing of AL3 with its rearranged register access order.

Within the rearrangement control unit 1144, LS2 and AL3 are input to the access order control unit 602, which rearranges the register access order of AL3 based on the bank access orders of LS2 before and after rearrangement. Both of these bank access orders are obtained from the rearrangement management flag 610.

FIG. 11 illustrates the operation of the access order control unit 602. The table in FIG. 11A gives the register access order corresponding to each combination of pre-rearrangement and post-rearrangement bank access orders. For example, LS2 has bank access order BA2 before rearrangement and BA1 after, so the corresponding register access order is RA4.

FIG. 11B shows the register access orders of the vector register 110, using physical vector registers vr[0]-[31] of logical vector register vr0 as an example, with eight data elements written at a time. Register access order RA1 is vr[0]-[7], vr[8]-[15], vr[16]-[23], vr[24]-[31]; RA2 is vr[8]-[15], vr[16]-[23], vr[24]-[31], vr[0]-[7]; RA3 is vr[16]-[23], vr[24]-[31], vr[0]-[7], vr[8]-[15]; and RA4 is vr[24]-[31], vr[0]-[7], vr[8]-[15], vr[16]-[23]. Applied to physical vector registers vr[64]-[95], for example, the rearranged register access order RA4 of AL3 is vr[88]-[95], vr[64]-[71], vr[72]-[79], vr[80]-[87].
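The patent stores the FIG. 11A correspondence as a ROM table. The one entry given here (BA2 before, BA1 after, giving RA4) is consistent with a simple modular rule: if BAk starts at bank k-1 and RAk starts at block k-1, the rearranged register order starts at block (start_after - start_before) mod 4. This rule is our own derivation and an assumption; a real controller would simply look the value up. The function names are hypothetical.

```python
def start_bank(ba):
    """BA1..BA4 start at banks 0..3 and then cycle through all four banks."""
    return ba - 1

def register_order_index(ba_before, ba_after, banks=4):
    """Index k of RAk, derived from the FIG. 11 examples rather than read from ROM."""
    return (start_bank(ba_after) - start_bank(ba_before)) % banks + 1

def register_order(ra, base, elems=8, blocks=4):
    """RAk starts at block k-1 of the logical register and wraps around."""
    return [(base + b * elems, base + b * elems + elems - 1)
            for b in ((ra - 1 + i) % blocks for i in range(blocks))]

ra = register_order_index(2, 1)  # LS2: BA2 before, BA1 after
print(ra, register_order(ra, base=64))
# 4 [(88, 95), (64, 71), (72, 79), (80, 87)]
```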

The access order control unit 602 stores the information of FIGS. 11A and 11B in advance, for example as map data in an internal ROM (Read Only Memory). Using this information, it determines the register access order of AL3 from the bank access orders of LS2 before and after rearrangement, and rearranges AL3 accordingly.

The access order control unit 602 then transfers the rearranged register access order to the rearrangement address generation unit 1146, which generates the addresses of the vector register 110 corresponding to that order and transfers them to the slot 1042, which executes AL3.

FIG. 12 shows examples of the values written to the rearrangement management flags 608 to 614. FIG. 12A is the sequence diagram of FIG. 10B. Corresponding to FIG. 12A, FIG. 12B shows the rearrangement management flag 608 for the slot 1040, which executes LS1; the rearrangement management flag 610 for the slot 1041, which executes LS2; and the rearrangement management flag 612 for the slot 1042, which executes AL3. Because LS1 has bank access order BA1, "BA1" is written to the flag 608 in cycles C1 to C4, during which LS1 is executed. The bank access order of LS2 is changed from BA2 to BA1, so in cycles C2 to C5, during which LS2 is executed, the pre-change bank access order "BA2" and the post-change conflict-avoidance bank access order "BA1" are written to the flag 610.

At cycle C1, the register access order of AL3 is changed from RA1 to RA4 based on the bank access order "BA1" held in the flag 608 and the conflict-avoidance bank access order "BA1" held in the flag 610, and "RA4" is written to the flag 612. The flag 612 keeps the value "RA4" through cycles C5 to C8, during which AL3 is executed.

Returning to FIGS. 1 and 6:

The rearrangement control unit 1144 determines the issue timing of AL3 with its rearranged register access order. The decoder 108 inputs the execution state of each load/store instruction to the register management unit 604; the execution state indicates which of the "EX", "MEM", and "WB" stages has been executed. The register management unit 604 records the execution states in the register management flags 616 to 622; for example, the execution state of the slot 1041, which executes LS2, is written to the flag 618. The register management unit 604 notifies the issue timing detection unit 606 of the values of the flags 616 to 622, and the issue timing detection unit 606 detects the issue timing of AL3 based on those values.

FIG. 13 is a diagram illustrating issue timing detection. FIG. 13A is the same sequence diagram as FIG. 10B. It shows the state in which the bank access order of load/store instruction LS2 has been rearranged according to the bank access order of load/store instruction LS1, and the register access order of arithmetic instruction AL3 has in turn been rearranged according to the register access order of load/store instruction LS2.

FIG. 13B shows an example of the register management flags when the above rearrangement is performed. It shows register management flag 618 for slot 1041, which executes load/store instruction LS2. Register management flag 618 is further divided, in order of the physical vector register numbers of vector register 110, into register management flags 618-1, 618-2, 618-3, and 618-4. Flag 618-1 corresponds to processing of physical vector registers vr[64]-[71], flag 618-2 to vr[72]-[79], flag 618-3 to vr[80]-[87], and flag 618-4 to vr[88]-[95]. Flags 618-1 to 618-4 are initialized to “OFF”, and each is set to “ON” when the writing of data elements to its corresponding physical vector registers is completed.

As shown in FIG. 13A, in load/store instruction LS2 the bank access that reads the data for physical vector registers vr[88]-[95] is executed in cycle C2. The “WB” stage is then executed two cycles later, in cycle C4, and the data is written into vector register 110, so register management flag 618-4 turns “ON” in cycle C4. Likewise, the bank access for vr[64]-[71] is executed in cycle C3 and the data is written into vector register 110 two cycles later, in cycle C5, so flag 618-1 turns “ON” in cycle C5. The bank access for vr[72]-[79] is executed in cycle C4 and the data is written in cycle C6, so flag 618-2 turns “ON” in cycle C6. Finally, the bank access for vr[80]-[87] is executed in cycle C5 and the data is written in cycle C7, so flag 618-3 turns “ON” in cycle C7.
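The flag timing above follows a fixed rule: each flag turns “ON” two cycles after the bank access for its register block. The following Python sketch reproduces that timeline for the rearranged LS2 schedule of FIG. 13A. The two-cycle latency and the per-block schedule are taken from the text; the names `WB_LATENCY`, `LS2_ACCESSES`, and `flag_on_cycles` are hypothetical and are not meant to describe the actual hardware.

```python
# Hypothetical model of register management flags 618-1 to 618-4 (FIG. 13B).
# Assumption from the text: the "WB" write into vector register 110 completes
# two cycles after the bank access that fetched the data.
WB_LATENCY = 2

# Rearranged LS2 schedule: (bank-access cycle, flag, physical register block)
LS2_ACCESSES = [
    (2, "618-4", "vr[88]-[95]"),
    (3, "618-1", "vr[64]-[71]"),
    (4, "618-2", "vr[72]-[79]"),
    (5, "618-3", "vr[80]-[87]"),
]


def flag_on_cycles(accesses, latency=WB_LATENCY):
    """Map each flag to the cycle in which it is switched from OFF to ON."""
    return {flag: cycle + latency for cycle, flag, _block in accesses}


if __name__ == "__main__":
    on = flag_on_cycles(LS2_ACCESSES)
    for cycle, flag, block in LS2_ACCESSES:
        print(f"{block}: bank access in C{cycle}, flag {flag} ON in C{on[flag]}")
```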

The register management unit 604 transfers the values of register management flags 616 to 622 to the issue timing detection unit 606, which detects the issue timing based on the execution states of load/store instructions LS1 and LS2 indicated by those values. For example, as shown in FIG. 13B, when register management flag 618-4 turns “ON” in cycle C4, arithmetic instruction AL3 becomes executable from the next cycle. The issue timing detection unit 606 therefore detects the issue timing in cycle C4 and sends the decoder 108 a control signal instructing it to issue arithmetic instruction AL3. In response, the decoder 108 issues arithmetic instruction AL3 to slot 1042 so that it executes from cycle C5.
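The issue-timing decision can be phrased as a small calculation: a consumer that reads one register block per cycle may start only when every block is read at least one cycle after its flag turns “ON”. The sketch below assumes, as FIG. 13 suggests, that the rearranged AL3 reads the blocks in the same order in which LS2 writes them (vr[88]-[95] first). The function name `earliest_issue_cycle` is hypothetical.

```python
# Cycle in which each register management flag of LS2 turns ON (FIG. 13B).
FLAG_ON = {"618-4": 4, "618-1": 5, "618-2": 6, "618-3": 7}


def earliest_issue_cycle(read_order, flag_on):
    """Earliest cycle in which a consumer reading one block per cycle, in
    read_order, can start without reading a block before its flag is ON.

    Block i of read_order is read in cycle start + i, and a block may be read
    from the cycle after its flag turns ON, so start + i >= flag_on + 1.
    """
    return max(flag_on[flag] + 1 - i for i, flag in enumerate(read_order))


# Assumed rearranged order of AL3: the order in which LS2 writes the blocks.
AL3_READ_ORDER = ["618-4", "618-1", "618-2", "618-3"]

if __name__ == "__main__":
    print(f"AL3 can start in C{earliest_issue_cycle(AL3_READ_ORDER, FLAG_ON)}")
```

Under this model the sketch reports C5, matching the text; reading the same blocks in ascending register order would give C6.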

Next, an example in which an arithmetic instruction depends on a plurality of load/store instructions is described.

FIG. 14 shows a sequence in which an arithmetic instruction depends on two load/store instructions. FIG. 14A shows a sequence in which load/store instructions LS1, LS2, and LS3 are executed by slots 1040, 1041, and 1042, and slot 1043, which executes arithmetic instruction AL4, reads the data written into vector register 110 by slots 1041 and 1042. Here, load/store instructions LS2 and LS3 are stalled to avoid bank conflicts among slots 1040 to 1042, which execute load/store instructions LS1 to LS3.

In load/store instruction LS1, data memory 106 is accessed in bank access order BA1 and the data is written into vector register 110 in register access order RA1. In load/store instruction LS2, data memory 106 is accessed in bank access order BA2 and the data is written into vector register 110 in register access order RA1. In load/store instruction LS3, data memory 106 is accessed in bank access order BA3 and the data is written into vector register 110 in register access order RA1. Load/store instruction LS2 is stalled for one cycle, and load/store instruction LS3 is stalled for two cycles.
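The stalls in FIG. 14A can be reproduced with a simple reservation table. This Python sketch assumes the bank orders BA1 = BK0, BK1, BK2, BK3; BA2 = BK1, BK2, BK3, BK0; and BA3 = BK2, BK3, BK0, BK1. These orders are not stated in this section; they are inferred because they agree with the cycle numbers given for FIGS. 13 and 14. It further assumes one bank access per cycle, a nominal issue one cycle after the previous instruction, and stalls counted from that nominal cycle. The helper names are hypothetical.

```python
# Assumed bank access orders (inferred, see lead-in).
BA1 = ["BK0", "BK1", "BK2", "BK3"]
BA2 = ["BK1", "BK2", "BK3", "BK0"]
BA3 = ["BK2", "BK3", "BK0", "BK1"]


def conflicts(start, order, busy):
    """True if accessing `order` from cycle `start` hits a reserved bank."""
    return any((start + i, bank) in busy for i, bank in enumerate(order))


def schedule_with_stalls(orders, first_cycle=1):
    """Stall-based conflict avoidance: delay each instruction, keeping issue
    order, until none of its bank accesses collide with earlier ones.
    Returns a list of (start_cycle, stall_cycles)."""
    busy = set()  # reserved (cycle, bank) pairs
    result = []
    prev_start = first_cycle - 1
    for k, order in enumerate(orders):
        nominal = first_cycle + k  # one cycle after the previous instruction
        start = nominal
        while start <= prev_start or conflicts(start, order, busy):
            start += 1
        busy.update((start + i, bank) for i, bank in enumerate(order))
        result.append((start, start - nominal))
        prev_start = start
    return result


if __name__ == "__main__":
    names = ["LS1", "LS2", "LS3"]
    for name, (start, stall) in zip(names, schedule_with_stalls([BA1, BA2, BA3])):
        print(f"{name}: starts in C{start}, stalled {stall} cycle(s)")
```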

Arithmetic instruction AL4 is “vaddh vr0, vr1, vr2”, which adds the data elements written in logical vector registers vr0 and vr1 and writes the result to vr2. Accordingly, slot 1043 accesses vector register 110 in register access order RA1, reads the data, and performs the operation. For example, in the first cycle, physical vector registers vr[0] and vr[64] are accessed. The data for vr[0] is accessed by load/store instruction LS3 in cycle C5, so its write completes two cycles later, in cycle C7. The data for vr[64], on the other hand, is accessed by load/store instruction LS2 in cycle C3, so its write completes two cycles later, in cycle C5. In this case, arithmetic instruction AL4 therefore starts in cycle C8, the cycle after the later of the two completion cycles, C7.
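The start cycle C8 follows from the write-completion cycles. The sketch below assumes, consistent with the text, that LS2 starts in C3 and LS3 in C5 after their stalls, that each writes one eight-register block per cycle in register access order RA1 (taken here to mean lowest block first, an assumption), and that each write completes two cycles after its bank access. The names are hypothetical.

```python
WB_LATENCY = 2  # cycles from bank access to completed register write


def write_done(start_cycle, blocks=4, latency=WB_LATENCY):
    """Write-completion cycle of each block when one block is accessed per
    cycle, in ascending block order, starting at start_cycle."""
    return [start_cycle + i + latency for i in range(blocks)]


def earliest_start(order, ready):
    """Earliest start for a reader that reads pair order[i] in cycle
    start + i; a pair can be read from the cycle after it is ready."""
    return max(ready[p] + 1 - i for i, p in enumerate(order))


ls3_done = write_done(5)  # vr0 operand: blocks vr[0]-[7] ... vr[24]-[31]
ls2_done = write_done(3)  # vr1 operand: blocks vr[64]-[71] ... vr[88]-[95]
# AL4 reads block k of both operands in the same cycle, so pair k is ready
# only when both of its writes have completed.
pair_ready = [max(a, b) for a, b in zip(ls3_done, ls2_done)]

if __name__ == "__main__":
    print("pair ready cycles:", pair_ready)
    print(f"AL4 in RA1 order starts in C{earliest_start([0, 1, 2, 3], pair_ready)}")
```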

Next, the sequence according to the present embodiment is shown in FIG. 14B, in which load/store instructions LS2 and LS3 have each been rearranged into a conflict-avoidance bank access order. For LS2, the conflict-avoidance bank access order is obtained by moving bank BK0, which is accessed last in the original bank access order BA2, to the front; this yields bank access order BA1. For LS3, bank BK1, which is accessed last in the original bank access order BA3, is first moved to the front. This still causes a bank conflict with the rearranged load/store instruction LS2, so the same rearrangement is repeated, finally yielding the conflict-avoidance bank access order BA1. In this way, the conflict-avoidance bank access order is obtained by repeatedly moving the bank accessed last to the front.
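The repeated “move the last bank to the front” rule (also stated in Appendix 7) can be sketched in Python as follows. The sketch reuses the inferred bank orders BA1 = BK0, BK1, BK2, BK3; BA2 = BK1, BK2, BK3, BK0; and BA3 = BK2, BK3, BK0, BK1, which are assumptions consistent with the cycle numbers in FIGS. 13 and 14, and lets each instruction start one cycle after the previous one without stalling. The function names are hypothetical.

```python
# Assumed bank access orders (inferred, see lead-in).
BA1 = ["BK0", "BK1", "BK2", "BK3"]
BA2 = ["BK1", "BK2", "BK3", "BK0"]
BA3 = ["BK2", "BK3", "BK0", "BK1"]


def rotate_last_to_front(order):
    """Move the bank accessed last to the front of the access order."""
    return order[-1:] + order[:-1]


def conflict_free_orders(orders, first_cycle=1):
    """Start instruction k in cycle first_cycle + k and rotate its bank order
    until no (cycle, bank) pair collides with an earlier instruction."""
    busy = set()  # reserved (cycle, bank) pairs
    result = []
    for k, order in enumerate(orders):
        start = first_cycle + k
        current = list(order)
        for _ in range(len(order)):  # try each rotation at most once
            if not any((start + i, bank) in busy
                       for i, bank in enumerate(current)):
                break
            current = rotate_last_to_front(current)
        else:
            raise RuntimeError("no conflict-free rotation exists")
        busy.update((start + i, bank) for i, bank in enumerate(current))
        result.append(current)
    return result


if __name__ == "__main__":
    for name, order in zip(["LS1", "LS2", "LS3"],
                           conflict_free_orders([BA1, BA2, BA3])):
        print(name, order)
```

Unlike the stall-based schedule, every instruction keeps its one-cycle start offset; the price is that the registers are written in a different order, which is why the register access order of the dependent arithmetic instruction is rearranged as well.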

Next, the rearrangement of the register access order is described. Arithmetic instruction AL4 depends on two load/store instructions. When an arithmetic instruction depends on a plurality of load/store instructions, the rearrangement control unit 1144 operates differently from what was described with reference to FIGS. 10 and 11: the execution states of the load/store instructions it depends on are managed by register management flags 616 to 622. A specific example is shown in FIG. 14C.

FIG. 14C shows register management flags 618-0 to 618-3 for slot 1041, which executes load/store instruction LS2, and register management flags 620-0 to 620-3 for slot 1042, which executes load/store instruction LS3. Flags 618-0, 618-1, 618-2, and 618-3 correspond to processing of physical vector registers vr[64]-[71], vr[72]-[79], vr[80]-[87], and vr[88]-[95], respectively. Flags 620-0, 620-1, 620-2, and 620-3 correspond to processing of vr[0]-[7], vr[8]-[15], vr[16]-[23], and vr[24]-[31], respectively. The register management unit 604 writes “ON” to each of flags 618-0 to 618-3 and 620-0 to 620-3 in the cycle in which the writing of data elements to the corresponding physical vector registers is completed. For example, flag 618-3 turns “ON” in cycle C4, flag 618-0 in cycle C5, flag 618-1 in cycle C6, and flag 618-2 in cycle C7. Likewise, flag 620-2 turns “ON” in cycle C5, flag 620-3 in cycle C6, flag 620-0 in cycle C7, and flag 620-1 in cycle C8.

In each cycle of arithmetic instruction AL4, physical vector registers whose numbers are 64 apart are accessed: vr[0] and vr[64], vr[8] and vr[72], vr[16] and vr[80], and so on. Each such pair corresponds to one of the register management flag pairs 618-0/620-0, 618-1/620-1, 618-2/620-2, and 618-3/620-3. The register management unit 604 therefore determines the register access order of arithmetic instruction AL4 once both flags of any of these pairs are “ON”. For example, both flags of pair 618-3/620-3 are “ON” in cycle C6, those of pairs 618-0/620-0 and 618-2/620-2 in cycle C7, and those of pair 618-1/620-1 in cycle C8. In cycle C6, the earliest of these, the register management unit 604 rearranges the register access order so that access starts with the physical vector registers corresponding to pair 618-3/620-3. The rearranged arithmetic instruction AL4 then starts from the next cycle, C7. This state is shown in FIG. 14B.
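The pair-based decision can be sketched as follows, using the flag “ON” cycles listed for FIG. 14C. The text specifies only that access starts with the first pair whose two flags are both “ON”; ordering the remaining pairs by readiness, with ties kept in index order, is an assumption of this sketch, as are the function names.

```python
# Cycle in which each flag turns ON (FIG. 14C).
FLAG_ON = {
    "618-0": 5, "618-1": 6, "618-2": 7, "618-3": 4,  # LS2: vr[64]-[95]
    "620-0": 7, "620-1": 8, "620-2": 5, "620-3": 6,  # LS3: vr[0]-[31]
}
# Pair k is read in one AL4 cycle: vr[8k]-[8k+7] and vr[64+8k]-[64+8k+7].
PAIRS = [("618-0", "620-0"), ("618-1", "620-1"),
         ("618-2", "620-2"), ("618-3", "620-3")]


def pair_ready(flag_on, pairs):
    """Cycle in which both flags of each pair are ON."""
    return [max(flag_on[a], flag_on[b]) for a, b in pairs]


def reorder(ready):
    """Earliest-ready pair first; sorted() is stable, so ties keep index order."""
    return sorted(range(len(ready)), key=lambda k: ready[k])


def earliest_start(order, ready):
    """Reading pair order[i] in cycle start + i requires start + i > ready."""
    return max(ready[p] + 1 - i for i, p in enumerate(order))


if __name__ == "__main__":
    ready = pair_ready(FLAG_ON, PAIRS)
    order = reorder(ready)
    print("ready:", ready, "order:", order)
    print(f"rearranged AL4 starts in C{earliest_start(order, ready)}")
```

With the same flags, reading the pairs in ascending index order would start in C8 under this model, one cycle later than the rearranged order.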

The register management unit 604 outputs a signal instructing the rearrangement address generation unit 1146 to generate the addresses of vector register 110 corresponding to the rearranged register access order. The rearrangement address generation unit 1146 then generates the addresses to be accessed and sends them to slot 1043, which executes arithmetic instruction AL4. The register management unit 604 also sends a signal instructing the issue of arithmetic instruction AL4 to the issue timing detection unit 606, which forwards it to the decoder 108. The decoder 108 decodes arithmetic instruction AL4 and sends it to slot 1043, so that slot 1043 executes arithmetic instruction AL4 at the timing shown in FIG. 14B.

FIG. 15 is a flowchart showing the procedure for rearranging the register access order.

The procedure shown in FIG. 15 is executed, for example, each time the instructions for one instruction cycle are fetched. First, the decoder 108 decodes an arithmetic instruction (S1500). The dependency detection unit 1142 then detects whether the instruction depends on a preceding load/store instruction (S1502).

If no dependency is detected (No in S1504), the arithmetic instruction is executed in vector pipeline 104 (S1520). If a dependency is detected (Yes in S1504) and the instruction depends on a single preceding instruction (Yes in S1506), the access order control unit 602 refers to the bank access order of the preceding load/store instruction (S1508) and controls the register access order of the arithmetic instruction (S1510). For example, the access order control unit 602 rearranges the register access order, and the register management unit 604 writes the execution state to register management flags 616 to 622. The arithmetic instruction is then executed (S1520).

If a dependency is detected (Yes in S1504) and the instruction depends on two preceding instructions (No in S1506, Yes in S1512), the register management unit 604 uses register management flags 616 to 622 to monitor which registers have finished processing (S1514) and determines the register access order (S1516). The register management unit 604 then records the completion status of the register accesses in register management flags 616 to 622 (S1518), and the arithmetic instruction is executed (S1520).
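The branching of FIG. 15 can be summarized as a small dispatch function. This Python sketch only records which step numbers are visited; the case of more than two dependencies is not described in the text, so the sketch rejects it rather than guessing. The function name `fig15_steps` is hypothetical.

```python
def fig15_steps(num_dependencies):
    """Return the FIG. 15 steps visited for an arithmetic instruction that
    depends on `num_dependencies` preceding load/store instructions."""
    steps = ["S1500", "S1502", "S1504"]  # decode, detect dependency, test
    if num_dependencies == 0:
        return steps + ["S1520"]  # execute without rearrangement
    steps.append("S1506")  # exactly one preceding instruction?
    if num_dependencies == 1:
        # refer to bank access order, control register access order, execute
        return steps + ["S1508", "S1510", "S1520"]
    steps.append("S1512")  # exactly two preceding instructions?
    if num_dependencies == 2:
        # monitor flags, decide order, record completion, execute
        return steps + ["S1514", "S1516", "S1518", "S1520"]
    raise NotImplementedError("more than two dependencies is not described")
```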

In this way, even when an arithmetic instruction depends on one or two load/store instructions, a loss of processing efficiency can be avoided without the processing breaking down.

The above embodiments are summarized in the following appendices.

(Appendix 1)
A processor comprising:
a first processing unit that accesses a plurality of banks of a memory in a first bank access order;
a second processing unit that, following the start of access by the first processing unit, starts accessing the plurality of banks in a second bank access order; and
a control unit that, when accesses to the plurality of banks by the first processing unit and the second processing unit conflict, rearranges the second bank access order into a third bank access order in which the conflict does not occur and causes the second processing unit to access the plurality of banks in the third bank access order.

(Appendix 2)
The processor according to Appendix 1, wherein the control unit sets the timing at which the second processing unit starts accessing the plurality of banks in the third bank access order to one cycle after the timing at which the first processing unit starts accessing the plurality of banks.

(Appendix 3)
The processor according to Appendix 1 or 2, wherein
the second processing unit writes data, read by accessing the memory in the third bank access order, to a plurality of registers by accessing them in a first register access order,
the processor further comprises a third processing unit that accesses the plurality of registers in a second register access order and reads the written data, and
the control unit controls the second register access order so that the registers are accessed in order, starting from those for which writing in the first register access order has been completed.

(Appendix 4)
The processor according to Appendix 3, wherein the control unit rearranges the second register access order based on the second bank access order and the third bank access order.

(Appendix 5)
The processor according to Appendix 1 or 2, wherein
the first processing unit writes data, read by accessing the memory in the first bank access order, to a plurality of registers by accessing them in a first register access order,
the second processing unit writes data, read by accessing the memory in the third bank access order, to the plurality of registers by accessing them in a second register access order,
the processor further comprises a third processing unit that accesses the plurality of registers in a third register access order and reads the data written by the first and second processing units, and
the control unit controls the third register access order so that the registers are accessed in order, starting from those in which writing by the first and second processing units has been completed.

(Appendix 6)
The processor according to Appendix 5, wherein the control unit records the accesses to the plurality of registers by each of the first and second processing units and determines the third register access order based on the records.

(Appendix 7)
The processor according to any one of Appendices 1 to 6, wherein, in the third bank access order, the bank accessed last in the second bank access order is accessed before the other banks.

(Appendix 8)
A processor comprising:
a first processing unit that accesses a plurality of registers in a first register access order and writes data;
a second processing unit that accesses the plurality of registers in a second register access order and reads the written data; and
a control unit that rearranges the second register access order into a third register access order in which the registers are accessed in order, starting from those in which the first processing unit has finished writing the data to be read, and causes the second processing unit to access the plurality of registers in the third register access order.

(Appendix 9)
A processor comprising:
a first processing unit that accesses a plurality of registers in a first register access order and writes data;
a second processing unit that accesses the plurality of registers in a second register access order and writes data;
a third processing unit that accesses the plurality of registers in a third register access order and reads the data written by the first and second processing units; and
a control unit that controls the third register access order so that the registers are accessed in order, starting from those in which the first and second processing units have finished writing the data to be read.

(Appendix 10)
The processor according to Appendix 9, wherein the control unit records the accesses to the plurality of registers by each of the first and second processing units and determines the third register access order based on the records.

(Appendix 11)
A method for controlling a processor having a first processing unit that accesses a plurality of banks of a memory in a first bank access order and a second processing unit that, following the start of access by the first processing unit, starts accessing the plurality of banks in a second bank access order, the method comprising:
when accesses to the plurality of banks by the first processing unit and the second processing unit conflict, rearranging the second bank access order into a third bank access order in which the conflict does not occur and causing the second processing unit to access the plurality of banks in the third bank access order.

(Appendix 12)
The method for controlling a processor according to Appendix 11, wherein
the second processing unit writes data, read by accessing the memory in the third bank access order, to a plurality of registers by accessing them in a first register access order,
the processor further has a third processing unit that accesses the plurality of registers in a second register access order and reads the written data, and
the method comprises controlling the second register access order so that the registers are accessed in order, starting from those for which writing in the first register access order has been completed.

(Appendix 13)
The method for controlling a processor according to Appendix 11, wherein
the first processing unit writes data, read by accessing the memory in the first bank access order, to a plurality of registers by accessing them in a first register access order,
the second processing unit writes data, read by accessing the memory in the third bank access order, to the plurality of registers by accessing them in a second register access order,
the processor further has a third processing unit that accesses the plurality of registers in a third register access order and reads the data written by the first and second processing units, and
the method comprises controlling the third register access order so that the registers are accessed in order, starting from those in which writing by the first and second processing units has been completed.

(Appendix 14)
A method for controlling a processor having a first processing unit that accesses a plurality of registers in a first register access order and writes data, and a second processing unit that accesses the plurality of registers in a second register access order and reads the written data, the method comprising:
rearranging the second register access order into a third register access order in which the registers are accessed in order, starting from those in which the first processing unit has finished writing the data to be read, and causing the second processing unit to access the plurality of registers in the third register access order.

(Appendix 15)
A method for controlling a processor having a first processing unit that accesses a plurality of registers in a first register access order and writes data, a second processing unit that accesses the plurality of registers in a second register access order and writes data, and a third processing unit that accesses the plurality of registers in a third register access order and reads the data written by the first and second processing units, the method comprising:
rearranging the third register access order into a fourth register access order in which the registers are accessed in order, starting from those in which the first and second processing units have finished writing the data to be read, and causing the third processing unit to access the plurality of registers in the fourth register access order.

100: Vector processor, 106: Data memory, BK0 to BK3: Banks,
110: Vector register, 114: Control unit, 1040 to 1043: Slots,
1142: Dependency detection unit, 1144: Rearrangement control unit

Claims

A processor comprising:
a first processing unit that accesses a plurality of banks of a memory in a first bank access order;
a second processing unit that, following the start of access by the first processing unit, starts accessing the plurality of banks in a second bank access order; and
a control unit that, when accesses to the plurality of banks by the first processing unit and the second processing unit conflict, rearranges the second bank access order into a third bank access order in which the conflict does not occur and causes the second processing unit to access the plurality of banks in the third bank access order.

The processor according to claim 1, wherein the control unit sets the timing at which the second processing unit starts accessing the plurality of banks in the third bank access order to one cycle after the timing at which the first processing unit starts accessing the plurality of banks.

The processor according to claim 1 or 2, wherein
the second processing unit writes data, read by accessing the memory in the third bank access order, to a plurality of registers by accessing them in a first register access order,
the processor further comprises a third processing unit that accesses the plurality of registers in a second register access order and reads the written data, and
the control unit controls the second register access order so that the registers are accessed in order, starting from those for which writing in the first register access order has been completed.

The processor according to claim 3, wherein the control unit rearranges the second register access order based on the second bank access order and the third bank access order.

The processor according to claim 1 or 2, wherein
the first processing unit writes data, read by accessing the memory in the first bank access order, to a plurality of registers by accessing the registers in a first register access order;
the second processing unit writes data, read by accessing the memory in the third bank access order, to the plurality of registers by accessing the registers in a second register access order;
the processor further comprises a third processing unit that reads the data written by the first and second processing units by accessing the plurality of registers in a third register access order; and
the third register access order is controlled so that the registers are accessed in order, starting from those for which writing by the first and second processing units has been completed.
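
Claims 3 and 4 tie the reader's register order to the bank reordering. The following is a minimal Python sketch, not the patented logic: it assumes that element k of the loaded vector comes from bank `second_banks[k]` and is written to register k, and that the reading unit issues at most one read per cycle and may read a register in the cycle after it is written. The function names are hypothetical.

```python
from typing import Dict, List


def register_write_order(second_banks: List[int], third_banks: List[int]) -> List[int]:
    """Registers in the order the loading unit writes them.

    Assumes element k is fetched from bank second_banks[k] and stored in
    register k, so reordering the bank accesses reorders the register writes."""
    reg_of_bank = {bank: k for k, bank in enumerate(second_banks)}
    return [reg_of_bank[bank] for bank in third_banks]


def read_schedule(read_order: List[int], write_cycle: Dict[int, int]) -> List[int]:
    """Cycle of each read, given one read per cycle and a one-cycle
    write-to-read delay (both assumptions)."""
    cycles: List[int] = []
    t = 0
    for reg in read_order:
        t = max(t, write_cycle[reg] + 1)  # wait until the register is written
        cycles.append(t)
        t += 1
    return cycles


if __name__ == "__main__":
    writes = register_write_order([1, 2, 3, 0], [2, 1, 0, 3])
    print(writes)  # [1, 0, 3, 2]
    # The loading unit starts one cycle after the other unit and writes one register per cycle.
    write_cycle = {reg: 1 + i for i, reg in enumerate(writes)}
    print(read_schedule([0, 1, 2, 3], write_cycle))  # [3, 4, 5, 6]
    print(read_schedule(writes, write_cycle))        # [2, 3, 4, 5]
```

With a second bank order of `[1, 2, 3, 0]` rearranged to `[2, 1, 0, 3]`, reading registers in the rearranged order `[1, 0, 3, 2]` finishes one cycle earlier than reading them in index order, because the reader never waits on a register that is written later than its successors.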

A processor comprising:
a first processing unit that writes data by accessing a plurality of registers in a first register access order;
a second processing unit that reads the written data by accessing the plurality of registers in a second register access order; and
a control unit that rearranges the second register access order into a third register access order in which the registers holding the data to be read are accessed in order, starting from those for which the first processing unit has completed writing that data, and causes the second processing unit to access the plurality of registers in the third register access order.

A processor comprising:
a first processing unit that writes data by accessing a plurality of registers in a first register access order;
a second processing unit that writes data by accessing the plurality of registers in a second register access order;
a third processing unit that reads the data written by the first and second processing units by accessing the plurality of registers in a third register access order; and
a control unit that controls the third register access order so that the registers holding the data to be read are accessed in order, starting from those for which writing by the first and second processing units has been completed.
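
Claims 6 and 7 (and the two-writer case of claim 5) state the general rule: the reading unit visits registers in the order their writes complete. The following is a Python sketch under stated assumptions, not the patented implementation: each writer is described by a hypothetical `(start_cycle, register_order)` pair and writes one register per cycle, a register written by both writers counts as complete only after the later write, and ties are broken by register number (the claims do not specify a tie-break).

```python
from typing import Dict, List, Sequence, Tuple

Writer = Tuple[int, List[int]]  # (start_cycle, register_order); hypothetical encoding


def completion_cycles(writers: Sequence[Writer]) -> Dict[int, int]:
    """Map each register to the cycle in which its write completes.

    Each writer writes one register per cycle starting at start_cycle.
    If two writers touch the same register, the later write counts."""
    done: Dict[int, int] = {}
    for start, order in writers:
        for i, reg in enumerate(order):
            done[reg] = max(done.get(reg, -1), start + i)
    return done


def third_register_order(writers: Sequence[Writer]) -> List[int]:
    """Read order: earliest-completed register first, ties broken by register number."""
    done = completion_cycles(writers)
    return sorted(done, key=lambda reg: (done[reg], reg))


if __name__ == "__main__":
    # One writer (claim 6): the reader simply follows the write order.
    print(third_register_order([(0, [2, 0, 3, 1])]))  # [2, 0, 3, 1]
    # Two writers (claim 7): the second starts one cycle later on other registers.
    print(third_register_order([(0, [0, 1, 2, 3]), (1, [6, 4, 7, 5])]))
    # [0, 1, 6, 2, 4, 3, 7, 5]
```

Sorting by completion cycle is the simplest way to express "access in order from the register whose writing has completed"; hardware would more plausibly track per-register ready flags, but the resulting order is the same under these assumptions.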

A method of controlling a processor that includes a first processing unit that accesses a plurality of banks of a memory in a first bank access order, and a second processing unit that starts accessing the plurality of banks in a second bank access order following the start of access by the first processing unit, the method comprising:
when accesses to the plurality of banks by the first processing unit and the second processing unit conflict, rearranging the second bank access order into a third bank access order that does not cause the conflict, and causing the second processing unit to access the plurality of banks in the third bank access order.

The method of controlling a processor according to claim 8, wherein
the second processing unit writes data, read by accessing the memory in the third bank access order, to a plurality of registers by accessing the registers in a first register access order;
the processor further includes a third processing unit that reads the written data by accessing the plurality of registers in a second register access order; and
the method further comprises controlling the second register access order so that the registers are accessed in order, starting from those for which writing in the first register access order has been completed.

The method of controlling a processor according to claim 8, wherein
the first processing unit writes data, read by accessing the memory in the first bank access order, to a plurality of registers by accessing the registers in a first register access order;
the second processing unit writes data, read by accessing the memory in the third bank access order, to the plurality of registers by accessing the registers in a second register access order;
the processor further includes a third processing unit that reads the data written by the first and second processing units by accessing the plurality of registers in a third register access order; and
the method further comprises controlling the third register access order so that the registers are accessed in order, starting from those for which writing by the first and second processing units has been completed.