JPS58149566A

JPS58149566A - Processing system of optimization for parallel execution of vector operation

Info

Publication number: JPS58149566A
Application number: JP3119482A
Authority: JP
Inventors: Hideo Takashima; 高嶋　秀夫; Morie Sagawa; 佐川　守江; Shinya Miura; 信也三浦; Kazuhiko Suzuki; 一彦鈴木; Hideo Wada; 和田　英穂
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1982-02-27
Filing date: 1982-02-27
Publication date: 1983-09-05
Also published as: JPS6319905B2

Abstract

(57) [Abstract] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

(A) Technical Field of the Invention

The present invention relates to a parallel-execution optimization processing method for vector operations. In particular, it relates to a method used in a compiler that generates an object program from a given source program and supplies it to a vector processor equipped with a plurality of parallel operation units, in which the parallel operation units are allocated so that idle states arise in each of them as rarely as possible.

(B) Technical Background and Problems

For example, as shown in FIG. 1(A), there are vector processors that execute vector instructions which add the elements a1, a2, ... of a vector A to the corresponding elements b1, b2, ... of a vector B, element by element, to produce a vector C with elements c1, c2, .... In the case shown in FIG. 1(A), mask elements m1, m2, ... indicate whether each pair of corresponding elements is to be added, and the processing shown in generalized form in FIG. 1(B) is performed.
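The masked element-wise addition described above can be illustrated with a minimal Python sketch. The function name `masked_vector_add` is hypothetical, and the behavior for masked-off elements (the previous value of C is kept) is an assumption; the source does not state what happens to those elements.

```python
def masked_vector_add(a, b, mask, c):
    """Return a new vector with c[i] = a[i] + b[i] wherever mask[i] is set.

    Assumption (not stated in the source): elements whose mask bit is
    clear keep their previous value from c.
    """
    if not (len(a) == len(b) == len(mask) == len(c)):
        raise ValueError("all vectors must have the same length")
    return [x + y if m else old for x, y, m, old in zip(a, b, mask, c)]


# Elements 0 and 2 are added; element 1 is masked off and keeps its old value.
result = masked_vector_add([1, 2, 3], [10, 20, 30], [1, 0, 1], [0, 0, 0])
```

On the hardware described here the same work would be done by one vector instruction using a mask register, rather than by a scalar loop.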

A data processing system having a vector processor that performs such processing has, in one embodiment, the system configuration shown in FIG. 2.

In the figure, reference numeral 1 denotes a main storage device, 2 a memory control unit, 3 a vector processor, 4 a channel processor, 5 a large storage device, 6 a scalar processing circuit section, and 7 a vector processing circuit section. 8-0, 8-1, ... are floating-point data registers; 9-0, 9-1, ... are vector registers, each capable of storing a plurality of data items (element data); 10-0, 10-1, ... are mask registers, each capable of storing a plurality of mask data items (mask element data); and 11 is a vector length register in which the number of elements stored in each vector register is set. 12-0 and 12-1 are memory access pipelines, 13 is an addition/subtraction pipeline, 14 is a multiplication pipeline, 15 is a division pipeline, and 16 is a mask processing pipeline.

Before such a vector processor executes processing, the given source program is compiled into a form suitable for execution by that processor, and an object program is generated. The configuration of the compiler that performs this compilation is described later with reference to FIG. 3. In compiling, it is desirable to generate code such that, when the vector processor actually executes the processing, the pipeline operation units are idle as little as possible.

(C) Object and Configuration of the Invention

The present invention aims to solve the above problem. It is characterized in that (i) the manner in which each parallel operation unit is used is examined on the basis of the given intermediate code, (ii) unique weight information belonging to each parallel operation unit and path weight information related to the manner in which that unit becomes available are extracted, and (iii) the use of the parallel operation units is allocated sequentially on the basis of the path weight information, so that an optimized object program is generated. The invention is described below with reference to the drawings.

(D) Embodiments of the Invention

FIG. 3 shows the configuration of one embodiment of a compiler used in the present invention. FIG. 4 is an explanatory diagram showing how a source program is converted into intermediate code in the present invention. FIG. 5 is an explanatory diagram showing how a source program is vectorized. FIGS. 6 to 10 are explanatory diagrams illustrating the processing according to the present invention. FIG. 11 is a flowchart of one embodiment of the part of the intermediate code optimization unit that is directly related to the present invention. FIG. 12 is an explanatory diagram relating to processing in a pipeline operation unit.

In FIG. 3, 17 is a source program stored in the large storage device, 18 is a compiler, 19 is an object program that is compiled and stored in the large storage device, 20 is a source interpretation unit, 21 is a storage allocation unit, 22 is a vectorization unit, 23 is an intermediate code optimization unit, 24 is a register use determination unit, and 25 is an object program output unit.

The compiler 18 reads the source program 17 from the large storage device and generates the desired object program 19. In doing so, the units shown in the figure perform the following processing.

The source interpretation unit 20 reads the source program 17 from the large storage device, interprets its statements, and expands them into intermediate code (text). For example, a source program such as that on the left of FIG. 4 is expanded into intermediate code such as that on the right of the figure. The storage allocation unit 21 assigns storage addresses to the various data appearing in the program. The vectorization unit 22 examines the loop structures in the program, recognizes the parts that can be executed in parallel, and modifies the intermediate code as shown in FIG. 5. The intermediate code optimization unit 23 performs optimization at the intermediate code level so that a vector processor such as that of FIG. 2 is used effectively. The register use determination unit 24 assigns resources (registers) of the vector processor to the data appearing in the intermediate code. Finally, the object program output unit 25 outputs the machine instructions to the large storage device and performs optimization at the instruction level.

A compiler for running the vector processor has the configuration shown in FIG. 3 and compiles the source program accordingly. In the course of this, as described below, the intermediate code optimization unit reorders the given intermediate code (text) so that idle states rarely arise in the pipeline operation units when the vector processor actually executes the processing.

Suppose now that the source program shown in FIG. 6(A) is given and that the intermediate code shown in FIG. 6(B) is supplied to the intermediate code optimization unit 23 of FIG. 3. If the pipeline operation units 12 to 15 are allocated directly on the basis of the intermediate code of FIG. 6(B), the result is as shown in FIG. 8, and undesired idle states occur in the use of the pipeline operation units. To solve this, when intermediate code such as that of FIG. 6(B) is given, a "tree" as shown in FIG. 7 is built from that intermediate code.

Since loads are performed for the codes "Vt1-..." and "Vt2-...", nodes drawn as circles (each corresponding to processing by one of the pipeline operation units) are prepared for these loads in FIG. 7. Since an addition is performed for the code "Vt3-...", a node is prepared for the addition. Proceeding in the same way yields the tree shown in FIG. 7.

Needless to say, the tree is built by expanding it in a table.
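As a rough illustration of how such a tree might be held in a table, the following Python sketch builds a node table and a consumer table from a list of intermediate-code entries. The tuple format `(result, operation, operands)` and the names `CODE` and `build_graph` are assumptions made for this example; the source does not specify the table layout. The four entries model only the statement X(I) = A(I) + B(I).

```python
from collections import defaultdict

# Hypothetical intermediate code for X(I) = A(I) + B(I):
# each entry is (result, operation, operands).
CODE = [
    ("Vt1", "load", ["A"]),
    ("Vt2", "load", ["B"]),
    ("Vt3", "add", ["Vt1", "Vt2"]),
    ("X", "store", ["Vt3"]),
]


def build_graph(code):
    """Return (ops, consumers).

    ops maps each node to its operation; consumers maps each node to the
    nodes that use its result (the edges of the tree).
    """
    ops = {dest: op for dest, op, _ in code}
    consumers = defaultdict(list)
    for dest, _, operands in code:
        for src in operands:
            if src in ops:  # skip memory operands such as A and B
                consumers[src].append(dest)
    return ops, dict(consumers)


ops, consumers = build_graph(CODE)
```

Here each node corresponds to one use of a pipeline operation unit, and an edge from Vt1 to Vt3 records that the addition cannot start before the load has begun delivering elements.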

At this point, unique weight information is given to each node, as shown at its lower right corner, and path weight information is likewise given to each node, as shown at its lower right corner. These weights are assigned as follows. The unique weight is derived from factors such as the operation execution time: for example, a weight of "1" is given for the load/store pipeline operation unit, the addition/subtraction pipeline operation unit, and the multiplication pipeline operation unit, and a weight of "3" is given for the division pipeline operation unit. In FIG. 7 the nodes corresponding to Vt6, Vt8, and Vt10 are given a unique weight of "3"; this is because the division "/" is performed three times.

Once unique weight information has been given to each node in this way, each node that has no path leading upward in the figure is given its own unique weight as its path weight. In the illustrated case, the nodes corresponding to X(*), Z(*), and Y(*) are given a path weight of "1". Next, for example, the node corresponding to Vt3, which has a path to the X(*) node, is extracted and its path weight is tentatively determined: the path weight "1" of the X(*) node and the unique weight "1" of the Vt3 node are added, giving a tentative path weight of "2" for the Vt3 node. Path weights are tentatively determined in the same way for the Vt11 and Vt13 nodes and for the Vt6, Vt8, and Vt10 nodes. The Vt3 node also has a path to the Vt11 and Vt13 nodes, and the path weight obtained along that path is "4"; the larger of the candidate values is therefore adopted as the path weight of the Vt3 node.

The path weight of every node shown in FIG. 7 is determined in this manner.
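The path-weight rule above amounts to taking, for each node, its unique weight plus the largest path weight among the nodes that consume its result. The Python sketch below illustrates this under stated assumptions: the graph in `EXAMPLE_OPS` and `EXAMPLE_CONSUMERS` is a hypothetical fragment chosen so that Vt3 obtains the values "2" and "4" mentioned in the text, not a reproduction of FIG. 7, and the function name `path_weights` is illustrative.

```python
from functools import lru_cache

# Unique weights as given in the description: 1 for load/store,
# add/subtract and multiply; 3 for divide.
UNIQUE_WEIGHT = {"load": 1, "store": 1, "add": 1, "sub": 1, "mul": 1, "div": 3}

# Hypothetical fragment (not FIG. 7 itself): Vt3 feeds the store to X
# and also a chain Vt11 -> Vt13 -> store to Z.
EXAMPLE_OPS = {"Vt3": "add", "X": "store", "Vt11": "add", "Vt13": "add", "Z": "store"}
EXAMPLE_CONSUMERS = {"Vt3": ["X", "Vt11"], "Vt11": ["Vt13"], "Vt13": ["Z"]}


def path_weights(ops, consumers):
    """Path weight = unique weight + largest path weight among consumers."""

    @lru_cache(maxsize=None)
    def weight(node):
        downstream = [weight(c) for c in consumers.get(node, [])]
        # A node with no consumers simply gets its own unique weight.
        return UNIQUE_WEIGHT[ops[node]] + max(downstream, default=0)

    return {node: weight(node) for node in ops}


weights = path_weights(EXAMPLE_OPS, EXAMPLE_CONSUMERS)
```

In this fragment the path through X alone would give Vt3 a weight of 2, while the path through Vt11 and Vt13 gives 4, so 4 is kept. The result is the familiar "longest remaining path" priority used in list scheduling.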

Once the path weights have been determined as shown in FIG. 7, the use of the pipeline operation units is allocated in descending order of path weight, as shown in FIG. 9. For example, when the Vt1 node and the Vt2 node are each assigned to a load/store pipeline operation unit, the loading of the first elements of A(*) and B(*) is complete at time t1 in the figure, so the addition/subtraction pipeline operation unit corresponding to the Vt3 node can be started from time t1. FIG. 9 shows the result of allocating the use of the pipeline operation units one after another in this way; compared with the case of FIG. 8, processing speed is improved by roughly 65%. FIG. 10 shows the intermediate code after its order has been rearranged in accordance with FIG. 9.

FIG. 11 is a flowchart of one embodiment of the part of the intermediate code optimization unit of FIG. 3 that is directly related to the present invention and that carries out the processing described above.

The processing in the data dependency grasping unit 26, the tree structure creation unit 27, and the node weight calculation unit 28 shown in the figure corresponds to the explanation given with reference to FIGS. 6 and 7, and yields the path weight of each node. Vector operation order determination processing then begins: the pipeline determination unit 30, the vector operation determination unit 31, and the vector operation output unit 32 determine how each pipeline operation unit is to be used, as described with reference to FIGS. 9 and 10.

The pipeline determination unit 30 has the following function: it decides which pipeline operation unit should preferably be used by the operation to be output next. Two points in time are examined for this purpose: the time at which an operation becomes ready to start, and the time at which the pipeline operation unit that the operation would use becomes idle. The later of the two is taken as the "instruction issuable position" for that pipeline operation unit. An instruction issuable position is obtained in this way for each pipeline operation unit, and the pipeline operation unit whose instruction issuable position is earliest is selected as the one to be used next.
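The selection rule of the pipeline determination unit can be sketched in a few lines of Python. The function name `choose_pipeline`, the dictionary-based inputs, and the pipeline labels are assumptions for illustration; the tie-breaking rule (the first pipeline in iteration order wins) is also an assumption, since the source does not say how ties are resolved.

```python
def choose_pipeline(ready_time, free_time):
    """Pick the pipeline with the earliest instruction issuable position.

    ready_time[p]: earliest time at which some pending operation that
                   would use pipeline p is ready to start.
    free_time[p]:  time at which pipeline p becomes idle.
    The issuable position of p is the later of the two.
    """
    issuable = {p: max(ready_time[p], free_time[p]) for p in ready_time}
    best = min(issuable, key=issuable.get)  # ties: first in iteration order
    return best, issuable[best]


# Hypothetical state: the divider is idle and has work ready at time 2.
pipeline, position = choose_pipeline(
    ready_time={"load_store": 0, "add_sub": 4, "div": 2},
    free_time={"load_store": 3, "add_sub": 0, "div": 0},
)
```

With these example values the issuable positions are 3, 4, and 2 respectively, so the division pipeline is chosen at position 2.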

The vector operation determination unit 31 shown in FIG. 11 decides, for the pipeline operation unit selected by the pipeline determination unit 30 as the next to be used, which of the operations that could use that unit is actually to be executed. The following algorithm is used for this decision: "extract the operations that are ready to start at or before the instruction issuable position of the pipeline operation unit, and among them select the operation with the largest path weight."
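The quoted selection algorithm can be written as a short Python sketch. The function name `choose_operation` and the `(name, ready_time, path_weight)` tuple layout are hypothetical, as are the example values; returning `None` when nothing is ready is an added convention for the sketch.

```python
def choose_operation(candidates, issuable_position):
    """Select the operation to issue on an already chosen pipeline.

    candidates: list of (name, ready_time, path_weight) for operations
                that could use this pipeline.
    Only operations ready at or before the issuable position are
    considered; among them the largest path weight wins.
    """
    ready = [c for c in candidates if c[1] <= issuable_position]
    if not ready:
        return None  # nothing can be issued yet (sketch convention)
    return max(ready, key=lambda c: c[2])[0]


# Hypothetical candidates for the division pipeline.
chosen = choose_operation(
    [("Vt6", 0, 7), ("Vt8", 2, 5), ("Vt10", 5, 9)],
    issuable_position=3,
)
```

In this example Vt10 has the largest path weight but is not yet ready at position 3, so Vt6 is selected over Vt8.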

When, as described above, it has been decided that a given operation will use a given pipeline operation unit, the vector operation output unit 32 shown in FIG. 11 simulates the values associated with that operation, namely those shown in FIG. 12: (i) the instruction issue start position, (ii) the instruction issue end position, (iii) the instruction execution start position (or start-up completion position), and (iv) the instruction execution start position. It then resets the instruction issuable position of each pipeline to reflect the fact that the operation now occupies one pipeline operation unit, and prepares for determining the use of the pipeline operation units by the remaining operations.
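The bookkeeping performed after an operation is issued might look like the following Python sketch. It uses a deliberately simplified timing model that is an assumption of this example: an operation occupies its pipeline for `duration` time units and its result is treated as available only at the end, whereas the hardware described here lets a dependent operation start once the first elements are available. The names `record_issue`, `free_time`, and `finish_time` are hypothetical.

```python
def record_issue(op, start, duration, pipeline, free_time, finish_time):
    """Update scheduler state after `op` is issued on `pipeline` at `start`.

    Simplified model (assumption): the pipeline is busy for `duration`
    time units, and consumers of `op` are ready no earlier than the end.
    """
    end = start + duration
    free_time[pipeline] = end  # pipeline cannot accept a new operation before this
    finish_time[op] = end      # used later to compute consumers' ready times
    return end


# Hypothetical state: issue division Vt6 at time 2 with a duration of 3.
free_time = {"div": 0}
finish_time = {}
end = record_issue("Vt6", 2, 3, "div", free_time, finish_time)
```

After this update the next round of pipeline selection sees the division pipeline as busy until time 5.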

(E) Effects of the Invention

As described above, according to the present invention, compilation can be carried out appropriately while the processing order of the intermediate code is rearranged, so that undesired idle states rarely arise in the parallel operation units when the vector processor executes processing.

[Brief Description of the Drawings]

FIG. 1 is an explanatory diagram conceptually illustrating the processing corresponding to a vector instruction. FIG. 2 shows one embodiment of a processing system having the vector processor referred to in the present invention. FIG. 3 shows the configuration of one embodiment of a compiler used in the present invention. FIG. 4 is an explanatory diagram showing how a source program is converted into intermediate code. FIG. 5 is an explanatory diagram showing how a source program is vectorized. FIGS. 6 to 10 are explanatory diagrams illustrating the processing according to the present invention. FIG. 11 is a flowchart of one embodiment of the part of the intermediate code optimization unit that is directly related to the present invention. FIG. 12 is an explanatory diagram relating to processing in a pipeline operation unit.

In the figures, 1 is a main storage device, 2 is a memory control unit, 3 is a vector processor, 4 is a channel processor, 5 is a large storage device, 9 is a vector register, 10 is a mask register, 11 to 16 are pipeline operation units, 17 is a source program, 18 is a compiler, 19 is an object program, 20 is a source interpretation unit, 21 is a storage allocation unit, 22 is a vectorization unit, 23 is an intermediate code optimization unit, 24 is a register use determination unit, and 25 is an object program output unit.

Patent applicant: Fujitsu Ltd. Agent: Patent Attorney 森　１）　寛 (and one other)

[Drawings: FIGS. 1, 4, 5, 6(A), 6(B), and 12. The drawing text is not recoverable from the scan. FIG. 6(A) shows a DO loop over I = 1 to 100 that computes X(I) = A(I) + B(I), followed by assignments to Y(I) and Z(I), ending at statement 10 CONTINUE.]

Claims

[Claims]

In a compiler that generates an object program from a given source program and supplies it to a vector processor that is equipped with a plurality of parallel operation units and at least an augend register and that executes vector instructions, the compiler comprising a source interpretation unit that interprets the statements of the source program and expands them into intermediate code, a storage allocation unit that allocates storage locations to the various data appearing in the program, a vectorization unit that recognizes the parts that can be executed in parallel and modifies the intermediate code, an intermediate code optimization unit that performs optimization at the intermediate code level so as to use the vector processor effectively, a register use determination unit, and an object program output unit: a parallel execution optimization processing method for vector operations, characterized in that the intermediate code optimization unit is provided with a processing unit that examines, on the basis of the given intermediate code, the manner in which each parallel operation unit in the vector processor is used and extracts unique weight information belonging to each parallel operation unit and path weight information related to the manner in which that parallel operation unit becomes available, and with a vector operation order determination processing unit that executes a simulation in which, on the basis of the path weight information, usage is allocated sequentially to parallel operation units in an idle state, and in that the processing order of the intermediate code is rearranged so as to optimize the processing performed by the vector processor.