JPH02132525A

JPH02132525A - How to compile

Info

Publication number: JPH02132525A
Application number: JP63285647A
Authority: JP
Inventors: Kyoko Iwazawa; 岩澤　京子; Giichi Tanaka; 義一田中
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1988-11-14
Filing date: 1988-11-14
Publication date: 1990-05-22

Abstract

(57)【要約】本公報は電子出願前の出願データであるた
め要約のデータは記録されません。(57) [Abstract] This bulletin contains application data before electronic filing, so abstract data is not recorded.

Description

【発明の詳細な説明】〔産業上の利用分野〕本発明は、並列計算機システムに係り、特にシングルプ
ロセッサ向きに記述されたソースプログラムから、並列
に実行するのに好適なオブジェクトコードを生成するコ
ンパイル方式に関する．〔従来の技術〕計算速度の向上のために、プロセッサを複数台並べて同
時に動かす．並列計算機システムが考案されてきた．従
来のシングルプロセッサ向きにコ一デイングされたソー
スプログラムは単一メモリを仮定してデータが構成され
ている。これらをそのままメインメモリに配置して、並
列に実行させても、複数のプロセッサが、同一アドレス
へ随時、書き込み読み出しを行うため、値が保証されず
、シングルプロセッサで実行した場合と結果が異なって
しまう。これを避けるためには、適宜、プロセッサごと
に、作業領域を確保し、各々コピーを置けばよい。[Detailed Description of the Invention] [Field of Industrial Application] The present invention relates to a parallel computer system, and in particular to a compilation method for generating object code suitable for parallel execution from a source program written for a single processor. Regarding the method. [Conventional technology] To improve calculation speed, multiple processors are lined up and run simultaneously. Parallel computer systems have been devised. A conventional source program coded for a single processor has data configured assuming a single memory. Even if these are placed in the main memory and executed in parallel, the values are not guaranteed because multiple processors read and write to the same address at any time, and the results may differ from those executed by a single processor. Put it away. To avoid this, it is sufficient to secure a work area for each processor and place a copy for each processor.

また、メインメモリ上でコンフリクトが生じ、プロセッ
サ台数に応じた実行性能を確保することはできない。こ
れを避けるためには，プロセッサごとに用意されたロー
カルメモリを有効に用いる必要がある。Furthermore, a conflict occurs on the main memory, making it impossible to ensure execution performance commensurate with the number of processors. To avoid this, it is necessary to effectively use the local memory provided for each processor.

ＦＯＲＴＲＡＮ言語で記述されたソースプログラムから
、スーバコンピュータ（ベクトル並列計算機）のオブジ
ェクトを生成するコンパイラで、最も先進的なものは、
イリノイ大学を中心としたＣｅｄｅｒプロジェクトのコ
ンパイラ（Ｐａｒａｆｒａｓｅ　）である・これに関す
る詳しい論文、デイビツド・エイ・パドア，マイケル・
ジエイ・ウルフエ共著“スーパコンピュータのための高
度コンパイラ最適化″（ＤＡＶＩＤ　Ａ．ＰＡＤＵＡ，
　ＭＩＣＨＡＥＬ　Ｊ．１０ＬＦＢ共著“ＡＤＶＡＮＣ
ＥＤ　ＣＯＭＰＩＬＥＲ　ＯＰＴＩＭＩＺＡＴＩＯＮＳ
　ＦＯＲＳＵＰＥＲＣＯＭＰＵＴＥＲ”）　　（Ｃｏｍ
ｍｕｎｉｃａｔｉｏｎｓ　ｏｆ　ｔｈｅＡＣＭ，　Ｄｅ
ｃｅｍｂｅｒ　１　９　８　６　Ｖｏｌ　２　９　Ｎａ
ｌ．　２）があるが、メインメモリとローカルメモリの
使い分けや，データの分割や割付けについては記述され
ていない。他にも、従来言語で記述されたデータをコン
パイラが自動的に分割し，メインメモリやローカルメモ
リに対する割付の具体的方法について論じられているも
のはない．〔発明が解決しようとする課題〕従来のマルチプロセッサシステムにおいては、演算器を
複数台並列に動かすことができるが、メモリは主記憶が
ただ１つしかないため，メモリコンフリクトが生じやす
く，実行性能の向上を妨げるほか，接続できる演算器の
個数も自と制限されてきた．一方、並列に動作する各プ
ロセッサごとにローカルメモリを有する並列計算機シス
テムでは、従来のシングルプロセッサ向きに記述された
プログラムでは、十分有効にローカルメモリを使うこと
ができず、台数に似合う実行性能を上げることができな
い。並列機実行性能を引き出すために、ユーザは、各プ
ロセッサのローカルメモリを意識して、再コーディング
しなければならないという問題があった。The most advanced compiler that generates supercomputer (vector parallel computer) objects from source programs written in the FORTRAN language is
A compiler (Parafrase) of the Ceder project centered on the University of Illinois.Detailed paper on this, David A. Padua, Michael.
“Advanced Compiler Optimization for Supercomputers” co-authored by David A. PADUA,
MICHAEL J. 10LFB co-author “ADVANC”
ED COMPILER OPTIMIZATIONS
FORSUPER COMPUTER”) (Com
communications of the ACM, De
cember 1 9 8 6 Vol 2 9 Na
l. 2), but it does not describe the proper use of main memory and local memory, or the division and allocation of data. There is also no discussion of a specific method for a compiler to automatically divide data written in a conventional language and allocate it to main memory or local memory. [Problems to be solved by the invention] In conventional multiprocessor systems, multiple arithmetic units can be operated in parallel, but since there is only one main memory, memory conflicts are likely to occur, resulting in poor execution performance. In addition to impeding improvements in performance, the number of arithmetic units that can be connected has also been limited. On the other hand, in parallel computer systems that have local memory for each processor running in parallel, programs written for conventional single processors are unable to use the local memory effectively enough to improve execution performance commensurate with the number of machines. I can't. In order to bring out the performance of parallel machine execution, the user has to be aware of the local memory of each processor and recode it.

本発明は、各プロセッサごとにローカルメモリを有する
並列計算機システムに対して、従来のシングルプロセッ
サ向きにコーディングされたソースプログラムを、ユー
ザが再コーディングすることなく、ローカルメモリを有
効に活用して、実行効率の高いオブジェクトコードを生
成することを目的とする。The present invention enables a parallel computer system having local memory for each processor to execute a source program coded for a conventional single processor by effectively utilizing the local memory without the user having to recode the program. The purpose is to generate highly efficient object code.

また，共有メモリを各プロセッサがアクセスするマルチ
プロセッサにおいては、適宜プロセッサに合わせて作業
領域を共有メモリに確保し、計算結果を保証し、コンフ
リクトを避けることを目的とする。In addition, in a multiprocessor in which each processor accesses a shared memory, the purpose is to secure a work area in the shared memory according to the processor as appropriate, guarantee calculation results, and avoid conflicts.

[Means to solve the problem]

上記目的は，コンパイラが、「ユーザの並列化指示を解
析する」か又は「自動的に並列性を検出する」ことによ
り，並列処理方法を決定した後、この並列性を損わない
よう、以下の処理を実現することにより達成される。The above purpose is for the compiler to determine the parallel processing method by ``analyzing the user's parallelization instruction'' or ``automatically detecting parallelism,'' and then, in order not to impair this parallelism, to do the following: This is achieved by implementing the following processing.

まず、並列化対象部に現れる変数と配列の添字の変化に
従って、第３図のように分類し、さらに定義や参照の関
係を解析し、第４図のように分類する。First, they are classified as shown in FIG. 3 according to changes in the subscripts of variables and arrays that appear in the parallelization target part, and then the relationships of definitions and references are analyzed and classified as shown in FIG. 4.

次に、これらの分類に従い、既に決めた並列性を損ｂな
いように、変数や配列の分割とそれらをメインメモリや
ローカルメモリにどのように割付けるかを第５図に示す
ように分類する。この分類は次の原則に従う。Next, according to these classifications, divide variables and arrays and how to allocate them to main memory and local memory, as shown in Figure 5, so as not to impair the parallelism that has already been decided. . This classification follows the following principles:

（１）ループ回数に線形な添字でアクセスされる配列は
、定義される値と、その値を定義する計算式を同一プロ
セッサ（データはそのローカルメモリ）に置く。(1) For an array that is accessed with an index linear to the number of loops, the defined value and the calculation formula that defines the value are placed in the same processor (the data is in its local memory).

（２）ワーク的に１つのプロセッサ内で閉じて定義・参
照があれば，プロセッサの台数分だけ一次元拡張する。(2) If the work is defined and referenced within one processor, it is expanded one dimension by the number of processors.

（３）　（２）以外のプログラムの途中で定義がある配
列要素は、ただ１つのローカルメモリに置き、他のプロ
セッサは通信によってこれをアクセスする．（４）プログラムの途中で値が変わらない変数や配列は
、それを参照する可能性のある全てのプロセッサのロー
カルメモリに置く。(3) Array elements that are defined in the middle of a program other than (2) are placed in only one local memory, and other processors access this through communication. (4) Variables and arrays whose values do not change during the program should be placed in the local memory of all processors that may reference them.

最後に、並列処理の方法と、決定したデータの割付けに
従い、並列に実行できる形にプログラムを変換する。Finally, the program is converted into a form that can be executed in parallel according to the parallel processing method and determined data allocation.

このようにして、従来のシングルプロセッサ向きのソー
スプログラムを再コーディングすることなく、プロセッ
サごとのローカルメモリを有効に用いることにより、実
行効率の高いオブジェクトコードを生成することができ
る。In this way, object code with high execution efficiency can be generated by effectively using the local memory of each processor without recoding a conventional source program suitable for a single processor.

複数のプロセッサが、１つの共有メモリをアクセスする
マルチプロセッサシステムにおいては、並列化対象部に
現れる変数と配列の添字の変化に従って，第３図のよう
に分割し、さらに定義や参照の関係を解析し、第４図の
ように分類する。In a multiprocessor system where multiple processors access one shared memory, it is divided as shown in Figure 3 according to the changes in the subscripts of variables and arrays that appear in the parallelized part, and the relationships of definitions and references are further analyzed. and classify them as shown in Figure 4.

次に、これらの分類に従い、既に決めた並列性を損わな
いように、変数や配列の作業領域の確保の要・不要を分
類する。この分類の原則が、個別にローカルメモリを分
散して保持する、並列計算機とは異なる。Next, according to these classifications, we classify whether it is necessary or unnecessary to secure work areas for variables and arrays, so as not to impair the parallelism that has already been decided. This classification principle differs from parallel computers, which store local memory in a distributed manner.

（１）基本的には、全ての変数や配列は、共有メモリの
中の全てのプロセッサがアクセスする可能性のあるグロ
ーバルな領域に置く。(1) Basically, all variables and arrays are placed in a global area in shared memory that can be accessed by all processors.

（２）ループ不変な添字でアクセスされる配列，定義や
変数定義など、各プロセッサが同一アドレスに書き込む
可能性がある場合は、プロセッサごとに、その変数や配
列（又は配列要素）を保持するような，ローカルな作業
領域を置く。(2) If there is a possibility that each processor writes to the same address, such as an array, definition, or variable definition that is accessed with a loop-invariant index, the variable or array (or array element) should be retained for each processor. Place a local work area.

最後に，並列処理の方法と、決定したデータの割付けに
従い、並列に実行できる形にプログラムを変換する。Finally, the program is converted into a form that can be executed in parallel according to the parallel processing method and determined data allocation.

このようにして、従来のシングルプロセッサ向きのソー
スプログラムを再コーディングすることなく、共有メモ
リを複数のプロセッサがアクセスするマルチプロセッサ
に対して，実行効率の高いオブジェクトコードを生成す
ることができる。In this way, object code with high execution efficiency can be generated for a multiprocessor in which shared memory is accessed by multiple processors, without recoding a conventional source program suitable for a single processor.

〔Example〕

以下、本発明のＦＯＲＴＲＡＮコンパイラにおける実施
例を図表を参照しつつ説明する。Hereinafter, embodiments of the FORTRAN compiler of the present invention will be described with reference to figures and tables.

第１図に、本発明を適用するコンパイラ全体の構造を示
す。第１図構文解析５が，ソースプログラム２を入力し
、この字句や構文を解析して中間コード４を生成する。FIG. 1 shows the overall structure of a compiler to which the present invention is applied. A syntax analyzer 5 in FIG. 1 inputs the source program 2, analyzes the words and syntax, and generates an intermediate code 4.

並列化手段決定処理６が，この中間コード４を入力とし
て、コンパイラに並列性の抽出能力が無ければ、ユーザ
に並列化指示を仰ぎ、ユーザ指示解析１５がその指示か
ら並列化手段を決定する。また、コンパイラに十分な並
列性解析能力があれば、自動並列性検出処理１６が並列
処理方法を決定する。次に，並列化変換７が決定した並
列処理を実施できる形に、中間コード４を変換する。そ
して、シングルプロセッサ向きのコンパイルと同様に、
最適化処理８、メモリ割付・レジスタ割当９、コード生
成１０を順次行う。本発明は、並列化変換７に係り、ソ
ースプロダラムに現れるデータを，並列に動作する各プ
ロセッサのローカルメモリに適宜分割することにより、
並列に走らせるオブジェクトコード３の実行効率を上げ
るものである。A parallelization means determination process 6 inputs this intermediate code 4, and if the compiler does not have the ability to extract parallelism, asks the user for a parallelization instruction, and a user instruction analysis 15 determines a parallelization means based on the instruction. Further, if the compiler has sufficient parallelism analysis ability, the automatic parallelism detection processing 16 determines a parallel processing method. Next, the intermediate code 4 is converted into a form in which the parallel processing determined by the parallelization conversion 7 can be executed. And just like compiling for a single processor,
Optimization processing 8, memory allocation/register allocation 9, and code generation 10 are performed in sequence. The present invention relates to parallelization conversion 7, and by appropriately dividing data appearing in a source program into local memories of each processor operating in parallel,
This improves the execution efficiency of the object code 3 that is run in parallel.

並列処理方法が決定したプログラムに対する、並列化変
換７の処理概要を示す。A processing overview of parallelization conversion 7 for a program for which a parallel processing method has been determined is shown.

まず，第１図の変数・配列のアクセス状態の解析１１が
，制御の流れやデータの流れから、値を計算する定義を
値を用いる参照の順序や、ループに運搬される依存の有
無を解析する。First, analysis 11 of the access state of variables and arrays in Figure 1 analyzes the definition for calculating values from the control flow and data flow, the order of reference using values, and the presence or absence of dependencies carried in loops. do.

この解析結果を用いて，第１図の変数・配列の分類１２
が，第３図の変数・配列分類テーブル３０のフィールド
３１，３２，３３，３４，３５，３６に分類し、さらに
第４図の定義・参照テーブル４０のフィールド４１，４
２，４３，４４．４５に分類する。Using this analysis result, classify variables and arrays in Fig. 12.
is classified into fields 31, 32, 33, 34, 35, and 36 of the variable/sequence classification table 30 in FIG. 3, and further into fields 41, 4 of the definition/reference table 40 in FIG.
Classified into 2, 43, 44, and 45.

繰り返し処理の各繰り返しを並列に実行する例を第２図
に示して説明する。ループ２０が並列化対象ループで、
この各繰り返しを並列に実行することが、第１図並列化
手段決定６により決まっている．ここに現れる変数・配
列に対して、次の様に分類する．まず、第３図のテーブ
ル３０に分類する。ループ２０，２１．２２の制御変数
Ｉ，Ｊ，Ｋ、終値のＮ，Ｍ、文２４と文２５のＳ、文２
３のＭＭが変数であり、フイードル３１に格納する。An example of executing each iteration of the iterative process in parallel will be described with reference to FIG. 2. Loop 20 is the loop to be parallelized,
It is determined by parallelization means determination 6 in FIG. 1 that each of these repetitions be executed in parallel. The variables and arrays that appear here are classified as follows. First, they are classified into table 30 in FIG. Control variables I, J, K of loops 20, 21.22, final values N, M, S of statements 24 and 25, statement 2
3 MM is a variable and is stored in the feedle 31.

文２４と文２７の配列ＢとＬやＬＬの添字ＫとＪは、ル
ープ２０の工に依存しないためループ不変であり、しか
も値が変わる。従って、配列Ｂ，Ｌ，ＬＬはフィールド
３２に格納する。一方、文２３の配列Ｄの添字ＭＭが変
わることなく常に一定であるため，Ｄはフィールド３３
に格納する。文２３の配列Ｅと文２６や文２８の配列Ｆ
の添字■はループ２０の繰り返しに対して線形であるた
めフィールド３４に格納する。文２３の配列Ａの添字は
非線形に変化するが、異る繰り返しで同一配列要素がア
クセスされることはないので．フィールド３５に格納し
、文２７の配列Ｃは、ＬやＬＬの配列に同じ値を持つ要
素があれば異るループ繰り返し回数で配列Ｃの同じ要素
をアクセスすることになるため，フィールド３６に格納
する。The arrays B and L of statements 24 and 27, and the subscripts K and J of LL, do not depend on the operation of the loop 20, so they are loop-invariant, and their values change. Therefore, arrays B, L, and LL are stored in field 32. On the other hand, since the subscript MM of array D in sentence 23 remains constant, D is in field 33.
Store in. Array E of sentence 23 and array F of sentence 26 and sentence 28
Since the subscript ■ is linear with respect to the repetition of the loop 20, it is stored in the field 34. Although the subscript of array A in statement 23 changes nonlinearly, the same array element is not accessed in different iterations. Array C in statement 27 is stored in field 36 because if there is an element with the same value in the L or LL array, the same element in array C will be accessed with a different number of loop repetitions. do.

次に第４図のテーブル４０に分類する。ループ２０，２
１．２２の終値Ｎ，Ｍ、文２３の配列Ｅ、配列Ｄとその
添字ＭＭ、文２７の配列Ｃの間接指４１Ｌ，ＬＬは、並
列化対象ループ２０では値が更新されることが無く、参
照だけであり、フィールド４１に格納する．ループ２０
の制御変数工や文２３のＡ，文２５のＳ，文２７のＣ，
文２４のＢは，定義がただ一箇所にある。このうち、Ｂ
はルーブ２９ですぐに新たな値が定義されるため、外側
まで文２４の定義は届かず、Ｂはフィールド４３に格納
する。残りのＡ，Ｓ，Ｃは、ループ２９まで、ルーブ２
０の中で定義した値が届いているため、フィールド４２
に格納する．Ｊ，Ｋなどの内側ループ制御変数は、繰り
返しの前に初期値を設定し、ループ内で増分値を加え込
むため定義は２箇所にある。これらは、ループ２９で、
値を参照することなく、別な値を定義しているため、フ
ィールド４５に格納する。文２６と文２８の２箇所にＦ
の定義があり、かつループ２９まで定義が届いているた
め，フィールド４４に格納する。Next, they are classified into table 40 in FIG. loop 20,2
The values of the final values N and M of 1.22, the array E of statement 23, the array D and its subscript MM, and the indirect fingers 41L and LL of array C of statement 27 are not updated in the parallelization target loop 20, It is only a reference and is stored in field 41. loop 20
control variable engineering, A of sentence 23, S of sentence 25, C of sentence 27,
Sentence 24, B, has a definition in only one place. Of these, B
Since a new value is immediately defined in the rube 29, the definition of the statement 24 does not reach the outside, and B is stored in the field 43. The remaining A, S, and C are loop 2 up to loop 29.
Since the value defined in 0 has arrived, field 42
Store it in . Inner loop control variables such as J and K are defined in two places because initial values are set before repetition and incremental values are added within the loop. These are loop 29,
Since a different value is defined without referring to the value, it is stored in field 45. F in two places, sentence 26 and sentence 28
Since there is a definition and the definition has reached up to loop 29, it is stored in field 44.

これらの第３図のテーブル３０と第４図のテーブル４０
を入力として、第１図の変数・配列の分割方法の決定１
３が、第５図の分割バタンテーブル５０のいずれかに変
数と配列を割付ける。These table 30 in FIG. 3 and table 40 in FIG.
As input, determine how to divide variables and arrays in Figure 1 1
3 allocates variables and arrays to any of the divided button tables 50 in FIG.

終値保証とは、プログラム全体で用いる変数や配列に対
して、ループを並列に実行した後の計算で用いるために
、逐次に実行した時と同じ値を配列や変数が保持できる
よう、ループの並列実後終了時の値を整えることである
。この時の値を終値という。終値は、メインメモリ（Ｍ
Ｓと略す）確保することも，ローカルメモリ（ＬＳと略
す）に確保することもできる。ただし、ローカルメモリ
に配列の全要素の終値と作業領域の両方を置くようなこ
とはしない。Closing value guarantee is for the variables and arrays used throughout the program to be used in calculations after executing the loop in parallel, so that the arrays and variables retain the same values as when executed sequentially. This is to adjust the value at the end of the process. The value at this time is called the closing price. The closing price is stored in the main memory (M
It can be secured in the local memory (abbreviated as LS) or in the local memory (abbreviated as LS). However, do not store both the final values of all elements of the array and the work area in local memory.

第３図と第４図のテーブル３０と４０から，変数・配列
の分割方法決定１３が第５図のテーブル５０を作る判定
規則を第６図に示す。判定６０が変数を配列を分け，７
１が変数は全てＬＳに用意する。判定６１により、ルー
プの外へ定義が届く場合は７２がＭＳに終値を置き、そ
うでなければ終値は不要である。即ち，第３図のテーブ
ル３０のフィールド３１にある変数は、第４図のテーブ
ル４０のフィールド４２や４４にあれば、第５図のテー
ブル５ｏのフィールド５４に格納する。第２図の例だと
、Ｓがこれである６第４図のフィールド４１，４３．４
５にあれば、第５図のフィールド５９に格納する。第２
図の例だと、Ｎ，Ｍ，ＭＭ，Ｉ，Ｊ，Ｋがこれである。FIG. 6 shows a determination rule for creating the table 50 in FIG. 5 from the tables 30 and 40 in FIGS. 3 and 4, in which the variable/array division method determination 13 creates the table 50 in FIG. Judgment 60 divides the variables into arrays, 7
1 prepares all variables in LS. According to decision 61, if the definition arrives outside the loop, 72 places a final value in the MS, otherwise no final value is needed. That is, if a variable in field 31 of table 30 in FIG. 3 is in field 42 or 44 in table 40 in FIG. 4, it is stored in field 54 of table 5o in FIG. 5. In the example of Figure 2, S is this 6 Fields 41, 43.4 in Figure 4
5, it is stored in field 59 in FIG. Second
In the example shown in the figure, these are N, M, MM, I, J, and K.

配列については、判定６２が参照アクセスのみであれば
、判定６３が、コンパイル時に解析可能な添字か否かを
判定し、解析不可能であれば、処理７３が全ＬＳへ全要
素を置く。解析可能であれば，参照するプロセッサのＬ
Ｓへその要素を置く。For arrays, if the determination 62 is only reference access, a determination 63 determines whether the subscript can be analyzed at compile time, and if it cannot be analyzed, a process 73 places all elements in all LSs. If analysis is possible, L of the reference processor
Place the element on S.

即ち，第４図のテーブル４０のフィールド４１にある配
列は、第３図のテーブル３０のフィールド３２，３３．
３４は、処理７４により第５図のテーブル５０のフィー
ルド５３に相当する．第２図の例だとＬ，ＬＬ，Ｄ，Ｅ
がこれである。That is, the array in field 41 of table 40 in FIG. 4 is the same as fields 32, 33, . . . in table 30 in FIG.
34 corresponds to field 53 of table 50 in FIG. 5 by processing 74. In the example of Figure 2, L, LL, D, E
is this.

これらは全く別の並列処理ループでの定義の形式に依存
する。配列Ｄは添字が常に一定であるため、Ｄ　（Ｍ）
を全ＬＳの作業領域に置く。他のループでのＤの使い方
により、Ｄの終値の分割方法が決まる。Ｄが他のループ
で線形にアクセスされれば、テーブル５０のフィールド
５８に格納する。These depend on the form of definition in completely different parallel processing loops. Since the subscript of array D is always constant, D (M)
Place it in the work area of all LS. How D is used in other loops determines how the final value of D is divided. If D is linearly accessed in another loop, it is stored in field 58 of table 50.

配列Ｅは、線形なループ可変添字でアクセスされ、かつ
定義は無いため、ＬＳ上に作業領域は不要であり、プロ
セッサが自分のＬＳを参照すればよいように各ＬＳに終
値を分割する。これは、テーブル５０のフィールド５３
に格納する。配列Ｌ，ＬＬは、１つのプロセッサで全要
素をアクセスする可能性があるため、全ＬＳに全要素を
置く．また、他の並列化ループでも、ＬやＬＬの要素を
参照する可能性があるため、これは作業領域ではなく、
終値として、テーブル５０のフィールド５２に格納する
．定義のある配列は、判定６４によりループ不変添字であ
ると認識すると、処理７５がＬＳに作業領域を確保する
。さらにこの定義がループの外まで届いていれば、処理
７５が終値を確保するが、ＭＳに置くかＬＳに置くかは
、全く別のループの定義の形体に依存する。第３図テー
ブル３０のフィールド３２．３３にある配列のうち、第
４図のテーブル４ｏのフィールド４２．４４にあるのが
第５図のテーブル５０のフィールド５６，５７，５８の
いずれかに格納する。第４図のフィールド４３．４５に
ある配列は，第５図のフィールド５９に格納し，第２図
の例だと、Ｂがこれである。Since the array E is accessed with a linear loop variable subscript and has no definition, no work area is required on the LS, and the final value is divided into each LS so that the processor only needs to refer to its own LS. This is field 53 of table 50.
Store in. Since all elements of arrays L and LL may be accessed by one processor, all elements are placed in all LSs. Also, other parallelized loops may refer to elements of L and LL, so this is not a work area.
It is stored in field 52 of table 50 as the final price. If the defined array is recognized as a loop-invariant subscript by the determination 64, a process 75 secures a work area in the LS. Further, if this definition reaches outside the loop, processing 75 secures the final value, but whether it is placed in MS or LS depends on the form of the definition of a completely different loop. Among the arrays in fields 32 and 33 of table 30 in FIG. 3, those in fields 42 and 44 of table 4o in FIG. 4 are stored in fields 56, 57, and 58 of table 50 in FIG. . The arrays in fields 43 and 45 of FIG. 4 are stored in field 59 of FIG. 5, and in the example of FIG. 2, this is B.

ループ可変な添字の配列については、判定６ｏが、コン
パイル時に解析可能な添字か否かを判定し、可能であれ
ば、定義が１つと６７が判定すれば処理７７がその定義
を計算するプロセッサのＬＳに各要素を置くように分割
する。定義が複数であれば出力依存を解析し、その終点
の定義を計算するプロセッサのＬＳに各要素を置くよう
に分割する。第３図のテーブル３０のフィールド３４の
配列で、第４図テーブル４０フィールド４２，４４にあ
れば、第５図のテーブル５０のフィールド５２に格納し
，フィールド４２の配列は第６図の７７が、フィールド
４４の配列は処理７８が分割する。For loop-variable index arrays, decision 6o determines whether the index can be parsed at compile time, and if possible, if 67 determines that there is one definition, processing 77 determines whether the index is parsable at compile time. Divide so that each element is placed in LS. If there are multiple definitions, the output dependence is analyzed and the definition is divided so that each element is placed in the LS of the processor that calculates the definition of the end point. In the arrangement of field 34 of table 30 in FIG. 3, if it is in fields 42 and 44 of table 40 in FIG. 4, it is stored in field 52 of table 50 in FIG. , field 44 is divided by process 78.

配列Ｆは、第２図で文２６と文２８の２箇所ある。出力
依存の終点は文２６であるため、この定義に合わせて分
割する。Ｆについては、出力依存の始点の定義（文２６
）は、Ｉ＝Ｈの時以外は不要であるため、特にＬＳ上に
作業領域は不要であり、第５図テーブル５０のフィール
ド５３に格納する。Array F has two locations, sentence 26 and sentence 28 in FIG. Since the end point of output dependence is statement 26, it is divided according to this definition. For F, the definition of the starting point of output dependence (statement 26
) is unnecessary except when I=H, so no work area is required on the LS, and it is stored in the field 53 of the table 50 in FIG.

コンパイル時に解析できない添字の配列については，判
定６８が、一つの要素は常に同一ループ回数で定義され
るか否かを判定し、常に同一ループ回数でのみ定義され
る時は処理７９が、定数を計算するプロセッサのＬＳに
各要素を置くように分割する。一つの要素に対して異る
ループ回数で定義される場合は、並列に実行することは
できないため、ＭＳに配列全体を置き逐次に実行する。For arrays with subscripts that cannot be analyzed at compile time, a decision 68 determines whether an element is always defined with the same number of loops, and if it is always defined only with the same number of loops, a process 79 determines whether or not an element is always defined with the same number of loops. Divide so that each element is placed in the LS of the processor that performs the calculation. If a different number of loops are defined for one element, it cannot be executed in parallel, so the entire array is placed in the MS and executed sequentially.

第３図テーブル３０のフィールド３５の配列は、第５図
のフィールド５３に相等し、第２図の例′では，Ａであ
る．第３図のフィールド３６が第５図のフィールド５１
に相当し、第２図の例では、Ｃである。The arrangement of field 35 of table 30 of FIG. 3 is equivalent to field 53 of FIG. 5, which is A in the example ' of FIG. Field 36 in FIG. 3 is replaced by field 51 in FIG.
In the example of FIG. 2, it is C.

このようにして、データの割付け方法を決定した後，第
１図中間コード変換１４が並列に実行できる形に変換す
る。この結果、第２図のソースプログラムのデータは第
７図に示すようになる。メインメモリには９ｏに示すよ
うなデータを置く。After determining the data allocation method in this manner, the intermediate code conversion 14 in FIG. 1 is converted into a form that can be executed in parallel. As a result, the data of the source program shown in FIG. 2 becomes as shown in FIG. Data as shown in 9o is placed in the main memory.

各ＬＳは各々９１．９３に示すようなワーク的な次の並
列実行時には全く別の用途に用いるデータと、９２．９
４に示すように１つのＪＯＢを通じて保持するデータに
分けて、格納する。Each LS stores data that will be used for a completely different purpose during the next parallel execution of work, as shown in 91.93, and 92.9.
As shown in 4, the data is divided into data to be held through one job and stored.

一方、複数のプロセッサが１つのメモリを共有するマル
チプロセッサシステムに対しても、同様の処理は有効で
ある。ただし、物理的にメインメモリと独立なローカル
メモリが存在するわけではないので、メインメモリ（第
８図９６）中に、全プロセッサがアクセスするグローバ
ルな領域（第８図９７）と、プロセッサ台数分用意する
、各プロセッサのためのローカルな作業領域（第８図９
８．９９）を確保する。On the other hand, similar processing is also effective for multiprocessor systems in which multiple processors share one memory. However, since there is no local memory that is physically independent of main memory, there is a global area (97 in Figure 8) that is accessed by all processors in the main memory (96 in Figure 8), and a global area (97 in Figure 8) that is accessed by all processors. Prepare a local work area for each processor (Figure 8
8.99).

変数や配列の終値は分割する必要もないため、グローバ
ルな領域にまとめて置く。ローカルな作業領域には，基
本的には第７図の９１や９３のワーク的な変数や配列を
置く。ただし、これらの中で．参照だけに用いられてい
るＮ，Ｍ，Ｌ，ＬＬ，Ｄ　（Ｍ）のように各ＬＳの値が
全て同じものは、プロセッサ台数分用意する必要が無い
ため、ただ１つ、グローバルな領域に置く。第２図のプ
ログラムのデータは、第８図のようになる。The final values of variables and arrays do not need to be divided, so they are stored together in the global area. Basically, the work variables and arrays shown in 91 and 93 in Figure 7 are placed in the local work area. However, among these... If the values of each LS are all the same, such as N, M, L, LL, D (M), which are used only for reference, there is no need to prepare them for the number of processors, so only one LS can be stored in the global area. put. The data of the program shown in FIG. 2 is as shown in FIG.

〔Effect of the invention〕

本発明によれば、可能な限りの有効にローカルメモリを
用いて並列に実行するオブジェクトコードを生成するこ
とができるので、複数のプロセッサを多数台並べて並列
に実行させる時に、主記憶上でデータのコンフリクトの
発生による実行性能の低下の要因を減少させ、並列に動
くオブジェクトの実行効率を向上させることができる。According to the present invention, it is possible to generate object code that is executed in parallel using local memory as effectively as possible, so when multiple processors are lined up and executed in parallel, data is stored in the main memory. It is possible to reduce the causes of deterioration in execution performance due to the occurrence of conflicts and improve the execution efficiency of objects running in parallel.

また、コンパイルが自動的にデータの分割・割付けを行
うので、シングルプロセッサ向きにコーディングしたプ
ログラムを並列計算機で走らせようとするユーザの負担
を軽くすることができるという効果もある。Additionally, since the compilation automatically divides and allocates data, it has the effect of easing the burden on users who try to run a program coded for a single processor on a parallel computer.

[Brief explanation of drawings]

第１図は本発明の一実施例のコンパイラの全体構成図、
第２図は実施例を説明するためのソースプログラムの例
，第３図は変数や配列の分類テーブル、第４図は定義・
参照テーブル、第５図は分割バタンテーブル、第６図は
、変数・配列の分割方法決定処理の概要、第７図は第２
図のソースプログラムのデータを分散メモリ型並列機向
きに分割した結果，第８図は第２図のソースプログラム
を共有メモリ型の並列機向きに分割した結果をそれぞれ
示す図である。１・・・ＦＯＲＴＲＡＮコンパイラ、２・・・ソースプ
ログラム、３・・・オブジェクトコード、４・・・中間
コード、５・・・構文解析、６・・・並列化手段決定処
理，７・・・並列化変換、８・・・最適化処理、９・・
・メモリ割付・レジスタ割当て、１ｏ・・・コード生成
、１１・・・変数・配列のアクセス状態の解析、１２・
・・変数・配列の分類、１３・・・変数・配列の割付方
法の決定、１４・・・中間コード変換、２ｏ・・・並列
化対象ループ、２１，２２・・・ループ、２３，２４，
２５，２６，２７．２８・・・実行文，２９・・・ルー
プ、３０・・・変数・配列の分類テーブル、３１，３２
，３３，３４，３５．３６・・・変数・配列の分類テー
ブルの各フィールド、４０・・・定数・参照テーブル、
４１，４２，４３，４４，４５・・・定義・参照テーブ
ルの各フィールド、５０・・・分割バタンテーブル、５
１，５２，５３，５４，５５，５６，５７，５８，５９
・・・分割バタンテーブルの各フィールド．６０，６１
，６２，６３，６４，６５，６６，６７．６８・・・分
割方法決定のための判定処理、７１，７２，７３，７４
，７５，７６，７７，７８，７９，８０・・・分割・割
付処理、９ｏ・・・メインメモリ、９１，９２，９３，
９４・・・ローカルメモリ、９６・・・メモリ（共有）
メモリ，９７・・・グローバルな領域、９８．９９・・
・プロセッサごとのローカルな作業領域。第図１Ｌ０Ｙ図罵図冨図５ρ ノ菓ア図ＶＵｑ／ｑＺデ４６θ 第図第図FIG. 1 is an overall configuration diagram of a compiler according to an embodiment of the present invention.
Figure 2 is an example of a source program for explaining the embodiment, Figure 3 is a classification table of variables and arrays, and Figure 4 is a definition and
The reference table, Figure 5 is the division button table, Figure 6 is an overview of the process of determining the division method for variables and arrays, and Figure 7 is the second
FIG. 8 is a diagram showing the result of dividing the source program data of FIG. 2 into a shared memory type parallel machine. 1... FORTRAN compiler, 2... Source program, 3... Object code, 4... Intermediate code, 5... Syntax analysis, 6... Parallelization means determination process, 7... Parallelism conversion, 8... optimization processing, 9...
・Memory allocation/register allocation, 1o...Code generation, 11...Analysis of variable/array access status, 12.
...Classification of variables/arrays, 13...Determination of allocation method of variables/arrays, 14...Intermediate code conversion, 2o...Loop to be parallelized, 21, 22...Loop, 23, 24,
25, 26, 27. 28... Executable statement, 29... Loop, 30... Variable/array classification table, 31, 32
, 33, 34, 35. 36... Each field of the variable/array classification table, 40... Constant/reference table,
41, 42, 43, 44, 45...Each field of definition/reference table, 50...Divided button table, 5
1, 52, 53, 54, 55, 56, 57, 58, 59
...Each field of the split button table. 60,61
, 62, 63, 64, 65, 66, 67. 68... Judgment process for determining division method, 71, 72, 73, 74
, 75, 76, 77, 78, 79, 80... Division/allocation processing, 9o... Main memory, 91, 92, 93,
94...Local memory, 96...Memory (shared)
Memory, 97...Global area, 98.99...
・Local work area for each processor. Fig. 1L0 Y Fig.

Claims

[Claims] 1. For a parallel computer composed of a plurality of processors each having a local memory corresponding to each processor,
Performs syntax analysis on a user-written source program coded for a conventional single processor with one memory space, generates intermediate code,
Using this intermediate code as input, it determines the parallel processing method by analyzing the user's parallelization instructions or by automatically detecting parallelism of the compiler, and converts the intermediate code to parallelism so that it is executed on each processor according to this decision. , After this, in the compilation method that performs optimization, allocates computer resources, allocates memory and allocates registers, and generates object code, when performing parallelization conversion on the intermediate word after the above determination, variables are Analyze the access status of variables and arrays, classify them into several types, and decide whether to place variables and arrays in main memory, local memory, or just one place according to this classification.
A compilation method characterized by determining whether there are multiple locations, determining the division of data (variables and arrays) according to this, and converting intermediate code into a form that can be executed by each processor. 2. A source program coded for a conventional single processor with only one memory space written by the user, as opposed to a multiprocessor consisting of multiple processors running in parallel and one shared memory that can be accessed by them. , performs syntax analysis to generate intermediate code, uses this intermediate code as input, determines the parallel processing method by analyzing the user's parallel instructions or automatically detects parallelism during compilation, and generates the intermediate code according to this parallelization method. After the above decision, in the compilation method that parallelizes the code so that it is executed on each processor, performs optimization, allocates computer resources, allocates memory and registers, and generates object code. When performing parallelization conversion on intermediate words, we analyze the access status of variables and arrays, classify them into several types,
In line with this, all or part of variables and arrays are arrayed for the number of processors, and it is decided whether to have copies or not, and according to this, data (variables and arrays)
A compilation method characterized by determining the allocation of the code and converting the intermediate code into a form that can be executed on each processor.