JP6252140B2

JP6252140B2 - Task allocation program and task allocation method

Info

Publication number: JP6252140B2
Application number: JP2013248325A
Authority: JP
Inventors: 康行大野; 要三田; 直樹末安
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2013-11-29
Filing date: 2013-11-29
Publication date: 2017-12-27
Anticipated expiration: 2033-11-29
Also published as: US20150154054A1; JP2015106298A; US9733982B2

Description

The present application relates to a task allocation program and a task allocation method.

Conventionally, in the program corresponding to an application that executes each process, allocation to processor sockets and cores can be specified only at the granularity of a loop or a subroutine at the finest. Finer-grained constructs therefore cannot be placed by the application, and their placement is decided at the operating system (OS) level, which limits how far processing efficiency and processing performance can be improved.

For example, the concept of a task has been introduced in OpenMP (registered trademark), a set of constructs for thread-parallel applications used on processors for High Performance Computing (HPC) and the like. OpenMP allows the application side to select among a plurality of sockets and cores.

JP 2003-6175 A
JP 2008-84009 A

Stephen L. Olivier, Allan K. Porterfield, Kyle B. Wheeler, and Jan F. Prins, "Scheduling task parallelism on multi-socket multicore systems," in Proceedings of the 1st International Workshop on Runtime and Operating Systems for Supercomputers (ROSS '11), 2011.

As described above, the introduction of the task concept lets an application program use information internal to the application to allocate a socket and a core to each construct, which makes it possible to improve processing efficiency and processing performance.

However, conventional algorithms such as simple first-in first-out or round-robin scheduling do not, among other things, take the memory access of each task into account, so the improvement in processing efficiency and processing performance is limited.

In one aspect, an object of the present invention is to improve processing efficiency and processing performance.

In one aspect, a task allocation program causes a computer to execute a process that: generates, from hardware monitor information obtained by executing an application, task-specific profile information that includes, for each task, information indicating whether the task has a parent-child relationship and a memory access rate; allocates, based on the task-specific profile information and in response to a task instruction from the application, the task to a socket or core of a processor in units of the task construct (task syntax) in the program that executes the application; updates the task-specific profile information based on the result of executing the task; and allocates, based on the updated task-specific profile information, the task executed by the next task instruction to a socket or core of the processor in units of the task construct.

Processing efficiency and processing performance can be improved.

A diagram illustrating a functional configuration example of an information processing apparatus.
A diagram illustrating a hardware configuration example of the information processing apparatus.
A diagram illustrating a specific example of a CPU.
A flowchart illustrating an example of processing of the information processing apparatus.
A diagram illustrating a specific example of the task allocation method in the present embodiment.
A diagram illustrating an example of the constructs contained in an application that executes tasks.
A diagram illustrating a schematic example of task allocation in the present embodiment.
A flowchart illustrating an example of processing in the allocation unit.
A diagram (part 1) illustrating an example of allocation to sockets and cores based on the memory access rate.
A diagram (part 2) illustrating an example of allocation to sockets and cores based on the memory access rate.
A diagram (part 1) illustrating an allocation example in the first example.
A diagram (part 2) illustrating an allocation example in the first example.
A diagram (part 3) illustrating an allocation example in the first example.
A diagram (part 1) illustrating an allocation example in the second example.
A diagram (part 2) illustrating an allocation example in the second example.

Embodiments are described in detail below with reference to the accompanying drawings.

<Functional configuration example of information processing apparatus>
FIG. 1 is a diagram illustrating a functional configuration example of the information processing apparatus. The information processing apparatus 10 illustrated in FIG. 1 allocates tasks to the sockets, cores, and the like of a processor in accordance with, for example, a predetermined application, and executes parallel processing. The information processing apparatus 10 in the example of FIG. 1 includes an input unit 11, an output unit 12, a storage unit 13, an allocation unit 14, a process execution unit 15, a profile information measurement unit 16, a calculation unit 17, an update unit 18, a communication unit 19, and a control unit 20.

The input unit 11 accepts various inputs from a user or the like, such as the start and end of various instructions and the input of settings. For example, the input unit 11 accepts the instructions used in the present embodiment, such as an instruction to allocate tasks to sockets and cores, a process execution instruction, a profile information measurement instruction, a calculation instruction, an update instruction, and a communication instruction. The input unit 11 may be, for example, a keyboard or a mouse, a touch panel using a screen, or a microphone, but is not limited to these.

The output unit 12 outputs the content entered through the input unit 11, the results of processing executed based on that input, and the like. For example, the output unit 12 may be a display unit such as a display or monitor when the output is shown on a screen, or an audio output unit such as a speaker when the output is audio, but is not limited to these.

The storage unit 13 stores the various kinds of information needed in the present embodiment. For example, the storage unit 13 stores the instruction information obtained from the input unit 11, one or more tasks set for each application, the allocation of tasks to sockets and cores, hardware monitor information, profile information, and the like. The information stored in the storage unit 13 is not limited to the information described above.

The storage unit 13 reads and writes the stored information at predetermined timings as needed. The storage unit 13 is, for example, a hard disk or a memory, but is not limited to these. The storage unit 13 may be provided as a storage device (disk device) connected through the communication unit 19 so that data can be sent and received.

The allocation unit 14 obtains the degree of memory access and the like for each task (the memory access rate) from past profile information, and allocates the socket, core, or the like for each task on a per-task basis. A task is, for example, a block enclosed by the task construct (task syntax) in an application program (source code) that uses OpenMP. A task unit therefore corresponds, for example, to a task construct unit, but is not limited to this.

For example, based on the memory access rate, the allocation unit 14 responds to a task instruction from the application by allocating the task to a processor socket or core in units of the task construct in the program that executes the application. The allocation unit 14 can also allocate tasks based on the profile information updated by the update unit 18. Task allocation can be performed when the profile information is updated (for example, when execution of a loop or a subroutine ends), but the timing is not limited to this and may be, for example, the execution of each application.

The process execution unit 15 executes the processing (parallel processing and the like) of the tasks that the allocation unit 14 has allocated to the cores and sockets of the processor. For example, the process execution unit 15 runs an executable file corresponding to an application programmed with the task construct defined in OpenMP Application Program Interface (API) Version 3.0, and thereby executes the task processing inside it.

The profile information measurement unit 16 measures task-specific profile information by using the hardware monitor information obtained when the process execution unit 15 executes processing. Hardware monitor information is, for example, information obtained by monitoring the operating status of the hardware while the application is running. Hardware monitor information can be output per task, per loop, per subroutine, per application, and so on, but is not limited to these units.

The measurement items of the hardware monitor information include, for example, "elapsed time", "Million Floating-point Operations Per Second (MFLOPS)", "MFLOPS peak performance ratio", "Million Instructions Per Second (MIPS)", "MIPS peak performance ratio", "memory access throughput (per chip)", "memory access throughput peak performance ratio (per chip)", and "Single Instruction Multiple Data (SIMD) instruction rate", but are not limited to these. For example, the hardware monitor information may also include memory access wait time, cache miss information, and the like.

"Elapsed time" is, for example, the time required to execute the instructions of a task or the like within the elapsed-time measurement range. "MFLOPS" is, for example, the floating-point execution efficiency (the average number of floating-point operations executed per second). The "MFLOPS peak performance ratio" is, for example, the ratio of the measured value to the theoretical peak value of MFLOPS. "MIPS" is the instruction execution efficiency (the average number of instructions executed per second). The "MIPS peak performance ratio" is, for example, the ratio of the measured value to the theoretical peak value of MIPS.

"Memory access throughput (per chip)" is the average amount of data transferred per second between the memory and the Central Processing Unit (CPU) chip. The "memory access throughput peak performance ratio (per chip)" is the ratio of the measured value to the theoretical peak value of the memory access throughput (per chip).

The "SIMD instruction rate" is, for example, the share of SIMD instructions among the instructions executed. A SIMD instruction is, for example, an instruction that processes a plurality of operands with a single instruction.

The hardware monitor information makes it possible to check the execution performance of a program. For example, the closer the MIPS and MFLOPS values are to their respective peak values, the higher the execution performance and arithmetic performance of the program.
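The relationships among these measurement items can be illustrated with a short Python sketch. The function names, the counter readings, and the assumed peak of 2000 MFLOPS are hypothetical and are not taken from the embodiment; only the definitions (average events per second, ratio of measured value to theoretical peak, share of SIMD instructions) follow the description above.

```python
def per_second_in_millions(count, elapsed_seconds):
    """Average number of events per second, in millions (MFLOPS or MIPS)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return count / elapsed_seconds / 1e6


def peak_ratio(measured, theoretical_peak):
    """Ratio of a measured value to its theoretical peak value."""
    if theoretical_peak <= 0:
        raise ValueError("theoretical peak must be positive")
    return measured / theoretical_peak


def simd_rate(simd_instructions, executed_instructions):
    """Share of SIMD instructions among all executed instructions."""
    if executed_instructions <= 0:
        raise ValueError("instruction count must be positive")
    return simd_instructions / executed_instructions


# Hypothetical counter readings for one task (illustrative values only).
elapsed = 4.0  # seconds spent in the measured range
mflops = per_second_in_millions(2_000_000_000, elapsed)
mips = per_second_in_millions(8_000_000_000, elapsed)
mflops_peak_ratio = peak_ratio(mflops, 2000.0)  # assumed peak: 2000 MFLOPS
simd_share = simd_rate(3_000, 12_000)
```

A ratio close to 1.0 corresponds to the case described above in which the measured value approaches its peak.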

The calculation unit 17 calculates the memory access rate of each task from the hardware monitor information and the like described above. A specific example of the calculation method used by the calculation unit 17 is described later.

The update unit 18 updates the task-specific profile information based on the memory access rate and the like calculated by the calculation unit 17. This allows the allocation unit 14 to choose an appropriate socket and core for each task from the history, such as the memory access rates of tasks, gathered on the spot during execution.

The communication unit 19 sends and receives various kinds of information to and from external devices through a communication network such as the Internet or a Local Area Network (LAN). The communication unit 19 can receive information already stored in an external device or the like, and can also send the results of processing by the information processing apparatus 10 to an external device or the like through the communication network.

The control unit 20 controls all components of the information processing apparatus 10. Specifically, the control unit 20 performs the control operations related to task allocation processing based on, for example, instructions that a user or the like enters through the input unit 11. These control operations include, for example, causing the allocation unit 14 to allocate sockets and cores to tasks, causing the process execution unit 15 to execute task processing and the like, causing the profile information measurement unit 16 to measure profile information, causing the calculation unit 17 to calculate the memory access rate, and causing the update unit 18 to update the task-specific profile information, but are not limited to these. The processing in the allocation unit 14, the process execution unit 15, the profile information measurement unit 16, the calculation unit 17, and the update unit 18 can be realized by, for example, executing at least one preset application (program).

In the present embodiment, the information processing apparatus 10 described above can, for example, use run-time profile information to control the allocation destination of tasks in real time. Processing efficiency and processing performance can therefore be improved in parallel processing on an HPC processor or the like.

The information processing apparatus 10 is, for example, a Personal Computer (PC) or a server, but is not limited to these, and the embodiment can be applied to a computer that has a multiprocessor such as an HPC processor.

<Hardware configuration example of information processing apparatus 10>
FIG. 2 is a diagram illustrating a hardware configuration example of the information processing apparatus. The information processing apparatus 10 illustrated in FIG. 2 includes an input device 31, an output device 32, a drive device 33, an auxiliary storage device 34, a main storage device 35, a CPU 36 that performs various kinds of control, and a network connection device 37, which are connected to one another by a system bus B.

The input device 31 includes a keyboard and a pointing device such as a mouse operated by a user or the like, and an audio input device such as a microphone. It accepts input from the user or the like, such as program execution instructions, various kinds of operation information, and information for starting software.

The output device 32 includes a display or the like that shows the windows, data, and so on needed to operate the computer (the information processing apparatus 10) that performs the processing of the present embodiment. Under a control program of the CPU 36, the output device 32 can display the progress and results of program execution.

In the present embodiment, the execution program installed in the computer is provided by, for example, a recording medium 38. The recording medium 38 can be set in the drive device 33. Based on a control signal from the CPU 36, the execution program stored on the recording medium 38 is installed from the recording medium 38 into the auxiliary storage device 34 through the drive device 33.

The auxiliary storage device 34 is a storage unit such as a Hard Disk Drive (HDD) or a Solid State Drive (SSD). Based on control signals from the CPU 36, the auxiliary storage device 34 stores the execution program of the present embodiment (the task allocation program), control programs provided in the computer, and the like, and performs input and output as needed. Based on control signals and the like from the CPU 36, the auxiliary storage device 34 can read necessary information from the stored information and write information to it.

The main storage device 35 stores the execution program and the like that the CPU 36 reads from the auxiliary storage device 34. The main storage device 35 is a Read Only Memory (ROM), a Random Access Memory (RAM), or the like.

Based on a control program such as an OS and the execution program stored in the main storage device 35, the CPU 36 controls the processing of the entire computer, including various computations and data input and output with each hardware component, and thereby realizes each process. The various kinds of information needed during program execution can be obtained from the auxiliary storage device 34, and execution results and the like can also be stored there. The CPU 36 also has a multiprocessor and, using task instructions from an application or the like, executes parallel processing with the processor socket or core specified in units of the task construct.

Specifically, the CPU 36 runs a program installed in the auxiliary storage device 34 based on, for example, a program execution instruction obtained from the input device 31, and performs the processing corresponding to the program on the main storage device 35. For example, by running the task allocation program, the CPU 36 performs processing such as the allocation of sockets or cores by the allocation unit 14, the execution of processing by the process execution unit 15, the measurement of profile information by the profile information measurement unit 16, the calculation of the memory access rate by the calculation unit 17, and the updating of profile information by the update unit 18. The processing performed by the CPU 36 is not limited to these. The results of processing executed by the CPU 36 are stored in the auxiliary storage device 34 as needed.

The network connection device 37 communicates with other external devices through the communication network described above. Based on control signals from the CPU 36, the network connection device 37 connects to the communication network or the like and obtains execution programs, software, setting information, and so on from an external device or the like. The network connection device 37 may also provide an external device with the results obtained by executing a program, or with the execution program of the present embodiment itself.

As described above, the recording medium 38 is a computer-readable recording medium on which the execution program and the like are stored. The recording medium 38 is, for example, a semiconductor memory such as a flash memory, or a portable recording medium such as a CD-ROM or DVD, but is not limited to these.

Installing an execution program (for example, the task allocation program) on the hardware configuration shown in FIG. 2 allows the hardware resources and the software to cooperate to realize the task allocation processing and the like of the present embodiment.

<Specific example of CPU 36 in the present embodiment>
Next, a specific example of the CPU 36 in the present embodiment is described. FIG. 3 is a diagram illustrating a specific example of the CPU. The CPU 36 in the example of FIG. 3 is a multi-core processor that has a plurality of cores in one processor package. For example, the CPU 36 includes a memory 41 and one or more sockets 42 (two sockets, #1 and #2, in the example of FIG. 3). Each socket 42 is a package that contains one or more cores 43 (four cores, #0 to #3, in the example of FIG. 3). The numbers of sockets and cores are not limited to those in the example of FIG. 3. The CPU 36 improves performance through parallel processing on a multi-core processor such as the one shown in FIG. 3.

The memory 41 is a high-speed storage device (for example, a primary cache) provided inside the microprocessor. Keeping frequently used data in the memory 41 reduces accesses to the slower main memory and speeds up processing. For example, when the CPU 36 has a two-level cache memory, it first reads from the primary cache, which is faster and smaller; if the data is not in the primary cache, it reads from the secondary cache, which is slower and larger. The primary cache and the secondary cache of the memory 41 may be the main storage device 35 and the auxiliary storage device 34 described above.
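The two-level lookup order described above can be sketched as follows. This is a toy model written for illustration, not the structure of the memory 41: the class name and hit counters are hypothetical, and capacity limits and eviction are deliberately left out.

```python
class TwoLevelCache:
    """Toy model: primary cache first, then secondary cache, then main memory."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # address -> value (slowest)
        self.primary = {}               # faster, smaller in a real system
        self.secondary = {}             # slower, larger in a real system
        self.hits = {"primary": 0, "secondary": 0, "main": 0}

    def read(self, address):
        if address in self.primary:
            self.hits["primary"] += 1
            return self.primary[address]
        if address in self.secondary:
            self.hits["secondary"] += 1
            value = self.secondary[address]
        else:
            self.hits["main"] += 1
            value = self.main_memory[address]
            self.secondary[address] = value
        # Keep recently used data in the primary cache for the next read.
        self.primary[address] = value
        return value
```

Reading the same address twice goes to main memory once and to the primary cache once, which is the reduction in slow accesses that the paragraph describes.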

The socket 42 is an area that houses a plurality of cores 43. For example, tasks in a parent-child relationship are placed on the same socket to preserve the locality of memory and the like. Accesses are then confined to a single socket, which improves processing efficiency and processing performance.
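As a minimal sketch of this placement rule, the following Python function prefers an idle core on the parent's socket for a child task. The function name and the `free_cores` mapping are hypothetical, and the fallback to another socket when the parent's socket is full is an assumption added for illustration, not something stated in the description.

```python
def place_child_task(free_cores, parent_socket):
    """Return (socket, core) for a child task, preferring the parent's socket.

    free_cores maps a socket id to the list of idle core ids on that socket.
    The chosen core is removed from the list.
    """
    if free_cores.get(parent_socket):
        return parent_socket, free_cores[parent_socket].pop(0)
    # Fallback (assumption): use any other socket that still has an idle core.
    for socket, cores in sorted(free_cores.items()):
        if cores:
            return socket, cores.pop(0)
    return None  # no idle core anywhere


free = {1: [2, 3], 2: [0, 1, 2, 3]}
first = place_child_task(free, parent_socket=1)   # stays on socket 1
second = place_child_task(free, parent_socket=1)  # stays on socket 1
third = place_child_task(free, parent_socket=1)   # socket 1 is full, falls back
```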

The core 43 is the part that performs the arithmetic processing of the computer. In the present embodiment, for example, one task is allocated to one core 43 in units of the task construct.

<Example of processing in information processing apparatus 10>
Next, an example of the processing (task allocation processing) performed by the information processing apparatus 10 in the present embodiment is described with reference to a flowchart. FIG. 4 is a flowchart illustrating an example of the processing of the information processing apparatus.

In the example of FIG. 4, the allocation unit 14 of the information processing apparatus 10 allocates to cores the tasks to be executed by the task instructions of the application program (S01). In S01, the allocation unit 14 uses past task-specific profile information to obtain the degree of memory access and the like for each task, and allocates each task to a core in units of the task construct based on the result. In S01, when there are a plurality of sockets as shown in FIG. 3, for example, the allocation may also determine which socket to use.

Next, the process execution unit 15 of the information processing apparatus 10 executes the processing of the tasks allocated to the cores (S02). The profile information measurement unit 16 of the information processing apparatus 10 acquires the hardware monitor information during execution (S03) and calculates the memory access rate of each task from the acquired hardware monitor information (S04).

Next, the update unit 18 of the information processing apparatus 10 updates the task-specific profile information based on the memory access rate of each task calculated in S04 (S05).

Next, the information processing apparatus 10 determines, for example, whether the processing of all tasks included in the application has finished (S06). If it has not (NO in S06), the processing returns to S01. In that case, S01 allocates each task to a core by using the task-specific profile information updated in S05. If the processing of all tasks has finished (YES in S06), the information processing apparatus 10 ends the processing.
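The loop formed by S01 to S06 can be sketched in Python as follows. This is an illustration under stated assumptions, not the embodiment's implementation: `allocate`, `run`, and `measure_rate` are hypothetical names, execution and hardware monitoring (S02 and S03) are replaced by a caller-supplied function, and the placement policy in `allocate` is a placeholder chosen only to show that later rounds use the updated profile.

```python
def allocate(tasks, profile):
    """S01 placeholder policy: highest known memory access rate gets core 0."""
    ordered = sorted(tasks, key=lambda task: -profile.get(task, 0.0))
    return {task: core for core, task in enumerate(ordered)}


def run(task_batches, measure_rate):
    profile = {}     # task-specific profile: task -> memory access rate
    placements = []
    for tasks in task_batches:                # S06: repeat until all tasks are done
        placement = allocate(tasks, profile)  # S01: allocate with current profile
        # S02/S03: execution and hardware monitoring are stood in for by
        # measure_rate(); S04: one memory access rate per task.
        rates = {task: measure_rate(task) for task in tasks}
        profile.update(rates)                 # S05: update the profile
        placements.append(placement)
    return profile, placements


# Hypothetical measured rates, used only to drive the sketch.
measured = {"A": 0.2, "B": 0.7}
final_profile, history = run([["A", "B"], ["A", "B"]], measured.get)
```

In the first round no profile exists, so the tasks keep their submission order; in the second round task B, whose measured rate is higher, is considered first.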

<Task allocation in the present embodiment>
A specific example of task allocation in the present embodiment is described here. FIG. 5 is a diagram illustrating a specific example of the task allocation method in the present embodiment. In the example of FIG. 5, the CPU 36, which is an example of a multi-core processor such as the one shown in FIG. 3, runs the executable file corresponding to a predetermined application. When allocating tasks to cores in units of the task construct, the CPU 36 obtains the task-specific profile information.

The items of the task-specific profile information in the example of FIG. 5 include, for example, "task", "level", and "memory access rate", but are not limited to these. "Task" is information that identifies a task. "Level" is information indicating the task hierarchy (for example, a parent-child relationship). In the example of FIG. 5, task A has three levels: 0, 1, and 2. "Memory access rate" is the memory access rate for each task and level.

The CPU 36 also acquires the current allocation of running tasks to sockets and cores, using software components (modules) and the like that are contained in a runtime library or similar and are needed when a computer program is executed. The items of the task allocation status include, for example, "socket", "core", "task identification", "level", and "memory access rate", but are not limited to these. "Socket" is information that identifies a socket in the CPU 36. "Core" is information that identifies a core within each socket. The example of FIG. 5 shows that one socket #0 has four cores #0 to #3. "Task" (task identification) is information on the task allocated to each core of each socket. "Level" is information indicating the hierarchy level of the task. "Memory access rate" shows the memory access rate that the calculation unit 17 calculated for each task from the acquired task allocation status.

The CPU 36 uses the calculated memory access rates to newly allocate sockets and cores in syntax units. The syntax unit is, for example, the task syntax unit, but is not limited to this; it may instead be an application unit or a thread (parallel processing) unit.

In this way, in this embodiment the application side calculates the actual memory access rate and, based on that rate, can perform the next socket and core allocation through task instructions.

<Task syntax>
Next, the task syntax in this embodiment is described. In the conventional method, with a processor (CPU 36) configured as shown in FIG. 3, tasks were allocated to a socket chosen at the OS level by a simple first-in first-out or round-robin policy and then executed. The conventional method therefore limited how far processing efficiency and performance could be improved. In this embodiment, OpenMP tasks are broken down to the nesting level of the syntax and allocated to cores. This allows, for example, allocation control that uses the behavior identified for each nesting level of an OpenMP task.

FIG. 6 shows an example of syntax contained in an application that executes tasks. FIG. 7 shows a schematic example of task allocation in this embodiment.

The example of FIG. 6 shows syntax conforming to OpenMP API Version 3.0. When an application that executes tasks is run with a program such as the one in FIG. 6, the conventional method has no knowledge of the memory access rate inside traverse within the task construct. Consequently, regardless of the instructions inside traverse, tasks were either packed onto cores starting from the first core or allocated in round-robin fashion.

For example, given five tasks A to E as shown in FIG. 7(A), conventional allocation packed them starting from the first core (core #0) of socket #0, as shown in (a) of FIG. 7(B). Another conventional method allocated them in round-robin fashion to even out the load, as shown in (b) of FIG. 7(B). As a result, tasks were not allocated to appropriate sockets.

In this embodiment, hardware monitor information is used for tasks A to E to keep a history of information such as each task's memory access rate as observed during execution, and the task-specific profile information shown in FIG. 7(A) is measured. This task-specific profile information is then used to judge whether a task about to run has a high memory access rate, and each task is allocated to a socket and core on that basis, as shown in (c) of FIG. 7(B). This improves the processing efficiency and performance of the processor.

<Example of core allocation>
Next, an example of core allocation by the allocation unit 14 described above is explained with reference to the drawings. FIG. 8 is a flowchart showing an example of processing in the allocation unit. In the example of FIG. 8, the allocation unit 14 first determines whether past task-specific profile information exists (S11). The past task-specific profile information is preferably, for example, task-specific profile information obtained from hardware monitor information for an application run with the same program (task group), but is not limited to this.

If past task-specific profile information exists (YES in S11), the allocation unit 14 acquires from it the degree of memory access of the task (for example, the memory access rate) (S12). Next, the allocation unit 14 selects the core to which the task is allocated by appropriate control (S13). Appropriate control means, for example, allocating a task with a high memory access rate to a core of a socket that holds tasks with low memory access rates, and allocating a task with a low memory access rate to a core of a socket that holds tasks with high memory access rates. In other words, allocation is performed so that the degree of memory access is even across sockets. Appropriate control is not limited to this; for example, tasks in a parent-child relationship may be allocated to the same socket.

If no past task-specific profile information exists (NO in S11), the allocation unit 14 selects the core by a conventional method, for example packing tasks from the first core or allocating them in round-robin fashion to even out the load (S14).
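The S11 to S14 decision can be illustrated with a short Python sketch. The function `select_core` and its parameters are hypothetical names introduced here. "Appropriate control" is simplified to choosing the socket with the lowest accumulated memory access rate, which is one assumed way of evening out memory access across sockets, and round-robin stands in for the conventional fallback.

```python
from typing import Dict, List, Optional, Tuple


def select_core(
    task: str,
    profile: Dict[str, float],
    socket_load: List[float],
    free_cores: List[List[int]],
    rr_cursor: int = 0,
) -> Optional[Tuple[int, int]]:
    """Return (socket, core) for the task, or None when no core is free."""
    # Only sockets that still have a free core can receive the task.
    candidates = [s for s, cores in enumerate(free_cores) if cores]
    if not candidates:
        return None
    if task in profile:  # S11: past profile information exists
        # S12, S13: simplified appropriate control (assumption):
        # pick the socket whose summed memory access rate is lowest.
        socket = min(candidates, key=lambda s: socket_load[s])
    else:  # S11: NO -> S14, conventional round-robin fallback
        socket = candidates[rr_cursor % len(candidates)]
    return socket, free_cores[socket][0]
```

The caller is expected to update `socket_load`, `free_cores`, and `rr_cursor` after each placement; the sketch deliberately leaves that bookkeeping out.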

<Examples of allocation to sockets and cores based on the memory access rate>
Next, examples of allocation to sockets and cores based on the memory access rate in this embodiment are explained with reference to the drawings. FIGS. 9 and 10 show examples of allocation to sockets and cores based on the memory access rate (parts 1 and 2). The example of FIG. 9 shows the case where the tasks to be executed are in a parent-child relationship, and the example of FIG. 10 shows the case where they are not. A parent-child relationship between tasks exists, for example, when the execution result of a parent (level 0) task is used as input data by a child (level 1) task, but is not limited to this.

FIG. 9(A) shows an example of the tasks executed by a two-thread parallel program (levels: 2) and the hardware monitor information after execution. FIG. 9(B) shows an example of task-specific profile information measured using the hardware monitor information of FIG. 9(A). FIG. 9(C) shows an example in which tasks are allocated to the sockets and cores of the CPU 36 according to the task-specific profile information of FIG. 9(B). FIG. 9(D) shows an example of the task allocation status.

The items for the two-thread parallel program shown in FIG. 9(A) include, for example, "task name (level)", "number of threads", "elapsed time (seconds)", "memory access wait (seconds)", and "task type", but are not limited to these. The "task type" in FIG. 9(A) is named as "task name + level + thread number" so that tasks can be distinguished, but the naming is not limited to this.

In the example of FIG. 9, the task to be executed (task_A) consists of tasks in a parent-child relationship (level 0 and level 1). When selecting destination cores in this case, the tasks are placed on the same socket to preserve the locality of the cache (memory 41). Accordingly, as shown in FIG. 9(B), tasks A00, A01, A10, and A11 are allocated to cores #0 to #3 of socket #0, respectively.
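The same-socket placement for parent-child tasks can be written as a small Python sketch. The helper `place_on_one_socket` is a hypothetical name introduced here; it only pairs the tasks with the free cores of one socket and makes no claim about how the embodiment picks that socket.

```python
from typing import Dict, List, Tuple


def place_on_one_socket(
    tasks: List[str], socket: int, free_cores: List[int]
) -> Dict[str, Tuple[int, int]]:
    """Map every task of a parent-child group to a core of the same socket."""
    if len(tasks) > len(free_cores):
        raise ValueError("not enough free cores on the socket")
    # Pair tasks with cores in order; all entries share the same socket id.
    return {task: (socket, core) for task, core in zip(tasks, free_cores)}


# Tasks A00, A01, A10, A11 on cores #0 to #3 of socket #0, as in FIG. 9.
placement = place_on_one_socket(["A00", "A01", "A10", "A11"], 0, [0, 1, 2, 3])
```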

The memory access rate shown in FIG. 9(B) can be obtained from the memory access wait time and the elapsed time shown in FIG. 9(A) as "memory access rate (%) = memory access wait (seconds) / elapsed time (seconds) × 100", but is not limited to this.
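As a minimal sketch, the formula translates directly into Python. The function name `memory_access_rate` and the guard against a non-positive elapsed time are additions made here for illustration; the input values in the example are invented, not taken from FIG. 9(A).

```python
def memory_access_rate(mem_wait_s: float, elapsed_s: float) -> float:
    """Memory access rate (%) = memory access wait (s) / elapsed time (s) x 100."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return mem_wait_s / elapsed_s * 100.0


# Hypothetical measurement: 5 s of memory access wait in a 10 s task.
rate = memory_access_rate(5.0, 10.0)
```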

In the example of FIG. 10, FIG. 10(A) shows an example of the tasks executed by a two-thread parallel program (levels: 1) and the hardware monitor information after execution. FIG. 10(B) shows an example of task-specific profile information measured using the hardware monitor information of FIG. 10(A). FIG. 10(C) shows an example in which tasks are allocated to the sockets and cores of the CPU 36 according to the task-specific profile information of FIG. 10(B). FIG. 10(D) shows an example of the task allocation status.

In the example of FIG. 10, the tasks to be executed (task_B, task_C) are not in a parent-child relationship. When determining appropriate destination cores in this case, allocation is performed, based on the memory access rates and the task allocation status, so that the memory access rate of each socket is even. For example, the allocation unit 14 allocates a task with a high memory access rate to a socket that holds tasks with low memory access rates, and allocates a task with a low memory access rate to a socket that holds tasks with high memory access rates.

In the allocation shown in the example of FIG. 10, tasks B00, C00, and C01 are allocated to socket #0, so according to the task-specific profile information the memory access rate of socket #0 is 20 + 20 + 5 = 45%. Task B01 is allocated to socket #1, so according to the task-specific profile information the memory access rate of socket #1 is 50%.

When performing the allocation described above, it is preferable, for example, to adjust it so that the total memory access rate of the tasks allocated within a socket does not exceed a predetermined value (for example, 80% to 100%), but the allocation is not limited to this.
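The per-socket totals and the upper-limit check can be sketched in Python as follows. The helpers `socket_totals` and `within_limit` are hypothetical names, and the limit of 80% is just one value from the range the text mentions. The totals of 45% and 50% come from the description of FIG. 10; how the 20, 20, and 5 are split among the individual tasks is an assumption made for this illustration.

```python
from typing import Dict, List


def socket_totals(
    placement: Dict[str, int], rates: Dict[str, float], n_sockets: int
) -> List[float]:
    """Sum the memory access rates of the tasks placed on each socket."""
    totals = [0.0] * n_sockets
    for task, socket in placement.items():
        totals[socket] += rates[task]
    return totals


def within_limit(totals: List[float], limit: float = 80.0) -> bool:
    """True when no socket exceeds the configured total memory access rate."""
    return all(total <= limit for total in totals)


# Assumed per-task split; only the socket totals are given in the text.
rates = {"B00": 20.0, "C00": 20.0, "C01": 5.0, "B01": 50.0}
placement = {"B00": 0, "C00": 0, "C01": 0, "B01": 1}
totals = socket_totals(placement, rates, 2)
```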

In this way, in this embodiment the allocation of tasks to sockets and cores can be changed, as shown in FIGS. 9 and 10, depending on whether the tasks to be executed are in a parent-child relationship. This improves processing efficiency and performance.

<Processing from the start to the end of task processing>
Next, the processing of the profile information measurement unit 16 from the start to the end of task processing is described. First, the allocation unit 14 allocates each task to the socket and core determined by the appropriate control in the core selection process described above. The process execution unit 15 starts (executes) the task processing that the allocation unit 14 allocated to the cores task by task. At this point the profile information measurement unit 16 also starts measuring the profile information of the tasks using hardware monitor information, in order to update the task-specific profile information. On the first execution of a task no task-specific profile information exists, so cores are allocated by a conventional method (for example, a simple first-in first-out or round-robin method). Which conventional method is used can be set in advance, for example by the user.

During execution, the profile information and the like may be updated each time a given task syntax unit, such as a loop or subroutine, finishes, or at predetermined time intervals. When task processing ends, measurement of the profile information also ends.

<Processing in the update unit 18>
Next, the processing in the update unit 18 is described. The update unit 18 updates the task-specific profile information using the results of the profile information measurement described above. As a result, in this embodiment the allocation to sockets and cores can be controlled in task syntax units while the program is running.

For example, within one application program the same function (task) is executed many times through loops, subroutine calls, and the like. In this embodiment, therefore, a profiler continuously acquires the task-specific profile information so that sockets and cores are allocated appropriately for the next execution of the task. Depending on program behavior (for example, conditional branches such as if statements), the same task may do different work and impose a different load each time it runs. By continuously acquiring the task-specific profile information and using it as the basis for predicting the next task allocation, as in this embodiment, such varying loads can also be handled. In addition, by using hardware monitor information to allocate tasks to appropriate sockets and cores, this embodiment reduces the load of task execution and improves task performance.

<Examples of the next task allocation>
Next, examples (embodiments) of allocating the next tasks to sockets and cores are explained with reference to the drawings.

<First embodiment>
FIGS. 11 to 13 show allocation examples in the first embodiment (parts 1 to 3). FIG. 11(A) shows an example of the tasks of the (N-1)th task allocation and the hardware monitor information after execution. FIG. 11(B) shows an example of the tasks of the Nth task allocation and the hardware monitor information after execution.

The first embodiment covers a two-thread parallel program (levels: 1) whose tasks have no parent-child relationship and that runs with exclusive use of the sockets and cores (the cores are always unused at task allocation time).

In the first embodiment, as shown in the example of FIG. 12(A), the total memory access rate of each socket immediately before the Nth task allocation is socket #0 = 0% (unused) and socket #1 = 0% (unused).

When a program (application) containing tasks B00, B01, C00, and C01 is executed, each task is allocated to a core. To do so, the allocation unit 14 refers to the past task-specific profile information (FIG. 12(B)) obtained from the past hardware monitor information shown in FIG. 11(A) (for example, the statistical history of runs 1 to N-1, or the history of run N-1 only) and acquires the memory access rates. Based on the acquired memory access rates, the allocation unit 14 allocates the tasks as shown in FIG. 12(C). In the example of FIG. 12(C), allocation is performed so that the memory access rate is even across sockets, and the total memory access rate is 45% for socket #0 and 50% for socket #1. Processing is then executed with the allocation of tasks to sockets and cores shown in FIG. 13(A). In the first embodiment, hardware monitor information is acquired during execution as shown in FIG. 11(B), and the task-specific profile information is updated using it.

Next, in the first embodiment, as shown in the example of FIG. 12(D), the total memory access rate of each socket immediately before the (N+1)th task allocation is socket #0 = 0% (unused) and socket #1 = 0% (unused).

When the program containing tasks B00, B01, C00, and C01 is executed, the allocation unit 14 refers to the updated task-specific profile information shown in FIG. 12(E) (for example, the statistical history of runs 1 to N, or the history of run N only) and acquires the memory access rates. Based on the acquired memory access rates, the allocation unit 14 allocates the tasks as shown in FIG. 12(F). In the example of FIG. 12(F), allocation is performed so that the memory access rate is even across sockets, and the total memory access rate is 70% for socket #0 and 60% for socket #1. Processing is then executed with the allocation of tasks to sockets and cores shown in FIG. 13(B). In the first embodiment, the task-specific profile information is again updated using the hardware monitor information acquired during execution. Consequently, for the (N+2)th and later task allocations, sockets and cores can likewise be specified in task syntax units using the updated task-specific profile information.
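The two history options mentioned above (statistics over runs 1 to N, or run N only) can be sketched in Python. The class `TaskProfile` and its fields are hypothetical, and interpreting "statistical history" as a running mean is an assumption; the description does not state which statistic is used.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Per-task profile entry keeping both a running mean and the last value."""

    runs: int = 0
    mean_rate: float = 0.0  # assumed statistic over runs 1..N
    last_rate: float = 0.0  # rate measured in run N only

    def update(self, rate: float) -> None:
        """Fold the rate measured in the latest run into the profile."""
        self.runs += 1
        self.mean_rate += (rate - self.mean_rate) / self.runs  # incremental mean
        self.last_rate = rate

    def predicted(self, use_history: bool = True) -> float:
        """Rate used as the estimate for the next allocation."""
        return self.mean_rate if use_history else self.last_rate


profile_b00 = TaskProfile()
profile_b00.update(20.0)  # hypothetical rate from run N-1
profile_b00.update(40.0)  # hypothetical rate from run N
```

With these invented measurements, the history-based estimate is the mean of the two runs, while the last-run estimate follows the most recent behavior only.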

<Second embodiment>
FIGS. 14 and 15 show allocation examples in the second embodiment (parts 1 and 2). Like the first embodiment, the second embodiment covers a two-thread parallel program (levels: 1) whose tasks have no parent-child relationship. In addition, the second embodiment shows the case where the sockets and cores are shared with other application programs (for example, other programs are always allocated to some cores at task allocation time). As in the first embodiment, the second embodiment uses the example of tasks and post-execution hardware monitor information shown in FIG. 11.

In the second embodiment, as shown in FIG. 14(A), tasks X, Y, and Z other than the tasks to be executed are present on some of the cores. The total memory access rate immediately before the Nth task allocation is therefore socket #0 = 40% and socket #1 = 40%.

When a program (application) containing tasks B00, B01, C00, and C01 is executed, the allocation unit 14, in order to allocate each task to a core, refers to the past task-specific profile information shown in FIG. 14(B) (for example, the statistical history of runs 1 to N-1) and acquires the memory access rates. Based on the acquired memory access rates, the allocation unit 14 allocates the tasks as shown in FIG. 14(C). In the example of FIG. 14(C), allocation is performed so that the memory access rate is even across sockets. In this example, tasks X, B00, C00, and Y are allocated to socket #0, so all of its cores #0 to #3 are in use, and task C01 is therefore allocated to socket #1. In the end, the total memory access rate is 85% for socket #0 and 95% for socket #1.
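Balancing on sockets that already carry other programs' tasks, with spill-over when a socket has no free core, can be sketched in Python. The function `allocate_balanced` is a hypothetical name, the greedy highest-rate-first order is an assumed heuristic rather than the procedure of the embodiment, and the numbers in the example are invented and do not reproduce FIG. 14.

```python
from typing import Dict, List


def allocate_balanced(
    rates: Dict[str, float], base_load: List[float], free_cores: List[int]
) -> Dict[str, int]:
    """Greedily place tasks on the least-loaded socket that has a free core."""
    load = list(base_load)  # rate already present from other programs
    free = list(free_cores)  # number of unused cores per socket
    placement: Dict[str, int] = {}
    # Place the most memory-intensive tasks first (assumed heuristic).
    for task in sorted(rates, key=lambda t: rates[t], reverse=True):
        candidates = [s for s in range(len(load)) if free[s] > 0]
        if not candidates:
            raise RuntimeError("no free core left on any socket")
        socket = min(candidates, key=lambda s: load[s])
        placement[task] = socket
        load[socket] += rates[task]
        free[socket] -= 1
    return placement


# Invented scenario: socket #0 is idle but has one free core, socket #1 has three.
result = allocate_balanced(
    {"B00": 20.0, "B01": 50.0, "C00": 20.0, "C01": 5.0}, [0.0, 60.0], [1, 3]
)
```

In this scenario only B01 fits on socket #0; the remaining tasks go to socket #1 even though its load is higher, because socket #0 has no free core left.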

Processing is then executed with the allocation of tasks to sockets and cores shown in FIG. 15(A). In the second embodiment, hardware monitor information is acquired during execution, and the task-specific profile information is updated using it.

Next, when the (N+1)th allocation is performed in the second embodiment, tasks U, V, and W other than the tasks to be executed are present on some of the cores, as shown in FIG. 14(D). The total memory access rate immediately before the (N+1)th task allocation is therefore socket #0 = 20% and socket #1 = 15%.

When the program containing tasks B00, B01, C00, and C01 is executed, the allocation unit 14 refers to the updated task-specific profile information shown in FIG. 14(E) (for example, the statistical history of runs 1 to N) and acquires the memory access rates. Based on the acquired memory access rates, the allocation unit 14 allocates the tasks as shown in FIG. 14(F). In the example of FIG. 14(F), allocation is performed so that the memory access rate is even across sockets, and the total memory access rate is 80% for socket #0 and 85% for socket #1. Processing is then executed with the allocation of tasks to sockets and cores shown in FIG. 15(B). In the second embodiment, the task-specific profile information is again updated using the hardware monitor information acquired during execution. Consequently, for the (N+2)th and later task allocations, sockets and cores can likewise be specified in task syntax units using the updated task-specific profile information.

In this way, this embodiment can apply feedback control in real time to the next allocation as well, and can respond appropriately using the task-specific profile information.

In the embodiment described above, the memory access rate is calculated from the memory access wait time and the elapsed time in the hardware monitor information, but this is not limiting; for example, the memory access rate may be calculated based on the cache miss rate and the elapsed time. Cache miss information can be collected from the hardware monitor information. Cache miss information indicates cases in which the CPU 36 searched the cache memory (memory 41) and the data was not there. On a cache miss, the required data is not in the nearby cache memory but in the more distant main memory, so accessing it takes time. The time this access takes is the memory access wait time. Because more cache misses mean a longer memory access wait time, the cache miss rate can be used in place of the memory access wait time to calculate the memory access rate. Accordingly, in this embodiment, when memory accesses are frequent and a task's cache misses are extremely high or extremely low, changing the socket or core to which the task is allocated can greatly improve the efficiency of task execution.

As described above, according to this embodiment the application side can specify sockets and cores in syntax units through task instructions, based on the actual memory access rate. Processing efficiency and performance can therefore be improved.

Furthermore, according to this embodiment, an application that has a task syntax, such as OpenMP, is executed, and the socket and core to which a task is allocated can be selected appropriately from the history of the task's memory access rate and similar data observed during execution. For example, hardware monitor information is used to keep a history of information such as each task's memory access rate as observed during execution; that information is used to judge whether a task about to run has a high memory access rate; and the socket and core for the task are selected appropriately on that basis. In addition, because tasks are allocated to sockets and cores with load distribution appropriate to their memory access rates, the processing time per socket can be shortened.

The embodiments have been described in detail above, but the invention is not limited to any specific embodiment, and various modifications and changes are possible within the scope described in the claims. Some or all of the embodiments described above may also be combined.

With regard to the embodiments above, the following appendices are further disclosed.
(Appendix 1)
A task allocation program for causing a computer to execute a process comprising:
calculating a memory access rate for each task from hardware monitor information obtained by executing an application; and
allocating, based on the calculated memory access rate and in response to a task instruction from the application, the task to a socket or core of a processor in the syntax unit of the task in a program that executes the application.
(Appendix 2)
The task allocation program according to Appendix 1, wherein the socket to which the task is allocated is selected based on the memory access rate for the task obtained from the hardware monitor information and the memory access rates of tasks already allocated to the socket.
(Appendix 3)
The task allocation program according to Appendix 2, wherein a task with a high memory access rate is allocated to a socket that holds a task with a low memory access rate, and a task with a low memory access rate is allocated to a socket that holds a task with a high memory access rate.
(Appendix 4)
The task allocation program according to any one of Appendices 1 to 3, wherein the memory access rate is calculated using memory access wait time or cache miss information included in the hardware monitor information and the elapsed time required to execute the instructions of the task.
(Appendix 5)
The task allocation program according to any one of Appendices 1 to 4, wherein allocation to the socket or the core is performed according to whether the task has a parent-child relationship.
(Appendix 6)
The task allocation program according to Appendix 5, wherein, when the tasks have a parent-child relationship, the plurality of tasks in that relationship are allocated to the same socket, and, when the tasks have no parent-child relationship, the tasks are allocated to the sockets so that the memory access rate is equal across the sockets.
(Appendix 7)
A task allocation method in which an information processing apparatus:
calculates a memory access rate for each task from hardware monitor information obtained by executing an application; and
allocates, based on the calculated memory access rate and in response to a task instruction from the application, the task to a socket or core of a processor in the syntax unit of the task in a program that executes the application.
(Appendix 8)
An information processing apparatus comprising:
calculation means for calculating a memory access rate for each task from hardware monitor information obtained by executing an application; and
allocation means for allocating, based on the memory access rate calculated by the calculation means and in response to a task instruction from the application, the task to a socket or core of a processor in the syntax unit of the task in a program that executes the application.
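The rules in Appendices 4 and 6 can be illustrated with a short sketch. It rests on assumptions that the appendices do not state: the memory access rate is taken to be the ratio of memory access wait time to elapsed time, "equal across the sockets" is approximated by always choosing the socket with the smallest accumulated rate, and the function names `memory_access_rate` and `choose_socket` are hypothetical.

```python
def memory_access_rate(wait_time, elapsed_time):
    """Assumed definition: fraction of elapsed time spent waiting on memory."""
    if elapsed_time <= 0:
        raise ValueError("elapsed_time must be positive")
    return wait_time / elapsed_time


def choose_socket(socket_loads, task_rate, parent_socket=None):
    """Pick a socket index for a task and add its rate to that socket's load.

    socket_loads: list of summed memory access rates, one entry per socket.
    parent_socket: socket of the parent task, or None when the task has
    no parent-child relationship.
    """
    if parent_socket is not None:
        # Parent-child relationship: keep the related tasks on the same socket.
        target = parent_socket
    else:
        # No relationship: use the least-loaded socket so the summed rates
        # even out (ties go to the lowest index).
        target = min(range(len(socket_loads)), key=lambda i: socket_loads[i])
    socket_loads[target] += task_rate
    return target
```

With two empty sockets, a task with rate 0.8 goes to socket 0 and a following task with rate 0.2 goes to socket 1, which also matches the pairing of high-rate and low-rate tasks in Appendix 3. A greedy least-loaded choice is only one way to approximate an even distribution.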

Description of symbols
10 Information processing apparatus
11 Input means
12 Output means
13 Storage means
14 Allocation means
15 Process execution means
16 Profile information measurement means
17 Calculation means
18 Update means
19 Communication means
20 Control means
31 Input device
32 Output device
33 Drive device
34 Auxiliary storage device
35 Main storage device
36 CPU
37 Network connection device
38 Recording medium
41 Memory
42 Socket
43 Core

Claims

1. A task allocation program for causing a computer to execute a process comprising:
generating, for each task, task-specific profile information from hardware monitor information obtained by executing an application, information indicating the presence or absence of a parent-child relationship of the task, and a memory access rate;
updating the task-specific profile information based on a result of executing the task after allocating it, based on the task-specific profile information and in response to a task instruction from the application, to a socket or core of a processor in the syntax unit of the task in a program that executes the application; and
allocating, based on the updated task-specific profile information, the task to be executed by the next task instruction to a socket or core of the processor in the syntax unit of that task.

2. The task allocation program according to claim 1, wherein the socket to which the task is allocated is selected based on the memory access rate included in the updated task-specific profile information.

3. The task allocation program according to claim 1, wherein a task with a high memory access rate is allocated to a socket that holds a task with a low memory access rate, and a task with a low memory access rate is allocated to a socket that holds a task with a high memory access rate.

4. The task allocation program according to any one of claims 1 to 3, wherein allocation to the socket or the core is performed according to whether the task has a parent-child relationship.

5. A task allocation method in which an information processing apparatus:
generates, for each task, task-specific profile information from hardware monitor information obtained by executing an application, information indicating the presence or absence of a parent-child relationship of the task, and a memory access rate;
updates the task-specific profile information based on a result of executing the task after allocating it, based on the task-specific profile information and in response to a task instruction from the application, to a socket or core of a processor in the syntax unit of the task in a program that executes the application; and
allocates, based on the updated task-specific profile information, the task to be executed by the next task instruction to a socket or core of the processor in the syntax unit of that task.