CN106874158A - A kind of heterogeneous system Whole Process power consumption metering method - Google Patents
- Publication number
- CN106874158A
- Authority
- CN
- China
- Prior art keywords
- power consumption
- heterogeneous
- program
- segment
- processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/3006—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3058—Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3058—Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
- G06F11/3062—Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations where the monitored property is the power consumption
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3447—Performance evaluation by modeling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3452—Performance evaluation by statistical analysis
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
Description
Technical field
The invention relates to the field of heterogeneous systems, and in particular to a whole-program power consumption metering method for a heterogeneous system.
Background
Accurate power metering is the basis of power optimization for a specific architecture. Research on power metering methods for heterogeneous systems is still insufficient, and most existing methods are adapted from methods designed for homogeneous systems. A heterogeneous system, however, integrates several different types of processors (mainly main processors and accelerator processors), which complicates metering in three ways. First, the processors have different architectures. Second, the main processor and the accelerator processors are mostly linked through the system bus, so scheduling an accelerator to perform accelerated computation necessarily introduces additional communication operations. Third, the dense processing units of an accelerator processor make its chip run hotter than a general-purpose processor, and because temperature affects static power, the share of static power is gradually increasing. The objects of power metering in a heterogeneous system are therefore more complex than in a homogeneous system.
Traditional power metering models either individual processor components or the whole processor; the system power it considers is independent of how the application executes and is determined by the processor alone. In heterogeneous systems, however, because of limitations of the programming model or the architecture, most parallel applications complete the whole application by having general-purpose microprocessors and accelerators execute different computing segments in turn. As heterogeneous parallel processing technology and its supporting environment mature, more and more parallel programs will have heterogeneous multiprocessors jointly process a single parallel computing segment in order to fully exploit the system's parallel processing. Meanwhile, the main processor and the accelerators in a heterogeneous system mostly exchange data over a PCI interface whose one-way peak bandwidth is only 8 GB/s. In particular, the video memory capacity of accelerator processors such as GPUs can hardly meet the needs of scientific computing applications, which further increases the pressure on data communication bandwidth. For the many data-intensive applications, the data communication overhead between processors contributes considerably to the high power consumption of heterogeneous systems. As integrated circuits enter nanometer processes, static power from leakage current has exceeded dynamic power and become the main source of chip power consumption.
Summary of the invention
To overcome the deficiencies of the prior art, establish a power metering method for heterogeneous systems from the whole-program perspective, effectively reduce system energy consumption, and exploit the performance advantages of heterogeneous systems more efficiently, the invention proposes a whole-program power consumption metering method for a heterogeneous system.
The technical solution of the invention is realized as follows:
A whole-program power consumption metering method for a heterogeneous system, comprising the steps of:
S1: For a single parallel computing segment processed in parallel by heterogeneous multiprocessors, distinguish whether the segment is completed by processors of one type or by processors of several different types; analyze how the program execution time of a homogeneous computing segment affects the dynamic power of that segment, establish the relationship between the power consumption and execution time of a homogeneous computing segment, and obtain a dynamic power representation based on homogeneous program partitioning;
S2: Analyze the condition under which a single computing segment reaches optimal power consumption under a time constraint, establish the relationship between the power consumption and execution time of a heterogeneous computing segment, and obtain a dynamic power representation based on heterogeneous program partitioning;
S3: In a homogeneous computing segment program, taking the parallel data size as the object, analyze how data transfer between the main processor and the accelerator processor affects communication energy, and obtain a representation of the communication energy of a homogeneous computing segment;
S4: In a heterogeneous computing segment program, taking the tasks executed in parallel as the object and using the direct relationship between the actual performance of heterogeneous processors and task characteristics, analyze how the partitioning of multiple data-dependent parallel tasks within a single computing segment affects communication energy, and obtain a representation of the communication energy of a heterogeneous computing segment;
S5: Taking the multi-core processor chip as the object and using the heat conduction characteristics of the processor cores, build a real-time system thermal analysis model with the equivalent RC circuit method and solve for the chip operating temperature;
S6: Analyze the relationship between chip leakage current and static power, perform curve fitting, and obtain leakage current as a function of chip temperature and voltage;
S7: Introduce two operating reference temperatures, express leakage current as a quadratic function of temperature, obtain static power as a function of chip temperature, and establish a static power metering representation based on real-time temperature management.
Further, the curve fitting in step S6 is performed with the HSPICE software.
Compared with the prior art, the beneficial effect of the invention is that it analyzes the execution of a parallel program on a heterogeneous system and models the power consumption of multiple parallel segments, while also accounting for the communication overhead of task communication between the main processor and the accelerator processor and for the leakage current caused by rising chip temperature, so that the power consumption of a heterogeneous parallel system is accounted for accurately from the whole-program perspective.
Description of the drawings
Fig. 1 is a flowchart of the whole-program power consumption metering method for a heterogeneous system according to the invention;
Fig. 2 is a schematic diagram of the overall framework of the whole-program power consumption metering method for a heterogeneous system according to the invention;
Fig. 3 is a classification diagram of heterogeneous parallel programs in the whole-program power consumption metering method for a heterogeneous system according to the invention.
Detailed description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the invention.
Referring to Fig. 1 and Fig. 2, the whole-program power consumption metering method for a heterogeneous system according to the invention comprises three parts:
(1) Establishing the relationship between program execution time and dynamic power for multi-processor, multi-segment partitioning, comprising the steps of:
S1: For a single parallel computing segment processed in parallel by heterogeneous multiprocessors, distinguish whether the segment is completed by processors of one type or by processors of several different types; analyze how the program execution time of a homogeneous computing segment affects the dynamic power of that segment, establish the relationship between the power consumption and execution time of a homogeneous computing segment, and obtain a dynamic power representation based on homogeneous program partitioning;
S2: Analyze the condition under which a single computing segment reaches optimal power consumption under a time constraint, establish the relationship between the power consumption and execution time of a heterogeneous computing segment, and obtain a dynamic power representation based on heterogeneous program partitioning;
(2) Obtaining a formal description of communication power for data transfer and dynamic multi-task allocation, comprising the steps of:
S3: In a homogeneous computing segment program, taking the parallel data size as the object, analyze how data transfer between the main processor and the accelerator processor affects communication energy, and obtain a representation of the communication energy of a homogeneous computing segment;
S4: In a heterogeneous computing segment program, taking the tasks executed in parallel as the object and using the direct relationship between the actual performance of heterogeneous processors and task characteristics, analyze how the partitioning of multiple data-dependent parallel tasks within a single computing segment affects communication energy, and obtain a representation of the communication energy of a heterogeneous computing segment;
(3) Analyzing the mutual influence of real-time chip temperature management and static power under the thermal analysis model, comprising the steps of:
S5: Taking the multi-core processor chip as the object and using the heat conduction characteristics of the processor cores, build a real-time system thermal analysis model with the equivalent RC circuit method and solve for the chip operating temperature;
S6: Analyze the relationship between chip leakage current and static power, perform curve fitting, and obtain leakage current as a function of chip temperature and voltage;
S7: Introduce two operating reference temperatures, express leakage current as a quadratic function of temperature, obtain static power as a function of chip temperature, and establish a static power metering representation based on real-time temperature management.
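The three parts above can be read as three energy terms per computing segment. The following is a minimal sketch of that bookkeeping, assuming the whole-program figure is the plain sum of the dynamic, communication, and static terms over all segments; the text does not state the combination rule, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SegmentEnergy:
    """Energy terms (joules) for one computing segment; field names are illustrative."""
    dynamic: float        # computation energy, the subject of steps S1-S2
    communication: float  # host/accelerator transfer energy, steps S3-S4
    static: float         # leakage energy, steps S5-S7

    def total(self) -> float:
        # Assumption: the three terms are additive within a segment.
        return self.dynamic + self.communication + self.static


def whole_program_energy(segments):
    """Sum the per-segment totals over the whole program."""
    return sum(seg.total() for seg in segments)


program = [SegmentEnergy(1.0, 0.5, 0.25), SegmentEnergy(2.0, 0.0, 0.25)]
program_energy = whole_program_energy(program)
```

Keeping the three terms separate per segment makes it easy to see which part of the method dominates for a given program.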
The invention first abstracts the execution of a parallel program on a heterogeneous system. Here S denotes a serial computing segment; $S = \{s_0, \dots, s_{n-1}\}$ means that the program is divided into n segments according to the parallelism of its computing segments, where $s_i$ is the workload of the i-th computing segment; C denotes a communication segment; $R = \{r_0, \dots, r_{m-1}\}$ means that the heterogeneous parallel system consists of m types of processors; $N_j$ is the number of processors of type $r_j$ ($0 \le j \le m-1$); $v_j$ is the speed at the highest frequency (the amount of work a processor completes per unit time); and P denotes a parallel computing segment (the first parallel computing segment is completed by the main processor, the second by the main processor and the accelerator in parallel, and the third by the accelerator alone). A parallel computing segment completed independently by the main processor or by the accelerator is called a homogeneous computing segment program; one completed jointly by the main processor and the accelerator is called a heterogeneous computing segment program. The execution characteristics of parallel programs are then given symbolic definitions, as shown in Fig. 3.
(1) Dynamic power metering for heterogeneous systems
In a homogeneous computing segment program, if $s_i$ is a serial segment it is completed by a single processor of type $r_i$; if $s_i$ is a parallel segment it is completed by all processors of type $r_i$. The relationship between the dynamic voltage and the processor frequency can be approximated as $f = K V^{\gamma-1}$, where K and γ are process-dependent parameters. Let [equation missing in source]; the dynamic power $P_d$ can then be regarded as proportional to the α-th power of the frequency f, that is, $P_d = K f^{\alpha}$. Let $t_i$ be the execution time of the i-th computing segment, $N_i$ the number of type-$r_i$ processors in the i-th computing segment, and $f_i$ the operating frequency of the $r_i$ processors during the i-th computing segment. The total power consumption of a homogeneous segment program can be expressed as [equation missing in source]
For a program model composed of multiple computing segments, the goal is to minimize the total power consumption of the whole program under a given execution-time constraint T. The time constraint $t_i$ of any computing segment $S_i$ is analyzed as follows. If the i-th computing segment $S_i$ is a serial segment, it is completed by a single processor, and the execution time satisfies [equation missing in source]. If the i-th computing segment $S_i$ is a parallel segment, it is completed in parallel by all processors of type $r_i$, and the execution time $t_i$ satisfies [equation missing in source]. The computational power metering for homogeneous programs can therefore be expressed as: [equation missing in source]
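The homogeneous-segment model can be illustrated numerically. The sketch below is not the patent's formula: it assumes each processor draws $K f^{\alpha}$ as stated in the text, and it additionally assumes that processor speed scales linearly with frequency, so that a processor running at frequency f completes `v_max * f / f_max` work units per unit time. As a side note, under the common CMOS approximation $P_d \propto V^2 f$ (also an assumption here), $f = K V^{\gamma-1}$ would give $\alpha = (\gamma+1)/(\gamma-1)$. All function and parameter names are hypothetical.

```python
def required_frequency(work, n_procs, v_max, f_max, deadline):
    """Lowest frequency that finishes `work` within `deadline`.

    Assumption: per-processor speed scales linearly with frequency,
    i.e. speed(f) = v_max * f / f_max.
    """
    return work * f_max / (n_procs * v_max * deadline)


def segment_dynamic_energy(n_procs, k, alpha, freq, duration):
    """Energy of one segment: n_procs processors, each drawing k * freq**alpha."""
    return n_procs * k * freq ** alpha * duration


# Hypothetical parallel segment: 100 work units on 4 processors, 5 s budget.
f_i = required_frequency(work=100.0, n_procs=4, v_max=10.0, f_max=2.0, deadline=5.0)
e_i = segment_dynamic_energy(n_procs=4, k=2.0, alpha=3.0, freq=f_i, duration=5.0)
```

Under these assumptions the energy for a fixed workload scales as $f^{\alpha-1}$, so for α > 1 the lowest frequency that still meets the segment's time budget minimizes its dynamic energy. This is why the distribution of the total time T across segments matters.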
In a heterogeneous computing program, if $S_i$ is a serial segment it is completed by a single processor of type $r_i$; if $S_i$ is a parallel segment it is completed jointly by processors of all types in the system. This work mainly studies CPU-GPU heterogeneous parallel systems, so only two processor types, CPU and GPU, are considered (assuming all CPUs are the same model and all GPUs are the same model).
Let $t_i$ be the execution time of the i-th computing segment, $N_C$ the number of CPU processors and $N_G$ the number of GPU processors in the i-th computing segment, and $k_C$ and $k_G$ the processor-dependent constants of the CPU and GPU. $f_C$ and $f_G$ are the operating frequencies of the CPU and GPU processors in the i-th computing segment. [symbol missing in source] denotes the amount of work a type-j processor completes per unit time in the i-th computing segment. The total power consumption of a heterogeneous segment program can therefore be expressed as [equation missing in source]
In principle, the optimal-power problem for a heterogeneous computing segment program divides into two subproblems: local power optimality within a computing segment, and overall power optimality of the whole program. The key to the first subproblem is establishing the relationship between the optimal processor power of a computing segment and its execution time; the second allocates execution time among the computing segments on the basis of optimal power within each segment. The power optimization problem for heterogeneous computing segment programs can therefore be reduced to a general multivariate extremum problem, and the computational power of a heterogeneous program can be expressed as: [equation missing in source]
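The first subproblem (local optimality within one segment) can be illustrated with a small search. This is a sketch under stated assumptions, not the patent's closed-form result: it assumes each device type draws $k f^{\alpha}$ per processor, that speed scales linearly with frequency, and that both device types run for the whole time budget of the segment. A plain grid search over the CPU share of the work stands in for the analytic extremum. The dictionary keys and function names are hypothetical.

```python
def hetero_segment_energy(x, work, deadline, cpu, gpu, alpha):
    """Energy when fraction x of the work goes to the CPUs and 1 - x to the GPUs.

    Returns None if either side would need more than its maximum frequency.
    """
    energy = 0.0
    for share, dev in ((x, cpu), (1.0 - x, gpu)):
        # Frequency needed to finish this share exactly at the deadline
        # (assumption: speed = v_max * f / f_max).
        f = share * work * dev["f_max"] / (dev["n"] * dev["v_max"] * deadline)
        if f > dev["f_max"]:
            return None
        energy += dev["n"] * dev["k"] * f ** alpha * deadline
    return energy


def best_split(work, deadline, cpu, gpu, alpha, steps=100):
    """Grid search for the CPU share that minimizes segment energy."""
    candidates = []
    for i in range(steps + 1):
        x = i / steps
        e = hetero_segment_energy(x, work, deadline, cpu, gpu, alpha)
        if e is not None:
            candidates.append((e, x))
    return min(candidates) if candidates else None


# Two identical hypothetical device groups, so an even split should win.
dev = {"n": 2, "v_max": 10.0, "f_max": 2.0, "k": 1.0}
best_energy, best_x = best_split(work=40.0, deadline=2.0, cpu=dev, gpu=dev, alpha=3.0)
```

The second subproblem would then choose the per-segment time budgets so that they sum to at most T, using the per-segment minimum found here as the cost of each budget.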
(2) Communication power metering for heterogeneous systems
In a heterogeneous parallel system, the CPU and the GPU are connected through the PCI-E bus. The PCI-E bus does not support dynamic voltage/frequency scaling, so the execution speed and power overhead of data communication operations are fixed. The PCI-E bus is treated as a special kind of functional unit whose power overhead is $p_{m,0}$ while running and $p_{m,1}$ when idle. Communication operations are also assumed to be non-interruptible, that is, multiple data communication operations must execute sequentially. Because the system bus is used exclusively by a single communication operation, communication overhead is proportional to data size, and the data size depends on how two data-dependent parallel tasks are partitioned.
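The bus model described above (fixed speed, one power level while transferring and another while idle, transfers executed one after another) can be sketched as follows. The formula is an illustrative reading of the paragraph rather than the patent's own expression, and the function names, parameter names, and numbers are hypothetical.

```python
def transfer_time(num_bytes, bandwidth_bytes_per_s):
    """Time for one transfer; proportional to data size at a fixed bus speed."""
    return num_bytes / bandwidth_bytes_per_s


def bus_energy(transfer_sizes, bandwidth_bytes_per_s, p_busy, p_idle, total_time):
    """Bus energy over `total_time` seconds.

    p_busy plays the role of p_{m,0} (running) and p_idle that of p_{m,1} (idle).
    Transfers are assumed to run sequentially and never overlap.
    """
    busy = sum(transfer_time(size, bandwidth_bytes_per_s) for size in transfer_sizes)
    if busy > total_time:
        raise ValueError("transfers do not fit in the given time window")
    return p_busy * busy + p_idle * (total_time - busy)


# Hypothetical numbers: two 4 GB transfers over an 8 GB/s link in a 3 s window.
energy = bus_energy([4e9, 4e9], 8e9, p_busy=10.0, p_idle=2.0, total_time=3.0)
```

Because the bus speed is fixed, the only lever on this term is the amount of data moved, which is determined by how the tasks are partitioned.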
① Communication power metering for homogeneous program segments
In a homogeneous computing segment program, communication power is mainly the overhead of transferring input data from the CPU to GPU memory and of storing output data from the GPU back to CPU memory. Let the execution time of a communication operation [symbol missing in source] denote the data communication overhead between the CPU and the GPU, and let $t_{m,0}$ denote the time the PCI-E bus spends in the idle state; the communication power of a homogeneous program can then be expressed as [equation missing in source]
② Communication power metering for heterogeneous program segments
In a heterogeneous computing segment program, communication power is mainly the communication overhead produced by partitioning multiple data-dependent parallel tasks within a single computing segment. Because the actual performance of heterogeneous processors is directly related to task characteristics, different tasks tend to be given different partitioning strategies, which introduces considerable communication overhead. Let [symbol missing in source] denote the communication overhead between task v under partitioning z and task v' under partitioning z'; the communication power of a heterogeneous program can then be expressed as [equation missing in source]
(3) Static power metering for heterogeneous systems
To study the heat conduction characteristics of the processor cores, thermal analysis is modeled with the equivalent RC circuit method. The model solves for the operating temperature with the following formula: [equation missing in source]
T and $T_{amb}$ are the chip temperature and the ambient temperature, P is the chip power at time t, and $R_{th}$ and $C_{th}$ are the equivalent thermal resistance and the equivalent thermal capacitance. The processor's system state is either active or sleeping. The processor executes tasks only in the active state; otherwise it enters the sleep state to reduce power consumption and lower its temperature. Static power in the active state can be expressed as
$P_{static} = N_{gate} I_{leakage} V_{dd}$ (10)
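A short sketch of how an equivalent RC thermal model and equation (10) might be evaluated together. The temperature formula itself is not preserved in this text, so the sketch assumes the standard lumped form $C_{th}\,dT/dt = P - (T - T_{amb})/R_{th}$ and steps it exactly for a power that is constant over each time step. The function names and numbers are hypothetical.

```python
import math


def next_temperature(t_now, t_amb, power, r_th, c_th, dt):
    """Advance the chip temperature by dt seconds.

    Assumed model: C_th * dT/dt = P - (T - T_amb) / R_th with P constant over dt,
    whose exact solution relaxes toward T_amb + R_th * P with time constant R_th * C_th.
    """
    t_steady = t_amb + r_th * power
    return t_steady + (t_now - t_steady) * math.exp(-dt / (r_th * c_th))


def static_power(n_gate, i_leakage, v_dd):
    """Equation (10): P_static = N_gate * I_leakage * V_dd."""
    return n_gate * i_leakage * v_dd


# Hypothetical chip: starts at ambient (300 K), dissipates 10 W for 1 s.
temp = next_temperature(t_now=300.0, t_amb=300.0, power=10.0, r_th=2.0, c_th=1.0, dt=1.0)
p_static = static_power(n_gate=1e6, i_leakage=1e-9, v_dd=1.0)
```

In a real-time setting the two functions would be called in a loop: the temperature from one step determines the leakage current, and hence the static power, used in the next.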
With curve fitting performed in the HSPICE software, the leakage current as a function of temperature and voltage can be written as [equation missing in source]
where A, B, α, β, γ, δ, μ, η are empirical parameters determined by the manufacturing process. When the operating temperature T varies within the normal range of 300 K to 380 K, [term missing in source] fluctuates very little. Once $V_{dd}$ is given, the leakage current is further simplified into a quadratic function of temperature by introducing two reference temperatures $T_H$ and $T_L$. The static power associated with leakage current can then be formalized as [equation missing in source]
where [definitions missing in source]
Specific embodiments of the invention have been described above. The invention is not limited to these specific embodiments; a person skilled in the art may make various changes or modifications within the scope of the claims without affecting the substance of the invention.
The above is a preferred embodiment of the invention. A person of ordinary skill in the art may make further improvements and refinements without departing from the principle of the invention, and these improvements and refinements also fall within the protection scope of the invention.
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710020074.5A CN106874158A (en) | 2017-01-11 | 2017-01-11 | A kind of heterogeneous system Whole Process power consumption metering method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710020074.5A CN106874158A (en) | 2017-01-11 | 2017-01-11 | A kind of heterogeneous system Whole Process power consumption metering method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106874158A true CN106874158A (en) | 2017-06-20 |
Family
ID=59159228
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710020074.5A Pending CN106874158A (en) | 2017-01-11 | 2017-01-11 | A kind of heterogeneous system Whole Process power consumption metering method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106874158A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106293003A (en) * | 2016-08-05 | 2017-01-04 | 广东工业大学 | A kind of heterogeneous system dynamic power consumption optimization method based on AOV gateway key path query |
Non-Patent Citations (2)
Title |
---|
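ZHUOWEI WANG: "An architecture-level graphics processing unit energy model", Wiley Online Library * |
WANG GUIBIN (王桂彬): "Research on Key Technologies of Software Low-Power Optimization for Large-Scale Heterogeneous Parallel Systems" (大规模异构并行系统软件低功耗优化关键技术研究), China Doctoral Dissertations Full-text Database, Information Science and Technology * |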
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107818040B (en) * | 2017-09-28 | 2021-09-21 | 华南师范大学 | Analysis method, system and device suitable for guiding parallelization of correlation algorithm |
WO2021227418A1 (en) * | 2020-05-11 | 2021-11-18 | 深圳先进技术研究院 | Task deployment method and device based on multi-board fpga heterogeneous system |
CN113467936A (en) * | 2021-06-16 | 2021-10-01 | 上海行健职业学院 | Processor scale selection method based on parallel computing time shortest estimation model |
CN114546666A (en) * | 2022-04-25 | 2022-05-27 | 沐曦科技(北京)有限公司 | Power consumption distribution method based on multiple computing devices |
CN114546666B (en) * | 2022-04-25 | 2022-07-19 | 沐曦科技(北京)有限公司 | Power consumption distribution method based on multiple computing devices |
CN117349029A (en) * | 2023-12-04 | 2024-01-05 | 浪潮电子信息产业股份有限公司 | Heterogeneous computing system, energy consumption determining method and device, electronic equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170620 |