CN103914121B - Multicomputer system and method and device for optimizing power consumption of same - Google Patents
- Publication number
- CN103914121B (application CN201310001368.5A)
- Authority
- CN
- China
- Prior art keywords
- test point
- data processing
- computer system
- power consumption
- processing devices
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Power Sources (AREA)
Abstract
The present invention provides a multi-machine system and a method and device for optimizing the power consumption of the multi-machine system. Within a determined range of the number of data processing devices used to adjust the power consumption of the multi-machine system, two test points, a first test point and a second test point, are determined for each search. After each search, the interval on the side of the test point with the larger power consumption value is discarded, and that test point becomes a boundary of the core-count range for the next search. This effectively narrows the core-count search range and improves the efficiency of power consumption optimization for the multi-machine system.
Description
Technical field
The invention relates to computer energy-saving technology, and in particular to a multi-machine system and a method and device for optimizing the power consumption of the multi-machine system.
Background
Processor power management has been an important topic in processor design in recent years. With advances in deep-submicron process technology, leakage power has become an integral part of processor power consumption. A series of techniques for reducing processor leakage power (static power) has therefore emerged.
The first method widely used to reduce processor leakage power is Dynamic Power Management (DPM). DPM first shuts down idle processors or processor cores to eliminate unnecessary power overhead, and then reduces power further by migrating tasks away from lightly loaded processors or cores and shutting those down as well.
Second, with the widespread adoption of Dynamic Voltage and Frequency Scaling (DVFS), DVFS can be combined with DPM: while lightly loaded processors or cores are shut down, the voltage and frequency of the remaining working cores are raised, which saves power without sacrificing performance.
However, with performance held constant, a higher frequency with fewer cores does not necessarily yield lower power consumption.
On the one hand, raising the frequency causes a super-linear increase in power consumption, so when the increase in dynamic power caused by the higher frequency exceeds the reduction in static power gained by shutting down cores, the total power consumption of the processor rises. On the other hand, a highly parallel program can maintain its performance by running on more cores at a lower frequency, but when the increase in static power from the additional cores exceeds the reduction in dynamic power from the lower frequency, total power consumption also rises. Therefore, when DVFS is used and the number of execution cores is adjusted with performance held constant, power consumption first decreases and then increases as the number of cores grows.
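The decrease-then-increase trend can be illustrated with a toy model. The sketch below is not taken from the patent: it assumes perfect parallel scaling (each of n cores runs at frequency work / n), dynamic power per core proportional to the cube of frequency, and a constant static power per active core. The names `total_power`, `work`, `static_per_core` and `dyn_coeff` are hypothetical.

```python
def total_power(cores, work=16.0, static_per_core=1.0, dyn_coeff=1.0):
    """Toy total power when `cores` cores share a fixed workload.

    Illustrative assumptions only: perfect parallel scaling, cubic
    dynamic power in frequency, constant static power per active core.
    """
    freq = work / cores  # frequency each core needs to keep performance fixed
    return cores * (static_per_core + dyn_coeff * freq ** 3)


# Evaluate the model for 1..32 cores and pick the core count with least power.
powers = {n: total_power(n) for n in range(1, 33)}
best = min(powers, key=powers.get)
```

Under these assumptions the curve falls while the saving in dynamic power outweighs the added static power, and rises afterwards, so it has a single minimum strictly between the two ends of the range.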
Given this behaviour, for multi-core and many-core processors running large-scale multi-threaded programs, finding the optimal number of execution cores and the optimal frequency for each program under a given performance constraint becomes the ultimate goal of power optimization management.
At present, Turbo Boost is a power management method used in mainstream Intel processors. It adjusts processor core frequency through the underlying hardware and can change the frequency of a specified single core while the remaining unloaded cores enter a deep sleep state, balancing power consumption against performance.
However, Turbo Boost is mainly applied to processors with 8 or fewer cores; the mainstream processors it targets have few cores. When the core count grows beyond that scale, using Turbo Boost to shut down lightly loaded cores and raise the frequency of heavily loaded cores is very likely to run into the situation where power consumption first decreases and then increases with the number of cores. In addition, the applications this technology targets have limited parallelism, so the number of threads is usually smaller than the number of cores, and shutting down idle cores does reduce static power. But when the number of program threads exceeds the number of cores, shutting down some cores can increase the load on the others, so that the target performance cannot be guaranteed or power consumption increases.
Another power management method uses hill climbing in the power-versus-core-count space to search for the number of cores that gives the lowest power consumption. The method takes some core count a as a test point and measures the power consumption, then runs on a+1 cores. If the power consumption is greater than that measured on a cores, the next test point is a-1 cores; if it is smaller, the next test point is a+2 cores. Execution continues in this way on either side of a, measuring the power consumption each time, until the core count with the lowest power consumption is found.
The drawback of this method is that the test point approaches the core count with the lowest power consumption slowly, because each test point differs from the previous one by only one core. As processors grow larger, the number of trials that hill climbing must traverse grows considerably and the optimal solution is found slowly. The method therefore scales poorly and cannot quickly approach the power optimum.
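To make the cost of one-core-at-a-time stepping concrete, here is a simplified hill-climbing sketch. It is a generic variant, not the exact procedure of the prior art described above, and `measure` is a hypothetical callback that returns the measured power for a given core count.

```python
def hill_climb(measure, start, lo, hi):
    """Move one core at a time towards lower power.

    Returns (best_core_count, number_of_distinct_measurements).
    """
    cache = {}

    def power(n):
        # Measure each core count at most once.
        if n not in cache:
            cache[n] = measure(n)
        return cache[n]

    current = start
    while True:
        neighbours = [n for n in (current - 1, current + 1) if lo <= n <= hi]
        if not neighbours:
            return current, len(cache)
        better = min(neighbours, key=power)
        if power(better) >= power(current):
            # Neither neighbour improves on the current point.
            return current, len(cache)
        current = better
```

Starting far from the optimum, the number of measurements grows roughly linearly with the distance to it, which is the scalability problem the text describes.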
Summary of the invention
Embodiments of the present invention provide a multi-machine system and a method and device for optimizing the power consumption of the multi-machine system, in order to improve the efficiency of power consumption optimization for the multi-machine system.
In a first aspect, an embodiment of the present invention provides a method for optimizing the power consumption of a multi-machine system, including:
determining a range of the number of data processing devices in the multi-machine system that are used to adjust the power consumption of the multi-machine system, where the minimum of the range is the lower bound and the maximum is the upper bound;
searching within the range to determine a first test point and a second test point, where the first test point and the second test point are both numbers of data processing devices, and the sum of the first test point and the second test point equals the sum of the upper bound and the lower bound;
shutting down all non-executing data processing devices according to the first test point and the second test point, and gradually lowering the frequency of the remaining data processing devices so as to meet the target performance.
In a second aspect, an embodiment of the present invention provides a device for optimizing the power consumption of a multi-machine system, including:
a range determination unit, configured to determine a range of the number of data processing devices in the multi-machine system that are used to adjust the power consumption of the multi-machine system, where the minimum of the range is the lower bound and the maximum is the upper bound;
a test point determination unit, configured to search within the range to determine a first test point and a second test point, where the first test point and the second test point are both numbers of data processing devices, and the sum of the first test point and the second test point equals the sum of the upper bound and the lower bound;
a performance adjustment unit, configured to shut down all non-executing data processing devices according to the first test point and the second test point, and to gradually lower the frequency of the remaining data processing devices so as to meet the target performance.
In a third aspect, an embodiment of the present invention provides a multi-machine system, including a multi-machine system body and the above device for optimizing the power consumption of a multi-machine system, where the device is used to optimize the power consumption of the multi-machine system body.
In the multi-machine system and the method and device for optimizing its power consumption provided by the embodiments of the present invention, two test points, a first test point and a second test point, are determined for each search within the determined range of the number of data processing devices used to adjust the power consumption of the multi-machine system. After each search, the interval on the side of the test point with the larger power consumption value is discarded, and that test point becomes a boundary of the core-count range for the next search. This effectively narrows the core-count search range and improves the efficiency of power consumption optimization for the multi-machine system.
Description of drawings
FIG. 1 is a flowchart of a method for optimizing the power consumption of a multi-machine system provided by an embodiment of the present invention;
FIG. 2 is a flowchart of another method for optimizing the power consumption of a multi-machine system provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a device for optimizing the power consumption of a multi-machine system provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a multi-machine system according to an embodiment of the present invention.
Detailed description
FIG. 1 is a flowchart of a method for optimizing the power consumption of a multi-machine system provided by an embodiment of the present invention. The method provided by this embodiment may be implemented by adding a device to the multi-machine system, for example by providing a functional module within the multi-machine system, or by providing a separate device. As shown in FIG. 1, the method includes:
Step 11. Determine the range of the number of data processing devices in the multi-machine system that are used to adjust the power consumption of the multi-machine system. The minimum of the range is the lower bound and the maximum is the upper bound.
The multi-machine system may be a multi-core processor, a system with multiple processors, or a system with multiple devices whose frequencies can be adjusted independently.
Step 12. Search within the above range to determine a first test point and a second test point. The first test point and the second test point are both numbers of data processing devices, and the sum of the first test point and the second test point equals the sum of the upper bound and the lower bound.
Step 13. Shut down all non-executing data processing devices according to the first test point and the second test point, and gradually lower the frequency of the remaining data processing devices so as to meet the target performance.
In this embodiment, two test points, a first test point and a second test point, are determined for each search within the determined range of the number of data processing devices used to adjust the power consumption of the multi-machine system. After each search, the interval on the side of the test point with the larger power consumption value is discarded, and that test point becomes a boundary of the core-count range for the next search. This effectively narrows the core-count search range and improves the efficiency of power consumption optimization for the multi-machine system.
In step 11 above, determining the range of the number of data processing devices used to adjust the power consumption of the multi-machine system may include determining the initial range of the number of such data processing devices.
Specifically, determining the initial range of the number of data processing devices used to adjust the power consumption of the multi-machine system may include:
if the number of threads running in the multi-machine system is greater than the number of all data processing devices in the multi-machine system, the minimum of the initial range is 0 and the maximum is the number of all data processing devices in the multi-machine system;
if the total number of threads, or the number of running threads, in the multi-machine system is less than the number of all data processing devices in the multi-machine system, the minimum of the initial range is 0 and the maximum is the number of threads running in the multi-machine system.
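The two cases for the initial range can be written as a single function. This is a minimal sketch: the function name and parameters are hypothetical, and the case where the thread count equals the device count, which the text does not cover, is handled here by assumption with the same `min`.

```python
def initial_range(running_threads, total_devices):
    """Return (lower, upper) bounds on the number of devices to search.

    More threads than devices: upper bound is the device count.
    Fewer threads than devices: upper bound is the thread count.
    Equal counts (not covered by the text) give either value, by assumption.
    """
    upper = min(running_threads, total_devices)
    return 0, upper
```

For example, 64 threads on 16 devices gives the range [0, 16], while 8 threads on 16 devices gives [0, 8].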
In step 12 above, searching within the range to determine the first test point and the second test point may include:
computing a=(X+Y)×m and b=(X+Y)×(1-m), where a is the first test point, b is the second test point, 0<m<1, X is the upper bound, Y is the lower bound, and a, b, X and Y are all variables;
executing all threads in the multi-machine system on the a data processing devices, and measuring the power consumption A of the multi-machine system;
executing all threads in the multi-machine system on the b data processing devices, and measuring the power consumption B of the multi-machine system;
comparing A and B;
if |A-B|<w, performing the step of shutting down all non-executing data processing devices according to the first test point and the second test point, where w is a first predetermined value;
if |A-B|>=w, determining whether |a-b|<e;
if |a-b|<e, performing the step of shutting down all non-executing data processing devices according to the first test point and the second test point, where e is a second predetermined value; if |a-b|>=e, determining whether I>=d;
if I>=d, performing the step of shutting down all non-executing data processing devices according to the first test point and the second test point, where I is the loop count with initial value 0, and d is a third predetermined value;
if I<d, A>B and a<b, computing I=I+1, Y=a, a=b, b=X+Y-a and setting A=B, then returning to the step of executing all threads in the multi-machine system on the b data processing devices and measuring the power consumption B of the multi-machine system;
if I<d, A>B and a>b, computing I=I+1, X=a, a=b, b=X+Y-a and setting A=B, then returning to the step of executing all threads in the multi-machine system on the b data processing devices and measuring the power consumption B of the multi-machine system;
if I<d, A<B and a<b, computing I=I+1, X=b, a=a, b=X+Y-a, then returning to the step of executing all threads in the multi-machine system on the b data processing devices and measuring the power consumption B of the multi-machine system;
if I<d, A<B and a>b, computing I=I+1, Y=b, a=a, b=X+Y-a, then returning to the step of executing all threads in the multi-machine system on the b data processing devices and measuring the power consumption B of the multi-machine system.
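The search described in the list above can be sketched as a loop. This is an illustrative reading of the steps, not a definitive implementation: `measure` is a hypothetical callback standing in for "run all threads on n devices at the lowest frequency that meets the target and read the power", the default values of `m`, `w`, `e` and `d` are arbitrary, the test points are kept as real numbers (a real system would have to round them to whole devices), and the case A = B, which the text does not assign to a branch, is treated like A < B.

```python
def search_cores(measure, lower, upper, m=0.382, w=0.01, e=1.0, d=50):
    """Narrow [lower, upper] with two test points per round.

    Returns (a, b, number_of_measurements).
    """
    X, Y = upper, lower
    a = (X + Y) * m
    b = (X + Y) * (1 - m)
    A = measure(a)
    B = measure(b)
    trials = 2
    I = 0  # loop count
    # Stop when the powers are close, the points are close, or I reaches d.
    while abs(A - B) >= w and abs(a - b) >= e and I < d:
        I += 1
        if A > B:
            # a is the worse point: it becomes a boundary, b is kept as new a.
            if a < b:
                Y = a
            else:
                X = a
            a, A = b, B  # reuse the previous measurement of b
        else:
            # b is the worse point: it becomes a boundary, a and A are kept.
            if a < b:
                X = b
            else:
                Y = b
        b = X + Y - a  # keeps a + b equal to X + Y
        B = measure(b)  # only one new measurement per round
        trials += 1
    return a, b, trials
```

Each round after the first costs a single new measurement, because the retained test point carries its measured power over from the previous round.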
In step 13 above, shutting down all non-executing data processing devices according to the first test point and the second test point may include:
computing Z-(a+b)/2, where Z is the total number of data processing devices in the multi-machine system;
shutting down Z-(a+b)/2 data processing devices in the multi-machine system.
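As a small worked sketch of this computation: the text gives Z-(a+b)/2 without saying how a fractional result is handled, so rounding to the nearest whole device and keeping at least one device running are assumptions made here, and the function name is hypothetical.

```python
def devices_to_shut_down(a, b, total_devices):
    """Number of devices to shut down, Z - (a + b) / 2.

    Rounding and the minimum of one active device are assumptions;
    the source does not specify either.
    """
    keep = max(1, round((a + b) / 2))
    return total_devices - keep
```

With final test points of about 19.8 and 20.5 on a 32-device system, this keeps 20 devices and shuts down 12.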
The method for optimizing the power consumption of a multi-machine system is described in further detail below, taking a multi-core processor with N cores as an example.
Each time a test-point core count is executed, if that core count is greater than the core count of the previous test point, some of the threads waiting to run must be migrated to the cores that have been started and are idle; if it is smaller than the core count of the previous test point, the threads running on the cores about to be shut down must be migrated to the other active cores, distributing the load across the active cores as evenly as possible.
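One simple way to spread threads evenly when the number of active cores changes is round-robin assignment. The sketch below is only an illustration: it assumes every thread costs about the same, which the text does not state, and it returns a plan rather than calling any real scheduler or affinity API.

```python
def redistribute(threads, active_cores):
    """Assign thread ids to `active_cores` cores in round-robin order.

    Assumes threads have roughly equal load; returns one list of
    thread ids per active core.
    """
    assignment = [[] for _ in range(active_cores)]
    for index, thread in enumerate(threads):
        assignment[index % active_cores].append(thread)
    return assignment
```

With this scheme the thread counts of any two active cores differ by at most one.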
Referring to FIG. 2, the method for optimizing the power consumption of a multi-core processor includes the following steps:
Step 21. Fully parallelize the program running on the multi-core processor, and determine the initial range [Y, X] of the number of cores used to adjust the power consumption of the multi-core processor, that is, determine the initial values of X and Y. Also set I=0, where I is the loop count.
Here X is the upper bound on the number of cores and Y is the lower bound.
For example, if the total number of threads of the running program is greater than the total number of cores N in the multi-core processor, the initial range of the number of cores is from 0 to the total number of cores, that is, [0, N], so X=N and Y=0.
If the total number of threads of the running program is less than the total number of cores N in the multi-core processor, the initial range of the number of cores is from 0 to the total number of threads, that is, [0, total number of threads], so X=total number of threads and Y=0.
If the total number of threads is less than the total number of cores N, a method similar to conventional Turbo Boost can be used to shut down the idle cores of the multi-core processor, after which step 22 and the subsequent steps are performed on the remaining cores to optimize the power consumption of the multi-core processor.
Step 22. Compute the first test point a=(X+Y)×m and the second test point b=(X+Y)×(1-m), where m is chosen from the interval (0, 1).
Here (Y, X) is the search space for the optimal number of execution cores, and the first test point a and the second test point b are selected within this search space as the trial core counts for the current search.
Step 23. Use a cores of the multi-core processor as execution cores, migrate the threads on the other X-a cores to these a cores so that the X-a cores become non-execution cores, and shut those non-execution cores down. Then gradually lower the frequency of the execution cores until the multi-core processor just meets the target performance, and measure the power consumption A of the multi-core processor at that point. It is assumed that all cores must run at the same frequency, so lowering the frequency lowers it for all cores together.
The target performance is the performance to be maintained once the power management mechanism is started, and it can be set in advance. For example, a fixed target performance can be set, a fixed lower limit on performance can be set, an upper limit on the allowable performance loss can be set, or the directly measured performance of the multi-core processor when the program runs normally on all available cores can be used as the target performance.
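The "gradually lower the frequency until the target is just met" part of step 23 can be sketched as a walk down a list of frequency levels. This is a sketch under assumptions: `frequencies` is a hypothetical list of levels shared by all execution cores, `performance_at` is a hypothetical callback that returns the measured performance at a level, and performance is assumed not to increase when the frequency is lowered.

```python
def lowest_frequency(frequencies, performance_at, target):
    """Return the lowest shared frequency that still meets `target`.

    Walks the levels from highest to lowest and stops at the first
    level that misses the target. Returns None if even the highest
    level does not meet it.
    """
    chosen = None
    for freq in sorted(frequencies, reverse=True):
        if performance_at(freq) < target:
            break
        chosen = freq
    return chosen
```

The power measured at the returned frequency would then serve as the value A (or B in step 24) for that test point.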
Step 24. Use b cores of the multi-core processor as execution cores, migrate the threads on the other X-b cores to these b cores so that the X-b cores become non-execution cores, and shut those non-execution cores down. Then gradually lower the frequency of the execution cores until the multi-core processor just meets the target performance, and measure the power consumption B of the multi-core processor at that point.
Step 25. Determine whether the difference between power consumption A and power consumption B is less than a predetermined value, for example whether |A-B|<w. If so, go to step 28; otherwise, go to step 26. Here w is the predetermined value for the difference between A and B.
Step 26. Determine whether the difference between a and b is less than a predetermined value, for example whether |a-b|<ε. If so, go to step 28; otherwise, go to step 27. Here ε is the predetermined value for this difference (the second predetermined value e in the method above).
Step 27. Determine whether the search has reached the loop-count limit, for example whether I≥d. If so, go to step 28; otherwise, go to step 29. Here d is the predetermined value for the loop count.
Step 28. The number of cores required for the lowest power consumption of the multi-core processor is (a+b)/2, that is, the optimal number of execution cores is determined to be (a+b)/2. Therefore, keep only (a+b)/2 cores as execution cores, shut down all remaining non-execution cores, and gradually lower the frequency of the execution cores until the target performance of the multi-core processor is just met. This completes the power consumption optimization of the multi-core processor; the frequency at this point is the processor frequency required for the lowest power consumption. The procedure ends.
Step 29. Compute I=I+1.
Step 210. Compare power consumption A and power consumption B and determine whether A>B. If so, go to step 211; otherwise, go to step 214.
Step 211. Determine whether a<b. If so, go to step 212; otherwise, go to step 213.
Step 212. Narrow the core-count search interval to (a, X) by setting the lower bound Y to a, that is, Y=a. Make the current b the first test point of the new interval, that is, a=b, and set b=X+Y-a. Step 23 then need not be performed again, since A=B is obtained directly. Then perform step 24 to obtain a new value of B.
Step 213. Narrow the core-count search interval to (Y, a) by setting the upper bound X to a, that is, X=a. Make the current b the first test point of the new interval, that is, a=b, and set b=X+Y-a. Step 23 then need not be performed again, since A=B is obtained directly. Then perform step 24 to obtain a new value of B.
Step 214. Determine whether a<b. If so, go to step 215; otherwise, go to step 216.
Step 215. Narrow the core-count search interval to (Y, b) by setting the upper bound X to b, that is, X=b. Keep a as the first test point of the new interval, that is, a=a, and set b=X+Y-a. The value of A is unchanged, so step 23 need not be performed again. Then perform step 24 to obtain a new value of B.
Step 216. Narrow the core-count search interval to (b, X) by setting the lower bound Y to b, that is, Y=b. Keep a as the first test point of the new interval, that is, a=a, and set b=X+Y-a. The value of A is unchanged, so step 23 need not be performed again. Then perform step 24 to obtain a new value of B.
After step 28 has been performed and the power consumption of the multi-machine system has reached its minimum, steps 21 to 216 may be performed again to re-optimize the power consumption of the multi-machine system in any of the following situations:
a. the number of threads run by the multi-core processor changes;
b. the load of the multi-threaded program changes;
c. the power management mechanism is started;
d. the configured target performance is changed.
In the above embodiment, two test points a and b are determined for each search, so that after each search the interval on the side of the test point with the larger power consumption value is discarded and that test point becomes a boundary of the core-count range for the next search, which effectively narrows the core-count search range. The search is repeated in a loop, narrowing the range further each time, until a stopping condition is reached. If the fraction m used in every search of the loop is the same, the test point that does not become a boundary can still serve as a test point in the next search, and its power consumption can be taken directly from the previous measurement, saving one execution and measurement per loop iteration.
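The remark about reusing a test point when m is constant can be made concrete. The following is the standard golden-section observation rather than something stated in the patent, which gives no value for m. It normalises the search interval to [0, 1] (so Y = 0 and X = 1) and assumes m < 1/2.

```latex
\begin{aligned}
&\text{Test points: } a = m,\quad b = 1 - m,\quad 0 < m < \tfrac12 .\\
&\text{If } a \text{ has the larger power, the new interval is } [m,\,1] \text{ with length } 1 - m .\\
&\text{The retained point } b \text{ lies at relative position }
  \frac{(1-m) - m}{1 - m} = \frac{1 - 2m}{1 - m} .\\
&\text{Requiring this to equal } m \text{ again: }
  \frac{1 - 2m}{1 - m} = m
  \;\Longrightarrow\; m^2 - 3m + 1 = 0
  \;\Longrightarrow\; m = \frac{3 - \sqrt{5}}{2} \approx 0.382 .
\end{aligned}
```

With this value the interval shrinks by the same factor 1 - m ≈ 0.618 in every round. For other values of m the update b = X+Y-a still reuses the retained point and its measurement, but the two points no longer sit at the fractions m and 1 - m of the new interval.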
The technical solution of the above embodiment has the following advantages. First, the search interval to be discarded can be identified quickly in the power-versus-core-count space, and each comparison of test-point power consumption can reuse one value measured in the previous search, which reduces measurement overhead and quickly shrinks the space that remains to be searched. Second, once the core count and frequency that give the lowest power consumption have been found, the processor cores that are not needed to run the program can be shut down to save static power, or used to run other programs, which greatly improves processor utilization and energy efficiency. Finally, a dedicated processor can be designed for a given class of applications: finding the core count and frequency required for the lowest power consumption from the characteristics of the program can guide chip design, which effectively controls chip size and substantially reduces on-chip hardware overhead.
FIG. 3 is a schematic structural diagram of a device for optimizing the power consumption of a multi-computer system according to an embodiment of the present invention. The device of this embodiment implements the method shown in FIG. 1. As shown in FIG. 3, the device includes a range determination unit 31, a test point determination unit 32 and a performance adjustment unit 33.
The range determination unit 31 is configured to determine the number range of the data processing devices in the multi-computer system that are used to adjust the power consumption of the multi-computer system; the minimum of the number range is the lower bound and the maximum is the upper bound.
The test point determination unit 32 is configured to search within the number range and determine a first test point and a second test point. The first test point and the second test point are both numbers of data processing devices, and the sum of the first test point and the second test point equals the sum of the upper bound and the lower bound.
The performance adjustment unit 33 is configured to shut down all non-executing data processing devices according to the first test point and the second test point, and to gradually reduce the frequency of the remaining data processing devices so as to meet the target performance.
The range determination unit 31 may be specifically configured to determine the initial number range of the data processing devices in the multi-computer system that are used to adjust the power consumption of the multi-computer system.
Further, the range determination unit 31 may be specifically configured as follows:
if the number of threads running in the multi-computer system is greater than the number of all data processing devices in the multi-computer system, the minimum of the initial number range is 0 and the maximum is the number of all data processing devices in the multi-computer system;
if the total number of threads, or the number of running threads, in the multi-computer system is less than the number of all data processing devices in the multi-computer system, the minimum of the initial number range is 0 and the maximum is the number of threads running in the multi-computer system.
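The two cases above can be sketched as a single function. The name `initial_range` and its parameters are hypothetical. The text does not state what happens when the thread count equals the device count; the sketch assumes `min()` is acceptable there, since both rules then give the same maximum.

```python
from typing import Tuple


def initial_range(running_threads: int, total_devices: int) -> Tuple[int, int]:
    """Return (lower bound Y, upper bound X) of the initial number range."""
    lower = 0
    # More threads than devices: the maximum is the device count.
    # Fewer threads than devices: the maximum is the thread count.
    upper = min(running_threads, total_devices)
    return lower, upper


print(initial_range(running_threads=32, total_devices=16))  # (0, 16)
print(initial_range(running_threads=8, total_devices=16))   # (0, 8)
```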
Optionally, the test point determination unit 32 may be specifically configured to:
calculate a=(X+Y)×m and b=(X+Y)×(1-m), where a is the first test point, b is the second test point, 0<m<1, X is the upper bound, Y is the lower bound, and a, b, X and Y are all variables;
execute all threads in the multi-computer system with a data processing devices and measure the power consumption A of the multi-computer system;
execute all threads in the multi-computer system with b data processing devices and measure the power consumption B of the multi-computer system;
compare A and B;
if |A-B|<w, proceed to shutting down all non-executing data processing devices according to the first test point and the second test point, where w is a first predetermined value;
if |A-B|>=w, determine whether |a-b|<e;
if |a-b|<e, proceed to shutting down all non-executing data processing devices according to the first test point and the second test point, where e is a second predetermined value; if |a-b|>=e, determine whether I>=d;
if I>=d, proceed to shutting down all non-executing data processing devices according to the first test point and the second test point, where I is the loop count with an initial value of 0 and d is a third predetermined value;
if I<d, A>B and a<b, set I=I+1, Y=a, a=b, b=X+Y-a and A=B, then return to the step of executing all threads in the multi-computer system with b data processing devices and measuring the power consumption B of the multi-computer system;
if I<d, A>B and a>b, set I=I+1, X=a, a=b, b=X+Y-a and A=B, then return to the step of executing all threads in the multi-computer system with b data processing devices and measuring the power consumption B of the multi-computer system;
if I<d, A<B and a<b, set I=I+1, X=b, a=a and b=X+Y-a, then return to the step of executing all threads in the multi-computer system with b data processing devices and measuring the power consumption B of the multi-computer system;
if I<d, A<B and a>b, set I=I+1, Y=b, a=a and b=X+Y-a, then return to the step of executing all threads in the multi-computer system with b data processing devices and measuring the power consumption B of the multi-computer system.
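The procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: `measure` stands in for running all threads on n devices and measuring system power, the default values of m, w, e and d are arbitrary, and the first test point is rounded to an integer with b computed as X+Y-a so that the test points stay whole numbers while a+b=X+Y still holds.

```python
from typing import Callable, Tuple


def search_core_count(upper: int, lower: int,
                      measure: Callable[[int], float],
                      m: float = 0.618, w: float = 0.5,
                      e: int = 1, d: int = 20) -> Tuple[int, int]:
    """Return the final test points (a, b) of the two-point search."""
    X, Y = upper, lower
    a = round((X + Y) * m)      # first test point (rounding is an assumption)
    b = X + Y - a               # second test point, so that a + b == X + Y
    A, B = measure(a), measure(b)
    I = 0                       # loop count
    while True:
        # Stop conditions: powers close enough, points close enough, or loop limit.
        if abs(A - B) < w or abs(a - b) < e or I >= d:
            return a, b
        I += 1
        if A > B:
            # a has the higher power: it becomes a boundary and b takes its place.
            if a < b:
                Y = a
            else:
                X = a
            a, A = b, B         # reuse the previous measurement as the new A
        else:
            # b has the higher power: it becomes a boundary and a is kept.
            if a < b:
                X = b
            else:
                Y = b
        b = X + Y - a
        B = measure(b)          # only one new measurement per iteration


# Hypothetical power curve with its minimum at 12 devices.
print(search_core_count(16, 0, lambda n: (n - 12) ** 2 + 10))  # (12, 12)
```

Each iteration calls `measure` once, which reflects the saving described for a fixed fraction m.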
Optionally, the performance adjustment unit 33 may include a quantity calculation subunit 331 and a device shutdown subunit 332.
The quantity calculation subunit 331 is configured to calculate Z-(a+b)/2, where Z is the total number of data processing devices in the multi-computer system.
The device shutdown subunit 332 is configured to shut down Z-(a+b)/2 data processing devices in the multi-computer system.
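The shutdown count can be illustrated with a short sketch. The function name is hypothetical, and the text does not say how to handle an odd a+b; rounding the number of devices kept upward is an assumption made here so that the result is a whole number.

```python
import math


def devices_to_shut_down(Z: int, a: int, b: int) -> int:
    """Return Z - (a + b) / 2, the number of devices to shut down."""
    keep = math.ceil((a + b) / 2)  # rounding up is an assumption, not from the text
    return Z - keep


print(devices_to_shut_down(Z=16, a=10, b=12))  # 5
print(devices_to_shut_down(Z=16, a=7, b=4))    # 10
```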
FIG. 4 is a schematic structural diagram of a multi-computer system according to an embodiment of the present invention. As shown in FIG. 4, the multi-computer system adds a power consumption optimization function to an existing multi-core processor, multi-processor system, multi-device system or the like; that is, it includes a multi-computer system body 41 and an optimization device 42. The multi-computer system body 41 may be an existing multi-core processor, multi-processor system, multi-device system or the like. The optimization device 42 may be any of the devices for optimizing the power consumption of a multi-computer system shown in FIG. 3, and is used to optimize the power consumption of the multi-computer system body 41.
An embodiment of the present invention further provides a computer program product. The computer program product includes a computer-readable medium, and the readable medium includes a first set of program code for executing the following steps of the method shown in FIG. 1:
determining the number range of the data processing devices in the multi-computer system that are used to adjust the power consumption of the multi-computer system, where the minimum of the number range is the lower bound and the maximum is the upper bound;
searching within the number range and determining a first test point and a second test point, where the first test point and the second test point are both numbers of data processing devices, and the sum of the first test point and the second test point equals the sum of the upper bound and the lower bound;
shutting down all non-executing data processing devices according to the first test point and the second test point, and gradually reducing the frequency of the remaining data processing devices so as to meet the target performance.
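The last step, gradually reducing the frequency of the remaining devices, could be sketched as follows. This is one possible reading only: the function name, the discrete list of frequency levels and the `performance_at` callback are all hypothetical, and the sketch assumes that performance does not increase as frequency decreases.

```python
from typing import Callable, Sequence


def lower_frequency(frequencies: Sequence[int],
                    performance_at: Callable[[int], float],
                    target: float) -> int:
    """Step down through the frequency levels while the target is still met."""
    levels = sorted(frequencies, reverse=True)
    chosen = levels[0]              # start from the highest frequency
    for f in levels[1:]:
        if performance_at(f) < target:
            break                   # the next step down would miss the target
        chosen = f
    return chosen


# Hypothetical levels in MHz and a made-up linear performance model.
print(lower_frequency([2000, 1600, 1200, 800], lambda f: f // 10, target=150))  # 1600
```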
Optionally, determining the number range of the data processing devices in the multi-computer system that are used to adjust the power consumption of the multi-computer system includes:
determining the initial number range of the data processing devices in the multi-computer system that are used to adjust the power consumption of the multi-computer system.
Optionally, determining the initial number range of the data processing devices in the multi-computer system that are used to adjust the power consumption of the multi-computer system includes:
if the number of threads running in the multi-computer system is greater than the number of all data processing devices in the multi-computer system, the minimum of the initial number range is 0 and the maximum is the number of all data processing devices in the multi-computer system;
if the total number of threads, or the number of running threads, in the multi-computer system is less than the number of all data processing devices in the multi-computer system, the minimum of the initial number range is 0 and the maximum is the number of threads running in the multi-computer system.
Optionally, searching within the number range and determining the first test point and the second test point includes:
calculating a=(X+Y)×m and b=(X+Y)×(1-m), where a is the first test point, b is the second test point, 0<m<1, X is the upper bound, Y is the lower bound, and a, b, X and Y are all variables;
executing all threads in the multi-computer system with a data processing devices and measuring the power consumption A of the multi-computer system;
executing all threads in the multi-computer system with b data processing devices and measuring the power consumption B of the multi-computer system;
comparing A and B;
if |A-B|<w, proceeding to shutting down all non-executing data processing devices according to the first test point and the second test point, where w is a first predetermined value;
if |A-B|>=w, determining whether |a-b|<e;
if |a-b|<e, proceeding to shutting down all non-executing data processing devices according to the first test point and the second test point, where e is a second predetermined value; if |a-b|>=e, determining whether I>=d;
if I>=d, proceeding to shutting down all non-executing data processing devices according to the first test point and the second test point, where I is the loop count with an initial value of 0 and d is a third predetermined value;
if I<d, A>B and a<b, setting I=I+1, Y=a, a=b, b=X+Y-a and A=B, then returning to the step of executing all threads in the multi-computer system with b data processing devices and measuring the power consumption B of the multi-computer system;
if I<d, A>B and a>b, setting I=I+1, X=a, a=b, b=X+Y-a and A=B, then returning to the step of executing all threads in the multi-computer system with b data processing devices and measuring the power consumption B of the multi-computer system;
if I<d, A<B and a<b, setting I=I+1, X=b, a=a and b=X+Y-a, then returning to the step of executing all threads in the multi-computer system with b data processing devices and measuring the power consumption B of the multi-computer system;
if I<d, A<B and a>b, setting I=I+1, Y=b, a=a and b=X+Y-a, then returning to the step of executing all threads in the multi-computer system with b data processing devices and measuring the power consumption B of the multi-computer system.
Optionally, shutting down all non-executing data processing devices according to the first test point and the second test point includes:
calculating Z-(a+b)/2, where Z is the total number of data processing devices in the multi-computer system;
shutting down Z-(a+b)/2 data processing devices in the multi-computer system.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art will understand that the technical solutions described in those embodiments may still be modified, and that some or all of their technical features may be replaced by equivalents, without such modifications or replacements causing the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (11)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201310001368.5A CN103914121B (en) | 2013-01-04 | 2013-01-04 | Multicomputer system and method and device for optimizing power consumption of same |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN103914121A (en) | 2014-07-09 |
| CN103914121B (en) | 2017-04-19 |
Family
ID=51039875
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201310001368.5A Expired - Fee Related CN103914121B (en) | 2013-01-04 | 2013-01-04 | Multicomputer system and method and device for optimizing power consumption of same |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN103914121B (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10296067B2 (en) * | 2016-04-08 | 2019-05-21 | Qualcomm Incorporated | Enhanced dynamic clock and voltage scaling (DCVS) scheme |
| CN109471716A (en) * | 2018-09-26 | 2019-03-15 | 努比亚技术有限公司 | Application thread processing method, terminal and computer readable storage medium |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1641534A (en) * | 2004-01-13 | 2005-07-20 | Lg电子株式会社 | Apparatus for controlling power of processor having a plurality of cores and control method of the same |
| CN101010655A (en) * | 2004-09-03 | 2007-08-01 | 英特尔公司 | Coordinating idle state transitions in multi-core processors |
| CN101790709A (en) * | 2007-08-27 | 2010-07-28 | 马维尔国际贸易有限公司 | Dynamic core switches |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7568115B2 (en) * | 2005-09-28 | 2009-07-28 | Intel Corporation | Power delivery and power management of many-core processors |
| US7617403B2 (en) * | 2006-07-26 | 2009-11-10 | International Business Machines Corporation | Method and apparatus for controlling heat generation in a multi-core processor |
Non-Patent Citations (1)
| Title |
|---|
| Fan Dongrui et al., "Godson-T: An Efficient Many-Core Architecture for Parallel Program Executions", Journal of Computer Science and Technology, vol. 24, no. 6, pp. 1061-1073, 2009-11-30 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN103914121A (en) | 2014-07-09 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20170419 Termination date: 20210104 |