CN102156666B

CN102156666B - Temperature optimizing method for resource scheduling of coarse reconfigurable array processor

Info

Publication number: CN102156666B
Application number: CN2011100993472A
Authority: CN
Inventors: 谢雳; 何卫锋; 景乃锋; 绳伟光; 毛志刚
Original assignee: Shanghai Jiao Tong University
Current assignee: Shanghai Jiao Tong University
Priority date: 2011-04-20
Filing date: 2011-04-20
Publication date: 2012-11-28
Anticipated expiration: 2031-04-20
Also published as: CN102156666A

Abstract

A temperature optimization method for resource scheduling of coarse-grained reconfigurable array processors, in the field of computer application technology. A temperature optimization strategy is initialized when hardware resource scheduling is performed during algorithm task compilation. An initial resource scheduling scheme is then selected, in which the computing task nodes are scheduled onto the array resources subject to data-dependency constraints and each computing task node is executed by one array unit of the reconfigurable hardware array processor. Finally, the scheduling positions of the computing tasks are randomly reselected and the thermal equation is re-solved to obtain predicted temperatures, on the basis of which a better resource scheduling scheme is chosen so as to lower the device's temperature distribution during operation. When the set number of optimization iterations or the set optimization temperature is reached, the optimization stops and the current best scheme is taken as the final resource scheduling scheme.

Description

A temperature optimization method for resource scheduling of coarse-grained reconfigurable array processors

Technical field

The invention relates to a method in the field of computer application technology, specifically a temperature optimization method for resource scheduling of coarse-grained reconfigurable array processors.

Background art

Reconfigurable computing systems were developed to fill the gap between ASICs and microprocessors. Coarse-grained reconfigurable systems have attracted growing attention because they offer a good trade-off between performance and functional flexibility. Their applications include image compression, graphics and image processing, data encryption, and DSP transforms.

However, programming a reconfigurable computing system is an extremely tedious and error-prone process, because it requires the programmer to understand the target hardware well, and this has limited the wide acceptance and adoption of the technology. To address this problem, researchers have proposed many approaches, such as dedicated programming languages and high-level compilation, for mapping application algorithms onto the hardware. These semi-automatic and automatic compilation tools make it easier for programmers to map an application algorithm onto hardware and also make the algorithm more portable. An example of a semi-automatic tool is the integrated software platform proposed by Hartej et al. in 2000 for the Morphosys system, in which configuration words for the reconfigurable array (RCA) are generated manually through a graphical interface. An example of a fully automatic approach is the XPP C compiler developed by Joao M.P. in 2006 for the PACT-XPP hybrid reconfigurable processor system, which generates configuration words using pipeline vectorization and temporal partitioning. In all of these compilers, the computing tasks of the algorithm must be carried out by invoking reconfigurable hardware resources.

A search of the prior art shows that existing resource scheduling techniques focus only on hardware utilization and task execution efficiency. For example, the resource scheduling in the compiler presented by Joao M.P. in 2006 in "XPP-VC: A C Compiler with Temporal Partitioning for the PACT-XPP Platform" considers only the computational efficiency of the tasks after temporal partitioning and the number of PEs they occupy, without considering the thermal effects of the tasks running on the device. As chip performance increases, however, on-chip thermal problems become increasingly severe. High temperatures both affect power consumption during operation and degrade device stability. To minimize these negative effects, measures should be taken at the chip design, manufacturing, and software compilation stages. Many strategies have been used to address the problem, such as dynamic thermal management (DTM) in microprocessors and simultaneous multithreading (SMT) in multi-core processors. This technique is the first to take temperature into account in resource scheduling. The aim of the temperature optimization described here is to avoid local hot spots on the chip by balancing the on-chip temperature distribution. It is performed automatically while the mapped tasks are scheduled onto resources during algorithm compilation, so it adds no hardware design cost.

Summary of the invention

In view of the above deficiencies of the prior art, the present invention provides a temperature optimization method for resource scheduling of coarse-grained reconfigurable array processors that alleviates the problem by predicting, during compilation, the temperature of the hardware resources during execution and balancing it. This reduces the risk of local hot spots on the chip without adding any hardware cost. The result of the optimization is a temperature-optimized scheduling scheme through which the computing tasks represented by the algorithm's intermediate representation are carried out on the hardware resources.

The present invention is realized through the following technical solution, which comprises the following steps:

The first step is to initialize the temperature optimization strategy when hardware resource scheduling is performed during algorithm task compilation.

Initializing the temperature optimization strategy means building a thermal model from the physical parameters of the hardware, in which the heat conduction between modules of the device is represented by equivalent thermal conductances, and then reading the power consumption parameters of the device.

The physical parameters of the hardware model include the hardware layout parameters, chip thickness, thermal diffusivity, and specific heat coefficient.

The thermal conductance is calculated as $G = k\frac{t}{A}$, where t is the effective thickness of the module, A is the area of the module, and k is the thermal conductivity.

The second step is to select an initial resource scheduling scheme, that is, an initial scheduling of the computing task nodes onto the array resources subject to data-dependency constraints. Each computing task node is executed by one array unit of the reconfigurable hardware array processor.

The thermal model is then solved for the scheduled computing task nodes to predict the temperature of the hardware during execution under the current resource scheduling scheme: $G \cdot T = P$, where G is the thermal conductance matrix, T is the vector of temperatures to be predicted, and P is the vector of power consumption.
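One way to evaluate $G \cdot T = P$ is sketched below. The text does not say which solver is used, so this assumes plain dense Gaussian elimination with partial pivoting, which keeps the example dependency-free; `solve_temperatures` is a hypothetical name. It also assumes that G already includes a path to a fixed-temperature heat sink, so that G is non-singular and T is the temperature rise above that reference. For a large array a sparse solver would be the more natural choice.

```python
from typing import List

def solve_temperatures(G: List[List[float]], P: List[float]) -> List[float]:
    """Solve G·T = P by Gaussian elimination with partial pivoting (sketch).

    G: n x n thermal conductance matrix (W/K); P: power of each node (W).
    Returns T, the steady-state temperature rise of each node (K).
    """
    n = len(P)
    # Work on an augmented copy [G | P] so the caller's data is not modified.
    A = [[float(x) for x in G[i]] + [float(P[i])] for i in range(n)]
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry of column k up.
        pivot = max(range(k, n), key=lambda r: abs(A[r][k]))
        if abs(A[pivot][k]) < 1e-15:
            raise ValueError("conductance matrix is singular")
        A[k], A[pivot] = A[pivot], A[k]
        # Eliminate column k from the rows below the pivot.
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            for c in range(k, n + 1):
                A[r][c] -= f * A[k][c]
    # Back substitution on the upper-triangular system.
    T = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(A[k][c] * T[c] for c in range(k + 1, n))
        T[k] = (A[k][n] - s) / A[k][k]
    return T
```

For two nodes joined by a lateral conductance of 1 W/K, each with 2 W/K to the heat sink, and 2 W dissipated in node 0, the powered node ends up 0.75 K above the sink and its neighbour 0.25 K above it.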

The third step is to randomly reselect the scheduling positions of the computing tasks and re-solve the thermal equation to obtain the predicted temperatures, on the basis of which a better resource scheduling scheme is selected to lower the device's temperature distribution during operation. When the set number of optimization iterations or the set optimization temperature is reached, the optimization stops and the current best scheme is taken as the final resource scheduling scheme.
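The third step can be read as a random local search. The sketch below assumes its simplest variant: a greedy loop that keeps a candidate only if its predicted peak temperature is lower, and that stops at a maximum iteration count or once a target temperature is reached. It is not necessarily the authors' exact acceptance rule; a simulated-annealing rule, for instance, could be swapped in. `optimize_schedule`, `propose`, and `peak_temperature` are hypothetical names: `propose` is expected to return a dependency-preserving re-placement (or `None` when no move is possible), and `peak_temperature` would build P, solve G·T = P, and return the maximum of T.

```python
import random
from typing import Callable, Dict, Optional, Tuple

Placement = Dict[str, Tuple[int, int]]   # task node -> (row, col) on the array

def optimize_schedule(initial: Placement,
                      propose: Callable[[Placement, random.Random], Optional[Placement]],
                      peak_temperature: Callable[[Placement], float],
                      max_iters: int = 1000,
                      target_temp: float = float("-inf"),
                      seed: int = 0) -> Tuple[Placement, float]:
    """Greedy random search: keep the coolest placement seen so far.

    Stops after max_iters proposals, or as soon as the best predicted peak
    temperature is at or below target_temp.
    """
    rng = random.Random(seed)
    best, best_t = dict(initial), peak_temperature(initial)
    for _ in range(max_iters):
        if best_t <= target_temp:
            break                                # optimization temperature reached
        candidate = propose(best, rng)           # random re-placement of tasks
        if candidate is None:
            continue                             # no legal move this time
        t = peak_temperature(candidate)          # re-solve the thermal equation
        if t < best_t:                           # accept only improvements
            best, best_t = candidate, t
    return best, best_t
```

Accepting only improvements makes the loop a pure descent, so it can stall in a local optimum; running it from several seeds, or relaxing the acceptance rule, is a common remedy.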

The invention is based on the following principle: when the array has spare hardware resources, computing tasks that run hot can, without changing the data dependencies, be migrated onto the spare resources or onto reconfigurable units with a higher heat-dissipation rate. The best migration strategy is found by an optimization search. The search first proceeds along the row direction of the two-dimensional array; when the scheduling of a row is complete, the search proceeds along the column direction of the array. During the column search, the nodes in each row are translated within the array as a whole, so the relative positions of the task nodes within a row remain unchanged.
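The column-direction step moves all the nodes of one row rigidly. The text does not pin down whether such a move takes the row's nodes to a different row or slides them sideways, so the hypothetical helper `shift_row_group` below accepts both a row offset and a column offset; either reading can then be tried. It only checks array bounds and collisions with other tasks; re-checking the data dependencies is left to the caller.

```python
from typing import Dict, Optional, Tuple

Placement = Dict[str, Tuple[int, int]]   # task node -> (row, col)

def shift_row_group(placement: Placement, row: int, d_row: int, d_col: int,
                    n_rows: int, n_cols: int) -> Optional[Placement]:
    """Rigidly translate every task placed on `row` by (d_row, d_col).

    Relative positions of the moved tasks are preserved. Returns a new
    placement, or None if a task would leave the array or land on a cell
    occupied by a task that is not being moved.
    """
    moving = {t for t, (r, _) in placement.items() if r == row}
    occupied = {pos for t, pos in placement.items() if t not in moving}
    result = dict(placement)
    for t in moving:
        r, c = placement[t]
        nr, nc = r + d_row, c + d_col
        if not (0 <= nr < n_rows and 0 <= nc < n_cols) or (nr, nc) in occupied:
            return None
        result[t] = (nr, nc)
    return result
```

Because every moved task receives the same offset, the moved tasks can never collide with one another, so only cells held by the other tasks need to be checked.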

Compared with existing thermal optimization techniques, the invention has the following advantages: 1) it predicts the device temperature during execution while scheduling resources, and selects the best scheduling scheme based on the prediction; 2) the temperature optimization is performed automatically during resource scheduling and requires no additional hardware resources, saving hardware design cost; 3) the temperature optimization is performed statically by the compiler before hardware execution, so no complex dynamic management strategy is needed.

Brief description of the drawings

Figure 1: Architecture of the reconfigurable hardware array processor.

Figure 2: Flow of temperature optimization during resource scheduling.

Figure 3: Example of a simplified thermal model of the device.

Figure 4: Example of the array temperature distribution before and after optimization.

Detailed description of the embodiments

An embodiment of the invention is described in detail below. The embodiment is implemented on the basis of the technical solution of the invention and gives a detailed implementation and specific operating procedure, but the scope of protection of the invention is not limited to the following embodiment.

Embodiment

As shown in Figure 1, the reconfigurable array hardware resource in this embodiment is a two-dimensional array of homogeneous reconfigurable units. Array units in adjacent rows can exchange data, whereas array units within the same row cannot.
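The connectivity rule of this embodiment can be expressed as a feasibility check on a placement. The sketch below assumes that any RC in a neighbouring row is reachable (the text does not restrict the column) and that data may flow in either direction between adjacent rows; a strictly top-to-bottom array would tighten the test to "consumer row = producer row + 1". `respects_row_links` is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple

Placement = Dict[str, Tuple[int, int]]   # task node -> (row, col)

def respects_row_links(placement: Placement,
                       edges: Iterable[Tuple[str, str]]) -> bool:
    """Check a placement against the adjacent-row communication rule.

    Returns False if two tasks share one RC, or if any DFG edge
    (producer, consumer) joins RCs that are not in neighbouring rows.
    Same-row edges are rejected because units within a row cannot
    exchange data.
    """
    if len(set(placement.values())) != len(placement):
        return False                         # two tasks on the same RC
    for u, v in edges:
        if abs(placement[u][0] - placement[v][0]) != 1:
            return False                     # not in adjacent rows
    return True
```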

This embodiment includes the following steps:

The thermal conductance is calculated as $G = k\frac{t}{A}$, where t is the effective thickness of the module, A is the area of the module, and k is the thermal conductivity.

As shown in Figure 2, the temperature optimization is based on a model of the device. The optimization flow reads the device model parameters and the runtime power consumption parameters of each reconfigurable cell (RC) of the device, and then selects the best of the candidate scheduling schemes on the basis of the thermal model. The selection is an iterative search in which the positions of the task nodes on the array RCs are adjusted repeatedly to obtain the best resource scheduling scheme for the computing tasks.

As shown in Figure 3, the reconfigurable array of the processor is modeled with a simplified thermal model. The model accounts for heat conduction between the RCs in the array and for heat conduction between the device layer and the heat-conducting layer. Lateral thermal resistances represent the heat conduction between RCs, and vertical thermal resistances represent the heat conduction between the device layer and the heat-conducting layer.
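The two-layer model can be assembled into a conductance matrix in the usual nodal way, each conductance being the reciprocal of a lateral or vertical thermal resistance. This sketch assumes a uniform grid, one lateral conductance per layer, one vertical conductance per cell between the device layer and the heat-conducting layer, and an extra conductance from each heat-conducting-layer cell to an ambient reference. That ambient path is not described in the text, but G would be singular without some fixed-temperature reference. `two_layer_conductance` and its parameters are hypothetical.

```python
from typing import List

def two_layer_conductance(rows: int, cols: int, g_dev: float, g_spr: float,
                          g_vert: float, g_amb: float) -> List[List[float]]:
    """Assemble G for a two-layer thermal model of a rows x cols RC array.

    Nodes 0..n-1 are RCs in the device layer (index r*cols + c); nodes
    n..2n-1 are the heat-conducting-layer cells directly beneath them.
    g_dev, g_spr: lateral conductance between neighbours in each layer.
    g_vert: vertical conductance between a device cell and the cell below.
    g_amb: conductance from each heat-conducting cell to ambient (T = 0).
    """
    n = rows * cols
    G = [[0.0] * (2 * n) for _ in range(2 * n)]

    def link(i: int, j: int, g: float) -> None:
        # A conductance g between nodes i and j adds g to both diagonal
        # entries and subtracts g from the two off-diagonal entries.
        G[i][i] += g
        G[j][j] += g
        G[i][j] -= g
        G[j][i] -= g

    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:                     # right-hand neighbour
                link(i, i + 1, g_dev)
                link(n + i, n + i + 1, g_spr)
            if r + 1 < rows:                     # neighbour in the next row
                link(i, i + cols, g_dev)
                link(n + i, n + i + cols, g_spr)
            link(i, n + i, g_vert)               # device layer -> conducting layer
            G[n + i][n + i] += g_amb             # conducting layer -> ambient
    return G
```

When solving with this matrix, P has 2n entries: the RC powers in the first n positions and zeros for the heat-conducting layer.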

As shown in Figure 4, the temperature distribution on the device is compared for one application with and without the temperature optimization. After optimization, the maximum device temperature drops by nearly 7°C, and the range of the temperature distribution during operation narrows by about 10°C. The optimization thus improves the runtime temperature distribution of the device and effectively mitigates the damage caused by thermal effects.
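The two figures of merit quoted for Figure 4, the drop in peak temperature and the narrowing of the temperature range, can be computed directly from two predicted temperature maps. A minimal sketch with hypothetical helper names:

```python
from typing import Sequence, Tuple

def thermal_metrics(temps: Sequence[float]) -> Tuple[float, float]:
    """Return (peak temperature, spread = max - min) of a temperature map."""
    hi, lo = max(temps), min(temps)
    return hi, hi - lo

def improvement(before: Sequence[float],
                after: Sequence[float]) -> Tuple[float, float]:
    """Reduction in peak temperature and in spread from before to after."""
    peak0, spread0 = thermal_metrics(before)
    peak1, spread1 = thermal_metrics(after)
    return peak0 - peak1, spread0 - spread1
```

For example, illustrative maps (60, 50, 45) before and (55, 52, 50) after (not the data behind Figure 4) give a 5°C lower peak and a 10°C narrower range.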

The inputs to the method are a data flow graph (DFG) representing the computing tasks, a physical parameter file used to build the thermal model, and a power consumption parameter file for the reconfigurable array units.

The DFG of the task to be mapped is derived from the algorithm's C source code. Before it can serve as input to resource scheduling, the DFG must be optimized and partitioned into subgraphs that fit the computing capacity of the hardware array.

1. The temperature optimization strategy is applied when the compiler begins resource scheduling.

2. The thermal model parameters of the array hardware are read into the program, and a simplified thermal model of the device is built in preparation for the temperature analysis. In the thermal model, the heat conduction between modules of the device is represented by equivalent thermal conductances, calculated as:

$G = k\frac{t}{A}$

where t is the effective thickness of the module, A is the area of the module, and k is the thermal conductivity. For modeling, the device can be divided into two layers, one representing the device layer and the other the heat-conducting layer (see Figure 3). A more detailed model can also be chosen according to the specific implementation of the device.

3. The power consumption parameters of the array units during execution are then read, providing the power information needed to solve the heat equation.
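The power parameters enter the heat equation through the vector P. The sketch below assumes that the power parameter file yields one power value per task node and that an RC with no task dissipates a constant idle power; both the file contents and the name `power_vector` are assumptions. The node index r*cols + c must match the numbering used when G was assembled.

```python
from typing import Dict, List, Tuple

Placement = Dict[str, Tuple[int, int]]   # task node -> (row, col)

def power_vector(placement: Placement, task_power: Dict[str, float],
                 rows: int, cols: int, idle_power: float = 0.0) -> List[float]:
    """Map a schedule to the right-hand side P of G·T = P.

    Each RC that hosts a task dissipates that task's power; unused RCs
    dissipate idle_power.
    """
    P = [idle_power] * (rows * cols)
    for task, (r, c) in placement.items():
        P[r * cols + c] = task_power[task]
    return P
```

Since P is rebuilt from the placement, every candidate schedule in the search gets its own P while G stays fixed, so only the right-hand side changes between evaluations.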

4. An initial resource scheduling scheme is found that satisfies both the hardware connectivity constraints and the data dependencies between task nodes, assigning each computing task node in the task DFG to an array unit position.
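A simple way to obtain such an initial scheme under this embodiment's row-to-row connectivity is to level the DFG as soon as possible (ASAP) and place level k on array row k. This is an assumed strategy, not necessarily the one the authors use. It rejects DFGs with an edge that skips a level, because such an edge would need routing nodes that the sketch does not insert. `initial_schedule` is a hypothetical name, and the DFG is assumed to be acyclic.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

def initial_schedule(nodes: List[str], edges: List[Tuple[str, str]],
                     rows: int, cols: int) -> Optional[Dict[str, Tuple[int, int]]]:
    """ASAP-level placement: level k goes on array row k, packed from column 0.

    Returns None when the DFG is deeper than the array, a level is wider
    than a row, or an edge skips a level.
    """
    preds = defaultdict(list)
    for u, v in edges:
        preds[v].append(u)
    level: Dict[str, int] = {}

    def depth(n: str) -> int:
        # ASAP level: sources are level 0, others one past their deepest predecessor.
        if n not in level:
            level[n] = 1 + max((depth(p) for p in preds[n]), default=-1)
        return level[n]

    for n in nodes:
        depth(n)
    if any(level[v] - level[u] != 1 for u, v in edges):
        return None                          # edge would cross non-adjacent rows
    placement: Dict[str, Tuple[int, int]] = {}
    next_col = defaultdict(int)              # next free column in each row
    for n in nodes:
        r = level[n]
        if r >= rows or next_col[r] >= cols:
            return None                      # array too shallow or row too narrow
        placement[n] = (r, next_col[r])
        next_col[r] += 1
    return placement
```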

5. An iterative random optimization search starts from the initial scheduling scheme. Throughout the search, the data dependencies between task nodes must be preserved, and the temperature distribution of the current scheduling scheme is predicted by solving the heat equation

$G \cdot T = P$

The search first proceeds along the rows of the array: once the array units of a row are fully occupied, or no further task can be scheduled on that row, scheduling for that row stops. When the scheduling of a row is complete, the search proceeds along the column direction of the array. During the column search, the nodes in each row are translated within the array as a whole, so the relative positions of the task nodes within a row remain unchanged.

6. When the set number of optimization iterations or the set optimization temperature is reached, the optimization terminates and the current best scheme is taken as the final resource scheduling scheme.

Claims

1. A temperature optimization method for coarse-grained reconfigurable array processor resource scheduling, characterized in that, comprising the following steps:

The first step is to initialize the temperature optimization strategy when executing hardware resource scheduling in the algorithm task compilation process;

The second step is to select an initial resource scheduling scheme, that is, an initial scheduling of the computing task nodes onto the array resources subject to data-dependency constraints, each computing task node being executed by one array unit of the reconfigurable hardware array processor;

The third step is to randomly reselect the scheduling positions of the computing tasks and re-solve the thermal equation to obtain the predicted temperatures, on the basis of which a better resource scheduling scheme is selected to lower the device's temperature distribution during operation; when the set optimization temperature is reached, the optimization process stops and the current best scheme is selected as the final resource scheduling scheme; the different scheduling schemes are compared on the basis of the thermal model, the selection being an iterative search; when the array units of a row of the array are fully scheduled, or no further task can be scheduled on that row, task scheduling for that row stops; when the scheduling of a row is complete, the search proceeds along the column direction of the array, and during the column search the nodes in each row are translated within the array as a whole, so the relative positions of the task nodes within the row remain unchanged.

2. The temperature optimization method for coarse-grained reconfigurable array processor resource scheduling according to claim 1, characterized in that the initialization of the temperature optimization strategy refers to: building a thermal model from the physical parameters of the hardware, in which the heat conduction between modules of the device is represented by equivalent thermal conductances, and then reading the power consumption parameters of the device.

3. The temperature optimization method for coarse-grained reconfigurable array processor resource scheduling according to claim 2, wherein the physical parameters of the hardware model include: hardware layout parameters, chip thickness, thermal diffusivity, and specific heat coefficient.

4. The temperature optimization method for resource scheduling of a coarse-grained reconfigurable array processor according to claim 2, wherein the thermal conductance is calculated as: $G = k\frac{t}{A}$.

5. The temperature optimization method for resource scheduling of coarse-grained reconfigurable array processors according to claim 1, wherein the thermal model is solved for the computing task nodes to predict the temperature of the hardware during execution under the current resource scheduling scheme: $G \cdot T = P$, where G represents the thermal conductance matrix, T represents the temperature to be predicted, and P represents the power consumption.