
CN100533344C - Method, system and processor for testing thermal regulation control of real-time software - Google Patents


Info

Publication number
CN100533344C
Authority
CN
China
Prior art keywords
time
temperature
thermal
interrupt
thermal management
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB2007101054865A
Other languages
Chinese (zh)
Other versions
CN101093413A (en)
Inventor
C·R·约翰斯
王帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/425,483 external-priority patent/US7512513B2/en
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Publication of CN101093413A publication Critical patent/CN101093413A/en
Application granted granted Critical
Publication of CN100533344C publication Critical patent/CN100533344C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Test And Diagnosis Of Digital Computers (AREA)
  • Hardware Redundancy (AREA)

Abstract

A thermal management solution is provided that preserves the real-time guarantees of an application even under temperature conditions that require throttling of a processor. A computer-implemented method, a data processing system, and a processor are provided for thermal throttling control for testing of real-time software. At least one thermal control setting is received. A thermal management system is set to a test mode using the at least one thermal control setting. The test mode indicates thermal throttling control using the thermal control setting. The real-time software is executed under the test mode, and a test is performed to determine whether a real-time deadline associated with the real-time software is met under the test mode. The at least one thermal control setting is recorded as a passing thermal control setting in response to the real-time software meeting the real-time deadline.

Description

Method, system and processor for testing thermal regulation control of real-time software

Technical Field

This application relates generally to the use of thermal management. More particularly, it relates to a computer-implemented method, data processing system, and processor for thermal throttling control for testing of real-time software.

Background

The first-generation heterogeneous Cell Broadband Engine™ (BE) processor is a multi-core chip comprising a 64-bit Power PC processor core and eight single-instruction multiple-data (SIMD) synergistic processor cores. It is capable of massive floating-point processing and is optimized for compute-intensive workloads and broadband rich-media applications. A high-speed memory controller and a high-bandwidth bus interface are also integrated on-chip. Cell BE's multi-core architecture and ultra-high-speed communication capabilities deliver greatly improved real-time response, in many cases at 10 times the performance of the latest PC processors. Cell BE is operating-system neutral and supports multiple operating systems simultaneously. Applications for this type of processor range from next-generation game systems with dramatically enhanced realism, to systems that form the hub for digital media and streaming content in the home, to systems used to develop and distribute digital content, to systems that accelerate visualization and supercomputing applications.

Today's multi-core processors are often limited by thermal considerations. Typical solutions include cooling and power management. Cooling may be expensive and/or difficult to integrate. Power management is generally a coarse measure that throttles much, if not all, of the processor in response to a thermal limit being reached. Other techniques, such as thermal management, refine these coarse measures by throttling only the units that exceed a given temperature. However, most thermal management techniques affect the real-time guarantees of an application. It would therefore be beneficial to provide a thermal management solution that gives the processor a way to guarantee the real-time nature of an application even when thermal conditions require the processor to be throttled. When the real-time guarantees cannot be met, the application manager is notified so that corrective action can be taken.

Summary of the Invention

The different aspects of the illustrative embodiments provide a computer-implemented method, data processing system, and processor for thermal throttling control for testing of real-time software. The illustrative embodiments receive at least one thermal control setting. Using the at least one thermal control setting, they set a thermal management system to a test mode, where the test mode indicates thermal throttling control using the thermal control setting. The illustrative embodiments execute the real-time software under the test mode and test whether a real-time deadline associated with the real-time software is met under that mode. In response to the real-time software meeting the real-time deadline, the at least one thermal control setting is recorded as a passing thermal control setting. In response to the real-time software failing to meet the real-time deadline, the at least one thermal control setting is recorded as a failing thermal control setting.

The illustrative embodiments determine whether at least one further thermal control setting exists that would increase the amount of throttling and under which the real-time software can be tested. In response to such a setting existing, the illustrative embodiments execute and test the real-time software a second time using the increased amount of throttling.

The illustrative embodiments likewise determine whether at least one further thermal control setting exists that would decrease the amount of throttling and under which the real-time software can be tested. In response to such a setting existing, the illustrative embodiments execute and test the real-time software a second time using the decreased amount of throttling.

The test mode in the illustrative embodiments may specify either an always-throttle state of the thermal management system or a random-throttle state of the thermal management system. The random-throttle state injects random thermal events to simulate more realistically the interaction between throttling and the execution of the software.
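The test procedure summarized above can be sketched in a few lines of Python. This is a minimal simulation under stated assumptions, not the patented implementation: the names `ThrottleMode`, `ThermalControlSetting`, `simulated_run_time`, and `sweep` are hypothetical, the workload model (one time step per work unit plus stalled steps while throttled) is invented for illustration, and the random-throttle state is modeled as a 50% chance of throttling per work unit.

```python
import random
from dataclasses import dataclass, field
from enum import Enum


class ThrottleMode(Enum):
    ALWAYS = "always"   # throttle for the whole run
    RANDOM = "random"   # inject throttling events at random


@dataclass
class ThermalControlSetting:
    mode: ThrottleMode
    throttle_percent: int   # share of time the unit is stopped, 0 to 99


@dataclass
class Results:
    passing: list = field(default_factory=list)
    failing: list = field(default_factory=list)


def simulated_run_time(work_units, setting, rng):
    """Hypothetical workload model: every work unit costs one time step,
    plus stalled steps whenever throttling is active for that unit."""
    stall = setting.throttle_percent / (100 - setting.throttle_percent)
    elapsed = 0.0
    for _ in range(work_units):
        throttled = setting.mode is ThrottleMode.ALWAYS or rng.random() < 0.5
        elapsed += 1.0 + (stall if throttled else 0.0)
    return elapsed


def sweep(settings, work_units, deadline, seed=0):
    """Run the workload under each setting, from least to most throttling,
    and record the setting as passing or failing against the deadline."""
    rng = random.Random(seed)
    results = Results()
    for setting in sorted(settings, key=lambda s: s.throttle_percent):
        met = simulated_run_time(work_units, setting, rng) <= deadline
        (results.passing if met else results.failing).append(setting)
    return results
```

Sorting the settings by throttle amount mirrors the idea of re-running the software with progressively more throttling; a real harness would program the thermal management hardware and measure actual deadlines rather than use a model.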

The thermal control settings may include an amount of throttling to be initiated and a duration for the test mode to be initiated. The amount of throttling to be initiated may be a percentage of the time a unit is stopped versus the time the unit is run. The time the unit is stopped and the time the unit is run may be scaled by the value of a scale register. The duration for the test mode to be initiated may be the actual number of clock cycles for which the unit is stopped and run.
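As a rough illustration of how stop and run times relate to a throttle percentage, the sketch below makes two assumptions that the text does not state: the scale register value acts as a simple multiplier on both counts, and the percentage is taken as stopped time over total (stopped plus running) time. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThrottleTiming:
    stop: int    # hypothetical field: unscaled stop count
    run: int     # hypothetical field: unscaled run count
    scale: int   # hypothetical scale register value, assumed to be a multiplier

    def stop_cycles(self) -> int:
        """Clock cycles the unit is stopped in each stop/run period."""
        return self.stop * self.scale

    def run_cycles(self) -> int:
        """Clock cycles the unit runs in each stop/run period."""
        return self.run * self.scale

    def stopped_percent(self) -> float:
        """Stopped share of each period; requires stop + run > 0 and scale > 0."""
        total = self.stop_cycles() + self.run_cycles()
        return 100.0 * self.stop_cycles() / total
```

Under these assumptions the scale value changes the length of each stop/run period in clock cycles but not the percentage, since it multiplies both counts equally.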

Passing or failing thermal control settings may be stored in a data structure. The test mode may be set in a thermal management control register. The integrated circuit may be a heterogeneous multi-core processor.

Brief Description of the Drawings

The novel features believed characteristic of the illustrative embodiments are set forth in the appended claims. The illustrative embodiments themselves, however, as well as a preferred mode of use, further objectives, and advantages thereof, will best be understood by reference to the following detailed description of the illustrative embodiments when read in conjunction with the accompanying drawings, wherein:

FIG. 1 depicts a pictorial representation of a network of data processing systems in which aspects of the illustrative embodiments may be implemented;

FIG. 2 depicts a block diagram of a data processing system in which aspects of the illustrative embodiments may be implemented;

FIG. 3 depicts an exemplary diagram of a Cell BE chip in which aspects of the illustrative embodiments may be implemented;

FIG. 4 illustrates an exemplary thermal management system in accordance with an illustrative embodiment;

FIG. 5 depicts a graph of temperature and the various points at which interrupts and dynamic throttling may occur, in accordance with an illustrative embodiment;

FIG. 6 depicts a flowchart of the operation for recording a maximum temperature in accordance with an illustrative embodiment;

FIG. 7 depicts a flowchart of the operation for tracking thermal data through performance monitoring in accordance with another illustrative embodiment;

FIGS. 8A and 8B depict flowcharts of the operation for advanced thermal interrupt generation in accordance with additional illustrative embodiments;

FIG. 9 depicts a flowchart of the operation for supporting deep power-saving modes and partial good in a thermal management system in accordance with an additional illustrative embodiment;

FIG. 10 depicts a flowchart of the operation of a thermal throttling control feature that enables real-time testing of thermal-aware software applications independent of temperature, in accordance with an additional illustrative embodiment;

FIG. 11 depicts a flowchart of the operation for implementing thermal throttling control with minimal impact on interrupt latency in accordance with an additional illustrative embodiment;

FIG. 12 depicts a flowchart of the operation for hysteresis in thermal throttling in accordance with an additional illustrative embodiment; and

FIG. 13 depicts a flowchart of the operation for implementing thermal throttling logic in accordance with an additional illustrative embodiment.

Detailed Description

The illustrative embodiments relate to thermal throttling control for testing of real-time software. FIGS. 1-2 are provided as exemplary diagrams of data processing environments in which the illustrative embodiments may be implemented. It should be appreciated that FIGS. 1-2 are only exemplary and are not intended to assert or imply any limitation with regard to the environments in which aspects of the embodiments may be implemented. Many modifications to the depicted environments may be made without departing from the spirit and scope of the illustrative embodiments.

With reference now to the figures, FIG. 1 depicts a pictorial representation of a network of data processing systems in which aspects of the illustrative embodiments may be implemented. Network data processing system 100 is a network of computers in which the illustrative embodiments may be implemented. Network data processing system 100 contains network 102, which is the medium used to provide communications links between the various devices and computers connected together within network data processing system 100. Network 102 may include connections such as wire, wireless communication links, or fiber optic cables.

In the depicted example, server 104 and server 106 connect to network 102 along with storage unit 108. In addition, clients 110, 112, and 114 connect to network 102. These clients may be, for example, personal computers or network computers. In the depicted example, server 104 provides data, such as boot files, operating system images, and applications, to clients 110, 112, and 114, which are clients to server 104 in this example. Network data processing system 100 may include additional servers, clients, and other devices not shown.

In the depicted example, network data processing system 100 is the Internet, with network 102 representing a worldwide collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) suite of protocols to communicate with one another. At the heart of the Internet is a backbone of high-speed data communication lines between major nodes or host computers, consisting of thousands of commercial, governmental, educational, and other computer systems that route data and messages. Network data processing system 100 may also be implemented as a number of different types of networks, such as an intranet, a local area network (LAN), or a wide area network (WAN). FIG. 1 is intended as an example, not as an architectural limitation for the different illustrative embodiments.

With reference now to FIG. 2, a block diagram of a data processing system is shown in which aspects of the illustrative embodiments may be implemented. Data processing system 200 is an example of a computer, such as server 104 or client 110 in FIG. 1, in which computer usable code or instructions implementing the processes of the illustrative embodiments may be located.

In the depicted example, data processing system 200 employs a hub architecture including north bridge and memory controller hub (MCH) 202 and south bridge and input/output (I/O) controller hub (ICH) 204. Processing unit 206, main memory 208, and graphics processor 210 are connected to north bridge and memory controller hub 202. Graphics processor 210 may be connected to north bridge and memory controller hub 202 through an accelerated graphics port (AGP).

In the depicted example, LAN adapter 212 connects to south bridge and I/O controller hub 204. Audio adapter 216, keyboard and mouse adapter 220, modem 222, read only memory (ROM) 224, hard disk drive (HDD) 226, CD-ROM drive 230, universal serial bus (USB) ports and other communication ports 232, and PCI/PCIe devices 234 connect to south bridge and I/O controller hub 204 through bus 238 and bus 240. PCI/PCIe devices may include, for example, Ethernet adapters, add-in cards, and PC cards for notebook computers. PCI uses a card bus controller, while PCIe does not. ROM 224 may be, for example, a flash basic input/output system (BIOS).

Hard disk drive 226 and CD-ROM drive 230 connect to south bridge and I/O controller hub 204 through bus 240. Hard disk drive 226 and CD-ROM drive 230 may use, for example, an integrated drive electronics (IDE) or serial advanced technology attachment (SATA) interface. Super I/O (SIO) device 236 may be connected to south bridge and I/O controller hub 204.

An operating system runs on processing unit 206 and coordinates and provides control of the various components within data processing system 200 in FIG. 2. As a client, the operating system may be a commercially available operating system such as Microsoft Windows XP (Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both). An object-oriented programming system, such as the Java™ programming system, may run in conjunction with the operating system and provide calls to the operating system from Java programs or applications executing on data processing system 200 (Java is a trademark of Sun Microsystems, Inc. in the United States, other countries, or both).

As a server, data processing system 200 may be, for example, an IBM eServer™ pSeries computer system running the Advanced Interactive Executive (AIX) operating system or the LINUX operating system (eServer, pSeries, and AIX are trademarks of International Business Machines Corporation in the United States, other countries, or both, while Linux is a trademark of Linus Torvalds in the United States, other countries, or both). Data processing system 200 may be a symmetric multiprocessor (SMP) system including a plurality of processors in processing unit 206. Alternatively, a single-processor system may be employed.

Instructions for the operating system, the object-oriented programming system, and applications or programs are located on storage devices, such as hard disk drive 226, and may be loaded into main memory 208 for execution by processing unit 206. The processes of the illustrative embodiments are performed by processing unit 206 using computer usable program code, which may be located in a memory such as main memory 208 or read only memory 224, or in one or more peripheral devices 226 and 230.

Those of ordinary skill in the art will appreciate that the hardware in FIGS. 1-2 may vary depending on the implementation. Other internal hardware or peripheral devices, such as flash memory, equivalent non-volatile memory, or optical disk drives, may be used in addition to or in place of the hardware depicted in FIGS. 1-2. The processes of the illustrative embodiments may also be applied to a multiprocessor data processing system.

In some illustrative examples, data processing system 200 may be a personal digital assistant (PDA) configured with flash memory to provide non-volatile memory for storing operating system files and/or user-generated data.

A bus system may comprise one or more buses, such as bus 238 or bus 240 shown in FIG. 2. The bus system may be implemented using any type of communication fabric or architecture that provides for a transfer of data between the different components or devices attached to it. A communication unit may include one or more devices used to transmit and receive data, such as modem 222 or network adapter 212 of FIG. 2. A memory may be, for example, main memory 208, read only memory 224, or a cache such as that found in north bridge and memory controller hub 202 in FIG. 2. The depicted examples in FIGS. 1-2 and the examples above are not meant to imply architectural limitations. For example, data processing system 200 may also be a tablet computer, laptop computer, or telephone device in addition to taking the form of a PDA.

FIG. 3 depicts an exemplary diagram of a Cell BE chip in which aspects of the illustrative embodiments may be implemented. Cell BE chip 300 is a single-chip multiprocessor implementation directed toward distributed processing and targeted at media-rich applications such as game consoles, desktop systems, and servers.

Cell BE chip 300 may be logically separated into the following functional components: Power PC processor element (PPE) 301; synergistic processor units (SPUs) 310, 311, and 312; and memory flow controllers (MFCs) 305, 306, and 307. Although synergistic processor elements (SPEs) 302, 303, and 304 and PPE 301 are shown by way of example, any type of processor element may be supported. Although FIG. 3 shows only three SPEs 302, 303, and 304, an exemplary implementation of Cell BE chip 300 includes one PPE 301 and eight SPEs. The SPE of a Cell processor is the first implementation of a new processor architecture designed to accelerate media and data-streaming workloads.

Cell BE chip 300 may be a system-on-a-chip, such that each of the elements depicted in FIG. 3 may be provided on a single microprocessor chip. Moreover, Cell BE chip 300 is a heterogeneous processing environment in which each of SPUs 310, 311, and 312 may receive different instructions from each of the other SPUs in the system. In addition, the instruction set for SPUs 310, 311, and 312 differs from that of Power PC processor unit (PPU) 308: for example, PPU 308 may execute reduced instruction set computer (RISC) based instructions in the Power™ architecture, while SPUs 310, 311, and 312 execute vectorized instructions.

Each SPE includes one SPU 310, 311, or 312 with its own local store (LS) area 313, 314, or 315 and a dedicated MFC 305, 306, or 307 that has an associated memory management unit (MMU) 316, 317, or 318 to hold and process memory protection and access permission information. Again, although SPUs are shown by way of example, any type of processor unit may be supported. In addition, Cell BE chip 300 implements element interconnect bus (EIB) 319 and other I/O structures to facilitate on-chip and external data flow.

EIB 319 serves as the main on-chip bus for PPE 301 and SPEs 302, 303, and 304. In addition, EIB 319 interfaces to other on-chip interface controllers that are dedicated to off-chip accesses. The on-chip interface controllers include memory interface controller (MIC) 320, which provides two extreme data rate I/O (XIO) memory channels 321 and 322, and Cell BE interface unit (BEI) 323, which provides two high-speed external I/O channels and the internal interrupt control for Cell BE 300. BEI 323 is implemented as bus interface controllers (BICs, labeled BIC0 and BIC1) 324 and 325 and I/O interface controller (IOC) 326. The two high-speed external I/O channels connect to Redwood Rambus Asic Cell (RRAC) interfaces, which provide the flexible input and output (FlexIO_0 and FlexIO_1) 353 for Cell BE 300.

Each SPU 310, 311, or 312 has a corresponding LS area 313, 314, or 315 and synergistic execution unit (SXU) 354, 355, or 356. Each individual SPU can execute instructions (including data load and store operations) only from within its associated LS area. For this reason, MFC direct memory access (DMA) operations, issued through the SPU's dedicated MFC 305, 306, or 307, perform all required data transfers to or from storage elsewhere in the system.

A program running on SPU 310, 311, or 312 references its own LS area 313, 314, or 315 using only an LS address. However, each SPU's LS area is also assigned a real address (RA) within the overall system's memory map. The RA is the address to which a device will respond. In the Power PC, an application refers to a memory location (or device) by an effective address (EA), which is mapped to a virtual address (VA) for the memory location (or device), which in turn is mapped to the RA. The EA is the address used by an application to reference memory and/or a device. This mapping allows an operating system to allocate more memory than is physically present in the system (hence the term virtual memory, referenced by a VA). A memory map is a listing of all the devices (including memory) in the system and their corresponding RAs; it is a map of the real address space that identifies the RA to which each device or memory will respond.
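The two-step mapping from EA to VA to RA, and the role of the memory map, can be illustrated with a deliberately simplified Python sketch. It is not the Power PC translation mechanism: the page size, the dictionary-based tables, and the names `translate`, `responder`, and `TranslationFault` are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class TranslationFault(Exception):
    """Raised when an address has no mapping at either step."""


def translate(ea, ea_to_va_page, va_to_ra_page):
    """Map an effective address to a real address in two steps:
    effective page -> virtual page -> real page, keeping the page offset."""
    page, offset = divmod(ea, PAGE_SIZE)
    try:
        virtual_page = ea_to_va_page[page]
        real_page = va_to_ra_page[virtual_page]
    except KeyError as exc:
        raise TranslationFault(hex(ea)) from exc
    return real_page * PAGE_SIZE + offset


def responder(ra, memory_map):
    """Return the name of the device or memory that responds to a real
    address, given (start, end, name) entries with end exclusive."""
    for start, end, name in memory_map:
        if start <= ra < end:
            return name
    return None
```

A virtual page with no entry in the second table models memory that has been allocated virtually but is not currently backed by real storage, which is what lets an operating system hand out more memory than is physically present.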

This allows privileged software to map an LS area into the processor's EA space to facilitate direct memory access transfers between the LS of one SPU and the LS area of another SPU. PPE 301 may also directly access any SPU's LS area using an EA. In the Power PC there are three states (problem, privileged, and hypervisor). Privileged software is software that runs in either the privileged or hypervisor state. These states have different access privileges. For example, privileged software may have access to the data structure registers used for mapping real memory into the EA of an application. Problem state is the state in which the processor usually runs applications, and in which access to system management resources (such as the data structures for mapping real memory) is usually prohibited.

An MFC DMA data command always includes one LS address and one EA. DMA commands copy memory from one location to another; in this case, an MFC DMA command copies data between an EA and an LS address. The LS address directly addresses LS area 313, 314, or 315 of the SPU 310, 311, or 312 associated with the MFC command queues. A command queue is a queue of MFC commands: one queue holds commands from the SPU, and another holds commands from the PXU or other devices. The EA, however, may be arranged or mapped to access any other memory storage area in the system, including LS areas 313, 314, and 315 of the other SPEs 302, 303, and 304.

Main storage (not shown) is shared by PPU 308, PPE 301, SPEs 302, 303, and 304, and I/O devices (not shown) in a system such as the one shown in FIG. 2. All information held in main memory is visible to all processors and devices in the system. Programs reference main memory using an EA. Because the MFC proxy command queue, control, and status facilities have RAs, and the RAs are mapped using an EA, the Power processor element can initiate DMA operations, using an EA, between main storage and the local storage of the associated SPEs 302, 303, and 304.

As an example, when a program running on SPU 310, 311, or 312 needs to access main memory, the SPU program generates a DMA command with the appropriate EA and LS addresses and places it in the command queue of its MFC 305, 306, or 307. After the command is placed in the queue by the SPU program, MFC 305, 306, or 307 executes the command and transfers the required data between the LS area and main memory. MFC 305, 306, or 307 provides a second, proxy command queue for commands generated by other devices, such as PPE 301. The MFC proxy command queue is typically used to store a program in local storage prior to starting the SPU. MFC proxy commands can also be used for context store operations.

The EA provides the MFC with an address that the MMU can translate to an RA. The translation process allows for virtualization of system memory and access protection of memory and devices in the real address space. Since LS areas are mapped into the real address space, the EA can also address all the SPU LS areas.

PPE 301 on Cell BE chip 300 includes a 64-bit PPU 308 and a Power PC storage subsystem (PPSS) 309. PPU 308 contains processor execution unit (PXU) 329, level one (L1) cache 330, MMU 331, and replacement management table (RMT) 332. PPSS 309 includes cacheable interface unit (CIU) 333, non-cacheable unit (NCU) 334, level two (L2) cache 328, RMT 335, and bus interface unit (BIU) 327. BIU 327 connects PPSS 309 to EIB 319.

SPUs 310, 311, and 312 and MFCs 305, 306, and 307 communicate with each other through unidirectional channels that have capacity. Channels are essentially FIFOs accessed using one of the 34 SPU instructions: read channel (RDCH), write channel (WRCH), and read channel count (RDCHCNT). RDCHCNT returns the amount of information in the channel. The capacity is the depth of the FIFO. The channels transport data to and from MFCs 305, 306, and 307 and SPUs 310, 311, and 312. BIUs 339, 340, and 341 connect MFCs 305, 306, and 307 to EIB 319.
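The channel behavior described above can be sketched as a bounded FIFO. This is only an illustrative model: the class and method names are hypothetical, and raising an error on a full or empty channel is an assumption made for the sketch, not a statement about what the hardware does in those cases.

```python
from collections import deque


class Channel:
    """Hypothetical model of a unidirectional channel: a FIFO with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity  # depth of the FIFO
        self._fifo = deque()

    def wrch(self, value):
        """Write channel: enqueue one value (assumed to fail when the FIFO is full)."""
        if len(self._fifo) >= self.capacity:
            raise BufferError("channel full")
        self._fifo.append(value)

    def rdch(self):
        """Read channel: dequeue the oldest value (assumed to fail when empty)."""
        if not self._fifo:
            raise BufferError("channel empty")
        return self._fifo.popleft()

    def rdchcnt(self):
        """Read channel count: amount of information currently in the channel."""
        return len(self._fifo)
```

In this model, values written with `wrch` come back from `rdch` in the same order, and `rdchcnt` never exceeds the capacity.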

MFCs 305, 306, and 307 provide two main functions for SPUs 310, 311, and 312. MFCs 305, 306, and 307 move data between SPUs 310, 311, or 312, LS areas 313, 314, or 315, and main memory. Additionally, MFCs 305, 306, and 307 provide synchronization facilities between SPUs 310, 311, and 312 and other devices in the system.

The implementation of MFCs 305, 306, and 307 has four functional units: direct memory access controllers (DMACs) 336, 337, and 338, MMUs 316, 317, and 318, atomic units (ATOs) 342, 343, and 344, RMTs 345, 346, and 347, and BIUs 339, 340, and 341. DMACs 336, 337, and 338 maintain and process an MFC command queue (MFC CMDQ) (not shown), which consists of an MFC SPU command queue (MFC SPUQ) and an MFC proxy command queue (MFC PrxyQ). The sixteen-entry MFC SPUQ handles MFC commands received from the SPU channel interface. The eight-entry MFC PrxyQ processes MFC commands coming from other devices, such as PPE 301 or SPEs 302, 303, and 304, through memory-mapped input and output (MMIO) load and store operations. A typical direct memory access command moves data between LS area 313, 314, or 315 and main memory. The EA parameter of an MFC DMA command is used to address main storage, including main memory, local storage, and all devices having an RA. The local storage parameter of an MFC DMA command is used to address the associated local storage.

In virtual mode, MMUs 316, 317, and 318 provide address translation and memory protection facilities to handle EA translation requests from DMACs 336, 337, and 338 and send back the translated address. Each SPE's MMU maintains a segment lookaside buffer (SLB) and a translation lookaside buffer (TLB). The SLB translates an EA to a VA, and the TLB translates the VA coming out of the SLB to an RA. The EA is used by an application and is usually a 32-bit or 64-bit address. Different applications, or multiple copies of one application, may use the same EA to reference different storage locations (for example, two copies of an application, each using the same EA, need two different physical storage locations). To accomplish this, the EA is first translated into a much larger VA space, which is common to all applications running under the operating system. The EA-to-VA translation is performed by the SLB. The VA is then translated into an RA using the TLB, which is a cache of the page table or mapping table containing the VA-to-RA mappings. This table is maintained by the operating system.
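The two-stage translation can be illustrated with a small sketch. The segment size, page size, and the use of plain dictionaries for the SLB and TLB are assumptions chosen for readability; the text does not specify these details.

```python
SEGMENT_SIZE = 1 << 28  # assumed segment size, for illustration only
PAGE_SIZE = 1 << 12     # assumed page size, for illustration only


def translate(ea, slb, tlb):
    """Translate EA -> VA through the SLB, then VA -> RA through the TLB.

    slb maps an effective segment number to a virtual segment number.
    tlb maps a virtual page number to a real page number.
    Both are hypothetical stand-ins for the hardware buffers.
    """
    segment, segment_offset = divmod(ea, SEGMENT_SIZE)
    va = slb[segment] * SEGMENT_SIZE + segment_offset  # EA -> VA
    virtual_page, page_offset = divmod(va, PAGE_SIZE)
    return tlb[virtual_page] * PAGE_SIZE + page_offset  # VA -> RA


# Two copies of one application use the same EA but have different SLB contents,
# so the shared TLB resolves them to different real addresses.
slb_copy_a = {0: 5}
slb_copy_b = {0: 9}
tlb = {5 * 65536 + 1: 7, 9 * 65536 + 1: 11}
ra_a = translate(0x1234, slb_copy_a, tlb)
ra_b = translate(0x1234, slb_copy_b, tlb)
```

Here the same EA `0x1234` lands in real page 7 for one copy and real page 11 for the other, which is the situation the larger VA space is meant to support.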

ATOs 342, 343, and 344 provide the level of data caching necessary for maintaining synchronization with other processing units in the system. Atomic direct memory access commands provide the means for the synergistic processor units to perform synchronization with other units.

The main function of BIUs 339, 340, and 341 is to provide SPEs 302, 303, and 304 with an interface to the EIB. EIB 319 provides a communication path between all of the processor cores on Cell BE chip 300 and the external interface controllers attached to EIB 319.

MIC 320 provides an interface between EIB 319 and one or both of XIOs 321 and 322. Extreme data rate (XDR™) dynamic random access memory (DRAM) is a high-speed, highly serial memory provided by Rambus. The extreme data rate dynamic random access memory is accessed by a macro provided by Rambus, referred to in this document as XIOs 321 and 322.

MIC 320 is only a slave on EIB 319. MIC 320 acknowledges commands in its configured address range, which corresponds to the memory in the supported hubs.

BICs 324 and 325 manage data transfer on and off the chip from EIB 319 to either of two external devices. BICs 324 and 325 may exchange non-coherent traffic with an I/O device, or they can extend EIB 319 to another device, which could even be another Cell BE chip. When used to extend EIB 319, the bus protocol maintains coherency between the caches in Cell BE chip 300 and the caches in the attached external device, which could be another Cell BE chip.

IOC 326 handles commands that originate in an I/O interface device and that are destined for the coherent EIB 319. An I/O interface device may be any device that attaches to an I/O interface, such as an I/O bridge chip that attaches multiple I/O devices, or another Cell BE chip 300 that is accessed in a non-coherent manner. IOC 326 also intercepts accesses on EIB 319 that are destined to memory-mapped registers that reside in or behind an I/O bridge chip or non-coherent Cell BE chip 300, and routes them to the proper I/O interface. IOC 326 also includes internal interrupt controller (IIC) 349 and I/O address translation unit (I/O Trans) 350.

Pervasive logic 351 is a controller that provides clock management, test features, and the power-on sequence for Cell BE chip 300. Pervasive logic may provide the thermal management system for the processor. Pervasive logic contains a connection to other devices in the system through a Joint Test Action Group (JTAG) or Serial Peripheral Interface (SPI) interface, both of which are commonly known in the art.

Although specific examples of how the different components may be implemented have been provided, this is not meant to limit the architecture in which aspects of the illustrative embodiments may be used. Aspects of the illustrative embodiments may be used with any multi-core processor system.

During the execution of an application or software, the temperature of areas within the Cell BE chip may rise. Left unchecked, the temperature could rise above the maximum specified junction temperature, leading to improper operation or physical damage. To avoid these conditions, the digital thermal management unit of the Cell BE chip monitors and attempts to control the temperature within the Cell BE chip during operation. The digital thermal management unit consists of one thermal management control unit (TMCU) and ten distributed digital thermal sensors (DTSs), described herein.

One sensor is located in each of the eight SPEs, one sensor is located in the PPE, and one sensor is adjacent to a linear thermal diode. The linear thermal diode is an on-chip diode used to calculate temperature. These sensors are positioned adjacent to the areas within the associated unit that typically experience the greatest rise in temperature during the execution of most applications. The thermal control unit monitors feedback from each of these sensors. If the temperature of a sensor rises above a programmable point, the thermal control unit can be configured to cause an interrupt to the PPE or to one or more of the SPEs and to dynamically throttle the execution of the associated PPE or SPE.

Stopping and running the PPE or SPE for a programmable number of cycles provides the necessary throttling. The interrupt allows privileged software to take corrective action, while dynamic throttling attempts to keep the temperature within the Cell Broadband Engine chip below a programmable level without software intervention. Privileged software sets the throttling level at or below the recommended settings provided by the application. Each application may be different.

If throttling the PPE or SPEs does not effectively manage the temperature and the temperature continues to rise, pervasive logic 351 stops the clocks of the Cell BE chip when the temperature reaches the thermal overload temperature, which is defined by programmable configuration data. The thermal overload feature protects the Cell BE chip from physical damage. Recovery from this condition requires a hard reset. The temperature of the area monitored by a DTS is not necessarily the hottest point within the associated PPE or SPE.

FIG. 4 illustrates an exemplary thermal management system in accordance with an illustrative embodiment. The thermal management system may be implemented as an integrated circuit, such as that provided by pervasive logic unit 351 of FIG. 3. The thermal management system may be an application-specific integrated circuit, a processor, a multiprocessor, or a heterogeneous multi-core processor. The thermal management system is divided between ten distributed DTSs and thermal management control unit (TMCU) 402; for simplicity, only DTSs 404, 406, 408, and 410 are shown. Each of DTSs 404 and 406 in SPU sensor 440, DTS 408 in PPU sensor 442, and DTS 410 in sensor 444 adjacent to the linear thermal diode (not shown) provides a current temperature detection signal. This signal indicates that the temperature is equal to or below the current temperature detection range set by TMCU 402. TMCU 402 uses the state of the signals from DTSs 404, 406, 408, and 410 to continually track the temperature of the DTS 404, 406, 408, or 410 of each PPE or SPE. As the temperature is tracked, TMCU 402 provides the current temperature as a numeric value that represents the temperature within the associated PPE or SPE. Internal calibration storage 428 is set at manufacturing to calibrate the individual sensors.

In addition to the elements of TMCU 402 described above, TMCU 402 also contains multiplexers 446 and 450, work register 448, comparators 452 and 454, serializer 456, thermal management control state machine 458, and data flow (DF) unit 460. Multiplexers 446 and 450 combine various outgoing and incoming signals for transmission over a single medium. Work register 448 holds the results of the multiplications performed in TMCU 402. Comparators 452 and 454 provide a comparison function for two inputs. Comparator 452 is a greater-than-or-equal-to comparator. Comparator 454 is a greater-than comparator. Serializer 456 converts low-speed parallel data from a source into high-speed serial data for transmission. Serializer 456 works in conjunction with deserializers 462 and 464 on SPU sensor 440. Deserializers 462 and 464 convert received high-speed serial data into low-speed parallel data. Thermal management control state machine 458 starts the internal initialization of TMCU 402. DF unit 460 controls the data to and from thermal management control state machine 458.

TMCU 402 may be configured to cause an interrupt to the PPE using interrupt logic 416 and to dynamically throttle the execution of a PPE or SPE using throttling logic 418.

TMCU 402 compares the numeric value representing the temperature to a programmable interrupt temperature and a programmable throttle point. Each DTS has an independent programmable interrupt temperature. If the temperature is within the programmed interrupt temperature range, TMCU 402 generates an interrupt to the PPE, if enabled. An interrupt is generated if the temperature is above or below the programmed level, depending on the direction bit described below. In addition, a second programmable interrupt temperature may cause the assertion of an attention signal to a system controller. The system controller is on the system planar and is connected to the Cell BE on the SPI port.

If the temperature sensed by the DTS associated with the PPE or an SPE is equal to or above the throttle point, TMCU 402 throttles the execution of that PPE or SPE by independently starting and stopping the PPE or the one or more SPEs. Software can control the ratio and frequency of the throttling using thermal management registers, such as the thermal management stop time register and the thermal management scale register.

FIG. 5 depicts a graph of temperature and the various points at which interrupts and dynamic throttling may occur in accordance with an illustrative embodiment. In FIG. 5, line 500 may represent the temperature of the PPE or an SPE. If the PPE or SPE is running normally, no throttling occurs in the regions marked with an "N". When the temperature of the PPE or SPE reaches the throttle point, the TMCU starts throttling the execution of the associated PPE or SPE. The regions in which throttling occurs are marked with a "T". When the temperature of the PPE or SPE drops below the end-throttle point, execution returns to normal operation.

If, for any reason, the temperature continues to rise and reaches a temperature at or above the full throttle point, TMCU 402 stops the PPE or SPE until the temperature drops below the full throttle point. The regions in which the PPE or SPE is stopped are marked with an "S". Stopping the PPE or SPE while the temperature is at or above the full throttle point is referred to as core stop safety.
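The normal, throttled, and stopped regions of FIG. 5 can be sketched as a small state function. The function names and the example thresholds are hypothetical, the sketch assumes that the end-throttle point lies at or below the throttle point and the throttle point below the full throttle point, and it assumes that a stopped unit resumes in the throttled state once the temperature falls below the full throttle point.

```python
def next_state(state, temp, throttle, end_throttle, full_throttle):
    """Return 'N' (normal), 'T' (throttled), or 'S' (stopped) for one temperature sample."""
    if temp >= full_throttle:
        return "S"  # at or above the full throttle point: stop the unit
    if temp >= throttle:
        return "T"  # at or above the throttle point: throttle execution
    if state in ("T", "S") and temp >= end_throttle:
        return "T"  # already throttling and not yet below the end-throttle point
    return "N"      # below the end-throttle point, or never throttled


def trace(temps, throttle, end_throttle, full_throttle):
    """Apply next_state to a sequence of samples, starting from normal operation."""
    state, states = "N", []
    for temp in temps:
        state = next_state(state, temp, throttle, end_throttle, full_throttle)
        states.append(state)
    return "".join(states)
```

With a throttle point of 70, an end-throttle point of 65, and a full throttle point of 80, the sample sequence 60, 72, 68, 85, 75, 64 passes through normal, throttled, throttled, stopped, throttled, and back to normal.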

In this exemplary illustration, the interrupt temperature is set above the throttle point; therefore, TMCU 402 generates an interrupt, which notifies software that the corresponding PPE or SPE has been stopped because the temperature was, or still is, above the core stop temperature. This assumes that the thermal interrupt mask register (TM_ISR), see 422 in FIG. 4, is set to active, which allows the PPE or SPE to resume during a pending interrupt. If dynamic throttling is disabled, privileged software manages the thermal condition. Not managing the thermal condition may result in improper operation of the associated PPE or SPE or in a thermal shutdown by the thermal overload function.

Returning to FIG. 4, the thermal sensor status registers consist of thermal sensor current temperature status register 412 and thermal sensor maximum temperature status register 414. These registers allow software to read the current temperature of each DTS, determine the highest temperature reached during a period of time, and cause an interrupt when the temperature reaches a programmable temperature. The thermal sensor status registers have an associated real address page that may be marked as hypervisor privileged.

Thermal sensor current temperature status register 412 contains the encoded or digital value of the current temperature of each DTS. Due to latencies in the sensor's temperature detection, latencies in reading these registers, and normal temperature fluctuations, the temperature reported in these registers is that of an earlier point in time and might not reflect the actual temperature when software receives the data. Because each sensor has dedicated control logic, the control logic within DTSs 404, 408, and 410 samples all sensors in parallel. TMCU 402 updates the contents of thermal sensor current temperature status register 412 at the end of the sample period, changing the value in the register to the current temperature. TMCU 402 polls for a new current temperature every SenSampTime period. The SenSampTime configuration field controls the length of the sample period.

Thermal sensor maximum temperature status register 414 contains the digitally encoded maximum temperature reached by each sensor since the time thermal sensor maximum temperature status register 414 was last read. Reading these registers, by software or by any off-chip device such as off-chip device 472 or off-chip I/O device 474, causes TMCU 402 to copy the current temperature of each sensor into the register. After the read, TMCU 402 continues to track the maximum temperature starting from that point. Each register is read independently. A read of one register does not affect the contents of the other.
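The read behavior of the maximum temperature register can be modeled for a single sensor as follows. The class and method names are hypothetical; the sketch only mirrors the described behavior, in which a read returns the maximum since the previous read and then restarts tracking from the current temperature.

```python
class MaxTemperatureTracker:
    """Hypothetical single-sensor model of the maximum temperature status register."""

    def __init__(self, current):
        self.current = current
        self.maximum = current

    def sample(self, temp):
        """Record a new current temperature and update the running maximum."""
        self.current = temp
        self.maximum = max(self.maximum, temp)

    def read(self):
        """Return the maximum since the last read, then restart from the current temperature."""
        value = self.maximum
        self.maximum = self.current
        return value
```

For example, after samples of 70 and then 60, the first read returns 70 and the second read returns 60, because tracking restarted from the current temperature at the first read.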

Each sensor has dedicated control logic, so the control logic within DTSs 404, 406, 408, and 410 samples all sensors in parallel. TMCU 402 changes the value in thermal sensor maximum temperature status register 414 to the current temperature. TMCU 402 polls for a new current temperature every SenSampTime period. The SenSampTime configuration field controls the length of the sample period.

The thermal sensor interrupt registers in interrupt logic 416 control the generation of a thermal management interrupt to the PPE. This set of registers consists of thermal sensor interrupt temperature registers 420 (TS_ITR1 and TS_ITR2), thermal sensor interrupt status register 422 (TS_ISR), thermal sensor interrupt mask register 424 (TS_IMR), and thermal sensor global interrupt temperature register 426 (TS_GITR). Thermal sensor interrupt temperature registers 420 and thermal sensor global interrupt temperature register 426 contain the encoding of the temperature that causes a thermal management interrupt to the PPE.

When the digitally encoded temperature for a sensor in thermal sensor current temperature status register 412 is greater than or equal to the interrupt temperature encoding for the corresponding sensor in thermal sensor interrupt temperature register 420, TMCU 402 sets the corresponding status bit (TS_ISR[Sx]) in thermal sensor interrupt status register 422. When the temperature encoding for any sensor in thermal sensor current temperature status register 412 is greater than or equal to the global interrupt temperature encoding in thermal sensor global interrupt temperature register 426, TMCU 402 sets the corresponding status bit (TS_ISR[Gx]) in thermal sensor interrupt status register 422.

If any thermal sensor interrupt status register 422 bit (TS_ISR[Sx]) is set and the corresponding mask bit (TS_IMR[Mx]) in thermal sensor interrupt mask register 424 is also set, TMCU 402 asserts a thermal management interrupt signal to the PPE. Likewise, if any thermal sensor interrupt status register 422 bit (TS_ISR[Gx]) is set and the corresponding mask bit (TS_IMR[Cx]) in thermal sensor interrupt mask register 424 is also set, TMCU 402 asserts a thermal management interrupt signal to the PPE.
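The interrupt condition above amounts to a bitwise AND of status bits with their mask bits. The sketch below assumes, purely for illustration, that each group of bits is held in its own integer with sensor x at bit position x; the actual register layout is not given in the text.

```python
def thermal_interrupt_asserted(isr_s, imr_m, isr_g, imr_c):
    """Return True when any set status bit has its corresponding mask bit set.

    isr_s / imr_m: per-sensor status bits TS_ISR[Sx] and mask bits TS_IMR[Mx].
    isr_g / imr_c: global status bits TS_ISR[Gx] and mask bits TS_IMR[Cx].
    The one-integer-per-group layout is a hypothetical simplification.
    """
    return bool((isr_s & imr_m) | (isr_g & imr_c))
```

A status bit that is set while its mask bit is clear records the condition but does not, in this model, raise the interrupt.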

To clear an interrupt condition, privileged software should set any corresponding mask bit in the thermal sensor interrupt mask register to "0". To enable a thermal management interrupt, privileged software ensures that the temperature is below the interrupt temperature for the corresponding sensor and then performs the following sequence. Enabling an interrupt when the temperature is not below the interrupt temperature may result in an immediate thermal management interrupt.

1. Write a "1" to the corresponding status bit in thermal sensor interrupt status register 422.

2. Write a "1" to the corresponding mask bit in thermal sensor interrupt mask register 424.
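The two-step enable sequence can be sketched against a small register model. The class, its method names, and the bit positions are hypothetical. The model treats a write of "1" to a status bit as resetting that bit to "0", which is how the status register is described elsewhere in this document, and it treats the temperature check before the sequence as a software precondition.

```python
class ThermalInterruptRegisters:
    """Hypothetical model of the interrupt status (TS_ISR) and mask (TS_IMR) registers."""

    def __init__(self):
        self.ts_isr = 0
        self.ts_imr = 0

    def write_isr(self, value):
        """Writing '1' to a status bit resets that bit to '0'."""
        self.ts_isr &= ~value

    def write_imr(self, value):
        """Mask register writes store the value as given."""
        self.ts_imr = value


def enable_thermal_interrupt(regs, bit, current_temp, interrupt_temp):
    """Enable the interrupt for one sensor following the two-step sequence."""
    if current_temp >= interrupt_temp:
        # Enabling now could produce an immediate thermal management interrupt.
        raise RuntimeError("temperature is not below the interrupt temperature")
    regs.write_isr(1 << bit)                  # step 1: clear the stale status bit
    regs.write_imr(regs.ts_imr | (1 << bit))  # step 2: set the mask bit
```

Clearing the status bit first means that a condition recorded before the interrupt was enabled does not fire as soon as the mask bit is set.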

Thermal sensor interrupt temperature register 420 contains the interrupt temperature level for the sensors located in the SPEs, in the PPE, and adjacent to the linear thermal diode. TMCU 402 compares the encoded interrupt temperature level in this register to the corresponding current temperature encoding in thermal sensor current temperature status register 412. The results of these comparisons generate a thermal management interrupt. The interrupt temperature level of each sensor is independent.

In addition to the independent interrupt temperature levels set in thermal sensor interrupt temperature register 420, thermal sensor global interrupt temperature register 426 contains a second interrupt temperature level. This level applies to all sensors in the Cell BE chip. TMCU 402 compares the encoded global interrupt temperature level in this register to the current temperature encoding of each sensor. The results of these comparisons generate a thermal management interrupt.

The purpose of the global interrupt temperature is to provide an early indication of a temperature rise in the Cell BE chip. Privileged software and the system controller may use this information to start actions that control the temperature, for example, increasing the fan speed or rebalancing the application software across units.

Thermal sensor interrupt status register 422 identifies which sensors meet the interrupt condition. An interrupt condition refers to a particular condition that each thermal sensor interrupt status register 422 bit has and that, when met, makes it possible for an interrupt to occur. An actual interrupt is presented to the PPE only if the corresponding mask bit is set.

Thermal sensor interrupt status register 422 contains three sets of status bits: the digital sensor global threshold interrupt status bits (TS_ISR[Gx]), the digital sensor threshold interrupt status bits (TS_ISR[Sx]), and the digital sensor global below-threshold interrupt status bit (TS_ISR[Gb]).

TMCU 402 sets the status bit (TS_ISR[Sx]) in thermal sensor interrupt status register 422 when the temperature encoding for a sensor in thermal sensor current temperature status register 412 is greater than or equal to the interrupt temperature encoding of the corresponding sensor in thermal sensor interrupt temperature register 420 and the corresponding direction bit in thermal sensor interrupt mask register 424 is TM_IMR[Bx] = '0'. TMCU 402 also sets TS_ISR[Sx] in thermal sensor interrupt status register 422 when the temperature encoding for a sensor in thermal sensor current temperature status register 412 is below the interrupt temperature encoding of the corresponding sensor in thermal sensor interrupt temperature register 420 and the corresponding direction bit in thermal sensor interrupt mask register 424 is TM_IMR[Bx] = '1'.
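The effect of the direction bit on the per-sensor status bit can be written as a single predicate. The function name and argument names are hypothetical; only the comparison rule comes from the description above.

```python
def sensor_condition_met(current_code, interrupt_code, direction_bit):
    """Return True when the per-sensor status bit TS_ISR[Sx] would be set.

    direction_bit 0: condition is current temperature >= interrupt temperature.
    direction_bit 1: condition is current temperature <  interrupt temperature.
    """
    if direction_bit == 0:
        return current_code >= interrupt_code
    return current_code < interrupt_code
```

With the direction bit at 0 the bit flags a sensor that has heated up to the interrupt temperature, and with the direction bit at 1 it flags a sensor that has cooled below it.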

TMCU 402 sets TS_ISR[Gx] in thermal sensor interrupt status register 422 when the current temperature of any participating sensor is greater than or equal to the temperature in thermal sensor global interrupt temperature register 426 and TM_IMR[BG] = '0' in thermal sensor interrupt mask register 424. The individual TS_ISR[Gx] bits of thermal sensor interrupt status register 422 indicate which individual sensors meet these conditions.

TMCU 402 sets TS_ISR[Gb] in thermal sensor interrupt status register 422 when the current temperatures of all participating sensors, as selected by TM_IMR[Cx] in thermal sensor interrupt mask register 424, are below the temperature in thermal sensor global interrupt temperature register 426 and TM_IMR[BG] = '1' in thermal sensor interrupt mask register 424. Because this condition requires all participating sensors to be below the global interrupt temperature, only one status bit (TS_ISR[Gb]) in thermal sensor interrupt status register 422 exists for the global below-threshold interrupt condition.
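The two global conditions differ in that the above-threshold case is evaluated per sensor while the below-threshold case is evaluated over all participating sensors together. A minimal sketch follows, with hypothetical names and with the set of participating sensors passed as a list of indices; treating an empty participant list as not meeting the below-threshold condition is an assumption of the sketch.

```python
def global_status(temps, participating, global_temp, bg):
    """Return (gx, gb) for the global interrupt conditions.

    gx: set of participating sensor indices whose TS_ISR[Gx] bit would be set.
    gb: True when the single TS_ISR[Gb] bit would be set.
    bg: global direction bit; 0 selects the at-or-above condition, 1 the below condition.
    """
    if bg == 0:
        gx = {i for i in participating if temps[i] >= global_temp}
        return gx, False
    all_below = bool(participating) and all(
        temps[i] < global_temp for i in participating
    )
    return set(), all_below
```

Because the below-threshold result is a single yes or no over the whole group, one status bit is enough for it, whereas the above-threshold result needs one bit per sensor to show which sensors are hot.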

Once a status bit in thermal sensor interrupt status register 422 (TS_ISR[Sx], TS_ISR[Gx], or TS_ISR[Gb]) is set to '1', TMCU 402 maintains this state until the bit is reset to '0' by privileged software. Privileged software resets a status bit to '0' by writing a '1' to the corresponding bit in thermal sensor interrupt status register 422.

The thermal sensor interrupt mask register 424 contains two fields for individual sensors and multiple fields for global interrupt conditions. An interrupt condition is the specific condition associated with each bit of the thermal sensor interrupt status register 422; an interrupt can occur when that condition is met. The actual interrupt is presented to the PPE only if the corresponding mask bit is set.

The two digital thermal threshold interrupt fields of the thermal sensor interrupt mask register 424 for individual sensors are TS_IMR[Mx] and TS_IMR[Bx]. The mask bits TS_IMR[Mx] prevent an interrupt status bit from generating a thermal management interrupt to the PPE. The direction bits TS_IMR[Bx] set the temperature direction of the interrupt condition to be above or below the corresponding temperature in the thermal sensor interrupt temperature register 420. Setting TS_IMR[Bx] to '1' sets the interrupt condition to a temperature lower than the corresponding temperature in the thermal sensor interrupt temperature register 420. Setting TS_IMR[Bx] to '0' sets the interrupt condition to a temperature equal to or higher than the corresponding temperature in the thermal sensor interrupt temperature register 420.

The fields of the thermal sensor interrupt mask register 424 for global interrupt conditions are TS_IMR[Cx], TS_IMR[BG], TS_IMR[Cgb], and TS_IMR[A]. The mask bits TS_IMR[Cx] prevent global threshold interrupts and select which sensors participate in the global below-threshold interrupt condition. The direction bit TS_IMR[BG] selects the temperature direction for the global interrupt condition. The mask bit TS_IMR[Cgb] prevents global below-threshold interrupts. TS_IMR[A] asserts an attention signal to the system controller. An attention is a signal to the system controller indicating that the pervasive logic requires attention or has status for the system controller. The attention signal can be mapped to an interrupt in the system controller. The system controller is on the system planar and is connected to the Cell Broadband Engine through the SPI port.

Setting TS_IMR[BG] of the thermal sensor interrupt mask register 424 to '1' causes the global interrupt condition to occur when the temperatures of all participating sensors selected in TS_IMR[Cx] are below the global interrupt temperature level. Setting TS_IMR[BG] to '0' causes the global interrupt condition to occur when the temperature of any participating sensor is greater than or equal to the temperature in the thermal sensor global interrupt temperature register 426. If TS_IMR[A] is set to '1', the TMCU 402 asserts an attention signal when any TS_IMR[Cx] bit and its corresponding status bit (TS_ISR[Gx]) in the thermal sensor interrupt status register 422 are both set to '1'. In addition, the TMCU 402 asserts an attention signal when TS_IMR[Cgb] and TS_ISR[Gb] are both set to '1'.

The TMCU 402 presents a thermal management interrupt to the PPE when any TS_IMR[Mx] bit in the thermal sensor interrupt mask register 424 and its corresponding status bit (TS_ISR[Sx]) in the thermal sensor interrupt status register 422 are both set to '1'. The TMCU 402 also generates a thermal management interrupt when any TS_IMR[Cx] bit and its corresponding status bit (TS_ISR[Gx]) are both set to '1'. In addition, the TMCU 402 presents a thermal management interrupt to the PPE when TS_IMR[Cgb] and TS_ISR[Gb] are both set to '1'.
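The mask-and-status combinations above reduce to a few boolean expressions. The following sketch assumes the register fields are available as Python lists of 0/1 values; the function names and argument layout are hypothetical, and requiring TS_IMR[A] for both attention cases is an assumption.

```python
def thermal_interrupt_to_ppe(mx, sx, cx, gx, cgb, gb):
    """Return True when a thermal management interrupt is presented to the PPE."""
    local = any(m and s for m, s in zip(mx, sx))          # TS_IMR[Mx] & TS_ISR[Sx]
    global_above = any(c and g for c, g in zip(cx, gx))   # TS_IMR[Cx] & TS_ISR[Gx]
    global_below = bool(cgb and gb)                       # TS_IMR[Cgb] & TS_ISR[Gb]
    return local or global_above or global_below


def attention_to_system_controller(a, cx, gx, cgb, gb):
    """Return True when an attention is raised (assumes TS_IMR[A] gates both cases)."""
    if not a:
        return False
    return any(c and g for c, g in zip(cx, gx)) or bool(cgb and gb)
```

For example, a set status bit whose mask bit is '0' is recorded but never reaches the PPE, which is why software can leave a condition armed without taking the interrupt.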

The dynamic thermal management registers in the throttling logic 418 contain the parameters that control execution throttling of the PPE or an SPE. The dynamic thermal management registers are a set of registers comprising the thermal management control registers 430 (TM_CR1 and TM_CR2), the thermal management throttle point register 432 (TM_TPR), the thermal management stop time registers 434 (TM_STR1 and TM_STR2), the thermal management throttle scale register 436 (TM_TSR), and the thermal management system interrupt mask register 438 (TM_SIMR).

The thermal management throttle point register 432 sets the throttle temperature points for the sensors. Two independent throttle temperature points, ThrottlePPE and ThrottleSPE, can be set in the thermal management throttle point register 432, one for the PPE and one for the SPEs. The register also contains the temperature points for disabling throttling and for stopping the PPE or SPE. Execution throttling of a PPE or SPE starts when the temperature is equal to or above the throttle point. Throttling ceases when the temperature drops below the temperature for disabling throttling (TM_TPR[EndThrottlePPE/EndThrottleSPE]). If the temperature reaches the full throttle, or stop, temperature (TM_TPR[FullThrottlePPE/FullThrottleSPE]), the TMCU 402 stops execution of the PPE or SPE. The thermal management control register 430 controls the throttling behavior.

The thermal management stop time register 434 and the thermal management throttle scale register 436 control the frequency and amount of throttling. When the temperature reaches the throttle point, the TMCU 402 stops the corresponding PPE or SPE for a number of clocks given by the corresponding stop time value in the thermal management stop time register 434 multiplied by the corresponding scale value in the thermal management throttle scale register 436. The TMCU 402 then lets the PPE or SPE run for a number of clocks given by the run time multiplied by the corresponding scale value, where the run time is an implementation-dependent fixed amount of time minus the stop time. The programmable scale value in the thermal management throttle scale register 436 is a multiplier for both the stop time and the run time. An example is (Stop×Scale)/(Run×Scale). The percentage of time the core is stopped stays the same, but the period increases (equivalently, the throttling frequency decreases). This sequence continues until the temperature drops below the temperature for disabling throttling (TM_TPR[EndThrottlePPE/EndThrottleSPE]).

The thermal management system interrupt mask register 438 selects which PPE interrupts cause the TMCU 402 to disable throttling. The TMCU 402 continues to prevent throttling as long as such an interrupt is still pending and the mask still selects the pending interrupt. If the mask is deselected or the interrupt is no longer pending, the TMCU 402 no longer prevents throttling.

The thermal management control register 430 sets the throttling mode independently for each PPE or SPE. The control bits are split between two registers. The following five modes can be set independently for each PPE or SPE:

Dynamic throttling disabled (including core stop safety);

Normal operation (dynamic throttling and core stop safety enabled);

PPE or SPE always throttled (core stop safety enabled);

Core stop safety disabled (dynamic throttling enabled and core stop safety disabled);

PPE or SPE always throttled and core stop safety disabled.

Privileged software should set the control bits to normal operation for any PPE or SPE that is running an application or an operating system. If a PPE or SPE is not running application code, privileged software should set the control bits to disabled. The "PPE or SPE always throttled" modes are intended for application development. These modes are useful for determining whether an application can operate under extreme throttling conditions. A PPE or SPE should be allowed to execute with dynamic throttling or core stop safety disabled only when privileged software is actively managing thermal events.

The thermal management system interrupt mask register 438 controls which PPE interrupts cause the thermal management logic to temporarily stop throttling the PPE. While an interrupt is pending, the TMCU 402 temporarily suspends throttling for both threads, regardless of which thread the interrupt targets. When the interrupt is no longer pending, throttling can resume as long as the throttling condition still exists. Throttling of the SPEs is never disabled based on a system interrupt condition. The PPE interrupt conditions that can override a throttling condition are as follows:

External

Decrementer

Hypervisor decrementer

System error

Thermal management
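The override rule can be sketched as a simple check. This is an illustrative model only: the interrupt names are written as hypothetical string labels, and the function and its arguments do not correspond to any real register interface.

```python
# Hypothetical labels for the PPE interrupt conditions listed above.
PPE_INTERRUPTS = (
    "external",
    "decrementer",
    "hypervisor_decrementer",
    "system_error",
    "thermal_management",
)


def ppe_throttling_active(throttle_condition, pending, simr_selected):
    """Return True if the PPE is throttled right now.

    throttle_condition: True while the temperature calls for throttling.
    pending: set of interrupt labels currently pending on either thread.
    simr_selected: set of labels selected in the system interrupt mask.
    """
    suppressed = any(name in simr_selected for name in pending)
    return throttle_condition and not suppressed
```

Throttling resumes on its own in this model as soon as the pending set no longer intersects the selected set, provided the throttle condition still holds.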

The thermal management throttle point register 432 contains the encoded temperature points at which execution throttling of a PPE or SPE begins and ends. The register also contains the encoded temperature point at which execution of a PPE or SPE is fully throttled.

Software uses the values in the thermal management throttle point register to set three temperature points for changing between three thermal management states: normal run (N), PPE or SPE throttled (T), and PPE or SPE stopped (S). The TMCU 402 supports independent temperature points for the PPE and the SPEs.

When the encoded current temperature of a sensor in the thermal sensor current temperature status register 412 is equal to or greater than the throttle temperature (ThrottlePPE/ThrottleSPE), execution throttling of the corresponding PPE or SPE begins, if enabled. Execution throttling continues until the encoded current temperature of the corresponding sensor is less than the encoded end-throttle temperature (EndThrottlePPE/EndThrottleSPE). As a safety measure, if the encoded current temperature is equal to or greater than the full throttle point (FullThrottlePPE/FullThrottleSPE), the TMCU 402 stops the corresponding PPE or SPE.
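The three states and their temperature points form a small state machine with hysteresis. The sketch below is an interpretation under stated assumptions, not the hardware definition: it assumes a larger encoded value means a hotter sensor, and it assumes a stopped core returns to the throttled state once the temperature falls below the full throttle point. The function name and state labels are hypothetical.

```python
def next_thermal_state(state, temp, throttle, end_throttle, full_throttle):
    """Return the next state: 'N' (normal), 'T' (throttled) or 'S' (stopped)."""
    if temp >= full_throttle:
        return "S"                      # safety stop at the full throttle point
    if temp >= throttle:
        return "T"                      # throttling begins at the throttle point
    if state in ("T", "S") and temp >= end_throttle:
        return "T"                      # keep throttling until below EndThrottle
    return "N"


def run_states(temps, throttle, end_throttle, full_throttle):
    """Apply the state machine to a sequence of encoded temperatures."""
    state, history = "N", []
    for temp in temps:
        state = next_thermal_state(state, temp, throttle, end_throttle, full_throttle)
        history.append(state)
    return history
```

Because the end-throttle point is lower than the throttle point, a temperature between the two leaves the core in whichever state it was already in, which avoids rapid toggling around a single threshold.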

The thermal management stop time register 434 controls the amount of throttling applied to a specific PPE or SPE in the thermal management throttled state. The value that software sets in the thermal management stop time register 434 expresses the amount of time the core is stopped relative to the amount of time the core is allowed to run (stop/run), that is, the percentage of time the core is stopped. The thermal management throttle scale register 436 controls the actual number of clocks (NClks) for which the PPE or SPE stops and runs.

The thermal management throttle scale register 436 controls the actual number of cycles for which a PPE or SPE stops and runs during the thermal management throttled state. The values in this register are multiples of the configuration ring setting TM_Config[MinStopSPE]. The following equations calculate the actual number of stop and run cycles:

SPE run and stop times:

SPE_StopTime = (TM_STR1[StopCore(x)] * TM_Config[MinStopSPE]) * TM_TSR[ScaleSPE]

SPE_RunTime = ((32 - TM_STR1[StopCore(x)]) * TM_Config[MinStopSPE]) * TM_TSR[ScaleSPE]

PPE run and stop times:

PPE_StopTime = (TM_STR2[StopCore(8)] * TM_Config[MinStopPPE]) * TM_TSR[ScalePPE]

PPE_RunTime = ((32 - TM_STR2[StopCore(8)]) * TM_Config[MinStopPPE]) * TM_TSR[ScalePPE]

The run and stop times can be altered by interrupts and by privileged software writing to the various thermal management registers.
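As a worked illustration of the stop and run time equations, the following sketch evaluates them for arbitrary example values. The function name, argument names, and the numbers used are hypothetical; only the formula and the constant 32 come from the equations above.

```python
def throttle_cycles(stop_core, min_stop, scale, window=32):
    """Return (stop_cycles, run_cycles) for one throttling period.

    stop_core: the StopCore field value (0..window).
    min_stop:  the TM_Config[MinStop...] configuration value.
    scale:     the TM_TSR[Scale...] multiplier.
    """
    if not 0 <= stop_core <= window:
        raise ValueError("stop_core must be between 0 and window")
    stop_cycles = stop_core * min_stop * scale
    run_cycles = (window - stop_core) * min_stop * scale
    return stop_cycles, run_cycles


def stopped_fraction(stop_core, min_stop, scale, window=32):
    """Fraction of each period during which the core is stopped."""
    stop_cycles, run_cycles = throttle_cycles(stop_core, min_stop, scale, window)
    return stop_cycles / (stop_cycles + run_cycles)
```

With stop_core = 8 and min_stop = 10, a scale of 1 gives 80 stop cycles and 240 run cycles, while a scale of 4 gives 320 and 960. The stopped fraction is 0.25 in both cases, so the scale changes only the length of the period.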

The on-chip performance monitor 466 can provide performance monitoring that traces the thermal data provided by temperature sensing devices such as DTS 404, 406, 408, and 410. The thermal data can be stored in memory 470, written to an off-chip device 472 such as main memory 208 of FIG. 2, or written to an off-chip I/O device 474 such as the south bridge and input/output (I/O) controller hub (ICH) 204 of FIG. 2. A controller 468 located in the performance monitor 466 determines where the thermal data is sent.

Although the following description refers to one instruction stream and one processor, the instruction stream may be a set of instruction streams and the processor may be a set of processors. That is, a set may be a single instruction stream and a single processor, or two or more instruction streams and processors.

With the architecture described above, many improvements and added programmability have been made to the thermal management and thermal throttling of the Cell BE chip. Some of these improvements and added programmability enable key features, while others enhance usability.

FIG. 6 depicts a flowchart of the operation for recording a maximum temperature in accordance with an illustrative embodiment. As the operation begins, a computer system that contains a Cell BE chip, such as Cell BE chip 300 of FIG. 3, is started or restarted (step 602). As described previously, the Cell BE chip includes a thermal management system provided through the pervasive logic unit 351 of FIG. 3. For each DTS, such as DTS 404, 406, 408, and 410 of FIG. 4, the thermal management system includes a set of maximum temperature status registers and a set of current temperature status registers, such as maximum temperature status register 414 and current temperature status register 412 of FIG. 4. The current temperature status register stores the current temperature of its associated DTS as of the last time a thermal management control state machine, such as thermal management control state machine 458 of FIG. 4, sensed the DTS. The maximum temperature status register stores the maximum temperature reached by its associated DTS since the computer system last read the maximum temperature status register or since the computer system was restarted. The maximum temperature status register can be read by any number of devices, such as a processor or an integrated circuit, or by a device using a serial peripheral interface (SPI) port or a Joint Test Action Group (JTAG) port. However, reading the register through the JTAG port does not cause a reset.

Limiting the following discussion to one DTS for purposes of illustration, the maximum temperature after the computer system is started or restarted (step 602) is zero. Once the thermal management control state machine senses the temperature of the DTS, it sends the sensed temperature to a comparator, such as comparator 454 of FIG. 4 (step 604). The comparator compares the sensed temperature with the current maximum temperature stored for that DTS in the maximum temperature status register (step 606). If at step 606 the sensed temperature is higher than the current maximum temperature stored in the maximum temperature status register, the sensed temperature becomes the new maximum temperature and the thermal management control state machine records the new maximum temperature in the maximum temperature status register (step 608). That is, the thermal management control state machine overwrites the current maximum temperature stored in the maximum temperature status register. If at step 606 the sensed temperature is lower than or equal to the current maximum temperature stored in the maximum temperature status register, the maximum temperature status register retains its existing maximum temperature (step 610).

The current maximum temperature in the maximum temperature status register is held until the computer system reads the maximum temperature status register by issuing a read request (step 612) or until the computer system is restarted. If the current maximum temperature is not read, the operation returns to step 604. If at step 612 the computer system reads the current maximum temperature, the thermal management control state machine resets the current maximum temperature to the current temperature in the current temperature status register (step 614), and the operation returns to step 604.

As an example of this operation, if the DTS for a particular unit, such as a processor core or the processor itself, senses temperatures of 67°C, 70°C, 75°C, 72°C, and 74°C over a period of time, the maximum temperature in the maximum temperature status register will be 75°C. If the computer system issues a read request after the fourth sensing of the DTS, the maximum temperature returned will be 75°C. At that point, however, the thermal management control state machine resets the maximum temperature to the current temperature (72°C), so after the final sensing by the DTS the maximum temperature in the maximum temperature status register will be 74°C.
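The record-and-reset behavior in this example can be reproduced with a short software model. The class and method names are hypothetical; the sketch only mirrors steps 604 through 614 as described above.

```python
class MaxTemperatureRegister:
    """Toy model of a current/maximum temperature register pair for one DTS."""

    def __init__(self):
        self.current = 0   # current temperature status register
        self.maximum = 0   # maximum temperature is zero after a start or restart

    def sense(self, temp):
        """Record a sensed temperature and raise the maximum if it is exceeded."""
        self.current = temp
        if temp > self.maximum:
            self.maximum = temp

    def read_maximum(self):
        """Return the stored maximum, then reset it to the current temperature."""
        value = self.maximum
        self.maximum = self.current
        return value
```

Feeding the model 67, 70, 75, and 72 and then reading returns 75 and leaves the stored maximum at 72; sensing 74 afterwards raises the stored maximum to 74, matching the example.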

Thus, the purpose of the maximum temperature status register is to record the maximum temperature a DTS has reached since the maximum temperature register was last read. This maximum temperature information helps the operating system determine the maximum temperature a DTS reached during execution of an application or program without continuously polling the current temperature register. Continuous polling would affect system performance and could therefore affect the maximum temperature. Moreover, polling the current temperature does not guarantee that the maximum temperature is observed, because the maximum may occur between two reads of the current temperature.

FIG. 7 depicts a flowchart of the operation for tracing thermal data through performance monitoring in accordance with another illustrative embodiment. As described previously, the Cell BE chip includes a thermal management system provided through the pervasive logic unit 351 of FIG. 3. Performance monitoring can be provided by a performance monitor, such as performance monitor 466 of FIG. 4. Performance monitoring can trace thermal data provided by temperature sensing devices, such as DTS 404, 406, 408, and 410 of FIG. 4, in its internal memory, such as memory 470 of FIG. 4; write the data to main memory, such as main memory 208 of FIG. 2 or off-chip device 472 of FIG. 4; or write the data to an I/O device, such as the south bridge and input/output (I/O) controller hub (ICH) 204 of FIG. 2 or off-chip I/O device 474 of FIG. 4.

Performance monitoring supports two main tracing modes: tracing for a fixed period of time and continuous tracing. A trace of thermal performance may be a trace such as trace 500 of FIG. 5. Performance monitoring also provides for configuring the sampling frequency, which controls the time period between two consecutive samples. In addition, compression of the thermal information can be used to increase the sampling interval. One compression technique is to store thermal information only when it changes. A count of the number of identical thermal samples can also be stored with the thermal information. This technique is useful because thermal information typically changes slowly.
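The compression technique described above, storing a sample only when it changes together with a count of identical samples, is a form of run-length encoding. The sketch below shows one possible software rendering; the function names and the (value, count) tuple layout are assumptions, not the trace format used by the performance monitor.

```python
def compress_thermal_trace(samples):
    """Collapse consecutive identical samples into (value, count) pairs."""
    runs = []
    for sample in samples:
        if runs and runs[-1][0] == sample:
            runs[-1][1] += 1           # same reading again: bump the count
        else:
            runs.append([sample, 1])   # reading changed: start a new run
    return [(value, count) for value, count in runs]


def expand_thermal_trace(runs):
    """Rebuild the original sample sequence from (value, count) pairs."""
    samples = []
    for value, count in runs:
        samples.extend([value] * count)
    return samples
```

Because thermal readings change slowly, a long trace often collapses to a handful of runs, and the stored counts allow the original timeline to be rebuilt exactly.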

As the operation for tracing thermal data through the performance monitor begins, a thermal management control state machine, such as thermal management control state machine 458 of FIG. 4, sets the performance monitor to a trace mode (step 702). Limiting the following discussion to one DTS for purposes of illustration, the thermal management control state machine senses the temperature of the DTS (step 704) and sends the sensed temperature to the current temperature status register and/or other data structures for storage (step 706). The thermal management control state machine then determines whether the performance monitor is still running (step 708). Once started at step 702, the performance monitor runs for a user-specified period of time or until the user stops it through user input. However, the performance monitor can also stop based on a specific thermal condition. Such a condition is called a trigger, much as a logic analyzer looks for a specific condition on a set of signals. Triggers are useful in software debugging. For example, a user can set the performance monitor to stop, or to checkstop the system, when a thermal condition is reached. This allows the user to determine exactly which code or combination of code is causing the thermal condition. If at step 708 the performance monitor is still running, the operation returns to step 704.

Returning to step 708, if the performance monitor is no longer running, the thermal management control state machine reads the temperature information stored in memory and displays the stored information to the user in graphical form (step 710), after which the operation ends. Alternatively, as indicated by arrow 712, the sensed temperatures sent to the current temperature status register and/or other data structures at step 706 can be displayed (step 710) while the operation is still in progress, rather than waiting for the trace to end.

Thus, the performance monitor traces the thermal data provided by the DTSs. Tracing thermal data automatically eliminates the need for software to continuously poll the current temperature register. Performance monitoring is important for collecting the thermal data of a workload because it does not require inserting additional code to poll for thermal data, which could change the behavior of the workload. In other words, performance monitoring provides a non-intrusive way to trace the thermal characteristics of a software application in real time. An additional benefit of sending thermal information to the performance monitor is the ability to trigger or stop the recording of thermal information on a pre-specified thermal condition. The performance monitor can also be used to stop (checkstop) the system when a thermal condition is met. Doing so allows the user to determine which code segment or combination of code segments is producing the thermal condition. The user can then rewrite the code segment or avoid the particular combination, thereby avoiding the thermal event.

FIGS. 8A and 8B depict flowcharts of the operation for advanced thermal interrupt generation in accordance with a further illustrative embodiment. As described previously, the Cell BE chip includes a thermal management system provided through the pervasive logic unit 351 of FIG. 3. Advanced thermal interrupt generation is another feature that helps the operating system handle thermal events. The advanced thermal interrupt logic is part of a thermal management control unit, such as TMCU 402 of FIG. 4. A thermal interrupt alerts the operating system when there is a thermal condition, that is, when the chip temperature rises above a certain threshold. In that case, the operating system should take corrective action to lower the chip temperature. The corrective action can be handled by a software interrupt handler, which is a piece of code that handles the thermal condition and initiates the corrective action. The operating system then waits for the thermal condition to go away before resuming normal operation. This usually requires the operating system to wait a specific amount of time and then poll the temperature of the processor to determine whether it is safe to resume normal operation. With advanced thermal interrupt generation, the operating system can instead set up an interrupt that detects when the temperature drops below a certain threshold, eliminating the need to poll the current temperature register. The combination of thermal sensor interrupt mask register 424 (TS_IMR) and thermal sensor interrupt status register 422 (TS_ISR) of FIG. 4 makes it easier for the operating system to handle thermal events.

Advanced thermal interrupt generation can be performed at a local level or at a global level. That is, it can be performed individually (locally) on a particular DTS, or globally on all DTSs, such as DTSs 404, 406, 408, and 410 of FIG. 4. The direction bits of the thermal sensor interrupt mask register are BG and BX. The interrupt direction defines the condition under which an interrupt is generated. An interrupt can be generated when the temperature changes from below the interrupt temperature to at or above it, or when the temperature changes from at or above the interrupt temperature to below it. The thermal management control state machine identifies the condition using direction bits BG and BX in the interrupt mask register. BG is the global direction bit. When BG is set to '0', the thermal management control state machine generates an interrupt when the temperature of any DTS is greater than or equal to the global interrupt temperature. When BG is set to '1', the thermal management control state machine generates an interrupt when the temperatures of all DTSs are below the global interrupt temperature. BX is the local direction bit, where X is the number of the individual DTS with which the bit is associated. When BX is set to '0', the thermal management control state machine generates an interrupt when the temperature of the individual DTS is greater than or equal to the DTS interrupt temperature. When BX is set to '1', the thermal management control state machine generates an interrupt when the temperature of the individual DTS is below the DTS interrupt temperature. The thermal sensor interrupt status register (TS_ISR) records which sensor caused the advanced thermal interrupt. Software reads this register to determine which condition occurred and which sensor or sensors caused the interrupt. Once software has read the register, the thermal management control state machine resets the status bits in it.
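The direction-bit rules above can be restated as two small predicates. This is only an illustrative sketch: the function names and the use of plain integers for temperatures are assumptions made here, and the predicates test the condition on a single sample, as the flowcharts do at each decision step, rather than modelling the hardware itself.

```python
def global_interrupt_condition(temps, interrupt_temp, bg):
    """Condition selected by the global direction bit BG.

    bg == 0: true when ANY sensed temperature is >= the global interrupt temperature.
    bg == 1: true when ALL sensed temperatures are below it.
    """
    if bg == 0:
        return any(t >= interrupt_temp for t in temps)
    return all(t < interrupt_temp for t in temps)


def local_interrupt_condition(temp, interrupt_temp, bx):
    """Condition selected by the local direction bit BX of a single DTS."""
    if bx == 0:
        return temp >= interrupt_temp
    return temp < interrupt_temp
```

Note the asymmetry in the global case: with BG at '0' a single hot sensor is enough, whereas with BG at '1' every sensor must have cooled.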

The operation of advanced thermal interrupt generation can therefore be shown from both a global and a local perspective. FIG. 8A depicts global advanced thermal interrupt generation, and FIG. 8B depicts local advanced thermal interrupt generation. As the operation of global advanced thermal interrupt generation in FIG. 8A begins, the thermal management control state machine sets the global interrupt temperature T to temperature T1 and sets the global interrupt direction BG to '0' (step 802). The thermal management control state machine senses the temperatures of the DTSs (step 804). The thermal management control state machine determines whether any temperature sensed from the DTSs is greater than or equal to temperature T1 (step 806). If no sensed temperature is greater than or equal to temperature T1, operation returns to step 804. If at step 806 any sensed temperature is greater than or equal to temperature T1, the thermal management control state machine generates an interrupt and sets the corresponding status bits in the thermal interrupt status register to record which sensor or sensors caused the interrupt (step 808). The operating system then services the interrupt and may reduce the workload on the processor or offload part of the processor's workload to another processor in the system.

After generating the interrupt, the thermal management control state machine sets the global interrupt temperature T to temperature T2 and sets the global interrupt direction BG to '1' (step 810). Temperature T2 should be set less than or equal to temperature T1. The thermal management control state machine again senses the temperatures of the DTSs (step 812). The thermal management control state machine determines whether all temperatures sensed from the DTSs are below temperature T2 (step 814). If not all sensed temperatures are below temperature T2, operation returns to step 812. If at step 814 all sensed temperatures are below temperature T2, the thermal management control state machine generates an interrupt and sets the corresponding status bits in the thermal interrupt status register to record which sensor or sensors caused the interrupt (step 816). At this point it is safe for the operating system to resume normal operation. The operating system services the interrupt and restores the system to normal operation. Operation then returns to step 802, where the global interrupt temperature T is set to temperature T1 and the global interrupt direction BG is set to '0'.

As an example of this operation, suppose all DTSs share a global interrupt temperature of 80°C and a global interrupt direction of '0'. As soon as any DTS of an associated unit, such as a processor core or the processor itself, senses a temperature greater than or equal to 80°C, the thermal management control state machine generates an interrupt and sets the corresponding status bits in the thermal interrupt status register to record which sensor or sensors caused the interrupt. The operating system then services the interrupt and may reduce the workload on the processor or offload part of the processor's workload to another processor in the system. At this point the thermal management control state machine may also reset the global interrupt temperature to, for example, 77°C and set the global interrupt direction to '1'. The workload continues to run in a slowed mode, or remains off the processor, until all DTSs sense a temperature below 77°C. Once the thermal management control state machine determines that the sensed temperatures are below 77°C, it generates another interrupt. The thermal management control state machine sets the global interrupt temperature back to 80°C and the global interrupt direction back to '0', and the operating system resumes normal operation on the workload.
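The global flow of FIG. 8A, with the 80°C and 77°C values from the example, can be traced with a short simulation. This is a sketch under stated assumptions: the function name `run_global`, the event labels "hot" and "cool", and the representation of the sensors as a list of readings per sample are all invented here for illustration.

```python
def run_global(trace, t1=80, t2=77):
    """Replay a trace of per-sample DTS readings through the FIG. 8A loop.

    trace: list of samples, each a list with one temperature per DTS.
    Returns a list of (sample_index, event) pairs, one per generated interrupt.
    """
    threshold, direction = t1, 0          # step 802: T = T1, BG = '0'
    events = []
    for i, temps in enumerate(trace):
        if direction == 0 and any(t >= threshold for t in temps):
            events.append((i, "hot"))     # step 808: interrupt on rising condition
            threshold, direction = t2, 1  # step 810: T = T2, BG = '1'
        elif direction == 1 and all(t < threshold for t in temps):
            events.append((i, "cool"))    # step 816: interrupt once all sensors cool
            threshold, direction = t1, 0  # back to step 802
    return events
```

For a trace such as `[[70, 72], [79, 81], [78, 76], [76, 75], [70, 71]]`, the sketch reports one "hot" interrupt at sample 1 and one "cool" interrupt at sample 3. Sample 2 produces nothing because one sensor is still at 78°C.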

Turning to FIG. 8B, the illustrative embodiment is described for a single DTS, but it is the same for each DTS. As the operation of local advanced thermal interrupt generation begins, the thermal management control state machine sets the local interrupt temperature T to temperature T3 and sets the local interrupt direction BX to '0' (step 852). The thermal management control state machine senses the temperature of the DTS (step 854). The thermal management control state machine determines whether the temperature sensed from the DTS is greater than or equal to temperature T3 (step 856). If the sensed temperature is not greater than or equal to temperature T3, operation returns to step 854. If the sensed temperature is greater than or equal to temperature T3, the thermal management control state machine generates an interrupt and sets the corresponding status bit in the thermal interrupt status register to record which sensor or sensors caused the interrupt (step 858). The operating system then services the interrupt and may reduce the workload on the processor or offload part of the processor's workload to other units within the processor or to another processor in the system.

After generating the interrupt, the thermal management control state machine sets the local interrupt temperature T to temperature T4 and sets the local interrupt direction BX to '1' (step 860). Temperature T4 should be set less than or equal to temperature T3. The thermal management control state machine again senses the temperature of the DTS (step 862). The thermal management control state machine determines whether the temperature sensed from the DTS is below temperature T4 (step 864). If the sensed temperature is not below temperature T4, operation returns to step 862. If the sensed temperature is below temperature T4, the thermal management control state machine generates an interrupt and sets the corresponding status bit in the thermal interrupt status register to record which sensor or sensors caused the interrupt (step 866). At this point it is safe for the operating system to resume normal operation. The operating system services the interrupt and restores the system to normal operation. Operation then returns to step 852, where the thermal management control state machine sets the local interrupt temperature T to temperature T3 and the local interrupt direction BX to '0'.

As an example of this operation, suppose a given DTS has a local interrupt temperature of 80°C and a local interrupt direction of '0'. As soon as the DTS of the associated unit senses a temperature greater than or equal to 80°C, the thermal management control state machine generates an interrupt and sets the corresponding status bit in the thermal interrupt status register to record which sensor or sensors caused the interrupt. The operating system then services the interrupt and may reduce the workload on the processor or offload part of the processor's workload to another processor in the system. At this point the thermal management control state machine may also reset the local interrupt temperature to, for example, 77°C and set the local interrupt direction to '1'. The workload continues to run in a slowed mode, or remains off the processor unit, until the DTS senses a temperature below 77°C. Once the thermal management control state machine determines that the sensed temperature is below 77°C, it generates another interrupt. The thermal management control state machine sets the local interrupt temperature back to 80°C and the local interrupt direction back to '0', and the operating system resumes normal operation on the workload.
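The local flow of FIG. 8B, together with the read-to-clear behaviour of the status register, might be modelled as follows. Everything named here is an assumption for illustration: the class `LocalThermalInterrupts`, its methods, and in particular the choice to map sensor X to bit X of the modelled status value; the source does not give the register layout.

```python
class LocalThermalInterrupts:
    """Per-DTS interrupt thresholds with a read-to-clear status value."""

    def __init__(self, num_sensors, t_hot=80, t_cool=77):
        self.t_hot, self.t_cool = t_hot, t_cool
        self.threshold = [t_hot] * num_sensors  # local interrupt temperature per DTS
        self.direction = [0] * num_sensors      # BX per DTS
        self.status = 0                         # modelled status bits (layout assumed)

    def sample(self, temps):
        """Apply one reading per DTS; return True if any interrupt was raised."""
        raised = False
        for x, temp in enumerate(temps):
            if self.direction[x] == 0 and temp >= self.threshold[x]:
                self.threshold[x], self.direction[x] = self.t_cool, 1  # step 860
            elif self.direction[x] == 1 and temp < self.threshold[x]:
                self.threshold[x], self.direction[x] = self.t_hot, 0   # step 852
            else:
                continue
            self.status |= 1 << x  # record which sensor caused the interrupt
            raised = True
        return raised

    def read_status(self):
        """Return the status bits and reset them, as a software read would."""
        value, self.status = self.status, 0
        return value
```

A handler built on this sketch would call `read_status()` once per interrupt and inspect the returned bits to find the affected sensors; a second read returns zero until a new condition is recorded.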

Thus, advanced thermal interrupt generation enables the operating system to program interrupt generation to follow the direction of the temperature change, and eliminates the need for the interrupt handler to continuously poll the current temperature after a thermal interrupt.

FIG. 9 depicts a flowchart of an operation for supporting deep power saving modes and partial good in a thermal management system in accordance with a further illustrative embodiment. As described previously, the Cell BE chip includes a thermal management system provided by pervasive logic unit 351 of FIG. 3. Cell BE chip 300 of FIG. 3 has several power saving modes. Depending on how each power saving mode is implemented, some of them may limit the accessibility of the DTSs, such as DTSs 404, 406, 408, and 410 of FIG. 4. For example, if an SPU, such as SPU 310, 311, or 312 of FIG. 3, is in a clocks-off power saving mode, which means that a deserializer such as deserializer 462 of FIG. 4 is disabled, then the path between a serializer such as serializer 456 of FIG. 4 and a DTS such as DTS 404 of FIG. 4 does not function. Another example of a power saving mode is the case where power is turned off, in which case the DTS itself may be disabled. A further example is the case where the thermal management control state machine determines whether a sensor or unit within the processor was found to be bad during manufacturing test. If the sensor or unit is redundant, the manufacturer can mark it as bad, which yields a partial good processor in which only a limited number of units or sensors function. In any of these cases, a thermal management control state machine, such as thermal management control state machine 458 of FIG. 4, needs to monitor the state of these power modes and mask out the non-functioning DTSs so that they do not participate in thermal management tasks (throttling, interrupts, and so on).

Returning to FIG. 9, a flowchart of an operation for supporting deep power saving modes and partial good in a thermal sensing and thermal management system is depicted. As the operation begins, the thermal management control state machine tracks the state of the DTSs using data from each DTS (step 902). The thermal management control state machine stores this data in an internal calibration memory, such as internal calibration memory 428 of FIG. 4. As described previously, a power saving mode, a bad DTS, or an SPU communicating with the thermal management control state machine through a data stream, such as data stream 460 of FIG. 4, can inhibit the operation of a particular DTS. The effect of a partial good condition reported by the manufacturing process is similar to that of a power saving mode, except that partial good is a permanent condition and the DTS should be masked permanently. When an SPU is marked as bad, the thermal management control state machine shuts off the entire SPU and disables the serializer. When a DTS is marked as bad, the thermal management control state machine masks that DTS. The thermal management control state machine determines whether the DTS or SPU is bad or functional (step 904). If the DTS or SPU is bad, the thermal management control state machine masks the DTS (step 906), and the operation ends.

To mask a DTS that is in a power management state, the thermal management control state machine resets the associated current temperature status register, such as current temperature status register 412 of FIG. 4, to 0x0, which is the lowest temperature setting. An alternative is to assign an encoding of the associated current temperature status register, by setting a status bit, that indicates the DTS is masked; this can be more precise than merely resetting the sensor reading. The thermal management control state machine then stops communication between the DTS and the current temperature status register. Stopping communication is an optional step, intended mainly to save power and avoid useless overhead. The thermal management control state machine then generates a signal indicating that the DTS is now masked and should not participate in thermal management tasks. Finally, the thermal management control state machine resets the state of the DTS. When the unit associated with the DTS, such as a processor core or the processor itself, exits the power saving mode, the thermal management control state machine resumes communication with the DTS, resumes updating the current temperature status register, and signals that the DTS can participate in thermal management tasks.

Returning to step 904, if both the DTS and the SPU are functional, the thermal management control state machine begins communicating with the DTS (step 908). The thermal management control state machine monitors the power management state of the SPU to determine when the SPU enters a power saving mode (step 910). Until the SPU enters a power saving mode, operation returns to step 908. If the SPU enters a power saving mode and the DTS is disabled, the thermal management control state machine masks the DTS using the method discussed above in connection with step 906 (step 912). Having indicated whether the DTS is disabled or functional, the thermal management control state machine continues to monitor the power management state of the SPU (step 914). Until the SPU exits the power saving mode, operation returns to step 912. When the SPU exits the power saving mode and the DTS is no longer disabled, the thermal management control state machine begins communicating with the DTS, resumes updating the current temperature status register, and signals that the DTS can participate in thermal management tasks (step 916), and operation returns to step 908.
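The masking behaviour of FIG. 9 can be summarised in a small model. This is a sketch only: the class `SensorBank` and its methods are hypothetical names, and the two sets used to distinguish permanent (partial good) masking from temporary (power saving) masking are a modelling choice made here, not a description of the hardware.

```python
MASKED_READING = 0x0  # lowest temperature setting, used for a masked DTS


class SensorBank:
    """Tracks current temperature readings and which DTSs are masked."""

    def __init__(self, num_sensors):
        self.current_temp = [MASKED_READING] * num_sensors
        self.bad = set()           # permanently masked (partial good)
        self.power_saving = set()  # temporarily masked (power saving mode)

    def is_masked(self, x):
        return x in self.bad or x in self.power_saving

    def mark_bad(self, x):
        self.bad.add(x)
        self.current_temp[x] = MASKED_READING

    def enter_power_save(self, x):
        self.power_saving.add(x)
        self.current_temp[x] = MASKED_READING

    def exit_power_save(self, x):
        # A DTS marked bad stays masked even after its unit wakes up.
        self.power_saving.discard(x)

    def update(self, x, reading):
        """Store a new reading unless the DTS is masked."""
        if not self.is_masked(x):
            self.current_temp[x] = reading

    def hottest(self):
        """Highest reading among DTSs that participate in thermal management."""
        live = [t for x, t in enumerate(self.current_temp) if not self.is_masked(x)]
        return max(live) if live else None
```

Because a masked DTS both reads as 0x0 and is excluded from `hottest()`, a stale or meaningless reading from a powered-down unit cannot trigger throttling or an interrupt in this model.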

Thus, masking the temperature readings of DTSs that are partial good, bad, or in a power saving mode isolates non-functioning or disabled DTSs so that they do not participate in thermal management tasks.

FIG. 10 depicts a flowchart of the operation of a thermal throttling control feature that enables real-time testing of thermally aware software applications independent of temperature, in accordance with a further illustrative embodiment. As described previously, the Cell BE chip includes a thermal management system provided by pervasive logic unit 351 of FIG. 3. Thermal management control registers, such as thermal management control register 430 of FIG. 4, provide access to and configuration of the various thermal throttling control features. Thermal throttling is designed to lower temperature by reducing performance when a thermal event occurs.

A thermal management stop time register, such as thermal management stop time register 434 of FIG. 4, together with a thermal management throttle scale register, such as thermal management throttle scale register 436 of FIG. 4, sets the amount of throttling and the throttling behavior. In a real-time system, real-time deadlines must be guaranteed. It is important for software developers and quality assurance teams to know and test the maximum amount of throttling, which is the largest setting of the thermal management stop time register and the thermal management throttle scale register that a program or code segment can tolerate while still guaranteeing the real-time deadlines of the system. Instead of manipulating the actual temperature of the hardware to cause a thermal event and thereby trigger a throttling condition, the thermal management control state machine provides a mode that always throttles regardless of temperature. The thermal management control state machine sets this mode in the thermal management control register, which places the chip in a constant throttling state. This feature helps software developers test their code and ensure that it meets real-time requirements.

As the operation begins, thermal control settings for the thermal management stop time register and the thermal management throttle scale register are received (step 1002). The thermal management control state machine uses the settings of these two registers to determine how throttling is performed. The thermal management control state machine then sets a test mode and sets the thermal management control register to the always-throttle setting (step 1004). The test mode may be any type of throttling mode, such as always throttling or random throttling. The program is then run for real-time validation, that is, to confirm that the software will meet its real-time deadlines under the thermal control settings of the thermal management stop time register and the thermal management throttle scale register (step 1006). The thermal management control state machine then determines whether the real-time deadlines were met (step 1008). If the real-time deadlines were not met, the thermal management control state machine records the current thermal control settings of the thermal management stop time register and the thermal management throttle scale register as failing (step 1010). The thermal management control state machine then determines whether there are any new thermal control settings for the two registers that would decrease the amount of throttling (step 1012). If there are new thermal control settings, operation returns to step 1002. If at step 1012 there are no new thermal control settings, the operation ends.

Returning to step 1008, if the real-time deadlines were met, the thermal management control state machine records the current thermal control settings of the thermal management stop time register and the thermal management throttle scale register as passing (step 1014). The thermal management control state machine determines whether there are any new thermal control settings for the two registers that would increase the amount of throttling (step 1016). If there are new thermal control settings, operation returns to step 1002. If at step 1016 there are no new thermal control settings, the operation ends.
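The pass/fail search of FIG. 10 can be sketched as a loop over candidate settings. This is an illustration under stated assumptions: the function `find_max_throttle`, the representation of each setting as a hashable value, the requirement that the caller supplies the candidates sorted by increasing throttling, and the `meets_deadline` callback standing in for an actual run of the program under the always-throttle mode are all hypothetical.

```python
def find_max_throttle(settings, meets_deadline, start=0):
    """Search for the heaviest throttling setting that still meets deadlines.

    settings: candidate settings sorted by increasing amount of throttling.
    meets_deadline: callable that runs the workload under a setting and
        returns True if all real-time deadlines were met (steps 1004-1008).
    Returns (best_setting_or_None, {setting: passed}) for the settings tried.
    """
    results = {}
    i = start
    # Stop when there is no new setting to try in the required direction.
    while 0 <= i < len(settings) and i not in results:
        passed = meets_deadline(settings[i])
        results[i] = passed              # steps 1010 / 1014: record fail or pass
        i = i + 1 if passed else i - 1   # steps 1016 / 1012: more or less throttling
    passing = [idx for idx, ok in results.items() if ok]
    best = settings[max(passing)] if passing else None
    return best, {settings[idx]: ok for idx, ok in results.items()}
```

With five candidate settings of which the first three pass, starting from index 1 tries indices 1, 2, and 3 and reports the setting at index 2 as the maximum tolerable throttling. The sketch assumes that a heavier setting never passes when a lighter one fails; if that does not hold for a workload, every setting would need to be tested.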

Thus, providing an always-throttle mode of operation helps software developers test their code and ensure that it meets real-time deadlines even under worst-case thermal conditions. Software developers and quality assurance teams can use this feature to determine the maximum amount of throttling that a program or code segment can tolerate while still guaranteeing the real-time deadlines of the system. Once the maximum amount of throttling has been determined and validated, software can set an interrupt to occur when the full throttling condition occurs. If the thermal management control state machine ever generates this interrupt, the application is notified that the real-time guarantees may have been violated.

In addition to the always-throttle control setting, an implementation may also provide a mode that injects random or directed-random thermal events in order to simulate a more realistic interaction between throttling and software execution. This technique is similar to randomly injecting errors on a bus to test error recovery code.

FIG. 11 depicts a flowchart of an operation for implementing thermal throttling control with minimal impact on interrupt latency in accordance with a further illustrative embodiment. As described previously, the Cell BE chip includes a thermal management system provided by pervasive logic unit 351 of FIG. 3. When any part of a computer system is placed in a throttling condition, the performance of the whole system is reduced. The reduced performance increases interrupt latency, both in how long it takes before an interrupt is serviced and in how long servicing it takes. Because increased interrupt latency has a severe impact on the system as a whole, it is desirable and necessary to minimize the impact of thermal throttling on interrupt latency. Minimizing this impact is a feature of PPU throttling control, such as the throttling of PPU 308 of FIG. 3. SPUs, such as SPUs 310, 311, and 312 of FIG. 3, do not receive interrupts and are therefore not affected by this feature.

As the operation begins, a thermal management control state machine, such as thermal management control state machine 458 of FIG. 4, monitors all PPU interrupt status bits and a thermal management system interrupt mask register, such as thermal management system interrupt mask register 438 of FIG. 4 (step 1102). The thermal management system interrupt mask register controls the masking of interrupts. The thermal management control state machine determines whether there are any unmasked pending interrupts (step 1104). If no interrupt is pending, or if interrupts are pending but masked, operation returns to step 1102.

If at step 1104 there is an unmasked pending interrupt, the thermal management control state machine temporarily disables any throttling mode, whether partial or full throttling (step 1106). Disabling the throttling mode lets the PPU temporarily run at full performance and service any pending interrupts without any delay caused by thermal throttling. The thermal management control state machine again monitors all PPU interrupt status bits and the thermal management system interrupt mask register (step 1108). The thermal management control state machine determines whether there are still any unmasked pending interrupts (step 1110). While an unmasked interrupt remains pending, operation returns to step 1108. When the interrupt status clears at step 1110, the thermal management control state machine restores the PPU to its original throttling mode (step 1112), and operation returns to step 1102.
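The decision made at steps 1104 and 1110 reduces to a bitwise test. The sketch below is illustrative only: the function name `effective_throttle_mode`, the mode strings, and above all the mask polarity (a set mask bit is taken to mean that the interrupt is masked) are assumptions, since the source does not specify the bit encoding.

```python
def effective_throttle_mode(configured_mode, pending, mask):
    """Return the throttling mode to apply given the interrupt state.

    configured_mode: the mode programmed in the control register, e.g. "partial".
    pending: bit vector of pending PPU interrupt status bits.
    mask: bit vector where a set bit means the interrupt is masked (assumed).
    """
    unmasked_pending = pending & ~mask
    if unmasked_pending:
        return "off"         # step 1106: suspend throttling while servicing
    return configured_mode   # step 1112: original throttling mode applies
```

Note that the configured mode is never modified; it is only overridden while an unmasked interrupt is pending, which matches the description that the control register itself is not changed by the interrupt.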

An interrupt handler may choose to clear the interrupt status bit at either the beginning or the end of the interrupt handler routine. The interrupt handler may reside in a Power processor unit, such as Power processor unit 301 of FIG. 3, or in software executed by the Power processor unit. If the interrupt handler chooses to clear the interrupt status bit at the beginning and also wants to avoid any performance degradation of the PPU, it can disable thermal throttling before clearing the interrupt status bit. This is necessary because an interrupt does not cause any change in the control registers: throttling remains enabled and is merely suspended by a thermal management control unit, such as TMCU 402 of FIG. 4, while an unmasked interrupt is pending. A handler that resets the interrupt status before servicing the interrupt should therefore set the control register to disable throttling (or reduce the amount of throttling to an acceptable level), reset the interrupt, service the interrupt, and then re-enable throttling or restore the previous amount of throttling. As an example, thermal throttling may be disabled by setting a thermal management control register, such as thermal management control register 430 of FIG. 4, to 0XX, where X is a "don't care" bit. At the end of the interrupt routine, the interrupt handler should set the thermal management control register back to its original value. If the interrupt handler clears the interrupt status bit at the end of the interrupt routine, no extra work is needed, because the thermal management control state machine keeps the PPU out of throttling mode for as long as the interrupt status bit is active.

FIG. 12 depicts a flowchart of the operation of hysteresis in thermal throttling in accordance with a further illustrative embodiment. As described previously, the Cell BE chip includes a thermal management system provided by pervasive logic unit 351 of FIG. 3. Hysteresis in thermal throttling is the lag between making a change, such as starting or ending throttling, and the response to or effect of that change. For example, if the throttle point is set to 75°C and the end throttle point is set to 72°C, the hysteresis range is from 75°C to 72°C. FIG. 5 depicts thermal throttling hysteresis.

A thermal management throttle point register, such as thermal management throttle point register 432 of FIG. 4, provides two temperature settings: a throttle temperature and an end throttle temperature. The throttle temperature should be set higher than the end throttle temperature. The difference between the two temperatures defines the amount of hysteresis, which makes the hysteresis programmable.

Limiting the following discussion to a single DTS for purposes of illustration, the operation of hysteretic thermal throttling begins when the thermal management control state machine sets the throttling temperature and the end throttling temperature in the thermal management throttling point register (step 1202). The thermal management control state machine senses the temperature of the DTS (step 1204) and determines whether the sensed temperature is greater than or equal to the throttling temperature (step 1206). If the sensed temperature is not greater than or equal to the throttling temperature, operation returns to step 1204. If the sensed temperature is greater than or equal to the throttling temperature at step 1206, the thermal management control state machine initiates throttling mode (step 1208).

The thermal management control state machine then senses the temperature of the DTS again (step 1210) and determines whether the sensed temperature is still greater than or equal to the end throttling temperature (step 1212). If the sensed temperature is not less than the end throttling temperature, operation returns to step 1210. If the sensed temperature is less than the end throttling temperature at step 1212, the thermal management control state machine disables throttling mode (step 1214) and operation returns to step 1204.

Thus, assuming the thermal management control registers are correctly configured to allow throttling mode, the thermal management control state machine places the unit in throttling mode when the temperature rises to or above the throttling temperature, and keeps the unit in throttling mode until the temperature drops below the end throttling temperature. Provided the end throttling temperature is lower than the throttling temperature, this hysteresis allows the unit to cool sufficiently before throttling mode is disabled. Without hysteresis, the unit could enter and exit throttling mode very frequently, reducing both the overall effectiveness of throttling and the efficiency of the processor.
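The two-threshold behavior of steps 1204 through 1214 can be illustrated with a short simulation. This is a sketch only: the function names are hypothetical, and the default thresholds are simply the 75°C and 72°C values used in the example above.

```python
def step_throttle(throttling, temp_c, throttle_c=75.0, end_throttle_c=72.0):
    """Return the next throttling state for one sensed temperature."""
    if not throttling:
        # Steps 1204-1208: enter throttling at or above the throttling temperature.
        return temp_c >= throttle_c
    # Steps 1210-1214: leave throttling only once below the end throttling temperature.
    return not (temp_c < end_throttle_c)


def run_trace(temps, throttle_c=75.0, end_throttle_c=72.0):
    """Apply step_throttle to a sequence of sensed temperatures."""
    state = False
    states = []
    for temp in temps:
        state = step_throttle(state, temp, throttle_c, end_throttle_c)
        states.append(state)
    return states
```

For the trace `[70, 74, 75, 74, 72, 71, 74]`, throttling starts at the 75°C sample, stays on at 74°C and 72°C, and ends only at 71°C. The final 74°C sample does not restart throttling, which is the behavior that prevents rapid toggling.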

One exemplary method of throttling a processor is to block the dispatch of instructions. If throttling is enabled and disabled frequently, the processor's pipeline may be flushed often, which reduces processing capability. Another exemplary method of throttling a processor is to slow the clock frequency.

FIG. 13 depicts a flowchart of the operation of implementing thermal throttling logic, according to a further illustrative embodiment. FIG. 13 represents a complete thermal management solution as described in the preceding figures. As described above, the Cell BE chip includes a thermal management system provided by pervasive logic unit 351 of FIG. 3. A TMCU, such as TMCU 402 of FIG. 4, includes a number of dynamic thermal management registers. The dynamic thermal management registers are the thermal management control registers, the thermal management throttling point register, the thermal management stop time registers, the thermal management throttling scale register, and the thermal management system interrupt mask register, such as thermal management control registers 430 (TM_CR1 and TM_CR2), thermal management throttling point register 432 (TM_TPR), thermal management stop time registers 434 (TM_STR1 and TM_STR2), thermal management throttling scale register 436 (TM_TSR), and thermal management system interrupt mask register 438 (TM_SIMR) of FIG. 4.

The thermal management throttling point register sets the throttle points for the DTS. Two independent throttle points can be set in the thermal management throttling point register, one for the PPE and one for the SPEs. The register also contains the temperature points for enabling throttling, disabling throttling, and stopping the PPE or SPE. Throttling of PPE or SPE execution begins when the temperature is at or above the throttle point. Throttling stops when the temperature drops below the temperature at which throttling is disabled. If the temperature reaches the full throttle or stop temperature, execution of the PPE or SPE is stopped.

The thermal management control state machine uses the thermal management stop time registers and the thermal management throttling scale register to control the frequency and amount of throttling. When the temperature reaches the throttle point, the thermal management control state machine stops the corresponding PPE or SPE for the number of clocks specified by the corresponding scale value in the thermal management throttling scale register. The thermal management control state machine then allows the PPE or SPE to run for the number of clocks specified by the run value in the thermal management stop time register multiplied by the corresponding scale value. This sequence continues until the temperature drops below the temperature at which throttling is disabled.
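The relationship between the stop time, run time, and scale values can be made concrete with a small calculation. The sketch below assumes, as the description of step 1314 suggests, that both the stop value and the run value are multiplied by the scale value; the parameter names `stop_value`, `run_value`, and `scale` are hypothetical and do not correspond to documented register fields.

```python
def throttle_window(stop_value, run_value, scale):
    """Return (stop_clocks, run_clocks) for one stop/run window.

    Assumption: both values are scaled by the same scale value.
    """
    if stop_value < 0 or run_value <= 0 or scale <= 0:
        raise ValueError("run_value and scale must be positive, stop_value non-negative")
    return stop_value * scale, run_value * scale


def stopped_fraction(stop_value, run_value):
    """Fraction of each window during which the unit is stopped."""
    return stop_value / (stop_value + run_value)
```

With `stop_value=1`, `run_value=3`, and `scale=1000`, the unit stops for 1000 clocks and runs for 3000 clocks, so it is stopped 25% of the time. Under this assumption the scale value changes how long each window lasts but not the stopped fraction.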

The thermal management control state machine uses the thermal management system interrupt mask register to select which interrupts disable throttling of the PPE while the interrupt is pending.
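The selection performed by the interrupt mask register amounts to a bitwise test. The following sketch assumes that a set mask bit selects the corresponding interrupt and that bit positions in the status and mask values line up; neither detail is stated in the text, and the function names are hypothetical.

```python
def throttling_suspended(irq_status: int, irq_mask: int) -> bool:
    """True while any interrupt selected by the mask is pending."""
    return (irq_status & irq_mask) != 0


def effective_throttle(throttle_requested: bool, irq_status: int, irq_mask: int) -> bool:
    """Throttle the PPE only if requested and no selected interrupt is pending."""
    return throttle_requested and not throttling_suspended(irq_status, irq_mask)
```

For example, with pending status `0b0101`, a mask of `0b0100` suspends throttling, while a mask of `0b0010` leaves it in effect.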

The thermal management control registers set the throttling mode independently for each PPE or SPE. The following five modes can be set independently for each PPE or SPE:

Dynamic throttling disabled (including core stop safety);

Normal operation (dynamic throttling and core stop safety enabled);

PPE or SPE always throttled (core stop safety enabled);

Core stop safety disabled (dynamic throttling enabled and core stop safety disabled);

PPE or SPE always throttled and core stop safety disabled.
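The five modes listed above can be summarized as a small lookup table. This is an illustrative sketch: the mode names and the `ThrottleMode` structure are hypothetical, no register bit encodings are implied, and `should_throttle` deliberately ignores hysteresis to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThrottleMode:
    throttling: str         # "off", "dynamic", or "always"
    core_stop_safety: bool  # whether the core stop safety is enabled


# One entry per mode in the list above (names are hypothetical).
MODES = {
    "disabled": ThrottleMode("off", False),
    "normal": ThrottleMode("dynamic", True),
    "always_throttle": ThrottleMode("always", True),
    "core_stop_safety_disabled": ThrottleMode("dynamic", False),
    "always_throttle_core_stop_safety_disabled": ThrottleMode("always", False),
}


def should_throttle(mode: ThrottleMode, at_or_above_throttle_point: bool) -> bool:
    """Decide whether a unit is throttled under the given mode."""
    if mode.throttling == "always":
        return True
    if mode.throttling == "dynamic":
        return at_or_above_throttle_point
    return False
```

The always-throttle entries are the ones relevant to testing, since they force throttling regardless of the sensed temperature.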

The operation of implementing thermal throttling logic begins when the thermal management control state machine sets the throttling temperature and the end throttling temperature in the thermal management throttling point register (step 1302). The thermal management control state machine senses the temperature of the DTS (step 1304) and determines whether the sensed temperature is greater than or equal to the throttling temperature (step 1306). If the sensed temperature is not greater than or equal to the throttling temperature, operation returns to step 1304. If the sensed temperature is greater than or equal to the throttling temperature, the thermal management control state machine initiates throttling mode (step 1308).

The thermal management control state machine then controls throttling according to the type of throttling indicated by the value in the thermal management control register (step 1310). Once the throttling mode is indicated, the thermal management control state machine limits throttling to the amount of throttling indicated in the thermal management stop time register (step 1312). The stop time register sets the ratio, or throttling percentage, between the time the processor is stopped and the time the processor is allowed to run. Finally, the thermal management control state machine scales the stop duration and the run duration by the value specified in the thermal management throttling scale register (step 1314). At this point the operation splits into two concurrent operations, step 1316 and step 1322. At step 1316, the thermal management control state machine senses the temperature of the DTS. The thermal management control state machine determines whether the sensed temperature is less than the end throttling temperature (step 1318). If the sensed temperature is not less than the end throttling temperature, operation returns to step 1316. If the sensed temperature is less than the end throttling temperature, the thermal management control state machine disables throttling mode (step 1320) and operation returns to step 1304.

Returning to step 1314, after the final throttling limits are in place, the thermal management control state machine concurrently monitors the status of all PPU interrupts for any pending interrupt (step 1322). If an interrupt is encountered while throttling is in effect, the thermal management control state machine temporarily disables any throttling mode until the interrupt has been handled, at which point throttling is re-enabled, whether in a partial or a full throttling state, and operation returns to step 1308. Monitoring of interrupt status is discussed in depth with reference to FIG. 11.

Thus, the thermal throttling logic of the thermal management system included in the Cell BE chip provides a dynamic means of managing the thermal state of the Cell BE chip and protecting the Cell BE chip and its components.

The illustrative embodiments can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. The illustrative embodiments are implemented in software, which includes but is not limited to firmware, resident software, and microcode.

Furthermore, the illustrative embodiments can take the form of a computer program product accessible from a computer-usable or computer-readable medium that provides program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any tangible apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid-state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, and an optical disk. Current examples of optical disks include compact disk read-only memory (CD-ROM), compact disk read/write (CD-R/W), and DVD.

A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories, which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.

Input/output or I/O devices (including but not limited to keyboards, displays, and pointing devices) can be coupled to the system either directly or through intervening I/O controllers.

Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or to remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.

The description of the illustrative embodiments has been presented for purposes of illustration and description, and is not intended to be exhaustive or to limit the invention to the illustrative embodiments in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the illustrative embodiments and their practical application, and to enable others of ordinary skill in the art to understand the illustrative embodiments for various embodiments with various modifications as are suited to the particular use contemplated.
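The test procedure recited in the claims that follow (apply a thermal control setting, run the real-time software in test mode, check the deadline, and record the setting as passing or failing) can be sketched as a loop. This is a rough model under stated assumptions, not the claimed implementation: the deadline check below is a crude arithmetic stand-in for actually executing software in an always-throttle test mode, and every name in it is hypothetical.

```python
def meets_deadline(work_clocks, deadline_clocks, stop_value, run_value):
    """Crude model: work only progresses during the run part of each window."""
    stretched = work_clocks * (stop_value + run_value) / run_value
    return stretched <= deadline_clocks


def classify_settings(settings, work_clocks, deadline_clocks):
    """Record each (stop_value, run_value) setting as passing or failing."""
    passed, failed = [], []
    for stop_value, run_value in settings:
        if meets_deadline(work_clocks, deadline_clocks, stop_value, run_value):
            passed.append((stop_value, run_value))
        else:
            failed.append((stop_value, run_value))
    return passed, failed
```

With 600 clocks of work and a 1000-clock deadline, a setting that stops the unit 25% of the time passes in this model, while settings that stop it 50% or 75% of the time fail. Ordering the settings by increasing throttling amount would give the largest amount of throttling the software tolerates.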

Claims (24)

1. A computer implemented method for testing thermal throttling control of real-time software in an integrated circuit, comprising:
receiving at least one thermal control setting;
setting a thermal management system to a test mode using the at least one thermal control setting, wherein the test mode exhibits thermal throttling control using the thermal control setting;
executing the real-time software in the test mode;
testing whether a real-time deadline associated with the real-time software is met in the test mode;
responsive to the real-time software meeting the real-time deadline, recording the at least one thermal control setting as a passing thermal control setting; and
responsive to the real-time software failing to meet the real-time deadline, recording the at least one thermal control setting as a failing thermal control setting.
2. The method according to claim 1, wherein the receiving step, setting step, executing step, testing step, and recording steps are performed by a thermal management control state machine residing in the integrated circuit.
3. The method according to claim 1, further comprising:
determining whether there is at least one thermal control setting with an increased amount of throttling under which the real-time software can be tested; and
responsive to the existence of the thermal control setting with the increased amount of throttling, executing and testing the real-time software a second time using the increased amount of throttling.
4. The method according to claim 1, further comprising:
determining whether there is at least one thermal control setting with a decreased amount of throttling under which the real-time software can be tested; and
responsive to the existence of the thermal control setting with the decreased amount of throttling, executing and testing the real-time software a second time using the decreased amount of throttling.
5. The method according to claim 1, wherein the test mode specifies an always throttle state of the thermal management system.
6. The method according to claim 1, wherein the test mode specifies a random throttle state of the thermal management system.
7. The method according to claim 6, wherein the random throttle state injects random thermal events to more realistically simulate the interaction of throttling and the execution of software.
8. The method according to claim 1, wherein the thermal control setting comprises an amount of throttling to be initiated and a duration of the test mode to be initiated.
9. The method according to claim 8, wherein the amount of throttling to be initiated is a percentage of the time a unit is stopped and the time the unit is running.
10. The method according to claim 9, wherein the time the unit is stopped and the time the unit is running are scaled according to a value of a scale register.
11. The method according to claim 8, wherein the duration of the test mode to be initiated is the actual number of clock cycles for which a unit is stopped and run.
12. The method according to claim 1, wherein the passing thermal control setting is stored in a data structure.
13. The method according to claim 1, wherein the failing thermal control setting is stored in a data structure.
14. The method according to claim 1, wherein the test mode is set in a thermal management control register.
15. The method according to claim 1, wherein the integrated circuit is a heterogeneous multi-core processor.
16. A data processing system for testing thermal throttling control of real-time software in an integrated circuit, comprising:
means for receiving at least one thermal control setting;
means for setting a thermal management system to a test mode using the at least one thermal control setting, wherein the test mode exhibits thermal throttling control using the thermal control setting;
means for executing the real-time software in the test mode;
means for testing whether a real-time deadline associated with the real-time software is met in the test mode; and
means for recording the at least one thermal control setting as a passing thermal control setting responsive to the real-time software meeting the real-time deadline, and for recording the at least one thermal control setting as a failing thermal control setting responsive to the real-time software failing to meet the real-time deadline.
17. The system according to claim 16, further comprising:
means for determining whether there is at least one thermal control setting with an increased amount of throttling under which the real-time software can be tested; and
means for executing and testing the real-time software a second time using the increased amount of throttling, responsive to the existence of the thermal control setting with the increased amount of throttling.
18. The system according to claim 16, further comprising:
means for determining whether there is at least one thermal control setting with a decreased amount of throttling under which the real-time software can be tested; and
means for executing and testing the real-time software a second time using the decreased amount of throttling, responsive to the existence of the thermal control setting with the decreased amount of throttling.
19. The system according to claim 16, wherein the test mode specifies at least one of an always throttle state of the thermal management system or a random throttle state of the thermal management system.
20. The system according to claim 19, wherein the random throttle state injects random thermal events to more realistically simulate the interaction of throttling and the execution of software.
21. The system according to claim 16, wherein the thermal control setting comprises an amount of throttling to be initiated and a duration of the test mode to be initiated.
22. The system according to claim 21, wherein the amount of throttling to be initiated is a percentage of the time a unit is stopped and the time the unit is running.
23. The system according to claim 22, wherein the time the unit is stopped and the time the unit is running are scaled according to a value of a scale register.
24. The system according to claim 21, wherein the duration of the test mode to be initiated is the actual number of clock cycles for which a unit is stopped and run.
CNB2007101054865A 2006-06-21 2007-06-01 Method, system and processor for testing thermal regulation control of real-time software Active CN100533344C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US11/425,483 US7512513B2 (en) 2005-11-29 2006-06-21 Thermal throttling control for testing of real-time software
US11/425,483 2006-06-21

Publications (2)

Publication Number Publication Date
CN101093413A CN101093413A (en) 2007-12-26
CN100533344C true CN100533344C (en) 2009-08-26

Family

ID=38991698

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2007101054865A Active CN100533344C (en) 2006-06-21 2007-06-01 Method, system and processor for testing thermal regulation control of real-time software

Country Status (2)

Country Link
JP (1) JP5186137B2 (en)
CN (1) CN100533344C (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5751633B2 (en) * 2012-03-29 2015-07-22 Necプラットフォームズ株式会社 Semiconductor integrated circuit device, semiconductor integrated circuit control method, and control parameter generation method
CN102662914B (en) * 2012-04-25 2015-01-28 上海交通大学 Method for configuring heat sensor of microprocessor
JP6065579B2 (en) * 2012-12-25 2017-01-25 富士通株式会社 Information processing apparatus, cooling control method for information processing apparatus, and cooling control program
US9632841B2 (en) * 2014-05-29 2017-04-25 Mediatek Inc. Electronic device capable of configuring application-dependent task based on operating behavior of application detected during execution of application and related method thereof
CN106708010B (en) * 2016-11-29 2019-10-22 北京长城华冠汽车科技股份有限公司 A test device and test system for an electric vehicle thermal management system
CN107908178A (en) * 2017-12-06 2018-04-13 湖南航天远望科技有限公司 A kind of automatic temperature control test system and test method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3604742B2 (en) * 1994-09-02 2004-12-22 株式会社ルネサステクノロジ Simulation equipment for circuit verification
JPH08297614A (en) * 1995-04-27 1996-11-12 Fujitsu Ltd Magnetic disk device evaluation method and device
JPH08314748A (en) * 1995-05-22 1996-11-29 Fujitsu Ltd Temperature tester
JP2005321949A (en) * 2004-05-07 2005-11-17 Seiko Epson Corp Computer startup method, startup device, and computer system
US20060161375A1 (en) * 2004-12-30 2006-07-20 Allen Duberstein Optimizing processing speed based on measured temperatures

Also Published As

Publication number Publication date
JP5186137B2 (en) 2013-04-17
CN101093413A (en) 2007-12-26
JP2008004095A (en) 2008-01-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant