CN104331387B - microprocessor and configuration method thereof - Google Patents

Info

Publication number
CN104331387B
Authority
CN
China
Prior art keywords
core
block
microprocessor
cores
microcode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410431423.9A
Other languages
Chinese (zh)
Other versions
CN104331387A (en)
Inventor
G·葛兰·亨利
史蒂芬·嘉斯金斯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Via Technologies Inc
Original Assignee
Via Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US14/281,709 (US9971605B2)
Application filed by Via Technologies Inc
Publication of CN104331387A
Application granted
Publication of CN104331387B

Landscapes

  • Power Sources (AREA)

Abstract

The invention provides a microprocessor and a configuration method thereof. The microprocessor includes an indication and a plurality of processing cores. Each of the processing cores is configured to sample the indication. When the indication indicates a first predetermined value, the processing cores are configured to collectively designate multiple processing cores of the plurality as a bootstrap processor (BSP). When the indication indicates a second predetermined value different from the first predetermined value, the processing cores are configured to collectively designate a single processing core of the plurality as the bootstrap processor. The invention reduces power consumption.

Description

Microprocessor and configuration method thereof

Technical field

The present invention relates to a microprocessor, and more particularly to the selective designation of multiple cores of a multi-core microprocessor as the bootstrap processor.

Background

The use of multi-core microprocessors has grown, mainly because of the performance advantages they offer. This growth has been made possible largely by the rapid reduction in semiconductor device geometries, which increases transistor density. The presence of multiple cores in a microprocessor has created a need for a core to communicate with the other cores to accomplish various functions that involve more than one core, such as power management, cache management, debugging and configuration.

Traditionally, architectural programs running on a multi-core processor (e.g., the operating system or application programs) have communicated using semaphores located in a system memory that is architecturally addressable by all the cores. This may suffice for many purposes, but may not provide the speed, precision and/or system-level transparency needed for others.

Summary of the invention

The invention provides a microprocessor. The microprocessor includes an indication and a plurality of processing cores. Each of the processing cores is configured to sample the indication. When the indication indicates a first predetermined value, the processing cores are configured to collectively designate multiple processing cores of the plurality as a bootstrap processor (BSP). When the indication indicates a second predetermined value different from the first predetermined value, the processing cores are configured to collectively designate a single processing core of the plurality as the bootstrap processor.

The invention also provides a method of configuring a multi-core microprocessor. The method includes: sampling an indication of the microprocessor; when the indication indicates a first predetermined value, designating multiple processing cores of the plurality of processing cores of the multi-core microprocessor as a bootstrap processor; and when the indication indicates a second predetermined value different from the first predetermined value, designating a single processing core of the plurality of processing cores as the bootstrap processor.
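The configuration method can be illustrated with a minimal sketch. The two encodings, the choice of "all listed cores" for the multiple-core case, and the choice of the lowest-numbered core for the single-core case are assumptions made only for illustration; this summary does not specify them.

```python
from typing import List

FIRST_VALUE = 0   # hypothetical encoding: multiple cores are designated BSP
SECOND_VALUE = 1  # hypothetical encoding: a single core is designated BSP


def designate_bsp(indication: int, core_ids: List[int]) -> List[int]:
    """Return the cores designated as bootstrap processor (BSP)."""
    if not core_ids:
        raise ValueError("at least one core is required")
    if indication == FIRST_VALUE:
        # Assumption: "multiple cores" is modeled as every listed core.
        return list(core_ids)
    if indication == SECOND_VALUE:
        # Assumption: the lowest-numbered core is the single BSP.
        return [min(core_ids)]
    raise ValueError("indication holds neither predetermined value")
```

Every core would evaluate the same function on the same sampled indication, which is one way the cores could reach a collective designation without further negotiation.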

The invention further provides a computer program product encoded in at least one non-transitory computer-usable medium for use with a computing device, the computer program product comprising computer-usable program code that specifies a microprocessor. The computer-usable program code includes first program code that specifies an indication and second program code that specifies a plurality of processing cores. Each of the processing cores is configured to sample the indication. When the indication indicates a first predetermined value, the processing cores are configured to collectively designate multiple processing cores of the plurality as a bootstrap processor (BSP). When the indication indicates a second predetermined value different from the first predetermined value, the processing cores are configured to collectively designate a single processing core of the plurality as the bootstrap processor.

The invention reduces power consumption.

Brief description of the drawings

FIG. 1 is a block diagram of a multi-core microprocessor.

FIG. 2 is a block diagram of a control word, a status word and a configuration word.

FIG. 3 is a flowchart illustrating operation of the control unit.

FIG. 4 is a block diagram of a microprocessor according to another embodiment.

FIG. 5 is a flowchart illustrating operation of a microprocessor to dump debug information.

FIG. 6 is a timing diagram illustrating an example of operation of the microprocessor according to the flowchart of FIG. 5.

FIGS. 7A-7B are a flowchart illustrating operation of a microprocessor to perform cross-core cache control operations.

FIG. 8 is a timing diagram illustrating an example of operation of the microprocessor according to the flowchart of FIGS. 7A-7B.

FIG. 9 is a flowchart illustrating operation of a microprocessor to enter a low-power package C-state.

FIG. 10 is a timing diagram illustrating an example of operation of the microprocessor according to the flowchart of FIG. 9.

FIG. 11 is a flowchart illustrating operation of a microprocessor to enter a low-power package C-state according to another embodiment of the invention.

FIG. 12 is a timing diagram illustrating an example of operation of the microprocessor according to the flowchart of FIG. 11.

FIG. 13 is a timing diagram illustrating another example of operation of the microprocessor according to the flowchart of FIG. 11.

FIG. 14 is a flowchart illustrating dynamic reconfiguration of a microprocessor.

FIG. 15 is a flowchart illustrating dynamic reconfiguration of a microprocessor according to another embodiment.

FIG. 16 is a timing diagram illustrating an example of operation of the microprocessor according to the flowchart of FIG. 15.

FIG. 17 is a block diagram of the hardware semaphore 118 of FIG. 1.

FIG. 18 is a flowchart illustrating operation when a core 102 reads the hardware semaphore 118.

FIG. 19 is a flowchart illustrating operation when a core writes the hardware semaphore.

FIG. 20 is a flowchart illustrating operation of the microprocessor using the hardware semaphore to perform an operation that requires exclusive ownership of a resource.

FIG. 21 is a timing diagram illustrating an example of cores issuing non-sleeping sync requests according to the flowchart of FIG. 3.

FIG. 22 is a flowchart illustrating a process for configuring the microprocessor.

FIG. 23 is a flowchart illustrating a process for configuring the microprocessor according to another embodiment.

FIG. 24 is a block diagram of a multi-core microprocessor according to another embodiment.

FIG. 25 is a block diagram of the structure of a microcode patch.

FIGS. 26A-26B are a flowchart illustrating operation of the microprocessor of FIG. 24 to propagate the microcode patch of FIG. 25 to multiple cores of the microprocessor.

FIG. 27 is a timing diagram illustrating an example of operation of a microprocessor according to the flowchart of FIGS. 26A-26B.

FIG. 28 is a block diagram of a multi-core microprocessor according to another embodiment.

FIGS. 29A-29B are a flowchart illustrating operation of the microprocessor of FIG. 28 to propagate a microcode patch to multiple cores of the microprocessor according to another embodiment.

FIG. 30 is a flowchart illustrating operation of the microprocessor of FIG. 24 to patch service processor code.

FIG. 31 is a block diagram of a multi-core microprocessor according to another embodiment.

FIG. 32 is a flowchart illustrating operation of the microprocessor of FIG. 31 to propagate an MTRR update to multiple cores of the microprocessor.

The reference numerals in the drawings are briefly described as follows:

100: multi-core microprocessor; 102A, 102B, 102N: core A, core B, core N; 103: uncore; 104: control unit; 106: status register; 108A, 108B, 108C, 108D, 108N: sync registers; 108E, 108F, 108G, 108H: shadow sync registers; 114: fuses; 116: private random access memory; 118: hardware semaphore; 119: shared cache memory; 122A, 122B, 122N: clock signals; 124A, 124B, 124N: interrupt signals; 126A, 126B, 126N: data signals; 128A, 128B, 128N: power control signals; 202: control word; 204: wakeup events; 206: sync control; 208: power gate; 212: sleep; 214: selective wake; 222: S; 224: C; 226: sync condition or C-state; 228: core set; 232: force sync; 234: selective sync kill; 236: disable core; 242: status word; 244: wakeup events; 246: lowest common C-state; 248: error code; 252: configuration word; 254-0 to 254-7: enable; 256: local core number; 258: die number; 302, 304, 305, 306, 312, 314, 316, 318, 322, 326, 328, 332, 334, 336: steps; 402A, 402B: inter-die bus unit A, inter-die bus unit B; 404: inter-die bus; 406A, 406B: die A, die B; 502, 504, 505, 508, 514, 516, 518, 524, 526, 528, 532: steps; 702, 704, 706, 708, 714, 716, 717, 718, 724, 726, 727, 728, 744, 746, 747, 748, 749, 752: steps; 902, 904, 906, 907, 908, 909, 914, 916, 919, 921, 924: steps; 1102, 1104, 1106, 1108, 1109, 1121, 1124, 1132, 1134, 1136, 1137: steps; 1402, 1404, 1406, 1408, 1412, 1414, 1416, 1417, 1418, 1422, 1424, 1426: steps; 1502, 1504, 1506, 1508, 1517, 1518, 1522, 1524, 1526, 1532: steps; 1702: owned bit; 1704: owner bits; 1706: state machine; 1802, 1804, 1806, 1808: steps; 1902, 1904, 1906, 1908, 1912, 1914, 1916, 1918: steps; 2002, 2004, 2006, 2008: steps; 2202, 2203, 2204, 2205, 2206, 2208, 2212, 2214, 2216, 2218, 2222, 2224: steps; 2302, 2304, 2305, 2306, 2312, 2315, 2318, 2324: steps; 2404: core microcode read-only memory; 2408: uncore microcode patch random access memory; 2423: service processing unit; 2425: uncore microcode read-only memory; 2439: patch content-addressable memory; 2497: service processing unit start address register; 2499: core random access memory; 2500: microcode patch; 2502: header; 2504: immediate patch; 2506: checksum; 2508: CAM data; 2512: core PRAM patch; 2514: checksum; 2516: RAM patch; 2518: uncore PRAM patch; 2522: checksum; 2602, 2604, 2606, 2608, 2611, 2612, 2614, 2616, 2618, 2621, 2622, 2624, 2626, 2628, 2631, 2632, 2634, 2652: steps; 2808: core patch RAM; 2912, 2916, 2922, 2932: steps; 3002, 3004, 3006: steps; 3102: memory type range registers; 3202, 3204, 3206, 3208, 3211, 3212, 3214, 3216, 3218, 3252: steps.

Detailed description

The following describes preferred embodiments of the present invention. The embodiments illustrate the principles of the invention and are not intended to limit it. The scope of the invention is defined by the claims.

Referring to FIG. 1, a block diagram of a multi-core microprocessor 100 is shown. The microprocessor 100 includes a plurality of processing cores, denoted 102A, 102B through 102N, which are referred to collectively as processing cores 102, or simply cores 102, and individually as processing core 102, or simply core 102. Preferably, each core 102 includes one or more pipelines of functional units (not shown), including an instruction cache, an instruction translation unit or instruction decoder that preferably includes a microcode unit, a register renaming unit, reservation stations, cache memories, execution units, a memory subsystem, and a retire unit that includes a reorder buffer. Preferably, the cores 102 have a superscalar, out-of-order execution microarchitecture. In one embodiment, the microprocessor 100 is an x86-architecture microprocessor, although in other embodiments the microprocessor 100 conforms to another instruction set architecture.

The microprocessor 100 also includes an uncore 103 that is coupled to the cores 102 and distinct from them. The uncore 103 includes a control unit 104, fuses 114, a private random access memory (PRAM) 116, and a shared cache memory 119, for example a level-2 (L2) and/or level-3 (L3) cache shared by the cores 102. Each core 102 is configured to read data from and write data to the uncore 103 over a respective address/data bus 126, through which the shared resources of the uncore 103 are reachable in a non-architectural address space (also referred to as a private or micro-architectural address space). The PRAM 116 is private, or non-architectural, in the sense that it is not in the architectural user program address space of the microprocessor 100. In one embodiment, the uncore 103 includes arbitration logic that arbitrates requests by the cores 102 for access to the resources of the uncore 103.

Each fuse 114 is an electrical device that is either blown or not blown. When not blown, the fuse 114 has low impedance and readily conducts current; when blown, it has high impedance and does not readily conduct current. A sense circuit is associated with each fuse 114 to evaluate it, e.g., to sense whether the fuse 114 conducts a high current or low voltage (not blown, e.g., logical zero, or clear) or a low current or high voltage (blown, e.g., logical one, or set). A fuse 114 may be blown during manufacture of the microprocessor 100 and, in some embodiments, an unblown fuse 114 may be blown after manufacture. Preferably, blowing a fuse 114 is irreversible. One example of a fuse 114 is a polysilicon fuse, which may be blown by applying a sufficiently high voltage across the device. Another example is a nickel-chromium fuse, which may be blown with a laser. Preferably, the sense circuit senses each fuse 114 at power-up and provides its evaluation to a corresponding bit in a holding register of the microprocessor 100. When the microprocessor 100 is released from reset, the cores 102 (e.g., their microcode) read the holding register to determine the sensed fuse 114 values. In one embodiment, before the microprocessor 100 is released from reset, updated values may be scanned into the holding register through a boundary-scan input, such as a Joint Test Action Group (JTAG) input, which effectively updates the fuse 114 values. This is particularly useful for testing and/or debugging, for example in the embodiments described below with respect to FIGS. 22 and 23.
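The relationship between the fuses, the holding register and the boundary-scan override can be sketched as a small software model. The class name, method names and bit ordering are hypothetical and chosen only for illustration; the description states only that sensed values land in a holding register and may be overwritten by a scan before reset is released.

```python
from typing import Sequence


class FuseHoldingRegister:
    """Toy model: one holding-register bit per fuse (bit i mirrors fuse i)."""

    def __init__(self, blown: Sequence[bool]):
        self._blown = list(blown)   # physical fuse state, fixed once blown
        self._value = 0             # holding register contents

    def power_on_sense(self) -> None:
        # A blown fuse reads as logical one, an unblown fuse as logical zero.
        self._value = sum(1 << i for i, b in enumerate(self._blown) if b)

    def jtag_scan_in(self, value: int) -> None:
        # Overrides the sensed values without changing the fuses themselves.
        self._value = value & ((1 << len(self._blown)) - 1)

    def read(self) -> int:
        # What microcode would see after the part is released from reset.
        return self._value
```

In this model the scan changes only what the cores later read, which matches the idea that the fuse values are "effectively" updated for test or debug while the physical fuses stay as manufactured.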

In addition, in one embodiment, the microprocessor 100 includes a distinct local Advanced Programmable Interrupt Controller (APIC) (not shown) associated with each core 102. In one embodiment, the local APICs conform architecturally to the description of a local APIC in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A, May 2012, by Intel Corporation of Santa Clara, California, particularly Section 10.4. In particular, each local APIC includes an APIC ID and an APIC base register that includes a bootstrap processor (BSP) flag, whose generation and use are described in more detail below, especially with respect to the embodiments of FIGS. 14 to 16.

The control unit 104 comprises hardware, software, or a combination of hardware and software. The control unit 104 includes a hardware semaphore 118 (described in detail below with respect to FIGS. 17 to 20), a status register 106, a configuration register 112, and a respective sync register 108 for each core 102. Preferably, each of these entities of the uncore 103 is addressable by each core 102 at a distinct address within the non-architectural address space, which enables the microcode of the cores 102 to read and write them.

Each sync register 108 is writable by its respective core 102. The status register 106 is readable by each core 102. The configuration register 112 is readable by each core 102 and indirectly writable by it (via the disable core bit 236 of FIG. 2, as described below). The control unit 104 may also include interrupt logic (not shown) that generates a respective interrupt signal (INTR) 124 to each core 102 in order to interrupt that core 102. The sources in response to which the control unit 104 generates an interrupt signal 124 to a core 102 may include external interrupt sources (e.g., x86 INTR, SMI and NMI) and bus events (e.g., assertion or de-assertion of the x86 bus signal STPCLK). In addition, each core 102 may send an inter-core interrupt 124 to each of the other cores 102 by writing to the control unit 104. Preferably, unless otherwise indicated, the inter-core interrupts described herein are non-architectural inter-core interrupts requested by the microcode of a core 102 via a microinstruction, as distinct from conventional architectural inter-core interrupts requested by system software via an architectural instruction. Finally, the control unit 104 may generate an interrupt signal 124 to a core 102 (a sync interrupt) when a synchronization condition has occurred, as described below (see, e.g., FIG. 21 and block 334 of FIG. 3). The control unit 104 also generates a respective clock signal (CLOCK) 122 to each core 102, which it may selectively turn off, effectively putting the corresponding core 102 to sleep, and turn back on to wake the core 102 up. The control unit 104 further generates a respective power control signal (PWR) 128 to each core 102, which selectively controls whether the corresponding core 102 receives power. Thus, via the corresponding power control signal 128, the control unit 104 may selectively put a core 102 into a deeper sleep state by removing its power, and later restore power to wake the core 102 up.

A core 102 may write to its respective sync register 108 with the sync bit set (see the S bit 222 of FIG. 2); such a write is referred to as a synchronization request (sync request). As described in more detail below, in one embodiment the sync request requests that the control unit 104 put the core 102 to sleep and wake it up when a sync condition occurs and/or when a specified wakeup event occurs. A sync condition occurs when all the enabled cores 102 in the microprocessor 100 (see the enable bits 254 of FIG. 2), or a specified subset of the enabled cores 102 (see the core set field 228 of FIG. 2), have written the same sync condition to their respective sync registers 108. The sync condition is specified by a combination of the C bit 224, the sync condition or C-state field 226, and the core set field 228 of FIG. 2; the S bit 222 is described in more detail below. In response to the occurrence of a sync condition, the control unit 104 simultaneously wakes up all the cores 102 that are waiting on it, i.e., that have requested it. In another embodiment described below, a core 102 may request that only the last core 102 to write the sync request be woken up (see the selective wake bit 214 of FIG. 2). In yet another embodiment, the sync request does not ask that the core 102 be put to sleep; instead, it asks the control unit 104 to interrupt the core 102 when the sync condition occurs, as described in more detail below, particularly with respect to FIGS. 3 and 21.
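A minimal software model may help make the sync condition concrete. It is a sketch under stated assumptions, not the control unit's actual logic: the class and field names are hypothetical, a request is modeled as the tuple of C bit, condition value and core set, and each core is assumed to name itself in its own core set.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class SyncRequest:
    c_bit: bool                 # C bit 224
    condition: int              # sync condition or C-state field 226
    core_set: FrozenSet[int]    # core set field 228, modeled as core numbers


class ControlUnitModel:
    def __init__(self, enabled_cores: Iterable[int]):
        self.enabled = frozenset(enabled_cores)       # enable bits 254
        self.pending: Dict[int, SyncRequest] = {}     # one sync register per core

    def write_sync_register(self, core: int, request: SyncRequest) -> FrozenSet[int]:
        """Record a sync request; return the cores to wake (empty if none yet)."""
        self.pending[core] = request
        # Disabled cores never write, so they are excluded from the check.
        participants = request.core_set & self.enabled
        if all(self.pending.get(c) == request for c in participants):
            for c in participants:
                del self.pending[c]
            return participants       # sync condition: wake all waiters at once
        return frozenset()            # still waiting on at least one core
```

The check runs on every write, so the sync condition is detected by the write from the last participating core, which is the point at which the description has the control unit wake the waiting cores together.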

Preferably, when the control unit 104 detects that a sync condition has occurred (because the last core 102 has written the sync request to its sync register 108), the control unit 104 puts that last core 102 to sleep, e.g., turns off the clock signal 122 to the last-writing core 102, and then wakes up all the cores 102 simultaneously, e.g., turns on the clock signals 122 to all the cores 102. In this way, all the cores 102 are woken up, e.g., have their clock signals 122 turned on, on precisely the same clock cycle. Waking the cores 102 on exactly the same clock cycle is particularly beneficial for certain operations, such as debugging (see the embodiment of FIG. 5). In one embodiment, the uncore 103 includes a single phase-locked loop (PLL) that generates the clock signals 122 provided to the cores 102. In other embodiments, the microprocessor 100 includes multiple PLLs that generate the clock signals 122 provided to the cores 102.

Control, Status and Configuration Words

Referring to FIG. 2, a block diagram of a control word 202, a status word 242 and a configuration word 252 is shown. A core 102 writes a value of the control word 202 to its sync register 108 in the control unit 104 of FIG. 1 to make an atomic request to go to sleep and/or to synchronize (sync) with all the other cores 102 in the microprocessor 100, or with a specified subset of them. A core 102 reads a value of the status word 242 from the status register 106 of the control unit 104 to determine the status information described herein. A core 102 reads a value of the configuration word 252 from the configuration register 112 of the control unit 104 and uses the value as described below.

The control word 202 includes a wakeup events field 204, a sync control field 206, and a power gate (PG) bit 208. The sync control field 206 includes various bits or subfields that control the sleeping of the core 102 and/or its synchronization with other cores 102. The sync control field 206 includes a sleep bit 212, a selective wake (SEL WAKE) bit 214, an S bit 222, a C bit 224, a sync condition or C-state field 226, a core set field 228, a force sync bit 232, a selective sync kill bit 234, and a disable core bit 236. The status word 242 includes a wakeup events field 244, a lowest common C-state field 246, and an error code field 248. The configuration word 252 includes an enable bit 254 for each core 102 of the microprocessor 100, a local core number field 256, and a die number field 258.
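To make the field list concrete, the control word 202 can be pictured as a packed integer. The sketch below is illustrative only: the description names the fields but gives no bit positions or widths, so every position and the 8-bit field widths are assumptions. The check on the sleep and selective wake bits reflects the description's statement that the two are mutually exclusive.

```python
# Hypothetical bit layout: positions and widths are assumptions.
SLEEP = 1 << 0          # sleep bit 212
SEL_WAKE = 1 << 1       # selective wake bit 214
S_BIT = 1 << 2          # S bit 222
C_BIT = 1 << 3          # C bit 224
FORCE_SYNC = 1 << 4     # force sync bit 232
SEL_SYNC_KILL = 1 << 5  # selective sync kill bit 234
DISABLE_CORE = 1 << 6   # disable core bit 236
POWER_GATE = 1 << 7     # PG bit 208
FLAG_MASK = 0xFF
CONDITION_SHIFT = 8     # sync condition or C-state field 226 (assumed 8 bits)
CORE_SET_SHIFT = 16     # core set field 228 (assumed 8 bits)
WAKEUP_SHIFT = 24       # wakeup events field 204 (assumed 8 bits)
FIELD_MASK = 0xFF


def pack_control_word(flags, condition=0, core_set=0, wakeup_events=0):
    """Pack the named fields into one integer control word."""
    if (flags & SLEEP) and (flags & SEL_WAKE):
        raise ValueError("sleep and selective wake are mutually exclusive")
    return ((flags & FLAG_MASK)
            | ((condition & FIELD_MASK) << CONDITION_SHIFT)
            | ((core_set & FIELD_MASK) << CORE_SET_SHIFT)
            | ((wakeup_events & FIELD_MASK) << WAKEUP_SHIFT))


def unpack_control_word(word):
    """Split a control word back into its fields."""
    return {
        "flags": word & FLAG_MASK,
        "condition": (word >> CONDITION_SHIFT) & FIELD_MASK,
        "core_set": (word >> CORE_SET_SHIFT) & FIELD_MASK,
        "wakeup_events": (word >> WAKEUP_SHIFT) & FIELD_MASK,
    }
```

Packing everything into one word is what lets a single register write serve as the atomic request described above: the sleep behavior, the sync condition and the wakeup events all reach the control unit together.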

The wakeup events field 204 of the control word 202 comprises a plurality of bits, each corresponding to a different event. If the core 102 sets a bit in the wakeup events field 204, the control unit 104 wakes up the core 102 (e.g., turns on the clock signal 122 to the core 102) when the event corresponding to that bit occurs. One wakeup event occurs when the core 102 has synchronized with all the other cores specified in the core set field 228. In one embodiment, the core set field 228 may specify all the cores 102 in the microprocessor 100; all the cores 102 that share a cache memory (e.g., a level-2 (L2) and/or level-3 (L3) cache) with the instant core 102; all the cores 102 in the same semiconductor die as the instant core 102 (see FIG. 4 for an example of a multi-die, multi-core embodiment of the microprocessor 100); or all the cores 102 in the other semiconductor die from the instant core 102. A set of cores 102 that share a cache memory may be referred to as a slice. Other examples of wakeup events include, but are not limited to, an x86 INTR, SMI or NMI, assertion or de-assertion of STPCLK, and an inter-core interrupt. When a core 102 is woken up, it may read the wakeup events field 244 in the status word 242 to determine the active wakeup events.
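The per-event wake decision amounts to a mask test, which the following sketch illustrates. The event names come from the examples in the description, but their bit assignments and the helper name `should_wake` are hypothetical.

```python
from enum import IntFlag


class WakeupEvent(IntFlag):
    # Bit assignments are assumptions; only the event names come from the text.
    SYNC = 1 << 0             # synced with all cores in the core set field 228
    INTR = 1 << 1
    SMI = 1 << 2
    NMI = 1 << 3
    STPCLK_ASSERT = 1 << 4
    STPCLK_DEASSERT = 1 << 5
    INTER_CORE = 1 << 6       # inter-core interrupt


def should_wake(requested: WakeupEvent, occurred: WakeupEvent) -> bool:
    """Wake the core only if an event it asked for in field 204 has occurred."""
    return bool(requested & occurred)
```

A core that sleeps with `WakeupEvent.SYNC | WakeupEvent.NMI` requested would, in this model, stay asleep through an SMI but wake on either an NMI or the sync condition.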

If the core 102 sets the PG bit 208, the control unit 104 turns off power to the core 102 (e.g., via the power control signal 128) after putting the core 102 to sleep. When the control unit 104 subsequently restores power to the core 102, the control unit 104 clears the PG bit 208. Use of the PG bit 208 is described in more detail below with respect to FIGS. 11 through 13.

If the core 102 sets the sleep bit 212 or the selective wakeup bit 214, the control unit 104 puts the core 102 to sleep after the core 102 writes the sync register 108, using the wakeup events specified in the wakeup events field 204. The sleep bit 212 and the selective wakeup bit 214 are mutually exclusive. The difference between them lies in the action the control unit 104 takes when a sync condition occurs. If a core 102 sets the sleep bit 212, the control unit 104 wakes up all the cores 102 when a sync condition occurs. In contrast, if a core 102 sets the selective wakeup bit 214, the control unit 104 wakes up only the core 102 that was the last to write the sync condition to its sync register.

If the core 102 sets neither the sleep bit 212 nor the selective wakeup bit 214, the control unit 104 does not put the core 102 to sleep and therefore does not wake it up when a sync condition occurs. The control unit 104 nevertheless sets the bit in the wakeup events field 244 that indicates a sync condition is active, so the core 102 can detect that the sync condition has occurred. Many of the wakeup events that may be specified in the wakeup events field 204 may also be sources of an interrupt generated by the control unit 104 to the core 102. However, the microcode of the core 102 may mask the interrupt sources if desired. Hence, when the core 102 wakes up, the microcode may read the status register 106 to determine whether a sync condition, a wakeup event, or both have occurred.

If the core 102 sets the S bit 222, it requests the control unit 104 to synchronize on a sync condition. The sync condition is specified by some combination of the C bit 224, the sync condition or C-state field 226, and the core set field 228. If the C bit 224 is set, the C-state field 226 specifies a C-state value; if the C bit 224 is clear, the sync condition field 226 specifies a non-C-state sync condition. Preferably, the values of the sync condition or C-state field 226 comprise a bounded set of non-negative integers. In one embodiment, the sync condition or C-state field 226 is four bits wide. When the C bit 224 is clear, a sync condition occurs when all the cores 102 in the specified core set 228 have written their respective sync registers 108 with the S bit 222 set and with the same value in the sync condition field 226. In one embodiment, each value of the sync condition field 226 corresponds to a unique sync condition, such as the various sync conditions in the exemplary embodiments described below. When the C bit 224 is set, a sync condition occurs when all the cores 102 in the specified core set 228 have written their respective sync registers 108 with the S bit 222 set, regardless of whether they have written the same value in the C-state field 226. In this case, the control unit 104 posts the lowest written value of the C-state field 226 to the lowest common C-state field 246 of the status register 106, where it may be read by a core 102, e.g., by the master core 102 at block 908 or by the last-writing, selectively awakened core 102 at block 1108. In one embodiment, if the core 102 specifies a predetermined value (e.g., all bits set) in the sync condition field 226, this instructs the control unit 104 to match the instant core 102 with any sync condition field 226 value specified by the other cores 102.
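The matching rule for the S bit 222, C bit 224, and field 226 can be sketched as follows. This is a minimal Python model under stated assumptions, not the hardware logic: the function name `sync_condition`, the tuple encoding of a request, and the returned labels are invented for illustration, and neither the "match any" predetermined value nor deadlock handling is modeled.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical encoding of one core's pending write to its sync register 108:
# (S bit 222, C bit 224, value of field 226).
Request = Tuple[bool, bool, int]


def sync_condition(requests: Dict[int, Request],
                   core_set: Set[int]) -> Optional[Tuple[str, int]]:
    """Return ("sync", value) or ("cstate", lowest value) once every core in
    core_set has a matching request pending; otherwise return None."""
    pending = [requests.get(core) for core in core_set]
    if not pending or any(r is None or not r[0] for r in pending):
        return None  # some core in the set has not yet written with S set
    c_bits = {r[1] for r in pending}
    if c_bits == {True}:
        # C-state sync: values may differ; the lowest one is posted (field 246).
        return ("cstate", min(r[2] for r in pending))
    if c_bits == {False}:
        values = {r[2] for r in pending}
        if len(values) == 1:
            return ("sync", values.pop())  # all wrote the same sync condition
    return None  # mismatched requests (treated as deadlock in the text)
```

For example, two cores that both write condition 7 with the C bit clear yield `("sync", 7)`, while two cores requesting C-states 3 and 5 yield `("cstate", 3)`.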

If the core 102 sets the force sync bit 232, the control unit 104 forces all pending sync requests to be matched immediately.

Normally, if any core 102 is awakened by a wakeup event specified in the wakeup events field 204, the control unit 104 kills all pending sync requests by clearing the S bit 222 in the sync registers 108. However, if the core 102 sets the selective sync kill bit 234, the control unit 104 kills the pending sync request of only the core 102 that was awakened by the wakeup event (i.e., by an event other than the occurrence of a sync condition).

If two or more cores 102 request synchronization on different sync conditions, the control unit 104 treats this as a deadlock condition. Two or more cores 102 request synchronization on different sync conditions if they write their respective sync registers 108 with the S bit 222 set, the C bit 224 clear, and different values in the sync condition field 226. For example, if one core 102 writes its sync register 108 with the S bit 222 set, the C bit 224 clear, and a sync condition 226 value of 7, and another core 102 writes its sync register 108 with the S bit 222 set, the C bit 224 clear, and a sync condition 226 value of 9, the control unit 104 treats this as a deadlock condition. Likewise, if one core 102 writes its sync register 108 with the C bit 224 clear and another core 102 writes its sync register 108 with the C bit 224 set, the control unit 104 treats this as a deadlock condition. In response to a deadlock condition, the control unit 104 kills all pending sync requests and wakes up all sleeping cores 102. The control unit 104 also posts a value in the error code field 248 of the status register 106, which the cores 102 can read to determine the cause of the deadlock and take appropriate action. In one embodiment, the error code 248 indicates the sync condition written by each core 102, which enables each core to decide whether to proceed with its intended course of action or to defer to another core 102. For example, if one core 102 writes a sync condition to perform a power management operation (e.g., execution of an x86 MWAIT instruction) and another core 102 writes a sync condition to perform a cache management operation (e.g., an x86 WBINVD instruction), the core 102 that intended to perform the MWAIT defers to the core 102 performing the WBINVD by cancelling the MWAIT, because MWAIT is an optional operation whereas WBINVD is a mandatory one. As another example, if one core 102 writes a sync condition to perform a debug operation (e.g., dump debug state) and another core 102 writes a sync condition to perform a cache management operation (e.g., a WBINVD instruction), the core 102 that intended to perform the WBINVD defers to the core 102 performing the debug dump by saving the WBINVD state, waiting for the debug dump to occur, and then restoring the WBINVD state and performing the WBINVD.
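The deadlock rule and the deferral examples above can be illustrated with a short Python sketch. It is an illustration only: the function names, the request encoding, and the `PRIORITY` table are assumptions. The table encodes just the two orderings given in the text (MWAIT defers to WBINVD, and WBINVD defers to a debug dump); the text does not define a complete priority scheme.

```python
from typing import List, Tuple

# Hypothetical encoding: (C bit 224, value of field 226) for each core whose
# S bit 222 is set.
Pending = Tuple[bool, int]


def is_deadlock(pending: List[Pending]) -> bool:
    """True if the pending sync requests conflict as described in the text."""
    c_bits = {c for c, _ in pending}
    if len(c_bits) > 1:
        return True  # one core asked for a C-state sync, another did not
    if c_bits == {False}:
        # Non-C-state requests must all name the same sync condition.
        return len({value for _, value in pending}) > 1
    return False  # all C-state requests: differing values are allowed


# Assumed ranking derived only from the two examples in the text.
PRIORITY = {"MWAIT": 0, "WBINVD": 1, "DEBUG_DUMP": 2}


def should_defer(mine: str, other: str) -> bool:
    """True if the core performing `mine` should defer to the core doing `other`."""
    return PRIORITY[mine] < PRIORITY[other]
```

With this model, requests `(False, 7)` and `(False, 9)` are a deadlock, and a core that intended to perform MWAIT defers to one performing WBINVD.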

In a single-die embodiment, the die number field 258 is zero. In a multi-die embodiment (e.g., that of FIG. 4), the die number field 258 indicates the semiconductor die on which the core 102 reading the configuration register 112 resides. For example, in a two-die embodiment, the dies are designated 0 and 1, and the die number field 258 has a value of 0 or 1. In one embodiment, for example, fuses 114 are selectively blown to designate a die as 0 or 1.

The local core number field 256 indicates the core number, local to its semiconductor die, of the core 102 that is reading the configuration register 112. Preferably, although a single configuration register 112 is shared by all the cores 102, the control unit 104 knows which core 102 is reading the configuration register 112 and provides the correct value in the local core number field 256 based on the reader. This enables the microcode of the core 102 to know its local core number among the other cores 102 on the same die. In one embodiment, a multiplexer in the uncore 103 portion of the microprocessor 100 selects the appropriate value to return in the local core number field 256 of the configuration word 252, depending on which core 102 is reading the configuration register 112. In one embodiment, selectively blown fuses 114 operate in conjunction with the multiplexer to return the local core number field 256 value. Preferably, the local core number field 256 value is fixed, independent of which cores 102 on the die are enabled, as indicated by the enable bits 254 described below. That is, the local core number field 256 values remain fixed even if one or more cores 102 on the die are disabled. In addition, the microcode of a core 102 computes the global core number of the core 102, a configuration-related value whose use is described in more detail below. The global core number is the core number of the core 102 within the microprocessor 100 as a whole. The core 102 computes its global core number using the value of the die number field 258. For example, in an embodiment in which the microprocessor 100 includes eight cores 102 divided evenly between two dies with die numbers 0 and 1, the local core number field 256 on each die returns a value of 0, 1, 2, or 3; the cores on the die numbered 1 add 4 to the value returned in the local core number field 256 to compute their global core number.
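The arithmetic in the eight-core example (die number field 258, local core number field 256) can be written out as below. This is a sketch: the function name and the generalization to `die_number * cores_per_die` are assumptions; the text itself only states that cores on die 1 add 4 to their local core number.

```python
CORES_PER_DIE = 4  # taken from the eight-core, two-die example in the text


def global_core_number(die_number: int, local_core_number: int,
                       cores_per_die: int = CORES_PER_DIE) -> int:
    """Combine the die number (field 258) and the local core number
    (field 256) into a global core number."""
    if not 0 <= local_core_number < cores_per_die:
        raise ValueError("local core number out of range for this die")
    return die_number * cores_per_die + local_core_number
```

Under these assumptions, the core with local number 2 on die 1 has global core number 6, and cores on die 0 keep their local numbers.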

Each core 102 of the microprocessor 100 has a corresponding enable bit 254 in the configuration word 252 that indicates whether the core 102 is enabled or disabled. In FIG. 2, the enable bits 254 are individually denoted enable bit 254-x, where x is the global core number of the corresponding core 102. The example of FIG. 2 assumes eight cores 102 in the microprocessor 100. In the examples of FIG. 2 and FIG. 4, enable bit 254-0 indicates whether the core 102 with global core number 0 (e.g., core A) is enabled, enable bit 254-1 indicates whether the core 102 with global core number 1 (e.g., core B) is enabled, enable bit 254-2 indicates whether the core 102 with global core number 2 (e.g., core C) is enabled, and so forth. Thus, by knowing its global core number, the microcode of a core 102 can determine from the configuration word 252 which cores 102 of the microprocessor 100 are disabled and which are enabled. Preferably, an enable bit 254 is set if the core 102 is enabled and clear if the core 102 is disabled. When the microprocessor 100 is reset, hardware automatically populates the enable bits 254. Preferably, the hardware populates the enable bits 254 based on fuses 114 that are selectively blown when the microprocessor 100 is manufactured to indicate whether a given core 102 is enabled or disabled. For example, if a given core 102 is tested and found to be faulty, a fuse 114 may be blown to clear the enable bit 254 of that core 102. In one embodiment, a blown fuse 114 that indicates a core 102 is disabled also prevents clock signals from being provided to the disabled core 102. Each core 102 can write the disable core bit 236 to its sync register 108 in order to clear its enable bit 254, as described in more detail below with respect to FIGS. 14 through 16. Preferably, clearing the enable bit 254 does not itself prevent the core 102 from executing instructions; it merely updates the configuration register 112, and the core 102 must set a different bit (not shown) to prevent itself from executing instructions, e.g., to have its power removed and/or its clock signals turned off. In a multi-die configuration of the microprocessor 100 (e.g., FIG. 4), the configuration register 112 includes an enable bit 254 for every core 102 in the microprocessor 100, i.e., not only the cores 102 of the local die but also the cores 102 of the remote die. Preferably, in a multi-die microprocessor 100, when a core 102 writes its sync register 108, the sync register 108 value is propagated to the core's corresponding shadow sync register 108 on the other die (see FIG. 4); if the disable core bit 236 is set, this causes an update to the configuration register 112 of the remote die, so that the local and remote die configuration registers 112 hold the same value.
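How microcode might consult the enable bits 254 can be sketched as follows. This is a hedged Python illustration: the helper names are hypothetical, and placing enable bit 254-x at bit position x of an integer is an assumption, since the text does not give the register layout.

```python
from typing import List


def enabled_cores(enable_bits: int, num_cores: int = 8) -> List[int]:
    """Return the global core numbers whose enable bit 254-x is set.
    Assumes bit x of `enable_bits` corresponds to global core number x."""
    return [n for n in range(num_cores) if (enable_bits >> n) & 1]


def clear_enable_bit(enable_bits: int, global_core_number: int) -> int:
    """Model the effect of a core writing the disable core bit 236:
    its own enable bit 254 is cleared in the configuration word."""
    return enable_bits & ~(1 << global_core_number)
```

For instance, starting from all eight cores enabled (`0xFF`), disabling global core 2 (core C in the example) leaves `0xFB`, and `enabled_cores(0xFB)` no longer lists core 2.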

In one embodiment, the configuration register 112 cannot be written directly by a core 102. However, a write by a core 102 to the configuration register 112 causes the local enable bit 254 values to be propagated to the configuration register 112 of the other semiconductor die in a multi-die microprocessor 100, e.g., as described with respect to block 1406 of FIG. 14.

Control unit

Referring to FIG. 3, a flowchart illustrating operation of the control unit 104 is shown. Flow begins at block 302. At block 302, a core 102 writes a sync request, i.e., writes a control word 202 to its sync register 108, and the sync request is received by the control unit 104. In a multi-die configuration of the microprocessor 100 (e.g., see FIG. 4), when a shadow sync register 108 of a control unit 104 receives a propagated sync register 108 value transmitted from the other die 406, the control unit 104 effectively operates according to FIG. 3, i.e., as if it had received a sync request from one of its local cores 102 (block 302), except that the control unit 104 puts to sleep (e.g., block 314), wakes up (blocks 306, 328, or 336), interrupts (block 334), or blocks wakeup events for (block 326) only the cores 102 on its local die 406, and populates only its local status register 106 (block 318). Flow proceeds to block 304.

At block 304, the control unit 104 examines the sync condition written at block 302 to determine whether a deadlock condition has occurred, as described above with respect to FIG. 2. If so, flow proceeds to block 306; otherwise, flow proceeds to decision block 312.

At block 305, the control unit 104 detects the occurrence of a wakeup event specified in the wakeup events field 204 of one of the sync registers 108 (other than the occurrence of a sync condition, which is detected at block 316). As described below with respect to block 326, wakeup events may be blocked by the control unit 104. The control unit 104 may detect the wakeup event as an event that is asynchronous to the writing of a sync request at block 302. Flow proceeds from block 305 to block 306.

At block 306, the control unit 104 populates the status register 106, kills pending sync requests, and wakes up any sleeping cores 102. As described above, waking up a sleeping core 102 may include restoring its power. The cores 102 may then read the status register 106, in particular the error code 248, to determine the cause of the deadlock and handle it according to the relative priorities of the conflicting sync requests, as described above. The control unit 104 kills all pending sync requests (i.e., clears the S bit 222 in the sync register 108 of each core 102), unless block 306 was reached from block 305 and the selective sync kill bit 234 is set, in which case the control unit 104 kills the pending sync request of only the core 102 awakened by the wakeup event. If block 306 was reached from block 305, the cores 102 may read the wakeup events field 244 to determine which wakeup event occurred. In addition, if the wakeup event was an unmasked interrupt source, the control unit 104 generates an interrupt request to the core 102 via the interrupt signal 124. Flow ends at block 306.

At decision block 312, the control unit 104 determines whether the sleep bit 212 or the selective wakeup bit 214 is set. If so, flow proceeds to block 314; otherwise, flow proceeds to decision block 316.

At block 314, the control unit 104 puts the core 102 to sleep. As described above, putting a core 102 to sleep may include removing its power. In one embodiment, as an optimization, even if the PG bit 208 is set, the control unit 104 does not remove power from the core 102 at block 314 if this is the last-writing core 102 (i.e., the write will cause the sync condition to occur) and the selective wakeup bit 214 is set, because the control unit 104 will immediately wake the last-writing core 102 back up at block 328. In one embodiment, the control unit 104 comprises sync logic and sleep logic that are separate from, but in communication with, one another; furthermore, the sync logic and the sleep logic each include a portion of the sync register 108. Advantageously, the write to the sync logic portion of the sync register 108 and the write to the sleep logic portion of the sync register 108 are atomic, i.e., indivisible. That is, if one part of the write occurs, both parts are guaranteed to occur. Preferably, the pipeline of the core 102 stalls, allowing no further writes, until it is guaranteed that the writes to both portions of the sync register 108 have occurred. An advantage of writing a sync request and immediately going to sleep is that the core 102 (e.g., its microcode) need not run continuously to determine whether the sync condition has occurred. This is beneficial because it saves power and avoids consuming other resources, such as bus and/or memory bandwidth. Note that in order to go to sleep without requesting synchronization with other cores 102 (e.g., at blocks 924 and 1124), the core 102 may write its sync register 108 with the S bit 222 clear and the sleep bit 212 set, referred to herein as a sleep request. In this case, the control unit 104 wakes up the core 102 (e.g., at block 306) if an unmasked wakeup event specified in the wakeup events field 204 occurs (e.g., at block 305), but does not look for the occurrence of a sync condition for this core 102 (e.g., at block 316). Flow proceeds to decision block 316.

At decision block 316, the control unit 104 determines whether a sync condition has occurred. If so, flow proceeds to block 318. As described above, a sync condition can occur only if the S bit 222 is set. In one embodiment, the control unit 104 uses the enable bits 254 of FIG. 2, which indicate which cores 102 in the microprocessor 100 are enabled and which are disabled. The control unit 104 considers only the enabled cores 102 when determining whether a sync condition has occurred. A core 102 may be disabled because it was tested and found to be defective at production time; in that case, a fuse is blown to render the core 102 inoperable and to indicate that the core 102 is disabled. A core 102 may be disabled because software requested that the core 102 be disabled (see FIG. 15, for example). For example, at a user's request, the BIOS writes a model specific register (MSR) to request that the core 102 be disabled; in response, the core 102 disables itself (e.g., via the disable core bit 236) and notifies the other cores 102 to read the configuration register 112, from which they determine that the core 102 is disabled. A core 102 may also be disabled via a microcode patch (see FIG. 14, for example), which may be made by blowing fuses 114 and/or loaded from system memory (e.g., a FLASH memory). In addition to determining whether a sync condition has occurred, the control unit 104 examines the force sync bit 232. If it is set, flow proceeds to block 318. If the force sync bit 232 is clear and a sync condition has not occurred, flow ends at block 316.

At block 318, the control unit 104 populates the status register 106. Specifically, if the sync condition that occurred was that all the cores 102 requested a C-state sync, the control unit 104 populates the lowest common C-state field 246, as described above. Flow proceeds to decision block 322.

At decision block 322, the control unit 104 examines the selective wakeup (SEL WAKE) bit 214. If the bit is set, flow proceeds to block 326; otherwise, flow proceeds to decision block 332.

At block 326, the control unit 104 blocks all wakeup events for all the cores 102 other than the instant core, which is the core 102 that was the last to write a sync request to its sync register 108 at block 302 and thereby caused the sync condition to occur. In one embodiment, logic of the control unit 104 simply Boolean ANDs the wakeup conditions with a signal that is false if wakeup events are to be blocked and true otherwise. The use of blocking all wakeup events for all other cores is described in more detail below, particularly with respect to FIGS. 11 through 13. Flow proceeds to block 328.
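The Boolean AND gating described for block 326 can be modeled in a few lines of Python. The function name and the dictionary representation of per-core wakeup conditions are assumptions made for illustration; the actual logic is a hardware signal, not software.

```python
from typing import Dict


def gate_wakeups(raw_wakeups: Dict[int, bool], instant_core: int) -> Dict[int, bool]:
    """AND each core's raw wakeup condition with an 'allow' signal that is
    False for every core except the instant (last-writing) core."""
    gated = {}
    for core, wakeup_condition in raw_wakeups.items():
        allow = (core == instant_core)  # False means wakeup events are blocked
        gated[core] = wakeup_condition and allow
    return gated
```

With raw conditions `{0: True, 1: True, 2: False}` and core 1 as the instant core, only core 1 still sees its wakeup condition.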

At block 328, the control unit 104 wakes up only the instant core 102 and not the other cores that requested the synchronization. In addition, the control unit 104 kills the pending sync request of the instant core 102 by clearing its S bit 222, but does not kill the pending sync requests of the other cores 102, i.e., it leaves the S bit 222 of the other cores 102 set. Advantageously, if and when the instant core 102 writes another sync request after it is awakened, this causes the sync condition to occur again (assuming the sync requests of the other cores 102 have not been killed), an example of which is described below with respect to FIGS. 12 and 13. Flow ends at block 328.

At decision block 332, the control unit 104 examines the sleep bit 212. If the bit is set, flow proceeds to block 336; otherwise, flow proceeds to block 334.

At block 334, the control unit 104 sends an interrupt (a sync interrupt) to all the cores 102. The timing diagram of FIG. 21 illustrates an example of a non-sleeping sync request. Each core 102 can read the wakeup events field 244 and detect that the occurrence of a sync condition was the cause of the interrupt. Flow reaches block 334 when the cores 102 chose not to go to sleep when they wrote their sync requests. Although this does not give the cores 102 the same benefits as going to sleep (e.g., simultaneous wakeup), it has the potential advantage of allowing the cores 102 to continue processing instructions while waiting for the last core 102 to write its sync request, in cases where simultaneous wakeup is not needed. Flow ends at block 334.

In block 336, the control unit 104 wakes up all cores 102 simultaneously. In one embodiment, the control unit 104 turns on the clock signal 122 to all cores 102 in precisely the same clock cycle. In another embodiment, the control unit 104 turns on the clock signal 122 to the cores 102 in a staggered manner. That is, the control unit 104 introduces a delay of a predetermined number of clock cycles (e.g., on the order of ten or one hundred clocks) between turning on the clock signal 122 to each successive core. Staggered turn-on of the clock signal 122 is nevertheless regarded as simultaneous for purposes of the present invention. Staggering the turn-on of the clock signal 122 is beneficial because it reduces the likelihood of a power consumption spike when all cores 102 wake up. In yet another embodiment, in order to reduce the likelihood of a power consumption spike, the control unit 104 turns on the clock signal 122 to all cores 102 in the same clock cycle, but does so in a stuttering, or throttled, manner by initially providing the clock signal 122 at a reduced frequency and then ramping the frequency up to the target frequency. In one embodiment, sync requests are issued as a result of the execution of microcode instructions of the core 102, and the microcode is designed so that, for at least some sync condition values, the location in the microcode that specifies the sync condition value is unique. For example, only one place in the microcode includes a sync x request, only one place in the microcode includes a sync y request, and so on. In these cases, simultaneous wakeup is beneficial because all cores 102 wake up in exactly the same place, which may allow microcode designers to write more efficient and bug-free code. Furthermore, simultaneous wakeup may be particularly beneficial for debugging when attempting to recreate and fix bugs that appear only because of the interaction of multiple cores and do not appear when a single core is running. FIGS. 5 and 6 show such an example. In addition, the control unit 104 kills all pending sync requests (e.g., clears the S bit 222 in the sync register 108 of each core 102). Flow ends at block 336.
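The staggered turn-on described for block 336 can be illustrated with a minimal sketch. The function name `staggered_enable_cycles` and its parameters are hypothetical and are not taken from the specification; the sketch only computes the clock cycle in which each core's clock would be turned on, assuming a fixed delay between consecutive cores.

```python
def staggered_enable_cycles(num_cores, start_cycle=0, stagger=10):
    """Return the clock cycle at which each core's clock is turned on.

    Hypothetical model: core i is enabled `stagger` cycles after core i-1.
    A stagger of 0 models the embodiment in which all clocks are turned on
    in precisely the same clock cycle.
    """
    if num_cores < 0 or stagger < 0:
        raise ValueError("num_cores and stagger must be non-negative")
    return [start_cycle + i * stagger for i in range(num_cores)]


same_cycle = staggered_enable_cycles(4, start_cycle=100, stagger=0)
staggered = staggered_enable_cycles(4, start_cycle=100, stagger=10)
```

With a stagger of ten cycles, the four cores in this sketch are enabled ten cycles apart, which spreads the current demand of wakeup over thirty cycles instead of a single cycle.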

An advantage of the embodiments described herein is that they may significantly reduce the amount of microcode in a microprocessor: rather than looping or performing other checks to synchronize operations among multiple cores, the microcode in each core can simply write a sync request, go to sleep, and know that when it wakes up, all cores are at the same place in the microcode. Microcode uses of the sync request mechanism are described below.
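The write-request, sleep, and wake-together behavior can be modeled in a few lines. The following is a toy sketch only: the class `ControlUnit`, its attributes, and its method names are hypothetical, and the model assumes that every core is enabled and that all cores request the same sync condition value.

```python
class ControlUnit:
    """Toy model of the sync-request mechanism (illustrative only)."""

    def __init__(self, num_cores):
        self.pending = [None] * num_cores  # sync condition written per core
        self.asleep = [False] * num_cores

    def write_sync(self, core, condition):
        """A core writes a sync request and is put to sleep.

        Returns True if this write completed the sync condition.
        """
        self.pending[core] = condition
        self.asleep[core] = True
        return self._check()

    def _check(self):
        """Wake every core once all cores have written the same condition."""
        first = self.pending[0]
        if first is None or any(p != first for p in self.pending):
            return False
        self.pending = [None] * len(self.pending)  # kill pending requests
        self.asleep = [False] * len(self.asleep)   # wake all cores together
        return True


cu = ControlUnit(3)
r0 = cu.write_sync(0, 1)
r1 = cu.write_sync(2, 1)
asleep_mid = list(cu.asleep)   # cores 0 and 2 sleep while core 1 still runs
r2 = cu.write_sync(1, 1)       # last writer completes the condition
```

No core in this sketch polls or loops; each one writes once and resumes only after the last request arrives, which is the source of the microcode savings described above.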

Multi-die microprocessor

Referring to FIG. 4, a block diagram of another embodiment of the microprocessor 100 is shown. The microprocessor 100 of FIG. 4 is similar in many respects to the microprocessor 100 of FIG. 1: it is a multi-core processor, and the cores 102 are similar. However, the embodiment of FIG. 4 is a multi-die configuration. That is, the microprocessor 100 includes multiple semiconductor dies 406 mounted in a common package and communicating with one another via an inter-die bus 404. The embodiment of FIG. 4 includes two dies 406, labeled die A 406A and die B 406B, coupled by the inter-die bus 404. In addition, each die 406 includes an inter-die bus unit 402 that interfaces the respective die 406 to the inter-die bus 404. Furthermore, each die 406 includes its own control unit 104 in the uncore 103, coupled to its respective cores 102 and inter-die bus unit 402. In the embodiment of FIG. 4, die A 406A includes four cores 102 (core A 102A, core B 102B, core C 102C, and core D 102D) coupled to a control unit A 104A, which is in turn coupled to an inter-die bus unit A 402A; likewise, die B 406B includes four cores 102 (core E 102E, core F 102F, core G 102G, and core H 102H) coupled to a control unit B 104B, which is in turn coupled to an inter-die bus unit B 402B. Finally, each control unit 104 includes not only a sync register 108 for each core of its own die 406, but also a sync register 108 for each core of the other die 406; the latter are shown as shadow registers in FIG. 4. Accordingly, each control unit 104 in the embodiment of FIG. 4 includes eight sync registers 108, denoted 108A, 108B, 108C, 108D, 108E, 108F, 108G, and 108H. In control unit A 104A, sync registers 108E, 108F, 108G, and 108H are shadow registers, whereas in control unit B 104B, sync registers 108A, 108B, 108C, and 108D are shadow registers.

When a core 102 writes a value to its sync register 108, the control unit 104 on that core's die 406 writes the value, via the inter-die bus units 402 and the inter-die bus 404, to the corresponding shadow register 108 on the other die 406. In addition, if the disable core bit 236 is set in the value propagated to the shadow sync register 108, the control unit 104 also updates the corresponding enable bit 254 in the configuration register 112. In this way, the occurrence of a sync condition (including a trans-die sync condition) can be detected even when the core configuration of the microprocessor 100 changes dynamically (e.g., FIGS. 14 through 16). In one embodiment, the inter-die bus 404 is a relatively low-speed bus, the propagation may take a predetermined number of clock cycles on the order of 100 core clock cycles, and each control unit 104 includes a state machine that takes a predetermined amount of time to detect the occurrence of the sync condition and turn on the clock signal to all cores 102 of its respective die 406. Preferably, the control unit 104 on the local die 406 (i.e., the die 406 that includes the writing core 102) is configured to delay updating the local sync register 108 until a predetermined amount of time (e.g., the sum of the propagation time and the state machine's sync condition detection time) after it begins writing the value to the other die 406 (e.g., after being granted the inter-die bus 404). In this way, the control units 104 on both dies detect the occurrence of the sync condition simultaneously and turn on the clock signals to all cores 102 on both dies 406 at the same time. This may be particularly beneficial for debugging when attempting to recreate and fix bugs that appear only because of the interaction of multiple cores and do not appear when a single core is running. FIGS. 5 and 6 describe embodiments that may take advantage of this feature.
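The relationship between local and shadow sync registers can be sketched as follows. This is a simplified model under stated assumptions: the class `DieControlUnit` and its methods are hypothetical, cores are identified by the letters A through H, and the inter-die bus transfer is modeled as an immediate assignment, so the propagation delay and the delayed local update described above are deliberately not modeled.

```python
class DieControlUnit:
    """Toy model of one die's control unit with local and shadow registers."""

    def __init__(self, local_cores, all_cores):
        self.local_cores = set(local_cores)
        self.sync = {core: None for core in all_cores}  # one register per core
        self.peer = None  # control unit on the other die

    def shadow_registers(self):
        """Registers that mirror cores located on the other die."""
        return sorted(c for c in self.sync if c not in self.local_cores)

    def core_write(self, core, value):
        """A local core writes its sync register; the value is also
        propagated to the matching shadow register on the other die."""
        if core not in self.local_cores:
            raise ValueError("core is not on this die")
        self.sync[core] = value
        self.peer.sync[core] = value  # stands in for the inter-die bus write

    def sync_condition_occurred(self, value):
        return all(v == value for v in self.sync.values())


cores = "ABCDEFGH"
cu_a = DieControlUnit("ABCD", cores)
cu_b = DieControlUnit("EFGH", cores)
cu_a.peer, cu_b.peer = cu_b, cu_a

for c in "ABCD":
    cu_a.core_write(c, 1)
for c in "EFG":
    cu_b.core_write(c, 1)
before_last = (cu_a.sync_condition_occurred(1), cu_b.sync_condition_occurred(1))
cu_b.core_write("H", 1)
after_last = (cu_a.sync_condition_occurred(1), cu_b.sync_condition_occurred(1))
```

Because each control unit holds a full set of eight registers, either die in this sketch can decide locally whether all eight cores have written the request.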

Debug operations

The cores 102 of the microprocessor 100 are configured to perform individual debug operations, such as breakpoints on instruction execution and data accesses. In addition, the microprocessor 100 is configured to perform debug operations that are trans-core, i.e., debug operations that involve more than one core 102 of the microprocessor 100.

Referring to FIG. 5, a flowchart illustrating operation of the microprocessor 100 to dump debug information is shown. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively dump the state of the microprocessor 100. More specifically, FIG. 5 describes the operation of the one core that receives the request to dump debug information, whose flow begins at block 502, and the operation of the other cores 102, whose flow begins at block 532.

In block 502, one of the cores 102 receives a request to dump debug information. Preferably, the debug information includes the state of the core 102 or a subset thereof. Preferably, the debug information is dumped to system memory or to an external bus that can be monitored by debug equipment, such as a logic analyzer. In response to the request, the core 102 sends a debug dump message to the other cores 102 and sends them an inter-core interrupt. Preferably, the core 102 traps to microcode in response to the request to dump debug information (in block 502) or in response to the interrupt (in block 532), and remains in microcode until block 528, during which time interrupts are disabled (i.e., the microcode does not allow itself to be interrupted). In one embodiment, the core 102 takes interrupts only when it is asleep and at architectural instruction boundaries. In one embodiment, the various inter-core messages described herein (such as the message of block 502 and the messages of blocks 702, 1502, 2606, and 3206) are sent and received via the sync condition or C-state field 226 of the control word of the sync register 108. In other embodiments, the inter-core messages are sent and received via the uncore private RAM 116. Flow proceeds from block 502 to block 504.

In block 532, one of the other cores 102 (i.e., a core 102 other than the core 102 that received the debug dump request in block 502) is interrupted and receives the debug dump message as a result of the inter-core interrupt and message sent in block 502. As noted above, although the flow at block 532 is described from the perspective of a single core 102, each of the other cores 102 (i.e., every core 102 other than the one in block 502) is interrupted and receives the message at block 532, and performs the steps of blocks 504 through 528. Flow proceeds from block 532 to block 504.

In block 504, the core 102 writes a sync request with sync condition 1 (denoted SYNC 1 in FIG. 5) to its sync register 108. As a result, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 506.

In block 506, the core 102 is woken up by the control unit 104 when all cores have written SYNC 1. Flow proceeds to block 508.

In block 508, the core 102 dumps its state to memory. Flow proceeds to block 514.

In block 514, the core 102 writes a SYNC 2, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 516.

In block 516, the core 102 is woken up by the control unit 104 when all cores have written SYNC 2. Flow proceeds to block 518.

In block 518, the core 102 saves the memory address to which the debug information was dumped in block 508 and sets a flag, both of which persist through a reset, and then resets itself. The reset microcode of the core 102 detects the flag and reloads the core's state from the saved memory address. Flow proceeds to block 524.

In block 524, the core 102 writes a SYNC 3, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 526.

In block 526, the core 102 is woken up by the control unit 104 when all cores have written SYNC 3. Flow proceeds to block 528.

In block 528, the core 102 comes out of reset and begins fetching architectural (e.g., x86) instructions based on the state that was reloaded in block 518. Flow ends at block 528.

Referring to FIG. 6, a timing diagram illustrating an example of the operation of the microprocessor 100 according to the flowchart of FIG. 5 is shown. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1, and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102. In the timing diagram, the sequence of events proceeds as described below.

Core 0 receives a debug dump request and, in response, sends a debug dump message and an interrupt to core 1 and core 2 (per block 502). Core 0 then writes a SYNC 1 and goes to sleep (per block 504).

Core 1 and core 2 are each eventually interrupted from their current tasks and read the message (per block 532). In response, core 1 and core 2 each write a SYNC 1 and go to sleep (per block 504). As shown, the time at which each core writes SYNC 1 may differ, for example because of the instruction that was executing when the interrupt was asserted.

When all cores have written SYNC 1, the control unit 104 wakes up all cores simultaneously (per block 506). Each core then dumps its state to memory (per block 508), writes a SYNC 2, and goes to sleep (per block 514). The amount of time needed to dump the state may differ; therefore, the time at which each core writes SYNC 2 may differ, as shown.

When all cores have written SYNC 2, the control unit 104 wakes up all cores simultaneously (per block 516). Each core then resets itself and reloads its state from memory (per block 518), writes a SYNC 3, and goes to sleep (per block 524). As shown, the amount of time needed to reset and reload the state may differ; therefore, the time at which each core writes SYNC 3 may differ.

When all cores have written SYNC 3, the control unit 104 wakes up all cores simultaneously (per block 526). Each core then begins fetching architectural instructions at the point where it was interrupted (per block 528).
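The three-phase ordering of FIGS. 5 and 6 can be imitated in software with a reusable barrier. This is only an analogy: Python's `threading.Barrier` stands in for the control unit 104, the three threads stand in for core 0, core 1, and core 2, and the event names are hypothetical. A software barrier reproduces the ordering of the phases, not the clock-level simultaneity that the control unit 104 provides by gating clocks.

```python
import threading

NUM_CORES = 3
events = []                           # shared log; list.append is thread-safe in CPython
sync = threading.Barrier(NUM_CORES)   # reused for SYNC 1, SYNC 2 and SYNC 3


def core(core_id):
    events.append(("stopped", core_id))   # interrupted, about to write SYNC 1
    sync.wait()                           # SYNC 1: wait until every core stopped
    events.append(("dumped", core_id))    # dump state to memory
    sync.wait()                           # SYNC 2: wait until every dump is done
    events.append(("reloaded", core_id))  # reset and reload state
    sync.wait()                           # SYNC 3: wait until every reload is done
    events.append(("running", core_id))   # resume fetching instructions


threads = [threading.Thread(target=core, args=(i,)) for i in range(NUM_CORES)]
for t in threads:
    t.start()
for t in threads:
    t.join()

phases = [name for name, _ in events]
```

Whatever order the threads are scheduled in, no core in this sketch dumps its state before every core has stopped, and no core resumes before every core has reloaded.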

A conventional solution for synchronizing operations among multiple processors is to use software semaphores. However, a disadvantage of the conventional solution is that it cannot provide clock-level synchronization. An advantage of the embodiments described herein is that the control unit 104 can turn on the clock signal 122 to all cores 102 simultaneously.

Using the method described above, an engineer debugging the microprocessor 100 may configure one of the cores 102 to periodically generate checkpoints at which it issues debug dump requests, for example after a predetermined number of instructions have been executed. While the microprocessor 100 is running, the engineer captures all activity on the external bus of the microprocessor 100 in a log file. The portion of the log file near the time at which the bug is believed to have occurred can then be provided to a software simulator that simulates the microprocessor 100 to help the engineer debug. The simulator simulates the execution of instructions by each core 102 and uses the logged information to simulate the transactions on the external bus of the microprocessor 100. In one embodiment, the simulators for all cores 102 start from reset at the same time. It is therefore highly desirable that all cores 102 of the microprocessor 100 actually come out of reset at the same time (e.g., after SYNC 2). Furthermore, because each core 102 waits to dump its state until all other cores 102 have stopped their current tasks (e.g., after SYNC 1), the dumping of state by one core 102 does not interfere with the code and/or hardware being debugged on the other cores (e.g., through shared memory bus or cache interactions), which may increase the likelihood of reproducing the bug and determining its cause. Likewise, because each core 102 waits to begin fetching architectural instructions until all cores 102 have finished reloading their state (e.g., after SYNC 3), the reloading of state by one core 102 does not interfere with the code and/or hardware being debugged on the other cores, which may increase the likelihood of reproducing the bug and determining its cause.

These benefits may provide advantages over prior methods, such as that of U.S. Pat. No. 8,370,684, which is hereby incorporated by reference in its entirety for all purposes, and which did not enjoy the benefit of having the sync request capability available.

Cache control operations

The cores 102 of the microprocessor 100 are configured to perform individual cache control operations, such as operations on local cache memories, i.e., caches that are not shared by two or more cores 102. In addition, the microprocessor 100 is configured to perform cache control operations that are trans-core, i.e., that involve more than one core 102 of the microprocessor 100, for example because they involve a shared cache memory 119.

Referring to FIGS. 7A-7B, a flowchart illustrating operation of the microprocessor 100 to perform a trans-core cache control operation is shown. The embodiment of FIGS. 7A-7B describes how the microprocessor 100 executes an x86 architecture write back and invalidate cache (WBINVD) instruction. A WBINVD instruction instructs the core 102 executing it to write back all modified lines in the cache memories of the microprocessor 100 to system memory and to invalidate, or flush, the cache memories. The WBINVD instruction also instructs the core 102 to issue special bus cycles that direct any cache memories external to the microprocessor 100 to write back their modified data and invalidate it. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively write back modified cache lines and invalidate the cache memories of the microprocessor 100. More specifically, FIGS. 7A-7B describe the operation of the one core that encounters the WBINVD instruction, whose flow begins at block 702, and the operation of the other cores 102, whose flow begins at block 752.

In block 702, one of the cores 102 encounters a WBINVD instruction. In response, the core 102 sends a WBINVD instruction message to the other cores 102 and sends them an inter-core interrupt. Preferably, the core 102 traps to microcode in response to the WBINVD instruction (in block 702) or in response to the interrupt (in block 752), and remains in microcode until flow reaches block 748/749, during which time interrupts are disabled (i.e., the microcode does not allow itself to be interrupted). Flow proceeds from block 702 to block 704.

In block 752, one of the other cores 102 (i.e., a core other than the core 102 that encountered the WBINVD instruction in block 702) is interrupted and receives the WBINVD instruction message as a result of the inter-core interrupt sent in block 702. As noted above, although the flow at block 752 is described from the perspective of a single core 102, each of the other cores 102 (i.e., every core 102 other than the one in block 702) is interrupted and receives the message at block 752, and performs the steps of blocks 704 through 749. Flow proceeds from block 752 to block 704.

In block 704, the core 102 writes a sync request with sync condition 4 (denoted SYNC 4 in FIGS. 7A-7B) to its sync register 108. As a result, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 706.

In block 706, the core 102 is woken up by the control unit 104 when all cores 102 have written SYNC 4. Flow proceeds to block 708.

In block 708, the core 102 writes back and invalidates its local cache memories, e.g., level-1 (L1) cache memories that the core 102 does not share with other cores 102. Flow proceeds to block 714.

In block 714, the core 102 writes a SYNC 5, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 716.

In block 716, the core 102 is woken up by the control unit 104 when all cores 102 have written SYNC 5. Flow proceeds to decision block 717.

In decision block 717, the core 102 determines whether it is the core 102 that encountered the WBINVD instruction in block 702 (as opposed to a core 102 that received the WBINVD instruction message in block 752). If so, flow proceeds to block 718; otherwise, flow proceeds to block 724.

In block 718, the core 102 writes back and invalidates the shared cache memory 119. In one embodiment, the microprocessor 100 includes slices in which multiple, but not all, cores 102 of the microprocessor 100 share a cache memory, as described above. In such an embodiment, intermediate operations (not shown) similar to blocks 717 through 726 are performed, in which one of the cores 102 in the slice writes back and invalidates the cache memory shared by the slice while the other core or cores of the slice go back to sleep, similar to block 724, to wait until that cache memory has been invalidated. Flow proceeds to block 724.

In block 724, the core 102 writes a SYNC 6, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 726.

In block 726, the core 102 is woken up by the control unit 104 when all cores 102 have written SYNC 6. Flow proceeds to decision block 727.

In decision block 727, the core 102 determines whether it is the core 102 that encountered the WBINVD instruction in block 702 (as opposed to a core 102 that received the WBINVD instruction message in block 752). If so, flow proceeds to block 728; otherwise, flow proceeds to block 744.

In block 728, the core 102 issues the special bus cycles that cause external caches to be written back and invalidated. Flow proceeds to block 744.

In block 744, the core 102 writes a SYNC 13, which causes the control unit 104 to put the core 102 to sleep. Flow proceeds to block 746.

In block 746, the core 102 is woken up by the control unit 104 when all cores 102 have written SYNC 13. Flow proceeds to decision block 747.

In decision block 747, the core 102 determines whether it is the core 102 that encountered the WBINVD instruction in block 702 (as opposed to a core 102 that received the WBINVD instruction message in block 752). If so, flow proceeds to block 748; otherwise, flow proceeds to block 749.

In block 748, the core 102 completes the WBINVD instruction, which includes retiring the WBINVD instruction and may include relinquishing ownership of a hardware semaphore (see FIG. 20). Flow ends at block 748.

In block 749, the core 102 resumes the task it was executing before it was interrupted in block 752. Flow ends at block 749.
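The difference between the core that encountered the WBINVD instruction and the other cores can be summarized as an ordered list of steps per core. The sketch below is a plain restatement of the flow of FIGS. 7A-7B as data; the function name `wbinvd_steps` and the step strings are hypothetical labels chosen for illustration.

```python
def wbinvd_steps(is_initiator):
    """Return the ordered steps one core performs in the flow of FIGS. 7A-7B.

    `is_initiator` is True for the core that encountered WBINVD (block 702)
    and False for a core that was interrupted (block 752).
    """
    steps = ["SYNC 4", "write back and invalidate local caches", "SYNC 5"]
    if is_initiator:
        steps.append("write back and invalidate shared cache")        # block 718
    steps.append("SYNC 6")
    if is_initiator:
        steps.append("issue special bus cycles for external caches")  # block 728
    steps.append("SYNC 13")
    steps.append("complete WBINVD" if is_initiator else "resume interrupted task")
    return steps


initiator = wbinvd_steps(True)
other = wbinvd_steps(False)
sync_points = [s for s in other if s.startswith("SYNC")]
```

Every core passes through the same four sync points, so in this sketch the shared cache and the external caches are each flushed exactly once, by the initiating core, while the remaining cores sleep.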

Referring to FIG. 8, a timing diagram illustrating the operation of the microprocessor 100 according to the flowchart of FIGS. 7A-7B is shown. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1, and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102.

Core 0 encounters a WBINVD instruction and, in response, sends a WBINVD instruction message and an interrupt to core 1 and core 2 (per block 702). Core 0 then writes a SYNC 4 and goes to sleep (per block 704).

Core 1 and core 2 are each eventually interrupted from their current tasks and read the message (per block 752). In response, core 1 and core 2 each write a SYNC 4 and go to sleep (per block 704). As shown, the time at which each core writes SYNC 4 may differ.

When all cores have written SYNC 4, the control unit 104 wakes up all cores simultaneously (per block 706). Each core then writes back and invalidates its local cache memories (per block 708), writes a SYNC 5, and goes to sleep (per block 714). The amount of time needed to write back and invalidate the cache memories may differ; therefore, the time at which each core writes SYNC 5 may differ, as shown.

When all cores have written SYNC 5, the control unit 104 wakes up all cores simultaneously (per block 716). Only the core that encountered the WBINVD instruction writes back and invalidates the shared cache memory 119 (per block 718), and all cores write a SYNC 6 and go to sleep (per block 724). Because only one core writes back and invalidates the shared cache memory 119, the time at which each core writes SYNC 6 may differ.

When all cores have written SYNC 6, the control unit 104 wakes up all cores simultaneously (per block 726). Only the core that encountered the WBINVD instruction completes the WBINVD instruction (per block 748), and all other cores resume the processing they were performing before being interrupted.

It should be understood that although embodiments have been described in which the cache control instruction is an x86 WBINVD instruction, other embodiments are contemplated in which sync requests are used to perform other cache control instructions. For example, the microprocessor 100 may perform similar actions to execute an x86 INVD instruction, which simply invalidates the caches without writing back the cached data (at blocks 708 and 718). As another example, the cache control instruction may belong to an instruction set architecture other than the x86 architecture.

Power management operations

The cores 102 of the microprocessor 100 are configured to perform individual power reduction actions, such as, but not limited to, ceasing to execute instructions, requesting the control unit 104 to stop sending the clock signal to the core 102, requesting the control unit 104 to remove power from the core 102, writing back and invalidating the local (i.e., non-shared) cache memories of the core 102, and saving the state of the core 102 to an external memory, such as the private RAM 116. When a core 102 has performed one or more core-specific power reduction actions, it has entered a "core" C-state (also called a core idle state or core sleep state). In one embodiment, the C-state values may correspond roughly to the well-known Advanced Configuration and Power Interface (ACPI) specification processor states, but may include finer granularity. Typically, a core 102 enters a core C-state in response to a request from the operating system. For example, the x86 architecture monitor wait (MWAIT) instruction is a power management instruction that provides a hint, namely a target C-state, to the core 102 executing the instruction, to allow the microprocessor 100 to enter an optimized state, such as a lower power consumption state. In the case of an MWAIT instruction, the target C-states are proprietary rather than ACPI C-states. Core C-state 0 (C0) corresponds to the running state of the core 102, and increasingly higher C-state values correspond to increasingly less active or less responsive states (such as the C1, C2, and C3 states). An increasingly less responsive or active state is a configuration or operating state that saves more power relative to a more active or responsive state, or that is for some reason relatively less responsive (e.g., has a longer wakeup latency or is less fully enabled). Examples of power saving actions a core 102 may take are stopping the execution of instructions, stopping the clock signal, lowering the voltage, and/or removing power from portions of the core (e.g., functional units and/or local caches) or from the entire core.

Additionally, microprocessor 100 is configured to perform cross-core power reduction actions. A cross-core power reduction action implicates or affects multiple cores 102 of microprocessor 100. For example, the shared cache memory 119 may be large and consume a relatively large amount of power. Thus, significant power savings may be achieved by removing the clock signal and/or power to the shared cache memory 119. However, in order to remove the clock signal and/or power to the shared cache memory 119, all cores 102 that share it must agree, so that data coherency is maintained. Embodiments are contemplated in which microprocessor 100 includes other shared power-related resources, such as shared clocks and power supplies. In one embodiment, microprocessor 100 is coupled to a system chipset that includes a memory controller, peripheral controllers and/or a power management controller. In other embodiments, one or more of these controllers are integrated into microprocessor 100. System power savings may be achieved when microprocessor 100 notifies a controller so that the controller can take power-saving actions. For example, microprocessor 100 may notify the controller that its cache memories have been invalidated and turned off, so that they need not be snooped.

In addition to the notion of a core C-state, microprocessor 100 as a whole generally has a "package" C-state (also referred to as a package idle state or package sleep state). The package C-state corresponds to the lowest (i.e., highest power consumption) common core C-state of the cores 102 (see, e.g., field 246 of FIG. 2 and block 318 of FIG. 3). However, in addition to the core-specific power reduction actions, the package C-state involves microprocessor 100 performing one or more cross-core power reduction actions. Examples of cross-core power-saving actions associated with package C-states include turning off a phase-locked loop (PLL) that generates clock signals, and flushing the shared cache memory 119 and stopping its clock and/or power, which allows the memory/peripheral controller to refrain from snooping the local and shared cache memories of microprocessor 100. Other examples are changing the voltage, frequency and/or bus clock ratio, reducing the size of a cache memory such as the shared cache memory 119, and running the shared cache memory 119 at half speed.
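The relationship between core C-states and the package C-state can be illustrated with a minimal sketch. The function name `package_c_state` is hypothetical and is not part of the described hardware; in the embodiments the value is produced by the control unit 104, and the sketch only assumes that C-states are represented as non-negative integers.

```python
def package_c_state(core_c_states):
    """Return the lowest common core C-state.

    core_c_states holds one C-state per core (0 = running, larger = deeper
    sleep). The package can go no deeper than its shallowest core.
    """
    if not core_c_states:
        raise ValueError("at least one core is required")
    if any(c < 0 for c in core_c_states):
        raise ValueError("C-states are non-negative integers")
    return min(core_c_states)


# Three cores requesting C4, C3 and C2 give a package C-state of C2.
print(package_c_state([4, 3, 2]))
```

Taking the minimum reflects that a cross-core action is only safe at a depth every core has agreed to.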

In many cases, the operating system in effect only executes instructions on individual cores 102. It can therefore put individual cores to sleep (i.e., into a core C-state), but it has no means of directly putting microprocessor 100 to sleep (i.e., into a package C-state). Advantageously, embodiments are described in which the cores 102 of microprocessor 100 work cooperatively, with the help of the control unit 104, to detect when all cores 102 have entered a core C-state and are ready for cross-core power-saving actions to occur.

Referring to FIG. 9, a flowchart illustrating operation of microprocessor 100 to enter a low-power package C-state is shown. The embodiment of FIG. 9 is described using the example of microprocessor 100 coupled to a chipset and executing MWAIT instructions. However, it should be understood that in other embodiments the operating system employs other power management instructions, and the primary core 102 communicates with controllers integrated into microprocessor 100 using a handshake protocol different from the one described.

The operation is described from the perspective of a single core 102, but each core 102 of microprocessor 100 may encounter MWAIT instructions and operate according to this description, so that the cores collectively cause microprocessor 100 to enter the optimized state. Flow begins at block 902.

At block 902, a core 102 encounters an MWAIT instruction that specifies a target C-state, denoted Cx in FIG. 9, where x is a non-negative integer. Flow proceeds to block 904.

At block 904, the core 102 writes to its sync register 108 a sync request with the C bit 224 set and a C-state field 226 value of x (denoted SYNC Cx in FIG. 9). Additionally, the sync request specifies in its wakeup events field 204 that the core 102 is to be awakened on all wakeup events. Consequently, the control unit 104 puts the core 102 to sleep. Preferably, the core 102 writes back and invalidates its local cache memories before writing the SYNC Cx. Flow proceeds to block 906.

At block 906, the core 102 is awakened by the control unit 104 when all cores 102 have written a SYNC Cx. As described above, the x values written by the other cores 102 may differ, and the control unit 104 posts the lowest common C-state value to the lowest common C-state field 246 of the status word 242 of the status register 106 (per block 318). Prior to block 906, while the core 102 is asleep, it may be awakened by a wakeup event such as an interrupt (e.g., blocks 305 and 306). More specifically, there is no guarantee that the operating system will execute an MWAIT instruction on all cores 102, which would allow microprocessor 100 to perform the power-saving actions associated with the package C-state, before a wakeup event (e.g., an interrupt) directed to one of the cores 102 occurs and effectively cancels the MWAIT instruction. However, once the core 102 is awakened at block 906, the core 102 (indeed, every core 102) is still executing microcode as a result of the MWAIT instruction (of block 902) and remains in microcode until block 924, during which time interrupts are disabled (i.e., the microcode does not allow itself to be interrupted). In other words, as long as fewer than all cores 102 have received an MWAIT instruction to go to sleep, individual cores 102 may sleep, but microprocessor 100 as a package does not indicate to the chipset that it is ready to enter a package sleep state. Once all cores 102 have agreed to enter a package sleep state, which is effectively indicated by the occurrence of the sync condition at block 906, the primary core 102 is allowed to complete the package sleep state handshake protocol with the chipset (e.g., blocks 908, 909 and 921 below) without being interrupted and without any other core 102 being interrupted. Flow proceeds to decision block 907.
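The sync condition of blocks 904 and 906 can be sketched as a simple barrier. The class `SyncBarrier` and its method names are hypothetical, and the sketch assumes that a wakeup event cancels a sleeping core's pending request; it models behaviour only and says nothing about how the control unit 104 is actually built.

```python
class SyncBarrier:
    """Toy model of how a control unit might track SYNC Cx requests."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.pending = {}  # core id -> requested C-state x

    def write_sync_cx(self, core, x):
        """Record a SYNC Cx written by `core`, which then goes to sleep.

        Returns None while at least one core has not yet written. When the
        last core writes, returns (cores_to_wake, lowest_common_c_state).
        """
        self.pending[core] = x
        if len(self.pending) < self.num_cores:
            return None
        lowest = min(self.pending.values())
        woken = sorted(self.pending)
        self.pending.clear()
        return woken, lowest

    def wakeup_event(self, core):
        """Assumed behaviour: an interrupt cancels the core's pending request."""
        return self.pending.pop(core, None) is not None


cu = SyncBarrier(3)
print(cu.write_sync_cx(0, 4))  # None: cores 1 and 2 have not synced yet
print(cu.write_sync_cx(1, 3))  # None
print(cu.write_sync_cx(2, 2))  # ([0, 1, 2], 2)
```

Until the last write arrives, any core can still be pulled out of the barrier by an interrupt, which is why the package as a whole makes no promise to the chipset before the sync condition occurs.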

At decision block 907, the core 102 determines whether it is the primary core 102 of microprocessor 100. Preferably, a core 102 is the primary core 102 if it determined at reset time that it is the BSP. If the core is the primary core, flow proceeds to block 908; otherwise, flow proceeds to block 914.

At block 908, the primary core 102 writes back and invalidates the shared cache memory 119 and then communicates to the chipset that it may take appropriate actions to reduce power consumption. For example, the memory controller and/or peripheral controller may refrain from snooping the local and shared cache memories of microprocessor 100, because they all remain invalid while microprocessor 100 is in the package C-state. As another example, the chipset may signal microprocessor 100 to take power-saving actions (e.g., by asserting x86-style STPCLK, SLP, DPSLP, NAP and VRDSLP signals, as described below). Preferably, the core 102 communicates the power management information based on the value of the lowest common C-state field 246. In one embodiment, the core 102 issues an I/O read bus cycle to an I/O address that provides the chipset with the relevant power management information, e.g., the package C-state value. Flow proceeds to block 909.

At block 909, the primary core 102 waits for the chipset to assert the STPCLK signal. Preferably, if the STPCLK signal has not been asserted after a predetermined number of clock cycles, the control unit 104 detects this condition, aborts its pending sync requests, wakes up all cores 102, and indicates the error in the error code field 248. Flow proceeds to block 914.

At block 914, the core 102 writes a SYNC 14. In one embodiment, the sync request specifies in its wakeup events field 204 that the core 102 is not to be awakened on any wakeup event. Consequently, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 916.

At block 916, the core 102 is awakened by the control unit 104 when all cores 102 have written a SYNC 14. Flow proceeds to decision block 919.

At decision block 919, the core 102 determines whether it is the primary core 102 of microprocessor 100. If so, flow proceeds to block 921; otherwise, flow proceeds to block 924.

At block 921, the primary core 102 issues a stop grant cycle on the microprocessor 100 bus to notify the chipset that it may take cross-core (i.e., package-wide) power-saving actions that affect microprocessor 100 as a whole, such as refraining from snooping the cache memories of microprocessor 100, removing the bus clock (e.g., x86-style BCLK) to microprocessor 100, and asserting other signals on the bus (e.g., x86-style SLP, DPSLP, NAP, VRDSLP) to cause microprocessor 100 to remove clocks and/or power from various portions of microprocessor 100. Although the embodiments described herein involve a handshake protocol between microprocessor 100 and a chipset that comprises an I/O read (at block 908), the assertion of STPCLK (at block 909), and the issuance of a stop grant cycle (at block 921), all of which are historically associated with x86-architecture-based systems, it should be understood that other embodiments are contemplated that involve systems based on other instruction set architectures with different protocols, and that may likewise save power, improve performance and/or reduce complexity. Flow proceeds to block 924.

At block 924, the core 102 writes a sleep request (e.g., with the sleep bit 212 set and the S bit 222 clear) to the sync register 108. Additionally, the sync request specifies in its wakeup events field 204 that the core 102 is to be awakened only on the wakeup event of the de-assertion of STPCLK. Consequently, the control unit 104 puts the core 102 to sleep. Flow ends at block 924.
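For orientation, the path a single core takes through FIG. 9 can be summarized in a short sketch. The function `fig9_steps` and its step descriptions are hypothetical paraphrases of blocks 904 through 924; the block numbers are the only values taken from the flowchart.

```python
def fig9_steps(is_primary, x):
    """List the FIG. 9 blocks a core passes through after MWAIT Cx (block 902)."""
    steps = [
        (904, f"write SYNC C{x}; sleep, wake on all wakeup events"),
        (906, "woken once all cores have written SYNC Cx"),
    ]
    if is_primary:  # decision block 907
        steps += [
            (908, "write back/invalidate shared cache; I/O read to chipset"),
            (909, "wait for chipset to assert STPCLK"),
        ]
    steps += [
        (914, "write SYNC 14; sleep, wake on no wakeup events"),
        (916, "woken once all cores have written SYNC 14"),
    ]
    if is_primary:  # decision block 919
        steps.append((921, "issue stop grant cycle"))
    steps.append((924, "write sleep request; wake only on STPCLK de-assertion"))
    return steps


for block, action in fig9_steps(is_primary=True, x=4):
    print(block, action)
```

Only the primary core visits blocks 908, 909 and 921; every other core spends that time asleep between its two sync requests.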

Referring to FIG. 10, a timing diagram illustrating an example of the operation of microprocessor 100 according to the flowchart of FIG. 9 is shown. In this example, microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. However, it should be understood that in other embodiments microprocessor 100 may include a different number of cores 102.

Core 0 encounters an MWAIT instruction specifying C-state 4 (MWAIT C4) (per block 902). Core 0 then writes a SYNC C4 and goes to sleep (per block 904). Core 1 encounters an MWAIT instruction specifying C-state 3 (MWAIT C3) (per block 902). Core 1 then writes a SYNC C3 and goes to sleep (per block 904). Core 2 encounters an MWAIT instruction specifying C-state 2 (MWAIT C2) (per block 902). Core 2 then writes a SYNC C2 and goes to sleep (per block 904). As shown, the time at which each core writes its SYNC Cx may differ. Indeed, one or more cores may not encounter an MWAIT instruction before some other event, such as an interrupt, occurs.

When all cores have written a SYNC Cx, the control unit 104 wakes up all cores simultaneously (per block 906). The primary core then issues the I/O read bus cycle (per block 908) and waits for the assertion of STPCLK (per block 909). All cores write a SYNC 14 and go to sleep (per block 914). Because only the primary core flushes the shared cache memory 119, issues the I/O read bus cycle and waits for the assertion of STPCLK, the time at which each core writes its SYNC 14 may differ, as shown. Indeed, the primary core may write its SYNC 14 on the order of hundreds of microseconds after the other cores.

When all cores have written a SYNC 14, the control unit 104 wakes up all cores simultaneously (per block 916). Only the primary core issues the stop grant cycle (per block 921). All cores write a sleep request that waits on the de-assertion of STPCLK (~STPCLK) and go to sleep (per block 924). Because only the primary core issues the stop grant cycle, the time at which each core writes its sleep request may differ, as shown.

When the STPCLK signal is de-asserted, the control unit 104 wakes up all cores.

As may be observed from FIG. 10, core 1 and core 2 are advantageously able to sleep for a significant portion of the time while core 0 performs the handshake protocol. However, it should be noted that the time required to wake microprocessor 100 from a package sleep state is generally proportional to the depth of the sleep (i.e., how much power is saved while in the sleep state). Therefore, where the package sleep state is relatively deep (or even where the sleep state of an individual core 102 is relatively deep), it is desirable to further reduce the occurrence of wakeups and/or the wakeup time required in connection with the handshake protocol. FIG. 11 describes an embodiment in which a single core 102 handles the handshake protocol while the other cores 102 remain asleep. Furthermore, in the embodiment of FIG. 11, additional power savings are obtained by reducing the number of cores 102 that are awakened in response to a wakeup event.

Referring to FIG. 11, a flowchart illustrating operation of microprocessor 100 to enter a low-power package C-state according to an alternate embodiment of the present invention is shown. The embodiment of FIG. 11 is described using the example of microprocessor 100 coupled to a chipset and executing MWAIT instructions. However, it should be understood that in other embodiments the operating system employs other power management instructions, and the last-syncing core 102 communicates with controllers integrated into microprocessor 100 using a handshake protocol different from the one described.

The embodiment of FIG. 11 is similar in some respects to the embodiment of FIG. 9. However, the embodiment of FIG. 11 is designed to enable potentially greater power savings in environments where the operating system requests that microprocessor 100 enter very low power states and tolerates the latencies associated with them. More specifically, the embodiment of FIG. 11 facilitates power-gating the cores and waking up only one of the cores when necessary, such as to handle an interrupt. Embodiments are contemplated in which microprocessor 100 supports both the mode of operation of FIG. 9 and that of FIG. 11. Furthermore, the mode may be configurable at manufacture (e.g., via fuses 114) and/or via software control, or it may be decided automatically by microprocessor 100 based on the particular C-state specified by the MWAIT instructions. Flow begins at block 1102.

At block 1102, a core 102 encounters an MWAIT instruction that specifies a target C-state (MWAIT Cx), denoted Cx in FIG. 11. Flow proceeds to block 1104.

At block 1104, the core 102 writes to its sync register 108 a sync request with the C bit 224 set and a C-state field 226 value of x (denoted SYNC Cx in FIG. 11). The sync request also sets the selective wakeup (SELWAKE) bit 214 and the PG bit 208. Additionally, the sync request specifies in its wakeup events field 204 that the core 102 is to be awakened on all wakeup events except the assertion of STPCLK and the de-assertion of STPCLK (~STPCLK). (Preferably, there are other wakeup events, such as AP startup, for which the sync request also specifies that the core 102 is not to be awakened.) Consequently, the control unit 104 puts the core 102 to sleep, which includes power-gating the core 102 because the PG bit 208 is set. Additionally, before writing the sync request, the core 102 writes back and invalidates its local cache memories and saves its state (preferably to the PRAM 116). When the core 102 is subsequently awakened (e.g., at block 1137, 1132 or 1106), it restores its state (e.g., from the PRAM 116). As described above, particularly with respect to FIG. 3, when the last core 102 writes a sync request with the selective wakeup bit 214 set, the control unit 104 automatically blocks all wakeup events for all cores 102 except the last-writing core 102 (per block 326). Flow proceeds to block 1106.
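The contents of the block 1104 sync request can be pictured as a small record. This is only a sketch: the `SyncRequest` type, the event names, and the decision to also exclude AP startup are assumptions made for illustration, and the actual bit layout of the sync register 108 is not modelled.

```python
from dataclasses import dataclass

# Hypothetical names for wakeup events; a real wakeup events field is a bit mask.
ALL_WAKEUP_EVENTS = frozenset({"INTERRUPT", "STPCLK", "~STPCLK", "AP_STARTUP"})


@dataclass(frozen=True)
class SyncRequest:
    c_bit: bool               # C bit 224
    c_state: int              # C-state field 226
    sel_wake: bool            # selective wakeup bit 214
    pg: bool                  # PG bit 208 (power-gate while asleep)
    wakeup_events: frozenset  # wakeup events field 204


def block_1104_request(x):
    """Build the SYNC Cx of block 1104: wake on everything except STPCLK edges.

    AP startup is also excluded here, following the preferred variant.
    """
    excluded = {"STPCLK", "~STPCLK", "AP_STARTUP"}
    return SyncRequest(True, x, True, True, ALL_WAKEUP_EVENTS - excluded)


print(block_1104_request(7))
```

Excluding both STPCLK edges keeps a power-gated core from being woken by the handshake that another core is conducting on its behalf.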

At block 1106, when all cores 102 have written a SYNC Cx, the control unit 104 wakes up the last-writing core 102. As described above, the control unit 104 keeps the S bits 222 of the other cores 102 set even though it wakes up the last-writing core 102 and clears its S bit. Prior to block 1106, while the core 102 is asleep, it may be awakened by a wakeup event such as an interrupt. However, once the core 102 is awakened at block 1106, it is still executing microcode as a result of the MWAIT instruction (of block 1102) and remains in microcode until block 1124, during which time interrupts are disabled (i.e., the microcode does not allow itself to be interrupted). In other words, as long as fewer than all cores 102 have received an MWAIT instruction to go to sleep, only individual cores 102 sleep, and microprocessor 100 as a package does not indicate to the chipset that it is ready to enter a package sleep state. Once all cores 102 have agreed to enter a package sleep state, as indicated by the occurrence of the sync condition at block 1106, the core 102 awakened at block 1106 (the last-writing core 102, which caused the sync condition to occur) is allowed to complete the package sleep state handshake protocol with the chipset (e.g., blocks 1108, 1109 and 1121 below) without being interrupted and without any other core 102 being interrupted. Flow proceeds to block 1108.
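The selective wakeup behaviour of blocks 1104 and 1106 can be sketched as follows. The class `SelectiveWakeControlUnit` is a hypothetical behavioural model; it assumes every core sets the selective wakeup bit, and it tracks only the S bits and the blocking of wakeup events.

```python
class SelectiveWakeControlUnit:
    """Toy model: only the last core to sync is woken; the rest stay asleep."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.s_bit = [False] * num_cores    # pending sync request per core
        self.blocked = [False] * num_cores  # wakeup events blocked per core

    def write_sync_selwake(self, core):
        """Core writes SYNC Cx with the selective wakeup bit set.

        Returns the id of the core to wake, or None if the sync condition
        has not yet occurred.
        """
        self.s_bit[core] = True
        if not all(self.s_bit):
            return None
        # Sync condition: block wakeup events for every core but the last writer.
        for c in range(self.num_cores):
            self.blocked[c] = (c != core)
        # Only the last writer's S bit is cleared; the others stay pending.
        self.s_bit[core] = False
        return core


cu = SelectiveWakeControlUnit(3)
cu.write_sync_selwake(0)
cu.write_sync_selwake(1)
print(cu.write_sync_selwake(2))  # 2: the last writer is the only core woken
print(cu.write_sync_selwake(2))  # 2 again: the other requests are still pending
```

Because the other cores' S bits stay set, the woken core can re-enter the sync condition alone later, without the sleeping cores having to wake and write again.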

At block 1108, the core 102 writes back and invalidates the shared cache memory 119 and then communicates to the chipset that it may take appropriate actions to reduce power consumption. Flow proceeds to block 1109.

At block 1109, the core 102 waits for the chipset to assert the STPCLK signal. Preferably, if the STPCLK signal has not been asserted after a predetermined number of clock cycles, the control unit 104 detects this condition, aborts its pending sync requests, wakes up all cores 102, and indicates the error in the error code field 248. Flow proceeds to block 1121.

At block 1121, the core 102 issues a stop grant cycle to the chipset on the bus. Flow proceeds to block 1124.

At block 1124, the core 102 writes a sleep request (e.g., with the sleep bit 212 set, the S bit 222 clear, and the PG bit 208 set) to the sync register 108. Additionally, the sync request specifies in its wakeup events field 204 that the core 102 is to be awakened only on the wakeup event of the de-assertion of STPCLK. Consequently, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1132.

At block 1132, the control unit 104 detects the de-assertion of STPCLK and wakes up the core 102. It should be noted that before waking up the core 102, the control unit 104 also restores power to (un-power-gates) the core 102. Advantageously, the core 102 is at this point the only core that is running, which gives it the opportunity to perform any actions that must be performed while no other core 102 is running. Flow proceeds to block 1134.

At block 1134, the core 102 writes to a register (not shown) of the control unit 104 to unblock, for each of the other cores 102, the wakeup events specified in the wakeup events field 204 of that core's sync register 108. Flow proceeds to block 1136.

At block 1136, the core 102 handles any pending wakeup events directed to it. For example, in one embodiment the system that includes microprocessor 100 permits both directed interrupts (i.e., interrupts directed to a specific core of microprocessor 100) and non-directed interrupts (i.e., interrupts that may be handled by any core 102 that microprocessor 100 chooses). An example of a non-directed interrupt is what is commonly referred to as a "low priority interrupt". In one embodiment, microprocessor 100 preferably directs non-directed interrupts to the single core 102 that was awakened on the de-assertion of STPCLK at block 1132, since it is already awake and can service the interrupt, in the hope that the other cores 102 have no pending wakeup events and can therefore remain asleep and power-gated. Flow returns to block 1104.

When the wakeup events are unblocked at block 1134, any core 102 other than the one awakened at block 1132 that has no pending wakeup event directed to it advantageously remains asleep and power-gated per block 1104. However, if a wakeup event directed to such a core 102 is pending when the wakeup events are unblocked at block 1134, that core is un-power-gated and awakened by the control unit 104. In this case, a separate flow begins at block 1137 of FIG. 11.

At block 1137, after the wakeup events are unblocked at block 1134, another core 102 (i.e., a core 102 other than the one that unblocked the wakeup events at block 1134) is awakened. The other core 102 handles any pending wakeup events directed to it, e.g., it services an interrupt. Flow proceeds from block 1137 to block 1104.
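The effect of unblocking at block 1134 on the other cores can be sketched with a small helper. The function `unblock_wakeup_events` and the dictionary representation of pending events are hypothetical; the sketch only captures the rule that a core is woken if, and only if, a wakeup event directed to it is pending.

```python
def unblock_wakeup_events(pending, awake_core):
    """Return the ids of the other cores that wake once events are unblocked.

    pending: dict mapping core id -> list of pending wakeup events directed
             to that core.
    awake_core: the core that performs the unblocking (woken at block 1132).
    """
    return sorted(
        core for core, events in pending.items()
        if core != awake_core and events
    )


# Core 2 is awake and unblocks; only core 1 has an interrupt waiting.
pending = {0: [], 1: ["directed interrupt"], 2: ["non-directed interrupt"]}
print(unblock_wakeup_events(pending, awake_core=2))  # [1]
```

Cores absent from the returned list stay asleep and power-gated, which is where the additional savings of the FIG. 11 embodiment come from.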

Referring to FIG. 12, a timing diagram illustrating an example of the operation of microprocessor 100 according to the flowchart of FIG. 11 is shown. In this example, microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. However, it should be understood that in other embodiments microprocessor 100 may include a different number of cores 102.

Core 0 encounters an MWAIT instruction specifying C-state 7 (MWAIT C7) (per block 1102). In this example, C-state 7 permits power gating. Core 0 then writes a SYNC C7 with the selective wakeup bit 214 set (shown as "selective wakeup" in FIG. 12) and the PG bit 208 set, and goes to sleep and is power-gated (per block 1104). Core 1 encounters an MWAIT instruction specifying C-state 7 (per block 1102). Core 1 then writes a SYNC C7 with the selective wakeup bit 214 set and the PG bit 208 set, and goes to sleep and is power-gated (per block 1104). Core 2 encounters an MWAIT instruction specifying C-state 7 (per block 1102). Core 2 then writes a SYNC C7 with the selective wakeup bit 214 set and the PG bit 208 set, and goes to sleep and is power-gated (per block 1104). (However, in the preferred embodiment described with respect to block 314, the last-writing core is not power-gated.) As shown, the time at which each core writes its SYNC C7 may differ.

When the last-writing core writes its SYNC C7 with the selective wakeup bit 214 set, the control unit 104 blocks all wakeup events for all cores except the last-writing core (per block 326), which in the example of FIG. 12 is core 2. Additionally, the control unit 104 wakes up only the last-writing core (per block 1106), which saves power because the other cores remain asleep and power-gated while core 2 performs the handshake protocol with the chipset. Core 2 then issues the I/O read bus cycle (per block 1108) and waits for the assertion of STPCLK (per block 1109). In response to STPCLK, core 2 issues the stop grant cycle (per block 1121), writes a sleep request with the PG bit 208 set that waits on the de-assertion of STPCLK, and goes to sleep and is power-gated (per block 1124). The cores may then sleep and remain power-gated for a relatively long time.

When STPCLK is deasserted, the control unit 104 wakes up only core 2 (per block 1132). In the example of FIG. 12, the chipset deasserts STPCLK in response to receiving a non-directed interrupt, which it forwards to the microprocessor 100. The microprocessor 100 directs the non-directed interrupt to core 2, which saves power because the other cores remain asleep and power-gated. Core 2 unblocks the wakeup events of the other cores (per block 1134) and services the non-directed interrupt (per block 1136). Core 2 then again writes a SYNC C7 with the selective wakeup bit 214 and the PG bit 208 set, and goes to sleep with its power gated (per block 1104).

When core 2 writes the SYNC C7 with the selective wakeup bit 214 and the PG bit 208 set, the sync requests of the other cores are still pending; that is, the S bits 222 of the other cores were not cleared by the wakeup of core 2. The control unit 104 therefore blocks off wakeup events for all cores except core 2, which is again the last-writing core (per block 326). In addition, the control unit 104 wakes up only core 2 (per block 1106). Core 2 then issues the I/O read bus cycle (per block 1108) and waits for STPCLK to be asserted (per block 1109). In response to STPCLK, core 2 issues the stop grant cycle (per block 1121), writes a sleep request with the PG bit 208 set in order to wait for STPCLK to be deasserted, and goes to sleep with its power gated (per block 1124).

When STPCLK is deasserted, the control unit 104 wakes up only core 2 (per block 1132). In the example of FIG. 12, STPCLK is deasserted because of another non-directed interrupt. The microprocessor 100 therefore directs the interrupt to core 2, which saves power. Core 2 again unblocks the wakeup events of the other cores (per block 1134) and services the non-directed interrupt (per block 1136). Core 2 then again writes a SYNC C7 with the selective wakeup bit 214 and the PG bit 208 set, and goes to sleep with its power gated (per block 1104).

This cycle may continue for a long time, namely for as long as only non-directed interrupts are generated. FIG. 13 shows an example of handling an interrupt that is directed to a core other than the last-writing core.

As may be seen by comparing FIG. 10 and FIG. 12, the embodiment of FIG. 12 is advantageous in that once the cores 102 have gone to sleep (after writing SYNC C7 in the example of FIG. 12), only one core 102 is woken up again to perform the handshake protocol with the chipset while the other cores 102 stay asleep. This can be a significant advantage if the cores 102 remain in a sleep state for a relatively long time. The power savings may be substantial, particularly where the operating system recognizes that the workload on the system is very small, even for a single core 102.

Furthermore, advantageously, only one core 102 is woken up (to service a non-directed event, such as a low-priority interrupt) as long as no wakeup event is directed to another core 102. Again, this may be a significant advantage if the cores 102 remain in a sleep state for a relatively long time. When the only activity is relatively infrequent non-directed interrupts, such as USB interrupts, and particularly when the system has no active workload, the power savings may be significant. Still further, even when a wakeup event directed to another core 102 occurs (e.g., an interrupt that the operating system directs to a single core 102, such as an operating system timer interrupt), embodiments may advantageously switch dynamically the single core 102 that performs the package sleep state protocol and services non-directed wakeup events, as shown in FIG. 13, so as to retain the benefit of waking up only a single core 102.

Referring now to FIG. 13, a timing diagram illustrating an example of operation of the microprocessor 100 according to the flowchart of FIG. 11 is shown. The example of FIG. 13 is similar in many respects to the example of FIG. 12. However, at the first instance in which STPCLK is deasserted, the interrupt is directed to core 1 (rather than being a non-directed interrupt as in the example of FIG. 12). Accordingly, the control unit 104 wakes up core 2 (per block 1132) and then wakes up core 1 after core 2 has unblocked the wakeup events (per block 1134). Core 2 then again writes a SYNC C7 with the selective wakeup bit 214 and the PG bit 208 set, and goes to sleep with its power gated (per block 1104).

Core 1 services the directed interrupt (per block 1137). Core 1 then again writes a SYNC C7 with the selective wakeup bit 214 and the PG bit 208 set, and goes to sleep with its power gated (per block 1104). In this example, core 2 writes its SYNC C7 before core 1 writes its SYNC C7. Thus, although core 0 still has its S bit 222 set from its initial SYNC C7 write, the S bit 222 of core 1 was cleared when core 1 was woken up. Consequently, when core 2 writes its SYNC C7 after unblocking the wakeup events, it is not the last core to write a SYNC C7 request; rather, core 1 becomes the last-writing core.

When core 1 writes the SYNC C7 with the selective wakeup bit 214 and the PG bit 208 set, the sync request of core 0 is still pending (i.e., it was not cleared by the wakeups of core 1 and core 2) and core 2 (in this example) has already written its SYNC C7 request. The control unit 104 therefore blocks off wakeup events for all cores except core 1, which is the last-writing core (per block 326). In addition, the control unit 104 wakes up only core 1 (per block 1106). Core 1 then issues the I/O read bus cycle (per block 1108) and waits for STPCLK to be asserted (per block 1109). In response to STPCLK, core 1 issues the stop grant cycle (per block 1121), writes a sleep request with the PG bit 208 set in order to wait for STPCLK to be deasserted, and goes to sleep with its power gated (per block 1124).

When STPCLK is deasserted, the control unit 104 wakes up only core 1 (per block 1132). In the example of FIG. 13, STPCLK is deasserted because of a non-directed interrupt; the microprocessor 100 therefore directs the non-directed interrupt to core 1, which saves power. The cycle in which core 1 handles non-directed interrupts may continue for a long time, namely for as long as only non-directed interrupts are generated. In this manner, the microprocessor 100 advantageously saves power by directing non-directed interrupts to the core 102 to which the most recent interrupt was directed, which in the example of FIG. 13 involved switching to a different core. Core 1 again unblocks the wakeup events of the other cores (per block 1134) and services the non-directed interrupt (per block 1136). Core 1 then again writes a SYNC C7 with the selective wakeup bit 214 and the PG bit 208 set, and goes to sleep with its power gated (per block 1104).
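The last-writer behavior of FIGS. 12 and 13 can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `ControlUnit` and its methods `wake`, `unblock_all` and `write_sync_c7` are hypothetical names, the `pending` list stands in for the per-core S bit 222, and the STPCLK handshake and power gating are omitted entirely.

```python
class ControlUnit:
    """Toy model of the selective-wakeup bookkeeping; not the actual hardware."""

    def __init__(self, num_cores):
        self.pending = [False] * num_cores  # stands in for S bit 222 of each core
        self.blocked = [False] * num_cores  # are wakeup events blocked for this core?

    def wake(self, core):
        # Waking a core clears its pending sync request.
        self.pending[core] = False

    def unblock_all(self):
        # Models block 1134: the awake core unblocks the other cores' wakeup events.
        self.blocked = [False] * len(self.blocked)

    def write_sync_c7(self, core):
        """Record a SYNC C7 write; return the core that is woken, or None."""
        self.pending[core] = True
        if not all(self.pending):
            return None  # not the last writer, so the core simply sleeps
        # Last writer: block wakeup events for every other core, wake only this one.
        self.blocked = [i != core for i in range(len(self.pending))]
        self.wake(core)
        return core


cu = ControlUnit(3)
woken = [cu.write_sync_c7(c) for c in (0, 1, 2)]  # initial writes, core 2 is last
cu.unblock_all()
fig12 = cu.write_sync_c7(2)  # FIG. 12: core 2 serviced a non-directed interrupt
cu.unblock_all()
cu.wake(1)                   # FIG. 13: an interrupt directed to core 1 wakes it
fig13 = (cu.write_sync_c7(2), cu.write_sync_c7(1))
print(woken, fig12, fig13)
```

In the FIG. 13 sequence the second write by core 2 returns `None` because the request of core 1 is no longer pending, so core 1 becomes the last writer and takes over the handshake role.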

It should be understood that although embodiments have been described in which the power management instruction is an x86 MWAIT instruction, other embodiments are contemplated in which the sync requests are used to carry out other power management instructions. For example, the microprocessor 100 may perform similar operations in response to a read from one of a set of predetermined I/O port addresses associated with the different C-states. As another example, the power management instruction may belong to an instruction set architecture other than the x86 architecture.

Dynamic Reconfiguration of Multi-Core Processors

Each core 102 of the microprocessor 100 generates configuration-related values based on the configuration of the cores 102 of the microprocessor 100. Preferably, microcode of each core 102 generates, stores and uses the configuration-related values. Embodiments are described below in which the configuration-related values may advantageously be generated dynamically. Examples of configuration-related values include, but are not limited to, the following.

Each core 102 generates its global core number, described above with respect to FIG. 2. The global core number identifies the core 102 relative to all the cores 102 of the microprocessor 100, in contrast to the local core number 256, which identifies the core 102 relative only to the cores 102 of the die 406 on which it resides. In one embodiment, the core 102 generates its global core number as the sum of its local core number 256 and the product of its die number 258 and the number of cores 102 per die, as follows:

global core number = (die number × number of cores per die) + local core number.
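As a worked instance of the formula, the following sketch computes the global core number in software. The function name `global_core_number` and the example values (four cores per die, as in the two-die, eight-core configuration of FIG. 4) are illustrative assumptions, not part of the described hardware.

```python
def global_core_number(die_number, cores_per_die, local_core_number):
    """Global core number = (die number * cores per die) + local core number."""
    return die_number * cores_per_die + local_core_number


# Two dies with four cores each: local core 2 on die 1 is global core 6.
example = global_core_number(die_number=1, cores_per_die=4, local_core_number=2)
print(example)
```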

Each core 102 also generates a virtual core number. The virtual core number is the global core number minus the number of disabled cores 102 whose global core number is lower than that of the instant core 102. Thus, when all the cores 102 of the microprocessor 100 are enabled, the global core number and the virtual core number are the same. However, if one or more cores 102 are disabled or defective, the virtual core number of a core 102 may differ from its global core number. In one embodiment, each core 102 populates the APIC ID field of its corresponding APIC ID register with its virtual core number. However, in other embodiments (e.g., FIGS. 22 and 23) this is not the case. Additionally, in one embodiment the operating system may update the APIC ID in the APIC ID register.
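The virtual core number rule can be sketched as follows. The function name `virtual_core_number` and the representation of the enable bits as a Python list indexed by global core number are assumptions made for illustration only.

```python
def virtual_core_number(global_number, enabled):
    """Global core number minus the count of disabled cores with a lower global number.

    `enabled` is a list of booleans indexed by global core number (hypothetical
    stand-in for the enable bits 254).
    """
    disabled_below = sum(1 for is_enabled in enabled[:global_number] if not is_enabled)
    return global_number - disabled_below


# Eight cores with global cores 1 and 4 disabled.
enabled = [True, False, True, True, False, True, True, True]
print([virtual_core_number(g, enabled) for g, on in enumerate(enabled) if on])
```

With all cores enabled the function returns the global core number unchanged, matching the statement that the two numbers coincide in that case.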

Each core 102 also generates a BSP flag, which indicates whether the core 102 is the BSP. In one embodiment, generally (e.g., when the "all cores BSP" feature of FIG. 23 is disabled) one core 102 designates itself the bootstrap processor (BSP) and each of the other cores 102 designates itself an application processor (AP). After reset, an AP core 102 initializes itself and then goes to sleep, waiting for the BSP to notify it to begin fetching and executing instructions. In contrast, the BSP core 102, after initializing, immediately begins fetching and executing instructions of the system firmware, e.g., the BIOS boot code, which initializes the system (e.g., verifies that system memory and peripherals are working properly and initializes and/or configures them) and boots the operating system, i.e., loads the operating system (e.g., from disk) and transfers control to it. Before booting the operating system, the BSP determines the system configuration (e.g., the number of cores 102 or logical processors in the system) and stores it in memory so that the operating system can read it after it is booted. After the operating system is booted, it instructs the AP cores 102 to begin fetching and executing operating system instructions. In one embodiment, generally (e.g., when the "modify BSP" and "all cores BSP" features of FIG. 22 and FIG. 23, respectively, are disabled), a core 102 designates itself the BSP if its virtual core number is 0, and all the other cores 102 designate themselves AP cores 102. Preferably, a core 102 populates the BSP flag bit of the APIC base address register of its corresponding APIC with its BSP flag configuration-related value. According to one embodiment, as described above, the BSP is the master core 102 of blocks 907 and 919 that performs the package sleep state handshake protocol of FIG. 9.

Each core 102 also generates an APIC base value with which it populates its APIC base address register. The APIC base address is generated based on the APIC ID of the core 102. In one embodiment, the operating system may update the APIC base address in the APIC base address register.

Each core 102 also generates a die master indicator, which indicates whether the core 102 is the master core 102 of the die 406 that includes the core 102.

Each core 102 also generates a package master indicator, which indicates whether the core 102 is the master core of the package that includes the instant core 102, assuming the microprocessor 100 is configured with packages, as described in detail above.

Each core 102 computes the configuration-related values and operates using them so that the system that includes the microprocessor 100 functions properly. For example, the system directs interrupt requests to the cores 102 based on their associated APIC IDs. The APIC ID determines which interrupt requests a core 102 should respond to. More specifically, each interrupt request includes a destination identifier, and a core 102 responds to an interrupt request only if the destination identifier matches the APIC ID of the core 102 (or if the destination identifier is a special value indicating that the request is for all cores 102). As another example, each core 102 must know whether it is the BSP so that it executes the initial BIOS code and boots the operating system and, in one embodiment, performs the package sleep state handshake protocol described with respect to FIG. 9. Embodiments are described below (see FIGS. 22 and 23) in which the BSP flag and the APIC ID may be modified from their normal values for specific purposes, such as testing and/or debugging.
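The destination-matching rule can be expressed as a short predicate. In this sketch the function name `should_respond` is hypothetical, and the value `0xFF` chosen for the "all cores" destination is an assumption borrowed from the common 8-bit APIC broadcast convention; the text states only that such a special value exists.

```python
ALL_CORES_ID = 0xFF  # assumed special "all cores" destination value


def should_respond(destination_id, apic_id, all_cores_id=ALL_CORES_ID):
    """A core responds if the destination matches its APIC ID or targets all cores."""
    return destination_id == apic_id or destination_id == all_cores_id


# A core whose APIC ID is 2 sees three interrupt requests.
responses = [should_respond(dest, apic_id=2) for dest in (2, 3, ALL_CORES_ID)]
print(responses)
```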

Referring now to FIG. 14, a flowchart illustrating dynamic reconfiguration of the microprocessor 100 is shown. The description of FIG. 14 refers to the multi-die microprocessor 100 of FIG. 4, which includes two dies 406 and eight cores 102. However, it should be understood that the dynamic reconfiguration described may be employed in a microprocessor 100 with a different configuration, namely one with more than two dies or with a single die, and with more or fewer than eight cores 102 but at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively reconfigure the microprocessor 100 dynamically. Flow begins at block 1402.

In block 1402, the microprocessor 100 is reset, and hardware of the microprocessor 100 populates the configuration register 112 of each core 102 with the appropriate values based on the number of available cores 102 and the number of the die 406 on which that register resides. In one embodiment, the local core number 256 and the die number 258 are hardwired. As described above, the hardware may determine whether a core 102 is enabled or disabled from the blown or unblown state of the fuses 114. Flow proceeds to block 1404.

In block 1404, the core 102 reads the configuration word 252 from the configuration register 112. The core 102 then generates its configuration-related values based on the value of the configuration word 252 it has read. In the case of a multi-die microprocessor 100 configuration, the configuration-related values generated in block 1404 do not take into account the cores 102 of the other die 406. However, the configuration-related values generated in blocks 1414 and 1424 (and in block 1524 of FIG. 15) do take into account the cores 102 of the other die 406, as described below. Flow proceeds to block 1406.

In block 1406, the core 102 causes the values of the enable bits 254 of the local cores 102 in the local configuration register 112 to be propagated to the corresponding enable bits 254 of the configuration register 112 of the remote die 406. For example, with reference to the configuration of FIG. 4, a core 102 on die A 406A causes the enable bits 254 associated with cores A, B, C and D (the local cores) in the configuration register 112 of die A 406A (the local die) to be propagated to the enable bits 254 associated with cores A, B, C and D in the configuration register 112 of die B 406B (the remote die). Conversely, a core 102 on die B 406B causes the enable bits 254 associated with cores E, F, G and H (the local cores) in the configuration register 112 of die B 406B (the local die) to be propagated to the enable bits 254 associated with cores E, F, G and H in the configuration register 112 of die A 406A (the remote die). In one embodiment, the core 102 causes the propagation to the other die 406 by writing to the local configuration register 112. Preferably, the write by the core 102 to the local configuration register 112 does not change the local configuration register 112, but causes the local control unit 104 to propagate the local enable bit 254 values to the remote die 406. Flow proceeds to block 1408.
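The effect of the propagation in block 1406 can be sketched with bit masks. The bit layout below (bits 0 to 3 for cores A to D on die A, bits 4 to 7 for cores E to H on die B) and the function name `propagate_enable_bits` are assumptions for illustration; the actual layout of the enable bits 254 in the configuration word 252 is not specified here.

```python
DIE_A_MASK = 0x0F  # assumed positions of the enable bits for cores A-D
DIE_B_MASK = 0xF0  # assumed positions of the enable bits for cores E-H


def propagate_enable_bits(local_word, remote_word, local_mask):
    """Return the remote word with the local die's enable bits copied into it."""
    return (remote_word & ~local_mask) | (local_word & local_mask)


# Before propagation each die knows only its own cores' enable bits.
die_a = 0b00001011  # cores A, B and D enabled, core C disabled
die_b = 0b11010000  # cores E, G and H enabled, core F disabled

new_b = propagate_enable_bits(die_a, die_b, DIE_A_MASK)
new_a = propagate_enable_bits(die_b, die_a, DIE_B_MASK)
print(bin(new_a), bin(new_b))
```

After both dies have propagated, the two configuration words agree, which is what allows block 1414 to regenerate values that account for the cores of the other die.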

In block 1408, the core 102 writes a sync request for sync condition 8 (denoted SYNC 8 in FIG. 14) to its sync register 108. As a result, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1412.

In block 1412, the control unit 104 wakes up the core 102 when all the available cores 102 in the core set specified by the core set field 228 have written a SYNC 8. It should be noted that, in the case of a multi-die 406 microprocessor 100 configuration, the sync condition occurrence may be a multi-die sync condition occurrence. That is, the control unit 104 waits to wake up the core 102 (or to interrupt it, in the case where the core 102 did not set the sleep bit 212 and thus elected not to sleep) until all the cores 102 specified in the core set field 228 (which may include cores 102 on multiple dies 406) have written their sync requests. Flow proceeds to block 1414.

In block 1414, the core 102 reads the configuration register 112 again and generates its configuration-related values based on the new value of the configuration word 252, which now includes the correct values of the enable bits 254 propagated from the remote die 406. Flow proceeds to decision block 1416.

In decision block 1416, the core 102 determines whether it should disable itself. In one embodiment, the core 102 decides that it should disable itself because the microcode, in its reset processing (prior to decision block 1416), read fuses 114 that were blown to indicate that the core 102 should disable itself. The fuses 114 may be blown during or after manufacture of the microprocessor 100. In another embodiment, updated fuse 114 values may be scanned into the holding registers, as described above, and the scanned-in values indicate that the core 102 should be disabled. FIG. 15 describes another embodiment in which a core 102 determines in a different manner that it should disable itself. If the core 102 determines that it should disable itself, flow proceeds to block 1417; otherwise, flow proceeds to block 1418.

In block 1417, the core 102 writes the disable core bit 236 to remove itself from the list of available cores 102, i.e., to clear its corresponding enable bit 254 in the configuration word 252 of the configuration register 112. Thereafter, the core 102 prevents itself from executing any further instructions, preferably by setting one or more bits that turn off its clock signals and remove its power. Flow ends at block 1417.

In block 1418, the core 102 writes a sync request for sync condition 9 (denoted SYNC 9 in FIG. 14) to its sync register 108. As a result, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1422.

In block 1422, the control unit 104 wakes up the core 102 when all the enabled cores 102 have written a SYNC 9. Again, in the case of a multi-die 406 microprocessor 100 configuration, the sync condition occurrence may be a multi-die sync condition occurrence, determined from the updated values in the configuration register 112. Furthermore, when the control unit 104 determines whether a sync condition has occurred, it excludes from consideration any core 102 that disabled itself in block 1417. More specifically, consider the case in which all the other cores 102 (other than the core 102 disabling itself) write a SYNC 9 before the core 102 that is disabling itself writes the sync register 108 in block 1417. In that case, the control unit 104 detects the sync condition occurrence (at block 316) when the core 102 that is disabling itself writes the sync register 108 with the disable core bit 236 set in block 1417. This is because the control unit 104 no longer considers the disabled core 102 in determining whether the sync condition has occurred, since the enable bit 254 of the disabled core 102 is clear. That is, because all the enabled cores 102, which do not include the disabled core 102, have written a SYNC 9, the control unit 104 determines that the sync condition has occurred regardless of whether the disabled core 102 has written a SYNC 9. Flow proceeds to block 1424.
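The exclusion of disabled cores from the sync condition check can be sketched with sets. The function name `sync_condition_occurred` and the use of Python sets for the core set field 228, the enable bits 254 and the cores that have written the request are illustrative assumptions, not the control unit's actual logic.

```python
def sync_condition_occurred(core_set, enabled, written):
    """True when every enabled core in the specified core set has written its request."""
    required = core_set & enabled  # disabled cores are not waited for
    return required <= written


core_set = {0, 1, 2, 3}
written = {0, 1, 3}  # cores 0, 1 and 3 have written SYNC 9

# While core 2 is still enabled, the control unit keeps waiting for it.
before_disable = sync_condition_occurred(core_set, {0, 1, 2, 3}, written)
# Once core 2 clears its enable bit, the condition is satisfied without it.
after_disable = sync_condition_occurred(core_set, {0, 1, 3}, written)
print(before_disable, after_disable)
```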

In block 1424, the core 102 reads the configuration register 112 again; the new value of the configuration word 252 reflects a disabled core 102 if a core 102 was disabled by the operation of another core 102 in block 1417. The core 102 then generates its configuration-related values again based on the new value of the configuration word 252, in a manner similar to that of block 1414. The presence of a disabled core 102 may cause some of the configuration-related values to differ from those generated in block 1414. For example, as described above, the virtual core number, APIC ID, BSP flag, APIC base address, die master indicator and package master indicator may change because of the presence of a disabled core 102. In one embodiment, after generating the configuration-related values, one of the cores 102 (e.g., the BSP) writes some of the configuration-related values that are global to all the cores 102 of the microprocessor 100 to the uncore private random access memory 116 so that they may subsequently be read by all the cores 102. For example, in one embodiment, the global configuration-related values are read by a core 102 in order to execute an architectural instruction (e.g., the x86 CPUID instruction) that requests global information about the microprocessor 100, such as the number of cores 102 of the microprocessor 100. Flow proceeds to block 1426.
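How a disabled core changes the regenerated values can be sketched by recomputing them from the enable bits. This is an illustrative model only: the function name `config_values` and the dictionary layout are hypothetical, the BSP rule used is the default one described earlier (virtual core number 0, with the "modify BSP" and "all cores BSP" features disabled), and the APIC base, die master and package master values are left out.

```python
def config_values(enabled, cores_per_die):
    """Recompute per-core values from a list of enable bits indexed by global core number."""
    values = {}
    virtual = 0
    for global_number, is_enabled in enumerate(enabled):
        if not is_enabled:
            continue  # a disabled core generates no values
        values[global_number] = {
            "die": global_number // cores_per_die,
            "local": global_number % cores_per_die,
            "virtual": virtual,
            "bsp": virtual == 0,  # default rule: virtual core number 0 is the BSP
        }
        virtual += 1
    return values


before = config_values([True] * 8, cores_per_die=4)
after = config_values([False] + [True] * 7, cores_per_die=4)  # global core 0 disabled
print(before[1], after[1])
```

In this example, disabling global core 0 lowers every remaining core's virtual core number by one, so global core 1 takes over the BSP role.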

In block 1426, the core 102 comes out of reset and begins fetching architectural instructions. Flow ends at block 1426.

Referring now to FIG. 15, a flowchart illustrating dynamic reconfiguration of the microprocessor 100 according to another embodiment is shown. The description of FIG. 15 refers to the multi-die microprocessor 100 of FIG. 4, which includes two dies 406 and eight cores 102. However, it should be understood that the dynamic reconfiguration described may be employed in a microprocessor 100 with a different configuration, namely one with more than two dies or with a single die, and with more or fewer than eight cores 102 but at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively reconfigure the microprocessor 100 dynamically. More specifically, FIG. 15 describes the operation of one core 102 that encounters a core disable instruction, whose flow begins at block 1502, and the operation of the other cores 102, whose flow begins at block 1532.

In block 1502, one of the cores 102 encounters an instruction that instructs the core 102 to disable itself. In one embodiment, the instruction is an x86 WRMSR instruction. In response, the core 102 sends a reconfiguration message to each of the other cores 102 and sends each of them an inter-core interrupt. Preferably, the core 102 traps to microcode in response to the instruction to disable itself (in block 1502), or in response to the interrupt (in block 1532), and remains in microcode, during which time interrupts are disabled (i.e., the microcode does not allow itself to be interrupted), until block 1526. Flow proceeds from block 1502 to block 1504.

In block 1532, one of the other cores 102 (i.e., a core other than the core 102 that encountered the disable instruction in block 1502) is interrupted by the inter-core interrupt sent in block 1502 and receives the reconfiguration message. As noted above, although the flow at block 1532 is described from the perspective of a single core 102, each of the other cores 102 (i.e., every core 102 other than the one of block 1502) is interrupted at block 1532, receives the message, and performs the steps of blocks 1504 through 1526. Flow proceeds from block 1532 to block 1504.

In block 1504, the core 102 writes a sync request for sync condition 10 (denoted SYNC 10 in FIG. 15) to its sync register 108. As a result, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1506.

In block 1506, the control unit 104 wakes up the core 102 when all the available cores 102 have written a SYNC 10. It should be noted that, in the case of a multi-die 406 microprocessor 100 configuration, the sync condition occurrence may be a multi-die sync condition occurrence. That is, the control unit 104 waits to wake up the core 102 (or to interrupt it, in the case where the core 102 elected not to go to sleep) until all the cores 102 that are specified in the core set field 228 (which may include cores 102 on multiple dies 406) and that are enabled (as indicated by the enable bits 254) have written their sync requests. Flow proceeds to decision block 1508.

At decision block 1508, the core 102 determines whether it is the core 102 that was instructed to disable itself at block 1502. If so, flow proceeds to block 1517; otherwise, flow proceeds to block 1518.

At block 1517, the core 102 writes the disable core bit 236 to remove itself from the list of enabled cores 102, i.e., to clear its corresponding enable bit 254 in the configuration word 252 of the configuration register 112. Thereafter, the core 102 prevents itself from executing any further instructions, preferably by setting one or more bits that turn off its clock signals and remove its power. Flow ends at block 1517.

At block 1518, the core 102 writes a sync request for sync condition 11 (denoted SYNC 11 in FIG. 15) to its sync register 108. Consequently, the control unit 104 puts the core 102 to sleep. Flow proceeds to block 1522.

At block 1522, the core 102 is awakened by the control unit 104 once all enabled cores 102 have written a SYNC 11. Again, in a multi-die 406 configuration of the microprocessor 100, the sync condition occurrence may be a multi-die sync condition occurrence, based on the updated values in the configuration register 112. Furthermore, when determining whether a sync condition has occurred, the control unit 104 excludes from consideration the core 102 that disabled itself at block 1517. More specifically, consider the case in which all the other cores 102 write a SYNC 11 before the core 102 that is disabling itself writes its sync register 108 at block 1517. Because the enable bit 254 of the disabled core 102 is then clear, the control unit 104 no longer considers that core, and so it detects the sync condition occurrence (at block 316) when the disabling core 102 performs its write at block 1517 (see FIG. 16). In other words, the control unit 104 determines that the sync condition has occurred because all enabled cores 102 have written a SYNC 11, regardless of whether the disabled core 102 has written a SYNC 11. Flow proceeds to block 1524.

At block 1524, the core 102 reads the configuration register 112, whose configuration word 252 now reflects the core 102 that was disabled at block 1517. The core 102 then generates its configuration-related values based on the new value of the configuration word 252. Preferably, the disable instruction at block 1502 is executed by system firmware (e.g., BIOS setup), and after the core 102 is disabled, the system firmware reboots the system, e.g., after block 1526. During the reboot, the microprocessor 100 may operate differently than it did before the configuration-related values were generated at block 1524. For example, the BSP during the reboot may be a different core 102 than before. As another example, the system configuration information (e.g., the number of cores 102 and logical processors in the system) that the BSP determines and stores in memory for the operating system to read before the operating system boots may be different. As another example, the APIC IDs of the cores 102 still in use may differ from before, in which case the operating system will direct interrupt requests, and the cores 102 will respond to them, differently than under the previous configuration-related values. As yet another example, the master core 102 that performs the package sleep state handshake protocol of FIG. 9 at blocks 907 and 919 may be a different core 102 than under the previous configuration-related values. Flow proceeds to block 1526.

At block 1526, the core 102 resumes the task it was performing before it was interrupted at block 1532. Flow ends at block 1526.

The dynamic reconfiguration of the microprocessor 100 described herein may be used in a variety of applications. For example, dynamic reconfiguration may be used for testing and/or simulation during development of the microprocessor 100, and/or for field testing. In addition, a user may want to know the performance and/or power consumption of the system when a particular application runs on only a subset of the cores 102. In one embodiment, when a core 102 is disabled, its clocks may be stopped and/or its power removed so that it consumes essentially no power. Furthermore, in a high-reliability system, each core 102 may periodically check the other cores 102 for faults; if the cores 102 determine that a particular core 102 is faulty, the non-faulty cores may disable the faulty core 102 and the remaining cores 102 perform the dynamic reconfiguration described above. In such an embodiment, the control word 202 may include an additional field that lets the writing core 102 specify the core 102 to be disabled, and the operation described with respect to FIG. 15 is modified so that, at block 1517, a core may disable a core 102 other than itself.

FIG. 16 is a timing diagram illustrating an example of the operation of the microprocessor 100 according to the flowchart of FIG. 15. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1, and core 2, as shown. It should be understood, however, that in other embodiments the microprocessor 100 may include a different number of cores 102 and may be a single-die or multi-die microprocessor 100. In the timing diagram, time advances downward.

Core 1 encounters an instruction to disable itself and, in response, sends a reconfiguration message to core 0 and core 2 and interrupts them (per block 1502). Core 1 then writes a SYNC 10 and goes to sleep (per block 1504).

Core 0 and core 2 are each eventually interrupted from their current tasks and read the message (per block 1532). In response, core 0 and core 2 each write a SYNC 10 and go to sleep (per block 1504). As shown, the cores may write their SYNC 10 at different times, for example because of the latency of whatever instruction is executing when the interrupt is asserted.

Once all the cores 102 have written a SYNC 10, the control unit 104 wakes all of them simultaneously (per block 1506). Core 0 and core 2 then determine that they are not disabling themselves (per decision block 1508), write a SYNC 11, and go to sleep (per block 1518). Core 1, however, determines that it is disabling itself, so it writes its disable core bit 236 (per block 1517). In this example, core 1 writes its disable core bit 236 after core 0 and core 2 have written their respective SYNC 11, as shown. Nevertheless, the control unit 104 detects the sync condition occurrence, because it determines that the S bit 222 is set for every core 102 whose enable bit 254 is set. That is, although the S bit 222 of core 1 is not set, its enable bit 254 was cleared by the write to core 1's sync register 108 at block 1517.

Once all enabled cores have written a SYNC 11, the control unit 104 wakes all of them simultaneously (per block 1522). As described above, in the case of a multi-die microprocessor 100, when core 1 writes its disable core bit 236 and the local control unit 104 responsively clears core 1's local enable bit 254, the local control unit 104 also propagates the local enable bit 254 to the remote die 406. Consequently, the remote control unit 104 also detects the sync condition occurrence and simultaneously wakes all the enabled cores on its die 406. Core 0 and core 2 then generate their configuration-related values based on the updated value of the configuration register 112 (per block 1524) and resume the activity they were performing before the interrupt (per block 1526).
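The wake-up rule used in FIGS. 15 and 16 can be illustrated with a small software model. The sketch below is only an approximation under stated assumptions: the class `ControlUnitModel`, its methods, and the helper `replay_fig16` are hypothetical names, the S bits 222 are modeled as a dictionary of pending requests, and multi-die propagation is ignored. It shows how clearing a core's enable bit can itself complete a pending sync condition.

```python
from dataclasses import dataclass, field


@dataclass
class ControlUnitModel:
    """Toy model of sync-condition detection (illustrative only)."""
    enabled: set                                  # cores whose enable bit 254 is set
    pending: dict = field(default_factory=dict)   # core -> sync number it wrote

    def write_sync(self, core, sync_number):
        """A core writes a sync request; returns True if this completes the sync."""
        self.pending[core] = sync_number
        return self._sync_occurred(sync_number)

    def disable_core(self, core, sync_number):
        """Models block 1517: clear the enable bit, then re-check the condition."""
        self.enabled.discard(core)
        self.pending.pop(core, None)
        return self._sync_occurred(sync_number)

    def _sync_occurred(self, sync_number):
        # The condition holds once every still-enabled core has requested it.
        done = bool(self.enabled) and all(
            self.pending.get(core) == sync_number for core in self.enabled
        )
        if done:
            self.pending.clear()  # waking the cores consumes their requests
        return done


def replay_fig16():
    """Replay the three-core example; returns whether each write woke the cores."""
    cu = ControlUnitModel(enabled={0, 1, 2})
    return [
        cu.write_sync(1, 10),    # core 1 starts the reconfiguration
        cu.write_sync(0, 10),
        cu.write_sync(2, 10),    # last SYNC 10: all cores wake
        cu.write_sync(0, 11),
        cu.write_sync(2, 11),    # core 1 is still enabled, so no wake yet
        cu.disable_core(1, 11),  # clearing core 1's enable bit completes SYNC 11
    ]
```

In this model the final wake-up is triggered by the disable write rather than by a SYNC 11 from core 1, which mirrors the ordering shown in the timing diagram.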

Hardware Semaphore

FIG. 17 is a block diagram of the hardware semaphore 118 of FIG. 1. The hardware semaphore 118 includes an owned bit 1702, owner bits 1704, and a state machine 1706 that updates the owned bit 1702 and the owner bits 1704 in response to reads and writes of the hardware semaphore 118 by the cores 102. Preferably, the number of owner bits 1704 is the base-2 logarithm of the number of cores 102 with which the microprocessor 100 is configured, so that the owner bits identify which core currently owns the hardware semaphore 118. In another embodiment, the owner bits 1704 include one bit for each core 102 of the microprocessor 100. Note that although a single set of owned bit 1702, owner bits 1704, and state machine 1706 is described as implementing one hardware semaphore 118, the microprocessor 100 may include multiple hardware semaphores 118, each comprising such a set of hardware. Preferably, microcode running on each core 102 reads and writes the hardware semaphore 118 to take ownership of a resource shared by the cores 102 in order to perform an operation that requires exclusive access to the shared resource; examples are described below. The microcode may associate each of the multiple hardware semaphores 118 with ownership of a different shared resource of the microprocessor 100. Preferably, the cores 102 read and write the hardware semaphore 118 at a predetermined address in a non-architectural address space of the cores 102. The non-architectural address space can be accessed only by the microcode of a core 102 and cannot be accessed directly by user code (e.g., x86 architecture program instructions). The operation of the state machine 1706 in updating the owned bit 1702 and the owner bits 1704 is described with respect to FIGS. 18 and 19, and uses of the hardware semaphore 118 are described thereafter.

FIG. 18 is a flowchart illustrating operation when a core 102 reads the hardware semaphore 118. Flow begins at block 1802.

At block 1802, a core 102, denoted core x, reads the hardware semaphore 118. As described above, the microcode of the core 102 preferably reads the predetermined address in the non-architectural address space at which the hardware semaphore 118 resides. Flow proceeds to decision block 1804.

At decision block 1804, the state machine 1706 examines the owner bits 1704 to determine whether core x is the owner of the hardware semaphore 118. If so, flow proceeds to block 1808; otherwise, flow proceeds to block 1806.

At block 1806, the hardware semaphore 118 returns a value of zero to the reading core 102 to indicate that the core 102 does not own the hardware semaphore 118. Flow ends at block 1806.

At block 1808, the hardware semaphore 118 returns a value of one to the reading core 102 to indicate that the core 102 owns the hardware semaphore 118. Flow ends at block 1808.

As described above, the microprocessor 100 may include multiple hardware semaphores 118. In one embodiment, the microprocessor 100 includes 16 hardware semaphores 118, and when a core 102 reads the predetermined address it receives a 16-bit data value in which each bit corresponds to a different one of the 16 hardware semaphores 118 and indicates whether the reading core 102 owns the corresponding hardware semaphore 118.

FIG. 19 is a flowchart illustrating operation when a core 102 writes the hardware semaphore 118. Flow begins at block 1902.

At block 1902, a core 102, denoted core x, writes the hardware semaphore 118, e.g., at the predetermined non-architectural address described above. Flow proceeds to decision block 1904.

At decision block 1904, the state machine 1706 examines the owned bit 1702 to determine whether the hardware semaphore 118 is owned by any core 102 or is free. If it is owned, flow proceeds to decision block 1914; otherwise, flow proceeds to decision block 1906.

At decision block 1906, the state machine 1706 examines the value written. If the value is one, which indicates that the core 102 wants to take ownership of the hardware semaphore 118, flow proceeds to block 1908. If the value is zero, which indicates that the core 102 wants to relinquish ownership of the hardware semaphore 118, flow proceeds to block 1912.

At block 1908, the state machine 1706 updates the owned bit 1702 to one and sets the owner bits 1704 to indicate that core x now owns the hardware semaphore 118. Flow ends at block 1908.

At block 1912, the state machine 1706 updates neither the owned bit 1702 nor the owner bits 1704. Flow ends at block 1912.

At decision block 1914, the state machine 1706 examines the owner bits 1704 to determine whether core x is the owner of the hardware semaphore 118. If so, flow proceeds to decision block 1916; otherwise, flow proceeds to block 1912.

At decision block 1916, the state machine 1706 examines the value written. If the value is one, which indicates that the core 102 wants to take ownership of the hardware semaphore 118, flow proceeds to block 1912 (where no update occurs, because the core 102 already owns the hardware semaphore 118, as determined at decision block 1914). If the value is zero, which indicates that the core 102 wants to relinquish ownership of the hardware semaphore 118, flow proceeds to block 1918.

At block 1918, the state machine 1706 updates the owned bit 1702 to zero to indicate that no core 102 now owns the hardware semaphore 118. Flow ends at block 1918.

As described above, in one embodiment the microprocessor 100 includes 16 hardware semaphores 118. When a core 102 writes the predetermined address, it writes a 16-bit data value in which each bit corresponds to a different one of the 16 hardware semaphores 118 and indicates whether the writing core 102 is requesting ownership (value one) or relinquishing ownership (value zero) of the corresponding hardware semaphore 118.
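The read and write flows of FIGS. 18 and 19 can be summarized as a small state machine. The following sketch is a software model only, with hypothetical names (`HardwareSemaphoreModel`, `read`, `write`); it models a single semaphore rather than the 16-bit group. It also assumes that a read consults the owned bit as well as the owner bits, since block 1918 clears only the owned bit and would otherwise leave a stale owner.

```python
class HardwareSemaphoreModel:
    """Software model of one hardware semaphore (illustrative only)."""

    def __init__(self):
        self.owned = False   # models the owned bit 1702
        self.owner = None    # models the owner bits 1704 (a core number)

    def read(self, core):
        """FIG. 18: return 1 if the reading core owns the semaphore, else 0."""
        # Assumption: the owned bit qualifies the (possibly stale) owner bits.
        return 1 if self.owned and self.owner == core else 0

    def write(self, core, value):
        """FIG. 19: value 1 requests ownership, value 0 relinquishes it."""
        if not self.owned:
            if value == 1:            # block 1908: a free semaphore is taken
                self.owned = True
                self.owner = core
            # value 0 on a free semaphore: block 1912, no update
        elif self.owner == core and value == 0:
            self.owned = False        # block 1918: the owner releases
        # any other write to an owned semaphore: block 1912, no update
```

Note that a write by a non-owner is silently ignored in this model, so a core learns whether its request succeeded only by reading the semaphore back afterwards.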

In one embodiment, arbitration logic arbitrates among the requests by the cores 102 to access the hardware semaphore 118, so that the reads and writes of the hardware semaphore 118 by the cores 102 are serialized. In one embodiment, the arbitration logic uses a round-robin fairness algorithm among the cores 102 for access to the hardware semaphore 118.
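As an illustration of round-robin arbitration in general (the text does not give the arbiter's internals), the sketch below grants access to the first requesting core found after the most recently granted one, wrapping around. The function name `round_robin_grant` and its parameters are assumptions made for this example.

```python
def round_robin_grant(requests, last_granted, num_cores):
    """Pick the next core to access the semaphore, or None if nobody requests.

    requests:     set of core numbers currently requesting access
    last_granted: core number that was granted most recently
    num_cores:    total number of cores taking part in arbitration
    """
    # Search starts just after the last winner, so it is considered last.
    for offset in range(1, num_cores + 1):
        candidate = (last_granted + offset) % num_cores
        if candidate in requests:
            return candidate
    return None
```

Because the search starts just after the previous winner, a core that keeps requesting cannot starve the others.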

FIG. 20 is a flowchart illustrating operation of the microprocessor 100 when it uses the hardware semaphore 118 to perform an operation that requires exclusive ownership of a resource. More specifically, the hardware semaphore 118 is used to ensure that only one core 102 at a time performs a write back and invalidate of the shared cache 119 in the situation where two or more cores 102 have each encountered a write-back-and-invalidate instruction. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that, collectively, while one core 102 is performing the write-back-and-invalidate operation, no other core 102 is. That is, the operation of FIG. 20 ensures that the processing of WBINVD instructions is serialized. In one embodiment, the operation of FIG. 20 may be performed in a microprocessor 100 that executes a WBINVD instruction according to the embodiment of FIGS. 7A-7B. Flow begins at block 2002.

At block 2002, a core 102 encounters a cache control instruction, such as a WBINVD instruction. Flow proceeds to block 2004.

At block 2004, the core 102 writes a one to the WBINVD hardware semaphore 118. In one embodiment, the microcode has allocated one of the hardware semaphores 118 to the WBINVD operation. The core 102 then reads the WBINVD hardware semaphore 118 to determine whether it obtained ownership. Flow proceeds to decision block 2006.

At decision block 2006, if the core 102 determines that it obtained ownership of the WBINVD hardware semaphore 118, flow proceeds to block 2008; otherwise, flow returns to block 2004 to try again to obtain ownership. Note that while the microcode of the instant core 102 loops through blocks 2004 and 2006, it will eventually be interrupted by the core 102 that owns the WBINVD hardware semaphore 118, because that core 102 is executing a WBINVD instruction and sends an interrupt to the instant core 102 at block 702 of FIGS. 7A-7B. Preferably, each time through the loop the microcode of the instant core 102 checks an interrupt status register to see whether one of the other cores 102 (e.g., the core 102 that owns the WBINVD hardware semaphore 118) has sent an interrupt to the instant core 102. The instant core 102 then performs the operations of FIGS. 7A-7B and, at block 749, resumes operation according to FIG. 20 to attempt to obtain ownership of the hardware semaphore 118 so that it can execute its own WBINVD instruction.

At block 2008, the core 102 has obtained ownership, and flow proceeds to block 702 of FIGS. 7A-7B to execute the WBINVD instruction. As part of the WBINVD instruction operation, at block 748 of FIGS. 7A-7B the core 102 writes a zero to the WBINVD hardware semaphore 118 to relinquish ownership. Flow ends at block 2008.
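The acquire, verify, and release pattern of FIG. 20 can be sketched in software as follows. This is a minimal model, not the microcode: `run_exclusive`, `sem_write`, `sem_read`, `poll_interrupts`, and `max_spins` are hypothetical names, the semaphore is reduced to a dictionary, and a spin limit is added so the example always terminates.

```python
# Minimal stand-in for one hardware semaphore: the owning core, or None if free.
state = {"owner": None}


def sem_write(core, value):
    """Write 1 to request ownership, 0 to relinquish; other writes are ignored."""
    if value == 1 and state["owner"] is None:
        state["owner"] = core
    elif value == 0 and state["owner"] == core:
        state["owner"] = None


def sem_read(core):
    """Return 1 if the reading core owns the semaphore, else 0."""
    return 1 if state["owner"] == core else 0


def run_exclusive(core, write, read, operation,
                  poll_interrupts=lambda: None, max_spins=1000):
    """Run operation() while holding the semaphore, retrying until it is owned."""
    for _ in range(max_spins):
        write(core, 1)              # block 2004: request ownership
        if read(core) == 1:         # decision block 2006: did we get it?
            try:
                return operation()  # block 2008: do the exclusive work
            finally:
                write(core, 0)      # release (block 748 in FIGS. 7A-7B)
        poll_interrupts()           # service the current owner's interrupt
    raise TimeoutError("semaphore was never acquired")
```

Polling inside the loop matters because the waiting core may be needed by the current owner's operation; without it the two cores could wait on each other indefinitely.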

Operation similar to that described with respect to FIG. 20 may be performed by the microcode to obtain exclusive ownership of other shared resources. Another resource of which a core 102 may obtain exclusive ownership by using a hardware semaphore 118 is a register of the uncore 103 that is shared by the cores 102. In one embodiment, the uncore 103 registers include a control register that has a separate field for each core 102. Each field controls an aspect of the operation of its respective core 102. Because the fields reside in the same register, when a core 102 wants to update its own field without changing the fields of the other cores 102, the core 102 must read the control register, modify the value read, and then write the modified value back to the control register. For example, the microprocessor 100 may include an uncore 103 performance control register (PCR) that controls the bus clock ratio of the cores 102. To update its bus clock ratio, a given core 102 must read, modify, and write back the PCR. Therefore, in one embodiment, the microcode is configured to perform an effectively atomic read/modify/write of the PCR while the core 102 owns the hardware semaphore 118 associated with the PCR. The bus clock ratio determines the clock frequency of an individual core 102 as a multiple of the frequency of the clock supplied to the microprocessor 100 via an external bus.
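The need for ownership during the update follows from the read/modify/write itself, which the sketch below makes concrete. The field layout is purely an assumption for illustration (one 8-bit field per core, packed by core number); `FIELD_BITS` and `update_core_field` are hypothetical names and do not describe the actual PCR format.

```python
FIELD_BITS = 8  # assumed width of each per-core field


def update_core_field(register_value, core, new_ratio):
    """Return the register value with only the given core's field replaced."""
    shift = core * FIELD_BITS
    mask = ((1 << FIELD_BITS) - 1) << shift
    # Clear this core's field, then insert the new value (truncated to fit).
    return (register_value & ~mask) | ((new_ratio << shift) & mask)
```

If two cores interleaved their read and write steps without holding the semaphore, the later write would restore the earlier core's old field value, which is the lost update that exclusive ownership prevents.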

Another such resource is a Trusted Platform Module (TPM). In one embodiment, the microprocessor 100 implements a TPM in microcode that runs on the cores 102. At any given instant, microcode running on one and only one of the cores 102 implements the TPM. However, the core 102 that implements the TPM may change over time. By using the hardware semaphore 118 associated with the TPM, the microcode of the cores 102 ensures that only one core 102 implements the TPM at a time. More specifically, the core 102 currently implementing the TPM writes the TPM state to the private RAM 116 before it relinquishes implementation of the TPM, and the core 102 that takes over implementation of the TPM reads the TPM state from the private RAM 116. The microcode in each core 102 is configured so that when a core 102 wants to become the core 102 implementing the TPM, it first obtains ownership of the TPM hardware semaphore 118 before it reads the TPM state from the private RAM 116 and begins implementing the TPM. In one embodiment, the TPM conforms substantially to a TPM specification published by the Trusted Computing Group, such as the ISO/IEC 11889 specification.

As mentioned above, the conventional solution to resource contention among multiple processors is a software semaphore in system memory. Potential advantages of the hardware semaphore 118 described herein are that it avoids generating additional traffic on the memory bus and that it can be accessed faster than system memory.

Interrupting, Non-Sleeping Sync Requests

FIG. 21 is a timing diagram illustrating an example in which the cores 102 issue non-sleeping sync requests according to the flowchart of FIG. 3. In this example, the microprocessor 100 is configured with three cores 102, denoted core 0, core 1, and core 2, as shown. It should be understood, however, that in other embodiments the microprocessor 100 may include a different number of cores 102.

Core 0 writes a SYNC 14 in which neither the sleep bit 212 nor the selective wakeup bit 214 is set (i.e., a non-sleeping sync request). Consequently, the control unit 104 allows core 0 to keep running (per the "NO" branch of decision block 312).

Core 1 eventually also writes a non-sleeping SYNC 14, and the control unit 104 allows core 1 to keep running. Finally, core 2 writes a non-sleeping SYNC 14. As shown, the cores may write their SYNC 14 at different times.

Once all the cores have written the non-sleeping SYNC 14, the control unit 104 simultaneously sends a sync interrupt to each of core 0, core 1, and core 2. Each core then receives the sync interrupt and services it (unless the sync interrupt is masked, in which case the microcode typically polls for it).

Designation of the Bootstrap Processor

In one embodiment, as described above, normally (i.e., when the "all cores BSP" feature of FIG. 23 is disabled) one core 102 designates itself the bootstrap processor (BSP) and performs designated tasks, such as booting the operating system. In one embodiment, normally (i.e., when the "modify BSP" and "all cores BSP" features of FIGS. 22 and 23, respectively, are disabled) the core 102 whose virtual core number is zero is the BSP by default.

However, the inventors have observed that it may be advantageous for the BSP to be designated in a different manner, embodiments of which are described below. For example, much of the testing of microprocessor 100 parts, particularly manufacturing testing, is performed by booting the operating system and running programs to ensure that the part is working properly. Because the BSP core 102 performs system initialization and boots the operating system, the BSP core 102 may be exercised in ways that the AP cores are not. In addition, it has been observed that even in multithreaded operating environments the BSP typically bears a larger share of the processing load than the APs; consequently, the AP cores 102 may not be tested as thoroughly as the BSP core 102. Finally, there may be certain actions that only the BSP core 102 performs on behalf of the microprocessor 100 as a whole, such as the package sleep state handshake protocol described with respect to FIG. 9.

Accordingly, embodiments are described in which any of the cores 102 may be designated the BSP. In one embodiment, during testing of the microprocessor 100, the tests are run N times, where N is the number of cores 102 of the microprocessor 100, and for each run the microprocessor 100 is reconfigured so that a different core 102 is the BSP. This may advantageously provide better test coverage during manufacturing and may also help expose bugs in the microprocessor 100 during its design. A further advantage is that each core 102 may have a different APIC ID in different runs and therefore respond to different interrupt requests, which may provide broader test coverage.

FIG. 22 is a flowchart illustrating a process for configuring the microprocessor 100. The description of FIG. 22 refers to the multi-die microprocessor 100 of FIG. 4, which includes two dies 406 and eight cores 102. It should be understood, however, that the dynamic reconfiguration described herein may be used with a microprocessor 100 having a different configuration, i.e., with more than two dies or a single die, and with more or fewer than eight cores 102 but at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively and dynamically reconfigure the microprocessor 100. Flow begins at block 2202.

At block 2202, the microprocessor 100 is reset and performs an initial portion of its initialization, preferably in a manner similar to that described above with respect to FIG. 14. However, the generation of the configuration-related values, such as at block 1424 of FIG. 14, and in particular the APIC ID and the BSP flag, is performed in the manner described with respect to blocks 2203 through 2214. Flow proceeds to block 2203.

At block 2203, the core 102 generates its virtual core number, preferably as described with respect to FIG. 14. Flow proceeds to decision block 2204.

At decision block 2204, the core 102 samples an indication to determine whether a feature is enabled. The feature is referred to herein as the "modify BSP" feature. In one embodiment, blowing a fuse 114 enables the modify BSP feature. Preferably, during testing, rather than blowing the modify BSP feature fuse 114, a true value is scanned into the holding register bit associated with the modify BSP feature fuse 114, as described above with respect to FIG. 1, to enable the modify BSP feature. In this manner, the modify BSP feature is not permanently enabled on the microprocessor 100 part, but is instead disabled after power-up. Preferably, the operations at blocks 2203 through 2214 are performed by microcode of the core 102. If the modify BSP feature is enabled, flow proceeds to block 2205; otherwise, flow proceeds to block 2206.

At block 2205, the core 102 modifies the virtual core number generated at block 2203. In one embodiment, the core 102 replaces the virtual core number with the result of a rotate function applied to the virtual core number generated at block 2203 and a rotate amount, as follows:

virtual core number = rotate(rotate amount, virtual core number).

The rotate function, in one embodiment, rotates the virtual core numbers among the cores 102 by the rotate amount. The rotate amount is a value blown into the fuses 114 or, preferably, scanned into the holding register during testing. Table 1 shows the virtual core number of each core 102 for an example configuration in which the number of dies 406 is two, the number of cores 102 per die 406 is four, and all cores 102 are enabled. The ordered pair (die number 258, local core number 256) of each core is shown in the left column and each rotate amount is shown in the top row. In this manner, the tester is able to make the cores 102 generate any valid value as their virtual core number, and therefore as their APIC ID. Although one embodiment for modifying the virtual core number is described, other embodiments are contemplated. For example, the rotate direction may be the opposite of that shown in Table 1. Flow proceeds to block 2206.

Table 1

Virtual core number of each core, by (die number, local core number) pair (rows) and rotate amount (columns):

(die, core)    0  1  2  3  4  5  6  7
(0, 0)         0  7  6  5  4  3  2  1
(0, 1)         1  0  7  6  5  4  3  2
(0, 2)         2  1  0  7  6  5  4  3
(0, 3)         3  2  1  0  7  6  5  4
(1, 0)         4  3  2  1  0  7  6  5
(1, 1)         5  4  3  2  1  0  7  6
(1, 2)         6  5  4  3  2  1  0  7
(1, 3)         7  6  5  4  3  2  1  0
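The entries of Table 1 are consistent with subtracting the rotate amount from the default virtual core number, modulo the number of cores. The following is a minimal sketch of that reading, not the microcode of the embodiment. The function names are hypothetical, and the default numbering (die number times cores per die, plus local core number) is an assumption that merely matches the rotate-amount-0 column of the table.

```python
def default_virtual_core_number(die_number, local_core_number, cores_per_die=4):
    # Assumption: with all cores enabled, the default number is a simple
    # die-major index, which matches the rotate-amount-0 column of Table 1.
    return die_number * cores_per_die + local_core_number


def rotate(rotate_amount, virtual_core_number, num_cores=8):
    # Rotation direction chosen to reproduce Table 1; the text notes that
    # the opposite direction is also contemplated.
    return (virtual_core_number - rotate_amount) % num_cores


def build_table(num_dies=2, cores_per_die=4):
    # Map each (die, local core) pair to its row of rotated numbers,
    # one entry per rotate amount 0..num_cores-1.
    num_cores = num_dies * cores_per_die
    table = {}
    for die in range(num_dies):
        for local in range(cores_per_die):
            default = default_virtual_core_number(die, local, cores_per_die)
            table[(die, local)] = [
                rotate(amount, default, num_cores) for amount in range(num_cores)
            ]
    return table


if __name__ == "__main__":
    for pair, row in build_table().items():
        print(pair, row)
```

For every rotate amount exactly one core ends up with virtual core number zero, which is what lets a tester move the BSP role from core to core.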

At block 2206, the core 102 populates its local APIC ID register with either the default virtual core number generated at block 2203 or the modified value generated at block 2205. In one embodiment, the APIC ID register can be read by the core 102 from itself (e.g., by the BIOS and/or the operating system) at memory address 0x0FEE00020; in another embodiment, the APIC ID register can be read by the core 102 at MSR address 0x802. Flow proceeds to decision block 2208.

At decision block 2208, the core 102 determines whether the APIC ID it populated at block 2206 is zero. If so, flow proceeds to block 2212; otherwise, flow proceeds to block 2214.

At block 2212, the core 102 sets its BSP flag to true to indicate that the core 102 is the BSP. In one embodiment, the BSP flag is a bit of the x86 APIC base register (IA32_APIC_BASE MSR) of the core 102. Flow proceeds to decision block 2216.

At block 2214, the core 102 sets its BSP flag to false to indicate that the core 102 is not the BSP, i.e., that it is an AP. Flow proceeds to decision block 2216.

At decision block 2216, the core 102 determines whether it is the BSP, i.e., whether it designated itself the BSP core 102 at block 2212 rather than an AP core 102 at block 2214. If so, flow proceeds to block 2218; otherwise, flow proceeds to block 2222.

At block 2218, the core 102 begins fetching and executing system initialization firmware (e.g., BSP BIOS bootstrap code). This may include instructions related to the BSP flag and the APIC ID, e.g., instructions that read the APIC ID register or the APIC base register, in which case the core 102 returns the values written at blocks 2206 and 2212/2214. It may also include the core 102 acting as the sole core of the microprocessor 100 that performs operations on behalf of the microprocessor 100 as a whole, such as the package sleep state handshake protocol described with respect to FIG. 9. Preferably, the BSP core 102 begins fetching and executing the system initialization firmware at an architecturally defined reset vector. For example, in the x86 architecture, the reset vector points to address 0xFFFFFFF0. Preferably, executing the system initialization firmware includes booting the operating system, e.g., loading the operating system and transferring control to it. Flow proceeds to block 2224.

At block 2222, the core 102 halts itself and waits for a startup sequence from the BSP before it begins fetching and executing instructions. In one embodiment, the startup sequence received from the BSP includes an interrupt vector to AP system initialization firmware (e.g., AP BIOS code). That firmware may include instructions related to the BSP flag and the APIC ID, in which case the core 102 returns the values written at blocks 2206 and 2212/2214. Flow proceeds to block 2224.

At block 2224, as the core 102 executes instructions, it receives and responds to interrupt requests based on the APIC ID written to its APIC ID register at block 2206. Flow ends at block 2224.
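The per-core decisions of blocks 2203 through 2214 can be sketched as follows. This is an illustrative model under stated assumptions, not the microcode of the embodiment: `CoreConfig`, `configure_core`, and `bsp_for_each_run` are hypothetical names, the default virtual core number is passed in rather than derived as in FIG. 14, and the rotation direction follows Table 1.

```python
from dataclasses import dataclass


@dataclass
class CoreConfig:
    apic_id: int
    is_bsp: bool


def configure_core(default_virtual_core_number, modify_bsp_enabled,
                   rotate_amount, num_cores):
    # Block 2203: start from the default virtual core number.
    virtual = default_virtual_core_number
    # Decision block 2204 / block 2205: rotate only if the feature is enabled.
    if modify_bsp_enabled:
        virtual = (virtual - rotate_amount) % num_cores
    # Block 2206: the virtual core number becomes the APIC ID.
    apic_id = virtual
    # Blocks 2208, 2212, 2214: the core with APIC ID zero is the BSP.
    return CoreConfig(apic_id=apic_id, is_bsp=(apic_id == 0))


def bsp_for_each_run(num_cores):
    # Model of the N-run test strategy: one run per rotate amount,
    # recording which core (by default number) becomes the BSP.
    bsp_per_run = []
    for rotate_amount in range(num_cores):
        configs = [configure_core(core, True, rotate_amount, num_cores)
                   for core in range(num_cores)]
        bsp_per_run.append([c.is_bsp for c in configs].index(True))
    return bsp_per_run


if __name__ == "__main__":
    print(bsp_for_each_run(8))
```

Under these assumptions, running the test once per rotate amount makes each core the BSP exactly once, which is the coverage benefit the description attributes to the modify BSP feature.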

As described above, according to one embodiment, the core 102 whose virtual core number is zero is by default the BSP. However, the inventors have observed that there may be situations in which it is advantageous for all the cores 102 to be designated the BSP, embodiments of which are described below. For example, the developers of the microprocessor 100 may have invested a significant amount of time and cost in developing a large body of tests designed to run single-threaded on a single core, and may wish to use those single-core tests to test the multi-core microprocessor 100. For example, the tests may run under the old and well-known DOS operating system in x86 real mode.

Running these tests on each core 102 could be accomplished serially, by using the modify BSP feature described with respect to FIG. 22 and/or by blowing fuses, or scanning modified fuse values into the holding register, to disable all the cores 102 except the one core 102 under test. However, the inventors have recognized that this would take more time than running the tests concurrently on all the cores 102 (e.g., approximately four times as long in the case of a four-core microprocessor 100). Furthermore, the time required to test each individual microprocessor 100 part is precious, particularly when hundreds of thousands or more of the parts are manufactured, and particularly when much of the testing is performed on very expensive test equipment.

Additionally, there may be cases in which a speed path in the logic of the microprocessor 100 is stressed more when more than one core 102 (or all the cores 102) is running at the same time, because more heat is generated and/or more power is drawn. Tests run serially may not generate this additional stress and therefore may not expose the speed path.

Therefore, embodiments are described in which all the cores 102 may be dynamically designated the BSP core 102 so that all the cores 102 can execute a test concurrently.

Refer now to FIG. 23, a flowchart illustrating a process for configuring the microprocessor 100 according to an alternate embodiment. The description of FIG. 23 refers to the multi-die microprocessor 100 of FIG. 4, which includes two dies 406 and eight cores 102. However, it should be understood that the dynamic reconfiguration described here may be used with a microprocessor 100 having a different configuration, namely with more than two dies or a single die, and with more or fewer than eight cores 102 but at least two cores 102. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively and dynamically reconfigure the microprocessor 100. Flow begins at block 2302.

At block 2302, the microprocessor 100 is reset and performs an initial portion of its initialization, preferably in a manner similar to that described above with respect to FIG. 14. However, the generation of the configuration-related values, such as at block 1424 of FIG. 14, and in particular the APIC ID and the BSP flag, is performed in the manner described with respect to blocks 2304 through 2312. Flow proceeds to decision block 2304.

At decision block 2304, the core 102 detects whether a feature is enabled. The feature is referred to herein as the "all cores BSP" feature. Preferably, blowing a fuse 114 enables the all cores BSP feature. Preferably, during testing, rather than blowing the all cores BSP feature fuse 114, a true value is scanned into the holding register bit associated with the all cores BSP feature fuse 114, as described above with respect to FIG. 1, to enable the all cores BSP feature. In this manner, the all cores BSP feature is not permanently enabled on the microprocessor 100 part, but is instead disabled after power-up. Preferably, the operations at blocks 2304 through 2312 are performed by microcode of the core 102. If the all cores BSP feature is enabled, flow proceeds to block 2305; otherwise, flow proceeds to block 2203 of FIG. 22.

At block 2305, the core 102 sets its virtual core number to zero, regardless of the local core number 256 and die number 258 of the core 102. Flow proceeds to block 2306.

At block 2306, the core 102 populates its local APIC ID register with the zero-valued virtual core number set at block 2305. Flow proceeds to block 2312.

At block 2312, the core 102 sets its BSP flag to true to indicate that the core 102 is the BSP, regardless of the local core number 256 and die number 258 of the core 102. Flow proceeds to block 2315.

At block 2315, each time a core 102 issues a memory access request, the microprocessor 100 modifies the upper address bits of the request's address differently for each core 102, so that each core 102 accesses its own separate memory space. That is, the microprocessor 100 modifies the upper address bits according to which core 102 generated the memory access request, so that the upper address bits have a unique value for each core 102. In one embodiment, the microprocessor 100 modifies the upper address bits as specified by values blown into the fuses 114. In an alternate embodiment, the microprocessor 100 modifies the upper address bits based on the local core number 256 and die number 258 of the core 102. For example, in an embodiment in which the microprocessor 100 has four cores, the microprocessor 100 modifies the upper two bits of the memory address and generates a unique value of those two bits for each core 102. In effect, the memory space addressable by the microprocessor 100 is divided into N subspaces, where N is the number of cores 102.

The test programs are developed so that they confine themselves to addresses in the lowest of the N subspaces. For example, assume that the microprocessor 100 is capable of addressing 64 GB of memory and that the microprocessor 100 includes four cores 102. The tests are developed to access only the lowest 8 GB of memory. When core 0 executes an instruction that accesses memory address A (in the lower 8 GB of memory), the microprocessor 100 generates address A (unmodified) on the memory bus; when core 1 executes an instruction that accesses the same memory address A, the microprocessor 100 generates address A+8 GB on the memory bus; when core 2 executes an instruction that accesses the same memory address A, the microprocessor 100 generates address A+16 GB on the memory bus; and when core 3 executes an instruction that accesses the same memory address A, the microprocessor 100 generates address A+32 GB on the memory bus. In this manner, the cores 102 advantageously do not collide in their memory accesses, which allows the tests to execute correctly.

Preferably, the single-threaded tests are run on a stand-alone test machine capable of testing the microprocessor 100 in isolation. The developers of the microprocessor 100 develop test data that the test machine supplies to the microprocessor 100; conversely, they develop result data that the test machine compares against the data written by the microprocessor 100 during memory write accesses, to ensure that the microprocessor 100 writes the correct data. In one embodiment, the shared cache memory 119 (i.e., the highest-level cache, which generates the addresses used in external bus transactions) is the portion of the microprocessor 100 configured to modify the upper address bits when the all cores BSP feature is enabled. Flow proceeds to block 2318.
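The per-core address modification of block 2315 can be illustrated with a small model. This is a sketch under stated assumptions rather than the hardware of the embodiment: `remap_address` is a hypothetical name, the subspace size is passed in as a parameter, and the offset for a core is assumed to be its index times the subspace size, which reproduces the A, A+8 GB, and A+16 GB cases given for cores 0 through 2.

```python
GIB = 1 << 30  # bytes in one gibibyte


def remap_address(address, core_index, subspace_bytes):
    """Return the bus address for a core's access to a test address.

    The test address must lie in the lowest subspace; the core index
    selects which subspace the access is redirected to. When
    subspace_bytes is a power of two, this addition is equivalent to
    setting the upper address bits to the core index.
    """
    if not 0 <= address < subspace_bytes:
        raise ValueError("test address must fall inside the lowest subspace")
    return address + core_index * subspace_bytes


if __name__ == "__main__":
    a = 0x1234_5678
    for core in range(3):
        print(core, hex(remap_address(a, core, 8 * GIB)))
```

Because every test address is below the subspace size, two different cores can never produce the same bus address, which is the property that lets identical single-threaded tests run concurrently without colliding.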

At block 2318, the core 102 begins fetching and executing system initialization firmware (e.g., BSP BIOS bootstrap code). This may include instructions related to the BSP flag and the APIC ID, e.g., instructions that read the APIC ID register or the APIC base register, in which case the core 102 returns the zero value written at block 2306. Preferably, the BSP core 102 begins fetching and executing the system initialization firmware at an architecturally defined reset vector. For example, in the x86 architecture, the reset vector points to address 0xFFFFFFF0. Preferably, executing the system initialization firmware includes booting the operating system, e.g., loading the operating system and transferring control to it. Flow proceeds to block 2324.

At block 2324, as the core 102 executes instructions, it receives and responds to interrupt requests based on the zero-valued APIC ID written to its APIC ID register at block 2306. Flow ends at block 2324.

Although an embodiment in which all the cores 102 are designated the BSP has been described with respect to FIG. 23, other embodiments are contemplated in which multiple, but fewer than all, of the cores 102 are designated the BSP.

Although the embodiments are described in the context of an x86-style system in which each core 102 uses a local APIC and in which there is a relationship between the local APIC ID and the BSP designation, it should be understood that the designation of the bootstrap processor is not limited to x86 embodiments and may be employed in systems with different system architectures.

Propagation of Microcode Patches to Multiple Cores

As observed earlier, many important functions may be performed primarily by the microcode of a microprocessor, and in particular functions that require correct communication and coordination among the instances of the microcode executing on the multiple cores of the microprocessor. Because of the complexity of the microcode, there is a significant probability that bugs will exist in the microcode that need to be fixed. This may be accomplished with a microcode patch, in which new microcode instructions replace the old microcode instructions that cause the bug. To this end, the microprocessor includes special hardware that facilitates microcode patching. In general, it is desirable to apply the microcode patch to all the cores of the microprocessor. Conventionally, this has been done by executing an architectural instruction that applies the patch separately on each core. However, the conventional approach can be problematic.

First, the patch may pertain to inter-core communication by the microcode instances (e.g., core synchronization, hardware semaphore use) or to features that require microcode inter-core communication (e.g., cross-core adjustment requests, cache control operations, power management, or dynamic multi-core microprocessor configuration). Executing the architectural patch application separately on each core may create a window of time in which the microcode patch has been applied to some cores but not to others (or in which a previous patch is applied on some cores and the new patch on others). This may cause a communication failure between the cores and incorrect operation of the microprocessor. Other problems, both foreseeable and unforeseeable, may also arise if the cores of the microprocessor are not all using the same microcode patch.

Second, the architecture of the microprocessor specifies many features that may be supported by some instances of the microprocessor and not by others. During operation, the microprocessor is able to communicate to system software which particular features it supports. For example, in the case of an x86 architecture microprocessor, the x86 CPUID instruction may be executed by system software to determine the supported feature set. However, the instruction that determines the feature set (e.g., CPUID) is executed separately on each core of the microprocessor. In some cases, a feature may be disabled because of a bug that existed at the time the microprocessor was released. Subsequently, a microcode patch that fixes the bug may be developed so that the feature can be enabled once the patch is applied. However, if the patch is applied in the conventional manner (i.e., applied to each core separately, by a separate execution of the apply patch instruction on each core), different cores may indicate different feature sets at a given point in time, depending on whether the patch has been applied to the core yet. This may be problematic, particularly when the system software (e.g., an operating system that facilitates thread migration between cores) expects all the cores of the microprocessor to have the same feature set. In particular, it has been observed that some system software obtains the feature set of only one core and assumes that the other cores have the same feature set.

Third, the microcode instance of each core controls and/or communicates with uncore resources shared by the cores (e.g., synchronization-related hardware, hardware semaphores, shared PRAM, shared cache, or the service processing unit). Therefore, it may generally be problematic for the microcode of two different cores to control or communicate with the uncore resources in two different ways at the same time because one of the cores has the microcode patch applied and the other does not (or because the two cores have different microcode patches applied).

Finally, the microcode patch hardware of the microprocessor may be such that applying the patch in the conventional manner could cause the patch operation of one core to interfere with the patch application of the other cores, for example, if portions of the patch hardware are shared among the cores.

Preferably, embodiments are described that apply a microcode patch to a multi-core microprocessor in a manner that is atomic at the architectural instruction level, in order to address the problems described herein. First, the patch is applied to the entire microprocessor 100 in response to the execution of an architectural instruction on a single core 102. That is, the embodiments do not require system software to execute the apply microcode patch instruction (described below) on each core 102. More specifically, the single core 102 that encounters the apply microcode patch instruction sends messages to, and interrupts, the other cores 102 to cause their microcode instances to apply their portions of the patch, and all the microcode instances cooperate with one another so that the microcode patch is applied to the microcode patch hardware of each core 102 and to the shared patch hardware of the microprocessor 100 while interrupts are disabled on all the cores 102.

Second, the microcode instances running on all the cores 102 that implement the atomic patch application mechanism cooperate with one another so that, once all the cores 102 of the microprocessor 100 have agreed to apply the patch, they refrain from executing any architectural instruction (other than the one apply microcode patch instruction) until all the cores 102 have finished applying it. That is, no core 102 executes an architectural instruction while any core 102 is applying the microcode patch. Furthermore, in a preferred embodiment, all the cores 102 reach the same place in the microcode that performs the patch application with interrupts disabled, after which the cores 102 execute only the microcode instructions that apply the patch until all the cores of the microprocessor 100 confirm that the patch has been applied. That is, while any core 102 of the microprocessor 100 is applying the patch, no core 102 executes microcode instructions other than those that apply the microcode patch.
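The rendezvous behavior described above, in which every core agrees to patch, applies its portion, and resumes only after all cores have finished, can be modeled in software with two barriers. This is an illustrative simulation, not the microcode mechanism of the embodiment: cores are modeled as Python threads, `apply_patch_atomically` is a hypothetical name, and the messages and interrupts sent by the initiating core are abstracted away by starting all threads together.

```python
import threading


def apply_patch_atomically(num_cores, patch_id):
    # First barrier: every core has agreed to apply the patch.
    all_agreed = threading.Barrier(num_cores)
    # Second barrier: every core has finished applying the patch.
    all_done = threading.Barrier(num_cores)

    applied = [None] * num_cores          # patch seen by each core's patch hardware
    view_on_resume = [None] * num_cores   # what each core observes when it resumes

    def core(index):
        # In the embodiment, interrupts would be disabled at this point.
        all_agreed.wait()
        applied[index] = patch_id         # apply this core's portion of the patch
        all_done.wait()
        # Only now may the core resume architectural instructions; snapshot
        # the patch state of every core as seen at that moment.
        view_on_resume[index] = list(applied)

    threads = [threading.Thread(target=core, args=(i,)) for i in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return applied, view_on_resume


if __name__ == "__main__":
    applied, views = apply_patch_atomically(4, patch_id=7)
    print(applied)
    print(all(view == applied for view in views))
```

In this model no core can observe a mixed state on resuming, because the second barrier releases only after every core has recorded the patch. That is the window-of-inconsistency problem the atomic approach is intended to close.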

Refer now to FIG. 24, a block diagram illustrating a multi-core microprocessor 100 according to an alternate embodiment. The microprocessor 100 is similar in many respects to the microprocessor 100 of FIG. 1. However, the microprocessor 100 of FIG. 24 also includes, in its uncore 103, a service processing unit (SPU) 2423, an SPU start address register 2497, an uncore microcode read-only memory (ROM) 2425, and an uncore microcode patch random access memory (RAM) 2408. Additionally, each core 102 includes a core PRAM 2499, a patch content-addressable memory (CAM) 2439, and a core microcode ROM 2404.

Microcode comprises microcode instructions. The microcode instructions are non-architectural instructions stored in one or more memories of the microprocessor 100 (e.g., the uncore microcode ROM 2425, the uncore microcode patch RAM 2408, and/or the core microcode ROM 2404). They are fetched by a core 102 based on a fetch address held in a non-architectural micro-program counter (micro-PC) and are used by the core 102 to implement the instructions of the instruction set architecture (ISA) of the microprocessor 100. Preferably, the microcode instructions are translated by a microtranslator into microinstructions that are executed by the execution units of the core 102; in an alternate embodiment, the microcode instructions are executed directly by the execution units, in which case the microcode instructions are microinstructions. That the microcode instructions are non-architectural means that they are not instructions of the ISA of the microprocessor 100 but are instead encoded according to an instruction set distinct from the architectural instruction set. The non-architectural micro-PC is not defined by the ISA of the microprocessor 100 and is distinct from the architecturally defined program counter of the core 102.

The microcode is used to implement some or all of the instructions of the ISA of the microprocessor as follows. In response to decoding a microcode-implemented ISA instruction, the core 102 transfers control to the microcode routine associated with that ISA instruction. The microcode routine comprises microcode instructions. The execution units execute the microcode instructions or, according to the preferred embodiment, the microcode instructions are further translated into microinstructions that the execution units execute. The results of the execution of the microcode instructions (or of the microinstructions translated from them) by the execution units are the results defined by the ISA instruction. Thus, the collective execution of the microcode routine associated with the ISA instruction (or of the microinstructions translated from the routine's instructions) by the execution units "implements" the ISA instruction. That is, the collective execution performs the operation specified by the ISA instruction on the inputs specified by the ISA instruction to produce the result defined by the ISA instruction. Additionally, microcode instructions may be executed (or translated into microinstructions that are executed) when the microprocessor is reset, in order to configure the microprocessor.

The core microcode ROM 2404 holds microcode executed by the particular core 102 that comprises it. The uncore microcode ROM 2425 also holds microcode executed by the cores 102; however, in contrast to the core microcode ROMs 2404, the uncore ROM 2425 is shared by the cores 102. Preferably, because the access time of the uncore ROM 2425 is greater than that of the core ROM 2404, the uncore ROM 2425 holds microcode routines that require less performance and/or are executed less frequently. Additionally, the uncore ROM 2425 holds code that is fetched and executed by the SPU 2423.

The uncore microcode patch RAM 2408 is also shared by the cores 102 and holds microcode instructions executed by the cores 102. The patch CAM 2439 holds patch addresses that it outputs to a microsequencer in response to a microcode fetch address when the fetch address matches the contents of one of the entries in the patch CAM 2439. In this case, the microsequencer outputs the patch address as the microcode fetch address, rather than the next sequential fetch address (or the target address in the case of a branch-type instruction), and in response the uncore patch RAM 2408 outputs a patch microcode instruction. That is, a patch microcode instruction is fetched from the uncore patch RAM 2408 and executed in place of the microcode instruction that would have been fetched from the uncore ROM 2425 or the core ROM 2404, for example because that microcode instruction and/or the microcode instructions following it are a source of a bug. Thus, the patch microcode instruction effectively replaces, or patches, the undesirable microcode instruction that resides in the core ROM 2404 or the uncore microcode ROM 2425 at the original microcode fetch address. Preferably, the patch CAM 2439 and the patch RAM 2408 are loaded in response to architectural instructions included in system software, such as the BIOS or the operating system running on the microprocessor 100.
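The redirection performed by the patch CAM 2439 can be illustrated with a minimal software sketch. The class `PatchCam`, the function `fetch`, the dictionary-based memories, and the string "instructions" are all hypothetical names chosen for illustration; the actual hardware is a content-addressable memory feeding a microsequencer, not a Python dictionary.

```python
from typing import Dict, Optional


class PatchCam:
    """Toy model of a patch CAM: maps a ROM fetch address to a patch-RAM address."""

    def __init__(self) -> None:
        self._entries: Dict[int, int] = {}

    def load(self, match_addr: int, patch_addr: int) -> None:
        # One CAM entry: the fetch address to match and the patch address to output.
        self._entries[match_addr] = patch_addr

    def lookup(self, fetch_addr: int) -> Optional[int]:
        # Returns the patch address on a hit, or None on a miss.
        return self._entries.get(fetch_addr)


def fetch(fetch_addr: int, rom: Dict[int, str],
          patch_ram: Dict[int, str], cam: PatchCam) -> str:
    """Fetch one microcode instruction, honoring any matching CAM entry."""
    patch_addr = cam.lookup(fetch_addr)
    if patch_addr is not None:
        # CAM hit: the patch address replaces the fetch address,
        # so the instruction comes from the patch RAM.
        return patch_ram[patch_addr]
    # CAM miss: the instruction comes from ROM as usual.
    return rom[fetch_addr]


rom = {0x100: "ROM_OP_A", 0x101: "ROM_OP_B"}
patch_ram = {0x8: "PATCH_OP"}
cam = PatchCam()
cam.load(0x101, 0x8)  # patch the instruction at ROM address 0x101
```

In this sketch, a fetch of address `0x100` still returns the ROM instruction, while a fetch of `0x101` returns the patch instruction held in the patch RAM.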

Among other things, the uncore PRAM 116 is used by the microcode to store values used by the microcode. Some of these values are effectively constants, because they are either immediate values stored in the core microcode ROM 2404 or the uncore microcode ROM 2425, or values blown into the fuses 114 at the time the microprocessor 100 is manufactured; they are written to the uncore PRAM 116 by the microcode when the microprocessor 100 is reset and are not modified during operation of the microprocessor 100, except possibly via a patch or in response to the execution of an instruction that explicitly modifies the value (e.g., a WRMSR instruction). Advantageously, these values can be modified via the patch mechanism described herein without requiring a change to the core microcode ROM 2404 or the uncore microcode ROM 2425, which could be very costly, and without requiring that one or more fuses 114 be left unblown.

Additionally, the uncore PRAM 116 is used to hold patch code that is fetched and executed by the SPU 2423, as described herein.

The core PRAM 2499, like the uncore PRAM 116, is private, or non-architectural, meaning that it is not located in the architectural user program address space of the microprocessor 100. However, unlike the uncore PRAM 116, each core PRAM 2499 is accessed only by its respective core 102 and is not shared with the other cores 102. Like the uncore PRAM 116, the core PRAM 2499 is used by the microcode to store values used by the microcode. Advantageously, these values can be modified via the patch mechanism described herein without requiring a change to the core microcode ROM 2404 or the uncore microcode ROM 2425.

The SPU 2423 comprises a stored-program processor that is an adjunct to, and distinct from, each of the cores 102. Although the cores 102 are architecturally able to execute instructions of the ISA of the cores 102 (e.g., x86 ISA instructions), the SPU 2423 is not. Thus, for example, the operating system cannot run on the SPU 2423, nor can the operating system schedule programs of the ISA of the cores 102 (e.g., x86 ISA programs) to run on the SPU 2423. In other words, the SPU 2423 is not a system resource managed by the operating system. Rather, the SPU 2423 performs operations used to tune the microprocessor 100. Additionally, the SPU 2423 can assist in measuring the performance of the cores 102 and in other functions. Preferably, the SPU 2423 is smaller, less complex, and less power-consuming than the cores 102 (e.g., in one embodiment, the SPU 2423 includes built-in clock gating). In one embodiment, the SPU 2423 comprises a FORTH CPU core.

Asynchronous events that occur alongside the debug instructions executed by the cores 102 may not be handled well. Advantageously, however, the SPU 2423 can be commanded by a core 102 to detect such an event and, in response to detecting it, to perform actions such as creating a log or modifying aspects of the behavior of the core 102 and/or of the external bus interface of the microprocessor 100. The SPU 2423 can provide the log information to the user, and it can also interact with the tracer to request that the tracer provide the log information or perform other actions. In one embodiment, the SPU 2423 has access to the control registers of the memory subsystem and of the programmable interrupt controller of each core 102, as well as to the control registers of the shared cache memory 119.

Examples of events that the SPU 2423 can detect include the following: (1) a core 102 is hung, e.g., the core 102 has not retired any instruction for a programmable number of clock cycles; (2) a core 102 loads data from a non-cacheable region of memory; (3) a change in the temperature of the microprocessor 100 occurs; (4) the operating system requests a change in the bus clock ratio and/or the voltage level of the microprocessor 100; (5) the microprocessor 100 changes its voltage level and/or bus clock ratio of its own accord, e.g., to save power or improve performance; (6) an internal timer of a core 102 expires; (7) a cache snoop hits a modified cache line, causing the cache line to be written back to memory; (8) the temperature, voltage, or bus clock ratio of the microprocessor 100 goes outside a respective range; (9) an external trigger signal is asserted by a user on an external pin of the microprocessor 100.

Advantageously, because the SPU 2423 runs its code 132 independently of the cores 102, it does not have the same limitations as tracer microcode that executes on the cores 102. Thus, the SPU 2423 can detect or be notified of events independently of the instruction execution boundaries of the cores 102 and without disrupting the state of the cores 102.

The SPU 2423 has its own code that it executes. The SPU 2423 can fetch its code from either the uncore microcode ROM 2425 or the uncore PRAM 116. That is, preferably, the SPU 2423 shares the uncore ROM 2425 and the uncore PRAM 116 with the microcode that runs on the cores 102. The SPU 2423 uses the uncore PRAM 116 to store its data, including the log. In one embodiment, the SPU 2423 also includes its own serial port interface, through which it can transmit the log to an external device. Advantageously, the SPU 2423 can also instruct the tracer running on a core 102 to store the log information from the uncore PRAM 116 to system memory.

The SPU 2423 communicates with the cores 102 through a status register and control registers. The SPU status register includes one bit for each of the events described above that the SPU 2423 can detect. To notify the SPU 2423 of an event, a core 102 sets the bit in the SPU status register that corresponds to the event. Some event bits are set by hardware of the microprocessor 100 and some are set by microcode of the cores 102. The SPU 2423 reads the status register to determine the list of events that have occurred. A control register includes bits corresponding to each action that the SPU 2423 should take in response to detecting one of the events specified in the status register. That is, a set of action bits exists in the control register for each possible event in the status register. In one embodiment, there are 16 action bits per event. In one embodiment, when the status register is written to indicate an event, this causes an interrupt to the SPU 2423, in response to which the SPU 2423 reads the status register to determine which events have occurred. Advantageously, this saves power by reducing the need for the SPU 2423 to poll the status register. The status register and control registers can also be read and written by user programs that execute instructions such as RDMSR and WRMSR.
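The relationship between event bits and action bits can be sketched as follows. The bit layout (event i at bit i of the status register, and the 16 action bits for event i packed at bits 16i through 16i+15 of a single control value) is an assumption made purely for illustration, as are the function names; the text specifies only that there is one status bit per event and, in one embodiment, 16 action bits per event.

```python
from typing import List

ACTION_BITS_PER_EVENT = 16  # per the embodiment described in the text


def set_event(status: int, event: int) -> int:
    """Return the status value with the bit for `event` set (assumed layout)."""
    return status | (1 << event)


def pending_events(status: int) -> List[int]:
    """List the event numbers whose status bits are set, lowest first."""
    return [i for i in range(status.bit_length()) if (status >> i) & 1]


def actions_for(control: int, event: int) -> int:
    """Extract the 16 action bits associated with `event` (assumed packing)."""
    shift = event * ACTION_BITS_PER_EVENT
    mask = (1 << ACTION_BITS_PER_EVENT) - 1
    return (control >> shift) & mask


# A core (or hardware) reports events 2 and 5.
status = set_event(set_event(0, 2), 5)
# Hypothetical configuration: action bits 0 and 2 are enabled for event 2.
control = 0b101 << (2 * ACTION_BITS_PER_EVENT)
```

An interrupt-driven SPU loop would call something like `pending_events(status)` once per interrupt and then consult `actions_for(control, event)` for each reported event, instead of polling the status register continuously.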

The set of actions that the SPU 2423 can perform in response to detecting an event includes the following.

(1) Write the log information to the uncore PRAM 116. For each of the log-writing actions, multiple action bits exist to allow the programmer to specify that only particular subsets of the log information should be written.

(2) Write the log information from the uncore PRAM 116 to the serial port interface.

(3) Write to one of the control registers to set an event for the tracer. That is, the SPU 2423 can interrupt a core 102 and cause the tracer microcode to be invoked to perform a set of actions associated with the event. The actions may have been specified previously by the user. In one embodiment, when the SPU 2423 writes the control register to set the event, this causes a machine check exception on the core 102, and the machine check exception handler checks whether the tracer is activated. If so, the machine check exception handler transfers control to the tracer. The tracer reads the control register and, if the events set in the control register are events that the user has enabled for the tracer, the tracer performs the actions previously specified by the user in association with those events. For example, the SPU 2423 can set an event that causes the tracer to write the log information stored in the uncore PRAM 116 to system memory.

(4) Write to a control register to cause the microcode to branch to a microcode address specified by the SPU 2423. This is particularly useful if the microcode is in an infinite loop such that the tracer cannot perform any meaningful action, yet the core 102 is still executing and retiring instructions, which means that the processor-hung event will not occur.

(5) Write to a control register to reset a core 102. As mentioned above, the SPU 2423 can detect that a core 102 is hung (e.g., has not retired any instruction for some programmable amount of time) and reset the core. The reset microcode checks whether the reset was initiated by the SPU 2423 and, if so, advantageously writes the log information out to system memory before clearing it in the process of initializing the core 102.

(6) Continuously log events. In this mode, rather than waiting to be interrupted by an event, the SPU 2423 spins in a loop that checks the status register and continuously logs to the uncore PRAM 116 the information associated with the events indicated there, and optionally also writes the log information to the serial port interface.

(7) Write to a control register to stop a core 102 from issuing requests to the shared cache memory 119 and/or to stop the shared cache memory 119 from acknowledging requests to the cores 102. This is particularly useful in debugging design bugs related to the memory subsystem, such as page tablewalk hardware bugs, and may even allow the bug to be fixed during operation of the microprocessor 100, e.g., through a patch of the SPU 2423 code, as described below.

(8) Write to control registers of the external bus interface controller of the microprocessor 100 to perform transactions on the external system bus, such as special cycles or memory read/write cycles.

(9) Write to a control register of the programmable interrupt controller of a core 102, e.g., to generate an interrupt to another core 102, to emulate an I/O device to the cores 102, or to fix a bug in the interrupt controller.

(10) Write to a control register of the shared cache memory 119 to control its size, e.g., to disable or enable different ways of the associative shared cache memory 119.

(11) Write to control registers of the various functional units of the cores 102 to configure different performance features, such as branch prediction and data prefetch algorithms.

Advantageously, as described below, the SPU 2423 code can be patched, which enables the SPU 2423 to perform actions such as those described herein to remedy design flaws or perform other functions even after the design of the microprocessor 100 has been completed and the microprocessor 100 has been fabricated.

The SPU start address register 2497 holds the address at which the SPU 2423 begins fetching instructions when it comes out of reset. The SPU start address register 2497 is written by the cores 102. The address may be in either the uncore PRAM 116 or the uncore microcode ROM 2425.

Referring to FIG. 25, a block diagram illustrating the structure of a microcode patch 2500 according to one embodiment of the present invention is shown. In the embodiment of FIG. 25, the microcode patch 2500 includes the following portions: a header 2502; an immediate patch 2504; a checksum 2506 of the immediate patch 2504; CAM data 2508; a core PRAM patch 2512; a checksum 2514 of the CAM data 2508 and the core PRAM patch 2512; a RAM patch 2516; an uncore PRAM patch 2518; and a checksum 2522 of the core PRAM patch 2512 and the RAM patch 2516. The checksums 2506/2514/2522 enable the microprocessor 100 to verify the integrity of the respective portions of the patch after they are loaded into the microprocessor 100. Preferably, the microcode patch 2500 is read from system memory and/or from a non-volatile system store, such as a ROM or FLASH memory that holds a system BIOS or extensible firmware. The header 2502 describes each portion of the patch 2500, such as its size, the location within its respective patch-related memory to which the portion is to be loaded, and a valid flag that indicates whether the portion contains a valid patch that should be applied to the microprocessor 100.
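The integrity check on a patch portion can be sketched in a few lines. The text does not state which checksum algorithm is used, so the 32-bit byte sum below is an assumption, and `PatchSection`, `checksum32`, and the `valid` field (standing in for the per-portion valid flag of the header 2502) are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


def checksum32(data: bytes) -> int:
    """Assumed checksum: sum of all bytes, truncated to 32 bits."""
    return sum(data) & 0xFFFFFFFF


@dataclass
class PatchSection:
    """One portion of a patch together with its expected checksum."""
    payload: bytes
    checksum: int
    valid: bool = True  # mirrors the header's per-portion valid flag

    def verify(self) -> bool:
        # A portion is applied only if it is flagged valid and the checksum
        # recomputed over the loaded bytes matches the stored one.
        return self.valid and checksum32(self.payload) == self.checksum


immediate_patch = PatchSection(payload=b"\x01\x02\x03", checksum=6)
corrupted_patch = PatchSection(payload=b"\x01\x02\x04", checksum=6)
```

Here `immediate_patch.verify()` succeeds because the bytes sum to 6, whereas `corrupted_patch.verify()` fails because one byte differs from what the stored checksum covers.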

The immediate patch 2504 comprises code (i.e., instructions, preferably microcode instructions) that is loaded into the uncore microcode patch RAM 2408 of FIG. 24 (e.g., at block 2612 of FIGS. 26A-26B) and then executed by each of the cores 102 (e.g., at block 2616 of FIGS. 26A-26B). The patch 2500 also specifies the address within the patch RAM 2408 to which the immediate patch 2504 is to be loaded. Preferably, the immediate patch 2504 code modifies default values written by the reset microcode, such as values written to configuration registers that affect the configuration of the microprocessor 100. After the immediate patch 2504 has been executed by each of the cores 102 out of the patch RAM 2408, it is not executed again. Additionally, a subsequent load of the RAM patch 2516 into the patch RAM 2408 (e.g., at block 2632 of FIGS. 26A-26B) may overwrite the immediate patch 2504 in the patch RAM 2408.

The RAM patch 2516 comprises patch microcode instructions that replace the microcode instructions in the core ROM 2404 or the uncore ROM 2425 that need to be patched. The RAM patch 2516 also includes the address of the location within the patch RAM 2408 to which the patch microcode instructions are written when the patch 2500 is applied (e.g., at block 2632 of FIGS. 26A-26B). The CAM data 2508 is loaded into the patch CAM 2439 of each core 102 (e.g., at block 2626 of FIGS. 26A-26B). As described above with respect to the operation of the patch CAM 2439, the CAM data 2508 comprises one or more entries, each of which includes a pair of microcode fetch addresses. The first address is the address of the microcode instruction to be patched and is the content against which the fetch address is matched. The second address points to the location in the patch RAM 2408 that holds the patch microcode instruction to be executed in place of the microcode instruction being patched.

Unlike the immediate patch 2504, the RAM patch 2516 remains in the patch RAM 2408 and continues (together with the operation of the patch CAM 2439 according to the CAM data 2508) to patch the core microcode ROM 2404 and/or the uncore microcode ROM 2425 until it is replaced by another patch 2500 or the microprocessor 100 is reset.

The core PRAM patch 2512 includes data to be written to the core PRAM 2499 of each core 102 and, for each item of the data, the address within the core PRAM 2499 to which it is to be written (e.g., at block 2626 of FIGS. 26A-26B). The uncore PRAM patch 2518 includes data to be written to the uncore PRAM 116 and, for each item of the data, the address within the uncore PRAM 116 to which it is to be written (e.g., at block 2632 of FIGS. 26A-26B).
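Because both PRAM patches are described as lists of data items paired with destination addresses, applying them amounts to a sequence of addressed writes. The following sketch assumes a dictionary model of each PRAM and uses hypothetical names (`apply_pram_patch`, `apply_to_all`); it shows only that the core PRAM patch is written once per core while the uncore PRAM patch is written once to the shared memory.

```python
from typing import Dict, List, Tuple

# Each patch item is an (address, value) pair, as described in the text.
PramPatch = List[Tuple[int, int]]


def apply_pram_patch(pram: Dict[int, int], patch: PramPatch) -> None:
    """Write every (address, value) item of the patch into the given PRAM."""
    for addr, value in patch:
        pram[addr] = value


def apply_to_all(core_prams: List[Dict[int, int]], uncore_pram: Dict[int, int],
                 core_patch: PramPatch, uncore_patch: PramPatch) -> None:
    """Apply the core PRAM patch to every core and the uncore patch once."""
    for pram in core_prams:
        apply_pram_patch(pram, core_patch)      # private copy per core
    apply_pram_patch(uncore_pram, uncore_patch)  # single shared memory


core_prams = [{0: 1}, {0: 1}]  # two cores, each with its own PRAM
uncore_pram = {4: 9}
apply_to_all(core_prams, uncore_pram, [(0, 7), (1, 8)], [(4, 0)])
```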

Referring to FIGS. 26A-26B, a flowchart illustrating operation of the microprocessor 100 of FIG. 24 to propagate a microcode patch 2500 of FIG. 25 to the multiple cores 102 of the microprocessor 100 is shown. The operation is described from the perspective of a single core, but each of the cores 102 of the microprocessor 100 operates according to the description to collectively propagate the microcode patch to all of the cores 102 of the microprocessor 100. More specifically, FIGS. 26A-26B describe the operation of the one core that encounters the instruction to apply the microcode patch, whose flow begins at block 2602, and the operation of the other cores 102, whose flow begins at block 2652. It should be understood that multiple patches 2500 may be applied to the microprocessor 100 at different times during operation of the microprocessor 100. For example, a first patch 2500 may be applied according to the atomic embodiments described herein when the system that includes the microprocessor 100 is booted, such as during BIOS initialization, and a second patch 2500 may be applied after the operating system is running, which may be particularly useful for the purpose of debugging the microprocessor 100.

At block 2602, one of the cores 102 encounters an instruction that instructs it to apply a microcode patch to the microprocessor 100. Preferably, the microcode patch is similar to the microcode patch described above. In one embodiment, the apply microcode patch instruction is an x86 WRMSR instruction. In response to the apply microcode patch instruction, the core 102 disables interrupts and traps to the microcode that implements the apply microcode patch instruction. It should be understood that the system software that includes the apply microcode patch instruction may include a multi-instruction sequence in preparation for the application of the microcode patch. Preferably, however, it is in response to a single architectural instruction of the sequence that the microcode patch is propagated to all of the cores 102 in an atomic fashion at the architectural instruction level. That is, once interrupts are disabled on the first core 102 (i.e., the core 102 that encounters the apply microcode patch instruction at block 2602), interrupts remain disabled while the implementing microcode propagates the microcode patch and it is applied to all of the cores 102 of the microprocessor 100 (e.g., until after block 2652); furthermore, once interrupts are disabled on the other cores 102 (e.g., at block 2652), they remain disabled until the microcode patch has been applied to all of the cores 102 of the microprocessor 100 (e.g., until after block 2634). Thus, advantageously, the microcode patch is propagated and applied to all of the cores 102 of the microprocessor 100 in an atomic fashion at the architectural instruction level. Flow proceeds to block 2604.

At block 2604, the core 102 obtains ownership of the hardware semaphore 118 of FIG. 1. Preferably, the microprocessor 100 includes a hardware semaphore 118 associated with patching microcode. Preferably, the core 102 obtains ownership of the hardware semaphore 118 in a manner similar to that described above with respect to FIG. 20, more specifically blocks 2004 and 2006. The hardware semaphore 118 is used because it is possible that, while one of the cores 102 is applying a patch 2500 in response to encountering an apply microcode patch instruction, a second core 102 encounters an apply microcode patch instruction, in response to which the second core would begin to apply a second patch 2500; this might result in incorrect execution, for example, due to misapplication of the first patch 2500. Flow proceeds to block 2606.

At block 2606, the core 102 sends a patch message to the other cores 102 and sends an inter-core interrupt to the other cores 102. Preferably, the core 102 traps to microcode in response to the apply microcode patch instruction (at block 2602) or in response to the interrupt (at block 2652) and remains in microcode, during which time interrupts are disabled (i.e., the microcode does not allow itself to be interrupted), until block 2634. Flow proceeds from block 2606 to block 2608.

At block 2652, one of the other cores 102 (i.e., a core 102 other than the core 102 that encountered the apply microcode patch instruction at block 2602) is interrupted and receives the patch message as a result of the inter-core interrupt sent at block 2606. In one embodiment, the core 102 takes the interrupt at the next architectural instruction boundary (e.g., at the next x86 instruction boundary). In response to the interrupt, the core 102 disables interrupts and traps to the microcode that handles the patch message. As described above, although the flow at block 2652 is described from the perspective of a single core 102, each of the other cores 102 (i.e., each core 102 not at block 2602) is interrupted and receives the message at block 2652 and performs the steps at blocks 2608 through 2634. Flow proceeds from block 2652 to block 2608.

At block 2608, the core 102 writes a sync request with sync condition 21 (denoted SYNC 21 in FIGS. 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all of the cores 102 have written a SYNC 21. Flow proceeds to decision block 2611.
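The effect of a sync request such as SYNC 21 is that of a barrier: no core continues until every core has arrived. A minimal software analogy, assuming threads stand in for cores and Python's `threading.Barrier` stands in for the sync registers 108 and the control unit 104, is sketched below; `run_cores` and the log strings are hypothetical, and the sketch does not model sleeping or power states.

```python
import threading
from typing import List


def run_cores(num_cores: int) -> List[str]:
    """Run `num_cores` threads that all meet at one barrier; return the event log."""
    barrier = threading.Barrier(num_cores)
    log: List[str] = []
    log_lock = threading.Lock()

    def core(core_id: int) -> None:
        with log_lock:
            log.append(f"core{core_id}:before")
        # Stands in for "write SYNC 21, sleep, and wake once all cores have written".
        barrier.wait()
        with log_lock:
            log.append(f"core{core_id}:after")

    threads = [threading.Thread(target=core, args=(i,)) for i in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


log = run_cores(4)
```

Because each thread records its "before" entry prior to reaching the barrier, and the barrier releases only after all four threads have arrived, every "before" entry in `log` precedes every "after" entry. The same pattern repeats in the flow for each later sync condition.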

At decision block 2611, the core 102 determines whether it is the core 102 that encountered the apply microcode patch instruction at block 2602 (as opposed to a core 102 that received the patch message at block 2652). If so, flow proceeds to block 2612; otherwise, flow proceeds to block 2614.

At block 2612, the core 102 loads the immediate patch 2504 portion of the microcode patch 2500 into the uncore patch RAM 2408. Additionally, the core 102 generates a checksum of the loaded immediate patch 2504 and verifies that it matches the checksum 2506. Preferably, the core 102 also sends information to the other cores 102 that indicates the length of the immediate patch 2504 and the location within the uncore patch RAM 2408 at which the immediate patch 2504 was loaded. Advantageously, because all of the cores 102 are known to be executing the same microcode that implements the application of the microcode patch, if a previous RAM patch 2516 is present in the uncore patch RAM 2408, it is safe to overwrite the uncore patch RAM 2408 with the new patch, because there will be no hits in the patch CAM 2439 during this time (assuming the microcode that implements the application of the microcode patch is not itself patched). In an alternate embodiment, the core 102 loads the immediate patch 2504 into the uncore PRAM 116 and, prior to execution of the immediate patch 2504 at block 2616, the cores 102 copy the immediate patch 2504 from the uncore PRAM 116 to the uncore patch RAM 2408. Preferably, the core 102 loads the immediate patch into a portion of the uncore PRAM 116 that is reserved for this purpose, i.e., a portion of the uncore PRAM 116 that is not used for another purpose, such as holding values used by the microcode (e.g., the core 102 state, TPM state, or effective microcode constants described above), and that may itself be patched (e.g., at block 2632), so that any previous uncore PRAM patch 2518 is not clobbered. In one embodiment, the load into and copy from the uncore PRAM 116 are performed in multiple stages in order to reduce the size required for the reserved portion. Flow proceeds to block 2614.

At block 2614, the core 102 writes a sync request for sync condition 22 (denoted SYNC 22 in FIGS. 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all the cores 102 have written a SYNC 22. Flow proceeds to block 2616.

At block 2616, the core 102 executes the immediate patch 2504 from the uncore patch RAM 2408. As described above, in one embodiment the core 102 copies the immediate patch 2504 from the uncore PRAM 116 to the uncore patch RAM 2408 before executing the immediate patch 2504. Flow proceeds to block 2618.

At block 2618, the core 102 writes a sync request for sync condition 23 (denoted SYNC 23 in FIGS. 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all the cores 102 have written a SYNC 23. Flow proceeds to decision block 2621.

At decision block 2621, the core 102 determines whether it is the core 102 that encountered the apply microcode patch instruction at block 2602 (as opposed to a core 102 that received the patch message at block 2652). If so, flow proceeds to block 2622; otherwise, flow proceeds to block 2624.

At block 2622, the core 102 loads the CAM data 2508 and the core PRAM patch 2512 into the uncore PRAM 116. Additionally, the core 102 generates a checksum of the loaded CAM data 2508 and core PRAM patch 2512 and verifies that it matches the checksum 2514. Preferably, the core 102 also sends information to the other cores 102 indicating the lengths of the CAM data 2508 and the core PRAM patch 2512 and the locations within the uncore PRAM 116 at which they were loaded. Preferably, the core 102 loads the CAM data 2508 and the core PRAM patch 2512 into a reserved portion of the uncore PRAM 116 so that any previous uncore PRAM patch 2518 is not clobbered, in a manner similar to that described with respect to block 2612. Flow proceeds to block 2624.

At block 2624, the core 102 writes a sync request for sync condition 24 (denoted SYNC 24 in FIGS. 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all the cores 102 have written a SYNC 24. Flow proceeds to block 2626.

At block 2626, the core 102 loads the CAM data 2508 from the uncore PRAM 116 into its patch CAM 2439. Additionally, the core 102 loads the core PRAM patch 2512 from the uncore PRAM 116 into its core PRAM 2499. Advantageously, because all the cores 102 are known to be executing the same microcode that implements the microcode patch application, it is safe to load the patch CAM 2439 with the CAM data 2508 even though the corresponding RAM patch 2516 has not yet been written to the uncore patch RAM 2408 (which occurs at block 2632), since there will be no hits in the patch CAM 2439 during this time (assuming the microcode that implements the microcode patch application is not itself patched). Furthermore, because all the cores 102 are known to be executing the same microcode that implements the microcode patch application, and interrupts remain disabled on every core 102 until the patch 2500 has been propagated to all the cores 102, any update to the core PRAM 2499 performed by the core PRAM patch 2512, including an update that changes a value that may affect operation of the core 102 (e.g., a feature setting), is guaranteed not to be architecturally visible until the patch 2500 has been propagated to all the cores 102. Flow proceeds to block 2628.

At block 2628, the core 102 writes a sync request for sync condition 25 (denoted SYNC 25 in FIGS. 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all the cores 102 have written a SYNC 25. Flow proceeds to decision block 2631.

At decision block 2631, the core 102 determines whether it is the core 102 that encountered the apply microcode patch instruction at block 2602 (as opposed to a core 102 that received the patch message at block 2652). If so, flow proceeds to block 2632; otherwise, flow proceeds to block 2634.

At block 2632, the core 102 loads the RAM patch 2516 into the uncore patch RAM 2408. Additionally, the core 102 loads the uncore PRAM patch 2518 into the uncore PRAM 116. In one embodiment, the uncore PRAM patch 2518 includes program code to be executed by the SPU 2423. In one embodiment, the uncore PRAM patch 2518 includes updates to values used by the microcode, as described above. In one embodiment, the uncore PRAM patch 2518 includes both SPU 2423 program code and updates to values used by the microcode. Advantageously, because all the cores 102 are known to be executing the same microcode that implements the microcode patch application, there will be no hits in the patch CAM 2439 during this time (assuming the microcode that implements the microcode patch application is not itself patched), even though the patch CAM 2439 of every core 102 has already been loaded with the new CAM data 2508 (e.g., at block 2626); it is therefore safe to write the RAM patch 2516 into the uncore patch RAM 2408. Furthermore, because all the cores 102 are known to be executing the same microcode that implements the microcode patch application, and interrupts remain disabled on every core 102 until the patch 2500 has been propagated to all the cores 102, any update to the uncore PRAM 116 performed by the uncore PRAM patch 2518, including an update that changes a value that may affect operation of the core 102 (e.g., a feature setting), is guaranteed not to be architecturally visible until the patch 2500 has been propagated to all the cores 102. Flow proceeds to block 2634.

At block 2634, the core 102 writes a sync request for sync condition 26 (denoted SYNC 26 in FIGS. 26A-26B) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all the cores 102 have written a SYNC 26. Flow ends at block 2634.

After block 2634, if program code for the SPU 2423 was loaded into the uncore PRAM 116, the patching core 102 also causes the SPU 2423 to begin executing the code, as described with respect to FIG. 30. Additionally, after block 2634, the patching core 102 releases the hardware semaphore 118 that it obtained at block 2604. Finally, after block 2634, the core 102 re-enables interrupts.

Referring now to FIG. 27, a timing diagram illustrating an example of operation of a microprocessor according to the flowchart of FIGS. 26A-26B is shown. In the example, a microprocessor 100 is configured with three cores 102, denoted core 0, core 1 and core 2, as shown. However, it should be understood that in other embodiments the microprocessor 100 may include a different number of cores 102. In the timing diagram, the events proceed as described below.

Core 0 receives a request to patch microcode (per block 2602) and in response acquires the hardware semaphore 118 (per block 2604). Core 0 then sends a microcode patch message and an interrupt to core 1 and core 2 (per block 2606). Core 0 then writes a SYNC 21 and is put to sleep (per block 2608).

Each of core 1 and core 2 is eventually interrupted from its current task and reads the message (per block 2652). In response, each of core 1 and core 2 writes a SYNC 21 and is put to sleep (per block 2608). As shown, the cores may write the SYNC 21 at different times, for example because of the latency of the instruction that is executing when the interrupt is asserted.

When all the cores have written SYNC 21, the control unit 104 wakes them all up simultaneously (per block 2608). Core 0 then loads the immediate patch 2504 into the uncore PRAM 116 (per block 2612), writes a SYNC 22, and is put to sleep (per block 2614). Each of core 1 and core 2 writes a SYNC 22 and is put to sleep (per block 2614).

When all the cores have written SYNC 22, the control unit 104 wakes them all up simultaneously (per block 2614). Each core executes the immediate patch 2504 (per block 2616), writes a SYNC 23, and is put to sleep (per block 2618).

When all the cores have written SYNC 23, the control unit 104 wakes them all up simultaneously (per block 2618). Core 0 then loads the CAM data 2508 and the core PRAM patch 2512 into the uncore PRAM 116 (per block 2622), writes a SYNC 24, and is put to sleep (per block 2624).

When all the cores have written SYNC 24, the control unit 104 wakes them all up simultaneously (per block 2624). Each core then loads its patch CAM 2439 with the CAM data 2508 and loads its core PRAM 2499 with the core PRAM patch 2512 (per block 2626), writes a SYNC 25, and is put to sleep (per block 2628).

When all the cores have written SYNC 25, the control unit 104 wakes them all up simultaneously (per block 2628). Core 0 then loads the RAM patch 2516 into the uncore patch RAM 2408 and loads the uncore PRAM patch 2518 into the uncore PRAM 116 (per block 2632), writes a SYNC 26, and is put to sleep (per block 2634).

When all the cores have written SYNC 26, the control unit 104 wakes them all up simultaneously (per block 2634). As described above, if program code for the SPU 2423 was loaded into the uncore PRAM 116 at block 2632, the core 102 then also causes the SPU 2423 to begin executing the code, as described below with respect to FIG. 30.
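The rendezvous sequence of FIG. 27 can be illustrated with a small software model. The following is a minimal sketch only, not the hardware mechanism: a reusable `threading.Barrier` stands in for the control unit 104 (a core that "writes a SYNC" blocks until every core has written it), threads stand in for the cores 102, and the names `simulate`, `run_core`, `sync`, and the event strings are hypothetical. The message and interrupt of blocks 2606 and 2652 are not modeled.

```python
import threading

NUM_CORES = 3        # the three-core example of FIG. 27
PATCHING_CORE = 0    # hypothetical: the core that encountered the patch instruction

# Stand-in for control unit 104: releases all cores once each has "written" the SYNC.
barrier = threading.Barrier(NUM_CORES)
log_lock = threading.Lock()
log = []             # (core, event) tuples in the order they occurred


def record(core, event):
    with log_lock:
        log.append((core, event))


def sync(core, n):
    """Model writing SYNC n, sleeping, and being awakened with the other cores."""
    record(core, f"SYNC {n}")
    barrier.wait()


def run_core(core):
    patcher = core == PATCHING_CORE
    sync(core, 21)                                  # block 2608
    if patcher:
        record(core, "load immediate patch")        # block 2612
    sync(core, 22)                                  # block 2614
    record(core, "execute immediate patch")         # block 2616
    sync(core, 23)                                  # block 2618
    if patcher:
        record(core, "stage CAM data")              # block 2622
    sync(core, 24)                                  # block 2624
    record(core, "load own patch CAM")              # block 2626
    sync(core, 25)                                  # block 2628
    if patcher:
        record(core, "load RAM patch")              # block 2632
    sync(core, 26)                                  # block 2634


def simulate():
    log.clear()
    threads = [threading.Thread(target=run_core, args=(c,)) for c in range(NUM_CORES)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(log)
```

In this model, each single-core load step is recorded before the patching core reaches the next barrier, so no other core can begin the following all-core step until the load is complete. That is the ordering property the SYNC 21 through SYNC 26 sequence is described as providing.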

Referring now to FIG. 28, a block diagram illustrating a multi-core microprocessor 100 according to an alternate embodiment is shown. The microprocessor 100 is similar in many respects to the microprocessor 100 of FIG. 24. However, the microprocessor 100 of FIG. 28 does not include an uncore patch RAM, but instead includes a core patch RAM 2808 in each core 102, which serves a function similar to that of the uncore patch RAM 2408 of FIG. 24. However, the core patch RAM 2808 in each core 102 is private to its respective core 102 and is not shared with the other cores 102.

Referring now to FIGS. 29A-29B, a flowchart illustrating operation of the microprocessor 100 of FIG. 28 to propagate a microcode patch to the multiple cores 102 of the microprocessor 100 according to an alternate embodiment is shown. In the alternate embodiment of FIG. 28 and FIGS. 29A-29B, the patch 2500 of FIG. 25 may be modified such that the checksum 2514 follows the RAM patch 2516 rather than the core PRAM patch 2512, which enables the microprocessor 100 to verify the integrity of the CAM data 2508, the core PRAM patch 2512 and the RAM patch 2516 after they have been loaded into the microprocessor 100 (e.g., at block 2922 of FIGS. 29A-29B). The flowchart of FIGS. 29A-29B is similar in many respects to the flowchart of FIGS. 26A-26B, and like-numbered blocks are similar. However, block 2912 replaces block 2612, block 2916 replaces block 2616, block 2922 replaces block 2622, block 2926 replaces block 2626, and block 2932 replaces block 2632. At block 2912, the core 102 loads the immediate patch 2504 into the uncore PRAM 116 (rather than into an uncore patch RAM). At block 2916, the core 102 copies the immediate patch 2504 from the uncore PRAM 116 to the core patch RAM 2808 before executing it. At block 2922, the core 102 loads the RAM patch 2516 into the uncore PRAM 116 in addition to the CAM data 2508 and the core PRAM patch 2512. At block 2926, in addition to loading the CAM data 2508 from the uncore PRAM 116 into its patch CAM 2439 and loading the core PRAM patch 2512 from the uncore PRAM 116 into its core PRAM 2499, the core 102 loads the RAM patch 2516 from the uncore PRAM 116 into its core patch RAM 2808. At block 2932, unlike block 2632 of FIGS. 26A-26B, the core 102 does not load the RAM patch 2516 into an uncore patch RAM.

As may be observed from the embodiments described above, the atomic propagation of the microcode patch 2500 to each of the relevant memories 2439/2499/2808 of the cores 102 of the microprocessor 100 and to the relevant uncore memories 2408/116 is advantageously performed in a manner that ensures the integrity and validity of the patch 2500, even in the presence of multiple concurrently executing cores 102 that share resources and that might otherwise clobber portions of one another's patches if the patch were applied in a conventional manner.

Patching Service Processor Code

Referring now to FIG. 30, a flowchart illustrating operation of the microprocessor 100 of FIG. 24 to patch code for a service processor is shown. Flow begins at block 3002.

At block 3002, the core 102 loads program code to be executed by the SPU 2423 into the uncore PRAM 116 at a patch address specified by a patch, as described above with respect to block 2632 of FIGS. 26A-26B. Flow proceeds to block 3004.

At block 3004, the core 102 controls the SPU 2423 to execute code at the patch address, i.e., the address in the uncore PRAM 116 at which the SPU 2423 program code was written at block 3002. In one embodiment, the SPU 2423 is configured to fetch its reset vector (i.e., the address at which the SPU 2423 begins fetching instructions after coming out of reset) from the start address register 2497, and the core 102 writes the patch address into the start address register 2497 and then writes to a control register that causes the SPU 2423 to be reset. Flow proceeds to block 3006.

At block 3006, the SPU 2423 begins fetching code (i.e., fetches its first instruction) at the patch address, i.e., the address in the uncore PRAM 116 at which the SPU 2423 program code was written at block 3002. Typically, the SPU 2423 patch code residing in the uncore PRAM 116 will perform a jump to SPU 2423 program code residing in the uncore ROM 2425. Flow ends at block 3006.
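The sequence of blocks 3002 through 3006 can be sketched with a toy model. This is an illustration under stated assumptions, not the SPU 2423 design: the `ServiceProcessor` class, the `(op, arg)` instruction tuples, the addresses `ROM_ENTRY` and `PATCH_ADDR`, and the function `apply_spu_patch` are all hypothetical. The only behaviors taken from the description are that the reset vector comes from a start address register, that the core writes the patch address there and then resets the SPU, and that the patch code typically jumps to the ROM code.

```python
ROM_ENTRY = 0x000    # hypothetical address of SPU code in uncore ROM 2425
PATCH_ADDR = 0x100   # hypothetical patch address in uncore PRAM 116


class ServiceProcessor:
    """Toy SPU whose reset vector is read from a start address register."""

    def __init__(self, rom_entry):
        self.start_address = rom_entry   # models start address register 2497
        self.trace = []                  # (address, op) pairs in fetch order

    def reset(self, memory):
        """Come out of reset and run from the start address until 'halt'."""
        pc = self.start_address
        while pc is not None:
            op, arg = memory[pc]
            self.trace.append((pc, op))
            if op == "jump":
                pc = arg
            elif op == "halt":
                pc = None
            else:
                pc += 1


def apply_spu_patch(spu, memory, patch_addr, patch_code):
    # Block 3002: write the SPU patch code at the patch address.
    for offset, insn in enumerate(patch_code):
        memory[patch_addr + offset] = insn
    # Block 3004: program the start address register, then reset the SPU.
    spu.start_address = patch_addr
    spu.reset(memory)


# ROM contents: the unpatched SPU program.
memory = {ROM_ENTRY: ("rom_main", None), ROM_ENTRY + 1: ("halt", None)}
spu = ServiceProcessor(ROM_ENTRY)
# The patch runs one extra step, then jumps to the ROM code (block 3006).
apply_spu_patch(spu, memory, PATCH_ADDR,
                [("patched_init", None), ("jump", ROM_ENTRY)])
```

After the call, `spu.trace` shows the first fetch occurring at the patch address and control then passing to the ROM entry point.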

The ability to patch the SPU 2423 code may be particularly useful. For example, the SPU 2423 may be used for performance testing that is transient in nature; that is, it may not be desirable to make the performance-testing SPU 2423 code a permanent part of the microprocessor 100, but rather a part only of development parts, as opposed to production parts. For another example, the SPU 2423 may be used to find and/or fix bugs. For another example, the SPU 2423 may be used to configure the microprocessor 100.

Atomic Propagation of Updates to Per-Core Instantiated Architecturally-Visible Storage Resources

Referring now to FIG. 31, a block diagram illustrating a multi-core microprocessor 100 according to an alternate embodiment is shown. The microprocessor 100 is similar in many respects to the microprocessor 100 of FIG. 24. However, each core 102 of the microprocessor 100 of FIG. 31 also includes architecturally visible memory type range registers (MTRRs) 3102. That is, each core 102 instantiates the architecturally visible MTRRs 3102, even though system software expects the MTRRs 3102 to be consistent across all the cores 102 (as described in more detail below). The MTRRs 3102 are an example of per-core instantiated architecturally visible storage resources; other embodiments of such resources are described below. (Although not shown, each core 102 also includes the core PRAM 2499, core microcode ROM 2404 and patch CAM 2439 of FIG. 24 and, in one embodiment, the core microcode patch RAM 2808 of FIG. 28.)

The MTRRs 3102 provide a way for system software to associate a memory type with each of multiple different physical address ranges in the system memory address space of the microprocessor 100. Examples of the memory types include strong uncacheable, uncacheable, write-combining, write-through, write-back and write-protected. Each MTRR 3102 specifies (explicitly or implicitly) a memory range and its memory type. The collective values of the MTRRs 3102 define a memory map that specifies the memory types of the different memory ranges. In one embodiment, the MTRRs 3102 are similar to those described in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3: System Programming Guide, September 2013, particularly in section 11.11, which is incorporated herein by reference and forms part of this specification.
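How a set of variable range MTRR values defines a memory map can be sketched as follows. This is a simplified illustration, assuming the x86 layout described in the cited Intel manual (type in bits 7:0 of PHYSBASE, valid bit in bit 11 of PHYSMASK, address fields starting at bit 12) and an assumed 36-bit physical address width. The names `memory_type` and `make_range` are hypothetical, fixed range MTRRs are omitted, and the first matching range wins, which ignores the manual's precedence rules for overlapping ranges.

```python
# Memory type encodings used in x86 MTRRs.
MEMORY_TYPES = {0x00: "UC", 0x01: "WC", 0x04: "WT", 0x05: "WP", 0x06: "WB"}

VALID_BIT = 1 << 11                       # PHYSMASK valid bit
PHYS_BITS = 36                            # assumed physical address width
ADDR_FIELD = ((1 << PHYS_BITS) - 1) & ~0xFFF   # bits 12 and up


def make_range(base, size, mem_type):
    """Build a (PHYSBASE, PHYSMASK) pair for a power-of-two sized, aligned range."""
    assert size & (size - 1) == 0 and base % size == 0
    physbase = (base & ADDR_FIELD) | mem_type
    physmask = (~(size - 1) & ADDR_FIELD) | VALID_BIT
    return physbase, physmask


def memory_type(addr, variable_mtrrs, default_type=0x00):
    """Return the memory type name for a physical address."""
    for physbase, physmask in variable_mtrrs:
        if not physmask & VALID_BIT:
            continue                      # range is disabled
        mask = physmask & ADDR_FIELD
        base = physbase & ADDR_FIELD
        if (addr & mask) == (base & mask):
            return MEMORY_TYPES[physbase & 0xFF]
    return MEMORY_TYPES[default_type]


# One 256 MiB write-back range starting at physical address 0.
mtrrs = [make_range(0x0, 0x1000_0000, 0x06)]
inside = memory_type(0x0800_0000, mtrrs)    # within the range
outside = memory_type(0x1000_0000, mtrrs)   # first byte past the range
```

Because every core evaluates addresses against its own copy of these registers, two cores holding different values would classify the same address differently, which is why the consistency discussed in this section matters.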

It is desirable that the memory map defined by the MTRRs 3102 be the same on all the cores 102 of the microprocessor 100, so that software running on the microprocessor 100 has a consistent view of memory. However, in conventional processors there is no hardware support for maintaining consistency of the MTRRs among the cores of a multi-core processor. As the note at the bottom of page 11-20 of Volume 3 of the Intel manual cited above states, "The P6 and more recent processor families provide no hardware support for maintaining [consistency of MTRR values]." Consequently, system software is responsible for maintaining MTRR consistency across the cores. Section 11.11.8 of the Intel manual cited above describes an algorithm by which system software maintains consistency, in which each core of the multi-core processor updates its own MTRRs, i.e., all the cores execute instructions to update their respective MTRRs.

In contrast, embodiments are described herein in which system software may update the respective instance of the MTRR 3102 on one of the cores 102, and that core 102 advantageously propagates the update to the respective instances of the MTRR 3102 on all the cores 102 of the microprocessor 100 in an atomic fashion (similar to the manner in which microcode patches are performed in the embodiments described above with respect to FIGS. 24 through 30). This provides a means of maintaining architectural-instruction-level consistency among the MTRRs 3102 of the different cores 102.

Referring now to FIG. 32, a flowchart illustrating operation of the microprocessor 100 of FIG. 31 to propagate an MTRR 3102 update to the multiple cores 102 of the microprocessor 100 is shown. The operation is described from the perspective of a single core, but each core 102 of the microprocessor 100 operates according to the description so that the cores collectively propagate the MTRR 3102 update to all the cores 102 of the microprocessor 100. More specifically, FIG. 32 describes the operation of the core that encounters the instruction to update the MTRR 3102, whose flow begins at block 3202, and the operation of the other cores 102, whose flow begins at block 3252.

At block 3202, one of the cores 102 encounters an instruction that instructs the core to update its MTRR 3102. That is, the MTRR update instruction includes an MTRR 3102 identifier and an update value to be written to the MTRR 3102. In one embodiment, the MTRR update instruction is an x86 WRMSR instruction that specifies the update value in the EDX:EAX registers and the MTRR 3102 identifier in the ECX register, which is an MSR address within the MSR address space of the core 102. In response to the MTRR update instruction, the core 102 disables interrupts and traps to the microcode that implements the MTRR update instruction. It should be understood that the system software that includes the MTRR update instruction may include a sequence of multiple instructions in preparation for the MTRR 3102 update. Preferably, however, it is in response to a single architectural instruction of the sequence that the MTRRs 3102 of all the cores 102 are updated atomically at the architectural instruction level. That is, once interrupts are disabled on the first core 102 (e.g., at block 3202, where the core 102 encounters the MTRR update instruction), they remain disabled while the microcode propagates the new MTRR 3102 value to all the cores 102 of the microprocessor 100 (e.g., until after block 3218). Furthermore, once interrupts are disabled on the other cores 102 (e.g., at block 3252), they remain disabled until the MTRRs 3102 of all the cores 102 of the microprocessor 100 have been updated (e.g., until after block 2634). Thus, advantageously, the new MTRR 3102 value is propagated atomically, at the architectural instruction level, to all the cores 102 of the microprocessor 100. Flow proceeds to block 3204.

At block 3204, the core 102 obtains ownership of the hardware semaphore 118 of FIG. 1. Preferably, the microprocessor 100 includes a hardware semaphore 118 associated with MTRR 3102 updates. Preferably, the core 102 obtains ownership of the hardware semaphore 118 in a manner similar to that described above with respect to FIG. 20, more specifically blocks 2004 and 2006. The hardware semaphore 118 is used because one of the cores 102 may be performing an MTRR 3102 update in response to encountering an MTRR update instruction while a second core 102 also encounters an MTRR update instruction, in response to which the second core would begin to update the MTRR 3102, which could result in incorrect execution. Flow proceeds to block 3206.

At block 3206, the core 102 sends an MTRR update message to the other cores 102 and sends them an inter-core interrupt. Preferably, the core 102 traps to microcode in response to the MTRR update instruction (at block 3202) or in response to the interrupt (at block 3252) and remains in microcode, during which time interrupts are disabled (i.e., the microcode does not allow itself to be interrupted), until block 3218. Flow proceeds to block 3208.

At block 3252, one of the other cores 102 (i.e., a core 102 other than the one that encountered the MTRR update instruction at block 3202) is interrupted by the inter-core interrupt sent at block 3206 and receives the MTRR update message. In one embodiment, the core 102 takes the interrupt at the next architectural instruction boundary (e.g., at the next x86 instruction boundary). In response to the interrupt, the core 102 disables interrupts and traps to the microcode that handles the MTRR update message. As described above, although the flow at block 3252 is described from the perspective of a single core 102, each of the other cores 102 (i.e., each core 102 not at block 3202) is interrupted and receives the message at block 3252 and performs the steps of blocks 3208 through 3234. Flow proceeds from block 3252 to block 3208.

At block 3208, the core 102 writes a sync request for sync condition 31 (denoted SYNC 31 in FIG. 32) to its sync register 108, is put to sleep by the control unit 104, and is subsequently awakened by the control unit 104 when all the cores 102 have written a SYNC 31. Flow proceeds to decision block 3211.

At decision block 3211, the core 102 determines whether it is the core 102 that encountered the MTRR update instruction at block 3202 (as opposed to a core 102 that received the MTRR update message at block 3252). If so, flow proceeds to block 3212; otherwise, flow proceeds to block 3214.

At block 3212, the core 102 loads into the uncore PRAM 116, where they are visible to all the other cores 102, the MTRR identifier specified by the MTRR update instruction and an MTRR update value with which the MTRR is to be updated. In the case of an x86 embodiment, the MTRRs 3102 include: (1) fixed range MTRRs, each comprising a single 64-bit MSR that is updated by a single WRMSR instruction, and (2) variable range MTRRs, each comprising two 64-bit MSRs, each of which is written by a different WRMSR instruction, i.e., the two WRMSR instructions specify different MSR addresses. For the variable range MTRRs, one of the MSRs (the PHYSBASE register) includes a base address of the memory range and a type field that specifies the memory type, and the other MSR (the PHYSMASK register) includes a valid bit and a mask field that sets the range mask. Preferably, the MTRR update value that the core 102 loads into the uncore PRAM 116 is as follows.

1. If the identified MSR is the PHYSMASK register, the core 102 loads into the uncore PRAM 116 a 128-bit update value comprising the new 64-bit value specified by the WRMSR instruction (which contains the valid bit and mask value) and the current value of the PHYSBASE register (which contains the base and type values).

2. If the identified MSR is the PHYSBASE register:

a. If the valid bit in the PHYSMASK register is currently set, the core 102 loads into the uncore PRAM 116 a 128-bit update value comprising the new 64-bit value specified by the WRMSR instruction (which contains the base and type values) and the current value of the PHYSMASK register (which contains the valid bit and mask value).

b. If the valid bit in the PHYSMASK register is currently clear, the core 102 loads into the uncore PRAM 116 a 64-bit update value comprising only the new 64-bit value specified by the WRMSR instruction (which contains the base and type values).

In addition, the core 102 sets a flag in the uncore PRAM 116 if the update value written is a 128-bit value, and clears the flag if the update value is a 64-bit value. Flow proceeds from block 3212 to block 3214.
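The three cases above can be sketched in Python as follows. This is only an illustrative model, not the microcode itself: the `MtrrUpdate` record, the `build_update` function and its parameter names are hypothetical, and the layout of the value inside the uncore PRAM 116 is not specified by the description. The sketch assumes that case 2b applies when the PHYSMASK valid bit is clear, and it relies on the x86 convention that the valid bit is bit 11 of a PHYSMASK register.

```python
from dataclasses import dataclass
from typing import Optional

VALID_BIT = 1 << 11  # valid bit (bit 11) of an x86 PHYSMASK register


@dataclass
class MtrrUpdate:
    """Hypothetical model of what the writing core leaves in uncore PRAM."""
    msr_id: int                   # identifier of the MSR named by WRMSR
    new_value: int                # 64-bit value supplied by WRMSR
    partner_value: Optional[int]  # current value of the paired MSR, if sent
    flag: bool                    # True: 128-bit update, False: 64-bit update


def build_update(msr_id, new_value, is_physmask, cur_physbase, cur_physmask):
    """Choose between a 128-bit and a 64-bit update value (block 3212)."""
    if is_physmask:
        # Case 1: PHYSMASK is written, so the current PHYSBASE travels along.
        return MtrrUpdate(msr_id, new_value, cur_physbase, True)
    if cur_physmask & VALID_BIT:
        # Case 2a: PHYSBASE is written while the range is valid.
        return MtrrUpdate(msr_id, new_value, cur_physmask, True)
    # Case 2b (assumed: valid bit clear): only the new PHYSBASE value is sent.
    return MtrrUpdate(msr_id, new_value, None, False)
```

One plausible reading of this design is that the paired register is sent whenever the range is valid or may become valid, so that a receiving core can install both halves of a variable range MTRR together.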

At block 3214, the core 102 writes a synchronization request for sync condition 32 (denoted SYNC 32 in FIG. 32) to its synchronization register 108. The control unit 104 puts the core 102 to sleep and subsequently wakes it once all cores 102 have written SYNC 32. Flow proceeds to block 3216.

At block 3216, the core 102 reads from the uncore PRAM 116 the MTRR 3102 identifier and the MTRR update value that were written at block 3212, and updates its own identified MTRR 3102 with that value. Advantageously, the propagation of the MTRR update value is performed atomically: any update to the MTRRs 3102 that might affect the operation of the respective cores 102 is guaranteed not to be architecturally visible until the update value has been propagated to the MTRR 3102 of every core 102. This holds because all cores 102 are known to be executing the same microcode that implements the MTRR update instruction, and interrupts remain disabled on every core 102 until the update value has been propagated to the respective MTRR 3102 of all cores 102. In the embodiment described above with respect to block 3212, if the flag was set at block 3212, the core 102 also updates the paired PHYSMASK or PHYSBASE register in addition to the identified MSR; otherwise, if the flag is clear, the core 102 updates only the identified MSR. Flow proceeds to block 3218.
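The receiving side of block 3216 might be modeled as below. This is an illustration under stated assumptions: `apply_update` and its arguments are hypothetical names, each core's MSRs are modeled as a plain dictionary, and the paired register is located by flipping the low bit of the MSR address, which relies on the x86 convention that the PHYSBASE and PHYSMASK registers of a variable range MTRR sit at adjacent even and odd MSR addresses (for example 0x200 and 0x201).

```python
def apply_update(msrs, msr_id, new_value, partner_value, flag):
    """Install an update read from uncore PRAM into one core's MSRs.

    msrs: dict mapping MSR address -> 64-bit value for a single core.
    flag: True if the writer supplied a 128-bit update (both registers).
    """
    # The MSR named by the MTRR update instruction is always written.
    msrs[msr_id] = new_value
    if flag:
        # 128-bit update: also write the paired PHYSBASE/PHYSMASK register.
        # Assumption: the pair differs only in the low address bit.
        msrs[msr_id ^ 1] = partner_value
    return msrs
```

For example, a 128-bit update for MSR 0x201 would write both 0x201 and 0x200, while a 64-bit update for MSR 0x200 would leave 0x201 untouched.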

At block 3218, the core 102 writes a synchronization request for sync condition 33 (denoted SYNC 33 in FIG. 32) to its synchronization register 108. The control unit 104 puts the core 102 to sleep and subsequently wakes it once all cores 102 have written SYNC 33. Flow ends at block 3218.

After block 3218, the core 102 that encountered the MTRR update instruction releases the hardware semaphore 118 it acquired at block 3204. Furthermore, after block 3218, the core 102 re-enables interrupts.
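The flow of blocks 3208 through 3218 can be imitated in software with a reusable barrier standing in for the control unit 104. The sketch below is a simplified model, not the described hardware: threads play the role of cores, `threading.Barrier` plays the role of the sleep and wake behavior around SYNC 31, 32 and 33, a dictionary plays the role of the uncore PRAM 116, and the names `run_mtrr_update` and `NUM_CORES` are hypothetical. The hardware semaphore 118 and interrupt masking are not modeled.

```python
import threading

NUM_CORES = 4  # hypothetical core count


def run_mtrr_update(initiator, msr_id, value, num_cores=NUM_CORES):
    """Propagate one MSR value from the initiating core to every core."""
    barrier = threading.Barrier(num_cores)    # stands in for control unit 104
    pram = {}                                 # stands in for uncore PRAM 116
    mtrrs = [dict() for _ in range(num_cores)]  # per-core MTRRs 3102

    def core(core_id):
        barrier.wait()                        # block 3208: SYNC 31
        if core_id == initiator:              # decision block 3211
            pram["id"] = msr_id               # block 3212: publish update
            pram["value"] = value
        barrier.wait()                        # block 3214: SYNC 32
        mtrrs[core_id][pram["id"]] = pram["value"]  # block 3216: apply
        barrier.wait()                        # block 3218: SYNC 33

    threads = [threading.Thread(target=core, args=(i,))
               for i in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mtrrs
```

The second barrier is what keeps any core from reading the shared value before the initiating core has finished writing it, and the third keeps any core from moving on before all cores have applied it.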

As may be observed from FIGS. 31 and 32, system software running on the microprocessor 100 of FIG. 31 may advantageously execute an MTRR update instruction on a single core 102 of the microprocessor 100 to update the specified MTRR 3102 of all cores 102 of the microprocessor 100, rather than executing an MTRR update instruction separately on each core 102, which may improve system integrity.

One particular MTRR 3102 instantiated in each core 102 is a System Management Range Register (SMRR) 3102. Because the memory range specified by the SMRR 3102 holds code and data associated with System Management Mode (SMM) operation, such as a System Management Interrupt (SMI) handler, that range is referred to as the SMRAM region. When code running on a core 102 attempts to access the SMRAM region, the core 102 allows the access only if the core 102 is running in SMM; otherwise, the core 102 ignores a write to the SMRAM region and returns a fixed value for each bit read from the SMRAM region. Additionally, if a core 102 running in SMM attempts to execute code outside the SMRAM region, the core 102 asserts a machine check exception. Furthermore, the core 102 allows code to write the SMRR 3102 only when the core 102 is running in SMM. This helps protect the SMM code and data in the SMRAM region. In one embodiment, the SMRR 3102 is similar to that described in the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3: System Programming Guide, September 2013, particularly in sections 11.11.2.4 and 34.4.2.1, which is incorporated herein by reference and forms part of this specification.
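The access rules described in this paragraph can be sketched as a small Python model. It is an illustration under assumptions, not a description of any real core: the `Core` class, its method names, the `MachineCheck` exception and the constant `FIXED_READ_VALUE` are hypothetical, the fixed value itself is not given in the text, and how a rejected SMRR write is reported is not stated, so the sketch simply returns `False`. The range match uses the usual x86 base and mask comparison with the valid bit at bit 11.

```python
VALID_BIT = 1 << 11                 # valid bit of the PHYSMASK register
ADDR_BITS = 0xFFFF_FFFF_FFFF_F000   # base and mask fields start at bit 12
FIXED_READ_VALUE = 0xFF             # hypothetical fixed value seen outside SMM


class MachineCheck(Exception):
    """Stands in for the machine check exception asserted by the core."""


def in_smram(addr, physbase, physmask):
    """True if addr falls inside the range described by base and mask."""
    if not physmask & VALID_BIT:
        return False
    mask = physmask & ADDR_BITS
    return (addr & mask) == (physbase & mask)


class Core:
    def __init__(self, physbase, physmask, in_smm=False):
        self.physbase, self.physmask, self.in_smm = physbase, physmask, in_smm

    def load(self, memory, addr):
        if in_smram(addr, self.physbase, self.physmask) and not self.in_smm:
            return FIXED_READ_VALUE   # reads outside SMM see a fixed value
        return memory.get(addr, 0)

    def store(self, memory, addr, value):
        if in_smram(addr, self.physbase, self.physmask) and not self.in_smm:
            return                    # writes outside SMM are ignored
        memory[addr] = value

    def fetch(self, addr):
        if self.in_smm and not in_smram(addr, self.physbase, self.physmask):
            raise MachineCheck(hex(addr))  # SMM code ran outside SMRAM
        return addr

    def write_smrr(self, physbase, physmask):
        if not self.in_smm:
            return False              # SMRR is writable only in SMM
        self.physbase, self.physmask = physbase, physmask
        return True
```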

Generally, each core 102 has its own instance of SMM code and data in memory. It is desirable that the SMM code and data of each core 102 be protected not only from code running on that core 102 itself, but also from code running on any other core 102. To accomplish this using the SMRRs 3102, system software typically places the multiple SMM code and data instances in adjacent blocks of memory. That is, the SMRAM region is a single contiguous memory region that contains all the SMM code and data instances. If the SMRRs 3102 of all cores 102 of the microprocessor 100 hold a value that specifies the entirety of this single contiguous memory region, then code running outside SMM on one core 102 is prevented from updating the SMM code and data instance of another core 102. However, if there is a window of time in which the SMRR 3102 values differ among the cores 102, that is, the SMRRs 3102 of different cores 102 of the microprocessor 100 hold different values, any of which specifies less than the entirety of the single contiguous memory region containing all the SMM code and data instances, then the system may be vulnerable to a security attack, which could be serious given the nature of SMM. Accordingly, embodiments that atomically propagate updates to the SMRRs 3102 may be particularly advantageous.
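The window of vulnerability can be made concrete with a small check. The sketch below is hypothetical throughout: the function names, the addresses and the region sizes are invented for illustration, and each SMRR is modeled as a `(physbase, physmask)` pair using the usual x86 base and mask comparison. It reports which cores hold an SMRR value that fails to cover every SMM instance.

```python
VALID_BIT = 1 << 11   # valid bit of the PHYSMASK register
LOW_BITS = 0xFFF      # bits below the base and mask fields


def covers(smrr, addr):
    """True if the (physbase, physmask) pair protects addr."""
    base, mask = smrr
    if not mask & VALID_BIT:
        return False
    field = mask & ~LOW_BITS
    return (addr & field) == (base & field)


def exposed_cores(smrrs, smm_addrs):
    """Ids of cores whose SMRR leaves some SMM instance unprotected."""
    return [core_id for core_id, smrr in enumerate(smrrs)
            if not all(covers(smrr, addr) for addr in smm_addrs)]


# Hypothetical layout: two 64 KiB SMM instances in adjacent blocks.
SMM_INSTANCES = [0x30000000, 0x30010000]
OLD = (0x30000006, 0xFFFF0000 | VALID_BIT)  # covers only the first 64 KiB
NEW = (0x30000006, 0xFFFE0000 | VALID_BIT)  # covers the whole 128 KiB

mid_update = exposed_cores([NEW, OLD], SMM_INSTANCES)    # core 1 lags behind
after_update = exposed_cores([NEW, NEW], SMM_INSTANCES)  # all cores agree
```

In this made-up layout, while core 1 still holds the old value, code running outside SMM on core 1 could reach the second SMM instance; once both cores hold the new value, no core is exposed.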

Furthermore, other embodiments are contemplated in which updates to other per-core-instantiated, architecturally visible storage resources of the microprocessor 100 are propagated atomically in a manner similar to that described above. For example, in one embodiment, each core 102 instantiates certain bit fields of the x86 IA32_MISC_ENABLE MSR, and a WRMSR executed on one core 102 is propagated to all cores 102 of the microprocessor 100 in a manner similar to that described above. Embodiments are also contemplated in which the execution on one core 102 of a WRMSR to other MSRs that are instantiated in all cores 102 of the microprocessor 100, whether architectural or proprietary and whether current or future, is propagated to all cores 102 of the microprocessor 100 in a manner similar to that described above.

Furthermore, although embodiments are described in which the per-core-instantiated, architecturally visible storage resources are MTRRs, other embodiments are contemplated in which the per-core-instantiated resources belong to instruction set architectures other than the x86 ISA, or are resources other than MTRRs. For example, resources other than MTRRs include CPUID values and MSRs that report capabilities, such as Vectored Multimedia eXtensions (VMX) capabilities.

Although the present invention has been disclosed above by way of preferred embodiments, these are not intended to limit the invention. Those skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by the claims of this application. For example, software can enable the function, fabrication, modeling, simulation, description and/or testing of the apparatus and methods described herein. This can be accomplished through the use of general programming languages (e.g., C, C++) or hardware description languages (HDL), including Verilog HDL, VHDL, and so on. Such software can be embodied as program code in a tangible medium, such as any machine-readable (e.g., computer-readable) storage medium, for example semiconductor memory, magnetic disk, hard disk, or optical disc (e.g., CD-ROM, DVD-ROM), wherein, when the program code is loaded into and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention.

The methods and apparatus of the invention may also be embodied as program code transmitted over a transmission medium, such as electrical wiring or cabling, optical fiber, or any other form of transmission, wherein, when the program code is received, loaded and executed by a machine such as a computer, the machine becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the program code combines with the processor to provide a unique apparatus that operates analogously to application-specific logic circuits. The apparatus and methods described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (e.g., embodied in HDL), and transformed into hardware in the production of integrated circuits. Additionally, the apparatus and methods described herein may be embodied as a combination of hardware and software. Therefore, the scope of protection of the invention is defined by the claims of this application. Finally, those skilled in the art may, on the basis of the disclosed concepts and specific embodiments, make changes and refinements that achieve the same purposes as the present invention without departing from its spirit and scope.

Claims (16)

1. A microprocessor, comprising: a plurality of processing cores configured to operate in a first mode and a second mode; wherein, in the first mode, each processing core of the plurality of processing cores collectively operates as a bootstrap processor and collectively executes a respective instance of a test program; and wherein, in the second mode, only a single processing core of the plurality of processing cores operates as the bootstrap processor and performs testing of the microprocessor.

2. The microprocessor of claim 1, further comprising: a plurality of interrupt controllers corresponding to the plurality of processing cores, wherein each interrupt controller includes an indication used to determine whether an interrupt request is directed to the corresponding processing core; wherein, in the first mode, each processing core of the plurality of processing cores sets the indication in its corresponding interrupt controller to a value associated with the bootstrap processor.

3. The microprocessor of claim 1, wherein the microprocessor is further configured to modify upper address bits of the memory accessed by each processing core of the plurality of processing cores, such that each processing core accesses a separate address space.

4. The microprocessor of claim 1, wherein a reserved register of the microprocessor specifies the first mode or the second mode of operation of the plurality of processing cores.

5. The microprocessor of claim 4, wherein the reserved register is configured to receive a sensed evaluation of a blown or unblown fuse.

6. The microprocessor of claim 4, wherein the reserved register is configured to receive, from a boundary scan input, a value specifying the first mode or the second mode.

7. The microprocessor of claim 1, wherein each processing core of the plurality of processing cores is configured to determine the first mode or the second mode in response to a reset of the processing core.

8. The microprocessor of claim 1, wherein, in the first mode, the plurality of processing cores are configured to set a flag in a respective register of each processing core of the plurality of processing cores to indicate that the processing core is the bootstrap processor.

9. A method for configuring a multi-core microprocessor, the method comprising: determining a mode of the microprocessor; when the mode is a first mode, operating, collectively by each processing core of a plurality of processing cores of the microprocessor, as a bootstrap processor, and executing, collectively by each processing core of the plurality of processing cores, a respective instance of a test program; and when the mode is a second mode, operating, by only a single processing core of the plurality of processing cores, as the bootstrap processor and performing testing of the microprocessor.

10. The method of claim 9, wherein the microprocessor further comprises a plurality of interrupt controllers corresponding to the plurality of processing cores, wherein each interrupt controller includes an indication used to determine whether an interrupt request is directed to the corresponding processing core; wherein, in the first mode, each processing core of the plurality of processing cores sets the indication in its corresponding interrupt controller to a value associated with the bootstrap processor.

11. The method of claim 9, further comprising: modifying upper address bits of the memory accessed by each processing core of the plurality of processing cores, such that each processing core accesses a separate address space.

12. The method of claim 9, wherein a reserved register of the microprocessor specifies the first mode or the second mode of operation of the plurality of processing cores.

13. The method of claim 12, wherein the reserved register is configured to receive a sensed evaluation of a blown or unblown fuse.

14. The method of claim 12, wherein the reserved register is configured to receive, from a boundary scan input, a value specifying the first mode or the second mode.

15. The method of claim 9, wherein the step of determining the mode is performed in response to a reset of the processing cores.

16. The method of claim 9, wherein, in the first mode, a flag is set in a respective register of each processing core of the plurality of processing cores to indicate that the processing core is the bootstrap processor.

CN201410431423.9A 2013-08-28 2014-08-28 microprocessor and configuration method thereof Active CN104331387B (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201361871206P 2013-08-28 2013-08-28
US61/871,206 2013-08-28
US201361916338P 2013-12-16 2013-12-16
US61/916,338 2013-12-16
US14/281,709 US9971605B2 (en) 2013-08-28 2014-05-19 Selective designation of multiple cores as bootstrap processor in a multi-core microprocessor
US14/281,709 2014-05-19

Publications (2)

Publication Number Publication Date
CN104331387A CN104331387A (en) 2015-02-04
CN104331387B true CN104331387B (en) 2019-08-06

Family

ID=52406117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410431423.9A Active CN104331387B (en) 2013-08-28 2014-08-28 microprocessor and configuration method thereof

Country Status (1)

Country Link
CN (1) CN104331387B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106774788B (en) * 2016-11-23 2020-01-17 深圳市博巨兴微电子科技有限公司 SOC based on MCU and kernel cooperation control unit thereof

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5724527A (en) * 1995-12-28 1998-03-03 Intel Corporation Fault-tolerant boot strap mechanism for a multiprocessor system
US8041920B2 (en) * 2006-12-29 2011-10-18 Intel Corporation Partitioning memory mapped device configuration space
US8078862B2 (en) * 2008-04-25 2011-12-13 Intel Corporation Method for assigning physical data address range in multiprocessor system
US8972707B2 (en) * 2010-12-22 2015-03-03 Via Technologies, Inc. Multi-core processor with core selectively disabled by kill instruction of system software and resettable only via external pin
US8892946B2 (en) * 2011-12-15 2014-11-18 International Business Machines Corporation Verifying speculative multithreading in an application
CN103019863A (en) * 2012-12-25 2013-04-03 上海新储集成电路有限公司 Crisis emergency processing multi-core micro controller arbitration framework and working mode thereof

Also Published As

Publication number Publication date
CN104331387A (en) 2015-02-04

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant