CN100394381C - Synchronous multi-thread processor circuit and operating method - Google Patents
- Publication number
- CN100394381C; CNB2004100430627A; CN200410043062A
- Authority
- CN
- China
- Prior art keywords
- threads
- performance index
- processor
- currently running
- performance
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30181—Instruction operation extension or modification
- G06F9/30189—Instruction operation extension or modification according to execution mode, e.g. mode flag
-
- A—HUMAN NECESSITIES
- A41—WEARING APPAREL
- A41D—OUTERWEAR; PROTECTIVE GARMENTS; ACCESSORIES
- A41D19/00—Gloves
- A41D19/015—Protective gloves
- A41D19/01547—Protective gloves with grip improving means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3824—Operand accessing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Textile Engineering (AREA)
- Power Sources (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
Processing circuits associated with the operation of threads in an SMT processor can be configured to operate at different performance levels based on the number of threads currently running on the SMT processor. For example, according to some embodiments of the present invention, a processing circuit associated with the operation of threads in the SMT processor, such as a floating point unit or a data cache, can operate in a high power mode or a low power mode based on the number of threads currently running on the SMT processor. Furthermore, as the number of threads run by the SMT processor increases, the performance level of the processing circuits can be lowered, thereby providing the benefits of the SMT processor architecture while allowing a reduction in the amount of power consumed by the processing circuits associated with the threads. Related computer program products and methods are also disclosed.
Description
This application claims priority to Korean Patent Application No. 2003-107595, filed on February 20, 2003, which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates generally to computer processor architecture, and more particularly to simultaneous multithreading computer processors, related computer program products, and methods of operating the same.
Background
Simultaneous multithreading (SMT) is a processor architecture that uses hardware multithreading to allow multiple independent threads to issue instructions in each cycle. Unlike other hardware multithreaded architectures, in which only a single hardware context (i.e., thread) is active in any given cycle, an SMT architecture allows all thread contexts to compete for and share processor resources simultaneously.
An SMT processor can use otherwise wasted cycles to execute instructions, which can reduce the effect of long-latency operations in the SMT processor. Moreover, performance may increase as the number of threads increases, which may also increase the power consumed by the SMT processor.
A block diagram of a conventional SMT processor is illustrated in FIG. 1. The operation of the conventional SMT processor of FIG. 1 is discussed in Dean M. Tullsen, Susan J. Egger, Henry M. Levy, Jack L. Lo, Rebecca L. Stamm, et al., "Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor," The 23rd Annual International Symposium on Computer Architecture, pp. 191-202, 1996, the disclosure of which is incorporated herein by reference. The architecture and operation of conventional SMT processors are well known in the art and are not described in further detail here.
Summary of the Invention
Embodiments according to the present invention can provide processing circuits, computer program products, and/or methods that operate at different performance levels based on the number of threads run by a simultaneous multithreading (SMT) processor. For example, in some embodiments according to the present invention, a processing circuit associated with the operation of threads in the SMT processor, such as a floating point unit or a data cache, can operate in one of a high power mode or a low power mode based on the number of threads currently running on the SMT processor. Furthermore, as the number of threads run by the SMT processor increases, the performance level of the processing circuits can be lowered, thereby providing the benefits of the SMT processor architecture while allowing a reduction in the amount of power consumed by the processing circuits associated with the threads. In other words, the SMT processor can operate at the same power but with higher performance, or can consume more power but operate at a higher performance level than a conventional SMT processor.
In some embodiments according to the present invention, the processing circuit can be configured to operate at a first performance level when the number of threads currently running on the SMT processor is less than or equal to a threshold, and at a second performance level when that number is greater than the threshold.
In some embodiments according to the present invention, a performance level control circuit can be configured to provide a performance level to the processing circuit based on the number of threads currently running on the SMT processor. According to some embodiments of the present invention, the performance level control circuit can raise the performance level provided to the processing circuit to a first performance level when the number of threads currently running on the SMT processor is less than or equal to a threshold. When the number of threads currently running on the SMT processor exceeds the threshold, the performance level control circuit can lower the performance level provided to at least one processing circuit to a second performance level that is less than the first performance level.
In some embodiments according to the present invention, when the number of threads currently running on the SMT processor exceeds a second threshold that is greater than the first threshold, the performance level control circuit further lowers the performance level of the processing circuit to a third performance level that is less than the second performance level.
Various embodiments of performance levels can be provided according to the present invention. For example, according to some embodiments of the present invention, the processing circuit can be a cache memory circuit that includes a tag memory and a data memory, where the data memory is configured to provide cached data concurrently with an access to the tag memory when the cache memory circuit operates at the first performance level. The data memory can be configured to provide cached data responsive to a hit in the tag memory when the cache memory circuit operates at a second performance level that is less than the first performance level.
In some embodiments according to the present invention, the cache memory can be at least one of a data cache memory configured to store data operated on by instructions and an instruction cache memory configured to store instructions that operate on associated data. According to some embodiments of the present invention, the data cache memory can be further configured not to provide cached data responsive to a miss in the tag memory when operating at the second performance level.
In some embodiments according to the present invention, the processing circuit can be a floating point unit. According to some embodiments of the present invention, the floating point unit can be a first floating point unit configured to operate at the first performance level when the number of threads run by the SMT processor is less than or equal to a threshold, and the SMT processor can further include a second floating point unit configured to operate at a second performance level, less than the first performance level, when the number of threads run by the SMT processor is greater than the threshold.
In some embodiments according to the present invention, the performance level control circuit can be configured to increment or decrement the number of threads currently running on the SMT processor responsive to threads being created and completed, respectively, in the SMT processor.
According to some embodiments of the present invention, a second processing circuit can be configured to operate at a second performance level, less than the first performance level, responsive to the number of threads currently running in the SMT processor increasing to greater than the threshold.
According to some embodiments of the present invention, the performance level control circuit can be configured to lower the performance level provided to at least one processing circuit responsive to the creation of a new thread that increases the number of threads currently running on the SMT processor from less than or equal to the threshold to greater than the threshold. According to some embodiments of the present invention, the performance level control circuit can be configured to lower the performance level of the processing circuit to one of a plurality of descending performance levels as the number of threads currently running on the SMT processor exceeds each of a plurality of ascending thresholds.
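The mapping from ascending thresholds to descending performance levels can be illustrated with a minimal Python sketch. It is only a software model of the comparison, not the patented circuit; the function name `level_for`, the threshold values, and the level labels are assumptions chosen for illustration.

```python
from bisect import bisect_left

def level_for(running_threads, ascending_thresholds, descending_levels):
    """Pick a performance level from the number of running threads.

    ascending_thresholds: sorted thread counts, e.g. [2, 4, 6] (hypothetical values)
    descending_levels: one more entry than thresholds, highest level first
    """
    if len(descending_levels) != len(ascending_thresholds) + 1:
        raise ValueError("need exactly one more level than thresholds")
    # bisect_left counts the thresholds that are strictly less than the
    # thread count, i.e. the thresholds the count has exceeded.
    exceeded = bisect_left(ascending_thresholds, running_threads)
    return descending_levels[exceeded]

thresholds = [2, 4, 6]                          # hypothetical thresholds
levels = ["first", "second", "third", "fourth"]  # first = highest performance
for n in (0, 2, 3, 4, 5, 7):
    print(n, level_for(n, thresholds, levels))
```

A count equal to a threshold stays at the higher level, which matches the "less than or equal to" wording used for the first threshold.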
According to some embodiments of the present invention, the performance level control circuit can be configured to maintain a first performance level for a first processing circuit and to provide a second performance level, less than the first performance level, to a second processing circuit responsive to the number of threads currently run by the SMT processor increasing from less than or equal to a threshold to greater than the threshold.
According to other embodiments of the present invention, a performance level control circuit can be configured to provide a performance level to a processing circuit in the SMT processor based on the number of threads currently running on the SMT processor.
According to still other embodiments of the present invention, a thread management circuit can be configured to assign processing circuits associated with the SMT processor to a thread running in the SMT processor when the thread is created. A performance level control circuit can be configured to provide one of a plurality of performance levels to the processing circuits based on a comparison of the number of threads currently executed by the SMT processor with at least one threshold.
According to still other embodiments of the present invention, a cache memory associated with the SMT processor can include a tag memory and a data memory, where the data memory is accessed either concurrently with, or after, an access to the tag memory, based on the number of threads currently running on the SMT processor.
Brief Description of the Drawings
FIG. 1 is a block diagram illustrating a conventional simultaneous multithreading (SMT) processor circuit architecture.
FIG. 2 is a block diagram illustrating an embodiment of an SMT processor according to the present invention.
FIG. 3 is a block diagram illustrating an embodiment of a thread management circuit according to the present invention.
FIG. 4 is a block diagram illustrating an embodiment of a performance level control circuit according to the present invention.
FIG. 5 is a flowchart illustrating operations of an embodiment of a performance level control circuit according to the present invention.
FIG. 6 is a block diagram illustrating an embodiment of a cache memory according to the present invention.
FIG. 7 is a block diagram illustrating an embodiment of an SMT processor according to the present invention.
FIG. 8 is a block diagram illustrating an embodiment of an SMT processor according to the present invention.
FIG. 9 is a block diagram illustrating an embodiment of an SMT processor according to the present invention.
FIG. 10 is a block diagram illustrating an embodiment of a performance level control circuit according to the present invention.
FIG. 11 is a flowchart illustrating operations of an embodiment of a performance level control circuit according to the present invention.
Detailed Description
The present invention will now be described more fully with reference to the accompanying drawings, in which illustrative embodiments of the invention are shown. The invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. Like numbers refer to like elements throughout.
It will be understood that although the terms "first" and "second" are used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. Thus, a first element discussed below could be termed a second element, and similarly, a second element could be termed a first element, without departing from the scope of this disclosure.
As will be appreciated by one of skill in the art, the present invention may be embodied as circuits, methods, and/or computer program products. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product on a computer-usable storage medium having computer-usable program code. Any suitable computer-readable medium may be used, including hard disks, CD-ROMs, optical storage devices, or magnetic storage devices.
Computer program code, or "code", for carrying out operations according to the present invention may be written in an object-oriented programming language such as Java, Smalltalk, or C++, or in JavaScript, Visual Basic, TSQL, Perl, or various other programming languages. Software embodiments of the present invention do not depend on implementation in a particular programming language. Portions of the code may execute entirely on one or more systems used by an intermediary server.
The code may execute entirely on one or more computer systems, or it may execute partly on a server and partly on a client within a client device or as a proxy server at an intermediate point in a communications network. In the latter scenario, the client device may be connected to the server over a LAN or a WAN (e.g., an intranet), or the connection may be made through the Internet (e.g., via an Internet service provider). The invention may be embodied using various protocols over various types of computer networks.
The present invention is described below with reference to block diagrams and flowchart illustrations of methods, systems, and computer program products according to embodiments of the invention. It will be understood that each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a simultaneous multithreading (SMT) processor circuit, a special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the block or blocks of the block diagrams and/or flowcharts.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in the block or blocks of the block diagrams and/or flowcharts.
The computer program instructions may be loaded onto an SMT processor circuit or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the block or blocks of the block diagrams and/or flowcharts.
Embodiments according to the present invention can provide processing circuits associated with the operation of threads in an SMT processor, where the processing circuits are configured to operate at different performance levels based on the number of threads currently running on the SMT processor. It will be understood that the different performance levels can include different circuit operating speeds and/or different levels of precision. In some embodiments according to the present invention, the processing circuits can operate at different clock speeds and/or use different circuit types (e.g., different types of CMOS devices) to provide the different performance levels. For example, a processing circuit associated with the operation of threads in the SMT processor, such as a floating point unit or a data cache, can operate in a high power mode at a high clock speed or in a low power mode at a low clock speed, based on the number of threads currently running on the SMT processor. Furthermore, as the number of threads run by the SMT processor increases, the performance level of the processing circuits can be lowered, thereby providing the benefits of the SMT processor architecture while allowing a reduction in the amount of power consumed by the processing circuits associated with the threads.
It will be understood that embodiments according to the present invention can exploit thread-level parallelism, which uses multiple threads that inherently run in parallel with one another. As used herein, a "thread" can be a separate process having associated instructions and data. A thread can represent a process that is part of a parallel computer program having multiple processes. A thread can also represent a separate computer program that runs independently of other programs. Each thread can have an associated state defined, for example, by the respective states of its associated instructions, data, program counter, and/or registers. The associated state of a thread can include enough information for the thread to be executed by the processor.
According to some embodiments of the present invention, a performance level control circuit is configured to provide respective performance levels to the processing circuits assigned to threads created in the SMT processor. For example, the performance level control circuit can provide a first performance level so that a processing circuit operates in a high power mode, and can provide a second performance level so that a processing circuit operates in a low power mode. In still other embodiments according to the present invention, the performance level control circuit provides intermediate performance levels (i.e., other performance levels between high power and low power).
According to some embodiments of the present invention, the processing circuit that operates at different performance levels can be a cache memory that includes a tag memory and a data memory. When the cache memory operates at the first performance level (i.e., in a high power mode), the tag memory and the data memory can be accessed concurrently, regardless of whether the access to the tag memory results in a hit. Concurrent access to the data memory can provide higher performance when the hit rate in the tag memory is high. Alternatively, the cache memory can operate at the second performance level (i.e., in a low power mode), in which the data memory is accessed only in response to a hit in the tag memory. Accordingly, if a tag miss occurs, some of the power consumption associated with accessing the data memory can be avoided. Moreover, if a tag hit occurs, the accesses to the tag memory and the data memory are offset in time.
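The difference between the two cache access modes can be made concrete with a toy software model. The following Python sketch is an illustration under stated assumptions, not the circuit of FIG. 6: the direct-mapped organization, the `ToyCache` class, and the `data_reads` counter (a stand-in for data-memory energy) are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ToyCache:
    """Direct-mapped toy cache that counts data-memory reads (assumed model)."""
    num_lines: int = 4
    tags: dict = field(default_factory=dict)   # line index -> stored tag
    data: dict = field(default_factory=dict)   # line index -> stored value
    data_reads: int = 0                        # proxy for data-memory power

    def fill(self, address, value):
        index, tag = address % self.num_lines, address // self.num_lines
        self.tags[index] = tag
        self.data[index] = value

    def read(self, address, high_power):
        index, tag = address % self.num_lines, address // self.num_lines
        hit = self.tags.get(index) == tag
        if high_power:
            # First performance level: the data memory is read concurrently
            # with the tag lookup, whether or not the tag hits.
            self.data_reads += 1
            value = self.data.get(index)
            return value if hit else None
        # Second performance level: the data memory is read only after a tag hit.
        if not hit:
            return None
        self.data_reads += 1
        return self.data[index]

cache = ToyCache()
cache.fill(5, "a")
cache.read(9, high_power=True)    # miss, but the data memory was still read
cache.read(9, high_power=False)   # miss, data memory not touched
print(cache.data_reads)
```

In this model a miss costs one data-memory read in the high power mode and none in the low power mode, which is the saving the paragraph above describes; the extra latency of the serialized hit path is not modeled.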
In still other embodiments, the processing circuit associated with the operation of threads in the SMT processor can be an instruction cache or another type of processing circuit, such as a floating point circuit or an integer load/store circuit. Moreover, each of these processing circuits can operate at a different performance level. For example, according to some embodiments of the present invention, the cache memory, the instruction cache, the floating point circuit, and the integer load/store circuit can operate concurrently at different performance levels.
In still further embodiments according to the present invention, processing circuits of the same type (for example, floating point circuits and integer load/store circuits) can be divided into different performance classes, so that some of the circuits are configured to operate at the first performance level while others are configured to operate at the second performance level. For example, according to some embodiments of the present invention, some of the floating point circuits available for assignment to threads in the SMT processor can be configured to operate in a high power mode, while other floating point circuits available for assignment to threads in the SMT processor can be configured to operate in a low power mode.
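FIG. 2 is a block diagram illustrating an embodiment of an SMT processor according to the present invention. Referring to FIG. 2, when a new thread is created in the SMT processor 200, a thread management circuit 205 assigns a set of processing circuits for use by the newly created thread. The assigned processing circuits can include a program counter 215, a set of floating point registers 245, and a set of integer registers 250. Other processing circuits can also be assigned to the newly created thread. It will be understood that when a thread completes, the processing circuits assigned to that thread can be released so that they can be reassigned to subsequently created threads.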
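The assign-on-create and release-on-complete behavior can be sketched as simple bookkeeping. This Python sketch is a hypothetical model: the `ThreadManager` class and the idea of bundling a program counter and both register sets into one numbered "context" are assumptions made for illustration, not details taken from FIG. 2.

```python
class ThreadManager:
    """Tracks which hardware context (program counter plus floating point and
    integer register sets, bundled here by assumption) each thread is using."""

    def __init__(self, num_contexts):
        self.free = list(range(num_contexts))  # contexts available for assignment
        self.assigned = {}                     # thread id -> context number

    def create_thread(self, thread_id):
        if not self.free:
            raise RuntimeError("no free hardware context")
        context = self.free.pop(0)
        self.assigned[thread_id] = context
        return context

    def complete_thread(self, thread_id):
        # Releasing the context makes it available to later threads.
        self.free.append(self.assigned.pop(thread_id))

manager = ThreadManager(num_contexts=2)
first = manager.create_thread("t0")
manager.create_thread("t1")
manager.complete_thread("t0")
reused = manager.create_thread("t2")   # receives the context released by t0
print(first == reused)
```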
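In operation, an instruction fetch circuit 210 fetches an instruction from an instruction cache 220 based on the location provided by the assigned program counter 215 and provides the instruction to a decoder 225. The decoder 225 outputs the decoded instruction to a register renaming circuit 230. Depending on the type of the instruction, the register renaming circuit 230 provides the renamed instruction to a floating point instruction queue 235 or to an integer instruction queue 240. For example, if the instruction is a floating point instruction, it is loaded into the floating point instruction queue 235, whereas if it is an integer instruction, it is loaded into the integer instruction queue 240.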
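The routing of renamed instructions by type can be sketched as follows. This is a minimal Python illustration, not the behavior of the actual circuits: the opcode names in `FP_OPCODES` and the tuple encoding of an instruction are hypothetical.

```python
from collections import deque

# Hypothetical set of opcodes treated as floating point instructions.
FP_OPCODES = {"fadd", "fmul", "fdiv"}

def dispatch(instructions):
    """Route each (opcode, operands...) tuple to the floating point queue or
    the integer queue, preserving program order within each queue."""
    fp_queue, int_queue = deque(), deque()
    for opcode, *operands in instructions:
        target = fp_queue if opcode in FP_OPCODES else int_queue
        target.append((opcode, *operands))
    return fp_queue, int_queue

fp_queue, int_queue = dispatch([
    ("fadd", "f1", "f2"),
    ("add", "r1", "r2"),
    ("fmul", "f3", "f4"),
])
print(list(fp_queue))
print(list(int_queue))
```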
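Instructions from the floating point instruction queue 235 or the integer instruction queue 240 are loaded into associated registers for operation by a floating point circuit 255 or an integer load/store circuit 260. In particular, floating point instructions are transferred from the floating point instruction queue 235 to the set of floating point registers 245. The instructions in the floating point registers 245 can be accessed by the floating point circuit 255. The floating point circuit 255 can also access floating point data stored in a data cache 265, for example when an instruction (from the floating point registers 245) executed by the floating point circuit 255 refers to data stored in the data cache 265.
Integer instructions are transferred from the integer instruction queue 240 to the integer registers 250. The integer load/store circuit 260 can access the integer instructions stored in the integer registers 250 in order to execute them. The integer load/store circuit 260 can also access the data cache 265, for example when an integer instruction stored in the integer registers 250 refers to integer data stored in the data cache 265.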
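According to embodiments of the present invention, the thread management circuit 205 provides a performance level to the data cache 265. In particular, the performance level can control whether the data cache 265 operates at the first performance level or the second performance level (i.e., in a high power mode or a low power mode). For example, the thread management circuit 205 can provide the first performance level, at which the data cache 265 operates in the high power mode, or it can provide the second performance level, at which the data cache 265 operates in the low power mode. It will be understood that although the data cache 265 is described as operating at either the first or the second performance level, more performance levels can be used in embodiments according to the present invention.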
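FIG. 3 is a block diagram illustrating an embodiment of a thread management circuit according to the present invention. Referring to FIG. 3, a thread management circuit 305 receives information related to the creation of threads in the SMT processor from an operating system or, alternatively, from a thread generation circuit. The thread management circuit 305 includes a thread allocation circuit 330 that can assign processing circuits to threads created by the SMT processor according to the present invention.
The thread management circuit 305 also includes a performance level control circuit 340, which provides performance levels to the processing circuits associated with the threads created by the SMT processor. The performance level control circuit 340 can provide a performance level to the processing circuits based on the number of threads currently running on the SMT processor. In particular, as the number of threads run by the SMT processor increases, the performance level control circuit can provide a reduced performance level to the processing circuits associated with the threads executed by the SMT processor. The performance level control circuit 340 can determine the number of threads currently running on the SMT processor by incrementing and decrementing an internal count in response to the creation and completion of threads run by the SMT processor.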
It will be understood that the performance level provided to the processing circuits according to the present invention can have a default value, such as the first performance level (the high power mode). As threads are added, the performance level provided to the processing circuits can be lowered, which reduces the performance and therefore the power consumption of the processing circuits. It will also be understood that the performance level can be provided to a processing circuit over a signal line that carries a signal having at least two states, namely the first performance level and the second performance level. For example, after the SMT processor is initialized, the number of threads run by the SMT processor can be zero, and the performance level provided to the processing circuits is the default first performance level (high power mode). As threads are added and the count eventually exceeds a threshold, the performance level can be changed to the second performance level, for example by changing the state of the signal that indicates which performance level is in use.
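FIG. 4 is a block diagram illustrating embodiments of a performance level control circuit according to the invention. Referring to FIG. 4, a counter circuit 405 can receive information from the operating system or the thread generation circuit discussed with reference to FIG. 3 to determine the number of threads currently operated by the SMT processor. For example, if the counter circuit 405 indicates four threads started by the SMT processor when information about the creation of a new thread is received, the counter circuit 405 can be incremented to reflect that the SMT processor is currently operating five threads.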
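The counter circuit 405 can provide the number of threads currently operated by the SMT processor to a comparator circuit 410. A threshold is provided to the comparator circuit 410 together with that number. The threshold can be a programmable value that indicates the number of threads above which the performance level is changed. Accordingly, when the number of threads currently operated by the SMT processor is less than or equal to the threshold, the performance level provided to the processing circuits can remain at the first performance level, for example the high power mode. When the number of threads currently operated by the SMT processor exceeds the threshold, however, the performance level can be reduced to lower the power consumed by the SMT processor.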
FIG. 5 is a flowchart illustrating operations of embodiments of a performance level control circuit according to the invention. Referring to FIG. 5, when the SMT processor is initialized, the number of threads it currently operates is zero (block 500). As threads are created and completed in the SMT processor, the number N of threads currently operating in the SMT processor is incremented or decremented (block 505). For example, when four threads are operating in the SMT processor, N is 4. When a new thread is created, N is incremented to 5; if a thread subsequently completes, N is decremented back to 4.
The number of threads currently operated by the SMT processor is compared with the threshold (block 510). If the number is less than or equal to the threshold, the performance level control circuit provides the first performance level to the processing circuits allocated to the threads (block 515). For example, if the processing circuit allocated to the threads is the cache memory discussed with reference to FIG. 2, the cache memory can operate so that the tag memory and the data memory are accessed concurrently (that is, in the high power mode). If, on the other hand, the number of threads operated by the SMT processor is greater than the threshold (block 510), the performance level control circuit provides the second performance level to the processing circuits associated with the threads (block 520). For example, in the embodiments discussed above with reference to FIG. 2, the cache memory can operate at the second performance level, so that the data memory is accessed only in response to a hit in the tag memory (that is, in the low power mode).
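The counter-and-compare flow of FIGS. 4 and 5 can be modeled in a few lines of software. The following is a minimal sketch rather than the circuit itself: the class and method names are hypothetical, and the threshold value used in the usage note is an arbitrary assumption, since the description only says that the threshold is programmable.

```python
from enum import Enum


class PerformanceLevel(Enum):
    FIRST = "high power"   # default level after initialization
    SECOND = "low power"   # reduced level once the threshold is exceeded


class PerformanceLevelControl:
    """Software model of the counter and comparator (names are hypothetical)."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # programmable threshold (assumed value)
        self.count = 0              # block 500: no threads after initialization

    def thread_created(self) -> None:
        self.count += 1             # block 505: increment N

    def thread_completed(self) -> None:
        if self.count == 0:
            raise ValueError("no running thread to complete")
        self.count -= 1             # block 505: decrement N

    def level(self) -> PerformanceLevel:
        # Block 510: compare N with the threshold.
        if self.count <= self.threshold:
            return PerformanceLevel.FIRST   # block 515
        return PerformanceLevel.SECOND      # block 520
```

With an assumed threshold of 4, four calls to `thread_created()` leave `level()` at `PerformanceLevel.FIRST`; a fifth call moves it to `PerformanceLevel.SECOND`, and one `thread_completed()` call returns it to `FIRST`.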
FIG. 6 is a block diagram illustrating embodiments of the cache memory shown in FIG. 2 according to the invention. Referring to FIG. 6, a tag memory 610 stores the addresses of the data stored in a data memory 620. The SMT processor accesses the tag memory 610 using the address associated with the data it needs. A tag compare circuit 630 compares the entries in the tag memory 610 with the address to determine whether the data required by the SMT processor is stored in the data memory 620. If the tag compare circuit 630 determines that the tag memory 610 indicates that the required data is stored in the data memory 620, a tag hit is generated; otherwise, a tag miss is generated. On a tag hit, an output enable circuit 650 enables data to be output from the data memory 620.
According to embodiments of the invention, the performance level provided by the performance level control circuit controls how the tag memory 610 and the data memory 620 operate. In particular, if the first performance level is provided to the cache memory, a data memory enable circuit 640 enables the data memory 620 to be accessed concurrently with the tag memory 610, regardless of whether a tag hit has occurred. In contrast, if the second performance level is provided to the cache memory, the data memory enable circuit 640 does not enable access to the data memory 620 unless a tag hit occurs.
Accordingly, in embodiments of the invention, the tag memory 610 and the data memory 620 can be accessed concurrently in the high power mode to provide improved performance, whereas in the low power mode the data memory 620 is accessed only when the tag memory 610 indicates a tag hit, which allows the power consumed by the cache memory to be reduced.
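The difference between the two cache modes can be made concrete with a toy model that counts how often the data memory is activated. This is only an illustrative sketch: the direct-mapped organization, the four-line size, and all names are assumptions that do not come from the description, and the activation count is a stand-in for power rather than a measurement.

```python
class CacheModel:
    """Toy direct-mapped cache that counts data-memory activations."""

    def __init__(self, num_lines: int = 4) -> None:
        self.num_lines = num_lines
        self.tags = [None] * num_lines   # plays the role of tag memory 610
        self.data = [None] * num_lines   # plays the role of data memory 620
        self.data_accesses = 0           # rough proxy for data-array power

    def fill(self, address: int, value: str) -> None:
        index = address % self.num_lines
        self.tags[index] = address // self.num_lines
        self.data[index] = value

    def read(self, address: int, high_power: bool):
        index = address % self.num_lines
        hit = self.tags[index] == address // self.num_lines  # tag compare
        # High power: the data memory is enabled together with the tag memory.
        # Low power: the data memory is enabled only after a tag hit.
        if high_power or hit:
            self.data_accesses += 1
            value = self.data[index]
        else:
            value = None
        # Output is enabled only on a tag hit.
        return value if hit else None
```

For one hit followed by one miss, the model activates the data memory twice in the high power mode and once in the low power mode, which is the saving the low power mode is meant to capture.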
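FIG. 7 is a block diagram illustrating embodiments according to the invention as used with an instruction cache. Referring to FIG. 7, a thread management circuit 700 allocates an instruction cache 722 to a new thread. A performance level control circuit included in the thread management circuit 700 can provide a performance level to the instruction cache 722 to control how the instruction cache 722 operates.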
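In particular, the instruction cache 722 can operate in a high power mode in response to the first performance level and in a low power mode in response to the second performance level. As described above, for example with reference to FIG. 5, the first and second performance levels can be provided to the instruction cache 722 based on the number of threads currently operated by the SMT processor. The instruction cache 722 can operate at the different performance levels in a manner similar to that described above with reference to FIG. 6, in which the data memory 620 is accessed in the low power mode only in response to a tag hit. The different performance levels can also be provided in the instruction cache in other ways: for example, when consecutive memory accesses are determined to fall in the same cache line, direct addressing can be used. This type of restriction can be implemented with a direct-addressed cache, which avoids reading the tag random access memory (RAM) and eliminates the tag comparison. In a direct-addressed cache, the translation from virtual address to physical address can also be avoided.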
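The saving from direct addressing on consecutive accesses to the same cache line can be illustrated by counting tag reads over a sequence of fetch addresses. The sketch below is an assumption-laden model: the description states only that the tag RAM read and tag comparison can be avoided, so the detection rule used here (compare the line number of each fetch with that of the previous fetch), the 32-byte line size, and the function name are all hypothetical.

```python
LINE_SIZE = 32  # bytes per cache line (illustrative value, not from the source)


def count_tag_reads(addresses, direct_addressing: bool) -> int:
    """Count tag-RAM reads needed to serve a sequence of fetch addresses."""
    tag_reads = 0
    previous_line = None
    for address in addresses:
        line = address // LINE_SIZE
        same_line = line == previous_line
        # With direct addressing, a fetch in the same line as the previous
        # fetch reuses the earlier lookup and skips the tag read and compare.
        if not (direct_addressing and same_line):
            tag_reads += 1
        previous_line = line
    return tag_reads
```

For the fetch addresses `[0, 4, 8, 12, 32, 36]`, which touch two cache lines, the model performs six tag reads without direct addressing and two with it.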
FIG. 8 is a block diagram illustrating embodiments according to the invention of separate processing circuits having different performance levels. Referring to FIG. 8, a first floating point circuit 805 is configured to operate at a first performance level, whereas a second floating point circuit 815 is configured to operate at a second performance level that is lower than the first. In other words, the first floating point circuit 805 is used in the high power mode and the second floating point circuit 815 is used in the low power mode.
A first integer load/store circuit 810 is configured to operate at the first performance level, whereas a second integer load/store circuit 820 is configured to operate at the second performance level. A thread management circuit 800 provides the two separate performance levels. In particular, the first performance level is provided to the first floating point circuit 805 and the first integer load/store circuit 810, and the second performance level is provided to the second floating point circuit 815 and the second integer load/store circuit 820. Accordingly, the first floating point circuit 805 and the first integer load/store circuit 810 are allocated to threads that operate at the first performance level, whereas the second floating point circuit 815 and the second integer load/store circuit 820 are allocated to threads that operate at the second performance level. It will be understood that the thread management circuit 800 can provide the first and second performance levels separately or concurrently. It will also be understood that more than two separate floating point circuits and integer load/store circuits can be provided to support additional performance levels.
According to embodiments of the invention, when the number of threads operating in the SMT processor is less than or equal to a first threshold, the first performance level can be provided to the first floating point circuit 805 and the first integer load/store circuit 810. When the number of threads currently operating in the SMT processor exceeds the first threshold, the second performance level can be provided to the second floating point circuit 815 and the second integer load/store circuit 820. Accordingly, when the number of threads operated by the SMT processor exceeds the threshold, all threads (both those that already existed and those newly created) can use the second floating point circuit 815 and the second integer load/store circuit 820 to reduce the power consumed by the SMT processor.
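The selection between the two sets of execution circuits in FIG. 8 reduces to a single comparison against the first threshold. The function below is a hypothetical sketch of that rule; the function name and the string labels are illustrative, and the threshold is passed in because the description does not fix its value.

```python
def select_units(running_threads: int, first_threshold: int) -> tuple:
    """Return the circuit pair that all threads use for the given thread count."""
    if running_threads <= first_threshold:
        # First performance level: the high power circuits serve every thread.
        return ("floating point circuit 805", "integer load/store circuit 810")
    # Second performance level: existing and new threads alike move to the
    # low power circuits.
    return ("floating point circuit 815", "integer load/store circuit 820")
```

Note that the rule applies to every running thread, not only to the thread whose creation pushed the count over the threshold, which matches the statement that previously existing threads also switch circuits.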
It will be understood that, according to the invention, the floating point circuits and the integer load/store circuits can operate at different clock speeds and/or use different circuit types (for example, different types of CMOS devices) to provide the different performance levels. For example, in some embodiments of the invention, a floating point circuit associated with the operation of threads in the SMT processor can operate in a high power mode at a high clock speed or in a low power mode at a low clock speed, based on the number of threads currently operated by the SMT processor.
FIG. 9 is a block diagram illustrating embodiments of an SMT processor that includes multiple processing circuits responsive to separate performance levels provided by a thread management circuit 900. In particular, the thread management circuit 900 provides three separate performance levels to an instruction cache 930, a data cache 965, first and second floating point circuits 905, 915, and first and second integer load/store circuits 910, 920. It will be understood that the performance levels provided to the first and second floating point circuits 905, 915 and to the first and second integer load/store circuits 910, 920 can operate in the manner discussed above with reference to FIG. 8. The data cache 965 and the instruction cache 930 can operate in the manner discussed above with reference to FIG. 2 and FIG. 7, respectively.
Accordingly, separate performance levels can be provided to different processing circuits so that those circuits operate at different performance levels, which gives finer control over the trade-off between performance and power consumption. For example, the instruction cache 930 can operate at the first performance level while the data cache 965, the first and second floating point circuits 905, 915, and the first and second integer load/store circuits 910, 920 operate at the second performance level. Other combinations of performance levels can also be used.
FIG. 10 is a block diagram illustrating embodiments of the performance level control circuit included in the thread management circuit 900 of FIG. 9. In particular, the performance level control circuit includes a counter 1000 that is incremented and decremented in response to threads being created and completed in the SMT processor. First through third registers 1015, 1020, 1025 can each store a separate threshold for the number of threads currently operated by the SMT processor. Three comparator circuits 1030, 1035, and 1040 are coupled to the corresponding registers 1015, 1020, 1025. In particular, the first register 1015, which stores a first threshold, is coupled to the first comparator circuit 1030; the second register 1020, which stores a second threshold, is coupled to the second comparator circuit 1035; and the third register 1025, which stores a third threshold, is coupled to the third comparator circuit 1040.
Each of the comparator circuits 1030, 1035, 1040 compares the number of threads currently operated by the SMT processor with the threshold stored in its register. If the first comparator circuit 1030 determines that the number of threads currently operated by the SMT processor is greater than the first threshold in the first register 1015, the first comparator circuit 1030 generates a performance level 1045, which is coupled to the data cache 965 as shown in FIG. 9. Accordingly, when the number of threads operated by the SMT processor exceeds the threshold in the first register 1015, the performance level of the data cache 965 changes from the first performance level to the second performance level (that is, from the high power mode to the low power mode).
If the second comparator circuit 1035 determines that the number of threads currently operated by the SMT processor exceeds the threshold stored in the second register 1020, the second comparator circuit 1035 generates a performance level 1050, which is coupled to the instruction cache 930, thereby changing the performance level of the instruction cache 930 from the first performance level to the second performance level (that is, from the high power mode to the low power mode).
If the third comparator circuit 1040 determines that the number of threads currently operated by the SMT processor exceeds the threshold stored in the third register 1025, the third comparator circuit 1040 generates a performance level 1055, which is coupled to the first and second floating point circuits 905, 915 and to the first and second integer load/store circuits 910, 920. The performance level of these processing circuits is therefore also changed from the first performance level to the second performance level (that is, from the high power mode to the low power mode). It will be understood that the performance level 1055 coupled to the floating point circuits and the integer load/store circuits operates in the manner described above with reference to FIG. 8.
FIG. 11 is a flowchart illustrating method embodiments of the performance level control circuit illustrated in FIG. 10. Referring to FIG. 11, when the SMT processor is initialized, the number of threads it currently operates is zero (block 1100). As threads are created and completed by the SMT processor, the count is incremented and decremented to provide a value N that represents the number of threads currently operated by the SMT processor (block 1105).
If the number of threads currently operated by the SMT processor is less than or equal to the first threshold (block 1110), all of the processing circuits continue to operate at the first (high) performance level (block 1115). If, on the other hand, the number of threads currently operated by the SMT processor exceeds the first threshold (block 1110), the processing circuits coupled to the performance level 1045 begin to operate at the second (low) performance level (block 1120).
If the number of threads currently operated by the SMT processor is less than or equal to a second threshold (block 1125), the processing circuits coupled to the performance level 1050 (and those coupled to the performance level 1055) begin (or continue) to operate at the first performance level, while the processing circuits coupled to the performance level 1045, as described above, continue to operate at the second performance level (block 1130).
If the number of threads currently operated by the SMT processor exceeds the second threshold (block 1125), the processing circuits coupled to the performance level 1050, together with those coupled to the performance level 1045, begin (or continue) to operate at the second performance level (block 1135), whereas the processing circuits coupled to the performance level 1055 continue to operate at the first performance level.
If the number of threads currently operated by the SMT processor is less than or equal to a third threshold (block 1140), the processing circuits coupled to the performance level 1055 continue to operate at the first performance level, whereas the processing circuits coupled to the performance levels 1045 and 1050 continue to operate at the second performance level (block 1145). If the number of threads currently operated by the SMT processor exceeds the third threshold (block 1140), the processing circuits coupled to the performance level 1055 begin (or continue) to operate at the second performance level, that is, in the low power mode (block 1150).
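The three-register, three-comparator arrangement of FIGS. 10 and 11 can be sketched as one comparison per circuit group. The model below is illustrative only: the class, field, and key names are hypothetical, and the threshold values in the usage note are assumptions chosen in ascending order, which is the ordering the flowchart of FIG. 11 implies but does not state.

```python
from dataclasses import dataclass


@dataclass
class MultiThresholdControl:
    """Model of registers 1015, 1020, 1025 and their comparators."""

    data_cache_threshold: int   # first threshold, register 1015
    instr_cache_threshold: int  # second threshold, register 1020
    exec_units_threshold: int   # third threshold, register 1025

    def second_level_signals(self, running_threads: int) -> dict:
        """Map each circuit group to True when it should use the second level."""
        return {
            # Performance level 1045, coupled to the data cache.
            "data_cache": running_threads > self.data_cache_threshold,
            # Performance level 1050, coupled to the instruction cache.
            "instruction_cache": running_threads > self.instr_cache_threshold,
            # Performance level 1055, coupled to the floating point and
            # integer load/store circuits.
            "execution_units": running_threads > self.exec_units_threshold,
        }
```

With assumed thresholds of 2, 4, and 6, three running threads put only the data cache in the low power mode, five add the instruction cache, and seven place all three groups at the second performance level.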
As described above, embodiments according to the invention can provide processing circuits associated with the operation of threads in an SMT processor, where the processing circuits operate at different performance levels based on the number of threads currently operated by the SMT processor. For example, in some embodiments of the invention, a processing circuit associated with the operation of threads in the SMT processor, such as a floating point unit or a data cache, can operate in a high power mode or a low power mode based on the number of threads currently operated by the SMT processor.
Furthermore, as the number of threads operated by the SMT processor increases, the performance level of the processing circuits can be reduced, which provides the advantages of the SMT processor architecture while allowing the total power consumed by the processing circuits associated with the threads to be reduced. For example, in some embodiments of the invention, the processing circuits can operate at different clock speeds and/or use different circuit types (for example, different types of CMOS devices) to provide the different performance levels. In some embodiments, a processing circuit associated with the operation of threads in the SMT processor, such as a floating point circuit or a data cache, can operate in a high power mode at a high clock speed or in a low power mode at a low clock speed, based on the number of threads currently operated by the SMT processor.
Given the benefit of the present disclosure, those of ordinary skill in the art can make many changes and modifications without departing from the spirit and scope of the invention. It should therefore be understood that the embodiments illustrated above are set forth only as examples and should not be taken as limiting the invention as defined by the following claims. The following claims are thus to be read as covering not only the combination of elements literally set forth, but also all equivalent elements for performing substantially the same function in substantially the same way to obtain substantially the same result. The claims are to be understood to include what is specifically illustrated and described above, what is conceptually equivalent, and what incorporates the essential idea of the invention.
Claims (32)
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10759/2003 | 2003-02-20 | ||
KR10759/03 | 2003-02-20 | ||
KR20030010759 | 2003-02-20 | ||
US10/631,601 | 2003-07-31 | ||
US10/631,601 US7152170B2 (en) | 2003-02-20 | 2003-07-31 | Simultaneous multi-threading processor circuits and computer program products configured to operate at different performance levels based on a number of operating threads and methods of operating |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1534463A CN1534463A (en) | 2004-10-06 |
CN100394381C true CN100394381C (en) | 2008-06-11 |
Family
ID=32044744
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2004100430627A Expired - Lifetime CN100394381C (en) | 2003-02-20 | 2004-02-20 | Synchronous multi-thread processor circuit and operating method |
Country Status (5)
Country | Link |
---|---|
JP (1) | JP4439288B2 (en) |
KR (1) | KR100594256B1 (en) |
CN (1) | CN100394381C (en) |
GB (1) | GB2398660B (en) |
TW (1) | TWI261198B (en) |
-
2004
- 2004-02-17 TW TW093103698A patent/TWI261198B/en not_active IP Right Cessation
- 2004-02-19 GB GB0403738A patent/GB2398660B/en not_active Expired - Lifetime
- 2004-02-20 JP JP2004043969A patent/JP4439288B2/en not_active Expired - Fee Related
- 2004-02-20 CN CNB2004100430627A patent/CN100394381C/en not_active Expired - Lifetime
- 2004-02-20 KR KR1020040011337A patent/KR100594256B1/en not_active Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
JP4439288B2 (en) | 2010-03-24 |
JP2004252987A (en) | 2004-09-09 |
KR100594256B1 (en) | 2006-06-30 |
KR20040075287A (en) | 2004-08-27 |
GB2398660B (en) | 2005-09-07 |
CN1534463A (en) | 2004-10-06 |
TW200421180A (en) | 2004-10-16 |
GB2398660A (en) | 2004-08-25 |
TWI261198B (en) | 2006-09-01 |
GB0403738D0 (en) | 2004-03-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CX01 | Expiry of patent term | ||
CX01 | Expiry of patent term |
Granted publication date: 20080611 |