CN100489784C - Multithreading microprocessor and its novel threading establishment method and multithreading processing system - Google Patents
- Publication number
- CN100489784C, CNB2004800247988A, CN200480024798A
- Authority
- CN
- China
- Prior art keywords
- thread
- instruction
- microprocessor
- operand
- register
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The invention discloses a fork instruction that executes on a multithreaded microprocessor and occupies a single instruction issue slot. When executed in a parent thread, the fork instruction includes a first operand and a second operand, the first operand specifying the initial instruction fetch address of a new thread. The microprocessor executes the fork instruction by allocating a context for the new thread, copying the first operand to the program counter of the new thread context, copying the second operand to a register of the new thread context, and scheduling the new thread for execution. If no free thread context is available for allocation, the microprocessor raises an exception on the fork instruction. The fork instruction of the present invention is efficient because it does not copy the entire parent thread general-purpose register set to the new thread. The second operand may typically be used as a pointer to a data structure in memory that contains the initial general-purpose register values of the new thread.
Description
Cross-Reference to Related Applications
This application is a continuation-in-part (CIP) of the following co-pending US non-provisional patent application, which is hereby incorporated by reference in its entirety for all purposes.
The above co-pending US non-provisional patent application claims the benefit of the following US provisional application, which is hereby incorporated by reference in its entirety for all purposes.
This application is related to the following concurrently filed US non-provisional patent application, which is hereby incorporated by reference in its entirety for all purposes.
Technical Field
The present invention relates generally to the field of multithreaded processors, and in particular to an instruction for spawning new threads of execution in a multithreaded processor.
Background Art
Microprocessor designers employ many techniques to increase microprocessor performance. Most microprocessors operate using a clock signal that runs at a fixed frequency, and each clock cycle the circuits of the microprocessor perform their respective functions. According to Hennessy and Patterson, the true measure of a microprocessor's performance is the time required to execute a program or collection of programs. From this perspective, the performance of a microprocessor is a function of its clock frequency, the average number of clock cycles required to execute an instruction (or, stated conversely, the average number of instructions executed per clock cycle), and the number of instructions executed in the program or programs. Semiconductor scientists and engineers continually enable microprocessors to run at faster clock frequencies, chiefly by reducing transistor size, which results in faster switching times. The number of instructions executed is largely fixed by the task to be performed by the program, although it is also affected by the instruction set architecture of the microprocessor. Large performance increases have been realized through architectural and organizational notions that improve the number of instructions executed per clock cycle, and in particular through notions of parallelism.
One parallelism-related notion that improves both the number of instructions executed per clock cycle and the clock frequency of a microprocessor is pipelining, which overlaps the execution of multiple instructions within the pipeline stages of the microprocessor. In an ideal situation, each clock cycle one instruction moves down the pipeline to a new stage, which performs a different function of the instruction. Thus, although each individual instruction takes multiple clock cycles to complete, the multiple cycles of the individual instructions overlap, so the average number of clock cycles per instruction is reduced. The performance improvement of pipelining may be realized to the extent that the instructions of the program permit it, namely to the extent that an instruction does not depend upon its predecessors in order to execute and can therefore execute in parallel with its predecessors, which is commonly referred to as instruction-level parallelism. Another way in which contemporary microprocessors exploit instruction-level parallelism is the issuing of multiple instructions for execution per clock cycle; such microprocessors are commonly referred to as superscalar microprocessors.
The parallelism discussed above pertains to the level of individual instructions; however, the performance improvement that may be achieved through exploitation of instruction-level parallelism is limited. Various constraints imposed by limited instruction-level parallelism, together with other performance-constraining issues, have led to a recent renewed interest in exploiting parallelism at the level of blocks, or sequences, or streams of instructions, commonly referred to as thread-level parallelism. A thread is simply a sequence, or stream, of program instructions. A multithreaded microprocessor executes multiple threads concurrently according to a scheduling policy that dictates the fetching and issuing of instructions of the various threads, such as interleaved, blocked, or simultaneous multithreading. A multithreaded microprocessor typically allows the multiple threads to share the functional units of the microprocessor (e.g., instruction fetch and decode units, caches, branch prediction units, and execution units such as load/store, integer, floating-point, and SIMD units) in a concurrent fashion. However, a multithreaded microprocessor also includes multiple sets of resources, or contexts, for storing the unique state of each thread, such as multiple program counters and general-purpose register sets, to facilitate the ability to quickly switch between threads to fetch and issue instructions.
One example of a performance-constraining issue addressed by multithreaded microprocessors is the fact that accesses to memory outside the microprocessor, which must be performed when an access misses in the cache, typically have a relatively long latency. It is quite common in contemporary microprocessor-based computer systems for such a memory access to take one to two orders of magnitude longer than a cache hit. Consequently, while the pipeline is stalled waiting for the data from memory, some or all of the pipeline stages of a single-threaded microprocessor may sit idle, performing no useful work, for many clock cycles. A multithreaded microprocessor may alleviate this problem by issuing instructions from other threads during the memory fetch latency, thereby enabling the pipeline stages to make forward progress performing useful work, somewhat analogously to, but at a finer level of granularity than, an operating system performing a task switch on a page fault. Other examples are pipeline stalls, and their accompanying idle clock cycles, due to a branch misprediction and its attendant pipeline flush, or due to a data dependency, or due to a long-latency instruction such as a divide instruction. Again, the ability of a multithreaded microprocessor to issue instructions from other threads to pipeline stages that would otherwise be idle may significantly reduce the time required to execute the program or collection of programs comprising the threads. Another problem, particularly in embedded systems, is the wasted overhead associated with servicing interrupts. Typically, when an input/output device signals an interrupt event to the microprocessor, the microprocessor transfers control to an interrupt service routine, which requires saving the current program state, servicing the interrupt, and restoring the current program state after the interrupt has been serviced. A multithreaded microprocessor provides the ability for event service code to have its own thread with its own context. Consequently, in response to the input/output device signaling an event, the microprocessor can quickly, perhaps in a single clock cycle, switch to the event service thread, thereby avoiding the conventional interrupt service routine overhead.
Just as the degree of instruction-level parallelism dictates the extent to which a microprocessor may take advantage of the benefits of pipelining and superscalar instruction issue, the degree of thread-level parallelism dictates the extent to which a microprocessor may take advantage of multithreaded execution. An important characteristic of a thread is its independence from the other threads being executed on the multithreaded microprocessor. A thread is independent of another thread to the extent that its instructions do not depend on instructions in the other thread. The independent characteristic of threads enables the microprocessor to execute the instructions of the various threads concurrently. That is, the microprocessor may issue instructions of one thread to the execution units without regard to the instructions of other threads being issued. To the extent that the threads access common data, the threads themselves must be programmed to synchronize their data accesses with one another to ensure proper operation, so that the microprocessor instruction issue stages need not be concerned with such dependencies.
As may be observed from the foregoing, a processor that concurrently executes multiple threads may reduce the time required to execute a program or collection of programs comprising the multiple threads. However, there is overhead associated with creating and dispatching a new thread for execution. That is, the microprocessor must spend time, which could otherwise be used performing useful work, to perform the functions necessary to create a new thread—typically allocating a context for the new thread and copying the parent thread's context to the new thread's context—and to schedule the new thread for execution, i.e., to determine when the microprocessor will begin fetching and issuing instructions of the new thread. This overhead time is analogous to the task-switch overhead of a multitasking operating system, and it does not contribute toward performing the actual task the program or programs are to accomplish, such as multiplying matrices, processing a packet received from a network, or rendering an image. Consequently, although the concurrent execution of multiple threads may in theory improve microprocessor performance, the extent of the performance improvement is limited by the overhead required to create a new thread. Stated differently, the greater the overhead required to create a new thread, the larger the amount of useful work the new thread must perform to offset the cost of creating it. For threads with relatively long execution times, the thread creation overhead may be largely irrelevant to performance. However, some applications might benefit from relatively frequently created threads with relatively short execution times, in which case the thread creation overhead must be short enough to realize appreciable performance gains from multithreading. Therefore, what is needed is a multithreaded microprocessor that has a lightweight thread creation instruction in its instruction set.
Summary of the Invention
The present invention provides a single instruction in a multithreaded microprocessor instruction set that, when executed, allocates a thread context for a new thread and schedules execution of the new thread. In one embodiment, the instruction occupies a single instruction issue slot in the microprocessor in a RISC-like (reduced instruction set computer) fashion. The instruction has very low overhead because it forgoes copying the entire parent thread context to the new thread, which would either take a relatively long time if the thread context were copied sequentially, or require a very wide datapath and a large amount of logic if copied in parallel. Instead, the instruction includes a first operand and a second operand, wherein the first operand is an initial instruction fetch address that is stored into the program counter of the new thread context, and the second operand is stored into a register (such as a general-purpose register) of the register set of the new thread context. The second operand may be used by the new thread as a pointer to a data structure in memory that contains the data needed by the new thread, such as initial general-purpose register values. The second operand enables the new thread to populate only the registers it needs by reading them from the data structure. This is advantageous because the inventors have observed that a new thread commonly needs only one to five registers populated, whereas many contemporary microprocessors include, for example, 32 general-purpose registers; in the common case the microprocessor of the present invention therefore avoids the wasted effort of copying the entire parent thread register set to the new thread register set.
In one embodiment, the instruction includes a third operand that specifies which register of the new thread context is to receive the second operand. In one embodiment, the instruction may be executed by user-mode code, advantageously avoiding the need for operating system involvement in the common case of creating a new thread. Another benefit of having a single instruction that performs both new thread context allocation and new thread scheduling is that precious opcode space in the instruction set may be conserved relative to implementations that require multiple instructions to create and schedule a new thread. The instruction is able to perform its two functions within a single instruction by raising an exception on the instruction if no free thread context is available for allocation at the time the instruction is executed.
In one aspect, the present invention provides an instruction for execution on a microprocessor configured to execute concurrent program threads. The instruction includes an opcode for instructing the microprocessor to allocate resources for a new thread and to schedule execution of the new thread on the microprocessor, the resources including a program counter and a register set. The instruction also includes a first operand for specifying an initial instruction fetch address to be stored into the program counter allocated for the new thread. The instruction also includes a second operand to be stored into a register of the register set allocated for the new thread.
In another aspect, the present invention provides a multithreaded microprocessor. The microprocessor includes a plurality of thread contexts, each configured to store the state of a thread and to indicate whether the thread context is available for allocation. The microprocessor also includes a scheduler, coupled to the plurality of thread contexts, for allocating one of the plurality of thread contexts to a new thread and scheduling execution of the new thread in response to a single instruction of a currently executing thread. If none of the plurality of thread contexts is available for allocation, the microprocessor raises an exception on the instruction.
In another aspect, the present invention provides a multithreaded microprocessor. The microprocessor includes a first program counter for storing a fetch address of instructions of a first program thread. The microprocessor also includes a first register set, comprising first and second registers specified by an instruction, for storing first and second operands, respectively, the first operand specifying a fetch address of a second program thread. The microprocessor also includes a second program counter, coupled to the first register set, for receiving the first operand from the first register in response to the instruction. The microprocessor also includes a second register set, coupled to the first register set, comprising a third register for receiving the second operand from the second register in response to the instruction. The microprocessor also includes a scheduler, coupled to the first and second register sets, for causing the microprocessor, in response to the instruction, to fetch instructions from the second thread initial fetch address stored in the second program counter and to execute the fetched instructions.
In another aspect, the present invention provides a method for creating a new thread of execution on a multithreaded processor. The method includes decoding a single instruction executing in a first program thread. The method also includes allocating a program counter and a register set of the microprocessor to a second program thread in response to decoding the instruction. The method also includes storing a first operand of the instruction into a register of the register set, in response to allocating the program counter and register set to the second program thread. The method also includes storing a second operand of the instruction into the program counter, in response to allocating the program counter and register set to the second program thread. The method also includes scheduling execution of the second program thread on the microprocessor after storing the first and second operands.
In another aspect, the present invention provides a multithreaded processing system. The system includes a memory configured to store a fork instruction of a first thread and a data structure, the fork instruction specifying registers storing a memory address of the data structure and an initial instruction address of a second thread. The data structure contains initial general-purpose register values for the second thread. The system also includes a microprocessor, coupled to the memory, that allocates a free thread context for the second thread, stores the second thread initial instruction address into a program counter of the thread context, stores the data structure memory address into a register of the thread context, and schedules the second thread for execution, in response to the fork instruction.
In another aspect, the present invention provides a computer program product for use with a computing device, the computer program product including a computer-usable medium having computer-readable program code embodied in the medium for providing a multithreaded microprocessor. The computer-readable program code includes first program code for providing a first program counter for storing a fetch address of instructions of a first program thread. The computer-readable program code also includes second program code for providing a first register set, comprising first and second registers specified by an instruction, for storing first and second operands, respectively, the first operand specifying a fetch address of a second program thread. The computer-readable program code also includes third program code for providing a second program counter, coupled to the first register set, for receiving the first operand from the first register in response to the instruction. The computer-readable program code also includes fourth program code for providing a second register set, coupled to the first register set, comprising a third register for receiving the second operand from the second register in response to the instruction. The computer-readable program code also includes fifth program code for providing a scheduler, coupled to the first and second register sets, for causing the microprocessor, in response to the instruction, to fetch instructions from the second program thread initial fetch address stored in the second program counter and to execute the fetched instructions.
In another aspect, the present invention provides a computer data signal embodied in a transmission medium, comprising computer-readable program code for providing a multithreaded microprocessor for executing a fork instruction. The program code includes first program code for providing an opcode for instructing the microprocessor to allocate resources for a new thread and to schedule execution of the new thread on the microprocessor, the resources comprising a program counter and a register set. The program code also includes second program code for providing a first operand for specifying an initial instruction fetch address to be stored into the program counter allocated for the new thread. The program code also includes third program code for providing a second operand to be stored into a register of the register set allocated for the new thread.
Brief Description of the Drawings
FIG. 1 is a block diagram illustrating a computer system according to the present invention;
FIG. 2 is a block diagram illustrating the multithreaded microprocessor of the computer system of FIG. 1 according to the present invention;
FIG. 3 is a block diagram illustrating a fork instruction executed by the microprocessor of FIG. 2 according to the present invention;
FIG. 4 is a block diagram illustrating one of the per-thread control registers of FIG. 2, the TCStatus register, according to the present invention;
FIG. 5 is a flowchart illustrating operation of the microprocessor of FIG. 2 to execute the fork instruction of FIG. 3 according to the present invention.
Detailed Description of the Invention
Referring now to FIG. 1, a block diagram of a computer system 100 according to the present invention is shown. The computer system 100 includes a multithreaded microprocessor 102 coupled to a system interface controller 104. The system interface controller 104 is coupled to a system memory 108 and to a plurality of input/output (I/O) devices 106, each of which provides an interrupt request line 112 to the microprocessor 102. The computer system 100 may be, but is not limited to, a general-purpose programmable computer system, a server computer, a workstation computer, a personal computer, a notebook computer, a personal digital assistant (PDA), or an embedded system such as, but not limited to, a network router or switch, a printer, a mass storage controller, a camera, a scanner, an automobile control system, and the like.
The system memory 108 includes memory, such as random access memory (RAM) or read-only memory (ROM), for storing program instructions to be executed by the microprocessor 102 and for storing data to be processed by the microprocessor 102 according to the program instructions. The program instructions may comprise a plurality of program threads that the microprocessor 102 executes concurrently. A program thread, or thread, comprises a sequence, or stream, of executable program instructions and the associated sequence of state changes in the microprocessor 102 that accompany the execution of those instructions. The sequence of instructions typically, but not necessarily, includes one or more program control instructions, such as branch instructions; consequently, the instructions may or may not have sequential memory addresses. The sequence of instructions comprising a thread is from a single program. In particular, the microprocessor 102 is configured to execute a fork instruction for creating a new program thread, i.e., for allocating the microprocessor resources needed by the microprocessor 102 to execute a thread and for scheduling execution of the thread on the microprocessor 102, as described in detail below.
The system interface controller 104 interfaces with the microprocessor 102 via a processor bus coupling the microprocessor 102 to the system interface controller 104. In one embodiment, the system interface controller 104 includes a memory controller for controlling the system memory 108. In one embodiment, the system interface controller 104 includes a local bus interface controller for providing a bus, such as a PCI bus, to which the input/output (I/O) devices 106 are coupled.
The input/output (I/O) devices 106 may include, but are not limited to, user input devices such as keyboards, mice, and scanners; display devices such as monitors and printers; storage devices such as disk drives, tape drives, and optical drives; system peripheral devices such as direct memory access controllers (DMACs), clocks, timers, and I/O ports; network devices such as media access controllers (MACs) for Ethernet, Fibre Channel, Infiniband, or other high-speed network interfaces; and data conversion devices such as analog-to-digital (A/D) converters and digital-to-analog (D/A) converters. The I/O devices 106 generate interrupt signals 112 to the microprocessor 102 to request service. Advantageously, the microprocessor 102 is capable of concurrently executing multiple program threads to process the events indicated on the interrupt request lines 112 without requiring the conventional overhead associated with saving the current state of the microprocessor 102, transferring control to an interrupt service routine, and restoring the state after the interrupt service routine completes.
In one embodiment, the computer system 100 comprises a multiprocessing system including a plurality of the multithreaded microprocessors 102. In one embodiment, each microprocessor 102 provides two distinct, but not mutually exclusive, multithreading capabilities. First, each microprocessor 102 includes a plurality of logical processing contexts, each of which appears to the operating system as an independent processing element, referred to herein as a virtual processing element (VPE), by sharing resources of the microprocessor 102. To the operating system, a microprocessor 102 with N VPEs appears like an N-way symmetric multiprocessor (SMP), which allows existing SMP-capable operating systems to manage the plurality of VPEs. Second, each VPE may include a plurality of thread contexts for concurrently executing multiple threads. Consequently, the microprocessor 102 also provides a multithreaded programming model in which, in the common case, threads may be created and destroyed without operating system intervention, and in which system service threads may be scheduled with zero interrupt latency in response to external conditions such as input/output service event signals.
Referring now to FIG. 2, a block diagram of the multithreaded microprocessor 102 of the computer system 100 of FIG. 1 according to the present invention is shown. The microprocessor 102 is a pipelined microprocessor comprising a plurality of pipeline stages. The microprocessor 102 includes a plurality of thread contexts 228 for storing state associated with a plurality of threads. A thread context 228 comprises a collection of registers and/or bits in registers of the microprocessor 102 that describe the state of execution of a thread. In one embodiment, a thread context 228 comprises a register set 224 (such as a set of general-purpose registers (GPRs)), a program counter (PC) 222, and per-thread control registers 226, portions of whose contents are described in detail below. The embodiment of FIG. 2 illustrates four thread contexts 228, each comprising a program counter 222, a register set 224, and per-thread control registers 226. In one embodiment, a thread context 228 also includes multiply result registers. In another embodiment, each of the register sets 224 comprises two read ports and one write port, to support reading from two of its registers and writing to one of its registers in a single clock cycle. As described below, the FORK instruction 300 includes two source operands and one destination operand; consequently, the microprocessor 102 is able to execute the FORK instruction 300 in a single clock cycle.
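By way of illustration only, the per-thread state just described might be modeled in C roughly as follows; the structure and field names are invented for this sketch and are not register names used by the microprocessor 102:

```c
#include <stdint.h>

#define NUM_GPRS 32   /* e.g., the 32 general-purpose registers of the MIPS ISA */
#define NUM_TC    4   /* four thread contexts, as in the embodiment of FIG. 2 */

/* Illustrative model of one thread context 228. */
typedef struct {
    uint32_t pc;              /* program counter 222 */
    uint32_t gpr[NUM_GPRS];   /* register set 224 */
    uint32_t tcstatus;        /* one of the per-thread control registers 226 (see FIG. 4) */
} thread_context_t;

static thread_context_t thread_contexts[NUM_TC];
```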
In contrast to the thread contexts 228, the microprocessor 102 also maintains a processor context, which is a larger collection of the state of the microprocessor 102. In the embodiment of FIG. 2, the processor context is stored in per-processor control registers 218, and each virtual processing element (VPE) includes its own set of per-processor control registers 218. In one embodiment, one of the per-processor control registers 218 is a status register containing a field that specifies the most recently dispatched thread exception raised via the exception signal 234. In particular, if a VPE issues a fork instruction 300 of a current thread but no thread context 228 is currently free for allocation to the new thread, the exception field indicates a thread overflow condition. In one embodiment, the microprocessor 102 substantially conforms to the MIPS32 or MIPS64 instruction set architecture (ISA), and the per-processor control registers 218 substantially conform to the registers for storing the processor context of the MIPS Privileged Resource Architecture (PRA), i.e., the mechanisms necessary for an operating system to manage the resources of the microprocessor 102, such as virtual memory, caches, exceptions, and user contexts.
The microprocessor 102 includes a scheduler 216 for scheduling execution of the various threads being concurrently executed by the microprocessor 102. The scheduler 216 is coupled to the per-thread control registers 226 and to the per-processor control registers 218. In particular, the scheduler 216 is responsible for scheduling the fetching of instructions from the program counters 222 of the various threads and for scheduling the issuing of the fetched instructions to the execution units of the microprocessor 102, as described below. The scheduler 216 schedules execution of the threads based on a scheduling policy of the microprocessor 102, which may include, but is not limited to, any of the following policies. In one embodiment, the scheduler 216 employs a round-robin, or time-division multiplexed, or interleaved scheduling policy that allocates a predetermined number of clock cycles or instruction issue slots to each ready thread in a rotating order. The round-robin policy is useful in applications in which fairness is important and in which certain threads, such as real-time application threads, require a minimum quality of service. In one embodiment, the scheduler 216 employs a blocking scheduling policy in which the scheduler 216 continues to schedule fetching and issuing of the currently running thread until an event occurs that blocks further execution of the thread, such as a cache miss, a branch misprediction, a data dependency, or a long-latency instruction. In one embodiment, the microprocessor 102 comprises a superscalar pipelined microprocessor, and the scheduler 216 schedules the issuing of multiple instructions per clock cycle and, in particular, the issuing of instructions from multiple threads per clock cycle, commonly referred to as simultaneous multithreading.
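By way of illustration only, a round-robin selection among ready thread contexts of the kind described above might be modeled as follows; the types, names, and bits examined are assumptions made for this sketch rather than the scheduler's actual implementation:

```c
#include <stdbool.h>

#define NUM_TC 4  /* hypothetical number of thread contexts */

typedef struct {
    bool active;   /* corresponds to the A bit: thread is ready to fetch and issue */
    bool halted;   /* corresponds to a halted bit set by software */
} tc_state_t;

/* Return the index of the next thread context to fetch from, rotating
 * from the one scheduled last, or -1 if no thread is ready this cycle. */
int round_robin_select(const tc_state_t tc[NUM_TC], int last)
{
    for (int i = 1; i <= NUM_TC; i++) {
        int candidate = (last + i) % NUM_TC;
        if (tc[candidate].active && !tc[candidate].halted)
            return candidate;
    }
    return -1;
}
```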
The microprocessor 102 includes an instruction cache 202 for caching program instructions fetched from the system memory 108 of FIG. 1, such as the fork instruction 300 of FIG. 3. In one embodiment, the microprocessor 102 provides virtual memory capability, and the fetch unit 204 includes a translation lookaside buffer for caching virtual-to-physical memory page translations. In one embodiment, each program, or task, executing on the microprocessor 102 is assigned a unique task ID, or address space ID (ASID), which is used in performing memory accesses and, in particular, memory address translations, and a thread context 228 also includes storage for an ASID associated with the thread. In one embodiment, when a parent thread executes a fork instruction 300 to create a new thread, the new thread inherits the ASID, and therefore the address space, of the parent thread. In one embodiment, the various threads executing on the microprocessor 102 share the instruction cache 202 and the translation lookaside buffer. In another embodiment, each thread includes its own translation lookaside buffer.
The microprocessor 102 also includes a fetch unit 204, coupled to the instruction cache 202, for fetching program instructions, such as the fork instruction 300, from the instruction cache 202 and from the system memory 108. The fetch unit 204 fetches instructions at an instruction fetch address provided by a multiplexer 244. The multiplexer 244 receives a plurality of instruction fetch addresses from a corresponding plurality of program counters 222, each of which stores a current instruction fetch address for a different thread. The embodiment of FIG. 2 illustrates four program counters 222 associated with four different threads. The multiplexer 244 selects one of the four program counters 222 based on a selection input provided by the scheduler 216. In one embodiment, the various threads executing on the microprocessor 102 share the fetch unit 204.
The microprocessor 102 also includes a decode unit 206, coupled to the fetch unit 204, for decoding program instructions fetched by the fetch unit 204, such as the fork instruction 300. The decode unit 206 decodes the opcode, operands, and other fields of the instructions. In one embodiment, the various threads executing on the microprocessor 102 share the decode unit 206.
The microprocessor 102 also includes execution units 212 for executing instructions. The execution units 212 may include, but are not limited to, one or more integer units for performing integer arithmetic, Boolean operations, shift operations, rotate operations, and the like; floating-point units for performing floating-point operations; load/store units for performing memory accesses and, in particular, accesses to a data cache 242 coupled to the execution units 212; and a branch resolution unit for resolving the outcome and target address of branch instructions. In one embodiment, the data cache 242 includes a translation lookaside buffer for caching virtual-to-physical memory page translations. In addition to the operands received from the data cache 242, the execution units 212 also receive operands from registers of the register sets 224. In particular, an execution unit 212 receives operands from the register set 224 of the thread context 228 allocated to the thread to which the instruction belongs. A multiplexer 248 selects operands from the appropriate register set 224, based on the thread context 228 of the instruction being executed by the execution unit 212, and provides them to the execution unit 212. In one embodiment, the execution units 212 may concurrently execute instructions from multiple concurrent threads.
One of the execution units 212 is responsible for executing the fork instruction 300 and, in response to the fork instruction 300 being issued, asserts a new_thread_request signal 232, which is provided to the scheduler 216. The new_thread_request signal 232 requests the scheduler 216 to allocate a new thread context 228 and to schedule execution of the new thread associated with the thread context 228. As described in more detail below, if a new thread context 228 is requested for allocation but no free thread context is available, the scheduler 216 asserts an exception signal 234 to raise an exception on the fork instruction 300. In one embodiment, the scheduler 216 maintains a count of the number of thread contexts 228 that are free for allocation, and if the count indicates that no thread contexts 228 are free when the new_thread_request signal 232 is asserted, the scheduler 216 raises the exception 234 on the fork instruction 300. In another embodiment, when the new_thread_request signal 232 is asserted, the scheduler 216 examines status bits in the per-thread control registers 226 to determine whether a free allocatable thread context 228 is available.
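By way of illustration only, the free-count embodiment of the scheduler's response to new_thread_request might be sketched in C as follows; all function names and the overall decomposition are assumptions made for this sketch, not part of the claimed design:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; these names are illustrative only. */
extern void raise_thread_overflow_exception(void);
extern int  allocate_thread_context(void);
extern void set_program_counter(int tc, uint32_t pc);
extern void set_gpr(int tc, unsigned reg, uint32_t value);
extern void mark_active(int tc);

static int free_tc_count; /* incremented on YIELD deallocation, decremented on FORK allocation */

/* Model of the scheduler's response to the new_thread_request signal 232. */
bool handle_new_thread_request(uint32_t rs_value, uint32_t rt_value, unsigned rd_index)
{
    if (free_tc_count == 0) {
        raise_thread_overflow_exception();  /* exception 234 raised on the FORK instruction */
        return false;
    }
    free_tc_count--;
    int tc = allocate_thread_context();     /* choose a free thread context 228 */
    set_program_counter(tc, rs_value);      /* rs operand becomes the new thread's initial PC */
    set_gpr(tc, rd_index, rt_value);        /* rt operand stored into the new thread's rd register */
    mark_active(tc);                        /* set the A bit so the scheduler begins fetching */
    return true;
}
```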
The microprocessor 102 also includes an instruction issue unit 208, coupled to the scheduler 216 and coupled between the decode unit 206 and the execution units 212, for issuing instructions to the execution units 212 as directed by the scheduler 216 and in response to the instruction information decoded by the decode unit 206. In particular, the instruction issue unit 208 ensures that an instruction is not issued to the execution units 212 if it has a data dependency on another instruction previously issued to the execution units 212. In one embodiment, an instruction queue is interposed between the decode unit 206 and the instruction issue unit 208 for buffering instructions awaiting issue to the execution units 212, in order to reduce the likelihood of starvation of the execution units 212. In one embodiment, the various threads executing on the microprocessor 102 share the instruction issue unit 208.
The microprocessor 102 also includes a write-back unit 214, coupled to the execution units 212, for writing back results of completed instructions to the register sets 224. A demultiplexer 246 receives an instruction result from the write-back unit 214 and stores it into the appropriate register set 224 associated with the thread of the completed instruction.
Referring now to FIG. 3, a block diagram of the fork instruction 300 executed by the microprocessor 102 of FIG. 2 according to the present invention is shown. As shown, the mnemonic for the FORK instruction 300 is fork rd, rs, rt, wherein rd, rs, and rt are the three operands of the FORK instruction 300. FIG. 3 illustrates the various fields of the FORK instruction: bits 26 through 31 are an opcode field 302, and bits 0 through 5 are a function field 314. In one embodiment, the opcode field 302 indicates that the instruction is a SPECIAL3-type instruction of the MIPS instruction set, and the function field 314 indicates that the function is the FORK instruction. Hence, the decode unit 206 of FIG. 2 examines the opcode field 302 and the function field 314 to determine whether an instruction is a FORK instruction 300. Bits 6 through 10 are reserved as zero.
Bits 21 through 25, bits 16 through 20, and bits 11 through 15 are an rs field 304, an rt field 306, and an rd field 308, respectively, which designate an rs register 324, an rt register 326, and an rd register 328 within the register sets 224 of FIG. 2. In one embodiment, each of the rs register 324, rt register 326, and rd register 328 is one of the 32 general-purpose registers of the MIPS ISA. The rs register 324 and the rt register 326 are both registers of the register set 224 allocated to the thread containing the FORK instruction 300, referred to as the parent thread, forking thread, or current thread. The rd register 328 is a register of the register set 224 allocated to the thread created by the FORK instruction 300, referred to as the new thread, or child thread.
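By way of illustration only, the field extraction implied by the bit positions above might be modeled in C as follows; the function and type names are invented for this sketch:

```c
#include <stdint.h>

/* Field layout of the FORK instruction as described with respect to FIG. 3:
 * bits 31..26 opcode, 25..21 rs, 20..16 rt, 15..11 rd, 10..6 reserved (0), 5..0 function. */
typedef struct {
    unsigned opcode;    /* opcode field 302 */
    unsigned rs;        /* rs field 304: register holding the new thread's start address */
    unsigned rt;        /* rt field 306: register holding the value passed to the new thread */
    unsigned rd;        /* rd field 308: destination register in the new thread context */
    unsigned function;  /* function field 314 */
} fork_fields_t;

fork_fields_t decode_fork_fields(uint32_t insn)
{
    fork_fields_t f;
    f.opcode   = (insn >> 26) & 0x3F;
    f.rs       = (insn >> 21) & 0x1F;
    f.rt       = (insn >> 16) & 0x1F;
    f.rd       = (insn >> 11) & 0x1F;
    f.function = insn & 0x3F;
    return f;
}
```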
As shown in FIG. 3, the FORK instruction 300 instructs the microprocessor 102 to copy the value of the parent thread's rs register 324 to the program counter 222 of the new thread. The new thread program counter 222 is used as the initial instruction fetch address of the new thread.
In addition, the FORK instruction 300 instructs the microprocessor 102 to copy the value of the parent thread's rt register 326 to the rd register 328 of the new thread. In typical program operation, the program uses the rd register 328 value as the memory address of a data structure for the new thread. This enables the FORK instruction 300 to forgo copying the entire contents of the parent thread register set 224 to the new thread register set 224, which advantageously keeps the FORK instruction 300 compact and efficient and allows it to execute in a single clock cycle. Instead, the new thread includes instructions that populate only the registers it needs by reading the register values from the data structure, which will likely be present in the data cache 242. This is advantageous because the inventors have determined that new threads commonly need only one to five registers populated, rather than the large number of registers typically found in many contemporary microprocessors (such as the 32 general-purpose registers of the MIPS instruction set). Copying the entire register set 224 in a single clock cycle would require an impractically wide datapath between the various thread contexts 228 of the microprocessor 102, whereas copying the entire register set 224 sequentially (e.g., one or two registers per clock cycle) would take substantially longer and make the microprocessor 102 more complex. The FORK instruction 300, by contrast, may advantageously be executed in a single clock cycle in a RISC-like fashion.
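By way of illustration only, the usage pattern just described might look as follows in C; hw_fork() is an invented stand-in for the FORK instruction (for example, a compiler intrinsic or a one-instruction assembly stub), and child_args_t is an invented name:

```c
#include <stdint.h>

/* Only the values the child actually needs are placed in memory;
 * the rest of the parent's general-purpose registers are never copied. */
typedef struct {
    uint32_t *src;
    uint32_t *dst;
    uint32_t  count;
} child_args_t;

/* Stand-in for the FORK instruction: start a new thread at 'entry' with
 * 'arg' deposited in one designated register of the new thread context. */
extern void hw_fork(void (*entry)(child_args_t *), child_args_t *arg);

static void child_entry(child_args_t *a)    /* becomes the new thread's initial PC */
{
    for (uint32_t i = 0; i < a->count; i++) /* child reads its few inputs from memory */
        a->dst[i] = a->src[i];
}

void spawn_copy(uint32_t *dst, uint32_t *src, uint32_t n)
{
    static child_args_t args;               /* must stay live while the child runs */
    args.src = src;
    args.dst = dst;
    args.count = n;
    /* parent's rs holds child_entry, parent's rt holds &args;
     * &args appears in the child's rd register */
    hw_fork(child_entry, &args);
}
```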
Advantageously, not only may an operating system executing on the microprocessor 102 employ the FORK instruction 300 to allocate resources for a new thread and to schedule its execution, but user-level threads may do so as well. This is particularly advantageous for programs that relatively frequently create and terminate relatively short threads. For example, programs containing many loops with short loop bodies and no data dependencies between iterations may benefit from the small thread creation overhead of the FORK instruction 300. Assume a code loop such as the following:
```c
for (i = 0; i < N; i++) {
    result[i] = FUNCTION(x[i], y[i]);
}
```
The smaller the thread creation and destruction overhead, the shorter the FUNCTION instruction sequence can be while still being usefully parallelized across multiple threads. If the overhead associated with creating and destroying a new thread were on the order of 100 instructions, as is the case with conventional thread creation mechanisms, then FUNCTION would have to be many instructions long for any benefit, if any, to be gained from parallelizing the loop across multiple threads. However, the fact that the overhead of the FORK instruction 300 is so small (a single clock cycle in one embodiment) advantageously implies that even very small sections of code may benefit from being parallelized across multiple threads.
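For illustration only, the loop above might be split across a few lightweight threads roughly as follows, reusing the invented hw_fork() stand-in from the sketch above; the slice structure and the crude spin-wait join are likewise assumptions, not the patent's API:

```c
#define NUM_CHILD 4                    /* hypothetical number of worker threads */

typedef struct {
    const int *x, *y;
    int       *result;
    int        begin, end;             /* half-open iteration range [begin, end) */
    volatile int done;                 /* set to 1 by the child when it finishes */
} slice_t;

extern int  FUNCTION(int a, int b);    /* the per-iteration work from the loop above */
extern void hw_fork(void (*entry)(slice_t *), slice_t *arg);  /* stand-in for FORK */

static void worker(slice_t *s)
{
    for (int i = s->begin; i < s->end; i++)
        s->result[i] = FUNCTION(s->x[i], s->y[i]);
    s->done = 1;                       /* signal completion to the parent */
}

void parallel_loop(int *result, const int *x, const int *y, int N)
{
    static slice_t slice[NUM_CHILD];
    for (int t = 0; t < NUM_CHILD; t++) {
        slice[t].x = x;  slice[t].y = y;  slice[t].result = result;
        slice[t].begin = t * N / NUM_CHILD;
        slice[t].end   = (t + 1) * N / NUM_CHILD;
        slice[t].done  = 0;
        hw_fork(worker, &slice[t]);    /* each thread creation costs one issue slot */
    }
    for (int t = 0; t < NUM_CHILD; t++)
        while (!slice[t].done)         /* crude join: spin until every slice finishes */
            ;
}
```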
Although FIG. 3 illustrates only the values of the rt register 326 and the rs register 324 being copied from the parent thread context 228 to the new thread context 228, other state, or context, may also be copied in response to the FORK instruction 300, as described below with respect to FIG. 4.
Referring now to FIG. 4, a block diagram of one of the per-thread control registers 226 of FIG. 2, the TCStatus register 400, according to the present invention is shown. That is, each thread context 228 includes a TCStatus register 400. The various fields of the TCStatus register 400 are described in the table of FIG. 4; however, certain fields of particular relevance to the FORK instruction 300 are described here in more detail.
The TCStatus register 400 includes a TCU field 402. In one embodiment, in accordance with the MIPS instruction set and Privileged Resource Architecture (PRA), the microprocessor 102 comprises a microprocessor core and one or more coprocessors. The TCU field 402 controls whether a thread has access to, or is restricted from, a particular coprocessor. In the embodiment of FIG. 4, the TCU field 402 allows control of up to four coprocessors. In one embodiment, the FORK instruction 300 instructs the microprocessor 102 to copy the value of the TCU field 402 of the parent thread to the TCU field 402 of the new thread created by the FORK instruction 300.
The TCStatus register 400 also includes a DT bit 406, which indicates whether the thread context 228 is "dirty." The DT bit 406 may be used by an operating system to provide security between different programs. For example, where the FORK instruction 300 is used to dynamically allocate thread contexts 228 and the YIELD instruction of the microprocessor 102 is used to deallocate thread contexts 228 across different security domains, e.g., by multiple application programs, or by both the operating system and application programs, there is a risk of information leakage in the form of register values inherited by an application, which must be managed by a secure operating system. The DT bit 406 associated with a thread context 228 is set by the microprocessor 102 whenever the thread context 228 is modified, and may be cleared by software. The operating system may initialize all thread contexts 228 to a known clean state and clear all of the DT bits 406 associated with the thread contexts 228 before scheduling a task. When a task switch occurs, thread contexts 228 whose DT bit 406 is set must be scrubbed to a clean state before other jobs are allowed to allocate and use them. If a secure operating system wishes to create threads dynamically and to allocate privileged service threads, the associated thread contexts 228 must be scrubbed before they are handed over for potential use by applications. The reader is referred to the concurrently filed, co-pending US patent application referenced above, entitled "Integrated mechanism for suspension and deallocation of computational threads of execution in a processor" (docket number MIPS.0189-01US), in which the YIELD instruction is described in detail.
The TCStatus register 400 also includes a DA status bit 412, which indicates whether the thread context 228 may be dynamically allocated and scheduled by a FORK instruction 300 and dynamically deallocated by a YIELD instruction. In one embodiment, a portion of the thread contexts 228 are dynamically allocatable by the FORK instruction 300, whereas another portion of the thread contexts 228 are not dynamically allocatable by the FORK instruction 300 but are instead statically allocated to permanent threads of a program. For example, one or more thread contexts 228 may be statically allocated to a portion of the operating system rather than being dynamically allocated by the FORK instruction 300. As another example, in an embedded system, one or more thread contexts 228 may be statically allocated to privileged service threads that function similarly to the interrupt service routines used in conventional processors to service interrupt sources, and that are known to be a necessary part of the application. For example, in a network router, one or more thread contexts 228 may be statically allocated to threads for processing events signaled by a set of input/output ports. Such ports may generate a large number of events that are handled efficiently by the single-clock-cycle thread switching of the microprocessor 102 described herein, giving it an advantage over other microprocessors, which would have to incur, for the large number of interrupt events, the overhead associated with saving the relevant state and transferring control to an interrupt service routine.
In one embodiment, the DA bit 412 may be used by an operating system to handle sharing of the thread contexts 228 among application programs. For example, a FORK instruction 300 may attempt to allocate a thread context 228 when no free thread context 228 is available for allocation, in which case the microprocessor 102 raises a thread overflow exception 234 on the FORK instruction 300. In response, the operating system saves a copy of the current DA bit 412 values and then clears the DA bits 412 of all thread contexts 228. The next time a thread context is deallocated by an application, a thread underflow exception 234 is raised; in response, the operating system restores the DA bits 412 that were saved in response to the thread overflow exception 234 and schedules a replay of the FORK instruction 300 that caused the original thread overflow exception 234.
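By way of illustration only, the pair of exception handlers implied by this policy might be sketched in C as follows; the accessor and replay routines are invented names, and the handlers are a minimal sketch of the save/clear/restore/replay sequence rather than a complete operating system:

```c
#define NUM_TC 4                          /* hypothetical number of thread contexts */

/* Hypothetical accessors for the per-thread-context DA bit and for
 * replaying the faulting FORK; none of these names come from the patent. */
extern int  read_da_bit(int tc);
extern void write_da_bit(int tc, int value);
extern void replay_faulting_fork(void);

static int saved_da[NUM_TC];

/* Thread-overflow handler: remember which contexts were dynamically
 * allocatable, then clear DA everywhere so the next deallocation traps. */
void on_thread_overflow(void)
{
    for (int tc = 0; tc < NUM_TC; tc++) {
        saved_da[tc] = read_da_bit(tc);
        write_da_bit(tc, 0);
    }
}

/* Thread-underflow handler: a context has just been freed, so restore the
 * DA bits and let the original FORK run again, this time successfully. */
void on_thread_underflow(void)
{
    for (int tc = 0; tc < NUM_TC; tc++)
        write_da_bit(tc, saved_da[tc]);
    replay_faulting_fork();
}
```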
The TCStatus register 400 also includes an A bit 414, which indicates whether the thread associated with the thread context 228 is activated. When the thread is activated, the scheduler 216 schedules instructions to be fetched from its program counter 222 and issued, according to the scheduling policy of the scheduler 216. The scheduler 216 automatically sets the A bit 414 when a FORK instruction 300 dynamically allocates the thread context 228, and automatically clears the A bit 414 when a YIELD instruction dynamically deallocates the thread context 228. In one embodiment, when the microprocessor 102 is reset, one of the thread contexts 228 is designated as the reset thread context 228 for executing the initial thread of the microprocessor 102, and the A bit 414 of the reset thread context 228 is automatically set in response to the reset of the microprocessor 102.
The TCStatus register 400 also includes a TKSU field 416, which indicates the privilege state, or level, of the thread context 228. In one embodiment, the privilege level may be one of three levels: kernel, supervisor, or user. In one embodiment, the FORK instruction 300 instructs the microprocessor 102 to copy the value of the TKSU field 416 of the parent thread to the TKSU field 416 of the new thread created by the FORK instruction 300.
The TCStatus register 400 also includes a TASID field 422, which specifies the address space ID (ASID), or unique task ID, of the thread context 228. In one embodiment, the FORK instruction 300 instructs the microprocessor 102 to copy the value of the TASID field 422 of the parent thread to the TASID field 422 of the new thread created by the FORK instruction 300, such that the parent thread and the new thread share the same address space.
In one embodiment, the per-thread control registers 226 also include a register for storing a halted bit which, when set by software, halts a thread, i.e., places the thread context 228 in a halted state.
Referring now to FIG. 5, a flowchart illustrating operation of the microprocessor 102 of FIG. 2 to execute the FORK instruction 300 of FIG. 3 according to the present invention is shown. Flow begins at block 502.
At block 502, the fetch unit 204 fetches a FORK instruction 300 using the program counter 222 of the current thread, the decode unit 206 decodes the FORK instruction 300, and the instruction issue unit 208 issues the FORK instruction 300 to an execution unit 212 of FIG. 2. Flow proceeds to block 504.
At block 504, the execution unit 212 indicates, via the new_thread_request signal 232, that the FORK instruction 300 is requesting a new thread context 228 to be allocated and scheduled. Flow proceeds to block 506.
At block 506, the scheduler 216 determines whether a free thread context 228 is available for allocation. In one embodiment, the scheduler 216 maintains a counter indicating the number of free allocatable thread contexts 228, which is incremented each time a YIELD instruction deallocates a thread context 228 and decremented each time a FORK instruction 300 allocates a thread context 228, and the scheduler 216 determines whether a free allocatable thread context 228 exists by determining whether the counter value is greater than zero. In another embodiment, the scheduler 216 examines status bits of the per-thread control registers 226, such as the DA bit 412 and the A bit 414 of the TCStatus register 400 of FIG. 4 and the halted bit, to determine whether a free allocatable thread context 228 exists. A thread context 228 is a free allocatable thread context 228 if it is not activated or halted and is not a statically allocated thread context 228. If a thread context 228 is available for allocation, flow proceeds to block 508; otherwise, flow proceeds to block 522.
At block 508, the scheduler 216 allocates a free allocatable thread context 228 for the new thread in response to the FORK instruction 300. Flow proceeds to block 512.
At block 512, the value of the rs register 324 of the parent thread context 228 is copied to the program counter 222 of the new thread context 228, and the value of the rt register 326 of the parent thread context 228 is copied to the rd register 328 of the new thread context 228, as shown in FIG. 3; other context relevant to the FORK instruction 300, as described with respect to FIG. 4, is also copied from the parent thread context 228 to the new thread context 228. Flow proceeds to block 514.
At block 514, the scheduler 216 schedules the new thread context 228 for execution. That is, the scheduler 216 adds the thread context 228 to the list of thread contexts 228 currently ready for execution, such that the fetch unit 204 will begin fetching and issuing instructions from the program counter 222 of the thread context 228, subject to the constraints of the scheduling policy. Flow proceeds to block 516.
At block 516, the fetch unit 204 begins fetching instructions at the program counter 222 of the new thread context 228. Flow proceeds to block 518.
At block 518, the instructions of the new thread populate the registers of the register set 224 of the new thread context 228 as needed. As described above, the program instructions of the new thread typically populate the register set 224 from the data structure in memory specified by the rd register 328 value. Flow ends at block 518.
At block 522, the scheduler 216 raises a thread overflow exception 234 on the FORK instruction 300 to indicate that no free thread context 228 was available for allocation when the FORK instruction 300 was executed. Flow proceeds to block 524.
At block 524, the exception handler of the operating system creates a condition in which an allocated thread context 228 is freed for use by the FORK instruction 300, as described above with respect to the DA bit 412 of FIG. 4. Flow proceeds to block 526.
At block 526, the operating system re-issues the FORK instruction 300 that caused the exception 234 at block 522, which may now execute successfully because a free allocatable thread context 228 is available, as described above with respect to the DA bit 412 of FIG. 4. Flow ends at block 526.
Although the present invention and its objects, features, and advantages have been described in detail, the invention encompasses other embodiments as well. For example, although embodiments have been described in which the new thread context 228 is allocated on the same VPE as the parent thread context, in another embodiment, if the parent VPE detects that no free allocatable thread context exists on that VPE, the VPE may attempt a remote FORK instruction 300 on another VPE. In particular, the VPE determines whether another VPE has a free allocatable thread context and has the same address space as the parent thread context and, if so, transmits the FORK instruction information in a packet to the other VPE so that the other VPE can allocate and schedule its free thread context. Furthermore, the FORK instruction described herein is not limited to use in a microprocessor that multithreads concurrently to hide particular latency events; it may also be executed by a microprocessor that multithreads in response to cache misses, mispredicted branches, long-latency instructions, and the like. In addition, the FORK instruction described herein may be executed by scalar or superscalar microprocessors, and by microprocessors with different scheduling policies. Furthermore, although embodiments of the FORK instruction have been described in which the rt register value is copied into a register of the new thread context, other embodiments are contemplated in which the rt register value is provided to the new thread context by other means, such as through memory. Finally, although embodiments have been described in which the operands of the FORK instruction are stored in general-purpose registers, in other embodiments the operands may be stored by other means, such as in memory or in non-general-purpose registers. For example, although embodiments have been described in which the microprocessor is a register-based microprocessor, other embodiments are contemplated in which the microprocessor is a stack-based microprocessor, such as a processor configured to efficiently execute Java virtual machine program code. In such an embodiment, the operands of the FORK instruction are specified on an operand stack in memory rather than in registers. For example, each thread context may include a stack pointer register, and fields of the FORK instruction may indicate offsets of the FORK instruction operands in stack memory relative to the stack pointer register value, rather than specifying registers in the microprocessor's register space.
In addition to implementations of the invention using hardware, the invention may also be embodied in software (e.g., computer-readable code, program code, instructions, and/or data) disposed in a computer-usable (e.g., readable) medium. Such software enables the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods described herein. For example, this can be accomplished through the use of general programming languages (e.g., C, C++, JAVA, and the like), GDSII databases, hardware description languages (HDL) including Verilog HDL, VHDL, and so on, or other available program, database, and/or circuit (i.e., schematic) capture tools. Such software can be disposed in any known computer-usable (e.g., readable) medium, including semiconductor memory, magnetic disk, and optical disc (e.g., CD-ROM, DVD-ROM, and the like), and as a computer data signal embodied in a computer-usable (e.g., readable) transmission medium (e.g., a carrier wave or any other medium including digital, optical, or analog-based media). As such, the software can be transmitted over communication networks, including the Internet and intranets. It is understood that the invention may be embodied in software (e.g., in HDL as part of a semiconductor intellectual property core, such as a microprocessor core, or as a system-level design, such as a system-on-chip or SOC) and transformed into hardware as part of the production of integrated circuits. Also, the invention may be embodied as a combination of hardware and software.
Finally, those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes of the present invention without departing from the spirit and scope of the invention, the scope of which is to be defined by the appended claims.
Claims (34)
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US49918003P | 2003-08-28 | 2003-08-28 | |
US60/499,180 | 2003-08-28 | ||
US60/502,358 | 2003-09-12 | ||
US60/502,359 | 2003-09-12 | ||
US10/684,350 | 2003-10-10 | ||
US10/684,348 | 2003-10-10 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1842769A CN1842769A (en) | 2006-10-04 |
CN100489784C true CN100489784C (en) | 2009-05-20 |
Family
ID=37031160
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 200480024800 Pending CN1842770A (en) | 2003-08-28 | 2004-08-26 | Integrated mechanism for suspension and deallocation of computational threads of execution in a processor |
CNB2004800247988A Expired - Fee Related CN100489784C (en) | 2003-08-28 | 2004-08-27 | Multithreading microprocessor and its novel threading establishment method and multithreading processing system |
CNB2004800248016A Expired - Fee Related CN100538640C (en) | 2003-08-28 | 2004-08-27 | The device of dynamic-configuration virtual processor resources |
CN2004800248529A Expired - Fee Related CN1846194B (en) | 2003-08-28 | 2004-08-27 | Method and device for executing Parallel programs thread |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 200480024800 Pending CN1842770A (en) | 2003-08-28 | 2004-08-26 | Integrated mechanism for suspension and deallocation of computational threads of execution in a processor |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2004800248016A Expired - Fee Related CN100538640C (en) | 2003-08-28 | 2004-08-27 | The device of dynamic-configuration virtual processor resources |
CN2004800248529A Expired - Fee Related CN1846194B (en) | 2003-08-28 | 2004-08-27 | Method and device for executing Parallel programs thread |
Country Status (1)
Country | Link |
---|---|
CN (4) | CN1842770A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107038019A (en) * | 2015-10-02 | 2017-08-11 | 联发科技股份有限公司 | Method for processing instruction in single instruction multiple data computing system and computing system |
CN114691565A (en) * | 2020-12-29 | 2022-07-01 | 新唐科技股份有限公司 | Direct memory access device and electronic equipment using same |
Families Citing this family (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9417914B2 (en) * | 2008-06-02 | 2016-08-16 | Microsoft Technology Licensing, Llc | Regaining control of a processing resource that executes an external execution context |
WO2010095182A1 (en) * | 2009-02-17 | 2010-08-26 | パナソニック株式会社 | Multithreaded processor and digital television system |
GB2474521B (en) * | 2009-10-19 | 2014-10-15 | Ublox Ag | Program flow control |
US8561070B2 (en) | 2010-12-02 | 2013-10-15 | International Business Machines Corporation | Creating a thread of execution in a computer processor without operating system intervention |
CN102183922A (en) * | 2011-03-21 | 2011-09-14 | 浙江机电职业技术学院 | Method for realization of real-time pause of affiliated computer services (ACS) motion controller |
WO2011127862A2 (en) * | 2011-05-20 | 2011-10-20 | 华为技术有限公司 | Method and device for multithread to access multiple copies |
CN102831053B (en) * | 2011-06-17 | 2015-05-13 | 阿里巴巴集团控股有限公司 | Scheduling method and device for test execution |
US9507638B2 (en) * | 2011-11-08 | 2016-11-29 | Nvidia Corporation | Compute work distribution reference counters |
CN102750132B (en) * | 2012-06-13 | 2015-02-11 | 深圳中微电科技有限公司 | Thread control and call method for multithreading virtual assembly line processor, and processor |
CN103973600B (en) * | 2013-02-01 | 2018-10-09 | 德克萨斯仪器股份有限公司 | Merge and deposit the method and device of field instruction for packet transaction rotation mask |
JP6122749B2 (en) * | 2013-09-30 | 2017-04-26 | ルネサスエレクトロニクス株式会社 | Computer system |
CN108228321B (en) * | 2014-12-16 | 2021-08-10 | 北京奇虎科技有限公司 | Android system application closing method and device |
US9747108B2 (en) * | 2015-03-27 | 2017-08-29 | Intel Corporation | User-level fork and join processors, methods, systems, and instructions |
US10346168B2 (en) | 2015-06-26 | 2019-07-09 | Microsoft Technology Licensing, Llc | Decoupled processor instruction window and operand buffer |
US9720693B2 (en) * | 2015-06-26 | 2017-08-01 | Microsoft Technology Licensing, Llc | Bulk allocation of instruction blocks to a processor instruction window |
US10169105B2 (en) * | 2015-07-30 | 2019-01-01 | Qualcomm Incorporated | Method for simplified task-based runtime for efficient parallel computing |
GB2544994A (en) * | 2015-12-02 | 2017-06-07 | Swarm64 As | Data processing |
CN105700913B (en) * | 2015-12-30 | 2018-10-12 | 广东工业大学 | A kind of parallel operation method of lightweight bare die code |
US10761849B2 (en) * | 2016-09-22 | 2020-09-01 | Intel Corporation | Processors, methods, systems, and instruction conversion modules for instructions with compact instruction encodings due to use of context of a prior instruction |
GB2569275B (en) * | 2017-10-20 | 2020-06-03 | Graphcore Ltd | Time deterministic exchange |
GB2569098B (en) * | 2017-10-20 | 2020-01-08 | Graphcore Ltd | Combining states of multiple threads in a multi-threaded processor |
GB201717303D0 (en) * | 2017-10-20 | 2017-12-06 | Graphcore Ltd | Scheduling tasks in a multi-threaded processor |
CN109697084B (en) * | 2017-10-22 | 2021-04-09 | 刘欣 | Fast access memory architecture for time division multiplexed pipelined processor |
CN108536613B (en) * | 2018-03-08 | 2022-09-16 | 创新先进技术有限公司 | Data cleaning method and device and server |
CN110768807B (en) * | 2018-07-25 | 2023-04-18 | 中兴通讯股份有限公司 | Virtual resource method and device, virtual resource processing network element and storage medium |
CN110955503B (en) * | 2018-09-27 | 2023-06-27 | 深圳市创客工场科技有限公司 | Task scheduling method and device |
GB2580327B (en) * | 2018-12-31 | 2021-04-28 | Graphcore Ltd | Register files in a multi-threaded processor |
CN111414196B (en) * | 2020-04-03 | 2022-07-19 | 中国人民解放军国防科技大学 | A method and device for implementing a zero-value register |
CN112395095A (en) * | 2020-11-09 | 2021-02-23 | 王志平 | Process synchronization method based on CPOC |
CN112579278B (en) * | 2020-12-24 | 2023-01-20 | 海光信息技术股份有限公司 | Central processing unit, method, device and storage medium for simultaneous multithreading |
CN115129369B (en) * | 2021-03-26 | 2025-03-28 | 上海阵量智能科技有限公司 | Command distribution method, command distributor, chip and electronic device |
CN113946445B (en) * | 2021-10-15 | 2025-02-25 | 杭州国芯微电子股份有限公司 | A multi-thread module and multi-thread control method based on ASIC |
CN116701085B (en) * | 2023-06-02 | 2024-03-19 | 中国科学院软件研究所 | Form verification method and device for consistency of instruction set design of RISC-V processor Chisel |
CN116954950B (en) * | 2023-09-04 | 2024-03-12 | 北京凯芯微科技有限公司 | Inter-core communication method and electronic equipment |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5812811A (en) * | 1995-02-03 | 1998-09-22 | International Business Machines Corporation | Executing speculative parallel instructions threads with forking and inter-thread communication |
Non-Patent Citations (1)
Title |
---|
Design, Implementierung und Evaluierung einer virtuellen Maschine für Oz. Ralf Scheidhauer. 1998 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107038019A (en) * | 2015-10-02 | 2017-08-11 | 联发科技股份有限公司 | Method for processing instruction in single instruction multiple data computing system and computing system |
CN114691565A (en) * | 2020-12-29 | 2022-07-01 | 新唐科技股份有限公司 | Direct memory access device and electronic equipment using same |
TWI775259B (en) * | 2020-12-29 | 2022-08-21 | 新唐科技股份有限公司 | Direct memory access apparatus and electronic device using the same |
CN114691565B (en) * | 2020-12-29 | 2023-07-04 | 新唐科技股份有限公司 | Direct memory access device and electronic equipment using same |
Also Published As
Publication number | Publication date |
---|---|
CN1842769A (en) | 2006-10-04 |
CN100538640C (en) | 2009-09-09 |
CN1842771A (en) | 2006-10-04 |
CN1846194A (en) | 2006-10-11 |
CN1842770A (en) | 2006-10-04 |
CN1846194B (en) | 2010-12-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN100489784C (en) | Multithreading microprocessor and its novel threading establishment method and multithreading processing system | |
JP4818918B2 (en) | An instruction that starts a concurrent instruction stream on a multithreaded microprocessor | |
US7418585B2 (en) | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts | |
US7836450B2 (en) | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts | |
US7870553B2 (en) | Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts | |
US9032404B2 (en) | Preemptive multitasking employing software emulation of directed exceptions in a multithreading processor | |
US7849297B2 (en) | Software emulation of directed exceptions in a multithreading processor | |
US20140115594A1 (en) | Mechanism to schedule threads on os-sequestered sequencers without operating system intervention | |
US20050050305A1 (en) | Integrated mechanism for suspension and deallocation of computational threads of execution in a processor | |
WO2005022384A1 (en) | Apparatus, method, and instruction for initiation of concurrent instruction streams in a multithreading microprocessor |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
C14 | Grant of patent or utility model | |
GR01 | Patent grant | |
CP01 | Change in the name or title of a patent holder | Address after: American California. Patentee after: Imagination Technologies Ltd. Address before: American California. Patentee before: Imagination Technology Co.,Ltd. Address after: American California. Patentee after: Imagination Technology Co.,Ltd. Address before: American California. Patentee before: Mips Technologies, Inc. |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20090520. Termination date: 20200827 |