CN104932945B

CN104932945B - A kind of out of order multi-emitting scheduler of task level and its dispatching method

Info

Publication number: CN104932945B
Application number: CN201510342408.1A
Authority: CN
Inventors: 张多利; 张扬; 宋宇鲲; 杜高明
Original assignee: Hefei University of Technology
Current assignee: Hefei University of Technology
Priority date: 2015-06-18
Filing date: 2015-06-18
Publication date: 2018-05-18
Anticipated expiration: 2035-06-18
Also published as: CN104932945A

Abstract

The invention discloses a kind of out of order multi-emitting scheduler of task level and its dispatching method, it is characterized in that, scheduler includes：Reservation station, selection wakeup unit and managing computing resources unit；Write address administrative unit, memory space and reservation station state table are included in reservation station；It selects to include chronological table, ready query unit and ready counter in wakeup unit；Computing resource table, allocation unit and recovery unit are included in managing computing resources unit.The present invention can improve throughput and the level of resources utilization of scheduler, so as to promote assignment instructions emission effciency, lifting system performance.

Description

A task-level out-of-order multiple-issue scheduler and its scheduling method

技术领域technical field

本发明涉及一种任务级乱序多发射调度器及其调度方法，属于乱序多发射处理器领域。The invention relates to a task-level out-of-order multi-issue scheduler and a scheduling method thereof, belonging to the field of out-of-order multi-issue processors.

背景技术Background technique

随着集成电路技术的发展，对于处理器性能的要求越来越高。处理器性能的提升，一方面取决于集成电路工艺的发展；另一方面也取决于处理器设计技术的进步，其中体系结构的发展起到至关重要的作用，而提升并行度是体系结构发展的主题。在过去很大一部分时间里，对指令级并行的挖掘做了很多努力，超流水线结构、超标量结构、指令乱序多发射、超长指令字VLIW等技术已经在很多处理器中得到应用。随着多核处理器的发展，当前多核技术已成为提升处理器性能的主要技术方法，但是计算资源利用效率低是目前多核系统存在的主要问题，如何充分调度片上众多计算资源，提升系统性能是多核系统研究中的一个关键方向。With the development of integrated circuit technology, the requirements for processor performance are getting higher and higher. The improvement of processor performance, on the one hand, depends on the development of integrated circuit technology; on the other hand, it also depends on the progress of processor design technology. Theme of. For a large part of the past, many efforts have been made to mine instruction-level parallelism. Technologies such as super-pipeline structure, superscalar structure, instruction out-of-order multiple issue, and very long instruction word VLIW have been applied in many processors. With the development of multi-core processors, the current multi-core technology has become the main technical method to improve processor performance, but the low utilization efficiency of computing resources is the main problem existing in multi-core systems. A key direction in systems research.

将指令级乱序多发射技术扩展到任务级是解决上述问题的一个有效方法，而任务级的动态调度、资源动态管理技术是其中的关键和难点。从指令级扩展到任务级，对应着计算粒度从细粒度到粗粒度的变化，这给多发射技术的实现带来诸多新的挑战，吞吐率和资源利用效率是评估调度器设计的两大关键因素。动态调度器的基本思想是Tomasulo算法的应用，这在传统的指令级调度器中应用已经很成熟，但是传统指令级调度器的设计已经不能满足任务级粗粒度调度器的设计要求。而现有的针对任务级调度器的设计，例如基于广播的调度器的扩展性存在很大的问题；而基于相关性矩阵调度器要求将依赖关系表达成“生产者—消费者”指令，所以对于“任务—任务”相关性获得效率很低。Extending instruction-level out-of-order multiple-issue technology to task level is an effective way to solve the above problems, and task-level dynamic scheduling and resource dynamic management technologies are the key and difficult points. The expansion from instruction level to task level corresponds to the change of computing granularity from fine-grained to coarse-grained, which brings many new challenges to the realization of multi-issue technology. Throughput rate and resource utilization efficiency are the two keys to evaluate scheduler design factor. The basic idea of the dynamic scheduler is the application of the Tomasulo algorithm, which has been very mature in the traditional instruction-level scheduler, but the design of the traditional instruction-level scheduler can no longer meet the design requirements of the task-level coarse-grained scheduler. However, the existing design of task-level schedulers, such as the scalability of broadcast-based schedulers, has serious problems; and the dependency matrix-based scheduler requires the expression of dependencies as "producer-consumer" instructions, so The acquisition efficiency is very low for "task-task" dependencies.

发明内容Contents of the invention

本发明为克服现有技术存在的不足之处，提出了一种任务级乱序多发射调度器及其调度方法，以期能提高调度器的吞吐率和资源利用效率，从而提升任务指令发射效率，提升系统性能。In order to overcome the shortcomings of the existing technology, the present invention proposes a task-level out-of-order multi-issue scheduler and its scheduling method, in order to improve the throughput rate and resource utilization efficiency of the scheduler, thereby improving the task instruction transmission efficiency, Improve system performance.

本发明为达到上述目的所采用的技术方案是：The technical scheme that the present invention adopts for achieving the above object is:

本发明一种任务级乱序多发射调度器，是设置在处理器中并用于调度M个任务指令，所述处理器中包括：取指单元、寄存器状态表和处理单元阵列；其特点是，所述调度器包括：保留站、选择唤醒单元和计算资源管理单元；所述保留站中包含写地址管理单元、存储空间和保留站状态表；所述选择唤醒单元中包含年龄表、就绪查询单元和就绪计数器；所述计算资源管理单元中包含计算资源表、分配单元和回收单元；A task-level out-of-order multi-issue scheduler of the present invention is arranged in a processor and is used to schedule M task instructions, and the processor includes: an instruction fetch unit, a register state table and a processing unit array; its characteristics are: The scheduler includes: a reservation station, a selective wake-up unit and a computing resource management unit; the reservation station includes a write address management unit, storage space, and a reservation station state table; the selective wake-up unit includes an age table, a ready query unit and a ready counter; the computing resource management unit includes a computing resource table, an allocation unit and a recycling unit;

所述存储空间用于保存M个任务指令，且在同一时刻最多容纳N个任务指令，每个任务指令占用所述存储空间中连续的L个地址空间，使得所述存储空间被分为N段，编号依次为0～N-1；所述写地址管理单元用于对所述N个任务指令自动分配保留站的存储空间；所述存储空间的状态包括“空”、“满”、“非空”、和“非满”；所述保留站状态表用于存储所述存储空间的状态位；所述状态位包括：“空闲”或“占用”；The storage space is used to save M task instructions, and can accommodate up to N task instructions at the same time, and each task instruction occupies consecutive L address spaces in the storage space, so that the storage space is divided into N segments , the numbers are 0～N-1 in turn; the write address management unit is used to automatically allocate the storage space of the reservation station to the N task instructions; the status of the storage space includes "empty", "full", "not empty", and "not full"; the reserved station status table is used to store the status bit of the storage space; the status bit includes: "idle" or "occupied";

所述就绪查询单元用于接收所述保留站发送的任务指令并进行解析，获得所述任务指令所需的计算资源信息和输入寄存器信息，并分别发送给所述计算资源管理单元和寄存器状态表并接收反馈的状态信息；所述计算资源信息包括：计算资源种类和计算资源个数；所述输入寄存器信息包括：输入寄存器编号和输入寄存器个数num；所述年龄表用于存储任务指令在所述存储空间中的地址信息，并将所述任务指令进入保留站的顺序提供给所述就绪查询单元；所述就绪计数器用于对所述寄存器状态表所反馈的已就绪的输入寄存器进行计数；The ready query unit is used to receive and analyze the task instruction sent by the reservation station, obtain the computing resource information and input register information required by the task instruction, and send them to the computing resource management unit and the register state table respectively And receive feedback status information; the computing resource information includes: computing resource type and computing resource number; the input register information includes: input register number and input register number num; the age table is used to store task instructions in The address information in the storage space, and the order in which the task instructions enter the reservation station are provided to the ready query unit; the ready counter is used to count the ready input registers fed back by the register state table ;

所述计算资源表用于反馈所述任务指令所需的计算资源是否就绪；所述分配单元用于查询所述计算资源表，并将已就绪的计算资源编号发送给所述选择唤醒单元；所述回收单元用于回收已完成计算任务的计算资源；The computing resource table is used to feed back whether the computing resources required by the task instruction are ready; the allocation unit is used to query the computing resource table, and send the ready computing resource number to the selection wake-up unit; The reclaiming unit is used for reclaiming computing resources that have completed computing tasks;

所述选择唤醒单元根据所述已就绪的计算资源编号和寄存器状态表反馈的输入寄存器的状态信息，判断所述任务指令是否就绪，并将已就绪的任务指令发射到外部的处理单元阵列中进行执行。The selective wake-up unit judges whether the task instruction is ready according to the ready computing resource number and the state information of the input register fed back by the register state table, and transmits the ready task instruction to an external processing unit array for processing. implement.

本发明一种基于所述任务级乱序多发射调度器的调度方法的特点是按如下步骤进行：The characteristics of a scheduling method based on the task-level out-of-order multi-transmission scheduler of the present invention are as follows:

步骤1、定义取指单元向保留站发送的任务指令编号为变量p，0≤p≤M-1；定义所述存储空间的写地址为变量i，0×L≤i≤N×L-1；定义所述存储空间的读地址为变量j，0×L≤j≤N×L-1；定义所述年龄表的写地址为变量k，0≤k≤N；定义所述年龄表的读地址为变量m，0≤m≤N-1；定义所述保留站状态表的地址为变量n，0≤n≤N-1；定义所述就绪计数器为变量cnt；并初始化p＝0、i＝0、j＝0、k＝0、m＝0、n＝0、cnt＝0；并同时执行步骤2和步骤5；Step 1. Define the task instruction number sent by the fetching unit to the reservation station as variable p, 0≤p≤M-1; define the write address of the storage space as variable i, 0×L≤i≤N×L-1 ; Define the read address of the storage space as variable j, 0×L≤j≤N×L-1; define the write address of the age table as variable k, 0≤k≤N; define the read address of the age table Address is variable m, 0≤m≤N-1; Define the address of the reserved station state table as variable n, 0≤n≤N-1; Define the ready counter as variable cnt; And initialize p=0, i =0, j=0, k=0, m=0, n=0, cnt=0; and execute step 2 and step 5 at the same time;

步骤2、所述写地址管理单元判断所述存储空间的状态是否为“满”状态，若为“满”；则所述写地址管理单元等待所述存储空间的状态为“非满”状态时再开始查询；否则，所述写地址管理单元立即查询所述保留站状态表中第n+1个地址的状态位，若第n+1个地址的状态位为“空闲”，则所述写地址管理单元将存储空间的写地址的状态改为“已锁定”状态，再执行步骤3；否则，将n+1赋值给n，并重复执行步骤2；Step 2, the write address management unit judges whether the state of the storage space is "full", if it is "full"; then the write address management unit waits for the state of the storage space to be "not full" Re-start query; otherwise, the write address management unit immediately inquires the status bit of the n+1th address in the reserved station status table, if the status bit of the n+1th address is "idle", then the write The address management unit changes the state of the write address of the storage space to the "locked" state, and then performs step 3; otherwise, assigns n+1 to n, and repeats step 2;

步骤3、所述保留站判断所述取指单元是否发送N个任务指令中第p+1个任务指令；若已发送，则将n×L赋值给i，表示将第p+1个任务指令存储在所述存储空间中的第n×L+1个地址中，并将所述保留站状态表中第n+1个地址的状态位设为“占用”；写地址管理单元将写地址的状态改为“未锁定”状态，同时根据“old-first”原则将n存入所述年龄表的第k+1个地址中，并将k+1赋值给k，将p+1赋值给p后，执行步骤4；若未发送，则继续执行步骤3，等待取指单元发送的第p+1个任务指令；Step 3. The reservation station judges whether the fetching unit sends the p+1th task instruction among the N task instructions; if it has been sent, then assigns n×L to i, indicating that the p+1th task instruction Stored in the n×L+1 address in the storage space, and set the state bit of the n+1 address in the reserved station state table to "occupancy"; the write address management unit will write the address The state is changed to "unlocked" state, and n is stored in the k+1th address of the age table according to the "old-first" principle, and k+1 is assigned to k, and p+1 is assigned to p After that, execute step 4; if not sent, continue to execute step 3, waiting for the p+1th task instruction sent by the instruction fetch unit;

步骤4，判断k＝1是否成立，若成立，则将所述存储空间的状态改为“非空”状态，否则，维持所述存储空间的状态为“非空”状态；Step 4, judging whether k=1 is established, if established, then change the state of the storage space to a "non-empty" state, otherwise, maintain the state of the storage space as a "non-empty" state;

判断k＝N是否成立，若成立，则将所述存储空间的状态改为“满”状态，否则，维持所述存储空间的状态为“非满”状态；并返回步骤2执行；Judging whether k=N is established, if established, then change the state of the storage space into a "full" state, otherwise, maintain the state of the storage space as a "not full" state; and return to step 2 for execution;

步骤5、判断所述存储空间的状态是否为“非空”状态，若为“非空”状态，则根据“old-first”原则读取所述年龄表的第m+1个地址，获得所述年龄表的第m+1个地址的输出t，并将t×L赋值给j，从而使得能从所述存储空间的第t×L+1个地址中读取到第f个任务指令，并对所述第f个任务指令进行解析，获得第f个任务指令所需的计算资源信息和输入寄存器信息，再执行步骤6；否则，重复执行步骤5，等待所述存储空间的状态变为“非空”状态；Step 5, judging whether the state of the storage space is "non-empty", if it is "non-empty", read the m+1th address of the age table according to the "old-first" principle, and obtain the The output t of the m+1th address of the age table, and assign t×L to j, so that the fth task instruction can be read from the t×L+1th address of the storage space, And analyze the fth task instruction, obtain the computing resource information and input register information required by the fth task instruction, and then perform step 6; otherwise, repeat step 5, and wait for the state of the storage space to become "not-empty" state;

步骤6、所述选择唤醒单元根据第f个任务指令的第cnt+1个输入寄存器编号查询所述寄存器状态表，并接收所述寄存器状态表反馈的状态信息后，执行步骤7；Step 6. The selective wake-up unit queries the register state table according to the cnt+1 input register number of the f-th task instruction, and after receiving the state information fed back by the register state table, execute step 7;

步骤7、所述选择唤醒单元根据所述寄存器状态表反馈的状态信息，判断所述输入寄存器是否已就绪，若已就绪，则将cnt+1赋值给cnt，再执行步骤8；否则，将m+1的值赋给m，将cnt置为0，再返回步骤5执行；Step 7, the selective wake-up unit judges whether the input register is ready according to the state information fed back by the register state table, if ready, assigns cnt+1 to cnt, and then performs step 8; otherwise, sets m Assign the value of +1 to m, set cnt to 0, and then return to step 5 for execution;

步骤8、所述选择唤醒单元将所述就绪计数器cnt和第f个任务指令的输入寄存器个数num进行比较，若满足cnt＜num，则返回到步骤6执行；否则，执行步骤9；Step 8: The selective wake-up unit compares the ready counter cnt with the number of input registers num of the f-th task instruction, and if cnt<num is satisfied, return to step 6 for execution; otherwise, execute step 9;

步骤9、所述选择唤醒单元根据第f个任务指令的所需计算资源信息，向所述计算资源管理单元发送查询命令后；执行步骤10；Step 9: After the selective wake-up unit sends a query command to the computing resource management unit according to the required computing resource information of the fth task instruction; execute step 10;

步骤10、所述分配单元接收到所述选择唤醒单元发送的查询命令，扫描所述计算资源表，判断是否满足所述第f个任务指令的所需计算资源信息，若满足，则将所述计算资源管理单元中状态为“空闲”的计算资源分配给所述选择唤醒单元，并向所述选择唤醒单元反馈状态信息为所分配的“空闲”的计算资源编号，同时在所述计算资源表中将所分配的“空闲”计算资源的状态置为“占用”；若不满足，则向所述选择唤醒单元反馈状态信息为所需计算资源不足；并执行步骤11；Step 10, the allocating unit receives the query command sent by the selective wake-up unit, scans the computing resource table, and judges whether the required computing resource information of the f-th task instruction is satisfied, and if so, sends the The computing resource whose state is "idle" in the computing resource management unit is allocated to the selective wake-up unit, and the status information is fed back to the selective wake-up unit as the number of the allocated "idle" computing resource, and at the same time in the computing resource table Set the state of the allocated "idle" computing resources to "occupied"; if not satisfied, then feed back status information to the selection wake-up unit that the required computing resources are insufficient; and perform step 11;

步骤11、所述选择唤醒单元根据所述计算资源管理单元反馈的状态信息，判断所述所需计算资源是否已就绪，若已就绪，则将已就绪的第f个任务指令发射到外部的处理单元阵列中进行执行，并将所述保留站状态表中第t+1个地址的状态位设为“空闲”；将所述年龄表的第m+1个地址的输出t置为“无效”；从而使得所述存储空间的第t+1个地址和所述年龄表的第m+1个地址中均产生了“气泡”，并同时执行步骤12和步骤13；否则，将m+1的值赋给m，将cnt置为0，并返回步骤5执行；Step 11: The selective wake-up unit judges whether the required computing resource is ready according to the status information fed back by the computing resource management unit, and if it is ready, transmits the ready f-th task instruction to the external processing Execute in the cell array, and set the status bit of the t+1th address in the reserved station status table to "idle"; set the output t of the m+1th address of the age table to "invalid" ; so that "bubbles" are generated in the t+1th address of the storage space and the m+1th address of the age table, and step 12 and step 13 are executed at the same time; otherwise, the m+1th address Assign the value to m, set cnt to 0, and return to step 5 for execution;

步骤12、将所述年龄表中第m+2个地址信息到第N个地址信息依次赋值给第m+1个地址信息到第N-1个地址信息，并将k-1赋值给k；判断k＝0是否成立，若成立，则将所述存储空间的状态改为“空”状态；否则，维持在“非空”状态；Step 12, assigning the m+2th to the Nth address information in the age table to the m+1th to the N-1th address information in turn, and assigning k-1 to k; Judging whether k=0 is established, if established, the state of the storage space is changed to an "empty" state; otherwise, it is maintained in a "non-empty" state;

判断k＝N-1是否成立，若成立，则将所述存储空间的状态改为“非满”状态；否则，维持在“非满”状态；将cnt置为0后，再返回步骤5执行；Determine whether k=N-1 is true, if true, change the state of the storage space to a "not full" state; otherwise, maintain the "not full" state; set cnt to 0, and then return to step 5 for execution ;

步骤13、所述回收单元若接收到所述处理单元阵列反馈的完成计算任务信息，则对所述完成计算任务信息进行解析，获得已完成计算任务指令的计算资源种类和计算资源编号，并将所述已完成计算任务指令的计算资源种类和计算资源编号所对应的计算资源在所述计算资源表中的计算资源状态改为“空闲”，再重复执行步骤13。Step 13: If the recovery unit receives the completed computing task information fed back by the processing unit array, it analyzes the completed computing task information, obtains the computing resource type and computing resource number of the completed computing task instruction, and sends The computing resource status corresponding to the computing resource type and computing resource number of the completed computing task instruction is changed to "idle" in the computing resource table, and then step 13 is repeated.

本发明所述的调度方法的特点也在于，所述计算资源管理单元采用主动分配资源的方式则将状态为“空闲”的计算资源分配给所述选择唤醒单元；所述主动分配资源的方式为：The scheduling method of the present invention is also characterized in that the computing resource management unit allocates the computing resources in the "idle" state to the selective wake-up unit by actively allocating resources; the proactive resource allocation method is :

步骤1、利用若干个FIFO单元组成所述计算资源表，假设计算资源的种类为A种；且每种计算资源的个数为B个；且将每种计算资源的个数B分为C组，从而形成A×C个FIFO单元所组成的二维矩阵；且每列FIFO单元对应于一个基编号，分别为 Step 1, using several FIFO units to form the computing resource table, assuming that the type of computing resource is type A; and the number of each computing resource is B; and dividing the number B of each computing resource into C groups , thus forming a two-dimensional matrix composed of A×C FIFO units; and each column of FIFO units corresponds to a base number, respectively

步骤2、对所述二维矩阵进行初始化，将偏移量分别写入所述二维矩阵的每个FIFO单元中，用于表示“空闲”状态的计算资源；Step 2, initialize the two-dimensional matrix, and set the offset respectively written into each FIFO unit of the two-dimensional matrix, used to represent computing resources in an "idle"state;

步骤3、每个FIFO单元读取自身的一个偏移量，并等待所述计算资源表满足所述第f个任务指令的所需计算资源信息；Step 3. Each FIFO unit reads an offset of itself, and waits for the computing resource table to satisfy the required computing resource information of the f-th task instruction;

步骤4、在所述计算资源表满足所述第f个任务指令的所需计算资源信息时，每个FIFO单元将所读取的偏移量和自身所对应的基编号进行组合后形成计算资源编号；Step 4. When the computing resource table satisfies the required computing resource information of the f-th task instruction, each FIFO unit combines the read offset with its corresponding base number to form a computing resource Numbering;

步骤5、所述分配单元从C组计算资源编号中任意选取满足所述第f个任务指令的所需计算资源信息的计算资源编号提供给所述选择唤醒单元。Step 5: The allocating unit arbitrarily selects a computing resource number that satisfies the required computing resource information of the f-th task instruction from the C group of computing resource numbers and provides it to the selective wakeup unit.

与现有技术相比，本发明的有益技术效果体现在：Compared with the prior art, the beneficial technical effect of the present invention is reflected in:

1、本发明设计了一种任务级乱序多发射的调度器，包括保留站、选择唤醒单元、计算资源管理单元，从而能够很好的将细粒度特性的指令级乱序多发射动态调度技术扩展到具有粗粒度特性的任务级乱序多发射动态调度技术，能更高效的利用保留站的存储空间，提升保留站的利用效率，进而高效的支持系统对于任务级指令的动态调度，充分利用计算资源的强大计算力，提取任务级并行执行效率，提升系统并行度；获得更高的计算资源利用效率和更好的计算性能，对于多核甚至众核系统中的计算资源管理在降低设计复杂度的同时具备较高的效率。1. The present invention designs a task-level out-of-order multi-launch scheduler, including a reservation station, a selective wake-up unit, and a computing resource management unit, so that the fine-grained instruction-level out-of-order multiple-launch dynamic scheduling technology can be well implemented Extended to the task-level out-of-order multi-launch dynamic scheduling technology with coarse-grained characteristics, it can more efficiently use the storage space of the reserved station, improve the utilization efficiency of the reserved station, and then efficiently support the dynamic scheduling of task-level instructions by the system, making full use of The powerful computing power of computing resources extracts task-level parallel execution efficiency and improves system parallelism; obtains higher computing resource utilization efficiency and better computing performance, and reduces design complexity for computing resource management in multi-core or even many-core systems while having higher efficiency.

2、本发明设计了一种高效的保留站，对于保留站存储空间的利用率有很好的提升；所设计的写地址管理单元具有一定的自查询功能，通过扫描保留站状态表，自动锁定空闲的存储空间，以备新的任务指令写入；这种自查询功能具有较高的效率，能够快速的定位到空闲的存储空间；所设计保留站状态表对于存储空间的状态表示、分配、清除起到很关键的作用。2. The present invention designs an efficient reservation station, which can greatly improve the utilization rate of the storage space of the reservation station; the designed write address management unit has a certain self-query function, and automatically locks by scanning the reservation station status table Free storage space for writing new task instructions; this self-query function has high efficiency and can quickly locate the free storage space; the design of the reserved station status table is for the storage space status representation, allocation, Clearance plays a key role.

3、本发明设计的选择唤醒单元对于“old-first”的发射原则有很高效的实现；规避了传统选择唤醒方法，传统选择唤醒方法是先唤醒再选择，先将已经就绪的指令发送到已就绪指令池，再通过比较任务指令的年龄，选择最老的指令进行发射。而本发明是先选择再唤醒，借鉴了二级指针的概念，在选择的时候按照“old-first”原则，即按照任务指令进入保留站的顺序将指令在保留站中的地址写到年龄表中，唤醒的时候按照年龄表的地址从小到大依次读出即可，而且对于保留站“气泡”的压缩只需一个时钟周期；这种方法设计简单，吞吐率高，在一定程度上节省资源，同时提高了选择唤醒的效率。3. The selective wake-up unit designed by the present invention has a very efficient implementation of the "old-first" launch principle; it avoids the traditional selective wake-up method, which is to wake up first and then select, and first send the ready command to the ready The ready instruction pool, and then by comparing the age of the task instructions, select the oldest instruction to launch. However, the present invention first selects and wakes up again, draws lessons from the concept of the secondary pointer, and follows the "old-first" principle when selecting, that is, writes the address of the instruction in the reservation station to the age table according to the order in which the task instructions enter the reservation station When waking up, the address of the age table can be read in ascending order, and it only takes one clock cycle to compress the "bubble" of the reservation station; this method is simple in design, high in throughput, and saves resources to a certain extent , while improving the efficiency of selective wake-up.

4、本发明设计的计算资源管理单元设计成二维矩阵的形式，方便对多种计算资源进行分配与回收；采用FIFO进行存储，简化了设计，同时提高了分配回收效率，不用遍历计算资源向量表查询空闲的计算资源；同时采用基编号加上偏移量的方式生成所分配的计算资源编号，这种形式在一定程度上节省了用于存储计算资源编号的FIFO存储空间。FIFO提供计算资源编号偏移量是主动提供，不用被等到计算资源管理单元需要时才从FIFO里取出，节省了计算资源分配的时间。4. The computing resource management unit designed by the present invention is designed in the form of a two-dimensional matrix, which facilitates the allocation and recycling of various computing resources; FIFO is used for storage, which simplifies the design and improves the efficiency of distribution and recycling at the same time, without traversing computing resource vectors The table queries idle computing resources; at the same time, the assigned computing resource number is generated by adding the base number to the offset, which saves the FIFO storage space used to store the computing resource number to a certain extent. The number offset of computing resources provided by FIFO is actively provided, and it does not need to be taken out of FIFO when the computing resource management unit needs it, which saves the time for computing resource allocation.

附图说明Description of drawings

图1为本发明整体结构示意图；Fig. 1 is a schematic diagram of the overall structure of the present invention;

图2为本发明保留站结构示意图；Fig. 2 is the structural representation of the reservation station of the present invention;

图3为本发明选择唤醒单元结构示意图；Fig. 3 is a schematic structural diagram of the selective wake-up unit of the present invention;

图4为本发明年龄表和存储空间映射图；Fig. 4 is the age table and storage space map of the present invention;

图5为本发明保留站压缩“气泡”示意图；Fig. 5 is a schematic diagram of the compressed "bubble" in the retention station of the present invention;

图6为本发明计算资源管理单元结构示意图；FIG. 6 is a schematic structural diagram of a computing resource management unit in the present invention;

图7为本发明调度器调度流程示意图；Fig. 7 is a schematic diagram of the scheduling process of the scheduler of the present invention;

图8为现有技术中不压缩“气泡”的保留站示意图。Fig. 8 is a schematic diagram of a retention station without compressing "bubbles" in the prior art.

具体实施方式Detailed ways

本实施例中，如图1所示，一种任务级乱序多发射调度器，是设置在处理器中并用于调度M个任务指令，该处理器中包括：取指单元、寄存器状态表和处理单元阵列；调度器包括：保留站、选择唤醒单元和计算资源管理单元；保留站中包含写地址管理单元、存储空间和保留站状态表；选择唤醒单元中包含年龄表、就绪查询单元和就绪计数器；计算资源管理单元中包含计算资源表、分配单元和回收单元；In this embodiment, as shown in FIG. 1, a task-level out-of-order multiple-issue scheduler is arranged in a processor and used to schedule M task instructions, and the processor includes: an instruction fetch unit, a register state table and Processing unit array; scheduler includes: reservation station, selective wake-up unit and computing resource management unit; reservation station contains write address management unit, storage space and reservation station state table; selective wake-up unit contains age table, ready query unit and ready Counter; computing resource management unit includes computing resource table, allocation unit and recycling unit;

通常，支持动态调度的处理器中包含控制器和处理单元阵列，控制器中包含取指译码单元、寄存器重命名单元、调度器、提交单元、物理寄存器、寄存器状态表等；任务指令由取指译码单元从主存储器中的程序区获取任务指令，进行译码后发送到寄存器重命名单元；寄存器重命名单元对任务指令进行寄存器重命名，消除假数据相关性，保留真数据相关性，再发送到调度器；调度器根据任务指令的就绪情况，包括输入寄存器就绪情况和计算资源就绪情况，将已就绪的任务指令发送到处理单元阵列进行执行。为了简化描述，这里将取指译码单元和寄存器重命名单元统称为取指单元，取指单元向调度器发送任务指令，此时调度器接收到的任务指令已经经过寄存器重命名，消除了假数据相关性；同时省略了提交单元和物理寄存器；保留了寄存器状态表和处理单元阵列。Usually, a processor that supports dynamic scheduling includes a controller and a processing unit array. The controller includes an instruction fetching and decoding unit, a register renaming unit, a scheduler, a submission unit, physical registers, a register status table, etc.; The decoding unit obtains task instructions from the program area in the main memory, and sends them to the register renaming unit after decoding; the register renaming unit renames the registers of the task instructions, eliminates false data dependencies, and retains true data dependencies. and then sent to the scheduler; the scheduler sends the ready task instructions to the processing unit array for execution according to the readiness of the task instructions, including the readiness of input registers and the readiness of computing resources. In order to simplify the description, the instruction fetching unit and the register renaming unit are collectively referred to as the fetching unit. The fetching unit sends task instructions to the scheduler. At this time, the task instructions received by the scheduler have been renamed to eliminate false Data dependencies; both commit unit and physical registers are omitted; register state tables and arrays of processing units are preserved.

本发明中的任务指令是调度一个任务所需要的信息，主要包括：输入寄存器的个数、输入寄存器的编号、计算资源的种类、每种计算资源的个数等。而一个任务则包含一系列的简单指令。这里可以利用函数调用和函数体来理解：将任务指令对应为函数调用，任务则对应为函数体。调度器通过对函数调用进行调度，从而执行相应的函数体。The task instruction in the present invention is the information needed to schedule a task, mainly including: the number of input registers, the number of input registers, the types of computing resources, the number of each type of computing resources, and the like. A task, on the other hand, consists of a series of simple instructions. Here we can use the function call and function body to understand: the task instruction corresponds to the function call, and the task corresponds to the function body. The scheduler schedules the function call to execute the corresponding function body.

存储空间用于保存M个任务指令，且在同一时刻最多容纳N个任务指令，每个任务指令占用存储空间中连续的L个地址空间，从而使得存储空间被分为N段，编号依次为0～N-1；若M≤N，则存储空间在同一时刻可以容纳全部的M个任务指令，当接收到取指单元发送的任务指令时可以立即写入存储空间中；否则，当存储空间中已经有N个任务指令时，存储空间不再将取指单元发送过来的任务指令立即写入存储空间中。只有当存储空间中有任务指令发射到处理单元阵列中执行并压缩存储空间中的“气泡”，且由写地址管理单元查询空闲存储空间并锁定写地址后，才能将取指单元发送任务指令写入存储空间中。The storage space is used to save M task instructions, and can accommodate up to N task instructions at the same time. Each task instruction occupies consecutive L address spaces in the storage space, so that the storage space is divided into N segments, numbered sequentially 0 ~N-1; if M≤N, the storage space can accommodate all M task instructions at the same time, and can be written into the storage space immediately when the task instruction sent by the instruction fetch unit is received; otherwise, when the storage space When there are already N task instructions, the storage space no longer immediately writes the task instructions sent by the instruction fetch unit into the storage space. Only when a task instruction in the storage space is sent to the processing unit array for execution and the "bubble" in the storage space is compressed, and the write address management unit queries the free storage space and locks the write address, can the instruction fetch unit send the task instruction to write into the storage space.

由于本发明中的任务指令包含的信息较多，因此每个任务指令都会被分为连续的L个字段写入到存储空间的一段连续的地址空间中，这些连续的L个字段称为任务指令字。如图2所示，本实例中令L＝10，即一个任务指令被分为10个连续的任务指令写入到存储空间中；令N＝32，即存储空间被分为以10个地址空间为一个单位的32段；因此所需存储空间的深度为320，对应的地址空间为0～319。写地址管理单元用于对N个任务指令自动分配保留站的存储空间，将存储空间的32段地址空间分别编号为0～31，写地址管理单元通过查询保留站状态表获取每段地址空间的状态，每次从编号为0～31的地址空间中选择一段空闲的地址空间，用于存储取指单元发送来的任务指令；存储空间的状态包括“空”、“满”、“非空”、和“非满”；存储空间的“空”状态表示存储空间中没有任务指令；存储空间的“满”状态表示存储空间中32段都存储了任务指令；存储空间的“非空”状态表示存储空间中至少有一段存储了任务指令；存储空间的“非满”状态表示存储空间中至少有一段存储空间没有存储任务指令。保留站状态表用于存储存储空间的状态位；状态位包括：“空闲”或“占用”；如图2所示，保留站状态表总共存储了32个状态位，分别对应存储空间的32段地址空间，因此每一个状态位对应存储空间中一段10个连续的地址空间。保留站状态表给写地址管理单元分配存储空间提供了依据。写地址管理单元通过遍历保留站状态表的状态位，当第一次遇到状态位为“空闲”状态时，假设此时存储空间的的地址空间的编号为id，则实际写地址为10×id，此时将写地址的状态置为“已锁定”，即为写地址的锁定过程；当保留站接收到取指单元发送来的任务指令时，则将该任务指令写入所分配的存储空间，再将写地址状态置为“未锁定”，即为写地址的解锁过程。Since the task instructions in the present invention contain more information, each task instruction will be divided into continuous L fields and written into a continuous address space of the storage space. These continuous L fields are called task instructions Character. As shown in Figure 2, let L=10 in this example, that is, a task instruction is divided into 10 consecutive task instructions and written into the storage space; let N=32, that is, the storage space is divided into 10 address spaces It is a unit of 32 segments; therefore, the depth of the required storage space is 320, and the corresponding address space is 0~319. The write address management unit is used to automatically allocate the storage space of the reserved station for N task instructions, and number the 32 segments of address space in the storage space as 0~31 respectively, and the write address management unit obtains the address space of each segment by querying the reserved station status table State, each time a free address space is selected from the address space numbered 0~31, which is used to store the task instructions sent by the instruction fetch unit; the state of the storage space includes "empty", "full", and "not empty" , and "not full"; the "empty" status of the storage space indicates that there is no task instruction in the storage space; the "full" status of the storage space indicates that all 32 sections of the storage space have stored task instructions; the "non-empty" status of the storage space indicates At least one section of the storage space stores task instructions; the "not full" status of the storage space indicates that at least one section of the storage space does not store task instructions. The reserved station status table is used to store the status bits of the storage space; the status bits include: "idle" or "occupancy"; as shown in Figure 2, the reserved station status table stores a total of 32 status bits, corresponding to 32 segments of the storage space address space, so each status bit corresponds to a segment of 10 consecutive address spaces in the storage space. The reserved station state table provides a basis for allocating storage space to the write address management unit. The write address management unit traverses the state bits of the reservation station state table, and when the state bit is “idle” for the first time, assuming that the number of the address space of the storage space is id, the actual write address is 10× id, at this time, the state of the write address is set to "locked", which is the lock process of the write address; when the reservation station receives the task instruction sent by the instruction fetch unit, it writes the task instruction into the allocated memory space, and then set the status of the write address to "unlocked", which is the unlocking process of the write address.

就绪查询单元用于接收保留站发送的任务指令并进行解析，获得任务指令所需的计算资源信息和输入寄存器信息，并分别发送给计算资源管理单元和寄存器状态表并接收反馈的状态信息；计算资源信息包括：计算资源种类和计算资源个数；输入寄存器信息包括：输入寄存器编号和输入寄存器个数num；本实例中，就绪查询逻辑先查询输入寄存器的是否已全部就绪，若已全部就绪，则再查询计算资源是否就绪；若存在未就绪的输入寄存器，则对下一个任务指令进行查询。年龄表用于存储任务指令在存储空间中的地址信息，并将任务指令进入保留站的顺序提供给就绪查询单元；就绪计数器用于对寄存器状态表所反馈的已就绪的输入寄存器进行计数；The ready query unit is used to receive and analyze the task instructions sent by the reservation station, obtain the computing resource information and input register information required by the task instructions, and send them to the computing resource management unit and the register status table respectively, and receive the feedback status information; The resource information includes: the type of computing resources and the number of computing resources; the input register information includes: the input register number and the number of input registers num; in this example, the readiness query logic first checks whether all the input registers are ready, if all are ready, Then query whether the computing resource is ready; if there is an input register that is not ready, then query the next task instruction. The age table is used to store the address information of the task instructions in the storage space, and provide the order of the task instructions entering the reservation station to the ready query unit; the ready counter is used to count the ready input registers fed back by the register state table;

调度器中按“old-first”原则对任务指令进行唤醒是指优先发射较早进入保留站的任务指令。因为较早进入保留站的任务指令有更大的概率包含更多的数据相关性，后续相关的任务指令以此任务指令的输出寄存器作为它们的输入，因此实现“old-first”原则能较早的使后续相关的任务指令就绪而发射，从而提升系统的性能。In the scheduler, the task instruction is awakened according to the "old-first" principle, which means that the task instruction that enters the reservation station earlier is given priority. Because task instructions entering the reservation station earlier have a greater probability of containing more data dependencies, subsequent dependent task instructions use the output registers of this task instruction as their input, so the "old-first" principle can be implemented earlier The subsequent relevant task instructions are ready to be launched, thereby improving the performance of the system.

对于传统指令级的调度器，由于指令包含的信息较少，因此每个指令直接存储在保留站的存储空间中，不需要分成多段进行存储，并且所需要的存储空间比较小。指令的调度过程是并行查询指令是否就绪，然后从处于“已就绪”状态的指令中，根据“old-first”原则选择最先进入保留站的指令进行发射，即“先唤醒，再选择”。通常用两种方法实现“old-first”原则。For the traditional instruction-level scheduler, because the instruction contains less information, each instruction is directly stored in the storage space of the reservation station, and does not need to be divided into multiple segments for storage, and the required storage space is relatively small. The scheduling process of the instruction is to query whether the instruction is ready in parallel, and then select the instruction that first enters the reservation station from among the instructions in the "ready" state according to the "old-first" principle, that is, "wake up first, then select". The "old-first" principle is usually implemented in two ways.

第一种方法是每条指令携带年龄信息写入存储空间中，不依靠保留站的地址作为选择唤醒的顺序，而是显示比较“已就绪”指令的年龄，选取年龄最小的指令——即最先进入保留站的指令进行发射。这种方法需要较多的比较器，并且电路延时较大，通常不建议采取这种方法。The first method is to write each instruction with age information into the storage space, instead of relying on the address of the reserved station as the order of selecting the wake-up, but displaying and comparing the ages of the "ready" instructions, and selecting the instruction with the youngest age—that is, the latest The command to enter the reservation station first is launched. This method requires more comparators, and the circuit delay is relatively large, so it is generally not recommended to adopt this method.

第二种方法是用移位寄存器组实现，当接收到取指单元发送的指令时，存储在存储空间的第一个空闲的地址空间中；当指令发射的后，会在存储空间中产生“气泡”，但是该“气泡”只占用存储空间中的一个地址空间，压缩保留站中的气泡只需要将该“气泡”所在地址空间之后的所有内容依次向前移动一个地址，即可将“气泡”消除。因此在保留站的存储空间中，所有的指令都是按照指令进入保留站的顺序依次存储在存储空间中，存储空间地址较小的地址空间存储的是较早进入的指令，存储空间地址较大的地址空间存储的是较晚进入的指令。对于“old-first”原则的实现很方便，依次按照存储空间地址从小到大读出进行调度即可。The second method is implemented with a shift register group. When the instruction sent by the instruction fetch unit is received, it is stored in the first free address space of the storage space; when the instruction is issued, it will be generated in the storage space. Bubble”, but the “bubble” only occupies one address space in the storage space, the compression of the bubble in the reserved station only needs to move all the content after the address space where the “bubble” is located one address forward in turn, and the “bubble” can be moved forward "eliminate. Therefore, in the storage space of the reservation station, all instructions are stored in the storage space in sequence according to the order in which the instructions entered the reservation station. The address space with the smaller address of the storage space stores the instructions entered earlier, and the address of the storage space is larger. The address space of the address space stores the instructions that come in later. It is very convenient to realize the "old-first" principle, and the scheduling can be done by reading out the storage space addresses from small to large in turn.

但是相比于传统指令，对于任务级的调度器而言，每个任务指令包含的信息较多，需要占用存储空间中的多个地址空间，而且需要的硬件存储资源较大，硬件设计通常选择BlockRAM实现，而不是类似方法二采用移位寄存器组实现。采用BlockRAM实现时，当存储空间中产生“气泡”后，不方便压缩“气泡”。假设产生气泡的位置为q，则需要将q+10的内容读出，再写到q的地址空间中，然后将q递增再重复这一操作，直到q+10以后所有的内容都移动到对应的位置。如果所采用的BlockRAM是双端口的，则也需要320-q-10个时钟周期才能完成，如此低的性能是不能接受的。However, compared with traditional instructions, for a task-level scheduler, each task instruction contains more information, needs to occupy multiple address spaces in the storage space, and requires large hardware storage resources. Hardware design usually chooses BlockRAM implementation, rather than using a shift register bank similar to method 2. When implemented with BlockRAM, it is inconvenient to compress the "bubbles" after "bubbles" are generated in the storage space. Assuming that the position where the bubble is generated is q, you need to read the content of q+10, and then write it into the address space of q, then increment q and repeat this operation until all the contents after q+10 are moved to the corresponding s position. If the adopted BlockRAM is dual-port, it also needs 320-q-10 clock cycles to complete, such low performance is unacceptable.

这里说明一下为何需要压缩“气泡”。因为要实现“old-first”原则，若不压缩“气泡”，则在任务指令进入保留站时，只能写在保留站末端的第一个空闲的存储空间，而中间处于“气泡”位置的存储空间不能被分配，必须要等到“气泡”位置之前的存储空间里的任务指令全部被发射后，“气泡”位置的存储空间才能够被分配。如图8所示为不压缩“气泡”的保留站示意图，图中显示编号为1、2、4的存储空间被占用，编号为3的存储空间中为“气泡”，此时若有任务指令进来时，只能顺序写入编号为5～31的存储空间中，不能写入编号为3的存储空间中；当编号为5～31的存储空间也都写满任务指令时，编号为3的存储空间中依旧不能写入任务指令，若还有任务指令需要写入，则保留站拒绝写入。直到编号为0～2的存储空间中任务指令发射后，才能将需要写入的任务指令依次写到编号为0～2的存储空间中，接下来的任务指令才能写入编号为3的存储空间中。因此这种不压缩“气泡”的保留站的利用效率不高。因此需要设计一种既按“old-first”原则进行任务指令调度同时还能高效压缩“气泡”的调度器。Here is an explanation of why the "bubble" needs to be compressed. Because the "old-first" principle is to be implemented, if the "bubble" is not compressed, when the task instruction enters the reservation station, it can only be written in the first free storage space at the end of the reservation station, while the "bubble" in the middle The storage space cannot be allocated, and the storage space at the “bubble” position can only be allocated after all the task commands in the storage space before the “bubble” position are issued. Figure 8 is a schematic diagram of the reservation station without compressing the "bubble". When coming in, it can only be written into the storage space numbered 5~31 sequentially, and cannot be written into the storage space numbered 3; when the storage spaces numbered 5~31 are also filled with task instructions, the The task instruction cannot be written into the storage space, if there is still a task instruction to be written, the reservation station refuses to write. The task instructions that need to be written can only be written into the storage spaces numbered 0~2 until the task instructions in the storage spaces numbered 0~2 are launched, and the next task instructions can be written into the storage space numbered 3. middle. The use of such a retention station without compressing the "bubble" is therefore inefficient. Therefore, it is necessary to design a scheduler that not only schedules task instructions according to the "old-first" principle, but also compresses "bubbles" efficiently.

本发明借鉴二级指针的概念设计了一种年龄表，配合双端口BlockRAM的存储空间实现了一种按“old-first”原则进行任务指令调度同时还能高效压缩存储空间中“气泡”的调度器。将任务指令进入保留站时写地址管理单元所分配的存储空间的编号依次存储到年龄表中，将此编号作为一级指针，将年龄表的读地址作为二级指针，年龄表设计成移位寄存器组的形式。如图3所示，年龄表中含有头指针、尾指针、分配指针。头指针用于指示年龄表的第一个地址，即图中的地址0；尾指针用于指示年龄表中最后一个含有有效存储空间编号的地址，即图中的地址28；分配指针用于指示尾指针所指向年龄表地址的下一个地址，即图中的地址29。即将进入保留站的任务指令被分配的存储空间编号会被存储在分配指针所指向的年龄表地址中。当分配指针和头指针相等时，表示年龄表中没有任何有效内容，此时存储空间也为“空”状态；当分配指针等于32时，表示年龄表中所有的地址空间都存储有有效内容，此时存储空间也为“满”状态。The present invention designs an age table by referring to the concept of the secondary pointer, and realizes a scheduling of task instructions according to the "old-first" principle while efficiently compressing "bubbles" in the storage space in conjunction with the storage space of the dual-port BlockRAM device. Store the number of the storage space allocated by the write address management unit when the task command enters the reservation station into the age table in turn, use this number as the first-level pointer, and use the read address of the age table as the second-level pointer, and the age table is designed to shift The form of the register file. As shown in Figure 3, the age table contains a head pointer, a tail pointer, and an allocation pointer. The head pointer is used to indicate the first address of the age table, that is, address 0 in the figure; the tail pointer is used to indicate the last address in the age table that contains a valid storage space number, that is, address 28 in the figure; the allocation pointer is used to indicate The next address of the age table address pointed by the tail pointer is the address 29 in the figure. The number of the allocated storage space for the task instruction that is about to enter the reservation station will be stored in the address of the age table pointed to by the allocation pointer. When the allocation pointer is equal to the head pointer, it means that there is no valid content in the age table, and the storage space is also in an "empty" state at this time; when the allocation pointer is equal to 32, it means that all address spaces in the age table have valid content stored. At this time, the storage space is also in the "full" state.

如图4所示，显示了年龄表和存储空间的映射关系。当对任务指令进行调度时，由就绪查询单元按照年龄表地址从小到大的顺序读取年龄表，只需要从头指针开始到尾指针结束，年龄表的输出即为存储空间的编号id，按照此编号可以计算出对应的存储空间的地址为10×id。就绪查询单元将此地址作为存储空间的读地址获取对应的任务指令，并对任务指令进行解析并判断是否就绪。至此即实现了“old-first”原则，优先读取出了最先进入保留站的任务指令进行选择唤醒，也就最先可能被发射到处理单元阵列中执行。在图4中，依次读出的任务指令所对应的存储空间地址为5、3、29……30、1，这也是任务指令进入保留站的先后顺序。当任务指令就绪发射后，存储空间中对应所发射任务指令的地址空间里的内容不再有效，年龄表中对应所发射任务指令的存储空间的编号也无效，即在存储空间和年龄表中都会产生“气泡”，如图5显示了“气泡”压缩前后的年龄表和存储空间，压缩前年龄表的地址2和存储空间的地址290～299中都为“气泡”，此时只需通过移位寄存器的压缩方式压缩年龄表中的“气泡”。从表象上看，压缩后的年龄表中就不再含有“气泡”，而存储空间中的“气泡”看起来依旧还在，但是进行任务调度时是先读年龄表然后根据年龄表的输出计算出存储空间的读地址，因此不会读出存储空间“气泡”位置的任务指令，这也就实现了存储空间“气泡”的压缩。本实例将存储空间的“气泡”也称为保留站的“气泡”。As shown in Figure 4, the mapping relationship between the age table and the storage space is shown. When scheduling task instructions, the ready query unit reads the age table in the order of the age table address from small to large, only needs to start from the head pointer to the end of the tail pointer, the output of the age table is the number id of the storage space, according to this The address of the corresponding storage space can be calculated as 10×id. The readiness query unit uses this address as the read address of the storage space to obtain the corresponding task instruction, and analyzes the task instruction to determine whether it is ready. So far, the "old-first" principle has been realized, and the task instruction that enters the reservation station first is read first for selective wake-up, and it may be sent to the processing unit array for execution first. In FIG. 4 , the storage space addresses corresponding to the sequentially read task instructions are 5, 3, 29...30, 1, which is also the order in which the task instructions enter the reservation station. When the mission instruction is ready to be launched, the content in the address space corresponding to the transmitted mission instruction in the storage space is no longer valid, and the number of the storage space corresponding to the transmitted mission instruction in the age table is also invalid, that is, both the storage space and the age table will be invalid. Generate "bubbles", as shown in Figure 5, the age table and storage space before and after compression of the "bubbles", the address 2 of the age table before compression and the addresses 290-299 of the storage space are all "bubbles", at this time only need to move The way the bit registers are compressed compresses the "bubbles" in the age table. From the appearance, the compressed age table no longer contains "bubbles", and the "bubbles" in the storage space still seem to be there, but when performing task scheduling, the age table is first read and then calculated based on the output of the age table The read address of the storage space is read out, so the task instruction of the "bubble" position of the storage space will not be read out, which also realizes the compression of the "bubble" of the storage space. In this example, the "bubble" of the storage space is also called the "bubble" of the reservation station.

计算资源表用于反馈任务指令所需的计算资源是否就绪；分配单元用于查询计算资源表，并将已就绪的计算资源编号发送给选择唤醒单元；回收单元用于回收已完成计算任务的计算资源；如图6所示为计算资源管理单元结构示意图。The computing resource table is used to feedback whether the computing resources required by the task instruction are ready; the allocation unit is used to query the computing resource table, and sends the ready computing resource number to the selection wake-up unit; the recycling unit is used to reclaim the computing resources of the completed computing tasks Resources; FIG. 6 is a schematic structural diagram of a computing resource management unit.

选择唤醒单元根据已就绪的计算资源编号和寄存器状态表反馈的输入寄存器的状态信息，判断任务指令是否就绪，并将已就绪的任务指令发射到外部的处理单元阵列中进行执行。The selective wakeup unit judges whether the task instruction is ready according to the ready computing resource number and the state information of the input register fed back by the register state table, and transmits the ready task instruction to the external processing unit array for execution.

一种基于任务级乱序多发射调度器的调度方法，按如下步骤进行：A scheduling method based on a task-level out-of-order multi-issue scheduler is performed as follows:

步骤1、定义取指单元向保留站发送的任务指令编号为变量p，0≤p≤M-1；定义存储空间的写地址为变量i，0×L≤i≤N×L-1；定义存储空间的读地址为变量j，0×L≤j≤N×L-1；定义年龄表的写地址为变量k，0≤k≤N；定义年龄表的读地址为变量m，0≤m≤N-1；定义保留站状态表的地址为变量n，0≤n≤N-1；定义就绪计数器为变量cnt；并初始化p＝0、i＝0、j＝0、k＝0、m＝0、n＝0、cnt＝0；并同时执行步骤2和步骤5；本发明中地址都是从0开始，第e+1个地址即为地址是e的地址。本实例中设置N＝32。以下的调度过程如图7所示。Step 1. Define the task instruction number sent by the instruction fetch unit to the reserved station as variable p, 0≤p≤M-1; define the write address of the storage space as variable i, 0×L≤i≤N×L-1; define The read address of the storage space is the variable j, 0×L≤j≤N×L-1; the write address of the age table is defined as the variable k, 0≤k≤N; the read address of the age table is defined as the variable m, 0≤m ≤N-1; define the address of the reserved station state table as variable n, 0≤n≤N-1; define the ready counter as variable cnt; and initialize p=0, i=0, j=0, k=0, m =0, n=0, cnt=0; and execute step 2 and step 5 at the same time; in the present invention, addresses start from 0, and the e+1th address is the address whose address is e. In this example, N=32 is set. The following scheduling process is shown in Figure 7.

步骤2、写地址管理单元判断存储空间的状态是否为“满”状态，若为“满”；则写地址管理单元等待存储空间的状态为“非满”状态时再开始查询；否则，写地址管理单元立即查询保留站状态表中第n+1个地址的状态位，若第n+1个地址的状态位为“空闲”，则写地址管理单元将存储空间的写地址的状态改为“已锁定”状态，再执行步骤3；否则，将n+1赋值给n，并重复执行步骤2；本实例中，保留站状态表中的初始值全部都是0，因此将状态位为“空闲”状态表示为0，而将状态位为“占用”状态表示为1。该步骤即为查询空闲存储空间和锁定写地址的过程，此时n也是存储空间中的地址空间的编号。如图2所示，保留站状态表中的一个状态位对应存储空间的一段地址空间，例如，保留站状态表中的第0个状态位对应存储空间的0～9的地址空间，保留站状态表中的第1个状态位对应存储空间的10～19的地址空间……以此类推。Step 2, the write address management unit judges whether the state of the storage space is a "full" state, if it is "full"; then the write address management unit waits for the state of the storage space to be "not full" and then starts to inquire; otherwise, the write address The management unit inquires immediately the status bit of the n+1th address in the reserved station status table, if the status bit of the n+1th address is "idle", then the write address management unit changes the status of the write address of the storage space to " Locked" state, then go to step 3; otherwise, assign n+1 to n, and repeat step 2; in this example, the initial values in the reserved station state table are all 0, so set the state bit to "idle " state is represented as 0, and the state bit is represented as "occupied" state as 1. This step is the process of querying the free storage space and locking the write address, and n is also the number of the address space in the storage space. As shown in Figure 2, a status bit in the reserved station status table corresponds to a section of address space in the storage space. The first status bit in the table corresponds to the address space 10-19 of the storage space...and so on.

步骤3、保留站判断取指单元是否发送N个任务指令中第p+1个任务指令；若已发送，则将n×L赋值给i，表示将第p+1个任务指令存储在存储空间中的第n×L+1个地址中，其中n×L即为第p+1个任务指令在存储空间中的起始地址，该任务指令占用存储空间中的地址空间为n×L～n×L+L-1，每个地址空间存储一个任务指令字；并将保留站状态表中第n+1个地址的状态位设为“占用”；写地址管理单元将写地址的状态改为“未锁定”状态，同时根据“old-first”原则将n存入年龄表的第k+1个地址中，并将k+1赋值给k，将p+1赋值给p后，执行步骤4；若未发送，则继续执行步骤3，等待取指单元发送的第p+1个任务指令；其中k即为年龄表的分配指针。写入一个任务指令后，分配指针自动递增指向年龄表的下一个地址空间。本实例中年龄表的硬件用32个移位寄存器实现，每个寄存器为5位，用于存储0～31的存储空间的编号。Step 3. The reservation station judges whether the instruction fetching unit has sent the p+1th task instruction among the N task instructions; if it has been sent, then assign n×L to i, indicating that the p+1th task instruction is stored in the storage space Among the n×L+1 addresses in , where n×L is the starting address of the p+1 task instruction in the storage space, the address space occupied by the task instruction in the storage space is n×L～n ×L+L-1, each address space stores a task instruction word; and the state bit of the n+1th address in the reserved station state table is set to "occupancy"; the write address management unit changes the state of the write address to "Unlocked" state, and at the same time store n in the k+1th address of the age table according to the "old-first" principle, and assign k+1 to k, and p+1 to p, then execute step 4 ; If not sent, proceed to step 3, waiting for the p+1th task instruction sent by the instruction fetch unit; where k is the allocation pointer of the age table. After writing a task instruction, the allocation pointer is automatically incremented to point to the next address space in the age table. In this example, the hardware of the age table is implemented with 32 shift registers, each register is 5 bits, and is used to store the number of the storage space from 0 to 31.

步骤4，判断k＝1是否成立，若成立，则将存储空间的状态改为“非空”状态，否则，维持存储空间的状态为“非空”状态；如图3所示，若k＝1，则表明写入该任务指令后年龄表的分配指针指向年龄表的第2个地址，此时存储空间的状态为“非空”状态；也说明在写入该任务指令前年龄表的分配指针指向年龄表的第1个地址，此时存储空间的状态为“空”状态；因此需要将存储空间的状态从“空”状态改为“非空”状态。否则，维持存储空间的状态为“非空”状态。Step 4, judge whether k=1 is set up, if set up, then change the state of storage space into " non-empty " state, otherwise, keep the state of storage space as " non-empty " state; As shown in Figure 3, if k= 1, it means that the allocation pointer of the age table points to the second address of the age table after writing the task instruction, and the status of the storage space is "non-empty" at this time; it also indicates the allocation of the age table before writing the task instruction The pointer points to the first address of the age table. At this time, the state of the storage space is "empty"; therefore, the state of the storage space needs to be changed from "empty" to "non-empty". Otherwise, maintain the state of the storage space as "not empty".

判断k＝N是否成立，若成立，则将存储空间的状态改为“满”状态，否则，维持存储空间的状态为“非满”状态；并返回步骤2执行；同理如图3所示，若k＝N，则表明写入该任务指令后年龄表的分配指针指向年龄表最后一个地址空间的下一个地址空间，此时存储空间的状态为“满”状态；也说明在写入该任务指令前年龄表的分配指针指向年龄表的最后一个地址空间，此时存储空间的状态为“非满”状态；因此需要将存储空间的状态从“非满”状态改为“满”状态。否则，维持存储空间的状态为“非满”状态。特别说明，“非空”和“非满”、“非空”和“满”、“非满”和“空”之间都可能存在交集。Judging whether k=N is established, if established, then change the state of the storage space to a "full" state, otherwise, maintain the state of the storage space as a "not full" state; and return to step 2 for execution; similarly as shown in Figure 3 , if k=N, then show that the allocation pointer of the age table after writing this task command points to the next address space of the last address space of the age table, and the state of the storage space is "full" state at this moment; The allocation pointer of the age table before the task instruction points to the last address space of the age table. At this time, the state of the storage space is "not full"; therefore, the state of the storage space needs to be changed from "not full" to "full". Otherwise, maintain the status of the storage space as "not full". In particular, there may be intersections between "not empty" and "not full", "not empty" and "full", "not full" and "empty".

步骤5、判断存储空间的状态是否为“非空”状态，若为“非空”状态，则根据“old-first”原则读取年龄表的第m+1个地址，获得年龄表的第m+1个地址的输出t，并将t×L赋值给j，从而使得能从存储空间的第t×L+1个地址中读取到第f个任务指令，并对第f个任务指令进行解析，获得第f个任务指令所需的计算资源信息和输入寄存器信息，再执行步骤6；否则，重复执行步骤5，等待存储空间的状态变为“非空”状态；如图4所示，就绪查询单元最开始会读取年龄表地址为0的地址，年龄表输出存储空间编号为5，根据t×L计算出存储空间的读起始地址为50，即可在10个时钟周期内读出存储空间中地址为50～59的任务指令字，即为该步骤中的第f个任务指令。同理，就绪查询单元下一次会读取年龄表地址为1的地址，年龄表输出存储空间编号为3，根据t×L计算出存储空间的读起始地址为30，即可读出存储空间中地址为30～39的任务指令字，即为第f+1个任务指令，以此类推。Step 5. Determine whether the state of the storage space is "non-empty", if it is "non-empty", read the m+1th address of the age table according to the "old-first" principle, and obtain the m-th address of the age table +1 address output t, and assign t×L to j, so that the fth task instruction can be read from the t×L+1th address of the storage space, and the fth task instruction can be read Analyze, obtain the computing resource information and input register information required by the fth task instruction, and then perform step 6; otherwise, repeat step 5, and wait for the state of the storage space to become "non-empty"; as shown in Figure 4, The ready query unit will initially read the address of the age table whose address is 0, and the output storage space number of the age table is 5. According to t×L, the read start address of the storage space is calculated as 50, which can be read within 10 clock cycles. The task instruction word whose address is 50-59 in the storage space is the fth task instruction in this step. Similarly, the ready query unit will read the address of the age table address 1 next time, and the output storage space number of the age table is 3. According to t×L, the read start address of the storage space is calculated as 30, and the storage space can be read The task instruction word whose address is 30-39 is the f+1th task instruction, and so on.

这里需要说明的是，图4中的年龄表中地址为0～28的地址中存储有存储空间的编号，这些编号是按照任务指令进入保留站的顺序写入年龄表的，此时存储空间中最先进来的任务指令存储在50～59的地址空间中，第二个任务指令存储在30～39的地址空间中，第三个任务指令存储在290～299的地址空间中……以此类推。那么按照年龄表的地址从小到大的顺序读取年龄表，就可以按任务指令进入保留站的顺序读取任务指令，从而实现“old-first”原则。What needs to be explained here is that the numbers of the storage space are stored in the addresses 0-28 in the age table in Fig. The most advanced task instruction is stored in the address space of 50~59, the second task instruction is stored in the address space of 30~39, the third task instruction is stored in the address space of 290~299...and so on . Then read the age table according to the order of the address of the age table from small to large, and you can read the task instructions in the order in which the task instructions entered the reservation station, thus realizing the "old-first" principle.

步骤6、选择唤醒单元根据第f个任务指令的第cnt+1个输入寄存器编号查询寄存器状态表，并接收寄存器状态表反馈的状态信息后，执行步骤7；Step 6, select the wake-up unit to query the register state table according to the cnt+1 input register number of the fth task instruction, and after receiving the state information fed back by the register state table, perform step 7;

步骤7、选择唤醒单元根据寄存器状态表反馈的状态信息，判断输入寄存器是否已就绪，若已就绪，则将cnt+1赋值给cnt，再执行步骤8；否则，将m+1的值赋给m，将cnt置为0，再返回步骤5执行；本实例中，为了避免重复查询输入寄存器的状态，减少访问外部寄存器状态表，当选择唤醒单元判断输入寄存器为未就绪时，则将cnt的值寄存起来，记作cnt_reg；下次再次查询该任务指令时，将该任务指令的cnt_reg赋值给cnt后继续查询，这样就可以避免再次查询之前已查询且判断为已就绪的输入寄存器的状态。这需要额外增加32个寄存器cnt_reg用于记录每个任务指令上一次查询的位置。相应的在该步骤中，将cnt寄存到cnt_reg后，不再是将cnt置为0，而是返回到步骤5将下一个任务指令的cnt_reg赋值给cnt，再继续查询下一个任务指令的输入寄存器状态。这里为了简化描述，采用将cnt置为0的方法。Step 7. Select the wake-up unit to judge whether the input register is ready according to the state information fed back by the register state table. If it is ready, assign cnt+1 to cnt, and then perform step 8; otherwise, assign the value of m+1 to m, set cnt to 0, and then return to step 5 to execute; in this example, in order to avoid repeatedly querying the state of the input register and reduce access to the external register status table, when the wake-up unit determines that the input register is not ready, the cnt The value is stored and recorded as cnt_reg; next time the task instruction is queried again, the cnt_reg of the task instruction is assigned to cnt and the query is continued, so as to avoid re-querying the status of the input register that has been queried and judged to be ready. This requires an additional 32 registers cnt_reg to record the last query position of each task instruction. Correspondingly, in this step, after storing cnt in cnt_reg, instead of setting cnt to 0, return to step 5 to assign the cnt_reg of the next task command to cnt, and then continue to query the input register of the next task command state. In order to simplify the description here, the method of setting cnt to 0 is adopted.

步骤8、选择唤醒单元将就绪计数器cnt和第f个任务指令的输入寄存器个数num进行比较，若满足cnt＜num，则返回到步骤6执行；否则，执行步骤9；当满足cnt＜num，表示此时还有输入寄存器的状态没有查询，则返回到步骤6继续查询剩余输入寄存器的状态；否则，表示所有的输入寄存器已经全部查询完并且所有的输入寄存器已经就绪，则执行步骤9查询计算资源是否就绪。Step 8, select the wake-up unit to compare the ready counter cnt with the number of input registers num of the fth task instruction, if cnt<num is satisfied, return to step 6 for execution; otherwise, execute step 9; when cnt<num is satisfied, Indicates that there are still input register statuses that have not been queried at this time, then return to step 6 and continue to query the status of the remaining input registers; otherwise, it indicates that all input registers have been queried and all input registers are ready, then perform step 9 to query and calculate Whether the resource is ready.

步骤9、选择唤醒单元根据第f个任务指令的所需计算资源信息，向计算资源管理单元发送查询命令后；执行步骤10；Step 9. Select the wake-up unit to send a query command to the computing resource management unit according to the required computing resource information of the fth task instruction; perform step 10;

步骤10、分配单元接收到选择唤醒单元发送的查询命令，扫描计算资源表，判断是否满足第f个任务指令的所需计算资源信息，若满足，则将计算资源管理单元中状态为“空闲”的计算资源分配给选择唤醒单元，并向选择唤醒单元反馈状态信息为所分配的“空闲”的计算资源编号，同时在计算资源表中将所分配的“空闲”计算资源的状态置为“占用”；若不满足，则向选择唤醒单元反馈状态信息为所需计算资源不足；并执行步骤11；如图6所示为本实例中计算单元管理单元结构示意图。图中将所分配的计算资源种类和相应计算资源编号通过打包模块产生一个计算资源分配信息发送到选择唤醒单元，这也就是当满足第f个任务指令的所需计算资源信息时向选择唤醒单元反馈的状态信息。Step 10, the allocation unit receives the query command sent by the selective wake-up unit, scans the computing resource table, and judges whether the required computing resource information of the fth task instruction is satisfied, and if so, sets the status of the computing resource management unit to "idle" The computing resources allocated to the selective wake-up unit, and the state information fed back to the selective wake-up unit is the number of the allocated "idle" computing resources, and at the same time, the status of the allocated "idle" computing resources is set to "occupied" in the computing resource table ”; if not satisfied, feed back status information to the selective wake-up unit that the required computing resources are insufficient; and perform step 11; FIG. 6 is a schematic diagram of the structure of the computing unit management unit in this example. In the figure, the allocated computing resource type and the corresponding computing resource number are generated through the packaging module to send a computing resource allocation information to the selection wake-up unit, that is, when the required computing resource information of the f-th task instruction is satisfied, the selection wake-up unit sends a message to the selection wake-up unit Feedback status information.

步骤11、选择唤醒单元根据计算资源管理单元反馈的状态信息，判断所需计算资源是否已就绪，若已就绪，此时，任务指令发射的两个条件都已经准备好：输入寄存器已就绪和计算资源已就绪。则将已就绪的第f个任务指令发射到外部的处理单元阵列中进行执行，也就是发送到处理单元阵列中由分配单元所分配的所需计算资源中执行。并将保留站状态表中第t+1个地址的状态位设为“空闲”；将年龄表的第m+1个地址的输出t置为“无效”；从而使得存储空间的第t+1个地址和年龄表的第m+1个地址中均产生了“气泡”，并同时执行步骤12和步骤13；否则，将m+1的值赋给m，将cnt置为0，并返回步骤5执行；本实例中，若所需计算资源未就绪，和步骤7类似，将cnt寄存到cnt_reg后，不再是将cnt置为0，而是返回到步骤5将下一个任务指令的cnt_reg赋值给cnt，再继续查询下一个任务指令的输入寄存器状态。Step 11. Select the wake-up unit to judge whether the required computing resources are ready according to the status information fed back by the computing resource management unit. Resource is ready. Then, the ready fth task instruction is sent to the external processing unit array for execution, that is, it is sent to the required computing resources allocated by the allocation unit in the processing unit array for execution. And set the state bit of the t+1th address in the reservation station state table as "idle"; set the output t of the m+1th address of the age table as "invalid"; thus make the t+1th address of the storage space A "bubble" is generated in both address and the m+1th address of the age table, and execute step 12 and step 13 at the same time; otherwise, assign the value of m+1 to m, set cnt to 0, and return to step 5 Execute; in this example, if the required computing resources are not ready, similar to step 7, after registering cnt to cnt_reg, instead of setting cnt to 0, return to step 5 to assign a value to cnt_reg of the next task command Give cnt, and then continue to query the input register status of the next task instruction.

步骤12、将年龄表中第m+2个地址信息到第N个地址信息依次赋值给第m+1个地址信息到第N-1个地址信息，并将k-1赋值给k；该步骤是用来压缩年龄表中的“气泡”，也就是压缩了存储空间中的“气泡”，压缩过程如图5。判断k＝0是否成立，若成立，则将存储空间的状态改为“空”状态；否则，维持在“非空”状态；若k＝0，则表明压缩“气泡”后年龄表的分配指针指向年龄表的第一个地址空间，此时存储空间的状态为“空”状态；也说明在压缩“气泡”前年龄表的分配指针指向年龄表的第二个地址空间，此时存储空间的状态为“非空”状态；因此需要将存储空间的状态从“非空”状态改为“空”状态。否则，维持存储空间的状态为“非空”状态。Step 12, assign the m+2th address information to the Nth address information in the age table to the m+1th address information to the N-1th address information in turn, and assign k-1 to k; this step It is used to compress the "bubbles" in the age table, that is, to compress the "bubbles" in the storage space. The compression process is shown in Figure 5. Determine whether k=0 is true, if true, change the state of the storage space to "empty" state; otherwise, maintain the "non-empty" state; if k=0, it indicates the allocation pointer of the age table after compressing the "bubble" Pointing to the first address space of the age table, the state of the storage space is "empty" at this time; it also shows that the allocation pointer of the age table points to the second address space of the age table before the "bubble" is compressed, and the storage space at this time Status is "Not Empty"; therefore the status of the bucket needs to be changed from "Not Empty" to "Empty". Otherwise, maintain the state of the storage space as "not empty".

判断k＝N-1是否成立，若成立，则将存储空间的状态改为“非满”状态；否则，维持在“非满”状态；将cnt置为0后，再返回步骤5执行；若k＝N-1，则表明压缩“气泡”后年龄表的分配指针指向年龄表的最后一个地址空间，此时存储空间的状态为“非满”状态；也说明在压缩“气泡”前年龄表的分配指针指向年龄表最后一个地址空间的下一个地址空间，此时存储空间的状态为“满”状态；因此需要将存储空间的状态从“满”状态改为“非满”状态。否则，维持存储空间的状态为“非满”状态。本实例中，压缩完“气泡”后，和步骤7类似，不再是将cnt置为0，而是返回到步骤5将下一个任务指令的cnt_reg赋值给cnt，再继续查询下一个任务指令的输入寄存器状态。这里因为任务已发射，不再需要将cnt寄存到cnt_reg中。Determine whether k=N-1 is true, if true, change the state of the storage space to a "not full" state; otherwise, maintain the "not full" state; set cnt to 0, and then return to step 5 for execution; if k=N-1, then it shows that the allocation pointer of the age table points to the last address space of the age table after compressing the "bubble", and the state of the storage space is "not full" at this time; it also shows that the age table is before the "bubble" is compressed The allocation pointer of the age table points to the next address space of the last address space in the age table. At this time, the status of the storage space is "full"; therefore, the status of the storage space needs to be changed from "full" to "not full". Otherwise, maintain the status of the storage space as "not full". In this example, after the "bubble" is compressed, similar to step 7, instead of setting cnt to 0, return to step 5 to assign the cnt_reg of the next task command to cnt, and then continue to query the next task command Input register status. Here, because the task has been launched, it is no longer necessary to register cnt in cnt_reg.

步骤13、回收单元若接收到处理单元阵列反馈的完成计算任务信息，则对完成计算任务信息进行解析，获得已完成计算任务指令的计算资源种类和计算资源编号，并将已完成计算任务指令的计算资源种类和计算资源编号所对应的计算资源在计算资源表中的计算资源状态改为“空闲”，再重复执行步骤13。如图6所示计算单元管理单元结构示意图中的分配单元。Step 13: If the recovery unit receives the completed computing task information fed back by the processing unit array, it analyzes the completed computing task information, obtains the computing resource type and computing resource number of the completed computing task instruction, and sends the completed computing task instruction Change the computing resource status of the computing resource corresponding to the computing resource type and computing resource number in the computing resource table to "idle", and then repeat step 13. As shown in FIG. 6 , the allocation unit in the structural schematic diagram of the computing unit management unit.

综上，步骤1为初始化过程；步骤2～步骤4为任务指令写入保留站的过程；步骤5～步骤12为任务指令的选择唤醒过程和计算资源的分配过程；步骤13为计算资源的回收过程。该调度方法借鉴了二级指针的概念，高效实现了任务指令调度过程中的“old-first”原则。In summary, step 1 is the initialization process; steps 2 to 4 are the process of writing task instructions to the reservation station; steps 5 to 12 are the selection and wake-up process of task instructions and the allocation of computing resources; step 13 is the recycling of computing resources process. The scheduling method draws on the concept of the secondary pointer, and efficiently implements the "old-first" principle in the task instruction scheduling process.

其中，计算资源管理单元采用主动分配资源的方式则将状态为“空闲”的计算资源分配给选择唤醒单元；主动分配资源的方式为：Among them, the computing resource management unit allocates the computing resources whose state is "idle" to the selective wake-up unit by actively allocating resources; the way of actively allocating resources is:

步骤1、利用若干个FIFO单元组成计算资源表，假设计算资源的种类为A种；且每种计算资源的个数为B个；且将每种计算资源的个数B分为C组，从而形成A×C个FIFO单元所组成的二维矩阵；且每列FIFO单元对应于一个基编号，分别为如图6所示为计算资源管理单元结构示意图，包括分配单元、计算资源表和回收单元。中间的二维矩阵即为计算资源表。本实例中，令A＝4、B＝32、C＝4，即共有4种计算资源，每种计算资源有32个，每种计算资源分为4组，则每组包含8个同种计算资源。由计算资源种类编码和分组编码形成一个4×4的二维矩阵，将计算资源种类编码作为二维矩阵的行y，将分组编码作为二维矩阵的列x。二维矩阵的每个元素同时又是一个长度为8的向量，表示每组计算资源编号的偏移量，该向量存储在FIFO中，每次每个FIFO提供一个偏移量记为z。因此每个偏移量都对应一个唯一的坐标(y,x,z)，其中0≤y≤3，0≤x≤3，0≤z≤7。。Step 1. Use several FIFO units to form a computing resource table, assuming that the type of computing resource is A; and the number of each computing resource is B; and the number B of each computing resource is divided into C groups, so that Form a two-dimensional matrix composed of A×C FIFO units; and each column of FIFO units corresponds to a base number, respectively FIG. 6 is a schematic structural diagram of a computing resource management unit, including an allocation unit, a computing resource table, and a recovery unit. The two-dimensional matrix in the middle is the computing resource table. In this example, set A=4, B=32, and C=4, that is, there are 4 kinds of computing resources, each computing resource has 32, each computing resource is divided into 4 groups, and each group contains 8 computing resources of the same type resource. A 4×4 two-dimensional matrix is formed by computing resource type coding and group coding, the computing resource type coding is used as row y of the two-dimensional matrix, and group coding is used as column x of the two-dimensional matrix. Each element of the two-dimensional matrix is also a vector with a length of 8, which represents the offset of each group of computing resource numbers. The vector is stored in the FIFO, and each FIFO provides an offset and recorded as z. So each offset corresponds to a unique coordinate (y,x,z), where 0≤y≤3, 0≤x≤3, 0≤z≤7. .

步骤2、对二维矩阵进行初始化，将偏移量分别写入二维矩阵的每个FIFO单元中，用于表示“空闲”状态的计算资源；在本实例中，偏移量为0～7，即在初始化阶段，将0～7依次写入每个FIFO中。Step 2. Initialize the two-dimensional matrix, and set the offset Write them into each FIFO unit of the two-dimensional matrix to represent computing resources in the "idle"state; in this example, the offset is 0~7, that is, in the initialization stage, write 0~7 in turn in a FIFO.

步骤3、每个FIFO单元读取自身的一个偏移量，并等待计算资源表满足第f个任务指令的所需计算资源信息；该步骤是FIFO主动提供一个偏移量，等待分配单元获取；当该偏移量被分配单元获取后，该FIFO立即自动读取自身的一个偏移量，再次等待分配单元获取。因为读FIFO的过程通常需要消耗2～3个时钟周期，为了避免每次获取计算资源都要等待2～3个时钟周期，所以采取了将FIFO分组的形式和主动提供的方式，这样在性能上有优势。另外，分组或者不分组所需要的FIFO的存储深度是相同的，但是所需要的FIFO的存储位宽不一样；若不分组，则需要存储的是计算资源的编号，否则需要存储的是计算资源编号的偏移量，在本实例中，每种计算资源的个数是32，计算资源编号为0～31，偏移量为0～7；因此若不分组则需要FIFO的存储位宽是5位，否则需要FIFO的存储位宽是3位。因此采用分组的方式总共可以节省的FIFO存储空间是4×32×2＝256位。Step 3. Each FIFO unit reads an offset of itself, and waits for the computing resource table to satisfy the required computing resource information of the f-th task instruction; this step is that the FIFO actively provides an offset, and waits for the allocation unit to obtain it; After the offset is acquired by the allocation unit, the FIFO automatically reads an offset of itself immediately, and waits for the allocation unit to acquire again. Because the process of reading FIFO usually takes 2 to 3 clock cycles, in order to avoid waiting for 2 to 3 clock cycles each time to obtain computing resources, the form of FIFO grouping and proactive provision is adopted, so that in terms of performance There are advantages. In addition, the storage depth of the FIFO required for grouping or not grouping is the same, but the storage bit width of the FIFO required is different; if not grouped, the number of the computing resource needs to be stored, otherwise the computing resource needs to be stored The offset of the number, in this example, the number of each type of computing resource is 32, the number of computing resources is 0-31, and the offset is 0-7; therefore, if there is no grouping, the storage bit width of the FIFO needs to be 5 bit, otherwise the storage bit width of the FIFO is required to be 3 bits. Therefore, the total FIFO storage space that can be saved by grouping is 4*32*2=256 bits.

步骤4、在计算资源表满足第f个任务指令的所需计算资源信息时，每个FIFO单元将所读取的偏移量和自身所对应的基编号进行组合后形成计算资源编号；如图6所示，本实例中的分配单元包括行解码模块、列解码模块和打包模块。计算资源表将坐标(y,x,z)发送到分配单元的行解码模块和列解码模块中，计算出基编号为x×8，偏移量为z，从而获得计算资源的编号g为：x×8+z，计算资源的种类即为y。将获得的计算资源的编号和计算资源的种类发送到打包模块中，获得所需计算资源在处理单元阵列中的坐标，即可将相应的任务指令发送到处理单元阵列的所需计算资源中执行；Step 4. When the computing resource table satisfies the required computing resource information of the fth task instruction, each FIFO unit combines the read offset with its corresponding base number to form a computing resource number; as shown in the figure 6, the allocation unit in this example includes a row decoding module, a column decoding module and a packing module. The computing resource table sends the coordinates (y, x, z) to the row decoding module and column decoding module of the allocation unit, calculates the base number as x×8, and the offset as z, so that the number g of the computing resource is obtained as: x×8+z, the type of computing resource is y. Send the obtained number and type of computing resources to the packaging module, obtain the coordinates of the required computing resources in the processing unit array, and then send the corresponding task instructions to the required computing resources in the processing unit array for execution ;

步骤5、分配单元从C组计算资源编号中任意选取满足第f个任务指令的所需计算资源信息的计算资源编号提供给选择唤醒单元。Step 5: The allocation unit arbitrarily selects a computing resource number that satisfies the required computing resource information of the f-th task instruction from the C group of computing resource numbers and provides it to the selection wakeup unit.

如图6所示，计算资源的回收过程是计算资源分配过程的逆过程，本实例中的回收单元包括解包模块、行编码模块和列编码模块。解包模块接收完成计算任务信息并解析，获得计算资源种类y和计算资源编号设为g，然后发送到行编码模块和列编码模块，获得计算资源的行为y，列为x＝g/8，偏移量为z＝g％8，其中“/”为整除运算，“％”为求余运算；从而获得坐标为(y,x,z)，将偏移量z存入相应的FIFO中，即实现了计算资源的回收过程，这对应了步骤13中要将计算资源表中的计算资源状态改为“空闲”这一操作。As shown in FIG. 6 , the recovery process of computing resources is the inverse process of the allocation process of computing resources. The recovery unit in this example includes an unpacking module, a row coding module and a column coding module. The unpacking module receives and parses the computing task information, obtains the computing resource type y and the computing resource number as g, and then sends it to the row coding module and column coding module to obtain the computing resource behavior y, and the column is x=g/8, The offset is z=g%8, wherein "/" is an integer division operation, and "%" is a remainder operation; thus the coordinates are (y, x, z), and the offset z is stored in the corresponding FIFO, That is, the recycling process of computing resources is realized, which corresponds to the operation of changing the status of computing resources in the computing resource table to "idle" in step 13.

Claims

1. a kind of out of order multi-emitting scheduler of task level is provided in processor and for dispatching M assignment instructions, the place Reason device includes：Fetch unit, buffer status table and pe array；It is characterized in that the scheduler includes：Retain It stands, select wakeup unit and managing computing resources unit；Write address administrative unit, memory space and guarantor are included in the reservation station Stay station state table；Chronological table, ready query unit and ready counter are included in the selection wakeup unit；The computing resource Computing resource table, allocation unit and recovery unit are included in administrative unit；

The memory space accommodates up to N number of assignment instructions, each task for preserving M assignment instructions in synchronization Instruction occupies continuous L address space in the memory space so that the memory space is divided into N sections, and number is followed successively by 0 ~N-1；The write address administrative unit is used to distribute N number of assignment instructions automatically the memory space of reservation station；It is described to deposit Storing up the state in space includes " sky ", " full ", " non-empty " and " non-full "；The reservation station state table is empty for storing the storage Between mode bit；The mode bit includes：" free time " or " occupancy "；

The ready query unit is used to receive the assignment instructions that the reservation station is sent and be parsed, and obtains the task and refers to Computing resource information and input register information needed for order, and it is sent respectively to the managing computing resources unit and register State table and the status information for receiving feedback；The computing resource information includes：Computing resource species and computing resource number；Institute Stating input register information includes：Input register is numbered and input register number num；The chronological table is used for store tasks Instruct address information in the memory space, and by the assignment instructions enter reservation station be sequentially presented to it is described ready Query unit；Based on the ready counter is carried out by the ready input register fed back to the buffer status table Number；

Whether the computing resource table is used for the computing resource fed back needed for the assignment instructions ready；The allocation unit is used for The computing resource table is inquired about, and ready computing resource number is sent to the selection wakeup unit；The recycling is single Member has completed the computing resource of calculating task for recycling；

The selection wakeup unit is deposited according to the input that the ready computing resource number and buffer status table are fed back The status information of device judges whether the assignment instructions are ready, and ready assignment instructions are transmitted to external processing list It is performed in element array.

2. a kind of dispatching method of the out of order multi-emitting scheduler of task based access control grade, it is characterized in that carrying out as follows：

The assignment instructions number that step 1, definition Fetch unit are sent to reservation station is variable p, 0≤p≤M-1；Definition storage is empty Between write address for variable i, 0 × L≤i≤N × L-1；The reading address for defining the memory space is variable j, 0 × L≤j≤N ×L-1；The write address for defining chronological table is variable k, 0≤k≤N；Define the reading address of the chronological table for variable m, 0≤m≤ N-1；The address for defining reservation station state table is variable n, 0≤n≤N-1；Ready counter is defined as variable cnt；And initialize p =0, i=0, j=0, k=0, m=0, n=0, cnt=0；And step 2 and step 5 are performed simultaneously；

Step 2, write address administrative unit judge whether the state of the memory space is " full " state, if " full "；It is then described Write address administrative unit starts a query at again when the state of the memory space being waited to be " non-full " state；Otherwise, the write address Administrative unit inquires about the mode bit of (n+1)th address in the reservation station state table immediately, if the mode bit of (n+1)th address is " free time ", then the state of the write address of memory space is changed to " locked " state by the write address administrative unit, then performs step Rapid 3；Otherwise, n+1 is assigned to n, and repeats step 2；

Step 3, the reservation station judge whether the Fetch unit sends+1 assignment instructions of pth in N number of assignment instructions；If It sends, then n × L is assigned to i, represent n-th × L+1 ground being stored in+1 assignment instructions of pth in the memory space In location, and the mode bit of (n+1)th address in the reservation station state table is set to " occupy "；Write address administrative unit will write ground The state of location is changed to " unlocked " state, while n is stored in+1 ground of kth of the chronological table according to " old-first " principle In location, and k+1 is assigned to k, after p+1 is assigned to p, performs step 4；If not sending, step 3 is continued to execute, wait takes Refer to+1 assignment instructions of pth that unit is sent；

Step 4, judge whether k=1 is true, if so, the state of the memory space is then changed to " non-empty " state, otherwise, The state for maintaining the memory space is " non-empty " state；

Judge whether k=N is true, if so, then the state of the memory space is changed to " to expire " state, otherwise, described in maintenance The state of memory space is " non-full " state；And return to step 2 performs；

Whether step 5, the state for judging the memory space are " non-empty " state, if " non-empty " state, then according to " old- First " principles read the m+1 address of the chronological table, obtain the output t of the m+1 address of the chronological table, and will T × L is assigned to j, so that f-th of assignment instructions can be read from the t × L+1 address of the memory space, and F-th of assignment instructions are parsed, obtain computing resource information and input register letter needed for f-th of assignment instructions Breath, then perform step 6；Otherwise, step 5 is repeated, the state of the memory space is waited to become " non-empty " state；

Step 6, selection wakeup unit inquire about register shape according to the cnt+1 input register number of f-th of assignment instructions State table, and after receiving the status information of buffer status table feedback, perform step 7；

The status information that step 7, the selection wakeup unit are fed back according to the buffer status table, judges the input deposit Whether device is ready, if ready, cnt+1 is assigned to cnt, then performs step 8；Otherwise, the value of m+1 is assigned to m, it will Cnt is set to 0, returns again to step 5 and performs；

Step 8, the selection wakeup unit are by the input register number of the ready counter cnt and f-th of assignment instructions Num is compared, if meeting cnt ＜ num, is performed back to step 6；Otherwise, step 9 is performed；

Step 9, the required computing resource information for selecting wakeup unit according to f-th of assignment instructions, to managing computing resources After unit sends querying command；Perform step 10；

Step 10, allocation unit receive the querying command that the selection wakeup unit is sent, and scan computing resource table, judgement is The no required computing resource information for meeting f-th of assignment instructions, if satisfied, then by shape in the managing computing resources unit State for " free time " computational resource allocation to the selection wakeup unit, and to it is described select wakeup unit feedback status information for The computing resource number of " free time " distributed, while by " free time " computing resource distributed in the computing resource table State is set to " occupancy "；If not satisfied, wakeup unit feedback status information then is selected as required computing resource deficiency to described；And Perform step 11；

The status information that step 11, the selection wakeup unit are fed back according to the managing computing resources unit, judges the institute It needs computing resource whether ready, if ready, f-th ready of assignment instructions is transmitted to external processing unit battle array It is performed in row, and the mode bit of the t+1 address in the reservation station state table is set to " free time "；By the chronological table The output t of the m+1 address be set to engineering noise；So that t+1 address and the chronological table of the memory space The m+1 address in generate " bubble ", and perform step 12 and step 13 simultaneously；Otherwise, the value of m+1 is assigned to m, it will Cnt is set to 0, and return to step 5 performs；

The m+2 address information in the chronological table is assigned to the m+1 address by step 12 successively to n-th address information Information is assigned to k to the N-1 address information, and by k-1；Judge whether k=0 is true, if so, then by the memory space State be changed to " sky " state；Otherwise, " non-empty " state is maintained；

Judge whether k=N-1 is true, if so, the state of the memory space is then changed to " non-full " state；Otherwise, maintain In " non-full " state；After cnt is set to 0, returns again to step 5 and perform；

If step 13, recovery unit receive the completion calculating task information of the pe array feedback, to described complete It is parsed into calculating task information, obtains the computing resource species for having completed calculating task instruction and computing resource number, and By the corresponding computing resource of the computing resource species for having completed calculating task instruction and computing resource number in the meter The computing resource state calculated in resource table is changed to " free time ", repeats and performs step 13.

3. dispatching method according to claim 2, it is characterized in that, the managing computing resources unit is using actively distribution money The mode in source then gives state to the selection wakeup unit for the computational resource allocation of " free time "；The side for actively distributing resource Formula is：

Step 1 forms the computing resource table using several cell fifos, it is assumed that the species of computing resource is A kinds；And each The number of computing resource is B；And it is C groups to divide the number B of each computing resource, so as to form A × C cell fifo institute group Into two-dimensional matrix；And each column cell fifo corresponds to a base and numbers, and is respectively

Step 2 initializes the two-dimensional matrix, by offsetIt is respectively written into each of the two-dimensional matrix In cell fifo, for representing the computing resource of " free time " state；

Step 3, each cell fifo read the offset of itself, and the computing resource table is waited to meet described f-th The required computing resource information of assignment instructions；

Step 4, when the computing resource table meets the required computing resource information of f-th of assignment instructions, each FIFO Unit forms computing resource number after the base number corresponding to read offset and itself is combined；

Step 5, the allocation unit are arbitrarily chosen from C groups computing resource number to be met needed for f-th of assignment instructions The computing resource number of computing resource information is supplied to the selection wakeup unit.