
CN115129369B - Command distribution method, command distributor, chip and electronic device - Google Patents

Publication number
CN115129369B
Authority
CN
China
Prior art keywords
command
target
processing cycle
threads
current processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110323622.8A
Other languages
Chinese (zh)
Other versions
CN115129369A (en
Inventor
王文强
夏晓旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Power Tensors Intelligent Technology Co Ltd
Original Assignee
Shanghai Power Tensors Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Power Tensors Intelligent Technology Co Ltd filed Critical Shanghai Power Tensors Intelligent Technology Co Ltd
Priority to CN202110323622.8A priority Critical patent/CN115129369B/en
Priority to PCT/CN2021/120535 priority patent/WO2022198955A1/en
Publication of CN115129369A publication Critical patent/CN115129369A/en
Application granted granted Critical
Publication of CN115129369B publication Critical patent/CN115129369B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)
  • Bus Control (AREA)

Abstract

The present disclosure provides a command distribution method, a command distributor, a chip, and an electronic device, wherein the command distribution method includes: determining multiple first target register groups corresponding to the current processing cycle from multiple register groups; wherein the first target register group is different from the second target register group determined in at least one recent historical processing cycle; determining the target threads corresponding to the multiple first target register groups in the current processing cycle from the thread groups corresponding to the multiple first target register groups; and distributing commands corresponding to the determined target threads to the corresponding computing units. In the embodiment of the present disclosure, each thread group will be accessed by at most one computing unit in each processing cycle, so the computing units that receive the command do not need to arbitrate, and can directly access the corresponding register group to obtain the operands required for the command, thereby improving the efficiency of command distribution and the efficiency of command processing.

Description

Command distribution method, command distributor, chip and electronic device
Technical Field
The disclosure relates to the field of computer technology, and in particular relates to a command distribution method, a command distributor, a chip and electronic equipment.
Background
A command processing device such as a central processing unit or a graphics processor generally includes a controller, a command distributor connected to the controller, and a plurality of arithmetic units connected to the command distributor. The controller receives commands from the host, performs preliminary processing on them, and sends them to the command distributor, which distributes the commands to different arithmetic units for execution. As compute-intensive tasks increase, hardware multithreading, a technology that effectively improves parallel computing capability, is widely used in fields such as image processing, neural networks, and data processing. Hardware multithreading increases computation speed by increasing the number of arithmetic units, maintaining a larger number of threads executing in parallel, increasing the capacity of the register file that stores command operands, and using higher-bandwidth memory, among other means.
The current command distribution mode has the problem of low distribution efficiency.
Disclosure of Invention
The embodiment of the disclosure at least provides a command distribution method, a command distributor, a chip and electronic equipment.
In a first aspect, an embodiment of the present disclosure provides a command distribution method, including: determining, from a plurality of register groups, a plurality of first target register groups corresponding to a current processing cycle, wherein the first target register groups are different from the second target register groups determined in at least one most recent historical processing cycle;
determining, from the thread groups respectively corresponding to the plurality of first target register groups, target threads respectively corresponding to the plurality of first target register groups in the current processing cycle; and distributing the commands respectively corresponding to the determined target threads to the corresponding operation units.
In a possible implementation manner, determining the plurality of first target register groups corresponding to the current processing cycle from the plurality of register groups includes: when the current processing cycle is an odd-numbered cycle, determining the odd-numbered register groups among the plurality of register groups as the first target register groups; and when the current processing cycle is an even-numbered cycle, determining the even-numbered register groups among the plurality of register groups as the first target register groups.
In a possible embodiment, the method further comprises determining the grouping number of the registers according to the number of operands of the operation unit with the largest number of required operands, and dividing the registers into the plurality of register groups.
In a possible implementation manner, the determining the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the thread groups respectively corresponding to the plurality of first target register groups comprises determining the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the thread groups respectively corresponding to the plurality of first target register groups based on the determined command execution state information of each thread in the thread groups respectively corresponding to the plurality of first target register groups.
In a possible implementation manner, the determining the target threads corresponding to the plurality of first target register groups in the current processing cycle from the thread groups corresponding to the plurality of first target register groups respectively based on the determined command execution state of each thread in the thread groups corresponding to the plurality of first target register groups respectively comprises determining a plurality of candidate threads with command execution state information in a ready state from the thread groups corresponding to the plurality of first target register groups respectively, and determining the target threads corresponding to the plurality of first target register groups in the current processing cycle from the plurality of candidate threads respectively.
In a possible implementation manner, determining, from the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle includes: determining the target threads based on the priorities of the to-be-dispatched commands respectively corresponding to the plurality of candidate threads.
In a possible implementation manner, determining, from the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle includes: determining the target threads based on the priorities of the to-be-dispatched commands respectively corresponding to the plurality of candidate threads and on the occupancy states of the operation units corresponding to those to-be-dispatched commands.
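The candidate filtering and selection described in the implementations above can be sketched as follows. This is only an illustration under stated assumptions, not the disclosed implementation: the `Candidate` record, the `pick_target` function, the convention that a larger number means a higher priority, and the tie-break on the lowest thread id are all hypothetical choices introduced here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Candidate:
    thread_id: int
    ready: bool      # command execution state is "ready"
    priority: int    # assumption: larger value = higher priority
    unit: str        # operation unit required by the pending command


def pick_target(threads: Iterable[Candidate],
                busy_units: Set[str]) -> Optional[Candidate]:
    """Pick one target thread for a single first target register group."""
    # Keep only ready threads whose required operation unit is not occupied.
    eligible = [t for t in threads if t.ready and t.unit not in busy_units]
    if not eligible:
        return None
    # Highest priority wins; ties go to the lowest thread id (assumption).
    return max(eligible, key=lambda t: (t.priority, -t.thread_id))


group = [
    Candidate(thread_id=0, ready=True, priority=1, unit="alu"),
    Candidate(thread_id=1, ready=True, priority=5, unit="mul"),
    Candidate(thread_id=2, ready=False, priority=9, unit="alu"),
]
choice = pick_target(group, busy_units={"mul"})
```

In this example the multiplier is occupied and thread 2 is not ready, so thread 0 is chosen even though it has the lowest priority.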
In a possible implementation manner, in response to a multi-operand to-be-dispatched command (a command with more than one operand) existing among the to-be-dispatched commands corresponding to the target threads determined for the current processing cycle, in each processing cycle from the current processing cycle to a target processing cycle, the first target register group corresponding to the multi-operand to-be-dispatched command distributes one corresponding operand to the operation unit corresponding to that command, wherein the difference between the cycle number of the target processing cycle and that of the current processing cycle equals the number of operands of the multi-operand command minus one.
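The cycle arithmetic for a multi-operand command can be illustrated with a short sketch. The function names `target_cycle` and `operand_schedule` are hypothetical, and the sketch assumes that operands are delivered in their listed order, one per processing cycle, starting in the current cycle.

```python
from typing import Dict, List


def target_cycle(current_cycle: int, num_operands: int) -> int:
    """Last cycle in which an operand of the command is delivered."""
    if num_operands < 1:
        raise ValueError("a command needs at least one operand")
    # target - current == number of operands - 1
    return current_cycle + num_operands - 1


def operand_schedule(current_cycle: int, operands: List[str]) -> Dict[int, str]:
    """Map each processing cycle to the operand delivered in that cycle."""
    return {current_cycle + i: op for i, op in enumerate(operands)}


schedule = operand_schedule(4, ["A", "B", "C"])
```

For a three-operand command selected in cycle 4, the operands are delivered in cycles 4, 5 and 6, so the target processing cycle is 6.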
In a possible implementation manner, the method further comprises: in response to a multi-operand to-be-dispatched command with two operands existing among the to-be-dispatched commands corresponding to the target threads determined for the current processing cycle, for each single-operand to-be-dispatched command among those commands, determining, in the next processing cycle after the current processing cycle, another single-operand to-be-dispatched command in a ready state for the first target register group in which that single-operand command is located.
In a possible implementation manner, the method further comprises: in response to a multi-operand to-be-dispatched command (a command with more than one operand) existing among the to-be-dispatched commands corresponding to the target threads determined for the current processing cycle, determining the operand count of the multi-operand to-be-dispatched command that has the largest number of operands; and determining, from the thread group corresponding to a first target register group, a to-be-dispatched command in a ready state for that first target register group, wherein the number of operands of the ready-state to-be-dispatched command does not exceed the number of processing cycles from the next processing cycle after the current processing cycle to the processing cycle in which the first target register group is scheduled again.
In a possible implementation manner, the method further comprises the steps of obtaining feedback information generated by the operation unit after executing the command, and generating command execution state information corresponding to a thread to which the executed command belongs based on the feedback information.
In a possible implementation manner, the method further comprises grouping the threads currently being executed based on the number of the register groups and the number of the threads currently being executed to obtain thread groups corresponding to each register group respectively.
In a second aspect, embodiments of the present disclosure provide a command dispatcher comprising a scheduler, and a dispatch interface;
the scheduler is used for determining a plurality of first target register groups corresponding to the current processing cycle from the register groups; wherein the first set of target registers is different from a second set of target registers determined by at least one recent historical processing cycle; determining target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from thread groups respectively corresponding to the plurality of first target register groups;
the distributing interface is used for distributing the determined commands respectively corresponding to the target threads to the corresponding operation units.
In a possible implementation manner, the scheduler is configured, when determining, from a plurality of register sets, a plurality of first target register sets corresponding to a current processing cycle, to:
when the current processing cycle is an odd-numbered cycle, determine the odd-numbered register groups among the plurality of register groups as the first target register groups; and
when the current processing cycle is an even-numbered cycle, determine the even-numbered register groups among the plurality of register groups as the first target register groups.
In a possible implementation manner, the scheduler is further configured to:
the number of groupings of registers is determined based on the number of operands of the arithmetic unit that requires the greatest number of operands, and the registers are divided into the plurality of register banks.
In a possible implementation manner, the scheduler is configured, when determining, from the thread groups respectively corresponding to the plurality of first target register groups, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle, to:
determine the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the thread groups respectively corresponding to the plurality of first target register groups, based on the determined command execution state information of each thread in those thread groups.
In a possible implementation manner, the scheduler is configured, when determining the target threads based on the determined command execution state information of each thread in the thread groups respectively corresponding to the plurality of first target register groups, to:
determine, from the thread groups respectively corresponding to the plurality of first target register groups, a plurality of candidate threads whose command execution state information indicates a ready state; and
determine, from the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle.
In a possible implementation manner, the scheduler is configured, when determining, from the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle, to:
determine the target threads based on the priorities of the to-be-dispatched commands respectively corresponding to the plurality of candidate threads.
In a possible implementation manner, the scheduler is configured, when determining, from the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle, to:
determine the target threads based on the priorities of the to-be-dispatched commands respectively corresponding to the plurality of candidate threads and on the occupancy states of the operation units corresponding to those to-be-dispatched commands.
In a possible implementation manner, the scheduler is configured, when determining, from the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle, to:
determine the target threads based on the command types of the current to-be-dispatched commands respectively corresponding to the plurality of candidate threads and on the types of the operation units.
In a possible implementation manner, the scheduler is further configured to:
in response to a multi-operand to-be-dispatched command (a command with more than one operand) existing among the to-be-dispatched commands corresponding to the target threads determined for the current processing cycle, cause the first target register group corresponding to the multi-operand to-be-dispatched command to distribute, in each processing cycle from the current processing cycle to a target processing cycle, one corresponding operand to the operation unit corresponding to that command;
wherein the difference between the cycle number of the target processing cycle and that of the current processing cycle equals the number of operands of the multi-operand command minus one.
In a possible implementation manner, the scheduler is further configured to:
in response to a multi-operand to-be-dispatched command with two operands existing among the to-be-dispatched commands corresponding to the target threads determined for the current processing cycle, for each single-operand to-be-dispatched command among those commands, determine, in the next processing cycle after the current processing cycle, another single-operand to-be-dispatched command in a ready state for the first target register group in which that single-operand command is located.
In a possible embodiment, the scheduler is further configured to:
in response to a multi-operand to-be-dispatched command (a command with more than one operand) existing among the to-be-dispatched commands corresponding to the target threads determined for the current processing cycle, determine the operand count of the multi-operand to-be-dispatched command that has the largest number of operands;
for each other to-be-dispatched command, among the to-be-dispatched commands corresponding to the target threads determined for the current processing cycle, whose operand count is less than that largest operand count, in response to the first target register group in which that other command is located being idle in each processing cycle from the next processing cycle after the current processing cycle to the processing cycle before that first target register group is scheduled again, determine, from the thread group corresponding to that first target register group, a to-be-dispatched command in a ready state for that first target register group;
wherein the number of operands of the ready-state to-be-dispatched command does not exceed the number of processing cycles from the processing cycle in which the ready-state to-be-dispatched command is dispatched to the processing cycle in which the first target register group is scheduled again.
In a possible implementation manner, the scheduler is further used for acquiring feedback information generated by the operation unit after executing the command;
and generating command execution state information corresponding to the thread to which the executed command belongs based on the feedback information.
In a possible implementation manner, the scheduler is further configured to:
group the threads currently being executed, based on the number of register groups and the number of threads currently being executed, to obtain the thread group corresponding to each register group.
In a third aspect, embodiments of the present disclosure further provide a chip, including a controller, a command distributor, and an operator;
the controller is used for acquiring commands corresponding to the threads respectively and sending the commands to the command distributor;
the command distributor is configured to distribute the command to the operator based on the command distribution method according to any one of the first aspects;
the operator is configured to read an operand from a target register group corresponding to the command based on the command distributed by the command distributor, and execute the command based on the operand.
In a fourth aspect, an embodiment of the disclosure further provides an electronic device, including the chip described in the third aspect.
In a fifth aspect, embodiments of the present disclosure also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the command distribution method according to any of the first aspects above.
For a description of the effects of the command distributor, the chip and the electronic device, refer to the description of the command distribution method above; it is not repeated here.
The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
Drawings
In order to illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings required for the embodiments are briefly described below. These drawings are incorporated in and constitute a part of the specification; they show embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It is to be understood that the following drawings illustrate only certain embodiments of the present disclosure and are therefore not to be considered limiting of its scope; a person of ordinary skill in the art may derive other related drawings from them without inventive effort.
FIG. 1 illustrates a flow chart of a command distribution method provided by an embodiment of the present disclosure;
FIG. 2 illustrates a specific example of a command distribution device provided by an embodiment of the present disclosure;
FIG. 3 illustrates a schematic diagram of a specific example of command distribution by the command distribution device provided by an embodiment of the present disclosure;
FIG. 4 illustrates a schematic diagram of a command distributor provided by an embodiment of the present disclosure;
FIG. 5 illustrates a schematic structural diagram of a chip provided by an embodiment of the present disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. The components of the embodiments of the present disclosure, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.
Research has found that a command processing apparatus includes a controller, a command distribution unit, and a plurality of operation units. After the command distribution unit distributes commands to the operation units, the operation units executing different commands obtain read permission for the registers through arbitration; once read permission is obtained, an operation unit reads the operands required by its command from the registers and then executes the command based on the operands it has read. Arbitrating read permission for the register file introduces a certain delay, which reduces the command throughput of the operation units and in turn leads to low command distribution efficiency and low processing efficiency for individual commands.
In addition, after a command has been distributed to an operation unit, if the operands required to execute the command are not ready, the operation unit switches to executing commands corresponding to other threads, which requires the command distribution unit to distribute new commands to the operation unit. As a result, the commands distributed to an operation unit may include commands that cannot be executed immediately (they can be executed only after their operands are ready), which reduces command distribution efficiency and lowers command processing efficiency.
Based on the above study, the present disclosure provides a command distribution method that divides registers in a register file into a plurality of register groups, and different register groups correspond to different thread groups. In each processing period, a plurality of first target register groups are determined, target threads respectively corresponding to the plurality of first target register groups in the current processing period are determined from thread groups respectively corresponding to the plurality of first target register groups, and commands respectively corresponding to the determined target threads are distributed to corresponding operation units, so that each thread group can be accessed by one operation unit at most in each processing period, and therefore, the operation units receiving the commands can directly access the corresponding register groups without arbitration, operands required by the commands are obtained, the command distribution efficiency is improved, and the command processing efficiency is improved.
The defects of the above schemes were identified by the inventors through practice and careful study. Therefore, both the discovery of the above problems and the solutions to them proposed below should be regarded as contributions made by the inventors in the course of the present disclosure.
It should be noted that like reference numerals and letters refer to like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
To facilitate understanding of the present embodiment, a command distribution method disclosed in the embodiments of the present disclosure is first described in detail. The execution body of the command distribution method provided in the embodiments of the present disclosure is generally a command processing device such as a central processing unit (Central Processing Unit, CPU), a graphics processor (Graphics Processing Unit, GPU), or an artificial intelligence (Artificial Intelligence, AI) chip.
In the embodiments of the disclosure, the operands of a command are the data that need to be read from an external memory when the command is executed. For example, if the command multiplies data A by data B, the corresponding command operands are data A and data B; as another example, if the command performs a convolution operation on a feature map M to be processed using a convolution kernel F, the corresponding command operands are the feature map M and the convolution kernel F.
The command distribution method provided by the embodiment of the present disclosure is described below.
Referring to FIG. 1, a flowchart of a command distribution method according to an embodiment of the disclosure is shown, where the method includes steps S101 to S103:
S101, determining a plurality of first target register groups corresponding to a current processing period from a plurality of register groups, wherein the first target register groups are different from a second target register group determined by at least one latest historical processing period;
S102, determining target threads respectively corresponding to the plurality of first target register groups in the current processing period from the thread groups respectively corresponding to the plurality of first target register groups;
S103, distributing the commands respectively corresponding to the determined target threads to the corresponding operation units.
According to the embodiment of the disclosure, the registers are divided into a plurality of register groups. In each processing cycle, a plurality of first target register groups corresponding to the current processing cycle are determined from the plurality of register groups, and the target threads respectively corresponding to those first target register groups in the current processing cycle are then determined from the thread groups respectively corresponding to them. After the commands corresponding to the target threads are distributed to the operation units, each register group is accessed by at most one operation unit in any one processing cycle. The operation units that receive commands can therefore access their corresponding register groups directly, without arbitration, to obtain the operands the commands require, which improves both command distribution efficiency and command processing efficiency.
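The property that each register group is read by at most one operation unit per cycle can be seen in a small sketch of steps S102 and S103. Everything here is hypothetical and for illustration only: the `dispatch_cycle` function, the `(thread id, ready flag, command)` tuple layout, and the policy of taking the first ready thread in each group. The first target register groups from S101 are assumed to have been chosen already and are passed in.

```python
from typing import Dict, List, Tuple

Thread = Tuple[int, bool, str]  # (thread id, ready flag, pending command)


def dispatch_cycle(target_groups: List[int],
                   thread_groups: Dict[int, List[Thread]]
                   ) -> Dict[int, Tuple[int, str]]:
    """Return {register group: (thread id, command)} for one processing cycle."""
    dispatched: Dict[int, Tuple[int, str]] = {}
    for group in target_groups:
        # S102: pick one target thread from the group's thread group
        # (assumption: simply the first thread that is ready).
        for thread_id, ready, command in thread_groups.get(group, []):
            if ready:
                # S103: exactly one command per register group is sent out,
                # so no two operation units read the same group this cycle.
                dispatched[group] = (thread_id, command)
                break
    return dispatched


thread_groups = {
    0: [(0, False, "mul"), (4, True, "add")],
    1: [(1, True, "conv")],
    2: [(2, False, "add")],
}
result = dispatch_cycle([0, 2], thread_groups)
```

Because the result is keyed by register group, a group can appear at most once per cycle; here group 0 dispatches thread 4, group 2 has no ready thread, and group 1 is not a target in this cycle.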
The following describes the steps S101 to S103 in detail.
In S101, for example, the registers may be divided into a plurality of register groups in advance, and the threads in the host that currently issue commands may likewise be divided into a plurality of thread groups. Each register group corresponds to one thread group, and for each thread group, the operands of the commands generated by the threads in that thread group are stored in registers of the corresponding register group.
Here, for example, the threads currently being executed may be grouped based on the number of register groups and the number of threads currently being executed, to obtain the thread group corresponding to each register group.
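As an illustration of such grouping, the sketch below spreads the running threads evenly over the register groups. The function name `group_threads` and the modulo assignment are assumptions; the disclosure only states that grouping depends on the number of register groups and the number of running threads.

```python
def group_threads(thread_ids, num_register_groups):
    """Assign threads to register groups, returning {group index: [thread ids]}.

    Assumption: the thread at position p goes to group p % num_register_groups.
    Any other even split would serve the same purpose.
    """
    groups = {g: [] for g in range(num_register_groups)}
    for position, tid in enumerate(thread_ids):
        groups[position % num_register_groups].append(tid)
    return groups
```

With 64 threads and 8 register groups, as in the example given later in this description, each thread group holds 8 threads.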
When determining the plurality of first target register groups corresponding to the current processing cycle from the plurality of register groups, for example, the plurality of register groups may be divided into at least two groupings, each grouping including a plurality of register groups. In each processing cycle, the register groups included in one grouping are determined as the first target register groups corresponding to that processing cycle. Over a plurality of processing cycles, the register groups in the different groupings are determined in turn as the target register groups of the respective processing cycles.
For example, each register group may be numbered, and the correspondence between register group numbers and processing cycle numbers may be predetermined. For example, in an even-numbered processing cycle, the even-numbered register groups are determined as the target register groups of that cycle, and in an odd-numbered processing cycle, the odd-numbered register groups are determined as the target register groups of that cycle.
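The predetermined correspondence between cycle numbers and register group numbers can be sketched as follows. The function name `first_target_groups` and the parameter `num_groupings` are hypothetical; with the default of two groupings the rule reduces to the odd/even scheme just described.

```python
def first_target_groups(cycle, num_groups, num_groupings=2):
    """Return the register groups scheduled in the given processing cycle.

    Assumption: register group g belongs to grouping g % num_groupings, and
    grouping (cycle % num_groupings) is scheduled in the given cycle.
    """
    return [g for g in range(num_groups)
            if g % num_groupings == cycle % num_groupings]
```

Consecutive cycles always receive disjoint sets of register groups, which is what keeps the first target register groups different from those of the most recent historical processing cycle.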
Here, the number of register groups may be related to the number of operands required by the commands to be executed. For example, if each command to be executed requires at most 2 operands, the number of register groups is 2, and if each command to be executed requires at most 3 operands, the number of register groups is 3. More generally, suppose each command to be executed requires at most n operands, so that there are n register groups, and suppose the 1st of the n register groups is taken as the target register group in the i-th processing cycle. An operation unit that receives a command with n operands in the i-th processing cycle reads one operand from the corresponding first target register group in that cycle, and reads the remaining n-1 operands from the same register group in the (i+1)-th to (i+n-1)-th processing cycles. In those (i+1)-th to (i+n-1)-th processing cycles, the 2nd to n-th register groups are taken in turn as the target register groups, so that operation units never conflict over access to the same register group.
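The absence of conflicts under this rotation can be checked with a small simulation. This is a sketch under stated assumptions: one command is issued per cycle, every command needs the same number of operands, register groups are numbered from 0 rather than from 1, and the names `read_schedule` and `has_conflict` are hypothetical.

```python
from collections import Counter


def read_schedule(num_operands, num_groups, num_cycles):
    """List the (cycle, register group) reads when one command is issued per cycle.

    The command issued in cycle c targets group c % num_groups and reads one
    operand from that group in each of the next num_operands cycles.
    """
    reads = []
    for issue_cycle in range(num_cycles):
        group = issue_cycle % num_groups
        for k in range(num_operands):
            reads.append((issue_cycle + k, group))
    return reads


def has_conflict(reads):
    """True if some register group is read more than once in the same cycle."""
    return any(count > 1 for count in Counter(reads).values())
```

With as many register groups as operands the schedule is conflict-free, whereas `has_conflict(read_schedule(3, 2, 6))` reports a conflict because a register group is rescheduled before its three reads have finished.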
For S102, after determining the plurality of first target register groups corresponding to the current processing cycle, for example, a plurality of target threads may be determined from thread groups corresponding to the plurality of first target register groups in at least one of the following manners:
(1) Determining, in a round-robin manner, an order for the threads in the thread group corresponding to each first target register group, and determining, according to that order, different threads in the thread group as the target thread of the first target register group in different processing cycles.
(2) For each first target register group, taking the threads in the corresponding thread group that have a command in the current processing cycle as candidate threads, and taking the candidate thread with the highest priority as the target thread of the first target register group in the current processing cycle.
(3) Determining the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle based on the command execution state information of each thread in the thread groups respectively corresponding to the plurality of first target register groups.
Here, for example, for each first target register group, a plurality of candidate threads for which command execution state information is ready may be determined from a thread group corresponding to the first target register group, and a target thread corresponding to each of the plurality of first target register groups in the current processing cycle may be determined from the plurality of candidate threads.
The command execution state information corresponding to any thread includes, for example, whether the command that the thread most recently dispatched to an operation unit has finished executing, and/or whether the operands required by the thread's command to be dispatched are ready.
The operands corresponding to the command are ready, e.g., the data generated by other commands on which the command depends have been stored in the corresponding registers, and/or operands that need to be read from external memory have been stored in the corresponding registers.
If the command most recently dispatched to an operation unit by the thread has finished executing, and/or the operands required by the thread's current command to be dispatched are ready, the command execution state information corresponding to the thread is considered to be in the ready state, and the thread can be used as a target thread.
In this case, in another embodiment of the present disclosure, the method further includes obtaining feedback information generated by the operation unit after executing the command, and generating command execution state information corresponding to a thread to which the executed command belongs based on the feedback information.
Thus, the command distributor can know the execution condition of each operation unit on the command in real time.
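How feedback from the operation units might be turned into per-thread ready-state information is sketched below. The class `ExecutionTracker` and its method names are hypothetical, and the sketch assumes that a thread is ready only when both conditions hold, whereas the description above also allows either condition alone.

```python
class ExecutionTracker:
    """Tracks command execution state per thread from dispatch and feedback events."""

    def __init__(self, thread_ids):
        self.in_flight = {tid: 0 for tid in thread_ids}
        self.operands_ready = {tid: False for tid in thread_ids}

    def on_dispatch(self, tid):
        # A command of this thread was sent to an operation unit.
        self.in_flight[tid] += 1
        self.operands_ready[tid] = False

    def on_feedback(self, tid):
        # Feedback from an operation unit: one command of this thread finished.
        self.in_flight[tid] = max(0, self.in_flight[tid] - 1)

    def on_operands_stored(self, tid):
        # The operands of the thread's next command are now in its register group.
        self.operands_ready[tid] = True

    def is_ready(self, tid):
        # Assumption: both conditions are required for the ready state.
        return self.in_flight[tid] == 0 and self.operands_ready[tid]

    def candidates(self, thread_group):
        """Candidate threads of a thread group, in the group's own order."""
        return [tid for tid in thread_group if self.is_ready(tid)]
```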
In one possible embodiment, in the above (3), the number of target threads specified for a certain first target register group may be greater than 1, or the target threads may be specified from a plurality of threads satisfying the requirement (3) in combination with priorities corresponding to the respective threads or in a round-robin manner.
It should be noted here that, in a given processing cycle, a first target register group may have no target thread, that is, the number of target threads determined may be less than the number of first target register groups.
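Choosing among several ready threads in a round-robin manner or by priority can be sketched as follows. The function names are hypothetical, and the convention that a larger number means a higher priority is an assumption. Both functions return `None` when the register group has no target thread in the cycle.

```python
def pick_round_robin(threads, ready, last_picked):
    """Return the next ready thread after last_picked in cyclic order, or None."""
    if not threads:
        return None
    start = threads.index(last_picked) + 1 if last_picked in threads else 0
    for offset in range(len(threads)):
        tid = threads[(start + offset) % len(threads)]
        if tid in ready:
            return tid
    return None


def pick_by_priority(threads, ready, priority):
    """Return the ready thread with the highest priority value, or None."""
    candidates = [tid for tid in threads if tid in ready]
    return max(candidates, key=lambda tid: priority[tid], default=None)
```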
When determining, from the determined plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle, any one of the following ① to ③ may be used, for example:
① Determining the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the candidate threads based on the priorities of the commands to be distributed that correspond to the candidate threads.
② Determining the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the candidate threads based on the priorities of the commands to be distributed that correspond to the candidate threads and on the occupation states of the operation units corresponding to those commands.
The occupancy state of the arithmetic unit may include, for example, that a specific target thread has been allocated to the arithmetic unit during the current processing cycle, and/or that the number of commands received by the arithmetic unit during the historical processing cycle and not executed reaches a preset number.
Illustratively, the following is performed in order of priority from high to low:
Determining, according to the priorities of the commands to be distributed that correspond to the candidate threads, at least one command to be distributed with the highest priority; determining, based on the occupation state of the operation unit corresponding to that command, whether the command can be distributed to the corresponding operation unit; and, if it can, determining the candidate thread corresponding to that command as a target thread. If the command cannot be distributed to the corresponding operation unit, the candidate thread corresponding to it is not taken as a target thread.
Then, at least one command to be distributed with the next-highest priority is determined from the candidate threads, and whether that command can be distributed to the corresponding operation unit is determined based on the occupation state of the operation unit corresponding to it.
......
Finally, at least one command to be distributed with the lowest priority is determined from the candidate threads, and whether that command can be distributed to the corresponding operation unit is determined based on the occupation state of the operation unit corresponding to it.
Based on the above procedure, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle are determined from the plurality of candidate threads.
③ Determining the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the candidate threads based on the command types of the current commands to be distributed that correspond to the candidate threads and on the types of the operation units.
Here, the types of the arithmetic units are different, and the types of commands that can be processed are also different.
For example, an arithmetic operation unit can process arithmetic operation commands, a write address operation unit can process write address commands, a read address operation unit can process read address commands, and an override function operation unit can process override function commands.
When the target thread is determined, a plurality of target threads which can be respectively matched with the types of the operation units are determined from the candidate threads according to the types of the commands to be distributed corresponding to the candidate threads, and then the current commands to be distributed corresponding to the target threads are distributed to the operation units with the matched types.
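Options ② and ③ can be combined in one illustrative selection routine: candidates are visited from high to low priority, and a candidate becomes a target thread only if an operation unit of the matching type is still free and its register group has not already been given a target thread. The mapping `UNIT_TYPE_FOR_COMMAND`, the command type labels and the function name `select_targets` are assumptions, and occupancy is modelled only as a count of free units per type, without the backlog threshold mentioned above.

```python
# Hypothetical mapping from command type to the type of unit that can process it.
UNIT_TYPE_FOR_COMMAND = {
    "arithmetic": "ALU",
    "write_address": "ST",
    "read_address": "LD",
    "function": "TFU",
}


def select_targets(candidates, free_units):
    """Pick target threads from candidates, highest priority first.

    candidates: list of (thread_id, register_group, command_type, priority)
    free_units: dict mapping unit type -> number of units still free this cycle
    Returns a list of (thread_id, register_group, unit_type).
    """
    free = dict(free_units)   # copy so the caller's counts are not modified
    used_groups = set()
    chosen = []
    ordered = sorted(candidates, key=lambda c: c[3], reverse=True)
    for tid, group, command_type, _priority in ordered:
        unit = UNIT_TYPE_FOR_COMMAND.get(command_type)
        if unit is None or group in used_groups or free.get(unit, 0) == 0:
            continue  # no matching free unit, or the group already has a target
        free[unit] -= 1
        used_groups.add(group)
        chosen.append((tid, group, unit))
    return chosen
```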
In another embodiment of the present disclosure, for some commands, the number of operands required in executing the command may be different.
After the command to be distributed corresponding to the target thread is distributed to the operation unit, the operation unit needs at least one period to read the operands corresponding to the command to be distributed from the corresponding register group, wherein the number of the periods for reading the operands is the same as the number of the operands corresponding to the command to be distributed.
Further, in response to the commands to be dispatched that correspond to the target threads determined for the current processing cycle including a multi-operand command to be dispatched, that is, a command with more than one operand, the first target register group corresponding to the multi-operand command distributes one operand to the command's operation unit in each processing cycle from the current processing cycle to a target processing cycle;
where the difference between the cycle number of the target processing cycle and that of the current processing cycle is equal to the number of operands of the multi-operand command minus one.
For the commands to be distributed with fewer operands, the operation unit can respectively read the operands corresponding to different commands to be distributed from the same target register group in a plurality of periods.
Further, in response to the commands to be dispatched that correspond to the target threads determined for the current processing cycle including multi-operand commands to be dispatched, the operand count of the multi-operand command with the largest number of operands is determined;
for each other command to be dispatched whose operand count is less than that maximum, in each processing cycle from the cycle after the current processing cycle to the cycle before its first target register group is scheduled again, in response to that first target register group being idle, a ready-state command to be dispatched is determined for the first target register group from the thread group corresponding to the first target register group;
where the operand count of the ready-state command to be dispatched is no greater than the number of processing cycles from the cycle in which that command is dispatched to the cycle in which the first target register group is scheduled again.
For example, in response to a multi-operand to-be-dispatched command with two operands in a to-be-dispatched command corresponding to a target thread determined for a current processing cycle, for each single-operand to-be-dispatched command existing in the to-be-dispatched command corresponding to the target thread determined for the current processing cycle, another single-operand to-be-dispatched command in a ready state is determined for a first target register group in which the single-operand to-be-dispatched command is located in a next processing cycle of the current processing cycle.
In this way, an operation unit can read the operands of a multi-operand command to be dispatched from the corresponding first target register group over a plurality of processing cycles, while, over the same processing cycles, operation units can read the operands of different single-operand commands to be dispatched from the first target register group of a single-operand command. Data reading efficiency is thus improved while read conflicts on the same target register group are avoided.
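Filling the idle cycles of a register group before it is scheduled again can be sketched as follows. The function name `backfill` and its parameters are hypothetical, and taking the first ready command that fits is an assumption; the disclosure only requires that the operand count of the chosen command not exceed the cycles that remain.

```python
def backfill(first_command_operands, ready_pool, period):
    """Plan extra dispatches for one register group until it is scheduled again.

    first_command_operands: operand count of the command already dispatched
    ready_pool: list of (thread_id, operand_count) ready commands of the thread group
    period:     number of cycles between two schedulings of the register group
    Returns a list of (offset, thread_id), offset counted from the current cycle.
    """
    offset = first_command_operands  # first cycle in which the group is idle
    pool = list(ready_pool)
    plan = []
    while offset < period:
        remaining = period - offset
        # Only a command whose operands fit in the remaining cycles may be chosen.
        fit = next((c for c in pool if c[1] <= remaining), None)
        if fit is None:
            break
        pool.remove(fit)
        plan.append((offset, fit[0]))
        offset += fit[1]
    return plan
```

For a register group scheduled every two cycles whose current command has one operand, `backfill(1, [(40, 2), (41, 1)], 2)` skips the two-operand command of the hypothetical thread 40 and plans the single-operand command of thread 41 for the next cycle.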
In one embodiment, the number N of register groups may be determined according to the operand count of the operation unit that requires the most operands, so that the N register groups are scheduled in turn over N consecutive processing cycles, and the register group scheduled in the i-th processing cycle is scheduled again only after the other register groups have been scheduled. If the command corresponding to the register group scheduled in the i-th processing cycle needs exactly N operands, the N operands can be distributed to the corresponding operation unit over N cycles. If that command needs fewer than N operands, commands whose operand count fits the number of cycles remaining before the register group is next scheduled can be flexibly scheduled, so as to improve data reading efficiency.
Referring to fig. 2 and 3, the embodiment of the present disclosure further provides a command distribution apparatus, and a specific example of command distribution using the same, in which the example includes a command distributor, and 5 arithmetic units connected to the command distributor, the 5 arithmetic units being respectively:
Two arithmetic logic units (Arithmetic and Logic Unit, ALU), which process instructions requiring two operands.
A write address operation unit (ST), which processes instructions requiring two operands.
A read address operation unit (LD), which processes instructions requiring one operand.
An override function operation unit (Tensor Function Unit, TFU), which processes instructions requiring two operands.
There are 64 threads in total, namely threads 0 to 63; 8 register groups (Banks), namely Bank 0 to Bank 7; and 5 operation units. Each register group is allocated 8 threads.
Each Bank has only one read path. In one processing cycle, different operation units that access the same Bank conflict, whereas different operation units that access different Banks do not.
For an instruction with two operands, the reading of the operands needs to be performed in two cycles in the same Bank.
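In even cycles, the register groups numbered 0, 2, 4 and 6 are taken as the first target register groups; in odd cycles, the register groups numbered 1, 3, 5 and 7 are taken as the first target register groups.
In an even cycle, the valid, highest-priority ALU, ST, LD and TFU instructions are selected from the 8 threads allocated to each even-numbered Bank.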
From these even-numbered Banks, the two highest-priority ALU instructions are selected and dispatched first.
An ST instruction is then selected and dispatched from the remaining Banks, that is, those not occupied by the ALU instructions.
An LD instruction is then selected and dispatched from the Banks not occupied by the ALU and ST instructions.
Because ALU and ST instructions have two operands and must read from the same Bank again in the next cycle, a TFU instruction is selected from the Banks not occupied by the ALU and ST instructions and is dispatched in the next processing cycle.
A two-operand instruction dispatched in an even cycle must continue reading its operands from the same even Bank in the following odd cycle, but no Bank conflict arises, because that odd cycle schedules instructions for the odd Banks only. Odd cycles are scheduled in the same way.
As shown in fig. 3:
A, in the 0th processing cycle, the determined Banks are Bank 0, Bank 2, Bank 4 and Bank 6.
The command determined for Bank 0 is an ALU command, the operation unit that reads operands from Bank 0 is ALU0, and the ALU command is distributed to ALU0 in the 0th processing cycle. In the 0th and 1st processing cycles, the operation unit ALU0 reads the first operand alu0_r0 and the second operand alu0_r1 from Bank 0, respectively.
The command determined for Bank 2 is an ST command, and in the 0th processing cycle the ST command is distributed to the ST unit. The operation unit that reads operands from Bank 2 is ST, and in the 0th and 1st processing cycles the operation unit ST reads the first operand st_r0 and the second operand st_r1 from Bank 2, respectively.
The commands determined for Bank 4 are an LD command and a TFU command, and the operation units that read operands from Bank 4 are LD and TFU. In the 0th processing cycle, the LD command is distributed to the operation unit LD, which reads the operand corresponding to the LD command from Bank 4; in the 1st processing cycle, the TFU command is distributed to the operation unit TFU, which reads the operand corresponding to the TFU command from Bank 4.
The command determined for Bank 6 is an ALU command, and in the 0th processing cycle the ALU command is distributed to ALU1. The operation unit that reads operands from Bank 6 is ALU1, and in the 0th and 1st processing cycles the operation unit ALU1 reads the first operand alu1_r0 and the second operand alu1_r1 from Bank 6, respectively.
B, in the 1st processing cycle, the determined Banks are Bank 1, Bank 3, Bank 5 and Bank 7.
The command determined for Bank 1 is an ST command, and in the 1st processing cycle the ST command is distributed to the ST unit. The operation unit that reads operands from Bank 1 is ST, and in the 1st and 2nd processing cycles the operation unit ST reads the first operand st_r0 and the second operand st_r1 from Bank 1, respectively.
The command determined for Bank 3 is an ALU command, the operation unit that reads operands from Bank 3 is ALU0, and the ALU command is distributed to ALU0 in the 1st processing cycle. In the 1st and 2nd processing cycles, the operation unit ALU0 reads the first operand alu0_r0 and the second operand alu0_r1 from Bank 3, respectively.
The command determined for Bank 5 is an ALU command, and in the 1st processing cycle the ALU command is distributed to ALU1. The operation unit that reads operands from Bank 5 is ALU1, and in the 1st and 2nd processing cycles the operation unit ALU1 reads the first operand alu1_r0 and the second operand alu1_r1 from Bank 5, respectively.
The commands determined for Bank 7 are an LD command and a TFU command, and the operation units that read operands from Bank 7 are LD and TFU. In the 1st processing cycle, the LD command is distributed to the operation unit LD, which reads the operand corresponding to the LD command from Bank 7; in the 2nd processing cycle, the TFU command is distributed to the operation unit TFU, which reads the operand corresponding to the TFU command from Bank 7.
Scheduling then continues in the same manner in the 3rd processing cycle, the 4th processing cycle, and so on up to the 8th processing cycle. In this way, each register group is accessed by only one operation unit in any one processing cycle, which avoids the data conflicts caused by multiple operation units accessing the same register group in the same processing cycle, improves command distribution efficiency, and in turn improves command processing efficiency.
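The walkthrough of the 0th and 1st processing cycles can be replayed to confirm that no Bank is read by two operation units in the same cycle. This is only a sketch: the tuple layout and the names `FIG3_DISPATCHES`, `bank_reads` and `conflicts` are hypothetical, and the LD and TFU commands are each given a single read because the walkthrough describes one read for each.

```python
from collections import defaultdict

# (bank, unit, dispatch cycle, number of operand reads) as described for cycles 0 and 1.
FIG3_DISPATCHES = [
    (0, "ALU0", 0, 2), (2, "ST", 0, 2), (4, "LD", 0, 1), (4, "TFU", 1, 1),
    (6, "ALU1", 0, 2),
    (1, "ST", 1, 2), (3, "ALU0", 1, 2), (5, "ALU1", 1, 2),
    (7, "LD", 1, 1), (7, "TFU", 2, 1),
]


def bank_reads(dispatches):
    """Map (cycle, bank) -> list of units reading that bank in that cycle."""
    reads = defaultdict(list)
    for bank, unit, cycle, operands in dispatches:
        for k in range(operands):  # one operand read per cycle
            reads[(cycle + k, bank)].append(unit)
    return reads


def conflicts(dispatches):
    """Return the (cycle, bank) entries that are read by more than one unit."""
    return {key: units for key, units in bank_reads(dispatches).items()
            if len(units) > 1}
```

`conflicts(FIG3_DISPATCHES)` is empty, whereas dispatching an LD command to Bank 0 in the 1st cycle while ALU0 is still reading its second operand there would be reported as a conflict.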
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Based on the same inventive concept, the embodiments of the present disclosure further provide a command distributor corresponding to the command distribution method. Since the principle by which the command distributor solves the problem is similar to that of the command distribution method in the embodiments of the present disclosure, the implementation of the command distributor may refer to the implementation of the method, and repeated description is omitted.
Referring to FIG. 4, a schematic diagram of a command distributor according to an embodiment of the present disclosure is provided, where the command distributor includes a scheduler 41 and a distribution interface 42;
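The scheduler 41 is configured to determine a plurality of first target register groups corresponding to a current processing cycle from a plurality of register groups, where the first target register groups are different from a second target register group determined in at least one most recent historical processing cycle, and to determine, from the thread groups respectively corresponding to the plurality of first target register groups, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle;
The distribution interface 42 is configured to distribute the commands corresponding to the determined target threads to the corresponding operation units.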
In a possible implementation manner, the scheduler 41 is configured, when determining, from a plurality of register sets, a plurality of first target register sets corresponding to a current processing cycle, to:
Determining a register group numbered as an odd number among the plurality of register groups as the first target register group in a case where the current processing cycle is an odd number cycle;
And determining a register group numbered even in the plurality of register groups as the first target register group in the case that the current processing cycle is an even number cycle.
In a possible implementation, the scheduler 41 is further configured to:
the number of groupings of registers is determined based on the number of operands of the arithmetic unit that requires the greatest number of operands, and the registers are divided into the plurality of register banks.
In a possible implementation manner, the scheduler 41, when determining, from the thread groups respectively corresponding to the plurality of first target register groups, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle, is configured to:
determine, based on the command execution state information of each thread in the thread groups respectively corresponding to the plurality of first target register groups, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from those thread groups.
In a possible implementation manner, the scheduler 41, when determining the target threads based on the command execution state information of each thread in the thread groups respectively corresponding to the plurality of first target register groups, is configured to:
determine a plurality of candidate threads whose command execution state information is the ready state from the thread groups respectively corresponding to the plurality of first target register groups;
and determine the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the candidate threads.
In a possible implementation manner, the scheduler 41, when determining, from the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle, is configured to:
determine the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the plurality of candidate threads based on the priorities of the commands to be distributed that correspond to the candidate threads.
In a possible implementation manner, the scheduler 41, when determining, from the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle, is configured to:
determine the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the plurality of candidate threads based on the priorities of the commands to be distributed that correspond to the candidate threads and on the occupation states of the operation units corresponding to those commands.
In a possible implementation manner, the scheduler 41, when determining, from the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle, is configured to:
determine the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the plurality of candidate threads based on the command types of the current commands to be distributed that correspond to the candidate threads and on the types of the operation units.
In a possible implementation, the scheduler 41 is further configured to:
Responding to a multi-operand to-be-dispatched command with more than one operand in a to-be-dispatched command corresponding to a target thread determined for a current processing period, and respectively dispatching a corresponding one operand to a corresponding operation unit of the to-be-dispatched command from the current processing period to each processing period of the target processing period by a first target register group corresponding to the multi-operand to-be-dispatched command;
The difference value between the cycle number of the target processing cycle and the current processing cycle is equal to one less than the number of the multiple operands.
In a possible implementation, the scheduler 41 is further configured to:
In response to a multi-operand to-be-dispatched command with two operands in a to-be-dispatched command corresponding to a target thread determined for a current processing cycle, determining another single-operand to-be-dispatched command in a ready state for a first target register group where the single-operand to-be-dispatched command is located in a next processing cycle of the current processing cycle for each single-operand to-be-dispatched command existing in the to-be-dispatched command corresponding to the target thread determined for the current processing cycle.
In a possible embodiment, the scheduler 41 is further configured to:
Responding to a multi-operand to-be-dispatched command with more than one operand in a target thread corresponding to a current processing period, and determining the operand number of the multi-operand to-be-dispatched command with the largest operand number in the multi-operand to-be-dispatched command;
For each other command to be dispatched, in which the number of operands in the command to be dispatched corresponding to the target thread determined for the current processing cycle is less than the maximum number of operands, determining a command to be dispatched in a ready state for the first target register group from the thread group corresponding to the first target register group in response to the first target register group where the other command to be dispatched is idle for each processing cycle from the next processing cycle of the current processing cycle to the processing cycle before the first target register group is scheduled again;
The operation number of the ready-state commands to be distributed is not more than the number of processing cycles from the processing cycle of the ready-state commands to be distributed to the processing cycle of the first target register group which is scheduled again.
In a possible implementation manner, the scheduler 41 is further configured to obtain feedback information generated by the operation unit after executing the command;
and generating command execution state information corresponding to the thread to which the executed command belongs based on the feedback information.
In a possible implementation, the scheduler 41 is further configured to:
Group the currently executing threads based on the number of register groups and the number of currently executing threads, to obtain the thread group corresponding to each register group.
For a description of the processing flow of each module in the command distributor and of the interaction between the modules, refer to the relevant description in the method embodiments above; the details are not repeated here.
In addition, the command distributor provided by the embodiment of the present disclosure may be a chip capable of implementing the command distribution method provided by the embodiment of the present disclosure.
The embodiment of the disclosure also provides a chip, as shown in fig. 5, comprising a controller 51, a command distributor 52, and an operator 53;
the controller 51 is configured to obtain commands corresponding to the multiple threads, and send the commands to the command distributor 52;
the command distributor 52 is configured to distribute the command to the operator 53 based on the command distribution method provided by any one of the embodiments of the present disclosure;
The operator 53 is configured to read an operand from a first target register group corresponding to the command, and execute the command based on the operand.
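The division of work among the three chip components can be sketched in software as follows. This is only a conceptual model: the function names, the command dictionary layout, and the two sample operations are hypothetical, and the distributor here simply forwards commands in order as a stand-in for the distribution method described earlier.

```python
import operator

# Hypothetical operation table for the sketch.
OPERATIONS = {"add": operator.add, "mul": operator.mul}


def controller(thread_programs):
    """Collect one pending command per thread and tag it with its thread id."""
    return [dict(cmd, thread=tid) for tid, cmd in sorted(thread_programs.items())]


def command_distributor(commands):
    """Stand-in for the command distribution method: forwards in order."""
    yield from commands


def operator_unit(command, register_groups):
    """Read operands from the command's register group and execute it."""
    bank = register_groups[command["group"]]
    operands = [bank[name] for name in command["regs"]]
    return OPERATIONS[command["op"]](*operands)


def run_chip(thread_programs, register_groups):
    """Controller -> command distributor -> operator, one command per thread."""
    return {
        cmd["thread"]: operator_unit(cmd, register_groups)
        for cmd in command_distributor(controller(thread_programs))
    }


banks = {0: {"r0": 2, "r1": 3}, 1: {"r0": 4, "r1": 5}}
programs = {
    0: {"op": "add", "group": 0, "regs": ["r0", "r1"]},
    1: {"op": "mul", "group": 1, "regs": ["r0", "r1"]},
}
print(run_chip(programs, banks))
```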
For the specific command execution process, refer to the steps of the command distribution method described in the embodiments of the present disclosure; the details are not repeated here.
The embodiment of the disclosure also provides electronic equipment, which comprises the chip provided by any embodiment of the disclosure.
The disclosed embodiments also provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the command distribution method described in the method embodiments above. Wherein the storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiments of the present disclosure further provide a computer program product, where the computer program product carries a program code, where instructions included in the program code may be used to perform the steps of the command distribution method described in the foregoing method embodiments, and specifically reference may be made to the foregoing method embodiments, which are not described herein in detail.
Wherein the above-mentioned computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working procedures of the system and apparatus described above may refer to the corresponding procedures in the foregoing method embodiments, and are not repeated here. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through communication interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one operation unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a processor-executable non-volatile computer readable storage medium. Based on this understanding, the technical solution of the present disclosure in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several commands that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
It should be noted that the foregoing embodiments are merely specific implementations of the present disclosure, intended to illustrate its technical solutions rather than to limit its scope of protection. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed herein, they may still modify the technical solutions described in the foregoing embodiments, readily conceive of variations, or substitute equivalents for some of the technical features. Such modifications, variations, or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure and fall within its scope of protection. Therefore, the scope of protection of the present disclosure shall be subject to the scope of protection of the claims.

Claims (27)

1. A command distribution method, comprising:
determining, from a plurality of register groups, a plurality of first target register groups corresponding to a current processing cycle, wherein the first target register groups are different from second target register groups determined in at least one most recent historical processing cycle;
determining, from thread groups respectively corresponding to the plurality of first target register groups, target threads respectively corresponding to the plurality of first target register groups in the current processing cycle;
distributing commands respectively corresponding to the determined target threads to corresponding operation units;
the method further comprising:
in response to a multi-operand to-be-dispatched command having two operands being present among the to-be-dispatched commands corresponding to the target threads determined for the current processing cycle, for each single-operand to-be-dispatched command among the to-be-dispatched commands corresponding to the target threads determined for the current processing cycle, determining, in the processing cycle following the current processing cycle, another single-operand to-be-dispatched command in a ready state for the first target register group in which the single-operand to-be-dispatched command is located.

2. The command distribution method according to claim 1, wherein determining, from the plurality of register groups, the plurality of first target register groups corresponding to the current processing cycle comprises:
in a case where the current processing cycle is an odd-numbered cycle, determining the odd-numbered register groups among the plurality of register groups as the first target register groups;
in a case where the current processing cycle is an even-numbered cycle, determining the even-numbered register groups among the plurality of register groups as the first target register groups.

3. The command distribution method according to claim 1, further comprising:
determining the number of register groups according to the number of operands of the operation unit that requires the most operands, and dividing the registers into the plurality of register groups.

4. The command distribution method according to any one of claims 1 to 3, wherein determining, from the thread groups respectively corresponding to the plurality of first target register groups, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle comprises:
determining, based on command execution state information of each thread in the thread groups respectively corresponding to the determined plurality of first target register groups, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the thread groups respectively corresponding to the plurality of first target register groups.

5. The command distribution method according to claim 4, wherein determining, based on the command execution state information of each thread in the thread groups respectively corresponding to the determined plurality of first target register groups, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the thread groups respectively corresponding to the plurality of first target register groups comprises:
determining, from the thread groups respectively corresponding to the plurality of first target register groups, a plurality of candidate threads whose command execution state information indicates a ready state;
determining, from the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle.

6. The command distribution method according to claim 5, wherein determining, from the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle comprises:
determining, based on priorities of the to-be-dispatched commands respectively corresponding to the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the plurality of candidate threads.

7. The command distribution method according to claim 5, wherein determining, from the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle comprises:
determining, based on priorities of the to-be-dispatched commands respectively corresponding to the plurality of candidate threads and occupancy states of the operation units corresponding to the to-be-dispatched commands, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the plurality of candidate threads.

8. The command distribution method according to claim 5, wherein determining, from the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle comprises:
determining, based on command types of the current to-be-dispatched commands respectively corresponding to the plurality of candidate threads and types of the operation units, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the plurality of candidate threads.

9. The command distribution method according to any one of claims 1 to 3, further comprising:
in response to a multi-operand to-be-dispatched command having more than one operand being present among the to-be-dispatched commands corresponding to the target threads determined for the current processing cycle, in each processing cycle from the current processing cycle to a target processing cycle, distributing, by the first target register group corresponding to the multi-operand to-be-dispatched command, one corresponding operand to the operation unit corresponding to the to-be-dispatched command;
wherein the difference in cycle number between the target processing cycle and the current processing cycle is equal to the number of operands of the multi-operand to-be-dispatched command minus one.

10. The command distribution method according to any one of claims 1 to 3, further comprising:
in response to a multi-operand to-be-dispatched command having more than one operand being present among the to-be-dispatched commands corresponding to the target threads determined for the current processing cycle, determining the number of operands of the multi-operand to-be-dispatched command that has the most operands among the multi-operand to-be-dispatched commands;
for each other to-be-dispatched command, among the to-be-dispatched commands corresponding to the target threads determined for the current processing cycle, whose number of operands is less than the maximum number of operands, in each processing cycle from the processing cycle following the current processing cycle up to the processing cycle before the first target register group is scheduled again, in response to the first target register group in which the other to-be-dispatched command is located being idle, determining a to-be-dispatched command in a ready state for that first target register group from the thread group corresponding to that first target register group;
wherein the number of operands of the to-be-dispatched command in the ready state is not greater than the number of processing cycles from the processing cycle in which the to-be-dispatched command in the ready state is determined to the processing cycle in which the first target register group is scheduled again.

11. The command distribution method according to claim 4, further comprising:
obtaining feedback information generated by the operation unit after executing a command;
generating, based on the feedback information, command execution state information corresponding to the thread to which the executed command belongs.

12. The command distribution method according to any one of claims 1 to 3, further comprising:
grouping the currently executing threads based on the number of register groups and the number of currently executing threads, to obtain the thread group corresponding to each register group.

13. A command distributor, comprising: a scheduler and a distribution interface;
the scheduler is configured to determine, from a plurality of register groups, a plurality of first target register groups corresponding to a current processing cycle, wherein the first target register groups are different from second target register groups determined in at least one most recent historical processing cycle; and to determine, from thread groups respectively corresponding to the plurality of first target register groups, target threads respectively corresponding to the plurality of first target register groups in the current processing cycle;
the distribution interface is configured to distribute commands respectively corresponding to the determined target threads to corresponding operation units;
the scheduler is further configured to:
in response to a multi-operand to-be-dispatched command having two operands being present among the to-be-dispatched commands corresponding to the target threads determined for the current processing cycle, for each single-operand to-be-dispatched command among the to-be-dispatched commands corresponding to the target threads determined for the current processing cycle, determine, in the processing cycle following the current processing cycle, another single-operand to-be-dispatched command in a ready state for the first target register group in which the single-operand to-be-dispatched command is located.

14. The command distributor according to claim 13, wherein the scheduler, when determining, from the plurality of register groups, the plurality of first target register groups corresponding to the current processing cycle, is configured to:
in a case where the current processing cycle is an odd-numbered cycle, determine the odd-numbered register groups among the plurality of register groups as the first target register groups;
in a case where the current processing cycle is an even-numbered cycle, determine the even-numbered register groups among the plurality of register groups as the first target register groups.

15. The command distributor according to claim 13, wherein the scheduler is further configured to:
determine the number of register groups according to the number of operands of the operation unit that requires the most operands, and divide the registers into the plurality of register groups.

16. The command distributor according to any one of claims 13 to 15, wherein the scheduler, when determining, from the thread groups respectively corresponding to the plurality of first target register groups, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle, is configured to:
determine, based on command execution state information of each thread in the thread groups respectively corresponding to the determined plurality of first target register groups, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the thread groups respectively corresponding to the plurality of first target register groups.

17. The command distributor according to claim 16, wherein the scheduler, when determining, based on the command execution state information of each thread in the thread groups respectively corresponding to the determined plurality of first target register groups, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the thread groups respectively corresponding to the plurality of first target register groups, is configured to:
determine, from the thread groups respectively corresponding to the plurality of first target register groups, a plurality of candidate threads whose command execution state information indicates a ready state;
determine, from the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle.

18. The command distributor according to claim 17, wherein the scheduler, when determining, from the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle, is configured to:
determine, based on priorities of the to-be-dispatched commands respectively corresponding to the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the plurality of candidate threads.

19. The command distributor according to claim 17, wherein the scheduler, when determining, from the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle, is configured to:
determine, based on priorities of the to-be-dispatched commands respectively corresponding to the plurality of candidate threads and occupancy states of the operation units corresponding to the to-be-dispatched commands, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the plurality of candidate threads.

20. The command distributor according to claim 17, wherein the scheduler, when determining, from the plurality of candidate threads, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle, is configured to:
determine, based on command types of the current to-be-dispatched commands respectively corresponding to the plurality of candidate threads and types of the operation units, the target threads respectively corresponding to the plurality of first target register groups in the current processing cycle from the plurality of candidate threads.

21. The command distributor according to any one of claims 13 to 15, wherein the scheduler is further configured to:
in response to a multi-operand to-be-dispatched command having more than one operand being present among the to-be-dispatched commands corresponding to the target threads determined for the current processing cycle, in each processing cycle from the current processing cycle to a target processing cycle, cause the first target register group corresponding to the multi-operand to-be-dispatched command to distribute one corresponding operand to the operation unit corresponding to the to-be-dispatched command;
wherein the difference in cycle number between the target processing cycle and the current processing cycle is equal to the number of operands of the multi-operand to-be-dispatched command minus one.

22. The command distributor according to any one of claims 13 to 15, wherein the scheduler is further configured to:
in response to a multi-operand to-be-dispatched command having more than one operand being present among the to-be-dispatched commands corresponding to the target threads determined for the current processing cycle, determine the number of operands of the multi-operand to-be-dispatched command that has the most operands among the multi-operand to-be-dispatched commands;
for each other to-be-dispatched command, among the to-be-dispatched commands corresponding to the target threads determined for the current processing cycle, whose number of operands is less than the maximum number of operands, in each processing cycle from the processing cycle following the current processing cycle up to the processing cycle before the first target register group is scheduled again, in response to the first target register group in which the other to-be-dispatched command is located being idle, determine a to-be-dispatched command in a ready state for that first target register group from the thread group corresponding to that first target register group;
wherein the number of operands of the to-be-dispatched command in the ready state is not greater than the number of processing cycles from the processing cycle in which the to-be-dispatched command in the ready state is determined to the processing cycle in which the first target register group is scheduled again.

23. The command distributor according to claim 16, wherein the scheduler is further configured to:
obtain feedback information generated by the operation unit after executing a command;
generate, based on the feedback information, command execution state information corresponding to the thread to which the executed command belongs.

24. The command distributor according to any one of claims 13 to 15, wherein the scheduler is further configured to:
group the currently executing threads based on the number of register groups and the number of currently executing threads, to obtain the thread group corresponding to each register group.

25. A chip, comprising: a controller, a command distributor, and an operator;
wherein the controller is configured to obtain commands respectively corresponding to a plurality of threads, and to send the commands to the command distributor;
the command distributor is configured to distribute the commands to the operator based on the command distribution method according to any one of claims 1 to 12;
the operator is configured to read, based on a command distributed by the command distributor, operands from the target register group corresponding to the command, and to execute the command based on the operands.

26. An electronic device, comprising the chip according to claim 25.

27. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when run by a processor, performs the steps of the command distribution method according to any one of claims 1 to 12.
CN202110323622.8A 2021-03-26 2021-03-26 Command distribution method, command distributor, chip and electronic device Active CN115129369B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110323622.8A CN115129369B (en) 2021-03-26 2021-03-26 Command distribution method, command distributor, chip and electronic device
PCT/CN2021/120535 WO2022198955A1 (en) 2021-03-26 2021-09-26 Command distribution method, command distributor, chip, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110323622.8A CN115129369B (en) 2021-03-26 2021-03-26 Command distribution method, command distributor, chip and electronic device

Publications (2)

Publication Number Publication Date
CN115129369A CN115129369A (en) 2022-09-30
CN115129369B true CN115129369B (en) 2025-03-28

Family

ID=83374196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110323622.8A Active CN115129369B (en) 2021-03-26 2021-03-26 Command distribution method, command distributor, chip and electronic device

Country Status (2)

Country Link
CN (1) CN115129369B (en)
WO (1) WO2022198955A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120578422B (en) * 2025-08-06 2026-01-27 摩尔线程智能科技(北京)股份有限公司 Processor, display card, computer equipment, register allocation method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101014933A (en) * 2004-07-13 2007-08-08 辉达公司 Simulating multiported memories using lower port count memories
US7761688B1 (en) * 2003-12-03 2010-07-20 Redpine Signals, Inc. Multiple thread in-order issue in-order completion DSP and micro-controller

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3718912A (en) * 1970-12-22 1973-02-27 Ibm Instruction execution unit
EP1660998A1 (en) * 2003-08-28 2006-05-31 MIPS Technologies, Inc. Mechanisms for dynamic configuration of virtual processor resources
CN1842770A (en) * 2003-08-28 2006-10-04 美普思科技有限公司 A holistic mechanism for suspending and releasing threads of computation during execution in a processor
US9798544B2 (en) * 2012-12-10 2017-10-24 Nvidia Corporation Reordering buffer for memory access locality
TWI564807B (en) * 2015-11-16 2017-01-01 財團法人工業技術研究院 Scheduling method and processing device using the same
US10754651B2 (en) * 2018-06-29 2020-08-25 Intel Corporation Register bank conflict reduction for multi-threaded processor
CN109408118B (en) * 2018-09-29 2024-01-02 古进 MHP heterogeneous multi-pipeline processor
CN111290786B (en) * 2018-12-12 2022-05-06 展讯通信(上海)有限公司 Information processing method, device and storage medium
CN111459543B (en) * 2019-01-21 2022-09-13 上海登临科技有限公司 Method for managing register file unit
CN111258657B (en) * 2020-01-23 2020-11-20 上海燧原智能科技有限公司 Pipeline control method and related equipment

Also Published As

Publication number Publication date
WO2022198955A1 (en) 2022-09-29
CN115129369A (en) 2022-09-30

Similar Documents

Publication Publication Date Title
US7418576B1 (en) Prioritized issuing of operation dedicated execution unit tagged instructions from multiple different type threads performing different set of operations
KR102586988B1 (en) Multi-kernel wavefront scheduler
US9672035B2 (en) Data processing apparatus and method for performing vector processing
US8392669B1 (en) Systems and methods for coalescing memory accesses of parallel threads
US20090240895A1 (en) Systems and methods for coalescing memory accesses of parallel threads
CN114942831B (en) Processor, chip, electronic device and data processing method
US20150163324A1 (en) Approach to adaptive allocation of shared resources in computer systems
US9940134B2 (en) Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines
CN107580698B (en) System and method for determining a concurrency factor for a schedule size of a parallel processor core
CN114625421A (en) SIMT instruction processing method and device
CN118012632B (en) GPGPU branch instruction scheduling method based on multi-level reallocation mechanism
US12210902B2 (en) System and method for maintaining dependencies in a parallel process
KR20150056373A (en) Multi-thread processing apparatus and method with sequential performance manner
CN115033184A (en) Memory access processing device and method, processor, chip, board card and electronic equipment
US10152328B2 (en) Systems and methods for voting among parallel threads
CN105957131A (en) Graphics processing system and method thereof
CN115129369B (en) Command distribution method, command distributor, chip and electronic device
JP7617907B2 (en) The processor and its internal interrupt controller
EP4174671A1 (en) Method and apparatus with process scheduling
US11625269B1 (en) Scheduling for locality of reference to memory
US9442772B2 (en) Global and local interconnect structure comprising routing matrix to support the execution of instruction sequences by a plurality of engines
JP4789269B2 (en) Vector processing apparatus and vector processing method
US9678752B2 (en) Scheduling apparatus and method of dynamically setting the size of a rotating register
CN116954850A (en) Computing task scheduling method, device, system on chip and storage medium
KR20220067289A (en) Gpgpu thread block scheduling method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant