CN113778528A

CN113778528A - Instruction sending method and device, electronic equipment and storage medium

Info

Publication number: CN113778528A
Application number: CN202111068106.1A
Authority: CN
Inventors: 郭向飞
Original assignee: Beijing Eswin Computing Technology Co Ltd
Current assignee: Beijing Eswin Computing Technology Co Ltd
Priority date: 2021-09-13
Filing date: 2021-09-13
Publication date: 2021-12-10
Anticipated expiration: 2041-09-13
Also published as: CN113778528B

Abstract

The application provides an instruction sending method, an instruction sending device, electronic equipment and a storage medium. The method comprises: determining the type of an instruction to be issued by an issue unit; determining, from a plurality of execution units, at least two candidate execution units corresponding to the type; acquiring the time each of the at least two candidate execution units consumes to process the instruction; selecting, from the at least two candidate execution units, the execution unit that takes the shortest time to finish processing the instruction as a target execution unit; and sending the instruction to be issued by the issue unit to the target execution unit. Compared with fixedly mapping the instruction to one corresponding execution unit, the method avoids sending the instruction to an execution unit that takes a long time to process it, which shortens the processing time of the instruction and improves the processing efficiency of the superscalar processor.

Description

Instruction sending method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of processors, and in particular, to an instruction sending method, an instruction sending device, an electronic device, and a storage medium.

Background

With the emergence of application scenarios such as big data processing, cloud computing, deep learning, etc., the performance requirements for processors are also gradually increasing.

Currently, superscalar processors are mainly used. In a superscalar processor, a single processor core can exploit instruction-level parallelism, so a superscalar processor achieves higher throughput at the same clock frequency. Superscalar processors achieve this through multiple issue and out-of-order execution. The issue unit issues an out-of-order instruction stream, and each instruction is fixedly mapped to a particular execution unit of the superscalar processor for execution. Each execution unit is essentially a different combination of components with different functions.

However, because the practical use scenarios of superscalar processors are complex, multiple execution units with the same function are typically provided. For example, execution unit A of the superscalar processor contains an adder and a multiplier, and execution unit B contains a multiplier, so both execution unit A and execution unit B can perform multiplication. To balance performance against other design constraints, however, execution units with the same function may implement that function differently. For example, the multiplier in execution unit A takes 3 cycles to execute a multiply instruction, while the multiplier in execution unit B takes 5 cycles. Suppose the instructions issued by the issue unit are fixedly mapped to a particular execution unit: either the multiply instruction is fixedly mapped to execution unit B, or it is fixedly mapped to execution unit A while execution unit A is still executing a previous instruction. In both cases the execution time of the instruction is extended, causing a delay and reducing the execution efficiency of the superscalar processor.

Disclosure of Invention

An object of the embodiments of the present application is to provide an instruction sending method, an instruction sending apparatus, an electronic device, and a storage medium, so as to improve the execution efficiency of a superscalar processor.

In order to solve the above technical problem, an embodiment of the present application provides the following technical solutions:

a first aspect of the present application provides an instruction sending method, which is applied to a superscalar processor, where the superscalar processor includes at least one issue unit and a plurality of execution units, the issue unit is configured to issue an instruction, and the execution units are configured to execute the instruction issued by the issue unit; the method comprises the following steps: determining the type of an instruction to be issued by the transmitting unit; determining at least two execution units to be selected corresponding to the type from the plurality of execution units; acquiring the time consumed by the at least two execution units to be selected for processing the instruction; selecting the corresponding execution unit under the condition that the time consumed for processing the instruction is shortest from the at least two execution units to be selected as a target execution unit; and sending the instruction to be sent by the transmitting unit to the target execution unit.

A second aspect of the present application provides an instruction issue apparatus, which is applied to a superscalar processor, the superscalar processor including at least one issue unit and a plurality of execution units, the issue unit being configured to issue an instruction, the execution units being configured to execute the instruction issued by the issue unit; the device comprises: the first determination module is used for determining the type of the instruction to be sent by the transmitting unit; the first selection module is used for determining at least two execution units to be selected corresponding to the types from the plurality of execution units; the second determining module is used for acquiring the time consumed by the at least two to-be-selected execution units for processing the instructions; a second selection module, configured to select, from the at least two execution units to be selected, an execution unit corresponding to the shortest time consumed for completing processing of the instruction as a target execution unit; and the sending module is used for sending the instruction to be sent by the sending unit to the target execution unit.

A third aspect of the present application provides an electronic device comprising: a processor, a memory, and a bus; wherein, the processor and the memory complete mutual communication through the bus; the processor is for invoking program instructions in the memory for performing the method of the first aspect.

A fourth aspect of the present application provides a computer-readable storage medium comprising: a stored program; wherein the program, when executed, controls an apparatus in which the storage medium is located to perform the method of the first aspect.

Compared with the prior art, in the instruction sending method provided by the first aspect of the present application, after the type of the instruction to be issued by the issue unit is determined, at least two candidate execution units corresponding to the type are determined from the plurality of execution units, and the time each of them consumes to process the instruction is acquired. The candidate execution unit that takes the shortest time to process the instruction is then selected as the target execution unit that receives and processes the instruction. Compared with fixedly mapping the instruction to one corresponding execution unit, this avoids sending the instruction to an execution unit that takes longer to process it, which shortens the processing time of the instruction and improves the processing efficiency of the superscalar processor.

The instruction sending device provided by the second aspect, the electronic device provided by the third aspect, and the computer-readable storage medium provided by the fourth aspect of the present application have the same or similar beneficial effects as the instruction sending method provided by the first aspect.

Drawings

The above and other objects, features and advantages of exemplary embodiments of the present application will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present application are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings and in which like reference numerals refer to similar or corresponding parts and in which:

Fig. 1 is a schematic diagram of the application scenario architecture of the instruction sending method in an embodiment of the present application;

Fig. 2 is a schematic diagram of the issue buffer in an embodiment of the present application;

Fig. 3 is a first schematic flowchart of the instruction sending method in an embodiment of the present application;

Fig. 4 is a second schematic flowchart of the instruction sending method in an embodiment of the present application;

Fig. 5 is a third schematic flowchart of the instruction sending method in an embodiment of the present application;

Fig. 6 is a schematic diagram of an exemplary instruction sending process in an embodiment of the present application;

Fig. 7 is a first schematic structural diagram of the instruction sending device in an embodiment of the present application;

Fig. 8 is a second schematic structural diagram of the instruction sending device in an embodiment of the present application;

Fig. 9 is a schematic structural diagram of the electronic device in an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which this application belongs.

In the prior art, because the use scenarios are complex, a superscalar processor is generally provided with multiple execution units with the same function. To balance performance against other design constraints, these execution units are built from electronic devices of differing performance and efficiency. When the issue unit of a superscalar processor issues an instruction, the instruction is always sent to one fixed, corresponding execution unit. Sending instructions in this fixed mapping manner may cause instruction execution to take a long time, reducing the execution efficiency of the superscalar processor.

In view of this, an embodiment of the present application provides an instruction sending method in which all execution units capable of processing an instruction to be issued are determined in the superscalar processor, and the execution unit that takes the shortest time to execute the instruction is selected from them as the target execution unit, so that the issue unit of the superscalar processor sends the instruction to the target execution unit for processing. The processing time of the instruction is thereby shortened, and the processing efficiency of the superscalar processor is improved.

It should be noted that the instruction sending method provided in the embodiment of the present application may be applied to a scheduling logic unit. The scheduling logic unit may be any circuit with a logic operation function. The scheduling logic unit enables instructions in the issue unit of the superscalar processor to be sent to the execution unit that can process them faster. In practical applications, the scheduling logic unit may be disposed in the superscalar processor to control the issue unit to send instructions to a more suitable execution unit, thereby improving the processing efficiency of the superscalar processor.

Before describing the instruction sending method provided in the embodiment of the present application in detail, an application scenario architecture of the instruction sending method provided in the embodiment of the present application is described first.

Fig. 1 is a schematic diagram of the application scenario architecture of the instruction sending method in an embodiment of the present application. Referring to Fig. 1, the superscalar processor includes at least: an issue unit 101, a scheduling logic unit 102, a multiplexing unit 103, and a plurality of execution units 104.

The issue unit 101 includes at least an issue buffer 1011.

During operation of the superscalar processor, the instruction sequence that has been fetched and decoded is pushed to the issue unit 101 and buffered in the issue buffer 1011 of the issue unit 101.

Fig. 2 is a schematic diagram of the issue buffer in an embodiment of the present application. Referring to Fig. 2, the instruction sequence instr1, instr2, instr3, and instr4 is stored in the issue buffer 1011, where instr1, instr2, instr3 and instr4 are different instructions. These instructions may be stored in a certain order, for example in the order in which they were pushed to the issue buffer 1011. They may also be stored in other orders. The storage order of the instructions in the issue buffer 1011 is not specifically limited here.

The scheduling logic unit 102 includes at least a scheduling buffer 1021 and a scheduling policy module 1022.

The scheduling buffer 1021 buffers the instructions to be issued in their issue order. If the issue order of the instructions does not change, the order of the instructions in the scheduling buffer 1021 is the same as in the issue buffer 1011. If the issue order changes, the order of the instructions in the scheduling buffer 1021 is adjusted accordingly.

For example, assume that the 4 instructions stored in the issue buffer 1011 are instr1, instr2, instr3 and instr4 in the order received, and that instr4 needs to be issued first. The 4 instructions stored in the scheduling buffer 1021 are then, in order, instr4, instr1, instr2 and instr3.
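The reordering in this example can be sketched in a few lines of Python. This is only an illustration: the function name `build_scheduling_buffer` and its `issue_first` parameter are hypothetical, and the patent does not state how the buffer is reordered in hardware.

```python
# Illustrative sketch only; names are hypothetical, not taken from the patent.
issue_buffer = ["instr1", "instr2", "instr3", "instr4"]  # order received


def build_scheduling_buffer(buffer, issue_first=()):
    """Move the instructions named in issue_first to the front.

    All other instructions keep their relative order from the issue buffer.
    """
    promoted = [name for name in issue_first if name in buffer]
    rest = [name for name in buffer if name not in promoted]
    return promoted + rest


scheduling_buffer = build_scheduling_buffer(issue_buffer, issue_first=["instr4"])
print(scheduling_buffer)
```

When `issue_first` is empty, the function returns the instructions in the same order as the issue buffer, which matches the case where the issue order does not change.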

Various scheduling policies are stored in the scheduling policy module 1022. A scheduling policy specifies to which of the execution units 104 the instructions in the scheduling buffer 1021 are sent.

In practical applications, a scheduling policy may take the form of a function or a table. With a function, after the instruction to be issued is obtained, the function computes on the relevant information of the instruction and determines the execution unit to which the instruction is to be sent. With a table, the execution unit corresponding to the instruction is looked up in the table. When the policy is a table, only part of the correspondence between instructions and execution units may be listed in order to save storage space. For an instruction whose correspondence is not listed, the execution unit may be determined from the instruction issue history: the current instruction is sent to the execution unit to which an identical or similar instruction was sent before.
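A minimal sketch of the table form with a history fallback is given below. The table contents, the names `POLICY_TABLE`, `history`, `lookup_unit` and `record_dispatch`, and the choice to match only identical instruction types (not "similar" ones) are all assumptions made for illustration.

```python
# Illustrative sketch only; names and table contents are hypothetical.
POLICY_TABLE = {"mul": "B"}  # partial table: instruction type -> execution unit
history = {}                 # instruction type -> unit it was last sent to


def record_dispatch(instr_type, unit):
    """Remember which execution unit an instruction type was sent to."""
    history[instr_type] = unit


def lookup_unit(instr_type):
    """Return the execution unit for instr_type, or None if nothing is known."""
    unit = POLICY_TABLE.get(instr_type)
    if unit is None:
        # Not listed in the table: reuse the unit an identical type went to before.
        unit = history.get(instr_type)
    return unit


record_dispatch("add", "A")
print(lookup_unit("mul"))  # found in the table
print(lookup_unit("add"))  # resolved from the issue history
print(lookup_unit("div"))  # neither listed nor seen before
```

Keeping the table partial trades storage for an extra lookup in the history, which is the space saving the paragraph above describes.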

After receiving the output of the scheduling logic unit 102 (i.e., the determined target execution unit), the multiplexing unit 103 converts the selected target execution unit into a hardware signal and sends the hardware signal to the scheduling logic unit 102, so that the scheduling logic unit 102 sends the instruction to be issued to the target execution unit according to the hardware signal.

The scheduling logic unit 102 and the multiplexing unit 103 are described above as units separate from the issue unit 101. They may also be sub-units of the issue unit 101. The specific locations of the scheduling logic unit 102 and the multiplexing unit 103 are not limited here.

Among the plurality of execution units 104, each execution unit 104 can perform certain specific functions. The functions of different execution units 104 may partially overlap, and the electronic devices that different execution units 104 use to implement the overlapping functions may differ.

For example, assume that execution unit A can perform addition and execution unit B can perform addition and multiplication. The functions of execution unit A and execution unit B then partially overlap. Assume that the adder used by execution unit A occupies 100 units of hardware resources and 1 unit of circuit-board area, while the adder used by execution unit B occupies 1 unit of hardware resources and 100 units of circuit-board area. Although both execution units use adders, the two adders are different.

After an instruction is sent to one of the execution units 104 by the issue unit 101 in the superscalar processor, the execution unit 104 performs processing based on the instruction, thereby completing one processing procedure of the superscalar processor.

Next, the instruction sending method provided in an embodiment of the present application will be described in detail.

Fig. 3 is a first schematic flowchart of the instruction sending method in an embodiment of the present application. As shown in Fig. 3, the method may include:

S301: determining the type of the instruction to be issued by the issue unit.

Since different types of instructions need to be processed by execution units with different functions, before the instructions are issued, the types of the instructions need to be determined, and then the execution unit to which the instructions are sent needs to be determined according to the types of the instructions.

The type of the instruction may be determined from the characters in the instruction that characterize its type. For example, all characters in the instruction are traversed, the characters that characterize the type are found, and the type they indicate is taken as the type of the instruction. Alternatively, a type tag may be added to the instruction in advance when the instruction is generated, so that the type can be determined from the tag. The type of the instruction may also be determined in other ways; the specific manner is not limited here.
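Both approaches can be sketched as follows. The sketch assumes a textual instruction whose first token is a mnemonic; the `Instruction` class, the `MNEMONIC_TYPES` mapping and the mnemonics in it are hypothetical and are not defined by the patent.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from mnemonic to instruction type (illustration only).
MNEMONIC_TYPES = {"add": "add", "addi": "add", "mul": "mul",
                  "beq": "branch", "bne": "branch"}


@dataclass
class Instruction:
    text: str                       # e.g. "add r1, r2, r3"
    type_tag: Optional[str] = None  # optional tag attached at generation time


def instruction_type(instr):
    """Prefer a pre-attached tag; otherwise derive the type from the mnemonic."""
    if instr.type_tag is not None:
        return instr.type_tag
    tokens = instr.text.split()
    if not tokens:
        return "unknown"
    return MNEMONIC_TYPES.get(tokens[0].lower(), "unknown")


print(instruction_type(Instruction("mul r1, r2, r3")))
print(instruction_type(Instruction("custom_op", type_tag="branch")))
```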

S302: determining, from the plurality of execution units, at least two candidate execution units corresponding to the type.

Since different types of instructions need different functional execution units to process, and in the same superscalar processor, the functions of a plurality of execution units are usually repeated, after the type of the instruction is determined, the execution unit capable of executing the instruction in the superscalar processor needs to be found out first.

For example, assume that the type of the instruction is addition, i.e., the instruction is an add instruction. In the superscalar processor, execution unit A can perform addition, execution unit B can perform addition and multiplication, and execution unit C can perform multiplication. The at least two candidate execution units are then execution unit A and execution unit B.

Generally, to allocate an instruction to the most appropriate execution unit, all execution units that can execute the instruction are found in the superscalar processor. To improve lookup efficiency, however, it is also possible to find only two execution units that can execute the instruction, determine one of the two, and send the instruction to that execution unit. A relatively suitable execution unit is still selected for the instruction, the execution unit is determined more quickly, and the processing efficiency of the superscalar processor is improved.
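The candidate lookup of S302 can be sketched as a filter over a capability map. The units and functions below come from the example above; the `CAPABILITIES` dictionary, the function name `candidate_units` and the optional `limit` parameter (standing in for the "stop after two" shortcut) are assumptions for illustration.

```python
# Capability map taken from the example; the data layout is hypothetical.
CAPABILITIES = {
    "A": {"add"},
    "B": {"add", "mul"},
    "C": {"mul"},
}


def candidate_units(instr_type, capabilities=CAPABILITIES, limit=None):
    """Return the execution units able to process instr_type.

    limit=None searches all units; limit=2 stops after two matches,
    trading selection quality for lookup speed.
    """
    found = []
    for unit, functions in capabilities.items():
        if instr_type in functions:
            found.append(unit)
            if limit is not None and len(found) == limit:
                break
    return found


print(candidate_units("add"))
print(candidate_units("mul", limit=2))
```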

S303: acquiring the time consumed by each of the at least two candidate execution units to process the instruction.

Execution units with the same function may take different amounts of time to process an instruction. For example, the adder that execution unit A uses to implement addition occupies 100 units of hardware resources and takes 3 cycles to process an instruction, while the adder that execution unit B uses occupies 1 unit of hardware resources and takes 5 cycles; here the number of cycles can be regarded as the time consumed in processing the instruction. Execution unit A and execution unit B therefore take different amounts of time to process an add instruction: 3 cycles for execution unit A and 5 cycles for execution unit B. Execution unit A clearly processes the add instruction faster. Therefore, when deciding which execution unit an instruction should be sent to, the time each candidate execution unit takes to process the instruction must be determined first, so that the faster execution unit can be selected for the instruction.

When determining the time each candidate execution unit takes to process the instruction, the time may be obtained directly from the hardware information of the candidate execution unit, or the time the candidate execution unit actually took the last time it processed such an instruction may be used. The time may also be determined in other ways; the specific manner is not limited here.
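The two sources of timing information can be sketched as follows. The cycle counts are those of the add example; the names `SPEC_CYCLES`, `last_observed`, `record_observed` and `processing_time` are hypothetical, and preferring the last observed time over the hardware figure is a design choice made here for illustration, since the text only presents the two sources as alternatives.

```python
# Cycle counts from the example; names and data layout are hypothetical.
SPEC_CYCLES = {("A", "add"): 3, ("B", "add"): 5}  # from hardware information
last_observed = {}                                # (unit, type) -> cycles last time


def record_observed(unit, instr_type, cycles):
    """Store the time a unit actually took to process an instruction type."""
    last_observed[(unit, instr_type)] = cycles


def processing_time(unit, instr_type):
    """Use the last observed time if one exists, else the hardware figure."""
    observed = last_observed.get((unit, instr_type))
    if observed is not None:
        return observed
    return SPEC_CYCLES[(unit, instr_type)]


print(processing_time("A", "add"))
print(processing_time("B", "add"))
```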

S304: selecting, from the at least two candidate execution units, the execution unit that takes the shortest time to process the instruction as the target execution unit.

After the candidate execution units corresponding to the type of the instruction are determined from all execution units of the superscalar processor, and the time each candidate execution unit takes to process the instruction is determined, the candidate execution unit that takes the shortest time can be taken as the target execution unit. The issue unit is then controlled to send the instruction to the target execution unit, which receives and processes it. The instruction is thus executed in the shortest time, and the processing efficiency of the superscalar processor is improved.

Continuing with the above example, assume that execution unit A and execution unit B are determined in the superscalar processor for the add instruction. Both can implement addition; execution unit A takes 3 cycles to process the add instruction and execution unit B takes 5 cycles. Execution unit A is then taken as the target execution unit, and the issue unit is controlled to send the add instruction to execution unit A, so the add instruction completes after 3 cycles. Compared with the 5 cycles needed in execution unit B, the add instruction completes 2 cycles earlier, and the processing efficiency of the superscalar processor is improved.

S305: sending the instruction to be issued by the issue unit to the target execution unit.

After the target execution unit is selected from the plurality of execution units, the instruction to be issued in the issue unit can be sent to the target execution unit for processing. Because the target execution unit takes the shortest time to process the instruction, the instruction is completed in the shortest time, and the processing efficiency of the superscalar processor is further improved.
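Steps S302 to S305 can be put together in a short sketch. The add latencies (3 and 5 cycles) are from the example; the multiply latencies, the `LATENCY` layout and the function names `select_target` and `send` are hypothetical and only serve the illustration.

```python
# Cycles per instruction type for each unit. The "add" values come from the
# example; the "mul" values are made up for illustration.
LATENCY = {
    "A": {"add": 3},
    "B": {"add": 5, "mul": 5},
    "C": {"mul": 4},
}


def select_target(instr_type, latency=LATENCY):
    """S302-S304: pick the candidate unit with the fewest cycles for instr_type."""
    candidates = {unit: ops[instr_type]
                  for unit, ops in latency.items() if instr_type in ops}
    if not candidates:
        raise ValueError(f"no execution unit can process {instr_type!r}")
    return min(candidates, key=candidates.get)


def send(instr_type):
    """S305: return the target unit and the cycles it needs."""
    target = select_target(instr_type)
    return target, LATENCY[target][instr_type]


target, cycles = send("add")
saved = LATENCY["B"]["add"] - cycles
print(target, cycles, saved)
```

With the example figures, the add instruction goes to execution unit A and finishes 2 cycles earlier than it would under a fixed mapping to execution unit B.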

As can be seen from the above, in the instruction sending method provided in the embodiment of the present application, after the type of the instruction to be issued by the issue unit is determined, at least two candidate execution units corresponding to the type are determined from the plurality of execution units, and the time each of them consumes to process the instruction is acquired. The candidate execution unit that takes the shortest time to process the instruction is then selected as the target execution unit that receives and processes the instruction. Compared with fixedly mapping the instruction to one corresponding execution unit, this avoids sending the instruction to an execution unit that takes longer to process it, which shortens the processing time of the instruction and improves the processing efficiency of the superscalar processor.

Further, as a refinement and an extension of the method shown in fig. 3, an embodiment of the present application further provides an instruction sending method. The following description is mainly directed to a case where a plurality of instructions of the same type are to be issued in an issue unit of a superscalar processor, and specifically how to select corresponding target execution units for the plurality of instructions from a plurality of execution units to be selected in different states.

Fig. 4 is a schematic flowchart of a second instruction sending method in an embodiment of the present application, and as shown in fig. 4, the method may include:

S401: determining the types of the plurality of instructions to be issued by the issue unit.

The specific implementation manner of step S401 is the same as or similar to the specific implementation manner of step S301, and reference may be made to the description in step S301, which is not described herein again.

S402: determining whether a branch-type instruction exists among the plurality of instructions; if yes, executing S403 and S404; if not, executing S404.

A branch-type instruction is an instruction that can change the flow of a program. That is, whether the branch condition of the branch instruction is satisfied determines whether the next instruction to be executed changes. If the branch is taken, the next instruction to be executed changes; if the branch is not taken, it does not.

Superscalar processors, however, typically have deep pipelines. If a branch prediction fails, the penalty is very high, because the entire pipeline must be flushed. The branch therefore needs to be resolved as early as possible, i.e., a branch instruction is issued preferentially to the corresponding execution unit for processing. The outcome is then known as early as possible, which reduces pipeline hazards while the superscalar processor processes instructions and improves the stability of the superscalar processor.

S403: the priority of the branch type instruction is configured to be highest.

Wherein the higher the priority level, the earlier the corresponding instruction is issued. That is, after determining that the type of an instruction is a branch type, the priority of the instruction is configured to be the highest level. The target execution unit is selected preferentially for instructions of the branch type over instructions of other types, and the branch instruction is sent preferentially to the target execution unit selected for it.
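The effect of S402 and S403 on issue order can be sketched with a stable sort. The tuple representation of an instruction and the function name `order_for_issue` are hypothetical; the patent does not state how the priority is encoded.

```python
def order_for_issue(instructions):
    """Put branch-type instructions first; others keep their original order.

    instructions is a list of (name, type) tuples. Python's sorted() is
    stable, so sorting on "is not a branch" (False sorts before True) only
    moves branch instructions to the front.
    """
    return sorted(instructions, key=lambda instr: instr[1] != "branch")


pending = [("instr1", "add"), ("instr2", "mul"),
           ("instr3", "branch"), ("instr4", "add")]
print(order_for_issue(pending))
```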

S404: at least two execution units to be selected corresponding to the types of the plurality of instructions are determined from the plurality of execution units, respectively.

S405: acquiring, for each of the plurality of instructions, the time consumed by each of the at least two candidate execution units corresponding to its type to process the instruction.

The specific implementation manners of steps S404 and S405 are the same as or similar to the specific implementation manners of steps S302 and S303, and refer to the descriptions in steps S302 and S303, which are not described herein again.

The following step S406 is executed first for the branch instruction among the plurality of instructions or, when no branch instruction exists among them, for any one of the plurality of instructions. Here, this instruction is referred to as the first instruction and is used as the example.

S406: selecting, from the at least two candidate execution units corresponding to the first instruction, the execution unit that takes the shortest time to process the instruction as a first execution unit.

The specific implementation manner of step S406 is the same as or similar to the specific implementation manner of step S304, and reference may be made to the description in step S304, which is not described herein again.

S407: determining whether the first execution unit is currently idle; if yes, executing S408; if not, executing S409 and S410.

S408: taking the first execution unit as the target execution unit of the first instruction.

When the first execution unit is currently idle, it can process the first instruction immediately once the instruction is sent to it. The first execution unit is also the fastest of all execution units in the superscalar processor that can process the first instruction. Using the first execution unit as the target execution unit therefore processes the first instruction fastest and improves the processing efficiency of the superscalar processor.

S409: the sum of the time spent by the first execution unit processing the current instruction and the first instruction is determined.

S410: judging whether the sum of the time is less than the time consumed by the second execution unit for processing the first instruction; if yes, go to S408; if not, S411 is executed.

When the sum of the time consumed by the first execution unit to finish processing the current instruction and to process the first instruction is judged to be less than the time consumed by the second execution unit to process the first instruction, it means that even if the first execution unit processes the first instruction only after finishing the current instruction, the total time is still shorter than the time consumed by sending the first instruction to the second execution unit for processing. In this case, the first execution unit is still used as the target execution unit, so that the first instruction is processed as fast as possible, and the processing efficiency of the superscalar processor is further improved.

For example, assume that a first execution unit takes 2 cycles to process an instruction and a second execution unit takes 5 cycles to process an instruction. The first execution unit is currently processing an instruction and the second execution unit is currently in an idle state. If the first instruction is sent to the first execution unit, the first instruction is processed after the first execution unit finishes processing the current instruction, and at most 4 cycles are consumed. If the first execution unit is already halfway through the current instruction, only 3 cycles are consumed. If the first instruction is sent to the second execution unit, it takes 5 cycles for the second execution unit to finish processing the first instruction, even though the second execution unit is not currently processing any instruction. In this case, taking the first execution unit as the target execution unit allows the first instruction to be processed as fast as possible, and the processing efficiency of the superscalar processor is further improved.

S411: the second execution unit is taken as a target execution unit of the first instruction.

The second execution unit is an execution unit with the second shortest time spent on processing the instruction in at least two execution units to be selected. Typically, the second execution unit is idle. However, when the second execution unit is currently in a non-idle state, the time of the second execution unit processing the first instruction is compared with the time of the third execution unit processing the first instruction, and so on until the corresponding execution unit which takes the shortest time to process the first instruction is found from the plurality of execution units corresponding to the first instruction and is used as the target execution unit.

For example, assume that a first execution unit takes 3 cycles to process an instruction and a second execution unit takes 5 cycles to process an instruction. The first execution unit has just started processing an instruction, while the second execution unit is currently in an idle state. If the first instruction is sent to the first execution unit, the first instruction is processed after the first execution unit finishes processing the current instruction, and at most 6 cycles are consumed. If the first instruction is sent to the second execution unit, it takes only 5 cycles for the second execution unit to process the first instruction. In this case, taking the second execution unit as the target execution unit allows the first instruction to be processed as fast as possible, and the processing efficiency of the superscalar processor is further improved.

It should be noted here that when the sum of the time taken by the first execution unit to finish processing the current instruction and the first instruction is equal to the time taken by the second execution unit to process the first instruction, the second execution unit is still used to process the first instruction. In this way, the pressure on the first execution unit to process instructions can be reduced.
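The selection in steps S406 to S411 can be sketched as follows. This is only an illustrative sketch, not the patented hardware logic: the `ExecUnit` class, its `latency` and `busy_remaining` fields, and the `select_target` function are hypothetical names introduced here, and the fallback to the last candidate when every earlier one is rejected is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExecUnit:
    name: str
    latency: int             # cycles to process one instruction of this type
    busy_remaining: int = 0  # cycles left on the current instruction; 0 means idle


def select_target(candidates: List[ExecUnit]) -> ExecUnit:
    # S406: order the candidates by the time they need, shortest first.
    ordered = sorted(candidates, key=lambda u: u.latency)
    for unit, next_unit in zip(ordered, ordered[1:]):
        # S407/S408: an idle unit is taken immediately.
        if unit.busy_remaining == 0:
            return unit
        # S409: time to finish the current instruction plus the new one.
        total = unit.busy_remaining + unit.latency
        # S410: strictly less keeps this unit; a tie falls through (S411).
        if total < next_unit.latency:
            return unit
    # Assumption: the last candidate is used when no earlier one qualifies.
    return ordered[-1]
```

With the two worked examples above, a 2-cycle unit that still has 2 cycles of work is kept over an idle 5-cycle unit (4 < 5), whereas a 3-cycle unit that has just started is passed over (6 is not less than 5).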

The above steps S406 to S411 are to select a target execution unit for a first instruction of the plurality of instructions. Next, the description continues with the selection of a target execution unit for a second instruction of the plurality of instructions.

It should be noted here that the second instruction is of the same type as the first instruction, and that the first execution unit, among the plurality of execution units to be selected, has already been taken as the target execution unit for executing the first instruction.

S412: the sum of the time spent by the first execution unit processing the first instruction and the second instruction is determined.

S413: judging whether the sum of the time consumed by the first execution unit for processing the first instruction and the second instruction is less than the time consumed by the second execution unit for processing the second instruction; if yes, go to S414; if not, S415 is executed.

S414: the first execution unit is taken as a target execution unit of the second instruction.

S415: the second execution unit is taken as a target execution unit of the second instruction.

The specific implementation manners of steps S412, S413, S414, and S415 are the same as or similar to the specific implementation manners of steps S409, S410, S408, and S411, respectively; reference may be made to the descriptions of steps S409 and S410, which are not repeated herein.

S416: and sending the instruction to be sent by the transmitting unit to the target execution unit.

In a specific implementation process, after determining a target execution unit of a first instruction, the scheduling logic unit controls the multi-path selection unit to generate a corresponding hardware signal, and further causes the multi-path selection unit to control the transmission unit to transmit the cached first instruction to the target execution unit through the hardware signal.

Of course, if a third instruction is present in the issue unit, then the determination of the target execution unit corresponding to the third instruction continues in the same manner as the determination of the target execution unit corresponding to the second instruction, until all instructions of that type in the issue unit have determined the target execution unit.
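As a rough sketch of how steps S412 to S415 extend to a run of same-type instructions, the function below assigns each instruction in turn while tracking the work already queued on each unit. The name `assign_same_type`, the `latencies` mapping, and the `pending` bookkeeping are hypothetical. The sketch also assumes that all assignments are decided in the same cycle (no time elapses between them) and that the last candidate is used when every earlier one is rejected.

```python
from typing import Dict, List


def assign_same_type(instrs: List[str], latencies: Dict[str, int]) -> Dict[str, str]:
    # Candidate units ordered by the time they need, shortest first.
    ordered = sorted(latencies, key=latencies.get)
    # Cycles of work already assigned to each unit (assumed bookkeeping).
    pending = {name: 0 for name in ordered}
    plan = {}
    for instr in instrs:
        chosen = ordered[-1]  # assumed fallback: the slowest candidate
        for cur, nxt in zip(ordered, ordered[1:]):
            # An idle unit is taken directly; otherwise compare the summed
            # time on this unit with the next unit's time (S412/S413).
            if pending[cur] == 0 or pending[cur] + latencies[cur] < latencies[nxt]:
                chosen = cur
                break
        plan[instr] = chosen
        pending[chosen] += latencies[chosen]
    return plan
```

For instance, with a 2-cycle unit and a 5-cycle unit, the first two instructions stay on the faster unit (2 and then 4 cycles), while the third is diverted to the slower unit because 6 cycles is not less than 5.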

Further, as a refinement and an extension of the method shown in fig. 3, an embodiment of the present application further provides an instruction sending method. The following description is directed to a superscalar processor having a plurality of different types of instructions to be issued in an issue unit, and only one execution unit capable of processing the plurality of different types of instructions, and how to determine an issue order of the plurality of different types of instructions.

Fig. 5 is a schematic flowchart of a third instruction sending method in an embodiment of the present application, and as shown in fig. 5, the method may include:

S501: the types of the plurality of instructions to be issued by the issue unit are determined.

S502: judging whether an instruction with a branch type exists in the plurality of instructions; if yes, go to S503; if not, executing S504 and S505.

S503: the priority of the branch type instruction is configured to be the highest level and a corresponding target execution unit is determined for the branch type instruction.

S504: target execution units corresponding to types of the plurality of instructions are determined from the plurality of execution units, respectively.

The specific implementation manners of steps S501, S502, S503, and S504 are the same as or similar to the specific implementation manners of steps S301, S402, S403, S302, S303, and S304, and refer to the descriptions in steps S301, S402, S403, S302, S303, and S304, which are not repeated herein.

S505: judging whether the multiple instructions correspond to the same execution unit or not; if yes, go to S506; if not, go to S512.

In some cases, one of the multiple execution units of the superscalar processor may be capable of performing two different functions, and only that execution unit may perform the two different functions in the superscalar processor, while none of the other execution units may perform the two different functions.

For example, assume that execution unit A, execution unit B, and execution unit C are present in a superscalar processor. Execution unit A can implement an addition function, execution unit B can implement a subtraction function, and execution unit C can implement a multiplication function and a division function. When the transmitting unit caches a multiplication instruction and a division instruction, the multiplication instruction and the division instruction can only be processed by execution unit C, and neither execution unit A nor execution unit B can process them.

In the following, the plurality of instructions are taken to be a first instruction and a third instruction, and only one target execution unit is able to process both the first instruction and the third instruction. However, this is not intended to limit the number of instructions to two. The number of the plurality of instructions is not particularly limited.

S506: judging whether the output of the first instruction in the target execution unit is used as the input of other instructions or not, and judging whether the output of the third instruction in the target execution unit is used as the input of other instructions or not; if the first instruction and the third instruction are both yes, executing S507; if the first instruction is yes, and the third instruction is no, then S508 is executed; if the first instruction is no and the third instruction is yes, executing S509; if the first instruction and the third instruction are both no, S510 is executed.

Whether the output of the first instruction in the target execution unit is used as the input of other instructions means whether the processing result of the first instruction in the target execution unit is used by other instructions.

For example, assume that the multiplication instruction is x = 2 × 3 and the addition instruction is y = x + 1. The output result x of the multiplication instruction is an input of the addition instruction, so it can be determined that the output of the multiplication instruction is used as the input of another instruction. Assume instead that the multiplication instruction is x = 2 × 3 and the addition instruction is z = y + 1. The output result x of the multiplication instruction is not an input of the addition instruction, so it can be determined that the output of the multiplication instruction is not used as the input of the addition instruction.

Whether the output of the third instruction in the target execution unit is used as the input of other instructions is the same as the determination method of the first instruction, and details are not repeated here.
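The judgment in step S506 can be illustrated with a small sketch. The `Instr` record, with a destination name and a tuple of source names, and the function `output_is_consumed` are hypothetical; literal operands such as 2 and 3 are simply left out of the source list.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # name of the value the instruction produces
    srcs: Tuple[str, ...]   # names of the values the instruction reads


def output_is_consumed(producer: Instr, others: List[Instr]) -> bool:
    # True when any other instruction reads the producer's output.
    return any(producer.dest in other.srcs for other in others)


mul = Instr("x", ())         # x = 2 * 3 (literal operands omitted)
add_y = Instr("y", ("x",))   # y = x + 1
add_z = Instr("z", ("y",))   # z = y + 1
```

Here `output_is_consumed(mul, [add_y])` holds because y reads x, while `output_is_consumed(mul, [add_z])` does not hold because z reads only y.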

S507: the first and third instructions are configured with priorities based on the importance of the instructions.

S508: the priority of the first instruction is configured as a first priority, and the priority of the third instruction is configured as a second priority.

S509: the priority of the first instruction is configured as a second priority, and the priority of the third instruction is configured as a first priority.

S510: and the priority of the first instruction is configured as the first priority or the second priority, and the priority of the third instruction is configured as the second priority or the first priority.

Wherein the first priority is higher than the second priority. The higher the level of priority, the earlier the corresponding instruction is issued.

That is, when the output of the first instruction at the target execution unit is used as the input of other instructions and the output of the third instruction at the target execution unit is not used as the input of other instructions, the first instruction is the more important one. Therefore, the priority of the first instruction is configured as the first priority and the priority of the third instruction is configured as the second priority, so that the target execution unit processes the first instruction preferentially and processes the third instruction after the first instruction has been processed. In this way, the processing efficiency of all instructions can be improved, and the processing efficiency of the superscalar processor can be further improved.

When the output of the first instruction at the target execution unit is not used as the input of other instructions and the output of the third instruction at the target execution unit is used as the input of other instructions, the third instruction is the more important one. Therefore, the priority of the first instruction is configured as the second priority and the priority of the third instruction is configured as the first priority, so that the target execution unit processes the third instruction preferentially and processes the first instruction after the third instruction has been processed. In this way, the processing efficiency of all instructions can be improved, and the processing efficiency of the superscalar processor can be further improved.

When neither the output of the first instruction at the target execution unit nor the output of the third instruction at the target execution unit is used as the input of other instructions, neither the first instruction nor the third instruction is particularly important, and therefore either priority may be configured for the first instruction and the third instruction. That is, the priority of the first instruction is set to the first priority or the second priority, and correspondingly, the priority of the third instruction is set to the second priority or the first priority.

When the output of the first instruction at the target execution unit is used as the input of other instructions, and the output of the third instruction at the target execution unit is also used as the input of other instructions, the first instruction and the third instruction are relatively important, so that the priority is configured for the first instruction and the third instruction according to the importance degree of the instructions. Therefore, more important instructions can be sent to the target execution unit to be processed preferentially, and the performance of the superscalar processor is improved.

In step S507, when priorities are configured for the first instruction and the third instruction according to the importance degree of the instruction, a specific configuration manner may be:

step a: determining the number of the output of the first instruction in the target execution unit as the input of other instructions, and taking the number as the first input number; and determining the number of the output of the third instruction at the target execution unit as the input of other instructions, and taking the number as a second input number.

That is, it is determined how many other instructions need to use the output of the first instruction at the target execution unit, or even how many times that output needs to be input in other instructions. The number of such instructions, or the number of such uses, can be regarded as the first input number. Likewise, it is determined how many other instructions need to use the output of the third instruction at the target execution unit, or even how many times that output needs to be input in other instructions. The number of such instructions, or the number of such uses, can be regarded as the second input number.

Step b: judging whether the first input quantity is larger than or equal to the second input quantity; if yes, executing step c; if not, executing step d.

Step c: the priority of the first instruction is configured as a first priority, and the priority of the third instruction is configured as a second priority.

Step d: the priority of the third instruction is configured as a first priority, and the priority of the first instruction is configured as a second priority.

That is, when the first input number is greater than or equal to the second input number, the output of the first instruction needs to be used by more other instructions, and the first instruction is more important than the third instruction. Therefore, the priority of the first instruction is configured as the first priority and the priority of the third instruction is configured as the second priority, so that the more important instruction is preferentially sent to the target execution unit for processing, thereby improving the performance of the superscalar processor.

When the first input number is smaller than the second input number, the output of the third instruction needs to be used by more other instructions, and the third instruction is more important than the first instruction. Therefore, the priority of the third instruction is configured as the first priority and the priority of the first instruction is configured as the second priority, so that the more important instruction is preferentially sent to the target execution unit for processing, thereby improving the performance of the superscalar processor.

Of course, it is also possible to configure priorities for the first instruction and the third instruction whose outputs are both required as other instruction inputs in other ways, for example: according to the generation time of the instruction, etc. Here, the specific manner of configuring the priority is not particularly limited.
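The priority configuration of steps S506 to S510, together with steps a to d, can be summarized in a short sketch. The function `configure_priorities` and the constants `FIRST_PRIORITY` and `SECOND_PRIORITY` are hypothetical names. The inputs are the first and second input numbers, and a value of 0 is assumed to mean that the output is not used by any other instruction. In the case where neither output is used, the text allows either assignment; giving the first instruction the first priority is an arbitrary choice made here.

```python
from typing import Tuple

FIRST_PRIORITY = 1   # issued earlier
SECOND_PRIORITY = 2  # issued later


def configure_priorities(first_uses: int, third_uses: int) -> Tuple[int, int]:
    """Return (priority of first instruction, priority of third instruction)."""
    if first_uses > 0 and third_uses > 0:
        # S507: both outputs are needed, so compare the input numbers.
        if first_uses >= third_uses:                # step b -> step c
            return FIRST_PRIORITY, SECOND_PRIORITY
        return SECOND_PRIORITY, FIRST_PRIORITY      # step b -> step d
    if first_uses > 0:
        return FIRST_PRIORITY, SECOND_PRIORITY      # S508
    if third_uses > 0:
        return SECOND_PRIORITY, FIRST_PRIORITY      # S509
    # S510: either order is allowed; this order is an arbitrary assumption.
    return FIRST_PRIORITY, SECOND_PRIORITY
```

For example, `configure_priorities(3, 1)` places the first instruction ahead of the third, and `configure_priorities(0, 1)` places the third instruction ahead of the first.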

After the priorities are configured for the branch instruction, the first instruction, and the third instruction in steps S503 and S507-S510, the plurality of instructions may be respectively sent to the target execution unit according to the priorities of the instructions and the order.

S511: controlling the transmitting unit to sequentially transmit the plurality of instructions to the target execution unit based on the priorities of the plurality of instructions.

The specific implementation manner of step S511 is the same as or similar to the specific implementation manner of step S416, and reference may be made to the description in step S416, which is not described herein again.

After it is determined in step S505 that the multiple instructions do not correspond to the same execution unit, the multiple instructions may be simultaneously sent to the target execution units respectively corresponding to them.

S512: and respectively sending a plurality of instructions to be sent by the sending unit to target execution units corresponding to the plurality of instructions.

The specific implementation manner of step S512 is the same as or similar to the specific implementation manner of step S416, and reference may be made to the description in step S416, which is not described herein again.
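The branching between steps S511 and S512 can be sketched as a grouping of instructions by target execution unit. The function `issue_order` and the tuple layout `(name, target_unit, priority)` are hypothetical, and a smaller number is assumed to denote a higher priority. Instructions that share a unit are ordered by priority (S511); an instruction that is alone on its unit is simply sent to it (S512).

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def issue_order(instrs: List[Tuple[str, str, int]]) -> Dict[str, List[str]]:
    # Group the instructions by the execution unit chosen for them (S505).
    per_unit = defaultdict(list)
    for name, unit, priority in instrs:
        per_unit[unit].append((priority, name))
    # Within one unit, higher-priority instructions are sent first (S511).
    # The sort is stable, so equal priorities keep their original order.
    return {
        unit: [name for _, name in sorted(queue, key=lambda item: item[0])]
        for unit, queue in per_unit.items()
    }
```

For instance, `issue_order([("mul", "C", 2), ("div", "C", 1), ("add", "A", 1)])` sends `div` before `mul` on unit C, while `add` goes to unit A on its own.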

It should also be noted that the addition instruction, the multiplication instruction, and the like are only examples of instruction types, and the types of the instruction may include: add, subtract, multiply, divide, and, or, not, shift, etc. The specific type of instruction is not limited herein.

Finally, the instruction sending method provided by the embodiment of the present application is described again by an example.

Fig. 6 is a schematic diagram of an instruction sending flow in the embodiment of the present application, and referring to fig. 6, in a superscalar processor, a send queue 601 is cached in a send unit, and instr1, instr2, instr3, instr4 and other instructions are stored in the send queue. When each instruction needs to be sent to the execution unit for processing, the scheduling logic unit stores the transmission queue in the transmission unit according to the transmission sequence of the instructions to obtain a scheduling queue 602. The scheduling logic unit is also configured with a scheduling policy 603. In particular, the scheduling policy may be a policy function. After the scheduling logic unit determines the execution unit corresponding to each instruction based on the policy function, the multiplexing network 604 converts the execution unit corresponding to each instruction determined by the scheduling logic unit into a hardware signal, and controls the transmitting unit to transmit each instruction to the corresponding execution unit 104 such as exe1, exe2, exe3, exe4, exe5, or exe6 for execution.
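The flow of fig. 6 can be pictured, very loosely, in software terms. The sketch below is not the hardware design: `schedule`, `shortest_latency_policy`, and the `latency_table` layout are hypothetical, the queue holds instruction types rather than full instructions, and the policy shown (choose the unit with the fewest cycles for the type) is only one possible policy function.

```python
from typing import Callable, Dict, List, Tuple

Policy = Callable[[str], str]  # maps an instruction type to a unit name


def shortest_latency_policy(latency_table: Dict[str, Dict[str, int]]) -> Policy:
    # Build a policy function from a per-type table of unit latencies.
    def policy(instr_type: str) -> str:
        units = latency_table[instr_type]
        return min(units, key=units.get)
    return policy


def schedule(send_queue: List[str], policy: Policy) -> List[Tuple[str, str]]:
    # The scheduling queue keeps the transmission order of the send queue.
    scheduling_queue = list(send_queue)
    # Each pairing stands in for the selection made by the multiplexing network.
    return [(instr, policy(instr)) for instr in scheduling_queue]
```

Passing a different policy function to `schedule` changes how units are chosen without touching the queue handling, which mirrors the separation between the scheduling queue and the scheduling policy in the figure.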

Based on the same inventive concept, as an implementation of the method, the embodiment of the application further provides an instruction sending device. The device is applied to a superscalar processor, the superscalar processor at least comprises a transmitting unit and a plurality of executing units, the transmitting unit is used for transmitting instructions, and the executing units are used for executing the instructions transmitted by the transmitting unit. Fig. 7 is a schematic structural diagram of an instruction sending device in an embodiment of the present application, and referring to fig. 7, the device may include:

a first determining module 701, configured to determine a type of an instruction to be issued by the transmitting unit;

a first selecting module 702, configured to determine at least two execution units to be selected corresponding to the type from the multiple execution units;

a second determining module 703, configured to obtain time consumed by the at least two candidate execution units to process the instruction;

a second selecting module 704, configured to select, from the at least two execution units to be selected, an execution unit corresponding to the shortest time consumed for completing processing of the instruction as a target execution unit;

a sending module 705, configured to send the instruction to be sent by the sending unit to the target execution unit.

Further, as a refinement and an extension of the apparatus shown in fig. 7, an instruction sending apparatus is also provided in the embodiments of the present application. Fig. 8 is a schematic structural diagram of a second instruction sending device in the embodiment of the present application, and referring to fig. 8, the device may include:

a first determining module 801, configured to determine a type of an instruction to be issued by the transmitting unit;

a first selection module 802, configured to determine at least two execution units to be selected corresponding to the type from the multiple execution units;

a second determining module 803, configured to obtain time consumed by the at least two candidate execution units to process the instruction;

the second selection module 804 includes:

a selecting submodule 8041, configured to select, from the at least two execution units to be selected, a corresponding first execution unit in a case where a time consumed for completing processing of the instruction is shortest;

a determining submodule 8042, configured to, when the first execution unit is currently in an idle state, take the first execution unit as a target execution unit;

the determining submodule 8042 is further configured to, when the first execution unit is not currently in an idle state, determine a sum of time consumed by the first execution unit for completing processing of the current instruction and the instruction; when the sum of the time is less than the time consumed by a second execution unit to process the instruction, taking the first execution unit as a target execution unit; when the sum of the time is greater than or equal to the time consumed by a second execution unit for processing the instruction, taking the second execution unit as a target execution unit, wherein the second execution unit is the execution unit which has the second shortest time consumed by finishing processing the instruction in the at least two execution units to be selected;

when the instruction includes a first instruction and a second instruction, the first instruction being of the same type as the second instruction,

the determining submodule 8042 is specifically configured to use the first execution unit as a target execution unit for executing the first instruction;

the determining submodule 8042 is further configured to determine a sum of time consumed by the first execution unit to process the first instruction and the second instruction; when the sum of the time is less than the time consumed by the second execution unit to process the second instruction, taking the first execution unit as a target execution unit for executing the second instruction; when the sum of the time is greater than or equal to the time consumed by a second execution unit to process the second instruction, taking the second execution unit as a target execution unit for executing the second instruction;

when the instruction comprises a first instruction and a third instruction, the type of the first instruction is different from that of the third instruction; when only one of the plurality of execution units is capable of processing the first instruction and the third instruction,

a configuration module 805 configured to, when the output of the first instruction at the target execution unit is used as an input of another instruction, and the output of the third instruction at the target execution unit is not used as an input of another instruction, configure the priority of the first instruction as a first priority and configure the priority of the third instruction as a second priority;

and a configuration module 805 configured to, when the output of the first instruction at the target execution unit is not used as an input of another instruction and the output of the third instruction at the target execution unit is used as an input of another instruction, configure the priority of the first instruction as a second priority and configure the priority of the third instruction as a first priority;

and a configuration module 805 configured to, when the output of the first instruction at the target execution unit is not used as an input of another instruction and the output of the third instruction at the target execution unit is not used as an input of another instruction, configure the priority of the first instruction as the first priority or the second priority and configure the priority of the third instruction as the second priority or the first priority;

and a configuration module 805 configured to configure priorities for the first instruction and the third instruction according to importance levels of the instructions when the output of the first instruction at the target execution unit is used as input of other instructions and the output of the third instruction at the target execution unit is used as input of other instructions;

wherein the first priority is higher than the second priority, and the higher the priority level is, the earlier the corresponding instruction is issued;

the configuration module 805 includes:

a computation submodule 8051, configured to determine a first input number of the output of the first instruction in the target execution unit in the other instructions, and determine a second input number of the output of the third instruction in the target execution unit in the other instructions;

a configuration submodule 8052, configured to, when the first input number is greater than or equal to the second input number, configure the priority of the first instruction as the first priority, and configure the priority of the third instruction as the second priority;

the configuration submodule 8052 is further configured to, when the first input number is smaller than the second input number, configure the priority of the third instruction as the first priority, and configure the priority of the first instruction as the second priority.

The configuration module 805 is further configured to, when the type of the instruction is a branch type, configure the priority of the instruction as a highest level; wherein, the higher the priority level is, the earlier the corresponding instruction is issued;

a sending module 806, configured to send the instruction to be sent by the sending unit to the target execution unit.

Further, the types of the instructions include at least: at least one of add, subtract, multiply, divide, and, or, not, shift.

It is to be noted here that the above description of the embodiments of the apparatus, similar to the description of the embodiments of the method described above, has similar advantageous effects as the embodiments of the method. For technical details not disclosed in the embodiments of the apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.

Based on the same inventive concept, the embodiment of the application also provides the electronic equipment. Fig. 9 is a schematic structural diagram of an electronic device in an embodiment of the present application, and referring to fig. 9, the electronic device may include: a processor 901, memory 902, and bus 903; wherein, the processor 901 and the memory 902 complete the communication with each other through the bus 903; the processor 901 is configured to call program instructions in the memory 902 to perform the method in one or more embodiments described above.

It is to be noted here that the above description of the embodiments of the electronic device, similar to the description of the embodiments of the method described above, has similar advantageous effects as the embodiments of the method. For technical details not disclosed in the embodiments of the electronic device of the present application, refer to the description of the embodiments of the method of the present application for understanding.

Based on the same inventive concept, the embodiment of the present application further provides a computer-readable storage medium, where the storage medium may include: a stored program; wherein the program controls the device on which the storage medium is located to execute the method in one or more of the above embodiments when the program runs.

It is to be noted here that the above description of the storage medium embodiments, like the description of the above method embodiments, has similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the storage medium of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.

The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. An instruction sending method is applied to a superscalar processor, the superscalar processor at least comprises a transmitting unit and a plurality of executing units, the transmitting unit is used for transmitting an instruction, and the executing units are used for executing the instruction transmitted by the transmitting unit; the method comprises the following steps:

determining the type of an instruction to be issued by the transmitting unit;

determining at least two execution units to be selected corresponding to the type from the plurality of execution units;

acquiring the time consumed by each of the at least two execution units to be selected to process the instruction;

selecting, from the at least two execution units to be selected, the execution unit that consumes the shortest time to complete processing of the instruction as a target execution unit;

and sending the instruction to be sent by the transmitting unit to the target execution unit.
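The steps of claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the `ExecutionUnit` class, the `latency` table, the unit names, and the cycle counts are all hypothetical and introduced here only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ExecutionUnit:
    """Hypothetical model of an execution unit (not from the application)."""
    name: str
    # Assumed processing time, in cycles, for each instruction type the unit supports.
    latency: dict = field(default_factory=dict)


def select_target(instr_type, units):
    """Return the candidate unit that takes the shortest time to process instr_type."""
    # Candidate units are those that support the instruction type.
    candidates = [u for u in units if instr_type in u.latency]
    if len(candidates) < 2:
        raise ValueError("the method assumes at least two candidate units")
    # Choose the candidate with the shortest processing time for this type.
    return min(candidates, key=lambda u: u.latency[instr_type])


# Illustrative values only.
units = [
    ExecutionUnit("ALU0", {"add": 1, "mul": 4}),
    ExecutionUnit("ALU1", {"add": 2, "mul": 3}),
    ExecutionUnit("LSU", {"load": 2}),
]
target = select_target("mul", units)
print(target.name)
```

With these assumed latencies, a multiply instruction would go to `ALU1` and an add instruction to `ALU0`, rather than both being fixedly mapped to one unit.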

2. The method according to claim 1, wherein the selecting, as the target execution unit, the corresponding execution unit from the at least two execution units to be selected that has the shortest time to complete processing of the instruction comprises:

selecting, from the at least two execution units to be selected, a first execution unit that consumes the shortest time to complete processing of the instruction;

when the first execution unit is in an idle state currently, taking the first execution unit as a target execution unit;

when the first execution unit is not currently in an idle state, determining the sum of the time consumed by the first execution unit to process its current instruction and the time consumed by the first execution unit to process the instruction; when the sum is less than the time consumed by a second execution unit to process the instruction, taking the first execution unit as the target execution unit; and when the sum is greater than or equal to the time consumed by the second execution unit to process the instruction, taking the second execution unit as the target execution unit, wherein the second execution unit is the execution unit, among the at least two execution units to be selected, that consumes the second shortest time to complete processing of the instruction.
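The comparison in claim 2 can be sketched as below. This is an illustration under stated assumptions: the function name and the dictionary arguments are hypothetical, and treating the "time consumed for the current instruction" as the time the busy unit still needs (with 0 meaning idle) is an interpretive assumption, not something the claim specifies.

```python
def select_with_busy(times, pending):
    """Pick a target unit following the comparison in claim 2.

    times:   unit name -> time to process the new instruction (hypothetical values)
    pending: unit name -> time needed for the unit's current instruction,
             0 or missing if the unit is idle (an assumed representation)
    """
    # Rank candidates by the time they need for the new instruction.
    ranked = sorted(times, key=times.get)
    first, second = ranked[0], ranked[1]

    # An idle fastest unit is chosen directly.
    if pending.get(first, 0) == 0:
        return first

    # Otherwise compare (current instruction + new instruction) on the first unit
    # against the new instruction alone on the second-fastest unit.
    total = pending[first] + times[first]
    return first if total < times[second] else second
```

For example, with `times = {"A": 2, "B": 5}`, unit `A` is chosen while its pending time is below 3 cycles; once the sum reaches 5, the instruction goes to `B` instead.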

3. The method of claim 2, wherein the instruction comprises a first instruction and a second instruction, the first instruction being of a same type as the second instruction;

the taking the first execution unit as a target execution unit includes:

the first execution unit is used as a target execution unit for executing the first instruction;

after taking the first execution unit as a target execution unit, the method further comprises:

determining the sum of the time consumed by the first execution unit to process the first instruction and the second instruction; when the sum is less than the time consumed by the second execution unit to process the second instruction, taking the first execution unit as the target execution unit for executing the second instruction; and when the sum is greater than or equal to the time consumed by the second execution unit to process the second instruction, taking the second execution unit as the target execution unit for executing the second instruction.

4. The method of any of claims 1-3, wherein the instruction comprises a first instruction and a third instruction, the first instruction being of a different type than the third instruction; only one of the plurality of execution units is capable of processing the first instruction and the third instruction; after selecting, from the at least two execution units to be selected, the execution unit that consumes the shortest time to complete processing of the instruction as the target execution unit, the method further comprises:

when the output of the first instruction at the target execution unit is used as the input of other instructions and the output of the third instruction at the target execution unit is not used as the input of other instructions, the priority of the first instruction is configured as a first priority and the priority of the third instruction is configured as a second priority;

when the output of the first instruction at the target execution unit is not used as the input of other instructions and the output of the third instruction at the target execution unit is used as the input of other instructions, the priority of the first instruction is configured as a second priority and the priority of the third instruction is configured as a first priority;

when the output of the first instruction at the target execution unit is not used as the input of other instructions and the output of the third instruction at the target execution unit is not used as the input of other instructions, the priority of the first instruction is configured as the first priority or the second priority, and the priority of the third instruction is configured as the second priority or the first priority;

when the output of the first instruction at the target execution unit is used as the input of other instructions and the output of the third instruction at the target execution unit is used as the input of other instructions, configuring priorities for the first instruction and the third instruction according to the importance degree of the instructions;

wherein the first priority is higher than the second priority, and the higher the priority level is, the earlier the corresponding instruction is issued.

5. The method of claim 4, wherein configuring priorities for the first instruction and the third instruction according to importance of the instructions comprises:

determining the number of other instructions that use the output of the first instruction at the target execution unit as an input, and taking the number as a first input number; determining the number of other instructions that use the output of the third instruction at the target execution unit as an input, and taking the number as a second input number;

when the first input number is greater than or equal to the second input number, configuring the priority of the first instruction as the first priority, and configuring the priority of the third instruction as the second priority;

when the first input number is smaller than the second input number, the priority of the third instruction is configured to be the first priority, and the priority of the first instruction is configured to be the second priority.
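The priority rules of claims 4 and 5 can be summarized in a short sketch. The function name, the numeric encoding of the priorities, and the choice made when neither output is consumed are assumptions for illustration; claim 4 allows either order in that case.

```python
FIRST, SECOND = 1, 2  # FIRST denotes the higher priority (assumed encoding)


def configure_priorities(first_inputs, third_inputs):
    """Return (priority of first instruction, priority of third instruction).

    first_inputs / third_inputs: number of other instructions that use the
    output of the first / third instruction as an input.
    """
    first_used = first_inputs > 0
    third_used = third_inputs > 0

    if first_used and not third_used:
        return FIRST, SECOND
    if third_used and not first_used:
        return SECOND, FIRST
    if not first_used and not third_used:
        # Either assignment is permitted; this sketch arbitrarily favors the first.
        return FIRST, SECOND
    # Both outputs are consumed: compare the input numbers as in claim 5,
    # with ties going to the first instruction.
    if first_inputs >= third_inputs:
        return FIRST, SECOND
    return SECOND, FIRST
```

The instruction whose result more dependent instructions are waiting on is sent earlier, which is the intent behind comparing the two input numbers.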

6. The method according to any of claims 1 to 3, wherein after the determining the type of the instruction to be sent by the transmitting unit, the method further comprises:

when the type of the instruction is a branch type, configuring the priority of the instruction as the highest level; wherein the higher the priority level, the earlier the corresponding instruction is issued.
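A minimal sketch of the branch rule in claim 6 follows. The tuple layout and the function name are hypothetical, and keeping non-branch instructions in their original order is an assumption of this sketch; the claim only requires that branch instructions receive the highest priority.

```python
def issue_order(pending):
    """Order pending instructions so that branch-type instructions go first.

    pending: list of (name, type) tuples in program order (assumed layout).
    """
    # False sorts before True, so branch instructions come first.
    # sorted() is stable, so all other instructions keep their relative order.
    return sorted(pending, key=lambda ins: ins[1] != "branch")


queue = [("i0", "add"), ("i1", "branch"), ("i2", "mul")]
print(issue_order(queue))
```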

7. The method according to any of claims 1 to 3, wherein the type of the instruction comprises at least one of: add, subtract, multiply, divide, AND, OR, NOT, and shift.

8. An instruction sending device, applied to a superscalar processor, wherein the superscalar processor comprises at least a transmitting unit and a plurality of execution units, the transmitting unit being configured to transmit instructions and the execution units being configured to execute the instructions transmitted by the transmitting unit; the device comprises:

a first determining module, configured to determine the type of an instruction to be sent by the transmitting unit;

a first selection module, configured to determine, from the plurality of execution units, at least two execution units to be selected corresponding to the type;

a second determining module, configured to acquire the time consumed by each of the at least two execution units to be selected to process the instruction;

a second selection module, configured to select, from the at least two execution units to be selected, the execution unit that consumes the shortest time to complete processing of the instruction as a target execution unit;

and a sending module, configured to send the instruction to be sent by the transmitting unit to the target execution unit.

9. An electronic device, comprising: a processor, a memory, and a bus;

wherein the processor and the memory communicate with each other through the bus; the processor is configured to invoke program instructions in the memory to perform the method of any of claims 1 to 7.

10. A computer-readable storage medium, comprising: a stored program; wherein the program, when executed, controls the device on which the storage medium is located to perform the method according to any one of claims 1 to 7.