
CN118295712A - Data processing method, device, equipment and medium - Google Patents

Data processing method, device, equipment and medium

Info

Publication number
CN118295712A
CN118295712A (application CN202410726284.6A)
Authority
CN
China
Prior art keywords
instruction
pipeline stage
pipeline
stage
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410726284.6A
Other languages
Chinese (zh)
Other versions
CN118295712B (en)
Inventor
胡振波
彭剑英
蔡骏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shin Lai Zhirong Semiconductor Technology Shanghai Co ltd
Original Assignee
Shin Lai Zhirong Semiconductor Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shin Lai Zhirong Semiconductor Technology Shanghai Co ltd filed Critical Shin Lai Zhirong Semiconductor Technology Shanghai Co ltd
Priority to CN202410726284.6A priority Critical patent/CN118295712B/en
Publication of CN118295712A publication Critical patent/CN118295712A/en
Application granted granted Critical
Publication of CN118295712B publication Critical patent/CN118295712B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • G06F9/3869Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3838Dependency mechanisms, e.g. register scoreboarding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • G06F9/3875Pipelining a single stage, e.g. superpipelining

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The embodiments of the present application provide a data processing method, apparatus, device, and medium, relating to the field of computer technology. The method comprises the following steps: determining, based on pipeline information, whether a first pipeline stage satisfies a simplification condition of the data forwarding network, the first pipeline stage being any one of the multiple pipeline stages; if the first pipeline stage satisfies the simplification condition, simplifying the data forwarding network of the first pipeline stage; and controlling the first pipeline stage to read the execution results of instructions in other pipeline stages through the simplified data forwarding network. The pipeline information includes at least one of: whether an arithmetic unit is present in the pipeline stage, whether the instruction in the pipeline stage is a performance-critical instruction, and whether the instruction in the pipeline stage is a long-cycle instruction. Based on the technical solution provided by the embodiments of the present application, the execution performance of the processor can be improved while unnecessary data forwarding paths are eliminated, thereby saving logic resources.

Description

Data processing method, device, equipment and medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, apparatus, device, and medium.
Background
Generally, processors execute instructions according to the order of the program flow, whether in a sequential architecture or an out-of-order architecture. For example, a classical five-stage pipeline processor architecture includes five pipeline stages: an IF (Instruction Fetch) pipeline stage, an ID (Instruction Decode) pipeline stage, an EX (instruction execution) pipeline stage, a MEM (Memory Access) pipeline stage, and a WB (Write Back) pipeline stage. The processor executes instructions in a fixed order through the IF, ID, and EX pipeline stages, and may execute instructions out of order in the EX and MEM pipeline stages.
Typically, the ordering among instructions determines their dependency relationships, which include resource dependencies and data dependencies. Because of these dependencies, instructions with different execution periods interact badly: a multi-cycle instruction dispatched early may still be incomplete while a single-cycle instruction dispatched later has already finished, so the later instruction must spend invalid waiting time before it can write back. With increasingly common dual-issue and multi-issue mechanisms, this situation occurs more and more often: large numbers of instructions wait idly in pipeline stages, which greatly affects the execution performance of the processor. If, as soon as an instruction completes execution, its data is forwarded to all subsequent instructions, the waiting cycles caused by data dependencies between instructions can be reduced and processor performance improved.
However, the above data forwarding approach consumes a large amount of logic resources and adds circuitry between pipeline stages, resulting in excessively long data forwarding logic paths that affect the execution performance of the processor.
Disclosure of Invention
The embodiment of the application provides a data processing method, a device, equipment and a medium.
In a first aspect, an embodiment of the present application provides a data processing method applied to a processor with multiple pipeline stages. The method comprises the following steps: determining, based on pipeline information, whether a first pipeline stage satisfies a simplification condition of the data forwarding network, the first pipeline stage being any one of the multiple pipeline stages; if the first pipeline stage satisfies the simplification condition, simplifying the data forwarding network of the first pipeline stage; and controlling the first pipeline stage to read the execution results of instructions in other pipeline stages through the simplified data forwarding network. The pipeline information includes at least one of: whether an arithmetic unit is present in the pipeline stage, whether the instruction in the pipeline stage is a performance-critical instruction, and whether the instruction in the pipeline stage is a long-cycle instruction.
In an alternative embodiment of the application, determining whether the first pipeline stage satisfies the simplification condition based on the pipeline information comprises: determining whether an arithmetic unit is present in the first pipeline stage; and if no arithmetic unit is present in the first pipeline stage, determining whether the first instruction has dependent data in a target pipeline stage, where the first instruction is the instruction currently in the first pipeline stage and the data in the target pipeline stage will disappear from the pipeline in the next cycle. Simplifying the data forwarding network of the first pipeline stage if the first pipeline stage satisfies the simplification condition comprises: if the first instruction has no dependent data in the target pipeline stage, simplifying the data forwarding network of the first pipeline stage, where simplifying the data forwarding network of the first pipeline stage indicates that the first pipeline stage does not read the execution results of subsequent instructions in pipeline stages other than the target pipeline stage.
In an alternative embodiment of the application, after determining whether the first instruction has dependent data in the target pipeline stage, the method further comprises: if no arithmetic unit is present in the first pipeline stage and the first instruction has dependent data in the target pipeline stage, controlling the first pipeline stage to read the data of the instruction in the target pipeline stage.
In an alternative embodiment of the present application, after determining whether an arithmetic unit is present in the first pipeline stage, the method further comprises: if an arithmetic unit is present in the first pipeline stage, determining whether the first instruction is a performance-critical instruction; and if the first instruction is a performance-critical instruction, controlling the first instruction in the first pipeline stage to read the data of instructions in all pipeline stages subsequent to the first pipeline stage.
In an alternative embodiment of the present application, after determining whether the first instruction is a performance-critical instruction, the method further comprises: if an arithmetic unit is present in the first pipeline stage and the first instruction is not a performance-critical instruction, determining whether the instruction in the first pipeline stage is a long-cycle instruction; if the first instruction executes in a second pipeline stage after multiple cycles, determining whether the first instruction has dependent data in the target pipeline stage; and if the first instruction has no dependent data in the target pipeline stage, simplifying the data forwarding networks of the pipeline stages after the first pipeline stage other than the second pipeline stage.
In an alternative embodiment of the application, after determining whether the first instruction has dependent data in the target pipeline stage, the method further comprises: if the first instruction executes after multiple cycles and has dependent data in the target pipeline stage, controlling the instruction in the first pipeline stage to read the data of the instruction in the target pipeline stage.
In an alternative embodiment of the application, the method further comprises: delaying the release of the dependency relationship between the first instruction and a second instruction, where the second instruction is an instruction in the target pipeline stage that has a dependency relationship with the first instruction.
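The decision flow laid out in the embodiments above can be sketched as a single policy function. This is a hypothetical illustration only: the string labels, parameter names, and the fall-through default branch are assumptions, not the claimed implementation.

```python
# Hypothetical sketch of the forwarding-policy decision described in the
# embodiments above. The string return values, parameter names, and the
# default branch are illustrative assumptions, not the patent's design.

def forwarding_policy(has_alu, is_perf_critical, is_long_cycle, dep_in_target):
    """Decide which later pipeline stages the first pipeline stage reads from."""
    if not has_alu:
        # No arithmetic unit: read only the target stage if it still holds
        # dependent data; otherwise the forwarding network can be simplified.
        return "target-only" if dep_in_target else "simplified"
    if is_perf_critical:
        # Performance-critical instructions read from all later stages.
        return "all-later-stages"
    if is_long_cycle:
        # A long-cycle instruction executing later in a second pipeline stage:
        # keep the path from the second stage, simplify the rest, unless the
        # target stage still holds dependent data.
        return "target-only" if dep_in_target else "simplified-except-second-stage"
    # The remaining case is not spelled out above; reading all stages is assumed.
    return "all-later-stages"

print(forwarding_policy(False, False, False, False))  # simplified
print(forwarding_policy(True, True, False, False))    # all-later-stages
```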
In a second aspect, an embodiment of the present application provides a data processing apparatus comprising a determining module, a simplifying module, and a control module. The determining module is configured to determine, based on pipeline information, whether a first pipeline stage satisfies a simplification condition of the data forwarding network, the first pipeline stage being any one of the multiple pipeline stages. The simplifying module is configured to simplify the data forwarding network of the first pipeline stage if the first pipeline stage satisfies the simplification condition. The control module is configured to control the first pipeline stage to read the execution results of instructions in other pipeline stages through the simplified data forwarding network. The pipeline information includes at least one of: whether an arithmetic unit is present in the pipeline stage, whether the instruction in the pipeline stage is a performance-critical instruction, and whether the instruction in the pipeline stage is a long-cycle instruction.
In an alternative embodiment of the present application, the determining module is specifically configured to: determine whether an arithmetic unit is present in the first pipeline stage; and if no arithmetic unit is present in the first pipeline stage, determine whether the first instruction has dependent data in the target pipeline stage, where the first instruction is the instruction currently in the first pipeline stage and the data in the target pipeline stage will disappear from the pipeline in the next cycle. The simplifying module is specifically configured to: simplify the data forwarding network of the first pipeline stage if the first instruction has no dependent data in the target pipeline stage, where simplifying the data forwarding network of the first pipeline stage indicates that the first pipeline stage does not read the execution results of subsequent instructions in pipeline stages other than the target pipeline stage.
In an alternative embodiment of the application, the control module is further configured to: after the determining module determines whether the first instruction has dependent data in the target pipeline stage, if no arithmetic unit is present in the first pipeline stage and the first instruction has dependent data in the target pipeline stage, control the first pipeline stage to read the data of the instruction in the target pipeline stage.
In an alternative embodiment of the application, the determining module is further configured to: after determining whether an arithmetic unit is present in the first pipeline stage, if an arithmetic unit is present, determine whether the first instruction is a performance-critical instruction. The control module is further configured to: if the first instruction is a performance-critical instruction, control the first instruction in the first pipeline stage to read the data of instructions in all pipeline stages subsequent to the first pipeline stage.
In an alternative embodiment of the application, the determining module is further configured to: if an arithmetic unit is present in the first pipeline stage and the first instruction is not a performance-critical instruction, determine whether the instruction in the first pipeline stage is a long-cycle instruction; and if the first instruction executes in a second pipeline stage after multiple cycles, determine whether the first instruction has dependent data in the target pipeline stage. The simplifying module is further configured to: if the first instruction has no dependent data in the target pipeline stage, simplify the data forwarding networks of the pipeline stages after the first pipeline stage other than the second pipeline stage.
In an alternative embodiment of the application, the control module is further configured to: after the determining module determines whether the first instruction has dependent data in the target pipeline stage, if the first instruction executes after multiple cycles and has dependent data in the target pipeline stage, control the instruction in the first pipeline stage to read the data of the instruction in the target pipeline stage.
In an alternative embodiment of the application, the data processing apparatus further comprises a dependency release module configured to delay releasing the dependency relationship between the first instruction and a second instruction, where the second instruction is an instruction in the target pipeline stage that has a dependency relationship with the first instruction.
In a third aspect, an embodiment of the present application provides a computer device comprising a memory and a processor, the memory storing a computer program; the processor, when executing the computer program, implements the steps of any of the data processing methods described above.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the data processing methods described above.
The embodiment of the application provides a data processing method that offers an efficient and fast architecture capable of forwarding data to the modules that need it, with minimal logic, at the appropriate point in time. When a processor with multiple pipeline stages executes instructions, it evaluates each instruction in each pipeline stage and determines, based on pipeline information such as whether the pipeline stage contains an arithmetic unit, whether the instruction in the pipeline stage is a performance-critical instruction, and whether it is a long-cycle instruction, whether that pipeline stage satisfies the simplification condition of the forwarding network. When the simplification condition is satisfied, the forwarding network between that pipeline stage and subsequent pipeline stages can be simplified. This reduces the use of logic resources, avoids creating excessively long data forwarding logic paths, and reduces the circuitry between pipeline stages, thereby lowering the probability of reducing the processor's main frequency and simplifying the complexity of the forwarding network. As a result, the execution performance of the processor can be improved while still reducing the time instructions spend waiting on dependency relationships.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of a five-stage pipelined processor architecture in the related art;
FIG. 2 is a diagram of a related art data processing method;
FIG. 3 is a diagram of a related art data processing method;
FIG. 4 is a flow chart of a data processing method according to an embodiment of the present application;
FIG. 5 is a flow chart of a data processing method according to an embodiment of the present application;
FIG. 6 is a simplified logic diagram of a forwarding network according to one embodiment of the present application;
FIG. 7 is a simplified logic diagram of a forwarding network according to one embodiment of the present application;
FIG. 8 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In carrying out the application, the inventors found that the processor's current manner of reclaiming space has a considerable impact on processor performance.
Technical terms in the related art are first described below:
1. classical five-stage pipeline processor architecture
FIG. 1 is a schematic diagram of a five-stage pipeline processor architecture in the related art. As shown in FIG. 1, the architecture includes the following five pipeline stages. The IF pipeline stage reads instructions from memory. The ID pipeline stage identifies (decodes) instructions. The EX pipeline stage performs instruction operations; an operation instruction computes in this stage to obtain its result. The MEM pipeline stage accesses memory: it reads data from internal or external memory, or writes execution results to memory. The WB pipeline stage writes results back: the result of instruction execution is written back to the processor's register file for fast subsequent access. FIG. 1 shows the classical five-stage pipeline workflow; an instruction's lifecycle may pass through all five stages, or through only four of the five (for example, some instructions do not require the WB pipeline stage, and some do not require the MEM pipeline stage). If an instruction has not obtained all its data when it enters the EX pipeline stage, or if any operand has not received correct data by then, the instruction can only wait at the EX pipeline stage until the operand is obtained. Such dependency between operands is called data dependency. The ordering among instructions determines how dependency relationships are judged; a common resource dependency occurs when different instructions of the same class contend for the same processing unit, or when different memory access instructions access the same address.
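As a rough illustration of the stage traversal just described, the following Python sketch enumerates which stages different instruction kinds pass through. The kind names and skip rules are assumptions made for illustration, not part of the patent.

```python
# Hypothetical sketch of the classical five-stage pipeline described above.
# The instruction-kind names and the skip rules (some instructions bypass
# MEM or WB) are illustrative assumptions.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stages_for(instr_kind):
    """Return the pipeline stages an instruction of the given kind passes through."""
    if instr_kind == "branch":      # e.g. bne: produces no register result, no WB
        return ["IF", "ID", "EX", "MEM"]
    if instr_kind == "alu":         # e.g. add: needs no memory access, no MEM
        return ["IF", "ID", "EX", "WB"]
    return list(STAGES)             # e.g. lw traverses all five stages

print(stages_for("alu"))   # ['IF', 'ID', 'EX', 'WB']
```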
Data dependencies follow program order and include RAW (read-after-write) dependencies and WAW (write-after-write) dependencies. A RAW dependency indicates that a subsequent instruction requires the result of an earlier instruction that has not yet finished executing; in a WAW dependency, both instructions write back to the same register.
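The RAW/WAW distinction above can be sketched as a small check between two in-order instructions. The tuple encoding (destination register, source registers) is an assumed modeling convention for illustration only.

```python
# Minimal sketch of RAW/WAW hazard detection between two in-order
# instructions, each modeled as (dest_register, source_registers).
# The tuple encoding is an assumption made for illustration.

def hazards(older, newer):
    """Return the set of data hazards the newer instruction has on the older one."""
    older_dest, _ = older
    newer_dest, newer_srcs = newer
    found = set()
    if older_dest is not None and older_dest in newer_srcs:
        found.add("RAW")                      # newer reads what older writes
    if older_dest is not None and older_dest == newer_dest:
        found.add("WAW")                      # both write the same register
    return found

lw_instr  = ("a0", ["s0"])        # lw a0,(s0)
add_instr = ("a0", ["a0", "t1"])  # add a0,a0,t1
print(hazards(lw_instr, add_instr))  # {'RAW', 'WAW'} (set order may vary)
```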
It should be noted that the method in the embodiments of the present application may be applied to processor architectures with multiple pipeline stages, for example, instruction execution in processors with five-stage, nine-stage, or similar pipelines.
2. Problems in common program flows
1 # for (i=0;i<5;i++)
2 #{
3 # a=a+1;
4 #}
5 ##
6
7
8 li t0,5
9 li t1,0
10 loop:
11 lw a0,(s0)
12 add a0,a0,t1
13 sw a0,(s0)
14 add s0,s0,4
15 add t1,t1,1
16 bne t1,t0,loop
As can be seen from the above program, in its main part (from loop start to loop end) some data both stores a calculation result and serves as the data source for the next instruction, typically from line 11 to line 13. At line 11, the lw instruction fetches 32-bit data (the width may be modified per application) from the memory location whose address is in s0 and saves it to register a0. At line 12, the add instruction adds the data in the a0 register to the data in the t1 register and saves the result back to a0. At line 13, the sw instruction stores the data in the a0 register to the memory location whose address is in s0. Across these three consecutive instructions, the a0 register is used frequently; if data had to be obtained from the original registers each time, some blocking would inevitably result. Taking the classical five-stage pipeline of FIG. 1 as an example, FIG. 2 shows a data processing manner in the related art. In cycle 1, when the lw instruction reaches the MEM pipeline stage, the add instruction reaches the EX pipeline stage; the lw instruction issues a memory read request and waits for data. The add instruction cannot obtain its data and blocks the pipeline. In cycle 2, the lw instruction continues waiting for data, and the add instruction continues to block the pipeline; the new sw instruction cannot enter the EX pipeline stage. In cycle 3, the lw instruction receives its data and can proceed to the next (WB) pipeline stage, while the add instruction still cannot obtain the data and continues to wait, blocking the pipeline. In cycle 4, the lw instruction performs its write-back operation, but the register is not updated until the next cycle, so the add instruction still cannot obtain the data and continues to wait. In the following cycle, the add instruction obtains the data and starts executing.
At the same time, the pipeline is released, allowing new instructions in. In the next cycle, the new sw instruction enters the EX pipeline stage and the add instruction reaches the MEM pipeline stage. Since the sw instruction is also a memory access instruction, it does not complete until it executes in the MEM pipeline stage. The sw instruction then reaches the MEM pipeline stage while the add instruction reaches the WB pipeline stage; the data will not be updated to the register until the next cycle, so the sw instruction cannot obtain it and continues to wait. In cycle 8, the sw instruction obtains the data and performs the memory write operation.
It can be seen that the dependency relationships between data are critical to processor performance. The related art currently provides a data forwarding technique in which data is forwarded to subsequent instructions immediately after an instruction's operation completes. For example, FIG. 3 shows a data processing manner in the related art. In cycle 1, the add instruction reaches the EX pipeline stage and the lw instruction moves to the MEM pipeline stage; the add instruction in the EX pipeline stage depends on the execution result of the lw instruction and therefore stalls waiting for data (forwarding results to all consumers results in a greater drive load and a lower processor operating frequency). In cycle 2, the lw instruction issues a memory read request, and the add instruction continues to wait for data, blocking the pipeline. In cycle 3, the lw instruction receives the data returned by memory, immediately forwards it to the add instruction, and moves on to the next pipeline stage; the add instruction obtains the data and begins the addition. In cycle 4, the new sw instruction reaches the EX pipeline stage, the add instruction reaches the MEM pipeline stage, and the lw instruction is in the WB pipeline stage; the add instruction has completed its operation and forwards the data to the sw instruction, which can thus obtain the data in advance. In cycle 5, the sw instruction reaches the MEM pipeline stage and executes directly, while the add instruction reaches the WB pipeline stage and begins its write-back.
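The two walkthroughs above differ only in when the dependent instruction can consume the producer's result. A toy model of that difference follows; the cycle numbers come from the walkthroughs, while the fixed two-cycle write-back penalty is an assumed simplification.

```python
# Toy model contrasting the two related-art schedules above: without
# forwarding, a dependent instruction waits for the producer's write-back
# and the register update (assumed here as a flat +2 cycles); with
# forwarding it can consume the result as soon as it is produced.

def dependent_start_cycle(producer_ready_cycle, forwarding):
    """Cycle in which the dependent instruction can begin executing."""
    if forwarding:
        return producer_ready_cycle          # result bypassed immediately
    return producer_ready_cycle + 2          # wait for WB + register update

# The lw data returns in cycle 3 in both walkthroughs above.
print(dependent_start_cycle(3, forwarding=False))  # 5
print(dependent_start_cycle(3, forwarding=True))   # 3
```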
It will be appreciated that a processing architecture with data forwarding can reduce the latency between instructions caused by data dependencies and can improve processor performance. However, in the above design, data forwarding first consumes a large amount of logic resources and adds circuitry between pipeline stages, reducing the overall main frequency of the processor. In particular, when complex logic such as adders, multipliers, and dividers is present, the operation results of these instructions are forwarded, and this complex result-generation (operation) logic is chained to the operand-update logic, producing excessively long data forwarding logic paths. A longer logic path means a longer physical circuit path, which in turn means a greater reduction in the circuit's operating frequency. Second, data forwarding must be controlled precisely: forwarding too early does not necessarily bring equivalent benefit, while forwarding too late can cause performance or even functional problems, so the design complexity is high.
In view of the foregoing, an embodiment of the present application provides a data processing method offering an efficient and fast architecture that can forward data to the modules that need it, with minimal logic, at the appropriate point in time. That is, when a processor with multiple pipeline stages executes instructions, a judgment can be made for each instruction in each pipeline stage, based on pipeline information such as whether the pipeline stage has an arithmetic unit, whether the instruction in the pipeline stage is a performance-critical instruction, and whether it is a long-cycle instruction, to determine whether that pipeline stage satisfies the simplification condition of the forwarding network. When the simplification condition is satisfied, the forwarding network between that pipeline stage and subsequent pipeline stages can be simplified: the use of logic resources is reduced, the creation of excessively long data forwarding logic paths is avoided, and the circuitry between pipeline stages is reduced, thereby lowering the probability of reducing the processor's main frequency and simplifying the complexity of the forwarding network. The execution performance of the processor can thus be improved while still reducing the time instructions spend waiting on dependency relationships.
The solution in the embodiments of the present application may be implemented in various computer languages, for example the object-oriented programming language Java or the scripting language JavaScript.
In order to make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments of the present application are described in detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. It should be noted that, where there is no conflict, the embodiments of the present application and the features therein may be combined with each other.
FIG. 4 is a flow chart of a data processing method according to an embodiment of the application. Referring to FIG. 4, the following embodiments take the processor as the execution body and, as a concrete example, describe applying the data processing method provided by the embodiments of the present application to a computer device with a processor that supports multiple pipeline stages and single or multiple issue. The data processing method provided by the embodiment of the application includes the following steps 401-403:
Step 401, the processor determines, based on the pipeline information, whether the first pipeline stage satisfies a reduced condition of the data forwarding network.
The present application simplifies the data forwarding network, building on the related-art approach of forwarding all data once execution completes.
Wherein the pipeline information includes at least one of: whether an arithmetic unit (or instruction execution unit) is present in the pipeline stage, whether an instruction on the pipeline stage is a performance critical instruction, and whether an instruction on the pipeline stage is a long-cycle instruction.
By way of example, instructions may be partitioned in embodiments of the application based on their characteristics into performance-critical instructions, non-performance-critical instructions, and long-cycle instructions. Performance-critical instructions are instructions that can be calculated quickly and are used frequently, such as add, subtract, shift, and jump instructions. Non-performance-critical instructions may include instructions that access system status registers. Long-cycle instructions are instructions with long computation times and complex logic, such as multiply and divide instructions, which are executed several pipeline stages after the current one (typically at the final pipeline stage, e.g. the stage preceding the WB pipeline stage).
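To make this classification concrete, the three instruction categories above can be sketched as a lookup table. This is an illustrative Python sketch, not part of the hardware; the mnemonics and the table are assumptions chosen to mirror the examples in the text (add/subtract/shift/jump as performance-critical, system status register access as non-performance-critical, multiply/divide as long-cycle).

```python
from enum import Enum

class InstrClass(Enum):
    PERF_CRITICAL = "performance-critical"          # e.g. add, sub, shift, jump
    NON_PERF_CRITICAL = "non-performance-critical"  # e.g. status register access
    LONG_CYCLE = "long-cycle"                       # e.g. multiply, divide

# Hypothetical mnemonic-to-class table for illustration only.
_CLASS_BY_MNEMONIC = {
    "add": InstrClass.PERF_CRITICAL,
    "sub": InstrClass.PERF_CRITICAL,
    "sll": InstrClass.PERF_CRITICAL,   # shift
    "beq": InstrClass.PERF_CRITICAL,   # branch/jump
    "csrr": InstrClass.NON_PERF_CRITICAL,
    "mul": InstrClass.LONG_CYCLE,
    "div": InstrClass.LONG_CYCLE,
}

def classify(mnemonic: str) -> InstrClass:
    # Unknown instructions default to non-performance-critical (an assumption).
    return _CLASS_BY_MNEMONIC.get(mnemonic, InstrClass.NON_PERF_CRITICAL)
```
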
In step 402, if the first pipeline stage satisfies the simplification condition, the processor simplifies the data forwarding network of the first pipeline stage.
Wherein simplifying the data forwarding network of the first pipeline stage instructs the first pipeline stage not to read the instruction execution results of some or all pipeline stages subsequent to the first pipeline stage; which results are excluded depends on the specific simplification condition that is satisfied.
Step 403, the processor controls the first pipeline stage to read the instruction execution results of the other pipeline stages based on the simplified data forwarding network.
It will be appreciated that the processor may process data on the pipeline stages based on whether a pipeline stage has an arithmetic unit, whether the instruction on the pipeline stage is a performance-critical instruction, and whether it is a long-cycle instruction, and thereby avoid forwarding instruction calculation results directly to all subsequent pipeline stages. The data forwarding network between the pipeline stages may therefore be simplified; while still reducing the waiting cycles caused by data dependencies, this lowers the probability of having to reduce the processor's main frequency and reduces the design complexity of the forwarding network.
The data processing method provided by the embodiment of the application offers an efficient architecture capable of forwarding data to the module that needs it, with minimum logic, at the proper point in time. That is, when a processor with multiple pipeline stages executes instructions, a determination can be made for each instruction on each pipeline stage, based on pipeline information such as whether the pipeline stage contains an arithmetic unit, whether the type of instruction on the pipeline stage is a performance-critical instruction, and whether it is a long-cycle instruction, to judge whether that pipeline stage satisfies the simplification condition of the forwarding network. When the simplification condition is satisfied, the forwarding network between that pipeline stage and the subsequent pipeline stages can be simplified: the use of logic resources is reduced, ultra-long data forwarding logic paths are avoided, and the circuitry between pipeline stages is reduced. This lowers the probability of having to reduce the processor's main frequency and simplifies the complexity of the forwarding network, so the execution performance of the processor can be improved while still reducing the time instructions wait on dependency relationships.
In an alternative embodiment of the present application, in conjunction with fig. 5, the above step 401 may be specifically the following steps 401a and 401b, and further the above step 402 may include the following step 402a:
Step 401a, the processor determines whether an arithmetic unit is present in the first pipeline stage.
It should be noted that in a processor with multiple pipeline stages, not every pipeline stage has an arithmetic unit; that is, some pipeline stages in the multi-stage pipeline have no arithmetic unit. For example, the EX pipeline stage in a nine-stage pipeline includes five EX sub-stages, some of which have no arithmetic unit and some of which do. The exact arrangement depends on the architecture used by the processor, which the embodiments of the application do not specifically limit.
In step 401b, if there is no arithmetic unit in the first pipeline stage, the processor determines whether the first instruction has dependency data in the target pipeline stage.
Wherein the first instruction is an instruction on the first pipeline stage at the current time. The data on the target pipeline stage will disappear from the pipeline in the next cycle.
The data on which the first instruction depends is an execution result of the instruction on which the first instruction depends.
The target pipeline stage may be, for example, the WB pipeline stage of a five-stage pipeline, or the EX4 stage of a nine-stage pipeline. In the nine-stage pipeline, the EX pipeline stage comprises the EX0, EX1, EX2, EX3 and EX4 stages; the WB pipeline stage may be a shadow register hidden in the EX4 stage, so the result of an instruction executed at the EX4 stage can be saved in the shadow register and simultaneously saved in the destination register. This hidden register is therefore the last position in the pipeline where an instruction execution result is available.
In step 401b described above, if the first pipeline stage has no arithmetic unit, data is (at most) fetched only from the target pipeline stage, and not from the other pipeline stages after the first pipeline stage; this in itself simplifies the data forwarding network.
In step 402a, if the first instruction does not have dependent data in the target pipeline stage, the processor simplifies the data forwarding network of the first pipeline stage.
Wherein simplifying the data forwarding network of the first pipeline stage indicates that the first pipeline stage does not read the execution results of instructions on subsequent pipeline stages other than the target pipeline stage. In other words, no forwarding network is built between the first pipeline stage and those subsequent pipeline stages, reducing forwarding logic.
Based on this scheme, when a pipeline stage has no arithmetic unit, the instruction in that pipeline stage performs no operation. If the data that the instruction depends on is not in the target pipeline stage, that is, the dependent data will not disappear from the pipeline for the time being, the processor can directly simplify the forwarding network between this pipeline stage and the other pipeline stages: the execution results of instructions on the other pipeline stages are not read, and no data forwarding network is established between them. This simplifies the complexity of the forwarding network and avoids forwarding data that is not yet needed, thereby improving the execution efficiency of the processor.
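The decision for a stage without an arithmetic unit (steps 401a, 401b and 402a) can be summarized as a small predicate. The sketch below is illustrative Python, not hardware; the function name and the returned outcome labels are hypothetical.

```python
def no_alu_stage_action(has_alu: bool, dep_in_target_stage: bool) -> str:
    """Decision sketch for steps 401a/401b/402a.

    A stage without an arithmetic unit never computes, so the only data
    it ever needs to capture is a dependence that is about to leave the
    pipeline (i.e. the operand sits in the target/WB stage this cycle).
    """
    if has_alu:
        return "further checks"          # handled by steps 401c onwards
    if dep_in_target_stage:
        return "read from target stage"  # data disappears next cycle
    return "simplify forwarding"         # step 402a: build no forwarding paths
```
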
In an alternative embodiment of the present application, after the above step 401b, the following steps may be further included: if the first pipeline stage does not have an operation unit and the first instruction has dependent data in the target pipeline stage, the processor does not simplify the data forwarding network and can control the first pipeline stage to read the data of the instruction of the target pipeline stage.
It will be appreciated that if no arithmetic unit is present in the current pipeline stage, the instruction performs no arithmetic operation there; if the data the instruction depends on will not disappear from the pipeline in the next cycle, the current pipeline stage need not read the operation results of instructions on other pipeline stages. If the instruction's dependent data will disappear from the pipeline in the next cycle, the processor controls the current pipeline stage to read the operation result on the target pipeline stage.
Based on this scheme, if there is no arithmetic unit in the pipeline stage, that is, the first instruction performs no operation in the current pipeline stage, but the data the first instruction depends on is about to disappear from the pipeline, the processor reads the data of the instruction on the target pipeline stage for the first pipeline stage, to avoid the waiting cycles the first instruction would otherwise incur by reading the data from a register.
In an alternative embodiment of the present application, after the step 401a, the following step 401c is further included:
In step 401c, if an arithmetic unit exists in the first pipeline stage, the processor determines whether the first instruction is a performance critical instruction.
Furthermore, in the data processing method provided by the embodiment of the present application, the following step 404 may be further included:
Step 404, if the first instruction is a performance-critical instruction, the processor controls the first instruction on the first pipeline stage to read the data of instructions on all pipeline stages following the first pipeline stage.
It should be noted that performance-critical instructions are generally numerous and heavily used in programs, and they can execute quickly once their operands are fetched. Therefore, if an instruction is a performance-critical instruction, the processor forwards data to it as aggressively as possible, so that the instruction completes quickly and unnecessary waiting is avoided.
Based on this scheme, if an arithmetic unit exists in the pipeline stage and the first instruction is a performance-critical instruction, the first instruction may need to be executed in the current pipeline stage, and such instructions are numerous. The processor therefore forwards data to them as aggressively as possible, so that they can quickly obtain the operands they need; the waiting cycles spent stalled for missing data are reduced as far as possible, which can greatly improve the execution efficiency and performance of the processor.
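The behaviour of step 404 (a performance-critical instruction in a stage with an arithmetic unit listens to every later stage) can be sketched as follows; the function name and stage indices are illustrative assumptions.

```python
def forwarding_sources(stage_index: int, total_stages: int,
                       is_perf_critical: bool) -> list:
    """Step 404 sketch: a performance-critical instruction in a stage
    with an arithmetic unit reads data forwarded from every pipeline
    stage after its own, so it can pick up its operands as early as
    possible."""
    if is_perf_critical:
        return list(range(stage_index + 1, total_stages))
    return []  # non-critical instructions are handled by later steps
```

For example, in a five-stage pipeline an add instruction in stage 0 would listen to stages 1 through 4.
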
In an alternative embodiment of the present application, after the step 401a, the following steps 401d and 401e are further included:
In step 401d, if there is an arithmetic unit in the first pipeline stage and the first instruction is not a performance-critical instruction, the processor determines whether the first instruction is a long-cycle instruction.
In step 401e, if the first instruction will only be executed multiple cycles later, in the second pipeline stage, the processor determines whether the first instruction has dependent data in the target pipeline stage.
Further, the step 402 may be performed by the following step 402b:
Step 402b, if there is no data on which the first instruction depends in the target pipeline stage, the processor simplifies the data forwarding network between the first pipeline stage and the other pipeline stages after it, except the second pipeline stage.
It will be appreciated that where there is an arithmetic unit in the first pipeline stage and the first instruction is not a performance-critical instruction, it may be determined whether the first instruction is a long-cycle instruction. If it is, the instruction requires no operation in the first pipeline stage; and if the data the first instruction depends on is not about to leave the pipeline, the first instruction does not need to fetch other instructions' data immediately at the first pipeline stage. The data forwarding network corresponding to the first instruction at the first pipeline stage may therefore be simplified; because the first instruction is executed at the second pipeline stage, the processor retains the forwarding network between the second pipeline stage and the first pipeline stage.
Based on this scheme, when an arithmetic unit exists in the current pipeline stage and the instruction is a long-cycle instruction, that is, the instruction will only be executed by the second pipeline stage several stages later and need not execute immediately, and the data on which the first instruction depends will not disappear from the pipeline for the time being, the processor can retain the data forwarding network between the first pipeline stage and the pipeline stage that executes the first instruction, and simplify the data forwarding network between the first pipeline stage and the other pipeline stages. Useless data forwarding is avoided, improving the execution efficiency of the processor.
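The long-cycle branch (steps 401d/401e, 402b and 405) can be sketched as the set of forwarding sources the instruction keeps. The Python below is an illustration; the stage names and the `"target"` label are hypothetical.

```python
def long_cycle_forwarding(exec_stage: str, dep_in_target: bool) -> set:
    """Sketch of steps 401d/401e, 402b and 405 for a long-cycle
    instruction (e.g. MUL/DIV) sitting in a stage that has an
    arithmetic unit. Returns the forwarding sources retained."""
    if dep_in_target:
        # Step 405: the operand leaves the pipeline next cycle, read it now.
        return {"target"}
    # Step 402b: keep only the link to the stage that will actually
    # execute the instruction; drop forwarding from all other stages.
    return {exec_stage}
```
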
In an alternative embodiment of the present application, after the step 401e, the following step 405 may be further included:
In step 405, if the first instruction will only be executed multiple cycles later and the data on which the first instruction depends is in the target pipeline stage, the processor controls the instruction of the first pipeline stage to read the data of the instruction of the target pipeline stage.
Based on this scheme, when the current pipeline stage includes an execution unit, the instruction on the pipeline stage is a long-cycle instruction, and the data the instruction depends on is about to leave the pipeline, the processor must control the instruction on the first pipeline stage to read the dependent data immediately. This avoids the situation where the data can no longer be read within the pipeline and must be read again from a register, with the associated waiting cycles, thereby improving the execution efficiency of the processor.
In an alternative embodiment of the present application, in the above step 404 or step 405, the following step 400 may be further included:
Step 400, the processor delays releasing the dependency of the first instruction and the second instruction.
Wherein the second instruction is an instruction on the target pipeline stage that has a dependency relationship with the first instruction.
Based on this scheme, when the processor determines that the data forwarding network can be simplified, the processor can delay releasing the dependency relationship between the current instruction and the instruction it depends on, releasing it only once execution of the current instruction is complete.
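The delayed release in step 400 can be sketched with a minimal scoreboard-style structure. This Python sketch is an illustration only; the class name, method names and the use of (consumer, producer) pairs are assumptions, not the patented implementation.

```python
class Scoreboard:
    """Minimal sketch of step 400: when forwarding for an instruction
    has been simplified, the dependency between it and its producer is
    released only when the consumer completes, not when the producer's
    result first becomes visible."""

    def __init__(self):
        self.pending = set()  # (consumer, producer) pairs still linked

    def add_dependency(self, consumer, producer):
        self.pending.add((consumer, producer))

    def producer_result_ready(self, consumer, producer, forwarding_simplified):
        # With a full forwarding network the link can be dropped as soon
        # as the result is visible; with a simplified network, release
        # is delayed until the consumer finishes (step 400).
        if not forwarding_simplified:
            self.pending.discard((consumer, producer))

    def consumer_completed(self, consumer):
        self.pending = {(c, p) for (c, p) in self.pending if c != consumer}
```
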
Examples:
FIG. 6 is a simplified logic diagram of a forwarding network according to one embodiment of the present application. ALU1 instructions and BJP instructions are performance-critical instructions that the processor can process quickly, so these instructions receive the most complete data forwarding. The AGU1 instruction, used for memory access, also belongs to the performance-critical instructions, so it likewise receives the most complete data forwarding. Because an instruction can only receive forwarded data from the pipeline stages behind it, the forwarding network of the EX0 stage is the most complex, then the EX2 stage, and finally the EX4 stage. ALU (Arithmetic and Logic Unit) instructions are commonly used in a computer's instruction set to perform arithmetic and logic operations. BJP (Branch & Jump) instructions are commonly used to execute branch and jump operations, such as the usual if-else construct or loop body. AGU (Address Generation Unit) instructions are used to calculate addresses and access memory.
FIG. 7 is a simplified logic diagram of a forwarding network according to one embodiment of the present application, covering the pipeline stages without arithmetic units, such as the EX1 stage and the EX3 stage. For the EX1 stage, the AGU2 instruction at this point is processing data that needs to be saved to memory, and no arithmetic is required here, so this pipeline stage effectively does not execute the AGU2 instruction. For the EX3 stage, the only data forwarding source is the necessary write-back port: if the data that the instruction on the EX1 stage or EX3 stage depends on happens to be on the EX4 stage, that is, the instruction sees its required data on the write-back port, the data is forwarded; otherwise no special processing is performed. With continued reference to fig. 7, MUL (multiply) and DIV (divide) instructions are processed only at the EX4 stage, so in the present application they do not receive immediate forwarding while at the EX0, EX1, EX2 and EX3 stages, even if they have already seen the required data on a pipeline stage other than EX4 (where, according to the related art, data forwarding would start). Forwarding data in this case would not increase the execution speed of the MUL or DIV instruction; it would only waste forwarding resources and complicate the forwarding logic of these pipeline stages. Therefore, for MUL and DIV instructions, no special processing is performed unless the dependent data has reached the EX4 stage, i.e. the data is forwarded immediately before being written back to the register file.
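The forwarding fan-in described for figs. 6 and 7 can be mirrored in a small table: sub-stages without an arithmetic unit keep only the write-back port, while sub-stages with one listen to every later stage. This Python sketch is illustrative; which sub-stages carry arithmetic units is an assumption consistent with the figure descriptions, not a statement about any specific processor.

```python
EX_STAGES = ["EX0", "EX1", "EX2", "EX3", "EX4"]
HAS_ALU = {"EX0": True, "EX1": False, "EX2": True, "EX3": False, "EX4": True}

def forwarding_fanin(consumer: str) -> list:
    """Forwarding sources retained for an instruction in each EX
    sub-stage: stages without an arithmetic unit keep only the
    write-back port (fig. 7); stages with one listen to every later
    stage plus the write-back port (fig. 6)."""
    idx = EX_STAGES.index(consumer)
    if not HAS_ALU[consumer]:
        return ["WB"]                      # write-back port only
    return EX_STAGES[idx + 1:] + ["WB"]    # full fan-in
```

The fan-in shrinks from EX0 to EX4, matching the text's observation that the EX0 forwarding network is the most complex, then EX2, and finally EX4.
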
The data processing method provided by the embodiment of the application simplifies the data forwarding of the pipeline stages, greatly reducing design complexity, reducing logic consumption, and raising the operating frequency of the pipeline stages. Starting from practical application scenarios, the designed multi-stage pipeline data forwarding network simplifies the forwarding network of certain pipeline stages while preserving instruction execution efficiency. The complexity of the whole data forwarding network is reduced, logic consumption is lowered (which also lowers power consumption), and the working frequency of each pipeline stage can be improved.
The data processing mode of the application yields a simplified data forwarding network architecture suitable for various processor architectures, and its benefits become more pronounced as the number of pipeline stages and the number of issued instructions grow, because a complex pipeline implies a complex data forwarding network, and if these data networks are not trimmed, the operating frequency of the processor is inevitably affected. In a classical five-stage pipeline, the EX pipeline stage and the MEM pipeline stage occupy two separate stages; the EX pipeline stage can judge whether an instruction is a performance-critical instruction and read data as eagerly as possible, so that pipeline stalls are avoided. For the MEM pipeline stage, since a memory address cannot be of a floating-point data type, this stage can be considered to have no floating-point execution unit, and there is then no need to read the results of floating-point instructions. In the nine-stage pipeline of the present application there are more pipeline stages, so the individual execution units are not evenly distributed across them. For pipeline stages where no execution unit is present, no connection is established to pipeline stages other than the WB pipeline stage, simplifying the data forwarding network between these stages. When an execution unit needs to execute a performance-critical instruction, the pipeline stage establishes a data forwarding network with the other pipeline stages, and the completeness of these forwarding networks is preserved as far as possible, so that the instructions executing in the execution units can obtain the data they need as soon as possible.
Finally, logically complex instructions (executed on later pipeline stages), such as divide instructions and floating-point instructions, are implemented only in the last pipeline stage of this architecture. These instructions therefore establish a data forwarding network with the other pipeline stages only at the WB pipeline stage; at the other pipeline stages they have no, or only very few, data forwarding relationships, so the final forwarding network can also be greatly simplified. In a dual-issue nine-stage pipeline, the forwarding network for such instructions may likewise be tailored, using the same logic as above, by determining whether a pipeline will execute the corresponding type of instruction: where execution units exist and are performance-critical units, there is more forwarding; where no execution units exist, there is little or no forwarding. In summary, by fully exploiting the architecture and instruction characteristics, the method removes unnecessary data forwarding networks from the pipeline without harming processor performance, raising the working frequency of the hardware circuit and thereby improving the performance of the whole processor.
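The pruning rules discussed throughout can be collected into a single predicate that returns the forwarding sources kept for an instruction in a given stage. This Python sketch is illustrative; the final fallback branch (a plain non-performance-critical instruction in a stage with an arithmetic unit) is an assumption, since the text leaves that case to the full forwarding network.

```python
def retained_forwarding(has_alu: bool, is_perf_critical: bool,
                        is_long_cycle: bool, dep_in_target: bool,
                        later_stages: list, exec_stage: str) -> set:
    """One-predicate summary (sketch) of the pruning rules: which
    forwarding sources the instruction in the current stage keeps."""
    if not has_alu:
        # No computation here: at most capture data leaving the pipeline.
        return {"target"} if dep_in_target else set()
    if is_perf_critical:
        return set(later_stages)      # full forwarding, step 404
    if is_long_cycle:
        # Keep only the target (if its data is leaving) or the stage
        # that will execute the instruction (steps 402b / 405).
        return {"target"} if dep_in_target else {exec_stage}
    return set(later_stages)          # assumed default for other cases
```
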
It should be understood that, although the steps in the flowchart are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; nor is the order in which these sub-steps or stages are performed necessarily sequential, as they may be performed in turn or alternately with at least some of the other steps or sub-steps.
Referring to fig. 8, an embodiment of the present application provides a schematic structural diagram of a data processing apparatus. The data processing apparatus 800 includes: a determination module 801, a simplification module 802, and a control module 803. The determining module 801 is configured to determine, based on pipeline information, whether a first pipeline stage satisfies a simplification condition of the data forwarding network, where the first pipeline stage is any one of the multiple pipeline stages; the simplification module 802 is configured to simplify the data forwarding network of the first pipeline stage if the first pipeline stage meets the simplification condition; the control module 803 is configured to control the first pipeline stage to read the instruction execution results of the other pipeline stages based on the simplified data forwarding network. Wherein the pipeline information includes at least one of: whether an arithmetic unit is present in the pipeline stage, whether an instruction on the pipeline stage is a performance-critical instruction, and whether an instruction on the pipeline stage is a long-cycle instruction.
In an alternative embodiment of the present application, the determining module 801 is specifically configured to: determining whether an arithmetic unit is present in the first pipeline stage; if the first pipeline stage does not have an operation unit, determining whether the first instruction has dependent data in the target pipeline stage, wherein the first instruction is an instruction on the first pipeline stage at the current moment, and the data on the target pipeline stage disappears from the pipeline in the next period; the simplification module 802 is specifically configured to: if the first instruction does not have the dependent data in the target pipeline stage, simplifying a data forwarding network of the first pipeline stage; wherein the data forwarding network of the first pipeline stage is simplified to indicate that the first pipeline stage does not read the execution result of the subsequent instructions of the other pipeline stages than the target pipeline stage.
In an alternative embodiment of the present application, the control module 803 is further configured to: after the determination module 801 determines whether the first instruction has dependency data in the target pipeline stage, if the arithmetic unit is not present in the first pipeline stage and the first instruction has dependency data in the target pipeline stage, the first pipeline stage is controlled to read data of the instruction of the target pipeline stage.
In an alternative embodiment of the present application, the determining module 801 is further configured to: after determining whether an operation unit exists in the first pipeline stage, if the operation unit exists in the first pipeline stage, judging whether the type of the first instruction is a performance critical instruction; the control module 803 is further configured to: if the first instruction is a performance critical instruction, the first instruction on the first pipeline stage is controlled to read data of instructions on all pipeline stages subsequent to the first pipeline stage.
In an alternative embodiment of the present application, the determining module 801 is further configured to: if an operation unit exists in the first pipeline stage and the first instruction is not a performance critical instruction, judging whether the instruction on the first pipeline stage is a long-period instruction or not; if the execution time of the first instruction is a plurality of periods and then the first instruction is executed in the second pipeline stage, judging whether the first instruction has dependent data in the target pipeline stage or not; the simplification module 802 is also used to: if there is no data on which the first instruction depends at the target pipeline stage, the data forwarding network of other pipeline stages after the first pipeline stage, except the second pipeline stage, is simplified.
In an alternative embodiment of the present application, the control module 803 is further configured to: after the determining module 801 determines whether the first instruction has dependent data in the target pipeline stage, if the execution timing of the first instruction is after a plurality of cycles and the first instruction dependent data is in the target pipeline stage, the instruction controlling the first pipeline stage reads the data of the instruction of the target pipeline stage.
In an alternative embodiment of the present application, in conjunction with fig. 8, as shown in fig. 9, the data processing apparatus 800 further includes: a dependency release module 804; the dependency release module 804 is configured to delay release of a dependency of a first instruction and a second instruction, where the second instruction is an instruction having a dependency with the first instruction on a target pipeline stage.
The data processing device provided by the embodiment of the application offers an efficient architecture capable of forwarding data to the module that needs it, with minimum logic, at the proper point in time. That is, when a processor with multiple pipeline stages executes instructions, the data processing device can make a determination for each instruction on each pipeline stage, based on pipeline information such as whether the pipeline stage contains an arithmetic unit, whether the type of instruction on the pipeline stage is a performance-critical instruction, and whether it is a long-cycle instruction, to judge whether that pipeline stage satisfies the simplification condition of the forwarding network. When the simplification condition is satisfied, the data processing device can simplify the forwarding network between that pipeline stage and the subsequent pipeline stages: the use of logic resources is reduced, ultra-long data forwarding logic paths are avoided, and the circuitry between pipeline stages is reduced. This lowers the probability of having to reduce the processor's main frequency and simplifies the complexity of the forwarding network, so the execution performance of the processor can be improved while still reducing the time instructions wait on dependency relationships.
The specific limitation of the data processing apparatus may be referred to as limitation of the data processing method hereinabove, and will not be described herein. Each of the modules in the above-described data processing apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, the internal structure of which may be as shown in FIG. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data processing method as described above.
In one embodiment, there is provided a computer device comprising: the computer program comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes any step of the data processing method when executing the computer program.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored which, when executed by a processor, can implement any of the steps of the data processing method as above.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A data processing method for use in a processor having a plurality of pipeline stages, the method comprising:
determining, based on pipeline information, whether a first pipeline stage satisfies a simplification condition of a data forwarding network, wherein the first pipeline stage is any one of the plurality of pipeline stages;
if the first pipeline stage satisfies the simplification condition, simplifying the data forwarding network of the first pipeline stage;
controlling the first pipeline stage to read the execution results of instructions of other pipeline stages based on the simplified data forwarding network;
Wherein the pipeline information includes at least one of: whether an arithmetic unit is present in the pipeline stage, whether an instruction on the pipeline stage is a performance critical instruction, and whether an instruction on the pipeline stage is a long-cycle instruction.
2. The method of claim 1, wherein determining whether the first pipeline stage satisfies a simplification condition based on pipeline information comprises:
determining whether an arithmetic unit is present in the first pipeline stage;
if the first pipeline stage does not have an arithmetic unit, determining whether a first instruction has dependent data in a target pipeline stage, wherein the first instruction is the instruction on the first pipeline stage at the current moment, and the data on the target pipeline stage leaves the pipeline in the next cycle;
and if the first pipeline stage meets the simplification condition, simplifying the data forwarding network of the first pipeline stage, including:
if the first instruction does not have the dependent data in the target pipeline stage, simplifying a data forwarding network of the first pipeline stage;
wherein simplifying the data forwarding network of the first pipeline stage instructs the first pipeline stage not to read the execution results of subsequent instructions of pipeline stages other than the target pipeline stage.
3. The method of claim 2, wherein after the determining whether the first instruction has dependent data in the target pipeline stage, the method further comprises:
and if the first pipeline stage does not have an arithmetic unit and the first instruction has dependent data in the target pipeline stage, controlling the first pipeline stage to read the data of the instruction on the target pipeline stage.
4. The method of claim 2, wherein after the determining whether an arithmetic unit is present in the first pipeline stage, the method further comprises:
if an arithmetic unit exists in the first pipeline stage, determining whether the first instruction is a performance-critical instruction;
if the first instruction is a performance-critical instruction, controlling the first instruction on the first pipeline stage to read data of instructions on all pipeline stages subsequent to the first pipeline stage.
5. The method of claim 4, wherein after the determining whether the first instruction is a performance-critical instruction, the method further comprises:
if an arithmetic unit exists in the first pipeline stage and the first instruction is not a performance-critical instruction, determining whether the first instruction is a long-cycle instruction;
if the first instruction is executed in a second pipeline stage after a plurality of cycles, determining whether the first instruction has dependent data in the target pipeline stage;
if the first instruction has no dependent data in the target pipeline stage, simplifying the data forwarding networks of the pipeline stages after the first pipeline stage other than the second pipeline stage.
6. The method of claim 5, wherein after the determining whether the first instruction has dependent data in the target pipeline stage, the method further comprises:
and if the first instruction is executed after a plurality of cycles and the data on which the first instruction depends is in the target pipeline stage, controlling the instruction of the first pipeline stage to read the data of the instruction of the target pipeline stage.
7. The method according to claim 2 or 5, characterized in that the method further comprises:
delaying release of the dependency relationship between the first instruction and a second instruction, wherein the second instruction is an instruction on the target pipeline stage that has a dependency relationship with the first instruction.
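The decision procedure recited in claims 2 through 6 can be sketched as follows. This is a hypothetical Python model for illustration only; the names `PipelineStage`, `Instruction`, `forwarding_sources`, and all flag fields are assumptions, not terminology from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Instruction:
    # All flags are illustrative assumptions, not fields from the patent.
    is_performance_critical: bool = False
    is_long_cycle: bool = False            # result produced only after several cycles
    depends_on_target_stage: bool = False  # dependent data sits in the target stage

@dataclass
class PipelineStage:
    has_arithmetic_unit: bool
    instruction: Optional[Instruction] = None

def forwarding_sources(stage: PipelineStage,
                       later_stages: List[str],
                       target_stage: str,
                       second_stage: str) -> List[str]:
    """Decide which later stages this stage must read forwarded results from,
    following the decision order of claims 2, 4 and 5."""
    insn = stage.instruction
    if insn is None:
        return []
    if not stage.has_arithmetic_unit:
        # Claims 2-3: no arithmetic unit -> read only from the target stage,
        # and only when the dependent data is about to leave the pipeline.
        return [target_stage] if insn.depends_on_target_stage else []
    if insn.is_performance_critical:
        # Claim 4: keep the full forwarding network for critical instructions.
        return list(later_stages)
    if insn.is_long_cycle:
        # Claims 5-6: a long-cycle instruction executes in the second pipeline
        # stage; forwarding from the other later stages can be simplified away.
        return [target_stage] if insn.depends_on_target_stage else [second_stage]
    return list(later_stages)
```

Under this reading, the forwarding network is only retained in full for performance-critical instructions; the other branches prune read ports, which is what the claims call "simplifying" the data forwarding network.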
8. A data processing apparatus provided in a processor having a plurality of pipeline stages, the data processing apparatus comprising: a determining module, a simplifying module, and a control module;
the determining module is configured to determine, based on pipeline information, whether a first pipeline stage satisfies a simplification condition, wherein the first pipeline stage is any one of the plurality of pipeline stages;
the simplifying module is configured to simplify the data forwarding network of the first pipeline stage if the first pipeline stage satisfies the simplification condition;
the control module is configured to control the first pipeline stage to read the execution results of instructions of other pipeline stages based on the simplified data forwarding network;
Wherein the pipeline information includes at least one of: whether an arithmetic unit is present in the pipeline stage, whether an instruction on the pipeline stage is a performance critical instruction, and whether an instruction on the pipeline stage is a long-cycle instruction.
9. A computer device, comprising: a memory storing a computer program, and a processor, wherein the processor implements the steps of the data processing method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the data processing method of any of claims 1 to 7.
CN202410726284.6A 2024-06-05 2024-06-05 Data processing method, device, equipment and medium Active CN118295712B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410726284.6A CN118295712B (en) 2024-06-05 2024-06-05 Data processing method, device, equipment and medium


Publications (2)

Publication Number Publication Date
CN118295712A true CN118295712A (en) 2024-07-05
CN118295712B CN118295712B (en) 2024-09-17

Family

ID=91688400

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410726284.6A Active CN118295712B (en) 2024-06-05 2024-06-05 Data processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN118295712B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060179100A1 (en) * 2005-02-09 2006-08-10 International Business Machines Corporation System and method for performing floating point store folding
US20060200654A1 (en) * 2005-03-04 2006-09-07 Dieffenderfer James N Stop waiting for source operand when conditional instruction will not execute
US20110153628A1 (en) * 2009-12-21 2011-06-23 Clear Channel Management Services, Inc. Process and workflow for enterprise data matching
CN104156195A (en) * 2014-08-19 2014-11-19 中国航天科技集团公司第九研究院第七七一研究所 System and method for integrating double-precision-extension 80-digit floating point processing unit in processor
CN105264525A (en) * 2013-06-04 2016-01-20 马维尔国际贸易有限公司 Internal search engine architecture
CN114868368A (en) * 2020-01-14 2022-08-05 思科技术公司 Dynamic hardware forwarding pipeline compression
CN117193861A (en) * 2023-11-07 2023-12-08 芯来智融半导体科技(上海)有限公司 Instruction processing method, apparatus, computer device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
万律 (Wan Lü), 李飞鸣 (Li Feiming), 陈进 (Chen Jin): "A Quantitative Method for Complex Pipeline Management in Microprocessors" (微处理器复杂流水线管理量化方法), Journal of Chinese Computer Systems (小型微型计算机系统), no. 08, 21 August 2004 (2004-08-21), pages 1517-1521 *

Also Published As

Publication number Publication date
CN118295712B (en) 2024-09-17

Similar Documents

Publication Publication Date Title
US8918627B2 (en) Multithreaded processor with multiple concurrent pipelines per thread
US20040139299A1 (en) Operand forwarding in a superscalar processor
US10846092B2 (en) Execution of micro-operations
CN108415730B (en) Microinstruction scheduling method and device using the same
CN117193861B (en) Instruction processing method, apparatus, computer device and storage medium
CN108279928B (en) Microinstruction scheduling method and device using the same
CN101201732A (en) 32-bit multi-mode microprocessor
US8977837B2 (en) Apparatus and method for early issue and recovery for a conditional load instruction having multiple outcomes
JP2000322257A (en) Speculative execution control method for conditional branch instruction
CN101371223B (en) Early conditional selection of an operand
US12086592B2 (en) Processor, processing method, and related device for accelerating graph calculation
US6983359B2 (en) Processor and method for pre-fetching out-of-order instructions
US7539847B2 (en) Stalling processor pipeline for synchronization with coprocessor reconfigured to accommodate higher frequency operation resulting in additional number of pipeline stages
CN112379928B (en) Instruction scheduling method and processor comprising instruction scheduling unit
CN118295712B (en) Data processing method, device, equipment and medium
US5737562A (en) CPU pipeline having queuing stage to facilitate branch instructions
US20030120899A1 (en) Apparatus and method for processing an interrupt in a software pipeline loop procedure in a digital signal processor
RU2816094C1 (en) Vliw processor with additional preparation pipeline and transition predictor
CN117008977B (en) Instruction execution method, system and computer equipment with variable execution period
JP2000029696A (en) Processor, and pipeline process control method
CN117742796B (en) Instruction awakening method, device and equipment
US20040139300A1 (en) Result forwarding in a superscalar processor
US6490653B1 (en) Method and system for optimally issuing dependent instructions based on speculative L2 cache hit in a data processing system
CN117806712A (en) Instruction processing method, apparatus, computer device and storage medium
CN120179296A (en) Pseudo-disordered instruction scheduling method based on branch jump

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant