CN101923386A

CN101923386A - Method and device for reducing CPU power consumption and low power consumption CPU

Info

Publication number: CN101923386A
Application number: CN2010102568302A
Authority: CN
Inventors: 张紧; 姜君; 晏晓京
Original assignee: Beijing Ingenic Semiconductor Co Ltd
Current assignee: Beijing Ingenic Semiconductor Co Ltd
Priority date: 2010-08-18
Filing date: 2010-08-18
Publication date: 2010-12-22
Anticipated expiration: 2030-08-18
Also published as: CN101923386B

Abstract

The invention provides a method and a device for reducing CPU power consumption, and a low-power CPU. The method comprises the following steps. First, compare the register-file write operation of the current instruction with those of the next n instructions. Second, if the current instruction and at least one of the next n instructions both perform a write, and at least one of those instructions writes the same register address as the current instruction, omit the current instruction's write to that register address. Here n is the maximum span of the bypass paths in the CPU pipeline. The invention reduces CPU power consumption by omitting redundant writes to the general register file (GRF) in the WB step. In hardware terms, the invention only requires adding one logic module to the existing CPU structure. It therefore makes full use of existing circuitry without adding many hardware units.

Description

Method and device for reducing CPU power consumption, and low-power CPU

Technical field

The present invention relates to the field of microprocessor technology. In particular, it relates to a method and a device for reducing CPU power consumption, and to a low-power CPU.

Background art

Pipelining was first widely used in the hardware design of RISC (reduced instruction set computer) CPUs. Today, both RISC and CISC (complex instruction set computer) CPUs are implemented in hardware with multi-stage pipelines. Take the classic five-stage RISC CPU pipeline as an example. As shown in Figure 1, it comprises five steps, IF, DE, EX, MEM and WB, whose functions are as follows:

1) IF step: reads an instruction from instruction memory (for example RAM or a cache) and stores it in the IF/DE pipeline register R_IF, etc.;

2) DE step: decodes the instruction taken from R_IF and generates the current instruction's control information. It also reads data from the register file GRF and stores the control information and data in the DE/EX pipeline register R_DE, etc. The control information has three parts:
- E, for the subsequent EX step: includes the arithmetic operation control (ALUop), the ALU data source selection (ALUSrc), etc.;
- M, for the MEM step: includes whether to read memory (Memread), whether to write memory (Memwrite), etc.;
- W, for the WB step: includes whether to write the GRF (GRFWrite), the GRF data source selection (MemReg), the GRF register address to be written (GRFWaddr), etc.;

3) EX step: obtains E, M and W from R_DE and obtains the operands from the bypass or from R_DE. It then performs the data operation according to E, writes the result to the EX/MEM pipeline register R_EX, and writes M and W to R_EX, etc.;

4) MEM step: obtains M and W from R_EX. According to M, it operates on the content obtained from the bypass or from R_EX and accesses memory with the operation result. It then writes either the operation result or the content obtained from the bypass or R_EX into the MEM/WB pipeline register R_MEM, and writes W to R_MEM, etc.;

5) WB step: obtains W from R_MEM and, according to W, writes the content obtained from R_MEM to the GRF, etc. If the bypass path shown by the dashed line exists, the corresponding content is also stored in the pipeline register R_WB that follows the WB step.
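The timing in Figure 1 can be made concrete with a small Python sketch. It assumes, as drawn in the figure, that INST_x enters IF at T_y and that one instruction issues per cycle with no stalls. The names `stage_of` and `in_flight` are illustrative only.

```python
from typing import Dict, Optional

STAGES = ("IF", "DE", "EX", "MEM", "WB")

def stage_of(offset: int, cycle: int) -> Optional[str]:
    """Stage of INST_{x+offset} during cycle T_{y+cycle}, or None if it is not in the pipeline."""
    k = cycle - offset  # cycles elapsed since this instruction entered IF
    return STAGES[k] if 0 <= k < len(STAGES) else None

def in_flight(cycle: int, count: int = 5) -> Dict[int, str]:
    """Map instruction offset -> stage for the first `count` instructions."""
    result = {}
    for offset in range(count):
        stage = stage_of(offset, cycle)
        if stage is not None:
            result[offset] = stage
    return result

# While INST_x is in WB (cycle T_{y+4}), its successors occupy MEM, EX, DE and IF.
print(in_flight(4))  # {0: 'WB', 1: 'MEM', 2: 'EX', 3: 'DE', 4: 'IF'}
```

This snapshot explains why, while INST_x is in WB, the control information of INST_x+1 and INST_x+2 sits in R_EX and R_DE, and INST_x+3 is still being decoded.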

It should also be noted that, in Figure 1:

1) INST_x, INST_x+1, INST_x+2, INST_x+3 and INST_x+4 denote consecutive instructions executed in order by the CPU. Each instruction flows through the five processing steps of the CPU pipeline;

2) T_y, T_y+1, T_y+2, T_y+3, T_y+4, T_y+5, ... denote the start times of successive CPU clock cycles. Adjacent time points are one CPU clock cycle apart;

3) The lines between instructions represent the bypass paths in the CPU. Examples are the line from R_EX to EX, the line from R_MEM to MEM or EX, and the line from R_WB to MEM or EX. A bypass delivers data produced by one instruction directly to the later instructions that use it, before the data is written to the register file GRF. The bypass path drawn with a dashed line may or may not exist, depending on the GRF design. Suppose the GRF read in the DE step occurs on the falling clock edge, the GRF write in the WB step occurs on the rising edge, and data written on a rising edge can be read on the falling edge of the same cycle. In that case this bypass path is not needed; otherwise it is required;

4) If the dashed bypass path exists, a pipeline register R_WB follows the WB step; otherwise this pipeline register is not needed;

5) The figure only shows the bypass paths from instruction INST_x to the three instructions that follow it. In fact, every instruction has bypass paths that pass its data directly to subsequent instructions.

The bypass mechanism optimizes the CPU pipeline. Without it, instruction execution efficiency would suffer a very large loss, dropping to half or even worse. Executing an instruction generally requires two GRF accesses. First, the source operands are read from the register file GRF. Then, once the instruction completes, its result (the destination operand) is written back to the GRF. Programs commonly contain tight read-after-write data dependences: data produced (written) by one instruction is used (read) by the 1st, 2nd, 3rd, ... instruction immediately after it. Because instructions are pipelined, the GRF read and the GRF write of an instruction are several clock cycles apart. For efficiency, however, adjacent instructions are ideally only one cycle apart. Suppose a later instruction needs an earlier instruction's result and there is no bypass path. The later instruction must then wait until the earlier one has written the result to the GRF before reading it, which stalls the pipeline for several cycles. The bypass is a forwarding mechanism that does not prevent the data from being written to the GRF. It delivers the data directly to the 1st, 2nd, 3rd, ... following instructions that need it. The data thus reaches the circuits that use it directly, instead of advancing stage by stage through the pipeline.
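The following sketch shows why a missing bypass is so costly. It counts the bubbles a dependent instruction needs if it can only read the result from the GRF. It rests on these assumptions:
- the producer enters IF in cycle 0 and writes the GRF in WB in cycle 4;
- the consumer, `distance` instructions later, would normally read the GRF in its DE step in cycle `distance + 1`;
- the GRF either does or does not make a rising-edge write visible to a falling-edge read in the same cycle (the two cases in note 3 on Figure 1).

`stall_cycles` is a hypothetical name.

```python
WB_CYCLE = 4  # the producer writes the GRF in cycle 4 if it entered IF in cycle 0

def stall_cycles(distance: int, same_cycle_read: bool) -> int:
    """Bubbles needed by a consumer `distance` instructions after its producer, with no bypass."""
    if distance < 1:
        raise ValueError("the consumer must follow the producer")
    # Earliest cycle in which the DE-step GRF read can see the written value.
    earliest_read = WB_CYCLE if same_cycle_read else WB_CYCLE + 1
    natural_read = distance + 1  # the consumer's DE cycle when nothing stalls
    return max(0, earliest_read - natural_read)

for same_cycle in (True, False):
    print(same_cycle, [stall_cycles(d, same_cycle) for d in (1, 2, 3, 4)])
# True [2, 1, 0, 0]
# False [3, 2, 1, 0]
```

Under these assumptions, the GRF can be read in the same cycle it is written. Dependences up to 2 instructions apart then need forwarding, and up to 3 apart when same-cycle reads are not possible. This matches the bypass spans of 2 and 3 that the text associates with the two GRF designs.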

Which steps an instruction executes depends on the instruction itself: some instructions execute all five steps, others only some of them. This is controlled by the control information decoded in the DE step. If a step is not executed, the content of the pipeline register before that step is passed on, during the step's time slot, to the pipeline register after it.

As the above CPU design shows, the bypass optimization lets current CPU pipeline structures greatly improve execution efficiency. The open question is how to reduce CPU power consumption while keeping that efficiency, so as to produce CPUs with the lowest possible power consumption. This is a future development trend in the microprocessor field, and it is the problem the present invention addresses.

Summary of the invention

The technical problem solved by the present invention is to provide a method and a device for reducing CPU power consumption, and a low-power CPU, so as to reduce CPU power consumption.

To solve this problem, the invention discloses a method for reducing CPU power consumption, comprising:

comparing the register-file write operations of the current instruction and of the next n instructions. The current instruction's write to a register address is omitted when both of the following hold:
- the current instruction and at least one of the next n instructions both perform a write;
- at least one of those instructions writes the same register address as the current instruction.

Here n is the maximum span of the bypass paths in the CPU pipeline.

Preferably, the method further comprises performing the current instruction's write to that register address in either of these cases:
- none of the next n instructions performs a write;
- at least one of them performs a write, but every writing instruction writes a register address different from the current instruction's.

Preferably, the comparing comprises: obtaining the register-file write control information of the current instruction and of the next n instructions. The control information of each instruction indicates whether it writes and which register address it writes. When this control information shows that the current instruction writes the register file, the following judgment is made for each of the next n instructions. First, determine whether it writes the register file. If it does, determine whether the register address it writes is identical to the register address written by the current instruction.
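The comparison described above can be sketched in Python as follows. `WriteCtrl` is a hypothetical container for the two W fields the comparison needs; its fields are named after GRFWrite and GRFWaddr. `later` holds the fields of the next n instructions. This models the decision logic only; it is not a hardware description.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class WriteCtrl:
    """The two W fields the comparison needs (hypothetical container)."""
    grf_write: bool  # GRFWrite: does the instruction write the GRF?
    grf_waddr: int   # GRFWaddr: register address it writes

def write_is_redundant(current: WriteCtrl, later: Sequence[WriteCtrl]) -> bool:
    """True if the current instruction's GRF write may be omitted.

    `later` holds the W fields of the next n instructions, where n is the
    maximum bypass span of the pipeline.
    """
    if not current.grf_write:
        return False  # nothing to omit: the current instruction does not write
    for nxt in later:
        # First check the write flag, then compare the addresses.
        if nxt.grf_write and nxt.grf_waddr == current.grf_waddr:
            return True
    return False

print(write_is_redundant(WriteCtrl(True, 2), [WriteCtrl(False, 0), WriteCtrl(True, 2)]))  # True
print(write_is_redundant(WriteCtrl(True, 2), [WriteCtrl(True, 5), WriteCtrl(False, 2)]))  # False
```

The second call shows why the write flag is checked before the address. A later instruction whose address field happens to match, but which does not write the GRF, does not make the current write redundant.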

Preferably, the method further handles how a later instruction obtains the current instruction's result:
- if the current instruction's write to the register address is omitted, an instruction among the next n instructions obtains, through a bypass path, the result the current instruction was to write to that address;
- if the write is performed, an instruction among the next n instructions either reads the result from that register address or obtains it through a bypass path.

Preferably, the method obtains the register-file write control information of the current instruction and the next n instructions as follows. The pipeline registers of a five-stage CPU pipeline are, in order, R_IF, R_DE, R_EX, R_MEM and R_WB, and the maximum value of n is 3. The write control information is taken from:
- pipeline register R_MEM, for the current instruction;
- R_EX, for the first following instruction;
- R_DE, for the second following instruction;
- the instruction-decode result, before it is written into R_DE, for the third following instruction.
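The source of each piece of control information can be sketched as a selection over the pipeline registers. The `PipelineState` snapshot and its field names are hypothetical. Treating an empty slot (a bubble) as a non-writing instruction is an assumption that the text does not state.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class WriteCtrl:
    grf_write: bool  # GRFWrite
    grf_waddr: int   # GRFWaddr

NO_WRITE = WriteCtrl(False, 0)  # assumed encoding of a bubble

@dataclass
class PipelineState:
    """W fields visible in one clock cycle while INST_x is in WB (hypothetical)."""
    r_mem: WriteCtrl                  # INST_x,   now in WB
    r_ex: Optional[WriteCtrl]         # INST_x+1, now in MEM
    r_de: Optional[WriteCtrl]         # INST_x+2, now in EX
    decoder_out: Optional[WriteCtrl]  # INST_x+3, now in DE, not yet latched into R_DE

def gather_write_ctrl(state: PipelineState, n: int) -> Tuple[WriteCtrl, List[WriteCtrl]]:
    """Return (current, next n) write control information for a five-stage pipeline."""
    if not 1 <= n <= 3:
        raise ValueError("a five-stage pipeline has a bypass span of at most 3")
    sources = [state.r_ex, state.r_de, state.decoder_out][:n]
    return state.r_mem, [c if c is not None else NO_WRITE for c in sources]

state = PipelineState(WriteCtrl(True, 2), WriteCtrl(True, 5), None, WriteCtrl(True, 2))
print(gather_write_ctrl(state, 2)[1])
# [WriteCtrl(grf_write=True, grf_waddr=5), WriteCtrl(grf_write=False, grf_waddr=0)]
```

With n = 3, the same snapshot would also expose the decoded write of INST_x+3 to register 2, so the current write to register 2 could be omitted. With n = 2 it cannot.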

Correspondingly, the present invention also provides a device for reducing CPU power consumption, comprising:

a comparison module, configured to compare the register-file write operations of the current instruction and of the next n instructions, where n is the maximum span of the bypass paths in the CPU pipeline;

an operation control module, configured to cause the current instruction's write to a register address to be omitted when both of the following hold:
- the current instruction and at least one of the next n instructions both perform a write;
- at least one of those instructions writes the same register address as the current instruction.

Preferably, the operation control module causes the current instruction's write to that register address to be performed in either of these cases:
- none of the next n instructions performs a write;
- at least one of them performs a write, but every writing instruction writes a register address different from the current instruction's.

Preferably, the comparison module comprises:

a control information acquisition submodule, configured to obtain the register-file write control information of the current instruction and of the next n instructions. The control information of each instruction indicates whether it writes and which register address it writes;

a judgment submodule, configured to make the following judgment for each of the next n instructions when the control information shows that the current instruction writes the register file. First, determine whether it writes the register file. If it does, determine whether the register address it writes is identical to the register address written by the current instruction.

Correspondingly, the present invention also provides a low-power CPU, comprising:

a register file, configured to store the source operands required to execute instructions and the destination operands produced by executing them;

a write-register-file control device, configured to control whether a register-file write is performed, specifically comprising:

an operation control module, configured to output a control signal that disallows the write to the register address written by the current instruction, indicating that this write is omitted. It does so when both of the following hold:
- the current instruction and at least one of the next n instructions both perform a write;
- at least one of those instructions writes the same register address as the current instruction.

Preferably, the write-register-file control device further comprises a comparison module, which comprises the control information acquisition submodule and the judgment submodule described above for the device.

Preferably, the operation control module of the write-register-file control device is further configured to output a control signal allowing the write to the register address written by the current instruction, in either of these cases:
- none of the next n instructions performs a write;
- at least one of them performs a write, but every writing instruction writes a register address different from the current instruction's.

Correspondingly, the CPU further comprises a write operation module, configured to perform the current instruction's write to that register address according to the control signal allowing the write.
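The allow/disallow control signal and the write operation module can be modelled together as a register file whose write port is gated. `RegisterFile` and its counters are hypothetical. One plausible hardware realization, assumed here rather than taken from the text, is to AND the GRFWrite signal with the allow signal before it reaches the write port.

```python
from typing import List

class RegisterFile:
    """Toy GRF whose single write port honours an allow-write control signal."""

    def __init__(self, size: int = 32) -> None:
        self.regs: List[int] = [0] * size
        self.writes_performed = 0
        self.writes_omitted = 0

    def write_back(self, grf_write: bool, waddr: int, data: int, allow: bool) -> None:
        """WB step: grf_write comes from W, allow from the operation control module."""
        if not grf_write:
            return  # the instruction does not write the GRF at all
        if allow:
            self.regs[waddr] = data  # the write operation module performs the write
            self.writes_performed += 1
        else:
            self.writes_omitted += 1  # suppressed: consumers receive the data via bypass

grf = RegisterFile()
grf.write_back(True, 2, 3, allow=False)   # redundant write (e.g. A: R2 = R1 + R3)
grf.write_back(True, 2, 13, allow=True)   # the overwriting write (e.g. B: R2 = R2 + R4)
print(grf.regs[2], grf.writes_performed, grf.writes_omitted)  # 13 1 1
```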

Preferably, when the CPU has a five-stage pipeline structure, it further comprises five pipeline registers, in order R_IF, R_DE, R_EX, R_MEM and R_WB, and the maximum value of n is 3. The control information acquisition submodule of the write-register-file control device obtains the register-file write control information from:
- pipeline register R_MEM, for the current instruction;
- R_EX, for the first following instruction;
- R_DE, for the second following instruction;
- the instruction-decode result, before it is written into R_DE, for the third following instruction.

Preferably, the register file comprises a plurality of write ports, each corresponding to one register address, so an instruction may write a plurality of register addresses.
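The text does not spell out how the comparison applies per write port. This sketch assumes one natural reading: each address written by the current instruction is checked independently against every address written by the next n instructions. The function name is hypothetical.

```python
from typing import AbstractSet, List, Sequence

def redundant_ports(current_waddrs: Sequence[int],
                    later_waddrs: Sequence[AbstractSet[int]]) -> List[bool]:
    """For each write port of the current instruction, True if its write can be omitted.

    later_waddrs[i] is the set of addresses written by the (i+1)-th following
    instruction (empty if it writes nothing).
    """
    overwritten = set().union(*later_waddrs)  # every address rewritten within n instructions
    return [addr in overwritten for addr in current_waddrs]

print(redundant_ports([2, 3], [{3}, set(), {7}]))  # [False, True]
```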

Compared with the prior art, the present invention has the following advantages:

First, the invention provides a method for reducing CPU power consumption. Suppose A and B are instructions in the pipeline, with A before B. Instruction A is to write a certain GRF register, but a later instruction B will rewrite that register. A logic module is added to the WB step of the CPU pipeline. When W (the control information of the WB step) indicates a GRF write, this module determines whether the write can be skipped. If it can, the GRF write is not performed. Instead, the bypass paths deliver the result A was to write to the subsequent instructions in the pipeline, up to and including B. The invention thus reduces CPU power consumption by omitting redundant GRF writes in the WB step.

Second, in hardware terms, the invention only requires adding this logic module to the existing CPU structure. It therefore makes full use of existing circuitry without adding many hardware units.

Third, before the present invention, those skilled in the art rarely recognized that the CPU pipeline contains redundant GRF writes. Even with the bypass optimization, it was generally taken for granted that an instruction decoded as writing the GRF should simply perform the write. Even when aiming to reduce power consumption, designers seldom looked for savings in the execution steps of the pipeline. The pipeline structure is a very mature CPU design method that existing CPUs generally follow. The present inventors broke with this habitual thinking and found that the pipeline still contains write operations that waste power. The invention therefore improves the operation steps of the existing CPU pipeline while preserving instruction execution efficiency.

Description of the drawings

Figure 1 is a schematic diagram of a prior-art five-stage RISC CPU pipeline;

Figure 2 is a flow chart of the processing performed by the logic module in a five-step CPU pipeline according to an embodiment of the invention;

Figure 3 is a schematic diagram of a prior-art CPU hardware structure;

Figure 4 is a schematic diagram of a CPU hardware structure according to an embodiment of the invention;

Figure 5 is a structural diagram of a device for reducing CPU power consumption according to an embodiment of the invention;

Figure 6 is a structural diagram of a low-power CPU according to an embodiment of the invention.

Embodiment

To make the above objectives, features and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.

To reduce CPU power consumption as much as possible, the present invention arrives at its method through the following analysis:

Analysis of RISC instructions in the pipeline of Figure 1 shows that the vast majority of instructions read one, two or even more operands from the GRF. Only very few read no operand from the GRF, and most instructions (about 70%) write the GRF. GRF reads and writes therefore consume a considerable share of power while the CPU runs.

Further analysis shows that the CPU often encounters the following situation when processing instructions in the pipeline:
- A and B are instructions in the pipeline, with A before B;
- instruction A is to write a certain GRF register, but the later instruction B will rewrite the same register;
- an instruction C that uses the result A writes to the GRF may lie between A and B.

For example, A is R2=R1+R3, C is R5=R2+R6, and B is R2=R2+R4.

Suppose A is still in the pipeline when B enters the decode step. Given the pipeline structure, the instructions after A, up to and including B, can then obtain the result A is preparing to write through the bypass paths, without reading the GRF. A's GRF write is therefore redundant and wastes CPU power, and it can be cancelled to save power. The reason is twofold. B will soon overwrite the register written by A. Meanwhile, in the pipeline of a RISC CPU (Figure 1), any instruction after A up to and including B that needs A's result receives it through the bypass paths rather than through the GRF.
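The A/C/B example can be checked with a small program-order model. It is a sketch under simplifying assumptions that do not come from the text:
- every instruction has the form R[dst] = R[src1] + R[src2];
- an operand produced at most n instructions earlier is taken from the bypass, and anything older is read from the GRF;
- a write is omitted exactly when one of the next n instructions writes the same register.

`run` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[int, int, int]  # (dst, src1, src2) meaning R[dst] = R[src1] + R[src2]

def run(program: List[Instr], init: Dict[int, int], n: int,
        omit: bool) -> Tuple[Dict[int, int], int]:
    """Execute `program`; return the final GRF contents and the number of GRF writes."""
    grf = dict(init)
    results: List[int] = []  # value produced by each instruction, for the bypass
    writes = 0
    for j, (dst, s1, s2) in enumerate(program):
        operands = []
        for src in (s1, s2):
            # Most recent earlier instruction that writes `src`, if any.
            producer: Optional[int] = next(
                (i for i in range(j - 1, -1, -1) if program[i][0] == src), None)
            if producer is not None and j - producer <= n:
                operands.append(results[producer])  # forwarded over the bypass path
            else:
                operands.append(grf.get(src, 0))    # read from the GRF
        value = operands[0] + operands[1]
        results.append(value)
        # Redundant if one of the next n instructions rewrites the same register.
        redundant = any(program[k][0] == dst
                        for k in range(j + 1, min(j + 1 + n, len(program))))
        if omit and redundant:
            continue  # omit the GRF write
        grf[dst] = value
        writes += 1
    return grf, writes

prog = [(2, 1, 3), (5, 2, 6), (2, 2, 4)]  # A: R2=R1+R3, C: R5=R2+R6, B: R2=R2+R4
init = {1: 1, 3: 2, 4: 10, 6: 100}
base, base_writes = run(prog, init, n=2, omit=False)
opt, opt_writes = run(prog, init, n=2, omit=True)
print(base == opt, base_writes, opt_writes)  # True 3 2
```

Omitting A's write leaves the final GRF unchanged and saves one of the three writes. Any instruction that still needs A's value lies within n of A, because B rewrites R2 within that window.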

In summary, many unnecessary GRF writes occur while the CPU runs, wasting CPU power. Based on this analysis, the present invention proposes a method for reducing CPU power consumption by omitting redundant GRF writes in the WB step.

The core idea of the present invention is as follows. Suppose A and B are instructions in the pipeline, with A before B. Instruction A is to write a certain GRF register, but the later instruction B will rewrite the same register. A logic module is added to the WB step of the CPU pipeline. When W (the control information of the WB step) indicates a GRF write, this module determines whether the write can be skipped. If it can, the GRF write is not performed. Instead, the bypass paths deliver the result A was to write to the subsequent instructions in the pipeline, up to and including B.

It should be noted that, before the present invention, those skilled in the art rarely recognized that the CPU pipeline contains redundant GRF writes. It was generally taken for granted that an instruction decoded as writing the GRF should simply perform the write. Even when aiming to reduce power consumption, designers seldom looked for savings in the execution steps of the pipeline, which is a very mature CPU design method that existing CPUs generally follow. The present inventors broke with this habitual thinking and found that the pipeline still contains operations that waste power. The invention therefore improves the operation steps of the existing CPU pipeline while preserving instruction execution efficiency.

The invention is described in detail below by way of embodiments.

Two pipeline designs are common today:

1. without the dashed bypass path in Figure 1, and without register R_WB;

2. with the dashed bypass path in Figure 1, and with register R_WB.

Suppose the CPU pipeline follows design 1, and the following conditions hold:
- instruction A is to write a certain GRF register;
- instruction B uses and rewrites that register;
- B is the 1st or 2nd instruction after A;
- an instruction C that uses the result A is to write to the GRF may lie between A and B.

Then A's GRF write need not take place, because C and B can obtain that result through the bypass and complete their operations without reading the GRF. Performing the GRF write according to A's control information is unnecessary here and wastes CPU power. Each of C and B obtains the result during the clock cycle in which it is in its own EX or MEM step. The result arrives through the bypass from the pipeline register that follows the step A occupied in the previous clock cycle. An instruction's EX result and MEM result may be the same or may differ, so which step's bypass is used depends on the actual case.

Now suppose B only rewrites the register without using A's result, and B is the 1st or 2nd instruction after A. Any instruction C in between can still obtain the result A is to write through the bypass, and complete its operation without reading the GRF. Performing the GRF write according to A's control information is again unnecessary and wastes CPU power. Here too, C obtains the result during the clock cycle in which it is in its own EX or MEM step. The result arrives through the bypass from the pipeline register that follows the step A occupied in the previous clock cycle. An instruction's EX result and MEM result may be the same or may differ, so which step's bypass is used depends on the actual case.

Suppose instead the CPU pipeline follows design 2, B uses and rewrites the GRF register, and B is the 1st, 2nd or 3rd instruction after A. A's GRF write need not take place, because C and B can obtain the result A is to write through the bypass and complete their operations without reading the GRF. Performing the GRF write according to A's control information is useless here and wastes CPU power.

Likewise, suppose B only rewrites the register and B is the 1st, 2nd or 3rd instruction after A. Any instruction C in between can obtain the result A is to write through the bypass and complete its operation without reading the GRF. Performing the GRF write according to A's control information is again unnecessary and wastes CPU power.
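The two designs differ only in how far apart A and B may be: 1 or 2 instructions for design 1, and 1 to 3 for design 2. The hypothetical helper below encodes those distances.

```python
def bypass_span(has_r_wb: bool) -> int:
    """Maximum bypass span n: 2 for design 1 (no R_WB), 3 for design 2 (with R_WB)."""
    return 3 if has_r_wb else 2

def a_write_omittable(distance_a_to_b: int, has_r_wb: bool) -> bool:
    """Can A's GRF write be omitted when B, `distance_a_to_b` instructions later, rewrites it?"""
    return 1 <= distance_a_to_b <= bypass_span(has_r_wb)

for d in (1, 2, 3, 4):
    print(d, a_write_omittable(d, has_r_wb=False), a_write_omittable(d, has_r_wb=True))
# 1 True True
# 2 True True
# 3 False True
# 4 False False
```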

Whichever of the two pipeline designs is used, the present invention can add a logic module to the WB step. This module uses logical judgment to omit unnecessary GRF writes in the WB step, thereby reducing CPU power consumption.

In practice, the classic CPU pipeline has five steps, but pipelines with more than five steps also exist. Such a pipeline is equivalent to splitting one or more steps of the five-step pipeline into several sub-steps. To present the logic module's processing flow in the WB step more clearly, the five-step pipeline and pipelines with more than five steps are described separately below. The processing flow for pipelines with more than five steps also applies to the five-step pipeline.

1. Five-step CPU pipeline

The five-step CPU pipeline is shown in Figure 1. Its five steps are IF, DE, EX, MEM and WB. The pipeline registers involved are R_IF, R_DE, R_EX and R_MEM, plus R_WB if the dashed bypass path is present. The solid-line bypass paths involve instructions INST_x, INST_x+1 and INST_x+2. The dashed bypass path involves INST_x, INST_x+1, INST_x+2 and INST_x+3.

Figure 2 is a flow chart of the logic module's processing in the five-step CPU pipeline according to an embodiment of the invention.

The logic module operates in the WB step. As described above, the WB step obtains W (the control information of the WB step) from R_MEM. According to W, it writes the content obtained from R_MEM to the GRF, etc. If the dashed bypass path exists, the corresponding content is also stored in pipeline register R_WB. W contains control information such as whether to write the GRF (GRFWrite), the GRF data source selection (MemReg) and the GRF register address to be written (GRFWaddr).

The logic module is mainly responsible for controlling GRF writes in the WB step, i.e. determining whether a GRF write is redundant and can be skipped. To decide, it compares the register-file write operations of the current instruction and of the next n instructions. If the current instruction and at least one of the next n instructions both write the same register address, the current instruction's register-file write is omitted. Here n is the maximum span of the bypass paths in the CPU pipeline.

In practice, the logic module can implement this judgment in several ways. One implementation is given below; another is given later for pipelines with more than five steps.

A complete implementation of the logic module is as follows:

Step 201: obtain the GRF write control information W of the current instruction and of the next n instructions. The W of each instruction indicates whether it writes the GRF and which register address it writes.

Suppose the instruction in the WB step of the CPU pipeline is INST_x, i.e. the current instruction. When only the solid-line bypass paths in Figure 1 exist, n is 2, and the next n instructions are INST_x+1 and INST_x+2. When both the solid-line and the dashed bypass paths exist, n is 3, and the next n instructions are INST_x+1, INST_x+2 and INST_x+3.

Step 202: determine from the control information of the current instruction and the next n instructions whether each instruction writes the register file. If the current instruction writes the register file and at least one of the next n instructions also writes it, proceed to step 203. Otherwise the logic module finishes, and the current instruction's write, if any, is performed.

Whether unnecessary this step be to judge to write GRF first determining step, promptly judges whether to write GRF.

Generally speaking, the execution of instruction all is to read primary operand from GRF, and after instruction is finished target operand is write GRF.Therefore, if instruction INST _xWrite GRF, as long as the instruction INST of back _X+1And INST _X+2(the bypass path of dotted line also comprises INST _X+3) in have an instruction also to need to write GRF, just need to continue second judgement of execution in step 203.But INST in some cases, _xThough instruction INST is all used in the instruction of back _xWrite the result of GRF, but do not write GRF, in this case, instruction INST _xAlso need to carry out the operation of writing GRF.

Step 203: compare the register address written by the current instruction with the register address written by each of the next n instructions that writes the register file. If any one pair matches, omit the current instruction's write to that register address. If every pair differs, perform the current instruction's write to that register address.
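Steps 201 to 203 can be sketched in Python as follows. This is a minimal, illustrative model rather than the patented circuit: the record `WriteCtrl` and the function `write_is_redundant` are hypothetical names, and the control information W is reduced to the two fields these steps use.

```python
from typing import NamedTuple, Sequence


class WriteCtrl(NamedTuple):
    """Hypothetical model of an instruction's GRF write control information W."""
    grf_write: bool  # does the instruction write the GRF?
    grf_waddr: int   # register address it writes (meaningless if grf_write is False)


def write_is_redundant(current: WriteCtrl, following: Sequence[WriteCtrl]) -> bool:
    """Return True if the current instruction's GRF write can be omitted.

    `following` holds the control information of INST_x+1 .. INST_x+n,
    where n is the maximum span of the bypass paths.
    """
    # Step 202: the current instruction must write the GRF (otherwise there is
    # nothing to omit), and at least one of the next n instructions must too.
    if not current.grf_write:
        return False
    writers = [w for w in following if w.grf_write]
    if not writers:
        return False
    # Step 203: compare write addresses pairwise; one matching pair suffices.
    return any(w.grf_waddr == current.grf_waddr for w in writers)


cur = WriteCtrl(True, 8)
nxt = [WriteCtrl(False, 0), WriteCtrl(True, 8)]  # n = 2
print(write_is_redundant(cur, nxt))  # True
```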

This step is the second check for whether a GRF write is redundant: it determines whether the instructions write the same GRF register.

Suppose the instructions following INST_x, namely INST_x+1, INST_x+2 and INST_x+3 (the last only when the dashed bypass path exists), all write the register file. A "pair" here means the following: the write addresses of INST_x and INST_x+1 are compared as one pair, those of INST_x and INST_x+2 as another, and, when the dashed bypass path exists, those of INST_x and INST_x+3 as a third.

If at least one pair matches, at least one later instruction will overwrite the same GRF register, so the GRF write of INST_x is redundant and can be omitted to save CPU power. If every pair differs, then although the three instructions following INST_x (with the dashed bypass path) all write the register file, none of them writes the same GRF register, so INST_x must perform its GRF write.

In the above steps, if the GRF write of the current instruction INST_x is omitted, any of the next n instructions that needs the result of INST_x obtains it through a bypass path. If the GRF write of INST_x is performed, such an instruction can either read the result of INST_x from the written GRF or obtain it through a bypass path. Because bypass paths are an optimization of the CPU pipeline, obtaining the result through a bypass path is the preferred approach.
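That omitting such a write leaves the architectural result unchanged can be checked with a small software model. The sketch below assumes a hypothetical in-order instruction stream and ignores exceptions and interrupts. It retires the same program twice: once writing every result, and once skipping any write that one of the next n instructions overwrites. The final register-file contents match while fewer writes are made. The model assumes that the value a skipped write would have stored reaches its consumers over a bypass path, which it does not simulate.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical instruction stream: (destination register or None, result value).
Program = List[Tuple[Optional[int], int]]


def run(program: Program, n: int, skip_redundant: bool) -> Tuple[Dict[int, int], int]:
    """Retire instructions in order; return (final GRF contents, number of GRF writes)."""
    grf: Dict[int, int] = {}
    writes = 0
    for i, (dst, value) in enumerate(program):
        if dst is None:
            continue  # this instruction does not write the GRF
        window = program[i + 1 : i + 1 + n]  # the next n instructions
        if skip_redundant and any(d == dst for d, _ in window):
            continue  # a later instruction overwrites dst; consumers use the bypass
        grf[dst] = value
        writes += 1
    return grf, writes


prog = [(3, 10), (None, 0), (3, 11), (4, 7), (4, 8)]
print(run(prog, n=2, skip_redundant=False))  # ({3: 11, 4: 8}, 4)
print(run(prog, n=2, skip_redundant=True))   # ({3: 11, 4: 8}, 2)
```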

2. CPU pipelines with more than five stages

As noted above, a pipeline with more than five stages is equivalent to splitting one or more stages of the five-stage pipeline into several sub-stages, so the logic module works the same way as in the five-stage pipeline. Another embodiment is described below.

In this embodiment, the check is performed in the WB stage, at the point where the WB control information W has been obtained but no operation based on W has yet been carried out. The steps are as follows:

1) Suppose the instruction in the WB stage of the CPU pipeline is INST_x.

2) Let H = GRFWrite ‖ GRFWaddr.

Here, GRFWrite indicates whether the GRF is written, GRFWaddr is the GRF register address written, and "‖" denotes concatenation. H_x denotes the H of INST_x and can be obtained from W in R_MEM; H_x+1 denotes the H of INST_x+1; and so on, up to H_x+n for INST_x+n.
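In hardware, "‖" places the GRFWrite bit directly above the address field, so a single equality comparator checks both fields at once. The sketch below assumes, hypothetically, a 5-bit GRFWaddr (32 registers). `ADDR_BITS` and `make_h` are illustrative names.

```python
ADDR_BITS = 5  # assumption: 32 registers, so GRFWaddr is 5 bits wide


def make_h(grf_write: bool, grf_waddr: int) -> int:
    """Form H = GRFWrite ‖ GRFWaddr as a single integer word."""
    if not 0 <= grf_waddr < (1 << ADDR_BITS):
        raise ValueError("GRFWaddr out of range")
    # GRFWrite occupies the bit just above the address field.
    return (int(grf_write) << ADDR_BITS) | grf_waddr


h_x = make_h(True, 8)
print(h_x)                      # 40 (0b101000)
print(h_x == make_h(True, 8))   # True: both write register 8
print(h_x == make_h(False, 8))  # False: same address, but no write
```

Because step 3 below first confirms that GRFWrite in H_x is true, H_x == H_x+k can hold only when INST_x+k also writes the GRF, and to the same address.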

3) Check whether GRFWrite in H_x is true (i.e., whether the current instruction writes the GRF). If so, proceed to the next step; otherwise, exit.

That is, first determine whether the current instruction INST_x writes the GRF. If it does, continue with the following checks to determine whether its GRF write is redundant. If it does not, the logic module skips the remaining checks.

4) Compare H_x with H_x+1, H_x with H_x+2, ..., and H_x with H_x+n. If any pair is equal, the GRF write in this WB stage can be omitted; otherwise, it is performed. The value of n depends on the maximum span of the bypass paths in the pipeline.

Taking H_x and H_x+1 as an example, each comparison consists of two steps:

Step one: check whether GRFWrite in H_x+1 is true (i.e., whether the next instruction INST_x+1 writes the GRF). If so, proceed to step two; otherwise, exit this comparison, since H_x and H_x+1 are then unequal.

Step two: compare GRFWaddr in H_x with GRFWaddr in H_x+1. If they are equal, H_x equals H_x+1; otherwise, H_x and H_x+1 are unequal.
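Steps 3) and 4), including the two-step comparison of each pair, can be sketched as follows. As an assumption, H is modelled here as a `(GRFWrite, GRFWaddr)` tuple, whose equality compares both fields just as comparing the concatenated word does. `wb_write_can_be_omitted` is a hypothetical name.

```python
from typing import Sequence, Tuple

H = Tuple[bool, int]  # hypothetical (GRFWrite, GRFWaddr) pair


def wb_write_can_be_omitted(h: Sequence[H], n: int) -> bool:
    """h[0] is H_x (from R_MEM); h[1] .. h[n] are H_x+1 .. H_x+n."""
    write_x, waddr_x = h[0]
    # Step 3: if the current instruction does not write the GRF, exit.
    if not write_x:
        return False
    # Step 4: compare H_x with H_x+1, ..., H_x+n; one equal pair suffices.
    for k in range(1, n + 1):
        grf_write, grf_waddr = h[k]
        if not grf_write:        # step one: INST_x+k does not write the GRF,
            continue             # so H_x and H_x+k are unequal
        if grf_waddr == waddr_x: # step two: same GRFWaddr, so H_x == H_x+k
            return True
    return False


print(wb_write_can_be_omitted([(True, 5), (False, 5), (True, 5), (True, 1)], n=3))  # True
```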

Note that, depending on the instruction set, the GRF may have multiple write ports, each with its own GRFWrite and GRFWaddr; that is, W contains GRFWrite1, GRFWrite2, ... and GRFWaddr1, GRFWaddr2, ... The five-stage embodiment and the embodiment for pipelines with more than five stages are both described for a single write port. With multiple write ports, the same method is used to determine whether the write on each port is redundant. When the next n instructions are checked one by one, every write port of each instruction must be checked for whether it writes the GRF and whether its write address matches. For example, when comparing H_x with H_x+1 on a CPU with m write ports, each write port i (i = 1 to m) of INST_x is compared as Hi_x against Hj_x+1 for every j = 1 to m. If port i matches the H of any port j of INST_x+1, the write on port i can be omitted. In an implementation, a separate check for each write port can be added to the logic module above, or a separate logic module can be added for each write port. Because the per-port check is identical to the single-port check above, the multi-port case is not described further here.
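For m write ports, each port i of INST_x is checked against every port j of each following instruction. The following is a minimal sketch, assuming a hypothetical representation of W as a list of m `(GRFWrite_j, GRFWaddr_j)` pairs. `ports_to_omit` is an illustrative name.

```python
from typing import List, Sequence, Tuple

PortCtrl = Tuple[bool, int]  # hypothetical (GRFWrite_j, GRFWaddr_j) for one write port


def ports_to_omit(current: Sequence[PortCtrl],
                  following: Sequence[Sequence[PortCtrl]]) -> List[bool]:
    """For each write port i of INST_x, return True if its write can be omitted."""
    result = []
    for write_i, addr_i in current:
        omit = write_i and any(
            write_j and addr_j == addr_i
            for inst in following            # INST_x+1 .. INST_x+n
            for write_j, addr_j in inst      # every port j = 1..m of that instruction
        )
        result.append(omit)
    return result


cur = [(True, 4), (True, 7)]                      # m = 2 write ports
nxt = [[(False, 0), (True, 7)], [(True, 2), (False, 4)]]
print(ports_to_omit(cur, nxt))  # [False, True]
```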

From the above checks, the logic module in the embodiment for pipelines with more than five stages works as follows. It first obtains the register-file write control information of the current instruction and the next n instructions (each instruction's control information indicates whether it writes and the register address it writes). If this control information shows that the current instruction writes the register file, each of the next n instructions is checked in turn: does it write the register file, and if so, is its write address the same as the current instruction's? If the current instruction and at least one of the next n instructions both write, to the same register address, the current instruction's write to that address is omitted. If none of the next n instructions writes, or those that do write all target register addresses different from the current instruction's, the current instruction's write to that address is performed. If the control information shows that the current instruction does not write the register file, none of these subsequent checks is needed.

This implementation differs slightly from the example of Figure 2. Here, because of the concatenation "‖", each of the next n instructions is checked against both conditions, "writes the register file" and "writes the same register address", at the same time. In the Figure 2 example, all of the next n instructions are first checked against "writes the register file", and only those that do are then checked against "writes the same register address". The two forms are logically equivalent and differ only in presentation; there is no essential difference between them in a hardware implementation.
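The equivalence of the two forms can be checked exhaustively over a small state space. The sketch below is illustrative only: `two_phase` follows the Figure 2 order, `concatenated` follows the "‖" form, and the register count and n are small values chosen for the check, not parameters of the embodiment.

```python
from itertools import product


def two_phase(cur, following):
    """Figure 2 order: first keep the writers, then compare addresses."""
    if not cur[0]:
        return False
    writers = [f for f in following if f[0]]
    return any(f[1] == cur[1] for f in writers)


def concatenated(cur, following):
    """'‖' form: each later (GRFWrite, GRFWaddr) must equal H_x as a whole."""
    return cur[0] and any(f == cur for f in following)


def check_equivalence(n: int = 3, num_regs: int = 3) -> bool:
    """Compare both forms on every combination of n + 1 control words."""
    options = [(w, a) for w in (False, True) for a in range(num_regs)]
    for combo in product(options, repeat=n + 1):
        cur, following = combo[0], combo[1:]
        if two_phase(cur, following) != concatenated(cur, following):
            return False
    return True


print(check_equivalence())  # True
```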

Besides the two implementations of the logic module described above, other implementations are possible, for example determining whether the GRF is written by some means other than the GRFWrite field of the control information. Any implementation that follows the general idea, namely "if the current instruction and at least one of the next n instructions both perform a register-file write and the register addresses written are identical, omit the current instruction's register-file write", falls within the protection scope of the present invention. Such implementations are not enumerated here one by one.

Based on the above, and again taking the single-write-port case as an example, the implementation is described in detail below with reference to the CPU hardware structure.

For comparison, the prior-art CPU hardware structure is introduced first.

Figure 3 is a schematic diagram of the prior-art CPU hardware structure.

The CPU comprises five pipeline registers, R_IF, R_DE, R_EX, R_MEM and R_WB (R_WB is present when the dashed bypass path exists), and a register file GRF. R_IF is not shown in the figure.

Taking the five-stage pipeline as an example, as described above, the DE stage decodes the instruction and generates control information, including the WB-stage control information W. W in turn includes whether to write the GRF (GRFWrite), the GRF data-source select (MemReg), the GRF register address to write (GRFWaddr), and so on. The pipeline registers R_DE, R_EX and R_MEM in the figure store not only a data part (DATA) but also, after decoding, a control-information part; the figure shows only the GRFWaddr and GRFWrite fields of W. In addition, at the GRF in the figure, Din is the write-data signal, WAddress is the write register address, and WE is the write-enable signal.

In the prior art, the WB stage obtains W from R_MEM and, according to W, writes the content obtained from R_MEM to the GRF. If the dashed bypass path exists, the corresponding content is also stored in the MEM/WB pipeline register R_WB. The WB stage is implemented by the Logic1 module in the figure: Logic1 derives the GRF's WE signal from GRFWrite in W and the GRF's WAddress from GRFWaddr in W. When WE enables writing, Logic1 drives Din so that the data to be written is stored in the register addressed by WAddress. When the dashed bypass path exists, the data is also written to R_WB (the dashed connection in the figure).

Depending on the pipeline design, the data written to the GRF may be transformed, for example by data-source selection (the EX-stage result or the MEM-stage load), data-type extension (signed or unsigned extension from 8 bits or 16 bits to 32 bits), and so on.

Taking data-type extension on a 32-bit CPU as an example, the Logic1 module may perform the following:

1) Take the data DATA from the R_MEM pipeline register and determine whether it needs extension;

2) If no extension is needed, send DATA out directly;

3) If extension is needed, obtain from R_MEM the extension type (signed or unsigned) and the current data type (8-bit or 16-bit);

4) According to the current data type, take the low 8 or 16 bits of DATA as the current data, and fill the upper 24 or 16 bits with 0 (unsigned extension) or with the sign bit of the current data (signed extension). The sign bit is the most significant bit of the current data: bit 7 for 8-bit data and bit 15 for 16-bit data, where bit 0 is the least significant bit. Once filling is complete, the data is sent out.
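Steps 1) to 4) amount to ordinary zero or sign extension. Below is a minimal sketch of the value Logic1 might drive onto Din, assuming DATA is a Python int holding a 32-bit value. The function names are hypothetical, and `width=None` stands for "no extension needed".

```python
from typing import Optional

MASK32 = 0xFFFFFFFF


def extend_to_32(data: int, width: int, signed: bool) -> int:
    """Step 4: extend the low `width` bits (8 or 16) of DATA to 32 bits."""
    if width not in (8, 16):
        raise ValueError("width must be 8 or 16")
    low_mask = (1 << width) - 1
    value = data & low_mask                # keep bits 0..7 or 0..15
    sign_bit = (value >> (width - 1)) & 1  # bit 7 or bit 15
    if signed and sign_bit:
        value |= MASK32 & ~low_mask        # fill the upper 24 or 16 bits with 1s
    return value                           # otherwise the upper bits stay 0


def logic1_din(data: int, width: Optional[int] = None, signed: bool = False) -> int:
    """Steps 1-3: send DATA unchanged, or extend it first."""
    if width is None:
        return data & MASK32
    return extend_to_32(data, width, signed)


print(hex(logic1_din(0x12345680, 8, signed=True)))   # 0xffffff80
print(hex(logic1_din(0x12345680, 8, signed=False)))  # 0x80
print(hex(logic1_din(0x00018001, 16, signed=True)))  # 0xffff8001
print(hex(logic1_din(0x12345680)))                   # 0x12345680
```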

Figure 4 is a schematic diagram of the CPU hardware structure according to an embodiment of the present invention.

Compared with Figure 3, a Logic2 module (the logic module described above) is added. Correspondingly, connections from R_DE, R_EX and R_MEM to Logic2 are added so that control information, such as GRFWaddr and GRFWrite in W, can be read from these pipeline registers. In addition, when the dashed bypass path exists, a connection from the DE-stage decoder output to Logic2 is added, together with the connection that writes R_WB (the dashed connections in the figure).

Again taking the five-stage pipeline with the dashed bypass path as an example: when INST_x is in the WB stage, Logic2 needs the control information of the following instructions INST_x+1, INST_x+2 and INST_x+3 to determine whether the GRF write is redundant.

Logic2 obtains GRFWaddr and GRFWrite of INST_x by reading R_MEM. Given the pipeline structure, the next instruction INST_x+1 is currently in the MEM stage, so Logic2 obtains its GRFWaddr and GRFWrite by reading R_EX. Likewise, the second following instruction INST_x+2 is in the EX stage, so Logic2 obtains its GRFWaddr and GRFWrite by reading R_DE. The third following instruction INST_x+3 is in the DE stage, so Logic2 obtains its GRFWaddr and GRFWrite from the control information generated by decoding in the DE stage, before that information is written to R_DE.
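This mapping from pipeline registers to instructions can be expressed as a small helper. The sketch assumes that each pipeline register exposes a hypothetical `Ctrl` record holding GRFWrite and GRFWaddr. `gather_window` is an illustrative name.

```python
from typing import List, NamedTuple, Optional


class Ctrl(NamedTuple):
    """Hypothetical: the GRFWrite/GRFWaddr fields of control information W."""
    grf_write: bool
    grf_waddr: int


def gather_window(r_mem: Ctrl, r_ex: Ctrl, r_de: Ctrl,
                  decode_out: Optional[Ctrl], dashed_bypass: bool) -> List[Ctrl]:
    """Return [W of INST_x, INST_x+1, INST_x+2 (, INST_x+3)] as seen by Logic2.

    INST_x is in WB (its W is in R_MEM), INST_x+1 in MEM (W in R_EX),
    INST_x+2 in EX (W in R_DE), INST_x+3 in DE (W still on the decoder output).
    """
    window = [r_mem, r_ex, r_de]  # n = 2: solid-line bypass paths only
    if dashed_bypass:
        if decode_out is None:
            raise ValueError("the dashed bypass needs the DE-stage decode output")
        window.append(decode_out)  # n = 3: decode result not yet written to R_DE
    return window


w = gather_window(Ctrl(True, 5), Ctrl(False, 0), Ctrl(True, 2), Ctrl(True, 5),
                  dashed_bypass=True)
print(len(w) - 1)  # 3, i.e. n = 3
```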

The detailed processing flow of Logic2 can follow the flow shown in Figure 2 or the example for pipelines with more than five stages. When it determines that the current instruction's GRF write can be omitted, Logic2 outputs a write-disable WE signal to the GRF. When the GRF write must be performed, Logic2 outputs a write-enable WE signal to the GRF. Logic1 performs the write under control of the WE signal.
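The division of labour between Logic2 and Logic1 can be sketched as follows. The function names and the dictionary standing in for the GRF are hypothetical, and the window is assumed to be ordered INST_x, INST_x+1, ..., INST_x+n.

```python
from typing import Dict, List, Tuple

Ctrl = Tuple[bool, int]  # hypothetical (GRFWrite, GRFWaddr) pair


def logic2_we(window: List[Ctrl]) -> bool:
    """Compute WE for the GRF; window[0] is INST_x, window[1:] the next n instructions."""
    write, addr = window[0]
    if not write:
        return False  # INST_x does not write the GRF at all
    redundant = any(w and a == addr for w, a in window[1:])
    return not redundant  # disable the write when a later instruction rewrites addr


def logic1_write(grf: Dict[int, int], we: bool, waddr: int, din: int) -> None:
    """Logic1 stores Din in register WAddress only when WE allows it."""
    if we:
        grf[waddr] = din


grf: Dict[int, int] = {}
we = logic2_we([(True, 5), (False, 0), (True, 5)])  # INST_x+2 rewrites register 5
logic1_write(grf, we, waddr=5, din=123)
print(we, grf)  # False {}
```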

Based on the above, the present invention also provides an embodiment of a device for reducing CPU power consumption.

Figure 5 is a structural diagram of the device for reducing CPU power consumption according to an embodiment of the present invention.

The device mainly comprises:

A comparison module 51, configured to compare the register-file write operations of the current instruction and the next n instructions, where n is the maximum span of the bypass paths in the CPU pipeline;

An operation control module 52, configured to cause the current instruction's write to a register address to be omitted when the current instruction and at least one of the next n instructions both perform a write and the register address written by at least one such instruction is the same as the register address written by the current instruction.

Correspondingly, if none of the next n instructions performs a write, or at least one of them performs a write but every writing instruction targets a register address different from the one written by the current instruction, the operation control module 52 causes the current instruction's write to that register address to be performed.

Preferably, the comparison module 51 may specifically comprise:

A control-information acquisition submodule 511, configured to obtain the register-file write control information of the current instruction and the next n instructions, where the control information of each instruction indicates whether it writes and the register address it writes;

A judgment submodule 512, configured, when the control information shows that the current instruction writes the register file, to check each of the next n instructions: whether it writes the register file and, if so, whether the register address it writes is the same as the one written by the current instruction.

The above device corresponds to the Logic2 module in Figure 4 and can be placed between the CPU's pipeline register R_MEM and the register file GRF. By driving the GRF write-enable control signal WE, it suppresses redundant GRF writes of the current instruction, thereby reducing CPU power consumption.

Correspondingly, the present invention also provides an embodiment of a low-power CPU.

Figure 6 is a structural diagram of the low-power CPU according to an embodiment of the present invention.

The CPU comprises a register file 61 and a register-file write control device 62, wherein:

The register file 61 stores the source operands required for instruction execution and the destination operands produced by instruction execution;

The register-file write control device 62 controls whether a write to register file 61 is performed, and specifically comprises:

An operation control module, configured, when the current instruction and at least one of the next n instructions both perform a write and the register address written by at least one such instruction is the same as the one written by the current instruction, to output a write-disable control signal to register file 61, indicating that the current instruction's write to that register address is omitted.

Preferably, the comparison module may specifically comprise:

Preferably, the operation control module of the register-file write control device 62 is further configured, when none of the next n instructions performs a write, or at least one of them performs a write but every writing instruction targets a register address different from the one written by the current instruction, to output a write-enable control signal to register file 61;

Correspondingly, the CPU may further comprise:

A write operation module 63, configured to perform the current instruction's write to that register address according to the write-enable control signal.

Depending on the instruction set, register file 61 may have multiple write ports, each corresponding to one register address. A separate register-file write control device 62 may be provided for each write port, or the control for all write ports may be integrated into a single device 62. Either way, device 62 controls whether write operation module 63 writes data to each port.

In addition, when the CPU has a five-stage pipeline structure, it further comprises five pipeline registers, in order R_IF, R_DE, R_EX, R_MEM and R_WB (R_WB is present when the dashed bypass path exists), and the maximum value of n is 3.

The control-information acquisition submodule of the register-file write control device 62 obtains the register-file write control information of the current instruction from pipeline register R_MEM;

obtains that of the first following instruction from pipeline register R_EX;

and obtains that of the second following instruction from pipeline register R_DE.

When the dashed bypass path exists, the control-information acquisition submodule also obtains the register-file write control information of the third following instruction from the instruction decode result before it is written to pipeline register R_DE, and write operation module 63 also writes R_WB. In Figure 6, the dashed connections are those present when this bypass path exists.

In summary, the CPU saves power by adding the register-file write control device 62. From a hardware-design standpoint, it makes full use of existing circuitry without adding many hardware units.

The embodiments in this specification are described progressively: each focuses on its differences from the others, and the embodiments may be cross-referenced for their common or similar parts. Because the device embodiment is substantially similar to the method embodiment, it is described briefly; for related details, refer to the description of the method embodiment.

The method and device for reducing CPU power consumption and the low-power CPU provided by the present invention have been described in detail above. Specific examples have been used to explain the principle and implementation of the invention, and the description of these embodiments is intended only to help understand the method of the invention and its core idea. Meanwhile, those of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementations and scope of application. In summary, this description should not be construed as limiting the present invention.

Claims

1. A method for reducing CPU power consumption, characterized by comprising:

comparing the register-file write operations of a current instruction and the next n instructions, and, if the current instruction and at least one of the next n instructions both perform a write and the register address written by at least one such instruction is the same as the register address written by the current instruction, omitting the current instruction's write to that register address;

wherein n is the maximum span of the bypass paths in the CPU pipeline.

2. The method according to claim 1, characterized by further comprising:

if none of the next n instructions performs a write, or at least one of the next n instructions performs a write but the register addresses written by all the writing instructions differ from the register address written by the current instruction, performing the current instruction's write to that register address.

3. The method according to claim 1, characterized in that the comparing comprises:

obtaining the register-file write control information of the current instruction and the next n instructions, wherein the control information of each instruction includes whether it writes and the register address it writes;

when the control information shows that the current instruction writes the register file, making the following judgment for each of the next n instructions:

judging whether it writes the register file and, if so, further judging whether the register address it writes is the same as the register address written by the current instruction.

4. The method according to claim 2, characterized by further comprising:

if the current instruction's write to the register address is omitted, one of the next n instructions obtaining, through a bypass path, the result that the current instruction was to write to that register address;

if the current instruction's write to the register address is performed, one of the next n instructions reading the result written by the current instruction from that register address, or obtaining, through a bypass path, the result that the current instruction is to write to that register address.

5. The method according to claim 3, characterized in that the register-file write control information of the current instruction and the next n instructions is obtained as follows:

the registers of the five-stage CPU pipeline are, in order, R_IF, R_DE, R_EX, R_MEM and R_WB, and the maximum value of n is 3;

the register-file write control information of the current instruction is obtained from pipeline register R_MEM;

the register-file write control information of the first following instruction is obtained from pipeline register R_EX;

the register-file write control information of the second following instruction is obtained from pipeline register R_DE;

the register-file write control information of the third following instruction is obtained from the instruction decode result before it is written to pipeline register R_DE.

6. A device for reducing CPU power consumption, characterized by comprising:

7. The device according to claim 6, characterized in that:

if none of the next n instructions performs a write, or at least one of the next n instructions performs a write but the register addresses written by all the writing instructions differ from the register address written by the current instruction, the operation control module causes the current instruction's write to that register address to be performed.

8. The device according to claim 6, characterized in that the comparison module comprises:

9. A low-power CPU, characterized by comprising:

10. The CPU according to claim 9, characterized in that the comparison module comprises:

11. The CPU according to claim 10, characterized in that:

the operation control module of the register-file write control device is further configured, when none of the next n instructions performs a write, or at least one of the next n instructions performs a write but the register addresses written by all the writing instructions differ from the register address written by the current instruction, to output a write-enable control signal for the register address written by the current instruction;

the CPU further comprises:

a write operation module, configured to perform the current instruction's write to that register address according to the write-enable control signal.

12. The CPU according to claim 10, characterized in that:

when the CPU has a five-stage pipeline structure, it further comprises five pipeline registers, in order R_IF, R_DE, R_EX, R_MEM and R_WB, and the maximum value of n is 3;

the control-information acquisition submodule of the register-file write control device obtains the register-file write control information of the current instruction from pipeline register R_MEM;

and obtains the register-file write control information of the third following instruction from the instruction decode result before it is written to pipeline register R_DE.

13. The CPU according to any one of claims 9 to 12, characterized in that:

the register file comprises a plurality of write ports, each corresponding to one register address, so that the register addresses written by an instruction may be plural.