CN208580395U

CN208580395U - A processor pipeline structure

Info

Publication number: CN208580395U
Application number: CN201820548377.4U
Authority: CN
Inventors: 胡振波
Original assignee: Wuhan Silicon Integrated Co Ltd
Current assignee: Wuhan Silicon Integrated Co Ltd
Priority date: 2018-03-14
Filing date: 2018-04-16
Publication date: 2019-03-05
Anticipated expiration: 2028-04-16

Abstract

The utility model discloses a processor pipeline structure comprising an instruction storage unit, an instruction fetch unit, an execution unit, a memory access unit and a write-back unit. The first end of the instruction fetch unit is connected to the instruction storage unit, and its second end is connected to the first end of the execution unit. The second end of the execution unit is connected to the first end of the write-back unit, and its third end is connected to the first end of the memory access unit. The second end of the memory access unit is connected to the second end of the write-back unit. The write-back unit comprises a first write-back module and a second write-back module. The first end of the first write-back module is connected to the second end of the execution unit, its second end is connected to the second end of the memory access unit, and its third end is connected to the first end of the second write-back module. The second end of the second write-back module is connected to the fourth end of the execution unit. By improving the internal structure of each pipeline stage, the utility model solves the problem that existing processor pipeline structures cannot combine low power consumption, low cost, small area and high performance.

Description

A processor pipeline structure

Technical field

The utility model belongs to the field of processor hardware design, and more particularly relates to an ultra-low-power, high-performance processor pipeline structure.

Background technique

In recent years, with the continuous advance of integrated circuit manufacturing processes, processor integration and performance have kept improving, and power consumption has risen accordingly. With the wide use of mobile devices and the rapid development of the Internet of Things, demand for low-power, low-cost, small-area, high-performance processors keeps growing, and achieving high performance while reducing power consumption and cost has become a new research focus for designers.

The patent with grant publication number CN 104699463 B discloses a novel pipeline structure that constructs a register file in order to reduce the large dynamic power consumption produced by register toggling in the data path. That pipeline structure is mainly suited to reducing the dynamic power produced by register toggling during large-volume, high-speed transfers. It is ineffective where the transfer scale is small and the error rate is low, and it also increases design complexity and processor area, which raises cost.

The patent with grant publication number CN 101464721 B discloses a design method for controlling performance and power consumption. It monitors the performance of a pipelined processor and, when a drop in processor throughput is detected, reconfigures the pipeline to switch from a high-performance mode to a low-performance mode, thereby reducing power consumption. The system and its design are very complex and are not suitable for low-cost, small-area processor designs. That patent acknowledges this in its own specific embodiments, noting that development is very complex and time-consuming.

The patent with grant publication number CN 103218029 B discloses a pipeline structure for controlling the supply voltage. It changes the structure of the registers in an existing pipeline structure and adds error correction circuits both inside the registers and outside the pipeline, which allows the supply voltage to be lowered further. The voltage is adjusted in real time according to the number of errors, so that core power consumption is further reduced. That design does not consider reducing processor area or cost: although it reduces power consumption to some extent, it also increases system complexity and cost.

In summary, although existing processors reduce power consumption to some extent, they do so at the expense of higher system complexity, larger processor area and higher cost.

Summary of the utility model

In view of the above defects of, and needs for improvement in, the prior art, the utility model provides an ultra-low-power, high-performance processor pipeline structure. By improving the internal structure of each pipeline stage, it aims to solve the problem that existing processor pipeline structures cannot combine low power consumption, low cost, small area and high performance.

To achieve the above object, according to one aspect of the utility model, a processor pipeline structure is provided, comprising an instruction storage unit, an instruction fetch unit, an execution unit, a memory access unit and a write-back unit. The first end of the instruction fetch unit is connected to the instruction storage unit, and its second end is connected to the first end of the execution unit. The second end of the execution unit is connected to the first end of the write-back unit, and its third end is connected to the first end of the memory access unit. The second end of the memory access unit is connected to the second end of the write-back unit.

The instruction fetch unit fetches one instruction from the instruction storage unit within one clock cycle. The execution unit decodes and executes the instruction output by the instruction fetch unit, and the result of instruction execution is written back to the register file by the write-back unit.

The write-back unit comprises a first write-back module and a second write-back module. The first end of the first write-back module is connected to the second end of the execution unit, its second end is connected to the second end of the memory access unit, and its third end is connected to the first end of the second write-back module. The second end of the second write-back module is connected to the fourth end of the execution unit.

The first write-back module arbitrates the write-back order of the long-cycle instruction results output by the execution unit or the memory access unit, so that the write-back order matches the order in which the corresponding long-cycle instructions were dispatched. The second write-back module arbitrates the write-back order between the single-cycle instruction results output by the execution unit and the long-cycle (multi-cycle) instruction results output by the first write-back module, with long-cycle instructions having the higher priority.
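The priority rule of the second write-back module can be illustrated with a small behavioral sketch. This is not the patented circuit: the `Result` tuple, the function name, and the assumption that only one result can be written back per cycle are all hypothetical and introduced here for illustration only.

```python
from typing import Optional, Tuple

# Hypothetical result record: (destination register index, value).
Result = Tuple[int, int]


def arbitrate_second_writeback(
    long_cycle: Optional[Result],
    single_cycle: Optional[Result],
) -> Tuple[Optional[Result], bool]:
    """Choose which result writes back in the current cycle.

    Returns (winner, single_cycle_waits). A long-cycle result arriving from
    the first write-back module always wins. A single-cycle result is granted
    only in a cycle with no long-cycle write-back. Assumes one write-back
    per cycle (an assumption of this sketch, not stated in the source).
    """
    if long_cycle is not None:
        # Long-cycle instructions have the higher priority.
        return long_cycle, single_cycle is not None
    # Idle cycle for long-cycle write-back: the single-cycle result goes through.
    return single_cycle, False
```

For example, when both a long-cycle result `(5, 10)` and a single-cycle result `(3, 7)` are pending, the sketch grants the long-cycle result and reports that the single-cycle result has to wait.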

Preferably, in the above processor pipeline structure, the instruction fetch unit comprises a first program counter, a second program counter, a PC generation module, a partial decode module, a branch prediction module and an instruction register.

The first end of the instruction register is connected to the first end of the instruction storage unit, and its second end is connected to the first end of the execution unit. The first end of the partial decode module is connected to the second end of the instruction storage unit, its second end is connected to the first end of the branch prediction module, and its third end is connected to the first end of the PC generation module. The second end of the branch prediction module is connected to the second end of the PC generation module. The third end of the PC generation module is connected to the first end of the first program counter, its fourth end is connected to the second end of the first program counter, its fifth end is connected to the third end of the instruction storage unit, and its sixth end is connected to the third end of the execution unit. The third end of the first program counter is connected to the first end of the second program counter. The second end of the second program counter is connected to the third end of the execution unit.

The partial decode module decodes the current instruction fetched from the instruction storage unit to determine whether it is an ordinary instruction or a branch/jump instruction. If it is an ordinary instruction, the partial decode module sends the current instruction directly to the PC generation module, and the PC generation module generates the address of the next instruction to be fetched from the current instruction and the current instruction address sent by the first program counter.

If it is a branch/jump instruction, the partial decode module sends the current instruction to the branch prediction module. The branch prediction module obtains the jump target address of the current instruction by static prediction, and the PC generation module generates the address of the next instruction to be fetched from the jump target address obtained by the branch prediction module.

The partial decode module, the branch prediction module and the PC generation module are combinational logic, so the decoding of the current instruction, the branch prediction and the generation of the next fetch address are all completed within the same clock cycle.

Preferably, in the above processor pipeline structure, the execution unit comprises a decode module, a dispatch module, an instruction tracking module, a single-cycle instruction operation module, a long-cycle instruction operation module and a delivery module.

The first end of the decode module is connected to the second end of the instruction register, its second end is connected to the second end of the second program counter, and its third end is connected to the first end of the dispatch module. The second end of the dispatch module is connected to the first end of the instruction tracking module, its third end is connected to the first end of the single-cycle instruction operation module, its fourth end is connected to the first end of the long-cycle instruction operation module, and its fifth end is connected to the first end of the memory access unit. The second end of the long-cycle instruction operation module is connected to the first end of the first write-back module. The second end of the instruction tracking module is connected to the second end of the first write-back module. The second end of the single-cycle instruction operation module is connected to the second end of the second write-back module, its third end is connected to the second end of the memory access unit, and its fourth end is connected to the first end of the delivery module. The second end of the delivery module is connected to the third end of the second write-back module, and its third end is connected to the sixth end of the PC generation module.

The instruction tracking module stores information about long-cycle instructions that have been dispatched but have not yet written back. When the dispatch module dispatches an instruction, it compares the information of the instruction being dispatched with the information of each long-cycle instruction stored in the instruction tracking module, to determine whether the current instruction has a data dependency on a long-cycle instruction that has been dispatched but has not yet written back. If not, the instruction is dispatched normally. If so, dispatch is stalled and resumes only after the related long-cycle instruction has finished executing and the data dependency has been released.
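The comparison performed at dispatch can be sketched as follows. This is a minimal behavioral model, assuming each tracked entry holds source register indexes and a result register index and that the dependencies checked are RAW and WAW, as the description states elsewhere. The names `TrackedEntry` and `must_stall` are hypothetical, and special handling of the RISC-V constant register x0 is deliberately omitted.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class TrackedEntry:
    """One dispatched, not yet written-back long-cycle instruction (hypothetical layout)."""
    src_regs: Tuple[int, ...]
    dest_reg: Optional[int]  # None if the instruction writes no register


def must_stall(
    src_regs: Tuple[int, ...],
    dest_reg: Optional[int],
    outstanding: Iterable[TrackedEntry],
) -> bool:
    """Return True if the instruction being dispatched must wait."""
    for entry in outstanding:
        if entry.dest_reg is None:
            continue  # nothing pending to conflict with
        # RAW: the new instruction reads a register an older one will write.
        raw = entry.dest_reg in src_regs
        # WAW: the new instruction writes a register an older one will write.
        waw = dest_reg is not None and dest_reg == entry.dest_reg
        if raw or waw:
            return True
    return False
```

With one outstanding entry whose result register is 5, an instruction reading register 5 or writing register 5 would be stalled, while an instruction touching only other registers would be dispatched normally.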

Preferably, in the above processor pipeline structure, the delivery module comprises an exception judging submodule and a branch prediction resolution submodule.

The first end of the branch prediction resolution submodule and the first end of the exception judging submodule are both connected to the fourth end of the single-cycle instruction operation module. The second end of the branch prediction resolution submodule and the second end of the exception judging submodule are both connected to the sixth end of the PC generation module. The third end of the exception judging submodule is connected to the third end of the second write-back module.

The branch prediction resolution submodule uses the operation result of the single-cycle instruction operation module to judge whether the next fetch address generated by the PC generation module is correct. If it is, no action is taken. If it is not, the wrong address is discarded, and a new next fetch address is generated and fed back to the PC generation module.

The exception judging submodule uses the operation result of the single-cycle instruction operation module to judge whether an error occurred while the current instruction was executing. If not, no action is taken. If so, the current instruction address is discarded, and a new address is generated and fed back to the PC generation module.

Preferably, in the above processor pipeline structure, the instruction tracking module comprises multiple entries for storing long-cycle instruction information. Each entry stores the information of one long-cycle instruction, including its source operand register indexes and its result register index.

Preferably, in the above processor pipeline structure, the instruction tracking module is implemented as a FIFO. When the first write-back module writes back multiple long-cycle instructions, it arbitrates their write-back order according to the entry that the FIFO read pointer points to. After a long-cycle instruction has been written back, the instruction tracking module deletes the information corresponding to that instruction.
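How a FIFO read pointer can enforce in-order write-back is sketched below. This is an illustrative software model under stated assumptions, not the hardware described in the patent: the class name, the string tags used to identify instructions, and the `ready_tags` set are all hypothetical.

```python
from collections import deque
from typing import Deque, Optional, Set, Tuple


class InstructionTrackFifo:
    """Behavioral model of a FIFO tracking outstanding long-cycle instructions."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        # The leftmost element plays the role of the entry at the read pointer.
        self._entries: Deque[Tuple[str, int]] = deque()

    def is_full(self) -> bool:
        return len(self._entries) >= self.depth

    def dispatch(self, tag: str, dest_reg: int) -> None:
        """Record a long-cycle instruction when it is dispatched."""
        if self.is_full():
            raise RuntimeError("tracking FIFO is full; dispatch must wait")
        self._entries.append((tag, dest_reg))

    def try_write_back(self, ready_tags: Set[str]) -> Optional[Tuple[str, int]]:
        """Write back only the oldest entry, and only if its result is ready.

        A younger instruction that finishes early has to wait, so the
        write-back order always equals the dispatch order.
        """
        if self._entries and self._entries[0][0] in ready_tags:
            return self._entries.popleft()  # entry is deleted after write-back
        return None
```

If a divide is dispatched before a load and the load finishes first, `try_write_back({"load"})` returns `None` in this model until the divide result is also available.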

Preferably, in the above processor pipeline structure, the single-cycle instruction operation module is also used to generate the memory access address.

The memory access unit acts as the control module for memory access. Based on the above memory access address, it decides by address whether to obtain the corresponding instruction from the instruction storage part or the corresponding data from the data storage part.
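The address-based decision can be pictured as a simple range decode. The source does not give any address map, so the base addresses and sizes below are purely hypothetical placeholders, as is the function name.

```python
# Hypothetical address map: the patent does not specify any of these values.
ITCM_BASE = 0x8000_0000
ITCM_SIZE = 64 * 1024
DATA_BASE = 0x9000_0000
DATA_SIZE = 64 * 1024


def select_target(addr: int) -> str:
    """Decide which storage part a memory access address falls into."""
    if ITCM_BASE <= addr < ITCM_BASE + ITCM_SIZE:
        return "instruction"
    if DATA_BASE <= addr < DATA_BASE + DATA_SIZE:
        return "data"
    raise ValueError(f"address {addr:#x} is outside both storage parts")
```

In this sketch an access to `0x8000_0010` is routed to the instruction storage part and an access to `0x9000_0010` to the data storage part.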

Preferably, in the above processor pipeline structure, the instruction storage unit is implemented as an instruction tightly coupled memory (ITCM).

In general, compared with the prior art, the above technical solutions conceived by the utility model achieve the following beneficial effects:

(1) The processor pipeline structure provided by the utility model realizes two-level write-back of instructions through the first write-back module and the second write-back module. The first write-back module cooperates with the instruction tracking module to write back the different long-cycle instructions so that their write-back order strictly matches their dispatch order, which keeps the hardware simple and reduces processor area. The second write-back module arbitrates the write-back order of all single-cycle and long-cycle instructions, with long-cycle instructions having priority; in idle cycles with no long-cycle write-back, single-cycle instructions can write back freely. The two-level write-back strategy separates the delivery of long-cycle instructions from their write-back, so that even a long-cycle instruction that takes many cycles does not block the pipeline, and subsequent single-cycle instructions can still write back and be delivered smoothly, which improves processor performance.

(2) In the processor pipeline structure provided by the utility model, the instruction tracking module and the dispatch module cooperate to solve the data dependency problem. The instruction tracking module stores information about long-cycle instructions that have been dispatched but have not yet written back. When the dispatch module dispatches an instruction, it compares the information of the instruction being dispatched with the information of each long-cycle instruction in the instruction tracking module, to determine whether the current instruction has a RAW or WAW dependency on a long-cycle instruction that has been dispatched but has not yet written back. If there is no data dependency, the instruction is dispatched normally. If there is one, dispatch is stalled and resumes only after the related long-cycle instruction has finished executing and the dependency has been released. The utility model thus solves the data dependency problem by stalling the pipeline, without directly bypassing the results of long-cycle instructions to subsequent instructions awaiting dispatch, which reduces processor power consumption and area.

(3) The processor pipeline structure provided by the utility model uses a single-cycle-access ITCM as the instruction memory, so the instruction fetch unit can fetch one instruction from the ITCM in one cycle. Replacing a traditional cache with an ITCM meets the real-time requirements of ultra-low-power, small-area processors and reduces processor cost and area. The partial decode module, the branch prediction module and the PC generation module are combinational logic, so the instruction fetch unit completes instruction read, partial decode, branch prediction and generation of the next fetch PC within one cycle. This achieves continuous instruction fetch and greatly improves processor performance.

Brief description of the drawings

Fig. 1 is an overall architecture diagram of a processor pipeline structure provided by an embodiment of the utility model;

Fig. 2 is a structure diagram of the instruction fetch unit of a processor pipeline structure provided by an embodiment of the utility model;

Fig. 3 is a structure diagram of the execution unit of a processor pipeline structure provided by an embodiment of the utility model;

Fig. 4 is a structure diagram of the write-back unit of a processor pipeline structure provided by an embodiment of the utility model.

Specific embodiment

To make the purpose, technical solutions and advantages of the utility model clearer, the utility model is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the utility model and do not limit it. In addition, the technical features involved in the embodiments described below may be combined with one another as long as they do not conflict.

The processor pipeline structure provided by this embodiment of the utility model is mainly suited to low-cost, small-area processors designed for ultra-low-power embedded scenarios, such as our company's self-developed Hummingbird E200 processor core based on the RISC-V architecture. The pipeline structure comprises multiple pipeline stage units, specifically an instruction storage unit, an instruction fetch unit, an execution unit, a memory access unit and a write-back unit. The first end of the instruction fetch unit is connected to the instruction storage unit, and its second end is connected to the first end of the execution unit. The second end of the execution unit is connected to the first end of the write-back unit, and its third end is connected to the first end of the memory access unit. The second end of the memory access unit is connected to the second end of the write-back unit.

The utility model divides the pipeline into stages mainly by clock cycle. The instruction storage unit and the instruction fetch unit belong to the first pipeline stage: the instruction storage unit stores instructions, and the instruction fetch unit fetches instructions from it continuously and without interruption. The execution unit and the write-back unit belong to the second pipeline stage: the execution unit decodes and executes the instruction output by the instruction fetch unit, and the write-back unit writes the result of instruction execution back to the general register file (Register File, Regfile). Because the decoding, execution and write-back of an instruction take place in the same clock cycle, the execution unit and the write-back unit are both assigned to the second stage of the pipeline structure.

Some single-cycle instructions can be completed with just the two pipeline stages above. Some long-cycle instructions additionally need the memory access function of the memory access unit, which belongs to the third pipeline stage; however, the result output by the memory access unit is written back to the general register file through the write-back unit in the second pipeline stage. The ultra-low-power, high-performance processor pipeline structure of this embodiment is therefore a variable-length pipeline structure which, compared with existing linear pipeline structures, has fewer pipeline stages and can thus reduce processor area and cost.

The instruction fetch unit fetches instructions from the instruction storage unit. To improve processor performance, instruction fetch must be both "fast" and "continuous". The ultra-low-power, high-performance processor pipeline structure of this embodiment mainly targets low-cost, small-area processors for ultra-low-power embedded scenarios, and the program code size for embedded processor cores of this class is small. The utility model therefore uses an instruction tightly coupled memory (ITCM) as the instruction storage unit. The instruction fetch unit can fetch one instruction from the ITCM in one clock cycle, which achieves fast instruction fetch. Compared with a traditional I-Cache, this reduces processor cost and area while still meeting the real-time requirements of ultra-low-power, small-area processors.

The instruction fetch unit comprises a first program counter, a second program counter, a PC generation module, a partial decode module, a branch prediction module and an instruction register.

The first end of the instruction register is connected to the first end of the ITCM, and its second end is connected to the first end of the execution unit. The first end of the partial decode module is connected to the second end of the ITCM, its second end is connected to the first end of the branch prediction module, and its third end is connected to the first end of the PC generation module. The second end of the branch prediction module is connected to the second end of the PC generation module. The third end of the PC generation module is connected to the first end of the first program counter, its fourth end is connected to the second end of the first program counter, its fifth end is connected to the third end of the ITCM, and its sixth end is connected to the second end of the execution unit. The third end of the first program counter is connected to the first end of the second program counter. The second end of the second program counter is connected to the third end of the execution unit.

The instruction register stores the instruction fetched from the ITCM in a given clock cycle (called the current instruction) and sends it to the execution unit in the next clock cycle. The second program counter receives the current instruction address sent by the first program counter and sends it to the execution unit in the next clock cycle. The execution unit thus receives the current instruction and the current instruction address in step.

The partial decode module decodes the current instruction fetched from the ITCM to determine whether it is an ordinary instruction or a branch/jump instruction. If it is an ordinary instruction, the partial decode module sends it directly to the PC generation module, and the PC generation module generates the address of the next instruction to be fetched (that is, the PC value) from the current instruction and the current instruction address sent by the first program counter. If it is a branch/jump instruction, the partial decode module sends it to the branch prediction module. The branch prediction module obtains the jump target address of the current instruction by static prediction, and the PC generation module generates the address of the next instruction to be fetched from that jump target address and sends it to both the first program counter and the ITCM.

The partial decode module, the branch prediction module and the PC generation module are combinational logic, so the decoding of the current instruction, the branch prediction and the generation of the next fetch address are completed in the same clock cycle as the fetch of the current instruction. The instruction fetch unit of this embodiment can therefore both obtain the current instruction and generate the next fetch address within one clock cycle, which achieves continuous instruction fetch and improves processor performance.

The branch prediction module uses a simple and flexible static prediction method: a conditional branch instruction that jumps backward is predicted as taken, and a conditional branch instruction that jumps forward is predicted as not taken. The specific points are as follows:

1. For conditional direct jump instructions, that is, conditional branch instructions such as BEQ and BNE, the above static prediction method is used (a backward jump is predicted as taken; otherwise the branch is predicted as not taken). The jump target address is obtained by adding the instruction's PC to the offset given by its immediate.

2. For unconditional direct jump instructions, such as the jal instruction, the jump is always taken, so there is no need to predict the jump direction. The jump target address is obtained by adding the instruction's PC to the offset given by its immediate.

3. For unconditional indirect jump instructions, such as the jalr instruction, the jump is always taken, so there is no need to predict the jump direction. The base address needed to calculate the jump target address comes from the operand indexed by rs1, which has to be read from the general register file and may also have a RAW data dependency on an instruction currently executing in the execution unit. The utility model adopts different schemes depending on the rs1 index. If the index corresponds to the constant register, the constant is used directly without reading from a register. If the index corresponds to the commonly used link register, that register is wired out directly, and to prevent a read-after-write data dependency it must be determined that the write-back unit in the second pipeline stage is not writing back to that register. The details are as follows:

3.1. If the index of rs1 is x0, the constant 0 is used directly for the base address calculation (the RISC-V architecture defines x0 as the constant 0), without reading from the Regfile.

3.2. If the index of rs1 is x1: since x1 is commonly used as the link register for function-return jump instructions, x1 is wired directly out of the Regfile in the execution unit without occupying a Regfile read port. To prevent a RAW data dependency caused by an instruction executing in the execution unit that needs to write back the link register, the branch prediction module must determine whether any instruction is currently pending write-back to the link register.

3.3. If the index of rs1 is any register other than x0 and x1 (xn for short), xn is read from the Regfile through a Regfile read port, and it must be determined that the read port is currently idle with no resource conflict. In addition, to prevent a RAW data dependency caused by an instruction executing in the execution unit that needs to write back xn, the branch prediction module must determine whether any instruction is currently pending write-back to the Regfile.
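The prediction rules listed above can be summarized in a short functional sketch. Several things here are assumptions rather than statements from the source: instructions are taken to be a fixed 4 bytes wide, the instruction kinds are passed as plain strings, and `read_reg` is a hypothetical callback standing in for both the direct x1 wire and the Regfile read port. The hazard and read-port checks described in points 3.2 and 3.3 are not modeled. Clearing the lowest bit of the jalr target follows the RISC-V definition of JALR.

```python
from typing import Callable, Optional, Tuple

INSTR_BYTES = 4  # assumption: fixed 32-bit instructions


def predict_next_pc(
    kind: str,
    pc: int,
    imm: int = 0,
    rs1: int = 0,
    read_reg: Optional[Callable[[int], int]] = None,
) -> Tuple[bool, int]:
    """Return (predicted_taken, next_fetch_pc) for the current instruction."""
    if kind == "branch":
        # Backward jump (negative offset) is predicted taken, forward not taken.
        taken = imm < 0
        return taken, (pc + imm if taken else pc + INSTR_BYTES)
    if kind == "jal":
        # Always taken; target is PC plus the immediate offset.
        return True, pc + imm
    if kind == "jalr":
        if rs1 == 0:
            base = 0  # x0 is the constant 0, no register read needed
        elif read_reg is not None:
            base = read_reg(rs1)  # x1 via direct wire, xn via a read port
        else:
            raise ValueError("read_reg is required when rs1 is not x0")
        # RISC-V JALR clears the least significant bit of the target.
        return True, (base + imm) & 0xFFFF_FFFE
    # Ordinary instruction: fall through to the next sequential address.
    return False, pc + INSTR_BYTES
```

For instance, a branch at `0x100` with offset `-16` is predicted taken with next PC `0xF0`, while the same branch with offset `+16` is predicted not taken and fetch continues at `0x104`.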

The execution unit comprises a decode module, a dispatch module, an instruction tracking module, a single-cycle instruction operation module, a long-cycle instruction operation module and a delivery module.

The first end of the decode module is connected to the second end of the instruction register, its second end is connected to the second end of the second program counter, and its third end is connected to the first end of the dispatch module. The second end of the dispatch module is connected to the first end of the instruction tracking module, its third end is connected to the first end of the single-cycle instruction operation module, its fourth end is connected to the first end of the long-cycle instruction operation module, and its fifth end is connected to the first end of the memory access unit. The second end of the instruction tracking module is connected to the first end of the write-back unit. The second end of the single-cycle instruction operation module is connected to the second end of the write-back unit, its third end is connected to the second end of the memory access unit, and its fourth end is connected to the first end of the delivery module. The second end of the long-cycle instruction operation module is connected to the third end of the write-back unit. The second end of the delivery module is connected to the fourth end of the write-back unit, and its third end is connected to the sixth end of the PC generation module.

The decode module decodes the received current instruction and current instruction address to obtain the operand register indexes, and uses those indexes to obtain the corresponding operand data from Read-Regfile.

The dispatch module dispatches the operand data obtained by the decode module to different operation units for execution according to the instruction type. The single-cycle instruction operation module mainly handles the operation and execution of single-cycle instructions, and the long-cycle instruction operation module mainly handles the operation and execution of long-cycle instructions. The execution results of both single-cycle and long-cycle instructions are written back to Write-Regfile through the write-back unit.

The delivery module delivers the calculation result of the single-cycle instruction operation module to the PC generation module. The delivery module comprises an exception judging submodule and a branch prediction resolution submodule.

The first end of the branch prediction resolution submodule and the first end of the exception judging submodule are both connected to the fourth end of the single-cycle instruction operation module. The second end of the branch prediction resolution submodule and the second end of the exception judging submodule are both connected to the sixth end of the PC generation module. The third end of the exception judging submodule is connected to the fourth end of the write-back unit.

The branch prediction resolution submodule uses the calculation result of the single-cycle instruction operation module to judge whether the next fetch address generated by the PC generation module is correct. If it is, no action is taken. If it is not, the wrong address is discarded, and a new next fetch address is generated and fed back to the PC generation module. The exception judging submodule uses the calculation result of the single-cycle instruction operation module to judge whether an error occurred while the current instruction was executing. If not, no action is taken. If so, the current instruction address is discarded, and a new address is generated and fed back to the PC generation module. The PC generation module sends the new address it receives to the ITCM, and the instruction fetch unit fetches again from the ITCM and sends the instruction to the execution unit for decoding and execution.
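The feedback from the delivery module to the PC generation module can be sketched as a single decision function. This is a simplified model with several assumptions not found in the source: the function and parameter names are hypothetical, the new address generated on an error is represented by a caller-supplied `handler_pc`, and an error is given precedence over a misprediction.

```python
from typing import Optional


def delivery_redirect(
    predicted_next_pc: int,
    actual_next_pc: int,
    exception: bool,
    handler_pc: int,
) -> Optional[int]:
    """Return the new fetch address to feed back, or None if no action is needed."""
    if exception:
        # Exception judging submodule: discard the current address and
        # redirect fetch to a new address (hypothetical handler address).
        return handler_pc
    if predicted_next_pc != actual_next_pc:
        # Branch prediction resolution submodule: the predicted address was
        # wrong, so discard it and redirect to the computed address.
        return actual_next_pc
    # Prediction was correct and no error occurred: nothing to do.
    return None
```

In this model a branch predicted not taken (next PC `0x104`) that actually jumps to `0xF0` yields a redirect to `0xF0`, after which fetch restarts from that address.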

Because some long-cycle instructions may encounter errors during execution, the writeback unit needs an interface to the exception judging submodule through which it can trigger an exception; if an exception is raised, the execution result of the long-cycle instruction is not written back to the Write-Regfile.

Based on the in-order dispatch micro-architecture, the dispatch module must check, each time an instruction is dispatched, whether it has a data dependency on any previously dispatched instruction that has not yet been written back. Data dependencies are of three kinds: write-after-read (Write-After-Read; WAR), read-after-write (Read-After-Write; RAW) and write-after-write (Write-After-Write; WAW).

1. WAR dependency: the pipeline structure provided by the utility model is suited to processors whose micro-architecture dispatches in order and writes back in order. An instruction has already read its source operands from the general register group by the time it is dispatched, so it cannot happen that a later instruction writes back to the Write-Regfile before an earlier instruction has read its operands from the Read-Regfile. Data conflicts caused by WAR dependencies therefore cannot occur.

2. RAW dependency: the instruction being dispatched is in the second stage of the pipeline. If the previously dispatched instruction (the preceding instruction) is a single-cycle instruction, which also writes back in the second stage of the pipeline, then it has already finished executing and written its result back to the Write-Regfile, so the instruction being dispatched cannot have a data conflict caused by a RAW dependency on it. If the preceding instruction is a long-cycle instruction, it needs multiple cycles before it can write back its result, so the instruction being dispatched may have a RAW dependency on it.

3. WAW dependency: the instruction being dispatched is in the second stage of the pipeline. If the preceding instruction is a single-cycle instruction, it has already finished executing and written its result back to the Write-Regfile, so the instruction being dispatched cannot have a data conflict caused by a WAW dependency on it. If the preceding instruction is a long-cycle instruction, it needs multiple cycles before it can write back its result, so the instruction being dispatched may have a WAW dependency on it.

In summary, in the pipeline structure provided by the utility model, the instruction being dispatched can have RAW and WAW dependencies only on long-cycle instructions that have not yet finished executing.

To detect RAW and WAW dependencies between the instruction currently being dispatched and preceding long-cycle instructions, the utility model provides an instruction tracking module in the execution unit. This module stores information about long-cycle instructions that have been dispatched but not yet written back, including but not limited to the source operand register indices and result register index of each such instruction.

The instruction tracking module is preferably implemented as a first-in-first-out (FIFO) buffer. Each time the dispatch module dispatches a long-cycle instruction, an entry (Entry) is allocated for it in the instruction tracking module to store its source operand register indices and result register index. After the writeback unit writes the execution result of the long-cycle instruction back to the Write-Regfile, the instruction tracking module removes the corresponding entry; the module therefore holds exactly the information of long-cycle instructions that have been dispatched but not yet written back. When dispatching an instruction, the dispatch module compares the source operand register indices and result register index of the current instruction with the information in each entry of the instruction tracking module, to judge whether the current instruction has a RAW or WAW dependency on a long-cycle instruction that has been dispatched but not yet written back. If there is no data dependency, the instruction is dispatched normally; if there is one, dispatch is stalled until the relevant long-cycle instruction finishes executing and the dependency is released. The FIFO depth defaults to two entries, so information for two long-cycle instructions can be stored at once; the number of entries preferably does not exceed four, otherwise the running speed of the processor is reduced.
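The dependency check against the instruction tracking module can be illustrated with a small software model. This is a sketch under stated assumptions rather than the circuit itself: the class and method names (`InstructionTracker`, `has_dependency`, `allocate`, `retire_oldest`) are hypothetical, and refusing allocation when the FIFO is full is an assumption, since the source does not say what happens in that case.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    src_regs: Tuple[int, ...]  # source operand register indices
    dst_reg: Optional[int]     # result register index (None if no result)


class InstructionTracker:
    """FIFO of long-cycle instructions dispatched but not yet written back."""

    def __init__(self, depth: int = 2) -> None:
        self.depth = depth  # the source gives two entries as the default depth
        self.fifo: Deque[Entry] = deque()

    def has_dependency(self, src_regs: Tuple[int, ...], dst_reg: Optional[int]) -> bool:
        """True if the instruction being dispatched has a RAW or WAW hazard."""
        for entry in self.fifo:
            if entry.dst_reg is None:
                continue
            if entry.dst_reg in src_regs:
                return True  # RAW: we read a register a pending instruction writes
            if dst_reg is not None and entry.dst_reg == dst_reg:
                return True  # WAW: we write a register a pending instruction writes
        return False  # WAR is not checked: operands are read at dispatch

    def allocate(self, src_regs: Tuple[int, ...], dst_reg: Optional[int]) -> bool:
        """Record a dispatched long-cycle instruction; False if the FIFO is full."""
        if len(self.fifo) >= self.depth:
            return False  # assumption: dispatch must wait for a free entry
        self.fifo.append(Entry(tuple(src_regs), dst_reg))
        return True

    def retire_oldest(self) -> Entry:
        """Remove the oldest entry once its result has been written back."""
        return self.fifo.popleft()
```

In this model the dispatch module would call `has_dependency` every cycle and hold the instruction while it returns `True`, which corresponds to stalling the pipeline instead of forwarding the result.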

For conflicts caused by data dependencies, the pipeline structure provided by the utility model stalls the pipeline rather than directly bypassing the result of the long-cycle instruction to the subsequent instruction awaiting dispatch, which reduces the power consumption and area of the processor.

Some long-cycle instructions, such as Load and Store instructions and "A" extension instructions, need the memory access function of the memory access unit. After the dispatch module sends such an instruction to the single-cycle instruction computing module, that module computes the memory access address and sends it to the memory access unit. The memory access unit, acting as the control module for memory access, uses the address to decide whether to fetch the corresponding instruction from the instruction storage part or the corresponding data from the data storage part.
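The address-based selection made by the memory access unit can be pictured with the sketch below. Everything numeric in it is an assumption: the source gives no memory map, so the base addresses, window sizes, 32-bit address width and the names `effective_address` and `select_target` are purely illustrative.

```python
# Hypothetical address windows; the source does not specify a memory map.
INSTR_BASE, INSTR_SIZE = 0x8000_0000, 0x1_0000  # instruction storage part
DATA_BASE, DATA_SIZE = 0x9000_0000, 0x1_0000    # data storage part
ADDR_MASK = 0xFFFF_FFFF                         # assumed 32-bit addresses


def effective_address(base: int, offset: int) -> int:
    """Address computed by the single-cycle computing module (base + offset)."""
    return (base + offset) & ADDR_MASK


def select_target(addr: int) -> str:
    """Decide which storage part the memory access unit should access."""
    if INSTR_BASE <= addr < INSTR_BASE + INSTR_SIZE:
        return "instruction_storage"
    if DATA_BASE <= addr < DATA_BASE + DATA_SIZE:
        return "data_storage"
    raise ValueError(f"address {addr:#010x} is outside the modelled windows")
```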

The writeback unit includes a first write-back module and a second write-back module. The first end of the first write-back module is connected to the second end of the long-cycle instruction computing module, its second end to the second end of the instruction tracking module, its third end to the first end of the second write-back module, and its fourth end to the third end of the memory access unit. The second end of the second write-back module is connected to the second end of the single-cycle instruction computing module, and its third end to the third end of the exception judging submodule.

The first write-back module is mainly used to arbitrate the write-back of the execution results of long-cycle instructions. As shown in figure 4, the result of a long-cycle instruction, after processing by the long-cycle instruction computing module or the memory access unit, first enters the first write-back module. The long-cycle results received by the first write-back module may also come from a multiplier-divider, an FPU, an EAI coprocessor and so on. In theory, these long-cycle instructions need not be written back strictly in dispatch order; dispatch order has to be followed only when a register conflict occurs, and otherwise write-back could be out of order. For simplicity of the hardware, however, this embodiment writes back long-cycle results strictly in dispatch order. Different long-cycle instructions take different numbers of cycles to execute, and for some the cycle count is even dynamic, so their relative order cannot easily be determined after the fact; the order among these long-cycle instructions therefore has to be recorded in advance.

The instruction tracking module provided in this embodiment is used to record the information of long-cycle instructions. Each time the dispatch module dispatches a long-cycle instruction, an entry is allocated for it in the instruction tracking module to record its information. The FIFO pointer (Pointer) of this entry serves as the instruction tag (Instruction Tag; ITAG) of the long-cycle instruction. After being dispatched, the long-cycle instruction carries its ITAG with it until its result is written back.

The instruction tracking module and the first write-back module cooperate to complete the write-back of all long-cycle instructions. The result of each long-cycle instruction received by the first write-back module includes that instruction's ITAG. Because the instruction tracking module is a first-in-first-out FIFO, its read pointer (Read Pointer) points to the entry that entered the module earliest. The first write-back module sends the result of the long-cycle instruction corresponding to that entry to the second write-back module, and at the same time the instruction tracking module deletes the entry. The first write-back module thus determines the write-back order of long-cycle results from the order indicated by the instruction tracking module's pointer, ensuring that the write-back order is strictly consistent with the dispatch order.
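The cooperation between the ITAG and the FIFO read pointer can be sketched as follows. The model is illustrative only: the class name `InOrderLongWriteback`, the dictionary used to hold completed results, and raising an error on a full FIFO are assumptions introduced for the example. The point it shows is that a result is released only when its ITAG equals the read pointer, so results that finish early wait for older instructions.

```python
from typing import Dict, Optional, Tuple


class InOrderLongWriteback:
    """Releases long-cycle results strictly in dispatch order using ITAGs."""

    def __init__(self, depth: int = 2) -> None:
        self.depth = depth
        self.write_ptr = 0  # FIFO pointer of the next entry to allocate
        self.read_ptr = 0   # FIFO pointer of the oldest outstanding entry
        self.count = 0      # number of outstanding long-cycle instructions
        self.completed: Dict[int, str] = {}  # ITAG -> result awaiting write-back

    def dispatch(self) -> int:
        """Allocate an entry and return its FIFO pointer as the ITAG."""
        if self.count == self.depth:
            raise RuntimeError("tracking FIFO full; dispatch must stall")
        itag = self.write_ptr
        self.write_ptr = (self.write_ptr + 1) % self.depth
        self.count += 1
        return itag

    def complete(self, itag: int, result: str) -> None:
        """A functional unit reports a finished result together with its ITAG."""
        self.completed[itag] = result

    def pop_writeback(self) -> Optional[Tuple[int, str]]:
        """Hand the oldest result to the second write-back module, if ready."""
        if self.count == 0 or self.read_ptr not in self.completed:
            return None  # nothing outstanding, or the oldest is still executing
        itag = self.read_ptr
        result = self.completed.pop(itag)
        self.read_ptr = (self.read_ptr + 1) % self.depth  # delete the entry
        self.count -= 1
        return itag, result
```

If a division dispatched second finishes before a load dispatched first, `pop_writeback` returns `None` until the load completes, after which the two results are released in dispatch order.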

The second write-back module is mainly used to receive the results of single-cycle instructions sent by the single-cycle instruction computing module and the results of long-cycle instructions arbitrated by the first write-back module, and to arbitrate the write-back order of all instructions by priority. Because a long-cycle instruction takes many cycles to execute, it sits earlier in the program flow than a single-cycle instruction that is writing back at the same time, so write-back of a long-cycle instruction has higher priority than write-back of a single-cycle instruction. In idle cycles when no long-cycle instruction is writing back, single-cycle instructions can be written back freely; that is, a single-cycle instruction later in the program flow can write back before a long-cycle instruction earlier in the program flow (provided there is no data dependency). The pipeline structure provided by the embodiment of the utility model therefore also has out-of-order write-back capability.
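The priority rule of the second write-back module reduces to a two-input arbiter, sketched below. The function name `arbitrate` and the returned stall flag are assumptions for illustration; in particular, the source states only that the long-cycle result has priority, and the idea that a competing single-cycle result waits for a later cycle is inferred here.

```python
from typing import Optional, Tuple


def arbitrate(long_result: Optional[str],
              single_result: Optional[str]) -> Tuple[Optional[str], bool]:
    """Pick the result to write back this cycle.

    Returns (granted_result, single_cycle_stalled).
    """
    if long_result is not None:
        # Long-cycle write-back has priority; a competing single-cycle
        # result is assumed to wait for a later cycle.
        return long_result, single_result is not None
    # Idle cycle for long-cycle write-back: the single-cycle result, if any,
    # writes back freely, possibly ahead of older long-cycle instructions.
    return single_result, False
```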

Compared with existing processor pipeline structures, the processor pipeline structure provided by the utility model improves the internal structure of each pipeline stage. A partial decoding module, a branch prediction module and a PC generation module, all built from combinational logic, are placed in the instruction fetch unit, so that fetching the current instruction and generating the next fetch address can both be completed within one clock cycle; this allows instructions to be fetched continuously and improves processor performance. An instruction tracking module is placed in the execution unit, and data dependencies are resolved by stalling the pipeline, without directly bypassing long-cycle results to the subsequent instruction awaiting dispatch; this reduces the power consumption and area of the processor. The writeback unit is divided into a first write-back module and a second write-back module, and this two-stage write-back strategy separates the delivery of a long-cycle instruction from its write-back, so that even a long-cycle instruction that takes many cycles does not block the pipeline and subsequent single-cycle instructions can still be written back and delivered smoothly; this improves processor performance. The structure thereby solves the problem that existing processor pipeline structures cannot combine low power consumption, low cost and small area, and high performance.

As those skilled in the art will readily appreciate, the above is only a preferred embodiment of the utility model and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principle of the utility model shall fall within its scope of protection.

Claims

1. A processor pipeline structure, characterized by comprising an instruction storage unit, an instruction fetch unit, an execution unit, a memory access unit and a writeback unit; the first end of the instruction fetch unit is connected to the instruction storage unit, and its second end is connected to the first end of the execution unit; the second end of the execution unit is connected to the first end of the writeback unit, and its third end is connected to the first end of the memory access unit; the second end of the memory access unit is connected to the second end of the writeback unit;

the writeback unit comprises a first write-back module and a second write-back module; the first end of the first write-back module is connected to the second end of the execution unit, its second end is connected to the second end of the memory access unit, and its third end is connected to the first end of the second write-back module; the second end of the second write-back module is connected to the fourth end of the execution unit.

2. The processor pipeline structure according to claim 1, characterized in that the instruction fetch unit comprises a first program counter, a second program counter, a PC generation module, a partial decoding module, a branch prediction module and an instruction register;

the first end of the instruction register is connected to the first end of the instruction storage unit, and its second end is connected to the first end of the execution unit; the first end of the partial decoding module is connected to the second end of the instruction storage unit, its second end is connected to the first end of the branch prediction module, and its third end is connected to the first end of the PC generation module; the second end of the branch prediction module is connected to the second end of the PC generation module; the third end of the PC generation module is connected to the first end of the first program counter, its fourth end to the second end of the first program counter, its fifth end to the third end of the instruction storage unit, and its sixth end to the third end of the execution unit; the third end of the first program counter is connected to the first end of the second program counter; the second end of the second program counter is connected to the third end of the execution unit.

3. The processor pipeline structure according to claim 1, characterized in that the instruction storage unit is implemented using an instruction tightly-coupled memory.