CN208580395U - A processor pipeline structure - Google Patents
- Publication number: CN208580395U
- Application number: CN201820548377.4U
- Authority: CN (China)
- Prior art keywords: instruction, module, unit, long period, writes back
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Landscapes
- Advance Control (AREA)
Abstract
The utility model discloses a processor pipeline structure comprising an instruction storage unit, a fetch unit, an execution unit, a memory access unit and a write-back unit. The first end of the fetch unit is connected to the instruction storage unit, and its second end is connected to the first end of the execution unit. The second end of the execution unit is connected to the first end of the write-back unit, and its third end is connected to the first end of the memory access unit. The second end of the memory access unit is connected to the second end of the write-back unit. The write-back unit comprises a first write-back module and a second write-back module. The first end of the first write-back module is connected to the second end of the execution unit, its second end is connected to the second end of the memory access unit, and its third end is connected to the first end of the second write-back module. The second end of the second write-back module is connected to the fourth end of the execution unit. By improving the internal structure of each pipeline stage, the utility model solves the problem that existing processor pipeline structures cannot simultaneously achieve low power consumption, low cost with small area, and high performance.
Description
Technical field
The utility model belongs to the field of processor hardware design, and more particularly relates to an ultra-low-power, high-performance processor pipeline structure.
Background
In recent years, with continuous advances in integrated circuit fabrication processes, processor integration density and performance have kept improving, and power consumption has risen accordingly. With the widespread use of mobile devices and the rapid development of the Internet of Things, demand for low-power, low-cost, small-area, high-performance processors keeps growing. Delivering high performance while reducing power consumption and cost has become a new research focus for designers.
The patent with authorization notice number CN 104699463 B discloses a novel pipeline structure that organizes registers as a register stack in order to reduce the large dynamic power consumed by register toggling along the data path. That pipeline structure is mainly suited to reducing the dynamic power caused by register toggling under large volumes of high-speed data transfer; it is ineffective where transfer volume is small and error rates are low, and it also increases design complexity and processor area, raising cost.
The patent with authorization notice number CN 101464721 B discloses a design method for controlling performance and power consumption: it monitors the performance of a pipelined processor and, when it detects a drop in processor throughput, reconfigures the pipeline from a high-performance mode to a low-performance mode to reduce power. The system and its design are very complex and unsuitable for low-cost, small-area processor design; that patent acknowledges this in its own detailed description, noting that development is very complex and time-consuming.
The patent with authorization notice number CN 103218029 B discloses a pipeline structure that controls the supply voltage. It modifies the registers of an existing pipeline and adds error-correction circuitry both inside the registers and outside the pipeline, so that the supply voltage can be lowered further and adjusted in real time according to the number of errors, further reducing core power. Although this design reduces power to some extent, it does not consider reducing processor area or cost, and it raises system complexity and cost.
In summary, although existing processors reduce power consumption to some extent, they increase system complexity and processor area, and raise cost.
Summary of the utility model
In view of the above defects or improvement needs of the prior art, the utility model provides an ultra-low-power, high-performance processor pipeline structure that improves the internal structure of each pipeline stage. Its object is to solve the problem that existing processor pipeline structures cannot simultaneously achieve low power consumption, low cost with small area, and high performance.
To achieve the above object, according to one aspect of the utility model, a processor pipeline structure is provided, comprising an instruction storage unit, a fetch unit, an execution unit, a memory access unit and a write-back unit. The first end of the fetch unit is connected to the instruction storage unit, and its second end is connected to the first end of the execution unit. The second end of the execution unit is connected to the first end of the write-back unit, and its third end is connected to the first end of the memory access unit. The second end of the memory access unit is connected to the second end of the write-back unit.
The fetch unit fetches one instruction from the instruction storage unit per clock cycle. The execution unit decodes and executes the instruction output by the fetch unit, and the execution result is written back to the register file by the write-back unit.
The write-back unit comprises a first write-back module and a second write-back module. The first end of the first write-back module is connected to the second end of the execution unit, its second end is connected to the second end of the memory access unit, and its third end is connected to the first end of the second write-back module. The second end of the second write-back module is connected to the fourth end of the execution unit.
The first write-back module arbitrates the write-back order of the long-cycle instruction results output by the execution unit or the memory access unit, so that the results are written back in the same order in which the corresponding long-cycle instructions were dispatched. The second write-back module arbitrates the write-back order between the single-cycle instruction results output by the execution unit and the long-cycle instruction results output by the first write-back module; long-cycle instructions have the higher priority.
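The two-level write-back arbitration described above can be sketched as a small behavioral model. This is a minimal Python sketch under stated assumptions, not the patented circuit: each long-cycle instruction is assumed to carry a hypothetical sequence `tag` assigned at dispatch, `order` stands in for the dispatch-order record kept by the instruction tracking module, and the function names `first_writeback` and `second_writeback` are illustrative.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional, Tuple


@dataclass
class WbRequest:
    tag: int    # hypothetical dispatch sequence number
    rd: int     # destination register index
    value: int  # result to be written to the register file


def first_writeback(order: Deque[int],
                    done: Dict[int, WbRequest]) -> Optional[WbRequest]:
    """First write-back module: release a finished long-cycle result only if it
    belongs to the oldest outstanding long-cycle instruction, so results leave
    in dispatch order even when they finish out of order."""
    if order and order[0] in done:
        return done.pop(order.popleft())
    return None


def second_writeback(long_req: Optional[WbRequest],
                     single_req: Optional[WbRequest]
                     ) -> Tuple[Optional[WbRequest], bool]:
    """Second write-back module: a long-cycle result has priority; a
    single-cycle result writes back only in a cycle with no long-cycle
    write-back. Returns (granted request, single-cycle request must wait)."""
    if long_req is not None:
        return long_req, single_req is not None
    return single_req, False
```

In this model a long-cycle result that finishes early simply waits in `done` until every older long-cycle instruction has written back, and a single-cycle result that loses arbitration is reported as having to wait one cycle.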
Preferably, in the above processor pipeline structure, the fetch unit comprises a first program counter, a second program counter, a PC generation module, a partial decode module, a branch prediction module and an instruction register.
The first end of the instruction register is connected to the first end of the instruction storage unit, and its second end to the first end of the execution unit. The first end of the partial decode module is connected to the second end of the instruction storage unit, its second end to the first end of the branch prediction module, and its third end to the first end of the PC generation module. The second end of the branch prediction module is connected to the second end of the PC generation module. The third end of the PC generation module is connected to the first end of the first program counter, its fourth end to the second end of the first program counter, its fifth end to the third end of the instruction storage unit, and its sixth end to the third end of the execution unit. The third end of the first program counter is connected to the first end of the second program counter, and the second end of the second program counter is connected to the third end of the execution unit.
The partial decode module decodes the current instruction fetched from the instruction storage unit and determines whether it is an ordinary instruction or a branch/jump instruction. For an ordinary instruction, the partial decode module sends it directly to the PC generation module, which generates the next fetch address from the current instruction and the current instruction address supplied by the first program counter.
For a branch/jump instruction, the partial decode module sends it to the branch prediction module, which obtains the jump target address of the instruction by static prediction; the PC generation module then generates the next fetch address from that jump target address.
The partial decode module, the branch prediction module and the PC generation module are combinational logic, so decoding of the current instruction, branch prediction and generation of the next fetch address all complete within the same clock cycle.
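The ordinary/branch classification and next-address selection above can be illustrated with a short sketch. It assumes the RISC-V RV32I encoding used by the embodiment described later, 32-bit instructions only (no compressed extension), and a hypothetical `predict` callback standing in for the branch prediction module; in hardware this is one block of combinational logic, not sequential code.

```python
from typing import Callable

# RV32I major opcodes (bits [6:0]) of the control-transfer instructions.
OPCODE_BRANCH = 0b1100011  # conditional branches: BEQ, BNE, BLT, BGE, ...
OPCODE_JAL = 0b1101111     # unconditional direct jump
OPCODE_JALR = 0b1100111    # unconditional indirect jump


def partial_decode(instr: int) -> str:
    """Look only at the opcode field: ordinary instruction or a jump kind?"""
    opcode = instr & 0x7F
    return {OPCODE_BRANCH: "branch", OPCODE_JAL: "jal",
            OPCODE_JALR: "jalr"}.get(opcode, "ordinary")


def next_fetch_pc(pc: int, instr: int,
                  predict: Callable[[int, int, str], int]) -> int:
    """Next-PC selection: ordinary instructions fall through to pc + 4 (mod 2**32);
    branch/jump instructions take the (hypothetical) predictor's address."""
    kind = partial_decode(instr)
    if kind == "ordinary":
        return (pc + 4) & 0xFFFFFFFF
    return predict(pc, instr, kind)
```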
Preferably, in the above processor pipeline structure, the execution unit comprises a decoding module, a dispatch module, an instruction tracking module, a single-cycle instruction computing module, a long-cycle instruction computing module and a delivery module.
The first end of the decoding module is connected to the second end of the instruction register, its second end to the second end of the second program counter, and its third end to the first end of the dispatch module. The second end of the dispatch module is connected to the first end of the instruction tracking module, its third end to the first end of the single-cycle instruction computing module, its fourth end to the first end of the long-cycle instruction computing module, and its fifth end to the first end of the memory access unit. The second end of the long-cycle instruction computing module is connected to the first end of the first write-back module. The second end of the instruction tracking module is connected to the second end of the first write-back module. The second end of the single-cycle instruction computing module is connected to the second end of the second write-back module, its third end to the second end of the memory access unit, and its fourth end to the first end of the delivery module. The second end of the delivery module is connected to the third end of the second write-back module, and its third end to the sixth end of the PC generation module.
The instruction tracking module stores information about long-cycle instructions that have been dispatched but not yet written back. When the dispatch module dispatches an instruction, it compares the information of the instruction being dispatched with each long-cycle instruction entry stored in the instruction tracking module, to determine whether the current instruction has a data dependence on a long-cycle instruction that has been dispatched but not yet written back. If not, the instruction is dispatched normally; if so, dispatch stalls until the related long-cycle instruction finishes executing and the data dependence is resolved, and only then resumes.
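The dispatch-time dependence check can be sketched as follows. This is an illustrative Python model, assuming each outstanding long-cycle instruction is represented by its result register index and source register indices (the `Instr` type is hypothetical). It checks only RAW and WAW dependences, the two the utility model names, and treats x0 as never creating a dependence because RISC-V hard-wires it to zero.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Instr:
    rd: int               # destination register index (0 means x0 / none)
    rs: Tuple[int, ...]   # source register indices


def hazard(new: Instr, outstanding: Iterable[Instr]) -> bool:
    """True if `new` has a RAW or WAW dependence on any long-cycle
    instruction that has been dispatched but not yet written back."""
    for old in outstanding:
        if old.rd == 0:            # writes to x0 are discarded: no dependence
            continue
        raw = old.rd in new.rs     # new reads a register the old one will write
        waw = old.rd == new.rd     # both write the same register
        if raw or waw:
            return True
    return False


def dispatch(new: Instr, outstanding: Iterable[Instr]) -> str:
    """Stall dispatch on a dependence instead of forwarding the result."""
    return "stall" if hazard(new, outstanding) else "dispatch"
```

Stalling rather than forwarding matches the trade-off the text describes: no bypass network is needed from the long-cycle units, at the cost of occasional dispatch stalls.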
Preferably, in the above processor pipeline structure, the delivery module comprises an exception judging submodule and a branch prediction resolution submodule.
The first ends of the branch prediction resolution submodule and of the exception judging submodule are both connected to the fourth end of the single-cycle instruction computing module, and their second ends are both connected to the sixth end of the PC generation module. The third end of the exception judging submodule is connected to the third end of the second write-back module.
The branch prediction resolution submodule uses the result of the single-cycle instruction computing module to judge whether the next fetch address generated by the PC generation module is correct. If it is, no action is taken; if not, the wrong address is flushed, and a new next fetch address is generated and fed back to the PC generation module.
The exception judging submodule uses the result of the single-cycle instruction computing module to judge whether an error occurred while the current instruction was executing. If not, no action is taken; if so, the current instruction address is flushed, and a new address is generated and fed back to the PC generation module.
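The checks performed by the delivery module can be sketched as a single function. The sketch assumes 32-bit, non-compressed instructions (so the fall-through address is `pc + 4`), assumes that an exception takes priority over a branch misprediction, and uses a hypothetical `trap_vector` as the new address generated on an exception; none of these details is specified in the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Redirect:
    flush: bool              # discard wrongly fetched instructions
    new_pc: Optional[int]    # address fed back to the PC generation module


NO_REDIRECT = Redirect(False, None)


def commit(pc: int, predicted_next: int, taken: bool, target: int,
           exception: bool, trap_vector: int) -> Redirect:
    """Delivery-stage check for one single-cycle result (illustrative only)."""
    if exception:                                   # exception judging submodule
        return Redirect(True, trap_vector)
    actual_next = target if taken else (pc + 4) & 0xFFFFFFFF
    if actual_next != predicted_next:               # branch prediction resolution
        return Redirect(True, actual_next)
    return NO_REDIRECT                              # prediction was correct
```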
Preferably, in the above processor pipeline structure, the instruction tracking module comprises multiple entries for storing long-cycle instruction information; each entry stores the information of one long-cycle instruction, including its source operand register indices and its result register index.
Preferably, in the above processor pipeline structure, the instruction tracking module is implemented as a FIFO. When the first write-back module writes back multiple long-cycle instructions, it arbitrates their write-back order according to the FIFO read pointer, so that they are written back in order. After a long-cycle instruction has been written back, the instruction tracking module deletes its entry.
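A FIFO-based instruction tracking module can be modelled with a circular buffer and read/write pointers. In this sketch, which is an illustration under assumptions rather than the patented design, the slot index returned by `push` acts as a tag that travels with the long-cycle instruction, only the instruction at the read pointer may write back, and the depth of 2 entries is a hypothetical value chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TrackEntry:
    srcs: Tuple[int, ...]  # source operand register indices
    rd: int                # result register index


class TrackFifo:
    """Circular FIFO with read/write pointers, as a hardware FIFO would use."""

    def __init__(self, depth: int = 2):  # depth 2 is a hypothetical choice
        self.slots: List[Optional[TrackEntry]] = [None] * depth
        self.rptr = 0
        self.wptr = 0
        self.count = 0

    def push(self, entry: TrackEntry) -> Optional[int]:
        """Record a newly dispatched long-cycle instruction and return its tag,
        or None if the FIFO is full (dispatch would have to stall)."""
        if self.count == len(self.slots):
            return None
        tag = self.wptr
        self.slots[tag] = entry
        self.wptr = (self.wptr + 1) % len(self.slots)
        self.count += 1
        return tag

    def can_write_back(self, tag: int) -> bool:
        """Only the instruction at the read pointer may write back."""
        return self.count > 0 and tag == self.rptr

    def retire(self, tag: int) -> TrackEntry:
        """Delete the entry once its result has been written back."""
        if not self.can_write_back(tag):
            raise ValueError("long-cycle write-back out of dispatch order")
        entry = self.slots[tag]
        self.slots[tag] = None
        self.rptr = (self.rptr + 1) % len(self.slots)
        self.count -= 1
        return entry

    def outstanding(self) -> List[TrackEntry]:
        """Entries oldest-first, e.g. for the dispatch-time dependence check."""
        n = len(self.slots)
        return [self.slots[(self.rptr + i) % n] for i in range(self.count)]
```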
Preferably, in the above processor pipeline structure, the single-cycle instruction computing module also generates memory access addresses.
The memory access unit acts as the memory access controller: based on the memory access address, it determines whether to fetch the corresponding instruction from the instruction storage component or the corresponding data from the data storage component.
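The address generation and address-based routing described above can be sketched as follows. The memory map (`ITCM_BASE`, `DTCM_BASE` and the 64 KiB sizes) is hypothetical, as is the `"unmapped"` result for addresses outside both regions; the effective address follows the standard RISC-V load/store rule of `rs1` plus a sign-extended 12-bit immediate.

```python
# Hypothetical memory map: not taken from the patent.
ITCM_BASE, ITCM_SIZE = 0x8000_0000, 64 * 1024
DTCM_BASE, DTCM_SIZE = 0x9000_0000, 64 * 1024


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    mask = 1 << (bits - 1)
    return (value ^ mask) - mask


def effective_address(rs1_value: int, imm12: int) -> int:
    """Address formed in the single-cycle computing module: rs1 + sext(imm)."""
    return (rs1_value + sign_extend(imm12 & 0xFFF, 12)) & 0xFFFFFFFF


def route(addr: int) -> str:
    """Memory access unit: pick the storage component by address range."""
    if ITCM_BASE <= addr < ITCM_BASE + ITCM_SIZE:
        return "itcm"      # instruction storage component
    if DTCM_BASE <= addr < DTCM_BASE + DTCM_SIZE:
        return "dtcm"      # data storage component
    return "unmapped"      # assumption: outside both regions
```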
Preferably, in the above processor pipeline structure, the instruction storage unit is implemented with an instruction tightly coupled memory.
In general, compared with the prior art, the above technical solutions conceived by the utility model have the following beneficial effects:
(1) In the processor pipeline structure provided by the utility model, the first write-back module and the second write-back module implement two-level instruction write-back. The first write-back module works together with the instruction tracking module to write back the different long-cycle instructions in strictly the same order in which they were dispatched, which keeps the hardware simple and reduces processor area. The second write-back module arbitrates the write-back order of all single-cycle and long-cycle instructions, with long-cycle instructions taking priority; in idle cycles with no long-cycle write-back, single-cycle instructions can write back freely. The two-level write-back strategy separates the delivery of a long-cycle instruction from its write-back, so that even a long-cycle instruction taking many cycles does not block the pipeline: subsequent single-cycle instructions can still write back and be delivered smoothly, which improves processor performance.
(2) In the processor pipeline structure provided by the utility model, the instruction tracking module and the dispatch module cooperate to solve the data dependence problem. The instruction tracking module stores information about long-cycle instructions that have been dispatched but not yet written back. When dispatching an instruction, the dispatch module compares the current instruction's information with each long-cycle instruction entry in the instruction tracking module, to judge whether the current instruction has a RAW (read-after-write) or WAW (write-after-write) dependence on a long-cycle instruction that has been dispatched but not yet written back. If there is no data dependence, the instruction is dispatched normally; if there is, dispatch stalls until the related long-cycle instruction finishes executing and the dependence is resolved. By stalling the pipeline to resolve data dependences, instead of forwarding (bypassing) long-cycle results directly to the instructions waiting to be dispatched, the utility model reduces processor power and area.
(3) The processor pipeline structure provided by the utility model uses an ITCM with single-cycle access as the instruction memory, so the fetch unit can fetch one instruction from the ITCM per cycle. Replacing a traditional cache with an ITCM meets the real-time requirements of ultra-low-power, small-area processors while reducing processor cost and area. The partial decode module, the branch prediction module and the PC generation module are combinational logic, so the fetch unit completes instruction read, partial decoding, branch prediction and generation of the next fetch PC within one cycle, achieving continuous fetching and greatly improving processor performance.
Description of the drawings
Fig. 1 is an overall architecture diagram of a processor pipeline structure provided by an embodiment of the utility model;
Fig. 2 is a structural diagram of the fetch unit of the processor pipeline structure provided by an embodiment of the utility model;
Fig. 3 is a structural diagram of the execution unit of the processor pipeline structure provided by an embodiment of the utility model;
Fig. 4 is a structural diagram of the write-back unit of the processor pipeline structure provided by an embodiment of the utility model.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the utility model clearer, the utility model is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the utility model and are not intended to limit it. In addition, the technical features involved in the various embodiments of the utility model described below may be combined with each other as long as they do not conflict.
The processor pipeline structure provided by this embodiment of the utility model is mainly intended for low-cost, small-area processors designed for embedded ultra-low-power scenarios, such as the Hummingbird E200 processor core developed in-house by our company on the RISC-V architecture. The pipeline structure comprises multiple pipeline units, specifically an instruction storage unit, a fetch unit, an execution unit, a memory access unit and a write-back unit. The first end of the fetch unit is connected to the instruction storage unit, and its second end to the first end of the execution unit. The second end of the execution unit is connected to the first end of the write-back unit, and its third end to the first end of the memory access unit. The second end of the memory access unit is connected to the second end of the write-back unit.
The utility model divides the pipeline into stages by clock cycle. The instruction storage unit and the fetch unit belong to the first pipeline stage: the instruction storage unit stores instructions, and the fetch unit fetches instructions from it continuously and without interruption. The execution unit and the write-back unit belong to the second pipeline stage: the execution unit decodes and executes the instructions output by the fetch unit, and the write-back unit writes the execution results back to the general-purpose register file (Register File, Regfile). Because decoding, execution and write-back take place in the same clock cycle, the execution unit and the write-back unit together form the second stage of the pipeline.
Some single-cycle instructions complete using only these two stages. Some long-cycle instructions, however, need the memory access function of the memory access unit, which belongs to the third pipeline stage; the results output by the memory access unit are still written back to the general-purpose register file through the write-back unit in the second stage. The ultra-low-power, high-performance processor pipeline structure of this embodiment is therefore a variable-length pipeline. Compared with existing linear pipelines it has fewer pipeline stages, which reduces processor area and cost.
The fetch unit fetches instructions from the instruction storage unit. To improve processor performance, fetching must be both fast and continuous. The pipeline structure of this embodiment mainly targets small-area processors designed for embedded ultra-low-power scenarios, and the program code of embedded processor cores at this level is small. The utility model therefore uses an instruction tightly coupled memory (Instruction Tightly Coupled Memory, ITCM) as the instruction storage unit, and the fetch unit can fetch one instruction from the ITCM per clock cycle, achieving fast fetching. Compared with a traditional I-Cache, this reduces processor cost and area while still meeting the real-time requirements of ultra-low-power, small-area processors.
The fetch unit comprises a first program counter, a second program counter, a PC generation module, a partial decode module, a branch prediction module and an instruction register.
The first end of the instruction register is connected to the first end of the ITCM, and its second end to the first end of the execution unit. The first end of the partial decode module is connected to the second end of the ITCM, its second end to the first end of the branch prediction module, and its third end to the first end of the PC generation module. The second end of the branch prediction module is connected to the second end of the PC generation module. The third end of the PC generation module is connected to the first end of the first program counter, its fourth end to the second end of the first program counter, its fifth end to the third end of the ITCM, and its sixth end to the second end of the execution unit. The third end of the first program counter is connected to the first end of the second program counter, and the second end of the second program counter is connected to the third end of the execution unit.
The instruction register stores the instruction fetched from the ITCM in a given clock cycle (the current instruction) and sends it to the execution unit in the next clock cycle. The second program counter receives the current instruction address from the first program counter and sends it to the execution unit in the next clock cycle, so the execution unit receives the current instruction and the current instruction address together.
The partial decode module decodes the current instruction fetched from the ITCM to determine whether it is an ordinary instruction or a branch/jump instruction. For an ordinary instruction, the partial decode module sends it directly to the PC generation module, which generates the next fetch address (the PC value) from the current instruction and the current instruction address supplied by the first program counter. For a branch/jump instruction, the partial decode module sends it to the branch prediction module, which obtains the jump target address of the instruction by static prediction; the PC generation module then generates the next fetch address from that jump target address and sends it to both the first program counter and the ITCM.
The partial decode module, the branch prediction module and the PC generation module are combinational logic, so decoding of the current instruction, branch prediction, generation of the next fetch address and fetching of the current instruction all complete within the same clock cycle. The fetch unit of this embodiment can therefore fetch the current instruction and generate the next fetch address within one clock cycle, achieving continuous fetching and improving processor performance.
The branch prediction module uses a simple and flexible static prediction method: conditional branch instructions that jump backward are predicted taken, and conditional branch instructions that jump forward are predicted not taken. The specifics are as follows:
1. For conditional direct jump instructions, such as the conditional branch instructions BEQ and BNE, the above static prediction method is used (a backward jump is predicted taken; otherwise the branch is predicted not taken). The jump target address is obtained by adding the offset encoded in the immediate to the instruction's PC.
2. For unconditional direct jump instructions, such as JAL, the jump is always taken, so its direction need not be predicted. The jump target address is obtained by adding the offset encoded in the immediate to the instruction's PC.
3. For unconditional indirect jump instructions, such as JALR, the jump is always taken, so its direction need not be predicted. To compute the jump target address, however, the base address operand indexed by rs1 must be read from the general-purpose register file, and it may have a RAW data dependence on an instruction executing in the execution unit. The utility model handles this differently depending on the rs1 index: if the index corresponds to the constant register, the constant is used directly without reading any register; if the index corresponds to the commonly used link register, that register's value is taken out over a direct wire, and, to prevent a read-after-write data dependence, it must be confirmed that the write-back unit in the second pipeline stage is not writing back to that register. Specifically:
3.1 If the rs1 index is x0, the constant 0 is used directly for the base address calculation (the RISC-V architecture defines x0 as the constant 0), without reading from the Regfile.
3.2 If the rs1 index is x1: because x1 is frequently used as the link register for function-return jumps, x1 is taken out over a direct wire from the Regfile in the execution unit, without occupying a Regfile read port. To prevent a RAW data dependence in which an instruction executing in the execution unit needs to write back the link register, the branch prediction module must check whether any instruction is currently writing back the link register.
3.3 If the rs1 index is any register other than x0 and x1 (denoted xn), xn must be read from the Regfile through its read port, and it must be confirmed that the read port is currently idle so that there is no resource conflict. Meanwhile, to prevent a RAW data dependence in which an instruction executing in the execution unit needs to write back xn, the branch prediction module must check whether any instruction is currently writing back the Regfile.
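The static prediction rules above can be sketched in Python using the standard RV32I immediate encodings. The sketch returns the predicted next fetch address. Returning `None` when a JALR base register is not yet safe to read (so that fetch waits) is an assumption about how the x1 and xn checks above are acted on, and `read_reg`, `x1_pending` and `xn_ready` are hypothetical inputs standing in for the direct x1 wire, the pending-write check and the read-port check. The JALR target clears bit 0, as the RISC-V specification requires.

```python
from typing import Callable, Optional

MASK32 = 0xFFFFFFFF


def sext(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide two's-complement field."""
    mask = 1 << (bits - 1)
    return (value ^ mask) - mask


def b_imm(i: int) -> int:
    """B-type offset: imm[12|10:5] in bits 31:25, imm[4:1|11] in bits 11:7."""
    imm = ((((i >> 31) & 0x1) << 12) | (((i >> 7) & 0x1) << 11)
           | (((i >> 25) & 0x3F) << 5) | (((i >> 8) & 0xF) << 1))
    return sext(imm, 13)


def j_imm(i: int) -> int:
    """J-type offset: imm[20|10:1|11|19:12] in bits 31:12."""
    imm = ((((i >> 31) & 0x1) << 20) | (((i >> 12) & 0xFF) << 12)
           | (((i >> 20) & 0x1) << 11) | (((i >> 21) & 0x3FF) << 1))
    return sext(imm, 21)


def predict_next_pc(pc: int, instr: int, read_reg: Callable[[int], int],
                    x1_pending: bool = False,
                    xn_ready: bool = True) -> Optional[int]:
    """Static prediction; None means the JALR base cannot be read this cycle."""
    opcode = instr & 0x7F
    if opcode == 0b1100011:                  # BEQ, BNE, ...: backward taken
        off = b_imm(instr)
        return (pc + off) & MASK32 if off < 0 else (pc + 4) & MASK32
    if opcode == 0b1101111:                  # JAL: always taken
        return (pc + j_imm(instr)) & MASK32
    if opcode == 0b1100111:                  # JALR: always taken
        rs1 = (instr >> 15) & 0x1F
        if rs1 == 0:
            base = 0                         # x0 is the constant 0
        elif rs1 == 1:
            if x1_pending:                   # x1 is about to be written back
                return None
            base = read_reg(1)               # dedicated x1 wire
        else:
            if not xn_ready:                 # read port busy or xn pending
                return None
            base = read_reg(rs1)             # shared Regfile read port
        target = base + sext((instr >> 20) & 0xFFF, 12)
        return target & MASK32 & ~1          # RISC-V clears bit 0
    return (pc + 4) & MASK32                 # not a control transfer
```

For example, `beq x0, x0, -8` (encoded `0xFE000CE3`) at PC `0x100` is a backward branch, so the sketch predicts it taken and returns `0xF8`.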
The execution unit comprises a decoding module, a dispatch module, an instruction tracking module, a single-cycle instruction computing module, a long-cycle instruction computing module and a delivery module.
The first end of the decoding module is connected to the second end of the instruction register, its second end to the second end of the second program counter, and its third end to the first end of the dispatch module. The second end of the dispatch module is connected to the first end of the instruction trace module, its third end to the first end of the single-cycle instruction computing module, its fourth end to the first end of the long-cycle instruction computing module, and its fifth end to the first end of the memory access unit. The second end of the instruction trace module is connected to the first end of the writeback unit. The second end of the single-cycle instruction computing module is connected to the second end of the writeback unit, its third end to the second end of the memory access unit, and its fourth end to the first end of the delivery module. The second end of the long-cycle instruction computing module is connected to the third end of the writeback unit. The second end of the delivery module is connected to the fourth end of the writeback unit, and its third end to the sixth end of the PC generation module.
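The wiring above is easier to audit as a netlist. This sketch encodes each connection as a pair of (block, end number) tuples; the block names are hypothetical labels for the modules in the text, and `connections_of` is a helper introduced only for illustration.

```python
# Hypothetical encoding of the execution-unit wiring described in the text.
# Each entry joins (block, end number) on one side to (block, end number) on the other.
EXECUTION_UNIT_NETS = [
    (("decode", 1), ("instruction_register", 2)),
    (("decode", 2), ("program_counter_2", 2)),
    (("decode", 3), ("dispatch", 1)),
    (("dispatch", 2), ("instruction_trace", 1)),
    (("dispatch", 3), ("single_cycle_alu", 1)),
    (("dispatch", 4), ("long_cycle_alu", 1)),
    (("dispatch", 5), ("memory_access_unit", 1)),
    (("instruction_trace", 2), ("writeback_unit", 1)),
    (("single_cycle_alu", 2), ("writeback_unit", 2)),
    (("single_cycle_alu", 3), ("memory_access_unit", 2)),
    (("single_cycle_alu", 4), ("delivery", 1)),
    (("long_cycle_alu", 2), ("writeback_unit", 3)),
    (("delivery", 2), ("writeback_unit", 4)),
    (("delivery", 3), ("pc_generation", 6)),
]


def connections_of(block: str) -> dict[int, tuple[str, int]]:
    """Map each numbered end of `block` to the (block, end) it is wired to."""
    result = {}
    for a, b in EXECUTION_UNIT_NETS:
        if a[0] == block:
            result[a[1]] = b
        if b[0] == block:
            result[b[1]] = a
    return result
```

For example, `connections_of("writeback_unit")` lists the four ends that feed the writeback unit from the trace, single-cycle, long-cycle, and delivery paths.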
The decoding module decodes the fetched current instruction and the current instruction address to obtain the operand register indices, and uses those indices to read the corresponding operand data from the Read-Regfile.
The dispatch module sends the operand data obtained by the decoding module to different arithmetic units for execution according to the instruction type. The single-cycle instruction computing module mainly computes and executes single-cycle instructions, and the long-cycle instruction computing module mainly computes and executes long-cycle instructions. The execution results of both single-cycle and long-cycle instructions are written back to the Write-Regfile through the writeback unit.
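A minimal sketch of routing by instruction type, assuming a hypothetical split of opcodes (`SINGLE_CYCLE_OPS`, `LONG_CYCLE_OPS`); the text does not say which operations fall into which class, so these sets are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical classification; the text only says routing is "according to instruction type".
SINGLE_CYCLE_OPS = {"add", "sub", "and", "or", "xor"}
LONG_CYCLE_OPS = {"mul", "div"}


@dataclass
class Decoded:
    op: str
    rd: int   # result register index
    a: int    # operand data already read from the Read-Regfile
    b: int


def dispatch(insn: Decoded) -> str:
    """Pick the execution module for a decoded instruction."""
    if insn.op in SINGLE_CYCLE_OPS:
        return "single_cycle_alu"
    if insn.op in LONG_CYCLE_OPS:
        return "long_cycle_alu"
    raise ValueError(f"unclassified op {insn.op!r}")
```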
The delivery module passes the results of the single-cycle instruction computing module to the PC generation module. It includes an exception judging submodule and a branch prediction analysis submodule.
The first ends of the branch prediction analysis submodule and the exception judging submodule are both connected to the fourth end of the single-cycle instruction computing module, and their second ends are both connected to the sixth end of the PC generation module. The third end of the exception judging submodule is connected to the fourth end of the writeback unit.
The branch prediction analysis submodule uses the result of the single-cycle instruction computing module to judge whether the next fetch address generated by the PC generation module is correct. If it is, no action is taken; if not, the wrong address is flushed, and a new next fetch address is generated and fed back to the PC generation module. The exception judging submodule uses the result of the single-cycle instruction computing module to judge whether the current instruction raised an error during execution. If not, no action is taken; if so, the current instruction address is flushed, and a new address is generated and fed back to the PC generation module. The PC generation module sends the new address to the ITCM, and the Fetch unit fetches the instruction from the ITCM again and sends it to the execution unit for decoding and execution.
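The delivery module's two checks can be modeled as one function that returns either nothing or a redirect address for the PC generation module. This is a sketch under stated assumptions: the exception check is given priority over the misprediction check, and `trap_vector` is a hypothetical stand-in for the "new address" the text leaves unspecified.

```python
from typing import Optional


def delivery_check(predicted_next_pc: int, actual_next_pc: int,
                   exception: bool, trap_vector: int) -> Optional[int]:
    """Return a redirect address for the PC generation module, or None if no action.

    Assumptions: an exception outranks a misprediction, and the new address on
    an exception is the hypothetical `trap_vector`.
    """
    if exception:
        return trap_vector        # flush the current instruction, redirect to the handler
    if predicted_next_pc != actual_next_pc:
        return actual_next_pc     # flush the wrong-path fetch, refetch from the correct target
    return None                   # prediction correct: nothing to do
```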
Because some long-cycle instructions may raise errors during execution, the writeback unit must interface with the exception judging submodule to trigger exceptions. If an exception occurs, the execution result of the long-cycle instruction is not written back to the Write-Regfile.
Because the micro-architecture dispatches instructions in order, each time the dispatch module dispatches an instruction it must check whether that instruction has a data dependence on previously dispatched instructions that are executing but have not yet written back. Data dependences fall into several kinds: write-after-read (Write-After-Read; WAR), read-after-write (Read-After-Write; RAW), and write-after-write (Write-After-Write; WAW).
1. WAR dependence: the pipeline structure provided by the utility model is intended for processors whose micro-architecture dispatches in order and writes back in order, and each instruction reads its source operands from the general register file at dispatch. A later instruction therefore cannot write back to the Write-Regfile before an earlier instruction has read its operands from the Read-Regfile, so data conflicts caused by WAR dependences cannot occur.
2. RAW dependence: a dispatched instruction is in the second stage of the pipeline. If the previously dispatched instruction (the preceding instruction) is a single-cycle instruction, which also writes back in the second stage, it has already finished executing and written its result back to the Write-Regfile, so the dispatched instruction cannot suffer a data conflict from a RAW dependence on it. If the preceding instruction is a long-cycle instruction, which needs multiple cycles before it can write back its result, the dispatched instruction may have a RAW dependence on it.
3. WAW dependence: a dispatched instruction is in the second stage of the pipeline. If the preceding instruction is a single-cycle instruction, it has already finished executing and written its result back to the Write-Regfile, so the dispatched instruction cannot suffer a data conflict from a WAW dependence on it. If the preceding instruction is a long-cycle instruction, which needs multiple cycles before it can write back its result, the dispatched instruction may have a WAW dependence on it.
In summary, in the pipeline structure provided by the utility model, a dispatched instruction can have RAW and WAW dependences only on long-cycle instructions that have not yet finished executing.
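The case analysis above reduces to a simple rule, sketched here with a hypothetical `Insn` record: only an older long-cycle instruction that has not yet written back can cause RAW or WAW conflicts, and WAR is never reported. This models the reasoning, not any specific hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    srcs: tuple[int, ...]   # source register indices
    rd: int                 # result register index
    long_cycle: bool        # True for multi-cycle instructions


def hazards(older: Insn, newer: Insn) -> set[str]:
    """Dependences of `newer` on `older` that can still conflict when
    `newer` is dispatched in the second pipeline stage."""
    if not older.long_cycle:
        # A single-cycle instruction has already written back by this point.
        return set()
    found = set()
    if older.rd in newer.srcs:
        found.add("RAW")
    if older.rd == newer.rd:
        found.add("WAW")
    # WAR is never reported: operands are read at dispatch, and dispatch is in order.
    return found
```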
To detect RAW and WAW dependences between the instruction currently being dispatched and preceding long-cycle instructions, the utility model provides an instruction trace module in the execution unit. This module stores information about long-cycle instructions that have been dispatched but not yet written back, including but not limited to their source operand register indices and result register index.
The instruction trace module is preferably implemented as a first-in, first-out (FIFO) queue. Each time the dispatch module dispatches a long-cycle instruction, it allocates an entry in the instruction trace module to store that instruction's source operand register indices and result register index. After the writeback unit writes the instruction's execution result back to the Write-Regfile, the instruction trace module removes the corresponding entry, so the module always holds exactly the long-cycle instructions that have been dispatched but not yet written back. When dispatching an instruction, the dispatch module compares its source operand register indices and result register index with each entry in the instruction trace module to determine whether it has a RAW or WAW dependence on any dispatched but not yet written-back long-cycle instruction. If there is no data dependence, the instruction is dispatched normally; if there is, dispatch stalls until the corresponding long-cycle instruction finishes and the dependence is released. The FIFO depth defaults to two entries, so information for two long-cycle instructions can be held at once; the number of entries should preferably not exceed four, since more entries would reduce the processor's operating speed.
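A behavioral sketch of the instruction trace module as a FIFO, assuming Python's `collections.deque` as the queue and the default depth of two entries. Stalling when the FIFO is full is an assumption added for completeness; the text only describes stalling on a data dependence. The class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    srcs: tuple[int, ...]   # source operand register indices
    rd: int                 # result register index


class InstructionTrace:
    """Sketch of the instruction trace FIFO (default depth 2, per the text)."""

    def __init__(self, depth: int = 2):
        self.depth = depth
        self.entries: deque[Entry] = deque()

    def conflicts(self, srcs, rd) -> bool:
        """True if a RAW or WAW dependence exists with any outstanding entry."""
        return any(e.rd in srcs or e.rd == rd for e in self.entries)

    def try_dispatch(self, srcs, rd, long_cycle: bool) -> bool:
        """Return True if the instruction dispatches this cycle, False if it stalls."""
        if self.conflicts(srcs, rd):
            return False
        if long_cycle:
            if len(self.entries) == self.depth:
                return False   # assumed: FIFO full, wait for a write-back to free an entry
            self.entries.append(Entry(tuple(srcs), rd))
        return True

    def retire_oldest(self) -> Entry:
        """Called once the writeback unit writes the oldest long-cycle result back."""
        return self.entries.popleft()
```

Comparing only against entries' result registers is enough to find RAW and WAW; the stored source indices are kept because the text lists them as part of each entry.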
For conflicts caused by data dependences, the pipeline structure provided by the utility model stalls the pipeline instead of bypassing long-cycle instruction results directly to subsequent instructions waiting to be dispatched, which reduces the processor's power consumption and area.
Some long-cycle instructions, such as Load and Store instructions and "A" extension instructions, use the memory access function of the memory access unit. After the dispatch module sends such an instruction to the single-cycle instruction computing module, that module computes the memory access address and sends it to the memory access unit. Acting as the memory access control module, the memory access unit uses the address to decide whether to obtain the corresponding instruction from the instruction storage unit or the corresponding data from the data storage unit.
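The address path for Load/Store-type instructions can be sketched as address generation followed by a range check. The address map (`ITCM_BASE`, `DTCM_BASE`, and their sizes) is entirely hypothetical, since the text gives no ranges, and 32-bit address wraparound is assumed.

```python
# Hypothetical address map; the text does not specify any ranges.
ITCM_BASE, ITCM_SIZE = 0x8000_0000, 0x1_0000
DTCM_BASE, DTCM_SIZE = 0x9000_0000, 0x1_0000


def agu(base: int, offset: int) -> int:
    """Address generation in the single-cycle module (assumed 32-bit wraparound)."""
    return (base + offset) & 0xFFFF_FFFF


def route(addr: int) -> str:
    """Memory access unit: choose instruction storage or data storage by address."""
    if ITCM_BASE <= addr < ITCM_BASE + ITCM_SIZE:
        return "instruction_storage"
    if DTCM_BASE <= addr < DTCM_BASE + DTCM_SIZE:
        return "data_storage"
    raise ValueError(f"address {addr:#x} is not mapped")
```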
The writeback unit includes a first writeback module and a second writeback module. The first end of the first writeback module is connected to the second end of the long-cycle instruction computing module, its second end to the second end of the instruction trace module, its third end to the first end of the second writeback module, and its fourth end to the third end of the memory access unit. The second end of the second writeback module is connected to the second end of the single-cycle instruction computing module, and its third end to the third end of the exception judging submodule.
The first writeback module mainly arbitrates the write-back of long-cycle instruction results. As shown in Figure 4, the results of long-cycle instructions processed by the long-cycle instruction computing module or the memory access unit first enter the first writeback module; the long-cycle results it receives may also come from the multiplier-divider, FPU, EAI coprocessor, and so on. In principle these long-cycle instructions need not write back strictly in dispatch order: dispatch order only has to be respected when their registers conflict, and otherwise they could write back out of order. To keep the hardware simple, however, this embodiment writes back long-cycle results strictly in dispatch order. Because different long-cycle instructions take different numbers of cycles, and some even have dynamic cycle counts, their relative order cannot easily be inferred at write-back time, so it must be recorded in advance.
The instruction trace module in this embodiment records information about long-cycle instructions: each time the dispatch module dispatches a long-cycle instruction, it allocates an entry in the instruction trace module to record that instruction's information. The FIFO pointer of this entry serves as the instruction tag (Instruction Tag; ITAG) of the long-cycle instruction, and the instruction carries its ITAG from dispatch until its result is written back. The instruction trace module and the first writeback module cooperate to complete the write-back of all long-cycle instructions, and each long-cycle result received by the first writeback module includes the instruction's ITAG. Because the instruction trace module is a first-in, first-out queue, the FIFO read pointer (Read Pointer) points to the entry that entered the module earliest; the first writeback module sends the result of the long-cycle instruction corresponding to that entry to the second writeback module, and at the same time the instruction trace module deletes the entry. The first writeback module thus determines the write-back order of long-cycle results from the instruction trace module's read pointer, ensuring that write-back order strictly matches dispatch order.
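The ITAG mechanism lets long-cycle results arrive in any order while still being released in dispatch order. The sketch below assumes the ITAG is the FIFO slot index (the write pointer at allocation time, wrapping at the FIFO depth) and that at most one result is released per cycle; the class and method names are hypothetical.

```python
from collections import deque


class LongCycleWriteback:
    """Sketch: instruction trace FIFO plus first writeback module, releasing
    long-cycle results strictly in dispatch order, keyed by ITAG."""

    def __init__(self, depth: int = 2):
        self.depth = depth
        self.wr_ptr = 0        # next FIFO slot to allocate; its index becomes the ITAG
        self.order = deque()   # ITAGs in dispatch order (head = read pointer)
        self.done = {}         # ITAG -> (rd, value) for results that have arrived

    def allocate(self) -> int:
        """Dispatch of a long-cycle instruction: allocate an entry, return its ITAG."""
        if len(self.order) == self.depth:
            raise RuntimeError("trace FIFO full; dispatch must stall")
        itag = self.wr_ptr
        self.order.append(itag)
        self.wr_ptr = (self.wr_ptr + 1) % self.depth
        return itag

    def complete(self, itag: int, rd: int, value: int) -> None:
        """A long-cycle unit (e.g. divider or memory access) reports a tagged result."""
        self.done[itag] = (rd, value)

    def arbitrate(self):
        """Pass at most one result per cycle to the second writeback module."""
        if self.order and self.order[0] in self.done:
            itag = self.order.popleft()   # entry deleted from the trace FIFO
            return self.done.pop(itag)
        return None
```

In the usage below, the younger instruction finishes first but is held until the older one's result arrives.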
The second writeback module receives the single-cycle results sent by the single-cycle instruction computing module and the long-cycle results that have passed arbitration in the first writeback module, and arbitrates the write-back order of all instructions by priority. Because long-cycle instructions take longer to execute, they sit earlier in program order than the single-cycle instruction currently writing back, so long-cycle write-back has higher priority than single-cycle write-back. In idle cycles with no long-cycle write-back, single-cycle instructions can write back freely. A single-cycle instruction later in program order can therefore write back before a long-cycle instruction earlier in program order (provided there is no data dependence), so the pipeline structure provided by the embodiment of the utility model also supports out-of-order write-back.
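The fixed-priority arbitration of the second writeback module can be sketched as follows. What happens to a single-cycle result that loses arbitration is not stated in the text; here it is assumed to wait, and that is reported through a hypothetical `single_stalled` flag.

```python
from typing import Optional, Tuple

Result = Tuple[int, int]   # (destination register index, value)


def second_writeback(long_result: Optional[Result],
                     single_result: Optional[Result]) -> Tuple[Optional[Result], bool]:
    """Choose which result writes the Write-Regfile this cycle.

    Returns (winner, single_stalled). Long-cycle results always win; a
    single-cycle result writes only in a cycle with no long-cycle write-back.
    """
    if long_result is not None:
        return long_result, single_result is not None
    return single_result, False
```

The `(None, single)` case is where out-of-order write-back happens: a younger single-cycle result retires while an older long-cycle instruction is still executing.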
Compared with existing processor pipeline structures, the processor pipeline structure provided by the utility model improves the internal structure of each pipeline stage. The Fetch unit contains a Partial Decode module, a branch prediction module, and a PC generation module built from combinational logic, which can fetch the current instruction and generate the next fetch address within a single clock cycle, enabling continuous instruction fetch and improving processor performance. The execution unit contains an instruction trace module that resolves data dependences by stalling the pipeline rather than bypassing long-cycle results directly to subsequent instructions waiting to be dispatched, which reduces the processor's power consumption and area. The writeback unit is split into a first writeback module and a second writeback module, and this two-level write-back strategy separates the delivery of long-cycle instructions from their write-back, so that even multi-cycle long-cycle instructions do not block the pipeline and subsequent single-cycle instructions can still write back and be delivered smoothly, improving processor performance. This solves the problem that existing processor pipeline structures cannot combine low power consumption, low cost, and small area with high performance.
As will be readily understood by those skilled in the art, the above is only a preferred embodiment of the utility model and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the utility model shall fall within its scope of protection.
Claims (3)
1. A processor pipeline structure, characterized by comprising an instruction storage unit, a Fetch unit, an execution unit, a memory access unit, and a writeback unit, wherein the first end of the Fetch unit is connected to the instruction storage unit, and its second end to the first end of the execution unit; the second end of the execution unit is connected to the first end of the writeback unit, and its third end to the first end of the memory access unit; the second end of the memory access unit is connected to the second end of the writeback unit;
the writeback unit comprises a first writeback module and a second writeback module; the first end of the first writeback module is connected to the second end of the execution unit, its second end to the second end of the memory access unit, and its third end to the first end of the second writeback module; the second end of the second writeback module is connected to the fourth end of the execution unit.
2. The processor pipeline structure according to claim 1, characterized in that the Fetch unit comprises a first program counter, a second program counter, a PC generation module, a Partial Decode module, a branch prediction module, and an instruction register;
the first end of the instruction register is connected to the first end of the instruction storage unit, and its second end to the first end of the execution unit; the first end of the Partial Decode module is connected to the second end of the instruction storage unit, its second end to the first end of the branch prediction module, and its third end to the first end of the PC generation module; the second end of the branch prediction module is connected to the second end of the PC generation module; the third end of the PC generation module is connected to the first end of the first program counter, its fourth end to the second end of the first program counter, its fifth end to the third end of the instruction storage unit, and its sixth end to the third end of the execution unit; the third end of the first program counter is connected to the first end of the second program counter; the second end of the second program counter is connected to the third end of the execution unit.
3. The processor pipeline structure according to claim 1, characterized in that the instruction storage unit is implemented with an instruction tightly coupled memory (ITCM).
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201820352162 | 2018-03-14 | ||
CN2018203521625 | 2018-03-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN208580395U true CN208580395U (en) | 2019-03-05 |
Family
ID=65504537
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201820548377.4U Active CN208580395U (en) | 2018-03-14 | 2018-04-16 | A kind of processor pipeline structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN208580395U (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108287730A (en) * | 2018-03-14 | 2018-07-17 | 武汉市聚芯微电子有限责任公司 | A kind of processor pipeline structure |
CN108287730B (en) * | 2018-03-14 | 2023-12-29 | 武汉市聚芯微电子有限责任公司 | Processor pipeline device |
WO2021243490A1 (en) * | 2020-05-30 | 2021-12-09 | 华为技术有限公司 | Processor, processing method, and related device |
US12086592B2 (en) | 2020-05-30 | 2024-09-10 | Huawei Technologies Co., Ltd. | Processor, processing method, and related device for accelerating graph calculation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108287730A (en) | A kind of processor pipeline structure | |
Reinman et al. | Fetch directed instruction prefetching | |
US7487340B2 (en) | Local and global branch prediction information storage | |
CN101201734B (en) | Method and device for predecoding executive instruction | |
CN104424158A (en) | General unit-based high-performance processor system and method | |
CN100559343C (en) | The method of the instruction that pre decoding is used to carry out and device | |
US20070288733A1 (en) | Early Conditional Branch Resolution | |
US12067396B2 (en) | Variable latency instructions | |
CN102855121B (en) | Branching processing method and system | |
CN103257849A (en) | Program execution control device | |
JP2005182825A5 (en) | ||
CN101495962A (en) | Method and apparatus for prefetching non-sequential instruction addresses | |
CN101971140A (en) | System and method for performing locked operations | |
US20140129805A1 (en) | Execution pipeline power reduction | |
US20070288732A1 (en) | Hybrid Branch Prediction Scheme | |
US20250117247A1 (en) | Entering protected pipeline mode without annulling pending instructions | |
US20070288731A1 (en) | Dual Path Issue for Conditional Branch Instructions | |
CN208580395U (en) | A kind of processor pipeline structure | |
Hayenga et al. | Revolver: Processor architecture for power efficient loop execution | |
Ye et al. | A New Recovery Mechanism in Superscalar Microprocessors by Recovering Critical Misprediction | |
US20070288734A1 (en) | Double-Width Instruction Queue for Instruction Execution | |
CN110109705A (en) | A kind of superscalar processor branch prediction method for supporting embedded edge calculations | |
CN1266592C (en) | Dynamic VLIW command dispatching method according to determination delay | |
Hilton et al. | BOLT: Energy-efficient out-of-order latency-tolerant execution | |
US20210326136A1 (en) | Entering protected pipeline mode with clearing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
GR01 | Patent grant | ||