Unit-4 Processor in DPCO
Unit 4
Instruction Execution - Building a Data Path - Designing a
Control Unit - Hardwired Control, Microprogrammed Control -
Pipelining - Data Hazard - Control Hazards.
* Please watch the videos before referring to the notes
I. Instruction Execution
Steps in detail:
• All instructions start by using the program counter to supply the instruction address to
the instruction memory.
• After the instruction is fetched, the register operands used by an instruction are
specified by fields of that instruction.
• Once the register operands have been fetched, all the instruction classes, except jump,
use the ALU after reading the registers.
> Memory reference instructions (load or store) use the ALU for an address
calculation.
> Arithmetic-logical instructions use the ALU for the operation execution.
> Branches use the ALU for comparison.
• The second input to the ALU can come from a register or the immediate field of the
instruction.
• After using the ALU, the actions required to complete the various instruction classes are
not the same.
> If the operation is a memory reference instruction (a load or store), the ALU
result is used as an address to either store a value from the registers or load a
value from memory into the registers. The result from the ALU or memory is
written back into the register file.
> If the instruction is an arithmetic-logical instruction, the result from the ALU
must be written to a register.
> Branches require the use of the ALU output to determine the next instruction
address, which comes either from the ALU (where the PC and branch offset
are summed) or from an adder that increments the current PC by 4.
Main 5 steps
1. Fetch an instruction and increment the program counter.
2. Decode the instruction and read registers from the register file.
3. Perform an ALU operation.
4. Read or write memory data if the instruction involves a memory operand.
5. Write the result into the destination register, if needed.
Load Instruction
Eg. Load R5, X(R7)
Steps are as follows:
1. Fetch the instruction from the memory.
2. Increment the program counter.
3. Decode the instruction to determine the operation to be performed.
4. Read register R7.
5. Add the immediate value X to the contents of R7.
6. Use the sum X + [R7] as the effective address of the source operand, and read the
contents of that location in the memory.
7. Load the data received from the memory into the destination register, R5.
• Depending on how the hardware is organized, some of these actions can be performed
at the same time.
Arithmetic and Logic Instruction
• There are either two source registers, or a source register and an immediate source
operand.
• No access to memory operands is required.
Eg. Add R3, R4, R5
Steps as follows:
1. Fetch the instruction and increment the program counter.
2. Decode the instruction and read registers R4 and R5.
3. Compute the sum [R4] + [R5].
4. No action.
5. Load the result into the destination register, R3.
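The five main steps can be sketched as a tiny interpreter covering Load, Add, and Store. This is only an illustration: the tuple encoding of instructions, the dictionary-based registers and memory, and the function name `execute` are assumptions made for this sketch, not part of any real processor.

```python
# Hypothetical mini-machine: the instruction encoding, register names, and
# memory contents below are assumptions chosen for illustration.

def execute(instr, regs, mem, pc):
    """Run one instruction through the five steps; return the new PC."""
    # Step 1: fetch (instr is passed in already fetched) and increment the PC.
    pc += 4
    # Step 2: decode and read the source registers.
    op = instr[0]
    if op == "Load":            # Load Rdst, X(Rbase)
        _, dst, x, base = instr
        a, b = regs[base], x
    elif op == "Store":         # Store Rsrc, X(Rbase)
        _, src, x, base = instr
        a, b = regs[base], x
    elif op == "Add":           # Add Rdst, Rsrc1, Rsrc2
        _, dst, s1, s2 = instr
        a, b = regs[s1], regs[s2]
    else:
        raise ValueError(f"unsupported operation: {op}")
    # Step 3: ALU operation (an addition in all three cases).
    alu = a + b
    # Step 4: memory access, only for Load and Store.
    if op == "Load":
        result = mem[alu]
    elif op == "Store":
        mem[alu] = regs[src]
    else:
        result = alu            # "No action" for Add
    # Step 5: write the destination register, if there is one.
    if op in ("Load", "Add"):
        regs[dst] = result
    return pc

regs = {"R3": 0, "R4": 10, "R5": 32, "R6": 7, "R7": 100, "R8": 200}
mem = {120: 55}
pc = 0
pc = execute(("Load", "R5", 20, "R7"), regs, mem, pc)   # R5 <- mem[20 + 100]
pc = execute(("Add", "R3", "R4", "R5"), regs, mem, pc)  # R3 <- R4 + R5
pc = execute(("Store", "R6", 8, "R8"), regs, mem, pc)   # mem[8 + 200] <- R6
print(regs["R5"], regs["R3"], mem[208], pc)             # 55 65 7 12
```

Note how step 4 is the only place where the three instruction classes differ in memory use, and step 5 is skipped for Store.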
Store Instruction
Eg. Store R6, X(R8)
Steps as follows:
1. Fetch the instruction and increment the program counter.
2. Decode the instruction and read registers R6 and R8.
3. Compute the effective address X + [R8].
4. Store the contents of register R6 into memory location X + [R8].
5. No action.
II. Building a Datapath - Diagram is Mandatory (Draw the individual blocks separately
first, then draw the final diagram at the end. I have mentioned the individual blocks in the
video.)
Datapath
• A datapath is a collection of functional units, such as arithmetic logic units or
multipliers, that perform data processing operations, together with registers and buses. Along with
the control unit it composes the central processing unit (CPU).
• A larger datapath can be made by joining more than one datapath using multiplexers.
1. Program Counter (PC)
A program counter is a register in a computer processor that contains the address
(location) of the instruction being executed at the current time. As each instruction gets
fetched, the program counter advances to the address of the next instruction (PC + 4 in
MIPS, where every instruction is 4 bytes long).
2. Adder
Used to increment the PC to the address of the next instruction.
It can be built from an ALU wired to always perform an addition.
3. Instruction Memory
A memory unit to store the instructions of a program and supply instructions
given an address.
4. Registers
• The processor's 32 general-purpose registers are stored in a structure called a register
file. A register file is a collection of registers in which any register can be read or
written by specifying the number of the register in the file.
• The register file contains the register state of the computer.
• An ALU is used to operate on the values read from the
registers.
5. Processing of R-format instruction in ALU:
Example: add $t1, $t2, $t3
• R-format instructions have three register operands, so we will need to read two data
words from the register file and write one data word into the register file for each
instruction.
• For each data word to be read from the registers, we need an input to the register file
that specifies the register number to be read and an output from the register file that
will carry the value that has been read from the registers.
• The two values read are added using an ALU.
• To write a data word, we will need two inputs: one to specify the register number to
be written and one to supply the data to be written into the register.
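The register-file ports described above (two register numbers in and two data words out for reading; one register number and one data word in for writing) can be sketched as a small class. The class and parameter names are assumptions for illustration, and the special behavior of MIPS register 0 is not modeled.

```python
class RegisterFile:
    """32 registers, two read ports, one write port (names are illustrative)."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        # Two register numbers in, two data words out.
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, write_reg, write_data, reg_write=True):
        # The write happens only when the write control is asserted.
        if reg_write:
            self.regs[write_reg] = write_data & 0xFFFFFFFF

T1, T2, T3 = 9, 10, 11        # MIPS register numbers of $t1, $t2, $t3

rf = RegisterFile()
rf.write(T2, 5)
rf.write(T3, 7)
a, b = rf.read(T2, T3)        # read two data words
rf.write(T1, a + b)           # the ALU adds them; the result is written to $t1
print(rf.regs[T1])            # 12
```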
6. Processing of Load/Store Instruction:
Example: lw $t1, offset_value($t2)
         sw $t1, offset_value($t2)
1. Sign Extend - Converts the 16-bit offset field in the instruction to a 32-bit signed value.
2. Data Memory - The memory unit is a state element with inputs for the address and the
write data, and a single output for the read result. There are separate read and write
controls, although only one of these may be asserted on any given clock.
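Sign extension and the load/store address calculation can be illustrated as follows. This is a minimal sketch; the function names are made up for this example, and values are kept as 32-bit bit patterns.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit field to a 32-bit two's-complement bit pattern."""
    value &= 0xFFFF
    if value & 0x8000:                 # sign bit set: fill the upper 16 bits with 1s
        value |= 0xFFFF0000
    return value

def effective_address(base, offset16):
    # lw/sw address = base register + sign-extended offset, wrapped to 32 bits.
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF

print(hex(sign_extend16(0x0004)))              # 0x4
print(hex(sign_extend16(0xFFFC)))              # 0xfffffffc, i.e. -4
print(hex(effective_address(0x1000, 0xFFFC)))  # 0xffc
```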
7. Processing of Branch Instructions
Eg. beq $t1, $t2, offset
Explanation of example:
The beq instruction has three operands: two registers that are compared for equality, and
a 16-bit offset used to compute the branch target address.
If contents of $t1 = contents of $t2 - Compute the target using the offset and take the branch.
(The ALU is used to check equality by subtracting: if the Zero flag is set, $t1 == $t2.)
Else - Proceed with the next instruction.
1. Separate adder - Used for computing the branch target address.
2. Shift left 2 - Used to add two zeroes to the low-order end of the sign-extended offset
field.
3. Multiplexer - Selects one of its inputs according to a control signal, for example
choosing between PC + 4 and the branch target as the next PC, depending on the
instruction.
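The branch pieces (sign extend, shift left 2, separate adder, Zero flag, multiplexer) can be combined in a short sketch. It assumes the MIPS convention that the target is PC + 4 plus the shifted offset; the function names are hypothetical.

```python
def sign_extend16(value):
    value &= 0xFFFF
    return value | 0xFFFF0000 if value & 0x8000 else value

def mux(select, input0, input1):
    """2-to-1 multiplexer: returns input1 when select is asserted."""
    return input1 if select else input0

def next_pc(pc, rs_value, rt_value, offset16):
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # Shift left 2 turns the word offset into a byte offset;
    # the separate adder then forms the branch target.
    target = (pc_plus_4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF
    zero = (rs_value - rt_value) == 0      # the ALU subtracts; Zero means equal
    return mux(zero, pc_plus_4, target)

print(hex(next_pc(0x100, 5, 5, 3)))       # 0x110: equal, branch taken
print(hex(next_pc(0x100, 5, 6, 3)))       # 0x104: not equal, fall through
print(hex(next_pc(0x100, 1, 1, 0xFFFF)))  # 0x100: offset -1 branches back to itself
```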
Control Signals - I have not covered this in video. But you can read if required.
Signal name | Effect when deasserted | Effect when asserted
RegDst | The register destination number for the Write register comes from the rt field (bits 20:16). | The register destination number for the Write register comes from the rd field (bits 15:11).
RegWrite | None. | The register on the Write register input is written with the value on the Write data input.
ALUSrc | The second ALU operand comes from the second register file output (Read data 2). | The second ALU operand is the sign-extended, lower 16 bits of the instruction.
PCSrc | The PC is replaced by the output of the adder that computes the value of PC + 4. | The PC is replaced by the output of the adder that computes the branch target.
MemRead | None. | Data memory contents designated by the address input are put on the Read data output.
MemWrite | None. | Data memory contents designated by the address input are replaced by the value on the Write data input.
MemtoReg | The value fed to the register Write data input comes from the ALU. | The value fed to the register Write data input comes from the data memory.
III. Control Unit
The setting of the control signals depends on:
• Contents of the step counter
• Contents of the instruction register
• The result of a computation or a comparison operation
• External input signals, such as interrupt requests
Hardwired Control Unit
• It is a method of generating control signals with the help of Finite State Machines
(FSM). It is built as a sequential logic circuit by physically connecting
components such as flip-flops, gates, and drums. As a
result, it is known as a hardwired controller.
• Instruction register - a processor register that holds the instruction
currently in execution. It supplies the opcode bits of the operation as
well as the addressing mode of the operands.
• Instruction decoder - decodes the opcode. On the basis of the operation and the
addressing mode held in the instruction register, the
instruction decoder sets the corresponding instruction signal INSi to 1.
• Step Counter - specifies the current step of instruction execution. It produces the
signals T1, ..., T5. Depending on which step the instruction is in,
exactly one of the signals T1 to T5 is set to 1.
• Clock - Each step takes one clock cycle. For
example, if the step counter has set T3 to 1, then after completing one clock
cycle, the step counter will set T4 to 1.
• Counter Enable - when deasserted, it holds the step counter at the current step until
that step of execution is complete; the counter then increments to the next step signal.
• Condition Signals - carry the results of comparisons such as less than, greater than,
less than or equal, greater than or equal, and so on. The control signal generator uses
them when generating control signals.
• External inputs - tell the Control Signal Generator
about interrupts, which affect the execution of an instruction.
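A hardwired control signal generator is just combinational logic over the step signals T1..T5 and the decoder outputs INSi. The sketch below derives a few signals from the five-step sequences of Load, Add, and Store; the signal names (`MemRead`, `PCIncrement`, and so on) and the three-instruction set are assumptions for illustration, and condition and interrupt inputs are left out.

```python
OPCODES = ("Load", "Add", "Store")      # hypothetical instruction set

def decode(opcode):
    """Instruction decoder: exactly one INS_i line is set to 1."""
    return {name: int(name == opcode) for name in OPCODES}

def step_counter(step):
    """One-hot timing signals T1..T5 for the current step (1-based)."""
    return {f"T{i}": int(i == step) for i in range(1, 6)}

def control_signals(t, ins):
    """Control signal generator: plain AND/OR logic, no stored program."""
    return {
        # Memory is read in T1 (instruction fetch) and in T4 of a Load.
        "MemRead": t["T1"] | (t["T4"] & ins["Load"]),
        "MemWrite": t["T4"] & ins["Store"],
        "PCIncrement": t["T1"],
        "RegWrite": t["T5"] & (ins["Load"] | ins["Add"]),
    }

ins = decode("Load")
for step in range(1, 6):
    print(step, control_signals(step_counter(step), ins))
```

Changing the behavior of such a controller means changing the logic equations themselves, which is why hardwired control is fast but hard to modify.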
Microprogrammed Control Unit
• A control unit whose binary control values are saved as words in memory is called a
microprogrammed control unit.
1. Control Word: A control word is a word whose individual bits represent various control signals.
2. Micro-routine: A sequence of control words corresponding to the control sequence of a
machine instruction constitutes the micro-routine for that instruction.
3. Micro-instruction: Individual control words in this micro-routine are referred to as
microinstructions.
4. Micro-program: A sequence of micro-instructions is called a micro-program, which is stored
in a ROM or RAM called a Control Memory (CM).
5. Control Store: The micro-routines for all instructions in the instruction set of a computer are
stored in a special memory called the Control Store.
[Figure: organization of a microprogrammed control unit. 1 microinstruction = 1 control word.]
• The control memory address register specifies the address of the microinstruction.
• The control memory is assumed to be a ROM, within which all control information is permanently stored.
• The control register holds the microinstruction fetched from the memory.
• The microinstruction contains a control word that specifies one or more micro-operations for the data processor.
• While the micro-operations are being executed, the next address is computed in the next address generator
circuit and then transferred into the control address register to read the next microinstruction.
• The next address generator is often referred to as a micro-program sequencer, as it determines the address
sequence that is read from control memory.
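The fetch loop of a microprogrammed control unit can be sketched as below. The control-word bit assignment, the addresses, and the idea of storing the next address inside each microinstruction are simplifying assumptions for this example; real sequencers can also branch on opcode and condition bits.

```python
# Control memory (ROM): each entry is one microinstruction, i.e. a control word
# plus a next-address field. The bit assignment and addresses are made up.
MEM_READ, MEM_WRITE, REG_WRITE, ALU_ADD, PC_INC = 1, 2, 4, 8, 16
END = -1   # marks the last microinstruction of a micro-routine

CONTROL_MEMORY = {
    0: (MEM_READ | PC_INC, 1),   # Load micro-routine: fetch, increment PC
    1: (0, 2),                   #   decode, read register
    2: (ALU_ADD, 3),             #   compute X + [R7]
    3: (MEM_READ, 4),            #   read the operand from memory
    4: (REG_WRITE, END),         #   write the destination register
}

def run_microroutine(start_address):
    """Return the control words issued, in order."""
    car = start_address                           # control address register
    issued = []
    while car != END:
        control_word, next_address = CONTROL_MEMORY[car]   # control register
        issued.append(control_word)
        car = next_address                        # next-address generator
    return issued

print([format(w, "05b") for w in run_microroutine(0)])
# ['10001', '00000', '01000', '00001', '00100']
```

Supporting a new instruction here only requires adding entries to `CONTROL_MEMORY`, which reflects the flexibility usually attributed to microprogrammed control.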
Attribute | Hardwired Control Unit | Microprogrammed Control Unit
Speed | Fast | Slow
Cost of implementation | More | Cheaper
Flexibility | Difficult to modify | Flexible
Ability to handle complex instructions | Difficult | Easier
Decoding | Complex | Easy
Application | RISC | CISC
Instruction set size | Small | Large
Control memory | Absent | Present
Pipelining
• Pipelining is an implementation technique in which multiple instructions are
overlapped in execution.
• Pipelining is an arrangement of the hardware elements of the CPU such that its
overall performance is increased.
• Simultaneous execution of more than one instruction takes place in a pipelined
processor.
Real Life Example - For the explanation, refer to my video
FIGURE 4.25 The laundry analogy for pipelining. Ann, Brian, Cathy, and Don each have dirty
clothes to be washed, dried, folded, and put away. The washer, dryer, "folder", and "storer" each take 30
minutes for their task. Sequential laundry takes 8 hours for 4 loads of wash, while pipelined laundry takes
just 3.5 hours. We show the pipeline stage of different loads over time by showing copies of the four resources
on this two-dimensional time line, but we really have just one of each resource.
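The 8-hour and 3.5-hour figures in the caption follow from simple counting, assuming every stage takes the same time and nothing stalls. The function names below are illustrative.

```python
def sequential_time(tasks, stages, stage_time):
    # Each task runs all of its stages before the next task starts.
    return tasks * stages * stage_time

def pipelined_time(tasks, stages, stage_time):
    # The first task needs `stages` time slots; each later task
    # finishes one slot after the previous one.
    return (stages + tasks - 1) * stage_time

loads, stages, minutes = 4, 4, 30
print(sequential_time(loads, stages, minutes) / 60)   # 8.0 hours
print(pipelined_time(loads, stages, minutes) / 60)    # 3.5 hours
```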
Design of a basic pipeline
• In a pipelined processor, a pipeline has two ends, the input end and the output end.
Between these ends, there are multiple stages/segments such that the output of one
stage is connected to the input of the next stage and each stage performs a specific
operation.
• Interface registers are used to hold the intermediate output between two stages.
These interface registers are also called latches or buffers.
• All the stages in the pipeline along with the interface registers are controlled by a
common clock.
Diagrammatic Representation of No-Pipeline vs Pipeline
[Figure: non-pipelined version vs pipelined version, showing successive instructions overlapped in time]
Main points:
• Instruction pipelining is a technique that implements a form of parallelism called
instruction-level parallelism within a single processor.
• Multiple instructions are executed in parallel.
• Staging:
  o The hardware of the CPU is split up into several functional units.
  o Each functional unit performs a dedicated task.
  o The number of functional units may vary from processor to processor.
  o These functional units are called the stages of the pipeline.
  o The control unit manages all the stages using control signals.
  o There is a register associated with each stage that holds the data.
  o There is a global clock that synchronizes the working of all the stages.
  o At the beginning of each clock cycle, each stage takes the input from its
    register.
  o Each stage then processes the data and feeds its output to the register of the
    next stage.
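The clocked hand-off between stage registers can be sketched as a shift of the register contents on every clock cycle. The five stage names and the function name are assumptions for this sketch; hazards and stalls are not modeled.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def run_pipeline(instructions):
    """Return one row per clock cycle showing which instruction is in each stage."""
    regs = [None] * len(STAGES)     # regs[i] is the input register of stage i
    pending = list(instructions)
    timeline = []
    while pending or any(r is not None for r in regs):
        # Clock edge: every stage hands its content to the next stage's register,
        # and the first stage takes a new instruction if one is waiting.
        regs = [pending.pop(0) if pending else None] + regs[:-1]
        if any(r is not None for r in regs):
            timeline.append(list(regs))
    return timeline

for cycle, row in enumerate(run_pipeline(["I1", "I2", "I3"]), start=1):
    print(cycle, dict(zip(STAGES, row)))
```

Three instructions finish in 5 + 3 - 1 = 7 cycles, and in cycle 3 three different instructions occupy IF, ID, and EX at once.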
Advantages vs Drawbacks
Advantages of Pipelining
• Instruction throughput increases.
• An increase in the number of pipeline stages increases
the number of instructions executed simultaneously.
• A faster ALU can be designed when pipelining is used.
• Pipelined CPUs work at higher clock frequencies
than the RAM.
• Pipelining increases the overall performance of the
CPU.
Disadvantages of Pipelining
• Designing a pipelined processor is
complex.
• Instruction latency increases in pipelined
processors.
• The throughput of a pipelined processor
is difficult to predict.
• The longer the pipeline, the worse the
problem of hazards for branch
instructions.
Hazards
There are situations in pipelining when the next instruction cannot execute in the following
clock cycle. These events are called hazards, and there are three different types.
1) Structural Hazard
2) Data Hazard
3) Control Hazard
1. Structural Hazard
• It means that the hardware cannot support the combination of instructions that we
want to execute in the same clock cycle.
• This dependency arises due to a resource conflict in the pipeline. A resource conflict
is a situation in which more than one instruction tries to access the same resource in the
same cycle. A resource can be a register, memory, or ALU.
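A resource conflict can be found mechanically by listing which instruction uses which resource in each cycle. The sketch below assumes a five-stage pipeline in which instruction fetch (IF) and data access (MEM) share a single memory, and one new instruction enters per cycle; these assumptions and the function name are for illustration only.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
USES_MEMORY = {"IF", "MEM"}      # assumption: one memory for code and data

def memory_conflicts(num_instructions):
    """Return {cycle: [instructions]} for cycles where two instructions need memory."""
    users = {}
    for i in range(num_instructions):            # instruction i+1 starts in cycle i+1
        for s, stage in enumerate(STAGES):
            if stage in USES_MEMORY:
                users.setdefault(i + 1 + s, []).append(f"I{i + 1}")
    return {c: u for c, u in sorted(users.items()) if len(u) > 1}

print(memory_conflicts(4))   # {4: ['I1', 'I4']}
```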
Cycle:  1        2        3        4        5
I1:     IF(Mem)  ID       EX       Mem      WB
I2:              IF(Mem)  ID       EX       Mem
I3:                       IF(Mem)  ID       EX
I4:                                IF(Mem)  ID
• In the above scenario, in cycle 4, instructions I1 and I4 are trying to access the same
resource (memory), which introduces a resource conflict.
• To avoid this problem, we have to keep the instruction waiting until the required
resource (memory in our case) becomes available. This wait will introduce stalls in
the pipeline as shown below:
Cycle:  1        2        3        4        5        6        7        8
I1:     IF(Mem)  ID       EX       Mem      WB
I2:              IF(Mem)  ID       EX       Mem      WB
I3:                       IF(Mem)  ID       EX       Mem      WB
I4:                                -        -        -        IF(Mem)  ID
2. Data Hazard
Data hazards occur when instructions that exhibit data dependence modify data in different
stages of a pipeline.
I1: ADD R1, R2, R3
I2: SUB R5, R1, R2
• When the above instructions are executed in a pipelined processor, a data
dependency arises: I2 tries to read R1 before I1
writes it, so I2 incorrectly gets the old value of R1.
• To minimize data dependency stalls in the pipeline, operand forwarding is used.
Operand Forwarding: In operand forwarding, we use the interface registers present
between the stages to hold the intermediate output so that the dependent instruction can access
the new value from the interface register directly.
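The effect of forwarding on the I1/I2 pair can be shown with a small model in which I2 reads its operands before I1 has written the register file. The tuple encoding, the initial register values, and the function names are assumptions for this sketch.

```python
OPS = {"ADD": lambda a, b: a + b, "SUB": lambda a, b: a - b}

def raw_hazard(i1, i2):
    """True if i2 reads the register that i1 writes (read-after-write)."""
    return i1[1] in i2[2:]

def run_pair(i1, i2, regs, forwarding):
    """I2 enters EX one cycle after I1, before I1 has written the register file."""
    regs = dict(regs)
    op1, d1, a1, b1 = i1
    op2, d2, a2, b2 = i2
    result1 = OPS[op1](regs[a1], regs[b1])    # latched in the EX interface register

    def operand(name):
        if forwarding and name == d1:
            return result1                    # forwarded from the interface register
        return regs[name]                     # register file still holds the old value

    result2 = OPS[op2](operand(a2), operand(b2))
    regs[d1], regs[d2] = result1, result2     # write-back of both instructions
    return regs

I1 = ("ADD", "R1", "R2", "R3")
I2 = ("SUB", "R5", "R1", "R2")
start = {"R1": 0, "R2": 10, "R3": 5, "R5": 0}
print(raw_hazard(I1, I2))                                # True
print(run_pair(I1, I2, start, forwarding=False)["R5"])   # -10, uses the stale R1
print(run_pair(I1, I2, start, forwarding=True)["R5"])    # 5, uses the forwarded R1 = 15
```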
FIGURE 4.29 Graphical representation of forwarding. The connection shows the forwarding path
from the output of the EX stage of add to the input of the EX stage for sub, replacing the value from register
$s0 read in the second stage of sub.
3. Control Hazard
This type of dependency occurs during the transfer of control by instructions such as BRANCH, CALL, JMP, etc.
On many instruction architectures, the processor will not know the target address of these instructions when it
needs to insert the new instruction into the pipeline.
Due to this, unwanted instructions are fed to the pipeline.
Eg.
100: I1
101: I2 (JMP 250) (Jump address known after ID stage only)
102: I3
...
250: BI1
Expected output: I1 -> I2 -> BI1
Cycle:  1    2    3              4     5     6
I1:     IF   ID   EX             MEM   WB
I2:          IF   ID (PC:250)    EX    Mem   WB
I3:               IF             ID    EX    Mem
BI1:                             IF    ID    EX
The output which we get: I1 -> I2 -> I3 -> BI1
So, the output sequence is not equal to the expected output, which means the pipeline is not
implemented correctly.
1. Using Stalls - To correct the above problem we need to stop the instruction fetch until
we get the target address of the branch instruction. This can be implemented by introducing
a delay slot until we get the target address.
Cycle:  1    2    3              4     5     6
I1:     IF   ID   EX             MEM   WB
I2:          IF   ID (PC:250)    EX    Mem   WB
Delay:            -              -     -     -
BI1:                             IF    ID    EX
Output Sequence: I1 -> I2 -> Delay (Stall) -> BI1
2. Branch Prediction - There are 2 different types of prediction
1. Static
a. In this strategy the branch is predicted statically, based on the branch code
type. This means that the probability of a branch being taken for a
particular branch type is used to predict the branch.
b. This branch strategy may not produce accurate results every time. One
improvement over branch stalling is to predict that the branch will not
be taken and thus continue execution down the sequential instruction
stream.
2. Dynamic
a. This strategy uses recent branch history during program execution to
predict whether or not the branch will be taken the next time it
occurs. It uses recent branch information to predict the next branch.
This technique is called dynamic branch prediction.
b. A branch prediction buffer or branch history table is a small memory
indexed by the lower portion of the address of the branch instruction.
The memory contains a bit that says whether the branch was recently
taken or not.
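A one-bit branch history table can be sketched in a few lines. The table size, the choice to drop the two byte-offset bits before indexing, and the initial prediction of "not taken" are assumptions for this example.

```python
class BranchHistoryTable:
    """One prediction bit per entry, indexed by low-order branch-address bits."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.taken = [False] * (1 << index_bits)   # start by predicting not taken

    def _index(self, branch_address):
        return (branch_address >> 2) & self.mask   # drop the 2 byte-offset bits

    def predict(self, branch_address):
        return self.taken[self._index(branch_address)]

    def update(self, branch_address, was_taken):
        self.taken[self._index(branch_address)] = was_taken

def count_mispredictions(bht, branch_address, outcomes):
    misses = 0
    for taken in outcomes:
        if bht.predict(branch_address) != taken:
            misses += 1
        bht.update(branch_address, taken)
    return misses

# A loop branch taken 4 times, then not taken on exit; the loop runs twice.
outcomes = [True] * 4 + [False]
bht = BranchHistoryTable()
print(count_mispredictions(bht, 0x400, outcomes * 2))   # 4
```

With a single history bit, each execution of the loop costs two mispredictions: one on entry and one on exit.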
3. Delayed Branching
1) Branch delay slot: the slot directly after a delayed branch instruction, which in the MIPS
architecture is filled by an instruction that does not affect the branch.
2) Delayed branch: a branch for which the instruction in the branch delay slot always executes
after the branch, whether or not the branch is taken.