Computer Architecture

Design, Analysis, Execution and Optimization of Instructions
Reduced Instruction Set Computer (RISC): MIPS
Multi-cycle Datapath and Control Unit
Book: COD by Patterson and Hennessy from Chapter 4 A
Objectives
• Design the processor such that
• The clock period (T) is shorter than that of a single-cycle processor (primary objective)
• Resource utilization is minimized (secondary objective)
• Common-case analysis is applied (secondary objective)
Delay information using 65 nm CMOS

Element             | Parameter | Delay (ps)
Register clk-to-Q   | tpcq      | 30
Register setup      | tsetup    | 20
Multiplexer         | tmux      | 25
ALU                 | tALU      | 200
Memory read         | tmem      | 250
Register file read  | tRFread   | 150
Register file write | tRFwrite  | 100
Register file setup | tRFsetup  | 20

Single-cycle datapath
[Figure: single-cycle MIPS datapath annotated with the element delays in ps; the extra adders and the separate instruction and data memories are marked as redundant]
Performance analysis
• CPI = 1 for the single-cycle implementation
• The lw instruction determines the critical path (Tc)
• On this path, the register file read takes longer than the mux selection
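The lw critical path can be totalled from the delay table. This is a minimal sketch: the decomposition of the path (PC clk-to-Q, instruction memory read, register file read, ALU, data memory read, one multiplexer, register file setup) is an assumption, chosen because it reproduces the Tc = 925 ps quoted in the later comparison. The function name is hypothetical.

```c
/* Element delays in ps, taken from the 65 nm delay table. */
enum {
    T_PCQ = 30, T_MUX = 25, T_ALU = 200,
    T_MEM = 250, T_RF_READ = 150, T_RF_SETUP = 20
};

/* Assumed lw path: PC clk-to-Q -> instruction memory -> register file read
 * -> ALU -> data memory -> write-back mux -> register file setup. */
int single_cycle_tc_ps(void)
{
    return T_PCQ + T_MEM + T_RF_READ + T_ALU + T_MEM + T_MUX + T_RF_SETUP;
}
```

Under these assumptions the two memory reads account for more than half of the clock period.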
Performance analysis
• The execution time of a program is the performance metric:
  Execution time = #instructions × CPI × T
• #instructions (the length of a program) depends on the ISA
• How did we measure the clock period?
• Which instruction dominated the clock period?
• How can we reduce the clock period, i.e. improve the frequency (f = 1/T)?
Improvement on clock period
How about LW?
[Figure: one block of combinational logic between a source (Src) register and a destination (Dstn) register; Clk period = T, f = 1/T]
[Figure: the same logic split into Comb. Logic-1 and Comb. Logic-2 with an extra register in between]
• Combinational Logic = Combinational Logic-1 ∪ Combinational Logic-2
• New clock period = Max{ (src-1, dstn-1), (src-2, dstn-2) }
• With an even split, Clk period = T/2 and f = 2/T
• This is not the pipeline structure
Placement of Registers / Partitioning the logic
• How does one place the registers judiciously in the datapath?
• T = Max{ (src-1, dstn-1), (src-2, dstn-2), … }
• What is Max{ } telling us?
• Each partition should take almost equal time for computation
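The effect of Max{ } can be sketched in a few lines of C. The function name and the numbers in the usage note are hypothetical and only illustrate why equal partitions give the shortest clock period.

```c
#include <stddef.h>

/* Clock period of a register-partitioned path, in ps: the slowest partition
 * sets the period, and each partition also pays the register overhead
 * (clk-to-Q plus setup). */
int clock_period_ps(const int *partition_delay, size_t n, int reg_overhead)
{
    int worst = 0;
    for (size_t i = 0; i < n; i++) {
        if (partition_delay[i] > worst) {
            worst = partition_delay[i];
        }
    }
    return worst + reg_overhead;
}
```

For 500 ps of logic and 50 ps of register overhead, an even split {250, 250} gives a 300 ps period, while an uneven split {400, 100} gives 450 ps.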
Problems of Single-cycle Datapath Design
The single-cycle design works, but it is inefficient:
• The clock length (worst-case delay) is the same for all instructions
• It is not a balanced design
• CPI is 1
• It uses more resources:
• Adders
• Memories
• Hence the need for a balanced datapath design technique that follows the common-case design and analysis principle
Balanced datapath design: Multi-cycle approach
• Instruction execution can be broken down into smaller steps
• Simple instructions can complete execution earlier than complex instructions
• The multi-cycle datapath can be designed in the same way as the single-cycle one
• Connect the architectural elements and the storage using combinational logic
• Next, design the controller

Chapter 4 (4.5) COD by P&H


• The key difference is
• The controller produces different signals in different steps/states
• A finite state machine (FSM) approach
• Combine the instruction and data memory
• Remove the redundant adders
• Incorporate the non-architectural state/storage elements
• Not visible to the programmer
[Figure: single-cycle datapath with the redundant adders and memory highlighted]
• Remove the redundant adders & memory
• Where do we place these operations?
• How does one control such operations?
Key points: Resource Sharing leads to Multi-Cycle Approach
• Datapath components are shared
• Functional units
• Memory
• Interconnect
• Balanced datapath design

Single memory unit, Fetch stage and datapath for reusing the ALU
Fetch stage and non-architectural elements: Instruction register
[Figure: fetch-stage datapath with the PC, the combined instruction & data memory, the instruction register and the ALU computing PC + 4; control signals PCWrite, IRWrite, MemWrite, ALUOp, ALUControl]
1: IR = M[PC]; PC = PC + 4
(c) Kanchan Manna; BITS-Pilani, Goa Campus, India.
lw-instr. and non-architectural elements: A, Data and ALUOut-register
1: IR = M[PC]; PC = PC + 4
2: A = Reg[IR[25:21]]
3: ALUOut = A + SignExtn(Imm)
4: MDR = M[ALUOut]
5: Reg[IR[20:16]] = MDR
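Steps 2 and 3 depend on pulling fields out of the instruction word and sign-extending the 16-bit immediate. A minimal sketch in C, with hypothetical helper names, could look like this:

```c
#include <stdint.h>

/* rs field, IR[25:21]: register that holds the base address. */
uint32_t field_rs(uint32_t ir) { return (ir >> 21) & 0x1Fu; }

/* rt field, IR[20:16]: destination register of lw. */
uint32_t field_rt(uint32_t ir) { return (ir >> 16) & 0x1Fu; }

/* Sign-extend the 16-bit immediate IR[15:0] to 32 bits. */
int32_t sign_extend16(uint32_t ir)
{
    int32_t imm = (int32_t)(ir & 0xFFFFu);
    if (imm & 0x8000) {
        imm -= 0x10000;   /* bit 15 set: the immediate is negative */
    }
    return imm;
}

/* Step 3 of lw: ALUOut = A + SignExtn(Imm), with wrap-around arithmetic. */
uint32_t lw_address(uint32_t a, uint32_t ir)
{
    return a + (uint32_t)sign_extend16(ir);
}
```

For example, the word 0x8FA8FFFC encodes lw $t0, -4($sp): rs is 29, rt is 8, and the immediate sign-extends to -4.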
[Figure: multi-cycle datapath for lw, adding the A register, the Data register (MDR), the ALUOut register, the IorD multiplexer and the sign-extension unit; control signals IRWrite, IorD, MemWrite, RegWrite, ALUOp, ALUControl]


sw-instr. and non-architectural elements: B-register
1: IR = M[PC]; PC = PC + 4
2: A = Reg[IR[25:21]]; B = Reg[IR[20:16]]
3: ALUOut = A + SignExtn(Imm)
4: M[ALUOut] = B
[Figure: multi-cycle datapath for sw, adding the B register whose output drives the memory WriteData input]


R-type instruction
1: IR = M[PC]; PC = PC + 4
2: A = Reg[IR[25:21]]; B = Reg[IR[20:16]]
3: ALUOut = A + B
4: Reg[IR[15:11]] = ALUOut
[Figure: multi-cycle datapath for R-type instructions; the ALU takes A and B, IR[15:11] selects the write register and IR[5:0] feeds the ALU decoder]


Fetch stage and B-type (branch) instruction
• BEQ $S1, $S2, offset // branch by offset instructions when $S1 == $S2
• BNE $S1, $S2, offset // branch by offset instructions when $S1 != $S2
Instruction format: op = bits 31-26 (6 bits), rs = bits 25-21 (5 bits), rt = bits 20-16 (5 bits), offset = bits 15-0 (16 bits)
BEQ R1, R2, offset; if true, PC = PC + 4 + (offset << 2); else PC = PC + 4
1: IR = M[PC]; PC = PC + 4
2: ALUOut = PC + Shift(Sign(offset)); A = Reg[IR[25:21]]; B = Reg[IR[20:16]]
3: PC = ALUOut if Zero == 1 // Zero is set by A - B
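The branch steps can be sketched in C as follows. The function names are hypothetical; the offset is treated as a count of instructions, so it is sign-extended and shifted left by 2 before being added to PC + 4.

```c
#include <stdint.h>

/* Sign-extend the 16-bit offset IR[15:0] to 32 bits. */
int32_t sign_extend16(uint32_t ir)
{
    int32_t imm = (int32_t)(ir & 0xFFFFu);
    if (imm & 0x8000) {
        imm -= 0x10000;
    }
    return imm;
}

/* Step 2: ALUOut = (PC + 4) + (SignExtn(offset) << 2).
 * The shift is done on the unsigned value to avoid shifting a negative int. */
uint32_t branch_target(uint32_t pc_plus4, uint32_t ir)
{
    uint32_t byte_offset = (uint32_t)sign_extend16(ir) << 2;
    return pc_plus4 + byte_offset;
}

/* Step 3 for BEQ: take the branch only when A - B is zero. */
uint32_t next_pc_beq(uint32_t pc, uint32_t ir, uint32_t a, uint32_t b)
{
    uint32_t pc_plus4 = pc + 4;
    uint32_t alu_out = branch_target(pc_plus4, ir);
    int zero = (a - b) == 0;
    return zero ? alu_out : pc_plus4;
}
```

With a BEQ at address 0x100 and offset 3, a taken branch continues at 0x110 and a not-taken branch at 0x104; an offset of -1 branches back to the BEQ itself.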
[Figure: multi-cycle datapath for branch instructions, adding the ALUSrcA and ALUSrcB1:0 multiplexers, the shift-left-2 unit after sign extension, the PCSrc multiplexer, and the Branch and PCWrite signals]
I-type instruction: ADDI
Instruction format: op = bits 31-26 (6 bits), rs = bits 25-21 (5 bits), rt (destination) = bits 20-16 (5 bits), immediate = bits 15-0 (16 bits)
1: IR = M[PC]; PC = PC + 4
2: A = Reg[IR[25:21]]
3: ALUOut = A + SignExtn(Imm)
4: Reg[IR[20:16]] = ALUOut
[Figure: multi-cycle datapath for ADDI; ALUOut is written back to the register selected by IR[20:16] through the RegDst multiplexer]


Fetch stage and Jump instruction
• J addrs // PC ← {PC[31:28], addrs[27:0]}, where addrs[27:0] is the 26-bit address field shifted left by 2
Instruction format: op = bits 31-26 (6 bits), address = bits 25-0 (26 bits)
1: IR = M[PC]; PC = PC + 4
2: PC = {PC[31:28], LShift(Addr)}
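The concatenation in step 2 can be written as a mask and an OR. This is a small sketch with a hypothetical function name; pc_plus4 is the PC value after the fetch step.

```c
#include <stdint.h>

/* PC = {PC[31:28], LShift(Addr)}: keep the upper 4 bits of the incremented
 * PC and fill the lower 28 bits with the 26-bit address field shifted left
 * by 2. */
uint32_t jump_target(uint32_t pc_plus4, uint32_t ir)
{
    uint32_t addr28 = (ir & 0x03FFFFFFu) << 2;
    return (pc_plus4 & 0xF0000000u) | addr28;
}
```

For instance, the word 0x08000010 (opcode 2, address field 0x10) fetched from 0x10000004 jumps to 0x10000040.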
Fetch stage and Jump instruction: PC ← {(PC+4)[31:28], addrs[27:0]}
[Figure: multi-cycle datapath for J; IR[25:0] is shifted left by 2, concatenated with PC[31:28] and selected into the PC through the PCSrc multiplexer under the Jump signal]
1: IR = M[PC]; PC = PC + 4
2: A = Reg[IR[25:21]]; B = Reg[IR[20:16]]
3: PC = {PC[31:28], LShift(Addr)}
Combined Datapath of all types of instructions
[Figure: combined multi-cycle datapath with the control signals IorD, MemRead, MemWrite, IRWrite, RegDst, MemtoReg, RegWrite, ALUSrcA, ALUSrcB1:0, ALUOp, ALUControl, PCSrc, PCWrite, Branch and Jump]
Control signals
• IorD
• Jump
• MemWrite
• MemRead
• IRWrite
• RegDst
• MemtoReg
• RegWrite
• ALUSrcA
• ALUSrcB1:0
• PCWrite
• Branch
• PCSrc
• ALUOp
• ALUControl
15 control signals. Truth table?
• States: how many bits are needed to represent a state?
State table for CU
Multicycle model
Chapter 4 A of COD by P&H

Summary of steps taken to execute any instruction class
Chapter 4 of COD by P&H

Fetch stage

Machine state | Operation | Control signals | Next state
T0 | IR ← M[PC]; PC ← PC + 4 | IorD=0, IRWrite=1, MemRead=1, ALUSrcA=0, ALUSrcB1:0=01, ALUOp=00, PCSrc=00, PCWrite=1, Jump=0 | T1
Decode stage

Machine state | Operation | Control signals | Next state
T1 | ALUOut ← (PC+4) + Lshift(SigExtn(offset)); A ← Reg[25:21]; B ← Reg[20:16] | ALUSrcA=0, ALUSrcB1:0=11, ALUOp=00 | x (depends on the opcode)

The precomputed target is useful for the branch (BRZ) instruction.
LW type instruction

Machine state | Operation | Control signals | Next state
T2 | ALUOut ← A + sigEx(offset) | ALUSrcA=1, ALUSrcB1:0=10, ALUOp=00 | T3
T3 | MDR ← M[A + sigEx(offset)] | IorD=1, MemRead=1 | T4
T4 | RF[dest] ← MDR | RegDst=0, MemtoReg=1, RegWrite=1 | T0
SW type instruction

Machine state | Operation | Control signals | Next state
T2 | ALUOut ← A + sigEx(offset) | ALUSrcA=1, ALUSrcB1:0=10, ALUOp=00 | T5
T5 | M[A + sigEx(offset)] ← B | IorD=1, MemWrite=1 | T0
R-type instruction

Machine state | Operation | Control signals | Next state
T6 | ALUOut ← A Op B | ALUSrcA=1, ALUSrcB1:0=00, ALUOp=10 (look at funct field) | T7
T7 | RF[dstn] ← A Op B | RegDst=1, MemtoReg=0, RegWrite=1 | T0
B-type instruction

Machine state | Operation | Control signals | Next state
T1 (decoding stage) | ALUOut ← (PC+4) + Lshift(SigExtn(offset)); A ← Reg[25:21]; B ← Reg[20:16] | ALUSrcA=0, ALUSrcB1:0=11, ALUOp=00 | T8
T8 | A − B; if Zero, PC ← ALUOut | ALUSrcA=1, ALUSrcB1:0=00, ALUOp=01, PCSrc=01, Branch=1 | T0

ADDI instruction

Machine state | Operation | Control signals | Next state
T2 | ALUOut ← A OP SigExtn(offset) | ALUSrcA=1, ALUSrcB1:0=10, ALUOp=00 | T9
T9 | RF[Destn] ← A OP SigExtn(offset) | RegDst=0, MemtoReg=0, RegWrite=1 | T0
Jump instruction

Machine state | Operation | Control signals | Next state
T10 | PC ← {PC[31:28], LShift(Addr)} | Jump=1, PCSrc=10 | T0
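Controller States' information: Graphical Representation
Starting state: IF (T0) → ID (T1), then by instruction:
• ADD: EXE (T6) → MEM (T7) → IF (T0)
• LW: EXE (T2) → MEM (T3) → WB (T4) → IF (T0)
• SW: EXE (T2) → MEM (T5) → IF (T0)
• ADDI: EXE (T2) → MEM (T9) → IF (T0)
• BNE: EXE (T8) → IF (T0)
• J: EXE (T10) → IF (T0)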
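The state tables can be collected into one next-state function. The sketch below is an illustration, not the controller from the slides: the function names are hypothetical, the opcode values are the standard MIPS encodings, and any opcode outside the six handled classes is simply treated like ADDI.

```c
/* Controller states T0 .. T10 as used in the state tables. */
enum state { T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10 };

/* Standard MIPS opcodes for the instructions covered here. */
enum {
    OP_RTYPE = 0x00, OP_J = 0x02, OP_BEQ = 0x04,
    OP_ADDI = 0x08, OP_LW = 0x23, OP_SW = 0x2B
};

/* Next state of the multi-cycle controller for the current opcode. */
enum state next_state(enum state s, unsigned opcode)
{
    switch (s) {
    case T0:
        return T1;                 /* fetch -> decode */
    case T1:
        switch (opcode) {
        case OP_RTYPE: return T6;
        case OP_BEQ:   return T8;
        case OP_J:     return T10;
        default:       return T2;  /* LW, SW, ADDI share T2 */
        }
    case T2:
        switch (opcode) {
        case OP_LW: return T3;
        case OP_SW: return T5;
        default:    return T9;     /* ADDI */
        }
    case T3:
        return T4;                 /* LW: memory read -> write-back */
    case T6:
        return T7;                 /* R-type: execute -> register write */
    default:
        return T0;                 /* T4, T5, T7, T8, T9, T10 end the instruction */
    }
}

/* Number of clock cycles one instruction spends in the FSM. */
int cycles_for(unsigned opcode)
{
    int cycles = 0;
    enum state s = T0;
    do {
        s = next_state(s, opcode);
        cycles++;
    } while (s != T0);
    return cycles;
}
```

Walking the function from T0 gives 5 cycles for LW, 4 for SW, R-type and ADDI, and 3 for BEQ and J. With 11 states, 4 bits are enough to encode a state.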
Clock cycles needed for the instructions

Instruction | Clock cycles
LW | 5
SW | 4
R-type | 4
BEQ | 3
ADDI | 4
J | 3
(Finite state) Control Unit
Book-P&H – ch-appx-D

Controller Design Strategies
• Microprogrammed
• Store the truth table or state table in memory
• Hardwired
• Design the circuit for the logical expressions
Microprogram based control unit
Controller's Truth (state) Table
• CF: Control Field: 15 signals or 17 bits
• NA: Next Address: 4 bits
Book-Hamacher-ch5; P&H – ch-appx-C
For better understanding, the values of NextAddress and Address are represented in decimal.
Microprogrammed Control
• Organization of microprogrammed control unit
[Figure: IR and an external address feeding a MUX, the CMAR, and the Control Memory whose word holds the fields M, CF and NA]
• Microprogram sequencer: all units except the control memory
• M: Mode
• CF: Control Field
• NA: Next Address
Book-Hamacher-ch5; P&H – ch-appx-C
Hardwired based control unit
Controller States' information for Hardwired sequential CU
[Figure: the same state diagram as before, but with states shared across instructions: IF (T0), ID (T1), EXE, MEM (T3) and, for LW only, WB (T4)]
• States are common to all the instructions. Why?
• Earlier, the states were unique
• The counter generates the states and it is not loadable
• What could be the counter size?
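Hardwired based control unit
Instruction format: op = bits 31-26 (6 bits), rs = bits 25-21 (5 bits), rt = bits 20-16 (5 bits), rd = bits 15-11 (5 bits), shamt = bits 10-6 (5 bits), funct = bits 5-0 (6 bits)
[Figure: the Control Unit (CU) takes the instruction decoder outputs (IDCD, 6-to-2^6: R-type/ADD, BEQ, …) and the state decoder outputs (SDCD, 3-to-2^3: T0 … T7) driven by a 3-bit counter with CLR, INCR and CLK inputs; it produces IorD, Jump, MemWrite, MemRead, IRWrite, RegDst, MemtoReg, RegWrite, ALUSrcA, ALUSrcB1:0, PCWrite, Branch, PCSrc and ALUOp]
Book-Hamacher-ch5; P&H – ch-appx-C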
Hardwired based control unit
Logical Expression of Control Signals
IorD = T0’ + T3.LW + T3.SW +
Jump = T2.J
MemWrite = T3.SW
MemRead = T0’ +
IRWrite =
RegDst =
MemtoReg =
RegWrite =
ALUSrcA =
ALUSrcB_1:0 =
PCWrite =
Branch =
PCSrc =
ALUOp = {T0’,T0’}+{T1’,T1’}+{(T2.LW)’, (T2.LW)’}
INCR = T0 + T1 + T2.LW + T3.LW + T2.SW
CLR = T4.LW + T3.SW +

Comparison between Single and Multi-cycle datapath
• Single cycle
  • No non-architectural component
  • Every instruction takes one clock cycle (CPI = 1)
• Multi-cycle
  • Non-architectural elements/registers
  • Instruction, data (memory), A, B, ALU-output
  • Instructions take different numbers of clock cycles
  • Average CPI (> 1)
Comparison between Single and Multi-cycle Control unit

ALUOp | Meaning
00 | add
01 | subtract
10 | Look at funct field
11 | n/a

Single cycle (CLK shown as 10 ns)

Instr. | Jump | RegDst | RegWrite | ALUSrc | Branch | ALUOp1 | ALUOp0 | MemRead | MemWrite | MemtoReg
R-type | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0
lw | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1
sw | 0 | x | 0 | 1 | 0 | 0 | 0 | 0 | 1 | x
addi | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0
B-type | 0 | x | 0 | 0 | 1 | 0 | 1 | 0 | 0 | x
J-type | 1 | x | 0 | x | x | x | x | 0 | 0 | x

Multi-cycle (CLK shown as 5 ns): LW's FSM

Machine state | Operation | Control signals | Next state
T2 | ALUOut ← A + sigEx(offset) | ALUSrcA=1, ALUSrcB1:0=10, ALUOp=00 | T3
T3 | Data ← M[A + sigEx(offset)] | IorD=1 | T4
T4 | RF[dest] ← Data | RegDst=0, MemtoReg=1, RegWrite=1 | T0
Performance Analysis
• The program consists of approximately 25% loads, 10% stores, 11% branches, 2% jumps, and 52% R-type instructions. Determine the average CPI for this program.
• The average CPI is the sum, over all instruction classes, of the CPI of that class multiplied by the fraction of instructions in that class
• Average CPI = (0.11 + 0.02)(3) + (0.52 + 0.10)(4) + (0.25)(5) = 4.12
• For the single-cycle approach, the average CPI is 1
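The weighted sum can be checked with a few lines of C. The struct and function names are hypothetical; the fractions and cycle counts are the ones given in the slides.

```c
#include <stddef.h>

/* One instruction class: its share of the program and its cycle count. */
struct instr_class {
    double fraction;
    int cycles;
};

/* Average CPI = sum over classes of (fraction x cycles). */
double average_cpi(const struct instr_class *mix, size_t n)
{
    double cpi = 0.0;
    for (size_t i = 0; i < n; i++) {
        cpi += mix[i].fraction * mix[i].cycles;
    }
    return cpi;
}
```

Called with the mix {0.25, 5}, {0.10, 4}, {0.11, 3}, {0.02, 3}, {0.52, 4} for loads, stores, branches, jumps and R-type instructions, it returns 4.12 up to floating-point rounding.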
Performance Analysis
• Each cycle involves one ALU operation, memory access, or register file access
• Each instruction uses only one stage at any time
• Assumptions:
• the register file is faster than the memory, and
• writing memory is faster than reading memory (mux's complexity is more)
• Delay of each stage = clock-to-Q delay of the D-ff at the beginning of the stage + delay of the elements in the stage + setup time at the end of the stage
• The datapath has two possible critical paths:
Path-1: tpcq + tmux + tALU + tmux + tsetup
Path-2: tpcq + tmux + tmem + tsetup
Combined Datapath of all types of instructions
[Figure: combined multi-cycle datapath; the green line marks the critical paths, annotated with tpcq, tmux, tmem, tALU and tsetup]
Performance Analysis of Multi-Cycle Implementation
• XYZ-organization is contemplating building the multi-cycle MIPS processor instead of the single-cycle processor. For both designs, the organization plans on using a 65-nm CMOS manufacturing process. The organization has determined that the logic elements have the delays given in the table. Help the organization compare each processor's execution time for a program with 100 billion instructions.
[Table: parameter delays in ps (30, 250, 20, 200, 25, 20); the parameter names were not recovered]
• Tc = 30 + 25 + 250 + 20 = 325 ps
• Execution time = (100 × 10^9 instrs.) × (4.12 cycles/instr.) × (325 × 10^-12 s/cycle) = 133.9 seconds
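The calculation can be reproduced in C. This is a sketch under stated assumptions: the two candidate paths (ALU path and memory path) are a reading of the critical-path slide, and the function names are hypothetical. Note that 30 + 25 + 250 + 20 sums to 325 ps, which is the clock period consistent with the 133.9 s result.

```c
/* Element delays in ps, from the 65 nm delay table. */
enum { T_PCQ = 30, T_SETUP = 20, T_MUX = 25, T_ALU = 200, T_MEM = 250 };

/* Multi-cycle clock period: the longer of the two candidate paths. */
int multi_cycle_tc_ps(void)
{
    int alu_path = T_PCQ + T_MUX + T_ALU + T_MUX + T_SETUP;  /* assumed Path-1 */
    int mem_path = T_PCQ + T_MUX + T_MEM + T_SETUP;          /* assumed Path-2 */
    return alu_path > mem_path ? alu_path : mem_path;
}

/* Execution time in seconds = #instructions x CPI x clock period. */
double execution_time_s(double instructions, double cpi, int tc_ps)
{
    return instructions * cpi * (double)tc_ps * 1e-12;
}
```

With 100 billion instructions, the multi-cycle design (CPI 4.12, 325 ps) takes about 133.9 s, while the single-cycle design (CPI 1, 925 ps) takes about 92.5 s.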
Performance Analysis: A Comparison
• For multi-cycle, Tc = 325 ps and CPI = 4.12
• For single-cycle, Tc = 925 ps and CPI = 1
• For multi-cycle, execution time = 133.9 seconds
• For single-cycle, execution time = 92.5 seconds
• This example shows that the multi-cycle processor is slower than the single-cycle processor; why is it so?
• Sequencing overhead: 30 (clk-to-Q) + 20 (setup) ps is paid in every cycle
• The multi-cycle processor is less expensive
• It has 5 non-architectural elements
/* Fetch-and-Execute Algorithm for MIPS Microprocessor (32 bit): Multi-Cycle Approach */
#include <stdbool.h>
#include <stdint.h>

#define OPCODE 0xFC000000u  /* bits 31-26 */
#define RS     0x03E00000u  /* bits 25-21 */
#define RT     0x001F0000u  /* bits 20-16 */
#define RD     0x0000F800u  /* bits 15-11 */
#define SHIFT  0x000007C0u  /* bits 10-6 */
#define OFFSET 0x0000FFFFu  /* bits 15-0 */

enum { OP_RTYPE = 0x00, OP_BEQ = 0x04, OP_LW = 0x23, OP_SW = 0x2B };
enum { ALU_ADD = 0x2, ALU_SUB = 0x6 };  /* ALUControl values 0b0010 and 0b0110 */

uint32_t PC, MM[1024], RF[32], ALUControl; bool ZERO;
uint32_t IR, A, B, ALUOut, MDR;  /* non-architectural elements */

void Load(uint32_t *mem);  /* loads the program into MM; defined elsewhere */

uint32_t ALU(uint32_t Src1, uint32_t Src2) {
    switch (ALUControl) {
    case ALU_SUB: ZERO = (Src1 - Src2) == 0; return Src1 - Src2;  /* B-type compare */
    case ALU_ADD: return Src1 + Src2;
    default:      return 0;
    }
}

/* The 16-bit offset needs conversion (sign extension) to 32 bits. */
uint32_t SignExtn(uint32_t ir) { return (uint32_t)(int32_t)(int16_t)(ir & OFFSET); }

int main(void) {
    Load(MM);
    PC = 0;  /* address of the 1st instruction; 0 is assumed here */
    while (1) {
        IR = MM[PC >> 2];  /* PC is a byte address, MM is indexed by word */
        ALUControl = ALU_ADD; PC = ALU(PC, 4);
        A = RF[(IR & RS) >> 21]; B = RF[(IR & RT) >> 16];
        switch ((IR & OPCODE) >> 26) {
        case OP_RTYPE:
            ALUControl = ALU_ADD;  /* set from funct: ADD, SUB, AND, OR, etc.; only ADD is modeled */
            ALUOut = ALU(A, B); RF[(IR & RD) >> 11] = ALUOut;
            break;
        case OP_LW:
            ALUControl = ALU_ADD;
            ALUOut = ALU(A, SignExtn(IR)); MDR = MM[ALUOut >> 2]; RF[(IR & RT) >> 16] = MDR;
            break;
        case OP_SW:
            ALUControl = ALU_ADD;
            ALUOut = ALU(A, SignExtn(IR)); MM[ALUOut >> 2] = B;
            break;
        case OP_BEQ:
            ALUControl = ALU_ADD;
            ALUOut = ALU(PC, SignExtn(IR) << 2);
            ALUControl = ALU_SUB; ALU(A, B);
            if (ZERO) { PC = ALUOut; }
            break;
        }
    }
}
Summary
• Disadvantages of single-cycle processor
• Necessity of balanced design approach and multi-cycle processor
• Datapath design of multi-cycle processor
• Design of control unit for multi-cycle processor
• Performance analysis of multi-cycle processor
• Comparison between single-cycle and multi-cycle processor
• Disadvantages of multi-cycle processor
