Slide 3

Computer Architecture
Design, Analysis, Execution and Optimization of Instructions
Reduced Instruction Set Computer (RISC): MIPS
Multi-cycle Datapath and Control unit
Book: COD by Patterson and Hennessy from Chapter 4 A
Objectives
• Design the processor such that
  • the clock period (T) is shorter than that of a single-cycle processor (primary objective)
  • resource utilization is minimized (secondary objective)
  • common-case analysis guides the design (secondary objective)
Delay information for a 65 nm CMOS process
Element               Parameter   Delay (ps)
Register clk-to-Q     tpcq        30
Register setup        tsetup      20
Multiplexer           tmux        25
ALU                   tALU        200
Memory read           tmem        250
Register file read    tRFread     150
Register file write   tRFwrite    100
Register file setup   tRFsetup    20

Single-cycle datapath
[Figure: single-cycle MIPS datapath annotated with the element delays in ps (memory 250, register file read 150, register file write 100, ALU 200, mux 25); the separate adders and the second memory are marked as redundant]
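Performance analysis
• CPI = 1 for the single-cycle implementation
• The lw instruction determines the critical path (Tc)
• Tc = tpcq + tmem + tRFread + tALU + tmem + tmux + tRFsetup
• Register file read takes longer than mux selection, so the register file read is the term that appears on the critical path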
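The critical-path sum can be checked with a short calculation. The sketch below assumes the lw path passes through the PC register, instruction memory, register file read, ALU, data memory, one multiplexer and the register file setup; the struct and function names are illustrative and not part of the slides.

```c
/* Element delays in picoseconds, as listed in the 65 nm delay table. */
struct delays_ps {
    int t_pcq;      /* register clk-to-Q   */
    int t_mux;      /* multiplexer         */
    int t_alu;      /* ALU                 */
    int t_mem;      /* memory read         */
    int t_rf_read;  /* register file read  */
    int t_rf_setup; /* register file setup */
};

/* ASSUMPTION: the lw critical path contains exactly one mux delay
   and uses the register file read rather than the sign-extend path. */
int single_cycle_tc_ps(const struct delays_ps *d)
{
    return d->t_pcq        /* PC register clk-to-Q          */
         + d->t_mem        /* instruction fetch             */
         + d->t_rf_read    /* read base register            */
         + d->t_alu        /* compute the effective address */
         + d->t_mem        /* data memory read              */
         + d->t_mux        /* select memory data            */
         + d->t_rf_setup;  /* write-back setup              */
}
```

With the table values (30, 25, 200, 250, 150, 20) the function returns 925 ps, the single-cycle clock period used later in the comparison.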
Performance analysis
• The execution time of a program is the performance metric
• The number of instructions (the length of a program) depends on the ISA
• How did we measure the clock period?
• Which instruction dominated the clock period?
• How can we reduce the clock period, i.e. improve the frequency (f = 1/T)?
Improvement on clock period
[Figure: a source register (Src) drives one block of combinational logic whose output is captured by a destination register (Dstn); Clk period = T, f = 1/T]
[Figure: the same logic split into Comb. Logic-1 (Src-1 to Dstn-1) and Comb. Logic-2 (Src-2 to Dstn-2) with a register in between; Clk period = T/2, f = 2/T]
• T = Max{ (src-1, dstn-1), (src-2, dstn-2) }
• Combinational Logic = Combinational Logic-1 U Combinational Logic-2
• This is not the pipeline structure
• How about LW?
Placement of Registers / Partition the logic
• How does one place the registers judiciously in the datapath?
• T = Max{ (src-1, dstn-1), (src-2, dstn-2), … }
• What is Max{ } telling us?
  • The slowest partition sets the clock period, so each partition should take almost equal time for computation
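A small sketch of what Max{ } implies: the clock period is set by the slowest register-to-register partition, so an unbalanced split gains less than a balanced one. The function name and the delays in the usage note are hypothetical, and register overhead (clk-to-Q, setup) is ignored here.

```c
#include <stddef.h>

/* Clock period = delay of the slowest partition (all values in ps). */
int clock_period_ps(const int *partition_delay_ps, size_t n)
{
    int t = 0;
    for (size_t i = 0; i < n; i++) {
        if (partition_delay_ps[i] > t) {
            t = partition_delay_ps[i];
        }
    }
    return t;
}
```

For example, 400 ps of logic split as {200, 200} gives a 200 ps period (T/2), while a {300, 100} split still needs 300 ps.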
Problems of Single-cycle Datapath Design
• The single-cycle design works, but it is inefficient
  • The clock period (worst-case delay) is the same for all instructions
  • It is not a balanced design
  • CPI is 1
• It uses more resources:
  • Adders
  • Memory
• Hence the need for a balanced datapath design technique that follows the common-case design and analysis principle
Balanced datapath design: Multi-cycle approach
• Instruction execution can be broken down into smaller steps
  • A simple instruction can complete execution earlier than a complex instruction
• The multi-cycle datapath can be designed in the same way as the single-cycle datapath
  • Connect the architectural elements and the storage using combinational logic
  • Next, design the controller
Chapter 4 (4.5) COD by P&H

Balanced datapath design: Multi-cycle approach
• The key difference is that the controller produces different signals in different steps/states
  • A finite state machine (FSM) approach
• Combine the instruction and data memory
• Remove the redundant adders
• Incorporate non-architectural state/storage elements
  • Not visible to the programmer
Single-cycle datapath
[Figure: single-cycle MIPS datapath with the redundant adders and the separate memory highlighted]
Balanced datapath design: Multi-cycle approach
• Remove the redundant adders and memory
  • Where do we place these operations?
  • How does one control such operations?

Key points: Resource Sharing leads to Multi-Cycle Approach
• Datapath components are shared
  • Functional units
  • Memory
  • Interconnect
• Balanced datapath design

Single memory unit, Fetch stage and datapath for reusing the ALU
Fetch stage and non-architectural elements: Instruction register
[Figure: fetch datapath. The PC (enable: PCWrite) addresses the combined Instruction & Data Memory; the fetched word is latched in the Instruction Reg. (enable: IRWrite); the ALU adds 4 to the PC to form PC+4]
1: IR = M[PC]; PC = PC + 4
(c) Kanchan Manna; BITS-Pilani, Goa Campus, India.
lw-instr. and non-architectural elements: A, Data and ALUOut registers
1: IR = M[PC]; PC = PC + 4
2: A = Reg[IR[25:21]]
3: ALUOut = A + SignExtn(Imm)
4: MDR = M[ALUOut]
5: Reg[IR[20:16]] = MDR
[Figure: lw datapath. The A register holds read data 1, the ALUOut register holds the ALU result, and the Data register (MDR) holds the word read from memory; the IorD mux selects between PC and ALUOut as the memory address; IR[15:0] passes through sign extension to the ALU]

sw-instr. and non-architectural elements: B register
1: IR = M[PC]; PC = PC + 4
2: A = Reg[IR[25:21]]; B = Reg[IR[20:16]]
3: ALUOut = A + SignExtn(Imm)
4: M[ALUOut] = B
[Figure: sw datapath. The B register holds read data 2 and drives the WriteData input of the memory; the IorD mux selects ALUOut as the memory address; MemWrite enables the store]

R-type instruction
1: IR = M[PC]; PC = PC + 4
2: A = Reg[IR[25:21]]; B = Reg[IR[20:16]]
3: ALUOut = A op B
4: Reg[IR[15:11]] = ALUOut
[Figure: R-type datapath. The A and B registers feed the ALU; the ALU decoder derives the operation from ALUOp and funct (IR[5:0]); ALUOut is written back to the register selected by IR[15:11] when RegWrite is asserted]

Fetch stage and B(R)-type instruction
• BEQ $S1, $S2, offset // branch by offset instructions when $S1 == $S2
• BNE $S1, $S2, offset // branch by offset instructions when $S1 != $S2
Field   op      rs      rt      offset
Bits    31-26   25-21   20-16   15-0
Width   6       5       5       16
BEQ R1, R2, offset: if true, PC = PC + 4 + (offset << 2); else PC = PC + 4
1: IR = M[PC]; PC = PC + 4
2: A = Reg[IR[25:21]]; B = Reg[IR[20:16]]; ALUOut = PC + Shift(Sign(offset))
3: PC = ALUOut if Zero == 1 // Zero is set by A - B
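The branch-target computation in step 2 can be sketched in C as follows. This is a minimal illustration, assuming the offset occupies IR[15:0] and counts instructions (so it is shifted left by 2); the helper names are not from the slides.

```c
#include <stdint.h>

/* Sign-extend the 16-bit offset field IR[15:0] to 32 bits. */
static uint32_t sign_extend16(uint32_t ir)
{
    uint32_t imm = ir & 0xFFFFu;
    return (imm & 0x8000u) ? (imm | 0xFFFF0000u) : imm;
}

/* pc_plus4 is the PC value after the fetch step has already added 4. */
uint32_t branch_target(uint32_t pc_plus4, uint32_t ir)
{
    return pc_plus4 + (sign_extend16(ir) << 2);
}
```

An offset of 3 with PC+4 = 0x1004 yields 0x1010, and an offset of -1 (0xFFFF) yields 0x1000, i.e. the branch instruction itself.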
Fetch stage and B(R)-type instruction
[Figure: branch datapath. The ALUSrcA mux selects PC or A; the ALUSrcB1:0 mux selects B, the constant 4, the sign-extended offset, or the sign-extended offset shifted left by 2; the PCSrc mux selects ALUResult or ALUOut as the next PC; the PC is written when PCWrite is asserted, or when Branch is asserted and Zero == 1]
I-type instruction: ADDI
Field   op      rs      rt (destination)   immediate
Bits    31-26   25-21   20-16              15-0
Width   6       5       5                  16
1: IR = M[PC]; PC = PC + 4
2: A = Reg[IR[25:21]]
3: ALUOut = A + SignExtn(Imm)
4: Reg[IR[20:16]] = ALUOut
[Figure: ADDI datapath. The A register and the sign-extended IR[15:0] feed the ALU; ALUOut is written back to the register selected by IR[20:16] (RegDst = 0) when RegWrite is asserted]

Fetch stage and Jump instruction
• J addrs // PC ← {PC[31:28], addrs[27:0]}, where addrs[27:0] is the 26-bit address field shifted left by 2
Field   op      address
Bits    31-26   25-0
Width   6       26
1: IR = M[PC]; PC = PC + 4
2: PC = {PC[31:28], LShift(Addr)}
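The concatenation {PC[31:28], LShift(Addr)} can be written with masks and shifts. The sketch below assumes the address field is IR[25:0] and that the upper four bits come from the already incremented PC; the function name is illustrative.

```c
#include <stdint.h>

/* Jump target = upper 4 bits of PC+4 joined with the 26-bit address << 2. */
uint32_t jump_target(uint32_t pc_plus4, uint32_t ir)
{
    uint32_t addr26 = ir & 0x03FFFFFFu;               /* IR[25:0]      */
    return (pc_plus4 & 0xF0000000u) | (addr26 << 2);  /* 4 + 28 bits   */
}
```

For PC+4 = 0x40000004 and an address field of 0x10, the target is 0x40000040.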
Fetch stage and Jump instruction: PC ← {(PC+4)[31:28], addrs[27:0]}
[Figure: jump datapath. IR[25:0] is shifted left by 2 to form bits 27:0, concatenated with PC[31:28], and selected into the PC through the Jump mux]
1: IR = M[PC]; PC = PC + 4
2: A = Reg[IR[25:21]]; B = Reg[IR[20:16]]
3: PC = {PC[31:28], LShift(Addr)}
Combined Datapath of all types of instructions
[Figure: combined multi-cycle datapath with the control signals IorD, MemRead, MemWrite, IRWrite, RegDst, MemtoReg, RegWrite, ALUSrcA, ALUSrcB1:0, ALUOp, ALUControl, PCSrc, PCWrite, Branch and Jump]
Control signals
• IorD
• Jump
• MemWrite
• MemRead
• IRWrite
• RegDst
• MemtoReg
• RegWrite
• ALUSrcA
• ALUSrcB1:0
• PCWrite
• Branch
• PCSrc
• ALUOp
• ALUControl
15 control signals in total. Truth table?
• States: how many bits are needed to represent a state?
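The number of state bits is the smallest b with 2^b at least the number of states. A minimal sketch follows; the function name is hypothetical, and the state counts in the usage note (11 controller states T0 to T10, 5 counter steps T0 to T4) are read off the state tables and state graphs of this deck.

```c
/* Smallest number of bits b such that 2^b >= n_states. */
unsigned state_bits(unsigned n_states)
{
    unsigned bits = 0;
    /* bits < 31 keeps the shift well defined for very large inputs */
    while (bits < 31 && (1u << bits) < n_states) {
        bits++;
    }
    return bits;
}
```

With 11 states this gives 4 bits, which agrees with the 4-bit Next Address field of the microprogrammed controller; 5 shared counter steps need 3 bits.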
State table for CU
Multicycle model
Chapter 4 A of COD by P&H

Summary of steps taken to execute any instruction class
Chapter 4 of COD by P&H

Fetch stage
Machine state | Operation | Control signals | Next state
T0 | IR ← M[PC]; PC ← PC + 4 | IorD=0, IRWrite=1, MemRead=1, ALUSrcA=0, ALUSrcB1:0=01, ALUOp=00, PCSrc=00, PCWrite=1, Jump=0 | T1
Decode stage
Machine state | Operation | Control signals | Next state
T1 | ALUOut ← (PC+4) + Lshift(SigExtn(offset)); A ← Reg[IR[25:21]]; B ← Reg[IR[20:16]] | ALUSrcA=0, ALUSrcB1:0=11, ALUOp=00 | x (depends on the opcode)
The precomputed target is useful for branch instructions.
LW type instruction
Machine state | Operation | Control signals | Next state
T2 | ALUOut ← A + sigEx(offset) | ALUSrcA=1, ALUSrcB1:0=10, ALUOp=00 | T3
T3 | MDR ← M[A + sigEx(offset)] | IorD=1, MemRead=1 | T4
T4 | RF[dest] ← MDR | RegDst=0, MemtoReg=1, RegWrite=1 | T0
SW type instruction
Machine state | Operation | Control signals | Next state
T2 | ALUOut ← A + sigEx(offset) | ALUSrcA=1, ALUSrcB1:0=10, ALUOp=00 | T5
T5 | M[A + sigEx(offset)] ← B | IorD=1, MemWrite=1 | T0
R-type instruction
Machine state | Operation | Control signals | Next state
T6 | ALUOut ← A Op B | ALUSrcA=1, ALUSrcB1:0=00, ALUOp=10 (look at funct field) | T7
T7 | RF[dstn] ← A Op B | RegDst=1, MemtoReg=0, RegWrite=1 | T0
B-type instruction
Machine state | Operation | Control signals | Next state
T1 (decoding stage) | ALUOut ← (PC+4) + Lshift(SigExtn(offset)); A ← Reg[IR[25:21]]; B ← Reg[IR[20:16]] | ALUSrcA=0, ALUSrcB1:0=11, ALUOp=00 | T8
T8 | A - B; PC ← ALUOut if Zero == 1 | ALUSrcA=1, ALUSrcB1:0=00, ALUOp=01, PCSrc=01, Branch=1 | T0
ADDI instruction
Machine state | Operation | Control signals | Next state
T2 | ALUOut ← A OP SigExtn(offset) | ALUSrcA=1, ALUSrcB1:0=10, ALUOp=00 | T9
T9 | RF[Destn] ← A OP SigExtn(offset) | RegDst=0, MemtoReg=0, RegWrite=1 | T0
Jump instruction
Machine state | Operation | Control signals | Next state
T10 | PC ← {PC[31:28], LShift(Addr)} | Jump=1, PCSrc=10 | T0
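Controller States’ information: Graphical Representation
Starting state: IF (T0)
• IF (T0) → ID (T1)
• ID (T1) → EXE (T6) for ADD; EXE (T2) for LW, SW and ADDI; EXE (T8) for BNE; EXE (T10) for J
• ADD: T6 → T7 → T0
• LW: T2 → MEM (T3) → WB (T4) → T0
• SW: T2 → MEM (T5) → T0
• ADDI: T2 → T9 → T0
• BNE: T8 → T0
• J: T10 → T0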
Clock cycles needed for the instructions
Instruction   Clock cycles
LW            5
SW            4
R-type        4
BEQ           3
ADDI          4
J             3
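The cycle counts follow directly from walking the controller's state graph until it returns to T0. The sketch below encodes the transitions from the state tables as a next-state function; the enum and function names are illustrative, not the deck's own.

```c
enum state { T0, T1, T2, T3, T4, T5, T6, T7, T8, T9, T10 };
enum instr { I_LW, I_SW, I_RTYPE, I_BEQ, I_ADDI, I_J };

/* Next-state function of the multi-cycle controller FSM. */
enum state next_state(enum state s, enum instr op)
{
    switch (s) {
    case T0: return T1;               /* fetch -> decode              */
    case T1:                          /* decode: dispatch on opcode   */
        switch (op) {
        case I_RTYPE: return T6;
        case I_BEQ:   return T8;
        case I_J:     return T10;
        default:      return T2;      /* LW, SW, ADDI share T2        */
        }
    case T2:
        if (op == I_LW) return T3;    /* memory read                  */
        if (op == I_SW) return T5;    /* memory write                 */
        return T9;                    /* ADDI write-back              */
    case T3: return T4;               /* LW write-back                */
    case T6: return T7;               /* R-type write-back            */
    default: return T0;               /* T4, T5, T7, T8, T9, T10      */
    }
}

/* Number of clock cycles one instruction spends in the FSM. */
int cycles(enum instr op)
{
    enum state s = T0;
    int n = 0;
    do {
        s = next_state(s, op);
        n++;
    } while (s != T0);
    return n;
}
```

Calling cycles() for LW, SW, R-type, BEQ, ADDI and J gives 5, 4, 4, 3, 4 and 3, matching the table.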
(Finite state) Control Unit
Book-P&H – ch-appx-D

Controller Design Strategies
• Microprogrammed
  • Store the truth table or state table in memory
• Hardwired
  • Design the circuit for the logical expressions
Microprogram based control unit
Controller’s Truth (state) Table
• CF: Control Field: 15 signals or 17 bits
• NA: Next Address: 4 bits
For better understanding, the values of the Next Address and the Address are represented in decimal.
Book-Hamacher-ch5; P&H – ch-appx-C
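Microprogrammed Control
• Organization of microprogrammed control unit
[Figure: a MUX chooses between the external address derived from the IR and the NA field, and loads the CMAR; the CMAR addresses the Control Memory, whose word holds the fields M, CF and NA]
• Microprogram sequencer: all units except the control memory
• CF: Control Field
• NA: Next Address
• M: Mode
Book-Hamacher-ch5; P&H – ch-appx-C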
Hardwired based control unit
Controller States’ information for Hardwired sequential CU
[Figure: state graph starting at IF (T0), followed by ID (T1); the EXE and MEM steps of ADD, LW, SW, ADDI, BNE and J reuse the shared counter states T2 and T3, and LW alone continues to WB (T4)]
• States are common to all the instructions. Why?
  • Earlier, each instruction had its own unique states
  • The counter generates the states, and it is not loadable
• What could be the counter size?
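[Figure: hardwired control unit. The op field of the instruction (fields: op, rs, rt, rd, shamt, funct) feeds a 6-to-2^6 instruction decoder (IDCD) with one output line per instruction, e.g. R-type/ADD and BEQ; a 3-bit counter with inputs CLK, INCR and CLR feeds a 3-to-2^3 step decoder (SDCD) with outputs T0 to T7; the Control Unit (CU) combines both sets of lines to generate IorD, Jump, MemWrite, MemRead, IRWrite, RegDst, MemtoReg, RegWrite, ALUSrcA, ALUSrcB_1:0, PCWrite, Branch, PCSrc and ALUOp]
Book-Hamacher-ch5; P&H – ch-appx-C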
Logical Expression of Control Signals
IorD = T0’ + T3.LW + T3.SW + …
Jump = T2.J
MemWrite = T3.SW
MemRead = T0’ + …
IRWrite =
RegDst =
MemtoReg =
RegWrite =
ALUSrcA =
ALUSrcB_1:0 =
PCWrite =
Branch =
PCSrc =
ALUOp = {T0’,T0’} + {T1’,T1’} + {(T2.LW)’, (T2.LW)’}
INCR = T0 + T1 + T2.LW + T3.LW + T2.SW
CLR = T4.LW + T3.SW + …
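Each complete expression is an AND of a step line and an instruction line, OR-ed over the cases that need the signal. The sketch below shows only the two expressions that are fully given (Jump and MemWrite); the struct, its field names and the one-hot encoding are assumptions made for illustration.

```c
#include <stdbool.h>

/* ASSUMPTION: one-hot step lines T0..T4 from the step decoder and
   one-hot instruction lines from the instruction decoder. */
struct cu_inputs {
    bool t[5];      /* t[i] is true while the counter is in step Ti */
    bool lw, sw, j; /* decoded instruction lines                    */
};

/* Jump = T2.J */
bool jump_signal(const struct cu_inputs *in)
{
    return in->t[2] && in->j;
}

/* MemWrite = T3.SW */
bool mem_write_signal(const struct cu_inputs *in)
{
    return in->t[3] && in->sw;
}
```

In step T3 of a store, mem_write_signal() is true and jump_signal() is false; the remaining signals would be built the same way once their expressions are filled in.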

Comparison between Single and Multi-cycle datapath
• Single cycle
  • No non-architectural components
  • Every instruction takes one clock cycle (CPI = 1)
• Multi-cycle
  • Non-architectural elements/registers: instruction, data (memory), A, B, ALU output
  • Instructions take different numbers of clock cycles
  • Average CPI > 1
Comparison between Single and Multi-cycle Control unit

ALUOp | Meaning
00 | add
01 | subtract
10 | look at funct field
11 | n/a

Single cycle (CLK period shown as 10 ns)
Instr. | Jump | RegDst | RegWrite | ALUSrc | Branch | ALUOp1 | ALUOp0 | MemRead | MemWrite | MemtoReg
R-type | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0
lw     | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1
sw     | 0 | x | 0 | 1 | 0 | 0 | 0 | 0 | 1 | x
addi   | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0
B-type | 0 | x | 0 | 0 | 1 | 0 | 1 | 0 | 0 | x
J-type | 1 | x | 0 | x | x | x | x | 0 | 0 | x

Multi-cycle (CLK period shown as 5 ns): LW’s FSM
Machine state | Operation | Control signals | Next state
T2 | A + sigEx(offset) | ALUSrcA=1, ALUSrcB1:0=10, ALUOp=00 | T3
T3 | Data ← M[A + sigEx(offset)] | IorD=1 | T4
T4 | RF[dest] ← Data | RegDst=0, MemtoReg=1, RegWrite=1 | T0
Performance Analysis
• The program consists of approximately 25% loads, 10% stores, 11% branches, 2% jumps, and 52% R-type instructions. Determine the average CPI for this program.
• The average CPI is the sum, over the instruction classes, of the CPI of each class multiplied by the fraction of instructions that belong to it
• Average CPI = (0.11 + 0.02)(3) + (0.52 + 0.10)(4) + (0.25)(5) = 4.12
• For the single-cycle approach, the average CPI is 1
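The weighted sum can be reproduced with a few lines of C. This is a sketch with illustrative names; the instruction mix and per-class cycle counts in the usage note are the ones given in the example.

```c
#include <stddef.h>

/* Average CPI = sum over classes of (fraction of instructions * cycles). */
double average_cpi(const double *fraction, const int *cycles, size_t n)
{
    double cpi = 0.0;
    for (size_t i = 0; i < n; i++) {
        cpi += fraction[i] * cycles[i];
    }
    return cpi;
}
```

With fractions {0.25, 0.10, 0.11, 0.02, 0.52} for loads, stores, branches, jumps and R-type, and cycles {5, 4, 3, 3, 4}, the result is 1.25 + 0.40 + 0.33 + 0.06 + 2.08 = 4.12.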
• Each cycle involves one ALU operation, memory access, or register file access
• Each instruction uses only one stage at any time
• Assumptions:
  • the register file is faster than the memory, and
  • writing memory is faster than reading memory (the mux’s complexity is more)
• Cycle time of a stage = clock-to-Q delay of the D flip-flop at the beginning of the stage + delay of the elements in the stage + setup time of the flip-flop at the end of the stage
• The datapath has two possible critical paths:
  • Path-1: tpcq + tmux + tALU + tmux + tsetup
  • Path-2: tpcq + tmux + tmem + tsetup
Combined Datapath of all types of instructions: the green line marks the critical paths
[Figure: combined multi-cycle datapath annotated with tpcq, tmux, tALU, tmem and tsetup along the two critical paths]
Path-1: tpcq + tmux + tALU + tmux + tsetup
Path-2: tpcq + tmux + tmem + tsetup
Performance Analysis of Multi-Cycle Implementation
• XYZ-organization is contemplating building the multi-cycle MIPS processor instead of the single-cycle processor. For both designs, the organization plans on using a 65-nm CMOS manufacturing process. The organization has determined that the logic elements have the delays given in the table. Help the organization compare each processor’s execution time for a program with 100 billion instructions.
Parameter   Delay (ps)
tpcq        30
tmem        250
tsetup      20
tALU        200
tmux        25
tRFsetup    20
• Tc = tpcq + tmux + max(tALU + tmux, tmem) + tsetup = 30 + 25 + 250 + 20 = 325 ps
• Execution time = (100 × 10^9 instrs.) × (4.12 cycles/instr.) × (325 × 10^-12 s/cycle) = 133.9 seconds
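The execution-time product is easy to verify numerically. The sketch below uses an illustrative function name; note that the four delays 30 + 25 + 250 + 20 sum to 325 ps, and 325 ps is the cycle time that reproduces the 133.9 s result.

```c
/* Execution time in seconds = instructions * CPI * cycle time (ps -> s). */
double execution_time_s(double instructions, double cpi, double tc_ps)
{
    return instructions * cpi * tc_ps * 1e-12;
}
```

For 100 billion instructions, the multi-cycle design (CPI 4.12, 325 ps) takes about 133.9 s, and the single-cycle design (CPI 1, 925 ps) takes 92.5 s.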
Performance Analysis: A Comparison
• For multi-cycle, Tc = 325 ps and CPI = 4.12
• For single-cycle, Tc = 925 ps and CPI = 1
• For multi-cycle, execution time = 133.9 seconds
• For single-cycle, execution time = 92.5 seconds
• This example shows that the multi-cycle processor is slower than the single-cycle processor; why is it so?
  • Sequencing overhead: 30 ps (clk-to-Q) + 20 ps (setup) is paid in every cycle
• The multi-cycle processor is less expensive
  • It has 5 non-architectural elements
Fetch-and-Execute Algorithm for MIPS Microprocessor (32 bit): Multi-Cycle Approach
#include <stdbool.h>
#include <stdint.h>

#define OPCODE 0xFC000000u /* IR[31:26] */
#define RS     0x03E00000u /* IR[25:21] */
#define RT     0x001F0000u /* IR[20:16] */
#define RD     0x0000F800u /* IR[15:11] */
#define SHIFT  0x000007C0u /* IR[10:6]  */
#define OFFSET 0x0000FFFFu /* IR[15:0]  */

enum { ADD = 0x2, SUB = 0x6 };                            /* ALUControl: 0b0010, 0b0110 */
enum { R_TYPE = 0x00, BEQ = 0x04, LW = 0x23, SW = 0x2B }; /* MIPS opcodes */

uint32_t PC, MM[1024], RF[32], ALUControl; bool ZERO;
uint32_t IR, A, B, ALUOut, MDR; /* non-architectural elements */

void Load(uint32_t mem[]); /* loads the program and data into MM; defined elsewhere */

uint32_t ALU(uint32_t Src1, uint32_t Src2) {
    switch (ALUControl) {
    case SUB: ZERO = (Src1 - Src2) == 0; return Src1 - Src2; /* B-type compare */
    case ADD: return Src1 + Src2;
    default:  return 0; /* AND, OR, etc. are omitted here */
    }
}

/* Converts the 16-bit offset to 32 bits (sign extension). */
uint32_t SignExtn(uint32_t ir) { return (uint32_t)(int32_t)(int16_t)(ir & OFFSET); }

void Run(void) {
    Load(MM);
    PC = 0; /* address of the 1st instruction stored in MM; 0 is an assumed value */
    while (1) {
        IR = MM[PC >> 2]; /* PC is a byte address, MM holds 32-bit words */
        ALUControl = ADD; PC = ALU(PC, 4);
        A = RF[(IR & RS) >> 21]; B = RF[(IR & RT) >> 16];
        switch ((IR & OPCODE) >> 26) {
        case R_TYPE:
            /* set ALUControl from funct: only ADD and SUB are shown; AND, OR, etc. are omitted */
            ALUControl = ((IR & 0x3Fu) == 0x22u) ? SUB : ADD;
            ALUOut = ALU(A, B); RF[(IR & RD) >> 11] = ALUOut;
            break;
        case LW:
            ALUControl = ADD;
            ALUOut = ALU(A, SignExtn(IR)); MDR = MM[ALUOut >> 2]; RF[(IR & RT) >> 16] = MDR;
            break;
        case SW:
            ALUControl = ADD;
            ALUOut = ALU(A, SignExtn(IR)); MM[ALUOut >> 2] = B;
            break;
        case BEQ: /* B-type */
            ALUControl = ADD; ALUOut = ALU(PC, SignExtn(IR) << 2);
            ALUControl = SUB; ALU(A, B);
            if (ZERO) { PC = ALUOut; }
            break;
        }
    }
}
Summary
• Disadvantages of single-cycle processor
• Necessity of balanced design approach and multi-cycle processor
• Datapath design of multi-cycle processor
• Design of control unit for multi-cycle processor
• Performance analysis of multi-cycle processor
• Comparison between single-cycle and multi-cycle processor
• Disadvantages of multi-cycle processor
