Computer
Architecture
CH4 Processor Microarchitecture (III)
Prof. Ren-Shuo Liu
NTHU EE
• Assume a 12-stage pipelined processor with 2 stall cycles
per instruction on average can achieve a 6 times higher
clock frequency than a 1-cycle processor does. What is the
speedup over the 1-cycle processor?
Instruction 1
Instruction 2
1 Instruction 3
Instruction 1
Instruction 2
Instruction 3
2
Outline
• Background
• Single-cycle design
• Pipelined design
• Pipeline concepts and MIPS's pipeline
• Cost and issues of pipelining
• Detailed pipelined datapath and control
• Trace the pipeline
• Dependencies, hazards, and forwarding
• Stalls and exceptions
3
Multi-Cycle Datapath
op, control op, control op, control
4
Consecutive Instruction Example
# int a;
# int b;
# int c;
… … # … …
80: lw t1, 0(gp) #
84: lw t2, 4(gp) #
88: nop #
92: add t3, t1, t2 # c = a + b;
96: sw t3, 8(gp) #
5
80: lw t1, 0(gp)
84: lw t2, 4(gp)
88: nop
First Cycle 92:
96:
add
sw
t3, t1, t2
t3, 8(gp)
80
op, control op, control op, control
6
80: lw t1, 0(gp)
84: lw t2, 4(gp)
88: nop
Second Cycle 92:
96:
add
sw
t3, t1, t2
t3, 8(gp)
84 lw t1, 0(gp)
op, control op, control op, control
7
80: lw t1, 0(gp)
84: lw t2, 4(gp)
88: nop
Third Cycle 92:
96:
add
sw
t3, t1, t2
t3, 8(gp)
88 lw t2, 4(gp) lw t1, 0(gp)
op, control op, control op, control
8
80: lw t1, 0(gp)
84: lw t2, 4(gp)
88: nop
Forth Cycle 92:
96:
add
sw
t3, t1, t2
t3, 8(gp)
92 nop lw t2, 4(gp) lw t1, 0(gp)
op, control op, control op, control
9
80: lw t1, 0(gp)
84: lw t2, 4(gp)
88: nop
Fifth Cycle 92:
96:
add
sw
t3, t1, t2
t3, 8(gp)
96 add t3, t1, t2 nop lw t2, 4(gp) lw t1, 0(gp)
op, control op, control op, control
10
80: lw t1, 0(gp)
84: lw t2, 4(gp)
88: nop
Sixth Cycle 92:
96:
add
sw
t3, t1, t2
t3, 8(gp)
100 sw t3, 8(gp) add t3, t1, t2 nop lw t2, 4(gp)
op, control op, control op, control
11
Simplified Pipeline Diagram
lw t1, 0(gp)
lw t2, 4(gp)
nop
add t3, t1, t2
sw t3, 8(gp)
12
Resource Usage Diagram
lw t1, 0(gp)
lw t2, 4(gp)
nop
add t3, t1, t2
sw t3, 8(gp)
13
Resource Usage Diagram
Same ALU at different cycles
Same DM at different cycles
lw t1, 0(gp)
3rd-cycle ALU produces
result for 4th-cycle DM
lw t2, 4(gp)
nop
add t3, t1, t2
sw t3, 8(gp)
14
Real Hardware Diagram
lw t1, 0(gp)
lw t2, 4(gp)
nop
add t3, t1, t2
sw t3, 8(gp)
15
Pipeline Control
• Control signals are derived from each instruction
• Just like single-cycle datapath
• Control signals are passed along just like the data
PC
16
•
WB
•
Pipeline Control •
M •
•
What are they?
•
EX
•
PC
17
Datapath without Control
18
Datapath with Control
19
Dependencies and Hazards
• Dependencies
inst''
• Operand of the current instruction
depends on a previous instruction's inst'
outcome
PC: inst
• Hazards
• Current PC cannot control and data
be executed right now dependencies
• Pipeline needs to postpone executing
current PC (i.e., stall) for some cycles
• Dependencies may result in hazards
• Pipeline hardware design can resolve
some hazards
• Compiler can avoid some hazards
hazards
structure
hazards 20
Anatomy of Data Dependencies
• Source operand, rd or rt
• Source instruction type,
e.g., R-type, lw, & branch
• Distance, e.g., 1~5
PC: • Destination instruction type,
e.g., R-type, lw, sw, & branch
• Destination operand, rd
21
Results
ALU lw branch
• Distance == 1
incurs a hazard
No hazards • Distance >= 2
no hazard
ALU lw sw branch
22
Detailed Analysis
• ALU - ALU
alu r3, rs1, rs2 IF ID EX r3 M WB
alu rd, r3, r4 IF ID EX M WB
23
Detailed Analysis
• ALU - LW
alu r3, rs1, rs2 IF ID EX r3 M WB
lw rd, imm(r3) IF ID EX M WB
24
Detailed Analysis
• ALU - SW
alu r3, rs1, rs2 IF ID EX r3 M WB
sw rs2, imm(r3) IF ID EX M WB
25
Detailed Analysis
• ALU - SW
alu r3, rs1, rs2 IF ID EX r3 M WB
sw r3, imm(rs1) IF ID EX M WB
26
Detailed Analysis
• ALU - Branch
alu r3, rs1, rs2 IF ID EX r3 M WB
IF ID EX M WB
beq r3, rs2, imm IF ID EX M WB
beq r3, rs2, imm IF ID EX M WB
27
Results
ALU lw branch
• Distance == 1 • Distance <= 2
incurs a hazard incurs a hazard
• Distance >= 2 • Distance >= 3
no hazard no hazard
ALU lw sw branch
• rs2 can be forwarded (no hazard)
• rs1 dependency causes no hazard if
distance >= 2 (similar to lw-R-type)
28
Analysis
• LW - ALU
lw r3, imm(rs1) IF ID EX M r3 WB
IF ID EX M WB
alu rd, r3, rs2 IF ID EX M WB
alu rd, r3, rs2 IF ID EX M WB
29
Analysis
• LW - LW
lw r3, imm(rs1) IF ID EX M r3 WB
IF ID EX M WB
lw rd, imm(r3) IF ID EX M WB
lw rd, imm(r3) IF ID EX M WB
30
Analysis
• LW - SW
lw r3, imm(rs1) IF ID EX M r3 WB
IF ID EX M WB
sw rs2, imm(r3) IF ID EX M WB
sw rs2, imm(r3) IF ID EX M WB
31
Analysis
• LW - SW
lw r3, imm(rs1) IF ID EX M r3 WB
sw r3, imm(rs1) IF ID EX M WB
sw r3, imm(rs1) IF ID EX M WB
sw r3, imm(rs1) IF ID EX M WB
32
Analysis
• LW - SW
lw r3, imm(rs1) IF ID EX M r3 WB
IF ID EX M WB
IF ID EX M WB
beq r3, rs2, imm IF ID EX M WB
33
Results
ALU lw branch
• Distance == 1
incurs a control hazard
• Distance >= 2
no hazard
ALU lw sw branch
34
Hazard Example
ALU從register
file拿到的,不
是最新的值
→ solution:
ALU改由他處
得最新的值
35
Forwarding (Bypassing)
• Write-back value
becomes rs1 or rs2
before being written
into the register file
36
Forwarding Paths
• For a 5-stage pipeline
• Forwarding destinations
• rs1 of ALU, beq, and SW
• rs2 of ALU and beq
• rs2 of SW
• Forwarding sources
• rd of the 1-cycle-earlier ALU
• rd of the 2-cycle-earlier ALU
• rd of the 1-cycle-earlier LW
• rd of the 2-cycle-earlier LW
37
• rs1 of ALU, beq, and SW
Forwarding Paths • rs2 of R-type and beq
• rs2 of SW
imm - MU
X
M
U
X
• rd of the 2-cycle- • rd of the 1-cycle-
earlier ALU and LW • rd of the 1-cycle- earlier ALU and LW
earlier ALU
38
Forwarding Control
• Take special care of
• Don't forward any result to $0, which is always zero
add $0,$1,$2
add $3,$0,$4
• Forward the latest value for double dependencies
add $1,$1,$2
add $1,$1,$3
add $1,$1,$4
39
Forwarding Control
• Fwd x to rs = f1(inst, inst', inst'')
• Fwd inst'.rd to rs
• inst ∈ {ALU, beq, lw, sw} and
• inst' ∈ {ALU} and
• inst.rs == inst'.rd and
• inst.rs != 0
• Fwd inst''.rd to rs
• inst ∈ {ALU, beq, lw, sw} and
• inst'' ∈ {ALU, lw} and
• inst.rs == inst''.rd and
• inst.rs != 0 and
• Fwd inst'.rd to ALU_in is false // not double dependency
40
Forwarding Control
• Fwd inst''.rd to inst'.rs2 = f2(inst, inst', inst'')
• Fwd inst''.rd to inst'.rs2
• inst' ∈ {sw} and
• inst'' ∈ {lw} and
• inst'.rs2 == inst''.rd and
• inst'.rs2 != 0
41
Outline
• Background
• Single-cycle design
• Pipelined design
• Pipeline concepts and MIPS's pipeline
• Cost and issues of pipelining
• Detailed pipelined datapath and control
• Trace the pipeline
• Dependencies, hazards, and forwarding
• Stalls and exceptions
42
Data Hazards That Cause Stall(s)
• Two lw cases
lw r1, imm(r2) IF ID EX MEM WB
ALU r3, r4, r1 IF ID EX MEM WB
IF ID EX MEM WB
lw r1, imm(r2) IF ID EX MEM WB
sw r3, imm(r1) IF ID EX MEM WB
IF ID EX MEM WB
43
Data Hazards That Cause Stall(s)
• Three branch cases
ALU r1, r, r IF ID EX MEM WB
beq r1, r2, imm IF ID EX MEM WB
IF ID EX MEM WB
lw r1, imm(r2) IF ID EX MEM WB
beq r1, r2, imm IF ID EX MEM WB
lw r1, imm(r2) IF ID EX MEM WB
nop IF ID EX MEM WB
beq r1, r2, imm IF ID EX MEM WB
IF ID EX MEM WB 44
How to Handle
• Detect the situations
• Stall the pipeline
• Example
PC-4 lw r1, imm(r2) IF ID EX MEM WB
PC R-type r3, r4, r1 → nop IF ID EX MEM WB
PC R-type r3, r4, r1 IF ID EX MEM WB
45
Hazard Detect and Stall
keep_PC keep_inst inst inst' inst''
nop
46
Hazard Detect and Stall
keep_PC keep_inst inst inst' inst''
nop
lw $1, 100($2)
84
nop
nop
nop
80: lw
84: addi
47
Hazard Detect and Stall
keep_PC keep_inst inst inst' inst''
nop
addi t3, t1, 123
lw t1, 100(t2)
88
nop
nop
80: lw
84: addi
48
Hazard Detect and Stall
keep_PC keep_inst inst inst' inst''
nop
addi t3, t1, 123
lw t1, 100(t2)
88
nop
nop
80: lw
84: addi
49
Hazard Detect and Stall
keep_PC keep_inst inst inst' inst''
nop
addi t3, t1, 123
lw t1, 100(t2)
8C
nop
…
80: lw
84: addi
50
Hazard Detect and Stall
keep_PC keep_inst inst inst' inst''
nop
addi t3, t1, 123
90
nop
…
…
80: lw
84: addi
51
Hazard Detect and Stall
keep_PC keep_inst inst inst' inst''
nop
addi t3, t1, 123
94
…
…
…
80: lw
84: addi
52