Pipeline
Pipeline
Pipelining
Laundry Example
Ann, Brian, Cathy, Dave A B C D
each have one load of clothes
to wash, dry, and fold
Washer takes 30 minutes
30 40 20 30 40 20 30 40 20 30 40 20
T
a A
s
k
B
O
r
d C
e
r D
30 40 40 40 40 20
T
a A
s
k
B
O
r
d C
e
r
D
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10
Clk
Pipeline Implementation:
Load Ifetch Reg Exec Mem Wr
ALU
I Im Reg Dm Reg
n Inst 0
s
ALU
t Inst 1 Im Reg Dm Reg
r.
ALU
O Inst 2 Im Reg Dm Reg
r
d Inst 3
ALU
Im Reg Dm Reg
e
r
Inst 4
ALU
Im Reg Dm Reg
Speedup and Efficiency
k cycles for the first task and n-1 cycles for the remaining
n-1 tasks
nk ≈k
If n is very large (n >> k), then Sk ≈ n
Efficiency and Throughput
Sk n
Ek = = k + (n-1)
k
Hk = [ k +n(n-1)] t = nf
k + (n-1)
Can pipelining get us into trouble?
Yes: Pipeline Hazards
Structural hazards: attempt to use the same resource two different
ways at the same time
E.g., combined washer/dryer would be a structural hazard or folder busy
doing something else (watching TV)
Single memory cause structural hazards
Data hazards: attempt to use item before it is ready
E.g., one sock of pair in dryer and one in washer; can’t fold until you get
sock from washer through dryer
instruction depends on result of prior instruction still in the pipeline
Control hazards: attempt to make a decision before condition is
evaluated
E.g., washing football uniforms and need to get proper detergent level;
need to see after dryer before next load in
branch instructions
Can always resolve hazards by waiting
pipeline control must detect the hazard
take action (or delay action) to resolve hazards
Slow Down From Stalls
ALU
I Mem Mem Reg
n
Load Reg
ALU
Mem Reg Mem Reg
t Instr 1
r.
ALU
Mem Reg Mem Reg
O Instr 2
r
ALU
d Mem Reg Mem Reg
e
Instr 3
r
ALU
Mem Reg Mem Reg
Instr 4
Single Memory is a Structural Hazard
Time (clock cycles)
ALU
I Mem Mem Reg
n
Load Reg
ALU
Mem Reg Mem Reg
t Instr 1
r.
ALU
Mem Reg Mem Reg
O Instr 2
r
d
Instr 3
ALU
Bubble Mem Reg Mem Reg
e
r
Instr 4
ALU
Mem Reg Mem Reg
ALU
s Mem Reg Mem Reg
t Add
r.
ALU
Mem Reg Mem Reg
Beq
O
r
Load Lost
ALU
Mem Reg Mem Reg
d potential
e
r
Impact: 2 lost cycles (i.e. 3 clock cycles per branch instruction) =>slow
If 20% instructions are BEQ, all others have CPI 1, what is the average
CPI?
Control Hazard Solution #1: Stall
Control Hazard Solution #2: Predict
I Time (clock cycles)
n
ALU
s Mem Reg Mem Reg
t Add
r.
ALU
Mem Reg Mem Reg
Beq
O
r
Load
ALU
Mem Reg Mem Reg
d
e
r
ALU
s Mem Reg Mem Reg
t Add
r.
ALU
Mem Reg Mem Reg
Beq
O
r
ALU
d Misc Mem Reg Mem Reg
ALU
r Load Mem Reg Mem Reg
A is the best choice, fills delay slot & reduces instruction count (IC)
In B, the sub instruction may need to be copied, increasing IC
In B and C, must be okay to execute sub when branch fails
More On Delayed Branch
add r1 ,r2,r3
or r8, r1 ,r9
ALU
I add r1,r2,r3 Im Reg Dm Reg
ALU
Im Reg Dm Reg
s
t
sub r4,r1,r3
r.
ALU
Im Reg Dm Reg
and r6,r1,r7
O
ALU
r Im Reg Dm Reg
d or r8,r1,r9
e
ALU
Im Reg Dm Reg
r xor r10,r1,r11
Data Hazard Solution:
• “Forward” result from one stage to another
Time (clock cycles)
IF ID/RF EX MEM WB
ALU
I add r1,r2,r3 Im Reg Dm Reg
ALU
Im Reg Dm Reg
s
t
sub r4,r1,r3
r.
ALU
Im Reg Dm Reg
and r6,r1,r7
O
ALU
r Im Reg Dm Reg
d or r8,r1,r9
e
ALU
Im Reg Dm Reg
r xor r10,r1,r11
• “or” OK if define read/write properly
•Forwarding can’t prevent all data hazard! – lw followed by R-type?
Forwarding (or Bypassing): What about Loads?
• Dependencies backwards in time are hazards
Time (clock cycles)
IF ID/RF EX MEM WB
ALU
lw r1,0(r2) Im Reg Dm Reg
ALU
Im Reg Dm Reg
sub r4,r1,r3
ALU
lw r1,0(r2) Im Reg Dm Reg
ALU
sub r4,r1,r3 Stall Im Reg Dm Reg