Pipeline

The document discusses instruction pipelining, explaining how it works and its benefits. Pipelining allows multiple instructions to be processed simultaneously by overlapping their execution across different stages. While pipelining can improve throughput, hazards like structural, data, and control hazards can reduce its effectiveness if they cause stalls.

Uploaded by Nepal Malik

Introduction to Instruction Pipelining

Pipelining

Pipelining is an implementation technique in which multiple instructions are overlapped in execution.

Here we will consider a RISC architecture:
Memory is accessed only by load/store instructions
All other instructions operate on registers

Pipelining is Natural!

Laundry Example
Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
Washer takes 30 minutes
Dryer takes 40 minutes
“Folder” takes 20 minutes


Sequential Laundry

[Figure: timeline from 6 PM to midnight; loads A-D run one after another, each taking 30 + 40 + 20 minutes]

Sequential laundry takes 6 hours for 4 loads

If they learned pipelining, how long would laundry take?
Pipelined Laundry: Start work ASAP

[Figure: timeline from 6 PM to 9:30 PM; loads A-D overlap, with each load entering a stage as soon as the previous load frees it]

Pipelined laundry takes 3.5 hours for 4 loads
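The two totals above can be checked with a quick sketch (stage times from the slide; the code and variable names are illustrative):

```python
# Stage times in minutes (from the slide).
wash, dry, fold = 30, 40, 20
loads = 4

# Sequential: each load finishes all three stages before the next begins.
sequential = loads * (wash + dry + fold)        # 360 min

# Pipelined: the dryer (40 min) is the slowest stage, so once the pipeline
# is full it finishes one load every 40 minutes.
bottleneck = max(wash, dry, fold)
pipelined = wash + loads * bottleneck + fold    # 30 + 160 + 20 = 210 min

print(sequential / 60, "hours")  # 6.0 hours
print(pipelined / 60, "hours")   # 3.5 hours
```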


Pipelining Lessons
Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
Pipeline rate is limited by the slowest pipeline stage
Multiple tasks operate simultaneously using different resources
Potential speedup = number of pipeline stages
Unbalanced lengths of pipeline stages reduce speedup
Time to “fill” the pipeline and time to “drain” it reduces speedup
Stall for Dependencies


The Five Stages of Load

Load: Ifetch (Cycle 1) | Reg/Dec (Cycle 2) | Exec (Cycle 3) | Mem (Cycle 4) | Wr (Cycle 5)

Ifetch: Instruction Fetch - fetch the instruction from the Instruction Memory
Reg/Dec: Register fetch and instruction decode
Exec: Calculate the memory address
Mem: Read the data from the Data Memory
Wr: Write the data back to the register file

Branch instruction – 2 stages
Store instruction – 4 stages
ALU instruction – 5 stages (the 4th stage, Mem, is idle)
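The stage usage of each instruction class can be summarized in a small sketch (the dictionary below is illustrative, following the slide's stage counts):

```python
STAGES = ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"]

# Stages each instruction class actually uses (per the slide; the ALU
# instruction occupies five stages but its Mem stage does no work).
USED = {
    "branch": ["Ifetch", "Reg/Dec"],
    "store":  ["Ifetch", "Reg/Dec", "Exec", "Mem"],
    "load":   ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"],
    "alu":    ["Ifetch", "Reg/Dec", "Exec", "Wr"],
}

for kind, used in USED.items():
    print(f"{kind}: uses {len(used)} of {len(STAGES)} stages")
```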
Pipelining
Improve performance by increasing throughput

Ideal speedup is the number of stages in the pipeline.
Do we achieve this? NO!
The pipeline stage time is limited by the slowest resource, either the ALU operation or the memory access
Fill and drain time
Single Cycle, Multiple Cycle, vs. Pipeline

Single cycle implementation: one clock cycle per instruction, long enough for the slowest instruction (Load); a faster instruction such as Store wastes the rest of its cycle.

Multiple cycle implementation: each instruction takes only the short cycles it needs (Load: Ifetch Reg Exec Mem Wr; Store: Ifetch Reg Exec Mem; R-type: Ifetch ...).

Pipeline implementation: a new instruction starts every cycle:
Load    Ifetch Reg Exec Mem Wr
Store          Ifetch Reg Exec Mem Wr
R-type                Ifetch Reg Exec Mem Wr
Why Pipeline?
Suppose we execute 100 instructions
Single cycle machine: 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
Multicycle machine: 10 ns/cycle x 4.6 CPI (due to instruction mix) x 100 inst = 4600 ns
Ideal pipelined machine: 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
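These three totals are simple arithmetic and can be verified directly (a sketch using the numbers from the slide):

```python
n_inst = 100

single_cycle = 45 * 1 * n_inst        # 45 ns/cycle, CPI 1
multicycle   = 10 * 4.6 * n_inst      # 10 ns/cycle, CPI 4.6 from the mix
pipelined    = 10 * (1 * n_inst + 4)  # CPI 1 plus 4 drain cycles

print(single_cycle, "ns")  # 4500 ns
print(multicycle, "ns")    # 4600.0 ns
print(pipelined, "ns")     # 1040 ns
```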
Why Pipeline? Because the resources are there!
[Figure: instructions Inst 0-4 flow through Im | Reg | ALU | Dm | Reg, with a new instruction entering each cycle so all resources stay busy]
Speedup and Efficiency

A k-stage pipeline processes n tasks in k + (n-1) clock cycles: k cycles for the first task and n-1 cycles for the remaining n-1 tasks.

Total time to process n tasks: Tk = [k + (n-1)] t

For the non-pipelined processor: T1 = n k t

Speedup factor: Sk = T1 / Tk = n k t / ([k + (n-1)] t) = n k / (k + (n-1))

If n is very large (n >> k), then Sk ≈ k
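The speedup formula can be sketched in a few lines (function name is illustrative):

```python
def speedup(k, n, t=1.0):
    """S_k = T1 / Tk = n*k*t / ([k + (n-1)] * t) = n*k / (k + n - 1)."""
    t1 = n * k * t            # non-pipelined time
    tk = (k + (n - 1)) * t    # pipelined time
    return t1 / tk

print(speedup(k=5, n=4))     # 2.5: far from ideal for few tasks
print(speedup(k=5, n=1000))  # ~4.98: approaches k when n >> k
```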
Efficiency and Throughput

Efficiency of the k-stage pipeline:

Ek = Sk / k = n / (k + (n-1))

Pipeline throughput (the number of tasks per unit time):

Hk = n / ([k + (n-1)] t) = n f / (k + (n-1))

where f = 1/t; Hk approaches f as n grows.
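Both formulas follow directly from Sk; a minimal sketch (function names are illustrative):

```python
def efficiency(k, n):
    # E_k = S_k / k = n / (k + (n - 1))
    return n / (k + (n - 1))

def throughput(k, n, t):
    # H_k = n / ([k + (n - 1)] * t); approaches f = 1/t for large n
    return n / ((k + (n - 1)) * t)

print(efficiency(5, 1000))        # ~0.996
print(throughput(5, 1000, 1e-9))  # ~9.96e8 tasks/s with a 1 ns cycle
```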
Can pipelining get us into trouble?
Yes: Pipeline Hazards
Structural hazards: attempt to use the same resource in two different ways at the same time
E.g., a combined washer/dryer would be a structural hazard, or the folder is busy doing something else (watching TV)
A single memory causes structural hazards
Data hazards: attempt to use an item before it is ready
E.g., one sock of a pair is in the dryer and one in the washer; you can’t fold until you get the sock from the washer through the dryer
An instruction depends on the result of a prior instruction still in the pipeline
Control hazards: attempt to make a decision before the condition is evaluated
E.g., washing football uniforms and needing the proper detergent level; you need to see the result after the dryer before putting the next load in
Branch instructions
Can always resolve hazards by waiting
pipeline control must detect the hazard
take action (or delay action) to resolve hazards
Slow Down From Stalls

• Perfect pipelining with no hazards: an instruction completes every cycle (total cycles ~ number of instructions), since k + (n-1) ≈ n

• Speedup = increase in clock speed = number of pipeline stages:

Sk ≈ nk/n ≈ k

• With hazards and stalls, some cycles (= stall time) go by during which no instruction completes, and then the stalled instruction completes

• Total cycles = number of instructions + stall cycles

• Slowdown because of stalls = 1 / (1 + stall cycles per instr)
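The slowdown expression can be sketched directly (function name is illustrative):

```python
def slowdown_from_stalls(stall_cycles_per_instr):
    # Fraction of ideal (no-stall) performance that remains:
    # 1 / (1 + stall cycles per instruction).
    return 1 / (1 + stall_cycles_per_instr)

print(slowdown_from_stalls(0.0))  # 1.0: no stalls, full speed
print(slowdown_from_stalls(0.5))  # ~0.67: half a stall cycle per instruction
```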


Speedup Equation for Pipelining

Compared to unpipelined:

Speedup = Average instruction time unpipelined / Average instruction time pipelined

Now it is evident that

CPI pipelined = Ideal CPI + Pipeline stall cycles per instruction = 1 + Pipeline stall cycles per instruction

Putting this into the equation for speedup (with equal clock cycle times and unpipelined CPI equal to the pipeline depth):

Speedup = Pipeline depth / (1 + Pipeline stall cycles per instruction)

Thus, if there are no stalls, the speedup is equal to the number of pipeline stages, matching our intuition for the ideal case.
Single Memory is a Structural Hazard

[Figure: Load followed by Instr 1-4 in a Mem | Reg | ALU | Mem | Reg pipeline; in cycle 4, Load's data-memory access and Instr 3's instruction fetch both need the single memory]
[Figure: the same sequence with a bubble inserted; Instr 3's fetch is delayed one cycle so it no longer conflicts with Load's data-memory access]

One cycle stall for structural hazard


Example: Dual-port vs. Single-port

Machine A: dual-ported memory
Machine B: single-ported memory, but its pipelined implementation has a 1.05 times faster clock rate
Ideal CPI = 1 for both
Data references are 40% of the instruction mix

SpeedUpA = [1/(1 + 0)] x (clockunpipe/clockpipe)
         = Pipeline Depth
SpeedUpB = [1/(1 + 0.4 x 1)] x (clockunpipe/(clockunpipe / 1.05))
         = (Pipeline Depth/1.4) x 1.05
         = 0.75 x Pipeline Depth
SpeedUpA / SpeedUpB = Pipeline Depth/(0.75 x Pipeline Depth) = 1.33

Machine A is 1.33 times faster
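The comparison can be reproduced numerically (a sketch; the depth value is an arbitrary illustration, since it cancels out of the ratio):

```python
depth = 5  # illustrative pipeline depth

# Machine A: dual-ported memory, no structural stalls.
speedup_a = (1 / (1 + 0)) * depth

# Machine B: single-ported memory; 40% of instructions are data references,
# each adding one stall cycle, but the clock is 1.05x faster.
speedup_b = (1 / (1 + 0.4 * 1)) * depth * 1.05

print(speedup_a / speedup_b)  # ~1.33, independent of the depth chosen
```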


Control Hazard
When a branch is executed, it may or may not change the PC to something other than the incremented PC.

If a branch changes the PC to its target address, it is called a taken branch; if it falls through, it is not taken (untaken).

If instruction i is a taken branch, then the PC is normally not changed until the end of ID, after the completion of the address calculation and comparison.
Control Hazard Solution #1: Stall
[Figure: Add, Beq, then Load; Load's fetch is held for two cycles of lost potential until the branch outcome is known]

Stall: wait until the decision is clear
Impact: 2 lost cycles (i.e., 3 clock cycles per branch instruction) => slow

Move the decision to the end of decode by improving the hardware: saves 1 cycle per branch

If 20% of instructions are BEQ and all others have CPI 1, what is the average CPI?
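One way to work the question above, assuming each BEQ pays the full 2-cycle stall (3 cycles total), which the slide implies but does not state:

```python
branch_frac = 0.20
cpi_beq = 3  # 1 cycle plus the 2 lost cycles from the stall

# Weighted average over the instruction mix.
cpi = (1 - branch_frac) * 1 + branch_frac * cpi_beq
print(cpi)  # 1.4
```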
Control Hazard Solution #2: Predict
[Figure: Add, Beq, then Load fetched immediately after Beq; the predicted path continues with no stall]

Predict: guess one direction (taken/untaken), then back up if wrong

Impact: 0 lost cycles per branch instruction if right, 1 if wrong (right 50% of the time)
Need to “squash” and restart the following instruction if wrong
Produces a CPI on branches of (1 x 0.5 + 2 x 0.5) = 1.5
Total CPI might then be: 1.5 x 0.2 + 1 x 0.8 = 1.1 (20% branches)
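The two CPI numbers above can be verified with a short sketch:

```python
p_right = 0.5  # predictions correct 50% of the time

cpi_branch = 1 * p_right + 2 * (1 - p_right)  # 1.5 cycles per branch
cpi_total = 0.2 * cpi_branch + 0.8 * 1        # with 20% branches

print(cpi_branch)  # 1.5
print(cpi_total)   # ~1.1
```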
Control Hazard Solution #3: Delayed Branch
[Figure: Add, Beq, a 'Misc' instruction in the delay slot, then Load; no cycles are lost]

Delayed Branch: redefine branch behavior so the branch takes effect after the next instruction

Impact: 0 extra clock cycles per branch instruction if the compiler can find an instruction to put in the “slot” (about 50% of the time)
The longer the pipeline, the harder the slot is to fill
Used by the MIPS architecture
Scheduling Branch Delay Slots (Fig A.14)
A. From before branch:
    add $1,$2,$3
    if $2=0 then
        delay slot
becomes:
    if $2=0 then
        add $1,$2,$3

B. From branch target:
    sub $4,$5,$6
    ...
    add $1,$2,$3
    if $1=0 then
        delay slot
becomes:
    add $1,$2,$3
    if $1=0 then
        sub $4,$5,$6

C. From fall through:
    add $1,$2,$3
    if $1=0 then
        delay slot
    sub $4,$5,$6
becomes:
    add $1,$2,$3
    if $1=0 then
        sub $4,$5,$6

A is the best choice: it fills the delay slot and reduces instruction count (IC)
In B, the sub instruction may need to be copied, increasing IC
In B and C, it must be okay to execute sub when the branch goes the other way
More On Delayed Branch

Compiler effectiveness for a single branch delay slot:
Fills about 60% of branch delay slots
About 80% of instructions executed in branch delay slots are useful in computation
About 50% (60% x 80%) of slots are usefully filled
Evaluating Branch Alternatives
A simplified pipeline speedup equation for branches:

Pipeline speedup = Pipeline depth / (1 + Branch frequency x Branch penalty)

Assume that in a deeper pipeline, it takes at least three pipeline stages before the branch-target address is known and an additional cycle before the branch condition is evaluated.

Assume the branch mix is:
4% unconditional branches
6% conditional branches, untaken
10% conditional branches, taken
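The simplified speedup equation can be sketched as a function (the depth and penalty values below are illustrative, not taken from the slide):

```python
def pipeline_speedup(depth, branch_freq, branch_penalty):
    # Pipeline speedup = Pipeline depth / (1 + Branch frequency x Branch penalty)
    return depth / (1 + branch_freq * branch_penalty)

# Branch mix from the slide: 4% unconditional + 6% untaken + 10% taken.
freq = 0.04 + 0.06 + 0.10

# Illustrative: a 5-stage pipeline with an average 2-cycle branch penalty.
print(pipeline_speedup(depth=5, branch_freq=freq, branch_penalty=2))
```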
Data Hazard on r1
An instruction depends on the result of a previous instruction still in the pipeline:

add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or  r8,r1,r9
xor r10,r1,r11


Data Hazard on r1:
• Dependencies backwards in time are hazards

[Figure: add writes r1 in WB during cycle 5, but sub, and, and or read r1 in ID/RF during cycles 3-5, before it is written]
Data Hazard Solution:
• “Forward” the result from one stage to another

[Figure: add's ALU result is forwarded from the later pipeline stages directly to the ALU inputs of sub, and, and or]
• “or” is OK if register reads and writes in the same cycle are defined properly (write in the first half of the cycle, read in the second)
• Forwarding can’t prevent all data hazards! What about a lw followed by an R-type?
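The forwarding condition ("does the next instruction read a register the previous one writes?") can be sketched as follows; the tuple encoding is purely illustrative:

```python
def needs_forwarding(producer, consumer):
    """producer/consumer are (op, dest, src1, src2) tuples (illustrative).
    Forward when the consumer reads a register the producer will write."""
    _, dest, _, _ = producer
    _, _, s1, s2 = consumer
    return dest in (s1, s2)

add_i = ("add", "r1", "r2", "r3")
sub_i = ("sub", "r4", "r1", "r3")
print(needs_forwarding(add_i, sub_i))  # True: sub reads r1 produced by add
```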
Forwarding (or Bypassing): What about Loads?
• Dependencies backwards in time are hazards

[Figure: lw r1,0(r2) produces r1 at the end of its Mem stage, but sub r4,r1,r3 needs it at the start of its ALU stage one cycle earlier]

• Can’t solve with forwarding alone:
• Must delay/stall the instruction dependent on the load
[Figure: with a one-cycle stall inserted, sub's ALU stage lines up with the cycle after lw's Mem stage, and forwarding can then supply r1]
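The load-use check that forces this stall can be sketched similarly (tuple encoding is illustrative):

```python
def load_use_stall(producer, consumer):
    """True when a load's result is needed by the very next instruction:
    the loaded value is not available until after Mem, one cycle too late
    for the consumer's ALU stage, so a one-cycle stall is required."""
    op, dest = producer[0], producer[1]
    _, _, s1, s2 = consumer
    return op == "lw" and dest in (s1, s2)

lw_i  = ("lw", "r1", "0(r2)")
sub_i = ("sub", "r4", "r1", "r3")
print(load_use_stall(lw_i, sub_i))  # True: stall one cycle, then forward
```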
Software Scheduling to Avoid Load Hazards

Try producing fast code for
    a = b + c;
    d = e - f;
assuming a, b, c, d, e, and f are in memory.

Slow code:
    LW   Rb,b
    LW   Rc,c
    ADD  Ra,Rb,Rc
    SW   a,Ra
    LW   Re,e
    LW   Rf,f
    SUB  Rd,Re,Rf
    SW   d,Rd

Fast code:
    LW   Rb,b
    LW   Rc,c
    LW   Re,e
    ADD  Ra,Rb,Rc
    LW   Rf,f
    SW   a,Ra
    SUB  Rd,Re,Rf
    SW   d,Rd

The compiler optimizes for performance by reordering (scheduling) the instructions at compile time to avoid load-use stalls


Summary: Pipelining
What makes it easy?
all instructions are the same length
just a few instruction formats
memory operands appear only in loads and stores; memory addresses are aligned
What makes it hard?
structural hazards: suppose we had only one memory
control hazards: need to worry about branch instructions
data hazards: an instruction depends on a previous instruction

We’ll talk about modern processors and what really makes it hard:
trying to improve performance with out-of-order execution, etc.
Summary
Pipelining is a fundamental concept
multiple steps using distinct resources
Utilize the capabilities of the datapath with pipelined instruction processing
start the next instruction while working on the current one
limited by the length of the longest stage (plus fill/flush)
detect and resolve hazards
