Computer Architecture
ILP
Today's Agenda
We'll consider techniques to increase instruction-level
parallelism
o What limits ILP; how much we can expect to extract
o How to best exploit the available ILP
Two main techniques
o Hardware
o Software
Pipeline Review
Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls
Ideal pipeline CPI
o Maximum performance of the implementation
Structural Hazards
o H/w cannot support this combination of instructions
Data Hazards
o Instruction consumes a result not yet produced
Control Hazards
o Caused by time required for branch and jump resolution
ILP Example
Caravanning on a trip: the cars must stay in order so that no one gets lost
At the toll, everyone gets in the same lane to stay in order
This works... but it's slow: everyone has to wait for car D to get through the toll booth
Get two cars through at a time (in parallel)
ILP Basic Concept
Basic idea: overlapping the execution of unrelated instructions to
improve performance is known as instruction-level parallelism
Simple ILP recipe
o If instructions are independent, do them at the same time
o If not, do them one at a time
Two main techniques
1. Rely on hardware to help discover and exploit the parallelism dynamically (market winner: Intel Pentium series)
2. Rely on software technology to find parallelism statically at compile time (special niche markets: Intel Itanium)
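A minimal C sketch of the recipe (illustrative only; the variable names are made up): the first two statements are independent, so hardware or a compiler can overlap them, while the third depends on both and must wait.

int ilp_demo(int a, int b, int c, int d) {
    int x = a + b;   /* independent of the next statement         */
    int y = c * d;   /* can execute at the same time as x = a + b */
    int z = x - y;   /* depends on x and y: executed afterwards   */
    return z;
}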
Basic Instruction Block
A basic instruction block is a straight-line code sequence with no
branches in except at the entry point and no branches out except
at the exit point
o Example: the body of a loop (see the sketch below)
In typical integer code, the dynamic branch frequency is about 15%
(giving an average basic block size of about 7 instructions)
To obtain substantial performance enhancements, we must
exploit ILP across multiple basic blocks
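For a concrete (hypothetical) example in C, the body of the loop below is a basic block: control enters it only at the top and leaves it only at the bottom, with no branches in between.

/* The statements inside the loop body form one basic block. */
int sum_array(const int *a, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        int v = a[i];  /* single entry point at the top of the body   */
        sum += v;      /* single exit at the bottom, back to the test */
    }
    return sum;
}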
Major ILP Techniques
Loop-Level Parallelism
Exploit parallelism among iterations of a loop
Vector execution is one way
o Graphics, DSP, media apps
o Execute the same instructions on multiple data simultaneously
If not vector, then either
o Dynamic exploitation via branch prediction
o Static exploitation via loop unrolling (unrolling sketch below)
Turn LLP into ILP
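A minimal sketch of the static approach, loop unrolling, in C (illustrative; assumes n is a multiple of 4): the four additions in each unrolled iteration are independent of one another, so the loop-level parallelism becomes instruction-level parallelism the hardware can overlap, and the loop overhead (increment, compare, branch) is paid once per four elements instead of once per element.

/* Original loop: one add per iteration plus loop overhead each time. */
void add_scalar(float *x, float s, int n) {
    for (int i = 0; i < n; i++)
        x[i] = x[i] + s;
}

/* Unrolled by 4 (assumes n % 4 == 0): the four adds are independent,
 * so they can be scheduled and executed in parallel.                 */
void add_scalar_unrolled(float *x, float s, int n) {
    for (int i = 0; i < n; i += 4) {
        x[i]     = x[i]     + s;
        x[i + 1] = x[i + 1] + s;
        x[i + 2] = x[i + 2] + s;
        x[i + 3] = x[i + 3] + s;
    }
}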
Parallel & Dependent Instructions
Instructions are parallel if they can execute
simultaneously, regardless of pipeline depth
Dependent instructions
o Are not parallel
o Must be executed in order
o But may still be partially overlapped
Three types of dependence
o Data dependence (true data dependence)
o Name dependence
o Control dependence
Dependence & Hazards
Hazards: conflicts that arise in the instruction stream in a pipeline
Dependences are a property of the program
o Dependence => potential for a hazard
Three types of dependence
o Data dependence (true data dependence)
o Name dependence
o Control dependence
Data Dependence
Inst J is data dependent on Inst I if
o J uses a result produced by I (J reads an operand that I writes), or
o J is data dependent on Inst K, which is data dependent on I
True dependence (compiler term)
o Can cause a Read After Write (RAW) hazard (example below)
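A small C illustration (made-up variable names): the second statement reads x, which the first statement produces, so J is truly data dependent on I; in a pipeline this pair can turn into a RAW hazard if J tries to read x before I has written it.

int raw_example(int a, int b) {
    int x = a + b;   /* I: produces x                          */
    int y = x * 2;   /* J: consumes x -> true (RAW) dependence */
    return y;
}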
Name Dependence: Anti-dependence
Name dependence
o Two instructions use the same register or memory location (the "name")
o No actual flow of data between the instructions
Anti-dependence
o J writes an operand that I reads (J must not write it before I reads it)
o Can cause a Write After Read (WAR) hazard (example below)
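A small C illustration of an anti-dependence (made-up names): J writes x after I reads it, so no data flows from I to J; they merely reuse the name x, and reordering or badly overlapping them would create a WAR hazard.

int war_example(int x, int a) {
    int y = x + 1;   /* I: reads the old value of x           */
    x = a * 2;       /* J: writes x -> anti-dependence (WAR)  */
    return y + x;
}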
Name Dependence: Output dependence
J writes an operand that I also writes (the write order must be preserved)
Can cause a Write After Write (WAW) hazard
In case of a name dependence: change the name, remove the
dependence!
o Register renaming for register name dependences (renaming sketch below)
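A small C illustration (made-up names): both I and J write x, an output dependence, and renaming J's destination to a fresh variable x2 removes the dependence entirely, which is what hardware register renaming does for registers.

int waw_example(int a, int b) {
    int x = a + 1;   /* I: writes x                             */
    int y = x * 2;   /* reads the first value of x              */
    x = b + 3;       /* J: writes x again -> output dep. (WAW)  */
    return y + x;
}

/* After renaming the second definition, the output dependence is gone: */
int waw_renamed(int a, int b) {
    int x  = a + 1;
    int y  = x * 2;
    int x2 = b + 3;  /* renamed destination: no conflict with x */
    return y + x2;
}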
Control Dependence
Every instruction (except those in the very first basic block) is control
dependent on some set of branches
In general, these control dependences must be preserved to
preserve program order
For example (see the sketch below):
o S1 is control dependent on p1
o S2 is control dependent on p2 but not on p1
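The bullets above refer to a code shape like the following C sketch (p1, p2, S1, S2 are placeholders, not code from the lecture): S1 executes only when p1 is true, so it is control dependent on p1, while S2 depends only on p2.

void control_dep_example(int p1, int p2, int *s1_out, int *s2_out) {
    if (p1) {
        *s1_out = 1;   /* S1: control dependent on p1            */
    }
    if (p2) {
        *s2_out = 2;   /* S2: control dependent on p2, not on p1 */
    }
}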
THE END