CMP3010L02 Performance Datapath
Dina Tantawy
Computer Engineering Department
Cairo University
Agenda
• Recap
• Performance Fallacies and pitfalls
• Real stuff: Benchmarking the Intel Core i7
• What is pipelining?
• Characteristics of pipelining
Component Analysis
Programming language: X X
Compiler: X X
ISA: X X X
Core organization: X X
Technology: X
Is this true?
Computer Engineering, Cairo University
MIPS Example
Store 10% 3 .3 .3 .3 .3
Branch 20% 2 .4 .4 .2 .4
• Performance depends on
• Algorithm: affects IC, possibly CPI
• Programming language: affects IC, CPI
• Compiler: affects IC, CPI
• Instruction set architecture: affects IC, CPI, Tc
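The factors above combine in the usual relation CPU time = IC × CPI × Tc. A minimal sketch of that relation follows; the function name and all numbers are hypothetical and chosen only to show how a change in IC can be offset by a change in CPI.

```python
def cpu_time(instruction_count, cpi, clock_cycle_time):
    """CPU time = IC x CPI x Tc (in seconds when Tc is in seconds)."""
    return instruction_count * cpi * clock_cycle_time


# Hypothetical numbers, for illustration only.
base = cpu_time(1_000_000, 2.0, 1e-9)           # original program
better_compiler = cpu_time(800_000, 2.2, 1e-9)  # fewer instructions, higher CPI
speedup = base / better_compiler                # greater than 1 means faster
```

Because the three factors multiply, a compiler that lowers IC can still lose if it raises CPI by a larger proportion.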
$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
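Assuming the formula here is the geometric mean of n execution time ratios, as used to summarize SPEC results, it can be sketched as below. The function name and the sample ratios are hypothetical.

```python
import math


def geometric_mean(ratios):
    """n-th root of the product of n positive ratios, computed via logs."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical execution time ratios, for illustration only.
summary = geometric_mean([2.0, 8.0])
```

Summing logarithms instead of multiplying the ratios directly avoids overflow when many ratios are combined.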
CINT2006 for Intel Core i7 920
chips
[Figure: sequential laundry timeline; loads A, B, C, D in task order, each taking 30, 40, and 20 minutes]
Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would laundry take?
What Is Pipelining: Start work ASAP
[Figure: pipelined laundry timeline from 6 PM to midnight; loads A, B, C, D in task order, with stage times 30, 40, 40, 40, 40, 20 minutes]
• Pipelined laundry takes 3.5 hours for 4 loads
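The two laundry totals can be checked with a small simulation. This is a sketch under stated assumptions: the three stages take 30, 40, and 20 minutes, each stage handles one load at a time, and a finished load may wait for the next stage to become free. The function names are hypothetical.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Each load enters a stage as soon as both the load and the stage are free."""
    free_at = [0] * len(stage_minutes)  # time at which each stage becomes idle
    finish = 0
    for _ in range(loads):
        ready = 0  # time at which this load is ready for its next stage
        for s, duration in enumerate(stage_minutes):
            start = max(ready, free_at[s])
            ready = start + duration
            free_at[s] = ready
        finish = ready
    return finish


stages = [30, 40, 20]  # stage times in minutes
sequential = sequential_minutes(stages, 4)  # 360 minutes = 6 hours
pipelined = pipelined_minutes(stages, 4)    # 210 minutes = 3.5 hours
```

The pipelined total is 30 + 4 × 40 + 20 minutes: the slowest stage (40 minutes) sets the rate at which loads complete.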
• Usually instructions are few in number and are typically one size.
MIPS ISA
$18
[Figure: single-cycle MIPS datapath: PC, Instruction Memory, Register File, ALU, Data Memory, Sign Extend (16 to 32 bits), Shift left 2, two adders, ALU control, and a Control Unit driven by Instr[31-26] that generates PCSrc, ALUOp, Branch, MemRead, MemtoReg, MemWrite, ALUSrc, RegWrite, and RegDst]
R-type Instruction Data/Control Flow
[Figure: the single-cycle MIPS datapath and control signals for an R-type instruction]
Load Word Instruction Data/Control Flow
[Figure: the single-cycle MIPS datapath and control signals for a load word instruction]
Store Word Instruction?
Branch Instruction Data/Control Flow
[Figure: the single-cycle MIPS datapath and control signals for a branch instruction]
RISC Instruction Set Implementation
• We first need to look at how instructions in the MIPS instruction
set are implemented without pipelining. We’ll assume that any
instruction of the subset of MIPS can be executed in at most 5
clock cycles.
• The five clock cycles will be broken up into the following steps:
• Instruction Fetch Cycle
• Instruction Decode/Register Fetch Cycle
• Execution Cycle
• Memory Access Cycle
• Write-Back Cycle
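The five steps above can be sketched as an unpipelined schedule in which one instruction finishes all of its cycles before the next one starts. This is an illustration only: the short stage labels (IF, ID, EX, MEM, WB), the function name, and the simplification that every instruction uses all five cycles (the text says "at most 5") are assumptions.

```python
from enum import Enum


class Stage(Enum):
    IF = "Instruction Fetch"
    ID = "Instruction Decode/Register Fetch"
    EX = "Execution"
    MEM = "Memory Access"
    WB = "Write-Back"


def unpipelined_schedule(instructions):
    """Return (cycle, instruction, stage) tuples with no overlap between instructions."""
    schedule = []
    cycle = 1
    for instr in instructions:
        for stage in Stage:  # Enum iteration follows definition order
            schedule.append((cycle, instr, stage))
            cycle += 1
    return schedule


schedule = unpipelined_schedule(["lw", "add"])
total_cycles = schedule[-1][0]  # 5 cycles per instruction, so 10 here
```

Without pipelining the second instruction cannot begin its fetch until cycle 6, which is the idle time that pipelining recovers.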
[Figure: instruction decode/register fetch: the Instruction feeds the Control Unit and the Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data; outputs Read Data 1 and Read Data 2)]
Executing R Format Operations (IE)
[Figure: the Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data) supplies Read Data 1 and Read Data 2 to the ALU, which produces the result plus the overflow and zero flags]
[Figure: datapath fragments for memory access and branches: Sign Extend (16 to 32 bits), Shift left 2, adders for PC + 4 and the branch target address, ALU control, ALU overflow and zero outputs, and MemRead]
• If a load, the effective address computed in the previous cycle is used to read memory. The data is not transferred to the register until the next cycle.
• If a store, the data from the register is written to memory at the effective address.
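The two cases above can be sketched as a single memory access step. This is a simplified model, not the actual hardware: memory is a hypothetical dictionary keyed by address, and the function name and the "lw"/"sw" opcode strings are assumptions.

```python
def memory_access(op, effective_address, store_value, memory):
    """Model the memory access cycle; return the loaded word for a load, else None."""
    if op == "lw":
        # The word is read now but reaches the register only in write-back.
        return memory[effective_address]
    if op == "sw":
        memory[effective_address] = store_value
    return None  # other instructions do nothing in this cycle


memory = {100: 7}                                   # hypothetical contents
loaded = memory_access("lw", 100, None, memory)     # value held for write-back
memory_access("sw", 104, 9, memory)                 # store updates memory
idle = memory_access("add", 0, None, memory)        # R-type: no memory access
```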
[Figure: the Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data) and the ALU with its overflow and zero outputs]
[Figure: four instructions in instruction order, overlapped in the pipeline; each passes through Ifetch, Reg, ALU, DMem, Reg]
[Figure: program execution order (in instructions) against time, with axis marks from 2 to 18. Non-pipelined: lw $1, 100($0); lw $2, 200($0); lw $3, 300($0) each take 8 ns (Instruction fetch, Reg, ALU, Data access, Reg), so a new instruction starts every 8 ns. Pipelined: the same loads start 2 ns apart, and each of the five stages takes 2 ns.]
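Using the numbers from the timing diagram (8 ns per unpipelined lw, five pipeline stages of 2 ns each), the total time for the three loads can be worked out as follows. The function names are hypothetical, and the pipelined formula assumes an ideal pipeline with no stalls.

```python
def unpipelined_ns(instructions, instruction_ns):
    """Each instruction runs to completion before the next one starts."""
    return instructions * instruction_ns


def pipelined_ns(instructions, stages, stage_ns):
    """The first instruction fills the pipeline; each later one finishes one stage time after it."""
    return (stages + instructions - 1) * stage_ns


before = unpipelined_ns(3, 8)  # 3 x 8 ns = 24 ns
after = pipelined_ns(3, 5, 2)  # (5 + 3 - 1) x 2 ns = 14 ns
speedup = before / after
```

With only three instructions the speedup is well below the 4x ratio between 8 ns and 2 ns, because filling the pipeline dominates; as the instruction count grows, the ratio approaches 8 / 2 = 4.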
CPU pipelining: Example
Wrong register number