COMPUTER ORGANIZATION AND DESIGN
RISC-V Edition
The Hardware/Software Interface
Chapter 1-2
Performance Evaluation
Understanding Performance
◼ Algorithm
◼ Determines number of operations executed
◼ Programming language, compiler, architecture
◼ Determine number of machine instructions executed
per operation
◼ Processor and memory system
◼ Determine how fast instructions are executed
◼ I/O system (including OS)
◼ Determines how fast I/O operations are executed
Chapter 1 — Computer Abstractions and Technology — 2
§1.6 Performance
Defining Performance
◼ Which airplane has the best performance?
[Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 by passenger capacity, cruising range (miles), cruising speed (mph), and passengers × mph]
Response Time and Throughput
◼ Response time
◼ How long it takes to do a task
◼ Throughput
◼ Total work done per unit time
◼ e.g., tasks/transactions/… per hour
◼ How are response time and throughput affected
by
◼ Replacing the processor with a faster version?
◼ Adding more processors?
◼ We’ll focus on response time for now…
Relative Performance
◼ Define Performance = 1/Execution Time
◼ “X is n times faster than Y”
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$$
◼ Example: time taken to run a program
◼ 10s on A, 15s on B
◼ $\text{Execution Time}_B / \text{Execution Time}_A = 15\,\text{s} / 10\,\text{s} = 1.5$
◼ So A is 1.5 times faster than B
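The ratio above can be checked with a few lines of Python. This is only a sketch: the function name `relative_performance` is an illustrative choice, not something from the slides.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that X is n times faster than Y.

    Performance is defined as 1 / execution time, so
    Performance_X / Performance_Y = time_y / time_x.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Slide example: the program takes 10 s on A and 15 s on B.
n = relative_performance(time_x=10.0, time_y=15.0)
print(f"A is {n} times faster than B")  # A is 1.5 times faster than B
```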
Measuring Execution Time
◼ Elapsed time
◼ Total response time, including all aspects
◼ Processing, I/O, OS overhead, idle time
◼ Determines system performance
◼ CPU time
◼ Time spent processing a given job
◼ Discounts I/O time, other jobs’ shares
◼ Comprises user CPU time and system CPU
time
◼ Different programs are affected differently by
CPU and system performance
CPU Clocking
◼ Operation of digital hardware governed by a
constant-rate clock
[Figure: clock timing diagram marking the clock period; within each cycle, data transfer and computation are followed by a state update]
◼ Clock period: duration of a clock cycle
◼ e.g., 250 ps = 0.25 ns = $250 \times 10^{-12}$ s
◼ Clock frequency (rate): cycles per second
◼ e.g., 4.0 GHz = 4000 MHz = $4.0 \times 10^{9}$ Hz
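Clock period and clock rate are reciprocals, so the two examples on this slide describe the same clock. A minimal sketch (the helper name `clock_rate_hz` is hypothetical):

```python
def clock_rate_hz(period_s: float) -> float:
    """Clock rate in Hz is the reciprocal of the clock period in seconds."""
    if period_s <= 0:
        raise ValueError("clock period must be positive")
    return 1.0 / period_s


# 250 ps period from the slide.
rate = clock_rate_hz(250e-12)
print(f"{rate / 1e9:.1f} GHz")  # 4.0 GHz
```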
CPU Time
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
◼ Performance improved by
◼ Reducing number of clock cycles
◼ Increasing clock rate
◼ Hardware designer must often trade off clock
rate against cycle count
CPU Time Example
◼ Computer A: 2GHz clock, 10s CPU time
◼ Designing Computer B
◼ Aim for 6s CPU time
◼ A faster clock is possible, but it requires 1.2 × as many clock cycles
◼ How fast must Computer B clock be?
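$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$$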
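The same calculation can be sketched in Python. The function names are illustrative assumptions; the numbers are the ones given for Computers A and B.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """Clock Cycles = CPU Time x Clock Rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles: float, target_cpu_time_s: float) -> float:
    """Clock Rate = Clock Cycles / CPU Time."""
    return cycles / target_cpu_time_s


# Computer A: 2 GHz clock, 10 s CPU time.
cycles_a = clock_cycles(cpu_time_s=10.0, clock_rate_hz=2e9)

# Computer B needs 1.2x as many cycles and must finish in 6 s.
cycles_b = 1.2 * cycles_a
rate_b = required_clock_rate(cycles_b, target_cpu_time_s=6.0)
print(f"{rate_b / 1e9:.1f} GHz")  # 4.0 GHz
```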
Instruction Count and CPI
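$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$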
◼ Instruction Count for a program
◼ Determined by program, ISA and compiler
◼ Average cycles per instruction
◼ Determined by CPU hardware
◼ If different instructions have different CPI
◼ Average CPI affected by instruction mix
CPI Example
◼ Computer A: Cycle Time = 250ps, CPI = 2.0
◼ Computer B: Cycle Time = 500ps, CPI = 1.2
◼ Same ISA
◼ Which is faster, and by how much?
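$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps} \quad \text{(A is faster…)}$$

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2 \quad \text{(…by this much)}$$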
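A short Python sketch of this comparison follows. Because both machines run the same ISA, the instruction count I cancels in the ratio; the value `1_000_000` below is an arbitrary placeholder, not a figure from the slides.

```python
def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU Time = Instruction Count x CPI x Clock Cycle Time."""
    return instruction_count * cpi * cycle_time_s


I = 1_000_000  # hypothetical instruction count; it cancels in the ratio

time_a = cpu_time(I, cpi=2.0, cycle_time_s=250e-12)  # I x 500 ps
time_b = cpu_time(I, cpi=1.2, cycle_time_s=500e-12)  # I x 600 ps

# A has the smaller CPU time, so A is faster by this factor.
print(f"{time_b / time_a:.2f}")  # 1.20
```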
CPI in More Detail
◼ If different instruction classes take different
numbers of cycles
$$\text{Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{Instruction Count}_i\right)$$
◼ Weighted average CPI
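$$\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$$

Here $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$.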
CPI Example
◼ Alternative compiled code sequences using
instructions in classes A, B, C
Class               A   B   C
CPI for class       1   2   3
IC in sequence 1    2   1   2
IC in sequence 2    4   1   1
◼ Sequence 1: IC = 5
◼ Clock Cycles = 2×1 + 1×2 + 2×3 = 10
◼ Avg. CPI = 10/5 = 2.0
◼ Sequence 2: IC = 6
◼ Clock Cycles = 4×1 + 1×2 + 1×3 = 9
◼ Avg. CPI = 9/6 = 1.5
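The two sequences can be recomputed with a small Python sketch of the weighted-average CPI formula. The dictionary layout and function names are assumptions made for illustration.

```python
# CPI for each instruction class, taken from the table.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts: dict[str, int]) -> int:
    """Clock Cycles = sum over classes of CPI_i x Instruction Count_i."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict[str, int]) -> float:
    """Average CPI = Clock Cycles / total Instruction Count."""
    return total_cycles(counts) / sum(counts.values())


seq1 = {"A": 2, "B": 1, "C": 2}  # IC = 5
seq2 = {"A": 4, "B": 1, "C": 1}  # IC = 6

print(total_cycles(seq1), average_cpi(seq1))  # 10 2.0
print(total_cycles(seq2), average_cpi(seq2))  # 9 1.5
```

Sequence 2 executes more instructions but needs fewer cycles, which is why instruction count alone is not a reliable measure.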
Performance Summary
The BIG Picture
$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$
◼ Performance depends on
◼ Algorithm: affects IC, possibly CPI
◼ Programming language: affects IC, CPI
◼ Compiler: affects IC, CPI
◼ Instruction set architecture: affects IC, CPI, and clock cycle time (Tc)
§1.7 The Power Wall
Power Trends
◼ In CMOS IC technology
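$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

(Annotations on the terms: power ×30; voltage 5 V → 1 V; frequency ×1000)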
Reducing Power
◼ Suppose a new CPU has
◼ 85% of capacitive load of old CPU
◼ 15% voltage and 15% frequency reduction
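$$\frac{P_{\text{new}}}{P_{\text{old}}} = \frac{C_{\text{old}} \times 0.85 \times (V_{\text{old}} \times 0.85)^2 \times F_{\text{old}} \times 0.85}{C_{\text{old}} \times V_{\text{old}}^2 \times F_{\text{old}}} = 0.85^4 = 0.52$$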
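The ratio above can be reproduced with a minimal Python sketch of the power relation on the previous slide. The function name and parameter names are hypothetical; only the scale factors come from the example.

```python
def power_ratio(load_scale: float, voltage_scale: float, frequency_scale: float) -> float:
    """Ratio P_new / P_old when Power = C x V^2 x F.

    Each argument is the new value divided by the old value, so the
    old C, V, and F cancel and only the scale factors remain.
    """
    return load_scale * voltage_scale**2 * frequency_scale


# 85% of the capacitive load, 15% lower voltage, 15% lower frequency.
ratio = power_ratio(load_scale=0.85, voltage_scale=0.85, frequency_scale=0.85)
print(f"{ratio:.2f}")  # 0.52
```

Voltage enters squared, which is why lowering it was the most effective lever until it could not be reduced further.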
◼ The power wall
◼ We can’t reduce voltage further
◼ We can’t remove more heat
◼ How else can we improve performance?
§1.8 The Sea Change: The Switch to Multiprocessors
Uniprocessor Performance
Constrained by power, instruction-level parallelism,
memory latency
SPEC CPU Benchmark
◼ Programs used to measure performance
◼ Supposedly typical of actual workload
◼ Standard Performance Evaluation Corp (SPEC)
◼ Develops benchmarks for CPU, I/O, Web, …
◼ SPEC CPU2006
◼ Elapsed time to execute a selection of programs
◼ Negligible I/O, so focuses on CPU performance
◼ Normalize relative to reference machine
◼ Summarize as geometric mean of performance ratios
◼ CINT2006 (integer) and CFP2006 (floating-point)
$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$
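A geometric mean of this form might be computed as in the sketch below. The input ratios are made-up placeholders, not SPEC results, and summing logarithms is just one common way to avoid overflow in the product.

```python
import math


def geometric_mean(ratios: list[float]) -> float:
    """n-th root of the product of n execution time ratios."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    # exp(mean(log r)) equals the n-th root of the product.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical ratios (reference time / measured time) for two programs.
ratios = [2.0, 8.0]
print(round(geometric_mean(ratios), 6))  # 4.0
```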
CINT2006 for Intel Core i7 920
§1.10 Fallacies and Pitfalls
Pitfall: Amdahl’s Law
◼ Improving an aspect of a computer and
expecting a proportional improvement in
overall performance
$$T_{\text{improved}} = \frac{T_{\text{affected}}}{\text{improvement factor}} + T_{\text{unaffected}}$$
◼ Example: multiply accounts for 80s/100s
◼ How much improvement in multiply performance to
get 5× overall?
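$$20 = \frac{80}{n} + 20$$
◼ Can’t be done! A 5× overall speedup needs a total time of 20s, which would require 80/n = 0, and no finite n satisfies that.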
◼ Corollary: make the common case fast
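The multiply example can be explored numerically with the sketch below. The function names are illustrative; the 80 s and 20 s figures are the ones from the example.

```python
def improved_time(t_affected: float, t_unaffected: float, factor: float) -> float:
    """T_improved = T_affected / improvement factor + T_unaffected."""
    return t_affected / factor + t_unaffected


def overall_speedup(t_affected: float, t_unaffected: float, factor: float) -> float:
    """Original total time divided by the improved total time."""
    original = t_affected + t_unaffected
    return original / improved_time(t_affected, t_unaffected, factor)


# Multiply takes 80 s of a 100 s run; try ever larger improvement factors.
for n in (2, 10, 100, 1_000_000):
    print(n, round(overall_speedup(80.0, 20.0, n), 3))
```

However large the factor, the unaffected 20 s remains, so the overall speedup approaches 5× without reaching it.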
Pitfall: MIPS as a Performance Metric
◼ MIPS: Millions of Instructions Per Second
◼ Doesn’t account for
◼ Differences in ISAs between computers
◼ Differences in complexity between instructions
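$$\text{MIPS} = \frac{\text{Instruction count}}{\text{Execution time} \times 10^{6}} = \frac{\text{Instruction count}}{\dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}} \times 10^{6}} = \frac{\text{Clock rate}}{\text{CPI} \times 10^{6}}$$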
◼ CPI varies between programs on a given CPU
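To illustrate why MIPS shifts with CPI, here is a small sketch of the final form of the formula. The clock rate and the two CPI values are hypothetical inputs chosen for the example.

```python
def mips(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = Clock rate / (CPI x 10^6)."""
    if cpi <= 0:
        raise ValueError("CPI must be positive")
    return clock_rate_hz / (cpi * 1e6)


# Same hypothetical 4 GHz CPU running two programs with different CPI.
print(mips(4e9, cpi=2.0))  # 2000.0
print(mips(4e9, cpi=1.0))  # 4000.0
```

The same processor reports a different MIPS rating for each program, so a single MIPS figure says little about how fast a given workload will run.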