Lecture 3
Understanding & Measuring Performance
Reference: David A. Patterson & John L. Hennessy, Computer Organization and Design
Introduction
 Hardware performance is often key to the effectiveness of
  an entire system of hardware and software.
 Different performance metrics suit different types of
  applications, and different aspects of a computer system
  may be the most significant factor in determining overall
  performance.
 Understanding how best to measure performance, and the
  limits of performance, is important when selecting a
  computer system.
 Assessing performance means understanding:
     Why a piece of software performs as it does
     Why one instruction set can be implemented to perform better than another
     How a given hardware feature affects performance
Why measure performance?
 Performance is important!
 Identify HW/SW performance problems
 Comparisons:
   Which machine is faster?
   Which ISA is better?
   Which implementation (of an ISA) is faster?
 Expose significant performance issues (enable us to
  ignore unimportant issues)
More than one way to measure performance
 Different parties evaluate performance differently.
     Better performance can mean faster processing speed (e.g.
      faster completion of a task/job)
     Better performance can mean higher throughput (doing
      more jobs in a given time)
     Better performance can mean doing more jobs at a smaller
      cost
Which plane has better performance?
 The answer depends on what better performance means:
     Having a long range
     Higher throughput (transporting more passengers)
     Higher speed
Understanding terminology
 Execution time (a.k.a. response time): the total time it
  takes from start to completion of a task
 Throughput: the total number of tasks completed in a
  given time interval
 CPU execution time (a.k.a. CPU time): the actual time the
  CPU spends on a specific task
     User CPU time: time the CPU spends running the actual program
     System CPU time: time the CPU spends on OS overhead on behalf of
      the program
 Clock cycle (a.k.a. tick, cycle): one discrete time interval
  of the processor clock, which runs at a constant rate. Its
  duration is usually given in nanoseconds (ns) or picoseconds (ps)
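The difference between execution (response) time and CPU time can be observed on a running program. The following is a minimal sketch, not a rigorous benchmark: the helper names `measure` and `busy_then_sleep` are made up for illustration, and the exact numbers depend on the machine and operating system.

```python
import os
import time


def measure(task):
    """Run task() once; return (wall, cpu, user, system) times in seconds."""
    wall_start = time.perf_counter()   # elapsed (response) time
    cpu_start = time.process_time()    # user + system CPU time of this process
    times_start = os.times()           # user and system CPU time, reported separately
    task()
    times_end = os.times()
    cpu_end = time.process_time()
    wall_end = time.perf_counter()
    return (
        wall_end - wall_start,
        cpu_end - cpu_start,
        times_end.user - times_start.user,
        times_end.system - times_start.system,
    )


def busy_then_sleep():
    """Hypothetical workload: some computation, then some waiting."""
    sum(i * i for i in range(200_000))  # CPU-bound work: counts as CPU time
    time.sleep(0.05)                    # waiting: adds to response time only


wall, cpu, user, system = measure(busy_then_sleep)
print(f"response time: {wall:.4f} s")
print(f"CPU time:      {cpu:.4f} s (user {user:.4f} s, system {system:.4f} s)")
```

Because the workload sleeps, the response time is expected to come out noticeably larger than the CPU time.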
Understanding terminology (continued)
 Clock period (a.k.a. clock cycle time): the duration of one
  clock cycle, measured in seconds (in practice usually ns or ps)
 Clock rate (or frequency): the number of clock cycles
  per second, measured in Hz (in practice usually MHz or GHz)
     Frequency = 1 / clock period
     1 MHz represents 1 million (10^6) cycles per second
     1 GHz represents 1 thousand million (10^9) cycles per second
 Clock cycles per instruction (CPI): the average
  number of clock cycles each instruction takes to execute
Figure 1
(Annotation: 1 cycle time = the length of one clock cycle)
Figure 2
Common performance metrics
 MB/s, Mb/s: Megabytes, Megabits Per Second
 MIPS: Millions of Instructions Per Second
 CPI: Clock Cycles Per Instruction
 IPC: Instructions Per Clock cycle
 Hz: Hertz, cycles per second (processor clock frequency)
 LIPS: Logical Inferences Per Second
 FLOPS: Floating-Point arithmetic Operations Per Second
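Computer performance measures
 Performance is inversely related to execution time:
  $\text{Performance} = \dfrac{1}{\text{Execution time}}$
 To maximize performance, we want to minimize the execution time.
 If the performance of Computer A is 10 times better than that of Computer B,
  what is the relation between their execution times?
  $\dfrac{\text{Performance}_A}{\text{Performance}_B} = \dfrac{\text{Execution time}_B}{\text{Execution time}_A} = 10$
  This shows that Computer B needs 10 times as much time as
  Computer A to execute a given task.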
CPU Execution Time
 Clock rate = frequency (Hz): “the frequency at which a CPU is
  running. It is measured in Hz.”
 $\text{Clock period} = \dfrac{1}{\text{frequency}}$
 If a processor has a frequency of 320 MHz:
 $\text{Clock period} = \dfrac{1}{320\,000\,000\ \text{Hz}} = 3.125\ \text{ns}$
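The same conversion can be checked with a few lines of Python. This is only an illustrative sketch; the function name `clock_period_seconds` is made up for this example.

```python
def clock_period_seconds(frequency_hz: float) -> float:
    """Clock period is the reciprocal of the clock frequency."""
    return 1.0 / frequency_hz


# 320 MHz = 320 * 10^6 cycles per second
period_ns = clock_period_seconds(320e6) * 1e9  # convert seconds to nanoseconds
print(f"{period_ns:.3f} ns")  # prints "3.125 ns"
```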
Example 1: Improving Performance
 Our favorite program runs in 10 seconds on computer A,
  which has a 4 GHz clock. Computer B will run this program
  in 6 seconds, given that computer B requires 1.2 times as
  many clock cycles as computer A for this program. What is
  computer B’s clock rate?
What do we know?
  Computer A                               Computer B
  CPU execution time = 10 s                CPU execution time = 6 s
  Clock rate (CR) = 4 GHz = 4 x 10^9 Hz    Clock cycles (CC) = 1.2 x clock cycles of Computer A
Solution:
  $\text{CC}_A = 10\ \text{s} \times 4 \times 10^{9}\ \text{Hz} = 4 \times 10^{10}\ \text{cycles}$
  $\text{CR}_B = \dfrac{1.2 \times \text{CC}_A}{6\ \text{s}} = \dfrac{4.8 \times 10^{10}\ \text{cycles}}{6\ \text{s}} = 8 \times 10^{9}\ \text{Hz}$
Answer: 8 GHz
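The answer can be verified numerically. The sketch below assumes only the figures given in the example; the function name `required_clock_rate` and its parameter names are made up for illustration.

```python
def required_clock_rate(time_a_s, rate_a_hz, cycle_ratio, time_b_s):
    """Clock rate B needs, given A's time and rate and B's relative cycle count."""
    cycles_a = time_a_s * rate_a_hz    # clock cycles = execution time x clock rate
    cycles_b = cycle_ratio * cycles_a  # B needs cycle_ratio times as many cycles
    return cycles_b / time_b_s         # clock rate = clock cycles / execution time


rate_b_hz = required_clock_rate(time_a_s=10, rate_a_hz=4e9, cycle_ratio=1.2, time_b_s=6)
print(f"{rate_b_hz / 1e9:.1f} GHz")  # prints "8.0 GHz"
```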
Clock Cycles per Instruction (CPI)
 Previously, our calculations of Execution time did
  not include the number of instructions needed for
  the program.
 Different instructions may take different amounts of
  time to execute, depending on what they do
 Example: The MOV (Move) instruction – moving
  data from one place to another
The MOV instruction: Analogy
 Analyze Conrad’s movement of putting the red balls
  into the container. Doing 5 movements takes longer
  than doing 3.
Balls from prime storage:
-walk
-fetch ball
-walk (halfway)
-walk
-put ball in container
➔ Total = 5
Balls from sub storage:
-fetch ball
-walk
-put ball in container
➔ Total = 3
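CPU Execution Time
 $\text{CPU execution time} = \text{CPU clock cycles} \times \text{Clock cycle time}$
 $\text{CPU clock cycles} = \text{Instruction count} \times \text{CPI}$
 $\text{CPU execution time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$
 The number of instructions executed by the program is a.k.a. the instruction count.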
Example 2: Using Performance Equation
 Suppose we have two implementations of the same
  instruction set architecture (ISA), running the same
  program. Which computer is faster and by how much?
   Computer A: clock cycle time = 250 ps and CPI = 2.0
   Computer B: clock cycle time = 500 ps and CPI = 1.2
     Note: because both computers run the same program
      and the instruction count is not given, we treat it
      as a variable I.
 Remember the formula: CPU execution time = Instruction count x CPI x Clock cycle time
  $\text{CPU time}_A = I \times 2.0 \times 250\ \text{ps} = 500 \times I\ \text{ps}$
  $\text{CPU time}_B = I \times 1.2 \times 500\ \text{ps} = 600 \times I\ \text{ps}$
 Remember: the lower the execution time, the better the
  performance.
 Computer A is faster.
 How much faster is Computer A?
  $\dfrac{\text{Performance}_A}{\text{Performance}_B} = \dfrac{\text{CPU time}_B}{\text{CPU time}_A} = \dfrac{600 \times I\ \text{ps}}{500 \times I\ \text{ps}} = 1.2$
 We can conclude that A is 1.2 times faster than B for
  this program.
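The comparison can be reproduced in code. In this sketch the instruction count is a placeholder: the value 1,000,000 is an arbitrary assumption, and any positive value gives the same ratio because it cancels out. The function name `cpu_time_ps` is made up for illustration.

```python
def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ps


instruction_count = 1_000_000  # arbitrary placeholder for the unknown I

time_a_ps = cpu_time_ps(instruction_count, cpi=2.0, cycle_time_ps=250)
time_b_ps = cpu_time_ps(instruction_count, cpi=1.2, cycle_time_ps=500)

speedup = time_b_ps / time_a_ps  # how many times faster A is than B
print(f"A is {speedup:.1f} times faster than B")  # prints "A is 1.2 times faster than B"
```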
Measuring the CPI
 Sometimes it is possible to compute the CPU clock cycles by
  looking at the different types of instructions and using their
  individual clock cycle counts:
  $\text{CPU clock cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times C_i\right)$
     $C_i$ = number of instructions of class i executed
     $\text{CPI}_i$ = average number of cycles per instruction for instruction class i
     n = number of instruction classes
 Remember that the overall CPI for a program depends on both the
  number of cycles for each instruction type and the frequency of each
  instruction type in the program execution.
Sample: Calculate CPI
   You are on the design team for a new processor. The clock of the processor runs
    at 200 MHz. The following table gives instruction frequencies for Benchmark
    B, as well as how many cycles the instructions take, for the different classes of
    instructions. For this problem, we assume that (unlike many of today's
    computers) the processor only executes one instruction at a time.
                Instruction Type          Frequency   Cycles
                Loads & Stores               30%      6 cycles
                Arithmetic Instructions      50%      4 cycles
                All Others                   20%      3 cycles
    If we say that there are 100 instructions, then:
    ➢ 30 of them will be loads and stores.
    ➢ 50 of them will be arithmetic instructions.
    ➢ 20 of them will be all others.
    Total cycles: (30 * 6) + (50 * 4) + (20 * 3) = 440 cycles
    CPI: 440 cycles / 100 instructions = 4.4 cycles per instruction
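The weighted average can also be computed directly from the instruction frequencies. This is a sketch under the figures given for Benchmark B; the function name `average_cpi` is made up, and the last step (average time per instruction at the stated 200 MHz clock) goes slightly beyond what the sample asks for.

```python
def average_cpi(mix):
    """mix: list of (fraction_of_instructions, cycles) pairs; fractions sum to 1."""
    return sum(fraction * cycles for fraction, cycles in mix)


benchmark_b = [
    (0.30, 6),  # loads and stores
    (0.50, 4),  # arithmetic instructions
    (0.20, 3),  # all others
]

cpi = average_cpi(benchmark_b)
print(f"CPI = {cpi:.1f}")  # prints "CPI = 4.4"

# At 200 MHz one cycle lasts 5 ns, so an average instruction takes CPI x 5 ns.
clock_rate_hz = 200e6
ns_per_instruction = cpi / clock_rate_hz * 1e9
print(f"{ns_per_instruction:.1f} ns per instruction")  # prints "22.0 ns per instruction"
```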
Factors Affecting the CPU Performance
Example 3: Comparing Code Segments
 A compiler designer is trying to decide between two code sequences for a particular
     computer. The hardware designers have supplied the following facts:
       Instruction class    A    B    C
       CPI                  1    2    3
 For a particular high-level-language statement, the compiler writer is considering
     two code sequences that require the following instruction counts:
       Code sequence        A    B    C
       1                    2    1    2
       2                    4    1    1
a)    Which code sequence executes the most instructions?
b)    Which will be faster?
c)    What is the CPI for each sequence?
Example 3: Part (a)
Sequence 1 executes 2 + 1 + 2 = 5 instructions
Sequence 2 executes 4 + 1 + 1 = 6 instructions
        Seq 2 executes THE MOST instructions
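Example 3: Part (b)
 Using the equation CPU clock cycles = sum over all classes of (CPI_i x C_i):
   Sequence 1: (2 x 1) + (1 x 2) + (2 x 3) = 10 cycles to execute 5 instructions
   Sequence 2: (4 x 1) + (1 x 2) + (1 x 3) = 9 cycles to execute 6 instructions
Seq 2 is FASTER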
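Example 3: Part (c)
 CPI = CPU clock cycles / Instruction count
   Sequence 1: 10 / 5 = 2.0
   Sequence 2: 9 / 6 = 1.5
 Code SEQ2 uses fewer clock cycles while executing more
  instructions, so it must have a lower CPI.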
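All three parts of Example 3 can be checked with one small helper. This is a sketch, not the lecture's own method: the per-class CPI values (1, 2 and 3 for classes A, B and C) are taken as an assumption here, chosen because they are consistent with the 10-cycle and 9-cycle totals of part (b). The names `analyze` and `CPI_BY_CLASS` are made up for illustration.

```python
# Assumed per-class CPI values (consistent with the totals in part (b)).
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def analyze(counts, cpi_by_class):
    """Return (instruction count, clock cycles, CPI) for one code sequence."""
    instructions = sum(counts.values())
    cycles = sum(n * cpi_by_class[cls] for cls, n in counts.items())
    return instructions, cycles, cycles / instructions


seq1 = analyze({"A": 2, "B": 1, "C": 2}, CPI_BY_CLASS)
seq2 = analyze({"A": 4, "B": 1, "C": 1}, CPI_BY_CLASS)

print(seq1)  # prints (5, 10, 2.0)
print(seq2)  # prints (6, 9, 1.5)
```

Sequence 2 executes more instructions yet needs fewer cycles, which is why its CPI is lower.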
Example 4: Comparing Code Segments
 A processor has 3 classes of instructions:
   Instruction   CPI   Count in SEQ1   Count in SEQ2   Clock cycles SEQ1   Clock cycles SEQ2
       A          1          5               3                 5                   3
       B          2          3               2                 6                   4
       C          5          1               2                 5                  10
     Total                9 ins.          7 ins.        16 clock cycles     17 clock cycles
 Which code sequence is faster?
   Code SEQ1 ➔ Takes 16 cycles to execute 9 instructions
   Code SEQ2 ➔ Takes 17 cycles to execute 7 instructions
                        Code SEQ1 is FASTER
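As a follow-up, the CPI of each sequence can be worked out from the totals. The short derivation below uses only the cycle and instruction counts given in the table; the rounded values are approximate.

```latex
\[
\text{CPI}_{\text{SEQ1}} = \frac{16\ \text{cycles}}{9\ \text{instructions}} \approx 1.78,
\qquad
\text{CPI}_{\text{SEQ2}} = \frac{17\ \text{cycles}}{7\ \text{instructions}} \approx 2.43
\]
```

SEQ1 executes more instructions but has the lower CPI, so instruction count alone does not decide which sequence is faster.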
Example 4a: Calculating average CPI
 The ADD instruction takes 1 clock cycle to execute, while
  the MUL instruction takes 3 clock cycles. If a program
  consists of 20 ADD and 10 MUL instructions, what is the
  average CPI?
What do we know? There are 2 instruction types:
    Instruction   Clock cycles   Instruction count
       ADD             1                20
       MUL             3                10
Solution:
    Total clock cycles = (20 x 1) + (10 x 3) = 50
    Total instructions = 20 + 10 = 30
    Average CPI = 50 / 30 ≈ 1.67
ACTIVITY
CPU X runs a program (code sequence) Y which consists of 100
instructions. Calculate and fill in the table below:
a) The CPI for each instruction class given below.
b) The execution time for each instruction class, given a clock
    cycle time of 0.25 milliseconds.
c) CPU X’s execution time
d) CPU X’s clock rate
Instruction   Instruction count   Clock cycles   a) CPI   b) Execution time
    A                20                3
    B                25                1
    C                10                2
    D                30                2
    E                10                3
    F                 5                4
Increasing the CPU Performance
 Decrease the clock cycle time (that is, raise the clock rate)
 Organize the datapath so that the CPI is lower
 Reduce the number of executed instructions
Example 5: Improve Performance
 Our favourite program runs in 20 seconds on Computer P,
  which has an 8 GHz clock. We are trying to help a computer
  designer build Computer Q that will run this program in 5
  seconds. The designer has determined that a substantial
  increase in the clock rate is possible, but this will affect the
  rest of the CPU design, causing Computer Q to require 1.5
  times as many clock cycles as Computer P for this program.
  What clock rate should we tell the designer to target?
What do we know?
  Computer P                               Computer Q
  CPU execution time = 20 s                CPU execution time = 5 s
  Clock rate (CR) = 8 GHz = 8 x 10^9 Hz    Clock cycles (CC) = 1.5 x clock cycles of Computer P
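One way to reason about the target is in terms of ratios: Q must finish in a quarter of the time while executing 1.5 times as many cycles, so its clock must be 4 x 1.5 = 6 times faster than P's. The sketch below works this out from the figures in the problem statement; the variable names are made up for illustration.

```python
time_p_s = 20.0    # Computer P's execution time
rate_p_hz = 8e9    # Computer P's clock rate (8 GHz)
time_q_s = 5.0     # target execution time for Computer Q
cycle_ratio = 1.5  # Q needs 1.5 times as many clock cycles as P

# clock rate = clock cycles / execution time, so the required rate scales with
# the cycle ratio and with the desired reduction in execution time.
rate_scale = cycle_ratio * (time_p_s / time_q_s)
rate_q_hz = rate_p_hz * rate_scale

print(f"Q needs a clock {rate_scale:.0f} times faster than P")  # prints "... 6 times faster than P"
print(f"Target clock rate: {rate_q_hz / 1e9:.0f} GHz")          # prints "Target clock rate: 48 GHz"
```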
Understanding the Units
 CPU execution time for a program = seconds per program (S/P)
 Clock cycles = clock cycles per program (C/P)
 Clock cycle time = seconds per clock cycle (S/C)
 Clock rate = clock cycles per second (C/S)
 Instruction count = instructions executed for the program (I/P)
 Clock cycles per instruction (CPI) = average number of clock
  cycles per instruction (C/I)
 In the performance equation the intermediate units cancel each
  other, leaving the unit of the result:
  $\dfrac{S}{P} = \dfrac{I}{P} \times \dfrac{C}{I} \times \dfrac{S}{C}$
Example:
   10 s = 20 cycles / clock rate
   Clock rate = 20 cycles / 10 s = 2 cycles per second = 2 Hz
   (1 Hz is 1 cycle per second)