Northwestern University
CompEng 361: Computer Architecture
                                      Fall 2023
                                Homework 1 - Solutions
   1. Consider three different processors P1, P2, and P3 executing the same instruction set.
      P1 has a 3 GHz clock rate and a CPI of 1.2. P2 has a 2.6 GHz clock rate and a CPI of
      1.0. P3 has a 4.25 GHz clock rate and has a CPI of 3.
           a. Which processor has the highest performance expressed in instructions per
              second?
Use the CPI equation:
        CPU Time = Instructions ⨉ CPI ⨉ Cycle Time
Instructions / CPU Time = 1 / (CPI ⨉ Cycle Time) = Clock Rate / CPI
P1 IPS = (3.0e9 cycles/sec) / (1.2 cycles/inst) = 2.5e9 insts/sec = 2.5 BIPS
P2 IPS = (2.6e9 cycles/sec) / (1.0 cycles/inst) =2.6e9 insts/sec =2.6 BIPS
P3 IPS = (4.25e9 cycles/sec) / (3.0 cycles/inst) = 1.42 e9 insts/sec = 1.42 BIPS
P2 is the fastest in terms of instructions per second
           b. If the processors each execute a program in 10 seconds, find the number of
              cycles and the number of instructions.
CPU Cycles / CPU Time = Clock Rate therefore CPU Cycles = Clock Rate ⨉ CPU Time
P1 Cycles = (3.0e9 cycles / sec) ⨉ (10 secs) = 3.0e10 cycles
P2 Cycles = (2.6e9 cycles / sec) ⨉ (10 secs) = 2.6e10 cycles
P3 Cycles = (4.25e9 cycles / sec) ⨉ (10 secs) = 4.25e10 cycles
Instructions = Instructions / Sec ⨉ CPU Time
P1 Instructions = (2.5 BIPS) (10 secs) = 25 billion instructions
P2 Instructions = (2.6 BIPS) (10 secs) = 26 billion instructions
P3 Instructions = (1.42 BIPS) (10 secs) = 14.2 billion instructions
           c. We are trying to reduce the execution time by 30% but this leads to an increase
              of 20% in the CPI. What clock rate should we have to get this time reduction?
(CPU Time new) / (CPU Time old) = (Instructions ⨉ CPI new ⨉ Cycle Time new) / (Instructions
⨉ CPI old ⨉ Cycle Time old)
since the instructions are the same in both cases (same program) we have some cancelation
and can reorganize as:
1 / (Cycle Time new) = (CPU Time old) / (CPU Time new) ⨉ (CPI new)/(CPI old) ⨉ 1 / (Cycle
Time old)
Clock Rate new = (CPU Time old) / (CPU Time new) ⨉ (CPI new)/(CPI old) ⨉ (Clock Rate old)
Clock Rate new = (1 / (1 - .3)) ⨉ (1.2 ⨉ CPI old) / ( CPI old) ⨉ (Clock Rate old)
Clock Rate new = 1.43 ⨉ 1.2 ⨉ (Clock Rate old) = 1.71 ⨉ (Clock Rate old)
P1 Clock Rate new = (1.71) (3 GHz) = 5.13 GHz
P2 Clock Rate new = (1.71) (2.6 GHz) = 4.45 GHz
P3 Clock Rate new = (1.71) (4.25 GHz) = 7.27 GHz
    2. Consider two different implementations of the same instruction set architecture. The
       instructions can be divided into four classes according to their CPI (class A, B, C, and
       D). P1 with a clock rate of 2.25 GHz and CPIs of 1, 2, 3, and 3, and P2 with a clock rate
       of 3.5 GHz and CPIs of 2, 2, 2, and 3. Given a program with a dynamic instruction count
       of 1.0E6 instructions divided into classes as follows: 10% class A, 20% class B, 50%
       class C, and 20% class D,
            a. Which implementation is faster?
        CPU Time = Instructions ⨉ CPI ⨉ Cycle Time
Instructions / CPU Time = 1 / (CPI ⨉ Cycle Time) = Clock Rate / CPI
P1 Instructions / Sec = (2.25 GHz) / ((.1)(1) + (.2)(2) + (.5)(3) + (.2)(3)) = 0.865 BIPS
P2 Instructions / Sec = (3.5 GHz) / ((.1)(2) + (.2)(2) + (.5)(2) + (.2)(3)) = 1.59 BIPS
P2 is faster!
            b. What is the global CPI for each implementation?
P1 CPI = ((.1)(1) + (.2)(2) + (.5)(3) + (.2)(3)) = 2.6 cycles/inst
P2 CPI = ((.1)(2) + (.2)(2) + (.5)(2) + (.2)(3)) = 2.2 cycles/inst
           c. Find the clock cycles required in both cases.
Cycles = Instructions ⨉ CPI
P1 Cycles = (10e6 insts)(2.6 cycles/inst ) = 2.6e6
P2 Cycles = (10e6 insts)(2.2 cycles/inst ) = 2.2e6
   3. Compilers can have a profound impact on the performance of an application. Assume
       that for a program, compiler A results in a dynamic instruction count of 1.15E9 and has
       an execution time of 1.15 s, while compiler B results in a dynamic instruction count of
       1.6E9 and an execution time of 1.75 s.
            a. Find the average CPI for each program given that the processor has a clock
               cycle time of 1 ns.
CPI = CPU Time / (Instructions ⨉ Cycle Time)
Compiler A CPI = (1.15)/((1.15e9)(1ns)) = 1 cycles / inst
Compiler B CPI = (1.75)(/(1.6e9)(1ns)) = 1.09 cycles / inst
            b. Assume the compiled programs run on two different processors. If the execution
               times on the two processors are the same, how much faster is the clock of the
               processor running compiler A’s code versus the clock of the processor running
               compiler B’s code?
CPU Time = (Insts A ⨉ CPI A) / Clock Rate A = (Insts B ⨉ CPI B) / Clock Rate B
(Clock Rate A)/(Clock Rate B) = (Insts A ⨉ CPI A) / (Insts B ⨉ CPI B)
= ((1.15e9 insts)(1 cpi)) / ((1.6e9 insts)(1.09 cpi)) = 0.624
            c. A new compiler is developed that uses only 6.0E8 instructions and has an
               average CPI of 1.1. What is the speedup of using this new compiler versus using
               compiler A or B on the original processor?
CPU Time = Instructions ⨉ CPI ⨉ Cycle Time = (6.0e9 insts)(1.1 cycles/inst)(1 ns)
= 0.66 secs
CPU A speedup = 1.15 secs / 0.66 secs = 1.74 speedup
CPU B speedup = 1.75 secs / 0.66 secs = 2.65 speedup
   4. Assume a 15 cm diameter wafer has a cost of 13.5 dollarbucks, contains 74 dies, and
      has 0.020 defects/cm2. Assume a 21 cm diameter wafer has a cost of 16 dollarbucks,
      contains 110 dies, and has 0.031 defects/cm2.
           a. Find the yield for both wafers.
Yield = 1 / (1 + (defects/area ⨉ die area/2))^2
Yield A = 1 / (1 + (0.02)(176.71 cm^2)(74)/2))^2 = 95.39%
Yield B = 1 / (1 + (0.031)(346.36 cm^2)(110)/2))^2 = 90.91%
             b. Find the cost per die for both wafers.
Die cost = wafer cost / (dies/wafer ⨉ die yield)
Die cost A = 13.5 / ((74)(0.9523)) = 0.19 dollarbucks
Die cost B = 16 / ((110)(0.9091)) = 0.16 dollarbucks
           c. If the number of dies per wafer is increased by 10% and the defects per area unit
              increases by 15%, find the die area and yield.
For A, we now have 81.4 dies/wafer and area per die = 2.18 cm^2
Yield A = 1/(1 + ((0.023)(2.18/2))^2 = 95.17%
For B, we now have 121 dies/wafer and area per die = 2.86 cm^2
Yield B = 1/(1 + (0.03565)(2.86/2))^2 = 90.53%
           d. Assume a fabrication process improves the yield from 0.92 to 0.95. Find the
              defects per area unit for each version of the technology given a die area of
              200mm2.
Solving the yield equation for defects/area gives us:
Defects/Area = ((Sqrt(1/Yield) - 1) ⨉ 2 )/ Die Area
OG Tech = (Sqrt(1 / 0.92) - 1) ⨉ 2) / (2 cm^2) = 0.0426 defects / cm^2
New Tech = (Sqrt( 1/ 0.95) - 1) ⨉ 2) / (2 cm^2) = 0.026 defects / cm^2
   5. For the following C statement, what is the corresponding RISC-V assembly code?
      Assume that the variables i and j are assigned to registers t1 and t2, respectively.
      Assume that the base address of the arrays A and B are in registers t3 and t4,
      respectively, and assume that the elements of both arrays are 4-byte words.
               B[8] = A[i-j];
               sub t5, t1, t2        ;;   t5    ←   i - j
               slli t5, t5, 2        ;;   t5    ←   4 ⨉ t5
               add t5, t5, t3        ;;   t5    ←   &A[i - j]
               lw t5, 0 (t5)         ;;   t5    ←   *(&A[i - j])
           sw t5, 32 (t4)        ;; B[8] ← A[i - j]
6. Translate the following C code to RISC-V. Assume that the variables i and j are
   assigned to registers t1 and t2, respectively. Assume that the base address of the
   arrays A and B are in registers t3 and t4, respectively, and assume that the elements of
   both arrays are 4-byte words.
           B[8] = A[i] - A[j];
           slli t5, t1, 2        ;;   t5 ← 4 ⨉ i
           slli t6, t2, 2        ;;   t6 ← 4 ⨉ j
           add t5, t5, t3        ;;   t5 ← &A[i]
           add t6, t6, t3        ;;   t6 ← &A[j]
           lw t5, 0 (t5)         ;;   t5 ← *(&A[i])
           lw t6, 0 (t6)         ;;   t6 ← *(A[j])
           sub t5, t5, t6        ;;   t5 ← A[i] - A[j]
           sw t5, 32 (t4)        ;;   B[8] ← A[i] - A[j]
7. Provide the type and RISC-V assembly language instruction for the following binary
   value:
   0000 1000 0000 0000 0000 0101 0001 0011two
           addi x10, x0, 128
8. Provide the binary representation of following RISC-V instruction:
   lw a0, 32(sp)
   a0 = x10 and sp = x2
   0000 0010 0000 0001 0010 0101 0000 0011