ICT1.003 – Computer Architecture
Chapter 1: Computer Abstraction & Performance
Exercises
1. Consider three different processors P1, P2, and P3 executing the same instruction set. P1
has a 3 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0. P3
has a 4.0 GHz clock rate and a CPI of 2.2.
a. Which processor has the highest performance expressed in instructions per second?
b. If the processors each execute a program in 10 seconds, find the number of cycles
and the number of instructions.
c. We are trying to reduce the execution time by 30%, but this leads to an increase of
20% in the CPI. What clock rate should we have to get this time reduction?
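The quantities in this exercise are tied together by the relation CPU time = instruction count × CPI / clock rate. The following is a minimal Python sketch of that relation, shown for P1 only; the function names are hypothetical and chosen for illustration, not taken from the course material.

```python
def instructions_per_second(clock_hz, cpi):
    # Instructions retired per second = clock rate / CPI.
    return clock_hz / cpi


def cycles_in(clock_hz, seconds):
    # Clock cycles elapsed in the given time.
    return clock_hz * seconds


def instructions_in(clock_hz, cpi, seconds):
    # Instructions executed in the given time.
    return cycles_in(clock_hz, seconds) / cpi


def required_clock_rate(old_clock_hz, cpi_factor, time_factor):
    # With the instruction count fixed, time = IC * CPI / clock, so
    # new_clock = old_clock * (CPI scaling) / (time scaling).
    return old_clock_hz * cpi_factor / time_factor


# P1 from the exercise: 3 GHz, CPI 1.5.
p1_clock, p1_cpi = 3e9, 1.5
print(instructions_per_second(p1_clock, p1_cpi))
print(cycles_in(p1_clock, 10), instructions_in(p1_clock, p1_cpi, 10))
# CPI up by 20% (factor 1.2), time down by 30% (factor 0.7).
print(required_clock_rate(p1_clock, 1.2, 0.7))
```

The same calls can be repeated with the clock rate and CPI of P2 and P3.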
2. Consider two different implementations of the same instruction set architecture. The
instructions can be divided into four classes according to their CPI (classes A, B, C, and
D). P1 has a clock rate of 2.5 GHz and CPIs of 1, 2, 3, and 3; P2 has a clock rate of 3
GHz and CPIs of 2, 2, 2, and 2.
Given a program with a dynamic instruction count of 1.0E6 instructions divided into
classes as follows: 10% class A, 20% class B, 50% class C, and 20% class D, which is faster:
P1 or P2?
a. What is the global CPI for each implementation?
b. Find the clock cycles required in both cases.
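The global CPI here is the average of the class CPIs weighted by the instruction mix, and the cycle count is the instruction count times that average. A short sketch under that reading follows; the helper names are illustrative assumptions.

```python
def global_cpi(mix, class_cpis):
    # Weighted average: sum of (fraction of instructions) * (CPI of class).
    return sum(fraction * cpi for fraction, cpi in zip(mix, class_cpis))


def clock_cycles(instruction_count, cpi):
    return instruction_count * cpi


def cpu_time(instruction_count, cpi, clock_hz):
    return clock_cycles(instruction_count, cpi) / clock_hz


MIX = [0.10, 0.20, 0.50, 0.20]   # classes A, B, C, D
INSTRUCTIONS = 1.0e6

for name, cpis, clock_hz in [("P1", [1, 2, 3, 3], 2.5e9),
                             ("P2", [2, 2, 2, 2], 3.0e9)]:
    cpi = global_cpi(MIX, cpis)
    print(name, cpi, clock_cycles(INSTRUCTIONS, cpi),
          cpu_time(INSTRUCTIONS, cpi, clock_hz))
```

Comparing the printed CPU times answers the "which is faster" question directly.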
3. Compilers can have a profound impact on the performance of an application. Assume
that for a program, compiler A results in a dynamic instruction count of 1.0E9 and has an
execution time of 1.1 s, while compiler B results in a dynamic instruction count of 1.2E9
and an execution time of 1.5 s.
a. Find the average CPI for each program given that the processor has a clock cycle time
of 1 ns.
b. Assume the compiled programs run on two different processors. If the execution
times on the two processors are the same, how much faster is the clock of the
processor running compiler A’s code versus the clock of the processor running
compiler B’s code?
c. A new compiler is developed that uses only 6.0E8 instructions and has an average
CPI of 1.1. What is the speedup of using this new compiler versus using compiler A or
B on the original processor?
4. Assume for arithmetic, load/store, and branch instructions, a processor has CPIs of 1, 12,
and 5, respectively. Also assume that on a single processor a program requires the
execution of 2.56E9 arithmetic instructions, 1.28E9 load/store instructions, and 256
million branch instructions. Assume that each processor has a 2 GHz clock frequency.
Assume that, as the program is parallelized to run over multiple cores, the number of
arithmetic and load/store instructions per processor is divided by 0.7 × p (where p is the
number of processors) but the number of branch instructions per processor remains the
same.
a. Find the total execution time for this program on 1, 2, 4, and 8 processors, and show
the relative speedup of the 2, 4, and 8 processors result relative to the single
processor result.
b. If the CPI of the arithmetic instructions was doubled, what would the impact be on
the execution time of the program on 1, 2, 4, or 8 processors?
c. To what should the CPI of load/store instructions be reduced in order for a single
processor to match the performance of four processors using the original CPI values?
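The scaling rule in this exercise can be sketched in a few lines of Python. The sketch assumes that the division by 0.7 × p applies only when p > 1 (the single-processor run uses the full instruction counts given); the function name and keyword parameters are hypothetical.

```python
CLOCK_HZ = 2e9
ARITH, LOAD_STORE, BRANCH = 2.56e9, 1.28e9, 256e6


def execution_time(p, cpi_arith=1, cpi_ls=12, cpi_branch=5):
    # Assumption: arithmetic and load/store counts are divided by 0.7 * p
    # only for p > 1; branch instructions are never divided.
    scale = 1.0 if p == 1 else 0.7 * p
    cycles = (ARITH * cpi_arith + LOAD_STORE * cpi_ls) / scale \
        + BRANCH * cpi_branch
    return cycles / CLOCK_HZ


base = execution_time(1)
for p in (1, 2, 4, 8):
    t = execution_time(p)
    print(p, t, base / t)   # processors, time in seconds, relative speedup
```

Passing `cpi_arith=2` or a smaller `cpi_ls` lets the same function be reused for parts (b) and (c).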
5. Suppose that we are developing a new version of the AMD Barcelona processor with a 4
GHz clock rate. We have added some additional instructions to the instruction set in such
a way that the number of instructions has been reduced by 15%. The execution time is
reduced to 700 s and the new SPEC ratio is 13.7.
a. Find the new CPI.
b. This CPI value is larger than obtained in (a) as the clock rate was increased from 3 GHz
to 4 GHz. Determine whether the increase in the CPI is similar to that of the clock rate.
If they are dissimilar, why?
c. By how much has the CPU time been reduced?
d. For a second benchmark, libquantum, assume an execution time of 960 ns, CPI of 1.61,
and clock rate of 3 GHz. If the execution time is reduced by an additional 10% without
affecting the CPI and with a clock rate of 4 GHz, determine the number of
instructions.
e. Determine the clock rate required to give a further 10% reduction in CPU time while
maintaining the number of instructions and with the CPI unchanged.
f. Determine the clock rate if the CPI is reduced by 15% and the CPU time by 20% while
the number of instructions is unchanged.
6. Consider the following two processors. P1 has a clock rate of 4 GHz, average CPI of 0.9,
and requires the execution of 5.0E9 instructions. P2 has a clock rate of 3 GHz, an average
CPI of 0.75, and requires the execution of 1.0E9 instructions.
a. One usual fallacy is to consider the computer with the largest clock rate as having
the highest performance. Check if this is true for P1 and P2.
b. Another fallacy is to consider that the processor executing the largest number of
instructions will need a larger CPU time. Considering that processor P1 is executing a
sequence of 1.0E9 instructions and that the CPI of processors P1 and P2 do not
change, determine the number of instructions that P2 can execute in the same time
that P1 needs to execute 1.0E9 instructions.
c. A common fallacy is to use MIPS (millions of instructions per second) to compare the
performance of two different processors, and consider that the processor with the
largest MIPS has the largest performance. Check if this is true for P1 and P2.
d. Another common performance figure is MFLOPS (millions of floating-point
operations per second), defined as
MFLOPS = No. FP operations / (execution time × 1,000,000)
but this figure has the same problems as MIPS. Assume that 40% of the instructions
executed on both P1 and P2 are floating-point instructions. Find the MFLOPS figures
for the processors.
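MIPS and MFLOPS are both rates computed from a count and an execution time, so they can be sketched with the same pattern. The code below assumes that each floating-point instruction counts as exactly one floating-point operation, which the exercise does not state explicitly; the function names are illustrative.

```python
def cpu_time(instruction_count, cpi, clock_hz):
    return instruction_count * cpi / clock_hz


def mips(instruction_count, seconds):
    # Millions of instructions per second.
    return instruction_count / (seconds * 1e6)


def mflops(fp_operations, seconds):
    # MFLOPS = No. FP operations / (execution time * 1,000,000).
    return fp_operations / (seconds * 1e6)


FP_FRACTION = 0.40  # assumption: one FP operation per FP instruction

for name, clock_hz, cpi, count in [("P1", 4e9, 0.9, 5.0e9),
                                   ("P2", 3e9, 0.75, 1.0e9)]:
    t = cpu_time(count, cpi, clock_hz)
    print(name, t, mips(count, t), mflops(FP_FRACTION * count, t))
```

Printing the CPU time next to both rates makes it easy to see whether the processor with the larger MIPS figure is also the one that finishes first.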
7. Another pitfall is expecting to improve the overall performance of a computer by
improving only one aspect of the computer. Consider a computer running a program
that requires 250 s, with 70 s spent executing FP instructions, 85 s spent executing L/S
instructions, and 40 s spent executing branch instructions.
a. By how much is the total time reduced if the time for FP operations is reduced by
20%?
b. By how much is the time for INT operations reduced if the total time is reduced by
20%?
c. Can the total time be reduced by 20% by reducing only the time for branch
instructions?
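The reasoning behind this exercise is that speeding up one component only shrinks the share of time that component accounts for. A minimal sketch of that bookkeeping follows; the helper names are hypothetical, and the INT time is assumed to be whatever remains of the 250 s after the FP, L/S, and branch times are subtracted.

```python
def total_after_reducing(total_s, part_s, reduction):
    # New total time when one component's time shrinks by `reduction` (0..1).
    return total_s - part_s * reduction


def required_part_reduction(total_s, part_s, total_reduction):
    # Fraction by which one component must shrink so that the total
    # shrinks by `total_reduction`. A result above 1 means it is impossible.
    return total_s * total_reduction / part_s


TOTAL, FP, LS, BRANCH = 250.0, 70.0, 85.0, 40.0
INT = TOTAL - FP - LS - BRANCH   # assumption: the remainder is INT time

print(total_after_reducing(TOTAL, FP, 0.20))
print(required_part_reduction(TOTAL, INT, 0.20))
print(required_part_reduction(TOTAL, BRANCH, 0.20))
```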
8. Assume a program requires the execution of 50 × 10^6 FP instructions, 110 × 10^6 INT
instructions, 80 × 10^6 L/S instructions, and 16 × 10^6 branch instructions. The CPI for
each type of instruction is 1, 1, 4, and 2, respectively. Assume that the processor has a 2
GHz clock rate.
a. By how much must we improve the CPI of FP instructions if we want the program to
run two times faster?
b. By how much must we improve the CPI of L/S instructions if we want the program to
run two times faster?
c. By how much is the execution time of the program improved if the CPI of INT and FP
instructions is reduced by 40% and the CPI of L/S and Branch is reduced by 30%?
9. When a program is adapted to run on multiple processors in a multiprocessor system,
the execution time on each processor is comprised of computing time and the overhead
time required for locked critical sections and/or to send data from one processor to
another.
Assume a program requires t = 100 s of execution time on one processor. When run on p
processors, each processor requires t/p s, as well as an additional 4 s of overhead,
irrespective of the number of processors. Compute the per-processor execution time for
2, 4, 8, 16, 32, 64, and 128 processors. For each case, list the corresponding speedup
relative to a single processor and the ratio between actual speedup versus ideal speedup
(speedup if there was no overhead).
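The table asked for here follows from three expressions: per-processor time t/p + overhead, speedup t / (t/p + overhead), and ideal speedup p. A short sketch is given below, assuming the single-processor baseline is the plain 100 s with no overhead; the function names are illustrative.

```python
def per_processor_time(t, p, overhead):
    # Computing time split across p processors plus a fixed overhead.
    return t / p + overhead


def speedup(t, p, overhead):
    # Assumption: the single-processor baseline is t, with no overhead.
    return t / per_processor_time(t, p, overhead)


T, OVERHEAD = 100.0, 4.0

for p in (2, 4, 8, 16, 32, 64, 128):
    actual = speedup(T, p, OVERHEAD)
    ideal = float(p)   # speedup if there were no overhead
    print(p, per_processor_time(T, p, OVERHEAD), actual, actual / ideal)
```

Since the overhead is fixed, the per-processor time can never drop below 4 s, so the ratio of actual to ideal speedup keeps falling as p grows.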
10. Assume that for a given program 70% of the executed instructions are arithmetic, 10%
are load/store, and 20% are branch.
a. Given this instruction mix and the assumption that an arithmetic instruction requires
two cycles, a load/store instruction takes six cycles, and a branch instruction takes
three cycles, find the average CPI.
b. For a 25% improvement in performance, how many cycles, on average, may an
arithmetic instruction take if load/store and branch instructions are not improved at
all?
c. For a 50% improvement in performance, how many cycles, on average, may an
arithmetic instruction take if load/store and branch instructions are not improved at
all?
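Parts (b) and (c) invert the average-CPI formula: fix a target average CPI and solve for the cycles of one class. The sketch below assumes that an "x% improvement in performance" means the average CPI drops to (1 − x) of its baseline; if the intended meaning is a speedup of (1 + x), the target would instead be baseline / (1 + x). The function names are hypothetical.

```python
MIX = {"arith": 0.70, "load_store": 0.10, "branch": 0.20}
CYCLES = {"arith": 2, "load_store": 6, "branch": 3}


def average_cpi(mix, cycles):
    # Weighted average of per-class cycle counts.
    return sum(mix[k] * cycles[k] for k in mix)


def required_cycles(mix, cycles, target_cpi, cls):
    # Solve target = mix[cls] * x + (contribution of all other classes) for x.
    others = sum(mix[k] * cycles[k] for k in mix if k != cls)
    return (target_cpi - others) / mix[cls]


baseline = average_cpi(MIX, CYCLES)
print(baseline)

# Assumed interpretation: CPI falls to 75% of the baseline.
target = 0.75 * baseline
print(required_cycles(MIX, CYCLES, target, "arith"))
```

A required cycle count that comes out very small or negative would indicate that the target cannot realistically be met by improving arithmetic instructions alone.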