Lecture 36
COMPUTER ORGANIZATION AND ARCHITECTURE (COA)
EET 2211
4TH SEMESTER – CSE & CSIT
CHAPTER 18, LECTURE 36
By Ms. Arya Tripathy
MULTICORE COMPUTERS
vThe 1st hardware performance issue is INCREASE IN PARALLELISM AND COMPLEXITY
vThe organizational changes in processor design have primarily been focused on exploiting
ILP, so that more work is done in each clock cycle. These changes include, in chronological
order:
1. Pipelining: Individual instructions are executed through a pipeline of stages so that while
one instruction is executing in one stage of the pipeline, another instruction is executing in
another stage of the pipeline.
2. Superscalar: Multiple pipelines are constructed by replicating execution resources, so that
independent instructions can be executed in parallel pipelines.
3. Simultaneous multithreading (SMT): Register banks are replicated so that multiple threads
can share the use of the pipeline resources.
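As a rough illustration (not part of the slides), the benefit of pipelining can be quantified with the standard ideal-speedup formula: an unpipelined machine takes n·k cycles for n instructions on a k-stage pipeline's worth of work, while the pipeline takes k + (n − 1) cycles once filled. A minimal sketch, assuming no stalls or hazards:

```python
def pipeline_speedup(k: int, n: int) -> float:
    """Ideal speedup of a k-stage pipeline over n instructions:
    unpipelined time = n * k cycles; pipelined time = k + (n - 1) cycles."""
    return (n * k) / (k + n - 1)

# With many instructions, speedup approaches the stage count k.
print(round(pipeline_speedup(5, 1000), 2))  # ~4.98 for a 5-stage pipeline
```

This also shows why a single instruction (n = 1) gains nothing: the speedup is exactly 1.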
vWith each of these innovations, designers have over the years attempted to increase the performance
of the system by adding complexity.
vIn the case of pipelining, for example, simple three-stage pipelines were replaced by pipelines with
five stages.
vThere is a practical limit to how far this trend can be taken, because with more stages, there is the
need for more logic, more interconnections, and more control signals.
vSimilarly, with superscalar organization, increased performance can be achieved by increasing the
number of parallel pipelines.
vThe increase in complexity to deal with all of the logical issues related to very long pipelines,
multiple superscalar pipelines, and multiple SMT register banks means that increasing amounts of the
chip area are occupied with coordinating and signal transfer logic.
vThis increases the difficulty of designing, fabricating, and debugging the chips.
vIn general terms, the experience of recent decades has been encapsulated in a rule of thumb known
as Pollack's rule, which states that the performance increase is roughly proportional to the square
root of the increase in complexity.
vIn other words, if you double the logic in a processor core, then it delivers only about 40% more
performance.
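The arithmetic behind the "40%" figure can be checked directly: doubling complexity gives a predicted gain of √2 ≈ 1.41. A minimal sketch of Pollack's rule:

```python
import math

def pollack_speedup(complexity_ratio: float) -> float:
    """Performance gain predicted by Pollack's rule:
    proportional to the square root of the increase in complexity."""
    return math.sqrt(complexity_ratio)

# Doubling the logic in a core yields only ~41% more performance.
print(round(pollack_speedup(2.0), 2))  # 1.41
```

By contrast, quadrupling the logic would be needed just to double performance, which is why adding cores (near-linear scaling, for parallel software) became more attractive than growing a single core.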
vIn principle, the use of multiple cores has the potential to provide near-linear performance
improvement with the increase in the number of cores—but only for software that can take advantage.
üTo maintain the trend of higher performance as the number of transistors per chip rises, designers
have resorted to more elaborate processor designs (pipelining, superscalar, SMT) and to high clock
frequencies.
üUnfortunately, power requirements have grown exponentially as chip density and clock frequency
have risen.
üOne way to control power density is to use more of the chip area for cache memory.
üMemory transistors are smaller and have a power density an order of magnitude lower than that of
logic.
üPower considerations provide another motive for moving toward a multicore organization. Because
the chip has such a huge amount of cache memory, it becomes unlikely that any one thread of
execution can effectively use all that memory.
üEven with SMT, multithreading is done in a relatively limited fashion and cannot therefore fully
exploit a gigantic cache, whereas a number of relatively independent threads or processes has a greater
opportunity to take full advantage of the cache memory.
9 MULTICORE COMPUTERS 7/16/2021
SOFTWARE PERFORMANCE ISSUES
vThe potential performance benefits of a multicore organization depend on the ability
to effectively exploit the parallel resources available to the application.
vSpeedup = 1 / [(1 − f) + f/N], where f is the fraction of the code that can be parallelized
and N is the number of processors (Amdahl's law).
vBut as Figure (a) on the next slide shows, even a small amount of serial code has a
noticeable impact.
vIf only 10% of the code is inherently serial, running the program on a multicore
system with eight processors yields a performance gain of only a factor of 4.7.
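The factor of 4.7 quoted above follows directly from Amdahl's law. A minimal sketch verifying it:

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Amdahl's law: f is the parallelizable fraction of the code,
    n is the number of processors."""
    return 1.0 / ((1.0 - f) + f / n)

# 10% of the code is inherently serial, so f = 0.9; run on 8 processors:
print(round(amdahl_speedup(0.9, 8), 2))  # 4.71, not 8
```

Even with f = 0.9, the serial 10% dominates as n grows: the speedup can never exceed 1/0.1 = 10 no matter how many cores are added.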
vMany kinds of servers can also effectively use the parallel multicore organization, because
servers typically handle numerous relatively independent transactions in parallel.