Lecture-4 Parallel hardware-Jameel-NNL
Level/Prerequisites: None
Basic design
– Memory is used to store both program instructions and data
– Program instructions are coded data which tell the computer to do something
– Data is simply information to be used by the program
A central processing unit (CPU) fetches instructions and/or data from memory, decodes the instructions and then performs them sequentially.
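A minimal sketch of this fetch-decode-execute cycle in C, assuming a toy machine whose opcodes (LOAD, ADD, HALT) and memory layout are purely illustrative, not taken from any real instruction set:

    #include <stdio.h>

    /* Toy von Neumann machine: one memory array holds both
     * instructions and data; the CPU fetches, decodes, executes. */
    enum { HALT = 0, LOAD = 1, ADD = 2 };   /* illustrative opcodes */

    int main(void) {
        /* memory: (opcode, operand) pairs, followed by data words */
        int memory[] = { LOAD, 8, ADD, 9, HALT, 0, 0, 0, 5, 7 };
        int pc = 0, acc = 0, running = 1;

        while (running) {
            int op  = memory[pc];        /* fetch  */
            int arg = memory[pc + 1];
            pc += 2;
            switch (op) {                /* decode and execute */
            case LOAD: acc = memory[arg];  break;
            case ADD:  acc += memory[arg]; break;
            case HALT: running = 0;        break;
            }
        }
        printf("result = %d\n", acc);    /* prints 12 (5 + 7) */
        return 0;
    }

Note that the same memory array holds both the instructions (the first six words) and the data (the last two words), which is the defining feature of this stored-program design.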
Like everything else, parallel computing has its own "jargon". Some of the
more commonly used terms associated with parallel computing are listed
below. Most of these will be discussed in more detail later.
Task
– A logically discrete section of computational work. A task is typically a program or program-like set of instructions that is executed by a processor.
Parallel Task
– A task that can be executed by multiple processors safely (yields correct results)
Serial Execution
– Execution of a program sequentially, one statement at a time. In the simplest sense, this is what happens on a single-processor machine. However, virtually all parallel programs have sections that must be executed serially (the sketch after this list contrasts serial and parallel sections in one program).
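A minimal sketch, assuming POSIX threads, of a program that mixes serial and parallel sections: initialisation and the final combination of results run serially, while the summation is split into parallel tasks, one per thread. The array size, thread count, and function names are illustrative:

    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    #define NTHREADS 4

    static double a[N];
    static double partial[NTHREADS];

    /* Parallel task: each thread sums its own slice of the array. */
    static void *sum_slice(void *arg) {
        long t = (long)arg;
        long lo = t * (N / NTHREADS), hi = lo + N / NTHREADS;
        double s = 0.0;
        for (long i = lo; i < hi; i++) s += a[i];
        partial[t] = s;
        return NULL;
    }

    int main(void) {
        /* Serial section: initialisation done by one thread. */
        for (long i = 0; i < N; i++) a[i] = 1.0;

        pthread_t tid[NTHREADS];
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, sum_slice, (void *)t);
        for (long t = 0; t < NTHREADS; t++)
            pthread_join(tid[t], NULL);

        /* Serial section again: combining the partial results. */
        double total = 0.0;
        for (int t = 0; t < NTHREADS; t++) total += partial[t];
        printf("total = %f\n", total);
        return 0;
    }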
Shared Memory
Distributed Memory
Hybrid Distributed-Shared Memory
Shared Memory Advantages
– Global address space provides a user-friendly programming perspective to memory
– Data sharing between tasks is both fast and uniform due to the proximity of memory to CPUs
Shared Memory Disadvantages:
– Primary disadvantage is the lack of scalability between memory and CPUs. Adding more CPUs can geometrically increase traffic on the shared memory-CPU path and, for cache coherent systems, geometrically increase traffic associated with cache/memory management.
– The programmer is responsible for the synchronization constructs that ensure "correct" access of global memory (see the sketch after this list).
– Expense: it becomes increasingly difficult and expensive to design and produce shared memory machines with ever increasing numbers of processors.
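A minimal sketch, assuming POSIX threads, of the kind of synchronization construct the programmer must supply on a shared memory machine: two threads increment one global counter, and without the mutex some increments would be lost to interleaved updates. The counter name and iteration count are illustrative:

    #include <pthread.h>
    #include <stdio.h>

    /* Shared (global address space) variable visible to all threads. */
    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *work(void *arg) {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);    /* serialise the update ...    */
            counter++;                    /* ... so no increment is lost */
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, work, NULL);
        pthread_create(&t2, NULL, work, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("counter = %ld\n", counter);   /* 200000 with the lock */
        return 0;
    }

With the lock the program always prints 200000; removing the two mutex calls makes the result unpredictable, which is exactly the "correct access of global memory" problem described above.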
Distributed Memory Advantages
– Memory is scalable with the number of processors. Increase the number of processors and the size of memory increases proportionately.
– Each processor can rapidly access its own memory without interference and without the overhead incurred in trying to maintain cache coherency.
– Cost effectiveness: can use commodity, off-the-shelf processors and networking.
Distributed Memory Disadvantages
– The programmer is responsible for many of the details associated with data communication between processors (see the sketch after this list).
– It may be difficult to map existing data structures, based on global memory, to this memory organization.
– Non-uniform memory access times: data residing on a remote node takes longer to access than node-local data.
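A minimal sketch, assuming MPI, of the explicit communication a distributed memory model requires: each process computes a partial sum in its own local memory, and data only moves between processes because MPI_Reduce is called explicitly. The loop bounds and the choice of reduction are illustrative:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each process computes on data in its own local memory. */
        long local = 0;
        for (long i = rank; i < 1000000; i += size) local += i;

        /* Data movement is explicit: combine partial sums on rank 0. */
        long total = 0;
        MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) printf("total = %ld\n", total);

        MPI_Finalize();
        return 0;
    }

Nothing here is shared between processes; rank 0 only sees the other processes' results once the reduce completes, which is the programmer-managed communication referred to above.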