Cloud Computing
Session 3
Principles of Parallel and Distributed Computing
Recap
First Generation
Second Generation
Third Generation
Fourth Generation
Session 3 - contents
Milestones in Cloud Computing Evolution
◦ 1951: UNIVAC I, the first mainframe
◦ 1960: Cray's first supercomputer
◦ 1966: Flynn's taxonomy (SISD, SIMD, MISD, MIMD)
◦ 1969: ARPANET
◦ 1970: DARPA's TCP/IP
◦ 1975: Xerox PARC invents Ethernet
◦ 1984: IEEE 802.3 (Ethernet & LAN)
◦ 1984: DEC's VMScluster
◦ 1989: TCP/IP, IETF RFC 1122
◦ 1990: Berners-Lee and Cailliau's WWW, HTTP, HTML
◦ 1997: IEEE 802.11 (Wi-Fi)
◦ 1999: Grid computing
◦ 2004: Web 2.0
◦ 2005: Amazon AWS (EC2, S3)
◦ 2007: Manjrasoft Aneka
◦ 2008: Google AppEngine
◦ 2010: Microsoft Azure
(The timeline spans four eras: mainframes, clusters, grids, and clouds.)
Elements of Parallel Computing
Silicon-based processor chips are reaching their physical limits: processing speed is constrained by the speed of light, and the density of transistors packed into a processor is constrained by thermodynamic limitations.
A viable solution to overcome this limitation is to connect multiple
processors working in coordination with each other to solve
“Grand Challenge” problems.
The first step in this direction led
◦ to the development of parallel computing, which encompasses techniques, architectures, and systems for performing multiple activities in parallel.
◦ As discussed earlier, the term parallel computing has blurred its edges with the term distributed computing and is often used in place of the latter term.
What is Parallel Processing?
Processing of multiple tasks simultaneously on multiple processors is called parallel
processing.
A parallel program consists of multiple active processes (tasks) simultaneously solving a given problem.
A given task is divided into multiple subtasks using a divide-and-conquer technique, and each subtask is processed on a different central processing unit (CPU).
Programming on a multiprocessor system using the divide-and-conquer technique is called parallel programming.
Many applications today require more computing power than a traditional sequential
computer can offer.
Parallel processing provides a cost-effective solution to this problem by increasing the number of CPUs in a computer and by adding an efficient communication system between them.
The workload can then be shared between different processors. This setup results in higher computing power and performance than a single-processor system offers.
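As a minimal sketch of the divide-and-conquer approach described above (the workload, chunk size, and helper names are illustrative assumptions, not from the slides), a sum over a large list can be split into subtasks that run on separate CPUs using Python's standard multiprocessing module:

from multiprocessing import Pool

def partial_sum(chunk):
    # Subtask: each worker process sums its own slice of the data.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    chunk_size = len(data) // n_workers
    # Divide: split the task into independent subtasks (data chunks).
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with Pool(processes=n_workers) as pool:
        # Conquer: each chunk is processed on a different CPU in parallel.
        partials = pool.map(partial_sum, chunks)
    # Combine: merge the partial results into the final answer.
    print("total =", sum(partials))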
Parallel Processing influencing factors
The development of parallel processing is being influenced by many
factors. The prominent among them include the following:
◦ Computational requirements are ever increasing in the areas of both scientific and
business computing. The technical computing problems, which require high-speed
computational power, are related to
life sciences, aerospace, geographical information systems, mechanical design and analysis etc.
◦ Sequential architectures are reaching physical limitations as they are constrained by the speed of light and the laws of thermodynamics.
The speed at which sequential CPUs can operate is reaching a saturation point (no more vertical growth); hence, an alternative way to get high computational speed is to connect multiple CPUs (an opportunity for horizontal growth).
◦ Hardware improvements in pipelining, superscalar execution, and the like are non-scalable and require sophisticated compiler technology.
Developing such compiler technology is a difficult task.
◦ Vector processing works well for certain kinds of problems. It is suitable mostly for
scientific problems ( involving lots of matrix operations) and graphical processing.
It is not useful for other areas, such as databases.
◦ The technology of parallel processing is mature and can be exploited commercially; there is already significant R&D work on development tools and environments.
◦ Significant development in networking technology is paving the way for
heterogeneous computing.
Hardware architectures for parallel processing
Multiple-Instruction, Single-Data (MISD) systems
Machines built using the MISD model are not useful in most applications.
A few machines have been built, but none of them are available commercially.
This type of system is more of an intellectual exercise than a practical configuration.
Multiple-Instruction, Multiple-Data (MIMD) systems
A MIMD computing system is a multiprocessor machine capable of executing multiple instructions on multiple data sets.
Each PE in the MIMD model works asynchronously on its own instruction stream (Instruction Stream 1 … Instruction Stream N) and data.
MIMD machines are broadly categorized into shared-memory MIMD and distributed-memory MIMD based on the way PEs are coupled to the main memory.
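A tiny illustrative sketch of the MIMD idea (the function names and data are assumptions chosen for illustration): two processes run different instruction streams on different data sets at the same time.

from multiprocessing import Process

def stream_a(data):
    # Instruction stream 1: compute a sum over its own data set.
    print("sum:", sum(data))

def stream_b(data):
    # Instruction stream 2: compute a maximum over a different data set.
    print("max:", max(data))

if __name__ == "__main__":
    procs = [Process(target=stream_a, args=([1, 2, 3],)),
             Process(target=stream_b, args=([7, 4, 9],))]
    for p in procs:
        p.start()   # each PE works asynchronously on its own stream
    for p in procs:
        p.join()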
Shared Memory MIMD machines
All the PEs are connected to a single global memory and they all have access to it.
Systems based on this model are also called tightly coupled multiprocessor systems.
The communication between PEs in this model takes place through the shared memory; modification of the data stored in the global memory by one PE is visible to all other PEs.
Dominant representative shared-memory MIMD systems are Silicon Graphics machines and Sun/IBM SMP (Symmetric Multi-Processing).
(Figure: Processor 1 … Processor N connected through a memory bus to a global system memory.)
Distributed Memory MIMD machines
All PEs have a local memory. Systems based on this model are also called loosely coupled multiprocessor systems.
The communication between PEs in this model takes place through the interconnection network (the inter-process communication channel, or IPC).
The network connecting PEs can be configured as a tree, mesh, cube, and so on.
Each PE operates asynchronously, and if communication/synchronization among tasks is necessary, they can do so by exchanging messages between them.
(Figure: Processor 1 … Processor N, each with its own local memory and memory bus, connected by IPC channels.)
Shared Vs Distributed MIMD model
The shared memory MIMD architecture is easier to program but
is less tolerant to failures and harder to extend with respect to
the distributed memory MIMD model.
Failures in a shared-memory MIMD system affect the entire system, whereas this is not the case for the distributed model, in which each of the PEs can be easily isolated.
Moreover, shared memory MIMD architectures are less likely to
scale because the addition of more PEs leads to memory
contention.
This is a situation that does not happen in the case of distributed
memory, in which each PE has its own memory.
As a result, distributed memory MIMD architectures are most
popular today.
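The contrast between the two coupling styles can be sketched in Python (a rough illustration only; the toy workload and names are assumptions): in the shared-memory style, threads communicate through a common data structure, while in the distributed-memory style, each process keeps private state and exchanges explicit messages.

import threading
from multiprocessing import Process, Pipe

def shared_memory_demo():
    # Shared-memory style: threads update one common total through a lock.
    total = {"value": 0}
    lock = threading.Lock()

    def worker(chunk):
        s = sum(chunk)
        with lock:                      # all threads see the same memory
            total["value"] += s

    threads = [threading.Thread(target=worker, args=(range(i, i + 100),))
               for i in range(0, 400, 100)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]

def worker_proc(chunk, conn):
    # Distributed-memory style: each process works on private data and
    # sends its result back over an explicit IPC channel (a Pipe).
    conn.send(sum(chunk))
    conn.close()

def distributed_memory_demo():
    results = []
    for i in range(0, 400, 100):
        parent, child = Pipe()
        p = Process(target=worker_proc, args=(range(i, i + 100), child))
        p.start()
        results.append(parent.recv())   # collect the message from each PE
        p.join()
    return sum(results)

if __name__ == "__main__":
    print(shared_memory_demo(), distributed_memory_demo())

Note how a crash in one message-passing worker leaves the others isolated, while corruption of the shared structure would affect every thread, which mirrors the fault-tolerance argument above.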
Approaches to Parallel Programming
A sequential program is one that runs on a single
processor and has a single line of control.
To make many processors collectively work on a
single program, the program must be divided into
smaller independent chunks so that each processor
can work on separate chunks of the problem.
The program decomposed in this way is a parallel
program.
A wide variety of parallel programming approaches
are available.
Approaches to Parallel Programming Contd…
The most prominent among them are the following.
◦ Data Parallelism
◦ Process Parallelism
◦ Farmer-and-worker model
The three models mentioned above are suitable for task-level parallelism. In the case of data-level parallelism, the divide-and-conquer technique is used to split data into multiple sets, and each data set is processed on different PEs using the same instruction.
This approach is highly suitable to processing on machines based on the SIMD
model.
In the case of Process Parallelism, a given operation has multiple (but distinct)
activities that can be processed on multiple processors.
In the case of the Farmer-and-Worker model, a job-distribution approach is used: one processor is configured as the master and all the remaining PEs are designated as slaves; the master assigns jobs to the slave PEs and, on completion, they inform the master, which in turn collects the results.
These approaches can be utilized in different levels of parallelism.
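A minimal sketch of the Farmer-and-Worker (master/slave) job distribution just described, using Python multiprocessing queues (the job list and the toy computation are illustrative assumptions):

from multiprocessing import Process, Queue

def worker(jobs, results):
    # Slave PE: take jobs from the queue, compute, report back to the master.
    while True:
        job = jobs.get()
        if job is None:                  # sentinel: no more work
            break
        results.put((job, job * job))    # toy computation: square the job value

if __name__ == "__main__":
    jobs, results = Queue(), Queue()
    n_workers = 3

    workers = [Process(target=worker, args=(jobs, results)) for _ in range(n_workers)]
    for w in workers:
        w.start()

    # Master (farmer) assigns jobs to the slave PEs...
    job_list = list(range(10))
    for j in job_list:
        jobs.put(j)
    for _ in range(n_workers):
        jobs.put(None)                   # one stop sentinel per worker

    # ...and collects the results as the workers finish.
    collected = [results.get() for _ in job_list]
    for w in workers:
        w.join()
    print(sorted(collected))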
Levels of Parallelism
Levels of parallelism are decided by the lumps of code (grain size) that can be a potential candidate for parallelism.
The table below shows the levels of parallelism. All these approaches have a common goal:
◦ To boost processor efficiency by hiding latency.
◦ To conceal latency, there must be another thread ready to run whenever a lengthy operation occurs.
The idea is to execute concurrently two or more single-threaded applications, such as compiling, text formatting, database searching, and device simulation.

Grain Size  | Code Item                        | Parallelized By
Large       | Separate and heavyweight process | Programmer
Medium      | Function or procedure            | Programmer
Fine        | Loop or instruction block        | Parallelizing compiler
Very Fine   | Instruction                      | Processor
Levels of Parallelism
(Figure: at the large level, tasks/processes (Task 1 … Task N) communicate by exchanging messages over IPC channels; at the medium level, threads/functions (function f1() … function fJ()) communicate through shared memory.)
Laws of Caution
Studying how much an application or a software system can gain from parallelism:
In particular, what needs to be kept in mind is that parallelism is used to perform multiple activities together so that the system can increase its throughput or its speed.
But the relations that control the increment of speed are not linear.
For example, for a given n processors, the user expects speed to be increased by n times. This is an ideal situation, but it rarely happens because of communication overhead.
Here are two important guidelines to take into account.
(Figures: speed vs. cost; number of processors vs. speed.)
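To make the non-linear relationship concrete, here is a small hedged sketch (the fixed per-processor communication cost is a simplifying assumption, not a law from the slides) comparing the ideal n-times speedup with an estimate that charges communication overhead:

def speedup(n, t1=100.0, comm_cost=2.0):
    # Toy model: parallel time = computation (t1 / n) plus an assumed
    # communication overhead that grows with the number of processors.
    parallel_time = t1 / n + comm_cost * (n - 1)
    return t1 / parallel_time

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"n={n:2d}  ideal={float(n):5.1f}x  estimated={speedup(n):5.2f}x")

With these assumed numbers the estimated speedup stops improving well before 16 processors, illustrating why expecting an n-times gain is rarely realistic.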
Session 3 - Summary
Google Classroom
Code: 4jpxnjt
Attend Unit1-Quiz2
Thank you