Chapter # 1
Parallel Computing
Lecture – Introduction
Background – Serial Computing
• Earlier, computer software was conventionally written for serial computing.
• Standard computing is also known as "serial computing".
• This means that to solve a problem, an algorithm divides the problem into smaller instructions.
• These discrete instructions are then executed on the Central Processing Unit of a computer one by one.
• Only after one instruction has finished does the next one start.
Background – Serial Computing
• A real-life example of this would be people standing in a queue waiting for a movie ticket, with only one cashier.
• The cashier gives tickets to the people one by one. The complexity of this situation increases when there are 2 queues and only one cashier.
• So, in short, Serial Computing is the following (a minimal code sketch follows this list):
1. A problem statement is broken into discrete instructions.
2. The instructions are then executed one by one.
3. Only one instruction is executed at any moment of time.
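A minimal C sketch of serial execution (illustrative only, not from the slides): the three statements below run one after another on a single CPU, and each starts only after the previous one has finished.

#include <stdio.h>

int main(void) {
    int a = 2 + 3;          /* instruction 1                    */
    int b = a * 4;          /* instruction 2 (must wait for a)  */
    printf("b = %d\n", b);  /* instruction 3 (must wait for b)  */
    return 0;
}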
Background – Serial Computing
• Look at point 3. This caused a huge problem in the computing industry, as only one instruction was executed at any moment of time. It was also a huge waste of hardware resources, since only one part of the hardware was active for a particular instruction at a time.
• As problem statements became heavier and bulkier, so did the time needed to execute them. Examples of such processors are the Pentium 3 and Pentium 4.
• Now let's come back to our real-life problem. We can definitely say that the complexity decreases when there are 2 queues and 2 cashiers giving tickets to 2 persons simultaneously. This is an example of Parallel Computing.
Parallel Computer
• Virtually all stand-alone computers today are parallel from a hardware
perspective:
- Multiple functional units (L1 cache, L2 cache, branch, prefetch, decode,
floating-point, graphics processing (GPU), integer, etc.)
- Multiple execution units/cores
- Multiple hardware threads
Parallel Computer
• Networks connect multiple stand-alone computers (nodes) to make larger
parallel computer clusters.
Parallel Computing
• A kind of computing architecture in which large problems are broken into independent, smaller, usually similar parts that can be processed simultaneously.
Parallel Computing
• Helps in faster application processing and task resolution by increasing the available computation power of systems.
• Most supercomputers operate on parallel computing principles.
• Parallel processing is commonly used in operational scenarios that demand massive processing power or computation.
Parallel Computing
• Typically, this infrastructure is housed in a server rack with multiple processors installed; an application server distributes computational requests into small chunks, which are then processed simultaneously on each server.
• The earliest computer software was written for serial computation, executing a single instruction at a time; parallel computing is different in that several processors execute an application or computation at the same time.
Parallel Computing – Why?
• The Real World is Massively Complex
- In the natural world, many complex, interrelated events are happening at the same time, yet within a temporal sequence.
Parallel Computing – Why?
• The Real World is Massively Complex
- Compared to serial computing, parallel computing is much better suited for modeling, simulating and understanding complex, real-world phenomena.
- For example, imagine modeling such phenomena serially.
Parallel Computing – Why?
• Save Time/Money
- In theory, throwing more resources at a task will shorten its time to
completion, with potential cost savings.
- Parallel computers can be built from cheap, commodity components.
Parallel Computing – Why?
• Solve Larger/Complex Problems
- Many problems are so large and/or complex that it is
impractical or impossible to solve them using a serial
program, especially given limited computer memory.
Parallel Computing – Why?
• Solve Larger/Complex Problems
- Example: "Grand Challenge Problems"
(en.wikipedia.org/wiki/Grand_Challenge) requiring petaflops and
petabytes of computing resources.
- Example: Web search engines/databases processing millions of
transactions every second
Petaflops = 10^15 (one quadrillion) floating-point operations per second
Petabytes = 10^15 (one quadrillion) bytes
Parallel Computing – Why?
• Provide Concurrency
- Single compute resource can only do one thing at a time.
Multiple compute resources can do many things simultaneously.
- Example: Collaborative Networks provide a global venue where
people from around the world can meet and conduct work
"virtually".
Parallel Computing – Why?
• Take Advantage of non-local Resources
- Using compute resources on a wide area network, or even the
Internet when local compute resources are scarce or insufficient.
- Example: SETI@home (setiathome.berkeley.edu) has over 1.7
million users in nearly every country in the world. (May, 2018).
Parallel Computing – Why?
• Better Use of Underlying Parallel Hardware
- Modern computers, even laptops, are parallel in architecture with multiple
processors/cores.
- Parallel software is specifically intended for parallel hardware with
multiple cores, threads, etc.
- In most cases, serial programs run on modern computers "waste"
potential computing power.
Parallel Computing – Who is using?
• Science and Engineering
- Historically, parallel computing has been considered to be "the high end of
computing", and has been used to model difficult problems in many areas
of science and engineering:
- Atmosphere, Earth, Environment
- Physics - applied, nuclear, particle, condensed matter, high pressure,
fusion, photonics
- Bioscience, Biotechnology, Genetics
- Chemistry, Molecular Sciences
Parallel Computing – Who is using?
• Science and Engineering
- Geology, Seismology
- Mechanical Engineering - from prosthetics to spacecraft
- Electrical Engineering, Circuit Design, Microelectronics
- Computer Science, Mathematics
- Defense, Weapons
Parallel Computing – Who is using?
• Industrial and Commercial
- Today, commercial applications provide an equal or greater driving force in the development of faster computers. These applications require the processing of large amounts of data in sophisticated ways. For example:
- "Big Data", data mining
- Artificial Intelligence (AI)
- Oil exploration
Parallel Computing – Who is using?
• Industrial and Commercial
- Web search engines, web based business services
- Medical imaging and diagnosis
- Pharmaceutical design
- Financial and economic modeling
- Management of national and multi-national corporations
- Advanced graphics and virtual reality, particularly in the entertainment
industry
- Networked video and multi-media technologies
- Collaborative work environments
Parallel Computing – Who is using?
• Global Applications
- Parallel computing is now being used extensively around the world, in a
wide variety of applications
Parallel Computing - Advantages
• It saves time and money as many resources working together will reduce the
time and cut potential costs.
• It can take advantage of non-local resources when the local resources are
finite.
Parallel Computing – Flynn’s Taxonomy
• M.J. Flynn proposed a classification for the organization of a computer system by the
number of instructions and data items that are manipulated simultaneously.
• Parallel processing may occur in the instruction stream, in the data stream, or both.
• Four combinations: SISD, SIMD, MISD, MIMD
Parallel Computing – Flynn’s Taxonomy
• SISD (Single Instruction, Single Data): instructions are executed sequentially, and the system may or may not have internal parallel processing capabilities.
• Most conventional computers have the SISD architecture, like the traditional von Neumann computers.
Parallel Computing – Flynn’s Taxonomy
• SIMD (Single Instruction, Multiple Data): instructions are decoded by a single Control Unit, which then sends the instructions to the processing units for execution.
Parallel Computing – Flynn’s Taxonomy
- An analogy: a commander barks out an order, and all the active soldiers execute that order synchronously.
Parallel Computing – Flynn’s Taxonomy
• Most modern computers, particularly those with graphics processing units (GPUs), employ
SIMD instructions and execution units.
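As an illustration (a sketch that assumes an x86 CPU with SSE support; not taken from the slides), the single SIMD add below operates on four pairs of floats at once:

#include <stdio.h>
#include <xmmintrin.h>                     /* SSE intrinsics */

int main(void) {
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    __m128 b = _mm_set_ps(40.0f, 30.0f, 20.0f, 10.0f);
    __m128 c = _mm_add_ps(a, b);           /* one instruction, four additions */

    float out[4];
    _mm_storeu_ps(out, c);
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]);   /* 11 22 33 44 */
    return 0;
}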
Parallel Computing – Flynn’s Taxonomy
• MIMD (Multiple Instruction, Multiple Data): processors are asynchronous, since they can independently execute different programs on different data sets.
• MIMDs have been considered by most researchers to include the most powerful, least restricted computers.
Parallel Computing – Parallel Computer Architectures
• Shared memory: All processors can access the same memory
Parallel Computing – Non-Uniform Memory Access (NUMA)
• Not all processors have equal access to all memories
Parallel Computing – Limitations
• It introduces challenges such as communication and synchronization between multiple sub-tasks and processes, which are difficult to achieve.
• The algorithms must be structured in such a way that they can be handled by a parallel mechanism.
• The algorithms or programs must have low coupling and high cohesion, but it is difficult to create such programs.
Parallel Computing – Types
• Bit-level parallelism
- Every task is dependent on the processor word size. For a task on large-sized data, a larger word size reduces the number of instructions the processor must execute.
- Otherwise, the operation must be split into a series of instructions. For example, suppose there is an 8-bit processor and you want to perform an operation (say, an addition) on two 16-bit numbers. The processor must first operate on the 8 lower-order bits and then on the 8 higher-order bits, so two instructions are needed to execute the operation. A 16-bit processor can perform the operation with a single instruction (a minimal code sketch follows below).
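A minimal C sketch of the idea (the function name and values are hypothetical): the 16-bit addition is carried out as two 8-bit additions with a carry, exactly the two-instruction case described above.

#include <stdint.h>
#include <stdio.h>

/* Add two 16-bit values using only 8-bit operations, as an 8-bit ALU would. */
uint16_t add16_on_8bit_alu(uint16_t x, uint16_t y) {
    uint8_t xl = x & 0xFF, xh = x >> 8;
    uint8_t yl = y & 0xFF, yh = y >> 8;

    uint16_t low  = (uint16_t)xl + yl;        /* first 8-bit add            */
    uint8_t carry = (low > 0xFF) ? 1 : 0;     /* carry out of the low byte  */
    uint8_t high  = xh + yh + carry;          /* second 8-bit add + carry   */

    return (uint16_t)((high << 8) | (low & 0xFF));
}

int main(void) {
    printf("%u\n", (unsigned)add16_on_8bit_alu(300, 500));   /* prints 800 */
    return 0;
}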
Parallel Computing – Types
• Instruction-level parallelism
- In a purely sequential design, a processor can issue at most one instruction per clock cycle. Instructions that are independent of each other can, however, be re-ordered and grouped, and then executed concurrently without affecting the result of the program. This is called instruction-level parallelism (a small illustration follows below).
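An illustrative C fragment (a sketch, not from the slides): the four statements below have no data dependences on one another, so a superscalar processor may re-order them and issue several in the same clock cycle without changing the result.

/* The compiler/processor is free to overlap these independent operations. */
void ilp_example(int out[4], int a, int b, int c, int d) {
    int w = a + b;   /* independent */
    int x = c + d;   /* independent */
    int y = a * c;   /* independent */
    int z = b * d;   /* independent */
    out[0] = w; out[1] = x; out[2] = y; out[3] = z;
}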
Parallel Computing – Parallel Programming
• Parallel programming (also, unfortunately, sometimes called concurrent
programming), is a computer programming technique that provides for the
execution of operations concurrently, either
- within a single parallel computer OR
- across a number of systems.
• In the latter case, the term distributed computing is used.
Parallel Computing – Parallel Programming Model
• Shared Memory Model
- Tasks share a common address space, which they read and write asynchronously.
- Various mechanisms such as locks/semaphores may be used to control access to the shared memory (see the sketch after this list).
- Advantage:
- No need to explicitly communicate data between tasks, which simplifies programming.
- Disadvantages:
- Care is needed when managing memory, to avoid synchronization conflicts.
- Harder to control data locality.
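A minimal shared-memory sketch using POSIX threads (illustrative; assumes a POSIX system, compile with -pthread): two threads update one shared counter, and a mutex (lock) serializes access so the updates do not conflict.

#include <pthread.h>
#include <stdio.h>

static long counter = 0;                              /* shared data in the common address space */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);                    /* enter critical section */
        counter++;
        pthread_mutex_unlock(&lock);                  /* leave critical section */
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);               /* 200000 */
    return 0;
}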
Parallel Computing – Parallel Programming Model
• Threads
- A thread can be considered as a subroutine within the main program
- Threads communicate with each other through the global memory
- Commonly associated with shared-memory architectures and operating systems
- Implementations: POSIX Threads (pthreads) and OpenMP (see the sketch below)
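A minimal OpenMP sketch of the threads model (illustrative; compile with -fopenmp): the directive forks a team of threads that share the program's global memory, and each thread prints its own id.

#include <omp.h>
#include <stdio.h>

int main(void) {
    #pragma omp parallel                 /* fork a team of threads */
    {
        printf("Hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }                                    /* implicit join at the end of the region */
    return 0;
}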
Parallel Computing – Parallel Programming Model
• Message Passing
- A set of tasks that use their own local memory during computation.
- Data exchange through sending and receiving messages.
- Data transfer usually requires cooperative operations to be performed by
each process. For example, a send operation must have a matching receive
operation.
- MPI (released in 1994)
- MPI-2 (released in 1996)
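A minimal MPI sketch of the message-passing model (illustrative; assumes an MPI installation, run with e.g. mpirun -np 2): task 0 sends one integer from its local memory, and the matching receive in task 1 completes the transfer.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);        /* send to task 1   */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                                /* matching receive */
        printf("Task 1 received %d from task 0\n", value);
    }

    MPI_Finalize();
    return 0;
}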
Parallel Computing – Parallel Programming Model
• Data Parallel Model
- On shared memory architectures, all tasks may have access to the data structure through
global memory. On distributed memory architectures, the data structure is split up and
resides as "chunks" in the local memory of each task.
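A minimal data-parallel sketch on shared memory (illustrative, using OpenMP; compile with -fopenmp): every thread applies the same operation to its own chunk of one global array.

#include <stdio.h>

#define N 1000000
static double a[N];                      /* the global data structure */

int main(void) {
    #pragma omp parallel for             /* iterations are split into chunks across threads */
    for (int i = 0; i < N; i++)
        a[i] = 2.0 * i;                  /* same operation on every element */

    printf("a[N-1] = %f\n", a[N - 1]);
    return 0;
}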
Parallel Computing – Parallel Programming Model
• Hybrid
- Combines various models, e.g. MPI + OpenMP (see the sketch below)
• Single Program Multiple Data (SPMD)
- A single program is executed by all tasks simultaneously
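A minimal hybrid SPMD sketch (illustrative; assumes MPI plus OpenMP, built with e.g. mpicc -fopenmp): the same program runs as several MPI tasks, and each task opens an OpenMP parallel region on its own cores.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);                    /* one MPI task per process/node        */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel                       /* one OpenMP thread team per MPI task  */
    printf("MPI task %d, OpenMP thread %d\n", rank, omp_get_thread_num());

    MPI_Finalize();
    return 0;
}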
von Neumann Computer
• Model for designing and building computers, based on the following
three characteristics:
1) The computer consists of four main sub-systems:
• Memory: stores data and the program
• ALU (Arithmetic/Logic Unit): performs the arithmetic/logic operations requested by the program
• Control Unit: executes the program
• Input/Output System (I/O): communicates with the "outside world", e.g. screen, keyboard, storage devices, ...
von Neumann Computer
• All computers are more or less based on the same basic design: the von Neumann architecture!
Personal Computer
Motivations for Parallel Computing
• Fundamental limits on single processor speed
• Heat dissipation from CPU chips
• Disparity between CPU & memory speeds
• Distributed data communications
• Need for very large scale computing platforms
Motivations for Parallel Computing
• Fundamental limits – Cycle Speed
- Intel 8080 2MHz 1974
- ARM 2 8MHz 1986
- Intel Pentium Pro 200MHz 1996
- AMD Athlon 1.2GHz 2000
- Intel QX6700 2.66GHz 2006
- Intel Core i7 3770k 3.9GHz 2013
- Speed of light: 30cm in 1ns
Moore’s Law
• Moore’s observation in 1965: the number of transistors per square
inch on integrated circuits had doubled every year since the
integrated circuit was invented
• Moore’s revised observation in 1975: the pace had slowed down a bit, but data density had doubled approximately every 18 months
• How about the future? (does the price of computing power fall by half every 18 months?)
Moore’s Law – Held for Now
Power Wall Effect in Computer Architecture
• Too many transistors in a given chip die area
• Tremendous increase in power density
• Increased chip temperature
• High temperature slows down the transistor switching rate and the overall
speed of the computer
• Chip may melt down if not cooled properly
• Efficient cooling systems are expensive
Cooling Computer Chips
• Some people have suggested putting computer chips in liquid nitrogen to cool them
Solutions
• Use multiple inexpensive processors
• A processor with multiple cores
A Multi-Core Processor
CPU and Memory Speed
• In 20 years, CPU speed (clock rate) has increased by a factor of 1000
• DRAM speed has increased only by a factor of less than 4
• How to feed data fast enough to keep the CPU busy?
• CPU cycle time: 1-2 ns
• DRAM access time: 50-60 ns
• Cache access time: ~10 ns
Memory Access and CPU Speed
CPU, Memory and Disk Speed
Possible Solutions
• A hierarchy of successively faster memory devices (multilevel caches)
• Locality of data reference (code)
• Efficient programming can be an issue
• Parallel systems may provide
1) larger aggregate cache
2) higher aggregate bandwidth to the memory system
Parallel Computing – Useful Terms
• Concurrent - Events or processes which seem to occur or progress at the
same time.
• Parallel - Events or processes which occur or progress at the same time.
• Parallel programming (also, unfortunately, sometimes called concurrent
programming), is a computer programming technique that provides for the
execution of operations concurrently, either
- within a single parallel computer OR
- across a number of systems.
• In the latter case, the term distributed computing is used.
Parallel Computing – Flynn’s Taxonomy
• Best known classification scheme for parallel computers.
• The classification depends on the parallelism a computer exhibits in its
- Instruction stream
- Data stream
• A sequence of instructions (the instruction stream) manipulates a sequence
of operands (the data stream)
• The instruction stream (I) and the data stream (D) can be either single (S) or
multiple (M)
Parallel Computing – Flynn’s Taxonomy
• Four combinations: SISD, SIMD, MISD, MIMD
Parallel Computing – Flynn’s Taxonomy
• SISD (Single Instruction, Single Data): Serial Computer
• Single-CPU systems
- i.e., uniprocessors
- Note: co-processors don’t count as more processors
Parallel Computing – Flynn’s Taxonomy
• On a memory access, all active processors must access the same location in
their local memory.
• The data items form an array and an instruction can act on the complete array
in one cycle.
Parallel Computing – Flynn’s Taxonomy
- An analogy: a commander barks out an order, and all the active soldiers execute that order synchronously.
Parallel Computing – Flynn’s Taxonomy
• MIMD (Multiple Instruction, Multiple Data): processors are asynchronous, since they can independently execute different programs on different data sets.
• MIMDs have been considered by most researchers to include the most powerful, least restricted computers.
Parallel Computing – Useful Terms
• All processors have access to all memory locations.
• Uniform memory access (UMA): similar to a uniprocessor, except that additional, identical CPUs are added to the bus.
• Each processor has equal access to memory and can do anything that any other processor can do.
• Also called a symmetric multiprocessor, or SMP.
• We will discuss this in greater detail later (e.g., text pg 43).
• SMPs and clusters of SMPs are currently very popular.
Parallel Computing – Useful Terms
• Core: a processing unit within a CPU; a CPU can have multiple cores.
• Each CPU core has its own L1 cache, but may share L2 and L3 caches.
Parallel Computing – Useful Terms
• Cache is a small amount of memory which is a part of the CPU - closer to the CPU than RAM. It is
used to temporarily hold instructions and data that the CPU is likely to reuse.
• The more cache there is, the more data can be stored closer to the CPU.
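An illustrative C sketch (not from the slides) of why cache matters: summing a matrix row by row reuses the data already brought into cache, while summing it column by column jumps around memory, misses cache far more often, and is typically much slower.

#include <stdio.h>
#include <time.h>

#define N 2048
static double m[N][N];

static double sum_rows(void) {               /* cache-friendly: follows memory layout */
    double s = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += m[i][j];
    return s;
}

static double sum_cols(void) {               /* cache-unfriendly: strided accesses */
    double s = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += m[i][j];
    return s;
}

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            m[i][j] = 1.0;

    clock_t t0 = clock(); double s1 = sum_rows(); clock_t t1 = clock();
    double s2 = sum_cols();                       clock_t t2 = clock();
    printf("rows: %.3fs  cols: %.3fs  (sums %.0f %.0f)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, s1, s2);
    return 0;
}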
Parallel Computing – Useful Terms
• FLOPS
- Floating-point operations per second, or FLOPS, is the unit of
measurement that calculates the performance capability of a
supercomputer.
- Floating-point operations can only be executed on computers with
integrated floating-point registers.
- The average computer's processor performance is measured by its clock speed in megahertz (MHz). Since supercomputers are far more capable in terms of performance, the method by which their performance is measured must be on a considerably larger scale.
Parallel Computing – Useful Terms
• FLOPS
- One petaFLOPS is equal to 1,000,000,000,000,000 (one quadrillion) FLOPS, or one thousand teraFLOPS.
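A hedged, illustrative peak-performance estimate (the machine numbers below are hypothetical, not from the slides): theoretical peak FLOPS is roughly nodes x cores per node x clock rate x floating-point operations per cycle.

#include <stdio.h>

int main(void) {
    double nodes = 1000, cores = 64;           /* hypothetical cluster           */
    double clock_hz = 2.0e9;                   /* 2 GHz                          */
    double flops_per_cycle = 16;               /* e.g. wide SIMD + FMA, assumed  */
    double peak = nodes * cores * clock_hz * flops_per_cycle;
    printf("peak = %.3e FLOPS = %.2f petaFLOPS\n", peak, peak / 1e15);
    return 0;
}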
Parallel Computing – Useful Terms
• SETI@home
Parallel Computing
• Parallel computing also helps in faster application processing and task resolution by increasing the available computation power of systems.
• Most supercomputers operate on parallel computing principles.
• Parallel processing is commonly used in operational scenarios that demand massive processing power or computation.