Parallel Computer Architecture
The End of the Road
Advantages of Multiprocessors
Able to create powerful computers by simply connecting multiple processors
More cost-effective than building a high-performance single processor
Obtain fault tolerance to carry on the tasks, albeit with degraded performance
4 Decades of Computing
Batch Era (1960s)
The IBM System/360 mainframe dominated the corporate computer centers (10 MB disk, 1 MB magnetic core memory)
A typical batch processing machine
No connection beyond the computer room
Time-Sharing Era (1970s)
Advances in solid-state memory & ICs spawned the minicomputer era
Small, fast, and inexpensive enough to be spread throughout the company at the divisional level
Still too expensive and difficult to use to hand over to end-users
Time-sharing computing existed in two kinds:
  centralized data processing mainframes
  time-sharing minicomputers
Desktop Era (1980s)
PCs were introduced in 1977
Many players: Altair, Tandy, Commodore, Apple, IBM, etc.
Became pervasive and changed the face of computing
Along came networked computers (LAN & WAN)
Network Era (1990s)
Advances in network technologies led to the network computing paradigm
Transition from a processor-centric view of computing to a network-centric view
A number of commercial parallel computers with multiple processors appeared:
  shared memory systems
  distributed memory systems
Four Decades of Computing
Feature       Batch (1960s)                Time-Sharing (1970s)   Desktop (1980s)          Network (1990s)
Location      Computer room                Terminal room          Desktop                  Mobile
Users         Experts                      Specialists            Individuals              Groups
Data          Alphanumeric                 Text, numbers          Fonts, graphs            Multimedia
Objective     Calculate                    Access                 Present                  Communicate
Interface     Punched card                 Kbd & CRT              See & point              Ask & tell
Operation     Process                      Edit                   Layout                   Orchestrate
Connectivity  None                         Peripheral cable       LAN                      Internet
Owners        Corporate computer centers   Divisional IS shops    Departmental end-users   Everyone
Current Trends
The substitution of expensive and specialized parallel machines by the more cost-effective clusters of workstations
A cluster is a collection of stand-alone computers connected using some interconnection network
The pervasiveness of the Internet created interest in network computing and, more recently, in grid computing
Grids are geographically distributed platforms of computation, providing dependable, consistent, pervasive, and less expensive access to high-performance computing (HPC) facilities
Flynn's Taxonomy of Computer Architecture
Based on the notion of a stream of information: an instruction stream and a data stream
[Figure: the CPU fetches instruction and data streams from memory; the execute stage manipulates the data as programmed.]
                       Single Data   Multiple Data
Single Instruction     SISD          SIMD
Multiple Instruction   MISD          MIMD
SIMD Architecture
Single Instruction, Multiple Data (SIMD)
time ->
P1: prev instruction | load A(1) | load B(1) | C(1)=A(1)*B(1) | store C(1) | next instruction
P2: prev instruction | load A(2) | load B(2) | C(2)=A(2)*B(2) | store C(2) | next instruction
...
Pn: prev instruction | load A(n) | load B(n) | C(n)=A(n)*B(n) | store C(n) | next instruction
(all processors execute the same instruction at the same time, each on its own data)
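A minimal C sketch of the diagram above (not from the slides; the omp simd pragma is only an optional hint and assumes an OpenMP-capable compiler): every iteration applies the same multiply to a different element, which is what the processing elements do in lock step.

```c
#include <stddef.h>

/* Element-wise multiply C = A * B.
 * On a SIMD machine every processing element applies the same multiply
 * to a different array element; on a modern CPU the same idea appears
 * as vectorization of this loop. Without OpenMP the pragma is ignored
 * and the plain loop is equivalent. */
void elementwise_multiply(const double *A, const double *B, double *C, size_t n)
{
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        C[i] = A[i] * B[i];          /* C(i) = A(i) * B(i) */
}
```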
MIMD Architecture
[Figure: control unit 1 through control unit n each issue their own instruction stream to their processor P1 ... Pn; each Pi exchanges a data stream with its own memory Mi.]
Multiple Instruction, Multiple Data (MIMD)
time ->
P1: prev instruction | load A(1) | load B(1) | C(1)=A(1)*B(1) | store C(1) | next instruction
P2: prev instruction | call funcD | x=y^z | sum=x^2 | call sub1(i,j) | next instruction
...
Pn: prev instruction | do 10 i=1,N | alpha=w**3 | zeta=C(i) | 10 continue | next instruction
(each processor executes its own instruction stream on its own data)
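A minimal sketch of the MIMD idea, assuming POSIX threads (the slides do not prescribe any API): two threads run completely different instruction streams at the same time, unlike SIMD where all processors run the same one.

```c
#include <pthread.h>
#include <stdio.h>

/* Each thread models one processor running its own instruction stream. */
static void *stream_multiply(void *arg)   /* e.g. C(i) = A(i) * B(i) */
{
    (void)arg;
    return NULL;
}

static void *stream_reduce(void *arg)     /* e.g. sum = x * x        */
{
    (void)arg;
    return NULL;
}

int main(void)
{
    pthread_t p1, p2;
    pthread_create(&p1, NULL, stream_multiply, NULL);
    pthread_create(&p2, NULL, stream_reduce, NULL);
    pthread_join(p1, NULL);
    pthread_join(p2, NULL);
    puts("independent instruction streams finished");
    return 0;
}
```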
SIMD Architecture Model
Consists of two parts:
  a front-end computer
  a processor array
Each processing element (PE) in the array is identical to the others and performs operations on different data in sync
The front-end can access the PEs' memory via the bus
SIMD Architecture Model
Lock-step synchronization: processors either do nothing or perform exactly the same operations simultaneously
In SIMD, parallelism is exploited by applying simultaneous operations across large sets of data
SIMD Configurations
Configuration 1: each PE has its own local memory
[Figure: a Control Unit drives processors P1 ... Pn, each with a local memory M1 ... Mn; the PEs communicate through an interconnection network.]
Configuration 2: PEs and memory modules communicate via the interconnection network
[Figure: a Control Unit drives processors P1 ... Pn, which reach shared memory modules M1 ... Mn through an interconnection network.]
ILLIAC IV
[Figure: ILLIAC IV organization; a Control Unit drives processors P1 ... Pn, each with its own memory M1 ... Mn, connected by an interconnection network.]
MIMD Architecture
[Figure 1.6: Shared memory versus message passing architecture. Left: processors P1 ... Pn share memory modules M1 ... Mn through an interconnection network. Right: each node pairs a processor with its own local memory, and nodes communicate over an interconnection network.]
Shared memory systems: information is exchanged through the central shared memory
Message passing systems: information is exchanged through the network
Shared Memory MIMD Architecture
Uses a bus/cache architecture
Called an SMP (symmetric multiprocessor) since every processor has an equal chance to read/write memory and equal memory access speed
Commercial examples of SMPs: Sequent Computer's Balance and Symmetry, Sun Microsystems' multiprocessor servers, and Silicon Graphics Inc.'s multiprocessor servers
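A minimal shared-memory sketch, assuming POSIX threads on an SMP (an illustration, not from the slides): the workers exchange information simply by writing to and reading from the same address space; no explicit messages are needed.

```c
#include <pthread.h>
#include <stdio.h>

static double partial[2];                 /* lives in the shared memory */

static void *worker(void *arg)
{
    long id = (long)arg;
    partial[id] = (id + 1) * 10.0;        /* write result into shared memory */
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    for (long i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    /* the main thread reads the workers' results directly */
    printf("combined result read from shared memory: %g\n",
           partial[0] + partial[1]);
    return 0;
}
```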
Message Passing MIMD Architecture
Also known as distributed memory: each node of the interconnection network combines a processor with its own local memory
There is no global memory, so data must be moved from one local memory to another by means of message passing, typically a Send/Receive pair of commands written into the application software by the programmer
Programmers must therefore learn the message-passing paradigm, which involves data copying and dealing with consistency issues
Commercial examples c. 1990: the nCUBE, iPSC/2, and various Transputer-based systems; these eventually gave way to Internet-connected systems whose processor/memory nodes were Internet servers or clients
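A minimal Send/Receive sketch using MPI (one concrete message-passing library; the slides only name a generic Send/Receive pair): rank 0 copies a value from its local memory into rank 1's local memory with an explicit message.

```c
#include <mpi.h>
#include <stdio.h>

/* Run with two processes, e.g.: mpirun -np 2 ./a.out */
int main(int argc, char **argv)
{
    int rank;
    double value = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 3.14;                       /* exists only in rank 0's local memory */
        MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);        /* copied into rank 1's local memory */
        printf("rank 1 received %g\n", value);
    }

    MPI_Finalize();
    return 0;
}
```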
Shared memory: programming is easier
Message passing: provided scalability
DSM (distributed shared memory) is a hybrid between the two
DSM
Memory is physically distributed [as in message passing]
Memory can be addressed as one (logically shared) address space [as in shared memory]
Programming-wise, the architecture looks and behaves like a shared memory machine, but a message passing architecture lives underneath the software
Example: SGI Origin2000
SIMD
[Figure: the two SIMD configurations repeated; PEs with their own local memories, and PEs sharing memory modules M1 ... Mn through the interconnection network.]
Access control: determines which process accesses are possible to which resources
Synchronization: constraints that limit the times at which sharing processes may access shared resources
SIMD
[Figure: the two SIMD configurations repeated.]
Protection: a system feature that prevents processes from making arbitrary access to resources belonging to other processes
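The synchronization and access-control points above are usually enforced in software with locks. A minimal sketch assuming POSIX threads (an illustration, not from the slides): a mutex serializes access to a shared counter so concurrent increments do not interfere.

```c
#include <pthread.h>
#include <stdio.h>

static long counter = 0;                         /* shared resource */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *increment(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);               /* gain exclusive access */
        counter++;
        pthread_mutex_unlock(&lock);             /* release the resource  */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, increment, NULL);
    pthread_create(&t2, NULL, increment, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld (expected 200000)\n", counter);
    return 0;
}
```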
MIMD
[Figure: shared memory versus message passing MIMD architecture (repeated).]
Nodes are typically able to simultaneously store messages in buffers and perform send/receive operations
Scalable: the number of processors can be increased without significant decrease in efficiency of operation
Interconnection Networks
Interconnection Networks (INs)
Can be classified based on:
  mode of operation
  control strategy
  switching techniques
  topology
Mode of Operation
Accordingly, INs are classified as:
Synchronous: a single global clock is used by all components, operating in a lock-step manner
Asynchronous: does not require a global clock; handshaking signals are used instead
Synchronous INs tend to be slower than asynchronous ones; however, they are race- and hazard-free.
Control Strategy
Accordingly, INs are classified as:
Centralized: a single central control unit is used to oversee and control the operation
Decentralized: the control function is distributed among different components
Control Strategy
The function and reliability of the central control unit can become the bottleneck in a centralized control system
While the crossbar is a centralized system, the multistage interconnection networks are decentralized
Switching Techniques
INs can be classified as:
Circuit switching: a complete path has to be established and remain in existence during the whole communication period
Packet switching: communication takes place via messages that are divided into smaller entities (packets); packets travel in a store-and-forward manner
While packet switching tends to use resources more efficiently, it suffers from variable packet delays
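A rough back-of-the-envelope comparison of the two techniques (the numbers below are made-up assumptions, and propagation and routing overheads are ignored): circuit switching pays a one-time set-up cost and then streams the whole message over the reserved path, while store-and-forward packet switching pays a per-hop transmission time that successive packets can pipeline over.

```c
#include <stdio.h>

int main(void)
{
    double bandwidth = 1e9;          /* bytes per second per link (assumed) */
    double msg       = 1e6;          /* message size in bytes (assumed)     */
    double pkt       = 1e3;          /* packet size in bytes (assumed)      */
    double setup     = 50e-6;        /* circuit set-up time in s (assumed)  */
    int    hops      = 4;            /* links on the path (assumed)         */

    double t_pkt     = pkt / bandwidth;        /* per-packet, per-hop time  */
    double k         = msg / pkt;              /* number of packets         */

    /* circuit switching: set up once, then stream the whole message */
    double t_circuit = setup + msg / bandwidth;
    /* store-and-forward: first packet crosses all hops, the rest pipeline */
    double t_sf      = (hops + k - 1) * t_pkt;

    printf("circuit switching : %.3f ms\n", t_circuit * 1e3);
    printf("store-and-forward : %.3f ms\n", t_sf * 1e3);
    return 0;
}
```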
Topology
Topology describes how to connect processors and memories to other processors and memories
Shared Memory INs
[Figure: bus-based organization, where processors P, each with a cache C, share a bus to a global memory; and switch-based organization, where processors with caches reach memory modules M through a switch.]
Message Passing INs
Static interconnection networks
Dynamic interconnection networks
Static INs
Linear Array
Ring
Mesh
Tree
Hypercube (see the hop-count sketch below)
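A small sketch for the hypercube entry above (illustration only, not from the slides): in a hypercube, nodes carry d-bit binary labels and two nodes are linked exactly when their labels differ in one bit, so the minimum number of hops between two nodes is the Hamming distance between their labels.

```c
#include <stdio.h>

/* Minimum hops between two hypercube nodes = Hamming distance of labels. */
static int hops(unsigned src, unsigned dst)
{
    unsigned diff = src ^ dst;   /* bits where the labels differ */
    int h = 0;
    while (diff) {               /* count the differing bits     */
        h += diff & 1u;
        diff >>= 1;
    }
    return h;
}

int main(void)
{
    /* 4-dimensional hypercube (16 nodes): node 0101 to node 1100 */
    printf("hops(0101 -> 1100) = %d\n", hops(0x5, 0xC));  /* prints 2 */
    return 0;
}
```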
Dynamic INs
Establish a connection between two or more nodes on the fly as messages are routed along the links
The number of hops in a path from source to destination node is equal to the number of point-to-point links a message must traverse to reach its destination
Single-stage
Multiple-stage
Crossbar switch