Intro to communication
9/8/09
Hypercubes
• Advantages:
– Best diameter and bisection width we have seen so far.
– Practically, these are very useful bounds and multiple systems
use a hypercube network topology.
• Disadvantages:
– Number of links per node is not constant
– p = 2^d, so d links per node = Θ(log p)
Bit labeling
• Because the number of processors along
any given dimension is 2, we can
represent a single coordinate by a single
bit: 0 or 1
• Each processor, therefore, has an
identifier of d bits.
– For example, (0,1,1) in an 8-processor hypercube
Helpful facts
• Flipping any of the d bits returns a neighbor of a
given processor.
– Each flip corresponds to a link in the network.
• Because there are d bits that can be flipped, a processor has exactly d neighbors.
• This also gives us a quick algorithm to check if
any two processors are neighbors.
– Hamming distance of 1
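• A minimal sketch of that check in C (my own illustration; the function name is made up): two ids are neighbors exactly when their XOR has a single bit set.

    /* Sketch: are two hypercube processor ids neighbors?
       True exactly when the ids differ in one bit (Hamming distance 1). */
    int are_neighbors(unsigned int a, unsigned int b)
    {
        unsigned int diff = a ^ b;                    /* bits where the ids differ */
        return diff != 0 && (diff & (diff - 1)) == 0; /* exactly one bit set */
    }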
Communication paths
• First, find the bits in which the source and destination processor ids differ.
• Route the message to a neighbor that matches the destination in one more bit, and continue until it arrives.
• If the processor ids differ in k bits, there are k! possible shortest paths.
– High connectivity = good
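• A small routing sketch (my own illustration; it fixes the differing bits from lowest to highest, i.e., one of the k! shortest paths):

    #include <stdio.h>

    /* Sketch: print one shortest path from src to dst in a d-dimensional
       hypercube by flipping the differing bits in a fixed (low-to-high) order. */
    void print_route(unsigned int src, unsigned int dst, int d)
    {
        unsigned int cur = src;
        printf("%u", cur);
        for (int bit = 0; bit < d; bit++) {
            if ((cur ^ dst) & (1u << bit)) {  /* ids still differ at this bit */
                cur ^= (1u << bit);           /* hop to that neighbor */
                printf(" -> %u", cur);
            }
        }
        printf("\n");
    }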
Our model of parallel computation
• As described previously, our model will be:
– p processors connected by an interconnection
network
– Each processor has its own local memory
– Accessing remote memory requires using the network
• We want processors to be able to communicate simultaneously
– You will use MPI function calls for your assignments
– Communication is further complicated because accessing remote memory requires a matching send/receive pair
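• As a rough illustration (not assignment code), a remote access in MPI becomes an explicit send/receive pair; the tag and values below are arbitrary.

    #include <mpi.h>

    /* Sketch: rank 0 sends one double to rank 1 over the network. */
    int main(int argc, char **argv)
    {
        int rank;
        double x = 0.0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            x = 3.14;
            MPI_Send(&x, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&x, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        MPI_Finalize();
        return 0;
    }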
Details
• We will assume that only one message is being
routed per processor.
• Before a message is sent, it must be prepared: a destination is attached, error-correcting information is applied, and any other preprocessing is done.
• This is called setup time and we will use τ to
model this parameter.
More details
• The transmission of the message can be
performed at the rate of µ time per word.
• It follows that total communication time can be
modeled as:
– τ + µm, where m is the size of the message in words
• To be comparable to computation, we usually measure these costs in units of floating point operations, or flops.
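• As a tiny sketch of the model (parameter names are mine), with τ and µ already expressed in flop units:

    /* Sketch: cost of sending one m-word message under the tau + mu*m model. */
    double comm_time(double tau, double mu, double m)
    {
        return tau + mu * m;  /* setup cost plus per-word transmission cost */
    }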
Supporting simultaneous
communication
• A processor can send one message and receive one message in any given time step.
• It is clear that a processor cannot perform multiple sends or multiple receives at the same time.
• Therefore, we would like communication steps
such that no two destinations are the same:
– We will call this a permutation network
Example
• This permutation (2 3 1 0) implies:
– Processor 0 sends a message to 2 (0, 2)
– Processor 1 sends a message to 3 (1, 3)
– Processor 2 sends a message to 1 (2, 1)
– Processor 3 sends a message to 0 (3, 0)
• Note that any step where a processor is not
involved can be represented by communication
with itself, e.g., (0,0).
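• As a hedged sketch of one way to realize this step in MPI (it assumes exactly 4 ranks; computing the inverse permutation for the receive side is my own choice):

    #include <mpi.h>

    /* Sketch: carry out the permutation (2 3 1 0) in one communication step.
       Each rank sends to perm[rank] and receives from the rank that maps to it. */
    int main(int argc, char **argv)
    {
        int perm[4] = {2, 3, 1, 0};
        int rank, sendbuf, recvbuf, src = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (int i = 0; i < 4; i++)
            if (perm[i] == rank) src = i;   /* inverse permutation: my sender */
        sendbuf = rank;
        MPI_Sendrecv(&sendbuf, 1, MPI_INT, perm[rank], 0,
                     &recvbuf, 1, MPI_INT, src, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Finalize();
        return 0;
    }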
Multistage communication networks
• Multistage interconnection networks
(details next class and in text) are
topologically equivalent.
• It can be shown that these can achieve a
permutation in at most 3 steps.
Take home messages
• We make the theoretical assumption that
any permutation can be supported in
parallel (with some constant overhead).
• We will use permutation networks as our
interconnection network in this course.
– Simplifies algorithm design and analysis
In class example
• In small groups, what are some
permutations that can be achieved using a
ring network?
• What about a hypercube? Come up with
one example for a 3D cube of 8
processors.
Often used permutations
• The permutation network, by definition,
allows any communication where no two
destinations are the same in one step
• Even so, we usually only use the following:
– Shift permutations
– Hypercubic permutations
Shift permutations
• A shift permutation is one such that each
processor communicates with its neighbor
• Left shift:
– Processor i communicates with processor (i - 1) mod p
• Right shift:
– Processor i communicates with processor (i + 1) mod p
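• In code, the two partner computations are one line each (a sketch; the wrap-around mod p is applied to both shifts):

    /* Sketch: partner ids for the shift permutations on p processors.
       Adding p before the mod keeps the left-shift result non-negative in C. */
    int left_shift_partner(int i, int p)  { return (i - 1 + p) % p; }
    int right_shift_partner(int i, int p) { return (i + 1) % p; }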
Hypercubic permutations
• Let d = ceiling (log p)
• Just as in a static hypercube, each processor’s
id can be viewed as a string of d bits
• Fix a bit position j. The communication in which each processor exchanges with the processor whose id differs only in bit j is a hypercubic permutation
• There are d hypercubic permutations
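• In code, the partner in the j-th hypercubic permutation is just the id with bit j flipped (a sketch; the function name is made up):

    /* Sketch: partner of processor id in the j-th hypercubic permutation,
       for 0 <= j < d. */
    int hypercubic_partner(int id, int j)
    {
        return id ^ (1 << j);
    }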
Another reason why
hypercubes are cool
• Obviously, a hypercubic permutation can be routed in a single step on a hypercube network, since each pair of partners is directly connected
• This is better than many multistage
networks, some of which we will discuss
next class.
Tips for parallel algorithm analysis
• In any algorithm we design, we will
separate its computation and
communication costs.
• For example:
– Computation time = O(f(n,p))
– Communication time = O(τg(n,p) + µh(n,p))
A simple algorithm example
• Suppose we’d like to add n numbers,
where n >> p.
• For simplicity, let's assume p evenly divides n.
• What is one algorithm for this?
An approach
• It is clear that computing the sum of n/p
integers on a single processor requires n/p
time.
• Adding the p resulting partial sums can be performed in log p steps.
• Computation time = O(n/p + log p)
Communication time
• Communication time = O((τ + µ) log p)
• Spoiler: Can this be done on a
hypercube?
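• One possible realization of both pieces is sketched below in MPI (my own illustration; it assumes p is a power of two and uses the hypercubic permutations from earlier for the log p combining steps):

    #include <mpi.h>

    /* Sketch: sum n numbers on p processors (p a power of 2, p divides n).
       Local sums take O(n/p) work; log p hypercubic exchange steps then
       combine the p partial sums, so every rank ends with the total. */
    double parallel_sum(const double *local, int n_local, MPI_Comm comm)
    {
        int rank, p;
        double sum = 0.0, other;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &p);
        for (int i = 0; i < n_local; i++)      /* O(n/p) local computation */
            sum += local[i];
        for (int j = 1; j < p; j <<= 1) {      /* log p hypercubic permutations */
            int partner = rank ^ j;
            MPI_Sendrecv(&sum, 1, MPI_DOUBLE, partner, 0,
                         &other, 1, MPI_DOUBLE, partner, 0,
                         comm, MPI_STATUS_IGNORE);
            sum += other;
        }
        return sum;
    }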
How do we know if this is good?
• The sequential runtime of adding n
integers is Θ(n).
• Therefore, a parallel algorithm must take
at least Ω(n/p); ours is O(n/p + log p)
• As a result, we are efficient as long as:
– n > p log p (in class)
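• A short written-out version of that in-class step (my own sketch of the argument):

    \text{Efficiency} \;=\; \frac{T(n,1)}{p\,T(n,p)}
    \;=\; \Theta\!\left(\frac{n}{p\,(n/p + \log p)}\right)
    \;=\; \Theta\!\left(\frac{n}{n + p\log p}\right)

which stays bounded below by a constant exactly when n grows at least as fast as p log p.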
But why?
• Note τ and µ are constants, and we are
using O notation here, not Θ
• Why isn’t the algorithm?
– Max{f(n,p), g(n,p), h(n,p)}
Older values of communication
constants (relative to one flop)
Machine         τ        µ        τ/µ
IBM BG/L        3000     50-100   15-30
Intel Delta     4650     87       54
Intel Paragon   7800     9        867
Cray T3D        27000    9        3000
IBM SP1         28000    50       560
CM5             450      3        113
An example
• Suppose our algorithm appears to achieve ideal speedup
• Further,
– f(n,p) = g(n,p) = h(n,p) = T(n, 1) / p
• If τ, however, is 1000, our speedup would be p/1000
– This implies 1000 processors would be only about as fast as a single processor
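• The arithmetic behind that claim, written out (a sketch; the approximation assumes µ is small compared with τ):

    T(n,p) \;=\; f + \tau g + \mu h \;=\; (1 + \tau + \mu)\,\frac{T(n,1)}{p},
    \qquad
    S(p) \;=\; \frac{T(n,1)}{T(n,p)} \;=\; \frac{p}{1 + \tau + \mu} \;\approx\; \frac{p}{1000}
    \;\text{ when } \tau = 1000,

so p = 1000 processors give a speedup of roughly 1.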
Rules of thumb
• We would like f(n,p) to be as close to
O(T(n,1)/p) as possible
• Ideally, h(n,p) grows more slowly than f(n,p), and g(n,p) grows more slowly still