[go: up one dir, main page]

0% found this document useful (0 votes)
57 views24 pages

Parallel Algorithms: Chapter 9

The document discusses parallel algorithms and parallel computation models. It introduces: - Brent's theorem on the time complexity of parallel computations using p processors. - Different PRAM models for parallel random access machines including EREW, CREW, CRCW variants. - Algorithms for computing prefix sums and the maximum of values in parallel using these models in O(log n) time and O(n) processors.

Uploaded by

Krutarth Patel
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
57 views24 pages

Parallel Algorithms: Chapter 9

The document discusses parallel algorithms and parallel computation models. It introduces: - Brent's theorem on the time complexity of parallel computations using p processors. - Different PRAM models for parallel random access machines including EREW, CREW, CRCW variants. - Algorithms for computing prefix sums and the maximum of values in parallel using these models in O(log n) time and O(n) processors.

Uploaded by

Krutarth Patel
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 24

Chapter9

ParallelAlgorithms
AlgorithmTheory
WS2013/14
FabianKuhn
AlgorithmTheory,WS2013/14 FabianKuhn 2
ParallelComputations
T
p
:timetoperformcomp.withp procs
I
1
:work (total#operations)
Timewhendoingthe
computation sequentially
I

:criticalpath/span
Timewhenparallelizingas
muchaspossible
LowerBounds:
T
p

T
1
p
, T
p
T

AlgorithmTheory,WS2013/14 FabianKuhn 3
BrentsTheorem
BrentsTheorem:Onp processors,aparallelcomputationcanbe
performedintime
T
p

T
1
-T

p
+T

.
Corollary:Greedyisa2approximationalgorithmforscheduling.
Corollary: Aslongasthenumberofprocessorsp = 0 I
1
I

,itis
possibletoachievealinearspeedup.
AlgorithmTheory,WS2013/14 FabianKuhn 4
PRAM
BacktothePRAM:
Sharedrandomaccessmemory,synchronouscomputationsteps
ThePRAMmodelcomesinvariants
EREW (exclusiveread,exclusivewrite):
Concurrentmemoryaccessbymultipleprocessorsisnotallowed
Iftwoormoreprocessorstrytoreadfromorwritetothesame
memorycellconcurrently,thebehaviorisnotspecified
CREW (concurrentread,exclusivewrite):
ReadingthesamememorycellconcurrentlyisOK
Twoconcurrentwritestothesamecellleadtounspecified
behavior
Thisisthefirstvariantthatwasconsidered(alreadyinthe70s)
AlgorithmTheory,WS2013/14 FabianKuhn 5
PRAM
ThePRAMmodelcomesinvariants
CRCW (concurrentread,concurrentwrite):
ConcurrentreadsandwritesarebothOK
Behaviorofconcurrentwriteshastospecified
WeakCRCW:concurrentwriteonlyOKifallprocessorswriteu
CommonmodeCRCW:allprocessorsneedtowritethesamevalue
ArbitrarywinnerCRCW:adversarypicksoneofthevalues
PriorityCRCW:valueofprocessorwithhighestIDiswritten
StrongCRCW:largest(orsmallest)valueiswritten
Thegivenmodelsareorderedinstrength:
weak commonmode arbitrarywinner priority strong
AlgorithmTheory,WS2013/14 FabianKuhn 6
SomeRelationsBetweenPRAMModels
Theorem:Aparallelcomputationthatcanbeperformedintimet,
usingp processorsonastrongCRCWmachine,canalsobe
performedintime0(t log p) usingp processorsonanEREW
machine.
Each(parallel)stepontheCRCWmachinecanbesimulatedby
0(log p) stepsonanEREWmachine
Theorem:Aparallelcomputationthatcanbeperformedintimet,
usingp probabilisticprocessorsonastrongCRCWmachine,can
alsobeperformedinexpectedtime0(t log p) using0(p log p )
processorsonanarbitrarywinnerCRCWmachine.
Thesamesimulationturnsoutmoreefficientinthiscase
AlgorithmTheory,WS2013/14 FabianKuhn 7
SomeRelationsBetweenPRAMModels
Theorem:Acomputationthatcanbeperformedintimet,usingp
processorsonastrongCRCWmachine,canalsobeperformedin
time0(t) using0 p
2
processorsonaweakCRCWmachine
Proof:
Strong:largestvaluewins,weak:onlyconcurrentlywritingu isOK
AlgorithmTheory,WS2013/14 FabianKuhn 8
SomeRelationsBetweenPRAMModels
Theorem:Acomputationthatcanbeperformedintimet,usingp
processorsonastrongCRCWmachine,canalsobeperformedin
time0(t) using0 p
2
processorsonaweakCRCWmachine
Proof:
Strong:largestvaluewins,weak:onlyconcurrentlywritingu isOK
AlgorithmTheory,WS2013/14 FabianKuhn 9
ComputingtheMaximum
Observation:OnastrongCRCWmachine,themaximumofan
valuescanbecomputedin0(1) timeusingn processors
Eachvalueisconcurrentlywrittentothesamememorycell
Lemma:OnaweakCRCWmachine,themaximumofn integers
between1 and n canbecomputedintime0 1 using0 n proc.
Proof:
Wehave n memorycells
1
, ,
n
forthepossiblevalues
Initializeall

1
Forthen valuesx
1
, , x
n
,processor] sets
x
]
u
Sinceonlyzeroesarewritten,concurrentwritesareOK
Now,

= u iff valuei occursatleastonce


StrongCRCWmachine:max.valueintime0(1) w.0 n proc.
WeakCRCWmachine:time0(1) using0 n proc.(prev.lemma)
AlgorithmTheory,WS2013/14 FabianKuhn 10
ComputingtheMaximum
Theorem:Ifeachvaluecanberepresentedusing0 log n bits,the
maximumofn (integer)valuescanbecomputedintime0(1) using
0(n) processorsonaweakCRCWmachine.
Proof:
Firstlookat
Iog
2
n
2
highestorderbits
Themaximumvaluealsohasthemaximumamongthosebits
Thereareonly n possibilitiesforthesebits
max.of
Iog
2
n
2
highestorderbitscanbecomputedin0 1 time
Forthosewithlargest
Iog
2
n
2
highestorderbits,continuewith
nextblockof
Iog
2
n
2
bits,
AlgorithmTheory,WS2013/14 FabianKuhn 11
PrefixSums
Thefollowingworksforanyassociativebinaryoperator:
associativity: ob c = o bc
AllPrefixSums: Givenasequenceofn valueso
1
, , o
n
,theall
prefixsumsoperationw.r.t.returnsthesequenceofprefixsums:
s
1
, s
2
, , s
n
= o
1
, o
1
o
2
, o
1
o
2
o
3
, , o
1
o
n
Canbecomputedefficientlyinparallelandturnsouttobean
importantbuildingblockfordesigningparallelalgorithms
Example:Operator:+,input:o
1
, , o
8
= S, 1, 7, u, 4, 1, 6, S
s
1
, , s
8
=
AlgorithmTheory,WS2013/14 FabianKuhn 12
ComputingtheSum
Letsfirstlookats
n
= o
1
o
2
o
n
Parallelizeusingabinarytree:
AlgorithmTheory,WS2013/14 FabianKuhn 13
ComputingtheSum
Lemma:Thesums
n
= o
1
o
2
o
n
canbecomputedin
time0(log n) onanEREWPRAM.Thetotalnumberof
operations(totalwork)is0(n).
Proof:
Corollary:Thesums
n
canbecomputedintime0 log n using
0 n log n processorsonanEREWPRAM.
Proof:
FollowsfromBrentstheorem(I
1
= 0(n),I

= 0(log n))
AlgorithmTheory,WS2013/14 FabianKuhn 14
GettingThePrefixSums
Insteadofcomputingthesequences
1
, s
2
, , s
n
letscompute
r
1
, , r
n
= u, s
1
, s
2
, , s
n-1
(u:neutralelementw.r.t.)
r
1
, , r
n
= u, o
1
, o
1
o
2
, , o
1
o
n-1
Togetherwiths
n
,thisgivesallprefixsums
Prefixsumr

= s
-1
= o
1
o
-1
:





a
2
a
2
a
3
a
3
a
1
a
1
a
4
a
4
a
5
a
5
a

a
7
a
7
a
8
a
8
a
9
a
9
a
1
a
1
a
11
a
11
a
12
a
12
a
13
a
13
a
14
a
14
a
15
a
15
a
1
a
1
r
14
(x
13
)
AlgorithmTheory,WS2013/14 FabianKuhn 15
GettingThePrefixSums
Claim:Theprefixsumr

= o
1
o
-1
isthesumofallthe
leavesintheleftsubtreeofeachancestoru oftheleaf:
containingo

suchthat: isintherightsubtreeofu.





a
2
a
2
a
3
a
3
a
1
a
1
a
4
a
4
a
5
a
5
a

a
7
a
7
a
8
a
8
a
9
a
9
a
1
a
1
a
11
a
11
a
12
a
12
a
13
a
13
a
14
a
14
a
15
a
15
a
1
a
1
AlgorithmTheory,WS2013/14 FabianKuhn 16
ComputingThePrefixSums
Foreachnodeu ofthebinarytree,definer(u) asfollows:
r : isthesumofthevalueso

attheleavesinalltheleftsub
treesofancestorsu of: suchthat: isintherightsubtreeofu.
Foraleafnode : holdingvalueo

:r u = r
|
= x
|-1
Fortherootnode:r ruut =
Forallothernodes::
: istheleftchild ofu:
r : = r(u)
u u
u u
: istherightchildofu:
(u hasleftchildw)
r : = r u +S
(S:sumofvaluesin
subtreeofw)
u u
w w u u
S S
AlgorithmTheory,WS2013/14 FabianKuhn 17
ComputingThePrefixSums
leafnode: holdingvalueo

:r u = r
|
= x
|-1
rootnode:r ruut =
Node: istheleftchildofu:r : = r(u)
Node: istherightchildofu:r : = r u +S
Where:S = sumofvaluesinleftsubtreeofu
Algorithmtocomputevaluesr(u):
1. Computesumofvaluesineachsubtree(bottomup)
Canbedoneinparalleltime0 log n with0(n) totalwork
2. Computevaluesr(:) topdown fromroottoleaves:
Tocomputethevaluer(:),onlyr(u) oftheparentu andthesumofthe
leftsibling(if: isarightchild)areneeded
Canbedoneinparalleltime0 log n with0 n totalwork
AlgorithmTheory,WS2013/14 FabianKuhn 18
Example
1. Computesumsofallsubtrees
Bottomup(levelwiseinparallel,startingattheleaves)
2. Computevaluesr(:)
Topdown(startingattheroot)
8 8 3 3 -1 -1 3 3 2 2 8 8 1 1 1 1 3 3 4 4 5 5 7 7 2 2
11 11 -1 -1 9 9 9 9 2 2 4 4 9 9 9 9
1 1 11 11
13 13
18 18
21 21 31 31
52 52

21

3 11
11
11
1
1
1 1
19
19 21
21
21
21 29 3
3
31
34
34
34 38 43
43
5
AlgorithmTheory,WS2013/14 FabianKuhn 19
ComputingPrefixSums
Theorem:Givenasequenceo
1
, , o
n
ofn values,allprefixsums
s

= o
1
o

(for1 i n)canbecomputedintime0(log n)
using0 n log n processors onanEREWPRAM.
Proof:
Computingthesumsofallsubtreescanbedoneinparallelin
time0 log n using0 n totaloperations.
Thesameistrueforthetopdownsteptocomputether(:)
ThetheoremthenfollowsfromBrentstheorem:
I
1
= 0 n , I

= 0 log n I
p
< I

+
I
1
p
Remark:Thiscanbeadaptedtootherparallelmodelsandto
differentwaysofstoringthevalue(e.g.,arrayorlist)
AlgorithmTheory,WS2013/14 FabianKuhn 20
ParallelQuicksort
Keychallenge:parallelizepartition
Howcanwedothisinparallel?
Fornow,letsjustcareaboutthevalues pivot
Whataretheirnewpositions
5 5 14 14 18 18 8 8 19 19 21 21 3 3 1 1 25 25 17 17 11 11 4 4 2 2 1 1 2 2 2 2 9 9 13 13 23 23 1 1
pivot
5 5 14 14 8 8 3 3 1 1 11 11 4 4 1 1 2 2 9 9 13 13 1 1 18 18 19 19 21 21 25 25 17 17 2 2 2 2 23 23
partition
AlgorithmTheory,WS2013/14 FabianKuhn 21
UsingPrefixSums
Goal:Determinepositionsofvalues pivot afterpartition
5 5 14 14 18 18 8 8 19 19 21 21 3 3 1 1 25 25 17 17 11 11 4 4 2 2 1 1 2 2 2 2 9 9 13 13 23 23 1 1
pivot
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 2 2 2 2 3 3 3 3 3 3 4 4 5 5 5 5 5 5 7 7 7 7 8 8 8 8 9 9 1 1 11 11 11 11 12 12
5 5 14 14 8 8 3 3 1 1 11 11 4 4 1 1 2 2 9 9 13 13 1 1 18 18 19 19 21 21 25 25 17 17 2 2 2 2 23 23
prefixsums
partition
AlgorithmTheory,WS2013/14 FabianKuhn 22
PartitionUsingPrefixSums
Thepositionsoftheentries> pivot canbedeterminedinthe
sameway
Prefixsums:I
1
= 0 n , I

= 0(log n)
Remainingcomputations:I
1
= 0 n , I

= 0(1)
Overall: I
1
= 0 n , I

= 0(log n)
Lemma:Thepartitioningofquicksortcanbecarriedoutin
parallelintime0 log n using0
n
Iog n
processors.
Proof:
ByBrentstheorem:I
p

1
1
p
+I

AlgorithmTheory,WS2013/14 FabianKuhn 23
ApplyingtoQuicksort
Theorem:OnanEREWPRAM,usingp processors,randomized
quicksortcanbeexecutedintimeI
p
(inexpectationandwith
highprobability),where
I
p
= 0
n log n
p
+log
2
n .
Proof:
Remark:
Wegetoptimal(linear)speedupw.r.t.tothesequential
algorithmforallp = 0 n log n .
AlgorithmTheory,WS2013/14 FabianKuhn 24
OtherApplicationsofPrefixSums
Prefixsumsareaverypowerfulprimitivetodesignparallel
algorithms.
Particularlyalsobyusingotheroperatorsthan+
ExampleApplications:
Lexicalcomparisonofstrings
Addmultiprecisionnumbers
Evaluatepolynomials
Solverecurrences
Radixsort/quicksort
Searchforregularexpressions
Implementsometreeoperations

You might also like