Parallel Algorithms: Chapter 9
Parallel Algorithms: Chapter 9
ParallelAlgorithms
AlgorithmTheory
WS2013/14
FabianKuhn
AlgorithmTheory,WS2013/14 FabianKuhn 2
ParallelComputations
T
p
:timetoperformcomp.withp procs
I
1
:work (total#operations)
Timewhendoingthe
computation sequentially
I
:criticalpath/span
Timewhenparallelizingas
muchaspossible
LowerBounds:
T
p
T
1
p
, T
p
T
AlgorithmTheory,WS2013/14 FabianKuhn 3
BrentsTheorem
BrentsTheorem:Onp processors,aparallelcomputationcanbe
performedintime
T
p
T
1
-T
p
+T
.
Corollary:Greedyisa2approximationalgorithmforscheduling.
Corollary: Aslongasthenumberofprocessorsp = 0 I
1
I
,itis
possibletoachievealinearspeedup.
AlgorithmTheory,WS2013/14 FabianKuhn 4
PRAM
BacktothePRAM:
Sharedrandomaccessmemory,synchronouscomputationsteps
ThePRAMmodelcomesinvariants
EREW (exclusiveread,exclusivewrite):
Concurrentmemoryaccessbymultipleprocessorsisnotallowed
Iftwoormoreprocessorstrytoreadfromorwritetothesame
memorycellconcurrently,thebehaviorisnotspecified
CREW (concurrentread,exclusivewrite):
ReadingthesamememorycellconcurrentlyisOK
Twoconcurrentwritestothesamecellleadtounspecified
behavior
Thisisthefirstvariantthatwasconsidered(alreadyinthe70s)
AlgorithmTheory,WS2013/14 FabianKuhn 5
PRAM
ThePRAMmodelcomesinvariants
CRCW (concurrentread,concurrentwrite):
ConcurrentreadsandwritesarebothOK
Behaviorofconcurrentwriteshastospecified
WeakCRCW:concurrentwriteonlyOKifallprocessorswriteu
CommonmodeCRCW:allprocessorsneedtowritethesamevalue
ArbitrarywinnerCRCW:adversarypicksoneofthevalues
PriorityCRCW:valueofprocessorwithhighestIDiswritten
StrongCRCW:largest(orsmallest)valueiswritten
Thegivenmodelsareorderedinstrength:
weak commonmode arbitrarywinner priority strong
AlgorithmTheory,WS2013/14 FabianKuhn 6
SomeRelationsBetweenPRAMModels
Theorem:Aparallelcomputationthatcanbeperformedintimet,
usingp processorsonastrongCRCWmachine,canalsobe
performedintime0(t log p) usingp processorsonanEREW
machine.
Each(parallel)stepontheCRCWmachinecanbesimulatedby
0(log p) stepsonanEREWmachine
Theorem:Aparallelcomputationthatcanbeperformedintimet,
usingp probabilisticprocessorsonastrongCRCWmachine,can
alsobeperformedinexpectedtime0(t log p) using0(p log p )
processorsonanarbitrarywinnerCRCWmachine.
Thesamesimulationturnsoutmoreefficientinthiscase
AlgorithmTheory,WS2013/14 FabianKuhn 7
SomeRelationsBetweenPRAMModels
Theorem:Acomputationthatcanbeperformedintimet,usingp
processorsonastrongCRCWmachine,canalsobeperformedin
time0(t) using0 p
2
processorsonaweakCRCWmachine
Proof:
Strong:largestvaluewins,weak:onlyconcurrentlywritingu isOK
AlgorithmTheory,WS2013/14 FabianKuhn 8
SomeRelationsBetweenPRAMModels
Theorem:Acomputationthatcanbeperformedintimet,usingp
processorsonastrongCRCWmachine,canalsobeperformedin
time0(t) using0 p
2
processorsonaweakCRCWmachine
Proof:
Strong:largestvaluewins,weak:onlyconcurrentlywritingu isOK
AlgorithmTheory,WS2013/14 FabianKuhn 9
ComputingtheMaximum
Observation:OnastrongCRCWmachine,themaximumofan
valuescanbecomputedin0(1) timeusingn processors
Eachvalueisconcurrentlywrittentothesamememorycell
Lemma:OnaweakCRCWmachine,themaximumofn integers
between1 and n canbecomputedintime0 1 using0 n proc.
Proof:
Wehave n memorycells
1
, ,
n
forthepossiblevalues
Initializeall
1
Forthen valuesx
1
, , x
n
,processor] sets
x
]
u
Sinceonlyzeroesarewritten,concurrentwritesareOK
Now,
= 0(log n))
AlgorithmTheory,WS2013/14 FabianKuhn 14
GettingThePrefixSums
Insteadofcomputingthesequences
1
, s
2
, , s
n
letscompute
r
1
, , r
n
= u, s
1
, s
2
, , s
n-1
(u:neutralelementw.r.t.)
r
1
, , r
n
= u, o
1
, o
1
o
2
, , o
1
o
n-1
Togetherwiths
n
,thisgivesallprefixsums
Prefixsumr
= s
-1
= o
1
o
-1
:
a
2
a
2
a
3
a
3
a
1
a
1
a
4
a
4
a
5
a
5
a
a
7
a
7
a
8
a
8
a
9
a
9
a
1
a
1
a
11
a
11
a
12
a
12
a
13
a
13
a
14
a
14
a
15
a
15
a
1
a
1
r
14
(x
13
)
AlgorithmTheory,WS2013/14 FabianKuhn 15
GettingThePrefixSums
Claim:Theprefixsumr
= o
1
o
-1
isthesumofallthe
leavesintheleftsubtreeofeachancestoru oftheleaf:
containingo
suchthat: isintherightsubtreeofu.
a
2
a
2
a
3
a
3
a
1
a
1
a
4
a
4
a
5
a
5
a
a
7
a
7
a
8
a
8
a
9
a
9
a
1
a
1
a
11
a
11
a
12
a
12
a
13
a
13
a
14
a
14
a
15
a
15
a
1
a
1
AlgorithmTheory,WS2013/14 FabianKuhn 16
ComputingThePrefixSums
Foreachnodeu ofthebinarytree,definer(u) asfollows:
r : isthesumofthevalueso
attheleavesinalltheleftsub
treesofancestorsu of: suchthat: isintherightsubtreeofu.
Foraleafnode : holdingvalueo
:r u = r
|
= x
|-1
Fortherootnode:r ruut =
Forallothernodes::
: istheleftchild ofu:
r : = r(u)
u u
u u
: istherightchildofu:
(u hasleftchildw)
r : = r u +S
(S:sumofvaluesin
subtreeofw)
u u
w w u u
S S
AlgorithmTheory,WS2013/14 FabianKuhn 17
ComputingThePrefixSums
leafnode: holdingvalueo
:r u = r
|
= x
|-1
rootnode:r ruut =
Node: istheleftchildofu:r : = r(u)
Node: istherightchildofu:r : = r u +S
Where:S = sumofvaluesinleftsubtreeofu
Algorithmtocomputevaluesr(u):
1. Computesumofvaluesineachsubtree(bottomup)
Canbedoneinparalleltime0 log n with0(n) totalwork
2. Computevaluesr(:) topdown fromroottoleaves:
Tocomputethevaluer(:),onlyr(u) oftheparentu andthesumofthe
leftsibling(if: isarightchild)areneeded
Canbedoneinparalleltime0 log n with0 n totalwork
AlgorithmTheory,WS2013/14 FabianKuhn 18
Example
1. Computesumsofallsubtrees
Bottomup(levelwiseinparallel,startingattheleaves)
2. Computevaluesr(:)
Topdown(startingattheroot)
8 8 3 3 -1 -1 3 3 2 2 8 8 1 1 1 1 3 3 4 4 5 5 7 7 2 2
11 11 -1 -1 9 9 9 9 2 2 4 4 9 9 9 9
1 1 11 11
13 13
18 18
21 21 31 31
52 52
21
3 11
11
11
1
1
1 1
19
19 21
21
21
21 29 3
3
31
34
34
34 38 43
43
5
AlgorithmTheory,WS2013/14 FabianKuhn 19
ComputingPrefixSums
Theorem:Givenasequenceo
1
, , o
n
ofn values,allprefixsums
s
= o
1
o
(for1 i n)canbecomputedintime0(log n)
using0 n log n processors onanEREWPRAM.
Proof:
Computingthesumsofallsubtreescanbedoneinparallelin
time0 log n using0 n totaloperations.
Thesameistrueforthetopdownsteptocomputether(:)
ThetheoremthenfollowsfromBrentstheorem:
I
1
= 0 n , I
= 0 log n I
p
< I
+
I
1
p
Remark:Thiscanbeadaptedtootherparallelmodelsandto
differentwaysofstoringthevalue(e.g.,arrayorlist)
AlgorithmTheory,WS2013/14 FabianKuhn 20
ParallelQuicksort
Keychallenge:parallelizepartition
Howcanwedothisinparallel?
Fornow,letsjustcareaboutthevalues pivot
Whataretheirnewpositions
5 5 14 14 18 18 8 8 19 19 21 21 3 3 1 1 25 25 17 17 11 11 4 4 2 2 1 1 2 2 2 2 9 9 13 13 23 23 1 1
pivot
5 5 14 14 8 8 3 3 1 1 11 11 4 4 1 1 2 2 9 9 13 13 1 1 18 18 19 19 21 21 25 25 17 17 2 2 2 2 23 23
partition
AlgorithmTheory,WS2013/14 FabianKuhn 21
UsingPrefixSums
Goal:Determinepositionsofvalues pivot afterpartition
5 5 14 14 18 18 8 8 19 19 21 21 3 3 1 1 25 25 17 17 11 11 4 4 2 2 1 1 2 2 2 2 9 9 13 13 23 23 1 1
pivot
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 2 2 2 2 3 3 3 3 3 3 4 4 5 5 5 5 5 5 7 7 7 7 8 8 8 8 9 9 1 1 11 11 11 11 12 12
5 5 14 14 8 8 3 3 1 1 11 11 4 4 1 1 2 2 9 9 13 13 1 1 18 18 19 19 21 21 25 25 17 17 2 2 2 2 23 23
prefixsums
partition
AlgorithmTheory,WS2013/14 FabianKuhn 22
PartitionUsingPrefixSums
Thepositionsoftheentries> pivot canbedeterminedinthe
sameway
Prefixsums:I
1
= 0 n , I
= 0(log n)
Remainingcomputations:I
1
= 0 n , I
= 0(1)
Overall: I
1
= 0 n , I
= 0(log n)
Lemma:Thepartitioningofquicksortcanbecarriedoutin
parallelintime0 log n using0
n
Iog n
processors.
Proof:
ByBrentstheorem:I
p
1
1
p
+I
AlgorithmTheory,WS2013/14 FabianKuhn 23
ApplyingtoQuicksort
Theorem:OnanEREWPRAM,usingp processors,randomized
quicksortcanbeexecutedintimeI
p
(inexpectationandwith
highprobability),where
I
p
= 0
n log n
p
+log
2
n .
Proof:
Remark:
Wegetoptimal(linear)speedupw.r.t.tothesequential
algorithmforallp = 0 n log n .
AlgorithmTheory,WS2013/14 FabianKuhn 24
OtherApplicationsofPrefixSums
Prefixsumsareaverypowerfulprimitivetodesignparallel
algorithms.
Particularlyalsobyusingotheroperatorsthan+
ExampleApplications:
Lexicalcomparisonofstrings
Addmultiprecisionnumbers
Evaluatepolynomials
Solverecurrences
Radixsort/quicksort
Searchforregularexpressions
Implementsometreeoperations