[go: up one dir, main page]

0% found this document useful (0 votes)
43 views59 pages

Average Case 1

This document provides lecture notes on average-case analysis of algorithms. It introduces key concepts like optimization problems, algorithms, adversarial models, and basic probability theory. It then covers the average-case analysis of specific algorithms for sorting, selecting, knapsack problems, scheduling, paging, bin packing, and traveling repairman problems. The analysis focuses on comparing average or expected performance to worst-case bounds.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
43 views59 pages

Average Case 1

This document provides lecture notes on average-case analysis of algorithms. It introduces key concepts like optimization problems, algorithms, adversarial models, and basic probability theory. It then covers the average-case analysis of specific algorithms for sorting, selecting, knapsack problems, scheduling, paging, bin packing, and traveling repairman problems. The analysis focuses on comparing average or expected performance to worst-case bounds.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 59

Average-Case Analysis

Lecture Notes, Winter Term 08/09

University of Freiburg

Alexander Souza
Contents

1 Introduction 3
1.1 Optimization Problems and Algorithms . . . . . . . . . . . . . . . . . . . . 4
1.2 Adversarial Models and Performance Measures . . . . . . . . . . . . . . . . 5
1.3 Basic Probability Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 Sorting and Selecting 11


2.1 Quicksort . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Quickselect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3 Knapsack 17
3.1 Stochastic Profits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.1.1 Greedy Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.1.2 Nemhauser-Ullmann Algorithm . . . . . . . . . . . . . . . . . . . . . 21
3.2 Stochastic Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.1 Adaptive Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.2 Non-Adaptive Algorithms . . . . . . . . . . . . . . . . . . . . . . . . 27

4 Completion Time Scheduling 31


4.1 Deterministic Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2 Stochastic Non-Clairvoyant Scheduling . . . . . . . . . . . . . . . . . . . . . 32

5 Paging 36
5.1 Competitive Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
5.2 Structure of Typical Request Sequences . . . . . . . . . . . . . . . . . . . . 44

6 Bin Packing 48
6.1 Deterministic Bin Packing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.2 Stochastic Bin Packing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

7 Traveling Repairman 53
7.1 Deterministic Arrivals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
7.2 Stochastic Arrivals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

2
Chapter 1

Introduction

Generally speaking, average-case analysis asks how some given algorithm behaves “typ-
ically”. The main motivation for this is that for some (practically relevant) algorithms
there is a gap between the worst-possible and the usually observed mannerism.
For example, the deterministic sorting algorithm Quicksort  that always chooses the
first element of an n-element array as its pivot requires O n2 comparisons in the worst
case. Thus, from that perspective one has to dismiss Quicksort (since there are algo-
rithms that guarantee O (n log n) comparisons). On the other hand, Quicksort is often
observed to outperform other algorithms in terms of number of comparisons. We will show
very soon that the expected number of comparisons of Quicksort is also O (n log n) and
even that this is achieved with high probability.
Now we explain the unspecified notions of the first two sentences in little more detail.
First of all, an algorithm receives an input and produces an output after a finite number of
elementary operations. Within this lecture we will mostly be concerned with optimization
algorithms. The outputs these algorithms produce are valuated with an objective function
and the goal is to minimize (respectively maximize) the valuation.
The second undetermined notion from above is behave: We will either be interested in
the running time or the approximation guarantee of a particular algorithm. The number
of elementary operations the algorithm executes before returning its output is called the
running time on that particular instance. For example, many algorithms that solve NP-
hard optimization problems exactly require exponential running time on some instances.
Exponential running time is in general not desired since it only allows treatment of rela-
tively small inputs. On the other hand, if we resort to polynomial running time we may
have to surrender on solution quality (unless P = NP), i.e., the algorithm does not always
produce solutions with best-possible objective value. We are mainly interested in algo-
rithms where one can prove an a priori bound on the ratio of the objective value achieved
by the algorithm compared with the best-possible value – the approximation guarantee.
The third notion, “typically” is intended to reflect the following gap that some practi-
cally relevant algorithms feature. From the worst-case perspective, unless P = NP, there is
no way that we can avoid exponential running time (if we are interested in exact solutions)
or suboptimal solutions (if we insist on polynomial running time) in general. So, what is
the point in asking for “typical” running time, respectively approximation guarantee? The
simple answer is: observed behaviour. There are several algorithms where the worst-case
bounds on running time, respectively approximation guarantee, do not adequately reflect
the behaviour one usually comes across – these bounds seem to be too pessimistic. Among
these algorithms there are several ones that play an important role in applications, e.g.,
Quicksort for sorting and Least Recently Used for memory management. It is thus

3
primarily of practical interest to give answers why those algorithms are often observed to
work well. After this glimpse into the field we have to make all these notions more precise.

1.1 Optimization Problems and Algorithms


An instance I of a (combinatorial) optimization problem P can formally be defined as a
tuple I = (U, S, val, extr) with the following meaning:
U the solution space (on which val and S are defined),
S the set of feasible solutions S ⊆ U ,
val the objective function val : U → R,
extr the extremum (usually max or min).
U and S are usually not given explicitly. Instead, S is often defined with a feasibility
predicate F by S = {X ∈ U : X satisfies F }. Our goal is to find a feasible solution where
the desired extremum of val is attained. It will be convenient to write a problem P in the
format:
Problem 1.1 P

extremize val(X)
subject to X ∈ U satisfies F

Any extremizing solution is called an optimum solution, or simply an optimum. We


usually denote by opt = opt(I) the optimal objective value of an instance I. The
objective value an algorithm alg achieves on the instance I is denoted alg = alg(I).
Example 1.1 (Knapsack). We have a knapsack that can carry a total weight of at most
c. There are n items and each such item i has weight wi and yields profit pi at a market.
The goal is to choose the items that maximize our total profit at a market without violating
the capacity of our knapsack. Consider c = 100 and the following weight-profit table:
i wi pi
1 100 150
2 1 2
3 50 55
4 50 100
The feasibility predicate is F : P
“total weight of chosenP
items is at most c”. For any set
X ⊆ {1, 2, 3, 4} define p(X) = i∈X pi and w(X) = i∈X wi . Now we formulate the
problem in the introduced notation.

Problem 1.2 Knapsack

maximize p(X)
subject to w(X) ≤ c,
X ⊆ {1, 2, 3, 4}.

Which is the best solution? We evaluate all possibilities and find that X ∗ = {3, 4}
gives w(X ∗ ) = 155 altogether which maximizes our profit.

4
A central problem around combinatorial optimization is that it is often in principle
possible to find an optimum solution by enumerating the set of feasible solutions, but
this set mostly contains “too many” elements. This phenomenon is called combinatorial
explosion.
An optimization algorithm is given a description of an instance I, called the input, and
has to produce a feasible solution O, called the output, after a finite number of elementary
operations, where each such operation is effective and well-defined. We do not further
specify what an elementary operation means as this depends on the context. Operations
of interest include additions, multiplications, memory accesses, comparisons, and so on.
The number of elementary operations an algorithm executes on an input I before producing
an output O is called its running time on that input. The number of bits of the input is
denoted |I| and called the input-length. An algorithm whose running time is a polynomial
in |I| is called a polynomial time (or efficient) algorithm.
The central problem we are facing is that (unless P = NP), there are (many) optimiza-
tion problems for which no efficient algorithm exists. This means that we only have the
following choices: If we insist on the optimum solution, then we have to accept exponential
running time (at least on some instances). Otherwise, if we want polynomial running time,
then we must accept that any algorithm may produce suboptimal solutions (at least on
some instances).
This fundamental problem yields that we have to reason more about how to judge
optimization algorithms. It is convenient to see optimization as a game played between
a (malicious) adversary and an algorithm. Such a game is played with an optimization
problem P , which is simply a set of instances P = {I1 , I2 , . . . }. In the most basic form,
the adversary chooses an instance among the members of P and the algorithm has to find
a feasible solution. Throughout, we shall assume that our algorithms are deterministic.

1.2 Adversarial Models and Performance Measures


The variants of the game differ mainly in two ways:

(1) The way the adversary is restricted in the choice of the instance (the adversarial
model ), and

(2) the manner in which the solution produced by the algorithm is assessed (the perfor-
mance measure).

We will present several popular choices in the paragraphs below.

Worst-Case Model
In the classical worst-case perspective, the adversary may choose an arbitrary instance I
of the members of a given problem P .
For any instance I, let tI (alg) denote the running time of an algorithm alg on it. In
terms of running time we valuate the algorithm with the quantity

t(alg) = sup tI (alg).


I∈P

In terms of solution quality on a given instance I, we judge an algorithm with the


approximation ratio rI (alg) = alg(I)/opt(I) produced. Since the adversary is assumed
to be malicious, we have to be pessimistic. If P is a minimization problem, the adversary

5
tries to maximize the approximation ratio and the overall quality of the algorithm is
measured with
alg(I)
r(alg) = sup rI (alg) = sup .
I∈P I∈P opt(I)
An algorithm with r(alg) ≤ % is called a %-approximation. If P is a maximization problem,
he tries to minimize the ratio and we judge
alg(I)
r(alg) = inf rI (alg) = inf
I∈P I∈P opt(I)
and call it a %-approximation if r(alg) ≥ %.

Average-Case Models
For reasons sketched above, the worst-case model is sometimes too pessimistic to give
satisfactory answers from the perspective of the practitioner. So, apparently, the adversary
is too powerful, and one has to seek models that try to capture “typical” behaviour.
One attempt in that direction is the diffuse adversary. The adversary is given the
problem, i.e., the set P of instances and along with that a set of available probability
distributions ∆ = {D1 , D2 , . . . }. The set ∆ is assumed to be chosen from an outside
entity. Each distribution D ∈ ∆ assigns to each instance I ∈ P a certain probability
Pr [I]. Notice that the input-instance I is now a random variable. Instead of allowing
the adversary to choose a particular instance we only allow him to choose any probability
distribution from the set ∆.
In terms of running time, one is usually interested in
X
t(alg) = sup E [tI (alg)] = sup tI (alg)Pr [I] .
D∈∆ D∈∆ I∈P

Notice that I is a random variable and hence tI (alg) is also a random variable.
It is debatable how the quality of an algorithm should be judged in such a model.
There are at least two alternatives. Let P be a minimization problem without loss of
generality. To transfer the definitions below to maximization we only have to replace sup
with inf.
One point of view is: Let D be the any distribution for P . Then taking the expectation
with respect to D yields
  X
alg(I) alg(I)
eD (alg) = E = Pr [I]
opt(I) opt(I)
I∈P

which judges the algorithm through the expected value of the approximation ratio. Con-
sequently, as the adversary is malicious, the overall judgement is given through
 
alg(I)
e(alg) = sup eD (alg) = sup E .
D∈∆ D∈∆ opt(I)
Another way is this: For a distribution D, and taking expectations with respect to it,
define P
0 E [alg(I)] alg(I)Pr [I]
eD (alg) = = PI∈P ,
E [opt(I)] I∈P opt(I)Pr [I]
i.e., the ratio of the expectations, which yields the overall measure
E [alg(I)]
e0 (alg) = sup eD (alg) = sup .
D∈∆ D∈∆ E [opt(I)]

6
If e(alg) ≤ %, respectively e0 (alg) ≤ % holds, we speak of a %-expected-approximation
(with the measure used in the context).
Both measures have their advantages and drawbacks. The perspective of e(alg) is
that one instance is drawn at random and the algorithm is compared directly against the
optimal algorithm on that instance (in expectation). Thus the measure prefers algorithms
that give good approximation ratios “often”. By contrast, from the point of view of e0 (alg)
many instances are drawn and the total objective value achieved is compared with the total
optimal value (respectively in expectation). This measure thus prefers algorithms that
give good approximation ratios when the optimum itself yields “large” objective values.
It certainly depends on the intended application which perspective is the advisable one.

Example 1.2. Consider a hypothetical minimization problem with two instances P =


{A, B}. Associate the following probability distribution with the problem
1 1
Pr [x ∈ A] = 1 − and Pr [x ∈ B] = .
n n
where n is arbitrarily large. Let opt denote the value of the optimal objective value and
define opt(A) = 1 and opt(B) = n. Let alg1 and alg2 denote the values achieved
by two certain algorithms where these have the properties alg1 (A) = 1, alg1 (B) = n2 ,
alg2 (A) = n, and alg2 (B) = n.
The following truth-table depicts the relative qualities of the algorithms measured with
r(alg), e(alg), and e0 (alg) respectively.

I A B r(alg) e(alg) e0 (alg)


opt(I) 1 n 1 1 1
alg1 (I) 1 n2 n 2 (n + 1)/2
alg2 (I) n n n n n/2
Pr [I] 1 − 1/n 1/n
Observe that alg1 achieves the optimum value with probability 1 − 1/n and with
probability 1/n it has worst-case performance alg1 (I)/opt(I) = n. As n grows, the
algorithm performs optimal with increasing probability.
Further observe that alg2 achieves the worst-case performance with probability 1−1/n
and the optimum value on instances where the optimum value is n, i.e., large. But these
occur only with probability 1/n.
The measure e(alg) favours alg1 , i.e., the one that performs well with high probability.
The larger n is, the more it favours alg1 , as its probability for being optimal increases.
By contrast e0 (alg) favours alg2 , which works well on instances where the optimum
value is large. Finally, with the approximation ratio r(alg), these algorithms are even
indistinguishable.

Decision-Making Under Uncertainty


In the above paragraphs we always assumed implicitly that the algorithm is given the whole
input at the outset. This assumption is not always justified. Sometimes we are facing
situations where the instance becomes known only gradually. Nonetheless, an algorithm
has to make decisions.
For example, due to the halting problem, it is impossible to find out how long a
processor will be executing a program before it terminates (or whether it terminates at
all). However, the operating system has to decide which program to run and how much
computation time it will be granted.

7
A classical worst-case notion is competitive analysis where an algorithm receives a
sequence of inputs and has to react upon each one without any foresight. Analogously
to the approximation ratio, the performance is judged by comparing the objective value
achieved with the optimal value. In such situations, a malicious adversary is even more
powerful than before, since an algorithm that is not even aware of the whole instance is
compared with an optimal algorithm that knows the whole instance in advance. Thus the
problem that the results in this model tend to be overly pessimistic is true even more.
Hence average-case models for decision-making under uncertainty are interesting both
from a theoretical and practical perspective. However, up to now, there are several rea-
sonable approaches, each one with advantages and drawbacks. We will not go into the
details here, but introduce the models in the chapters when discussed.

1.3 Basic Probability Theory


This section is intended as a refresher of the basic notions of probability theory. We will
restrict ourselves to discrete probability spaces here. This will mostly be sufficient for
our purposes, but in case needed, we will also allow continuous probability spaces with
analogous definitions.

Fundamentals
A set Ω = {ω1 , ω2 , . . . } is called a probability space and its elements ω ∈ Ω are called
the elementary events. A probability distribution D assigns to each elementary event ω a
number, called its probability Pr [ω] such that:
(1) 0 ≤ Pr [ω] ≤ 1 for all ω ∈ Ω, and
P
(2) ω∈Ω Pr [ω] = 1.

P A ⊆ Ω is called an event and the corresponding probability is given through


Any subset
Pr [A] = ω∈A Pr [ω]. For each event A define the complementary event A = Ω − A.
Observe that A ⊆ B implies Pr [A] ≤ Pr [B]. Notice the special probabilities Pr [∅] = 0
and Pr [Ω] = 1.
Theorem 1.3. Let A
n
P1n, . . . , An be pairwise disjoint events, i.e., Ai ∩ Aj = ∅ for all i 6= j,
then Pr [∪i=1 AP i] = i=1 Pr [Ai ]. If the events are not pairwise disjoint, then we have
Pr [∪ni=1 Ai ] ≤ ni=1 Pr [Ai ].
Let A and B be two events where B 6= ∅. Then the conditional probability of A given B
is defined by Pr [ A | B] = Pr [A ∩ B] /Pr [B]. Two events A and B are called independent
if Pr [A ∩ B] = Pr [A] Pr [B].
Theorem 1.4. If Pr [A1 ∩ · · · ∩ An ] > 0 then
Pr [A1 ∩ · · · ∩ An ] = Pr [A1 ] Pr [ A2 | A1 ] · · · · · Pr [ An | A1 ∩ · · · ∩ An−1 ] .
Theorem 1.5. Let A1 , . . . , An be pairwise disjoint events and let B ⊆ A1 ∪ · · · ∪ An . Then
we have
Xn
Pr [B] = Pr [ B | Ai ] Pr [Ai ] .
i=1
If Pr [B] > 0 we additionally have
Pr [Ai ∩ B] Pr [Ai ∩ B]
Pr [ Ai | B] = = Pn .
Pr [B] i=1 Pr [ B | Ai ] Pr [Ai ]

8
For some given probability space any mapping X : Ω → Z is called a (numerical)
random variable. The expected value of X is given through
X X
E [X] = X(ω)Pr [ω] = xPr [X = x]
ω∈Ω x∈Z
P
provided that x∈Z |x|Pr [X = x] converges.

Theorem 1.6. Let X and Y be two random variables such that X(ω) ≤ Y (ω) for all
ω ∈ Ω then E [X] ≤ E [Y ].

Theorem 1.7 (Linearity of Expectation). For random variables X1 , . . . , Xn and constants


a1 , . . . , an , b we have that

E [a1 X1 + · · · + an Xn + b] = a1 E [X1 ] + · · · + an E [Xn ] + b.

The variance of a random variable X is Var [X] = E (X − E [X])2 = E X 2 − E [X]2 .


   

Random variables X and Y are called independent if Pr [X = x, Y = y] = Pr [X = x] Pr [Y = y]


holds for all x and y.

Theorem 1.8. For independent random variables X1 , . . . , Xn and constants a1 , . . . , an , b


we have that
E [X1 · · · · · Xn ] = E [X1 ] · · · · · E [Xn ]
and
Var [a1 X1 + · · · + an Xn + b] = a21 Var [X1 ] + · · · + a2n Var [Xn ] .

Bounds on Distributions
Two elementary bounds on the distribution of any random variable are the theorems of
Markov and Chebyshev.

Theorem 1.9 (Markov). Let X ≥ 0 be a random variable. Then

E [X]
Pr [X ≥ t] ≤
t
holds for all t > 0.

Theorem 1.10 (Chebyshev). Let X be a random variable. Then

Var [X]
Pr [|X − E [X] | ≥ t] ≤
t2
holds for all t > 0.

Important Distributions
A Bernoulli distributed random variable X ∈ {0, 1} indicates if a certain event that has
probability p occurs. We denote X ∼ Ber (p) and it has the distribution:

Pr [X = 1] = p and Pr [X = 0] = 1 − p.

We further have E [X] = p and Var [X] = p(1 − p).

9
A sum of n independent Bernoulli variables X = X1 + · · · + Xn with respective pa-
rameter p has Binomial distribution, denoted X ∼ Bin (n, p). The distribution is for any
k ∈ {0, . . . , n}:  
n k
Pr [X = k] = p (1 − p)n−k .
k
We clearly have E [X] = np and Var [X] = np(1 − p).
A random variable X ∈ {1, 2, . . . } that counts the number of trials until some event
that has probability p occurs for the first time has geometric distribution. It is denoted
X ∼ Geo (p) and has the distribution

Pr [X = k] = (1 − p)k−1 p

and E [X] = 1/p and Var [X] = (1 − p)/p2 .


A uniform distributed random variable X is used to assign the same probability to
each element of a set U = {a, a + 1, . . . , b − 1, b}. We write X ∼ Uni {a, . . . , b} and it has
the distribution
1
Pr [X = k] =
b−a+1
and E [X] = (a + b)/2 and Var [X] = ((b − a + 1)2 − 1)/12.

10
Chapter 2

Sorting and Selecting

In this chapter we give average-case analyses of the well-known sorting and searching
algorithms Quicksort and Quickselect.

2.1 Quicksort
The problem Sort is the following: We are given a sequence a = (a1 , a2 , . . . , an ) of
pairwise distinct numbers and are asked to find a permutation π of (1, 2, . . . , n) such that
the sequence b = (b1 , b2 , . . . , bn ) = (aπ(1) , aπ(2) , . . . , aπ(n) ) satisfies b1 ≤ b2 ≤ · · · ≤ bn . The
elementary operations we are interested in are the comparisons “≤” an algorithm asks.
Let ta (alg) be the number of comparisons of an algorithm alg given a sequence a.
The assumption that the numbers are pairwise distinct can easily be removed, but we
consider it for clarity of exposition.
The idea of the algorithm Quicksort is to choose some element p from a, called the
pivot, and divide a into two subsequences ` and r. The sequence ` contains the elements
ai ≤ p and r those ai > p. Quicksort is then called recursively until the sequence is
empty. The sequence (Quicksort(`), p, Quicksort(r)) is finally returned.

Algorithm 2.1 Quicksort


Input. Sequence (a1 , a2 , . . . , an )

Output. Sequence (b1 , b2 , . . . , bn )

(1) If n = 0 return.

(2) Otherwise let p = a1 . Let ` and r be two empty sequences.

(3) For i = 2, . . . , n, if ai ≤ p append ai to ` otherwise append ai to r.

(4) Return (Quicksort(`), p, Quicksort(r))

Worst-Case Analysis
It is well-known that this variant of Quicksort has the weakness that it may require
2

Ω n comparisons.
Observation 2.1. There is a sequence a such that ta (Quicksort) = n(n − 1)/2.

11
Proof. Consider a = (1, 2, . . . , n). Then, in step (3), ` remains empty while r contains
n − 1 elements. This step requires n − 1 comparisons. By induction, the recursive calls
Quicksort(`) and Quicksort(r) require 0, respectively (n − 1)(n − 2)/2 comparisons
“≤”. Thus, the whole algorithm needs n−1+(n−1)(n−2)/2 = n(n−1)/2 comparisons.

Average-Case Analysis
Here we study Quicksort in the following probabilistic model. Let π be any permutation
of (1, 2, . . . , n) and let the probability of π be 1/n!, i.e., Pr [π] = 1/n!. We draw a permu-
tation π at random according to this probability distribution and the input of Quicksort
is the sequence A = (aπ(1) , aπ(2) , . . . , aπ(n) ). (We use the capital letter A to emphasize that
it is a random variable.)
Then the expected P number of comparisons Quicksort needs is O (n log n). More
specifically let Hn = nk=1 1/k denote the n-th Harmonic number and recall that log n ≤
Hn ≤ log n + 1.

Theorem 2.2. We have that E [tA (Quicksort)] = 2(n + 1)Hn − 4n.

We will give two proofs of this theorem. In the first one we derive a recursion, in
the second one we use linearity of expectation. It causes no loss of generality to assume
that the original sequence is itself a permutation of (1, 2, . . . , n) and will be assumed in
the sequel. Also, let cn = E [tA (Quicksort)] be the expected number of comparisons
Quicksort carries out on a uniformly distributed permutation A of length n.

First proof of Theorem 2.2. The idea of this proof is to give a recursion formula and then
solve it (by induction). Before we actually prove this theorem we need a lemma that is
needed due to the recursive structure of the algorithm. The original input A is a random
permutation with uniform distribution. Hence the pivot P is a random variable. Thus also
the sequences L and R constructed by Quicksort are random permutations of subsets
of {1, 2, . . . , n}. However, for the proof to work, we have to show that these sequences are
also uniform distributed.
Lemma 2.3. Let the outcome of the pivot element be P = p. Let L = ` and R = r be two
possible outcomes of the sequences Quicksort constructs given pivot p. Then we have
1 1
Pr [ L = ` | P = p] = and Pr [ R = r | P = p] = .
|`|! |r|!

Proof. We give a proof for L; for R it is analogous. Conditioning on a pivot element


p ∈ {1, 2, . . . , n} yields that each outcome ` of L has the same elements, namely, the
numbers i < p. Let ` beany outcome of L and let k = |`| be the cardinality of that
sequence. There are n−1 k (n − 1 − k)! many permutations of the sequence (1, 2, . . . , n)
that contain p as first element and ` as a subsequence. Thus, the probability that the
outcome of L is exactly ` (given pivot p) is
 
n − 1 (n − 1 − k)! (n − 1)! (n − 1 − k)! 1
Pr [ L = ` | P = p] = = · =
k (n − 1)! k!(n − 1 − k)! (n − 1)! k!

which is what we had to show.

12
Since A is uniformly distributed the probability that the outcome of the pivot P = p
equals 1/n for every p ∈ {1, 2, . . . , n}. For n = 0 we have c0 = 0 and for n > 0 we we
obtain the recursion
n
X
cn = E [tA (Quicksort)] = n − 1 + E [ tA (Quicksort) | P = p] Pr [P = p]
p=1
n
X
=n−1+ (E [ tL (Quicksort) | P = p] + E [ tR (Quicksort) | P = p])Pr [P = p]
p=1
n
X 1
=n−1+ (cp−1 + cn−p )
n
p=1
n−1
2X
=n−1+ ck
n
k=0

where the equality E [ tL (Quicksort) | P = p] = cp−1 for L (and analogously for R) de-
pends critically on Lemma 2.3.
We “guess” aP solution for cn and verify by induction that it satisfies the recursion
cn = n − 1 + 2/n n−1 k=0 ck . Our guess is ck = 2(k + 1)Hk − 4k for any k ∈ {0, 1, . . . , n}.
The base case is k = 0. There, Quicksort does not perform any comparison at all,
i.e., c0 = 0. Using H0 = 0 yields the base case.
For the inductive case we have that Pthe claim is true for any k ≤ n − 1 and we will
prove it for k = n. Using the identity n−1 k=0 (k + 1)Hk = n/2(nHn − n/2 − 3/2 + Hn ) we
find
n−1 n−1
2X 2X
cn = n − 1 + ck = n − 1 + (2(k + 1)Hk − 4k)
n n
k=0 k=0
 
n 3
= n − 1 + 2 nHn − − + Hn − 4(n − 1) = 2(n + 1)Hn − n − 3 − 3(n − 1)
2 2
= 2(n + 1)Hn − 4n

and the result is established.

Second proof of Theorem 2.2. We assumed that the original sequence is a permutation of
(1, 2, . . . , n). So, for any i < j ∈ {1, 2, . . . , n} let the random variable Xij be equal to one
if i and j are compared during the course Pn−1ofP the algorithm and zero otherwise. The total
number of comparisons is hence X = i=1 nj=i+1 Xij . Thus, by
 
n−1
X n
X n−1
X n
X n−1
X n
X
cn = E [X] = E  Xij  = E [Xij ] = Pr [Xij = 1]
i=1 j=i+1 i=1 j=i+1 i=1 j=i+1

we have to derive the probability that i and j are compared.


First observe that each element will be pivot element in the course of the algorithm
exactly once. Thus the input A induces a unique sequence P = (P1 , P2 , . . . , Pn ) of pivots.
Now fix i and j arbitrarily. When will these elements be compared? We claim that it
will be the case if and only if either i or j is the first pivot in the sequence P from the set
{i, . . . , j}. If i and j are compared, then either one must be the pivot and they must be
in the same sequence. Thus all previous pivots (if any) must be smaller than i or larger
than j, since i and j would end up in different subsequences, otherwise. Hence, either i or

13
j is the first pivot in the set {i, . . . , j}. The converse direction is trivial: if one of i or j is
the first pivot from the set {i, . . . , j}, then i and j are still in the same sequence and will
hence be compared.
What is the probability that, say, i is the first pivot of {i, . . . , j}? Consider the sub-
sequence S induced by the elements {i, . . . , j} on A. The crucial observation is that S is
also a subsequence of P . Hence i is the first pivot from the set {i, . . . , j} if and only if i is
the first element from that set in A. Since A is uniformly distributed the probability for
that event is exactly 1/(j − i + 1). Analogous reasoning for j yields the overall probability
Pr [Xij = 1] = 2/(j − i + 1).
This allows us to calculate
n−1 n n−1 n n−1
X n−i+1
X X X X 2 X 2
cn = Pr [Xij = 1] = =
j−i+1 k
i=1 j=i+1 i=1 j=i+1 i=1 k=2
n n+1−k n n
X X 2 X 2 X 2
= = (n + 1 − k) = (n + 1) − 2(n − 1)
k k k
k=2 i=1 k=2 k=2
= 2(n + 1)Hn − 4n
and the proof is complete.

So we have shown that the expected number of comparisons of Quicksort is O (n log n).
We can even strengthen the statement: The number of comparisons is O (n log n) with high
probability, i.e., with probability that tends to one as n tends to infinity.
Theorem 2.4. For any c > 1 we have that
4
Pr [tA (Quicksort) ≤ cE [tA (Quicksort)]] ≥ 1 − .
(c − 1)2 Hn
Proof. Our goal is to apply Chebyshev’s inequality: Pr [|X − E [X] | ≥ t] ≤ Var [X] /t2 ,
where, as before X denotes the number of comparisons.
Suppose that we can prove Var [X] = 2nE [X] + 8n2 then we have
Var [X]
Pr [X > cE [X]] ≤ Pr [|X − E [X] | > (c − 1)E [X]] ≤
(c − 1)2 E [X]2
2 2 4
= 2
+ 2 2

2(c − 1) Hn (c − 1) Hn (c − 1)2 Hn
which is what we wanted to show.
As before, Xij is equal to one if the elements i and j are compared during the course
of the algorithm, zero otherwise.
  !
X X
Var [X] = E X 2 − E [X]2 = E  Xk`  − E [X]2
 
Xij 
i<j k<`
X X X
= E [Xij ] + 2 E [Xij Xik ] + E [Xij Xk` ] − E [X]2
i<j i<j,k i<j, k<`
k6=i, `6=j
X X
≤ E [X] + 2 E [Xij Xik ] + E [Xij ] E [Xk` ] − E [X]2
i<j,k i<j, k<`
k6=i, `6=j
X
≤ E [X] + 2 E [Xij Xik ] .
i<j,k

14
Now we have to bound E [Xij Xik ] for any combination of i < j and a free k. For any of
the cases i < j < k, i < k < j, and k < i < j we have
2
E [Xij Xik ] = E [ Xik | Xij ] Pr [Xij = 1] ≤ .
j−i+1
This yields
X X 2
E [Xij Xik ] ≤ (n − 1) ≤ 2n(n − 1)Hn
j−i+1
i<j,k i<j

and thus in total Var [X] ≤ E [X]+4(n−1)nHn ≤ 2nE [X]+8n2 which gives the result.

2.2 Quickselect
In the problem Select we are given a sequence a = (a1 , a2 , . . . , an ) of pairwise distinct
numbers and are asked to find the k-th smallest element, for some given k. We are again
interested in the number of comparisons “≤”. Let ta (alg, k) be the number of comparisons
of an algorithm alg given a sequence a and parameter k.
The idea of the algorithm Quickselect is, similarly to Quicksort, to choose some
element p from a, called the pivot, and divide a into two subsequences ` and r. The se-
quence ` contains the elements ai ≤ p and r those ai > p. Then Quickselect determines
which subsequence contains the desired element and is then called recursively on that
subsequence until the element is found.

Algorithm 2.2 Quickselect


Input. Sequence (a1 , a2 , . . . , an ), integer k

Output. Element b

(1) If n = 0 return.

(2) Otherwise let p = a1 . Let ` and r be two empty sequences.

(3) For i = 2, . . . , n, if ai ≤ p append ai to ` otherwise append ai to r.

(4) If |`| = k − 1 return b = p. Else, if |`| > k return Quickselect(`, k), otherwise return
Quickselect(r, k − |`|)

Worst-Case Analysis
Analogously to Quicksort, this variant of Quickselect may require Ω n2 compar-


isons.

Observation 2.5. There is a sequence a such that ta (Quickselect, n) = n(n − 1)/2.

Proof. Consider a = (1, 2, . . . , n) and k = n. Then, in step (3), ` remains empty while r
contains n − 1 elements. This step requires n − 1 comparisons. By induction, the recursive
call to Quickselect(r, n − 1) requires (n − 1)(n − 2)/2 comparisons “≤”. Thus, the whole
algorithm needs n − 1 + (n − 1)(n − 2)/2 = n(n − 1)/2 comparisons.

15
Average-Case Analysis
We study Quickselect also in the probabilistic model that each permutation of the origi-
nal sequence is equally likely. Then the expected number of comparisons of Quickselect
is O (n).

Theorem 2.6. For any k we have that E [tA (Quickselect, k)] ≤ 4n.

Proof. Again, without loss of generality we assume that the original sequence is itself a
permutation of (1, 2, . . . , n). Let cn,k be the expected number of comparisons of Quicks-
elect for the parameter k on a uniformly distributed permutation A of length n.
We give a recursion formula and then give an upper bound on the solution (by induc-
tion). Notice that Lemma 2.3 holds here, too, since Quickselect uses the same pivot-rule
as Quicksort. Since A is uniformly distributed the probability that the outcome of the
pivot P = p equals 1/n for every p ∈ {1, 2, . . . , n}. For n = 0 we have c0 = 0 and for n > 0
we we obtain the recursion
n
X
cn,k = E [tA (Quickselect, k)] = n − 1 + E [ tA (Quickselect, k) | P = p] Pr [P = p]
p=1

k−1
X
=n−1+ E [ tR (Quickselect, k − p) | P = p]
p=1

n
X
+ E [ tL (Quickselect, k) | P = p] Pr [P = p]
p=k+1
 
k−1 n
1 X X
=n−1+ cn−p,k−p + cp,k 
n
p=1 p=k+1

where the equality E [ tL (Quickselect) | P = p] = cp,k for L (and analogously for R)


depends critically on Lemma 2.3.
We guess that cn,k ≤ 4n for any k and verify
   
k−1 n k−1 n
1 X X 1 X X
cn,k = n − 1 + cn−p,k−p + cp,k  ≤ n − 1 +  4(n − p) + 4p
n n
p=1 p=k+1 p=1 p=k+1
 
4 (n − k)(n − k + 1) k(k + 1)
=n−1+ n(n − 1) − −
n 2 2
2
 
4 n
≤n−1+ n(n − 1) − = n − 1 + 4(n − 1) − n
n 4
≤ 4n

and the result is established.

16
Chapter 3

Knapsack

This chapter is concerned with the Knapsack problem. This problem is of interest in
its own right because it formalizes the natural problem of selecting items so that a given
budget is not exceeded but profit is as large as possible. Questions like that often also
arise as subproblems of other problems. Typical applications include: option-selection in
finance, cutting, and packing problems. See Example 1.1 for an illustration.
In the Knapsack problem we are given a capacity c and n items. Each item i comes
along with a profit pi and a weight wi . We are asked to choose a subset of the items as
to maximize total profit but the total weight not exceeding c. Let xj ∈ {0, 1} indicate if
a certain item j is included in the solution or not.
In the sequel let p = (p1 , . . . , pn ), w = (w1 , . . . , wn ), and x =P(x 1 , . . . , xn ) denote the
n n
respective vectors. The profit of a vector x ∈ {0, P 1} is val(x) = j=1 pj xj . The weight of
a vector x ∈ {0, 1}n is given by weight(x) = nj=1 P wj xj . In order to obtain a non-trivial
problem we assume wj ≤ c for all j = 1, . . . , n and nj=1 wj > c throughout.

Problem 3.1 Knapsack


X
maximize val(x) = pj xj
j
X
subject to wj xj ≤ c,
j

xj ∈ {0, 1} j = 1, . . . , n.

Knapsack is NP-hard which means that “most probably”, there is no polynomial time
optimization algorithm for it. We will consider two stochastic variants of the problem: In
Section 3.1 the profits of the items are random, in Section 3.2 the weights are.

3.1 Stochastic Profits


3.1.1 Greedy Algorithm
The idea behind the following Greedy algorithm is to successively choose items that yield
most profit per unit weight. The ratio pj /wj is called the efficiency of item j. We may
assume that the itemsPkare given in non-increasing order of efficiency. The smallest item
number k such that j=1 pj > c is called the break item, i.e., the first one that does not

17
fit into the knapsack anymore. Common sense suggests the following simple algorithm:
xj = 1 for j = 1, . . . , k − 1, xj = 0 for j = k, . . . , n.
Unfortunately, the approximation ratio of this algorithm can be arbitrarily bad as the
example below shows. The problem is that more efficient items can “block” more profitable
ones.
Example 3.1. Consider the following instance, where c is a sufficiently large integer.
j pj wj pj /wj
1 1 1 1
2 c−1 c 1 − 1/c
The algorithm chooses item 1, i.e., the solution x = (1, 0) and hence val(x) = 1. The
optimum solution is x∗ = (0, 1) and thus val(x∗ ) = c − 1. The approximation ratio of the
algorithm is 1/(c − 1), i.e., arbitrarily bad (small). However, this natural algorithm can
be turned into a 1/2-approximation from a worst-case perspective and even better from
an average-case point of view.

Algorithm 3.1 Greedy


Integer c, vectors p, w ∈ Nn with wj ≤ c,
P
Input. j wj > c, and p1 /w1 ≥ · · · ≥ pn /wn .

Output. Vector x ∈ {0, 1}n such that weight(x) ≤ c.


Pj
(1) Define k = min{j ∈ {1, . . . , n} : i=1 wi > c}.

(2) Let x = (1k−1 , 0n−k+1 ) and y = (0k−1 , 1, 0n−k ).

(3) Return x if val(x) ≥ val(y); otherwise y.

Worst-Case Analysis
The result below states that this modified algorithm is a 1/2-approximation. It is an
exercise to show that this bound is tight.
Theorem 3.2. We have that r(Greedy) ≥ 1/2.
Proof. In Problem 3.1, let us replace the constraints xj ∈ {0, 1} with xj ∈ [0, 1]. This yields
that the optimum solution of the modified problem can only be larger than the optimum
of the original problem. Such a modification is called relaxation of the constraints.

Problem 3.2 Fractional Knapsack


X
maximize val(x) = pj xj
j
X
subject to wj xj ≤ c,
j

xj ∈ [0, 1] j = 1, . . . , n.

The optimal solution of this problem has the following structure. The proof of the
observation below is an exercise.

18
Observation 3.3. Let p, w, ∈ Nn be non-negative integral vectors with
p1 p2 pn
≥ ≥ ··· ≥ .
w1 w2 wn
The break index is defined as
j
( )
X
k = min j ∈ {1, . . . , n} : wi > c .
i=1

Then an optimum solution for the Fractional Knapsack problem is given by

zj∗ = 1 for j = 1, . . . , k − 1,
c − k−1
P
∗ i=1 wi
zj = for j = k, and
wk
zj∗ = 0 for j = k + 1, . . . , n.

Notice that z ∗ has the form z ∗ = (1, . . . , 1, α, 0, . . . , 0) for some value 0 ≤ α < 1 at the
break index. The Greedy algorithm constructs two solutions x and y. Observe that x
has the form x = (1, . . . , 1, 0, 0, . . . , 0), i.e., equal to z ∗ except it has α = 0 at the break
index. Furthermore y = (0, . . . , 0, 1, 0, . . . , 0) with only the break item taken. The value
obtained by the Greedy algorithm is equal to max{val(x), val(y)}.
Let x∗ be an optimum solution for the Knapsack instance. Since every solution
that is feasible for the Knapsack instance is also feasible for the respective Fractional
Knapsack instance we have that val(x∗ ) ≤ val(z ∗ ).
In total we have

val(x∗ ) ≤ val(z ∗ ) = val(x) + αpk ≤ val(x) + val(y) ≤ 2 max{val(x), val(y)}

which implies the approximation ratio of 1/2.

Average-Case Analysis
On random instances (and sufficiently large capacity c), Greedy is usually much better
than 1/2-approximate. Indeed, the algorithm approximates the better, the larger the
capacity is. In the sequel let the weights wj be arbitrary, i.e., adversarial and the profits
Pj ∼ Uni (0, 1) be uniform distributed. It is important to note that the adversary specifies
the weights before the profits materialize.
Theorem 3.4. We have that e(Greedy) ≥ 1 − O c−1 .


Proof. Let p be an outcome of the random profits and let them be ordered according to
efficiency. Consider the solution-candidate x of Greedy and z ∗ the optimal solution for
Fractional Knapsack. Analogously to the deterministic case we have
val(x) val(x) val(x) αpk 1
≥ ≥ =1− ≥1− .
val(x∗ ) val(z ∗ ) val(x) + αpk val(x) val(x)
Taking expectations yields e(Greedy) ≥ 1 − E [1/val(X)]. Thus it suffices to show that
E [1/val(X)] = O c−1 . Consider the following solution s: Sort the items according to


weight and include exactly the c smallest items. It is important to note that this solution
is constructed without looking at the profits. We prove:

19
(1) For any outcome of the profits we have val(s) ≤ val(x).
(2) E [1/val(X)] ≤ E [1/val(s)] = O c−1 .


These properties immediately imply the result.


Consider an outcome of the pj and let S and X denote the sets associated with the
vectors s and x. Let p(S) and p(X) denote the respective profits. For (1) we have to show
p(S) ≤ p(X). First of all observe that S ⊇ X is impossible. Why? The set S consists of
exactly c items, while X consists of at least c items as all the weights are at most one.
Case 1. If S ⊆ X then x has all the items s has (and possibly some more). Since the
profits are non-negative this yields p(S) ≤ p(X).
Case 2. Otherwise, there are i and j such that si = 1 and xi = 0 and sj = 0 and xj = 1.
Since
wj ≥ max{w` : ` ∈ S} ≥ wi
and  
pj p` pi
≥ min :`∈X ≥
wj w` wi
we have that pj ≥ pi wj /wi ≥ pi . This means that the item j Greedy has chosen is
more profitable than the item i it has not. Now remove j from X and i from S and
repeat the argument until Case 1 holds.
Chernoff bound: Let X1 , . . . , Xn ∈ {0, 1} be inde-
The main tool to prove (2) is the P
pendent indicator variables. For X = i Xi and any δ ≥ 0 it holds that
 2 
δ E [X]
Pr [X ≥ (1 + δ)E [X]] ≤ exp − .
3
Let the c items in the set S be numbered 1, 2, . . . , c. Fix some number Pk and define
that Xj,k equals one if Pj ≤ 1/6k and zero otherwise. Further define Xk = cj=1 Xj,k . We
obviously have that p(S) ≤ c/12k implies Xk ≥ c/2.
      h
1 1 1 c ci
E ≤E ≤E p(S) ≥ Pr p(S) ≥
val(X) val(s) p(S) 12 12
∞    
X 1 c c c c
+ E ≤ p(S) ≤ Pr ≤ p(S) ≤
p(S) 12k 12(k − 1) 12k 12(k − 1)
k=2
∞ ∞
  !
12 X 12k c 12 X h ci
≤ + Pr p(S) ≤ ≤ 1+ kPr Xk ≥
c c 12(k − 1) c 2
k=2 k=2

!
12 X h ci
≤ 1+ 2kPr Xk ≥ .
c 2
k=1

Here we apply the Chernoff bound: Using E [Xk ] = c/6k we find


(c/6k)(3k − 1)2
   
h ci ck
Pr Xk ≥ = Pr [Xk ≥ (1 + 3k − 1)E [Xk ]] ≤ exp − ≤ exp − .
2 3 2
With this bound we have that
  ∞  !
1 12 X ck
= O c−1

E ≤ 1+ 2k exp −
val(X) c 2
k=1

and the result is proved.

20
3.1.2 Nemhauser-Ullmann Algorithm
In this section we will give an average-case analysis of an elegant enumeration algorithm,
named after its authors, Nemhauser-Ullmann. This algorithm computes an optimal
solution for the Knapsack problem. Not surprisingly, since Knapsack is NP-hard, the
algorithm has exponential running time in the worst case. However, under some rather
weak assumptions on profit distributions, the algorithm has polynomial expected running
time. For sake of exposition, we restrict our attention to the (important) special case that
profits are uniformly distributed.
One (stupid) way of solving Knapsack is to enumerate all subsets of the set of items,
check the feasibility of the current candidate, and keep track of the most profitable subset.
However, this approach has always exponential running time and enumerates a lot of
“uninteresting” subsets. Why? If we know (for some reason) that a certain subset is not
optimal, then all its subsets will also not be optimal. Hence it makes no sense enumerating
them. A more clever way will be introduced now.
The algorithm uses the following dominance criterion. First of all, we may assume
without loss of generality that for two solutions x 6= y we have val(x) 6= val(y) or
weight(x) 6= weight(y). Why? Since these solutions are equivalent otherwise, we lose
nothing by droping one of them. Let x and y be two solutions. We say that y is dominated
by x if
val(x) ≥ val(y) and weight(x) ≤ weight(y).
A solution x is called dominating if there is no solution y 6= x that dominates x. The
intuition is that the solutions that are dominated are redundant: We already have a
solution that is not heavier and at least as profitable. The following observation is the
basis of the algorithm.
Observation 3.5. An optimal solution is to be sought among the dominating solutions.
Let Sj be the set of all dominating solutions only considering the items {1, . . . , j}, i.e.,
the solutions are of the form x ∈ {0, 1}j 0n−j . For any two solutions x and y define the
operation z = x ⊕ y that sets zj = 1 if xj = 1 or yj = 1, and zj = 0 otherwise. Further
define the special solution 1j which is everywhere zero except at item j, where it has the
value one.
Algorithm 3.2 Nemhauser-Ullmann
Integer c, vectors p, w ∈ Nn with wj ≤ c, and
P
Input. j wj > c.

Output. Vector x ∈ {0, 1}n such that weight(x) ≤ c.

(1) Let S0 = {0}.

(2) For j = 1, . . . , n let Tj = {x ⊕ 1j : x ∈ Sj−1 } and

Sj = Sj−1 ∪ Tj − {y ∈ Sj−1 ∪ Tj : y is dominated by some x ∈ Sj−1 ∪ Tj }.

(3) Return the most profitable solution x∗ ∈ S0 ∪ · · · ∪ Sn with weight(x∗ ) ≤ c.

The running time obviously depends on the cardinalities of the sets S0 , S1 , . . . , Sn . For
any j let qj be an upper bound on Sj . We may assume that q0 ≤ q1 ≤ · · · ≤ qn . It is an
exercise to show that the computation of Sj can be implemented to run in time O (qj ).
This yields the following bound.

21
Lemma 3.6. Let I be an instance of Knapsack with n items. We have that

tI (Nemhauser-Ullmann) = O (nqn ) ,

where qn is any upper bound on |Sn | of the instance I.

Worst-Case Analysis
In the worst case, the running time of Nemhauser-Ullmann is exponential. The proof
is left as an exercise.

Theorem 3.7. We have that t(Nemhauser-Ullmann) = Ω (2n ).

Average-Case Analysis
Here we give an average-case analysis using the following probabilistic model. It causes
no loss of generality to assume that the weights and profits are chosen between zero and
one. Specifically, an adversary chooses weights wj ∈ [0, 1] arbitrarily and the profits are
independent and uniformly distributed Pj ∼ Uni (0, 1). It is important to note that the
adversary specifies the weights without knowing the outcomes of the Pj . Then the expected
running time of Nemhauser-Ullmann is polynomial.

Theorem 3.8. We have that t(Nemhauser-Ullmann) = O n4 .




Let the random variable Qj denote the number of dominating sets using the elements
{1, . . . , j}, i.e., the cardinality of the set Sj . The idea of the proof below is to show
E [Qn ] = O n3 .

Proof of Theorem 3.8. Let m = 2n and define x1 , x2 , . . . , xm as the sequence of all vectors
x ∈ {0, 1}n according to non-decreasing weight, i.e., weight(x1 ) ≤ · · · ≤ weight(xm ), and
those with equal weight, with decreasing profit.
Let A0 = ∅ and for any k ≥ 1 define the set Ak = {x1 , . . . , xk } and the number

δk = max val(x) − max val(x).


x∈Ak x∈Ak−1

The crucial observation is that xk is dominating if and only if δk > 0. Why? First of
all, since Ak ⊇ Ak−1 we have δk ≥ 0. For δk > 0 the solution xk is not dominated by
any y ∈ Ak−1 since val(xk ) > val(y). It is also not dominated by any y ∈ Am − Ak since
at least weight(xk ) < weight(y) or val(xk ) > val(y) is true. Thus, if δk > 0, then xk is
dominating. Conversely, if δk = 0 there is a solution y ∈ Ak−1 with val(xk ) = val(y) and
weight(xk ) ≥ weight(y), i.e., xk is dominated by y. Pk
Furthermore observe that we can write val(xk ) = `=1 δ` for any k. For random
profits, δk is actually a random variable ∆k . The following intermediate result (proof
below) is the key: It essentially states that if a solution is dominating, i.e., ∆k > 0, then
its increase in profit is reasonably large.
Lemma 3.9. It holds that E [ ∆k | ∆k > 0] ≥ 1/(32n2 ).

22
Observe that E [val(xm )] = n/2. Using Lemma 3.9 we can furthermore write
"m # m
X X
E [val(xm )] = E ∆k = E [∆k ]
k=1 k=1
m
X
= E [ ∆k | ∆k > 0] Pr [∆k > 0]
k=1
m
X 1
≥ Pr [∆k > 0]
32n2
k=1

so we have obtained
m
X
Pr [∆k > 0] ≤ 32n2 E [val(xm )] = 16n3 .
k=1

We are interested inPE [Qn ]. Let Dk be an indicator variable for the event ∆k > 0. Thus
we can write Qn = m k=1 Dk and hence
"m # m
X X
E [Qn ] = E Dk = Pr [∆k > 0] ≤ 16n3
k=1 k=1

which yields the theorem (provided that Lemma 3.9 is true).

Proof of Lemma 3.9. For ease of notation we will drop unnecessary subscripts: Let x = xk
be some solution in the sequence x1 , x2 , . . . , xm as defined above and let ∆ = ∆k be the
associated random variable to δ = δk . We will use this notation below: For two solutions
x and y define z = x y so that zj = 1 if xj = 1 and yj = 0, and zj = 0 otherwise.
We bound E [ ∆ | ∆ > 0] from below by
 
1 1
E [ ∆ | ∆ > 0] ≥ Pr ∆ ≥ 2
∆>0 ·
16n 16n2
and thus it suffices to show that
 
1 1
Pr ∆ ≥ 2
∆>0 ≥ .
16n 2
Fix a solution x = xk and let x1 , . . . , xk−1 be its predecessors in the sequence of
solutions. Further define a` = x x` and b` = x` x. Now we look at the probability we
are interested in:
   
1 1
Pr ∆ ≥ ∆ > 0 = Pr ∀` < k val(x) − val(x` ) ≥ ∀` < k val(x) > val(x` )
16n2 16n2
 
1
= Pr ∀` < k val(a` ) ≥ val(b` ) + ∀` < k val(a` ) ≥ val(b ` )
16n2
 
1
= Pr ∀` < k val(b` ) ≤ val(a` ) − ∀` < k val(b` ) ≤ val(a` )
16n2

Without loss of generality, the items are indexed such that x ∈ {0, 1}d 0n−d for some d.
This yields that a` ∈ {0, 1}d 0n−d and b` ∈ 0d {0, 1}n−d .
Now fix the realizations of the profits p1 , . . . , pd , i.e., we have fixed the values of x
and a` for any ` < k. The remaining profits pd+1 , . . . , pn may still be varied. Recall the

23
definition of conditional probabilities Pr [ A | B] = Pr [A ∩ B] /Pr [B] for any two events A
and B with Pr [B] > 0. Consider the following sets (that will correspond to events later):
 
n−d 1
A = (pd+1 , . . . , pn ) ∈ [0, 1] : ∀` < k val(b` ) ≤ val(a` ) −
16n2
n o
B = (pd+1 , . . . , pn ) ∈ [0, 1]n−d : ∀` < k val(b` ) ≤ val(a` )

We obviously have A ⊆ B and thus for each fixed realization of p1 , . . . , pd that


 
1 Pr [A ∩ B] vol(A)
Pr ∆ ≥ 2
∆ > 0, p1 , . . . , pd = = .
16n Pr [B] vol(B)

Let 0 ≤ ε ≤ 1 and define the Bε as the set B scaled by 1 − ε


n o
Bε = (pd+1 , . . . , pn ) ∈ [0, 1 − ε]n−d : ∀` < k val(b` ) ≤ (1 − ε)val(a` ) .

Now choose ε such that Bε ⊆ A and we find

vol(A) vol(Bε ) (1 − ε)n−d vol(B)


≥ = = (1 − ε)n−d ≥ 1 − ε(n − d).
vol(B) vol(B) vol(B)

In the above reasoning the outcomes p1 , . . . , pd where fixed. We derived a bound on


the desired probability conditional on these values and a fixed value for ε. Now we have
to treat the P1 , . . . , Pd as random variables but we still keep ε fixed. Hence, then, Bε and
A are random sets. Assume for the moment that the choice ε = 1/(4n) yields Bε ⊆ A
with probability at least 3/4. Then we have
    
1 1
Pr ∆ ≥ ∆ > 0 = E Pr ∆ ≥ ∆ > 0, P1 , . . . , Pd
16n2 16n2
   
vol(A) vol(Bε )
=E ≥E A ⊆ Bε Pr [A ⊆ Bε ]
vol(B) vol(B)
 
n−d 3 3 3 1
≥ 1− · ≥ · ≥
4n 4 4 4 2

and it only remains to show that the choice of ε = 1/(4n) has the desired property.
For any a` with val(a` ) ≥ 1/(4n) and this ε we have

val(a` ) 1
val(a` )(1 − ε) = val(a` ) − ≤ val(a` ) − .
4n 16n2
Thus, whenever val(a` ) ≥ 1/(4n), any member of Bε is also a member of A, which means

24
that Bε ⊆ A. This property allows us to finalize the proof:
 
1
Pr [Bε ⊆ A] ≥ Pr val(a` ) ≥ ∀` < k val(b` ) ≤ val(a` )
4n
 
1
= 1 − Pr val(a` ) ≤ ∀` < k val(b` ) ≤ val(a` )
4n
 
1
≥ 1 − Pr ∃j ∈ {1, . . . , d} : Pj ≤ ∀` < k val(b` ) ≤ val(a` )
4n
d  
X 1
≥1− Pr Pj ≤ ∀` < k val(b ` ) ≤ val(a ` )
4n
j=1
d  
X 1 d
≥1− Pr Pj ≤ =1−
4n 4n
j=1
3
≥ .
4
Above, Pr [ Pj ≤ 1/(4n) | ∀` < k val(b` ) ≤ val(a` )] ≤ Pr [Pj ≤ 1/(4n)] is the only location
in the whole proof where we use that the profits are uniform distributed.

3.2 Stochastic Weights


In this section we consider a variant of the problem where the weights Wj are random vari-
ables. The only assumption is that the Wj are independent (and of course non-negative).
In order to obtain a truly different problem than before, we consider the following
non-clairvoyant setting: We learn the actual outcome wj of the random variable Wj only
upon placing it into the knapsack. If we attempt to insert j and the items fits within our
current residual capacity, it is accepted irrevocably, otherwise it is rejected. The process
ends when the knapsack overflows. The profit pj of each item is known beforehand and
the objective is still to maximize the total profit of accepted items.
It causes no loss of generality to assume that the capacity of the knapsack is equal to
one. The set J = {1, . . . , n} denotes the set of items. An actual instance of the problem
is the vector p of profits and the vector of probability distributions W for the weights.
Below, adaptive algorithms are allowed to adjust the choice of the next candidate,
while non-adaptive algorithms have to specify their ordering in advance. This makes some
difference as the following example illustrates.

Example 3.10. The capacity is c = 1, n = 3, and the profits are p = (1, 1, 1). The weight
distributions are W = (W1 , W2 , W3 ) with
( (
1/5 probability 1/2 2/5 probability 1/2
W1 = , W2 = 4/5 probability 1, W3 = .
3/5 probability 1/2 9/10 probability 1/2

If we have to specify the ordering at which we try the items in advance, then the best
ordering is (1, 2, 3). This yields an expected profit of 3/2.
If we are allowed to choose depending on the current situation in our knapsack, we
might want to do as follows: We first insert item 1. In case W1 = 1/5 we insert item
2. Otherwise we insert item 3. Looking at the respective decision tree yields that our
expected profit is 7/4.

25
We further want to motivate that there are actually applications (in scheduling) for
such a setting. Suppose we are given a set of jobs and we can obtain a certain profit pj
from each job j. The actual duration of the job wj is not entirely known. We rather know
the probability distribution for the duration. There is a fixed deadline (i.e., the capacity of
the knapsack) and we seek to maximize the profit of the jobs finished before the deadline.
We will consider two types of algorithms, depending on the point in time they have
to choose the item that is tried next. The more powerful ones are adaptive algorithms:
Given the set I ⊆ J that is currently in the knapsack and its residual capacity 0 ≤ c ≤ 1,
an algorithm Π determines an item j = Π(I, c) as the next candidate. We can thus think
of such an algorithm as a mapping Π : 2J × [0, 1] → J specifying the next item to insert.
Notice that the profit of such an algorithm, denoted val(Π) is a random variable. We
denote by
adapt(p, W ) = max E [val(Π)]
Π

the maximal expected profit achievable by adaptive algorithms for the instance (p, W ).
By contrast, a non-adaptive algorithm π has to specify the ordering in which items are
tried in advance, i.e., the algorithm chooses a permutation π of J. We denote by

non-adapt(p, W ) = max E [val(π)]


π

the maximal expected profit achievable by non-adaptive algorithms for the instance (p, W ).
Since non-adaptive are special cases of adaptive algorithms we always have the inequal-
ity
adapt(p, W ) ≥ non-adapt(p, W ).
For any non-adaptive algorithm Alg, that chooses the permutation π, say, and achieves
alg(p, W ) = E [val(π)], the quantity

alg(p, W )
α(Alg) = sup
(p,W ) adapt(p, W )

is called the adaptivity gap of the algorithm.


We begin the exposition in Section 3.2.1 with a bound on the expected profit of optimal
adaptive algorithms. We continue in Section 3.2.2 and give a non-adaptive algorithm that
achieves 7/32 ≈ 1/5 of the expected profit of any optimal adaptive algorithm.
We need some additional definitions:
h i The truncated weight of an item j is W̃j =
min{1, Wj } and denote µj = E W̃j . This quantity is more useful than the actual weight
of an item, since any outcome wj > 1 yields rejection of the item P j.
As usual denote for any x ∈ P {0, 1}n the profit val(x) = j pj xj , the weight (now
a
P random variable) weight(x) = j W x
j j , and the mean truncated weight mass(x) =
j µj xj , also called the mass of x.
It is an exercise to prove the following bound.

Lemma 3.11. For any x ∈ {0, 1}n we have Pr [weight(x) ≤ 1] ≥ 1 − mass(x).

3.2.1 Adaptive Algorithms


Since adaptive algorithms Π can be seen as mappings Π : 2J × [0, 1] → J it is not clear
that an optimal adaptive policy can be described using polynomial space only. However, a
simple relaxation similar to the Fractional Knapsack problem can be used to obtain
an upper bound on the expected profit attainable by any adaptive algorithm.

26
Fix an adaptive policy Π and let X denote the (random) items that are inserted
successfully. If the weights are deterministic it is clear that mass(X) ≤ 1. Perhaps
surprisingly, in a stochastic setting we can have E [mass(X)] ≥ 1.

Example 3.12. Suppose we have an infinite number of items, where each profit is one
and the weight is one with probability p and zero otherwise. Then the expected number of
items we insert successfully is 2/p−1 since we can continue until two of them instantiate to
unit weight. Each item j has mean truncated weight µj = p and thus E [mass(X)] = 2 − p,
which can be arbitrarily close to 2.

However, the mass we try to insert can never be more than two in expectation. The
idea of proof of the following lemma is the observation that each W̃j is bounded by one
and we can count at most one item beyond the capacity of one.

Lemma 3.13. For any instance (with knapsack capacity equal to one) and any adaptive
policy, let X denote the (random) vector of items the algorithm attempts to insert. Then
E [mass(X)] ≤ 2.

With this key-lemma, we can bound the expected profit of an optimal adaptive al-
gorithm. Define by vj = pj Pr [Wj ≤ 1] the effective value of item j which is an upper
bound on the expected profit an algorithm can gain if it attempts to insert j. Now we
parametrize the Fractional Knapsack relaxation we already know with a value t for
the capacity of the knapsack:
 
X X 
Φ(t) = max vj xj : µj xj ≤ t, xj ∈ [0, 1] .
 
j j

With this formulation we can prove a bound on the expected profit of any optimal adaptive
algorithm.

Theorem 3.14. For any instance (p, W ) it holds that adapt(p, W ) ≤ Φ(2).

Proof. Fix an adaptive policy Π and let x denote the (random) vector of items that Π
attempts to insert into the knapsack. Further let the vector y be yj = Pr [xj P
= 1].
Given y, the expected mass that Π attempts to insert is E [mass(x)] = j µj yj . On
the other hand, Lemma 3.13 yields that E [mass(x)] ≤ 2. But this in turn means that the
vector y is feasible for the problem Φ(2) and we find the bounds
X
Φ(2) ≥ vj yj ≥ E [val(Π)]
j

which yields the claim.

3.2.2 Non-Adaptive Algorithms


In the previous section we gave a bound on the largest possible value of any adaptive
algorithm, i.e., including an optimal one. However, the question within this section is if
also a much simpler non-adaptive algorithm can obtain “good” solutions.
Indeed, we show that a natural greedy approach yields a solution with expected profit
7/32 ≈ 1/5 of that of an optimal adaptive algorithm. We give a randomized algorithm for
sake of exposition, but it can easily be derandomized.

27
Consider the function Φ(t), which can be seen as the Fractional Knapsack relax-
ation with capacity t. The usual Greedy algorithm considers the items in decreasing
order of expected effective efficiency vj /µj :
v1 vn
≥ ··· ≥ .
µ1 µn
Pk
Define mk = j=1 µj and observe that we have for any t = mk−1 + ξ ∈ [mk−1 , mk ]

k−1
X vk
Φ(t) = vj + ξ.
µk
j=1

We may assume that Φ(1) = 1 by scaling the vjPby an appropriate factor. And we may
assume that there are enough items such that nj=1 µj ≥ 1 (which can be achieved by
adding items with profit zero and positive weight).
Now we define the randomized greedy Rgreedy algorithm: Let r be the break index,
i.e., the smallest index with rj=1 µj ≥ 1 and let µ0r = 1 − r−1
P P
j=1 j , i.e., the part of item r
µ
that fits within capacity one. LetPp = µr /µr , wr = pwr . For j = 1, . . . , r − 1 set wj0 = wj
0 0 0

and µr = µr . We assume Φ(1) = rj=1 wj0 = 1.


0

Algorithm 3.3 Rgreedy


Input. Integer r, vectors v 0 , µ0 , real p0 ∈ [0, 1]

Output. Vector x ∈ {0, 1}n such that weight(x) ≤ 1.

(1) Choose index k with probability wk0 .

(2) If k < r insert item k, i.e., xk = 1. If k = r, flip another independent coin and insert
item r, i.e., xr = 1 with probability p0 , otherwise discard it.

(3) Then insert the items 1, 2, . . . , k − 1, k + 1, . . . , r in the greedy order, i.e., for j =
1, 2, . . . , k − 1, k + 1, . . . , r set xj = 1 if the item j still fits.

Theorem 3.15. We have that α(Rgreedy) ≥ 7/32.


Pr Pr
Proof. For simplicity assume that j=1 µj = 1 and that Φ(1) = j=1 vj = 1. Then
adapt(p, W ) ≤ Φ(2) ≤ 2, but more strongly

adapt(p, W ) ≤ Φ(2) ≤ 1 + ω,

where ω = vr /µr ≤ ( rj=1 vj )/( rj=1 µj ) = 1. With rj=1 µj = 1, the algorithm has the
P P P
simpler form:

(1) Choose k ∈ {1, . . . , r} with probability vk and insert item k.

(2) Then insert items 1, . . . , k − 1, k + 1, . . . , r in the greedy order.

We estimate the expected value achieved by this algorithm, with respect to the ran-
domization of the weights and with respect to our own randomization. Fix any k. The
Pk−1k is inserted first with probability vk , inserted after {1, . . . , k − 1} with probability
item
j=1 vj , and after {j, 1, . . . , k − 1} with probability vj for any k < j ≤ r. If k is the first

28
item, then its expected profit is vk = pk Pr [Wk ≤ 1]. Otherwise we use Lemma 3.11, i.e.,
a variant of Markov’s inequality:
 
X k Xk
Pr [item k fits] = Pr  Wj ≤ 1 ≥ 1 −
 µj .
j=1 j=1

P by {j, 1, . . . , k P
Similarly for the case when k is preceded − 1}.
With the obvious observation 1 − kj=1 µj ≥ vj (1 − kj=1 µj ) for any j, be denote by
ck the following lower bound on the expected profit of item k:
 
k−1
X k
X X r Xk
ck = vk vk + vj (1 − µ` ) + (1 − µ` − µj )
j=1 `=1 k+1 `=1
 
r
X k
X k
X r
X
= vk  vj (1 − µ` ) + vk µ` − vj µj  .
j=1 `=1 `=1 `=k+1
Pr Pr Pr
It holds that rgreedy(p, W ) ≥ k=1 ck and using j=1 vj = 1 and j=1 µj = 1 we
simplify the bound to
 
r
X k
X k
X r
X
rgreedy(p, W ) ≥ vk 1 − µj + vk µj − wj µj 
k=1 j=1 j=1 j=k+1
X X
=1+ (−vk µj + vk2 µj ) − vk vj µj
j≤k≤r k≤j≤r
X r
X
=1+ (−vk µj + vk2 µj + vk vj µj ) − vk vj µj
j≤k≤r j,k=1
X r
X
=1+ vk µj (vj + vk − 1) − vj µj .
j≤k≤r j=1

To symmetrize this polynomial, we apply the condition of the greedy ordering. For any
j < k we have vj + vk − 1 ≤ 0 and the greedy ordering implies vk µj ≤ vj µk , allowing us
to replace vk µj by (vk µj + vj µk )/2 for all pairs j < k. This yields
r r
1 X X X
rgreedy(p, W ) ≥ 1 + (vk µj + vj µk )(vj + vk − 1) + vj µj (2vk − 1) − vj µj
2
j≤k≤r j=1 j=1
r r r
1 X 1 X X
=1+ vk µj (vj + vk − 1) + vj µj (2vj − 1) − vj µj
2 2
j,k=1 j=1 j=1

Pr Pr
and using j=1 vj = j=1 µj = 1 gives

r r r
1 1X 2 X 2 X
≥ + vk + vk µk − vk µk .
2 2
k=1 k=1 k=1

We want to compare this expression to 1 + ω, where ω = min{vj /µj : j < r}. We use the

29
Pr 2
Pr
value of ω to estimate k=1 vk ≥ω k=1 vk µk and we obtain
r r r
1 ωX X
2
X
rgreedy(p, W ) ≥ + vk µk + vk µk − vk µk
2 2
k=1 k=1 k=1
r
1 X ω 
= + µk vk + vk − 1
2 2
k=1
r  2
1 X 1 ω
≥ − µk −
2 2 4
k=1

where the last inequality


P was obtained from minimizing a quadratic function in vk for each
k. Finally, with k µk = 1 we find

1 ω ω2
≥ + − .
4 4 16
We compare this to the adaptive optimum, which is bounded by 1 + ω and minimize for
ω ∈ [0, 1]:
rgreedy(p, W ) 1 ω2 7
≥ − ≥ .
adapt(p, W ) 4 16(1 + ω) 32
It remains to remove the assumption that j=1 µj = 1. We claim that if rj=1 µj > 1, the
Pr P
randomized greedy algorithm performs just like the simplified algorithmPr we 0just analyzed,
0 0
on a modified instance with values vj and mean sizes µj (so that j=1 µj = 1; see the
description of the algorithm). Indeed, Φ(1) = 1 and ω = vr /µr = vr0 /µ0r in both cases.
Hence the bound on adapt(p, W ) is the same. For an item k < r, our estimate of the
expected profit in both instances is
 
k−1
X k
X r
X k
X
ck = vk0 vk0 + vj0 (1 − µ0` ) + (1 − µ0` − µ0j ) .
j=1 `=1 k+1 `=1

For the original instance, this is because the expected contributions of item r to the total
weight, conditioned on being selected first, is p0 µr = µ0r . If it is not selected first it is not
counted at all. The expected profit for item r is cr = vr0 p0 vr = (vr0 )2 in both instances.
This reduces the analysis to the case we dealt with already, completing the proof of the
theorem.

30
Chapter 4

Completion Time Scheduling

4.1 Deterministic Scheduling


Suppose that we are given one machine for scheduling jobs. Any job j of the n given jobs J
has a priority, i.e., a weight wj and a processing requirement pj . We assume that executing
j takes time pj and that once we started the job we must execute it until termination, i.e.,
we say that preemption is not allowed. This implies that any schedule, i.e., assignment of
jobs to time intervals, can be seen as just an ordering, i.e., a list of the jobs. We denote
by Π(J) the set of permutations of J. The total weighted completion time of a list π is
X
val(π) = wj cj (π),
j

where cj (π) denotes the time when job j finishes under the ordering π. This objective
function measures the total (weighted) time the jobs have to wait in the system before
completion. Of course, our goal is to minimize the objective function.

Problem 4.1 Completion Time Scheduling


X
minimize val(π) = wj cj (π)
j

subject to π ∈ Π(J)

In order to get an intuition for the objective function and that the ordering indeed
plays a role, let us have a look at the following instance.

Example 4.1. There are n jobs J = {1, . . . , n}, all with weight wj = 1, and the following
processing times: p1 = p, p2 = · · · = pn = 1, where p is a large number. Consider the
orderings
π = (1, 2, . . . , n) and π ∗ = (2, . . . , n, 1).
We find val(π) = p + p + 1 + p + 2 + · · · + p + n − 1 = np + n(n − 1)/2 and val(π ∗ ) =
1 + 2 + · · · + n − 1 + n − 1 + p = n(n − 1)/2 + n − 1 + p. So if we choose p substantially larger
than n2 , then the ratio val(π)/val(π ∗ ) is arbitrarily close to n. We will see very soon how
to solve this problem in polynomial time (which will imply that π ∗ is optimal).

The above example indicates that one might want to schedule the jobs in increasing
order of processing time. For the weighted version it may be a good idea to schedule the

31
jobs in increasing order of the ratio pj /wj , called efficiency. So we may assume that the
jobs are given in the greedy order
p1 p2 pn
≤ ≤ ··· ≤ .
w1 w2 wn
For any list π denote by j ≤π k that job j is not after job k in π; analogously j <π k
for j before k. In any ordering π, whenever some job j is somewhere before a job k, i.e.,
j <π k, such that pj /wj > pk /wk , then j and k from an inversion. Notice that the greedy
ordering has no inversions.
Theorem 4.2. The greedy order is optimal for total weighted completion time scheduling.

Proof. We show that eliminating an inversion strictly improves the objective function. Let
there be an inversion, otherwise there is nothing to show. Then there is an inversion where
the involved job j directly precedes the other one k, say, i.e., the ordering π has the form
π = (. . . , j, k, . . . ). Swap j and k and leave the rest of the ordering unchanged yielding
the ordering π 0 = (. . . , k, j, . . . ). Now π 0 has exactly one inversion less than π.
We obviously have
val(π 0 ) = val(π) − wk pj + wj pk .
Since j and k from an inversion it holds that −wk pj + wj pk < 0 since pj /wj > pk /wk
completing the proof.

4.2 Stochastic Non-Clairvoyant Scheduling


In the preceding section we assumed that the processing requirements pj are given at the
outset. However, this assumption is not always justified. For example, in computer science
we usually have no idea how long algorithms, respectively programs we run usually take.
Instead, we may want to learn about the “typical” duration by recording the respective
data from earlier executions and base our scheduling decisions on that data.
In this section we are given the probability distribution Pj of the processing require-
ment of any job j. Our only assumptions will be that the jobs are independent and take
integer values. Notice that we learn the outcome pj of the random variable Pj only upon
completing the job j. And, as before, the priorities wj are given in advance and we are
not allowed to interrupt a job once begun.
So we are interested in algorithms that determine (in advance) an ordering of the jobs
and then schedule the jobs according to this. In the notation of Chapter 3, this is a
non-adaptive algorithm. In contrast to Chapter 3 we do not compare a non-adaptive with
an adaptive algorithm. Instead, we compare (in expectation) the solution our algorithm
produces directly to the optimal solution given the whole data, i.e., the wj and the pj in
advance.
More precisely, let Alg be an algorithm that chooses the ordering π, say, on an instance
P = (P1 , . . . , Pn ) and w = (w1 , . . . , wn ). For any outcome p of P denote by
X X X XX
alg(p, w) = val(π) = wj cj (π) = wj ( pk ) = wk pj
j j k≤π j j k>π j

the total weighted completion time of that algorithm. alg = alg(P, w) denotes the
induced random variable. Given any outcome p of P , let

opt = opt(p, w) = min val(π)


π∈Π(J)

32
be the optimal objective value. Let opt = opt(P, w) be the induced random variable and
Π∗ the respective random ordering. As mentioned already, we measure the quality of an
algorithm with the expected value of the approximation ratio
h alg i
e(alg) = E ,
opt
where the expectation is taken with respect to the processing time distribution.

Greedy Algorithm
Maybe the most natural algorithm is to determine the expected values µj = E [Pj ] and to
schedule the jobs in the ordering
µ1 µ2 µn
≤ ≤ ··· ≤ .
w1 w2 wn
Let this algorithm be called Egreedy.
The question is how good is this algorithm? Suppose that the algorithm scheduled the
job j before k since µj /wj ≤ µk /wk , but the outcome is such that pj /wj > pk /wk . This
means that the algorithm scheduled the jobs j and k in the wrong order. How large is the
contribution to the objective function of this error?
If we look at the proof of Theorem 4.2 we see that a pair of adjacent jobs that are
in the wrong order contribute wj pk − wk pj to the objective function. They are in wrong
position if and only if pj /wj > pk /wk . This suggests the following definitions: For any
pair j and k of jobs define
δjk = wj pk − wk pj ,
and (
1 if pj /wj > pk /wk ,
xjk =
0 otherwise.
The intuition here is that δjk measures the influence of the error, i.e., of the inversion of
j and k, to the objective function and xjk triggers if this error had occurred. Thus the
following lemma, whose proof is left as an exercise, comes to mind: Let π be an arbitrary
ordering and π ∗ be an optimal ordering for the instance p and w.

Lemma 4.3. We have val(π) = val(π ∗ ) + j k>π j δjk xjk .


P P

For technical reasons in the analysis below, we need a little generalized version of this
lemma. We alter the definition of δjk a little bit, but xjk stays the same: For a pair j and
k of jobs and any number β ≥ 1 define δjk = wk pj − βwj pk . This yields a similar result.

Lemma 4.4. For any β ≥ 1 we have val(π) ≤ βval(π ∗ ) + j k>π j δjk xjk .
P P

Proof. Define the variable yjk (π) that is one if k >π j and otherwise zero. We use yjk =
∗ = y (π ∗ ) as a shorthand. Observe that x
yjk (π) and yjk jk jk = 1 if and only if yjk = 1 and
∗ ∗ π ∗
yjk = 0. To see this recall that yjk = 0, i.e., k ≤ j implies pk /wk ≤ pj /wj and hence
δjk ≥ 0. Further note that we have yjk = 1 − ykj and yjk ∗ = 1 − y ∗ for j 6= k.
kj
For every list π it holds that
XX XX X
val(π) = wk pj = wk pj yjk (π) + wj pj
j k≥π j j k j

33
Using β ≥ 1 we calculate
XX
val(π) − βval(π ∗ ) ≤ ∗
wk pj (yjk − βyjk )
j k
XX
∗ ∗

= wk pj (yjk − βyjk ) + wj pk (ykj − βykj )
j k>π j
XX
∗ ∗

≤ wk pj (yjk − yjk ) − wj pk ((β − βykj ) − (β − βykj ))
j k>π j
XX
∗ ∗

= wk pj (yjk − yjk ) − βwj pk (yjk − yjk )
j k>π j
XX XX
∗ ∗
= (wk pj − βwj pk )(yjk − yjk )= δjk (1 − yjk )
j k>π j j k>π j
XX
= δjk xjk
j k≥π j

and the proof is complete.


We need additional notation before stating the result. Let j and k be two jobs with
associated weights wj and wk such that µj /wj ≤ µk /wk . We denote by ∆jk and Xjk
the induced random variables by δjk and xjk . Define the quantity αjk = Pr [Xjk = 1] =
Pr [Pj /wj > Pk /wk ] and α = maxjk αjk . Notice that the probability that the Egreedy
algorithm finishes the jobs j and k in wrong order in comparison to the optimum is αjk .
It will turn out that the expected approximation ratio depends on the worst of these
probabilities, i.e., on α.
Here is a little bit technical definition. The intuition is that we want to be able to say
something about the (expected) contribution of an error, provided that it has occurred:
For every distribution there is a number β ≥ 1 such that for all j and k as above
   
wk Pj − βwj Pk wk Pj
E
opt wk Pj > wj Pk ≤ E opt (4.1)

and Pr [Pj /wj > Pk /wk ] ≤ α.


Now we state a bound on the expected approximation ratio of the Egreedy algorithm
that depends on the parameters α and β as defined above.
Theorem 4.5. We have that e(Egreedy) = E [egreedy/opt] ≤ β/(1 − α).
For an example application of the result, suppose that the jobs are independent geo-
metric distributed (with arbitrary expectations).
Corollary 4.6. For geometric processing time distributions we have e(Egreedy) ≤ 2.
Proof. It suffices to prove that the geometric distribution allows the choice β = 1 and
α ≤ 1/2. It is straightforward to show that the geometric distribution yields
αjk = wk µj /(wk µj + wj µk ) ≤ 1/2
and it is an exercise to prove the following lemma.
Lemma 4.7. Let h : N1 × · · · × Nn → N be a monotone non-decreasing function. Let
P = (P1 , . . . , Pn ) denote the vector of independent geometric distributed random variables
Pj . Then it holds for any j and any integer t ≥ 0 that
   
Pj − t Pj
E Pj > t ≤ E .
h(P ) h(P )

34
It is not hard to see that, since the jobs are assumed to be independent, t is allowed
to be the random variable wj Pk . Observe also that opt is a monotone non-decreasing
function. (Why?) So we apply the lemma to the random variable wk Pj and see that we
are allowed to choose β = 1 in (4.1).

Proof of Theorem 4.5. Let the n jobs be indexed such that µ1 /w1 ≤ · · · ≤ µn /wn , i.e.,
in the greedy order. Let π = (1, . . . , n) be the list of the jobs in that order. We use
Lemma 4.4 and property (4.1) to find

βval(Π∗ ) + val(π) − βval(Π∗ )


   
h egreedy i val(π)
E =E =E
opt opt opt
X X  ∆jk Xjk 
≤β+ E
opt
j k>π j
 
XX wk Pj − βwj Pk
=β+ E Xjk = 1 Pr [Xjk = 1]
opt
j k>π j
 
XX wk Pj
≤β+ αE
opt
j k>π j
P P
k>π j wk Pj

j
≤ β + αE
opt
h egreedy i
= β + αE .
opt
Rearranging completes the proof.

35
Chapter 5

Paging

Memory management is a fundamental problem in computer architecture and operating


systems. In its simplest form, the memory system consists of two levels: a fast but small
cache and a slow but large main memory. The cache is a temporary memory for data items
needed. Copying from main memory to the cache takes time, but has the advantage that
the data in the cache can be accessed fast. A replacement strategy decides which data
item is to be evicted in case an item which is currently missing in the cache is requested.
It is well-known that the chosen replacement strategy is crucial for the overall system
performance.
The underlying theoretical problem is known as the Paging problem. We are given a
collection of data items, called pages, and cache of size k, which means that it can store
up to k pages. Further, a sequence of page-requests has to be served by making each
requested page available in the fast memory. If the page is already cached, the request
is served without cost. Otherwise, a page fault occurs and the requested page must be
brought to fast memory, which might involve the removal of a currently cached page. The
objective function of a replacement strategy is to minimize the number of page faults.
In the offline version of the problem, the whole request sequence is known in advance.
The Longest-Forward-Distance algorithm (denoted Opt), is an optimal offline al-
gorithm. The algorithm always evicts the page with the most distant next request. In
online paging, requests arrive one-by-one and eviction decisions must be made at every
arriving request, without any knowledge of the future. Examples for online strategies are
Least-Recently-Used (Lru), which evicts the page in the cache whose last access was
earliest, and First-In-First-Out (Fifo), which evicts the page that has been in fast
memory longest.
It is common to evaluate the performance of online algorithms by using competitive
analysis. In that model, an online algorithm Alg is compared to an optimum offline algo-
rithm Opt on the same input sequence. The request sequence is chosen by an adversary A
out of a set SA of admissible sequences. Denote the respective number of faults of Alg
and Opt on the request sequence R by alg(R) and opt(R). An algorithm is said to be
c-competitive against the adversary A if

alg(R) ≤ c · opt(R) + γ for all R ∈ SA ,

where c = c(k) may depend on k but not on the length of R and γ is a constant independent
of both. The smallest such c is called the competitive ratio of the algorithm and denoted
rA (alg). If no such c exists, then the algorithm is called not competitive.
It is known that Lru (and several other paging algorithms including Fifo) are k-com-
petitive against the unrestricted adversary, and that this bound is tight. However, the

36
theoretical k-competitiveness seems to be overly pessimistic to be meaningful for practical
situations. For example, in an experimental study the observed competitive ratio of Lru
ranged between one and three but was frequently around 3/2. This gap between theo-
retical worst-case bounds and practically observed behaviour called for research. There is
consensus that the practical success of Lru (and other algorithms) is due to a phenomenon
called locality of reference. This means that a requested page is likely to be requested again
in the near future. Thus a lot of the related theoretical work was devoted to modeling
locality appropriately. The two most common ways are randomized adversaries, where re-
quest sequences obey an underlying probability distribution and deterministic adversaries,
for which the set of admissible sequences is restricted explicitly.

Overview
One difficulty in analyzing the paging problem was the lack of a strong lower bound for the
minimum number of faults needed to serve a given request sequence. So far, this number
was bounded from below by partitioning the sequence into phases, giving insight only into
local structure of the paging problem.
In Section 5.1 we give an analysis, which is based on a lower bound that captures the
structure of the entire sequence. Furthermore, the framework characterizes the regime in
which an offline optimal strategy can substantially outperform Lru. Two requests to the
same page with exactly ` distinct pages inbetween are called a pair with distance `. Define
the characteristic vector

c(R) = (c0 (R), c1 (R), . . . , cp−1 (R))

of a sequence R, where every entry c` (R) counts the number of pairs in the sequence with
distance exactly `. Here p = p(R) denotes the number of distinct pages accessed in R.
This definition allows us to characterize the number of faults of Lru because it will have
evicted a certain page if and only if at least k distinct pages were requested since the last
request to that page. Therefore we have
X
lru(R) = c` (R) + p(R).
`≥k

The crucial result for our analysis is the lower bound


1X`−k+1
opt(R) ≥ c` (R).
2 `
`≥k

The similar form of these bounds allows direct comparison of the number of faults of Lru
and Opt on any given sequence and any cache size k.
In Section 5.2 we propose a deterministic adversary, which is only mildly restricted by
two parameters. The adversary not only captures locality of reference, but also another
phenomenon which has not been considered in previous theoretical work: typical memory
access patterns. The idea here is that if data is accessed, many of the items will be
needed again, but not in the near future. For those requests any algorithm, especially
also an optimum one, will most likely incur a fault. We explain the favorable observed
competitive ratio to a large extend with this phenomenon. In specific, we prove a bound
for the competitive ratio of Lru which depends on the parameters of our adversary. We
argue that the values for these in real programs are indeed such that our bound gives
values between two and five, coming relatively close to the competitive ratios observed in
pratice.

37
We formalize this intuition with the (α, β)-adversary. Let α > 1 and β ≥ 0 be fixed.
The (α, β)-adversary is free to choose any sequence R respecting the constraint
αk−1
X p−1
X
c` (R) ≤ β c` (R).
`=k `=αk

The intuition behind this definition is that there are not “too many” request pairs with
“critical” distance between k and αk − 1: The pairs with distance less than k do no harm
because neither Lru nor Opt fault there. On the other extreme, both, Lru and Opt will
fault on many pairs with distance at least αk. But the ones where Opt might outperform
Lru are those with distance inbetween.
Our analysis yields that the competitive ratio of Lru for this class of sequences is
bounded by
α
r(α,β) (lru) ≤ 2(1 + β) .
α−1
Observe that, depending on the values for α and β, our bound can actually become as
small as two. Furthermore, the notions of the characteristic vector and the (α, β)-adversary
are a formal framework describing on which sequences Lru performs good and explains
why it performs poorly on others. We performed extensive experiments and calculated
feasible values for α and β. The results supported the intuition that the sequences in
practice feature “large” α and “small” β. Our bound for the competitive ratio of Lru on
these sequences ranged between two and five, coming close to practical experience. See
Figure 5.2 and Figure 5.3 for two representative results.

5.1 Competitive Analysis


Characteristic Vector
Let M = {1, 2, . . . , m} be the set of pages stored in the large memory. Let R = (r1 , r2 , . . . , rn )
denote a sequence of requests of length n, where we refer to the set T = {1, 2, . . . , n} as
time. We require rt ∈ M for every t ∈ T and call P = {r1 , . . . , rp } ⊆ M the set of
requested pages. Throughout, p = p(R) = ||P || counts the number of requested pages in
the sequence.
A request pair (i, j) is defined by two time-indices i and j within R such that 1 ≤ i <
j ≤ n. We call a request-pair (i, j) consecutive if ri = rj and all pages r` for i < ` < j are
distinct from ri . Let

C = {(s, t) : consecutive request pair (s, t) in R}

denote the set of consecutive request pairs.


Further define the set of distinct pages between (i, j) by D(i, j) = {ri+1 , . . . , rj−1 } and
define the distance of a consecutive request pair (i, j) by dist(i, j) = ||D(i, j)||. Let

C` (R) = {(s, t) ∈ C : dist(s, t) = `}

denote the set of consecutive pairs with distance `.


The characteristic vector of a sequence R is defined by

c(R) = (c0 (R), c1 (R), . . . , cp−1 (R)),

where c` (R) = ||C` (R)||. Observe that each c` (R) counts the number of consecutive pairs
(i, j) that have exactly ` distinct pages between i and j.

38
Bounds for LRU and OPT
The characteristic vector enables us to derive algebraic descriptions for the number of
faults of Lru, see Theorem 5.1. Furthermore, it allows us to bound the number of faults
of any optimal offline paging algorithm Opt from below, see Theorem 5.2. Especially the
latter result is crucial for our analysis.
Theorem 5.1. Let k denote the cache size and let R be a request sequence with charac-
teristic vector c(R), then X
lru(R) = c` (R) + p(R).
`≥k

Proof. If a certain page is in the cache, it will be evicted by Lru before it is requested the
next time if and only if there are requests to at least k distinct pages before that. The
additional p(R) in the bound is due to the initial faults, i.e., the faults incurred when the
page is brought to the cache for the first time.

Theorem 5.2. Let k denote the cache size and let R be a request sequence with charac-
teristic vector c(R) and p(R) ≥ k + 1, then
1 X`−k+1
opt(R) ≥ k−1 k−1
c` (R).
1+ k − p−1 `≥k
`

The bound is tight.


For practical applications, we can assume that the number of distinct requested pages
p is large. Hence we state the following corollary for future reference.
Corollary 5.3. Let k denote the cache size and let R be a request sequence with charac-
teristic vector c(R) and p(R) ≥ k + 1, then
1X`−k+1
opt(R) ≥ c` (R).
2 `
`≥k

For an intuitive understanding of this bound, consider an (` + 1)-multicycle, i.e., a


sequence of the form (1, 2, . . . , ` + 1, 1, 2, . . . , ` + 1, . . . ). It is easy to see that the optimum
algorithm faults ` − k + 1 times per repetition of the cycle (1, 2, . . . , ` + 1), and that every
consecutive request pair has distance `. Hence, averaging over the whole sequence yields
that each consecutive request pair contributes (` − k + 1)/` to the total number of faults.
The lower bound provides the abstraction that an arbitrary sequence can be seen as if
it were a collection of multicycles with the same characteristic vector.
Before we proceed to the proof of Theorem 5.2, we want to give some intuition about
the obstacles we have to overcome. Let R be a request sequence which requests p distinct
pages; then  
`−k+1
opt(R) ≥ max c` (R) ,
`≥k `
where k denotes the cache size. To see this, observe that Opt faults at least ` − k + 1
times between two consecutive requests to a page, with ` distinct pages inbetween. This
is because at the point in time of the first request it has cached at most k − 1 of the
pages inbetween. Therefore, by considering only consecutive request pairs of distance `,
we overcount by at most a factor ` because every fault is counted for at most ` consecutive
request pairs. Observe that the bound is tight for (` + 1)-multicycles.

39
Theorem 5.2 states that the max in the above lower bound can be replaced by a sum,
losing a factor which is at most two. The main difficulty for the proof is that consecutive
request pairs of different distances may overlap, which has to be handled appropriately.
We make use of amortized analysis to deal with it. The remainder of this section is devoted
to the proof.

Proof of Theorem 5.2. Recall that C denotes the set of all consecutive request pairs and
also recall the set T which is referred to as time. Consider an execution of an arbitrary
optimal offline algorithm Opt on a given sequence R, e.g., Longest-Forward-Distance.
Define
F = F (R) = {t ∈ T : Opt faults at time t} (5.1)
and (
1 if t ∈ F (R),
ft (R) =
0 otherwise,
which yields X
opt(R) = ft (R).
t∈T

Let t ∈ T be a point in time and let (i, j) ∈ C be a consecutive request pair. We prove
that there exitsts a function δt (i, j) which has the two properties expressed in Claim 5.4
and Claim 5.5 below.
Claim 5.4. Let (i, j) ∈ C. If dist(i, j) ≥ k, then it holds that
X dist(i, j) − k + 1
δt (i, j) = .
dist(i, j)
t∈T
P
Otherwise, i.e., if dist(i, j) < k, then t∈T δt (i, j) = 0.
Claim 5.5. For every t ∈ T we have that
 
X k−1 k−1
δt (i, j) ≤ 1 + − ft (R).
k p−1
(i,j)∈C

Suppose that Claim 5.4 and Claim 5.5 hold. Then the following argument completes
the proof of the lower bound.
X
opt(R) = ft (R)
t∈T
(Claim 5.5) 1 X X
≥ k−1
δt (i, j)
1+ k − k−1
p−1 t∈T (i,j)∈C
1 X X
= k−1 k−1
δt (i, j)
1+ k − p−1 (i,j)∈C t∈T

(Claim 5.4) 1 X dist(i, j) − k + 1


= k−1 k−1
1+ k − p−1
dist(i, j)
(i,j)∈C,
dist(i,j)≥k
1 X`−k+1
= k−1 k−1
c` (R).
1+ k − p−1 `≥k
`

40
i 5 4 3 2 1 j t

(a)

t=i 1 1 1 2 3 3 4 5 5 j t

(b)

Figure 5.1: Two views on the rank: fixed pair and fixed time.

It is easy to see from the discussion preceeding this proof, that the bound is tight for
(k + 1)-multicycles (1, 2, . . . , k + 1, 1, 2, . . . , k + 1, . . . ). These sequences request p = k + 1
distinct pages and hence 1 + (k − 1)/k − (k − 1)/(p − 1) = 1.
Corollary 5.3 is implied from the observation
 
k−1 k−1
1+ − ft (R) ≤ 2ft (R).
k p−1

Below, we present the proofs of Claim 5.4 and Claim 5.5 that complete the proof of
the theorem.

Before we actually prove Claim 5.4 and Claim 5.5 we need additional notation. For
every pair (i, j) ∈ C define the set

F (i, j) = {t ∈ F : i < t < j},

i.e., the times when Opt faults between i and j. For every t ∈ F (i, j), define the rank of
the fault at time t within (i, j) by

rankt (i, j) = ||{s ∈ F (i, j) : s ≥ t}||.

The intuition is that the fault with rank r is the r-th last fault within (i, j).
Figure 5.1 depicts two perspectives on the rank. Black circles indicate points in time
when the optimum faults, white circles the others. It turns out that both perspectives are
needed to prove Claim 5.4 and Claim 5.5.
In Figure 5.1 (a), a fixed pair (i, j) is considered and the ranks of the points in time
when the optimum faults are shown. These ranks decrease over time.
In Figure 5.1 (b), the fixed time i = t ∈ F is considered and all the pairs (v, w) with
w > t (and v < t) are shown. The numbers below the time axis indicate the respective
rank of time t for each pair.

41
Let
L(i, j) = {t ∈ F (i, j) : rankt (i, j) ≤ dist(i, j) − k + 1}
denote the set of the dist(i, j) − k + 1 last faults of (i, j).
Define the value of (i, j) at time t ∈ T by
val1 (i, j) = dist(i, j), (5.2)
(
valt (i, j) − 1 if t ∈ L(i, j),
valt+1 (i, j) = (5.3)
valt (i, j) otherwise.
Observe that the value of a pair (i, j) decreases by exactly one at a time whenever one
of the dist(i, j) − k + 1 last faults in L(i, j) occurs. At all other times, the value remains
constant. It is not hard to see that the following equality holds. It relates the rank of a
fault at time t ∈ L(i, j) with the value at that time:
valt (i, j) = rankt (i, j) + k − 1. (5.4)
Hence valt (i, j) ∈ {k, . . . , dist(i, j)} for all t ∈ L(i, j).
We are now in position to define the function δt (i, j) based on valt (i, j):
(
k−1 k−1
valt+1 (i,j) − valt (i,j) if dist(i, j) ≥ k,
δt (i, j) = (5.5)
0 otherwise.
Observe that δt (i, j) = 0 for all times except t ∈ L(i, j). The two crucial properties
of
P this definition are the following. First, for each pair (i, j) with dist(i, j) ≥ k we have
t∈T δt (i, j) = (dist(i, j) − k + 1)/dist(i, j), which will be shown below. Second, observe
that (5.5) is defined
P in such a way that the smaller the rank of a fault is, the more it
contributes to t∈L(i,j) δt (i, j). This second property is crucially needed for the proof of
Claim 5.5 below.
Proof of Claim 5.4. Let (i, j) ∈ C be a consecutive request pair. P
First, let dist(i, j) < k and observe that the definition (5.5) implies t∈T δt (i, j) = 0.
Second, let dist(i, j) ≥ k. Observe that the number of faults of Opt within (i, j)
is ||F (i, j)|| ≥ dist(i, j) − k + 1 for all pairs with distance at least k. This is because
there are dist(i, j) distinct pages between i and j and there are at most k − 1 of these
in the cache of Opt at time i. Hence, for pairs (i, j) with dist(i, j) ≥ k the property
||F (i, j)|| ≥ dist(i, j) − k + 1 implies that ||L(i, j)|| = dist(i, j) − k + 1.
Then (5.1), (5.2), (5.3), and (5.5) imply
X X X
δt (i, j) = δt (i, j) = δt (i, j)
t∈T t∈F t∈L(i,j)
X k−1 k−1
= −
valt+1 (i, j) valt (i, j)
t∈L(i,j)
dist(i,j)
X 1 1
= (k − 1) −
v−1 v
v=k
 
1 1
= (k − 1) −
k − 1 dist(i, j)
dist(i, j) − k + 1
= (k − 1)
(k − 1)dist(i, j)
dist(i, j) − k + 1
=
dist(i, j)

42
and Claim 5.4 is established.

Before we proceed to the proof of Claim 5.5, we give some of the intuition behind it
and introduce additional notation.
P t ∈ F be a point in time when an Opt fault occurs. We have to prove that the
Let
sum (i,j)∈C δt (i, j) of all the pairs (i, j) for which δt (i, j) > 0 is not “too large.” As
mentioned above (see the discussion after (5.5)), for every consecutive
P request pair (i, j)
with dist(i, j) ≥ k, a fault at time t contributes the more to t∈L(i,j) δt (i, j) the smaller
the value of that pair is at time t. Hence it is crucial to prove that there are not “too
many” pairs with small value at time t. This central property is expressed in Lemma 5.6.
For every t ∈ T define the set

X(t) = {(i, j) ∈ C : t ∈ L(i, j)}.

Observe that if no fault occurs at time t, then X(t) is empty. Otherwise, i.e., for t ∈ F , the
set X(t) comprises all the pairs for which the fault at time t is one of the dist(i, j) − k + 1
last faults. Hence, for these pairs the value valt (i, j) decreases by definition (5.3) at time
t + 1. Furthermore, observe that
X X
δt (i, j) = δt (i, j),
(i,j)∈C (i,j)∈X(t)

i.e., it actually suffices to consider the pairs (i, j) ∈ X(t), because δt (v, w) = 0 for all the
other pairs (v, w) 6∈ X(t).
For every ` = 1, . . . , p − 1 define the sets and quantities

X` (t) = {(i, j) ∈ X(t) : rankt (i, j) = `}


x` (t) = ||X` (t)||

i.e., the (number of) pairs for which the fault at time t has rank `.

Lemma 5.6. For all t ∈ F it holds that


r
X
x` (t) ≤ r + k − 1
`=1

for all r = 1, . . . , p − k and xp−k+1 (t) = · · · = xp−1 (t) = 0.

Proof. Consider a fixed time t ∈ F . For simplicity of notation, we omit the argument t,
e.g., we write X` and X instead of X` (t) and X(t).
First observe that (5.4) implies rankt (i, j) ≤ dist(i, j) − k + 1 ≤ p − k. Hence Xp−k+1 =
· · · = Xp−1 = ∅ and xp−k+1 = · · · = xp−1 = 0.
The sets X` partition the pairs (i, j) ∈ X according to P their rank. Hence
Pr the sets X`
r r
induce a disjoint partition of X and we have || ∪`=1 X` || = `=1 ||X` || = `=1 x` .
For sake of contradiction, fix r such that ∪r`=1 X` comprises m > r + k − 1 pairs. Let
these be (i1 , j1 ), . . . , (im , jm ). Without loss of generality t < j1 < · · · < jm−1 < jm .
Observe that all the pages rt , rj1 , . . . , rjm are distinct from each other. Hence, at time
t, the set {rt , rj1 , . . . , rjm−1 , rjm } comprises exactly m + 1 distinct pages. Therefore any
algorithm with cache size k faults at least m + 1 − k > r + k − 1 + 1 − k = r times. Hence
there are more than r faults of Opt within the time range t, . . . , jm . Consider the pair
(im , jm ) and observe that the fault at time t within that pair has rank more than r, i.e.,
rankt (im , jm ) > r. Hence (im , jm ) 6∈ ∪r`=1 X` yields the desired contradiction.

43
Proof of Claim 5.5. First observe that for every t ∈ T \F we have
 
X k−1 k−1
δt (i, j) = 0 ≤ 1 + − ft
k p−1
(i,j)∈C

by definition of δt (i, j).


Second, let t ∈ F and consider the set X(t). Recall the disjoint partition X(t) =
X1 (t) ∪ · · · ∪ Xp−k (t). By (5.3), (5.4), and (5.5) each pair (i, j) ∈ X` (t) contributes

k−1 k−1
δt (i, j) = −
`+k−2 `+k−1
P
to (i,j)∈X(t) δt (i, j).
Now interpret the x` as variables of the following linear program, where the constraints
(5.7) are implied from Lemma 5.6.
p−k  
X k−1 k−1
maximize x` − (5.6)
`+k−2 `+k−1
`=1
Xr
subject to x` ≤ r + k − 1 for r = 1, . . . , p − k (5.7)
`=1

Clearly,
P the optimum value of the linear program (5.6) immediately yields an upper bound
on (i,j)∈X(t) δt (i, j).
Now we prove that the unique optimal solution to (5.6) is x∗ = (k, 1, . . . , 1). Observe
that this solution x∗ has the property that all constraints in (5.7) are satisfied with equality.
Consider an arbitrary optimal solution x = (x1 , . . . , xp−k ) where P not all constraints in
(5.7) are satisfied with equality. Let v be the smallest such index, i.e., v`=1 x` = v+k−1−ε
for some ε > 0.
Suppose v = p − k holds. Then we may increase xv by ε and achive a feasible solution
with larger objective value and hence a contradiction. Otherwise, i.e., for v < p − k,
consider the solution xε = (x1 , . . . , xv + ε, xv+1 − ε, . . . , xp−k ), which is indeed feasible. We
have that the objective value of xε − x is
 
k−1 k−1
ε − >0
(v + k − 1)(v + k − 2) (v + k)(v + k − 1)

which is a contradiction. Hence x∗ = (k, 1, . . . , 1) is the unique optimum solution to (5.6).


This solution and a simple calculation yields
 
X k−1 k−1
δt (i, j) ≤ 1 + − ft (R) (5.8)
k p−1
(i,j)∈C

and the proof of Claim 5.5 is complete.

5.2 Structure of Typical Request Sequences


What does the request sequence of a typical program look like? To answer this question,
recall that programs are organized in two parts: code and data. The algorithms executed
by the program are implemented in the code segment, which is usually small compared
to the cache size. By locality of reference, most consecutive requests to code pages will

44
have short distance. For example, a lot of time of the control flow is spent in executing
loops resulting in an enormous number of requests to few pages. However, the size of the
datastructures needed by an algorithm can be assumed to be large compared to the cache
size, e.g., databases, matrices, or graphs. This data is mostly accessed in a structured
way, e.g., linearly reading, or in an irregular way, e.g., the queries to a database. In both
cases, we expect a typical memory access pattern: a consecutive request to a page either
has small or large distance compared to the cache size.
We argue that request distances can be divided into three classes: short, long, and
intermediate. Intuitively, for short distance, “no” (reasonable) algorithm faults, for long
distance, “any” algorithm faults (especially any optimal algorithm), but it is not clear for
the distances inbetween. As discussed below, the treatment of the latter turns out to be
crucial for the performance of a strategy. Let k denote the cache size, let α > 1 be such
that αk is an integer, and let β ≥ 0. If a request pair has distance in the interval [0, k − 1],
then its distance is short, if it is in the interval [k, αk − 1] it is intermediate (also called
critical ), and if it is in the interval [αk, p − 1] it is called long.
The (α, β)-adversary is free to choose any sequence R in the set S(α,β) defined by
αk−1 p−1
( )
X X
S(α,β) = R with c(R) : c` (R) ≤ β c` (R) . (5.9)
`=k `=αk
The intuition behind this definition is that there must not be “too many” pairs with
critical distance compared to the long-distance pairs. This classification of sequences is
directly derived from the discussion preceeding this definition. The parameter β controls
the number of consecutive requests with distance in the critical interval [k, αk − 1]. To see
why this interval is critical, consider an (` + 1)-multicycle and observe that each request
pair has distance `. For these sequences, the competitive ratio of Lru is `/(` − k + 1).
This function decreases from k to α/(α − 1) as ` grows from k to αk − 1. Notice that we
do not restrict the number of requests with short distance, i.e., in the interval [0, k − 1].
Hence, our model implicitly captures a notion of locality of reference. Observe that the
interval [αk, p − 1] is also not constrained, allowing sequences that have a typical memory
access pattern.
Summarizing, the (α, β)-adversary is restricted with the number of pairs with distance
αk − 1], but only mildly. We just require that the total number
the interval [k,P Pof interme-
p−1
diate requests αk−1
`=k c` (R) can be balanced with the long-distance requests `=αk c` (R).
Theorem 5.7. For the (α, β)-adversary and cache size k, it holds that
α
r(α,β) (lru) ≤ f (α, β) := 2(1 + β) .
α−1

Proof. Let R ∈ S(α,β) be an admissible sequence. Further, with Theorem 5.1, Theorem 5.2,
(5.9), and `≥k `−k+1 c` (R) ≥ (α − 1)/α p`=αk c` (R) we find
P P
`
p
X αk−1
X p
X
lru(R) = c` (R) + p(R) = c` (R) + c` (R) + p(R)
`=k `=k `=αk
p
X α X`−k+1
≤ (1 + β) c` (R) + p(R) ≤ (1 + β) c` (R) + p(R)
α−1 `
`=αk `≥k
α
≤ 2(1 + β) opt(R) + p(R)
α−1
and the proof is complete, where we have used that p(R) ≤ p is a constant.

45
SPEC Hydro
purpose fluid dynamics
pages 7 451 717
x-axis `
y-axis c` (log-scale)
k 1024 (4 MB)
(α, β) (16.002, 0.012)
f (α, β) 2.16

Figure 5.2: SPEC benchmark Hydro

This result can be considered a theoretical underpinning of the good practical behaviour
of Lru. It gives a criterion for the sequences on which the algorithm achieves constant
competitive ratio. We argue that request sequences that occur in practice satisfy this
criterion.
The bound f (α, β) is a constant if α is “large” and β is “small”. Indeed, the discus-
sion above suggests that practical sequences are in a set S(α,β) with this property. To
substantiate this claim, we performed experiments with memory traces generated by our
implementations of standard algorithms and by the execution of SPEC benchmark pro-
grams. SPEC benchmarks are standard for performance evaluation in computer industry.
They provide programs for a broad variety of applications, e.g., numerical simulations or
user programs. We calculated feasible values for α and β and obtained that our bound
f (α, β) gives values between two and five for these request sequences, see Figure 5.2 and
Figure 5.3.
Further experimental study is needed to determine consistent values of α and β. How-
ever, according to the above reasoning, the size of data needed by a program is large
compared to the cache size, i.e., at least a constant factor. We hence expect that α is
“large” in practice. In our experiments, we also monitored the structure of the charac-
teristic vector of these sequences. Figure 5.2 and Figure 5.3 depict two representative
examples of our results. The x-axis shows the length ` of the consecutive request and the

46
SPEC Nasa7
purpose weather simulation
pages 31 764 036
x-axis `
y-axis c` (log-scale)
k 1024 (4 MB)
(α, β) (3.97, 0.6)
f (α, β) 4.28

Figure 5.3: SPEC benchmark Nasa7

y-axis shows the value of the respective c` in log-scale. There is a peak for smaller values
of ` and there are many large entries c` for larger values of `, corresponding to short and
long distance. This structure suggests that β should be “small” in practice.

47
Chapter 6

Bin Packing

The problem Bin Packing is also a rather basic combinatorial optimization problem. It
occurs, for example, when certain items have to be packed into as few as possible trucks,
where each truck has a weight limitation.

6.1 Deterministic Bin Packing


Formally, the problem is the following. We are given a set of items indexed through
I = {1, . . . , n} along with their weights w = (w1 , . . . , wn ), where 0 ≤ wi ≤ 1. We are
furthermore given an arbitrary
P number of bins with capacity one, each. For any subset
S ⊆ I let weight(S) = i∈S wi be the total weight of the items in that set. The task is to
place the items into as few as possible bins, such that the capacity of no bin is exceeded.
More formally, let Bj ⊆ I denote the set of items placed in bin j, then the problem reads
as follows:

Problem 6.1 Bin Packing

minimize m
subject to Bj ⊆ I, 1 ≤ j ≤ m,
m
∪j=1 Bj = I,
weight(Bj ) ≤ 1, 1 ≤ j ≤ m.

The problem is NP-hard. Specifically, the offline version, i.e., the set I and the vector w
are given at the outset, admits a polynomial-time approximation algorithm with guarantee
3/2. But the problems is not approximable within 3/2 − ε for any ε > 0 in polynomial
time, unless P = NP. Under the (natural) assumption that the optimal number of bins
diverges as the number of items grows, the problem admits a (so-called) asymptotic PTAS,
i.e., the problem can be approximated within 1 + ε in polynomial time for any constant
ε > 0. However, this is not a practical algorithm, since the constants involved are rather
large and since there is an exponential dependency in 1/ε.

Online Bin Packing


In this chapter, we shall examine the online version of the problem, i.e., the set I is not
known beforehand. Instead, we learn the weights w1 , w2 , . . . , wn of the items one-by-one

48
and have to place item i into some bin before the learn the weight of item i + 1. For
techncal reasons we will assume that limn→∞ weight(I) = ∞.
In worst-case perspective, we measure the quality of an algorithm with its (asymptotic)
competitive ratio
alg(w)
r(alg) = lim
n→∞ opt(w)

where alg(I) and opt(I) denote the respective number of bins used by the algorithm alg
and an offline optimal algorithm opt.
One of the simplest online algorithms for Bin Packing is Next-Fit Pk (nf): Initially
i = 1 and j = 1. Pack items i, i + 1, . . . , k into bin Bj as long as `=i w` ≤ 1. When
violated, set i = k + 1 and j = j + 1 and repeat the previous step until there are no items
left.
The idea is to keep always one bin active, i.e., the one that receives items. Once an
item yields that the capacity of the currently active bin is exceeded, the bin is closed
and another bin is opened, i.e., becomes active. Closed bins will never be opened again.
Next-Fit is 2-competitive, which is also best-possible for the algorithm.

Theorem 6.1. We have that


r(nf) = 2.

Proof. We may assume that nf is using more than one bin, since otherwise there is nothing
to show.
Consider the event that nf closes a bin which has remaining capacity p and opens a new
bin. The item that is placed into the new bin has weight at least p. Thus the total weight
of any two consecutive bins is at least one, i.e., at most half of the total capacity is wasted.
Thus, weight(B1 ) + weight(B2 ) ≥ 1, . . . , weight(B2k−1 ) + weight(B
P 2k ) ≥ 1, for k = bm/2c.
Since all items are placed in the bins B1 , . . . , Bm we have i∈I wi ≥ bm/2c ≥ (m − 1)/2.
Thus, X
alg(w) = m ≤ 2 wi + 1.
i∈I
P
Obviously, since we have unit-capacity bins, we have opt(w) ≥ i wi , which yields

alg(w) ≤ 2opt(w) + 1

and the upper bound r(nf) ≤ 2 is shown.


For a tight example, let the number of items be n = 4k and consider the weights
w = (1/2, 1/2k, 1/2, 1/2k, . . . , 1/2, 1/2k). The optimal packing uses k + 1 bins, where k
bins are filled with the 1/2-items, two for each bin, and one bin with the 2k items of weight
1/2k. Obviously, nf places one 1/2-item and one 1/2k-item into each bin, thus needing
2k bins.

6.2 Stochastic Bin Packing


In average-case view, we assume that the item weights Wi are independent and identically
distributed over [0, 1], which yields an n-element vector W = (W1 , . . . , Wn ) of random
variables. We will be interested in the ratio of the expected number of bins as the number
of items grows, i.e.,
E [alg(W )]
e0 (alg) = lim ,
n→∞ E [opt(W )]

49
where alg(W ) and opt(W ) denote the respective random variables for the number of
bins for W .
We consider the (stochastic) process of Next Fit operating on the random weights
W1 , . . . , Wn given one-by-one. The main result of this section is for uniform distributed
weights (but the analysis is more general and can in principle be transfered to other
distributions as well).

Theorem 6.2. If the weights Wi are independent and identically uniform distributed over
[0, 1], then
4
e0 (nf) ≤ .
3

The process Next-Fit induces several other processes:

(1) The sequence (Lj )j≥1 denotes the total weight, also called level, of items packed in Bj
when bin j is closed. The complementary sequence (Rj )j≥1 with Rj = 1 − Lj gives
the remainder of bin j when closed, respectively.

(2) The sequence (Nj )j≥1 of the number of items packed into bin j when closed respec-
tively.

(3) For any non-empty bin j, (Uj )j≥1 denotes the sequence of the sizes of the first item
packed into j.

(4) The process (Mk )k≥1 denotes the sequence of the number of bins when exactly k items
are given.

We shall assume throughout that the item-weights Wj are independent and identically
distributed over [0, 1] according to some random variable X. First we derive some general
structural properties of the above processes, and then we will materialize a bound on nf
for uniform distributed weights.

General Properties
We investigate the process (Lj )j≥1 first. Since nf never opens a bin once closed, it follows
that the distribution of Lj+1 depends only on Lj but not on Lk for any 1 ≤ k < j. This
means that (Lj )j≥1 induces a (first-order) Markov chain, i.e., the Markov condition

Pr [ Lj+1 ≤ t | Lj = tj , . . . , L1 = t1 ] = Pr [ Lj+1 ≤ t | Lj = tj ]

is true for all t, tj , . . . , t1 ≥ 0.


Define FLj (t) = Pr [Lj ≤ t] as the distribution of Lj . Denote

K(s, t) = Pr [ Lj+1 ≤ t | Lj = s]

in the sequel. It is known from Markov chain theory that the distributions for Lj+1 and
Lj satisfy
Z 1
FLj+1 (t) = K(s, t)dFLj (s), (6.1)
0
with the boundary condition
FL1 (t) = K(1, t),

50
since L1 can be seen as a bin following
Pk a bin with L0 = 1.
For any k denote by Sk = i=1 Wi . If the level at closure of bin j is s, then the first
item placed into bin j + 1 has weight more than 1 − s. In our notation: Lj = s implies
Uj+1 > 1 − s. With this we find the following expression for K(s, t):
(
0 if t ≤ 1 − s,
K(s, t) = P
k≥0 Pr [ Uj+1 + Sk ≤ t, Uj+1 + Sk + X > 1 | Lj = s] if t > 1 − s.

Above, we have used that the distribution of Wk+1 is identical to X. The distribution of
Uj+1 is also that of X, conditioned on being larger than 1 − Lj , i.e.,
(
0 if u ≤ 1 − s
FUj+1 |Lj (u|s) = FX (u)−FX (1−s)
1−FX (1−s) if 1 − s < u ≤ 1.

With these observations, (6.1) is well defined. It is, again, known from Markov chain
theory that the Markov chain for (Lj )j≥1 is ergodic, which implies that it has a unique
limit distribution FL = limj→∞ FLj . Specifically, the following theorem gives a way how
to compute it:

Theorem 6.3. The process (Lj )j≥1 is an ergodic Markov chain with a limit distribution
FL satisfying Z 1
FL (t) = K(s, t)dFL (s).
0

As j increases, FLj converges (geometrically fast) to FL and E [Lj ] to E [L].

Uniform Distributed Weights


Here we assume that the weights Wi are identically uniform distributed over [0, 1]. Thus,
we Wi have the distribution FX (t) = t for any 0 ≤ x ≤ 1. Further, the distribution of the
Sk in the range 0 ≤ t ≤ 1 is known to be FSk (t) = tk /k!. For these distributions, we can
materialize the analysis given above.

Lemma 6.4. If FX (t) = t for 0 ≤ t ≤ 1, then K(s, t) defined in (6.1) is given through
(
0 if t ≤ 1 − s
K(s, t) = 1 −(1−t)+s
1 − s (1 − t)e if 1 − s < t ≤ 1.

For the limit distribution FL it holds


3
FL (t) = t3 for 0 ≤ t ≤ 1 and E [L] = .
4

Before we prove Lemma 6.4 we show how it implies our main result.

Proof of Theorem 6.2. Define R = 1 − L as the remaining capacity in the limit. Fix any
n, let Mn and Mn∗ be the respective number of bins used by nf and opt, respectively, for
W . Observe the lower bound
Mn
X Mn
X Mn
X
Mn∗ ≥ Mn − Ri = 1 − Rj = Lj .
j=1 j=1 j=1

51
Thus we may write
h PMn i hP i hP i
Mn Mn
E [nf(W )] E [Mn ] E M n − j=1 R j + E j=1 R j E j=1 R j
= ∗
≤ h i = 1 + hP i.
E [opt(W )] E [Mn ] P
E Mn − j=1 Rj M n Mn
E j=1 Lj

As n → ∞, implies Mn → ∞ with probability one, we have


hP i hP i hP i
Mn m 1 m
E j=1 R j E j=1 R j limm→∞ m E j=1 R j E [R] 1
lim hP i = lim hP i = hP i = = ,
n→∞ Mn m→∞ m 1 m E [L] 3
E j=1 Lj E j=1 Lj limm→∞ m E j=1 Lj

which proves the theorem using E [L] = 3/4 and E [R] = 1/4 as given in Lemma 6.4.

Proof of Lemma 6.4. Using FX (t) = t for 0 ≤ t ≤ 1 we first of all find


(
0 if u ≤ 1 − s
FUj+1 |Lj (u|s) = 1
s (s + u − 1) if 1 − s < u ≤ 1.

Let fU |L denote the respective density. This gives

XZ t
K(s, t) = fU |L (u|s)Pr [Sk ≤ t − u, Sk + X > 1 − u] du
k≥0 u=1−s

XZ t Z t−u
= fU |L (u|s)fSk (x)Pr [X > 1 − u − x] dxdu
k≥0 u=1−s x=0

where we have used independence. Using Pr [X > x] = 1 − x and fU |L (u|s) = 1/s yields
t t−u
XZ 1 xk−1
Z
K(s, t) = (u + x)dxdu.
u=1−s x=0 s (k − 1)!
k≥0

Rt
The term for k = 0 gives the contribution u=1−s u/sdu = (t2 − (1 − s)2 )/2s. Changing
the order of summation (which is allowed as the sums converge uniformly) and integrating
yields the claim.

52
Chapter 7

Traveling Repairman

Here we consider the following online problem: A server operates on a graph G with non-
negative edge-weights and requests to vertices arrive over time (chosen by an adversary),
where each request requires servicetime of one unit. Each request must be served, but the
server is free to choose their ordering and intends to minimize the total flowtime. The
flowtime of a request is the time between its arrival and departure, i.e., the time spent
waiting.
First of all, from the worst-case perspective we observe that no deterministic online
algorithm can have competitive ratio less than diam(G), where diam(G) denotes the diam-
eter of G, i.e., the maximum value of all shortest path lengths between all pairs of vertices.
Even worse, but not at all surprising, no algorithm of the Ignore-type is competitive,
i.e., has bounded competitive ratio, see Theorem 7.2.
These negative results for the natural Ignore-type algorithms from the worst-case
perspective justify an average-case model. We assume that the requests arrive according
to a any stochastic process with rate λ > 0 (and two further natural assumptions), but
an adversary remains free to choose the vertices of the requests. This model captures
as special cases deterministic arrivals and Poisson arrivals (that can be observed in many
real-world scenarios, e.g., arrival times of phone-calls at a call-center). We give an Ignore-
type algorithm that has expected competitive ratio c(λ)ham(G) if λ 6= 1, where c(λ) is a
constant depending on λ, see Theorem 7.3. The value of c(λ) can be as small as two but
diverges as λ tends to one. Thus, for λ 6= 1, the algorithm is competitive. The intuition is
as follows: If λ < 1, then the requests arrive slower than they are served by our algorithm.
If λ > 1, then they arrive faster than they can be served even by an optimum offline
algorithm. For the remaining case λ = 1, no Ignore-type algorithm is competitive. The
reason is that the requests arrive as fast as they can be served by an optimal algorithm, but
online algorithms can be forced to make routing-mistakes that accumulate in the flowtimes
of many requests.

Model and Notation


Throughout, we consider the following model. A server operates on a connected undirected
graph G = (V, E) with n vertices V = {1, 2, . . . , n}, edges e ∈ E, and edge-weights
w(e) ≥ 0. There is a distinguished initial vertex, 1 say, where the server is located at
time zero. The server moves at unit speed, i.e., the time to pass an edge e is equal to
w(e). We define the distance wij between two vertices i, j as the weight of a shortest path
connecting i with j. Further, let the hamiltonicity ham(G) be the weight of a shortest,
not necessarily simple, circle in G that visits all vertices.

53
At certain points in time 0 ≤ a1 ≤ a2 ≤ · · · ≤ am requests R = (r1 , r2 , . . . , rm ) arrive.
Each request has the form rj = (ai , vi ), i.e., its arrival aj and a vertex vj . It is demanded
that the server eventually visits the vertex vj to serve the request, which takes unit time.
The time dj when the request is served is called its departure. Hence the request has
waited for time fj = dj − aj , which is called its flowtime.
The server is free to choose the ordering of service
Pm and is allowed to wait; these decisions
induce a schedule. For any schedule S, F (S) = j=1 fj (S) denotes its total flowtime, where
fj (S) is the flowtime of request rj induced by S. Our objective function is to minimize the
total flowtime. The schedule that minimizes F given the request sequence R in advance
is denoted S ∗ . For ease of notation we usually write F instead of F (S) and F ∗ instead of
F (S ∗ ). An algorithm Alg is called online if its decision at any point in time t depends
only on the past, i.e., the requests r1 , . . . , rj with aj ≤ t. Otherwise the algorithm is called
offline. We write alg(R) = F and opt(R) = F ∗ .
We study two different sources for the request sequence R. Firstly, a malicious deter-
ministic adversary is free to choose R. We measure the quality of schedules produced by
an online algorithm Alg facing such an adversary with the competitive ratio

alg(R)
r(alg) = sup .
R opt(R)

If r(alg) is not bounded, then the algorithm alg is not competitive. Secondly, we study
the following stochastic adversary. For a fixed λ > 0, the adversary is free to choose
a probability distribution D(λ) on the arrival times of the requests, provided that the
expected number of arrivals is λ. Furthermore, the vertices of the requests can also be
chosen arbitrarily. We define the expected competitive ratio by
h alg i
eλ (alg) = sup E ,
D(λ) opt

where alg and opt denote the respective random variables induced by D(λ).
We consider a family of algorithms, called Ignore. Such an algorithm maintains two
memories M and M 0 that are served in phases as follows. Suppose that the algorithm
has collected several requests in the memory M . As soon as the algorithm starts serving
those, all requests that arrive in the meantime are stored in the memory M 0 (and are hence
ignored in the current phase). After all requests in M have been served, the phase ends
and the procedure repeats with changed roles of M and M 0 . Note that we have not yet
specified how the requests in a phase are actually served, which leaves space for various
strategies.

7.1 Deterministic Arrivals


The following negative result states that no lazy online algorithm for this problem is
competitive within less than diam(G), where diam(G) = maxij∈V wij is the diameter of
G. An algorithm is called lazy if it only changes the vertex it is located if there are
pending requests. The situation looks worse for Ignore algorithms: They are not even
competitive, see Theorem 7.2. This result is perhaps not very surprising but justifies that
we consider an average-case analysis of this natural type of algorithms.

Observation 7.1. There is a graph G such that for every lazy online algorithm alg it
holds r(alg) ≥ diam(G).

54
Proof. Consider the graph G = ({1, 2}, {{1, 2}}) with w({1, 2}) = h > 0, where h is
arbitrary. Clearly diam(G) = h.
Initially, the server of the algorithm and the optimum are at vertex 1. Consider just
one request at vertex 2 which arrives at time h. The optimal server anticipates the arrival
and starts traversing the edge at time zero, reaches vertex 2 at time h and serves the
request with flowtime 1. The lazy online algorithm starts traversing the edge at time h
and serves the request with flowtime h + 1.

Theorem 7.2. No deterministic Ignore-algorithm is competitive for the online traveling


repairman problem with flowtimes.
Proof. We assume that the Ignore algorithm starts its first phase immediately when
the first request arrives. Furthermore we assume that the algorithm serves the requests in
each separate phase optimally. We show below how to extend the lower bound to arbitrary
Ignore algorithms.
Consider the graph G = ({1, 2}, {{1, 2}}) with w({1, 2}) = h > 0, where h is an
arbitrary integer. Let the initial vertex of both servers be 1.
We compose a request sequence R out of blocks B1 , B2 , . . . , Bk of requests as defined
below. The idea is as follows: The delay of the algorithm at a block Bi is the difference
between the respective minimum times the algorithm and the optimum start serving any
of the requests in Bi . The essential property of the construction is that the algorithm is
forced to traverse the edge more often than the optimum which increases its delay and
yields unbounded competitive ratio.
We encode the requests with the format (δ, v), where v denotes the respective vertex
and δ the difference between the arrival times of a request and its predecessor (respectively
time zero if there is no predecessor).

B1 = (h, 2)
B2 = (1, 2), . . . , (1, 2), (1, 1), (1, 2), . . . , (1, 2)
| {z } | {z } | {z }
h requests at 2 one trap request at 1 3h + 1 requests at 2

B3 = (h + 2, 1), (1, 1), . . . , (1, 1), (1, 2), (1, 1), . . . , (1, 1)
| {z } | {z } | {z }
5h − 1 requests at 1 one trap request at 2 5h + 1 requests at 1

B4 = (h + 2, 2), (1, 2), . . . , (1, 2), (1, 1), (1, 2), . . . , (1, 2)
| {z } | {z } | {z }
9h − 1 requests at 2 one trap request at 1 7h + 1 requests at 2
..
.

Now we show that the delay increases by 2h with every block.


The optimum serves the sequence R = (B1 , B2 , . . . , Bk ) as follows. At time zero its
server travels to vertex 2, arrives there at time h, i.e., just in time to serve the first request
with flowtime 1 at time h+1. Now the server is at the right vertex to serve all the requests
at this vertex of the first block, except for the trap-request. This is the last request of
the block to be served after traveling to vertex 1. Then the server is at the right spot
to handle the requests of the next block. We continue in this manner for every block.
All requests, except traps, are served at flowtime 1, each. The trap at block Bi , say, has
flowtime (2i − 1)h + h + 1. This yields total flowtime
k
X
opt(R) ≤ 1 + ((4(i − 2) + 1)h + (2i − 1)h + 2ih + 2) ≤ 8hk 2 .
i=2

55
Now we consider the algorithms. Its server is at vertex 1 until the first request arrives
which is served with flowtime h + 1 at time 2h + 1. Notice that the delay of the algorithm
is already h. Until time 2h + 1 the first h + 1 requests of B2 have arrived. These include
the trap request at vertex 1. As we are dealing with an Ignore algorithm it will serve the
trap during the next phase. As the phases are served optimally the trap is handled last in
the phase after one edge-traversal. But now the server is at the wrong vertex to serve the
remaining requests of the block and the edge needs to be traversed once more. Thus the
algorithm crossed the edge two times more than the optimum server and hence its delay
increased by 2h. We continue in this manner and see that the total delay before block Bi
is (at least) 2h(i − 1). Thus we find the lower bounds on the total flowtime
k
X k
X
alg(R) ≥ 2h(i − 1)(2ih) ≥ 2h2 i2 ≥ h2 k 3 /2
i=1 i=k/2

and the competitive ratio r(alg) ≥ kh/16, which is unbounded.


To adapt the lower bound for general Ignore algorithms first recall that any such
algorithm waits for a determined time (which may depend on the requests seen so far)
before starting the first phase. In this case we start by giving the block B1 , i.e., one request
at vertex 2 at time h which is served by the optimal server at time h + 1. Then we wait for
the server of the algorithm to move. Right at this time we issue the blocks B2 , B3 , . . . , Bk
similarly as before.
We can also remove the assumption that the Ignore algorithm serves each phase
optimally. If this is not the case we adapt the blocks as follows, where we use that the
algorithm is deterministic: Let t1 be the time the algorithms needs to finish block B1 .
Similarly to B2 we issue requests to vertex 2 and one trap requests to vertex 1 in the time
until t1 . If the delay of the algorithm after this phase is less than 2h we continue similarly
to the remainder of block B2 , otherwise we continue with block B3 and so forth.

7.2 Stochastic Arrivals


In this section we consider the following stochastic variant of the problem. As before, each
request requires time one for service. An adversary is free to choose the vertices for the
arriving requests and has to specify a stochastic process (Xt )t≥0 , where Xt denotes the
number of arrivals up to some time t. However, the process has to satisfy the following
natural conditions:

(1) For some fixed rate λ > 0 the expected number of arrivals per unit-time is λ.

(2) For any stopping time T , E [XT +t − XT ] ≤ E [Xt ] holds for all t ≥ 0. (A random
variable T is called stopping time for (Xt )t≥0 if the event T = t depends only on
(Xs )s≤t .)

(3) We also require that Var [Xt ] ≤ cE [Xt ] for some constant c, i.e., the process has to
have bounded variance.

Notice that Poisson-distributed and deterministic arrivals fall into this class as special
cases.
We are interested in the expected competitive ratio defined by eλ (alg) = limt→∞ eλ (alg, t),
where eλ (alg, t) = E [alg/opt] when the process is stopped at some time t and the ex-
pectation is taken with respect to Xt . We show that, in this model, bounded expected

56
competitive ratios are possible for Ignore algorithms if and only if λ 6= 1. For the case
λ = 1 the ratio is unbounded for all deterministic Ignore algorithms.
In specific, we consider the following Ignore algorithm Alg. At time zero, the algo-
rithm computes a fixed circle in the graph G that visits all vertices and has length equal
to ham(G). Let w be a parameter to be specified later on. The algorithm waits for time
w and then serves the requests that arrived using the precomputed tour. All requests that
arrive in the meantime are ignored and served in the subsequent phase, etc.
Theorem 7.3. For any stochastic process as defined above, if λ 6= 1, then there is a
constant c(λ) such that the algorithm alg yields

eλ (alg) ≤ c(λ) · (ham(G) + c).

There is a process with λ = 1 such that no Ignore algorithm is expected competitive.


The statement for λ = 1 of the theorem follows from the proof of Theorem 7.2 by
observing that the arrival rate of the requests is one. The statement for λ 6= 1 follows
directly from Lemma 7.5 and Lemma 7.6. The ideas behind these are as follows: For
λ < 1 the requests arrive slower than they are served, i.e., the algorithm is competitive.
For λ > 1 they arrive faster than they can be served, i.e., even the optimum is large.
Now we establish the following technical lemma – the main tool for our analysis –
which relates E [alg/opt] with E [alg] /E [opt]. The intended application is as follows.
We partition the probability space into two parts. The first part is a “good event”, which,
whenever it occurs, implies a bounded competitive ratio. The second part is a “bad event”
for which the value of alg/opt might be large. However, the hope is that this bad event
is unlikely.
Lemma 7.4. Let X and Y be two positive random variables. Then for every δ > 0 such
that Pr [Y ≥ δE [Y ]] > 0 we have
   
X E [X] X
E ≤ +E Y < δE [Y ] Pr [Y < δE [Y ]] .
Y δE [Y ] Y

Proof. First condition on the event Y ≥ E [Y ] δ and then use E [ X | A] ≤ E [X] /Pr [A]
which holds for any non-negative random variable and any event A with Pr [A] > 0. We
deduce
   
X X
E =E Y ≥ δE [Y ] Pr [Y ≥ δE [Y ]]
Y Y
 
X
+E Y < δE [Y ] Pr [Y < δE [Y ]]
Y
E [ X | Y ≥ δE [Y ]] Pr [Y ≥ δE [Y ]]

δE [Y ]
 
X
+E Y < δE [Y ] Pr [Y < δE [Y ]]
Y
 
E [X] X
≤ +E Y < δE [Y ] Pr [Y < δE [Y ]] .
δE [Y ] Y
The next to last inequality holds since for every event A with Pr
 [A] > 0 the
 conditional
expectation can be bounded in the way E [ X | A] = (E [X] − E X | A Pr A )/Pr [A] ≤
E [X] /Pr [A], where we have used that X is non-negative.

57
Below we use the following additional notation. A request rj is called pending while it
has arrived but not departed. By definition, the algorithm initially waits until time T0 = w.
Let the number of requests that arrive during the time interval I0 = [0, T0 ) be A0 . The
time when the algorithm has served these requests and is back at the initial vertex 1 be T1 .
Further let the number of arriving requests in the time interval I1 = [T0 , T1 ) be A1 . This
induces a sequence of intervals I0 , I1 , I2 , . . . , called phases, stopping times T0 , T1 , T2 , . . .
and numbers A0 , A1 , A2 , . . . , called arrivals. Fj denotes the (random) flowtime of a request
rj and I(j) the phase in which rj arrives. The optimum flowtime of request rj is denoted
Fj∗ .

Lemma 7.5. For λ < 1 we have eλ (alg) ≤ (4/(1 − λ) + c/2) · (ham(G) + 1).

Proof. We begin by choosing a value for the parameter w, i.e., the time the algorithm waits
before starting its first phase. For any positive value of w, we expect that λw requests
arrive while waiting. Hence we expect that the first phase takes time ham(G) + λw. We
require that this time be no more than w, i.e., we need ham(G) + λw ≤ w. Hence we
choose w = ham(G)/(1 − λ) which is possible for λ < 1.
Let t be any fixed time and let Xt be the (random) number of requests that arrive until
time t. Now we establish the bounds F ∗ ≥ Xt and E [F ] ≤ E [Xt ] 2(ham(G) + 1)/(1 − λ).
The optimum flowtime for any request rj is Fj∗ ≥ 1 and for Fj we have Fj ≤ ham(G) +
AI(j)−1 + ham(G) + AI(j) + 1.
We prove by induction that E [Ai ] ≤ λw for all i ≥ 0. This implies E [Fj ] ≤ 2ham(G) +
(2λw+1) ≤ 2ham(G)/(1−λ)+1, by choice of w. The base case E [A0 ] = λw is clear. For the
inductive case we exploit the following assumed property of our process: For any stopping
time T , E [XT +t − XT ] ≤ E [Xt ] holds for all t ≥ 0. By induction hypothesis E [Ai ] ≤ λw
holds for a fixed i and it follows that E [Ai+1 ] = λE [Ti+1 − Ti ] ≤ λE [ham(G) + Ai ] ≤
λ(ham(G) + λw) ≤ λw. The crude estimates F ≤ Xt2 (ham(G) + 1) and F ∗ ≥ Xt give
 
F 1 E [Xt ] (ham(G) + 1)
E Xt < E [Xt ] ≤ .
F∗ 2 2

With Lemma 7.4, Chebyshev’s inequality, and the choice δ = 1/2 we have
 
F 1
eλ (alg, t) ≤ E Xt ≥ E [Xt ]
Xt 2
   
F 1 1
+E X t < E [X t ] Pr X t < E [X t ]
F∗ 2 2
 
2E [F ] E [Xt ] (ham(G) + 1) 1
≤ + Pr |Xt − E [Xt ] | ≥ E [Xt ]
E [Xt ] 2 2
4(ham(G) + 1) E [Xt ] (ham(G) + 1) Var [Xt ]
≤ + · ,
1−λ 2 E [Xt ]2

which yields the claim using Var [Xt ] ≤ cE [Xt ], as assumed.

Lemma 7.6. Let 0 < δ < 1 be an any constant such that (1 − δ)λ > 1. Then we have
!
alg 2 cλ2
E (λ) ≤ + 1 2 · (ham(G) + 1).
δ 2 (λ − 1−δ )

58
Proof. Here we choose w = 0, i.e., the algorithms begins its first phase immediately after
the arrival of the first request. We stop the process at any fixed time t and let Xt be the
number of requests that arrived.
As upper bound we use the crude estimate F ≤ Xt2 (ham(G) + 1). Now we establish
the lower bound F ∗ ≥ Xt2 δ 2 /2 which holds conditional on the event that Xt (1 − δ) ≥ t.
At time t the optimum can have served at most t many requests. Thus, by the condition
on Xt , there are P ≥ XP t − t ≥ δXt requests pending. As each request takes time at
least one we have F ∗ ≥ Pi=1 i ≥ δ 2 Xt2 /2. Thus, using Chebyshev’s inequality, we find
Pr [Xt (1 − δ) < t] ≤ Pr [|Xt − E [Xt ] | > t(λ − 1/(1 − δ))] ≤ cλ/t(λ − 1/(1 − δ))2 and hence

2Xt2 (ham(G) + 1)
 
alg
Et (λ) ≤ E Xt (1 − δ) ≥ t
δ 2 Xt2
 2 
Xt (ham(G) + 1)
+E
Xt Xt (1 − δ) < t Pr [Xt (1 − δ) < t]
2(ham(G) + 1) (ham(G) + 1)t cλt
≤ 2
+ · 2 1 2
δ (1 − δ) t (λ − 1−δ )
2(ham(G) + 1) c(ham(G) + 1)λ2
≤ + 1 2
δ2 (λ − 1−δ )

which yields the claim using λ(1 − δ) > 1.

59

You might also like