Online Optimal Transport & Algorithms
EL MANSOURI Omar
Contents
1 Random tree metric and online optimal transport
1.1 Brief introduction to optimal transport
1.2 The online transport problems
1.3 The tree case
1.4 The general case
1.5 Wrapping-ups
1 Random tree metric and online optimal transport
1.1 Brief introduction to optimal transport
The principle of optimal transport is, given two measures µ and η on Rd , to transport µ onto η
while minimizing some cost function.
From Monge’s point of view, the objective is to find a function T : X → Y (X and Y being
the respective supports of µ and η) such that T# µ = η, where T# µ(A) = µ(T −1 (A)) is the
push-forward of µ by T, and that minimizes the cost
∫_X c(x, T(x)) dµ(x)
If µ is not regular (absolutely continuous with respect to the Lebesgue measure), such a
“transport map” T might not exist, as the following example shows.
[Figure: µ = δ0 in the middle; η = ½(δ−1 + δ1), with mass ½ at −1 and mass ½ at +1.]
Here, µ = δ0 and η = ½(δ−1 + δ1). In order to transport µ onto η, one would have to split the
mass at 0 into equal halves and send each half onto −1 and 1 respectively, effectively mapping 0
to two different destinations.
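This splitting is exactly what Kantorovich’s relaxation permits. As a quick numerical check (a sketch using SciPy’s `scipy.stats.wasserstein_distance`, which works on empirical one-dimensional distributions):

```python
# W1 between µ = δ0 and η = ½(δ−1 + δ1): the optimal plan splits the
# unit mass at 0 and sends half to −1 and half to +1, each at cost 1.
from scipy.stats import wasserstein_distance

w1 = wasserstein_distance([0.0], [-1.0, 1.0])  # uniform weights by default
print(w1)  # 1.0
```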
In Kantorovich’s terms, the goal is to find a distribution π ∈ M(X , Y), where M(X , Y) is the
set of all joint distributions on X × Y with marginals µ on X and η on Y, such that the cost
∫_{X ×Y} c(x, y) dπ(x, y)
is minimized. Note that this objective function is linear in π, as are the constraints π1 = µ
and π2 = η. Hence, Kantorovich’s formulation of the problem is a linear program, whose dual
can be expressed as
max_{f : X →R, g : Y→R} ∫_X f(x) dµ(x) + ∫_Y g(y) dη(y)   such that ∀ (x, y) ∈ X × Y, f(x) + g(y) ≤ c(x, y)
The typical choice for the cost function is c(x, y) = d(x, y) for some distance d. Under these
conditions, and raising the cost to a power p, the problem can be restated as searching for
W_p(µ, η) = min_{π∈M(X ,Y)} E[d^p(x, y)]^{1/p}
Theorem 1.1. If µ is supported on R^d and µ_n = (1/n) Σ_{i=1}^{n} δ_{X_i} where X_i ∼ µ i.i.d., then
E[W(µ, µ_n)] ≲ n^{−1/d} if d ≥ 2,   and   E[W(µ, µ_n)] ≤ 1/√n if d = 1.
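For d = 1 this rate is easy to observe numerically. The sketch below is mine, not from the notes: it estimates E[W1(µ, µ_n)] for µ = U[0, 1] using the identity W1(µ, µ_n) = ∫₀¹ |F_n(t) − t| dt, computed exactly from the sorted sample.

```python
import random

def w1_to_uniform(sample):
    """Exact W1 between U[0,1] and the empirical measure of `sample`,
    via W1 = integral of |F_n(t) - t| over [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    grid = [0.0] + xs + [1.0]
    total = 0.0
    for i in range(n + 1):            # F_n = i/n on [grid[i], grid[i+1]]
        a, b, c = grid[i], grid[i + 1], i / n
        if c <= a:                    # |c - t| = t - c on the whole interval
            total += (b - a) * ((a + b) / 2 - c)
        elif c >= b:                  # |c - t| = c - t on the whole interval
            total += (b - a) * (c - (a + b) / 2)
        else:                         # the level c crosses the interval
            total += ((c - a) ** 2 + (b - c) ** 2) / 2
    return total

rng = random.Random(0)

def estimate(n, trials=200):
    """Monte-Carlo estimate of E[W1(µ, µ_n)] for µ = U[0,1]."""
    return sum(w1_to_uniform([rng.random() for _ in range(n)])
               for _ in range(trials)) / trials

e10, e1000 = estimate(10), estimate(1000)
```

Going from n = 10 to n = 1000 should shrink the estimate by roughly a factor √100 = 10, in line with the 1/√n rate.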
The difficult case is when F = µ. The data is still u_1, …, u_n ∈ R^d, and at each step
v_j ∼ U({u_1, …, u_n}); then observe, match, and pay. Here, the v_j are i.i.d. (sampled with
replacement) and E[OPT] = E[W(µ, µ_n)]. The optimal online solution is obtained through dynamic
programming, but it is too complicated to analyze (and to compute). We need a simpler algorithm.
The idea is to maintain an “invariant”: after k iterations, the distribution of the n − k remaining
v_i is that of a uniform subset of size n − k of the n points. We thus minimize, at each iteration,
the cost of maintaining this invariant.
Let us denote by S_k the set of the k remaining v_i. Then S_k is a set of k points drawn
from {u_1, …, u_n} without replacement. At stage n − k, there remain k points in S_k and
the algorithm computes the optimal matching between the uniform distribution on {u_1, …, u_n}
(i.e. the law of v_{n−k+1}) and the uniform distribution on S_k. It then matches v_{n−k+1} at random
using this optimal matching.
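On the real line, the optimal coupling between two discrete uniform distributions is the monotone (quantile) coupling, which gives a concrete way to sketch one step of this algorithm. The sketch below is illustrative: the helper `quantile_match` and the toy instance are mine, and it only approximates the invariant-preserving matching described above.

```python
import random

def quantile_match(points, remaining, v, rng):
    """Match the arrival v (one of `points`) to an element of `remaining`,
    sampling from the monotone coupling of U(points) with U(remaining)."""
    pts, rem = sorted(points), sorted(remaining)
    n, k = len(pts), len(rem)
    j = pts.index(v)                    # rank of v among all points
    q = (j + rng.random()) / n          # uniform quantile within v's mass
    return rem[min(int(q * k), k - 1)]  # same quantile among remaining points

rng = random.Random(0)
points = [rng.random() for _ in range(20)]  # n points on the line
remaining = list(points)
cost = 0.0
for _ in range(len(points)):
    v = rng.choice(points)              # requests drawn with replacement
    m = quantile_match(points, remaining, v, rng)
    cost += abs(v - m)                  # pay the line distance
    remaining.remove(m)
```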
We know that E[OPT] = E[W (µ, µn )]. The expected cost of the algorithm is
E[ALG] = Σ_{k=1}^{n} E[W(µ, U(S_k))]
where Sk is a k-subset of {u1 , . . . , un } drawn without replacement.
Lemma 1.2. We have: c_k = E_{S_k∼I_k}[M(S_k)] ≥ E_{S_k∼U_k}[M(S_k)].
Hence, setting OPT = c_n, we obtain the competitive ratio of the algorithm by:
ALG ≤ (1/n) Σ_{k=1}^{n} c_k,   hence   CR(ALG) ≤ (1/(n c_n)) Σ_{k=1}^{n} c_k
Using the fact that E[W(µ, µ_k)] ≤ k^{−1/d},
CR(ALG) ≲ (1/(n c_n)) Σ_{k=1}^{n} c_k ≲ 1/(n^{1/d} c_n)
Proof. Let e be an edge of the line (“two adjacent nodes”) and let us denote by d_e the
distance cost of this edge. Then,
M(S_k) = Σ_{e∈E} d_e |X_e/k − n_e/n|
where n_e is the size (number of u_i) of the smallest half-line starting from e and X_e is the
number of v_j that belong to this half-line. In the example above, n_e = 2 and X_e = 1. Note
that X_e is a random variable, but we know its law: it is binomial with parameters k and n_e/n,
i.e. X_e ∼ B(k, n_e/n). Then,
E[M(S_k)] ≤ Σ_{e∈E} (d_e/k) √(k (n_e/n)(1 − n_e/n)) ≤ Σ_{e∈E} d_e √(n_e/(nk))
Similarly, OPT = M(S_n), so that
E[OPT] = E[Σ_{e∈E} d_e |X_e/n − n_e/n|]
where X_e ∼ B(n, n_e/n). It also holds that E[|X_e − E[X_e]|] ≥ σ(X_e)/2 ≥ √n_e/4, with σ(X_e)
the standard deviation of X_e (using n_e ≤ n/2). Hence,
E[OPT] ≥ Σ_{e∈E} d_e √n_e/(4n)
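The two binomial mean-absolute-deviation bounds used in this argument (E|X − E[X]| ≤ σ(X), which follows from Jensen, and E|X − E[X]| ≥ σ(X)/2 when the mean is at least 1) can be sanity-checked numerically. This check is mine, not from the notes:

```python
import math

def binom_mad_over_sigma(n, p):
    """E|X - E[X]| / sigma(X) for X ~ B(n, p), computed exactly from the pmf."""
    mean, sigma = n * p, math.sqrt(n * p * (1 - p))
    mad = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) * abs(k - mean)
              for k in range(n + 1))
    return mad / sigma

# a few regimes with mean n*p >= 1 and p <= 1/2, as in the proof
ratios = [binom_mad_over_sigma(n, p)
          for n, p in [(100, 0.01), (100, 0.1), (50, 0.5), (20, 0.25)]]
```

In each of these regimes the ratio sits between 1/2 and 1, consistent with the bounds above.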
Corollary 1.4. The result also holds for a tree (the exact same proof works).
1.4 The general case
In general the metric induced by u_1, …, u_n is not a tree metric. But if there were a tree T such
that d(u_i, u_j) ≤ d_T(u_i, u_j) ≤ C d(u_i, u_j) for all i, j (the vertices of T being the u_i), where
d_T is the metric of T, then using the algorithm with respect to the tree metric d_T would provide
E[ALG] = (1/n) Σ_{k=1}^{n} E_{S_k∼U_k}[M(S_k)]
≤ (1/n) Σ_{k=1}^{n} E_{S_k∼U_k}[M^T(S_k)]   because d ≤ d_T
≤ 4 E[OPT^T]   because d_T is a tree metric
≤ 4C E[OPT]   because d_T ≤ C d
So, the algorithm would be 4C-competitive. Unfortunately, there is no such deterministic tree
with C ≪ n, as the following example shows: take n points on a cycle, with distance 1 between
neighbors; any spanning tree must break one edge of the cycle, putting its two endpoints at tree
distance n − 1. But if we break the cycle at a uniformly random edge, then
E[d_T(u_i, u_{i+1})] = (1 − 1/n) · 1 + (1/n) · n = 2 − 1/n ≤ 2.
Theorem 1.5. Given n points u1 , . . . , un and a metric d(· , ·), there exists a random tree T
such that d(u, v) ≤ E[dT (u, v)] ≤ 8 log(n)d(u, v).
Proof. The random tree is constructed as follows: first, let π be a random permutation of [n],
then let β ∈ (1, 2) be a random number drawn from the distribution with density 1/(x log 2)
on (1, 2). Without loss of generality, we can assume that min d(u, v) ≥ 1 and max d(u, v) = 2^I
for some large I ∈ N.
We will have D_I = {{u_1, …, u_n}}. Given a layer D_{i+1} (a collection of clusters of [n]), we
build D_i inductively by splitting each of its clusters. Denote β_i = β 2^{i−1}; for every cluster
of D_{i+1} and every k ∈ {1, …, n}, we create a new cluster with all vertices of that cluster not
yet assigned whose distance to u_{π(k)} is smaller than β_i.
And the distance between a cluster of D_{i+1} and its sub-clusters in D_i is 2^{i+1}. Given a pair
(u, v) of vertices, we say that w “settles” (u, v) at level i if w is the first center to which u or v is
assigned, and that w “cuts” (u, v) at level i if w settles (u, v) at that level but not both u and v
are assigned to w. Therefore, if (u, v) is cut at level i, then d_T(u, v) ≲ 2^{i+2}.
This implies that d_T(u, v) ≤ Σ_w d_T^w(u, v).
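The hierarchical construction above can be sketched in code. This is an illustrative implementation (the function names, the edge-length convention 2^{i+1} between level i and level i + 1, and the toy metric are mine); the test only checks the dominance d(u, v) ≤ d_T(u, v), since the 8 log n upper bound holds only in expectation over π and β.

```python
import math
import random

def frt_decomposition(n, dist, rng):
    """Random hierarchical decomposition in the style of Theorem 1.5.
    Assumes min pairwise distance >= 1. Returns the partitions D_0, ..., D_I
    of {0, ..., n-1}, with D_I the single full cluster."""
    pi = list(range(n))
    rng.shuffle(pi)                          # random permutation of centers
    beta = 2 ** rng.random()                 # density 1/(x log 2) on (1, 2)
    dmax = max(dist(u, v) for u in range(n) for v in range(u + 1, n))
    I = max(1, math.ceil(math.log2(dmax)))
    levels = {I: [list(range(n))]}
    for i in range(I - 1, -1, -1):           # build D_i by splitting D_{i+1}
        beta_i = beta * 2 ** (i - 1)
        levels[i] = []
        for cluster in levels[i + 1]:
            unassigned = set(cluster)
            for c in pi:                     # centers in permutation order
                ball = [v for v in unassigned if dist(v, c) < beta_i]
                if ball:
                    levels[i].append(ball)
                    unassigned -= set(ball)
    return levels, I

def tree_distance(levels, I, u, v):
    """d_T(u, v): climb until u and v share a cluster, paying, at each level i
    where they are still separated, the two edges of length 2^(i+1)."""
    d = 0
    for i in range(I):
        cluster_u = next(C for C in levels[i] if u in C)
        if v in cluster_u:
            return d
        d += 2 * 2 ** (i + 1)
    return d
```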
1.5 Wrapping-ups
Theorem 1.6. The algorithm for online stochastic matching (applied to the random tree metric)
has a competitive ratio of Θ(log n).
With more care, one can get a competitive ratio of O((log log log n)^2). According to an
ongoing conjecture, it should be constant (with perhaps a dimension-dependent constant).
An algorithm is C-competitive if there exists some constant α > 0 such that, for any sequence
of requests, E[ALG] ≤ C E[OPT] + α (α simply reflects the fact that initialization does not
matter).
Remark 2.1. A natural idea would be to use a greedy algorithm: move the closest server to
the request. For instance, if X = {x1, x2, x3} (which is discrete) and ρ0 = {x1, x3}:
[Figure: server 1 at x1, server 2 at x3, with x2 in between.]
The case of the K-server problem turns out to be very complex, so we will instead focus on a
simpler version: the K-paging problem.
In the K-paging problem, X = [n] = {1, …, n} and d(x_i, x_j) = 1_{x_i ≠ x_j} (the original motivation
behind this lies in the context of a memory of size K and n different request types, e.g. web
pages, for which a page needs to be in the memory to be displayed).
We are going to look for fractional algorithms: a fractional algorithm specifies at each time
the “probability” of having a server at a given position. Instead of ρ = (ρ_1, …, ρ_K) ∈ [n]^K, we
will consider z = (z_1, …, z_n) ∈ [0, 1]^n such that Σ_{i=1}^{n} z_i = K, or equivalently, the anti-paging
allocation, setting x = 1 − z, so that x = (x_1, …, x_n) is required to satisfy Σ_{i=1}^{n} x_i = n − K.
Define then:
P = { x = (x_1, …, x_n) ∈ [0, 1]^n | Σ_{i=1}^{n} x_i = n − K }
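A tiny numeric sanity check of this change of variables (the values below are illustrative, not from the notes): a fractional server configuration z of total mass K corresponds to an anti-paging vector x = 1 − z lying in P.

```python
# n = 5 pages, K = 2 servers of fractional mass (illustrative values)
n, K = 5, 2
z = [0.6, 0.4, 0.5, 0.3, 0.2]             # fractional server mass per page
assert abs(sum(z) - K) < 1e-9             # total server mass is K
x = [1 - zi for zi in z]                  # anti-paging allocation
in_P = all(0 <= xi <= 1 for xi in x) and abs(sum(x) - (n - K)) < 1e-9
print(in_P)  # True
```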
The performance of an algorithm is measured by its regret
R_T = (1/T) Σ_{t=1}^{T} f_t(x_t) − min_{x∈X} (1/T) Σ_{t=1}^{T} f_t(x)
Note that in the minimum, the same x is used to evaluate each f_t. If f_t = f for some function f
independent of t, then R_T ≥ f((1/T) Σ_{t=1}^{T} x_t) − f(x*) (the “optimization error”).
Definition 2.4. Given a convex open set D ⊂ R^d, ϕ is a mirror map for D if:
∗) ϕ is strictly convex and differentiable
∗) ∇ϕ(D) = R^d
∗) lim_{x→∂D} ∥∇ϕ(x)∥ = +∞
If ϕ is a mirror map and X ⊂ D, the mirror descent algorithm is defined by:
∇ϕ(y_{t+1}) = ∇ϕ(x_t) − η ∇f(x_t)
x_{t+1} = π_X^ϕ(y_{t+1})
[Figure: mirror descent — x_t is mapped to the dual space via ∇ϕ(·), a gradient step −η∇f(x_t) is taken there, the result ∇ϕ(y_{t+1}) is mapped back via ∇ϕ(·)^{−1}, and y_{t+1} is projected onto X to obtain x_{t+1}.]
where π_X^ϕ is a projection onto X that respects the geometry of ϕ, i.e. x_{t+1} = argmin_{x∈X} D_ϕ(x, y_{t+1}),
with D_ϕ the Bregman divergence defined by
D_ϕ(x, y) = ϕ(x) − ϕ(y) − ∇ϕ(y)^T(x − y) ≈ ½ (x − y)^T ∇²ϕ(y) (x − y)
(the terms ϕ(y) + ∇ϕ(y)^T(x − y) form a first-order Taylor expansion of ϕ(x)).
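As a concrete instance (an illustrative sketch, not code from the notes): with the entropy mirror map ϕ(x) = Σ x_i log x_i on the simplex, ∇ϕ(x)_i = 1 + log x_i, so the dual step becomes a multiplicative update and the Bregman (KL) projection onto the simplex is a simple renormalization; this is the exponentiated-gradient algorithm.

```python
import math

def entropic_mirror_descent(grad, x0, eta, T):
    """Mirror descent with phi(x) = sum x_i log x_i on the simplex.
    Dual step: multiplicative update; projection: renormalization."""
    x = list(x0)
    for _ in range(T):
        g = grad(x)
        # grad_phi(y) = grad_phi(x) - eta * g, i.e. log y_i = log x_i - eta * g_i
        y = [xi * math.exp(-eta * gi) for xi, gi in zip(x, g)]
        s = sum(y)
        x = [yi / s for yi in y]     # Bregman projection onto the simplex
    return x

# toy linear loss f(x) = <c, x>, minimized over the simplex at argmin_i c_i
c = [0.9, 0.5, 0.1]
x_T = entropic_mirror_descent(lambda x: c, [1 / 3, 1 / 3, 1 / 3], eta=0.5, T=200)
```

After 200 steps the iterate concentrates on coordinate 2, the minimizer of the linear loss over the simplex, while staying in the relative interior as the mirror-map conditions require.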
Theorem 2.6. The mirror descent (with respect to ϕ) for f convex and L-Lipschitz, with
η = (R/L)√(2ρ/T), where R² = sup_{x∈X} ϕ(x) − ϕ(x_1) and ρ is the strong convexity parameter of ϕ,
ensures that:
R_T ≤ RL √(2/(ρT))
by telescoping in the first two terms. In addition,
Corollary 2.7. If X = {(x_1, …, x_n) ∈ [0, 1]^n | Σ_k x_k = 1} and ∥∇f∥_∞ ≤ 1, then the mirror
descent with the entropy ϕ(x) = Σ_{i=1}^{n} x_i log(x_i) ensures that
R_T ≤ √(log(K)/T)   (instead of √(K/T))
which is the continuous-time version of the unprojected mirror descent. In our case (since the
anti-paging polytope P is not the simplex), the mirror descent, both in discrete and continuous
time, might “hit” the boundary of P.
If we define the normal cone of P at x by the expression
N_P(x) = {z ∈ R^d | z^T(y − x) ≤ 0, ∀ y ∈ P}
The interesting property is that this equation has a unique solution starting from x(0). The
algorithm is given a request e_r and follows the continuous-time solution of the projected mirror
descent until the request is satisfied.
Proof. (The result is actually useless here because sup ∥∇ϕ∥ = +∞.) We will look at how the
Bregman divergence between x(t) (the position of the algorithm) and y(t) (the position of
the optimum) evolves. More precisely, we shall focus on:
D̂_ϕ(y, x) = −ϕ(x) − ⟨∇ϕ(x), y − x⟩
Indeed, ∂_t D̂_ϕ(y, x) = ∂_{t,1} D̂_ϕ(y, x) ẏ + ∂_{t,2} D̂_ϕ(y, x) ẋ, the two terms respectively corresponding
(up to ∥∇ϕ∥_∞) to the instantaneous cost of OPT and to the instantaneous cost of ALG.
Since D̂_ϕ(y, x) = −ϕ(x) − ⟨∇ϕ(x), y − x⟩, if y is fixed and we differentiate with respect to t,
then
∂_t D̂_ϕ(y, x) = −∇ϕ(x)^T ẋ − ⟨∇²ϕ(x) ẋ, y − x⟩ + ⟨∇ϕ(x), ẋ⟩
= −⟨∇²ϕ(x) ẋ, y − x⟩
∈ ⟨e_r + N_P(x), y − x⟩