
MATH858D

MARKOV CHAINS

MARIA CAMERON

Contents
1. Discrete time Markov chains
1.1. Time evolution of the probability distribution
1.2. Communicating classes and irreducibility
1.3. Hitting times and absorption probabilities
1.4. Solving recurrence relationships
1.5. Recurrence and transience
1.6. Invariant distributions and measures
2. Time reversal and detailed balance
3. Markov Chain Monte Carlo methods
3.1. Metropolis and Metropolis-Hastings algorithms
3.2. Ising Model
3.3. MCMC for cryptography
4. Continuous time Markov chains
4.1. Right-continuous random processes
4.2. The exponential distribution
4.3. Jump chains and holding times
4.4. Class structure
4.5. Hitting times and absorption probabilities
4.6. Recurrence and transience
4.7. Invariant distributions and convergence to equilibrium
4.8. Time reversal and detailed balance for continuous-time Markov chains
5. Transition Path Theory
5.1. Settings
5.2. Reactive trajectories
5.3. The forward and backward committors
5.4. Probability distribution of reactive trajectories
5.5. Probability current of reactive trajectories
5.6. Effective current
5.7. Transition rate
5.8. Reaction pathways
5.9. Simplifications for time-reversible Markov chains
5.10. Sampling reactive trajectories and no-detour reactive trajectories
References

1. Discrete time Markov chains


Think about the following problem.
Example 1 (Gambler's ruin). Imagine a gambler who has \$1 initially.
At each discrete moment of time $t = 0, 1, \ldots$, the gambler can bet \$1 if he
has it, winning one more dollar with probability $p$ or losing it with probability
$q = 1 - p$. If the gambler runs out of money, he is ruined and cannot play
anymore. What is the probability that the gambler will be ruined?
The gambling process described in this problem exemplifies a discrete time Markov
chain. In general, a discrete time Markov chain is defined as a sequence of random variables
$(X_n)_{n\ge0}$ taking values in a finite or countable set, which we will denote by $S$ and call the
set of states, and characterized by the Markov property: the conditional distribution of
$X_{n+1}$ given $X_n, \ldots, X_0$ depends only on $X_n$ and does not depend on $X_k$ for
$k \le n-1$.
Definition 1. We say that a sequence of random variables (Xn )n≥0 , Xn : Ω → S ⊂ Z, is
a Markov chain with initial distribution λ and transition matrix P = (pij )i,j∈S if
(1) X0 has distribution λ = {λi | i ∈ S} and
(2) the Markov property holds:
$$\mathbb{P}(X_{n+1} = i_{n+1} \mid X_n = i_n, \ldots, X_0 = i_0) = \mathbb{P}(X_{n+1} = i_{n+1} \mid X_n = i_n) = p_{i_n i_{n+1}}.$$
Note that the ith row of P is the probability distribution for Xn+1 conditioned on the
fact that Xn = i. Therefore, all entries of the matrix P are nonnegative, and the row sums
are equal to one:
$$p_{ij} \ge 0, \qquad \sum_{j\in S}\mathbb{P}(X_{n+1} = j \mid X_n = i) = \sum_{j\in S}p_{ij} = 1.$$

A matrix $P$ satisfying these conditions is called stochastic.


Some natural questions about a Markov chain are:
• What is the equilibrium probability distribution, i.e., the one that is preserved from
step to step?
• Does the probability distribution of $X_n$ tend to the equilibrium distribution?
• How can one find the probability of reaching some particular subset of states $A \subset S$?
What is the expected time to reach this subset of states?
• Suppose we have selected two disjoint subsets of states $A$ and $B$. What is the
probability of reaching $B$ before $A$ starting from a given state? What is the
expected time to reach $B$ starting from $A$?
Prior to addressing these questions, we will go over some basic concepts.
1.1. Time evolution of the probability distribution. If the set of states $S$ is finite,
i.e., if $|S| = N$, then $P^n$ is merely the $n$th power of $P$. If $S$ is infinite, we define $P^n$ by
$$(P^n)_{ij} \equiv p_{ij}^{(n)} = \sum_{i_1\in S}\cdots\sum_{i_{n-1}\in S}p_{ii_1}p_{i_1i_2}\cdots p_{i_{n-1}j}.$$

The notation $\mathbb{P}_i(X_n = j)$ denotes the probability that the Markov process starting at $i$ at
time 0 reaches state $j$ at time $n$:
$$\mathbb{P}_i(X_n = j) := \mathbb{P}(X_n = j \mid X_0 = i).$$
Theorem 1. Let $(X_n)_{n\ge0}$ be a Markov chain with initial distribution $\lambda$ and transition
matrix $P$. Then for all $n, m \ge 0$:
(1) $\mathbb{P}(X_n = j) = (\lambda P^n)_j$;
(2) $\mathbb{P}_i(X_n = j) = \mathbb{P}(X_{n+m} = j \mid X_m = i) = p_{ij}^{(n)}$.

Proof. (1) We have
$$\mathbb{P}(X_n = j) = \sum_{i_0\in S}\cdots\sum_{i_{n-1}\in S}\mathbb{P}(X_n = j, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0)$$
$$= \sum_{i_0\in S}\cdots\sum_{i_{n-1}\in S}\mathbb{P}(X_n = j \mid X_{n-1} = i_{n-1}, \ldots, X_0 = i_0)\,\mathbb{P}(X_{n-1} = i_{n-1}, \ldots, X_0 = i_0)$$
$$= \sum_{i_0\in S}\cdots\sum_{i_{n-1}\in S}\mathbb{P}(X_n = j \mid X_{n-1} = i_{n-1})\,\mathbb{P}(X_{n-1} = i_{n-1} \mid X_{n-2} = i_{n-2})\cdots\mathbb{P}(X_0 = i_0)$$
$$= \sum_{i_0\in S}\cdots\sum_{i_{n-1}\in S}\lambda_{i_0}p_{i_0i_1}\cdots p_{i_{n-1}j} = (\lambda P^n)_j.$$
(2) The second statement is proven similarly. □
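To make Theorem 1 concrete, here is a small numerical sketch (not part of the original notes): it propagates an initial distribution $\lambda$ by repeated multiplication with a made-up 3-state transition matrix and compares the result with the corresponding row of $P^n$.

```python
import numpy as np

# Hypothetical 3-state transition matrix (each row sums to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
lam = np.array([1.0, 0.0, 0.0])          # initial distribution: start in state 0

# P(X_n = j) = (lam P^n)_j, computed by repeated right-multiplication.
dist = lam.copy()
for _ in range(50):
    dist = dist @ P
print(dist)                               # distribution of X_50
print(np.linalg.matrix_power(P, 50)[0])   # row of P^n: p_{0j}^{(n)}, same values
```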




1.2. Communicating classes and irreducibility. We say that state $i$ leads to state $j$
(denoted $i \to j$) if
$$\mathbb{P}_i(X_n = j \text{ for some } n \ge 0) > 0.$$
If $i$ leads to $j$ and $j$ leads to $i$, we say that $i$ and $j$ communicate and write $i \leftrightarrow j$. Note
that $i$ leads to $j$ if and only if one can find a finite sequence $i_1, \ldots, i_{n-1}$ such that
$$p_{ii_1} > 0,\ p_{i_1i_2} > 0,\ \ldots,\ p_{i_{n-1}j} > 0.$$
This, in turn, is equivalent to the condition that $p_{ij}^{(n)} > 0$ for some $n$.
The relation $\leftrightarrow$ is an equivalence relation, as it is
(1) symmetric: if $i \leftrightarrow j$ then $j \leftrightarrow i$;
(2) reflexive: $i \leftrightarrow i$;
(3) transitive: $i \leftrightarrow j$ and $j \leftrightarrow k$ imply $i \leftrightarrow k$.
Therefore, the set of states is divided into equivalence classes with respect to the relation
←→ called communicating classes.
Definition 2. We say that a communicating class C is closed if
i ∈ C, i −→ j imply j ∈ C.

Once the chain jumps into a closed class, it stays there forever.
A state i is called absorbing if {i} is a closed class. In the corresponding network, the
vertex i has either only incoming edges, or no incident edges at all.

Example 2. Let us identify the states in the Gambler's ruin Markov chain of Example 1
with the number of dollars the gambler has. It is easy to see that the states $\{1, 2, \ldots\} =: C_1$
constitute a communicating class. The class $C_1$ is not closed because state $1 \in C_1$ leads to
state $0 \notin C_1$. State 0 forms a closed communicating class $\{0\} =: C_0$ and is an absorbing state.

Definition 3. A Markov chain whose set of states S is a single communicating class is


called irreducible.

Example 3. Let us consider a set of 7 identical particles shaped like balls interacting
according to a sticky potential: the particles do not interact when they do not touch each
other, and they stick together when they touch, forming a bond. Some amount of energy
needs to be spent in order to break a bond. One example of such a system is a toy constructor
consisting of magnetic sticks and steel balls. Another example is micron-size styrofoam balls
immersed in water. M. Brenner's and V. Manoharan's group (Harvard University) conducted
a number of physical experiments with such balls. M. Holmes-Cerfon and collaborators
developed an efficient numerical algorithm for enumerating all possible configurations of
particles and calculating transition rates between the configurations. A complete enumeration
has been done for up to 14 particles, and a partial one for up to 19 [10]. One can model the
dynamics of such a particle system as a continuous-time Markov chain which, in turn, can be
converted into a jump chain, i.e., a discrete-time Markov chain. Such a jump chain for 7
particles is displayed in Fig. 1. The numbers next to the arrows are the transition
probabilities. This chain was obtained from Fig. 6 in [9]. This Markov chain is irreducible
because the process starting at any configuration can reach any other configuration. While
there are no direct jumps between states 2 and 4, the transitions between them can happen in
two jumps. The same is true for states 1 and 5. The transition matrix for this chain is given by
(1)
$$P = \begin{pmatrix}
0.7395 & 0.0299 & 0.0838 & 0.1467 & 0\\
0.1600 & 0.1520 & 0.4880 & 0 & 0.2000\\
0.1713 & 0.1865 & 0.4893 & 0 & 0.1529\\
0.8596 & 0 & 0 & 0 & 0.1404\\
0 & 0.2427 & 0.4854 & 0.1553 & 0.1165
\end{pmatrix}.$$
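As a quick sanity check (a sketch added here, not part of the original text), one can enter the matrix from Eq. (1) and verify the claims above numerically: the rows sum to one, there is no direct jump between states 2 and 4, but a two-jump transition has positive probability. The code indexes states from 0 while the text indexes them from 1.

```python
import numpy as np

P = np.array([
    [0.7395, 0.0299, 0.0838, 0.1467, 0.0000],
    [0.1600, 0.1520, 0.4880, 0.0000, 0.2000],
    [0.1713, 0.1865, 0.4893, 0.0000, 0.1529],
    [0.8596, 0.0000, 0.0000, 0.0000, 0.1404],
    [0.0000, 0.2427, 0.4854, 0.1553, 0.1165],
])

print(P.sum(axis=1))        # each row sums to 1 (up to rounding of the entries)
P2 = P @ P
print(P[1, 3], P2[1, 3])    # no direct jump 2 -> 4, but positive probability in two jumps
print(P[0, 4], P2[0, 4])    # the same for states 1 and 5
```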

1.3. Hitting times and absorption probabilities.



Figure 1. A jump chain for 7 particles interacting according to a sticky potential, obtained from Fig. 6 in [9]. (The figure shows the five states with the transition probabilities next to the arrows.)

Definition 4. Let $(X_n)_{n\ge0}$ be a Markov chain with transition matrix $P$. The hitting time
of a subset $A \subset S$ is the random variable $\tau^A : \Omega \to \{0, 1, 2, \ldots\} \cup \{\infty\}$ given by
$$\tau^A = \inf\{n \ge 0 \mid X_n \in A\},$$
where we agree that $\inf\emptyset = \infty$.
Definition 5. The probability starting from state $i$ that $(X_n)_{n\ge0}$ ever hits $A$ is
(2)
$$h_i^A = \mathbb{P}_i(\tau^A < \infty).$$
If $A$ is a closed class, $h_i^A$ is called the absorption probability. The mean time taken for
$(X_n)_{n\ge0}$ to reach $A$ starting from $i$ is
(3)
$$k_i^A = \mathbb{E}_i[\tau^A] \equiv \mathbb{E}[\tau^A \mid X_0 = i] = \sum_{n<\infty}n\,\mathbb{P}_i(\tau^A = n) + \infty\cdot\mathbb{P}_i(\tau^A = \infty).$$

Example 4. In the Gambler's ruin Example 1, a good question to ask is: what is the
probability that the gambler will eventually run out of money if initially he has $i$ dollars?
If this probability is 1, the next question is: what is the expected time for the gambler to
run out of money? Using the notation just introduced, one needs to find $h_i^{\{0\}}$ and, if
$h_i^{\{0\}} = 1$, $k_i^{\{0\}}$.

The quantities $h_i^A$ and $k_i^A$ can be calculated by solving certain linear equations.
Theorem 2. The vector of hitting probabilities $h^A = \{h_i^A \mid i \in S\}$ is the minimal
nonnegative solution to the system of linear equations
(4)
$$h_i^A = \begin{cases}1, & i \in A,\\ \sum_{j\in S}p_{ij}h_j^A, & i \notin A.\end{cases}$$
(Minimality means that if $x = \{x_i \mid i \in S\}$ is another solution with $x_i \ge 0$ for all $i$, then
$h_i^A \le x_i$ for all $i$.)

Proof. First we show that the hitting probabilities satisfy Eq. (4). Indeed, if $i \in A$ then
$\tau^A = 0$ and hence $\mathbb{P}_i(\tau^A < \infty) = 1$. If $i \notin A$, then
$$\mathbb{P}_i(\tau^A < \infty) = \sum_{j\in S}\mathbb{P}_i(\tau^A < \infty \mid X_1 = j)\,\mathbb{P}_i(X_1 = j) = \sum_{j\in S}\mathbb{P}_j(\tau^A < \infty)p_{ij} = \sum_{j\in S}h_j^Ap_{ij}.$$
Now we show that if $x = \{x_i \mid i \in S\}$ is another nonnegative solution of Eq. (4), then
$x_i \ge h_i^A$ for all $i \in S$. If $i \in A$ then $h_i^A = x_i = 1$. If $i \notin A$, we have
$$x_i = \sum_{j\in S}p_{ij}x_j = \sum_{j\in A}p_{ij} + \sum_{j\notin A}p_{ij}x_j = \sum_{j\in A}p_{ij} + \sum_{j\notin A}p_{ij}\Big(\sum_{k\in A}p_{jk} + \sum_{k\notin A}p_{jk}x_k\Big)$$
$$= \mathbb{P}_i(\tau^A = 1) + \mathbb{P}_i(\tau^A = 2) + \sum_{j\notin A}\sum_{k\notin A}p_{ij}p_{jk}x_k.$$
Continuing in this manner we obtain
$$x_i = \sum_{k=1}^n\mathbb{P}_i(\tau^A = k) + \sum_{j_1\notin A}\cdots\sum_{j_n\notin A}p_{ij_1}p_{j_1j_2}\cdots p_{j_{n-1}j_n}x_{j_n}
= \mathbb{P}_i(\tau^A \le n) + \sum_{j_1\notin A}\cdots\sum_{j_n\notin A}p_{ij_1}p_{j_1j_2}\cdots p_{j_{n-1}j_n}x_{j_n}.$$
Since $x_j \ge 0$ for all $j \in S$, the last term is nonnegative. Therefore,
$x_i \ge \mathbb{P}_i(\tau^A \le n)$ for all $n$. Hence
$$x_i \ge \lim_{n\to\infty}\mathbb{P}_i(\tau^A \le n) = \mathbb{P}_i(\tau^A < \infty) = h_i^A. \qquad\square$$


Theorem 3. The vector of mean hitting times $k^A = \{k_i^A \mid i \in S\}$ is the minimal
nonnegative solution to the system of linear equations
(5)
$$k_i^A = \begin{cases}0, & i \in A,\\ 1 + \sum_{j\in S}p_{ij}k_j^A, & i \notin A.\end{cases}$$

Proof. First we show that the mean hitting times satisfy Eq. (5). Indeed, if $i \in A$ then
$k_i^A = 0$ as $\tau^A = 0$. Let us consider two cases.
Case 1: there is $i_* \in S\setminus A$ such that $h_{i_*}^A < 1$.
Case 2: $h_i^A = 1$ for all $i \in S\setminus A$.
In Case 1, Eq. (4) implies that $h_i^A < 1$ for all $i \notin A$ such that $i \to i_*$. In this case,
$k_i^A = \infty$ for all $i$ such that $i \to i_*$ by Eq. (3). Hence Eq. (5) holds. Let us consider Case 2.
If $i \notin A$ then
$$k_i^A = \mathbb{E}_i[\tau^A] = \sum_{n=1}^\infty n\,\mathbb{P}(\tau^A = n \mid X_0 = i)
= \sum_{n=1}^\infty n\sum_{j\in S}\mathbb{P}(\tau^A = n \mid X_1 = j, X_0 = i)\,\mathbb{P}_i(X_1 = j).$$
We can switch the order of summation because all terms are nonnegative (this follows from the
monotone convergence theorem). Also, the Markov property implies that
$$\mathbb{P}(\tau^A = n \mid X_1 = j, X_0 = i) = \mathbb{P}(\tau^A = n \mid X_1 = j).$$
We continue:
$$k_i^A = \sum_{j\in S}\sum_{n=1}^\infty n\,\mathbb{P}(\tau^A = n \mid X_1 = j)\,\mathbb{P}_i(X_1 = j)
= \sum_{j\in S}\Big(\sum_{m=0}^\infty(m+1)\,\mathbb{P}(\tau^A = m \mid X_0 = j)\Big)p_{ij}$$
$$= \sum_{j\in S}\Big(\sum_{m=0}^\infty m\,\mathbb{P}(\tau^A = m \mid X_0 = j)\Big)p_{ij}
+ \sum_{j\in S}p_{ij}\sum_{m=0}^\infty\mathbb{P}(\tau^A = m \mid X_0 = j)
= \sum_{j\in S}p_{ij}k_j^A + \sum_{j\in S}p_{ij}\sum_{m=0}^\infty\mathbb{P}(\tau^A = m \mid X_0 = j).$$
Now we observe that
$$\sum_{m=0}^\infty\mathbb{P}(\tau^A = m \mid X_0 = j) = h_j^A = 1,$$
since we are considering Case 2. Finally,
$$\sum_{j\in S}p_{ij} = 1,$$
as this is a row sum of the transition matrix. As a result, we obtain the desired relation
$$k_i^A = 1 + \sum_{j\in S}p_{ij}k_j^A.$$
Now we show that if $\{y_i \mid i \in S\}$ with $y_i \ge 0$ for every $i \in S$ is another solution of Eq. (5),
then $k_i^A \le y_i$ for all $i \in S$. If $i \in A$, then $k_i^A = y_i = 0$. For $i \notin A$ we have
$$y_i = 1 + \sum_{j\in S}p_{ij}y_j = 1 + \sum_{j\notin A}p_{ij}y_j = 1 + \sum_{j\notin A}p_{ij}\Big(1 + \sum_{k\notin A}p_{jk}y_k\Big)
= \mathbb{P}_i(\tau^A \ge 1) + \mathbb{P}_i(\tau^A \ge 2) + \sum_{j\notin A}\sum_{k\notin A}p_{ij}p_{jk}y_k.$$
Continuing in this manner we obtain
$$y_i = \mathbb{P}_i(\tau^A \ge 1) + \mathbb{P}_i(\tau^A \ge 2) + \ldots + \mathbb{P}_i(\tau^A \ge n)
+ \sum_{j_1\notin A}\cdots\sum_{j_n\notin A}p_{ij_1}p_{j_1j_2}\cdots p_{j_{n-1}j_n}y_{j_n}$$
$$= \mathbb{P}_i(\tau^A = 1) + 2\,\mathbb{P}_i(\tau^A = 2) + \ldots + n\,\mathbb{P}_i(\tau^A \ge n)
+ \sum_{j_1\notin A}\cdots\sum_{j_n\notin A}p_{ij_1}p_{j_1j_2}\cdots p_{j_{n-1}j_n}y_{j_n}.$$
Since $y_j \ge 0$ for all $j$, the last term is nonnegative. Hence
$$y_i \ge \mathbb{P}_i(\tau^A = 1) + 2\,\mathbb{P}_i(\tau^A = 2) + \ldots + n\,\mathbb{P}_i(\tau^A \ge n) \quad\text{for all } n.$$
Therefore,
$$y_i \ge \sum_{n=1}^\infty n\,\mathbb{P}_i(\tau^A = n) = \mathbb{E}_i[\tau^A] = k_i^A. \qquad\square$$


Example 5. Consider a particle wandering along the edges of a cube, Fig. 2(a). If the
particle reaches vertices (0,0,0) or (1,1,1), it disappears. From each of the other vertices
(colored with a shade of grey in Fig. 2(a)), it moves to any vertex connected to it via an edge
with equal probabilities. Suppose that the particle is initially located at the vertex (0,0,1).
Find the probability that it will disappear at vertex (0,0,0).
Hint: consider four subsets of vertices:
0 ≡ {(0,0,0)},
1 ≡ {(1,0,0), (0,1,0), (0,0,1)},
2 ≡ {(0,1,1), (1,0,1), (1,1,0)}, and
3 ≡ {(1,1,1)},
as shown in Fig. 2(b). Find the probabilities to jump along each arrow in Fig. 2(b). Denote
by $P_i$ the probability for the particle to disappear at vertex (0,0,0) starting from subset $i$,
$i = 0, 1, 2, 3$. Write an appropriate system of equations for $P_i$ and solve it.
Solution 1: Transition probabilities between the subsets 0, 1, 2, and 3 are shown in Fig. 2(b).

Figure 2. Illustration for Example 5: (a) the cube; (b) the reduced chain on the subsets 0, 1, 2, 3.

Let $P_i$ be the probability for the particle to disappear at (0,0,0) provided that it is initially
at the subset of vertices $i$. Then we have:
$$P_0 = 1;\qquad P_1 = \tfrac13P_0 + \tfrac23P_2;\qquad P_2 = \tfrac23P_1 + \tfrac13P_3;\qquad P_3 = 0.$$
The solution of this system is $P_0 = 1$, $P_1 = \tfrac35$, $P_2 = \tfrac25$, $P_3 = 0$.
Solution 2: The probability to get to 0 starting from 1 is the sum over $n$ of the probabilities
to get to 0 from the $n$th visit to 1:
$$P = \sum_{n=1}^\infty\frac13\left(\frac23\cdot\frac23\right)^{n-1} = \frac{1/3}{1 - \frac49} = \frac35.$$
Answer: $\tfrac35$.

Example 6. Consider a particle wandering along the edges of a cube as in Example 5,
except that now the only absorbing state is the vertex (0,0,0). If the particle is at any other
vertex, it moves to one of the vertices connected to it by an edge with equal probability.
Find the expected time for a process starting at each vertex to be absorbed at (0,0,0).
Solution: Taking symmetry into account, we define a reduced Markov chain shown in Fig. 3.

Figure 3. Illustration for Example 6: the reduced chain on the subsets 0, 1, 2, 3.

Let $k_i = \mathbb{E}_i[\tau^0]$ be the expected first passage time to (0,0,0) provided that the particle
is initially at the subset of vertices $i$. Then we have:
$$k_0 = 0;\qquad k_1 = 1 + \tfrac13k_0 + \tfrac23k_2;\qquad k_2 = 1 + \tfrac23k_1 + \tfrac13k_3;\qquad k_3 = 1 + k_2.$$
The solution of this system is k0 = 0, k1 = 7, k2 = 9, k3 = 10.
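The linear systems (4) and (5) from Theorems 2 and 3 can also be solved numerically for the reduced chains of Examples 5 and 6. The sketch below (not part of the original notes) imposes the boundary values on the absorbing states and solves for the remaining ones.

```python
import numpy as np

# Reduced chain of Example 5 (states 0,1,2,3; 0 and 3 absorbing).
P5 = np.array([[1, 0, 0, 0],
               [1/3, 0, 2/3, 0],
               [0, 2/3, 0, 1/3],
               [0, 0, 0, 1]])
# Eq. (4) with boundary values h_0 = 1, h_3 = 0; unknowns are h_1, h_2.
inter = [1, 2]
M = np.eye(2) - P5[np.ix_(inter, inter)]
h = np.linalg.solve(M, P5[inter, 0])      # right-hand side: p_{i0} * h_0
print(h)                                  # [0.6 0.4], i.e. P_1 = 3/5, P_2 = 2/5

# Reduced chain of Example 6 (only state 0 absorbing).
P6 = np.array([[1, 0, 0, 0],
               [1/3, 0, 2/3, 0],
               [0, 2/3, 0, 1/3],
               [0, 0, 1, 0]])
# Eq. (5): k_i = 1 + sum_j p_ij k_j for i != 0, with k_0 = 0.
inter = [1, 2, 3]
M = np.eye(3) - P6[np.ix_(inter, inter)]
k = np.linalg.solve(M, np.ones(3))
print(k)                                  # [ 7.  9. 10.]
```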

1.4. Solving recurrence relationships. In the case where the Markov chain has an
infinite set of states, Z or {0, 1, 2, . . .}, and only transitions between nearest neighbors are
possible, Eqs. (4) and (5) become linear 2nd order recurrence relationships, homogeneous
and nonhomogeneous respectively. A recipe for solving linear recurrence relationships
with constant coefficients, homogeneous and nonhomogeneous, can be found e.g. here (a
presentation by Niloufar Shafiei).
Second order recurrence relationships can be solved uniquely if one has two initial
(boundary) conditions. However, if the set of states S = {0, 1, 2, . . .} and A = {0} (as
in the Markov chain Gambler’s ruin 1), Eqs. (4) and (5) have only one boundary condi-
tion. The solutions hA and k A are determined by the additional requirements that they
must be minimal and nonnegative.
Now we consider the "birth-and-death" Markov chain, where the entries of the transition
matrix $P$ are
$$p_{00} = 1,\quad p_{i,i+1} = p_i,\quad p_{i,i-1} = q_i,\quad p_i + q_i = 1,\quad i \ge 1.$$
In this chain, 0 is an absorbing state, and we wish to calculate the absorption probability
starting from an arbitrary state $i$. Eq. (4) gives:
$$h_0 = 1,\qquad h_i = q_ih_{i-1} + p_ih_{i+1},\quad i \ge 1.$$

This recurrence relationship cannot be solved by the tools for the case of constant
coefficients. However, another technique works in this case. Consider
$$u_i := h_{i-1} - h_i.$$
Subtracting $h_i$ from both sides of $h_i = q_ih_{i-1} + p_ih_{i+1}$ and taking into account that
$q_i + p_i = 1$, we get
$$p_iu_{i+1} = q_iu_i.$$
Therefore,
$$u_{i+1} = \frac{q_i}{p_i}u_i = \left(\frac{q_iq_{i-1}\cdots q_1}{p_ip_{i-1}\cdots p_1}\right)u_1 =: \gamma_iu_1.$$
Then
$$u_1 + u_2 + \ldots + u_i = h_0 - h_1 + h_1 - h_2 + \ldots + h_{i-1} - h_i = h_0 - h_i.$$
Hence
$$h_i = h_0 - u_1(1 + \gamma_1 + \ldots + \gamma_{i-1}) = 1 - u_1\sum_{j=0}^{i-1}\gamma_j,$$
as $h_0 = 1$. Here we have defined $\gamma_0 = 1$. Note that $u_1$ cannot be determined from the
boundary condition $h_0 = 1$. It has to be determined from the condition that $h$ is the
minimal nonnegative solution. Therefore, we need to consider two cases.
Case $\sum_{j=0}^\infty\gamma_j = \infty$: In this case, $u_1$ must be 0. Hence $h_i = 1$ for all $i \ge 0$, i.e.,
the absorption probability is 1 for every $i$.
Case $\sum_{j=0}^\infty\gamma_j < \infty$: In this case, the minimal nonnegative solution will be the one where
$h_i \to 0$ as $i \to \infty$. This will take place if we set
$$u_1 = \left(\sum_{j=0}^\infty\gamma_j\right)^{-1}.$$
Then
$$h_i = 1 - \frac{\sum_{j=0}^{i-1}\gamma_j}{\sum_{j=0}^\infty\gamma_j} = \frac{\sum_{j=i}^\infty\gamma_j}{\sum_{j=0}^\infty\gamma_j}.$$
Therefore, the absorption probabilities satisfy $h_i < 1$ for $i \ge 1$.

Example 7. A gambler has \$1 initially. At each round, he either wins \$1 with probability
$p$ or loses \$1 with probability $q = 1 - p$, playing against an infinitely rich casino. Find the
probability that he goes broke, i.e., his capital drops to \$0.
Solution: Let $P_i$ be the probability of reaching \$0 provided that the initial amount is \$i.
We have:
$$P_0 = 1;\qquad P_i = pP_{i+1} + qP_{i-1},\quad 1 \le i < \infty.$$
Observe that the probability to get to \$0 starting from \$1 is the same as the probability to
get to \$1 starting from \$2. Therefore, the probability to get to \$0 starting from \$2 is the
product of the probabilities to get to \$1 from \$2 and to get to \$0 from \$1, i.e., $P_2 = P_1^2$.
Hence, taking into account that $P_0 = 1$ and $q = 1 - p$, we get the following quadratic
equation for $P_1$:
$$P_1 = pP_1^2 + 1 - p.$$
Solving it, we get two roots: 1 and $\frac{1-p}{p}$. If $p \le 1/2$, then $\frac{1-p}{p} \ge 1$, hence the only
suitable solution is $P_1 = 1$. If $p > 1/2$, then $\frac{1-p}{p} < 1$, and we should pick the root
$P_1 = \frac{1-p}{p}$. One can see this as follows. Suppose that there is a maximal amount of money
\$N that the gambler can get from the casino. Performing a calculation similar to the one in
the previous problem and letting $N \to \infty$, one gets $P_1 \to q/p = (1-p)/p$ as $N \to \infty$.
Answer: $P_1 = 1$ if $p \le 1/2$, and $P_1 = \frac{1-p}{p}$ if $p > 1/2$.
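A rough Monte Carlo check of this answer (an illustrative sketch, not from the original notes; truncating paths at max_steps biases the estimate slightly downward for $p \le 1/2$, where ruin is certain but can take arbitrarily long):

```python
import numpy as np

rng = np.random.default_rng(0)

def ruin_probability(p, n_paths=10_000, max_steps=2_000):
    """Estimate the probability that a gambler starting with $1 ever hits $0."""
    ruined = 0
    for _ in range(n_paths):
        capital = 1
        for _ in range(max_steps):
            capital += 1 if rng.random() < p else -1
            if capital == 0:
                ruined += 1
                break
    return ruined / n_paths

for p in (0.4, 0.5, 0.6):
    print(p, ruin_probability(p), min(1.0, (1 - p) / p))   # estimate vs. exact answer
```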

1.5. Recurrence and transience.


Definition 6. Let (Xn )n≥0 be a Markov chain with transition matrix P . We say that a
state i is recurrent if
(6) Pi (Xn = i for infinitely many n) = 1.
We say that a state i is transient if
(7) Pi (Xn = i for infinitely many n) = 0.
Perhaps surprisingly at first glance, one can show that every state is either recurrent or
transient. This is a consequence of the Markov property. To prove this, we will need the
following definitions.
Definition 7.
• The first passage time to state $i$ is the random variable $T_i$ defined by
$$T_i(\omega) = \inf\{n \ge 1 \mid X_n(\omega) = i\}, \quad\text{where } \inf\emptyset = \infty.$$
• The $r$th passage time $T_i^{(r)}$ to state $i$ is defined inductively by
$$T_i^{(0)} = 0, \qquad T_i^{(r+1)} = \inf\{n \ge T_i^{(r)} + 1 \mid X_n(\omega) = i\},\quad r = 0, 1, 2, \ldots.$$
• The length of the $r$th excursion to $i$ is
$$S_i^{(r)} = \begin{cases}T_i^{(r)} - T_i^{(r-1)}, & \text{if } T_i^{(r-1)} < \infty,\\ 0, & \text{otherwise.}\end{cases}$$
• The return probability is defined by
$$f_i = \mathbb{P}_i(T_i < \infty).$$
• The number of visits $V_i$ to state $i$ is the random variable that can be written as the sum
of indicator functions
$$V_i = \sum_{n=0}^\infty\mathbf{1}_{\{X_n = i\}}.$$
Note that
(8)
$$\mathbb{E}_i[V_i] = \mathbb{E}_i\Big[\sum_{n=0}^\infty\mathbf{1}_{\{X_n = i\}}\Big] = \sum_{n=0}^\infty\mathbb{E}\big[\mathbf{1}_{\{X_n = i\}}\big] = \sum_{n=0}^\infty\mathbb{P}_i(X_n = i) = \sum_{n=0}^\infty p_{ii}^{(n)}.$$
Also note that the conditions for a state to be recurrent or transient can be written as
• state i is recurrent if Pi (Vi = ∞) = 1;
• state i is transient if Pi (Vi = ∞) = 0.
Theorem 4. The following dichotomy holds:
(1) if $\mathbb{P}_i(T_i < \infty) = 1$, then $i$ is recurrent and $\sum_{n=1}^\infty p_{ii}^{(n)} = \infty$;
(2) if $\mathbb{P}_i(T_i < \infty) < 1$, then $i$ is transient and $\sum_{n=1}^\infty p_{ii}^{(n)} < \infty$.
In particular, every state is either transient or recurrent.
Proof. Let us denote $\mathbb{P}_i(T_i < \infty)$ by $f_i$. First we show that $\mathbb{P}_i(V_i > r) = f_i^r$:
$$\mathbb{P}_i(V_i > r) = \mathbb{P}_i(T_i^{(r)} < \infty) = \mathbb{P}_i(S_i^{(r)} < \infty \mid T_i^{(r-1)} < \infty)\,\mathbb{P}_i(T_i^{(r-1)} < \infty)$$
$$= \mathbb{P}_i(S_i^{(r)} < \infty \mid T_i^{(r-1)} < \infty)\,\mathbb{P}_i(S_i^{(r-1)} < \infty \mid T_i^{(r-2)} < \infty)\cdots\mathbb{P}_i(T_i^{(1)} < \infty) = f_i^r.$$
(1) If $f_i = \mathbb{P}_i(T_i < \infty) = 1$, then
$$\mathbb{P}_i(V_i = \infty) = \lim_{r\to\infty}\mathbb{P}_i(V_i > r) = \lim_{r\to\infty}f_i^r = 1.$$
Hence $i$ is recurrent and $\sum_{n=0}^\infty p_{ii}^{(n)} = \mathbb{E}_i[V_i] = \infty$.
(2) If $f_i = \mathbb{P}_i(T_i < \infty) < 1$, then
$$\mathbb{P}_i(V_i = \infty) = \lim_{r\to\infty}\mathbb{P}_i(V_i > r) = \lim_{r\to\infty}f_i^r = 0.$$
Hence $i$ is transient and
$$\sum_{n=0}^\infty p_{ii}^{(n)} = \mathbb{E}_i[V_i] = \sum_{r=0}^\infty\mathbb{P}_i(V_i > r) = \sum_{r=0}^\infty f_i^r = \frac{1}{1 - f_i} < \infty. \qquad\square$$


Now I will list some facts about recurrence and transience. I will not prove them. Proofs
can be found e.g. in [1].
• In a communicating class, states are either all transient or all recurrent.
• Every recurrent class is closed.
• Every finite closed class is recurrent.
• For a simple random walk on $\mathbb{Z}$, where the entries of the transition matrix are all
zeros except for $p_{i,i+1} = q$, $p_{i,i-1} = 1 - q$, all states are transient if $q \ne 1/2$, and all
states are recurrent if $q = 1/2$.
• For a simple symmetric random walk on $\mathbb{Z}^2$, all states are recurrent.
• For a simple symmetric random walk on $\mathbb{Z}^n$, $n \ge 3$, all states are transient.

1.6. Invariant distributions and measures.


Definition 8. A measure on a Markov chain is any vector $\lambda = \{\lambda_i \ge 0 \mid i \in S\}$. A
measure is invariant (a.k.a. stationary or equilibrium) if
$$\lambda = \lambda P.$$
A measure is a distribution if, in addition, $\sum_{i\in S}\lambda_i = 1$.
Theorem 5. Let the set of states $S$ of a Markov chain $(X_n)_{n\ge0}$ be finite. Suppose that for
some $i \in S$
$$\mathbb{P}_i(X_n = j) = p_{ij}^{(n)} \to \pi_j \quad\text{as } n \to \infty.$$
Then $\pi = \{\pi_i \mid i \in S\}$ is an invariant distribution.
Proof. Since $p_{ij}^{(n)} \ge 0$, we have $\pi_j \ge 0$. We show that $\sum_{j\in S}\pi_j = 1$. Since $S$ is finite,
we can swap the order of taking the limit and summation:
$$\sum_{j\in S}\pi_j = \sum_{j\in S}\lim_{n\to\infty}p_{ij}^{(n)} = \lim_{n\to\infty}\sum_{j\in S}p_{ij}^{(n)} = 1.$$
We show that $\pi = \pi P$:
$$\pi_j = \lim_{n\to\infty}p_{ij}^{(n)} = \lim_{n\to\infty}\sum_{k\in S}p_{ik}^{(n-1)}p_{kj} = \sum_{k\in S}\lim_{n\to\infty}p_{ik}^{(n-1)}p_{kj} = \sum_{k\in S}\pi_kp_{kj}. \qquad\square$$
Remark. If the set of states is not finite, then one cannot exchange summation and taking
the limit. For example, $\lim_{n\to\infty}p_{ij}^{(n)} = 0$ for all $i, j$ for a simple symmetric random walk
on $\mathbb{Z}$. The measure $\{\pi_i = 0 \mid i \in \mathbb{Z}\}$ is certainly invariant, but it is not a distribution.
The existence of an invariant distribution does not guarantee convergence to it. For
example, consider the two-state Markov chain with transition matrix
$$P = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}.$$
The distribution $\pi = (1/2, 1/2)$ is invariant as
$$(1/2, 1/2)\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix} = (1/2, 1/2).$$
However, for any initial distribution $\lambda = (q, 1-q)$ where $q \in [0, 1/2)\cup(1/2, 1]$, the limit
$\lim_{n\to\infty}\lambda P^n$ does not exist as
$$P^{2k} = I, \qquad P^{2k+1} = P.$$
In order to eliminate such cases, we introduce the concept of aperiodic states.
Definition 9. We call a state $i$ aperiodic if $p_{ii}^{(n)} > 0$ for all sufficiently large $n$.
Theorem 6. Suppose $P$ is irreducible and has an aperiodic state $i$. Then for all states $j$
and $k$, $p_{jk}^{(n)} > 0$ for all sufficiently large $n$. In particular, all states are aperiodic.
Proof. Since the chain is irreducible, there exist $r$ and $s$ such that $p_{ji}^{(r)} > 0$ and $p_{ik}^{(s)} > 0$.
Then for sufficiently large $n$ we have
$$p_{jk}^{(r+n+s)} = \sum_{i_1,\ldots,i_n\in S}p_{ji_1}p_{i_1i_2}\cdots p_{i_{n-1}i_n}p_{i_nk} \ge p_{ji}^{(r)}p_{ii}^{(n)}p_{ik}^{(s)} > 0.$$


Definition 10. We will call a Markov chain aperiodic if all its states are aperiodic.
Theorem 7. Suppose that $(X_n)_{n\ge0}$ is a Markov chain with transition matrix $P$ and initial
distribution $\lambda$. Let $P$ be irreducible and aperiodic, and suppose that $P$ has an invariant
distribution $\pi$. Then
$$\mathbb{P}(X_n = j) \to \pi_j \quad\text{as } n \to \infty \text{ for all } j.$$
In particular,
$$p_{ij}^{(n)} \to \pi_j \quad\text{as } n \to \infty \text{ for all } i, j.$$

A proof of this theorem is found in [1]. In the case where the set of states is finite,
this result can be proven by means of linear algebra. A building block of this proof is the
Perron-Frobenius theorem.
Theorem 8. Let $A$ be an $N\times N$ matrix with nonnegative entries such that all entries of
$A^m$ are strictly positive for all $m > M$. Then
(1) $A$ has a positive eigenvalue $\lambda_0 > 0$ with a corresponding left eigenvector $x_0$ whose
entries are all positive;
(2) if $\lambda \ne \lambda_0$ is any other eigenvalue, then $|\lambda| < \lambda_0$;
(3) $\lambda_0$ has geometric and algebraic multiplicity one.

Let $P$ be the stochastic matrix for a Markov chain with $N$ states. For an irreducible
aperiodic stochastic matrix $P$, all entries of $P^n$ become positive for sufficiently large $n$;
the proof of this fact is similar to that of Theorem 6. Furthermore, the largest eigenvalue
of a stochastic matrix is equal to 1. Indeed, since the row sums of $P$ are ones, $\lambda_0 = 1$ is an
eigenvalue with right eigenvector $e = [1, \ldots, 1]^\top$.
Now let us show that the other eigenvalues do not exceed $\lambda_0 = 1$ in absolute value. Let
$(\lambda, v)$ be an eigenvalue and a corresponding right eigenvector of a stochastic matrix $P$. We
normalize $v$ so that
$$v_i = \max_{k\in S}|v_k| = 1.$$
Since
$$\lambda v_i = \sum_{k\in S}p_{ik}v_k,$$
we have
$$|\lambda| = \frac{1}{v_i}\Big|\sum_{k\in S}p_{ik}v_k\Big| \le \frac{1}{v_i}\sum_{k\in S}p_{ik}|v_k| \le \sum_{k\in S}p_{ik} = 1.$$
Remark. The fact that the eigenvalues of a stochastic matrix do not exceed 1 in absolute
value is an instance of the Gershgorin Circle Theorem.
Theorem 9. Every irreducible aperiodic Markov chain with a finite number of states $N$
has a unique invariant distribution $\pi$. Moreover,
(9)
$$\lim_{n\to\infty}qP^n = \pi$$
for any initial distribution $q$.
Proof. The Perron-Frobenius theorem applied to a finite stochastic irreducible aperiodic
matrix $P$ implies that the largest eigenvalue of $P$ is $\lambda_0 = 1$ and all other eigenvalues are
strictly less than 1 in absolute value. The left eigenvector $\pi$ corresponding to $\lambda_0$ has
positive entries and can be normalized so that they sum up to 1. Hence,
$$\pi = \pi P, \qquad \sum_{i=1}^N\pi_i = 1.$$
Now let us establish convergence. First we consider the case when $P$ is diagonalizable:
$$P = V\Lambda U,$$
where $\Lambda$ is the diagonal matrix of ordered eigenvalues,
$$\Lambda = \mathrm{diag}(1, \lambda_1, \ldots, \lambda_{N-1}), \qquad 1 > |\lambda_1| \ge \ldots \ge |\lambda_{N-1}|,$$
$V$ is the matrix of right eigenvectors of $P$: $PV = V\Lambda$, such that its first column is
$e = [1, \ldots, 1]^\top$, and $U = V^{-1}$ is the matrix of left eigenvectors of $P$: $UP = \Lambda U$. The
first row of $U$ is $\pi = [\pi_1, \ldots, \pi_N]$. One can check that if $UV = I_N$, these choices of the
first column of $V$ and the first row of $U$ are consistent. Therefore, taking into account that
$\sum_{i=1}^Nq_i = 1$, we calculate
$$\lim_{n\to\infty}qP^n = \lim_{n\to\infty}(qV)\Lambda^nU = (qV)\,\mathrm{diag}(1, 0, \ldots, 0)\,U
= [1\ 0\ \ldots\ 0]\,U = [\pi_1\ \pi_2\ \ldots\ \pi_N] = \pi,$$
where we used that the first entry of $qV$ equals $\sum_iq_i = 1$ because the first column of $V$ is $e$.
In the case when $P$ is not diagonalizable, the argument is almost identical, just a bit more
tedious. We consider the Jordan decomposition of $P$,
$$P = VJU,$$
where $U = V^{-1}$ and $J$ is the Jordan form of $P$, i.e., a block-diagonal matrix
$$J = \mathrm{diag}(1, J_1, \ldots, J_r),$$
with the first block being the $1\times1$ matrix $J_0 \equiv 1$ (and, respectively, the first column of $V$
being $[1, \ldots, 1]^\top$ and the first row of $U$ being $\pi$, the right and left eigenvectors
corresponding to the eigenvalue 1), and the other blocks $J_i$ of sizes $m_i\times m_i$, where
$1 \le m_i \le N-1$ and $m_1 + \ldots + m_r = N-1$, of the form
(10)
$$J_i = \begin{pmatrix}\lambda_i & 1 & & \\ & \lambda_i & 1 & \\ & & \ddots & \ddots\\ & & & \lambda_i\end{pmatrix} =: \lambda_iI_{m_i\times m_i} + E.$$
Exercise.
(1) Check that the matrix $E$ in Eq. (10), with ones right above the diagonal and all other
entries zero, is nilpotent. More precisely, $E^{m_i} = 0_{m_i\times m_i}$.
(2) Check that the matrices $\lambda_iI_{m_i\times m_i}$ and $E$ commute.
(3) Check that
$$J_i^n = \sum_{k=0}^{m_i-1}\binom{n}{k}\lambda_i^{n-k}E^k.$$
(4) Argue that $\lim_{n\to\infty}J_i^n = 0_{m_i\times m_i}$ provided that $|\lambda_i| < 1$.
(5) Now prove Eq. (9) for the case when $P$ is not diagonalizable.
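Numerically, the invariant distribution of a finite irreducible aperiodic chain can be obtained as the left eigenvector of $P$ for the eigenvalue 1 and compared with the limit in Eq. (9). The sketch below (added here for illustration, not part of the original notes) reuses the matrix from Eq. (1).

```python
import numpy as np

P = np.array([[0.7395, 0.0299, 0.0838, 0.1467, 0.0000],
              [0.1600, 0.1520, 0.4880, 0.0000, 0.2000],
              [0.1713, 0.1865, 0.4893, 0.0000, 0.1529],
              [0.8596, 0.0000, 0.0000, 0.0000, 0.1404],
              [0.0000, 0.2427, 0.4854, 0.1553, 0.1165]])

# Left eigenvector of P for eigenvalue 1: right eigenvector of P^T.
vals, vecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(vals - 1.0))
pi = np.real(vecs[:, k])
pi /= pi.sum()                   # normalize so the entries sum to 1
print(pi)

# Theorem 9: q P^n -> pi for any initial distribution q.
q = np.array([1.0, 0, 0, 0, 0])
print(q @ np.linalg.matrix_power(P, 200))   # close to pi
```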


2. Time reversal and detailed balance


For Markov chains, the past and the future are independent given the present. This
property is symmetric in time and suggests looking at Markov chains with time running
backwards. On the other hand, convergence to equilibrium shows that the behavior is
asymmetric in time. Hence, to complete the symmetry in time, we need to start with the
equilibrium distribution.
For convenience, we will use the following notation:
Markov$(\lambda, P)$ denotes the discrete-time Markov chain with initial distribution $\lambda$ and
transition matrix $P$.
Markov$(\lambda, L)$ denotes the continuous-time Markov chain with initial distribution $\lambda$ and
generator matrix $L$.
Theorem 10. Let $(X_n)_{0\le n\le N}$ be Markov$(\pi, P)$, where $P$ is irreducible and $\pi$ is invariant.
Define $Y_n = X_{N-n}$. Then $(Y_n)_{0\le n\le N}$ is Markov$(\pi, \hat P)$, where the transition matrix
$\hat P = (\hat p_{ij})$ is defined by
$$\pi_jp_{ji} = \pi_i\hat p_{ij} \quad\text{for all } i, j \in S.$$
Proof. Note that, since $P$ is irreducible, all components of $\pi$ are positive. We need to check
the following three facts.
(1) $\hat P$ is a stochastic matrix (i.e., all its entries are nonnegative and its row sums are
equal to 1):
$$\hat p_{ij} = \frac{\pi_j}{\pi_i}p_{ji} \ge 0, \qquad \sum_{j\in S}\hat p_{ij} = \frac{1}{\pi_i}\sum_{j\in S}\pi_jp_{ji} = \frac{\pi_i}{\pi_i} = 1.$$
In the last equality we used the fact that $\pi$ is invariant for $P$.
(2) $\pi$ is invariant for $\hat P$, i.e., $\pi\hat P = \pi$:
$$\sum_{j\in S}\pi_j\hat p_{ji} = \sum_{j\in S}\pi_ip_{ij} = \pi_i\sum_{j\in S}p_{ij} = \pi_i \quad\text{for all } i \in S.$$
(3) $(Y_n)_{0\le n\le N}$ satisfies the Markov property:
$$\mathbb{P}(Y_0 = i_0, Y_1 = i_1, \ldots, Y_N = i_N) = \mathbb{P}(X_0 = i_N, X_1 = i_{N-1}, \ldots, X_N = i_0)$$
$$= \pi_{i_N}p_{i_Ni_{N-1}}\cdots p_{i_1i_0} = \hat p_{i_{N-1}i_N}\pi_{i_{N-1}}p_{i_{N-1}i_{N-2}}\cdots p_{i_1i_0}
= \ldots = \hat p_{i_{N-1}i_N}\cdots\hat p_{i_0i_1}\pi_{i_0}.$$
Therefore, $(Y_n)_{0\le n\le N}$ satisfies the Markov property. □


Definition 11. The chain (Yn )0≤n≤N is called the time-reversal of (Xn )0≤n≤N .
Definition 12. A stochastic matrix P and a measure λ are in detailed balance if
λi pij = λj pji .
Suppose the set of states $S$ is finite, the matrix $P$ is irreducible, and the system is
distributed according to the invariant distribution $\pi$. The condition of detailed balance
means the following. Let $N_{i\to j}(n)$ be the number of transitions from $i$ to $j$ observed by
time $n$. Then for all $i, j \in S$,
$$\lim_{n\to\infty}\frac{N_{i\to j}(n)}{N_{j\to i}(n)} = 1$$
if $P$ is in detailed balance with $\pi$. In words, given detailed balance, over long time intervals
one observes, on average, equal numbers of transitions from $i$ to $j$ and from $j$ to $i$ for all
$i, j \in S$.
The detailed balance condition gives us another way to check whether a given measure
λ is invariant.
Theorem 11. Let P and λ be in detailed balance. Then λ is invariant for P .
Proof.
$$(\lambda P)_i = \sum_{j\in S}\lambda_jp_{ji} = \sum_{j\in S}\lambda_ip_{ij} = \lambda_i.$$
Hence $\lambda P = \lambda$. □
Definition 13. Let (Xn )n≥0 be M arkov(λ, P ) where P is irreducible. We say that (Xn )n≥0
is reversible if for all N ≥ 1, (XN −n )0≤n≤N is M arkov(λ, P ).
Theorem 12. Let P be an irreducible stochastic matrix and let λ be a distribution. Suppose
that (Xn )n≥0 is M arkov(λ, P ). Then the following are equivalent:
(1) (Xn )n≥0 is reversible;
(2) P and λ are in detailed balance.
Proof. Both (1) and (2) imply that λ is invariant for P . Then both (1) and (2) are
equivalent to the statement that P̂ = P . 

3. Markov Chain Monte Carlo methods


As we have discussed, Monte Carlo methods are those where random numbers are used
in order to evaluate something nonrandom. Markov Chain Monte Carlo methods (MCMC)
are those where the estimation is done via constructing a Markov Chain whose invariant
distribution is the desired distribution. MCMC methods are used for numerical approx-
imation of multidimensional integrals. Such integrals arise, e.g., in Bayesian parameter
estimation, computational physics, and computational biology. For example, consider the
problem of finding the expected value of $g(\eta)$, where $\eta$ is a random variable with pdf
$\pi(x)$, $x \in \mathbb{R}^d$:
(11)
$$\mathbb{E}[g(\eta)] = \int_{\mathbb{R}^d}g(x)\pi(x)\,dx.$$

Or, consider the problem of finding the expected value of $g(\eta)$ in the case where $\Omega$ is finite
but huge, i.e., $|\Omega| = N$ where $N$ is huge. Let $\pi(\omega)$ be the probability distribution on $\Omega$;
then
(12)
$$\mathbb{E}[g(\eta)] = \sum_{\omega\in\Omega}g(\eta(\omega))\pi(\omega).$$

Note that in both cases one rarely knows $\pi$ per se. Instead, often only a measure $f$
proportional to $\pi$ is known. For example, think about the canonical pdf for $n$ particles in
3D:
$$\mu(x, p) = \frac{1}{Z}e^{-\beta(V(x)+|p|^2/2)}, \qquad Z = \int_{\mathbb{R}^{6n}}e^{-\beta(V(x)+|p|^2/2)}\,dx\,dp.$$
The normalization constant $Z$, except for some simple cases, cannot be evaluated
analytically. Therefore, $\mu(x, p)$ is, strictly speaking, unknown. However, for each $(x, p)$ one
can calculate the measure
$$f(x, p) = e^{-\beta(V(x)+|p|^2/2)},$$
which is proportional to $\mu(x, p)$.
Therefore, the problem is two-fold:
• The expected value is hard to evaluate due to either the high dimensionality of the
integral, so that numerical quadrature methods are inapplicable, or the huge number of
summands in the sum (think about numbers like $N = 2^n$ where $n \sim 10^k$, $k = 2, 3, 4, \ldots$).
Moreover, $\pi$ is often far from uniform, and some kind of importance sampling is necessary
to obtain a satisfactory estimate using a reasonable number of samples of $\eta$.
• The pdf or the probability distribution $\pi$ is unknown. Instead, $f$, which is proportional
to $\pi$, is given.

3.1. Metropolis and Metropolis-Hastings algorithms. We will explain the idea of
the Metropolis algorithm on the example of the task of numerical approximation of the
sum in Eq. (12), where $\Omega$ is a finite set, $|\Omega| = N$, $N$ huge. We wish to construct a
discrete-time Markov chain $(X_n)_{n\ge0}$, $X_n : \Omega \to \{1, \ldots, N\}$, where each random
variable $X_n$ is simply an enumeration of the set of outcomes. Therefore, we may think
that the set of states $S$ and the set of outcomes $\Omega$ are identical. In order to be able to
approximate the sum in Eq. (12), we need to design a transition matrix $P$ so that the
desired measure $f$ is invariant, and for any initial distribution $\lambda$, $\lambda P^n$ converges to
$\pi := Z^{-1}f$ (where $Z = \sum_{i\in S}f_i$) as $n \to \infty$. Choosing $P$ irreducible and aperiodic, we
guarantee convergence to the unique invariant distribution. To make $P$ have the desired
invariant measure $f$, it suffices to pick $P$ in detailed balance with the measure $f$, i.e., the
transition probabilities should satisfy
$$f_ip_{ij} = f_jp_{ji}.$$
Such a transition matrix is constructed in two steps. As A. Chorin puts it, first do
something stupid and then improve it.
(1) Suppose at time $n$, $X_n = k$. Propose a move from state $k$ according to some
irreducible aperiodic transition matrix $Q = (q_{ij})_{i,j\in S}$ made up by you. In the
original Metropolis algorithm, the matrix $Q$ must be symmetric, i.e., $q_{ij} = q_{ji}$.
Suppose the proposed move is from state $k$ to state $l$.
(2) To guarantee that the condition $f_ip_{ij} = f_jp_{ji}$ holds, accept the proposed move with
the probability
(13)
$$\alpha = \min\left\{\frac{f_l}{f_k}, 1\right\}.$$
I.e., if the proposed state $l$ is more likely than the current state $k$, move to the
new state. Otherwise, move there with probability $f_l/f_k$ or stay at state $k$ with
probability $1 - f_l/f_k$.
As a result, the transition probabilities $p_{ij}$ are given by
(14)
$$p_{ij} = q_{ij}\min\left\{\frac{f_j}{f_i}, 1\right\},\quad j \ne i, \qquad p_{ii} = 1 - \sum_{j\ne i}q_{ij}\min\left\{\frac{f_j}{f_i}, 1\right\}.$$

Let us check that $P$ is in detailed balance with $f$. Assume $i \ne j$. Let $f_j/f_i \le 1$. Then
$$f_ip_{ij} = f_iq_{ij}\frac{f_j}{f_i} = f_jq_{ij} = f_jq_{ji} = f_jp_{ji}.$$
If $f_j/f_i > 1$, then
$$f_ip_{ij} = f_iq_{ij} = f_iq_{ji} = f_jq_{ji}\frac{f_i}{f_j} = f_jp_{ji}.$$
Therefore, we have constructed a discrete-time Markov chain converging to the
desired equilibrium distribution.
The Metropolis-Hastings algorithm is a generalization of the Metropolis algorithm to the
case where the matrix $Q$ is not symmetric, i.e., $q_{ij} \ne q_{ji}$ for at least one pair of states
$(i, j)$. It differs from the Metropolis algorithm only in the definition of the acceptance
probability $\alpha$: in Metropolis-Hastings, $\alpha$ is given by
(15)
$$\alpha = \min\left\{\frac{f_lq_{lk}}{f_kq_{kl}}, 1\right\}.$$
Therefore, the transition probabilities $p_{ij}$ are
(16)
$$p_{ij} = q_{ij}\min\left\{\frac{f_jq_{ji}}{f_iq_{ij}}, 1\right\},\quad j \ne i, \qquad p_{ii} = 1 - \sum_{j\ne i}q_{ij}\min\left\{\frac{f_jq_{ji}}{f_iq_{ij}}, 1\right\}.$$

Exercise Check that P = (pij )i,j∈S and f are in detailed balance.
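A minimal sketch of the Metropolis algorithm on a finite state space (added for illustration; the target measure f, the grid size, and the random-walk proposal Q are all made up). Because the proposal is symmetric between distinct states, the acceptance probability is Eq. (13); for an asymmetric proposal one would use Eq. (15) instead.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
x_grid = np.arange(N)
f = np.exp(-0.5 * ((x_grid - 20) / 5.0) ** 2)   # unnormalized target measure f_i

def propose(k):
    """Symmetric random-walk proposal: move to k-1 or k+1 (clipped at the ends)."""
    l = k + rng.choice((-1, 1))
    return min(max(l, 0), N - 1)

def metropolis(n_steps, k0=0):
    chain = np.empty(n_steps, dtype=int)
    k = k0
    for n in range(n_steps):
        l = propose(k)
        alpha = min(f[l] / f[k], 1.0)            # acceptance probability, Eq. (13)
        if rng.random() < alpha:
            k = l
        chain[n] = k
    return chain

chain = metropolis(200_000)
hist = np.bincount(chain, minlength=N) / len(chain)
print(np.abs(hist - f / f.sum()).max())          # empirical frequencies approach f / sum(f)
```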

3.2. Ising Model. A description of the Ising model is found in A. Chorin’s and O. Hald’s
book [?] (see sections 5.4 and 5.4 in the 2nd edition). The 2D Ising model is a popular
toy example for learning the Metropolis algorithm. Due to its simplicity, an interesting
theoretical analysis of this model has been conducted. Numerous internet resources
can be readily found. You can find some details regarding its behavior near the critical
temperature e.g. here, here, and here.
The Ising model is also considered on other kinds of lattices and on graphs. Variations
of the Ising model are used, for example, to model opinion dynamics (e.g., click here and
here).

3.3. MCMC for cryptography. The Metropolis algorithm can also be used to decipher
encrypted messages. A popular description of this application can be found, e.g., in P.
Diaconis's paper. (I thank P. Wertheimer, a graduate student, UMD, MATH, for referring me to cryptography applications of MCMC and this article in particular.)

4. Continuous time Markov chains


We will restrict our attention to the case where the set of states $S$ is finite: $|S| = N \in \mathbb{N}$.
Consider a weighted directed graph $G(S, E, L)$, where $S$ is the set of vertices, $E$ is the set
of arcs (directed edges), and $L = \{L_{ij}\}_{(i\to j)\in E}$ is the set of weights. We assume that there
are no self-loops of the form $(i \to i)$. Abusing notation, we define the generator matrix $L$ as
follows:
(17)
$$L_{ij} = \begin{cases}L_{ij}, & (i\to j)\in E,\\ 0, & i \ne j \text{ and } (i\to j)\notin E,\\ -\sum_{k\ne i}L_{ik}, & i = j.\end{cases}$$
Note that the row sums of the matrix $L$ are zero, all off-diagonal entries of $L$ are
nonnegative, and all diagonal entries are nonpositive. For convenience, we will denote the sum
of the off-diagonal entries of row $i$ by $L_i$, i.e.,
$$L_i := \sum_{j\ne i}L_{ij}. \qquad \text{Note that } L_{ii} = -L_i.$$

We define the matrix $P(t)$ for $t \ge 0$ to be the matrix exponential
$$P(t) = e^{tL} := \sum_{k=0}^\infty\frac{(tL)^k}{k!}.$$
Exercise. Show that
Exercise Show that
(1) P (t) satisfies the semigroup property:
P (s + t) = P (s)P (t) for all s, t ≥ 0;

(2) $P(t)$, $t \ge 0$, satisfies the forward equation
$$\frac{d}{dt}P(t) = P(t)L, \qquad P(0) = I;$$
(3) $P(t)$, $t \ge 0$, satisfies the backward equation
$$\frac{d}{dt}P(t) = LP(t), \qquad P(0) = I;$$
(4) for $k = 0, 1, 2, \ldots$, we have
$$\left.\left(\frac{d}{dt}\right)^kP(t)\right|_{t=0} = L^k;$$
(5) $P(t)$ is a stochastic matrix for any $t \ge 0$, i.e., its row sums are ones, and all its
entries are nonnegative.
Therefore, we can define a discrete time Markov chain on the graph G(S, E, L) as follows.
Pick an interval of time h and an initial probability distribution λ. Then at the moments
of time 0, h, 2h, . . . we will have a discrete time Markov chain with the initial distribution
$\lambda$ and the transition matrix $P = e^{hL}$.
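These properties are easy to check numerically with a matrix exponential routine. The sketch below (not part of the original notes) uses a made-up 3-state generator and scipy.linalg.expm.

```python
import numpy as np
from scipy.linalg import expm

# A made-up 3-state generator matrix: nonnegative off-diagonal rates, zero row sums.
L = np.array([[-3.0,  2.0,  1.0],
              [ 0.5, -0.5,  0.0],
              [ 1.0,  4.0, -5.0]])

def P(t):
    return expm(t * L)

s, t = 0.3, 0.7
print(np.allclose(P(s + t), P(s) @ P(t)))                        # semigroup property
print(P(1.0).sum(axis=1))                                        # rows sum to 1
print(np.allclose((P(1e-6) - np.eye(3)) / 1e-6, L, atol=1e-4))   # d/dt P(t)|_{t=0} = L
```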
4.1. Right-continuous random processes.
Definition 14.
• Let $S$ be a discrete set. A continuous-time random process $(X_t)_{t\ge0}$ with values in $S$ is
a family of random variables $X_t : \Omega \to S$.
• A random process is right-continuous if for all $\omega \in \Omega$ and all $t \ge 0$ there exists
$\epsilon > 0$ such that
$$X_s(\omega) = X_t(\omega) \quad\text{for all } t \le s < t + \epsilon.$$
(I.e., if the system is at state $i$ at time $t$, then there exists an interval of time $[t, t+\epsilon)$
during which the system stays at $i$.)
The reason for considering right-continuous random processes is that the probability of
any event depending on such a process can be determined in terms of its finite-dimensional
distributions, i.e., from the probabilities
P(Xt0 = i0 , Xt1 = i1 , . . . , Xtn = in )
for n ≥ 0, 0 < t0 < t1 < . . . < tn and i0 , i1 , . . . , in ∈ S.
Definition 15. The jump times $J_0, J_1, \ldots$ of $(X_t)_{t\ge0}$ and the holding times $S_1, S_2, \ldots$ are
defined by
$$J_0 = 0, \qquad J_{n+1} = \inf\{t \ge J_n \mid X_t \ne X_{J_n}\},\quad n = 0, 1, 2, \ldots, \qquad \inf\emptyset \equiv \infty,$$
$$S_n = \begin{cases}J_n - J_{n-1}, & \text{if } J_{n-1} < \infty,\\ \infty, & \text{otherwise.}\end{cases}$$

4.2. The exponential distribution. We interpret the absolute values of the diagonal
entries of the generator matrix $L$ as the escape rates: $L_i$ is the escape rate from the state
$i$. Correspondingly, if $L_i > 0$, $L_i^{-1}$ is the expected holding time at state $i$. The off-diagonal
entries $L_{ij}$, $i \ne j$, of $L$ are often called the pairwise transition rates.
Now we will go over some important properties of exponential random variables. Suppose
the jump time $T$ from $i$ is an exponentially distributed random variable with parameter
$L_i$, i.e.,
$$\mathbb{P}(T > t) = e^{-L_it} \quad\text{for all } t \ge 0.$$
If $L_i > 0$, then $T$ has pdf
$$f_T(t) = \begin{cases}L_ie^{-L_it}, & t \ge 0,\\ 0, & t < 0.\end{cases}$$
The expected value of $T$ is
$$\mathbb{E}[T] = \int_0^\infty L_ie^{-L_it}\,t\,dt = \frac{1}{L_i},$$
i.e., $L_i$ is the reciprocal of the expected jump time.
Why did we choose the exponential distribution for the jump times? The reason is that
the exponential random variable is the only random variable that possesses the memoryless
property.
Theorem 13. A random variable T : Ω → [0, ∞] has an exponential distribution if and
only if it has the following memoryless property:
(18) P(T > t + s | T > s) = P(T > t) for all s, t ≥ 0.
Exercise Prove it.
Exercise. Show that if $T$ is an exponential random variable with parameter $\lambda$ and $a > 0$,
then $T/a$ is an exponential random variable with parameter $\lambda a$.
Theorem 14. Let $S$ be a countable set and let $T_k$, $k \in S$, be independent exponential
random variables with parameters $q_k$. Let
$$0 < q := \sum_{k\in S}q_k < \infty.$$
Set
$$T := \inf_{k\in S}T_k.$$
Then this infimum is attained at a unique random value $K \in S$, with probability 1.
Moreover, $T$ and $K$ are independent, $T$ is exponential with parameter $q$, and
$$\mathbb{P}(K = k) = \frac{q_k}{q}.$$

Proof. Set $K = k$ if $T_k < T_j$ for all $j \ne k$; otherwise let $K$ be undefined. Then
$$\mathbb{P}(K = k \text{ and } T > t) = \mathbb{P}(T_k > t \text{ and } T_j > T_k \text{ for all } j \ne k)$$
$$= \int_t^\infty q_ke^{-q_ks}\,\mathbb{P}(T_j > s \text{ for all } j \ne k)\,ds
= \int_t^\infty q_ke^{-q_ks}\prod_{j\ne k}e^{-q_js}\,ds
= \int_t^\infty q_ke^{-qs}\,ds = \frac{q_k}{q}e^{-qt}.$$
Hence
$$\mathbb{P}(K = k \text{ for some } k) = 1,$$
and $T$ and $K$ have the claimed joint distribution. □

4.3. Jump chains and holding times. Given a generator matrix $L$, one can define the
jump matrix $\Pi = (\pi_{ij} \mid i, j \in S)$ as follows:
(19)
$$\pi_{ij} = \begin{cases}\frac{L_{ij}}{L_i}, & i \ne j \text{ and } L_i \ne 0,\\ 0, & i \ne j \text{ and } L_i = 0,\end{cases}
\qquad
\pi_{ii} = \begin{cases}0, & L_i \ne 0,\\ 1, & L_i = 0.\end{cases}$$

Now we can give a definition of a continuous-time Markov chain in terms of its jump chain
and holding times.
Definition 16. A right-continuous process (Xt )t≥0 on S is a continuous-time Markov
chain with initial distribution λ and generator matrix L if its jump chain is a discrete-time
Markov chain (Yn )n≥0 with initial distribution λ and transition matrix Π defined from L
by Eq. (19) and if for each n ≥ 1, conditional on Y0 , ..., Yn−1 , its holding times S1 , ..., Sn
are independent exponential random variables with parameters LY0 , ..., LYn−1 respectively.
Given a discrete-time Markov chain $(Y_n)_{n\ge0}$ with initial distribution $\lambda$ and transition
matrix $\Pi$, and independent exponential random variables $T_1, T_2, \ldots$ with parameter 1,
independent of $(Y_n)_{n\ge0}$, one can construct a continuous-time chain $(X_t)_{t\ge0}$ with the
same set of states and the same initial distribution as follows. Set the holding times and the
jump times according to
$$S_n = \frac{T_n}{L_{Y_{n-1}}}, \qquad J_n = S_1 + \ldots + S_n.$$
Then define
$$X_t = Y_n, \quad\text{where } n \text{ is such that } J_n \le t < J_{n+1}.$$
Given a continuous-time Markov chain $(X_t)_{t\ge0}$ with initial distribution $\lambda$ and generator
matrix $L$, one can construct a discrete-time chain $(Y_n)_{n\ge0}$ with the same set of states and
the same initial distribution as follows. We begin with an initial state $X_0 = Y_0$ with
distribution $\lambda$, and with an array $(T_n^j \mid n \ge 1, j \in S)$ of independent exponential random
variables with parameter 1. Then, inductively, for $n = 0, 1, 2, \ldots$, if $Y_n = i$ we set
(20)
$$S_{n+1}^j = \frac{T_{n+1}^j}{L_{ij}}, \quad\text{for } j \ne i,$$
(21)
$$S_{n+1} = \inf_{j\ne i}S_{n+1}^j,$$
(22)
$$Y_{n+1} = \arg\min_{j\ne i}S_{n+1}^j.$$
Then, conditional on $Y_n = i$, the random variables $S_{n+1}^j$ are independent exponential
random variables with parameters $L_{ij}$ for all $j \ne i$. So, conditional on $Y_n = i$, $S_{n+1}$ is
exponential with parameter $L_i = \sum_{j\ne i}L_{ij}$. Furthermore, $Y_{n+1}$ has distribution
$(\pi_{ij} \mid j \in S)$, and $S_{n+1}$ and $Y_{n+1}$ are independent, and independent of $Y_0, \ldots, Y_n$ and
$S_1, \ldots, S_n$, as required. This construction shows why we call $L_i$ the rate of leaving $i$ (or
the escape rate from $i$) and $L_{ij}$ the rate of going from $i$ to $j$ (or the transition rate from
$i$ to $j$).

Example 8. Let us convert the continuous-time Markov chain in Fig. 4 (left) to the
corresponding jump chain. The result is shown in Fig. 4 (right).

Figure 4. Conversion of a continuous-time Markov chain (left) into the corresponding jump chain (right). The jump probabilities are calculated from the pairwise transition rates using Eq. (19).



The continuous-time Markov chain on the left has the generator matrix $L$ given by
$$L = \begin{pmatrix}0 & 0 & 0 & 0\\ 7 & -15 & 2 & 6\\ 0 & 1 & -5 & 4\\ 0 & 5 & 3 & -8\end{pmatrix}.$$
Using Eq. (19) we calculate the jump matrix of the corresponding jump chain on the right:
$$\Pi = \begin{pmatrix}1 & 0 & 0 & 0\\ 7/15 & 0 & 2/15 & 6/15\\ 0 & 1/5 & 0 & 4/5\\ 0 & 5/8 & 3/8 & 0\end{pmatrix}.$$
Note that any jump chain is a discrete-time Markov chain in which the probability to stay at
any state other than an absorbing state is zero. Vice versa, given a jump matrix $\Pi$ and the
escape rates $L_i$, $i \in S$, one can construct the generator matrix $L$ for the corresponding
continuous-time Markov chain.
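A short sketch (added here, not part of the original notes) that builds the jump matrix of Eq. (19) from the generator of Example 8 and samples a trajectory using the jump chain/holding times construction of Section 4.3; the function names are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generator matrix from Example 8 (the first state is absorbing).
L = np.array([[ 0,   0,  0,  0],
              [ 7, -15,  2,  6],
              [ 0,   1, -5,  4],
              [ 0,   5,  3, -8]], dtype=float)

def jump_matrix(L):
    """Jump matrix Pi defined by Eq. (19)."""
    Li = -np.diag(L)                      # escape rates
    Pi = np.zeros_like(L)
    for i in range(L.shape[0]):
        if Li[i] > 0:
            Pi[i] = L[i] / Li[i]
            Pi[i, i] = 0.0
        else:
            Pi[i, i] = 1.0                # absorbing state
    return Pi

print(jump_matrix(L))                     # matches the matrix Pi in Example 8

def simulate(L, i0, t_max):
    """Sample a trajectory via the jump chain / holding times construction."""
    Pi, Li = jump_matrix(L), -np.diag(L)
    t, i, path = 0.0, i0, [(0.0, i0)]
    while t < t_max and Li[i] > 0:
        t += rng.exponential(1.0 / Li[i]) # holding time ~ Exp(L_i)
        i = rng.choice(L.shape[0], p=Pi[i])
        path.append((t, i))
    return path

print(simulate(L, i0=3, t_max=5.0))
```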

Theorem 15. Let (Xt )t≥0 be a right-continuous random process with values in a finite
set S. Let L be a generator matrix on S with jump matrix Π. Then the following three
conditions are equivalent:
(1) (jump chain/holding time definition) conditional on X0 = i, the jump chain
(Yn )n≥0 of (Xt )t≥0 is a discrete-time Markov chain and for each n ≥ 1, conditional
on Y0 , ..., Yn−1 , the holding times S1 , ..., Sn are independent exponential random
variables with parameters LY0 , ..., LYn−1 respectively;
(2) (infinitesimal definition) for all $t, h \ge 0$, conditional on $X_t = i$, $X_{t+h}$ is
independent of $(X_s \mid s < t)$ and, as $h \downarrow 0$, uniformly in $t$, for all $j$
$$\mathbb{P}(X_{t+h} = j \mid X_t = i) = \delta_{ij} + L_{ij}h + o(h),$$
where $\delta_{ij}$ is the Kronecker symbol: $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ if $i \ne j$;
(3) (transition probability definition) for all $n = 0, 1, 2, \ldots$, all times
$0 < t_0 < t_1 < \ldots < t_{n+1}$ and all states $i_0, \ldots, i_{n+1}$,
(23)
$$\mathbb{P}(X_{t_{n+1}} = i_{n+1} \mid X_{t_0} = i_0, \ldots, X_{t_n} = i_n) = \mathbb{P}(X_{t_{n+1}} = i_{n+1} \mid X_{t_n} = i_n) = p_{i_ni_{n+1}}(t_{n+1} - t_n),$$
where $\{p_{ij}(t) \mid i, j \in S,\ t \ge 0\}$ is the solution of the forward equation
$P'(t) = P(t)L$, $P(0) = I$.

If (Xt )t≥0 satisfies any of these three conditions, we say that it is a continuous-time
Markov chain with the generator matrix L.
Proof. (1) Suppose (1) holds and prove (2). As $h \downarrow 0$,
$$\mathbb{P}_i(X_h = i) \ge \mathbb{P}_i(J_1 > h) = e^{-L_ih} = 1 + L_{ii}h + o(h)$$
(recall that $L_{ii} = -L_i$), and for $j \ne i$,
$$\mathbb{P}_i(X_h = j) \ge \mathbb{P}_i(J_1 \le h, Y_1 = j, S_2 > h) = (1 - e^{-L_ih})\pi_{ij}e^{-L_jh} = L_{ij}h + o(h).$$
Thus, for every state $j$ there is an inequality
$$\mathbb{P}_i(X_h = j) \ge \delta_{ij} + L_{ij}h + o(h).$$
By taking the finite sum over $j$ we see that these must be equalities: the left-hand
sides sum up to 1 while the right-hand sides sum up to $1 + o(h)$. Then, by the Markov
property, for any $t, h \ge 0$, conditional on $X_t = i$, $X_{t+h}$ is independent of $(X_s \mid s \le t)$
and, as $h \downarrow 0$, uniformly in $t$,
$$\mathbb{P}(X_{t+h} = j \mid X_t = i) = \mathbb{P}_i(X_h = j) = \delta_{ij} + L_{ij}h + o(h).$$
(2) Suppose (2) holds and prove (3). Set $p_{ij}(t) = \mathbb{P}_i(X_t = j) = \mathbb{P}(X_t = j \mid X_0 = i)$.
For all $t, h \ge 0$, as $h \downarrow 0$, uniformly in $t$,
$$p_{ij}(t + h) = \sum_{k\in S}\mathbb{P}_i(X_t = k)\,\mathbb{P}(X_{t+h} = j \mid X_t = k) = \sum_{k\in S}p_{ik}(t)(\delta_{kj} + L_{kj}h + o(h)).$$
Since $S$ is finite we have
$$\frac{p_{ij}(t+h) - p_{ij}(t)}{h} = \sum_{k\in S}p_{ik}(t)L_{kj} + o(1).$$
So, letting $h \downarrow 0$, we see that $p_{ij}(t)$ is differentiable on the right. Then by uniformity
we replace $t$ by $t - h$ in the above and let $h \downarrow 0$ to see first that $p_{ij}(t)$ is continuous
on the left, then differentiable on the left, hence differentiable, and satisfies the
forward equations
$$p'_{ij}(t) = \sum_{k\in S}p_{ik}(t)L_{kj}, \qquad p_{ij}(0) = \delta_{ij}.$$
Since $S$ is finite, $p_{ij}(t)$ is the unique solution of the forward equation. Also, if (2)
holds then Eq. (23) holds.
(3) Suppose (3) holds and prove (1). Condition (3) determines the finite-dimensional
distributions of $(X_t)_{t\ge0}$ and hence the distribution of the jump chain and holding times.
Hence condition (1) is satisfied. □


4.4. Class structure. The concepts i leads to j (i −→ j), i communicates with j (i ←→ j),
communicating class, closed class, absorbing state, and irreducibility are inherited from
discrete time Markov chains.
Theorem 16. For distinct states $i, j \in S$ the following are equivalent:
(1) $i \to j$;
(2) $i \to j$ for the jump chain;
(3) $L_{ii_1}L_{i_1i_2}\cdots L_{i_{n-1}j} > 0$ for some states $i_1, \ldots, i_{n-1}$;
(4) $p_{ij}(t) > 0$ for all $t > 0$;
(5) $p_{ij}(t) > 0$ for some $t > 0$.
Proof. Implications (4) $\Rightarrow$ (5) $\Rightarrow$ (1) $\Rightarrow$ (2) are clear. If (2) holds, then there are states
$i_1, \ldots, i_{n-1}$ such that $\pi_{ii_1}\pi_{i_1i_2}\cdots\pi_{i_{n-1}j} > 0$, which implies (3).
If $L_{ij} > 0$, then
$$p_{ij}(t) \ge \mathbb{P}_i(J_1 \le t, Y_1 = j, S_2 > t) = (1 - e^{-L_it})\pi_{ij}e^{-L_jt} > 0 \quad\text{for all } t.$$
So, if (3) holds, then
$$p_{ij}(t) \ge p_{ii_1}(t/n)p_{i_1i_2}(t/n)\cdots p_{i_{n-1}j}(t/n) > 0 \quad\text{for all } t.$$
Hence (4) holds. □

4.5. Hitting times and absorption probabilities.


Definition 17. Let $(X_t)_{t\ge0}$ be a Markov chain with generator matrix $L$. The hitting time
of a subset $A \subset S$ is the random variable
$$\tau^A(\omega) = \inf\{t \ge 0 \mid X_t(\omega) \in A\},$$
with the usual convention $\inf\emptyset = \infty$.
The probability that, starting from $i$, $(X_t)_{t\ge0}$ ever hits $A$ is then
$$h_i^A = \mathbb{P}_i(\tau^A < \infty).$$
When $A$ is a closed class, $h_i^A$ is called the absorption probability.
Note that the hitting probabilities are the same as in the jump chain. Therefore, we can
calculate them as we did for discrete-time Markov chains.
Theorem 17. The vector of hitting probabilities is the minimal nonnegative solution of
(24)
$$h_i^A = 1,\ i \in A; \qquad \sum_{j\in S}L_{ij}h_j^A = 0,\ i \notin A.$$
Proof. Since the hitting probabilities are the same for continuous-time Markov chains and
the corresponding jump chains, we start from Eq. (4), where $\pi_{ij}$ are the transition
probabilities of the jump chain. Consider $i \notin A$. First assume $L_i > 0$. Then
$\pi_{ij} = L_{ij}/L_i$ and $\pi_{ii} = 0$. Taking into account that $L_{ii} = -L_i$ and $L_i = \sum_{j\ne i}L_{ij}$, we have
$$h_i^A = \sum_{j\in S}\pi_{ij}h_j^A = \sum_{j\ne i}\frac{L_{ij}}{L_i}h_j^A = \sum_{j\in S}\frac{L_{ij}}{L_i}h_j^A + h_i^A.$$
Canceling $h_i^A$ on the right- and left-hand sides and then multiplying by $L_i$, we get
$$\sum_{j\in S}L_{ij}h_j^A = 0.$$
If $L_i = 0$, then $i$ is an absorbing state, $\pi_{ii} = 1$, and $\pi_{ij} = 0$ for $j \ne i$. Then Eq. (4) gives
$h_i^A = h_i^A$, while Eq. (24) gives $0 = 0$. □

The mean hitting times $k_i^A$ are defined as they were for discrete-time chains:
$$k_i^A = \mathbb{E}_i[\tau^A] = \mathbb{E}[\tau^A \mid X_0 = i].$$
Theorem 18. Assume that $L_i > 0$ for all $i \notin A$. The vector of mean hitting times
$k^A = \{k_i^A \mid i \in S\}$ is the minimal nonnegative solution of
(25)
$$k_i^A = 0,\ i \in A; \qquad \sum_{j\in S}L_{ij}k_j^A = -1,\ i \notin A.$$
The proof follows the same lines as the one for the case of discrete-time Markov chains.
Here we will only verify that Eq. (25) must be satisfied. If $X_0 \in A$ then $\tau^A = 0$ and hence
$k_i^A = 0$. If $X_0 \notin A$ then $\tau^A \ge J_1$, so, by the Markov property of the jump chain,
$$\mathbb{E}_i[\tau^A - J_1 \mid Y_1 = j] = \mathbb{E}_j[\tau^A].$$
Therefore,
$$k_i^A = \mathbb{E}_i[\tau^A] = \mathbb{E}_i[J_1] + \sum_{j\ne i}\mathbb{E}_i[\tau^A - J_1 \mid Y_1 = j]\,\mathbb{P}_i(Y_1 = j)
= L_i^{-1} + \sum_{j\ne i}\pi_{ij}k_j^A
= L_i^{-1} + \sum_{j\ne i}\frac{L_{ij}}{L_i}k_j^A
= L_i^{-1} + \sum_{j\in S}\frac{L_{ij}}{L_i}k_j^A + k_i^A.$$
Canceling $k_i^A$ and then $L_i$, we obtain
$$\sum_{j\in S}L_{ij}k_j^A + 1 = 0,$$
which implies Eq. (25).
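Eq. (25) is a linear system that is easy to solve numerically. The sketch below (added for illustration, not part of the original notes) computes the mean hitting times of the absorbing state of the generator from Example 8; state indices start from 0 in the code.

```python
import numpy as np

# Generator from Example 8; A = {0} (state 1 in the text) is absorbing.
L = np.array([[ 0,   0,  0,  0],
              [ 7, -15,  2,  6],
              [ 0,   1, -5,  4],
              [ 0,   5,  3, -8]], dtype=float)

A = [0]
rest = [1, 2, 3]
# Eq. (25): sum_j L_ij k_j = -1 for i not in A, with k_j = 0 for j in A,
# so only the rows and columns over 'rest' enter the linear system.
k_rest = np.linalg.solve(L[np.ix_(rest, rest)], -np.ones(len(rest)))
k = np.zeros(4)
k[rest] = k_rest
print(k)     # expected absorption times into state 0 starting from states 1, 2, 3
```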



4.6. Recurrence and transience. We say that a state i is recurrent if


Pi ({t ≥ 0 | Xt = i} is unbounded) = 1.
We say that a state i is transient if
Pi ({t ≥ 0 | Xt = i} is unbounded) = 0.
Theorem 19. Let (Xt )t≥0 be a Markov chain with generator matrix L, and (Yn )n≥0 be the
corresponding jump chain.
(1) If i is recurrent for the corresponding jump chain (Yn )n≥0 then i is recurrent for
the continuous-time chain (Xt )t≥0 .
(2) If i is transient for the corresponding jump chain (Yn )n≥0 then i is transient for
the continuous-time chain (Xt )t≥0 .
(3) Every state is either transient or recurrent.
(4) Recurrence and transience are class properties.
The first passage time to i is defined by
Ti (ω) = inf{t ≥ J1 (ω) | Xt (ω) = i}.
Theorem 20. The following dichotomy holds:
(1) if $L_i = 0$ or $\mathbb{P}_i(T_i < \infty) = 1$, then $i$ is recurrent and $\int_0^\infty p_{ii}(t)\,dt = \infty$;
(2) if $L_i > 0$ and $\mathbb{P}_i(T_i < \infty) < 1$, then $i$ is transient and $\int_0^\infty p_{ii}(t)\,dt < \infty$.
4.7. Invariant distributions and convergence to equilibrium. Since we are consid-
ering only Markov chains where the set of states is finite, all invariant measures can be
normalized so that the sum of their entries is one, and therefore, every invariant measure
can be made into an invariant distribution.
Definition 18. We say that $\lambda = \{\lambda_i \mid i \in S\}$ is an invariant measure if $\lambda_i \ge 0$ for all $i$
and $\lambda L = 0$.
Theorem 21. Let $L$ be the generator matrix and let $\lambda$ be a measure. Then the following
are equivalent:
(1) $\lambda$ is invariant;
(2) $\mu\Pi = \mu$, where $\mu_i = \lambda_iL_i$.
Proof. Recall that $\pi_{ij} = L_{ij}/L_i$ if $i \ne j$ and $L_i > 0$, $\pi_{ij} = 0$ if $i \ne j$ and $L_i = 0$,
$\pi_{ii} = 0$ if $L_i > 0$, and $\pi_{ii} = 1$ if $L_i = 0$. This can be written in one line as
$$L_i(\pi_{ij} - \delta_{ij}) = L_{ij}.$$
The equality $\mu = \mu\Pi$ is equivalent to $\mu(\Pi - I) = 0$. We have
$$(\mu(\Pi - I))_j = \sum_{i\in S}\mu_i(\pi_{ij} - \delta_{ij}) = \sum_{i\in S}\lambda_iL_i(\pi_{ij} - \delta_{ij}) = \sum_{i\in S}\lambda_iL_{ij} = (\lambda L)_j.$$
This proves the theorem. □



Theorem 22. Let $L$ be an $N\times N$ irreducible generator matrix, and let $\lambda$ be a measure.
Let $s > 0$ be given. Then the following are equivalent:
(1) $\lambda L = 0$;
(2) $\lambda P(s) = \lambda$.
Proof. By the backward equation,
$$\frac{d}{ds}\lambda P(s) = \lambda\frac{d}{ds}P(s) = \lambda LP(s).$$
Hence $\lambda L = 0$ implies
$$\lambda P(s) = \lambda P(0) = \lambda I = \lambda \quad\text{for all } s.$$
On the other hand, $\lambda P(s) = \lambda$ implies that $\frac{d}{ds}\lambda P(s) = 0$, i.e., $\lambda LP(s) = 0$. Since
$P(s)$ is invertible (it is the fundamental solution matrix of a first-order linear differential
equation with constant coefficients), $\lambda LP(s) = 0$ implies that $\lambda L = 0$. □

Theorem 23. Let L be an N × N irreducible generator matrix. Then there exists a unique
invariant distribution π and for all states i, j ∈ S
lim pij (t) = πj .
t→∞

Proof. Existence and uniqueness of the invariant distribution follows from its existence and
uniqueness for the matrix P (t) = etL . Note that all entries of P (t) are positive for all t > 0
owing to irreducibility and Theorem 16. Applying the Perron-Frobenius theorem (Theorem
8) we conclude that P (t) has an eigenvalue µ0 = 1 whose algebraic multiplicity is equal to
1 and µ0 > |µk | for all other eigenvalues µk of P . The unique invariant distribution π is
the left eigenvector of P (t) corresponding to the eigenvalue µ0 = 1, i.e.,
π = πP (t).
By Theorem 22, this is equivalent to πL = 0. Hence there exists a unique invariant
distribution π such that πL = 0.
Now we will show convergence to π. We conduct the proof for the case when L is diago-
nalizable. It it is not, the proof is similar but a bit more tedious (see the proof of the anal-
ogous theorem (Theorem 9) for discrete-time Markov chains.) Let λ0 = 0, λ1 , . . . , λN −1
be eigenvalues of L. If
L = V ΛU
is the eigendecomposition of L then P (t) has the eigendecomposition
 
1
tL
 e λ1 t 
P (T ) ≡ e = V   U.
 
. .
 . 
eλ N −1 t

Therefore, the eigenvalues of P (t) are


µ0 = 1, µ1 = eλ1 t , . . . , µN −1 = eλN −1 t .
MARKOV CHAINS 33

Since |µk| < 1 for 1 ≤ k ≤ N − 1, we conclude that the real parts of λk, k = 1, 2, . . . , N − 1, are negative, i.e., Re(λk) < 0 for k = 1, 2, . . . , N − 1. Hence

lim_{t→∞} e^{λk t} = 0 for k = 1, 2, . . . , N − 1.
V is the matrix of right eigenvectors of L. Since the row sums of L are zeros, the eigenvector corresponding to the zero eigenvalue, i.e., the first column of V, must be e = [1 1 . . . 1]⊤. The first row of U, the matrix of left eigenvectors of L, must be the invariant distribution π. We will denote the rest of the right eigenvectors by vk and the rest of the left eigenvectors by φk, i.e.,

V = [e v1 . . . vN−1],   U = [π; φ1; . . . ; φN−1] (the rows of U are π, φ1, . . . , φN−1).

Then

(26)   P(t) = V diag(1, e^{λ1 t}, . . . , e^{λN−1 t}) U = e [π1 . . . πN] + Σ_{k=1}^{N−1} e^{λk t} vk φk → the matrix whose every row is [π1 . . . πN], as t → ∞.
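The convergence statement of Theorem 23 can be observed numerically by exponentiating a small generator matrix. A sketch (Python with NumPy/SciPy; the generator below is again a made-up example): the rows of P(t) = e^{tL} approach π as t grows.

```python
import numpy as np
from scipy.linalg import expm

# A made-up irreducible 3-state generator (rows sum to zero).
L = np.array([[-2.0,  1.0,  1.0],
              [ 3.0, -4.0,  1.0],
              [ 1.0,  2.0, -3.0]])

# Invariant distribution pi: left null vector of L, normalized to sum to one.
w, V = np.linalg.eig(L.T)
pi = np.real(V[:, np.argmin(np.abs(w))])
pi = pi / pi.sum()

# Every row of P(t) = expm(t L) converges to pi as t -> infinity.
for t in (0.1, 1.0, 10.0):
    P = expm(t * L)
    print(t, np.max(np.abs(P - pi[None, :])))   # max deviation of the rows from pi
```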


4.8. Time reversal and detailed balance for continuous-time Markov chains. For continuous-time Markov chains, results analogous to those for time reversal and detailed balance in the discrete-time case hold. Let L be an irreducible generator matrix and π be an invariant distribution. The
generator matrix L̂ for the time-reversal is defined by
πi L̂ij = πj Lji .
The detailed balance condition reads
πi Lij = πj Lji .
Analogs of Theorems 10, 11, 12 hold.
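A minimal numerical sketch of these definitions (NumPy; the generator is a made-up example): the time-reversed generator is formed entrywise from L and π, and detailed balance amounts to the symmetry of the matrix with entries πi Lij.

```python
import numpy as np

# Made-up irreducible generator (rows sum to zero).
L = np.array([[-2.0,  1.0,  1.0],
              [ 3.0, -4.0,  1.0],
              [ 1.0,  2.0, -3.0]])
w, V = np.linalg.eig(L.T)
pi = np.real(V[:, np.argmin(np.abs(w))])
pi = pi / pi.sum()

Lhat = (pi[None, :] * L.T) / pi[:, None]     # Lhat_ij = pi_j L_ji / pi_i
print(np.allclose(Lhat.sum(axis=1), 0.0))    # Lhat is again a generator matrix
print(np.allclose(pi @ Lhat, 0.0))           # pi is invariant for Lhat as well

# Detailed balance pi_i L_ij = pi_j L_ji holds iff diag(pi) L is symmetric.
print(np.allclose(pi[:, None] * L, (pi[:, None] * L).T))
```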

5. Transition Path Theory


The Transition Path Theory (TPT) was introduced by W. E and E. Vanden-Eijnden in 2006 [7, 8] in the context of stochastic differential equations. They contrasted it with the Transition State Theory (TST) developed by Eyring and Polanyi in the 1930s. In a nutshell, TPT is a conceptual apparatus for describing reactive events. The key concept of TPT, the committor function, is the solution of a boundary value problem for a certain elliptic PDE, which in practice cannot be solved in dimensions higher than 3. Then, in 2009, Metzner, Schuette, and Vanden-Eijnden [11] extended TPT to continuous-time Markov chains (a.k.a. Markov jump processes (MJPs)). Since the application of TPT to MJPs hinges on finding the committor, which in this case is the solution to a system of linear algebraic equations that can be solved either readily or after some additional work, TPT has become a powerful practical tool for the analysis of transition processes in complex networks. For example, one of the benchmark problems in chemical physics, the rearrangement of the Lennard-Jones cluster of 38 atoms, was analyzed using TPT, resulting in a detailed description of the transition mechanism between the two lowest potential energy minima [5].

5.1. Settings. We will consider a continuous-time Markov chain with a finite set of states
S, |S| = N , and irreducible generator matrix L. This Markov chain can be represented as
a network, where states correspond to the vertices of the graph, and two vertices i and j are connected by a directed edge if and only if Lij > 0. If both Lij > 0 and Lji > 0, we will draw an undirected edge between i and j.
Let A and B be selected nonintersecting subsets of S. For simplicity, we assume that
there exists no edge (i, j) such that i ∈ A and j ∈ B, i.e., one cannot get from A to B
without spending some time in S\(A∪B) ≡ (A∪B)c . The sets A and B can be interpreted
as the reactant set and the product set, respectively. For example, if you are modeling protein folding, A can be a collection of unfolded states, while B can be a collection of folded states.
There exists a unique invariant distribution π = (πi )i∈S , i.e., πL = 0. We do not
assume that π and L are in detailed balance. We will also need to consider a family
of time reversed chains (X̂t )t∈R , X̂t = Xτ −t where τ is some moment of time. The generator
matrix for the time reversed process is L̂ = (L̂ij )i,j∈S defined by
L̂ij = (πj/πi) Lji.
5.2. Reactive trajectories. The subject of TPT is reactive trajectories that are defined
as follows. Consider a very long trajectory starting from an arbitrary state i, i.e., (Xt )t∈R
such that X0 = i. Since the Markov chain is irreducible and finite, every state is recurrent.
Hence this trajectory will visit all of the states infinitely many times with probability 1.
Let us single out those pieces of it that go from A to B, i.e., we will detect the collections of moments of time {t^A_n}_{n∈N} and {t^B_n}_{n∈N} such that

t^A_n < t^B_n < t^A_{n+1},  n ∈ N,

lim_{t↑t^A_n} X(t) = x^A_n ∈ A,   X(t^B_n) = x^B_n ∈ B,

X(t) ∈ (A ∪ B)^c for any t ∈ [t^A_n, t^B_n).
In words, t^A_n is the moment of time when the trajectory leaves A for the nth time such that it does not return to A prior to reaching B, and t^B_n is the nth time when the trajectory enters B. The intervals [t^A_n, t^B_n) are called reactive times. The union of the reactive times is denoted by R:

R := ∪_{n∈Z} (t^A_n, t^B_n).
Now consider the corresponding jump chain and the jump trajectory (Yk )k≥0 .
Definition 19. The ordered sequence

φ_n = [x^A_n, x^1_n, . . . , x^{k_n}_n ≡ x^B_n]

consisting of the successive states of the jump chain (Yk)_{k∈Z} visited during the nth transition from A to B is called the nth reactive trajectory. The set of all such sequences is called the set of reactive trajectories.
The concept of reactive trajectory is illustrated in Fig. 5.
5.3. The forward and backward committors.
Definition 20. The forward committor q + = (qi+ )i∈S is the probability that the process
starting at state i will first reach B rather than A, i.e.,
qi+ = Pi (τB+ < τA+ ),
where
τA+ = inf{t > 0 | X(t) ∈ A}, τB+ = inf{t > 0 | X(t) ∈ B}
are the first entrance times to A and B respectively.
The backward committor q − = (qi− )i∈S is the probability that the process arriving at state
i last came from A rather than B. Equivalently, the backward committor q − = (qi− )i∈S is
the probability that the time-reversed process starting at state i will first reach B rather
than A, i.e.,
qi− = Pi (τA− < τB− ),
where
τA− = inf{t > 0 | X̂(t) ∈ A}, τB− = inf{t > 0 | X̂(t) ∈ B},
are the last exit times from A and B respectively. Here (X̂t )t∈R is the time-reversed process
for (Xt )t∈R , i.e., X̂t = X−t , t ∈ R.
The forward and backward committors satisfy the following equations:

(27)   Σ_{j∈S} Lij qj+ = 0,  i ∈ (A ∪ B)^c,
       qi+ = 0,  i ∈ A,
       qi+ = 1,  i ∈ B,


Figure 5. Two examples of reactive trajectories are shown in red. Reactive trajectory 1: [19, 2, 3, 7, 6, 2, 4, 6, 12, 23]. Reactive trajectory 2: [22, 17, 16, 14, 9, 5, 8, 13, 15, 25].

and

(28)   Σ_{j∈S} L̂ij qj− = 0,  i ∈ (A ∪ B)^c,
       qi− = 1,  i ∈ A,
       qi− = 0,  i ∈ B,
where L̂ is the generator matrix for the time-reversed process.
Eq. (27) is justified as follows. Let us modify our network and make all states in A absorbing, i.e., set Lij = 0 for all i ∈ A. The other Lij's are unchanged. Then Eq. (27) becomes the equation for the hitting probabilities of the set B for the modified network. That is, qi+ is the probability that the process starting at i will hit B prior to being absorbed by one of the states in A. This is exactly what the forward committor is. A similar argument applied to the reversed process shows that the backward committor satisfies Eq. (28).
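In practice, Eqs. (27) and (28) are solved as linear systems on (A ∪ B)^c with the boundary values moved to the right-hand side. A sketch (Python with NumPy; the function name and the representation of A and B as index arrays are our own choices for illustration):

```python
import numpy as np

def forward_committor(L, A, B):
    """Solve Eq. (27): sum_j L_ij q_j = 0 on (A u B)^c, q = 0 on A, q = 1 on B."""
    n = L.shape[0]
    A, B = np.asarray(A), np.asarray(B)
    C = np.setdiff1d(np.arange(n), np.concatenate([A, B]))   # states in (A u B)^c
    q = np.zeros(n)
    q[B] = 1.0
    rhs = -L[np.ix_(C, B)].sum(axis=1)      # move the known values q = 1 on B to the rhs
    q[C] = np.linalg.solve(L[np.ix_(C, C)], rhs)
    return q

# The backward committor solves Eq. (28); it can be obtained with the same routine
# applied to the time-reversed generator Lhat with the roles of A and B swapped.
```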
5.4. Probability distribution of reactive trajectories. What is the probability to find
a reactive trajectory at state i at any time t? To answer this question, consider an infinitely
long trajectory (Xt )t∈R where X0 is distributed according to the invariant distribution π.
For any fixed time t, the probability to find Xt at state i is πi . If Xt = i where i ∈ A or

i ∈ B, time t is not reactive, hence this probability is 0. If Xt = i where i ∈ (A ∪ B)^c, we need to take the probability πi to find Xt at i and multiply it by the probability that Xt came to i from A and will go next to B, i.e., by qi− qi+. Therefore, the probability to find a reactive trajectory at state i at any time t is given by

(29)   m^R_i = πi qi− qi+.

In [11], m^R_i is called the probability distribution of reactive trajectories. Note that m^R is not a distribution, as it is not normalized; it is a measure. The normalization constant for m^R_i,

Z_R = Σ_{i∈S} m^R_i = Σ_{i∈S} πi qi− qi+,

is the probability that any given t belongs to the set of reactive times, i.e.,

Z_R = P(t ∈ ∪_{n∈Z} (t^A_n, t^B_n)) ≡ P(t ∈ R).
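Given π and the committors, Eq. (29) and Z_R are a one-line computation. A sketch with made-up numbers (the 5-state values below are purely illustrative):

```python
import numpy as np

# Purely illustrative values for a 5-state chain.
pi     = np.array([0.3, 0.25, 0.2, 0.15, 0.1])
qplus  = np.array([0.0, 0.3, 0.6, 0.9, 1.0])
qminus = np.array([1.0, 0.7, 0.4, 0.1, 0.0])

mR = pi * qminus * qplus        # Eq. (29): probability distribution of reactive trajectories
ZR = mR.sum()                   # probability that a given moment of time is reactive
print(mR, ZR)
```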

5.5. Probability current of reactive trajectories. The probability current of reactive trajectories along the edge (i → j) is defined as the average number of transitions from i to j per unit time performed by reactive trajectories. This probability current, denoted by fij, is given by

(30)   fij = πi qi− Lij qj+ for i ≠ j, and fii = 0.
Indeed, the product πi qi− gives the probability that the trajectory arrived at i from A rather
than from B. Lij is the transition rate from i to j, and the factor qj+ is the probability
that the trajectory from j will go next to B rather than to A.
It follows from Eq. (30) that the probability current of reactive trajectories along every
edge (i, j) is nonnegative. Note that for an edge (i, j) where i, j ∈ (A ∪ B)c both fij and
fji can be positive. This reflects the fact that reactive trajectories can go many times back
and forth across the edge (i, j) on their way from A to B. The next theorem says that the probability current is neither produced nor absorbed at any state i ∈ (A ∪ B)^c.
Theorem 24. For all i ∈ (A ∪ B)^c, the probability current is conserved, i.e., the amount of current coming into state i equals the amount of current going out of state i:

(31)   Σ_{j∈S} (fij − fji) = 0 for all i ∈ (A ∪ B)^c.

Proof. Let i ∈ (A ∪ B)^c. Plugging Eq. (30) into Eq. (31) we obtain

Σ_{j∈S} (fij − fji) = Σ_{j≠i} (πi qi− Lij qj+ − πj qj− Lji qi+) = πi qi− Σ_{j≠i} Lij qj+ − qi+ Σ_{j≠i} πj Lji qj−.

It follows from Eqs. (27) and (28) that

Σ_{j≠i} Lij qj+ = Li qi+

and

Σ_{j≠i} πj Lji qj− = Σ_{j≠i} πj (πi/πj) L̂ij qj− = πi L̂i qi− = πi Li qi−.

Therefore,

Σ_{j∈S} (fij − fji) = πi qi− Li qi+ − qi+ πi Li qi− = 0.


5.6. Effective current. As we have mentioned, reactive trajectories can go back and forth along an edge (i, j) with i, j ∈ (A ∪ B)^c on their way from A to B, making both fij and fji positive. The difference fij − fji is the net current from i to j carried by the reactive trajectories. The nonnegative part of fij − fji, denoted by fij+, is called the effective current:

(32)   fij+ := max{fij − fji, 0}.
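The currents of Eqs. (30)-(32) and the conservation property of Theorem 24 translate into a few lines of array arithmetic. A sketch (NumPy; L, pi, qplus, qminus are assumed to have been computed as in the previous snippets, and `interior` is a user-supplied index array for (A ∪ B)^c):

```python
import numpy as np

def reactive_currents(L, pi, qplus, qminus):
    """Eqs. (30)-(32): current f, net current F, and effective current fplus."""
    f = (pi * qminus)[:, None] * L * qplus[None, :]   # f_ij = pi_i q_i^- L_ij q_j^+
    np.fill_diagonal(f, 0.0)                          # f_ii = 0 by definition
    F = f - f.T                                       # net current F_ij = f_ij - f_ji
    fplus = np.maximum(F, 0.0)                        # effective current, Eq. (32)
    return f, F, fplus

# Conservation (Theorem 24): for every state i in (A u B)^c, sum_j (f_ij - f_ji) = 0:
# f, F, fplus = reactive_currents(L, pi, qplus, qminus)
# assert np.allclose(F[interior, :].sum(axis=1), 0.0)
```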

5.7. Transition rate. The transition rate from A to B (the reaction rate) is the average number of transitions from A to B per unit time performed by an infinite trajectory (Xt)t∈R. It is equal to the total reactive current coming out of A, which is the same as the total reactive current going into B, i.e.,

(33)   νR = Σ_{i∈A, j∈S} fij = Σ_{i∈A, j∈S} fij+ = Σ_{i∈S, j∈B} fij = Σ_{i∈S, j∈B} fij+.

One can obtain another expression for the reaction rate νR as the total reactive current through an arbitrary cut. A cut in a network G(S, E) is a partition of the nodes in S into two disjoint subsets that are joined by at least one edge in E. The set of edges whose endpoints are in different subsets of the partition is referred to as the cut-set. Here we will focus on A-B-cuts, i.e., cuts such that A and B are on different sides of the cut-set. Any A-B-cut leads to a decomposition S = SA ∪ SB such that SA ⊇ A and SB ⊇ B (see Fig. 6).
Theorem 25. The transition rate νR is given by

(34)   νR = Σ_{i∈SA} Σ_{j∈SB} Fi,j,

where Fi,j := fij − fji and SA ∪ SB is an arbitrary A-B-cut.



Figure 6. Illustration of the concept of an A-B-cut. The edges of the cut-set are shown with dashed lines.

Proof. We will use the fact that for any subset S′ ⊂ S,

(35)   Σ_{i∈S′, j∈S′} Fi,j = 0,

because for every term Fi,j = fij − fji in this sum there is a term −Fi,j = fji − fij. We have

(36)   Σ_{i∈SA, j∈SB} Fi,j = Σ_{i∈A∪(SA\A), j∈S\SA} Fi,j = Σ_{i∈A, j∈S} Fi,j + Σ_{i∈SA\A, j∈S} Fi,j − Σ_{i∈SA, j∈SA} Fi,j

(37)   = νR + 0 − 0 = νR.

The first sum in (36) equals νR because fji = 0 whenever i ∈ A (since qi+ = 0 on A). The second sum in (36) is zero by current conservation, while the third sum is zero by (35). □
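Both expressions for the rate, Eq. (33) and Theorem 25, are straightforward to evaluate once the currents are available. A sketch (NumPy; f and F come from the previous sketch, and A, B, S_A are index arrays chosen by the user):

```python
import numpy as np

def transition_rate(f, A, B):
    """Eq. (33): total reactive current out of A, equivalently the current into B."""
    nu_out_of_A = f[A, :].sum()
    nu_into_B   = f[:, B].sum()
    assert np.isclose(nu_out_of_A, nu_into_B)   # the two expressions agree
    return nu_out_of_A

def rate_through_cut(F, S_A):
    """Theorem 25: the same rate as the net current through an arbitrary A-B-cut."""
    S_B = np.setdiff1d(np.arange(F.shape[0]), S_A)
    return F[np.ix_(S_A, S_B)].sum()
```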

5.8. Reaction pathways. The effective current f + = (fij+ )i,j∈S defined by Eq. (32)
induces a directed graph with the set of states S. In other words, we connect states i and
j by a directed edge (i → j) if and only if fij+ > 0. We denote this graph by G{f + }.

Definition 21. A reaction pathway w = (i0, i1, . . . , in) is a simple (containing no loops) directed path in the graph G{f+} such that

i0 ∈ A, in ∈ B, ik ∈ (A ∪ B)^c, 1 ≤ k ≤ n − 1.

5.9. Simplifications for time-reversible Markov chains. The case where the Markov chain is time-reversible, i.e., L̂ = L, which is equivalent to the statement that L and π are in detailed balance, i.e.,

πi Lij = πj Lji,

is worthy of special consideration. Many interesting systems possess this property, and the formulas for the backward committor, the reactive current, and the transition rate can be given in terms of the forward committor.
Exercise. (1) Show that the forward and backward committors are related via

qi− = 1 − qi+, i ∈ S.

Hence we can simplify the notation: denote the forward committor by q = (qi)i∈S. Then the backward committor is merely 1 − q.
(2) Show that the reactive current Fij := fij − fji is given by

Fij = πi Lij (qj − qi).

(3) Starting from the expression for the transition rate from A to B (the reaction rate) νR = Σ_{i∈A, j∈S} Fij, show that it can be rewritten as

(38)   νR = (1/2) Σ_{i,j∈S} πi Lij (qj − qi)².
Besides the transition rate νR, one can consider the rates kA,B and kB,A, defined as the inverses of the average times during which the last set hit by the trajectory was A or B, respectively. These rates are given by

(39)   kA,B = νR/ρA,   kB,A = νR/ρB,

where

(40)   ρA = Σ_{i∈S} πi (1 − qi),   ρB = Σ_{i∈S} πi qi   (ρA + ρB = 1)

are the proportions of time during which the trajectory last hit A or B, respectively.
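In the reversible case all of these quantities are expressed through the forward committor q alone. A sketch (NumPy; L, pi, q as in the earlier snippets) evaluating Eqs. (38)-(40):

```python
import numpy as np

def reversible_rates(L, pi, q):
    """Eqs. (38)-(40) for a chain in detailed balance."""
    F = pi[:, None] * L * (q[None, :] - q[:, None])        # F_ij = pi_i L_ij (q_j - q_i)
    nu_R = 0.5 * np.sum(pi[:, None] * L * (q[None, :] - q[:, None]) ** 2)   # Eq. (38)
    rho_A = np.sum(pi * (1.0 - q))                         # Eq. (40)
    rho_B = np.sum(pi * q)
    k_AB, k_BA = nu_R / rho_A, nu_R / rho_B                # Eq. (39)
    return F, nu_R, k_AB, k_BA
```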
The directed graph G{f+} induced by the effective current contains no directed cycles in the case of detailed balance because each of its directed edges connects a state with a smaller value of the committor q to a state with a larger value of the committor. As a result, the committor is strictly increasing along every directed path in the graph G{f+} (see Fig. 7).

We can use cuts to characterize the width of the transition tube carrying the current
of reactive trajectories. A specific set of cuts is convenient for this purpose, namely the
family of isocommittor cuts which are such that their cut-set C is given by
(41) C(q ∗ ) = {(i, j) | qi ≤ q ∗ , qj > q ∗ }, q ∗ ∈ [0, 1).
Figure 7. Examples of reaction pathways in the case of detailed balance are shown by blue arrows. The values of the committor are coded by color: green: q = 0, blue: q = 1. Note that the sequences of values of the committor strictly increase along reaction pathways.

Isocommittor cuts [5] are special because if i is on the A-side and j is on the B-side of the cut, the reactive current between these nodes is nonnegative, Fij ≥ 0, which also means that every reaction pathway (no-detour reactive trajectory) contains exactly one edge belonging to an isocommittor cut, since the committor increases monotonically along these transition paths. Therefore, we can sort the edges in the isocommittor cut C(q) according to the reactive current they carry, in descending order, and find the minimal number of edges N(q) carrying at least p% of this current. By doing so for each value of the committor 0 ≤ q ≤ 1 and for different values of the percentage p ∈ (0, 100), one can analyze the geometry of the transition channel: how broad it is, how many sub-channels there are, etc.
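A sketch of the cut analysis just described (NumPy; F is the net-current matrix and q the committor from the earlier sketches): for a given level q∗ it collects the cut-set C(q∗) of Eq. (41), sorts the edge currents in descending order, and returns the minimal number of edges carrying at least p% of the current through the cut.

```python
import numpy as np

def cut_width(F, q, qstar, p=90.0):
    """Minimal number of edges of C(q*) carrying at least p% of the reactive current."""
    left  = np.where(q <= qstar)[0]              # states with q_i <= q*
    right = np.where(q >  qstar)[0]              # states with q_j >  q*
    currents = np.sort(F[np.ix_(left, right)].ravel())[::-1]
    currents = currents[currents > 0.0]          # cut-set edges carrying current
    cumulative = np.cumsum(currents)
    target = p / 100.0 * currents.sum()
    return int(np.searchsorted(cumulative, target) + 1)
```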
Remark. In the case of time-reversible Markov chains, the forward committor strictly increases along the edges of the graph G{f+} (check this!). Therefore, the committor strictly increases along the reaction pathways. The reaction pathways were dubbed no-detour reactive trajectories in [5].

5.10. Sampling reactive trajectories and no-detour reactive trajectories. In this section, we assume that the Markov chain under consideration is time-reversible (i.e., π and L are in detailed balance). The probability current of reactive trajectories fij and the effective current fij+ allow us to sample reactive trajectories and reaction pathways.
Since the probability current of reactive trajectories fij is the mean number of transitions performed by the reactive trajectories along the edge (i, j) per unit time, i.e., the transition rate for the reactive trajectories along the edge (i, j), we can replace the original generator matrix L with the generator matrix F whose off-diagonal entries are fij and whose diagonal entries are defined so that the row sums of F are zeros.
Then one can convert the generator matrix F into the corresponding jump matrix ΠF according to Eq. (19) and generate a collection of reactive trajectories starting from those states in A that have outgoing edges.
One can do the same procedure using the effective current f+ instead of the probability current of reactive trajectories f. The detailed balance condition guarantees that these pathways are simple and that the committor strictly increases along them.
Propositions justifying these instructions are found in [5].
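A sketch of this sampling recipe (NumPy; f is the current matrix from the earlier sketches and A, B are index arrays; the random seed and the uniform choice of the starting state are our own simplifications). Passing the effective current f+ instead of f to the same routine produces no-detour reactive trajectories.

```python
import numpy as np

def sample_reactive_path(f, A, B, seed=0):
    """Sample one reactive path by running the jump chain of the current matrix f."""
    rng = np.random.default_rng(seed)
    Bset = set(np.asarray(B).tolist())
    out = f.sum(axis=1)                      # total current out of each state
    start = [i for i in np.asarray(A) if out[i] > 0]
    i = int(rng.choice(start))               # start in A (uniformly, for simplicity)
    path = [i]
    while i not in Bset:
        p = f[i] / out[i]                    # jump probabilities built from the current
        i = int(rng.choice(len(p), p=p))
        path.append(i)
    return path
```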

Example 9. Consider the continuous-time Markov chain generated by a discretization of the overdamped Langevin dynamics

dx/dt = −∂V(x, y)/∂x + √(2T) ηx,
dy/dt = −∂V(x, y)/∂y + √(2T) ηy,

where ηx and ηy are independent white noises, on a 2D regular mesh with step h:

{(yi, xj) | 1 ≤ i, j ≤ N, yi = ymin + (i − 1)h, xj = xmin + (j − 1)h}.

We assume that the boundaries of the domain [x1, xN] × [y1, yN] are reflecting. From any non-boundary mesh point (state) (i, j) only transitions to its four nearest neighbors (i + 1, j) (North), (i − 1, j) (South), (i, j + 1) (East), and (i, j − 1) (West) are possible. The corresponding pairwise transition rates are

(42)   (i, j) → (i + 1, j):  LN(i, j) = −(1/(2h)) ∂V/∂y (i, j) + T/h²,
(43)   (i, j) → (i − 1, j):  LS(i, j) = (1/(2h)) ∂V/∂y (i, j) + T/h²,
(44)   (i, j) → (i, j + 1):  LE(i, j) = −(1/(2h)) ∂V/∂x (i, j) + T/h²,
(45)   (i, j) → (i, j − 1):  LW(i, j) = (1/(2h)) ∂V/∂x (i, j) + T/h².

For the top and bottom boundaries LN and LS are modified as

(46)   LN(N, j) = 0,  LN(1, j) = 2T/h²,
(47)   LS(1, j) = 0,  LS(N, j) = 2T/h²,

while LE and LW are found by Eqs. (44)-(45). For the left and right boundaries LE and LW are modified as

(48)   LE(i, N) = 0,  LE(i, 1) = 2T/h²,
(49)   LW(i, 1) = 0,  LW(i, N) = 2T/h²,

while LN and LS are found by Eqs. (42)-(43). Therefore, the generator matrix L is 5-diagonal. Its diagonal entries are all equal to −4T/h². The forward and backward committors for this system are shown in Fig. 8 (a) and (b) respectively. A reactive trajectory superimposed with the probability distribution of reactive trajectories is shown in Fig. 8 (c). A collection of reaction pathways superimposed with the probability distribution of reactive trajectories is shown in Fig. 8 (d).
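A sketch of assembling the generator of Example 9 (Python with NumPy; the double-well potential in the usage lines is a stand-in for illustration, not the 7-well potential of Fig. 8, and all rates are assumed positive, i.e., h is small enough for the given T):

```python
import numpy as np

def langevin_generator(V_x, V_y, xs, ys, T):
    """Assemble the 5-diagonal generator of Eqs. (42)-(49) on a square N x N mesh."""
    N, h = len(xs), xs[1] - xs[0]
    X, Y = np.meshgrid(xs, ys)               # row index i <-> y, column index j <-> x
    LN = -V_y(X, Y) / (2 * h) + T / h**2     # (i, j) -> (i+1, j), Eq. (42)
    LS =  V_y(X, Y) / (2 * h) + T / h**2     # (i, j) -> (i-1, j), Eq. (43)
    LE = -V_x(X, Y) / (2 * h) + T / h**2     # (i, j) -> (i, j+1), Eq. (44)
    LW =  V_x(X, Y) / (2 * h) + T / h**2     # (i, j) -> (i, j-1), Eq. (45)
    # Reflecting boundaries, Eqs. (46)-(49).
    LN[-1, :], LN[0, :] = 0.0, 2 * T / h**2
    LS[0, :],  LS[-1, :] = 0.0, 2 * T / h**2
    LE[:, -1], LE[:, 0] = 0.0, 2 * T / h**2
    LW[:, 0],  LW[:, -1] = 0.0, 2 * T / h**2
    idx = lambda i, j: i * N + j             # flatten (i, j) into a single state index
    L = np.zeros((N * N, N * N))
    for i in range(N):
        for j in range(N):
            k = idx(i, j)
            if LN[i, j] > 0: L[k, idx(i + 1, j)] = LN[i, j]
            if LS[i, j] > 0: L[k, idx(i - 1, j)] = LS[i, j]
            if LE[i, j] > 0: L[k, idx(i, j + 1)] = LE[i, j]
            if LW[i, j] > 0: L[k, idx(i, j - 1)] = LW[i, j]
            L[k, k] = -L[k].sum()            # zero row sums
    return L

# Usage with an illustrative double-well potential V(x, y) = (x^2 - 1)^2 + y^2:
xs = ys = np.linspace(-2.0, 2.0, 81)
L = langevin_generator(lambda x, y: 4 * x * (x**2 - 1), lambda x, y: 2 * y, xs, ys, T=1.0)
```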

References
[1] J. R. Norris, “Markov Chains”, Cambridge University Press, 1998
[2] A. Bovier, Metastability, in “Methods of Contemporary Statistical Mechanics”, (ed. R. Kotecky), LNM
1970, Springer, 2009
[3] A. Bovier, M. Eckhoff, V. Gayrard, and M. Klein, Metastability and Low Lying Spectra in Reversible
Markov Chains, Comm. Math. Phys. 228 (2002), 219-255
[4] A. Bovier, Metastability, in “Methods of Contemporary Statistical Mechanics”, (ed. R. Kotecky), LNM
1970, Springer, 2009
[5] M. Cameron and E. Vanden-Eijnden, Flows in Complex Networks: Theory, Algorithms, and Application to Lennard-Jones Cluster Rearrangement, Journal of Statistical Physics 156, 3, 427-454 (2014), arXiv:1402.1736
[6] M. Cameron, Metastability, Spectrum, and Eigencurrents of the Lennard-Jones-38 Network, J. Chem.
Phys., (2014), 141, 184113, arXiv: 1408.5630
[7] W. E and E. Vanden-Eijnden, Towards a Theory of Transition Paths, J. Stat. Phys., 123 (2006), 503
[8] W. E and E. Vanden-Eijnden, Transition-Path Theory and Path-Finding Algorithms for the Study of
Rare Events, Ann. Rev. Phys. Chem., 61 (2010), 391–420
[9] M. Holmes-Cerfon, S. J. Gortler, and M. P. Brenner, A geometrical approach to computing free-energy
landscapes from short-ranged potentials, PNAS January 2, 2013 110 (1) E5-E14
[10] M. Holmes-Cerfon, Sticky-Sphere Clusters, Annual Review of Condensed Matter Physics Vol. 8:77-98
[11] P. Metzner, Ch. Schuette, and E. Vanden-Eijnden, Transition path theory for Markov jump processes, SIAM Multiscale Model. Simul. 7, 1192-1219 (2009)


Figure 8. Example 9. Thin white curves show level sets of the 7-well potential. Thick white curves represent the boundaries of the sets A and B. (a): The forward committor. (b): The backward committor. (c): A reactive trajectory superimposed with the probability distribution of reactive trajectories. (d): A collection of reaction pathways superimposed with the probability distribution of reactive trajectories.
