Dynamic Programming Solution To The Longest Common Subsequence Problem
Dynamic Programming Solution To The Longest Common Subsequence Problem
Methodology
(1) Characterize the Structure of an Optimal Solution. The LCS problem exhibits optimal sub-
structure in the following manner. Given a sequence X =< x1 , x2 , . . . , xm >, we define the ith prefix of X,
for i = 0, 1, . . . , m, as Xi =< x1 , x2 , . . . , xi >.
Claim 1 Let X =< x1 , x2 , . . . , xm > and Y =< y1 , y2 , . . . , yn > be sequences, and let Z =< z1 , z2 , . . . , zk >
be any LCS of X and Y .
1. If xm = yn , then zk = xm = yn and Zk−1 is an LCS of Xm−1 and Yn−1 .
2. If xm 6= yn , then zk 6= xm implies that Z is an LCS of Xm−1 and Y .
3. If xm 6= yn , then zk 6= yn implies that Z is an LCS of X and Yn−1 .
Proof: (1) By contradiction, assume zk 6= xm , then by appending xm = yn to Z, we get a common
subsequence of X and Y of length k + 1, contradicting the supposed optimality of Z. So zk = xm = yn .
Thus, the prefix Zk−1 is a common subsequence of Xm−1 and Yn−1 . Next we show that it is an LCS. Suppose
for the purpose of contradiction that there exists a common subsequence W of Xm−1 and Yn−1 with length
greater than k − 1. We can append xm = yn to W and get a common subsequence of X and Y whose length
is greater than k, which contradicting the supposed optimality of Z.
(2) zk 6= xm implies that Z is a common subsequence of Xm−1 and Y . By contradiction, suppose
that there is a common subsequence W of Xm−1 and Y with length greater than k, then W is a common
subsequence of Xm and Y , contradicting the supposed optimality of Z.
(3) The proof is similar to (2). 2
(2) Recursively Define the Value of the Optimal Solution. Let C[i, j] be the length of an LCS of
the sequences Xi and Yj . If either i = 0 or j = 0, one of the sequences has length 0, and so the LCS has
length 0. If i, j > 0 and xm = yn we should first find an LCS of Xm−1 and Yn−1 and then append xm = yn
to this LCS to get an LCS of X and Y . If i, j > 0 and xm 6= yn , then we must first find an LCS of Xm−1
and Y and an LCS of X and Yn−1 , and then choose the longer one as an LCS of X and Y . We thus have
the following recurrence.
Claim 2
0
if i = 0 or j = 0,
C[i, j] = C[i − 1, j − 1] + 1 if i, j > 0 and xi = yj ,
max(C[i, j − 1], C[i − 1, j]) if i, j > 0 and xi 6= yj .
Proof: The correctness of this recursive definition is embodied in the paragraph which proceeds it. 2
1
(3) Compute the Value of the Optimal Solution Bottom-up. Consider the following piece of pseu-
docode, where X =< x1 , x2 , . . . , xm >, Y =< y1 , y2 , . . . , yn >.
LCS-Length(X, Y )
1 m = X.length
2 n = Y.length
3 let S[1..m, 1..n] and C[0..m, 0..n] be new tables
4 for i = 1 to m
5 C[i, 0] = 0
6 for j = 0 to n
7 C[0, j] = 0
8 for i = 1 to m
9 for j = 1 to n
10 if xi == yj
11 C[i, j] = C[i − 1, j − 1] + 1
12 S[i, j] = “ - ”
13 elseif C[i − 1, j] ≥ C[i, j − 1]
14 C[i, j] = C[i − 1, j]
15 S[i, j] = “ ↑ ”
16 else C[i, j] = C[i, j − 1]
17 S[i, j] = “ ← ”
18 return C and S
Claim 3 When the above procedure terminates, C[i, j] will contain the length of an LCS of the sequences
Xi and Yj , and S[i, j] will point to the table entry corresponding to the optimal subproblem solution chosen
when computing C[i, j].
Proof: The correctness of the above procedure is based on the fact that it correctly implements the recursive
definition given above. The base case is properly handled in Line 4-7, and the recursive case is properly
handled in Lines 8 to 17. Note that since the loop defined in Line 8 goes from 1 to m and the loop defined
in Line 9 goes from 1 to n, no element of C is accessed in either Line 11,13,14 or 16 before it has been
computed. 2
(4) Construct the Optimal Solution from the Computed Information. Consider the following
piece of pseudocode, where S is the table computed above.
Print-LCS(S, X, i, j)
1 if i == 0 or j == 0
2 return
3 if S[i, j] == “ - ”
4 Print-LCS(S, X, i − 1, j − 1)
5 print xi
6 elseif S[i, j] == “ ↑ ”
7 Print-LCS(S, X, i − 1, j)
8 elsePrint-LCS(S, X, i, j − 1)
2
j 0 1 2 3 4 5 6
i yi B D C A B A
0 xi
0 0 0 0 0 0 0
↑ ↑ ↑ - ← -
1 A
0 0 0 0 1 1 1
- ← ← ↑ - ←
2 B
0 1 1 1 1 2 2
↑ ↑ - ← ↑ ↑
3 C
0 1 1 2 2 2 2
- ↑ ↑ ↑ - ←
4 B
0 1 1 2 2 3 3
↑ - ↑ ↑ ↑ ↑
5 D
0 1 2 2 2 3 3
↑ ↑ ↑ - ↑ -
6 A
0 1 2 2 3 3 4
- ↑ ↑ ↑ - ↑
7 B
0 1 2 2 3 4 4
Figure 1: The C and S tables computed by LCS-LENGTH on the sequence X =< A, B, C, B, D, A, B >
and Y =< B, D, C, A, B, A >.
(5) Running Time and Space Requirements. The LCS-Length procedure runs in Θ(mn) since each
table entry takes Θ(1) time to compute, and it uses Θ(mn) additional space in the form of the tables S and
C. The Print-LCS procedure runs in time O(m + n) since it decrements at least one of i and j in each
recursive call. It uses no additional space beyond the inputs given. Thus, the total running time is Θ(mn)
and the total space requirement is Θ(mn).