[go: up one dir, main page]

0% found this document useful (0 votes)
14 views8 pages

Lecture 19

Uploaded by

Burim Baftijari
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
14 views8 pages

Lecture 19

Uploaded by

Burim Baftijari
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 8

Lecture 19: Edit Distance

Agenda:

• LCS

• Edit distance

Reading:

• Text: (partly covered) 350-355

1
Lecture 19: Dynamic Programming

Longest common subsequence (LCS) problem:

Definitions: – Sequence/string:
dynamicprogramming is a sequence over the English alpha-
bet
– Base/letter/character
– Subsequence:
the given sequence with zero or more letters deleted
e.g., dog is a subsequence of dynamicprogramming
WARNing: bases appear in the same order, but not nec-
essarily consecutive
– Common subsequence
– LCS problem: given two sequences X = x1 x2 . . . xn and
Y = y1 y2 . . . ym , find a maximum-length common subse-
quence of them.

• The LCS problem has the “optimal substructure” ...


– if xn is NOT in the LCS (to be computed), then we only
need to compute an LCS of x1 x2 . . . xn−1 and y1 y2 . . . ym
...
– similarly, if ym is NOT in the LCS (to be computed),
then we only need to compute an LCS of x1 x2 . . . xn and
y1 y2 . . . ym−1 ...
– if xn and ym are both in the LCS (to be computed), then
xn = ym and we need to compute an LCS of x1 x2 . . . xn−1
and y1 y2 . . . ym−1 ;
and then adding xn to the end to form an LCS for the
original problem

2
Lecture 19: Dynamic Programming

Longest common subsequence (LCS) problem (cont’d):

• Therefore,
Letting DP [n, m] to denote the length of an LCS of X and
Y , then it is equal to
LCS(x1 x2 . . . xn−1 , y1 y2 . . . ym ),

max length of LCS(x1 x2 . . . xn , y1 y2 . . . ym−1 ),
LCS(x1 x2 . . . xn−1 , y1 y2 . . . ym−1 ) + ‘x0n , if xn = ym

• Correctness

• In general, let DP [i, j] denote the length of an LCS of x1 x2 . . . xi


and y1 y2 . . . yj .

• Recurrence:
If i = 0 or j = 0 then clearly DP [i, j] = 0.
Else
DP [i − 1, j],

DP [i, j] = max DP [i, j − 1],
DP [i − 1, j − 1] + 1, if xi = yj

3
Lecture 19: Dynamic Programming

Longest common subsequence (LCS) problem (cont’d)


— solving the recurrence:

• Pseudocode

procedure dpLCS(X, Y )

n ← length[X]
m ← length[Y ]
for i ← 1 to m do
DP [i, 0] ← 0
for j ← 0 to n do
DP [0, j] ← 0
for i ← 1 to n do
for j ← 1 to m do
if xi = yj then
DP [i, j] ← DP [i − 1, j − 1] + 1
else if DP [i − 1, j] ≥ DP [i, j − 1] then
DP [i, j] ← DP [i − 1, j]
else
DP [i, j] ← DP [i, j − 1]
return DP [n, m]

• Running time: Θ(n × m)


There are n × m entries each takes constant time to compute.

• Step 4: to find the LCS, store extra information and trace


back!

4
Lecture 19: Edit Distance

Editing Sequences:

• Given two sequences S and T over alphabets Σ = {a, b, c}.

• Starting with S, at each step we can either:


– delete a letter from S,
– insert a letter from Σ into S (at any place),
– exchange a letter of S with another one in Σ.

• each of these operations has a cost (given).

• If we use δ to denote “null” then:


– cost of exchanging a with b is: Cost[a, b], and
– cost of deleting c (i.e. replacing it with δ) is Cost[c, δ],
and
– cost of inserting letter b (i.e. replacing δ with b) is
Cost[δ, b].

• Given S, T , and cost table Cost, want to find a sequence of


operations that transform S into T with minimum cost.

• We assume that Cost is metric:


– Cost[α, α] = 0 for α ∈ Σ,
– Cost[α, β] = Cost[β, α], for α, β ∈ Σ ∪ {δ},
– Cost[α, β] ≤ Cost[α, σ] + Cost[σ, β], for α, β, σ ∈ Σ ∪ {δ}.

5
Lecture 19: Edit Distance

Edit Distance:

• Let’s assume S = s1 s2 . . . sn and T = t1 t2 . . . tm .

• Subproblems will be: best edit distance of s1 . . . si and t1 . . . tj


with 1 ≤ i ≤ n and 1 ≤ j ≤ m.

• Let Edit[i, j] denote the minimum cost of editing s1 . . . si into


t1 . . . tj .

• In the optimal solution, either:


– si is deleted and the rest of s1 . . . si−1 is transformed into
t1 . . . tj , or
– s1 . . . si is transformed into t1 . . . tj−1 and we insert tj at
the end, or
– si is changed into tj and the rest of s1 . . . si−1 is trans-
formed into t1 . . . tj−1 .

• Thus:
Edit[i − 1, j] + Cost[si , δ],

Edit[i, j] = min Edit[i, j − 1] + Cost[δ, tj ],
Edit[i − 1, j − 1] + Cost[si , tj ],

Pi Pj
• with Edit[i, 0] = k=1
Cost[sk , δ] and Edit[0, j] = k=1
Cost[δ, tj ].

6
Lecture 19: Dynamic Programming

Edit Distance:

• Pseudocode

procedure Edit-Dist(S, T, Cost)

n ← length[S]
m ← length[T ]
Edit[0, 0] ← 0
for i ← 1 to n do
Edit[i, 0] ← Edit[i − 1, 0] + Cost[S[i], δ]
for j ← 1 to n do
Edit[0, j] ← Edit[0, j − 1] + Cost[δ, T [j]]
for i ← 1 to n do
for j ← 1 to m do
Op1 ← Edit[i − 1, j] + Cost[S[i], δ]
Op2 ← Edit[i, j − 1] + Cost[δ, T [j]]
Op3 ← Edit[i − 1, j − 1] + Cost[S[i], T [j]]
Edit[i, j] = min{Op1, Op2, Op3}
return Edit[n, m]

• Running time: Θ(n × m)


There are n × m entries each takes constant time to compute.

• To find the sequence of operations: keep extra information


and trace back!

7
Lecture 19: Edit Distance

Have you understood the lecture contents?


well ok not-at-all topic

   LCS

   edit distance

   DP for edit distance

You might also like