0% found this document useful (0 votes)

22 views62 pages

Lecture 3

Uploaded by

chunfeng277

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

22 views62 pages

Lecture 3

Uploaded by

chunfeng277

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 62

CS 466

Introduction to Bioinformatics
Lecture 3
Mohammed El-Kebir
September 6, 2024
Course Announcements
Instructor:
• Mohammed El-Kebir (melkebir) HW1: To be released on 09/13
Office hours: Wednesdays, 3:30-4:30pm in and due 09/21 at 11:59pm
Siebel 3216 (not today)

Piazza / Gradescope: Midterm:

• https://piazza.com/illinois/fall2024/cs466 week of 10/09 – 10/11,
• Gradescope entry code: ‘VDEE52’ details to follow

TA:
• Nicole (nsdong2): Mondays 4-5pm in CS Tutoring Center (Siebel basement)
• Claire (czchou2): Tuesdays 2-3pm in CS Tutoring Center (Siebel basement)
• Mrinmoy (mroddur2): Thursdays 2-3pm in Siebel 2219
2
Outline
1. Edit distance
2. Review elementary graph theory
3. Manhattan Tourist problem
4. Longest/shortest paths in DAGs
5. Global alignment
6. Fitting alignment

Reading:
• Jones and Pevzner. Chapters 2.7-2.9 and 6.1-6.4, 6.6
• Lecture notes
3
Alignment
An alignment between two strings v (of m characters) and w (of n characters)
is a two row matrix where the first row contains the characters of v in order,
the second row contains the characters of w in order, and spaces may be
interspersed throughout each.

Input Output

v: KITTEN (m = 6) v: K - I T T E N -
w: SITTING (n = 7) w: S I - T T I N G

Question: Is this a good alignment?

Answer: Count the number of insertion, deletions, substitutions.
4
Computing Edit Distance
Edit Distance problem: Given strings 𝐯 ∈ Σ ! and 𝐰 ∈ Σ " , compute the
minimum number 𝑑(𝐯, 𝐰) of elementary operations to transform 𝐯 into 𝐰.

v: ATGTTAT... deletion insertion mismatch match

w: AGCGTAC...
𝑖−1 𝑖
prefix of 𝐯 of length 𝑖 𝐯# : A T - G T T T
prefix of 𝐰 of length 𝑗 𝐰$ : A G C G T - C
𝑗−1 𝑗
Optimal substructure:
Edit distance obtained from edit distance of prefix of string.
5
Computing Edit Distance – Dynamic Programming 8
>
> 0, if i = 0 and j = 0,
>
>
… 𝐯" … - … 𝐯" … 𝐯" >
>
deletion insertion mismatch match <d[i 1, j] + 1, if i > 0,
… - … 𝐰! … 𝐰! … 𝐰! d[i, j] = min d[i, j 1] + 1, if j > 0,
>
>
>
> d[i 1, j 1] + 1, if i > 0, j > 0 and vi 6= wj ,
>
>
:d[i 1, j 1], if i > 0, j > 0 and vi = wj .
W A T C G
V 0 1 2 3 4

0 0 1 2 3 4
A T - G T
A 1 1 0 1 2 3 A T C G -
T 2 2 1 0 1 2

G 3 3 2 1 1 1
A T G T
A T C G
T 4 4 3 2 2 2

6
Computing Edit Distance – Your turn! 8
> 0, if i = 0 and j = 0,
… 𝐯" … -
𝑖 − 1, 𝑗 − 1 𝑖 − 1, 𝑗 >
>
>
>
… -
deletion … 𝐰!
insertion >
<d[i 1, j] + 1, if i > 0,
0
or d[i, j] = min d[i, j 1] + 1, if j > 0,
1 1 >
>
>
> d[i 1, j 1] + 1, if i > 0, j > 0 and vi 6= wj ,
… 𝐯" … 𝐯" >
>
… 𝐰!
mismatch … 𝐰!
match 1 :d[i 1, j 1], if i > 0, j > 0 and vi = wj .
𝑖, 𝑗 − 1 𝑖, 𝑗

W C A R W A T E W A R E
V 0 1 2 3 V 0 1 2 3 V 0 1 2 3

0 0 0

C 1 C 1 C 1

A 2 A 2 A 2

T 3 T 3 T 3

𝑑 𝐜𝐚𝐭, 𝐜𝐚𝐫 = 𝑑 𝐜𝐚𝐭, 𝐚𝐭𝐞 = 𝑑 𝐜𝐚𝐭, 𝐚𝐫𝐞 = 7

Computing Edit Distance – Your turn! 8
> 0, if i = 0 and j = 0,
… 𝐯" … -
𝑖 − 1, 𝑗 − 1 𝑖 − 1, 𝑗 >
>
>
>
… -
deletion … 𝐰!
insertion >
<d[i 1, j] + 1, if i > 0,
0
or d[i, j] = min d[i, j 1] + 1, if j > 0,
1 1 >
>
>
> d[i 1, j 1] + 1, if i > 0, j > 0 and vi 6= wj ,
… 𝐯" … 𝐯" >
>
… 𝐰!
mismatch … 𝐰!
match 1 :d[i 1, j 1], if i > 0, j > 0 and vi = wj .
𝑖, 𝑗 − 1 𝑖, 𝑗

W C A R W A T E W A R E
V 0 1 2 3 V 0 1 2 3 V 0 1 2 3

0 0 1 2 3 0 0 1 2 3 0 0 1 2 3

C 1 1 0 1 2 C 1 1 1 2 3 C 1 1 1 2 3

A 2 2 1 0 1 A 2 2 1 2 3 A 2 2 1 2 3

T 3 3 2 1 1 T 3 3 2 1 2 T 3 3 2 2 3

𝑑 𝐜𝐚𝐭, 𝐜𝐚𝐫 = 1 𝑑 𝐜𝐚𝐭, 𝐚𝐭𝐞 = 2 𝑑 𝐜𝐚𝐭, 𝐚𝐫𝐞 = 3 8

Change Problem and Edit distance
W A T C G
V 0 1 2 3 4
Make M cents using minimum
number of 1, 3 and 5 cent coins. 0 0 1 2 3 4

A 1 1 0 1 2 3
Value 1 2 3 4 5 6 7
Min # coins 1 2 1 2 1 2 3 T 2 2 1 0 1 2

G 3 3 2 1 1 1

T 4 4 3 2 2 2

• Both have optimal substructure and can be solved using dynamic programming
• These are examples of a more general problem!
9
Review of Graph Theory
• Graph 𝐺 = (𝑉, 𝐸)
• Vertices 𝑉 = {𝑣% , … , 𝑣" }
• Edges 𝐸 = {(𝑣# , 𝑣$ ), … }

Chicago

Bloomington

Champaign-Urbana

Indianapolis

St. Louis

10
Review of Graph Theory
• Directed Graph 𝐺 = (𝑉, 𝐸)
• Vertices 𝑉 = {𝑣% , … , 𝑣" }
• Directed edges 𝐸 = {(𝑣# , 𝑣$ ), … }

Chicago

Bloomington

Champaign-Urbana

Indianapolis

St. Louis

11
Review of Graph Theory
• Directed Graph 𝐺 = (𝑉, 𝐸)
• Vertices 𝑉 = {𝑣% , … , 𝑣" }
• Directed edges 𝐸 = {(𝑣# , 𝑣$ ), … }
• Path is a sequence of vertices and edges
Chicago
that connect them

Bloomington

Champaign-Urbana

Indianapolis

St. Louis

12
Review of Graph Theory
• Directed Graph 𝐺 = (𝑉, 𝐸)
• Vertices 𝑉 = {𝑣% , … , 𝑣" }
• Directed edges 𝐸 = {(𝑣# , 𝑣$ ), … }
• Path is a sequence of vertices and edges
Chicago
that connect them
130
• Edges can be weighted
140
Bloomington
50
Champaign-Urbana

180
170 150 Indianapolis

St. Louis

13
Manhattan Tourist Problem

Begin
A tourist in Manhattan * *
wants to visit the
maximum number of *
attractions (*) by
* *
traveling on a path (only
* *
eastward and southward) *
from start to end
*
* * *
End

14
Manhattan Tourist Problem
A tourist in Manhattan
wants to visit the Begin
maximum number of 2 1
attractions (*) by
traveling on a path (only 1 1 1
eastward and southward)
from start to end 1 2
1
May be more than 1 5
attraction on a street. 1 1 3
Add weights! End

15
Manhattan Tourist Problem
0 1 2 3 4
begin j coordinate
3 2 4 0
0 0 3 5 9

Manhattan Tourist 1 0 2 4 3

Problem (MTP): 3 2 4 2
1 13
Given a weighted,
1
directed grid graph G 4 6 5 2

i coordinate
with two vertices “begin” 2
0 7 3
15
4
19
and “end”, find the
maximum weight path in 4 4 5 2 1

G from “begin” to “end”. 3

3 3 0 2
20

5 6 8 5 3

1 3 2 2 end
4 23
16
Manhattan Tourist Problem – Exhaustive Algorithm
0 1 2 3 4
Check all paths begin 3 2 4 0
j coordinate
0 0 3 5 9

Question: 1 0 2 4 3

How many paths? 1

3 2 4
13
2

1
4 6 5 2

i coordinate
0 7 3 4
2 15 19

4 4 5 2 1
3 3 0 2
3
20

5 6 8 5 3

1 3 2 2 end
4 23
17
Manhattan Tourist Problem – Greedy Algorithm
1 2 5
begin

5 3 10 5

2 1 5

3 5 3 1
better path!
2 3 4

promising start, 0 0 5 2
but leads to bad 22
0 0 0
choices! end
18

18
Manhattan Tourist Problem – Optimal Substructure
1 2 5
begin

5 3 10 5

2 1 5

best score to this

3 5 3 1 point
2 3 4
20

0 0 5 2

0 0 0
end best score to end
best score to this point 18 22

19
Manhattan Tourist Problem – Optimal Substructure
𝑠[𝑖, 𝑗] is the best score for path to coordinate (𝑖, 𝑗)

1 2 5
begin

5 3 10 5

2 1 5

best score to
Question: What is the recurrence? 3 5 3 1 this point
2 3 4
20

0 0 5 2

0 0 0
end best score to
best score to this point 18 22 end

• 𝑤[ 𝑖 − 1, 𝑗 , (𝑖, 𝑗)] weight of street between 𝑖 − 1, 𝑗 and 𝑖, 𝑗

• 𝑤[ 𝑖, 𝑗 − 1 , (𝑖, 𝑗)] weight of street between 𝑖, 𝑗 − 1 and 𝑖, 𝑗
20
Manhattan Tourist Problem – Optimal Substructure
𝑠[𝑖, 𝑗] is the best score for path to coordinate (𝑖, 𝑗)

1 2 5
begin

5 3 10 5

2 1 5
8 best score to
>
<0, if i = 0 and j = 0, 3 5 3 1 this point
s[i, j] = max s[i 1, j] + w[(i 1, j), (i, j)] if i > 0, 2 3 4
>
: 20
s[i, j 1] + w[(i, j 1), (i, j)] if j > 0.
0 0 5 2

0 0 0
end best score to
best score to this point 18 22 end

• 𝑤[ 𝑖 − 1, 𝑗 , (𝑖, 𝑗)] weight of street between 𝑖 − 1, 𝑗 and 𝑖, 𝑗

• 𝑤[ 𝑖, 𝑗 − 1 , (𝑖, 𝑗)] weight of street between 𝑖, 𝑗 − 1 and 𝑖, 𝑗
21
MTP – Solving Recurrence using Dynamic Programming
𝑠[𝑖, 𝑗] is the best score for path to coordinate (𝑖, 𝑗)
0
source 8
>
<0, if i = 0 and j = 0,
s[i, j] = max s[i 1, j] + w[(i 1, j), (i, j)] if i > 0,
0 >
:
s[i, j 1] + w[(i, j 1), (i, j)] if j > 0.
0

• 𝑤[ 𝑖 − 1, 𝑗 , (𝑖, 𝑗)] weight of street between

𝑖 − 1, 𝑗 and 𝑖, 𝑗
• 𝑤[ 𝑖, 𝑗 − 1 , (𝑖, 𝑗)] weight of street between
𝑖, 𝑗 − 1 and 𝑖, 𝑗

22
MTP – Solving Recurrence using Dynamic Programming
j 𝑠[𝑖, 𝑗] is the best score for path to coordinate (𝑖, 𝑗)
0 1
source 8
>
<0, if i = 0 and j = 0,
1 s[i, j] = max s[i 1, j] + w[(i 1, j), (i, j)] if i > 0,
0 >
:
s[i, j 1] + w[(i, j 1), (i, j)] if j > 0.
0 1
i 5
• 𝑤[ 𝑖 − 1, 𝑗 , (𝑖, 𝑗)] weight of street between
𝑖 − 1, 𝑗 and 𝑖, 𝑗
1
5 • 𝑤[ 𝑖, 𝑗 − 1 , (𝑖, 𝑗)] weight of street between
𝑖, 𝑗 − 1 and 𝑖, 𝑗

23
MTP – Solving Recurrence using Dynamic Programming
j 𝑠[𝑖, 𝑗] is the best score for path to coordinate (𝑖, 𝑗)
0 1 2
source 8
>
<0, if i = 0 and j = 0,
1 2 s[i, j] = max s[i 1, j] + w[(i 1, j), (i, j)] if i > 0,
0 >
:
s[i, j 1] + w[(i, j 1), (i, j)] if j > 0.
0 1 3
i 5 3
• 𝑤[ 𝑖 − 1, 𝑗 , (𝑖, 𝑗)] weight of street between
2 𝑖 − 1, 𝑗 and 𝑖, 𝑗
1
5 7 • 𝑤[ 𝑖, 𝑗 − 1 , (𝑖, 𝑗)] weight of street between
𝑖, 𝑗 − 1 and 𝑖, 𝑗
3

2
8

24
MTP – Solving Recurrence using Dynamic Programming
j 𝑠[𝑖, 𝑗] is the best score for path to coordinate (𝑖, 𝑗)
0 1 2 3
source 8
>
<0, if i = 0 and j = 0,
1 2 5 s[i, j] = max s[i 1, j] + w[(i 1, j), (i, j)] if i > 0,
0 >
:
s[i, j 1] + w[(i, j 1), (i, j)] if j > 0.
0 1 3 8
i 5 3 10
• 𝑤[ 𝑖 − 1, 𝑗 , (𝑖, 𝑗)] weight of street between
2 1 𝑖 − 1, 𝑗 and 𝑖, 𝑗
1
5 7 13 • 𝑤[ 𝑖, 𝑗 − 1 , (𝑖, 𝑗)] weight of street between
𝑖, 𝑗 − 1 and 𝑖, 𝑗
3 5

2
2
8 12
0

3
8

25
MTP – Solving Recurrence using Dynamic Programming
j 𝑠[𝑖, 𝑗] is the best score for path to coordinate (𝑖, 𝑗)
0 1 2 3
source 8
>
<0, if i = 0 and j = 0,
1 2 5 s[i, j] = max s[i 1, j] + w[(i 1, j), (i, j)] if i > 0,
0 >
:
s[i, j 1] + w[(i, j 1), (i, j)] if j > 0.
0 1 3 8
i 5 3 10 5
• 𝑤[ 𝑖 − 1, 𝑗 , (𝑖, 𝑗)] weight of street between
2 1 5 𝑖 − 1, 𝑗 and 𝑖, 𝑗
1
5 7 13 18 • 𝑤[ 𝑖, 𝑗 − 1 , (𝑖, 𝑗)] weight of street between
𝑖, 𝑗 − 1 and 𝑖, 𝑗
3 5 3

2 3
2
8 12 16
0 0

0
3
8 12

26
MTP – Solving Recurrence using Dynamic Programming
j 𝑠[𝑖, 𝑗] is the best score for path to coordinate (𝑖, 𝑗)
0 1 2 3
source 8
>
<0, if i = 0 and j = 0,
1 2 5 s[i, j] = max s[i 1, j] + w[(i 1, j), (i, j)] if i > 0,
0 >
:
s[i, j 1] + w[(i, j 1), (i, j)] if j > 0.
0 1 3 8
i 5 3 10 5
• 𝑤[ 𝑖 − 1, 𝑗 , (𝑖, 𝑗)] weight of street between
2 1 5 𝑖 − 1, 𝑗 and 𝑖, 𝑗
1
5 7 13 18 • 𝑤[ 𝑖, 𝑗 − 1 , (𝑖, 𝑗)] weight of street between
𝑖, 𝑗 − 1 and 𝑖, 𝑗
3 5 3 1

2 3 4
2
8 12 16 20
0 0 5

0 0
3
8 12 21

27
MTP – Solving Recurrence using Dynamic Programming
j 𝑠[𝑖, 𝑗] is the best score for path to coordinate (𝑖, 𝑗)
0 1 2 3
source 8
>
<0, if i = 0 and j = 0,
1 2 5 s[i, j] = max s[i 1, j] + w[(i 1, j), (i, j)] if i > 0,
0 >
:
s[i, j 1] + w[(i, j 1), (i, j)] if j > 0.
0 1 3 8
i 5 3 10 5
• 𝑤[ 𝑖 − 1, 𝑗 , (𝑖, 𝑗)] weight of street between
2 1 5 𝑖 − 1, 𝑗 and 𝑖, 𝑗
1
5 7 13 18 • 𝑤[ 𝑖, 𝑗 − 1 , (𝑖, 𝑗)] weight of street between
𝑖, 𝑗 − 1 and 𝑖, 𝑗
3 5 3 1

2 3 4
2
8 12 16 20
0 0 5 2

0 0 0
3
8 12 21 22

28
MTP – Solving Recurrence using Dynamic Programming
j 𝑠[𝑖, 𝑗] is the best score for path to coordinate (𝑖, 𝑗)
0 1 2 3
source 8
>
<0, if i = 0 and j = 0,
1 2 5 s[i, j] = max s[i 1, j] + w[(i 1, j), (i, j)] if i > 0,
0 >
:
s[i, j 1] + w[(i, j 1), (i, j)] if j > 0.
0 1 3 8
i 5 3 10 5
• 𝑤[ 𝑖 − 1, 𝑗 , (𝑖, 𝑗)] weight of street between
2 1 5 𝑖 − 1, 𝑗 and 𝑖, 𝑗
1
5 7 13 18 • 𝑤[ 𝑖, 𝑗 − 1 , (𝑖, 𝑗)] weight of street between
𝑖, 𝑗 − 1 and 𝑖, 𝑗
3 5 3 1

2 3 4 Let 𝑚 be the number of rows and 𝑛 be

2
8 12 16 20 the number of columns.
0 0 5 2
Running time: 𝑂(𝑚𝑛)

3
0 0 0 Question: Implementation?
8 12 21 22
S3,3 = 22
29
Manhattan Is Not a Perfect Grid
A2 A3

What about diagonals?

A1 B

8
>
<s[A1 ] + w[A1 , B],
s[B] = max s[A2 ] + w[A2 , B],
>
:
s[A3 ] + w[A3 , B].

30
Manhattan Is Not a Perfect Grid, It’s a Directed Graph

pred 𝑖, 𝑗
𝐺 = (𝑉, 𝐸) is a directed
acyclic graph (DAG) Each edge is evaluated
with nonnegative edges once: 𝑂( 𝐸 ) time
weights 𝑤 ∶ 𝐸 → ℝ.
𝑖, 𝑗

s[0, 0] = 0
0 0 0 0
s[i, j] = max {s[i , j ] + w[(i , j ), (i, j)]}
(i0 ,j 0 ) 2 pred(i,j)

31
Dynamic Programming as a Graph Problem
Begin
Manhattan Tourist Problem: * *
Every path in directed graph is a possible * * *
tourist path. Find maximum weight path. *
* *
Running time: 𝑂 𝑚𝑛 = 𝑂( 𝐸 )
*
* * *
End

Change Problem: Make M cents using

minimum number of coins 𝐜 = 1, 3, 5 .
Every path in directed graph is a possible
change. Find shortest path.
Running time: 𝑂 𝑀𝑛 = 𝑂( 𝐸 )
32
What About the Edit Distance Problem?
W A T C G Edit Distance problem: Given
V 0 1 2 3 4
strings 𝐯 ∈ Σ ! and 𝐰 ∈ Σ " ,
compute the minimum number
0 𝑑(𝐯, 𝐰) of elementary operations
A 1 to transform 𝐯 into 𝐰.
T 2

G 3

T 4

𝐯" deletion - insertion 𝐯" mismatch 𝐯" match

- 𝐰! 𝐰! 𝐰!

33
What About the Edit Distance Problem?
W A T C G Edit Distance problem: Given
V 0 1 2 3 4
strings 𝐯 ∈ Σ ! and 𝐰 ∈ Σ " ,
compute the minimum number
0 O O O O O 𝑑(𝐯, 𝐰) of elementary operations
A 1 O O O O O to transform 𝐯 into 𝐰.
T 2 O O O O O
Edit graph is a weighed, directed grid
G 3 O O O O O graph 𝐺 = (𝑉, 𝐸) with source vertex
T 4 O O O O O
(0, 0) and target vertex (𝑚, 𝑛). Each
edge (𝑖, 𝑗) has weight [𝑖, 𝑗] corresponding
𝐯" deletion - insertion 𝐯" mismatch 𝐯" match to edit cost: deletion (1), insertion (1),
- 𝐰! 𝐰! 𝐰! mismatch (1) and match (0).

34
What About the Edit Distance Problem?
W A T C G
V 0 1 2 3 4

0 O O O O O

A 1 O O O O O
Alignment is a path from (0, 0) to (𝑚, 𝑛)
T 2 O O O O O
Edit graph is a weighed, directed grid
G 3 O O O O O graph 𝐺 = (𝑉, 𝐸) with source vertex
T 4 O O O O O
(0, 0) and target vertex (𝑚, 𝑛). Each
edge (𝑖, 𝑗) has weight [𝑖, 𝑗] corresponding
𝐯" deletion - insertion 𝐯" mismatch 𝐯" match to edit cost: deletion (1), insertion (1),
- 𝐰! 𝐰! 𝐰! mismatch (1) and match (0).

35
What About the Edit Distance Problem?
W A T C G Edit Distance problem: Given edit
V 0 1 2 3 4
graph 𝐺 = (𝑉, 𝐸), with edge
weights c ∶ 𝐸 → 0,1 . Find
0 O O O O O shortest path from (0, 0) to (𝑚, 𝑛).
A 1 O O O O O
Alignment is a path from (0, 0) to (𝑚, 𝑛)
T 2 O O O O O
Edit graph is a weighed, directed grid
G 3 O O O O O graph 𝐺 = (𝑉, 𝐸) with source vertex
T 4 O O O O O
(0, 0) and target vertex (𝑚, 𝑛). Each
edge (𝑖, 𝑗) has weight [𝑖, 𝑗] corresponding
𝐯" deletion - insertion 𝐯" mismatch 𝐯" match to edit cost: deletion (1), insertion (1),
- 𝐰! 𝐰! 𝐰! mismatch (1) and match (0).

36
Shortest Path vs Longest Path
• Change graph, edit graph and the MTP grid are directed graphs G.

• Change problem and Edit Distance problem are minimization problems.

• Find shortest path in G from source to sink.

• Manhattan Tourist problem is a maximization problem.

• Find longest path in G from source to sink.

37
Shortest Path vs Longest Path
• Shortest path in directed graphs can be found efficiently (Dijkstra, Bellman-
Ford, Floyd-Warshall algorithms)
• Longest path in direct graphs cannot be found efficiently (NP-hard).

• Change graph, edit graph and MTP grid graph are directed acylic graphs
(DAGs).
• No directed cycles.

• Longest path problem in a DAG can

solved efficiently by dynamic programming directed cycle

Question: What’s the relation between absence of directed cycles and optimal substructure?
38
Weighted Edit Distance
𝑑[𝑖, 𝑗] is the edit distance of 𝐯# and 𝐰$ ,
where 𝐯# is prefix of 𝐯 of length 𝑖 and 𝐰$ is prefix of 𝐰 of length 𝑗
8 deletion
… 𝐯$

>
> d[i 1, j] + 1, … -

>
<d[i, j 1] + 1, insertion
… -
… 𝐰#

d[i, j] = min mismatch … 𝐯$

>
> d[i 1, j 1] + 1, if vi 6= wj , … 𝐰#
>
: … 𝐯$
d[i 1, j 1], if vi = wj . … 𝐰#

Replace +1 with different penalties for different types of edits.

39
Weighted Edit Distance – Practice Problem
• Compute weighted edit distance between 𝐯 = AGT and 𝐰 = ATCT.

A T C G
w 0 1 2 3 4
V 8
> 0, if i = 0 and j = 0,
>
>
>
>
0 >
<d[i 1, j] + 1, if i > 0,
d[i, j] = min d[i, j 1] + 1, if j > 0,
A >
1 >
>
> d[i 1, j 1] + 2, if i > 0, j > 0 and vi 6= wj ,
>
>
:d[i 1, j 1], if i > 0, j > 0 and vi = wj .
G 2

T 3

40
Weighted Edit Distance – Practice Problem
• Compute weighted edit distance between 𝐯 = AGT and 𝐰 = ATCT.

A T C G
w 0 1 2 3 4
V 8
> 0, if i = 0 and j = 0,
>
>
>
>
0 0 1 2 3 4 >
<d[i 1, j] + 1, if i > 0,
d[i, j] = min d[i, j 1] + 1, if j > 0,
A >
1 1 0 1 2 3 >
>
> d[i 1, j 1] + 2, if i > 0, j > 0 and vi 6= wj ,
>
>
:d[i 1, j 1], if i > 0, j > 0 and vi = wj .
G 2 2 1 2 3 2

T 3 3 2 1 2 3

41
Edit Distance – Additional Insights
• An alignment corresponds to a series of elementary operations

Examples from http://profs.scienze.univr.it/~liptak/ACB/files/StringDistance_6up.pdf 42

Edit Distance – Additional Insights
• An alignment corresponds to a series of elementary operations

• But not every series of elementary operations corresponds to an alignment! Why?

Examples from http://profs.scienze.univr.it/~liptak/ACB/files/StringDistance_6up.pdf 43

Distance Function / Metric

A distance function (metric) on a set 𝑋 is a function 𝑑 ∶ 𝑋 × 𝑋 → ℝ

s.t. for all 𝑥, 𝑦, 𝑧 ∈ 𝑋:
i. 𝑑 𝑥, 𝑦 ≥ 0 [non-negativity]
ii. 𝑑 𝑥, 𝑦 = 0 if and only if 𝑥 = 𝑦 [identity of indiscernibles]
iii. 𝑑 𝑥, 𝑦 = 𝑑(𝑦, 𝑥) [symmetry]
iv. 𝑑 𝑥, 𝑦 ≤ 𝑑 𝑥, 𝑧 + 𝑑(𝑧, 𝑦) [triangle inequality]

Question: Is edit distance a distance function?

44
Edit Distance is a Distance Function
Edit distance 𝑑(𝐯, 𝐰) is the minimum number of elementary operations to
transform 𝐯 ∈ Σ ∗ into 𝐰 ∈ Σ ∗ .

Claim: edit distance is a distance function.

Proof: Let 𝐮, 𝐯, 𝐰 ∈ Σ ∗ .
i. 𝑑 𝐯, 𝐰 ≥ 0 [non-negativity]
Edit distance is defined by an alignment. This in turn uniquely determines
a series of elementary operations, each with cost either 0 (match) or 1
(otherwise). Thus, 𝑑 𝐯, 𝐰 ≥ 0.

45
Edit Distance is a Distance Function
Edit distance 𝑑(𝐯, 𝐰) is the minimum number of elementary operations to
transform 𝐯 ∈ Σ ∗ into 𝐰 ∈ Σ ∗ .

Claim: edit distance is a distance function.

Proof: Let 𝐮, 𝐯, 𝐰 ∈ Σ ∗ .
ii. 𝑑 𝐯, 𝐰 = 0 if and only if 𝐯 = 𝐰 [identity of indiscernibles]
(=>) By the premise, 𝑑 𝐯, 𝐰 = 0. By definition, the optimal alignment can only consist
of operations with cost 0. That is, the alignment consist of only matches. Thus, 𝐯 = 𝐰.
(<=) By the premise, 𝐯 = 𝐰. Thus, there exists an alignment where every pair of
columns is a match. This means that |𝐯| = |𝐰| and each letter 𝑣0 equals 𝑤0 (where 𝑖 ∈
[|𝐯|]). Moreover, only the match operations have cost 0, the other operations have
cost 1. Hence, this is the optimal alignment with cost 𝑑 𝐯, 𝐰 = 0.
46
Edit Distance is a Distance Function
Edit distance 𝑑(𝐯, 𝐰) is the minimum number of elementary operations to
transform 𝐯 ∈ Σ ∗ into 𝐰 ∈ Σ ∗ .

Claim: edit distance is a distance function.

Proof: Let 𝐮, 𝐯, 𝐰 ∈ Σ ∗ .
iii. 𝑑 𝐯, 𝐰 = 𝑑(𝐰, 𝐯) [symmetry]
Let 𝐀 = [𝑎0,1 ] be the optimal alignment corresponding to 𝑑 𝐯, 𝐰 , i.e. 𝐀 is an 2 × 𝑘
matrix where 𝑘 ∈ {max( 𝐯 , 𝐰 ), … , 𝐯 + 𝐰 }. Define the function 𝑓 𝐀 = 𝐁 such
that 𝐁 is obtained by interchanging the two rows of 𝐀. Since the cost of any insertion,
deletion and mismatch is 1, we have that alignment 𝐁 has cost 𝑑 𝐯, 𝐰 . The existence
of an alignment from 𝐰 to 𝐯 with cost less than 𝑑 𝐯, 𝐰 , yields a contradiction as it
implies that 𝐀 is not an optimal alignment from 𝐯 to 𝐰. Hence, 𝑑 𝐰, 𝐯 = 𝑑 𝐯, 𝐰 .
47
Edit Distance is a Distance Function
Edit distance 𝑑(𝐯, 𝐰) is the minimum number of elementary operations to
transform 𝐯 ∈ Σ ∗ into 𝐰 ∈ Σ ∗ .

Claim: edit distance is a distance function.

Proof: Let 𝐮, 𝐯, 𝐰 ∈ Σ ∗ .
iv. 𝑑 𝐯, 𝐰 ≤ 𝑑 𝐯, 𝐮 + 𝑑(𝐮, 𝐰) [triangle inequality]
Assume for a contradiction that 𝑑 𝐯, 𝐰 > 𝑑 𝐯, 𝐮 + 𝑑(𝐮, 𝐰). Let 𝑆 be the sequence
of elementary operations for transforming 𝐯 into 𝐮. Let 𝑆′ be the sequence of
elementary operations for transforming 𝐮 into 𝐰. Note that 𝑑 𝐯, 𝐮 = |𝑆| and
𝑑 𝐮, 𝐰 = |𝑆′|. Concatenate 𝑆 and 𝑆′ and remove redundant operations, yielding
sequence 𝑆′′. By definition, 𝑆 22 ≤ 𝑆 + 𝑆 2 . We can obtain an alignment of 𝐯 and
𝐰 from 𝑆′′ with cost 𝑆 22 ≤ 𝑑 𝐯, 𝐮 + 𝑑(𝐮, 𝐰). This yields a contradiction with
𝑑 𝐯, 𝐰 > 𝑑 𝐯, 𝐮 + 𝑑(𝐮, 𝐰) being the cost of the optimal alignment of 𝐯 and 𝐰.
48
Outline
1. Edit distance recap
2. Global alignment
3. Fitting alignment
4. Local alignment
5. Gapped alignment

Reading:
• Jones and Pevzner. Chapters 6.6, 6.8 and 6.9

49
Biological Sequence Alignment
• Weighted edit distance: find W A T C G
alignment with minimum V 0 1 2 3 4
distance
• Shortest path in weighted 0 O O O O O
edit graph
A 1 O O O O O
• Sequence alignment: find
alignment with maximum T 2 O O O O O
similarity G 3 O O O O O
• Longest path in weighted
edit graph T 4 O O O O O
• Score function:
𝛿∶ Σ ∪ − 3→ℝ $% deletion - insertion $% mismatch $% match
- "# "# "#

Question: What is an example of 𝛿? 𝛿(𝑣# , −) 𝛿(−, 𝑤$ ) 𝛿(𝑣# , 𝑤$ ) 50

Scoring Matrices
A C
Transitions: interchanges among purines
(two rings) or pyrimidines (one ring)
• A <--> G
• C <--> T

Transversions: interchanges between purines

(two rings) and pyrimidines (one ring)
• A <--> C, A <--> T
• G <--> C, G <--> T

Transitions more likely than transversions!

G T
51
Scoring Matrices
Transitions: interchanges among purines
(two rings) or pyrimidines (one ring)
• A <--> G
𝛿 A T C G -
• C <--> T A 1 -2 -2 -1 -1
T -2 1 -1 -2 -1
Transversions: interchanges between purines
(two rings) and pyrimidines (one ring) C -2 -1 1 -2 -1
• A <--> C, A <--> T G -1 -2 -2 1 -1
• G <--> C, G <--> T
- -1 -1 -1 -1 −∞
Transitions more likely than transversions!

52
Global Alignment – Needleman-Wunsch Algorithm
Global Alignment problem: Given strings 𝐯 ∈ Σ ! and 𝐰 ∈ Σ " and scoring
function 𝛿, find alignment with maximum score.

• An alignment is a source-to-sink path in the edit graph

• An alignment 𝐀 = [𝑎#,$ ] is a 2 × 𝑘 matrix s.t. (i) 𝑘 = {max 𝑚, 𝑛 , … , 𝑚 + 𝑛},
(ii) 𝑎#,$ ∈ Σ ∪ − and (iii) there is no 𝑗 ∈ [𝑘] where 𝑎%,$ = 𝑎(,$ = −
8
>
> 0, if i = 0 and j = 0,
>
<s[i 1, j] + (v , ),
i if i > 0, deletion
s[i, j] = max
>
> s[i, j 1] + ( , wj ), if j > 0, insertion
>
:
s[i 1, j 1] + (vi , wj ), if i > 0 and j > 0. match/
mismatch
53
Demonstration
• http://rna.informatik.uni-
freiburg.de/Teaching/index.jsp?toolName=Needleman-Wunsch

• 𝐯 = ATGTTAT and 𝐰 = ATCGTAC. 𝛿 A T C G -

A 1 -2 -2 -1 -1
T -2 1 -1 -2 -1
C -2 -1 1 -2 -1
G -1 -2 -2 1 -1
- -1 -1 -1 -1 −∞

54
NGS Characterized by Short Reads
… CATTCAGTAG …
… AGCCATTAG …
… GGTAGTTAG … … GGTAAACTAG …

… TATAATTAG … … CGTACCTAG …
Genome
Next-generation 10-100’s million short reads
Millions -billions
DNA sequencing Short read: 100 nucleotides
nucleotides
Allow for inexact matches due to: Human reference genome is
• Sequencing errors 3,300,000,000 nucleotides, while a
short read is 100 nucleotides.
• Polymorphisms/mutations in Global sequence alignment will not
reference genome work!
Question: How to account for discrepancy
between lengths of reference and short read?
55
Fitting Alignment
For short read alignment, we want to align complete short read 𝐯 ∈
Σ ! to substring of reference genome 𝐰 ∈ Σ " . Note that 𝑚 ≪ 𝑛.

𝐯 ∈ Σ4
𝐰 ∈ Σ5

Fitting Alignment problem: Given strings 𝐯 ∈ Σ ! and 𝐰 ∈ Σ " and

scoring function 𝛿, find a alignment of 𝐯 and a substring of 𝐰 with
maximum global alignment score 𝑠 ∗ among all global alignments of
𝐯 and all substrings of 𝐰
56
Fitting Alignment – Naive Approach
Fitting Alignment problem: Given strings 𝐯 ∈ Σ ! and 𝐰 ∈ Σ " and scoring
function 𝛿, find an alignment of 𝐯 and a substring of 𝐰 with maximum global
alignment score 𝑠 ∗ among all global alignments of 𝐯 and all substrings of 𝐰

𝐯 ∈ Σ4
𝐰 ∈ Σ5

• Consider all contiguous non-empty substrings of 𝐰, defined by 1 ≤ 𝑖 ≤ 𝑗 ≤ 𝑛

• How many?

57
Fitting Alignment – Naive Approach
Fitting Alignment problem: Given strings 𝐯 ∈ Σ ! and 𝐰 ∈ Σ " and scoring
function 𝛿, find an alignment of 𝐯 and a substring of 𝐰 with maximum global
alignment score 𝑠 ∗ among all global alignments of 𝐯 and all substrings of 𝐰

𝐯 ∈ Σ4
𝐰 ∈ Σ5

• Consider all contiguous non-empty substrings of 𝐰, defined by 1 ≤ 𝑖 ≤ 𝑗 ≤ 𝑛

"
• How many? Answer: 𝑛 + (
• What are their total lengths?
• What is the running time?

58
Fitting Alignment – Dynamic Programming
Fitting Alignment problem: Given strings 𝐯 ∈ Σ ! and 𝐰 ∈ Σ " and scoring
function 𝛿, find an alignment of 𝐯 and a substring of 𝐰 with maximum global
alignment score 𝑠 ∗ among all global alignments of 𝐯 and all substrings of 𝐰
8
𝐯\𝐰 0 T A C G G C >
> 0, if i = 0,
>
<s[i 1, j] + (v , ), if i > 0,
0 s[i, j] = max
i
>
> s[i, j 1] + ( , wj ), if i > 0 and j > 0,
>
:
A s[i 1, j 1] + (vi , wj ), if i > 0 and j > 0.
s⇤ = max{s[m, 0], . . . , s[m, n]}
G

59
Fitting Alignment – Dynamic Programming
Fitting Alignment problem: Given strings 𝐯 ∈ Σ ! and 𝐰 ∈ Σ " and scoring
function 𝛿, find an alignment of 𝐯 and a substring of 𝐰 with maximum global
alignment score 𝑠 ∗ among all global alignments of 𝐯 and all substrings of 𝐰
8
0, Start anywhere on first row if
𝐯\𝐰 0 T A C G G C >
> i = 0,
>
<s[i 1, j] + (v , ), if i > 0,
0 s[i, j] = max
i
>
> s[i, j 1] + ( , wj ), if i > 0 and j > 0,
>
:
A s[i 1, j 1] + (vi , wj ), if i > 0 and j > 0.
s⇤ = max{s[m, 0], . . . , s[m, n]} End anywhere on last row
G

60
Fitting Alignment – Dynamic Programming
Fitting Alignment problem: Given strings 𝐯 ∈ Σ ! and 𝐰 ∈ Σ " and scoring
function 𝛿, find an alignment of 𝐯 and a substring of 𝐰 with maximum global
alignment score 𝑠 ∗ among all global alignments of 𝐯 and all substrings of 𝐰
8
0, Start anywhere on first row if
𝐯\𝐰 0 T A C G G C >
> i = 0,
>
<s[i 1, j] + (v , ), if i > 0,
0 s[i, j] = max
i
>
> s[i, j 1] + ( , wj ), if i > 0 and j > 0,
>
:
A s[i 1, j 1] + (vi , wj ), if i > 0 and j > 0.
s⇤ = max{s[m, 0], . . . , s[m, n]} End anywhere on last row
G
Question: Let match score be 1,
G mismatch/indel score be -1. What is 𝑠 ∗ ?
𝐯 A - G G Question: Same scores. What is optimal
𝐰 A C G G global alignment and score?
61
Fitting Alignment – Dynamic Programming
• Online:
https://valiec.github.io/AlignmentVisualizer/index.html
8
𝐯\𝐰 0 T A C G G C >
> 0, if i = 0,
>
<s[i 1, j] + (v , ), if i > 0,
0 s[i, j] = max
i
>
> s[i, j 1] + ( , wj ), if i > 0 and j > 0,
>
:
A s[i 1, j 1] + (vi , wj ), if i > 0 and j > 0.
s⇤ = max{s[m, 0], . . . , s[m, n]}
G
Question: Let match score be 1,
G mismatch/indel score be -1. What is 𝑠 ∗ ?
𝐯 A - G G Question: Same scores. What is optimal
𝐰 A C G G global alignment and score?
62

3.1 Sequence Alignment
No ratings yet
3.1 Sequence Alignment
8 pages
Chapter Four and Five
No ratings yet
Chapter Four and Five
56 pages
Unit Iv R23
No ratings yet
Unit Iv R23
31 pages
Daa 2
No ratings yet
Daa 2
21 pages
DAA Lab Manual Final
No ratings yet
DAA Lab Manual Final
34 pages
Assembly Line Scheduling
No ratings yet
Assembly Line Scheduling
4 pages
Dynamic Programming - General Method
No ratings yet
Dynamic Programming - General Method
13 pages
Ads Unit-5
No ratings yet
Ads Unit-5
45 pages
Cs221 Section3 Solutions
No ratings yet
Cs221 Section3 Solutions
8 pages
Lecture 5
No ratings yet
Lecture 5
56 pages
14 - Chapter 23
No ratings yet
14 - Chapter 23
27 pages
Week 7 & 11 - Dynamic Programming Strategy
No ratings yet
Week 7 & 11 - Dynamic Programming Strategy
58 pages
Design & Analysis of Algorithms (DAA) Unit - III
No ratings yet
Design & Analysis of Algorithms (DAA) Unit - III
17 pages
Unit 3 Vips
No ratings yet
Unit 3 Vips
74 pages
ADSA Mod 3 Selfmade
No ratings yet
ADSA Mod 3 Selfmade
12 pages
Daa - Key Set A CT2
No ratings yet
Daa - Key Set A CT2
10 pages
BCS-042 em 2024-2025
No ratings yet
BCS-042 em 2024-2025
17 pages
CH4. Greedy Approach: - Grab Data Items in Sequence, Each Time
No ratings yet
CH4. Greedy Approach: - Grab Data Items in Sequence, Each Time
17 pages
Multistage Backward
No ratings yet
Multistage Backward
13 pages
AP - Dynamic Programming - Algorithms and Complexity Ana
No ratings yet
AP - Dynamic Programming - Algorithms and Complexity Ana
11 pages
DAA New Unit 4
No ratings yet
DAA New Unit 4
47 pages
DAA-Module 4
No ratings yet
DAA-Module 4
75 pages
Safoori
No ratings yet
Safoori
9 pages
ADA Lab Manual
No ratings yet
ADA Lab Manual
32 pages
Assignment 4
No ratings yet
Assignment 4
18 pages
CCCS314 - DAA - 22 - 23 - 3rd 06 Dynamic Programming
No ratings yet
CCCS314 - DAA - 22 - 23 - 3rd 06 Dynamic Programming
38 pages
Wa0036.
No ratings yet
Wa0036.
92 pages
DAA
No ratings yet
DAA
62 pages
MT 2009 Answers
No ratings yet
MT 2009 Answers
8 pages
Student Algorithm Competition Guide
No ratings yet
Student Algorithm Competition Guide
26 pages
Lecture 1 DP
No ratings yet
Lecture 1 DP
55 pages
Data Structures and Algorithms Unit - V: Dynamic Programming
No ratings yet
Data Structures and Algorithms Unit - V: Dynamic Programming
19 pages
Final
No ratings yet
Final
6 pages
Exp-3 1
No ratings yet
Exp-3 1
3 pages
Travelling Salesman Problem (Dynamic Approach)
No ratings yet
Travelling Salesman Problem (Dynamic Approach)
5 pages
Dynamic Programming
No ratings yet
Dynamic Programming
35 pages
Data Structures and Algorithms Unit - V: Dynamic Programming
No ratings yet
Data Structures and Algorithms Unit - V: Dynamic Programming
19 pages
Week 8 Leacture
No ratings yet
Week 8 Leacture
89 pages
Pritam Mondal Algorithm PCA2
No ratings yet
Pritam Mondal Algorithm PCA2
21 pages
Algorithm Lab Programs
No ratings yet
Algorithm Lab Programs
24 pages
Dynamic Programming
No ratings yet
Dynamic Programming
12 pages
Chapter Four - Dynamic Programming
No ratings yet
Chapter Four - Dynamic Programming
85 pages
הרצאה 9
No ratings yet
הרצאה 9
64 pages
lecture7-20+ (3) - 複本
No ratings yet
lecture7-20+ (3) - 複本
31 pages
Test2 Sample
No ratings yet
Test2 Sample
9 pages
Introduction To Dynamic Programming: 20bits About Writing
No ratings yet
Introduction To Dynamic Programming: 20bits About Writing
6 pages
Lecture 13 - All Pairs Shortest Path
No ratings yet
Lecture 13 - All Pairs Shortest Path
42 pages
III Test Dsa
No ratings yet
III Test Dsa
50 pages
Dynamic Programming
No ratings yet
Dynamic Programming
36 pages
C Programming: Floyd's Algorithm
No ratings yet
C Programming: Floyd's Algorithm
6 pages
6 Dynamic Programming
No ratings yet
6 Dynamic Programming
81 pages
3-Module 2 Scribblings-24-01-2024
No ratings yet
3-Module 2 Scribblings-24-01-2024
14 pages
Full Cse 408 Unit 3
No ratings yet
Full Cse 408 Unit 3
168 pages
Manual
No ratings yet
Manual
23 pages
Daa All Algorithm
No ratings yet
Daa All Algorithm
17 pages
B60 AOAexp06
No ratings yet
B60 AOAexp06
11 pages
AOA Notes
No ratings yet
AOA Notes
25 pages
HW3
No ratings yet
HW3
3 pages
Geng Shan
No ratings yet
Geng Shan
206 pages
A Hierarchical 3D Gaussian Representation For Real-Time Rendering of Very Large Datasets
No ratings yet
A Hierarchical 3D Gaussian Representation For Real-Time Rendering of Very Large Datasets
15 pages
MonST3R: Dynamic Scene Geometry Estimation
No ratings yet
MonST3R: Dynamic Scene Geometry Estimation
21 pages
SLAM3R: Real-Time Dense Scene Reconstruction From Monocular RGB Videos
No ratings yet
SLAM3R: Real-Time Dense Scene Reconstruction From Monocular RGB Videos
15 pages
03 Matrix
No ratings yet
03 Matrix
112 pages
04 GNNBasic
No ratings yet
04 GNNBasic
107 pages
Chen Et Al. 2018
No ratings yet
Chen Et Al. 2018
10 pages
Xu Et Al. 2022
No ratings yet
Xu Et Al. 2022
12 pages
CIS1000 Digital Disruption Course
No ratings yet
CIS1000 Digital Disruption Course
16 pages
Rain Sensor Design & Implementation
No ratings yet
Rain Sensor Design & Implementation
61 pages
Multiplexer & Demultiplexer by Dr. Arvind Nautiyal
No ratings yet
Multiplexer & Demultiplexer by Dr. Arvind Nautiyal
23 pages
Flytrap-FirmwareUpgradeProcedures Linksys WRT300N v2 fw2 00 08
No ratings yet
Flytrap-FirmwareUpgradeProcedures Linksys WRT300N v2 fw2 00 08
8 pages
ADVC-50 User Manual
No ratings yet
ADVC-50 User Manual
10 pages
Class XI Practical Record Book
No ratings yet
Class XI Practical Record Book
3 pages
Itu - H.323
No ratings yet
Itu - H.323
108 pages
IRemoval Pro Premium Guide Latest V3.1.1
0% (1)
IRemoval Pro Premium Guide Latest V3.1.1
4 pages
YAsh ETI
No ratings yet
YAsh ETI
16 pages
Free MS Office 2013 Keys
100% (1)
Free MS Office 2013 Keys
3 pages
Data Structures and Algorithms in Java - Extensive Question Bank by Dan Myers
No ratings yet
Data Structures and Algorithms in Java - Extensive Question Bank by Dan Myers
6 pages
Salesforce DML and Database Class Guide
No ratings yet
Salesforce DML and Database Class Guide
13 pages
CIM and CAD/CAM Study Guide
No ratings yet
CIM and CAD/CAM Study Guide
6 pages
Dodge Technical Catalogue
No ratings yet
Dodge Technical Catalogue
437 pages
Phoresis Extended Eng Rel. 5
No ratings yet
Phoresis Extended Eng Rel. 5
24 pages
Python Sets
No ratings yet
Python Sets
30 pages
AWS Certification Pathways Guide 1701669108
No ratings yet
AWS Certification Pathways Guide 1701669108
9 pages
Deep Learning-Based Quality Assessment of 3d Point Clouds Without Reference
No ratings yet
Deep Learning-Based Quality Assessment of 3d Point Clouds Without Reference
6 pages
Module Cuac 413
No ratings yet
Module Cuac 413
93 pages
Intelligent Transportation Systems Using External Infrastructure: A Literature Survey
No ratings yet
Intelligent Transportation Systems Using External Infrastructure: A Literature Survey
18 pages
Lecture Notes w4 - Scrum and Kanban
No ratings yet
Lecture Notes w4 - Scrum and Kanban
65 pages
Picture Yourself Learning Microsoft Word 2010 Step by Step 1st Edition Diane (Diane Koers) Koers
No ratings yet
Picture Yourself Learning Microsoft Word 2010 Step by Step 1st Edition Diane (Diane Koers) Koers
81 pages
DOC1087 GB - P142-L-NUT-033-B - UserManual - dBLexd
No ratings yet
DOC1087 GB - P142-L-NUT-033-B - UserManual - dBLexd
71 pages
Network Concept - Subjective Questions-Answers
No ratings yet
Network Concept - Subjective Questions-Answers
3 pages
Yashwantrao Chavan Maharashtra Open University
No ratings yet
Yashwantrao Chavan Maharashtra Open University
1 page
M32 Maintenance Manual A2794-30
No ratings yet
M32 Maintenance Manual A2794-30
122 pages
Practice Problems For Curve Fitting - Solution
No ratings yet
Practice Problems For Curve Fitting - Solution
8 pages
Modbus Protocol for Engineers
No ratings yet
Modbus Protocol for Engineers
4 pages
Command Language in HCI
No ratings yet
Command Language in HCI
21 pages
Vijay Resume
No ratings yet
Vijay Resume
2 pages

Lecture 3

Uploaded by

Lecture 3

Uploaded by

CS 466

Piazza / Gradescope: Midterm:

Question: Is this a good alignment?

v: ATGTTAT... deletion insertion mismatch match

𝑑 𝐜𝐚𝐭, 𝐜𝐚𝐫 = 𝑑 𝐜𝐚𝐭, 𝐚𝐭𝐞 = 𝑑 𝐜𝐚𝐭, 𝐚𝐫𝐞 = 7

𝑑 𝐜𝐚𝐭, 𝐜𝐚𝐫 = 1 𝑑 𝐜𝐚𝐭, 𝐚𝐭𝐞 = 2 𝑑 𝐜𝐚𝐭, 𝐚𝐫𝐞 = 3 8

G from “begin” to “end”. 3

How many paths? 1

best score to this

• 𝑤[ 𝑖 − 1, 𝑗 , (𝑖, 𝑗)] weight of street between 𝑖 − 1, 𝑗 and 𝑖, 𝑗

• 𝑤[ 𝑖 − 1, 𝑗 , (𝑖, 𝑗)] weight of street between 𝑖 − 1, 𝑗 and 𝑖, 𝑗

• 𝑤[ 𝑖 − 1, 𝑗 , (𝑖, 𝑗)] weight of street between

2 3 4 Let 𝑚 be the number of rows and 𝑛 be

What about diagonals?

Change Problem: Make M cents using

𝐯" deletion - insertion 𝐯" mismatch 𝐯" match

• Change problem and Edit Distance problem are minimization problems.

• Manhattan Tourist problem is a maximization problem.

• Longest path problem in a DAG can

d[i, j] = min mismatch … 𝐯$

Replace +1 with different penalties for different types of edits.

Examples from http://profs.scienze.univr.it/~liptak/ACB/files/StringDistance_6up.pdf 42

• But not every series of elementary operations corresponds to an alignment! Why?

Examples from http://profs.scienze.univr.it/~liptak/ACB/files/StringDistance_6up.pdf 43

A distance function (metric) on a set 𝑋 is a function 𝑑 ∶ 𝑋 × 𝑋 → ℝ

Question: Is edit distance a distance function?

Claim: edit distance is a distance function.

Claim: edit distance is a distance function.

Claim: edit distance is a distance function.

Claim: edit distance is a distance function.

Question: What is an example of 𝛿? 𝛿(𝑣# , −) 𝛿(−, 𝑤$ ) 𝛿(𝑣# , 𝑤$ ) 50

Transversions: interchanges between purines

Transitions more likely than transversions!

• An alignment is a source-to-sink path in the edit graph

• 𝐯 = ATGTTAT and 𝐰 = ATCGTAC. 𝛿 A T C G -

Fitting Alignment problem: Given strings 𝐯 ∈ Σ ! and 𝐰 ∈ Σ " and

• Consider all contiguous non-empty substrings of 𝐰, defined by 1 ≤ 𝑖 ≤ 𝑗 ≤ 𝑛

• Consider all contiguous non-empty substrings of 𝐰, defined by 1 ≤ 𝑖 ≤ 𝑗 ≤ 𝑛

You might also like