Global Alignment: Ben Langmead

This document discusses global alignment and generalizing edit distance. It introduces the idea of assigning different costs or penalties to different sequence differences, like transitions versus transversions or gaps versus substitutions. It presents an implementation of global alignment using dynamic programming that allows a custom penalty function. The algorithm runs in O(mn) time and space and traceback returns the optimal alignment in O(m+n) time. Scoring functions aim to reflect expected mutational events and biological interchangeability, though simple functions are often used for computational efficiency.

Global Alignment

Ben Langmead

Department of Computer Science

Please sign guestbook (www.langmead-lab.org/teaching-materials) to tell me briefly how you are using the slides. For original Keynote files, email me (ben.langmead@gmail.com).

Generalizing edit distance

What if it doesn’t make sense for every edit to cost 1?

If you compare two human genomes, you see some kinds of sequence differences more often than others

Generalizing edit distance

Transitions are A ↔ G and C ↔ T changes

Transversions are A ↔ C, A ↔ T, C ↔ G, G ↔ T

For random mutations, transitions should be half as frequent as transversions (each base has one possible transition and two possible transversions)...

...but if you compare two humans, the transition to transversion ratio (ti/tv) is ~2.1
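
The "half as frequent" expectation can be checked by enumeration. The sketch below assumes every substitution a → b (a ≠ b) is equally likely; the helper name `is_transition` is made up for illustration and is not from the slides.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def is_transition(a, b):
    """True if a -> b stays within the purines or within the pyrimidines."""
    return {a, b} <= PURINES or {a, b} <= PYRIMIDINES

# All 12 ordered substitutions a -> b with a != b
substitutions = list(permutations("ACGT", 2))
n_ti = sum(is_transition(a, b) for a, b in substitutions)
n_tv = len(substitutions) - n_ti
ratio = n_ti / n_tv

print(n_ti, n_tv, ratio)  # 4 8 0.5
```

Under this uniform model ti/tv is 0.5, which is what makes the observed ~2.1 notable.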
Generalizing edit distance

GGGTAGCGGGTTTAAC
||||| ||||||||||
GGGTAACGGGTTTAAC
Human substitution rate ≈ 1 in 1,000

GGGTAGCGGGTTTAAC
|||||  |||||||||
GGGTA--GGGTTTAAC
Small-gap rate is ≈ 1 in 3,000

Wanted: keep basic edit distance idea and algorithm, but give different weights to different events according to likelihood

Penalty function

    A  C  G  T  -
A   0  4  2  4  8
C   4  0  4  2  8
G   2  4  0  4  8
T   4  2  4  0  8
-   8  8  8  8

2  Transitions (A ↔ G, C ↔ T)
4  Transversions
8  Gaps

Global alignment

Let $D[0, j] = \sum_{k=0}^{j-1} s(\texttt{-}, y[k])$, and let $D[i, 0] = \sum_{k=0}^{i-1} s(x[k], \texttt{-})$

Otherwise, let

$$
D[i, j] = \min
\begin{cases}
D[i-1, j] + s(x[i-1], \texttt{-}) \\
D[i, j-1] + s(\texttt{-}, y[j-1]) \\
D[i-1, j-1] + s(x[i-1], y[j-1])
\end{cases}
$$

s(a, b) assigns a cost to a particular gap or substitution

s(a, b):

    A  C  G  T  -
A   0  4  2  4  8
C   4  0  4  2  8
G   2  4  0  4  8
T   4  2  4  0  8
-   8  8  8  8

2  Transitions (A ↔ G, C ↔ T)
4  Transversions (everything else)
8  Gaps

Global alignment: implementation

    from numpy import zeros

    def globalAlignment(x, y, s):
        """ Calculate global alignment value of sequences x and y using
            dynamic programming.  Return the filled matrix and the
            global alignment value. """
        D = zeros((len(x)+1, len(y)+1), dtype=int)
        # First row: y[:j] aligned entirely against gaps
        for j in range(1, len(y)+1):
            D[0, j] = D[0, j-1] + s('-', y[j-1])
        # First column: x[:i] aligned entirely against gaps
        for i in range(1, len(x)+1):
            D[i, 0] = D[i-1, 0] + s(x[i-1], '-')
        for i in range(1, len(x)+1):
            for j in range(1, len(y)+1):
                D[i, j] = min(D[i-1, j-1] + s(x[i-1], y[j-1]), # diagonal
                              D[i-1, j  ] + s(x[i-1], '-'),    # vertical
                              D[i  , j-1] + s('-',    y[j-1])) # horizontal
        return D, D[len(x), len(y)]

Note the use of the new penalty function s; otherwise this is similar to edit distance

http://bit.ly/CG_DP_Global

Global alignment: implementation

    def exampleCost(xc, yc):
        """ Cost function assigning 0 to match, 2 to transition, 4 to
            transversion, and 8 to a gap """
        if xc == yc: return 0 # match
        if xc == '-' or yc == '-': return 8 # gap
        minc, maxc = min(xc, yc), max(xc, yc)
        if minc == 'A' and maxc == 'G': return 2 # transition
        elif minc == 'C' and maxc == 'T': return 2 # transition
        return 4 # transversion

    A  C  G  T  -
A   0  4  2  4  8
C   4  0  4  2  8
G   2  4  0  4  8
T   4  2  4  0  8
-   8  8  8  8

http://bit.ly/CG_DP_Global
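
To see that the cost function and the table describe the same penalties, one can tabulate the function over the alphabet. This is a small sketch: `example_cost` restates the slide's `exampleCost` so the block runs on its own, and the dictionary `table` is an illustrative name, not part of the original code.

```python
def example_cost(xc, yc):
    """Restates exampleCost: 0 match, 2 transition, 4 transversion, 8 gap."""
    if xc == yc:
        return 0
    if xc == '-' or yc == '-':
        return 8
    minc, maxc = min(xc, yc), max(xc, yc)
    if (minc, maxc) in (('A', 'G'), ('C', 'T')):
        return 2
    return 4

alphabet = 'ACGT-'
# Tabulate every pair except ('-', '-'), which the slide's table leaves blank.
table = {(a, b): example_cost(a, b)
         for a in alphabet for b in alphabet
         if not (a == '-' and b == '-')}

# The penalty is symmetric: s(a, b) == s(b, a).
assert all(table[a, b] == table[b, a] for (a, b) in table)

print('  ' + ' '.join(alphabet))
for a in alphabet:
    row = [str(table[a, b]) if (a, b) in table else ' ' for b in alphabet]
    print(a, ' '.join(row))
```

A dictionary lookup like this is one possible alternative to calling the function in the inner loop; whether it is faster in practice is not something the slides address.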
Global alignment: dynamic programming

globalAlignment initialization:

    D = zeros((len(x)+1, len(y)+1), dtype=int)
    for j in range(1, len(y)+1):
        D[0, j] = D[0, j-1] + s('-', y[j-1])
    for i in range(1, len(x)+1):
        D[i, 0] = D[i-1, 0] + s(x[i-1], '-')

    ϵ   T   A   T   G   T   C   A   T   G   C
ϵ   0   8  16  24  32  40  48  56  64  72  80
T   8
A  16
C  24
G  32
T  40
C  48
A  56
G  64
C  72

s(a, b): the penalty table from the earlier slides (0 match, 2 transition, 4 transversion, 8 gap)

Global alignment: dynamic programming

globalAlignment loop:

    for i in range(1, len(x)+1):
        for j in range(1, len(y)+1):
            D[i, j] = min(D[i-1, j-1] + s(x[i-1], y[j-1]), # diagonal
                          D[i-1, j  ] + s(x[i-1], '-'),    # vertical
                          D[i  , j-1] + s('-',    y[j-1])) # horizontal

    ϵ   T   A   T   G   T   C   A   T   G   C
ϵ   0   8  16  24  32  40  48  56  64  72  80
T   8   0   8  16  24  32  40  48  56  64  72
A  16   8   0   8  16  24  32  40  48  56  64
C  24  16   8   ?
G  32
T  40
C  48
A  56
G  64
C  72

s(a, b): the penalty table from the earlier slides (0 match, 2 transition, 4 transversion, 8 gap)

Global alignment: dynamic programming

globalAlignment loop:

    for i in range(1, len(x)+1):
        for j in range(1, len(y)+1):
            D[i, j] = min(D[i-1, j-1] + s(x[i-1], y[j-1]), # diagonal
                          D[i-1, j  ] + s(x[i-1], '-'),    # vertical
                          D[i  , j-1] + s('-',    y[j-1])) # horizontal

    ϵ   T   A   T   G   T   C   A   T   G   C
ϵ   0   8  16  24  32  40  48  56  64  72  80
T   8   0   8  16  24  32  40  48  56  64  72
A  16   8   0   8  16  24  32  40  48  56  64
C  24  16   8   2  10  18  24  32  40  48  56
G  32  24  16  10   2  10  18  26  34  40  48
T  40  32  24  16  10   2  10  18  26  34  42
C  48  40  32  24  18  10   2  10  18  26  34
A  56  48  40  32  26  18  10   2  10  18  26
G  64  56  48  40  32  26  18  10   6  10  18
C  72  64  56  48  40  34  26  18  12  10  10

Optimal global alignment value: the bottom-right cell, 10
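
The filled matrix can be reproduced with a short script. This is a sketch rather than the slides' code: it uses nested Python lists instead of numpy so it runs without dependencies, and the names `example_cost` and `global_alignment` are snake_case restatements of the slide functions.

```python
def example_cost(xc, yc):
    """0 for a match, 2 for a transition, 4 for a transversion, 8 for a gap."""
    if xc == yc:
        return 0
    if xc == '-' or yc == '-':
        return 8
    minc, maxc = min(xc, yc), max(xc, yc)
    if (minc, maxc) in (('A', 'G'), ('C', 'T')):
        return 2
    return 4

def global_alignment(x, y, s):
    """Fill the (len(x)+1) x (len(y)+1) matrix; return (matrix, value)."""
    D = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for j in range(1, len(y) + 1):          # first row: all gaps in x
        D[0][j] = D[0][j-1] + s('-', y[j-1])
    for i in range(1, len(x) + 1):          # first column: all gaps in y
        D[i][0] = D[i-1][0] + s(x[i-1], '-')
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            D[i][j] = min(D[i-1][j-1] + s(x[i-1], y[j-1]),  # diagonal
                          D[i-1][j] + s(x[i-1], '-'),       # vertical
                          D[i][j-1] + s('-', y[j-1]))       # horizontal
    return D, D[len(x)][len(y)]

# x labels the rows and y the columns of the matrix on the slide
x, y = 'TACGTCAGC', 'TATGTCATGC'
D, value = global_alignment(x, y, example_cost)
print(value)  # 10
print(D[3])   # [24, 16, 8, 2, 10, 18, 24, 32, 40, 48, 56]
```

Row `D[3]` corresponds to the first C of x, the row whose fourth cell is marked "?" on the previous slide.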

Global alignment: getting the alignment

Traceback works just as it did for edit distance

    ϵ   T   A   T   G   T   C   A   T   G   C
ϵ   0   8  16  24  32  40  48  56  64  72  80
T   8   0   8  16  24  32  40  48  56  64  72
A  16   8   0   8  16  24  32  40  48  56  64
C  24  16   8   2  10  18  24  32  40  48  56
G  32  24  16  10   2  10  18  26  34  40  48
T  40  32  24  16  10   2  10  18  26  34  42
C  48  40  32  24  18  10   2  10  18  26  34
A  56  48  40  32  26  18  10   2  10  18  26
G  64  56  48  40  32  26  18  10   6  10  18
C  72  64  56  48  40  34  26  18  12  10  10

TACGTCA-GC
|| |||| ||
TATGTCATGC

Column 3 (C vs. T): +2 (transition)
Column 8 (gap vs. T): +8 (gap)
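
The slides do not show traceback code, so the following is one possible sketch. It assumes ties are broken in the order diagonal, vertical, horizontal (other orders can give different, equally optimal alignments), and the function names are illustrative.

```python
def example_cost(xc, yc):
    """0 for a match, 2 for a transition, 4 for a transversion, 8 for a gap."""
    if xc == yc:
        return 0
    if xc == '-' or yc == '-':
        return 8
    minc, maxc = min(xc, yc), max(xc, yc)
    return 2 if (minc, maxc) in (('A', 'G'), ('C', 'T')) else 4

def fill(x, y, s):
    """Fill the global alignment matrix as nested lists."""
    D = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for j in range(1, len(y) + 1):
        D[0][j] = D[0][j-1] + s('-', y[j-1])
    for i in range(1, len(x) + 1):
        D[i][0] = D[i-1][0] + s(x[i-1], '-')
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            D[i][j] = min(D[i-1][j-1] + s(x[i-1], y[j-1]),
                          D[i-1][j] + s(x[i-1], '-'),
                          D[i][j-1] + s('-', y[j-1]))
    return D

def traceback(D, x, y, s):
    """Walk from the bottom-right cell to the top-left, re-deriving each
    step from the recurrence. Takes at most len(x) + len(y) steps."""
    i, j = len(x), len(y)
    ax, ay = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i-1][j-1] + s(x[i-1], y[j-1]):
            ax.append(x[i-1]); ay.append(y[j-1]); i -= 1; j -= 1   # diagonal
        elif i > 0 and D[i][j] == D[i-1][j] + s(x[i-1], '-'):
            ax.append(x[i-1]); ay.append('-'); i -= 1              # vertical
        else:
            ax.append('-'); ay.append(y[j-1]); j -= 1              # horizontal
    return ''.join(reversed(ax)), ''.join(reversed(ay))

x, y = 'TACGTCAGC', 'TATGTCATGC'
D = fill(x, y, example_cost)
aligned_x, aligned_y = traceback(D, x, y, example_cost)
print(aligned_x)  # TACGTCA-GC
print(aligned_y)  # TATGTCATGC
```

Each loop iteration decreases i or j by at least one, which is where the O(m + n) traceback bound comes from.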

Global alignment: summary

Matrix-filling dynamic programming algorithm is O(mn) time and space, where m = len(x) and n = len(y)

Filling the matrix is O(mn) space and time, yields global alignment value

Traceback is O(m + n) time, yields optimal alignment

Global alignment: scoring functions

Where do these penalty functions come from?

    A  C  G  T  -
A   0  4  2  4  8
C   4  0  4  2  8
G   2  4  0  4  8
T   4  2  4  0  8
-   8  8  8  8

They can be based on:

Expected frequency of the different mutational events

How interchangeable the alternatives are from a biological perspective

Whether the substitution changes the shape or function of the molecule

The prevalence of simple (linear, constant, affine) gap penalties is mostly because that's what we can do efficiently, as discussed in HW4

One occasionally sees more general (e.g. convex) gap penalties
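
As a rough illustration of the three simple gap penalty shapes, each can be written as a function of gap length. The parameter values below are made up, and the affine convention used (opening cost plus extension cost per position) is one of several in use; some texts charge the extension only from the second position on.

```python
def constant_gap(length, penalty=8):
    """Same cost for any gap of length >= 1 (hypothetical value)."""
    return penalty if length > 0 else 0

def linear_gap(length, per_position=8):
    """Cost proportional to gap length; 8 per position matches the slide's table."""
    return per_position * length

def affine_gap(length, open_cost=6, extend_cost=2):
    """Opening cost plus a per-position extension cost (hypothetical values)."""
    return open_cost + extend_cost * length if length > 0 else 0

for k in (1, 2, 5):
    print(k, constant_gap(k), linear_gap(k), affine_gap(k))
```

The penalty table above charges 8 for every gap position independently, so it behaves like the linear case: a gap of length k costs 8k.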

BLOSUM62

Some amino acid substitutions have a smaller impact on structure & function than others. BLOSUM62 elements are, roughly speaking, log-odds of observing these substitutions between two highly related proteins

[Figure: BLOSUM62 matrix; rows and columns are amino acids. Negative entries: rare substitutions; larger effect on structure/function. Positive entries: common substitutions; modest effect on structure/function. Matrix is symmetric.]
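
"Log-odds" can be made concrete with the usual general form of such a score. The symbols below are not from the slides: $p_{ab}$ stands for the observed frequency of the aligned pair (a, b), $q_a$ and $q_b$ for the background frequencies of each amino acid, and $\lambda$ for a scaling constant; the rounding to integers is also an assumption about how the published matrix is presented.

```latex
s(a, b) = \operatorname{round}\!\left( \frac{1}{\lambda} \log \frac{p_{ab}}{q_a \, q_b} \right)
```

A pair seen more often than chance predicts ($p_{ab} > q_a q_b$) gets a positive score, and a pair seen less often gets a negative one. Since $p_{ab} = p_{ba}$ when the order within a pair is ignored, the matrix comes out symmetric.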
