Lecture09 Review
• RNA and Protein Secondary Structure
• Bio-networks
RNA Secondary Structure
• RNA is typically single-stranded
Bio-networks
Overview
• The phylogenetic tree is constructed to answer the post-sequence-analysis questions:
– How are these sequences (species) related?
– How are the organisms from which these sequences come related?
Parsimony Illustration
Three possible trees for the four sequences, with the number of character changes
[Figure: hierarchical clustering of elements a, b, c, d, e — agglomerative (bottom-up: a,b merge, then d,e, then c,d,e, then a,b,c,d,e) versus divisive (top-down), with the step axes running 0–4 and 4–0]
Major Lines
1. Most probable state path
[Figure: HMM chain with hidden states S1, S2, …, SL-1, SL emitting observations x1, x2, …, xL]
• Emission probabilities
– Match: (a,b) with probability $p_{ab}$ – only from the M state
– Insertion in x: (a,−) with probability $q_a$ – only from the X state
– Insertion in y: (−,a) with probability $q_a$ – only from the Y state
Adding termination probabilities
We may want a model which defines a probability distribution over all possible sequences. For this, an END state is added, with transition probability τ from any other state to END. This assumes an expected sequence length of 1/τ.

Transition probabilities with termination:

          M          X      Y      END
  M       1−2δ−τ     δ      δ      τ
  X       1−ε−τ      ε             τ
  Y       1−ε−τ             ε      τ
  END                              1
HMM for Sequence Alignment:
detailed algorithm
• Most probable path is the best alignment

$$v^Y(i,j) = q_{y_j} \max\big(\ \delta\, v^M(i, j-1),\ \ \varepsilon\, v^Y(i, j-1)\ \big)$$

Termination:

$$v^E = \tau \max\big( v^M(n,m),\ v^X(n,m),\ v^Y(n,m) \big)$$
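To make the recursion concrete, here is a minimal sketch (not from the lecture) of a full pair-HMM Viterbi fill for the M, X and Y matrices in log space, using the transition parameters δ, ε, τ from the table above. The emission tables `p_match` and `q`, the default parameter values and the toy sequences are illustrative assumptions.

```python
import numpy as np

def pair_hmm_viterbi(x, y, p_match, q, delta=0.2, eps=0.1, tau=0.1):
    """Log-space Viterbi fill for the pair HMM (states M, X, Y) above.
    p_match[a][b]: emission prob of an aligned pair (a, b) from M.
    q[a]:          emission prob of a gapped symbol from X or Y.
    Returns log v^E = log(tau * max(vM(n,m), vX(n,m), vY(n,m)))."""
    n, m = len(x), len(y)
    NEG = -np.inf
    vM = np.full((n + 1, m + 1), NEG)
    vX = np.full((n + 1, m + 1), NEG)
    vY = np.full((n + 1, m + 1), NEG)
    vM[0, 0] = 0.0  # log 1: the model starts in M before emitting anything
    log = np.log
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                # M state emits the aligned pair (x_i, y_j)
                vM[i, j] = log(p_match[x[i-1]][y[j-1]]) + max(
                    log(1 - 2*delta - tau) + vM[i-1, j-1],
                    log(1 - eps - tau) + vX[i-1, j-1],
                    log(1 - eps - tau) + vY[i-1, j-1])
            if i > 0:
                # X state emits (x_i, -)
                vX[i, j] = log(q[x[i-1]]) + max(log(delta) + vM[i-1, j],
                                                log(eps) + vX[i-1, j])
            if j > 0:
                # Y state emits (-, y_j): the recursion shown on the slide
                vY[i, j] = log(q[y[j-1]]) + max(log(delta) + vM[i, j-1],
                                                log(eps) + vY[i, j-1])
    # Termination: v^E = tau * max(vM(n,m), vX(n,m), vY(n,m))
    return log(tau) + max(vM[n, m], vX[n, m], vY[n, m])

# Toy usage with made-up emission tables over {A, C, G, T}
alphabet = "ACGT"
q = {a: 0.25 for a in alphabet}
p_match = {a: {b: 0.15 if a == b else 1/30 for b in alphabet} for a in alphabet}
print(pair_hmm_viterbi("GATTACA", "GACA", p_match, q))
```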
• MSE Clustering

Clusters $y = 1, \ldots, k$ with means $m(1), \ldots, m(k)$.

$$\min_{m}\ \varepsilon(m) = \frac{1}{2} \sum_{k=1}^{M} \sum_{x_i \in S_k} \| x_i - m_k \|^2, \qquad S_k = \{\, x_i : \| x_i - m_k \| < \| x_i - m_j \|,\ j \neq k \,\}$$

Winner-take-all assignment of $x_t$ to the closest mean:

$$I(y \mid x) = \begin{cases} 1 & y = \arg\min_r \| x_t - m_r \| \\ 0 & \text{otherwise} \end{cases}$$

Online update of the winning mean:

$$m_{y_t}^{new} = m_{y_t}^{old} + \eta\,\big(x_t - m_{y_t}^{old}\big)$$
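A minimal sketch of the winner-take-all (online) MSE clustering update described above; the toy data, learning rate η, epoch count and seed are made-up values for illustration.

```python
import numpy as np

def online_kmeans(X, k, eta=0.1, epochs=10, seed=0):
    """Winner-take-all MSE clustering: for each x_t pick the closest mean
    (the indicator I(y|x) above) and move it: m_y <- m_y + eta*(x_t - m_y)."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(epochs):
        for x in X:
            y = np.argmin(np.linalg.norm(x - means, axis=1))  # winning mean
            means[y] += eta * (x - means[y])                   # online update
    return means

def mse_cost(X, means):
    """The criterion (1/2) sum_k sum_{x in S_k} ||x - m_k||^2."""
    d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    return 0.5 * np.sum(np.min(d, axis=1) ** 2)

# Toy 2-D data drawn around two centres
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.3, (50, 2)), rng.normal([3, 3], 0.3, (50, 2))])
m = online_kmeans(X, k=2)
print(m, mse_cost(X, m))
```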
The EM Algorithm

Mixture components $y = 1, \ldots, k$ with weights $\alpha_1, \ldots, \alpha_k$ and Gaussian densities $G(x_t \mid m_j, \Sigma_j)$.

Bayesian inversion (E-step responsibilities):

$$p(y = j \mid x_t) = \frac{\alpha_j\, G(x_t \mid m_j, \Sigma_j)}{\sum_r \alpha_r\, G(x_t \mid m_r, \Sigma_r)}$$

M-step updates:

$$\alpha_j^{new} = \frac{1}{N} \sum_{t=1}^{N} p(y = j \mid x_t), \qquad n_j^{new} = \alpha_j^{new} N$$

$$m_j^{new} = \frac{1}{n_j^{new}} \sum_{t=1}^{N} p(y = j \mid x_t)\, x_t$$

$$\Sigma_j^{new} = \frac{1}{n_j^{new}} \sum_{t=1}^{N} p(y = j \mid x_t)\, (x_t - m_j^{new})(x_t - m_j^{new})^T$$

Example responsibilities for a point $x_t = [x_1, x_2]^T$: $p(1 \mid x_t) = 0.5$, $p(2 \mid x_t) = 0.05$, $p(3 \mid x_t) = 0.45$.
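The following sketch implements one EM iteration along the lines of the update equations above (E-step responsibilities by Bayesian inversion, then the α, m, Σ updates). The toy data, initialization and iteration count are assumptions, not the lecture's example.

```python
import numpy as np

def gaussian(x, m, S):
    """Multivariate normal density G(x | m, S)."""
    d = len(m)
    diff = x - m
    return np.exp(-0.5 * diff @ np.linalg.solve(S, diff)) / \
           np.sqrt((2 * np.pi) ** d * np.linalg.det(S))

def em_step(X, alpha, means, covs):
    """One EM iteration following the update equations above."""
    N, k = len(X), len(alpha)
    # E-step ("Bayesian inversion"): responsibilities p(y=j | x_t)
    resp = np.array([[alpha[j] * gaussian(x, means[j], covs[j]) for j in range(k)]
                     for x in X])
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: n_j, alpha_j, m_j, Sigma_j as in the formulas above
    n_new = resp.sum(axis=0)
    alpha_new = n_new / N
    means_new = (resp.T @ X) / n_new[:, None]
    covs_new = []
    for j in range(k):
        diff = X - means_new[j]
        covs_new.append((resp[:, j, None] * diff).T @ diff / n_new[j])
    return alpha_new, means_new, covs_new

# Toy usage: two 2-D clusters, two mixture components
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.5, (100, 2)), rng.normal([4, 4], 0.5, (100, 2))])
alpha = np.array([0.5, 0.5])
means = X[rng.choice(len(X), 2, replace=False)]
covs = [np.eye(2), np.eye(2)]
for _ in range(20):
    alpha, means, covs = em_step(X, alpha, means, covs)
print(alpha, means)
```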
Expectation Maximization (EM)
algorithm
• EM is a family of algorithms for learning probabilistic
models in problems that involve hidden state
Viterbi’s algorithm for most probable
path
tc -ca--
-cc--ag
tabular computation
[Figure: dynamic-programming score matrix, filled from the start cell S to the end cell T]
traceback: reconstruct the best alignment
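As an illustration of "tabular computation" followed by traceback, here is a small global-alignment dynamic program (a sketch; the match/mismatch/gap scores and the example strings are arbitrary placeholders, not the values used in the lecture's table).

```python
def global_align(s, t, match=2, mismatch=-1, gap=-2):
    """Tabular computation of a global alignment, then traceback."""
    n, m = len(s), len(t)
    F = [[0] * (m + 1) for _ in range(n + 1)]          # the score table
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):                          # fill row by row
        for j in range(1, m + 1):
            diag = F[i-1][j-1] + (match if s[i-1] == t[j-1] else mismatch)
            F[i][j] = max(diag, F[i-1][j] + gap, F[i][j-1] + gap)
    # Traceback from the end cell to reconstruct the best alignment
    a, b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and \
           F[i][j] == F[i-1][j-1] + (match if s[i-1] == t[j-1] else mismatch):
            a.append(s[i-1]); b.append(t[j-1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i-1][j] + gap:
            a.append(s[i-1]); b.append('-'); i -= 1
        else:
            a.append('-'); b.append(t[j-1]); j -= 1
    return F[n][m], ''.join(reversed(a)), ''.join(reversed(b))

print(global_align("tcca", "ccag"))
```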
Dependence relations
$$p(y \mid x)\, p(x) = p(x, y) = p(x \mid y)\, p(y)$$

$$p(x) = \sum_{y} p(x, y)$$

$$p(y \mid x) = \frac{p(x, y)}{p(x)} = \frac{p(x \mid y)\, p(y)}{p(x)} = \frac{p(x \mid y)\, p(y)}{\sum_{y} p(x, y)}$$
$v_l(i)$ = the probability $p(s_1, \ldots, s_i;\ x_1, \ldots, x_i \mid s_i = l)$ of a most probable path up to $i$, which ends in state $l$.

Markov property: $p(s_i \mid s_1, \ldots, s_{i-1}, x_1, \ldots, x_{i-1}) = p(s_i \mid s_{i-1})$

Emission: $p(x_i \mid s_i = l) = e_l(x_i)$

Recursion: $v_l(i) = e_l(x_i) \cdot \max_k \{ v_k(i-1) \cdot a_{kl} \}$
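A compact sketch of the recursion $v_l(i) = e_l(x_i) \max_k v_k(i-1)\, a_{kl}$ in log space, with a pointer table for the traceback. The two-state parameters (states 'e' and 'f', echoing the two-state model sketched next) are hypothetical.

```python
import numpy as np

def viterbi(obs, states, start_p, trans_p, emit_p):
    """v_l(i) = e_l(x_i) * max_k v_k(i-1) * a_kl, in log space,
    with a pointer table so the best path can be traced back."""
    V = [{s: np.log(start_p[s]) + np.log(emit_p[s][obs[0]]) for s in states}]
    ptr = [{}]
    for x in obs[1:]:
        V.append({}); ptr.append({})
        for l in states:
            best_k = max(states, key=lambda k: V[-2][k] + np.log(trans_p[k][l]))
            V[-1][l] = V[-2][best_k] + np.log(trans_p[best_k][l]) + np.log(emit_p[l][x])
            ptr[-1][l] = best_k
    last = max(states, key=lambda s: V[-1][s])         # best final state
    path = [last]
    for i in range(len(obs) - 1, 0, -1):               # traceback
        path.append(ptr[i][path[-1]])
    return list(reversed(path)), max(V[-1].values())

# Hypothetical two-state model (states 'e' and 'f') over the DNA alphabet
states = ['e', 'f']
start_p = {'e': 0.5, 'f': 0.5}
trans_p = {'e': {'e': 0.8, 'f': 0.2}, 'f': {'e': 0.3, 'f': 0.7}}
emit_p = {'e': {'A': 0.4, 'C': 0.1, 'G': 0.1, 'T': 0.4},
          'f': {'A': 0.1, 'C': 0.4, 'G': 0.4, 'T': 0.1}}
print(viterbi("ACGGT", states, start_p, trans_p, emit_p))
```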
[Figure: two-state HMM with hidden states e and f; transition probabilities p(e|e), p(f|e), p(e|f), p(f|f); emission probabilities p(A|·), p(C|·), p(G|·), p(T|·) over the alphabet A, C, T, G]
Given the “visible” sequence x = (x1, …, xL): how do we estimate the parameters of the HMM? The Baum-Welch algorithm.
Parameter Estimation for HMM
An HMM is defined by the parameters $a_{kl}$ and $e_k(b)$, for all states $k, l$ and all symbols $b$.
Let θ denote the collection of these parameters.
Case 1: Sequences are fully known
Maximum-likelihood estimates are normalized counts, e.g. $p_{TG} = \frac{2}{2}$, $p_{GG} = \frac{3}{6}$.

[Figure: two-state Markov chain with states 0 and 1, cross transitions $p = P(1 \mid 0)$ and $q = P(0 \mid 1)$, and self-loops $1-p$ and $1-q$]

$$J(q,p) = P(010010110101 \mid p, q) = p\,q\,(1-p)\,p\,q\,p\,(1-q)\,q\,p\,q\,p = p^5 q^4 (1-p)(1-q)$$

$$\frac{dJ(q,p)}{dq} = 0:\quad p^5 (1-p)\big[\,4 q^3 (1-q) - q^4\,\big] = 0 \ \Rightarrow\ 4(1-q) - q = 0 \ \Rightarrow\ q = 4/5$$
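The same estimates can be checked by simply counting transitions in the fully observed sequence, which is the whole point of Case 1. This small sketch reproduces q = 4/5 (and gives p = 5/6) for the sequence above; the function name is an illustrative assumption.

```python
from collections import Counter

def ml_transition_estimates(seq):
    """Case 1 (fully observed): ML transition probabilities are
    normalized transition counts."""
    pairs = list(zip(seq, seq[1:]))            # consecutive (s_i, s_{i+1}) pairs
    counts = Counter(pairs)
    totals = Counter(s for s, _ in pairs)      # how often each state is left
    return {pair: counts[pair] / totals[pair[0]] for pair in counts}

est = ml_transition_estimates("010010110101")
print(est)               # p = P(1|0) = 5/6, q = P(0|1) = 4/5 from counts
print(est[('1', '0')])   # q = 0.8, matching the derivative calculation above
```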
Case 2: State paths are unknown:
Maximum likelihood learning: Baum-Welch (BW) Algorithm
Computing $P(s_{i-1} = k,\ s_i = l \mid x, \theta)$
The forward algorithm

$$P(x_1, x_2, s_2) = \sum_{s_1} P(x_1, s_1, s_2, x_2)$$

Second step:

$$= \sum_{s_1} P(x_1, s_1)\, P(s_2 \mid x_1, s_1)\, P(x_2 \mid x_1, s_1, s_2)$$

Last equality due to conditional independence:

$$= \sum_{s_1} P(x_1, s_1)\, P(s_2 \mid s_1)\, P(x_2 \mid s_2)$$

Step $i$:

$$P(x_1, \ldots, x_i, s_i) = \sum_{s_{i-1}} P(x_1, \ldots, x_{i-1}, s_{i-1})\, P(s_i \mid s_{i-1})\, P(x_i \mid s_i)$$
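A direct transcription of the step-i recursion into code (a sketch, reusing the same hypothetical two-state parameters as in the Viterbi sketch above):

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Forward recursion: f_i(s) = P(x_1,...,x_i, s_i = s)
                                 = e_s(x_i) * sum_k f_{i-1}(k) * a_{k s}."""
    f = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    for x in obs[1:]:
        prev = f[-1]
        f.append({s: emit_p[s][x] * sum(prev[k] * trans_p[k][s] for k in states)
                  for s in states})
    return f

# Same hypothetical two-state parameters as in the Viterbi sketch above
states = ['e', 'f']
start_p = {'e': 0.5, 'f': 0.5}
trans_p = {'e': {'e': 0.8, 'f': 0.2}, 'f': {'e': 0.3, 'f': 0.7}}
emit_p = {'e': {'A': 0.4, 'C': 0.1, 'G': 0.1, 'T': 0.4},
          'f': {'A': 0.1, 'C': 0.4, 'G': 0.4, 'T': 0.1}}
f = forward("ACGGT", states, start_p, trans_p, emit_p)
print(sum(f[-1].values()))   # P(x_1,...,x_L) = sum over s_L of f_L(s_L)
```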
The backward algorithm
First step:

$$P(x_L \mid s_{L-1}) = \sum_{s_L} P(x_L, s_L \mid s_{L-1}) = \sum_{s_L} P(s_L \mid s_{L-1})\, P(x_L \mid s_{L-1}, s_L) = \sum_{s_L} P(s_L \mid s_{L-1})\, P(x_L \mid s_L)$$

(last equality due to conditional independence)

Step $i$:

$$P(x_{i+1}, \ldots, x_L \mid s_i) = \sum_{s_{i+1}} P(s_{i+1} \mid s_i)\, P(x_{i+1} \mid s_{i+1})\, P(x_{i+2}, \ldots, x_L \mid s_{i+1})$$

with $b(s_i) = P(x_{i+1}, \ldots, x_L \mid s_i)$ and $b(s_{i+1}) = P(x_{i+2}, \ldots, x_L \mid s_{i+1})$.
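A matching sketch of the backward recursion; the usage lines reuse the `states`, `trans_p` and `emit_p` dictionaries defined in the forward sketch above.

```python
def backward(obs, states, trans_p, emit_p):
    """Backward recursion: b_i(s) = P(x_{i+1},...,x_L | s_i = s)
                                  = sum_k a_{s k} * e_k(x_{i+1}) * b_{i+1}(k)."""
    L = len(obs)
    b = [{} for _ in range(L)]
    b[-1] = {s: 1.0 for s in states}                # b_L(s) = 1 by convention
    for i in range(L - 2, -1, -1):
        b[i] = {s: sum(trans_p[s][k] * emit_p[k][obs[i + 1]] * b[i + 1][k]
                       for k in states)
                for s in states}
    return b

# Reusing the two-state transition/emission tables from the forward sketch
b = backward("ACGGT", states, trans_p, emit_p)
print(b[0])
```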
Likelihood of evidence
$$P(x_1, \ldots, x_L) = \sum_{s_L} P(x_1, \ldots, x_L, s_L)$$

$$P(x_1, \ldots, x_L, s_i) = P(x_1, \ldots, x_i, s_i)\, P(x_{i+1}, \ldots, x_L \mid s_i) \equiv f(s_i)\, b(s_i)$$
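Tying the two recursions together: the likelihood of the evidence and the state posteriors follow directly from f and b. This fragment assumes the `forward` and `backward` functions and the parameter dictionaries from the two sketches above are already defined.

```python
# Continuing the forward/backward sketches above (reuses forward, backward,
# states, start_p, trans_p, emit_p defined there).
obs = "ACGGT"
f = forward(obs, states, start_p, trans_p, emit_p)
b = backward(obs, states, trans_p, emit_p)
Px = sum(f[-1].values())                  # P(x_1,...,x_L) = sum_{s_L} f_L(s_L)
i = 2                                     # any position 0 <= i < L
posterior = {s: f[i][s] * b[i][s] / Px for s in states}
print(Px, posterior)                      # f(s_i) b(s_i) / P(x) sums to 1 over s
```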
Decision Rule
$$S_n = s_1 s_2 \cdots s_n, \qquad \alpha_j,\ j = 1, \ldots, k$$

$$j^* = \arg\max_j \big[ \alpha_j\, p(S_n \mid M_j) \big]$$

Decide: $S_n$ comes from model $M_{j^*}$.
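A toy illustration of the decision rule; the two "models" and their likelihood functions are made up. In practice $p(S_n \mid M_j)$ would come from the forward algorithm run under model $M_j$.

```python
# Toy decision rule: j* = argmax_j alpha_j * p(S_n | M_j).
Sn = "ACGT"
likelihood = {1: lambda s: 0.50 ** len(s),          # hypothetical p(S_n | M_1)
              2: lambda s: 0.40 ** len(s) * 1.5}    # hypothetical p(S_n | M_2)
alpha = {1: 0.3, 2: 0.7}                            # model priors alpha_j
scores = {j: alpha[j] * likelihood[j](Sn) for j in likelihood}
j_star = max(scores, key=scores.get)
print(j_star, scores)    # decide: S_n comes from model M_{j*}
```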
$$z_{ij} = p(y = j \mid X_i)$$

Re-estimate $Z$ from $p$ (E-step); re-estimate $p$ from $Z$ (M-step), where $p = [p_{ck}]$ is the profile matrix.
Motif and Profile Matrices
A motif is assumed to have a fixed width, W.

$$z_{ij} = p(y = j \mid X_i), \qquad p(X_i \mid y = j, p), \quad p = [p_{ck}]$$
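A small sketch of evaluating $p(X_i \mid y = j, p)$ for a motif of width W starting at position j, with the non-motif positions scored by a background distribution. The background treatment, the toy profile matrix and the uniform prior on j are assumptions for illustration.

```python
def motif_likelihood(X, j, profile, background):
    """p(X | y=j, p): the motif (width W, columns of `profile`) starts at
    position j (0-based); every other position is scored by the background.
    The background handling is an assumption (OOPS-style model)."""
    W = len(profile['A'])                     # motif width from the profile
    prob = 1.0
    for pos, c in enumerate(X):
        if j <= pos < j + W:
            prob *= profile[c][pos - j]       # p_ck: character c, motif column k
        else:
            prob *= background[c]
    return prob

# Toy profile matrix p = [p_ck] of width W = 3 over {A, C, G, T}
profile = {'A': [0.7, 0.1, 0.1], 'C': [0.1, 0.7, 0.1],
           'G': [0.1, 0.1, 0.7], 'T': [0.1, 0.1, 0.1]}
background = {c: 0.25 for c in "ACGT"}
X, W = "TTACGTT", 3
scores = [motif_likelihood(X, j, profile, background) for j in range(len(X) - W + 1)]
z = [s / sum(scores) for s in scores]         # z_ij, assuming a uniform prior on j
print([round(v, 3) for v in z])
```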