Lecture09 Review


Concepts

• RNA and Protein Secondary Structure
• Bio-networks
• Network Analysis and Modeling

RNA and Protein Structure
• Primary Structure: Sequence
• Secondary Structure: Pairing
• Tertiary Structure: 3D Shape
• Quaternary Structure (Protein)
RNA Secondary Structure
• RNA is typically single stranded
• Folding, in large part, is determined by base pairing; A–U and C–G are the canonical base pairs
• The base-paired structure is referred to as the secondary structure of RNA
Bio-networks
– Interaction network: protein–protein interaction
– Regulatory network: transcription factor/microRNA
– Signal transduction network: pathway
– Metabolic network
– Others: genetic interactions
Overview
• The phylogenetic tree is constructed to answer the post-sequence-analysis questions:
– How are these sequences (species) related?
– How are the organisms from which these sequences come related?
Parsimony Illustration
Three possible trees for the four sequences, with the number of character changes.

Each base change has a cost of 1.

Compare with constructing a minimum spanning tree in graph theory.
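As a worked illustration of the parsimony count, here is a minimal sketch of small-parsimony scoring (Fitch's algorithm) for a single alignment column on one fixed tree topology. The tree shape, the characters, and the nested-tuple encoding are assumptions made for illustration, not data from the lecture.

# Minimal sketch of small-parsimony scoring (Fitch's algorithm) on one fixed
# tree topology; each base change costs 1, as on the slide. The tree format
# (nested tuples of leaf characters) and the characters are illustrative.

def fitch_cost(tree):
    """Return (state_set, cost) for a tree given as a leaf char or (left, right)."""
    if isinstance(tree, str):               # leaf: observed character, zero cost
        return {tree}, 0
    (lset, lcost), (rset, rcost) = fitch_cost(tree[0]), fitch_cost(tree[1])
    common = lset & rset
    if common:                              # no change needed at this node
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1   # one substitution somewhere below

# One column of four aligned sequences under the topology ((a,b),(c,d)):
tree = (("A", "A"), ("G", "A"))
states, changes = fitch_cost(tree)
print(changes)   # minimum number of character changes for this column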


Hierarchical clustering (used most often)

[Dendrogram: agglomerative (bottom-up) clustering merges a and b into {a,b}, d and e into {d,e}, then {c,d,e}, and finally {a,b,c,d,e}; divisive (top-down) clustering splits in the opposite direction.]
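A small sketch of the agglomerative side of this picture, assuming made-up 1-D coordinates for the five items a–e. SciPy's linkage/fcluster calls build the merge tree and cut it at successive levels, reproducing nested clusters like {a,b}, {d,e}, and {c,d,e}.

# Sketch of agglomerative (bottom-up) hierarchical clustering of five items.
# The coordinates are assumed illustrative values, not data from the lecture.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

labels = ["a", "b", "c", "d", "e"]
points = np.array([[0.0], [0.4], [3.0], [5.0], [5.3]])   # assumed 1-D positions

Z = linkage(points, method="average")      # agglomerative merging
for k in (4, 3, 2, 1):                     # cut the tree into k clusters
    assignment = fcluster(Z, t=k, criterion="maxclust")
    print(k, dict(zip(labels, assignment)))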
Major Lines

1. Most probable state path

[HMM diagram: hidden states S1, S2, …, SL-1, SL emitting x1, x2, …, xL-1, xL.]

First question: given an output sequence x = (x1,…,xL), a most probable path s* = (s*1,…,s*L) is one which maximizes p(s|x):

s* = (s*1,…,s*L) = argmax over (s1,…,sL) of p(s1,…,sL | x1,…,xL)
Viterbi’s algorithm for most probable path

[HMM diagram: states s1, s2, …, si emitting x1, x2, …, xi.]

The task: compute argmax over (s1,…,sL) of p(s1,…,sL; x1,…,xL).

Idea: for i = 1,…,L and for each state l, compute
v_l(i) = the probability p(s1,…,si; x1,…,xi | si = l) of a most probable path up to i which ends in state l.

Exercise: for i = 1,…,L and for each state l:
v_l(i) = e_l(xi) · max_k { v_k(i−1) · a_kl }
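A minimal sketch of this recursion in Python, computed in log space to avoid underflow. The two-state model and its transition/emission numbers are illustrative assumptions, not values from the lecture.

# Sketch of the Viterbi recursion v_l(i) = e_l(x_i) * max_k v_k(i-1) * a_kl,
# with traceback of the most probable state path; toy two-state parameters.

import numpy as np

states = ["e", "f"]
a = np.log(np.array([[0.9, 0.1],                        # a[k][l]: transition k -> l
                     [0.2, 0.8]]))
e = {"A": np.log([0.4, 0.1]), "C": np.log([0.1, 0.4]),  # e[l](b): emission
     "G": np.log([0.1, 0.4]), "T": np.log([0.4, 0.1])}
start = np.log(np.array([0.5, 0.5]))

def viterbi(x):
    v = start + e[x[0]]                          # initialisation v_l(1)
    ptr = []
    for sym in x[1:]:
        scores = v[:, None] + a                  # v_k(i-1) + log a_kl
        ptr.append(scores.argmax(axis=0))        # best predecessor for each l
        v = scores.max(axis=0) + e[sym]          # v_l(i)
    path = [int(v.argmax())]                     # traceback of s*
    for back in reversed(ptr):
        path.append(int(back[path[-1]]))
    return [states[s] for s in reversed(path)], v.max()

print(viterbi("ACCGTA"))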
Sequence Comparison using HMM

  “Hidden” states     Symbols emitted
  Match (M)           Match: {(a,b) | a,b in ∑}
  Insertion in x      Insertion in x: {(a,−) | a in ∑}
  Insertion in y      Insertion in y: {(−,a) | a in ∑}

• We call this type of model a pair HMM, to distinguish it from the standard HMMs that emit a single sequence.
• Each aligned pair is generated by the above HMM with a certain probability.
• The most probable path is the best alignment. (Why?)
The Transition Probabilities

  from \ to   M      X    Y
  M           1−2δ   δ    δ
  X           1−ε    ε
  Y           1−ε         ε

δ: transition from M to an insert state
ε: staying in an insert state

• Emission probabilities
• Match: (a,b) with p_ab – only from the M state
• Insertion in x: (a,−) with q_a – only from the X state
• Insertion in y: (−,a) with q_a – only from the Y state
Adding termination probabilities
We may want a model which defines a probability distribution over all possible sequences. For this, an END state is added, with transition probability τ from any other state to END. This assumes an expected sequence length of 1/τ.

  from \ to   M        X    Y    END
  M           1−2δ−τ   δ    δ    τ
  X           1−ε−τ    ε         τ
  Y           1−ε−τ         ε    τ
  END                            1
HMM for Sequence Alignment: detailed algorithm
• The most probable path is the best alignment.

Let v^M(i,j) be the probability of the most probable alignment of x(1..i) and y(1..j) which ends with a match. Then, using a recursive argument, we get:

v^M(i,j) = p_{xi,yj} · max{ (1−2δ)·v^M(i−1,j−1), (1−ε)·v^X(i−1,j−1), (1−ε)·v^Y(i−1,j−1) }
Most probable path
Similarly, v^X(i,j) and v^Y(i,j), the probabilities of the most probable alignment of x(1..i) and y(1..j) which ends with an insertion to x or y:

v^X(i,j) = q_{xi} · max{ δ·v^M(i−1,j), ε·v^X(i−1,j) }
v^Y(i,j) = q_{yj} · max{ δ·v^M(i,j−1), ε·v^Y(i,j−1) }

Termination:
v^E = τ · max{ v^M(n,m), v^X(n,m), v^Y(n,m) }
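The same three recurrences and the termination step, sketched directly in code (in log space). The parameter values δ, ε, τ and the simple match/mismatch emission probabilities are assumptions for illustration; only the recurrence structure follows the slides.

# Sketch of the pair-HMM Viterbi recurrences v^M, v^X, v^Y and termination v^E.

import numpy as np

delta, eps, tau = 0.2, 0.1, 0.1          # assumed transition parameters
def p_match(a, b):                       # match emission p_ab (assumed values)
    return 0.5 if a == b else 0.05
q = 0.25                                 # gap emission q_a, uniform (assumed)

def pair_viterbi(x, y):
    n, m = len(x), len(y)
    NEG = -np.inf
    vM = np.full((n + 1, m + 1), NEG)
    vX = np.full((n + 1, m + 1), NEG)
    vY = np.full((n + 1, m + 1), NEG)
    vM[0, 0] = 0.0                                        # log v^M(0,0) = log 1
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:                           # (mis)match step
                vM[i, j] = np.log(p_match(x[i-1], y[j-1])) + max(
                    np.log(1 - 2 * delta) + vM[i-1, j-1],
                    np.log(1 - eps) + vX[i-1, j-1],
                    np.log(1 - eps) + vY[i-1, j-1])
            if i > 0:                                     # insertion in x
                vX[i, j] = np.log(q) + max(np.log(delta) + vM[i-1, j],
                                           np.log(eps) + vX[i-1, j])
            if j > 0:                                     # insertion in y
                vY[i, j] = np.log(q) + max(np.log(delta) + vM[i, j-1],
                                           np.log(eps) + vY[i, j-1])
    return np.log(tau) + max(vM[n, m], vX[n, m], vY[n, m])   # termination v^E

print(pair_viterbi("ACGT", "AGT"))       # log probability of the best alignment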
• MSE Clustering

min ε(m) = (1/2) · Σ_{k=1..M} Σ_{xi ∈ Sk} ||xi − mk||²
Sk = { xi : ||xi − mk|| < ||xi − mj||, j ≠ k }

For clusters y = 1,…,k:
I(y|x) = 1 if y = argmin_r ||xt − mr||, and 0 otherwise.

Update of the selected mean m_y:
m_y^new = m_y^old + η·(x − m_y^old)
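A short sketch of this clustering criterion with the online winner-take-all mean update m_y ← m_y + η(x − m_y). The two-cluster data and the learning rate η are assumed for illustration.

# Sketch of MSE (k-means style) clustering with the online mean update above.

import numpy as np

rng = np.random.default_rng(0)
# two assumed clusters around (0,0) and (4,4)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])

m = X[rng.choice(len(X), size=2, replace=False)].copy()   # initial means m(1), m(2)
eta = 0.05
for _ in range(20):                       # a few passes over the data
    for x in rng.permutation(X):
        y = np.argmin(np.linalg.norm(x - m, axis=1))      # winner: I(y|x) = 1
        m[y] += eta * (x - m[y])                          # online mean update

err = 0.5 * sum(np.min(np.linalg.norm(x - m, axis=1)) ** 2 for x in X)
print(m, err)                             # cluster means and the MSE criterion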
The EM Algorithm

Mixture model with components y = 1,…,k and mixing weights α1,…,αk.

E step (Bayesian inversion):
p(y = j | xt) = αj · G(xt; mj, Σj) / Σ_r αr · G(xt; mr, Σr)

M step:
α_j^new = (1/N) · Σ_{t=1..N} p(y = j | xt)
n_j^new = N · α_j^new
m_j^new = (1/n_j^new) · Σ_{t=1..N} p(y = j | xt) · xt
Σ_j^new = (1/n_j^new) · Σ_{t=1..N} p(y = j | xt) · (xt − m_j^new)(xt − m_j^new)^T

Example posteriors for a sample xt = [x1, x2]^T: p(1|xt) = 0.5, p(2|xt) = 0.05, p(3|xt) = 0.45.
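A compact sketch of these updates for a two-component Gaussian mixture. The data, the number of components, and the initialization are illustrative assumptions.

# Sketch of EM for a Gaussian mixture, following the update equations above.

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
N, k = len(X), 2

alpha = np.full(k, 1.0 / k)                       # mixing weights alpha_j
m = X[rng.choice(N, k, replace=False)].copy()     # means m_j
S = np.array([np.eye(2) for _ in range(k)])       # covariances Sigma_j

for _ in range(30):
    # E step (Bayesian inversion): p(y=j|x_t) proportional to alpha_j G(x_t; m_j, Sigma_j)
    dens = np.column_stack([alpha[j] * multivariate_normal.pdf(X, m[j], S[j])
                            for j in range(k)])
    post = dens / dens.sum(axis=1, keepdims=True)
    # M step: n_j, alpha_j, m_j, Sigma_j
    n = post.sum(axis=0)
    alpha = n / N
    m = (post.T @ X) / n[:, None]
    for j in range(k):
        d = X - m[j]
        S[j] = (post[:, j, None] * d).T @ d / n[j]

print(alpha, m)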
Expectation Maximization (EM) algorithm
• EM is a family of algorithms for learning probabilistic models in problems that involve hidden state
• In our problem, the hidden state is where the motif starts in each training sequence

E step: z_ij = p(y = j | Xi)
Viterbi’s algorithm for most probable path

[HMM diagram: states s1, s2, …, si emitting x1, x2, …, xi.]

The task: compute argmax over (s1,…,sL) of p(s1,…,sL; x1,…,xL).

Idea: for i = 1,…,L and for each state l, compute
v_l(i) = the probability p(s1,…,si; x1,…,xi | si = l) of a most probable path up to i which ends in state l.

Exercise: for i = 1,…,L and for each state l:
v_l(i) = e_l(xi) · max_k { v_k(i−1) · a_kl }
Markovian vs Recursive Argument
D(S,T) = min_U ( D(S,U) + d(U,T) ), taken over every neighbor U of T.

What is the matching corresponding to the blue path?

tc-ca--
-cc--ag
Tabular computation

[DP table: alignment scores filled in cell by cell from the corner at S to the corner at T.]

Traceback: reconstruct the best alignment.
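A sketch of the tabular computation and traceback for a global alignment, under the simple assumption that a match costs 0 and a mismatch or gap costs 1 (the actual costs behind the slide's table are not recoverable here).

# Sketch of the DP table D(i,j) = min over the three neighbours, plus traceback.

def align(s, t, mismatch=1, gap=1):
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):                       # fill the table
        for j in range(1, m + 1):
            sub = 0 if s[i-1] == t[j-1] else mismatch
            D[i][j] = min(D[i-1][j-1] + sub,        # diagonal: (mis)match
                          D[i-1][j] + gap,          # up: gap in t
                          D[i][j-1] + gap)          # left: gap in s
    a, b, i, j = "", "", n, m                       # traceback from D[n][m]
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i-1][j-1] + (0 if s[i-1] == t[j-1] else mismatch):
            a, b, i, j = s[i-1] + a, t[j-1] + b, i - 1, j - 1
        elif i > 0 and D[i][j] == D[i-1][j] + gap:
            a, b, i = s[i-1] + a, "-" + b, i - 1
        else:
            a, b, j = "-" + a, t[j-1] + b, j - 1
    return D[n][m], a, b

print(align("tcca", "ccag"))    # cost and one optimal alignment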
Dependence relations

p(y|x) p(x) = p(x,y) = p(x|y) p(y)
p(x) = Σ_y p(x,y)
p(y|x) = p(x,y) / p(x) = p(x|y) p(y) / p(x) = p(x|y) p(y) / Σ_y p(x,y)

v_l(i) = the probability p(s1,…,si; x1,…,xi | si = l) of a most probable path up to i which ends in state l.

p(s1,…,si, x1,…,xi | si = l)
  = p(s1,…,si−1, si, x1,…,xi−1 | xi, si = l) · p(xi | si = l)

p(s1,…,si−1, si, x1,…,xi−1 | si = l, xi)
  = p(s1,…,si−1, si = l, x1,…,xi−1)
  = p(s1,…,si−1, x1,…,xi−1) · p(si = l | s1,…,si−1, x1,…,xi−1)

By the Markov property: p(si | s1,…,si−1, x1,…,xi−1) = p(si | si−1) = a_kl (from state k to state l)
Emission: p(xi | si = l) = e_l(xi)

Hence: v_l(i) = e_l(xi) · max_k { v_k(i−1) · a_kl }
[HMM diagram: hidden states S1, S2, …, SL-1, SL emit x1, x2, …, xL-1, xL. A two-state example with hidden states e and f has transition probabilities p(e|e), p(f|e), p(e|f), p(f|f) and emission probabilities p(A|·), p(C|·), p(G|·), p(T|·) over the alphabet {A, C, T, G}.]

Given the “visible” sequence x = (x1,…,xL): how do we estimate the parameters of the HMM? The Baum-Welch Algorithm.
Parameter Estimation for HMM

[HMM diagram: states s1, …, sL emitting x1, …, xL; a_kl is the transition probability from state k to state l, and e_k(b) the probability of emitting symbol b in state k.]

An HMM model is defined by the parameters a_kl and e_k(b), for all states k, l and all symbols b. Let θ denote the collection of these parameters.
Case 1: Sequences are fully known

We know the complete structure (state path and emitted symbols) of each sequence in the training set {X1,…,Xn}. We wish to estimate a_kl and e_k(b) for all pairs of states k, l and symbols b.
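A minimal sketch of this counting-based estimation; the toy training pairs of state paths and emitted symbols are assumed for illustration.

# When the state paths are known, a_kl and e_k(b) are just normalised counts.

from collections import Counter

training = [("eefff", "ACGTT"), ("effee", "AGTCA")]   # (states, symbols), assumed

trans, emit, from_k = Counter(), Counter(), Counter()
for states, symbols in training:
    for k, b in zip(states, symbols):
        emit[(k, b)] += 1                  # E_k(b): emissions of b in state k
    for k, l in zip(states, states[1:]):
        trans[(k, l)] += 1                 # A_kl: transitions k -> l
        from_k[k] += 1

a = {(k, l): c / from_k[k] for (k, l), c in trans.items()}          # a_kl
state_total = Counter()
for (k, b), c in emit.items():
    state_total[k] += c
e = {(k, b): c / state_total[k] for (k, b), c in emit.items()}      # e_k(b)
print(a)
print(e)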
Markov Model (in batch)

p0 = [1, 0, 0, 0]

P = [ p(A|A)  p(T|A)  p(C|A)  p(G|A)
      p(A|T)  p(T|T)  p(C|T)  p(G|T)
      p(A|C)  p(T|C)  p(C|C)  p(G|C)
      p(A|G)  p(T|G)  p(C|G)  p(G|G) ]

Example sequence: ATGGACGGGATGACG
p_AT = 2/4, p_AC = 2/4, p_TG = 2/2, p_GA = 3/6, p_GG = 3/6, p_CG = 2/2
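A quick check of these numbers by counting adjacent pairs in the example sequence:

# Batch estimation of the Markov-model transition probabilities by counting
# adjacent pairs in ATGGACGGGATGACG; reproduces p_AT = p_AC = 2/4,
# p_GG = p_GA = 3/6, p_TG = 2/2, p_CG = 2/2.

from collections import Counter

seq = "ATGGACGGGATGACG"
pairs = Counter(zip(seq, seq[1:]))          # counts of x -> y transitions
totals = Counter(seq[:-1])                  # how often each state is left

P = {(x, y): pairs[(x, y)] / totals[x] for (x, y) in pairs}
print(P)   # e.g. P[('A','T')] == 0.5, P[('G','G')] == 0.5, P[('T','G')] == 1.0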
[Two-state Markov chain: states 0 and 1, with transition probabilities p = P(1|0) and q = P(0|1), and self-transitions 1−p and 1−q.]

P(010010110101 | p, q) = p q (1−p) p q p (1−q) q p q p

max_{q,p} J(q,p), where J(q,p) = p^5 (1−p) q^4 (1−q)

dJ/dp = 0:  [5p^4(1−p) − p^5] q^4(1−q) = 0,  so 5(1−p) − p = 0,  p = 5/6
dJ/dq = 0:  p^5(1−p) [4q^3(1−q) − q^4] = 0,  so 4(1−q) − q = 0,  q = 4/5
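The same estimates also fall out of simple transition counts, which gives a quick numerical check of the calculus above:

# ML estimates of p = P(0->1) and q = P(1->0) from the observed string,
# matching the analytical result p = 5/6, q = 4/5.

from collections import Counter

s = "010010110101"
c = Counter(zip(s, s[1:]))
p = c[("0", "1")] / (c[("0", "0")] + c[("0", "1")])   # 5/6
q = c[("1", "0")] / (c[("1", "0")] + c[("1", "1")])   # 4/5
print(p, q)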
Case 2: State paths are unknown

In this case only the values of the xi's of the input sequences are known. This is an ML problem with “missing data”. We wish to find θ* so that p(x|θ*) = max_θ { p(x|θ) }. For each sequence x,
p(x|θ) = Σ_s p(x,s|θ),
taken over all state paths s.
Maximum likelihood learning: Baum-Welch (BW) Algorithm

[HMM diagram: states s1, …, si-1, si, …, sL emitting x1, …, xi-1, xi, …, xL.]

A_ld denotes the number of transitions from state l to state d, and E_k(b) denotes the number of times symbol b is emitted in state k, obtained by considering probable paths.
Computing P(si-1=k, si=l | Xj, θ)

P(x1,…,xL, si-1=k, si=l) = P(x1,…,xi-1, si-1=k) · a_kl · e_l(xi) · P(xi+1,…,xL | si=l)
                         = f_k(i−1) · a_kl · e_l(xi) · b_l(i)

f_k(i−1) is computed via the forward algorithm; b_l(i) via the backward algorithm.

p(si-1=k, si=l | Xj) = f_k(i−1) · a_kl · e_l(xi) · b_l(i) / P(x1,…,xL)

P(x1,…,xL, si) = P(x1,…,xi, si) · P(xi+1,…,xL | si) ≡ f(si) · b(si)


Dependence relations

p(y|x) p(x) = p(x,y) = p(x|y) p(y)
p(x) = Σ_y p(x,y)
p(y|x) = p(x,y) / p(x) = p(x|y) p(y) / Σ_y p(x,y)

v_l(i) = the probability p(s1,…,si; x1,…,xi | si = l) of a most probable path up to i which ends in state l, using:

p(si | s1,…,si−1, x1,…,xi−1) = p(si | si−1)   (Markov property)
p(s1,…,si−1, si, x1,…,xi−1 | si = l, xi) = p(s1,…,si−1, si = l, x1,…,xi−1)
p(xi | si = l) = e_l(xi)

v_l(i) = e_l(xi) · max_k { v_k(i−1) · a_kl }
The forward algorithm

[HMM diagram: states s1, s2, …, si emitting x1, x2, …, xi.]

The task: compute f(si) = P(x1,…,xi, si) for i = 1,…,L (namely, considering evidence up to time slot i).

Basis step:
P(x1, s1) = P(s1) P(x1|s1)

Second step:
P(x1,x2,s2) = Σ_{s1} P(x1,s1,s2,x2)
            = Σ_{s1} P(x1,s1) P(s2 | x1,s1) P(x2 | x1,s1,s2)
            = Σ_{s1} P(x1,s1) P(s2 | s1) P(x2 | s2)     (last equality due to conditional independence)

Step i:
P(x1,…,xi,si) = Σ_{si-1} P(x1,…,xi-1, si-1) P(si | si-1) P(xi | si)
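A minimal sketch of the forward recursion, reusing the same assumed two-state parameters as the Viterbi sketch earlier.

# Forward recursion f(s_i) = P(x_1..x_i, s_i); toy two-state parameters.

import numpy as np

a = np.array([[0.9, 0.1], [0.2, 0.8]])                  # a[k][l]
e = {"A": np.array([0.4, 0.1]), "C": np.array([0.1, 0.4]),
     "G": np.array([0.1, 0.4]), "T": np.array([0.4, 0.1])}
start = np.array([0.5, 0.5])

def forward(x):
    f = [start * e[x[0]]]                               # basis: P(x1, s1)
    for sym in x[1:]:
        f.append((f[-1] @ a) * e[sym])                  # step i
    return np.array(f)

f = forward("ACCGTA")
print(f[-1].sum())    # likelihood of evidence P(x1,...,xL) = sum over sL of f(sL)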
The backward algorithm

[HMM diagram: states si, si+1, …, sL-1, sL emitting xi+1, …, xL-1, xL.]

The task: compute b(si) = P(xi+1,…,xL | si) for i = L−1,…,1 (namely, considering evidence after time slot i).

First step:
P(xL | sL-1) = Σ_{sL} P(xL, sL | sL-1) = Σ_{sL} P(sL | sL-1) P(xL | sL-1, sL)
             = Σ_{sL} P(sL | sL-1) P(xL | sL)     (last equality due to conditional independence)

Step i:
b(si) = P(xi+1,…,xL | si) = Σ_{si+1} P(si+1 | si) P(xi+1 | si+1) b(si+1)
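The corresponding backward sketch, with the same assumed parameters; the likelihood it yields matches the forward pass.

# Backward recursion b(s_i) = P(x_{i+1}..x_L | s_i); toy two-state parameters.

import numpy as np

a = np.array([[0.9, 0.1], [0.2, 0.8]])
e = {"A": np.array([0.4, 0.1]), "C": np.array([0.1, 0.4]),
     "G": np.array([0.1, 0.4]), "T": np.array([0.4, 0.1])}
start = np.array([0.5, 0.5])

def backward(x):
    b = [np.ones(2)]                                    # b(s_L) = 1
    for sym in reversed(x[1:]):
        b.append(a @ (e[sym] * b[-1]))                  # step i
    return np.array(list(reversed(b)))

x = "ACCGTA"
b = backward(x)
print((start * e[x[0]] * b[0]).sum())   # same likelihood P(x1,...,xL) as the forward pass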
Likelihood of evidence

To compute the likelihood of evidence P(x1,…,xL), do one more step in the forward algorithm, namely
Σ_{sL} f(sL) = Σ_{sL} P(x1,…,xL, sL)
Likelihood of evidence

Alternatively, do one more step in the backward algorithm, namely
Σ_{s1} b(s1) P(s1) P(x1|s1) = Σ_{s1} P(x2,…,xL | s1) P(s1) P(x1|s1) = P(x1,…,xL)
P(x1,…,xL, si-1=k, si=l) = P(x1,…,xi-1, si-1=k) · a_kl · e_l(xi) · P(xi+1,…,xL | si=l)
                         = f_k(i−1) · a_kl · e_l(xi) · b_l(i)

f_k(i−1) is computed via the forward algorithm; b_l(i) via the backward algorithm.

p(si-1=k, si=l | Xj) = f_k(i−1) · a_kl · e_l(xi) · b_l(i) / P(x1,…,xL)

P(x1,…,xL) = Σ_{sL} P(x1,…,xL, sL)
P(x1,…,xL, si) = P(x1,…,xi, si) · P(xi+1,…,xL | si) ≡ f(si) · b(si)
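Putting the two passes together, here is a sketch of the posterior transition probability used in the Baum-Welch E step, computed from the forward and backward sketches above (same assumed two-state parameters).

# Posterior P(s_{i-1}=k, s_i=l | x) = f_k(i-1) * a_kl * e_l(x_i) * b_l(i) / P(x).

import numpy as np

a = np.array([[0.9, 0.1], [0.2, 0.8]])
e = {"A": np.array([0.4, 0.1]), "C": np.array([0.1, 0.4]),
     "G": np.array([0.1, 0.4]), "T": np.array([0.4, 0.1])}
start = np.array([0.5, 0.5])

def forward(x):
    f = [start * e[x[0]]]
    for sym in x[1:]:
        f.append((f[-1] @ a) * e[sym])
    return np.array(f)

def backward(x):
    b = [np.ones(2)]
    for sym in reversed(x[1:]):
        b.append(a @ (e[sym] * b[-1]))
    return np.array(list(reversed(b)))

x = "ACCGTA"
f, b = forward(x), backward(x)
px = f[-1].sum()                                   # P(x_1,...,x_L)
i = 3                                              # a position, 1-based as in the slides
post = f[i - 2][:, None] * a * e[x[i - 1]][None, :] * b[i - 1][None, :] / px
print(post, post.sum())                            # the k-by-l entries sum to 1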
Decision Rule

Given S_n = s1 s2 … sn and priors α_j, j = 1,…,k:
j* = argmax_j [ α_j · p(S_n | M_j) ]
Decide S_n is from model M_{j*}.
[EM loop: z_ij = p(y = j | Xi); the E step re-estimates Z from p, and the M step re-estimates the profile parameters p_cj from Z.]
EM Approach for Motif Discovery


Expectation Maximization (EM) algorithm
• EM is a family of algorithms for learning probabilistic models in problems that involve hidden state
• In our problem, the hidden state is where the motif starts in each training sequence

E step: z_ij = p(y = j | Xi)
Motif and Profile Matrices
• A motif is assumed to have a fixed width, W.
• Given a set of aligned sequences, it is straightforward to construct a profile matrix p characterizing a motif of interest.
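A small sketch of building such a profile matrix from already-aligned motif instances; the example sequences and the pseudocount of 1 are illustrative assumptions.

# Profile matrix p_ck: probability of character c at motif column k, estimated
# from a set of aligned motif occurrences of fixed width W.

from collections import Counter

aligned = ["TACGAT", "TATAAT", "GATACT", "TATGAT"]     # assumed motif instances, W = 6
alphabet = "ACGT"
W = len(aligned[0])

profile = []
for k in range(W):
    col = Counter(seq[k] for seq in aligned)
    total = len(aligned) + len(alphabet)               # +1 pseudocount per character
    profile.append({c: (col[c] + 1) / total for c in alphabet})

for c in alphabet:                                      # print p_ck row by row
    print(c, [round(profile[k][c], 2) for k in range(W)])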
M Step: Example

z_ij = p(y = j | Xi)
p(Xi | y = j, p), with p = [p_ck]
