Lecture09 Review


Concepts

• RNA and Protein Secondary Structure
• Bio-networks
• Network Analysis and Modeling

RNA and Protein Structure
• Primary Structure: Sequence
• Secondary Structure: Pairing
• Tertiary Structure: 3D Shape
• Quaternary Structure (Protein)
RNA Secondary Structure
• RNA is typically single stranded
• Folding, in large part, is determined by base pairing; A–U and C–G are the canonical base pairs
• The base-paired structure is referred to as the secondary structure of RNA
Bio-networks
– Interaction network: protein–protein interaction
– Regulatory network: transcription factor/microRNA
– Signal transduction network: pathway
– Metabolic network
– Others: genetic interactions
Overview
• The phylogenetic tree is constructed to answer the post-sequence-analysis questions:
– How are these sequences (species) related?
– How are the organisms from which these sequences come related?
Parsimony Illustration
Three possible trees for the four sequences, with the number of character changes.

Each base change has a cost of 1.

Compare with constructing a minimum spanning tree in graph theory.
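As a worked illustration of the parsimony count, here is a minimal sketch of small-parsimony scoring (Fitch's algorithm) for a single alignment column on one fixed tree topology. The tree shape, the characters, and the nested-tuple encoding are assumptions made for illustration, not data from the lecture.

# Minimal sketch of small-parsimony scoring (Fitch's algorithm) on one fixed
# tree topology; each base change costs 1, as on the slide. The tree format
# (nested tuples of leaf characters) and the characters are illustrative.

def fitch_cost(tree):
    """Return (state_set, cost) for a tree given as a leaf char or (left, right)."""
    if isinstance(tree, str):               # leaf: observed character, zero cost
        return {tree}, 0
    (lset, lcost), (rset, rcost) = fitch_cost(tree[0]), fitch_cost(tree[1])
    common = lset & rset
    if common:                              # no change needed at this node
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1   # one substitution somewhere below

# One column of four aligned sequences under the topology ((a,b),(c,d)):
tree = (("A", "A"), ("G", "A"))
states, changes = fitch_cost(tree)
print(changes)   # minimum number of character changes for this column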


Hierarchical clustering (used most often)

[Dendrogram: agglomerative (bottom-up) clustering merges a and b into {a,b}, d and e into {d,e}, then {c,d,e}, and finally {a,b,c,d,e}; divisive (top-down) clustering splits in the opposite direction.]
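A small sketch of the agglomerative side of this picture, assuming made-up 1-D coordinates for the five items a–e. SciPy's linkage/fcluster calls build the merge tree and cut it at successive levels, reproducing nested clusters like {a,b}, {d,e}, and {c,d,e}.

# Sketch of agglomerative (bottom-up) hierarchical clustering of five items.
# The coordinates are assumed illustrative values, not data from the lecture.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

labels = ["a", "b", "c", "d", "e"]
points = np.array([[0.0], [0.4], [3.0], [5.0], [5.3]])   # assumed 1-D positions

Z = linkage(points, method="average")      # agglomerative merging
for k in (4, 3, 2, 1):                     # cut the tree into k clusters
    assignment = fcluster(Z, t=k, criterion="maxclust")
    print(k, dict(zip(labels, assignment)))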
Major Lines

1. Most probable state path

[HMM diagram: hidden states S1, S2, …, SL-1, SL emitting x1, x2, …, xL-1, xL.]

First question: given an output sequence x = (x1,…,xL), a most probable path s* = (s*1,…,s*L) is one which maximizes p(s|x):

s* = (s*1,…,s*L) = argmax over (s1,…,sL) of p(s1,…,sL | x1,…,xL)
Viterbi’s algorithm for most probable path

[HMM diagram: states s1, s2, …, si emitting x1, x2, …, xi.]

The task: compute argmax over (s1,…,sL) of p(s1,…,sL; x1,…,xL).

Idea: for i = 1,…,L and for each state l, compute
v_l(i) = the probability p(s1,…,si; x1,…,xi | si = l) of a most probable path up to i which ends in state l.

Exercise: for i = 1,…,L and for each state l:
v_l(i) = e_l(xi) · max_k { v_k(i−1) · a_kl }
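A minimal sketch of this recursion in Python, computed in log space to avoid underflow. The two-state model and its transition/emission numbers are illustrative assumptions, not values from the lecture.

# Sketch of the Viterbi recursion v_l(i) = e_l(x_i) * max_k v_k(i-1) * a_kl,
# with traceback of the most probable state path; toy two-state parameters.

import numpy as np

states = ["e", "f"]
a = np.log(np.array([[0.9, 0.1],                        # a[k][l]: transition k -> l
                     [0.2, 0.8]]))
e = {"A": np.log([0.4, 0.1]), "C": np.log([0.1, 0.4]),  # e[l](b): emission
     "G": np.log([0.1, 0.4]), "T": np.log([0.4, 0.1])}
start = np.log(np.array([0.5, 0.5]))

def viterbi(x):
    v = start + e[x[0]]                          # initialisation v_l(1)
    ptr = []
    for sym in x[1:]:
        scores = v[:, None] + a                  # v_k(i-1) + log a_kl
        ptr.append(scores.argmax(axis=0))        # best predecessor for each l
        v = scores.max(axis=0) + e[sym]          # v_l(i)
    path = [int(v.argmax())]                     # traceback of s*
    for back in reversed(ptr):
        path.append(int(back[path[-1]]))
    return [states[s] for s in reversed(path)], v.max()

print(viterbi("ACCGTA"))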
Sequence Comparison using HMM

  “Hidden” states     Symbols emitted
  Match (M)           Match: {(a,b) | a,b in ∑}
  Insertion in x      Insertion in x: {(a,−) | a in ∑}
  Insertion in y      Insertion in y: {(−,a) | a in ∑}

• We call this type of model a pair HMM, to distinguish it from the standard HMMs that emit a single sequence.
• Each aligned pair is generated by the above HMM with a certain probability.
• The most probable path is the best alignment. (Why?)
The Transition Probabilities

  from \ to   M      X    Y
  M           1−2δ   δ    δ
  X           1−ε    ε
  Y           1−ε         ε

δ: transition from M to an insert state
ε: staying in an insert state

• Emission probabilities
• Match: (a,b) with p_ab – only from the M state
• Insertion in x: (a,−) with q_a – only from the X state
• Insertion in y: (−,a) with q_a – only from the Y state
Adding termination probabilities
We may want a model which defines a probability distribution over all possible sequences. For this, an END state is added, with transition probability τ from any other state to END. This assumes an expected sequence length of 1/τ.

  from \ to   M        X    Y    END
  M           1−2δ−τ   δ    δ    τ
  X           1−ε−τ    ε         τ
  Y           1−ε−τ         ε    τ
  END                            1
HMM for Sequence Alignment: detailed algorithm
• The most probable path is the best alignment.

Let v^M(i,j) be the probability of the most probable alignment of x(1..i) and y(1..j) which ends with a match. Then, using a recursive argument, we get:

v^M(i,j) = p_{xi,yj} · max{ (1−2δ)·v^M(i−1,j−1), (1−ε)·v^X(i−1,j−1), (1−ε)·v^Y(i−1,j−1) }
Most probable path
Similarly, v^X(i,j) and v^Y(i,j), the probabilities of the most probable alignment of x(1..i) and y(1..j) which ends with an insertion to x or y:

v^X(i,j) = q_{xi} · max{ δ·v^M(i−1,j), ε·v^X(i−1,j) }
v^Y(i,j) = q_{yj} · max{ δ·v^M(i,j−1), ε·v^Y(i,j−1) }

Termination:
v^E = τ · max{ v^M(n,m), v^X(n,m), v^Y(n,m) }
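The same three recurrences and the termination step, sketched directly in code (in log space). The parameter values δ, ε, τ and the simple match/mismatch emission probabilities are assumptions for illustration; only the recurrence structure follows the slides.

# Sketch of the pair-HMM Viterbi recurrences v^M, v^X, v^Y and termination v^E.

import numpy as np

delta, eps, tau = 0.2, 0.1, 0.1          # assumed transition parameters
def p_match(a, b):                       # match emission p_ab (assumed values)
    return 0.5 if a == b else 0.05
q = 0.25                                 # gap emission q_a, uniform (assumed)

def pair_viterbi(x, y):
    n, m = len(x), len(y)
    NEG = -np.inf
    vM = np.full((n + 1, m + 1), NEG)
    vX = np.full((n + 1, m + 1), NEG)
    vY = np.full((n + 1, m + 1), NEG)
    vM[0, 0] = 0.0                                        # log v^M(0,0) = log 1
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            if i > 0 and j > 0:                           # (mis)match step
                vM[i, j] = np.log(p_match(x[i-1], y[j-1])) + max(
                    np.log(1 - 2 * delta) + vM[i-1, j-1],
                    np.log(1 - eps) + vX[i-1, j-1],
                    np.log(1 - eps) + vY[i-1, j-1])
            if i > 0:                                     # insertion in x
                vX[i, j] = np.log(q) + max(np.log(delta) + vM[i-1, j],
                                           np.log(eps) + vX[i-1, j])
            if j > 0:                                     # insertion in y
                vY[i, j] = np.log(q) + max(np.log(delta) + vM[i, j-1],
                                           np.log(eps) + vY[i, j-1])
    return np.log(tau) + max(vM[n, m], vX[n, m], vY[n, m])   # termination v^E

print(pair_viterbi("ACGT", "AGT"))       # log probability of the best alignment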
• MSE Clustering

min ε(m) = (1/2) · Σ_{k=1..M} Σ_{xi ∈ Sk} ||xi − mk||²
Sk = { xi : ||xi − mk|| < ||xi − mj||, j ≠ k }

For clusters y = 1,…,k:
I(y|x) = 1 if y = argmin_r ||xt − mr||, and 0 otherwise.

Update of the selected mean m_y:
m_y^new = m_y^old + η·(x − m_y^old)
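A short sketch of this clustering criterion with the online winner-take-all mean update m_y ← m_y + η(x − m_y). The two-cluster data and the learning rate η are assumed for illustration.

# Sketch of MSE (k-means style) clustering with the online mean update above.

import numpy as np

rng = np.random.default_rng(0)
# two assumed clusters around (0,0) and (4,4)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])

m = X[rng.choice(len(X), size=2, replace=False)].copy()   # initial means m(1), m(2)
eta = 0.05
for _ in range(20):                       # a few passes over the data
    for x in rng.permutation(X):
        y = np.argmin(np.linalg.norm(x - m, axis=1))      # winner: I(y|x) = 1
        m[y] += eta * (x - m[y])                          # online mean update

err = 0.5 * sum(np.min(np.linalg.norm(x - m, axis=1)) ** 2 for x in X)
print(m, err)                             # cluster means and the MSE criterion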
The EM Algorithm

Mixture model with components y = 1,…,k and mixing weights α1,…,αk.

E step (Bayesian inversion):
p(y = j | xt) = αj · G(xt; mj, Σj) / Σ_r αr · G(xt; mr, Σr)

M step:
α_j^new = (1/N) · Σ_{t=1..N} p(y = j | xt)
n_j^new = N · α_j^new
m_j^new = (1/n_j^new) · Σ_{t=1..N} p(y = j | xt) · xt
Σ_j^new = (1/n_j^new) · Σ_{t=1..N} p(y = j | xt) · (xt − m_j^new)(xt − m_j^new)^T

Example posteriors for a sample xt = [x1, x2]^T: p(1|xt) = 0.5, p(2|xt) = 0.05, p(3|xt) = 0.45.
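A compact sketch of these updates for a two-component Gaussian mixture. The data, the number of components, and the initialization are illustrative assumptions.

# Sketch of EM for a Gaussian mixture, following the update equations above.

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
N, k = len(X), 2

alpha = np.full(k, 1.0 / k)                       # mixing weights alpha_j
m = X[rng.choice(N, k, replace=False)].copy()     # means m_j
S = np.array([np.eye(2) for _ in range(k)])       # covariances Sigma_j

for _ in range(30):
    # E step (Bayesian inversion): p(y=j|x_t) proportional to alpha_j G(x_t; m_j, Sigma_j)
    dens = np.column_stack([alpha[j] * multivariate_normal.pdf(X, m[j], S[j])
                            for j in range(k)])
    post = dens / dens.sum(axis=1, keepdims=True)
    # M step: n_j, alpha_j, m_j, Sigma_j
    n = post.sum(axis=0)
    alpha = n / N
    m = (post.T @ X) / n[:, None]
    for j in range(k):
        d = X - m[j]
        S[j] = (post[:, j, None] * d).T @ d / n[j]

print(alpha, m)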
Expectation Maximization (EM) algorithm
• EM is a family of algorithms for learning probabilistic models in problems that involve hidden state
• In our problem, the hidden state is where the motif starts in each training sequence

E step: z_ij = p(y = j | Xi)
Viterbi’s algorithm for most probable path

[HMM diagram: states s1, s2, …, si emitting x1, x2, …, xi.]

The task: compute argmax over (s1,…,sL) of p(s1,…,sL; x1,…,xL).

Idea: for i = 1,…,L and for each state l, compute
v_l(i) = the probability p(s1,…,si; x1,…,xi | si = l) of a most probable path up to i which ends in state l.

Exercise: for i = 1,…,L and for each state l:
v_l(i) = e_l(xi) · max_k { v_k(i−1) · a_kl }
Markovian vs Recursive Argument
D(S,T) = min_U ( D(S,U) + d(U,T) ), taken over every neighbor U of T.

What is the matching corresponding to the blue path?

tc-ca--
-cc--ag
Tabular computation

[DP table: alignment scores filled in cell by cell from the corner at S to the corner at T.]

Traceback: reconstruct the best alignment.
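A sketch of the tabular computation and traceback for a global alignment, under the simple assumption that a match costs 0 and a mismatch or gap costs 1 (the actual costs behind the slide's table are not recoverable here).

# Sketch of the DP table D(i,j) = min over the three neighbours, plus traceback.

def align(s, t, mismatch=1, gap=1):
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):                       # fill the table
        for j in range(1, m + 1):
            sub = 0 if s[i-1] == t[j-1] else mismatch
            D[i][j] = min(D[i-1][j-1] + sub,        # diagonal: (mis)match
                          D[i-1][j] + gap,          # up: gap in t
                          D[i][j-1] + gap)          # left: gap in s
    a, b, i, j = "", "", n, m                       # traceback from D[n][m]
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i-1][j-1] + (0 if s[i-1] == t[j-1] else mismatch):
            a, b, i, j = s[i-1] + a, t[j-1] + b, i - 1, j - 1
        elif i > 0 and D[i][j] == D[i-1][j] + gap:
            a, b, i = s[i-1] + a, "-" + b, i - 1
        else:
            a, b, j = "-" + a, t[j-1] + b, j - 1
    return D[n][m], a, b

print(align("tcca", "ccag"))    # cost and one optimal alignment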
Dependence relations

p(y|x) p(x) = p(x,y) = p(x|y) p(y)
p(x) = Σ_y p(x,y)
p(y|x) = p(x,y) / p(x) = p(x|y) p(y) / p(x) = p(x|y) p(y) / Σ_y p(x,y)

v_l(i) = the probability p(s1,…,si; x1,…,xi | si = l) of a most probable path up to i which ends in state l.

p(s1,…,si, x1,…,xi | si = l)
  = p(s1,…,si−1, si, x1,…,xi−1 | xi, si = l) · p(xi | si = l)

p(s1,…,si−1, si, x1,…,xi−1 | si = l, xi)
  = p(s1,…,si−1, si = l, x1,…,xi−1)
  = p(s1,…,si−1, x1,…,xi−1) · p(si = l | s1,…,si−1, x1,…,xi−1)

By the Markov property: p(si | s1,…,si−1, x1,…,xi−1) = p(si | si−1) = a_kl (from state k to state l)
Emission: p(xi | si = l) = e_l(xi)

Hence: v_l(i) = e_l(xi) · max_k { v_k(i−1) · a_kl }
[HMM diagram: hidden states S1, S2, …, SL-1, SL emit x1, x2, …, xL-1, xL. A two-state example with hidden states e and f has transition probabilities p(e|e), p(f|e), p(e|f), p(f|f) and emission probabilities p(A|·), p(C|·), p(G|·), p(T|·) over the alphabet {A, C, T, G}.]

Given the “visible” sequence x = (x1,…,xL): how do we estimate the parameters of the HMM? The Baum-Welch Algorithm.
Parameter Estimation for HMM

[HMM diagram: states s1, …, sL emitting x1, …, xL; a_kl is the transition probability from state k to state l, and e_k(b) the probability of emitting symbol b in state k.]

An HMM model is defined by the parameters a_kl and e_k(b), for all states k, l and all symbols b. Let θ denote the collection of these parameters.
Case 1: Sequences are fully known

We know the complete structure (state path and emitted symbols) of each sequence in the training set {X1,…,Xn}. We wish to estimate a_kl and e_k(b) for all pairs of states k, l and symbols b.
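A minimal sketch of this counting-based estimation; the toy training pairs of state paths and emitted symbols are assumed for illustration.

# When the state paths are known, a_kl and e_k(b) are just normalised counts.

from collections import Counter

training = [("eefff", "ACGTT"), ("effee", "AGTCA")]   # (states, symbols), assumed

trans, emit, from_k = Counter(), Counter(), Counter()
for states, symbols in training:
    for k, b in zip(states, symbols):
        emit[(k, b)] += 1                  # E_k(b): emissions of b in state k
    for k, l in zip(states, states[1:]):
        trans[(k, l)] += 1                 # A_kl: transitions k -> l
        from_k[k] += 1

a = {(k, l): c / from_k[k] for (k, l), c in trans.items()}          # a_kl
state_total = Counter()
for (k, b), c in emit.items():
    state_total[k] += c
e = {(k, b): c / state_total[k] for (k, b), c in emit.items()}      # e_k(b)
print(a)
print(e)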
Markov Model (in batch)

p0 = [1, 0, 0, 0]

P = [ p(A|A)  p(T|A)  p(C|A)  p(G|A)
      p(A|T)  p(T|T)  p(C|T)  p(G|T)
      p(A|C)  p(T|C)  p(C|C)  p(G|C)
      p(A|G)  p(T|G)  p(C|G)  p(G|G) ]

Example sequence: ATGGACGGGATGACG
p_AT = 2/4, p_AC = 2/4, p_TG = 2/2, p_GA = 3/6, p_GG = 3/6, p_CG = 2/2
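A quick check of these numbers by counting adjacent pairs in the example sequence:

# Batch estimation of the Markov-model transition probabilities by counting
# adjacent pairs in ATGGACGGGATGACG; reproduces p_AT = p_AC = 2/4,
# p_GG = p_GA = 3/6, p_TG = 2/2, p_CG = 2/2.

from collections import Counter

seq = "ATGGACGGGATGACG"
pairs = Counter(zip(seq, seq[1:]))          # counts of x -> y transitions
totals = Counter(seq[:-1])                  # how often each state is left

P = {(x, y): pairs[(x, y)] / totals[x] for (x, y) in pairs}
print(P)   # e.g. P[('A','T')] == 0.5, P[('G','G')] == 0.5, P[('T','G')] == 1.0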
[Two-state Markov chain: states 0 and 1, with transition probabilities p = P(1|0) and q = P(0|1), and self-transitions 1−p and 1−q.]

P(010010110101 | p, q) = p q (1−p) p q p (1−q) q p q p

max_{q,p} J(q,p), where J(q,p) = p^5 (1−p) q^4 (1−q)

dJ/dp = 0:  [5p^4(1−p) − p^5] q^4(1−q) = 0,  so 5(1−p) − p = 0,  p = 5/6
dJ/dq = 0:  p^5(1−p) [4q^3(1−q) − q^4] = 0,  so 4(1−q) − q = 0,  q = 4/5
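The same estimates also fall out of simple transition counts, which gives a quick numerical check of the calculus above:

# ML estimates of p = P(0->1) and q = P(1->0) from the observed string,
# matching the analytical result p = 5/6, q = 4/5.

from collections import Counter

s = "010010110101"
c = Counter(zip(s, s[1:]))
p = c[("0", "1")] / (c[("0", "0")] + c[("0", "1")])   # 5/6
q = c[("1", "0")] / (c[("1", "0")] + c[("1", "1")])   # 4/5
print(p, q)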
Case 2: State paths are unknown

In this case only the values of the xi's of the input sequences are known. This is an ML problem with “missing data”. We wish to find θ* so that p(x|θ*) = max_θ { p(x|θ) }. For each sequence x,
p(x|θ) = Σ_s p(x,s|θ),
taken over all state paths s.
Maximum likelihood learning: Baum-Welch (BW) Algorithm

[HMM diagram: states s1, …, si-1, si, …, sL emitting x1, …, xi-1, xi, …, xL.]

A_ld denotes the number of transitions from state l to state d, and E_k(b) denotes the number of times symbol b is emitted in state k, obtained by considering probable paths.
Computing P(si-1=k, si=l | Xj, θ)

P(x1,…,xL, si-1=k, si=l) = P(x1,…,xi-1, si-1=k) · a_kl · e_l(xi) · P(xi+1,…,xL | si=l)
                         = f_k(i−1) · a_kl · e_l(xi) · b_l(i)

f_k(i−1) is computed via the forward algorithm; b_l(i) via the backward algorithm.

p(si-1=k, si=l | Xj) = f_k(i−1) · a_kl · e_l(xi) · b_l(i) / P(x1,…,xL)

P(x1,…,xL, si) = P(x1,…,xi, si) · P(xi+1,…,xL | si) ≡ f(si) · b(si)


Dependence relations

p(y|x) p(x) = p(x,y) = p(x|y) p(y)
p(x) = Σ_y p(x,y)
p(y|x) = p(x,y) / p(x) = p(x|y) p(y) / Σ_y p(x,y)

v_l(i) = the probability p(s1,…,si; x1,…,xi | si = l) of a most probable path up to i which ends in state l, using:

p(si | s1,…,si−1, x1,…,xi−1) = p(si | si−1)   (Markov property)
p(s1,…,si−1, si, x1,…,xi−1 | si = l, xi) = p(s1,…,si−1, si = l, x1,…,xi−1)
p(xi | si = l) = e_l(xi)

v_l(i) = e_l(xi) · max_k { v_k(i−1) · a_kl }
The forward algorithm

[HMM diagram: states s1, s2, …, si emitting x1, x2, …, xi.]

The task: compute f(si) = P(x1,…,xi, si) for i = 1,…,L (namely, considering evidence up to time slot i).

Basis step:
P(x1, s1) = P(s1) P(x1|s1)

Second step:
P(x1,x2,s2) = Σ_{s1} P(x1,s1,s2,x2)
            = Σ_{s1} P(x1,s1) P(s2 | x1,s1) P(x2 | x1,s1,s2)
            = Σ_{s1} P(x1,s1) P(s2 | s1) P(x2 | s2)     (last equality due to conditional independence)

Step i:
P(x1,…,xi,si) = Σ_{si-1} P(x1,…,xi-1, si-1) P(si | si-1) P(xi | si)
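A minimal sketch of the forward recursion, reusing the same assumed two-state parameters as the Viterbi sketch earlier.

# Forward recursion f(s_i) = P(x_1..x_i, s_i); toy two-state parameters.

import numpy as np

a = np.array([[0.9, 0.1], [0.2, 0.8]])                  # a[k][l]
e = {"A": np.array([0.4, 0.1]), "C": np.array([0.1, 0.4]),
     "G": np.array([0.1, 0.4]), "T": np.array([0.4, 0.1])}
start = np.array([0.5, 0.5])

def forward(x):
    f = [start * e[x[0]]]                               # basis: P(x1, s1)
    for sym in x[1:]:
        f.append((f[-1] @ a) * e[sym])                  # step i
    return np.array(f)

f = forward("ACCGTA")
print(f[-1].sum())    # likelihood of evidence P(x1,...,xL) = sum over sL of f(sL)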
The backward algorithm

[HMM diagram: states si, si+1, …, sL-1, sL emitting xi+1, …, xL-1, xL.]

The task: compute b(si) = P(xi+1,…,xL | si) for i = L−1,…,1 (namely, considering evidence after time slot i).

First step:
P(xL | sL-1) = Σ_{sL} P(xL, sL | sL-1) = Σ_{sL} P(sL | sL-1) P(xL | sL-1, sL)
             = Σ_{sL} P(sL | sL-1) P(xL | sL)     (last equality due to conditional independence)

Step i:
b(si) = P(xi+1,…,xL | si) = Σ_{si+1} P(si+1 | si) P(xi+1 | si+1) b(si+1)
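The corresponding backward sketch, with the same assumed parameters; the likelihood it yields matches the forward pass.

# Backward recursion b(s_i) = P(x_{i+1}..x_L | s_i); toy two-state parameters.

import numpy as np

a = np.array([[0.9, 0.1], [0.2, 0.8]])
e = {"A": np.array([0.4, 0.1]), "C": np.array([0.1, 0.4]),
     "G": np.array([0.1, 0.4]), "T": np.array([0.4, 0.1])}
start = np.array([0.5, 0.5])

def backward(x):
    b = [np.ones(2)]                                    # b(s_L) = 1
    for sym in reversed(x[1:]):
        b.append(a @ (e[sym] * b[-1]))                  # step i
    return np.array(list(reversed(b)))

x = "ACCGTA"
b = backward(x)
print((start * e[x[0]] * b[0]).sum())   # same likelihood P(x1,...,xL) as the forward pass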
Likelihood of evidence

To compute the likelihood of evidence P(x1,…,xL), do one more step in the forward algorithm, namely
Σ_{sL} f(sL) = Σ_{sL} P(x1,…,xL, sL)
Likelihood of evidence

Alternatively, do one more step in the backward algorithm, namely
Σ_{s1} b(s1) P(s1) P(x1|s1) = Σ_{s1} P(x2,…,xL | s1) P(s1) P(x1|s1) = P(x1,…,xL)
P(x1,…,xL, si-1=k, si=l) = P(x1,…,xi-1, si-1=k) · a_kl · e_l(xi) · P(xi+1,…,xL | si=l)
                         = f_k(i−1) · a_kl · e_l(xi) · b_l(i)

f_k(i−1) is computed via the forward algorithm; b_l(i) via the backward algorithm.

p(si-1=k, si=l | Xj) = f_k(i−1) · a_kl · e_l(xi) · b_l(i) / P(x1,…,xL)

P(x1,…,xL) = Σ_{sL} P(x1,…,xL, sL)
P(x1,…,xL, si) = P(x1,…,xi, si) · P(xi+1,…,xL | si) ≡ f(si) · b(si)
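Putting the two passes together, here is a sketch of the posterior transition probability used in the Baum-Welch E step, computed from the forward and backward sketches above (same assumed two-state parameters).

# Posterior P(s_{i-1}=k, s_i=l | x) = f_k(i-1) * a_kl * e_l(x_i) * b_l(i) / P(x).

import numpy as np

a = np.array([[0.9, 0.1], [0.2, 0.8]])
e = {"A": np.array([0.4, 0.1]), "C": np.array([0.1, 0.4]),
     "G": np.array([0.1, 0.4]), "T": np.array([0.4, 0.1])}
start = np.array([0.5, 0.5])

def forward(x):
    f = [start * e[x[0]]]
    for sym in x[1:]:
        f.append((f[-1] @ a) * e[sym])
    return np.array(f)

def backward(x):
    b = [np.ones(2)]
    for sym in reversed(x[1:]):
        b.append(a @ (e[sym] * b[-1]))
    return np.array(list(reversed(b)))

x = "ACCGTA"
f, b = forward(x), backward(x)
px = f[-1].sum()                                   # P(x_1,...,x_L)
i = 3                                              # a position, 1-based as in the slides
post = f[i - 2][:, None] * a * e[x[i - 1]][None, :] * b[i - 1][None, :] / px
print(post, post.sum())                            # the k-by-l entries sum to 1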
Decision Rule

Given S_n = s1 s2 … sn and priors α_j, j = 1,…,k:
j* = argmax_j [ α_j · p(S_n | M_j) ]
Decide S_n is from model M_{j*}.
[EM loop: z_ij = p(y = j | Xi); the E step re-estimates Z from p, and the M step re-estimates the profile parameters p_cj from Z.]
EM Approach for Motif Discovery


Expectation Maximization (EM) algorithm
• EM is a family of algorithms for learning probabilistic models in problems that involve hidden state
• In our problem, the hidden state is where the motif starts in each training sequence

E step: z_ij = p(y = j | Xi)
Motif and Profile Matrices
• A motif is assumed to have a fixed width, W.
• Given a set of aligned sequences, it is straightforward to construct a profile matrix p characterizing a motif of interest.
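A small sketch of building such a profile matrix from already-aligned motif instances; the example sequences and the pseudocount of 1 are illustrative assumptions.

# Profile matrix p_ck: probability of character c at motif column k, estimated
# from a set of aligned motif occurrences of fixed width W.

from collections import Counter

aligned = ["TACGAT", "TATAAT", "GATACT", "TATGAT"]     # assumed motif instances, W = 6
alphabet = "ACGT"
W = len(aligned[0])

profile = []
for k in range(W):
    col = Counter(seq[k] for seq in aligned)
    total = len(aligned) + len(alphabet)               # +1 pseudocount per character
    profile.append({c: (col[c] + 1) / total for c in alphabet})

for c in alphabet:                                      # print p_ck row by row
    print(c, [round(profile[k][c], 2) for k in range(W)])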
M Step: Example

z_ij = p(y = j | Xi)
p(Xi | y = j, p), with p = [p_ck]
