Introduction to Bioinformatics
Lecture 9
A Markov Chain Model
• Nucleotide frequencies in the human genome (%):
     A      C      G      T
   29.5   20.4   20.5   29.6
A Markov Chain Model
• Traditionally, the end of a sequence is not modelled
• we can also have an explicit end state, which allows the model to represent
– a distribution over sequences of different lengths
– preferences for ending sequences with certain
symbols
Markov Chain Model: Definition
• a Markov chain model is defined by
– a set of states
• some states emit symbols
• other states are silent
– (e.g. the begin and end states)
– a set of transitions with associated
probabilities
• the transitions emanating from a given state
define a distribution over the possible next
states
Markov Chain Model: Property
• given some sequence x of length L, we can ask how
probable the sequence is given our model
• for any probabilistic model of sequences, we can write
this probability as
$\Pr(x) = \Pr(x_L, x_{L-1}, \ldots, x_1)$
$\;\;\;\;\;\;\;\;\; = \Pr(x_L \mid x_{L-1}, \ldots, x_1)\,\Pr(x_{L-1} \mid x_{L-2}, \ldots, x_1) \cdots \Pr(x_1)$
• key property of a (1st order) Markov chain: the
probability of each xi depends only on the value of xi-1
$\Pr(x) = \Pr(x_L \mid x_{L-1})\,\Pr(x_{L-1} \mid x_{L-2}) \cdots \Pr(x_2 \mid x_1)\,\Pr(x_1) = \Pr(x_1) \prod_{i=2}^{L} \Pr(x_i \mid x_{i-1})$
The Probability of a Sequence for a Given
Markov Chain Model
$\Pr(\mathrm{cggt}) = \Pr(c)\,\Pr(g \mid c)\,\Pr(g \mid g)\,\Pr(t \mid g)\,\Pr(\mathrm{end} \mid t)$
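A minimal sketch, in Python, of how such a product is evaluated in practice; the transition values below are illustrative placeholders rather than the lecture's estimated parameters, and the optional end-state transition is left out.

```python
# Illustrative first-order transition probabilities a[prev][next]; these numbers
# are placeholders, NOT the parameters estimated in the lecture. 'B' is the begin state.
a = {
    'B': {'a': 0.25, 'c': 0.25, 'g': 0.25, 't': 0.25},
    'a': {'a': 0.30, 'c': 0.20, 'g': 0.28, 't': 0.22},
    'c': {'a': 0.32, 'c': 0.30, 'g': 0.08, 't': 0.30},
    'g': {'a': 0.25, 'c': 0.25, 'g': 0.30, 't': 0.20},
    't': {'a': 0.18, 'c': 0.24, 'g': 0.29, 't': 0.29},
}

def markov_chain_prob(x, a):
    """Pr(x) = a_{B x1} * prod_{i=2}^{L} a_{x_{i-1} x_i}  (end state omitted)."""
    p = a['B'][x[0]]                      # begin state -> first symbol
    for prev, cur in zip(x, x[1:]):       # a_{x_{i-1} x_i} for i = 2..L
        p *= a[prev][cur]
    return p

print(markov_chain_prob("cggt", a))
```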
Markov Chain Model: Notation
• the transition parameters can be denoted by $a_{x_{i-1} x_i}$, where
  $a_{x_{i-1} x_i} = \Pr(x_i \mid x_{i-1})$
• similarly, we can denote the probability of a sequence x as
  $a_{B x_1} \prod_{i=2}^{L} a_{x_{i-1} x_i} = \Pr(x_1) \prod_{i=2}^{L} \Pr(x_i \mid x_{i-1})$
  where $a_{B x_1}$ represents the transition from the begin state
CpG Islands
• Written "CpG" to distinguish the dinucleotide from a C≡G base pair
• CpG dinucleotides are rarer than would be expected
from the independent probabilities of C and G.
– Reason: When CpG occurs, C is typically chemically
modified by methylation and there is a relatively high
chance of methyl-C mutating into T
• A CpG island is a region where CpG dinucleotides
are much more abundant than elsewhere.
• High CpG frequency may be biologically significant;
e.g., may signal promoter region (“start” of a gene).
Markov Chain for Discrimination
• suppose we want to distinguish CpG islands from
other sequence regions
• given sequences from CpG islands, and sequences
from other regions, we can construct
– a model to represent CpG islands (model +)
– a null model to represent other regions (model -)
• can then score a test sequence by:
$\mathrm{score}(x) = \log \frac{\Pr(x \mid \text{model}^+)}{\Pr(x \mid \text{model}^-)}$
Markov Chain for Discrimination
• parameters estimated for + and - models
– human sequences containing 48 CpG islands
– 60,000 nucleotides
• transition probabilities calculated for both models
Markov Chain for Discrimination
• Calculated the log-odds ratio
$\mathrm{score}(x) = \log \frac{\Pr(x \mid \text{model}^+)}{\Pr(x \mid \text{model}^-)} = \sum_{i=1}^{L} \log \frac{a^+_{x_{i-1} x_i}}{a^-_{x_{i-1} x_i}} = \sum_{i=1}^{L} \beta_{x_{i-1} x_i}$
• the $\beta_{x_{i-1} x_i}$ are the log-likelihood ratios of the corresponding transition probabilities
β        A        C        G        T
A     -0.740    0.419    0.580   -0.803
C     -0.913    0.302    1.812   -0.685
G     -0.624    0.461    0.331   -0.730
T     -1.169    0.573    0.393   -0.679
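A minimal sketch of how a test sequence is scored with the β table above; the table values come from the slide, while the function name and test sequences are only illustrative.

```python
# log-likelihood ratios beta[x_{i-1}][x_i], copied from the table above
beta = {
    'A': {'A': -0.740, 'C': 0.419, 'G': 0.580, 'T': -0.803},
    'C': {'A': -0.913, 'C': 0.302, 'G': 1.812, 'T': -0.685},
    'G': {'A': -0.624, 'C': 0.461, 'G': 0.331, 'T': -0.730},
    'T': {'A': -1.169, 'C': 0.573, 'G': 0.393, 'T': -0.679},
}

def cpg_score(x):
    """score(x) = sum_i beta[x_{i-1}][x_i]; positive favours the CpG-island (+) model."""
    return sum(beta[prev][cur] for prev, cur in zip(x, x[1:]))

print(cpg_score("CGCGCG"))   # strongly positive -> CpG-island-like
print(cpg_score("ATATAT"))   # negative -> background-like
```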
Markov Chain for Discrimination
• Solid bars represent non-CpG regions
• Dotted bars represent CpG islands
• Errors could be due to inadequate modelling or mislabelling
A simple Hidden Markov Model (HMM)
• given, say, a T in our input sequence, which state emitted it?
Why Hidden ?
• we’ll distinguish between the observed parts
of a problem and the hidden parts
• in the Markov chain models it is clear which
state accounts for each part of the observed
sequence
• in the model above, there are multiple states
that could account for each part of the
observed sequence
– this is the hidden part of the problem
– the essential difference between Markov
chain and Hidden Markov model
Hidden Markov Models
• Components:
– Observed variables
• Emitted symbols
– Hidden variables
– Relationships between them
• Represented by a graph with transition
probabilities
• Goal: Find the most likely explanation for the
observed variables
Notations in HMM
• States are decoupled from symbols
• x is the sequence of symbols emitted by model
– xi is the symbol emitted at time i
• A path, $\pi$, is a sequence of states
  – The i-th state in $\pi$ is $\pi_i$
• $a_{kr}$ is the probability of making a transition from state k to state r:
  $a_{kr} = \Pr(\pi_i = r \mid \pi_{i-1} = k)$
• $e_k(b)$ is the probability that symbol b is emitted when in state k:
  $e_k(b) = \Pr(x_i = b \mid \pi_i = k)$
The occasionally dishonest casino
• A casino uses a fair die most of the time, but
occasionally switches to a loaded one
– Fair die: Prob(1) = Prob(2) = ... = Prob(6) = 1/6
– Loaded die: Prob(1) = Prob(2) = ... = Prob(5) = 1/10, Prob(6) = 1/2
– These are the emission probabilities
• Transition probabilities
– Prob(Fair → Loaded) = 0.01
– Prob(Loaded → Fair) = 0.2
– Transitions between states obey a Markov process
An HMM for occasionally dishonest casino
[State diagram]
  Fair:    e(1) = e(2) = ... = e(6) = 1/6
  Loaded:  e(1) = e(2) = ... = e(5) = 1/10,  e(6) = 1/2
  Transitions $a_{kl}$:  Fair→Fair = 0.99,  Fair→Loaded = 0.01,  Loaded→Fair = 0.2,  Loaded→Loaded = 0.80
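As a sketch, the parameters in the diagram can be written down directly in Python; the dictionary layout and the 0.5/0.5 begin-state split (taken from the worked example later in the lecture) are implementation choices, not part of the model itself.

```python
states = ('F', 'L')                          # Fair, Loaded

# transition probabilities a_kl
trans = {
    'F': {'F': 0.99, 'L': 0.01},
    'L': {'F': 0.20, 'L': 0.80},
}

# emission probabilities e_k(b) for die faces 1..6
emit = {
    'F': {b: 1 / 6 for b in range(1, 7)},
    'L': {**{b: 1 / 10 for b in range(1, 6)}, 6: 1 / 2},
}

# begin-state transitions a_0k (assumed 0.5/0.5, matching the worked example below)
start = {'F': 0.5, 'L': 0.5}
```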
The occasionally dishonest casino
• Known:
– The structure of the model
– The transition probabilities
• Hidden: What the casino did
– FFFFFLLLLLLLFFFF...
• Observable: The series of die
tosses
– 3415256664666153...
• What we must infer:
– When was a fair die used?
– When was a loaded one used?
• The answer is a sequence
FFFFFFFLLLLLLFFF...
Three Important Questions
• How likely is a given sequence?
– the Forward algorithm
• What is the most probable “path” for
generating a given sequence?
– the Viterbi algorithm
• How can we learn the HMM parameters
given a set of sequences?
– the Baum-Welch (Forward-Backward)
algorithm
How Likely is a Given Sequence?
The probability that the path $\pi_1, \pi_2, \ldots, \pi_L$ is taken and the sequence $x_1, x_2, \ldots, x_L$ is generated:
  $\Pr(x_1,\ldots,x_L \mid \pi_1,\ldots,\pi_L) = a_{0\pi_1} \prod_{i=1}^{L} e_{\pi_i}(x_i)\, a_{\pi_i \pi_{i+1}}$
(assuming begin/end are the only silent
states on path)
The occasionally dishonest casino
$x = x_1, x_2, x_3 = 6, 2, 6$

$\pi^{(1)} = FFF$:
  $\Pr(x, \pi^{(1)}) = a_{0F}\, e_F(6)\, a_{FF}\, e_F(2)\, a_{FF}\, e_F(6) = 0.5 \cdot \tfrac{1}{6} \cdot 0.99 \cdot \tfrac{1}{6} \cdot 0.99 \cdot \tfrac{1}{6} \approx 0.00227$

$\pi^{(2)} = LLL$:
  $\Pr(x, \pi^{(2)}) = a_{0L}\, e_L(6)\, a_{LL}\, e_L(2)\, a_{LL}\, e_L(6) = 0.5 \cdot 0.5 \cdot 0.8 \cdot 0.1 \cdot 0.8 \cdot 0.5 = 0.008$

$\pi^{(3)} = LFL$:
  $\Pr(x, \pi^{(3)}) = a_{0L}\, e_L(6)\, a_{LF}\, e_F(2)\, a_{FL}\, e_L(6) = 0.5 \cdot 0.5 \cdot 0.2 \cdot \tfrac{1}{6} \cdot 0.01 \cdot 0.5 \approx 0.0000417$
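A minimal sketch that reproduces the three joint probabilities above, reusing the `start`, `trans`, and `emit` dictionaries from the earlier casino-HMM sketch (no end state is modelled):

```python
def joint_prob(x, path, start, trans, emit):
    """Pr(x, pi) = a_{0 pi_1} * prod_i e_{pi_i}(x_i) * a_{pi_i pi_{i+1}}."""
    p = start[path[0]]                              # a_{0 pi_1}
    for i, (sym, state) in enumerate(zip(x, path)):
        p *= emit[state][sym]                       # e_{pi_i}(x_i)
        if i + 1 < len(path):
            p *= trans[state][path[i + 1]]          # a_{pi_i pi_{i+1}}
    return p

x = (6, 2, 6)
for path in ("FFF", "LLL", "LFL"):
    print(path, joint_prob(x, path, start, trans, emit))
# FFF ≈ 0.00227, LLL = 0.008, LFL ≈ 0.0000417
```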
Making the inference
• Model assigns a probability to each explanation of the
observation:
P(326|FFL)
• Maximum Likelihood: Determine which
explanation is most likely
– Find the path most likely to have produced
the observed sequence
• Total probability: Determine probability
that observed sequence was produced by the
HMM
– Consider all paths that could have produced
the observed sequence
How Likely is a Given Sequence?
• for a single path $\pi$, the probability that the sequence x is generated is
  $\Pr(x_1,\ldots,x_L \mid \pi_1,\ldots,\pi_L) = a_{0\pi_1} \prod_{i=1}^{L} e_{\pi_i}(x_i)\, a_{\pi_i \pi_{i+1}}$
• the probability over all such paths is
  $\Pr(x_1,\ldots,x_L) = \sum_{\pi} \Pr(x_1,\ldots,x_L, \pi)$
• but the number of paths can be exponential in the length of the sequence...
• the Forward algorithm enables us to compute this efficiently
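A minimal sketch of the Forward algorithm for the casino HMM, assuming the `states`, `start`, `trans`, and `emit` dictionaries from the earlier sketch; it replaces the exponential sum over paths with an O(L·K²) recursion (no end state modelled).

```python
def forward_prob(x, states, start, trans, emit):
    """Pr(x) = sum over all paths pi of Pr(x, pi), via the Forward recursion."""
    # f[k] = Pr(x_1..x_i, pi_i = k); initialise with the first symbol
    f = {k: start[k] * emit[k][x[0]] for k in states}
    for sym in x[1:]:
        f = {k: emit[k][sym] * sum(f[r] * trans[r][k] for r in states)
             for k in states}
    return sum(f.values())          # marginalise over the final state

print(forward_prob((6, 2, 6), states, start, trans, emit))
```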
The most probable path
The most likely path $\pi^*$ satisfies
  $\pi^* = \operatorname{argmax}_{\pi} \Pr(x, \pi)$
To find $\pi^*$, consider all possible ways the last symbol of x could have been emitted.
Let $v_k(i)$ be the probability of the most likely path $\pi_1, \ldots, \pi_i$ emitting $x_1, \ldots, x_i$ such that $\pi_i = k$.
Then
  $v_k(i) = e_k(x_i) \max_{r} \{ v_r(i-1)\, a_{rk} \}$
The Viterbi Algorithm
• Initialization (i = 0):
  $v_0(0) = 1,\quad v_k(0) = 0 \text{ for } k > 0$
• Recursion (i = 1, ..., L): for each state k
  $v_k(i) = e_k(x_i) \max_{r} \{ v_r(i-1)\, a_{rk} \}$
• Termination:
  $\Pr(x, \pi^*) = \max_{k} \{ v_k(L)\, a_{k0} \}$
To find $\pi^*$, use trace-back, as in dynamic programming
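A minimal sketch of the Viterbi recursion above, again assuming the casino-HMM dictionaries from the earlier sketch; the end-state term $a_{k0}$ is omitted since that model has no end state.

```python
def viterbi(x, states, start, trans, emit):
    """Return (Pr(x, pi*), pi*) for the most probable state path pi*."""
    v = {k: start[k] * emit[k][x[0]] for k in states}    # v_k(1)
    back = []                                            # trace-back pointers
    for sym in x[1:]:
        ptr, nxt = {}, {}
        for k in states:
            r_best = max(states, key=lambda r: v[r] * trans[r][k])
            ptr[k] = r_best
            nxt[k] = emit[k][sym] * v[r_best] * trans[r_best][k]
        back.append(ptr)
        v = nxt
    # termination (no a_{k0} term) and trace-back
    last = max(states, key=lambda k: v[k])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return v[last], ''.join(reversed(path))
```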
Viterbi: Example
        i = 0    x_1 = 6             x_2 = 2                               x_3 = 6
B       1        0                   0                                     0
F       0        (1/6)(1/2) = 1/12   (1/6)·max{(1/12)·0.99, (1/4)·0.2}     (1/6)·max{0.01375·0.99, 0.02·0.2}
                                     = 0.01375                             = 0.00226875
L       0        (1/2)(1/2) = 1/4    (1/10)·max{(1/12)·0.01, (1/4)·0.8}    (1/2)·max{0.01375·0.01, 0.02·0.8}
                                     = 0.02                                = 0.008
$v_k(i) = e_k(x_i) \max_{r} \{ v_r(i-1)\, a_{rk} \}$
[State diagram as before: Fair (emissions 1/6 each; 0.99 self-transition, 0.01 to Loaded) and Loaded (emissions 1/10 for 1–5, 1/2 for 6; 0.80 self-transition, 0.2 to Fair)]
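For reference, the trellis above can be reproduced with the `viterbi` sketch from the previous slide (same assumed `start`/`trans`/`emit` dictionaries):

```python
# v_F(3) ≈ 0.00226875 and v_L(3) = 0.008, so the most probable path for 6, 2, 6 is LLL
print(viterbi((6, 2, 6), states, start, trans, emit))   # -> (0.008, 'LLL')
```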
Viterbi gets it right more often than not
The numbers in the first rows show 300 rolls of the die. The second rows show which die was actually used for each roll. The third rows show the Viterbi algorithm's prediction.