Hidden Markov Models and Sequential Data

Sequential Data
• Often arise through measurement of time series
  • Snowfall measurements on successive days in Buffalo
  • Rainfall measurements in Cherrapunji
  • Daily values of a currency exchange rate
  • Acoustic features at successive time frames in speech recognition
• Non-time series
  • Nucleotide base pairs in a strand of DNA
  • Sequence of characters in an English sentence
  • Parts of speech of successive words
Sound Spectrogram of Spoken Words
• “Bayes Theorem”
• Successive observations of the speech spectrum are highly correlated
Task of making a sequence of decisions
• Processes in time: the state at time t is influenced by the state at time t−1
Model Assuming Independence
• Simplest model: treat the observations as independent
• Graph without links
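Written out (the slide implies this equation rather than showing it), the fully independent model factorizes the joint distribution into a product of marginals:

\[
p(x_1, \ldots, x_N) \;=\; \prod_{n=1}^{N} p(x_n)
\]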
Markov Model
\[
p(x_1, \ldots, x_N) \;=\; \prod_{n=1}^{N} p(x_n \mid x_1, \ldots, x_{n-1})
\]
First Order Markov Model
• Chain of observations {x_n}
\[
p(x_n \mid x_1, \ldots, x_{n-1}) = p(x_n \mid x_{n-1})
\]
• If the model is used to predict the next observation, the predictive distribution depends only on the immediately preceding observation and is independent of all earlier observations
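To make the factorization concrete, here is a minimal Python sketch of evaluating a sequence probability under a first-order Markov model; the states and probabilities below are made-up illustration values, not taken from the slides.

```python
# Minimal sketch of a first-order Markov model:
# p(x_1, ..., x_N) = p(x_1) * prod_{n=2}^{N} p(x_n | x_{n-1})
# The states and numbers below are illustrative only.

initial = {"sun": 0.6, "rain": 0.4}            # p(x_1)
transition = {                                  # p(x_n | x_{n-1})
    "sun":  {"sun": 0.8, "rain": 0.2},
    "rain": {"sun": 0.4, "rain": 0.6},
}

def sequence_probability(seq):
    """Joint probability of an observed sequence under the first-order model."""
    p = initial[seq[0]]
    for prev, curr in zip(seq, seq[1:]):
        p *= transition[prev][curr]             # only the preceding value matters
    return p

print(sequence_probability(["sun", "sun", "rain", "rain"]))   # 0.6*0.8*0.2*0.6 = 0.0576
```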
Second Order Markov Model
• Conditional distribution of observation x_n depends on the values of the two previous observations x_{n−1} and x_{n−2}
\[
p(x_1, \ldots, x_N) = p(x_1)\, p(x_2 \mid x_1) \prod_{n=3}^{N} p(x_n \mid x_{n-1}, x_{n-2})
\]
[Figure: graphical model with a chain of latent variables z_n, each emitting an observation x_n]
• Joint distribution for this model
\[
p(x_1, \ldots, x_N, z_1, \ldots, z_N) = p(z_1) \prod_{n=2}^{N} p(z_n \mid z_{n-1}) \prod_{n=1}^{N} p(x_n \mid z_n)
\]
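A small Python sketch of this joint distribution for one particular latent path; the initial distribution pi, transition matrix A, and emission matrix B below are illustrative assumptions, not values from the slides.

```python
import numpy as np

# Sketch of the HMM joint distribution on this slide:
# p(x_1..x_N, z_1..z_N) = p(z_1) * prod_{n=2}^{N} p(z_n | z_{n-1}) * prod_{n=1}^{N} p(x_n | z_n)
# pi, A and B are illustrative placeholders.

pi = np.array([0.5, 0.5])            # p(z_1)
A = np.array([[0.7, 0.3],            # A[i, j] = p(z_n = j | z_{n-1} = i)
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],            # B[j, k] = p(x_n = k | z_n = j)
              [0.3, 0.7]])

def joint_probability(states, observations):
    """p(x_1..x_N, z_1..z_N) for a given latent path and observation sequence."""
    p = pi[states[0]] * B[states[0], observations[0]]
    for n in range(1, len(states)):
        p *= A[states[n - 1], states[n]] * B[states[n], observations[n]]
    return p

print(joint_probability([0, 0, 1], [0, 0, 1]))   # one term of the sum over latent paths
```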
Two Models Described by this Graph
[Figure: the same graphical model, with the latent variables and observations labeled]
First-order Markov Models (MMs)
• State at time t: ω(t)
• Particular sequence of length T: ω^T = {ω(1), ω(2), ω(3), …, ω(T)}
  e.g., ω^6 = {ω1, ω4, ω2, ω2, ω1, ω4}
  Note: the system can revisit a state at different steps, and not every state needs to be visited
• The model for the production of any sequence is described by the transition probabilities
  P(ω_j(t + 1) | ω_i(t)) = a_ij
  which is a time-independent probability of being in state ω_j at step t+1 given that the state at time t was ω_i
• There is no requirement that the transition probabilities be symmetric
[Figure: a particular model]
For a particular sequence of hidden states ω_r^T (indexed by r):
\[
P(V^T \mid \omega_r^T) = \prod_{t=1}^{T} P(v(t) \mid \omega(t)) \qquad (1)
\]
\[
P(\omega_r^T) = \prod_{t=1}^{T} P(\omega(t) \mid \omega(t-1)) \qquad (2)
\]
• Calculate
\[
P(V^T) = \sum_{r=1}^{r_{\max}} \prod_{t=1}^{T} P(v(t) \mid \omega(t))\, P(\omega(t) \mid \omega(t-1))
\]
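As a sanity check on this formula, here is a brute-force Python sketch that sums over every possible state sequence; the transition matrix, emission matrix, and start state are illustrative assumptions. Its cost grows as O(c^T T), which motivates the forward algorithm below.

```python
import itertools
import numpy as np

# Brute-force evaluation: P(V^T) = sum over all state sequences r of
#   prod_{t=1}^{T} P(v(t) | omega(t)) * P(omega(t) | omega(t-1)).
# A, B and the start state are illustrative, not the slides' values.

A = np.array([[0.6, 0.4],            # P(omega_j(t+1) | omega_i(t))
              [0.3, 0.7]])
B = np.array([[0.8, 0.2],            # P(v_k | omega_j)
              [0.1, 0.9]])
start = 0                            # assumed known start state

def evaluate_naive(V):
    c, T = A.shape[0], len(V)
    total = 0.0
    for path in itertools.product(range(c), repeat=T):   # all c**T hidden sequences
        p, prev = 1.0, start
        for t in range(T):
            p *= A[prev, path[t]] * B[path[t], V[t]]
            prev = path[t]
        total += p
    return total

print(evaluate_naive([0, 1, 1, 0]))
```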
Define α_j(t): the probability that the model is in state ω_j at step t, having generated the first t symbols of the visible sequence V^T.

Trellis: unfolding of the HMM through time
Computation of Probabilities by Forward Algorithm
• In the evaluation trellis we only accumulate values

Transition probabilities a_ij (row: state at t, column: state at t+1):

         ω0    ω1    ω2    ω3
  ω0     1     0     0     0
  ω1     0.2   0.3   0.1   0.4
  ω2     0.2   0.5   0.2   0.1
  ω3     0.8   0.1   0     0.1

Emission probabilities b_jk (row: state, column: visible symbol; rows for ω1 and ω2 not shown):

         v0    v1    v2    v3    v4
  ω0     1     0     0     0     0
  ω3     0     0.5   0.2   0.1   0.2
Example Problem: compute the probability of generating the sequence V^4 = {v1, v3, v2, v0}
Assume ω1 is the start state
• At t = 0 the state is ω1, thus α_1(0) = 1 and α_j(0) = 0 for j ≠ 1
• Arrows show the calculation of α_j(1)
• Top arrow: α_0(1) = α_1(0) a_10 b_01 = 1 × (0.2 × 0) = 0
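A Python sketch of the forward recursion illustrated above, using the a_ij table from the slide. Since only the ω0 and ω3 rows of b_jk survive in these notes, the ω1 and ω2 rows in the example below are placeholders, not the slide's values.

```python
import numpy as np

def forward(A, B, V, start):
    """Forward algorithm: alpha[t, j] is the probability of being in state j at
    step t having emitted the first t symbols of V.  alpha[0] marks the start
    state.  Cost is O(c^2 T), versus O(c^T T) for the brute-force sum."""
    c, T = A.shape[0], len(V)
    alpha = np.zeros((T + 1, c))
    alpha[0, start] = 1.0                                 # alpha_start(0) = 1
    for t in range(1, T + 1):
        for j in range(c):
            # alpha_j(t) = [sum_i alpha_i(t-1) a_ij] * b_{j, v(t)}
            alpha[t, j] = (alpha[t - 1] @ A[:, j]) * B[j, V[t - 1]]
    return alpha

# a_ij from the slide (states omega_0..omega_3, with omega_0 an absorber state).
A = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.2, 0.3, 0.1, 0.4],
              [0.2, 0.5, 0.2, 0.1],
              [0.8, 0.1, 0.0, 0.1]])

# b_jk: the omega_0 and omega_3 rows are from the slide; the omega_1 and
# omega_2 rows are placeholders only.
B = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],   # omega_0 (slide)
              [0.0, 0.3, 0.4, 0.1, 0.2],   # omega_1 (placeholder)
              [0.0, 0.1, 0.1, 0.7, 0.1],   # omega_2 (placeholder)
              [0.0, 0.5, 0.2, 0.1, 0.2]])  # omega_3 (slide)

V = [1, 3, 2, 0]                            # V^4 = {v1, v3, v2, v0}
alpha = forward(A, B, V, start=1)           # omega_1 is the start state
print(alpha[-1].sum())                      # P(V^4 | model)
```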
[Trellis figure: the visible symbol emitted at each step, node values α_i(t), and arc weights a_ij b_jk, with the a_ij and b_jk tables as shown above]
Notes:
1. If a_ij and b_jk are replaced by log probabilities, we add terms rather than multiply them
2. The best path is maintained for each node
3. The trellis computation is O(c²T), compared with O(c^T T) for brute-force enumeration of all paths
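A Python sketch of Viterbi decoding that follows notes 1 and 2: it works with log probabilities (sums instead of products) and keeps a backpointer to the best predecessor of every node. The function and variable names are my own, not from the slides.

```python
import numpy as np

def viterbi(logA, logB, V, start):
    """Viterbi decoding in log space.  logA[i, j] = log a_ij, logB[j, k] = log b_jk
    (use -inf for zero probabilities).  Returns the best state path and its
    log probability; runtime is O(c^2 T)."""
    c, T = logA.shape[0], len(V)
    delta = np.full((T + 1, c), -np.inf)      # best log score of any path ending at (t, j)
    back = np.zeros((T + 1, c), dtype=int)    # best predecessor of each node (note 2)
    delta[0, start] = 0.0                     # log 1
    for t in range(1, T + 1):
        for j in range(c):
            scores = delta[t - 1] + logA[:, j]         # add, don't multiply (note 1)
            back[t, j] = int(np.argmax(scores))
            delta[t, j] = scores[back[t, j]] + logB[j, V[t - 1]]
    path = [int(np.argmax(delta[T]))]                  # trace back from the best end state
    for t in range(T, 0, -1):
        path.append(back[t, path[-1]])
    return list(reversed(path)), float(delta[T].max())
```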
Viterbi Decoding Trellis
Example of Decoding
Four states, with an explicit absorber state ω0. Transition probabilities a_ij:

         ω0    ω1    ω2    ω3
  ω0     1     0     0     0
  ω1     0.2   0.3   0.1   0.4
  ω2     0.2   0.5   0.2   0.1
  ω3     0.8   0.1   0     0.1
• The HMM produces a class-conditional probability
[Figure: word-model HMM with states 1–8; arcs labeled with letter emission probabilities such as w[.7], d[.8], i[.8], l[.8], u[.5], v[.2], o[.5], r[.4], i[.7], u[.3], m[.2], m[.1]]
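One way to read the last two slides: each word is modelled by its own HMM, the forward algorithm gives the class-conditional probability P(V | word model), and Bayes' theorem turns these into posteriors. A minimal sketch, assuming per-word likelihood functions such as the forward() routine above; the words, priors, and likelihood values below are made up.

```python
# Word recognition from class-conditional HMM likelihoods via Bayes' theorem:
#   P(word | V)  is proportional to  P(V | word's HMM) * P(word)
# The likelihood functions and priors below are illustrative stand-ins.

def classify(V, likelihoods, priors):
    """Return the word whose HMM best explains the visible sequence V."""
    scores = {w: fn(V) * priors[w] for w, fn in likelihoods.items()}
    total = sum(scores.values())
    posteriors = {w: s / total for w, s in scores.items()} if total > 0 else scores
    best = max(posteriors, key=posteriors.get)
    return best, posteriors

# Toy usage with dummy likelihoods standing in for real word-model HMMs:
likelihoods = {"wid": lambda V: 1e-4, "word": lambda V: 3e-4}
priors = {"wid": 0.5, "word": 0.5}
print(classify([0, 1, 2], likelihoods, priors))
```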