Introduction To Machine Learning CMU-10701: Hidden Markov Models
– Example of sequential data: characters in a sentence
Markov Models

Chain rule:
  p(X_1, …, X_T) = p(X_1) p(X_2 | X_1) p(X_3 | X_2, X_1) … p(X_T | X_{T-1}, …, X_1)

Markov Assumption:
  1st order: p(X_t | X_{t-1}, …, X_1) = p(X_t | X_{t-1})
  2nd order: p(X_t | X_{t-1}, …, X_1) = p(X_t | X_{t-1}, X_{t-2})
Markov Models

# parameters in a stationary model over K-ary variables:
  No assumption (full chain rule): O(K^T)
  1st order Markov assumption: O(K^2)
  2nd order Markov assumption: O(K^3)
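To make those counts concrete, here is a small sketch (my own illustration, not from the slides; it assumes a stationary chain and counts free parameters, i.e. probabilities minus the sum-to-one constraints):

```python
def num_params(K, T, order=None):
    """Free parameters for a joint model of T variables, each taking K values."""
    if order is None:                       # full chain rule, no Markov assumption
        return K**T - 1
    # stationary Markov chain: distribution over the first `order` variables
    # plus one conditional distribution shared across all later time steps
    return (K**order - 1) + K**order * (K - 1)

print(num_params(6, 10))        # no assumption: 6**10 - 1  (about 6e7)
print(num_params(6, 10, 1))     # 1st order:     5 + 6*5   = 35
print(num_params(6, 10, 2))     # 2nd order:     35 + 36*5 = 215
```

The point of the Markov assumption is that the parameter count stops growing with the sequence length T.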
[Graphical model: hidden state chain S_1 → S_2 → … → S_{T-1} → S_T, where each hidden state S_t emits an observation O_t]

Initial probabilities: p(S_1 = i) = π_i
Transition probabilities: p(S_t = j | S_{t-1} = i) = p_ij
Emission probabilities: p(O_t = y | S_t = i)
HMM Example: the dishonest casino
A casino alternates between a fair die (F) and a loaded die (L). We observe only the sequence of rolls; the hidden state at each roll is which die was used, e.g. the state sequence F F L L L F.
State Space Representation
Switch between F and L with 5% probability:
[Two-state diagram: F ⇄ L with transition probability 0.05 in each direction; self-loops F→F and L→L each with probability 0.95]
HMM Parameters
Initial probs: P(S1 = L) = P(S1 = F) = 0.5
Transition probs: P(St = L | St-1 = L) = P(St = F | St-1 = F) = 0.95
                  P(St = F | St-1 = L) = P(St = L | St-1 = F) = 0.05
Emission probs: P(Ot = y | St = F) = 1/6 for y = 1, 2, 3, 4, 5, 6
                P(Ot = y | St = L) = 1/10 for y = 1, 2, 3, 4, 5;  = 1/2 for y = 6
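As a concrete companion to these numbers, a minimal NumPy sketch of the casino HMM (variable names pi, A, B are mine, not from the slides; states are ordered F = 0, L = 1, and die face y is stored in column y - 1):

```python
import numpy as np

pi = np.array([0.5, 0.5])                   # initial probabilities P(S1 = F), P(S1 = L)
A = np.array([[0.95, 0.05],                 # transition probabilities, rows = from-state
              [0.05, 0.95]])                # A[i, j] = P(St = j | St-1 = i)
B = np.array([[1/6] * 6,                    # fair die: uniform over faces 1..6
              [1/10] * 5 + [1/2]])          # loaded die: 1/10 for 1..5, 1/2 for a 6

# sanity check: each row of A and B is a probability distribution
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
```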
Three main problems in HMMs: evaluation, decoding, and learning
• Evaluation – Given HMM parameters and an observation sequence O_1, …, O_T, find the probability of the observed sequence, p(O_1, …, O_T)

Define the forward variable α_t^k = p(O_1, …, O_t, S_t = k). Then
  α_t^k = Σ_i p(O_1, …, O_t, S_t = k, S_{t-1} = i)   [introduce S_{t-1}]
        = Σ_i p(O_t | S_t = k, S_{t-1} = i, O_1, …, O_{t-1}) p(S_t = k | S_{t-1} = i, O_1, …, O_{t-1}) p(O_1, …, O_{t-1}, S_{t-1} = i)   [chain rule]
        = p(O_t | S_t = k) Σ_i p(S_t = k | S_{t-1} = i) α_{t-1}^i   [Markov assumption]
Forward Algorithm
Can compute α_t^k for all k, t using dynamic programming:
• Initialize: α_1^k = p(O_1 | S_1 = k) π_k   for all k
• Iterate: for t = 2, …, T
    α_t^k = p(O_t | S_t = k) Σ_i α_{t-1}^i p(S_t = k | S_{t-1} = i)   for all k
• Termination: p(O_1, …, O_T) = Σ_k α_T^k
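A direct, unscaled translation of this recursion into code (a sketch that assumes the pi, A, B arrays from the casino example above, with observations as 0-based face indices; a real implementation would rescale each step or work in log space to avoid underflow):

```python
import numpy as np

def forward(pi, A, B, obs):
    """alpha[t, k] = p(O_1..O_t, S_t = k); returns alpha and the sequence likelihood."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, obs[0]]                       # initialization
    for t in range(1, T):                              # iterate t = 2..T
        alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)   # sum over previous state i
    return alpha, alpha[-1].sum()                      # termination: p(O_1..O_T)

obs = np.array([0, 2, 5, 5, 5, 1])   # observed rolls 1, 3, 6, 6, 6, 2 (face y -> index y-1)
alpha, likelihood = forward(pi, A, B, obs)
```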
Decoding Problem 1
• Given HMM parameters and an observation sequence O_1, …, O_T, find the probability that the hidden state at time t was k:
  p(S_t = k, O_1, …, O_T) = p(O_1, …, O_t, S_t = k) p(O_{t+1}, …, O_T | S_t = k) = α_t^k β_t^k

Define the backward variable β_t^k = p(O_{t+1}, …, O_T | S_t = k). Then
  β_t^k = Σ_i p(O_{t+1}, …, O_T, S_{t+1} = i | S_t = k)   [introduce S_{t+1}]
        = Σ_i p(O_{t+1} | S_{t+1} = i, S_t = k) p(O_{t+2}, …, O_T | S_{t+1} = i, S_t = k) p(S_{t+1} = i | S_t = k)   [chain rule]
        = Σ_i p(S_{t+1} = i | S_t = k) p(O_{t+1} | S_{t+1} = i) β_{t+1}^i   [Markov assumption]
Backward Algorithm
Can compute β_t^k for all k, t using dynamic programming:
• Initialize: β_T^k = 1   for all k
• Iterate: for t = T-1, …, 1
    β_t^k = Σ_i p(S_{t+1} = i | S_t = k) p(O_{t+1} | S_{t+1} = i) β_{t+1}^i   for all k
• Termination: p(S_t = k | O_1, …, O_T) = α_t^k β_t^k / Σ_i α_t^i β_t^i
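Continuing the same sketch, an unscaled backward pass plus the state posteriors from the termination formula above:

```python
import numpy as np

def backward(pi, A, B, obs):
    """beta[t, k] = p(O_{t+1}..O_T | S_t = k)."""
    T, K = len(obs), len(pi)
    beta = np.zeros((T, K))
    beta[-1] = 1.0                                     # initialization: beta_T^k = 1
    for t in range(T - 2, -1, -1):                     # iterate t = T-1 .. 1
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def state_posteriors(alpha, beta):
    """gamma[t, k] = p(S_t = k | O_1..O_T), i.e. alpha * beta normalized over k."""
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```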
Most likely state vs. most likely sequence
Most likely state assignment at time t:  argmax_k p(S_t = k | O_1, …, O_T)
  E.g. Which die was most likely used by the casino in the third roll, given the observed sequence?
Most likely assignment of the entire state sequence:  argmax_{S_1, …, S_T} p(S_1, …, S_T | O_1, …, O_T)
  E.g. What was the most likely sequence of dice (fair/loaded) used by the casino, given the observed sequence?
Not the same solution! The most likely assignment of x alone and of the pair (x, y) can differ; see the example below.
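A tiny self-contained illustration of that distinction, with made-up numbers for a pair of binary variables:

```python
import numpy as np

# Hypothetical joint distribution p(x, y) over two binary variables.
p_xy = np.array([[0.3, 0.3],    # x = 0
                 [0.4, 0.0]])   # x = 1

print(np.argmax(p_xy.sum(axis=1)))                    # most likely x on its own: 0 (prob 0.6)
print(np.unravel_index(np.argmax(p_xy), p_xy.shape))  # most likely pair (x, y): (1, 0), prob 0.4
```

Here x = 0 is marginally more likely, but its mass is split between two values of y, so the single best joint assignment has x = 1. The same effect is why per-time-step argmaxes of the posteriors need not form the most likely state sequence.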
Decoding Problem 2
• Given HMM parameters and an observation sequence O_1, …, O_T, find the most likely assignment of the hidden state sequence:
  argmax_{S_1, …, S_T} p(S_1, …, S_T | O_1, …, O_T) = argmax_{S_1, …, S_T} p(S_1, …, S_T, O_1, …, O_T)
Compute recursively via V_T^k, the probability of the most likely sequence of states ending at state S_T = k:
  V_T^k = max_{S_1, …, S_{T-1}} p(S_1, …, S_{T-1}, S_T = k, O_1, …, O_T)
Viterbi Decoding
  V_t^k = max_{S_1, …, S_{t-1}} p(S_1, …, S_{t-1}, S_t = k, O_1, …, O_t)
        = max_i [ p(O_t | S_t = k) p(S_t = k | S_{t-1} = i) max_{S_1, …, S_{t-2}} p(S_1, …, S_{t-2}, S_{t-1} = i, O_1, …, O_{t-1}) ]   [Bayes rule + Markov assumption]
        = p(O_t | S_t = k) max_i p(S_t = k | S_{t-1} = i) V_{t-1}^i
Viterbi Algorithm
Can compute V_t^k for all k, t using dynamic programming:
• Initialize: V_1^k = p(O_1 | S_1 = k) π_k   for all k
• Iterate: for t = 2, …, T
    V_t^k = p(O_t | S_t = k) max_i p(S_t = k | S_{t-1} = i) V_{t-1}^i   for all k,
    storing the maximizing i as a back-pointer ptr_t(k)
• Termination: max_k V_T^k is the probability of the best sequence;  Ŝ_T = argmax_k V_T^k
• Traceback: Ŝ_{t-1} = ptr_t(Ŝ_t) for t = T, …, 2
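In the same style as the forward sketch, one way this recursion could be coded (unscaled; log probabilities are the usual fix for long sequences):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely hidden state sequence, plus its joint probability with the observations."""
    T, K = len(obs), len(pi)
    V = np.zeros((T, K))                    # V[t, k]: best prob of a path ending in k at t
    ptr = np.zeros((T, K), dtype=int)       # back-pointers
    V[0] = pi * B[:, obs[0]]                # initialization
    for t in range(1, T):                   # iterate t = 2..T
        scores = V[t - 1][:, None] * A      # scores[i, k] = V[t-1, i] * p(S_t = k | S_t-1 = i)
        ptr[t] = scores.argmax(axis=0)      # best predecessor i for each k
        V[t] = B[:, obs[t]] * scores.max(axis=0)
    path = [int(V[-1].argmax())]            # termination
    for t in range(T - 1, 0, -1):           # traceback
        path.append(int(ptr[t, path[-1]]))
    return path[::-1], V[-1].max()
```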
Computational complexity
• What is the running time for Forward, Backward, Viterbi? Each is O(K^2 T): there are T time steps, and each step combines K predecessor values (by a sum or a max) for each of the K current states.
Learning Problem
• Given an HMM with unknown parameters and an observation sequence O_1, …, O_T, find the parameters that maximize the likelihood of the observed data
• But the likelihood does not factorize over time steps, since the observations are not i.i.d. (they are coupled through the hidden states)
EM (Baum-Welch) Algorithm:
E-step – Fix parameters, find expected state assignments
M-step – Fix expected state assignments, update parameters
Baum-Welch (EM) Algorithm
• Start with random initialization of parameters
• E-step – Fix parameters, find expected state assignments using the forward-backward algorithm
Baum-Welch (EM) Algorithm
• Start with random initialization of parameters
• E-step – with the current parameters, compute state posteriors using forward-backward:
    γ_t^i = p(S_t = i | O_1, …, O_T),   ξ_t^ij = p(S_t = i, S_{t+1} = j | O_1, …, O_T)
    Σ_{t=1..T} γ_t^i = expected # times in state i
    Σ_{t=1..T-1} γ_t^i = expected # transitions from state i
    Σ_{t=1..T-1} ξ_t^ij = expected # transitions from state i to j
• M-step – re-estimate parameters from these expected counts:
    π_i = γ_1^i
    p_ij = Σ_{t=1..T-1} ξ_t^ij / Σ_{t=1..T-1} γ_t^i
    p(O_t = y | S_t = i) = Σ_{t: O_t = y} γ_t^i / Σ_{t=1..T} γ_t^i
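Putting the pieces together, a compact, unscaled Baum-Welch sketch for a single observation sequence, reusing the forward and backward functions sketched earlier (a practical version would rescale each step and pool counts over multiple sequences):

```python
import numpy as np

def baum_welch(obs, K, M, n_iter=50, seed=0):
    """EM for HMM parameters; obs has values in {0..M-1}, K hidden states, M symbols."""
    obs = np.asarray(obs)
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)
    A = rng.dirichlet(np.ones(K), size=K)          # random row-stochastic initialization
    B = rng.dirichlet(np.ones(M), size=K)
    for _ in range(n_iter):
        # E-step: state posteriors gamma and pairwise posteriors xi via forward-backward
        alpha, _ = forward(pi, A, B, obs)
        beta = backward(pi, A, B, obs)
        gamma = alpha * beta
        gamma /= gamma.sum(axis=1, keepdims=True)  # gamma[t, i] = p(S_t = i | O)
        xi = (alpha[:-1, :, None] * A[None, :, :] *
              (B[:, obs[1:]].T * beta[1:])[:, None, :])
        xi /= xi.sum(axis=(1, 2), keepdims=True)   # xi[t, i, j] = p(S_t = i, S_t+1 = j | O)
        # M-step: re-estimate parameters from expected counts
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for y in range(M):
            B[:, y] = gamma[obs == y].sum(axis=0)
        B /= gamma.sum(axis=0)[:, None]
    return pi, A, B
```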
Some connections
• HMM vs. Linear Dynamical Systems (Kalman Filters): Kalman filters are the continuous-state, linear-Gaussian counterpart of HMMs; the forward algorithm corresponds to Kalman filtering and forward-backward to Kalman smoothing.
HMMs.. What you should know
• Useful for modeling sequential data with few parameters
using discrete hidden states that satisfy Markov assumption
• Representation – initial probabilities, transition probabilities, emission probabilities; state space representation
• Algorithms for inference and learning in HMMs
  – Computing marginal likelihood of the observed sequence: forward algorithm
  – Predicting a single hidden state: forward-backward
  – Predicting an entire sequence of hidden states: Viterbi
  – Learning HMM parameters: an EM algorithm known as Baum-Welch