
Introduction to Machine Learning

CMU-10701
Hidden Markov Models

Barnabás Póczos & Aarti Singh

Slides courtesy: Eric Xing


i.i.d. to sequential data

So far we have assumed independent, identically distributed (i.i.d.) data.

Sequential (non-i.i.d.) data:
– Time-series data, e.g. speech
– Characters in a sentence
– Base pairs along a DNA strand
Markov Models

Joint distribution of n arbitrary random variables – chain rule:

p(X_1, …, X_n) = p(X_1) p(X_2 | X_1) p(X_3 | X_2, X_1) ⋯ p(X_n | X_{n-1}, …, X_1)

Markov assumption (mth order) – the current observation depends only on the past m observations:

p(X_i | X_{i-1}, …, X_1) = p(X_i | X_{i-1}, …, X_{i-m})
Markov Models

Markov assumption:

1st order: p(X_i | X_{i-1}, …, X_1) = p(X_i | X_{i-1})

2nd order: p(X_i | X_{i-1}, …, X_1) = p(X_i | X_{i-1}, X_{i-2})
Markov Models

Number of parameters in a stationary model with K-ary variables:

1st order: O(K^2)

mth order: O(K^(m+1))

(n−1)th order: O(K^n) ≡ no assumptions – complete (but directed) graph

Homogeneous/stationary Markov model: the probabilities do not depend on the position n in the sequence.
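For instance, counting the parameters of a stationary 1st-order model over K-ary variables explicitly (a worked example added here, not from the slides):

initial distribution p(X_1):                 K − 1 free parameters
transition table p(X_i = k | X_{i-1} = j):   K(K − 1) free parameters
total:                                       K^2 − 1 = O(K^2)

An mth-order transition table p(X_i | X_{i-1}, …, X_{i-m}) similarly has K^m (K − 1) free parameters, i.e. O(K^(m+1)).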
Hidden Markov Models
• Distributions that characterize sequential data with few
parameters but are not limited by strong Markov assumptions.

[Graphical model: a chain of hidden states S1 → S2 → … → ST-1 → ST, with each state St emitting an observation Ot]

Observation space: Ot ∈ {y1, y2, …, yK}

Hidden states: St ∈ {1, …, I}
Hidden Markov Models

1st order Markov assumption on the hidden states {St}, t = 1, …, T:

p(St | St-1, …, S1) = p(St | St-1)

(can be extended to higher order).

Note: marginally, Ot still depends on all previous observations {Ot-1, …, O1}; it is only conditionally independent of them given the hidden state St.
Hidden Markov Models

• Parameters – stationary/homogeneous Markov model (independent of time t):

Initial probabilities: p(S1 = i) = πi

Transition probabilities: p(St = j | St-1 = i) = pij

Emission probabilities: p(Ot = y | St = i)
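With these parameters the joint distribution of states and observations factorizes as follows (written out here for reference, using the πi and pij notation above):

p(S1, …, ST, O1, …, OT) = p(S1) ∏_{t=2}^{T} p(St | St-1) ∏_{t=1}^{T} p(Ot | St)
                        = π_{S1} ∏_{t=2}^{T} p_{St-1, St} ∏_{t=1}^{T} p(Ot | St)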
HMM Example

• The Dishonest Casino

A casino has two dice:

Fair die
P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6

Loaded die
P(1) = P(2) = P(3) = P(4) = P(5) = 1/10
P(6) = 1/2

The casino player switches back and forth between the fair and loaded die with 5% probability.
HMM Problems

[Figure]

HMM Example

[Figure: observed die rolls with the underlying hidden state sequence F F L L L F]
State Space Representation

Switch between F and L with 5% probability:

[State diagram: two states F and L; self-transition probability 0.95 for each, transition probability 0.05 between F and L in either direction]

HMM Parameters

Initial probs:           P(S1 = L) = P(S1 = F) = 0.5

Transition probs:        P(St = L | St-1 = L) = P(St = F | St-1 = F) = 0.95
                         P(St = F | St-1 = L) = P(St = L | St-1 = F) = 0.05

Emission probabilities:  P(Ot = y | St = F) = 1/6   for y = 1, 2, 3, 4, 5, 6
                         P(Ot = y | St = L) = 1/10  for y = 1, 2, 3, 4, 5
                                            = 1/2   for y = 6
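As a concrete illustration, these casino parameters can be written down in code. A minimal sketch, assuming NumPy and my own variable names pi0, A, B (state 0 = Fair, state 1 = Loaded):

import numpy as np

# Initial probabilities P(S1 = F), P(S1 = L)
pi0 = np.array([0.5, 0.5])

# Transition probabilities A[i, j] = P(St = j | St-1 = i)
A = np.array([[0.95, 0.05],   # from Fair:   stay 0.95, switch 0.05
              [0.05, 0.95]])  # from Loaded: switch 0.05, stay 0.95

# Emission probabilities B[i, y] = P(Ot = face y+1 | St = i), faces encoded 0..5
B = np.array([[1/6, 1/6, 1/6, 1/6, 1/6, 1/6],        # fair die
              [1/10, 1/10, 1/10, 1/10, 1/10, 1/2]])  # loaded die

These arrays are reused in the algorithm sketches further below.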
Three main problems in HMMs

• Evaluation – Given the HMM parameters and an observation sequence O1, …, OT,
find the probability of the observed sequence, p(O1, …, OT).

• Decoding – Given the HMM parameters and an observation sequence,
find the most probable sequence of hidden states, argmax_{S1, …, ST} p(S1, …, ST | O1, …, OT).

• Learning – Given an HMM with unknown parameters and an observation sequence,
find the parameters that maximize the likelihood of the observed data.
HMM Algorithms

• Evaluation – What is the probability of the observed sequence?
  Forward Algorithm

• Decoding – What is the probability that the third roll was loaded, given the observed sequence?
  Forward-Backward Algorithm
  – What is the most likely die sequence given the observed sequence?
  Viterbi Algorithm

• Learning – Under what parameterization is the observed sequence most probable?
  Baum-Welch Algorithm (EM)
Evaluation Problem

• Given the HMM parameters and an observation sequence O1, …, OT,
find the probability of the observed sequence:

p(O1, …, OT) = ∑_{S1, …, ST} p(O1, …, OT, S1, …, ST)

This requires summing over all possible hidden state values at all times – K^T terms, exponential in T!

Instead, define the forward probability

α_t(k) = p(O1, …, Ot, St = k)

and compute it recursively; then p(O1, …, OT) = ∑_k α_T(k).


Forward Probability

Compute the forward probability α_t(k) recursively over t:

α_t(k) = p(O1, …, Ot, St = k)
       = ∑_i p(O1, …, Ot, St = k, St-1 = i)                                   (introduce St-1)
       = ∑_i p(Ot | St = k) p(St = k | St-1 = i) p(O1, …, Ot-1, St-1 = i)     (chain rule, Markov assumption)
       = p(Ot | St = k) ∑_i p(St = k | St-1 = i) α_{t-1}(i)
Forward Algorithm

Can compute α_t(k) for all k, t using dynamic programming:

• Initialize: α_1(k) = p(O1 | S1 = k) p(S1 = k) for all k

• Iterate: for t = 2, …, T
  α_t(k) = p(Ot | St = k) ∑_i α_{t-1}(i) p(St = k | St-1 = i) for all k

• Termination: p(O1, …, OT) = ∑_k α_T(k)
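A minimal sketch of this recursion in NumPy, reusing the pi0, A, B arrays from the casino sketch above (the function name and array encoding are my own, not from the slides):

import numpy as np

def forward(pi0, A, B, obs):
    """Forward algorithm: alpha[t, k] = p(O1..O(t+1), S(t+1) = k), 0-indexed t; returns alpha and p(O1..OT)."""
    T, K = len(obs), len(pi0)
    alpha = np.zeros((T, K))
    alpha[0] = B[:, obs[0]] * pi0                     # initialize: p(O1 | S1 = k) p(S1 = k)
    for t in range(1, T):
        # (alpha[t-1] @ A)[k] = sum_i alpha[t-1, i] * p(St = k | St-1 = i)
        alpha[t] = B[:, obs[t]] * (alpha[t - 1] @ A)  # iterate
    return alpha, alpha[-1].sum()                     # termination: sum_k alpha[T, k]

# Example: probability of observing the rolls 1, 3, 6 (faces encoded 0..5)
alpha, prob = forward(pi0, A, B, [0, 2, 5])

In practice one would rescale alpha at each step (or work in log space) to avoid numerical underflow on long sequences.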
Decoding Problem 1

• Given the HMM parameters and an observation sequence O1, …, OT,
find the probability that the hidden state at time t was k: p(St = k | O1, …, OT).

Compute recursively: the forward probability α_t(k) = p(O1, …, Ot, St = k) and the backward probability β_t(k) = p(Ot+1, …, OT | St = k).
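Combining the two gives the posterior state probability (written out here; a standard identity consistent with the forward/backward definitions above):

p(St = k | O1, …, OT) = p(O1, …, Ot, St = k) p(Ot+1, …, OT | St = k) / p(O1, …, OT)
                      = α_t(k) β_t(k) / ∑_i α_t(i) β_t(i)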
Backward Probability

Compute the backward probability β_t(k) recursively over t:

β_t(k) = p(Ot+1, …, OT | St = k)
       = ∑_i p(Ot+1, …, OT, St+1 = i | St = k)                                     (introduce St+1)
       = ∑_i p(St+1 = i | St = k) p(Ot+1 | St+1 = i) p(Ot+2, …, OT | St+1 = i)     (chain rule, Markov assumption)
       = ∑_i p(St+1 = i | St = k) p(Ot+1 | St+1 = i) β_{t+1}(i)
Backward Algorithm

Can compute β_t(k) for all k, t using dynamic programming:

• Initialize: β_T(k) = 1 for all k

• Iterate: for t = T-1, …, 1
  β_t(k) = ∑_i p(St+1 = i | St = k) p(Ot+1 | St+1 = i) β_{t+1}(i) for all k

• Termination: p(O1, …, OT) = ∑_k p(S1 = k) p(O1 | S1 = k) β_1(k)
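A matching NumPy sketch of the backward recursion, under the same array conventions as the forward sketch (again, the names are mine):

import numpy as np

def backward(A, B, obs):
    """Backward algorithm: beta[t, k] = p(O(t+2)..OT | S(t+1) = k), 0-indexed t."""
    T, K = len(obs), A.shape[0]
    beta = np.zeros((T, K))
    beta[-1] = 1.0                                    # initialize: beta_T(k) = 1 for all k
    for t in range(T - 2, -1, -1):
        # beta[t, k] = sum_i A[k, i] * p(O_{t+2} at index t+1 | S = i) * beta[t+1, i]
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

# Posterior state probabilities p(St = k | O1, …, OT) from both passes:
# gamma = alpha * beta; gamma /= gamma.sum(axis=1, keepdims=True)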
Most likely state vs. most likely sequence

Most likely state assignment at time t:

argmax_k p(St = k | O1, …, OT)

E.g. Which die was most likely used by the casino in the third roll, given the observed sequence?

Most likely assignment of the state sequence:

argmax_{S1, …, ST} p(S1, …, ST | O1, …, OT)

E.g. What was the most likely sequence of dice used by the casino, given the observed sequence?

These are not the same solution! (Compare the most likely assignment of x alone with the most likely assignment of the pair (x, y) – the marginal argmax need not agree with the joint argmax.)
Decoding Problem 2

• Given the HMM parameters and an observation sequence O1, …, OT,
find the most likely assignment of the state sequence.

Compute recursively V_T(k) – the probability of the most likely sequence of states ending at state ST = k (jointly with the observations):

V_t(k) = max_{S1, …, St-1} p(S1, …, St-1, O1, …, Ot, St = k)
Viterbi Decoding

Compute the probability V_t(k) recursively over t:

V_t(k) = max_{S1, …, St-1} p(S1, …, St-1, O1, …, Ot, St = k)
       = p(Ot | St = k) max_i p(St = k | St-1 = i) max_{S1, …, St-2} p(S1, …, St-2, O1, …, Ot-1, St-1 = i)   (Bayes rule, Markov assumption)
       = p(Ot | St = k) max_i p(St = k | St-1 = i) V_{t-1}(i)
Viterbi Algorithm

Can compute V_t(k) for all k, t using dynamic programming:

• Initialize: V_1(k) = p(O1 | S1 = k) p(S1 = k) for all k

• Iterate: for t = 2, …, T
  V_t(k) = p(Ot | St = k) max_i p(St = k | St-1 = i) V_{t-1}(i) for all k
  (store the maximizing predecessor i for each k and t)

• Termination: probability of the most likely state sequence = max_k V_T(k)

• Traceback: start from S*_T = argmax_k V_T(k) and follow the stored predecessors back to t = 1.
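A minimal NumPy sketch of this recursion with traceback, under the same conventions as the earlier sketches (names are mine; plain probabilities are used to mirror the slides, so switch to log probabilities for long sequences):

import numpy as np

def viterbi(pi0, A, B, obs):
    """Most likely hidden state sequence for the observations, plus its probability."""
    T, K = len(obs), len(pi0)
    V = np.zeros((T, K))                  # V[t, k] = prob of the best state sequence ending in k at time t
    ptr = np.zeros((T, K), dtype=int)     # best predecessor of state k at time t
    V[0] = B[:, obs[0]] * pi0             # initialize: p(O1 | S1 = k) p(S1 = k)
    for t in range(1, T):
        cand = V[t - 1][:, None] * A      # cand[i, k] = V[t-1, i] * p(St = k | St-1 = i)
        ptr[t] = cand.argmax(axis=0)
        V[t] = B[:, obs[t]] * cand.max(axis=0)
    # Termination and traceback
    states = np.zeros(T, dtype=int)
    states[-1] = V[-1].argmax()
    for t in range(T - 1, 0, -1):
        states[t - 1] = ptr[t, states[t]]
    return states, V[-1].max()

# Example: most likely die sequence (0 = Fair, 1 = Loaded) for a run of observed sixes
states, p_best = viterbi(pi0, A, B, [5, 5, 5, 5])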
Computational complexity

• What is the running time of Forward, Backward, and Viterbi?

O(K^2 T) – linear in T, instead of the O(K^T) exponential in T required by brute-force summation!
Learning Problem

• Given an HMM with unknown parameters and an observation sequence,
find the parameters that maximize the likelihood of the observed data.

But the likelihood does not factorize, since the observations are not i.i.d.;
the hidden variables are the state sequence.

EM (Baum-Welch) Algorithm:
E-step – Fix the parameters, find the expected state assignments
M-step – Fix the expected state assignments, update the parameters
Baum-Welch (EM) Algorithm

• Start with a random initialization of the parameters.
• E-step – Fix the parameters and find the expected state assignments using the Forward-Backward algorithm.
Baum-Welch (EM) Algorithm

• Start with a random initialization of the parameters.

• E-step – compute, from the forward-backward probabilities:
  the expected # of times in state i,
  the expected # of transitions out of state i (over t = 1, …, T−1),
  and the expected # of transitions from state i to state j.

• M-step – re-estimate the initial, transition, and emission probabilities from these expected counts.
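Written out, with γ_t(i) and ξ_t(i, j) denoting the posterior state and transition probabilities computed by forward-backward (this is the standard form of the updates; the γ/ξ notation is mine):

γ_t(i) = p(St = i | O1, …, OT) = α_t(i) β_t(i) / ∑_j α_t(j) β_t(j)

ξ_t(i, j) = p(St = i, St+1 = j | O1, …, OT) ∝ α_t(i) p_ij p(Ot+1 | St+1 = j) β_{t+1}(j)

M-step updates:

π_i ← γ_1(i)

p_ij ← ∑_{t=1}^{T−1} ξ_t(i, j) / ∑_{t=1}^{T−1} γ_t(i)          (expected transitions i→j / expected transitions out of i)

p(O = y | S = i) ← ∑_{t: Ot = y} γ_t(i) / ∑_{t=1}^{T} γ_t(i)    (expected emissions of y from i / expected times in i)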
Some connections

• HMM vs. Linear Dynamical Systems (Kalman Filters)

HMM: states are discrete; observations are discrete or continuous.

Linear Dynamical Systems: observations and states are multivariate Gaussians whose means are linear functions of their parent states (see Bishop, Sec. 13.3).
HMMs – What you should know

• Useful for modeling sequential data with few parameters, using discrete hidden states that satisfy the Markov assumption
• Representation – initial probabilities, transition probabilities, emission probabilities; state space representation
• Algorithms for inference and learning in HMMs
  – Computing the marginal likelihood of the observed sequence: forward algorithm
  – Predicting a single hidden state: forward-backward algorithm
  – Predicting an entire sequence of hidden states: Viterbi algorithm
  – Learning HMM parameters: an EM algorithm known as Baum-Welch
