Introduction To Hidden Markov Models
Markov Models
• Set of states: {s_1, s_2, ..., s_N}
• Process moves from one state to another, generating a sequence of
  states: s_{i1}, s_{i2}, ..., s_{ik}, ...
• Markov chain property: the probability of each subsequent state
  depends only on what was the previous state:
      P(s_{ik} | s_{i1}, s_{i2}, ..., s_{ik-1}) = P(s_{ik} | s_{ik-1})
• To define a Markov model, the following probabilities have to be
  specified: transition probabilities a_ij = P(s_j | s_i) and initial
  probabilities π_i = P(s_i)
Example of Markov Model
• Two states: ‘Rain’ and ‘Dry’.
• Transition probabilities: P(‘Rain’|‘Rain’)=0.3, P(‘Dry’|‘Rain’)=0.7,
  P(‘Rain’|‘Dry’)=0.2, P(‘Dry’|‘Dry’)=0.8
• To be specified: matrix of transition probabilities A = (a_ij),
  a_ij = P(s_j | s_i)
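The chain-rule computation for this model can be sketched in a few lines of Python. The initial probabilities P(‘Rain’)=0.4 and P(‘Dry’)=0.6 are an assumption for the example, since this slide specifies only the transition matrix:

```python
# Rain/Dry Markov chain: P(s_1..s_k) = P(s_1) * prod_t P(s_t | s_{t-1}).
A = {('Rain', 'Rain'): 0.3, ('Rain', 'Dry'): 0.7,
     ('Dry', 'Rain'): 0.2, ('Dry', 'Dry'): 0.8}
pi = {'Rain': 0.4, 'Dry': 0.6}   # assumed initial probabilities

def chain_prob(seq):
    """Probability of a state sequence under the Markov chain."""
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[(prev, cur)]
    return p

p = chain_prob(['Dry', 'Dry', 'Rain', 'Rain'])   # 0.6 * 0.8 * 0.2 * 0.3
```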
Example of Hidden Markov Model
• Two states : ‘Low’ and ‘High’ atmospheric pressure.
• Two observations : ‘Rain’ and ‘Dry’.
• Transition probabilities: P(‘Low’|‘Low’)=0.3, P(‘High’|‘Low’)=0.7,
  P(‘Low’|‘High’)=0.2, P(‘High’|‘High’)=0.8
• Observation probabilities: P(‘Rain’|‘Low’)=0.6, P(‘Dry’|‘Low’)=0.4,
  P(‘Rain’|‘High’)=0.4, P(‘Dry’|‘High’)=0.6
• Initial probabilities: say P(‘Low’)=0.4, P(‘High’)=0.6.
Calculation of observation sequence probability
• Suppose we want to calculate the probability of a sequence of
  observations in our example, {‘Dry’,’Rain’}.
• Consider all possible hidden state sequences (the states themselves
  are NOT OBSERVABLE):
      P({‘Dry’,‘Rain’}) = P({‘Dry’,‘Rain’}, {‘Low’,‘Low’}) +
      P({‘Dry’,‘Rain’}, {‘Low’,‘High’}) + P({‘Dry’,‘Rain’}, {‘High’,‘Low’}) +
      P({‘Dry’,‘Rain’}, {‘High’,‘High’})
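This sum over all hidden state sequences can be computed directly by brute force. A minimal sketch for the Low/High example (P(‘Dry’|‘High’) is taken as 0.6 so that each state’s emission probabilities sum to one):

```python
from itertools import product

# Parameters of the Low/High example HMM
states = ['Low', 'High']
pi = {'Low': 0.4, 'High': 0.6}
A = {('Low', 'Low'): 0.3, ('Low', 'High'): 0.7,
     ('High', 'Low'): 0.2, ('High', 'High'): 0.8}
B = {('Low', 'Rain'): 0.6, ('Low', 'Dry'): 0.4,
     ('High', 'Rain'): 0.4, ('High', 'Dry'): 0.6}

def sequence_prob(obs):
    """P(obs) = sum over every hidden path of pi * emissions * transitions."""
    total = 0.0
    for path in product(states, repeat=len(obs)):
        p = pi[path[0]] * B[(path[0], obs[0])]
        for t in range(1, len(obs)):
            p *= A[(path[t - 1], path[t])] * B[(path[t], obs[t])]
        total += p
    return total

p = sequence_prob(['Dry', 'Rain'])   # sums the four terms above, ≈ 0.232
```

Enumeration costs O(N^T) and is only feasible for tiny examples; the Forward Algorithm introduced below avoids it.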
Three Basic Problems in HMM
• 1. The Evaluation Problem – Given a model M and a sequence of
  observations X = (X_1, X_2, ..., X_T), what is the probability
  P(X | M), i.e., the probability that the model generates the
  observations? (How to evaluate an HMM? The Forward Algorithm.)
• 2. The Decoding Problem – Given a model M and a sequence of
  observations X = (X_1, X_2, ..., X_T), what is the most likely state
  sequence (s_0, s_1, ..., s_T) in the model that produces the
  observations? (How to decode an HMM? The Viterbi Algorithm.)
• For Markov chains, the output symbols are the same as the states.
– See hot weather: we’re in state hot
• But in part-of-speech tagging (and other things)
– The output symbols are words
– But the hidden states are part-of-speech tags
• So we need an extension!
• A Hidden Markov Model is an extension of a Markov chain in
which the output symbols are not the same as the states.
• This means we don’t know which state we are in.
08/04/2020 15
Hidden Markov Models
• States Q = q_1, q_2, ..., q_N
• Observations O = o_1, o_2, ..., o_N
  – Each observation is a symbol from a vocabulary V = {v_1, v_2, ..., v_V}
• Transition probabilities
  – Transition probability matrix A = {a_ij}
        a_ij = P(q_t = j | q_{t-1} = i),  1 ≤ i, j ≤ N
• Observation likelihoods
  – Output probability matrix B = {b_i(k)}
        b_i(k) = P(X_t = o_k | q_t = i)
• Special initial probability vector π
        π_i = P(q_1 = i),  1 ≤ i ≤ N
Hidden Markov Models
• Some constraints:
        Σ_{j=1}^{N} a_ij = 1,  1 ≤ i ≤ N
        π_i = P(q_1 = i),  1 ≤ i ≤ N
        Σ_{k=1}^{M} b_i(k) = 1
        Σ_{j=1}^{N} π_j = 1
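These constraints just say that every row of A and B, and the vector π, must be a probability distribution. A small machine-check, using the Low/High example values (emission rows use P(‘Dry’|‘High’)=0.6 so that each row sums to one):

```python
# Row-stochastic checks for HMM parameters.
A  = [[0.3, 0.7], [0.2, 0.8]]   # a[i][j] = P(q_t = j | q_{t-1} = i)
B  = [[0.4, 0.6], [0.6, 0.4]]   # b[i][k]; rows: Low, High; cols: Dry, Rain
pi = [0.4, 0.6]                 # pi[i] = P(q_1 = i)

def is_stochastic(rows, tol=1e-9):
    """True if every row is non-negative and sums to 1."""
    return all(abs(sum(r) - 1.0) < tol and min(r) >= 0.0 for r in rows)

ok = is_stochastic(A) and is_stochastic(B) and is_stochastic([pi])
```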
Assumptions
• Markov assumption:
        P(q_i | q_1 ... q_{i-1}) = P(q_i | q_{i-1})
• Output-independence assumption:
        P(o_t | q_1 ... q_t, o_1 ... o_{t-1}) = P(o_t | q_t)
Eisner task
• Given
– Ice Cream Observation Sequence: 1,2,3,2,2,2,3…
• Produce:
– Weather Sequence: H,C,H,H,H,C…
HMM for ice cream
HMM
A colored ball choosing example:
    Observation: RRGGBRGR
    State Sequence: ??
• Here:
  – States S = {U1, U2, U3} (the urns)
  – Symbols V = {R, G, B}
• For an observation sequence O = {o_1 ... o_n}
• And a state sequence Q = {q_1 ... q_n}
• π_i is P(q_1 = U_i)

A =        U1    U2    U3
     U1    0.1   0.4   0.5
     U2    0.6   0.2   0.2
     U3    0.3   0.4   0.3

B =        R     G     B
     U1    0.3   0.5   0.2
     U2    0.1   0.4   0.5
     U3    0.6   0.1   0.3
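Given λ = (A, B, π), the generative story behind this example can be sketched by sampling. π is assumed uniform over the three urns here, since the slide leaves it unspecified:

```python
import random

# Urn (state) and ball-colour (symbol) model from the slide.
states  = ['U1', 'U2', 'U3']
symbols = ['R', 'G', 'B']
A = {'U1': [0.1, 0.4, 0.5],
     'U2': [0.6, 0.2, 0.2],
     'U3': [0.3, 0.4, 0.3]}
B = {'U1': [0.3, 0.5, 0.2],
     'U2': [0.1, 0.4, 0.5],
     'U3': [0.6, 0.1, 0.3]}
pi = [1 / 3, 1 / 3, 1 / 3]   # assumed uniform start distribution

def sample(n, seed=0):
    """Generate a hidden state sequence and an observation sequence."""
    rng = random.Random(seed)
    q = rng.choices(states, weights=pi)[0]
    qs, os = [], []
    for _ in range(n):
        qs.append(q)
        os.append(rng.choices(symbols, weights=B[q])[0])  # emit a ball colour
        q = rng.choices(states, weights=A[q])[0]          # move to next urn
    return qs, os

qs, os = sample(8)   # one run of length 8, like RRGGBRGR
```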
Model Definition
• The model is λ = (A, B, π)
• 3. How to adjust λ to best maximize P(O | λ)?
  – Re-estimate λ
Three basic problems (contd.)
• Problem 1: Likelihood of a sequence
– Forward Procedure
– Backward Procedure
• Problem 2: Best state sequence
– Viterbi Algorithm
• Problem 3: Re-estimation
– Baum-Welch ( Forward-Backward Algorithm )
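Problem 1 is solved by the Forward Procedure, which costs O(N²T) instead of enumerating all N^T hidden sequences. A minimal sketch on the Low/High example (states and symbols integer-coded; P(‘Dry’|‘High’) taken as 0.6 so emission rows sum to one):

```python
def forward(obs, pi, A, B):
    """Forward Procedure: return P(O | lambda).

    alpha[i] holds P(o_1..o_t, q_t = i) for the current time step t.
    """
    N = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    return sum(alpha)

pi = [0.4, 0.6]                  # states 0 = Low, 1 = High
A  = [[0.3, 0.7], [0.2, 0.8]]
B  = [[0.4, 0.6], [0.6, 0.4]]    # symbol 0 = Dry, 1 = Rain
p = forward([0, 1], pi, A, B)    # P({'Dry','Rain'}), ≈ 0.232
```

The result agrees with summing over all four hidden state sequences as in the earlier slide.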
Problem 2
• Given an observation sequence O = {o_1 ... o_T}
  – Get the “best” state sequence Q = {q_1 ... q_T}, i.e., the one that
    maximizes P(Q | O, λ)
• Solution:
  1. The state that is individually most likely at each position i
  2. The best state sequence given all the previously observed states
     and observations
Viterbi Algorithm
Example
...and so on
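A sketch of the Viterbi recursion on the ice-cream task from the Eisner example. The transition and emission numbers below are illustrative assumptions (the commonly used values for this task), not taken from this deck:

```python
def viterbi(obs, pi, A, B):
    """Return the most likely state sequence for obs (Problem 2)."""
    N = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]   # best path score so far
    back = []                                          # backpointers per step
    for o in obs[1:]:
        psi, new = [], []
        for j in range(N):
            best = max(range(N), key=lambda i: delta[i] * A[i][j])
            psi.append(best)
            new.append(delta[best] * A[best][j] * B[j][o])
        back.append(psi)
        delta = new
    q = max(range(N), key=lambda i: delta[i])
    path = [q]
    for psi in reversed(back):                         # follow backpointers
        q = psi[q]
        path.append(q)
    return list(reversed(path))

# states: 0 = Hot, 1 = Cold; symbol k = (k+1) ice creams eaten
pi = [0.8, 0.2]                          # assumed start probabilities
A  = [[0.6, 0.4], [0.5, 0.5]]            # assumed transitions
B  = [[0.2, 0.4, 0.4], [0.5, 0.4, 0.1]]  # assumed emissions for 1, 2, 3
path = viterbi([2, 0, 2], pi, A, B)      # observations 3, 1, 3 ice creams
```

With these numbers the decoded weather for 3, 1, 3 is Hot, Cold, Hot, matching the H,C,H pattern on the Eisner task slide.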
CS626-449: NLP, Speech and Web-Topics-in-AI
Pushpak Bhattacharyya
CSE Dept., IIT Bombay
Lecture 38-39: Baum Welch Algorithm; HMM training
Baum Welch algorithm
[Figure: a two-state machine with states q and r; arcs labelled with the
output symbols a and b]
Probabilities to be used
1. α_i(t) — forward probability:
       α_i(t) = P(W_{1,t-1}, S_t = s^i),  t > 0
       α_i(1) = 1.0 if i = 0, 0 otherwise
       P(W_{1,n}) = Σ_{i=1}^{T} P(W_{1,n}, S_{n+1} = s^i) = Σ_{i=1}^{T} α_i(n+1)
       α_j(t+1) = Σ_{i=1}^{T} α_i(t) · P(s^i --w_k--> s^j)
   (T = number of states; w_k = W_t is the symbol emitted on the transition)
[Trellis: outputs W_1 W_2 ... W_{n-1} W_n along states S_1 S_2 ... S_n S_{n+1}]
Probabilities to be used, contd…
2. β_i(t) — backward probability:
       β_i(t) = P(W_{t,n} | S_t = s^i),  t ≥ 1
       β_1(1) = P(W_{1,n} | S_1 = s^1) = P(W_{1,n})
       β_i(n+1) = P(ε | S_{n+1} = s^i) = 1
       β_i(t-1) = Σ_{j=1}^{T} P(s^i --w_k--> s^j) · β_j(t)
Exercise 1: Prove the following:
       P(W_{1,n}) = Σ_{j=1}^{T} α_j(t) · β_j(t)
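Exercise 1 can be checked numerically: the sum Σ_j α_j(t)·β_j(t) must come out the same at every time index t. The sketch below uses the arc-emission notation of these slides on a small toy machine whose probabilities are assumed for illustration:

```python
# Arc-emission model: prob[(i, w, j)] = P(s^i --w--> s^j).
# Toy machine (assumed): states q, r; symbols a, b; start state q.
prob = {('q', 'a', 'q'): 0.5, ('q', 'b', 'q'): 0.3, ('q', 'a', 'r'): 0.2,
        ('r', 'b', 'q'): 0.6, ('r', 'a', 'r'): 0.4}
states, start = ['q', 'r'], 'q'
W = 'abab'
n = len(W)

# alpha[t][s] = P(W_1..W_t, S_{t+1} = s); alpha[0] is the start distribution
alpha = [{s: 1.0 if s == start else 0.0 for s in states}]
for t in range(n):
    alpha.append({j: sum(alpha[t][i] * prob.get((i, W[t], j), 0.0)
                         for i in states) for j in states})

# beta[t][s] = P(W_{t+1}..W_n | S_{t+1} = s); beta[n] is all ones
beta = [{s: 1.0 for s in states}]
for t in range(n - 1, -1, -1):
    beta.insert(0, {i: sum(prob.get((i, W[t], j), 0.0) * beta[0][j]
                           for j in states) for i in states})

# The identity: sum_s alpha(t) * beta(t) equals P(W) at every index t
p_W = [sum(alpha[t][s] * beta[t][s] for s in states) for t in range(n + 1)]
```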
Start of Baum-Welch algorithm
[Figure: a deterministic two-state machine over states q and r with output
symbols a and b]
Transition counts observed in the string:
    q --a--> r : 5
    q --b--> q : 3
    r --a--> q : 3
    r --b--> q : 2
so, for example, P(q --b--> q) = 3/8.
In general:
    P(s^i --w_k--> s^j) = c(s^i --w_k--> s^j) / Σ_{l=1}^{T} Σ_{m=1}^{A} c(s^i --w_m--> s^l)
    (T = #states, A = #alphabet symbols)
Now if we have non-deterministic transitions, then multiple state sequences
are possible for the given output sequence (ref. to previous slide’s figure).
Our aim is to find the expected counts in this case.
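For the deterministic case, the counting formula above is a one-pass maximum-likelihood estimate. A sketch, with a labelled run engineered to reproduce the counts on the slide (the run itself is a constructed example):

```python
from collections import Counter

def estimate_arc_probs(labeled_runs):
    """ML estimate: P(s^i --w--> s^j) = c(i,w,j) / (count of arcs leaving s^i)."""
    counts = Counter()
    for state_seq, outputs in labeled_runs:
        for t, w in enumerate(outputs):
            counts[(state_seq[t], w, state_seq[t + 1])] += 1
    out_total = Counter()                    # denominator: arcs out of each state
    for (i, w, j), c in counts.items():
        out_total[i] += c
    return {arc: c / out_total[arc[0]] for arc, c in counts.items()}

# One run yielding the slide's counts:
# q--a-->r: 5, q--b-->q: 3, r--a-->q: 3, r--b-->q: 2
states  = list('qrqrqrqrqrqqqq')
outputs = list('aaaaaaababbbb')
probs = estimate_arc_probs([(states, outputs)])
```

Here `probs[('q','b','q')]` comes out to 3/8, exactly as on the slide.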
Interplay Between Two Equations

    P(s^i --W_k--> s^j) = C(s^i --W_k--> s^j) / Σ_{l=1}^{T} Σ_{m=1}^{A} C(s^i --W_m--> s^l)

    C(s^i --W_k--> s^j) = Σ_{S_{1,n+1}} P(S_{1,n+1} | W_{1,n}) · n(s^i --W_k--> s^j, S_{1,n+1}, W_{1,n})

where n(s^i --W_k--> s^j, S_{1,n+1}, W_{1,n}) is the number of times the
transition s^i --W_k--> s^j occurs in the state sequence S_{1,n+1} for the
string W_{1,n}.
Learning probabilities

Actual (desired) HMM:
    q --a--> q : 0.67    q --b--> q : 0.17    q --a--> r : 0.16    r --b--> q : 1.0

Initial guess:
    q --a--> q : 0.48    q --b--> q : 0.48    q --a--> r : 0.04    r --b--> q : 1.0
One run of Baum-Welch algorithm: string ababa

State sequence    P(path)   q--a-->r   r--b-->q   q--b-->q   q--a-->q
q r q r q q       0.00077   0.00154    0.00154    0          0.00077
q r q q q q       0.00442   0.00442    0.00442    0.00442    0.00884
q q q r q q       0.00442   0.00442    0.00442    0.00442    0.00884
q q q q q q       0.02548   0.0        0.0        0.05096    0.07644
Rounded Total     0.035     0.01       0.01       0.06       0.095
New Probabilities           0.06       1.0        0.36       0.581
                            (= 0.01 / (0.01 + 0.06 + 0.095))

Each arc column holds P(path) × (number of times that arc occurs in the path).
This way, through multiple iterations, the probability values will converge.
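One such re-estimation step can be sketched by brute-force enumeration of state sequences, as the table does. Two assumptions here: paths start and end in state q (matching the sequences listed in the table), and the initial-guess value for q --a--> r is read as 0.04, which is what the table's path probabilities imply:

```python
from itertools import product

# Initial-guess arc-emission probabilities (q --a--> r read as 0.04)
prob = {('q', 'a', 'q'): 0.48, ('q', 'b', 'q'): 0.48,
        ('q', 'a', 'r'): 0.04, ('r', 'b', 'q'): 1.0}
states, string = ['q', 'r'], 'ababa'

def path_prob(path):
    p = 1.0
    for t, w in enumerate(string):
        p *= prob.get((path[t], w, path[t + 1]), 0.0)
    return p

exp = {arc: 0.0 for arc in prob}        # expected (unnormalised) arc counts
for interior in product(states, repeat=len(string) - 1):
    path = ('q',) + interior + ('q',)   # assumed: start and end in q
    p = path_prob(path)
    if p == 0.0:
        continue
    for t, w in enumerate(string):
        exp[(path[t], w, path[t + 1])] += p   # P(path) * count of this arc

# Re-estimate: normalise expected counts over all arcs leaving each state
out = {}
for (i, w, j), c in exp.items():
    out[i] = out.get(i, 0.0) + c
new_prob = {arc: c / out[arc[0]] for arc, c in exp.items()}
```

The re-estimated values move away from the initial guess toward the desired HMM, close to the table's rounded figures, and r --b--> q stays at 1.0.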
Applying Naïve Bayes

    P(S_{1,n+1} | W_{1,n}) = P(S_{1,n+1}, W_{1,n}) / P(W_{1,n})

    = (1 / P(W_{1,n})) · P(S_1) · P(S_2 W_1 | S_1) · P(S_3 W_2 | S_1 S_2 W_1) · P(S_4 W_3 | S_1 S_2 S_3 W_1 W_2) · ...

    = (P(S_1) / P(W_{1,n})) · Π_{i=1}^{n} P(S_{i+1} W_i | S_i)

(the last step uses the Markov and output-independence assumptions)

The joint probability of a particular transition at position t is

    P(S_t = s^i, S_{t+1} = s^j, W_t = w_k, W_{1,n}) = α_i(t) · P(s^i --w_k--> s^j) · β_j(t+1)

so the expected count becomes

    C(s^i --w_k--> s^j) = (1 / P(W_{1,n})) · Σ_{t=1}^{n} α_i(t) · P(s^i --w_k--> s^j) · β_j(t+1)
i i i