Hidden Markov models (HMMs) are often used as black boxes to perform a wide range of classification and clustering tasks. Refer to Part 1 of this series for a detailed introduction to HMMs and to Part 2 for the application of the Forward algorithm. In this final part, we will look at another application of HMMs, the Viterbi Algorithm, and examine how to employ the algorithm to solve an example problem in R.
In part 1 of this post, we defined our HMM with possible states: Hungry (H) and Thirsty (T), and three possible observations: Burger (B), Ice-cream (I) and Soda (S).
The corresponding initial probability vector, IN, is:
Hungry | Thirsty |
0.4 | 0.6 |
With a transition probability matrix, TR:
at t=0 \ at t=1 | Hungry | Thirsty |
Hungry | 0.3 | 0.7 |
Thirsty | 0.8 | 0.2 |
And an emission probability matrix, EM
Burger | Ice-cream | Soda | |
Hungry | 0.6 | 0.3 | 0.1 |
Thirsty | 0.05 | 0.4 | 0.55 |
Given the model we defined and a sequence of observations B–I–S, we are now aiming to answer the following question: after consuming a Burger, Ice-cream and a Soda, what is the most likely state at the end of the meal: Hungry or Thirsty? Here we wish to derive the final hidden state from the B–I–S sequence.
The good news is we have already completed 50% of the work in part 2! Given that we have estimated the probability of each observation of the sequence given both possible states, we can decode the hidden state at the end of lunch by isolating the most probable sequence of hidden states. This is exactly what the Viterbi Algorithm does.
Let’s start by estimating the probability of having a Burger given the two possible states. I could have this Burger because I am Hungry: P_{1}(B|H_{1}) = P_{init}(H) x P(B|H) = 0.4 x 0.6 = 0.24, or because I am Thirsty: P_{1}(B|T_{1}) = P_{init}(T) x P(B|T) = 0.6 x 0.05 = 0.03. Figure 1 represents these two possible paths.
Now let’s move on to the second item on the menu: my Ice-cream. Given my two possible states while I had the Burger, there are now four paths that lead to the two possibles states and to my Ice-cream order (see Figure 2).
Let’s start by assuming that my state while having the Burger was Hungry. I then ordered the Ice-cream for one of two possible reasons:
Considering the above reasons, the following probabilities can be calculated. The probability of being Thirsty while ordering the ice-cream after having a Burger out of Hunger (case i) is therefore
P_{2}(T|I, H_{1}) = P_{1}(B|H_{1}) x P(T|H) x P(I|T) = 0.24 x 0.7 x 0.4 = 0.067.
The probability of still being Hungry while ordering the ice-cream after having a Burger out of Hunger (case ii) is:
P_{2}(H|I, H_{1}) = P_{1}(B|H_{1}) x P(H|H) x P(I|H) = 0.24 x 0.3 x 0.3 = 0.022.
Now, let’s assume instead that I had a Burger because I was Thirsty. The probability of being Thirsty now while ordering the ice-cream after having a Burger out of Thirst (case iii) is therefore:
P_{2}(T|I, T1) = P_{1}(B|T_{1}) x P(T|T) x P(I|T) = 0.03 x 0.2 x 0.4 = 0.002.
The probability of being Hungry while ordering the ice-cream after having a Burger out of Thirst (case iv) is :
P_{2}(H|I, T_{1}) = P_{1}(B|T1) x P(T|H) x P(I|H) = 0.03 x 0.8 x 0.3 = 0.007.
We then consider the most likely path that arrives at each state. Of the two paths that lead to being Hungry, P_{2}(H|I, H_{1}) = 0.022 is the most likely, and so we define P_{2}(H_{2}) = 0.022. Similarly, of the two paths that lead to being Thirsty, P_{2}(T|I, H_{1}) = 0.067 is more likely and so P_{2}(T_{2}) = 0.067.
Again, there are four possible paths resulting in me ordering a Soda as dessert (see Figure 3).
Firstly, let’s assume that I had an Ice-cream because I was Hungry, and then ordered a Soda. What are the probability of being Thirsty or Hungry at this stage? The Thirsty probability is given by:
P_{3}(T|S,H_{2}) = P_{2}(H_{2}) x P(T|H) x P(S|T) = 0.022 x 0.7 x 0.55 = 0.008
while the Hungry state is:
P_{3}(H|S,H_{2}) = P_{2}(H_{2}) x P(H|H) x P(S|H) = 0.022 x 0.3 x 0.1 = 0.0007.
Now, assuming that I was Thirsty when having an Ice-cream instead, and then decided to get a Soda, the probability of still being Thirsty after my Soda is:
P_{3}(T|S,T_{2}) = P_{2}(T_{2}) x P(T|T) x P(S|T)= 0.067 x 0.2 x 0.55 = 0.007.
Finally, the probability of being Hungry while ordering the Soda is:
P_{3}(H|S,T_{2}) = P_{2}(T_{2}) x P(H|T) x P(S|H)= 0.067 x 0.8 x 0.1 = 0.005
Now that we know all the possible paths to both states at every item of the meal, we can trace back to the most probable sequence of hidden states. Let’s start from the end with the Soda. From the four possible paths leading to getting my Soda, the highest probability was 0.008, which is associated with P_{3}(T|S,H_{2}). Therefore, my most likely state when I ordered my Soda was Thirsty.
This probability P_{3}(T|S,H_{2}) is also associated with previously getting the Ice-cream because of a Hungry state, H_{2}. Now looking at both possible paths leading to a Hungry state H_{2}, the path with the highest probability was associated with P_{2}(H|I, H_{1}) (with a probability of 0.022). This probability is also associated with getting my Burger because I was initially Hungry, H_{1}.
Putting all pieces of the puzzle together, we can derive the most probable hidden state sequence which, as illustrated in Figure 4, is: Hungry, H_{1} -> Hungry, H_{2} -> Thirsty, T_{3}.
Using the HMM R library [2], here is a snippet to implement our example and solve it with the Viterbi algorithm:
library(HMM)
# Initialise HMM
hmm = initHMM(c("Hungry", "Thirsty"), c("Burger", "Ice-cream", "Coke"),
startProbs=c(0.4, 0.6),
transProbs=matrix(c(.3, .8, .7, .2), 2),
emissionProbs=matrix(c(0.6, 0.05, 0.3, 0.4, 0.1, 0.55), 3, 2))
# Sequence of observations
observations = c("Burger", "Ice-cream", "Coke")
# Viterbi algor
viterbi_state = viterbi(hmm, observations)
cat("Hidden states sequence is: ", viterbi_state)
cat("Final Hidden state is:", viterbi_state[3])
Hidden states sequence is: "Hungry" "Hungry" "Thirsty"
Final Hidden state is: "Thirsty"
Congratulations, this wraps up our blog post series on demystifying the Hidden Markov Model. We hope you have not only learnt the basics of Hidden Markov models but also have a deeper understanding of their application. Are you Hungry or Thirsty? You know what to do!!