Course:CPSC522/Hidden Markov Models
Hidden Markov Models
Hidden Markov model is a statistical model, where the system is modeled follows Markov process with hidden (unobserved) states.^{[1]}
Principal Author: Jiahong Chen
Collaborators: Dandan wang, Mehrdad Ghomi
Abstract
This page describes HMM as a probabilistic model, introduces the framework of it as well as giving an example to tell how a HMM is established. Also, it introduces three main questions of HMM and gives detailed algorithms to describe how to solve these questions.
Builds on
A hidden Markov model (HMM) is a statistical Markov model, where the system is modeled follows Markov process with hidden (unobserved) states, and builds on Graphical_Models. Also, it can be presented as the simplest dynamic Bayesian network.
Related Pages
As mentioned above, A hidden Markov model (HMM) is a statistical Markov model, which builds on Graphical Models. It is also related to Bayesian network and Markov Decision Process.
Content
Introduction
Hidden Markov Model(HMM) is a probabilistic sequence model used for assigning a label or class to each unit in a sequence, and mapping a sequence of labels to a sequence of observations^{[2]}. It could be applied to numerous fields like speech recognition, natural language processing, bioinformatics, pattern recognize and so on^{[3]}. HMM are able to compute the probability of a sequence of unobservable events, which usually described as hidden events with the help of a sequence of observed events^{[2]}.
HMM gets its name from two defining properties. The first one is states of hidden process follow Markov property, that is, given state , current state is independent to states prior to . The second property is observation is only dependent to , where is unobservable ^{[4]}^{[5]}. Figure [1] indicates the belief representation of an HMM^{[4]}^{[5]}.
Framework of HMM
Definition of HMM
The main procedure of generating a HMM is as time pass by, states will change from one to another and there will be a observation in each new state. In this way, a HMM knows all the states will occur and all the observations it might meet. For example, it has possible states and possible observations. So, one of states will occur each time, and one of observations will appear each time based on current state. Besides, HMM knows the probability of changing from one state to another and the probability of what observation it might meet in each new state in order to increase the sequence. Thus, it is able to transfer from current state to a new one and making a new observation at this new state. Moreover, given a time period of length , the HMM will have states and observations, and at each time, state is paired to observation .
A Hidden Markov model could be defined as follow: Assuming is the set of all possible states while is a set of all possible observations: , where N refers to the number of possible states, and M refers to the number of possible observations. Besides, and both have the length of and indicate state sequence and observation sequence respectively: . In this way, the state transposition probability matrix could be built as: , where means at state , there will be a probability for this state to transform to state . It is defined as follow:
Moreover, output emission probability matrix , which indicates the probability to generate observation at given state , could be established as , where means at time , given state as , there is a probability to observe :
Furthermore, initial state vector , which intend to provide the initial state for a HMM could be defined as , where , and it means the probability for real observation to be at initial time 1.
A Hidden Markov model is decided by initial state vector , state transposition probability matrix and output emission probability matrix . Assuming a HMM is represented by , the model could be expressed as: . Initial state vector for state transposition probability matrix decides hidden Markov chain, and output emission probability matrix determines how the observations is generated from states.
Generation of observation sequence
An observation sequence could be generated according to the definition of HMM.
Algorithm 1 (generation of observation sequence)
input: and output: procedure: 1. Generate initial states according to . 2. Let = 1. 3. Generate , according to output emission matrix 4. Generate according to state transposition probability 5. Let if , go to step 3 else, stop.
Example 1 Here is an example given to explain how a hidden Markov model is established. Assuming there are 4 opaque boxes, each of them contains 10 white or red balls. Balls assigned in each box are shown in Table 1.
Box No.  1  2  3  4 

Red balls  5  3  6  8 
White balls  5  7  4  2 
Grab balls to generate a observation sequence as following rules. First, randomly choose one box from all four boxes with same probability, then grab a ball from this opaque box to record its color and put it back. Second, transfer to next box as following rules: if current box is No.1, the next box must be box No.2; if current box is No.2 and No.3, they have a probability of 0.4 to transfer to the box No.1 and No.2 respectively, and a probability of 0.6 to transfer to box No.3 and No.4 respectively; if current box is No.4, it might stay in box No.4 with probability of 0.5, otherwise it will go to box No.3. Third, move to new box according to the status and rules in step 2. Last, repeat step 2 and 3, for times to generate the observation sequence of ball color with length . For example, let , we might have a observation sequence as follow:
During the whole ball grabbing process, observer will only able to get the color of ball, they cannot know in which box is the ball come from, in an other words, they cannot observe the sequence of box transformation. In another words, box sequence is hidden, while the sequence of balls is observable.
In this way, we could determine possible states , possible observations and initial state vector .
Possible states are:
Possible observations are:
Initial state vector could be defined as:
Furthermore, state transposition probability matrix (Table 2) and output emission probability matrix (Table 3) could be defined.
A  1  2  3  4 

1  0  1  0  0 
2  0.4  0  0.6  0 
3  0  0.4  0  0.6 
4  0  0  0.5  0.5 
B  1  2 

1  0.5  0.5 
2  0.3  0.7 
3  0.6  0.4 
4  0.8  0.2 
In this way a hidden Markov model could be defined as .
Three core problem of HMM
There are three fundamental questions of HMM ^{[3]}, which are widely used to solve real world problems:
1. Probability calculation: given and observation sequence , calculate the probability of the occurrence of .
2. Learning: given observation sequence , estimate attributes of under maximum probability of .
3. Predicting (also named as decoding): given and observation sequence , find out state sequence with maximum probability .
Probability calculation
This question is to compute the probability of a particular output sequence by using parameters of the HMM: . In general, probability calculation problem could be solved by forwardbackward algorithm ^{[3]}^{[6]}. Forwardbackward algorithm includes forward procedure and backward procedure, both of them are able to compute the probability of output sequence individually.
Forward procedure
Definition: Given Markov model , the observation sequence to time as and current state , forward probability could be defined as:
In this way, observation probability for current state could be calculated recursively.
Algorithm 2 (forward algorithm) input: , . output: . procedure: 1. Initialization: . 2. Recursion: , where . 3. Terminal: .
Example 2 Here is an example to present how forward procedure works. Assuming there is a boxball observing system like Example 1 with HMM . State transposition probability matrix and output emission probability matrix are shown in Table 4 and Table 5 respectively. Letting , and using the forward procedure to solve it.
A  1  2  3 

1  0.5  0.2  0.3 
2  0.3  0.5  0.2 
3  0.2  0.3  0.5 
B  1  2 

1  0.5  0.5 
2  0.4  0.6 
3  0.7  0.3 
Solution 1. Calculate initial values: 2. Compute recursively 3. Sum
Backward procedure
Backward procedure is similar to forward procedure. Given Markov model , current state at time and the observation sequence from time to time , backward probability could be defined as:
Similarly, observation probability for current state could be calculated.
Algorithm 3 (backward algorithm) input: , . output: . procedure: 1. Initialization: . 2. Recursion: , where . 3. Terminal: .
Learning
HMM is also used to find the best set of state transition and output emission probabilities. Given an output sequence, this method could derive the maximum likelihood estimate of the parameters of the HMM. In this section, Expectation Maximization algorithm^{[7]} helps HMM to find out that, given a sequence of observations, what are the values of state transposition probability matrix and output emission probability matrix makes such observation most likely ^{[6]}.
Algorithm 4 (Expectation Maximization algorithm) input: . output: , . procedure: 1. Initialization: set values of matrix , as random valid numbers, where and , , . 2. Estep: run forwardbackward algorithm o compute and for : 3. Mstep: reestimate the maximum likelihood of and : 4. Repeat step 2 and 3, until converge.
This algorithm is a variant of forwardbackward algorithm, also called BaumWelch algorithm. It is used for parameter learning for HMM.
Prediction
The final question of HMM is to compute what was the most likely sequence of states with the help of a sequence of observation. Viterbi algorithm uses dynamic programming to solve such problem ^{[3]}.
To apply Viterbi algorithm, some variables (, ) should be introduced.
stands for maximum probability routine among all possible states () at time :
Also, could be calculated:
stands for the node of maximum probability routine among all possible states () at time :
The process of Viterbi algorithm is shown as follow:
Algorithm 5 (Viterbi algorithm) input: , . output: optimal state routine , probability of such routine 1. Initialization: 2. for t = (2, 3, \cdots, T) 3. Stop:
The Viterbi algorithm is similar to forward procedure except only tracks maximum probability and record its corresponding state sequence, rather than tracking the total probability of all observations so far.
Applications
Hidden Markov Model are applied to many fields to discover not immediately observable data sequence ^{[1]}:
 Single Molecule Kinetic analysis^{[8]}
 Cryptanalysis
 Speech recognition
 Speech synthesis
 Partofspeech tagging
 Document Separation in scanning solutions
 Machine translation
 Partial discharge
 Gene prediction
 sequence alignmentAlignment of biosequences
 Time Series Analysis
 Activity recognition
 Protein folding^{[9]}
 Metamorphic Virus Detection^{[10]}
 DNA Motif Discovery^{[11]}
Annotated Bibliography
 ↑ ^{1.0} ^{1.1} ^{1.2} Hidden Markov Model, Available at: https://en.wikipedia.org/wiki/\Hidden_Markov_model [Accessed Jan272016].
 ↑ ^{2.0} ^{2.1} Daniel, Jurafsky (2014). Speech and Language Processing. Prentice Hall. ISBN 9780131873216.
 ↑ ^{3.0} ^{3.1} ^{3.2} ^{3.3} Hang, Li (2012). Statistical Learning Methods (in Chinese). Tsinghua University Press. ISBN 9787302275954.
 ↑ ^{4.0} ^{4.1} Zoubin, Ghahramani (2001). "An Introduction to Hidden Markov Models and Bayesian Networks". International Journal of Pattern Recognition and Artificial Intelligence.
 ↑ ^{5.0} ^{5.1} David, Poole (2010). Artificial Intelligence: Foundations of Computational Agents. Cambridge University Press. ISBN 9780521519007.
 ↑ ^{6.0} ^{6.1} Hidden Markov Models Fundamentals, Available at: http://cs229.stanford.edu/section/cs229hmm.pdf [Accessed Jan292016]
 ↑ Expectation Maximization algorithm, Available at: https://en.wikipedia.org/wiki/Expectation–maximization_algorithm [Accessed Jan302016].
 ↑ NICOLAI, CHRISTOPHER (2013). "SOLVING ION CHANNEL KINETICS WITH THE QuB SOFTWARE". Biophysical Reviews and Letters. 8 (3n04): 191–211. doi:10.1142/S1793048013300053.
 ↑ Stigler, J.; Ziegler, F.; Gieseke, A.; Gebhardt, J. C. M.; Rief, M. (2011). "The Complex Folding Network of Single Calmodulin Molecules". Science. 334 (6055): 512–516. doi:10.1126/science.1207598. PMID 22034433.
 ↑ Wong, W.; Stamp, M. (2006). "Hunting for metamorphic engines". Journal in Computer Virology. 2 (3): 211–229. doi:10.1007/s1141600600287.
 ↑ Wong, K. C.; Chan, T. M.; Peng, C.; Li, Y.; Zhang, Z. (2013). "DNA motif elucidation using belief propagation". Nucleic Acids Research. 41 (16): e153. doi:10.1093/nar/gkt574. PMC 3763557. PMID 23814189.
To Add
1. Add the proof process of three core questions of HMM
2. Add more examples to better explain three core questions of HMM
