Jump to content

Course:CPSC522/Hidden Markov Models

From UBC Wiki

Hidden Markov Models

Hidden Markov model is a statistical model, where the system is modeled follows Markov process with hidden (unobserved) states.[1]

Principal Author: Jiahong Chen

Collaborators: Dandan wang, Mehrdad Ghomi

Abstract

This page describes HMM as a probabilistic model, introduces the framework of it as well as giving an example to tell how a HMM is established. Also, it introduces three main questions of HMM and gives detailed algorithms to describe how to solve these questions.

Builds on

A hidden Markov model (HMM) is a statistical Markov model, where the system is modeled follows Markov process with hidden (unobserved) states, and builds on Graphical_Models. Also, it can be presented as the simplest dynamic Bayesian network.

Related Pages

As mentioned above, A hidden Markov model (HMM) is a statistical Markov model, which builds on Graphical Models. It is also related to Bayesian network and Markov Decision Process.

Content

Introduction

Hidden Markov Model(HMM) is a probabilistic sequence model used for assigning a label or class to each unit in a sequence, and mapping a sequence of labels to a sequence of observations[2]. It could be applied to numerous fields like speech recognition, natural language processing, bioinformatics, pattern recognize and so on[3]. HMM are able to compute the probability of a sequence of unobservable events, which usually described as hidden events with the help of a sequence of observed events[2].

HMM gets its name from two defining properties. The first one is states of hidden process follow Markov property, that is, given state

St1

, current state

St

is independent to states prior to

St1

. The second property is observation

Ot

is only dependent to

St

, where

St

is unobservable [4][5]. Figure [1] indicates the belief representation of an HMM[4][5].

Figure 1: HMM belief representation[1].

Framework of HMM

Definition of HMM

The main procedure of generating a HMM is as time pass by, states will change from one to another and there will be a observation in each new state. In this way, a HMM knows all the states will occur and all the observations it might meet. For example, it has N possible states and M possible observations. So, one of N states will occur each time, and one of M observations will appear each time based on current state. Besides, HMM knows the probability of changing from one state to another and the probability of what observation it might meet in each new state in order to increase the sequence. Thus, it is able to transfer from current state to a new one and making a new observation at this new state. Moreover, given a time period of length T, the HMM will have T states and observations, and at each time, state it is paired to observation ot.

A Hidden Markov model could be defined as follow: Assuming Q is the set of all possible states while V is a set of all possible observations: Q={q1,q2,,qN}, V={v1,v2,,vM}, where N refers to the number of possible states, and M refers to the number of possible observations. Besides, I and O both have the length of T and indicate state sequence and observation sequence respectively: I=(i1,i2,,iT),O=(o1,o2,,oT). In this way, the state transposition probability matrix A could be built as: A=[aij]N×N, where aij means at state qi, there will be a probability aij for this state to transform to state qj. It is defined as follow:

aij=P(it+1=qj|it=qi),i=1,2,,N;j=1,2,,N;t{1,2,,T1}

Moreover, output emission probability matrix

B

, which indicates the probability to generate observation

vk

at given state

qj

, could be established as

B=[bj(k)]N×M

, where

bj(k)

means at time

t

, given state as

qi

, there is a probability

bj(k)

to observe

vk

:

bj(k)=P(ot=vk|it=qi),k=1,2,,M;j=1,2,,N

Furthermore, initial state vector π, which intend to provide the initial state for a HMM could be defined as π=(πi), where πi=P(i1=qi),i=1,2,,N, and it means the probability for real observation i to be qi at initial time 1.

A Hidden Markov model is decided by initial state vector π, state transposition probability matrix A and output emission probability matrix B. Assuming a HMM is represented by λ, the model could be expressed as: λ=(A,B,π). Initial state vector π for state transposition probability matrix A decides hidden Markov chain, and output emission probability matrix B determines how the observations is generated from states.

Generation of observation sequence

An observation sequence O={o1,o2,,cT} could be generated according to the definition of HMM.

Algorithm 1 (generation of observation sequence)
 input: λ=(A,B,π) and T
 output: O={o1,o2,,cT}
 procedure:
 1. Generate initial states ii according to π.
 2. Let t = 1.
 3. Generate ot, according to output emission matrix bit(k)
 4. Generate it+1 according to state transposition
    probability ait,it+1
 5. Let t=t+1
 	if t<T, go to step 3
 	else, stop.

Example 1 Here is an example given to explain how a hidden Markov model is established. Assuming there are 4 opaque boxes, each of them contains 10 white or red balls. Balls assigned in each box are shown in Table 1.

Table 1: Ball arrangement
Box No. 1 2 3 4
Red balls 5 3 6 8
White balls 5 7 4 2

Grab balls to generate a observation sequence as following rules. First, randomly choose one box from all four boxes with same probability, then grab a ball from this opaque box to record its color and put it back. Second, transfer to next box as following rules: if current box is No.1, the next box must be box No.2; if current box is No.2 and No.3, they have a probability of 0.4 to transfer to the box No.1 and No.2 respectively, and a probability of 0.6 to transfer to box No.3 and No.4 respectively; if current box is No.4, it might stay in box No.4 with probability of 0.5, otherwise it will go to box No.3. Third, move to new box according to the status and rules in step 2. Last, repeat step 2 and 3, for T times to generate the observation sequence of ball color with length T. For example, let T=10, we might have a observation sequence as follow:

O={red,red,red,white,red,red,white,red,white,white}

During the whole ball grabbing process, observer will only able to get the color of ball, they cannot know in which box is the ball come from, in an other words, they cannot observe the sequence of box transformation. In another words, box sequence is hidden, while the sequence of balls is observable.

In this way, we could determine possible states Q, possible observations V and initial state vector π.

Possible states are:

Q={BoxNo.1,BoxNo.2,BoxNo.3,BoxNo.4},whereN=4

Possible observations are:

V={red,white},whereM=4

Initial state vector could be defined as:

π=(0.25,0.25,0.25,0.25)T

Furthermore, state transposition probability matrix A (Table 2) and output emission probability matrix B (Table 3) could be defined.

Table 2: State transposition probability matrix: A
A 1 2 3 4
1 0 1 0 0
2 0.4 0 0.6 0
3 0 0.4 0 0.6
4 0 0 0.5 0.5
Table 3: Output emission probability matrix
B 1 2
1 0.5 0.5
2 0.3 0.7
3 0.6 0.4
4 0.8 0.2

In this way a hidden Markov model could be defined as λ=(A,B,π).

Three core problem of HMM

There are three fundamental questions of HMM [3], which are widely used to solve real world problems:

1. Probability calculation: given λ=(A,B,π) and observation sequence O=(o1,o2,,oT), calculate the probability P(O|λ) of the occurrence of O.

2. Learning: given observation sequence O=(o1,o2,,oT), estimate attributes of λ under maximum probability of P(O|λ).

3. Predicting (also named as decoding): given λ=(A,B,π) and observation sequence O=(o1,o2,,oT), find out state sequence I=(i1,i2,,iT) with maximum probability P(I|λ).

Probability calculation

This question is to compute the probability of a particular output sequence by using parameters of the HMM: λ=(A,B,π). In general, probability calculation problem could be solved by forward-backward algorithm [3][6]. Forward-backward algorithm includes forward procedure and backward procedure, both of them are able to compute the probability of output sequence individually.

Forward procedure

Definition: Given Markov model λ, the observation sequence to time t as o1,o2,,ot and current state qi, forward probability αt(i) could be defined as:

αt(i)=P(o1,o2,,ot,it=qi|λ)

In this way, observation probability P(O|λ) for current state could be calculated recursively.

 Algorithm 2 (forward algorithm)
 input: λ=(A,B,π), O.
 output: P(O|λ).
 procedure:
 1. Initialization:
 	α1(i)=πibi(o1),i=1,2,,N.
 2. Recursion:
 	αt+1(i)=[j=1Nαt(j)aji]bi(ot+1),
 	where t=1,2,,T1,i=1,2,,N.
 3. Terminal:
 	P(O|λ)=i=1NαT(i).

Example 2 Here is an example to present how forward procedure works. Assuming there is a box-ball observing system like Example 1 with HMM λ=(A,B,π), Q={1,2,3}, V={red,white}, π=(0.2,0.4,0.4)T. State transposition probability matrix and output emission probability matrix are shown in Table 4 and Table 5 respectively. Letting T=3,O=(red,white,red), and using the forward procedure to solve it.

Table 4: State transposition probability matrix for example 2
A 1 2 3
1 0.5 0.2 0.3
2 0.3 0.5 0.2
3 0.2 0.3 0.5
Table 5: Output emission probability matrix for example 2
B 1 2
1 0.5 0.5
2 0.4 0.6
3 0.7 0.3
 Solution
 1. Calculate initial values:
   α1(1)=π1b1(o1)=0.1
   α1(2)=π2b2(o1)=0.16
   α1(3)=π3b3(o1)=0.28
 2. Compute recursively
   α2(1)=[i=13α1(i)ai1]b1(o2)=0.154×0.5=0.077
   α2(2)=[i=13α1(i)ai2]b2(o2)=0.184×0.6=0.1104
   α2(3)=[i=13α1(i)ai3]b3(o2)=0.202×0.3=0.0606
   α3(1)=[i=13α2(i)ai1]b1(o3)=0.04187
   α3(2)=[i=13α2(i)ai2]b2(o3)=0.03551
   α3(3)=[i=13α2(i)ai3]b3(o3)=0.05284
 3. Sum
   P(O|λ)=i=13α3(i)=0.13022
Backward procedure

Backward procedure is similar to forward procedure. Given Markov model λ, current state qi at time t and the observation sequence ot+1,ot+2,,oT from time t to time T, backward probability βt(i) could be defined as:

βt(i)=P(ot+1,ot+2,,oT,it=qi|λ)

Similarly, observation probability P(O|λ) for current state could be calculated.

 Algorithm 3 (backward algorithm)
 input: λ=(A,B,π), O.
 output: P(O|λ).
 procedure:
 1. Initialization:
 	βT(i)=1,i=1,2,,N.
 2. Recursion: 
 	βt(i)=j=1Nβt+1(j)aijbi(ot+1),
 	where t=T1,T2,,1,i=1,2,,N.
 3. Terminal:
 	P(O|λ)=i=1Nπibi(o1)β1(i).

Learning

HMM is also used to find the best set of state transition and output emission probabilities. Given an output sequence, this method could derive the maximum likelihood estimate of the parameters of the HMM. In this section, Expectation Maximization algorithm[7] helps HMM to find out that, given a sequence of observations, what are the values of state transposition probability matrix A and output emission probability matrix B makes such observation most likely [6].

 Algorithm 4 (Expectation Maximization algorithm)
 input: O.
 output: λ=(A,B,π), argmaxP(O|λ).
 procedure:
 1. Initialization: set values of matrix A, B as random
    valid numbers, where Ai0=0 and B0(k)=0, i=1,2,,N,
    k=1,2,,M.
 2. E-step: run forward-backward algorithm o compute αi and
    βi for i=1,2,,N:
 	γt(i,j):=αt(i)aijbj(ot+1)βt+1(i)
 3. M-step: re-estimate the maximum likelihood of A and B:
 	aij:=t=1Tγt(i,j)j=1Mt=1Tγt(i,j)
 	bj(k):=j=1Mt=1T1{ot=vk}γt(i,j)j=1Mt=1Tγt(i,j)
 4. Repeat step 2 and 3, until converge.

This algorithm is a variant of forward-backward algorithm, also called Baum-Welch algorithm. It is used for parameter learning for HMM.

Prediction

The final question of HMM is to compute what was the most likely sequence of states I with the help of a sequence of observation. Viterbi algorithm uses dynamic programming to solve such problem [3].

To apply Viterbi algorithm, some variables (δ, ψ) should be introduced.

δ stands for maximum probability routine among all possible states (i1,i2,,iT) at time t:

δt(i)=maxi1,i2,,it1P(it=i,it1,,i1,o1,o2,,ot|λ),i=q,2,,N

Also, δt+1(i) could be calculated:

δt+1(i)=maxi1,i2,,itP(it+1=i,it,,i1,o1,o2,,ot|λ)
=max1jN[δt(j)aji]bi(ot+1),i=1,2,,N;t=1,2,,T1

ψ stands for the t1 node of maximum probability routine among all possible states (i1,i2,,iT) at time t1:

ψt(i)=argmax1jN[δt1(j)aji],i=1,2,,N

The process of Viterbi algorithm is shown as follow:

 Algorithm 5 (Viterbi algorithm)
 input: λ=(A,B,π), O.
 output: optimal state routine I*=(i1*,i2*,,iT*),
 	probability of such routine P*
 1. Initialization: 
 	δ1(i)=piibi(o1),i=1,2,,N
 	ψ1(i)=0,i=1,2,,N
 2. for t = (2, 3, \cdots, T)
 	δt(i)=max1jN[δt1(j)aji]bi(ot),i=1,2,,N;
 	ψt(i)=argmax1jN[δt1(j)aji],i=1,2,,N
 3. Stop:
 	P*=max1jNδt(i)
 	iT*=argmax1jN[δt(i)]

The Viterbi algorithm is similar to forward procedure except only tracks maximum probability and record its corresponding state sequence, rather than tracking the total probability of all observations so far.

Applications

Hidden Markov Model are applied to many fields to discover not immediately observable data sequence [1]:

Annotated Bibliography

  1. 1.0 1.1 1.2 Hidden Markov Model, Available at: https://en.wikipedia.org/wiki/\Hidden_Markov_model [Accessed Jan-27-2016].
  2. 2.0 2.1 Daniel, Jurafsky (2014). Speech and Language Processing. Prentice Hall. ISBN 978-0131873216.
  3. 3.0 3.1 3.2 3.3 Hang, Li (2012). Statistical Learning Methods (in Chinese). Tsinghua University Press. ISBN 978-7-302-27595-4.
  4. 4.0 4.1 Zoubin, Ghahramani (2001). "An Introduction to Hidden Markov Models and Bayesian Networks". International Journal of Pattern Recognition and Artificial Intelligence.
  5. 5.0 5.1 David, Poole (2010). Artificial Intelligence: Foundations of Computational Agents. Cambridge University Press. ISBN 978-0521519007.
  6. 6.0 6.1 Hidden Markov Models Fundamentals, Available at: http://cs229.stanford.edu/section/cs229-hmm.pdf [Accessed Jan-29-2016]
  7. Expectation Maximization algorithm, Available at: https://en.wikipedia.org/wiki/Expectation–maximization_algorithm [Accessed Jan-30-2016].
  8. NICOLAI, CHRISTOPHER (2013). "SOLVING ION CHANNEL KINETICS WITH THE QuB SOFTWARE". Biophysical Reviews and Letters. 8 (3n04): 191–211. doi:10.1142/S1793048013300053.
  9. Stigler, J.; Ziegler, F.; Gieseke, A.; Gebhardt, J. C. M.; Rief, M. (2011). "The Complex Folding Network of Single Calmodulin Molecules". Science. 334 (6055): 512–516. doi:10.1126/science.1207598. PMID 22034433.
  10. Wong, W.; Stamp, M. (2006). "Hunting for metamorphic engines". Journal in Computer Virology. 2 (3): 211–229. doi:10.1007/s11416-006-0028-7.
  11. Wong, K. -C.; Chan, T. -M.; Peng, C.; Li, Y.; Zhang, Z. (2013). "DNA motif elucidation using belief propagation". Nucleic Acids Research. 41 (16): e153. doi:10.1093/nar/gkt574. PMC 3763557. PMID 23814189.

To Add

1. Add the proof process of three core questions of HMM

2. Add more examples to better explain three core questions of HMM

Some rights reserved
Permission is granted to copy, distribute and/or modify this document according to the terms in Creative Commons License, Attribution-NonCommercial-ShareAlike 3.0. The full text of this license may be found here: CC by-nc-sa 3.0