# Course:CPSC522/Markov Chains

## Title

This is a writeup of Discrete Markov chains, its classification and the limiting and stationary distribution

Principal Author: Surbhi Palande

Collaborators: Kevin Dsouza

## Abstract

Stochastic processes basically just means random variables evolving over time. This is an observation of dependent random variables. Stochastic processes can be classified in two main families:

1. Markov Processes where roughly speaking the probability of being in a state X at time t+1 is only dependent on the state the system is in at time ${\displaystyle X_{t}}$
2. Martingaleswhere roughly speaking you make a best estimate at time instance “s” of the probability of the system being in state X at a future time instance “t”. This estimate is refered to as ${\displaystyle X_{s}}$ (t > s)

In this writeup, we study Markov Chains that abide to the Markov property. Markov property is a certain conditional probability statement. The past and the future are conditionally independent given the present.

### Builds on

Knowledge of probability distributions is assumed

## Content

Markov chains named after the Russian mathematician Andrey Markov are "a stochastic model. They describe a sequence of possible events in which, the probability of each event depends only on the state attained in the previous event. The future depends only on the present and not on the past! It is a mathematical system that experiences transitions from one state to another according to certain probabilistic rules. The defining characteristic of a Markov chain is that no matter how the process arrived at its present state, the possible future states are fixed. In other words, the probability of transitioning to any particular state is dependent solely on the current state and hence they are referred to a "Memoryless". Markov Chains can be modeled as a finite state machine that shows how a system transitions from one state to another and with what probability. In the language of conditional probability and random variables, a Markov chain is a sequence of random variables satisfying the rule of conditional independence:
For any positive integer n and possible states ${\displaystyle i_{0},i_{1},i_{2},i_{n-1},i_{n}}$ of the random variables,
${\displaystyle P(X_{n}=i_{n}|X_{n-1}=i_{n-1})=P(X_{n}=i_{n}|X_{n-1}=i_{n-1}|X_{n-2}=i_{n-2}|....X_{2}=i_{2}|X_{1}=i_{1}|X_{0}=i_{0})}$

Time Homogenous Markov Chains are time independent. The value of the transition probability is time independent. As against this the transitional probabilities in Time inhomogeneous Markov Chains changes with time and is thus not fixed. The transitional probabilities of a time-homogeneous Markov Chains can be expressed in a transition matrix Q. This matrix thus describes the probability of transitioning from ${\displaystyle state_{i}}$ to ${\displaystyle state_{j}}$ and fills up the matrix of the size m * m where m is the total number of nodes in the matrix. Each column in this matrix Q describes the probabilities of being in that state from some other state. Similarly, each row in this matrix states the probability to move from a particular state to all the other states. Since the probability that you move out from one state to another is 1 at any given instance, the row of this transition matrix adds to 1. ${\displaystyle \sum _{j}P(Z_{i,j})=1}$
Each row is the Probability Mass Function (PMF) ${\displaystyle s_{j}}$ at time n given an initial state j. As described above the transition matrix is a convenient way to record the transition probabilities of a time-homogeneous Markov chain. However, it is much more than that. It can be used to find the transitional probability after “n” time units using the power transitional matrix.
For example:
${\displaystyle Q^{1}}$: describes the probability of going from any ${\displaystyle state_{i}}$ to any ${\displaystyle state_{j}}$ in 1 step.
${\displaystyle Q^{2}}$: describes the probability of going from any ${\displaystyle state_{i}}$ to any ${\displaystyle state_{j}}$ in 2 steps.
${\displaystyle Q^{3}}$: describes the probability of going from any ${\displaystyle state_{i}}$ to any ${\displaystyle state_{j}}$ in 3 steps.

Finding the PMF at ${\displaystyle time_{n+1}}$ given the transition matrix and the state at n.
Now suppose we want the PMF at ${\displaystyle time_{n+1}}$ i.e now we want to know the ${\displaystyle P(X_{n+1}=j)}$ Intuitively, we need to condition on the value “n” i.e we use what state we are in at time instance “n”.

By law of total probabilities: ${\displaystyle P(X_{n+1}=j)=\sum \ P(Xn+1=j|Xn=i)*P(Xn=i)}$

Now ${\displaystyle P(X_{n+1}=j|X_{n}=i)=q_{i,j}}$ by definition - i.e probability of going from i to j ${\displaystyle P(X_{n}=i)}$ is the PMF at ${\displaystyle time_{n}}$ given the initial ${\displaystyle state_{i}}$ at ${\displaystyle time_{n}}$. This is the ${\displaystyle i^{th}}$ “row” vector from the transition matrix Q. This then becomes the jth entry in the Matrix [s * Q] where, s is the PMF and Q is the transition matrix and the product is of dimension 1 row * m columns

Examples of Markov Chains:
Analysis of a company market share
Consider an example where a company initially has a 10% market share. Using an advertising campaign, the transformation matrix is given by

Transition matrix (Q) A A'
A 0.8 0.2
A' 0.6 0.4

where: A uses Brand A whereas A’ uses some different brand. If the probabilities in P remain valid for over a long period of time, what happens to the companies market share?

Since the company has initially 10% market share, the initial state is S0 = [.1 0.9] where 0.1 represents the A’s share and 0.9 represents the remaining share of A’. Let’s assume that the transitional matrix represents a time period of 1 month. So if we look at the next month:

S1 = S0 * Q  =  [0.1	0.9] * [0.8 	0.2]  =   [0.62 	   0.38]
[0.6	 0.4] 

What the product S1 implies that now after a month, A hold 62% of the market share and A’ holds 38% market share

S2 = S1 * Q = S0 * Q * Q = [0.62	0.38] * [0.8 	0.2]  =   [0.724 	0.276]
[0.6	0.4] 

So after 2 months, we see that Brand A has 72.4% market share and A’ has 27.6% market share.

Similarly after 3 months we see:

S3 = S2 * Q = S0 * Q * Q * Q =  [0.724	0.276] * [0.8 	0.2]  =   [0.7448 	0.2552]
[0.6	0.4]

S4 = S3 * Q = S0 * Q * Q * Q * Q =  [0.7448   0.2552] * [0.8 	0.2]  =   [0.74979  0.25020]
[0.6	0.4] 

So we can see that as more months elapse, A is steadily keeping its 75% market share and A’ is steady at 25% market share. This matrix S = [0.75 0.25] is the stationary matrix and the system is said to be in steady state or equilibrium

Other examples

• By the statistical analysis of note transitions, every type of music can be encoded as a Markov chain.
• Markov chains can be used to generate text in a given language, based on a statistical analysis on the transition between words in a sample space.
• Google page rank is an example of Markov Chains: states are web pages and the links are the hyperlinks. The importance of a page is not just how many links there are to it, but also how important those pages are.

Classification of states and Markov Chains.

Irreducible Markov Chains In case you can transition from ${\displaystyle state_{i}}$ to ${\displaystyle state_{j}}$ and also transition back from ${\displaystyle state_{j}}$ to ${\displaystyle state_{i}}$ then we state that i and j “communicate” and it is denoted as i↔︎j. When all states in a Markov chain “communicate” then those chains are called as “Irreducible” Markov Chains. Clearly, there is no state which is “absorbable” in an “Irreducible” Markov chain. Since all states in an Irreducible Markov chain communicate, it is possible to revisit states from any given state. Such states which can be revisited are called as recurrent states. Irreducible Markov chains have recurrent states. Thus if any of the diagonal entries in a transition matrix Q is 1 and the rest of the entries in that same row are 0, then that transitional matrix belongs to an irreducible Markov chain.

Conversely, when there exists some state that is not communicable, then the Markov chain is called as a “Reducible Markov chain” . Thus it is possible that i→j but the transition from j back to i may not be possible. Clearly, in reducible Markov chains, there exists some state “i” which cannot be revisited from some other state “j” (however state j can be visited from state i) These states “i” are called as transient states. A reducible Markov Chains has “transient states”.

Absorbing Markov Chain A state k is said to be absorbing if ${\displaystyle P_{k,k}=1}$ An absorbing state can never be left from. A Markov chain is an absorbing Markov chain iff:

• At least one of the states is absorbing
• It is possible to go from each non-absorbing state to at least one absorbing state in a finite number of steps.

Note that absorbing states are “recurrent” states. If the expected time of return back to a state is less than ∞, then the state is called as “positive” recurrent. On the other hand, if the expected time taken to return back to a state is ∞, then the state is called as “null recurrent”. If the number of states in a Markov chain are finite, then all the recurrent states are always “positive recurrent”.

Aperiodic state in a Markov Chain A period of a state is k, where k is the smallest number such that all paths leading from ${\displaystyle state_{i}}$ back to ${\displaystyle state_{i}}$ have a length that is a multiple of k.
A state having a period > 1 is called as “Periodic”.

Given a state i∈S, consider the set of integers n≥1: ${\displaystyle P_{i,i}^{n}>0}$.The period of the state i∈S is the greatest common divisor of n≥1: ${\displaystyle P_{i,i}^{n}>0}$. A state having period 1 is said to be aperiodic, which is the case in particular if ${\displaystyle P_{i,i}}$ > 0. It is important to note that all states of an irreducible Markov chain have the same period.

For a transient set, the ${\displaystyle P_{i,i}=0}$ and by convention the period is denoted to be 0. If we may not come back to a state again, that state is NOT “Periodic”. It is also NOT “Aperiodic”.

When the period is “1” the state is called as “Aperiodic”. If the transitional probability of going from ${\displaystyle state_{i}}$ to ${\displaystyle state_{i}}$ (i.e itself) is equal to 1, then such a state is called as an “Aperiodic” state. Thus an absorbing state is an “aperiodic” state. A recurrent state is “Ergodic” when it is both “Aperiodic” and “positive recurrent”. A Markov chain is “Ergodic” when all the states in the Markov chain are “Ergodic”. Thus all states in an Ergodic chain “communicate” with each other, making the chain irreducible. Furthermore, there are no transient states in an Ergodic chain. This means that once you enter an Ergodic state you will always visit it again.
If every probability/number in the power transition matrix, is positive and non zero then, the states in the Markov Chain are not periodic (if there was zero in the original matrix Q then the ${\displaystyle Q^{n}}$ will have a zero somewhere) Such a transition matrix is called as a Regular transition matrix and it always has a stationary distribution (described ahead).

Reversible Markov Chains This reversibility refers to time. Assume that you have an irreducible and positive recurrent chain, started at its unique invariant distribution ${\displaystyle \Pi }$. This implies that ${\displaystyle \Pi *Q=\Pi }$ This means that ${\displaystyle \Pi }$ is the p. m. f. of ${\displaystyle X_{0}}$ , and all other ${\displaystyle X_{n}}$ as well. Now suppose that, for every n, ${\displaystyle X_{0},X_{1},...,X_{n-1},X_{n}}$ has the same joint p. m. f. as their time-reversal ${\displaystyle X_{n},X_{n-1}....X_{1},X_{0}}$. Then we call the chain reversible — sometimes it is, equivalently, also said that its invariant distribution ${\displaystyle \Pi }$ is reversible . This means that a recorded simulation of a reversible chain looks the same if the “movie” is run backward.
Is there a condition for reversibility that can be easily checked? The first thing to observe is that if the chain is started at ${\displaystyle \Pi }$, reversible or not, the time-reversed chain has the Markov property. For reversibility, this expression must be the same as the forward transition probability ${\displaystyle P(X_{k+1}=i|X_{k}=j)=P_{j,i}}$. Conversely, if both original and time-reversed chain have the same transition probabilities (and we already know that the two start at the same invariant distribution and that both are Markov), then their p. m. f.’s must agree. A Markov chain with invariant measure ${\displaystyle \Pi }$ is reversible if and only if

${\displaystyle \Pi _{i}P_{i,j}=\Pi _{j}P_{j,i}}$, for all states i and j.


Example: Random walks on an undirected graph can be expressed as a Reversible Markov Chain. Google Page Rank is a not a reversible Markov Chain.

Long standing behavior of Markov Chains:

Limiting Distribution: A Markov chain (Xn)n∈N is said to have a limiting distribution if:
${\displaystyle \lim _{n\to \infty }P(X_{n}=j|X_{0}=i)}$ exist for all i,j ∈ S where S is a probability distribution of the Markov Chains and
${\displaystyle \lim _{n\to \infty }P(X_{n}=j|X_{0}=i)=1}$.

Provided these two conditions are met, then the limiting distribution of a Markov chain with ${\displaystyle X_{0}=i}$ is the probability distribution given by ${\displaystyle l=(L_{i,j})}$ for all states j.

For example when the transition matrix Q is regular (i.e. it has a power matrix whose coefficients are all non-zero) on the finite state space S={0,1,...,N}, the chain admits a limiting distribution ${\displaystyle \Pi =\Pi _{i}}$ where 0 ≤ i ≤ N given by ${\displaystyle \Pi _{j}=\lim _{n\to \infty }P(X_{n}=j|X_{0}=i)}$, 0≤i, j≤N

Stationary distribution: If you run the Markov chains for a long time then it converges to a state such that s = Q*s Intuitively this implies that you start at a distribution of states as “s” and then at the next instance you still are at Q* s = s i.e same state. Hence you are in a state of equilibrium. The stationary distribution lets you study the long-term behavior of the model. It gives information about the stability of a random process and, in certain cases, describes the limiting behavior of the Markov chain.

A probability distribution on S, i.e. a family ${\displaystyle \Pi =\Pi _{i}}$ i∈S in [0,1] such that i∈S, ${\displaystyle \sum \Pi _{i}=1}$ is said to be stationary if, starting ${\displaystyle X_{0}}$ at ${\displaystyle time_{0}}$ with the distribution ${\displaystyle \Pi _{i},}$ i∈S,it turns out that the distribution of ${\displaystyle X_{1}}$ is still ${\displaystyle \Pi _{i}}$, i∈S at ${\displaystyle time_{1}}$. In other words, ${\displaystyle \Pi _{i}}$, i∈S is stationary for the Markov chain with transition matrix Q if, letting ${\displaystyle P(X_{0}=i)=\Pi _{i}}$, i∈S, at ${\displaystyle time_{0}}$, implies ${\displaystyle P(X_{1}=i)=P(X_{0}=i)=\Pi _{i}}$, i∈S, at time 1. Thus the distribution does not change with time.

The distribution ${\displaystyle \Pi }$ is stationary if and only if it is invariant by the matrix Q, that means ${\displaystyle \Pi =\Pi Q}$.

For any time-homogeneous Markov chain that is aperiodic and irreducible, ${\displaystyle \lim _{n\to \infty }Q^{n}}$ lim n→∞ Q^n converges to a matrix with all rows identical and equal to ${\displaystyle \Pi }$ i.e the stationary distribution.
We also note that stationary and limiting distributions are quite different concepts. If the chain is started in the stationary distribution then it will remain in that distribution at any subsequent timestep (which is stronger than saying that the chain will reach that distribution after an infinite number of time steps). On the other hand, in order to reach the limiting distribution, the chain can be started from any given initial distribution or even from any fixed given state, and it will converge to the limiting distribution if it exists. For eg: Limiting distributions may not exist when the chain is NOT Aperiodic.

In summary, we usually try first to compute the stationary distribution whenever possible, and this also gives the limiting distribution when it exists. Otherwise, we may try to compute the limiting distribution by exponentiating the transition matrix Q and taking the limit, but this is normally much more complicated and done only in exceptional cases. To summarize we note that when we have:

• irreducible + recurrent + aperiodic =⇒existence of a limiting distribution,
• irreducible + positive recurrent + aperiodic =⇒ existence of a limiting distribution which is also stationary
• irreducible + positive recurrent =⇒ existence of a stationary distribution.

Finding the Limiting Distribution for an Absorbing Markov Chain explained with an example:
[Note: Absorbing states are positive recurrent and aperiodic. The Markov chain may not be Aperiodic as the chain depends on all the states in the chain]

Suppose a credit union classifies automobile loans into one of the four categories: The loan has been paid in pull (F) The account is in good standing (G) with all payments uptodate. The account is in arrears (A) with one or more missing payments The account has been classified as bad debt (B) and sold to a collection agency. Past records indicate that each month 10% of the accounts in good standing pay the loan in full. 80% remain in good standing and 10% become in arrears. For accounts in arrears 10% are paid in full, 40% become accounts in good standing, 40% remain in arrears and 10% are classified as bad debts.

Markov Chains

The above Markov chain has the following properties:

• Following are the absorbing states: F, B. These are also the recurring states. A and G are the non-absorbing states as you may not reach them back.
• This is a periodic Markov chain with period = 0 as state B is recurring and A, G, F are transient. Once you reach B you cannot go back to these states.
• The chain is reducible as B does not communicate back to A.
• The chain is not Ergodic as it is not aperiodic and it is not recurrent.

Standard Form representation of the transition matrix is such that all the absorbing states are put before the non-absorbing states. In this case F, B comes before A and G.

Standard Matrix F B G A
F 1 0 0 0
B 0 1 0 0
G 0.1 0 0.8 0.1
A 0.1 0.1 0.4 0.4

[FB][FB]: forms an Identity Matrix (I) [FB][GA] forms a Zero Matrix. [GA][FB]: refer as R [GA][GA]: refer as Q

We are going to calculate (I - Q) (I: Identity matrix and Q is as above) and find the inverse of that. This is refered to as the Fundamental matrix F => [(I - Q) ^ -1 ] This is as follows:

Fundamental Matrix G A Addition
G 7.5 1.25 8.75
A 5 2.5 7.5

The sum of the entries in each row of the fundamental matrix F is the average number of trials it will take to go from each non-absorbing state to some absorbing state.

Thus it will take on an average 7.5 months for the Arrears account to be classified as a Full payment account or a Bad Debt. Similarly, it will take on an average 8.75 months for the account in good standing to be classified as Full payment account or Bad Debt. We then find the product of F and R We can now use this product FR to find the limiting matrix as follows:

Limiting Matrix F B G A
F 1 0 0 0
B 0 1 0 0
G 0.875 0.125 0 0
A 0.75 0.25 0 0

[FB][FB]: forms an Identity Matrix [FB][GA] forms a Zero Matrix. [GA][GA]: forms a Zero Matrix. [GA][FB]: we got from FR mentioned above.

Using this limiting matrix we can make the following statements:

• In the long run, 75% loans from Arrears transit to full payments and 25% loans from Arrears transit to Bad Loans.
• Similarly 87.5% loans in good standing are paid in full and 12.5% loans in good standing become bad loans.
• 0% of good loans become arrear loans and 0 % of arrear loans become good loans.

## Annotated Bibliography

• Understanding Markov Chains Examples and Applications - Nicolas Privault

This books introduces the relevant concepts of probability theory, distributions and explains Markov Chains with a lot of examples and problems. Solutions to the problems are also offered at the end of the book.

This small lecture series discuss practical examples to explain Markov Chains. Its an excellent reference to understand Markov Chains. The examples in this wiki are from this lecture series

This lecture series is a descent watch for understanding Markov Chains.

• Introduction to Probability Theory - William Feller

This is a good introduction to the relevant probability theory needed to understand Markov Chains and its distribution. This book also has a chapter on Markov Chains which is a fine read.