This is a writeup of Discrete Markov chains, its classification and the limiting and stationary distribution
Principal Author: Surbhi Palande
Collaborators: Kevin Dsouza
Stochastic processes basically just means random variables evolving over time. This is an observation of dependent random variables. Stochastic processes can be classified in two main families:
In this writeup, we study Markov Chains that abide to the Markov property. Markov property is a certain conditional probability statement. The past and the future are conditionally independent given the present.
Knowledge of probability distributions is assumed
Markov chains named after the Russian mathematician Andrey Markov are "a stochastic model. They describe a sequence of possible events in which, the probability of each event depends only on the state attained in the previous event. The future depends only on the present and not on the past! It is a mathematical system that experiences transitions from one state to another according to certain probabilistic rules. The defining characteristic of a Markov chain is that no matter how the process arrived at its present state, the possible future states are fixed. In other words, the probability of transitioning to any particular state is dependent solely on the current state and hence they are referred to a "Memoryless". Markov Chains can be modeled as a finite state machine that shows how a system transitions from one state to another and with what probability. In the language of conditional probability and random variables, a Markov chain is a sequence of random variables satisfying the rule of conditional independence:
For any positive integer n and possible states of the random variables,
Time Homogenous Markov Chains are time independent. The value of the transition probability is time independent. As against this the transitional probabilities in Time inhomogeneous Markov Chains changes with time and is thus not fixed. The transitional probabilities of a timehomogeneous Markov Chains can be expressed in a transition matrix Q. This matrix thus describes the probability of transitioning from to and fills up the matrix of the size m * m where m is the total number of nodes in the matrix. Each column in this matrix Q describes the probabilities of being in that state from some other state. Similarly, each row in this matrix states the probability to move from a particular state to all the other states. Since the probability that you move out from one state to another is 1 at any given instance, the row of this transition matrix adds to 1.
Each row is the Probability Mass Function (PMF) at time n given an initial state j. As described above the transition matrix is a convenient way to record the transition probabilities of a timehomogeneous Markov chain. However, it is much more than that. It can be used to find the transitional probability after “n” time units using the power transitional matrix.
For example:
: describes the probability of going from any to any in 1 step.
: describes the probability of going from any to any in 2 steps.
: describes the probability of going from any to any in 3 steps.
Finding the PMF at given the transition matrix and the state at n.
Now suppose we want the PMF at i.e now we want to know the Intuitively, we need to condition on the value “n” i.e we use what state we are in at time instance “n”.
By law of total probabilities:
Now by definition  i.e probability of going from i to j
is the PMF at given the initial at . This is the “row” vector from the transition matrix Q. This then becomes the jth entry in the Matrix [s * Q] where, s is the PMF and Q is the transition matrix and the product is of dimension 1 row * m columns
Examples of Markov Chains:
Analysis of a company market share
Consider an example where a company initially has a 10% market share. Using an advertising campaign, the transformation matrix is given by
Transition matrix (Q)  A  A' 

A  0.8  0.2 
A'  0.6  0.4 
where: A uses Brand A whereas A’ uses some different brand. If the probabilities in P remain valid for over a long period of time, what happens to the companies market share?
Since the company has initially 10% market share, the initial state is S0 = [.1 0.9] where 0.1 represents the A’s share and 0.9 represents the remaining share of A’. Let’s assume that the transitional matrix represents a time period of 1 month. So if we look at the next month:
S1 = S0 * Q = [0.1 0.9] * [0.8 0.2] = [0.62 0.38] [0.6 0.4]
What the product S1 implies that now after a month, A hold 62% of the market share and A’ holds 38% market share
S2 = S1 * Q = S0 * Q * Q = [0.62 0.38] * [0.8 0.2] = [0.724 0.276] [0.6 0.4]
So after 2 months, we see that Brand A has 72.4% market share and A’ has 27.6% market share.
Similarly after 3 months we see:
S3 = S2 * Q = S0 * Q * Q * Q = [0.724 0.276] * [0.8 0.2] = [0.7448 0.2552] [0.6 0.4] S4 = S3 * Q = S0 * Q * Q * Q * Q = [0.7448 0.2552] * [0.8 0.2] = [0.74979 0.25020] [0.6 0.4]
So we can see that as more months elapse, A is steadily keeping its 75% market share and A’ is steady at 25% market share. This matrix S = [0.75 0.25] is the stationary matrix and the system is said to be in steady state or equilibrium
Other examples
Classification of states and Markov Chains.
Irreducible Markov Chains
In case you can transition from to and also transition back from to then we state that i and j “communicate” and it is denoted as i↔︎j. When all states in a Markov chain “communicate” then those chains are called as “Irreducible” Markov Chains. Clearly, there is no state which is “absorbable” in an “Irreducible” Markov chain. Since all states in an Irreducible Markov chain communicate, it is possible to revisit states from any given state. Such states which can be revisited are called as recurrent states. Irreducible Markov chains have recurrent states. Thus if any of the diagonal entries in a transition matrix Q is 1 and the rest of the entries in that same row are 0, then that transitional matrix belongs to an irreducible Markov chain.
Conversely, when there exists some state that is not communicable, then the Markov chain is called as a “Reducible Markov chain” . Thus it is possible that i→j but the transition from j back to i may not be possible. Clearly, in reducible Markov chains, there exists some state “i” which cannot be revisited from some other state “j” (however state j can be visited from state i) These states “i” are called as transient states. A reducible Markov Chains has “transient states”.
Absorbing Markov Chain A state k is said to be absorbing if An absorbing state can never be left from. A Markov chain is an absorbing Markov chain iff:
Note that absorbing states are “recurrent” states. If the expected time of return back to a state is less than ∞, then the state is called as “positive” recurrent. On the other hand, if the expected time taken to return back to a state is ∞, then the state is called as “null recurrent”. If the number of states in a Markov chain are finite, then all the recurrent states are always “positive recurrent”.
Aperiodic state in a Markov Chain
A period of a state is k, where k is the smallest number such that all paths leading from back to have a length that is a multiple of k.
A state having a period > 1 is called as “Periodic”.
Given a state i∈S, consider the set of integers n≥1: .The period of the state i∈S is the greatest common divisor of n≥1: . A state having period 1 is said to be aperiodic, which is the case in particular if > 0. It is important to note that all states of an irreducible Markov chain have the same period.
For a transient set, the and by convention the period is denoted to be 0. If we may not come back to a state again, that state is NOT “Periodic”. It is also NOT “Aperiodic”.
When the period is “1” the state is called as “Aperiodic”. If the transitional probability of going from to (i.e itself) is equal to 1, then such a state is called as an “Aperiodic” state. Thus an absorbing state is an “aperiodic” state. A recurrent state is “Ergodic” when it is both “Aperiodic” and “positive recurrent”. A Markov chain is “Ergodic” when all the states in the Markov chain are “Ergodic”. Thus all states in an Ergodic chain “communicate” with each other, making the chain irreducible. Furthermore, there are no transient states in an Ergodic chain. This means that once you enter an Ergodic state you will always visit it again.
If every probability/number in the power transition matrix, is positive and non zero then, the states in the Markov Chain are not periodic (if there was zero in the original matrix Q then the will have a zero somewhere) Such a transition matrix is called as a Regular transition matrix and it always has a stationary distribution (described ahead).
Reversible Markov Chains
This reversibility refers to time. Assume that you have an irreducible and positive recurrent chain, started at its unique invariant distribution . This implies that This means that is the p. m. f. of , and all other as well.
Now suppose that, for every n, has the same joint p. m. f. as their timereversal . Then we call the chain reversible — sometimes it is, equivalently, also said that its invariant distribution is reversible . This means that a recorded simulation of a reversible chain looks the same if the “movie” is run backward.
Is there a condition for reversibility that can be easily checked? The first thing to observe is that if the chain is started at , reversible or not, the timereversed chain has the Markov property. For reversibility, this expression must be the same as the forward transition probability . Conversely, if both original and timereversed chain have the same transition probabilities (and we already know that the two start at the same invariant distribution and that both are Markov), then their p. m. f.’s must agree. A Markov chain with invariant measure is reversible if and only if
, for all states i and j.
Example: Random walks on an undirected graph can be expressed as a Reversible Markov Chain. Google Page Rank is a not a reversible Markov Chain.
Long standing behavior of Markov Chains:
Limiting Distribution:
A Markov chain (Xn)n∈N is said to have a limiting distribution if:
exist for all i,j ∈ S where S is a probability distribution of the Markov Chains and
.
Provided these two conditions are met, then the limiting distribution of a Markov chain with is the probability distribution given by for all states j.
For example when the transition matrix Q is regular (i.e. it has a power matrix whose coefficients are all nonzero) on the finite state space S={0,1,...,N}, the chain admits a limiting distribution where 0 ≤ i ≤ N given by , 0≤i, j≤N
Stationary distribution: If you run the Markov chains for a long time then it converges to a state such that s = Q*s Intuitively this implies that you start at a distribution of states as “s” and then at the next instance you still are at Q* s = s i.e same state. Hence you are in a state of equilibrium. The stationary distribution lets you study the longterm behavior of the model. It gives information about the stability of a random process and, in certain cases, describes the limiting behavior of the Markov chain.
A probability distribution on S, i.e. a family i∈S in [0,1] such that i∈S, is said to be stationary if, starting at with the distribution i∈S,it turns out that the distribution of is still , i∈S at . In other words, , i∈S is stationary for the Markov chain with transition matrix Q if, letting , i∈S, at , implies , i∈S, at time 1. Thus the distribution does not change with time.
The distribution is stationary if and only if it is invariant by the matrix Q, that means .
For any timehomogeneous Markov chain that is aperiodic and irreducible, lim n→∞ Q^n converges to a matrix with all rows identical and equal to i.e the stationary distribution.
We also note that stationary and limiting distributions are quite different concepts. If the chain is started in the stationary distribution then it will remain in that distribution at any subsequent timestep (which is stronger than saying that the chain will reach that distribution after an infinite number of time steps). On the other hand, in order to reach the limiting distribution, the chain can be started from any given initial distribution or even from any fixed given state, and it will converge to the limiting distribution if it exists. For eg: Limiting distributions may not exist when the chain is NOT Aperiodic.
In summary, we usually try first to compute the stationary distribution whenever possible, and this also gives the limiting distribution when it exists. Otherwise, we may try to compute the limiting distribution by exponentiating the transition matrix Q and taking the limit, but this is normally much more complicated and done only in exceptional cases. To summarize we note that when we have:
Finding the Limiting Distribution for an Absorbing Markov Chain explained with an example:
[Note: Absorbing states are positive recurrent and aperiodic. The Markov chain may not be Aperiodic as the chain depends on all the states in the chain]
Suppose a credit union classifies automobile loans into one of the four categories: The loan has been paid in pull (F) The account is in good standing (G) with all payments uptodate. The account is in arrears (A) with one or more missing payments The account has been classified as bad debt (B) and sold to a collection agency. Past records indicate that each month 10% of the accounts in good standing pay the loan in full. 80% remain in good standing and 10% become in arrears. For accounts in arrears 10% are paid in full, 40% become accounts in good standing, 40% remain in arrears and 10% are classified as bad debts.
The above Markov chain has the following properties:
Standard Form representation of the transition matrix is such that all the absorbing states are put before the nonabsorbing states. In this case F, B comes before A and G.
Standard Matrix  F  B  G  A 

F  1  0  0  0 
B  0  1  0  0 
G  0.1  0  0.8  0.1 
A  0.1  0.1  0.4  0.4 
[FB][FB]: forms an Identity Matrix (I) [FB][GA] forms a Zero Matrix. [GA][FB]: refer as R [GA][GA]: refer as Q
We are going to calculate (I  Q) (I: Identity matrix and Q is as above) and find the inverse of that. This is refered to as the Fundamental matrix F => [(I  Q) ^ 1 ] This is as follows:
Fundamental Matrix  G  A  Addition 

G  7.5  1.25  8.75 
A  5  2.5  7.5 
The sum of the entries in each row of the fundamental matrix F is the average number of trials it will take to go from each nonabsorbing state to some absorbing state.
Thus it will take on an average 7.5 months for the Arrears account to be classified as a Full payment account or a Bad Debt. Similarly, it will take on an average 8.75 months for the account in good standing to be classified as Full payment account or Bad Debt. We then find the product of F and R We can now use this product FR to find the limiting matrix as follows:
Limiting Matrix  F  B  G  A 

F  1  0  0  0 
B  0  1  0  0 
G  0.875  0.125  0  0 
A  0.75  0.25  0  0 
[FB][FB]: forms an Identity Matrix [FB][GA] forms a Zero Matrix. [GA][GA]: forms a Zero Matrix. [GA][FB]: we got from FR mentioned above.
Using this limiting matrix we can make the following statements:
This books introduces the relevant concepts of probability theory, distributions and explains Markov Chains with a lot of examples and problems. Solutions to the problems are also offered at the end of the book.
This small lecture series discuss practical examples to explain Markov Chains. Its an excellent reference to understand Markov Chains. The examples in this wiki are from this lecture series
This lecture series is a descent watch for understanding Markov Chains.
This is a good introduction to the relevant probability theory needed to understand Markov Chains and its distribution. This book also has a chapter on Markov Chains which is a fine read.
