33. Markov Chains: Basic Concepts

In addition to what’s in Anaconda, this lecture will need the following libraries:

!pip install quantecon
Output
Requirement already satisfied: quantecon in /opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages (0.8.0)
Requirement already satisfied: numba>=0.49.0 in /opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages (from quantecon) (0.61.0)
Requirement already satisfied: numpy>=1.17.0 in /opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages (from quantecon) (2.1.3)
Requirement already satisfied: requests in /opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages (from quantecon) (2.32.3)
Requirement already satisfied: scipy>=1.5.0 in /opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages (from quantecon) (1.15.2)

Requirement already satisfied: sympy in /opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages (from quantecon) (1.13.3)
Requirement already satisfied: llvmlite<0.45,>=0.44.0dev0 in /opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages (from numba>=0.49.0->quantecon) (0.44.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages (from requests->quantecon) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in /opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages (from requests->quantecon) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages (from requests->quantecon) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in /opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages (from requests->quantecon) (2025.1.31)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/hostedtoolcache/Python/3.12.9/x64/lib/python3.12/site-packages (from sympy->quantecon) (1.3.0)

33.1 Overview

Markov chains provide a way to model situations in which the past casts shadows on the future.

By this we mean that observing measurements about a present situation can help us forecast future situations.

This is possible when there are statistical dependencies among measurements taken at different points in time.

For example,

  • inflation next year might co-vary with inflation this year
  • unemployment next month might co-vary with unemployment this month

Markov chains are a workhorse for economics and finance.

The theory of Markov chains is beautiful and provides many insights into probability and dynamics.

In this lecture, we will

  • review some of the key ideas from the theory of Markov chains and
  • show how Markov chains appear in some economic applications.

Let’s start with some standard imports:

import matplotlib.pyplot as plt
import quantecon as qe
import numpy as np
import networkx as nx
from matplotlib import cm
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.animation import FuncAnimation
from IPython.display import HTML
from matplotlib.patches import Polygon
from mpl_toolkits.mplot3d.art3d import Poly3DCollection

33.2 Definitions and examples

In this section we provide some definitions and elementary examples.

33.2.1 Stochastic matrices

Recall that a probability mass function over $n$ possible outcomes is a nonnegative $n$-vector $p$ that sums to one.

For example, $p = (0.2, 0.2, 0.6)$ is a probability mass function over 3 outcomes.

A stochastic matrix (or Markov matrix) is an $n \times n$ square matrix $P$ such that each row of $P$ is a probability mass function over $n$ outcomes.

In other words,

  1. each element of $P$ is nonnegative, and
  2. each row of $P$ sums to one

If $P$ is a stochastic matrix, then so is the $k$-th power $P^k$ for all $k \in \mathbb N$.

You are asked to check this in an exercise below.
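As a quick numerical sanity check (not a substitute for the proof), we can verify this claim for a small example matrix:

P_example = np.array([[0.4, 0.6],
                      [0.2, 0.8]])    # an arbitrary 2 x 2 stochastic matrix

for k in range(1, 5):
    P_k = np.linalg.matrix_power(P_example, k)
    # each row of P^k should be nonnegative and sum to one
    assert (P_k >= 0).all() and np.allclose(P_k.sum(axis=1), 1.0)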

33.2.2 Markov chains

Now we can introduce Markov chains.

Before defining a Markov chain rigorously, we’ll give some examples.

33.2.2.1 Example 1

From US unemployment data, Hamilton (2005) estimated the following dynamics.

Here there are three states

  • “ng” represents normal growth
  • “mr” represents mild recession
  • “sr” represents severe recession

The arrows represent transition probabilities over one month.

For example, the arrow from mild recession to normal growth has 0.145 next to it.

This tells us that, according to past data, there is a 14.5% probability of transitioning from mild recession to normal growth in one month.

The arrow from normal growth back to normal growth tells us that there is a 97% probability of transitioning from normal growth to normal growth (staying in the same state).

Note that these are conditional probabilities --- the probability of transitioning from one state to another (or staying at the same one) conditional on the current state.

To make the problem easier to work with numerically, let’s convert states to numbers.

In particular, we agree that

  • state 0 represents normal growth
  • state 1 represents mild recession
  • state 2 represents severe recession

Let $X_t$ record the value of the state at time $t$.

Now we can write the statement “there is a 14.5% probability of transitioning from mild recession to normal growth in one month” as

$$\mathbb P\{X_{t+1} = 0 \,|\, X_t = 1\} = 0.145$$

We can collect all of these conditional probabilities into a matrix, as follows

$$P = \begin{bmatrix} 0.971 & 0.029 & 0 \\ 0.145 & 0.778 & 0.077 \\ 0 & 0.508 & 0.492 \end{bmatrix}$$

Notice that $P$ is a stochastic matrix.

Now we have the following relationship

$$P(i,j) = \mathbb P\{X_{t+1} = j \,|\, X_t = i\}$$

This holds for any $i, j$ between 0 and 2.

In particular, $P(i,j)$ is the probability of transitioning from state $i$ to state $j$ in one month.
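To connect this notation with code, here is a small sketch that stores $P$ as a NumPy array and reads off transition probabilities by indexing (the array values are the estimates above):

P = np.array([[0.971, 0.029, 0.000],
              [0.145, 0.778, 0.077],
              [0.000, 0.508, 0.492]])

# P[1, 0]: probability of moving from mild recession (1) to normal growth (0)
print(P[1, 0])    # 0.145

# P[0]: conditional distribution of next month's state given normal growth
print(P[0])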

33.2.2.2 Example 2

Consider a worker who, at any given time $t$, is either unemployed (state 0) or employed (state 1).

Suppose that, over a one-month period,

  1. the unemployed worker finds a job with probability $\alpha \in (0, 1)$.
  2. the employed worker loses her job and becomes unemployed with probability $\beta \in (0, 1)$.

Given the above information, we can write out the transition probabilities in matrix form as

$$P = \begin{bmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{bmatrix}$$

For example,

$$\begin{aligned} P(0,1) & = \text{probability of transitioning from state $0$ to state $1$ in one month} \\ & = \text{probability of finding a job next month} \\ & = \alpha \end{aligned}$$

Suppose we can estimate the values $\alpha$ and $\beta$.

Then we can address a range of questions, such as

  • What is the average duration of unemployment?
  • Over the long-run, what fraction of the time does a worker find herself unemployed?
  • Conditional on employment, what is the probability of becoming unemployed at least once over the next 12 months?

We’ll cover some of these applications below.
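As a preview, here is a minimal sketch of how the first and third questions could be answered directly, assuming hypothetical parameter values $\alpha = 0.3$ and $\beta = 0.05$:

α, β = 0.3, 0.05    # hypothetical monthly transition probabilities

# An unemployment spell ends each month with probability α, so its
# duration is geometric with mean 1/α months
avg_duration = 1 / α

# Starting from employment, the worker stays employed each month with
# probability 1 - β, so the probability of at least one spell of
# unemployment over the next 12 months is
p_spell = 1 - (1 - β)**12

print(avg_duration, p_spell)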

33.2.2.3 Example 3

Imam & Temple (2023) categorize political institutions into three types: democracy (D), autocracy (A), and an intermediate state called anocracy (N).

Each institution can have two potential development regimes: collapse (C) and growth (G). This results in six possible states: DG, DC, NG, NC, AG and AC.

Imam & Temple (2023) estimate the following transition probabilities:

$$P := \begin{bmatrix} 0.86 & 0.11 & 0.03 & 0.00 & 0.00 & 0.00 \\ 0.52 & 0.33 & 0.13 & 0.02 & 0.00 & 0.00 \\ 0.12 & 0.03 & 0.70 & 0.11 & 0.03 & 0.01 \\ 0.13 & 0.02 & 0.35 & 0.36 & 0.10 & 0.04 \\ 0.00 & 0.00 & 0.09 & 0.11 & 0.55 & 0.25 \\ 0.00 & 0.00 & 0.09 & 0.15 & 0.26 & 0.50 \end{bmatrix}$$

nodes = ['DG', 'DC', 'NG', 'NC', 'AG', 'AC']
P = [[0.86, 0.11, 0.03, 0.00, 0.00, 0.00],
     [0.52, 0.33, 0.13, 0.02, 0.00, 0.00],
     [0.12, 0.03, 0.70, 0.11, 0.03, 0.01],
     [0.13, 0.02, 0.35, 0.36, 0.10, 0.04],
     [0.00, 0.00, 0.09, 0.11, 0.55, 0.25],
     [0.00, 0.00, 0.09, 0.15, 0.26, 0.50]]

Here is a visualization, with darker colors indicating higher probability.

G = nx.MultiDiGraph()

for start_idx, node_start in enumerate(nodes):
    for end_idx, node_end in enumerate(nodes):
        value = P[start_idx][end_idx]
        if value != 0:
            G.add_edge(node_start,node_end, weight=value)

pos = nx.spring_layout(G, seed=10)
fig, ax = plt.subplots()
nx.draw_networkx_nodes(G, pos, node_size=600, edgecolors='black', node_color='white')
nx.draw_networkx_labels(G, pos)

arc_rad = 0.2

edges = nx.draw_networkx_edges(G, pos, ax=ax,
    connectionstyle=f'arc3, rad = {arc_rad}',
    edge_cmap=cm.Blues, width=2,
    edge_color=[G[u][v][0]['weight'] for u, v in G.edges])

pc = mpl.collections.PatchCollection(edges, cmap=cm.Blues)

ax = plt.gca()
ax.set_axis_off()
plt.colorbar(pc, ax=ax)
plt.show()

Looking at the data, we see that democracies tend to have longer-lasting growth regimes compared to autocracies (as indicated by the lower probability of transitioning from growth to growth in autocracies).

We also find a higher probability of transitioning from collapse to growth in democratic regimes.

33.2.3 Defining Markov chains

So far we’ve given examples of Markov chains but we haven’t defined them.

Let’s do that now.

To begin, let $S$ be a finite set $\{x_1, \ldots, x_n\}$ with $n$ elements.

The set $S$ is called the state space and $x_1, \ldots, x_n$ are the state values.

A distribution $\psi$ on $S$ is a probability mass function of length $n$, where $\psi(i)$ is the amount of probability allocated to state $x_i$.

A Markov chain $\{X_t\}$ on $S$ is a sequence of random variables taking values in $S$ that have the Markov property.

This means that, for any date $t$ and any state $y \in S$,

$$\mathbb P \{ X_{t+1} = y \,|\, X_t \} = \mathbb P \{ X_{t+1} = y \,|\, X_t, X_{t-1}, \ldots \}$$

This means that once we know the current state $X_t$, adding knowledge of earlier states $X_{t-1}, X_{t-2}, \ldots$ provides no additional information about probabilities of future states.

Thus, the dynamics of a Markov chain are fully determined by the set of conditional probabilities

$$P(x, y) := \mathbb P \{ X_{t+1} = y \,|\, X_t = x \} \qquad (x, y \in S)$$

By construction,

  • $P(x, y)$ is the probability of going from $x$ to $y$ in one unit of time (one step)
  • $P(x, \cdot)$ is the conditional distribution of $X_{t+1}$ given $X_t = x$

We can view $P$ as a stochastic matrix where

$$P_{ij} = P(x_i, x_j) \qquad 1 \leq i, j \leq n$$

Going the other way, if we take a stochastic matrix $P$, we can generate a Markov chain $\{X_t\}$ as follows:

  • draw $X_0$ from a distribution $\psi_0$ on $S$
  • for each $t = 0, 1, \ldots$, draw $X_{t+1}$ from $P(X_t, \cdot)$

By construction, the resulting process satisfies the Markov property.

33.3 Simulation

A good way to study Markov chains is to simulate them.

Let’s start by doing this ourselves and then look at libraries that can help us.

In these exercises, we’ll take the state space to be $S = 0, \ldots, n-1$.

(We start at 0 because Python arrays are indexed from 0.)

33.3.1 Writing our own simulation code

To simulate a Markov chain, we need

  1. a stochastic matrix $P$ and
  2. a probability mass function $\psi_0$ of length $n$ from which to draw an initial realization of $X_0$.

The Markov chain is then constructed as follows:

  1. At time $t=0$, draw a realization of $X_0$ from the distribution $\psi_0$.
  2. At each subsequent time $t$, draw a realization of the new state $X_{t+1}$ from $P(X_t, \cdot)$.

(That is, draw from row $X_t$ of $P$.)

To implement this simulation procedure, we need a method for generating draws from a discrete distribution.

For this task, we’ll use random.draw from QuantEcon.py.

To use random.draw, we first need to convert the probability mass function to a cumulative distribution

ψ_0 = (0.3, 0.7)           # probabilities over {0, 1}
cdf = np.cumsum(ψ_0)       # convert into cumulative distribution
qe.random.draw(cdf, 5)     # generate 5 independent draws from ψ_0
array([0, 1, 1, 1, 0])
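This is inverse transform sampling; a minimal NumPy-only sketch of the same idea (not necessarily QuantEcon’s exact implementation) is

u = np.random.uniform(size=5)      # independent uniform draws on [0, 1)
draws = np.searchsorted(cdf, u)    # index of the first cdf value above each u
draws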

We’ll write our code as a function that accepts the following three arguments

  • A stochastic matrix P.
  • An initial distribution ψ_0.
  • A positive integer ts_length representing the length of the time series the function should return.
def mc_sample_path(P, ψ_0=None, ts_length=1_000):

    # set up
    P = np.asarray(P)
    X = np.empty(ts_length, dtype=int)

    # Convert each row of P into a cdf
    P_dist = np.cumsum(P, axis=1)  # Convert rows into cdfs

    # draw initial state, defaulting to 0
    if ψ_0 is not None:
        X_0 = qe.random.draw(np.cumsum(ψ_0))
    else:
        X_0 = 0

    # simulate
    X[0] = X_0
    for t in range(ts_length - 1):
        X[t+1] = qe.random.draw(P_dist[X[t], :])

    return X

Let’s see how it works using the small matrix

P = [[0.4, 0.6],
     [0.2, 0.8]]

Here’s a short time series.

mc_sample_path(P, ψ_0=(1.0, 0.0), ts_length=10)
array([0, 1, 1, 1, 0, 0, 0, 1, 1, 1])

It can be shown that for a long series drawn from P, the fraction of the sample that takes value 0 will be about 0.25.

(We will explain why later.)

Moreover, this is true regardless of the initial distribution from which $X_0$ is drawn.

The following code illustrates this

X = mc_sample_path(P, ψ_0=(0.1, 0.9), ts_length=1_000_000)
np.mean(X == 0)
np.float64(0.250481)

You can try changing the initial distribution to confirm that the output is always close to 0.25 (for the P matrix above).
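For instance, here is the same experiment with the initial probability mass reversed; the output is again close to 0.25:

X = mc_sample_path(P, ψ_0=(0.9, 0.1), ts_length=1_000_000)
np.mean(X == 0)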

33.3.2 Using QuantEcon’s routines

QuantEcon.py has routines for handling Markov chains, including simulation.

Here’s an illustration using the same $P$ as the preceding example

mc = qe.MarkovChain(P)
X = mc.simulate(ts_length=1_000_000)
np.mean(X == 0)
np.float64(0.24936)

The simulate routine is faster (because it is JIT compiled).

%time mc_sample_path(P, ts_length=1_000_000) # Our homemade code version
CPU times: user 3.03 s, sys: 12.6 ms, total: 3.04 s
Wall time: 5.76 s
array([0, 1, 1, ..., 0, 1, 1])
%time mc.simulate(ts_length=1_000_000) # qe code version
CPU times: user 15 ms, sys: 5.97 ms, total: 21 ms
Wall time: 30.2 ms
array([0, 0, 1, ..., 1, 1, 1])

33.3.2.1 Adding state values and initial conditions

If we wish to, we can provide a specification of state values to MarkovChain.

These state values can be integers, floats, or even strings.

The following code illustrates

mc = qe.MarkovChain(P, state_values=('unemployed', 'employed'))
mc.simulate(ts_length=4, init='employed')  # Start at employed initial state
array(['employed', 'unemployed', 'unemployed', 'employed'], dtype='<U10')
mc.simulate(ts_length=4, init='unemployed')  # Start at unemployed initial state
array(['unemployed', 'employed', 'unemployed', 'employed'], dtype='<U10')
mc.simulate(ts_length=4)  # Start at randomly chosen initial state
array(['employed', 'employed', 'employed', 'employed'], dtype='<U10')

If we want to see indices rather than state values as outputs, we can use

mc.simulate_indices(ts_length=4)
array([1, 1, 1, 1])

33.4 Distributions over time

Suppose that

  1. $\{X_t\}$ is a Markov chain with stochastic matrix $P$
  2. the distribution of $X_t$ is known to be $\psi_t$

What then is the distribution of $X_{t+1}$, or, more generally, of $X_{t+m}$?

To answer this, we let $\psi_t$ be the distribution of $X_t$ for $t = 0, 1, 2, \ldots$.

Our first aim is to find $\psi_{t+1}$ given $\psi_t$ and $P$.

To begin, pick any $y \in S$.

To get the probability of being at $y$ tomorrow (at $t+1$), we account for all ways this can happen and sum their probabilities.

This leads to

$$\mathbb P \{X_{t+1} = y \} = \sum_{x \in S} \mathbb P \{ X_{t+1} = y \,|\, X_t = x \} \cdot \mathbb P \{ X_t = x \}$$

(We are using the law of total probability.)

Rewriting this statement in terms of marginal and conditional probabilities gives

$$\psi_{t+1}(y) = \sum_{x \in S} P(x,y) \psi_t(x)$$

There are $n$ such equations, one for each $y \in S$.

If we think of $\psi_{t+1}$ and $\psi_t$ as row vectors, these $n$ equations are summarized by the matrix expression

$$\psi_{t+1} = \psi_t P$$

Thus, we postmultiply by $P$ to move a distribution forward one unit of time.

By postmultiplying $m$ times, we move a distribution forward $m$ steps into the future.

Hence, iterating on the one-step rule $\psi_{t+1} = \psi_t P$, the expression $\psi_{t+m} = \psi_t P^m$ is also valid --- here $P^m$ is the $m$-th power of $P$.

As a special case, we see that if $\psi_0$ is the initial distribution from which $X_0$ is drawn, then $\psi_0 P^m$ is the distribution of $X_m$.

This is very important, so let’s repeat it

$$X_0 \sim \psi_0 \quad \implies \quad X_m \sim \psi_0 P^m$$

The general rule is that postmultiplying a distribution by $P^m$ shifts it forward $m$ units of time.

Hence the following is also valid.

$$X_t \sim \psi_t \quad \implies \quad X_{t+m} \sim \psi_t P^m$$

33.4.1 Multiple step transition probabilities

We know that the probability of transitioning from $x$ to $y$ in one step is $P(x,y)$.

It turns out that the probability of transitioning from $x$ to $y$ in $m$ steps is $P^m(x,y)$, the $(x,y)$-th element of the $m$-th power of $P$.

To see why, consider again the rule $X_t \sim \psi_t \implies X_{t+m} \sim \psi_t P^m$, but now with a $\psi_t$ that puts all probability on state $x$.

Then $\psi_t$ is a vector with 1 in position $x$ and zero elsewhere.

Inserting this into the rule, we see that, conditional on $X_t = x$, the distribution of $X_{t+m}$ is the $x$-th row of $P^m$.

In particular

$$\mathbb P \{X_{t+m} = y \,|\, X_t = x \} = P^m(x, y) = (x, y) \text{-th element of } P^m$$

33.4.2 Example: probability of recession

Recall the stochastic matrix $P$ for recession and growth considered above.

Suppose that the current state is unknown --- perhaps statistics are available only at the end of the current month.

We guess that the probability that the economy is in state $x$ is $\psi_t(x)$ at time $t$.

The probability of being in recession (either mild or severe) in 6 months time is given by

$$(\psi_t P^6)(1) + (\psi_t P^6)(2)$$
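Here is a minimal sketch of this calculation, assuming (purely for illustration) current beliefs $\psi_t = (0.6, 0.3, 0.1)$:

P = np.array([[0.971, 0.029, 0.000],
              [0.145, 0.778, 0.077],
              [0.000, 0.508, 0.492]])
ψ_t = np.array([0.6, 0.3, 0.1])             # hypothetical beliefs about the current state
ψ_6 = ψ_t @ np.linalg.matrix_power(P, 6)    # distribution 6 months ahead
print(ψ_6[1] + ψ_6[2])                      # probability of mild or severe recession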

33.4.3 Example 2: cross-sectional distributions

The distributions we have been studying can be viewed either

  1. as probabilities or
  2. as cross-sectional frequencies that the law of large numbers leads us to anticipate for large samples.

To illustrate, recall our model of employment/unemployment dynamics for a given worker discussed above.

Consider a large population of workers, each of whose lifetime experience is described by the specified dynamics, with each worker’s outcomes being realizations of processes that are statistically independent of all other workers’ processes.

Let $\psi_t$ be the current cross-sectional distribution over $\{0, 1\}$.

The cross-sectional distribution records fractions of workers employed and unemployed at a given moment $t$.

  • For example, $\psi_t(0)$ is the unemployment rate at time $t$.

What will the cross-sectional distribution be 10 periods hence?

The answer is $\psi_t P^{10}$, where $P$ is the worker’s transition matrix given above.

This is because each worker’s state evolves according to $P$, so $\psi_t P^{10}$ is a marginal distribution for a single randomly selected worker.

But when the sample is large, outcomes and probabilities are roughly equal (by an application of the law of large numbers).

So for a very large (tending to infinite) population, $\psi_t P^{10}$ also represents fractions of workers in each state.

This is exactly the cross-sectional distribution.

33.5 Stationary distributions

As we saw above, we can shift a distribution forward one unit of time via postmultiplication by $P$.

Some distributions are invariant under this updating process --- for example,

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
ψ = (0.25, 0.75)
ψ @ P
array([0.25, 0.75])

Notice that ψ @ P is the same as ψ.

Such distributions are called stationary or invariant.

Formally, a distribution $\psi^*$ on $S$ is called stationary for $P$ if $\psi^* P = \psi^*$.

Notice that, postmultiplying by $P$, we have $\psi^* P^2 = \psi^* P = \psi^*$.

Continuing in the same way leads to $\psi^* = \psi^* P^t$ for all $t \ge 0$.

This tells us an important fact: if the distribution of $X_0$ is stationary, then $X_t$ has that same distribution for all $t \ge 0$.

The following existence result is proved in Chapter 4 of Sargent & Stachurski (2023) and numerous other sources: every stochastic matrix has at least one stationary distribution.

Note that there can be many stationary distributions corresponding to a given stochastic matrix $P$.

  • For example, if $P$ is the identity matrix, then all distributions on $S$ are stationary.

To get uniqueness, we need the Markov chain to “mix around,” so that the state doesn’t get stuck in some part of the state space.

This gives some intuition for the uniqueness and convergence results that follow.

We will come back to this when we introduce irreducibility in the next lecture on Markov chains.

33.5.1 Example

Recall our model of the employment/unemployment dynamics of a particular worker discussed above.

If $\alpha \in (0,1)$ and $\beta \in (0,1)$, then the transition matrix is everywhere positive.

Let $\psi^* = (p, 1-p)$ be the stationary distribution, so that $p$ corresponds to unemployment (state 0).

Using $\psi^* = \psi^* P$ and a bit of algebra yields

$$p = \frac{\beta}{\alpha + \beta}$$

This is, in some sense, a steady state probability of unemployment.

Not surprisingly it tends to zero as $\beta \to 0$, and to one as $\alpha \to 0$.
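Here is a quick numerical check of this formula, using hypothetical values for $\alpha$ and $\beta$:

α, β = 0.3, 0.05                         # hypothetical parameter values
P = np.array([[1 - α, α],
              [β, 1 - β]])
ψ_star = np.array([β, α]) / (α + β)      # candidate stationary distribution (p, 1-p)
print(np.allclose(ψ_star @ P, ψ_star))   # True: ψ* P = ψ*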

33.5.2 Calculating stationary distributions

A stable algorithm for computing stationary distributions is implemented in QuantEcon.py.

Here’s an example

P = [[0.4, 0.6],
     [0.2, 0.8]]

mc = qe.MarkovChain(P)
mc.stationary_distributions  # Show all stationary distributions
array([[0.25, 0.75]])
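As an alternative sketch (not QuantEcon’s exact algorithm), a stationary distribution can be computed as a left eigenvector of $P$ associated with the unit eigenvalue:

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
eigvals, eigvecs = np.linalg.eig(P.T)    # left eigenvectors of P are right eigenvectors of P.T
i = np.argmin(np.abs(eigvals - 1))       # locate the eigenvalue closest to 1
ψ_star = np.real(eigvecs[:, i])
ψ_star /= ψ_star.sum()                   # normalize so the entries sum to one
print(ψ_star)                            # [0.25, 0.75]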

33.5.3 Asymptotic stationarity

Consider an everywhere positive stochastic matrix with unique stationary distribution $\psi^*$.

Sometimes the distribution $\psi_t = \psi_0 P^t$ of $X_t$ converges to $\psi^*$ regardless of $\psi_0$.

For example, we have the following result: if some power $P^m$ of the stochastic matrix $P$ has all entries strictly positive, then $P$ has a unique stationary distribution $\psi^*$ and $\psi_0 P^t \to \psi^*$ as $t \to \infty$ for every initial distribution $\psi_0$.

This situation is often referred to as asymptotic stationarity or global stability.

A proof of the theorem can be found in Chapter 4 of Sargent & Stachurski (2023), as well as many other sources.

33.5.3.1Example: Hamilton’s chain

Hamilton’s chain satisfies the conditions of the theorem because $P^2$ is everywhere positive:

P = np.array([[0.971, 0.029, 0.000],
              [0.145, 0.778, 0.077],
              [0.000, 0.508, 0.492]])
P @ P
array([[0.947046, 0.050721, 0.002233],
       [0.253605, 0.648605, 0.09779 ],
       [0.07366 , 0.64516 , 0.28118 ]])

Let’s pick three initial distributions $\psi_1, \psi_2, \psi_3$ and trace out the sequence of distributions $\psi_i P^t$ for $t = 0, 1, 2, \ldots$ and $i = 1, 2, 3$.

First, we write a function to iterate the sequence of distributions for ts_length periods

def iterate_ψ(ψ_0, P, ts_length):
    n = len(P)
    ψ_t = np.empty((ts_length, n))
    ψ_t[0] = ψ_0
    for t in range(1, ts_length):
        ψ_t[t] = ψ_t[t-1] @ P
    return ψ_t

Now we plot the sequence

ψ_1 = (0.0, 0.0, 1.0)
ψ_2 = (1.0, 0.0, 0.0)
ψ_3 = (0.0, 1.0, 0.0)                   # Three initial conditions
colors = ['blue','red', 'green']   # Different colors for each initial point

# Define the vertices of the unit simplex
v = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]])

# Define the faces of the unit simplex
faces = [
    [v[0], v[1], v[2]],
    [v[0], v[1], v[3]],
    [v[0], v[2], v[3]],
    [v[1], v[2], v[3]]
]

fig = plt.figure()
ax = fig.add_subplot(projection='3d')

def update(n):    
    ax.clear()
    ax.set_xlim([0, 1])
    ax.set_ylim([0, 1])
    ax.set_zlim([0, 1])
    ax.view_init(45, 45)
    
    simplex = Poly3DCollection(faces, alpha=0.03)
    ax.add_collection3d(simplex)
    
    for idx, ψ_0 in enumerate([ψ_1, ψ_2, ψ_3]):
        ψ_t = iterate_ψ(ψ_0, P, n+1)
        
        for i, point in enumerate(ψ_t):
            ax.scatter(point[0], point[1], point[2], color=colors[idx], s=60, alpha=(i+1)/len(ψ_t))
            
    mc = qe.MarkovChain(P)
    ψ_star = mc.stationary_distributions[0]
    ax.scatter(ψ_star[0], ψ_star[1], ψ_star[2], c='yellow', s=60)
    
    return fig,

anim = FuncAnimation(fig, update, frames=range(20), blit=False, repeat=False)
plt.close()
HTML(anim.to_jshtml())

Here

  • $P$ is the stochastic matrix for recession and growth considered above.
  • The red, blue and green dots are initial marginal probability distributions $\psi_1, \psi_2, \psi_3$, each of which is represented as a vector in $\mathbb R^3$.
  • The transparent dots are the marginal distributions $\psi_i P^t$ for $t = 1, 2, \ldots$ and $i = 1, 2, 3$.
  • The yellow dot is $\psi^*$.

You might like to try experimenting with different initial conditions.

33.5.3.2 Example: failure of convergence

Consider the periodic chain with stochastic matrix

$$P = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$

This matrix does not satisfy the conditions of the convergence result above because, as you can readily check,

  • $P^m = P$ when $m$ is odd and
  • $P^m = I$, the identity matrix, when $m$ is even.

Hence there is no $m$ such that all elements of $P^m$ are strictly positive.

Moreover, we can see that global stability does not hold.

For instance, if we start at $\psi_0 = (1, 0)$, then $\psi_m = \psi_0 P^m$ is $(1, 0)$ when $m$ is even and $(0, 1)$ when $m$ is odd.
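A short computation confirms this oscillation:

P = np.array([[0, 1],
              [1, 0]])
ψ_0 = np.array([1, 0])
for m in range(5):
    print(m, ψ_0 @ np.linalg.matrix_power(P, m))   # alternates between (1, 0) and (0, 1)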

We can see similar phenomena in higher dimensions.

The next figure illustrates this for a periodic Markov chain with three states.

ψ_1 = (0.0, 0.0, 1.0)
ψ_2 = (0.5, 0.5, 0.0)
ψ_3 = (0.25, 0.25, 0.5)
ψ_4 = (1/3, 1/3, 1/3)

P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
colors = ['red','yellow', 'green', 'blue']  # Different colors for each initial point

# Define the vertices of the unit simplex
v = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]])

# Define the faces of the unit simplex
faces = [
    [v[0], v[1], v[2]],
    [v[0], v[1], v[3]],
    [v[0], v[2], v[3]],
    [v[1], v[2], v[3]]
]

def update(n):
    ax.clear()
    ax.set_xlim([0, 1])
    ax.set_ylim([0, 1])
    ax.set_zlim([0, 1])
    ax.view_init(45, 45)
    
    # Plot the 3D unit simplex as planes
    simplex = Poly3DCollection(faces,alpha=0.05)
    ax.add_collection3d(simplex)
    
    for idx, ψ_0 in enumerate([ψ_1, ψ_2, ψ_3, ψ_4]):
        ψ_t = iterate_ψ(ψ_0, P, n+1)
        
        point = ψ_t[-1]
        ax.scatter(point[0], point[1], point[2], color=colors[idx], s=60)
        points = np.array(ψ_t)
        ax.plot(points[:, 0], points[:, 1], points[:, 2], color=colors[idx],linewidth=0.75)
    
    return fig,

anim = FuncAnimation(fig, update, frames=range(20), blit=False, repeat=False)
plt.close()
HTML(anim.to_jshtml())

This animation demonstrates the behavior of an irreducible and periodic stochastic matrix.

The red, yellow, and green dots represent different initial probability distributions.

The blue dot represents the unique stationary distribution.

Unlike Hamilton’s Markov chain, these initial distributions do not converge to the unique stationary distribution.

Instead, they cycle periodically around the probability simplex, illustrating that asymptotic stability fails.

33.6 Computing expectations

We sometimes want to compute mathematical expectations of functions of $X_t$ of the form

$$\mathbb E [ h(X_t) ]$$

and conditional expectations such as

$$\mathbb E [ h(X_{t + k}) \mid X_t = x]$$

where

  • $\{X_t\}$ is a Markov chain generated by an $n \times n$ stochastic matrix $P$.
  • $h$ is a given function, which, in terms of matrix algebra, we’ll think of as the column vector

$$h = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_n) \end{bmatrix}$$

Computing the unconditional expectation $\mathbb E [ h(X_t) ]$ is easy.

We just sum over the marginal distribution of XtX_t to get

$$\mathbb E [ h(X_t) ] = \sum_{x \in S} (\psi P^t)(x) h(x)$$

Here $\psi$ is the distribution of $X_0$.

Since $\psi$ and hence $\psi P^t$ are row vectors, we can also write this as

$$\mathbb E [ h(X_t) ] = \psi P^t h$$

For the conditional expectation $\mathbb E [ h(X_{t + k}) \mid X_t = x]$, we need to sum over the conditional distribution of $X_{t+k}$ given $X_t = x$.

We already know that this is $P^k(x, \cdot)$, so

$$\mathbb E [ h(X_{t + k}) \mid X_t = x] = (P^k h)(x)$$
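Here is a minimal sketch of both computations, using Hamilton’s matrix together with a hypothetical function $h$ and a hypothetical initial distribution $\psi$:

P = np.array([[0.971, 0.029, 0.000],
              [0.145, 0.778, 0.077],
              [0.000, 0.508, 0.492]])
h = np.array([1.0, 0.0, -1.0])     # hypothetical payoff in each state
ψ = np.array([0.2, 0.5, 0.3])      # hypothetical distribution of X_0

t = k = 5
print(ψ @ np.linalg.matrix_power(P, t) @ h)   # E[h(X_t)]
print(np.linalg.matrix_power(P, k) @ h)       # E[h(X_{t+k}) | X_t = x], one entry per x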

33.6.1 Expectations of geometric sums

Sometimes we want to compute the mathematical expectation of a geometric sum, such as $\sum_t \beta^t h(X_t)$.

In view of the preceding discussion, this is

$$\mathbb{E} \left[ \sum_{j=0}^\infty \beta^j h(X_{t+j}) \,\Big|\, X_t = x \right] = h(x) + \beta (Ph)(x) + \beta^2 (P^2 h)(x) + \cdots$$

By the Neumann series lemma, this sum can be calculated using

$$I + \beta P + \beta^2 P^2 + \cdots = (I - \beta P)^{-1}$$

The vector $P^k h$ stores the conditional expectation $\mathbb E [ h(X_{t + k}) \mid X_t = x]$ over all $x$.
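Continuing from the previous snippet, here is a sketch that computes the expected discounted sum for every initial state at once, assuming a discount factor $\beta = 0.95$:

β = 0.95
n = len(P)
# v(x) = E[ Σ_j β^j h(X_{t+j}) | X_t = x ], solved via (I - βP) v = h
v = np.linalg.solve(np.eye(n) - β * P, h)
print(v)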

Solution to Exercise 1

Solution 1:

Since the matrix is everywhere positive, there is a unique stationary distribution $\psi^*$ such that $\psi_t \to \psi^*$ as $t \to \infty$.

Solution 2:

One simple way to calculate the stationary distribution is to take a high power of the transition matrix, as we have shown before

P = np.array([[0.68, 0.12, 0.20],
              [0.50, 0.24, 0.26],
              [0.36, 0.18, 0.46]])
P_power = np.linalg.matrix_power(P, 20)
P_power
array([[0.56145769, 0.15565164, 0.28289067],
       [0.56145769, 0.15565164, 0.28289067],
       [0.56145769, 0.15565164, 0.28289067]])

Note that rows of the transition matrix converge to the stationary distribution.

ψ_star_p = P_power[0]
ψ_star_p
array([0.56145769, 0.15565164, 0.28289067])
mc = qe.MarkovChain(P)
ψ_star = mc.stationary_distributions[0]
ψ_star
array([0.56145769, 0.15565164, 0.28289067])
Solution to Exercise 2

Solution 1:

Although $P$ is not everywhere positive, $P^m$ is everywhere positive when $m = 3$.

P = np.array([[0.86, 0.11, 0.03, 0.00, 0.00, 0.00],
              [0.52, 0.33, 0.13, 0.02, 0.00, 0.00],
              [0.12, 0.03, 0.70, 0.11, 0.03, 0.01],
              [0.13, 0.02, 0.35, 0.36, 0.10, 0.04],
              [0.00, 0.00, 0.09, 0.11, 0.55, 0.25],
              [0.00, 0.00, 0.09, 0.15, 0.26, 0.50]])

np.linalg.matrix_power(P,3)
array([[0.764927, 0.133481, 0.085949, 0.011481, 0.002956, 0.001206],
       [0.658861, 0.131559, 0.161367, 0.031703, 0.011296, 0.005214],
       [0.291394, 0.057788, 0.439702, 0.113408, 0.062707, 0.035001],
       [0.272459, 0.051361, 0.365075, 0.132207, 0.108152, 0.070746],
       [0.064129, 0.012533, 0.232875, 0.154385, 0.299243, 0.236835],
       [0.072865, 0.014081, 0.244139, 0.160905, 0.265846, 0.242164]])

So it satisfies the requirement.

Solution 2:

We find that the distribution $\psi_t$ converges quickly to the stationary distribution, regardless of the initial distribution

ts_length = 30
num_distributions = 20
nodes = ['DG', 'DC', 'NG', 'NC', 'AG', 'AC']

# Get parameters of transition matrix
n = len(P)
mc = qe.MarkovChain(P)
ψ_star = mc.stationary_distributions[0]
ψ_0 = np.array([[1/6 for i in range(6)],
                [0 if i != 0 else 1 for i in range(6)]])
## Draw the plot
fig, axes = plt.subplots(ncols=2)
plt.subplots_adjust(wspace=0.35)
for idx in range(2):
    ψ_t = iterate_ψ(ψ_0[idx], P, ts_length)
    for i in range(n):
        axes[idx].plot(ψ_t[:, i] - ψ_star[i], alpha=0.5, label=fr'$\psi_t({i+1})$')
        axes[idx].set_ylim([-0.3, 0.3])
        axes[idx].set_xlabel('t')
        axes[idx].set_ylabel(fr'$\psi_t$')
        axes[idx].legend()
        axes[idx].axhline(0, linestyle='dashed', lw=1, color = 'black')

plt.show()
Solution to Exercise 3

Suppose that $P$ is stochastic and, moreover, that $P^k$ is stochastic for some integer $k$.

We will prove that $P^{k+1} = P P^k$ is also stochastic.

(We are doing proof by induction --- we assume the claim is true at $k$ and now prove it is true at $k+1$.)

To see this, observe that, since $P^k$ is stochastic and the product of nonnegative matrices is nonnegative, $P^{k+1} = P P^k$ is nonnegative.

Also, if $\mathbf 1$ is a column vector of ones, then, since $P^k$ is stochastic, we have $P^k \mathbf 1 = \mathbf 1$ (rows sum to one).

Therefore $P^{k+1} \mathbf 1 = P P^k \mathbf 1 = P \mathbf 1 = \mathbf 1$.

The proof is done.

References
  1. Hamilton, J. D. (2005). What’s real about the business cycle? Federal Reserve Bank of St. Louis Review, July-August, 435–452.
  2. Imam, P., & Temple, J. R. (2023). Political Institutions and Output Collapses. IMF Working Paper.
  3. Sargent, T. J., & Stachurski, J. (2023). Economic Networks: Theory and Computation. arXiv Preprint arXiv:2203.11972.

Creative Commons License – This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International.