19. LLN and CLT

19.1 Overview

This lecture illustrates two of the most important results in probability and statistics:

  1. the law of large numbers (LLN) and
  2. the central limit theorem (CLT).

These beautiful theorems lie behind many of the most fundamental results in econometrics and quantitative economic modeling.

The lecture is based around simulations that show the LLN and CLT in action.

We also demonstrate how the LLN and CLT break down when the assumptions they are based on do not hold.

This lecture will focus on the univariate case (the multivariate case is treated in a more advanced lecture).

We’ll need the following imports:

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as st

19.2 The law of large numbers

We begin with the law of large numbers, which tells us when sample averages will converge to their population means.

19.2.1 The LLN in action

Let’s see an example of the LLN in action before we go further.

Consider a Bernoulli random variable $X$ with parameter $p$ --- a coin flip that shows heads ($X = 1$) with probability $p$ and tails ($X = 0$) otherwise. We can generate a draw of $X$ with scipy.stats (imported as st) as follows:

p = 0.8
X = st.bernoulli.rvs(p)
print(X)
1

In this setting, the LLN tells us that if we flip the coin many times, the fraction of heads that we see will be close to the mean $p$.

We use $n$ to represent the number of times the coin is flipped.

Let’s check this:

n = 1_000_000
X_draws = st.bernoulli.rvs(p, size=n)
print(X_draws.mean()) # count the number of 1's and divide by n
0.80016

If we change $p$, the claim still holds:

p = 0.3
X_draws = st.bernoulli.rvs(p, size=n)
print(X_draws.mean())
0.299901

Let’s connect this to the discussion above, where we said the sample average converges to the “population mean”.

Think of $X_1, \ldots, X_n$ as independent flips of the coin.

The population mean is the mean in an infinite sample, which equals the expectation $\mathbb E X$.
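
For the Bernoulli coin flip above, a one-line calculation (spelled out here for concreteness) confirms that this expectation equals $p$:

$$\mathbb E X = 0 \cdot \mathbb P\{X = 0\} + 1 \cdot \mathbb P\{X = 1\} = \mathbb P\{X = 1\} = p$$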

The sample mean of the draws $X_1, \ldots, X_n$ is

$$\bar X_n := \frac{1}{n} \sum_{i=1}^n X_i$$

In this case, it is the fraction of draws that equal one (the number of heads divided by $n$).

Thus, the LLN tells us that for the Bernoulli trials above

$$\bar X_n \to \mathbb E X = p \qquad (n \to \infty)$$

This is exactly what we illustrated in the code.

19.2.2 Statement of the LLN

Let’s state the LLN more carefully.

Let $X_1, \ldots, X_n$ be random variables, all of which have the same distribution.

These random variables can be continuous or discrete.

For simplicity we will

  • assume they are continuous and
  • let $f$ denote their common density function

The last statement means that for any $i$ in $\{1, \ldots, n\}$ and any numbers $a, b$,

$$\mathbb P\{a \leq X_i \leq b\} = \int_a^b f(x) \, dx$$

(For the discrete case, we need to replace densities with probability mass functions and integrals with sums.)
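
For instance, if each $X_i$ takes values in a countable set with probability mass function $p$, the corresponding statement (written out for concreteness) is

$$\mathbb P\{a \leq X_i \leq b\} = \sum_{x \, : \, a \leq x \leq b} p(x)$$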

Let $\mu$ denote the common mean of this sample.

Thus, for each $i$,

$$\mu := \mathbb E X_i = \int_{-\infty}^{\infty} x f(x) \, dx$$

The sample mean is

$$\bar X_n := \frac{1}{n} \sum_{i=1}^n X_i$$

The next theorem is called Kolmogorov's strong law of large numbers: if $X_1, \ldots, X_n$ are IID and $\mathbb E |X| < \infty$, then

$$\mathbb P\left\{ \bar X_n \to \mu \text{ as } n \to \infty \right\} = 1 \qquad (7)$$

Here

  • IID means independent and identically distributed and
  • $\mathbb E |X| = \int_{-\infty}^\infty |x| f(x) \, dx$

19.2.3 Comments on the theorem

What does the probability one statement in the theorem mean?

Let’s think about it from a simulation perspective, imagining for a moment that our computer can generate perfect random samples (although this isn’t strictly true).

Let’s also imagine that we can generate infinite sequences so that the statement $\bar X_n \to \mu$ can be evaluated.

In this setting, (7) should be interpreted as meaning that the probability of the computer producing a sequence where $\bar X_n \to \mu$ fails to occur is zero.

19.2.4 Illustration

Let’s illustrate the LLN using simulation.

When we illustrate it, we will use a key idea: the sample mean $\bar X_n$ is itself a random variable.

The reason $\bar X_n$ is a random variable is that it’s a function of the random variables $X_1, \ldots, X_n$.

What we are going to do now is

  1. pick some fixed distribution to draw each $X_i$ from
  2. set $n$ to some large number

and then repeat the following three instructions.

  1. generate the draws $X_1, \ldots, X_n$
  2. calculate the sample mean $\bar X_n$ and record its value in an array sample_means
  3. go to step 1.

We will loop over these three steps $m$ times, where $m$ is some large integer.

The array sample_means will now contain $m$ draws of the random variable $\bar X_n$.

If we histogram these observations of $\bar X_n$, we should see that they are clustered around the population mean $\mathbb E X$.

Moreover, if we repeat the exercise with a larger value of $n$, we should see that the observations are even more tightly clustered around the population mean.

This is, in essence, what the LLN is telling us.

To implement these steps, we will use functions.

Our first function generates the sample mean from a sample of size $n$ drawn from a given distribution.

def draw_means(X_distribution,  # The distribution of each X_i
               n):              # The sample size

    # Generate n draws: X_1, ..., X_n
    X_samples = X_distribution.rvs(size=n)

    # Return the sample mean
    return np.mean(X_samples)

Now we write a function to generate $m$ sample means and histogram them.

def generate_histogram(X_distribution, n, m): 

    # Compute m sample means

    sample_means = np.empty(m)
    for j in range(m):
      sample_means[j] = draw_means(X_distribution, n) 

    # Generate a histogram

    fig, ax = plt.subplots()
    ax.hist(sample_means, bins=30, alpha=0.5, density=True)
    μ = X_distribution.mean()  # Get the population mean
    σ = X_distribution.std()    # and the standard deviation
    ax.axvline(x=μ, ls="--", c="k", label=fr"$\mu = {μ}$")
     
    ax.set_xlim(μ - σ, μ + σ)
    ax.set_xlabel(r'$\bar X_n$', size=12)
    ax.set_ylabel('density', size=12)
    ax.legend()
    plt.show()

Now we call the function.

# pick a distribution to draw each $X_i$ from
X_distribution = st.norm(loc=5, scale=2) 
# Call the function
generate_histogram(X_distribution, n=1_000, m=1000)

We can see that the distribution of $\bar X_n$ is clustered around $\mathbb E X$ as expected.

Let’s vary $n$ to see how the distribution of the sample mean changes.

We will use a violin plot to show the different distributions.

Each distribution in the violin plot represents the distribution of $\bar X_n$ for some $n$, calculated by simulation.

def means_violin_plot(distribution,  
                      ns = [1_000, 10_000, 100_000],
                      m = 10_000):

    data = []
    for n in ns:
        sample_means = [draw_means(distribution, n) for i in range(m)]
        data.append(sample_means)

    fig, ax = plt.subplots()

    ax.violinplot(data)
    μ = distribution.mean()
    ax.axhline(y=μ, ls="--", c="k", label=fr"$\mu = {μ}$")

    labels=[fr'$n = {n}$' for n in ns]

    ax.set_xticks(np.arange(1, len(labels) + 1), labels=labels)
    ax.set_xlim(0.25, len(labels) + 0.75)


    plt.subplots_adjust(bottom=0.15, wspace=0.05)

    ax.set_ylabel('density', size=12)
    ax.legend()
    plt.show()

Let’s try with a normal distribution.

means_violin_plot(st.norm(loc=5, scale=2))

As $n$ gets large, more probability mass clusters around the population mean $\mu$.

Now let’s try with a Beta distribution.

means_violin_plot(st.beta(6, 6))

We get a similar result.

19.3 Breaking the LLN

We have to pay attention to the assumptions in the statement of the LLN.

If these assumptions do not hold, then the LLN might fail.

19.3.1 Infinite first moment

As indicated by the theorem, the LLN can break when $\mathbb E |X|$ is not finite.

We can demonstrate this using the Cauchy distribution.

The Cauchy distribution has the following property:

If $X_1, \ldots, X_n$ are IID and Cauchy, then so is $\bar X_n$.

This means that the distribution of $\bar X_n$ does not eventually concentrate on a single number.

Hence the LLN does not hold.

The LLN fails to hold here because the assumption $\mathbb E|X| < \infty$ is violated by the Cauchy distribution.
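
Here is a minimal sketch of this failure (our own illustration, not code from the original lecture): we plot the running sample mean of IID standard Cauchy draws and watch it keep jumping around rather than settling down.

# Running sample means of IID Cauchy draws (illustrative sketch)
n = 1_000
distribution = st.cauchy()
data = distribution.rvs(size=n)
# Compute \bar X_1, ..., \bar X_n as cumulative means
sample_means = np.cumsum(data) / np.arange(1, n + 1)

fig, ax = plt.subplots()
ax.plot(np.arange(1, n + 1), sample_means, alpha=0.6)
ax.axhline(y=0, ls="--", c="k")
ax.set_xlabel(r"$n$", size=12)
ax.set_ylabel(r"$\bar X_n$", size=12)
plt.show()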

19.3.2 Failure of the IID condition

The LLN can also fail to hold when the IID assumption is violated.

For example (a simple case of extreme dependence), suppose we draw $X_1$ once and then set $X_i = X_1$ for every $i$. The draws are identically distributed, but $\bar X_n = X_1$ for all $n$, so the distribution of $\bar X_n$ never concentrates, no matter how large $n$ becomes. (A short simulation sketch follows below.)

Does this contradict the LLN, which says that the distribution of $\bar X_n$ collapses to the single point $\mu$?

No, the LLN is correct --- the issue is that its assumptions are not satisfied.

In particular, the sequence $X_1, \ldots, X_n$ is not independent.
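
The sketch below (our own, using the repeated-draw example just described) confirms that the spread of $\bar X_n$ does not shrink as $n$ grows.

# Perfectly dependent draws: X_i = X_1 for all i (illustrative sketch)
m = 10_000                            # number of experiments
distribution = st.norm(loc=5, scale=2)

for n in (10, 1_000):
    X_1 = distribution.rvs(size=m)    # one initial draw per experiment
    data = np.tile(X_1, (n, 1)).T     # each row repeats X_1 n times
    sample_means = data.mean(axis=1)  # equals X_1, so no concentration
    print(f"n = {n}: std of sample means = {sample_means.std():.3f}")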

19.4 Central limit theorem

Next, we turn to the central limit theorem (CLT), which tells us about the distribution of the deviation between sample averages and population means.

19.4.1 Statement of the theorem

The central limit theorem is one of the most remarkable results in all of mathematics.

In the IID setting, it tells us the following: if $X_1, \ldots, X_n$ is an IID sequence with common mean $\mu$ and common variance $\sigma^2 \in (0, \infty)$, then

$$\sqrt{n} \, ( \bar X_n - \mu ) \stackrel{d}{\to} N(0, \sigma^2) \qquad (n \to \infty)$$

Here $\stackrel{d}{\to} N(0, \sigma^2)$ indicates convergence in distribution to a centered (i.e., zero mean) normal with standard deviation $\sigma$.

The striking implication of the CLT is that for any distribution with finite second moment, the simple operation of adding independent copies always leads to a Gaussian (normal) curve.

19.4.2 Simulation 1

Since the CLT seems almost magical, running simulations that verify its implications is one good way to build understanding.

To this end, we now perform the following simulation

  1. Choose an arbitrary distribution $F$ for the underlying observations $X_i$.
  2. Generate independent draws of $Y_n := \sqrt{n} ( \bar X_n - \mu )$.
  3. Use these draws to compute some measure of their distribution --- such as a histogram.
  4. Compare the latter to $N(0, \sigma^2)$.

Here’s some code that does exactly this for the exponential distribution $F(x) = 1 - e^{-\lambda x}$.

(Please experiment with other choices of $F$, but remember that, to conform with the conditions of the CLT, the distribution must have a finite second moment.)

# Set parameters
n = 250         # Choice of n
k = 1_000_000        # Number of draws of Y_n
distribution = st.expon(scale=2) # Exponential distribution, λ = 1/2 (scale = 1/λ)
μ, σ = distribution.mean(), distribution.std()

# Draw underlying RVs. Each row contains a draw of X_1,..,X_n
data = distribution.rvs((k, n))
# Compute mean of each row, producing k draws of \bar X_n
sample_means = data.mean(axis=1)
# Generate observations of Y_n
Y = np.sqrt(n) * (sample_means - μ)

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
xmin, xmax = -3 * σ, 3 * σ
ax.set_xlim(xmin, xmax)
ax.hist(Y, bins=60, alpha=0.4, density=True)
xgrid = np.linspace(xmin, xmax, 200)
ax.plot(xgrid, st.norm.pdf(xgrid, scale=σ), 
        'k-', lw=2, label=r'$N(0, \sigma^2)$')
ax.set_xlabel(r"$Y_n$", size=12)
ax.set_ylabel(r"$density$", size=12)

ax.legend()

plt.show()

(Notice the absence of for loops --- every operation is vectorized, meaning that the major calculations are all shifted to fast C code.)

The fit to the normal density is already tight and can be further improved by increasing $n$.
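
As a quick experiment (our own sketch, reusing distribution, μ and σ from the code above; we shrink k so that memory use stays comparable), we can rerun the simulation with a larger $n$ and compare the fit:

# Rerun the CLT simulation with a larger n (illustrative sketch)
n, k = 2_500, 100_000
data = distribution.rvs((k, n))
Y = np.sqrt(n) * (data.mean(axis=1) - μ)

fig, ax = plt.subplots()
ax.hist(Y, bins=60, alpha=0.4, density=True)
xgrid = np.linspace(-3 * σ, 3 * σ, 200)
ax.plot(xgrid, st.norm.pdf(xgrid, scale=σ), 'k-', lw=2, label=r'$N(0, \sigma^2)$')
ax.legend()
plt.show()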

19.5 Exercises

Solution to Exercise 1
# Set parameters
n = 250         # Choice of n
k = 1_000_000        # Number of draws of Y_n
distribution = st.beta(2,2) # We chose Beta(2, 2) as an example
μ, σ = distribution.mean(), distribution.std()

# Draw underlying RVs. Each row contains a draw of X_1,..,X_n
data = distribution.rvs((k, n))
# Compute mean of each row, producing k draws of \bar X_n
sample_means = data.mean(axis=1)
# Generate observations of Y_n
Y = np.sqrt(n) * (sample_means - μ)

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
xmin, xmax = -3 * σ, 3 * σ
ax.set_xlim(xmin, xmax)
ax.hist(Y, bins=60, alpha=0.4, density=True)
ax.set_xlabel(r"$Y_n$", size=12)
ax.set_ylabel(r"$density$", size=12)
xgrid = np.linspace(xmin, xmax, 200)
ax.plot(xgrid, st.norm.pdf(xgrid, scale=σ), 'k-', lw=2, label=r'$N(0, \sigma^2)$')
ax.legend()

plt.show()
Solution to Exercise 2

We can write $X$ as $X = \mathbf 1\{U < p\}$ where $\mathbf 1$ is the indicator function (i.e., 1 if the statement is true and zero otherwise).

Here we generated a uniform draw $U$ on $[0,1]$ and then used the fact that

$$\mathbb P\{0 \leq U < p\} = p - 0 = p$$

This means that $X = \mathbf 1\{U < p\}$ has the right distribution.
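
As a quick check (our own sketch, taking $p = 0.8$ as in the earlier examples), we can generate many uniform draws and verify that the fraction satisfying $U < p$ is close to $p$:

# Generate Bernoulli draws as X = 1{U < p} (illustrative sketch)
p = 0.8
U = st.uniform.rvs(size=1_000_000)  # uniform draws on [0, 1]
X = (U < p).astype(int)             # the indicator 1{U < p}
print(X.mean())                     # should be close to p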

Solution to Exercise 3

Q1 Solution

Regarding part 1, we claim that $X_t$ has the same distribution as $X_0$ for all $t$. (The process in question, as used in the code below, is $X_{t+1} = \alpha + \beta X_t + \sigma \epsilon_{t+1}$ with IID standard normal shocks $\epsilon_t$ and $X_0 \sim N\left(\frac{\alpha}{1-\beta}, \frac{\sigma^2}{1-\beta^2}\right)$.)

To construct a proof, we suppose that the claim is true for $X_t$.

Now we claim it is also true for $X_{t+1}$.

Observe that we have the correct mean:

$$\begin{aligned} \mathbb E X_{t+1} &= \alpha + \beta \mathbb E X_t \\ &= \alpha + \beta \frac{\alpha}{1-\beta} \\ &= \frac{\alpha}{1-\beta} \end{aligned}$$

We also have the correct variance:

$$\begin{aligned} \mathrm{Var}(X_{t+1}) &= \beta^2 \mathrm{Var}(X_{t}) + \sigma^2 \\ &= \frac{\beta^2\sigma^2}{1-\beta^2} + \sigma^2 \\ &= \frac{\sigma^2}{1-\beta^2} \end{aligned}$$

Finally, since both $X_t$ and $\epsilon_{t+1}$ are normally distributed and independent from each other, any linear combination of these two variables is also normally distributed.

We have now shown that

$$X_{t+1} \sim N \left(\frac{\alpha}{1-\beta}, \frac{\sigma^2}{1-\beta^2}\right)$$

We can conclude that this AR(1) process violates the independence assumption but is identically distributed.

Q2 Solution

σ = 10
α = 0.8
β = 0.2
n = 100_000

fig, ax = plt.subplots(figsize=(10, 6))
x = np.ones(n)
x[0] = st.norm.rvs(α/(1-β), σ/np.sqrt(1-β**2))  # X_0 drawn from its stationary distribution
ϵ = st.norm.rvs(size=n+1)
for t in range(n-1):
    x[t+1] = α + β * x[t] + σ * ϵ[t+1]

# Compute the running sample means \bar X_1, ..., \bar X_n (vectorized)
means = np.cumsum(x) / np.arange(1, n + 1)


ax.scatter(range(100, n), means[100:n], s=10, alpha=0.5)

ax.set_xlabel(r"$n$", size=12)
ax.set_ylabel(r"$\bar X_n$", size=12)
ax.axhline(y=α/(1-β), ls="--", lw=3, 
           label=r"$\mu = \frac{\alpha}{1-\beta}$", 
           color = 'black')

plt.legend()
plt.show()

We see the convergence of $\bar X_n$ to $\mu$ even when the independence assumption is violated.

CC-BY-SA-4.0

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.