Simulations for Statistical and Thermal Physics

Central limit theorem

Introduction

The central limit theorem states that the probability distribution of the average of N independent random variables, all drawn from the same probability distribution, tends to a Gaussian distribution as N approaches infinity. The only requirement is that the variance of that probability distribution be finite.

To be specific, consider a continuous random variable x with probability density f(x). That is, for small Δx, f(x)Δx is the probability that x has a value between x and x + Δx. The mean value of x is defined as

<x> = ∫x f(x) dx.

Similarly the mean value of x² is given by

<x²> = ∫x²f(x) dx.

The variance σ_x² of f(x) is

σ_x² = <x²> - <x>².
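The integrals for <x>, <x²>, and σ_x² can also be checked numerically. Below is a minimal Python sketch using the midpoint rule. `moments` is a hypothetical helper, not part of the original programs. It assumes f(x) is already normalized and negligible outside a finite interval [a, b], so a density defined on an infinite range must be truncated.

```python
import math


def moments(f, a, b, n=100_000):
    """Approximate the normalization, <x>, <x^2>, and variance of the
    density f on [a, b] with the midpoint rule using n subintervals."""
    h = (b - a) / n
    norm = mean = mean_sq = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h   # midpoint of the i-th subinterval
        w = f(x) * h            # approximate probability of this subinterval
        norm += w
        mean += x * w
        mean_sq += x * x * w
    return norm, mean, mean_sq, mean_sq - mean ** 2


if __name__ == "__main__":
    # Uniform density f(x) = 1 on [0, 1].
    print(moments(lambda x: 1.0, 0.0, 1.0))
    # Exponential density 2 exp(-2x), truncated at x = 20 where it is negligible.
    print(moments(lambda x: 2.0 * math.exp(-2.0 * x), 0.0, 20.0))
```

The returned tuple is (normalization, <x>, <x²>, σ_x²); the first entry gives a quick check that a candidate density is properly normalized.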

Now consider the average y_N of N values of x:

y = y_N = (1/N)(x₁ + x₂ + … + x_N).

We generate the N values of x from the probability density f(x) and compute y. The quantity y is an example of a random additive process. We know that the values of y will not be identical but will be distributed according to a probability density p(y), where, for small Δy, p(y)Δy is the probability that y has a value between y and y + Δy. The main question of interest is: what is the form of the probability density p(y)?

As we will find by doing the simulation, the form of p(y) is universal if σ_x is finite and N is sufficiently large.
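For reference, the standard form of the theorem also fixes the parameters of the limiting Gaussian. Assuming the x_i are independent, the mean of y equals <x>, and because the variances of independent random variables add, the variance of y decreases as 1/N:

```latex
\begin{aligned}
y &= \frac{1}{N}\sum_{i=1}^{N} x_i
\quad\Longrightarrow\quad
\langle y \rangle = \langle x \rangle,
\qquad
\sigma_y^2 = \frac{1}{N^2}\sum_{i=1}^{N} \sigma_x^2 = \frac{\sigma_x^2}{N},\\[4pt]
p(y) &\approx \frac{1}{\sqrt{2\pi\sigma_y^2}}
\exp\!\left[-\frac{(y - \langle x \rangle)^2}{2\sigma_y^2}\right]
\qquad (N \gg 1).
\end{aligned}
```

The width of p(y) is therefore expected to scale as σ_y = σ_x/√N, which the simulation can be used to test.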

Method

1. Generate N random variables x_i distributed according to a given probability density f(x), sum them, and divide by N to obtain one value of y.
2. Repeat step 1 many times.
3. Plot a histogram of the values of y.
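The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the Java program CentralApp named at the end of this page; the function names, the default uniform density on [0, 1), the fixed random seed, and the text-mode histogram are all assumptions made for the sketch.

```python
import math
import random


def sample_y(n, draw, rng):
    """Step 1: draw n values of x, sum them, and divide by n."""
    return sum(draw(rng) for _ in range(n)) / n


def simulate(n, trials, draw=lambda rng: rng.random(), seed=1):
    """Step 2: repeat step 1 `trials` times (default draw: uniform on [0, 1))."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    return [sample_y(n, draw, rng) for _ in range(trials)]


def histogram(values, bins, lo, hi):
    """Step 3: count the values falling in `bins` equal-width bins on [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        k = math.floor((v - lo) / width)
        if 0 <= k < bins:  # values outside [lo, hi) are ignored
            counts[k] += 1
    return counts


if __name__ == "__main__":
    bins = 20
    ys = simulate(n=12, trials=10_000)
    counts = histogram(ys, bins, 0.0, 1.0)
    peak = max(counts)
    for k, c in enumerate(counts):
        # One row per bin: left edge, then a bar scaled to the tallest bin.
        print(f"{k / bins:4.2f} {'#' * (60 * c // peak)}")
```

Passing a different `draw` function to `simulate` changes f(x) without touching the rest of the procedure.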

Problems

1. First consider the uniform distribution f(x) = 1 in the interval [0, 1]. Calculate <x> and σ_x.
2. Use the default value of N and describe the qualitative form of p(y). Does the qualitative form of p(y) change as the number of measurements of y is increased for a given value of N?
3. What is the approximate width of p(y) for N = 12? Describe the changes, if any, in the form and width of p(y) as N is increased. Increase N by at least a factor of 4.
4. To determine the generality of your results, consider the probability density f(x) = 2e^(-2x) for x ≥ 0. Verify that f(x) is properly normalized. (We have chosen f(x) so that its mean is the same as the mean for the uniform distribution in Problem 1.)
5. Consider the Lorentz distribution

   f(x) = (1/π) · 1/(x² + 1),

   where −∞ < x < ∞. Use symmetry arguments to show that <x> = 0. What is the variance σ_x²? Do you obtain a Gaussian distribution for this case? If not, why not?
6. Each value of y can be considered to be a measurement. The sample variance s² is a measure of the squared deviation of each measurement from the mean and is given by

   The reason for the factor of N − 1 rather than N in the definition of s² is that to compute it, we need to use the N values of x to compute the mean of y, and thus, loosely speaking, we have only N − 1 independent values of x remaining to calculate s². Show that if N ≫ 1, then s ≅ σ_y, where the standard deviation σ_y is given by

   σ_y² = <y²> − <y>².

   The quantity s is known as the standard deviation of the means. That is, s gives a measure of how much variation we expect to find if we make repeated measurements of y. How does the value of s compare to your estimated width of the probability density p(y)?
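To explore the problems above outside the original program, here is a minimal Python sketch. It is not the CentralApp implementation: the inverse-transform samplers, the function names, the fixed seed, and the use of the interquartile range (IQR) as a width measure are assumptions of this sketch. It also assumes the usual sample-variance definition s² = [1/(M − 1)] Σ_{i=1}^{M} (y_i − ȳ)² over M measurements of y (M and ȳ are notation introduced here), whose square root is what Python's `statistics.stdev` returns.

```python
import math
import random
import statistics

# Inverse-transform samplers: if u is uniform on [0, 1), then x = F^-1(u)
# has density f(x), where F is the cumulative distribution of f.
def uniform(rng):
    return rng.random()                               # f(x) = 1 on [0, 1]


def exponential(rng):
    return -math.log(1.0 - rng.random()) / 2.0        # f(x) = 2e^(-2x), x >= 0


def lorentz(rng):
    return math.tan(math.pi * (rng.random() - 0.5))   # f(x) = (1/pi) / (x^2 + 1)


def measurements(draw, n, trials, seed=2):
    """Return `trials` independent measurements of y = (x_1 + ... + x_n) / n."""
    rng = random.Random(seed)
    return [sum(draw(rng) for _ in range(n)) / n for _ in range(trials)]


def iqr(values):
    """Interquartile range: a width that stays finite even when sigma_x does not."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


if __name__ == "__main__":
    for name, draw in [("uniform", uniform), ("exponential", exponential),
                       ("lorentz", lorentz)]:
        for n in (12, 48):
            ys = measurements(draw, n, trials=5000)
            s = statistics.stdev(ys)  # divides by (number of measurements - 1)
            print(f"{name:12s} N = {n:3d}   s = {s:10.4f}   IQR = {iqr(ys):8.4f}")
```

With these assumptions, s for the uniform and exponential densities should shrink by roughly a factor of 2 when N increases from 12 to 48, as expected if σ_y = σ_x/√N. For the Lorentz density the IQR should stay near 2 for every N: the average of N independent Lorentz variables has the same Lorentz distribution as a single one, so p(y) never narrows toward a Gaussian.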

References

H. Gould, J. Tobochnik, and W. Christian, An Introduction to Computer Simulation Methods, 3rd ed. (Addison-Wesley, 2006), pp. 213-214.

Java Classes

CentralApp

Updated 28 December 2009.