Central limit theorem

Introduction

The central limit theorem states that the probability distribution of the mean of N independent random variables, each drawn from the same probability distribution, tends to a Gaussian as N approaches infinity. The only requirement is that the variance of that probability distribution be finite.

    To be specific, consider a continuous random variable x with probability density f(x). That is, f(x)Δx is the probability that x has a value between x and x + Δx. The mean value of x is defined as

<x> = ∫x f(x) dx.

Similarly, the mean value of x^2 is given by

<x^2> = ∫x^2 f(x) dx.

The variance σ_x^2 of f(x) is

σ_x^2 = <x^2> - <x>^2.
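
The definitions above can be checked numerically. The following is a minimal sketch, not part of the original simulation: the function name moments, the midpoint rule, and the number of bins are all assumptions made for illustration. It approximates each integral by a sum over small bins of width Δx, using f(x)Δx as the probability of each bin.

```python
def moments(f, a, b, n_bins=100_000):
    """Approximate the normalization, <x>, and the variance of f on [a, b].

    Hypothetical helper: uses the midpoint rule, with f(x) * dx as the
    probability that x falls in the bin centred on x.
    """
    dx = (b - a) / n_bins
    norm = 0.0
    mean = 0.0
    mean_sq = 0.0
    for i in range(n_bins):
        x = a + (i + 0.5) * dx      # bin centre
        w = f(x) * dx               # probability of this bin
        norm += w
        mean += x * w               # contribution to <x>
        mean_sq += x * x * w        # contribution to <x^2>
    variance = mean_sq - mean ** 2  # sigma_x^2 = <x^2> - <x>^2
    return norm, mean, variance


# Example: the uniform density f(x) = 1 on the interval [0, 1].
norm, mean, variance = moments(lambda x: 1.0, 0.0, 1.0)
print(f"normalization = {norm:.6f}, <x> = {mean:.6f}, variance = {variance:.6f}")
```

A density defined on an infinite interval would need a finite cutoff for b, chosen large enough that the neglected tail is small.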

Now consider the normalized sum y_N of N values of x, that is, their mean:

y = y_N = (1/N)(x_1 + x_2 + … + x_N).

We generate the N values of x from the probability density f(x) and determine their normalized sum y. The quantity y is an example of a random additive process. We know that the values of y will not be identical, but will be distributed according to a probability density p(y), where p(y)Δy is the probability that the value of y is in the range y to y + Δy. The main question of interest is: what is the form of the probability density p(y)?

    As we will find by doing the simulation, the form of p(y) is universal if σ_x is finite and N is sufficiently large.

Method

  1. Generate N random variables x_i drawn from a given probability density f(x), sum them, and divide by N.
  2. Repeat step 1 many times.
  3. Plot the histogram of the values of the sum y.
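
The three steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the simulation's own code: the function names, the seed, the number of trials, the bin count, and the text-mode histogram are all assumptions. The sampler shown draws x from the uniform density on [0, 1].

```python
import random


def sample_means(sampler, n_terms, n_trials, seed=0):
    """Steps 1 and 2: return n_trials values of y, each the mean of n_terms draws."""
    rng = random.Random(seed)
    return [
        sum(sampler(rng) for _ in range(n_terms)) / n_terms
        for _ in range(n_trials)
    ]


def histogram(values, lo, hi, n_bins):
    """Count how many values fall into each of n_bins equal bins on [lo, hi)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            index = min(int((v - lo) / width), n_bins - 1)  # guard against round-off
            counts[index] += 1
    return counts


def print_histogram(counts, lo, hi, max_width=50):
    """Step 3: print a text histogram, one row of '#' characters per bin."""
    width = (hi - lo) / len(counts)
    peak = max(counts) or 1
    for i, c in enumerate(counts):
        center = lo + (i + 0.5) * width
        print(f"{center:6.3f} | {'#' * round(max_width * c / peak)}")


# Uniform density on [0, 1], N = 12 terms per sum, 10,000 repetitions (assumed values).
ys = sample_means(lambda rng: rng.random(), n_terms=12, n_trials=10_000)
counts = histogram(ys, 0.0, 1.0, n_bins=20)
print_histogram(counts, 0.0, 1.0)
```

Changing n_terms and n_trials independently makes it possible to separate the effect of N from the effect of the number of measurements of y.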

Problems

  1. First consider the uniform distribution f(x) = 1 in the interval [0,1]. Calculate <x> and σ_x.
  2. Use the default value of N and describe the qualitative form of p(y). Does the qualitative form of p(y) change as the number of measurements of y is increased for a given value of N?
  3. What is the approximate width of p(y) for N = 12? Describe the changes, if any, of the form and width of p(y) as N is increased. Increase N by at least a factor of 4.
  4. To determine the generality of your results, consider the probability density f(x) = 2e^(-2x) for x ≥ 0. Verify that f(x) is properly normalized. (We have chosen f(x) so that its mean is the same as the mean for the uniform distribution in Problem 1.)
  5. Consider the Lorentz distribution

    f(x) = (1/π) · 1/(x^2 + 1),

    where -∞ < x < ∞. Use symmetry arguments to show that <x> = 0. What is the variance σ_x^2? Do you obtain a Gaussian distribution for this case? If not, why not?

  6. Each value of y can be considered to be a measurement. The sample variance s^2 measures the spread of the individual measurements about their mean and is given by

    s^2 = (1/(N - 1)) Σ_{i=1}^{N} (y_i - ȳ)^2,

    where ȳ is the mean of the measured values y_i.

    The reason for the factor of N - 1 rather than N in the definition of s^2 is that to compute it, we need to use the N values of x to compute the mean of y, and thus, loosely speaking, we have only N - 1 independent values of x remaining to calculate s^2. Show that if N >> 1, then s ≅ σ_y, where the standard deviation σ_y is given by

    σ_y^2 = <y^2> - <y>^2.

  7. The quantity s is known as the standard deviation of the means. That is, s gives a measure of how much variation we expect to find if we make repeated measurements of y. How does the value of s compare to your estimated width of the probability density p(y)?
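
For Problems 4 through 7, one possible way to draw x from the exponential and Lorentz densities is the inverse transform method, which is an assumption here and not necessarily what the simulation uses. The sketch below also computes s with statistics.stdev, which takes the square root of the sample variance with a divisor of one less than the number of values. All function names, the seed, and the values of n_terms and n_trials are hypothetical choices for illustration.

```python
import math
import random
import statistics


def sample_exponential(rng):
    """Draw x from f(x) = 2 e^(-2x), x >= 0.

    Inverse transform: the cumulative distribution is 1 - e^(-2x),
    so x = -ln(1 - u) / 2 for u uniform in [0, 1).
    """
    return -math.log(1.0 - rng.random()) / 2.0


def sample_lorentz(rng):
    """Draw x from f(x) = (1/pi) / (x^2 + 1).

    Inverse transform: the cumulative distribution is 1/2 + arctan(x)/pi,
    so x = tan(pi (u - 1/2)) for u uniform in [0, 1).
    """
    return math.tan(math.pi * (rng.random() - 0.5))


def sample_means(sampler, n_terms, n_trials, rng):
    """Return n_trials measurements of y, each the mean of n_terms draws."""
    return [
        sum(sampler(rng) for _ in range(n_terms)) / n_terms
        for _ in range(n_trials)
    ]


rng = random.Random(1)
results = {}
for name, sampler in [("exponential", sample_exponential), ("Lorentz", sample_lorentz)]:
    ys = sample_means(sampler, n_terms=12, n_trials=10_000, rng=rng)
    y_mean = statistics.mean(ys)
    s = statistics.stdev(ys)  # sample standard deviation (divisor: number of values - 1)
    results[name] = (y_mean, s)
    print(f"{name:12s} mean of y = {y_mean:10.3f}, s = {s:10.3f}")
```

Rerunning the loop with a different seed or a larger n_terms shows how stable the mean of y and s are for each density.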

Updated 28 December 2009.