# Eleisha's Segment 10: The Central Limit Theorem

**To Calculate:**

1. Take 12 random values, each uniform between 0 and 1. Add them up and subtract 6. Prove that the result is close to a random value drawn from the Normal distribution with mean zero and standard deviation 1.

Below is some code that calculates the sum of these 12 uniform random values and compares it to a normal distribution with mean zero and standard deviation 1.

    import random

    import numpy as np
    import matplotlib.pyplot as plt

    N_SAMPLES = 10000
    X_TICKS = [-4, -3, -2, -1, 0, 1, 2, 3, 4]
    Y_TICKS = [0, 25, 50, 75, 100, 125, 150, 175, 200]


    def get_uni_sum():
        """Return the sum of 12 Uniform(0, 1) draws, shifted by 6 to have mean zero."""
        rand_vals = [random.random() for _ in range(12)]
        return np.sum(rand_vals) - 6


    # Draw the same number of samples from each distribution.
    normal_data = []
    uni_data = []
    for _ in range(N_SAMPLES):
        normal_data.append(random.normalvariate(0, 1))
        uni_data.append(get_uni_sum())

    print("Summary statistics from %d normal samples" % N_SAMPLES)
    print("Mean: " + str(np.mean(normal_data)))
    print("Standard Deviation: " + str(np.std(normal_data)))

    print("Summary statistics from %d uniform sums samples" % N_SAMPLES)
    print("Mean: " + str(np.mean(uni_data)))
    print("Standard Deviation: " + str(np.std(uni_data)))

    # Histogram of the normal samples (values on the x-axis, counts on the y-axis).
    plt.figure(1)
    plt.hist(normal_data, 200)
    plt.title("Samples from a Normal Distribution")
    plt.xlabel("Value")
    plt.ylabel("Frequency")
    plt.xticks(X_TICKS, [str(t) for t in X_TICKS])
    plt.yticks(Y_TICKS, [str(t) for t in Y_TICKS])
    plt.savefig("Norm_Samples.pdf")

    # Histogram of the shifted uniform sums, drawn with the same bins and ticks.
    plt.figure(2)
    plt.hist(uni_data, 200)
    plt.title("Sums of 12 Uniform Samples, Minus 6")
    plt.xlabel("Value")
    plt.ylabel("Frequency")
    plt.xticks(X_TICKS, [str(t) for t in X_TICKS])
    plt.yticks(Y_TICKS, [str(t) for t in Y_TICKS])
    plt.savefig("Uni_Samples.pdf")

    plt.show()

Sample Output

    Summary statistics from 10000 normal samples
    Mean: -0.00614837267393
    Standard Deviation: 1.00521848466
    Summary statistics from 10000 uniform sums samples
    Mean: 0.00682570348368
    Standard Deviation: 0.991557299845

Here is a graph of the modified uniform distribution sum:

Here is a graph of samples from a normal distribution:

As you can see, the shifted sum of uniform random variables and the samples from the normal distribution are quite similar: in the sample output, both have a mean within 0.01 of zero and a standard deviation within 0.01 of one.
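
The match is expected: a Uniform(0, 1) variable has mean 1/2 and variance 1/12, so the sum of 12 of them has mean 6 and variance 1. As a rough numerical check (not a proof), the sketch below compares the empirical CDF of the shifted sums with the standard Normal CDF. The helper names, the seed, and the sample size are illustrative assumptions, and it uses only the standard library.

```python
import math
import random


def uniform_sum_variate(rng):
    """Sum of 12 Uniform(0, 1) draws, shifted by 6 to have mean zero."""
    return sum(rng.random() for _ in range(12)) - 6.0


def normal_cdf(x):
    """CDF of the standard Normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def max_cdf_gap(samples):
    """Largest gap between the empirical CDF of samples and the Normal CDF."""
    ordered = sorted(samples)
    n = len(ordered)
    gap = 0.0
    for i, x in enumerate(ordered):
        f = normal_cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, abs((i + 1) / n - f), abs(i / n - f))
    return gap


rng = random.Random(0)  # fixed seed, chosen arbitrarily for repeatability
samples = [uniform_sum_variate(rng) for _ in range(10000)]
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)

print("mean:", mean)
print("variance:", variance)
print("max CDF gap:", max_cdf_gap(samples))
```

A small maximum gap means the two distributions agree closely over the whole range, not only in their first two moments. The sum can never fall outside [-6, 6], so the agreement must break down in the far tails.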

2. Invent a family of functions, each different, that look like those in Slide 3: they all have value 1 at x = 0; they all have zero derivative at x = 0; and they generally (not necessarily monotonically) decrease to zero at large x. Now multiply 10 of them together and graph the result near the origin (i.e., reproduce what Slide 3 was sketching).
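
One possible family, sketched here as an illustration rather than taken from the slides, is f_a(x) = 1 / (1 + a x^2) with a different width a for each member. Every member equals 1 at x = 0, has zero derivative there, and decays to zero for large |x|. The widths below are arbitrary assumptions. The code tabulates the product of ten members next to the Gaussian exp(-x^2 * sum(a)), which has the same curvature at the origin.

```python
import math

# Hypothetical family f_a(x) = 1 / (1 + a * x**2); the ten widths are arbitrary.
WIDTHS = [0.5 + 0.25 * k for k in range(10)]


def member(a, x):
    """One family member: equals 1 at x = 0, is flat there, and decays for large |x|."""
    return 1.0 / (1.0 + a * x * x)


def product(x):
    """Product of all ten family members evaluated at x."""
    result = 1.0
    for a in WIDTHS:
        result *= member(a, x)
    return result


def matching_gaussian(x):
    """Gaussian exp(-sum(a) * x**2), with the same curvature at x = 0 as the product."""
    return math.exp(-sum(WIDTHS) * x * x)


# Tabulate both curves near the origin: x, product, matching Gaussian.
for i in range(-5, 6):
    x = 0.1 * i
    print("%5.1f  %.4f  %.4f" % (x, product(x), matching_gaussian(x)))
```

The two columns agree closely near x = 0 and drift apart further out, which is the behaviour the problem asks to reproduce. The same values could be passed to `plt.plot` to draw the curves instead of printing them.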

3. For what value(s) of $\nu$ does the Student distribution (Segment 8, Slide 4) have a convergent 1st and 2nd moment, but divergent 3rd and higher moments?

The Student distribution has convergent 1st and 2nd moments but divergent 3rd and higher moments when $2 < \nu \le 3$ (for integer $\nu$, this means $\nu = 3$). This is because the tails fall off as $p(t) \sim |t|^{-(\nu+1)}$, so the integral $\int_{-\infty}^{\infty} |t|^k \, p(t) \, dt$ converges only when $k < \nu$. The $k = 1$ and $k = 2$ integrals therefore converge for $\nu > 2$, while the $k = 3$ integral diverges for $\nu \le 3$.
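
A numerical sketch of the moment behaviour at $\nu = 3$: the code below integrates $|t|^k p(t)$ over $[-T, T]$ for growing cutoffs $T$. It assumes the standard Student density with zero location and unit scale, which may differ from the parameterization on the slide. The function names and the step size are illustrative choices.

```python
import math


def student_pdf(t, nu):
    """Standard Student-t density with nu degrees of freedom (location 0, scale 1)."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm) * (1 + t * t / nu) ** (-(nu + 1) / 2)


def truncated_abs_moment(k, nu, cutoff, step=0.01):
    """Midpoint-rule estimate of the integral of |t|^k p(t) over [-cutoff, cutoff]."""
    n = int(round(cutoff / step))
    half = sum(((i + 0.5) * step) ** k * student_pdf((i + 0.5) * step, nu)
               for i in range(n)) * step
    return 2 * half  # the integrand is even in t


# Columns: cutoff T, truncated 2nd moment, truncated 3rd absolute moment.
for cutoff in (10, 100, 1000):
    m2 = truncated_abs_moment(2, 3, cutoff)
    m3 = truncated_abs_moment(3, 3, cutoff)
    print(cutoff, round(m2, 3), round(m3, 3))
```

The truncated 2nd moment settles toward $\nu / (\nu - 2) = 3$ as the cutoff grows. The truncated 3rd absolute moment keeps increasing by a roughly constant amount per decade of $T$, the signature of a logarithmic divergence.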

**To Think About:**

1. A distribution with moments as in problem 3 above has a well-defined mean and variance. Does the CLT hold for the sum of RVs from such a distribution? If not, what goes wrong in the proof? Is the mean of the sum equal to the sum of the individual means? What about the variance of the sum? What, qualitatively, does the distribution of the sum of a bunch of them look like?

2. Give an explanation of Bessel's correction in the last expression on slide 5. If, as we see, the MAP calculation gives the factor 1/N, why would one ever want to use 1/(N-1) instead? (There are various wiki and stackoverflow pages on this. See if they make sense to you!)
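
One way to see the difference between the two factors is a small simulation, sketched here under assumed settings (Normal data with true variance 1, samples of size 5, an arbitrary seed). It averages the 1/N and 1/(N-1) variance estimates over many repeated samples.

```python
import random


def variance_estimates(sample):
    """Return the (1/N, 1/(N-1)) variance estimates about the sample mean."""
    n = len(sample)
    mean = sum(sample) / n
    sum_sq = sum((x - mean) ** 2 for x in sample)
    return sum_sq / n, sum_sq / (n - 1)


rng = random.Random(1)  # fixed seed, chosen arbitrarily for repeatability
N = 5           # assumed sample size; small so the bias is easy to see
TRIALS = 20000  # number of repeated samples to average over

total_biased = 0.0
total_unbiased = 0.0
for _ in range(TRIALS):
    sample = [rng.gauss(0.0, 1.0) for _ in range(N)]  # true variance is 1
    biased, unbiased = variance_estimates(sample)
    total_biased += biased
    total_unbiased += unbiased

avg_biased = total_biased / TRIALS
avg_unbiased = total_unbiased / TRIALS
print("average 1/N estimate:    ", avg_biased)
print("average 1/(N-1) estimate:", avg_unbiased)
```

On average the 1/N estimate comes out near (N-1)/N = 0.8 of the true variance, because deviations are measured from the sample mean rather than the true mean. The 1/(N-1) estimate averages close to 1, which is the sense in which Bessel's correction removes the bias.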

**Back To** Eleisha Jackson