Segment 24

1. Let X be an R.V. that is a linear combination (with known, fixed coefficients \alpha_k) of twenty N(0,1) deviates. That is, X = \sum_{k=1}^{20} \alpha_k T_k, where the T_k are i.i.d. N(0,1). How can you most simply form a t-value-squared (that is, something distributed as Chisquare(1)) from X? For some particular choice of \alpha_k's (random is ok), generate a sample of x's, plot their histogram, and show that it agrees with Chisquare(1).

Since X = \sum_{k=1}^{20} \alpha_k T_k is a sum of independent normals, X \sim N(0, \sigma^2) with \sigma^2 = \sum_{k=1}^{20} \alpha_k^2, and X/\sigma \sim N(0,1). The t-value-squared is thus t^2 = X^2 / \sum_k \alpha_k^2, which is distributed as Chisquare(1). Code here:

import numpy as np
import matplotlib.pyplot as plt
from math import gamma

a_k = np.random.uniform(-1., 1., 20)
variance = np.sum(np.power(a_k, 2))

def X():
    # one draw of X = sum_k alpha_k T_k
    T_k = np.random.normal(size=20)
    return np.sum(a_k * T_k)

chi = []
for i in range(10000):
    chi.append((X()**2) / variance)

def chiSquare(x, k):
    # chi-square pdf with k degrees of freedom
    return (x**((k/2.) - 1.) * np.exp(-x/2.)) / (2**(k/2.) * gamma(k/2.))

x = np.arange(0.01, 100, .01)   # start above 0 to avoid the singularity at x = 0
plt.hist(chi, bins=1000, density=True)   # normed= is deprecated; use density=
plt.plot(x, chiSquare(x, 1), 'r')
plt.xlabel("t^2")
plt.ylabel("p(t^2)")
plt.axis([0, 6, 0, 1])
plt.show()
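Beyond eyeballing the histogram, the agreement with Chisquare(1) can be checked quantitatively. A minimal sketch using a Kolmogorov-Smirnov test from scipy.stats (the variable names are illustrative, not part of the assignment):

import numpy as np
import scipy.stats as st

rng = np.random.default_rng(0)
a_k = rng.uniform(-1., 1., 20)        # fixed coefficients alpha_k
variance = np.sum(a_k**2)             # Var(X) = sum_k alpha_k^2

# draw 10000 samples of t^2 = X^2 / Var(X) in one vectorized step
T = rng.normal(size=(10000, 20))
t_squared = (T @ a_k)**2 / variance

# KS test of the sample against the chi-square CDF with 1 d.o.f.
stat, pvalue = st.kstest(t_squared, st.chi2(df=1).cdf)
print(stat, pvalue)

A small KS statistic and a non-tiny p-value indicate the sample is consistent with Chisquare(1).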


2. From some matrix of known coefficients \alpha_{ik}, with 1 \le i \le 100 and 1 \le k \le 20, generate 100 R.V.s x_i = \sum_{k=1}^{20} \alpha_{ik} T_k, where the T_k are i.i.d. N(0,1). In other words, you are expanding 20 i.i.d. N(0,1)'s into 100 R.V.'s. Form a sum of 100 t-values-squareds obtained from these variables and demonstrate numerically by repeated sampling that it is distributed as Chisquare(\nu). What is the value of \nu? Use enough samples so that you could distinguish between Chisquare(\nu) and Chisquare(\nu \pm 1).

From the description given, the following represents the chi-squared for each trial run (using t^2 = x^2 / \sum_k \alpha_k^2 for a linear combination of normals): \chi^2 = \sum_{i=1}^{100} t_i^2 = \sum_{i=1}^{100} x_i^2 / \sum_{k=1}^{20} \alpha_{ik}^2. It is not possible for this statistic to be distributed as chi-square with any number of degrees of freedom, because the individual t-values (one for each i) are dependent on each other: every t-value is built from the same 20 T_k's. Plot below:

However, if each i uses its own independent set of 20 N(0,1) deviates, then the t_i^2 are independent of each other. Since there are 100 independent terms and no fitted parameters, the resulting sum should be distributed as chi-square(100), and with enough samples this is distinguishable from chi-square(99):

code below:

import numpy as np
import matplotlib.pyplot as plt
from math import gamma

a_k = np.random.uniform(-1., 1., 2000).reshape(100, 20)
variance = np.sum(np.power(a_k, 2), axis=1)

def X():
    # dependent case: all 100 x_i share the same 20 T_k's
    T_k = np.random.normal(size=20)
    x = np.sum(a_k * T_k, axis=1)
    # evaluates t_i^2
    t_i_squared = np.power(x, 2) / variance
    # chi-squared value: sum of the 100 t_i^2
    return t_i_squared.sum()

def X_altered():
    # independent case: each x_i gets its own 20 T_k's
    T_k = np.random.normal(size=(100, 20))
    x = np.sum(a_k * T_k, axis=1)
    # evaluates t_i^2
    t_i_squared = np.power(x, 2) / variance
    # chi-squared value: sum of the 100 t_i^2
    return t_i_squared.sum()


chi = []
for i in range(100000):
    chi.append(X_altered())   # swap in X() to see the dependent case fail

def chiSquare(x, k):
    # chi-square pdf with k degrees of freedom
    return (x**((k/2.) - 1.) * np.exp(-x/2.)) / (2**(k/2.) * gamma(k/2.))

x = np.arange(0.1, 200, .1)
plt.hist(chi, bins=1000, density=True)   # normed= is deprecated; use density=
plt.plot(x, chiSquare(x, 100), 'r', label="chisquare(100)")
plt.plot(x, chiSquare(x, 99), 'b', label="chisquare(99)")
plt.legend()
plt.xlabel("chi^2")
plt.ylabel("p(chi^2)")
plt.show()
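To confirm that 100,000 samples suffice to tell the two candidates apart, one can compare the mean log-likelihood of the sampled values under Chisquare(100) versus Chisquare(99); the difference is small per sample but accumulates decisively over this many draws. A rough sketch (names illustrative), simulating the independent case directly as chi-square(100) draws:

import scipy.stats as st
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
# the independent case: each sample is a sum of 100 i.i.d. t^2 values,
# i.e. a draw from Chisquare(100)
samples = st.chi2(df=100).rvs(size=n, random_state=rng)

# mean log-likelihood under each candidate degree of freedom
ll_100 = st.chi2(df=100).logpdf(samples).mean()
ll_99 = st.chi2(df=99).logpdf(samples).mean()
print(ll_100 - ll_99)   # positive: the data prefer nu = 100

The per-sample gap is roughly the KL divergence between the two distributions (of order 10^-3 here), while its standard error shrinks as 1/sqrt(n), so at n = 100,000 the sign of the difference is unambiguous.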


3. Reproduce the table of critical values shown in slide 7. Hint: Go back to segment 21 and listen to the exposition of slide 7. (My solution is 3 lines in Mathematica.)
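A sketch of how such a table can be reproduced in Python rather than Mathematica, assuming slide 7 shows the standard table of \Delta\chi^2 critical values (confidence level p versus number of fitted parameters \nu), as in Numerical Recipes; the chosen p levels and range of \nu are assumptions:

import scipy.stats as st

# confidence levels (rows) and numbers of parameters (columns), assumed
p_levels = [0.6827, 0.90, 0.9545, 0.99, 0.9973]
nus = [1, 2, 3, 4, 5, 6]

# Delta-chi-squared critical value = inverse chi-square CDF at p with nu d.o.f.
for p in p_levels:
    row = [st.chi2(df=nu).ppf(p) for nu in nus]
    print(p, ["%.2f" % v for v in row])

For example, this gives the familiar \Delta\chi^2 = 1.00 for one parameter at 68.27% confidence.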