# 24

1. Let X be an R.V. that is a linear combination (with known, fixed coefficients $\alpha_k$) of twenty $N(0,1)$ deviates. That is, $X = \sum_{k=1}^{20} \alpha_k T_k$ with $T_k \sim N(0,1)$. How can you most simply form a t-value-squared (that is, something distributed as $\chi^2_1$)? For some particular choice of $\alpha_k$'s (random is ok), generate a sample of x's, plot their histogram, and show that it agrees with $\chi^2_1$.

Since $X = \sum_{k=1}^{20} \alpha_k T_k$ with the $T_k$ independent $N(0,1)$ deviates, then $X \sim N(0, \sigma^2)$ and

$$\sigma^2 = \sum_{k=1}^{20} \alpha_k^2.$$

The t-value-squared is thus:

$$t^2 = \frac{X^2}{\sum_{k=1}^{20} \alpha_k^2} \sim \chi^2_1.$$

Code here:

    import numpy as np
    import matplotlib.pyplot as plt
    from math import gamma

    a_k = np.random.uniform(-1., 1., 20)        # random fixed coefficients alpha_k
    variance = np.sum(np.power(a_k, 2))         # Var(X) = sum_k alpha_k^2

    def X():
        T_k = np.random.normal(size=20)
        return np.sum(a_k * T_k)

    # sample t^2 = X^2 / Var(X) many times
    chi = []
    for i in range(10000):
        chi.append(X()**2 / variance)

    def chiSquare(x, k):
        # chi-square pdf with k degrees of freedom
        return (x**((k/2.) - 1.) * np.exp(-x/2.)) / (2**(k/2.) * gamma(k/2.))

    x = np.arange(0, 100, .01)
    plt.hist(chi, bins=1000, density=True)      # 'normed' was removed from matplotlib
    plt.plot(x, chiSquare(x, 1), 'r')
    plt.xlabel("$t^2$")
    plt.ylabel("$p(t^2)$")
    plt.axis([0, 6, 0, 1])
    plt.show()
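As a supplementary check beyond the histogram (this is an addition, not part of the original solution), a Kolmogorov–Smirnov test against the $\chi^2_1$ CDF quantifies the agreement; the sampling below mirrors the code above in vectorized form, with an illustrative fixed seed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a_k = rng.uniform(-1.0, 1.0, 20)       # fixed coefficients alpha_k
variance = np.sum(a_k**2)              # Var(X) = sum_k alpha_k^2

# draw 10000 samples of t^2 = X^2 / Var(X) in one shot
T = rng.normal(size=(10000, 20))
t_squared = (T @ a_k)**2 / variance

# KS test against the chi-square(1) CDF; a small statistic (large p-value)
# means the sample is consistent with the claimed distribution
ks_stat, p_value = stats.kstest(t_squared, stats.chi2(df=1).cdf)
print(ks_stat, p_value)
```

With 10000 samples the KS statistic should be of order $1/\sqrt{10000} \approx 0.01$ if the $\chi^2_1$ claim holds.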

2. From some matrix of known coefficients $b_{ik}$ with $1 \le i \le 100$ and $1 \le k \le 20$, generate 100 R.V.s $y_i = \sum_{k=1}^{20} b_{ik} T_k$, where the $T_k$ are twenty i.i.d. $N(0,1)$ deviates. In other words, you are expanding 20 i.i.d. $N(0,1)$'s into 100 R.V.'s. Form a sum of 100 t-values-squared obtained from these variables and demonstrate numerically by repeated sampling that it is distributed as $\chi^2_\nu$. What is the value of $\nu$? Use enough samples so that you could distinguish between $\chi^2_{99}$ and $\chi^2_{100}$.

From the description given, the following represents the chi-squared statistic for each trial run (each $y_i$ is a linear combination of normals, so $t_i^2 = y_i^2 / \sum_k b_{ik}^2$):

$$X^2 = \sum_{i=1}^{100} t_i^2 = \sum_{i=1}^{100} \frac{y_i^2}{\sum_{k=1}^{20} b_{ik}^2}$$

It is not possible for this statistic to be distributed as chi-square with any number of degrees of freedom, because the individual $t_i$ values are dependent on one another: within a single trial, every $t_i$ is built from the same twenty $T_k$ deviates. Plot below:
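Besides the plot, the dependence shows up numerically: a true $\chi^2_{100}$ variate has variance $2\nu = 200$, while the shared-deviate statistic is strongly over-dispersed. A self-contained check (an addition to the original solution; the coefficient matrix here is a fresh random draw with an illustrative seed):

```python
import numpy as np

rng = np.random.default_rng(2)
b = rng.uniform(-1.0, 1.0, size=(100, 20))   # coefficients b_ik
norms = np.sum(b**2, axis=1)                 # Var(y_i) = sum_k b_ik^2

# each trial reuses ONE set of 20 deviates for all 100 t_i's,
# so the t_i are correlated and the sum of squares is over-dispersed
n_trials = 20000
T = rng.normal(size=(n_trials, 20))
t_sq = (T @ b.T)**2 / norms                  # shape (n_trials, 100)
stat = t_sq.sum(axis=1)

print(stat.mean(), stat.var())   # mean is still ~100, variance far exceeds 200
```

The mean comes out near 100 (each $t_i^2$ has mean 1), but the variance is several times the 200 a genuine $\chi^2_{100}$ would have.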

However, if each $i$ uses a different set of twenty $N(0,1)$ deviates, then the $t_i$ are independent of one another. Since there are 100 independent t-values-squared and no fitted parameters, the resulting sum should be distributed as $\chi^2_{100}$, i.e. $\nu = 100$. This is detectable from $\chi^2_{99}$:

Code below:

    import numpy as np
    import matplotlib.pyplot as plt
    from math import gamma

    a_k = np.random.uniform(-1., 1., 2000).reshape(100, 20)  # coefficients b_ik
    variance = np.sum(np.power(a_k, 2), axis=1)              # Var(y_i) = sum_k b_ik^2

    def X():
        # obtains y_i: all 100 rows share ONE set of 20 deviates (dependent case)
        T_k = np.random.normal(size=20)
        x = np.sum(a_k * T_k, axis=1)
        # evaluates t_i^2 and sums to the chi-squared value
        t_i_squared = np.power(x, 2) / variance
        return t_i_squared.sum()

    def X_altered():
        # obtains y_i: each row gets its OWN 20 deviates (independent case)
        T_k = np.random.normal(size=(100, 20))
        x = np.sum(a_k * T_k, axis=1)
        t_i_squared = np.power(x, 2) / variance
        return t_i_squared.sum()

    # use the independent version for the chisquare(100) vs chisquare(99)
    # comparison; swap in X() to reproduce the dependent-case plot above
    chi = []
    for i in range(100000):
        chi.append(X_altered())

    def chiSquare(x, k):
        return (x**((k/2.) - 1.) * np.exp(-x/2.)) / (2**(k/2.) * gamma(k/2.))

    x = np.arange(0, 200, .1)
    plt.hist(chi, bins=1000, density=True)
    plt.plot(x, chiSquare(x, 100), 'r', label="chisquare(100)")
    plt.plot(x, chiSquare(x, 99), 'b', label="chisquare(99)")
    plt.legend()
    plt.xlabel("$X^2$")
    plt.ylabel("$p(X^2)$")
    plt.show()
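How many samples are "enough"? A back-of-envelope argument (an addition to the original solution): $\chi^2_\nu$ has mean $\nu$ and variance $2\nu$, so the sample mean of $N$ draws has standard error $\sqrt{2\nu/N}$; separating $\nu = 100$ from $\nu = 99$ (means differ by 1) at $k$ sigma needs $N > 2\nu k^2$, e.g. $N > 5000$ at 5 sigma. A quick numerical confirmation with an illustrative seed:

```python
import numpy as np

# chi-square(nu) has mean nu and variance 2*nu, so the sample mean of N
# draws has standard error sqrt(2*nu/N)
nu, N = 100, 100_000        # N matches the sample count used above
rng = np.random.default_rng(1)
sample = rng.chisquare(nu, size=N)

mean = sample.mean()
stderr = np.sqrt(2 * nu / N)             # about 0.045 for N = 100000
z_against_99 = (mean - 99.0) / stderr    # how many sigma from the nu=99 mean
print(mean, z_against_99)
```

With 100000 samples the observed mean sits roughly 20 sigma away from 99, so $\chi^2_{99}$ is decisively excluded.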

3. Reproduce the table of critical values shown in slide 7. Hint: Go back to segment 21 and listen to the exposition of slide 7. (My solution is 3 lines in Mathematica.)
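A sketch in Python rather than Mathematica, assuming slide 7 shows the standard $\Delta\chi^2$ critical-value table (confidence level versus degrees of freedom, as in Numerical Recipes); the p-levels and $\nu$ range below are that table's conventional values and may need adjusting to match the slide exactly:

```python
from scipy import stats

# assumed layout: rows = degrees of freedom nu, columns = confidence
# levels p; each entry is the Q with P(chi2_nu <= Q) = p, i.e. the
# inverse CDF (ppf) of the chi-square distribution
p_levels = [0.6827, 0.90, 0.9545, 0.99, 0.9973, 0.9999]
nus = range(1, 7)

table = {nu: [stats.chi2(df=nu).ppf(p) for p in p_levels] for nu in nus}

print("nu  " + "  ".join(f"{100*p:7.2f}%" for p in p_levels))
for nu in nus:
    print(f"{nu:2d}  " + "  ".join(f"{q:8.2f}" for q in table[nu]))
```

The `ppf` call is the whole trick: e.g. for $\nu = 1$ and $p = 68.27\%$ it returns 1.00, the familiar one-sigma $\Delta\chi^2$.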