Eleisha's Segment 34: Permutation Tests

From Computational Statistics Course Wiki
Revision as of 19:25, 15 April 2014 by Eleishaj (talk | contribs)

To Calculate

1. Use the permutation test to decide whether the contingency table $\begin{bmatrix} 5 & 3 & 2 \\ 2 & 3 & 6 \\ 0 & 2 & 3 \end{bmatrix}$ shows a significant association. What is the p-value?

The chi-square statistic for this table is 5.756. To decide whether this indicates a significant association, you can perform a permutation test under the null hypothesis that there is no association between the row and column variables. This is done by generating 100,000 random tables whose row and column marginals match those of the original table and computing the chi-square statistic for each one. The p-value is the fraction of these tables whose statistic is at least as extreme as that of the original table. Using this method, the p-value was approximately 0.226, so the association was not found to be significant.
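
As a quick check on the quoted statistic, the following is a minimal sketch that recomputes it from the table margins using the standard Pearson formula (sum over cells of (observed - expected)^2 / expected, with expected = row total * column total / N). The variable names are illustrative and are not taken from the original script.

```python
import numpy as np

# Observed table from the exercise.
obs = np.array([[5, 3, 2],
                [2, 3, 6],
                [0, 2, 3]], dtype=float)

row_totals = obs.sum(axis=1)   # row marginals: 10, 11, 5
col_totals = obs.sum(axis=0)   # column marginals: 7, 8, 11
n = obs.sum()                  # total count: 26

# Expected counts under independence: (row total * column total) / n.
expected = np.outer(row_totals, col_totals) / n

# Pearson chi-square statistic (no continuity correction).
chi_sq = ((obs - expected) ** 2 / expected).sum()
print(round(chi_sq, 3))  # 5.756
```

Because every permuted table keeps the same marginals, the expected counts are identical for all of them; only the observed cell counts change from one permutation to the next.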

Below is the Python script that was used to perform the calculations.

import scipy.stats
import numpy as np
import random
import matplotlib.pyplot as plt

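obs_list = [[5,3,2], [2,3,6], [0,2,3]] # Load the observed table
obs = np.array(obs_list)
chi_t, p_val, dof, expected = scipy.stats.chi2_contingency(obs)
print("Test statistic: %s" % chi_t) # Chi-square statistic of the observed table
print("p-value: %s" % p_val) # Asymptotic p-value from the chi-square distribution
print("Degrees of Freedom: %s" % dof)
print("Expected counts:")
print(expected)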

rows, cols = obs.shape
first_col = []
second_col = []

# Expand the table into one (row label, column label) pair per observation
for i in range(rows):
	for j in range(cols):
		for k in range(obs[i][j]):
			first_col.append(i+1)
			second_col.append(j+1)

# Build a contingency table from the row labels and the shuffled column labels
def create_new_table(first_col, shuffle_col):
	new_table = np.zeros((rows, cols))
	for i in range(len(first_col)):
		num_i = first_col[i] - 1
		num_j = shuffle_col[i] - 1
		new_table[num_i][num_j] += 1
	return new_table

p_vals = []
chi_ts = []

for x in range(100000):  # Generate 100,000 permutations
	shuffle_col = list(second_col) # Copy so the original column labels are left untouched
	random.shuffle(shuffle_col) # Shuffling one set of labels keeps both marginals fixed
	table = create_new_table(first_col, shuffle_col)
	t, p, _, _ = scipy.stats.chi2_contingency(table) # Do not overwrite dof and expected
	p_vals.append(p)
	chi_ts.append(t)

# One-tailed permutation p-value: fraction of permuted tables
# whose statistic is at least as large as the observed one
sum_chi = 0
for element in chi_ts:
	if element >= chi_t:
		sum_chi += 1
final_p = float(sum_chi)/len(chi_ts)
print("p-value: %s" % final_p)

plt.hist(chi_ts, 30) #Plot a histogram
plt.xlabel("Value of Pearson chi-square statistic")
plt.ylabel("Frequency")
plt.savefig("Eleisha_HW34.png")
plt.show()	

Sample Output

Test statistic: 5.75599173554
p-value: 0.218127096086
Degrees of Freedom: 4
p-value: 0.22573
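
The permutation p-value is a Monte Carlo estimate, so it carries sampling error of its own. As a rough sketch, assuming the 100,000 permutations are independent draws, its standard error can be approximated with the usual binomial formula sqrt(p(1-p)/B). The names `p_hat`, `n_perm`, and `std_err` are illustrative and are not part of the original script.

```python
import math

p_hat = 0.22573    # permutation p-value from the sample output
n_perm = 100000    # number of permutations used by the script

# Binomial standard error of a proportion estimated from n_perm independent draws.
std_err = math.sqrt(p_hat * (1.0 - p_hat) / n_perm)
print("standard error: %.5f" % std_err)  # standard error: 0.00132
```

Under that assumption, rerunning the script with a different random seed would be expected to change the permutation p-value only in the third decimal place.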