Eleisha's Segment 34: Permutation Tests

To Calculate:

1. Use the permutation test to decide whether the contingency table <math>\begin{bmatrix} 5 & 3 & 2\\ 2 & 3 & 6 \\ 0 & 2 & 3 \end{bmatrix}</math> shows a significant association. What is the p-value?

The Pearson chi-square statistic for this table is 5.756. To decide whether the association is significant, you can perform a permutation test under the null hypothesis that there is no association between the rows and columns. You can do this by generating 100,000 tables whose marginals are the same as those of the original table. The p-value is the probability of seeing a statistic at least as extreme as the observed one. Using this method the association was not found to be significant; the p-value was approximately 0.226.
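
For reference, the statistic being compared here is the standard Pearson chi-square statistic, where <math>O_{ij}</math> are the observed counts and <math>E_{ij} = R_i C_j / N</math> are the counts expected under independence (row total times column total divided by the grand total):

<math>\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}</math>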

[[File:Eleisha_HW34.png|thumb|Histogram of chi-square statistics for the 100,000 permuted tables]]

Below is the Python script that was used to perform the calculations.

import scipy.stats
import numpy as np
import random
import matplotlib.pyplot as plt

obs_list = [[5, 3, 2], [2, 3, 6], [0, 2, 3]]  # Load the observed table
obs = np.array(obs_list)
chi_t, p_val, dof, expected = scipy.stats.chi2_contingency(obs)
print("Test statistic:", chi_t)  # Stats for the observed table
print("p-value:", p_val)
print("Degrees of Freedom:", dof)
print("Expected counts:", expected)

rows, cols = obs.shape
first_col = []
second_col = []

# Deconstruct the table into one (row label, column label) pair per count
for i in range(rows):
    for j in range(cols):
        num_elements = obs[i][j]
        for k in range(num_elements):
            first_col.append(i + 1)
            second_col.append(j + 1)

# Create a new table from the row labels and the shuffled column labels
def create_new_table(first_col, shuffle_col):
    new_table = np.zeros((rows, cols))
    for i in range(len(first_col)):
        num_i = first_col[i] - 1
        num_j = shuffle_col[i] - 1
        new_table[num_i][num_j] += 1
    return new_table

p_vals = []
chi_ts = []

for x in range(100000):  # Generate 100,000 permutations
    shuffle_col = list(second_col)  # Copy and shuffle the column labels
    random.shuffle(shuffle_col)
    table = create_new_table(first_col, shuffle_col)
    t, p, dof, expected = scipy.stats.chi2_contingency(table)
    p_vals.append(p)
    chi_ts.append(t)

# Calculate the one-tailed p-value: the fraction of permuted tables whose
# statistic is at least as extreme as the observed one
sum_chi = 0
for element in chi_ts:
    if element >= chi_t:
        sum_chi += 1
final_p = float(sum_chi) / len(chi_ts)
print("p-value from Permutation Test:", final_p)

plt.hist(chi_ts, 30)  # Plot a histogram of the permuted statistics
plt.xlabel("Value of Pearson chi-square statistic")
plt.ylabel("Frequency")
plt.savefig("Eleisha_HW34.png")
plt.show()

Sample Output

Test statistic: 5.75599173554
p-value: 0.218127096086
Degrees of Freedom: 4
p-value from Permutation Test: 0.22573

2. Repeat the calculation using the Pearson chi-square statistic instead of the Wald statistic, or vice versa.

Could not figure out how to calculate the Wald Statistic for the 3x3 case. Sorry!


To Think About:

1. Is slide 7's suggestion, that you figure out how to implement the permutation test without "expanding all the data", actually possible? If so, what is your method?
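
One possible approach (a sketch, not part of the original write-up): rather than expanding the table into one label pair per observation and shuffling, draw random tables with the same marginals directly, filling each row with a multivariate hypergeometric draw from the column totals left over from the previous rows; this samples from the same distribution as shuffling the expanded labels. The helper name draw_table_with_margins is made up for illustration, and the sketch assumes numpy.random.Generator.multivariate_hypergeometric and scipy.stats.chi2_contingency are available.

import numpy as np
import scipy.stats

obs = np.array([[5, 3, 2], [2, 3, 6], [0, 2, 3]])   # same table as above
row_tot = obs.sum(axis=1)
col_tot = obs.sum(axis=0)
chi_obs = scipy.stats.chi2_contingency(obs)[0]       # observed Pearson chi-square

rng = np.random.default_rng()

# Hypothetical helper: draw a random table with the given row and column totals.
# Each row is a multivariate hypergeometric draw from the column totals that
# remain after the earlier rows were filled, so both marginals are preserved.
def draw_table_with_margins(row_tot, col_tot):
    remaining = col_tot.copy()
    table = np.zeros((len(row_tot), len(col_tot)), dtype=int)
    for i, r in enumerate(row_tot):
        draw = rng.multivariate_hypergeometric(remaining, r)
        table[i] = draw
        remaining -= draw
    return table

n_perm = 100000
count = 0
for _ in range(n_perm):
    t = draw_table_with_margins(row_tot, col_tot)
    if scipy.stats.chi2_contingency(t)[0] >= chi_obs:
        count += 1
print("p-value from permutation test (no expansion):", count / n_perm)

The work per permuted table then scales roughly with the number of cells in the table rather than with the total number of observations, which is the point of not expanding the data.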

Back To: Eleisha Jackson