# Eleisha's Segment 34: Permutation Tests

** To Calculate: **

1. Use the permutation test to decide whether the contingency table
$$\begin{bmatrix} 5 & 3 & 2 \\ 2 & 3 & 6 \\ 0 & 2 & 3 \end{bmatrix}$$
shows a significant association. What is the p-value?

The chi-square statistic for this table is 5.756. To decide whether this indicates a significant association, you can perform a permutation test under the null hypothesis that there is no association between the row and column variables. This is done by generating 100,000 permuted tables whose marginals are the same as those of the original table and computing the chi-square statistic for each one. The p-value is the fraction of permuted tables whose statistic is at least as extreme as that of the observed table. Using this method, the p-value was approximately 0.226, so the association was not found to be significant.
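As a quick check on the 5.756 figure, here is a minimal sketch that computes the Pearson chi-square statistic by hand, using expected counts E_ij = (row total × column total) / N. It uses only the standard library; the helper name `pearson_chi2` is chosen for this illustration and is not part of the original script.

```python
# Observed table from the problem statement.
obs = [[5, 3, 2],
       [2, 3, 6],
       [0, 2, 3]]


def pearson_chi2(table):
    """Pearson chi-square: sum over cells of (O - E)^2 / E."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


chi2_obs = pearson_chi2(obs)
print(round(chi2_obs, 3))  # 5.756
```

The row totals here are 10, 11 and 5, the column totals are 7, 8 and 11, and N = 26; every permuted table in the test shares these marginals, so the expected counts never change from one permutation to the next.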

** Histogram of chi-square statistics for the 100,000 permuted tables **

Below is the Python script that was used to perform the calculations.

    import random

    import matplotlib.pyplot as plt
    import numpy as np
    import scipy.stats

    obs_list = [[5, 3, 2], [2, 3, 6], [0, 2, 3]]  # Load our table
    obs = np.array(obs_list)

    # Get the chi-square statistic and related values for the observed table
    chi_t, p_val, dof, expected = scipy.stats.chi2_contingency(obs)
    print(chi_t)
    print(p_val)
    print(dof)
    print(expected)

    rows, cols = obs.shape
    first_col = []
    second_col = []

    # Expand the counts into one (row label, column label) pair per observation
    for i in range(rows):
        for j in range(cols):
            num_elements = obs[i][j]
            k = 0
            while k < num_elements:
                first_col.append(i + 1)
                second_col.append(j + 1)
                k = k + 1


    # Creates a new table with the shuffled data
    def create_new_table(first_col, shuffle_col):
        new_table = np.zeros((rows, cols))
        for i in range(len(first_col)):
            num_i = first_col[i] - 1
            num_j = shuffle_col[i] - 1
            new_table[num_i][num_j] = new_table[num_i][num_j] + 1
        return new_table


    p_vals = []
    chi_ts = []
    for x in range(100000):  # Generate 100,000 permutations
        shuffle_col = second_col  # Shuffle the second column (in place)
        random.shuffle(shuffle_col)
        table = create_new_table(first_col, shuffle_col)
        t, p, dof, expected = scipy.stats.chi2_contingency(table)
        p_vals.append(p)
        chi_ts.append(t)

    # Count the permuted statistics at least as large as the observed one
    sum_chi = 0
    for element in chi_ts:
        if element >= chi_t:
            sum_chi = sum_chi + 1

    # Calculate the one-tailed p-value using the permutations
    final_p = float(sum_chi) / len(chi_ts)
    print(final_p)

    plt.hist(chi_ts, 30)  # Plot a histogram
    plt.xlabel("Value of Pearson chi-square statistic")
    plt.ylabel("Frequency")
    plt.savefig("Eleisha_HW34.png")
    plt.show()

** Sample Output **

    Test statistic: 5.75599173554
    p-value: 0.218127096086
    Degrees of Freedom: 4
    p-value from Permutation Test: 0.22573

2. Repeat the calculation using the Pearson chi-square statistic instead of the Wald statistic, or vice versa.

Could not figure out how to calculate the Wald Statistic for the 3x3 case. Sorry!

** To Think About: **

1. Is slide 7's suggestion, that you figure out how to implement the permutation test without "expanding all the data", actually possible? If so, what is your method?
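One possible approach, offered here as a sketch and not as the author's answer: shuffling the column labels assigns each row a random subset of the remaining labels, so the cell counts can be drawn directly, one cell at a time, from hypergeometric distributions conditioned on the marginals. No per-observation list is ever built. The helper names (`hypergeom_draw`, `random_table`, `pearson_chi2`), the seed, and the choice of 20,000 tables are all assumptions made for this illustration; only the standard library is used.

```python
import math
import random


def hypergeom_draw(rng, good, bad, ndraws):
    """Number of 'good' items among ndraws draws without replacement."""
    denom = math.comb(good + bad, ndraws)
    u = rng.random() * denom
    k_min, k_max = max(0, ndraws - bad), min(good, ndraws)
    acc = 0
    for k in range(k_min, k_max + 1):  # inverse-CDF sampling
        acc += math.comb(good, k) * math.comb(bad, ndraws - k)
        if u < acc:
            return k
    return k_max  # guard against floating-point round-off


def random_table(rng, row_sums, col_sums):
    """Sample a table with the given marginals under no association."""
    remaining_cols = list(col_sums)  # column labels not yet assigned
    table = []
    for r in row_sums:
        row = []
        draws_left = r
        pool_left = sum(remaining_cols)
        for c in remaining_cols:
            pool_left -= c  # labels belonging to later columns
            x = hypergeom_draw(rng, c, pool_left, draws_left)
            row.append(x)
            draws_left -= x
        remaining_cols = [c - x for c, x in zip(remaining_cols, row)]
        table.append(row)
    return table


def pearson_chi2(table):
    """Pearson chi-square: sum over cells of (O - E)^2 / E."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


obs = [[5, 3, 2], [2, 3, 6], [0, 2, 3]]
row_sums = [sum(row) for row in obs]
col_sums = [sum(col) for col in zip(*obs)]
chi2_obs = pearson_chi2(obs)

rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
n_tables = 20000        # fewer than the 100,000 used above, to keep it quick
hits = 0
for _ in range(n_tables):
    t = random_table(rng, row_sums, col_sums)
    # Small tolerance so tables tied with the observed one are counted.
    if pearson_chi2(t) >= chi2_obs - 1e-9:
        hits += 1
p_perm = hits / n_tables
print(p_perm)
```

Because each sampled table has the same distribution as one produced by shuffling the expanded labels, the resulting p-value should agree with the shuffle-based estimate of about 0.226 up to Monte Carlo error. The cost per table depends on the number of cells and the size of the counts in each cell, not on a list with one entry per observation.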

** Back To: ** Eleisha Jackson