# Difference between revisions of "Eleisha's Segment 34: Permutation Tests"

## Revision as of 19:32, 15 April 2014

**To Calculate:**

1. Use the permutation test to decide whether the contingency table shows a significant association. What is the p-value?

The chi-square statistic for this table is 5.756. To determine whether this indicates a significant association, you can perform a permutation test under the null hypothesis that there is no association between rows and columns. This is done by generating 100,000 permuted tables whose marginals are preserved from the original table. The p-value is the fraction of permuted tables whose chi-square statistic is at least as large as that of the original table. Using this method, the p-value was approximately 0.226, so the association was not found to be significant.
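As a quick check of the observed statistic, the following minimal sketch recomputes the Pearson chi-square value for the table directly from its marginals. The variable names (`row_totals`, `col_totals`, `chi2`) are illustrative and are not taken from the original script.

```python
import numpy as np

# Observed contingency table from the problem
obs = np.array([[5, 3, 2], [2, 3, 6], [0, 2, 3]], dtype=float)

row_totals = obs.sum(axis=1)   # row marginals
col_totals = obs.sum(axis=0)   # column marginals
n = obs.sum()                  # total number of observations

# Expected counts under no association: (row total * column total) / n
expected = np.outer(row_totals, col_totals) / n

# Pearson chi-square statistic
chi2 = ((obs - expected) ** 2 / expected).sum()
print(round(chi2, 3))
```

The printed value should agree with the 5.756 quoted above.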

**Histogram of chi-square statistics for the 100,000 permuted tables**

Below is the Python script that was used to perform the calculations.

    import random

    import matplotlib.pyplot as plt
    import numpy as np
    import scipy.stats

    # Load our table
    obs_list = [[5, 3, 2], [2, 3, 6], [0, 2, 3]]
    obs = np.array(obs_list)

    # Get the chi-square statistics for our table
    chi_t, p_val, dof, expected = scipy.stats.chi2_contingency(obs)
    print(chi_t)
    print(p_val)
    print(dof)
    print(expected)

    rows, cols = obs.shape
    first_col = []
    second_col = []

    # Deconstruct the data into counts: one (row, column) pair per observation
    for i in range(rows):
        for j in range(cols):
            for k in range(obs[i][j]):
                first_col.append(i + 1)
                second_col.append(j + 1)


    # Creates a new table with the shuffled data
    def create_new_table(first_col, shuffle_col):
        new_table = np.zeros((rows, cols))
        for i in range(len(first_col)):
            num_i = first_col[i] - 1
            num_j = shuffle_col[i] - 1
            new_table[num_i][num_j] = new_table[num_i][num_j] + 1
        return new_table


    p_vals = []
    chi_ts = []
    for x in range(100000):  # Generate 100,000 permutations
        shuffle_col = second_col  # Same list object, so it is shuffled in place
        random.shuffle(shuffle_col)  # Shuffle the second column
        table = create_new_table(first_col, shuffle_col)
        t, p, dof, expected = scipy.stats.chi2_contingency(table)
        p_vals.append(p)
        chi_ts.append(t)

    # Calculate the one-tailed p-value using the permutations
    sum_chi = 0
    for element in chi_ts:
        if element >= chi_t:
            sum_chi = sum_chi + 1
    final_p = float(sum_chi) / len(chi_ts)
    print(final_p)

    # Plot a histogram
    plt.hist(chi_ts, 30)
    plt.xlabel("Value of Pearson chi-square statistic")
    plt.ylabel("Frequency")
    plt.savefig("Eleisha_HW34.png")
    plt.show()

**Sample Output**

    Test statistic: 5.75599173554
    p-value: 0.218127096086
    Degrees of Freedom: 4
    p-value: 0.22573
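The last p-value in the output is the permutation estimate, and it carries some Monte Carlo noise. A rough, back-of-the-envelope sketch of its precision, assuming each of the 100,000 permuted tables acts as an independent Bernoulli trial, is shown below. This calculation is not part of the original script, and the names `p_hat`, `n_perm`, and `std_err` are illustrative.

```python
import math

p_hat = 0.22573    # permutation p-value from the sample output
n_perm = 100000    # number of permuted tables

# Binomial standard error of an estimated proportion
std_err = math.sqrt(p_hat * (1 - p_hat) / n_perm)
print(std_err)
```

Under that assumption the standard error is on the order of 0.001, which is small compared with the p-value itself.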

**To Think About:**

1. Is slide 7's suggestion, that you figure out how to implement the permutation test without "expanding all the data", actually possible? If so, what is your method?
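One possible approach, offered as a sketch and not as the intended answer: a table with fixed marginals can be drawn directly, row by row, by sampling each row from a multivariate hypergeometric distribution over the column counts that remain. This gives the same distribution over tables as shuffling the expanded labels, without building the per-observation lists. The helper names `chi_square` and `sample_table` are hypothetical, and the sketch assumes NumPy 1.18 or later for `Generator.multivariate_hypergeometric`.

```python
import numpy as np


def chi_square(table):
    """Pearson chi-square statistic of a contingency table."""
    table = np.asarray(table, dtype=float)
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    return ((table - expected) ** 2 / expected).sum()


def sample_table(row_totals, col_totals, rng):
    """Draw one random table with the given row and column totals."""
    remaining = np.array(col_totals, dtype=np.int64)  # column counts not yet assigned
    table = np.zeros((len(row_totals), len(col_totals)), dtype=np.int64)
    for i, r in enumerate(row_totals):
        # Fill row i by drawing r items without replacement from the remaining columns
        draw = rng.multivariate_hypergeometric(remaining, int(r))
        table[i] = draw
        remaining -= draw
    return table


obs = np.array([[5, 3, 2], [2, 3, 6], [0, 2, 3]])
rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
observed_stat = chi_square(obs)

n_samples = 5000  # fewer than the 100,000 used above, to keep the sketch fast
stats = np.array([
    chi_square(sample_table(obs.sum(axis=1), obs.sum(axis=0), rng))
    for _ in range(n_samples)
])

# Fraction of sampled tables at least as extreme as the observed one
p_value = np.mean(stats >= observed_stat)
print(p_value)
```

With enough samples the estimate should land near the 0.226 reported above, although the exact value depends on the seed and the number of samples.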

**Back To:** Eleisha Jackson