Difference between revisions of "Eleisha's Segment 34: Permutation Tests"

From Computational Statistics Course Wiki

Revision as of 18:24, 15 April 2014

To Calculate

1. Use the permutation test to decide whether the contingency table <math>\begin{bmatrix} 5 & 3 & 2\\ 2 & 3 & 6 \\ 0 & 2 & 3 \end{bmatrix}</math> shows a significant association. What is the p-value?

The Pearson chi-square statistic for this table is 5.756. To determine whether this indicates a significant association, you can perform a permutation test under the null hypothesis that there is no association between rows and columns. You can do this by generating 100,000 random tables whose marginals are the same as those of the original table. The p-value is the probability, under the null hypothesis, of seeing a statistic at least as extreme as that of the observed table; it is estimated as the fraction of permuted tables whose statistic is at least as large as the observed one. Using this method, the p-value was approximately 0.226, so the association was not found to be significant.
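As a quick check on the quoted statistic, the following minimal sketch recomputes the Pearson chi-square value directly from the row and column totals. It is an illustration only, separate from the script used for the homework; the variable names (`row_totals`, `col_totals`, `grand_total`, `chi2_stat`) are chosen here for readability.

```python
import numpy as np

# Observed contingency table from the problem statement
obs = np.array([[5, 3, 2], [2, 3, 6], [0, 2, 3]], dtype=float)

row_totals = obs.sum(axis=1, keepdims=True)  # shape (3, 1): 10, 11, 5
col_totals = obs.sum(axis=0, keepdims=True)  # shape (1, 3): 7, 8, 11
grand_total = obs.sum()                      # 26 observations

# Expected counts under independence: (row total * column total) / grand total
expected = row_totals * col_totals / grand_total

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E
chi2_stat = float(((obs - expected) ** 2 / expected).sum())
print(round(chi2_stat, 3))  # prints 5.756
```

Because the expected counts depend only on the marginals, every permuted table shares the same `expected` array; only the observed counts change from one permutation to the next.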

Below is the Python script that was used to perform the calculations.

import random

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats

# Load the observed contingency table
obs = np.array([[5, 3, 2], [2, 3, 6], [0, 2, 3]])

# Pearson chi-square statistic, asymptotic p-value, degrees of freedom, expected counts
chi_t, p_val, dof, expected = scipy.stats.chi2_contingency(obs)
print(chi_t)
print(p_val)
print(dof)
print(expected)

rows, cols = obs.shape
first_col = []
second_col = []

# Expand the table into one (row label, column label) pair per observation
for i in range(rows):
    for j in range(cols):
        for _ in range(obs[i][j]):
            first_col.append(i + 1)
            second_col.append(j + 1)


def create_new_table(first_col, shuffle_col):
    """Rebuild a contingency table from row labels and shuffled column labels."""
    new_table = np.zeros((rows, cols))
    for row_label, col_label in zip(first_col, shuffle_col):
        new_table[row_label - 1][col_label - 1] += 1
    return new_table


p_vals = []
chi_ts = []

for x in range(100000):  # Generate 100,000 permutations
    shuffle_col = list(second_col)  # Copy so the original labels are left untouched
    random.shuffle(shuffle_col)  # Shuffle the column labels
    table = create_new_table(first_col, shuffle_col)
    t, p, _, _ = scipy.stats.chi2_contingency(table)
    p_vals.append(p)
    chi_ts.append(t)

# One-tailed permutation p-value: fraction of permuted tables whose
# statistic is at least as large as the observed statistic
sum_chi = sum(1 for element in chi_ts if element >= chi_t)
final_p = sum_chi / len(chi_ts)
print(final_p)

plt.hist(chi_ts, 30)  # Plot a histogram of the permuted statistics
plt.xlabel("Value of Pearson chi-square statistic")
plt.ylabel("Frequency")
plt.savefig("Eleisha_HW34.png")
plt.show()
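The same permutation procedure can also be sketched with NumPy alone, which may be convenient when SciPy is unavailable or when a reproducible seed is wanted. This is a hedged alternative, not the script used to obtain the result reported above. The function names `pearson_chi2` and `permutation_p_value`, the parameters `n_perm` and `seed`, and the small floating-point tolerance in the comparison are all assumptions introduced for this illustration.

```python
import numpy as np


def pearson_chi2(table):
    """Pearson chi-square statistic for a 2-D table of counts."""
    table = np.asarray(table, dtype=float)
    row_totals = table.sum(axis=1, keepdims=True)
    col_totals = table.sum(axis=0, keepdims=True)
    expected = row_totals * col_totals / table.sum()
    return float(((table - expected) ** 2 / expected).sum())


def permutation_p_value(obs, n_perm=20000, seed=0):
    """Estimate the permutation p-value of the chi-square statistic.

    Column labels are shuffled relative to row labels, so every
    permuted table keeps the marginals of the observed table.
    """
    obs = np.asarray(obs, dtype=int)
    n_rows, n_cols = obs.shape

    # One (row label, column label) pair per observation, 0-based
    row_idx, col_idx = np.indices(obs.shape)
    row_labels = np.repeat(row_idx.ravel(), obs.ravel())
    col_labels = np.repeat(col_idx.ravel(), obs.ravel())

    observed_stat = pearson_chi2(obs)
    rng = np.random.default_rng(seed)

    n_extreme = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(col_labels)
        # Count the pairs in each cell to rebuild the table
        flat_counts = np.bincount(row_labels * n_cols + shuffled,
                                  minlength=n_rows * n_cols)
        table = flat_counts.reshape(n_rows, n_cols)
        # Tolerance (an assumption of this sketch) guards against
        # floating-point round-off when statistics tie
        if pearson_chi2(table) >= observed_stat - 1e-9:
            n_extreme += 1
    return observed_stat, n_extreme / n_perm


stat, p_value = permutation_p_value([[5, 3, 2], [2, 3, 6], [0, 2, 3]])
print(stat, p_value)
```

The estimate varies slightly with `seed` and `n_perm`; with more permutations it should settle near the value of about 0.226 reported above.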

Sample Output