/Segment33

From Computational Statistics (CSE383M and CS395T)

To Calculate

1. How many distinct m by n contingency tables are there that have exactly N total events?

<math> {N+m*n \choose m*n-1} </math>

Hmm. Close, but not right. How did you get this? Note that with N=1 we should get m*n, but your answer doesn't! Wpress 15:37, 21 April 2013 (CDT)


Thanks for pointing that out, Bill. It should be this: <math> {N+m*n-1 \choose m*n-1} </math> Kai also helped me with this. The idea is to split the N balls among m*n boxes using m*n-1 dividers. Because a box may be empty, two dividers could land in the same place, so we first put one extra ball in each box. That gives N+m*n balls with every box non-empty. Those balls have N+m*n-1 gaps between them, and each arrangement corresponds to choosing m*n-1 distinct gaps for the dividers, hence N+m*n-1 choose m*n-1. As a check, N=1 now gives m*n. -- Silu 10.49, 22 April 2013 (CDT)
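The corrected count can be checked by brute force for small cases. The sketch below is only an illustration: the function names are made up for this example, and it assumes Python 3.8 or later for `math.comb`.

```python
from itertools import product
from math import comb


def count_tables_brute_force(m, n, total):
    """Count m by n tables of non-negative integers whose entries sum to total."""
    cells = m * n
    return sum(1 for t in product(range(total + 1), repeat=cells) if sum(t) == total)


def count_tables_formula(m, n, total):
    """Stars-and-bars count: choose m*n - 1 divider gaps out of total + m*n - 1."""
    return comb(total + m * n - 1, m * n - 1)


# Compare the two counts on a few small cases (kept small so the brute force stays fast).
for m, n, total in [(2, 2, 1), (2, 2, 5), (2, 3, 4)]:
    print(m, n, total, count_tables_brute_force(m, n, total), count_tables_formula(m, n, total))
```

The brute-force loop grows like (N+1)^(m*n), so it is only useful as a sanity check on small tables.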

2. For every distinct 2 by 2 contingency table containing exactly 14 elements, compute its chi-square statistic, and also its Wald statistic. Display your results as a scatter plot of one statistic versus the other.

Here's my code. A zero count can produce a zero denominator in the Wald statistic, so I excluded every table that has a zero in any cell.


import itertools
import math
import matplotlib.pyplot as plt

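# Enumerate every 2 by 2 table (four cell counts) with all cells >= 1 and total 14.
# itertools.product allows repeated values such as (3, 3, 4, 4);
# itertools.permutations would silently drop every table with two equal cells.
counts = range(1, 14)
tables = [table for table in itertools.product(counts, repeat=4) if sum(table) == 14]

# tables now holds every table with a total count of 14 and no empty cell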
N = 14  # total count in every table; the divisions below assume Python 3 true division
wald = []
chisquares = []
for a, b, c, d in tables:
    row_1, row_2 = a + b, c + d
    col_1, col_2 = a + c, b + d

    # Wald statistic: difference of the two column proportions
    # divided by its standard error under the pooled proportion p
    p_1 = a / col_1
    p_2 = b / col_2
    p = row_1 / N
    t = (p_1 - p_2) / math.sqrt(p * (1 - p) * (1 / col_1 + 1 / col_2))
    wald.append(t)

    # Pearson chi-square: sum over cells of (observed - expected)^2 / expected
    observed = (a, b, c, d)
    expected = (row_1 * col_1 / N, row_1 * col_2 / N,
                row_2 * col_1 / N, row_2 * col_2 / N)
    chisquare = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    chisquares.append(chisquare)

plt.scatter(chisquares, wald)
plt.xlabel(r'chisquared')
plt.ylabel(r'Wald T')
plt.show()

Wald.png (scatter plot of Wald T against chi-square)
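For a 2 by 2 table, a Wald statistic that uses the pooled proportion in its standard error, as in the code above, squares to exactly the Pearson chi-square, so the points should lie on a parabola. The sketch below checks this with exact rational arithmetic. The function names are chosen for this illustration, and it keeps the same restriction to tables with no zero cell.

```python
from fractions import Fraction
from itertools import product

N = 14  # total count in every table


def chi_square(a, b, c, d):
    """Pearson chi-square of the table [[a, b], [c, d]] as an exact fraction."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    total = Fraction(0)
    for observed, i, j in ((a, 0, 0), (b, 0, 1), (c, 1, 0), (d, 1, 1)):
        expected = Fraction(rows[i] * cols[j], n)
        total += (observed - expected) ** 2 / expected
    return total


def wald_squared(a, b, c, d):
    """Square of the pooled Wald statistic comparing the two column proportions."""
    n = a + b + c + d
    col_1, col_2 = a + c, b + d
    p_1, p_2 = Fraction(a, col_1), Fraction(b, col_2)
    p = Fraction(a + b, n)
    variance = p * (1 - p) * (Fraction(1, col_1) + Fraction(1, col_2))
    return (p_1 - p_2) ** 2 / variance


# All tables with total N and every cell at least 1.
tables = [t for t in product(range(1, N), repeat=4) if sum(t) == N]
mismatches = [t for t in tables if chi_square(*t) != wald_squared(*t)]
print(len(tables), len(mismatches))
```

With every cell at least 1 there are C(13,3) = 286 such tables, and the list of mismatches should come back empty.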

To Think About

1. Suppose you want to find out if living under power lines causes cancer. Describe in detail how you would do this (1) as a case/control study, (2) as a longitudinal study, (3) as a snapshot study. Can you think of a way to do it as a study with all the marginals fixed (protocol 4)?


2. For an m by n contingency table, can you think of a systematic way to code "the loop over all possible contingency tables with the same marginals" in slide 8?
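One possible approach, offered as a sketch rather than as the intended answer: fill the table row by row, never letting a cell exceed what is left of its row sum or its column sum, and let the last row be forced by whatever remains in each column. The name `tables_with_marginals` and its arguments are made up for this illustration. It assumes non-negative integer entries and at least one row and one column.

```python
def tables_with_marginals(row_sums, col_sums):
    """Yield every table of non-negative integers with the given row and column sums.

    Assumes at least one row and one column. Yields nothing if the totals disagree.
    """
    if sum(row_sums) != sum(col_sums):
        return

    def fill_row(row_left, col_left, j):
        # Yield every way to place row_left counts in columns j, j+1, ...
        # without exceeding the remaining capacity of any column.
        if j == len(col_left) - 1:
            if row_left <= col_left[j]:
                yield (row_left,)
            return
        for x in range(min(row_left, col_left[j]) + 1):
            for rest in fill_row(row_left - x, col_left, j + 1):
                yield (x,) + rest

    def fill_table(i, col_left):
        if i == len(row_sums) - 1:
            # The last row is forced: it uses up what remains in each column.
            yield (tuple(col_left),)
            return
        for row in fill_row(row_sums[i], col_left, 0):
            new_left = tuple(c - x for c, x in zip(col_left, row))
            for rest in fill_table(i + 1, new_left):
                yield (row,) + rest

    yield from fill_table(0, tuple(col_sums))


# Example: all 2 by 2 tables with both row sums and both column sums equal to 7.
for table in tables_with_marginals((7, 7), (7, 7)):
    print(table)
```

Because the generator yields one table at a time, a statistic can be accumulated over all tables without storing them. The number of tables still grows quickly with the total count and the table size.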