16

From Computational Statistics Course Wiki
Jump to navigation Jump to search

1. Simulate the following: You have M=50 p-values, none actually causal, so that they are drawn from a uniform distribution. Not knowing this sad fact, you apply the Benjamini-Hochberg prescription with \alpha=0.05 and possibly call some discoveries as true. By repeated simulation, estimate the probability of thus getting N wrongly-called discoveries, for N=0, 1, 2, and 3.

Here is my code for this problem:

import numpy as np
import scipy.stats as sc
import matplotlib.pyplot as plt
alpha = .05
n = 50
sim = 10000
l = []
for k in range(sim):
   #creates the random p-values
   p = np.arange(0,1,1.0/n)
   for i in range(n):
       p[i] = (np.random.uniform(0.0, 1.0, 1))
   p = sorted(p)
   #gets positive discoveries
   x = np.arange(1,n+1, 1)
   i = 0
   while p[i] < (i+1)*alpha/(n+1):
       i += 1
 
#data output
   if i == 0:
       fd[0] += 1
   if i == 1:
       fd[1] += 1
   if i == 2:
       fd[2] += 1
   if i == 3:
       fd[3] += 1

print fd

for 10^5 runs, the data is as follows:

N = 0 --> 95258 --> P(N = 0) = 95.258%
N = 1 --> 4412 --> P(N = 1) = 4.412%
N = 2 --> 303 --> P(N = 2) = .303%
N = 3 --> 32 --> P(N =3) = .032%

2. Does the distribution that you found in problem 1 depend on M? On \alpha? Derive its form analytically for the usual case of \alpha \ll 1?