# Segment 16 Sanmit Narvekar


## Segment 16

#### To Calculate

1. Simulate the following: You have M=50 p-values, none actually causal, so that they are drawn from a uniform distribution. Not knowing this sad fact, you apply the Benjamini-Hochberg prescription with $\displaystyle \alpha=0.05$ and possibly call some discoveries as true. By repeated simulation, estimate the probability of thus getting N wrongly-called discoveries, for N=0, 1, 2, and 3.

Here is the Matlab code:

```matlab
M = 50;            % number of null p-values
alpha = 0.05;      % FDR control level
nIters = 1000000;  % number of simulated experiments
discoveries = zeros(10,1);   % histogram bins for N = 0 .. 9

for iter = 1:nIters
    % Draw M p-values; under the null they are Uniform(0,1)
    pvals = rand(M,1);

    % Benjamini-Hochberg prescription: compare the sorted p-values to
    % the line p_(i) = (i/M)*alpha and count how many fall below it.
    % (Strict BH rejects the k smallest p-values, where k is the
    % largest index with p_(k) <= (k/M)*alpha; the count used here
    % agrees with k whenever the crossings are contiguous.)
    sortedPvals = sort(pvals);
    nDiscoveries = sum(sortedPvals < (1:M)' .* (alpha/M));
    discoveries(nDiscoveries+1) = discoveries(nDiscoveries+1) + 1;
end

discoveries ./ nIters   % estimated P(N = 0), ..., P(N = 9)
```



I actually calculated it for N = 0 to 9, since a few counts beyond N = 3 showed up as well. Here are the results (N = 0 at the top):

```
N = 0:  0.950169
N = 1:  0.046188
N = 2:  0.003375
N = 3:  0.000250
N = 4:  0.000017
N = 5:  0.000001
N = 6:  0
N = 7:  0
N = 8:  0
N = 9:  0
```
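As a cross-check, the same experiment can be sketched in Python with NumPy (the function name and the NumPy dependency are my own choices, not part of the original solution). This version uses the strict BH rule: reject the k smallest p-values, where k is the largest index with $\displaystyle p_{(k)} \le (k/M)\alpha$.

```python
import numpy as np

def simulate_bh_null(M=50, alpha=0.05, n_iters=20000, seed=0):
    """Estimate the distribution of the number N of BH discoveries
    when all M p-values are null, i.e. drawn from Uniform(0,1)."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(M + 1)
    bh_line = np.arange(1, M + 1) * alpha / M   # the line (k/M)*alpha
    for _ in range(n_iters):
        p = np.sort(rng.uniform(size=M))
        below = np.nonzero(p <= bh_line)[0]
        k = below[-1] + 1 if below.size else 0  # largest crossing index
        counts[k] += 1
    return counts / n_iters

probs = simulate_bh_null()
print(probs[:4])   # P(N = 0..3); P(N = 0) should be close to 1 - alpha
```

With fewer iterations than the Matlab run, the estimates are noisier but should reproduce the same qualitative fall-off.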


2. Does the distribution that you found in problem 1 depend on M? On $\displaystyle \alpha$? Derive its form analytically for the usual case of $\displaystyle \alpha \ll 1$.

The distribution does not depend on M. I ran the code above with M = 1000 and obtained the following results, which are very close to those above:

```
N = 0:  0.950202
N = 1:  0.046139
N = 2:  0.003332
N = 3:  0.000297
N = 4:  0.000025
N = 5:  0.000005
N = 6:  0
N = 7:  0
N = 8:  0
N = 9:  0
```


It does depend on $\displaystyle \alpha$, however, since changing $\displaystyle \alpha$ will (at the very least) change the first bin, N = 0, with the changes propagating to the other bins. Here are the results for $\displaystyle \alpha = 0.10$:

```
N = 0:  0.899984
N = 1:  0.086055
N = 2:  0.011699
N = 3:  0.001868
N = 4:  0.000321
N = 5:  0.000056
N = 6:  0.000014
N = 7:  0.000003
N = 8:  0
N = 9:  0
```
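The analytic form can be sketched as follows (a leading-order argument, assuming M independent Uniform(0,1) p-values). There are at least $n$ discoveries essentially only if the $n$-th smallest p-value falls below the BH line, $\displaystyle p_{(n)} \le n\alpha/M$, so to leading order in $\displaystyle \alpha$,

$\displaystyle P(N \ge n) \approx \binom{M}{n}\left(\frac{n\alpha}{M}\right)^{n} \approx \frac{(n\alpha)^n}{n!}$

which is independent of $M$ for $M \gg n$, giving

$\displaystyle P(N = n) \approx \frac{(n\alpha)^n}{n!} - \frac{\big((n+1)\alpha\big)^{n+1}}{(n+1)!}$

so the probabilities fall off roughly like $\displaystyle \alpha^n$, consistent with the tables above (note $n = 0$ gives $1 - \alpha$). In fact, under the complete null every rejection is false, so the BH guarantee $\mathrm{FDR} = \alpha$ becomes $P(N \ge 1) = \alpha$ exactly for independent p-values; the simulated $1 - 0.950169 \approx 0.0498$ agrees.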


#### To Think About

1. Suppose you have M independent trials of an experiment, each of which yields an independent p-value. Fisher proposed combining them by forming the statistic

$\displaystyle S = -2\sum_{i=1}^{M}\log(p_i)$

Show that, under the null hypothesis, S is distributed as $\displaystyle \text{Chisquare}(2M)$ and describe how you would obtain a combined p-value for this statistic.
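A sketch of the key fact: if $U$ is Uniform(0,1), then $-2\log U$ has density $\tfrac12 e^{-s/2}$, which is exactly $\displaystyle \text{Chisquare}(2)$; the sum of $M$ independent such terms is $\displaystyle \text{Chisquare}(2M)$, so the combined p-value is the upper-tail probability $P(\chi^2_{2M} > S)$. Here is a self-contained Python sketch (the function name is my own); it uses the closed-form tail of a chi-square with even degrees of freedom, $\displaystyle P(\chi^2_{2M} > s) = e^{-s/2}\sum_{k=0}^{M-1}(s/2)^k/k!$.

```python
import math

def fisher_combined_pvalue(pvals):
    """Combine independent p-values via Fisher's statistic
    S = -2 * sum(log p_i), which is chi-square with 2M degrees of
    freedom under the null; return the tail P(chi2_{2M} > S)."""
    M = len(pvals)
    s = -2.0 * sum(math.log(p) for p in pvals)
    x = s / 2.0
    # Closed-form upper tail for even dof 2M (a Poisson tail sum)
    return math.exp(-x) * sum(x**k / math.factorial(k) for k in range(M))

# Sanity check: with a single p-value, the combined p-value is p itself
print(fisher_combined_pvalue([0.3]))        # ~0.3
print(fisher_combined_pvalue([0.1, 0.1]))   # ~0.056
```

Note that two moderately small p-values combine to something smaller than either alone, which is the point of the statistic.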

2. Fisher is sometimes credited, on the basis of problem 1, with having invented "meta-analysis", whereby results from multiple investigations can be combined to get an overall more significant result. Can you see any pitfalls in this?