Class Activity 2/22/13

From Computational Statistics (CSE383M and CS395T)
Jump to navigation Jump to search

Last class, we used this cryptic function to test our random number generators:

 def test(x):
     ''' Input:   x - an array of purported independent uniform(0, 1)
         Returns: a mysterious p-value.
     ntable = 100
     counts, _, _ = np.histogram2d(x[:-1], x[1:],
                                   range=[(0, 1), (0, 1)],
     chisq, p_val = scipy.stats.chisquare(counts.flatten())
     return p_val


Should this chisquare test have a degree of freedom <math>(ntable-1)^2</math> , which means the corresponding parameter of the chisquare() function, ddof, should have a value, 2*(ntable-1)? --Kai

It turns out we were computing the p-value of a certain summary statistic of our random draws. Now that we know what p-values are, we are going to explore this further.

On the IPython server, 'import our_random' will give you access to this test function as well as 3 different functions (random_1, random_2, and random_3) that take an argument n and claim to return an array of n independent uniform(0, 1) deviates.

1. For each of the random_* functions, use our_random.test to compute p-values for 500 different sets of 2000 draws from the function. Plot histograms of the computed p-values. If each function does what it claims to do, what do you expect to see?

  • Bonus: Do the same thing for sets of 100 draws and explain the results that you see.

2. For each of the random_* functions, for a range of value of n, plot the median of the p-values for 5 different sets of n draws from the function.

  • Explore different ranges of n to try to find interesting behavior.
  • If you store your p-values in an appropriately structured numpy array, np.median can be used to conveniently find the median values.

For posterity

random_1 is an xkcd generator.

random_2 is the 'mod pi' idea from Wednesday.

random_3 is np.random.uniform.