User:Noah/Segment5

From Computational Statistics (CSE383M and CS395T)

Segment 5 (28 January 2013)

Computation Problems

1.

Sum          2  3  4  5  6  7  8  9 10 11 12
Ways (of 36) 1  2  3  4  5  6  5  4  3  2  1

The prime sums are 2, 3, 5, 7, and 11, accounting for <math>1+2+4+6+2=15</math> of the 36 outcomes, so the probability <math>p</math> of a prime sum is <math>15/36</math>.

The probability of five out of ten dice being prime is <math>\binom{10}{5}\left(\frac{15}{36}\right)^5\left(\frac{21}{36}\right)^5\approx 0.21376</math>
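As a sanity check (my addition, not part of the original solution), both numbers can be reproduced by brute-force enumeration in Python:

```python
from math import comb

# Enumerate all 36 ordered outcomes of two dice and count prime sums.
primes = {2, 3, 5, 7, 11}
prime_outcomes = sum(
    1 for a in range(1, 7) for b in range(1, 7) if a + b in primes
)
p = prime_outcomes / 36  # 15/36

# Probability that exactly 5 of 10 such rolls give a prime sum.
prob = comb(10, 5) * p**5 * (1 - p)**5
print(prime_outcomes, round(prob, 5))  # 15 0.21376
```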

2. Direct computation is not feasible. We approximate the binomial with the normal distribution:

from scipy.stats import norm
from math import sqrt
n = 1e9
p = 0.5
mean = n*p
var = n*p*(1-p)  # binomial variance; n*p*p gives the same value here only because p = 0.5
stdev = sqrt(var)
dist = norm(loc=mean, scale=stdev)
print(dist.cdf(500020000) - dist.cdf(500010000))

and get <math>0.160593023067\approx0.1606</math>.
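The same value can be checked without scipy by writing the standard-normal CDF in terms of the error function (a verification sketch, my addition):

```python
from math import erf, sqrt

n, p = 1e9, 0.5
mean = n * p
stdev = sqrt(n * p * (1 - p))

def phi(x):
    # Standard-normal CDF expressed via the error function.
    return 0.5 * (1 + erf(x / sqrt(2)))

prob = phi((500020000 - mean) / stdev) - phi((500010000 - mean) / stdev)
print(round(prob, 4))  # 0.1606
```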

In-class problems

Teamed with User:Trettels, User talk:Ddykstra, User:Jzhang

Code for Python simulation

Comment

Hi, I found a small inconsistency between your result and mine. It looks to me that if we have N trials, we should have N+1 outcomes (0 heads, 1 head, ..., N heads), which means the final result is supposed to contain N+1 elements, but your code only shows 12. Is one possible case missing? -- Jin

Yes, I screwed up in the accounting for n=0. It should be fixed now. --Noah 21:09, 28 January 2013 (CST)

High-performance version

I've rewritten the code in C using OpenMP so that I can run 10^12 iterations.

Running 1000000000000 iterations.
  0	     2789494	 0.00000278949400
  1	    29134804	 0.00002913480400
  2	   204873411	 0.00020487341100
  3	   949205806	 0.00094920580600
  4	  3680128345	 0.00368012834500
  5	 11422238802	 0.01142223880200
  6	 30108032808	 0.03010803280800
  7	 66754970472	 0.06675497047200
  8	124795069710	 0.12479506971000
  9	192442830694	 0.19244283069400
 10	236695814840	 0.23669581484000
 11	215476843930	 0.21547684393000
 12	117438066884	 0.11743806688400
sum	1.000000000

real	464m10.040s
user	3657m24.958s
sys	2m8.142s

These results seem to disagree with those of other people (Kai, Jzhang, User:Trettels); I'm not sure why. If somebody finds a bug, I'll be happy to re-run the data.

The convergence behavior was analyzed by running the program 5 times (with different seeds) for 10^6 through 10^12 iterations. The value for n=0 is plotted against the number of iterations (x-axis), with one line for each of the five sets of runs; the probability can clearly be seen to converge to 0.00002789 as the number of iterations approaches 10^12.
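As a rough quantification of that convergence (my addition, assuming simple binomial counting statistics for the n=0 bin), the Monte Carlo standard error of the estimated proportion scales as 1/sqrt(iterations):

```python
from math import sqrt

# Standard error of a binomial proportion estimate: sqrt(p*(1-p)/N).
p_hat = 2.789e-5  # observed proportion for n = 0
for iters in (1e6, 1e9, 1e12):
    se = sqrt(p_hat * (1 - p_hat) / iters)
    print(f"{iters:.0e}: se = {se:.2e}  (relative {se / p_hat:.2%})")
```

At 10^6 iterations the relative error is around 19%, which is consistent with the visible scatter in the plots at the low end; by 10^12 it has dropped to roughly 0.02%.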

[Plots: Noah Compute 5 HPC 1.png through Noah Compute 5 HPC 4.png, showing the n=0 probability converging as the iteration count grows.]

The full output from all five runs can be found in /Output. Performance numbers are from my ICES lab machine, an Intel(R) Xeon(R) CPU E31245 @ 3.30GHz (4 cores/8 threads).

Here is the code:

/* Compile with
 *  gcc -g -Wall -fopenmp -O2 -o inclass_5_1 inclass_5_1.c
 */

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Return 1 with probability p, using the thread-safe rand_r. */
int r(unsigned int *s, float p) {
  return rand_r(s) < p*RAND_MAX;
}

int main(int argc, char *argv[]) {
  int N = 12;
  float p = 0.8;
  float q = 0.2;
  unsigned long results[N+1];
  unsigned long iters = 1;
  double iterfloat = 0, sum = 0;
  unsigned k = 0;

  /* Parse command-line */
  if ( argc != 2 ) {
    printf("Usage: %s number_of_iterations\n", argv[0]);
    return 1;
  }
  if ( sscanf(argv[1], "%lg", &iterfloat) == 1 ) {
    iters = (unsigned long)iterfloat;
  }
  else {
    printf("Could not parse %s as number of iterations.\n", argv[1]);
    return 2;
  }
  printf("Running %lu iterations.\n", iters);

  /* Seed the initial random number generator */  
  srand((unsigned int)time((time_t *)NULL));

  for ( k = 0; k <= N; k++ )
    results[k] = 0;
  #pragma omp parallel
  {
    unsigned int seed;
    unsigned long thrresults[N+1];
    unsigned long i = 0;
    unsigned int j = 0;
    for ( j = 0; j <= N; j++ ) thrresults[j] = 0;
    #pragma omp critical
    seed = rand();
    #pragma omp for
    for ( i = 0; i < iters; i++ ) {
      int last_time = r(&seed, p);  /* first flip: heads with probability p */
      int successes = last_time;
      for ( j = 1; j < N; j++ ) {
        /* With probability q, repeat the previous outcome;
         * otherwise flip a fresh p-coin. */
        if ( !r(&seed, q) ) last_time = r(&seed, p);
        successes += last_time;
      }
      thrresults[successes]++;
    }
    #pragma omp critical
    for ( j = 0; j <= N; j++ ) results[j] += thrresults[j];
  }

  /* Display result */
  for ( k = 0; k <= N; k++ ) {
    double dval = results[k]*1.0/iters;
    printf("%3u\t%12lu\t%17.14f\n", k, results[k], dval);
    sum += dval;
  }
  printf("sum\t%.9f\n", sum);

  return 0;
}

Some notes:

  • OpenMP is used for parallelization.
    • #pragma omp parallel causes the following block to be executed once for each thread.
    • #pragma omp critical means the following line is a "critical section" and is only executed on one thread at a time. This is used for updating shared variables. Each thread maintains its own copy of the results array, but they are all added together at the end. To prevent data corruption due to multiple competing threads, this is a critical section. In certain circumstances, #pragma omp atomic may be appropriate, and will likely be faster.
    • #pragma omp for causes OpenMP to distribute the iterations of the following for loop among the threads. In simple cases, it can be combined with #pragma omp parallel into a single #pragma omp parallel for.
  • The standard rand function is not thread-safe, as it uses a global state for the random number generator. I use rand_r in each thread, which I seed from the global random number generator. It's worth noting that rand_r is apparently marked obsolescent with no direct replacement; perhaps I should have written the program to use GSL instead.
  • This program deals with large counts (>4294967296, approximately 4*10^9) and requires 64-bit integers. I represent these as unsigned long, which is only 32 bits on typical 32-bit platforms, so this program will only work correctly on 64-bit systems. (A portable alternative would be uint64_t from <stdint.h>.)
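For a quick cross-check at modest iteration counts, the same correlated-flip model can be sketched in Python (my reading of the C code: the first flip is heads with probability p = 0.8, and each later flip repeats the previous outcome with probability q = 0.2, otherwise re-flips):

```python
import random

def simulate(iters, N=12, p=0.8, q=0.2, seed=42):
    """Histogram of heads counts over `iters` runs of N correlated flips."""
    rng = random.Random(seed)
    counts = [0] * (N + 1)
    for _ in range(iters):
        last = rng.random() < p        # first flip: heads with probability p
        successes = int(last)
        for _ in range(1, N):
            if rng.random() >= q:      # with probability 1-q, flip afresh
                last = rng.random() < p
            successes += int(last)
        counts[successes] += 1
    return counts

counts = simulate(100000)
for k, c in enumerate(counts):
    print(k, c, c / 100000)
```

With 10^5 iterations the proportions should already agree with the C output to roughly two decimal places (the mode at n=10 near 0.237), which makes this a cheap way to test for off-by-one or seeding bugs before a long run.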