User:Kai:Segment5

From Computational Statistics (CSE383M and CS395T)
Jump to navigation Jump to search

To calculate

1. You throw a pair of fair dice 10 times and, each time, you record the total number of spots. When you are done, what is the probability that exactly 5 of the 10 recorded totals are prime?

The total of each throw is between 2 and 12, which is included five primes: 2=1+1, 3=1+2=2+1, 5=1+4=2+3=3+2=4+1, 7=1+6=2+5=...=6+1, 11=5+6=6+5. Thus, for each throw, the probability you got a prime total is (1+2+4+6+2)/36=5/12. According to binormial distribution, the probability that exactly 5 of the 10 recorded totals are prime is <math>\binom{10}{5} \cdot (\frac 5{12})^5 (1-\frac 5{12})^5 = 0.2138</math>

2. If you flip a fair coin one billion times, what is the probability that the number of heads is between 500010000 and 500020000, inclusive? (Give answer to 4 significant figures.)

So the formula is easy: <math>P = \sum_{k=500010000}^{500020000} \binom{n}k \cdot (\frac12)^k \cdot (1-\frac12)^{n-k}, n=10^9 </math>

As n is very large, we can use de Moivre–Laplace theorem to make a continuous normal approximation to this binormial distribution. Then the probability is <math>P = \int_{n/2+10000}^{n/2+20000} \frac{1}{\sqrt{2 \pi nq(1-p)}}\,e^{-(k-np)^2 / (2np(1-p))} dk = \int_{1}^{2} \frac{10000}{\sqrt{\pi n/2}}\,e^{-t^2 / 5} dt=\frac{10000\sqrt5}{\sqrt{\pi n/2}} (erf (2/\sqrt5)-erf(1/\sqrt5)) \simeq 0.1812</math> (this answer is probably not right)

To think about

In-class problem

Teamed with Tameem, Kkumar, Silu's homework problems, Jonathan

A Pure Iterative Method

For activity 2, an iterative algorithm can save a lot of time and the time needed might attain to a polynomial time (shown in the next section, User:Kai:Segment5#A_modified_iterative_method). In this section, a pure iterative algorithm with a exponential time is shown to illustrate the basic methods.

So first, there are two cases of trials needed to be considered. The first cases is the first trial, and the second is when the trial is not the first one. The reason why I emphasize this difference is because these two cases should be dealt with by totally different ways.

For the first trial, the situation is simple: outcome 1 with probability p, outcome 0 with probability 1-p. If the first trial's outcome is 1, then the following N-1 trials should have n-1 outcomes of 1 and We now know the previous outcome before these N-1 trials is 1. If the first trial's outcome is 0, then the following N-1 trials should have n outcomes of 1 and We now know the previous outcome before these N-1 trials is 0. The Pro function in the code can conclude this paragraph.

For the following trials, where the iteration begins, there are also two subcases to be considered for every trial. The first one is the previous trial's outcome is 1. The second one is the previous trial's outcome is 0. They should be considered separately because the previous outcomes can influence the probability of the current outcomes. An important observation which we conclude from group discussion is the probability for different outcomes of the current trial can be calculated, that is, a,b,c,d in the function of Pro in the code. So when the previous outcome is 0, the probability that there are n 1s in the following nn outcomes is q0(n,nn)=a*q0(n,nn-1)+b*q1(n-1,nn-1). Here a*q0(n,nn-1) means the current trial is 0, and in the following nn-1 trials, n 1s is needed to guarantee n 1s out of the total nn outcomes. b*q1(n-1,nn-1) means the current trial is 1, so in the following nn-1 trials, only n-1 1s is needed. Thus, the above description is the essential idea of the iterative method. More details about the iteration can be seen in the functions, q0 and q1, in the code.

The following pure iterative algorithm has an exponential time, which is about <math>2^N</math> for n=N/2. But as the analysis following the code, a lot of repeat calculations are included and they can probably be avoided.

So the pure iterative method code is:

loops=0 #how many loops needed

def Pro(n0,nn0,p,q):
    #to calculate the final probability. n0 is n and nn0 is N
    a=q+(1-p)*(1-q) #the probability that the current outcome is 0, if the previous outcome is 0
    b=(1-q)*p       #the probability that the current outcome is 1, if the previous outcome is 0
    c=(1-q)*(1-p)   #the probability that the current outcome is 0, if the previous outcome is 1
    d=q+(1-q)*p     #the probability that the current outcome is 1, if the previous outcome is 1
    return (1-p)*q0(n0,nn0-1)+p*q1(n0-1,nn0-1)

def q0(n,nn):
    #the probability that there are n 1s in the following nn outcomes, when the previous outcome is 0
    global loops
    loops +=1
    if n<0 or n>nn:
        return 0
    elif n==0 and nn==0:
        return 1
    else:
        return a*q0(n,nn-1)+b*q1(n-1,nn-1)

def q1(n,nn):
    #the probability that there are n 1s in the following nn outcomes, when the previous outcome is 1
    global loops
    loops +=1
    if n<0 or n>nn:
        return 0
    elif n==0 and nn==0:
        return 1
    else:
        return c*q0(n,nn-1)+d*q1(n-1,nn-1)

print Pro(0,12,0.8,0.2),loops

The output is

 2.63243407685e-06 24

where 24 is the number of loops needed.

By trying N=2,4,6...24 and n=N/2, I plotted the figure of N~log(loops). And it is a linear relation, which shows an exponential time needed.

Loops pure iteration.png

A modified iterative method

By analyzing the total non-repeated data we used, which are mostly return values of functions q0 and q1 and is shown in the picture below, we can find that the total number of useful values is very limited, which is about (N-n)n. So if we can modify the iterative algorithm by avoiding repeat calculations, the loops needed might shrink to (N-n)n.

Q0.PNG Q1.PNG

The direct way to avoid repeating calculations is not to calculate them if they have already been calculated. So more storage is needed to give that information. However, the storage needed is very small, just 2Nn double number, so it is worthy to do so. The code is

loops=0 #how many loops needed

def Pro(n0,nn0,p,q):
    #to calculate the final probability
    global q0data, q1data
    q0data=[[-1 for row in range(n0+1)] for col in range(nn0+1)] #To storage the q0(i,j) values those have been calculated. If it has not calculated yet, the value is -1. 
    q1data=[[-1 for row in range(n0+1)] for col in range(nn0+1)] #To storage the q1(i,j) values those have been calculated. If it has not calculated yet, the value is -1. 
    a=q+(1-p)*(1-q) #the probability that the current outcome is 0, if the previous outcome is 0
    b=(1-q)*p       #the probability that the current outcome is 1, if the previous outcome is 0
    c=(1-q)*(1-p)   #the probability that the current outcome is 0, if the previous outcome is 1
    d=q+(1-q)*p     #the probability that the current outcome is 1, if the previous outcome is 1
    return (1-p)*q0(n0,nn0-1)+p*q1(n0-1,nn0-1)

def q0(n,nn):
    #the probability that there are n 1s in the following nn outcomes, when the previous outcome is 0
    global loops
    loops +=1
    if q0data[nn][n]!=-1: #if q0data[nn][n]!=-1, it means this value has been calculated and no more iterations for this value are needed, which can save a lot of time. 
        return q0data[nn][n]
    elif n<0 or n>nn:
        return 0
    elif n==0 and nn==0:
        return 1
    else:
        q0data[nn][n]=a*q0(n,nn-1)+b*q1(n-1,nn-1)
        return q0data[nn][n]

def q1(n,nn):
    #the probability that there are n 1s in the following nn outcomes, when the previous outcome is 1
    global loops
    loops +=1
    if q1data[nn][n]!= -1:
        return q1data[nn][n]
    elif n<0 or n>nn:
        return 0
    elif n==0 and nn==0:
        return 1
    else:
        q1data[nn][n]=c*q0(n,nn-1)+d*q1(n-1,nn-1)
        return q1data[nn][n]

print Pro(720,900,0.8,0.2),loops

The result is

0.0271286525826 520198

where the number of loops is just 520198 for n=720 N=900.

In this modified iteration algorithm,I plotted the figure of N~log(loops),too (n=N/2). And it is a logarithm relation, which shows a polynomial time needed.

Modified iteratioin algorithm.png

Comment by Bill P.: You Win!

This is a correct solution. I will bring your $100 prize on Monday. Congratulations!!