Eleisha's Segment 5: Bernoulli Trials

From Computational Statistics Course Wiki

To Calculate:

1. You throw a pair of fair dice 10 times and, each time, you record the total number of spots. When you are done, what is the probability that exactly 5 of the 10 recorded totals are prime?

Each time you throw the pair of dice, the total is either prime or not prime, and the 10 throws are independent. The number of prime totals therefore follows a binomial distribution: there are N = 10 Bernoulli trials, and you want the probability of n = 5 successes. The success probability p is the probability that a single throw gives a prime total. The totals 2 through 12 are not equally likely, so p has to be counted over the 36 equally likely ordered outcomes of the two dice. The prime totals are 2, 3, 5, 7 and 11, which occur in 1, 2, 4, 6 and 2 ways respectively, giving 15 outcomes in all, so p = 15/36 = 5/12. The required probability is the binomial probability with n = 5, N = 10 and p = 5/12.
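The value p = 15/36 = 5/12 can be checked by brute force. The sketch below uses only the Python standard library (the helper name `is_prime` is illustrative, not from the original page): it enumerates all 36 equally likely ordered outcomes of two fair dice and counts those whose total is prime.

```python
from fractions import Fraction
from itertools import product

def is_prime(n: int) -> bool:
    """Trial-division primality test (illustrative helper); adequate for totals 2..12."""
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

# All 36 ordered (die1, die2) pairs are equally likely for fair dice.
outcomes = list(product(range(1, 7), repeat=2))
prime_count = sum(1 for a, b in outcomes if is_prime(a + b))
p = Fraction(prime_count, len(outcomes))

print(prime_count, len(outcomes), p)  # 15 36 5/12
```

Counting ordered pairs rather than distinct totals is what makes each outcome equally likely.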

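Here are some Python lines that use SciPy to calculate the probability:

from scipy.stats import binom
N = 10          # number of throws (Bernoulli trials)
x = 5           # number of prime totals wanted (successes)
p = 5.0 / 12.0  # probability that one throw gives a prime total
prob = binom.pmf(x, N, p)
print(f"Probability exactly 5 of the 10 recorded totals are prime: {prob:.12g}")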

Output:

Probability exactly 5 of the 10 recorded totals are prime: 0.213760916116

The probability that exactly 5 of the 10 recorded totals are prime is approximately 0.2138, or about 21.4%.
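As a cross-check that does not depend on SciPy, the binomial probability mass function C(N, n) p^n (1 - p)^(N - n) can be evaluated directly. This minimal sketch (the function name `binomial_pmf_exact` is hypothetical) uses `fractions.Fraction` so the arithmetic is exact, and the decimal value can then be compared with the `binom.pmf` output.

```python
from fractions import Fraction
from math import comb

def binomial_pmf_exact(n: int, N: int, p: Fraction) -> Fraction:
    """Exact P(X = n) for X ~ Binomial(N, p) (hypothetical helper name)."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)

prob = binomial_pmf_exact(5, 10, Fraction(5, 12))
print(prob)         # exact rational value
print(float(prob))  # decimal approximation, about 0.2138
```

Exact rationals avoid any rounding question; for large N the same formula is better evaluated in floating point or on a log scale.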

2. If you flip a fair coin one billion times, what is the probability that the number of heads is between 500010000 and 500020000, inclusive? (Give answer to 4 significant figures.)

Flipping a coin a billion times can be modeled by a binomial distribution with N = 1 billion Bernoulli trials and success probability p = 0.5. Here p = 0.5 because the coin is fair, so heads and tails are equally likely on any given flip.

You can calculate the probability that the number of heads X falls in this range from two cumulative probabilities:

$x_1 = P(X \le 500010000)$, the probability that the number of heads is at most 500010000

$x_2 = P(X \le 500020000)$, the probability that the number of heads is at most 500020000

$x_2 - x_1 = P(500010000 < X \le 500020000)$, the probability that the number of heads lies between the two bounds

$x_1$ and $x_2$ are values of the cumulative distribution function (CDF) of the binomial distribution with N = 1 billion and p = 0.5, evaluated at n = 500010000 and n = 500020000 respectively. Strictly, $x_2 - x_1$ leaves out the lower bound itself, because X = 500010000 is already counted in $x_1$; the inclusive probability is $P(X \le 500020000) - P(X \le 500009999)$. The omitted term $P(X = 500010000)$ is only about $2 \times 10^{-5}$, which does not change the answer at 4 significant figures.
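Because the question says "inclusive", the target is P(500010000 ≤ X ≤ 500020000), the sum of the binomial pmf over every count from 500010000 to 500020000. The following is a minimal standard-library sketch of that sum; the function names are hypothetical and not SciPy functions. It evaluates the first term with `math.lgamma` to avoid overflowing factorials, then generates each next term from the ratio P(X = k + 1) / P(X = k) = (N - k) / (k + 1) · p / (1 - p).

```python
import math

# Hypothetical helper names (not SciPy functions); standard library only.
def binom_log_pmf(k: int, n: int, p: float) -> float:
    """Natural log of P(X = k) for X ~ Binomial(n, p), via lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binom_range_prob(lo: int, hi: int, n: int, p: float) -> float:
    """P(lo <= X <= hi) for X ~ Binomial(n, p), with both bounds included."""
    term = math.exp(binom_log_pmf(lo, n, p))  # P(X = lo)
    total = term
    odds = p / (1 - p)
    for k in range(lo, hi):
        # P(X = k + 1) = P(X = k) * (n - k) / (k + 1) * p / (1 - p)
        term *= (n - k) / (k + 1) * odds
        total += term
    return total

prob = binom_range_prob(500_010_000, 500_020_000, 1_000_000_000, 0.5)
print(f"P(500010000 <= heads <= 500020000) = {prob:.4g}")
```

The loop visits each of the 10001 counts in the range once; because it starts at X = 500010000, its result is the inclusive probability the question asks for.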

Here are some Python lines that use SciPy to calculate the probability:

from scipy.stats import binom
N = 1_000_000_000   # number of flips
x_1 = 500_010_000   # lower bound
x_2 = 500_020_000   # upper bound
p = 0.5             # fair coin
value_1 = binom.cdf(x_1, N, p)  # P(X <= x_1)
value_2 = binom.cdf(x_2, N, p)  # P(X <= x_2)
prob_diff = value_2 - value_1   # P(x_1 < X <= x_2)
print(f"Probability that the number of heads is between 500010000 and 500020000: {prob_diff:.12g}")

Output:

Probability that the number of heads is between 500010000 and 500020000: 0.160588434741

Therefore, to 4 significant figures, the probability that the number of heads is between 500010000 and 500020000 is 0.1606, or approximately 16.06%.
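As a sanity check on the 4-significant-figure answer, the de Moivre-Laplace normal approximation (a different technique from the exact binomial CDF) treats X as roughly normal with mean Np and standard deviation sqrt(Np(1 - p)). The sketch below uses only `math.erf` for the standard normal CDF and applies a continuity correction of 0.5 at each inclusive bound; the helper name `normal_cdf` is illustrative.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF expressed through the error function (illustrative helper)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

N = 1_000_000_000
p = 0.5
mean = N * p
sd = math.sqrt(N * p * (1 - p))  # about 15811.4 heads

lo, hi = 500_010_000, 500_020_000
# Continuity correction: widen the inclusive integer range by half a count on each side.
z_lo = (lo - 0.5 - mean) / sd
z_hi = (hi + 0.5 - mean) / sd
approx = normal_cdf(z_hi) - normal_cdf(z_lo)
print(f"Normal approximation: {approx:.4g}")
```

With N this large the approximation agrees with the exact binomial result to the 4 significant figures requested, which makes it a quick way to catch a gross error in the exact calculation.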

To Think About:

1. Suppose that the assumption of independence (the first "i" in "i.i.d.") were violated. Specifically suppose that, after the first Bernoulli trial, every trial has a probability Q of simply reproducing the immediately previous outcome, and a probability (1-Q) of being an independent trial. How would you compute the probability of getting n events in N trials if the probability of each event (when it is independent) is p?
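One possible approach, sketched here under the model exactly as stated and not as a definitive answer: the outcomes form a two-state Markov chain, because after the first trial P(event | previous event) = Q + (1 - Q)p and P(event | previous non-event) = (1 - Q)p. Dynamic programming over (number of events so far, outcome of the latest trial) then gives the whole distribution of the event count in O(N²) time. The function name `sticky_bernoulli_pmf` is hypothetical; setting Q = 0 should recover the ordinary binomial distribution, which the usage lines check.

```python
from math import comb
from typing import List

# Hypothetical helper (not from any library): distribution of the event count
# when, after the first trial, each trial copies the previous outcome with
# probability Q and is otherwise an independent Bernoulli(p) trial.
def sticky_bernoulli_pmf(N: int, p: float, Q: float) -> List[float]:
    """Return [P(n events in N trials) for n = 0..N]."""
    if N < 1:
        raise ValueError("N must be at least 1")
    # prob[k][s] = P(k events so far and latest outcome s), with s = 1 for an event.
    prob = [[0.0, 0.0] for _ in range(N + 1)]
    prob[0][0] = 1 - p  # the first trial is always independent
    prob[1][1] = p
    p_event_after_event = Q + (1 - Q) * p  # P(event | previous trial was an event)
    p_event_after_non = (1 - Q) * p        # P(event | previous trial was a non-event)
    for _ in range(N - 1):
        nxt = [[0.0, 0.0] for _ in range(N + 1)]
        for k in range(N):  # k = N cannot occur before the final trial
            from_non, from_event = prob[k]
            # Next trial is a non-event: the count stays at k.
            nxt[k][0] += (from_non * (1 - p_event_after_non)
                          + from_event * (1 - p_event_after_event))
            # Next trial is an event: the count rises to k + 1.
            nxt[k + 1][1] += (from_non * p_event_after_non
                              + from_event * p_event_after_event)
        prob = nxt
    return [non + event for non, event in prob]

N, p = 10, 5 / 12
# Sanity check: Q = 0 makes every trial independent, so this should match the binomial pmf.
dist = sticky_bernoulli_pmf(N, p, Q=0.0)
binomial = [comb(N, n) * p**n * (1 - p) ** (N - n) for n in range(N + 1)]
print(max(abs(a - b) for a, b in zip(dist, binomial)))  # floating-point rounding only
# With Q > 0 the mean count stays N * p, but the variance grows because outcomes cluster.
print(sticky_bernoulli_pmf(N, p, Q=0.3)[5])
```

Tracking the latest outcome is what makes the recursion work: under this model it is the only piece of history that affects the next trial.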

2. Try the Mathematica calculation on slide 5 without the magical "GenerateConditions -> False". Why is the output different?