Jan31 Team4 Multinomial Parameter Estimation


Team 4's Solution

Daniel Shepard, Todd Swinson, Ian Yen

The past two segments have been about Bayesian parameter estimation.

In Segment 5 (Bernoulli Trials), we did Bayesian parameter estimation of the rate parameter of a binomial distribution. The setup was: we saw the outcomes of a series of independent trials, each with two possible outcomes: the jailer says B, or the jailer says C. There was one parameter of interest: x, the probability with which the jailer says B. (What about the probability that the jailer says C? It is 1 - x, because the jailer has only two choices.) The goal was to compute the posterior distribution of x, given data in the form of counts of the outcomes observed.

In this exercise, we will generalize this to a multinomial setting. Each trial is now a chess game, which has three possible outcomes: white wins, black wins, or the players draw. We have data on the outcomes of 10,000 real chess games, and we want to use these data to learn how likely each outcome is. In other words, we assume that, due to the structure of the game of chess, there is some inherent probability of each outcome occurring, and we want to figure out what these probabilities are. The parameters of interest are w, the probability that white wins, and b, the probability that black wins. (What about the probability that they draw? Since w + b + d = 1, the probability of a draw is d = 1 - w - b.) To do this, we will take counts of the outcomes observed and compute the joint posterior distribution of w and b given these data.

Notational conventions
w = probability that white wins
b = probability that black wins
d = probability that the players draw
N = total number of games observed
W = number of white wins in these games
B = number of black wins in these games
D = number of draws in these games
Activity checkpoints

1. What does a joint uniform prior on w and b look like?
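One standard way to write this down, assuming w and b are constrained to a valid probability simplex (w ≥ 0, b ≥ 0, w + b ≤ 1), is a constant density over that triangle:

<math>P(w, b) = 2 \quad \text{for } w \ge 0,\ b \ge 0,\ w + b \le 1,</math>

and zero otherwise. The constant is 2 so that the density integrates to 1 over the triangle, which has area 1/2.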

2. Suppose we know that w=0.4, b = 0.3, and d = 0.3. If we watch N = 10 games, what is the probability that W = 3, B = 5, and D = 2?
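As a quick numerical check, this can be evaluated directly in MATLAB. The snippet below is a sketch (the variable names are ours) using the same log-gamma trick as the code further down:

w = 0.4; b = 0.3; d = 0.3;          % assumed outcome probabilities
W = 3; B = 5; D = 2; N = W + B + D; % observed counts
% multinomial probability, computed in log space to avoid overflow
logP = gammaln(N+1) - gammaln(W+1) - gammaln(B+1) - gammaln(D+1) ...
    + W*log(w) + B*log(b) + D*log(d);
exp(logP)   % roughly 0.035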


3. For general w, b, d, W, B, D, what is P(W, B, D | w, b, d)?
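Assuming the games are independent, the likelihood is the standard multinomial probability mass function:

<math>P(W, B, D \mid w, b, d) = \frac{N!}{W!\,B!\,D!}\, w^W\, b^B\, d^D,</math>

where N = W + B + D.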


4. Applying Bayes, what is P(w, b, d | W, B, D)? (The Bayes denominator is tricky - if you present us with the integral to evaluate, we will provide the answer.)
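For reference, with a uniform prior on (w, b) the Bayes denominator is a Dirichlet-type integral over the triangle, equal to W! B! D! / (N+2)!, so the posterior works out to

<math>P(w, b \mid W, B, D) = \frac{(N+2)!}{W!\,B!\,D!}\, w^W\, b^B\, (1 - w - b)^D</math>

for w ≥ 0, b ≥ 0, w + b ≤ 1, and zero otherwise. This is the expression the MATLAB code below evaluates; note the (N+2)! rather than N! in the normalization, which corresponds to gammaln(N(i)+3) in the code.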


5. Here is the real data - chess_outcomes.txt. Each line represents the outcome of one game. Count the outcomes of the first N games and produce a visualization of the joint posterior of the win rates for N = 0, 3, 10, 100, 1000, and 10000.

We did this in MATLAB; the code is below. Note that we used the log-gamma function (gammaln) instead of factorials or binomial coefficients, to avoid overflow/underflow.

load 'chess.mat'
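% Outcomes is assumed to be a cell array of characters 'W', 'B', or 'D'
% (one entry per game), saved to chess.mat from chess_outcomes.txt.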
N = [0,3,10,100,1000,10000];
[w b] = meshgrid(0.01:0.01:1,0.01:0.01:1);
for(i = 1:length(N))
    W = 0;
    B = 0;
    D = 0;
    % count the wins and draws in the first N(i) games
    for(j = 1:N(i))
        if(Outcomes{j} == 'W')
            W = W+1;
        elseif(Outcomes{j} == 'B')
            B = B+1;
        else
            D = D+1;
        end
    end

    %calculating the posterior using the gamma function
    P = @(w,b) ((w+b)<1).*exp(W*log(w) + B*log(b) + D*log(1-w-b) ...
        + gammaln(N(i)+3) - gammaln(W+1) - gammaln(B+1) - gammaln(D+1));
    Pwb = P(w,b);
    Pwb(isnan(Pwb)) = 0;
    figure
    surf(w,b,Pwb)
end
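To read the surfaces as heatmaps (as in the figures and discussion below), one option (not necessarily how the original figures were made) is to add the following right after each surf call:

    view(2)         % look straight down the z-axis
    shading interp  % smooth the color map
    colorbar        % show the density scale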


[Figure: joint posterior after 0 outcomes (the prior).]
[Figure: joint posterior after 100 outcomes.]
[Figure: joint posterior after 10000 outcomes.]

In the heatmaps above, w is on the x-axis and b is on the y-axis. We see that White has a slightly higher probability of winning than Black does, given this dataset and the assumed model.