Segment 24. Goodness of Fit

From Computational Statistics (CSE383M and CS395T)
Watch this segment

(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)

{{#widget:Iframe |url=http://www.youtube.com/v/EJleSVf0Z-U&hd=1 |width=800 |height=625 |border=0 }}

The direct YouTube link is http://youtu.be/EJleSVf0Z-U

Links to the slides: PDF file or PowerPoint file

Problems

To Calculate

1. Let <math>X</math> be an R.V. that is a linear combination (with known, fixed coefficients <math>\alpha_k</math>) of twenty <math>N(0,1)</math> deviates. That is, <math>X = \sum_{k=1}^{20} \alpha_k T_k</math> where <math>T_k \sim N(0,1)</math>. How can you most simply form a t-value-squared (that is, something distributed as <math>\text{Chisquare}(1)</math>) from <math>X</math>? For some particular choice of <math>\alpha_k</math>'s (random is ok), generate a sample of <math>x</math>'s, form the t-value-squared of each, plot their histogram, and show that it agrees with <math>\text{Chisquare}(1)</math>.
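One possible starting point for Problem 1, as a minimal Python sketch using only the standard library (not an official solution). It assumes the simplest normalization: divide <math>X</math> by its standard deviation <math>\sqrt{\sum_k \alpha_k^2}</math> and square. The function names, the uniform random choice of <math>\alpha_k</math>, the sample size, and the bin edges are all illustrative assumptions, and a printed table of bin fractions stands in for a plotted histogram.

```python
import math
import random

def t_squared_samples(alphas, n_samples, rng):
    """Draw X = sum_k alpha_k T_k and return (X / sd(X))**2 for each draw."""
    sd = math.sqrt(sum(a * a for a in alphas))  # Var(X) = sum of alpha_k^2
    samples = []
    for _ in range(n_samples):
        x = sum(a * rng.gauss(0.0, 1.0) for a in alphas)
        samples.append((x / sd) ** 2)
    return samples

def chisquare1_cdf(x):
    """CDF of Chisquare(1): P(Z^2 <= x) = erf(sqrt(x / 2)) for Z ~ N(0,1)."""
    return math.erf(math.sqrt(x / 2.0))

def binned_comparison(samples, edges):
    """Return (lo, hi, observed fraction, expected fraction) for each bin."""
    n = len(samples)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        observed = sum(1 for s in samples if lo <= s < hi) / n
        expected = chisquare1_cdf(hi) - chisquare1_cdf(lo)
        rows.append((lo, hi, observed, expected))
    return rows

rng = random.Random(0)  # fixed seed so the run is repeatable
alphas = [rng.uniform(-1.0, 1.0) for _ in range(20)]  # assumed choice of alpha_k
samples = t_squared_samples(alphas, 20000, rng)
edges = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
for lo, hi, observed, expected in binned_comparison(samples, edges):
    print(f"[{lo:4.2f}, {hi:4.2f})  observed {observed:.4f}  expected {expected:.4f}")
```

The observed and expected columns should agree to within sampling noise, which shrinks as the sample size grows. Replacing the printed table with a plotted histogram overlaid on the <math>\text{Chisquare}(1)</math> density is a natural next step.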

2. From some matrix of known coefficients <math>\alpha_{ik}</math> with <math>k=1,\ldots,20</math> and <math>i = 1,\ldots,100</math>, generate 100 R.V.'s <math>X_i = \sum_{k=1}^{20} \alpha_{ik} T_k</math> where <math>T_k \sim N(0,1)</math>. In other words, you are expanding 20 i.i.d. <math>T_k</math>'s into 100 R.V.'s. Form a sum of 100 t-values-squared obtained from these variables and demonstrate numerically by repeated sampling that it is distributed as <math>\text{Chisquare}(\nu)</math>. What is the value of <math>\nu</math>? Use enough samples so that you could distinguish between <math>\nu</math> and <math>\nu-1</math>.

3. Reproduce the table of critical <math>\Delta\chi^2</math> values shown in slide 7. Hint: Go back to segment 21 and listen to the exposition of slide 7. (My solution is 3 lines in Mathematica.)
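A hedged Python sketch for Problem 3, using only the standard library. It assumes the slide's table follows the usual convention that the critical <math>\Delta\chi^2</math> for confidence level <math>p</math> and <math>\nu</math> degrees of freedom is the value at which the <math>\text{Chisquare}(\nu)</math> CDF equals <math>p</math>. The confidence levels listed and the range <math>\nu = 1,\ldots,6</math> are assumptions, since the slide itself is not reproduced here; the function names are illustrative. The CDF is computed from the power series for the regularized lower incomplete gamma function and inverted by bisection.

```python
import math

def chi2_cdf(x, nu):
    """CDF of Chisquare(nu) at x, i.e. P(nu/2, x/2), summed as a power series."""
    if x <= 0.0:
        return 0.0
    a, z = 0.5 * nu, 0.5 * x
    term = 1.0 / a
    total = term
    n = 0
    # Series: sum over n of z^n / (a (a+1) ... (a+n)), then the prefactor.
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def delta_chi2(p, nu):
    """Value of Delta chi^2 at which the Chisquare(nu) CDF equals p."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, nu) < p:  # grow the bracket until it contains p
        hi *= 2.0
    for _ in range(100):  # plain bisection
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

levels = [0.6827, 0.90, 0.9545, 0.99, 0.9973, 0.9999]  # assumed confidence levels
nus = range(1, 7)                                      # assumed range of nu
print("p".ljust(10) + "".join(f"nu={nu}".rjust(8) for nu in nus))
for p in levels:
    row = "".join(f"{delta_chi2(p, nu):8.2f}" for nu in nus)
    print(f"{100 * p:.2f}%".ljust(10) + row)
```

With a statistics library available, the bisection can be replaced by that library's chi-square quantile function, which is presumably how a three-line solution is possible.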

To Think About

1. Design a numerical experiment to exemplify the assertions on slide 8, namely that <math>\chi^2_{min}</math> varies by <math>\pm\sqrt{2\nu}</math> from data set to data set, but varies only by <math>\pm O(1)</math> as the fitted parameters <math>\mathbf b</math> vary within their statistical uncertainty.
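One way such an experiment might be set up, sketched in Python with the standard library only. Everything specific here is an assumption made for illustration: a straight-line model with two parameters, 50 equally spaced points, known equal errors <math>\sigma</math>, and the particular true parameter values. As a stand-in for "parameters varying within their statistical uncertainty", the sketch evaluates <math>\chi^2</math> at the true parameters, which differ from the fitted ones by a typical statistical fluctuation, and records the increase over <math>\chi^2_{min}</math>.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Least-squares straight line for equal errors; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def chi2(xs, ys, b0, b1, sigma):
    """Chi-square of the line b0 + b1*x against the data."""
    return sum(((y - b0 - b1 * x) / sigma) ** 2 for x, y in zip(xs, ys))

def experiment(n_points=50, n_datasets=2000, sigma=1.0, seed=0):
    """Return chi2_min per data set and chi2(true params) - chi2_min per data set."""
    rng = random.Random(seed)
    xs = [i / (n_points - 1) for i in range(n_points)]
    true_b0, true_b1 = 1.0, 2.0  # assumed true parameters
    chi2_mins, deltas = [], []
    for _ in range(n_datasets):
        ys = [true_b0 + true_b1 * x + rng.gauss(0.0, sigma) for x in xs]
        b0, b1 = fit_line(xs, ys)
        c_min = chi2(xs, ys, b0, b1, sigma)
        chi2_mins.append(c_min)
        deltas.append(chi2(xs, ys, true_b0, true_b1, sigma) - c_min)
    return chi2_mins, deltas

chi2_mins, deltas = experiment()
nu = 50 - 2  # data points minus fitted parameters
print(f"chi2_min: mean {statistics.mean(chi2_mins):.2f} (nu = {nu}), "
      f"std {statistics.stdev(chi2_mins):.2f} (sqrt(2 nu) = {math.sqrt(2 * nu):.2f})")
print(f"delta chi2: mean {statistics.mean(deltas):.2f}, "
      f"std {statistics.stdev(deltas):.2f}")
```

The first line of output addresses the data-set-to-data-set scatter of <math>\chi^2_{min}</math>; the second shows how much <math>\chi^2</math> rises when the parameters move by a typical statistical fluctuation. Changing `n_points` shows which of the two spreads depends on <math>\nu</math>.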

2. Suppose you want to estimate the central value <math>\mu</math> of a sample of <math>N</math> values drawn from <math>\text{Cauchy}(\mu,\sigma)</math>. If your estimate is the mean of your sample, does the "universal rule of thumb" (slide 2) hold? That is, does the accuracy get better as <math>N^{-1/2}</math>? Why or why not? What if you use the median of your sample as the estimate? Verify your answers by numerical experiments.
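A possible skeleton for the numerical experiments in this problem, in Python with the standard library only. The sample sizes, the number of trials, and the function names are illustrative assumptions. Because the Cauchy distribution has no finite variance, the sketch measures the spread of each estimator by the interquartile range of its estimates over repeated trials rather than by a standard deviation; that choice is also an assumption of the sketch.

```python
import math
import random
import statistics

def cauchy_sample(mu, sigma, n, rng):
    """Draw n values from Cauchy(mu, sigma) by inverting the CDF."""
    return [mu + sigma * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

def iqr(values):
    """Interquartile range, a spread measure that needs no finite variance."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def estimator_spread(estimator, n, n_trials, rng, mu=0.0, sigma=1.0):
    """IQR of the estimator's values over n_trials independent samples of size n."""
    estimates = [estimator(cauchy_sample(mu, sigma, n, rng)) for _ in range(n_trials)]
    return iqr(estimates)

rng = random.Random(0)  # fixed seed so the run is repeatable
print("     N   spread(mean)   spread(median)   N^(-1/2)")
for n in (100, 400, 1600):
    spread_mean = estimator_spread(statistics.fmean, n, 300, rng)
    spread_median = estimator_spread(statistics.median, n, 300, rng)
    print(f"{n:6d}   {spread_mean:12.4f}   {spread_median:14.4f}   {n ** -0.5:8.4f}")
```

Comparing each spread column with the <math>N^{-1/2}</math> column as <math>N</math> grows by factors of four indicates whether that estimator follows the rule of thumb.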