# Segment 24. Goodness of Fit

#### Watch this segment

(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)

{{#widget:Iframe |url=http://www.youtube.com/v/EJleSVf0Z-U&hd=1 |width=800 |height=625 |border=0 }}

The direct YouTube link is http://youtu.be/EJleSVf0Z-U

Links to the slides: PDF file or PowerPoint file

### Problems

#### To Calculate

1. Let <math>X</math> be an R.V. that is a linear combination (with known, fixed coefficients <math>\alpha_k</math>) of twenty <math>N(0,1)</math> deviates. That is, <math>X = \sum_{k=1}^{20} \alpha_k T_k</math> where <math>T_k \sim N(0,1)</math>. How can you most simply form a t-value-squared (that is, something distributed as <math>\text{Chisquare}(1)</math>) from <math>X</math>? For some particular choice of <math>\alpha_k</math>'s (random is ok), generate a sample of <math>x</math>'s, plot the histogram of the resulting t-values-squared, and show that it agrees with <math>\text{Chisquare}(1)</math>.

2. From some matrix of known coefficients <math>\alpha_{ik}</math> with <math>k=1,\ldots,20</math> and <math>i = 1,\ldots,100</math>, generate 100 R.V.s <math>X_i = \sum_{k=1}^{20} \alpha_{ik} T_k</math> where <math>T_k \sim N(0,1)</math>. In other words, you are expanding 20 i.i.d. <math>T_k</math>'s into 100 R.V.s. Form a sum of 100 t-values-squared obtained from these variables and demonstrate numerically by repeated sampling that it is distributed as <math>\text{Chisquare}(\nu)</math>. What is the value of <math>\nu</math>? Use enough samples so that you could distinguish between <math>\nu</math> and <math>\nu-1</math>.

3. Reproduce the table of critical <math>\Delta\chi^2</math> values shown in slide 7. Hint: Go back to segment 21 and listen to the exposition of slide 7. (My solution is 3 lines in Mathematica.)
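As a possible starting point for problem 1, here is a minimal Python sketch (standard library only, not the course's own solution). It assumes that the simplest t-value is <math>X</math> divided by its standard deviation <math>\sqrt{\sum_k \alpha_k^2}</math>, and it compares binned sample fractions against the <math>\text{Chisquare}(1)</math> CDF, <math>\text{erf}(\sqrt{x/2})</math>, instead of drawing a histogram. The names `sample_t_squared` and `chisq1_cdf`, the bin edges, the seed, and the sample size are illustrative choices.

```python
import math
import random


def sample_t_squared(alphas, n_samples, rng):
    """Draw n_samples of (X / sigma_X)**2, where X = sum_k alpha_k * T_k."""
    # X is a sum of independent normals, so Var(X) = sum_k alpha_k**2.
    sigma = math.sqrt(sum(a * a for a in alphas))
    samples = []
    for _ in range(n_samples):
        x = sum(a * rng.gauss(0.0, 1.0) for a in alphas)
        samples.append((x / sigma) ** 2)
    return samples


def chisq1_cdf(x):
    """CDF of Chisquare(1): P(T**2 <= x) = erf(sqrt(x / 2))."""
    return math.erf(math.sqrt(x / 2.0))


rng = random.Random(24)                                # fixed seed, arbitrary
alphas = [rng.uniform(-1.0, 1.0) for _ in range(20)]   # random coefficients
t2 = sample_t_squared(alphas, 20000, rng)

# Compare the fraction of samples in each bin with the Chisquare(1) probability.
edges = [0.0, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
for lo, hi in zip(edges[:-1], edges[1:]):
    observed = sum(lo <= v < hi for v in t2) / len(t2)
    expected = chisq1_cdf(hi) - chisq1_cdf(lo)
    print(f"[{lo:4.2f}, {hi:4.2f})  observed {observed:.4f}  expected {expected:.4f}")
```

To get the plot the problem asks for, the list `t2` can be passed to any histogram routine and overlaid with the <math>\text{Chisquare}(1)</math> density.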

#### To Think About

1. Design a numerical experiment to exemplify the assertions on slide 8, namely that <math>\chi^2_{min}</math> varies by <math>\pm\sqrt{2\nu}</math> from data set to data set, but varies only by <math>\pm O(1)</math> as the fitted parameters <math>\mathbf b</math> vary within their statistical uncertainty.

2. Suppose you want to estimate the central value <math>\mu</math> of a sample of <math>N</math> values drawn from <math>\text{Cauchy}(\mu,\sigma)</math>. If your estimate is the mean of your sample, does the "universal rule of thumb" (slide 2) hold? That is, does the accuracy get better as <math>N^{-1/2}</math>? Why or why not? What if you use the median of your sample as the estimate? Verify your answers by numerical experiments.
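One way to set up the numerical experiment for the Cauchy problem is sketched below in Python (standard library only). It is an illustration under stated assumptions, not a reference solution: Cauchy deviates are generated by the inverse-CDF formula <math>\mu + \sigma\tan(\pi(u - 1/2))</math> with <math>u</math> uniform on <math>(0,1)</math>, and the spread of each estimator over repeated data sets is measured by its interquartile range, because the variance of a Cauchy deviate is undefined. The helper names `cauchy_sample` and `estimator_iqr`, the seed, and the trial counts are hypothetical choices.

```python
import math
import random
import statistics


def cauchy_sample(mu, sigma, n, rng):
    """Draw n deviates from Cauchy(mu, sigma) by inverting the CDF."""
    return [mu + sigma * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def estimator_iqr(estimator, n, trials, rng, mu=0.0, sigma=1.0):
    """Interquartile range of an estimator over `trials` independent data sets."""
    estimates = [estimator(cauchy_sample(mu, sigma, n, rng)) for _ in range(trials)]
    q1, _, q3 = statistics.quantiles(estimates, n=4)  # the three quartile cut points
    return q3 - q1


rng = random.Random(24)  # fixed seed, arbitrary
results = {}
for n in (10, 100, 1000):
    mean_spread = estimator_iqr(statistics.fmean, n, 1000, rng)
    median_spread = estimator_iqr(statistics.median, n, 1000, rng)
    results[n] = (mean_spread, median_spread)
    print(f"N = {n:5d}   IQR of mean = {mean_spread:7.3f}   IQR of median = {median_spread:7.3f}")
```

Each step in the loop increases <math>N</math> by a factor of 10, so an estimator that follows the <math>N^{-1/2}</math> rule should show its spread shrinking by about <math>\sqrt{10} \approx 3.2</math> from one row to the next.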