# Segment 24. Goodness of Fit

## Contents

#### Watch this segment

(Don't worry, what you see statically below is not the beginning of the segment. Press the play button to start at the beginning.)

{{#widget:Iframe |url=http://www.youtube.com/v/EJleSVf0Z-U&hd=1 |width=800 |height=625 |border=0 }}

Links to the slides: PDF file or PowerPoint file

### Problems

#### To Calculate

1. Let $\displaystyle X$ be an R.V. that is a linear combination (with known, fixed coefficients $\displaystyle \alpha_k$) of twenty $\displaystyle N(0,1)$ deviates. That is, $\displaystyle X = \sum_{k=1}^{20} \alpha_k T_k$ where $\displaystyle T_k \sim N(0,1)$. How can you most simply form a t-value-squared (that is, something distributed as $\displaystyle \text{Chisquare}(1)$) from $\displaystyle X$? For some particular choice of $\displaystyle \alpha_k$'s (random is ok), generate a sample of $\displaystyle x$'s, plot their histogram, and show that it agrees with $\displaystyle \text{Chisquare}(1)$.
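Since $\displaystyle X$ is a fixed linear combination of unit normals, $\displaystyle X \sim N(0, \sum_k \alpha_k^2)$, so the simplest t-value-squared is $\displaystyle t^2 = X^2 / \sum_k \alpha_k^2$. A minimal sketch of the sampling check in Python with NumPy/SciPy (a K-S comparison stands in for the histogram plot):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Known, fixed coefficients (a random choice is fine, per the problem).
alpha = rng.normal(size=20)
var_x = np.sum(alpha**2)   # X = sum_k alpha_k T_k  =>  X ~ N(0, sum_k alpha_k^2)

# Draw many realizations of X and form t^2 = X^2 / Var(X),
# which should be distributed as Chisquare(1).
nsamp = 100_000
T = rng.normal(size=(nsamp, 20))
x = T @ alpha
t2 = x**2 / var_x

# Compare the empirical distribution with Chisquare(1) via a K-S test.
# (To see the agreement graphically, histogram t2 and overplot chi2(1).pdf.)
ks = stats.kstest(t2, stats.chi2(df=1).cdf)
print(f"KS statistic = {ks.statistic:.4f}")
print(f"mean of t^2  = {t2.mean():.4f}  (Chisquare(1) has mean 1)")
```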

2. From some matrix of known coefficients $\displaystyle \alpha_{ik}$ with $\displaystyle k=1,\ldots,20$ and $\displaystyle i = 1,\ldots,100$, generate 100 R.V.s $\displaystyle X_i = \sum_{k=1}^{20} \alpha_{ik} T_k$ where $\displaystyle T_k \sim N(0,1)$. In other words, you are expanding 20 i.i.d. $\displaystyle T_k$'s into 100 R.V.'s. Form a sum of 100 t-values-squared obtained from these variables and demonstrate numerically by repeated sampling that it is distributed as $\displaystyle \text{Chisquare}(\nu)$. What is the value of $\displaystyle \nu$? Use enough samples so that you could distinguish between $\displaystyle \nu$ and $\displaystyle \nu-1$.
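A sketch of the repeated-sampling experiment in Python with NumPy. The coefficient matrix here is a random choice; the moment-based estimates of $\displaystyle \nu$ are only printed, not asserted, since deciding which $\displaystyle \nu$ fits (and why) is the point of the exercise. If $\displaystyle S \sim \text{Chisquare}(\nu)$, then $\displaystyle E[S] = \nu$ and $\displaystyle \text{Var}[S] = 2\nu$, so comparing the two estimates (and the histogram against the chi-square density) is the numerical demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Known 100 x 20 coefficient matrix (random is fine).
A = rng.normal(size=(100, 20))
var_x = np.sum(A**2, axis=1)      # Var(X_i) = sum_k alpha_ik^2

def one_sample():
    """One realization of S = sum over i of t_i^2 for the 100 derived R.V.s."""
    T = rng.normal(size=20)       # only 20 i.i.d. deviates underlie everything
    x = A @ T
    return np.sum(x**2 / var_x)

S = np.array([one_sample() for _ in range(20_000)])

# Two moment-based estimates of nu; do they agree with each other,
# and with the histogram of S overplotted on a chi-square density?
print("nu from the mean      :", S.mean())
print("nu from the variance/2:", S.var() / 2)
```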

3. Reproduce the table of critical $\displaystyle \Delta\chi^2$ values shown in slide 7. Hint: Go back to segment 21 and listen to the exposition of slide 7. (My solution is 3 lines in Mathematica.)
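The same few-line computation translates directly to, e.g., Python with SciPy: each table entry is just an inverse chi-square CDF evaluation. The particular confidence levels and range of $\displaystyle \nu$ below are assumptions (the familiar Numerical Recipes choices); adjust them to match slide 7.

```python
from scipy.stats import chi2

# Rows: number of fitted parameters nu; columns: confidence levels.
# Delta chi^2 is the p-quantile of Chisquare(nu).
levels = [0.6827, 0.90, 0.9545, 0.99, 0.9973, 0.9999]
for nu in range(1, 7):
    row = [chi2.ppf(p, nu) for p in levels]
    print(f"nu={nu}: " + "  ".join(f"{d:7.2f}" for d in row))
```

Sanity check: for $\displaystyle \nu = 1$ the 68.27%, 95.45%, and 99.73% entries are the squares of 1, 2, and 3 sigma, i.e. 1.00, 4.00, 9.00.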

#### To Think About

1. Design a numerical experiment to exemplify the assertions on slide 8, namely that $\displaystyle \chi^2_{min}$ varies by $\displaystyle \pm\sqrt{2\nu}$ from data set to data set, but varies only by $\displaystyle \pm O(1)$ as the fitted parameters $\displaystyle \mathbf b$ vary within their statistical uncertainty.
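One possible design for such an experiment, sketched in Python (the quadratic model, $\displaystyle N = 100$, and $\displaystyle \sigma = 1$ are arbitrary choices): fit the same linear model to many synthetic data sets to see the $\displaystyle \pm\sqrt{2\nu}$ scatter of $\displaystyle \chi^2_{min}$; then, on a single data set, move $\displaystyle \mathbf b$ by a "1-sigma" step $\displaystyle \boldsymbol\delta$ with $\displaystyle \boldsymbol\delta^T \mathbf C^{-1} \boldsymbol\delta = 1$, for which a linear model gives $\displaystyle \Delta\chi^2 = 1$ exactly.

```python
import numpy as np

rng = np.random.default_rng(7)

# Linear model: quadratic in t fit to N data points, sigma known.
N, sigma = 100, 1.0
t = np.linspace(0.0, 1.0, N)
A = np.vstack([np.ones(N), t, t**2]).T           # design matrix, M = 3 parameters
btrue = np.array([1.0, -2.0, 3.0])
nu = N - A.shape[1]                              # degrees of freedom = 97

def chisq(b, y):
    r = (y - A @ b) / sigma
    return r @ r

# (a) chi^2_min from data set to data set: std should be about sqrt(2*nu).
chi2min = []
for _ in range(2000):
    y = A @ btrue + rng.normal(scale=sigma, size=N)
    bfit, *_ = np.linalg.lstsq(A, y, rcond=None)
    chi2min.append(chisq(bfit, y))
chi2min = np.array(chi2min)
print("mean chi2_min:", chi2min.mean(), " (expect ~", nu, ")")
print("std  chi2_min:", chi2min.std(), " (expect ~", np.sqrt(2 * nu), ")")

# (b) vary b within its uncertainty on ONE data set: step by delta with
# delta' C^{-1} delta = 1, so for a linear model Delta chi^2 = 1 exactly,
# i.e. O(1), not O(sqrt(2*nu)).
y = A @ btrue + rng.normal(scale=sigma, size=N)
bfit, *_ = np.linalg.lstsq(A, y, rcond=None)
C = sigma**2 * np.linalg.inv(A.T @ A)            # covariance of the fitted b
v = rng.normal(size=3)
v /= np.linalg.norm(v)                           # random unit direction
delta = np.linalg.cholesky(C) @ v                # a 1-sigma parameter move
dchi2 = chisq(bfit + delta, y) - chisq(bfit, y)
print("Delta chi^2 for a 1-sigma parameter move:", dchi2)
```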

2. Suppose you want to estimate the central value $\displaystyle \mu$ of a sample of $\displaystyle N$ values drawn from $\displaystyle \text{Cauchy}(\mu,\sigma)$ . If your estimate is the mean of your sample, does the "universal rule of thumb" (slide 2) hold? That is, does the accuracy get better as $\displaystyle N^{-1/2}$ ? Why or why not? What if you use the median of your sample as the estimate? Verify your answers by numerical experiments.
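A numerical experiment along these lines, sketched in Python (the trial counts and sample sizes are arbitrary): compare a robust scatter measure of the sample mean and the sample median across many trials at two values of $\displaystyle N$.

```python
import numpy as np

rng = np.random.default_rng(3)

def iqr(a):
    """Interquartile range: a scatter measure that exists even for Cauchy."""
    q75, q25 = np.percentile(a, [75, 25])
    return q75 - q25

ntrial = 1000
results = {}
for N in (100, 10_000):
    draws = rng.standard_cauchy(size=(ntrial, N))
    results[N] = (iqr(draws.mean(axis=1)), iqr(np.median(draws, axis=1)))
    print(f"N={N:6d}: IQR of means = {results[N][0]:7.3f}   "
          f"IQR of medians = {results[N][1]:7.4f}")

# The mean of N Cauchy deviates is itself Cauchy with the SAME scale for any N,
# so its scatter never shrinks: the N^{-1/2} rule of thumb fails because the
# Cauchy distribution has no finite variance (or even mean).  The sample median
# IS asymptotically normal, with std ~ (pi/2) sigma / sqrt(N), so it obeys
# the rule of thumb.
```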

### Class Activity

I measured the temperature of my framitron manifold every minute for 1000 minutes, with the same accuracy, $\displaystyle \sigma = 5$ , for each measurement. The data is plotted below (with data points connected by straight lines), and is in the file Modelselection1.txt.

It's a contest! Which group can write down a model $\displaystyle T(t|\mathbf{b})$, where $\displaystyle \mathbf{b}$ is a vector of parameters, that gives the best fit to the data in a least-squares sense?

Part 1. Any number of parameters in $\displaystyle \mathbf{b}$ are allowed.

Part 2. At most 20 parameters are allowed.

Part 3. At most 10 parameters are allowed.

Part 4. At most 4 parameters are allowed.

And, oh by the way, we'll actually test your model on a different realization of the same process, possibly one of the ones shown below.
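Once you have a candidate family $\displaystyle T(t|\mathbf{b})$, the mechanics of entering the contest are a least-squares fit plus a $\displaystyle \chi^2$ score. Here is a minimal Python sketch with two loud assumptions: the data file is not bundled here, so a hypothetical smooth series stands in for Modelselection1.txt, and the truncated Fourier family is just one arbitrary choice of model. The train/test split mimics being scored on a different realization: adding parameters always lowers the fitting $\displaystyle \chi^2$, but only the held-out realization reveals overfitting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data (Modelselection1.txt is not reproduced here): a hypothetical
# smooth "temperature" series plus Gaussian noise with sigma = 5,
# one point per minute for 1000 minutes.
t = np.arange(1000.0)
truth = 120 + 25*np.sin(2*np.pi*t/450) + 10*np.exp(-((t - 600)/80)**2)
sigma = 5.0
y_train = truth + rng.normal(scale=sigma, size=t.size)   # the "contest" data
y_test = truth + rng.normal(scale=sigma, size=t.size)    # a different realization

def design(nparam):
    """Design matrix for one generic family T(t|b): a truncated Fourier series."""
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < nparam:
        w = 2 * np.pi * k * t / 1000.0
        cols.append(np.cos(w))
        if len(cols) < nparam:
            cols.append(np.sin(w))
        k += 1
    return np.vstack(cols).T

def chisq(A, b, y):
    return np.sum(((y - A @ b) / sigma)**2)

results = {}
for nparam in (4, 10, 20, 40):
    A = design(nparam)
    b, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    results[nparam] = (chisq(A, b, y_train), chisq(A, b, y_test))
    print(f"{nparam:3d} params: chi2 train = {results[nparam][0]:8.1f}   "
          f"chi2 test = {results[nparam][1]:8.1f}")
```

Because the families are nested, the training $\displaystyle \chi^2$ is guaranteed to be non-increasing in the number of parameters; the test column is the honest score.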