Segment 24. Goodness of Fit

From Computational Statistics Course Wiki

Watch this segment

The direct YouTube link is http://youtu.be/EJleSVf0Z-U

Links to the slides: PDF file or PowerPoint file

Problems

To Calculate

1. Let $X$ be an R.V. that is a linear combination (with known, fixed coefficients $\alpha_k$) of twenty $N(0,1)$ deviates. That is, $X = \sum_{k=1}^{20} \alpha_k T_k$, where $T_k \sim N(0,1)$. How can you most simply form a t-value-squared (that is, something distributed as $\text{Chisquare}(1)$) from $X$? For some particular choice of $\alpha_k$'s (random is OK), generate a sample of $x$'s, form their t-values-squared, plot the histogram, and show that it agrees with $\text{Chisquare}(1)$.

2. From some matrix of known coefficients $\alpha_{ik}$ with $k = 1, \ldots, 20$ and $i = 1, \ldots, 100$, generate 100 R.V.s $X_i = \sum_{k=1}^{20} \alpha_{ik} T_k$, where $T_k \sim N(0,1)$. In other words, you are expanding 20 i.i.d. $T_k$'s into 100 R.V.s. Form a sum of 100 t-values-squared obtained from these variables and demonstrate numerically by repeated sampling that it is distributed as $\text{Chisquare}(\nu)$. What is the value of $\nu$? Use enough samples so that you could distinguish between $\nu$ and $\nu - 1$.

3. Reproduce the table of critical values shown in slide 7. Hint: Go back to segment 21 and listen to the exposition of slide 7. (My solution is 3 lines in Mathematica.)
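For Problem 1 above, one possible numerical check is sketched below in Python (standard library only). It assumes the deviates are $N(0,1)$, picks hypothetical coefficients `alpha` at random, and uses a Kolmogorov-Smirnov distance in place of a histogram plot. The seed, sample size, and all variable names are illustrative choices, not part of the problem statement.

```python
import math
import random

random.seed(0)  # arbitrary seed, only for reproducibility

K = 20
# Hypothetical coefficients alpha_k; any fixed, known values would do.
alpha = [random.uniform(-1.0, 1.0) for _ in range(K)]

# X is a sum of independent N(0,1) deviates scaled by alpha_k,
# so X ~ N(0, sum_k alpha_k^2).
var_x = sum(a * a for a in alpha)


def draw_t_squared():
    """Draw one X and return its t-value-squared, X^2 / Var(X)."""
    x = sum(a * random.gauss(0.0, 1.0) for a in alpha)
    return x * x / var_x


def chi2_1_cdf(x):
    """CDF of Chisquare(1): P(Z^2 <= x) = erf(sqrt(x / 2)) for Z ~ N(0,1)."""
    return math.erf(math.sqrt(x / 2.0))


n_samples = 20000
t2 = sorted(draw_t_squared() for _ in range(n_samples))

# Kolmogorov-Smirnov distance between the empirical CDF and Chisquare(1).
ks = max(
    max((i + 1) / n_samples - chi2_1_cdf(v), chi2_1_cdf(v) - i / n_samples)
    for i, v in enumerate(t2)
)
sample_mean = sum(t2) / n_samples  # Chisquare(1) has mean 1

print(f"KS distance = {ks:.4f}, sample mean = {sample_mean:.3f}")
```

A small KS distance (of order $1/\sqrt{n}$ for $n$ samples) and a sample mean near 1 are what agreement with $\text{Chisquare}(1)$ would look like. A histogram of `t2` overlaid with the Chisquare(1) density would show the same thing graphically.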

To Think About

1. Design a numerical experiment to exemplify the assertions on slide 8, namely that $\chi^2_{min}$ varies by $\pm\sqrt{2\nu}$ from data set to data set, but $\chi^2$ varies only by $\pm O(1)$ as the fitted parameters $\mathbf{b}$ vary within their statistical uncertainty.

2. Suppose you want to estimate the central value $\mu$ of a sample of $N$ values drawn from $\text{Cauchy}(\mu,\sigma)$. If your estimate is the mean of your sample, does the "universal rule of thumb" (slide 2) hold? That is, does the accuracy get better as $N^{-1/2}$? Why or why not? What if you use the median of your sample as the estimate? Verify your answers by numerical experiments.
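For item 1 of "To Think About", one possible experiment is sketched below in Python (standard library only). Everything specific here is an assumption made for illustration: a straight-line model with hypothetical true parameters `TRUE_B`, 100 points with known `SIGMA`, and centered abscissas so that the two fitted parameters have uncorrelated errors. It compares the spread of the minimum chi-square across data sets with the change in chi-square when the parameters of one fit are jittered within their uncertainties.

```python
import math
import random
import statistics

random.seed(1)  # arbitrary seed, only for reproducibility

N = 100
SIGMA = 1.0                                   # assumed known measurement error
t = [i - (N - 1) / 2.0 for i in range(N)]     # centered, so sum(t) == 0
S_TT = sum(ti * ti for ti in t)
TRUE_B = (3.0, 0.5)                           # hypothetical intercept and slope
nu = N - 2                                    # data points minus fitted parameters


def make_data():
    """One realization of y_i = b0 + b1 * t_i + Gaussian noise."""
    return [TRUE_B[0] + TRUE_B[1] * ti + random.gauss(0.0, SIGMA) for ti in t]


def chi2(y, b0, b1):
    return sum(((yi - b0 - b1 * ti) / SIGMA) ** 2 for yi, ti in zip(y, t))


def fit(y):
    """Closed-form least-squares line; valid because t is centered."""
    b0 = sum(y) / N
    b1 = sum(ti * yi for ti, yi in zip(t, y)) / S_TT
    return b0, b1


# (a) Spread of chi^2_min from data set to data set.
chi2_mins = []
for _ in range(2000):
    y = make_data()
    chi2_mins.append(chi2(y, *fit(y)))
spread_across_data = statistics.stdev(chi2_mins)

# (b) One data set: jitter the fitted parameters within their 1-sigma errors.
y = make_data()
b0_hat, b1_hat = fit(y)
chi2_min = chi2(y, b0_hat, b1_hat)
sd_b0 = SIGMA / math.sqrt(N)       # standard error of the intercept
sd_b1 = SIGMA / math.sqrt(S_TT)    # standard error of the slope
delta = []
for _ in range(2000):
    b0 = random.gauss(b0_hat, sd_b0)
    b1 = random.gauss(b1_hat, sd_b1)
    delta.append(chi2(y, b0, b1) - chi2_min)
mean_delta = statistics.mean(delta)

print(f"nu = {nu}, sqrt(2 nu) = {math.sqrt(2 * nu):.2f}")
print(f"std of chi2_min across data sets = {spread_across_data:.2f}")
print(f"mean increase in chi2 from parameter jitter = {mean_delta:.2f}")
```

In this linear setting the increase in (b) is distributed as a chi-square with 2 degrees of freedom (one per fitted parameter), so it stays of order 1 however large $\nu$ is, while the spread in (a) grows like $\sqrt{2\nu}$.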
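For item 2 of "To Think About", a numerical experiment might look like the following Python sketch (standard library only). It assumes the samples come from a Cauchy distribution with hypothetical center `MU` and width `SIGMA`, drawn by inverting the CDF. Because the sample mean of Cauchy deviates has no finite variance, the sketch summarizes accuracy by the median absolute error over repeated trials rather than by a standard deviation. The sample sizes and trial count are arbitrary choices.

```python
import math
import random
import statistics

random.seed(2)  # arbitrary seed, only for reproducibility

MU, SIGMA = 0.0, 1.0   # hypothetical center and width of the Cauchy


def cauchy_sample(n):
    """Draw n Cauchy(MU, SIGMA) deviates by inverting the CDF."""
    return [MU + SIGMA * math.tan(math.pi * (random.random() - 0.5))
            for _ in range(n)]


def mean(xs):
    return sum(xs) / len(xs)


def typical_error(estimator, n, trials=400):
    """Median absolute error of `estimator` over repeated samples of size n."""
    errs = [abs(estimator(cauchy_sample(n)) - MU) for _ in range(trials)]
    return statistics.median(errs)


# A 16-fold increase in N would give a 4-fold gain under N^(-1/2) scaling.
sizes = [100, 1600]
results = {}
for n in sizes:
    results[n] = (typical_error(mean, n),
                  typical_error(statistics.median, n))

for n in sizes:
    mean_err, median_err = results[n]
    print(f"N = {n:5d}: mean error ~ {mean_err:.3f}, "
          f"median error ~ {median_err:.4f}")
```

Comparing the two rows shows whether each estimator's typical error shrinks by roughly the factor of 4 that $N^{-1/2}$ scaling predicts for a 16-fold increase in $N$.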

Class Activity

I measured the temperature of my framitron manifold every minute for 1000 minutes, with the same accuracy $\sigma$ for each measurement. The data are plotted below (with data points connected by straight lines) and are in the file Modelselection1.txt.

[Figure: 1st set]

It's a contest! Which group can write down a model $y(t \mid \mathbf{b})$, where $\mathbf{b}$ is a vector of parameters, that gives the best fit to the data in a least-squares sense?

Part 1. Any number of parameters in $\mathbf{b}$ is allowed.

Part 2. At most 20 parameters are allowed.

Part 3. At most 10 parameters are allowed.

Part 4. At most 4 parameters are allowed.

And, oh by the way, we'll actually test your model on a different realization of the same process, possibly one of the ones shown below.

[Figure: 2nd set]

[Figure: 3rd set]

Modelselection2.txt

Modelselection3.txt

Modelselection4.txt

Modelselection5.txt

Modelselection6.txt