# (DT) Segment 31: A Tale of Model Selection


## To Calculate

Question 1: Putting on my filtering glasses, the data seem to follow a continuous profile that varies slowly over a period of 1000 minutes, so I expect measurement points close to each other in time to take similar values. To estimate the measurement error, I therefore group the data into clusters of $\displaystyle n\in \{5, 10, 20, 40, 50\}$ consecutive measurement points and assume that, ideally, all points in a cluster would have yielded a single value (estimated as the mean of the cluster). The measurement error is then estimated as the mean of the standard deviations calculated for each cluster. This yields the following values:

$\displaystyle (n,\sigma) \in \{(5,4.67),(10,4.81),(20,4.92),(40,4.99),(50,5.14)\}\;,$



and the following plot,




I would therefore estimate the measurement error to be $\displaystyle \approx 5$ in the eyeball-norm.
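The clustering estimate described above can be sketched as follows. This is a minimal illustration, not the code used for the answer: the function name `estimate_noise`, the use of the sample standard deviation (divisor $n-1$), the non-overlapping consecutive clusters, and the synthetic stand-in data are all assumptions.

```python
import math
import random
import statistics


def estimate_noise(values, n):
    """Mean of the sample standard deviations of consecutive clusters of n points."""
    # Split into non-overlapping clusters; an incomplete trailing cluster is dropped.
    clusters = [values[i:i + n] for i in range(0, len(values) - n + 1, n)]
    return statistics.mean(statistics.stdev(c) for c in clusters)


# Synthetic stand-in data (NOT the exercise's data set): a slow sine profile
# sampled once per minute for 1000 minutes, plus Gaussian noise with sigma = 5.
random.seed(0)
data = [20.0 * math.sin(2.0 * math.pi * t / 1000.0) + random.gauss(0.0, 5.0)
        for t in range(1000)]

estimates = {n: estimate_noise(data, n) for n in (5, 10, 20, 40, 50)}
for n, sigma_hat in estimates.items():
    print(f"n = {n:2d}: sigma estimate = {sigma_hat:.2f}")
```

Two competing effects are worth keeping in mind when reading such numbers. For small clusters the sample standard deviation tends to underestimate $\displaystyle \sigma$ (for Gaussian noise and $\displaystyle n=5$ the expected value is about $\displaystyle 0.94\,\sigma$), while for large clusters the slow variation of the underlying profile leaks into the estimate and inflates it.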

Questions 2 and 3: My guesses for the models, and the $\displaystyle \chi^2$ values computed from the corresponding fits, are:

• Sum of two sines: 5.5983e+03,
• Sum of two Gaussians: 4.5002e+03,
• Sum of three sines: 5.2946e+03.
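For reference, a $\displaystyle \chi^2$ value of this kind can be computed along the following lines. This is a sketch under assumptions: the parametrization of the two-Gaussian model (amplitude, center, and width per component), the use of a single constant $\displaystyle \sigma$ for all points, and the names `two_gaussians` and `chi_square` are illustrative and not taken from the original answer. The numbers in the small example are made up.

```python
import math


def two_gaussians(t, a1, mu1, s1, a2, mu2, s2):
    """Sum of two Gaussian bumps (assumed parametrization: amplitude, center, width)."""
    return (a1 * math.exp(-0.5 * ((t - mu1) / s1) ** 2)
            + a2 * math.exp(-0.5 * ((t - mu2) / s2) ** 2))


def chi_square(times, values, model, params, sigma):
    """Sum of squared residuals in units of the (constant) measurement error sigma."""
    return sum(((y - model(t, *params)) / sigma) ** 2
               for t, y in zip(times, values))


# Made-up example: every "measurement" lies exactly one sigma above the model,
# so each point contributes 1 to chi-square.
params = (10.0, 200.0, 50.0, 6.0, 700.0, 80.0)
sigma = 5.0
times = [0.0, 250.0, 500.0, 750.0]
values = [two_gaussians(t, *params) + sigma for t in times]

chi2_example = chi_square(times, values, two_gaussians, params, sigma)
print(chi2_example)
```

With $\displaystyle \sigma \approx 5$ from Question 1, a model that describes the data well should give a $\displaystyle \chi^2$ close to the number of data points minus the number of fitted parameters.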

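Question 4: The first two models have the same, and the smallest, number of parameters, so between them the lower $\displaystyle \chi^2$ decides, and the sum of two Gaussians wins the model fitting contest. Going from the first model to the third increases the number of parameters by 3 and decreases $\displaystyle \chi^2$ by $\displaystyle \approx 300\;.$ This far exceeds the penalty for three extra parameters under either criterion ($\displaystyle 2\times 3 = 6$ for AIC, $\displaystyle 3\ln N$ for BIC with $\displaystyle N$ data points), so the increase in the number of parameters is justified irrespective of whether AIC or BIC is used. Even so, the sum of three sines still has a larger $\displaystyle \chi^2$ than the sum of two Gaussians, so the two-Gaussian model remains the overall winner.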
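The comparison can be checked numerically with a short sketch. It uses the $\displaystyle \chi^2$ values listed above together with the standard forms $\displaystyle \mathrm{AIC} = \chi^2 + 2k$ and $\displaystyle \mathrm{BIC} = \chi^2 + k\ln N$ for Gaussian errors with known $\displaystyle \sigma$. The parameter counts (6, 6, 9) are an assumption consistent with the statement that the third model has 3 more parameters than the first, and `N_POINTS = 1000` is a hypothetical number of data points, since the actual value is not given here.

```python
import math

# Chi-square values quoted in the answer to Questions 2 and 3.
chi2 = {"two sines": 5598.3, "two Gaussians": 4500.2, "three sines": 5294.6}

# ASSUMED parameter counts: three parameters per sine or Gaussian component.
n_params = {"two sines": 6, "two Gaussians": 6, "three sines": 9}

# HYPOTHETICAL number of data points; replace with the real size of the data set.
N_POINTS = 1000


def aic(chi2_value, k):
    """Akaike information criterion, up to a model-independent constant."""
    return chi2_value + 2 * k


def bic(chi2_value, k, n_points):
    """Bayesian information criterion, up to a model-independent constant."""
    return chi2_value + k * math.log(n_points)


aic_scores = {m: aic(chi2[m], n_params[m]) for m in chi2}
bic_scores = {m: bic(chi2[m], n_params[m], N_POINTS) for m in chi2}

for m in chi2:
    print(f"{m:14s} AIC = {aic_scores[m]:8.1f}  BIC = {bic_scores[m]:8.1f}")

best_aic = min(aic_scores, key=aic_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)
print("preferred by AIC:", best_aic)
print("preferred by BIC:", best_bic)
```

Because the BIC penalty grows only logarithmically in $\displaystyle N$, the conclusion is insensitive to the assumed `N_POINTS`: the drop of about 300 in $\displaystyle \chi^2$ outweighs $\displaystyle 3\ln N$ for any realistic data set size.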