(DT) Segment 31: A Tale of Model Selection

From Computational Statistics Course Wiki
Revision as of 11:54, 9 April 2014 by Deepesh (talk | contribs)

To Calculate

Question 1: Putting on my filtering glasses, the data seem to follow a continuous profile that varies slowly over a period of 1000 minutes, so I expect measurements taken close together in time to have similar values. To estimate the measurement error, I therefore group the data into clusters of $n \in \{5, 10, 20, 40, 50\}$ measurement points and assume that, ideally, every measurement in a cluster should have yielded the same value (estimated as the cluster mean). The measurement error is then estimated as the mean of the standard deviations computed for each cluster. This yields the following values:

             $(n,\sigma) \in \{(5,4.67),(10,4.81),(20,4.92),(40,4.99),(50,5.14)\}\;,$

and the following plot,

             DTSeg31sigmaest.jpg

I would therefore estimate the measurement error to be $\sigma \approx 5$ in the eyeball norm.
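The clustering estimate described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the numbers quoted: the clusters are taken to be non-overlapping runs of consecutive points, the sample standard deviation is used, and the data are a synthetic stand-in (a slow sinusoidal profile plus Gaussian noise), since the original data set is not reproduced here. The name `estimate_sigma` is hypothetical.

```python
import math
import random
import statistics


def estimate_sigma(values, n):
    """Mean of the per-cluster sample standard deviations.

    Assumption: clusters are non-overlapping runs of n consecutive points;
    a trailing partial cluster is dropped.
    """
    clusters = [values[i:i + n] for i in range(0, len(values) - n + 1, n)]
    return statistics.mean([statistics.stdev(c) for c in clusters])


# Synthetic stand-in for the real data (hypothetical): a slow sinusoidal
# profile over 1000 time steps plus Gaussian noise with true sigma = 5.
random.seed(0)
true_sigma = 5.0
data = [
    50.0 * math.sin(2.0 * math.pi * t / 1000.0) + random.gauss(0.0, true_sigma)
    for t in range(1000)
]

# Repeat the estimate for the same cluster sizes as in the text.
estimates = {n: estimate_sigma(data, n) for n in (5, 10, 20, 40, 50)}
for n, s in estimates.items():
    print(f"n = {n:2d}: sigma estimate = {s:.2f}")
```

In a setting like this, long clusters pick up some of the slow profile as extra spread, while very small clusters tend to underestimate the standard deviation slightly. Either effect might contribute to the mild growth of $\sigma$ with $n$ in the values above.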

Questions 2 and 3: My guesses for the models, and the $\chi^2$ values obtained from the fits, are:

  • Sum of two sines: 5.5983e+3,
  • Sum of two Gaussians: 4.5002e+3,
  • Sum of three sines: 5.2946e+3.
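For reference, the $\chi^2$ statistic behind such numbers can be sketched as follows, assuming a single constant measurement error $\sigma$ for every point (for instance the $\approx 5$ estimated in Question 1). The function name `chi_square` and the toy numbers are illustrative assumptions, not the fitting code or data used here.

```python
def chi_square(observed, predicted, sigma):
    """Sum of squared residuals, each scaled by the measurement error sigma.

    Assumption: the same sigma applies to every data point.
    """
    return sum(((y - f) / sigma) ** 2 for y, f in zip(observed, predicted))


# Toy example (made-up numbers): only the last point deviates from the model.
observed = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
toy_chi2 = chi_square(observed, predicted, sigma=2.0)
print(toy_chi2)
```

Because $\sigma$ enters as $1/\sigma^2$, the absolute $\chi^2$ values depend on the error estimate, whereas the ranking of the models under a common $\sigma$ does not.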

Question 4: The first two models tie for the fewest parameters, and the sum of two Gaussians has the lowest $\chi^2$ of all three models, so it wins the model-fitting contest. Going from the first model to the third increases the number of parameters by 3 and decreases $\chi^2$ by $\approx 300$. AIC charges 2 per additional parameter and BIC charges $\ln N$ (with $N$ the number of data points), so the added penalty of 6 or $3 \ln N$ is far smaller than this decrease for any realistic $N$. The third sine is therefore justified over the two-sine model irrespective of whether AIC or BIC is used.
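The AIC/BIC argument can be checked with a short calculation. The sketch below uses the $\chi^2$ values listed above and assumes the usual forms $\mathrm{AIC} = \chi^2 + 2k$ and $\mathrm{BIC} = \chi^2 + k \ln N$. The number of data points is not stated on this page, so `n_points_example` is a hypothetical placeholder. The function names are likewise illustrative.

```python
import math

# Chi-square values quoted in the list above.
chi2_two_sines = 5.5983e3
chi2_three_sines = 5.2946e3
extra_params = 3  # extra parameters for the third sine, as stated in the text

delta_chi2 = chi2_two_sines - chi2_three_sines  # roughly 303.7


def aic_gain(delta_chi2, extra_params):
    """Decrease in AIC from adding parameters; positive favours the bigger model."""
    return delta_chi2 - 2 * extra_params


def bic_gain(delta_chi2, extra_params, n_points):
    """Decrease in BIC from adding parameters; positive favours the bigger model."""
    return delta_chi2 - extra_params * math.log(n_points)


# BIC favours the bigger model as long as ln(N) stays below this bound.
max_log_n = delta_chi2 / extra_params

# Hypothetical placeholder: the real number of data points is not given here.
n_points_example = 1000

print(f"AIC gain: {aic_gain(delta_chi2, extra_params):.1f}")
print(f"BIC gain (N = {n_points_example}): "
      f"{bic_gain(delta_chi2, extra_params, n_points_example):.1f}")
print(f"BIC still favours three sines while ln(N) < {max_log_n:.1f}")
```

Writing the comparison as a gain makes the conclusion independent of the exact $N$: the bound on $\ln N$ is so large that any realistic data set satisfies it.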