Eleisha's Segment 24: Goodness of Fit

From Computational Statistics Course Wiki

Latest revision as of 11:44, 3 April 2014

To Calculate

1. Let <math>X</math> be an R.V. that is a linear combination, with known, fixed coefficients <math>\alpha_k</math>, of twenty <math>N(0,1)</math> deviates. That is, <math>X = \sum_{k=1}^{20} \alpha_k T_k</math> where <math>T_k \sim N(0,1)</math>. How can you most simply form a t-value-squared (that is, something distributed as <math>\text{Chisquare}(1)</math>) from <math>X</math>? For some particular choice of the <math>\alpha_k</math> (random is OK), generate a sample of <math>x</math>'s, plot their histogram, and show that it agrees with <math>\text{Chisquare}(1)</math>.
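A minimal sketch of the numerical check. Because the <math>T_k</math> are independent unit normals, <math>X \sim N(0, \sum_k \alpha_k^2)</math>, so dividing <math>X</math> by <math>\sqrt{\sum_k \alpha_k^2}</math> gives a unit normal and its square should follow <math>\text{Chisquare}(1)</math>. To stay within the Python standard library, the sketch prints a binned comparison (observed fraction versus <math>\text{Chisquare}(1)</math> probability per bin) in place of a plotted histogram; the seed, the uniform range for the <math>\alpha_k</math>, the sample size, and the bin edges are arbitrary illustrative choices.

```python
import math
import random

def t_squared_samples(alphas, n_samples, rng):
    """Draw X = sum_k alpha_k T_k with T_k ~ N(0,1) and return X^2 / sum_k alpha_k^2 for each draw."""
    var_x = sum(a * a for a in alphas)  # Var(X), since the T_k are independent with unit variance
    out = []
    for _ in range(n_samples):
        x = sum(a * rng.gauss(0.0, 1.0) for a in alphas)
        out.append(x * x / var_x)
    return out

def chi2_1_cdf(y):
    """CDF of Chisquare(1): P(Z^2 <= y) = erf(sqrt(y / 2)); erf(inf) = 1 handles the open last bin."""
    return math.erf(math.sqrt(y / 2.0))

rng = random.Random(24)
alphas = [rng.uniform(-1.0, 1.0) for _ in range(20)]  # one fixed, randomly chosen set of alpha_k
samples = t_squared_samples(alphas, 20000, rng)

# Text "histogram": observed fraction in each bin next to the Chisquare(1) probability of that bin.
edges = [0.0, 0.1, 0.5, 1.0, 2.0, 4.0, math.inf]
bins = []
for lo, hi in zip(edges[:-1], edges[1:]):
    observed = sum(lo <= s < hi for s in samples) / len(samples)
    expected = chi2_1_cdf(hi) - chi2_1_cdf(lo)
    bins.append((observed, expected))
    print(f"[{lo:4.1f}, {hi:4.1f}): observed {observed:.4f}  expected {expected:.4f}")
print(f"sample mean {sum(samples) / len(samples):.3f} (Chisquare(1) has mean 1)")
```

With a plotting library available, the same `samples` list could be histogrammed and overlaid with the <math>\text{Chisquare}(1)</math> density instead.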

2. From some matrix of known coefficients <math>\alpha_{ik}</math>, with <math>k = 1,\ldots,20</math> and <math>i = 1,\ldots,100</math>, generate 100 R.V.s <math>X_i = \sum_{k=1}^{20} \alpha_{ik} T_k</math> where <math>T_k \sim N(0,1)</math>. In other words, you are expanding 20 i.i.d. <math>T_k</math>'s into 100 R.V.s. Form a sum of 100 squared t-values obtained from these variables and demonstrate numerically, by repeated sampling, that it is distributed as <math>\text{Chisquare}(\nu)</math>. What is the value of <math>\nu</math>? Use enough samples that you could distinguish between <math>\nu</math> and <math>\nu-1</math>.
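A sketch of one reading of this exercise, offered as an assumption rather than the intended solution: the "sum of squared t-values" is taken to be the covariance-weighted form <math>X^T C^{+} X</math> with <math>C = \alpha \alpha^T</math>, which accounts for the correlations among the <math>X_i</math> that a plain sum of <math>X_i^2/\mathrm{Var}(X_i)</math> would ignore. Since <math>C</math> is <math>100 \times 100</math> but has rank 20, a pseudo-inverse is needed; for <math>X</math> in the column space of <math>\alpha</math> this form equals <math>|(\alpha^T \alpha)^{-1} \alpha^T X|^2</math>, which is what the code evaluates from the <math>X_i</math> alone. Under this reading one expects <math>\nu = 20</math>, and with 4000 samples the standard error of the mean is about 0.1, small enough to separate <math>\nu</math> from <math>\nu - 1</math>. The seed, the Gaussian choice of <math>\alpha_{ik}</math>, and the helper `invert` are illustrative.

```python
import random
import statistics

def invert(mat):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(mat)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

rng = random.Random(20)
n_t, n_x = 20, 100
alpha = [[rng.gauss(0.0, 1.0) for _ in range(n_t)] for _ in range(n_x)]  # fixed 100 x 20 coefficients

# Precompute M = (alpha^T alpha)^{-1} alpha^T, so that |M X|^2 = X^T C^+ X with C = alpha alpha^T.
ata = [[sum(row[k] * row[l] for row in alpha) for l in range(n_t)] for k in range(n_t)]
ata_inv = invert(ata)
proj_mat = [[sum(ata_inv[k][l] * alpha[i][l] for l in range(n_t)) for i in range(n_x)]
            for k in range(n_t)]

def sample_chi2():
    """One draw of the covariance-weighted sum of squared t-values, computed from the X_i only."""
    t = [rng.gauss(0.0, 1.0) for _ in range(n_t)]
    x = [sum(a * tk for a, tk in zip(row, t)) for row in alpha]  # the 100 correlated X_i
    proj = [sum(mi * xi for mi, xi in zip(row, x)) for row in proj_mat]
    return sum(v * v for v in proj)

n_samples = 4000
values = [sample_chi2() for _ in range(n_samples)]
mean = statistics.fmean(values)
std_err = statistics.stdev(values) / n_samples ** 0.5
print(f"mean {mean:.2f} +/- {std_err:.2f}  (Chisquare(nu) has mean nu)")
print(f"variance {statistics.variance(values):.1f}  (Chisquare(nu) has variance 2 nu)")
```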

3. Reproduce the table of critical <math>\Delta\chi^2</math> values shown in slide 7. Hint: Go back to segment 21 and listen to the exposition of slide 7. (My solution is 3 lines in Mathematica.)
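Slide 7 is not reproduced on this page, so this sketch assumes the table has the usual form: for confidence level <math>p</math> and <math>\nu</math> jointly estimated parameters, the critical <math>\Delta\chi^2</math> is the <math>p</math>-quantile of <math>\text{Chisquare}(\nu)</math>. The confidence levels (68.27%, 90%, 95.45%, 99%, 99.73%, 99.99%) and the range <math>\nu = 1,\ldots,6</math> are assumptions about the table's layout. Without SciPy, the CDF is evaluated as the regularized incomplete gamma function <math>P(\nu/2, x/2)</math> from its power series and inverted by bisection; with SciPy, `scipy.stats.chi2.ppf(p, nu)` computes the same quantile.

```python
import math

def chi2_cdf(x, nu):
    """Chisquare(nu) CDF, i.e. the regularized lower incomplete gamma P(nu/2, x/2), from its power series."""
    if x <= 0.0:
        return 0.0
    a, z = nu / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-17 * total and n < 10000:
        n += 1
        term *= z / (a + n)
        total += term
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total

def chi2_quantile(p, nu):
    """Critical Delta chi^2: the x with chi2_cdf(x, nu) = p, found by bisection on a bracketing interval."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, nu) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

levels = [0.6827, 0.90, 0.9545, 0.99, 0.9973, 0.9999]  # assumed confidence levels (rows)
dofs = range(1, 7)                                      # assumed numbers of parameters (columns)
print("p       " + "".join(f"{'nu=' + str(nu):>8}" for nu in dofs))
for p in levels:
    print(f"{p:<8.4f}" + "".join(f"{chi2_quantile(p, nu):8.2f}" for nu in dofs))
```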


To Think About

1. Design a numerical experiment to exemplify the assertions on slide 8, namely that <math>\chi^2_{\min}</math> varies by <math>\pm\sqrt{2\nu}</math> from data set to data set, but varies only by <math>\pm O(1)</math> as the fitted parameters <math>\mathbf{b}</math> vary within their statistical uncertainty.
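One possible experiment, sketched under assumptions of our own: a straight-line model <math>y = b_0 + b_1 x</math> with known, equal Gaussian errors, where the number of points, <math>\sigma</math>, and the true parameters are arbitrary. Part (a) refits many independent data sets and measures the spread of <math>\chi^2_{\min}</math>, which for a linear model follows <math>\text{Chisquare}(\nu)</math> with <math>\nu = N - 2</math>. Part (b) keeps one data set, draws parameter vectors from the fitted Gaussian uncertainty <math>N(\hat{\mathbf b}, \sigma^2 (A^T A)^{-1})</math>, and records <math>\chi^2(\mathbf b) - \chi^2_{\min}</math>; for a linear model that rise follows <math>\text{Chisquare}(2)</math>, so it stays of order a few.

```python
import math
import random
import statistics

def chi2(xs, ys, sigma, b0, b1):
    """Chi-square of the straight-line model y = b0 + b1 x with equal errors sigma."""
    return sum(((y - b0 - b1 * x) / sigma) ** 2 for x, y in zip(xs, ys))

def fit_line(xs, ys, sigma):
    """Least-squares line via the 2x2 normal equations; returns b0, b1, chi2_min, parameter covariance."""
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    s2 = sigma * sigma
    cov = [[s2 * sxx / det, -s2 * sx / det], [-s2 * sx / det, s2 * n / det]]
    return b0, b1, chi2(xs, ys, sigma, b0, b1), cov

rng = random.Random(8)
n_pts, sigma, true_b0, true_b1 = 100, 0.5, 1.0, 2.0
xs = [i / (n_pts - 1) for i in range(n_pts)]

def fake_data():
    return [true_b0 + true_b1 * x + rng.gauss(0.0, sigma) for x in xs]

# (a) chi^2_min over many independent data sets: spread should be about sqrt(2 nu), nu = n_pts - 2.
chi2_mins = [fit_line(xs, fake_data(), sigma)[2] for _ in range(2000)]

# (b) One data set: move b within its uncertainty, drawing from N(b_hat, cov) via a 2x2 Cholesky factor.
ys = fake_data()
b0, b1, c2min, cov = fit_line(xs, ys, sigma)
l00 = math.sqrt(cov[0][0])
l10 = cov[1][0] / l00
l11 = math.sqrt(cov[1][1] - l10 * l10)
delta_chi2 = []
for _ in range(2000):
    z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    delta_chi2.append(chi2(xs, ys, sigma, b0 + l00 * z0, b1 + l10 * z0 + l11 * z1) - c2min)

nu = n_pts - 2
print(f"sd of chi2_min across data sets: {statistics.stdev(chi2_mins):.1f} "
      f"(sqrt(2 nu) = {math.sqrt(2 * nu):.1f})")
print(f"mean, sd of chi2 - chi2_min within one data set: "
      f"{statistics.fmean(delta_chi2):.2f}, {statistics.stdev(delta_chi2):.2f}")
```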

2. Suppose you want to estimate the central value <math>\mu</math> of a sample of <math>N</math> values drawn from <math>\text{Cauchy}(\mu,\sigma)</math>. If your estimate is the mean of your sample, does the "universal rule of thumb" (slide 2) hold? That is, does the accuracy get better as <math>N^{-1/2}</math>? Why or why not? What if you use the median of your sample as the estimate? Verify your answers by numerical experiments.
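A sketch of the experiment, assuming Cauchy deviates generated by inverting the CDF, <math>\mu + \sigma \tan(\pi(u - 1/2))</math> with <math>u</math> uniform, and measuring the spread of each estimator over repeated samples by its interquartile range, because the sample mean of Cauchy data has no finite variance. The values of <math>\mu</math>, <math>\sigma</math>, the sample sizes, and the number of repetitions are arbitrary. One expects the spread of the mean not to shrink at all (the mean of <math>N</math> independent <math>\text{Cauchy}(\mu,\sigma)</math> deviates is again <math>\text{Cauchy}(\mu,\sigma)</math>), while the spread of the median shrinks roughly as <math>N^{-1/2}</math>, its asymptotic standard deviation being <math>\pi\sigma/(2\sqrt{N})</math>.

```python
import math
import random
import statistics

def cauchy(rng, mu, sigma):
    """One Cauchy(mu, sigma) deviate by inverting its CDF: mu + sigma * tan(pi * (u - 1/2))."""
    return mu + sigma * math.tan(math.pi * (rng.random() - 0.5))

def iqr(values):
    """Interquartile range, used as the spread measure because the Cauchy mean has no finite variance."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

rng = random.Random(2)
mu, sigma, reps = 0.0, 1.0, 400
spreads = {}
for n in (10, 100, 1000):
    means, medians = [], []
    for _ in range(reps):
        xs = [cauchy(rng, mu, sigma) for _ in range(n)]
        means.append(sum(xs) / n)
        medians.append(statistics.median(xs))
    spreads[n] = (iqr(means), iqr(medians))
    print(f"N={n:5d}  IQR of means {spreads[n][0]:.3f}  IQR of medians {spreads[n][1]:.4f}")
```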

Back To: Eleisha Jackson