Segment 7 Sanmit Narvekar

From Computational Statistics Course Wiki

Segment 7

To Calculate

1. Prove the result of slide 3 the "mechanical way" by setting the derivative of something equal to zero, and solving.

Fairly straightforward. We start by taking the derivative of the original equation (on slide 3) with respect to a, and setting it equal to 0:

$$\frac{d}{da} \left( (\langle x^2 \rangle - \langle x \rangle^2) + ( \langle x \rangle - a)^2 \right) = 0$$

The first term in the inner parentheses does not depend on a, so its derivative vanishes and we can drop it. We also expand the second inner term:

$$\frac{d}{da} ( \langle x \rangle^2 - 2a \langle x \rangle + a^2 ) = 0$$

Again, the first term disappears since it doesn't depend on a. We take the derivative of the remaining terms in the usual way:

$$-2 \langle x \rangle + 2a = 0$$

And simplify to obtain the final result:

$$\langle x \rangle = a$$
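As a sanity check on the derivation, here is a minimal numerical sketch. It assumes the quantity on slide 3 is the mean squared deviation $\langle (x-a)^2 \rangle$, and it uses a made-up Gaussian sample in place of a real $p(x)$; the helper names are illustrative only. Note also that the second derivative with respect to a is 2 > 0, so the stationary point found above is a minimum.

```python
import random

random.seed(0)
# Hypothetical sample standing in for draws from p(x).
xs = [random.gauss(3.0, 2.0) for _ in range(2000)]


def average(values):
    """Plain arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def mean_square_deviation(xs, a):
    """Sample estimate of <(x - a)^2>."""
    return average([(x - a) ** 2 for x in xs])


mean_x = average(xs)
variance = average([x * x for x in xs]) - mean_x ** 2  # <x^2> - <x>^2

# Check the decomposition <(x - a)^2> = (<x^2> - <x>^2) + (<x> - a)^2
# at a few arbitrary values of a.
for a in (-1.0, 0.0, mean_x, 5.0):
    lhs = mean_square_deviation(xs, a)
    rhs = variance + (mean_x - a) ** 2
    print(f"a = {a:.3f}: lhs = {lhs:.6f}, rhs = {rhs:.6f}")

# Scan a grid of candidate values of a (step 0.01 on [0, 6]);
# the grid minimizer should be the grid point closest to <x>.
grid = [i / 100 for i in range(0, 601)]
best_a = min(grid, key=lambda a: mean_square_deviation(xs, a))
print("sample mean:", mean_x, "grid minimizer:", best_a)
```

The grid scan is deliberately crude: it only confirms that the minimizer sits next to the sample mean, which is what the calculus argument predicts.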


2. Give an example of a function $p(x)$, with a maximum at $x=0$, whose third moment $M_3$ exists, but whose fourth moment $M_4$ doesn't exist.


3. List some good and bad things about using the median instead of the mean for summarizing a distribution's central value.

Advantages of median:

  • Less sensitive to outliers. For example, if all your data were centered at 0 except for one point at 1 million, the mean would be pulled toward the outlier (by roughly 1,000,000/N for N data points, so the shift depends on how many points sit near 0). The median, however, would still correctly report that most of the data is around 0.
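A tiny sketch of the outlier example, using made-up data (99 zeros and a single point at 1 million) and Python's standard `statistics` module:

```python
import statistics

# Hypothetical data set: 99 points at 0 and one outlier at 1 million.
data = [0.0] * 99 + [1_000_000.0]

mean_value = statistics.mean(data)      # pulled toward the outlier: 1e6 / 100
median_value = statistics.median(data)  # unaffected by the single outlier

print("mean:", mean_value)
print("median:", median_value)
```

With 100 points the mean lands at 10,000, far from any actual data point, while the median stays at 0.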

Disadvantages of median:

  • Consider a distribution with 2 "peaks" (a bimodal distribution). If you wanted a sense of where the data as a whole is centered, the mean would land between the two peaks. The median, however, would fall inside one peak or the other, depending on which has more mass.
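The two-peak case can be illustrated the same way. This is only a sketch with an invented mixture (60% of the samples near -5, 40% near +5); the numbers are assumptions chosen to make the effect visible.

```python
import random
import statistics

random.seed(1)
# Hypothetical bimodal sample: a heavier peak near -5 and a lighter one near +5.
left_peak = [random.gauss(-5.0, 0.5) for _ in range(6000)]
right_peak = [random.gauss(5.0, 0.5) for _ in range(4000)]
data = left_peak + right_peak

mean_value = statistics.mean(data)      # lands between the peaks (roughly -1)
median_value = statistics.median(data)  # lands inside the heavier, left peak

print("mean:", mean_value)
print("median:", median_value)
```

Swapping the sample sizes of the two peaks moves the median to the right peak, while the mean only shifts from about -1 to about +1.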


To Think About

1. This segment assumed that $p(x)$ is a known probability distribution. But what if you know $p(x)$ only experimentally? That is, you can draw random values of x from the distribution. How would you estimate its moments?

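Estimating the mean (first moment) is as simple as averaging the x values you get from sampling from $p(x)$. The higher moments can be estimated the same way, by averaging powers of x: for example, the second moment is estimated by the average of x^2 over the samples. For central moments, subtract the sample mean from each x before taking the power. By the law of large numbers, these sample averages converge to the true moments as the number of samples grows, provided the moments exist.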
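A minimal sketch of this estimator, assuming we can only draw samples. Here the "unknown" distribution is taken to be uniform on [0, 1) purely so the answers can be checked (its k-th moment is 1/(k+1) and its variance is 1/12); the function names are illustrative.

```python
import random


def sample_moment(xs, k):
    """Estimate the k-th moment <x^k> as the average of x^k over the samples."""
    return sum(x ** k for x in xs) / len(xs)


def sample_central_moment(xs, k):
    """Estimate the k-th central moment <(x - <x>)^k> using the sample mean."""
    mean = sample_moment(xs, 1)
    return sum((x - mean) ** k for x in xs) / len(xs)


random.seed(2)
# Stand-in for "drawing random values of x from the distribution".
xs = [random.random() for _ in range(100_000)]

m1 = sample_moment(xs, 1)           # true value 1/2
m2 = sample_moment(xs, 2)           # true value 1/3
var = sample_central_moment(xs, 2)  # true value 1/12

print("first moment:", m1)
print("second moment:", m2)
print("second central moment:", var)
```

The central-moment estimate divides by N rather than N - 1, so it is slightly biased for small samples; for a sample this large the difference is negligible.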


2. High moments (e.g., 4 or higher) are algebraically pretty, but they are rarely useful because they are very hard to measure accurately in experimental data. Why is this true?


3. Even knowing that it is useless, how would you find the formula for $I_8$, the eighth semi-invariant?

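Use the cumulants, as referenced in the slides. The semi-invariants are the cumulants, which are defined through the Taylor expansion of the logarithm of the characteristic function (equivalently, of the moment generating function when it exists). Expanding that logarithm to eighth order and collecting terms gives $I_8$ as a polynomial in the moments up to $M_8$.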
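Rather than expanding the logarithm by hand, one can use the standard recursion between moments and cumulants, $\kappa_n = m_n - \sum_{k=1}^{n-1} \binom{n-1}{k-1} \kappa_k m_{n-k}$. The sketch below assumes the semi-invariants $I_n$ of the slides coincide with the cumulants $\kappa_n$; the function name is hypothetical. Feeding in central moments (so that the first moment is 0) gives the cumulants in terms of central moments.

```python
from math import comb, factorial


def cumulants_from_moments(moments):
    """Convert moments m_1..m_N into cumulants kappa_1..kappa_N.

    moments[n - 1] holds <x^n>. Uses the recursion
    kappa_n = m_n - sum_{k=1}^{n-1} C(n-1, k-1) * kappa_k * m_{n-k}.
    """
    m = [1] + list(moments)  # m[0] = <x^0> = 1
    kappa = [0] * len(m)
    for n in range(1, len(m)):
        correction = sum(
            comb(n - 1, k - 1) * kappa[k] * m[n - k] for k in range(1, n)
        )
        kappa[n] = m[n] - correction
    return kappa[1:]


# Check 1: standard normal, whose moments are 0, 1, 0, 3, 0, 15, 0, 105.
# All cumulants beyond the second should vanish.
normal_cumulants = cumulants_from_moments([0, 1, 0, 3, 0, 15, 0, 105])

# Check 2: exponential distribution with rate 1, whose n-th moment is n!.
# Its n-th cumulant is (n - 1)!, so the eighth should be 7! = 5040.
exp_cumulants = cumulants_from_moments([factorial(n) for n in range(1, 9)])

print("normal:", normal_cumulants)
print("exponential:", exp_cumulants)
```

Keeping the moments symbolic (for example with a computer algebra system) instead of numeric would turn the same recursion into the explicit formula for the eighth semi-invariant.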

Comments