Dan's Segment 7

From Computational Statistics (CSE383M and CS395T)

To Calculate

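1. We have our function <math>F(a) = x^2 - 2ax + a^2 = (x - a)^2</math> and we want to minimize it, so we take its derivative w.r.t. a and set it to zero: <math>\frac{dF}{da} = -2x + 2a = 0</math>, and the solution is clearly a = x. The second derivative is 2 > 0, so this is a minimum. Averaged over the data, x becomes <math>\langle x \rangle</math>, so the minimizing a is the mean.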
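As a quick numerical check of this result, here is a minimal sketch that scans candidate values of a and confirms that the mean squared deviation is smallest at the sample mean. The data set, the grid of candidates, and the helper name `mean_squared_deviation` are made up for illustration only.

```python
from statistics import mean

def mean_squared_deviation(xs, a):
    """Average of (x - a)^2 over the data."""
    return sum((x - a) ** 2 for x in xs) / len(xs)

# Hypothetical data, chosen only for illustration.
data = [2.0, 3.0, 5.0, 10.0]

# Scan candidate values of a on a grid from 0 to 12 in steps of 0.01
# and keep the one with the smallest mean squared deviation.
candidates = [i / 100 for i in range(0, 1201)]
best_a = min(candidates, key=lambda a: mean_squared_deviation(data, a))

print("best a on the grid:", best_a)
print("sample mean:       ", mean(data))
```

A grid search is used here instead of calculus on purpose: it checks the derivative argument without relying on it.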

3. Good: The first big advantage of the median over the mean is that the median always exists, whereas the mean does not exist for some distributions (the Cauchy distribution, for example). The other big advantage is that the median is more resistant to outliers in many situations. For example, if you have a data set that looks like this: (0,1,0,2,3,0,1,1000), your mean is 125.875, way bigger than the likely center of the distribution, while the median is 1 and still sits close to the center.

Bad: The median only uses the ordering of the data, so it works best when the values are spread fairly evenly around the center and you have sufficient data. Without a lot of data, the median can easily land far away from the actual population center. For instance, the data set (1,2,1,300,150) has a median of 2 (its mean is 90.8), which might not accurately reflect the real population.
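The two example data sets can be checked directly. This is a small sketch using Python's standard `statistics` module; the variable names are mine.

```python
from statistics import mean, median

# The two data sets from the Good and Bad examples.
with_outlier = [0, 1, 0, 2, 3, 0, 1, 1000]
small_sample = [1, 2, 1, 300, 150]

for name, xs in [("with_outlier", with_outlier), ("small_sample", small_sample)]:
    print(name, "mean =", mean(xs), "median =", median(xs))
```

The first set has a mean of 125.875 and a median of 1, so the single outlier drags the mean but not the median. The second has a mean of 90.8 and a median of 2, so the median sits at the low cluster and ignores the two large values entirely.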

To Think About

1. One option is to fit a function to the data and then use that function to calculate the moments. I won't claim to have thought of this myself, but Wikipedia tells me that the formula <math>\frac{1}{n}\sum_{i = 1}^{n} X_i^k</math> will estimate the k-th raw moment with decent accuracy for large samples. I'm kind of surprised this works, but unfortunately Wikipedia lists no source or derivation.
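The sample-moment formula is short enough to write out. The sketch below is one straightforward way to code it, with a made-up data set and a hypothetical function name `sample_raw_moment`. It most likely works because of the law of large numbers: the average of the values X_i^k settles toward the expected value of X^k, provided that expectation exists.

```python
def sample_raw_moment(xs, k):
    """Estimate the k-th raw moment as (1/n) * sum(x_i ** k)."""
    if not xs:
        raise ValueError("need at least one data point")
    return sum(x ** k for x in xs) / len(xs)

# Hypothetical data, chosen only for illustration.
data = [1.0, 2.0, 3.0, 4.0]

first = sample_raw_moment(data, 1)   # sample mean
second = sample_raw_moment(data, 2)  # mean of the squares
variance = second - first ** 2       # divide-by-n variance from the raw moments

print(first, second, variance)
```

Central moments such as the variance can be built from the raw ones, as the last line does for k = 2.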

2. My guess is that this has to do with error propagation. As you add higher powers into the calculation of a moment, error is compounded, which makes the resulting numbers less reliable. I also can't really think of many common situations where you would even care about the higher moments. I'm sure there are specific areas where they are important, but not enough to get people to do anything about it.
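One way to probe this guess is a small simulation. It draws repeated samples, computes the k-th sample moment for each, and measures how much the estimates scatter. Everything here is an assumption made for illustration: the standard normal distribution, the sample size, the number of trials, the seed, and the function names.

```python
import random
from statistics import stdev

def sample_raw_moment(xs, k):
    """Estimate the k-th raw moment as (1/n) * sum(x_i ** k)."""
    return sum(x ** k for x in xs) / len(xs)

def moment_spread(k, n=100, trials=200, seed=0):
    """Standard deviation of the k-th sample moment across repeated samples."""
    rng = random.Random(seed)  # same seed for every k, so every k sees the same samples
    estimates = []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimates.append(sample_raw_moment(xs, k))
    return stdev(estimates)

spreads = {k: moment_spread(k) for k in (2, 4, 6)}
for k, s in spreads.items():
    print(f"k={k}: spread of estimate = {s:.3f}")
```

Under these assumptions the scatter grows quickly with k, because raising to a high power lets a few large values dominate the sum. That is consistent with the guess that higher moments are less reliable.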

Class Activity

This was completed with Sean Trettel, Noah, and Kai. The full solution can be found on Kai's page.