# Segment 30 Sanmit Narvekar

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

#### To Calculate

1. For a set of positive values $\displaystyle \{x_i\}$ , use Jensen's inequality to show (a) the mean of their square is never less than the square of their mean, and (b) their (arithmetic) mean is never less than their harmonic mean.

Part A

This is basically applying Jensen's inequality to the function $\displaystyle f(x) = x^2$ , which leads the statement to hold since x^2 is a convex function. Specifically, consider the graph of x^2. The mean of the squares would be like a value on a segment connecting 2 points on the parabola, and the square of the mean would be the point on the parabola in the middle of those points. Since the segment will always be higher than the point on the parabola, the statement holds.

More formally:

$\displaystyle \left( \frac{1}{N} \sum x_i \right)^2 = f \left( \frac{1}{N} \sum x_i \right) \le \frac{1}{N} \sum f(x_i) = \frac{1}{N} \sum x_i^2$

Which is the proof as desired.

Part B

The harmonic mean is:

$\displaystyle H = \frac{N}{\sum\frac{1}{x_i}}$

We define our function to be $\displaystyle f(x) = \frac{1}{x}$ . Then,

$\displaystyle H = f \left(\frac{1}{N} \sum \frac{1}{x_i} \right)$

By Jensen's inequality:

$\displaystyle H \le \frac{1}{N} \sum f \left( \frac{1}{x_i} \right) = \frac{1}{N} \sum x_i$

Which is the arithmetic mean. Thus the arithmetic mean is always larger than the harmonic mean.

2. Sharpen the argument about termination of E-M methods that was given in slide 4, as follows: Suppose that $\displaystyle g(x) \ge f(x)$ for all $\displaystyle x$ , for some two functions $\displaystyle f$ and $\displaystyle g$ . Prove that, at any local maximum $\displaystyle x_m$ of $\displaystyle f$ , one of these two conditions must hold: (1) $\displaystyle g(x_m) > f(x_m)$ [in which case the E-M algorithm has not yet terminated], or (2) $\displaystyle g(x_m)$ is a local maximum of $\displaystyle g$ [in which case the E-M algorithm terminates at a maximum of $\displaystyle g$ , as advertised]. You can make any reasonable assumption about continuity of the functions.

By our assumption, $\displaystyle g(x) \ge f(x)$ for all $\displaystyle x$ . Consider a local maximum of f at x_m. First consider the case where $\displaystyle g(x_m) = f(x_m)$ . If we assume that both functions are continuous, then their derivatives at x_m must be 0, since f(x_m) is a local maxima, and if g's derivative at that point were not zero, then f(x_m) would either not be continuous, or it wouldn't equal g(x_m).

The other case is that $\displaystyle g(x_m) > f(x_m)$ , since g has to be greater at all points. In this case, EM has not terminated, since it waits until f(x_m) = g(x_m).

2. So slide 4 proves that some function is less than the actual function of interest, namely $\displaystyle L(\theta)$ . What makes this such a powerful idea?