# Segment 20: Nonlinear Least Squares Fitting - 3/24/2012


### Problem 1

(See lecture slide 3.) For one-dimensional $x$, the model $y(x | \mathbf b)$ is called "linear" if $y(x | \mathbf b) = \sum_k b_k X_k(x)$, where $X_k(x)$ are arbitrary known functions of $x$. Show that minimizing $\chi^2$ produces a set of linear equations (called the "normal equations") for the parameters $b_k$.

We have our chi-square: $\chi^2 = \sum_i^{N}\left(\frac{y_i - \sum_k^M b_k X_k(x_i)}{\sigma_i}\right)^2$
To minimize it, we take the derivative with respect to each $b_j$ and set it equal to 0: $0 = -2\sum_i^{N}\frac{y_i - \sum_k^M b_k X_k(x_i)}{\sigma_i^2}X_j(x_i)$, for $j = 1,\dots,M$.
Defining the design matrix $A_{ij} = X_j(x_i)/\sigma_i$ and $c_i = y_i/\sigma_i$, these $M$ conditions can be written as $A^TA\,\mathbf b = A^T\mathbf c$. Since the unknowns $b_k$ enter only linearly, this is a set of $M$ linear equations for the $b_k$ — the normal equations.
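As a sanity check, here is a minimal sketch in Python/NumPy that builds a design matrix $A_{ik} = X_k(x_i)/\sigma_i$ and solves the normal equations directly. The quadratic basis $X_k(x) = x^k$, the data, and the $\sigma_i$'s are all made up for illustration:

```python
import numpy as np

# Hypothetical example: basis X_k(x) = x^k for k = 0, 1, 2 (a quadratic model).
# The data and sigmas below are synthetic, chosen only to illustrate the algebra.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
sigma = np.full_like(x, 0.01)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0.0, sigma)

# Design matrix A_ik = X_k(x_i) / sigma_i and rescaled data c_i = y_i / sigma_i
A = np.vstack([x**0, x, x**2]).T / sigma[:, None]
c = y / sigma

# Normal equations: (A^T A) b = A^T c -- linear in the b_k
b = np.linalg.solve(A.T @ A, A.T @ c)
print(b)  # estimates of (b_0, b_1, b_2), near the true (1, 2, -0.5)
```

Solving the normal equations explicitly mirrors the derivation; in practice a least-squares routine (e.g. `np.linalg.lstsq`) on $A$, $\mathbf c$ gives the same answer with better numerical conditioning.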

### Problem 2

A simple example of a linear model is $y(x | \mathbf b) = b_0 + b_1 x$, which corresponds to fitting a straight line to data. What are the MLE estimates of $b_0$ and $b_1$ in terms of the data: $x_i$'s, $y_i$'s, and $\sigma_i$'s?

So we use this chi-square function to get the MLE estimates of our $\mathbf b$ vector: $\chi^2 = \sum_i^{N}\left(\frac{y_i - b_0 - b_1x_i}{\sigma_i}\right)^2$
To minimize this, we take the partial derivatives with respect to $b_0$ and $b_1$ and set them equal to zero:
$\frac{\partial \chi^2}{\partial b_0} = 2\sum_i^{N}\frac{y_i - b_0 - b_1x_i}{\sigma_i}\cdot\frac{-1}{\sigma_i} = 0$
$\frac{\partial \chi^2}{\partial b_1} = 2\sum_i^{N}\frac{y_i - b_0 - b_1x_i}{\sigma_i}\cdot\frac{-x_i}{\sigma_i} = 0$

To make our lives simpler, I will divide each equation by $-2$, split up the summations, and name the pieces to make them look a little cleaner:

$\sum_i^{N}\frac{y_i}{\sigma_i^2} - b_0\sum_i^{N}\frac{1}{\sigma_i^2} - b_1\sum_i^{N}\frac{x_i}{\sigma_i^2} = 0$
$\sum_i^{N}\frac{x_iy_i}{\sigma_i^2} - b_0\sum_i^{N}\frac{x_i}{\sigma_i^2} - b_1\sum_i^{N}\frac{x_i^2}{\sigma_i^2} = 0$
Let $S = \sum_i^{N}\frac{1}{\sigma_i^2}$, $S_x = \sum_i^{N}\frac{x_i}{\sigma_i^2}$, $S_y = \sum_i^{N}\frac{y_i}{\sigma_i^2}$, $S_{xy} = \sum_i^{N}\frac{x_iy_i}{\sigma_i^2}$, $S_{xx} = \sum_i^{N}\frac{x_i^2}{\sigma_i^2}$

$S_y = b_0 S + b_1 S_x$
$S_{xy} = b_0 S_x + b_1 S_{xx}$

To express $b_0$ and $b_1$ in terms of the $x$'s, $y$'s, and $\sigma$'s, we simply solve one equation for one unknown and substitute it into the other:
$b_0 = \frac{S_{xx}S_y - S_xS_{xy}}{SS_{xx}-(S_x)^2}$

$b_1 = \frac{S_{xy}S - S_xS_y}{SS_{xx}-(S_x)^2}$
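These closed-form expressions can be checked numerically. Here is a minimal sketch in Python/NumPy (the data points and $\sigma_i$'s are invented for illustration) that computes the five sums and plugs them into the formulas above:

```python
import numpy as np

# Hypothetical data scattered around the line y = 3 + 2x; values are illustrative.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])
sigma = np.array([0.1, 0.1, 0.2, 0.1, 0.2])

# The five weighted sums from the derivation
w = 1.0 / sigma**2
S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
Sxy, Sxx = (w * x * y).sum(), (w * x**2).sum()

# Closed-form MLE estimates for the straight-line fit
denom = S * Sxx - Sx**2
b0 = (Sxx * Sy - Sx * Sxy) / denom
b1 = (S * Sxy - Sx * Sy) / denom
print(b0, b1)  # close to the true intercept 3 and slope 2
```

The same weighted fit can be reproduced with `np.polyfit(x, y, 1, w=1/sigma)`, which minimizes the identical $\chi^2$.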

### Problem 3

We often rather casually assume a uniform prior $P(\mathbf b)= \text{constant}$ on the parameters $\mathbf b$. If the prior is not uniform, then is minimizing $\chi^2$ the right thing to do? If not, then what should you do instead? Can you think of a situation where the difference would be important?
No, I don't think it is the right thing to do: minimizing $\chi^2$ maximizes only the likelihood $\exp(-\frac{1}{2}\chi^2)$, which is the full posterior only when the prior is uniform. With a non-uniform prior we should instead maximize $\exp(-\frac{1}{2}\chi^2)P(\mathbf b)$, where $P(\mathbf b)$ is the prior distribution that our $\mathbf b$ parameters were drawn from — equivalently, minimize $\chi^2 - 2\ln P(\mathbf b)$. The difference matters when the data are few or noisy and the prior is informative, e.g. when a parameter is known in advance to lie near some value or to be positive.
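To see the difference concretely: with a Gaussian prior, minimizing $\chi^2 - 2\ln P(\mathbf b)$ just adds a quadratic penalty to the chi-square. A minimal sketch in Python/SciPy, where the straight-line data, prior mean $\mu$, and prior width $\tau$ are all assumed for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical straight-line data plus a made-up Gaussian prior b ~ N(mu, tau^2)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.2, 2.8, 5.1, 7.2])
sigma = np.full_like(x, 0.5)
mu, tau = np.array([0.0, 0.0]), 1.0  # assumed prior mean and width

def chi2(b):
    return np.sum(((y - b[0] - b[1] * x) / sigma) ** 2)

def neg_log_posterior(b):
    # minimize chi^2 - 2 ln P(b); a Gaussian prior contributes ((b - mu)/tau)^2
    return chi2(b) + np.sum(((b - mu) / tau) ** 2)

b_mle = minimize(chi2, x0=[0.0, 1.0]).x
b_map = minimize(neg_log_posterior, x0=[0.0, 1.0]).x
print(b_mle, b_map)  # the prior pulls the MAP estimate toward mu
```

With plenty of precise data the two estimates nearly coincide; the prior only matters when the likelihood is broad compared to $P(\mathbf b)$.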
### Problem 4

What if, in lecture slide 2, the measurement errors were $e_i \sim \text{Cauchy}(0,\sigma_i)$ instead of $e_i \sim N(0,\sigma_i)$? How would you find MLE estimates for the parameters $\mathbf b$?
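One standard approach (sketched here, not taken from the lecture): the Cauchy likelihood is $\prod_i \left[\pi\sigma_i\left(1 + r_i^2\right)\right]^{-1}$ with $r_i = (y_i - y(x_i|\mathbf b))/\sigma_i$, so instead of $\chi^2$ we minimize $\sum_i \ln(1 + r_i^2)$. This is no longer linear in $\mathbf b$, so it must be minimized numerically. A minimal Python/SciPy sketch for a straight-line model, with made-up data including one deliberate outlier:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical straight-line data near y = 1 + 2x, with one gross outlier at x = 4
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 3.1, 4.9, 7.2, 30.0, 11.0])
sigma = np.full_like(x, 0.5)

def cauchy_nll(b):
    # Negative log-likelihood for e_i ~ Cauchy(0, sigma_i), dropping constants:
    # sum_i ln(1 + ((y_i - b0 - b1 x_i)/sigma_i)^2)
    r = (y - b[0] - b[1] * x) / sigma
    return np.sum(np.log1p(r**2))

b_cauchy = minimize(cauchy_nll, x0=[0.0, 2.0], method="Nelder-Mead").x
print(b_cauchy)  # near intercept 1 and slope 2 despite the outlier
```

Because $\ln(1 + r^2)$ grows only logarithmically in the residual, the Cauchy MLE is robust: the outlier barely moves the fit, whereas minimizing $\chi^2$ would let it drag the line upward.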