# Segment 20: Nonlinear Least Squares Fitting - 3/24/2012

### Problem 1

**(See lecture slide 3.) For one-dimensional <math>x</math>, the model <math>y(x | \mathbf b)</math> is called "linear" if <math>y(x | \mathbf b) = \sum_k b_k X_k(x)</math>, where <math>X_k(x)</math> are arbitrary known functions of <math>x</math>. Show that minimizing <math>\chi^2</math> produces a set of linear equations (called the "normal equations") for the parameters <math>b_k</math>.**

We have our chi-square statistic <math>\chi^2 = \sum_i^{N}\left(\frac{y_i - \sum_k^M b_k X_k(x_i)}{\sigma_i}\right)^2</math>

To minimize our chi-square, we take its derivative with respect to each <math>b_j</math> and set it equal to 0: <math>0 = -2\sum_i^{N}\frac{y_i - \sum_k^M b_k X_k(x_i)}{\sigma_i^2}X_j(x_i)</math>

Since the residual is linear in the <math>b_k</math>'s, each of these <math>M</math> equations is linear in the parameters. Defining the design matrix <math>A_{ik} = X_k(x_i)/\sigma_i</math> and the scaled data <math>\hat y_i = y_i/\sigma_i</math>, the system can be written <math>A^T A\, \mathbf b = A^T \hat{\mathbf y}</math>, which is exactly the set of normal equations.
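A minimal sketch of solving the normal equations numerically. The basis functions <math>X_k(x) = x^k</math>, the true parameters, and the noise level here are all made-up example values, not from the problem:

```python
import numpy as np

# Hypothetical example: fit a quadratic with basis functions X_k(x) = x**k.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
sigma = np.full_like(x, 0.1)             # known measurement errors
y = 1.0 + 2.0 * x - 3.0 * x**2           # assumed "true" model (b = 1, 2, -3)
y = y + sigma * rng.standard_normal(x.size)

# Design matrix A_ik = X_k(x_i)/sigma_i and scaled data yhat_i = y_i/sigma_i
A = np.column_stack([x**k for k in range(3)]) / sigma[:, None]
yhat = y / sigma

# Normal equations: (A^T A) b = A^T yhat
b = np.linalg.solve(A.T @ A, A.T @ yhat)
print(b)   # should be near (1, 2, -3)
```

In practice a QR- or SVD-based solver such as `np.linalg.lstsq` is preferred over forming <math>A^T A</math> explicitly, since the normal equations square the condition number.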

### Problem 2

**A simple example of a linear model is <math>y(x | \mathbf b) = b_0 + b_1 x</math>, which corresponds to fitting a straight line to data. What are the MLE estimates of <math>b_0</math> and <math>b_1</math> in terms of the data: <math>x_i</math>'s, <math>y_i</math>'s, and <math>\sigma_i</math>'s?**

So we use this chi-square function to get the MLE estimates of our <math>\mathbf b</math> vector: <math>\chi^2 = \sum_i^{N}\left(\frac{y_i - b_0 - b_1x_i}{\sigma_i}\right)^2</math>

To minimize this, we take the partial derivatives with respect to <math>b_0</math> and <math>b_1</math> and set them equal to zero:

<math>\frac{\partial \chi^2}{\partial b_0} = 2\sum_i^{N}\frac{y_i - b_0 - b_1x_i}{\sigma_i}\cdot\frac{-1}{\sigma_i} = 0</math>

<math>\frac{\partial \chi^2}{\partial b_1} = 2\sum_i^{N}\frac{y_i - b_0 - b_1x_i}{\sigma_i}\cdot\frac{-x_i}{\sigma_i} = 0</math>

To make our lives simpler, I will divide each equation through by <math>-2</math>, split up the summations, and give them shorter names to make things cleaner:

<math>\sum_i^{N}\frac{y_i}{\sigma_i^2} - b_0\sum_i^{N}\frac{1}{\sigma_i^2} - b_1\sum_i^{N}\frac{x_i}{\sigma_i^2} = 0</math>

<math>\sum_i^{N}\frac{x_iy_i}{\sigma_i^2} - b_0\sum_i^{N}\frac{x_i}{\sigma_i^2} - b_1\sum_i^{N}\frac{x_i^2}{\sigma_i^2} = 0</math>

<math>\text{Let } S = \sum_i^{N}\frac{1}{\sigma_i^2},\quad S_x = \sum_i^{N}\frac{x_i}{\sigma_i^2},\quad S_y = \sum_i^{N}\frac{y_i}{\sigma_i^2},\quad S_{xy} = \sum_i^{N}\frac{x_iy_i}{\sigma_i^2},\quad S_{xx} = \sum_i^{N}\frac{x_i^2}{\sigma_i^2}</math>

<math>S_y = b_0 S + b_1 S_x</math>

<math>S_{xy} = b_0 S_x + b_1 S_{xx}</math>

To express <math>b_0</math> and <math>b_1</math> in terms of the <math>x_i</math>'s, <math>y_i</math>'s, and <math>\sigma_i</math>'s, we simply solve this 2×2 linear system: solve one equation for one unknown and substitute into the other.

<math> b_0 = \frac{S_{xx}S_y - S_xS_{xy}}{SS_{xx}-(S_x)^2} </math>

<math> b_1 = \frac{S_{xy}S - S_xS_y}{SS_{xx}-(S_x)^2} </math>
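The closed-form solution above translates directly into code. The function name `fit_line` and the test data are illustrative choices; noise-free data on an exact line should be recovered exactly regardless of the per-point errors:

```python
import numpy as np

def fit_line(x, y, sigma):
    """Weighted straight-line fit y = b0 + b1*x using the
    S, Sx, Sy, Sxy, Sxx sums derived above."""
    w = 1.0 / sigma**2
    S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
    Sxy, Sxx = (w * x * y).sum(), (w * x**2).sum()
    denom = S * Sxx - Sx**2
    b0 = (Sxx * Sy - Sx * Sxy) / denom
    b1 = (S * Sxy - Sx * Sy) / denom
    return b0, b1

# Sanity check: noise-free data on y = 2 + 3x is recovered exactly.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
sigma = np.array([1.0, 0.5, 0.5, 1.0, 2.0])   # arbitrary per-point errors
y = 2.0 + 3.0 * x
b0, b1 = fit_line(x, y, sigma)
print(b0, b1)   # 2.0, 3.0 (up to roundoff)
```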

### To Think About 1

**We often rather casually assume a uniform prior <math>P(\mathbf b)= \text{constant}</math> on the parameters <math>\mathbf b</math>. If the prior is not uniform, then is minimizing <math>\chi^2</math> the right thing to do? If not, then what should you do instead? Can you think of a situation where the difference would be important?**

No, I don't think it is the right thing to do, because when the prior is not uniform it changes the posterior distribution. Instead of maximizing <math>\exp(-\frac{1}{2}\chi^2)</math> alone, we should maximize the posterior <math>\exp(-\frac{1}{2}\chi^2)P(\mathbf b)</math>, where <math>P(\mathbf b)</math> is the prior distribution that our <math>\mathbf b</math> parameters were drawn from; equivalently, minimize <math>\frac{1}{2}\chi^2 - \ln P(\mathbf b)</math>. The difference matters when the data are few or noisy relative to the strength of the prior, e.g., when we know in advance that a parameter must lie near some value.
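As a sketch of the difference, assume a hypothetical Gaussian prior <math>\mathbf b \sim N(0, \tau^2 I)</math> (my choice of prior, not from the problem). Then minimizing <math>\frac{1}{2}\chi^2 - \ln P(\mathbf b)</math> just adds <math>|\mathbf b|^2/(2\tau^2)</math> to the objective, giving modified normal equations <math>(A^T A + I/\tau^2)\,\mathbf b = A^T \hat{\mathbf y}</math> (ridge regression):

```python
import numpy as np

# Straight-line data with assumed true parameters (1, 2).
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)
sigma = np.full_like(x, 0.5)
y = 1.0 + 2.0 * x + sigma * rng.standard_normal(x.size)

A = np.column_stack([np.ones_like(x), x]) / sigma[:, None]
yhat = y / sigma

def map_fit(tau):
    """MAP estimate under the prior b ~ N(0, tau^2 I):
    solve (A^T A + I/tau^2) b = A^T yhat."""
    M = A.shape[1]
    return np.linalg.solve(A.T @ A + np.eye(M) / tau**2, A.T @ yhat)

b_weak = map_fit(tau=1e6)    # very broad prior: essentially the chi^2 minimum
b_strong = map_fit(tau=0.1)  # tight prior around 0: estimates shrink toward 0
print(b_weak, b_strong)
```

With a broad prior the MAP estimate is indistinguishable from the plain <math>\chi^2</math> minimum; with a tight prior the estimates are pulled noticeably toward the prior mean.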

### To Think About 2

**What if, in lecture slide 2, the measurement errors were <math>e_i \sim \text{Cauchy}(0,\sigma_i)</math> instead of <math>e_i \sim N(0,\sigma_i)</math>? How would you find MLE estimates for the parameters <math>\mathbf b</math>?**

We would work it the same way, but the negative log-likelihood would no longer be <math>\frac{1}{2}\chi^2</math>. For Cauchy errors, each point's likelihood is <math>\propto \left[1 + \left(\frac{y_i - y(x_i|\mathbf b)}{\sigma_i}\right)^2\right]^{-1}</math>, so the MLE minimizes <math>\sum_i^{N} \ln\left[1 + \left(\frac{y_i - y(x_i|\mathbf b)}{\sigma_i}\right)^2\right]</math> instead of <math>\chi^2</math>. These equations are not linear in <math>\mathbf b</math> even for a linear model, so we would have to minimize numerically.
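A sketch of that numerical minimization for the straight-line model, using a general-purpose optimizer (the data, the single injected outlier, and the Nelder-Mead choice are all my illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

# Line y = 1 + 2x with one gross outlier; Cauchy errors downweight it.
x = np.arange(10.0)
sigma = np.ones_like(x)
y = 1.0 + 2.0 * x
y[5] += 50.0                 # single outlier

def cauchy_nll(b):
    """Negative log-likelihood (up to constants) for Cauchy(0, sigma_i) errors:
    sum_i ln(1 + ((y_i - b0 - b1*x_i)/sigma_i)^2)."""
    r = (y - b[0] - b[1] * x) / sigma
    return np.sum(np.log1p(r**2))

# Start from the ordinary least-squares fit and polish numerically.
b1_ls, b0_ls = np.polyfit(x, y, 1)
res = minimize(cauchy_nll, x0=[b0_ls, b1_ls], method="Nelder-Mead")
b0, b1 = res.x
print(b0, b1)
```

The least-squares starting point is noticeably dragged by the outlier, while the Cauchy MLE lands back near the true line, which is exactly the heavy-tail robustness this problem is hinting at.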