We have seen the approximation of functions by interpolation. In interpolation, we assumed that our data was accurate. For example, in some interpolation exercises, we obtained the data from the expression of the function that we wanted to approximate.
Data fitting is another function approximation technique, one where we generally assume that our data contains errors. This is the case, for example, with data obtained from measurements in experiments. Sometimes we know the underlying function because there is a proven physical law, and in that case we will try to use that function to approximate the data. In other cases, we do not know the function, and we will try to approximate it with generic functions, such as polynomials or trigonometric functions.
The simplest example of a fit, or regression, is the least-squares regression line. The line, a polynomial of degree one, is the function we fit; least squares refers to the error function we use to evaluate the quality of the fit, and it will be the only error function we use in this course. The advantage of using the least-squares error with linear fitting functions, which will also be the only ones we will see in this course, is that the solution of the problem is given by a system of linear equations.
Given the data
$$\begin{array}{|c|ccccc|} \hline x & 0 & 1 & 2 & 3 & 4 \\ \hline y & 2 & 5 & 8 & 13 & 18 \\ \hline \end{array} $$The least-squares regression or data fitting problem can be posed in different ways:
It can be stated as a minimization problem (find the coefficients that minimize the sum of the squared residuals) or as an orthogonal projection onto a subspace of functions. In this course, the adequate approach is the first.
The line is a polynomial of degree one:
$$ P_{1}(x)=a_{0}+a_{1}x $$Our unknowns are $a_0$ and $a_1.$ We want to calculate the line that minimizes the sum of the squared residuals (errors).
(Error and residual are similar ideas, but they are used in different contexts. We are now going to build the model, the regression line, and the difference between a data value and the corresponding value of the model is called a residual. When we use this model to predict the value of the function, we will talk about errors. In both cases it is $y_i-P_1(x_i)$: if $x_i$ is one of the points we built the model with, this difference is a residual; if $x_i$ is a new point, it is an error.)
with
$$ r_{k}=P_{1}(x_{k})-y_{k}=a_{0}+a_{1}x_{k}-y_{k},\qquad E(a_{0},a_{1})=\sum_{k=1}^{5}r_{k}^{2}. $$To find the minimum error we calculate the partial derivatives with respect to the two variables and set them equal to zero. Then, substituting the data,
$$ \begin{eqnarray*} \dfrac{\partial E}{\partial a_{0}} & = & 2(a_{0}-2)+2(a_{0}+a_{1}-5)+2(a_{0}+2a_{1}-8)+2(a_{0}+3a_{1}-13)+2(a_{0}+4a_{1}-18)=0\\ \dfrac{\partial E}{\partial a_{1}} & = & 2(a_{0}+a_{1}-5)+4(a_{0}+2a_{1}-8)+6(a_{0}+3a_{1}-13)+8(a_{0}+4a_{1}-18)=0 \end{eqnarray*} $$that is
$$ \begin{array}{ccccc} 5a_{0} & + & 10a_{1} & = & 46\\ 10a_{0} & + & 30a_{1} & = & 132 \end{array} $$We solve the system by Gauss: the second equation $e_{2}\rightarrow e_{2}-2e_{1}$
$$ \begin{array}{ccccc} 5a_{0} & + & 10a_{1} & = & 46\\ & & 10a_{1} & = & 40 \end{array} $$and using backward substitution
$$ a_{1}=\dfrac{40}{10}=4,\qquad a_{0}=\dfrac{46-10\,a_{1}}{5}=\dfrac{6}{5}=1.2 $$And the least-squares regression line is
$$ P_{1}(x)=1.2+4x $$
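As a quick numerical check of this result, here is a minimal Python sketch (assuming NumPy is available; the variable names are illustrative) that assembles and solves the same $2\times 2$ system and compares it with NumPy's built-in polynomial fit.

```python
import numpy as np

# Data from the table
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 5.0, 8.0, 13.0, 18.0])

# Normal equations for P1(x) = a0 + a1*x:
#   [ n        sum(x)   ] [a0]   [ sum(y)   ]
#   [ sum(x)   sum(x^2) ] [a1] = [ sum(x*y) ]
A = np.array([[len(x), x.sum()],
              [x.sum(), (x**2).sum()]])
b = np.array([y.sum(), (x * y).sum()])

a0, a1 = np.linalg.solve(A, b)
print(a0, a1)               # expected: 1.2 and 4.0

# Cross-check with NumPy's least-squares polynomial fit
# (np.polyfit returns coefficients from highest to lowest degree)
print(np.polyfit(x, y, 1))  # expected: [4.0, 1.2]
```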
We can obtain the same result keeping the sums in general form, which is valid for any data set. The line is
$$ P_{1}(x)=a_{0}+a_{1}x $$We want to calculate the line that minimizes the sum of the squared errors:
$$ E(a_{0},a_{1})=\sum_{k=1}^{5}(P_{1}(x_{k})-y_{k})^{2}=\sum_{k=1}^{5}(a_{0}+a_{1}x_{k}-y_{k})^{2}. $$To find the minimum error we calculate the partial derivatives with respect to the two variables and set them equal to zero:
$$ \begin{eqnarray*} \dfrac{\partial E}{\partial a_{0}} & = & \sum_{k=1}^{5}2(a_{0}+a_{1}x_{k}-y_{k})=0\\ \dfrac{\partial E}{\partial a_{1}} & = & \sum_{k=1}^{5}2(a_{0}+a_{1}x_{k}-y_{k})x_{k}=\sum_{k=1}^{5}2(a_{0}x_{k}+a_{1}x_{k}^{2}-x_{k}y_{k})=0 \end{eqnarray*} $$That is
$$ \begin{array}{ccccc} a_{0}\sum_{k=1}^{5}1 & + & a_{1}\sum_{k=1}^{5}x_{k} & = & \sum_{k=1}^{5}y_{k}\\ a_{0}\sum_{k=1}^{5}x_{k} & + & a_{1}\sum_{k=1}^{5}x_{k}^{2} & = & \sum_{k=1}^{5}x_{k}y_{k} \end{array} $$Expressed in matrix form, this is
$$ \left(\begin{array}{cc} \sum_{k=1}^{5}1 & \sum_{k=1}^{5}x_{k}\\ \sum_{k=1}^{5}x_{k} & \sum_{k=1}^{5}x_{k}^{2} \end{array}\right)\left(\begin{array}{c} a_{0}\\ a_{1} \end{array}\right)=\left(\begin{array}{c} \sum_{k=1}^{5}y_{k}\\ \sum_{k=1}^{5}x_{k}y_{k} \end{array}\right) $$We compute the elements of these matrices
$$ \begin{array}{c|c|c|c|c|c|} \hline & 1 & x_{k} & x_{k}^{2} & y_k & x_{k}\,y_k\\ \hline & 1 & 0 & 0 & 2 & 0\\ & 1 & 1 & 1 & 5 & 5\\ & 1 & 2 & 4 & 8 & 16\\ & 1 & 3 & 9 & 13 & 39\\ & 1 & 4 & 16 & 18 & 72\\ \hline \sum & 5 & 10 & 30 & 46 & 132\\ \hline \end{array} $$And we substitute them in the system
$$ \begin{array}{ccccc} 5a_{0} & + & 10a_{1} & = & 46\\ 10a_{0} & + & 30a_{1} & = & 132 \end{array} $$And the least-squares regression line is
$$ P_{1}(x)=1.2+4x $$
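The matrix of sums above is exactly $V^{T}V$, where $V$ is the matrix whose columns are the values of $1$ and of $x_k$ at the data points, so the whole system can be assembled with a couple of matrix products. A minimal sketch in Python (NumPy assumed; names are illustrative):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 5.0, 8.0, 13.0, 18.0])

# Columns of V: the basis functions {1, x} evaluated at the data points
V = np.column_stack([np.ones_like(x), x])

# Normal equations (V^T V) a = V^T y -- the same matrices of sums as above
A = V.T @ V   # [[ 5., 10.], [10., 30.]]
b = V.T @ y   # [ 46., 132.]
print(np.linalg.solve(A, b))   # expected: [1.2, 4.0]

# Equivalent, numerically more robust alternative
a, *_ = np.linalg.lstsq(V, y, rcond=None)
print(a)                       # expected: [1.2, 4.0]
```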
The least-squares approximation problem can also be posed as an orthogonal projection onto a subspace of functions with basis $B$. We want to approximate the points using the basis of polynomial functions
$$ B=\left\{Q_0(x),Q_1(x)\right\} =\left\{1,x\right\} $$That is, we want to obtain a polynomial $$ P_1(x)=a_{0} \cdot Q_0(x) + a_{1} \cdot Q_1(x)= a_{0} \cdot 1 + a_{1} \cdot x $$
We obtain the coefficients $a_0$ and $a_1$ as a solution of the linear system
$$ \left(\begin{array}{cc} \left\langle Q_0,Q_0\right\rangle & \left\langle Q_0,Q_1\right\rangle \\ \left\langle Q_1,Q_0\right\rangle & \left\langle Q_1,Q_1\right\rangle \end{array}\right)\left(\begin{array}{c} a_{0}\\ a_{1} \end{array}\right)=\left(\begin{array}{c} \left\langle Q_0,f(x)\right\rangle\\ \left\langle Q_1,f(x)\right\rangle \end{array}\right) $$In the discrete case, the most common dot product is
$$ \left\langle g(x),h(x)\right\rangle=\sum_{k=1}^{n} g(x_k)h(x_k)$$For the previous case, this gives
$$ \left(\begin{array}{cc} \sum_{k=1}^{5}1\cdot 1 & \sum_{k=1}^{5}1\cdot x_{k}\\ \sum_{k=1}^{5}x_{k}\cdot 1 & \sum_{k=1}^{5}x_{k}\cdot x_{k} \end{array}\right)\left(\begin{array}{c} a_{0}\\ a_{1} \end{array}\right)=\left(\begin{array}{c} \sum_{k=1}^{5}1\cdot y_{k}\\ \sum_{k=1}^{5}x_{k}\cdot y_{k} \end{array}\right) $$Or
$$ \left(\begin{array}{cc} \sum_{k=1}^{5}1 & \sum_{k=1}^{5}x_{k}\\ \sum_{k=1}^{5}x_{k} & \sum_{k=1}^{5}x_{k}^{2} \end{array}\right)\left(\begin{array}{c} a_{0}\\ a_{1} \end{array}\right)=\left(\begin{array}{c} \sum_{k=1}^{5}y_{k}\\ \sum_{k=1}^{5}x_{k}y_{k} \end{array}\right) $$Substituting the data and simplifying $$ \begin{array}{ccccc} 5a_{0} & + & 10a_{1} & = & 46\\ 10a_{0} & + & 30a_{1} & = & 132 \end{array} $$
And the least-squares regression line is
$$ P_1(x)=1.2+4x $$
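A minimal Python sketch of this projection formulation (NumPy assumed; names are illustrative): the entries of the system are discrete dot products between the basis functions $Q_0(x)=1$ and $Q_1(x)=x$ evaluated at the data points.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 5.0, 8.0, 13.0, 18.0])

def dot(g_vals, h_vals):
    """Discrete dot product: sum of g(x_k) * h(x_k) over the data points."""
    return np.sum(g_vals * h_vals)

# Basis functions Q0(x) = 1 and Q1(x) = x sampled at the data points
Q = [np.ones_like(x), x]

# Gram matrix <Qi, Qj> and right-hand side <Qi, y>
G = np.array([[dot(Qi, Qj) for Qj in Q] for Qi in Q])
c = np.array([dot(Qi, y) for Qi in Q])

a0, a1 = np.linalg.solve(G, c)
print(a0, a1)   # expected: 1.2 and 4.0
```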
Why didn't we solve the problem using the residuals in absolute value instead of the squared residuals? Because, as we saw, the partial derivatives of the quadratic error are linear functions of the coefficients, so setting them to zero produces a system of linear equations: the solution is simple and fast to obtain. If, instead, we used the sum of the residuals in absolute value, the function $E$ would contain absolute values, which are not differentiable everywhere, that is, $E$ would not be smooth, and obtaining the solution would be more complicated.
Nowadays, the absolute value is used in the error function (along with many other error functions), thanks to the existence of computers, which make long calculations unproblematic.
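For illustration only, here is a hedged sketch (assuming SciPy is available; names are illustrative) of what fitting the line by minimizing the sum of absolute residuals could look like: a general-purpose, derivative-free minimizer is used precisely because this error function is not smooth, and the resulting coefficients need not coincide with the least-squares ones.

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 5.0, 8.0, 13.0, 18.0])

# Sum of absolute residuals for the line a0 + a1*x (not differentiable everywhere)
def abs_error(a):
    a0, a1 = a
    return np.sum(np.abs(a0 + a1 * x - y))

# Start from the least-squares coefficients and refine with a derivative-free method
result = minimize(abs_error, x0=[1.2, 4.0], method="Nelder-Mead")
print(result.x)   # coefficients of the absolute-value fit
```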
We have minimized the sum of the squared residuals $r_k$.