We have seen the approximation of functions by interpolation. In interpolation, we assumed that our data was accurate. For example, in some interpolation exercises, we obtained the data from the expression of the function that we wanted to approximate.
Data fitting is another function approximation technique, one where we generally assume that our data contains errors. This is the case, for example, with data obtained from measurements in experiments. Sometimes we know the underlying function because there is a proven physical law, and in that case we will try to use that function to approximate the data. In other cases, we do not know the function, and we will try to approximate it with generic functions, such as polynomials or trigonometric functions.
The simplest example of a fit, or regression, is the least-squares regression line. The line, a polynomial of degree one, is the function we fit; least squares refers to the error function we use to evaluate the quality of the fit, and it will be the only error function we use in this course. The advantage of using the least-squares error with linear fitting functions, which will also be the only ones we will see in this course, is that the solution of the problem is given by a system of linear equations.
Given the data
$$\begin{array}{|c|ccccc|} \hline x & 0 & 1 & 2 & 3 & 4 \\ \hline y & 2 & 5 & 8 & 13 & 18 \\ \hline \end{array} $$The least-squares regression or data fitting problem can be posed in different ways:
It can be stated as a minimization problem (find the coefficients that minimize the sum of the squared residuals) or as an orthogonal projection onto a subspace of functions. In this course, the adequate approach is the first.
The line is a polynomial of degree one:
$$ P_{1}(x)=a_{0}+a_{1}x $$Our unknowns are $a_0$ and $a_1.$ We want to calculate the line that minimizes the sum of the squared residuals (errors).
(Error and residual are similar ideas, but they are used in different contexts. We are now going to build the model, the regression line, and the difference between a data value and the corresponding value of the model is called a residual. When we use this model to predict the value of the function, we will talk about errors. In both cases it is $y_i-P_1(x_i)$: if $x_i$ is one of the points we built the model with, this difference is a residual; if $x_i$ is a new point, it is an error.)
with
$$ r_{k}=P_{1}(x_{k})-y_{k}=a_{0}+a_{1}x_{k}-y_{k},\qquad E(a_{0},a_{1})=\sum_{k=1}^{5}r_{k}^{2}. $$To find the minimum error we calculate the partial derivatives with respect to the two variables and set them equal to zero. Then, substituting the data,
$$ \begin{eqnarray*} \dfrac{\partial E}{\partial a_{0}} & = & 2(a_{0}-2)+2(a_{0}+a_{1}-5)+2(a_{0}+2a_{1}-8)+2(a_{0}+3a_{1}-13)+2(a_{0}+4a_{1}-18)=0\\ \dfrac{\partial E}{\partial a_{1}} & = & 2(a_{0}+a_{1}-5)+4(a_{0}+2a_{1}-8)+6(a_{0}+3a_{1}-13)+8(a_{0}+4a_{1}-18)=0 \end{eqnarray*} $$that is
$$ \begin{array}{ccccc} 5a_{0} & + & 10a_{1} & = & 46\\ 10a_{0} & + & 30a_{1} & = & 132 \end{array} $$We solve the system by Gauss: the second equation $e_{2}\rightarrow e_{2}-2e_{1}$
$$ \begin{array}{ccccc} 5a_{0} & + & 10a_{1} & = & 46\\ & & 10a_{1} & = & 40 \end{array} $$and using backward substitution
$$ a_{1}=\dfrac{40}{10}=4,\qquad a_{0}=\dfrac{46-10\,a_{1}}{5}=\dfrac{6}{5}=1.2 $$And the least-squares regression line is
$$ P_{1}(x)=1.2+4x $$
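As a quick numerical check of this result, here is a minimal Python sketch (assuming NumPy is available; the variable names are illustrative) that assembles and solves the same $2\times 2$ system and compares it with NumPy's built-in polynomial fit.

```python
import numpy as np

# Data from the table
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 5.0, 8.0, 13.0, 18.0])

# Normal equations for P1(x) = a0 + a1*x:
#   [ n        sum(x)   ] [a0]   [ sum(y)   ]
#   [ sum(x)   sum(x^2) ] [a1] = [ sum(x*y) ]
A = np.array([[len(x), x.sum()],
              [x.sum(), (x**2).sum()]])
b = np.array([y.sum(), (x * y).sum()])

a0, a1 = np.linalg.solve(A, b)
print(a0, a1)               # expected: 1.2 and 4.0

# Cross-check with NumPy's least-squares polynomial fit
# (np.polyfit returns coefficients from highest to lowest degree)
print(np.polyfit(x, y, 1))  # expected: [4.0, 1.2]
```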
We can obtain the same result keeping the sums in general form, which is valid for any data set. The line is
$$ P_{1}(x)=a_{0}+a_{1}x $$We want to calculate the line that minimizes the sum of the squared errors:
$$ E(a_{0},a_{1})=\sum_{k=1}^{5}(P_{1}(x_{k})-y_{k})^{2}=\sum_{k=1}^{5}(a_{0}+a_{1}x_{k}-y_{k})^{2}. $$To find the minimum error we calculate the partial derivatives with respect to the two variables and set them equal to zero:
$$ \begin{eqnarray*} \dfrac{\partial E}{\partial a_{0}} & = & \sum_{k=1}^{5}2(a_{0}+a_{1}x_{k}-y_{k})=0\\ \dfrac{\partial E}{\partial a_{1}} & = & \sum_{k=1}^{5}2(a_{0}+a_{1}x_{k}-y_{k})x_{k}=\sum_{k=1}^{5}2(a_{0}x_{k}+a_{1}x_{k}^{2}-x_{k}y_{k})=0 \end{eqnarray*} $$That is
$$ \begin{array}{ccccc} a_{0}\sum_{k=1}^{5}1 & + & a_{1}\sum_{k=1}^{5}x_{k} & = & \sum_{k=1}^{5}y_{k}\\ a_{0}\sum_{k=1}^{5}x_{k} & + & a_{1}\sum_{k=1}^{5}x_{k}^{2} & = & \sum_{k=1}^{5}x_{k}y_{k} \end{array} $$Expressed in matrix form, this is
$$ \left(\begin{array}{cc} \sum_{k=1}^{5}1 & \sum_{k=1}^{5}x_{k}\\ \sum_{k=1}^{5}x_{k} & \sum_{k=1}^{5}x_{k}^{2} \end{array}\right)\left(\begin{array}{c} a_{0}\\ a_{1} \end{array}\right)=\left(\begin{array}{c} \sum_{k=1}^{5}y_{k}\\ \sum_{k=1}^{5}x_{k}y_{k} \end{array}\right) $$We compute the elements of these matrices
$$ \begin{array}{c|c|c|c|c|c|} \hline & 1 & x_{k} & x_{k}^{2} & y_k & x_{k}\,y_k\\ \hline & 1 & 0 & 0 & 2 & 0\\ & 1 & 1 & 1 & 5 & 5\\ & 1 & 2 & 4 & 8 & 16\\ & 1 & 3 & 9 & 13 & 39\\ & 1 & 4 & 16 & 18 & 72\\ \hline \sum & 5 & 10 & 30 & 46 & 132\\ \hline \end{array} $$And we substitute them in the system
$$ \begin{array}{ccccc} 5a_{0} & + & 10a_{1} & = & 46\\ 10a_{0} & + & 30a_{1} & = & 132 \end{array} $$And the least-squares regression line is
$$ P_{1}(x)=1.2+4x $$
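The matrix of sums above is exactly $V^{T}V$, where $V$ is the matrix whose columns are the values of $1$ and of $x_k$ at the data points, so the whole system can be assembled with a couple of matrix products. A minimal sketch in Python (NumPy assumed; names are illustrative):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 5.0, 8.0, 13.0, 18.0])

# Columns of V: the basis functions {1, x} evaluated at the data points
V = np.column_stack([np.ones_like(x), x])

# Normal equations (V^T V) a = V^T y -- the same matrices of sums as above
A = V.T @ V   # [[ 5., 10.], [10., 30.]]
b = V.T @ y   # [ 46., 132.]
print(np.linalg.solve(A, b))   # expected: [1.2, 4.0]

# Equivalent, numerically more robust alternative
a, *_ = np.linalg.lstsq(V, y, rcond=None)
print(a)                       # expected: [1.2, 4.0]
```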
The least-squares approximation problem can also be posed as an orthogonal projection onto a subspace of functions with basis $B$. We want to approximate the points using the basis of polynomial functions
$$ B=\left\{Q_0(x),Q_1(x)\right\} =\left\{1,x\right\} $$That is, we want to obtain a polynomial $$ P_1(x)=a_{0} \cdot Q_0(x) + a_{1} \cdot Q_1(x)= a_{0} \cdot 1 + a_{1} \cdot x $$
We obtain the coefficients $a_0$ and $a_1$ as a solution of the linear system
$$ \left(\begin{array}{cc} \left\langle Q_0,Q_0\right\rangle & \left\langle Q_0,Q_1\right\rangle \\ \left\langle Q_1,Q_0\right\rangle & \left\langle Q_1,Q_1\right\rangle \end{array}\right)\left(\begin{array}{c} a_{0}\\ a_{1} \end{array}\right)=\left(\begin{array}{c} \left\langle Q_0,f(x)\right\rangle\\ \left\langle Q_1,f(x)\right\rangle \end{array}\right) $$In the discrete case, the most common dot product is
$$ \left\langle g(x),h(x)\right\rangle=\sum_{k=1}^{n} g(x_k)h(x_k)$$For the previous case, this gives
$$ \left(\begin{array}{cc} \sum_{k=1}^{5}1\cdot 1 & \sum_{k=1}^{5}1\cdot x_{k}\\ \sum_{k=1}^{5}x_{k}\cdot 1 & \sum_{k=1}^{5}x_{k}\cdot x_{k} \end{array}\right)\left(\begin{array}{c} a_{0}\\ a_{1} \end{array}\right)=\left(\begin{array}{c} \sum_{k=1}^{5}1\cdot y_{k}\\ \sum_{k=1}^{5}x_{k}\cdot y_{k} \end{array}\right) $$Or
$$ \left(\begin{array}{cc} \sum_{k=1}^{5}1 & \sum_{k=1}^{5}x_{k}\\ \sum_{k=1}^{5}x_{k} & \sum_{k=1}^{5}x_{k}^{2} \end{array}\right)\left(\begin{array}{c} a_{0}\\ a_{1} \end{array}\right)=\left(\begin{array}{c} \sum_{k=1}^{5}y_{k}\\ \sum_{k=1}^{5}x_{k}y_{k} \end{array}\right) $$Substituting the data and simplifying $$ \begin{array}{ccccc} 5a_{0} & + & 10a_{1} & = & 46\\ 10a_{0} & + & 30a_{1} & = & 132 \end{array} $$
And the least-squares regression line is
$$ P_1(x)=1.2+4x $$
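A minimal Python sketch of this projection formulation (NumPy assumed; names are illustrative): the entries of the system are discrete dot products between the basis functions $Q_0(x)=1$ and $Q_1(x)=x$ evaluated at the data points.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 5.0, 8.0, 13.0, 18.0])

def dot(g_vals, h_vals):
    """Discrete dot product: sum of g(x_k) * h(x_k) over the data points."""
    return np.sum(g_vals * h_vals)

# Basis functions Q0(x) = 1 and Q1(x) = x sampled at the data points
Q = [np.ones_like(x), x]

# Gram matrix <Qi, Qj> and right-hand side <Qi, y>
G = np.array([[dot(Qi, Qj) for Qj in Q] for Qi in Q])
c = np.array([dot(Qi, y) for Qi in Q])

a0, a1 = np.linalg.solve(G, c)
print(a0, a1)   # expected: 1.2 and 4.0
```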
Why didn't we solve the problem using the residuals in absolute value instead of the squared residuals? Because, as we saw, the partial derivatives of the quadratic error are linear functions of the coefficients, so setting them to zero produces a system of linear equations: the solution is simple and fast to obtain. If, instead, we used the sum of the residuals in absolute value, the function $E$ would contain absolute values, which are not differentiable everywhere, that is, $E$ would not be smooth, and obtaining the solution would be more complicated.
Nowadays, the absolute value is used in the error function (along with many other error functions), thanks to the existence of computers, which make long calculations unproblematic.
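For illustration only, here is a hedged sketch (assuming SciPy is available; names are illustrative) of what fitting the line by minimizing the sum of absolute residuals could look like: a general-purpose, derivative-free minimizer is used precisely because this error function is not smooth, and the resulting coefficients need not coincide with the least-squares ones.

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 5.0, 8.0, 13.0, 18.0])

# Sum of absolute residuals for the line a0 + a1*x (not differentiable everywhere)
def abs_error(a):
    a0, a1 = a
    return np.sum(np.abs(a0 + a1 * x - y))

# Start from the least-squares coefficients and refine with a derivative-free method
result = minimize(abs_error, x0=[1.2, 4.0], method="Nelder-Mead")
print(result.x)   # coefficients of the absolute-value fit
```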
We have minimized the sum of the squared residuals $r_k$.