Local minimum. Unconstrained optimization. Numerical methods
Approximate with one iteration the minimum of the function $f\left(x,y\right)=x^{2}+3y^{2}$ with initial guess $(2,1)$, using the methods described below.
Gradient methods are a family of methods based on using the gradient of the function at each point.
Let us see some examples.
We want to find the minimum of a function. We will think of a function of two variables, so that we can represent it as a surface.
This process is illustrated in the following graphics
Using the gradient method, let us approximate the minimum of the function $f\left(x,y\right)=x^{2}+3y^{2}$ with initial guess $(2,1)$.
We will iterate with the gradient method using the formula
$$ \left(\begin{array}{c} x_{1}\\ y_{1} \end{array}\right)=\left(\begin{array}{c} x_{0}\\ y_{0} \end{array}\right)-h\nabla f\left(x_{0},y_{0}\right)\qquad (1) $$and we will search $h$ where
$$ g(h)=f\left(x_{0}-h\,f_{x}(x_0,y_0),y_{0}-h\,f_{y}(x_0,y_0)\right) $$has a minimum.
As
$$ \nabla f=\left(f_{x},f_{y}\right)=\left(2x,6y\right) $$and substituting the value of the starting point
$$ \nabla f\left(x_{0},y_{0}\right)=\nabla f\left(2,1\right)=\left(4,6\right). $$Substituting the initial point and the gradient at that point into the function $g$,
$$ g(h)=f\left(x_{0}-h\,f_{x}(x_0,y_0),y_{0}-h\,f_{y}(x_0,y_0)\right)=f\left(2-4h,1-6h\right)=\left(2-4h\right)^{2}+3\left(1-6h\right)^{2} $$We would now have to find the minimum of this one-variable function numerically, but for the sake of simplicity we will do it analytically here.
Let's calculate the minimum of this function, taking into account that the necessary condition for a minimum is $g'(h)=0$
$$ g'(h)=2(-4)\left(2-4h\right)+6(-6)\left(1-6h\right)=248h-52=0\Longrightarrow h=0.2097\approx 0.21 $$and using this value in equation (1)
$$ \left(\begin{array}{c} x_{1}\\ y_{1} \end{array}\right)=\left(\begin{array}{c} 2\\ 1 \end{array}\right)-0.21\left(\begin{array}{c} 4\\ 6 \end{array}\right)=\left(\begin{array}{r} 1.16\\ -0.26 \end{array}\right) $$There has been an improvement because $f\left(x_{0},y_{0}\right)=f(2,1)=7$ and $f\left(x_{1},y_{1}\right)=f(1.16,-0.26)=1.55$ which is a smaller value.
For the next iteration, we use the formula
$$ \left(\begin{array}{c} x_{2}\\ y_{2} \end{array}\right)=\left(\begin{array}{c} x_{1}\\ y_{1} \end{array}\right)-h\nabla f\left(x_{1},y_{1}\right)\qquad (2) $$We need to compute $h$ again so
$$ g(h)=f\left(x_{1}-h\,f_{x}(x_1,y_1),y_{1}-h\,f_{y}(x_1,y_1)\right) $$is minimum. And then, we substitute this $h$ in (2).
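As a sketch of how this iteration could be carried out numerically, the following Python code performs one steepest-descent step, with the one-dimensional minimization of $g(h)$ done by `scipy.optimize.minimize_scalar`; the helper names and the search interval $[0,1]$ for $h$ are illustrative choices, not part of the original statement.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def f(p):
    x, y = p
    return x**2 + 3 * y**2

def grad_f(p):
    x, y = p
    return np.array([2 * x, 6 * y])

def gradient_step(p):
    """One steepest-descent step with a numerical line search in h."""
    grad = grad_f(p)
    # g(h) = f(p - h * grad f(p)); minimize it over an assumed interval [0, 1]
    res = minimize_scalar(lambda h: f(p - h * grad), bounds=(0.0, 1.0), method='bounded')
    return p - res.x * grad

p0 = np.array([2.0, 1.0])
p1 = gradient_step(p0)
print(p1, f(p1))   # approximately [1.1613, -0.2581] and 1.548,
                   # which rounds to the (1.16, -0.26) and 1.55 obtained above
```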
In the variant we consider now, we do not optimize the step: we always use the same $h$. This parameter is called $\eta$, or learning rate, and we must fix it in advance: if it is too small, the algorithm converges too slowly, and if it is too large, it may not converge.
This method has the advantage of its simplicity and the drawback that you have to search for a suitable $\eta$.
At each step, we use the formula
$$ \left(\begin{array}{c} x_{1}\\ y_{1} \end{array}\right)=\left(\begin{array}{c} x_{0}\\ y_{0} \end{array}\right)-\eta\nabla f\left(x_{0},y_{0}\right) $$As
$$ \nabla f=\left(f_{x},f_{y}\right)=\left(2x,6y\right) $$we have
$$ \left(\begin{array}{c} x_{1}\\ y_{1} \end{array}\right)=\left(\begin{array}{c} x_{0}\\ y_{0} \end{array}\right)-\eta \left(\begin{array}{c} 2x_{0}\\ 6y_{0} \end{array}\right) \quad \quad \left(\begin{array}{c} x_{1}\\ y_{1} \end{array}\right)=\left(\begin{array}{c} 2\\ 1 \end{array}\right)-0.1 \left(\begin{array}{c} 2(2)\\ 6(1) \end{array}\right)= \left(\begin{array}{c} 1.6\\ 0.4 \end{array}\right) $$$$ f(1.6,0.4)= 3.04 $$$$ \left(\begin{array}{c} x_{2}\\ y_{2} \end{array}\right)=\left(\begin{array}{c} x_{1}\\ y_{1} \end{array}\right)-\eta \left(\begin{array}{c} 2x_{1}\\ 6y_{1} \end{array}\right) \quad \quad \left(\begin{array}{c} x_{2}\\ y_{2} \end{array}\right)=\left(\begin{array}{c} 1.6\\ 0.4 \end{array}\right)-0.1 \left(\begin{array}{c} 2(1.6)\\ 6(0.4) \end{array}\right)= \left(\begin{array}{c} 1.28\\ 0.16 \end{array}\right) $$$$ f(1.28, 0.16)= 1.7152 $$$$ \left(\begin{array}{c} x_{3}\\ y_{3} \end{array}\right)=\left(\begin{array}{c} x_{2}\\ y_{2} \end{array}\right)-\eta \left(\begin{array}{c} 2x_{2}\\ 6y_{2} \end{array}\right) \quad \quad \left(\begin{array}{c} x_{3}\\ y_{3} \end{array}\right)=\left(\begin{array}{c} 1.28\\ 0.16 \end{array}\right)-0.1 \left(\begin{array}{c} 2(1.28)\\ 6(0.16) \end{array}\right)= \left(\begin{array}{c} 1.024\\ 0.064 \end{array}\right) $$$$ f(1.024, 0.064)= 1.060864 $$With $\eta=0.1$, the value of $f$ decreases at each iteration and the iterates approach the minimum at the origin.
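A minimal sketch of these fixed-step iterations in Python (reusing $\eta=0.1$ and three iterations, as in the computations above):

```python
import numpy as np

def f(p):
    x, y = p
    return x**2 + 3 * y**2

def grad_f(p):
    x, y = p
    return np.array([2 * x, 6 * y])

eta = 0.1                       # fixed learning rate
p = np.array([2.0, 1.0])        # initial guess (2, 1)
for k in range(1, 4):
    p = p - eta * grad_f(p)     # constant-step gradient update
    print(k, p, f(p))
# 1 [1.6 0.4]        3.04
# 2 [1.28 0.16]      1.7152
# 3 [1.024 0.064]    1.060864
```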
Newton's method for finding a root of the nonlinear equation $f(x)=0$, starting from an initial value $x_0$, is
$$ x_{k+1} = x_k-\frac{f(x_k)}{f'(x_k)} $$and we obtain
$$ x_{1} = x_0-\frac{f(x_0)}{f'(x_0)},\qquad x_{2} = x_1-\frac{f(x_1)}{f'(x_1)}, \qquad x_{3} = x_2-\frac{f(x_2)}{f'(x_2)},\quad \ldots \quad \rightarrow \quad \alpha $$If we want to find a maximum or a minimum of the function $ f $, for $f$ smooth, the necessary condition is $$ f'(x)=0 $$
That is, we look for a zero of the derivative and, using Newton's method, we can solve this equation, taking an initial value $x_0$ and using the formula
$$ x_{k+1} = x_k-\frac{f'(x_k)}{f''(x_k)} $$Similarly, for functions of $ n $ variables $f:\Omega\subset \mathbb{R}^n\to \mathbb{R}$, taking into account that the components of the gradient vector of a function $\nabla f(\mathbf{x})$ are its first partial derivatives, and the components of the Hessian matrix $H(\mathbf{x})$ are the second partial derivatives, we can formulate Newton starting from an initial value $\mathbf{x}_0$
$$ \mathbf{x}_{k+1} = \mathbf{x}_{k}-H^{-1}(\mathbf{x}_{k}) \cdot \nabla f(\mathbf{x}_{k}) $$This is not how the method is actually derived: in both the one-dimensional and the multidimensional cases, Newton's method can be obtained from the corresponding Taylor expansion.
Since our function is a two-variable function, we will iterate with Newton's method using the formula
$$ \left(\begin{array}{c} x_{1}\\ y_{1} \end{array}\right)=\left(\begin{array}{c} x_{0}\\ y_{0} \end{array}\right)-H^{-1}\left(x_{0},y_{0}\right)\cdot\nabla f\left(x_{0},y_{0}\right) $$If we take into account that
$$ H\left(x_{0},y_{0}\right)\left(\begin{array}{c} c_{1}\\ c_{2} \end{array}\right)=\nabla f\left(x_{0},y_{0}\right)\qquad (1) $$then
$$ \left(\begin{array}{c} c_{1}\\ c_{2} \end{array}\right)=H^{-1}\left(x_{0},y_{0}\right)\cdot\nabla f\left(x_{0},y_{0}\right) $$and we can write
$$ \left(\begin{array}{c} x_{1}\\ y_{1} \end{array}\right)=\left(\begin{array}{c} x_{0}\\ y_{0} \end{array}\right)-\left(\begin{array}{c} c_{1}\\ c_{2} \end{array}\right)\qquad (2) $$where $\left(c_{1},c_{2}\right)^{T}$ is the solution of system (1). In general, it makes more sense to solve system (1) than to compute the inverse matrix $H^{-1}$ and multiply it by the gradient, because computing the inverse of a matrix is equivalent to solving $n$ linear systems (although with the same coefficient matrix), whereas this way we solve only one system.
The function to minimize is $f\left(x,y\right)=x^{2}+3y^{2}$ and the initial guess is $(2,1)$
The gradient is
$$ \nabla f=\left(\frac{\partial f(x,y)}{\partial x},\frac{\partial f(x,y)}{\partial y}\right)=\left(2x,6y\right) $$and
$$ \nabla f\left(x_{0},y_{0}\right)=\nabla f\left(2,1\right)=\left(4,6\right). $$Also
$$ H=\left(\begin{array}{cc} \dfrac{\partial^{2}f(x,y)}{\partial x\,\partial x} & \dfrac{\partial^{2}f(x,y)}{\partial x\,\partial y}\\ \dfrac{\partial^{2}f(x,y)}{\partial y\,\partial x} & \dfrac{\partial^{2}f(x,y)}{\partial y\,\partial y} \end{array}\right)=\left(\begin{array}{cc} 2 & 0\\ 0 & 6 \end{array}\right) $$and
$$ H\left(x_{0},y_{0}\right)=H\left(2,1\right)=\left(\begin{array}{cc} 2 & 0\\ 0 & 6 \end{array}\right) $$The system (1) is
$$ \left(\begin{array}{cc} 2 & 0\\ 0 & 6 \end{array}\right)\left(\begin{array}{c} c_{1}\\ c_{2} \end{array}\right)=\left(\begin{array}{c} 4\\ 6 \end{array}\right) $$and solving the system we obtain
$$ \left(\begin{array}{c} c_{1}\\ c_{2} \end{array}\right)=\left(\begin{array}{c} 2\\ 1 \end{array}\right). $$Therefore (2) is
$$ \left(\begin{array}{c} x_{1}\\ y_{1} \end{array}\right)=\left(\begin{array}{c} 2\\ 1 \end{array}\right)-\left(\begin{array}{c} 2\\ 1 \end{array}\right)=\left(\begin{array}{c} 0\\ 0 \end{array}\right). $$There has been an improvement because $f\left(x_{0},y_{0}\right)=f(2,1)=7$ and $f\left(x_{1},y_{1}\right)=f(0,0)=0$, which is smaller.
In this case we have arrived at the minimum in only one iteration. This is expected: $f$ is a quadratic function, and Newton's method minimizes a quadratic function exactly in a single step.
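A sketch of this single Newton step in Python, solving the linear system (1) with `numpy.linalg.solve` instead of computing $H^{-1}$ (the helper names are illustrative):

```python
import numpy as np

def grad_f(p):
    x, y = p
    return np.array([2 * x, 6 * y])

def hess_f(p):
    # The Hessian of f(x, y) = x^2 + 3y^2 is constant
    return np.array([[2.0, 0.0],
                     [0.0, 6.0]])

p0 = np.array([2.0, 1.0])
c = np.linalg.solve(hess_f(p0), grad_f(p0))   # solve H c = grad f, as in (1)
p1 = p0 - c                                   # Newton update (2)
print(p1)                                     # [0. 0.], the exact minimum
```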
We will now minimize a function $g$ of a single variable. The golden section method is an interval (bracketing) method: if the function attains a single minimum in the interval, it converges to that minimum.
If the function attains a single minimum in $[a,b]$ and we have two inner points $x_1$ and $x_2$ such that $a \lt x_1 \lt x_2 \lt b$, then, by comparing the values of $g$ at $x_1$ and $x_2$, we can discard $[a,x_1)$ or $(x_2,b]$ while keeping the minimum within the remaining interval.
We choose the two points at fractions $1-\phi$ and $\phi$ of the interval length from the left endpoint, with $\phi\in(0.5,1)$, and with the condition that, if we discard $(x_2,b]$, the new $x_2$ equals the old $x_1$, and if we discard $[a,x_1)$, the new $x_1$ equals the old $x_2$. Thus, at each iteration, we only have to evaluate the function at one new point.
Without losing generality, we can assume that the initial interval $[a,b]$ is $[0,1]$ and that in the first iteration we discard the interval on the right.
In the first iteration, and taking into account that $\phi\in(0.5,1)$, the points are $a=0$, $x_1=1-\phi$, $x_2=\phi$ and $b=1$.
If we discard $(x_2,b]$, in the second iteration the new points $x_1$ and $x_2$ are $1-\phi$ and $\phi$ multiplied by $\phi$, that is, $\phi(1-\phi)$ and $\phi^{2}$, because the interval no longer has length $1$ but $\phi$.
To make the new $x_2$ equal to the old $x_1$, $$\phi^2=1-\phi\qquad\Longrightarrow \qquad \phi = \frac{\sqrt{5}-1}{2}\approx 0.618034$$
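As a quick check, the positive root of $\phi^{2}=1-\phi$ can be verified symbolically; a small sketch using sympy:

```python
import sympy as sp

phi = sp.symbols('phi')
roots = sp.solve(sp.Eq(phi**2, 1 - phi), phi)
print(roots)
# The positive root, (sqrt(5) - 1)/2 ~ 0.618034, is the golden-section ratio phi
print([sp.N(r) for r in roots])
```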
The algorithm is:
- Given $a_1=a$, $b_1=b$ and $\phi = \dfrac{\sqrt{5}-1}{2}$
- For $k=1,2,\ldots,\mathrm{MaxNumIter}$
- Compute the points
- $x_1=a_k+(1-\phi)(b_k-a_k)$
- $x_2=a_k+\phi(b_k-a_k)$
- if $g(x_1) \gt g(x_2)$ then:
$ a_{k+1}=x_1,$ $ b_{k+1}=b_k,$
- else:
$ a_{k+1}=a_{k},$ $ b_{k+1}=x_2.$
- If the stopping criterion is met, stop.
The idea is that the minimum is always within the interval.
Let's apply it to compute the minimum of the function $g(h)=\left(2-4h\right)^{2}+3\left(1-6h\right)^{2}$ from the first example.
The first iterations (Iteration 1, Iteration 2, Iteration 3, and so on) proceed by discarding part of the interval at each step, as described above.
Of the last two points $x_1$ and $x_2$, we take as the approximation the one at which the function has the lower value, which is $$\fbox{0.2098}$$
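A sketch of the whole procedure in Python, assuming the initial interval $[0,1]$ and a simple interval-length stopping criterion (both are illustrative choices); note that at each iteration only one new function evaluation is needed, as described above:

```python
def g(h):
    # One-variable function from the gradient-method example
    return (2 - 4 * h)**2 + 3 * (1 - 6 * h)**2

def golden_section(g, a, b, tol=1e-4, max_iter=100):
    phi = (5 ** 0.5 - 1) / 2                        # ~ 0.618034
    x1, x2 = a + (1 - phi) * (b - a), a + phi * (b - a)
    g1, g2 = g(x1), g(x2)
    for _ in range(max_iter):
        if g1 > g2:
            # Discard [a, x1): reuse the old x2 as the new x1
            a, x1, g1 = x1, x2, g2
            x2 = a + phi * (b - a)
            g2 = g(x2)
        else:
            # Discard (x2, b]: reuse the old x1 as the new x2
            b, x2, g2 = x2, x1, g1
            x1 = a + (1 - phi) * (b - a)
            g1 = g(x1)
        if b - a < tol:
            break
    # Of the last two points, return the one with the lower function value
    return x1 if g1 < g2 else x2

print(golden_section(g, 0.0, 1.0))   # ~ 0.2097, consistent with the boxed 0.2098
```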