Course Webpage

Optimization

In [4]:
import numpy as np
import matplotlib.pyplot as plt

The Golden Section method

We will optimize a single variable function $g$. The golden section method is an interval method.

If the function has a single minimum in the interval $[a, b]$, we take two inner points $x_1$ and $x_2$ such that $a\lt x_1 \lt x_2 \lt b.$ Comparing the values of the function $g$ at $x_1$ and $x_2$, we can discard either $[a, x_1)$ or $(x_2, b]$ while keeping the minimum within the remaining interval.

We choose the points so that they lie at fractions $\phi$ and $1-\phi$ from one of the endpoints, with $\phi = \dfrac{\sqrt{5} -1}{2}$. This value of $\phi$ is chosen so that if we discard $(x_2, b]$ the next $x_2$ is the old $x_1$, and if we discard $[a, x_1)$ the next $x_1$ is the old $x_2.$ Thus, in each iteration, we only have to evaluate the function at one new point.
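
A quick numerical check of this property: $\phi$ satisfies $\phi^2+\phi-1=0$, so $\phi^2 = 1-\phi$, and after discarding $(x_2,b]$ the new $x_2=a+\phi^2\,(b-a)$ coincides with the old $x_1=a+(1-\phi)\,(b-a)$.

import numpy as np
phi = (np.sqrt(5) - 1) / 2
print(phi**2, 1 - phi)   # both are 0.3819..., so the old x1 can be reused as the new x2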

Since this is an interval method, the length of the final interval is a bound on the error.

In [3]:
golden
Out[3]:
[Image output: illustration of the golden section method]

Algorithm

  • Given $a_1=a$, $b_1=b$ and $\phi = \dfrac{\sqrt{5}-1}{2}$
  • For $k=1,2,\ldots,\mathrm{MaxNumIter}$
    • Compute the points
      • $x_1=a_k+(1-\phi)\,(b_k-a_k)$
      • $x_2=a_k+\phi\,(b_k-a_k)$
    • If $g(x_1) \gt g(x_2)$ then:

      $ a_{k+1}=x_1,$ $ b_{k+1}=b_k,$

    • else:

      $ a_{k+1}=a_{k},$ $ b_{k+1}=x_2.$

    • If the stopping criterion is met, stop.

Exercise 1

Write a function goldensection(g,a,b,tol=1.e-6,maxiter=100) with input parameters the lambda function g, the interval borders a, b, the absolute error bound tol and the maximum number of iterations maxiter, and as output, the approximate minimum of the function and the number of performed iterations. Take as the minimum m the midpoint of the last interval. Plot the function in $[a,b]$ and the point m on the curve.

  • Compute the minimum of the function $g_1(x) = (2x-1.1)^2 + (3x-3.3)^2$ in $[0,3]$
  • Also, for $g_2(x) = x^5 - 3 x^2 + 1.6$ in $[0,2]$

Note: The error bound is the length of the last interval.
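
A minimal sketch of one way to write goldensection (plotting omitted), assuming the stopping criterion is that the interval length b-a falls below tol; this straightforward version evaluates g at both inner points in every iteration instead of reusing one of them:

import numpy as np

def goldensection(g, a, b, tol=1.e-6, maxiter=100):
    # Golden section search for the minimum of g in [a, b]
    phi = (np.sqrt(5) - 1) / 2
    for k in range(1, maxiter + 1):
        x1 = a + (1 - phi) * (b - a)
        x2 = a + phi * (b - a)
        if g(x1) > g(x2):
            a = x1                    # discard [a, x1)
        else:
            b = x2                    # discard (x2, b]
        if b - a < tol:               # the interval length is the error bound
            break
    return (a + b) / 2, k             # midpoint of the last interval, iterations used

g1 = lambda x: (2*x - 1.1)**2 + (3*x - 3.3)**2
print(goldensection(g1, 0, 3))        # the minimum should be close to 0.9308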

In [7]:
%run Exercise1.py
g1, minimum = 0.9307690377689064 , iterations =  31
g2, minimum = 1.0626586010861123 , iterations =  31

We can compute the minimum of a function with the SciPy function minimize:

In [8]:
import scipy.optimize as op

m1 = op.minimize(g1, x0=0)
print(m1.x[0])
0.9307692232431363
In [9]:
m2 = op.minimize(g2, x0=1)
print(m2.x[0])
1.0626585933647292
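
For a single-variable function restricted to an interval, scipy.optimize also offers minimize_scalar; as a quick check (with $g_1$ redefined inline, since above it comes from Exercise1.py):

import scipy.optimize as op
g1 = lambda x: (2*x - 1.1)**2 + (3*x - 3.3)**2
m = op.minimize_scalar(g1, bounds=(0, 3), method='bounded')
print(m.x)   # close to 0.93077, in agreement with goldensection and minimize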

Representation of a two-variable function

Let us, for instance, represent the function $f(x,y)=x^2+y^2$ in the rectangle $[-5,0]\times[-2,2]$

We define the two variable function

In [10]:
f = lambda x,y: x**2 + y**2

We construct a grid of points

In [11]:
x = np.linspace(-5,0,6) # 6 equispaced points in [-5,0]
y = np.linspace(-2,2,5) # 5 equispaced points in [-2,2]
X, Y = np.meshgrid(x,y)

And we have created two matrices

In [12]:
print(X)
[[-5. -4. -3. -2. -1.  0.]
 [-5. -4. -3. -2. -1.  0.]
 [-5. -4. -3. -2. -1.  0.]
 [-5. -4. -3. -2. -1.  0.]
 [-5. -4. -3. -2. -1.  0.]]
In [13]:
print(Y)
[[-2. -2. -2. -2. -2. -2.]
 [-1. -1. -1. -1. -1. -1.]
 [ 0.  0.  0.  0.  0.  0.]
 [ 1.  1.  1.  1.  1.  1.]
 [ 2.  2.  2.  2.  2.  2.]]

For every point $(x,y)$ of the grid (the entries in the same position in $X$ and $Y$) we obtain a value $f(x,y)$.

In [14]:
Z = f(X,Y)
print(Z)
[[29. 20. 13.  8.  5.  4.]
 [26. 17. 10.  5.  2.  1.]
 [25. 16.  9.  4.  1.  0.]
 [26. 17. 10.  5.  2.  1.]
 [29. 20. 13.  8.  5.  4.]]

We can represent it with isolines

In [15]:
plt.figure()
plt.contour(X,Y,Z,cmap='jet')
plt.colorbar()
plt.show()
In [16]:
plt.figure()
plt.contourf(X,Y,Z,cmap='jet')
plt.colorbar()
plt.show()

Or as a surface

In [17]:
from mpl_toolkits.mplot3d import Axes3D

fig1  = plt.figure(figsize=(10,5))
ax1 = plt.axes(projection ='3d') 
ax1.plot_surface(X, Y, Z, cmap='jet')
plt.show()
In [18]:
fig2  = plt.figure(figsize=(10,5))
ax2 = plt.axes(projection ='3d') 
ax2.plot_wireframe(X, Y, Z)
plt.show()

If we build a denser mesh we will have a smoother representation

In [19]:
x = np.linspace(-5,0)  # 50 points by default
y = np.linspace(-2,2)  # 50 points by default
X, Y = np.meshgrid(x,y)
Z = f(X,Y)

plt.figure()
plt.contourf(X,Y,Z,cmap='jet')
plt.colorbar()
plt.show()

The gradient method with learning rate

We look for the minimum of a function. Let us think of a two-variable function, so that we can picture ourselves standing on the surface that represents it.

  • We start at an initial point $(x_0, y_0).$
  • We compute the gradient at that point. The gradient points in the direction in which the function increases, that is, uphill, but we are looking for a minimum and want to go downhill.
  • Thus, we turn $180^\circ$ and advance in a straight line in the opposite direction, using the formula $$ \left(\begin{array}{c} x_{1}\\ y_{1} \end{array}\right)=\left(\begin{array}{c} x_{0}\\ y_{0} \end{array}\right)-\eta\,\left(\begin{array}{c} f_x(x_0,y_0)\\ f_y(x_0,y_0) \end{array}\right) $$ where $\eta$ is a constant that we must fix. If $\eta$ is small, the sequence converges slowly. If it is too large, the sequence may not converge.
  • We look again for the direction of the gradient and repeat the previous steps until a stopping condition is met.
  • Since at the minimum $\nabla f = \mathbf{0}$ (a necessary condition for a minimum), a stopping condition is that $\left\Vert \nabla f \right\Vert$ is less than a certain tolerance.
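
A minimal sketch of a function implementing these steps (signature as in Exercise 2 below, plotting omitted):

import numpy as np

def gradient(f, eta, fx, fy, x0, y0, tol=1.e-6, maxiter=100):
    # Gradient method with a fixed learning rate eta;
    # f itself is only needed for the plots, omitted in this sketch
    x, y, k = [x0], [y0], 0
    while k < maxiter:
        gx, gy = fx(x[-1], y[-1]), fy(x[-1], y[-1])
        if np.sqrt(gx**2 + gy**2) < tol:      # stop when ||grad f|| < tol
            break
        x.append(x[-1] - eta * gx)            # step against the gradient
        y.append(y[-1] - eta * gy)
        k += 1
    return np.array(x), np.array(y), k

# f1(x,y) = x^2 + 3y^2 with its exact partial derivatives
f1  = lambda x, y: x**2 + 3*y**2
f1x = lambda x, y: 2*x
f1y = lambda x, y: 6*y
x, y, n = gradient(f1, 0.1, f1x, f1y, 2, 1, tol=0.001)
print(x[-1], y[-1], n)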

Exercise 2

Write the function gradient(f,eta,fx,fy,x0,y0,tol=1.e-6,maxiter=100) with input arguments the two-variable lambda function f, the learning rate eta, the two gradient components fx and fy of f, also defined as lambda functions, the initial guess x0, y0, the tolerance tol and the maximum number of iterations, and as output, two numpy arrays x and y that contain the coordinates $x$ and $y$ of the iterates, and the number of performed iterations. Using this function

  • Compute the minimum of the function $f_1(x,y) = x^2 + 3y^2$, with

    • Initial guess $(2,1)$
    • Learning rate $\eta = 0.1$,
    • Tolerance tol = 0.001 so $\left\Vert \nabla f \right\Vert\lt tol$
  • Compute the minimum of the function $f_2(x,y) = x^2 + y^2 - x\,y - 3\,x$, trying different values of $\eta$ so as to perform fewer than 15 iterations and using

    • Initial guess $(-2,2)$
    • Tolerance tol = 0.001

Plot the function and the point sequence.

In [20]:
%run Exercise2.py
Minimum with tol = 0.001
x =  0.0004153837486827864
y =  7.555786372591399e-16
38  iterations
Minimum with tol = 0.001
x =  1.999370827078981
y =  0.9993649134430919
13  iterations

We can also use the function minimize, changing the syntax of the lambda function so that it takes a single array argument.

In [21]:
import scipy.optimize as op

f1 = lambda x: x[0]**2 + 3*x[1]**2
m1 = op.minimize(f1, x0 = np.array([2,1]))
print(m1.x)
[-7.44156456e-09 -7.44761094e-09]
In [22]:
f2 = lambda x: x[0]**2 + x[1]**2 - x[0]*x[1] - 3*x[0]
m1 = op.minimize(f2, x0 = np.array([-2,2]))
print(m1.x)
[1.99999995 0.99999996]

The steepest descent method

The method is similar to the previous one, but the step is optimized at every iteration so that we advance along the descent direction to the point of its trajectory where the value of the function is minimal.

We will perform the first iteration by the gradient method using the formula

$$ \left(\begin{array}{c} x_{1}\\ y_{1} \end{array}\right)=\left(\begin{array}{c} x_{0}\\ y_{0} \end{array}\right)-h\,\left(\begin{array}{c} f_x(x_0,y_0)\\ f_y(x_0,y_0) \end{array}\right)\qquad $$

As we move along this straight line in the plane, on the surface we travel along the curve $g$

$$ g(h)=f\left(x_{0}-h\,f_{x}(x_0,y_0),y_{0}-h\,f_{y}(x_0,y_0)\right) $$

and we will look for the step $h$ that takes us along the curve to its minimum value. At the starting point $(x_0,y_0)$ we have $h=0.$ Moreover, $h$ must be positive, because we want to move in the direction opposite to the gradient, and that direction is already accounted for by the minus sign in front of $h.$ Therefore, if we search with an interval method such as the golden section method, we will look for the minimum in an interval $[0,b].$
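
A minimal sketch of such an iteration, reusing the goldensection function from Exercise 1; the upper end b = 1 of the search interval for h is an assumption that may need adjusting:

import numpy as np

def descent(f, fx, fy, x0, y0, tol=1.e-6, maxiter=100):
    # Steepest descent: the step h is optimized with the golden section method
    x, y, k = [x0], [y0], 0
    while k < maxiter:
        gx, gy = fx(x[-1], y[-1]), fy(x[-1], y[-1])
        if np.sqrt(gx**2 + gy**2) < tol:                  # stop when ||grad f|| < tol
            break
        g = lambda h: f(x[-1] - h*gx, y[-1] - h*gy)       # f along the descent line
        h, _ = goldensection(g, 0, 1)                     # optimal step in [0, b] with b = 1
        x.append(x[-1] - h*gx)
        y.append(y[-1] - h*gy)
        k += 1
    return np.array(x), np.array(y), k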


Exercise 3

Write the function descent(f,fx,fy,x0,y0,tol=1.e-6,maxiter=100) with input arguments the two-variable lambda function f, the two gradient components fx and fy of f, also defined as lambda functions, the initial guess x0, y0, the tolerance tol and the maximum number of iterations, and as output, two numpy arrays x and y that contain the coordinates $x$ and $y$ of the iterates and the number of performed iterations. The function will optimize each step using goldensection to find the optimal h.

  • Compute the minimum of the function $f_1(x,y) = x^2 + 3y^2$, with

    • Initial guess $(2,1)$
    • Tolerance tol = 0.001 so $\left\Vert \nabla f \right\Vert\lt tol$
  • Compute the minimum of the function $f_2(x,y) = x^2 + y^2 - x\,y - 3\,x$, using

    • Initial guess $(-2,2)$
    • Tolerance tol = 0.001

Plot the function and the point sequence.

In [23]:
%run Exercise3.py
Minimum with tol = 0.001
x =  0.0002342710219407259
y =  0.00011713565913230156
12  iterations
Minimum with tol = 0.001
x =  1.9999044047362888
y =  0.9998907513285318
7  iterations

Newton's method

Newton's method finds a root of the nonlinear equation $f(x)=0$. Starting from an initial value $x_0$, it uses the formula

$$ x_{k+1} = x_k-\frac{f(x_k)}{f'(x_k)} $$

and we obtain

$$ x_{1} = x_0-\frac{f(x_0)}{f'(x_0)},\qquad x_{2} = x_1-\frac{f(x_1)}{f'(x_1)}, \qquad x_{3} = x_2-\frac{f(x_2)}{f'(x_2)},\quad \ldots \quad \rightarrow \quad \alpha $$

If we want to find a maximum or a minimum of the function $f$, and $f$ is sufficiently regular, the necessary condition is

$$ f'(x)=0 $$

That is, we look for a zero of this equation and, using Newton's method with an initial guess $x_0$, we would solve it with the formula

$$ x_{k+1} = x_k-\frac{f'(x_k)}{f''(x_k)} $$
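
For example, for $g_1(x) = (2x-1.1)^2 + (3x-3.3)^2$ from Exercise 1 we have $g_1'(x) = 26x - 24.2$ and $g_1''(x) = 26$, so a few Newton iterations (a sketch with hand-computed derivatives) recover the same minimum found with the golden section method:

x = 0.0
for _ in range(5):
    x = x - (26*x - 24.2) / 26   # x_{k+1} = x_k - g1'(x_k) / g1''(x_k)
print(x)   # 0.93077..., and since g1 is quadratic, one step already suffices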

Similarly, for functions of $n$ variables $f:\Omega\subset \mathbb{R}^n\to \mathbb{R}$, taking into account that the components of the gradient vector $\nabla f(\mathbf{x})$ of a function are its first partial derivatives, and the components of the Hessian matrix $H(\mathbf{x})$ are its second partial derivatives, we can formulate Newton's method starting from an initial value $\mathbf{x}_0$

$$ \mathbf{x}_{k+1} = \mathbf{x}_{k}-H^{-1}(\mathbf{x}_{k}) \cdot \nabla f(\mathbf{x}_{k}) $$

This is not a proof: in both the one-dimensional and the multidimensional cases, Newton's method can be derived from the corresponding Taylor expansion, and that would be the formal justification.

As in the following exercise we are going to use a two-variable function, we will iterate with Newton's method using the formula

$$ \left(\begin{array}{c} x_{1}\\ y_{1} \end{array}\right)=\left(\begin{array}{c} x_{0}\\ y_{0} \end{array}\right)-H^{-1}\left(x_{0},y_{0}\right)\cdot\nabla f\left(x_{0},y_{0}\right) $$

If we take into account that

$$ H\left(x_{0},y_{0}\right)\left(\begin{array}{c} c_{1}\\ c_{2} \end{array}\right)=\nabla f\left(x_{0},y_{0}\right)\qquad (1) $$

then

$$ \left(\begin{array}{c} c_{1}\\ c_{2} \end{array}\right)=H^{-1}\left(x_{0},y_{0}\right)\cdot\nabla f\left(x_{0},y_{0}\right) $$

and we write

$$ \left(\begin{array}{c} x_{1}\\ y_{1} \end{array}\right)=\left(\begin{array}{c} x_{0}\\ y_{0} \end{array}\right)-\left(\begin{array}{c} c_{1}\\ c_{2} \end{array}\right)\qquad (2) $$

where $\left(c_{1},c_{2}\right)^{T}$ is the solution of system (1). In general, it makes more sense to solve system (1) than to compute the inverse matrix of $H$ and then multiply it by the gradient, because computing the inverse of a matrix is equivalent to solving $n$ systems (although with the same coefficient matrix), whereas solving (1) directly amounts to a single system.
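
A minimal sketch of a single step written this way (the names newton_step, gradf and Hf are illustrative; np.linalg.solve solves system (1)):

import numpy as np

def newton_step(gradf, Hf, x0):
    # Solve H(x0) c = grad f(x0), then apply update (2): x1 = x0 - c
    c = np.linalg.solve(Hf(x0), gradf(x0))
    return x0 - c

# Example with f(x,y) = x^2 + 3y^2: gradient (2x, 6y), constant Hessian diag(2, 6)
gradf = lambda x: np.array([2*x[0], 6*x[1]])
Hf    = lambda x: np.array([[2., 0.], [0., 6.]])
print(newton_step(gradf, Hf, np.array([2., 1.])))   # a quadratic is minimized in one step: [0. 0.]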


Exercise 4

Write a function newton(f,gradf,Hf,x0,tol=1.e-6,maxiter=100) whose input parameters are the lambda functions f, gradf (gradient of $f$), Hf (Hessian matrix of f) and the numpy array x0 for the initial guess, the tolerance tol and the maximum number of iterations, and, as output, two numpy arrays x and y that contain the coordinates $x$ and $y$ of the iterations and the number of performed iterations. Using this function compute the minima of the function $$f(x,y) = \ln(12+2\,x^2-4\,x\,y+y^4)$$

  • The elements of the numpy array returned by gradf and the elements of the matrix Hf are defined using centered numerical differentiation formulas.

  • The tolerance will be tol = 0.001 and $\left\Vert \nabla f \right\Vert\lt tol$

  • Try with different initial guesses until we have two sequences, one for each minimum.

  • Plot the function and the sequences of points.

In [24]:
%run Exercise4.py
Minimum with tol = 0.001
x =  -1.000057446512907
y =  -1.000773590983239
2  iterations

Minimum with tol = 0.001
x =  1.0001779666335626
y =  1.000176853682842
3  iterations

The point $(0,0)$ is a saddle point: if we intersect the function with the plane $y=x$, the resulting curve has a maximum there, but if we intersect it with the plane $y=-x$, it has a minimum.

In [25]:
g1 = lambda h: f(h,h)    # f is the function from Exercise 4
g2 = lambda h: f(h,-h)

xx = np.linspace(-1,1)

plt.figure()
plt.plot(xx,g1(xx),label='Intersection of $f$ with the plane $y = x$')
plt.plot(xx,g2(xx),label='Intersection of $f$ with the plane $y = -x$')
plt.plot(0,g1(0),'ro')
plt.legend()
plt.show()

The penalty function

The idea of the penalty function is to replace the objective function $f$ by another function

$$F(x,y)=f(x,y)+c\,P(x,y)$$

and solve the problem without constraints. We take a positive constant $c$ and a function $P$ satisfying:

  • $P$ is continuous in the $f$ domain.

  • $P(x,y)\geq 0$ for every point in the domain of $f$, and

  • $P(x,y)=0$ if and only if the point $(x,y)$ satisfies the constraints.


Exercise 5

Minimize the function $f(x,y)=-y$ with the constraint $x^{2}+y^{2}=1$ using the penalty function and the steepest descent method.

  • The components of the gradient, fx and fy, will be defined using centered numerical differentiation formulas.

  • The tolerance will be tol = 0.01 and $\left\Vert \nabla f \right\Vert\lt tol$

  • Try with different initial guesses until the sequence converges to the minimum.

  • Plot the function and the sequence of points.

Note: A possible $F$ would be

$$ F(x,y) = -y + 10\,(x^2+y^2-1)^2$$
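
As a quick check (not the full Exercise 5 script), this $F$ and centered-difference approximations of its partial derivatives, to be passed to descent, could be set up as follows; the step h = 1.e-5 is an arbitrary choice:

# Penalty function for f(x,y) = -y with the constraint x^2 + y^2 = 1
F = lambda x, y: -y + 10 * (x**2 + y**2 - 1)**2

# Centered-difference approximations of the gradient components
h = 1.e-5
Fx = lambda x, y: (F(x + h, y) - F(x - h, y)) / (2*h)
Fy = lambda x, y: (F(x, y + h) - F(x, y - h)) / (2*h)

print(Fx(0., 1.), Fy(0., 1.))   # approximately 0 and -1 near the constrained minimum (0, 1)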
In [27]:
%run Exercise5.py
Minimum with tol = 0.01
x =  -0.003037918689368151
y =  1.0122669782214664
49  iterations

Exercise 6

Minimize the function $f(x,y)=x^{2}+y^{2}+xy-3x$ with the constraints $x\geq0$ and $y\geq0$ using the penalty function and the steepest descent method.

  • The components of the gradient, fx and fy, will be defined using centered numerical differentiation formulas.

  • The tolerance will be tol = 0.01 and $\left\Vert \nabla f \right\Vert\lt tol$

  • Try with different initial guesses until the sequence converges to the minimum.

  • Plot the function and the sequence of points.

Note: A possible $F$ would be

$$F(x,y) = f(x,y) + 20 g_1(x,y) + 20 g_2(x,y)$$

where

$$ g_1(x,y)=\begin{cases} 0 & \mathrm{if} & x\ge 0\\ x^2 & \mathrm{if} & x\lt 0 \end{cases} \qquad g_2(x,y)=\begin{cases} 0 & \mathrm{if} & y\ge 0\\ y^2 & \mathrm{if} & y\lt 0 \end{cases} $$
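
One possible way to code these penalties, using np.where so that they also accept the grid arrays used for plotting (a sketch, not the full Exercise 6 script):

import numpy as np

f  = lambda x, y: x**2 + y**2 + x*y - 3*x
g1 = lambda x, y: np.where(x >= 0, 0., x**2)   # penalizes x < 0
g2 = lambda x, y: np.where(y >= 0, 0., y**2)   # penalizes y < 0
F  = lambda x, y: f(x, y) + 20*g1(x, y) + 20*g2(x, y)

print(F(1.5, 0.), F(-1., -1.))   # no penalty at (1.5, 0); heavily penalized at (-1, -1)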
In [29]:
%run Exercise6.py
Minimum with tol = 0.001
x =  1.5176821309377722
y =  -0.03612621827893196
49  iterations