import numpy as np
import matplotlib.pyplot as plt
We will optimize a single-variable function $g$ with the golden section method, which is an interval method.
If the interval $[a, b]$ contains a single minimum, we take two inner points $x_1$ and $x_2$ such that $a\lt x_1 \lt x_2 \lt b.$ Taking into account the values of the function $g$ at $x_1$ and $x_2$, we can discard $ [a, x_1) $ or $ (x_2, b] $ while keeping the minimum within the interval.
We choose the points so that they are at a fraction $\phi$ and $1-\phi$ from one of the borders, with $\phi = \dfrac{\sqrt{5} -1}{2}$. $\phi$ is chosen in such a way that if we discard $(x_2, b]$ the next $x_2$ is the old $x_1$, and if we discard $[a, x_1)$ the next $x_1$ is the old $x_2.$ Thus, in each iteration, we only have to evaluate the function at one new point.
Since this is an interval method, the length of the final interval gives an error bound.
Golden section algorithm
- Given $a_1=a$, $b_1=b$ and $\phi = \dfrac{\sqrt{5}-1}{2}$
- For $k=1,2,\ldots,\mathrm{MaxNumIter}$
- Compute the points
- $x_1=a_k+(1-\phi)\,(b_k-a_k)$
- $x_2=a_k+\phi\,(b_k-a_k)$
- If $g(x_1) \gt g(x_2)$ then:
$ a_{k+1}=x_1,$ $ b_{k+1}=b_k,$
- else:
$ a_{k+1}=a_{k},$ $ b_{k+1}=x_2.$
- If the stopping criterion is met, stop.
Write a function goldensection(g,a,b,tol=1.e-6,maxiter=100) with input parameters the lambda function g, the interval borders a and b, the absolute error bound tol and the maximum number of iterations maxiter, and, as output, the approximate minimum of the function and the number of performed iterations. Take as the minimum m the midpoint of the last interval. Plot the function in $[a,b]$ and the point m on the curve.
Note: The error bound is the length of the last interval.
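A minimal sketch of such a function, assuming the interval length as stopping criterion (as in the note above) and, for simplicity, re-evaluating $g$ at both inner points every iteration, together with a hypothetical test function gtest:
def goldensection(g, a, b, tol=1.e-6, maxiter=100):
    # Golden section search (sketch): returns the midpoint of the last
    # interval and the number of performed iterations
    phi = (np.sqrt(5) - 1) / 2
    for k in range(1, maxiter + 1):
        x1 = a + (1 - phi) * (b - a)
        x2 = a + phi * (b - a)
        if g(x1) > g(x2):
            a = x1                    # discard [a, x1)
        else:
            b = x2                    # discard (x2, b]
        if b - a < tol:               # the interval length bounds the error
            break
    m = (a + b) / 2                   # midpoint of the last interval
    return m, k

gtest = lambda x: (x - 1)**2 + 1      # hypothetical test function
m, niter = goldensection(gtest, -2, 3)
xx = np.linspace(-2, 3)
plt.figure()
plt.plot(xx, gtest(xx))
plt.plot(m, gtest(m), 'ro')           # approximate minimum on the curve
plt.show()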
%run Exercise1.py
We can compute the minimum of a function with the SciPy function minimize:
import scipy.optimize as op
m1 = op.minimize(g1, x0=0)
print(m1.x[0])
m2 = op.minimize(g2, x0=1)
print(m2.x[0])
Let us, for instance, represent the function $f(x,y)=x^2+y^2$ in the rectangle $[-5,0]\times[-2,2]$
We define the two-variable function
f = lambda x,y: x**2 + y**2
We construct a grid of points
x = np.linspace(-5,0,6) # 6 equispaced points in [-5,0]
y = np.linspace(-2,2,5) # 5 equispaced points in [-2,2]
X, Y = np.meshgrid(x,y)
And we have created two matrices
print(X)
print(Y)
And for every point $(x,y)$ of the grid (the ones in the same position in $X$ and $Y$) we will have an $f(x,y)$ value.
Z = f(X,Y)
print(Z)
We can represent it with isolines
plt.figure()
plt.contour(X,Y,Z,cmap='jet')
plt.colorbar()
plt.show()
plt.figure()
plt.contourf(X,Y,Z,cmap='jet')
plt.colorbar()
plt.show()
Or as a surface
from mpl_toolkits.mplot3d import Axes3D
fig1 = plt.figure(figsize=(10,5))
ax1 = plt.axes(projection ='3d')
ax1.plot_surface(X, Y, Z, cmap='jet')
plt.show()
fig2 = plt.figure(figsize=(10,5))
ax2 = plt.axes(projection ='3d')
ax2.plot_wireframe(X, Y, Z)
plt.show()
If we build a denser mesh, we will have a smoother representation
x = np.linspace(-5,0)   # 50 equispaced points by default
y = np.linspace(-2,2)   # 50 equispaced points by default
X, Y = np.meshgrid(x,y)
Z = f(X,Y)
plt.figure()
plt.contourf(X,Y,Z,cmap='jet')
plt.colorbar()
plt.show()
We look for the minimum of a function. For a two-variable function we can picture ourselves standing on the surface that represents it: at each iteration we take a step downhill, in the direction opposite to the gradient, scaled by the learning rate $\eta$:
$$ \left(\begin{array}{c} x_{k+1}\\ y_{k+1} \end{array}\right)=\left(\begin{array}{c} x_{k}\\ y_{k} \end{array}\right)-\eta\,\left(\begin{array}{c} f_x(x_k,y_k)\\ f_y(x_k,y_k) \end{array}\right) $$
Write the function gradient(f,eta,fx,fy,x0,y0,tol=1.e-6,maxiter=100) with input arguments the two-variable lambda function f, the learning rate eta, the two gradient components fx and fy of f, also defined as lambda functions, the initial guess x0, y0, the tolerance tol and the maximum number of iterations, and, as output, two numpy arrays x and y that contain the coordinates $x$ and $y$ of the iterates and the number of performed iterations. Using this function:
- Compute the minimum of the function $f_1(x,y) = x^2 + 3y^2$, with tol = 0.001, so that $\left\Vert \nabla f \right\Vert\lt$ tol.
- Compute the minimum of the function $f_2(x,y) = x^2 + y^2 - x\,y - 3\,x$, trying different values of $\eta$ to perform fewer than 15 iterations and using tol = 0.001.
- Plot the function and the point sequence.
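A minimal sketch of gradient under these assumptions (fixed learning rate, stopping when the gradient norm falls below tol); the learning rate 0.1 and the starting point (2, 1) used in the check below are assumed values:
def gradient(f, eta, fx, fy, x0, y0, tol=1.e-6, maxiter=100):
    # Gradient descent with a fixed learning rate eta (sketch).
    # f is received to keep the required signature (useful for plotting).
    x = [x0]
    y = [y0]
    for k in range(1, maxiter + 1):
        gx, gy = fx(x0, y0), fy(x0, y0)
        if np.sqrt(gx**2 + gy**2) < tol:    # stop when ||grad f|| < tol
            break
        x0, y0 = x0 - eta * gx, y0 - eta * gy
        x.append(x0)
        y.append(y0)
    return np.array(x), np.array(y), k

f1  = lambda x, y: x**2 + 3*y**2
f1x = lambda x, y: 2*x                      # exact partial derivatives of f1
f1y = lambda x, y: 6*y
x, y, niter = gradient(f1, 0.1, f1x, f1y, 2., 1., tol=0.001)
print(x[-1], y[-1], niter)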
%run Exercise2.py
We can also use the function minimize, changing the syntax of the lambda function so that it takes a single array argument.
import scipy.optimize as op
f1 = lambda x: x[0]**2 + 3*x[1]**2
m1 = op.minimize(f1, x0 = np.array([2,1]))
print(m1.x)
f2 = lambda x: x[0]**2 + x[1]**2 - x[0]*x[1] - 3*x[0]
m2 = op.minimize(f2, x0 = np.array([-2,2]))
print(m2.x)
The method is similar to the previous one, but now the step length is optimized at each iteration: we move along the descent direction until we reach the point of the trajectory where the value of the function is minimal.
We will perform the first iteration by the gradient method using the formula
$$ \left(\begin{array}{c} x_{1}\\ y_{1} \end{array}\right)=\left(\begin{array}{c} x_{0}\\ y_{0} \end{array}\right)-h\,\left(\begin{array}{c} f_x(x_0,y_0)\\ f_y(x_0,y_0) \end{array}\right) $$
When we move in a straight line on the surface we stay on the curve
$$ g(h)=f\left(x_{0}-h\,f_{x}(x_0,y_0),\,y_{0}-h\,f_{y}(x_0,y_0)\right) $$
and we look for the step $h$ that takes us along this curve to its minimum value. At the starting point $(x_0,y_0)$ we have $h=0.$ Moreover, $h$ must be positive: we want to move in the direction opposite to the gradient, and that direction is already accounted for by the minus sign in front of $h.$ Therefore, if we search with an interval method such as the golden section method, we will look for the minimum in an interval $[0,b].$
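For instance, with $f_2(x,y) = x^2 + y^2 - x\,y - 3\,x$, an assumed starting point and its exact partial derivatives, the curve $g(h)$ can be built and minimized with the goldensection function from Exercise 1; the upper bound $b=1$ of the search interval is also an assumption:
f2  = lambda x, y: x**2 + y**2 - x*y - 3*x
f2x = lambda x, y: 2*x - y - 3                   # exact partial derivatives of f2
f2y = lambda x, y: 2*y - x
x0, y0 = 0., 0.                                  # assumed starting point
g = lambda h: f2(x0 - h*f2x(x0, y0), y0 - h*f2y(x0, y0))
h, _ = goldensection(g, 0, 1)                    # optimal step searched in [0, 1]
x1, y1 = x0 - h*f2x(x0, y0), y0 - h*f2y(x0, y0)  # first iterate of the descent method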
Write the function descent(f,fx,fy,x0,y0,tol=1.e-6,maxiter=100) with input arguments the two-variable lambda function f, the two gradient components fx and fy of f, also defined as lambda functions, the initial guess x0, y0, the tolerance tol and the maximum number of iterations, and, as output, two numpy arrays x and y that contain the coordinates $x$ and $y$ of the iterates and the number of performed iterations. The function will optimize each step using goldensection to find the optimal h.
- Compute the minimum of the function $f_1(x,y) = x^2 + 3y^2$, with tol = 0.001, so that $\left\Vert \nabla f \right\Vert\lt$ tol.
- Compute the minimum of the function $f_2(x,y) = x^2 + y^2 - x\,y - 3\,x$, checking that fewer than 15 iterations are needed with tol = 0.001.
- Plot the function and the point sequence.
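A possible sketch of descent, reusing goldensection for the one-dimensional search of $h$; the upper bound of the search interval, here 1, is an assumption:
def descent(f, fx, fy, x0, y0, tol=1.e-6, maxiter=100):
    # Steepest descent with the step h optimized by golden section (sketch)
    x = [x0]
    y = [y0]
    for k in range(1, maxiter + 1):
        gx, gy = fx(x0, y0), fy(x0, y0)
        if np.sqrt(gx**2 + gy**2) < tol:            # stop when ||grad f|| < tol
            break
        g = lambda h: f(x0 - h * gx, y0 - h * gy)   # value of f along the descent line
        h, _ = goldensection(g, 0, 1)               # optimal step in [0, 1] (assumed bound)
        x0, y0 = x0 - h * gx, y0 - h * gy
        x.append(x0)
        y.append(y0)
    return np.array(x), np.array(y), k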
%run Exercise3.py
Newton's method finds a root of the nonlinear equation $f(x)=0$. Starting from an initial value $x_0$, it uses the formula
$$ x_{k+1} = x_k-\frac{f(x_k)}{f'(x_k)} $$
and we obtain
$$ x_{1} = x_0-\frac{f(x_0)}{f'(x_0)},\qquad x_{2} = x_1-\frac{f(x_1)}{f'(x_1)}, \qquad x_{3} = x_2-\frac{f(x_2)}{f'(x_2)},\quad \ldots \quad \rightarrow \quad \alpha $$
If we want to find a maximum or a minimum of a sufficiently regular function $f$, the necessary condition is
$$ f'(x)=0 $$
That is, we look for a zero of this equation and, with Newton's method, taking an initial guess $x_0$, we iterate with the formula
$$ x_{k+1} = x_k-\frac{f'(x_k)}{f''(x_k)} $$
Similarly, for functions of $n$ variables $f:\Omega\subset \mathbb{R}^n\to \mathbb{R}$, taking into account that the components of the gradient vector $\nabla f(\mathbf{x})$ are its first partial derivatives, and the components of the Hessian matrix $H(\mathbf{x})$ are its second partial derivatives, we can formulate Newton's method starting from an initial value $\mathbf{x}_0$:
$$ \mathbf{x}_{k+1} = \mathbf{x}_{k}-H^{-1}(\mathbf{x}_{k}) \cdot \nabla f(\mathbf{x}_{k}) $$
This is not a proof: in both the one-dimensional and the multidimensional cases, Newton's method can be derived from the corresponding Taylor formula, and that would be the formal justification.
Since in the following exercise we are going to use a two-variable function, we will iterate with Newton's method using the formula
$$ \left(\begin{array}{c} x_{1}\\ y_{1} \end{array}\right)=\left(\begin{array}{c} x_{0}\\ y_{0} \end{array}\right)-H^{-1}\left(x_{0},y_{0}\right)\cdot\nabla f\left(x_{0},y_{0}\right) $$
If we take into account that
$$ H\left(x_{0},y_{0}\right)\left(\begin{array}{c} c_{1}\\ c_{2} \end{array}\right)=\nabla f\left(x_{0},y_{0}\right)\qquad (1) $$
then
$$ \left(\begin{array}{c} c_{1}\\ c_{2} \end{array}\right)=H^{-1}\left(x_{0},y_{0}\right)\cdot\nabla f\left(x_{0},y_{0}\right) $$
and we write
$$ \left(\begin{array}{c} x_{1}\\ y_{1} \end{array}\right)=\left(\begin{array}{c} x_{0}\\ y_{0} \end{array}\right)-\left(\begin{array}{c} c_{1}\\ c_{2} \end{array}\right)\qquad (2) $$
where $\left(c_{1},c_{2}\right)^{T}$ is the solution of system (1). In general, it makes more sense to solve system (1) than to compute the inverse matrix of $H$ and then multiply it by the gradient: computing the inverse of a matrix is equivalent to solving $n$ linear systems (although with the same coefficient matrix), whereas here we solve only one system.
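For example, one Newton step with the linear system solved by np.linalg.solve instead of inverting the Hessian; gradf and Hf are assumed to return a numpy vector and matrix for a given point xk:
def newton_step(gradf, Hf, xk):
    # solve system (1), H(xk) c = grad f(xk), without forming the inverse
    c = np.linalg.solve(Hf(xk), gradf(xk))
    return xk - c                      # update (2): x_{k+1} = x_k - c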
Write a function newton(f,gradf,Hf,x0,tol=1.e-6,maxiter=100) whose input parameters are the lambda functions f, gradf (gradient of $f$), Hf (Hessian matrix of $f$) and the numpy array x0 with the initial guess, the tolerance tol and the maximum number of iterations, and, as output, two numpy arrays x and y that contain the coordinates $x$ and $y$ of the iterates and the number of performed iterations. Using this function compute the minima of the function $$f(x,y) = \ln(12+2\,x^2-4\,x\,y+y^4)$$
The elements of the numpy array returned by gradf and the elements of the matrix Hf are defined using centered numerical differentiation formulas.
The tolerance will be tol = 0.001 and the stopping criterion $\left\Vert \nabla f \right\Vert\lt$ tol.
Try with different initial guesses until we have two sequences, one for each minimum.
Plot the function and the sequences of points.
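A sketch of how f, gradf and Hf might be defined for this exercise, with the partial derivatives approximated by centered difference formulas; the step hd and the choice of signatures (f takes two arguments, gradf and Hf take a point as a numpy array, and Hf is written with def for readability) are assumptions:
f = lambda x, y: np.log(12 + 2*x**2 - 4*x*y + y**4)
hd = 1.e-4                                        # assumed differentiation step

# gradient: centered first-difference approximations of f_x and f_y
gradf = lambda v: np.array([
    (f(v[0] + hd, v[1]) - f(v[0] - hd, v[1])) / (2 * hd),
    (f(v[0], v[1] + hd) - f(v[0], v[1] - hd)) / (2 * hd)])

# Hessian: centered second-difference approximations of f_xx, f_yy and f_xy
def Hf(v):
    x0, y0 = v
    fxx = (f(x0 + hd, y0) - 2*f(x0, y0) + f(x0 - hd, y0)) / hd**2
    fyy = (f(x0, y0 + hd) - 2*f(x0, y0) + f(x0, y0 - hd)) / hd**2
    fxy = (f(x0 + hd, y0 + hd) - f(x0 + hd, y0 - hd)
           - f(x0 - hd, y0 + hd) + f(x0 - hd, y0 - hd)) / (4 * hd**2)
    return np.array([[fxx, fxy], [fxy, fyy]])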
%run Exercise4.py
The point $(0,0)$ is a saddle point: if we intersect the surface with the plane $y=x$, the origin is a maximum of the resulting curve, but if we intersect it with the plane $y=-x$, it is a minimum.
g1 = lambda h: f(h,h)
g2 = lambda h: f(h,-h)
xx = np.linspace(-1,1)
plt.figure()
plt.plot(xx,g1(xx),label='Intersection of $f$ with the plane $y = x$')
plt.plot(xx,g2(xx),label='Intersection of $f$ with the plane $y = -x$')
plt.plot(0,g1(0),'ro')
plt.legend()
plt.show()
The idea of the penalty function is to replace the objective function $f$ by another function
$$F(x,y)=f(x,y)+c\,P(x,y)$$
and solve the problem without constraints. We take a positive constant $c$ and a function $P$ satisfying:
- $P$ is continuous in the domain of $f$,
- $P(x,y)\geq 0$ for every point in the domain of $f$, and
- $P(x,y)=0$ if and only if the point $(x,y)$ satisfies the constraints.
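For an equality constraint such as $x^{2}+y^{2}=1$ (used in the next exercise), a natural choice is $P(x,y)=(x^{2}+y^{2}-1)^{2}$. A minimal sketch of the penalized objective, with the constant $c=10$ taken from the note further below:
fobj = lambda x, y: -y                        # original objective
P = lambda x, y: (x**2 + y**2 - 1)**2         # P = 0 exactly on the constraint circle
c = 10                                        # penalty constant
F = lambda x, y: fobj(x, y) + c * P(x, y)     # unconstrained objective to minimize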
Minimize the function $f(x,y)=-y$ with the constraint $x^{2}+y^{2}=1$ using the penalty function and the steepest descent method.
The components of the gradient fx and fy will be defined using centered numerical differentiation formulas.
The tolerance will be tol = 0.01 and the stopping criterion $\left\Vert \nabla f \right\Vert\lt$ tol.
Try with different initial guesses until the sequence converges to the minimum.
Plot the function and the sequence of points.
Note: A possible $F$ would be
$$ F(x,y) = -y + 10\,(x^2+y^2-1)^2$$
%run Exercise5.py
Minimize the function $f(x,y)=x^{2}+y^{2}+x\,y-3\,x$ with the constraints $x\geq0$ and $y\geq0$ using the penalty function and the steepest descent method.
The components of the gradient fx and fy will be defined using centered numerical differentiation formulas.
The tolerance will be tol = 0.01 and the stopping criterion $\left\Vert \nabla f \right\Vert\lt$ tol.
Try with different initial guesses until the sequence converges to the minimum.
Plot the function and the sequence of points.
Note: A possible $F$ would be
$$F(x,y) = f(x,y) + 20\, g_1(x,y) + 20\, g_2(x,y)$$
where
$$ g_1(x,y)=\begin{cases} 0 & \text{if } x\ge 0\\ x^2 & \text{if } x\lt 0 \end{cases} \qquad g_2(x,y)=\begin{cases} 0 & \text{if } y\ge 0\\ y^2 & \text{if } y\lt 0 \end{cases} $$
%run Exercise6.py
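A sketch of these penalty terms in NumPy, using np.where to keep them vectorized; the names follow the note above:
g1 = lambda x, y: np.where(x >= 0, 0.0, x**2)     # penalizes x < 0
g2 = lambda x, y: np.where(y >= 0, 0.0, y**2)     # penalizes y < 0
f = lambda x, y: x**2 + y**2 + x*y - 3*x
F = lambda x, y: f(x, y) + 20*g1(x, y) + 20*g2(x, y)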