SciPy, optimization

SciPy (pronounced "Sigh Pie") is a package of mathematical algorithms built on the NumPy extension of Python. With SciPy, an interactive Python session becomes a complete environment for data processing and prototyping of complex systems, on par with MATLAB, IDL, Octave, R-Lab, and SciLab. Today I want to talk briefly about how to apply some well-known optimization algorithms from the scipy.optimize package. More detailed and up-to-date help on using its functions can always be obtained with the help() command or with Shift+Tab.

Introduction

To spare myself and the readers from searching and reading primary sources, links to descriptions of the methods will mostly point to Wikipedia. As a rule, that information is sufficient to understand the methods in general terms and the conditions for their applicability. To understand the mathematical essence of the methods, follow the links to more authoritative publications, which can be found at the end of each article or in your favorite search engine.

So, the scipy.optimize module includes the implementation of the following procedures:

  1. Constrained and unconstrained minimization of scalar functions of several variables (minimize) using various algorithms (Nelder-Mead simplex, BFGS, Newton conjugate gradients, COBYLA and SLSQP)
  2. Global optimization (for example: basinhopping, differential_evolution)
  3. Least-squares minimization of residuals (least_squares) and curve fitting by nonlinear least squares (curve_fit)
  4. Minimization of scalar functions of one variable (minimize_scalar) and root finding (root_scalar)
  5. Solvers for systems of multivariate equations (root) using various algorithms (hybrid Powell, Levenberg-Marquardt, or large-scale methods such as Newton-Krylov).

In this article, we will consider only the first item from this entire list.

Unconstrained minimization of a scalar function of several variables

The minimize function from the scipy.optimize package provides a common interface for constrained and unconstrained minimization of scalar functions of several variables. To demonstrate how it works, we need a suitable function of several variables that we will minimize in different ways.

The Rosenbrock function of N variables is a perfect fit for this purpose:

$$f(\mathbf{x}) = \sum_{i=0}^{N-2}\left[100\left(x_{i+1} - x_i^2\right)^2 + \left(1 - x_i\right)^2\right]$$

Even though the Rosenbrock function, its gradient (vector of first derivatives) and its Hessian (matrix of second derivatives) are already defined in the scipy.optimize package, we will define them ourselves.
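For reference, here is a small sketch (my own addition) showing that the built-in versions can be imported and evaluated directly:

from scipy.optimize import rosen, rosen_der, rosen_hess
import numpy as np

x = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
print(rosen(x))             # function value at x
print(rosen_der(x))         # gradient at x
print(rosen_hess(x).shape)  # (5, 5) Hessian matrix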

import numpy as np

def rosen(x):
    """The Rosenbrock function"""
    return np.sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0, axis=0)
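A quick sanity check (my own addition): the global minimum of the Rosenbrock function is 0, reached when all coordinates are equal to one.

print(rosen(np.ones(5)))  # 0.0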

For clarity, let's plot the values of the Rosenbrock function of two variables in 3D.

Code for rendering

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter

# Set up the 3D figure
fig = plt.figure(figsize=[15, 10])
ax = fig.add_subplot(projection='3d')

# Set the viewing angle
ax.view_init(45, 30)

# Create the data grid for the plot
X = np.arange(-2, 2, 0.1)
Y = np.arange(-1, 3, 0.1)
X, Y = np.meshgrid(X, Y)
Z = rosen(np.array([X, Y]))

# Draw the surface
surf = ax.plot_surface(X, Y, Z, cmap=cm.coolwarm)
plt.show()

[Figure: 3D surface plot of the Rosenbrock function of two variables]

Knowing in advance that the minimum is 0 at $x_i = 1$, let's look at examples of how to determine the minimum of the Rosenbrock function using various scipy.optimize procedures.

Nelder-Mead simplex method (Nelder-Mead)

Let x0 be an initial point in 5-dimensional space. Let's find the minimum point of the Rosenbrock function closest to it using the Nelder-Mead simplex algorithm (the algorithm is specified as the value of the method parameter):

from scipy.optimize import minimize
x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
res = minimize(rosen, x0, method='nelder-mead',
    options={'xtol': 1e-8, 'disp': True})
print(res.x)

Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 339
         Function evaluations: 571
[1. 1. 1. 1. 1.]

The simplex method is the simplest way to minimize an explicitly defined and fairly smooth function. It does not require computing the derivatives of the function; it is enough to supply only its values. The Nelder-Mead method is a good choice for simple minimization problems. However, since it does not use gradient estimates, it may take longer to find the minimum.
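Note that in recent SciPy versions the Nelder-Mead tolerance options are spelled xatol and fatol (absolute tolerances on x and on the function value); a minimal sketch of the same call with the newer option name:

# Same minimization, using the current option name for the absolute tolerance on x.
res = minimize(rosen, x0, method='nelder-mead',
               options={'xatol': 1e-8, 'disp': True})
print(res.x)  # expected: approximately [1. 1. 1. 1. 1.]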

Powell method

Another optimization algorithm in which only function values are calculated is Powell's method. To use it, you need to set method='powell' in the minimize function.

x0 = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
res = minimize(rosen, x0, method='powell',
    options={'xtol': 1e-8, 'disp': True})
print(res.x)

Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 19
         Function evaluations: 1622
[1. 1. 1. 1. 1.]

Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm

To obtain faster convergence to the solution, the BFGS procedure uses the gradient of the objective function. The gradient can be specified as a function or estimated with first-order finite differences. Either way, the BFGS method usually requires fewer function calls than the simplex method.
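As a small illustration (my own addition) of the finite-difference route, BFGS can be called without supplying jac at all, in which case the gradient is approximated numerically:

# BFGS with a numerically approximated gradient (no jac supplied).
res = minimize(rosen, x0, method='BFGS', options={'disp': True})
print(res.x)  # expected: approximately [1. 1. 1. 1. 1.]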

Let's find the derivative of the Rosenbrock function in an analytical form:

$$\frac{\partial f}{\partial x_j} = \sum_{i=1}^{N-1} \left[ 200\left(x_i - x_{i-1}^2\right)\left(\delta_{i,j} - 2x_{i-1}\,\delta_{i-1,j}\right) - 2\left(1 - x_{i-1}\right)\delta_{i-1,j} \right]$$

$$= 200\left(x_j - x_{j-1}^2\right) - 400x_j\left(x_{j+1} - x_j^2\right) - 2\left(1 - x_j\right)$$

This expression is valid for the derivatives of all variables except the first and last, which are defined as:

$$\frac{\partial f}{\partial x_0} = -400x_0\left(x_1 - x_0^2\right) - 2\left(1 - x_0\right)$$

$$\frac{\partial f}{\partial x_{N-1}} = 200\left(x_{N-1} - x_{N-2}^2\right)$$

Let's look at a Python function that calculates this gradient:

def rosen_der(x):
    xm = x[1:-1]
    xm_m1 = x[:-2]
    xm_p1 = x[2:]
    der = np.zeros_like(x)
    der[1:-1] = 200*(xm - xm_m1**2) - 400*(xm_p1 - xm**2)*xm - 2*(1 - xm)
    der[0] = -400*x[0]*(x[1] - x[0]**2) - 2*(1 - x[0])
    der[-1] = 200*(x[-1] - x[-2]**2)
    return der
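As a quick verification (my own addition), scipy.optimize.check_grad compares an analytic gradient against a finite-difference approximation; the returned value should be small relative to the gradient's magnitude:

from scipy.optimize import check_grad

# Norm of the difference between the analytic and finite-difference gradients at a test point.
print(check_grad(rosen, rosen_der, np.array([1.3, 0.7, 0.8, 1.9, 1.2])))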

The gradient calculation function is specified as the value of the jac parameter of the minimize function, as shown below.

res = minimize(rosen, x0, method='BFGS', jac=rosen_der, options={'disp': True})
print(res.x)

Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 25
         Function evaluations: 30
         Gradient evaluations: 30
[1.00000004 1.0000001  1.00000021 1.00000044 1.00000092]

Newton conjugate gradient algorithm (Newton-CG)

The Newton conjugate gradient algorithm is a modification of Newton's method.
Newton's method is based on approximating a function in a local region by a second-degree polynomial:

$$f(\mathbf{x}) \approx f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0)\cdot(\mathbf{x} - \mathbf{x}_0) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_0)^{T}\mathbf{H}(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)$$

where $\mathbf{H}(\mathbf{x}_0)$ is the matrix of second derivatives (the Hessian).
If the Hessian is positive definite, then the local minimum of this function can be found by setting the gradient of the quadratic form to zero. The result is the expression:

$$\mathbf{x}_{\mathrm{opt}} = \mathbf{x}_0 - \mathbf{H}^{-1}\nabla f(\mathbf{x}_0)$$
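To illustrate this expression, here is a minimal sketch (my own addition, borrowing the built-in rosen_hess from scipy.optimize since our own Hessian function is only defined below) of a single full Newton step on the two-variable Rosenbrock function:

from scipy.optimize import rosen_hess  # built-in Hessian of the Rosenbrock function

x = np.array([1.3, 0.7])
# Solve H * step = grad rather than forming the inverse Hessian explicitly.
step = np.linalg.solve(rosen_hess(x), rosen_der(x))
x_new = x - step
print(rosen(x), rosen(x_new))  # here a single Newton step reduces f dramatically

In general a pure Newton step is not guaranteed to decrease the function, which is why the methods below combine it with a line search or a trust region.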

In the Newton-CG method, the inverse Hessian is not formed explicitly; instead, the Newton direction is computed approximately with the conjugate gradient method. An example of using this method to minimize the Rosenbrock function is given below. To use the Newton-CG method, you must define a function that calculates the Hessian.
The Hessian of the Rosenbrock function in analytical form is:

$$H_{ij} = \frac{\partial^2 f}{\partial x_j \partial x_i} = 200\left(\delta_{i,j} - 2x_{i-1}\delta_{i-1,j}\right) - 400x_i\left(\delta_{i+1,j} - 2x_i\delta_{i,j}\right) - 400\delta_{i,j}\left(x_{i+1} - x_i^2\right) + 2\delta_{i,j}$$

$$= \left(202 + 1200x_i^2 - 400x_{i+1}\right)\delta_{i,j} - 400x_i\delta_{i+1,j} - 400x_{i-1}\delta_{i-1,j}$$

where $i, j \in [1, N-2]$, and $i, j \in [0, N-1]$ index the $N \times N$ matrix $\mathbf{H}$.

The remaining non-zero elements of the matrix are equal:

$$\frac{\partial^2 f}{\partial x_0^2} = 1200x_0^2 - 400x_1 + 2$$

$$\frac{\partial^2 f}{\partial x_0 \partial x_1} = \frac{\partial^2 f}{\partial x_1 \partial x_0} = -400x_0$$

$$\frac{\partial^2 f}{\partial x_{N-1} \partial x_{N-2}} = \frac{\partial^2 f}{\partial x_{N-2} \partial x_{N-1}} = -400x_{N-2}$$

$$\frac{\partial^2 f}{\partial x_{N-1}^2} = 200$$

For example, in the five-dimensional space N = 5, the Hessian matrix for the Rosenbrock function has a band form:

$$\mathbf{H} = \begin{pmatrix}
1200x_0^2 - 400x_1 + 2 & -400x_0 & 0 & 0 & 0 \\
-400x_0 & 202 + 1200x_1^2 - 400x_2 & -400x_1 & 0 & 0 \\
0 & -400x_1 & 202 + 1200x_2^2 - 400x_3 & -400x_2 & 0 \\
0 & 0 & -400x_2 & 202 + 1200x_3^2 - 400x_4 & -400x_3 \\
0 & 0 & 0 & -400x_3 & 200
\end{pmatrix}$$

The code that computes this Hessian, along with the code for minimizing the Rosenbrock function using the Newton conjugate gradient method:

def rosen_hess(x):
    x = np.asarray(x)
    H = np.diag(-400*x[:-1],1) - np.diag(400*x[:-1],-1)
    diagonal = np.zeros_like(x)
    diagonal[0] = 1200*x[0]**2-400*x[1]+2
    diagonal[-1] = 200
    diagonal[1:-1] = 202 + 1200*x[1:-1]**2 - 400*x[2:]
    H = H + np.diag(diagonal)
    return H

res = minimize(rosen, x0, method='Newton-CG', 
               jac=rosen_der, hess=rosen_hess,
               options={'xtol': 1e-8, 'disp': True})
print(res.x)

Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 24
         Function evaluations: 33
         Gradient evaluations: 56
         Hessian evaluations: 24
[1.         1.         1.         0.99999999 0.99999999]
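As a verification step (my own addition), the analytic Hessian can be checked against central finite differences of the analytic gradient:

# Compare the analytic Hessian with a central finite-difference
# approximation built from the analytic gradient.
def hess_fd(x, eps=1e-6):
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        H[:, i] = (rosen_der(x + e) - rosen_der(x - e)) / (2 * eps)
    return H

x_test = np.array([1.3, 0.7, 0.8, 1.9, 1.2])
print(np.allclose(rosen_hess(x_test), hess_fd(x_test), atol=1e-3))  # True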

An example with a Hessian-vector product function

In real problems, the calculation and storage of the entire Hessian matrix can require significant time and memory resources. In this case, there is actually no need to specify the Hessian matrix itself, since the minimization procedure only needs a vector equal to the product of the Hessian with another arbitrary vector. Thus, from a computational point of view, it is much more preferable to immediately define a function that returns the result of the product of the Hessian with an arbitrary vector.

Consider the hessp function, which takes the minimization vector as its first argument and an arbitrary vector as its second argument (along with the other arguments of the function to be minimized). In our case, it is not very difficult to calculate the product of the Hessian of the Rosenbrock function with an arbitrary vector. If $\mathbf{p}$ is an arbitrary vector, then the product $\mathbf{H}(\mathbf{x})\,\mathbf{p}$ has the form:

$$\mathbf{H}(\mathbf{x})\,\mathbf{p} = \begin{pmatrix}
\left(1200x_0^2 - 400x_1 + 2\right)p_0 - 400x_0 p_1 \\
\vdots \\
-400x_{i-1}p_{i-1} + \left(202 + 1200x_i^2 - 400x_{i+1}\right)p_i - 400x_i p_{i+1} \\
\vdots \\
-400x_{N-2}p_{N-2} + 200p_{N-1}
\end{pmatrix}$$

The function that calculates the product of the Hessian and an arbitrary vector is passed as the value of the hessp argument to the minimize function:

def rosen_hess_p(x, p):
    x = np.asarray(x)
    Hp = np.zeros_like(x)
    Hp[0] = (1200*x[0]**2 - 400*x[1] + 2)*p[0] - 400*x[0]*p[1]
    Hp[1:-1] = (-400*x[:-2]*p[:-2] + (202 + 1200*x[1:-1]**2 - 400*x[2:])*p[1:-1]
                - 400*x[1:-1]*p[2:])
    Hp[-1] = -400*x[-2]*p[-2] + 200*p[-1]
    return Hp

res = minimize(rosen, x0, method='Newton-CG',
               jac=rosen_der, hessp=rosen_hess_p,
               options={'xtol': 1e-8, 'disp': True})

Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 24
         Function evaluations: 33
         Gradient evaluations: 56
         Hessian evaluations: 66
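A consistency check (my own addition): for any vector p, the hand-coded product should match multiplying the full Hessian by p.

# The hessp implementation should agree with an explicit Hessian-vector product.
p = np.arange(1.0, 6.0)
print(np.allclose(rosen_hess_p(x0, p), rosen_hess(x0) @ p))  # True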

Trust-region Newton conjugate gradient algorithm (trust-ncg)

Poor conditioning of the Hessian matrix and misleading search directions can make Newton's conjugate gradient algorithm inefficient. In such cases, preference is given to the trust-region Newton conjugate gradient method.

An example with the definition of the Hessian matrix:

res = minimize(rosen, x0, method='trust-ncg',
               jac=rosen_der, hess=rosen_hess,
               options={'gtol': 1e-8, 'disp': True})
print(res.x)

Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 20
         Function evaluations: 21
         Gradient evaluations: 20
         Hessian evaluations: 19
[1. 1. 1. 1. 1.]

An example with the Hessian-vector product function:

res = minimize(rosen, x0, method='trust-ncg', 
                jac=rosen_der, hessp=rosen_hess_p, 
                options={'gtol': 1e-8, 'disp': True})
print(res.x)

Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 20
         Function evaluations: 21
         Gradient evaluations: 20
         Hessian evaluations: 0
[1. 1. 1. 1. 1.]

Krylov-type methods

Like trust-ncg, Krylov-type methods are well suited for solving large-scale problems because they use only matrix-vector products. Their essence is solving the trust-region subproblem restricted to a truncated Krylov subspace. For indefinite problems it is usually better to use this method, since it reduces the number of nonlinear iterations at the cost of a few more matrix-vector products per subproblem compared to trust-ncg. In addition, the quadratic subproblem is solved more accurately than with the trust-ncg method.
An example with the definition of the Hessian matrix:

res = minimize(rosen, x0, method='trust-krylov',
               jac=rosen_der, hess=rosen_hess,
               options={'gtol': 1e-8, 'disp': True})

Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 19
         Function evaluations: 20
         Gradient evaluations: 20
         Hessian evaluations: 18

print(res.x)

    [1. 1. 1. 1. 1.]

An example with the Hessian-vector product function:

res = minimize(rosen, x0, method='trust-krylov',
               jac=rosen_der, hessp=rosen_hess_p,
               options={'gtol': 1e-8, 'disp': True})

Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 19
         Function evaluations: 20
         Gradient evaluations: 20
         Hessian evaluations: 0

print(res.x)

    [1. 1. 1. 1. 1.]

Nearly exact trust-region algorithm (trust-exact)

All of the preceding methods (Newton-CG, trust-ncg, and trust-krylov) are well suited for solving large-scale problems (thousands of variables). This is because the underlying conjugate gradient algorithm finds the solution approximately and iteratively, without explicitly forming or inverting the Hessian. Since only a function for the product of the Hessian with an arbitrary vector needs to be defined, these algorithms are especially good for working with sparse (banded) Hessians, which keeps memory costs low and yields significant time savings.
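To illustrate the point, here is a rough sketch (my own addition, using the Rosenbrock helpers built into scipy.optimize, including rosen_hess_prod) of the same hessp-based approach on a much larger problem, without ever forming the full Hessian:

import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess_prod

# A 1000-dimensional Rosenbrock problem: only Hessian-vector products are needed.
x0_big = np.full(1000, 1.2)
res_big = minimize(rosen, x0_big, method='Newton-CG',
                   jac=rosen_der, hessp=rosen_hess_prod,
                   options={'xtol': 1e-8})
print(res_big.fun)  # expected to be close to 0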

For medium-sized problems, the cost of storing and factoring the Hessian is not critical. This means that it is possible to obtain a solution in fewer iterations by solving the trust-region subproblems almost exactly. To do this, a system of nonlinear equations is solved iteratively for each quadratic subproblem, which usually requires 3 or 4 Cholesky factorizations of the Hessian matrix. As a result, the method converges in fewer iterations and requires fewer objective function evaluations than the other implemented trust-region methods. This algorithm requires the full Hessian matrix to be specified and does not support the Hessian-vector product option.

Example with minimization of the Rosenbrock function:

res = minimize(rosen, x0, method='trust-exact',
               jac=rosen_der, hess=rosen_hess,
               options={'gtol': 1e-8, 'disp': True})
res.x

Optimization terminated successfully.
         Current function value: 0.000000
         Iterations: 13
         Function evaluations: 14
         Gradient evaluations: 13
         Hessian evaluations: 14

array([1., 1., 1., 1., 1.])

That is where we will stop for now. In the next article, I will try to cover the most interesting things about constrained minimization, applying minimization to approximation problems, minimizing a function of one variable, custom minimizers, and finding the roots of a system of equations with the scipy.optimize package.

Source: https://docs.scipy.org/doc/scipy/reference/

Source: habr.com
