10. Solvers, Optimizers #

Contents

Solvers, Optimizers

10.1. Overview #

In this lecture we introduce a few of the Julia libraries for solving optimization problems, systems of equations, and finding fixed-points.

See auto-differentiation for more on calculating gradients and Jacobians for these types of algorithms.

using LinearAlgebra, Statistics
using ForwardDiff, Optim, Roots, NLsolve
using Optim: converged, maximum, maximizer, minimizer, iterations #some extra functions

10.2. Optimization #

There are a large number of packages intended to be used for optimization in Julia.

Part of the reason for the diversity of options is that Julia makes it possible to efficiently implement a large number of variations on optimization routines.

The other reason is that different types of optimization problems require different algorithms.

10.2.1. Optim.jl#

A good pure-Julia solution for the (unconstrained or box-bounded) optimization of univariate and multivariate function is the Optim.jl package.

By default, the algorithms in Optim.jl target minimization rather than maximization, so if a function is called optimize it will mean minimization.

10.2.1.1. Univariate Functions on Bounded Intervals#

Univariate optimization defaults to a robust hybrid optimization routine called Brent’s method.

using Optim: converged, maximum, maximizer, minimizer, iterations #some extra functions

result = optimize(x -> x^2, -2.0, 1.0)

Results of Optimization Algorithm
 * Status: success
 * Algorithm: Brent's Method
 * Search Interval: [-2.000000, 1.000000]
 * Minimizer: 0.000000e+00
 * Minimum: 0.000000e+00
 * Iterations: 5
 * Convergence: max(|x - x_upper|, |x - x_lower|) <= 2*(1.5e-08*|x|+2.2e-16): true
 * Objective Function Calls: 6

Always check if the results converged, and throw errors otherwise

converged(result) || error("Failed to converge in $(iterations(result)) iterations")
xmin = result.minimizer
result.minimum

0.0

The first line is a logical OR between converged(result) and error("...").

If the convergence check passes, the logical sentence is true, and it will proceed to the next line; if not, it will throw the error.

10.2.1.2. Unconstrained Multivariate Optimization#

There are a variety of algorithms and options for multivariate optimization.

From the documentation, the simplest version is

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x_iv = [0.0, 0.0]
results = optimize(f, x_iv) # i.e. optimize(f, x_iv, NelderMead())

 * Status: success

 * Candidate solution
    Final objective value:     3.525527e-09

 * Found with
    Algorithm:     Nelder-Mead

 * Convergence measures
    √(Σ(yᵢ-ȳ)²)/n ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    60
    f(x) calls:    117

The default algorithm in NelderMead, which is derivative-free and hence requires many function evaluations.

To change the algorithm type to L-BFGS

results = optimize(f, x_iv, LBFGS())
println("minimum = $(results.minimum) with argmin = $(results.minimizer) in " *
        "$(results.iterations) iterations")

minimum = 5.378388330692143e-17 with argmin = [0.9999999926662504, 0.9999999853325008] in 24 iterations

Note that this has fewer iterations.

As no derivative was given, it used finite differences to approximate the gradient of f(x).

However, since most of the algorithms require derivatives, you will often want to use auto differentiation or pass analytical gradients if possible.

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x_iv = [0.0, 0.0]
results = optimize(f, x_iv, LBFGS(), autodiff = :forward) # i.e. use ForwardDiff.jl
println("minimum = $(results.minimum) with argmin = $(results.minimizer) in " *
        "$(results.iterations) iterations")

minimum = 5.191703158437428e-27 with argmin = [0.999999999999928, 0.9999999999998559] in 24 iterations

Note that we did not need to use ForwardDiff.jl directly, as long as our f(x) function was written to be generic (see the generic programming lecture ).

Alternatively, with an analytical gradient

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x_iv = [0.0, 0.0]
function g!(G, x)
    G[1] = -2.0 * (1.0 - x[1]) - 400.0 * (x[2] - x[1]^2) * x[1]
    G[2] = 200.0 * (x[2] - x[1]^2)
end

results = optimize(f, g!, x_iv, LBFGS()) # or ConjugateGradient()
println("minimum = $(results.minimum) with argmin = $(results.minimizer) in " *
        "$(results.iterations) iterations")

minimum = 5.191703158437428e-27 with argmin = [0.999999999999928, 0.9999999999998559] in 24 iterations

For derivative-free methods, you can change the algorithm – and have no need to provide a gradient

f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x_iv = [0.0, 0.0]
results = optimize(f, x_iv, SimulatedAnnealing()) # or ParticleSwarm() or NelderMead()

 * Status: failure (reached maximum number of iterations)

 * Candidate solution
    Final objective value:     2.893426e-02

 * Found with
    Algorithm:     Simulated Annealing

 * Convergence measures
    |x - x'|               = NaN ≰ 0.0e+00
    |x - x'|/|x'|          = NaN ≰ 0.0e+00
    |f(x) - f(x')|         = NaN ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = NaN ≰ 0.0e+00
    |g(x)|                 = NaN ≰ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    1000
    f(x) calls:    1001

However, you will note that this did not converge, as stochastic methods typically require many more iterations as a tradeoff for their global-convergence properties.

See the maximum likelihood example and the accompanying Jupyter notebook.

10.2.2. JuMP.jl#

The JuMP.jl package is an ambitious implementation of a modelling language for optimization problems in Julia.

In that sense, it is more like an AMPL (or Pyomo) built on top of the Julia language with macros, and able to use a variety of different commerical and open source solvers.

If you have a linear, quadratic, conic, mixed-integer linear, etc. problem then this will likely be the ideal “meta-package” for calling various solvers.

For nonlinear problems, the modelling language may make things difficult for complicated functions (as it is not designed to be used as a general-purpose nonlinear optimizer).

See the quick start guide for more details on all of the options.

The following is an example of calling a linear objective with a nonlinear constraint (provided by an external function).

Here Ipopt stands for Interior Point OPTimizer, a nonlinear solver in Julia

using JuMP, Ipopt
# solve
# max( x[1] + x[2] )
# st sqrt(x[1]^2 + x[2]^2) <= 1

function squareroot(x) # pretending we don't know sqrt()
    z = x # Initial starting point for Newton’s method
    while abs(z * z - x) > 1e-13
        z = z - (z * z - x) / (2z)
    end
    return z
end
m = Model(Ipopt.Optimizer)
# need to register user defined functions for AD
JuMP.register(m, :squareroot, 1, squareroot, autodiff = true)

@variable(m, x[1:2], start=0.5) # start is the initial condition
@objective(m, Max, sum(x))
@NLconstraint(m, squareroot(x[1]^2 + x[2]^2)<=1)
@show JuMP.optimize!(m)

And this is an example of a quadratic objective

# solve
# min (1-x)^2 + (100(y-x^2)^2)
# st x + y >= 10

using JuMP, Ipopt
m = Model(Ipopt.Optimizer) # settings for the solver
@variable(m, x, start=0.0)
@variable(m, y, start=0.0)

@NLobjective(m, Min, (1 - x)^2+100(y - x^2)^2)

JuMP.optimize!(m)
println("x = ", value(x), " y = ", value(y))

# adding a (linear) constraint
@constraint(m, x + y==10)
JuMP.optimize!(m)
println("x = ", value(x), " y = ", value(y))

10.2.3. BlackBoxOptim.jl#

Another package for doing global optimization without derivatives is BlackBoxOptim.jl.

An example for parallel execution of the objective is provided.

10.3. Systems of Equations and Least Squares #

10.3.1. Roots.jl#

A root of a real function $f$ on $[a, b]$ is an $x \in [a, b]$ such that $f (x) = 0$ .

For example, if we plot the function

(10.1)#

f (x) = \sin (4 (x - 1 / 4)) + x + x^{20} - 1

with $x \in [0, 1]$ we get

The unique root is approximately 0.408.

The Roots.jl package offers fzero() to find roots

using Roots
f(x) = sin(4 * (x - 1 / 4)) + x + x^20 - 1
fzero(f, 0, 1)

0.40829350427936706

10.3.2. NLsolve.jl#

The NLsolve.jl package provides functions to solve for multivariate systems of equations and fixed points.

From the documentation, to solve for a system of equations without providing a Jacobian

using NLsolve

f(x) = [(x[1] + 3) * (x[2]^3 - 7) + 18
        sin(x[2] * exp(x[1]) - 1)] # returns an array

results = nlsolve(f, [0.1; 1.2])

Results of Nonlinear Solver Algorithm
 * Algorithm: Trust-region with dogleg and autoscaling
 * Starting Point: [0.1, 1.2]
 * Zero: [-7.775548712324193e-17, 0.9999999999999999]
 * Inf-norm of residuals: 0.000000
 * Iterations: 4
 * Convergence: true
   * |x - x'| < 0.0e+00: false
   * |f(x)| < 1.0e-08: true
 * Function Calls (f): 5
 * Jacobian Calls (df/dx): 5

In the above case, the algorithm used finite differences to calculate the Jacobian.

Alternatively, if f(x) is written generically, you can use auto-differentiation with a single setting.

results = nlsolve(f, [0.1; 1.2], autodiff = :forward)

println("converged=$(NLsolve.converged(results)) at root=$(results.zero) in " *
        "$(results.iterations) iterations and $(results.f_calls) function calls")

converged=true at root=[-3.7818049096324184e-16, 1.0000000000000002] in 4 iterations and 5 function calls

Providing a function which operates inplace (i.e., modifies an argument) may help performance for large systems of equations (and hurt it for small ones).

function f!(F, x) # modifies the first argument
    F[1] = (x[1] + 3) * (x[2]^3 - 7) + 18
    F[2] = sin(x[2] * exp(x[1]) - 1)
end

results = nlsolve(f!, [0.1; 1.2], autodiff = :forward)

println("converged=$(NLsolve.converged(results)) at root=$(results.zero) in " *
        "$(results.iterations) iterations and $(results.f_calls) function calls")

converged=true at root=[-3.7818049096324184e-16, 1.0000000000000002] in 4 iterations and 5 function calls

10.4. LeastSquaresOptim.jl #

Many optimization problems can be solved using linear or nonlinear least squares.

Let $x \in R^{N}$ and $F (x) : R^{N} \to R^{M}$ with $M \geq N$ , then the nonlinear least squares problem is

min_{x} F (x)^{T} F (x)

While $F (x)^{T} F (x) \to R$ , and hence this problem could technically use any nonlinear optimizer, it is useful to exploit the structure of the problem.

In particular, the Jacobian of $F (x)$ , can be used to approximate the Hessian of the objective.

As with most nonlinear optimization problems, the benefits will typically become evident only when analytical or automatic differentiation is possible.

If $M = N$ and we know a root $F (x^{*}) = 0$ to the system of equations exists, then NLS is the defacto method for solving large systems of equations.

An implementation of NLS is given in LeastSquaresOptim.jl.

10. Solvers, Optimizers#

10.1. Overview#

10.2. Optimization#