## A PR Exercise – a.k.

In the last few posts we've been looking at the BFGS quasi-Newton algorithm for minimising multivariate functions. This uses iteratively updated approximations of the Hessian matrix of second partial derivatives in order to choose directions in which to search for univariate minima, saving the expense of calculating it explicitly. A particularly useful property of the algorithm is that if the line search satisfies the Wolfe conditions then the positive definiteness of the Hessian is preserved, meaning that the implied locally quadratic approximation of the function must have a minimum.
Unfortunately for large numbers of dimension the calculation of the approximation will still be relatively expensive and will require a significant amount of memory to store and so in this post we shall take a look at an algorithm that only uses the vector of first partial derivatives.

## Bring Out The Big Flipping GunS – a.k.

Last month we took a look at quasi-Newton multivariate function minimisation algorithms which use approximations of the Hessian matrix of second partial derivatives to choose line search directions. We demonstrated that the BFGS rule for updating the Hessian after each line search maintains its positive definiteness if they conform to the Wolfe conditions, ensuring that the locally quadratic approximation of the function defined by its value, the vector of first partial derivatives and the Hessian has a minimum.
Now that we've got the theoretical details out of the way it's time to get on with the implementation.

## Big Friendly GiantS – a.k.

In the previous post we saw how we could perform a univariate line search for a point that satisfies the Wolfe conditions meaning that it is reasonably close to a minimum and takes a lot less work to find than the minimum itself. Line searches are used in a class of multivariate minimisation algorithms which iteratively choose directions in which to proceed, in particular those that use approximations of the Hessian matrix of second partial derivatives of a function to do so, similarly to how the Levenberg-Marquardt multivariate inversion algorithm uses a diagonal matrix in place of the sum of the products of its Hessian matrices for each element and the error in that element's current value, and in this post we shall take a look at one of them.

## Wolfe It Down – a.k.

Last time we saw how we could efficiently invert a vector valued multivariate function with the Levenberg-Marquardt algorithm which replaces the sum of its second derivatives with respect to each element in its result multiplied by the difference from those of its target value with a diagonal matrix. Similarly there are minimisation algorithms that use approximations of the Hessian matrix of second partial derivatives to estimate directions in which the value of the function will decrease.
Before we take a look at them, however, we'll need a way to step toward minima in such directions, known as a line search, and in this post we shall see how we might reasonably do so.

## Do The Evolution – a.k.

In the last few posts we have taken a look at genetic algorithms, which use simple models of biological evolution to search for global maxima of functions, being those points at which they return their greatest possible values.
These models typically represent the arguments of the function as genes within the binary chromosomes of individuals whose fitnesses are the values of the function for those arguments, exchange genetic information between them with a crossover operator, make small random changes to them with a mutation operator and, most importantly, favour the fitter individuals in the population for reproduction into the next generation with a selection operator.
We used a theoretical analysis of a simple genetic algorithm to suggest improved versions of the crossover operator, as well as proposing more robust schemes for selection and the genetic encoding of the parameters.
In this post we shall use some of them to implement a genetic algorithm for the `ak` library.

## The Best Laid Schemata – a.k.

We have seen how we can exploit a simple model of biological evolution, known as a genetic algorithm, to search for global maxima of functions, being those points at which they return their greatest values.
This model treated the function being optimised as a non-negative measure of the fitness of individuals to survive and reproduce, replacing negative results with zero, and represented their chromosomes with arrays of bits which were mapped onto its arguments by treating subsets of them as integers that were linearly mapped to floating point numbers with given lower and upper bounds. It simulated sexual reproduction by splitting pairs of the chromosomes of randomly chosen individuals at a randomly chosen position and swapping their bits from it to their ends, and mutations by flipping randomly chosen bits from the chromosomes of randomly chosen individuals. Finally, and most crucially, it set the probability that an individual would be copied into the next generation to its fitness as a proportion of the total fitness of the population, ensuring that that total fitness would tend to increase from generation to generation.
I concluded by noting that, whilst the resulting algorithm was reasonably effective, it had some problems that a theoretical analysis would reveal and that is what we shall look into in this post.

## It’s All In The Genes – a.k.

Last time we took a look at the simulated annealing global minimisation algorithm which searches for points at which functions return their least possible values and which drew its inspiration from the metallurgical process of annealing which minimises the energy state of the crystalline structure of metals by first heating and then slowly cooling them.
Now as it happens, physics isn't the only branch of science from which we can draw inspiration for global optimisation algorithms. For example, in biology we have the process of evolution through which the myriad species of life on Earth have become extraordinarily well adapted to their environments. Put very simply this happens because offspring differ slightly from their parents and differences that reduce the chances that they will survive to have offspring of their own are less likely to be passed down through the generations than those that increase those chances.
Noting that extraordinarily well adapted is more or less synonymous with near maximally adapted, it's not unreasonable to suppose that we might exploit a mathematical model of evolution to search for global maxima of functions, being those points at which they return their greatest possible values.

## Annealing Down – a.k.

A few years ago we saw how we could search for a local minimum of a function, being a point for which it returns a lesser value than any in its immediate vicinity, by taking random steps and rejecting those that lead uphill; an algorithm that we dubbed the blindfolded hill climber. Whilst we have since seen that we could progress towards a minimum much more rapidly by choosing the directions in which to step deterministically, there is a way that we can use random steps to yield better results.