Simulated annealing (SA) is a probabilistic technique for approximating the global optimum of a given function. Specifically, it is a metaheuristic for approximate global optimization in a large search space. Even when the function has a large number of local optima, SA can often find the global optimum.[1] It is often used when the search space is discrete (for example, the traveling salesman problem, the Boolean satisfiability problem, protein structure prediction, and job-shop scheduling). For problems where, within a fixed amount of time, finding an approximate global optimum is more important than finding a precise local optimum, simulated annealing may be preferable to exact algorithms such as branch and bound or to local methods such as gradient descent.
The name of the algorithm comes from annealing in metallurgy, a technique involving heating and controlled cooling of a material to alter its physical properties. These properties depend on the material's thermodynamic free energy, and heating and cooling the material affects both its temperature and its thermodynamic free energy (Gibbs energy). Simulated annealing can be used for very hard computational optimization problems where exact algorithms fail; even though it usually achieves only an approximate solution to the global minimum, that approximation can be enough for many practical problems.
The problems solved by SA are typically formulated as an objective function of many variables, subject to several mathematical constraints. In practice, constraints can be handled by adding a penalty for their violation to the objective function.
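As a minimal sketch of the penalty approach, assuming a 0/1 knapsack-style problem (the item values, weights, capacity, and the linear penalty weight below are hypothetical illustration choices, not part of the method), an over-capacity selection remains a valid state but receives a higher energy:

```python
from typing import Sequence


def penalized_energy(
    selection: Sequence[int],
    values: Sequence[float],
    weights: Sequence[float],
    capacity: float,
    penalty: float = 10.0,
) -> float:
    """Energy to minimise for a hypothetical 0/1 knapsack instance.

    The raw objective is the negated total value of the chosen items; weight in
    excess of `capacity` is charged linearly instead of being forbidden, so every
    0/1 vector remains a valid state for the annealer.
    """
    total_value = sum(v for v, x in zip(values, selection) if x)
    total_weight = sum(w for w, x in zip(weights, selection) if x)
    violation = max(0.0, total_weight - capacity)
    return -total_value + penalty * violation


# Usage: the feasible choice gets the lower (better) energy.
values, weights, capacity = [6, 5, 4], [4, 3, 2], 5
over_capacity = penalized_energy([1, 1, 0], values, weights, capacity)  # value 11, weight 7
feasible = penalized_energy([0, 1, 1], values, weights, capacity)       # value 9, weight 5
```

A larger penalty weight pushes the search more strongly towards feasible states; a weight that is too small can make infeasible states look attractive.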
Similar techniques have been independently introduced on several occasions, including by Pincus (1970),[2] Khachaturyan et al. (1979,[3] 1981[4]), Kirkpatrick, Gelatt and Vecchi (1983), and Cerny (1985).[5] In 1983, Kirkpatrick, Gelatt Jr. and Vecchi used this approach to solve the traveling salesman problem. They also proposed its current name, simulated annealing.
This notion of slow cooling implemented in the simulated annealing algorithm is interpreted as a slow decrease in the probability of accepting worse solutions as the solution space is explored. Accepting worse solutions allows for a more extensive search for the global optimum. In general, simulated annealing algorithms work as follows. The temperature progressively decreases from an initial positive value to zero. At each time step, the algorithm randomly selects a solution close to the current one, measures its quality, and moves to it according to temperature-dependent probabilities: during the search, the probability of selecting a better solution remains at 1 (or at least positive), while the probability of selecting a worse one decreases toward zero.
The simulation can be performed either by a solution of kinetic equations for probability density functions,[6] [7] or by using a stochastic sampling method.[8] [9] The method is an adaptation of the Metropolis–Hastings algorithm, a Monte Carlo method to generate sample states of a thermodynamic system, published by N. Metropolis et al. in 1953.[10]
The state s of a physical system is analogous to a candidate solution of the optimization problem, and the function E(s) to be minimized is analogous to the internal energy of the system in that state. The goal is to bring the system from an arbitrary initial state to a state with the minimum possible energy.
At each step, the simulated annealing heuristic considers some neighboring state s* of the current state s, and probabilistically decides between moving the system to state s* or staying in state s. These probabilities ultimately lead the system to move to states of lower energy. Typically this step is repeated until the system reaches a state that is good enough for the application, or until a given computation budget has been exhausted.
Optimization of a solution involves evaluating the neighbors of a state of the problem, which are new states produced by conservatively altering a given state. For example, in the traveling salesman problem each state is typically defined as a permutation of the cities to be visited, and the neighbors of any state are the set of permutations produced by swapping any two of these cities. The well-defined way in which the states are altered to produce neighboring states is called a "move", and different moves give different sets of neighboring states. These moves usually result in minimal alterations of the last state, in an attempt to progressively improve the solution by iteratively improving its parts (such as the city connections in the traveling salesman problem). Reversing the order of an interval of cities is often an even better move: it is a smaller move, since swapping two cities can be achieved by reversing an interval twice.
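The two moves can be sketched in Python as follows, assuming a tour is a list of city indices (the function names are illustrative, not from any library). The last helper is a candidate generator that applies a random interval reversal; the final assertion checks the claim that a swap equals two reversals:

```python
import random
from typing import List


def swap_move(tour: List[int], i: int, j: int) -> List[int]:
    """Return a copy of `tour` with the cities at positions i and j exchanged."""
    new = tour[:]
    new[i], new[j] = new[j], new[i]
    return new


def reverse_move(tour: List[int], i: int, j: int) -> List[int]:
    """Return a copy of `tour` with the interval of positions i..j (inclusive) reversed."""
    new = tour[:]
    new[i:j + 1] = new[i:j + 1][::-1]
    return new


def random_reversal_neighbour(tour: List[int], rng: random.Random) -> List[int]:
    """Candidate generator: reverse a randomly chosen interval of the tour."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    return reverse_move(tour, i, j)


# A swap of positions i < j equals reversing i..j and then i+1..j-1.
tour = [0, 1, 2, 3, 4, 5]
assert swap_move(tour, 1, 4) == reverse_move(reverse_move(tour, 1, 4), 2, 3)
```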
Simple heuristics like hill climbing, which move by finding better neighbor after better neighbor and stop when they have reached a solution that has no better neighbors, cannot guarantee to lead to any of the existing better solutions: their outcome may easily be just a local optimum, while the actual best solution would be a global optimum that could be different. Metaheuristics use the neighbors of a solution as a way to explore the solution space, and although they prefer better neighbors, they also accept worse neighbors in order to avoid getting stuck in local optima; they can find the global optimum if run for a long enough amount of time.
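A small sketch of this failure mode, assuming a hypothetical one-dimensional landscape of ten integer states whose neighbours are x - 1 and x + 1: greedy descent from state 0 stops at the local minimum at state 2, whereas starting from state 5 it happens to reach the global minimum at state 7.

```python
from typing import Callable, List


def hill_climb(start: int, energy: Callable[[int], float], n: int) -> int:
    """Greedy descent on states 0..n-1, where the neighbours of x are x-1 and x+1.

    Moves to the lowest-energy neighbour while that strictly improves the energy,
    and stops as soon as no neighbour is better, i.e. at a local minimum.
    """
    x = start
    while True:
        neighbours = [y for y in (x - 1, x + 1) if 0 <= y < n]
        best = min(neighbours, key=energy)
        if energy(best) >= energy(x):
            return x  # no better neighbour: a local (not necessarily global) minimum
        x = best


# Hypothetical landscape: local minimum at state 2 (energy 1), global minimum at state 7 (energy 0).
landscape: List[float] = [5, 3, 1, 4, 6, 4, 2, 0, 3, 5]
stuck = hill_climb(0, lambda s: landscape[s], len(landscape))  # stops at 2
lucky = hill_climb(5, lambda s: landscape[s], len(landscape))  # reaches 7
```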
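The probability of making the transition from the current state $s$ to a candidate new state $s_{\mathrm{new}}$ is specified by an acceptance probability function $P(e, e_{\mathrm{new}}, T)$, which depends on the energies $e = E(s)$ and $e_{\mathrm{new}} = E(s_{\mathrm{new}})$ of the two states and on a global time-varying parameter $T$ called the temperature. States with lower energy are better than those with higher energy. The probability function $P$ must be positive even when $e_{\mathrm{new}}$ is greater than $e$; this prevents the method from becoming stuck at a local minimum that is worse than the global one.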
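When $T$ tends to zero, the probability $P(e, e_{\mathrm{new}}, T)$ must tend to zero if $e_{\mathrm{new}} > e$ and to a positive value otherwise. For sufficiently small values of $T$, the system then increasingly favors moves that go "downhill" (to lower energy values) and avoids those that go "uphill". With $T = 0$, the procedure reduces to a greedy algorithm that makes only downhill transitions.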
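In the original description of simulated annealing, the probability $P(e, e_{\mathrm{new}}, T)$ was equal to 1 when $e_{\mathrm{new}} < e$; that is, the procedure always moved downhill when it found a way to do so, irrespective of the temperature.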
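The $P$ function is usually chosen so that the probability of accepting a move decreases when the difference $e_{\mathrm{new}} - e$ increases; that is, small uphill moves are more likely than large ones.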
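Given these properties, the temperature $T$ plays a crucial role in controlling the evolution of the state $s$ of the system with regard to its sensitivity to variations of the system energy. To be precise, for a large $T$ the evolution of $s$ is sensitive to coarser energy variations, while it is sensitive to finer energy variations when $T$ is small.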
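The name and inspiration of the algorithm call for a feature related to temperature variation to be embedded in its operational characteristics, which necessitates a gradual reduction of the temperature as the simulation proceeds. The algorithm starts with $T$ set to a high value (or infinity), and the temperature is then decreased at each step following some annealing schedule, which may be specified by the user but must end with $T = 0$ towards the end of the allotted time budget.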
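Two common schedule shapes can be sketched as follows, assuming the schedule receives the fraction of the time budget already used (0 at the start, 1 at the end); the initial temperature `t0` and the floor `t_min` are hypothetical tuning parameters:

```python
def linear_schedule(fraction: float, t0: float) -> float:
    """Temperature falling linearly from t0 (fraction 0) to exactly 0 (fraction 1)."""
    return t0 * (1.0 - fraction)


def geometric_schedule(fraction: float, t0: float, t_min: float = 1e-3) -> float:
    """Exponential decay from t0 towards t_min, clamped to 0 once the budget is spent."""
    if fraction >= 1.0:
        return 0.0
    return t0 * (t_min / t0) ** fraction
```

The linear schedule reaches $T = 0$ exactly at the end of the budget; the geometric one decays smoothly and is clamped to 0 at the end so that it also finishes at zero temperature.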
For any given finite problem, the probability that the simulated annealing algorithm terminates with a global optimal solution approaches 1 as the annealing schedule is extended.[11] This theoretical result, however, is not particularly helpful, since the time required to ensure a significant probability of success will usually exceed the time required for a complete search of the solution space.
The simulated annealing heuristic as described above can be written as a short loop. It starts from an initial state and continues until a maximum number of steps has been taken. In the process, a neighbour-generation call should return a randomly chosen neighbour of a given state, and a random-number call should return a value in the range [0, 1], uniformly at random. The annealing schedule is defined by a temperature call, which should yield the temperature to use, given the fraction of the time budget that has been expended so far.
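A minimal Python sketch of that loop, assuming the Kirkpatrick acceptance rule (accept moves that do not increase the energy, otherwise accept with probability $\exp(-(e_{\mathrm{new}} - e)/T)$); `neighbour`, `temperature`, and the toy objective in the usage lines are hypothetical stand-ins for the problem-specific calls:

```python
import math
import random
from typing import Callable, TypeVar

S = TypeVar("S")


def simulated_annealing(
    s0: S,
    energy: Callable[[S], float],
    neighbour: Callable[[S, random.Random], S],
    temperature: Callable[[float], float],
    k_max: int,
    rng: random.Random,
) -> S:
    """Run k_max steps of simulated annealing from s0 and return the final state."""
    s, e = s0, energy(s0)
    for k in range(k_max):
        # The schedule maps the fraction of the budget expended to a temperature.
        t = temperature((k + 1) / k_max)
        s_new = neighbour(s, rng)
        e_new = energy(s_new)
        # Accept moves that do not increase the energy; accept uphill moves with
        # probability exp(-(e_new - e) / T), and never once T has reached 0.
        if e_new <= e or (t > 0 and rng.random() < math.exp(-(e_new - e) / t)):
            s, e = s_new, e_new
    return s


# Usage on a toy problem: minimise (x - 7)^2 over the integers with +/-1 moves.
rng = random.Random(0)
final_state = simulated_annealing(
    s0=0,
    energy=lambda x: (x - 7) ** 2,
    neighbour=lambda x, r: x + r.choice((-1, 1)),
    temperature=lambda fraction: 10.0 * (1.0 - fraction),
    k_max=5000,
    rng=rng,
)
```

The schedule is evaluated at the fraction of the budget expended after the current step, so the final step runs at the schedule's end temperature.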
In order to apply the simulated annealing method to a specific problem, one must specify the following parameters: the state space, the energy (goal) function, the candidate generator procedure, the acceptance probability function, the annealing schedule, and the initial temperature. These choices can have a significant impact on the method's effectiveness. Unfortunately, there are no choices of these parameters that will be good for all problems, and there is no general way to find the best choices for a given problem. The following sections give some general guidelines.
Simulated annealing may be modeled as a random walk on a search graph, whose vertices are all possible states and whose edges are the candidate moves. An essential requirement for the candidate generator is that it must provide a sufficiently short path on this graph from the initial state to any state which may be the global optimum; in other words, the diameter of the search graph must be small. In the traveling salesman example above, for instance, the search space for n = 20 cities has n! = 2,432,902,008,176,640,000 (2.4 quintillion) states; yet the number of neighbors of each vertex is $\sum_{k=1}^{n-1} k = \frac{n(n-1)}{2} = 190$, and the diameter of the graph is $n - 1$.
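These counts can be checked directly. The sketch below, assuming tours are represented as permutations of 0..n-1, also computes the minimum number of arbitrary swaps between a permutation and the identity (n minus its number of cycles), which is at most n - 1 and is where the diameter bound comes from:

```python
import math
from typing import List


def swap_neighbourhood_size(n: int) -> int:
    """Number of distinct city-pair swaps in a tour of n cities: 1 + 2 + ... + (n-1)."""
    return sum(range(1, n))


def min_swaps_to_identity(perm: List[int]) -> int:
    """Fewest arbitrary pair swaps turning perm into 0..n-1: n minus the number of cycles."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:  # walk the cycle containing `start`
                seen[j] = True
                j = perm[j]
    return len(perm) - cycles


n = 20
num_states = math.factorial(n)               # 2432902008176640000
num_neighbours = swap_neighbourhood_size(n)  # 190 == n * (n - 1) // 2
# A single n-cycle is the worst case: it needs n - 1 swaps, matching the diameter n - 1.
worst = min_swaps_to_identity(list(range(1, n)) + [0])  # 19
```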
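To investigate the behavior of simulated annealing on a particular problem, it can be useful to consider the transition probabilities that result from the various design choices made in the implementation of the algorithm. For each edge $(s, s')$ of the search graph, the transition probability is defined as the probability that the simulated annealing algorithm will move to state $s'$ when its current state is $s$. This probability depends on the current temperature, on the candidate generator, and on the acceptance probability function $P(e, e', T)$; when the generator proposes a single random neighbour per step, it is the probability of proposing $s'$ multiplied by $P(e, e', T)$, with $e = E(s)$ and $e' = E(s')$.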
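For a generator that proposes one neighbour uniformly at random, the transition probability can be sketched as follows; the uniform proposal, the Kirkpatrick acceptance rule, and the small line-graph example are assumptions for illustration:

```python
import math
from typing import Callable, List


def transition_probability(
    s: int,
    s_prime: int,
    neighbours: Callable[[int], List[int]],
    energy: Callable[[int], float],
    T: float,
) -> float:
    """One-step probability of moving from s to s' (s' != s).

    Assumes the generator proposes one of neighbours(s) uniformly at random and
    the move is then accepted with the Kirkpatrick rule P(e, e', T).
    """
    candidates = neighbours(s)
    if s_prime not in candidates:
        return 0.0  # not an edge of the search graph
    proposal = candidates.count(s_prime) / len(candidates)
    delta = energy(s_prime) - energy(s)
    if delta <= 0:
        acceptance = 1.0  # exp(-delta / T) is 1 for delta == 0, and moves downhill are always taken
    elif T > 0:
        acceptance = math.exp(-delta / T)
    else:
        acceptance = 0.0  # uphill moves are never accepted at T = 0
    return proposal * acceptance


# Hypothetical search graph: states 0..4 on a line, energy equal to the state index.
def line_neighbours(x: int) -> List[int]:
    return [y for y in (x - 1, x + 1) if 0 <= y <= 4]
```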
The specifications of the candidate generator, the acceptance probability function, and the annealing schedule are partially redundant. In practice, it is common to use the same acceptance function for many problems and adjust the other two functions according to the specific problem.
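In the formulation of the method by Kirkpatrick et al., the acceptance probability function $P(e, e', T)$ was defined as 1 if $e' < e$, and $\exp(-(e'-e)/T)$ otherwise, so that uphill moves are accepted with a probability that shrinks as the energy increase grows or as the temperature $T$ falls.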
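The Kirkpatrick acceptance rule can be written as a small function. This sketch assumes that at $T = 0$ uphill moves are rejected and equal-energy moves are accepted, matching the limit of $\exp(-(e'-e)/T)$ as $T \to 0$:

```python
import math


def kirkpatrick_acceptance(e: float, e_new: float, T: float) -> float:
    """P(e, e', T): 1 if e' < e, otherwise exp(-(e' - e) / T)."""
    if e_new <= e:
        # Downhill moves get 1 by definition; for e' == e the formula also gives exp(0) = 1.
        return 1.0
    if T <= 0:
        return 0.0  # limit of exp(-(e' - e) / T) for an uphill move as T -> 0
    return math.exp(-(e_new - e) / T)
```

Small uphill moves are accepted more often than large ones, and any given uphill move becomes less likely as $T$ decreases.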
In 1990, Moscato and Fontanari, and independently Dueck and Scheuer, proposed that a deterministic update (i.e. one that is not based on the probabilistic acceptance rule) could speed up the optimization process without degrading the final quality. From observing the analogue of the "specific heat" curve for the "threshold updating" annealing in their study, Moscato and Fontanari concluded that "the stochasticity of the Metropolis updating in the simulated annealing algorithm does not play a major role in the search of near-optimal minima". Instead, they proposed that "the smoothening of the cost function landscape at high temperature and the gradual definition of the minima during the cooling process are the fundamental ingredients for the success of simulated annealing." The method was subsequently popularized under the name "threshold accepting", following Dueck and Scheuer's terminology. In 2001, Franz, Hoffmann and Salamon showed that the deterministic update strategy is indeed the optimal one within the large class of algorithms that simulate a random walk on the cost/energy landscape.
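A minimal sketch of threshold accepting, assuming a hypothetical linearly decreasing list of thresholds: a move is accepted whenever its energy increase does not exceed the current threshold, so no random number is drawn in the acceptance decision (randomness remains only in the candidate generator):

```python
import random
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")


def threshold_accepting(
    s0: S,
    energy: Callable[[S], float],
    neighbour: Callable[[S, random.Random], S],
    thresholds: Sequence[float],
    rng: random.Random,
) -> S:
    """Threshold accepting: one step per threshold, deterministic acceptance rule."""
    s, e = s0, energy(s0)
    for threshold in thresholds:
        s_new = neighbour(s, rng)
        e_new = energy(s_new)
        # Accept any move whose energy increase is at most the current threshold;
        # no random number is drawn for this decision.
        if e_new - e <= threshold:
            s, e = s_new, e_new
    return s


# Usage: thresholds fall linearly from 5 towards 0 over 2000 steps.
steps = 2000
schedule = [5.0 * (1 - k / steps) for k in range(steps)]
result = threshold_accepting(
    0,
    lambda x: (x - 7) ** 2,
    lambda x, r: x + r.choice((-1, 1)),
    schedule,
    random.Random(0),
)
```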
When choosing the candidate generator, one must consider that after a few iterations of the simulated annealing algorithm, the current state is expected to have much lower energy than a random state. Therefore, as a general rule, one should skew the generator towards candidate moves where the energy of the destination state $s'$ is likely to be similar to that of the current state.
In the traveling salesman problem above, for example, swapping two consecutive cities in a low-energy tour is expected to have a modest effect on its energy (length), whereas swapping two arbitrary cities is far more likely to increase its length than to decrease it. Thus, the consecutive-swap neighbor generator is expected to perform better than the arbitrary-swap one, even though the latter could provide a somewhat shorter path to the optimum (with $n - 1$ swaps instead of $n(n-1)/2$).
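The $n(n-1)/2$ figure can be checked on permutations (ignoring the cyclic symmetry of tours): the minimum number of consecutive swaps needed to sort a permutation equals its inversion count, which reaches $n(n-1)/2$ for the reversed order, whereas any permutation needs at most $n - 1$ arbitrary swaps. The sketch below pairs that count with a consecutive-swap generator; the non-wrapping choice of adjacent positions is an assumption:

```python
import random
from typing import List


def consecutive_swap_neighbour(tour: List[int], rng: random.Random) -> List[int]:
    """Candidate generator: swap the cities at positions i and i+1 (no wrap-around)."""
    i = rng.randrange(len(tour) - 1)
    new = tour[:]
    new[i], new[i + 1] = new[i + 1], new[i]
    return new


def consecutive_swaps_needed(perm: List[int]) -> int:
    """Fewest adjacent swaps that sort perm: its number of inversions."""
    n = len(perm)
    return sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])


n = 20
reversed_order = list(range(n - 1, -1, -1))
worst_case = consecutive_swaps_needed(reversed_order)  # 190 == n * (n - 1) // 2
```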
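A more precise statement of the heuristic is that one should try first the candidate states $s'$ for which $P(E(s), E(s'), T)$ is large. For the standard acceptance function $P$, this means that $E(s') - E(s)$ is on the order of $T$ or less. Thus, in the traveling salesman example above, one could use a candidate generator that swaps two random cities, where the probability of choosing a city pair vanishes as their distance increases beyond $T$.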
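A possible sketch of such a distance-aware pair swap, assuming cities have planar coordinates and using a hypothetical weight $\exp(-\max(0, d - T)/T)$ for keeping a pair at distance $d$; the exact weighting is an illustrative choice, not a prescribed one:

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def distance_biased_swap(
    tour: List[int],
    coords: Sequence[Point],
    T: float,
    rng: random.Random,
    max_tries: int = 100,
) -> List[int]:
    """Swap two cities, favouring pairs that are at most about T apart.

    A uniformly drawn pair at distance d is kept with probability
    exp(-max(0, d - T) / T), so pairs much farther apart than T are rarely swapped.
    If every draw is rejected, the last drawn pair is used.
    """
    if T <= 0 or max_tries < 1 or len(tour) < 2:
        raise ValueError("need T > 0, max_tries >= 1 and at least two cities")
    for _ in range(max_tries):
        i, j = rng.sample(range(len(tour)), 2)
        d = math.dist(coords[tour[i]], coords[tour[j]])
        if rng.random() < math.exp(-max(0.0, d - T) / T):
            break
    new = tour[:]
    new[i], new[j] = new[j], new[i]
    return new
```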
When choosing the candidate generator, one must also try to reduce the number of "deep" local minima: states (or sets of connected states) that have much lower energy than all their neighboring states. Such "closed catchment basins" of the energy function may trap the simulated annealing algorithm with high probability (roughly proportional to the number of states in the basin) and for a very long time (roughly exponential in the energy difference between the surrounding states and the bottom of the basin).
As a rule, it is impossible to design a candidate generator that will satisfy this goal and also prioritize candidates with similar energy. On the other hand, one can often vastly improve the efficiency of simulated annealing by relatively simple changes to the generator. In the traveling salesman problem, for instance, it is not hard to exhibit two tours
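$A$ and $B$, with nearly equal lengths, such that (1) $A$ is optimal, (2) every sequence of city-pair swaps that converts $A$ to $B$ goes through tours that are much longer than both, and (3) $A$ can be transformed into $B$ by flipping (reversing the order of) a set of consecutive cities. In this example, $A$ and $B$ lie in different "deep basins" if the generator performs only random pair swaps, but they will be in the same basin if the generator performs random segment flips.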
The physical analogy that is used to justify simulated annealing assumes that the cooling rate is low enough for the probability distribution of the current state to be near thermodynamic equilibrium at all times. Unfortunately, the relaxation time (the time one must wait for the equilibrium to be restored after a change in temperature) strongly depends on the "topography" of the energy function and on the current temperature. In the simulated annealing algorithm, the relaxation time also depends on the candidate generator, in a very complicated way. Note that all these parameters are usually provided as black box functions to the simulated annealing algorithm. Therefore, the ideal cooling rate cannot be determined beforehand and should be empirically adjusted for each problem. Adaptive simulated annealing algorithms address this problem by connecting the cooling schedule to the search progress. Other adaptive approaches, such as Thermodynamic Simulated Annealing,[12] automatically adjust the temperature at each step based on the energy difference between the two states, according to the laws of thermodynamics.
Sometimes it is better to move back to a solution that was significantly better rather than always moving from the current state. This process is called restarting of simulated annealing. To do this, we set $s$ and $e$ to $s_{\mathrm{best}}$ and $e_{\mathrm{best}}$, the best state and energy found so far, and perhaps restart the annealing schedule. The decision to restart could be based on several criteria, notably restarting after a fixed number of steps, restarting when the current energy is too high compared to the best energy obtained so far, and restarting randomly.
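A sketch of restarting, assuming the criterion "no new best state for `restart_after` consecutive steps" (one possible variant of the step-based criterion) and, for simplicity, not restarting the annealing schedule; all names and the toy problem are hypothetical:

```python
import math
import random
from typing import Callable, TypeVar

S = TypeVar("S")


def annealing_with_restarts(
    s0: S,
    energy: Callable[[S], float],
    neighbour: Callable[[S, random.Random], S],
    temperature: Callable[[float], float],
    k_max: int,
    rng: random.Random,
    restart_after: int,
) -> S:
    """Simulated annealing that jumps back to the best state found so far after
    `restart_after` consecutive steps without a new best. Returns the best state."""
    s, e = s0, energy(s0)
    s_best, e_best = s, e
    steps_without_improvement = 0
    for k in range(k_max):
        t = temperature((k + 1) / k_max)
        s_new = neighbour(s, rng)
        e_new = energy(s_new)
        if e_new <= e or (t > 0 and rng.random() < math.exp(-(e_new - e) / t)):
            s, e = s_new, e_new
        if e < e_best:
            s_best, e_best = s, e
            steps_without_improvement = 0
        else:
            steps_without_improvement += 1
        if steps_without_improvement >= restart_after:
            # Restart: continue from the best state (the schedule itself is not restarted here).
            s, e = s_best, e_best
            steps_without_improvement = 0
    return s_best


best = annealing_with_restarts(
    0,
    lambda x: (x - 7) ** 2,
    lambda x, r: x + r.choice((-1, 1)),
    lambda fraction: 10.0 * (1.0 - fraction),
    5000,
    random.Random(0),
    restart_after=200,
)
```

Restarting the schedule as well (for example, by resetting the step counter used for the temperature) is the other option mentioned above; which works better depends on the problem.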