Optimization Algorithm Series (1): Simulated Annealing - Basic Principles (the Boring Version)

Recommended book: "Intelligent Optimization Algorithms and Their MATLAB Examples (Second Edition)"

A vivid description from Zhihu:

  Imagine a large pot with a bumpy bottom full of pits. Shaking the pot makes a small ball move toward the global minimum. At the beginning the shaking is violent and the ball moves a lot; as the ball approaches the global minimum, the shaking amplitude is gradually reduced, until at the end the pot is no longer shaken and the ball settles at the global minimum.

1. History (you can skip if you are not interested)

  The famous simulated annealing algorithm is a Monte Carlo-based method for approximately solving optimization problems.

  In 1953, the American physicist N. Metropolis and colleagues published a paper on the study of complex systems and the calculation of their energy distributions, using Monte Carlo simulation to compute the energy distribution of molecules in multi-molecular systems. That work is effectively where the topic of this article begins; indeed, a term often mentioned in simulated annealing is the Metropolis criterion, which we will introduce later.

  S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, physicists at IBM in the United States, published an influential article in Science in 1983: "Optimization by Simulated Annealing". While using the method of Metropolis et al. to explore a spin-glass system, they noticed that the energy of a physical system and the cost of certain combinatorial optimization problems (the famous traveling salesman problem, TSP, is a representative example) are quite similar: seeking the lowest cost is like seeking the lowest energy. They therefore developed a set of algorithms based on the Metropolis method and used them to solve combinatorial problems and search for optimal solutions.

  Almost at the same time, the European physicist V. Carny published nearly identical results; the two discoveries were independent. Carny was simply "unlucky": no one noticed his work at the time. Perhaps this is because Science is distributed worldwide and enjoys high "exposure" and renown, whereas Carny published his results in another academic journal, J. Opt. Theory Appl., which attracts less attention.

  Kirkpatrick et al. were inspired by the Monte Carlo simulation of Metropolis et al. and coined the term "simulated annealing", since the process resembles the annealing of physical objects. Finding the optimal (minimum) solution of a problem is analogous to finding the lowest energy of a system: as the system cools, its energy gradually decreases, and in the same sense the solution of the problem "falls" to the minimum value.

       Solid annealing:

       1. First heat the solid until it melts, then cool it slowly.

       2. Annealing should be carried out slowly so that the system reaches equilibrium at each temperature.

       3. Do not reduce the temperature sharply during cooling.

       Simulated annealing algorithm:

       1. Starting from an initial solution i0, perform L solution transformations (each accepted according to the Metropolis criterion) to obtain the relative optimal solution at the given temperature.

       2. Reduce the control parameter T and repeat the solution transformation (as above).

       3. The optimal solution is obtained as the control parameter T tends to 0.

2. What is annealing - physical origin

  In thermodynamics, annealing refers to the physical process in which an object gradually cools. The lower the temperature, the lower the object's energy state; when it is low enough, the liquid begins to condense and crystallize, and in the crystalline state the energy of the system is lowest. When nature cools slowly (that is, anneals), it can "find" the lowest energy state: crystallization. If the process is too rapid, however, rapid cooling (also called "quenching") produces an amorphous state that is not the lowest energy state.

  As shown in the figure below, the object is initially (left) in an amorphous state. We heat the solid to a sufficiently high temperature (middle) and then let it cool slowly, i.e. anneal (right). During heating, the particles inside the solid become disordered as the temperature rises and the internal energy increases; during slow cooling, the particles gradually become ordered, reaching equilibrium at each temperature, and finally reach the ground state at room temperature, where the internal energy is minimal (at this point the object appears in crystalline form).

  It seems that nature knows to work slowly and carefully: cool down gradually, so that the molecules of the object have enough time to settle at each temperature; step by step, the lowest energy state is reached at the end, where the system is most stable.

3. Metropolis ( Monte Carlo ) criterion  

  In 1953, Metropolis proposed an importance-sampling method: if a new state j is generated from the current state i, and the internal energy of the new state is less than that of state i (Ej < Ei), then the new state j is accepted as the current state; otherwise, state j is accepted with probability exp[-(Ej-Ei)/(kT)], where k is the Boltzmann constant and T is the temperature. This is the Metropolis criterion.

  According to the Metropolis (Monte Carlo) criterion, the probability that the particles tend toward equilibrium at temperature T is exp(-ΔE/(kT)), where E is the internal energy at temperature T, ΔE is its change, and k is the Boltzmann constant. The Metropolis criterion is often expressed as

      P(i → j) = 1,                      if Ej < Ei
      P(i → j) = exp(-(Ej - Ei)/(kT)),   if Ej ≥ Ei


  According to thermodynamic principles, at temperature T the probability of an energy drop with energy difference dE is p(dE), expressed as:

      p(dE) = exp(dE/(kT))

  where: k is the Boltzmann constant, k = 1.3806488(13)×10^-23 J/K,
      exp denotes the natural exponential function, and
      dE < 0, so dE/(kT) < 0,
      hence the value of p(dE) lies in (0, 1), satisfying the definition of a probability.

  The intuitive meaning of this formula is: the higher the temperature, the greater the probability of a state change with energy difference dE; the lower the temperature, the smaller that probability.
  

  In practical problems, the computation of this "certain probability" draws on the annealing process of metal smelting. Suppose the current feasible solution is x and the iteratively updated solution is x_new; the corresponding "energy difference" is defined as:
       Δf = f(x_new) − f(x).

  The corresponding acceptance probability is:
       P = 1,             if Δf < 0
       P = exp(−Δf/T),    if Δf ≥ 0

  Note: in practical problems, k = 1 can be used, because kT is equivalent to a single parameter T. For example, setting k = 2 and T = 1000 has the same effect as directly setting T = 2000.
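The acceptance rule above can be sketched in a few lines of Python (a minimal illustration for a minimization problem, with k = 1 as noted; the function name `accept` is my own):

```python
import math
import random

def accept(f_x, f_x_new, T):
    """Metropolis criterion: always accept an improvement; accept a worse
    solution with probability exp(-delta_f / T)."""
    delta_f = f_x_new - f_x       # the "energy difference"
    if delta_f < 0:               # improvement: always accept
        return True
    # worse solution: accept with probability exp(-delta_f / T)
    return random.random() < math.exp(-delta_f / T)
```

At high T, exp(-Δf/T) is close to 1 and worse moves are accepted often; at low T it approaches 0 and the rule becomes essentially greedy.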

4. Introducing simulated annealing (Simulated Annealing)

  Imagine that we have a function like the one below and want to find its (global) optimal solution. With a greedy strategy, we start testing from point A, and as long as the function value keeps decreasing, the search continues. When we reach point B, the exploration clearly ends (because in whichever direction we try, the value only gets bigger). In the end we can only find the local optimal solution B.

  Simulated annealing is in essence a greedy algorithm, but its search process introduces random factors: when the feasible solution is iteratively updated, a solution worse than the current one is accepted with a certain probability, so the algorithm may jump out of a local optimum and reach the global optimum. Taking the figure below as an example, suppose the initial solution is the blue point A on the left. The simulated annealing algorithm quickly finds the local optimum B, but it does not stop there: with a certain probability it accepts moves to the left. After several such non-improving moves, it may reach the global optimum D, having jumped out of the local minimum.

  A funny metaphor

  Hill-climbing algorithm: the rabbit jumps toward places higher than where it is now, and finds the highest mountain not far away. But that mountain is not necessarily Mount Everest. This is the hill-climbing algorithm, which cannot guarantee that a local optimum is the global optimum.

      (Very lazy, I will accept it when I see it higher than now)

  Simulated annealing: the rabbit is drunk. For a long time it jumps randomly; during this period it may move toward higher ground, or it may step onto flat ground. As it gradually sobers up, it jumps toward the highest direction. This is simulated annealing.

      (I was very excited at the beginning, I tried everywhere, and changed a lot. Later, I got tired and gradually slowed down my pace, climbing towards the highest point at this moment)  

5. Simulated annealing principle

  The simulated annealing algorithm (SA) is a general probabilistic algorithm used to find the optimal solution of a proposition in a large search space.

  The principle of simulated annealing is similar to that of metal annealing: applying thermodynamic theory to statistical search, imagine each point in the search space as a molecule of air, whose energy is its own kinetic energy; each point, like an air molecule, carries an "energy" that indicates how well that point fits the proposition. The algorithm starts from an arbitrary point in the search space; at each step it selects a "neighbor" and then computes the probability of moving from the current position to that neighbor.

6. Simulated annealing algorithm model

The simulated annealing algorithm can be decomposed into three parts: solution space, objective function and initial solution.

    • The basic idea of simulated annealing:
      • (1) Initialization: an initial temperature T (sufficiently large), an initial solution state S (the starting point of the algorithm's iteration), and the number of iterations L for each value of T.
      • (2) For k = 1, ..., L, perform steps (3) to (6):
      • (3) Generate a new solution S′.
      • (4) Calculate the increment Δt′ = C(S′) − C(S), where C(S) is the evaluation function (the optimization objective).
      • (5) If Δt′ < 0, accept S′ as the new current solution; otherwise accept S′ as the new current solution with probability exp(−Δt′/T).
      • (6) If the termination condition is met, output the current solution as the optimal solution and end the program. The termination condition is usually that several consecutive new solutions have not been accepted.
      • (7) Gradually reduce T toward 0 and return to step (2).
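The basic procedure above can be sketched as a generic SA loop in Python (a minimal sketch for a minimization problem; `objective`, `neighbor`, and the default parameter values are illustrative assumptions, not a prescribed implementation):

```python
import math
import random

def simulated_annealing(objective, neighbor, s0,
                        T0=100.0, T_min=1e-3, alpha=0.95, L=100):
    """Generic simulated annealing loop for minimization (steps (1)-(7))."""
    s, best = s0, s0
    f_s = objective(s)
    f_best = f_s
    T = T0
    while T > T_min:                         # outer loop: cool until T_min
        for _ in range(L):                   # inner loop: L iterations per T
            s_new = neighbor(s)              # step (3): generate S'
            delta = objective(s_new) - f_s   # step (4): C(S') - C(S)
            # step (5): Metropolis acceptance
            if delta < 0 or random.random() < math.exp(-delta / T):
                s, f_s = s_new, f_s + delta  # incremental objective update
                if f_s < f_best:             # remember the best-so-far
                    best, f_best = s, f_s
        T *= alpha                           # geometric cooling
    return best, f_best

# Example: minimize (x - 3)^2 over the reals with a uniform perturbation.
random.seed(42)
x, fx = simulated_annealing(lambda v: (v - 3.0) ** 2,
                            lambda v: v + random.uniform(-1, 1),
                            s0=-10.0)
```

The `best`/`f_best` bookkeeping keeps the best solution ever visited, so the probabilistic acceptance cannot lose it (one of the common improvements discussed below).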

  The algorithm corresponds to the dynamic demonstration diagram:

  

    • The generation and acceptance of new solutions in the simulated annealing algorithm can be divided into four steps:
      • The first step is to generate a new solution in the solution space from the current solution via a generating function. To simplify subsequent computation and acceptance, and to reduce the algorithm's running time, the new solution is usually produced by a simple transformation of the current solution, such as replacing or exchanging all or some of its elements. Note that the transformation used to generate new solutions determines the neighborhood structure of the current solution, and therefore has some influence on the choice of the cooling schedule.
      • The second step is to compute the difference in the objective function for the new solution. Since this difference is produced only by the transformed part, it is best computed incrementally. This has been shown to be the fastest way to compute the objective-function difference in most applications.
      • The third step is to judge whether the new solution is accepted, based on an acceptance criterion. The most commonly used is the Metropolis criterion: if Δt′ < 0, accept S′ as the new current solution S; otherwise accept S′ as the new current solution S with probability exp(−Δt′/T).
      • The fourth step is to replace the current solution with the new solution once acceptance is confirmed. This only requires applying the transformation that generated the new solution and updating the value of the objective function; the current solution has then completed one iteration, and the next round can begin from it. If the new solution is rejected, the next round continues from the existing current solution.

  The solution obtained by the simulated annealing algorithm is independent of the initial solution state S (the starting point of the algorithm's iteration); the algorithm is asymptotically convergent, and it has been proved in theory to be a global optimization algorithm that converges in probability to the global optimal solution; and the algorithm exhibits parallelism.

7. Basic elements of simulated annealing

State Space and State Generating Functions

  1) The search space is also called the state space, which consists of a set of encoded feasible solutions.
  2) The state generation function (neighborhood function) should, as far as possible, ensure that the generated candidate solutions are spread over the entire solution space. It usually consists of two parts: the way candidate solutions are generated and the probability distribution over them.
  3) Candidate solutions are generally obtained by randomly sampling the solution space according to a certain probability density function.
  4) The probability distribution can be uniform distribution, normal distribution, exponential distribution, etc. 
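As an illustration of a state generating function for a continuous problem, the normal-distribution case above can be sketched as follows (a minimal example; the function name and `scale` parameter are my own choices):

```python
import random

def gaussian_neighbor(x, scale=1.0):
    """State generating function: produce a candidate solution by perturbing
    each coordinate of the current solution with Gaussian noise."""
    return [xi + random.gauss(0.0, scale) for xi in x]

# Usage: generate a candidate near the current solution [1.0, 2.0].
random.seed(0)
candidate = gaussian_neighbor([1.0, 2.0])
```

Replacing `random.gauss` with `random.uniform(-scale, scale)` gives the uniform-distribution variant; the choice affects how widely candidates are scattered across the solution space.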

state transition probability

  1) The state transition probability is the probability of moving from one state to another.
  2) The popular understanding is the probability of accepting a new solution as the current solution.
  3) It is related to the current temperature parameter T, and decreases as the temperature drops.
  4) The Metropolis criterion is generally adopted.

Inner loop termination criterion: also known as Metropolis sampling stability criterion, used to determine the number of candidate solutions generated at each temperature. Commonly used sampling stability criteria include:

  1) Check whether the mean of the objective function is stable.
  2) The change of the target value for several consecutive steps is small.
  3) Sampling by a certain number of steps.

Outer loop termination criterion: the algorithm termination criterion, commonly used include:

  1) Set the threshold of termination temperature.
  2) Set the number of iterations of the outer loop.
  3) The optimal value searched by the algorithm remains unchanged for several consecutive steps.
  4) Check whether the system entropy is stable.

8. Parameter description

  The annealing process is controlled by a set of initial parameters called the cooling schedule. Its core aim is to let the system approach equilibrium at each temperature, so that the algorithm can approach the optimal solution within limited time. The cooling schedule includes:

(1) The initial value T0 of the control parameter T: the temperature at which cooling starts.

(2) The decay function of the control parameter T: since a computer can only handle discrete data, the continuous cooling process is discretized into a series of temperature points, and the decay function is the expression used to compute this series of temperatures.

(3) The final value Tf of the control parameter T (the stopping criterion).

(4) The length Lk of the Markov chain: the number of iterations at each temperature T.

9. Parameter setting

 The simulated annealing algorithm is widely applicable and can be applied to NP-complete problems, but its parameters are difficult to control. The main issues are as follows:

(1) The initial value T0 of the control parameter T

  Random search algorithms for solving global optimization problems generally adopt a search strategy that combines large-scale rough search and local fine search.

  Only by finding the area where the global optimal solution is located in the initial large-scale search stage, can the search scope be gradually narrowed, and the global optimal solution finally be obtained.

  The simulated annealing algorithm realizes large-scale rough search and local fine search by controlling the initial value T0 of the parameter T and its decay process.

  When the problem scale is large, a T0 that is too small often makes it difficult for the algorithm to escape local traps and reach the global optimum. A typical choice is T0 = 100.

   【If the initial temperature is high, the global optimal solution is more likely to be found, but at the cost of considerable computing time; otherwise, computing time is saved, but global search performance may suffer. In practical applications, the initial temperature usually needs to be adjusted several times according to experimental results.】

(2) The attenuation function of the control parameter T

  The decay function can take many forms; a commonly used one is

      T(k+1) = α · T(k)

  where k is the number of cooling steps and α is a constant, typically in the range 0.5 to 0.99, whose value determines the speed of the cooling process.

   【To keep the search space large, α is generally chosen close to 1, such as 0.95 or 0.9.】
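This geometric decay can be sketched as a small generator (a minimal illustration; the parameter values T0 = 100, α = 0.95, T_min = 0.01 are assumptions for demonstration):

```python
def cooling_schedule(T0=100.0, alpha=0.95, T_min=0.01):
    """Yield the discretized temperature sequence T(k+1) = alpha * T(k),
    stopping once the temperature falls to the termination value T_min."""
    T = T0
    while T > T_min:
        yield T
        T *= alpha

# Usage: the full list of temperatures the outer loop will visit.
temps = list(cooling_schedule())
```

A larger α (closer to 1) produces more temperature points and hence a slower, more thorough search.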

(3) Annealing speed: Markov chain length

  The selection principle for the Markov chain length is: given that the decay function of the control parameter T has been chosen, Lk should be large enough to reach quasi-equilibrium at each value of T.

  From experience, for simple cases one can set Lk = 100n, where n is the problem size.

   【An increase in the number of inner iterations inevitably increases computational cost. In practical applications, a reasonable annealing equilibrium condition should be set according to the nature and characteristics of the specific problem.】

(4) The final value Tf of the control parameter T (the stopping criterion): for example, stop when the solutions of several consecutive Markov chains remain unchanged.

  Algorithm stopping criterion: consider the acceptance function in the Metropolis criterion,

      P(accept xj) = exp(-(f(xj) - f(xi))/T)

   Analysis shows that at high temperature (large T), the denominator of the exponent is large; since the exponent is negative, the whole acceptance function tends toward 1, so a new solution xj that is worse than the current solution xi may still be accepted. The algorithm can therefore jump out of local minima and perform a wide-area search over other regions of the solution space. As cooling proceeds and T falls to a small value, the denominator becomes small and the acceptance function does too, so it becomes difficult to accept a solution worse than the current one and hard to leave the current region. If a sufficient wide-area search is carried out at high temperature to find the region containing the best solutions, and enough local search is done at low temperature, the global optimum may finally be found.

  【Generally, the termination temperature Tf should be set to a sufficiently small positive number, such as 0.01~5.】
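The temperature's effect on the acceptance function can be checked numerically (a small illustration; the value Δf = 1 and the sample temperatures are assumptions):

```python
import math

def acceptance(delta_f, T):
    """Probability of accepting a worse solution (delta_f = f(xj) - f(xi) > 0)
    at temperature T, per the Metropolis acceptance function."""
    return math.exp(-delta_f / T)

# With an energy increase of delta_f = 1:
# at T = 1000 the worse move is accepted almost surely;
# at T = 0.01 it is essentially never accepted.
high_T = acceptance(1.0, 1000.0)
low_T = acceptance(1.0, 0.01)
```

This is exactly the wide-area-then-local behavior described above: near-free wandering at high temperature, near-greedy refinement at low temperature.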

10. Advantages, disadvantages and improvements

  The simulated annealing (SA) algorithm is a general probabilistic algorithm for finding the optimal solution of a problem in a large search space.

  Advantages:

    • It can effectively handle NP-hard problems and avoid getting trapped in local optima.

    • The computation is simple, general, and robust, suits parallel processing, and can be applied to complex nonlinear optimization problems.

    • The solution obtained is independent of the initial solution state S (the starting point of the algorithm's iteration).

    • The algorithm is asymptotically convergent and has been proved in theory to converge in probability to the global optimal solution.

    • The algorithm exhibits parallelism.

  Disadvantages:

    • Convergence is slow and execution time is long; performance depends on the initial value and is sensitive to the parameters.

    • The optimization process is long because it requires a high initial temperature, a slow cooling rate, a low termination temperature, and sufficient sampling at each temperature.

    • (1) If cooling is slow enough, solution quality improves, but convergence is too slow; (2) if cooling is too fast, the global optimal solution may not be obtained.

  Applicable environment: combinatorial optimization problems.

  Improvements:

    (1) Design an appropriate state generating function so that, as the search requires, the generated states can be dispersed across the full space or concentrated in a local region.

    (2) Design an efficient annealing strategy.

    (3) Avoid circuitous search of states.

    (4) Adopt parallel search structure.

    (5) In order to avoid falling into local minimum, improve the control method of temperature.

    (6) Choose an appropriate initial state.

    (7) Design an appropriate algorithm termination criterion.


    (8) Add a heating or reheating step. At an appropriate point in the algorithm, raise the temperature to reactivate the acceptance probabilities of the states, adjusting the current search state and preventing the algorithm from stagnating at a local minimum.

    (9) Add a memory function. To avoid losing the best solution encountered so far because of the probabilistic acceptance step, add a storage step that remembers previously visited good states.

    (10) Add a supplementary search process. After the annealing ends, run simulated annealing or a local search again with the best solution found as the initial state.

    (11) For each current state, use multiple search strategies and probabilistically accept the best state in the region, instead of the single-comparison method of standard SA.

    (12) Algorithms combined with other search mechanisms, such as genetic algorithms, chaotic search, etc.

    (13) Comprehensive application of the above methods.

11. Summary

    The local search algorithms represented by hill climbing are only suitable for certain combinatorial optimization problems, and the quality of their solutions is often unsatisfactory. To overcome these shortcomings, researchers have sought inspiration in natural physical processes. The simulated annealing algorithm derives from simulating the annealing process of solids: by adopting the Metropolis acceptance criterion and controlling the process with a set of parameters called the cooling schedule, it can find an approximate optimal solution in polynomial time.

  The simulated annealing (SA) algorithm is a general probabilistic algorithm for finding the optimal solution of a problem in a large search space. Because it can effectively handle NP-hard problems and avoid getting stuck in local optima, it is widely used in production scheduling, control engineering, machine learning, neural networks, image processing, and other fields.

Origin blog.csdn.net/dw1360585641/article/details/129804785