Simulated Annealing Algorithm


Many people seeing the name "simulated annealing" for the first time would not guess, without the word "algorithm" attached, that it refers to a method for finding optimal values. So what exactly is the simulated annealing algorithm? First, we need to know why it is called annealing. The word comes from metallurgy and refers to "heating a solid to a sufficiently high temperature and then letting it cool slowly: as the temperature rises, the particles inside the solid become disordered and the internal energy increases; as it cools slowly, the particles gradually become ordered, reaching an equilibrium state at each temperature, and finally reaching the ground state at room temperature, where the internal energy is at a minimum." (Baidu Encyclopedia). What does "simulated" mean? It means we apply this annealing idea to finding the minimum of an objective function. Usually, if the number of iterations is sufficient and the parameters are set reasonably, the global optimum can be found.

So why use the simulated annealing algorithm? For a set of discrete values, we can find the minimum by exhaustive search, and for a convex function we can find the minimum iteratively with gradient descent or Newton's method. But for a function with several local minima, both greedy methods and gradient descent easily get stuck in a local minimum. Simulated annealing, while searching for the minimum, also accepts worse values with a certain probability, so it has a chance to jump out of a local optimum and find the global optimum. Of course, this probability gradually decreases over time. This idea is in fact the Metropolis acceptance criterion (1953):

$$P = \begin{cases} 1, & df \le 0 \\ \exp(-df/T), & df > 0 \end{cases}$$

What does this mean? Here df is y1 - y0. If the new function value is smaller than the current one (df ≤ 0), we accept the new value with probability 1, that is, with certainty. If df is greater than 0, we accept the worse value with the probability given by the second case of the formula. Note that, due to the shape of the exponential function, when df is large or T is small, the probability of accepting a worse value approaches 0.
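To make the criterion concrete, here is a minimal sketch of the acceptance rule in Python; the function names `accept_probability` and `accept` are illustrative choices, not from the original post.

```python
import math
import random

def accept_probability(df, T):
    """Metropolis criterion: always accept an improvement (df <= 0);
    otherwise accept a worse value with probability exp(-df / T)."""
    if df <= 0:
        return 1.0
    return math.exp(-df / T)

def accept(df, T):
    """Decide stochastically whether to accept the new solution."""
    return random.random() < accept_probability(df, T)
```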

Let's look at the algorithm flow, taking the problem of minimizing a function f(x) as an example (a runnable sketch follows the steps).

1. First, given an initial temperature T0 and an initial solution X0, compute y0 = f(X0).

2. Decrease the temperature T by multiplying it by a fixed cooling factor (e.g., 0.99 or 0.98).

3. Perturb X0, that is, randomly change X0 into a new solution X1, compute y1 = f(X1), and compare y1 with y0.

4. If y1 is less than y0, the new solution is better than the previous one, and we accept it. Otherwise, we accept the worse solution X1 with a certain probability, namely the value computed by the exponential in the formula above.

5. Repeat steps 3 and 4 several times at each temperature T.

6. Check whether T has reached the preset lower bound or the loop count has been exhausted; if so, exit. Otherwise, return to step 2.
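Putting the six steps together, a minimal sketch in Python might look as follows. The test function f(x) = x² + 10·sin(x), the cooling factor 0.98, and the other parameter values are assumptions chosen for illustration, not values from the original post.

```python
import math
import random

def f(x):
    # An assumed test objective with several local minima.
    return x * x + 10 * math.sin(x)

def simulated_annealing(f, x0, T0=100.0, T_min=1e-3, alpha=0.98, iters_per_T=50):
    x, y = x0, f(x0)                         # step 1: initial solution and value
    best_x, best_y = x, y
    T = T0
    while T > T_min:                         # step 6: stop when T hits the lower bound
        for _ in range(iters_per_T):         # step 5: several perturbations at each T
            x1 = x + random.uniform(-1, 1)   # step 3: perturb the current solution
            y1 = f(x1)
            df = y1 - y
            # step 4: accept improvements; accept worse moves with prob exp(-df/T)
            if df <= 0 or random.random() < math.exp(-df / T):
                x, y = x1, y1
                if y < best_y:
                    best_x, best_y = x, y
        T *= alpha                           # step 2: cool by a fixed factor
    return best_x, best_y

if __name__ == "__main__":
    random.seed(0)
    x, y = simulated_annealing(f, x0=5.0)
    print(f"minimum near x = {x:.4f}, f(x) = {y:.4f}")
```

Starting from x0 = 5.0, which sits near a local minimum of this test function, the inner loop plus the cooling schedule gives the search repeated chances to cross the barrier toward the global minimum near x ≈ -1.3.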

Notice that if T is large at the beginning, the value of exp(-df/T) is also relatively large, which means that even if the search falls into a local optimum early on, it has a high probability of jumping out. As T decreases, the probability of accepting a worse solution shrinks, and the search eventually settles near the global optimum.
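A quick numeric check makes this behavior visible; the uphill step df = 5.0 and the temperatures below are assumed values for illustration.

```python
import math

df = 5.0  # an assumed uphill step
for T in (100.0, 10.0, 1.0, 0.1):
    # Acceptance probability falls from near 1 toward 0 as T decreases.
    print(f"T = {T:>6}: P(accept) = {math.exp(-df / T):.4f}")
```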
