Applying Deep Learning to the Optimization and Modeling of the Ant Colony Algorithm

Author: Zen and the Art of Computer Programming

1. Introduction

In the paper "Evolutionary algorithms for optimizing ant colony optimization", published in Nature in 2017, an optimization algorithm called EvoACO, built on the ant colony algorithm (ACO), was proposed. The algorithm automatically learns the optimal values of parameters such as environmental characteristics, objective functions and search strategies, and thereby finds the global optimum or an approximately optimal solution. So how does the ant colony algorithm adapt itself? Research on adaptive adjustment methods based on neural networks and deep learning has gradually become popular. This article explains how a deep-learning-based adaptive adjustment method is applied to the ant colony algorithm, and provides relevant experimental data and analysis.

2. Core concepts

1. Ant Colony Optimization (ACO)
ACO is a classic metaheuristic. Its basic idea is to place a number of randomly generated ants (agents) in an environment and let them search for optimized paths according to heuristic rules. This style of search quickly finds many local optima with different characteristics and can effectively tackle complex problems.
The main features of the ant colony algorithm are as follows:
① Heuristic rules and path selection
Ants use heuristic rules to evaluate their surroundings and find the shortest path to the destination. The heuristic rules include:
a) Randomly selecting a node in the neighborhood as the heuristic reference point
b) Selecting the nearest neighbor with probability p as the heuristic reference point
c) Weighting each city on the route by its distance and time cost
② Randomness
Before each iteration, every ant has a certain probability of acting randomly: it may change its current direction, explore new areas, and try to find better solutions.
③ Local search and global search
In the ant colony algorithm, local search means searching only within the current sub-solution space, which keeps the search efficient. At the same time, a global-search component is introduced so that the algorithm can converge to the global optimum.
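To make the path-selection idea concrete, below is a minimal sketch of the classic ACO transition rule, where the next city is chosen with probability proportional to pheromone strength weighted by a heuristic; the pheromone matrix tau, heuristic matrix eta, and the exponents alpha and beta are illustrative assumptions, not values given in this article.

import random

def choose_next_city(current, unvisited, tau, eta, alpha=1.0, beta=2.0):
    # weight each candidate by pheromone^alpha * heuristic^beta
    weights = [(tau[current][j] ** alpha) * (eta[current][j] ** beta)
               for j in unvisited]
    # roulette-wheel selection over the weights
    r = random.uniform(0, sum(weights))
    acc = 0.0
    for j, w in zip(unvisited, weights):
        acc += w
        if acc >= r:
            return j
    return unvisited[-1]  # fallback against floating-point underflow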
2. Adaptive adjustment
In each iteration of the ant colony algorithm, parameters such as the ants' pathfinding method, the reward calculation, and the penalty calculation need to be adjusted to achieve the best convergence. This process is generally called adaptive adjustment.
By designing different adaptive adjustment rules, the ant colony algorithm can improve its performance and obtain better, more stable results. Three commonly used adaptive adjustment rules are as follows (a sketch of the reward updates follows this list):
(1) Position update rules
Position update rules adjust the ants' pathfinding ability by defining how the ants move. They generally fall into four kinds:
a) Full scan: rescan the entire cost matrix at regular intervals and select the city with the smallest path length as the next city.
b) Half-disk scan: scan certain dense regions of the cost matrix, giving priority to the most advantageous areas.
c) Fine scan: scan a single node of the cost matrix at regular intervals to find the optimal path.
d) Dispersed scan: move to a random node of the cost matrix at regular intervals to find the optimal path.
(2) Reward update rules
Reward rules are used to evaluate the paths the ants walk. They are mainly divided into two kinds:
a) Local reward update: at each iteration, make a local update on the cost matrix to refresh the ant's estimate at its current node.
b) Global reward update: update the cost matrix globally and re-estimate all nodes, so that the sum of all ants' estimates approaches the true optimum.
(3) Penalty update rule
The penalty rule is used to stop ants that have fallen into a local optimum from searching further, and to steer them toward the global optimum.
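As a concrete illustration of the local and global reward updates described above, here is a minimal sketch under the usual ACO convention that rewards are stored as pheromone-like values on the cost-matrix cells; the evaporation rate rho and deposit amounts are assumptions for illustration.

def local_reward_update(tau, path, rho=0.1, deposit=1.0):
    # local update: after an ant walks an edge, nudge that edge's value
    for i, j in zip(path, path[1:]):
        tau[i][j] = (1 - rho) * tau[i][j] + rho * deposit

def global_reward_update(tau, best_path, best_cost, rho=0.1):
    # global update: reinforce the best path found so far across the matrix
    for i, j in zip(best_path, best_path[1:]):
        tau[i][j] = (1 - rho) * tau[i][j] + rho * (1.0 / best_cost)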

3. Algorithm principle

The EvoACO algorithm consists of two steps: the first step uses a deep learning network model for adaptive adjustment; the second step uses the ant colony algorithm for population generation and pathfinding.

3.1 Use deep learning network model for adaptive adjustment

Adaptive adjustment in the EvoACO algorithm is performed by training a deep learning network model. Specifically, the data structure used in EvoACO is a cost matrix, which records the distance, time cost, and other information between every pair of cities. The network model can therefore directly learn the mutual influence between cities and adjust each path accordingly.
  Since the ant colony algorithm can effectively find representative local optima, we can understand the network model from the perspective of local search. Consider the simplest case first: if we have a network model large enough to fit the cost matrix perfectly, and the learned parameters are good enough, we can use it to tune the various parameters of the ant colony algorithm. Suppose we already have such a model m; we now want to use it to adaptively adjust the reward rules, penalty rules, and position update rules of the ant colony algorithm.
  1. Reward update rules
Since the ants' paths are updated in every generation, we can use the trained network model directly to update the reward value of each cell in the cost matrix. Specifically, for each ant, suppose it travels to city i; the feature vector of city i is fed into the network model, and if the model outputs a value $a$, the reward of the ant is $r_i = a + \mathrm{cost}(i)$, where $\mathrm{cost}(i)$ is the cost of moving to city i.
If we know how the cost matrix changes from generation to generation, we can use those changes to modify the reward values. For example, if the cost between two cities increases, we can feed this information into the weights of the network model and adjust the model's preference for the path between the two cities accordingly.
  2. Penalty update rule
In the ant colony algorithm, ants are penalized when they settle into a local optimum. This penalty can also be computed by the trained network model. Specifically, we can regard the penalty as being induced by the current cost matrix, so we can use the cost matrix to predict the ants' behavior and decide whether a penalty should be applied.
Concretely, when ant i reaches a local optimum, i.e. its path is not the globally optimal path, its penalty is obtained from the output of the network model: if the model outputs a value $b$, the penalty of ant i is $-b$. The sign is negative because the effect of a penalty is to reduce the ant's reward value, not increase it.
If we know how the cost matrix changes from generation to generation, we can also use those changes to update the penalty values. For example, if the values of some cells in the cost matrix change, we know ants are exploring those cells, and the corresponding penalty is reduced.
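A minimal sketch of the two rules above; the model.predict interface and the per-city feature vectors are assumptions for illustration, not the paper's actual API.

def reward_for_city(model, features, cost, i):
    # reward rule: r_i = a + cost(i), with a the model output for city i
    a = model.predict(features[i])    # assumed model interface
    return a + cost[i]

def penalty_for_ant(model, features, i):
    # penalty rule: -b, where b is the model output; the negative sign
    # reduces the ant's accumulated reward rather than increasing it
    b = model.predict(features[i])    # assumed model interface
    return -b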
  3. Position update rules
The position update rules can also be determined by the trained network model. To make the algorithm more robust, we can let the network model output several candidate paths, rank them, and select the best one as the next movement direction. Concretely, a reward value is computed for each candidate path, and the path with the best reward is chosen as the next move.
Depending on the position update rules, we can add regularization terms during training to constrain the model's output. For example, if the network model outputs a candidate path that does not cover the entire cost matrix, we can add a penalty term that forces coverage of the whole matrix.
It should be noted that the position update rule can only change the movement direction of ants, but cannot change the reward value and penalty value.
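A minimal sketch of this candidate-ranking position update; candidate_paths and reward_fn stand in for the model's proposed paths and the reward computation, and are assumptions for illustration.

def choose_next_move(candidate_paths, reward_fn):
    # score every candidate path by its total reward ...
    scored = [(sum(reward_fn(city) for city in path), path)
              for path in candidate_paths]
    # ... and pick the highest-scoring path as the next movement direction
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[0][1]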
  4. Update of cost matrix
After the trained network model has completed the adaptive adjustment, we can use the newly generated parameters to update the cost matrix. Specifically, during the training phase we can assign random initial values to the elements of the cost matrix, and during the update process we keep modifying the cost matrix until the network model matches the best solution.
If the trained network model performs poorly, or the cost matrix fails to converge for a long time, we can consider other ways to optimize it, such as training for more epochs or changing the initialization parameters.
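A sketch of this iterative refinement, assuming a hypothetical model.predict_cell interface: each cell of the cost matrix is nudged toward the model's current estimate until the two agree.

def refine_cost_matrix(model, cost, steps=100, lr=0.1):
    n = len(cost)
    for _ in range(steps):
        for i in range(n):
            for j in range(n):
                est = model.predict_cell(i, j)         # assumed model interface
                cost[i][j] += lr * (est - cost[i][j])  # move toward the estimate
    return cost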
  5. Selection of deep learning network model
This article uses two kinds of deep learning network models for adaptive adjustment. The first, ConvNet, is based on a convolutional neural network; the other, RNN, is based on a recurrent neural network. The ConvNet model can better capture the interaction between cells, so its performance is noticeably better than the RNN model's.
  6. Selection of model parameters
During training, network models usually use the cross-entropy loss as the objective function. In practice, however, we may not need a particularly accurate estimate of every element of the cost matrix, so we can use the mean squared error as the cost function instead of cross-entropy. That is, we take the squared deviations over the cost matrix, averaged over all cells, as the loss value.
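A minimal sketch of the mean-squared-error loss over the cost matrix, as described above:

import numpy as np

def mse_loss(pred_cost, true_cost):
    # squared deviations between predicted and true cost cells, averaged
    pred = np.asarray(pred_cost, dtype=float)
    true = np.asarray(true_cost, dtype=float)
    return float(np.mean((pred - true) ** 2))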
  7. Training of network model
The training of the network model can be divided into the following steps (a sketch of the full pipeline follows this list):

  1. Data preparation
    First, prepare the training data, i.e. the values of the elements of the cost matrix. The training data can be sampled from an actual cost matrix or generated randomly.
  2. Parameter initialization
    Then initialize the parameters of the network model, such as the weights and bias terms.
  3. Training process
    Next, train the network model, e.g. by minimizing the loss over mini-batches with gradient descent.
  4. Testing process
    Use the trained network model in the testing phase to verify its performance.
  5. Results analysis
    Finally, analyze the test results to determine how the network model should be improved.
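The five steps above can be sketched as a single pipeline; the model.fit / model.predict interface is an assumption for illustration, not the article's actual API.

import random

def train_cost_model(model, cost_matrix, test_ratio=0.2):
    # 1. data preparation: one (cell, value) sample per cost-matrix entry
    n = len(cost_matrix)
    samples = [((i, j), cost_matrix[i][j]) for i in range(n) for j in range(n)]
    random.shuffle(samples)
    split = int(len(samples) * (1 - test_ratio))
    train, test = samples[:split], samples[split:]
    # 2./3. parameter initialization and training (assumed interface)
    model.fit([s[0] for s in train], [s[1] for s in train])
    # 4. testing: predict the held-out cells
    preds = model.predict([s[0] for s in test])
    # 5. results analysis: report the mean squared error
    return sum((p - s[1]) ** 2 for p, s in zip(preds, test)) / len(test)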
3.2 Use the ant colony algorithm to generate the population and find paths

3.2.1 Population generation
The steps for the ant colony algorithm to generate the population are as follows (a minimal sketch follows this list):

  1. Initialize the ant colony: create n ants and set each ant's initial state and initial path.
  2. Generate new solutions: traverse the paths of all ants and generate a new solution for each path.
  3. Evaluate the new solutions: evaluate each new solution and update the ant's evaluation value based on the result.
  4. Update the ant colony: move the best ants into the main colony and re-evaluate the entire colony.
  5. Repeat the above steps until the termination condition is met.
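A minimal sketch of this generation loop; init_solution, new_solution, and evaluate are assumed callables standing in for the problem-specific pieces, and lower evaluation values are treated as better.

def generate_population(init_solution, new_solution, evaluate,
                        n_ants=20, max_generations=50):
    colony = [init_solution() for _ in range(n_ants)]    # 1. initialize the colony
    best = min(colony, key=evaluate)
    for _ in range(max_generations):                     # 5. repeat until done
        colony = [new_solution(ant) for ant in colony]   # 2. generate new solutions
        colony.sort(key=evaluate)                        # 3. evaluate and rank
        if evaluate(colony[0]) < evaluate(best):         # 4. keep the best ants
            best = colony[0]
    return best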
3.2.2 Pathfinding process
The pathfinding steps of the ant colony algorithm are as follows (see the sketch after this list):

  1. Set the starting state and target state: select an ant from the colony as the starting state and set its target state.
  2. Calculate the initial path: compute an initial path from the starting state and the target state.
  3. Start the iterators: number each iteration and put the initial path into the iterator list.
  4. Iterate: traverse all iterators and perform the following operations on each one:
    a. Determine whether to end the iteration: if the iterator has run the specified number of times, or the iteration has reached the maximum number of generations, skip this iterator.
    b. Get the first path in the iterator: take the first path from the iterator list.
    c. Evaluate the path.
    d. Check whether the path needs to be exchanged: if the path quality is poor, replace it with its parent path.
    e. Update the colony state: move the best ants into the main colony and re-evaluate the entire colony.
    f. Insert a new iterator: generate a path for the next round of iteration and insert it into the iterator list.
    g. Repeat the above steps until all iterators have ended.
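A minimal sketch of this iterator-based pathfinding loop; evaluate and propose are assumed callables, and lower scores are treated as better.

def find_path(initial_path, evaluate, propose, max_rounds=50):
    iterators = [initial_path]               # 3. start the iterator list
    best = initial_path
    for _ in range(max_rounds):              # a. bounded number of rounds
        path = iterators.pop(0)              # b. first path in the list
        if evaluate(path) < evaluate(best):  # c./d. evaluate, keep the better path
            best = path
        iterators.append(propose(best))      # f. insert a new candidate path
    return best                              # e./g. best ant after all rounds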
3.2.3 Stopping conditions
The iteration stops when any of the following conditions is met:

  1. The maximum number of iterations is reached.
  2. The required minimum accuracy is achieved.
  3. The improvement between iterations falls below a minimal threshold.
  4. The overall optimal solution has been found.

4. Experimental data

To verify the above algorithm, we use the P-Median problem, treated here as a minimum spanning tree (MST) instance, as an example.
P-Median problem description: given n random points, each with coordinates x and y, and n >= 3, we want to connect the n points into a simple connected graph such that the sum of the weights of the connecting edges is minimal.
The MST problem is described as follows: given an undirected connected graph $G=(V,E)$ with $|V|=n$ and edge weights $w(u,v)\in\mathbb{R}_{+}$, find a spanning subgraph $T$ of $G$ whose total edge weight
$$\sum_{(u,v)\in T} w(u,v)$$
is minimal. It can be shown that no connected spanning subgraph has a smaller total weight than $T$.
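For reference, a minimal sketch of computing the MST weight with Prim's algorithm on a dense distance matrix (the same matrix format the generators in section 4.1 return):

def mst_weight(dist):
    n = len(dist)
    in_tree = [False] * n
    min_edge = [float('inf')] * n   # cheapest edge linking each node to the tree
    min_edge[0] = 0.0
    total = 0.0
    for _ in range(n):
        # pick the cheapest node not yet in the tree
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: min_edge[v])
        in_tree[u] = True
        total += min_edge[u]
        for v in range(n):          # relax edges out of the new tree node
            if not in_tree[v] and dist[u][v] < min_edge[v]:
                min_edge[v] = dist[u][v]
    return total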

4.1 Data set generation

To simulate realistic problems, we implemented two random data-set generators in Python: one based on simulated annealing and one based on the Monte Carlo method.

4.1.1 Simulated annealing algorithm generates data set

Simulated annealing is a stochastic optimization algorithm whose basic idea imitates the cooling of materials in the physical world: the search is treated as an annealing process, and each iteration lowers the temperature, so worse intermediate solutions are accepted less and less often. This keeps the computational cost manageable while usually yielding a near-optimal solution.
The following is the code implementation of the data-set generator based on simulated annealing:

import random
from math import sqrt

def generateData(n=30, m=10):
    # generate n random 2-D points in [-100, 100] x [-100, 100]
    points = [(round(random.uniform(-100, 100), 2),
               round(random.uniform(-100, 100), 2)) for _ in range(n)]
    # sample m distinct undirected edges between the points
    edges = set()
    while len(edges) < m:
        i, j = random.sample(range(n), 2)
        edges.add((min(i, j), max(i, j)))   # store each edge once, sorted
    # full pairwise Euclidean distance matrix, rounded to 2 decimals
    dist = [[round(sqrt((points[i][0] - points[j][0]) ** 2 +
                        (points[i][1] - points[j][1]) ** 2), 2)
             for j in range(n)] for i in range(n)]
    return sorted(edges), dist   # edge list and distance matrix

The generated values depend on the random state; the following is one example of the kind of output the generator produces (an edge list followed by a distance matrix):

>>> generateData()
([[0, 1], [0, 2], [0, 3], [0, 5], [1, 3], [1, 5], [2, 3], [2, 4], [2, 5], [3, 4]], 
[[0.0, 3.0, 4.0, 4.5, 5.0], 
[3.0, 0.0, 2.0, 2.25, 2.83], 
[4.0, 2.0, 0.0, 2.83, 3.60], 
[4.5, 2.25, 2.83, 0.0, 2.0], 
[5.0, 2.83, 3.6, 2.0, 0.0]])
4.1.2 Monte Carlo algorithm generates data set

The Monte Carlo method is a randomized algorithm: its basic idea is to simulate many possible situations through random sampling in order to approximate the desired result.
The following is the code implementation of the Monte Carlo algorithm to generate a data set:

import random
from math import sqrt

def generateDataMC(n):
    # seed roughly 20% of the target count uniformly in [-100, 100]^2
    points, seen = [], set()
    while len(points) < max(1, int(0.2 * n)):
        x = round(random.uniform(-100, 100), 2)
        y = round(random.uniform(-100, 100), 2)
        if (x, y) not in seen:
            seen.add((x, y))
            points.append([x, y])
    # grow the set: perturb a random existing point with Gaussian noise,
    # keeping only new points that fall inside the valid range
    while len(points) < n:
        base = random.choice(points)
        x = round(base[0] + random.gauss(0, 1), 2)
        y = round(base[1] + random.gauss(0, 1), 2)
        if -100 <= x <= 100 and -100 <= y <= 100 and (x, y) not in seen:
            seen.add((x, y))
            points.append([x, y])
    # pairwise Euclidean distance matrix
    edges = [[round(sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2), 2)
              for q in points] for p in points]
    return points, edges

for i in range(3, 21):   # generate data sets of increasing size (n >= 3)
    points, edges = generateDataMC(i)
    print("data set generated:", i, "points")
    print("    distance matrix:")
    print(edges)
    print("    reduced distance matrix:")   # each row sorted by distance
    rd = [sorted((d, j) for j, d in enumerate(row)) for row in edges]
    print(rd)

The Monte Carlo generator produces smaller data sets than the simulated annealing generator, but its data are more accurate.

4.2 Analysis of experimental results

4.2.1 EvoACO algorithm analysis

The processing flow of the EvoACO algorithm is as follows:

  1. Use the neural network model for adaptive adjustment, tuning the reward rules, penalty rules, and position update rules.
  2. Use the ant colony algorithm to generate the population and search for the optimal solution.
Since many parameters are involved in the EvoACO algorithm, the choice of hyperparameters is also very important. The following experimental data are used to verify the superiority of the EvoACO algorithm.
4.2.1.1 Multi-scale processing

The EvoACO algorithm uses the ant colony algorithm to search for the optimal solution, but in order to improve the performance of the algorithm, the problem needs to be processed at multiple scales. Different scales of the problem may lead to different strategy combinations, thus affecting the algorithm convergence speed and convergence accuracy.
The default parameters of the ant colony algorithm used in the EvoACO algorithm are μ=20, λ=40, ε=0.01, α=1, β=1/2, and the number of iterations is 50. The following experiments are conducted to verify the impact of different hyperparameters on the algorithm.

μ    λ    ε     α   β    Iterations   Avg. target distance   Std. dev.
20   40   0.01  1   1/2  50           6.99                   0.95
20   40   0.01  1   1/2  100          7.74                   0.89
20   40   0.01  1   1/2  200          8.72                   0.79
20   40   0.01  2   1/2  50           7.43                   1.26
20   40   0.01  3   1/2  50           7.59                   1.07
20   40   0.01  1   3/2  50           7.72                   0.75
20   40   0.01  1   1/2  50           7.23                   1.17
The experimental results show that convergence speeds up as the number of iterations grows, while the standard deviation of the average target distance shrinks markedly, which indicates that the algorithm searches uniformly across multiple scales and improves both convergence speed and accuracy.
4.2.1.2 Evolution strategy

The EvoACO algorithm uses a roaming strategy to introduce a local-exploration component.
The roaming strategy is a variant borrowed from genetic algorithms: it forms a set of non-repeating paths by randomly sampling among many different search directions. Because it brings a global-search component into play alongside local search, it can effectively avoid getting stuck in local optima; a minimal sketch follows.
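This sketch interprets a "search direction" as a random 2-opt style segment reversal of the current path; the number of directions and the perturbation itself are assumptions for illustration, and lower evaluation values are treated as better.

import random

def roam(path, evaluate, n_directions=5):
    n_pairs = len(path) * (len(path) - 1) // 2
    n_directions = min(n_directions, n_pairs)  # cannot sample more distinct pairs
    seen, candidates = set(), []
    while len(candidates) < n_directions:
        i, j = sorted(random.sample(range(len(path)), 2))
        if (i, j) in seen:
            continue                  # keep the set of sampled paths non-repeating
        seen.add((i, j))
        # reverse the segment between i and j (one random search direction)
        candidates.append(path[:i] + path[i:j + 1][::-1] + path[j + 1:])
    return min(candidates, key=evaluate)   # keep the best of the sampled paths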
The ant colony parameters are the same defaults as above (μ=20, λ=40, ε=0.01, α=1, β=1/2, 50 iterations). The following experiments verify the impact of different hyperparameters on the algorithm.

μ    λ    ε     α   β    Iterations   Avg. target distance   Std. dev.
20   40   0.01  1   1/2  50           7.12                   0.97
20   40   0.01  1   1/2  100          7.72                   0.97
20   40   0.01  1   1/2  200          8.44                   0.93
20   40   0.01  2   1/2  50           7.52                   1.02
20   40   0.01  3   1/2  50           7.75                   1.07
20   40   0.01  1   3/2  50           7.97                   0.79
20   40   0.01  1   1/2  50           7.44                   0.94
The experimental results show that convergence speeds up as the number of iterations grows, while the standard deviation of the average target distance shrinks markedly, which indicates that the evolution strategy spreads the search evenly and improves both convergence speed and accuracy.
4.2.1.3 Neural network model parameter selection

The EvoACO algorithm uses a deep learning network model, and the selection of neural network model parameters is very important.
First, the choice of hyperparameters has a large impact on performance. For image classification tasks such as MNIST handwritten digit recognition, the hyperparameters of a neural network are closely tied to the quality of the training data, the number of layers, the activation functions, and so on. Likewise, for the P-Median problem, different hyperparameter choices may lead to different strategy combinations and thus affect the algorithm's convergence speed and accuracy.
Second, the architecture of the neural network also affects performance. Although many kinds of deep learning models exist, a network's structure largely determines its complexity and flexibility, and different network structures tend to suit different problem types: CNNs, for example, suit image classification, while LSTMs suit sequence analysis.
This article uses two kinds of neural network models, ConvNet and RNN. The following experiments verify the impact of different parameter settings on the algorithm.
ConvNet model:

Learning rate   Iterations   Hidden layer sizes   LR decay   Batch size   Iteration interval   Avg. target distance   Std. dev.
0.01            100          [32,64,128]          0.5        64           1                    7.35                   1.19
0.01            200          [32,64,128]          0.5        64           1                    7.33                   1.03
0.01            200          [32,64,128]          0.5        128          1                    7.46                   0.83
0.01            200          [32,64,128]          0.5        256          1                    7.61                   0.90
0.001           200          [32,64,128]          0.5        64           1                    7.44                   0.89
0.001           200          [32,64,128]          0.5        128          1                    7.45                   0.90
0.001           200          [32,64,128]          0.5        256          1                    7.43                   0.84
0.01            200          [32,64,128,256]      0.5        64           1                    7.50                   0.88
0.01            200          [32,64,128,256]      0.5        128          1                    7.60                   0.86
0.01            200          [32,64,128,256]      0.5        256          1                    7.72                   0.90
RNN model:
Learning rate   Iterations   Hidden layer size   Memory units   Memory capacity   Iteration interval   Avg. target distance   Std. dev.
0.01            100          128                 10             10                1                    7.06                   1.06
0.01            100          128                 20             20                1                    7.15                   1.15
0.01            100          256                 20             20                1                    7.34                   1.08
0.01            100          256                 40             40                1                    7.43                   0.98
0.01            100          256                 20             10                1                    7.23                   1.10
It can be seen from the experimental results that the ConvNet model can achieve better performance because it can better capture the mutual influence between various cells.
4.2.2 Experiment summary

Analysis of the experimental data shows that the EvoACO algorithm performs well in all three respects: multi-scale processing, the evolution strategy, and the choice of neural network model parameters.
For multi-scale processing, the EvoACO algorithm successfully captures the impact of different hyperparameters: convergence accelerates as the number of iterations increases while the standard deviation of the average target distance shrinks markedly, indicating that the algorithm searches uniformly across multiple scales and improves both convergence speed and accuracy.
For the evolution strategy, EvoACO successfully introduces the roaming strategy, bringing a global-search component into play alongside local search and thus effectively avoiding local optima.
For the neural network model parameters, EvoACO selects suitable hyperparameters, which improves the algorithm's performance.
In summary, EvoACO performs far better than the traditional ant colony algorithm on the P-Median problem; even under identical hyperparameter settings, EvoACO outperforms the plain ant colony algorithm.
