Ant colony algorithm re-optimization: combining the ACO algorithm with Sarsa from reinforcement learning

Introduction to Ant Colony Algorithm, Sarsa and TSP Problems

Before introducing the optimization of the ant colony algorithm, the author first explains the application background of the algorithms involved.

TSP and Sarsa

TSP is the traveling salesman problem: given n cities and their coordinates, the distance between cities i and j is written d_ij (i and j index different cities). The distance is generally Euclidean, and the problem is symmetric (the distances i->j and j->i are equal). The goal is to find a Hamiltonian circuit with the minimum total length.
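
As a minimal sketch of the objective described above, the following computes the length of a closed tour under Euclidean distance. The function name `tour_length` and the four-city example are illustrative assumptions and are not taken from the author's code.

```python
import math

def tour_length(coords, tour):
    """Total Euclidean length of the closed tour that visits cities in the given order."""
    total = 0.0
    for k in range(len(tour)):
        x1, y1 = coords[tour[k]]
        x2, y2 = coords[tour[(k + 1) % len(tour)]]  # wrap around to close the circuit
        total += math.hypot(x2 - x1, y2 - y1)
    return total

# Hypothetical example: four cities on the corners of a unit square.
coords = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(tour_length(coords, [0, 1, 2, 3]))  # walks the perimeter: 4.0
print(tour_length(coords, [0, 2, 1, 3]))  # uses both diagonals, so it is longer
```
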
Sarsa is a classic reinforcement learning algorithm with which an agent learns a policy in an unknown environment. The general idea is that the agent chooses its next action by referring to the estimated benefit of each action: with probability ε it takes the action with the maximum benefit, and with probability 1-ε it picks one of the remaining actions at random, for exploration. (Many texts use the opposite convention, with ε as the exploration probability.)
For details, please refer to the author's previous blog post, a comparison experiment between Sarsa and Q-Learning on Cliff-Walking. The Sarsa pseudocode from that post is quoted below.
[Figure: Sarsa pseudocode]
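
The core of Sarsa can be sketched in a few lines. This is a minimal illustration of the standard update Q(s, a) <- Q(s, a) + alpha * (reward + gamma * Q(s', a') - Q(s, a)), not the author's code. The names `epsilon_greedy` and `sarsa_update`, and the values of the learning rate `alpha` and discount `gamma`, are assumptions chosen for the example. The action choice follows this article's convention, where ε is the probability of the greedy action.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """Greedy action with probability epsilon, otherwise one of the remaining actions."""
    greedy = max(actions, key=lambda a: Q[(state, a)])
    if len(actions) == 1 or rng.random() < epsilon:
        return greedy
    return rng.choice([a for a in actions if a != greedy])

def sarsa_update(Q, s, a, reward, s_next, a_next, alpha=0.5, gamma=0.9):
    """One Sarsa step: move Q(s, a) toward reward + gamma * Q(s', a')."""
    target = reward + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

Q = defaultdict(float)  # unseen state-action pairs start at 0
sarsa_update(Q, s=0, a="right", reward=1.0, s_next=1, a_next="right")
print(Q[(0, "right")])  # 0.5
```

Because the update uses the action `a_next` that the agent will actually take, Sarsa is an on-policy method.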

ACO algorithm

The ant colony algorithm is a bionic heuristic and one of the classic heuristic algorithms. Since it was proposed, many scholars have studied and improved it over the years. Many methods combine the ant colony algorithm with other heuristics, or with the now popular deep learning and reinforcement learning, to obtain stronger optimization ability. Of course, the classic ant colony algorithm is far inferior to the best methods for solving the TSP, but tall buildings can only be built on solid foundations, so improving a small part of the basics is still worthwhile. That is the purpose of this article.
In the ant colony algorithm, each ant decides which path to take next based on the pheromone left by previous ants, until it reaches its destination. This pheromone-following behavior of real ants is the idea behind ACO, which is applied here to the TSP. The flow chart of the ant colony algorithm is shown below.
[Figure: flow chart of the ant colony algorithm]
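
The pheromone-based choice of the next city can be sketched as follows. This assumes the standard Ant System rule, in which the weight of moving from city i to city j is pheromone[i][j]^α * (1 / distance[i][j])^β, normalized over the unvisited cities. Whether the author's code uses exactly this rule is an assumption. The function names and the three-city data are hypothetical, and the default `alpha` and `beta` simply mirror the first parameter set used later in the article.

```python
import random

def transition_probabilities(current, unvisited, pheromone, distance, alpha=1.0, beta=10.0):
    """Return {city: probability} under the standard Ant System rule."""
    weights = {
        j: (pheromone[current][j] ** alpha) * ((1.0 / distance[current][j]) ** beta)
        for j in unvisited
    }
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

def roulette_select(probs, rng=random):
    """Sample a city in proportion to its probability (probs must be non-empty)."""
    r = rng.random()
    for city, p in probs.items():
        r -= p
        if r <= 0:
            return city
    return city  # floating-point round-off: fall back to the last city

# Hypothetical example: uniform pheromone, city 1 is closer to city 0 than city 2.
pheromone = [[1.0] * 3 for _ in range(3)]
distance = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
probs = transition_probabilities(0, [1, 2], pheromone, distance, alpha=1.0, beta=2.0)
print(probs)  # the closer city gets the larger share
print(roulette_select(probs))
```
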

Concrete improvements and code

Improvement description

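Draw a pseudo-random number rr. When rr > 0.1 (which happens with probability 0.9 if rr is uniform on [0, 1)), the next city of the tour is selected from the pheromone-based transition probabilities, which favor the most attractive city. Otherwise, with probability 1-ε, one of the 3 best next cities is selected at random.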
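Reasons:

 The exploration step borrows the idea of Sarsa.
 It explores only one of the few better cities.
 Not all cities are considered because the edges of the optimal solution are never too bad, so the choice is not made among all edges with equal probability.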

The author has uploaded all the code to CSDN; it can be downloaded for experiments if needed:
ant colony algorithm improvement

Part of the code

The following is the part that was changed relative to the original ant colony algorithm; everything else is the same as the original.

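        # Assumes `import random` at module level, and that rr, rand and
        # probabilities were set earlier in the enclosing method.
        if rr > 0.1:
            # Roulette-wheel selection over the transition probabilities.
            for i, probability in enumerate(probabilities):
                rand -= probability
                if rand <= 0:
                    selected = i
                    break
        else:
            # Exploration: pick uniformly among the best-ranked candidates.
            # The text above describes the 3 best; this excerpt keeps the top 2.
            sorted_allowed = sorted(enumerate(probabilities), key=lambda x: x[1], reverse=True)
            ex_allowed = [i for i, p in sorted_allowed[:2] if p > 0]  # skip zero-probability cities
            selected = random.choice(ex_allowed)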
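
The fragment above depends on variables from its enclosing method. For experimentation, it can be wrapped into a self-contained function along the following lines. This is a sketch under stated assumptions, not the author's implementation: the name `select_next_city` and the parameters `top_k`, `threshold` and `rng` are hypothetical, rr is assumed to be uniform on [0, 1), and `top_k` defaults to 3 to follow the description, whereas the excerpt slices the top 2.

```python
import random

def select_next_city(probabilities, top_k=3, threshold=0.1, rng=random):
    """Return the index of the next city.

    probabilities[i] is the transition probability of city i (0 if already
    visited); at least one entry is assumed to be positive.
    """
    rr = rng.random()
    if rr > threshold:
        # Usual roulette-wheel selection over the transition probabilities.
        rand = rng.uniform(0, sum(probabilities))
        for i, p in enumerate(probabilities):
            rand -= p
            if rand <= 0 and p > 0:
                return i
    # Exploration branch (also reached on floating-point round-off):
    # uniform choice among the top_k best cities that are still reachable.
    ranked = sorted(enumerate(probabilities), key=lambda x: x[1], reverse=True)
    candidates = [i for i, p in ranked[:top_k] if p > 0]
    return rng.choice(candidates)

# Hypothetical example: cities 0 and 4 are already visited.
probs = [0.0, 0.7, 0.2, 0.1, 0.0]
print(select_next_city(probs, rng=random.Random(0)))
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible, which helps when comparing the improved and unimproved variants.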

The picture is taken from reference [1].

Numerical experiment

The author ran experiments on att48, a public TSP benchmark instance. After tuning and fixing the parameters, results were collected and summarized over 1000 runs in total, and the improvement proved effective: about **1%** better than the original algorithm.
The dataset and code are shared in the CSDN download mentioned above.
The author uses two sets of parameters to run both the original and the improved ant colony algorithm, for 4 groups of experiments in total. The parameters and results are given and discussed below.
The parameters are, in order: the number of ants, the number of iterations, α, β, the decay rate r, the fixed Q value, and the selected method of calculating the pheromone.

First set of parameters
10, 100, 1.0, 10.0, 0.5, 10, 3
Improved ant colony: ave = 36366, min = 34575
Unimproved ant colony: ave = 36536, min = 35251


Second set of parameters
40, 50, 0.1, 18.0, 0.7, 48, 3
Improved ant colony: ave = 35949, min = 34448
Unimproved ant colony: ave = 35667, min = 34852
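
To put the figures above in relative terms, the following short calculation expresses each improved result as a percentage change against the unimproved one. The helper `relative_improvement` is a hypothetical name; the numbers are exactly those reported above.

```python
def relative_improvement(baseline, improved):
    """Percentage by which `improved` is shorter than `baseline` (negative if longer)."""
    return 100.0 * (baseline - improved) / baseline

# (unimproved, improved) tour lengths as reported above
results = {
    "set 1 ave": (36536, 36366),
    "set 1 min": (35251, 34575),
    "set 2 ave": (35667, 35949),
    "set 2 min": (34852, 34448),
}
for name, (baseline, improved) in results.items():
    print(f"{name}: {relative_improvement(baseline, improved):+.2f}%")
# set 1 ave: +0.47%
# set 1 min: +1.92%
# set 2 ave: -0.79%
# set 2 min: +1.16%
```
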

Conclusion and analysis

With the parameters and other variables held fixed, the improved ant colony reaches a smaller minimum tour length than the unimproved one for both parameter sets. Note that with the second set of parameters the improved average is higher than the unimproved average while the minimum is still smaller. This suggests that the improved ant colony searches a larger space than the unimproved one, which is why its best results are better.

References

[1] Shi Feng, Wang Hui, et al. Analysis of 30 Cases of MATLAB Intelligent Algorithms. ISBN: 9787512403512.

Origin blog.csdn.net/wlfyok/article/details/129325205