[Cooperative task] Multi-UAV collaborative task planning based on MATLAB [including MATLAB source code 2515]

⛄1. Introduction to Multi-UAV Collaborative Operation

0 Introduction
Multiple UAVs operating as a swarm to complete tasks collaboratively is the future direction of UAV development. The UAVs that form a swarm communicate with one another over inter-vehicle data links to achieve cooperation, and can quickly and accurately perform complex tasks such as path planning, cooperative reconnaissance, cooperative perception, and coordinated attack.
To realize the attractive prospect of UAV swarm collaboration, active research has been carried out both at home and abroad. In the United States, the Defense Advanced Research Projects Agency (DARPA) launched the "Gremlins" program in 2015, planning to develop a drone swarm system with self-organization and intelligent coordination capabilities. The Strategic Capabilities Office (SCO) of the U.S. Department of Defense launched a drone swarm project in 2014, which aims to perform low-altitude situational awareness and jamming missions with "Perdix" micro-UAV swarms air-launched from manned aircraft. The U.S. Office of Naval Research (ONR) announced the "Low-Cost UAV Swarming Technology" (LOCUST) program in 2015 to develop a UAV swarm that can be launched in rapid succession; the UAVs use short-range radio-frequency networks to share situation information and coordinate the execution of cover, attack, or defense missions. In 2017, DARPA held a proposers day for the OFFensive Swarm-Enabled Tactics (OFFSET) program, whose goal is to develop an open, game-based architecture to generate, evaluate, and integrate swarm tactics for unmanned swarm systems in urban warfare.
In Europe, the European Defence Agency launched the "EuroSWARM" project in 2016, carrying out research on key technologies such as autonomous decision-making and coordinated flight of drone swarms. Also in 2016, the British Ministry of Defence held a UAV swarm competition in which the participating teams controlled UAV swarms to accomplish tasks such as communication relay, coordinated jamming, target tracking and positioning, and area mapping. In 2017, Russia's Radio Electronic Technologies Group (KRET) published a research plan stating that loading multiple swarming drones onto fighter jets could enable a new combat style of coordinated reconnaissance and attack.
Relevant research has also been carried out in China. Recently, the Electronic Science Research Institute of China Electronics Technology Group Corporation (CETC) released a video of its collaborative UAV "swarm", which attracted widespread attention.
Against this backdrop, this paper summarizes the development trend of UAV collaborative applications, discusses current research progress and development directions, and argues that UAV swarm task collaboration is developing toward multi-agent collaboration.

1 Development Trend of UAV Collaborative Application
Analysis of existing research work shows (Figure 1) that UAV collaborative applications can be roughly divided into three stages: distributed collaboration, swarm intelligence collaboration, and, in the future, multi-agent collaboration.
Figure 1. The development trend of UAV collaboration technology.
The first stage of UAV swarm collaboration is simple distributed collaboration. At this stage, cooperative tasks are computed in advance and distributed among loosely connected swarm members according to the execution conditions. The swarm has essentially no ability to adjust task allocation dynamically in response to changes in the environment or goals, and the task assigned to each UAV is usually fixed.

In view of the limitations of pre-allocation, and inspired by the collective behavior of biological swarms, swarm intelligence has been applied to UAV swarms, advancing UAV swarm collaboration to the second stage: swarm intelligence collaboration. At this stage, each UAV node is endowed with elementary intelligence and is capable of simple cognition and decision-making. Through tighter coupling between swarm members, the optimization method or optimization goal can be adjusted according to feedback during execution, so that the whole UAV swarm forms a self-organized and highly stable distributed system. Swarm intelligence collaboration is currently in a period of rapid development in both research and application.

With further improvements in node computing power and the rapid development of artificial intelligence technology, UAV collaboration is about to enter the third stage: multi-agent collaboration. In this stage, each UAV in the swarm will be an independent, comprehensive intelligent agent with multi-dimensional cognitive computing and advanced intelligent processing capabilities, achieving more efficient autonomous learning and decision-making and, on that basis, completing more complex and demanding tasks.

2 Distributed collaboration
Since the emergence of UAV swarms, they have been used to solve distributed collaborative tasks such as cooperative path planning, cooperative sensing, and cooperative task planning. Early distributed collaborative tasks were usually fully computed and allocated in advance, with each UAV node executing according to the established algorithm or scheme. Based on the computed results, the distributed collaborative UAV group organizes and executes the configured tasks, as shown in Figure 2.
Figure 2 Task execution mode in the distributed collaborative stage
2.1 Cooperative path planning
In cooperative path planning, after a target is assigned or discovered by search, each UAV node determines its flight path according to the current task state. For route planning in cooperative search-and-track tasks, the UAV swarm can maximize an objective function so as to detect the most important targets and track them at critical moments, thereby obtaining the most valuable information. Path planning for cooperative search can be divided into two sub-problems: partitioning the work area among UAVs and planning full-coverage search paths. This transforms multi-UAV cooperative search into single-UAV search over sub-areas, so the target area can be partitioned and flight routes generated quickly. A multi-UAV cooperative reconnaissance path planning algorithm based on an improved genetic algorithm can solve the path planning problem for efficient reconnaissance of various target types in a complex battlefield environment and can effectively improve the accuracy and efficiency of path planning.
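As a concrete illustration of the area-division idea, the following minimal MATLAB sketch (entirely hypothetical parameters, not the cited algorithm) splits a rectangular search area into equal-width strips, one per UAV, and generates a boustrophedon ("lawnmower") coverage path for each strip:

% Minimal sketch: strip decomposition plus lawnmower coverage paths.
% All numbers are assumed for illustration.
area = [0 1000 0 600];      % [xmin xmax ymin ymax], metres
nUAV = 3;                   % number of UAVs sharing the area
sweep = 50;                 % spacing between sweep lines (sensor footprint)

xEdges = linspace(area(1), area(2), nUAV+1);   % strip boundaries
paths = cell(1, nUAV);
for k = 1:nUAV
    xs = xEdges(k):sweep:xEdges(k+1);          % sweep lines in strip k
    wp = zeros(2*numel(xs), 2);
    for i = 1:numel(xs)                        % alternate up/down legs
        if mod(i, 2) == 1
            ys = [area(3) area(4)];
        else
            ys = [area(4) area(3)];
        end
        wp(2*i-1:2*i, :) = [xs(i) ys(1); xs(i) ys(2)];
    end
    paths{k} = wp;                             % waypoint list for UAV k
end
plot(paths{1}(:,1), paths{1}(:,2), '-o'); axis equal   % inspect one path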

2.2 Cooperative sensing
Cooperative sensing is a task form in which a multi-UAV swarm jointly detects and perceives the state of a target area; the most common example is cooperative spectrum sensing. Tailored to the characteristics of cooperative spectrum sensing tasks, a distributed cooperative execution scheme using an optimal fusion criterion can optimize detection performance, minimize the total error rate of cooperative spectrum sensing, shorten sensing time, and reduce the cost of the sensing process.
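The flavor of such a fusion criterion can be illustrated with a generic k-out-of-N hard-decision fusion rule (an assumption for illustration; the cited scheme may differ): the fusion center declares the spectrum occupied when at least k of N UAVs report a detection, and k is chosen to minimize the total error rate.

% Minimal sketch: choose the k-out-of-N fusion rule that minimizes the
% total error rate. Per-UAV probabilities are assumed values.
N  = 8;                      % number of sensing UAVs
Pd = 0.9;  Pf = 0.1;         % per-UAV detection / false-alarm probability
P0 = 0.5;                    % prior probability that the channel is idle

Qd = zeros(1, N);  Qf = zeros(1, N);
for k = 1:N                  % P(at least k of N report "occupied")
    for m = k:N
        c = nchoosek(N, m);
        Qd(k) = Qd(k) + c * Pd^m * (1-Pd)^(N-m);
        Qf(k) = Qf(k) + c * Pf^m * (1-Pf)^(N-m);
    end
end
err = P0*Qf + (1-P0)*(1-Qd);           % total error rate of each rule
[minErr, kBest] = min(err);
fprintf('best rule: %d-out-of-%d, total error %.4f\n', kBest, N, minErr);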

2.3 Cooperative task planning
Cooperative task planning requires the swarm system to allocate tasks systematically according to the target tasks and execution conditions. For example, for coordinated strike missions, the allocation of multi-UAV cooperative strike tasks can be realized by establishing UAV damage-cost, voyage-cost, and value-benefit index functions [5]; building a multi-objective optimization model and solving it with a genetic algorithm can effectively improve task completion efficiency. For cooperative search-and-rescue missions, an adaptive feedback-regulated genetic algorithm based on a communication-preserving auction method can mitigate the traditional genetic algorithm's tendency to fall into local optima.
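To make the index-function idea concrete, the sketch below (hypothetical data and weights, not the model of [5]) folds damage-cost, voyage-cost, and value-benefit matrices into one weighted score and finds the best one-to-one UAV-target assignment by brute-force enumeration, which is feasible for small problems:

% Minimal sketch: weighted multi-index task assignment by enumeration.
n = 4;                                   % UAVs = targets = 4 (assumed)
rng(1);
damage  = rand(n);                       % damage-cost index, UAV x target
voyage  = rand(n);                       % voyage-cost index
benefit = rand(n);                       % value-benefit index
w = [0.3 0.3 0.4];                       % weights (assumed)
cost = w(1)*damage + w(2)*voyage - w(3)*benefit;   % lower is better

P = perms(1:n);                          % all one-to-one assignments
best = inf;
for i = 1:size(P, 1)
    c = sum(cost(sub2ind([n n], 1:n, P(i,:))));
    if c < best, best = c; assign = P(i,:); end
end
fprintf('UAV %d -> target %d\n', [1:n; assign]);

For larger problems, a genetic algorithm or the Hungarian method would replace the enumeration.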

The research on the above three types of collaborative tasks shows that although the distributed collaborative mode of UAV swarms fully exploits the "distributed" characteristic, setting effective objective functions and optimization methods according to the task objectives and swarm characteristics to seek optimal or near-optimal results, the task environment and solution goals must be optimized and computed before execution and then assigned for execution. This cannot adapt to dynamic task goals and environmental changes in practice, and it lacks "intelligent" perception and adaptive behavior. With deepening research on biological swarm intelligence such as bee swarms and bird flocks, swarm intelligence has been further introduced into UAV collaboration.

3 Swarm intelligence collaboration
"bird colony" and "ant colony" and other biological groups, although the individual intelligence in them is limited, but the group shows a high degree of self-organization, which is consistent with the demand for autonomous coordination of UAV swarms, so the group Intelligence has also been extensively studied in the field of UAV collaborative applications, which makes UAV cluster collaboration have preliminary intelligence. The UAV swarm system with swarm intelligence introduces group feedback and adaptability in the process of task disassembly and execution, and can perform more complex dynamic tasks. The process is shown in Figure 3.
Figure 3 Task execution mode in the swarm intelligence collaboration stage
3.1 Cooperative path planning
Path planning in mountainous areas is strongly affected by terrain, so it is difficult to pre-allocate and execute deterministically, and simple distributed collaboration is not up to the task. For example, for path planning of emergency material transportation in mountainous areas, an improved ant colony algorithm that takes path safety into account can converge quickly and generate shorter paths. The ant colony algorithm has also been used for trajectory planning of UAVs flying cooperatively to an air battlefield; an improved chaotic ant colony algorithm can better overcome the local-extremum and slow-convergence defects of the traditional ant colony algorithm, improving its global optimization ability and search efficiency. For trajectory planning in scenarios of cooperative attack on moving targets, another improved ant colony algorithm establishes a UAV swarm cooperative trajectory planning model combined with task assignment, which can quickly plan effective trajectories against multiple moving ground targets.
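For reference, a compact generic ant colony optimization (ACO) loop is sketched below on a waypoint-visiting problem (assumed data and parameters; the improved variants cited above add safety terms, chaos perturbation, or task-assignment coupling on top of this skeleton):

% Minimal sketch: basic ACO for ordering a set of waypoints.
rng(2);
n = 12;  pts = rand(n, 2) * 100;              % waypoint coordinates
dx = bsxfun(@minus, pts(:,1), pts(:,1)');     % bsxfun keeps this R2014a-safe
dy = bsxfun(@minus, pts(:,2), pts(:,2)');
D = sqrt(dx.^2 + dy.^2) + eye(n);             % +eye avoids division by zero

nAnts = 20; nIter = 100;
alpha = 1; beta = 3; rho = 0.5; Q = 100;      % standard ACO parameters
tau = ones(n); eta = 1 ./ D;                  % pheromone and visibility
bestLen = inf; bestTour = [];
for it = 1:nIter
    tours = zeros(nAnts, n); lens = zeros(nAnts, 1);
    for a = 1:nAnts
        tour = zeros(1, n); tour(1) = randi(n);
        for s = 2:n
            cur = tour(s-1);
            p = (tau(cur,:).^alpha) .* (eta(cur,:).^beta);
            p(tour(1:s-1)) = 0;               % forbid already-visited nodes
            p = p / sum(p);
            tour(s) = find(rand <= cumsum(p), 1);   % roulette-wheel choice
        end
        idx = sub2ind([n n], tour, tour([2:n 1]));
        lens(a) = sum(D(idx));                % closed-tour length
        tours(a,:) = tour;
    end
    [mLen, mi] = min(lens);
    if mLen < bestLen, bestLen = mLen; bestTour = tours(mi,:); end
    tau = (1 - rho) * tau;                    % pheromone evaporation
    idx = sub2ind([n n], bestTour, bestTour([2:n 1]));
    tau(idx) = tau(idx) + Q / bestLen;        % reinforce the best tour
end
fprintf('best tour length: %.1f\n', bestLen);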

3.2 Cooperative sensing
Swarm intelligence is also applied in cooperative sensing tasks. For UAV swarm communication scenarios and needs, a swarm intelligence theory and method has emerged that, guided by cognitive radio technology, combines intelligent communication ideas with the aggregation of limited individual intelligence, and swarm intelligent cooperative communication and cooperative perception models have been constructed on this basis.

3.3 Cooperative task planning
Cooperative operations are a typical scenario in cooperative task planning. Combining the advantages of swarm intelligence optimization algorithms, a swarm networking task assignment algorithm based on a particle swarm and integer-coded wolf pack algorithm is suitable for solving such collaborative problems. For the difficult cooperative decision-making of UAV swarms, the cognition and collaboration capabilities of the wolf pack algorithm can likewise be exploited to quickly track and surround targets in complex environments. Coordination tasks of this kind are beyond the capability of the first stage of simple distributed collaboration.
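The integer-coding idea can be illustrated with a plain integer-coded particle swarm optimizer for task assignment (a generic sketch with assumed cost data, not the hybrid wolf pack algorithm referenced above): each particle encodes, for every task, the index of the UAV assigned to it, and positions are rounded back to valid integers after each velocity update.

% Minimal sketch: integer-coded PSO for UAV-task assignment.
rng(3);
nTask = 10;  nUAV = 4;
cost = rand(nUAV, nTask);                 % cost of UAV u doing task t
fit = @(x) sum(cost(sub2ind(size(cost), x, 1:nTask)));

nP = 30; nIter = 200; wI = 0.7; c1 = 1.5; c2 = 1.5;   % PSO parameters
X = randi(nUAV, nP, nTask);  V = zeros(nP, nTask);
pb = X;  pbF = zeros(nP, 1);
for i = 1:nP, pbF(i) = fit(X(i,:)); end   % personal-best fitness
[gbF, gi] = min(pbF);  gb = X(gi,:);      % global best
for it = 1:nIter
    for i = 1:nP
        V(i,:) = wI*V(i,:) + c1*rand*(pb(i,:) - X(i,:)) ...
                           + c2*rand*(gb - X(i,:));
        X(i,:) = min(max(round(X(i,:) + V(i,:)), 1), nUAV);  % integer clamp
        f = fit(X(i,:));
        if f < pbF(i), pbF(i) = f; pb(i,:) = X(i,:); end
        if f < gbF,    gbF = f;   gb = X(i,:);       end
    end
end
fprintf('best total cost: %.3f\n', gbF);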

Although combining UAV swarms with swarm intelligence can give full play to the advantages of the swarm and enhance distributed collaborative intelligence, allowing a certain degree of interaction and feedback with the environment and with intermediate results during task execution and thus a certain adaptive ability, this kind of intelligence is still very limited: in essence it remains a distributed optimization algorithm operating under a specific computation mode and feedback mode.

4 Multi-agent collaboration
With the continuous advance of artificial intelligence technology and of the nodes' own computing power, the individuals in future UAV swarms will have stronger intelligence, able to independently perceive and evaluate the environment and tasks and to interact and collaborate with one another as agents, thereby achieving multi-agent collaboration.

In recent years, breakthroughs have been made in artificial intelligence research, and deep reinforcement learning in particular has been successfully applied in many fields. Resource allocation techniques based on multi-agent deep reinforcement learning for wireless communication networks have been studied intensively. Multi-agent deep reinforcement learning models have long been used to solve the spectrum resource allocation problem in the Internet of Vehicles, an application quite close to UAV swarm systems; one example is a distributed dynamic power allocation scheme based on multi-agent deep reinforcement learning. Strategies based on multi-agent deep reinforcement learning can also combine the two, using UAV-assisted vehicular networks for multi-dimensional resource management.

Although multi-agent communication network resource allocation based on reinforcement learning has been widely studied, the differing network characteristics mean that results obtained for other communication networks cannot be used directly in UAV swarm networks. The application of reinforcement-learning-based multi-agent autonomous collaboration has therefore gradually become a research hotspot for UAV swarms. Multi-agent deep reinforcement learning schemes for dynamically allocating the communication network resources of UAV groups have appeared one after another; for example, a distributed interference coordination strategy based on multi-agent deep reinforcement learning has been applied to file download services. In single-agent reinforcement learning adapted to the characteristics of UAV networks, an agent's behavior policy can usually be formulated only from its local observation of the global environment; to address this limitation, the joint use of two agents of different scales can solve the communication problem between agents.

UAV swarm collaboration often requires optimal solutions over dynamic, high-dimensional, discrete and continuous action-state spaces. The recently popular actor-critic algorithm is an emerging direction in deep reinforcement learning that combines the advantages of its value-based and policy-based branches, making it well suited to intelligent UAV swarm collaboration. With an actor-critic algorithm, an optimal resource allocation strategy can be found when the wireless channel and the renewable-energy regeneration rate vary randomly and the environment changes dynamically, for example for resource allocation in complex dynamic Internet-of-Vehicles environments. In heterogeneous cellular networks with device-to-device (D2D) links, actor-critic-based strategies can be used for intelligent energy-saving mode selection and resource allocation.

As the intelligent computing power of nodes continues to grow, each drone in a swarm can act as an agent with deep reinforcement learning capability, and through cooperation the whole swarm forms a multi-agent system. Adjacent UAVs exchange and distribute information over the communication network. As shown in Figure 4, each UAV interacts with its local environment: according to the information obtained from the surrounding environment or companion UAVs and the task requirements of its payload, it intelligently generates an action strategy through deep reinforcement learning to allocate and adjust its own resources and behavior, then interacts with the environment and its peers and obtains an individual reward.
Figure 4 UAV cluster based on multi-agent
The deep reinforcement learning agent of each UAV consists of two deep neural networks, an actor network and a critic network, as shown in Figure 5.
Figure 5. The UAV agent based on the actor-critic algorithm
The actor network is responsible for outputting actions, and the critic network is responsible for evaluating the actor's actions, so the two promote each other. Compared with traditional deep reinforcement learning methods, the actor-critic algorithm absorbs the advantages of both value-function-based and policy-function-based methods and trains the agent from both the value side and the policy side, so training is faster and performs better. Through training and learning, the agent's critic network is expected to obtain the best utility evaluation function:
Q*(St, at) = max_π E[Rt | St, at]
The agent obtains state information St from its surroundings, such as the interference state and neighboring UAVs. The actor network fits the action policy function π(St; ωπ); according to the state information St, it outputs the action at of the current time slot, i.e., the resource allocation result, and applies it to the environment to obtain the individual's immediate reward rt. The reward is computed by the reward function, which is responsible for feeding back an immediate, reasonable, and instructive reward value, thereby steering the agent's policy updates toward the goal. The critic network fits the utility evaluation function Q(St, at; ω), which predicts and evaluates the value obtainable by taking action at in the current state St, i.e., the long-term return Rt:
Rt = rt + γ·rt+1 + γ²·rt+2 + … = Σk≥0 γ^k·rt+k
In the formula, γ is the discount factor, representing how much the agent currently values future income; its value lies between 0 and 1. A value of 0 means the agent ignores future income and values only the current reward rt; a value of 1 means the agent regards rewards at every future moment as just as important as the current reward. Obtaining the maximum long-term return is the agent's ultimate goal, and this goal can be formulated as different evaluation criteria depending on the nature of the task.
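The following small MATLAB sketch (hypothetical reward values) computes the discounted return Rt both directly from the definition and via the backward recursion R(k) = r(k) + γ·R(k+1):

% Minimal sketch: discounted long-term return from immediate rewards.
r = [1.0 0.5 0.8 0.2 0.9];               % immediate rewards rt, rt+1, ...
gamma = 0.9;                             % discount factor in (0, 1)
Rt = sum(gamma.^(0:numel(r)-1) .* r);    % direct definition
R = zeros(size(r));  R(end) = r(end);    % backward recursion
for k = numel(r)-1:-1:1
    R(k) = r(k) + gamma * R(k+1);
end
fprintf('Rt = %.4f (direct) vs %.4f (recursive)\n', Rt, R(1));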

The actor network's action policy corresponding to the critic network's best utility evaluation function is the best action policy π*. The parameters of the action policy function are updated by policy gradient, and the parameters of the utility evaluation function are updated by minimizing the loss function:
L(ω) = (yt − Q(St, at; ω))², where yt = rt + γ·Q(St+1, at+1; ω)
Here yt is the temporal-difference target, which measures the utility actually obtained after the action is executed. Through these updates, the action policy output by the agent's actor network keeps improving, and the critic network's utility evaluation becomes more and more accurate. Each UAV in the multi-agent-based swarm acts in the direction of maximum benefit, thereby maximizing the revenue of the whole swarm.
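A toy tabular version of this update loop is sketched below (entirely hypothetical; the paper's agents use deep networks, and the environment here is a trivial 1-D corridor). The critic is improved by moving Q(St, at) toward the target yt via the TD error, and the actor's softmax preferences follow the policy gradient:

% Minimal sketch: tabular one-step actor-critic on a toy 1-D task.
nS = 5;  nA = 2;                         % states; actions: 1=left, 2=right
Q = zeros(nS, nA);                       % critic: utility estimates
H = zeros(nS, nA);                       % actor: action preferences
gamma = 0.9;  aC = 0.1;  aA = 0.05;      % discount, critic/actor step sizes
for ep = 1:500
    s = 1;
    p = exp(H(s,:)) / sum(exp(H(s,:)));  % softmax policy
    a = 1 + (rand > p(1));
    while true
        s2 = max(1, min(nS, s + 2*a - 3));       % move left or right
        r = double(s2 == nS);                    % reward only at the goal
        if s2 == nS
            yt = r;                              % terminal: no bootstrap
        else
            p2 = exp(H(s2,:)) / sum(exp(H(s2,:)));
            a2 = 1 + (rand > p2(1));             % next action
            yt = r + gamma * Q(s2, a2);          % TD target
        end
        delta = yt - Q(s, a);                    % TD error
        Q(s, a) = Q(s, a) + aC * delta;          % critic update
        p = exp(H(s,:)) / sum(exp(H(s,:)));
        grad = -p;  grad(a) = grad(a) + 1;       % grad of log softmax
        H(s,:) = H(s,:) + aA * delta * grad;     % actor (policy gradient)
        if s2 == nS, break; end
        s = s2;  a = a2;
    end
end
disp(Q)    % learned state-action utilities; action 2 should dominate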

⛄2. Part of the source code

clear
clc
addpath('./sub/')                       % helper functions live in ./sub/
data;                                   % load the scenario data (A01..A10)
disp('1-Question 1, the path of S1')
disp('2-Question 1, the path of S2')
disp('3-Question 3, the path of missile_plane & bmob_plane')
key = input('Please select the num: ');
switch key
    case 1
        disp('Please wait until the calculation is finished...');
        disp('It might be a long time');
        close all
        Draw_Map;                       % plot the mission map
        camara = 2;
        sig = 1;
        [result1,seq1,point1,l1] = Get_Total_Rader_Dist1(camara,A01,A02,A08,A09,A03,sig);
        [result2,seq2,point2,l2] = Get_Total_Rader_Dist2(camara,A06,A05,A07,A10,A04,sig);
        disp('result1,seq1,point1,l1,result2,seq2,point2,l2 have recorded all the results of this question');
        result = sum(result1) + sum(result2);
        Show_Result(['The minimum radar traveling distance is ' num2str(result)]);
    case 2
        disp('Please wait until the calculation is finished...');
        disp('It might be a long time');
        close all
        Draw_Map;
        Question_2;                     % solves question 2, sets result
        disp('result,point,l have recorded all the results of this question');
        Show_Result(['The minimum radar traveling distance is ' num2str(result)]);
    case 3
        disp('Please wait until the calculation is finished...');
        disp('It might be a long time');
        close all
        Draw_Map;
        Question_3;                     % solves question 3, sets min_value
        disp('min_point,min_value have recorded all the results of this question');
        Show_Result(['The minimum radar traveling distance is ' num2str(min_value)]);
end

rmpath('./sub/')

⛄3. Running results


⛄4. MATLAB version and references

1 MATLAB version
2014a

2 References
[1] Zhao Fa, Qi Xiuli, Yu Xiaohan, Zhang Suojuan, Li Benling. Research on area search and target capture based on multi-UAV autonomous cooperative task planning [J]. Electronic Technology and Software Engineering, 2022, (11).

3 Remarks
The introduction in this article is taken from the Internet and is for reference only. If there is any infringement, please contact the author for deletion.


Origin blog.csdn.net/TIQCmatlab/article/details/130033305