Reinforcement Learning in Multiple-UAV Networks: Deployment and Movement Design
Paper link

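Contents

UAV Application Scenarios
Overview of Research on Reinforcement Learning in UAV Communications
Problem Modeling in the Paper

Notation Quick Reference
Problem Modeling

Overall Setting
Users
UAVs
Channel Attenuation Model
Bandwidth Allocation
User SNR Computation and Required Threshold
Constraints
User Satisfaction Metric: Quality-of-Experience Model
Trade-off Between Altitude and LoS
Problem Statement

Method

Cell Partition of Ground Users
Q-Learning Algorithm for 3D Deployment of UAVs
Dynamic Movement Design of UAVs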

UAV Application Scenarios

Serving as aerial base stations to assist terrestrial communication
UAVs are employed as aerial base stations to assist the existing terrestrial communication infrastructure.
Providing communication service when ground infrastructure is damaged, or relieving load when cells are congested
Typical problems include quick-response wireless service recovery after unexpected infrastructure damage or natural disasters, as well as cellular network offloading in hot spots such as sports stadiums.
Collecting sensor data in IoT scenarios
In IoT networks, UAVs are capable of collecting data from ground IoT devices in a given geographical area.

Overview of Research on Reinforcement Learning in UAV Communications

Reinforcement Learning in UAV-Assisted Communications:

The authors in [36] jointly designed the trajectory and power allocation of a UAV serving static non-orthogonal multiple access (NOMA) users. The design challenges of integrating NOMA techniques into UAV networks were investigated, and some open research issues were highlighted.
An interference-aware path planning scheme based on deep reinforcement learning was proposed in [37] for a network of cellular-connected UAVs; the scheme achieved better wireless latency and transmit rate.
In [38], the authors proposed a UAV relay scheme based on both reinforcement learning and deep reinforcement learning. With the aid of these two algorithms, the energy consumption of the UAV is minimized while a better bit error rate (BER) performance is achieved. However, the multiple-UAV scenario was not considered.
In [39], the authors invoked a deep reinforcement learning (DRL) algorithm for energy-efficient control of UAVs by jointly considering communication coverage, fairness, energy consumption, and connectivity. The aim is to find a control policy that specifies how each UAV moves in each timeslot. Four metrics are therefore jointly optimized: average coverage score, fairness index, average energy consumption, and energy efficiency. However, the movement of ground users was also neglected in order to simplify the system model.

Problem Modeling in the Paper

Notation Quick Reference

The mean opinion score (MOS) is adopted to evaluate user satisfaction.

Problem Modeling

Overall Setting

The paper considers downlink transmission in a UAV-assisted wireless network.
Users in the same cluster are served simultaneously by the same UAV using FDMA.
A single UAV is deployed in each cluster.

Users

User coordinates

UAVs

Channel Attenuation Model

Bandwidth Allocation

User SNR Computation and Required Threshold

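
As a rough illustration of how per-user SNR and rate could be computed under FDMA, the sketch below assumes that each UAV splits its bandwidth and transmit power equally among the users in its cluster. The function name `fdma_user_rates`, the equal split, the noise density, and the threshold value are all assumptions made for illustration, not the paper's exact model.

```python
import math


def fdma_user_rates(path_loss_db, total_bandwidth_hz, total_power_w,
                    noise_psd_dbm_per_hz=-170.0):
    """Return (snr, rate_bps) per user for one UAV serving one cluster.

    Assumption: bandwidth and transmit power are shared equally among
    the users of the cluster (hypothetical allocation rule).
    """
    k = len(path_loss_db)
    bandwidth = total_bandwidth_hz / k          # per-user FDMA sub-band
    power = total_power_w / k                   # per-user transmit power
    # Convert noise density from dBm/Hz to W/Hz, then scale by the sub-band.
    noise_w = 10 ** ((noise_psd_dbm_per_hz - 30.0) / 10.0) * bandwidth
    results = []
    for pl_db in path_loss_db:
        gain = 10 ** (-pl_db / 10.0)            # linear channel gain
        snr = power * gain / noise_w
        rate = bandwidth * math.log2(1.0 + snr)  # Shannon rate of the sub-band
        results.append((snr, rate))
    return results


# Hypothetical cluster: two users with 100 dB and 110 dB path loss.
gamma_db = 20.0  # assumed SNR threshold
for snr, rate in fdma_user_rates([100.0, 110.0], 1e6, 1.0):
    snr_db = 10.0 * math.log10(snr)
    print(f"SNR = {snr_db:.1f} dB, rate = {rate / 1e6:.2f} Mbit/s, "
          f"meets threshold: {snr_db >= gamma_db}")
```

Because the bandwidth is divided among the users of a cluster, adding users to a cluster lowers every user's rate, which is one reason the cell partition matters.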

Constraints

(1) Altitude constraint
(2) Energy constraint
$P_{\max}$

User Satisfaction Metric: Quality-of-Experience Model

When the UAVs fly towards some users to give them a better channel environment, they move farther away from other users. As a result, the MOS of users farther from the UAVs becomes lower than that of users closer to them.

Trade-off Between Altitude and LoS

Increasing a UAV's altitude raises the LoS probability but also increases the path loss, so the UAV has to increase its transmit power to satisfy the users' QoE requirements.
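
The trade-off can be made concrete with a small numerical sketch. It uses a widely used air-to-ground model in which the LoS probability is a sigmoid of the elevation angle and the mean path loss is free-space loss plus a LoS/NLoS-weighted excess loss. Whether this matches the paper's exact equations is an assumption, and the parameter values (`a`, `b`, `eta_los_db`, `eta_nlos_db`, the 2 GHz carrier, the 500 m horizontal distance) are illustrative urban-type values.

```python
import math


def los_probability(altitude_m, horizontal_m, a=9.61, b=0.16):
    """LoS probability as a sigmoid of the elevation angle (in degrees)."""
    theta_deg = math.degrees(math.atan2(altitude_m, horizontal_m))
    return 1.0 / (1.0 + a * math.exp(-b * (theta_deg - a)))


def mean_path_loss_db(altitude_m, horizontal_m, fc_hz=2e9,
                      eta_los_db=1.0, eta_nlos_db=20.0):
    """Free-space loss plus the probability-weighted LoS/NLoS excess loss."""
    distance = math.hypot(altitude_m, horizontal_m)
    fspl_db = 20.0 * math.log10(4.0 * math.pi * fc_hz * distance / 3e8)
    p_los = los_probability(altitude_m, horizontal_m)
    return fspl_db + p_los * eta_los_db + (1.0 - p_los) * eta_nlos_db


# Sweep the altitude for a user 500 m away horizontally (assumed value).
for h in (50, 100, 200, 400, 800, 1600):
    print(h, round(los_probability(h, 500.0), 3),
          round(mean_path_loss_db(h, 500.0), 1))
```

Under these assumed parameters the mean path loss first falls as the link becomes LoS-dominated and then rises again as distance takes over, so an intermediate altitude is preferable to either extreme.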

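Problem Statement

Constraints of optimization problem (13) in the paper:
(13b) is the user partition constraint
(13c) is the UAV altitude constraint
(13d) is the SNR constraint
(13e) and (13f) are the energy constraints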

Method

Cell Partition of Ground Users

The K-means algorithm, also known as Lloyd's algorithm, can solve the clustering problem and obtain the initial 3D positions of the UAVs with low complexity. It partitions users into clusters by assigning each user to the nearest barycenter and then recalculating the barycenter of each cluster.

A. Initial Algorithm for Cell Partition of Ground Users

First, initialize the population (N users) and, based on the genetic algorithm, find the best individuals as the centers of clusters $C_1, \ldots, C_N$.
Second, deploy the N UAVs at the centers $\mu_n$, $n = 1, \ldots, N$, compare the Euclidean distances between user $x_i$, $i = 1, \ldots, U$, and each $\mu_n$, and assign user i to the cluster with the smallest distance.
Repeat this step until all users have been allocated, then recompute the center of each cluster. Keep updating the cluster members in this way until they no longer change significantly.
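
The assignment and update steps above can be sketched as a plain Lloyd iteration in Python. This is a minimal illustration, not the paper's implementation: the genetic-algorithm initialization is not reproduced, so the hypothetical function `kmeans_users` either takes initial centers from the caller (`init_centers`) or falls back to picking random users, and it stops when the assignment no longer changes at all.

```python
import random


def kmeans_users(users, n_uavs, init_centers=None, max_iter=100, seed=0):
    """Partition 2D user positions into n_uavs clusters (Lloyd iteration).

    users: list of (x, y) tuples. init_centers: optional starting centers,
    e.g. produced by a separate initialization step (assumption: random
    users are used when none are given).
    """
    if init_centers is None:
        centers = random.Random(seed).sample(list(users), n_uavs)
    else:
        centers = list(init_centers)
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each user joins the cluster of the nearest center.
        new_assignment = [
            min(range(n_uavs),
                key=lambda n: (x - centers[n][0]) ** 2 + (y - centers[n][1]) ** 2)
            for x, y in users
        ]
        if new_assignment == assignment:
            break  # cluster members no longer change
        assignment = new_assignment
        # Update step: move each center to the barycenter of its members.
        for n in range(n_uavs):
            members = [u for u, a in zip(users, assignment) if a == n]
            if members:  # keep the old center if a cluster is empty
                centers[n] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centers, assignment


# Hypothetical example: six users forming two groups, served by two UAVs.
users = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
         (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
centers, labels = kmeans_users(users, 2, init_centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers, labels)
```

The returned centers give horizontal positions for the UAVs; choosing the altitude is a separate step that this sketch does not cover.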