Cloud-Edge-End Collaboration in Air-Ground Integrated Power IoT: A Semidistributed Learning Approach

Summary:

The combination of the Air-Ground Integrated Power Internet of Things (AGI-PIoT) and cloud-edge-end collaboration enables flexible coverage and real-time data processing. However, achieving intelligent cloud-edge-end collaboration in AGI-PIoT faces challenges such as the dynamics of aerial networks, the coupling of resource allocation across layers, timescales, and dimensions, incomplete information, and the curse of dimensionality. In this paper, we propose a multi-layer, multi-timescale, and multi-dimensional resource allocation algorithm based on federated deep reinforcement learning, named FEDERATION. Based on the Lyapunov optimization method, the multi-layer, multi-timescale, and multi-dimensional resource allocation problem is decomposed into three subproblems. For the joint task offloading and power control subproblem, a semi-distributed algorithm based on federated deep actor-critic learning is proposed. The admission control subproblem is solved by quadratic programming, and the computing resource allocation subproblem is solved by smoothing approximation and Lagrangian dual decomposition. Simulation results show that FEDERATION outperforms existing algorithms in terms of queuing delay, energy consumption, and convergence.

1. Introduction

5G-enabled Power Internet of Things (PIoT) devices have been deployed in power systems to provide continuous monitoring, unmanned control, and fault detection. These devices generate massive computational tasks that must be offloaded to cloud servers and processed within strict latency requirements [1], [2]. However, PIoT suffers from limited cellular network coverage and large task offloading latency due to the remote locations of power infrastructure [3]. To address these deficiencies, a new Air-Ground Integrated Power Internet of Things (AGI-PIoT) framework is needed, which uses cloud servers and edge servers to provide powerful computing capability and deploys UAVs to achieve flexible coverage [4], [5]. On the one hand, cloud servers can compensate for the shortage of edge-server computing resources. On the other hand, compared with ground base stations, UAVs can be flexibly deployed to meet sudden communication demands and provide high-speed data transmission services [6]. AGI-PIoT integrates cloud-edge-end collaboration and reduces data processing delays by jointly optimizing device-side data admission, task offloading, power control, and edge- and cloud-side computing resource allocation [7].

Despite the aforementioned advantages, the seamless integration of these two technologies requires efficient utilization of communication, energy, and computing resources spanning data admission, transmission, and processing. Realizing intelligent cloud-edge-end collaboration in AGI-PIoT still faces three major challenges. First, although aerial networks offer higher channel quality due to line-of-sight (LoS) links, their coverage availability varies dynamically with UAV mobility; the heterogeneity and dynamics of aerial networks bring new difficulties to cloud-edge-end collaboration. Second, resource allocation is coupled across layers, timescales, and dimensions. For example, admission control at the application layer affects task offloading, power control, and computing resource allocation at the physical layer, and task offloading on the large timescale also affects power control, admission control, and computing resource allocation on the small timescale. Finally, it is impractical to obtain complete global state information (GSI) for every device, which degrades the optimality and convergence of resource allocation algorithms; moreover, due to the curse of dimensionality, the performance of some machine learning techniques such as reinforcement learning (RL) drops significantly.

There has been some research on cloud collaboration and AGI-PIoT resource allocation. In [8], Kai et al. developed a collaborative computing framework among devices, edge nodes, and cloud centers for latency minimization; this work mainly focuses on terrestrial networks without considering the heterogeneity and dynamics of AGI-PIoT. In [6], Shang et al. studied mobile edge computing (MEC)-enabled AGI wireless networks and developed a coordinate-descent-based algorithm to minimize the energy consumption of user equipment. However, these works rely on perfect GSI and perform poorly in practical implementations with incomplete information.
Deep reinforcement learning (DRL) has the potential to solve high-dimensional optimization problems by combining the feature extraction and prediction capabilities of deep learning with the sequential decision-making capabilities of RL [9], [10]. It embraces both centralized and distributed learning paradigms, but neither provides a balanced trade-off between learning cost and performance: centralized DRL incurs high communication cost due to raw data uploading, while distributed DRL has relatively poor learning performance because it does not exploit the similar environment observations of nearby devices. As a remedy, some researchers combine federated learning (FL) with DRL to gain the benefits of both paradigms. Federated DRL can reduce communication overhead through semi-distributed model training and improve learning performance by fully utilizing network-wide observations through federated averaging. In [11], a dual-timescale federated DRL algorithm is proposed to optimize communication resource allocation and transmission mode selection in vehicle-to-everything communication. In [12], Kwon et al. designed a federated DRL-based resource allocation and cell association scheme to maximize the throughput of underwater IoT devices. However, these works cannot be applied to multi-layer, multi-timescale, multi-dimensional optimization problems with dynamic random data arrivals and long-term energy consumption and data admission constraints.

To address the above challenges, we propose a federated-DRL-based multi-layer, multi-timescale, and multi-dimensional resource allocation algorithm (FEDERATION) for cloud-edge-end collaboration in AGI-PIoT. The goal is to minimize the queuing delay of all PIoT devices under long-term data admission and energy consumption constraints. First, the short-term optimization of task offloading, power control, admission control, and computing resource allocation is decoupled from the long-term constraints using the Lyapunov optimization method, and the multi-layer, multi-timescale, multi-dimensional optimization problem is decomposed into three short-term subproblems: 1) task offloading and power control; 2) admission control; and 3) computing resource allocation. For the first subproblem, we propose a semi-distributed multi-timescale task offloading and power control algorithm based on federated deep actor-critic (AC) learning [13] to overcome the curse of dimensionality and balance the trade-off between learning cost and performance. The second subproblem is solved by each PIoT device in a distributed manner. Finally, given the large-timescale task offloading decisions, each server optimizes the small-timescale computing resource allocation; the third subproblem is solved by smoothing approximation and Lagrangian dual decomposition.

The contributions are summarized as follows. 1) Multi-layer, multi-timescale, multi-dimensional optimization of air-ground heterogeneous resources: FEDERATION optimizes physical-layer task offloading on the large timescale, and optimizes application-layer admission control, physical-layer power control, and computing resource allocation on the small timescale.
2) Semi-distributed learning under the curse of dimensionality: FEDERATION separates DRL model training from raw training data acquisition by introducing a centrally coordinated loose federation of devices, and uses actor networks and critic networks to draw actions, update policies, and evaluate the policy optimization. 3) Extensive performance evaluation: in various AGI-PIoT scenarios, the FEDERATION algorithm is validated against state-of-the-art algorithms in terms of queuing delay, energy consumption, and convergence.

The remainder of this paper is organized as follows. Section II introduces the AGI-PIoT system model. Section III formulates the optimization problem. Section IV presents the proposed FEDERATION algorithm. Section V presents the semi-distributed multi-timescale task offloading and power control algorithm. Section VI presents the simulation results. Finally, Section VII concludes the paper.

2. System Model

Fig. 1 shows the AGI-PIoT network, which consists of a large number of PIoT devices, BSs, and servers. The devices are deployed along power transmission lines to provide 24/7 real-time monitoring services. The set of I PIoT devices is denoted as U = {u_1, ..., u_i, ..., u_I}. There are J + 1 BSs, including one macro base station (MBS), M small base stations (SBSs), and J − M unmanned aerial vehicles (UAVs), whose set is denoted as S = {s_0, s_1, ..., s_j, ..., s_J}, where s_j with j = 0 denotes the MBS, s_j with j = 1, ..., M denotes an SBS, and s_j with j = M + 1, ..., J denotes a UAV. The MBS provides wide-area communication coverage for all devices, while the SBSs and UAVs complement it with local coverage in hotspot areas. There are one cloud server and J + 1 edge servers. The cloud server possesses powerful computing capability but is located far away from the devices; it connects to the BSs through wired backhaul links for the MBS and SBSs and through wireless links for the UAVs. The edge servers are co-deployed with the BSs to provide nearby computing services for the devices. Compared with terrestrial BSs, UAVs enjoy higher transmission rates thanks to LoS links, but have relatively weaker computing capability and intermittent service availability due to their limited payload capacity and inherent mobility.

The heterogeneity of AGI-PIoT and cloud-edge-end collaboration can be integrated to accommodate the heterogeneity of data, which works as follows. First, each device generates a large amount of delay-sensitive task data, part of which is admitted into its local buffer. Second, the device determines its own transmission power and offloads the admitted data to the cloud server or one of the edge servers through the MBS, an SBS, or a UAV. In particular, a device can offload the task data of data-intensive services, e.g., power video inspection, to an edge server through a UAV, and offload computation-intensive services, e.g., photovoltaic output prediction, to the cloud server through a terrestrial BS. Each server allocates computing resources to process the offloaded data according to the data volume and computational complexity. An example is shown in Fig. 1: u_1 can offload its data through MBS s_0, SBS s_2, or UAV s_4; when s_4 moves out of the communication range of u_1, u_1 can only offload its data through s_0 or s_2.

We adopt the discrete time-slot model and the quasi-static model [14]. As shown in Fig. 1, we consider G epochs, and each epoch is divided into T_0 time slots of duration τ. The set of epochs is denoted as G = {1, ..., g, ..., G}, and the set of time slots in the g-th epoch is denoted as T(g) = {(g − 1)T_0 + 1, (g − 1)T_0 + 2, ..., gT_0}. The G epochs contain T time slots in total, i.e., T = T_0 G, and the set of time slots is denoted as T = {1, ..., t, ..., T}. UAV locations vary on the large timescale, while channel state information varies on the small timescale.

We consider multi-layer, multi-timescale, and multi-dimensional resource allocation: 1) multi-layer: admission control is optimized at the application layer, while task offloading and power control are optimized at the physical layer; 2) multi-timescale: task offloading is optimized in each epoch (large timescale) to avoid handover costs [15], [16], while power control, admission control, and computing resource allocation are optimized in each time slot (small timescale) to reduce energy consumption and data backlog; 3) multi-dimensional: the allocation of multi-dimensional resources such as energy, communication, and computing is optimized. FEDERATION optimizes multi-timescale task offloading and power control via DRL with superior learning performance and less communication overhead; details are given in Section V.

A. Admission Control Model

Denote the amount of task data arriving at device u_i as A_i(t), which is upper-bounded by 0 ≤ A_i(t) ≤ A_{i,max}; its statistical model is unknown. The admission control decision determines the portion of arriving data that is admitted into the buffer. Since data arrivals are bursty and partly redundant, admission control can be leveraged to alleviate large backlog increases and resource waste. Given that monitoring accuracy is positively correlated with the amount of admitted data, the admission control decision a_i(t) should satisfy the following application-layer constraints:

where a_{i,min} denotes the minimum amount of data that must be admitted in each time slot for the monitoring service, i.e., the short-term admission constraint, and θ_G ∈ (0, 1) is the minimum average data admission ratio of the monitoring service, i.e., the long-term admission constraint. The long-term and short-term admission constraints guarantee that the amount of admitted data does not exceed the amount of arriving data and that it satisfies the PIoT monitoring accuracy requirement. The data stored in the buffer of u_i is modeled as a device-side data queue. Its backlog is denoted as Q_i(t) and is updated as

where U_i(t) is the throughput.
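A hedged reconstruction of the admission constraints and the device-side queue update described above (the original equation images are not reproduced in this post, so the exact forms may differ):

```latex
% Hedged reconstruction of the application-layer admission constraints.
a_{i,\min} \le a_i(t) \le A_i(t)
\lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\{a_i(t)\}
  \ge \theta_G\, \lim_{T\to\infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\{A_i(t)\}
% Device-side data queue update: admissions in, throughput out.
Q_i(t+1) = \max\{ Q_i(t) - U_i(t),\ 0 \} + a_i(t)
```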

B. Task Offloading Model

The coverage availability of s_j for u_i is denoted by a binary variable w_{i,j}(g) ∈ {0, 1}, where w_{i,j}(g) = 1 indicates that u_i is located within the coverage of s_j in the g-th epoch, i.e., s_j is available to u_i. Task offloading optimization consists of two stages: 1) BS selection, i.e., selecting the MBS, an SBS, or a UAV for data transmission; and 2) computing paradigm selection, i.e., choosing cloud computing or edge computing for data processing. Specifically, the task offloading decision of u_i is denoted by binary indicators x(g) = {x_{i,j}(g), x^c_i(g), u_i ∈ U, s_j ∈ S}, where x_{i,j}(g) = 1 indicates that u_i selects s_j for task offloading in the g-th epoch, x^c_i(g) = 1 indicates that u_i selects edge computing in the g-th epoch, and x^c_i(g) = 0 indicates that u_i selects cloud computing in the g-th epoch. PIoT devices selecting the same BS are allocated orthogonal spectrum bands, and the spectrum is reused across SBSs and UAVs to improve spectrum efficiency; therefore, only the inter-cell interference among SBSs and UAVs is considered [17]. The transmission models are introduced as follows.

1) Device-SBS Transmission Model: the transmission rate between u_i and SBS s_j is
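A hedged reconstruction of the rate equation, using the standard Shannon-capacity form implied by the symbol list below (the paper's exact expression may differ):

```latex
% Hedged reconstruction; the symbols are defined in the next paragraph.
R_{i,j}(t) = B_{i,j}(t) \log_2\!\left( 1 +
  \frac{P_i(t)\, h_{i,j}(t)\, r_{i,j}(g)^{-\alpha_S}}
       {I_{i,j}(t) + \delta^2} \right)
```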

where B_{i,j}(t), P_i(t), h_{i,j}(t), and r_{i,j}(g) denote the bandwidth, transmission power, channel gain, and horizontal distance between u_i and s_j, respectively, and I_{i,j}(t), δ², and α_S denote the inter-cell interference power, noise power, and path-loss exponent of the device-SBS channel, respectively. The minimum and maximum transmission powers of u_i are denoted as P_{i,min} and P_{i,max}. The transmission power is discretized into N levels, i.e., P_i(t) ∈ P = {P_{i,min}, ..., P_{i,min} + (n − 1)(P_{i,max} − P_{i,min})/(N − 1), ..., P_{i,max}}. Similar assumptions are also adopted in [12] and [13].

2) Device-MBS Transmission Model: the transmission rate between u_i and the MBS is

where α_M is the path-loss exponent of the device-MBS channel.

3) Device-UAV Transmission Model: the path loss between u_i and UAV s_j is given by [18]
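A hedged reconstruction of the probabilistic LoS/NLoS air-to-ground path-loss model that this class of references typically uses (the paper's exact parameterization may differ):

```latex
% Hedged reconstruction; FSPL is the free-space path loss at distance
% d_{i,j}(g), carrier frequency f_c, and speed of light c.
PL_{i,j}(g) = P^{LoS}_{i,j,g}\big( \mathrm{FSPL} + \eta^{LoS}_{i,j,g} \big)
  + \big( 1 - P^{LoS}_{i,j,g} \big)\big( \mathrm{FSPL} + \eta^{NLoS}_{i,j,g} \big),
\quad \mathrm{FSPL} = 20 \log_{10}\! \frac{4\pi f_c\, d_{i,j}(g)}{c}
```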

where d_{i,j}(g) denotes the vertical distance from u_i to s_j, η^{LoS}_{i,j,g} is the additional loss on top of the free-space path loss for LoS links, and η^{NLoS}_{i,j,g} is the additional loss for non-line-of-sight (NLoS) links. f_c is the carrier frequency and c is the speed of light. P^{LoS}_{i,j,g} is the LoS probability of the device-UAV link. The transmission rate between u_i and s_j is then

Combining (4), (5), and (7), the throughput U_i(t) is given by

C. Data Computation Model

The unprocessed data of u_i is stored in buffer queues at the edge servers and the cloud server. The queue backlogs are denoted as H^e_{i,j}(t) and H^c_i(t), which evolve as

where Z^e_{i,j}(t) and Z^c_i(t) denote the amounts of data processed by s_j and the cloud server, respectively, which are obtained as
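A hedged reconstruction of the processing equations, assuming the common CPU-cycle model where the processed data volume equals the allocated cycles divided by the computation density:

```latex
% Hedged reconstruction; tau is the slot duration, lambda_i the
% computation density (CPU cycles per bit) of u_i's data.
Z^{e}_{i,j}(t) = \frac{\tau\, f^{e}_{i,j}(t)}{\lambda_i}, \qquad
Z^{c}_{i}(t) = \frac{\tau\, f^{c}_{i}(t)}{\lambda_i}
```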

where f^e_{i,j}(t) and f^c_i(t) are the CPU-cycle frequencies allocated by s_j and the cloud server to process the offloaded data of u_i, and λ_i is the computation density of u_i's data.

D. Energy Consumption Model

The energy consumption of u_i for data transmission is

Considering the limited battery capacity of PIoT devices, the long-term energy consumption constraint is

where E_{i,max} is the energy budget of u_i.
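A hedged reconstruction of the transmission energy and the long-term budget constraint, consistent with E_{i,max}/T being the single-slot budget used later in Section III:

```latex
% Hedged reconstruction of the energy model.
E_i(t) = P_i(t)\,\tau, \qquad
\frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\{E_i(t)\} \le \frac{E_{i,\max}}{T}
```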

E. Queuing Delay Model

The delay requirement is defined in terms of the queuing delay [19]. The end-to-end delay consists of the queuing delay of data transmission, τ^Q_i(t), and the queuing delay of computation, τ^H_i(t), where τ^H_i(t) is the maximum queuing delay across the J + 1 edge servers and the cloud server. According to Little's law [20], the queuing delay is proportional to the queue length and inversely proportional to the average data arrival rate. Therefore, the queuing delays of data transmission, edge computing, and cloud computing for u_i are, respectively,

where τ^{e→c}_j, j = 0, ..., M, is the delay of forwarding data to the cloud server through wired backhaul links, and τ^{e→c}_j, j = M + 1, ..., J, is that through wireless backhaul links. ā_i(t), Ū^e_{i,j}(t), and Ū^c_i(t) are the average data arrival rates of Q_i(t), H^e_{i,j}(t), and H^c_i(t). The queuing delay of computation τ^H_i(t) is derived as
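A hedged reconstruction of the Little's-law delay expressions referenced above:

```latex
% Hedged reconstruction via Little's law; the backhaul forwarding delay
% tau^{e->c}_j applies only to cloud computing.
\tau^{Q}_{i}(t) = \frac{Q_i(t)}{\bar{a}_i(t)}, \qquad
\tau^{e}_{i,j}(t) = \frac{H^{e}_{i,j}(t)}{\bar{U}^{e}_{i,j}(t)}, \qquad
\tau^{c}_{i}(t) = \frac{H^{c}_{i}(t)}{\bar{U}^{c}_{i}(t)} + \tau^{e\to c}_{j}
% Computation queuing delay: the maximum across the involved servers.
\tau^{H}_{i}(t) = \max\Big\{ \max_{s_j \in S} \tau^{e}_{i,j}(t),\ \tau^{c}_{i}(t) \Big\}
```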

3. Problem Formulation

The goal of the multi-layer, multi-timescale, multi-dimensional optimization problem is to minimize the total queuing delay of all PIoT devices under the long-term data admission and energy consumption constraints, through the joint optimization of admission control, task offloading, power control, and computing resource allocation. It is formulated as

where x = (x(g): g ∈ G) denotes the large-timescale task offloading vector, and a = (a_i(t): u_i ∈ U, t ∈ T), P = (P_i(t): u_i ∈ U, t ∈ T), f^e = (f^e_{i,j}(t): u_i ∈ U, s_j ∈ S, t ∈ T), and f^c = (f^c_i(t): u_i ∈ U, t ∈ T) denote the small-timescale admission control, power control, and edge-server-side and cloud-server-side computing resource allocation vectors. C1 restricts each PIoT device to select only one BS and one computing paradigm for task offloading in each epoch. C2 is the short-term data admission constraint. C3 and C4 state that the amounts of computing resources allocated by s_j and the cloud server cannot exceed their maximum available CPU-cycle frequencies f^e_{j,max}(t) and f^c_{max}(t). C5 is the transmission power constraint. C6 contains the long-term energy consumption and data admission constraints. To solve P1, we first decouple the short-term optimization from the long-term constraints with the aid of Lyapunov optimization, adopting the virtual queues N_i(t) and Y_i(t) corresponding to C6 [21], i.e.,
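As a hedged reconstruction (the original equation image is not reproduced here), the virtual queues corresponding to the long-term admission and energy constraints would evolve as:

```latex
% Hedged reconstruction of the virtual-queue dynamics for C6.
N_i(t+1) = \max\{ N_i(t) + \theta_G A_i(t) - a_i(t),\ 0 \}
Y_i(t+1) = \max\{ Y_i(t) + E_i(t) - E_{i,\max}/T,\ 0 \}
```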

where E_{i,max}/T is the single-slot energy budget. A larger N_i(t) indicates a more severe deficit in data admission; similarly, a larger Y_i(t) indicates more severe excess energy consumption. Based on Lyapunov optimization, C6 holds automatically when N_i(t) and Y_i(t) are mean-rate stable, i.e., lim_{t→∞} E{N_i(t)}/t = 0 and lim_{t→∞} E{Y_i(t)}/t = 0.

P1 is then equivalently transformed into

Define Θ(t) = [Q_i(t), H^e_{i,j}(t), H^c_i(t), N_i(t), Y_i(t)]. The Lyapunov function is defined as

Define the expected deviation of the Lyapunov function between two adjacent time slots as the one-step Lyapunov drift, i.e.,

A smaller absolute value of the drift ΔL(Θ(t)) indicates smaller fluctuation of the queue backlogs and better queue stability. The drift-plus-penalty term is defined as

where V ≥ 0 strikes a trade-off between delay minimization and queue stability. Therefore, the optimization objective of P2 is transformed into minimizing the upper bound of the drift-plus-penalty term, which minimizes the delay while maintaining queue stability. Based on the involved optimization variables, P2 is decomposed into three deterministic subproblems: SP1, task offloading and power control; SP2, admission control; and SP3, computing resource allocation.
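For concreteness, a hedged reconstruction of the Lyapunov function, the one-step drift, and the drift-plus-penalty term (the exact weighting in the paper may differ):

```latex
% Hedged reconstruction of the Lyapunov machinery.
L(\Theta(t)) = \frac{1}{2} \sum_{u_i \in U} \Big[ Q_i(t)^2
  + \sum_{s_j \in S} H^{e}_{i,j}(t)^2 + H^{c}_{i}(t)^2
  + N_i(t)^2 + Y_i(t)^2 \Big]
\Delta L(\Theta(t)) = \mathbb{E}\{ L(\Theta(t+1)) - L(\Theta(t)) \mid \Theta(t) \}
% Drift-plus-penalty: V >= 0 trades delay minimization against stability.
\Delta_V(\Theta(t)) = \Delta L(\Theta(t))
  + V\, \mathbb{E}\Big\{ \sum_{u_i \in U} \big( \tau^{Q}_i(t) + \tau^{H}_i(t) \big) \,\Big|\, \Theta(t) \Big\}
```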

4. Federated-DRL-Based Multi-Layer, Multi-Timescale, and Multi-Dimensional Resource Allocation

We propose FEDERATION to solve P2. The framework of FEDERATION is shown in Fig. 2 and consists of three stages, which correspond to the solutions of SP1, SP2, and SP3, respectively. The details of each stage are given as follows.

A. Task Offloading and Power Control

In SP1, u_i jointly optimizes the large-timescale task offloading decision and the small-timescale transmission power as

U_i(t) is affected by the CSI as well as the task offloading and power control decisions of other PIoT devices, which are unavailable to u_i. To solve SP1, we propose a federated deep-AC-based semi-distributed multi-timescale task offloading and power control algorithm in Section V.

B. Admission Control

In SP2, u_i determines the amount of admitted data. SP2 is formulated as

SP2 is a quadratic programming problem in a single variable, which strikes a trade-off between the amount of admitted data and queue stability. Specifically, when Q_i(t) grows large, u_i admits less data into the device-side data queue; conversely, when N_i(t) grows large, more data is admitted to satisfy the long-term data admission constraint.
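To make the closed-form solution concrete, here is a minimal Python sketch of solving a single-variable quadratic program by clipping the unconstrained vertex to the feasible interval; the curvature c2 and the linear coefficient (q_i − n_i) are illustrative placeholders rather than the paper's exact expressions.

```python
def admission_control(q_i, n_i, a_arrival, a_min, c2=1.0):
    """Solve min_{a in [a_min, a_arrival]} c2*a^2 + (q_i - n_i)*a.

    A hedged sketch: SP2 is a single-variable quadratic program, so the
    optimum is the unconstrained vertex projected onto the feasible
    interval. The coefficients are illustrative placeholders.
    """
    c1 = q_i - n_i                # backlog pushes admission down, deficit queue pushes it up
    vertex = -c1 / (2.0 * c2)     # unconstrained minimizer of c2*a^2 + c1*a
    return min(max(vertex, a_min), a_arrival)  # project onto [a_min, A_i(t)]

# Example: large backlog Q_i(t) -> admit less; large N_i(t) -> admit more.
print(admission_control(q_i=50.0, n_i=10.0, a_arrival=30.0, a_min=2.0))  # 2.0
print(admission_control(q_i=10.0, n_i=60.0, a_arrival=30.0, a_min=2.0))  # 25.0
```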

C. Computing Resource Allocation

In SP3, given x(g), edge server s_j and the cloud server determine the amounts of computing resources allocated to process the data of u_i. For convenience, we set Λ(f^c_i(t)) = tZ^c_i(t)/((t − 1)Ū^c_i(t)) + (1 − x^c_i(t))Q_i(t) and Γ(f^e_{i,j}(t)) = tZ^e_{i,j}(t)/((t − 1)Ū^e_{i,j}(t)) + x_{i,j}(t)x^c_i(t)Q_i(t). SP3 is formulated as

C8 and C9 stipulate that the allocated computing resources cannot exceed those required to process the data backlog of u_i, which helps improve resource utilization efficiency. According to the involved optimization entities, SP3 can be further decomposed into the edge-server-side computing resource allocation problem SP3-1 and the cloud-server-side computing resource allocation problem SP3-2.

1) Edge-Server Side: SP3-1 is formulated as

To transform SP3-1 into a convex optimization problem, we approximate the optimization objective as

After the approximation, SP3-1 is convex and can be solved by Lagrangian dual decomposition; the details are omitted here.
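A hedged sketch of the omitted dual decomposition step; the smoothed objective Γ̃, the multiplier μ_j, and the step size κ are assumptions based on the standard method, not the paper's exact expressions:

```latex
% Hedged sketch of Lagrangian dual decomposition for SP3-1.
\mathcal{L}(\mathbf{f}^{e}, \mu_j)
  = \sum_{u_i \in U} \tilde{\Gamma}\big( f^{e}_{i,j}(t) \big)
  + \mu_j \Big( \sum_{u_i \in U} f^{e}_{i,j}(t) - f^{e}_{j,\max}(t) \Big)
% Each f^e_{i,j}(t) minimizes L for a fixed mu_j; the multiplier is
% then updated by projected subgradient ascent with step size kappa:
\mu_j^{(k+1)} = \Big[ \mu_j^{(k)} + \kappa \Big( \sum_{u_i \in U}
  f^{e}_{i,j}(t) - f^{e}_{j,\max}(t) \Big) \Big]^{+}
```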

2) Cloud-Server Side: SP3-2 is formulated as

The cloud-server-side computing resource allocation algorithm can likewise be obtained through smoothing approximation and Lagrangian dual decomposition.

5. Federated Deep-AC-Based Semi-Distributed Multi-Timescale Task Offloading and Power Control Algorithm

In this section, we propose a federated deep-AC-based semi-distributed multi-timescale task offloading and power control algorithm to solve SP1.

A. MDP Model

The task offloading and power control problems can be modeled as MDPs [22], [23]. The key elements are described below.

1) State Space: for device u_i, the state space of task offloading, S^O_i(g), is defined as

The state space of power control, S^P_i(t), additionally includes the task offloading decision and is described as

2) Action Space: the action spaces of task offloading and power control are denoted as X^O_i(g) = {x_{i,j}(g), x^c_i(g)} and X^P_i(t) = {x^P_{i,n}(t)}, where x^P_{i,n}(t) = 1 indicates that u_i selects the n-th transmission power level.

3) Instantaneous Reward: since SP1 is a minimization problem, a cost function is used. For small-timescale power control, we utilize the single-slot cost function Φ(X^O_i(g), X^P_i(t)). For large-timescale task offloading, the T_0-slot cost function is adopted, defined as the cumulative sum of Φ(X^O_i(g), X^P_i(t)) over T_0 time slots, i.e.,
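A hedged reconstruction of the T_0-slot cost referenced above, written out from the verbal definition:

```latex
% Hedged reconstruction of the epoch-level (T0-slot) cost function.
\Phi_{T_0}\big( X^{O}_i(g) \big)
  = \sum_{t=(g-1)T_0+1}^{gT_0} \Phi\big( X^{O}_i(g), X^{P}_i(t) \big)
```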

B. Federated Deep-AC-Based Semi-Distributed Multi-Timescale Task Offloading and Power Control

In the proposed multi-timescale task offloading and power control algorithm, the cloud server plays the role of the central server and performs federated averaging, while the MBS is responsible for model exchange between the central server and the PIoT devices. With advanced model compression techniques [11], the delays of model downloading and uploading are negligible. Two sets of federated deep AC networks are constructed for task offloading and power control, respectively. For task offloading, the global actor network, global critic network, local actor network, and local critic network of u_i are denoted as θ^O_G(g), ω^O_G(g), θ^O_i(g), and ω^O_i(g), respectively. For power control, they are denoted as θ^P_G(t), ω^P_G(t), θ^P_i(t), and ω^P_i(t). The detailed implementation steps are given in Algorithm 1. First, the global actor and critic networks are initialized with random weights, and the cost functions Φ(X^O_i(g), X^P_i(t)) and Φ_{T0}(X^O_i(g)) are initialized to zero. The proposed algorithm consists of five stages, introduced as follows.

1) Model Downloading: at the beginning of the g-th epoch, i.e., t = (g − 1)T_0 + 1, each device u_i downloads the global task offloading networks from the MBS and sets the local task offloading actor network as θ^O_i(g) = θ^O_G(g) and the local task offloading critic network as ω^O_i(g) = ω^O_G(g). In each time slot, u_i downloads the global power control networks from the MBS and sets the local networks as θ^P_i(t) = θ^P_G(t) and ω^P_i(t) = ω^P_G(t).

2) Action Drawing: at the beginning of the g-th epoch, the state S^O_i(g) is fed into θ^O_i(g) and ω^O_i(g), and u_i draws the task offloading action X^O_i(g) according to the policy π^O(S^O_i(g) | θ^O_i(g)). In each time slot t of the g-th epoch, the state S^P_i(t) is fed into θ^P_i(t) and ω^P_i(t), and u_i draws the power control action X^P_i(t) according to the policy π^P(S^P_i(t) | θ^P_i(t)). Then, u_i executes the actions X^O_i(g) and X^P_i(t) and observes U_i(t), E_i(t), f^e_{i,j}(t), and f^c_i(t). Afterwards, u_i computes the single-slot cost Φ(X^O_i(g), X^P_i(t)) as in (25) and updates the T_0-slot cost Φ_{T0}(X^O_i(g)) as

3) Local Model Updating: at the end of the g-th epoch, i.e., t = gT_0, u_i computes the temporal-difference (TD) error of the task offloading networks and updates the local task offloading actor model θ^O_i(g) and the local task offloading critic model ω^O_i(g) as

where γ ∈ [0, 1] is the discount factor. A larger absolute value of the TD error indicates a larger estimation bias of the local task offloading model. ψ^O_θ, ψ^O_ω, ψ^P_θ, and ψ^P_ω are the learning rates of the local task offloading actor and critic models and the local power control actor and critic models, respectively. Analogously, u_i computes the TD error of the power control networks and updates the local power control actor model θ^P_i(t) and the local power control critic model ω^P_i(t).
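A hedged sketch of the TD error and the one-step actor-critic updates for the task offloading networks (sign conventions follow the standard AC method with a cost signal; the paper's exact update rules may differ):

```latex
% Hedged sketch: V(.; omega) is the critic's cost-to-go estimate.
\delta^{O}_i(g) = \Phi_{T_0}\big( X^{O}_i(g) \big)
  + \gamma\, V\big( S^{O}_i(g+1);\, \omega^{O}_i(g) \big)
  - V\big( S^{O}_i(g);\, \omega^{O}_i(g) \big)
% Actor: discourage actions with positive (high-cost) TD error.
\theta^{O}_i(g+1) = \theta^{O}_i(g) - \psi^{O}_{\theta}\, \delta^{O}_i(g)\,
  \nabla_{\theta} \log \pi^{O}\big( X^{O}_i(g) \mid S^{O}_i(g);\, \theta^{O}_i(g) \big)
% Critic: semi-gradient TD update of the value estimate.
\omega^{O}_i(g+1) = \omega^{O}_i(g) + \psi^{O}_{\omega}\, \delta^{O}_i(g)\,
  \nabla_{\omega} V\big( S^{O}_i(g);\, \omega^{O}_i(g) \big)
```

The power control networks would be updated in the same fashion on the slot timescale, using ψ^P_θ and ψ^P_ω.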

4) Local Model Uploading: at the end of the g-th epoch, u_i uploads the updated local task offloading actor network θ^O_i(g + 1) and critic network ω^O_i(g + 1) to the central server. In each time slot t, u_i uploads the updated local power control actor network θ^P_i(t + 1) and critic network ω^P_i(t + 1) to the central server.

5) Federated Averaging: at the end of the g-th epoch, i.e., t = gT_0, the central server performs federated averaging over the uploaded local task offloading networks to update the global task offloading actor model θ^O_G(g + 1) and the global task offloading critic model ω^O_G(g + 1) as

where φ^O_i and φ^O are the training batch size of u_i and the total batch size of the global task offloading actor model, respectively. In each time slot, the central server also performs federated averaging to update the global power control networks.
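A minimal Python sketch of the batch-size-weighted federated averaging step described above; representing each model as a dictionary of numpy arrays is an assumption for illustration, not the paper's implementation.

```python
import numpy as np

def federated_average(local_models, batch_sizes):
    """Batch-size-weighted federated averaging (a hedged sketch).

    local_models: list of dicts mapping parameter names to numpy arrays
                  (the uploaded local actor or critic weights).
    batch_sizes:  list of per-device training batch sizes phi_i.
    Returns the new global model: sum_i (phi_i / phi) * theta_i.
    """
    total = float(sum(batch_sizes))  # phi, the summed batch size
    global_model = {}
    for name in local_models[0]:
        global_model[name] = sum(
            (phi_i / total) * model[name]
            for model, phi_i in zip(local_models, batch_sizes)
        )
    return global_model

# Example with two devices and a single-layer model.
m1 = {"w": np.array([1.0, 2.0])}
m2 = {"w": np.array([3.0, 4.0])}
print(federated_average([m1, m2], batch_sizes=[32, 96])["w"])  # [2.5 3.5]
```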

In addition, each u_i ∈ U updates the queue backlogs according to (3), (9), (10), (18), and (19). Remark 1: by setting T_0 = 1, FEDERATION remains applicable to the scenario where task offloading and power allocation operate on the same timescale; the timescale setting does not affect the algorithmic structure of FEDERATION.
