Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy

This paper was published at the SIGCOMM 2018 Workshop on Mobile Edge Communications (MECOMM).

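I translated this paper into Chinese. Because time was short and my English is limited, errors are inevitable; corrections from readers are welcome.

The paper and its translation are for study purposes only. If anything here is inappropriate, please contact me and I will remove it.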

The paper has three authors: En Li, Zhi Zhou, and Xu Chen, of the School of Data and Computer Science, Sun Yat-sen University.

ABSTRACT

As the backbone technology of machine learning, deep neural networks (DNNs) have quickly ascended to the spotlight. Running DNNs on resource-constrained mobile devices is, however, by no means trivial, since it incurs high performance and energy overhead. Offloading DNNs to the cloud for execution, meanwhile, suffers from unpredictable performance due to the uncontrolled, long wide-area network latency. To address these challenges, in this paper we propose Edgent, a collaborative and on-demand DNN co-inference framework with device-edge synergy. Edgent pursues two design knobs: (1) DNN partitioning, which adaptively partitions DNN computation between device and edge in order to leverage hybrid computation resources in proximity for real-time DNN inference; and (2) DNN right-sizing, which accelerates DNN inference through early exit at a proper intermediate DNN layer to further reduce the computation latency. The prototype implementation and extensive evaluations based on Raspberry Pi demonstrate Edgent's effectiveness in enabling on-demand low-latency edge intelligence.

1 INTRODUCTION & RELATED WORK

As the backbone technology supporting modern intelligent mobile applications, Deep Neural Networks (DNNs) represent the most commonly adopted machine learning technique and have become increasingly popular. Owing to their ability to perform highly accurate and reliable inference tasks, DNNs have been applied successfully in a broad spectrum of domains, from computer vision [14] to speech recognition [12] and natural language processing [16]. However, as DNN-based applications typically require a tremendous amount of computation, today's mobile devices cannot support them well with reasonable latency and energy consumption.

In response to the excessive resource demand of DNNs, the traditional wisdom resorts to the powerful cloud datacenter for training and evaluating DNNs. Input data generated on mobile devices is sent to the cloud for processing, and the results are sent back to the mobile devices after inference. However, with such a cloud-centric approach, large amounts of data (e.g., images and videos) are uploaded to the remote cloud over a long wide-area network, resulting in high end-to-end latency and high energy consumption on the mobile devices. To alleviate the latency and energy bottlenecks of the cloud-centric approach, a better solution is to exploit the emerging edge computing paradigm. Specifically, by pushing cloud capabilities from the network core to the network edges (e.g., base stations and WiFi access points) in close proximity to devices, edge computing enables low-latency and energy-efficient DNN inference.

While recognizing the benefits of edge-based DNN inference, our empirical study reveals that the performance of edge-based DNN inference is highly sensitive to the available bandwidth between the edge server and the mobile device. Specifically, as the bandwidth drops from 1Mbps to 50Kbps, the latency of edge-based DNN inference climbs from 0.123s to 2.317s and becomes on par with the latency of local processing on the device. Given the vulnerable and volatile network bandwidth in realistic environments (e.g., due to user mobility and bandwidth contention among various apps), a natural question is whether we can further improve the performance (i.e., latency) of edge-based DNN execution, especially for mission-critical applications such as VR/AR games and robotics [13].
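To make the bandwidth sensitivity concrete, the two latency figures quoted above can be fed into a back-of-the-envelope model. The sketch below assumes that latency is a fixed term plus payload size divided by bandwidth, and that 1Mbps and 50Kbps mean 10^6 and 5×10^4 bits per second. Both the linear model and the names `fit_linear_model` and `predict_latency_s` are illustrative assumptions, not something the paper states, and the fitted values are not measurements.

```python
# Latency figures quoted in the text for edge-based inference.
# Unit interpretation (1 Mbps = 10**6 bit/s, 1 Kbps = 10**3 bit/s) is an assumption.
HIGH_BPS, HIGH_LATENCY_S = 1_000_000, 0.123
LOW_BPS, LOW_LATENCY_S = 50_000, 2.317


def fit_linear_model(b1, t1, b2, t2):
    """Solve t = fixed_s + payload_bits / b for the two unknowns.

    This linear form is a simplifying assumption made for illustration.
    """
    payload_bits = (t2 - t1) / (1.0 / b2 - 1.0 / b1)
    fixed_s = t1 - payload_bits / b1
    return fixed_s, payload_bits


def predict_latency_s(fixed_s, payload_bits, bandwidth_bps):
    """Predicted latency under the assumed model at a given bandwidth."""
    return fixed_s + payload_bits / bandwidth_bps


fixed_s, payload_bits = fit_linear_model(
    HIGH_BPS, HIGH_LATENCY_S, LOW_BPS, LOW_LATENCY_S
)

if __name__ == "__main__":
    print(f"implied payload: {payload_bits / 8 / 1024:.1f} KiB")
    print(f"implied fixed term: {fixed_s * 1000:.1f} ms")
    for bps in (1_000_000, 500_000, 250_000, 100_000, 50_000):
        latency = predict_latency_s(fixed_s, payload_bits, bps)
        print(f"{bps / 1000:>6.0f} kbps -> {latency:.3f} s")
```

Under these assumptions almost all of the latency at low bandwidth comes from the transfer term, which is why shrinking the data that crosses the network matters so much.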

To answer the above question in the positive, in this paper we propose Edgent, a deep learning model co-inference framework with device-edge synergy. Towards low-latency edge intelligence, Edgent pursues two design knobs. (As an initial exploration, this paper considers only execution latency; energy consumption is left to future work.) The first is DNN partitioning, which adaptively partitions DNN computation between mobile devices and the edge server based on the available bandwidth, so as to take advantage of the processing power of the edge server while reducing data transfer delay. It is worth noting, however, that the latency after DNN partitioning is still constrained by the remaining part running on the device side. Therefore, Edgent further combines DNN partitioning with DNN right-sizing, which accelerates DNN inference through early exit at an intermediate DNN layer. Needless to say, early exit naturally gives rise to a latency-accuracy tradeoff (i.e., early exit harms the accuracy of the inference). To address this challenge, Edgent jointly optimizes DNN partitioning and right-sizing in an on-demand manner. That is, for mission-critical applications that typically have a predefined deadline, Edgent maximizes the accuracy without violating the deadline. The prototype implementation and extensive evaluations based on Raspberry Pi demonstrate Edgent's effectiveness in enabling on-demand low-latency edge intelligence.
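The joint choice of exit point and partition point described above can be sketched as a small exhaustive search. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the edge server runs the first `p` layers and the device runs the rest, that the raw input and one intermediate output cross the network whenever `p > 0`, and that per-layer latencies and output sizes are already known. The `Branch` type, the `plan` function and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Branch:
    """One early-exit variant of the model (all values are hypothetical)."""
    accuracy: float            # accuracy when inference exits at this branch
    edge_ms: List[float]       # per-layer latency on the edge server
    device_ms: List[float]     # per-layer latency on the mobile device
    output_bits: List[float]   # size of each layer's output


def latency_ms(branch: Branch, p: int, input_bits: float,
               bandwidth_bps: float) -> float:
    """Latency when the edge runs layers [0, p) and the device runs [p, n)."""
    compute = sum(branch.edge_ms[:p]) + sum(branch.device_ms[p:])
    if p == 0:
        return compute  # device-only execution: nothing crosses the network
    # Upload the input, then download the output of the last edge layer.
    transfer_bits = input_bits + branch.output_bits[p - 1]
    return compute + 1000.0 * transfer_bits / bandwidth_bps


def plan(branches: List[Branch], input_bits: float, bandwidth_bps: float,
         deadline_ms: float) -> Optional[Tuple[int, int, float]]:
    """Return (branch index, partition point, latency) with the highest
    accuracy that meets the deadline, or None if no plan is feasible."""
    best_key, best_plan = None, None
    for b_idx, branch in enumerate(branches):
        for p in range(len(branch.edge_ms) + 1):
            t = latency_ms(branch, p, input_bits, bandwidth_bps)
            if t > deadline_ms:
                continue
            key = (branch.accuracy, -t)  # prefer accuracy, then lower latency
            if best_key is None or key > best_key:
                best_key, best_plan = key, (b_idx, p, t)
    return best_plan


# Hypothetical profile: a full three-layer model and a two-layer early exit.
full = Branch(0.78, [2.0, 3.0, 1.0], [40.0, 60.0, 20.0], [80_000, 20_000, 320])
early = Branch(0.70, [2.0, 1.0], [40.0, 15.0], [80_000, 320])

if __name__ == "__main__":
    print(plan([full, early], 100_000, 1_000_000, 200.0))
    print(plan([full, early], 100_000, 50_000, 100.0))
```

With the hypothetical numbers above, a generous bandwidth lets the full model run on the edge within the deadline, while a low bandwidth and a tighter deadline push the search toward the early-exit branch running entirely on the device.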

While the topic of edge intelligence has begun to garner much attention recently, our study is different from and complementary to existing pilot efforts. On one hand, for fast and low-power DNN inference on the mobile device side, various approaches, exemplified by DNN compression and DNN architecture optimization, have been proposed [3–5, 7, 9]. Different from these works, we take a scale-out approach to unleash the benefits of collaborative edge intelligence between the edge and mobile devices, and thus mitigate the performance and energy bottlenecks of the end devices. On the other hand, though the idea of DNN partitioning between cloud and end device is not new [6], realistic measurements show that DNN partitioning alone is not enough to satisfy the stringent timeliness requirements of mission-critical applications. Therefore, we further apply DNN right-sizing to speed up DNN inference.

