A Survey on Neural Architecture Search (Paper Reading Notes)

A Survey on Neural Architecture Search
Authors: Martin Wistuba, Ambrish Rawat, and Tejaswini Pedapati; all three researchers work at IBM.

Abstract

The growing interest in both the automation of machine learning and deep learning has inevitably led to the development of a wide variety of automated methods for neural architecture search.
The rising interest in AutoML and AutoDL has spurred a wide variety of automated NAS methods.
The choice of the network architecture has proven to be critical, and many advances in deep learning spring from its immediate improvements.
The choice of network architecture is critically important; many advances in deep learning stem from improvements in architecture design.
However, deep learning techniques are computationally intensive and their application requires a high level of domain knowledge.
However, deep learning techniques are computationally intensive, and applying them also demands a high level of domain knowledge.
Therefore, even partial automation of this process helps to make deep learning more accessible to both researchers and practitioners.
Hence even partial automation of this process makes deep learning more accessible to researchers and practitioners.
With this survey, we provide a formalism which unifies and categorizes the landscape of existing methods along with a detailed analysis that compares and contrasts the different approaches.
Through this survey, the authors provide a formalism that unifies and categorizes existing methods, with a detailed analysis comparing the different approaches.
We achieve this via a comprehensive discussion of the commonly adopted architecture search spaces and architecture optimization algorithms based on principles of reinforcement learning and evolutionary algorithms along with approaches that incorporate surrogate and one-shot models.
The survey's main approach is a comprehensive discussion of commonly used architecture search spaces and architecture optimization algorithms (reinforcement learning, evolutionary algorithms, and methods incorporating surrogate and one-shot models).
Additionally, we address the new research directions which include constrained and multi-objective architecture search as well as automated data augmentation, optimizer and activation function search.
It also addresses new research directions, including constrained and multi-objective architecture search, automated data augmentation, and optimizer and activation function search.
Keywords: Neural Architecture Search, Automation of Machine Learning, Deep Learning, Reinforcement Learning, Evolutionary Algorithms, Constrained Optimization, Multi-Objective Optimization

Introduction

Deep learning methods are very successful in solving tasks in machine translation, image and speech recognition. This success is often attributed to their ability to automatically extract features from unstructured data such as audio, image and text. We are currently witnessing this paradigm shift from the laborious job of manual feature engineering for unstructured data to engineering network components and architectures for deep learning methods. While architecture modifications do result in significant gains in the performance of deep learning methods, the search for suitable architectures is in itself a time-consuming, arduous and error-prone task. Within the last two years there has been an insurgence in research efforts by the machine learning community that seeks to automate this search process.
Deep learning, thanks to its ability to extract features automatically, excels at handling unstructured data. Modifying the architecture yields significant performance gains, yet designing a suitable architecture is itself a time-consuming and error-prone process. Over the past two years, the machine learning community has launched a wave of research aimed at automating the architecture search process.
On a high level, this automation is cast as a search problem over a set of decisions that define the different components of a neural network. The set of feasible solutions to these decisions implicitly defines the search space and the search algorithm is defined by the optimizer.
Viewed at a high level, this automation can be cast as a search problem over a set of decisions that define the components of a neural network. The feasible solutions to these decisions implicitly define the search space, while the optimizer defines the search algorithm.
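To make this formalism concrete, here is a minimal sketch (my own illustration, not the survey's notation): the search space is the cross product of per-layer decisions, and random search stands in for the optimizer. The names (OPS, WIDTHS, evaluate) are hypothetical, and evaluate is only a toy stand-in for a real training run.

```python
import random

# Hypothetical per-layer decisions; their cross product is the search space.
OPS = ["conv3x3", "conv5x5", "maxpool3x3"]
WIDTHS = [16, 32, 64]
NUM_LAYERS = 4

def sample_architecture(rng):
    """One feasible solution: an (operation, width) decision per layer."""
    return [(rng.choice(OPS), rng.choice(WIDTHS)) for _ in range(NUM_LAYERS)]

def evaluate(arch):
    """Toy stand-in for 'train the network, return validation accuracy'."""
    return sum(width for op, width in arch if op.startswith("conv"))

# Random search as the simplest possible optimizer over the search space.
rng = random.Random(0)
best = max((sample_architecture(rng) for _ in range(20)), key=evaluate)
print(best)
```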
Arguably, the works by Zoph and Le (Barret Zoph and Quoc V. Le. Neural architecture search with reinforcement learning. In 5th International Conference on Learning Representations, ICLR 2017) and Baker et al. (Bowen Baker, Otkrist Gupta, Nikhil Naik, and Ramesh Raskar. Designing neural network architectures using reinforcement learning. In 5th International Conference on Learning Representations, ICLR 2017) mark the beginning of these efforts, where their works demonstrated that good architectures can be discovered with the use of reinforcement learning algorithms. Shortly thereafter, Real et al. (Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan, Quoc V. Le, and Alexey Kurakin. Large-scale evolution of image classifiers. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 2902–2911) showed that similar results could also be achieved by the hitherto well studied approaches in neuroevolution (Dario Floreano, Peter Dürr, and Claudio Mattiussi. Neuroevolution: from architectures to learning. Evolutionary Intelligence, 1(1):47–62, 2008).
Arguably, early NAS is represented by MetaQNN, proposed by Bowen Baker et al., and the reinforcement-learning approach of Barret Zoph et al. (the line of work that led to NASNet). Real et al. subsequently showed that the long-studied neuroevolution methods described by Floreano et al. can achieve results comparable to reinforcement learning.
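As a rough illustration of the reinforcement-learning idea (in the spirit of these works, not a reproduction of their controllers), the sketch below trains a categorical policy over a single operation choice with REINFORCE; the reward function is a hypothetical stand-in for training the sampled network and measuring validation accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)
OPS = ["conv3x3", "conv5x5", "maxpool3x3"]
logits = np.zeros(len(OPS))  # policy parameters for one architecture decision

def reward(op_index):
    """Stand-in for 'train the sampled network, return validation accuracy'."""
    return [0.6, 0.9, 0.5][op_index] + 0.05 * rng.standard_normal()

for _ in range(500):
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax policy
    action = rng.choice(len(OPS), p=probs)          # sample an architecture
    r = reward(action)
    grad = -probs                                   # d log pi(action) / d logits
    grad[action] += 1.0
    logits = logits + 0.1 * r * grad                # REINFORCE ascent step

print(OPS[int(np.argmax(logits))])  # tends toward the highest-reward operation
```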
However, both these search approaches consumed hundreds of GPU hours in their respective computations. Consequently, many of the subsequent works focused on methods that reduce this computational burden. The successful algorithms along this line of research leverage the principle of reusing the learned model parameters, with the works of Cai et al. (Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang. Efficient architecture search by network transformation. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 2787–2794, 2018a. URL https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16755. Han Cai, Jiacheng Yang, Weinan Zhang, Song Han, and Yong Yu. Path-level network transformation for efficient architecture search. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 677–686, 2018b. URL http://proceedings.mlr.press/v80/cai18a.html. Han Cai, Ligeng Zhu, and Song Han. ProxylessNAS: Direct neural architecture search on target task and hardware. In Proceedings of the International Conference on Learning Representations, ICLR 2019, New Orleans, Louisiana, USA, 2019. URL https://openreview.net/forum?id=HylVB3AqYm) and Pham et al. (Hieu Pham, Melody Guan, Barret Zoph, Quoc Le, and Jeff Dean. Efficient neural architecture search via parameters sharing. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 4095–4104) being the notable mentions.
However, the above methods consume GPU resources heavily, so much of the follow-up work focused on reducing the computational burden. Reusing weights during the search has proven to be one of the more successful strategies; the works of Cai et al. and Pham et al. pursue this line of research.
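The weight-reuse principle can be sketched in a few lines (an assumed setup in the spirit of parameter sharing, not Pham et al.'s implementation): every sampled architecture pulls its layer weights from one shared pool keyed by layer and operation, so candidates are evaluated without retraining from scratch.

```python
import numpy as np

rng = np.random.default_rng(0)
shared_weights = {}  # (layer index, operation name) -> weight tensor

def get_weights(layer, op, shape):
    """Create the weights once; every architecture choosing (layer, op) reuses them."""
    key = (layer, op)
    if key not in shared_weights:
        shared_weights[key] = 0.01 * rng.standard_normal(shape)
    return shared_weights[key]

# Two different sampled architectures share the conv3x3 weights at layer 0,
# so training one candidate also warm-starts the other.
w_a = get_weights(0, "conv3x3", (3, 3, 16, 16))
w_b = get_weights(0, "conv3x3", (3, 3, 16, 16))
assert w_a is w_b
```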
The design of the search space forms a key component of neural architecture search. In addition to speeding up the search process, this influences the duration of the search and the quality of the solution. In the earlier works on neural architecture search, the spaces were designed to primarily search for chain-structured architectures. However, with branched handcrafted architectures surpassing the classical networks in terms of performance, appropriate search spaces were proposed shortly after the initial publications (Zoph et al., 2018) and these have since become a norm in this field.
The search space is a crucial element of NAS: besides helping to speed up the search, it affects the search duration and the quality of the resulting architectures. Early NAS work typically chose chain-structured search spaces. However, as handcrafted branched architectures began to outperform the classical networks, suitable branched search spaces were proposed soon after the initial publications (Zoph et al., 2018), and such branched structures have since become the norm in the field.
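For intuition, the two kinds of spaces can be contrasted with hypothetical encodings (my notation, not the survey's): a chain-structured architecture is just a sequence of operations, while a branched, cell-based architecture is a small DAG in which every node also chooses its inputs among earlier nodes.

```python
# Chain-structured: layer i feeds layer i + 1, nothing else to decide.
chain_arch = ["conv3x3", "maxpool3x3", "conv5x5"]

# Branched (cell-based): node id -> (operation, input node ids).
# Nodes 0 and 1 stand for the cell's two inputs; outputs that no other
# node consumes would typically be concatenated to form the cell output.
branched_cell = {
    2: ("conv3x3", [0, 1]),
    3: ("maxpool3x3", [0, 2]),
    4: ("conv5x5", [2, 3]),
}
```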
In parallel to these developments, researchers have broadened the horizons of neural architecture search to incorporate objectives that go beyond reducing the search time and generalization error of the found architectures. Methods that simultaneously handle multiple objective functions have become relevant. Notable works include methods that attempt to limit the number of model parameters or the like, for efficient deployment on mobile devices (Tan et al., 2018; Kim et al., 2017). Furthermore, the developed techniques for architecture search have been extended for advanced automation of other related components of deep learning. For instance, the search for activation functions (Ramachandran et al., 2018) or suitable data augmentation (Cubuk et al., 2018a).
Alongside these developments, many researchers have broadened the scope of NAS, pursuing optimization objectives beyond search time and the generalization error of the found architectures, which has made methods that handle multiple objectives simultaneously highly relevant. Representative work here includes the results of Tan et al. and Kim et al. on limiting the number of model parameters for deployment on mobile devices. Beyond that, architecture search techniques have also been applied to optimizing other components of deep learning systems, including activation function search and automated data augmentation.
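One common way to fold such a resource constraint into a single search objective is a soft penalty on the reward. The sketch below follows the weighted-product form popularized by Tan et al.'s MnasNet; the target latency and exponent here are illustrative assumptions, not values from the survey.

```python
def multi_objective_reward(accuracy, latency_ms, target_ms=80.0, w=-0.07):
    """Scale accuracy by a soft penalty when latency exceeds the target."""
    return accuracy * (latency_ms / target_ms) ** w

# A slightly less accurate but much faster model can win under this objective.
print(multi_objective_reward(0.75, 60.0))   # under budget: small bonus
print(multi_objective_reward(0.76, 120.0))  # over budget: penalized
```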
Currently, the automation of deep learning in the form of neural architecture search is one of the fastest developing areas of machine learning. With new papers emerging on arXiv.org each week and major conferences publishing a handful of interesting work, it is easy to lose track. With this survey, we provide a formalism which unifies the landscape of existing methods. This formalism allows us to critically examine the different approaches and understand the benefits of different components that contribute to the design and success of neural architecture search. Along the way, we also highlight some popular misconceptions and pitfalls in the current trends of architecture search. We supplement our criticism with suitable experiments.
Currently, automated deep learning in the form of neural architecture search is one of the fastest-growing research areas in machine learning. Interesting new work appears every week, so it is easy to lose track. This survey provides a formalism that unifies the existing lines of research, which helps in understanding the principles behind the design of successful NAS methods. Along the way, it also highlights some common misconceptions and pitfalls in current architecture search trends, and backs up this criticism with suitable experiments.
Our review is divided into several sections. In Section 2, we discuss various architecture search spaces that have been proposed over time. We use Section 3 to formally define the problem of architecture search. Then we identify four typical types of optimization methods: reinforcement learning, evolutionary algorithms, surrogate model-based optimization, and one-shot architecture search. We define these optimization procedures, associate them with existing work and discuss it. Section 6 highlights architecture search under constraints, with multiple objective functions and with model compression techniques. Alternate approaches that are motivated by the use of transfer learning are discussed in Section 5. Similarly, the class of methods that use early termination to speed up the search process are detailed in Section 4. Finally, the influence of search procedures on related areas is discussed in Section 7. The discussion is supported with extensive illustrations to elucidate the different methods under a common formalism, and with relevant experimentation that examines the different aspects of neural architecture search methods.
The survey unfolds in several parts. Section 2 discusses the various architecture search spaces. Section 3 defines the architecture search problem and identifies four typical classes of optimization methods for detailed analysis: reinforcement learning, evolutionary algorithms, surrogate-model-based optimization, and one-shot search; after this classification, each class is related to concrete papers and discussed. On the acceleration side, Section 4 covers methods that speed up the search with early termination, and Section 5 covers acceleration based on transfer learning. Section 6 focuses on architecture search techniques involving constraints, multiple objectives, and model compression. Section 7 discusses progress in other areas related to NAS and examines the different aspects of neural architecture search methods by isolating one factor at a time. A final section offers an outlook on the development of NAS techniques and their application areas.

Reposted from blog.csdn.net/weixin_39833897/article/details/103998625