Low Search Cost, High Accuracy: Microsoft Proposes SemiNAS, a Neural Architecture Search Algorithm Based on Semi-Supervised Learning

Author | Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Enhong Chen, Tie-Yan Liu

Source | Microsoft Research AI Headlines (WeChat ID: MSRAsia)

Editor's note: In recent years, Neural Architecture Search (NAS) has made major breakthroughs, but it still faces the challenges of high search cost and unstable search results. To address this, the Machine Learning Group at Microsoft Research Asia proposed SemiNAS, a neural architecture search algorithm based on semi-supervised learning, which improves search accuracy at the same search cost, or reduces search cost at the same search accuracy. SemiNAS achieves a 23.5% top-1 error rate and a 6.8% top-5 error rate on ImageNet (mobile setting). In addition, SemiNAS introduces neural architecture search to text-to-speech (TTS) tasks for the first time, achieving good results in two scenarios: low-resource and robustness.

NAS has made breakthroughs in recent years: by automating the design of neural network architectures, it has produced networks that outperform those designed by human experts on many tasks, such as image classification, object recognition, language modeling, and machine translation.

Figure 1: Schematic of the NAS framework

As shown in Figure 1, NAS consists of a controller and an evaluator. The controller generates candidate network architectures and hands them to the evaluator; the evaluator trains each candidate network, measures its accuracy on the validation set of the target task, and returns that accuracy to the controller. The controller then learns from these architecture-accuracy pairs to generate better architectures. The evaluation step is extremely expensive because every candidate architecture must be trained, and the controller needs many architecture-accuracy pairs as training data, so the whole search process has a very high time cost. In previous work, the evaluator alone took at least several hundred GPU days (equivalent to several hundred GPUs running for one day).
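To make the loop concrete, here is a minimal, self-contained sketch of the controller-evaluator interaction (our own illustration, not the SemiNAS code; the toy search space, the uniform sampling, and the train_and_evaluate placeholder are all hypothetical stand-ins):

```python
import random

# Hypothetical toy search space, for illustration only.
SEARCH_SPACE = {"depth": [2, 4, 8], "width": [16, 32, 64]}

def sample_architecture():
    # A real controller learns to bias this sampling; here it is uniform.
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def train_and_evaluate(arch):
    # Placeholder for the evaluator: in real NAS this trains 'arch' to
    # convergence and returns validation accuracy, costing GPU-hours or
    # more per call -- the dominant expense of the whole search.
    return random.random()

def nas_search(num_rounds=10, candidates_per_round=5):
    history = []  # (architecture, accuracy) pairs that train the controller
    for _ in range(num_rounds):
        for _ in range(candidates_per_round):
            arch = sample_architecture()
            history.append((arch, train_and_evaluate(arch)))
        # A learned controller would update itself on 'history' here,
        # steering future samples toward high-accuracy regions.
    return max(history, key=lambda pair: pair[1])
```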

Researchers subsequently proposed one-shot NAS algorithms based on weight sharing. The idea is to build a supernetwork that contains all possible architectures in the search space, with identical substructures sharing parameters across different architectures; training the supernetwork once is thus equivalent to training many architectures simultaneously. This brings the search cost down to within 10 GPU days. However, due to inherent problems (for example, each sub-architecture receives too little training on average), the ranking of architectures by their accuracy inside the supernetwork correlates only weakly with the ranking by their true standalone accuracy. This weak correlation hurts the controller's learning, makes the search unstable, and sometimes yields results no better than random search.
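The weight-sharing idea can be illustrated with a minimal supernetwork sketch (our own simplified example, not any of the cited implementations): each layer holds all candidate operations, a sub-architecture is one choice of operation per layer, and every sub-architecture reuses the same operation weights.

```python
import random
import torch
import torch.nn as nn

class MixedLayer(nn.Module):
    """One supernet layer holding all candidate operations."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),  # 3x3 conv
            nn.Conv2d(channels, channels, 5, padding=2),  # 5x5 conv
            nn.Identity(),                                # skip connection
        ])

    def forward(self, x, choice):
        return self.ops[choice](x)

class SuperNet(nn.Module):
    def __init__(self, channels=16, depth=4):
        super().__init__()
        self.layers = nn.ModuleList([MixedLayer(channels) for _ in range(depth)])

    def forward(self, x, arch):
        # 'arch' is a list of operation indices, one per layer; different
        # architectures index into the same shared weights.
        for layer, choice in zip(self.layers, arch):
            x = layer(x, choice)
        return x

supernet = SuperNet()
# Single-path training step: sample one random sub-architecture, so each
# forward/backward pass trains the shared weights along that path.
arch = [random.randrange(3) for _ in range(4)]
out = supernet(torch.randn(2, 16, 8, 8), arch)
```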

To address the high training cost of traditional methods and the instability and weaker results of one-shot methods, researchers from the Machine Learning Group at Microsoft Research Asia proposed SemiNAS, a neural architecture search method based on semi-supervised learning that reduces search cost while improving search accuracy.

Method

The NAS controller performs supervised learning, using a large number of architectures and their corresponding accuracies as training data. Training many architectures to convergence to obtain their accuracies is very time-consuming, whereas unsupervised data (architectures alone, without corresponding accuracies) is very easy to obtain, for example by random generation. We therefore want to exploit large amounts of readily available unlabeled architectures to help train the controller; this approach is known as semi-supervised learning. It offers two advantages: 1. better performance: at almost the same training cost (the same amount of labeled data), abundant unlabeled data can further improve the search algorithm and find better architectures; 2. lower cost: to reach the same search accuracy, abundant unlabeled data greatly reduces the amount of labeled data needed, and hence the training time.

To exploit a large number of unlabeled architectures, we first learn from a small number of labeled architectures, then label the unlabeled architectures (by predicting their accuracy) and add them to the training data. More specifically, we build a performance predictor f_p that predicts the accuracy of an architecture, trained by minimizing the mean-squared-error (MSE) loss L_p shown in Equation 1:

Equation 1: L_p = Σ_{(x, y) ∈ D} (y − f_p(x))², where x is an architecture, y is its accuracy, and D is the labeled training set.

We train f_p on the limited labeled data until convergence, then use it to predict the accuracy of unlabeled architectures x', obtaining pseudo labels y' = f_p(x'). We then mix the pseudo-labeled data with the original labeled data and train the performance predictor f_p further to reach higher accuracy.
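The following sketch shows this pseudo-labeling procedure under stated assumptions: f_p is any regression model mapping an architecture encoding to a scalar accuracy, labeled data is a list of (encoding, accuracy) tensor pairs, and the function names are our own, not the released SemiNAS code.

```python
import torch
import torch.nn as nn

def train_predictor(f_p, labeled, unlabeled, epochs=50, lr=1e-3):
    """labeled: list of (arch_encoding, accuracy) tensor pairs;
    unlabeled: list of arch_encoding tensors (no accuracy labels)."""
    opt = torch.optim.Adam(f_p.parameters(), lr=lr)
    mse = nn.MSELoss()

    def fit(pairs):
        for _ in range(epochs):
            for x, y in pairs:
                opt.zero_grad()
                loss = mse(f_p(x), y)  # Equation 1: (y - f_p(x))^2
                loss.backward()
                opt.step()

    fit(labeled)                                        # 1) train on labeled data
    pseudo = [(x, f_p(x).detach()) for x in unlabeled]  # 2) pseudo-label: y' = f_p(x')
    fit(labeled + pseudo)                               # 3) retrain on the mixture
    return f_p
```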

The trained performance predictor f_p can be combined with various NAS algorithms by predicting accuracies for the architectures they learn from. For reinforcement-learning-based algorithms (e.g., NASNet [3], ENAS [6]) and evolutionary algorithms (e.g., AmoebaNet [4], Single Path One-Shot NAS [7]), f_p can predict the accuracy of the candidate architectures they generate. For gradient-based algorithms (e.g., DARTS [5] and NAO [1]), the gradient of the accuracy predicted by f_p with respect to the architecture representation can be used directly to update the architecture.

In this work, we implemented SemiNAS on top of our previous work NAO (Neural Architecture Optimization) [1]. NAO is built around an encoder-performance predictor-decoder framework: the encoder maps a discrete neural architecture to a vector in a continuous space, the performance predictor predicts its accuracy from that vector, and the decoder maps the continuous representation back to a discrete architecture. The three components are trained jointly: the predictor through a regression task and the decoder through a reconstruction task. To generate a new architecture, we feed in an existing architecture, compute the gradient of the predicted performance with respect to its continuous representation, and apply gradient ascent to obtain a better representation, which the decoder turns into a better architecture. More details can be found in the original NAO paper.
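A simplified rendering of this gradient-ascent step is sketched below, assuming already-trained encoder, predictor, and decoder modules (our own paraphrase of the NAO idea [1], not the released implementation):

```python
import torch

def improve_architecture(encoder, predictor, decoder, arch, step_size=0.1, steps=10):
    # Map the discrete architecture into the continuous space.
    z = encoder(arch).detach().requires_grad_(True)
    for _ in range(steps):
        pred_acc = predictor(z)  # predicted accuracy of the current embedding
        grad, = torch.autograd.grad(pred_acc.sum(), z)
        # Gradient ascent: move the embedding toward higher predicted accuracy.
        z = (z + step_size * grad).detach().requires_grad_(True)
    # Decode the improved embedding back into a discrete architecture.
    return decoder(z)
```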

Combining the semi-supervised method with this framework, SemiNAS proceeds as follows: we first train the overall framework with a small amount of labeled data; we then sample a large number of unlabeled architectures from the search space and use the trained framework to predict their accuracy; next, we mix the pseudo-labeled data with the original labeled data and fully train the entire framework; finally, we generate better architectures following NAO's optimization procedure.

Experimental results

We verified SemiNAS on multiple tasks and datasets, including image classification (NASBench-101 [2], ImageNet) and text-to-speech synthesis. Notably, this is the first time NAS has been applied to speech synthesis, where it achieved good results.

NASBench-101

First, we ran experiments on the NASBench-101 [2] dataset. NASBench-101 is an open-source dataset for validating the effectiveness of NAS algorithms; it contains 423k different architectures together with their accuracy on the CIFAR-10 classification task, effectively providing an out-of-the-box evaluator. This makes it easy for researchers to quickly verify their search algorithms and compare fairly with other work (eliminating the differences introduced by training techniques, random seeds, and the dataset itself). The results are shown in Table 1.

Table 1: Performance of different methods on NASBench-101

On NASBench-101, random search, regularized evolution (RE), and NAO reach average test accuracies of 93.66%, 93.97%, and 93.87% respectively after sampling 2000 architectures. SemiNAS reaches an average test accuracy of 93.98% after sampling only 300 architectures, matching RE and NAO while greatly reducing the required resources. Furthermore, when sampling almost the same number of architectures (2100), SemiNAS achieves an average test accuracy of 94.09%, surpassing all the other search methods.

ImageNet

We further validated SemiNAS on the larger ImageNet classification task. During the search, we actually trained and evaluated only 400 architectures. The final results are shown in Table 2.

Table 2: Performance of different methods on the ImageNet classification task

Under the mobile setting (FLOPS < 600M), the architecture found by SemiNAS achieves a top-1 error rate of 23.5% and a top-5 error rate of 6.8%, outperforming other NAS methods.

Speech synthesis (TTS)

We also explored a new application domain for SemiNAS, applying it to the text-to-speech (TTS) synthesis task.

Applying NAS to a new task raises two basic design questions: the search space and the evaluation metric. For the search space, we followed mainstream TTS models and designed an encoder-decoder backbone. The candidate operations searched at each layer include Transformer layers (with different numbers of attention heads), convolution layers (with different kernel sizes), and LSTM layers. For the evaluation metric, unlike classification or language modeling tasks, TTS has no objective criterion that a program can compute automatically: judging the quality of synthesized audio requires human listeners, while NAS needs to evaluate hundreds of models, which is unrealistic for TTS. We therefore had to design an objective metric. We found that the degree to which the encoder-decoder attention weights concentrate on the diagonal, which we call the diagonal focus rate (DFR), correlates strongly with the quality of the synthesized audio, so we chose it as the objective evaluation metric during the search.
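As an illustration, a diagonal focus rate can be computed from an encoder-decoder attention matrix roughly as follows (our own sketch of the idea; the exact definition and band width used in the paper may differ):

```python
import numpy as np

def diagonal_focus_rate(attn, bandwidth=0.1):
    """attn: (T_dec, T_enc) attention weight matrix.
    Returns the fraction of attention mass inside a band around the
    diagonal, i.e., around the expected monotonic text-speech alignment."""
    t_dec, t_enc = attn.shape
    mass_on_diag = 0.0
    for t in range(t_dec):
        center = t * t_enc / t_dec  # diagonal position for decoder step t
        half = bandwidth * t_enc    # half-width of the band
        lo = max(0, int(center - half))
        hi = min(t_enc, int(center + half) + 1)
        mass_on_diag += attn[t, lo:hi].sum()
    return mass_on_diag / attn.sum()
```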

We used NAS to tackle two challenging TTS scenarios: the low-resource setting and the robustness setting. In the low-resource setting, little TTS training data is available; in the robustness setting, the test text is generally harder. In our experiments, we used NAO as the baseline, keeping the search cost of NAO and SemiNAS the same and comparing the final performance of the discovered architectures.

We tested on the LJSpeech dataset (24 hours of speech with paired text). For the low-resource scenario, we randomly selected about 3 hours of speech and text as training data to simulate the low-resource setting. The final results are shown in Table 3.

Table 3: Performance of different methods in the low-resource scenario

For the generated audio, we use the intelligibility rate (IR), i.e., the proportion of words a listener can understand, to evaluate model performance. The human-designed Transformer TTS [8] achieves only 88% intelligibility; the previous NAS algorithm NAO achieves 94%, and SemiNAS achieves 97%, an improvement of 9 percentage points over Transformer TTS and a clear improvement over NAO. We can also see that our search metric DFR is positively correlated with IR, which verifies the validity of using DFR as the objective evaluation metric during the search.

For the robustness scenario, we trained on the full LJSpeech dataset and then collected an additional 100 hard sentences (containing many single-syllable words, repeated syllables, etc.) as the test set. The experimental results are shown in Table 4.

Table 4: Robustness performance of different methods

We counted, for each model, the test sentences in which word repetitions or word skips occur, and computed the overall sentence error rate (a sentence counts as an error as long as a single word repetition or skip appears). Transformer TTS reaches an error rate of 22%, while SemiNAS reduces it to 15%.

Audio demos for the TTS experiments:

https://speechresearch.github.io/seminas/

Summary

By learning from unlabeled neural architectures that require no training, SemiNAS can improve the performance of existing NAS methods at the same training cost, or maintain the same performance while reducing training cost. Experiments show that the method achieves very good results on multiple tasks and datasets. In the future, we plan to apply SemiNAS to more search algorithms and to explore NAS applications in more domains.

For more details, see the original paper:

Semi-Supervised Neural Architecture Search

Paper link: https://arxiv.org/abs/2002.10389

The code has been open-sourced.

GitHub link: https://github.com/renqianluo/SemiNAS

References

[1] Luo, Renqian, et al. "Neural architecture optimization." Advances in neural information processing systems. 2018.

[2] Ying, Chris, et al. "NAS-Bench-101: Towards Reproducible Neural Architecture Search." International Conference on Machine Learning. 2019.

[3] Zoph, Barret, et al. "Learning transferable architectures for scalable image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.

[4] Real, Esteban, et al. "Regularized evolution for image classifier architecture search." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019.

[5] Liu, Hanxiao, Karen Simonyan, and Yiming Yang. "DARTS: Differentiable Architecture Search." International Conference on Learning Representations. 2019.

[6] Pham, Hieu, et al. "Efficient Neural Architecture Search via Parameters Sharing." International Conference on Machine Learning. 2018.

[7] Guo, Zichao, et al. "Single path one-shot neural architecture search with uniform sampling." arXiv preprint arXiv:1904.00420 (2019).

[8] Li, Naihan, et al. "Neural speech synthesis with transformer network." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019.



