Notes on the paper "P-DARTS: Bridging the Depth Gap between Search and Evaluation"

Reference code: pdarts

1. Overview

Introduction: Building on DARTS, this paper points out that architectures obtained with the plain DARTS search suffer a performance drop when applied directly to large datasets such as ImageNet. The authors attribute this to a depth gap: the network used during architecture search is much shallower than the one used for evaluation (and trained on a different dataset). Searching directly on the target dataset (e.g. ImageNet) would be computationally expensive, which is why many NAS methods first search for the cell structure on a small dataset and then scale the network up for the large one. A simple remedy is to search in a multi-stage, progressive manner, pruning unpromising parts of the search space as the search proceeds, so that the depth gap can be narrowed with limited computing resources. For search stability, the paper also proposes regularization strategies to constrain skip-connections during the search. The method achieves a 2.5% test error on CIFAR-10 and top-1/top-5 errors of 24.4%/7.4% on ImageNet.

In DARTS-style search methods, the search is first run with a shallow network on a small dataset and the resulting cell is then scaled up for the target dataset (figure (a)); this paper instead uses a multi-stage progressive scheme (figure (b)), illustrated below.
(Figure: (a) the conventional search-then-scale pipeline of DARTS vs. (b) the progressive search of P-DARTS)
Specifically, the paper's improvements to the search focus on two aspects:

  • 1) Multi-stage approximation: the progressive scheme in the figure above shrinks the gap between the two regimes, avoiding the problem that a structure searched with a shallow network on a small dataset transfers poorly to the deep network used on a large dataset (the effective search spaces of shallow and deep networks differ). At the same time, resource consumption is reduced by pruning unpromising operations from the search space, which makes searching at greater depth feasible;
  • 2) Search stability: as the search progresses, the algorithm increasingly favors skip-connections, which leads to a collapse of the searched network (shallow paths change faster during gradient optimization, rather than the search preferring deeper structures as one might expect). To mitigate this, the paper proposes two improvements: operation-level dropout applied to the skip-connection (shortcut) op to weaken its advantage, and controlling the number of skip-connections that are generated;

2. Method design

2.1 Approximation of the search space

The search-space approximation here means approximating the search space of the deep evaluation network in a multi-stage progressive way: across stages the depth of the searched super-network is gradually increased, while the lowest-probability candidate operations are pruned from the search space to keep GPU memory usage down. The process is illustrated below.
(Figure: the staged progressive search procedure of P-DARTS)
Throughout the staged search, GPU memory growth stays well controlled even as the network depth increases; the table below lists the memory usage of each search stage.
(Table: GPU memory usage of the individual search stages)
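A minimal sketch of the pruning step between stages, assuming a DARTS-style architecture-weight tensor; the names (`prune_ops`, `STAGES`) and the 5/11/17-cell, 8/5/3-op schedule reflect my reading of the paper but should be treated as illustrative rather than the reference implementation:

```python
import torch
import torch.nn.functional as F

# Illustrative stage schedule: the super-network gets deeper while the number of
# candidate operations kept on each edge shrinks.
STAGES = [
    {"cells": 5,  "ops_kept": 8},   # stage 1: shallow super-net, full op set
    {"cells": 11, "ops_kept": 5},   # stage 2: deeper, weakest ops removed
    {"cells": 17, "ops_kept": 3},   # stage 3: close to evaluation depth
]

def prune_ops(alphas: torch.Tensor, candidate_ops: list, ops_kept: int):
    """Keep only the `ops_kept` operations with the largest architecture weight
    on each edge; everything else is dropped from the next stage's search space."""
    probs = F.softmax(alphas, dim=-1)              # (num_edges, num_ops)
    keep = probs.topk(ops_kept, dim=-1).indices    # op indices to keep per edge
    return [[candidate_ops[i] for i in edge.tolist()] for edge in keep]

# After a stage finishes, something like:
#   ops_per_edge = prune_ops(stage_alphas, OP_NAMES, STAGES[1]["ops_kept"])
# and the super-network is rebuilt with STAGES[1]["cells"] cells before the
# next round of search.
```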

2.2 Regularization of the search space

During the search, the algorithm tends to favor skip-connections over operations such as convolution and pooling (model collapse). The paper's explanation is that skip-connections let gradients propagate faster, so the super-network fits datasets like CIFAR-10 more easily, and the search therefore assigns this operation a higher selection probability. The resulting network cannot extract deeper semantic information and performs worse on other datasets; in essence, this is an over-fitting phenomenon.

To address this, the paper constrains skip-connections from the perspective of regularization, using the following two methods:

  • 1) Add dropout after skip-connections: dropout is inserted after the skip-connection operation, with a drop rate that decays as the search stages progress (a fixed drop rate would hold back the network's performance, since skip-connections are also important to it). Using a larger rate early and a smaller rate later both restricts the premature selection of skip-connections and still lets the parameters of the other operations in the search space be learned effectively, preserving the role of the skip-connection while keeping the performance gain; a minimal sketch is given after this list;
  • 2) Fix the number of skip-connections per cell to M: as noted above, skip-connections matter for performance and should not be suppressed too much, so their number in the final (last-stage) architecture is fixed to a suitable small value (M = 2 in the paper). In practice, after the search finishes, the M skip-connections with the highest architecture weights are kept. Note that this step must be applied on top of the first regularization, so that low-quality skip-connections have a lower chance of being selected; see the second sketch below;
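A minimal sketch of the first regularization, operation-level dropout on the skip-connection path; the class name `DropSkip` and the linear decay schedule are illustrative assumptions (the paper decays the rate during each stage, but its exact schedule is not reproduced here):

```python
import torch.nn as nn

class DropSkip(nn.Module):
    """Skip-connection (identity) op wrapped with operation-level dropout.
    Only the skip path is dropped; the other candidate ops are untouched."""
    def __init__(self, p: float = 0.0):
        super().__init__()
        self.dropout = nn.Dropout(p)

    def forward(self, x):
        return self.dropout(x)

    def set_drop_rate(self, p: float):
        self.dropout.p = p

# Decay the drop rate as search within a stage progresses
# (a linear decay, purely for illustration):
def decayed_rate(p0: float, epoch: int, total_epochs: int) -> float:
    return p0 * (1.0 - epoch / total_epochs)
```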
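And a simplified sketch of the second regularization, keeping only M skip-connections per cell by zeroing the skip weight on the weakest edges before the discrete cell is derived; `restrict_skips` is a hypothetical helper, and the paper's actual procedure iteratively removes the weakest skip-connection and re-derives the cell until M remain:

```python
import torch
import torch.nn.functional as F

def restrict_skips(alphas: torch.Tensor, op_names: list, m: int = 2) -> torch.Tensor:
    """Keep the skip-connection option only on the m edges where its architecture
    weight is largest, zeroing it elsewhere before the discrete cell is derived."""
    probs = F.softmax(alphas, dim=-1).clone()      # (num_edges, num_ops)
    skip_idx = op_names.index("skip_connect")      # op name as used in DARTS-style code
    num_edges = probs.size(0)
    if num_edges > m:
        # Edges with the weakest skip weights lose the skip option entirely.
        weakest = probs[:, skip_idx].topk(num_edges - m, largest=False).indices
        probs[weakest, skip_idx] = 0.0
    return probs
```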

These regularization constraints do restrain skip-connections, but they introduce extra hyper-parameters such as the drop rate and M, which depend heavily on manual tuning; personally I think this is a point that could be optimized in future work.

3. Experimental results

CIFAR-10 and CIFAR-100 datasets:
(Table: results on CIFAR-10 and CIFAR-100)

ImageNet dataset:
(Table: results on ImageNet)
