automl - evolutionary learning - paper notes - EAT-NAS: Elastic Architecture Transfer for Accelerating Large-scale Neural Architecture Search

HUST proposes EAT-NAS, a method for accelerating neural architecture search on large-scale tasks.

Background

Many existing NAS methods search for an architecture on a small-scale dataset and then manually scale its width and depth up for large-scale datasets. This mechanism is widely used in NAS. However, because the small-scale and large-scale datasets come from different domains, an architecture searched on a small dataset is not guaranteed to remain well suited when applied to a large-scale dataset.

Highlights and Innovations

To address the above limitation, this paper proposes a more reasonable solution. The authors transfer the architecture learned on the small-scale task to the large-scale task and fine-tune it there. In more detail, they build an elastic transfer framework on top of evolutionary NAS: first search for an architecture on the small dataset with an existing method, then use that architecture as the seed to initialize the search on the large dataset.

In summary, the main highlights of this paper are:

  • An Elastic Architecture Transfer Mechanism is proposed to bridge the structural gap between architectures searched on small-scale tasks and those needed for large-scale tasks.
  • Because the optimal model found on the small-scale dataset is used as the seed to initialize the large-scale task, model-search time on large datasets is effectively saved.
  • While saving computing resources, the final model still achieves good performance.

Algorithm

[Figure: overview of the two-stage EAT-NAS search pipeline]

The basic idea of EAT-NAS is shown above. First, an evolutionary algorithm searches for the optimal model on the small task; that model then serves as the seed initialization for the second stage, where the evolutionary algorithm searches on the large-scale task. Because the best model from the small task initializes the population for the big task, the search on the large-scale task converges significantly faster than searching from scratch.
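As a rough, hypothetical sketch of this two-stage idea (not the authors' code; the encoding, mutation, and fitness functions below are toy stand-ins for the paper's components):

```python
import random

# Toy architecture encoding: a list of per-layer (width, depth) genes.
def random_architecture(n_layers=4):
    return [(random.choice([16, 32, 64]), random.choice([1, 2, 3]))
            for _ in range(n_layers)]

def mutate(arch):
    """Perturb one randomly chosen layer's width or depth."""
    arch = list(arch)
    i = random.randrange(len(arch))
    w, d = arch[i]
    if random.random() < 0.5:
        w = random.choice([16, 32, 64])
    else:
        d = random.choice([1, 2, 3])
    arch[i] = (w, d)
    return arch

def evolutionary_search(population, evaluate, generations=20):
    """Generic evolutionary loop: score, keep the fitter half, refill by mutation."""
    for _ in range(generations):
        population.sort(key=evaluate, reverse=True)
        parents = population[: len(population) // 2]
        population = parents + [mutate(random.choice(parents)) for _ in parents]
    return max(population, key=evaluate)

# Dummy proxy fitness functions; the real ones train and evaluate networks.
def evaluate_on_small(arch):
    return sum(w * d for w, d in arch)

def evaluate_on_large(arch):
    return sum(w * d for w, d in arch)

# Stage 1: search from scratch on the small task.
seed = evolutionary_search([random_architecture() for _ in range(16)],
                           evaluate_on_small)

# Stage 2: initialize the large-task population with perturbed copies of the
# small-task winner instead of random architectures, then keep evolving.
large_population = [mutate(seed) for _ in range(16)]
best = evolutionary_search(large_population, evaluate_on_large)
print(best)
```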

  • The authors use a population quality function (Population Quality) to better evaluate populations of models during the evolutionary process (a sketch follows this list).
  • In addition, in the second stage the authors use an offspring architecture generator to produce new architectures. (It mainly defines new transformation functions that add perturbations to the input architecture, so that the resulting architectures can be more lightweight and more homogeneous; see the second sketch below.)
  • To optimize accuracy and model size at the same time, the authors adopt Pareto optimization, a method for finding the optimal solution set of a multi-objective optimization problem (see the last sketch below).
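A minimal sketch of the population-quality idea, assuming (this is a guess, not the paper's exact formula) that it rewards a population with high mean fitness and low spread:

```python
import statistics

def population_quality(fitnesses):
    """Hypothetical population-quality score: favor populations whose
    models are both accurate on average and consistently so. The paper
    defines its own formula; this only illustrates scoring a population
    rather than a single model."""
    return statistics.mean(fitnesses) - statistics.pstdev(fitnesses)

print(population_quality([0.90, 0.91, 0.92]))  # tight, strong population
print(population_quality([0.70, 0.91, 0.95]))  # lower mean, more spread
```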
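The offspring architecture generator could be sketched as a perturbation operator; the shrink bias below is a hypothetical reading of "more lightweight," not the paper's actual transformation functions:

```python
import random

WIDTHS = (16, 32, 64)

def generate_offspring(arch, shrink_prob=0.3):
    """Hypothetical offspring generator over (width, depth) genes:
    perturb one layer, with a bias toward stepping its width down
    so that offspring tend to stay lightweight."""
    arch = list(arch)
    i = random.randrange(len(arch))
    w, d = arch[i]
    if random.random() < shrink_prob and w > WIDTHS[0]:
        w = WIDTHS[WIDTHS.index(w) - 1]   # step the width down one notch
    else:
        d = random.choice([1, 2, 3])      # otherwise re-sample the depth
    arch[i] = (w, d)
    return arch

print(generate_offspring([(64, 3), (32, 2), (16, 1)]))
```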
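Finally, a small self-contained example of the Pareto idea for the two objectives here (maximize accuracy, minimize parameter count); dominated models are filtered out and the survivors form the Pareto front:

```python
def pareto_front(models):
    """Return the (accuracy, n_params) pairs not dominated by any other:
    a model is dominated if some other model is at least as accurate
    and at least as small, and differs in at least one objective."""
    return [(acc, size) for acc, size in models
            if not any(a >= acc and s <= size and (a, s) != (acc, size)
                       for a, s in models)]

models = [(0.91, 5.2e6), (0.93, 7.8e6), (0.90, 9.0e6), (0.93, 6.1e6)]
# (0.90, 9.0e6) and (0.93, 7.8e6) are dominated and dropped.
print(pareto_front(models))  # [(0.91, 5200000.0), (0.93, 6100000.0)]
```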

Origin blog.csdn.net/xys430381_1/article/details/104346568