ENAS Algorithm Explained (Popular Version)

The full name of ENAS is Efficient Neural Architecture Search.

Conventional NAS algorithms face a severe computational bottleneck. ENAS addresses it mainly through weight sharing, which we will see below.

Link to the paper: https://arxiv.org/abs/1802.03268

The paper itself is fairly hard to follow, so I decided to write a popular explanation to make ENAS easier to understand.

The idea is actually quite simple once stated; take a look at the following flowchart:

Note: the sub-networks in the figure below were drawn arbitrarily and may not form valid networks.

What NAS needs to do is search over candidate sub-networks and pick the one that performs best.

So how do you choose a sub-network?

Suppose we have N candidate sub-networks: Child net1, Child net2, and so on.

We load a pre-set (shared) pool of weights into these architectures, which gives us N deep networks capable of inference, and then evaluate all N networks on the dataset to obtain their losses.

We then choose the structure with the best performance and train it. When training finishes, we obtain new weight values, which are handed back to the shared pool for the next round, and so the process iterates.
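
To make the loop concrete, here is a minimal, self-contained toy sketch of the sample → evaluate → train-the-best cycle described above. Everything in it (the op names, the fake loss, the fake training step) is an illustrative stand-in of my own, not the actual ENAS or NNI code:

```python
import random

random.seed(0)

# Toy stand-ins so the sketch runs end to end. In real ENAS these would be
# neural networks; every name here is illustrative, not NNI's or the paper's API.
search_space = [["conv3x3", "conv5x5", "maxpool"]] * 4       # 4 layers, 3 ops each
shared_weights = [{op: random.random() for op in ops} for ops in search_space]

def sample_architecture():
    # An "architecture" is simply one chosen op per layer.
    return tuple(random.choice(ops) for ops in search_space)

def evaluate(arch):
    # Pretend loss: sum of the shared weights of the chosen ops (inference only).
    return sum(shared_weights[i][op] for i, op in enumerate(arch))

def train_one_round(arch):
    # Pretend training: lower the shared weights this architecture used.
    for i, op in enumerate(arch):
        shared_weights[i][op] -= 0.01

best_arch = None
for round_idx in range(50):
    # 1. Propose N candidate sub-networks (Child net1, Child net2, ...).
    candidates = [sample_architecture() for _ in range(10)]
    # 2. Evaluate every candidate with the SAME shared weights -- no training yet.
    losses = [evaluate(arch) for arch in candidates]
    # 3. Train only the best candidate; its updated weights stay in the shared pool.
    best_arch = candidates[losses.index(min(losses))]
    train_one_round(best_arch)

print("best architecture found:", best_arch)
```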

One thing to note: each iteration uses a different set of candidate sub-networks (Child net1, Child net2, ...). Deciding which candidates to propose next is precisely the direction of optimization, and many optimization methods exist. The paper uses reinforcement learning, but reinforcement learning is not mandatory; other optimization methods work too.
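
As a rough illustration of the reinforcement-learning option: below is a tiny REINFORCE-style controller of my own that keeps one categorical distribution per layer and nudges it toward architectures with higher reward. The reward function is a made-up stand-in, not a real validation accuracy:

```python
import numpy as np

rng = np.random.default_rng(0)
num_layers, num_ops = 4, 3
logits = np.zeros((num_layers, num_ops))          # the controller's parameters

def sample_arch():
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    arch = np.array([rng.choice(num_ops, p=p) for p in probs])
    return arch, probs

def reward(arch):
    # Made-up stand-in for "validation accuracy of the sampled child network";
    # here op 0 is secretly the best choice in every layer.
    return float(np.mean(arch == 0))

baseline, lr = 0.0, 0.5
for step in range(300):
    arch, probs = sample_arch()
    r = reward(arch)
    baseline = 0.9 * baseline + 0.1 * r           # moving-average baseline
    # REINFORCE: raise the log-probability of the sampled ops in proportion
    # to how much better than the baseline this sample scored.
    for layer, op in enumerate(arch):
        grad = -probs[layer]
        grad[op] += 1.0                           # d log p(op) / d logits
        logits[layer] += lr * (r - baseline) * grad

print("preferred op per layer:", logits.argmax(axis=1))   # converges toward op 0
```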

As for why the layers of a deep network are recast as nodes: you can think of the search space as a structure from graph theory, so that a single set of weights can be shared across different network structures. That way we only need to train once, and the weights can be loaded into different structures for inference.
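
Here is a small sketch of that idea (dimensions and node count are arbitrary toy values of mine): each node of the search graph owns one shared weight matrix, and any structure that includes a given node reuses that node's matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# One shared weight matrix per NODE of the search graph. A sub-network is a
# path through the graph, and every structure that includes node k reuses
# the same matrix shared[k].
shared = {k: rng.normal(size=(8, 8)) for k in range(5)}

def forward(arch, x):
    for node in arch:                  # chain the chosen nodes' weights
        x = np.tanh(shared[node] @ x)
    return x

x = rng.normal(size=8)
print(forward([0, 2, 4], x)[:3])       # structure A
print(forward([1, 2, 3], x)[:3])       # structure B reuses node 2's weights
```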

This is where the advantage of weight sharing comes in. By conventional thinking, we would have to train every single sub-network from scratch and then compare them. With sharing, we only train once per round and then load the weight values into the different network structures.

The amount of computation drops dramatically: NAS runs that previously required hundreds of GPUs can now be trained on a single GPU.

Why use reinforcement learning for optimization instead of traversing and enumerating every possibility? The answer is that there are far too many possibilities; only an optimization method can find a good architecture in reasonable time.
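
To get a feel for the scale, here is my own back-of-the-envelope count following the macro search space described in the paper: 12 layers, 6 candidate operations per layer, and any earlier layer free to feed a skip connection into a later one.

```python
layers, ops = 12, 6
skip_pairs = layers * (layers - 1) // 2            # possible skip connections
num_architectures = ops ** layers * 2 ** skip_pairs
print(f"{num_architectures:.1e}")                  # ~1.6e+29 candidate networks
```

On the order of 10^29 candidate networks: hopelessly beyond enumeration.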

Concretely, let's look at the results of training with NNI:

This is the effect of reinforcement learning optimization:

We can see that as the loss shrinks the accuracy grows, which is exactly reinforcement learning's reward-and-punishment mechanism at work: negative scores indicate the sampled architecture performs poorly, while positive scores indicate it is doing well and earn it a reward.
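
A small illustration of why the scores can be negative or positive. The accuracy numbers below are made up, and I am assuming a common scheme in which the reward is the accuracy minus a moving-average baseline:

```python
# Hypothetical validation accuracies of sampled child networks over time.
accuracies = [0.52, 0.61, 0.58, 0.70, 0.55, 0.74]

baseline = accuracies[0]               # moving-average baseline
for acc in accuracies:
    reward = acc - baseline            # worse than average => negative score
    baseline = 0.9 * baseline + 0.1 * acc
    print(f"accuracy={acc:.2f}  reward={reward:+.3f}")
```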

Now let's look at the training behavior after obtaining the optimal structure:

As you can see, it is just ordinary DNN training.

One more question remains: on the very first round of inference, no training has happened yet, so where do the weights for inference come from? The answer is weight initialization, specifically the Kaiming initialization method:
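
For concreteness, this is what Kaiming initialization looks like in PyTorch; the layer and its dimensions are arbitrary examples of mine:

```python
import torch.nn as nn

# Kaiming initialization gives the shared weights sensible values,
# so the very first round of inference has something to work with.
layer = nn.Linear(256, 128)
nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
nn.init.zeros_(layer.bias)
```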

At this point, we have a general understanding of ENAS.


Original article: blog.csdn.net/zhou_438/article/details/114173951