DenseNet: Narrowing the Network

Author:

The idea for DenseNet stems largely from a paper we published at ECCV last year, Deep Networks with Stochastic Depth. There we proposed a Dropout-like method for improving ResNet: we found that randomly "dropping" some layers at each training step significantly improves ResNet's generalization performance. The success of this method gave us at least two insights:

First, it shows that a neural network does not have to be a strictly sequential hierarchy; that is, a layer need not depend only on the features of the immediately preceding layer, but can also depend on features learned much earlier in the network. Imagine a stochastic-depth network: when the l-th layer is dropped, the (l+1)-th layer is directly connected to the (l-1)-th layer; when all layers from the 2nd to the l-th are dropped, the (l+1)-th layer directly uses the features of the 1st layer. A stochastic-depth network can therefore be regarded as a DenseNet with random dense connections.
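To make this concrete, here is a minimal sketch of the stochastic-depth idea in PyTorch. It is only illustrative, not the original implementation; the block structure and the `survival_prob` parameter are assumptions for the example.

```python
import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    """Residual block that is randomly skipped during training (illustrative sketch)."""

    def __init__(self, channels, survival_prob=0.8):
        super().__init__()
        self.survival_prob = survival_prob
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        if self.training and torch.rand(1).item() > self.survival_prob:
            # Block is "dropped": only the identity shortcut remains, so the
            # next layer directly receives the previous layer's features.
            return x
        out = self.conv(x)
        if not self.training:
            # At test time, scale the residual by its survival probability.
            out = out * self.survival_prob
        return torch.relu(x + out)
```

When a block is dropped, the shortcut effectively wires layer l+1 straight to layer l-1, which is exactly the "random dense connection" view described above.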

Second, the fact that we can randomly drop many layers during training without harming convergence shows that ResNet has considerable redundancy: each layer extracts only a small number of features (the so-called residual). In fact, if we randomly remove a few layers from a trained ResNet, the network's predictions are barely affected. Since each layer learns so few features, can we reduce its width, and thus its computation, to cut this redundancy?

The design of DenseNet is based on these two observations. We connect every layer in the network directly to all of its preceding layers to enable feature reuse; at the same time, we make each layer especially "narrow", i.e., it learns only a very small number of feature maps (in the most extreme case, a single feature map per layer), to reduce redundancy. These two points are also the main differences between DenseNet and other networks. It should be emphasized that the first point is the prerequisite for the second: without dense connections, we could not make the network this narrow, otherwise training would underfit, even for ResNet.
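The following is a minimal sketch of a dense block that combines both points: every layer takes the concatenation of all preceding feature maps as input, and each layer produces only `growth_rate` new feature maps. Layer structure and parameter names are illustrative assumptions, not the exact reference implementation.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Minimal dense block: dense connectivity plus narrow layers (illustrative sketch)."""

    def __init__(self, in_channels, growth_rate=12, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            # Each layer sees all previously produced feature maps as input
            # and adds only `growth_rate` new ones.
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate, 3, padding=1),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Feature reuse: concatenate everything computed so far.
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)
```

Because every layer can read all earlier features directly, it only needs to contribute a few new feature maps, which is what allows the small growth rate without underfitting.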
