What are the differences between DenseNet and ResNet

| # | Aspect | ResNet | DenseNet |
|---|--------|--------|----------|
| 1 | Contribution | Proposes residual learning to solve the degradation problem of deepening networks | Proposes dense shortcuts to alleviate vanishing gradients, strengthen feature propagation, enable feature reuse, and greatly reduce the number of parameters |
| 2 | Input of the current layer | $\sum_{i=0}^{l} X_i$ (element-wise addition of the outputs of all previous layers) | $\mathrm{Concat}(X_0, X_1, \dots, X_l)$ (concatenation along the channel dimension) |
| 3 | Training speed | Faster | Slower: as $l$ grows, the number of input channels of each convolutional layer keeps increasing and is usually larger than the number of output channels of $H_l$, and the concatenation also brings higher memory consumption |
| 4 | Number of parameters | More parameters at the same depth | Fewer, because every convolution block in the network outputs a fixed, small number of channels (e.g. a growth rate of 32); see the parameter-count sketch after this table |
| 5 | Feature information transfer | Summation damages the flow of feature information to some extent | Concatenation gives the later layers richer inputs, strengthens the flow of feature information, and enables feature reuse |
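
Row 4 can be checked quickly against off-the-shelf model definitions. The sketch below is my own illustration and assumes torchvision ≥ 0.13 (for the `weights=None` argument); note that DenseNet-121 is even deeper than ResNet-50 and still ends up with far fewer parameters.

```python
import torch
from torchvision import models

def count_params(model: torch.nn.Module) -> int:
    """Total number of learnable parameters."""
    return sum(p.numel() for p in model.parameters())

# No pretrained weights are downloaded; only the architectures are built.
resnet = models.resnet50(weights=None)        # 50 layers
densenet = models.densenet121(weights=None)   # 121 layers, growth rate 32

print(f"ResNet-50:    {count_params(resnet) / 1e6:.1f}M parameters")   # roughly 25.6M
print(f"DenseNet-121: {count_params(densenet) / 1e6:.1f}M parameters") # roughly 8.0M
```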

Difference 2 Explained

Refer to "ResNet or DenseNet? Introducing Dense Shortcuts to ResNet"
For a standard convolution:

$$f_l = H_l * f_{l-1}$$

where $f_{l-1}$ is the feature map output by the previous convolution module and $H_l$ denotes the $l$-th convolution block.

For DenseNet: $f_{l-1} = Y_l = \mathrm{Concat}(X_0, X_1, \dots, X_l)$

For ResNet: $f_{l-1} = Y_l = X_0 + X_1 + \dots + X_l$

This is difference 2: ResNet processes the element-wise sum of the outputs of all previous layers, while DenseNet processes the channel-wise concatenation of the outputs of all previous layers.
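
The two aggregation rules are easy to write down directly. Below is a minimal PyTorch sketch of my own (module layout, channel sizes, and names are illustrative, not taken from either paper): a residual-style block adds its branch output back onto its input, while a dense-style layer concatenates along the channel dimension, so its output width grows by the growth rate.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Residual-style aggregation: output = H(x) + x, channel count unchanged."""
    def __init__(self, channels: int):
        super().__init__()
        self.h = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.h(x) + x                      # element-wise addition

class DenseLayer(nn.Module):
    """Dense-style aggregation: output = Concat(x, H(x)), channels grow by k."""
    def __init__(self, in_channels: int, growth_rate: int = 32):
        super().__init__()
        self.h = nn.Sequential(
            nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1),
            nn.BatchNorm2d(growth_rate),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return torch.cat([x, self.h(x)], dim=1)   # channel-wise concatenation

x = torch.randn(1, 64, 32, 32)
print(ResidualBlock(64)(x).shape)   # torch.Size([1, 64, 32, 32])
print(DenseLayer(64, 32)(x).shape)  # torch.Size([1, 96, 32, 32])
```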

Difference 3 Explained

Compared with a ResNet of the same depth, DenseNet trains more slowly. As $l$ increases, the number of input channels of each convolutional layer keeps growing and is usually larger than the number of output channels of $H_l$; reading feature maps with that many channels also costs a lot of I/O time at every step, so overall DenseNet's training speed is lower. In addition, storing the concatenated feature maps raises the peak memory usage, which makes DenseNet less suitable for memory-economical scenarios.
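
To make the channel growth concrete, here is a small back-of-the-envelope calculation (my own illustration; the values $k_0 = 64$, $k = 32$ and 6 layers match the first dense block of DenseNet-121, but any values would show the same trend):

```python
# Inside one dense block, the l-th layer receives k0 + l * k input channels,
# where k0 is the block's input width and k is the growth rate, yet it only
# ever emits k channels of its own.
k0, k, num_layers = 64, 32, 6

for l in range(num_layers):
    in_channels = k0 + l * k
    print(f"layer {l}: input channels = {in_channels}, output channels = {k}")
# layer 0 reads 64 channels, layer 5 already reads 224, and the whole block
# hands 64 + 6 * 32 = 256 channels to the next stage.
```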

Difference 5 Explained

The input and output distributions of each convolution block are different, and the addition operation disturbs the distribution of the convolution output. Since that output distribution reflects, to some extent, the features learned by the current block, summation is said to damage the transfer of the feature information flow through the network.
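
A toy numerical example (my own illustration, not from either paper) makes the point: after addition the individual values of the two feature streams can no longer be recovered, whereas concatenation keeps both exactly as they were, which is what enables feature reuse.

```python
import torch

torch.manual_seed(0)
a = torch.randn(10_000)              # features from an earlier layer (mean ~0, std ~1)
b = torch.randn(10_000) * 3.0 + 5.0  # features from the current layer (mean ~5, std ~3)

summed = a + b                # residual-style aggregation
concat = torch.cat([a, b])    # dense-style aggregation

# The sum follows a new distribution (mean ~5, std ~sqrt(1 + 9) ≈ 3.2), and the
# original values of a and b are irrecoverably mixed together...
print(summed.mean().item(), summed.std().item())

# ...while concatenation preserves both streams bit-for-bit.
print(torch.equal(concat[:10_000], a), torch.equal(concat[10_000:], b))  # True True
```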

Supplement:

In the paper "RepVGG: Making VGG-style ConvNets Great Again" I also came across an interesting angle: both DenseNet and ResNet are multi-branch architectures.
ResNet is effectively two-branch, because the output flow of each convolution module splits into two paths, so a ResNet with $n$ convolutional blocks can be seen as an ensemble of $2^n$ shallow models. Each convolution module of DenseNet has even more output branches and a more complex topology, which amounts to an ensemble of many more shallow models. The features learned by DenseNet therefore carry richer information, so it is natural that, compared with a ResNet of the same depth, it performs better on classification and detection.
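
A small path-counting sketch (my own illustration of the ensemble view, not code from the RepVGG paper): each residual block offers two routes, the identity skip or the convolutional branch, so unrolling $n$ blocks enumerates $2^n$ shallow paths.

```python
from itertools import product

n = 3  # number of residual blocks in this toy example
routes = list(product(["skip", "conv"], repeat=n))

print(len(routes))              # 2**3 = 8 shallow paths
for route in routes:
    print(" -> ".join(route))   # one implicit shallow model per route
```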

The differences above were summarized while reading the papers; if anything is inaccurate, please point it out.

Reference links

For detailed explanations of ResNet, DenseNet, and the vanishing/exploding gradient problem, see the following blog posts:

https://zhuanlan.zhihu.com/p/31852747

https://zhuanlan.zhihu.com/p/82709638

https://blog.csdn.net/qq_25737169/article/details/78847691
