DenseNet Explained Simply


Preface

In the blog "ResNet and Detailed Analysis", we talked about the flow of information between the different layers ResNet implicit in the "and" in, so from the perspective of the flow of information is not complete, compared to ResNet, DenseNet biggest differences that the element-wise addition is not required to feature map, but by the concatenation feature map stitched together, the convolution DenseNet layer in front of each step of the convolution know what happened.

Crucially, in contrast to ResNets, we never combine features through summation before they are passed into a layer; instead, we combine features by concatenating them.
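To make the contrast concrete, here is a minimal sketch of the two ways of combining feature maps. It uses PyTorch, which the original post does not mention, and the tensor shapes are made up for illustration:

```python
import torch

# Two feature maps with the same spatial size: 64 channels each, 32x32.
a = torch.randn(1, 64, 32, 32)
b = torch.randn(1, 64, 32, 32)

# ResNet-style combination: element-wise addition, channel count unchanged,
# the two sources are fused and can no longer be told apart downstream.
res = a + b
print(res.shape)      # torch.Size([1, 64, 32, 32])

# DenseNet-style combination: concatenation along the channel dimension,
# so later layers still see a and b as separate feature maps.
dense = torch.cat([a, b], dim=1)
print(dense.shape)    # torch.Size([1, 128, 32, 32])
```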

Similar in overall structure to ResNet, a DenseNet is formed by connecting multiple Dense Blocks in series, as shown in the figure below.

(Figure from the DenseNet paper, https://arxiv.org/abs/1608.06993)

Dense Block and Transition Layer

Within each Dense Block, every convolutional layer knows the feature maps output by all the layers before it, because its input is the concatenation of the feature maps output by all preceding layers. Seen from the other direction, the feature map produced by each convolutional layer is passed on as input to every convolutional layer after it. Saying "each convolutional layer" here is not quite accurate; it is more precise to say "each group of convolutions". As we will see below, each group consists of a \(1\times 1\) convolutional layer stacked with a \(3\times 3\) convolutional layer, i.e. a bottleneck structure.

To ensure maximum information flow between layers in the network, we connect all layers (with matching feature-map sizes) directly with each other. To preserve the feed-forward nature, each layer obtains additional inputs from all preceding layers and passes on its own feature-maps to all subsequent layers.

Let's look at an example of a Dense Block,

(Figure from the DenseNet paper, https://arxiv.org/abs/1608.06993)

In the figure, \(x\) denotes a feature map; in particular, \(x_0\) is the network input. \(H\) denotes a group of convolutions which, as in Identity Mappings in Deep Residual Networks, uses the pre-activation scheme, i.e. a bottleneck structure of BN-ReLU-\(1\times 1\)Conv-BN-ReLU-\(3\times 3\)Conv. \(x_i\) is the feature map output by \(H_i\), and the input to \(H_i\) is the concatenation of \([x_0, x_1, \dots, x_{i-1}]\). If we define the number of channels output by each \(H\) as the growth rate \(k = 4\), then the input feature map of \(H_i\) has \(k_0 + k\times (i-1)\) channels, where \(k_0\) is the number of channels of \(x_0\). So the further back an \(H\) is, the more input channels it sees; to keep the computational cost under control, the number of output channels of the \(1\times 1\) convolution in the bottleneck is fixed at \(4k\). All Dense Blocks in a DenseNet use the same growth rate.
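As a concrete illustration, below is a minimal PyTorch sketch, my own reconstruction rather than the authors' code, of one \(H\) (the pre-activation bottleneck) and of a Dense Block that keeps concatenating outputs; the class names `BottleneckLayer` and `DenseBlock` are made up here, and the channel counts follow the \(k_0 + k\times(i-1)\) rule described above:

```python
import torch
import torch.nn as nn

class BottleneckLayer(nn.Module):
    """One H: pre-activation BN-ReLU-1x1 Conv (to 4k channels), then BN-ReLU-3x3 Conv (to k channels)."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv1 = nn.Conv2d(in_channels, 4 * growth_rate, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(4 * growth_rate)
        self.conv2 = nn.Conv2d(4 * growth_rate, growth_rate, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        return out

class DenseBlock(nn.Module):
    """Each layer receives the concatenation of x0 and all previous layers' outputs."""
    def __init__(self, num_layers, in_channels, growth_rate):
        super().__init__()
        self.layers = nn.ModuleList(
            BottleneckLayer(in_channels + i * growth_rate, growth_rate)  # k0 + k*i input channels
            for i in range(num_layers)
        )

    def forward(self, x0):
        features = [x0]
        for layer in self.layers:
            x_i = layer(torch.cat(features, dim=1))   # concatenate everything seen so far
            features.append(x_i)
        return torch.cat(features, dim=1)

block = DenseBlock(num_layers=4, in_channels=64, growth_rate=4)   # k0 = 64, k = 4 as in the figure
y = block(torch.randn(1, 64, 32, 32))
print(y.shape)   # torch.Size([1, 80, 32, 32]): 64 + 4*4 channels
```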

Adjacent Dense Blocks are connected by a Transition Layer, which consists of a \(1\times 1\) convolution and a \(2\times 2\) average pooling: the former halves the number of channels of the input feature map, and the latter halves its spatial size.

As can be seen, both the bottleneck and the Transition Layer exist to improve computational efficiency and reduce the number of parameters.
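A minimal PyTorch sketch of the Transition Layer described above; the class name `TransitionLayer` is mine, and the BN-ReLU before the \(1\times 1\) convolution follows common implementations of the paper:

```python
import torch
import torch.nn as nn

class TransitionLayer(nn.Module):
    """Halve the channel count with a 1x1 conv, then halve the spatial size with 2x2 average pooling."""
    def __init__(self, in_channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, in_channels // 2, kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.conv(torch.relu(self.bn(x))))

t = TransitionLayer(80)
print(t(torch.randn(1, 80, 32, 32)).shape)   # torch.Size([1, 40, 16, 16])
```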

DenseNet Architecture and Performance

The DenseNet architectures used for ImageNet are shown below; with the introduction above, they should not be hard to follow.

(Architecture table from the DenseNet paper, https://arxiv.org/abs/1608.06993)
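If you only want to experiment with the architecture, torchvision ships a reference implementation; this is just a usage sketch under the assumption that torchvision is installed:

```python
import torch
from torchvision.models import densenet121

model = densenet121()                 # 4 dense blocks of 6, 12, 24, 16 layers, growth rate k = 32
x = torch.randn(1, 3, 224, 224)
print(model(x).shape)                 # torch.Size([1, 1000])
```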

DenseNet has high parameter efficiency: it achieves performance comparable to ResNet with far fewer parameters and much less computation, as shown in the figure below.

(Parameter-efficiency comparison from the DenseNet paper, https://arxiv.org/abs/1608.06993)
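A quick way to see the parameter gap for yourself, again assuming torchvision is available (the approximate counts in the comments come from the reference implementations):

```python
from torchvision.models import densenet121, resnet50

def count_params(model):
    # Total number of trainable and non-trainable parameters.
    return sum(p.numel() for p in model.parameters())

print(count_params(densenet121()))   # ~8.0M parameters
print(count_params(resnet50()))      # ~25.6M parameters, for comparison
```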

Understanding DenseNet

The final output of DenseNet is the concatenation of the outputs of all preceding layers. During backpropagation, this connectivity lets the final loss flow directly back to each earlier hidden layer, acting as a kind of Implicit Deep Supervision that pushes every hidden layer to learn more discriminative features.
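Here is a toy two-layer sketch, made up for this post rather than taken from the paper, showing that concatenation keeps a direct path from the loss back to an early representation:

```python
import torch

x0 = torch.randn(4, 8, requires_grad=True)             # "early" hidden representation
w1, w2 = torch.randn(8, 8), torch.randn(16, 8)
h1 = torch.relu(x0 @ w1)
h2 = torch.relu(torch.cat([x0, h1], dim=1) @ w2)        # h2 sees x0 directly, not only through h1
loss = torch.cat([x0, h1, h2], dim=1).sum()             # the "final output" concatenates everything
loss.backward()
print(x0.grad.abs().mean())   # x0 receives gradient straight from the loss, not only via h1 and h2
```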

DenseNet's way of using feature maps can be viewed as a kind of multi-scale feature fusion, which the paper calls feature reuse. It can also be seen as a kind of "deferred decision": the information from all earlier stages is gathered before the current layer decides what to do. The paper visualizes how strongly each layer within a block depends on the layers before it,

For each convolutional layer ℓ within a block, we compute the average (absolute) weight assigned to connections with layer s. Figure 5 shows a heat-map for all three dense blocks. The average absolute weight serves as a surrogate for the dependency of a convolutional layer on its preceding layers.

(Figure 5 from the DenseNet paper, https://arxiv.org/abs/1608.06993)

The figure shows how much each layer in each Dense Block depends on the layers before it; the closer to red, the stronger the dependency. We can see that,

  • Within a Dense Block, each layer uses (depends on) the preceding feature maps to different degrees, which acts like a kind of "attention".
  • The Transition Layers and the final Classification Layer depend relatively heavily on the higher-level features produced by the layers just before them, and this tendency becomes more pronounced the deeper we go.

Plain Net, ResNet and DenseNet

Here is a perhaps imperfect analogy to compare Plain Net, ResNet, and DenseNet.

Imagine the behavior of the network as painting a picture: we know what the final picture should look like, but it has to pass through the hands of N people, each with limited painting ability, and each person hands their unfinished work to the next.

  • Plain Net: each person can only look at the unfinished painting handed over by the person right before them and repaint the picture themselves; despite their limited ability, they have to paint the whole thing.

  • ResNet: each person paints on top of the unfinished painting left by the person before them; they focus on the difference between the current painting and the final one, but they may still not get it right.

  • DenseNet: the current painter can see the paintings of everyone before them, and also knows the order in which they were painted and which painters were relatively better and more reliable; referring to all the earlier paintings, they paint a new one of their own and hand it, together with all the earlier paintings, to the next person.

It is not hard to see that ResNet and DenseNet emphasize different things, but both are likely to paint better than Plain Net.

So, if we combined the strengths of ResNet and DenseNet, would the result be painted even better?

That's all.

References

  • Densely Connected Convolutional Networks, https://arxiv.org/abs/1608.06993
  • Identity Mappings in Deep Residual Networks, https://arxiv.org/abs/1603.05027


Original post: https://www.cnblogs.com/shine-lee/p/12380510.html