Network in Network (2013), 1x1 Convolution and Global Average Pooling

Blog: blog.shinelee.me | cnblogs | CSDN

Preface

"Network in Network" referred to NIN, from Yen Shui into teacher teams, first published in time arxiv of December 2013, references to the amount of 20,190,921 2871 (google scholar).

citations

NIN's network structure is still a modification of AlexNet; its main innovations are as follows:

  • Proposed the mlpconv layer: the mlpconv layer uses a small multilayer fully connected neural network (multilayer perceptron, MLP), a "micro network", in place of the convolution operation; the weights of the micro network are shared across all local patches of the input feature map. A convolution can be seen as a linear transformation, while the micro network can fit more complex transformations, which amounts to enhancing the expressive power of the conv layer. Stacking multiple mlpconv layers forms the whole network, which is also the origin of the name Network in Network.
  • Proposed Global Average Pooling (GAP): NIN no longer uses fully connected layers. The last mlpconv layer outputs one feature map per category; GAP takes the mean over each feature map, and the results are fed directly into softmax to obtain the probability of each category. Besides reducing the number of parameters, GAP guides the network to learn feature maps that act as confidence maps for the corresponding categories.
  • \(1 \times 1\) convolution: the \(1 \times 1\) convolution is used for the first time in the mlpconv layer. A \(1 \times 1\) convolution can flexibly adjust the number of channels of a feature map without changing its spatial size, and it widely influenced the design of later networks, such as the Inception series.

This post introduces the innovations above in order, interleaving the relationship between fully connected layers and convolution and between fully connected layers and GAP, and finally presents NIN's network structure.

Implementing the mlpconv layer

mlpconv

The paper emphasizes that the mlpconv layer replaces the convolution with a small fully connected neural network; a schematic comparison between a convolution layer and an mlpconv layer is shown below,

Comparison of linear convolution layer and mlpconv layer

For a convolution layer, assume there are N kernels, each of size \(k \times k\). Each \(k \times k\) local receptive field / local patch is linearly mapped to N outputs, and aggregating the convolution results over all local patches yields N feature maps.

For an mlpconv layer, the convolution is replaced by a micro network, which nonlinearly maps each \(k \times k\) local patch to N outputs; aggregating over all patches again yields N feature maps. The paper describes the micro network as a small fully connected neural network, yet in the implementation this fully connected network is built from several convolution layers. Why? Because a fully connected layer can be converted into a convolution.
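To make this concrete, here is a minimal sketch in mxnet/gluon (the shapes, 3 input channels, a 5x5 patch and 4 outputs, are arbitrary choices for illustration, not values from the paper): a Dense layer applied to a flattened \(k \times k\) patch produces the same result as a \(k \times k\) convolution carrying the same weights.

from mxnet import nd
from mxnet.gluon import nn

# one 5x5 local patch with 3 input channels (arbitrary shapes for illustration)
x = nd.random.uniform(shape=(1, 3, 5, 5))

# a k x k convolution with 4 output channels collapses the patch into 4 numbers
conv = nn.Conv2D(channels=4, kernel_size=5)
conv.initialize()
y_conv = conv(x)                      # shape: (1, 4, 1, 1)

# a fully connected layer with the same weights, applied to the flattened patch
dense = nn.Dense(4, in_units=3 * 5 * 5)
dense.initialize()
dense.weight.set_data(conv.weight.data().reshape((4, -1)))
dense.bias.set_data(conv.bias.data())
y_dense = dense(x.reshape((1, -1)))   # shape: (1, 4)

# the two outputs agree up to floating point error
print(nd.abs(y_conv.reshape((1, 4)) - y_dense).max())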

The following is the mxnet implementation of a NIN block (mlpconv layer) from "Dive into Deep Learning",

NIN block

from mxnet import gluon, nd
from mxnet.gluon import nn

def nin_block(num_channels, kernel_size, strides, padding):
    # one "ordinary" convolution followed by two 1x1 convolutions (the micro network)
    blk = nn.Sequential()
    blk.add(nn.Conv2D(num_channels, kernel_size, strides, padding, activation='relu'),
            # the two 1x1 convolutions act as fully connected layers shared across all positions
            nn.Conv2D(num_channels, kernel_size=1, activation='relu'),
            nn.Conv2D(num_channels, kernel_size=1, activation='relu'))
    return blk

A NIN block is a stack of one convolution layer and two \(1 \times 1\) convolution layers, and all three convolution layers have the same number of output channels. For the first convolution layer, because kernel_size equals the size of the local patch, for each local patch this convolution is equivalent to a fully connected layer with num_channels outputs; the weights connecting a local patch to each output are exactly the corresponding convolution kernel, and there are num_channels kernels. For the two following \(1 \times 1\) convolution layers, the input at each position is a num_channels-dimensional vector, i.e. num_channels feature maps of size \(1 \times 1\); here kernel_size equals the size of this entire feature map, so the \(1 \times 1\) convolution is likewise equivalent to a fully connected layer. The \(1 \times 1\) convolutions thus realize information exchange among the outputs of different convolution kernels.
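As a quick shape check (reusing nin_block and nd from the snippet above; the 224x224 input and the 96/11/4/0 hyperparameters are assumptions in the style of an AlexNet-like first stage, not values given in the paper), only the first convolution changes the spatial size:

blk = nin_block(96, kernel_size=11, strides=4, padding=0)
blk.initialize()
X = nd.random.uniform(shape=(1, 1, 224, 224))
print(blk(X).shape)   # (1, 96, 54, 54): the two 1x1 convolutions leave H and W unchanged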

In fact, by adjusting the number of \(1 \times 1\) convolution kernels, one can flexibly increase or decrease the number of channels of a feature map without changing its spatial size, introducing more nonlinearity and stronger expressive power; while enabling information exchange among feature maps, this yields a compressed or expanded representation of the information.
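A minimal sketch of this channel adjustment (the concrete numbers 64 -> 16 -> 128 are made up for illustration): the \(1 \times 1\) convolutions touch only the channel dimension,

x = nd.random.uniform(shape=(1, 64, 28, 28))                          # 64 input feature maps

squeeze = nn.Conv2D(channels=16, kernel_size=1, activation='relu')    # compress channels
expand = nn.Conv2D(channels=128, kernel_size=1, activation='relu')    # expand channels
squeeze.initialize()
expand.initialize()

print(squeeze(x).shape)            # (1, 16, 28, 28)
print(expand(squeeze(x)).shape)    # (1, 128, 28, 28): spatial size unchanged throughout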

Global Average Pooling

The classic design of convolutional neural networks is several convolution layers followed by a few fully connected layers; the typical view treats the convolution layers in front as a feature extractor and the fully connected layers as a classifier. Convolution layers are computation-heavy but have few parameters, whereas fully connected layers require little computation but have many parameters, and one view holds that the large number of parameters in the fully connected layers causes overfitting. The authors propose Global Average Pooling (GAP) to replace the fully connected layers: the last mlpconv layer outputs as many feature maps as there are categories, each feature map is averaged, and the GAP results are fed directly into softmax to obtain the probability of each category. The comparison between fully connected layers and GAP is shown in the figure below, taken from Review: NIN — Network In Network (Image Classification).

FC vs GAP

With the fully connected layers removed, GAP forces each feature map to be associated with a category; softmax merely normalizes the scores, so the output of GAP can be seen as some measure of similarity to each category. The feature maps fed into GAP can be interpreted as confidence maps for each category, with each position holding some similarity to that category, and the GAP operation can be viewed as taking the expectation of the whole-image confidence for each category. Because there are only convolution layers, spatial information is well preserved and interpretability increases; without fully connected layers, the number of parameters drops, which reduces overfitting to some extent.
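A minimal sketch of this GAP classification head (10 classes, the 3x3 kernel and the dummy input shape are assumptions for illustration; softmax is applied explicitly with nd.softmax):

num_classes = 10
head = nn.Sequential()
head.add(nin_block(num_classes, kernel_size=3, strides=1, padding=1),  # one feature map per class
         nn.GlobalAvgPool2D(),   # GAP: average each feature map over H x W
         nn.Flatten())           # (batch, num_classes, 1, 1) -> (batch, num_classes)
head.initialize()

feature_maps = nd.random.uniform(shape=(1, 384, 5, 5))   # dummy output of the previous stage
scores = head(feature_maps)
probs = nd.softmax(scores)       # per-class probabilities, shape (1, 10)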

The feature maps output by the last mlpconv layer are shown below; the feature map corresponding to the image's label responds most strongly, and the strong responses largely fall on the location of the object.

Visualization of the feature maps from the last mlpconv layer

In addition, the authors compare GAP with fully connected layers and with fully connected layers + dropout; the test results on CIFAR-10 are as follows,

GAP comparison

GAP can be seen as a form of regularization: the parameters of a fully connected layer are learned, whereas GAP can be viewed as a fully connected layer with fixed weights. The experiments above show that this kind of regularization is effective for improving performance.
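To spell out the "fixed-weight fully connected layer" view (the notation here is mine, not the paper's): for the \(k\)-th output feature map \(f_k\) of size \(H \times W\), GAP computes

\[ s_k = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} f_k(i, j), \]

which is exactly a fully connected layer whose weights from every position of feature map \(k\) to output \(k\) are fixed at \(\frac{1}{HW}\) and whose weights from all other feature maps are 0; since nothing is learned here, it behaves like a regularizer.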

Network Structure

The overall network structure given in the paper is as follows,

NIN architecture

The paper does not give specific parameters. In fact, NIN is still a modification of AlexNet: it is equivalent to inserting two \(1 \times 1\) convolution layers after each convolution layer of AlexNet, removing Local Response Norm, and replacing the fully connected layers with GAP. Here, the mlpconv layer can be viewed either as enhancing the expressive power of the original conv layer or as increasing the depth of the network.

NIN architecture in d2l
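Since the paper does not give exact hyperparameters, the following is only a sketch in the spirit of the d2l implementation pictured above (the channel counts 96/256/384, kernel sizes, and the 10-class output are roughly the d2l choices, not the paper's):

net = nn.Sequential()
net.add(nin_block(96, kernel_size=11, strides=4, padding=0),
        nn.MaxPool2D(pool_size=3, strides=2),
        nin_block(256, kernel_size=5, strides=1, padding=2),
        nn.MaxPool2D(pool_size=3, strides=2),
        nin_block(384, kernel_size=3, strides=1, padding=1),
        nn.MaxPool2D(pool_size=3, strides=2),
        nn.Dropout(0.5),
        # last mlpconv layer: one output channel per class, then GAP instead of fully connected layers
        nin_block(10, kernel_size=3, strides=1, padding=1),
        nn.GlobalAvgPool2D(),
        nn.Flatten())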

References

  • Min Lin, Qiang Chen, Shuicheng Yan. Network In Network. arXiv, December 2013.
  • Dive into Deep Learning (d2l), Network in Network (NiN).
  • Review: NIN — Network In Network (Image Classification).

