CNN-4: GoogLeNet model

1, GoogLeNet model overview

GoogLeNet is a deep network architecture proposed by Christian Szegedy and colleagues in 2014; the model won the classification task of that year's ImageNet Challenge (ILSVRC 2014).

2, Why GoogLeNet was proposed

1) Before GoogLeNet, architectures such as AlexNet and VGG improved training results mainly by increasing network depth (the number of layers). But simply stacking more layers
brings negative effects, such as overfitting, vanishing gradients, and exploding gradients.
2) A natural way to address these problems is to increase the depth and width of the network while reducing the number of parameters, and to cut parameters one naturally thinks of replacing full
connections with sparse ones. In practice, however, making the connections sparse does not produce a corresponding drop in computation time, because most hardware is optimized for dense matrix computation:
a sparse matrix holds less data, but the computation on it is hard to accelerate and remains time-consuming.
3) The question, then, is whether there is a way to keep the network structurally sparse while still exploiting the high performance of dense matrix computation. The literature suggests that a large sparse matrix can be
clustered into relatively dense sub-matrices to improve computing performance, much as the human brain can be viewed as a repeated accumulation of neurons. The GoogLeNet team therefore proposed the
Inception module: a kind of "basic neuron" used to build a network that is sparse in structure yet efficient to compute.

3, The Inception module structure

The Inception module improves training results from another angle: it uses computing resources more efficiently, extracting more features for the same amount of computation,
and thereby improves training results.
Inception has gone through several versions, V1, V2, V3, and V4, each refining the previous one.

1) Inception V1

The idea is to design a network that is sparse in structure yet produces dense data, increasing the performance of the neural network while ensuring efficient use of computing resources.
Google's original, most basic Inception structure is as follows:

This structure stacks the convolutions commonly used in CNNs (1x1, 3x3, 5x5) and a pooling operation (3x3) side by side (the spatial size is kept the same after convolution and pooling, and the channels are concatenated).
On the one hand this increases the width of the network; on the other it improves the network's adaptability to different scales.
Convolution layers at different kernel sizes extract details of the input at different granularities, and the 5x5 filter covers most of the receptive field of the input. A pooling
path is also included, to reduce the spatial size and mitigate overfitting. On top of each of these convolution layers, a ReLU is applied to add
non-linearity to the network.
In this original version of Inception, however, every branch operates on all of the previous layer's output channels, and the 5x5 convolutions in particular are far too expensive
when the input feature maps are thick (have many channels). To avoid this, 1x1 convolution kernels are added before the 3x3 and 5x5 convolutions and after the max pooling, reducing
the thickness of the feature maps. This forms the Inception V1 module, shown below:

 

The role of the 1x1 convolution kernel:
The main purpose of the 1x1 convolution is dimensionality reduction; it is also followed by a rectified linear activation (ReLU). For example, suppose the previous layer's output is 100x100x128. Passing it
through a 5x5 convolution layer with 256 channels (stride = 1, pad = 2) gives an output of 100x100x256, and the convolution layer has 128x5x5x256 = 819,200 parameters.
If the previous layer's output first passes through a 1x1 convolution layer with 32 channels and then through the 5x5 convolution layer with 256 channels, the output is still 100x100x256,
but the parameter count drops to 128x1x1x32 + 32x5x5x256 = 208,896, a reduction of roughly 4x.
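As a quick sanity check on the arithmetic above, here is a minimal Python sketch (the helper name `conv_params` is made up for illustration; biases are ignored, matching the article's counts):

```python
# Weight count of a k x k convolution mapping c_in channels to c_out channels.
def conv_params(c_in, k, c_out):
    return c_in * k * k * c_out

# 5x5 convolution applied directly to the 128-channel input.
direct = conv_params(128, 5, 256)
# 1x1 bottleneck down to 32 channels, then the 5x5 convolution.
bottleneck = conv_params(128, 1, 32) + conv_params(32, 5, 256)

print(direct)                          # 819200
print(bottleneck)                      # 208896
print(round(direct / bottleneck, 2))   # 3.92, i.e. roughly 4x fewer parameters
```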
The GoogLeNet network built from Inception modules is structured as follows (22 layers in total):

 

Notes on the figure:
(1) GoogLeNet has a modular structure (the Inception module), which makes additions and modifications easy;
(2) at the end, the network uses average pooling instead of a fully connected layer, an idea from NIN (Network in Network)
that was shown to improve accuracy by 0.6%. A fully connected layer is in fact still added at the very end, mainly to make it easy to adapt the output flexibly;
(3) although the fully connected layers are removed, the network still uses Dropout;
(4) to combat vanishing gradients, the network adds two auxiliary softmax classifiers to inject gradients partway through the network. Each intermediate
auxiliary classifier takes an intermediate layer's output for classification, and its result is added to the final classification result with a smaller weight (0.3). This amounts to a model ensemble; it
strengthens the gradient signal in back-propagation and also provides extra regularization, both of which benefit training of the whole network. At test time, the two
auxiliary softmax branches are removed.
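The weighted combination in point (4) can be sketched as follows. The `fuse` helper and the logit values are made up for illustration; only the 0.3 weight comes from the description above:

```python
# Add each auxiliary classifier's logits onto the main classifier's logits
# with a small weight, as described for GoogLeNet training.
def fuse(main_logits, aux_logits_list, aux_weight=0.3):
    fused = list(main_logits)
    for aux in aux_logits_list:
        fused = [f + aux_weight * a for f, a in zip(fused, aux)]
    return fused

main = [2.0, 0.5, -1.0]   # main branch (made-up logits)
aux1 = [1.0, 0.0, 0.0]    # first auxiliary branch
aux2 = [0.0, 1.0, 0.0]    # second auxiliary branch

print(fuse(main, [aux1, aux2]))   # [2.3, 0.8, -1.0]
```

At test time the auxiliary branches are simply dropped, so `fuse` is only relevant during training.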

The details of the GoogLeNet network structure are shown in the table below:

Note: in the table above, "#3x3 reduce" and "#5x5 reduce" denote the number of 1x1 convolutions applied before the 3x3 and 5x5 convolutions, respectively.
The rows of the GoogLeNet structure table break down as follows:
0. Input
The raw input image is 224x224x3, preprocessed by zero-centering (subtracting the mean from every pixel of the image).
1. First layer (convolution)
A 7x7 convolution (stride 2, padding 3) with 64 channels gives an output of 112x112x64, followed by ReLU.
A 3x3 max pooling (stride 2) then gives ((112 - 3 + 1) / 2) + 1 = 56, i.e. 56x56x64, followed by ReLU.
2. Second layer (convolution)
A 3x3 convolution (stride 1, padding 1) with 192 channels gives an output of 56x56x192, followed by ReLU.
A 3x3 max pooling (stride 2) then gives ((56 - 3 + 1) / 2) + 1 = 28, i.e. 28x28x192, followed by ReLU.
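The spatial sizes above follow the standard convolution/pooling output-size formula; a minimal sketch (the helper names are made up, and the pooling uses ceil mode, which reproduces the sizes listed):

```python
import math

# out = floor((in + 2*pad - kernel) / stride) + 1 for convolution;
# with ceil-mode pooling, floor becomes ceil.
def conv_out(size, kernel, stride, pad):
    return (size + 2 * pad - kernel) // stride + 1

def pool_out_ceil(size, kernel, stride):
    return math.ceil((size - kernel) / stride) + 1

s = conv_out(224, 7, 2, 3)    # first conv layer: 224 -> 112
print(s)                      # 112
s = pool_out_ceil(s, 3, 2)    # 3x3 max pool, stride 2: 112 -> 56
print(s)                      # 56
s = conv_out(s, 3, 1, 1)      # second conv layer: 56 -> 56
s = pool_out_ceil(s, 3, 2)    # 56 -> 28
print(s)                      # 28
```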
3a. Third layer (Inception 3a)
The module splits into four branches, processing the input with kernels of different sizes:
(1) 64 1x1 convolution kernels, then ReLU, output 28x28x64;
(2) 96 1x1 convolution kernels as dimensionality reduction before the 3x3 convolutions, giving 28x28x96, then ReLU, then
128 3x3 convolutions (padding 1), output 28x28x128;
(3) 16 1x1 convolution kernels as dimensionality reduction before the 5x5 convolutions, giving 28x28x16, then ReLU, then 32
5x5 convolutions (padding 2), output 28x28x32;
(4) a pooling branch: 3x3 max pooling (padding 1), output 28x28x192, followed by 32 1x1 convolutions, output 28x28x32.
The four results are concatenated along the third (channel) dimension: 64 + 128 + 32 + 32 = 256, for a final output of 28x28x256.
3b. Third layer (Inception 3b)
(1) 128 1x1 convolution kernels, then ReLU, output 28x28x128;
(2) 128 1x1 convolution kernels as dimensionality reduction before the 3x3 convolutions, giving 28x28x128, then ReLU, then 192 3x3
convolutions (padding 1), output 28x28x192;
(3) 32 1x1 convolution kernels as dimensionality reduction before the 5x5 convolutions, giving 28x28x32, then ReLU, then 96 5x5
convolutions (padding 2), output 28x28x96;
(4) a pooling branch: 3x3 max pooling (padding 1), output 28x28x256, followed by 64 1x1 convolutions, output 28x28x64.
The four results are concatenated along the third (channel) dimension: 128 + 192 + 96 + 64 = 480, for a final output of 28x28x480.
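The channel bookkeeping for the two modules above can be checked directly (the helper name is made up for illustration):

```python
# Output channels of an Inception module = sum of its four branch widths:
# 1x1 branch, 3x3 branch, 5x5 branch, and the pool-projection branch.
def inception_out_channels(n1x1, n3x3, n5x5, pool_proj):
    return n1x1 + n3x3 + n5x5 + pool_proj

print(inception_out_channels(64, 128, 32, 32))    # 3a: 256
print(inception_out_channels(128, 192, 96, 64))   # 3b: 480
```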
The fourth layer (4a, 4b, 4c, 4d, 4e) and the fifth layer (5a, 5b) are analogous to 3a and 3b and are not repeated here.
GoogLeNet's experimental results were clear-cut: its error rate was lower than that of models such as MSRA and VGG, as the comparison table below shows:

 

2) Inception V2
GoogLeNet's strong performance attracted many researchers, so the GoogLeNet team explored further
improvements, producing an upgraded version of GoogLeNet.
GoogLeNet was designed to be both accurate and fast. Simply stacking more layers can raise accuracy but noticeably
reduces computational efficiency, so the question becomes how to increase the network's expressive power without adding too much computation.
Inception V2's answer is to modify the Inception module's internal computation, introducing special factorized "convolution" structures.
1. Factorizing convolutions
A larger convolution kernel gives a larger receptive field but also means more parameters: a 5x5 kernel has 25 parameters
versus 9 for a 3x3 kernel, i.e. 25/9 ≈ 2.78 times as many. The GoogLeNet team therefore proposed replacing a single 5x5 convolution
layer with a small network of two consecutive 3x3 convolution layers, which preserves the receptive field while reducing the parameter count,
as shown below:

 

Does this substitution reduce expressive power? Extensive experiments show that it causes no loss of expressiveness.
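The saving from this factorization is easy to verify with a minimal sketch (per-channel-pair weight counts, ignoring biases; the receptive-field helper is made up for illustration):

```python
# Weights per (input-channel, output-channel) pair: one 5x5 kernel
# versus two stacked 3x3 kernels covering the same receptive field.
single_5x5 = 5 * 5        # 25 weights
two_3x3 = 2 * (3 * 3)     # 18 weights
print(single_5x5, two_3x3)              # 25 18
print(round(single_5x5 / two_3x3, 2))   # 1.39x fewer parameters

# Receptive field of a stack of stride-1 convolutions: each kernel
# of size k grows the receptive field by k - 1.
def stacked_receptive_field(kernels):
    rf = 1
    for k in kernels:
        rf += k - 1
    return rf

print(stacked_receptive_field([3, 3]))  # 5, matching a single 5x5
```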
Since a large convolution kernel can be fully replaced by a series of 3x3 kernels, can the factorization go even smaller? The GoogLeNet team considered nx1 kernels; as shown below, a 3x3 convolution is replaced by a 1x3 convolution followed by a 3x1 convolution:

Thus any nxn convolution can be replaced by a 1xn convolution followed by an nx1 convolution. The GoogLeNet team found that this
factorization does not work well in the early layers of the network; it works better on medium-sized feature maps (a feature-map size between 12 and 20 is recommended).
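The asymmetric factorization's parameter saving per channel pair can be tallied the same way (helper names are made up; biases ignored):

```python
# An n x n kernel versus its factorization into 1xn followed by nx1.
def nxn_params(n):
    return n * n

def factorized_params(n):
    return (1 * n) + (n * 1)

for n in (3, 7):
    print(n, nxn_params(n), factorized_params(n))
# n=3: 9 vs 6;  n=7: 49 vs 14 -- the saving grows with n
```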

2. Reducing feature-map size
In general, there are two ways to shrink the image (feature map):

pool first and then apply the Inception convolutions, or apply the Inception convolutions first and then pool. Method one (left figure), pooling first,
creates a representational bottleneck (feature loss); method two (right figure) shrinks normally, but at a high computational cost. To preserve the representation
while reducing computation, the structure is changed as shown below: two parallelized branches reduce the computation (convolution and pooling run in parallel, and their outputs are then merged).
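The trade-off between the two orderings can be illustrated with rough multiply counts. This is a sketch under assumptions: a d x d x k map shrunk to (d/2) x (d/2) x 2k, with d = 35, k = 320 as example numbers, and kernel-size constants ignored:

```python
d, k = 35, 320   # assumed example grid size and channel count

# Method two: convolve to 2k channels at full resolution, then pool.
conv_then_pool = (d * d) * k * (2 * k)
# Method one: pool first, then convolve on the half-resolution grid
# (cheaper, but the representation is squeezed first -- the bottleneck).
pool_then_conv = (d // 2) * (d // 2) * k * (2 * k)

print(conv_then_pool // pool_then_conv)   # 4 -- roughly 4x more computation
```

The parallel structure keeps the cheap cost of the pooled path while the convolutional path preserves representational width.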

The improved version of GoogLeNet using Inception V2 has the network structure shown below:

Note: in the table above, Figure 5 refers to the original (unfactorized) Inception module, Figure 6 to the small-convolution version (3x3 kernels replacing the 5x5 kernel),
and Figure 7 to the asymmetric version (1xn and nx1 kernels replacing the nxn kernel).
Experiments showed a substantial improvement over the old GoogLeNet, as shown in the table below:

 

3) Inception V3
The most important improvement in Inception V3 is factorization: the 7x7 convolution is decomposed into two one-dimensional convolutions (1x7 and 7x1),
and likewise the 3x3 (into 1x3 and 3x1). This both speeds up computation and splits one convolution into two, further deepening
the network and adding non-linearity (each added layer is followed by a ReLU). In addition, the network input changes from 224x224 to 299x299.

4) Inception V4
Inception V4 studies combining the Inception module with residual connections. The ResNet structure greatly increases network depth, dramatically
speeds up training, and also improves performance (for an introduction to how ResNet works, see the earlier post on this blog: 大话深度残差网络ResNet).
Inception V4 mainly uses residual connections to improve the V3 structure, yielding the Inception-ResNet-v1,
Inception-ResNet-v2, and Inception-v4 networks.
The residual structure of ResNet is as follows:

 

Combining this structure with Inception yields the module shown below:

By combining 20 similar modules, Inception-ResNet is constructed as follows:
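The residual idea being combined with Inception can be sketched in a few lines. This is a toy numeric illustration of y = x + F(x); `toy_F` is a made-up stand-in for an Inception module:

```python
# A residual block adds the module's output F(x) back onto its input x,
# so gradients can always flow through the identity path.
def residual_block(x, transform):
    return [xi + fi for xi, fi in zip(x, transform(x))]

toy_F = lambda x: [0.1 * xi for xi in x]   # stand-in for an Inception module

print(residual_block([1.0, 2.0], toy_F))   # [1.1, 2.2]
```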

