Reference: these are notes on the original post (see the first reference there for the source; only the notes are recorded here).
Inception v1, v2, v3, v4: paper notes
Detailed explanation of the Inception structure (from v1 to v4, then to Xception)
review
development
Inception v1 (GoogLeNet, 2014) -> Inception v2 (BN-Inception, 2015) -> Inception v3 (2015) -> Inception v4 (Inception-ResNet, 2016) -> Xception (2016)
Corresponding papers and their time
GoogLeNet v1:《Going deeper with convolutions》, 2014.09
Inception v2:《Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift》, 2015.02
Inception v3:《Rethinking the Inception Architecture for Computer Vision》, 2015.12
Inception v4:《Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning》, 2016.02
Xception:《Xception: Deep Learning with Depthwise Separable Convolutions》, 2016.10
structural details
Inception v1
overview
Problem: Network performance improves with depth and width (the number of layers and the number of neurons per layer), but increasing them inflates the parameter count, leading to overfitting, vanishing gradients, and heavy computation.
Solution: Replace dense connections with sparse ones, as convolution already does; the Inception module approximates an optimal sparse convolutional structure with dense building blocks.
detail
Inception role
Increase the number of feature channels per stage and provide multi-scale features.
Instead of manually choosing the filter size for a convolutional layer, or whether to use a convolution or a pooling layer at all, the Inception module runs these options in parallel and lets the network learn which ones it needs.
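The parallel branches above are merged by concatenating along the channel axis. A minimal sketch, using the branch output channels of GoogLeNet's inception(3a) block as an illustrative example:

```python
# Branch output channels of an Inception module (numbers follow
# GoogLeNet's inception(3a) block, used here only as an example).
branch_channels = {
    "1x1": 64,
    "3x3": 128,
    "5x5": 32,
    "pool_proj": 32,
}

def concat_channels(branches):
    """Output depth of an Inception module = sum of branch depths."""
    return sum(branches.values())

print(concat_channels(branch_channels))  # 256 channels after concatenation
```

All branches must keep the same spatial size (via padding / 1*1 projections) so that only the channel dimension grows.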
The role of 1*1 convolution
Reduce dimensionality and remove computational bottlenecks
Add depth and nonlinearity, improving the expressive ability of the network
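The dimensionality-reduction effect can be quantified by counting multiplications. A sketch with illustrative sizes (28*28*192 input, 32 output channels, 16-channel bottleneck; these numbers are not from the paper's tables):

```python
# Multiply count of a conv layer ~= H * W * C_out * (k * k * C_in).
def conv_mults(h, w, c_in, c_out, k):
    return h * w * c_out * k * k * c_in

direct = conv_mults(28, 28, 192, 32, 5)         # 5x5 conv applied directly
bottleneck = (conv_mults(28, 28, 192, 16, 1)    # 1x1 conv reduces 192 -> 16
              + conv_mults(28, 28, 16, 32, 5))  # then the 5x5 conv
print(direct, bottleneck)  # the bottleneck version is ~10x cheaper
```

The 1*1 convolution shrinks the channel count before the expensive large kernel runs, which is exactly the "computing bottleneck" it removes.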
Inception v2
overview
Introduce BN and improve structure
The BN layer addresses internal covariate shift: even for the same input, the distribution a layer sees keeps changing during training, because the upstream parameters change at every update.
Benefits (BN keeps activations in the region where the activation function's gradient is large, which speeds up training and prevents vanishing gradients):
Accelerate network training
Prevent vanishing gradients
Improved structure: Replace the 5*5 convolution in Inception v1 with two stacked 3*3 convolutions, an idea also noted in the VGG paper. This has two advantages:
Reduce parameters while maintaining the same receptive field
Enhance nonlinear expressive ability
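Both advantages can be checked by counting weights. A sketch with C input and C output channels (biases ignored); two stacked 3*3 convs cover the same 5*5 receptive field:

```python
def conv_params(c_in, c_out, k):
    """Weight count of a k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

C = 64  # illustrative channel count
five = conv_params(C, C, 5)            # 25 * C^2 weights
two_threes = 2 * conv_params(C, C, 3)  # 18 * C^2 weights
print(five, two_threes)  # two 3x3 convs use 28% fewer parameters
```

The second 3*3 layer also inserts an extra activation function, which is where the added nonlinearity comes from.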
Inception V3
overview
Proposes design and optimization principles for network structures, together with improved Inception modules.
Design Guidelines
Avoid representational bottlenecks, especially early in the network. If a layer compresses the feature map too sharply, a lot of information is lost and the model becomes hard to train.
Higher-dimensional representations are easier to process locally within the network.
Spatial aggregation can be done over lower-dimensional embeddings without much loss of expressive power.
Balance the width and depth of your network.
improve structure
1. Factorize large convolution kernels
Factorize into symmetric small kernels (one 5*5 replaced by two 3*3)
Factorize into asymmetric kernels (an n*n kernel replaced by a 1*n followed by an n*1; this works well on medium feature maps of roughly 12*12 to 20*20, but not on large early-layer maps.)
asymmetric advantage
Save a lot of parameters
Add a layer of nonlinearity to improve the expressive ability of the model
Can handle richer spatial features and increase the diversity of features
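The parameter saving of the asymmetric factorization is easy to verify. A sketch with C channels in and out throughout (biases ignored), n = 7 as in the paper's 7*7 factorization:

```python
def conv_params(c_in, c_out, kh, kw):
    """Weight count of a kh x kw convolution (biases ignored)."""
    return c_in * c_out * kh * kw

C, n = 64, 7  # illustrative channel count; n = 7 as used in Inception v3
square = conv_params(C, C, n, n)                              # n^2 * C^2
factored = conv_params(C, C, 1, n) + conv_params(C, C, n, 1)  # 2n * C^2
print(square, factored)  # the factored pair needs 2/n of the weights
```

For n = 7 this is a reduction from 49*C^2 to 14*C^2 weights, i.e. more than 70% fewer parameters.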
2. Use an auxiliary classifier
Two auxiliary classifiers are used in GoogLeNet (Inception v1). Advantages:
Propagate gradients back effectively, avoiding vanishing gradients and speeding up training
Intermediate-layer features are also meaningful and spatially rich, which helps the model discriminate
3. Change the way the feature-map size is reduced
In a traditional CNN, before pooling (which discards a lot of information), the feature map is first thickened by convolution (i.e., the number of filters is doubled); this preserves information but is computationally expensive, while pooling first creates a representational bottleneck. The paper instead reduces the grid size with parallel stride-2 convolution and pooling branches whose outputs are concatenated (see the corresponding figure in the paper).
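The cost side of this trade-off can be sketched by counting multiplications. Sizes below (35*35*320 -> 17*17*640) are illustrative, not the paper's exact configuration:

```python
# Multiply count of a conv layer ~= H * W * C_out * (k * k * C_in).
def conv_mults(h, w, c_in, c_out, k):
    return h * w * c_out * k * k * c_in

# Option A: double the channels by conv at full 35x35 resolution, then pool.
conv_then_pool = conv_mults(35, 35, 320, 640, 3)
# Option B: pool down to 17x17 first, then conv (cheaper, but the pooling
# step creates a representational bottleneck).
pool_then_conv = conv_mults(17, 17, 320, 640, 3)
print(conv_then_pool // pool_then_conv)  # option A costs ~4x more
```

The parallel stride-2 branches get roughly the cheap cost of option B while avoiding its bottleneck, since the convolution branch still sees the full-resolution input.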
Inception V4
overview
Residual connections are not strictly necessary for training very deep networks: with good initialization and BN, very deep networks can still be trained (though residuals do speed up convergence).
Residual Inception Block
Scaling of the Residual
When the number of filters exceeds about 1000, the network "dies": the activations before the average-pooling layer all become 0. Lowering the learning rate or adding extra BN layers does not help. Scaling down the residual branch before the addition keeps training stable, as shown in the paper's figure.
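A minimal sketch of the scaled residual addition (the paper suggests scaling factors between 0.1 and 0.3):

```python
def scaled_residual_add(shortcut, residual, scale=0.1):
    """Add a down-scaled residual branch to the shortcut, elementwise."""
    return [s + scale * r for s, r in zip(shortcut, residual)]

out = scaled_residual_add([1.0, 2.0], [10.0, -10.0], scale=0.1)
print(out)  # [2.0, 1.0]
```

Shrinking the residual keeps its contribution small relative to the shortcut, so pre-activation magnitudes stay bounded even with very many filters.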
Why accuracy improves
Residual connections mainly accelerate convergence; the real accuracy gains come from the larger network size.