Inception-v1~v4

References (see the first reference for the original write-up; this post only records notes):

inception-v1,v2,v3,v4----paper notes

Detailed explanation of the Inception structure (from V1 to V4, then to Xception)

review

development

Inception v1 (GoogLeNet, 2014) → Inception v2 (BN-Inception, 2015) → Inception v3 (2015) → Inception v4 (Inception-ResNet, 2016) → Xception (2016)

Corresponding papers and publication dates

GoogLeNet v1: "Going deeper with convolutions", 2014.09

Inception v2: "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", 2015.02

Inception v3: "Rethinking the Inception Architecture for Computer Vision", 2015.12

Inception v4: "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning", 2016.02

Xception: "Xception: Deep Learning with Depthwise Separable Convolutions", 2016.10

structural details

Inception v1

overview

Problem: improving network performance relies on increasing depth and width (the number of hidden layers and the number of neurons per layer), but this increases the number of parameters, which leads to overfitting and makes gradients more likely to vanish.

Solution: replace dense (fully connected) connections with sparse ones, such as convolutional structures. The Inception module approximates an optimal sparse convolutional structure using readily available dense components.

detail

Role of the Inception module

Increases the number of units at each stage and provides multi-scale features.

Instead of manually deciding which filter size a convolutional layer should use, or whether to use a convolution or a pooling layer, the Inception module runs these options in parallel and lets the network learn which combination it needs (a module sketch follows the list below).

The role of 1*1 convolution

  • Reduce dimensionality (the number of channels) and remove computational bottlenecks

  • Add network depth (an extra nonlinearity per 1*1 layer) and improve the network's expressive power
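To make the parallel branches and the 1*1 reductions concrete, here is a minimal PyTorch sketch of an Inception-v1 style module; the channel counts follow the "3a" block of GoogLeNet but should be treated as illustrative rather than a faithful reimplementation.

```python
import torch
import torch.nn as nn

class InceptionV1Block(nn.Module):
    """Minimal Inception-v1 style module: four parallel branches whose
    outputs are concatenated along the channel dimension."""
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        # Branch 1: 1x1 convolution
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU(inplace=True))
        # Branch 2: 1x1 reduction followed by 3x3 convolution
        self.b2 = nn.Sequential(
            nn.Conv2d(in_ch, c3_red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_red, c3, 3, padding=1), nn.ReLU(inplace=True))
        # Branch 3: 1x1 reduction followed by 5x5 convolution
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, c5_red, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_red, c5, 5, padding=2), nn.ReLU(inplace=True))
        # Branch 4: 3x3 max pooling followed by 1x1 projection
        self.b4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        # Every branch keeps the spatial size, so channel-wise concat is valid
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

# Example: 192 input channels -> 64 + 128 + 32 + 32 = 256 output channels
block = InceptionV1Block(192, 64, 96, 128, 16, 32, 32)
print(block(torch.randn(1, 192, 28, 28)).shape)  # torch.Size([1, 256, 28, 28])
```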

Inception v2

overview

Introduces Batch Normalization (BN) and improves the module structure.

The BN layer addresses the problem that, even for the same input, the output distribution of a given layer keeps changing during training, because the parameters of the preceding layers change with every update. BN normalizes a layer's inputs so that downstream layers see a stable distribution.

Benefits (the normalized inputs fall in the region where the activation function has a large gradient, which speeds up training and prevents gradients from vanishing; a sketch follows the list):

  • Accelerate network training

  • Prevent vanishing gradients
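A minimal sketch (PyTorch is assumed; the layer sizes are arbitrary) of the Conv-BN-ReLU pattern that BN-Inception applies throughout: BN normalizes each channel over the mini-batch before the nonlinearity.

```python
import torch
import torch.nn as nn

def conv_bn_relu(in_ch, out_ch, kernel_size, **kwargs):
    """Convolution followed by batch normalization and ReLU.
    BN makes the convolution bias redundant, so it is disabled."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, bias=False, **kwargs),
        nn.BatchNorm2d(out_ch),   # normalize each channel over the mini-batch
        nn.ReLU(inplace=True),
    )

layer = conv_bn_relu(64, 96, 3, padding=1)
print(layer(torch.randn(8, 64, 28, 28)).shape)  # torch.Size([8, 96, 28, 28])
```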

Improved structure: replace the 5*5 convolution in Inception-v1 with two 3*3 convolutions, an idea also mentioned in the VGG paper. This has two advantages (a sketch with the parameter counts follows the list):

  • Reduce parameters while maintaining the same receptive field

  • Enhance nonlinear expressive ability
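A rough sketch of the factorization, assuming PyTorch and an illustrative channel count of 64: for C input and C output channels, one 5*5 kernel needs 25*C*C weights, while two stacked 3*3 kernels need 18*C*C, with the same 5*5 receptive field plus an extra nonlinearity.

```python
import torch.nn as nn

C = 64  # illustrative channel count

# One 5*5 convolution: 5*5*C*C = 25*C*C weights
conv5 = nn.Conv2d(C, C, 5, padding=2, bias=False)

# Two stacked 3*3 convolutions: 2 * 3*3*C*C = 18*C*C weights,
# same 5*5 receptive field, one extra nonlinearity in between
conv3x2 = nn.Sequential(
    nn.Conv2d(C, C, 3, padding=1, bias=False), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, 3, padding=1, bias=False), nn.ReLU(inplace=True),
)

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(conv5), params(conv3x2))  # 102400 vs 73728 (about 28% fewer)
```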

Inception v3

overview

Proposes general design principles for neural-network architecture together with the corresponding structural improvements.

Design Guidelines

  1. Avoid representational bottlenecks, especially early in the network. If the feature maps shrink too sharply, i.e. a layer compresses too much, a lot of information is lost and the model becomes hard to train.

  2. Higher-dimensional representations are easier to process locally within the network.

  3. Spatial aggregation can be done over lower-dimensional embeddings without much loss of expressive power.

  4. Balance the width and depth of the network.

improved structure

1. Factorize the convolution kernels into smaller ones

  • Factorize into symmetric small kernels (a 5*5 convolution becomes two 3*3 convolutions)

  • Factorize into asymmetric kernels (an n*n convolution is replaced by a 1*n followed by an n*1; this works well on medium feature-map sizes, roughly 12 to 20, but not on the large grids of the early layers; a sketch follows the advantages list below)

Advantages of the asymmetric factorization

  • Save a lot of parameters

  • Add a layer of nonlinearity to improve the expressive ability of the model

  • Can handle richer spatial features and increase the diversity of features
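A minimal sketch of the asymmetric factorization, assuming PyTorch and illustrative sizes (C=64, n=7): the n*n convolution costs n*n*C*C weights per layer, while the 1*n plus n*1 pair costs 2*n*C*C.

```python
import torch
import torch.nn as nn

C, n = 64, 7  # illustrative channel count and kernel size

# n*n convolution: n*n*C*C = 49*C*C weights
conv_nn = nn.Conv2d(C, C, n, padding=n // 2, bias=False)

# Factorized: 1*n followed by n*1, 2*n*C*C = 14*C*C weights in total
conv_asym = nn.Sequential(
    nn.Conv2d(C, C, (1, n), padding=(0, n // 2), bias=False), nn.ReLU(inplace=True),
    nn.Conv2d(C, C, (n, 1), padding=(n // 2, 0), bias=False), nn.ReLU(inplace=True),
)

x = torch.randn(1, C, 17, 17)  # a medium grid size where this factorization works well
print(conv_nn(x).shape, conv_asym(x).shape)  # both torch.Size([1, 64, 17, 17])
```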

2. Use an auxiliary classifier

Two auxiliary classifiers are used in GoogLeNet (Inception v1). Their advantages (a sketch follows the list):

  • They propagate the gradient back to earlier layers effectively, mitigating vanishing gradients and speeding up training

  • Features from the intermediate layers are also meaningful and spatially rich, which helps the model's discriminative ability
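A rough sketch of an auxiliary classifier head (PyTorch assumed; the layer sizes loosely follow the GoogLeNet paper but are simplified, e.g. an adaptive pool instead of the original 5*5 stride-3 average pool). During training its loss is added to the main loss with a small weight (0.3 in the paper); at inference the head is discarded.

```python
import torch
import torch.nn as nn

class AuxClassifier(nn.Module):
    """Auxiliary classification head attached to an intermediate feature map."""
    def __init__(self, in_ch, num_classes):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(4)      # pool the feature map down to 4x4
        self.conv = nn.Conv2d(in_ch, 128, 1)     # 1x1 reduction to 128 channels
        self.fc1 = nn.Linear(128 * 4 * 4, 1024)
        self.drop = nn.Dropout(0.7)              # heavy dropout as in the paper
        self.fc2 = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = torch.relu(self.conv(self.pool(x)))
        x = torch.relu(self.fc1(torch.flatten(x, 1)))
        return self.fc2(self.drop(x))

# Training-time loss: total = main_loss + 0.3 * aux_loss (weight from the paper)
aux = AuxClassifier(512, 1000)
print(aux(torch.randn(2, 512, 14, 14)).shape)  # torch.Size([2, 1000])
```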

3. Change the way the feature-map size is reduced

The traditional approach, because pooling loses a lot of information, is to increase the depth of the feature map first (i.e. double the number of filters) and then pool, which is computationally expensive. The paper instead proposes reducing the grid size with parallel stride-2 convolution and pooling branches whose outputs are concatenated.
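A minimal sketch of this grid-reduction idea, assuming PyTorch and illustrative channel counts: a stride-2 convolution branch and a pooling branch run in parallel and are concatenated, halving the spatial size without first doubling the filters.

```python
import torch
import torch.nn as nn

class ReductionBlock(nn.Module):
    """Parallel stride-2 convolution and pooling branches, concatenated on channels.
    Halves the spatial size while expanding channels, avoiding the costly
    'double the filters, then pool' step."""
    def __init__(self, in_ch, conv_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, conv_ch, 3, stride=2, padding=1, bias=False),
            nn.ReLU(inplace=True))
        self.pool = nn.MaxPool2d(3, stride=2, padding=1)

    def forward(self, x):
        return torch.cat([self.conv(x), self.pool(x)], dim=1)

block = ReductionBlock(288, 288)
print(block(torch.randn(1, 288, 35, 35)).shape)  # torch.Size([1, 576, 18, 18])
```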

Inception v4

overview

  1. Residual connections are not strictly necessary for training very deep networks: with good initialization and BN, deep networks can be trained without them (though residuals speed up training)

  2. Residual Inception blocks

  3. Scaling of the residuals

When the number of filters exceeds about 1000, the network can "die": the activations just before the average-pooling layer all become zero. Lowering the learning rate or adding extra BN layers does not help. Scaling down the residual before it is added to the activation keeps training stable, as sketched below.
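A rough sketch of residual scaling, assuming PyTorch; the inner branch and the scale factor are illustrative (the paper suggests scaling factors of roughly 0.1 to 0.3).

```python
import torch
import torch.nn as nn

class ScaledResidual(nn.Module):
    """Wraps a residual branch and scales its output before the addition,
    which keeps very wide Inception-ResNet variants from 'dying'."""
    def __init__(self, branch, scale=0.2):
        super().__init__()
        self.branch = branch
        self.scale = scale

    def forward(self, x):
        # activation(x + scale * residual(x)); ReLU is used here for simplicity
        return torch.relu(x + self.scale * self.branch(x))

# Illustrative residual branch: 1x1 -> 3x3 -> 1x1 back to the input width
branch = nn.Sequential(
    nn.Conv2d(256, 64, 1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 256, 1),
)
block = ScaledResidual(branch, scale=0.2)
print(block(torch.randn(1, 256, 17, 17)).shape)  # torch.Size([1, 256, 17, 17])
```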

  4. Why the accuracy improves

Residual connections only speed up convergence; what really improves accuracy is the larger network size.


Origin blog.csdn.net/qq_41804812/article/details/129744061