【MobileNet V2】MobileNet V2

Paper: MobileNetV2: Inverted Residuals and Linear Bottlenecks
Published: 2018
Link: https://openaccess.thecvf.com/content_cvpr_2018/papers/Sandler_MobileNetV2_Inverted_Residuals_CVPR_2018_paper.pdf


1. Introduction

MobileNet V1 has the following issues:

  • No residual structure.
  • After training, many weights of the depth-wise convolutions turn out to be 0, for the following three reasons:
    • A depth-wise convolution has very few weights, and each kernel only sees a single channel, so it processes two-dimensional information rather than three-dimensional (cross-channel) information.
    • ReLU sets negative outputs to 0; once an output is 0, its gradient is also 0, so the corresponding weights stay at 0.
    • Low-precision numeric representation on mobile devices.

MobileNet V2 makes improvements targeting these problems.


2. Key innovations of the paper

1) Inverted residual structure – Inverted residual block

Similarities and differences between ResNet's residual block and MobileNet V2's inverted residual block:

  • Similarity: both follow the structure conv 1x1 → conv 3x3 → conv 1x1, followed by a cross-layer shortcut connection.
  • Differences:
    • The residual block first reduces and then restores the number of channels, while the inverted residual block first expands and then reduces it.
    • The middle conv 3x3 of the residual block is a standard 3x3 convolution, while the middle layer of the inverted residual block is a depth-wise convolution.
    • The shortcut of the residual block connects two high-dimensional layers (layers with many channels), while the shortcut of the inverted residual block connects two low-dimensional layers (some sources say the inverted residual block connects two bottleneck layers across layers, where "bottleneck" refers to a layer with few channels).
    • In the residual block, every layer is followed by a ReLU activation; in the inverted residual block, the first conv 1x1 and the middle conv 3x3 are followed by ReLU6, while the last conv 1x1 (the dimensionality reduction) is followed by a linear activation. That is, the layer marked with diagonal hatching in the figure below is followed by the linear activation.
      (figure: residual block vs. inverted residual block; the hatched layer is followed by the linear activation)

Inverted residual block structure description:

  • First apply conv 1x1, expanding the number of channels from $k$ to $tk$, where $t$ is the expansion factor. The computation of this layer is $h \times w \times k \times (tk)$.
  • Then apply a depth-wise convolution, keeping the number of channels unchanged. The computation of this layer is $h \times w \times (tk) \times 3^2$.
  • Finally apply conv 1x1, reducing the number of channels from $tk$ to $k'$. The computation of this layer is $h \times w \times (tk) \times k'$.
    (figure: inverted residual block structure)
    So the total computation of an inverted residual block is $h \times w \times (tk) \times (k + 3^2 + k')$.
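As a quick sanity check, these per-layer multiply counts can be computed directly. A minimal Python sketch (the values of h, w, k, t, k' below are just example numbers, not taken from the paper):

```python
# Multiply counts of the three layers in an inverted residual block
# (example numbers; h, w, k, t, k_out can be changed freely)
h, w = 28, 28      # spatial size of the feature map
k, k_out = 32, 32  # input / output channel counts (k and k')
t = 6              # expansion factor

expand_1x1 = h * w * k * (t * k)          # conv 1x1: k -> tk
depthwise_3x3 = h * w * (t * k) * 3 ** 2  # depth-wise 3x3 conv on tk channels
project_1x1 = h * w * (t * k) * k_out     # conv 1x1: tk -> k'

total = expand_1x1 + depthwise_3x3 + project_1x1
# matches the closed form h * w * (tk) * (k + 3^2 + k')
assert total == h * w * (t * k) * (k + 3 ** 2 + k_out)
print(expand_1x1, depthwise_3x3, project_1x1, total)
```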

Why use an inverted residual structure?

(Conclusion first, details afterwards.) Because the nonlinear transformation (ReLU) causes information loss, the block first increases the dimensionality to create redundant dimensions, applies the nonlinearity (ReLU) in this higher-dimensional space, and finally reduces the dimensionality back, keeping only the useful, necessary information.

The authors ran an experiment: take a spiral curve $X_{2 \times n}$ consisting of $n$ points in 2-dimensional space, map it to $m$ dimensions with a matrix $T_{m \times 2}$, and apply ReLU, i.e. $Y = \mathrm{ReLU}(T_{m \times 2} \cdot X_{2 \times n})$. Then map $Y$ back to 2-dimensional space via $T^{-1}$, denote the result $\hat{X}$, and compare $X$ with $\hat{X}$ to observe the information loss.

** The dimension $m$ corresponds to dim = 2 / 3 / 5 / 15 / 30 in the figure below.
** $T^{-1}$ is the generalized inverse (pseudo-inverse) of $T$.

The experimental conclusion: if the mapping dimension $m$ is low, a lot of information is lost after the ReLU transformation; if $m$ is high, much less information is lost. This is why, after the dimensionality-reducing conv 1x1, ReLU is not used and a linear activation is used instead.

(figure: reconstructions $\hat{X}$ of the spiral for dim = 2 / 3 / 5 / 15 / 30)
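A rough numpy reproduction of this experiment — a sketch under my own assumptions about the spiral and the random matrix $T$, not the authors' exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# n points on a 2-D spiral: X has shape (2, n)
n = 1000
theta = np.linspace(0, 4 * np.pi, n)
X = np.stack([theta * np.cos(theta), theta * np.sin(theta)])  # (2, n)

for m in (2, 3, 5, 15, 30):
    T = rng.standard_normal((m, 2))   # random map from 2 to m dimensions
    Y = np.maximum(T @ X, 0)          # Y = ReLU(T X)
    X_hat = np.linalg.pinv(T) @ Y     # map back with the generalized inverse of T
    err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
    print(f"dim = {m:2d}  relative reconstruction error = {err:.3f}")
```

With this setup, the relative reconstruction error generally shrinks as dim grows, which matches the conclusion above.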


2) ReLU6

Why use ReLU6?
Because MobileNet is designed for devices with relatively little memory, such as mobile or embedded devices, it generally relies on low-precision representations, e.g. using 8 bits to represent a number (there is no general-purpose data type such as float8; such representations are used only for specific purposes).

(The following is my personal understanding; please point it out if it is wrong.)
Using ReLU6 caps the value at 6, so the integer part only needs 3 bits and the remaining bits can be used for the fractional part; without the cap, fewer bits would be left for the fraction and the representation would be less precise.

Then why 6, and not 5, 7, or 8?
The author said that the value 6 is the one that fits best in 8 bits, which probably covers the most common use case.
That is, most mobile usage scenarios use 8-bit low-precision representation, and 6 works best with it.
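For reference, ReLU6 simply clips the activation to the range [0, 6]; a small PyTorch sketch (the built-in nn.ReLU6 behaves the same way):

```python
import torch

def relu6(x: torch.Tensor) -> torch.Tensor:
    # ReLU6(x) = min(max(x, 0), 6): negatives go to 0, values above 6 are capped at 6
    return torch.clamp(x, min=0.0, max=6.0)

x = torch.tensor([-3.0, 0.5, 4.0, 8.0])
print(relu6(x))             # tensor([0.0000, 0.5000, 4.0000, 6.0000])
print(torch.nn.ReLU6()(x))  # same result with the built-in module
```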


3. Network structure

(figure: MobileNetV2 architecture table; the columns t, c, n, s are explained below)
t: expansion factor
c: number of output channels
n: number of times the bottleneck block is repeated
s: stride of the first block among the n repeated blocks; the remaining blocks use stride 1
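Since the architecture table image is not shown here, the bottleneck stages can be written out as (t, c, n, s) rows; the values below are taken from Table 2 of the paper, and the list format follows common implementations such as torchvision's:

```python
# (t, c, n, s) for each bottleneck stage of MobileNetV2 (Table 2 of the paper)
# t: expansion factor, c: output channels, n: repetitions, s: stride of the first block
inverted_residual_setting = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]
```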

Among these blocks, some have a cross-layer shortcut and some do not. The criterion is whether the block downsamples: a downsampling block cannot have a shortcut because its input and output sizes differ. Only blocks that do not downsample (stride = 1) have a shortcut (in the implementation, the input and output channel counts must also match). As shown below:

(figure: stride-1 block with shortcut vs. stride-2 block without shortcut)
Code address: https://github.com/Enzo-MiMan/cv_related_collections/blob/main/classification/MobileNet/model_MobileNet_v2.py
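For orientation, here is a minimal PyTorch sketch of one inverted residual block implementing the stride-1 shortcut rule described above. It is a simplified illustration under the usual conventions (BatchNorm after each conv, ReLU6 everywhere except after the projection), not the code at the link:

```python
import torch
from torch import nn

class InvertedResidual(nn.Module):
    """conv 1x1 (expand, ReLU6) -> depth-wise 3x3 (ReLU6) -> conv 1x1 (project, linear)."""

    def __init__(self, in_ch: int, out_ch: int, stride: int, t: int):
        super().__init__()
        hidden = in_ch * t
        # shortcut only when there is no downsampling and the channel counts match
        self.use_shortcut = stride == 1 and in_ch == out_ch

        layers = []
        if t != 1:  # the first stage uses t = 1 and skips the expansion conv
            layers += [nn.Conv2d(in_ch, hidden, 1, bias=False),
                       nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True)]
        layers += [
            # depth-wise 3x3: groups == channels, keeps the channel count
            nn.Conv2d(hidden, hidden, 3, stride, padding=1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # linear projection back down: no activation after this conv
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        ]
        self.block = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.block(x)
        return x + out if self.use_shortcut else out

# example: the stride-1 block keeps the shortcut, the stride-2 block does not
x = torch.randn(1, 32, 28, 28)
print(InvertedResidual(32, 32, stride=1, t=6)(x).shape)  # torch.Size([1, 32, 28, 28])
print(InvertedResidual(32, 64, stride=2, t=6)(x).shape)  # torch.Size([1, 64, 14, 14])
```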
