MobileNet network model (V1, V2, V3)

insert image description here
Highlights of the MobileNet network: DW convolution, plus two added hyperparameters: α, which controls the number of convolution kernels in each convolutional layer, and β, which controls the size of the input image. These two hyperparameters are set by hand, not learned. Batch Normalization (BN) is used to speed up training convergence and improve accuracy.
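As a minimal sketch (not from the original post), the following Python snippet shows how the width multiplier α might be applied to a layer's channel count; the helper name `scale_channels` and the simple rounding rule are assumptions made for illustration.

```python
def scale_channels(channels: int, alpha: float) -> int:
    """Apply the width multiplier alpha to a layer's channel count.

    Minimal sketch: real implementations usually also round the result
    to a multiple of 8 for hardware efficiency, which is omitted here.
    """
    return max(1, int(round(channels * alpha)))


# Example: a 64-channel layer under alpha = 0.75 becomes 48 channels.
print(scale_channels(64, 0.75))  # -> 48
```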
First, look at a traditional convolution: the number of channels of each convolution kernel must match the number of channels of the input feature matrix, and the number of channels of the output feature matrix equals the number of convolution kernels. In DW convolution, every convolution kernel has a depth of 1: each channel of the input feature matrix is convolved with its own kernel to produce one channel of the output feature matrix, so each kernel is responsible for exactly one channel.
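In PyTorch, a DW convolution can be sketched by setting `groups` equal to the number of input channels, so each kernel sees exactly one channel; the tensor sizes below are arbitrary example values.

```python
import torch
import torch.nn as nn

# Depthwise (DW) convolution sketch: groups=in_channels gives each input
# channel its own single-channel 3x3 kernel, so the output keeps the same
# number of channels as the input. Sizes here are example values only.
in_channels = 32
dw_conv = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                    padding=1, groups=in_channels, bias=False)

x = torch.randn(1, in_channels, 56, 56)   # dummy input feature matrix
print(dw_conv(x).shape)                   # torch.Size([1, 32, 56, 56])
```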
insert image description here

PW convolution is just an ordinary convolution whose kernel size is 1x1. The depth of each kernel equals the depth of the input feature matrix, and the depth of the output feature matrix equals the number of kernels. In practice, DW and PW convolutions are used together.
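A hedged PyTorch sketch of the DW + PW pairing described above; the class name, the BN + ReLU6 placement, and the default arguments are assumptions for illustration rather than the exact layers of the original post.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Sketch of the DW + PW pairing (MobileNetV1-style).

    Layer names and the BN + ReLU6 choices are assumptions for illustration.
    """

    def __init__(self, in_c: int, out_c: int, stride: int = 1):
        super().__init__()
        # DW: one kernel per input channel (groups=in_c), each kernel has depth 1.
        self.dw = nn.Conv2d(in_c, in_c, 3, stride=stride, padding=1,
                            groups=in_c, bias=False)
        self.bn1 = nn.BatchNorm2d(in_c)
        # PW: ordinary 1x1 convolution that mixes channels and sets the output depth.
        self.pw = nn.Conv2d(in_c, out_c, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_c)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.dw(x)))
        return self.act(self.bn2(self.pw(x)))


x = torch.randn(1, 32, 56, 56)
print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 56, 56])
```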
insert image description here
The input feature matrix of the PW convolution is the output feature matrix of the DW convolution, and the depth of the DW output equals the depth of the DW input. Let's compare the amount of computation of a traditional convolution with that of depthwise-separable convolution (DW + PW).
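Using the standard notation from the MobileNet paper, with $D_K$ the kernel size, $D_F$ the spatial size of the feature map, $M$ the input depth, and $N$ the output depth, the two costs compare as follows:

$$ D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F \quad \text{(traditional convolution)} $$

$$ D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F \quad \text{(DW + PW)} $$

Their ratio is $\frac{1}{N} + \frac{1}{D_K^2}$, so with a 3x3 kernel the depthwise-separable version needs roughly 8 to 9 times less computation.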
insert image description here
MobileNet model
insert image description here
MobileNetV2
insert image description here
Let's look at the inverted residual. An ordinary residual block first reduces the dimension with a 1x1 convolution, then applies a 3x3 convolution, and finally raises the dimension again with a 1x1 convolution.
An inverted residual block does the opposite: it first raises the dimension with a 1x1 convolution, then applies a 3x3 DW convolution, and finally reduces the dimension with a 1x1 convolution.
Note that the ordinary residual block uses the ReLU activation function, while the inverted residual block uses ReLU6.
insert image description here
With ordinary ReLU, any input value less than 0 is set to 0, and values greater than 0 are passed through unchanged. With ReLU6, input values less than 0 are set to 0, values in the interval from 0 to 6 are passed through unchanged, and values greater than 6 are clipped to 6.
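Written compactly (standard definitions, added here for reference):

$$ \text{ReLU}(x) = \max(0, x), \qquad \text{ReLU6}(x) = \min\bigl(\max(0, x),\ 6\bigr) $$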
insert image description here
In MobileNetV2, the last 1x1 convolutional layer of the inverted residual structure uses a linear activation function instead of ReLU, because ReLU causes a large loss of low-dimensional feature information. Since the inverted residual is thin at both ends and thick in the middle, its output is a low-dimensional feature vector, so a linear activation function is used instead of ReLU to avoid losing information.
insert image description here
Not every inverted residual structure in MobileNetV2 has a shortcut connection: the shortcut is only present when the stride is 1 and the input and output feature matrices have the same shape.
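A minimal PyTorch sketch of the inverted residual just described: 1x1 expansion with ReLU6, 3x3 DW convolution with ReLU6, 1x1 linear projection, and a shortcut only when the stride is 1 and the input and output shapes match. The class name, the `expand_ratio` argument, and the BN placement are assumptions for illustration.

```python
import torch
import torch.nn as nn


class InvertedResidual(nn.Module):
    """Sketch of a MobileNetV2-style inverted residual block.

    Names and defaults are assumptions for illustration only.
    """

    def __init__(self, in_c: int, out_c: int, stride: int, expand_ratio: int = 6):
        super().__init__()
        hidden = in_c * expand_ratio
        # Shortcut only when stride == 1 and input/output shapes match.
        self.use_shortcut = (stride == 1 and in_c == out_c)
        self.block = nn.Sequential(
            # 1x1 expansion convolution (raises the dimension), ReLU6
            nn.Conv2d(in_c, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution, ReLU6
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 projection convolution (reduces the dimension), linear activation
            nn.Conv2d(hidden, out_c, 1, bias=False),
            nn.BatchNorm2d(out_c),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out


x = torch.randn(1, 32, 56, 56)
print(InvertedResidual(32, 32, stride=1)(x).shape)  # shortcut used here
```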
insert image description here
MobileNetV2 model parameters
Here, bottleneck denotes the inverted residual structure.
insert image description here
MobileNetV3
insert image description here
SE Module: Attention Mechanism
The idea of the attention mechanism: for the given feature matrix, each channel is pooled down to a single value, and the resulting vector is passed through two fully connected layers to obtain the output vector. Note that the number of nodes in the first fully connected layer equals 1/4 of the number of channels of the feature matrix, while the number of nodes in the second fully connected layer equals the number of channels of the feature matrix.
insert image description here
Suppose the input feature matrix has 2 channels. Average pooling yields a vector with two values. The first fully connected layer should have 1/4 of the channel count, but here it has 2 nodes for the sake of the example, followed by ReLU. The second fully connected layer has as many nodes as the feature matrix has channels, i.e. 2, followed by the h-sigmoid function, which outputs, say, 0.5 and 0.6. Then 0.5 multiplies all the values of the first input channel and 0.6 multiplies all the values of the second input channel, giving the new channel data. This is the basic working of the SE module.
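A minimal PyTorch sketch of the SE module as described (average-pool each channel, a first FC layer with a 1/4 reduction, ReLU, a second FC layer back to the channel count, h-sigmoid, then per-channel scaling); the class and argument names are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SqueezeExcite(nn.Module):
    """SE module sketch: global average pool each channel, FC down to C/4,
    ReLU, FC back to C, h-sigmoid, then scale the input channels.
    Names and the reduction default are assumptions for illustration."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):
        b, c, _, _ = x.shape
        s = F.adaptive_avg_pool2d(x, 1).view(b, c)   # squeeze: one value per channel
        s = F.relu(self.fc1(s))                      # first FC (C -> C/4), ReLU
        s = F.hardsigmoid(self.fc2(s))               # second FC (C/4 -> C), h-sigmoid
        return x * s.view(b, c, 1, 1)                # scale each input channel


x = torch.randn(1, 2, 8, 8)                    # toy input with 2 channels, as above
print(SqueezeExcite(2, reduction=1)(x).shape)  # reduction=1 mirrors the 2-node example
```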
insert image description here
NL denotes a nonlinear activation function. Note that the 1x1 dimensionality-reduction convolutional layer does not use a nonlinear activation function; equivalently, it uses a linear activation function.
insert image description here
Redesign the time-consuming layer structure
insert image description here
![Insert picture description here](https://img-blog.csdnimg.cn/cfbbd74b785445d4a82d2dd9fc887d1d.png)
insert image description here
MobileNetV3 model parameters
insert image description here
Pay attention to the part in the blue box: its input and exp size are the same, which means no 1x1 convolution is used to expand the dimension there; the DW convolution is applied directly. The size written after bneck is the kernel size of the DW convolution.
exp size indicates how far the first 1x1 convolution raises the dimension: whatever exp size is given, the 1x1 convolution expands the channels of the feature matrix to that many dimensions. out is the number of channels after the 1x1 dimensionality-reduction convolution. SE indicates whether the attention mechanism is used, HS denotes the h-swish activation function, and RE denotes the ReLU activation function.
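For reference, the hard activations mentioned above (HS, and the h-sigmoid used in the SE module) are commonly defined as follows:

$$ \text{h-sigmoid}(x) = \frac{\text{ReLU6}(x + 3)}{6}, \qquad \text{h-swish}(x) = x \cdot \frac{\text{ReLU6}(x + 3)}{6} $$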
This concludes the introduction to the three versions of MobileNet; it is intended for learning and reference only.

Origin blog.csdn.net/Kirihara_Yukiho/article/details/128318197