Detailed explanation of MobileNet network

1、MobileNet

MobileNet is a lightweight CNN designed for mobile and embedded devices. Compared with traditional convolutional neural networks, it greatly reduces the number of parameters and the amount of computation at the cost of only a small drop in accuracy.
The key difference is between traditional convolution and DW (depthwise) convolution. In a traditional convolution, each kernel has as many channels as the input feature map, so every kernel is convolved with all channels of the input, and the number of output channels equals the number of kernels. In a DW convolution, each kernel has a single channel and is responsible for exactly one channel of the input, so the number of kernels = the number of input channels = the number of output channels. A short sketch of both is given below.
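As a minimal PyTorch sketch (not from the original post; the channel counts and feature-map size are arbitrary examples), the `groups` argument of `nn.Conv2d` is what turns a standard convolution into a DW convolution:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 32, 56, 56)  # example input: 32 channels, 56x56 spatial size

# Standard 3x3 convolution: every kernel spans all 32 input channels,
# and the number of kernels (64 here) sets the number of output channels.
standard = nn.Conv2d(32, 64, kernel_size=3, padding=1, bias=False)

# DW 3x3 convolution: groups=32 gives each kernel a single channel,
# so number of kernels = input channels = output channels = 32.
depthwise = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32, bias=False)

print(standard(x).shape)   # torch.Size([1, 64, 56, 56])
print(depthwise(x).shape)  # torch.Size([1, 32, 56, 56])
```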
PW (pointwise) convolution is an ordinary 1x1 convolution; its job is to change the number of channels of the output feature map. Because a DW convolution cannot change the channel count, it is usually followed by a PW convolution, and the pair is called a depthwise separable convolution (sketched below).
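A depthwise separable convolution is then just the two stacked. The sketch below is illustrative only; the BatchNorm and ReLU6 after each convolution follow common MobileNet implementations and are an assumption on my part, not something stated in the text above:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """DW 3x3 convolution followed by PW 1x1 convolution (illustrative sketch)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # DW: one kernel per input channel (groups=in_ch), channel count unchanged
        self.dw = nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                            groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        # PW: ordinary 1x1 convolution that changes the channel count to out_ch
        self.pw = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.dw(x)))
        return self.act(self.bn2(self.pw(x)))

# example: 32 -> 64 channels, spatial size preserved
print(DepthwiseSeparableConv(32, 64)(torch.randn(1, 32, 56, 56)).shape)
# torch.Size([1, 64, 56, 56])
```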
The gain in parameters and computation can be worked out directly: with a k x k kernel, M input channels and N output channels, a standard convolution needs k·k·M·N weights, while a DW convolution plus a PW convolution needs only k·k·M + M·N, i.e. roughly 1/N + 1/k² of the original amount.
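These numbers are easy to check by counting the parameters of the modules above (a quick verification of my own, using the same example channel counts as before):

```python
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

standard = nn.Conv2d(32, 64, 3, padding=1, bias=False)       # k=3, M=32, N=64
dw = nn.Conv2d(32, 32, 3, padding=1, groups=32, bias=False)  # depthwise part
pw = nn.Conv2d(32, 64, 1, bias=False)                        # pointwise part

print(n_params(standard))           # 18432 = 3*3*32*64
print(n_params(dw) + n_params(pw))  # 2336  = 3*3*32 + 32*64
print(2336 / 18432)                 # ~0.127, close to 1/64 + 1/9
```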

2、MobileNetV2

MobileNetV2 introduces the inverted residual structure.
The bottleneck residual block in ResNet goes 1x1 convolution to reduce the channels -> 3x3 convolution -> 1x1 convolution to expand the channels back. The inverted residual block does the opposite: 1x1 convolution to expand the channels -> 3x3 DW convolution -> 1x1 convolution to reduce the channels. ReLU6 (ReLU6(x) = min(max(x, 0), 6)) is used as the activation throughout the block, except that the last 1x1 convolution uses a linear activation, because ReLU causes a large loss of information in low-dimensional features.
Not every inverted residual block has a shortcut connection. A shortcut exists only when stride = 1 and the input feature map has the same shape as the output feature map, because only matrices of the same shape can be added element-wise (otherwise they could only be concatenated). Note that stride = 1 alone does not guarantee that the number of input channels equals the number of output channels, so both conditions must hold.
Inside the block, an input feature map of size h x w x k first passes through a 1x1 convolution with t·k kernels (t is the expansion factor of the first 1x1 convolution in the inverted residual block) and becomes h x w x (t·k); a 3x3 DW convolution with stride s then turns it into (h/s) x (w/s) x (t·k); finally a 1x1 convolution with k' kernels produces (h/s) x (w/s) x k'. A minimal sketch of this block is given below.
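Putting the pieces together, here is a minimal sketch of the inverted residual block described above (my own illustrative code, not the official torchvision implementation; the BatchNorm placement is assumed from common implementations):

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """1x1 expand (ReLU6) -> 3x3 DW (ReLU6) -> 1x1 project (linear)."""
    def __init__(self, in_ch, out_ch, stride, t):
        super().__init__()
        hidden = in_ch * t  # expansion factor t
        # shortcut only when stride == 1 and input/output shapes match
        self.use_shortcut = (stride == 1 and in_ch == out_ch)
        layers = []
        if t != 1:
            # 1x1 PW convolution that expands the channels from k to t*k
            layers += [nn.Conv2d(in_ch, hidden, 1, bias=False),
                       nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True)]
        layers += [
            # 3x3 DW convolution with stride s: spatial size becomes h/s x w/s
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # 1x1 PW projection to k' channels, linear (no ReLU6 here)
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        ]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return x + self.block(x) if self.use_shortcut else self.block(x)

x = torch.randn(1, 32, 28, 28)
print(InvertedResidual(32, 32, stride=1, t=6)(x).shape)  # shortcut used: [1, 32, 28, 28]
print(InvertedResidual(32, 64, stride=2, t=6)(x).shape)  # no shortcut:   [1, 64, 14, 14]
```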
Network structure diagram (figure in the original post).
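For reference, the bottleneck settings from the MobileNetV2 paper's network table (expansion factor t, output channels c, number of repeats n, stride s of the first repeat) can be written out as a plain list; stacking InvertedResidual blocks according to it reproduces the backbone shown in that figure. The constant name is my own:

```python
# MobileNetV2 bottleneck configuration (t, c, n, s), as given in the paper:
# t = expansion factor, c = output channels, n = repeats, s = stride of the
# first block in each group (the remaining repeats use stride 1).
INVERTED_RESIDUAL_SETTING = [
    # t,   c, n, s
    [1,   16, 1, 1],
    [6,   24, 2, 2],
    [6,   32, 3, 2],
    [6,   64, 4, 2],
    [6,   96, 3, 1],
    [6,  160, 3, 2],
    [6,  320, 1, 1],
]
```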

3、MobileNetV3

For MobileNetV3, see the blog post linked in the original article.


Origin: blog.csdn.net/qq_44708206/article/details/127644989