Understanding and Applying 1×1 Convolutions in Neural Networks

1×1 convolution

We often see 1×1 convolution kernels in various networks. They are a very practical tool, so why use a 1×1 kernel at all?
We can understand it from two angles: information fusion and dimension reduction.

Information fusion

[figure: a three-channel feature map fused into one channel by a 1×1 convolution]
A 1×1 convolution can fuse the information from multiple channels. For example, in the figure above, three channels are convolved down to one channel: the pixels at the same position in the different channels are combined by a weighted sum to produce each output pixel.
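This per-pixel weighted sum can be sketched in a few lines of NumPy. The input sizes and weights below are arbitrary example values, not taken from the article:

```python
import numpy as np

# A hypothetical 3-channel 4x4 feature map, laid out (channels, height, width).
x = np.arange(3 * 4 * 4, dtype=float).reshape(3, 4, 4)

# A 1x1 convolution with one output channel is just one weight per input channel.
w = np.array([0.2, 0.5, 0.3])  # assumed example weights

# Apply the 1x1 convolution: a weighted sum over the channel axis at every pixel.
out = np.tensordot(w, x, axes=([0], [0]))  # shape (4, 4)

# Equivalent per-pixel view: out[i, j] = sum_c w[c] * x[c, i, j]
manual = w[0] * x[0] + w[1] * x[1] + w[2] * x[2]
assert np.allclose(out, manual)
```

The spatial layout is untouched; only the channel dimension is mixed, which is exactly the "fusion" role described above.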
In MobileNet, convolutions are split into depthwise and pointwise operations:
[figure: depthwise and pointwise convolution in MobileNet]

  • Depthwise: each channel forms its own group with a dedicated convolution kernel, so each kernel convolves only its own channel. Compared with an ordinary convolution, the amount of computation is greatly reduced.
  • Pointwise: because the channels are related to each other, performing only the depthwise step loses cross-channel information. A 1×1 convolution is used to fuse the information across the independently computed channels.
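To see the savings concretely, here is a rough multiplication count for a standard convolution versus a depthwise-plus-pointwise pair. The layer sizes (3×3 kernel, 32 → 64 channels, 56×56 output map) are arbitrary example values, not taken from the article:

```python
# Multiplication counts for a standard conv vs. a depthwise separable conv.
k, c_in, c_out, h, w = 3, 32, 64, 56, 56  # assumed example sizes

standard = k * k * c_in * c_out * h * w   # every kernel sees every input channel

depthwise = k * k * c_in * h * w          # one kernel per channel, no mixing
pointwise = 1 * 1 * c_in * c_out * h * w  # 1x1 conv fuses the channels
separable = depthwise + pointwise

print(standard, separable, round(standard / separable, 2))
```

For these sizes the separable version needs roughly 8× fewer multiplications; in general the ratio is about 1 / (1/c_out + 1/k²).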

Dimension reduction (reducing computation)

[figure: reducing channels with a 1×1 convolution before a 5×5 convolution]

As shown in the figure above, if we apply a 5×5 convolution directly, the number of multiplications is kernel_size² × in_channels × out_channels × Wout × Hout = 120,422,400.

If we first use a 1×1 convolution to reduce the number of channels to 16 and then apply the 5×5 convolution, the total computation drops to roughly one-tenth of that.

From this we can see that a 1×1 convolution can greatly reduce the cost of the convolutions that follow it.
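The arithmetic can be checked directly. The sizes below (192 input channels, a 28×28 feature map, 32 output channels) are an assumption inferred from the 120,422,400 figure, since the original figure is not reproduced here:

```python
# Assumed layer sizes consistent with the article's 120,422,400 total.
c_in, h, w, c_out = 192, 28, 28, 32

# Direct 5x5 convolution (same-size output assumed):
direct = 5 * 5 * c_in * c_out * h * w
print(direct)  # 120422400

# 1x1 bottleneck down to 16 channels, then the 5x5 convolution:
c_mid = 16
bottleneck = 1 * 1 * c_in * c_mid * h * w + 5 * 5 * c_mid * c_out * h * w
print(bottleneck)  # 12443648, roughly one tenth of the direct cost
```

The bottleneck pays a small extra cost for the 1×1 layer but makes the expensive 5×5 layer operate on far fewer channels.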

Applications: the 1×1 convolutions inside each Inception module in GoogLeNet, the bottleneck blocks in ResNet, and the pointwise convolutions in MobileNet all use this dimension-reduction trick,
as shown in the figure below (one Inception module from GoogLeNet):
[figure: an Inception module in GoogLeNet]


Origin: blog.csdn.net/gary101818/article/details/124573364