Convolution (CNN) | Deconvolution (transposed convolution) | How feature-map channels change during convolution | Dimensionality reduction/expansion with 1×1 convolution kernels

These small deep-learning concepts are covered only in scattered resources online, so here is my own summary for later review. I will borrow some pictures, but the references are marked!


1. Convolution

For convolution, the following picture says it all: the convolution kernel slides over the feature map according to the stride, computing one output value at each position.¹

[Figure: a convolution kernel sliding over a feature map]
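The sliding-window operation in the figure can be sketched in plain Python. This is a minimal valid (no-padding) convolution; the function name and the sample numbers are my own illustration, not from the original post:

```python
def conv2d(feature_map, kernel, stride=1):
    """Valid (no-padding) 2D convolution: slide the kernel over the
    input with the given stride and take an element-wise dot product
    at each position."""
    h, w = len(feature_map), len(feature_map[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(0, h - kh + 1, stride):
        row = []
        for j in range(0, w - kw + 1, stride):
            # multiply the current window by the kernel, then sum
            s = sum(feature_map[i + m][j + n] * kernel[m][n]
                    for m in range(kh) for n in range(kw))
            row.append(s)
        out.append(row)
    return out

# A 4x4 input convolved with a 3x3 kernel at stride 1 gives a
# (4 - 3)/1 + 1 = 2, i.e. a 2x2 output.
x = [[1, 2, 3, 0],
     [0, 1, 2, 3],
     [3, 0, 1, 2],
     [2, 3, 0, 1]]
k = [[1, 0, 1],
     [0, 1, 0],
     [1, 0, 1]]
y = conv2d(x, k, stride=1)  # -> [[9, 6], [4, 9]]
```

The output size follows the usual formula (H − K) / stride + 1 in each spatial dimension.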


2. Deconvolution

Two images are enough to explain deconvolution clearly:

[Figure: deconvolution, step 1 — generating the 4 partial feature maps]

Note: the green square in the figure above is the deconvolution kernel, and we obtain 4 partial feature maps from it.

Superimpose the 4 feature maps as shown in the figure below to get the final feature map. It is worth noting that the 3×3 input has indeed become a 4×4 output.

[Figure: deconvolution, step 2 — superimposing the 4 feature maps into the final 4×4 output]
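The two steps above can be sketched directly: each kernel weight produces one shifted, scaled copy of the input (the 4 partial feature maps), and the copies are summed. This is a minimal stride-1 transposed convolution; the function name and sample values are my own, chosen so that a 3×3 input and a 2×2 kernel give a 4×4 output as in the figures:

```python
def transposed_conv2d(feature_map, kernel):
    """Stride-1 transposed convolution (deconvolution): every kernel
    weight produces a shifted, scaled copy of the input; the copies
    are superimposed (summed) into the output."""
    h, w = len(feature_map), len(feature_map[0])
    kh, kw = len(kernel), len(kernel[0])
    # output grows to (H + KH - 1) x (W + KW - 1)
    out = [[0] * (w + kw - 1) for _ in range(h + kh - 1)]
    for m in range(kh):
        for n in range(kw):
            # one partial feature map: the whole input scaled by
            # kernel[m][n], placed at offset (m, n), then accumulated
            for i in range(h):
                for j in range(w):
                    out[i + m][j + n] += feature_map[i][j] * kernel[m][n]
    return out

x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
k = [[1, 0],
     [0, 1]]
y = transposed_conv2d(x, k)  # 3x3 input -> 4x4 output
```

Here the output is the original input placed at offset (0, 0) plus another copy at offset (1, 1), so the overlap region is the sum of the two shifted copies.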


3. Changes in channels during convolution

We keep talking about deep learning, but what exactly is "depth"? On the one hand it refers to the number of layers in the network (how many convolutional layers there are); on the other hand it also includes the number of channels of the feature maps. So how exactly do the channels change from one convolution to the next?

The figure below makes it clear.²

[Figure: a multi-channel convolution — input volume, filters, and output feature maps]

During convolution, each filter has exactly as many channels as the input feature map (in the figure, the input and the filter have the same thickness; in other words, the depth of a convolution kernel must equal the number of input channels). The number of filters, however, is arbitrary: each filter produces one output feature map, so the number of filters determines the number of channels of the output. This is how we control the output channels — by choosing how many convolution kernels (filters) to use.

Q1: With so many input channels, why does a single convolution kernel output only one feature map?

A1: Because the per-channel results are accumulated. For example, with 10 input channels, the convolution kernel also has 10 layers; the 10 per-channel results are summed into a single feature map. Accordingly, if we convolve with 20 convolution kernels (filters), we get 20 output feature maps.

[Figure: per-channel results being summed into one output feature map]
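The question and answer above can be sketched as a full convolutional layer. This is an illustrative pure-Python version (the names `conv2d` and `conv_layer` are my own): each filter carries one kernel per input channel, the per-channel convolutions are summed into a single map, and each filter has its own bias, so F filters yield F output channels:

```python
def conv2d(fm, k):
    """Single-channel valid convolution (helper)."""
    h, w, kh, kw = len(fm), len(fm[0]), len(k), len(k[0])
    return [[sum(fm[i + m][j + n] * k[m][n]
                 for m in range(kh) for n in range(kw))
             for j in range(w - kw + 1)] for i in range(h - kh + 1)]

def conv_layer(channels, filters, biases):
    """channels: C input maps; filters: F filters, each a list of C
    kernels; biases: one bias per filter (not per kernel).
    Each filter's C per-channel convolutions are summed into ONE map,
    so the layer outputs F feature maps."""
    out = []
    for f, b in zip(filters, biases):
        maps = [conv2d(ch, k) for ch, k in zip(channels, f)]
        h, w = len(maps[0]), len(maps[0][0])
        out.append([[sum(m[i][j] for m in maps) + b for j in range(w)]
                    for i in range(h)])
    return out

ones3 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
channels = [ones3, ones3, ones3]      # C = 3 input channels, 3x3 each
f1 = [[[1, 1], [1, 1]]] * 3           # filter 1: three all-ones 2x2 kernels
f2 = [[[1, 0], [0, 0]]] * 3           # filter 2: three 2x2 kernels
out = conv_layer(channels, [f1, f2], biases=[0, 1])
# 3 input channels -> 2 output channels, because there are 2 filters
```

Each per-channel conv of filter 1 gives a 2×2 map of 4s, and summing the 3 channels gives 12s; filter 2 sums to 3s, plus its bias of 1, giving 4s.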

It is worth noting that when many convolution kernels are used, each convolution kernel (filter) has its own bias³ — one bias per filter, added after the per-channel results are summed.


4. Dimensionality reduction/expansion with the 1×1 convolution kernel

If the input and output of a convolution were single planes (one channel), a 1×1 convolution kernel would be meaningless: it never looks at a pixel's neighbors at all. But the input and output are actually 3D volumes, so a 1×1 convolution performs a linear combination (information integration) of the channels at each pixel. It preserves the original spatial structure of the image while adjusting the depth, which is how it achieves dimensionality expansion or reduction.⁴

As shown in the figure below, a 1×1 convolutional layer with 2 filters reduces the depth from the original 3 to 2; with 4 filters, it would instead increase the depth to 4.

[Figure: a 1×1 convolution changing the depth from 3 to 2]
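The per-pixel channel mixing described above can be sketched as follows. Each 1×1 filter is just a weight vector of length C, applied independently at every pixel; the function name and sample values are my own illustration:

```python
def conv1x1(channels, filters):
    """channels: C input maps of size HxW; filters: F weight vectors,
    each of length C. A 1x1 convolution linearly combines the channels
    at each pixel independently: spatial size is unchanged, but the
    depth changes from C to F."""
    h, w = len(channels[0]), len(channels[0][0])
    return [[[sum(wgt * channels[c][i][j] for c, wgt in enumerate(f))
              for j in range(w)]
             for i in range(h)]
            for f in filters]

# 3 input channels of size 2x2 (constant maps of 1s, 2s, and 3s)
channels = [[[1, 1], [1, 1]],
            [[2, 2], [2, 2]],
            [[3, 3], [3, 3]]]
# 2 filters -> output depth drops from 3 to 2
filters = [[1, 1, 1],    # sums the channels: 1 + 2 + 3 = 6 at each pixel
           [1, 0, -1]]   # channel difference: 1 - 3 = -2 at each pixel
out = conv1x1(channels, filters)
```

With 4 weight vectors instead of 2, the same function would expand the depth from 3 to 4.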


  1. https://blog.csdn.net/weixin_39326879/article/details/120797857 ↩︎

  2. https://blog.csdn.net/briblue/article/details/83063170 ↩︎

  3. https://blog.csdn.net/weixin_47125742/article/details/118788914 ↩︎

  4. https://zhuanlan.zhihu.com/p/27642620 ↩︎


Origin blog.csdn.net/myf_666/article/details/129265038