2020-12-09 Deep learning: convolution kernel/filter, feature map, convolutional layer

Concept learning: convolution kernel/filter, feature map, convolutional layer

As an introduction, it is recommended to read the electronic edition of:

Michael Nielsen (USA), Neural Networks and Deep Learning, translated into Chinese by Xiaohu Zhu and Freeman Zhang.

This book explains the basic principles of neural networks very clearly.

 

The terms feature map and channel sometimes refer to the same thing; at other times, when the emphasis is on a layer's input and output, the term channel is used, while the features obtained after an image has been processed by the neural network are called feature maps. In the discussion that follows we will not draw a sharp distinction between the two: the feature maps output by one layer can be understood as the channels input to the next layer.

 

(1) Convolution kernel/filter

The convolution kernel is also called a filter.

Each convolution kernel has three dimensions: length, width, and depth.

The length and width of the convolution kernel are specified manually; length × width is also called the size of the convolution kernel. Commonly used sizes are 3×3, 5×5, etc.

When specifying a convolution kernel, you only need to give its length and width. This is because the depth of the kernel (which can also be understood as its number of channels) is usually taken to be the same as the depth of the current input (its number of feature maps; for example, the R, G, and B channels of a color image are its three feature maps).

During convolution, each filter has as many channels as the input layer has feature maps (i.e., the kernel depth equals the number of input channels). The number of filters, however, is arbitrary, and it determines the number of output channels after the convolution (i.e., the number of output feature maps).
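This relationship can be checked directly; below is a minimal PyTorch-style sketch (the layer sizes are illustrative, not from the original post):

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)    # one RGB image: 3 input channels (depth 3)
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)
print(conv.weight.shape)         # torch.Size([16, 3, 3, 3]): 16 filters, each of depth 3
print(conv(x).shape)             # torch.Size([1, 16, 30, 30]): 16 output feature maps

The kernel depth (3) is fixed by the input, while the filter count (16) is a free choice that sets the number of output channels.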

In many commonly used architectures, the number of filters increases as the network gets deeper (for example, 64 in the second convolutional layer, 128 in the third, and so on).

 

(2) Feature map

Input layer: In the input layer, if it is a grayscale image, there is only one feature map (one channel); if it is a color image, there are generally three feature maps (red, green and blue).

Other layers: between layers there are a number of convolution kernels (also called filters). The feature maps (channels) of the previous layer are convolved with each kernel, and each kernel produces one feature map of the next layer; with N convolution kernels, the next layer will have N feature maps (i.e., N output channels).
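A quick shape check of both points (grayscale vs. color input, and N kernels producing N output maps); the image sizes and the kernel count N = 8 here are illustrative:

import torch
import torch.nn as nn

gray = torch.randn(1, 1, 28, 28)   # grayscale input: one feature map
rgb = torch.randn(1, 3, 28, 28)    # color input: three feature maps (R, G, B)

# N = 8 kernels produce 8 output feature maps, whatever the input depth
print(nn.Conv2d(1, 8, kernel_size=3)(gray).shape)   # torch.Size([1, 8, 26, 26])
print(nn.Conv2d(3, 8, kernel_size=3)(rgb).shape)    # torch.Size([1, 8, 26, 26])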

 

(3) Convolutional layer

Many convolutional architectures begin with an initial convolution unit that maps the 3-channel RGB input image onto a series of internal filters. In a deep learning framework (sketched here in PyTorch style), the code might look like this:

 

import torch.nn as nn
import torch.nn.functional as F

# image: a (batch, 3, height, width) tensor holding the RGB input
out_1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=(3, 3), stride=(1, 1))(image)
relu_out = F.relu(out_1)
pool_out = F.max_pool2d(relu_out, kernel_size=(2, 2), stride=2)

 

For the input image, 32 filters are used here, each of size 3×3 and with a stride of 1.

The figure below shows all of the operations in the code snippet above:

 

In the figure, each of the 32 filters (i.e., Filter-1, Filter-2, ...) actually contains a set of three two-dimensional kernels (Wt-R, Wt-G, and Wt-B; that is, the depth is 3). Each of these two-dimensional kernels is reserved for one of the red (R), green (G), and blue (B) channels of the input image.

During forward propagation, the R, G, and B pixel values of the input image are convolved with the Wt-R, Wt-G, and Wt-B kernels respectively, producing three intermediate activation maps (not shown in the figure). The outputs of the three kernels are then added together to produce one activation map per filter, 32 in total.
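That summation over channels can be verified numerically. The sketch below (using torch.nn.functional.conv2d with a single filter and illustrative sizes) convolves each channel with its own two-dimensional kernel and adds the three results:

import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)    # one RGB image
w = torch.randn(1, 3, 3, 3)    # one filter holding Wt-R, Wt-G, Wt-B (depth 3)

full = F.conv2d(x, w)          # the filter's activation map, shape (1, 1, 6, 6)

# per-channel convolutions, then summed across R, G, B
per_channel = sum(F.conv2d(x[:, c:c+1], w[:, c:c+1]) for c in range(3))
print(torch.allclose(full, per_channel))   # True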

Subsequently, each of these 32 activation maps must pass through the ReLU function and finally through a max-pooling layer (which some architectures omit); the pooling layer is mainly responsible for reducing the dimensions of the output activation maps (i.e., shrinking the length × width; note that the stride used here is 2). In the end we get a set of 32 activation maps whose dimensions are half those of the input image (i.e., 32 feature maps, each only half the size of the input).
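The halving can be confirmed with a standalone shape check (224 × 224 is an illustrative input size; the result is only approximately half, since the unpadded 3×3 convolution also trims a pixel from each side):

import torch
import torch.nn as nn
import torch.nn.functional as F

image = torch.randn(1, 3, 224, 224)
out_1 = nn.Conv2d(3, 32, kernel_size=3, stride=1)(image)         # (1, 32, 222, 222)
pool_out = F.max_pool2d(F.relu(out_1), kernel_size=2, stride=2)
print(pool_out.shape)   # torch.Size([1, 32, 111, 111]): 32 maps, about half of 224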

The output from the convolutional layer is often used as the input for subsequent convolutional layers. Therefore, if our second convolution unit is as follows:

 

conv_out_2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)(relu_out)  # kernel_size is required; 3 is an assumed, common choice

 

This convolution unit has 64 filters, and each filter uses a set of 32 unique kernels (each kernel corresponds to one channel, i.e., one feature map, output by the previous convolution layer; 32 feature maps require 32 kernels, so the depth is 32).
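The weight tensor of this layer makes that depth explicit; a small sketch (again with the assumed kernel_size=3):

import torch.nn as nn

conv_2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3)
print(conv_2.weight.shape)   # torch.Size([64, 32, 3, 3]): 64 filters, each a stack of 32 two-dimensional kernels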

For the calculation process of the Conv2d() convolution function with relatively simple parameters, see:

https://blog.csdn.net/weixin_41943311/article/details/94570067

 

(4) Batch Normalization

Normalization means standardizing the data, and Batch refers to a mini-batch; together, batch normalization standardizes each feature map using the mean and variance computed over the current mini-batch.
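A minimal PyTorch-style sketch, matching the 32 feature maps above (the batch size and map size are illustrative):

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=32)   # one learned scale/shift pair per feature map
x = torch.randn(16, 32, 14, 14)        # a mini-batch of 16 activations, 32 feature maps each
y = bn(x)   # per channel: (x - batch mean) / sqrt(batch variance + eps), then scale and shift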


Origin: blog.csdn.net/qingfengxd1/article/details/110928299