Convolution principles (convolution, padding, stride, multi-channel convolution)

1. Convolution

Convolution is used to extract features from input data. Feature extraction can be understood as weighting the input values with a convolution kernel so that the important information in the input is emphasized.

In a convolution operation, the convolution kernel scans across the input matrix. At each position, the kernel's elements are multiplied by the input elements they cover, and the products are summed to give one output element. Sliding the kernel over every position produces the final output matrix. The process is as follows:

b3056ceae69149ff90ec63b30116547e.png

As the operation shows, each element of the output feature is a weighted sum of the input elements under the kernel, with the kernel values acting as the weights.
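The sliding multiply-and-add described above can be sketched in plain Python. This is a minimal illustration, not an optimized implementation: the function name `conv2d_valid` and the input and kernel values are assumptions chosen for the example, and, as in the description above, the kernel is applied without being flipped.

```python
def conv2d_valid(x, k):
    """Slide kernel k over input x (no padding, step of 1) and return the output matrix."""
    n_rows, n_cols = len(x), len(x[0])
    f_rows, f_cols = len(k), len(k[0])
    out = []
    for i in range(n_rows - f_rows + 1):
        row = []
        for j in range(n_cols - f_cols + 1):
            # Weighted sum of the window whose top-left corner is (i, j)
            acc = 0
            for a in range(f_rows):
                for b in range(f_cols):
                    acc += x[i + a][j + b] * k[a][b]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 5x5 input and 3x3 kernel (values chosen only for illustration)
x = [[1, 2, 3, 4, 5],
     [6, 7, 8, 9, 10],
     [11, 12, 13, 14, 15],
     [16, 17, 18, 19, 20],
     [21, 22, 23, 24, 25]]
k = [[1, 0, -1],
     [1, 0, -1],
     [1, 0, -1]]

y = conv2d_valid(x, k)
print(y)  # a 3x3 matrix; every entry is -6 for this input
```

Each window subtracts its right column from its left column. Because the values in this input rise by 1 per column, every row of a window contributes -2, so every output element is -6.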

2. Padding

In the above example, a 5×5 input feature matrix is convolved with a 3×3 convolution kernel. The calculation reveals two shortcomings.

First, the output matrix is smaller than the input matrix. In a multi-layer neural network the matrix shrinks with every layer, which is bad for information extraction.

Second, elements in the middle of the input take part in far more operations than elements near the border, and the gap grows as the input matrix gets larger, so part of the edge information is lost in the calculation.

In order to solve the above two problems, we pad around the input matrix.
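The second problem can be checked by counting how many kernel windows touch each input element. The sketch below is only an illustration: the helper name `coverage_counts` is hypothetical, and it assumes a square n×n input, a square f×f kernel, and a step of 1.

```python
def coverage_counts(n, f, p=0):
    """Count how many kernel windows touch each element of an n x n input padded by p."""
    size = n + 2 * p
    counts = [[0] * size for _ in range(size)]
    for i in range(size - f + 1):
        for j in range(size - f + 1):
            # Every element under the window at (i, j) is used once more
            for a in range(f):
                for b in range(f):
                    counts[i + a][j + b] += 1
    # Drop the padding border so only the original n x n elements remain
    return [row[p:p + n] for row in counts[p:p + n]]


no_pad = coverage_counts(5, 3)        # 5x5 input, 3x3 kernel, no padding
padded = coverage_counts(5, 3, p=1)   # same input with one layer of padding

print(no_pad[0][0], no_pad[2][2])     # corner vs. center without padding: 1 9
print(padded[0][0], padded[2][2])     # corner vs. center with padding:    4 9
```

Without padding, a corner element is used once while the center is used nine times. With one layer of padding, the corner is used four times, so edge information contributes more to the output.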

7b05ead187d64562b48be170353ff34a.png

Without padding, if the size of the input matrix is n×n and the size of the convolution kernel is f×f, the size of the output matrix is (n-f+1)×(n-f+1). For the 5×5 input and 3×3 kernel above, the output is 3×3.

As shown in the figure above, we add one layer of elements around the border of the input matrix. Let p denote the amount of padding; in the figure, p is 1.

With padding p, the size of the output matrix is (n+2p-f+1)×(n+2p-f+1).

Depending on the amount of padding, two cases are common: Valid convolution and Same convolution.

Valid convolution: the input matrix is not padded, that is, p=0. The output matrix size is (n-f+1)×(n-f+1).

Same convolution: the output matrix is the same size as the input matrix. That is, n+2p-f+1=n, which gives p=(f-1)/2.
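The two cases can be compared with a few lines of arithmetic. The sketch below is illustrative: the helper names are hypothetical, and filling the border with zeros is an assumption, since the text does not say which value is used for padding. Note that p=(f-1)/2 is an integer only when f is odd, so the helper rejects even kernel sizes.

```python
def output_size(n, f, p=0):
    """Output side length for an n x n input, f x f kernel, padding p, step of 1."""
    return n + 2 * p - f + 1


def same_padding(f):
    """Padding that keeps the output as large as the input: p = (f - 1) / 2."""
    if f % 2 == 0:
        raise ValueError("p = (f - 1) / 2 is not an integer for even f")
    return (f - 1) // 2


def zero_pad(x, p):
    """Surround matrix x with p layers of zeros (zero fill is an assumption)."""
    width = len(x[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]
    for row in x:
        padded.append([0] * p + list(row) + [0] * p)
    padded.extend([0] * width for _ in range(p))
    return padded


n, f = 5, 3
valid_size = output_size(n, f, p=0)                 # Valid convolution
same_size = output_size(n, f, p=same_padding(f))    # Same convolution
padded = zero_pad([[1] * n for _ in range(n)], same_padding(f))

print(valid_size, same_size, len(padded))  # 3 5 7
```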

3. Convolution stride

In the above example of convolving a 5×5 matrix with a 3×3 convolution kernel, the stride (step size) s is 1. If the stride s is set to 2, then:

8a3d3d1742d34585907a8f9a04219860.png

When the stride is s and the padding is p, the size of the output matrix is:

n×n       *       f×f       --->       ⌊(n+2p-f)/s + 1⌋ × ⌊(n+2p-f)/s + 1⌋

Here ⌊ ⌋ denotes rounding down to the nearest integer.
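The size formula can be checked against a direct implementation. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the input and kernel are square, and the padding is filled with zeros. Integer division is used because ⌊(n+2p-f)/s + 1⌋ equals ⌊(n+2p-f)/s⌋ + 1.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output side length: floor((n + 2p - f) / s + 1)."""
    return (n + 2 * p - f) // s + 1


def conv2d(x, k, p=0, s=1):
    """Convolve square input x with square kernel k using zero padding p and step s."""
    n, f = len(x), len(k)
    size = n + 2 * p
    # Copy the input into the middle of a zero-filled (n + 2p) x (n + 2p) matrix
    xp = [[0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            xp[i + p][j + p] = x[i][j]
    out_n = conv_output_size(n, f, p, s)
    out = []
    for i in range(out_n):
        row = []
        for j in range(out_n):
            top, left = i * s, j * s  # the window moves s elements per step
            row.append(sum(xp[top + a][left + b] * k[a][b]
                           for a in range(f) for b in range(f)))
        out.append(row)
    return out


# Hypothetical 5x5 input holding 1..25 and a 3x3 kernel of ones
x = [[5 * r + c + 1 for c in range(5)] for r in range(5)]
k = [[1] * 3 for _ in range(3)]

y = conv2d(x, k, p=0, s=2)
print(y)                                  # [[63, 81], [153, 171]]
print(len(conv2d(x, k, p=1, s=2)))        # 3
```

With s=2 and no padding, the 5×5 input yields a 2×2 output, matching ⌊(5-3)/2 + 1⌋ = 2. Adding p=1 gives ⌊(5+2-3)/2 + 1⌋ = 3.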

4. Multi-channel convolution

Suppose the input data is multi-channel, such as a color image with three RGB channels. For multi-channel input, the number of channels in the convolution kernel must equal the number of channels in the input data. As shown in the figure below, the input data has three channels, so the convolution kernel also has three channels.

 31e3c565545d40d6b63a57b90aae65e2.png

In three-channel convolution, each channel of the convolution kernel convolves the matching channel of the input matrix, and the three results are then added together element by element. The calculation for the three-channel convolution above is as follows (to keep the calculation simple, the three input channels are set to be identical):

dd9ada3422574aa8acc78f9bab321e2a.png

The figure above shows convolution with a single convolution kernel. With two convolution kernels, the process is as follows:

e492efd163fb448abf995cd19df4b1a1.png

Each convolution kernel produces one output channel, so with 2 convolution kernels the output matrix has 2 channels.

Assuming the input matrix has n1 channels and there are n2 convolution kernels, then with no padding and a stride of 1 the output matrix is:

n×n×n1       *       f×f×n1      --->         (n-f+1)×(n-f+1)×n2
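The shape rule above can be sketched in plain Python. This is an illustration under assumptions: the name `conv_multi` is hypothetical, there is no padding, the kernel moves one element at a time, and the data is stored channel-first (a list of n1 matrices) purely for convenience, whereas the text writes the channel count last. As in the worked example, the three input channels are set to be identical.

```python
def conv_multi(x, kernels):
    """x: n1 channels of n x n. kernels: n2 kernels, each n1 channels of f x f.
    Returns n2 output channels of (n - f + 1) x (n - f + 1)."""
    n1, n = len(x), len(x[0])
    f = len(kernels[0][0])
    out_n = n - f + 1
    outputs = []
    for kern in kernels:
        if len(kern) != n1:
            raise ValueError("kernel channels must match input channels")
        channel_out = [[0] * out_n for _ in range(out_n)]
        for c in range(n1):
            # Convolve channel c of the input with channel c of the kernel
            # and accumulate into the same output matrix (sum over channels)
            for i in range(out_n):
                for j in range(out_n):
                    for a in range(f):
                        for b in range(f):
                            channel_out[i][j] += x[c][i + a][j + b] * kern[c][a][b]
        outputs.append(channel_out)  # one output channel per kernel
    return outputs


# Hypothetical 4x4 plane, repeated for the three input channels
plane = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
x = [plane, plane, plane]

ones = [[1] * 3 for _ in range(3)]
zeros = [[0] * 3 for _ in range(3)]
kernels = [[ones, ones, ones],     # kernel 1: sums all three channels
           [ones, zeros, zeros]]   # kernel 2: looks at the first channel only

out = conv_multi(x, kernels)
print(len(out), len(out[0]), len(out[0][0]))  # 2 2 2
print(out[0])  # [[162, 189], [270, 297]]
print(out[1])  # [[54, 63], [90, 99]]
```

Here n=4, f=3, n1=3 and n2=2, so the output is 2×2×2: one (n-f+1)×(n-f+1) matrix per kernel.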

Origin blog.csdn.net/m0_45267220/article/details/128985131