Deep learning filtering and padding

Filtering and Padding

The original picture is n × n. After convolving with an f × f filter, the output becomes (n − f + 1) × (n − f + 1); f is usually odd. If there is a stride s, the output size is ⌊(n − f)/s + 1⌋ × ⌊(n − f)/s + 1⌋.

This raises two problems:

  • Each convolution operation shrinks the image.
  • Pixels at the edge of the original picture contribute to only a few outputs, so the output loses edge information.

To solve the shrinking problem, the original image is expanded with padding: the added border is filled with zeros, and p denotes the padding width.


After padding, the original picture becomes (n + 2p) × (n + 2p).

So to ensure that the image size before and after convolution stays the same, we need p = (f − 1)/2.

Stride denotes how far the filter moves at each step, horizontally and vertically, across the original image. Previously we defaulted to stride = 1. If stride = 2, the filter shifts by two pixels at each step, i.e. it skips every other position.


We use s to denote the stride and p the padding width. If the original image is n × n and the filter is f × f, the image size after convolution is:

⌊(n + 2p − f)/s + 1⌋ × ⌊(n + 2p − f)/s + 1⌋
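To sanity-check this formula, here is a small Python sketch (the helper name conv_output_size is just an illustrative choice):

```python
import math

def conv_output_size(n, f, p=0, s=1):
    """Output width/height of an n x n input convolved with an f x f filter,
    using padding p and stride s: floor((n + 2p - f) / s) + 1."""
    return math.floor((n + 2 * p - f) / s) + 1

# No padding, stride 1: 6x6 input, 3x3 filter -> 4x4 output (n - f + 1).
print(conv_output_size(6, 3))            # 4

# Padding p = (f - 1) / 2 keeps the size unchanged.
print(conv_output_size(6, 3, p=1))       # 6

# Padding 1 and stride 2 on a 7x7 input with a 3x3 filter.
print(conv_output_size(7, 3, p=1, s=2))  # 4
```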

It is worth mentioning that there is a difference between cross-correlation and convolution. A true convolution first rotates the filter 180 degrees around its center and then slides the rotated filter over the original picture. The filter rotation is illustrated below:

[Figure: the filter rotated 180° about its center]

In fact, the CNN "convolution" introduced so far actually computes a cross-correlation, not a convolution in the mathematical sense. However, to simplify the computation, this cross-correlation is conventionally called a convolution in CNNs. The two are effectively equivalent here: common filter operators are horizontally or vertically symmetric, so the 180-degree rotation has little effect; and since the filter weights are ultimately learned by the CNN's gradient descent algorithm, the rotation can be regarded as absorbed into the learned parameters. In general, omitting the rotation speeds up the CNN computation without affecting the performance of the model.
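The relationship can be verified numerically: cross-correlating with a kernel equals convolving with that kernel flipped 180°. A small NumPy/SciPy sketch (the 6 × 6 example image and the vertical-edge kernel are arbitrary choices for illustration):

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

x = np.arange(36, dtype=float).reshape(6, 6)   # arbitrary 6x6 "image"
k = np.array([[1., 0., -1.],
              [1., 0., -1.],
              [1., 0., -1.]])                   # vertical edge detector

# What CNNs compute ("convolution" in deep-learning usage) is cross-correlation.
cnn_style = correlate2d(x, k, mode='valid')

# True mathematical convolution flips the kernel 180 degrees first.
true_conv = convolve2d(x, k, mode='valid')

# Cross-correlating with the flipped kernel gives the true convolution.
flipped = correlate2d(x, np.flip(k), mode='valid')
print(np.allclose(true_conv, flipped))          # True
```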

Convolution (in the strict mathematical sense, with the flipped kernel) obeys the associative law:
(A ∗ B) ∗ C = A ∗ (B ∗ C)
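A quick numerical check of associativity for true (flipped-kernel) convolution, again with arbitrary small arrays:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))

# 'full' convolution keeps every overlap, so both groupings end up the same shape.
left = convolve2d(convolve2d(A, B, mode='full'), C, mode='full')
right = convolve2d(A, convolve2d(B, C, mode='full'), mode='full')
print(np.allclose(left, right))  # True (up to floating-point error)
```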

Convolutions Over Volume

For a 3-channel RGB picture, the corresponding filter is also 3-channel. For example, a 6 × 6 × 3 picture has height 6, width 6, and 3 channels (#channels).

The convolution of a 3-channel picture is basically the same as that of a single-channel picture: convolve each channel (R, G, B) with the corresponding channel of the filter, then sum the three channel results to obtain one pixel value of the output image.


The filters for the different channels may differ. For example, the R-channel filter may perform vertical edge detection while the G- and B-channel filters are all zeros, or the R, G, and B channel filters may all be set to horizontal edge detection.

To perform several different convolutions and detect more kinds of edges, more filters can be added. For example, the first filter can be set to detect vertical edges and the second to detect horizontal edges. Different filters then produce different outputs, and the number of output channels is determined by the number of filters.


If the input picture has size n × n × n_c and each filter has size f × f × n_c, then the output after convolution is (n − f + 1) × (n − f + 1) × n_c', where n_c is the number of input channels and n_c' is the number of filters.
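Below is a minimal, loop-based NumPy sketch of this convolution over volume (written for clarity rather than speed; the function name conv_over_volume and the random example arrays are illustrative):

```python
import numpy as np

def conv_over_volume(image, filters):
    """image: (n, n, n_c); filters: (n_cp, f, f, n_c).
    Returns an (n - f + 1, n - f + 1, n_cp) output (stride 1, no padding)."""
    n, _, n_c = image.shape
    n_cp, f, _, _ = filters.shape
    out = np.zeros((n - f + 1, n - f + 1, n_cp))
    for k in range(n_cp):                      # one output channel per filter
        for i in range(n - f + 1):
            for j in range(n - f + 1):
                patch = image[i:i + f, j:j + f, :]          # f x f x n_c patch
                out[i, j, k] = np.sum(patch * filters[k])   # sum over all channels
    return out

image = np.random.rand(6, 6, 3)       # 6 x 6 x 3 RGB picture
filters = np.random.rand(2, 3, 3, 3)  # two 3 x 3 x 3 filters
print(conv_over_volume(image, filters).shape)  # (4, 4, 2)
```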

One Layer of a Convolutional Network

The single-layer structure of the convolutional neural network is as follows:

[Figure: single-layer structure of a convolutional network]

Compared with the plain convolution process described above, a single CNN layer adds an activation function (ReLU) and a bias b. The whole process is very similar to a single layer of a standard neural network:

Each filter has 3 × 3 × 3 = 27 weights plus 1 bias, so each filter has 27 + 1 = 28 parameters, and two filters contain 28 × 2 = 56 parameters in total. Notice that once the filters are chosen, the number of parameters has nothing to do with the size of the input image, so a large image cannot cause the parameter count to explode. For example, for a 1000 × 1000 × 3 picture, the input layer of a standard neural network would have 3 million dimensions, whereas in a CNN the number of parameters is determined only by the filters and stays comparatively small. This is one of the advantages of CNNs.

The number of parameters has nothing to do with the size of the picture
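To make the counting concrete, a tiny sketch (the helper name conv_layer_params is illustrative) reproduces the 28-per-filter and 56-total figures, and the 3-million-dimensional input mentioned above:

```python
def conv_layer_params(f, n_c_prev, n_filters):
    """Parameter count of a conv layer: each filter has f*f*n_c_prev weights
    plus 1 bias, and there are n_filters filters."""
    return (f * f * n_c_prev + 1) * n_filters

# Two 3x3x3 filters: (27 + 1) * 2 = 56 parameters, regardless of image size.
print(conv_layer_params(f=3, n_c_prev=3, n_filters=2))   # 56

# By contrast, flattening a 1000x1000x3 image already gives a standard network
# an input layer of 3 million dimensions.
print(1000 * 1000 * 3)                                    # 3000000
```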

Finally, we summarize the notation for the single-layer CNN structure, using l to denote the layer index.

In short, for layer l: the input is the previous layer's height × width × number of channels; the weights are just the filter parameters, f^[l] × f^[l] × n_c^[l−1] per filter and therefore f^[l] × f^[l] × n_c^[l−1] × n_c^[l] in total; and there is one bias per output filter, i.e. n_c^[l] biases.
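The following sketch puts these shapes together for one layer l, using the size formula from earlier (names such as n_H_prev are illustrative, not notation from the post):

```python
import math

def conv_layer_shapes(n_H_prev, n_W_prev, n_c_prev, f, n_c, p=0, s=1):
    """Shapes for conv layer l.
    Input:   n_H_prev x n_W_prev x n_c_prev  (activation of layer l-1)
    Weights: f x f x n_c_prev x n_c          (n_c filters of size f x f x n_c_prev)
    Bias:    n_c                             (one per output filter)
    Output:  n_H x n_W x n_c, with n_H = floor((n_H_prev + 2p - f)/s) + 1."""
    n_H = math.floor((n_H_prev + 2 * p - f) / s) + 1
    n_W = math.floor((n_W_prev + 2 * p - f) / s) + 1
    return {"weights": (f, f, n_c_prev, n_c), "bias": (n_c,), "output": (n_H, n_W, n_c)}

print(conv_layer_shapes(n_H_prev=6, n_W_prev=6, n_c_prev=3, f=3, n_c=2))
# {'weights': (3, 3, 3, 2), 'bias': (2,), 'output': (4, 4, 2)}
```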

Simple Convolutional Network Example

Here is a simple CNN network model:

The layer structure of the CNN model is shown in the figure above. Note that the dimension of a^[3] is 7 × 7 × 40; a^[3] is then flattened into a column vector of dimension 1960 × 1 and connected to the final output layer. The output layer can be a single neuron, i.e. binary classification (logistic), or multiple neurons, i.e. multi-class classification (softmax). Finally we obtain the predicted output ŷ.
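Since the figure itself is not reproduced here, the sketch below tracks the shapes through one plausible set of hyperparameters that is consistent with the stated 7 × 7 × 40 for a^[3] and the 1960-dimensional flattened vector; the specific input size, filter sizes, strides, and filter counts are assumptions for illustration only:

```python
import math

def conv_out(n, f, p, s):
    return math.floor((n + 2 * p - f) / s) + 1

# One plausible stack: 39x39x3 input, three conv layers, no padding.
n, n_c = 39, 3
layers = [  # (f, s, number_of_filters) -- assumed values
    (3, 1, 10),
    (5, 2, 20),
    (5, 2, 40),
]
for i, (f, s, n_filters) in enumerate(layers, start=1):
    n = conv_out(n, f, p=0, s=s)
    n_c = n_filters
    print(f"a[{i}]: {n} x {n} x {n_c}")
# a[1]: 37 x 37 x 10
# a[2]: 17 x 17 x 20
# a[3]: 7 x 7 x 40
print("flattened:", n * n * n_c)   # 1960
```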

It is worth mentioning that as the number of CNN layers increases, n_H^[l] and n_W^[l] generally decrease, while n_c^[l] generally increases.

For the parameters: each filter in layer l has f^[l] × f^[l] × n_c^[l−1] weights plus one bias, so with n_c^[l] filters the layer has (f^[l] · f^[l] · n_c^[l−1] + 1) · n_c^[l] parameters in total.

CNN has three types of layers:

  • Convolution layer (CONV)
  • Pooling layer (POOL)
  • Fully connected layer (FC)

CNN Example

Here is a simple example of CNN for number recognition:

[Figure: example CNN for digit recognition]

In the figure, each CONV layer is followed by a POOL layer: CONV1 and POOL1 make up the first layer, and CONV2 and POOL2 make up the second layer. Note that FC3 and FC4 are fully connected (FC) layers, which have the same structure as a standard neural network. The final output layer (softmax) consists of 10 neurons.
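As a rough sketch of this CONV-POOL-FC structure in code (using Keras; the input size, filter sizes, filter counts, and FC widths below are illustrative assumptions, while the layer ordering and the 10-way softmax come from the description above):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),                      # assumed input size
    tf.keras.layers.Conv2D(8, 5, activation="relu"),        # CONV1 (assumed 8 filters, 5x5)
    tf.keras.layers.MaxPooling2D(pool_size=2, strides=2),   # POOL1
    tf.keras.layers.Conv2D(16, 5, activation="relu"),       # CONV2 (assumed 16 filters, 5x5)
    tf.keras.layers.MaxPooling2D(pool_size=2, strides=2),   # POOL2
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation="relu"),          # FC3 (assumed width)
    tf.keras.layers.Dense(84, activation="relu"),           # FC4 (assumed width)
    tf.keras.layers.Dense(10, activation="softmax"),        # output layer, 10 classes
])
model.summary()   # prints the output shape and parameter count of each layer
```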

The dimensions and parameters of each layer of the entire network are shown in the following table:

[Table: dimensions and number of parameters of each layer]

Why Convolutions

Compared with standard neural networks, one of the advantages of CNN is that the number of parameters is much smaller. There are two reasons for the small number of parameters:

  • Parameter sharing: a feature detector (such as a vertical edge detector) that is useful in one area of the picture is likely to be useful in other areas as well.
  • Sparsity of connections: because the filter is small, each output value in each layer depends only on a small region of the input (see the comparison below).
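Together, these two properties keep the parameter count small. As a rough comparison (the 32 × 32 × 3 input and the six 5 × 5 filters are assumed example sizes, not from the post), a convolutional layer needs far fewer parameters than a fully connected layer mapping the same input to an output of the same size:

```python
# Assumed example: 32x32x3 input, six 5x5 filters, stride 1, no padding.
n, n_c_in, f, n_filters = 32, 3, 5, 6
n_out = n - f + 1                                            # 28

conv_params = (f * f * n_c_in + 1) * n_filters               # shared weights + biases
fc_params = (n * n * n_c_in) * (n_out * n_out * n_filters)   # dense weights only

print(conv_params)   # 456
print(fc_params)     # 14450688 (about 14.5 million)
```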

In addition, because a CNN has relatively few parameters, it needs relatively few training samples and is, to some extent, less prone to overfitting. Moreover, CNNs handle shifts in an object's position well: when a CNN performs object detection, it is largely unaffected by where in the image the object is located, which improves detection accuracy and the robustness of the system.

Origin blog.csdn.net/ahelloyou/article/details/111691982