PyTorch Neural Network Practical Study Notes_14 Convolutional Network Models + Sobel Operator Principle

1 Understanding Convolutional Neural Networks from a Visual Perspective

1.1 The relationship between convolutional neural networks and the biological visual system

The workflow of a convolutional neural network resembles the way the biological brain processes visual signals: an image is transformed from raw pixels to local information and then to global information. The brain processes images hierarchically, computing step by step from low-level features to high-level features.

1.2 Calculus

When the brain processes vision, it essentially performs differentiation first and then integration.

1.3 Discrete Differentiation and Discrete Integration

In calculus, infinite subdivision requires the object being subdivided to be continuous. For example, a straight line can be infinitely subdivided into points, but a finite set of discrete points cannot be subdivided further.

1.3.1 Discrete Differentiation

The process of subdividing discrete objects is called discrete differentiation. For example, on the right side of Figure 7-3, the dashed line segment is divided into four points.

1.3.2 Discrete integration

  • The left side of Figure 7-3 can be understood as the result of the integration of continuously subdivided line segments, combining all arbitrarily small line segments together.
  • The dotted line segment on the right of Figure 7-3 can be understood as the integration result of 4 points, that is, combining 4 points together.
  • The operation of integrating the results of discrete differentiation is called discrete integration.

1.4 Discrete integrals in visual neural networks

1.4.1 Digital Forms of Computer Vision

Each element of the matrix takes a value from 0 to 255 and represents one pixel of the image.

1.4.2 The working model of computer image processing/discrete calculus

① Use the convolution operation to process local information and generate low-level features.

② Perform further convolution operations on the low-level features to generate intermediate- and high-level features.

③ Combine the high-level features of multiple local regions to generate the final interpretation result.

2 Structure of Convolutional Neural Networks

A convolutional network performs small-scale computations on local regions of the data, completing the classification task with fewer weights; this eases convergence difficulties and improves generalization ability.

2.1 The working process of convolutional neural network

The convolutional neural network is introduced here by comparison with the fully connected network:

 Convolution process:

The convolution kernel (also known as the filter) here has three input nodes and one output node. The nodes obtained through the convolution operation are called a feature map. The kernel moves a fixed step each time, and the collection of all its outputs is the result of the convolution.

2.1.1 The difference between convolutional neural network and fully connected network

  • Each node output by the convolutional network is computed by a neuron from a local region of the original data.
  • Each node output by the fully connected network is computed by a neuron from all nodes of the original data.
  • The output of a convolutional neural network therefore preserves local information more distinctly. Because of this property, convolutional neural networks are widely used in the field of computer vision.

2.2 1D convolution, 2D convolution, 3D convolution

1D/2D/3D convolutions are calculated in the same way, of which 2D convolution is the most widely used. Compared with fully connected layers, the main advantages of convolutional layers are parameter sharing and sparse connections, which greatly reduce the number of parameters that need to be learned.
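To make "parameter sharing and sparse connections" concrete, here is a minimal sketch (an illustration added to these notes, not from the original figure) comparing the parameter counts of a fully connected layer and a convolutional layer that both map a 14×14 single-channel input to a 10×10 output:

```python
import torch.nn as nn

# Fully connected: every output node is connected to all 14*14 input nodes.
fc = nn.Linear(14 * 14, 10 * 10)
fc_params = sum(p.numel() for p in fc.parameters())      # 196*100 weights + 100 biases

# Convolutional: one shared 5x5 kernel slides over the input
# (14 - 5 + 1 = 10, so the output is also 10x10).
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=5)
conv_params = sum(p.numel() for p in conv.parameters())  # 25 weights + 1 bias

print(fc_params, conv_params)  # 19700 26
```

The convolutional layer needs 26 parameters instead of 19700 for the same input/output sizes, because one small kernel is shared across all positions and each output only connects to a local region.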

The convolution output size is calculated as follows:

output = ⌊(n + 2p − f) / s⌋ + 1

n: original image size n×n
p: padding, the number of pixel columns filled at the edge of the original image
f: the kernel size of the filter; it should be emphasized that, because the original image here has only one channel, this convolution filter uses only one kernel.
s: stride, the step size of each movement of the filter on the image.

The derivation process of the calculation method:

Convolution is a "merging" operation: it performs a weighted summation of adjacent pieces of data to obtain a single number, and slides this scan over the input tensor to obtain the output tensor. Following this process, we can easily derive the formula for the convolution output size.
(1) Padding fills zeros on both sides simultaneously, so the input size after zero-filling is n + 2p.
(2) When scanning with a convolution kernel, imagine a ruler moving from left to right on a table: bounded by the left and right borders, its movement range is only n + 2p − f.
(3) If each move has step size s, the number of steps is (n + 2p − f) / s. The number of steps must be an integer because the kernel cannot go out of bounds; if the last step falls even slightly short it does not count, so the value is rounded down.
(4) Even without moving a single step, the kernel produces one output point in place, so the final output size is the total number of steps plus 1.
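The four steps above can be sketched as a small helper function (added here for illustration; the parameter names follow the formula above):

```python
import math

def conv_output_size(n: int, f: int, p: int = 0, s: int = 1) -> int:
    """Convolution output size: floor((n + 2p - f) / s) + 1."""
    return math.floor((n + 2 * p - f) / s) + 1

# Input 7, kernel 3, padding 1: the kernel can slide 6 steps of size 1,
# plus the starting position -> 7.
print(conv_output_size(7, 3, p=1, s=1))  # 7

# Same setup with stride 2: 3 steps plus the start -> 4.
print(conv_output_size(7, 3, p=1, s=2))  # 4
```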

2.2.2 Calculation of (de)convolution output size for AI interview questions

[input + 2∗padding] is the original image with a circle of padding added outside; since the padding surrounds the image, it appears on both the left and right (and top and bottom), hence the factor of 2.

[input + 2∗padding − kernel] computes how many steps the kernel can take.


Let's take an example to understand:


The figure shows an example where the input is 7, the kernel_size is 3, and the padding is 1.
As can be seen from the figure, the kernel takes a total of 6 steps.

These six steps are what input + 2∗padding − kernel means: the number of steps the kernel needs to slide.

Then stride is the step size. If it is 2, the kernel moves as shown below:


It becomes 3 steps. So why does the formula add 1 at the end? Because even before the kernel takes a single step, its starting position in the upper-left corner already produces one output point.

[Summary: when calculating the convolution output size, the fraction gives the number of steps the convolution kernel can take, and adding 1 accounts for the kernel's initial position; the result is the output size.]

Let's look at one more example:

 [This example shows how to calculate when the input size is even and the convolution kernel is odd: round down.]

2.2.3 Deconvolution Derivation Calculation

Two deconvolution examples
Input size input=2, kernel_size=3, stride=1, padding=2, calculate the output size of deconvolution?

 [Answer: output=4]

Input size input=3, kernel=3, stride=2, padding=1, calculate the output size of deconvolution?

 [Answer: output=5]
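Both answers can be checked with a small sketch of the convention used in these examples: insert stride−1 zeros between input elements, pad `padding` zeros on each side, then slide the kernel with stride 1. Note that this is an assumption inferred from the two worked answers; PyTorch's `ConvTranspose2d` instead computes output = (input − 1)·stride − 2·padding + kernel_size, where padding shrinks the output.

```python
def deconv_output_size(inp: int, kernel: int, stride: int, padding: int) -> int:
    # Expand the input: stride-1 zeros between elements, padding zeros on each side.
    expanded = inp + (inp - 1) * (stride - 1) + 2 * padding
    # Then an ordinary stride-1 convolution: expanded - kernel + 1.
    return expanded - kernel + 1

print(deconv_output_size(2, kernel=3, stride=1, padding=2))  # 4
print(deconv_output_size(3, kernel=3, stride=2, padding=1))  # 5
```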

2.2.4 1D Convolution


Calculation

1. The input data dimension in the figure is 8, and the filter dimension is 5. Similar to two-dimensional convolution, the output data dimension after convolution is 8−5+1=4.

2. If the number of filters is still 1 but the number of channels of the input data becomes 16, the input data dimension is 8×16. The concept of a channel here is analogous to an embedding in natural language processing: the input represents 8 words, each with a 16-dimensional word vector. In this case, the filter dimension changes from 5 to 5×16, and the output data dimension is still 4.

3. If the number of filters is n, the output data dimension becomes 4×n.

Application field

One-dimensional convolution is often used in sequence models, natural language processing fields
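The three cases above can be verified with PyTorch's `nn.Conv1d` (a minimal sketch; PyTorch expects inputs of shape (batch, channels, length)):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 8)  # 8 positions ("words"), 16 channels (word-vector dim)

# One filter of size 5x16 -> output length 8 - 5 + 1 = 4.
conv = nn.Conv1d(in_channels=16, out_channels=1, kernel_size=5)
print(conv(x).shape)       # torch.Size([1, 1, 4])

# n = 32 filters -> output dimension 4 x 32.
conv_n = nn.Conv1d(in_channels=16, out_channels=32, kernel_size=5)
print(conv_n(x).shape)     # torch.Size([1, 32, 4])
```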

2.2.5 2D Convolution

Calculation

1. The input data dimension in the figure is 14×14, the filter size is 5×5, the two are convolved, and the output data dimension is 10×10 (14−5+1=10).

2. The above discussion did not introduce the concept of channels, i.e. the number of channels was 1. If the number of input channels in the two-dimensional convolution is changed to 3, the input data dimension becomes 14×14×3. Since the number of channels of the filter must match the number of channels of the input data, the filter size also becomes 5×5×3. During convolution, the filter is convolved with the data separately along the channel direction and the resulting values are added together; that is, a sum of 3 values is computed at each of the 10×10 positions, and the final output dimension is 10×10.

3. The above all assumes the number of filters is 1. If the number of filters is increased to 16, i.e. 16 filters of size 5×5×3, the final output data dimension becomes 10×10×16. This can be understood as performing each filter's convolution separately and then concatenating the outputs along the third (channel) dimension.

Application field

Two-dimensional convolution is often used in the fields of computer vision and image processing
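The channel-matching rule and the 16-filter case above can be checked with `nn.Conv2d` (a minimal sketch):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 14, 14)  # a 14x14 image with 3 channels

# 16 filters, each of size 5x5x3 (the channel dim must match the input).
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5)
print(conv(x).shape)           # torch.Size([1, 16, 10, 10]), since 14 - 5 + 1 = 10
```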

2.2.6 3D Convolution


Calculation

1. Suppose the size of the input data is a1×a2×a3, the number of channels is c, the size of the filter is f×f×f×c (the dimension of the channel is generally not written), and the number of filters is n.

2. Based on the above situation, the final output of the three-dimensional convolution is (a1−f+1)×(a2−f+1)×(a3−f+1)×n.

Application field
3D convolution is often used in the medical field (e.g. CT images) and in video processing (detecting actions and human behavior).
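The size rule above can be checked with `nn.Conv3d` (a minimal sketch; the volume sizes a1 = a2 = a3 = 16, c = 3, f = 3, n = 4 below are made-up example values):

```python
import torch
import torch.nn as nn

# A hypothetical CT-like volume: a1 = a2 = a3 = 16 with c = 3 channels.
x = torch.randn(1, 3, 16, 16, 16)

# n = 4 filters of size f = 3 (the channel dimension 3 is implied).
conv = nn.Conv3d(in_channels=3, out_channels=4, kernel_size=3)
print(conv(x).shape)  # torch.Size([1, 4, 14, 14, 14]): (16 - 3 + 1) = 14 per axis
```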

3 Example analysis: Sobel operator principle

The Sobel operator is a classic example of the convolution operation. It uses a manually configured convolution kernel to convolve a picture, performs edge detection on it, and generates a picture containing only contours.

The Sobel edge detection algorithm is relatively simple, and in practical applications its efficiency is higher than that of Canny edge detection, although its edges are less accurate than Canny's. Even so, in many practical applications Sobel is the first choice, especially when efficiency matters and fine texture does not: because the Sobel operator combines Gaussian smoothing with differentiation, its noise resistance is very strong.

3.1 Method

Assuming that the image to be processed is I, take derivatives in both directions:

  • Horizontal variation: convolve image I with an odd-sized template, resulting in Gx. For example, when the template size is 3, Gx is:

        [-1  0  +1]
        [-2  0  +2] ∗ I
        [-1  0  +1]

  • Vertical variation: convolve image I with an odd-sized template, resulting in Gy. For example, when the template size is 3, Gy is:

        [-1  -2  -1]
        [ 0   0   0] ∗ I
        [+1  +2  +1]

At each point of the image, combine the two results to obtain the gradient magnitude:

G = √(Gx² + Gy²), often approximated as G = |Gx| + |Gy|

The locations where this value is largest are the edges of the image.

Note: when the kernel size is 3, the Sobel kernel above may produce noticeable errors. To solve this problem, we use the Scharr function, which only works with a kernel size of 3. It runs as fast as the Sobel function but its results are more accurate; its horizontal kernel is:

        [-3   0   +3]
        [-10  0  +10]
        [-3   0   +3]

 3.3 The calculation process of the Sobel operator

The 5×5 light-colored matrix on the left of Figure 7-10 can be understood as the original picture. The 3×3 matrix in the middle is the Sobel operator. The 5×5 matrix on the right of Figure 7-10 is the resulting contour picture.

3.3.1 Description of the calculation process

1. A circle of 0s is added around the original image. This process is called padding; its purpose is to make the generated matrix the same size as the original.
2. Multiply each element of the 3×3 sub-matrix in the upper-left corner of the zero-padded matrix by the element at the corresponding position in the Sobel operator matrix, then add the products together; the resulting value becomes the first element of the matrix on the right.
3. Move the 3×3 window in the upper-left corner of Figure 7-10 one space to the right, which can be understood as a stride of 1.
4. Multiply each element in the window by the element at the corresponding position of the middle 3×3 matrix, add the products together, and fill the result into the second element of the first row of the matrix on the right side of Figure 7-10.
5. Repeat this operation until all the values on the right are filled in, completing the calculation.

The value of each pixel in the newly generated image is not guaranteed to lie between 0 and 255. Pixels outside this interval cannot be displayed in a grayscale image, so a normalization must be performed and each element then multiplied by 255, mapping all values into the interval 0~255. The normalization formula is x = (c − Min) / (Max − Min), where Max and Min are the maximum and minimum values in the overall data and c is the current pixel value to be converted. Normalization maps every x into the interval [0, 1].
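The whole procedure, zero padding, sliding the Sobel kernel, then min-max normalization, can be sketched with PyTorch's `F.conv2d`. Two caveats: the 5×5 image below is a made-up example with a vertical edge, not the matrix from Figure 7-10, and `conv2d` actually computes cross-correlation, which for this illustration only flips the sign of the detected gradient.

```python
import torch
import torch.nn.functional as F

# A made-up 5x5 image with a vertical edge between columns 1 and 2.
img = torch.tensor([[0., 0., 255., 255., 255.]] * 5).reshape(1, 1, 5, 5)

# Horizontal Sobel kernel Gx.
gx = torch.tensor([[-1., 0., 1.],
                   [-2., 0., 2.],
                   [-1., 0., 1.]]).reshape(1, 1, 3, 3)

# Steps 1-5: pad a circle of zeros and slide the kernel with stride 1.
out = F.conv2d(img, gx, padding=1)[0, 0]
print(out.shape)    # torch.Size([5, 5]) - same size as the input

# Normalization x = (c - Min) / (Max - Min), then scale to 0~255.
norm = (out - out.min()) / (out.max() - out.min()) * 255
print(norm.min().item(), norm.max().item())  # 0.0 255.0
```

The output is large only at the edge columns and zero in the flat regions, which is exactly the pixel-difference behavior described in the next section.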

3.4 Sobel operator principle

As shown in the figure below, the data produced by convolving the image with the Sobel operator is essentially the difference between neighboring pixels in the image. When this pixel-difference data is displayed as a picture, it forms a contour picture.

The weights in the second row of the Sobel operator follow the same principle as the first row, except that the difference is amplified by a factor of 2 to enhance the effect.

The idea is: (1) weight the pixel differences of the 3 rows covered by the convolution kernel; (2) center the pixel difference of the second row on the middle pixel; (3) following the principle that points closer to the center have a greater influence on the result, strengthen the pixel difference of the second row (its weight is set to 2) so that it dominates the final result.

4 Convolution kernels in deep neural networks

In a deep network there are many convolution kernels similar to the Sobel operator. Unlike the Sobel operator, their weight values are computed by training the model on a large number of samples.
During training, the weights of the convolution kernels are adjusted according to the final output, eventually producing kernels with specific functions; some of them, for example, compute pixel differences in the picture to extract background texture. The feature data generated by convolution can itself be processed by further convolutions; in deep neural networks these convolutions are implemented through multiple convolutional layers.
The convolution kernels in the deeper layers of a convolutional network no longer simply process basic pixel features such as contours and textures; instead they reason over and compose existing features such as contours and textures. Feature data that has been convolved multiple times carries more specific local representations, such as eyes, ears, and noses. Together with the other structures of the neural network, this local information is composed and reasoned over to finally recognize the whole picture.

5 Understand the mathematical meaning of convolution--convolution points


Origin blog.csdn.net/qq_39237205/article/details/123404752