A First Look at Convolutional Neural Networks (CNN)



Foreword

I am a college student interested in deep learning, artificial intelligence, and related fields. What follows are the basic CNN concepts I have learned from material on the Internet.

A convolutional neural network (CNN) is a kind of feedforward neural network. Its defining trait is that each neuron in a layer responds only to neurons in a local region of the previous layer, whereas each neuron in a fully connected network responds to every node of the previous layer. A deep CNN usually consists of several convolutional layers followed by several fully connected layers, with nonlinear operations and pooling operations in between. Like other neural networks, a CNN can be trained with the backpropagation algorithm. Because the convolution operation shares its parameters across positions, a CNN has far fewer parameters to optimize than a comparable fully connected model, which improves training efficiency and scalability. Since convolution is designed for data with a grid-like structure, CNNs have significant advantages in analyzing and recognizing time series and image data.
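To make the parameter-sharing point concrete, here is a small back-of-the-envelope sketch. The sizes (a 28x28 grayscale image, a single 3x3 kernel) and the helper names `dense_params` and `conv_params` are assumptions chosen for illustration, not something taken from a particular model.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus biases of a convolutional layer.

    The same kernel is reused at every position, so the count
    does not depend on the image size.
    """
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


# Assumed example: map a 28x28 grayscale image to a 28x28 output.
n_pixels = 28 * 28
fc_count = dense_params(n_pixels, n_pixels)  # every output sees every pixel
conv_count = conv_params(3, 3, 1, 1)         # one shared 3x3 kernel + 1 bias

print(fc_count, conv_count)
```

Under these assumptions the fully connected layer needs hundreds of thousands of parameters, while the convolutional layer needs ten.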

Convolutional Neural Networks (CNNs)

A convolutional neural network (CNN) is mainly composed of three parts:
Convolutional layer: performs the convolution operation to extract features, from low-level to high-level, and to discover the "local characteristics" of the picture;
Pooling layer: downsamples the feature maps, shrinking the data and the amount of computation while trying to keep the most important information;
Fully connected layer: after pooling is complete, the data is "flattened" by a Flatten layer, the output of the Flatten layer is fed into the fully connected layer, and the result can be classified using softmax.

1. Convolutional layer

As the name implies, the core of a convolutional neural network (CNN) is convolution. The convolutional layer of a CNN holds small matrices filled with numbers, called convolution kernels. After the original picture passes through the input layer, it becomes a matrix of grayscale or RGB values. Align the convolution kernel with a region of the image matrix, multiply the numbers in corresponding cells, add the products together, and write the result into a new matrix. This operation is convolution. The distance the convolution kernel moves across the image matrix each time is called the stride (step size).
The resulting matrix reflects certain features of the image, so it is called a feature map. The feature map is the output of the convolutional layer and the input to the pooling layer that follows. Different numbers in the convolution kernel give different feature maps. These numbers do not have to be chosen by hand; they are learned during training.
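The sliding-window operation described above can be sketched in a few lines of NumPy. This is a minimal illustration without padding, not an optimized implementation; the function name `conv2d`, the 4x4 image, and the hand-picked edge kernel are all assumptions made for the example. (Strictly speaking this computes cross-correlation, which is the operation deep learning libraries conventionally call convolution.)

```python
import numpy as np


def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding); multiply element-wise and sum."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = image[top:top + kh, left:left + kw]
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map


# Assumed 4x4 grayscale image: dark left half, bright right half.
image = np.array([[0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9],
                  [0, 0, 9, 9]])

# Hand-picked kernel that responds to a dark-to-bright vertical edge.
edge_kernel = np.array([[-1, 1],
                        [-1, 1]])

feature_map = conv2d(image, edge_kernel, stride=1)
print(feature_map)
# Each row is [0, 18, 0]: the kernel fires only where the edge is.
```

In a real CNN the kernel values would come from training rather than being written by hand, but the arithmetic per window is the same.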

2. Pooling layer

The feature map obtained from the convolution operation is the input to the pooling layer. The pooling layer picks out the main features of the image. The commonly used max pooling keeps the maximum value in the area covered by the window, while average pooling takes the average value of that area. After pooling, the feature map is much smaller, which cuts unnecessary computation in later layers and speeds up the neural network.
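As a rough sketch of both pooling variants, the snippet below applies a 2x2 window with stride 2 to a small feature map. The function name `pool2d`, its parameters, and the sample values are assumptions for illustration.

```python
import numpy as np


def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Downsample by taking the max or the mean of each window."""
    feature_map = np.asarray(feature_map, dtype=float)
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    reduce_fn = np.max if mode == "max" else np.mean
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            window = feature_map[top:top + size, left:left + size]
            pooled[i, j] = reduce_fn(window)
    return pooled


# Assumed 4x4 feature map.
feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 1, 1],
                        [0, 2, 8, 4],
                        [2, 0, 4, 4]])

max_pooled = pool2d(feature_map, mode="max")   # [[6, 2], [2, 8]]
avg_pooled = pool2d(feature_map, mode="mean")  # [[3.5, 1.0], [1.0, 5.0]]
print(max_pooled)
print(avg_pooled)
```

The 4x4 input becomes 2x2 in both cases, so the next layer has a quarter as many values to process.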

3. Fully connected layer

The fully connected layer generally sits at the end of the network. It gathers the features extracted by the earlier layers and combines them into an overall judgment of what the picture is likely to show.
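The flatten-then-connect step might look like the following sketch. The shapes (two 2x2 pooled feature maps, three classes) and the random, untrained weights are assumptions used only to show how the data flows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed pooled output: 2 feature maps, each 2x2.
pooled = rng.random((2, 2, 2))

# Flatten: turn the stack of feature maps into one long vector.
flat = pooled.reshape(-1)

# Fully connected layer: every output is connected to every input value.
# Weights are random here; in practice they are learned by training.
n_classes = 3
weights = rng.normal(size=(n_classes, flat.size))
bias = np.zeros(n_classes)
scores = weights @ flat + bias

print(flat.shape, scores.shape)
```

Each entry of `scores` is a raw score for one class; a softmax step would then turn these scores into probabilities.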

4. Activation function

The computations in the layers described so far are all linear, and a composition of linear functions is itself linear. However many layers are stacked, the whole network would still be a single linear function, and having multiple layers would be pointless. To make the network more expressive, a nonlinear function is inserted between layers: before each node outputs its value, the value is passed through a nonlinear function such as sigmoid or ReLU.

The final output is a probability for each category. Each value lies in the range [0, 1], and all the output values sum to 1, so the raw values of the output layer receive one more processing step. This is the softmax function mentioned earlier, whose standard form is

$$\mathrm{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$

where $z_i$ is the raw output for category $i$ and the sum runs over all categories.
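Both ideas in this section can be checked numerically. The sketch below first shows that two stacked linear layers collapse into one, then defines ReLU, sigmoid, and softmax. The matrix sizes and sample scores are assumptions for illustration, and the max-subtraction inside `softmax` is a common numerical-stability trick that does not change the result.

```python
import numpy as np


def relu(x):
    """Keep positive values, replace negatives with 0."""
    return np.maximum(0, x)


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    shifted = z - np.max(z)  # stability trick; the output is unchanged
    exp = np.exp(shifted)
    return exp / exp.sum()


rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))  # assumed first linear layer
W2 = rng.normal(size=(2, 4))  # assumed second linear layer
x = rng.normal(size=3)

# Two linear layers in a row equal one linear layer with weights W2 @ W1.
stacked = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
print(np.allclose(stacked, collapsed))

# With a nonlinearity in between, the network is in general
# no longer equivalent to a single linear map.
with_relu = W2 @ relu(W1 @ x)

# Assumed raw scores for three categories.
probs = softmax([2.0, 1.0, 0.1])
print(probs, probs.sum())
```

The largest raw score still gets the largest probability, but every value now lies between 0 and 1 and the values add up to 1.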

Summary

Convolutional neural networks (CNNs) are very good at processing images, and their excellent results in image recognition competitions helped set off the deep learning trend. All in all, understanding the principles of CNNs is essential for learning deep learning. (Keep grinding!)

Origin blog.csdn.net/Thousand_drive/article/details/124230471