Convolutional Neural Networks: Fully Connected Layers and Network Training

Fully connected layers

First, we review the LeNet network structure:

Figure 1

As shown in the red box in Figure 1, after multiple convolution-pooling-convolution-pooling operations, there are two fully connected operations. The fully connected layer is a traditional multi-layer perceptron that uses the Softmax activation function in the output layer (other classifiers, such as an SVM, can also be used). The term "fully connected" means that every neuron in the previous layer is connected to every neuron in the next layer. The output of the convolution and pooling layers represents high-level features of the input image. The purpose of the fully connected layer is to use these features to classify the input image into one of the categories defined by the training data set. For example, in Figure 1 the image classification task has four possible outputs (dog, cat, boat, bird).
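To make "fully connected" concrete, here is a minimal NumPy sketch of the forward pass of such a layer. The shapes are illustrative assumptions: 16 pooled feature maps of size 5x5 (LeNet-like), and four output classes echoing the dog/cat/boat/bird example.

```python
import numpy as np

# Hypothetical shapes: 16 pooled feature maps of size 5x5,
# classified into 4 categories (dog, cat, boat, bird).
pooled = np.random.rand(16, 5, 5)      # output of the last pooling layer
x = pooled.flatten()                   # 400-dimensional feature vector

W = np.random.randn(4, x.size) * 0.01  # one weight per input for every output neuron
b = np.zeros(4)                        # one bias per output neuron

scores = W @ x + b                     # every input connects to every output
print(scores.shape)                    # (4,) -- one raw score per class
```

Note that every entry of `W` links one input feature to one output neuron, which is exactly what "every neuron connected to every neuron" means in matrix form.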

In addition to performing classification, adding a fully connected layer is usually an inexpensive way to learn nonlinear combinations of these features. Most of the individual features from the convolutional and pooling layers may already be useful for the classification task, but combinations of these features may be even better. As the activation function in the output layer of the fully connected layer, Softmax ensures that the output probabilities sum to 1. (The Softmax function takes a vector of arbitrary real-valued scores and converts them to values between zero and one whose sum is one.)
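A small NumPy sketch of the Softmax function just described; subtracting the maximum score before exponentiating is a standard numerical-stability trick added here, not part of the original text:

```python
import numpy as np

def softmax(scores):
    """Convert arbitrary real-valued scores to probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

probs = softmax(np.array([2.0, 1.0, -1.0, 0.5]))
print(probs)        # each value lies between 0 and 1
print(probs.sum())  # 1.0
```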

Network training

The overall training process of the convolutional network can be summarized as follows:

Step 1: We initialize all filters and parameters/weights with random values    

Step 2: The network takes a training image as input, performs the forward propagation step (convolution, ReLU, and pooling operations, followed by forward propagation in the fully connected layer), and finds the output probability of each category.

Step 3: Calculate the total error at the output layer: Total Error = ∑ ½ (target probability − output probability)²

Step 4: Use backpropagation to calculate the gradient of the error with respect to all weights in the network, and use gradient descent to update all filter values/weights and parameter values to minimize the output error (steps 2-4 are sketched in code after this list).

Step 5: Repeat steps 2-4 for all images in the training set.
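The following toy sketch walks through steps 2-4 on a single training example. As a simplifying assumption, one fully connected layer stands in for the whole network (the convolution, ReLU, and pooling layers of step 2 are omitted for brevity), and the learning rate of 0.1 is arbitrary:

```python
import numpy as np

def softmax(scores):
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()

rng = np.random.default_rng(0)

# Step 1: initialize all weights with random values (a single fully
# connected layer stands in for the whole network in this sketch).
W = rng.normal(0, 0.01, (4, 400))
b = np.zeros(4)

x = rng.random(400)                  # pretend feature vector from conv/pool layers
target = np.array([0., 1., 0., 0.]) # one-hot target probabilities

lr = 0.1  # arbitrary learning rate
for step in range(100):
    # Step 2: forward propagation -> output probabilities
    out = softmax(W @ x + b)

    # Step 3: total error = sum of 1/2 * (target - output)^2
    error = 0.5 * np.sum((target - out) ** 2)

    # Step 4: backpropagation of the squared error through Softmax,
    # then gradient descent on the weights and biases.
    d_out = out - target                     # dE/d(out) = -(target - out)
    jac = np.diag(out) - np.outer(out, out)  # Jacobian of Softmax
    d_scores = jac @ d_out                   # dE/d(scores)
    W -= lr * np.outer(d_scores, x)
    b -= lr * d_scores

print(error)  # the error shrinks as the updates repeat
```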

Figure 2

As shown in Figure 2, the "convolution-pooling" operation can be repeated multiple times in a convolutional network. Generally speaking, the more convolution steps we perform, the more complex the features the network will be able to learn to recognize. For example, in image classification, a CNN can learn to detect edges from raw pixels in the first layer, then use the edges to detect simple shapes in the second layer, and then use these shapes to compose higher-level features, such as the facial shapes shown in Figure 3.

Figure 3
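For readers who prefer code, the repeated convolution-ReLU-pooling pattern of Figure 2 might be expressed as the following PyTorch sketch; the layer sizes are LeNet-like but illustrative, and the four output classes again echo the dog/cat/boat/bird example:

```python
import torch
import torch.nn as nn

# Illustrative sizes only; two "convolution -> ReLU -> pooling" blocks
# followed by fully connected layers, mirroring the pattern in Figure 2.
net = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),   # first conv: learns edge-like features
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(6, 16, kernel_size=5),  # second conv: combines edges into shapes
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),       # fully connected layers
    nn.ReLU(),
    nn.Linear(120, 4),                # 4 classes (dog, cat, boat, bird)
)

probs = torch.softmax(net(torch.randn(1, 1, 32, 32)), dim=1)
print(probs.sum())  # output probabilities sum to 1
```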

Finally, I recommend a website that visualizes how a convolutional neural network performs 0-9 digit recognition; it can help you better understand the intermediate details of a convolutional neural network.

Website address: http://scs.ryerson.ca/~aharley/vis/conv/flat.html
