Convolutional Neural Network Study Notes and Experience (5) Fully Connected Layer

After several layers of convolution and pooling, the spatial dimensions of the feature maps become smaller and smaller while their number grows larger and larger; finally they enter the fully connected layer, which produces the classification output (as in a traditional neural network). Because the fully connected layer contains a large number of connection weights, the risk of model overfitting increases. To address this, researchers have proposed methods such as sparse connections and dropout to reduce overfitting.
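To see why the fully connected layer dominates the parameter count, here is a minimal PyTorch sketch with made-up layer sizes (not the model from any project described in these notes):

```python
import torch.nn as nn

# Minimal sketch (illustrative shapes only): after the conv/pool stages, the
# feature maps are flattened and fed to a fully connected classifier.
class SmallConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),   # 28x28 -> 12x12
            nn.Conv2d(16, 32, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),  # 12x12 -> 4x4
        )
        # The first fully connected layer alone holds 32*4*4*128 = 65,536 weights,
        # more than both conv layers combined -- hence the overfitting concern.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```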

Dropout is a simple and effective way to prevent overfitting. It is applied to the fully connected layer during the training phase: in each training pass, some nodes of the fully connected layer are randomly dropped and not updated, but their weights are retained for later passes.

An intuitive explanation of the dropout principle: because each hidden node is present only with a certain probability whenever the weights are updated on a batch of input samples, there is no guarantee that any two hidden nodes will always appear together. The weight updates therefore no longer rely on the joint action of hidden nodes with a fixed relationship, which prevents some features from being effective only in the presence of other specific features.
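As a concrete illustration of this mechanism, here is a minimal sketch of (inverted) dropout; the keep probability and tensor shapes are illustrative assumptions, not values from the original notes:

```python
import torch

# Minimal sketch of inverted dropout on a fully connected layer's activations.
def dropout(activations: torch.Tensor, keep_prob: float = 0.5, training: bool = True):
    if not training:
        return activations                       # at test time every node stays active
    mask = (torch.rand_like(activations) < keep_prob).float()
    # Dropped nodes output 0 for this pass, so they receive no gradient and their
    # outgoing weights are not updated -- but the weights themselves are retained
    # and come back into play whenever the node is kept in a later pass.
    return activations * mask / keep_prob        # rescale so the expected value is unchanged
```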

Regarding the fully connected layer, there was an episode while I was working on the OCR project: at the start of the project, since none of us had any experience with deep learning, it was not even clear which model to use. There were many popular models at the time, such as Lenet5, GoogLenet v1, Resnet50 and so on. Considering that Lenet5 might not be able to handle mixed character recognition, we excluded it first (later, the model we actually used added a convolutional layer and a pooling layer to Lenet5, which we called Lenet7 ^_^), and the Resnet50 model would not run on our machine, so GoogLenet v1 was chosen. The output layer of GoogLenet v1 differs from that of a typical convolutional neural network in that it uses global average pooling instead of a fully connected layer. This idea was proposed in the 2014 paper "Network in Network" (a famous paper). The main purpose of replacing the fully connected layer with global average pooling is to reduce the number of parameters and lower the risk of overfitting (the global average pooling layer has nothing to learn).
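To make the idea concrete, here is a minimal sketch of global average pooling used as a classifier head in the spirit of "Network in Network"; the batch size, class count, and map size are illustrative assumptions:

```python
import torch
import torch.nn as nn

# In the "Network in Network" scheme, the last conv stage emits one feature map
# per class, and global average pooling reduces each map to a single score --
# no fully connected weights to learn.
feature_maps = torch.randn(8, 10, 7, 7)        # (batch, num_classes, H, W)
gap = nn.AdaptiveAvgPool2d(1)                  # global average pooling, no parameters
logits = gap(feature_maps).flatten(1)          # -> (8, 10), one score per class
probs = logits.softmax(dim=1)                  # class probabilities
```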

One day after leaving the office, it suddenly occurred to me to replace the fully connected layer of Lenet7 with global average pooling. The result surprised me: the recognition rate dropped sharply. Later I realized that the fully connected layer is a nonlinear classifier, which guarantees a certain level of model complexity. When the extracted feature vectors are not yet mutually exclusive, the fully connected layer can still do its job. GoogLenet, on the other hand, is a 22-layer network; the features it extracts are abstract enough, and the feature vectors are almost completely mutually exclusive, so at that point a linear classifier such as global average pooling works better. For details, see Wei Xiusan's answer at this link:
https://www.zhihu.com/question/41037974/answer/150522307
