Deep Learning Theory (15) -- VGG's initial exploration of the mystery of depth

Scientific knowledge

An important topic in machine learning is a model's generalization ability: a model that generalizes well is a good model. If a trained model performs poorly on the training set, it will naturally also perform poorly on the test set; this is likely caused by underfitting. Underfitting means the model does not fit the data well: the data points lie far from the fitted curve, or the model fails to capture the underlying characteristics of the data.


# Preface


In the last article of this theory series, we shared the AlexNet network, which is somewhat deeper than earlier deep learning networks and uses large convolution kernels -- novel improvements over its predecessors. Today we continue with a new network architecture: VGG. Its basic building block is still the convolutional layer, but the depth and the way the layers are combined are different, and it raised performance on the public dataset to a new level. This network pushed deep learning forward another step.


VGG network


The paper shared today is: Very Deep Convolutional Networks for Large-Scale Image Recognition. The title says it all: a very deep convolutional network for large-scale image recognition. How deep is it? The most widely used configurations have 16 and 19 weight layers, giving the well-known architectures VGG16 and VGG19.


1. Network structure diagram


Figure 1: the network configuration table from the paper.

Figure 2: a VGG16 network structure diagram from the Internet.

Paper address: https://arxiv.org/pdf/1409.1556.pdf

2. Network analysis


Today we only cover VGG16, because VGG19 has a similar architecture, just a few more layers. Looking carefully at Figure 1, conv3 denotes a convolution with a 3x3 kernel, and the number of channels grows along the path 3-64-128-256-512. From Figure 2 we can see that as the original image passes through the network, its spatial size keeps shrinking while the number of channels of the intermediate feature maps keeps increasing. What is the principle? The intuitive explanation is that the growing number of channels compensates for the loss of spatial information as the feature maps become smaller and smaller.

VGG16 contains a total of 16 layers (13 convolutional layers + 3 fully connected layers). Note that the layer count usually refers only to layers with trainable parameters; pooling layers are not counted, because they only perform a fixed computation and have nothing to train.
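As a quick check of the "13 + 3" count, here is a minimal sketch assuming TensorFlow 2.x is installed; it uses the stock tf.keras.applications.VGG16 implementation (not the authors' original code) and simply counts the layer types:

```python
# Minimal check of the "16 trainable layers" claim (assumes TensorFlow 2.x).
# tf.keras.applications.VGG16 is the stock Keras implementation, not the paper's code.
import tensorflow as tf

model = tf.keras.applications.VGG16(weights=None)  # random weights, no download needed

conv_layers = [l for l in model.layers if isinstance(l, tf.keras.layers.Conv2D)]
fc_layers   = [l for l in model.layers if isinstance(l, tf.keras.layers.Dense)]
pool_layers = [l for l in model.layers if isinstance(l, tf.keras.layers.MaxPooling2D)]

print(len(conv_layers), len(fc_layers), len(pool_layers))  # expected: 13 3 5
```

Pooling layers show up five times but contribute no trainable parameters, which is why they do not enter the "16".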

Input layer: 224x224x3

64-channel convolution block : 2 layers of 3x3 convolutions with 64 channels, with padding so that the feature-map size is unchanged before and after the convolution (see the output-size formula below). Output: 64x224x224.

maxpooling1 : The size of the feature map becomes half of the original, output: 64x112x112.
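As a sanity check on the sizes listed in this walkthrough (this is the standard convolution output-size formula, not something specific to the paper): for input width $W$, kernel size $K$, padding $P$ and stride $S$,

$$W_{\text{out}} = \left\lfloor \frac{W - K + 2P}{S} \right\rfloor + 1.$$

For the 3x3 convolutions, $K=3$, $P=1$, $S=1$, so $W_{\text{out}} = W$ and 224 stays 224; for the 2x2 max-pooling, $K=2$, $P=0$, $S=2$, so $W_{\text{out}} = W/2$ and 224 becomes 112.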

128-channel convolution block : 2 layers of 3x3 convolutions with 128 channels, again with padding so that the feature-map size is unchanged. Output: 128x112x112.

maxpooling2 : The size of the feature map becomes half of the original, output: 128x56x56.

256-channel convolution block : 3 layers of 3x3 convolutions with 256 channels, with padding so that the feature-map size is unchanged. Output: 256x56x56.

maxpooling3 : The size of the feature map becomes half of the original, output: 256x28x28.

512-channel convolution block : 3 layers of 3x3 convolutions with 512 channels, with padding so that the feature-map size is unchanged. Output: 512x28x28.

maxpooling4 : The size of the feature map becomes half of the original, output: 512x14x14.

512-channel convolution block : 3 layers of 3x3 convolutions with 512 channels, with padding so that the feature-map size is unchanged. Output: 512x14x14.

maxpooling5 : The size of the feature map becomes half of the original, output: 512x7x7.

Fully connected layer 1 : input: 512x7x7 (flattened to 25088), output: 4096.

Fully connected layer 2 : input: 4096, output: 4096.

Fully connected layer 3 : input: 4096, output: 1000, because there are 1000 categories.
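To make the walkthrough above concrete, here is a minimal tf.keras sketch of VGG16 (an illustrative sketch assuming TensorFlow 2.x, not the authors' original code: dropout, weight initialization, and the training setup from the paper are omitted, and the conv_block helper is my own naming):

```python
# A minimal VGG16 sketch in tf.keras, mirroring the walkthrough above.
# Illustration only: no dropout, no custom initialization, no training loop.
import tensorflow as tf
from tensorflow.keras import layers, models

def conv_block(model, filters, n_convs):
    """Add n_convs 3x3 'same'-padded convolutions, then a 2x2 max-pool."""
    for _ in range(n_convs):
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
    model.add(layers.MaxPooling2D(pool_size=2, strides=2))

model = models.Sequential()
model.add(layers.Input(shape=(224, 224, 3)))  # input layer: 224x224x3
conv_block(model, 64, 2)    # -> 224x224x64,  pool -> 112x112x64
conv_block(model, 128, 2)   # -> 112x112x128, pool -> 56x56x128
conv_block(model, 256, 3)   # -> 56x56x256,   pool -> 28x28x256
conv_block(model, 512, 3)   # -> 28x28x512,   pool -> 14x14x512
conv_block(model, 512, 3)   # -> 14x14x512,   pool -> 7x7x512

model.add(layers.Flatten())                          # 7*7*512 = 25088
model.add(layers.Dense(4096, activation="relu"))     # fully connected layer 1
model.add(layers.Dense(4096, activation="relu"))     # fully connected layer 2
model.add(layers.Dense(1000, activation="softmax"))  # 1000 categories

model.summary()  # 13 Conv2D + 3 Dense layers = 16 weight layers
```

The per-block comments reproduce the output sizes listed above, so model.summary() can be compared line by line against the walkthrough.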

The above is the structural analysis of the entire VGG16. The paper's main point is that a deeper network can learn richer information and thereby improve the final classification accuracy. But is deeper always better? Is there a limit to how deep we can go? We will discuss this question later. Also note that a deeper network consumes more GPU memory, especially the last two 4096-unit fully connected layers, so it is better to run such a network on a graphics card of GTX 1080 class or above; otherwise training will be very slow.
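To see why those two 4096-unit layers dominate the memory cost, a quick back-of-the-envelope count (standard parameter arithmetic, not a number quoted from the paper): the first fully connected layer alone has

$$7 \times 7 \times 512 \times 4096 + 4096 \approx 1.03 \times 10^{8}$$

parameters, i.e. roughly 103 million weights, about 400 MB in 32-bit floats and roughly three quarters of VGG16's approximately 138 million total parameters.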


END

Epilogue

This is the end of today's sharing. Readers who want to study the topic seriously can read the original VGG paper to understand the authors' motivation for designing this network and how they demonstrate its effectiveness. Next week we will continue with the TensorFlow practice of VGG16.

See you again!

Editor: Layman Yueyi|Review: Layman Xiaoquanquan


Advanced IT Tour

Past review

Deep Learning Theory (14) -- AlexNet's next level

Deep Learning Theory (13) -- LeNet-5 is surging

Deep Learning Theory Part (12) -- Pooling of Dimensionality Reduction

What have we done in the past year:

[Year-end Summary] Saying goodbye to the old and welcoming the new, 2020, let's start again

[Year-end summary] 2021, bid farewell to the old and welcome the new


Click "Like" and let's go~


Origin blog.csdn.net/xyl666666/article/details/120500128