Classic neural networks (3): VGG, a network using blocks
1 Brief introduction of VGG
1.1 Overview of VGG
VGG-Net is a deep convolutional network jointly developed by the computer vision group of Oxford University and DeepMind. In the 2014 ILSVRC competition it took second place in the classification task and first place in the localization task. Its repeated block structure can be easily implemented in any modern deep learning framework by using loops and subroutines.
The main contributions of VGG-Net are:
- It shows that a deep network with small convolution kernels (3x3) is better than a shallow network with large convolution kernels.
- It demonstrates the importance of depth to the generalization performance of the network.
- The use of blocks leads to a very compact network definition. Complex networks can be designed efficiently using blocks.
- It verifies the effectiveness of scale jittering, a data augmentation technique.
The biggest problem with VGG-Net is the number of parameters: VGG-19 is basically the convolutional network architecture with the largest number of parameters.
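The first contribution can be made concrete with a little arithmetic. The sketch below is an illustration rather than code from the paper: it compares a stack of three 3x3 convolutions with a single 7x7 convolution that covers the same receptive field, assuming stride 1, no bias terms, and equal input and output channel counts. The helper names and the choice of 512 channels are assumptions made for this example.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in one convolutional layer (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return num_layers * (kernel_size - 1) + 1


channels = 512  # assumed channel width, chosen only for illustration

# Three stacked 3x3 layers see the same 7x7 input window as one 7x7 layer.
small_stack_rf = stacked_receptive_field(3, 3)
large_single_rf = stacked_receptive_field(7, 1)

small_stack_weights = 3 * conv_weights(3, channels, channels)
large_single_weights = conv_weights(7, channels, channels)

print("receptive field:", small_stack_rf, "vs", large_single_rf)
print("weights:", small_stack_weights, "vs", large_single_weights)
```

For C channels the stack needs 27C² weights against 49C² for the single large kernel, and it applies three ReLU non-linearities instead of one.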
1.2 Comparison of AlexNet and VGG
- Like AlexNet and LeNet, the VGG network can be divided into two parts: the first part consists mainly of convolutional and pooling layers, and the second part consists of fully connected layers.
- The first part of the VGG network is formed by connecting several VGG blocks. The fully connected module is the same as in AlexNet.
- Compared with AlexNet, the feature map fed into the first fully connected layer of VGG-Net is larger: 7x7 vs 6x6 in spatial size, and 512 vs 256 in number of channels.
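The size difference in the last point is simple arithmetic. As a rough sketch (the variable and function names are illustrative; the 4096-unit width of the first fully connected layer follows from the fully connected module being the same in both networks), the flattened inputs and the resulting weight counts compare as follows:

```python
# Feature map entering the first fully connected layer: (height, width, channels).
vgg_feature_map = (7, 7, 512)
alexnet_feature_map = (6, 6, 256)

fc_units = 4096  # width of the first fully connected layer in both networks


def flattened_size(feature_map):
    """Number of values obtained by flattening a (height, width, channels) map."""
    height, width, channels = feature_map
    return height * width * channels


vgg_inputs = flattened_size(vgg_feature_map)
alexnet_inputs = flattened_size(alexnet_feature_map)

print("VGG:    ", vgg_inputs, "inputs ->", vgg_inputs * fc_units, "weights")
print("AlexNet:", alexnet_inputs, "inputs ->", alexnet_inputs * fc_units, "weights")
```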
1.3 Five Group Structures of VGG Neural Network
VGG-Net comes in five structures (denoted A~E). The structures are similar to one another; they differ in the depth of the network.
- The parts that differ between structures are shown in bold.
- The parameters of a convolutional layer are written convx-y, where x is the size of the convolution kernel and y is the number of convolution kernels. For example, conv3-64 denotes 64 convolution kernels of size 3x3.
- The number of channels of the convolutional layers starts small (64 channels) and then doubles after each pooling layer until it reaches 512.
- Each convolutional layer is followed by a ReLU activation function.
- Input layer: a fixed-size 224x224 RGB image.
- Convolution layer: the convolution stride is 1.
- Padding: the input of each convolutional layer is padded so that the spatial resolution is the same before and after the convolution.
  - 3x3 convolution: same padding, that is, 1 pixel of padding at the top, bottom, left, and right of the input.
  - 1x1 convolution: no padding required.
- Convolution kernel size: two sizes are used, 3x3 and 1x1.
  - 3x3 convolution kernel: the smallest size that captures the notions of left/right, up/down, and center.
  - 1x1 convolution kernel: a linear transformation of the input channels. It is followed by a ReLU activation function, which makes the overall transformation of the input channels non-linear.
- Pooling layer: max pooling is used.
  - Pooling layers follow convolutional layers, but not every convolutional layer is followed by pooling.
  - The pooling window is 2x2, and the stride is 2.
- The last four layers of the network are three fully connected layers plus one softmax layer.
  - The first two fully connected layers each have 4096 neurons, and the third fully connected layer has 1000 neurons (because the network classifies 1000 classes).
  - The last layer is a softmax layer, which outputs the class probabilities.
- All hidden layers use ReLU activation functions.
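To make the convx-y notation and the padding rule concrete, here is a small sketch in plain Python. The helper names parse_conv and output_size are hypothetical, and the padding choice kernel_size // 2 is an assumption that reproduces the rule above for odd kernel sizes (1 pixel for 3x3, none for 1x1). The size formula is the standard one for convolution and pooling layers.

```python
def parse_conv(spec):
    """Turn a string such as 'conv3-64' into (kernel_size, num_kernels)."""
    kernel_size, num_kernels = spec[len("conv"):].split("-")
    return int(kernel_size), int(num_kernels)


def output_size(size, kernel_size, stride, padding):
    """Spatial size after a convolution or pooling layer."""
    return (size + 2 * padding - kernel_size) // stride + 1


kernel_size, num_kernels = parse_conv("conv3-64")
print(kernel_size, num_kernels)

# Stride-1 convolutions with kernel_size // 2 padding keep the resolution.
after_conv3 = output_size(224, kernel_size=3, stride=1, padding=3 // 2)
after_conv1 = output_size(224, kernel_size=1, stride=1, padding=1 // 2)

# A 2x2 max pooling window with stride 2 halves the resolution.
after_pool = output_size(224, kernel_size=2, stride=2, padding=0)
print(after_conv3, after_conv1, after_pool)
```

Both convolution types leave a 224x224 input at 224x224, while one pooling layer reduces it to 112x112.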
1.4 Parameters of 5 Group Structures of VGG Neural Network
The first fully connected layer has 7x7x512x4096 = 102,760,448 weights (about 102 million), so most of the parameters of the network come from this layer.
network | A, A-LRN | B | C | D | E |
---|---|---|---|---|---|
number of parameters | 133 million | 133 million | 134 million | 138 million | 144 million |
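The parameter count of the shallowest configuration can be reproduced by hand. The following sketch assumes the layout of configuration A (8 convolutional layers of size 3x3 in 5 blocks), a 3-channel 224x224 input, 1000 output classes, and counts biases as well as weights. The function name and the tuple format of conv_arch_a are illustrative.

```python
def count_vgg_parameters(conv_arch, in_channels=3, num_classes=1000):
    """Return (conv_params, fc_params) for a VGG-style network on 224x224 input."""
    conv_params = 0
    for num_convs, out_channels in conv_arch:
        for _ in range(num_convs):
            # 3x3 kernel weights plus one bias per output channel.
            conv_params += 3 * 3 * in_channels * out_channels + out_channels
            in_channels = out_channels

    # Five 2x2 poolings reduce 224x224 to 7x7 before the classifier.
    fc_sizes = [in_channels * 7 * 7, 4096, 4096, num_classes]
    fc_params = sum(n_in * n_out + n_out
                    for n_in, n_out in zip(fc_sizes, fc_sizes[1:]))
    return conv_params, fc_params


# Configuration A (VGG-11): (number of conv layers, output channels) per block.
conv_arch_a = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
conv_params, fc_params = count_vgg_parameters(conv_arch_a)
total = conv_params + fc_params
first_fc_weights = 7 * 7 * 512 * 4096

print("conv layers:", conv_params)
print("fully connected layers:", fc_params)
print("total:", total)
print("share of first FC layer: {:.0%}".format(first_fc_weights / total))
```

Under these assumptions the total comes to roughly 133 million, and the first fully connected layer alone accounts for about three quarters of it.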
1.5 Implementation of VGG-11
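import torch.nn as nn
import torch

'''
The original VGG network has 5 convolutional blocks: the first two blocks each contain
one convolutional layer, and the last three blocks each contain two convolutional layers.
The first block has 64 output channels, and each subsequent block doubles the number of
output channels until it reaches 512.
Since the network uses 8 convolutional layers and 3 fully connected layers, it is usually
called VGG-11.
'''


class Vgg11Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = self.vgg()

    def forward(self, X):
        X = self.model(X)
        return X

    def vgg(self):
        conv_blks = []
        # Number of input channels, initialized to 1 (single-channel input)
        in_channels = 1
        # Convolutional part: 5 VGG blocks in total. The first two blocks each contain
        # one convolutional layer, the last three blocks each contain two.
        conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
        for (num_convs, out_channels) in conv_arch:
            # Add a VGG block
            conv_blks.append(self.vgg_block(num_convs, in_channels, out_channels))
            in_channels = out_channels
        return nn.Sequential(
            *conv_blks,
            nn.Flatten(),
            # Fully connected part, the same as in AlexNet.
            # The first fully connected layer has 7x7x512x4096 (about 102 million) weights,
            # so most of the network's parameters come from this layer.
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
            # 10 output classes here, instead of the 1000 ImageNet classes
            nn.Linear(4096, 10)
        )

    def vgg_block(self, num_convs, in_channels, out_channels):
        """
        :param num_convs: number of convolutional layers
        :param in_channels: number of input channels
        :param out_channels: number of output channels
        :return: a VGG block
        """
        layers = []
        for _ in range(num_convs):
            # Convolutional layer
            layers.append(
                # Padding: pad the input of the convolutional layer so that the
                # spatial resolution is the same before and after the convolution
                nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
            )
            layers.append(nn.ReLU())
            in_channels = out_channels
        # Pooling layer
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        return nn.Sequential(*layers)


if __name__ == '__main__':
    net = Vgg11Net()
    # Check that the network runs
    # inputs = torch.rand(size=(1, 1, 224, 224), dtype=torch.float32)
    # outputs = net(inputs)
    # print(outputs.shape)
    # Print the output shape of every layer
    X = torch.rand(size=(1, 1, 224, 224), dtype=torch.float32)
    for layer in net.model:
        X = layer(X)
        print(layer.__class__.__name__, 'output shape:', X.shape)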
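# 1. Five convolutional blocks: the first two blocks each contain one convolutional layer, the last three each contain two.
# The first block has 64 output channels, and each subsequent block doubles the number of output channels until it reaches 512.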
Sequential output shape: torch.Size([1, 64, 112, 112])
Sequential output shape: torch.Size([1, 128, 56, 56])
Sequential output shape: torch.Size([1, 256, 28, 28])
Sequential output shape: torch.Size([1, 512, 14, 14])
Sequential output shape: torch.Size([1, 512, 7, 7])
# 2. Three fully connected layers, the same as in AlexNet
Flatten output shape: torch.Size([1, 25088])
Linear output shape: torch.Size([1, 4096])
ReLU output shape: torch.Size([1, 4096])
Dropout output shape: torch.Size([1, 4096])
Linear output shape: torch.Size([1, 4096])
ReLU output shape: torch.Size([1, 4096])
Dropout output shape: torch.Size([1, 4096])
Linear output shape: torch.Size([1, 10])
2 Innovation points of the VGG paper
The download address of the paper is: https://arxiv.org/pdf/1409.1556.pdf
2.1 Initialization of weights
Since the network is deep, the initialization of network weights is important, and a poorly designed initialization may hinder learning.
- The weight initialization scheme of the paper is: train structure A first. When training deeper configurations, the first four convolutional layers and the last three fully connected layers are initialized from the trained A, and the other layers of the network are randomly initialized.
- The authors later pointed out that the weights can be initialized directly via Xavier uniform initialization, without pre-training.
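The second option is easy to try in PyTorch. The sketch below is an illustration, not the authors' code: init_weights is a hypothetical helper, the tiny network only stands in for a real VGG model, and zeroing the biases is an assumption. The calls nn.init.xavier_uniform_ and Module.apply are standard PyTorch.

```python
import math

import torch
from torch import nn

torch.manual_seed(0)


def init_weights(module):
    """Apply Xavier uniform initialization to conv and linear layers."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)  # assumption: biases start at zero


# A tiny stand-in network; the same call works on a full VGG model.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 10),
)
net.apply(init_weights)

# Xavier uniform draws from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)).
conv = net[0]
fan_in = 3 * 3 * 3    # in_channels * kernel_height * kernel_width
fan_out = 8 * 3 * 3   # out_channels * kernel_height * kernel_width
bound = math.sqrt(6.0 / (fan_in + fan_out))
print("largest |weight|:", conv.weight.abs().max().item(), "bound:", bound)
```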
2.2 Local Response Normalization Layer LRN
- The classification error decreases as the depth of the network increases.
- Comparing A with A-LRN shows that the local response normalization (LRN) layer does not improve the model.
2.3 Channel pixel zero mean
- First compute the per-channel means over all samples in the training set: the mean pixel value a of all red channels, the mean pixel value b of all green channels, and the mean pixel value c of all blue channels.
- For each sample: subtract a from each pixel value in the red channel, subtract b from each pixel value in the green channel, and subtract c from each pixel value in the blue channel.
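The two steps above can be sketched in a few lines of PyTorch. The random tensor is only a stand-in for a real training set, and the variable names are illustrative:

```python
import torch

torch.manual_seed(0)

# Stand-in for a training set: 16 RGB images of 32x32 pixels, values in [0, 1).
train_images = torch.rand(16, 3, 32, 32)

# One mean per channel, taken over all samples and all pixel positions.
channel_mean = train_images.mean(dim=(0, 2, 3))  # shape (3,): the values a, b, c

# Broadcast the three means over every pixel of every sample.
centered = train_images - channel_mean.view(1, 3, 1, 1)

print(channel_mean)
print(centered.mean(dim=(0, 2, 3)))  # close to zero for each channel
```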
The paper also covers other topics, such as multi-scale training and multi-scale testing; those who are interested can read the original text.
3 Application example of VGG-11 on Fashion-MNIST dataset
3.1 Create VGG-11 network model
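Note: since VGG-11 is more computationally expensive than AlexNet, we build a network with fewer channels, which is sufficient for training on the Fashion-MNIST dataset.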
import torch.nn as nn
import torch

'''
The original VGG network has 5 convolutional blocks: the first two blocks each contain
one convolutional layer, and the last three blocks each contain two convolutional layers.
The first block has 64 output channels, and each subsequent block doubles the number of
output channels until it reaches 512.
Since the network uses 8 convolutional layers and 3 fully connected layers, it is usually
called VGG-11.
'''


class Vgg11Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = self.vgg()

    def forward(self, X):
        X = self.model(X)
        return X

    def vgg(self):
        conv_blks = []
        # Number of input channels, initialized to 1 (Fashion-MNIST images are grayscale)
        in_channels = 1
        # Convolutional part: 5 VGG blocks in total. The first two blocks each contain
        # one convolutional layer, the last three blocks each contain two.
        conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
        # 1. Since VGG-11 is more computationally expensive than AlexNet, build a network
        # with fewer channels, which is sufficient for training on Fashion-MNIST.
        small_conv_arch = [(pair[0], pair[1] // 4) for pair in conv_arch]
        for (num_convs, out_channels) in small_conv_arch:
            # Add a VGG block
            conv_blks.append(self.vgg_block(num_convs, in_channels, out_channels))
            in_channels = out_channels
        return nn.Sequential(
            *conv_blks,
            nn.Flatten(),
            # Fully connected part, the same as in AlexNet
            # 2. Note: the channel count here changes from 512 to 128
            nn.Linear(128 * 7 * 7, 4096), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(4096, 10)
        )

    def vgg_block(self, num_convs, in_channels, out_channels):
        """
        :param num_convs: number of convolutional layers
        :param in_channels: number of input channels
        :param out_channels: number of output channels
        :return: a VGG block
        """
        layers = []
        for _ in range(num_convs):
            # Convolutional layer
            layers.append(
                nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
            )
            layers.append(nn.ReLU())
            in_channels = out_channels
        # Pooling layer
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        return nn.Sequential(*layers)


if __name__ == '__main__':
    net = Vgg11Net()
    # Check that the network runs
    # inputs = torch.rand(size=(1, 1, 224, 224), dtype=torch.float32)
    # outputs = net(inputs)
    # print(outputs.shape)
    # Print the output shape of every layer
    X = torch.rand(size=(1, 1, 224, 224), dtype=torch.float32)
    for layer in net.model:
        X = layer(X)
        print(layer.__class__.__name__, 'output shape:', X.shape)
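The effect of dividing the channel counts by 4 can be checked without building the model. This short sketch repeats the reduction from the code above and derives the input size of the first fully connected layer, assuming a 224x224 input that the five pooling layers reduce to 7x7:

```python
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

# Same reduction as in the model: keep the layer counts, quarter the channels.
small_conv_arch = [(num_convs, out_channels // 4)
                   for num_convs, out_channels in conv_arch]
print(small_conv_arch)

# The last block now outputs 128 channels on a 7x7 feature map.
last_channels = small_conv_arch[-1][1]
flattened = last_channels * 7 * 7
print(flattened)
```

The last block ends at 128 channels, which is why the first fully connected layer takes 128 * 7 * 7 inputs instead of 512 * 7 * 7.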
3.2 Read the Fashion-MNIST dataset
All other functions are exactly the same as in the article "Classic neural networks (1): LeNet and its application on the Fashion-MNIST dataset".
'''
Fashion-MNIST images have a lower resolution (28x28 pixels) than ImageNet images.
To compensate, they are resized to 224x224.
'''
batch_size = 128
train_iter, test_iter = get_mnist_data(batch_size, resize=224)
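The resize matters because of the five pooling layers. As a quick check in plain Python (the helper name is hypothetical; the rule is the 2x2, stride-2 pooling arithmetic described in section 1.3), compare how a 224x224 and a 28x28 input shrink:

```python
def sizes_after_pooling(size, num_pools=5):
    """Spatial size after each 2x2, stride-2 max pooling layer."""
    sizes = []
    for _ in range(num_pools):
        size = (size - 2) // 2 + 1
        sizes.append(size)
    return sizes


print(sizes_after_pooling(224))
print(sizes_after_pooling(28))
```

A 224x224 input reaches the expected 7x7 feature map, whereas a 28x28 input is down to a single pixel after the fourth pooling layer and has nothing left for the fifth, so the model as defined cannot process it.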
3.3 Model training on GPU
from _03_Vgg11Net import Vgg11Net
# Initialize the model
net = Vgg11Net()
lr, num_epochs = 0.05, 10
train_ch(net, train_iter, test_iter, num_epochs, lr, try_gpu())
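The helpers train_ch and try_gpu come from the earlier LeNet article and are not shown here. For orientation only, the sketch below shows what such helpers commonly look like; it is an assumption, not the author's code. The names try_gpu_sketch and train_sketch, the use of plain SGD with cross-entropy loss, and the tiny random dataset and linear model are all illustrative.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def try_gpu_sketch():
    """Return the first GPU if one is available, otherwise the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def train_sketch(net, train_iter, num_epochs, lr, device):
    """Train net with SGD and cross-entropy; return the mean loss per epoch."""
    net.to(device)
    optimizer = torch.optim.SGD(net.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    history = []
    for _ in range(num_epochs):
        net.train()
        total_loss, num_samples = 0.0, 0
        for X, y in train_iter:
            X, y = X.to(device), y.to(device)
            optimizer.zero_grad()
            loss = loss_fn(net(X), y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item() * X.shape[0]
            num_samples += X.shape[0]
        history.append(total_loss / num_samples)
    return history


# Tiny random stand-in for Fashion-MNIST so the sketch runs quickly on a CPU.
torch.manual_seed(0)
images = torch.rand(64, 1, 28, 28)
labels = torch.randint(0, 10, (64,))
train_iter = DataLoader(TensorDataset(images, labels), batch_size=16)

net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
losses = train_sketch(net, train_iter, num_epochs=2, lr=0.05,
                      device=try_gpu_sketch())
print(losses)
```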
Note: VGG-Net needs a GPU to run in practice. If your own computer has no suitable GPU, you can refer to the article below to rent one.