Classic neural networks (3): VGG-Net and its application on the Fashion-MNIST dataset

Classic neural networks (3): VGG, a network built from blocks

1 Brief introduction of VGG

1.1 Overview of VGG

  1. VGG-Net is a deep convolutional network jointly developed by the Visual Geometry Group at the University of Oxford and DeepMind. In the ILSVRC 2014 competition it won second place in the classification task and first place in the localization task. Its repetitive structure is easy to implement in any modern deep learning framework using loops and subroutines.
  2. The main contributions of VGG-Net are:
    • It shows that a deep network with small convolution kernels (3x3) performs better than a shallow network with large convolution kernels.
    • It demonstrates the importance of depth to the generalization performance of the network.
    • The use of blocks leads to a very compact network definition; complex networks can be designed efficiently using blocks.
    • It verifies the effectiveness of scale jittering, a data augmentation technique.
  3. The biggest problem of VGG-Net is its number of parameters: VGG-19 has one of the largest parameter counts among the classic convolutional network architectures.
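
One way to see why stacked 3x3 kernels are attractive is to compare weight counts. The sketch below is only illustrative: it assumes every layer has the same number of input and output channels and ignores biases, and the function names are made up for this example. Three stacked 3x3 layers cover the same 7x7 receptive field as a single 7x7 layer while using fewer weights.

```python
def conv_weights(kernel_size: int, channels: int, num_layers: int = 1) -> int:
    """Weights in a stack of conv layers with `channels` in and out (biases ignored)."""
    return num_layers * kernel_size * kernel_size * channels * channels


def receptive_field(kernel_size: int, num_layers: int) -> int:
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


channels = 512  # assumed channel width, chosen only for illustration
stacked = conv_weights(3, channels, num_layers=3)  # three 3x3 layers
single = conv_weights(7, channels, num_layers=1)   # one 7x7 layer

# Both variants see a 7x7 window of the input
print(receptive_field(3, 3), receptive_field(7, 1))
# The stacked variant needs 27*C^2 weights, the single layer 49*C^2
print(stacked, single, round(single / stacked, 2))
```

The stacked variant also applies three ReLU non-linearities instead of one, which is part of the argument for depth.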

1.2 Comparison of AlexNet and VGG

  • Like AlexNet and LeNet, the VGG network can be divided into two parts: the first consists mainly of convolutional and pooling layers, and the second consists of fully connected layers.

  • The first part of the VGG network is formed by connecting several VGG blocks in series. The fully connected part is the same as in AlexNet.

  • Compared with AlexNet, the feature map fed into the first fully connected layer of VGG-Net is larger: 7x7 vs 6x6 in spatial size, and 512 vs 256 in number of channels.

1.3 Five Group Structures of VGG Neural Network

VGG-Net has five groups of structures (denoted A~E). The groups are similar to each other; they differ in the depth of the network.

  • The parts that differ between structures are shown in bold.

  • The parameters of a convolutional layer are written as convx-y, where x is the size of the convolution kernel and y is the number of convolution kernels.

    For example: conv3-64 denotes 64 convolution kernels of size 3x3.

  • The number of channels starts small (64 channels) and doubles after each pooling layer until it reaches 512.

  • Each convolutional layer is followed by a ReLU activation function.

[Figure: VGG-Net configurations A~E]

  • Input layer: a fixed-size 224x224 RGB image.

  • Convolutional layer: the convolution stride is 1.

    • Padding: the input of each convolutional layer is padded so that the spatial resolution is the same before and after convolution.

      • 3x3 convolution: same padding, that is, 1 pixel of padding on the top, bottom, left, and right of the input.
      • 1x1 convolution: no padding required.
    • Convolution kernel size: there are two sizes, 3x3 and 1x1.

      • 3x3 convolution kernel: the smallest size that can capture the notions of left/right, up/down, and center.

      • 1x1 convolution kernel: a linear transformation of the input channels.

        It is followed by a ReLU activation function, which makes the overall transformation of the input channels non-linear.

  • Pooling layer: max pooling is used.

    • Pooling layers follow convolutional layers, but not every convolutional layer is followed by pooling.
    • The pooling window is 2x2 with a stride of 2.
  • The last four layers of the network are three fully connected layers followed by one softmax layer.

    • The first two fully connected layers have 4096 neurons each, and the third has 1000 neurons (one per class, since the task is 1000-class classification).
    • The final softmax layer outputs the class probabilities.
  • All hidden layers use the ReLU activation function.
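
The padding, stride, and pooling settings above determine how the spatial size shrinks through the network. A minimal sketch, assuming configuration A and the standard output-size formula floor((n + 2p - k) / s) + 1; the helper name conv_out is illustrative only:

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


# (number of 3x3 conv layers, output channels) per block; configuration A is assumed
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

size, shapes = 224, []
for num_convs, out_channels in conv_arch:
    for _ in range(num_convs):
        # 3x3 kernel, stride 1, padding 1: the spatial size is unchanged
        size = conv_out(size, kernel=3, stride=1, padding=1)
    # 2x2 max pooling with stride 2: the spatial size is halved
    size = conv_out(size, kernel=2, stride=2)
    shapes.append((out_channels, size, size))

print(shapes)
```

Five halvings take the 224x224 input down to 7x7, which is why the first fully connected layer receives 512 x 7 x 7 features.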

1.4 Parameters of 5 Group Structures of VGG Neural Network

The first fully connected layer has 7x7x512x4096 = 102,760,448 weights (about 103 million), so most of the network's parameters come from this layer.

Network                 A, A-LRN      B             C             D             E
Number of parameters    133 million   133 million   134 million   138 million   144 million
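
These counts can be checked by hand. The sketch below tallies weights and biases for configuration A, assuming the original ImageNet setting (3-channel RGB input, 1000 output classes) rather than a single-channel, 10-class variant; the helper names are illustrative.

```python
def conv_params(in_ch: int, out_ch: int, kernel: int = 3) -> int:
    """Weights plus biases of one convolutional layer."""
    return in_ch * out_ch * kernel * kernel + out_ch


def linear_params(in_features: int, out_features: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


# Configuration A: (number of conv layers, output channels) per block
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

in_ch, conv_total = 3, 0  # 3 input channels: RGB image (assumption)
for num_convs, out_ch in conv_arch:
    for _ in range(num_convs):
        conv_total += conv_params(in_ch, out_ch)
        in_ch = out_ch

fc_sizes = (512 * 7 * 7, 4096, 4096, 1000)  # 1000 classes (assumption)
fc_total = sum(linear_params(a, b) for a, b in zip(fc_sizes, fc_sizes[1:]))

first_fc_weights = 7 * 7 * 512 * 4096
total = conv_total + fc_total
print(first_fc_weights, conv_total, fc_total, total)
```

Under these assumptions the fully connected layers account for the large majority of the roughly 133 million parameters, and the first of them alone for about 103 million.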

1.5 Implementation of VGG-11

import torch
import torch.nn as nn

'''
The original VGG network has 5 convolutional blocks: the first two blocks have one
convolutional layer each, and the last three blocks have two convolutional layers each.
The first block has 64 output channels, and each subsequent block doubles the number
of output channels until it reaches 512.
Since the network uses 8 convolutional layers and 3 fully connected layers,
it is commonly called VGG-11.
'''
class Vgg11Net(nn.Module):

    def __init__(self):
        super().__init__()
        self.model = self.vgg()

    def forward(self, X):
        X = self.model(X)
        return X

    def vgg(self):
        conv_blks = []
        # Number of input channels, initialized to 1 (single-channel grayscale input)
        in_channels = 1

        # Convolutional part: 5 VGG blocks in total. The first two blocks have one
        # convolutional layer each, the last three have two convolutional layers each.
        conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
        for (num_convs, out_channels) in conv_arch:
            # Add a VGG block
            conv_blks.append(self.vgg_block(num_convs, in_channels, out_channels))
            in_channels = out_channels

        return nn.Sequential(
            *conv_blks,
            nn.Flatten(),
            # Fully connected part, the same as in AlexNet.
            # The first fully connected layer has 7x7x512x4096 (about 103 million) weights,
            # so most of the network's parameters come from this layer.
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
            # 10 output classes
            nn.Linear(4096, 10)
        )

    def vgg_block(self, num_convs, in_channels, out_channels):
        """
        :param num_convs:    number of convolutional layers
        :param in_channels:  number of input channels
        :param out_channels: number of output channels
        :return: a VGG block
        """
        layers = []
        for _ in range(num_convs):
            # Convolutional layer. padding=1 pads the input so that the spatial
            # resolution is the same before and after the 3x3 convolution.
            layers.append(
                nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
            )
            layers.append(nn.ReLU())
            in_channels = out_channels

        # Pooling layer
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))

        return nn.Sequential(*layers)


if __name__ == '__main__':
    net = Vgg11Net()
    # Check that the network runs
    # inputs = torch.rand(size=(1, 1, 224, 224), dtype=torch.float32)
    # outputs = net(inputs)
    # print(outputs.shape)
    # Print the output shape of every layer
    X = torch.rand(size=(1, 1, 224, 224), dtype=torch.float32)
    for layer in net.model:
        X = layer(X)
        print(layer.__class__.__name__, 'output shape:', X.shape)
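
Running the script prints the following output shapes:

# 1. Five convolutional blocks: the first two have one convolutional layer each, the last three have two each.
# The first block has 64 output channels; each subsequent block doubles the number of output channels until it reaches 512.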
Sequential output shape: torch.Size([1, 64, 112, 112])
Sequential output shape: torch.Size([1, 128, 56, 56])
Sequential output shape: torch.Size([1, 256, 28, 28])
Sequential output shape: torch.Size([1, 512, 14, 14])
Sequential output shape: torch.Size([1, 512, 7, 7])

# 2. Three fully connected layers, the same as in AlexNet
Flatten output shape: torch.Size([1, 25088])
Linear output shape: torch.Size([1, 4096])
ReLU output shape: torch.Size([1, 4096])
Dropout output shape: torch.Size([1, 4096])

Linear output shape: torch.Size([1, 4096])
ReLU output shape: torch.Size([1, 4096])
Dropout output shape: torch.Size([1, 4096])

Linear output shape: torch.Size([1, 10])

2 Innovation points of the VGG paper

The paper is available at: https://arxiv.org/pdf/1409.1556.pdf

2.1 Initialization of weights

Since the network is deep, the initialization of network weights is important, and a poorly designed initialization may hinder learning.

  • The weight initialization scheme in the paper is: first train configuration A. When training deeper configurations, the first four convolutional layers and the last three fully connected layers are initialized from the trained configuration A, and the remaining layers are randomly initialized.
  • The authors later pointed out that the weights can be initialized directly with Xavier uniform initialization, without pre-training.
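
A minimal sketch of the second option in PyTorch, using the built-in nn.init.xavier_uniform_. The helper name init_xavier and the choice to zero the biases are assumptions made for this example, not something prescribed by the text above.

```python
import math

import torch
import torch.nn as nn


def init_xavier(module: nn.Module) -> None:
    """Apply Xavier (Glorot) uniform init to conv and linear weights; zero the biases."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)  # assumption: biases start at zero


torch.manual_seed(0)
layer = nn.Linear(4096, 10)
layer.apply(init_xavier)  # for a whole model: net.apply(init_xavier)

# Xavier uniform samples from U(-a, a) with a = sqrt(6 / (fan_in + fan_out))
bound = math.sqrt(6.0 / (4096 + 10))
print(bound, layer.weight.abs().max().item())
```

Because nn.Module.apply visits every submodule, the same function can be passed to a full network to initialize all of its convolutional and fully connected layers.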

2.2 Local Response Normalization Layer LRN

  • The classification error decreases as the depth of the network increases.
  • Comparing A with A-LRN shows that the local response normalization (LRN) layer does not improve the model.

2.3 Channel pixel zero mean

  • First compute the per-channel means over all samples in the training set: the mean pixel value a of all red channels, the mean pixel value b of all green channels, and the mean pixel value c of all blue channels.
  • For each sample: subtract a from each pixel value in the red channel, subtract b from each pixel value in the green channel, and subtract c from each pixel value in the blue channel.
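
The two steps above can be sketched with tensor broadcasting. The random tensor below is only a stand-in for a real training set, and the image size and sample count are arbitrary assumptions.

```python
import torch

torch.manual_seed(0)
# Stand-in for a training set: 8 RGB images of 32x32 pixels (random data, illustration only)
images = torch.rand(8, 3, 32, 32)

# Step 1: per-channel mean over all samples and all pixel positions -> (a, b, c)
channel_mean = images.mean(dim=(0, 2, 3))

# Step 2: broadcast the means over batch, height, and width, then subtract
centered = images - channel_mean.view(1, 3, 1, 1)

print(channel_mean)
print(centered.mean(dim=(0, 2, 3)))  # close to zero for every channel
```

In practice the means are computed once on the training set and then reused unchanged for validation and test images.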

The paper also covers other topics, such as multi-scale training and multi-scale testing; interested readers can refer to the original text.

3 Application example of VGG-11 on Fashion-MNIST dataset

3.1 Create VGG-11 network model

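Note: Because VGG-11 is more computationally expensive than AlexNet, we build a network with fewer channels, which is still sufficient for training on the Fashion-MNIST dataset.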

import torch
import torch.nn as nn

'''
The original VGG network has 5 convolutional blocks: the first two blocks have one
convolutional layer each, and the last three blocks have two convolutional layers each.
The first block has 64 output channels, and each subsequent block doubles the number
of output channels until it reaches 512.
Since the network uses 8 convolutional layers and 3 fully connected layers,
it is commonly called VGG-11.
'''
class Vgg11Net(nn.Module):

    def __init__(self):
        super().__init__()
        self.model = self.vgg()

    def forward(self, X):
        X = self.model(X)
        return X

    def vgg(self):
        conv_blks = []
        # Number of input channels, initialized to 1 (single-channel grayscale input)
        in_channels = 1

        # Convolutional part: 5 VGG blocks in total. The first two blocks have one
        # convolutional layer each, the last three have two convolutional layers each.
        conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
        # 1. VGG-11 is more computationally expensive than AlexNet, so we divide every
        #    channel count by 4; the smaller network is sufficient for Fashion-MNIST.
        small_conv_arch = [(pair[0], pair[1] // 4) for pair in conv_arch]
        for (num_convs, out_channels) in small_conv_arch:
            # Add a VGG block
            conv_blks.append(self.vgg_block(num_convs, in_channels, out_channels))
            in_channels = out_channels

        return nn.Sequential(
            *conv_blks,
            nn.Flatten(),
            # Fully connected part, the same as in AlexNet.
            # 2. Note: the channel count here changes from 512 to 128.
            nn.Linear(128 * 7 * 7, 4096), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(4096, 10)
        )

    def vgg_block(self, num_convs, in_channels, out_channels):
        """
        :param num_convs:    number of convolutional layers
        :param in_channels:  number of input channels
        :param out_channels: number of output channels
        :return: a VGG block
        """
        layers = []
        for _ in range(num_convs):
            # Convolutional layer
            layers.append(
                nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
            )
            layers.append(nn.ReLU())
            in_channels = out_channels

        # Pooling layer
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))

        return nn.Sequential(*layers)


if __name__ == '__main__':
    net = Vgg11Net()
    # Check that the network runs
    # inputs = torch.rand(size=(1, 1, 224, 224), dtype=torch.float32)
    # outputs = net(inputs)
    # print(outputs.shape)
    # Print the output shape of every layer
    X = torch.rand(size=(1, 1, 224, 224), dtype=torch.float32)
    for layer in net.model:
        X = layer(X)
        print(layer.__class__.__name__, 'output shape:', X.shape)

3.2 Read the Fashion-MNIST dataset

All other functions are exactly the same as in Classic neural networks (1): LeNet and its application on the Fashion-MNIST dataset.

'''
Fashion-MNIST images (28x28 pixels) have a lower resolution than ImageNet images.
To match the input size the network expects, they are resized to 224x224.
'''
batch_size = 128
train_iter, test_iter = get_mnist_data(batch_size, resize=224)

3.3 Model training on GPU

from _03_Vgg11Net import Vgg11Net

# Initialize the model
net = Vgg11Net()

lr, num_epochs = 0.05, 10
train_ch(net, train_iter, test_iter, num_epochs, lr, try_gpu())

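Note: VGG-Net needs a GPU to run in practice. If your own computer does not have a suitable GPU, you can rent one by following the article below.

Detailed guide to renting a GPU on the AutoDL platform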

Origin blog.csdn.net/qq_44665283/article/details/130730586