[Deep Learning Experiment] Convolutional Neural Network (5): Classic model of deep convolutional neural network - VGG network (convolutional layer, pooling layer, fully connected layer)

Table of contents

1. Experiment introduction

2. Experimental environment

1. Configure the virtual environment

2. Library version introduction

3. Experimental content

0. Import necessary toolkits

1. conv_layer (create convolution block)

2. vgg_conv_block (convolution module: convolution layer*n, pooling layer)

3. vgg_fc_layer (fully connected layer)

4. VGG_S (simplified version of VGG model)

a. __init__

b. forward


1. Experiment introduction

        This experiment implements a simplified version of the VGG network and uses it to complete an image classification task.
       

        The VGG network is one of the classic deep convolutional neural network models, proposed by the Visual Geometry Group at the University of Oxford. It achieved excellent results in the 2014 ImageNet challenge (second place in the classification task and first place in the localization task) and is widely used in tasks such as image classification, object detection, and image generation.

        The main features of the VGG network are its very small convolution kernels (usually 3x3) and its deeper network structure. The network stacks multiple convolutional and pooling layers to increase depth step by step, thereby extracting multi-level feature representations of images. The basic building block consists of consecutive convolutional layers, each followed by a ReLU activation function, with a max-pooling layer at the end of each block to reduce the size of the feature map. This simple, regular structure makes the VGG network easy to understand and implement, and it generalizes well across different tasks.
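        One way to see why small kernels are attractive is to compare two stacked 3x3 convolutions with a single 5x5 convolution. The sketch below is only an illustration: the helper names conv_params and stacked_receptive_field are hypothetical, biases are ignored, and the channel count of 64 is an arbitrary assumption.

```python
def conv_params(k, c_in, c_out):
    """Number of weights in one k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def stacked_receptive_field(k, n):
    """Receptive field of n stacked k x k convolutions with stride 1."""
    return n * (k - 1) + 1


C = 64  # assumed channel count, chosen only for illustration

two_3x3 = 2 * conv_params(3, C, C)   # two stacked 3x3 layers
one_5x5 = conv_params(5, C, C)       # one 5x5 layer

print(stacked_receptive_field(3, 2))  # 5: same coverage as one 5x5 kernel
print(two_3x3, one_5x5)               # 73728 102400
```

        Under these assumptions the stacked version covers the same 5x5 region with fewer weights and one extra non-linearity between the two layers.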

        There are several variants of the VGG network, such as VGG11, VGG13, VGG16, and VGG19. The number in each name is the number of layers with learnable weights (convolutional plus fully connected layers). The variants differ in depth and parameter count: deeper networks generally have stronger representational capacity but are also more expensive to train and run.
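        The naming rule can be checked by counting layers in the usual published configurations. In the sketch below, numbers are output channels of 3x3 convolutions and "M" marks a max-pooling layer; the dictionary and function names are hypothetical, and the three fully connected layers of the standard classifier are added as a constant.

```python
# Convolutional configurations of the common VGG variants.
# Integers are conv output channels, "M" is a max-pooling layer.
CONFIGS = {
    "VGG11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG13": [64, 64, "M", 128, 128, "M", 256, 256, "M",
              512, 512, "M", 512, 512, "M"],
    "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

NUM_FC_LAYERS = 3  # the standard VGG classifier has three linear layers


def weight_layers(name):
    """Count layers with learnable weights (pooling layers have none)."""
    convs = sum(1 for v in CONFIGS[name] if v != "M")
    return convs + NUM_FC_LAYERS


for name in CONFIGS:
    print(name, weight_layers(name))
```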

2. Experimental environment

    This series of experiments uses the PyTorch deep learning framework. The relevant operations are as follows:

1. Configure the virtual environment

conda create -n DL python=3.7 
conda activate DL
pip install torch==1.8.1+cu102 torchvision==0.9.1+cu102 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
conda install matplotlib
conda install scikit-learn

2. Library version introduction

Package         Version used in this experiment    Latest version at time of writing
matplotlib      3.5.3                              3.8.0
numpy           1.21.6                             1.26.0
python          3.7.16
scikit-learn    0.22.1                             1.3.0
torch           1.8.1+cu102                        2.0.1
torchaudio      0.8.1                              2.0.2
torchvision     0.9.1+cu102                        0.15.2

3. Experimental content

ChatGPT:

        A convolutional neural network (CNN) is a deep learning model that is widely used in image recognition, computer vision, pattern recognition, and other fields. Its design is inspired by how the biological visual cortex works.

        A convolutional neural network consists of multiple convolutional layers, pooling layers, and fully connected layers.

  • The convolutional layer extracts local features of the image. By applying convolution operations followed by activation functions, the network learns feature representations of the image.
  • The pooling layer reduces the spatial size of the feature map, which lowers the computation and the number of parameters in later layers while retaining the main feature information.
  • The fully connected layer maps the extracted features to class scores (or output values) for classification or regression tasks.

        Convolutional neural networks have strong advantages in image processing. They automatically learn hierarchical feature representations and have a degree of invariance to image transformations such as translation and, to a lesser extent, scaling and rotation. These characteristics make convolutional neural networks a common choice for tasks such as image classification, object detection, and semantic segmentation. Beyond image processing, convolutional neural networks can also be applied to other fields, such as natural language processing and time series analysis, either by converting text or time series data into a two-dimensional form or by using one-dimensional convolutions.
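        The roles of the three layer types described above can be seen by following tensor shapes through one layer of each kind. This is a minimal sketch: the input size (one 32x32 RGB image), the 8 output channels, and the 10 classes are assumptions made only for illustration.

```python
import torch
from torch import nn

x = torch.randn(1, 3, 32, 32)  # assumed input: one 32x32 RGB image

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)  # extracts local features
pool = nn.MaxPool2d(kernel_size=2, stride=2)      # halves height and width
fc = nn.Linear(8 * 16 * 16, 10)                   # maps features to 10 scores

features = torch.relu(conv(x))    # shape (1, 8, 32, 32)
pooled = pool(features)           # shape (1, 8, 16, 16)
scores = fc(pooled.flatten(1))    # shape (1, 10)

print(features.shape, pooled.shape, scores.shape)
```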

0. Import necessary toolkits

import torch
from torch import nn
import torch.nn.functional as F

1. conv_layer (create convolution block)

  • Each convolution block consists of three layers:
    • nn.Conv2d: convolution layer
    • nn.BatchNorm2d: batch normalization layer
    • nn.ReLU: ReLU activation layer

def conv_layer(chann_in, chann_out, k_size, p_size):
    layer = nn.Sequential(
        nn.Conv2d(chann_in, chann_out, kernel_size=k_size, padding=p_size),
        nn.BatchNorm2d(chann_out),
        nn.ReLU()
    )
    return layer

  • nn.Conv2d(chann_in, chann_out, kernel_size=k_size, padding=p_size): a two-dimensional convolution layer that performs the convolution operation on the input feature map. chann_in is the number of input channels, chann_out is the number of output channels, kernel_size is the convolution kernel size, and padding is the padding size.

  • nn.BatchNorm2d(chann_out): a batch normalization layer that normalizes the output of the convolution layer over each mini-batch, which speeds up training and makes the network more robust.

  • nn.ReLU(): a ReLU activation layer that applies a non-linear mapping to the normalized output, increasing the expressive power of the network.
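        A quick way to check the block described above is to pass a random batch through it. The sketch repeats conv_layer so that it runs on its own; the batch size of 4 and the 32x32 input are assumptions for illustration. With a 3x3 kernel and padding 1, the spatial size is unchanged and only the channel count changes.

```python
import torch
from torch import nn


def conv_layer(chann_in, chann_out, k_size, p_size):
    # Same structure as in the article: convolution, batch norm, ReLU
    return nn.Sequential(
        nn.Conv2d(chann_in, chann_out, kernel_size=k_size, padding=p_size),
        nn.BatchNorm2d(chann_out),
        nn.ReLU(),
    )


block = conv_layer(3, 64, k_size=3, p_size=1)
x = torch.randn(4, 3, 32, 32)   # assumed batch of four 32x32 RGB images
y = block(x)

print(y.shape)                  # torch.Size([4, 64, 32, 32])
print(bool((y >= 0).all()))     # True: ReLU removes negative values
```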

2. vgg_conv_block (convolution module: convolution layer, pooling layer)

        It consists of one or more convolution blocks followed by a max pooling layer.

def vgg_conv_block(in_list, out_list, k_list, p_list, pooling_k, pooling_s):
    # One conv_layer per entry; the four lists must have the same length
    layers = [conv_layer(in_list[i], out_list[i], k_list[i], p_list[i])
              for i in range(len(in_list))]
    # A max pooling layer closes the module and downsamples the feature map
    layers += [nn.MaxPool2d(kernel_size=pooling_k, stride=pooling_s)]
    return nn.Sequential(*layers)

  • The input parameters of the function are:
    • in_list, out_list, k_list, and p_list: the number of input channels, number of output channels, convolution kernel size, and padding size of each convolution block.
    • pooling_k and pooling_s: the kernel size and stride of the max pooling layer.
  • A list comprehension calls conv_layer once per entry to create the convolution blocks, which are stored in order in the list layers. An instance of the max pooling layer (nn.MaxPool2d) is then appended to the end of layers.
  • nn.Sequential chains the layers in the list in order, and the function returns a convolution module containing all of them.
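        The effect of the module can be sketched with the same arguments the model later uses for its first stage. The two helper functions are repeated so the example runs on its own; the input size is an assumption. The two convolution blocks keep the 32x32 size, and the 2x2 pooling with stride 2 halves it.

```python
import torch
from torch import nn


def conv_layer(chann_in, chann_out, k_size, p_size):
    return nn.Sequential(
        nn.Conv2d(chann_in, chann_out, kernel_size=k_size, padding=p_size),
        nn.BatchNorm2d(chann_out),
        nn.ReLU(),
    )


def vgg_conv_block(in_list, out_list, k_list, p_list, pooling_k, pooling_s):
    layers = [conv_layer(in_list[i], out_list[i], k_list[i], p_list[i])
              for i in range(len(in_list))]
    layers += [nn.MaxPool2d(kernel_size=pooling_k, stride=pooling_s)]
    return nn.Sequential(*layers)


# Two conv blocks (3 -> 64 -> 64 channels) followed by one pooling layer
block = vgg_conv_block([3, 64], [64, 64], [3, 3], [1, 1], 2, 2)
x = torch.randn(4, 3, 32, 32)   # assumed input size
y = block(x)

print(len(block))   # 3 sub-modules: conv block, conv block, max pool
print(y.shape)      # torch.Size([4, 64, 16, 16])
```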

3. vgg_fc_layer (fully connected layer)

        The fully connected layer consists of three layers: an nn.Linear linear layer, an nn.BatchNorm1d batch normalization layer, and a ReLU activation layer.

def vgg_fc_layer(size_in, size_out):
    layer = nn.Sequential(
        nn.Linear(size_in, size_out),
        nn.BatchNorm1d(size_out),
        nn.ReLU()
    )
    return layer
  • The input parameters of the function are size_in and size_out, which are the sizes of the input and output features respectively.
  • nn.Sequential chains the linear layer, batch normalization layer, and ReLU activation layer in order, and the function returns the resulting fully connected module.
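        A short usage sketch for this module follows; the batch sizes are assumptions. One practical detail worth knowing is that batch normalization in training mode needs more than one sample per batch, so the single-sample call below switches the module to evaluation mode first.

```python
import torch
from torch import nn


def vgg_fc_layer(size_in, size_out):
    return nn.Sequential(
        nn.Linear(size_in, size_out),
        nn.BatchNorm1d(size_out),
        nn.ReLU(),
    )


fc = vgg_fc_layer(4096, 1024)

# Training mode: statistics are computed over the batch of 8 samples
batch_out = fc(torch.randn(8, 4096))
print(batch_out.shape)    # torch.Size([8, 1024])

# Evaluation mode: running statistics are used, so one sample is fine
fc.eval()
with torch.no_grad():
    single_out = fc(torch.randn(1, 4096))
print(single_out.shape)   # torch.Size([1, 1024])
```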

4. VGG_S (simplified version of VGG model)

        For simplicity, we use fewer convolutional layers than the standard VGG variants.

class VGG_S(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        
        self.layer1 = vgg_conv_block([3,64], [64,64], [3,3], [1,1], 2, 2)   
        self.layer2 = vgg_conv_block([64,128], [128,128], [3,3], [1,1], 2, 2)
        self.layer3 = vgg_conv_block([128,256,256], [256,256,256], [3,3,3], [1,1,1], 2, 2)

        # Fully connected layer (4096 = 256 channels x 4 x 4, e.g. for 32x32 inputs)
        self.layer4 = vgg_fc_layer(4096, 1024)
        # Final layer
        self.layer5 = nn.Linear(1024, num_classes)
        
    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        features = self.layer3(out)
        # Flatten (N, C, H, W) to (N, C*H*W) for the fully connected layers
        out = features.view(features.size(0), -1)
        out = self.layer4(out)
        out = self.layer5(out)

        return out

a. __init__

  • Three convolution modules (layer1, layer2, and layer3) are created by calling the vgg_conv_block function, which takes their input channel counts, output channel counts, convolution kernel sizes, padding sizes, and the kernel size and stride of the max pooling layer.
  • A fully connected layer (layer4) is created with an input feature size of 4096 and an output feature size of 1024.
  • The last layer (layer5) is created with nn.Linear and maps the 1024-dimensional features to the number of predicted categories.
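        The input size of 4096 for layer4 follows from the shapes produced by the three convolution modules. The arithmetic below is a sketch that assumes 32x32 input images (the article does not state the input size; this is the size for which the numbers work out). The helper names conv_out and pool_out are hypothetical.

```python
def conv_out(size, k, p, s=1):
    """Output height/width of a convolution with kernel k, padding p, stride s."""
    return (size + 2 * p - k) // s + 1


def pool_out(size, k, s):
    """Output height/width of a pooling layer with kernel k and stride s."""
    return (size - k) // s + 1


size = 32        # assumed input height and width
channels = 3

# (number of conv blocks, output channels) for layer1, layer2, layer3
for n_convs, out_channels in [(2, 64), (2, 128), (3, 256)]:
    for _ in range(n_convs):
        size = conv_out(size, k=3, p=1)   # 3x3 kernel, padding 1: size kept
    size = pool_out(size, k=2, s=2)       # 2x2 pooling, stride 2: size halved
    channels = out_channels
    print(channels, size)

flat_features = channels * size * size
print(flat_features)   # 4096 = 256 * 4 * 4
```

        With a different input resolution the flattened size changes, and the first argument of vgg_fc_layer would have to change with it.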

b. forward

        The input passes through the three convolution modules, and the resulting features are flattened into a one-dimensional vector per sample with the view function. The feature vector then passes through the fully connected layer and the last layer, which outputs the prediction.
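        Putting the pieces together, the forward pass can be smoke-tested with random data. This is a self-contained sketch rather than the article's training setup: the 10 classes, the batch of two images, and the 32x32 input size are assumptions, and the helper functions are repeated in compact form so the block runs on its own.

```python
import torch
from torch import nn


def conv_layer(chann_in, chann_out, k_size, p_size):
    return nn.Sequential(
        nn.Conv2d(chann_in, chann_out, kernel_size=k_size, padding=p_size),
        nn.BatchNorm2d(chann_out),
        nn.ReLU(),
    )


def vgg_conv_block(in_list, out_list, k_list, p_list, pooling_k, pooling_s):
    layers = [conv_layer(in_list[i], out_list[i], k_list[i], p_list[i])
              for i in range(len(in_list))]
    layers += [nn.MaxPool2d(kernel_size=pooling_k, stride=pooling_s)]
    return nn.Sequential(*layers)


def vgg_fc_layer(size_in, size_out):
    return nn.Sequential(
        nn.Linear(size_in, size_out),
        nn.BatchNorm1d(size_out),
        nn.ReLU(),
    )


class VGG_S(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.layer1 = vgg_conv_block([3, 64], [64, 64], [3, 3], [1, 1], 2, 2)
        self.layer2 = vgg_conv_block([64, 128], [128, 128], [3, 3], [1, 1], 2, 2)
        self.layer3 = vgg_conv_block([128, 256, 256], [256, 256, 256],
                                     [3, 3, 3], [1, 1, 1], 2, 2)
        self.layer4 = vgg_fc_layer(4096, 1024)   # 4096 assumes 32x32 inputs
        self.layer5 = nn.Linear(1024, num_classes)

    def forward(self, x):
        out = self.layer3(self.layer2(self.layer1(x)))
        out = out.view(out.size(0), -1)          # flatten per sample
        return self.layer5(self.layer4(out))


model = VGG_S(num_classes=10)   # 10 classes is an assumed value
model.eval()                    # use running batch-norm statistics
with torch.no_grad():
    logits = model(torch.randn(2, 3, 32, 32))

print(logits.shape)             # torch.Size([2, 10])
```

        The output holds one raw score per class for each image; a softmax or a cross-entropy loss would typically be applied on top of it.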

Origin blog.csdn.net/m0_63834988/article/details/133350927