1. Experiment introduction
This experiment implements a simplified version of the VGG network and uses it for an image classification task.
The VGG network is one of the classic deep convolutional neural network models, proposed by the Visual Geometry Group of the University of Oxford. It achieved excellent results in the 2014 ImageNet challenge (second place in the classification task and first place in the localization task) and is widely used in tasks such as image classification, object detection and image generation.
The main features of the VGG network are its very small convolution kernels (usually 3×3) and its deep structure. The network stacks many convolutional and pooling layers to increase its depth step by step, which lets it extract image features at multiple levels. Its basic building block is a sequence of consecutive convolutional layers, each followed by a ReLU activation function, with a max pooling layer at the end of each block to reduce the size of the feature map. This simple and effective structure makes VGG easy to understand and implement, and it generalizes well across different tasks.
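A common argument for the small kernels described above: a stack of 3×3 convolutions (stride 1) covers the same receptive field as one larger kernel, uses fewer weights, and places a ReLU between layers. The sketch below only does that arithmetic in plain Python; the channel count C = 256 is an arbitrary example value, bias terms are ignored, and the helper names are illustrative.

```python
def stacked_3x3_receptive_field(n_layers):
    """Receptive field (in pixels) of n stacked 3x3 convolutions with stride 1."""
    # Each additional 3x3 layer with stride 1 widens the field by 2 pixels.
    return 1 + 2 * n_layers


def conv_weights(kernel, channels, n_layers=1):
    """Weights (bias ignored) of n_layers kernel x kernel convs, C in and C out channels."""
    return n_layers * kernel * kernel * channels * channels


C = 256  # arbitrary example channel count
for n in (2, 3):
    rf = stacked_3x3_receptive_field(n)
    stacked = conv_weights(3, C, n)
    single = conv_weights(rf, C)
    print(f"{n} stacked 3x3 convs -> {rf}x{rf} field: {stacked} weights "
          f"vs {single} for a single {rf}x{rf} conv")
```

For three layers the ratio is 27C² to 49C² weights for the same 7×7 field.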
There are several variants of the VGG network, such as VGG11, VGG13, VGG16 and VGG19, where the number is the count of weight layers (convolutional plus fully connected layers; pooling layers are not counted). The variants differ in depth and number of parameters: deeper networks generally have greater representational capacity, but they are also more expensive to train and run.
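The naming rule above can be checked against the layer layouts of the standard variants. The lists below follow configurations A, B, D and E of the original VGG paper: each number is the output channel count of a 3×3 convolution and "M" marks a 2×2 max pooling layer. The dictionary and helper names are illustrative, not a library API.

```python
# Convolutional layouts of the standard VGG variants
# (number = output channels of a 3x3 conv, "M" = 2x2 max pooling).
VGG_CONFIGS = {
    "VGG11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG13": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}
NUM_FC_LAYERS = 3  # all four variants end with three fully connected layers


def weight_layers(cfg):
    """Count convolutional layers (pooling excluded) plus the fully connected layers."""
    conv_layers = sum(1 for item in cfg if item != "M")
    return conv_layers + NUM_FC_LAYERS


for name, cfg in VGG_CONFIGS.items():
    print(name, "->", weight_layers(cfg), "weight layers")
```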
2. Experimental environment
This series of experiments uses the PyTorch deep learning framework. The relevant operations are as follows:
1. Configure the virtual environment
conda create -n DL python=3.7
conda activate DL
pip install torch==1.8.1+cu102 torchvision==0.9.1+cu102 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
conda install matplotlib
conda install scikit-learn
2. Library version introduction
Package | Version used in this experiment | Latest version (at time of writing) |
matplotlib | 3.5.3 | 3.8.0 |
numpy | 1.21.6 | 1.26.0 |
python | 3.7.16 | |
scikit-learn | 0.22.1 | 1.3.0 |
torch | 1.8.1+cu102 | 2.0.1 |
torchaudio | 0.8.1 | 2.0.2 |
torchvision | 0.9.1+cu102 | 0.15.2 |
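To compare a local environment with the versions listed in the table, a small check script can help. This is a sketch that relies only on the standard library and each package's __version__ attribute; note that scikit-learn is imported as sklearn, and the script and helper names are hypothetical.

```python
import importlib
import platform

# Versions used in this experiment (from the table above), keyed by import name.
EXPECTED = {
    "matplotlib": "3.5.3",
    "numpy": "1.21.6",
    "sklearn": "0.22.1",  # scikit-learn is imported as "sklearn"
    "torch": "1.8.1+cu102",
    "torchaudio": "0.8.1",
    "torchvision": "0.9.1+cu102",
}


def installed_versions(names):
    """Map each import name to its __version__, "unknown", or None if it cannot be imported."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:  # missing package, or one that fails to load
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    print("python", platform.python_version(), "(this experiment: 3.7.16)")
    for name, version in installed_versions(EXPECTED).items():
        status = "same" if version == EXPECTED[name] else "differs"
        print(f"{name:12s} {str(version):14s} expected {EXPECTED[name]} [{status}]")
```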
3. Experimental content
ChatGPT:
Convolutional Neural Network (CNN) is a deep learning model that is widely used in image recognition, computer vision, pattern recognition and other fields. Its design is inspired by how the visual cortex works in biology.
A convolutional neural network consists of convolutional layers, pooling layers and fully connected layers.
- The convolution layer extracts local features of the image. Through convolution operations followed by activation functions, it learns feature representations of the image.
- The pooling layer reduces the spatial size of the feature map while retaining the main feature information. This lowers the computation and the number of parameters in the layers that follow (pooling itself has no learnable parameters).
- The fully connected layer is used to map the extracted features to the probabilities of different categories for classification or regression tasks.
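To make the convolution and pooling bullets concrete, here is a single-channel toy example in plain Python: a 3×3 vertical-edge kernel, ReLU, and 2×2 max pooling with stride 2. Like deep learning libraries, it computes cross-correlation (the kernel is not flipped). Real layers add multiple channels, learned kernels, bias, padding and stride options; the helper names are illustrative and are not the PyTorch API.

```python
def conv2d(image, kernel):
    """Single-channel cross-correlation, stride 1, no padding ("valid" output)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(feature_map):
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map, size=2, stride=2):
    """Max over size x size windows; leftover rows/columns are dropped (no padding)."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [[max(feature_map[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


# 6x6 image: dark left half (0), bright right half (1)
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Vertical-edge detector: responds where brightness increases from left to right
kernel = [[-1, 0, 1] for _ in range(3)]

features = relu(conv2d(image, kernel))  # 4x4 map, strongest at the edge
pooled = max_pool2d(features)           # 2x2 map that keeps the strongest responses
print(features)
print(pooled)
```

The 6×6 input shrinks to 4×4 after the unpadded convolution and to 2×2 after pooling, while the edge response survives.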
Convolutional neural networks have strong advantages in image processing. They automatically learn hierarchical feature representations and have a degree of invariance to small translations and, to a lesser extent, to scaling and rotation. These properties make convolutional neural networks a standard choice for tasks such as image classification, object detection and semantic segmentation. Beyond image processing, they are also applied in other fields, such as natural language processing and time series analysis: text or time series data is arranged as an array (for example, a sequence of word embeddings), and convolutions slide along it.
0. Import necessary toolkits
import torch
from torch import nn
import torch.nn.functional as F
1. conv_layer (create convolution block)
- Each convolutional block consists of three layers: an nn.Conv2d convolution layer, an nn.BatchNorm2d batch normalization layer and a ReLU activation layer.
def conv_layer(chann_in, chann_out, k_size, p_size):
    layer = nn.Sequential(
        nn.Conv2d(chann_in, chann_out, kernel_size=k_size, padding=p_size),
        nn.BatchNorm2d(chann_out),
        nn.ReLU()
    )
    return layer
- nn.Conv2d(chann_in, chann_out, kernel_size=k_size, padding=p_size): two-dimensional convolution layer that convolves the input feature map. chann_in is the number of input channels, chann_out the number of output channels, kernel_size the convolution kernel size and padding the padding size.
- nn.BatchNorm2d(chann_out): batch normalization layer that normalizes each output channel of the convolution layer over the mini-batch, which speeds up and stabilizes training.
- nn.ReLU(): ReLU activation layer that applies a non-linear mapping to the normalized output, increasing the expressive power of the network.
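As a worked example, the number of trainable parameters in one conv_layer follows directly from the three layers above. The sketch assumes PyTorch defaults (nn.Conv2d has bias=True; nn.BatchNorm2d has one learnable scale and one learnable shift per channel); the helper name is hypothetical. With PyTorch installed, the result can be cross-checked with sum(p.numel() for p in conv_layer(3, 64, 3, 1).parameters()).

```python
def conv_layer_params(chann_in, chann_out, k_size):
    """Trainable parameters of conv_layer(chann_in, chann_out, k_size, p_size)."""
    conv_weights = chann_out * chann_in * k_size * k_size
    conv_bias = chann_out       # nn.Conv2d uses bias=True by default
    bn_affine = 2 * chann_out   # BatchNorm2d: scale (weight) and shift (bias) per channel
    # ReLU and padding add no parameters; BatchNorm's running mean/variance are
    # buffers, not trainable parameters.
    return conv_weights + conv_bias + bn_affine


print(conv_layer_params(3, 64, 3))   # first block of layer1
print(conv_layer_params(64, 64, 3))  # second block of layer1
```

Because batch normalization subtracts the per-channel mean, the convolution bias is effectively cancelled; a common refinement (not what this experiment's code does) is to create the convolution with bias=False.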
2. vgg_conv_block (convolution module: n convolution layers + pooling layer)
It consists of several convolution blocks (conv_layer) followed by one max pooling layer.
def vgg_conv_block(in_list, out_list, k_list, p_list, pooling_k, pooling_s):
    layers = [conv_layer(in_list[i], out_list[i], k_list[i], p_list[i]) for i in range(len(in_list))]
    layers += [nn.MaxPool2d(kernel_size=pooling_k, stride=pooling_s)]
    return nn.Sequential(*layers)
- The input parameters of the function are in_list, out_list, k_list, p_list, pooling_k and pooling_s. The four lists give, for each convolution block in turn, the number of input channels, the number of output channels, the convolution kernel size and the padding size; pooling_k and pooling_s are the kernel size and stride of the max pooling layer. The lists must have the same length, and in_list[i + 1] must equal out_list[i] so that consecutive blocks fit together.
- A list comprehension calls conv_layer once per list entry and stores the resulting convolution blocks, in order, in the list layers. A max pooling layer (nn.MaxPool2d) is then appended to the end of layers.
- nn.Sequential chains the layers in layers in order, and the function returns the resulting convolution module.
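The shape bookkeeping of one vgg_conv_block can be sketched without PyTorch. The helper below (hypothetical, not part of the experiment's code) mirrors the function's arguments, checks the channel-chaining requirement described above, and applies the standard output-size formula floor((size + 2·padding − kernel) / stride) + 1, assuming nn.Conv2d's default stride of 1 and nn.MaxPool2d's default padding of 0.

```python
def out_size(size, kernel, padding, stride=1):
    """Output length along one spatial dimension for a conv or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def vgg_conv_block_shape(in_shape, in_list, out_list, k_list, p_list, pooling_k, pooling_s):
    """Return the (channels, height, width) produced by vgg_conv_block for one sample."""
    c, h, w = in_shape
    for c_in, c_out, k, p in zip(in_list, out_list, k_list, p_list):
        if c_in != c:
            raise ValueError(f"block expects {c_in} input channels but receives {c}")
        # Convolution with stride 1: channels change, H and W follow the formula
        c, h, w = c_out, out_size(h, k, p), out_size(w, k, p)
    # Max pooling: no padding, stride pooling_s
    return c, out_size(h, pooling_k, 0, pooling_s), out_size(w, pooling_k, 0, pooling_s)


# layer1 of VGG_S on a 3x32x32 image
print(vgg_conv_block_shape((3, 32, 32), [3, 64], [64, 64], [3, 3], [1, 1], 2, 2))
```

With 3×3 kernels and padding 1, the convolutions preserve height and width, so only the pooling layer changes the spatial size.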
3. vgg_fc_layer (fully connected layer)
The fully connected layer consists of three layers: an nn.Linear linear layer, an nn.BatchNorm1d batch normalization layer and a ReLU activation layer.
def vgg_fc_layer(size_in, size_out):
    layer = nn.Sequential(
        nn.Linear(size_in, size_out),
        nn.BatchNorm1d(size_out),
        nn.ReLU()
    )
    return layer
- The input parameters of the function are size_in and size_out, the sizes of the input and output features, respectively.
- nn.Sequential connects the linear layer, the batch normalization layer and the ReLU activation layer in order, and the function returns the resulting fully connected module.
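What vgg_fc_layer computes in training mode can be sketched in plain Python on a tiny batch: y = xWᵀ + b (the nn.Linear convention), then per-feature normalization using the batch mean and biased variance, then ReLU. This sketch assumes the batch-norm scale is 1 and the shift is 0 (their initial values) and uses PyTorch's default eps of 1e-5; it ignores running statistics, which are used instead in evaluation mode. It also shows why a batch needs more than one sample: with a single sample every feature's variance is zero, and nn.BatchNorm1d raises an error for such a batch in training mode. All names are illustrative.

```python
import math


def linear(batch, weight, bias):
    """y = x W^T + b, with weight shaped [out_features][in_features] as in nn.Linear."""
    return [[sum(x * w for x, w in zip(row, w_row)) + b for w_row, b in zip(weight, bias)]
            for row in batch]


def batch_norm_1d(batch, eps=1e-5):
    """Training-mode batch norm with scale 1 and shift 0: normalize each feature over the batch."""
    n = len(batch)
    columns = list(zip(*batch))
    means = [sum(col) / n for col in columns]
    variances = [sum((v - m) ** 2 for v in col) / n  # biased variance, as used for normalization
                 for col, m in zip(columns, means)]
    return [[(v - m) / math.sqrt(var + eps) for v, m, var in zip(row, means, variances)]
            for row in batch]


def relu(batch):
    return [[max(0.0, v) for v in row] for row in batch]


def vgg_fc_forward(batch, weight, bias):
    """Linear -> BatchNorm1d -> ReLU, the same order as vgg_fc_layer."""
    return relu(batch_norm_1d(linear(batch, weight, bias)))


batch = [[1.0, 0.0], [3.0, 2.0]]      # 2 samples, 2 input features
weight = [[1.0, 0.0], [1.0, 1.0]]     # 2 output features
bias = [0.0, 0.0]
print(vgg_fc_forward(batch, weight, bias))  # roughly [[0.0, 0.0], [1.0, 1.0]]
```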
4. VGG_S (simplified version of VGG model)
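For simplicity, we use fewer convolutional layers than the standard variants: VGG_S keeps three convolution modules with 7 convolutional layers in total (VGG16 has 13 in five modules) and a single hidden fully connected layer.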
class VGG_S(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # Convolution modules: 2 + 2 + 3 conv layers, each module ends with 2x2 max pooling
        self.layer1 = vgg_conv_block([3, 64], [64, 64], [3, 3], [1, 1], 2, 2)
        self.layer2 = vgg_conv_block([64, 128], [128, 128], [3, 3], [1, 1], 2, 2)
        self.layer3 = vgg_conv_block([128, 256, 256], [256, 256, 256], [3, 3, 3], [1, 1, 1], 2, 2)
        # Fully connected layer: 4096 = 256 channels * 4 * 4, i.e. 32x32 input images
        self.layer4 = vgg_fc_layer(4096, 1024)
        # Final layer: one score per class
        self.layer5 = nn.Linear(1024, num_classes)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        features = self.layer3(out)
        out = features.view(features.size(0), -1)  # flatten to (batch, 4096)
        out = self.layer4(out)
        out = self.layer5(out)
        return out
a. __init__
- Three convolution modules (layer1, layer2 and layer3) are created by calling vgg_conv_block, which specifies for each module the numbers of input and output channels, the convolution kernel sizes, the padding sizes, and the kernel size and stride of the max pooling layer.
- A fully connected layer (layer4) is created with an input feature size of 4096 and an output feature size of 1024. The 4096 is the flattened output of layer3, 256 channels × 4 × 4: the padded 3×3 convolutions keep the spatial size and each pooling layer halves it (32 → 16 → 8 → 4), so this only matches 32×32 input images.
- The last layer (layer5) is created with nn.Linear and maps the 1024-dimensional features to the number of classes.
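As a worked example, the trainable parameters of VGG_S can be counted layer by layer. The sketch assumes PyTorch defaults (biases enabled, one learnable scale and shift per batch-norm channel) and num_classes = 10 purely for illustration; the document does not fix the number of classes. With PyTorch installed, sum(p.numel() for p in VGG_S(10).parameters()) should give the same total.

```python
def conv_bn_params(c_in, c_out, k=3):
    # nn.Conv2d weights + bias, plus BatchNorm2d scale and shift per channel
    return c_out * c_in * k * k + c_out + 2 * c_out


def linear_params(n_in, n_out):
    # nn.Linear weights + bias
    return n_in * n_out + n_out


def vgg_s_params(num_classes):
    """Trainable parameters of VGG_S by part (pooling and ReLU have none)."""
    conv_channels = [(3, 64), (64, 64),                    # layer1
                     (64, 128), (128, 128),                # layer2
                     (128, 256), (256, 256), (256, 256)]   # layer3
    counts = {
        "conv_blocks": sum(conv_bn_params(c_in, c_out) for c_in, c_out in conv_channels),
        "layer4": linear_params(4096, 1024) + 2 * 1024,  # Linear + BatchNorm1d
        "layer5": linear_params(1024, num_classes),
    }
    counts["total"] = sum(counts.values())
    return counts


for part, n in vgg_s_params(num_classes=10).items():
    print(f"{part:12s} {n:>10,d}")
```

The count shows that layer4 alone holds roughly 4.2 million of the roughly 5.9 million parameters, mirroring the full VGG models, where most parameters sit in the fully connected layers.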
b. forward
The input data passes through the three convolution modules, and view then flattens each sample's features into a one-dimensional vector. This vector passes through the fully connected layer and the last layer, which output the prediction as unnormalized class scores (logits). No softmax is applied inside the model, which matches what nn.CrossEntropyLoss expects as input.
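The forward pass can be traced stage by stage without PyTorch to see why the flattened size must be 4096. The sketch below (a hypothetical helper; shapes omit the batch dimension) relies on the fact that every convolution in VGG_S is 3×3 with padding 1, so only the 2×2, stride-2 pooling changes height and width. For a 224×224 input (ImageNet size) the flattened vector would have 256 × 28 × 28 = 200704 entries, and PyTorch would report a shape mismatch in layer4.

```python
def pooled(size, kernel=2, stride=2):
    # 2x2 max pooling with stride 2 and no padding
    return (size - kernel) // stride + 1


def trace_vgg_s(height, width, num_classes):
    """List (stage, shape without batch dim) for each stage of VGG_S.forward."""
    shapes = []
    c, h, w = 3, height, width
    for name, c_out in (("layer1", 64), ("layer2", 128), ("layer3", 256)):
        # 3x3 convs with padding 1 keep H and W; the module's pooling halves them
        c, h, w = c_out, pooled(h), pooled(w)
        shapes.append((name, (c, h, w)))
    flat = c * h * w
    shapes.append(("view", (flat,)))
    if flat != 4096:
        raise ValueError(f"{height}x{width} input flattens to {flat}, but layer4 expects 4096")
    shapes.append(("layer4", (1024,)))
    shapes.append(("layer5", (num_classes,)))
    return shapes


for stage, shape in trace_vgg_s(32, 32, num_classes=10):
    print(f"{stage:7s} {shape}")
```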