Building a VGG neural network (based on PyTorch)

Preface

Several earlier posts covered building a simple fully connected neural network (see "How to build a neural network - Building a fully connected neural network based on pytorch", CSDN Blog). On simple datasets such as MNIST, a fully connected network can achieve good results; on more complex datasets, however, it no longer performs well.

Today we introduce VGG, a classic deep convolutional neural network that handles more complex datasets much better.

VGG neural network

The VGG network is a deep convolutional neural network proposed by the Visual Geometry Group (VGG) at the University of Oxford. It achieved excellent results in the 2014 ImageNet competition.

The main innovation of the VGG network is to improve the expressive power of the model by stacking convolutions with small 3x3 kernels and increasing the network depth. Two stacked 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution, while using fewer parameters and adding an extra nonlinearity. The design is simple: the whole network consists of convolutional layers, max pooling layers, and fully connected layers. There are two main versions, VGG16 and VGG19.

VGG16: This version contains 13 convolutional layers, 3 fully connected layers, and 5 max pooling layers. The convolutional layers and the hidden fully connected layers are followed by ReLU activations. The "16" counts the layers with weights (13 + 3).
VGG19: The structure is basically the same as VGG16, except that 3 convolutional layers are added, so the total number of convolutional layers is 16. The extra depth can give slightly better accuracy at a higher computational cost.
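The layer counts above can be checked directly from the layer lists that the construction code later in this post uses. The following is a minimal sketch; the helper name count_layers and the constant NUM_FC_LAYERS are introduced here for illustration only, and the value 3 refers to the original VGG design, not to the compact model built in this post.

```python
# Layer lists as used later in this post: integers are the output channels
# of a 3x3 convolution, 'm' marks a 2x2 max pooling layer.
cfg = {
    'VGG16': [64, 64, 'm', 128, 128, 'm', 256, 256, 256, 'm',
              512, 512, 512, 'm', 512, 512, 512, 'm'],
    'VGG19': [64, 64, 'm', 128, 128, 'm', 256, 256, 256, 256, 'm',
              512, 512, 512, 512, 'm', 512, 512, 512, 512, 'm'],
}

# Assumption: the original VGG design ends with 3 fully connected layers.
NUM_FC_LAYERS = 3


def count_layers(layer_list):
    """Return (number of conv layers, number of max pooling layers)."""
    convs = sum(1 for item in layer_list if isinstance(item, int))
    pools = sum(1 for item in layer_list if item == 'm')
    return convs, pools


for name, layer_list in cfg.items():
    convs, pools = count_layers(layer_list)
    print(name, "conv:", convs, "max pool:", pools,
          "weight layers:", convs + NUM_FC_LAYERS)
```

Adding the 3 fully connected layers to the convolution counts gives the 16 and 19 in the network names.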

The advantage of the VGG network is that its structure is simple and regular: it uses only small 3x3 filters and 2x2 max pooling layers. The disadvantage is that the number of parameters is very large, and the fully connected (dense) layers account for the majority of them, so the model requires a large amount of computing resources and storage space.
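To make the parameter imbalance concrete, the sketch below counts weights and biases by hand. It assumes the original VGG16 layout, which is not the compact model built later in this post: 224x224 inputs, so the last feature map is 512x7x7, followed by fully connected layers of sizes 4096, 4096, and 1000. The function names are illustrative.

```python
# Convolutional part of VGG16: integers are output channels, 'm' is max pooling.
VGG16_CONV = [64, 64, 'm', 128, 128, 'm', 256, 256, 256, 'm',
              512, 512, 512, 'm', 512, 512, 512, 'm']


def conv_params(layer_list, in_channels=3, kernel_size=3):
    """Weights plus biases of all conv layers (pooling has no parameters)."""
    total = 0
    for item in layer_list:
        if item == 'm':
            continue
        total += kernel_size * kernel_size * in_channels * item + item
        in_channels = item
    return total


def fc_params(sizes):
    """Weights plus biases of a chain of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


conv_total = conv_params(VGG16_CONV)
# Assumption: original VGG16 classifier head on a 512x7x7 feature map.
fc_total = fc_params([512 * 7 * 7, 4096, 4096, 1000])
total = conv_total + fc_total

print("conv parameters:", conv_total)
print("fc parameters:  ", fc_total)
print("fc share of total: {:.1%}".format(fc_total / total))
```

Under these assumptions the fully connected layers hold roughly 89% of about 138 million parameters, which is why the model in this post replaces them with a single small linear layer.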

Introduction to training data sets

The CIFAR10 dataset introduced in an earlier post is used here. Compared with MNIST, CIFAR10 is more complex: it has the same number of classes (10), but its images are three-channel color photographs of real objects rather than grayscale digits, which makes it better suited to showing the performance of deep networks.

For an introduction to CIFAR10, see "Detailed explanation and visualization of CIFAR-10 data set" (CSDN Blog).

VGG neural network construction code

import torch.nn as nn

# Layer configuration: integers are conv output channels, 'm' is a max pooling layer
cfg = {'VGG16':[64,64,'m',128,128,'m',256,256,256,'m',512,512,512,'m',512,512,512,'m'],
       'VGG19':[64,64,'m',128,128,'m',256,256,256,256,'m',512,512,512,512,'m',512,512,512,512,'m']
       }

class VGG(nn.Module):
    def __init__(self, cfg_name):
        super(VGG, self).__init__()
        self.features = self._make_layers(cfg[cfg_name])
        # Output raw scores (logits) for the 10 classes. nn.CrossEntropyLoss applies
        # log-softmax internally, so no Softmax layer is placed here.
        self.out = nn.Linear(512, 10)

    def forward(self, x):
        x = self.features(x)
        # print(x.shape)
        # Flatten (N, 512, 1, 1) to (N, 512) before the linear layer
        x = x.view(x.size(0), -1)
        x = self.out(x)
        return x

    def _make_layers(self, layer_cfg):
        in_channels = 3  # RGB input
        layers = []
        for layer in layer_cfg:
            if layer == "m":
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                # 3x3 convolution, then batch normalization and ReLU
                layers += [nn.Conv2d(in_channels=in_channels, out_channels=layer, kernel_size=3, padding=1),
                           nn.BatchNorm2d(layer),
                           nn.ReLU(inplace=True)
                           ]
                in_channels = layer
        # A 1x1 average pooling with stride 1 leaves the feature map unchanged
        layers.append(nn.AvgPool2d(kernel_size=1, stride=1))
        return nn.Sequential(*layers)

When building the model, we first define a dictionary cfg that maps each VGG variant to its layer configuration. The key is the network name and the value is a list describing each convolutional or pooling layer. 'm' denotes max pooling with a $2\times2$ window and a stride of 2; every other entry is a number giving the out_channels of the corresponding convolutional layer, which is followed by batch normalization and a ReLU activation. Finally, a $1\times1$ average pooling layer with a stride of 1 is appended; with this window size it leaves the feature map unchanged, so it only acts as a placeholder at the end of the feature extractor.

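After these steps, the feature map is flattened and fed into the fully connected layer for classification. For $32\times32$ CIFAR10 images, the five max pooling layers reduce the spatial size to $1\times1$, so the flattened vector has 512 elements, which matches nn.Linear(512,10). Unlike the original VGG, which has three fully connected layers, this compact version uses a single one.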
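The size of the tensor that reaches the fully connected layer can be worked out without running the network. The sketch below traces the shape through the VGG16 layer list, assuming 3x3 convolutions with padding 1 and stride 1 (which keep the spatial size) and 2x2 max pooling with stride 2 (which halves it). The function name trace_shape is illustrative.

```python
VGG16 = [64, 64, 'm', 128, 128, 'm', 256, 256, 256, 'm',
         512, 512, 512, 'm', 512, 512, 512, 'm']


def trace_shape(layer_list, height=32, width=32, channels=3):
    """Return (channels, height, width) after the feature extractor."""
    for item in layer_list:
        if item == 'm':
            # 2x2 max pooling with stride 2 halves each spatial dimension
            height //= 2
            width //= 2
        else:
            # 3x3 conv, padding 1, stride 1: spatial size unchanged,
            # channel count becomes the configured output channels
            channels = item
    return channels, height, width


c, h, w = trace_shape(VGG16)
print("CIFAR10 input 32x32 ->", (c, h, w), "-> flattened size", c * h * w)

c, h, w = trace_shape(VGG16, height=224, width=224)
print("ImageNet-style input 224x224 ->", (c, h, w), "-> flattened size", c * h * w)
```

With a different input resolution the flattened size changes, so the in_features of the linear layer would have to change with it.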

Training the model on the dataset

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import transforms, utils
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10

import matplotlib.pyplot as plt


class_str = "airplane|automobile|bird|cat|deer|dog|frog|horse|ship|truck"
classes = class_str.split("|")

cfg = {'VGG16':[64,64,'m',128,128,'m',256,256,256,'m',512,512,512,'m',512,512,512,'m'],
       'VGG19':[64,64,'m',128,128,'m',256,256,256,256,'m',512,512,512,512,'m',512,512,512,512,'m']
       }

# Guard the entry point so that creating DataLoader worker processes does not raise errors
if __name__=='__main__':

    my_trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
    
    # Change this path to where the CIFAR10 dataset is saved locally
    train_dataset = CIFAR10('D:/deep_learning/12_16/data/', train=True, transform=my_trans, download=False)
    test_dataset = CIFAR10('D:/deep_learning/12_16/data/', train=False, transform=my_trans, download=False)

    train_boarder = DataLoader(train_dataset, batch_size=10, num_workers=5, shuffle=True)
    test_boarder = DataLoader(test_dataset, batch_size=10, num_workers=5)

    
    class VGG(nn.Module):
        def __init__(self, cfg_name):
            super(VGG,self).__init__()
            self.features = self._make_layers(cfg[cfg_name])
            # Output raw scores (logits); nn.CrossEntropyLoss applies log-softmax internally,
            # so no Softmax layer is placed here
            self.out = nn.Linear(512, 10)

        def forward(self,x):
            x = self.features(x)
            # print(x.shape)
            x = x.view(x.size(0),-1)
            x = self.out(x)
            return x

        def _make_layers(self,cfg_name):
            in_channels = 3
            layers = []
            for layer in cfg_name:
                if layer == "m":
                    layers+=[nn.MaxPool2d(kernel_size=2,stride=2)]
                else:
                    layers+=[nn.Conv2d(in_channels=in_channels,out_channels=layer,kernel_size=3,padding=1),
                                   nn.BatchNorm2d(layer),
                                   nn.ReLU(inplace=True)
                                   ]
                    in_channels = layer
            layers.append(nn.AvgPool2d(kernel_size=1,stride=1))
            return nn.Sequential(*layers)


    # Move the model to the GPU if one is available
    device = torch.device('cuda:0' if torch.cuda.is_available() else "cpu")
    print(device)
    model = VGG("VGG16")
    model.to(device)

    # Define the optimizer and the loss function
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    loss_function = nn.CrossEntropyLoss()

    # Number of training epochs; adjust as needed
    epoch_num = 25

    for epoch in range(epoch_num):
        train_loss = 0
        train_acc = 0
        model.train()
        for imgs, labels in train_boarder:
            imgs = imgs.to(device)
            labels = labels.to(device)

            # Clear old gradients, then run the forward and backward passes
            optimizer.zero_grad()
            out = model(imgs)
            loss = loss_function(out, labels)
            loss.backward()
            optimizer.step()

            train_loss += loss.item()

            # The predicted class is the index of the largest output
            _, pred = out.max(1)
            correct_num = (pred == labels).sum().item()
            train_acc += correct_num / imgs.size(0)

        # Average the per-batch loss and accuracy over the epoch
        train_acc_all = train_acc / len(train_boarder)
        train_loss_all = train_loss / len(train_boarder)

        model.eval()
        test_loss = 0
        test_acc = 0
        # Gradients are not needed during evaluation
        with torch.no_grad():
            for imgs, labels in test_boarder:
                imgs = imgs.to(device)
                labels = labels.to(device)

                out = model(imgs)
                loss = loss_function(out, labels)
                test_loss += loss.item()

                _, pred = out.max(1)
                correct_num = (pred == labels).sum().item()
                test_acc += correct_num / imgs.size(0)

        test_acc_all = test_acc / len(test_boarder)
        test_loss_all = test_loss / len(test_boarder)

        print('epoch:{}, Train Loss:{:.4f}, Train Acc:{:.4f}, Test Loss:{:.4f}, Test Acc:{:.4f}'.format(
            epoch, train_loss_all, train_acc_all, test_loss_all, test_acc_all))

The VGG16 version of the model is used in training.
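One detail of this setup is worth checking when choosing the model's final layer: nn.CrossEntropyLoss applies log-softmax to its input itself, so it expects raw scores (logits). If probabilities from a Softmax layer are passed in instead, softmax is effectively applied twice. The plain-Python sketch below reproduces the per-sample formula by hand to show the effect for 10 classes; the example scores are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, target):
    """Per-sample cross entropy on raw scores: -log(softmax(scores)[target])."""
    return -math.log(softmax(scores)[target])


# Hypothetical scores: a very confident and correct prediction for class 0
logits = [12.0] + [0.0] * 9

loss_from_logits = cross_entropy(logits, 0)
# Passing probabilities instead of logits applies softmax a second time
loss_from_probs = cross_entropy(softmax(logits), 0)

# Probabilities lie in [0, 1], so with 10 classes the loss cannot drop below this
lower_bound = math.log(1 + 9 / math.e)

print("loss on logits:        {:.6f}".format(loss_from_logits))
print("loss on probabilities: {:.6f}".format(loss_from_probs))
print("lower bound:           {:.6f}".format(lower_bound))
```

With logits the loss can approach zero, while with probabilities as input it stays above roughly 1.46 no matter how good the prediction is, which weakens the training signal. Probabilities for display can still be obtained by applying softmax to the model output at inference time.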

Note that the dataset path in the code must be changed to the local path where CIFAR10 is saved. Since VGG is a deep neural network, training takes a relatively long time. The number of epochs, epoch_num, is set to 25 here; you can increase it for better training results.
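To get a rough feel for the training cost, the number of optimizer updates can be computed from the dataset size and the batch size. The sketch below uses the CIFAR10 split sizes (50000 training and 10000 test images) together with the batch size and epoch count from the script above; the alternative batch size of 128 is only an illustrative assumption, not a setting from this post.

```python
import math

TRAIN_SAMPLES = 50000  # CIFAR10 training images
TEST_SAMPLES = 10000   # CIFAR10 test images


def batches_per_epoch(num_samples, batch_size):
    """Number of batches a DataLoader yields with the default drop_last=False."""
    return math.ceil(num_samples / batch_size)


train_batches = batches_per_epoch(TRAIN_SAMPLES, 10)
test_batches = batches_per_epoch(TEST_SAMPLES, 10)
total_updates = train_batches * 25  # epoch_num = 25

# Hypothetical larger batch size for comparison
train_batches_128 = batches_per_epoch(TRAIN_SAMPLES, 128)

print("train batches per epoch (batch_size=10): ", train_batches)
print("test batches per epoch (batch_size=10):  ", test_batches)
print("optimizer updates over 25 epochs:        ", total_updates)
print("train batches per epoch (batch_size=128):", train_batches_128)
```

A larger batch size usually shortens each epoch on a GPU because far fewer forward and backward passes are launched, although the learning rate may then need retuning.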

Comments and discussion are welcome~