Training VGG16 on your own data with PyTorch

1. Introduction to VGG16

VGG16 is a classic feature extraction network. The original model was trained on the 1000 ImageNet categories, so the usual practice is to load those weights, change the number of outputs of the final classification layer, and train only that layer to adapt the network to a new task (also known as transfer learning). Why does this approach work? Probably because natural images from different datasets have similar distributions.

This article does not start that way. It first modifies the VGG network and trains it from scratch.

Let's first take a look at the original VGG network.

[Figure: structure of the original VGG16 network]

Features:

1. The network structure is extremely simple and clear: five blocks of convolution layers (13 convolution layers in total) + three fully connected layers + softmax classification, with no other structures.

2. The convolution layers all use 3*3 kernels. Three stacked 3*3 convolutions cover the same receptive field as a single 7*7 convolution, so the network gets the larger receptive field while using fewer parameters.
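As a rough check of the parameter claim above, the sketch below counts the weights of three stacked 3*3 convolutions against a single 7*7 convolution. It assumes both variants keep C input and C output channels and ignores bias terms. The helper names and the choice C = 64 are illustrative assumptions, not part of the network code in this article.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in one convolution layer, ignoring bias terms."""
    return kernel_size * kernel_size * in_channels * out_channels


def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


C = 64  # illustrative channel count (assumption)
three_3x3 = 3 * conv_weights(3, C, C)  # 27 * C * C weights
one_7x7 = conv_weights(7, C, C)        # 49 * C * C weights

print("receptive field of three 3*3 layers:", stacked_receptive_field(3, 3))
print("weights, three 3*3 layers:", three_3x3)
print("weights, one 7*7 layer:", one_7x7)
```

Under these assumptions the stacked version needs 27/49 of the weights, roughly 55%, for the same 7*7 receptive field.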

2. Model definition

import torch
import torch.nn as nn
from torchsummary import summary  # third-party package, used only to print the layer summary


class VggNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(VggNet,self).__init__()
        self.Conv = torch.nn.Sequential(
                    # 3*224*224  conv1
                    torch.nn.Conv2d(3, 64, kernel_size = 3, stride = 1, padding = 1),
                    torch.nn.ReLU(),
                    torch.nn.Conv2d(64, 64, kernel_size = 3, stride = 1, padding = 1),
                    torch.nn.ReLU(),
                    torch.nn.MaxPool2d(kernel_size = 2, stride = 2),
                    # 64*112*112   conv2
                    torch.nn.Conv2d(64, 128, kernel_size = 3, stride = 1, padding = 1),
                    torch.nn.ReLU(),
                    torch.nn.Conv2d(128, 128, kernel_size = 3, stride = 1, padding = 1),
                    torch.nn.ReLU(),
                    torch.nn.MaxPool2d(kernel_size = 2, stride = 2),
                    # 128*56*56    conv3
                    torch.nn.Conv2d(128, 256, kernel_size = 3, stride = 1, padding = 1),
                    torch.nn.ReLU(),
                    torch.nn.Conv2d(256, 256, kernel_size = 3, stride = 1, padding = 1),
                    torch.nn.ReLU(),
                    torch.nn.Conv2d(256, 256, kernel_size = 3, stride = 1, padding = 1),
                    torch.nn.ReLU(),
                    torch.nn.MaxPool2d(kernel_size = 2, stride = 2),
                    # 256*28*28    conv4
                    torch.nn.Conv2d(256, 512, kernel_size = 3, stride = 1, padding = 1),
                    torch.nn.ReLU(),
                    torch.nn.Conv2d(512, 512, kernel_size = 3, stride = 1, padding = 1),
                    torch.nn.ReLU(),
                    torch.nn.Conv2d(512, 512, kernel_size = 3, stride = 1, padding = 1),
                    torch.nn.ReLU(),
                    torch.nn.MaxPool2d(kernel_size = 2, stride = 2))
                    # 512*14*14   conv5
                    # torch.nn.Conv2d(512, 512, kernel_size = 3, stride = 1, padding = 1),
                    # torch.nn.ReLU(),
                    # torch.nn.Conv2d(512, 512, kernel_size = 3, stride = 1, padding = 1),
                    # torch.nn.ReLU(),
                    # torch.nn.Conv2d(512, 512, kernel_size = 3, stride = 1, padding = 1),
                    # torch.nn.ReLU(),
                    # torch.nn.MaxPool2d(kernel_size = 2, stride = 2))
                    # 512*7*7

        self.Classes = torch.nn.Sequential(
                        torch.nn.Linear(14*14*512, 1060),
                        torch.nn.ReLU(),
                        torch.nn.Dropout(p = 0.5),
                        torch.nn.Linear(1060, 1060),
                        torch.nn.ReLU(),
                        torch.nn.Dropout(p = 0.5),
                        torch.nn.Linear(1060, num_classes))

        
    def forward(self, inputs):
        x = self.Conv(inputs)
        x = x.view(-1, 14*14*512)
        x = self.Classes(x)
        return x


if __name__ == "__main__":
    model = VggNet(num_classes=1000)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)

    summary(model, (3, 224, 224))

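Here we use only the first four convolution blocks (10 convolution layers) and reduce the number of neurons in the fully connected layers. Because the fifth block and its pooling layer are removed, the feature map entering the classifier is 512*14*14 instead of 512*7*7. The goal is simply to complete the classification task.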
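The input size of the first fully connected layer (14*14*512) follows from the pooling stages in the model above. The sketch below traces the spatial size through the network using the standard output-size formula for convolution and pooling. The helper names are illustrative assumptions.

```python
def conv_out(size, kernel_size=3, stride=1, padding=1):
    """Spatial output size of a convolution layer."""
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_out(size, kernel_size=2, stride=2):
    """Spatial output size of a max pooling layer."""
    return (size - kernel_size) // stride + 1


convs_per_block = [2, 2, 3, 3]  # the four blocks kept in VggNet
size = 224                      # input height and width
for num_convs in convs_per_block:
    for _ in range(num_convs):
        size = conv_out(size)   # 3*3, stride 1, padding 1 keeps the size
    size = pool_out(size)       # 2*2, stride 2 halves the size

channels = 512
flat_features = channels * size * size
print("final feature map:", channels, size, size)
print("flattened features:", flat_features)
```

If the fifth block were restored, one more pooling step would reduce the size to 7, and the first Linear layer would need 7*7*512 inputs instead.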

3. Prepare data

The data used this time is an open dataset from the Heywhale community: https://www.heywhale.com/home/dataset

All images are pictures of cars, divided into 10 categories. The training set and the validation set are already separated: the training set has 1410 images and the validation set has 210 images.

 

To read data in PyTorch, you generally subclass Dataset and override three methods: __init__, __getitem__ and __len__. The details depend on how the dataset is organized on disk.

from torch.utils.data import DataLoader,Dataset
from torchvision import transforms as T
import matplotlib.pyplot as plt
import os
from PIL import Image
import numpy as np

class Car(Dataset):
    def __init__(self, root, transforms=None):
        # Map each class folder name to an integer label.
        class_to_label = {
            "truck": 0, "taxi": 1, "minibus": 2, "fire engine": 3,
            "racing car": 4, "SUV": 5, "bus": 6, "jeep": 7,
            "family sedan": 8, "heavy truck": 9,
        }
        imgs = []
        for path in os.listdir(root):
            if path not in class_to_label:
                # Skip unknown folders instead of reusing a stale or undefined label.
                print("data label error: {}".format(path))
                continue
            label = class_to_label[path]

            childpath = os.path.join(root, path)
            for imgpath in os.listdir(childpath):
                imgs.append((os.path.join(childpath, imgpath), label))

        self.imgs = imgs
        if transforms is None:
            normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

            self.transforms = T.Compose([
                    T.Resize(256),
                    T.CenterCrop(224),
                    T.ToTensor(),
                    normalize
            ])
        else:
            self.transforms = transforms
             
    def __getitem__(self, index):
        img_path = self.imgs[index][0]
        label = self.imgs[index][1]

        data = Image.open(img_path)
        if data.mode != "RGB":
            data = data.convert("RGB")
        data = self.transforms(data)
        return data,label

    def __len__(self):
        return len(self.imgs)


if __name__ == "__main__":
    root = "/home/elvis/workfile/dataset/car/train"
    train_dataset = Car(root)
    train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
    # Iterate over the DataLoader so that batches of 32 images are produced.
    for data, label in train_dataloader:
        print(data.shape)

Note that there is no need to one-hot encode the labels here. For multi-class classification, the loss function used later in PyTorch works directly with class indices, so each image only needs a single integer label.
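To see why an integer label is enough, here is a pure-Python sketch of the cross-entropy computation for a single sample. It is only an illustration of the math, not PyTorch's implementation, and the function names and example logits are assumptions. Picking the log-probability at the label index gives the same value as the dot product with a one-hot vector.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax for one sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def cross_entropy_from_index(logits, target_index):
    """Loss computed from an integer class label."""
    return -log_softmax(logits)[target_index]


def cross_entropy_from_one_hot(logits, one_hot):
    """Loss computed from a one-hot target vector."""
    return -sum(t * lp for t, lp in zip(one_hot, log_softmax(logits)))


logits = [2.0, 0.5, -1.0]  # made-up network outputs for three classes
loss_index = cross_entropy_from_index(logits, 0)
loss_one_hot = cross_entropy_from_one_hot(logits, [1.0, 0.0, 0.0])
print(loss_index, loss_one_hot)
```

Both calls return the same loss, so storing the label as a plain integer loses nothing.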

4. Training

With the data ready and the network defined, we can set the hyperparameters and start training.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader,Dataset
from network import VggNet
from car_data import Car


# 1. prepare data
root = "/home/elvis/workfile/dataset/car/train"
train_dataset = Car(root)
train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)

root_val = "/home/elvis/workfile/dataset/car/val"
val_dataset = Car(root_val)
val_dataloader = DataLoader(val_dataset, batch_size=32, shuffle=True)

# 2. load model
model = VggNet(num_classes=10)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

# 3. prepare hyperparameters
criterion = nn.CrossEntropyLoss()
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# 4. train
val_acc_list = []
for epoch in range(300):
    model.train()
    train_loss = 0.0
    for batch_idx, (data, target) in enumerate(train_dataloader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

    # val
    model.eval()
    correct=0
    total=0
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(val_dataloader):
            data, target = data.to(device), target.to(device)
            output = model(data)
            _, predicted = torch.max(output.data, dim=1)
            total += target.size(0)
            correct += (predicted==target).sum().item()
    acc_val = correct / total
    val_acc_list.append(acc_val)

    # save model
    torch.save(model.state_dict(), "last.pt")
    if acc_val == max(val_acc_list):
        torch.save(model.state_dict(), "best.pt")
        print("save epoch {} model".format(epoch))

    print("epoch = {},  loss = {},  acc_val = {}".format(epoch, train_loss, acc_val))
    

Softmax is not defined in the network because nn.CrossEntropyLoss() already applies it internally, and it takes integer class indices as targets, so no one-hot encoding is needed either. At first the learning rate was set to 1e-3, but the loss diverged during training. After changing it to 1e-4, the loss converged.
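Leaving softmax out of the network also does not affect the predictions in the validation loop, because softmax preserves the ordering of the scores. The pure-Python sketch below illustrates this with made-up logits. The function names and values are assumptions for illustration only.

```python
import math


def softmax(logits):
    """Numerically stable softmax for one sample."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


logits = [1.2, -0.3, 3.4, 0.0]  # made-up raw network outputs
probs = softmax(logits)

pred_from_logits = argmax(logits)
pred_from_probs = argmax(probs)
print(pred_from_logits, pred_from_probs)
```

The predicted class is the same either way, which is why torch.max can be applied directly to the raw output of the model.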

The final training result is:

Although the loss has dropped to a relatively small value, the accuracy on the validation set is still not high, only 0.615. The likely reason is that the amount of data is too small (only about 1.4k training images). For a network as large as VGG, so little data probably leads to overfitting.

5. Transfer learning for training

Now we initialize the network with VGG16 weights that others have already trained. It is enough to call the vgg16 model packaged in torchvision directly. pretrained=True means the weights are initialized from the pre-trained model (a model trained by others on a larger dataset). In torchvision 0.13 and later this argument is deprecated in favor of weights. Only the network definition file needs to be changed. The network definition becomes:

import torch
import torch.nn as nn
from torchsummary import summary
from torchvision import models

class VGGNet_Transfer(nn.Module):
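    def __init__(self, num_classes=10):    # num_classes: number of output classes (10 car categories here)
        super(VGGNet_Transfer, self).__init__()
        net = models.vgg16(pretrained=True)   # load VGG16 with pre-trained weights
        net.classifier = nn.Sequential()    # empty the original classifier; our own is defined below
        self.features = net    # keep the VGG16 feature layers
        self.classifier = nn.Sequential(    # define our own classification layers
                nn.Linear(512 * 7 * 7, 512),  # 512 * 7 * 7 is fixed by the VGG16 network; the second argument (number of neurons) can be tuned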
                nn.ReLU(True),
                nn.Dropout(),
                nn.Linear(512, 128),
                nn.ReLU(True),
                nn.Dropout(),
                nn.Linear(128, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x
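The class above loads the pre-trained weights but still updates every layer during training. If you want to train only the classification layers, as described in the introduction, one option is to freeze the feature parameters. The sketch below is a minimal illustration under assumptions: TinyTransferNet is a hypothetical stand-in that only mirrors the attribute names features and classifier, so that the example runs without downloading VGG16 weights, and freeze_features is an illustrative helper.

```python
import torch.nn as nn


class TinyTransferNet(nn.Module):
    """Hypothetical stand-in with the same attribute names as VGGNet_Transfer."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.classifier = nn.Linear(8, num_classes)


def freeze_features(model):
    """Disable gradients for model.features and return the trainable parameters."""
    for param in model.features.parameters():
        param.requires_grad = False
    return [p for p in model.parameters() if p.requires_grad]


model = TinyTransferNet()
trainable = freeze_features(model)
# Only the classifier weight and bias remain trainable.
print(len(trainable))
```

The returned list could then be passed to the optimizer in place of model.parameters(). Whether freezing helps on this dataset is not tested in this article.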

The final training result is:

The loss drops rapidly. By epoch 5, the accuracy on the validation set rises to 0.95, which is a very good result. This supports the earlier conjecture: when training from scratch on so little data, the network overfits and performs poorly on the validation set.

Origin blog.csdn.net/Eyesleft_being/article/details/118757500