1. Introduction to VGG16
VGG16 is a classic feature-extraction network. The original model was trained on ImageNet with 1000 classes, so a common approach is to reuse the pretrained weights, replace the final classification layer to match the number of classes in your own task, and train only that layer (known as transfer learning). Why does this work? Probably because natural images share similar low-level statistics, so features learned on one large dataset carry over to others.
This article does not start with that approach. It first modifies the VGG network and trains it from scratch, then compares the result with transfer learning.
Let's first look at the original VGG network.
Features:
1. The network structure is simple and regular: five convolutional blocks (13 convolutional layers in total, each block followed by max pooling) + three fully connected layers + softmax classification, with no other structures.
2. All convolutional layers use 3*3 kernels. A stack of three 3*3 convolutions has the same receptive field as a single 7*7 convolution, so the network covers a large receptive field while using fewer parameters.
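The receptive-field and parameter claims in point 2 can be checked with a few lines of arithmetic. This is a minimal sketch, assuming stride-1 convolutions, equal input and output channel counts, and no bias terms; the function names and the channel count of 256 are illustrative choices, not taken from the original model code.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    # Each stride-1 layer widens the receptive field by (kernel - 1).
    return 1 + num_layers * (kernel - 1)


def conv_weights(kernel, channels, num_layers=1):
    """Weight count for conv layers with `channels` in and out (bias ignored)."""
    return num_layers * kernel * kernel * channels * channels


channels = 256  # illustrative channel count
rf_stacked = stacked_receptive_field(3)             # three 3*3 layers
weights_stacked = conv_weights(3, channels, 3)      # 27 * channels^2
weights_single = conv_weights(7, channels)          # 49 * channels^2

print("receptive field of three 3*3 layers:", rf_stacked)
print("weights, three 3*3 layers:", weights_stacked)
print("weights, one 7*7 layer:", weights_single)
```

Under these assumptions the stacked version needs 27/49 of the weights of the single 7*7 layer for the same receptive field.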
2. Model definition
import torch
import torch.nn as nn
from torchsummary import summary


class VggNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(VggNet, self).__init__()
        # Feature extractor: the first four VGG16 convolutional blocks
        self.Conv = torch.nn.Sequential(
            # 3*224*224 conv1
            torch.nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),
            # 64*112*112 conv2
            torch.nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),
            # 128*56*56 conv3
            torch.nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2),
            # 256*28*28 conv4
            torch.nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(kernel_size=2, stride=2))
            # 512*14*14 conv5 (disabled in this reduced network)
            # torch.nn.Conv2d(512, 512, kernel_size = 3, stride = 1, padding = 1),
            # torch.nn.ReLU(),
            # torch.nn.Conv2d(512, 512, kernel_size = 3, stride = 1, padding = 1),
            # torch.nn.ReLU(),
            # torch.nn.Conv2d(512, 512, kernel_size = 3, stride = 1, padding = 1),
            # torch.nn.ReLU(),
            # torch.nn.MaxPool2d(kernel_size = 2, stride = 2))
            # 512*7*7
        # Classifier: three fully connected layers on the 512*14*14 feature map
        self.Classes = torch.nn.Sequential(
            torch.nn.Linear(14*14*512, 1060),
            torch.nn.ReLU(),
            torch.nn.Dropout(p=0.5),
            torch.nn.Linear(1060, 1060),
            torch.nn.ReLU(),
            torch.nn.Dropout(p=0.5),
            torch.nn.Linear(1060, num_classes))

    def forward(self, inputs):
        x = self.Conv(inputs)
        # Flatten to (batch, 14*14*512) before the fully connected layers
        x = x.view(-1, 14*14*512)
        x = self.Classes(x)
        return x


if __name__ == "__main__":
    model = VggNet(num_classes=1000)
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)
    summary(model, (3, 224, 224))
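Here we keep only the first four convolutional blocks (the fifth is commented out) and reduce the number of neurons in the fully connected layers from 4096 to 1060. Because the fifth block and its pooling layer are removed, the feature map entering the classifier is 512*14*14 rather than 512*7*7. The goal is simply to complete the classification task.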
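The input size of the first fully connected layer follows from how each block changes the spatial size. The sketch below traces the shapes with the standard output-size formula for convolution and pooling, assuming a 224*224 input and the kernel, stride and padding values used in the model above; the helper names are illustrative.

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial size after a max-pooling layer."""
    return (size - kernel) // stride + 1


# (output channels, number of 3*3 conv layers) for the four blocks kept here
blocks = [(64, 2), (128, 2), (256, 3), (512, 3)]


def trace_shapes(size=224):
    """Return the (channels, height, width) shape after each block."""
    shapes = []
    for channels, num_convs in blocks:
        for _ in range(num_convs):
            size = conv_out(size)  # 3*3, stride 1, padding 1 keeps the size
        size = pool_out(size)      # 2*2 pooling halves the size
        shapes.append((channels, size, size))
    return shapes


shapes = trace_shapes()
for shape in shapes:
    print(shape)

channels, height, width = shapes[-1]
flat_features = channels * height * width
print("inputs to the first Linear layer:", flat_features)
```

The last value is what the first Linear layer and the view call in forward have to agree on.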
3. Prepare data
The data used here is an open dataset from the Heywhale community: https://www.heywhale.com/home/dataset
All images are of cars, divided into 10 categories. The training and validation sets are already separated: the training set has 1410 images and the validation set has 210.
To load data in PyTorch, you typically subclass Dataset and override three methods: __init__, __getitem__ and __len__. The details depend on how the dataset is organized.
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms as T
import os
from PIL import Image

# Folder name -> integer class label
LABELS = {
    "truck": 0,
    "taxi": 1,
    "minibus": 2,
    "fire engine": 3,
    "racing car": 4,
    "SUV": 5,
    "bus": 6,
    "jeep": 7,
    "family sedan": 8,
    "heavy truck": 9,
}


class Car(Dataset):
    def __init__(self, root, transforms=None):
        imgs = []
        for path in os.listdir(root):
            if path not in LABELS:
                # Skip unknown folders instead of reusing a stale label
                print("data label error: {}".format(path))
                continue
            label = LABELS[path]
            childpath = os.path.join(root, path)
            for imgpath in os.listdir(childpath):
                imgs.append((os.path.join(childpath, imgpath), label))
        self.imgs = imgs
        if transforms is None:
            # Default preprocessing: resize, center crop, normalize
            normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
            self.transforms = T.Compose([
                T.Resize(256),
                T.CenterCrop(224),
                T.ToTensor(),
                normalize
            ])
        else:
            self.transforms = transforms

    def __getitem__(self, index):
        img_path = self.imgs[index][0]
        label = self.imgs[index][1]
        data = Image.open(img_path)
        # Convert grayscale or RGBA images to three channels
        if data.mode != "RGB":
            data = data.convert("RGB")
        data = self.transforms(data)
        return data, label

    def __len__(self):
        return len(self.imgs)


if __name__ == "__main__":
    root = "/home/elvis/workfile/dataset/car/train"
    train_dataset = Car(root)
    train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
    # Iterate over batches to check the tensor shapes
    for data, label in train_dataloader:
        print(data.shape)
Note that the labels should not be one-hot encoded here. For multi-class tasks, nn.CrossEntropyLoss accepts integer class indices directly, so each sample only needs a single integer label.
4. Training
With the data ready and the network defined, we can set the hyperparameters and start training.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from network import VggNet
from car_data import Car

# 1. prepare data
root = "/home/elvis/workfile/dataset/car/train"
train_dataset = Car(root)
train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
root_val = "/home/elvis/workfile/dataset/car/val"
val_dataset = Car(root_val)
# Validation does not need shuffling
val_dataloader = DataLoader(val_dataset, batch_size=32, shuffle=False)

# 2. load model
model = VggNet(num_classes=10)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

# 3. prepare hyperparameters
criterion = nn.CrossEntropyLoss()
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# 4. train
val_acc_list = []
for epoch in range(300):
    model.train()
    train_loss = 0.0  # sum of batch losses over the epoch
    for batch_idx, (data, target) in enumerate(train_dataloader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

    # val
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(val_dataloader):
            data, target = data.to(device), target.to(device)
            output = model(data)
            # Predicted class is the index of the largest logit
            _, predicted = torch.max(output, dim=1)
            total += target.size(0)
            correct += (predicted == target).sum().item()
    acc_val = correct / total
    val_acc_list.append(acc_val)

    # save model
    torch.save(model.state_dict(), "last.pt")
    if acc_val == max(val_acc_list):
        torch.save(model.state_dict(), "best.pt")
        print("save epoch {} model".format(epoch))
    print("epoch = {}, loss = {}, acc_val = {}".format(epoch, train_loss, acc_val))
Softmax is not included in the network definition because nn.CrossEntropyLoss() already applies log-softmax internally and takes integer class labels, so no separate one-hot step is needed. The learning rate was initially set to 1e-3, but the loss diverged during training; after lowering it to 1e-4, the loss converged.
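To see why integer labels and raw network outputs are sufficient, the per-sample loss can be written out by hand. This is a plain-Python sketch of the cross-entropy formula for a single sample, not PyTorch's implementation; the function names and the example logits are made up for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_index(logits, label):
    """Loss for one sample given raw logits and an integer class index."""
    return -math.log(softmax(logits)[label])


def cross_entropy_from_one_hot(logits, one_hot):
    """Same loss written against an explicit one-hot target vector."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


logits = [2.0, 0.5, -1.0]    # illustrative raw outputs for three classes
label = 0                    # integer class index
one_hot = [1.0, 0.0, 0.0]    # the equivalent one-hot vector

loss_index = cross_entropy_from_index(logits, label)
loss_one_hot = cross_entropy_from_one_hot(logits, one_hot)
print(loss_index, loss_one_hot)
```

Both forms pick out the negative log-probability of the true class, which is why the dataset only has to return a single integer per image and the network can end with a plain Linear layer.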
The final training result is:
Although the loss drops to a relatively small value, the validation accuracy remains low, only 0.615. The most likely reason is that the dataset is too small (only about 1.4k training images) for a network as large as VGG, so the model overfits.
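A rough parameter count makes the mismatch between model size and dataset size concrete. The sketch below counts the weights and biases of the reduced network from section 2 with num_classes=10, using the layer sizes from that definition; the helper names are illustrative.

```python
def conv_params(c_in, c_out, kernel=3):
    """Weights plus biases of one convolution layer."""
    return kernel * kernel * c_in * c_out + c_out


def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


# (in_channels, out_channels) of the ten conv layers in the four blocks
conv_layers = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
]
# (in_features, out_features) of the three fully connected layers
fc_layers = [(14 * 14 * 512, 1060), (1060, 1060), (1060, 10)]

conv_total = sum(conv_params(c_in, c_out) for c_in, c_out in conv_layers)
fc_total = sum(linear_params(n_in, n_out) for n_in, n_out in fc_layers)
total = conv_total + fc_total
train_images = 1410  # training set size given in section 3

print("conv parameters:", conv_total)
print("fc parameters:", fc_total)
print("parameters per training image:", total // train_images)
```

Most of the parameters sit in the first fully connected layer, and the total is several orders of magnitude larger than the number of training images.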
5. Transfer learning for training
Now initialize the network with VGG16 weights that others have already trained. The vgg16 model packaged in torchvision can be used directly; pretrained=True initializes the weights from a model trained on a larger dataset (newer torchvision versions replace this flag with the weights argument). Only the network definition file needs to change; everything else stays the same. The network definition becomes:
import torch
import torch.nn as nn
from torchvision import models


class VGGNet_Transfer(nn.Module):
    def __init__(self, num_classes=10):  # number of classes; 10 for this dataset
        super(VGGNet_Transfer, self).__init__()
        net = models.vgg16(pretrained=True)  # load VGG16 with pretrained weights
        net.classifier = nn.Sequential()  # empty the original classifier; ours is defined below
        self.features = net  # keep the VGG16 feature layers
        self.classifier = nn.Sequential(  # define our own classifier
            # 512 * 7 * 7 is fixed by the VGG16 architecture;
            # the second argument (number of neurons) can be tuned
            nn.Linear(512 * 7 * 7, 512),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(512, 128),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)  # flatten to (batch, 512 * 7 * 7)
        x = self.classifier(x)
        return x
The final training result is:
The loss drops rapidly. By epoch 5 the validation accuracy reaches 0.95, which is a very good result. This supports the earlier conjecture: when training from scratch with so little data, the model overfits and performs poorly on the validation set.
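Section 1 mentions the variant of transfer learning that trains only the final classification layer, whereas the training script in this article passes all model parameters to the optimizer. A minimal sketch of the frozen variant follows. It assumes a small stand-in backbone (hypothetical, so the snippet runs without downloading weights); the names backbone and head are illustrative, and the same requires_grad pattern would apply to the parameters of a pretrained torchvision model.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained feature extractor
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d((7, 7)),
    nn.Flatten(),
)
# New classification layer for 10 classes
head = nn.Linear(8 * 7 * 7, 10)

# Freeze the backbone so its weights receive no gradients
for param in backbone.parameters():
    param.requires_grad = False

model = nn.Sequential(backbone, head)

# Give the optimizer only the parameters that are still trainable
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)

# One training step on random data to show which parameters get gradients
inputs = torch.randn(2, 3, 32, 32)
targets = torch.tensor([0, 1])
loss = nn.CrossEntropyLoss()(model(inputs), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Freezing reduces the number of trained parameters to those of the new head, which is one way to limit overfitting on a small dataset; whether it beats full fine-tuning on this car dataset was not tested here.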