[Data Science Project Practice] Twelve-class cat classification with transfer learning based on ResNet and Inception v3
1. Project Background
This project is derived from the image classification learning competition on the PaddlePaddle (飞桨) platform.
- The code and results come from my classmates, without any changes; I only summarize them here for learning and review.
The competition task, copied from the official description:
This competition requires contestants to classify twelve kinds of cats, a classic image classification task in computer vision. Because image classification is the cornerstone of other image tasks, it helps everyone get started with computer vision faster.
Dataset
The competition dataset contains pictures of 12 kinds of cats and is divided into a training set and a test set.
Training set: high-definition color pictures together with the category of each picture. There are 2160 cat pictures in total, with annotation files.
Test set: color pictures only, 240 cat pictures in total, without annotation files.
2. Baseline
2.1 Preparation stage
First, import the modules that will be used:
import os
import cv2
import torch
import torch.nn as nn
from torchvision import models,transforms
from torch.utils.data import DataLoader,Dataset
import numpy as np
from PIL import Image
from torch.optim import lr_scheduler
import copy
2.2 Data reading stage
This stage covers how to read the data into the model. Since the cats are image data, each image is decoded and held in memory as an array. To make the intermediate steps easy to visualize, we read the data as PIL Image objects. This step can be written as:
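x = np.fromfile(imgPath, dtype=np.uint8)  # read the raw file bytes into an ndarray
x = cv2.imdecode(x, 1)                    # decode into a uint8 color image (BGR order, values in [0, 255])
img = Image.fromarray(x)                  # wrap the array as a PIL Image object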
In the figure above, the data on the left is the PIL Image and the data on the right is what cv2 read. The color channels are swapped, because OpenCV decodes images in BGR order while PIL assumes RGB. In fact, reading with cv2 alone is enough, and cv2.imshow can open several windows for data visualization.
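The channel swap can be reproduced without any image file. The sketch below uses a tiny synthetic array (an assumption for illustration, not the competition data) and reverses the last axis, which has the same effect as converting BGR to RGB.

```python
import numpy as np

# A 1x2 "image" in OpenCV's BGR order: one pure-blue pixel and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the channel axis turns BGR into RGB.
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # the blue pixel, now in RGB order
print(rgb[0, 1].tolist())  # the red pixel, now in RGB order
```

If true colors matter for the PIL Image, the same reordering can be applied to the decoded array before calling Image.fromarray.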
We now have cat images! The next step is to get each cat's label. Typically the image paths and labels are recorded in a text file, where each line holds one image path and its label, separated by a tab:
# label file
filelist = r"data_split_list.txt"
imgs, labels = [], []  # storage lists
with open(filelist) as f:
    lines = [_.strip() for _ in f]  # strip whitespace
np.random.shuffle(lines)  # shuffle randomly
for l in lines:
    img_path, label = l.split('\t')  # get the image path and the label
    # read the raw bytes as uint8, decode with OpenCV (BGR order), wrap as a PIL Image
    img = Image.fromarray(cv2.imdecode(np.fromfile(img_path, np.uint8), 1))
    imgs.append(img)
    labels.append(label)
We encapsulate this part of the work into a function to read the data.
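The later code reads separate train and val list files, but the source does not show how they were produced. The following is only a sketch of one possible way to parse such lines and hold out a validation subset; the function names, the split ratio, and the file names in the demo are all hypothetical.

```python
import random

def parse_label_lines(lines):
    """Turn 'path<TAB>label' lines into (path, int_label) pairs, skipping blanks."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        img_path, label = line.split('\t')
        samples.append((img_path, int(label)))
    return samples

def split_train_val(samples, val_ratio=0.1, seed=0):
    """Shuffle a copy of the samples and hold out a fraction for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_ratio)
    return shuffled[n_val:], shuffled[:n_val]

# Hypothetical file names, for illustration only.
demo_lines = ["cat_12_train/a.jpg\t0", "cat_12_train/b.jpg\t3", "", "cat_12_train/c.jpg\t11"]
samples = parse_label_lines(demo_lines)
train, val = split_train_val(samples, val_ratio=0.34)
print(len(train), len(val))
```

Fixing the seed keeps the split reproducible, so validation accuracy from different runs stays comparable.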
The next job is to convert the data into a format that PyTorch accepts. In PyTorch, model training and inference are generally performed by iterating over a DataLoader object, and a DataLoader draws its data from a Dataset class. So here we need to build a Dataset class:
class myData(Dataset):
    def __init__(self):
        super(myData, self).__init__()
        self.data = []

    def __getitem__(self, x):
        return self.data[x]

    def __len__(self):
        return len(self.data)
Fill in these three methods and the Dataset is ready.
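To see how a DataLoader consumes these three methods, here is a minimal sketch with fake data. The class name ToyData and the tensor sizes are made up for illustration and have nothing to do with the cat images.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyData(Dataset):
    """Ten fake 'images' (3x4x4 tensors) with labels 0..9, for illustration only."""
    def __init__(self):
        super().__init__()
        self.data = [(torch.zeros(3, 4, 4) + i, i) for i in range(10)]

    def __getitem__(self, idx):
        img, label = self.data[idx]
        return img, torch.tensor(label)

    def __len__(self):
        return len(self.data)

# The DataLoader calls __len__ and __getitem__ and stacks the samples into batches.
loader = DataLoader(ToyData(), batch_size=4, shuffle=False)
batch_shapes = [(imgs.shape, labels.shape) for imgs, labels in loader]
print(batch_shapes)
```

With ten samples and a batch size of 4, the loader yields three batches and the last one is smaller than the others.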
For image data we also need to apply a transforms pipeline. Here is the simplest one: convert to Tensor, resize, and normalize.
# Compose expects a single list of transforms
self.transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((299, 299)),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
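To make the numbers concrete, the sketch below redoes the arithmetic of the ToTensor and Normalize steps by hand in NumPy on a synthetic 2x2 image. It is a restatement of what those two steps compute, not the torchvision implementation, and it leaves out the resize.

```python
import numpy as np

# A synthetic 2x2 RGB image (height, width, channels) with uint8 values.
img = np.array([[[0, 128, 255], [255, 255, 255]],
                [[0, 0, 0], [64, 64, 64]]], dtype=np.uint8)

# What ToTensor does to a uint8 image: scale to [0, 1] and move channels first.
chw = img.astype(np.float32).transpose(2, 0, 1) / 255.0

# What Normalize does: (x - mean) / std, per channel.
mean = np.array([0.5, 0.5, 0.5], dtype=np.float32).reshape(3, 1, 1)
std = np.array([0.5, 0.5, 0.5], dtype=np.float32).reshape(3, 1, 1)
normalized = (chw - mean) / std

print(normalized.shape, normalized.min(), normalized.max())
```

With a mean and standard deviation of 0.5, pixel values that started in [0, 255] end up in [-1, 1].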
The final Dataset is as follows:
class myData(Dataset):
    def __init__(self, kind):
        super(myData, self).__init__()
        self.mode = kind  # one of 'train', 'val', 'test'
        # Compose expects a single list of transforms
        self.transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Resize((299, 299)),
            transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        ])
        if kind == "test":
            self.imgs = self.load_origin_data()
        else:
            self.imgs, self.labels = self.load_origin_data()

    def __getitem__(self, item):
        if self.mode == "test":
            return self.transform(self.imgs[item])
        else:
            return self.transform(self.imgs[item]), torch.tensor(self.labels[item])

    def __len__(self):
        return len(self.imgs)

    def load_origin_data(self):
        filelist = './data/%s_split_list.txt' % self.mode
        imgs, labels = [], []
        data_dir = os.getcwd() + "/data"
        if self.mode == 'train' or self.mode == 'val':
            with open(filelist) as f:
                lines = [_.strip() for _ in f]
            if self.mode == 'train':
                np.random.shuffle(lines)
            for l in lines:
                img_path, label = l.split('\t')
                img_path = os.path.join(data_dir, img_path)
                try:
                    # read raw bytes as uint8 and decode with OpenCV (BGR order)
                    bgr = cv2.imdecode(np.fromfile(img_path, dtype=np.uint8), 1)
                    # convert to RGB so these images match the test images opened with PIL
                    img = Image.fromarray(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
                    imgs.append(img)
                    labels.append(int(label))
                except Exception as e:
                    print("The path %s may be wrong: %s" % (img_path, e))
                    continue
            return imgs, labels
        elif self.mode == "test":
            full_lines = os.listdir('data/cat_12_test/')
            lines = [line.strip() for line in full_lines]
            for img_path in lines:
                img_path = os.path.join(data_dir, "cat_12_test/", img_path)
                # force three channels so the 3-channel Normalize always applies
                img = Image.open(img_path).convert('RGB')
                imgs.append(img)
            return imgs
2.3 Model Training
As mentioned above, PyTorch model training and inference are generally carried out by iterating over a DataLoader object, so now we need to build one:
def get_Dataloader():
    img_datasets = {
        x: myData(x) for x in ['train', 'val', 'test']}
    dataset_sizes = {
        x: len(img_datasets[x]) for x in ['train', 'val', 'test']}
    train_loader = DataLoader(
        dataset=img_datasets['train'],
        batch_size=24,
        shuffle=True
    )
    val_loader = DataLoader(
        dataset=img_datasets['val'],
        batch_size=1,
        shuffle=False
    )
    test_loader = DataLoader(
        dataset=img_datasets['test'],
        batch_size=1,
        shuffle=False
    )
    dataloaders = {
        'train': train_loader,
        'val': val_loader,
        'test': test_loader
    }
    return dataset_sizes, dataloaders
Then comes the training process itself. The steps are summarized as follows:
- Parameter setup
  - Set the GPU
  - Set the optimizer, loss function, and learning-rate strategy
- Training process
  - Iterate over the DataLoader
  - Reset the optimizer's gradients
  - Model inference (forward pass)
  - Loss calculation
  - Backpropagation
  - Update the optimizer and the learning rate
- Model evaluation
  - Calculate the accumulated loss and the accuracy of each epoch
  - Keep the best accuracy and save the model
def Train(model, criterion, optimizer, scheduler, num_epoches=25):
    dataset_sizes, dataloaders = get_Dataloader()
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epoches):
        print("Epoch {}/{}".format(epoch + 1, num_epoches))
        for phase in ['train', 'val']:
            if phase == "train":
                model.train()
            else:
                model.eval()
            running_loss = 0.0
            running_corrects = 0
            for inputs, labels in dataloaders[phase]:
                inputs, labels = inputs.to(device), labels.to(device)
                optimizer.zero_grad()
                # context manager taking a bool: track gradients only in the training phase
                with torch.set_grad_enabled(phase == "train"):
                    outputs = model(inputs)
                    # Inception v3 in train mode returns (logits, aux_logits); keep the main logits
                    if isinstance(outputs, tuple):
                        outputs = outputs[0]
                    _, y_pre = torch.max(outputs, 1)   # predicted class indices
                    loss = criterion(outputs, labels)  # the loss needs the logits, not the argmax
                    if phase == "train":
                        loss.backward()
                        optimizer.step()
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(y_pre == labels)
            if phase == "train":
                scheduler.step()
            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.float() / dataset_sizes[phase]
            print("{} Loss: {:.4f} Acc: {:.4f}".format(phase, epoch_loss, epoch_acc))
            if phase == "val" and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
    print("Best val Acc: {:4f}".format(best_acc))
    model.load_state_dict(best_model_wts)
    return model
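The two quantities tracked in the loop come from the same model output: the loss is computed on the raw logits, while the accuracy uses the index of the largest logit. A small sketch with made-up logits (3 samples, 4 classes, chosen only for illustration):

```python
import torch
import torch.nn as nn

# Made-up logits for a batch of 3 samples and 4 classes (illustration only).
logits = torch.tensor([[2.0, 0.1, 0.1, 0.1],
                       [0.1, 0.1, 3.0, 0.1],
                       [0.5, 1.5, 0.1, 0.1]])
labels = torch.tensor([0, 2, 3])

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels)   # loss is computed from the raw logits
_, preds = torch.max(logits, 1)    # predicted class = index of the largest logit
accuracy = (preds == labels).float().mean()

print(preds.tolist(), round(accuracy.item(), 4), round(loss.item(), 4))
```

Passing the argmax indices to the loss instead of the logits would not work, since the indices carry no gradient and do not have the shape CrossEntropyLoss expects.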
3. Transfer Learning
Transfer learning reuses the parameters of a large pre-trained model to learn the distribution of new data.
In this process we generally do not want the original model parameters to change, so we freeze them:
for param in model.parameters():
    param.requires_grad = False
Then we replace the last fully connected layer so that it can learn the new dataset:
model.fc = nn.Linear(2048, num_classes)
That is, the only part that still needs to be trained is this fully connected layer.
def Inception(device):
    # transfer from a pre-trained model
    model_ft = models.inception_v3(pretrained=True)
    # model_ft = models.resnet50(pretrained=True)
    # model_ft = models.alexnet(pretrained=True)  # note: AlexNet has no .fc; its last layer is classifier[6]
    num_ftrs = model_ft.fc.in_features
    model_ft.fc = nn.Linear(num_ftrs, 12)  # final fully connected layer outputs the 12 classes
    model_ft = model_ft.to(device)
    criterion = nn.CrossEntropyLoss()
    # no parameters are frozen here, so SGD fine-tunes the whole network
    optimizer_ft = torch.optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)
    exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=5, gamma=0.1)
    model_ft = Train(model_ft, criterion, optimizer_ft, exp_lr_scheduler, num_epoches=30)
    return model_ft
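With step_size=5 and gamma=0.1, the StepLR scheduler multiplies the learning rate by 0.1 every 5 epochs. The sketch below restates that decay rule in plain Python (it is not a call into PyTorch, and the helper name step_lr is made up) to show the values over the 30 epochs used here.

```python
def step_lr(base_lr, step_size, gamma, epoch):
    """Learning rate in a given (0-based) epoch under a step-decay schedule."""
    return base_lr * gamma ** (epoch // step_size)

schedule = [step_lr(0.001, 5, 0.1, epoch) for epoch in range(30)]
for epoch in (0, 4, 5, 10, 29):
    print(epoch, schedule[epoch])
```

The rate stays at 0.001 for the first five epochs, drops to 0.0001 for the next five, and keeps shrinking by a factor of ten after that.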
4. Results analysis
- Inception
  Epoch 30/30
  train Loss: 0.1065 Acc: 0.9858
  val Loss: 0.3026 Acc: 0.8983
  Best val Acc: 0.918336
- AlexNet
  Epoch 30/30
  train Loss: 0.1403 Acc: 0.9601
  val Loss: 0.6815 Acc: 0.7750
  Best val Acc: 0.779661
- ResNet50
  Epoch 30/30
  train Loss: 0.0480 Acc: 0.9973
  val Loss: 0.3157 Acc: 0.9060
  Best val Acc: 0.909091
The result of the feature map in the middle part is as follows:
A feature map is an abstraction: the same image yields entirely new high-dimensional features after being processed by different convolution kernels. These features are mostly hard to interpret, so treat them as a curiosity.
Training basically converges after 7 epochs.
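The source does not show how the intermediate feature maps mentioned above were extracted. One common approach is a forward hook, sketched below on a tiny stand-in network; the layer sizes, the name "conv1", and the random input are all assumptions for illustration.

```python
import torch
import torch.nn as nn

# A tiny stand-in network; layer sizes are made up for illustration.
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(4, 8, kernel_size=3, padding=1),
)

feature_maps = {}

def save_output(name):
    # Returns a hook that stores the layer's output under the given name.
    def hook(module, inputs, output):
        feature_maps[name] = output.detach()
    return hook

handle = model[0].register_forward_hook(save_output("conv1"))

with torch.no_grad():
    model(torch.randn(1, 3, 8, 8))
handle.remove()  # detach the hook once the feature map is captured

print(feature_maps["conv1"].shape)
```

Each of the 4 channels in the captured tensor is one feature map and can be displayed as a grayscale image.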