[Project Practice] Twelve-Class Cat Classification

[Data Science Project Practice] Twelve-class cat classification with transfer learning based on ResNet and Inception v3

1. Project Background

This project comes from the image classification learning competition on the PaddlePaddle (飞桨) platform (guide link).

  • The code and results come from my classmates, unchanged; I am just summarizing them here for learning and review

Copying the competition description directly:

This competition asks contestants to classify twelve kinds of cats, a classic image classification task in the CV direction. As the cornerstone of other image tasks, image classification lets everyone get started with computer vision faster.

Dataset

The competition dataset contains pictures of 12 kinds of cats, divided into a training set and a test set.

Training set: high-definition color pictures plus the category each picture belongs to, 2160 cat pictures in total, with annotation files.

Test set: color pictures only, 240 cat pictures in total, without annotation files.

2. Baseline

2.1 Preparation stage

First, import the modules that will be used:

import os
import cv2
import torch
import torch.nn as nn
from torchvision import models,transforms
from torch.utils.data import DataLoader,Dataset
import numpy as np
from PIL import Image
from torch.optim import lr_scheduler
import copy

2.2 Data reading stage

This stage is about how to read the data into the model. Since the cat data are images, they are generally read into memory as arrays of pixel values. For easier visualization of the intermediate steps, we read the data as PIL Image objects. This step can be written as:

x=np.fromfile(imgPath,dtype=np.uint8) # read the raw file bytes into an ndarray (imdecode needs uint8, not float32)
x=cv2.imdecode(x,1) # decode into a BGR image with pixel values in [0,255]
img=Image.fromarray(x) # wrap as a PIL Image object

(Figure: the same image as a PIL Image on the left and as the OpenCV-decoded array on the right.)

In the figure above, the data on the left is the Image type and the data on the right is what OpenCV reads. You can see that the color channels have been swapped, because OpenCV decodes in BGR order. In fact, the OpenCV part alone is enough, and you can call imshow in multiple windows to visualize the data.
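
If you want the OpenCV result to display with the same colors as the PIL version, you can swap the channels back before wrapping it. A minimal sketch, reusing x from the decoding snippet above:

x_rgb=cv2.cvtColor(x,cv2.COLOR_BGR2RGB) # OpenCV decodes as BGR; flip to RGB
img=Image.fromarray(x_rgb)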

We now have cat images! The next step is to get the cats' labels. Normally we record the data and the labels in a text file, with each line holding one image path and its label:

# file listing the labels
filelist=r"data_split_list.txt"
imgs,labels=[],[] # storage lists

with open(filelist) as f:
    lines=[_.strip() for _ in f] # strip whitespace
    np.random.shuffle(lines) # shuffle randomly
    for l in lines:
        img_path,label=l.split('\t') # split into image path and label
        img=Image.fromarray(cv2.imdecode(np.fromfile(img_path,np.uint8),1)) # decode the raw bytes (uint8) into an image
        imgs.append(img)
        labels.append(label)
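
For reference, each line of the list file pairs a relative image path with an integer label, separated by a tab. A made-up example line:

cat_12_train/xxxx.jpg	0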

We encapsulate this part of the work into a function to read the data.

The next job is to convert the data into a format PyTorch can accept. As we all know, PyTorch model training and inference generally work by iterating a DataLoader object, and the dataset behind a DataLoader is a Dataset class. So here we need to build a Dataset class:

class myData(Dataset):
    
    def __init__(self):
        super(myData,self).__init__()
        self.data=[]
    
    def __getitem__(self,x):
        return self.data[x]
    
    def __len__(self):
        return len(self.data)

Well, fill in the above three methods and you are basically done.

For image data, we need to apply transforms. Here is the simplest pipeline: convert to tensor, resize, and normalize.

self.transform=transforms.Compose([
    transforms.ToTensor(),
    transforms.Resize((299,299)),
    transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))
]) # Compose expects a list of transforms
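
Applied to one of the PIL images read above, this pipeline produces a normalized 3×299×299 tensor. A quick sanity check, with the same pipeline bound to a local name transform (a name assumed for this sketch; with a recent torchvision, Resize also accepts tensors, which is why it can come after ToTensor):

transform=transforms.Compose([
    transforms.ToTensor(), # PIL Image -> float tensor in [0,1]
    transforms.Resize((299,299)), # 299x299 is the input size Inception v3 expects
    transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))
])
t=transform(img) # img: a PIL Image read earlier
print(t.shape) # torch.Size([3, 299, 299])
print(t.min(),t.max()) # roughly within [-1,1] after normalization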

The final Dataset is as follows:

class myData(Dataset):

    def __init__(self,kind):
        super(myData, self).__init__()
        self.mode=kind
        self.transform=transforms.Compose([
            transforms.ToTensor(),
            transforms.Resize((299,299)),
            transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))
        ])

        if kind=="test":
            self.imgs=self.load_origin_data()
        else:
            self.imgs,self.labels=self.load_origin_data()

    def __getitem__(self, item):
        if self.mode=="test":
            return self.transform(self.imgs[item])
        else:
            return self.transform(self.imgs[item]),torch.tensor(self.labels[item])

    def __len__(self):
        return len(self.imgs)

    def load_origin_data(self):
        filelist = './data/%s_split_list.txt' % self.mode
        imgs,labels=[],[]
        data_dir=os.getcwd()+"/data"
        if self.mode=='train' or self.mode=='val':
            with open(filelist) as f:
                lines=[_.strip() for _ in f]
            if self.mode=='train':
                np.random.shuffle(lines)
            for l in lines:
                img_path,label=l.split('\t')
                img_path=os.path.join(data_dir,img_path)
                try:
                    # decode the raw bytes (uint8, not float32) into an image, then wrap as a PIL Image
                    img=Image.fromarray(cv2.imdecode(np.fromfile(img_path,dtype=np.uint8),1))
                    imgs.append(img)
                    labels.append(int(label))
                except Exception as e:
                    # catch any failure, report the offending path, and skip the sample
                    print("The path %s may be wrong"%img_path)
                    continue
            return imgs,labels
        else: # test: images only, no labels
            full_lines=os.listdir('data/cat_12_test/')
            lines=[line.strip() for line in full_lines]
            for img_path in lines:
                img_path=os.path.join(data_dir,"cat_12_test/",img_path)
                img=Image.open(img_path)
                imgs.append(img)
            return imgs
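
A quick sanity check of the class is to instantiate it and index one sample; a sketch that assumes ./data/train_split_list.txt and the image folders are in place:

train_ds=myData("train")
x,y=train_ds[0]
print(len(train_ds),x.shape,y) # dataset size, torch.Size([3, 299, 299]), tensor(<label>)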

2.3 Model Training

As just mentioned, PyTorch model training and inference are generally carried out by iterating a DataLoader object, so now we need to build one:

def get_Dataloader():
    img_datasets={x: myData(x) for x in ['train', 'val', 'test']}
    dataset_sizes={x: len(img_datasets[x]) for x in ['train', 'val', 'test']}

    train_loader = DataLoader(
        dataset=img_datasets['train'],
        batch_size=24,
        shuffle=True
    )

    val_loader = DataLoader(
        dataset=img_datasets['val'],
        batch_size=1,
        shuffle=False
    )

    test_loader = DataLoader(
        dataset=img_datasets['test'],
        batch_size=1,
        shuffle=False
    )

    dataloaders = {
        'train': train_loader,
        'val': val_loader,
        'test': test_loader
    }
    return dataset_sizes,dataloaders

Then comes the straightforward training loop. The steps are summarized as follows (the full code is right below):

  • parameter setting stage
    • set the GPU device
    • set the optimizer, loss function, and learning-rate schedule
  • training loop
    • iterate the DataLoader
    • zero the optimizer gradients
    • run the model forward
    • compute the loss
    • backpropagate
    • step the optimizer and the learning-rate scheduler
  • model evaluation
    • accumulate the loss and accuracy for each epoch
    • keep and save the weights with the best validation accuracy
def Train(model,criterion,optimizer,scheduler,num_epoches=25):
    dataset_sizes,dataloaders=get_Dataloader()
    device=torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    best_model_wts=copy.deepcopy(model.state_dict())
    best_acc=0.0

    for epoch in range(num_epoches):
        print("Epoch {}/{}".format(epoch+1,num_epoches))

        for phase in ['train','val']:
            if phase=="train":
                model.train()
            else:
                model.eval()

            train_loss=0.0
            train_corrects=0

            for inputs,labels in dataloaders[phase]:
                inputs,labels=inputs.to(device),labels.to(device)
                optimizer.zero_grad()

                with torch.set_grad_enabled(phase=="train"):
                    # context manager: the boolean argument decides whether gradients are tracked inside the block
                    outputs=model(inputs)
                    if isinstance(outputs,tuple):
                        # Inception v3 returns (logits, aux_logits) in training mode; keep the main logits
                        outputs=outputs[0]
                    _,preds=torch.max(outputs,1) # predicted class indices
                    loss=criterion(outputs,labels) # CrossEntropyLoss takes the raw logits, not the argmax

                    if phase=="train":
                        loss.backward()
                        optimizer.step()

                train_loss+=loss.item()*inputs.size(0)
                train_corrects+=torch.sum(preds==labels)
            if phase=="train":
                scheduler.step()

            epoch_loss=train_loss/dataset_sizes[phase]
            epoch_acc=train_corrects.float()/dataset_sizes[phase]

            print("{} Loss: {:.4f} Acc: {:.4f}".format(phase,epoch_loss,epoch_acc))

            if phase=="val" and epoch_acc>best_acc:
                best_acc=epoch_acc
                best_model_wts=copy.deepcopy(model.state_dict())
    print("Best val Acc : {:4f}".format(best_acc))
    model.load_state_dict(best_model_wts)
    return model
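
The post stops at training, but the test_loader built earlier can be used for inference in the same style. A minimal sketch (predict is a hypothetical helper, not part of the original code; it assumes the model returned by Train):

def predict(model,device):
    # iterate the test DataLoader and collect the argmax class for each image
    _,dataloaders=get_Dataloader()
    model.eval()
    preds=[]
    with torch.no_grad():
        for inputs in dataloaders["test"]:
            inputs=inputs.to(device)
            outputs=model(inputs)
            preds.append(torch.max(outputs,1)[1].item())
    return preds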

3. Transfer Learning

Transfer learning means using the parameters of a large pretrained model to learn the distribution of other data.

In this process, we generally do not want the original model parameters to change, so the following step is usually needed:

for param in model.parameters():
    param.requires_grad=False

Then we need to construct a new final fully connected layer to learn the new dataset:

model.fc=nn.Linear(2048,num_classes)

That is, the only part that still needs training is this fully connected layer.
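
If the backbone really is frozen as above, it also makes sense to hand the optimizer only the parameters that still require gradients; a small sketch of that variant (note that the Inception function below fine-tunes all layers instead):

# only the new fc layer requires gradients, so optimize just those parameters
optimizer=torch.optim.SGD(model.fc.parameters(),lr=0.001,momentum=0.9)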

def Inception(device):
    # transfer from a pretrained model
    model_ft=models.inception_v3(pretrained=True)
    # model_ft=models.resnet50(pretrained=True) # resnet50 also exposes .fc
    # model_ft=models.alexnet(pretrained=True) # note: alexnet uses .classifier, not .fc

    num_ftrs=model_ft.fc.in_features
    model_ft.fc=nn.Linear(num_ftrs,12) # new fully connected layer with 12 outputs
    # note: as written, all layers are fine-tuned; to train only the fc layer,
    # freeze the backbone first as shown above

    model_ft=model_ft.to(device)

    criterion=nn.CrossEntropyLoss()
    optimizer_ft=torch.optim.SGD(model_ft.parameters(),lr=0.001,momentum=0.9)
    exp_lr_scheduler=lr_scheduler.StepLR(optimizer_ft,step_size=5,gamma=0.1)
    model_ft=Train(model_ft,criterion,optimizer_ft,exp_lr_scheduler,num_epoches=30)
    return model_ft
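
Putting the pieces together, a usage sketch (assuming the data list files described above are in place):

device=torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model=Inception(device) # trains for 30 epochs and returns the best model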

4. Results Analysis

  • Inception

    Epoch 30/30
    train Loss: 0.1065 Acc: 0.9858
    val Loss: 0.3026 Acc: 0.8983
    Best val Acc: 0.918336
    
  • AlexNet

    Epoch 30/30
    train Loss: 0.1403 Acc: 0.9601
    val Loss: 0.6815 Acc: 0.7750
    Best val Acc: 0.779661
    
  • ResNet50

    Epoch 30/30
    train Loss: 0.0480 Acc: 0.9973
    val Loss: 0.3157 Acc: 0.9060
    Best val Acc: 0.909091
    

The feature maps from the intermediate layers look like this:

(Figure: feature maps from intermediate convolutional layers.)

A feature map is an abstraction: you can see that the same image takes on brand-new high-dimensional features after passing through different convolution kernels. These features are mostly hard to interpret, so just enjoy looking at them.

(Figure: training curves.)

Training basically converges after 7 epochs.

Origin: blog.csdn.net/qq_45957458/article/details/130877398