Deep learning in practice: cats vs. dogs with PyTorch


This is a classic Kaggle project and a common first exercise when starting out with CNNs, so the implementation here is simple and rough.
I divided the whole project into four files:
config holds the configuration parameters, Dataset builds the data sets,
Main trains the model, saves it, and so on,
and Module builds the model. The contents of config are as follows:

TRAIN_PATH = r'D:\temp\train' # folder with the training images
PRE_PATH = r'D:\temp\test1' # folder with the images to predict
BATCH_SIZE = 200 # batch size
PIC_SIZE = 100 # image side length in pixels

Data preparation & processing

All of this is written in the Dataset file.
The data can be downloaded from the official Kaggle site: https://www.kaggle.com/c/dogs-vs-cats
There are three files in total: a csv file used for submission, and two others that hold the images, one for training and one for prediction.

If you look at the training set, you will see that each label is embedded directly in the image's file name, so the label has to be extracted by string matching.
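As a small illustration of that string matching, the sketch below assumes file names of the form cat.0.jpg and dog.0.jpg. The helper label_from_name is a hypothetical name used only here; the mapping cat = 1, dog = 0 follows the Dataset code in this post.

```python
def label_from_name(name: str) -> int:
    """Return 1 for a cat image and 0 for a dog image, based on the file-name prefix."""
    prefix = name.split('.')[0].lower()  # 'cat.0.jpg' -> 'cat'
    if prefix == 'cat':
        return 1
    if prefix == 'dog':
        return 0
    # fail loudly on anything that does not follow the assumed naming scheme
    raise ValueError(f'unexpected file name: {name}')

labels = [label_from_name(n) for n in ['cat.0.jpg', 'dog.12.jpg']]
print(labels)
```

Raising on an unknown prefix is a design choice of this sketch; the Dataset class in this post instead treats every name that does not start with 'cat' as a dog.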

Furthermore, Kaggle does not provide a separate labeled set for evaluation, so you have to split the training data manually. (Strictly speaking, it should be split into a training set, a validation set, and a test set, but I forgot to do that at the time.)
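For completeness, a three-way split of the file-name list could look roughly like the sketch below. It is not part of the project: the function name three_way_split, the 15% / 15% proportions, and the example file names are all assumptions made for illustration.

```python
import random

def three_way_split(names, val_size=0.15, test_size=0.15, seed=666):
    """Shuffle a list of file names and cut it into train / validation / test lists."""
    names = sorted(names)        # fixed starting order, so the seed gives the same split every run
    rng = random.Random(seed)    # local generator, leaves the global random state untouched
    rng.shuffle(names)
    n_test = int(len(names) * test_size)
    n_val = int(len(names) * val_size)
    test = names[:n_test]
    val = names[n_test:n_test + n_val]
    train = names[n_test + n_val:]
    return train, val, test

# hypothetical file names in the Kaggle naming style
names = [f'cat.{i}.jpg' for i in range(50)] + [f'dog.{i}.jpg' for i in range(50)]
train, val, test = three_way_split(names)
print(len(train), len(val), len(test))
```

The validation list would be used to tune things like lr and the number of epochs, and the test list would be touched only once at the end.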

First, import the necessary packages:

import os 
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms
import torch
import random
import config

A PyTorch data set inherits from Dataset, so we have to define a class:

class Cat_Vs_DogDatasets(Dataset):
	pass

The general idea is to pass in a list of image file names, build each image path by joining the name with the folder path, and then load the images one by one.

class Cat_Vs_DogDatasets(Dataset):
    def __init__(self, path: str, imglist: list, s: int=config.PIC_SIZE):
        self.path = path # path of the image folder
        self.compose = transforms.Compose([ # image preprocessing
            transforms.Resize(size=s), # scale the shorter side to s, keeping the aspect ratio
            transforms.CenterCrop(size=s), # crop the center to an s x s image
            transforms.ToTensor(), # PIL image -> float tensor with values in [0, 1]
            transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)) # map [0, 1] to [-1, 1]
        ])
        self.imglist = imglist # list of image file names
        self.len = len(imglist)
    
    def __getitem__(self, idx: int):
        name = self.imglist[idx]
        label = 0 # dog
        if name[:3] == 'cat': # read the label from the file name
            label = 1 # cat
        # convert('RGB') guarantees 3 channels, which Normalize above expects
        img = Image.open(os.path.join(self.path, name), mode='r').convert('RGB')
        return self.compose(img), torch.tensor(label, dtype=torch.int64)
    
    def __len__(self):
        return self.len
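To make the preprocessing steps concrete, the sketch below traces them by hand in plain Python. The 500×375 input size is a made-up example, the helper names are hypothetical, and torchvision's exact rounding may differ by a pixel from this arithmetic.

```python
def resized_dims(width: int, height: int, s: int):
    """Size after scaling the shorter side to s while keeping the aspect ratio."""
    if width <= height:
        return s, int(s * height / width)
    return int(s * width / height), s

def center_crop_box(width: int, height: int, s: int):
    """(left, top, right, bottom) of the centered s x s crop."""
    left = (width - s) // 2
    top = (height - s) // 2
    return left, top, left + s, top + s

def normalize(value: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Per-channel normalization: (value - mean) / std."""
    return (value - mean) / std

w, h = resized_dims(500, 375, 100)   # hypothetical 500x375 input image
box = center_crop_box(w, h, 100)     # region kept by the center crop
print((w, h), box, normalize(0.0), normalize(1.0))
```

With mean 0.5 and std 0.5, the darkest pixel value 0.0 maps to -1.0 and the brightest value 1.0 maps to 1.0, so every channel ends up in [-1, 1].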

With the data set class in place, the next step is to split the data. This can be done directly by splitting the list of image file names.

def train_test_split(path, test_size=0.15, random_state:int=666, s: int=config.PIC_SIZE):
    # read all image file names; sorted, because os.listdir returns them in arbitrary order
    imglist = sorted(os.listdir(path))
    random.seed(random_state)
    random.shuffle(imglist) # shuffle the order
    train_size = int((1 - test_size) * len(imglist)) # size of the training set
    train = imglist[:train_size] # training set
    test = imglist[train_size:] # test set
    return Cat_Vs_DogDatasets(path, imglist=train, s=s), Cat_Vs_DogDatasets(path, imglist=test, s=s) # build both data sets

So far, the data is ready.

Model building

The model is an arbitrary small one: two convolutional layers, each followed by max pooling, connected to a two-layer fully connected network.

import torch
from torch import nn
import config

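class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # two 5x5 convolutions without padding: 3 -> 10 -> 20 channels
        self.cv1 = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=5, stride=1, padding=0)
        self.cv2 = nn.Conv2d(in_channels=10, out_channels=20, kernel_size=5, stride=1, padding=0)
        self.maxpooling = nn.MaxPool2d(kernel_size=2) # halves height and width

        self.acFun = nn.functional.relu
        # 9680 = 20 channels * 22 * 22, the flattened size for 100x100 inputs
        self.liner1 = nn.Linear(9680, 80)
        self.liner2 = nn.Linear(80, 2) # two outputs: dog (0) and cat (1)

    def forward(self, x):
        x = self.acFun(self.cv1(x)) # (N, 10, 96, 96)
        x = self.maxpooling(x) # (N, 10, 48, 48)
        x = self.acFun(self.cv2(x)) # (N, 20, 44, 44)
        x = self.maxpooling(x) # (N, 20, 22, 22)
        
        x = x.view(x.shape[0], -1) # flatten to (N, 9680)
        x = self.acFun(self.liner1(x))
        x = self.acFun(self.liner2(x)) # note: ReLU is also applied to the output scores
        return x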

The model takes 100×100 images. They are this small because the graphics card has limited memory, but a resolution of 100×100 is still workable.
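The input size of the first fully connected layer, 9680, follows from the 100×100 image size. The sketch below redoes that arithmetic in plain Python; conv_out and pool_out are hypothetical helper names, and the formulas assume no dilation and a pooling stride equal to the pooling kernel size, which are the defaults used by the model above.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a square convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size: int, kernel: int) -> int:
    """Output side length of max pooling whose stride equals its kernel size."""
    return (size - kernel) // kernel + 1

side = 100                   # PIC_SIZE
side = conv_out(side, 5)     # cv1:  100 -> 96
side = pool_out(side, 2)     # pool:  96 -> 48
side = conv_out(side, 5)     # cv2:   48 -> 44
side = pool_out(side, 2)     # pool:  44 -> 22
flat_features = 20 * side * side   # cv2 has 20 output channels
print(flat_features)         # 9680
```

If PIC_SIZE is changed in config, the first argument of nn.Linear has to be recomputed the same way.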

Training

For training, the optimizer is SGD and the loss function is cross-entropy.

import Dataset
import Module
import config
import torch

# use the first GPU if one is available, otherwise fall back to the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# hold out part of the training folder for evaluation
train, test = Dataset.train_test_split(config.TRAIN_PATH)
train_loader = torch.utils.data.DataLoader(dataset=train, batch_size=config.BATCH_SIZE, num_workers=4, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test, batch_size=config.BATCH_SIZE)

moudle = Module.CNN().to(device)
sgd = torch.optim.SGD(params=moudle.parameters(), lr=0.01, momentum=0.15)
loss_fun = torch.nn.functional.cross_entropy

There is a problem here. With SGD, I found that when lr is too large the loss gets stuck: it neither decreases nor increases. I am a bit confused and have not yet figured out how to explain it, so I lowered lr and increased the momentum appropriately.
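One possible contributing factor, offered as an assumption rather than a confirmed diagnosis: forward applies ReLU to the output of liner2. If a large update pushes both pre-activations below zero, the network outputs [0, 0] for every image, the cross-entropy is ln 2 ≈ 0.693 whatever the label is, and ReLU passes no gradient back. The sketch below reproduces that arithmetic in plain Python; relu and cross_entropy are hand-written stand-ins for illustration, not the PyTorch functions, and the pre-activation values are made up.

```python
import math

def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]

def cross_entropy(scores, target):
    """Cross-entropy of one sample from raw scores: -log(softmax(scores)[target])."""
    m = max(scores)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in scores))
    return log_sum_exp - scores[target]

pre_activation = [-3.2, -0.7]     # hypothetical negative outputs of the last linear layer
scores = relu(pre_activation)     # both scores are clamped to 0.0
loss_cat = cross_entropy(scores, 1)
loss_dog = cross_entropy(scores, 0)
# the derivative of ReLU is 0 for negative inputs, so nothing flows back through these scores
relu_grad = [1.0 if v > 0 else 0.0 for v in pre_activation]
print(loss_cat, loss_dog, relu_grad)
```

A printed loss that hovers around 0.693 would be consistent with this situation. Returning the raw output of liner2 without the final ReLU is the usual arrangement with cross-entropy, though whether that was the cause here is not verified.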

Then I wrote a function to compute the accuracy (ACC):

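def OutPutACC():
    # evaluate on the held-out set without tracking gradients
    with torch.no_grad():
        A = 0 # number of correct predictions
        for batch in test_loader:
            X, y = batch
            X = X.to(device)
            y = y.to(device)
            pre = torch.argmax(moudle(X), dim=1) # predicted class per image
            A += (pre == y).sum().item()
        print(A / len(test)) # accuracy on the held-out set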

Finally, train:

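if __name__ == '__main__':
    for epoch in range(40):
        for i, batch in enumerate(train_loader):
            X, y = batch
            X = X.to(device)
            y = y.to(device)

            pre = moudle(X) # forward pass
            loss = loss_fun(pre, y)

            sgd.zero_grad() # clear gradients from the previous step
            loss.backward()
            sgd.step()
            if i % 10 == 0:
                print(loss.item()) # print the loss every 10 batches
        OutPutACC() # accuracy on the held-out set after each epoch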

After 40 epochs of training, which took about 20 minutes, the final model reached an ACC of 77.2% on the test set. The model is clearly still underfitting, so the number of training epochs could be increased further.
Finally, here is the link to the whole project:
https://github.com/zipper112/CatvsDog


Origin blog.csdn.net/qq_36102055/article/details/113957628