Dataset download address: https://www.microsoft.com/en-us/download/confirmation.aspx?id=54765

Dogs vs. Cats is a Kaggle competition. The task is to design an algorithm that, given the data set, classifies each picture as a cat or a dog.

The training set contains 25,000 labeled pictures, 12,500 cats and 12,500 dogs, and each file name starts with its label, cat or dog. The pictures are RGB JPEG files of varying sizes.

1. Data preprocessing

In PyTorch, data preprocessing is usually written as a class that inherits from Dataset. In this example the class implements three methods: __init__, __getitem__, and __len__.

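import os
import random

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms as T


class DogCat(Dataset):
    def __init__(self, root, transforms=None, train=True):
        # Sort first because os.listdir returns files in arbitrary order,
        # then shuffle with a fixed seed so that both splits contain cats
        # and dogs and the train/val split is reproducible.
        imgs = sorted(os.path.join(root, img) for img in os.listdir(root))
        random.Random(0).shuffle(imgs)
        split = int(0.7 * len(imgs))

        # 70% for training, the remaining 30% for validation (no overlap)
        if train:
            self.imgs = imgs[:split]
        else:
            self.imgs = imgs[split:]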

        if transforms is None:
            normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

            self.transforms = T.Compose([
                    T.Resize(224),
                    T.CenterCrop(224),
                    T.ToTensor(),
                    normalize
            ])
        else:
            self.transforms = transforms
             
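    def __getitem__(self, index):
        img_path = self.imgs[index]
        # dog label: 1, cat label: 0 (taken from the file name only)
        label = 1 if "dog" in os.path.basename(img_path) else 0
        # Force 3 channels so Normalize also works on grayscale or RGBA files
        data = Image.open(img_path).convert("RGB")
        data = self.transforms(data)
        return data, label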

    def __len__(self):
        return len(self.imgs)

__init__ is the constructor. Here it defines the data path, the train/validation split, and the transforms.

__getitem__ is the indexing method: given an index, it returns the data and label of a single sample.

__len__ returns the number of samples in the dataset.
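
The split and the labeling rule can be tried without any images. The following is a minimal sketch, not the class above: the helper names split_paths and label_from_path and the seed value are assumptions made for illustration, and the file names are made up in the Kaggle style (cat.N.jpg, dog.N.jpg).

```python
import os
import random


def split_paths(paths, train_fraction=0.7, seed=0):
    """Shuffle deterministically, then cut once so the two parts are disjoint."""
    paths = sorted(paths)                # directory listing order is not guaranteed
    random.Random(seed).shuffle(paths)   # mix cats and dogs before cutting
    cut = int(train_fraction * len(paths))
    return paths[:cut], paths[cut:]


def label_from_path(path):
    """Return 1 for dog and 0 for cat, looking at the file name only."""
    return 1 if "dog" in os.path.basename(path) else 0


# Hypothetical file names, five cats and five dogs.
names = ["train/cat.%d.jpg" % i for i in range(5)]
names += ["train/dog.%d.jpg" % i for i in range(5)]

train_paths, val_paths = split_paths(names)
print(len(train_paths), len(val_paths))   # 7 3
print(label_from_path("train/dog.1.jpg"))  # 1
```

Cutting the shuffled list at a single index is what keeps the training and validation parts from sharing any file.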

2. Define the network

In this example we use a simple CNN: four convolutional layers, two fully connected layers, and a final sigmoid that outputs the probability for the binary classification.

import torch
import torch.nn as nn


class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.conv3 = nn.Conv2d(64, 128, 3)
        self.conv4 = nn.Conv2d(128, 128, 3)
        self.max_pool = nn.MaxPool2d(2)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
        # the final feature map is 12*12 for 224x224 input, 7*7 for 150x150 input
        self.fc1 = nn.Linear(128*12*12, 512)
        self.fc2 = nn.Linear(512, 1)
        
    def forward(self, x):
        in_size = x.size(0)
        x = self.conv1(x)
        x = self.relu(x)
        x = self.max_pool(x)
        x = self.conv2(x)
        x = self.relu(x)
        x = self.max_pool(x)
        x = self.conv3(x)
        x = self.relu(x)
        x = self.max_pool(x)
        x = self.conv4(x)
        x = self.relu(x)
        x = self.max_pool(x)
        # flatten to (batch_size, features)
        x = x.view(in_size, -1)
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.sigmoid(x)
        return x
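
The 128*12*12 input size of fc1 can be checked by hand. Here is a small sketch of the arithmetic, assuming what the code above uses: 3x3 convolutions with no padding and stride 1, each followed by 2x2 max pooling that rounds down. The function name conv_pool_size is made up for this example.

```python
def conv_pool_size(size, kernel=3, pool=2, stages=4):
    """Side length of the feature map after `stages` conv + max-pool blocks."""
    for _ in range(stages):
        size = size - kernel + 1   # unpadded convolution with stride 1
        size = size // pool        # max pooling rounds down
    return size


for side in (224, 150):
    out = conv_pool_size(side)
    # 128 is the number of channels produced by the last convolution
    print(side, "->", out, "x", out, "flattened:", 128 * out * out)
# 224 -> 12 x 12 flattened: 18432
# 150 -> 7 x 7 flattened: 6272
```

For a 224x224 input the side length goes 224, 222, 111, 109, 54, 52, 26, 24, 12, which is where the 12*12 in the comment comes from.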

When defining a network in PyTorch, two methods must be implemented. The constructor defines the layers of the network, and forward implements the forward pass. Later in the code, given a ConvNet object model and an input batch image, calling model(image) invokes forward directly. (Python really is convenient; coming from C++, I know how much work this kind of thing would otherwise take.)
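
The reason model(image) reaches forward is Python's __call__ method. The sketch below is a simplified stand-in written for illustration, not PyTorch's code: TinyModule and Doubler are made-up names, and the real nn.Module.__call__ also runs registered hooks before and after forward.

```python
class TinyModule:
    """Simplified stand-in for nn.Module: calling the object runs forward()."""

    def __call__(self, *args, **kwargs):
        return self.forward(*args, **kwargs)


class Doubler(TinyModule):
    def forward(self, x):
        return 2 * x


model = Doubler()
result = model(21)   # same as model.forward(21) in this sketch
print(result)        # 42
```

In real PyTorch code, prefer model(image) over model.forward(image) so that the hooks are not skipped.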

3. Train the model

The data is ready, the model network is defined, and of course the next step is to train the weights.


import torch
import torch.nn as nn
from torch.utils.data import DataLoader,Dataset
from dataset import DogCat
from network import ConvNet
from draw import draw_acc,draw_loss


train_data_root = "/home/elvis/workfile/dataset/dataset_kaggledogvscat/train"
batch_size = 256
# 1. prepare dataset
train_data = DogCat(train_data_root, train=True)
val_data = DogCat(train_data_root, train=False)
train_dataloader = DataLoader(train_data,batch_size=batch_size,shuffle=True)
val_dataloader = DataLoader(val_data,batch_size=batch_size,shuffle=True)

# 2. load model
model = ConvNet()
if torch.cuda.is_available():
    model.cuda()

# 3. prepare hyperparameters
criterion = nn.BCELoss()
learning_rate = 1e-3
# optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# 4. train
train_loss_epoch = []
train_acc_epoch = []
val_loss_epoch = []
val_acc_epoch = []
for epoch in range(1, 10):
    model.train()
    train_loss = 0
    train_acc = 0
    for batch_idx, (data, target) in enumerate(train_dataloader):
        # BCELoss expects float targets with the same shape as the output: (N, 1)
        target = target.float().unsqueeze(-1)
        if torch.cuda.is_available():
            data, target = data.cuda(), target.cuda()
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        train_loss += loss.item()
        # Threshold the probabilities at 0.5; pred stays on the same device as output
        pred = (output >= 0.5).long()
        train_acc += pred.eq(target.long()).sum().item()
        loss.backward()
        optimizer.step()
        if (batch_idx + 1) % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, (batch_idx + 1) * len(data), len(train_dataloader.dataset),
                100. * (batch_idx + 1) / len(train_dataloader), loss.item()))
    train_loss_epoch.append(train_loss / len(train_dataloader))
    train_acc_epoch.append(train_acc / len(train_dataloader.dataset))
    print('\nTrain set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)'.format(
        train_loss / len(train_dataloader), train_acc, len(train_dataloader.dataset),
        100. * train_acc / len(train_dataloader.dataset)))

    # val
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for batch_idx, (data, target) in enumerate(val_dataloader):
            target = target.float().unsqueeze(-1)
            if torch.cuda.is_available():
                data, target = data.cuda(), target.cuda()
            output = model(data)
            # each batch loss is already a mean; sum over the epoch, divide by the batch count below
            test_loss += criterion(output, target).item()
            pred = (output >= 0.5).long()
            correct += pred.eq(target.long()).sum().item()
    print('Valid set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss / len(val_dataloader), correct, len(val_dataloader.dataset),
        100. * correct / len(val_dataloader.dataset)))
    val_loss_epoch.append(test_loss / len(val_dataloader))
    val_acc_epoch.append(correct / len(val_dataloader.dataset))

    # Save model
    val_acc_rate = correct / len(val_dataloader.dataset)
    save = True
    best = "best.pt"
    last = "last.pt"
    if save:
        # Always save the latest weights; also save them as best when this
        # epoch has the highest validation accuracy so far
        torch.save(model.state_dict(), last)
        if val_acc_rate == max(val_acc_epoch):
            torch.save(model.state_dict(), best)
            print("save epoch {} model".format(epoch))

    
# 5. drawing
draw_loss(train_loss_epoch, val_loss_epoch)
draw_acc(train_acc_epoch,val_acc_epoch)

The first step is to prepare the data. The DogCat class defined earlier inherits from Dataset and returns one sample at a time. To load data in batches, wrap it in PyTorch's DataLoader class and pass batch_size to its constructor. A DataLoader object is an iterable, so a for loop over it yields one batch after another.
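
To make the batching idea concrete, here is a rough pure-Python sketch of what iterating in batches means. It is a simplified stand-in, not PyTorch's DataLoader: the function iterate_batches and the sample data are invented for this example, and the real DataLoader additionally stacks the samples of a batch into tensors and can load them with worker processes.

```python
import random


def iterate_batches(dataset, batch_size, shuffle=False, seed=0):
    """Yield lists of samples; the last batch may be smaller."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


# Hypothetical dataset: ten (data, label) pairs, indexable like a Dataset.
samples = [("img%d" % i, i % 2) for i in range(10)]

batches = list(iterate_batches(samples, batch_size=4))
print([len(b) for b in batches])   # [4, 4, 2]
```

This also shows why len(train_dataloader) in the training code is the number of batches, while len(train_dataloader.dataset) is the number of samples.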

The second step is to create the model object. If a GPU is available, the model is moved to it; otherwise it runs on the CPU.

The third step is to define the hyperparameters. Because this is a binary classification and the last layer of the network is a sigmoid that outputs the probability of the dog class, binary cross-entropy (nn.BCELoss) is chosen as the loss function. Then the learning rate and optimizer are set.
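
As a worked example of the loss, the following sketch computes binary cross-entropy by hand, averaged over the batch, which is what nn.BCELoss does with its default mean reduction. The function name and the probability values are made up for illustration.

```python
import math


def binary_cross_entropy(probs, targets):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over the batch."""
    total = 0.0
    for p, y in zip(probs, targets):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


targets = [1, 0]                                       # one dog, one cat
loss_good = binary_cross_entropy([0.9, 0.1], targets)  # confident and right
loss_bad = binary_cross_entropy([0.1, 0.9], targets)   # confident and wrong
print(round(loss_good, 4), round(loss_bad, 4))         # 0.1054 2.3026
```

Confident wrong predictions are punished much harder than confident right ones are rewarded, which is what pushes the sigmoid output toward the correct label.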

The fourth step is to train for n epochs (9 in the code above, since range(1, 10) stops before 10). In each epoch, compute the loss and accuracy on the training set and on the validation set, and save the model.
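
The accuracy count and the rule for saving the best model can be sketched without a network. The probabilities and per-epoch accuracies below are invented numbers, and count_correct is a hypothetical helper; only the 0.5 threshold and the comparison against the best accuracy so far mirror the training script.

```python
def count_correct(probs, targets, threshold=0.5):
    """Number of samples whose thresholded prediction equals the label."""
    return sum(int(p >= threshold) == t for p, t in zip(probs, targets))


correct = count_correct([0.8, 0.3, 0.6, 0.1], [1, 0, 0, 0])
print(correct)   # 3 (the third sample is predicted dog but is a cat)

# Hypothetical validation accuracies for four epochs.
val_acc_epoch = []
saved_at = []
for epoch, acc in enumerate([0.61, 0.70, 0.68, 0.74], start=1):
    val_acc_epoch.append(acc)
    if acc == max(val_acc_epoch):   # best so far, so best.pt would be overwritten
        saved_at.append(epoch)
print(saved_at)  # [1, 2, 4]
```

Epoch 3 is skipped because its accuracy is lower than that of epoch 2, so the best checkpoint from epoch 2 is kept until epoch 4 improves on it.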

The end result looks like this

If resources permit, try training for a few more epochs.
