PyTorch - How to build and train a custom deep learning network in one article (dataset customization and loading, model training, model testing, model saving and loading)


Training a deep learning network model from scratch with PyTorch usually involves the following steps: defining a custom dataset, loading the custom dataset, defining the network model structure, defining the loss function, defining the optimizer, training the model, testing the model, and saving and loading the model. The sections below describe each of these steps in detail.

1 Custom dataset

from torch.utils.data import Dataset
class torch.utils.data.Dataset

An abstract class representing a Dataset. All custom datasets should inherit from the Dataset class and override the __len__ and __getitem__ methods.

from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self):
        # TODO
        # 1. Initialize file paths or a list of file names.
        pass

    def __getitem__(self, index):
        # TODO
        # 1. Read one sample from file (e.g. using numpy.fromfile, PIL.Image.open).
        # 2. Preprocess the data (e.g. torchvision.transforms).
        # 3. Return a data pair (e.g. image and label).
        pass

    def __len__(self):
        # TODO
        # Return the total size of your dataset (change 0 accordingly).
        return 0

where:

  • __init__(): Perform initialization here, such as loading data from files. The constructor can take custom parameters, for example the data file path and the label file path:
def __init__(self,data_path,label_path):
    pass

Given these paths, the data and corresponding labels can be read and then used for model training. You can also add any parameters needed when loading the dataset to the constructor, such as flags specifying whether the data should be normalized, shuffled, mirrored, and so on (a constructor sketch follows after this list).

  • __len__(): returns the total number of samples in the dataset.

  • __getitem__(self, index): returns the sample and its corresponding label at the given index. Data standardization and data augmentation can also be performed in this function.
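
For instance, a minimal sketch of the constructor described above (the on-disk layout, the normalize flag, and the attribute names here are illustrative assumptions, not a fixed API):

import os
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, data_path, label_path, normalize=True):
        # Assumed layout: one image file per sample under data_path,
        # and a text file at label_path with one integer label per line.
        self.data_path = data_path
        self.data_files = sorted(os.listdir(data_path))
        with open(label_path) as f:
            self.labels = [int(line.strip()) for line in f]
        self.normalize = normalize   # flag consumed later in __getitem__

    def __len__(self):
        return len(self.data_files)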

A complete example:

import matplotlib.pyplot as plt
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image
import os

image_transform = transforms.Compose([
    transforms.Resize(256),               # resize the image to 256*256
    transforms.RandomCrop(224),           # randomly crop a 224*224 patch
    transforms.RandomHorizontalFlip(),    # random horizontal flip
    transforms.ToTensor(),                # convert the image to a Tensor
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])   # normalization
])
 
class DogVsCatDataset(Dataset):   # a Dataset named DogVsCatDataset, inheriting from torch.utils.data.Dataset
    def __init__(self, root_dir, train=True, transform=None):
        """
        Args:
            root_dir (string): Directory with all the images.
            transform (callable, optional): Optional transform to be applied on a sample.
        """
        self.root_dir = root_dir
        self.img_path = os.listdir(self.root_dir)
        if train:
            self.img_path = list(filter(lambda x: int(x.split('.')[1]) < 10000, self.img_path))    # split into training and validation sets
        else:
            self.img_path = list(filter(lambda x: int(x.split('.')[1]) >= 10000, self.img_path))
        self.transform = transform
 
    def __len__(self):
        return len(self.img_path)
 
    def __getitem__(self, idx):
        image = Image.open(os.path.join(self.root_dir, self.img_path[idx]))
        label = 0 if self.img_path[idx].split('.')[0] == 'cat' else 1        # label: cat is 0, dog is 1
        if self.transform:
            image = self.transform(image)
        label = torch.from_numpy(np.array([label]))
        return image, label
 
if __name__ == '__main__':
    catanddog_dataset = DogVsCatDataset(root_dir='/Users/wangpeng/Desktop/train',
                                        train=False,
                                        transform=image_transform)
    train_loader = DataLoader(catanddog_dataset, batch_size=8, shuffle=True, num_workers=4)   # num_workers=4 means 4 subprocesses load the data
    image, label = next(iter(train_loader))   # iter() turns train_loader into an iterator; next() fetches one batch
    sample = image[0].squeeze()
    sample = sample.permute((1, 2, 0)).numpy()
    sample *= [0.229, 0.224, 0.225]    # undo the normalization for display
    sample += [0.485, 0.456, 0.406]
    sample = np.clip(sample, 0, 1)
    plt.imshow(sample)
    plt.show()
    print('Label is: {}'.format(label[0].numpy()))

For more examples of custom datasets, see:

  1. https://blog.csdn.net/public669/article/details/97533974
  2. https://www.cnblogs.com/picassooo/p/12846617.html

2 Loading a custom dataset

from torch.utils.data import DataLoader
class torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, num_workers=0, collate_fn=<function default_collate>, pin_memory=False, drop_last=False)

Parameters:

  • dataset (Dataset) – the dataset from which to load the data. A custom dataset object can be passed here.
  • batch_size (int, optional) – how many samples per batch to load (default: 1).
  • shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: False).
  • sampler (Sampler, optional) – defines the strategy to draw samples from the dataset. If specified, the shuffle argument is ignored.
  • num_workers (int, optional) – how many subprocesses to use for data loading. 0 means the data will be loaded in the main process (default: 0).
  • collate_fn (callable, optional) – a function that merges a list of samples into a mini-batch.
  • pin_memory (bool, optional) – if set to True, the data loader will copy tensors into CUDA pinned memory before returning them.
  • drop_last (bool, optional) – set to True to drop the last incomplete batch if the dataset size is not divisible by the batch size; if False, the last batch will simply be smaller (default: False). For example, with batch_size set to 64 and an epoch of only 100 samples, the remaining 36 samples are dropped during training when drop_last is True; when it is False (the default), training proceeds normally but the final batch is smaller.
  • timeout (numeric, optional) – if positive, the timeout value for collecting a batch from the worker processes; if a batch has not been collected within this time, it is abandoned. This value should always be non-negative (default: 0).
  • worker_init_fn (callable, optional) – an initialization function for each worker. If not None, it will be called on each worker subprocess with the worker id as input, after seeding and before data loading.

In general, a custom dataset is loaded as follows:

mydataset = MyDataset()

dataloader = DataLoader(dataset=mydataset, batch_size=32, shuffle=True, num_workers=8, drop_last=True)

for batch_index, (data, label) in enumerate(dataloader):
    # torch.autograd.Variable is deprecated since PyTorch 0.4; tensors can be moved to the GPU directly
    data = data.cuda(0)
    label = label.cuda(0)
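
The collate_fn parameter listed above merges a list of samples into a mini-batch; the default works for fixed-size tensors, but it can be replaced when samples are irregular. A minimal sketch, assuming a dataset that yields variable-length 1-D sequences with scalar tensor labels (the names here are illustrative):

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

def pad_collate(batch):
    # batch is a list of (sequence, label) pairs with varying sequence lengths
    sequences, labels = zip(*batch)
    padded = pad_sequence(sequences, batch_first=True)   # pad to the longest sequence in the batch
    return padded, torch.stack(labels)

# hypothetical usage:
# dataloader = DataLoader(mydataset, batch_size=32, collate_fn=pad_collate)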

3 Define the model

class torch.nn.Module

The base class for all network models. All custom network models must inherit from this class and override the forward method, which defines the computation performed at every call. In general, the layers of the network are registered in the custom model's __init__ constructor, and then combined in the forward function to build the custom neural network.

A simple network model is defined as follows:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # two convolutional layers followed by two fully connected layers
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4 * 4 * 50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))        # conv -> ReLU
        x = F.max_pool2d(x, 2, 2)        # 2x2 max pooling
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4 * 4 * 50)       # flatten for the fully connected layers
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)   # log-probabilities over 10 classes

Commonly used member functions of class torch.nn.Module:

  • cpu(device_id=None) : moves all model parameters and buffers to the CPU.

  • cuda(device_id=None) : moves all model parameters and buffers to the GPU. If device_id is specified, all parameters are copied to that device.

  • double() : casts all parameters and buffers to double.

  • float() : casts all parameters and buffers to float.

  • half() : casts all parameters and buffers to half precision.

  • eval() : sets the model to evaluation mode; this only has an effect on modules such as Dropout and BatchNorm.

  • train(mode=True) : sets the module to training mode; this only has an effect on modules such as Dropout and BatchNorm.

  • zero_grad() : sets the gradients of all model parameters to zero.
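
As a quick illustration of these member functions (a sketch using the Net model defined above; the cuda() call assumes a GPU is available and is guarded accordingly):

import torch

model = Net()           # the CNN defined in the example above
print(model)            # prints the registered submodules

if torch.cuda.is_available():
    model.cuda()        # move all parameters and buffers to the GPU

model.train()           # training mode: affects Dropout/BatchNorm behavior
model.zero_grad()       # set the gradients of all parameters to zero
model.eval()            # evaluation mode for testing/inference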

4 Define the loss function

After the network model is defined, we need to define the loss function to be used during model training, for example:

loss_fn = nn.CrossEntropyLoss().cuda(0)   # move the loss function to GPU 0

PyTorch defines many loss functions, such as:

  • torch.nn.L1Loss(size_average=True) : creates a criterion that measures the mean absolute error between the input x (the model's prediction) and the target y.
  • torch.nn.MSELoss(size_average=True) : creates a criterion that measures the mean squared error between the input x (the model's prediction) and the target y.
  • torch.nn.CrossEntropyLoss(weight=None, size_average=True) : this criterion combines LogSoftmax and NLLLoss in a single class. It is very useful when training a multi-class classifier.
  • torch.nn.NLLLoss(weight=None, size_average=True) : negative log likelihood loss, used to train a classifier with n classes.
  • torch.nn.NLLLoss2d(weight=None, size_average=True) : negative log likelihood loss for images, computing the NLL loss for each pixel.
  • torch.nn.KLDivLoss(weight=None, size_average=True) : computes the KL divergence loss. KL divergence is often used to describe the distance between two distributions and is useful for direct regression over the space of output distributions.
  • torch.nn.BCELoss(weight=None, size_average=True) : computes the binary cross entropy between output and target.
  • torch.nn.MarginRankingLoss(margin=0, size_average=True) : creates a criterion that measures a margin-based ranking loss between two inputs.
  • torch.nn.HingeEmbeddingLoss(size_average=True) : this loss is usually used to measure whether two inputs are similar, using the L1 pairwise distance. Typically used for learning nonlinear embeddings or semi-supervised learning.
  • torch.nn.MultiLabelMarginLoss(size_average=True) : computes the multi-label classification hinge loss (a margin-based loss).
  • torch.nn.SmoothL1Loss(size_average=True) : a smoothed version of the L1 loss.
  • torch.nn.SoftMarginLoss(size_average=True) : creates a criterion that optimizes a two-class logistic loss.
  • torch.nn.MultiLabelSoftMarginLoss(weight=None, size_average=True) : creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input x and target y.
  • torch.nn.CosineEmbeddingLoss(margin=0, size_average=True) : this criterion uses the cosine distance to measure whether two inputs are similar. It is generally used for learning nonlinear embeddings or semi-supervised learning.
  • torch.nn.MultiMarginLoss(p=1, margin=1, weight=None, size_average=True) : computes the hinge loss (a margin-based loss) for multi-class classification.

An appropriate loss function should be chosen based on the task at hand. For example, for a multi-classification task, torch.nn.CrossEntropyLoss is usually the most appropriate choice.
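
For illustration, a minimal sketch of using CrossEntropyLoss on dummy data (the shapes here are illustrative; the criterion expects raw, unnormalized logits and integer class indices):

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
logits = torch.randn(8, 10)             # batch of 8 samples, 10 classes (no softmax applied)
targets = torch.randint(0, 10, (8,))    # ground-truth class index per sample
loss = loss_fn(logits, targets)
print(loss.item())                      # scalar loss averaged over the batch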

5 Define the optimizer

torch.optim

torch.optim is a package that implements various optimization algorithms. It holds the current state of the model parameters and updates them based on the computed gradients. Optimizer options such as the learning rate and weight decay can also be configured.

Defining an optimizer in PyTorch looks like the following:

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
optimizer = optim.Adam([var1, var2], lr=0.0001)

All optimizers implement a step() method that updates the parameters. It can be called once the gradients have been computed, for example by calling backward().

To use the optimizer to update parameters during a single training process, please refer to the following:

# define the loss function
loss_fn = nn.CrossEntropyLoss().cuda(0)

# define the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.0001)

for batch_index, (data, label) in enumerate(dataloader):
    # move the data and labels to the GPU
    data = data.cuda(0)
    label = label.cuda(0)
    
    # clear all optimized gradients
    optimizer.zero_grad()
    
    # run the model's forward pass
    pred_label = model(data)
    
    # compute the loss between prediction and ground truth
    loss = loss_fn(pred_label, label)
    
    # backpropagation
    loss.backward()
    
    # update the model parameters
    optimizer.step()

6 Training the model

6.1 Preparatory steps before model training

  1. Custom dataset
  2. Define network model
  3. Define loss function
  4. Define optimizer

For specific steps, please refer to the content in Sections 1-5 above.

6.2 Necessary steps for model training

  1. Load dataset
  2. Create model object
  3. Define the number of training epochs
  4. Train the model once per epoch

Within a single epoch, you typically first call model.train() to switch the model to training mode, then iterate over the dataset with the dataloader to get the data and corresponding labels. For each batch, call optimizer.zero_grad() to clear the old gradients, run the model's forward pass to get the predictions, compute the loss between the predictions and the ground-truth labels with the defined loss function, call loss.backward() to backpropagate, and finally call optimizer.step() to update the model parameters with the computed gradients.

Sample code for a single training epoch:

def train(args, model, device, train_loader, optimizer, epoch):
    # switch to training mode
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        # get the data and corresponding labels from the dataset
        data, target = data.to(device), target.to(device)
        # clear all optimized gradients
        optimizer.zero_grad()
        # run the model's forward pass
        output = model(data)
        # compute the loss
        loss = torch.nn.functional.nll_loss(output, target)
        # backpropagation
        loss.backward()
        # update the model parameters
        optimizer.step()

7 Testing the model

When testing the model, first call model.eval() to set the model to evaluation mode, then wrap the test code in with torch.no_grad(). Code wrapped in with torch.no_grad() does not track gradients for backpropagation, which saves memory and computation. Then load the dataset with the dataloader, iterate over the data and corresponding labels, and feed each batch into the model to get the predictions.

Sample code for a single test pass:

def test(args, model, device, test_loader):
    # switch to evaluation mode
    model.eval()
    test_loss = 0
    correct = 0
    # no gradient tracking inside this block
    with torch.no_grad():
        for data, target in test_loader:
            # get the data and corresponding labels from the dataset
            data, target = data.to(device), target.to(device)
            # run the model's forward pass
            output = model(data)
            test_loss += torch.nn.functional.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            # get the most likely predicted label
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    # average the summed loss over the whole test set and report accuracy
    test_loss /= len(test_loader.dataset)
    print('Average loss: {:.4f}, Accuracy: {}/{}'.format(test_loss, correct, len(test_loader.dataset)))

8 Save and load models

8.1 Saving and loading the entire model

Save the entire model:

torch.save(model_object, 'model.pt')

Load the entire model:

model = torch.load('model.pt')

8.2 Saving and loading only the model parameters

Save model parameters:

torch.save(model_object.state_dict(), 'model.pt')

Load model parameters:

model_object.load_state_dict(torch.load('model.pt'))
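
A small sketch of the parameter-loading path for inference (Net is the model class defined earlier; map_location lets a model saved on a GPU be loaded on a CPU-only machine):

import torch

model = Net()   # the model class must be defined/importable when loading parameters
model.load_state_dict(torch.load('model.pt', map_location='cpu'))
model.eval()    # switch to evaluation mode before inference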

9 PyTorch CNN code walkthrough

import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4 * 4 * 50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4 * 4 * 50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def train(args, model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                       100. * batch_idx / len(train_loader), loss.item()))


def test(args, model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


def main():
    parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
    parser.add_argument('--batch-size', type=int, default=64, metavar='N',
                        help='input batch size for training (default: 64)')
    parser.add_argument('--test-batch-size', type=int, default=1000, metavar='N',
                        help='input batch size for testing (default: 1000)')
    parser.add_argument('--epochs', type=int, default=10, metavar='N',
                        help='number of epochs to train (default: 10)')
    parser.add_argument('--lr', type=float, default=0.01, metavar='LR',
                        help='learning rate (default: 0.01)')
    parser.add_argument('--momentum', type=float, default=0.5, metavar='M',
                        help='SGD momentum (default: 0.5)')
    parser.add_argument('--no-cuda', action='store_true', default=False,
                        help='disables CUDA training')
    parser.add_argument('--seed', type=int, default=1, metavar='S',
                        help='random seed (default: 1)')
    parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                        help='how many batches to wait before logging training status')
    parser.add_argument('--save-model', action='store_true', default=False,
                        help='For Saving the current Model')

    args = parser.parse_args()
    use_cuda = not args.no_cuda and torch.cuda.is_available()
    torch.manual_seed(args.seed)
    device = torch.device("cuda:0" if use_cuda else "cpu")
    print(device)

    kwargs = {'num_workers': 4, 'pin_memory': True} if use_cuda else {}
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('./mnist', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=args.batch_size, shuffle=True, **kwargs)
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('./mnist', train=False, transform=transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])),
        batch_size=args.test_batch_size, shuffle=True, **kwargs)

    model = Net().to(device)
    optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)

    for epoch in range(1, args.epochs + 1):
        train(args, model, device, train_loader, optimizer, epoch)

    test(args, model, device, test_loader)

    if (args.save_model):
        torch.save(model.state_dict(), "mnist_cnn.pt")

if __name__ == '__main__':
    main()

9.1 Define the model

A simple CNN model is defined using the following code:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4 * 4 * 50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4 * 4 * 50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

9.2 Dataset loading

In the code above, PyTorch's built-in MNIST dataset is used, and the dataset is loaded with DataLoader (the Normalize values 0.1307 and 0.3081 are the mean and standard deviation of the MNIST training set):

    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST('./mnist', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor(),
                           transforms.Normalize((0.1307,), (0.3081,))
                       ])),
        batch_size=args.batch_size, shuffle=True, **kwargs)
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST('./mnist', train=False, transform=transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
        ])),
        batch_size=args.test_batch_size, shuffle=True, **kwargs)

9.3 Define the loss function

In this example the loss function is not instantiated as a separate module; it is computed directly with torch.nn.functional.nll_loss inside the train() and test() functions, which pairs with the model's log_softmax output:

import torch.nn.functional as F
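
As a quick check (a small sketch, not part of the original example): applying F.nll_loss to log_softmax outputs, as this code does, is equivalent to applying nn.CrossEntropyLoss to the raw logits:

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 10)              # raw, unnormalized scores
targets = torch.randint(0, 10, (4,))     # integer class labels

loss_a = F.nll_loss(F.log_softmax(logits, dim=1), targets)
loss_b = nn.CrossEntropyLoss()(logits, targets)
print(torch.allclose(loss_a, loss_b))    # True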

9.4 Define the optimizer

optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)

9.5 Model training

Single epoch model training:

def train(args, model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                       100. * batch_idx / len(train_loader), loss.item()))

Multiple epoch model training:

    for epoch in range(1, args.epochs + 1):
        train(args, model, device, train_loader, optimizer, epoch)

9.6 Model testing

def test(args, model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

9.7 Model saving

    if (args.save_model):
        torch.save(model.state_dict(), "mnist_cnn.pt")

Reference links

  1. https://pytorch-cn.readthedocs.io/zh/latest/

  2. https://blog.csdn.net/public669/article/details/97752226

If you are interested, you can visit my website: https://www.stubbornhuang.com/
