Why does machine learning need to be trained in batches?

This article continues to use the CIFAR-10 data set to demonstrate the steps of a machine learning workflow. It can be read together with the previous post, "Detailed explanation and visualization of the CIFAR-10 data set".


The significance of batch training

In the previous post, we loaded the dataset with torchvision.datasets.CIFAR10 and used transforms to convert the images into tensors. In principle, we could feed all of this data directly into our own model and train on it. However, the training set contains 50,000 images with their labels, and the test set contains another 10,000. If every iteration pushed the entire training set through the model before updating the parameters, each update would take a long time, memory could overflow, and the trained model could be more affected by the noise in the dataset itself. In summary, the significance of batch training in machine learning is as follows:

  • Save memory and improve computational efficiency: When the training set is very large, loading all of it into memory at once may cause a memory overflow. By dividing the dataset into batches, only part of the data is used for each training step, which saves memory.
  • Reduce the impact of noise: If noisy data is processed all at once, it may have a large impact on the model and reduce its performance. Processing the data in batches can reduce the influence of noisy samples on the model.
  • Error back propagation: Training a neural network relies on the back propagation algorithm to adjust the weights, which requires computing the gradient of the loss function with respect to the weights. With batch training, gradients are computed and weights are updated after every batch, so the parameters are updated many times per pass over the data, which can speed up training. The gradient of each batch is also only an estimate of the full-data gradient, and this randomness can help the model escape poor local minima and find better solutions.
  • Better fit data patterns: The model can learn patterns from one batch and carry them, through the updated weights, into training on the next batch, which may lead to better results.
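
The noise argument in the list above can be illustrated with a small simulation. This is only a sketch under an assumed toy model: each per-sample gradient is a "true" gradient of 1.0 plus Gaussian noise, and `minibatch_gradient_spread` is a hypothetical helper, not part of PyTorch. It measures how much the batch-averaged gradient varies for different batch sizes.

```python
import random
import statistics


def minibatch_gradient_spread(batch_size, n_trials=2000, seed=0):
    """Standard deviation of the batch-averaged gradient over many batches."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        # Assumed toy model: true gradient 1.0 plus unit Gaussian noise per sample.
        batch = [1.0 + rng.gauss(0.0, 1.0) for _ in range(batch_size)]
        estimates.append(sum(batch) / batch_size)
    return statistics.stdev(estimates)


spread = {b: minibatch_gradient_spread(b) for b in (1, 10, 100)}
for b, s in spread.items():
    print(f"batch_size={b:>3}: spread of the gradient estimate = {s:.3f}")
```

In this toy model the spread shrinks roughly as 1/sqrt(batch_size): larger batches give steadier updates, while small batches keep more randomness in each step.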

Code implementation for batch processing of data sets

Libraries that need to be imported:

import torch
from torchvision import transforms
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10

If the dataset has not been downloaded yet, set download=True to download it to the given path; if it has already been downloaded, it is read from that path. Pay attention to the setting of the download parameter.

# Load the dataset
my_trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

# Note: change the path below to where you want to download the dataset, or to where it is already stored
train_dataset = CIFAR10('D:/deep_learning/12_16/data/', train=True, transform=my_trans, download=False)
test_dataset = CIFAR10('D:/deep_learning/12_16/data/', train=False, transform=my_trans, download=False)
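
As a side note on the transform used here: ToTensor scales 8-bit pixel values to the range [0, 1], and Normalize then applies (value - mean) / std to each channel. The sketch below reproduces only that arithmetic in plain Python for a mean and std of 0.5; `normalize_pixel` is a hypothetical helper used for illustration, not a torchvision function.

```python
def normalize_pixel(value, mean=0.5, std=0.5):
    """Apply (value - mean) / std, the per-channel formula used by Normalize."""
    return (value - mean) / std


for v in (0.0, 0.5, 1.0):
    print(v, "->", normalize_pixel(v))
# 0.0 -> -1.0
# 0.5 -> 0.0
# 1.0 -> 1.0
```

With these settings the pixel values end up in the range [-1, 1], centered on zero.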

Set up the batches

# Create the batch loaders
train_boarder = DataLoader(train_dataset, batch_size=10, shuffle=True)
test_boarder = DataLoader(test_dataset, batch_size=10)

batch_size is the number of samples in each batch; here each batch contains 10 samples. shuffle=True reshuffles the training data at every epoch, which helps improve the generalization ability of the model.
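
To see how batch_size determines the number and size of batches without downloading anything, here is a minimal sketch. It assumes a synthetic stand-in for CIFAR-10 (25 random tensors wrapped in a TensorDataset); the names `toy_dataset` and `loader` are only illustrative.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in for CIFAR-10: 25 random "images" with integer labels 0..9.
images = torch.randn(25, 3, 32, 32)
labels = torch.randint(0, 10, (25,))
toy_dataset = TensorDataset(images, labels)

# A seeded generator makes the shuffled order reproducible.
loader = DataLoader(toy_dataset, batch_size=10, shuffle=True,
                    generator=torch.Generator().manual_seed(0))

batch_sizes = [imgs.size(0) for imgs, _ in loader]
print(len(loader))   # number of batches per epoch
print(batch_sizes)   # sizes of the individual batches
```

With 25 samples and batch_size=10, the loader yields 3 batches of sizes 10, 10 and 5. Passing drop_last=True would discard the incomplete final batch.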

It is worth noting that DataLoader is a class whose instances are iterable: it does not return all batches at once, but yields them one at a time as you loop over it. Printing train_boarder therefore shows only the object itself:

<torch.utils.data.dataloader.DataLoader object at 0x000001550DFA0EB0>
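
A small check of how a DataLoader behaves when iterated, using a toy dataset of six integers as an assumed stand-in for the real data (`toy_loader` is an illustrative name):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

toy_loader = DataLoader(TensorDataset(torch.arange(6)), batch_size=2)

# The DataLoader object is iterable, but it is not itself an iterator.
is_iterator = hasattr(toy_loader, "__next__")

# Every for loop starts a fresh pass over the data.
first_pass = [batch[0].tolist() for batch in toy_loader]
second_pass = [batch[0].tolist() for batch in toy_loader]

print(is_iterator)
print(first_pass)
print(second_pass)
```

Each for loop calls iter() on the DataLoader and starts a new pass, which is why the same loader can be reused in every epoch. Calling next() directly on the loader would fail, because it has no __next__ method.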

To get an intuitive look at the data, one way is to create an iterator from the DataLoader and take the first batch:

# Create an iterator
train_boarder_see = iter(train_boarder)

# Take the images and labels of the first batch (10 samples)
imgs,labels = next(train_boarder_see)
print(imgs.shape)
print(labels.shape)

The printed result is

# imgs
torch.Size([10, 3, 32, 32])
# labels
torch.Size([10])

It can be seen that imgs is already a standard image tensor of shape [B, C, H, W], corresponding to the batch size, number of channels, pixel height, and pixel width. labels holds 10 entries, one label for each image in the batch. After this processing, the data can easily be fed into the model for training; the unified data format also makes it easier to compute the loss function and update the model with the optimizer.
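
The shape also gives a quick estimate of memory use. The sketch below assumes 32-bit floats (4 bytes per value), which is the default dtype produced by ToTensor; `tensor_bytes` is a hypothetical helper, and the figures cover only the raw image data, not activations or gradients.

```python
import math

BYTES_PER_FLOAT32 = 4


def tensor_bytes(shape, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage needed for a dense tensor of the given shape."""
    return math.prod(shape) * bytes_per_value


one_batch = tensor_bytes((10, 3, 32, 32))
full_train_set = tensor_bytes((50000, 3, 32, 32))
print(one_batch)       # 122880
print(full_train_set)  # 614400000
```

A batch of 10 images needs about 120 KB, while the whole training set as a single tensor would need roughly 614 MB before the model stores any intermediate results.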

Practical comparison

To let you feel the effect of batch training more intuitively, this post comes with code that trains and evaluates a model on the CIFAR10 data set. You can switch between the two modes by adjusting the batch size: to train without batches, set batch_size to 1; to use batch training, set batch_size to the desired batch size.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import transforms
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10
import time

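# Set the batch sizes
train_batch_size = 1
test_batch_size = 1

# Set the download/storage path of the dataset
data_path = r"put the CIFAR10 path here"


# Guard the entry point to avoid errors when DataLoader starts worker processes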
if __name__=='__main__':

    my_trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
    # my_trans = transforms.Compose([transforms.ToTensor()])
    train_dataset = CIFAR10(data_path, train=True, transform=my_trans, download=False)
    test_dataset = CIFAR10(data_path, train=False, transform=my_trans, download=False)

    train_boarder = DataLoader(train_dataset, batch_size=train_batch_size, num_workers=5, shuffle=True)
    test_boarder = DataLoader(test_dataset, batch_size=test_batch_size, num_workers=5)

    
    # Build the network model
    class CNNnet(nn.Module):
        def __init__(self):
            super(CNNnet,self).__init__()
            self.layer1 = nn.Sequential(nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5, stride=2), nn.MaxPool2d(2, 1))
            self.layer2 = nn.Sequential(nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1), nn.MaxPool2d(2, 1))
            self.fc1 = nn.Linear(3200, 128)
            self.out = nn.Linear(128, 10)

        def forward(self,x):
            x = F.relu(self.layer1(x))
            x = F.relu(self.layer2(x))
            # print(x.shape)  # [batch_size, 32, 10, 10]
            x = x.view(x.size(0), -1)
            x = F.relu(self.fc1(x))
            # Return raw logits: nn.CrossEntropyLoss applies log-softmax internally
            x = self.out(x)
            return x

    # Move the model to the GPU if one is available
    device = torch.device('cuda:0' if torch.cuda.is_available() else "cpu")
    print(device)
    model = CNNnet()
    model.to(device)

    # Define the loss function and the optimizer
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    loss_funcation = nn.CrossEntropyLoss()

    # Train the model
    epoch_num = 10
    time1 = time.perf_counter()
    for epoch in range(epoch_num):
        train_loss = 0
        train_acc = 0
        model.train()
        for imgs, labels in train_boarder:
            imgs = imgs.to(device)
            labels = labels.to(device)

            out = model(imgs)
            loss = loss_funcation(out, labels)
            loss.backward()
            optimizer.step()
            
            optimizer.zero_grad()

            train_loss += loss.item()

            _,pred = out.max(1)
            correct_num = (pred == labels).sum().item()
            train_acc += correct_num/imgs.size(0)

        train_acc_all = train_acc/len(train_boarder)
        train_loss_all = train_loss/len(train_boarder)

        model.eval()
        test_loss = 0
        test_acc = 0
        # Gradients are not needed during evaluation
        with torch.no_grad():
            for imgs, labels in test_boarder:
                imgs = imgs.to(device)
                labels = labels.to(device)

                out = model(imgs)
                loss = loss_funcation(out, labels)
                test_loss += loss.item()

                _,pred = out.max(1)
                correct_num = (pred == labels).sum().item()
                test_acc += correct_num/imgs.size(0)

        test_acc_all = test_acc/len(test_boarder)
        test_loss_all = test_loss/len(test_boarder)

        print('epoch:{}, Train Loss:{:.4f}, Train Acc:{:.4f}, Test Loss:{:.4f}, Test Acc:{:.4f}'.format(
            epoch, train_loss_all, train_acc_all, test_loss_all, test_acc_all))
    time2 = time.perf_counter()

    time_using = time2-time1
    print("Elapsed time (s):", time_using)

The code uses a relatively simple model, and the number of training epochs is set to only 10. The main purpose is to compare the effect of batch training. The results without batch training (batch size set to 1) were printed first.

Training results with the batch size set to 10

It is clear that without batches, the accuracy of the model improves little and training takes a long time. With batch training, the accuracy improves markedly and the time consumed drops significantly.
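
One plausible contributor to the time difference is simply the number of optimizer steps per epoch. The sketch below only does that arithmetic for the 50,000 CIFAR-10 training images; `steps_per_epoch` is a hypothetical helper, and it says nothing about how long a single step takes on a given machine.

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of parameter updates needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


for bs in (1, 10, 50000):
    print(bs, steps_per_epoch(50000, bs))
# 1 50000
# 10 5000
# 50000 1
```

With batch_size set to 1, every epoch runs 50,000 forward and backward passes, ten times as many as with batch_size set to 10.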

Everyone is welcome to discuss and exchange~



Origin blog.csdn.net/weixin_57506268/article/details/135095909