Convolutional Neural Network (ResNet-18) Recognizing the Fashion-MNIST Dataset (PyTorch Version)

1 Introduction

1.1 Case introduction

This case uses PyTorch to build a ResNet network for image classification on the Fashion-MNIST dataset. The task breaks down into four steps: preparing the data, building the model, training the model on the training set, and evaluating the model on the test set.

1.2 Environment Configuration

⑴Operating system: Windows 10
⑵IDE: PyCharm Community Edition 2021.2
⑶Environment: PyTorch 1.7.1 + torchvision 0.8.2 + CUDA 11.3

1.3 Importing modules

In this case, the following libraries and modules need to be imported:

import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns
import copy
import time
import torch
import torch.nn as nn
from torch.optim import Adam
import torch.utils.data as Data
from torchvision import transforms
from torchvision.datasets import FashionMNIST

2. Image data preparation

Before building and training the model, prepare the FashionMNIST dataset. It can be read directly with the FashionMNIST class from the datasets module of the torchvision library. If the data is not yet in the specified folder, it is downloaded automatically from the Internet when download=True is set.

2.1 Preparing the training and validation set

The code that loads the training/validation data is wrapped in the train_data_process() function below. It imports the training set and wraps it in a data loader with Data.DataLoader(), where each batch contains 64 samples. The number of batches in the loader can be obtained with len(); the output shows that train_loader contains 938 batches. Note that shuffle=False keeps the samples of each batch fixed across epochs, so that during training the batches can be split by index into a training part and a validation part. In addition, to see what the images look like, the function takes one batch and visualizes it.
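The figure of 938 batches can be checked by hand: DataLoader keeps the final, partial batch unless drop_last=True is passed, so the count is a ceiling division. A minimal pure-Python sketch (the helper num_batches is a hypothetical name, not a PyTorch function):

```python
import math

def num_batches(num_samples, batch_size, drop_last=False):
    """Batches a DataLoader yields for a map-style dataset with the default sampler."""
    if drop_last:
        return num_samples // batch_size          # the partial last batch is discarded
    return math.ceil(num_samples / batch_size)    # the partial last batch is kept

n_train, batch = 60000, 64
print(num_batches(n_train, batch))                          # 938
print(n_train - (num_batches(n_train, batch) - 1) * batch)  # 32 samples in the last batch
print(num_batches(n_train, batch, drop_last=True))          # 937
```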

# Process the training set
def train_data_process():
    # Load the FashionMNIST training data
    train_data = FashionMNIST(root="./data/FashionMNIST",  # data path
                              train=True,  # use the training split
                              # Resize to 224x224, then ToTensor() converts the PIL.Image to a torch.FloatTensor
                              # of shape Channel x Height x Width with values scaled to [0.0, 1.0]
                              transform=transforms.Compose([transforms.Resize(size=224), transforms.ToTensor()]),
                              download=True,  # downloads only if the data is not already present
                              )
    train_loader = Data.DataLoader(dataset=train_data,  # dataset to load
                                   batch_size=64,  # number of samples per batch
                                   shuffle=False,  # keep the original order so the train/val split stays fixed
                                   num_workers=0,  # number of worker processes for loading (0 = main process)
                                   )
    print("The number of batch in train_loader:", len(train_loader))  # 938 batches of (up to) 64 samples

    # Get the first batch
    b_x, b_y = next(iter(train_loader))
    batch_x = b_x.squeeze().numpy()  # drop the size-1 channel dimension and convert to a NumPy array
    batch_y = b_y.numpy()  # convert the labels to a NumPy array
    class_label = train_data.classes  # class names of the training set
    class_label[0] = "T-shirt"  # shorten "T-shirt/top" for the plot titles
    print("the size of batch in train data:", batch_x.shape)

    # Visualize one batch of images
    plt.figure(figsize=(12, 5))
    for ii in np.arange(len(batch_y)):
        plt.subplot(4, 16, ii+1)
        plt.imshow(batch_x[ii, :, :], cmap=plt.cm.gray)
        plt.title(class_label[batch_y[ii]], size=9)
        plt.axis("off")
        plt.subplots_adjust(wspace=0.05)
    plt.show()

    return train_loader, class_label

The resulting visualization is as follows:
[Figure: one batch of 64 Fashion-MNIST training images with their class labels]

Note: The standard ResNet input size is 224×224, so the 28×28 Fashion-MNIST images are enlarged to 224×224 here. With a batch size of 64 and a single grayscale channel, each mini-batch tensor has the shape 64×1×224×224.
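A quick sanity check of the mini-batch shape and its input memory, assuming ToTensor() produces float32 values (4 bytes each) and one grayscale channel:

```python
batch_size, channels = 64, 1       # 64 grayscale images per mini-batch
orig_side, resized_side = 28, 224  # Fashion-MNIST size vs. ResNet input size
bytes_per_float32 = 4              # ToTensor() returns float32 tensors

batch_shape = (batch_size, channels, resized_side, resized_side)  # (N, C, H, W)
elements = batch_size * channels * resized_side * resized_side
print(batch_shape)                           # (64, 1, 224, 224)
print(elements * bytes_per_float32 / 2**20)  # 12.25 (MiB per input batch)
print((resized_side // orig_side) ** 2)      # 64 (times more pixels per image)
```

Only the input tensor is counted here; the intermediate feature maps of the network need considerably more memory.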

2.2 Preparing the test set

The code that loads the test set is wrapped in the test_data_process() function below. It imports the test set, resizes the images to 224×224 in the same way as the training set, and wraps it in a data loader with a batch size of 1, so the test samples are evaluated one at a time.

# Process the test set
def test_data_process():
    test_data = FashionMNIST(root="./data/FashionMNIST",  # data path
                             train=False,  # use the test split
                             # Same preprocessing as the training set: resize to 224x224, then convert to a
                             # torch.FloatTensor of shape Channel x Height x Width with values in [0.0, 1.0]
                             transform=transforms.Compose([transforms.Resize(size=224), transforms.ToTensor()]),
                             download=True,  # skipped automatically if the data was already downloaded
                             )
    test_loader = Data.DataLoader(dataset=test_data,  # dataset to load
                                  batch_size=1,  # number of samples per batch
                                  shuffle=False,  # order does not matter for evaluation
                                  num_workers=0,  # number of worker processes for loading (0 = main process)
                                  )

    # Get one batch
    b_x, b_y = next(iter(test_loader))
    batch_x = b_x.squeeze().numpy()  # drop the size-1 dimensions and convert to a NumPy array
    batch_y = b_y.numpy()  # convert the labels to a NumPy array
    print("The size of batch in test data:", batch_x.shape)

    return test_loader
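Because the test images are also enlarged to 224×224, feeding the whole test set to the network as one single batch would be costly. A rough estimate of the input tensor alone (before any intermediate activations), assuming float32 values and the 10,000-image Fashion-MNIST test split; input_bytes is a hypothetical helper:

```python
def input_bytes(num_images, side, channels=1, bytes_per_value=4):
    """Bytes needed to hold a float32 image batch of shape (N, C, side, side)."""
    return num_images * channels * side * side * bytes_per_value

n_test = 10000
whole_set = input_bytes(n_test, 224)
print(round(whole_set / 2**30, 2))  # 1.87 (GiB for the input tensor alone)
print(input_bytes(1, 224))          # 200704 (bytes per image with batch_size=1)
```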

3. Building the Convolutional Neural Network

3.1 Creating the residual block

Suppose the input is $x$ and the ideal mapping we want to learn is $f(x)$. As shown in the figure below, the part inside the dashed box on the left has to fit the mapping $f(x)$ directly, whereas the part inside the dashed box on the right only has to fit the residual mapping $f(x) - x$ with respect to the identity mapping. In practice, residual mappings are often easier to optimize. Taking the identity mapping as the ideal mapping $f(x)$, we only need to set the weight and bias parameters of the upper weighted operation (such as an affine transformation) in the right-hand dashed box to 0, and $f(x)$ becomes the identity mapping. In fact, when the ideal mapping $f(x)$ is very close to the identity mapping, the residual mapping also makes it easy to capture small fluctuations around the identity. The right-hand side shows the basic building block of ResNet, the residual block. In a residual block, the input can propagate forward faster through the cross-layer data path.
[Figure: a regular block (left) and a residual block (right)]
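The identity argument can be illustrated with a toy NumPy example. This is only a sketch under simplifying assumptions: the weighted branch is a single affine map $g(x) = Wx + b$ (no convolution, batch normalization, or ReLU), and the block output is $g(x) + x$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)  # toy input vector

def weighted_branch(x, W, b):
    """Stand-in for the layers in the dashed box: g(x) = W @ x + b."""
    return W @ x + b

def residual_block(x, W, b):
    """Block output g(x) + x, so the branch only has to learn f(x) - x."""
    return weighted_branch(x, W, b) + x

# With all weights and biases set to zero, the block is exactly the identity map.
W0, b0 = np.zeros((8, 8)), np.zeros(8)
print(np.allclose(residual_block(x, W0, b0), x))  # True

# Small weights give a mapping that stays close to the identity.
W_small = 0.01 * rng.standard_normal((8, 8))
print(np.max(np.abs(residual_block(x, W_small, b0) - x)))
```

A plain (non-residual) branch would instead need $W = I$ to represent the identity, which is a less natural target for optimization starting from small weights.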

ResNet follows VGG's design of using 3×3 convolutional layers throughout. A residual block first has two 3×3 convolutional layers with the same number of output channels. The first convolution is followed by a batch normalization layer and a ReLU activation; the second is followed by a batch normalization layer only. The input then skips these two convolutions and is added directly to the result before the final ReLU activation. This design requires the output of the two convolutional layers to have the same shape as the input so that the two can be added. To change the number of channels (or the height and width), an additional 1×1 convolutional layer is introduced to transform the input into the required shape before the addition.
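The shape requirement can be checked with the output-size formula of Conv2d (dilation 1), $\lfloor (H + 2p - k)/s \rfloor + 1$. A small sketch, using a 56×56 feature map as an example input (conv_out is a hypothetical helper):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output height/width of Conv2d or MaxPool2d with dilation 1:
    floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# 3x3 conv, padding=1, stride=1: size preserved, so y + x can be added directly.
print(conv_out(56, kernel=3, stride=1, padding=1))  # 56
# 3x3 conv, padding=1, stride=2: height and width are halved ...
print(conv_out(56, kernel=3, stride=2, padding=1))  # 28
# ... so the shortcut needs a 1x1 conv with the same stride to match.
print(conv_out(56, kernel=1, stride=2, padding=0))  # 28
```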

class Residual(nn.Module):
    def __init__(self, in_channels, out_channels, use_1x1conv=False, stride=1):  # input channels, output channels, enable 1x1 conv, stride
        super(Residual, self).__init__()

        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, stride=stride)  # first 3x3 convolution
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)  # second 3x3 convolution

        # Optional 1x1 convolution on the shortcut to match channels and height/width
        if use_1x1conv:
            self.conv3 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride)
        else:
            self.conv3 = None

        # Batch normalization after each 3x3 convolution
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)

    # Forward pass
    def forward(self, x):
        y = nn.functional.relu(self.bn1(self.conv1(x)))  # conv -> BN -> ReLU
        y = self.bn2(self.conv2(y))  # conv -> BN (no ReLU before the addition)
        if self.conv3 is not None:
            x = self.conv3(x)  # reshape the shortcut so it can be added to y

        return nn.functional.relu(y + x)  # add the shortcut, then apply the final ReLU

3.2 Creating the ResNet modules

ResNet uses 4 modules made up of residual blocks, and each module uses several residual blocks with the same number of output channels. The number of channels of the first module equals the number of input channels. Since a max-pooling layer with a stride of 2 has already been applied before it, the first module does not need to reduce the height and width. In the first residual block of each subsequent module, the number of channels is doubled relative to the previous module and the height and width are halved.

def resnet_block(in_channels, out_channels, num_residuals, first_block=False):
    if first_block:
        assert in_channels == out_channels  # the first module keeps the number of input channels
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            blk.append(Residual(in_channels, out_channels, use_1x1conv=True, stride=2))
        else:
            blk.append(Residual(out_channels, out_channels))

    return nn.Sequential(*blk)
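To see which residual blocks receive the 1×1 shortcut convolution, the branching logic of resnet_block() above can be mirrored in plain Python without building any layers. resnet_block_config is a hypothetical helper that returns the constructor arguments instead of Residual modules:

```python
def resnet_block_config(in_channels, out_channels, num_residuals, first_block=False):
    """Mirror of resnet_block(): one (in_channels, out_channels, use_1x1conv, stride)
    tuple per residual block instead of an nn.Sequential of Residual modules."""
    if first_block:
        assert in_channels == out_channels
    cfg = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            cfg.append((in_channels, out_channels, True, 2))    # change channels, halve H/W
        else:
            cfg.append((out_channels, out_channels, False, 1))  # shape-preserving block
    return cfg

for args in [(64, 64, 2, True), (64, 128, 2), (128, 256, 2), (256, 512, 2)]:
    print(resnet_block_config(*args))
```

Only the first block of modules 2 to 4 changes shape, which is why exactly three 1×1 convolutions appear in the network.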

3.3 Creating the ResNet network

The first two layers of ResNet are the same as in GoogLeNet: a 7×7 convolutional layer with 64 output channels and a stride of 2, followed by a 3×3 max-pooling layer with a stride of 2. The difference is that ResNet adds a batch normalization layer after each convolutional layer. Next come the modules of residual blocks; here each module uses two residual blocks. Finally, a global average pooling layer is added, followed by a fully connected output layer.
[Figures: ResNet-18 model structure]
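Tracing a 1×224×224 input through these layers shows how the spatial size shrinks to 7×7 before global average pooling. A sketch using the Conv2d/MaxPool2d output-size formula with the kernel sizes, strides, and paddings of the ResNet() code below:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """floor((size + 2*padding - kernel) / stride) + 1 for Conv2d/MaxPool2d (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

shapes = []
c, h = 1, 224                                          # one grayscale 224x224 image
c, h = 64, conv_out(h, kernel=7, stride=2, padding=3)  # 7x7 conv, 64 channels, stride 2
shapes.append(("stem conv", c, h))
h = conv_out(h, kernel=3, stride=2, padding=1)         # 3x3 max pool, stride 2
shapes.append(("max pool", c, h))
for name, out_c, stride in [("block1", 64, 1), ("block2", 128, 2),
                            ("block3", 256, 2), ("block4", 512, 2)]:
    # the first 3x3 conv (padding 1) of each module determines its output size
    c, h = out_c, conv_out(h, kernel=3, stride=stride, padding=1)
    shapes.append((name, c, h))

for name, ch, size in shapes:
    print(f"{name:10s} {ch:4d} x {size:3d} x {size:3d}")
print("global avg pool ->", c, "features -> fc -> 10 classes")
```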

Each module contains 4 convolutional layers (not counting the 1×1 convolutional layers). Together with the initial convolutional layer and the final fully connected layer, this gives 18 layers in total, which is why this model is usually called ResNet-18. Different ResNet models, such as the deeper 152-layer ResNet-152, can be obtained by configuring different numbers of channels and residual blocks in the modules. Although the overall architecture of ResNet resembles that of GoogLeNet, ResNet's structure is simpler and easier to modify.
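The layer count can be reproduced with a little arithmetic. The block counts below are the standard configurations from the original ResNet paper; the variants with 50 or more layers use a three-convolution "bottleneck" residual block, which this article does not implement. As above, the 1×1 shortcut convolutions are not counted:

```python
# (residual blocks per module, conv layers per residual block)
CONFIGS = {
    "ResNet-18":  ([2, 2, 2, 2], 2),   # basic block, as built in this article
    "ResNet-34":  ([3, 4, 6, 3], 2),   # basic block
    "ResNet-50":  ([3, 4, 6, 3], 3),   # bottleneck block
    "ResNet-101": ([3, 4, 23, 3], 3),  # bottleneck block
    "ResNet-152": ([3, 8, 36, 3], 3),  # bottleneck block
}

def depth(blocks_per_module, convs_per_block):
    # +1 for the initial 7x7 convolution, +1 for the final fully connected layer
    return sum(blocks_per_module) * convs_per_block + 2

for name, (blocks, convs) in CONFIGS.items():
    print(name, depth(blocks, convs))
```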

# Global average pooling layer
class GlobalAvgPool2d(nn.Module):
    def __init__(self):
        super(GlobalAvgPool2d, self).__init__()

    def forward(self, x):
        return nn.functional.avg_pool2d(x, kernel_size=x.size()[2:])  # pooling window = full height x width of the input

# Define the ResNet-18 network structure
def ResNet():
    net = nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

    net.add_module("resnet_block1", resnet_block(64, 64, 2, first_block=True))
    net.add_module("resnet_block2", resnet_block(64, 128, 2))
    net.add_module("resnet_block3", resnet_block(128, 256, 2))
    net.add_module("resnet_block4", resnet_block(256, 512, 2))
    net.add_module("global_avg_pool", GlobalAvgPool2d())  # output of GlobalAvgPool2d: (Batch, 512, 1, 1)
    net.add_module("fc", nn.Sequential(nn.Flatten(), nn.Linear(512, 10)))

    return net
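GlobalAvgPool2d simply averages each channel over its whole height × width. The NumPy sketch below shows the same reduction on a deterministic example tensor shaped like the output of the last module (N=2, C=512, H=W=7). In PyTorch, nn.AdaptiveAvgPool2d(1) is a built-in layer with the same effect.

```python
import numpy as np

def global_avg_pool2d(x):
    """Average every channel over its full H x W map: (N, C, H, W) -> (N, C, 1, 1)."""
    return x.mean(axis=(2, 3), keepdims=True)

# Example feature map shaped like the output of resnet_block4 for 2 images
feat = np.arange(2 * 512 * 7 * 7, dtype=np.float32).reshape(2, 512, 7, 7)
pooled = global_avg_pool2d(feat)
print(pooled.shape)                         # (2, 512, 1, 1)
flat = pooled.reshape(pooled.shape[0], -1)  # what nn.Flatten() does before nn.Linear(512, 10)
print(flat.shape)                           # (2, 512)
print(pooled[0, 0, 0, 0])                   # 24.0, the mean of 0..48
```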

4. Training and Testing the Convolutional Neural Network

To train the ResNet, a train_model() function is defined that trains the network on the training set. The training set contains 60,000 images divided into 938 batches; the first 80% of the batches are used for training and the remaining 20% for validation, so train_model() covers both the training and the validation process.
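The split used by train_model() with train_rate=0.8 works out as follows. Since all training batches are full, the sample counts are exact. A plain-Python check:

```python
import math

num_samples, batch_size, train_rate = 60000, 64, 0.8
batch_num = math.ceil(num_samples / batch_size)  # 938, the value of len(train_loader)
train_batch_num = round(batch_num * train_rate)  # round(750.4) -> 750

train_samples = train_batch_num * batch_size     # every training batch holds 64 samples
val_samples = num_samples - train_samples        # remaining 188 batches, the last one partial
print(batch_num, train_batch_num, batch_num - train_batch_num)  # 938 750 188
print(train_samples, val_samples)                               # 48000 12000
```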

# Define the training procedure of the network
def train_model(model, traindataloader, train_rate, criterion, device, optimizer, num_epochs=25):
    '''
    :param model: network model
    :param traindataloader: training data loader, split into training and validation batches
    :param train_rate: fraction of the batches used for training
    :param criterion: loss function
    :param device: device to run on
    :param optimizer: optimizer
    :param num_epochs: number of training epochs
    '''

    batch_num = len(traindataloader)  # number of batches
    train_batch_num = round(batch_num * train_rate)  # first 80% of the batches are used for training; round() rounds to the nearest integer
    best_model_wts = copy.deepcopy(model.state_dict())  # copy of the current model parameters
    # Initialize bookkeeping
    best_acc = 0.0  # best validation accuracy so far
    train_loss_all = []  # training loss per epoch
    train_acc_all = []  # training accuracy per epoch
    val_loss_all = []  # validation loss per epoch
    val_acc_all = []  # validation accuracy per epoch
    since = time.time()  # start time
    # Train the model for num_epochs epochs
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Per-epoch accumulators
        train_loss = 0.0  # summed training loss
        train_corrects = 0  # number of correct training predictions
        train_num = 0  # number of training samples
        val_loss = 0.0  # summed validation loss
        val_corrects = 0  # number of correct validation predictions
        val_num = 0  # number of validation samples
        # Train or evaluate on each mini-batch
        for step, (b_x, b_y) in enumerate(traindataloader):
            b_x = b_x.to(device)
            b_y = b_y.to(device)
            if step < train_batch_num:  # first 80% of the batches: training
                model.train()  # training mode: BatchNorm uses batch statistics, Dropout is active
                output = model(b_x)  # forward pass: one row of logits per sample in the batch
                pre_lab = torch.argmax(output, 1)  # index of the largest logit in each row = predicted class
                loss = criterion(output, b_y)  # mean loss over the batch
                optimizer.zero_grad()  # reset the gradients to 0
                loss.backward()  # backpropagation
                optimizer.step()  # update the parameters with the gradients to reduce the loss
                train_loss += loss.item() * b_x.size(0)  # accumulate the summed loss
                train_corrects += torch.sum(pre_lab == b_y.data)  # count correct predictions
                train_num += b_x.size(0)  # number of training samples so far
            else:  # remaining 20% of the batches: validation
                model.eval()  # evaluation mode: BatchNorm uses running statistics, Dropout is disabled
                with torch.no_grad():  # no gradients are needed for validation
                    output = model(b_x)  # forward pass
                    pre_lab = torch.argmax(output, 1)  # predicted class of each sample
                    loss = criterion(output, b_y)  # mean loss over the batch
                val_loss += loss.item() * b_x.size(0)  # accumulate the summed validation loss
                val_corrects += torch.sum(pre_lab == b_y.data)  # count correct predictions
                val_num += b_x.size(0)  # number of validation samples so far

        # Compute and store the loss and accuracy of this epoch
        train_loss_all.append(train_loss / train_num)  # average training loss
        train_acc_all.append(train_corrects.double().item() / train_num)  # training accuracy
        val_loss_all.append(val_loss / val_num)  # average validation loss
        val_acc_all.append(val_corrects.double().item() / val_num)  # validation accuracy
        print('{} Train Loss: {:.4f} Train Acc: {:.4f}'.format(epoch, train_loss_all[-1], train_acc_all[-1]))
        print('{} Val Loss: {:.4f} Val Acc: {:.4f}'.format(epoch, val_loss_all[-1], val_acc_all[-1]))

        # Keep the parameters with the best validation accuracy
        if val_acc_all[-1] > best_acc:
            best_acc = val_acc_all[-1]  # new best accuracy
            best_model_wts = copy.deepcopy(model.state_dict())  # parameters that achieved it
        time_use = time.time() - since  # elapsed time
        print("Train and val complete in {:.0f}m {:.0f}s".format(time_use // 60, time_use % 60))

    # Restore the best parameters
    model.load_state_dict(best_model_wts)  # load the parameters with the highest validation accuracy
    train_process = pd.DataFrame(data={"epoch": range(num_epochs),
                                       "train_loss_all": train_loss_all,
                                       "val_loss_all": val_loss_all,
                                       "train_acc_all": train_acc_all,
                                       "val_acc_all": val_acc_all}
                                 )  # per-epoch losses and accuracies as a DataFrame

    # Plot the training and validation loss and accuracy per epoch
    plt.figure(figsize=(12, 4))
    plt.subplot(1, 2, 1)
    plt.plot(train_process['epoch'], train_process.train_loss_all, "ro-", label="Train loss")
    plt.plot(train_process['epoch'], train_process.val_loss_all, "bs-", label="Val loss")
    plt.legend()
    plt.xlabel("epoch")
    plt.ylabel("Loss")
    plt.subplot(1, 2, 2)
    plt.plot(train_process['epoch'], train_process.train_acc_all, "ro-", label="Train acc")
    plt.plot(train_process['epoch'], train_process.val_acc_all, "bs-", label="Val acc")
    plt.xlabel("epoch")
    plt.ylabel("acc")
    plt.legend()
    plt.show()

    return model, train_process

Next, define a test_model() function that evaluates the best model on the test set to check its performance.

# Test the model
def test_model(model, testdataloader, device):
    '''
    :param model: network model
    :param testdataloader: test data loader
    :param device: device to run on
    '''

    # Initialize counters
    test_corrects = 0
    test_num = 0
    model.eval()  # evaluation mode: BatchNorm uses running statistics, Dropout is disabled
    # Forward pass only; skipping gradient computation saves memory and speeds things up
    with torch.no_grad():
        for test_data_x, test_data_y in testdataloader:
            test_data_x = test_data_x.to(device)
            test_data_y = test_data_y.to(device)
            output = model(test_data_x)  # forward pass: predictions for the test samples
            pre_lab = torch.argmax(output, 1)  # index of the largest logit in each row = predicted class
            test_corrects += torch.sum(pre_lab == test_data_y.data)  # count correct predictions
            test_num += test_data_x.size(0)  # number of test samples so far

    test_acc = test_corrects.double().item() / test_num  # classification accuracy on the test set
    print("test accuracy:", test_acc)
    return test_acc
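The accuracy computation in test_model() (argmax over the logits, then counting matches) can be reproduced in NumPy on a few made-up logits, which makes the logic easy to verify by hand. accuracy_from_logits is a hypothetical helper:

```python
import numpy as np

def accuracy_from_logits(logits, labels):
    """Fraction of rows whose argmax (predicted class) equals the true label."""
    pred = np.argmax(logits, axis=1)  # same role as torch.argmax(output, 1)
    return float(np.sum(pred == labels)) / len(labels)

# Made-up logits for 4 samples and 3 classes
logits = np.array([[2.0, 0.1, -1.0],   # predicted 0
                   [0.3, 0.2, 1.5],    # predicted 2
                   [0.0, 3.0, 0.5],    # predicted 1
                   [1.0, 0.9, 0.8]])   # predicted 0
labels = np.array([0, 2, 0, 1])
print(accuracy_from_logits(logits, labels))  # 0.5
```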

Finally, train and test the model. The optimizer is Adam with a learning rate of 0.001, and the loss function is the cross-entropy loss. The train_model() function is then called to train for 25 epochs, using the first 80% of the batches in train_loader for training and the remaining 20% for validation.
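nn.CrossEntropyLoss takes raw logits and, by default, averages $-\log \operatorname{softmax}(z)_y$ over the batch, which is why train_model() multiplies loss.item() by the batch size before accumulating. A per-sample sketch in plain Python (the helper name cross_entropy is hypothetical):

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample: -log(softmax(logits)[label]),
    computed with the log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

print(cross_entropy([2.0, 1.0, 0.0], 0))  # ≈ 0.4076 (correct class has the largest logit)
print(cross_entropy([2.0, 1.0, 0.0], 2))  # ≈ 2.4076 (correct class has the smallest logit)
```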

# Train and test the model
def train_model_process(myconvnet):
    device = 'cuda' if torch.cuda.is_available() else 'cpu'  # use the GPU if one is available
    myconvnet = myconvnet.to(device)  # move the model before creating the optimizer
    optimizer = torch.optim.Adam(myconvnet.parameters(), lr=0.001)  # Adam optimizer, learning rate 0.001
    criterion = nn.CrossEntropyLoss()  # cross-entropy loss
    train_loader, class_label = train_data_process()  # load the training set
    test_loader = test_data_process()  # load the test set

    myconvnet, train_process = train_model(myconvnet, train_loader, 0.8, criterion, device, optimizer, num_epochs=25)  # train the model
    test_model(myconvnet, test_loader, device)  # evaluate on the test set

The curves of the loss and classification accuracy during training are shown below. The loss decreases steadily on the training set but fluctuates on the validation set; likewise, the classification accuracy rises steadily on the training set but fluctuates slightly on the validation set.
[Figure: training and validation loss (left) and accuracy (right) per epoch]
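Because the validation curves fluctuate, train_model() keeps the parameters of the epoch with the highest validation accuracy rather than those of the last epoch. The selection rule can be sketched in isolation; the accuracy values below are made up purely for illustration and are not results from this model:

```python
def best_epoch(val_acc):
    """Index and value of the first epoch with the highest validation accuracy,
    using the same strict '>' comparison as train_model()."""
    best_idx, best_acc = None, 0.0
    for epoch, acc in enumerate(val_acc):
        if acc > best_acc:  # ties keep the earlier epoch
            best_idx, best_acc = epoch, acc
    return best_idx, best_acc

# Hypothetical validation accuracies that fluctuate after epoch 2
val_acc = [0.86, 0.89, 0.91, 0.90, 0.91, 0.88]
print(best_epoch(val_acc))  # (2, 0.91)
```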

To assess the model's generalization ability, the trained model is used to predict the test set, giving the prediction accuracy on the test set (shown in the figure below).
[Figure: test accuracy printed by test_model()]

Note: For complex neural networks and large datasets, computing on the CPU may be too slow, so the model and data are moved to the GPU to accelerate computation when one is available.

5. Running the program

The main entry point below creates the ResNet network and then trains and tests it.

if __name__ == '__main__':
    model = ResNet()
    train_model_process(model)

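Note: The data loaders above use num_workers=0, i.e. the data is loaded in the main process. If num_workers is set to a value greater than 0 so that multiple processes load the data, the entry code must be placed inside the if __name__ == '__main__': guard; otherwise an error is raised at runtime (notably on Windows, where worker processes are started by re-importing the main module).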

Source: blog.csdn.net/baoli8425/article/details/120071221