A Convolutional Neural Network (LeNet) for Fashion-MNIST Classification (PyTorch version)

1 Introduction

1.1 Case introduction

This case uses PyTorch to build a network similar to LeNet-5 for image classification on the Fashion-MNIST dataset. The workflow is divided into data preparation, model building, training on the training set, and evaluating the trained model on the test set.

1.2 Environment Configuration

⑴ Operating system: Windows 10
⑵ IDE: PyCharm Community Edition 2021.2
⑶ Environment: PyTorch 1.8 + torchvision 0.9 + CUDA 11.3

1.3 Module import

This case requires importing the following libraries and modules:

import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns
import copy
import time
import torch
import torch.nn as nn
from torch.optim import Adam
import torch.utils.data as Data
from torchvision import transforms
from torchvision.datasets import FashionMNIST

2. Image data preparation

Before building and training the model, first prepare the Fashion-MNIST dataset, which can be loaded directly with the FashionMNIST() class from the datasets module of the torchvision library. If the data is not present in the specified working folder, it can be downloaded automatically from the Internet.
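As a minimal sketch (assuming the same ./data/FashionMNIST path used in the functions below), a first run only needs download=True to fetch the data:

from torchvision import transforms
from torchvision.datasets import FashionMNIST

# First run: fetch Fashion-MNIST into ./data/FashionMNIST if it is not there yet
train_data = FashionMNIST(root="./data/FashionMNIST",
                          train=True,
                          transform=transforms.ToTensor(),
                          download=True)  # switch back to False once the files exist locally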

2.1 Preparing the training and validation set

The loading logic for the training and validation data is packaged into the train_data_process() function below. It imports the training dataset and wraps it in a data loader with Data.DataLoader(), where each batch contains 64 samples. The number of batches in the loader can be obtained with the len() function; the output shows that train_loader contains 938 batches (60,000 samples at 64 per batch, rounded up). Note that the parameter shuffle=False keeps the samples in each batch fixed, which makes it easy to split the batches into a training part and a validation part when training the model. To inspect the contents of the dataset, the function also fetches one batch of images and visualizes it.

# Process the training set data
def train_data_process():
    # Load the FashionMNIST dataset
    train_data = FashionMNIST(root="./data/FashionMNIST",  # data path
                              train=True,  # use the training split only
                              transform=transforms.ToTensor(),  # convert PIL.Image or numpy.array data to torch.FloatTensor,
                                                                # shaped Channel * Height * Width with values scaled to [0.0, 1.0]
                              download=False,  # set to True if the dataset has not been downloaded yet
                              )
    train_loader = Data.DataLoader(dataset=train_data,  # dataset to load from
                                   batch_size=64,  # number of samples per batch
                                   shuffle=False,  # do not reshuffle the dataset
                                   num_workers=2,  # number of worker processes used for loading
                                   )
    print("The number of batch in train_loader:", len(train_loader))  # 938 batches in total, 64 training samples per batch (the last batch has 32)

    # Fetch one batch of data
    for step, (b_x, b_y) in enumerate(train_loader):
        if step > 0:
            break
    batch_x = b_x.squeeze().numpy()  # remove the size-1 channel dimension of the 4-D tensor and convert it to a NumPy array
    batch_y = b_y.numpy()  # convert the label tensor to a NumPy array
    class_label = train_data.classes  # class labels of the training set
    class_label[0] = "T-shirt"  # shorten the first label from "T-shirt/top" to "T-shirt"

    # Visualize one batch of images
    plt.figure(figsize=(12, 5))
    for ii in np.arange(len(batch_y)):
        plt.subplot(4, 16, ii+1)
        plt.imshow(batch_x[ii, :, :], cmap=plt.cm.gray)
        plt.title(class_label[batch_y[ii]], size=9)
        plt.axis("off")
        plt.subplots_adjust(wspace=0.05)
    plt.show()

    return train_loader, class_label

The resulting visualization shows the 64 images of the batch in a 4×16 grid, each titled with its class label.

2.2 Preparing the test set

The loading logic for the test set is packaged into the test_data_process() function below. It imports the test dataset and processes all samples as a whole, treating the entire set as a single batch for testing.

# Process the test set data
def test_data_process():
    test_data = FashionMNIST(root="./data/FashionMNIST",  # data path
                             train=False,  # use the test split, not the training split
                             download=False,  # no need to download again if the data was downloaded above
                             )
    test_data_x = test_data.data.type(torch.FloatTensor) / 255.0  # scale the pixel values to [0.0, 1.0]
    test_data_x = torch.unsqueeze(test_data_x, dim=1)  # add a channel dimension to test_data_x
    test_data_y = test_data.targets  # labels of the test set
    print("test_data_x.shape:", test_data_x.shape)
    print("test_data_y.shape:", test_data_y.shape)

    return test_data_x, test_data_y

The printed output shows that the test set contains 10,000 single-channel 28×28 images.
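Assuming the standard Fashion-MNIST test split, the two print statements should produce the following shapes:

test_data_x.shape: torch.Size([10000, 1, 28, 28])
test_data_y.shape: torch.Size([10000])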

3. Building the Convolutional Neural Network

With the data prepared, a convolutional neural network can be built, trained on the training data, and evaluated for recognition accuracy on the test set.
The network has two convolutional layers with 16 and 32 3×3 kernels respectively, each followed by a ReLU activation and a 2×2 average-pooling layer. The two fully connected layers have 256 and 128 neurons respectively, and the final classification layer has 10 neurons, one per class.
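As a worked check of the feature-map sizes (the same numbers reappear in the code comments below): the input is 1×28×28; the first 3×3 convolution (stride 1, padding 1) keeps the spatial size, giving 16×28×28, and 2×2 average pooling halves it to 16×14×14; the second 3×3 convolution (stride 1, padding 0) shrinks it to 32×12×12, and pooling halves it again to 32×6×6. The flattened input to the first fully connected layer therefore has 32 × 6 × 6 = 1152 elements.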

The following code defines the class ConvNet, which inherits from nn.Module. The network structure, consisting of two convolutional layers and three fully connected layers, is assembled with nn.Sequential(), and the forward propagation of data through the network is defined in the forward() method.

# Define a convolutional neural network
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()  # initialize the attributes inherited from the parent class Module

        # First convolutional layer: 16 3*3 kernels, followed by average pooling
        self.conv1 = nn.Sequential(nn.Conv2d(in_channels=1,  # number of input channels
                                             out_channels=16,  # number of convolution kernels
                                             kernel_size=3,  # kernel size
                                             stride=1,  # stride
                                             padding=1,  # amount of padding
                                             ),  # size change after convolution: (1*28*28) -> (16*28*28)
                                   nn.ReLU(),  # ReLU activation function
                                   nn.AvgPool2d(kernel_size=2,  # pooling window size
                                                stride=2,  # stride
                                                ),  # size change after pooling: (16*28*28) -> (16*14*14)
                                   )
        # Second convolutional layer: 32 3*3 kernels, followed by average pooling
        self.conv2 = nn.Sequential(nn.Conv2d(in_channels=16,  # number of input channels
                                             out_channels=32,  # number of convolution kernels
                                             kernel_size=3,  # kernel size
                                             stride=1,  # stride
                                             padding=0,  # amount of padding
                                             ),  # size change after convolution: (16*14*14) -> (32*12*12)
                                   nn.ReLU(),  # ReLU activation function
                                   nn.AvgPool2d(kernel_size=2,  # pooling window size
                                                stride=2,  # stride
                                                ),  # size change after pooling: (32*12*12) -> (32*6*6)
                                   )
        # Fully connected layers
        self.classifier = nn.Sequential(nn.Linear(32*6*6, 256),  # fully connected layer: input 32*6*6=1152, output 256
                                        nn.ReLU(),
                                        nn.Linear(256, 128),  # fully connected layer: input 256, output 128
                                        nn.ReLU(),
                                        nn.Linear(128, 10)  # fully connected layer: input 128, output 10
                                        )

    # Define the forward propagation path of the network
    def forward(self, x):
        x = self.conv1(x)  # feed the input x to the first convolutional layer
        x = self.conv2(x)  # feed the output of the first convolutional layer to the second
        x = x.view(x.size(0), -1)  # flatten the feature maps of each sample into a vector, giving shape (batch_size, 32*6*6)
                                   # the 4-D tensor x has dimensions (batch_size, channels, height, width); x.size(0) is batch_size
        output = self.classifier(x)  # feed the flattened tensor x to the fully connected layers and classifier

        return output

Printing the model displays the defined network structure.
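As a quick sanity check (not part of the original script), the model can be instantiated, printed, and fed a dummy batch to confirm the flattening arithmetic:

net = ConvNet()
print(net)  # lists conv1, conv2 and classifier with their layer configurations

dummy = torch.rand(1, 1, 28, 28)  # one fake single-channel 28*28 image
print(net(dummy).shape)  # expected: torch.Size([1, 10])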

4. Convolutional Neural Network Training and Prediction

To train the ConvNet network, a train_model() function is defined. It uses the training dataset, which contains 60,000 images divided into 938 batches; 80% of the batches (round(938 × 0.8) = 750) are used for model training and the remaining 20% (188 batches) for model validation. The train_model() function therefore includes both a training and a validation phase.

# Define the training procedure for the network
def train_model(model, traindataloader, train_rate, criterion, optimizer, num_epochs=25):
    '''
    :param model: network model
    :param traindataloader: training dataset, which will be split into a training and a validation part
    :param train_rate: fraction of the batches used for training
    :param criterion: loss function
    :param optimizer: optimization method
    :param num_epochs: number of training epochs
    '''

    batch_num = len(traindataloader)  # number of batches
    train_batch_num = round(batch_num * train_rate)  # use train_rate (here 80%) of the batches for training; round() rounds to the nearest integer
    best_model_wts = copy.deepcopy(model.state_dict())  # copy the current model parameters
    # Initialize bookkeeping variables
    best_acc = 0.0  # best validation accuracy so far
    train_loss_all = []  # training-set loss per epoch
    train_acc_all = []  # training-set accuracy per epoch
    val_loss_all = []  # validation-set loss per epoch
    val_acc_all = []  # validation-set accuracy per epoch
    since = time.time()  # start time
    # Iteratively train the model
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Reset per-epoch statistics
        train_loss = 0.0  # accumulated training loss
        train_corrects = 0  # number of correct training predictions
        train_num = 0  # number of training samples seen
        val_loss = 0.0  # accumulated validation loss
        val_corrects = 0  # number of correct validation predictions
        val_num = 0  # number of validation samples seen
        # Train and evaluate on each mini-batch
        for step, (b_x, b_y) in enumerate(traindataloader):
            if step < train_batch_num:  # use 80% of the batches for training
                model.train()  # set the model to training mode, enabling Batch Normalization and Dropout
                output = model(b_x)  # forward pass: input is one batch, output is the predictions for that batch
                pre_lab = torch.argmax(output, 1)  # index of the maximum value in each row, i.e. the predicted class
                loss = criterion(output, b_y)  # loss for this batch
                optimizer.zero_grad()  # reset the gradients to zero
                loss.backward()  # backpropagation
                optimizer.step()  # update the network parameters from the backpropagated gradients to reduce the loss
                train_loss += loss.item() * b_x.size(0)  # accumulate the loss
                train_corrects += torch.sum(pre_lab == b_y.data)  # count the correct predictions in this batch
                train_num += b_x.size(0)  # number of samples used for training so far
            else:  # use the remaining 20% of the batches for validation
                model.eval()  # set the model to evaluation mode, disabling Batch Normalization and Dropout
                output = model(b_x)  # forward pass: input is one batch, output is the predictions for that batch
                pre_lab = torch.argmax(output, 1)  # index of the maximum value in each row, i.e. the predicted class
                loss = criterion(output, b_y)  # mean loss over the samples in this batch
                val_loss += loss.item() * b_x.size(0)  # accumulate the loss of each validation batch
                val_corrects += torch.sum(pre_lab == b_y.data)  # count the correct predictions in this batch
                val_num += b_x.size(0)  # number of samples used for validation so far

        # Compute and store the loss and accuracy of this epoch
        train_loss_all.append(train_loss / train_num)  # compute and store the training loss
        train_acc_all.append(train_corrects.double().item() / train_num)  # compute and store the training accuracy
        val_loss_all.append(val_loss / val_num)  # compute and store the validation loss
        val_acc_all.append(val_corrects.double().item() / val_num)  # compute and store the validation accuracy
        print('{} Train Loss: {:.4f} Train Acc: {:.4f}'.format(epoch, train_loss_all[-1], train_acc_all[-1]))
        print('{} Val Loss: {:.4f} Val Acc: {:.4f}'.format(epoch, val_loss_all[-1], val_acc_all[-1]))

        # Track the best validation accuracy
        if val_acc_all[-1] > best_acc:
            best_acc = val_acc_all[-1]  # store the new best accuracy
            best_model_wts = copy.deepcopy(model.state_dict())  # store the model parameters at the best accuracy
        time_use = time.time() - since  # elapsed time
        print("Train and val complete in {:.0f}m {:.0f}s".format(time_use // 60, time_use % 60))

    # Select the best parameters
    model.load_state_dict(best_model_wts)  # load the model parameters with the best validation accuracy
    train_process = pd.DataFrame(data={"epoch": range(num_epochs),
                                       "train_loss_all": train_loss_all,
                                       "val_loss_all": val_loss_all,
                                       "train_acc_all": train_acc_all,
                                       "val_acc_all": val_acc_all}
                                 )  # store the per-epoch losses and accuracies as a DataFrame

    # Plot the training and validation loss and accuracy after each epoch
    plt.figure(figsize=(12, 4))
    plt.subplot(1, 2, 1)
    plt.plot(train_process['epoch'], train_process.train_loss_all, "ro-", label="Train loss")
    plt.plot(train_process['epoch'], train_process.val_loss_all, "bs-", label="Val loss")
    plt.legend()
    plt.xlabel("epoch")
    plt.ylabel("Loss")
    plt.subplot(1, 2, 2)
    plt.plot(train_process['epoch'], train_process.train_acc_all, "ro-", label="Train acc")
    plt.plot(train_process['epoch'], train_process.val_acc_all, "bs-", label="Val acc")
    plt.xlabel("epoch")
    plt.ylabel("acc")
    plt.legend()
    plt.show()

    return model, train_process

Next, train and test the model. The optimizer is Adam with a learning rate of 0.0003, and the loss function is the cross-entropy loss. The train_model() function is then called to train on 80% of train_loader and validate on the remaining 20%, for 25 epochs in total.

# Train and test the model
def train_model_process(myconvnet):
    optimizer = torch.optim.Adam(myconvnet.parameters(), lr=0.0003)  # Adam optimizer with learning rate 0.0003
    criterion = nn.CrossEntropyLoss()  # cross-entropy loss
    train_loader, class_label = train_data_process()  # load the training set
    test_data_x, test_data_y = test_data_process()  # load the test set
    myconvnet, train_process = train_model(myconvnet, train_loader, 0.8, criterion, optimizer, num_epochs=25)  # train the model

    # Predict on the test set
    myconvnet.eval()  # set the model to evaluation mode, disabling Batch Normalization and Dropout
    output = myconvnet(test_data_x)  # forward pass: input is the whole test set, output is the prediction for each sample
    pre_lab = torch.argmax(output, 1)  # index of the maximum value in each row, i.e. the predicted class
    acc = accuracy_score(test_data_y, pre_lab)  # classification accuracy on the test set
    print("test_acc:", acc)

    # Compute and visualize the confusion matrix
    conf_mat = confusion_matrix(test_data_y, pre_lab)
    df_cm = pd.DataFrame(conf_mat, index=class_label, columns=class_label)
    heatmap = sns.heatmap(df_cm, annot=True, fmt="d", cmap="YlGnBu")
    heatmap.yaxis.set_ticklabels(heatmap.yaxis.get_ticklabels(), rotation=0, ha='right')
    heatmap.xaxis.set_ticklabels(heatmap.xaxis.get_ticklabels(), rotation=45, ha='right')
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.show()
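Since no gradients are needed at test time, the forward pass over the 10,000 test images in train_model_process() can optionally be wrapped in torch.no_grad() to save memory, a small refinement not in the original script:

    # inside train_model_process(), replacing the prediction lines above:
    myconvnet.eval()  # evaluation mode, as before
    with torch.no_grad():  # disable gradient tracking during inference
        output = myconvnet(test_data_x)
    pre_lab = torch.argmax(output, 1)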

During training, train_model() plots the loss and classification-accuracy curves per epoch. The loss decreases rapidly on the training set, while on the validation set it first decreases and then converges to a narrow interval, indicating that the model has stabilized. The classification accuracy keeps increasing on the training set and gradually converges to a narrow interval on the validation set.

To assess the generalization ability of the model, the test set is fed to the trained model for prediction, yielding the prediction accuracy on the test set.

The predictions on the test samples are summarized and visualized with a confusion matrix, which shows how the model performs on each class. T-shirt and Shirt are the classes most often confused with each other; the number of samples mispredicted between the two exceeds 100.

5. Run the program

Finally, the main block creates an instance of the ConvNet class and runs training and prediction on the convolutional neural network.

if __name__ == '__main__':
    convnet = ConvNet()
    train_model_process(convnet)

Note: the program above is configured to load the training data with multiple worker processes (num_workers=2). On Windows, code that starts worker processes must be launched from within the if __name__ == '__main__': guard, otherwise an error will be raised at runtime. (Alternatively, setting num_workers=0 loads the data in the main process and avoids the issue.)

Origin: blog.csdn.net/baoli8425/article/details/119740795