[youcans hands-on learning model] LeNet model MNIST handwritten digit recognition

Welcome to the "youcans hands-on model" series
The content and resources of this column are synchronized to GitHub/youcans



This article implements the LeNet5 network model in PyTorch and trains it on the MNIST dataset for handwritten digit recognition.

1. LeNet5 convolutional neural network model

Yann LeCun (who won the Turing Award in 2018) published the paper "Gradient-Based Learning Applied to Document Recognition" in 1998. The proposed LeNet5 model is the pioneering work of convolutional neural networks and the first milestone of deep learning.

Yann LeCun et al. proposed the LeNet convolutional neural network model in 1989 and trained it with the backpropagation algorithm to recognize handwritten postal codes. In 1990, the LeNet model was applied to the postal code recognition system of the US Postal Service, with an error rate of only 1% and a rejection rate of about 9%, becoming the earliest practical handwritten digit recognition system. After years of iterative improvement, it became the LeNet-5 network model of the 1998 paper, one of the earliest convolutional neural network models. Although the network looks very simple today and its accuracy is modest by current standards, its principles remain the basis of modern convolutional neural networks.


1.1 Introduction to the paper

Training a multi-layer neural network with the backpropagation (BP) algorithm is one of the best examples of a gradient-based learning technique. Given a suitable network structure and simple preprocessing, gradient-based learning algorithms can construct a complex decision surface that classifies high-dimensional patterns such as handwritten characters. The paper reviews various methods for handwritten character recognition and compares their performance on the task of handwritten digit recognition. Convolutional neural networks (CNNs), which are specifically designed to process 2D images, outperform the other methods.

A practical document recognition system consists of several modules, including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called Graph Transformer Networks (GTN), uses a gradient-based approach to train such multi-module systems globally so as to optimize an overall performance measure. The paper introduces two online handwriting recognition systems, and its experiments demonstrate the advantages of global training and the flexibility of graph transformer networks.

The paper also describes a graph transformer network for reading bank checks, which combines a convolutional neural network character recognizer with global training techniques to recognize business and personal checks accurately. The system has been deployed commercially and reads millions of checks per day.


1.2 Convolutional Neural Network

The main contribution of the paper is its systematic presentation of the convolutional neural network (CNN), which laid the groundwork for later deep learning research. We first review the paper's original introduction to convolutional networks.

Convolutional networks combine three architectural ideas to ensure a degree of invariance to shift, scale, and distortion: local receptive fields, shared weights (or weight replication), and spatial or temporal subsampling.

Figure 2 shows a typical convolutional network for character recognition, called LeNet-5. The input layer receives standardized character images, and the neurons of each layer only receive input from the smaller neighborhood of the previous layer (Note: This is the fundamental difference in network structure from the fully connected layer FC).

The idea of connecting units to local receptive fields on the input dates back to the perceptron in the early 1960s, at about the same time that Hubel and Wiesel (Nobel Prize 1981) discovered locally sensitive, orientation-selective neurons in the cat's visual system. In 1979, Kunihiko Fukushima (who received the Bower Award in 2021) proposed the Neocognitron visual learning neural model, using local connections, average pooling and ReLU nonlinear activation functions. In the mid-1980s, Rumelhart, Hinton (who won the Turing Award in 2018), and Williams popularized the backpropagation (BP) gradient learning algorithm.

Using local receptive fields, neurons can extract basic visual features such as oriented edges, endpoints, corners (or similar features in other signals, such as the speech spectrum). These features are combined by subsequent network layers in order to detect higher-order features.

As mentioned earlier, distortions or shifts of the input change the location of image features. In addition, elementary feature detectors that are useful on one part of an image are likely to be useful across the entire image. This knowledge can be applied by using the same weight vector across a set of units whose receptive fields are at different locations on the image. In other words, the neurons in one feature map share a single set of connection weights.

A convolutional layer consists of several feature maps (each with a different weight vector), so multiple features can be extracted at each location. For example, the first layer of LeNet-5 has 6 feature maps. Each unit in a feature map is connected to a 5×5 region of the previous layer, called its receptive field, so it has 25 inputs, with 25 weight parameters and 1 bias (threshold) parameter. All units in a feature map share one set of weight parameters, and the 6 feature maps have 6 different sets, so they can extract 6 different types of features.

This operation is equivalent to convolution, and the convolution kernel is the connection weight used by the feature map, so it is called a convolutional network.
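
To make the weight sharing concrete, here is a minimal sketch in plain Python that slides a single 5×5 kernel over a 32×32 input. The function name conv2d_valid and the constant test data are illustrative assumptions, not code from the paper. Strictly speaking, the loop computes a cross-correlation (the kernel is not flipped), which is also what deep learning libraries compute under the name convolution.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    out = []
    for i in range(h - k + 1):
        row = []
        for j in range(w - k + 1):
            # Every output unit reuses the same k*k weights and the same bias.
            s = bias
            for di in range(k):
                for dj in range(k):
                    s += image[i + di][j + dj] * kernel[di][dj]
            row.append(s)
        out.append(row)
    return out


image = [[1.0] * 32 for _ in range(32)]   # hypothetical 32x32 input
kernel = [[0.04] * 5 for _ in range(5)]   # hypothetical 5x5 averaging kernel
feature_map = conv2d_valid(image, kernel)
print(len(feature_map), len(feature_map[0]))  # 28 28
```

The 28×28 output produced by only 25 weights and 1 bias is exactly one feature map of the C1 layer described above.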


1.3 LeNet5 network

The original LeNet network is a 5-layer network with 2 convolutional layers, 2 pooling layers and 1 fully connected layer. The network structure is as follows:

  1. The input layer is a 28×28 single-channel image.

  2. C1 convolutional layer: 4 5×5 convolution kernels to obtain 4 24×24 feature maps.

  3. S1 pooling layer: 2×2 average pooling layer, the height and width of the image are halved, and four 12×12 feature maps are obtained.

  4. C2 convolutional layer: 12 5×5 convolution kernels to obtain 12 8×8 feature maps.

  5. S2 pooling layer: 2×2 average pooling layer, the height and width of the image are halved, and 12 4×4 feature maps are obtained.

  6. FC fully connected layer: fully connected hidden layer, using the sigmoid function.
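
The feature map sizes in the list above follow from the usual output-size formula for convolution and pooling. The sketch below recomputes them; the helper names conv_out and pool_out are assumptions made for this illustration.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output height/width of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output height/width of a pooling layer."""
    return (size - kernel) // stride + 1


size = 28            # input: 28x28 single-channel image
sizes = [size]
for layer in ("conv5", "pool2", "conv5", "pool2"):   # C1, S1, C2, S2
    size = conv_out(size, 5) if layer == "conv5" else pool_out(size)
    sizes.append(size)
print(sizes)  # [28, 24, 12, 8, 4]
```

The same two helpers reproduce the LeNet-5 sizes as well, for example conv_out(32, 5) gives 28 for the C1 layer.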

The LeNet-5 network is an improved version of the LeNet network, using a 7-layer network, including 3 convolutional layers, 2 pooling layers and 2 fully connected layers.

[Figure: LeNet-5 network architecture]

  1. The input layer is a 32×32 single-channel image.
  2. C1 convolutional layer: 6 5×5 convolution kernels, 156 trainable parameters, producing 6 28×28 feature maps.
  3. S2 pooling layer: 2×2 average pooling (subsampling) layer with one trainable coefficient and one bias per feature map; the height and width of the image are halved, 12 trainable parameters, producing 6 14×14 feature maps.
  4. C3 convolutional layer: 16 5×5 convolution kernels, 1516 trainable parameters, producing 16 10×10 feature maps.
  5. S4 pooling layer: 2×2 average pooling (subsampling) layer; the height and width of the image are halved, 32 trainable parameters, producing 16 5×5 feature maps.
  6. C5 convolutional layer: 120 5×5 convolution kernels, 48120 trainable parameters, producing 120 1×1 feature maps, equivalent to a full connection.
  7. F6 fully connected layer: a hidden layer of 84 neurons, 10164 trainable parameters, using a sigmoid-type squashing function (a scaled hyperbolic tangent in the paper).
  8. F7 output layer: an output layer of 10 RBF neurons.

Note that C3 is not fully connected to S2: each C3 feature map is connected to a subset of the S2 feature maps chosen by the authors, which reduces computation and encourages different maps to extract different features. In later work, as networks became more complex, this kind of hand-designed connection scheme was generally dropped.

[Figure: connection scheme between the S2 and C3 feature maps]
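
The parameter counts listed for LeNet-5 can be checked with a few lines of arithmetic. The sketch below assumes the connection scheme given in the 1998 paper for C3 (6 maps connected to 3 S2 maps, 9 maps connected to 4, and 1 map connected to all 6); the helper name conv_params is an assumption for this illustration.

```python
def conv_params(n_maps, inputs_per_map, kernel=5):
    """Each output map has one kernel per connected input map plus one bias."""
    return n_maps * (inputs_per_map * kernel * kernel + 1)


params = {
    "C1": conv_params(6, 1),
    "S2": 6 * 2,      # one coefficient and one bias per feature map
    # Partial connections, assuming the scheme from the 1998 paper:
    "C3": conv_params(6, 3) + conv_params(9, 4) + conv_params(1, 6),
    "S4": 16 * 2,
    "C5": conv_params(120, 16),
    "F6": 84 * (120 + 1),
}
print(params)
# {'C1': 156, 'S2': 12, 'C3': 1516, 'S4': 32, 'C5': 48120, 'F6': 10164}
```

With full connections, C3 would instead need conv_params(16, 6) = 2416 parameters, which is the count of the nn.Conv2d(6, 16, 5) layer used in the PyTorch code later in this article.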


1.4 Running results of the model

[Figure: running results of the model]


2. Define the LeNet5 model class in PyTorch

2.1 Use nn.Module to define the network model class

PyTorch provides a high-level API in the torch.nn module for building networks from scratch.

To construct a neural network model in PyTorch, define a model class that inherits from nn.Module, the base class for all neural network modules, and implement its __init__() and forward() methods.

__init__() is the class initialization function, similar to a C++ constructor, and is where the layers are created. nn.Module implements the __call__() method, which in turn calls forward(), so calling model(x) runs the forward computation.
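
The dispatch from __call__() to forward() can be illustrated without PyTorch. The class TinyModule below is a deliberately simplified, hypothetical stand-in for nn.Module; the real class also handles hooks and parameter registration, which are omitted here.

```python
class TinyModule:
    """Simplified stand-in for nn.Module (hypothetical, for illustration only)."""

    def __call__(self, *args, **kwargs):
        # Calling the object forwards the arguments to forward().
        return self.forward(*args, **kwargs)

    def forward(self, *args, **kwargs):
        raise NotImplementedError("subclasses must implement forward()")


class Doubler(TinyModule):
    def __init__(self, factor=2):
        self.factor = factor          # "layers" are created in __init__()

    def forward(self, x):
        return [self.factor * v for v in x]


model = Doubler()
print(model([1, 2, 3]))  # [2, 4, 6]
```

This is why the model classes in this article only define __init__() and forward(), yet are used as model(inputs).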

The routines of the LeNet model class are as follows:

import torch.nn as nn
import torch.nn.functional as F

# Define the LeNet5 model class, version 1
class LeNet5v1(nn.Module):
    def __init__(self):
        super(LeNet5v1, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5, padding='same')  # C1: 1 input channel, 6 output channels, 5x5 kernel
        self.pool1 = nn.AvgPool2d(2, 2)  # S2: 2x2 pooling window, stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)  # C3: 6 input channels, 16 output channels, 5x5 kernel
        self.pool2 = nn.AvgPool2d(2, 2)  # S4: 2x2 pooling window, stride 2
        self.flatten = nn.Flatten()  # flatten to one dimension
        self.linear1 = nn.Linear(400, 120)  # C5: 400 inputs, 120 outputs
        self.linear2 = nn.Linear(120, 84)  # F6: 120 inputs, 84 outputs
        self.linear3 = nn.Linear(84, 10)  # F7: 84 inputs, 10 outputs

    def forward(self, x):
        x = F.relu(self.conv1(x))  # (1,28,28) -> (6,28,28)
        x = self.pool1(x)  # (6,28,28) -> (6,14,14)
        x = F.relu(self.conv2(x))  # (6,14,14) -> (16,10,10)
        x = self.pool2(x)  # (16,10,10) -> (16,5,5)
        x = self.flatten(x)  # (16,5,5) -> (400)
        x = F.relu(self.linear1(x))  # (400) -> (120)
        x = F.relu(self.linear2(x))  # (120) -> (84)
        x = self.linear3(x)  # (84) -> (10)
        return x

Use print to output the structure of the LeNet5v1 model as follows:

LeNet5v1(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=same)
  (pool1): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (pool2): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear1): Linear(in_features=400, out_features=120, bias=True)
  (linear2): Linear(in_features=120, out_features=84, bias=True)
  (linear3): Linear(in_features=84, out_features=10, bias=True)
)

In the LeNet5v1 model class, the activation functions do not appear in the printed model structure, because they are applied inside the forward() function.


2.2 Use Sequential container to construct model class

nn.Sequential() is an ordered container. Modules are added to it in the order in which they are passed to the constructor, and they are executed in that same order. Building blocks of modules with Sequential makes the hierarchy of the network clearer and makes large, complex network models easier to construct.

In the simplest form, pass every layer of the LeNet5 model to the Sequential container in order as an argument list. The model is then defined directly in the __init__() method, and the forward() method reduces to a single call.

# Define the LeNet5 model class, version 2
class LeNet5v2(nn.Module):
    def __init__(self):
        super(LeNet5v2, self).__init__()
        self.model = nn.Sequential(  # ordered container
            nn.Conv2d(1, 6, 5, padding='same'),  # C1: 1 input channel, 6 output channels, 5x5 kernel
            nn.ReLU(),  # activation function
            nn.AvgPool2d(2, 2),  # S2: 2x2 pooling window, stride 2
            nn.Conv2d(6, 16, 5),  # C3: 6 input channels, 16 output channels, 5x5 kernel
            nn.ReLU(),  # activation function
            nn.AvgPool2d(2, 2),  # S4: 2x2 pooling window, stride 2
            nn.Flatten(),  # flatten to one dimension
            nn.Linear(400, 120),  # C5: 400 inputs, 120 outputs
            nn.ReLU(),  # activation function
            nn.Linear(120, 84),  # F6: 120 inputs, 84 outputs
            nn.ReLU(),  # activation function
            nn.Linear(84, 10)  # F7: 84 inputs, 10 outputs
        )

    def forward(self, x):
        x = self.model(x)
        return x

Use print to output the structure of the LeNet v2 model as follows:

LeNet5v2(
  (model): Sequential(
    (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=same)
    (1): ReLU()
    (2): AvgPool2d(kernel_size=2, stride=2, padding=0)
    (3): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
    (4): ReLU()
    (5): AvgPool2d(kernel_size=2, stride=2, padding=0)
    (6): Flatten(start_dim=1, end_dim=-1)
    (7): Linear(in_features=400, out_features=120, bias=True)
    (8): ReLU()
    (9): Linear(in_features=120, out_features=84, bias=True)
    (10): ReLU()
    (11): Linear(in_features=84, out_features=10, bias=True)
  )
)

In the LeNet5v2 model class, the activation function is directly reflected in the model structure, and the model structure is clearer and more complete.


2.3 Using Sequential to construct model classes hierarchically

Convolution, nonlinear activation, and pooling are usually combined and treated as one network layer. With Sequential, the network can be built layer by layer, an individual layer can be accessed by name or index, and the weights of the network can be inspected through parameters(), weight, and similar attributes.

Construct each network layer with a Sequential container, and define the LeNet5 model class as follows.

# Define the LeNet5 network structure, version 3
class LeNet5v3(nn.Module):
    def __init__(self):
        super(LeNet5v3, self).__init__()  # call the constructor of the parent class
        # convolution + pooling layers
        self.conv_pool1 = nn.Sequential(
            nn.Conv2d(1, 6, 5, padding=2),  # C1: 1 input channel, 6 output channels, 5x5 kernel, padding 2
            nn.ReLU(),  # ReLU activation function
            nn.AvgPool2d(2, stride=2)  # S2: 2x2 pooling window, stride 2
        )
        self.conv_pool2 = nn.Sequential(
            nn.Conv2d(6, 16, 5),  # C3: 6 input channels, 16 output channels, 5x5 kernel
            nn.ReLU(),  # ReLU activation function
            nn.AvgPool2d(2, stride=2)  # S4: 2x2 pooling window, stride 2
        )
        # fully connected layers
        self.fc1 = nn.Sequential(
            nn.Linear(16*5*5, 120),
            nn.ReLU()
        )
        self.fc2 = nn.Sequential(
            nn.Linear(120, 84),
            nn.ReLU()
        )
        # output layer
        self.out = nn.Sequential(
            nn.Linear(84, 10)
        )

    def forward(self, x):
        x = self.conv_pool1(x)  # (1,28,28) -> (6,14,14)
        x = self.conv_pool2(x)  # (6,14,14) -> (16,5,5)
        x = x.view(x.size(0), -1)  # (16,5,5) -> (400), flatten to one dimension
        x = self.fc1(x)  # (400) -> (120)
        x = self.fc2(x)  # (120) -> (84)
        x = self.out(x)  # (84) -> (10)
        return x

Use print to output the structure of the LeNet5v3 model as follows:

LeNet5v3(
  (conv_pool1): Sequential(
    (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (1): ReLU()
    (2): AvgPool2d(kernel_size=2, stride=2, padding=0)
  )
  (conv_pool2): Sequential(
    (0): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
    (1): ReLU()
    (2): AvgPool2d(kernel_size=2, stride=2, padding=0)
  )
  (fc1): Sequential(
    (0): Linear(in_features=400, out_features=120, bias=True)
    (1): ReLU()
  )
  (fc2): Sequential(
    (0): Linear(in_features=120, out_features=84, bias=True)
    (1): ReLU()
  )
  (out): Sequential(
    (0): Linear(in_features=84, out_features=10, bias=True)
  )
)
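
As a small illustration of accessing a layer inside a Sequential block and inspecting its weights, the sketch below rebuilds only the first convolution + pooling block as a standalone container. The variable names are assumptions for this example, and PyTorch must be installed to run it.

```python
import torch
import torch.nn as nn

# Standalone copy of the first block of LeNet5v3 (for illustration only)
conv_pool1 = nn.Sequential(
    nn.Conv2d(1, 6, 5, padding=2),
    nn.ReLU(),
    nn.AvgPool2d(2, stride=2),
)

first_conv = conv_pool1[0]                        # access a layer by index
weight_shape = tuple(first_conv.weight.shape)     # 6 kernels, 1 channel, 5x5
n_params = sum(p.numel() for p in conv_pool1.parameters())
out_shape = tuple(conv_pool1(torch.zeros(1, 1, 28, 28)).shape)
print(weight_shape, n_params, out_shape)  # (6, 1, 5, 5) 156 (1, 6, 14, 14)
```

In the full model the same layer would be reached as model.conv_pool1[0], where model is an instance of LeNet5v3.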

3. MNIST handwritten digit recognition based on LeNet5 model

3.1 The basic steps of building a neural network model in PyTorch

The basic steps to build, train and use a neural network model with PyTorch are as follows.

  1. Prepare the dataset: load the dataset and preprocess the data.
  2. Design the model: instantiate the model class, define the loss function and optimizer, and determine the model structure and training method.
  3. Model training: train the model on the training dataset to determine the model parameters.
  4. Model inference: use the trained model to predict the output for new input data.
  5. Model saving/loading: save the trained model for later use or deployment.

The following sections walk through these steps for the LeNet5 model.


3.2 Loading the MNIST dataset

Public benchmark datasets have balanced classes, carry useful information efficiently, and are organized in a standard, easy-to-handle way. Training a neural network on a benchmark dataset not only saves work but also makes it easier to evaluate and compare model performance.

PyTorch provides commonly used image datasets through torchvision.datasets. The torchvision package contains popular datasets, model architectures, and common image transformation methods for computer vision.

The MNIST dataset is a classic handwritten digit dataset containing the handwritten digits 0 to 9. Each image is a single-channel grayscale image of size 28×28. The training set contains 60000 images and the test set contains 10000 images.

The MNIST dataset can be downloaded from the official website: http://yann.lecun.com/exdb/mnist/ and then used, or it can be automatically loaded using the datasets class (if the local path does not have the file, it will be downloaded automatically).

When loading the dataset, use a predefined transform for data preprocessing: convert the images into tensors, and standardize the sample data using the mean and standard deviation of the MNIST dataset. Note that MNIST images are single-channel, so the mean and standard deviation are also single-channel.

A large training set cannot be fed to the model all at once, so the DataLoader class is used to load the data in batches. DataLoader is an iterable that wraps a Dataset object and yields batches of data according to the batch_size parameter.
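
The effect of the standardization and of batching can be worked out by hand. The sketch below uses the mean 0.1307 and standard deviation 0.3081 from this article's transform and assumes a pixel range of [0, 1] after conversion to a tensor; the helper name normalize is an assumption for this illustration.

```python
import math

mean, std = 0.1307, 0.3081   # MNIST statistics used in this article


def normalize(pixel):
    """Standardize one pixel value that lies in [0, 1]."""
    return (pixel - mean) / std


lowest = normalize(0.0)      # black background pixel
highest = normalize(1.0)     # brightest stroke pixel
n_batches = math.ceil(60000 / 64)   # training batches per epoch, batch_size=64
print(round(lowest, 4), round(highest, 4), n_batches)  # -0.4242 2.8215 938
```

So the standardized pixel values lie roughly in [-0.42, 2.82], not in [-1, 1], and one epoch over the 60000 training images takes 938 steps, the last of which holds only 32 images.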

    # (1) Convert a PIL image to a Tensor in [0,1], then standardize it
    transform = transforms.Compose([  # compose the image transforms
        transforms.ToTensor(),  # convert the image to a Tensor
        transforms.Normalize(mean=(0.1307,), std=(0.3081,))])  # standardize

    # (2) Load the MNIST dataset
    batch_size = 64
    # Load the MNIST dataset; if it is not found under root, it is downloaded automatically
    # Load the MNIST training set, 60000 training images
    train_set = torchvision.datasets.MNIST(root='../dataset', train=True,
                                          download=True, transform=transform)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size,
                                              shuffle=True, num_workers=2)

    # Load the MNIST test set, 10000 images (also used for validation here)
    test_set = torchvision.datasets.MNIST(root='../dataset', train=False,
                                          download=True, transform=transform)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=1000,
                                             shuffle=False, num_workers=2)

3.3 Establish LeNet5 network model

Establish a LeNet5 network model for training, including three steps:

  • Instantiate the LeNet5 model object;
  • Set the loss function for training;
  • Set the optimizer for training.

The torch.nn module provides various built-in loss functions; this example uses the cross-entropy loss nn.CrossEntropyLoss.

The torch.optim module provides various optimization methods; this example uses the SGD optimizer with momentum. Note that model.parameters() must be passed to the optimizer so that it knows which parameters to update.
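
To show what the cross-entropy loss measures, here is a minimal plain-Python sketch for a single sample: the negative log of the softmax probability assigned to the correct class. The function name cross_entropy and the example scores are assumptions for this illustration; nn.CrossEntropyLoss applies the same formula to raw model outputs and averages over the batch by default.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of one sample: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


uniform = cross_entropy([0.0] * 10, 3)            # no preference among 10 classes
confident = cross_entropy([0.0] * 9 + [10.0], 9)  # strong score for the true class
print(round(uniform, 4))  # 2.3026
print(confident < 0.001)  # True
```

An untrained 10-class model therefore starts near a loss of ln(10) ≈ 2.30 per sample, and the loss approaches 0 as the model becomes confident in the correct class.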

    # (3) Instantiate the LeNet-5 network model
    model = LeNet5()  # create the LeNet-5 model object
    print(model)

    criterion = nn.CrossEntropyLoss()  # cross-entropy loss function
    optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)  # SGD optimizer with momentum

Use print to output the structure of the LeNet model as follows:

LeNet5(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=same)
  (pool1): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (pool2): AvgPool2d(kernel_size=2, stride=2, padding=0)
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear1): Linear(in_features=400, out_features=120, bias=True)
  (linear2): Linear(in_features=120, out_features=84, bias=True)
  (linear3): Linear(in_features=84, out_features=10, bias=True)
)

3.4 LeNet5 model training

The basic steps of PyTorch model training are:

  1. Run the forward pass to compute the model output;
  2. Calculate the loss function value;
  3. Backpropagate to compute the gradients of the weights and biases;
  4. Update the model parameters according to the gradients;
  5. Reset the gradients to 0 (for the next loop).
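
The five steps can be seen in isolation on a toy problem. The sketch below fits a one-parameter linear model to the hypothetical target y = 2x + 1; the data, learning rate, and number of steps are assumptions chosen for this illustration and are unrelated to the MNIST training settings.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
x = torch.tensor([[1.0], [2.0], [3.0], [4.0]])
y = 2.0 * x + 1.0                      # hypothetical target: y = 2x + 1

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.05, momentum=0.9)

for step in range(500):
    outputs = model(x)                 # 1. forward pass
    loss = criterion(outputs, y)       # 2. loss value
    loss.backward()                    # 3. gradients of weight and bias
    optimizer.step()                   # 4. parameter update
    optimizer.zero_grad()              # 5. reset gradients for the next loop

w, b = model.weight.item(), model.bias.item()
print(round(w, 2), round(b, 2))        # close to 2.0 and 1.0
```

Calling optimizer.zero_grad() at the start of each iteration, as the MNIST routine in this article does, is equivalent to calling it at the end.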

During training, the validation set can be used to evaluate model accuracy and so monitor and control the training process. Validation runs model inference on the validation data: the model output is obtained by forward computation, but no gradients are backpropagated. The validation code is therefore wrapped in torch.no_grad().
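
The following minimal sketch shows the two ingredients of the validation step: torch.no_grad() switches off gradient tracking, and the accuracy is the share of samples whose highest-scoring class equals the label. The small linear model and the fixed score table are hypothetical stand-ins, not the LeNet5 model.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 3)          # hypothetical stand-in for a trained classifier
inputs = torch.randn(5, 4)

with torch.no_grad():            # forward pass only, no computation graph
    outputs = model(inputs)
tracked = model(inputs)          # the same call with gradient tracking
print(outputs.requires_grad, tracked.requires_grad)  # False True

# Accuracy from a fixed table of class scores (4 samples, 3 classes)
scores = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.1],
                       [0.0, 0.1, 3.0],
                       [0.9, 0.8, 0.7]])
labels = torch.tensor([1, 0, 2, 2])
pred_labels = torch.max(scores, dim=1)[1]      # index of the highest score
accuracy = torch.eq(pred_labels, labels).sum().item() / labels.size(0) * 100
print(accuracy)  # 75.0
```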

The routine for model training using PyTorch is as follows.

    # (4) Train the LeNet-5 network model
    epoch_list = []  # record the training epochs
    loss_list = []  # record the training loss
    accu_list = []  # record the validation accuracy
    num_epochs = 50  # number of training epochs
    for epoch in range(num_epochs):  # loop over the epochs
        running_loss = 0.0  # reset the accumulated loss for each epoch
        for step, data in enumerate(train_loader, start=0):  # load the data batch by batch
            inputs, labels = data  # inputs: [batch, 1, 28, 28] labels: [batch]

            optimizer.zero_grad()  # reset the gradients
            outputs = model(inputs)  # forward pass, [batch, 10]
            loss = criterion(outputs, labels)  # compute the loss
            loss.backward()  # backpropagation
            optimizer.step()  # update the parameters

            # accumulate the training loss
            running_loss += loss.item()
            # if step%100==99:  # print the training information every 100 steps
            #     print("epoch {}, step {}: loss = {:.4f}".format(epoch, step, loss.item()))

        # compute the prediction accuracy on the validation set
        with torch.no_grad():  # validation, no gradients are computed
            outputs_valid = model(valid_images)  # model inference on the validation set [batch, 10]
        # loss_valid = criterion(outputs_valid, valid_labels)  # validation loss
        pred_labels = torch.max(outputs_valid, dim=1)[1]  # predicted classes [batch]
        accuracy = torch.eq(pred_labels, valid_labels).sum().item() / valid_size * 100  # accuracy in percent
        print("Epoch {}: train loss={:.4f}, accuracy={:.2f}%".format(epoch, running_loss, accuracy))

        # record the statistics of the training process
        epoch_list.append(epoch)  # record the epoch number
        loss_list.append(running_loss)  # record the loss on the training set
        accu_list.append(accuracy)  # record the accuracy on the validation set

The result of the program running is as follows:

Epoch 0: train loss=1334.7479, accuracy=85.50%
Epoch 1: train loss=318.6170, accuracy=91.30%
Epoch 2: train loss=219.7319, accuracy=94.20%
Epoch 3: train loss=168.5048, accuracy=95.50%

Epoch 47: train loss=10.6612, accuracy=98.80%
Epoch 48: train loss=10.9356, accuracy=98.70%
Epoch 49: train loss=9.7699, accuracy=98.70%

After 10 epochs of training, validating on 1000 images from the validation set gives a model accuracy close to 98%. Further training continues to reduce the training loss, but the validation accuracy stays at 98~99%. The training loss and validation accuracy curves in the figure below show that about 20 epochs of training are enough to obtain satisfactory model parameters.

[Figure: training loss and validation accuracy versus epoch]


3.5 LeNet5 model saving and loading

After the model is trained, save the model for next use. There are two main ways to save models in PyTorch, one is to save the model weights, and the other is to save the entire model. This example uses the model.state_dict() method to return the model weights as a dictionary, and the torch.save() method serializes the weight dictionary to disk and saves the model as a .pth file.

    # (5) Save the LeNet5 network model
    model_path = "../models/LeNet_MNIST1.pth"
    torch.save(model.state_dict(), model_path)  # save the model weights

To use the trained model, first instantiate the model class, and then call the load_state_dict() method to load the weight parameters of the model.

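    # The model loading and inference below can live in a separate program
    # (6) Load the LeNet5 network model for inference
    # Load the pre-trained LeNet model
    model_new = LeNet5()  # instantiate the LeNet-5 network model
    model_path = "../models/LeNet_MNIST1.pth"
    model_new.load_state_dict(torch.load(model_path))
    model_new.eval()  # inference mode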

Pay special attention to the following points:

(1) A .pth file saved from state_dict() contains only the weight parameters of the model, not its structure. Therefore, the model object must be instantiated before the model parameters are loaded.

(2) The model object must correspond exactly to the saved parameters before it can be used. Note that even when two classes both implement LeNet5, their definitions may differ in subtle ways. If the model class definition comes from one source and the parameter file from another, the model structure and the parameters can easily fail to match.

(3) Whether the model and parameters come from the PyTorch model repository, from a pre-trained model obtained elsewhere, or from your own training, the loading method is the same, and the model structure must match the parameters in every case.
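
The points above can be checked on a tiny example. The sketch below saves the weights of a small linear layer to an in-memory buffer (a stand-in for the .pth file), loads them into a matching model, and shows that loading into a model with a different structure raises an error. The layer sizes and variable names are assumptions for this illustration.

```python
import io

import torch
import torch.nn as nn

source = nn.Linear(4, 2)
buffer = io.BytesIO()                      # in-memory stand-in for a .pth file
torch.save(source.state_dict(), buffer)    # only the weights are saved
buffer.seek(0)
state = torch.load(buffer)

same = nn.Linear(4, 2)                     # same structure: loading succeeds
same.load_state_dict(state)
matches = torch.equal(same.weight, source.weight)

different = nn.Linear(4, 3)                # different structure: loading fails
try:
    different.load_state_dict(state)
    mismatch_detected = False
except RuntimeError:
    mismatch_detected = True

print(matches, mismatch_detected)  # True True
```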


3.6 Model Inference

Using the loaded LeNet5 model, feed in a new image for model inference; the class of the input image is determined from the model output.

Running inference on the test set gives the test accuracy of the model. Strictly speaking, the validation set and the test set should be separate datasets; to keep the routine simple, this program does not distinguish them (the validation images are the first batch of 1000 images from the test set).

    # Model evaluation
    model_new.eval()  # inference mode
    correct = 0
    total = 0
    with torch.no_grad():  # no gradients are needed for inference
        for data in test_loader:  # load the test set batch by batch
            inputs, labels = data
            outputs = model_new(inputs)
            labels_pred = torch.max(outputs, dim=1)[1]  # predicted classes [batch]
            # _, labels_pred = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += torch.eq(labels_pred, labels).sum().item()
    accuracy = 100. * correct / total
    print("Test accuracy={:.2f}%".format(accuracy))

Using the test set for model inference, the accuracy of the test model is 98.75%.

Test accuracy=98.75%

Select a few images from the test set, or read new handwritten digit images (pay attention to format conversion and image size), and feed them to the model for inference to recognize the digits in them.

    # (7) Recognize handwritten digits by model inference
    imgs, labels = next(iter(test_loader))  # use next to fetch one batch of data
    print(imgs.shape, labels.shape)  # torch.Size([1000, 1, 28, 28]) torch.Size([1000])

    plt.figure(figsize=(8, 5))
    plt.suptitle("Inferring using LeNet-5 Model")
    for i, img in enumerate(imgs[:10]):
        out = model_new(imgs[i].unsqueeze(0))  # add a batch dimension, [1, 1, 28, 28]
        pred = torch.max(out, dim=1)[1]  # predicted class, torch.Size([1])
        plt.subplot(2, 5, i+1)
        imgNP = img.squeeze().numpy()  # drop the channel dimension, convert to a numpy array
        plt.imshow(imgNP, cmap='gray')  # draw the i-th image
        plt.title("{:d}".format(pred.item()))
        plt.axis('off')
    plt.tight_layout()
    plt.show()

The results of handwritten digit recognition are as follows.

[Figure: handwritten digit recognition results]


4. The complete routine of LeNet5 model for MNIST handwritten digit recognition

The complete routine of this article is as follows.

# Beginner_LeNet_MNIST_1.py
# LeNet-5 model for beginner with PyTorch
# Classic model: LeNet model for MNIST handwritten digit recognition
# Copyright: [email protected]
# Created: Huang Shan, 2023/05/12

# _*_coding:utf-8_*_
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from matplotlib import pyplot as plt

# Define the LeNet5 network structure
class LeNet5(nn.Module):
    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5, padding='same')  # C1: 1 input channel, 6 output channels, 5x5 kernel
        self.pool1 = nn.AvgPool2d(2, 2)  # S2: 2x2 pooling window, stride 2
        self.conv2 = nn.Conv2d(6, 16, 5)  # C3: 6 input channels, 16 output channels, 5x5 kernel
        self.pool2 = nn.AvgPool2d(2, 2)  # S4: 2x2 pooling window, stride 2
        self.flatten = nn.Flatten()  # flatten to one dimension
        self.linear1 = nn.Linear(400, 120)  # C5: 400 inputs, 120 outputs
        self.linear2 = nn.Linear(120, 84)  # F6: 120 inputs, 84 outputs
        self.linear3 = nn.Linear(84, 10)  # F7: 84 inputs, 10 outputs

    def forward(self, x):
        x = F.relu(self.conv1(x))  # (1,28,28) -> (6,28,28)
        x = self.pool1(x)  # (6,28,28) -> (6,14,14)
        x = F.relu(self.conv2(x))  # (6,14,14) -> (16,10,10)
        x = self.pool2(x)  # (16,10,10) -> (16,5,5)
        x = self.flatten(x)  # (16,5,5) -> (400)
        x = F.relu(self.linear1(x))  # (400) -> (120)
        x = F.relu(self.linear2(x))  # (120) -> (84)
        x = self.linear3(x)  # (84) -> (10)
        return x

if __name__ == '__main__':
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print(device)

    # (1) Convert a PIL image to a Tensor in [0,1], then standardize it
    transform = transforms.Compose([  # compose the image transforms
        transforms.ToTensor(),  # convert the image to a Tensor
        transforms.Normalize(mean=(0.1307,), std=(0.3081,))])  # standardize

    # (2) Load the MNIST dataset
    batch_size = 64
    # Load the MNIST dataset; if it is not found under root, it is downloaded automatically
    # Load the MNIST training set, 60000 training images
    train_set = torchvision.datasets.MNIST(root='../dataset', train=True,
                                          download=True, transform=transform)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size,
                                              shuffle=True, num_workers=2)
    # Load the MNIST test set, 10000 images (also used for validation here)
    test_set = torchvision.datasets.MNIST(root='../dataset', train=False,
                                          download=True, transform=transform)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=1000,
                                             shuffle=False, num_workers=2)
    # Create an iterator and use next to fetch one batch of data
    valid_data_iter = iter(test_loader)  # DataLoader iterator object
    valid_images, valid_labels = next(valid_data_iter)  # valid_images: [batch, 1, 28, 28] valid_labels: [batch]
    valid_size = valid_labels.size(0)  # size of the validation set, 1000
    print(valid_images.shape, valid_labels.shape)  # torch.Size([1000, 1, 28, 28]) torch.Size([1000])

    # (3) Instantiate the LeNet-5 network model
    model = LeNet5()  # create the LeNet-5 model object
    print(model)

    criterion = nn.CrossEntropyLoss()  # cross-entropy loss function
    optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)  # SGD optimizer with momentum

    # (4) Train the LeNet-5 network model
    epoch_list = []  # record the training epochs
    loss_list = []  # record the training loss
    accu_list = []  # record the validation accuracy
    num_epochs = 50  # number of training epochs
    for epoch in range(num_epochs):  # loop over the epochs
        running_loss = 0.0  # reset the accumulated loss for each epoch
        for step, data in enumerate(train_loader, start=0):  # load the data batch by batch
            inputs, labels = data  # inputs: [batch, 1, 28, 28] labels: [batch]

            optimizer.zero_grad()  # reset the gradients
            outputs = model(inputs)  # forward pass, [batch, 10]
            loss = criterion(outputs, labels)  # compute the loss
            loss.backward()  # backpropagation
            optimizer.step()  # update the parameters

            # accumulate the training loss
            running_loss += loss.item()
            # if step%100==99:  # print the training information every 100 steps
            #     print("epoch {}, step {}: loss = {:.4f}".format(epoch, step, loss.item()))

        # compute the prediction accuracy on the validation set
        with torch.no_grad():  # validation, no gradients are computed
            outputs_valid = model(valid_images)  # model inference on the validation set [batch, 10]
            # loss_valid = criterion(outputs_valid, valid_labels)  # validation loss
            pred_labels = torch.max(outputs_valid, dim=1)[1]  # predicted classes [batch]
            accuracy = torch.eq(pred_labels, valid_labels).sum().item() / valid_size * 100  # accuracy in percent

        # record the statistics of the training process
        epoch_list.append(epoch)  # record the epoch number
        loss_list.append(running_loss)  # record the loss on the training set
        accu_list.append(accuracy)  # record the accuracy on the validation set
        print("Epoch {}: train loss={:.4f}, accuracy={:.2f}%".format(epoch, running_loss, accuracy))

    # Visualize the training results
    plt.figure(figsize=(11, 5))
    plt.suptitle("LeNet-5 Model in MNIST")
    plt.subplot(121), plt.title("Train loss")
    plt.plot(epoch_list, loss_list)
    plt.xlabel('epoch'), plt.ylabel('loss')
    plt.subplot(122), plt.title("Valid accuracy")
    plt.plot(epoch_list, accu_list)
    plt.xlabel('epoch'), plt.ylabel('accuracy')
    plt.show()

    # (5) Save the LeNet5 network model
    model_path = "../models/LeNet_MNIST1.pth"
    torch.save(model.state_dict(), model_path)  # save the model weights

    # # The model loading and inference below can live in a separate program
    # # (6) Load the LeNet5 network model for inference
    # # Load the pre-trained LeNet model
    # model_new = LeNet5()  # instantiate the LeNet-5 network model
    # model_path = "../models/LeNet_MNIST1.pth"
    # model_new.load_state_dict(torch.load(model_path))
    # model_new.eval()  # inference mode
    #
    # # Model inference
    # correct = 0
    # total = 0
    # for data in test_loader:  # load the test set batch by batch
    #     inputs, labels = data
    #     outputs = model_new(inputs)
    #     labels_pred = torch.max(outputs, dim=1)[1]  # predicted classes [batch]
    #     # _, labels_pred = torch.max(outputs.data, 1)
    #     total += labels.size(0)
    #     correct += torch.eq(labels_pred, labels).sum().item()
    # accuracy = 100. * correct / total
    # print("Test accuracy={:.2f}%".format(accuracy))

【End of this section】

References:

  1. Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, 1998
  2. https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html


Copyright statement:
Welcome to follow the "youcans hands-on model" series.
When reposting, please cite the original link:
[youcans hands-on model] LeNet model MNIST handwritten digit recognition
Copyright 2023 youcans, XUPT
Created: 2023-05-16


Origin blog.csdn.net/youcans/article/details/130699395