PyTorch in Action: Implementing MNIST Handwritten Digit Recognition


Preface

Of the mainstream deep learning frameworks, PyTorch is arguably the most suitable for beginners: its simplicity and ease of use make it a natural first choice. One point I want to emphasize is that a framework is like a programming language: it is only a tool for getting a project done, the wheel we use to build the car. What we should focus on is learning how to use Torch to implement functionality, not on how the wheel itself is made; that would cost us far too much study time. A later series of articles will explain deep learning frameworks in detail, but only once we are more familiar with the theory and practice of deep learning. At this stage, what we need most is to learn how to use these tools.

Deep learning is not easy to master. It involves a great deal of mathematical theory and many computational principles that require derivation, and without hands-on practice it is hard to understand what role the code we write actually plays in a neural network. I will do my best to simplify this knowledge and translate it into terms we are already familiar with, so that everyone can understand and become comfortable with the neural network framework, with as few mathematical formulas and as little specialized theory as possible. The goal is to understand and implement the algorithm in a single article, and to become proficient with this knowledge in the most efficient way.


The author has focused on data modeling for four years, has taken part in dozens of mathematical modeling competitions large and small, and understands the principles behind various models, each model's modeling process, and a range of problem-analysis methods. The purpose of this column is to get you using various mathematical models, machine learning, deep learning, and code quickly from scratch; every article includes a hands-on project and runnable code. I follow the major mathematical modeling competitions closely, and for each one I write up the latest ideas, detailed reasoning, and complete code in this column. I hope readers who need it will not miss it.

Quick Learning in One Article - Commonly Used Models in Mathematical Modeling


1. Data set loading

MNIST (Modified National Institute of Standards and Technology) is a handwritten digit dataset commonly used to train various image processing systems.

It contains a large number of images of handwritten digits ranging from 0 to 9. Each image is a 28x28-pixel grayscale image representing a single handwritten digit.

The MNIST data set is divided into two parts: a training set and a test set. The training set contains 60,000 images and is used to train the model; the test set contains 10,000 images and is used to evaluate its performance.

The MNIST dataset is a very popular benchmark used to test and validate machine learning and deep learning models, especially in image recognition tasks. You can download it directly from the official website, or download it in your program via torchvision.

Official website: THE MNIST DATABASE

There are 4 files in total: training set images, training set labels, test set images, and test set labels:

file name                    size       content
train-images-idx3-ubyte.gz   9,681 kb   60,000 training images (often split into 55,000 for training and 5,000 for validation)
train-labels-idx1-ubyte.gz   29 kb      labels corresponding to the training set images
t10k-images-idx3-ubyte.gz    1,611 kb   10,000 test images
t10k-labels-idx1-ubyte.gz    5 kb       labels corresponding to the test set images
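
If you download the raw files yourself, they are stored in the simple IDX binary format. Below is a minimal sketch of reading them by hand (the file paths are assumptions; the torchvision code later in this article does all of this for you):

import gzip
import struct
import numpy as np

def read_idx_images(path):
    # header: magic number 2051, image count, rows, cols, all big-endian int32
    with gzip.open(path, 'rb') as f:
        magic, n, rows, cols = struct.unpack('>IIII', f.read(16))
        assert magic == 2051
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(n, rows, cols)

def read_idx_labels(path):
    # header: magic number 2049 and label count, followed by one byte per label
    with gzip.open(path, 'rb') as f:
        magic, n = struct.unpack('>II', f.read(8))
        assert magic == 2049
        return np.frombuffer(f.read(), dtype=np.uint8)

images = read_idx_images('t10k-images-idx3-ubyte.gz')  # shape (10000, 28, 28)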

The following program loads the MNIST data set:

import torch
from torch.utils.data import DataLoader
import torchvision.datasets as dsets
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),  # convert the image to grayscale
    transforms.ToTensor(),                        # convert the image to a tensor
    transforms.Normalize((0.1307,), (0.3081,))    # normalize with the MNIST mean and std
])
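
The values 0.1307 and 0.3081 used in Normalize are the commonly quoted mean and standard deviation of the MNIST training pixels. A quick sketch to verify them yourself (assuming the dataset has already been downloaded to the root directory used below):

raw = dsets.MNIST(root='/ml/pymnist', train=True, download=True)
x = raw.data.float() / 255.0  # scale raw pixels to [0, 1], as ToTensor does
print(x.mean().item(), x.std().item())  # roughly 0.1307 and 0.3081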

# MNIST dataset
train_dataset = dsets.MNIST(root = '/ml/pymnist',   # root directory of the data
                            train = True,           # select the training set
                            transform = transform,  # apply the preprocessing defined above
                            download = True         # download from the internet if absent
                           )
test_dataset = dsets.MNIST(root = '/ml/pymnist',    # root directory of the data
                           train = False,           # select the test set
                           transform = transform,   # apply the preprocessing defined above
                           download = True          # download from the internet if absent
                          )
# load the data in batches
batch_size = 100  # matches the batch size shown in the output below
train_loader = DataLoader(dataset = train_dataset,
                          batch_size = batch_size,
                          shuffle = True  # shuffle the data
                         )
test_loader = DataLoader(dataset = test_dataset,
                         batch_size = batch_size,
                         shuffle = True
                        )
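
As a quick sanity check (a minimal sketch), you can pull a single batch from the loader and confirm its shape; with batch_size = 100 each batch holds 100 single-channel 28x28 images:

images, labels = next(iter(train_loader))
print(images.shape)  # torch.Size([100, 1, 28, 28])
print(labels.shape)  # torch.Size([100])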

Displaying an image:

import matplotlib.pyplot as plt

digit = train_dataset.data[0]  # .data replaces the deprecated .train_data attribute
plt.imshow(digit, cmap=plt.cm.binary, interpolation='none')
plt.title("Label: {}".format(train_dataset.targets[0]))  # .targets replaces .train_labels
plt.show()

Normally you would next split the data into a training set and a test set, but MNIST comes already split and can be used directly:

print("train_data:",train_dataset.train_data.size())
print("train_labels:",train_dataset.train_labels.size())
print("test_data:",test_dataset.test_data.size())
print("test_labels:",test_dataset.test_labels.size())

train_data: torch.Size([60000, 28, 28])
train_labels: torch.Size([60000])
test_data: torch.Size([10000, 28, 28])
test_labels: torch.Size([10000])
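
Note that after the transform each image is a [1, 28, 28] tensor, while the fully connected network defined below expects a flat 784-dimensional vector; this is why view(-1, 28*28) appears in the training and test loops later. A small sketch:

img, label = train_dataset[0]  # img has shape [1, 28, 28] after ToTensor
flat = img.view(-1, 28 * 28)   # reshaped to [1, 784] for the linear layers
print(flat.shape, label)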

You also need to choose the batch size. In neural network training, batch_size refers to the number of samples the model processes simultaneously in each training iteration. It plays several important roles in the training process:

  • Accelerating training: processing multiple samples at once exploits the parallel computing power of modern hardware, especially GPUs.
  • Reducing memory consumption: loading the entire training set at once can occupy a large amount of memory; loading data in batches reduces consumption and allows training in memory-constrained environments.
  • Improving generalization: the model adjusts its weights based on each batch rather than the entire training set, which can improve its generalization to unseen samples.
  • Avoiding local minima: randomly selected mini-batches can help the model avoid getting stuck in local minima.
  • Increasing noise robustness: in each iteration the model sees only a small sample of the data, which acts as a kind of random noise and helps improve robustness.
  • Enabling online learning: for online learning tasks, new batches of data can be loaded dynamically without retraining the entire model.

In general, choosing a reasonable batch_size makes the training process more efficient and stable and improves the model's generalization ability. If batch_size is too large, it may cause memory overflow or slow down training; if it is too small, the model may have difficulty converging. Choosing the right batch_size therefore requires tuning in practice.
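
For example, with 60,000 training images and a batch_size of 100, each epoch consists of 600 iterations; the loader's length confirms the arithmetic:

import math
print(len(train_loader))       # 600 batches per epoch
print(math.ceil(60000 / 100))  # 600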

print("批次的尺寸:",train_loader.batch_size)
print("load_train_data:",train_loader.dataset.train_data.shape)
print("load_train_labels:",train_loader.dataset.train_labels.shape)

 

Batch size: 100 
load_train_data: torch.Size([60000, 28, 28]) 
load_train_labels: torch.Size([60000])

From the output you can see that the batched data covers the same total number of samples as the original data set. In actual training, train_loader and test_loader serve as the input data sources for the neural network.

2. Defining the neural network

In previous articles we have already built neural networks several times. Pay attention to initializing the network and defining the corresponding input, hidden, and output layers.

import torch
import torch.nn as nn

input_size = 784    # each MNIST image is 28*28 pixels
hidden_size = 500
num_classes = 10    # ten output classes, one for each digit 0~9

# define the neural network model
class Neural_net(nn.Module):
    # the constructor takes the input dimension, hidden layer size, and output size
    def __init__(self, input_num, hidden_size, out_put):
        super(Neural_net, self).__init__()
        self.layer1 = nn.Linear(input_num, hidden_size)  # linear map from input to hidden layer
        self.layer2 = nn.Linear(hidden_size, out_put)    # linear map from hidden to output layer

    def forward(self, x):
        x = self.layer1(x)   # input layer to hidden layer
        x = torch.relu(x)    # hidden layer activation
        x = self.layer2(x)   # output layer; its raw scores feed directly into the loss
        return x

net = Neural_net(input_size, hidden_size, num_classes)
print(net)
        
Neural_net(
  (layer1): Linear(in_features=784, out_features=500, bias=True)
  (layer2): Linear(in_features=500, out_features=10, bias=True)
)

super(Neural_net, self).__init__() is the Python way to call a method of the parent class. Here Neural_net is the name of the neural network model class you defined, which inherits from nn.Module, the base class for building neural network models in PyTorch. In other words, your model inherits all the properties and methods of nn.Module, so inside the Neural_net class you can use the machinery nn.Module provides, such as registering neural network layers.
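
Because Neural_net inherits from nn.Module, its two Linear layers are registered automatically and their parameters are exposed through methods that nn.Module provides. For example:

for name, p in net.named_parameters():
    print(name, tuple(p.shape))
# layer1.weight (500, 784)
# layer1.bias (500,)
# layer2.weight (10, 500)
# layer2.bias (10,)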

3. Training the model

The only new thing to pay attention to here is Variable, which was mentioned in the previous article.

Variable was an abstraction used to build computation graphs in early versions of PyTorch (before version 0.4). It carried attributes such as data, grad, and grad_fn, and was used to build the computation graph and automatically compute gradients during backpropagation. Starting with PyTorch 0.4, however, Variable was officially deprecated: Tensor supports automatic differentiation directly, and there is no longer any need to create a Variable explicitly.

In short, autograd is PyTorch's core mechanism for automatic differentiation, while Variable was the early abstraction for building computation graphs and has since been replaced by Tensor. Autograd automatically tracks operations on Tensors and computes gradients when needed, enabling backpropagation.
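
A minimal autograd sketch: a modern Tensor tracks gradients directly, with no Variable wrapper needed.

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()  # y = x1^2 + x2^2
y.backward()        # autograd computes dy/dx via backpropagation
print(x.grad)       # tensor([4., 6.]), since dy/dxi = 2*xi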

# optimization
learning_rate = 1e-3  # learning rate
num_epoches = 5
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(net.parameters(), lr = learning_rate)  # stochastic gradient descent

for epoch in range(num_epoches):
    print('current epoch = %d' % (epoch + 1))
    for i, (images, labels) in enumerate(train_loader, 0):
        images = images.view(-1, 28*28)  # flatten each image to a 784-dimensional vector
        outputs = net(images)            # forward pass through the network
        labels = labels.long()           # CrossEntropyLoss expects long-typed class indices

        loss = criterion(outputs, labels)  # compute the loss

        optimizer.zero_grad()  # clear accumulated gradients before backpropagation
        loss.backward()        # backpropagate the loss
        optimizer.step()       # update the parameters

        if i % 100 == 0:
            print('current loss = %.5f' % loss.item())

print('finished training')

 

current epoch = 1
current loss = 0.27720
current loss = 0.23612
current loss = 0.39341
current loss = 0.24683
current loss = 0.18913
current loss = 0.31647
current loss = 0.28518
current loss = 0.18053
current loss = 0.34957
current loss = 0.31319
current epoch = 2
current loss = 0.15138
current loss = 0.30887
current loss = 0.24257
current loss = 0.46326
current loss = 0.30790
current loss = 0.17516
current loss = 0.32319
current loss = 0.32325
current loss = 0.32066
current loss = 0.24271
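
Once training is finished, you may want to persist the learned weights so the model can be reused without retraining. A small sketch (the file name is illustrative):

torch.save(net.state_dict(), 'mnist_net.pth')     # save the learned weights
net.load_state_dict(torch.load('mnist_net.pth'))  # restore them later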

4. Accuracy test

After the weights of each layer have been updated via stochastic gradient descent on the loss, we can measure the digit-classification accuracy on the test set:

# prediction
total = 0
correct = 0
acc_list_test = []
net.eval()  # switch to evaluation mode
with torch.no_grad():  # no gradients are needed for evaluation
    for images, labels in test_loader:
        images = images.view(-1, 28*28)
        outputs = net(images)  # forward pass through the network

        _, predicts = torch.max(outputs.data, 1)  # the highest-scoring index is the predicted digit
        total += labels.size(0)
        correct += (predicts == labels).sum().item()
        acc_list_test.append(100 * correct / total)

print('Accuracy = %.2f' % (100 * correct / total))
plt.plot(acc_list_test)
plt.xlabel('Batch')  # the accuracy curve is recorded once per test batch
plt.ylabel('Accuracy On TestSet')
plt.show()
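
As an optional extension (not part of the original code), you can break the accuracy down per digit to see which classes the network confuses most:

class_correct = torch.zeros(10)
class_total = torch.zeros(10)
with torch.no_grad():
    for images, labels in test_loader:
        outputs = net(images.view(-1, 28 * 28))
        _, predicts = torch.max(outputs, 1)
        for label, pred in zip(labels, predicts):
            class_total[label] += 1
            class_correct[label] += int(label == pred)
for d in range(10):
    print('digit %d: %.2f%%' % (d, 100 * class_correct[d] / class_total[d]))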


Please like and follow so you don't lose track of this column. If there are any mistakes, please leave a message to point them out. Thank you very much.

That’s all for this issue. My name is fanstuck. If you have any questions, feel free to leave a message for discussion. See you in the next issue.


Origin blog.csdn.net/master_hunter/article/details/132964457