PyTorch (1): Implementing MNIST handwritten digit recognition

1 Development environment

Operating system: Windows 10

IDE: JupyterLab

Language: Python 3.8

Deep learning framework: PyTorch

2 Preparation

2.1 JupyterLab

(Reference https://blog.csdn.net/I_am_toutu/article/details/125495186)

(1) Make sure Anaconda is installed, then open Anaconda Prompt from the Start menu

(2) If JupyterLab is not installed, install it with the following command

conda install jupyterlab

 (3) Start JupyterLab:

① Method 1: enter jupyter lab in Anaconda Prompt, as shown below

 ② Method 2: press Win+R to open the Run dialog, enter jupyter lab, and launch the application

(4) To change the default directory where files are saved, run the following in Anaconda Prompt:

(Reference https://www.jb51.net/article/274656.htm)

jupyter lab --generate-config

        The command prints the path of the generated file jupyter_lab_config.py. Open the file at that path, find the setting shown in the red box below, change the path, and restart JupyterLab

 (5) Register an existing conda virtual environment as a Jupyter kernel

        The kernel currently in use is shown in the upper right corner of the JupyterLab interface

         Click it (red box) to select a kernel. As shown in the figure below, initially there is only one default environment, which should be the conda base environment.

 

        To use another existing virtual environment, register it as a kernel as follows (close any running JupyterLab first):

① Activate the virtual environment to be added:

conda activate <virtual environment name>

② Install ipykernel in it:

conda install ipykernel

③ Register the environment as a Jupyter kernel:

python -m ipykernel install --user --name <existing virtual environment name> --display-name <JupyterLab kernel name>

        After restarting Jupyter lab, select the added kernel.
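To confirm that a notebook really runs in the registered environment, you can print the kernel's interpreter from inside the notebook. A minimal sketch using only the standard library; kernel_environment is a hypothetical helper name, and the folder it reports assumes the usual conda layout (envs\<name> for a named environment, the Anaconda install folder itself for base).

```python
import os
import sys


def kernel_environment():
    """Return (interpreter path, environment folder name) for the running kernel."""
    # sys.executable is the Python interpreter that runs this kernel;
    # sys.prefix is the root folder of the environment it belongs to.
    # For a named conda env this is usually ...\envs\<env name>;
    # for the base env it is the Anaconda install folder itself.
    return sys.executable, os.path.basename(os.path.normpath(sys.prefix))


exe, env_folder = kernel_environment()
print('Interpreter:', exe)
print('Environment folder:', env_folder)
```

If the printed folder is not the environment you selected, the kernel was probably registered from a different environment.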

 

2.2 Setting up the GPU

        The computer used here only has an integrated graphics card (Intel(R) UHD Graphics), which does not support CUDA, so the GPU cannot be used and the code falls back to the CPU.
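A slightly more verbose variant of this device check, which also reports which device was picked and shows that tensors have to be moved to it. This is a sketch; the names dev and t are illustrative only.

```python
import torch

# Use the GPU if CUDA is available, otherwise fall back to the CPU
dev = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

if dev.type == 'cuda':
    print('Using GPU:', torch.cuda.get_device_name(0))  # name of the first CUDA device
else:
    # Integrated Intel graphics are not CUDA devices, so PyTorch runs on the CPU
    print('CUDA is not available, using the CPU')

# Tensors (and models) must be moved to the chosen device before they are combined
t = torch.ones(2, 3).to(dev)
print(t.device)
```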

# Set up the GPU (use the CPU if no GPU is available)
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import torchvision

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print('device', device)

        The execution result is as follows:

3 Import data

3.1 Download data

Use torchvision.datasets.MNIST to download the MNIST dataset and split it into a training set and a test set.

If the MNIST dataset has already been downloaded, it is read directly from the local copy (by setting download=False).

3.1.1 Function Prototype

torchvision.datasets.MNIST(root, train=True, transform=None, target_transform=None, download=False)

3.1.2 Parameter description

root (string): root directory where the dataset is stored

train (bool, optional): True = training set, False = test set

download (bool, optional): If True, download the dataset from the Internet and put it in the root directory. If the dataset is already downloaded, it is not downloaded again.

transform (callable, optional): a function/transform applied to each image (for example transforms.ToTensor()), so the data is converted as it is loaded

target_transform (callable, optional): a function/transform that takes a target (label) and transforms it.
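To make the transform parameter concrete: for a grayscale image, transforms.ToTensor() turns an H×W uint8 array with values 0 to 255 into a 1×H×W float32 tensor with values in [0, 1]. The sketch below reproduces that conversion with plain PyTorch, assuming a 2-D uint8 NumPy array as input; to_tensor_gray is a hypothetical helper, not the torchvision implementation.

```python
import numpy as np
import torch


def to_tensor_gray(arr: np.ndarray) -> torch.Tensor:
    """Convert an (H, W) uint8 grayscale image to a (1, H, W) float tensor in [0, 1]."""
    if arr.ndim != 2 or arr.dtype != np.uint8:
        raise ValueError('expected a 2-D uint8 array')
    t = torch.from_numpy(arr).unsqueeze(0)  # add the channel dimension -> (1, H, W)
    return t.float().div(255)               # scale 0..255 to 0.0..1.0


# A fake 28x28 "digit": black background with one white pixel
fake = np.zeros((28, 28), dtype=np.uint8)
fake[14, 14] = 255
out = to_tensor_gray(fake)
print(out.shape, out.dtype, out.max().item())
# torch.Size([1, 28, 28]) torch.float32 1.0
```

With torchvision installed, torchvision.transforms.ToTensor()(fake) should give the same values.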

3.1.3 Experiment code

# Load the dataset
import os

ROOT_FOLDER = 'data'
MNIST_FOLDER = ROOT_FOLDER + '/MNIST'
if not os.path.exists(MNIST_FOLDER) or not os.path.isdir(MNIST_FOLDER):
    print('Downloading the dataset')
    # Download the training set
    train_ds = torchvision.datasets.MNIST(ROOT_FOLDER, 
                                          train=True, 
                                          transform=torchvision.transforms.ToTensor(), # convert images to Tensors
                                          download=True)
    # Download the test set
    test_ds  = torchvision.datasets.MNIST(ROOT_FOLDER, 
                                          train=False, 
                                          transform=torchvision.transforms.ToTensor(), # convert images to Tensors
                                          download=True)
else:
    print('Dataset already downloaded, loading it directly')
    # Load the downloaded training set
    train_ds = torchvision.datasets.MNIST(ROOT_FOLDER, 
                                          train=True, 
                                          transform=torchvision.transforms.ToTensor(), # convert images to Tensors
                                          download=False)
    # Load the downloaded test set
    test_ds  = torchvision.datasets.MNIST(ROOT_FOLDER, 
                                          train=False, 
                                          transform=torchvision.transforms.ToTensor(), # convert images to Tensors
                                          download=False)

        The execution result is as follows: 

        After running, a data folder is created in the working directory and the dataset is downloaded into it; because of transform=ToTensor(), each image is converted to a Tensor when it is loaded:

3.2 Load data

        Use torch.utils.data.DataLoader to load data and set batch_size=32

3.2.1 Function description

        torch.utils.data.DataLoader mainly splits data into batches. Its main characteristics are:

(1) It is a data loader that combines a dataset and a sampler, and can use multiple worker processes to load the data.

(2) During training, it divides the training data into mini-batches and yields one batch per iteration until all the data has been returned (one epoch). In other words, it prepares the data that is fed into the model.

(3) The dataset passed to it must be either a map-style dataset (implementing __len__ and __getitem__) or an iterable-style dataset (implementing __iter__). For a custom dataset, define __len__ and __getitem__ in a class that subclasses torch.utils.data.Dataset.
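A minimal sketch of a custom map-style dataset, assuming random tensors stand in for real images; RandomDigits is a hypothetical class name. Subclassing torch.utils.data.Dataset and defining __len__ and __getitem__ is enough for DataLoader to batch and shuffle it.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class RandomDigits(Dataset):
    """Hypothetical map-style dataset of random 1x28x28 images with labels 0-9."""

    def __init__(self, n: int):
        gen = torch.Generator().manual_seed(0)
        self.images = torch.rand(n, 1, 28, 28, generator=gen)
        self.labels = torch.randint(0, 10, (n,), generator=gen)

    def __len__(self):
        # DataLoader uses this to know how many samples (and batches) there are
        return len(self.labels)

    def __getitem__(self, idx):
        # Return one (image, label) pair; DataLoader stacks them into batches
        return self.images[idx], self.labels[idx]


ds = RandomDigits(100)
dl = DataLoader(ds, batch_size=32, shuffle=True)
xb, yb = next(iter(dl))
print(len(ds), len(dl), xb.shape, yb.shape)
# 100 4 torch.Size([32, 1, 28, 28]) torch.Size([32])
```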

3.2.2 Function prototype and parameter description

        Its function prototype is as follows:

torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=None, sampler=None, batch_sampler=None, num_workers=0, collate_fn=None, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None, multiprocessing_context=None, generator=None, *, prefetch_factor=2, persistent_workers=False, pin_memory_device='')

        Its parameters are described as follows:

dataset (Dataset): the dataset from which to load the data

batch_size (int, optional): how many samples to load per batch (default: 1)

shuffle (bool, optional): if True, the data is reshuffled at every epoch (default: False)

sampler (Sampler or Iterable, optional): defines the strategy for drawing samples from the dataset. Can be any Iterable with __len__ implemented. If specified, shuffle must not be specified.

batch_sampler (Sampler or Iterable, optional): like sampler, but returns a batch of indices at a time. Mutually exclusive with batch_size, shuffle, sampler, and drop_last.

num_workers (int, optional): number of subprocesses used for data loading; 0 means the data is loaded in the main process (default: 0)

pin_memory (bool, optional): if True, the DataLoader copies tensors into device/CUDA pinned memory before returning them. If the data elements, or the batches returned by collate_fn, are of a custom type, that type needs its own pin_memory() method.

drop_last (bool, optional): set to True to drop the last incomplete batch if the dataset size is not divisible by the batch size. If False, the last, smaller batch is kept (default: False)

timeout (numeric, optional): if positive, the timeout for collecting a batch from the worker processes; an error is raised if it is exceeded. Must be non-negative (default: 0)

worker_init_fn (callable, optional): if not None, it is called in each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading (default: None)
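The effect of drop_last is easiest to see by counting batches. A small sketch with 100 dummy samples (TensorDataset over a range, not MNIST) and batch_size=32:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 100 dummy samples: a dataset size that is not divisible by the batch size 32
toy_ds = TensorDataset(torch.arange(100))

kept = DataLoader(toy_ds, batch_size=32)                    # drop_last=False (default)
dropped = DataLoader(toy_ds, batch_size=32, drop_last=True)

kept_sizes = [len(batch[0]) for batch in kept]
dropped_sizes = [len(batch[0]) for batch in dropped]
print(len(kept), kept_sizes)        # 4 [32, 32, 32, 4]
print(len(dropped), dropped_sizes)  # 3 [32, 32, 32]
```

By the same rule, the MNIST test set (10,000 images) gives 313 batches with batch_size=32, or 312 with drop_last=True.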

3.2.3 Code

# Load the data
batch_size = 32
# Load the training set from train_ds
train_dl = torch.utils.data.DataLoader(train_ds, 
                                       batch_size=batch_size, 
                                       shuffle=True)
# Load the test set from test_ds
test_dl  = torch.utils.data.DataLoader(test_ds, 
                                       batch_size=batch_size)

# Take one batch to check the data format
# The data shape is [batch_size, channel, height, width],
# where batch_size is set by us, and channel, height and width are the number of channels, height and width of the images.
imgs, labels = next(iter(train_dl))
print(imgs.shape)
# torch.Size([32, 1, 28, 28])  # every image in the dataset is a 28*28 grayscale image

Output after execution: torch.Size([32, 1, 28, 28])

4 Data Visualization

        The squeeze() function removes dimensions of size 1 from an array's shape. For example, if an array has shape (5, 1), squeezing it gives shape (5,).
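A quick NumPy sketch of squeeze() on the shapes that occur here (arrays of zeros stand in for real images). Note that it removes every size-1 axis unless an axis is given:

```python
import numpy as np

a = np.zeros((5, 1))
print(np.squeeze(a).shape)                     # (5,)

# One image as the DataLoader yields it: (channel, height, width)
one_img = np.zeros((1, 28, 28))
print(np.squeeze(one_img).shape)               # (28, 28), the 2-D shape plt.imshow expects

# squeeze removes every size-1 axis; pass axis= to remove only a specific one
batch_of_one = np.zeros((1, 1, 28, 28))
print(np.squeeze(batch_of_one).shape)          # (28, 28)
print(np.squeeze(batch_of_one, axis=0).shape)  # (1, 28, 28)
```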

# Data visualization
import numpy as np

# Figure size: 20 inches wide and 5 inches high
plt.figure('Data visualization', figsize=(20, 5)) 
for i, img in enumerate(imgs[:20]):
    # Remove the channel dimension: (1, 28, 28) -> (28, 28)
    npimg = np.squeeze(img.numpy())
    # Split the figure into 2 rows x 10 columns and draw the (i+1)-th subplot
    plt.subplot(2, 10, i+1)
    plt.imshow(npimg, cmap=plt.cm.binary)
    plt.axis('off')
plt.show()

        The execution result is as follows:

5 Build a simple CNN network

        A typical CNN consists of a feature extraction network, which extracts features from the image, and a classification network, which classifies the image based on those features. The structure of the CNN is shown in the figure below:

5.1 Description of Network Hierarchy

(1) nn.Conv2d is a convolutional layer, used to extract image features. Its parameters are the number of input channels, the number of output channels, and the convolution kernel size

(2) nn.MaxPool2d is a pooling layer, which downsamples the feature maps and represents image features with higher-level abstractions. Its parameter is the pooling kernel size

(3) nn.ReLU is an activation function that enables the model to fit nonlinear data

(4) nn.Linear is a fully connected layer. It combines the extracted features, and the last fully connected layer acts as the output layer. Its parameters are the number of input features and the number of output features. The number of input features follows from the output size of the feature extraction network; if you are unsure how to calculate it, you can simply run the network and read the actual size from the shape-mismatch error. For the network below, the first fully connected layer has 1600 input features.

(5) nn.Sequential chains modules together in the order they are passed in. Because the structure is fixed when the model is initialized, the forward pass only needs to call the container instead of each layer separately.

5.2 Personal interpretation:

(1) In most cases, a convolutional layer increases the depth (number of channels) of the input; the output depth equals the number of convolution kernels.

(2) A pooling layer generally does not change the number of channels, but it shrinks the width and height and aggregates the features into higher-level abstractions. Max pooling also acts somewhat like "non-maximum suppression": it keeps only the largest feature value within each pooling window.

(3) The fully connected layers flatten the input into a single dimension and map it to an output whose size equals the number of classes in the dataset.
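The 1600 input features mentioned in 5.1 (4) can be worked out instead of read from an error message. Each conv or pooling layer maps a side length to floor((size + 2·padding − kernel) / stride) + 1. The sketch below assumes no padding and stride 1 for the convolutions (the defaults used in the model) and stride 2 for MaxPool2d(2), and omits Dropout, which does not change shapes; conv_out is a hypothetical helper.

```python
import torch
import torch.nn as nn


def conv_out(size, kernel, stride=1, padding=0):
    """Output height/width of a conv or pooling layer: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# By hand, for a 28x28 MNIST image (MaxPool2d(2) uses kernel 2, stride 2):
s = conv_out(28, 3)        # conv1 (3x3): 28 -> 26
s = conv_out(s, 2, 2)      # pool1 (2x2): 26 -> 13
s = conv_out(s, 3)         # conv2 (3x3): 13 -> 11
s = conv_out(s, 2, 2)      # pool2 (2x2): 11 -> 5 (rounded down)
print(64 * s * s)          # 64 channels * 5 * 5 = 1600

# The same result by pushing a dummy batch through the feature layers
features = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
)
with torch.no_grad():
    out = features(torch.zeros(1, 1, 28, 28))
print(out.shape, torch.flatten(out, start_dim=1).shape[1])
# torch.Size([1, 64, 5, 5]) 1600
```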

5.3 Code

# Build the CNN
import torch.nn.functional as F

num_classes = 10  # number of image classes

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Feature extraction network
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)  # first conv layer, 3*3 kernels
        self.pool1 = nn.MaxPool2d(2)                  # pooling layer, 2*2 pooling window
        self.drop1 = nn.Dropout(p=0.15)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3) # second conv layer, 3*3 kernels
        self.pool2 = nn.MaxPool2d(2)
        self.drop2 = nn.Dropout(p=0.15)

        # Classification network
        self.fc1 = nn.Linear(1600, 64)
        self.fc2 = nn.Linear(64, num_classes)

    # Forward pass
    def forward(self, x):
        x = self.drop1(self.pool1(F.relu(self.conv1(x))))
        x = self.drop2(self.pool2(F.relu(self.conv2(x))))

        x = torch.flatten(x, start_dim=1)

        x = F.relu(self.fc1(x))
        x = self.fc2(x)

        return x

5.4 Load and print the model

# Load and print the model
from torchinfo import summary
# Move the model to the selected device (GPU if available, otherwise CPU)
model = Model().to(device)

summary(model)

Running this cell raises an error because torchinfo is not installed, as shown below

        Solution:

(1) Install torchinfo in this virtual environment from Anaconda Prompt, for example with:

pip install torchinfo

 (2) Re-run all the previous cells; the output of the cell is as follows:

 5.5 Summary

        Overall, this is a very simple model: two convolution + pooling stages in the middle, followed by two fully connected (FC) layers at the end.
 

(1) The first convolutional layer has 32 kernels of size 1*3*3, so the data goes from 1 channel to 32 channels (and from 28*28 to 26*26). The convolution is followed by a ReLU activation, then a pooling layer with a 2*2 window that halves the width and height (26*26 to 13*13), and then Dropout (p=0.15);

(2) The second convolutional layer has 64 kernels of size 32*3*3, so the data goes from 32 channels to 64 channels (13*13 to 11*11). Again, the convolution is followed by ReLU, a 2*2 pooling layer that halves the width and height, rounding down (11*11 to 5*5), and Dropout (p=0.15);

(3) In the fully connected part, the multi-dimensional feature map (64*5*5) is first flattened into one dimension (1600 values), then passed through FC1, a ReLU activation, and FC2 in turn, giving a 1*10 output for each image;

(4) All pooling layers use maximum pooling, that is, only the largest value is retained in each 2*2 area;

(5) ReLU is used as the activation function to add nonlinearity to the model


(6) You can also use torch.nn.Sequential to package the Conv-ReLU-Pool-Dropout process of each layer as a whole
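A minimal sketch of point (6), assuming the same layer sizes as Model above; conv_block and seq_model are hypothetical names. Counting the parameters also gives a cross-check that needs no extra package: 320 (conv1) + 18,496 (conv2) + 102,464 (fc1) + 650 (fc2) = 121,930, which should match the total that torchinfo reports for Model.

```python
import torch
import torch.nn as nn

n_classes = 10  # number of digit classes


def conv_block(in_ch, out_ch, p=0.15):
    """Conv(3x3) -> ReLU -> MaxPool(2x2) -> Dropout, as in each stage of Model."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Dropout(p=p),
    )


seq_model = nn.Sequential(
    conv_block(1, 32),          # 1x28x28  -> 32x13x13
    conv_block(32, 64),         # 32x13x13 -> 64x5x5
    nn.Flatten(),               # -> 1600 features (same as torch.flatten(x, start_dim=1))
    nn.Linear(1600, 64),
    nn.ReLU(),
    nn.Linear(64, n_classes),   # raw scores (logits) for the 10 classes
)

n_params = sum(prm.numel() for prm in seq_model.parameters())
logits = seq_model(torch.zeros(2, 1, 28, 28))
print(n_params, logits.shape)  # 121930 torch.Size([2, 10])
```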

6 Train the model

6.1 Setting hyperparameters

# Set hyperparameters
loss_fn    = nn.CrossEntropyLoss() # create the loss function
learn_rate = 1e-2 # learning rate
opt        = torch.optim.SGD(model.parameters(),lr=learn_rate)
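nn.CrossEntropyLoss expects raw, unnormalized scores (logits) and applies log-softmax internally, which is why Model.forward returns the output of fc2 without a softmax. A small check with random numbers (not MNIST data) that the loss equals the mean of −log(softmax) at the true class:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw network outputs for 4 images, 10 classes
targets = torch.tensor([3, 0, 9, 1])   # true class indices

ce = nn.CrossEntropyLoss()(logits, targets)

# Same value computed by hand: mean of -log(softmax(logits)[true class])
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(4), targets].mean()

print(ce.item(), manual.item())
```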

6.2 Writing the training function

(1)optimizer.zero_grad()

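        This function iterates over all parameters managed by the optimizer and clears their gradients: each .grad is detached from the computation graph and set to 0 (or, by default since PyTorch 2.0, set to None). This is necessary because PyTorch accumulates gradients across backward() calls, so the gradients from the previous batch would otherwise be added to the current ones.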

(2)loss.backward()

        PyTorch's backpropagation (tensor.backward()) is implemented through the autograd package, which automatically calculates its corresponding gradient based on the mathematical operations performed by the tensor.

① torch.Tensor is the core class of the autograd package. If a tensor's requires_grad is set to True, autograd starts tracking all operations on it. Calling backward() on a result computed from it then computes all gradients automatically, and the gradient of the tensor is accumulated into its .grad attribute.

② The loss is computed from the model's weights w through a series of operations. When w has requires_grad=True, every tensor produced from it records the operation that created it in its .grad_fn attribute, which builds a computation graph. loss.backward() then walks this graph backwards, computes the gradient of the loss with respect to each w by the chain rule, and stores it in w.grad.

③ If backward() has not been called, the gradients are None, so loss.backward() must come before optimizer.step().

(3)optimizer.step()

        step() performs one optimization step, updating the parameters by gradient descent. Because the update uses the gradients, loss.backward() must be called to compute them before optimizer.step() is executed.

Note: the optimizer only updates the parameters using the gradients; it does not compute them. Gradients are produced by tensor.backward().

(4) Code

# Training loop
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)  # size of the training set: 60000 images
    num_batches = len(dataloader)   # number of batches: 1875 (60000/32)

    train_loss, train_acc = 0, 0  # initialize training loss and accuracy

    for X, y in dataloader:  # images and their labels
        X, y = X.to(device), y.to(device)

        # Compute the prediction error
        pred = model(X)          # network output
        loss = loss_fn(pred, y)  # loss between the network output and the true labels y

        # Backpropagation
        optimizer.zero_grad()  # reset the gradients
        loss.backward()        # backpropagate
        optimizer.step()       # update the parameters

        # Record accuracy and loss
        train_acc  += (pred.argmax(1) == y).type(torch.float).sum().item()
        train_loss += loss.item()

    train_acc  /= size
    train_loss /= num_batches

    return train_acc, train_loss
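The interplay of zero_grad(), backward() and step() described in (1) to (3) can be seen on a toy problem with a single weight. A minimal sketch with made-up numbers (not part of the MNIST model); the names w_demo, x_demo, sgd_demo and toy_loss are illustrative only and do not replace the loss_fn and opt defined earlier.

```python
import torch

w_demo = torch.tensor(2.0, requires_grad=True)  # a single trainable "weight"
x_demo = torch.tensor(3.0)                      # a fixed input
sgd_demo = torch.optim.SGD([w_demo], lr=0.1)


def toy_loss():
    # loss = (w * x)^2, so d(loss)/dw = 2 * x^2 * w = 36 at w = 2, x = 3
    return (w_demo * x_demo) ** 2


toy_loss().backward()
grad_once = w_demo.grad.item()    # 36.0

toy_loss().backward()             # second backward without clearing
grad_twice = w_demo.grad.item()   # 72.0: gradients accumulate in .grad

sgd_demo.zero_grad()              # clear the old gradient
toy_loss().backward()
grad_fresh = w_demo.grad.item()   # 36.0 again

sgd_demo.step()                   # w <- w - lr * grad = 2.0 - 0.1 * 36.0
print(grad_once, grad_twice, grad_fresh, round(w_demo.item(), 4))
# 36.0 72.0 36.0 -1.6
```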

6.3 Writing test functions

        The test function is roughly the same as the training function, but since the network weights are not updated by gradient descent, there is no need to pass in the optimizer

def test(dataloader, model, loss_fn):
    size        = len(dataloader.dataset)  # size of the test set: 10000 images
    num_batches = len(dataloader)          # number of batches: 313 (10000/32 = 312.5, rounded up)
    test_loss, test_acc = 0, 0

    # Not training, so disable gradient tracking to save memory and computation
    with torch.no_grad():
        for imgs, target in dataloader:
            imgs, target = imgs.to(device), target.to(device)

            # Compute the loss
            target_pred = model(imgs)
            loss        = loss_fn(target_pred, target)

            test_loss += loss.item()
            test_acc  += (target_pred.argmax(1) == target).type(torch.float).sum().item()

    test_acc  /= size
    test_loss /= num_batches

    return test_acc, test_loss

6.4 Formal training

(1) model.train()

model.train() puts the model in training mode, which enables the training behavior of Batch Normalization and Dropout layers.

If the model contains BN (Batch Normalization) layers or Dropout, call model.train() before training. In training mode, BN layers normalize with the mean and variance of the current batch (and update their running estimates), and Dropout randomly zeroes a fraction of the activations, so only part of the network is trained and updated on each step.

(2) model.eval()

model.eval() puts the model in evaluation mode, which switches Batch Normalization and Dropout to their inference behavior.

If the model contains BN layers or Dropout, call model.eval() before testing. In evaluation mode, BN layers use the running mean and variance estimated during training instead of batch statistics, so these values stay fixed during testing, and Dropout uses all connections, i.e. no neurons are dropped.

After training, the model is used to evaluate the test samples. Call model.eval() before model(test); otherwise BN layers would keep updating their running statistics from the test data and Dropout would keep randomly dropping units, even though no training takes place. This follows from how BN and Dropout layers behave (the model in this article uses Dropout but no BN layers).

(3) Code

epochs     = 50
train_loss = []
train_acc  = []
test_loss  = []
test_acc   = []

for epoch in range(epochs):
    model.train()
    epoch_train_acc, epoch_train_loss = train(train_dl, model, loss_fn, opt)
    
    model.eval()
    epoch_test_acc, epoch_test_loss = test(test_dl, model, loss_fn)
    
    train_acc.append(epoch_train_acc)
    train_loss.append(epoch_train_loss)
    test_acc.append(epoch_test_acc)
    test_loss.append(epoch_test_loss)
    
    template = ('Epoch:{:2d}, Train_acc:{:.1f}%, Train_loss:{:.3f}, Test_acc:{:.1f}%,Test_loss:{:.3f}')
    print(template.format(epoch+1, epoch_train_acc*100, epoch_train_loss, epoch_test_acc*100, epoch_test_loss))
print('Done')

        The following are some training results:

        The results show that during training:

① the training accuracy Train_acc gradually increases and the training loss gradually decreases;

② the test accuracy Test_acc also gradually increases and the test loss gradually decreases, which indicates that the model has not yet started to overfit.
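The effect of the model.train() and model.eval() calls in the loop above can be checked on standalone layers. A small sketch with random tensors (not MNIST data): a Dropout layer as used in Model, and, for comparison, a BatchNorm1d layer, which Model itself does not contain.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
ones = torch.ones(1000)

drop.train()                 # what model.train() does to every Dropout layer
out_train = drop(ones)
# In training mode about half the entries are zeroed and the rest are scaled by 1/(1-p) = 2
print(sorted(set(out_train.tolist())))   # [0.0, 2.0]

drop.eval()                  # what model.eval() does
out_eval = drop(ones)
print(torch.equal(out_eval, ones))       # True: Dropout passes data through unchanged

bn = nn.BatchNorm1d(3)
bn.train()
bn(torch.randn(8, 3))        # a forward pass in training mode updates the running statistics
print(bn.running_mean)
bn.eval()
before = bn.running_mean.clone()
bn(torch.randn(8, 3))        # in eval mode the running statistics are only read, not updated
print(torch.equal(before, bn.running_mean))  # True
```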

7 Prediction & Results Visualization

import matplotlib.pyplot as plt
# Hide warnings
import warnings
warnings.filterwarnings("ignore")               # ignore warning messages
plt.rcParams['font.sans-serif']    = ['SimHei'] # font that can display Chinese labels
plt.rcParams['axes.unicode_minus'] = False      # display minus signs correctly
plt.rcParams['figure.dpi']         = 100        # resolution

epochs_range = range(epochs)

plt.figure(figsize=(12, 3))
plt.subplot(1, 2, 1)

plt.plot(epochs_range, train_acc, label='Training Accuracy')
plt.plot(epochs_range, test_acc, label='Test Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_loss, label='Training Loss')
plt.plot(epochs_range, test_loss, label='Test Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

        The prediction and visualization results are as follows:
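The section title also mentions prediction, while the code above only plots the training curves. A minimal sketch of predicting a single test image, assuming the trained model, test_ds and device defined earlier; predict_one is a hypothetical helper, and a randomly initialized stand-in network is used here so the snippet runs on its own.

```python
import torch
import torch.nn as nn


def predict_one(net: nn.Module, image: torch.Tensor, dev: torch.device) -> int:
    """Return the predicted class index for one (1, 28, 28) image tensor."""
    net.eval()                                  # disable Dropout for inference
    with torch.no_grad():                       # no gradients needed for prediction
        batch = image.unsqueeze(0).to(dev)      # add the batch dimension -> (1, 1, 28, 28)
        scores = net(batch)                     # shape (1, 10)
    return scores.argmax(dim=1).item()


# Stand-in network and image so the sketch runs on its own; with the trained model
# from above, call e.g. predict_one(model, test_ds[0][0], device) instead.
cpu = torch.device('cpu')
dummy_net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
label = predict_one(dummy_net, torch.rand(1, 28, 28), cpu)
print(label)
```

Since predict_one leaves the network in evaluation mode, call model.train() again before any further training.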


Origin blog.csdn.net/ali1174/article/details/130294224