MNIST handwritten digit recognition with ResNet, a classic convolutional neural network

Understand the network structure of ResNet18, learn how to save and load a model, and learn how to test images in batches.

ResNet18, a typical image classification network, is used here to recognize handwritten digits.

As a classic image classification network, ResNet has several clear advantages:

  • It is deep: common variants have 34, 50, or 101 layers. In general, a deeper network has stronger representational power and higher classification accuracy.

  • It remains trainable despite its depth: the residual structure connects a lower layer directly to a higher layer through a shortcut connection, which mitigates the vanishing-gradient problem that very deep networks suffer during backpropagation.

  • It performs well, both in recognition accuracy and in model size and parameter count.
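To make the shortcut argument concrete, here is a toy sketch (not ResNet itself). Each "layer" is assumed to be a scalar map f(x) = w * x with a small hypothetical weight w, so the gradient through a plain stack is the product of the weights, while the gradient through a residual stack y = x + f(x) is the product of (1 + w). The weight value and the depth are illustrative assumptions.

```python
def plain_chain_grad(weights):
    """Gradient of the output w.r.t. the input for stacked layers y = w * x."""
    grad = 1.0
    for w in weights:
        grad *= w          # d(w * x)/dx = w
    return grad


def residual_chain_grad(weights):
    """Same stack, but each layer adds a shortcut: y = x + w * x."""
    grad = 1.0
    for w in weights:
        grad *= 1.0 + w    # d(x + w * x)/dx = 1 + w
    return grad


weights = [0.1] * 18       # hypothetical: 18 layers with small weights
plain = plain_chain_grad(weights)        # shrinks toward zero
residual = residual_chain_grad(weights)  # stays at a usable magnitude
print(plain, residual)
```

In the plain stack the gradient shrinks geometrically with depth; with the shortcut, the identity path contributes a constant 1 to every factor, so the signal reaching the early layers does not vanish.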

1. Load and process the dataset

import os
import sys
import moxing as mox

datasets_dir = '../datasets'
if not os.path.exists(datasets_dir):
    os.makedirs(datasets_dir)
    
if not os.path.exists(os.path.join(datasets_dir, 'MNIST_Data.zip')):
    mox.file.copy('obs://modelarts-labs-bj4-v2/course/hwc_edu/python_module_framework/datasets/mindspore_data/MNIST_Data.zip', 
                  os.path.join(datasets_dir, 'MNIST_Data.zip'))
    os.system('cd %s; unzip MNIST_Data.zip' % (datasets_dir))
    
sys.path.insert(0, os.path.join(os.getcwd(), '../datasets/MNIST_Data'))
from load_data_all import load_data_all
from process_dataset import process_dataset

mnist_ds_train, mnist_ds_test, train_len, test_len = load_data_all(datasets_dir)  # load the dataset
mnist_ds_train = process_dataset(mnist_ds_train, batch_size=64, resize=28)  # process the training set and load it in batches
mnist_ds_test = process_dataset(mnist_ds_test, batch_size=32, resize=28)  # process the test set and load it in batches

Training set size: 60000, test set size: 10000 

2. Download the built resnet18 network source code file

In order to allow developers to better experience the advantages of the MindSpore framework, MindSpore Model Zoo has added more typical networks and related pre-trained models, involving fields such as computer vision, natural language processing, recommendation systems, and graph neural networks. Among them, the ResNet series network models have also been implemented using MindSpore.

2.1. Download network source files

ResNet-18 is used for the handwritten digit recognition task. Before the network structure defined with MindSpore can be used, resnet.py needs to be downloaded.

# Download the prebuilt network source file; this only needs to be run once.
!wget -N https://modelarts-labs-bj4-v2.obs.cn-north-4.myhuaweicloud.com/course/mindspore/mnist_recognition/src/resnet.py --no-check-certificate

2.2. Modify the number of channels in the network file

Since single-channel grayscale images are used, the input channel count of the first convolutional layer in the original network structure must be changed from 3 to 1, that is, the 3 on line 387 of resnet.py must be changed to 1. The Linux sed command is used here to edit the file.

# This command modifies the original file in place and produces no output
!sed -i '387s/3/1/g' ./resnet.py
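For readers not working in a shell, the same kind of line-targeted, in-place edit can be sketched in Python. The helper name `replace_on_line` and the demo file contents are hypothetical (they are not the real resnet.py); like the `g` flag in the sed expression, `str.replace` changes every occurrence on the chosen line, so it is worth checking that the line contains no other "3".

```python
import os
import tempfile


def replace_on_line(path, line_no, old, new):
    """Replace every occurrence of `old` with `new` on one 1-indexed line, in place."""
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()
    lines[line_no - 1] = lines[line_no - 1].replace(old, new)
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines)


# Demo on a throwaway file with made-up contents (not the real resnet.py).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.py")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("a = 3\nconv = conv(3, 64)\nb = 3\n")
    replace_on_line(demo_path, 2, "3", "1")   # only line 2 is touched
    with open(demo_path, encoding="utf-8") as f:
        result = f.read()

print(result)
```

For the tutorial's file the equivalent call would be `replace_on_line("./resnet.py", 387, "3", "1")`.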

3. Load the resnet18 network

from resnet import resnet18

network = resnet18(class_num=10)  

4. Define loss function and optimizer 

import mindspore
import mindspore.nn as nn

lr = 0.01
momentum = 0.9

net_loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True, reduction='mean')  # loss function
net_opt = nn.Momentum(network.trainable_params(), lr, momentum)  # optimizer
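As a rough illustration of what these two objects compute, the sketch below implements the textbook formulas in plain Python: softmax cross-entropy on integer (sparse) labels averaged over the batch, and a momentum update of the form v = momentum * v + grad, p = p - lr * v. The function names are hypothetical, and the assumption that MindSpore's Momentum follows exactly this non-Nesterov form is mine, not something the code above states.

```python
import math


def sparse_softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) and integer class labels."""
    losses = []
    for row, label in zip(logits, labels):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        losses.append(log_sum_exp - row[label])  # -log softmax(row)[label]
    return sum(losses) / len(losses)


def momentum_step(param, velocity, grad, lr=0.01, momentum=0.9):
    """One momentum update for a single scalar parameter."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


# Uniform logits over 10 classes give a loss of ln(10), about 2.3026.
uniform_loss = sparse_softmax_cross_entropy([[0.0] * 10], [3])

# Two updates with a constant gradient of 1.0: the step grows as velocity builds up.
p, v = momentum_step(1.0, 0.0, 1.0)
p, v = momentum_step(p, v, 1.0)
print(uniform_loss, p, v)
```

A loss near ln(10) is what an untrained 10-class classifier would be expected to produce, which makes the 0.039 reported after one epoch easier to interpret.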

5. Configure running information 

from mindspore import context
context.set_context(mode=context.GRAPH_MODE, device_target="GPU")

6. Call the Model high-level API to train and save the model file

Model training consists of two nested levels of iteration: multiple passes over the whole dataset (epochs), and, within each epoch, single steps over batches of the configured batch size.

In order to simplify the training process, MindSpore encapsulates the Model high-level interface:

  • The user inputs the network, loss function and optimizer to complete the initialization of the Model;

  • Call the train interface for training. The parameters of the train interface include the number of iterations (epoch) and the data set (dataset);

  • Call the Model's eval interface to evaluate the model on a test dataset, and the predict interface to predict the categories of new images;

  • Model saving is the process of persisting training parameters. In the Model class, the model is saved through the callback function (callback).
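The nesting of epochs and steps, together with a step-based save callback, can be sketched as a toy loop. This is not MindSpore's implementation: the function `run_training` is hypothetical, no real training happens inside it, and the file name pattern prefix-epoch_step is taken from the checkpoint listing shown later in this article.

```python
def run_training(max_epochs, steps_per_epoch, save_every, prefix="train_resnet_mnist"):
    """Simulate the epoch/step loops and record when a checkpoint would be written."""
    saved = []
    global_step = 0
    for epoch in range(1, max_epochs + 1):           # outer loop: passes over the dataset
        for step in range(1, steps_per_epoch + 1):   # inner loop: one batch per step
            global_step += 1
            # A forward pass, backward pass and optimizer update would happen here.
            if global_step % save_every == 0:        # the "callback" fires on a step count
                saved.append("%s-%d_%d.ckpt" % (prefix, epoch, step))
    return saved


checkpoints = run_training(max_epochs=1, steps_per_epoch=937, save_every=937)
print(checkpoints)  # ['train_resnet_mnist-1_937.ckpt']
```

With save_every equal to the number of steps per epoch, exactly one checkpoint is produced per epoch, which is the configuration used in the training code that follows.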

Here the data is iterated over for one epoch, and training takes about 30 seconds or less (about 17 seconds in the run shown below).

import os
import time
from mindspore import Model
from mindspore import load_checkpoint, load_param_into_net
from mindspore.train.callback import ModelCheckpoint, CheckpointConfig, LossMonitor, TimeMonitor

model = Model(network, loss_fn=net_loss, optimizer=net_opt, metrics={'acc'})  # initialize the Model

# Training parameters
batch_num = mnist_ds_train.get_dataset_size()
max_epochs = 1

model_path = "./models/ckpt/"
os.system('rm -f {0}*.ckpt {0}*.meta {0}*.pb'.format(model_path))

# Define the callbacks
config_ck = CheckpointConfig(save_checkpoint_steps=batch_num, keep_checkpoint_max=35)
ckpoint_cb = ModelCheckpoint(prefix="train_resnet_mnist", directory=model_path, config=config_ck)

loss_cb = LossMonitor(batch_num)  # print the loss every batch_num steps
start_time = time.time()
model.train(max_epochs, mnist_ds_train, callbacks=[ckpoint_cb, loss_cb])  # train
res = model.eval(mnist_ds_test)  # evaluate on the test set
print("result: ", res)
cost_time = time.time() - start_time
print("total training time: %.1f s" % cost_time)

epoch: 1 step: 937, loss is 0.039122876

result: {'acc': 0.9818709935897436} 

total training time: 16.7 s

The output above shows that after only one epoch of training, which takes about 17 seconds, the ResNet18 model reaches an accuracy above 0.98 on the test set of the handwritten digit recognition task. This accuracy is high enough for practical use. Next, we look at the saved model files and then run prediction on a batch of images to see how the model performs in practice.

List the model files saved during training
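The numbers in the log can be cross-checked with a little arithmetic. The sketch below assumes that process_dataset drops the incomplete final batch of each split; that helper is not shown in this article, so this is an inference from the step count of 937, not a documented fact.

```python
train_len, test_len = 60000, 10000       # dataset sizes printed in section 1
train_batch, test_batch = 64, 32         # batch sizes passed to process_dataset

train_steps = train_len // train_batch   # 937, matches "step: 937" in the log
test_steps = test_len // test_batch      # 312 full test batches
evaluated = test_steps * test_batch      # 9984 test images actually scored

reported_acc = 0.9818709935897436        # value printed by model.eval
correct = round(reported_acc * evaluated)  # 9803 correctly classified images

print(train_steps, evaluated, correct, evaluated - correct)
```

Under that assumption the reported accuracy corresponds to 9803 correct predictions out of 9984 evaluated images, that is, 181 misclassified digits.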

!tree ./models/ckpt/
./models/ckpt/
├── train_resnet_mnist-1_937.ckpt
└── train_resnet_mnist-graph.meta

0 directories, 2 files

A .ckpt file containing the model weights is saved every 937 steps, so one checkpoint is saved in total. The .meta file stores the computational graph of the model.

7. Predict and display a batch of images
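The saved checkpoint can be loaded back into a freshly built network with load_checkpoint and load_param_into_net, both of which are imported in the training code above. The helper below is a minimal sketch: the name `load_trained_resnet18` is hypothetical, and it assumes resnet.py has already been downloaded and patched as in section 2.

```python
def load_trained_resnet18(ckpt_path, class_num=10):
    """Rebuild ResNet18 and restore the weights stored in a .ckpt file."""
    # Imports are kept local so that defining this helper does not itself
    # require MindSpore or the downloaded resnet.py to be importable.
    from mindspore import load_checkpoint, load_param_into_net
    from resnet import resnet18

    net = resnet18(class_num=class_num)       # same architecture as during training
    param_dict = load_checkpoint(ckpt_path)   # dict mapping parameter names to values
    load_param_into_net(net, param_dict)      # copy the saved weights into the network
    return net


# Example call, using the checkpoint path from the listing above:
# restored = load_trained_resnet18("./models/ckpt/train_resnet_mnist-1_937.ckpt")
```

The restored network can then be wrapped in a Model and used with predict in the same way as the in-memory network in the next section.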

import numpy as np
from PIL import Image
import mindspore
import mindspore.ops as ops
from mindspore import Tensor

dic_ds_test = mnist_ds_test.create_dict_iterator(output_numpy=True)
ds_test = next(dic_ds_test)
images_test = ds_test["image"]
labels_test = ds_test["label"]
output = model.predict(Tensor(images_test))
pred_labels = ops.Argmax(output_type=mindspore.int32)(output)
print("Predicted value --> ", pred_labels)  # print the predicted labels
print("True value --> ", labels_test)  # print the true labels
batch_img = np.squeeze(images_test[0])
for i in range(1, len(labels_test)):
    batch_img = np.hstack((batch_img, np.squeeze(images_test[i])))  # concatenate the batch horizontally for display
Image.fromarray((batch_img*255).astype('uint8'), mode="L")  # display the concatenated images

Predicted value --> [0 3 8 5 2 0 1 2 8 5 0 3 4 5 4 5 6 2 9 4 8 0 1 1 7 5 6 7 8 5 9 4]
True value --> [0 3 1 5 2 0 1 2 8 5 0 3 4 5 4 5 6 2 4 4 8 0 1 1 7 5 6 7 8 5 9 4]

Comparing the two arrays shows that 2 of the 32 images are misclassified: a 1 is predicted as 8 (third position) and a 4 is predicted as 9 (nineteenth position). The other 30 predictions are correct, so the model still performs well on this batch.
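Rather than comparing the two arrays by eye, the mismatches can be listed programmatically. The sketch below copies the predicted and true labels from the printed output into plain Python lists; in the notebook the same comparison could be applied to pred_labels (after converting it to a NumPy array) and labels_test.

```python
pred_labels = [0, 3, 8, 5, 2, 0, 1, 2, 8, 5, 0, 3, 4, 5, 4, 5,
               6, 2, 9, 4, 8, 0, 1, 1, 7, 5, 6, 7, 8, 5, 9, 4]
true_labels = [0, 3, 1, 5, 2, 0, 1, 2, 8, 5, 0, 3, 4, 5, 4, 5,
               6, 2, 4, 4, 8, 0, 1, 1, 7, 5, 6, 7, 8, 5, 9, 4]

# Collect (index, true label, predicted label) wherever the two disagree.
mismatches = [(i, t, p)
              for i, (p, t) in enumerate(zip(pred_labels, true_labels))
              if p != t]
batch_accuracy = 1 - len(mismatches) / len(true_labels)

print(mismatches)      # [(2, 1, 8), (18, 4, 9)]
print(batch_accuracy)  # 0.9375
```

The indices are 0-based, so index 2 is the third image in the concatenated strip and index 18 is the nineteenth.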

Origin blog.csdn.net/m0_54776464/article/details/126073463