Handwritten digit recognition
Task requirements
The goal is to recognize images of handwritten digits 0-9. Specifically, each grayscale image of a handwritten digit (28 x 28 pixels) is classified into one of 10 categories (0-9). The model must be implemented with the PaddlePaddle framework.
Data set and environment
- Data set source: MNIST, a classic data set in the ML field, containing 60,000 training images and 10,000 test images
- Data description: The data consists of images and labels. Each image is a 28*28 pixel matrix, and each label is one of the 10 digits from 0 to 9
- Operating environment: PaddlePaddle 2.0 + CUDA 11.1 + PyCharm
Tip: PaddlePaddle 2.0 has been released. Its newly added high-level API simplifies the model building process and is convenient for quick hands-on practice!
Model building process
Next, we conduct the experiment around the following process, as shown in the figure:
Further, during model training, we mainly carry out the tasks shown in the following figure:
Tip: For the overall process framework (why we follow this process), refer to teacher Li Hongyi's explanation of regression (the Pokémon example). For BP neural networks (especially backpropagation and the chain rule), refer to the watermelon book (summary) and the flower book (detailed). For gradient-based optimization, you can also refer to Li Hang's Statistical Learning Methods and the gradient descent part of Li Hongyi's course. I will write a summary later and will not expand on it here. Please forgive me~
Data preprocessing
PaddlePaddle has the MNIST data set built in, so we can simply call it. Define the training set train_dataset and the test set eval_dataset, then use the Normalize interface to normalize the images.
import paddle
import numpy as np
import matplotlib.pyplot as plt
import paddle.vision.transforms as T
# Load and preprocess the data
transform = T.Normalize(mean=[127.5], std=[127.5])
# Training data set
train_dataset = paddle.vision.datasets.MNIST(mode='train', transform=transform)
# Evaluation data set
eval_dataset = paddle.vision.datasets.MNIST(mode='test', transform=transform)
print('Training samples: {}, evaluation samples: {}'.format(len(train_dataset), len(eval_dataset)))
Why do we need to normalize? Here is a pre-processed picture for illustration.
print('Image:')
print(type(train_dataset[0][0]))
print(train_dataset[0][0])
print('Label:')
print(type(train_dataset[0][1]))
print(train_dataset[0][1])
# Visualize the sample
plt.figure()
plt.imshow(train_dataset[0][0].reshape([28, 28]), cmap=plt.cm.binary)
plt.show()
As shown in the figure, after normalization the pixel values no longer range from 0 to 255 but are compressed to the range -1 to 1, which is more convenient for later computation. The normalization applies the given mean and standard deviation to each channel of the image.
Next, let's look at what the Normalize interface can do.
class paddle.vision.transforms.Normalize(mean=0.0, std=1.0, data_format='CHW', to_rgb=False, keys=None)
We have just mentioned the processing method of image normalization, and in this interface, the calculation process is as follows:
$$output[channel] = (input[channel] - mean[channel]) / std[channel]$$
Definitions of the parameters used here:
- mean: the mean used to normalize each channel
- std: the standard deviation used to normalize each channel
- data_format (str, optional): the format of the data; it must be 'HWC' or 'CHW'. Default value: 'CHW'
This method returns the normalized image; the return type is numpy ndarray (a NumPy n-dimensional array object).
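To make the formula concrete, here is a minimal NumPy sketch of the same arithmetic with mean=127.5 and std=127.5. The helper name normalize_pixels is hypothetical and only mirrors the formula above; it is not the PaddlePaddle implementation.

```python
import numpy as np

def normalize_pixels(pixels, mean=127.5, std=127.5):
    """Apply (input - mean) / std element-wise, as in the formula above."""
    return (np.asarray(pixels, dtype="float32") - mean) / std

# Raw gray-scale values span 0..255
raw = np.array([0, 127.5, 255], dtype="float32")
normalized = normalize_pixels(raw)

# 0 maps to -1, 127.5 maps to 0, and 255 maps to 1
print(normalized)
```

Choosing the mean and the standard deviation both equal to half of the pixel range is what centers the data at 0 and scales it to the interval from -1 to 1.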
Model networking
Now we design the neural network: a fully connected network with a single hidden layer. The input layer has 784 neurons (28 pixels * 28 pixels), the hidden layer has 512 neurons (this number can be chosen freely), and the output layer has 10 neurons (this is a multi-class classification task over the digits 0-9).
The model building code is as follows:
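# Build the network structure
network = paddle.nn.Sequential(
    paddle.nn.Flatten(),          # flatten (28, 28) => (784)
    paddle.nn.Linear(784, 512),   # hidden layer: linear transformation
    paddle.nn.ReLU(),             # activation function
    paddle.nn.Linear(512, 10)     # output layer
)
# Wrap the network in a high-level Model
model = paddle.Model(network)
# Print a summary of the model
model.summary((1, 28, 28))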
Here we use Sequential to define the neural network. Note: the Sequential interface is the sequential container provided by PaddlePaddle. Within it:
1. The Flatten interface flattens a range of contiguous dimensions of a Tensor into a single dimension. In short, it flattens the 28*28 pixels into a vector of 784 values.
2. The Linear interface sets the hidden layer and the output layer as linear transformation layers. That is:
$$Out = XW + b$$
3. The ReLU interface applies the ReLU activation function to the result of a neuron's linear transformation and passes the result to the next layer as the output value:
$$ReLU(x) = \max(0, x)$$
After that, the model is wrapped and summarized to confirm that it was built successfully.
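The three building blocks above can be sketched as a plain NumPy forward pass. This is only an illustration of the shapes and formulas, assuming randomly initialized weights; the names W1, b1, W2, b2 and forward are hypothetical and are not PaddlePaddle objects.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialized parameters, for shape illustration only (not trained weights)
W1, b1 = rng.normal(size=(784, 512)) * 0.01, np.zeros(512)
W2, b2 = rng.normal(size=(512, 10)) * 0.01, np.zeros(10)

def forward(images):
    """images: array of shape (batch, 28, 28); returns scores of shape (batch, 10)."""
    x = images.reshape(images.shape[0], -1)  # Flatten: (batch, 28, 28) -> (batch, 784)
    h = x @ W1 + b1                          # Linear: Out = XW + b
    h = np.maximum(0, h)                     # ReLU(x) = max(0, x)
    return h @ W2 + b2                       # output layer: one score per digit

batch = rng.uniform(-1, 1, size=(4, 28, 28))
logits = forward(batch)
print(logits.shape)  # (4, 10)

# Trainable parameters: weights plus biases of both Linear layers
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)  # 407050
```

The parameter count follows directly from the layer sizes: 784*512 + 512 for the hidden layer and 512*10 + 10 for the output layer.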
Training model
Now we configure the loss function, the optimizer, and the evaluation metric. Here we use gradient descent to optimize the parameters of the neural network, with the Adam optimizer dynamically adjusting the learning rate of each parameter. PaddlePaddle provides the corresponding interface. It is recommended to read the interface documentation and the Adam algorithm paper.
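As an illustration of how Adam adapts the step size, here is a minimal scalar sketch of the update rule as described in the Adam paper. It is not PaddlePaddle's implementation: the function name adam_minimize and the toy objective f(x) = (x - 3)^2 are hypothetical, the defaults beta1=0.9, beta2=0.999 and eps=1e-8 are the values suggested in the paper, and the learning rate 0.1 is chosen only so that the toy problem converges quickly.

```python
import math

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a scalar function given its gradient, using the Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # step scaled per parameter
    return x

# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); the minimum is at x = 3
x_min = adam_minimize(lambda x: 2 * (x - 3.0), x0=0.0)
print(x_min)  # should end close to 3.0
```

Dividing by the square root of the running squared-gradient average is what gives each parameter its own effective learning rate.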
Then start training the model.
# Configure the optimizer, loss function, and evaluation metric
model.prepare(paddle.optimizer.Adam(learning_rate=0.001, parameters=network.parameters()),
              paddle.nn.CrossEntropyLoss(),
              paddle.metric.Accuracy())
# Run the full training process
model.fit(train_dataset,   # training data set
          eval_dataset,    # evaluation data set
          epochs=5,        # total number of training epochs
          batch_size=64,   # batch size used for training
          verbose=1)       # log display mode
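To show what the cross-entropy loss measures, here is a minimal pure-Python sketch for a single sample. It assumes the usual convention that the loss first applies softmax to the raw output scores and then takes the negative log-probability of the true class; the helper names softmax and cross_entropy are hypothetical and this is not the PaddlePaddle implementation.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[label])

# An untrained model that scores all 10 digits equally has loss ln(10), about 2.3026
uniform_loss = cross_entropy([0.0] * 10, label=3)

# A model that strongly prefers the correct digit has a loss close to 0
confident_logits = [0.0] * 10
confident_logits[3] = 10.0
confident_loss = cross_entropy(confident_logits, label=3)

print(uniform_loss, confident_loss)
```

This gives a useful sanity check when reading training logs: a loss near 2.3 at the start of training means the network is still guessing uniformly among the 10 digits.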
Evaluation model
Evaluate the model to obtain its accuracy.
# Evaluate the model; returns results for the loss and metric configured in prepare
result = model.evaluate(eval_dataset, verbose=1)
print(result)
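For reference, accuracy is simply the fraction of samples whose highest-scoring class equals the label. The following NumPy sketch uses made-up scores for a 3-class toy problem; the array values and variable names are hypothetical and unrelated to the actual evaluation results.

```python
import numpy as np

# Made-up output scores for 4 samples and 3 classes (illustration only)
logits = np.array([
    [2.0, 0.1, 0.3],
    [0.2, 0.1, 1.5],
    [0.9, 0.8, 0.1],
    [0.1, 3.0, 0.2],
])
labels = np.array([0, 2, 1, 1])

predictions = np.argmax(logits, axis=1)          # highest score per sample
accuracy = float(np.mean(predictions == labels)) # fraction of correct predictions

print(predictions)  # [0 2 0 1]
print(accuracy)     # 0.75
```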
Model prediction
Batch prediction
Use predict for batch prediction.
According to the official documentation, the high-level API provides a predict interface so that users can run predictions with a trained model and verify it. You only need to pass the data to be predicted into the interface; it runs the trained model on that data and returns the computed predictions.
The return value is a list whose number of elements corresponds to the number of model outputs:
- Single-output model:
[(numpy_ndarray_1, numpy_ndarray_2, …, numpy_ndarray_n)]
- Multi-output model:
[(numpy_ndarray_1, numpy_ndarray_2, …, numpy_ndarray_n), (numpy_ndarray_1, numpy_ndarray_2, …, numpy_ndarray_n), …]
- Note: numpy_ndarray_n is the prediction obtained by running the model on the corresponding original data, and n corresponds to the amount of data in the prediction data set.
# Run prediction
result = model.predict(eval_dataset)
# Define a plotting helper
def show_img(img, predict):
    plt.figure()
    plt.title('predict: {}'.format(predict))
    plt.imshow(img.reshape([28, 28]), cmap=plt.cm.binary)
    plt.show()
# Show a few samples
indexs = [2, 15, 38, 211]
for idx in indexs:
    show_img(eval_dataset[idx][0], np.argmax(result[0][idx]))
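The indexing result[0][idx] follows from the return format described above: result[0] is the model's only output, and result[0][idx] is the array of 10 scores for sample idx. The sketch below builds a fake result with the same nesting to show how the predicted digit is read out. It assumes each array has shape (1, 10), which corresponds to predicting one sample at a time; the digits and the name fake_result are made up for illustration.

```python
import numpy as np

# Hypothetical "true" digits used only to build fake prediction scores
digits = [7, 2, 1, 0, 4]

def fake_scores(digit):
    """Return a (1, 10) score array whose largest entry is at the given digit."""
    scores = np.zeros((1, 10), dtype="float32")
    scores[0, digit] = 1.0
    return scores

# Same nesting as a single-output model: [(ndarray_1, ndarray_2, ..., ndarray_n)]
fake_result = [tuple(fake_scores(d) for d in digits)]

# Read the predicted class of every sample from the first (and only) output
predicted = [int(np.argmax(fake_result[0][idx])) for idx in range(len(digits))]
print(predicted)  # [7, 2, 1, 0, 4]
```

Because each array holds a single row, np.argmax over the flattened array directly gives the digit index; with larger batches an axis argument would be needed.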
Single picture prediction
Use model.predict_batch to predict a single image or a small number of images.
# Read a single image
image = eval_dataset[501][0]
# Predict the single image
result = model.predict_batch([image])
# Visualize the result
show_img(image, np.argmax(result))
Deployment
Save model
# Save the model for continued fine-tuning later
model.save('finetuning/mnist')
Continue fine-tuning
from paddle.static import InputSpec
# Wrap the model; the inputs argument is passed so that the inference model can be saved later
model_2 = paddle.Model(network, inputs=[InputSpec(shape=[-1, 28, 28], dtype='float32', name='image')])
# Load the previously saved training checkpoint
model_2.load('finetuning/mnist')
# Configure the model
model_2.prepare(paddle.optimizer.Adam(learning_rate=0.001, parameters=network.parameters()),
                paddle.nn.CrossEntropyLoss(),
                paddle.metric.Accuracy())
# Run the full training process
model_2.fit(train_dataset,
            eval_dataset,
            epochs=2,
            batch_size=64,
            verbose=1)
Save the prediction model
# Save the model for later inference deployment
model_2.save('infer/mnist', training=False)