Detection and recognition of the MNIST dataset based on a BP neural network (PyTorch and TensorFlow versions)

1. About the authors

Ren Longgang, male, School of Electronic Information, Xi'an Polytechnic University, 2022 graduate student
Research direction: machine vision and artificial intelligence
Email: [email protected]

Zhang Siyi, female, School of Electronic Information, Xi'an Polytechnic University, 2022 graduate student, Zhang Hongwei Artificial Intelligence Research Group
Research direction: machine vision and artificial intelligence
Email: [email protected]

2. Detection and recognition of MNIST dataset based on BP neural network

2.1 Introduction to BP neural network

First of all, as the name suggests, a BP neural network has two parts: "BP" and "neural network". BP is short for Back Propagation. A BP network can learn and store a large number of input-output mapping relationships without the mathematical equations describing those mappings being specified in advance. Its learning rule uses steepest (gradient) descent: the weights and thresholds of the network are adjusted continuously through backpropagation so as to minimize the network's sum of squared errors. Its main characteristic is that signals propagate forward while errors propagate backward. The algorithm flow chart is as follows:
[Figure: BP algorithm flow chart]
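
The weight update referred to above can be written compactly. In a standard formulation (the symbols are introduced here for illustration: $\eta$ is the learning rate and $E$ the network's sum of squared errors), each weight is moved against its error gradient:

$$\Delta w_{ij} = -\eta \,\frac{\partial E}{\partial w_{ij}}$$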

2.2 Neuron model

Each neuron receives input signals from other neurons, each passed through a weighted connection. The neuron sums these signals to obtain a total input value, compares that total with its threshold (simulating the threshold potential), and then passes the result through an "activation function" to produce the final output (simulating cell firing). This output is then passed on, layer by layer, as input to subsequent neurons. The neuron model is as follows:
[Figure: neuron model]
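
In symbols (a standard sketch; $x_i$ denotes the inputs, $w_i$ the connection weights, $\theta$ the threshold, and $f$ the activation function):

$$y = f\left(\sum_{i=1}^{n} w_i x_i - \theta\right)$$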

2.3 Activation function

The purpose of introducing an activation function is to introduce non-linearity into the model. Without one (equivalently, with the identity activation f(x) = x), a neural network of any depth is still just a linear map, so its approximation ability is quite limited and it cannot solve linearly inseparable problems. For this reason a nonlinear activation function is introduced, which makes the expressive power of a deep neural network much stronger.
Activation functions commonly used in BP neural network algorithms (their formulas are collected after this list):
1) Sigmoid (logistic), also known as the S-shaped growth curve; it squashes inputs into the range (0, 1), which makes it work well as the output of a classifier.
[Figure: Sigmoid function formula and curve]
2) The Tanh function (hyperbolic tangent) fixes the drawback that the logistic output is not zero-centered, but it still suffers from vanishing gradients.
[Figure: Tanh function formula and curve]
3) The ReLU function is a general-purpose activation function that addresses the shortcomings of Sigmoid and tanh, and it is the one used in most cases today.
[Figure: ReLU function formula and curve]
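
For reference, the standard definitions of the three activation functions listed above:

$$\mathrm{Sigmoid}(x)=\frac{1}{1+e^{-x}},\qquad \tanh(x)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}},\qquad \mathrm{ReLU}(x)=\max(0,x)$$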

2.4 BP Neural Network Infrastructure

The BP network consists of an input layer, a hidden layer, and an output layer.
[Figure: BP network structure with input, hidden, and output layers]
Input layer: the input end of the information; it reads the data you feed in.
Hidden layer: the processing end of the information; the number of hidden layers can be configured (here there is one hidden layer with q neurons).
Output layer: the output end of the information, i.e. the result we want.
For the neural network in the figure above, with a single hidden layer, the BP procedure consists of two stages:
the first stage is the forward propagation of the signal, from the input layer through the hidden layer to the output layer;
the second stage is the backward propagation of the error, from the output layer back through the hidden layer to the input layer, adjusting in turn the hidden-to-output weights and biases and then the input-to-hidden weights and biases.

2.5 BP Neural Network Forward Propagation and Backpropagation

Forward propagation process
Forward propagation is the process in which information enters the network at the input layer, passes through the computation of each layer in turn, and produces the final output-layer result. The input-to-hidden and hidden-to-output computations are sketched below.
Taking y1 as an example: y1 is computed from b1, b2, ..., bq, each multiplied by its corresponding weight, and y2 ... yn are obtained in the same way.
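
A sketch of the two propagation steps in symbols, for the single-hidden-layer network of Section 2.4 (the symbols are introduced here for illustration: $v_{ih}$ are input-to-hidden weights, $w_{hj}$ hidden-to-output weights, $\gamma_h$ and $\theta_j$ thresholds, $f$ the activation function, with $d$ inputs, $q$ hidden neurons, and $n$ outputs):

$$b_h = f\left(\sum_{i=1}^{d} v_{ih}\,x_i - \gamma_h\right),\quad h=1,\dots,q
\qquad\qquad
y_j = f\left(\sum_{h=1}^{q} w_{hj}\,b_h - \theta_j\right),\quad j=1,\dots,n$$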

Backpropagation process
The basic idea is to adjust the network parameters by calculating the error between the output layer and the expected value, so that the error becomes smaller.
The error is computed as the sum of squared differences between the expected outputs and the actual outputs (conventionally halved to simplify the gradient):

$$E = \frac{1}{2}\sum_{k}\left(d_k - y_k\right)^2$$

where $d_k$ is the expected value and $y_k$ the actual output of the k-th output neuron.
How do we adjust the weights to make the loss function smaller? Common choices are listed below (a short PyTorch sketch follows the list):
1. Gradient descent
2. Stochastic gradient descent (SGD)
3. Adam (adaptive optimization)
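
A minimal sketch of how these three options can be constructed with PyTorch's torch.optim (the model variable and the learning rates here are illustrative; the article's own training code below uses SGD with momentum):

import torch.optim as optim

# plain (full-batch) gradient descent is SGD with no momentum, applied to the whole dataset at once
opt_gd = optim.SGD(model.parameters(), lr=0.01)
# stochastic gradient descent with momentum, updated per mini-batch
opt_sgd = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)
# Adam adapts the step size for each parameter
opt_adam = optim.Adam(model.parameters(), lr=0.001)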

3. Detection and recognition experiment of MNIST data set based on BP neural network

3.1 Introduction to MNIST Dataset

The MNIST dataset is available at http://yann.lecun.com/exdb/mnist/ and consists of four parts:
• Training set images: train-images-idx3-ubyte.gz (9.9 MB, 47 MB after decompression, contains 60,000 samples)
• Training set labels: train-labels-idx1-ubyte.gz (29 KB, 60 KB after decompression, contains 60,000 labels)
• Test set images: t10k-images-idx3-ubyte.gz (1.6 MB, 7.8 MB after decompression, contains 10,000 samples)
• Test set labels: t10k-labels-idx1-ubyte.gz (5 KB, 10 KB after decompression, contains 10,000 labels)
The labels in the MNIST dataset are the digits 0-9, represented as one-hot vectors: all dimensions are 0 except for a single 1 in the position of the digit. For example, label 0 is encoded as ([1, 0, 0, 0, 0, 0, 0, 0, 0, 0]) and label 3 as ([0, 0, 0, 1, 0, 0, 0, 0, 0, 0]). Therefore, the full set of training labels, mnist.train.labels, is a [60000, 10] matrix.
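
A tiny illustration (hypothetical helper code using NumPy) of how a digit label is turned into a one-hot vector:

import numpy as np

label = 3
one_hot = np.eye(10, dtype=np.float32)[label]  # row 3 of the 10x10 identity matrix
print(one_hot)  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]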
[Figure: one-hot label representation]
Training images and test images:
Each image contains 28×28 pixels. The MNIST dataset flattens the two-dimensional image data into a vector of length 28×28 = 784. Therefore, mnist.train.images in the training set is a tensor of shape [60000, 784]: the first dimension indexes the images and the second indexes the pixels within each image. Pixel intensity values lie between 0 and 1. (Figure 8: partial data visualization)
[Figure 8: partial data visualization]
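
To make the flattening described above concrete, a small sketch (the random array is a stand-in for the real image data):

import numpy as np

images = np.random.rand(60000, 28, 28).astype(np.float32)  # stand-in for the 60,000 MNIST images
flattened = images.reshape(len(images), 28 * 28)            # each 28x28 image becomes a 784-dimensional vector
print(flattened.shape)  # (60000, 784)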

3.2 Code implementation (PyTorch version)

Define the transform object, which specifies how the images in the dataset should be preprocessed:

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,)), ])

Load and download the training and test datasets; here we use the API provided by torchvision to download them:

train_set = datasets.MNIST('data',  # folder to download into
                           download=not os.path.exists('train_set'),  # whether to download; skip if already downloaded
                           train=True,  # use the training split
                           transform=transform  # transform to apply to each image
                           )
print(train_set)
test_set = datasets.MNIST('data',
                          download=not os.path.exists('test_set'),
                          train=False,
                          transform=transform
                          )
print(test_set)

Construct DataLoader objects for training and testing datasets:

train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=True)

dataiter = iter(train_loader)
images, labels = next(dataiter)  # fetch one batch

print(images.shape)
print(labels.shape)

In the above, batch_size=128, so each batch contains 128 images; each image has a single channel (grayscale) and size 28x28, so images.shape is torch.Size([128, 1, 28, 28]) and labels.shape is torch.Size([128]). Take one image and plot it:

plt.imshow(images[0].numpy().squeeze(), cmap='gray_r');

Define the neural network:

class NerualNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        """
        Define the first linear layer:
        the input is the image (28x28),
        the output feeds the first hidden layer, size 128.
        """
        self.linear1 = nn.Linear(28 * 28, 128)
        # ReLU activation for the first hidden layer
        self.relu1 = nn.ReLU()
        """
        Define the second linear layer:
        the input is the output of the first hidden layer,
        the output feeds the second hidden layer, size 64.
        """
        self.linear2 = nn.Linear(128, 64)
        # ReLU activation for the second hidden layer
        self.relu2 = nn.ReLU()
        """
        Define the third linear layer:
        the input is the output of the second hidden layer,
        the output is the output layer, size 10.
        """
        self.linear3 = nn.Linear(64, 10)
        # the final output is normalized with (log-)softmax
        self.softmax = nn.LogSoftmax(dim=1)

        # the layers above can also be written directly with nn.Sequential:
        self.model = nn.Sequential(nn.Linear(28 * 28, 128),
                                   nn.ReLU(),
                                   nn.Linear(128, 64),
                                   nn.ReLU(),
                                   nn.Linear(64, 10),
                                   nn.LogSoftmax(dim=1)
                                   )

Network forward pass (the forward method of the class defined above):

def forward(self, x):
    """
    Forward pass of the network.
    x: image data of shape (batch_size, 1, 28, 28)
    """
    # first reshape x to (batch_size, 784)
    x = x.view(x.shape[0], -1)
    # then propagate forward through the layers
    x = self.linear1(x)
    x = self.relu1(x)
    x = self.linear2(x)
    x = self.relu2(x)
    x = self.linear3(x)
    x = self.softmax(x)
    # the whole sequence above can be replaced by x = self.model(x)
    return x

Model instantiation:
Define the loss function; here we choose the negative log-likelihood loss (NLLLoss), which is often used for classification tasks (combined with the LogSoftmax output above, it is equivalent to a cross-entropy loss).
Define the optimizer; here we use stochastic gradient descent with the learning rate set to 0.003 and the momentum set to 0.9 (momentum helps speed up and stabilize convergence).

model = NerualNetwork()
criterion = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)

Model training process:

time0 = time()  # record the start time
epochs = 15  # train for 15 epochs in total
for e in range(epochs):
    running_loss = 0  # accumulated loss for this epoch
    for images, labels in train_loader:
        # forward pass to get the predictions
        output = model(images)
        # compute the loss
        loss = criterion(output, labels)
        # backpropagate
        loss.backward()
        # update the weights
        optimizer.step()
        # clear the gradients
        optimizer.zero_grad()
        # accumulate the loss
        running_loss += loss.item()
    else:
        # after each epoch, print the average loss for that epoch
        print("Epoch {} - Training loss: {}".format(e+1, running_loss / len(train_loader)))
# print the total training time
print("\nTraining Time (in minutes) =", (time() - time0) / 60)


correct_count, all_count = 0, 0
model.eval()  # put the model in evaluation mode
# load images from test_loader batch by batch
for images, labels in test_loader:
    # check every image in this batch
    for i in range(len(labels)):
        logps = model(images[i])  # forward pass to get the prediction
        probab = list(logps.detach().numpy()[0])  # list of the 10 class (log-)probabilities; [0] selects the single image predicted in this pass
        pred_label = probab.index(max(probab))  # the index of the largest value is the predicted digit
        true_label = labels.numpy()[i]
        if (true_label == pred_label):  # check whether the prediction is correct
            correct_count += 1
        all_count += 1

print("Number Of Images Tested =", all_count)
print("\nModel Accuracy ={}%".format((correct_count / all_count)*100))
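
The loop above classifies one image at a time; an equivalent, slightly more idiomatic sketch evaluates whole batches and disables gradient tracking (it assumes the same model and test_loader defined above):

correct, total = 0, 0
model.eval()
with torch.no_grad():
    for images, labels in test_loader:
        logps = model(images)          # (batch, 10) log-probabilities
        preds = logps.argmax(dim=1)    # predicted digit for each image
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print("Model Accuracy = {:.2f}%".format(100 * correct / total))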

Full code:

import os
import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from time import time
from torchvision import datasets, transforms
from torch import nn, optim

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,)), ])

train_set = datasets.MNIST('data',  # folder to download into
                           download=not os.path.exists('train_set'),  # whether to download; skip if already downloaded
                           train=True,  # use the training split
                           transform=transform  # transform to apply to each image
                           )
print(train_set)
test_set = datasets.MNIST('data',
                          download=not os.path.exists('test_set'),
                          train=False,
                          transform=transform
                          )
print(test_set)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=True)

dataiter = iter(train_loader)
images, labels = next(dataiter)  # fetch one batch

print(images.shape)
print(labels.shape)

plt.imshow(images[0].numpy().squeeze(), cmap='gray_r');


class NerualNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        """
        Define the first linear layer:
        the input is the image (28x28),
        the output feeds the first hidden layer, size 128.
        """
        self.linear1 = nn.Linear(28 * 28, 128)
        # ReLU activation for the first hidden layer
        self.relu1 = nn.ReLU()
        """
        Define the second linear layer:
        the input is the output of the first hidden layer,
        the output feeds the second hidden layer, size 64.
        """
        self.linear2 = nn.Linear(128, 64)
        # ReLU activation for the second hidden layer
        self.relu2 = nn.ReLU()
        """
        Define the third linear layer:
        the input is the output of the second hidden layer,
        the output is the output layer, size 10.
        """
        self.linear3 = nn.Linear(64, 10)
        # the final output is normalized with (log-)softmax
        self.softmax = nn.LogSoftmax(dim=1)

        # the layers above can also be written directly with nn.Sequential:
        self.model = nn.Sequential(nn.Linear(28 * 28, 128),
                                   nn.ReLU(),
                                   nn.Linear(128, 64),
                                   nn.ReLU(),
                                   nn.Linear(64, 10),
                                   nn.LogSoftmax(dim=1)
                                   )

    def forward(self, x):
        """
        Forward pass of the network.
        x: image data of shape (batch_size, 1, 28, 28)
        """
        # first reshape x to (batch_size, 784)
        x = x.view(x.shape[0], -1)
        # then propagate forward through the layers
        x = self.linear1(x)
        x = self.relu1(x)
        x = self.linear2(x)
        x = self.relu2(x)
        x = self.linear3(x)
        x = self.softmax(x)
        # the whole sequence above can be replaced by x = self.model(x)
        return x


model = NerualNetwork()
criterion = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.003, momentum=0.9)

time0 = time()  # record the start time
epochs = 15  # train for 15 epochs in total
for e in range(epochs):
    running_loss = 0  # accumulated loss for this epoch
    for images, labels in train_loader:
        # forward pass to get the predictions
        output = model(images)
        # compute the loss
        loss = criterion(output, labels)
        # backpropagate
        loss.backward()
        # update the weights
        optimizer.step()
        # clear the gradients
        optimizer.zero_grad()
        # accumulate the loss
        running_loss += loss.item()
    else:
        # after each epoch, print the average loss for that epoch
        print("Epoch {} - Training loss: {}".format(e+1, running_loss / len(train_loader)))
# print the total training time
print("\nTraining Time (in minutes) =", (time() - time0) / 60)


correct_count, all_count = 0, 0
model.eval()  # put the model in evaluation mode
# load images from test_loader batch by batch
for images, labels in test_loader:
    # check every image in this batch
    for i in range(len(labels)):
        logps = model(images[i])  # forward pass to get the prediction
        probab = list(logps.detach().numpy()[0])  # list of the 10 class (log-)probabilities; [0] selects the single image predicted in this pass
        pred_label = probab.index(max(probab))  # the index of the largest value is the predicted digit
        true_label = labels.numpy()[i]
        if (true_label == pred_label):  # check whether the prediction is correct
            correct_count += 1
        all_count += 1

print("Number Of Images Tested =", all_count)
print("\nModel Accuracy ={}%".format((correct_count / all_count)*100))
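
The PyTorch listing does not persist the trained network; if you want to save it, analogous to what the TensorFlow version below does, a minimal sketch (the file name mnist_bp.pt is arbitrary):

# save only the learned parameters
torch.save(model.state_dict(), 'mnist_bp.pt')

# later: rebuild the architecture and load the parameters back
model2 = NerualNetwork()
model2.load_state_dict(torch.load('mnist_bp.pt'))
model2.eval()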

3.3 Code implementation (TensorFlow version)

Load the data

#load the data
mnist = tf.keras.datasets.mnist
(train_x,train_y),(test_x,test_y) = mnist.load_data()
print('\n train_x:%s, train_y:%s, test_x:%s, test_y:%s'%(train_x.shape,train_y.shape,test_x.shape,test_y.shape))

Data preprocessing

#normalize the images and convert them to float32 tensors; cast the labels to int16
X_train,X_test = tf.cast(train_x/255.0,tf.float32),tf.cast(test_x/255.0,tf.float32)
y_train,y_test = tf.cast(train_y,tf.int16),tf.cast(test_y,tf.int16)

Building a model (a neural network with a single hidden layer)

#build the model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(28,28)))     #Flatten layer declares the shape of the input data
model.add(tf.keras.layers.Dense(128,activation='relu'))     #hidden layer: fully connected, 128 nodes, ReLU activation
model.add(tf.keras.layers.Dense(10,activation='softmax'))   #output layer: fully connected, 10 nodes, softmax activation
print('\n',model.summary())     #print the network structure and parameter counts

Configure the model training method

#configure the training method
#Adam optimizer with Keras default parameters; sparse categorical cross-entropy loss; sparse categorical accuracy metric
model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['sparse_categorical_accuracy'])
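
Train the model. This step comes from the complete listing below and is repeated here so the walkthrough stays self-contained (batch size 64, 30 epochs, 20% of the training data held out for validation):

history = model.fit(X_train, y_train, batch_size=64, epochs=30, validation_split=0.2)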

Evaluate the model

#evaluate the model
model.evaluate(X_test,y_test,verbose=2)    #verbose=2 prints one line per evaluation, used to check how well the model generalizes

Save the model

#save the entire model
model.save('mnist_weights.h5')
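
To reuse the saved model later it can be loaded back with Keras; a minimal sketch, assuming the mnist_weights.h5 file written above:

reloaded = tf.keras.models.load_model('mnist_weights.h5')
reloaded.evaluate(X_test, y_test, verbose=2)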

Results visualization

#visualize the results
print(history.history)
loss = history.history['loss']          #training loss
val_loss = history.history['val_loss']  #validation loss (from validation_split)
acc = history.history['sparse_categorical_accuracy']            #training accuracy
val_acc = history.history['val_sparse_categorical_accuracy']    #validation accuracy

Running results
[Figure: training log output]
Test results
[Figure: test results]
Complete code:

########handwritten digit dataset##########
###########save the model############
########1 hidden (fully connected) layer##########
#60,000 training samples and 10,000 test samples, 28x28-pixel grayscale images
#hidden layer activation: ReLU
#output layer activation: softmax (for multi-class classification)
#loss function: sparse categorical cross-entropy
#the input layer has 784 nodes, the hidden layer has 128 neurons, and the output layer has 10 nodes
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np

import time
print('--------------')
nowtime = time.strftime('%Y-%m-%d %H:%M:%S')
print(nowtime)

#select the GPU
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
# gpus = tf.config.experimental.list_physical_devices('GPU')
# tf.config.experimental.set_memory_growth(gpus[0],True)
#initialization
plt.rcParams['font.sans-serif'] = ['SimHei']   #SimHei font (originally used to render Chinese plot labels)

#load the data
mnist = tf.keras.datasets.mnist
(train_x,train_y),(test_x,test_y) = mnist.load_data()
print('\n train_x:%s, train_y:%s, test_x:%s, test_y:%s'%(train_x.shape,train_y.shape,test_x.shape,test_y.shape))

#data preprocessing
#X_train = train_x.reshape((60000,28*28))
#Y_train = train_y.reshape((60000,28*28))       #not needed: the tf.keras.layers.Flatten() layer below reshapes the input instead

#normalize the images and convert them to float32 tensors; cast the labels to int16
X_train,X_test = tf.cast(train_x/255.0,tf.float32),tf.cast(test_x/255.0,tf.float32)
y_train,y_test = tf.cast(train_y,tf.int16),tf.cast(test_y,tf.int16)

#build the model
model = tf.keras.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(28,28)))     #Flatten layer declares the shape of the input data
model.add(tf.keras.layers.Dense(128,activation='relu'))     #hidden layer: fully connected, 128 nodes, ReLU activation
model.add(tf.keras.layers.Dense(10,activation='softmax'))   #output layer: fully connected, 10 nodes, softmax activation
print('\n',model.summary())     #print the network structure and parameter counts

#configure the training method
#Adam optimizer with Keras default parameters; sparse categorical cross-entropy loss; sparse categorical accuracy metric
model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['sparse_categorical_accuracy'])

#train the model
#batch size 64, 30 epochs, validation_split=0.2 (48,000 training samples, 12,000 validation samples)
print('--------------')
nowtime = time.strftime('%Y-%m-%d %H:%M:%S')
print('Time before training: '+str(nowtime))

history = model.fit(X_train,y_train,batch_size=64,epochs=30,validation_split=0.2)

print('--------------')
nowtime = time.strftime('%Y-%m-%d %H:%M:%S')
print('Time after training: '+str(nowtime))
#evaluate the model
model.evaluate(X_test,y_test,verbose=2)    #verbose=2 prints one line per evaluation, used to check how well the model generalizes

#save only the model weights (path kept from the original)
#model.save_weights('C:\\Users\\xuyansong\\Desktop\\深度学习\\python\\MNIST\\模型参数\\mnist_weights.h5')
#save the entire model
model.save('mnist_weights.h5')


#visualize the results
print(history.history)
loss = history.history['loss']          #training loss
val_loss = history.history['val_loss']  #validation loss (from validation_split)
acc = history.history['sparse_categorical_accuracy']            #training accuracy
val_acc = history.history['val_sparse_categorical_accuracy']    #validation accuracy

plt.figure(figsize=(10,3))

plt.subplot(121)
plt.plot(loss,color='b',label='train')
plt.plot(val_loss,color='r',label='test')
plt.ylabel('loss')
plt.legend()

plt.subplot(122)
plt.plot(acc,color='b',label='train')
plt.plot(val_acc,color='r',label='test')
plt.ylabel('Accuracy')
plt.legend()

#pause 5 seconds and then close the figure; otherwise it stays open and keeps occupying memory
#enable the lines below as needed
#plt.ion()       #turn on interactive mode
#plt.show()
#plt.pause(5)
#plt.close()

#use the model to make predictions
plt.figure()
for i in range(10):
    num = np.random.randint(1,10000)

    plt.subplot(2,5,i+1)
    plt.axis('off')
    plt.imshow(test_x[num],cmap='gray')
    demo = tf.reshape(X_test[num],(1,28,28))
    y_pred = np.argmax(model.predict(demo))
    plt.title('Label: '+str(test_y[num])+'\nPredicted: '+str(y_pred))
y_pred = np.argmax(model.predict(X_test[0:5]),axis=1)
print('X_test[0:5]: %s'%(X_test[0:5].shape))
print('y_pred: %s'%(y_pred))

plt.ion()       #turn on interactive mode
plt.show()
plt.pause(5)
plt.close()

4. Issues to note when using BP neural networks

a) Parameter selection is very important. When training a BP network with the basic BP algorithm, the learning rate, the target mean squared error, and the initial weights and thresholds all affect training, and choosing reasonable values for them benefits the training of the network. In the basic BP algorithm the learning rate stays constant throughout training: if it is too large, the algorithm may oscillate and become unstable; if it is too small, convergence is slow and training takes a long time.
b) Paralysis can occur. Because the objective function being optimized is very complex, flat regions inevitably appear when neuron outputs are close to 0 or 1. In these regions the weight error changes very little, so the training process almost stops.
c) The search step length is determined in advance. To implement the BP algorithm, the step size of each iteration cannot be found with a traditional one-dimensional line search; instead, an update rule for the step size must be given to the network in advance, which makes the algorithm less efficient.
Accuracy visualization results for different numbers of training epochs:

[Figure: accuracy curves for different numbers of training epochs]

4.1 Code-related issues

The num_workers setting in DataLoader
• num_workers=0 means that only the main process loads batch data, which can become a bottleneck.
• num_workers=1 means that a single worker process loads the batch data and the main process does not take part in data loading; this is also slow.
• num_workers>0 means that the specified number of worker processes load the data while the main process does not participate. Increasing num_workers also increases CPU and memory consumption, so the right value depends on the batch size and on the machine.
• A common starting point is to set num_workers equal to the number of CPU cores on the machine.
• The best approach is to increase num_workers gradually until the training speed no longer improves, and then stop; a short sketch follows this list.
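
A minimal sketch of how num_workers might be set on the DataLoader used earlier (the one-worker-per-CPU-core starting point is a rule of thumb, not a guarantee; tune it empirically):

import os
import torch

num_workers = os.cpu_count() or 0  # rule-of-thumb starting point: one worker per CPU core
train_loader = torch.utils.data.DataLoader(
    train_set,
    batch_size=128,
    shuffle=True,
    num_workers=num_workers,  # worker processes that load batches in parallel
)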

Original post: blog.csdn.net/m0_37758063/article/details/131074950