Experiment 3 Pattern recognition based on neural networks
1. Purpose of the experiment:
Understand the structure and principles of the BP (backpropagation) neural network and the convolutional neural network, master how the backpropagation learning algorithm trains neurons, and understand the backpropagation formulas. By building pattern recognition examples with a BP neural network and a convolutional neural network, become familiar with the principle, structure, and working process of feedforward neural networks.
2. Experimental principle
The BP learning algorithm minimizes the error through a backward learning process. Starting from the output nodes, it propagates the weight corrections caused by the total error backward, layer by layer, to the first hidden layer (that is, the hidden layer closest to the input layer). A BP network contains not only input nodes and output nodes but also one or more layers of hidden nodes. The input signal is first propagated forward to the hidden nodes; the outputs of the hidden nodes, after being transformed by the activation function, are then propagated to the output nodes, which give the final result.
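The backward pass described above can be illustrated with a minimal sketch. It assumes a hypothetical toy network (one input, one sigmoid hidden unit, one linear output, squared-error loss); the names `forward`, `backward`, `loss_fn`, and `sgd_step` are illustrative and are not part of the experiment code. The error at the output node is propagated back to the hidden-layer weights by the chain rule.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """Forward pass: input -> hidden (sigmoid) -> output (linear)."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)
    out = w2 * h + b2
    return h, out

def loss_fn(params, x, target):
    """Squared-error loss for one sample."""
    _, out = forward(params, x)
    return 0.5 * (out - target) ** 2

def backward(params, x, target):
    """Backward pass: propagate the output error back to every parameter."""
    w1, b1, w2, b2 = params
    h, out = forward(params, x)
    d_out = out - target          # dL/d(out) at the output node
    d_w2 = d_out * h              # gradient of the output-layer weight
    d_b2 = d_out
    d_h = d_out * w2              # error passed back to the hidden node
    d_z = d_h * h * (1.0 - h)     # through the sigmoid derivative
    d_w1 = d_z * x                # gradient of the hidden-layer weight
    d_b1 = d_z
    return (d_w1, d_b1, d_w2, d_b2)

def sgd_step(params, grads, lr):
    """Plain gradient descent update."""
    return tuple(p - lr * g for p, g in zip(params, grads))

params = (0.5, 0.1, -0.3, 0.2)   # illustrative initial values
x, target = 1.5, 1.0
before = loss_fn(params, x, target)
params_new = sgd_step(params, backward(params, x, target), lr=0.1)
after = loss_fn(params_new, x, target)
print("loss before:", before, "loss after:", after)
```

The same chain-rule bookkeeping is what `tf.GradientTape` automates for the multi-layer networks used in this experiment.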
Each artificial neuron in a convolutional neural network responds only to the units within a local region of its input (its receptive field). A convolutional neural network consists of three parts: the first part is the input layer, the second part is a combination of n convolutional layers and pooling layers, and the third part is a fully connected multilayer perceptron classifier. This structure lets convolutional neural networks exploit the two-dimensional structure of the input data. During training, the network automatically learns suitable convolution kernels, and combinations of those kernels, for the input images, and then makes a classification decision.
3. Experimental conditions:
Independently install 64-bit Python 3.6 or above and third-party libraries such as TensorFlow 2.0 or above, numpy, matplotlib, and pylab. Create a new folder named datasets (C:\Users\A\.keras\datasets) and put mnist.npz into it.
4. Experimental content:
1. Analyze the MNIST dataset, select 55,000 training samples, 5,000 validation samples, and 10,000 test samples, and set the batch size to 100.
2. Design a BP network structure model with 2 hidden layers. The output layer uses softmax regression followed by the cross-entropy loss function. Set parameters such as the learning rate, the number of training steps, and the number of batches, and fill in the training and test results in Table 1 below. (Note: in Table 1 below, both the cells marked ? and the blank cells need to be completed.)
Table 1 Parameters and training and test results of the BP network
Parameters (shared by all rows): batch size: 100; learning rate: 0.01; training epochs: 10; number of input neurons: 784; number of output neurons: 10; output layer: softmax regression followed by cross-entropy loss
Number of hidden layers: 2
Hidden layer activation function: relu
Number of neurons in the hidden layers: first hidden layer: 512 neurons, 401920 parameters; second hidden layer: 512 neurons, 262656 parameters
| Learning algorithm | Training results (training loss, validation accuracy) | Test accuracy |
| --- | --- | --- |
| Stochastic gradient descent + momentum method: optimizers.SGD(lr=0.01) | loss: 0.4541553258895874, Acc: 0.9074 | 0.8866 |
| Adagrad algorithm: optimizers.Adagrad(lr=0.01) | loss: 0.0027855050284415483, Acc: 0.984 | 0.9819 |
| Adam algorithm: optimizers.Adam(lr=0.01) | loss: 0.17099925875663757, Acc: 0.972 | 0.9636 |
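The parameter counts of the fully connected layers can be checked by hand: a dense layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases. The sketch below assumes that each 28x28 image is flattened to 784 inputs; the helper name `dense_params` is illustrative.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Assumption: each 28x28 image is flattened to 784 inputs,
# followed by two hidden layers of 512 units and a 10-unit output layer.
layer_sizes = [28 * 28, 512, 512, 10]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts, sum(counts))
```

These values should match the per-layer figures printed by `network.summary()` for the same architecture.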
3. For a BP network structure model with 4 hidden layers (each with 512 neurons), use relu as the hidden layer activation function and Adam as the learning algorithm. Design the different models, compare their accuracy, and fill in the training and test results in Table 2 below. (Note: in Table 2 below, both the cells marked ? and the blank cells need to be completed.)
Table 2 Training and test results of the BP network with 4 hidden layers
| Model | Training results (training loss, validation accuracy) | Test accuracy |
| --- | --- | --- |
| Exponentially decaying learning rate: initial learning rate = 0.01, decay rate = 0.96, decay steps = 1000 | loss: 0.0005220939638093114, Acc: 0.9766 | 0.9776 |
| Regularization only, with a fixed learning rate: learning rate = 0.01, regularization factor = 0.001 | loss: 0.17735381424427032, Acc: 0.971 | 0.9676 |
| Exponentially decaying learning rate and regularization: regularization factor = 0.001, initial learning rate = 0.01, decay rate = 0.96, decay steps = 1000 | loss: 0.02895130217075348, Acc: 0.9806 | 0.9793 |
| Dropout only, with a fixed learning rate: learning rate = 0.01, dropout rate = 0.5 | loss: 0.1732, Acc: 0.967 | 0.958 |
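As a rough illustration of the exponentially decaying learning rate used in Table 2, the sketch below reproduces the formula lr = initial_lr * decay_rate^(step / decay_steps), which is the rule documented for `tf.keras.optimizers.schedules.ExponentialDecay`. The function name `exponential_decay` and the `staircase` handling are written here for illustration only.

```python
def exponential_decay(step, initial_lr=0.01, decay_rate=0.96,
                      decay_steps=1000, staircase=False):
    """Learning rate after `step` optimizer updates."""
    exponent = step / decay_steps
    if staircase:
        # Decay in discrete jumps every `decay_steps` updates
        exponent = step // decay_steps
    return initial_lr * decay_rate ** exponent

# 55,000 samples at batch size 100 give 550 updates per pass over the data
for step in (0, 550, 1000, 5500):
    print(step, exponential_decay(step))
```

With these settings the learning rate shrinks by 4% every 1000 updates, so the steps become gradually smaller as training proceeds.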
4. For the MNIST dataset, select 55,000 training samples, 5,000 validation samples, and 10,000 test samples, and set the batch size to 100. Then build a convolutional neural network model and fill in its structural parameters in Table 3. Use relu as the hidden layer activation function, select the learning algorithm, set the parameters, and fill in the training and test results of the different models in Table 4. (Note: in Table 3, fill in the names of the input layer, convolutional layers, pooling layers, fully connected layers, and output layer. The BP neural network in Table 4 uses the structure with 4 hidden layers, each containing 512 neurons with the relu activation function.)
Table 3 Structural model parameters of the convolutional neural network
| Layer | Name | Number of neurons | Filter size | Number of convolution kernels | Stride | Activation function | Padding | Output feature map size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Layer 1 | Convolution layer 1 | 6*26*26 | 3*3 | 6 | 1 | relu | 0 | 26*26 |
| Layer 2 | Pooling layer 1 | - | 2*2 | 6 | 2 | relu | 0 | 13*13 |
| Layer 3 | Convolution layer 2 | 16*11*11 | 3*3 | 16 | 1 | relu | 0 | 11*11 |
| Layer 4 | Pooling layer 2 | - | 2*2 | 16 | 2 | relu | 0 | 5*5 |
| Layer 5 | Fully connected layer 1 | 120 | - | - | - | relu | 0 | - |
| Layer 6 | Fully connected layer 2 | 84 | - | - | - | relu | 0 | - |
| Layer 7 | Fully connected layer 3 | 10 | - | - | - | relu | 0 | - |
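The feature map sizes in Table 3 can be reproduced with a short calculation. The sketch below assumes a 28x28 single-channel input, no padding, convolution layers with one bias per kernel, parameter-free pooling layers, and a flatten step before the first fully connected layer; the helper names `conv_out` and `conv_params` are illustrative.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one axis: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_params(kernel, in_channels, out_channels):
    """One kernel*kernel*in_channels filter plus one bias per output channel."""
    return (kernel * kernel * in_channels + 1) * out_channels

size, channels = 28, 1   # assumption: 28x28 single-channel MNIST input
shapes, params = [], []
for kind, kernel, stride, out_channels in [
    ("conv", 3, 1, 6), ("pool", 2, 2, 6),
    ("conv", 3, 1, 16), ("pool", 2, 2, 16),
]:
    if kind == "conv":
        params.append(conv_params(kernel, channels, out_channels))
    size = conv_out(size, kernel, stride)
    channels = out_channels
    shapes.append((channels, size, size))

n_in = channels * size * size   # flattened input to the first fully connected layer
for n_out in (120, 84, 10):
    params.append(n_in * n_out + n_out)
    n_in = n_out

print(shapes)               # feature map shapes after layers 1 to 4
print(params, sum(params))  # trainable parameters per layer and in total
```

Under these assumptions the whole network has far fewer trainable parameters than a single 784-to-512 dense layer, which is one practical consequence of weight sharing.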
Table 4 Training and test results of the convolutional neural network and the BP neural network
| Learning algorithm and parameter settings | Model | Training results (training loss, validation accuracy) | Test accuracy |
| --- | --- | --- | --- |
| Adam, learning rate 0.01 | Convolutional neural network with regularization (regularization factor = 0.001) | loss: 0.10340672731399536, Acc: 0.9824 | 0.9799 |
| Adam, learning rate 0.01 | BP neural network with regularization (regularization factor = 0.001) | loss: 0.17735381424427032, Acc: 0.971 | 0.9676 |
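5. Experimental report requirements:
1. Give the corresponding results according to the experimental content.
2. Analyze and compare the effects of different learning algorithms on the training results and test results of the BP network.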
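① Stochastic gradient descent trains the fastest but is also the least accurate; neither its training accuracy nor its test accuracy is high. Its cost function value, 0.4541553258895874, is also the largest and much higher than that of the other two algorithms, which indicates that the model fits poorly. The reason is that each update relies on a randomly drawn sample, which carries little information, so the search is somewhat blind and easily drifts off course.
② Adagrad performs best of the three: it has the highest accuracy in both training and testing, and its cost function value is far smaller than that of the other two algorithms. Judging from the close validation and test accuracies (0.984 and 0.9819), Adagrad did not overfit severely.
③ Adam also improves the training and test accuracy, but it does not reach the accuracy of Adagrad.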
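To make the difference between the three learning algorithms concrete, the following sketch writes out textbook forms of their update rules for a single scalar weight. It is an illustration under stated assumptions, not the exact Keras implementation (for example, Keras starts the Adagrad accumulator at a small non-zero value); the function names and the toy objective f(w) = w**2 are chosen here for demonstration.

```python
import math

def sgd_update(w, g, lr=0.01):
    """Plain SGD: step against the gradient."""
    return w - lr * g

def adagrad_update(w, g, accum, lr=0.01, eps=1e-7):
    """Adagrad: divide the step by the root of the accumulated squared gradients."""
    accum = accum + g * g
    return w - lr * g / (math.sqrt(accum) + eps), accum

def adam_update(w, g, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-7):
    """Adam: bias-corrected running averages of the gradient and its square."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

def minimize(kind, steps=50):
    """Run `steps` updates on f(w) = w**2 (gradient 2*w), starting from w = 1.0."""
    w, accum, m, v = 1.0, 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = 2.0 * w
        if kind == "sgd":
            w = sgd_update(w, g)
        elif kind == "adagrad":
            w, accum = adagrad_update(w, g, accum)
        else:
            w, m, v = adam_update(w, g, m, v, t)
    return w

for kind in ("sgd", "adagrad", "adam"):
    print(kind, minimize(kind))
```

Adagrad and Adam rescale each step by a history of past gradients, whereas SGD uses the raw gradient; which one works best on a given task still has to be determined experimentally, as in Table 1.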
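3. Analyze and compare the effects of different models (exponentially decaying learning rate, regularization, and Dropout layers) on the training results and test results.
① Using an exponentially decaying learning rate improves the training and test accuracy only slightly, but it reduces the cost function value considerably.
② Regularization addresses overfitting fairly well. The loss here is slightly larger than without regularization, but in terms of accuracy the model's performance does not improve much.
③ A dropout layer has an effect similar to regularization: during training it temporarily drops units from the network with a given probability, which improves training. The effect is not obvious in this experiment, where a dropout layer is added only after the third hidden layer, as follows: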
network = Sequential([layers.Dense(512, activation='relu'),
layers.Dense(512, activation='relu'),
layers.Dense(512, activation='relu'),
layers.Dropout(0.5),
layers.Dense(512, activation='relu'),
layers.Dense(10, activation='softmax')])
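④ Combining the exponentially decaying learning rate with regularization gives a higher loss than using the exponentially decaying learning rate alone, which indicates that overfitting is reduced to some extent, while the loss is lower than with regularization alone. Model performance improves: both the training accuracy and the test accuracy increase.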
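The dropout behaviour discussed in item ③ can be sketched in a few lines. This is a minimal illustration of inverted dropout, in which surviving units are scaled by 1/(1 - rate) during training and the layer does nothing at inference; the function name `dropout` and the explicit `rng` argument are assumptions made for this example.

```python
import random

def dropout(values, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate`, scale survivors by 1/(1 - rate)."""
    if not training or rate == 0.0:
        return list(values)   # inference: pass activations through unchanged
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]

rng = random.Random(0)        # fixed seed so the example is repeatable
activations = [1.0] * 10
print(dropout(activations, 0.5, training=True, rng=rng))
print(dropout(activations, 0.5, training=False, rng=rng))
```

Scaling the surviving units keeps the expected activation the same in training and inference, which is why no correction is needed when the model is evaluated.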
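4. Summarize the similarities and differences between the BP network and the convolutional neural network in pattern recognition.
Differences:
① The BP network and the convolutional neural network compute differently:
The BP neural network is a multilayer feedforward neural network trained with the error backpropagation algorithm.
The convolutional neural network is a feedforward neural network that includes convolution operations and has a deep structure.
② The convolutional network uses shared weights to reduce the number of connections between layers.
③ BP network structure: input layer, hidden layers, and output layer.
Convolutional network structure: input layer, convolutional layers, pooling layers, fully connected layers, and output layer.
④ The BP network is fully connected, whereas the convolutional network uses local receptive fields.
Similarities:
① Both the BP neural network and the convolutional neural network are feedforward neural networks.
② In both, the input layer receives the image and the output layer gives the multi-class classification result.
③ The number of intermediate layers and the number of neurons in each layer can be set freely according to the task, and performance varies with the structure.
④ Both use forward propagation to compute the output and backpropagation to adjust the weights and biases.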
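The contrast between full connection and local perception with shared weights can be seen in a small pure-Python sketch of a valid (no padding) 2D convolution. The function name `conv2d_valid` and the 4x4 example image are illustrative and not part of the experiment code.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide one shared kernel over the image without padding."""
    r, c = len(image), len(image[0])
    a, b = len(kernel), len(kernel[0])
    out = []
    for i in range(0, r - a + 1, stride):
        row = []
        for j in range(0, c - b + 1, stride):
            # Each output value sees only an a x b patch of the input
            row.append(sum(image[i + u][j + v] * kernel[u][v]
                           for u in range(a) for v in range(b)))
        out.append(row)
    return out

image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, -1]]   # the same 4 weights are reused at every position
feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

Here 4 shared weights produce a 3x3 feature map, whereas a fully connected layer mapping the same 16 inputs to 9 outputs would need 16 * 9 = 144 weights.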
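5. Experimental experience.
① Mastered the relationship between the input size and the output size of a convolutional layer:
Input: $r \times c$; convolution kernel: $a \times b$; stride: $s$ (here $s = 1$ for the convolutional layers). Output: height $= \lfloor (r-a)/s \rfloor + 1$, width $= \lfloor (c-b)/s \rfloor + 1$
② Mastered the calculation of the number of trainable parameters.
③ Mastered building BP and convolutional networks, using optimization algorithms, and training and prediction.
Code (example: 4 hidden layers with exponential decay):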
import tensorflow as tf
from tensorflow.keras import datasets, layers, optimizers, Sequential
import matplotlib.pyplot as plt
import numpy as np
import pylab
def preprocess(x, y):
    # Scale pixel values to [0, 1] and cast labels to int32
    x = tf.cast(x, dtype=tf.float32) / 255.
    y = tf.cast(y, dtype=tf.int32)
    return x, y
(x, y), (x_test, y_test) = datasets.mnist.load_data()  # download or load the dataset
print('datasets:', x.shape, y.shape, x.min(), x.max())  # print shapes and value range
# Training: data preparation
imgs = x_test[0:5]  # select test images 0 to 4
labs = y_test[0:5]
#print(labs)
plot_imgs = np.hstack(imgs)  # concatenate the five images into one row
plt.imshow(plot_imgs, cmap='gray')  # use a grayscale colormap
#pylab.show()  # display the test images
# Split the 60,000 training images into 55,000 training and 5,000 validation samples
x_train, x_val = tf.split(x, num_or_size_splits=[55000, 5000])
y_train, y_val = tf.split(y, num_or_size_splits=[55000, 5000])
batchsz = 100  # batch size
db = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# Note: repeat(10) makes one iteration over db cover the training set 10 times
db = db.map(preprocess).shuffle(55000).batch(batchsz).repeat(10)
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.map(preprocess).batch(batchsz)
ds_val = tf.data.Dataset.from_tensor_slices((x_val, y_val))
ds_val = ds_val.map(preprocess).batch(batchsz)
# Build the network model
network = Sequential([layers.Dense(512, activation='relu'),
                      layers.Dense(512, activation='relu'),
                      layers.Dense(512, activation='relu'),
                      layers.Dense(512, activation='relu'),
                      layers.Dense(10, activation='softmax')])
network.build(input_shape=(batchsz, 28*28))  # input shape: (batch size, 784)
network.summary()  # print the network parameters
# Set up the optimizer
exponential_decay = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=1000, decay_rate=0.96)
optimizer = tf.keras.optimizers.Adam(exponential_decay)
#optimizer = optimizers.Adam(learning_rate=0.01)  # Adam with a fixed learning rate; large steps are fast but can oscillate
#optimizer = optimizers.SGD(0.01, decay=1e-2)  # SGD with a fixed initial learning rate
# Train in batches
for echo in range(10):
    for step, (x, y) in enumerate(db):  # read one batch of samples from the training set
        with tf.GradientTape() as tape:  # record operations for gradient computation
            # [b, 28, 28] => [b, 784]
            x = tf.reshape(x, (-1, 28*28))
            # [b, 784] => [b, 10]
            out = network(x, training=True)
            # [b] => [b, 10]
            y_onehot = tf.one_hot(y, depth=10)
            # [b]
            # Compute the cross-entropy loss
            loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_onehot, out, from_logits=False))
        grads = tape.gradient(loss, network.trainable_variables)  # compute gradients
        optimizer.apply_gradients(zip(grads, network.trainable_variables))  # update the trainable parameters
        if step % 100 == 0:
            print('echo=', echo, ' step=', step, 'loss:', float(loss))  # print the training loss
        # Model evaluation
        if step % 500 == 0:
            total, total_correct = 0., 0
            for i, (x, y) in enumerate(ds_val):
                # [b, 28, 28] => [b, 784]
                x = tf.reshape(x, (-1, 28*28))
                # [b, 784] => [b, 10]
                out = network(x)  # network output
                # [b, 10] => [b]
                pred = tf.argmax(out, axis=1)
                pred = tf.cast(pred, dtype=tf.int32)
                # bool type
                correct = tf.equal(pred, y)
                # bool tensor => int tensor => numpy
                total_correct += tf.reduce_sum(tf.cast(correct, dtype=tf.int32)).numpy()
                total += x.shape[0]
            print(' step=', step, 'Evaluate Acc:', total_correct/total)  # print the validation accuracy
print("train is over")
# Test the model
total, total_correct = 0., 0
for i, (x, y) in enumerate(db_test):  # read one batch of test data
    # [b, 28, 28] => [b, 784]
    x = tf.reshape(x, (-1, 28*28))
    # [b, 784] => [b, 10]
    out = network(x)  # network output
    # [b, 10] => [b]
    pred = tf.argmax(out, axis=1)
    pred = tf.cast(pred, dtype=tf.int32)
    # bool type
    correct = tf.equal(pred, y)
    # bool tensor => int tensor => numpy
    btestacc = tf.reduce_sum(tf.cast(correct, dtype=tf.int32)).numpy()
    total_correct += btestacc
    total += x.shape[0]
    #print('batch', i, 'test acc=', btestacc/x.shape[0])
    #print(y)
    #print(pred)
print('Test Acc:', total_correct/total)
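The loss used in the code above (softmax output followed by categorical cross-entropy) can be written out for a single sample as a minimal pure-Python sketch. The function names and the example logits are illustrative; with a one-hot target, the cross-entropy reduces to the negative log of the probability assigned to the true class.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Cross-entropy against a one-hot target: -log(p[label])."""
    return -math.log(probs[label])

logits = [2.0, 1.0, 0.1]   # illustrative pre-softmax scores for 3 classes
probs = softmax(logits)
print(probs)
print("loss if the true class is 0:", cross_entropy(probs, 0))
print("loss if the true class is 2:", cross_entropy(probs, 2))
```

The loss is small when the network assigns high probability to the correct class and grows as that probability approaches zero.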
Training results: