Introduction to Artificial Intelligence--Pattern Recognition Experiment Based on Neural Network

Experiment 3: Pattern recognition experiment based on neural networks

1. Purpose of the experiment:

Understand the structure and principle of the BP (backpropagation) neural network and the convolutional neural network, master the training process of the backpropagation learning algorithm for neurons, and understand the backpropagation formulas. By building pattern recognition examples with a BP network and a convolutional neural network, become familiar with the principle, structure, and working process of feedforward neural networks.

2. Experimental principle

The BP learning algorithm minimizes the error through a reverse learning process: starting from the output nodes, the weight corrections caused by the total error are propagated backward layer by layer until the first hidden layer (the hidden layer closest to the input layer) is reached. A BP network contains not only input and output nodes but also one or more layers of hidden nodes. The input signal is first propagated forward to the hidden nodes; after the activation function is applied, the outputs of the hidden nodes are passed on to the output nodes, which produce the final result.
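For reference, the following is a minimal NumPy sketch (not part of the experiment code) of one backpropagation update for a network with a single sigmoid hidden layer and a mean-squared-error loss; the layer sizes and learning rate are illustrative assumptions.

import numpy as np

# Illustrative sizes and learning rate (assumptions, not the experiment's settings)
n_in, n_hid, n_out, lr = 4, 8, 3, 0.1
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(n_in, n_hid)), np.zeros(n_hid)
W2, b2 = rng.normal(size=(n_hid, n_out)), np.zeros(n_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=(1, n_in))     # one input sample
t = np.array([[1.0, 0.0, 0.0]])    # target output

# Forward pass: input -> hidden -> output
h = sigmoid(x @ W1 + b1)
y = sigmoid(h @ W2 + b2)

# Backward pass: propagate the error from the output layer back to the hidden layer
delta_out = (y - t) * y * (1 - y)             # output-layer error term
delta_hid = (delta_out @ W2.T) * h * (1 - h)  # hidden-layer error term

# Gradient-descent corrections of the weights and biases
W2 -= lr * h.T @ delta_out; b2 -= lr * delta_out.sum(axis=0)
W1 -= lr * x.T @ delta_hid; b1 -= lr * delta_hid.sum(axis=0)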

The artificial neurons of a convolutional neural network respond to surrounding units within a limited receptive field. A convolutional neural network consists of three parts: the first part is the input layer; the second part is a combination of n convolutional layers and pooling layers; the third part is a fully connected multilayer perceptron classifier. This structure allows convolutional neural networks to exploit the two-dimensional structure of the input data. For an image, the convolutional neural network automatically learns the best convolution kernels and combinations of these kernels, and then makes a judgment based on them.
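As a small illustration (a sketch with assumed layer parameters, not the model trained later in this experiment), a single convolution-plus-pooling stage in tf.keras transforms a 28×28 grayscale image as follows:

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 28, 28, 1))                       # one 28x28 grayscale image
conv = layers.Conv2D(6, kernel_size=3, activation='relu')  # 6 kernels of size 3x3, stride 1, no padding
pool = layers.MaxPool2D(pool_size=2, strides=2)            # 2x2 max pooling

h = conv(x)
print(h.shape)   # (1, 26, 26, 6): (28 - 3)/1 + 1 = 26
h = pool(h)
print(h.shape)   # (1, 13, 13, 6): 26/2 = 13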

3. Experimental conditions:

Independently install 64-bit Python 3.6 or above and third-party libraries such as TensorFlow 2.0 or above, numpy, matplotlib, and pylab. Create a new folder named datasets (C:\Users\A\.keras\datasets) and put mnist.npz into that folder.
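A quick sanity check of the environment can look like the following sketch (it assumes mnist.npz is already in the .keras\datasets folder, so tf.keras reads the local file instead of downloading it):

import tensorflow as tf
from tensorflow.keras import datasets

print(tf.__version__)   # should print 2.0 or above

# With mnist.npz placed in .keras/datasets, this loads the local copy
(x, y), (x_test, y_test) = datasets.mnist.load_data()
print(x.shape, y.shape, x_test.shape, y_test.shape)   # (60000, 28, 28) (60000,) (10000, 28, 28) (10000,)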

4. Experimental content:

1. Analyze the MNIST dataset: select 55,000 training samples, 5,000 validation samples, and 10,000 test samples, and set the batch size to 100.

2. Design a BP network structure model with 2 hidden layers. The output layer uses the cross-entropy loss function after softmax regression. Set parameters such as the learning rate, the number of training epochs, and the batch size, and fill in the training and test results in Table 1 below. (Note: in Table 1 below, both the "?" entries and the blank cells need to be completed.)

Table 1 Parameters and training and test results of the BP network

Common parameters: batch size = 100; learning rate = 0.01; training epochs = 10; number of input neurons = 784; number of output neurons = 10; output layer activation: softmax with cross-entropy loss.

Number of hidden layers: 2; hidden layer activation function: relu.
First hidden layer: 512 neurons, 401,920 parameters. Second hidden layer: 512 neurons, 262,656 parameters.

| Learning algorithm | Training results (training loss, validation accuracy) | Test accuracy |
| --- | --- | --- |
| Stochastic gradient descent with momentum: optimizers.SGD(lr=0.01) | loss: 0.4542, Acc: 0.9074 | 0.8866 |
| Adagrad: optimizers.Adagrad(lr=0.01) | loss: 0.0028, Acc: 0.984 | 0.9819 |
| Adam: optimizers.Adam(lr=0.01) | loss: 0.1710, Acc: 0.972 | 0.9636 |
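For reference, a minimal sketch of how the 2-hidden-layer model and the three optimizers compared in Table 1 might be set up with tf.keras (the momentum value of 0.9 is an assumption, since Table 1 only specifies the learning rate; recent TensorFlow versions use the learning_rate argument instead of lr):

from tensorflow.keras import layers, Sequential, optimizers

# Two hidden layers of 512 relu neurons, softmax output over the 10 digit classes
model = Sequential([layers.Dense(512, activation='relu'),
                    layers.Dense(512, activation='relu'),
                    layers.Dense(10, activation='softmax')])
model.build(input_shape=(None, 28 * 28))
model.summary()

sgd_momentum = optimizers.SGD(learning_rate=0.01, momentum=0.9)   # SGD with momentum (momentum value assumed)
adagrad = optimizers.Adagrad(learning_rate=0.01)
adam = optimizers.Adam(learning_rate=0.01)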

3. For a BP network structure model with 4 hidden layers (each containing 512 neurons), use relu as the hidden layer activation function and the Adam algorithm as the learning algorithm. Design and compare the accuracy of different models, and fill in the training and test results in Table 2 below. (Note: in Table 2 below, both the "?" entries and the blank cells need to be completed.)

Table 2 Training and test results of the BP network with 4 hidden layers

| Model | Training results (training loss, validation accuracy) | Test accuracy |
| --- | --- | --- |
| Exponentially decaying learning rate (initial learning rate = 0.01, decay rate = 0.96, decay steps = 1000) | loss: 0.0005, Acc: 0.9766 | 0.9776 |
| Regularization only, fixed learning rate (learning rate = 0.01, regularization factor = 0.001) | loss: 0.1774, Acc: 0.971 | 0.9676 |
| Exponentially decaying learning rate and regularization (regularization factor = 0.001, initial learning rate = 0.01, decay rate = 0.96, decay steps = 1000) | loss: 0.0290, Acc: 0.9806 | 0.9793 |
| Dropout only, fixed learning rate (learning rate = 0.01, dropout rate = 0.5) | loss: 0.1732, Acc: 0.967 | 0.958 |
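The exponential-decay schedule used in the first and third rows appears in the code at the end of this report. For the other two rows, the following is a sketch of how L2 regularization and the dropout layer might be added to the 4-hidden-layer model (putting the regularizer on every hidden layer is an assumption; the dropout placement follows the model shown later in the analysis):

from tensorflow.keras import layers, Sequential, regularizers

reg = regularizers.l2(0.001)   # regularization factor 0.001

# Regularization-only variant: L2 penalty on each hidden layer's weights
reg_model = Sequential([layers.Dense(512, activation='relu', kernel_regularizer=reg) for _ in range(4)]
                       + [layers.Dense(10, activation='softmax')])

# Dropout-only variant: one Dropout(0.5) layer after the third hidden layer
drop_model = Sequential([layers.Dense(512, activation='relu'),
                         layers.Dense(512, activation='relu'),
                         layers.Dense(512, activation='relu'),
                         layers.Dropout(0.5),
                         layers.Dense(512, activation='relu'),
                         layers.Dense(10, activation='softmax')])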

4. For the MNIST dataset, select 55,000 training samples, 5,000 validation samples, and 10,000 test samples, and set the batch size to 100. Then build a convolutional neural network model and fill in its structural parameters in Table 3, using relu as the hidden layer activation function. Select the learning algorithm, set the parameters, and fill in the training and test results of the different models in Table 4 below. (Note: in Table 3, fill in the names of the input layer, convolutional layers, pooling layers, fully connected layers, and output layer; the BP neural network in Table 4 uses the network structure model with 4 hidden layers, each containing 512 neurons with the relu activation function.)

Table 3 Structural model parameters of the convolutional neural network

| Layer | Name | Number of neurons | Filter size | Number of convolution kernels | Stride | Activation function | Padding | Output feature map size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Layer 1 | Convolutional layer 1 | 6×26×26 | 3×3 | 6 | 1 | relu | 0 | 26×26 |
| Layer 2 | Pooling layer 1 | - | 2×2 | 6 | 2 | relu | 0 | 13×13 |
| Layer 3 | Convolutional layer 2 | 16×11×11 | 3×3 | 16 | 1 | relu | 0 | 11×11 |
| Layer 4 | Pooling layer 2 | - | 2×2 | 16 | 2 | relu | 0 | 5×5 |
| Layer 5 | Fully connected layer 1 | 120 | - | - | - | relu | 0 | - |
| Layer 6 | Fully connected layer 2 | 84 | - | - | - | relu | 0 | - |
| Layer 7 | Fully connected layer 3 | 10 | - | - | - | relu | 0 | - |
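The code at the end of this report only shows the BP network; the following is a sketch of a tf.keras model matching the layer parameters in Table 3 (input shape 28×28×1 with valid padding; using softmax on the final 10-unit layer is an assumption consistent with the classification task):

from tensorflow.keras import layers, Sequential

cnn = Sequential([
    layers.Conv2D(6, kernel_size=3, strides=1, activation='relu',
                  input_shape=(28, 28, 1)),                          # layer 1: 6 kernels of 3x3  -> 26x26x6
    layers.MaxPool2D(pool_size=2, strides=2),                        # layer 2: 2x2 pooling       -> 13x13x6
    layers.Conv2D(16, kernel_size=3, strides=1, activation='relu'),  # layer 3: 16 kernels of 3x3 -> 11x11x16
    layers.MaxPool2D(pool_size=2, strides=2),                        # layer 4: 2x2 pooling       -> 5x5x16
    layers.Flatten(),
    layers.Dense(120, activation='relu'),                            # layer 5: fully connected layer 1
    layers.Dense(84, activation='relu'),                             # layer 6: fully connected layer 2
    layers.Dense(10, activation='softmax'),                          # layer 7: fully connected layer 3 (10 classes)
])
cnn.summary()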

Table 4 Training and test results of the convolutional neural network and the BP neural network

Learning algorithm and parameter setting: learning rate = 0.01, learning algorithm = Adam.

| Model | Training results (training loss, validation accuracy) | Test accuracy |
| --- | --- | --- |
| Convolutional neural network with regularization (regularization factor = 0.001) | loss: 0.1034, Acc: 0.9824 | 0.9799 |
| BP neural network with regularization (regularization factor = 0.001) | loss: 0.1774, Acc: 0.971 | 0.9676 |

5. Requirements for the experiment report:

1. Present the corresponding results according to the experimental content.

2. Analyze and compare the influence of different learning algorithms on the training and test results of the BP network.

① Stochastic gradient descent trains the fastest but also has the lowest accuracy; neither the training nor the test accuracy is very high. Its loss is also the largest (0.4542), much higher than that of the other two algorithms, which indicates a poor model fit. Because each update is based on a single randomly drawn sample batch, the gradient carries little information and the search easily wanders off course.

② Adagrad performs best of the three: it achieves the highest accuracy in both training and testing, and its loss is far smaller than that of the other two algorithms. From the results, Adagrad does not show serious overfitting.

③ The Adam optimizer also improves the accuracy of the training and test results, but not as much as Adagrad in this experiment.

3. Analyze and compare the influence of the different models using an exponentially decaying learning rate, regularization, and a dropout layer on the training and test results.

① Using an exponentially decaying learning rate improves the training and test accuracy only slightly, but the loss becomes much smaller.

② Using regularization alleviates the overfitting problem fairly well; the loss is slightly larger than without it, but in terms of accuracy the model fit does not improve much.

③ Using a dropout layer has a similar effect to regularization: it temporarily drops neural network units from the network with a certain probability during training, which improves generalization, although the effect is not very obvious in this experiment. Here a dropout layer is added only after the third hidden layer, as follows:

network = Sequential([layers.Dense(512, activation='relu'),
                      layers.Dense(512, activation='relu'),
                      layers.Dense(512, activation='relu'),
                      layers.Dropout(0.5),
                      layers.Dense(512, activation='relu'),
                      layers.Dense(10,  activation='softmax')])

④ Combining the exponentially decaying learning rate with regularization gives a higher loss than the decaying learning rate alone, which shows that overfitting is reduced to some extent, while the loss is lower than when regularization is used alone; model performance improves, and both the training and test accuracy increase.

4. Summarize the similarities and differences between BP networks and convolutional neural networks for pattern recognition.

Differences:

① BP networks and convolutional neural networks compute in different ways: a BP neural network is a multilayer feedforward network trained with the error backpropagation algorithm, while a convolutional neural network is a feedforward network with a deep structure that includes convolution operations.

② A convolutional network uses shared weights to reduce the number of connections between layers.

③ Structure of a BP network: input layer, hidden layers, and output layer. Structure of a convolutional network: input layer, convolutional layers, pooling layers, fully connected layers, and output layer.

④ A BP network uses full connections, while a convolutional network uses local receptive fields.

Similarities:

① Both BP neural networks and convolutional neural networks are feedforward neural networks.

② In both, the input layer takes the input image and the output layer gives the multi-class classification result.

③ In both, the number of intermediate layers and the number of neurons in each layer can be set freely according to the task, and performance varies with the structure.

④ Both compute the output with forward propagation and adjust the weights and biases with backpropagation.

5. Lessons learned from the experiment.

① Mastered the relationship between the input size and the output size of a convolutional layer: for an input of r×c, a convolution kernel of a×b, and stride s = 1, the output height is (r - a)/s + 1 and the output width is (c - b)/s + 1. For example, a 28×28 input with a 3×3 kernel and stride 1 gives a 26×26 output.

② Mastered the calculation of the number of trainable parameters (see the short check after this list).

③ Mastered building BP and convolutional networks, using the optimization algorithms, and performing training and prediction.
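The two calculations above can be checked quickly with the short sketch below (the numbers correspond to the first convolutional layer in Table 3 and the first two hidden layers of the BP network):

# Output size of convolutional layer 1: 28x28 input, 3x3 kernel, stride 1
r, c, a, b, s = 28, 28, 3, 3, 1
print((r - a) // s + 1, (c - b) // s + 1)   # 26 26

# Parameters of the first Dense(512) layer: 784 inputs x 512 neurons + 512 biases
print(784 * 512 + 512)                      # 401920
# Parameters of the second Dense(512) layer: 512 x 512 + 512
print(512 * 512 + 512)                      # 262656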

Code (example: 4 hidden layers with an exponentially decaying learning rate):

import  tensorflow as tf
from    tensorflow.keras import datasets, layers, optimizers, Sequential
import matplotlib.pyplot as plt
import numpy as np
import pylab

def preprocess(x, y):
    x = tf.cast(x, dtype=tf.float32) / 255.
    y = tf.cast(y, dtype=tf.int32)
    return x,y


(x, y), (x_test, y_test) = datasets.mnist.load_data()#download or load the MNIST dataset
print('datasets:', x.shape, y.shape, x.min(), x.max())#print dataset shape and value range

Training:

imgs = x_test[0:5]#take the first five test images
labs = y_test[0:5]
#print(labs)
plot_imgs = np.hstack(imgs)#stack the five images into one row
plt.imshow(plot_imgs, cmap='gray')#display as a grayscale image
#pylab.show()#show the test images


x_train,x_val=tf.split(x,num_or_size_splits=[55000,5000])
y_train,y_val=tf.split(y,num_or_size_splits=[55000,5000])

batchsz = 100#set the batch size

db = tf.data.Dataset.from_tensor_slices((x_train,y_train))
db = db.map(preprocess).shuffle(55000).batch(batchsz).repeat(10)
db_test = tf.data.Dataset.from_tensor_slices((x_test,y_test))
db_test = db_test.map(preprocess).batch(batchsz)
ds_val = tf.data.Dataset.from_tensor_slices((x_val, y_val))
ds_val = ds_val.map(preprocess).batch(batchsz)

#Build the network model
network = Sequential([layers.Dense(512, activation='relu'),
                      layers.Dense(512, activation='relu'),
                      layers.Dense(512, activation='relu'),
                      layers.Dense(512, activation='relu'),
                      layers.Dense(10,  activation='softmax')])
network.build(input_shape=(batchsz, 28*28))#batch size and flattened input size
network.summary()                       #print the network parameters
# Set up the optimizer: exponentially decaying learning rate
exponential_decay=tf.keras.optimizers.schedules.ExponentialDecay(
          initial_learning_rate=0.01,decay_steps=1000,decay_rate=0.96)

optimizer=tf.keras.optimizers.Adam(exponential_decay)

#optimizer = optimizers.Adam(lr=0.01)    # Adam with a fixed learning rate; a larger rate trains faster but can oscillate
#optimizer = optimizers.SGD(0.01, decay=1e-2)# SGD learning algorithm with a fixed learning rate

#Train in batches
for epoch in range(10):
    for step, (x,y) in enumerate(db):#read a batch of samples from the training set
        with tf.GradientTape() as tape:#record operations for gradient computation
            # [b, 28, 28] => [b, 784]
            x = tf.reshape(x, (-1, 28*28))
            # [b, 784] => [b, 10]
            out = network(x,training=True)
            # [b] => [b, 10]
            y_onehot = tf.one_hot(y, depth=10)
            # [b]
            #compute the cross-entropy loss
            loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y_onehot, out, from_logits=False))

        grads = tape.gradient(loss, network.trainable_variables)#compute the gradients
        optimizer.apply_gradients(zip(grads, network.trainable_variables))#update the trainable parameters
        if step % 100 == 0:
            print('epoch=',epoch,' step=',step, 'loss:', float(loss))#print the training loss

        # Evaluate the model on the validation set
        if step % 500 == 0:
            total, total_correct = 0., 0
            for i, (x, y) in enumerate(ds_val):
                # [b, 28, 28] => [b, 784]
                x = tf.reshape(x, (-1, 28*28))
                # [b, 784] => [b, 10]
                out = network(x)#network output
                # [b, 10] => [b]
                pred = tf.argmax(out, axis=1)
                pred = tf.cast(pred, dtype=tf.int32)
                # bool type
                correct = tf.equal(pred, y)
                # bool tensor => int tensor => numpy
                total_correct += tf.reduce_sum(tf.cast(correct, dtype=tf.int32)).numpy()
                total += x.shape[0]
            print(' step=',step, 'Evaluate Acc:', total_correct/total)#print the validation accuracy

print("train is over")

#Test the model
total, total_correct = 0., 0
for i, (x, y) in enumerate(db_test):#read a batch of test data
    # [b, 28, 28] => [b, 784]
    x = tf.reshape(x, (-1, 28*28))
    # [b, 784] => [b, 10]
    out = network(x)#network output
    # [b, 10] => [b]
    pred = tf.argmax(out, axis=1)
    pred = tf.cast(pred, dtype=tf.int32)
    # bool type
    correct = tf.equal(pred, y)
    # bool tensor => int tensor => numpy
    btestacc=tf.reduce_sum(tf.cast(correct, dtype=tf.int32)).numpy()
    total_correct += btestacc
    total += x.shape[0]
    #print('batch', i, 'test acc=', btestacc/x.shape[0])
    #print(y)
    #print(pred)
print('Test Acc:', total_correct/total)

Training results:

 
