mnist-keras multi-layer perceptron to recognize handwritten digits

[Figure: sample images from the MNIST data set]

1. Perform data preprocessing

Import the required modules

from keras.utils import np_utils
import numpy as np
np.random.seed(10)

Read the MNIST data set

from keras.datasets import mnist
(x_train_image, y_train_label),\
(x_test_image, y_test_label) = mnist.load_data()

Reshape the features (image pixel values)

Convert each 28*28 image into a vector of 784 float numbers

x_Train = x_train_image.reshape(60000, 784).astype('float32')
x_Test = x_test_image.reshape(10000, 784).astype('float32')
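
A quick shape check (assuming the arrays above) confirms the conversion:

print(x_train_image.shape)   # (60000, 28, 28)
print(x_Train.shape)         # (60000, 784)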

Normalize the features (image pixel values)

Scaling the pixel values to the range 0-1 improves accuracy

x_Train_normalize = x_Train / 255
x_Test_normalize = x_Test / 255

Convert the labels (the true digit values) with one-hot encoding

y_Train_OneHot = np_utils.to_categorical(y_train_label)
y_Test_OneHot = np_utils.to_categorical(y_test_label)
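
As a quick sanity check (assuming the variables above, and the standard MNIST ordering in which the first training label is 5):

print(y_train_label[0])    # 5
print(y_Train_OneHot[0])   # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]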

2. Build a model

The input layer has 784 neurons, the hidden layer has 1000 neurons, and the output layer has 10 neurons

Import required modules

from keras.models import Sequential
from keras.layers import Dense

Build a Sequential model

A Sequential model stacks its layers linearly, one after another

model = Sequential()

Create the input layer and the hidden layer

model.add(Dense(units = 1000,                   # the hidden layer has 1000 neurons
                input_dim = 784,                # the input layer has 784 neurons
                kernel_initializer = 'normal',  # initialize the weights and biases with normally distributed random numbers
                activation = 'relu'))           # relu activation: values below 0 become 0, values above 0 pass through unchanged

Build the output layer

Add another Dense layer with a softmax activation, which converts the neuron outputs into a probability for each digit

model.add(Dense(units = 10,                     # the output layer has 10 neurons
                kernel_initializer = 'normal',  # initialize the weights and biases with normally distributed random numbers
                activation = 'softmax'))        # softmax activation
# There is no need to set input_dim; Keras infers it from the 1000 units of the previous layer
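
To illustrate what softmax does (a minimal standalone sketch with made-up neuron outputs, not part of the model code), it maps arbitrary scores to probabilities that sum to 1:

import numpy as np

logits = np.array([2.0, 1.0, 0.1])               # made-up neuron outputs
probs = np.exp(logits) / np.exp(logits).sum()    # softmax
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0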

View the summary of the model

print(model.summary())
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 1000)              785000    
_________________________________________________________________
dense_2 (Dense)              (None, 10)                10010     
=================================================================
Total params: 795,010
Trainable params: 795,010
Non-trainable params: 0
_________________________________________________________________
None
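
The parameter counts follow directly from weights plus biases: the hidden layer has 784*1000 weights + 1000 biases = 785,000 parameters, and the output layer has 1000*10 weights + 10 biases = 10,010 parameters.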

3. Train the model

Define the training configuration

model.compile(loss = 'categorical_crossentropy',    # loss function: categorical cross-entropy
              optimizer = 'adam',                    # optimizer: adam
              metrics = ['accuracy'])                # evaluation metric: accuracy

ps:

  1. Cross entropy measures the distance between two probability distributions; equivalently, it measures how hard it is to represent the true distribution p with the predicted distribution q. Here p is the correct answer and q is the prediction, and the smaller the cross entropy, the closer the two distributions are (see the worked example after this list).

  2. The basic mechanism of the Adam optimization algorithm

    Adam differs from traditional stochastic gradient descent. Stochastic gradient descent maintains a single learning rate (alpha) for all weight updates, and that rate does not change during training. Adam instead computes first-order and second-order moment estimates of the gradients to give each parameter its own adaptive learning rate.

    Advantages:

    Computationally efficient
    Low memory requirements
    Invariant to diagonal rescaling of the gradients
    Well suited to optimization problems with large-scale data and many parameters
    Applicable to non-stationary objectives
    Suitable for problems with very noisy or sparse gradients
    Hyperparameters have intuitive interpretations and typically need only minimal tuning
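
As a worked illustration of point 1 (a standalone sketch with made-up numbers, not part of the training code), the categorical cross-entropy of a one-hot label p against a prediction q is -sum(p * log(q)), and it shrinks as the prediction concentrates on the correct class:

import numpy as np

p = np.array([0., 1., 0.])           # one-hot true label: class 1
q_good = np.array([0.1, 0.8, 0.1])   # prediction concentrated on the correct class
q_bad  = np.array([0.6, 0.2, 0.2])   # prediction concentrated on a wrong class

cross_entropy = lambda p, q: -np.sum(p * np.log(q))
print(cross_entropy(p, q_good))   # about 0.22
print(cross_entropy(p, q_bad))    # about 1.61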

Start training

train_history = model.fit(x = x_Train_normalize,    # feature values
                          y = y_Train_OneHot,       # true labels
                          validation_split = 0.2,   # split ratio: 60000*0.8 samples for training, 60000*0.2 for validation
                          epochs = 10,              # number of training epochs
                          batch_size = 200,         # 200 samples per batch
                          verbose = 2)              # show the training progress
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
 - 1s - loss: 0.4379 - accuracy: 0.8830 - val_loss: 0.2182 - val_accuracy: 0.9408
Epoch 2/10
 - 1s - loss: 0.1908 - accuracy: 0.9454 - val_loss: 0.1557 - val_accuracy: 0.9553
Epoch 3/10
 - 1s - loss: 0.1354 - accuracy: 0.9615 - val_loss: 0.1257 - val_accuracy: 0.9647
Epoch 4/10
 - 1s - loss: 0.1026 - accuracy: 0.9703 - val_loss: 0.1118 - val_accuracy: 0.9683
Epoch 5/10
 - 1s - loss: 0.0809 - accuracy: 0.9771 - val_loss: 0.0982 - val_accuracy: 0.9715
Epoch 6/10
 - 1s - loss: 0.0658 - accuracy: 0.9820 - val_loss: 0.0932 - val_accuracy: 0.9725
Epoch 7/10
 - 1s - loss: 0.0543 - accuracy: 0.9851 - val_loss: 0.0916 - val_accuracy: 0.9738
Epoch 8/10
 - 1s - loss: 0.0458 - accuracy: 0.9876 - val_loss: 0.0830 - val_accuracy: 0.9762
Epoch 9/10
 - 1s - loss: 0.0379 - accuracy: 0.9902 - val_loss: 0.0823 - val_accuracy: 0.9762
Epoch 10/10
 - 1s - loss: 0.0315 - accuracy: 0.9916 - val_loss: 0.0811 - val_accuracy: 0.9762

test

val_loss, val_acc = model.evaluate(x_Test_normalize, y_Test_OneHot, 1)  # evaluate the model on the test data (batch size 1)
print(val_loss)  # the model's loss on the test set
print(val_acc)   # the model's accuracy on the test set
10000/10000 [==============================] - 4s 379us/step
0.07567812022235794
0.9760000109672546
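
To inspect an individual prediction (a minimal sketch assuming the model and arrays above; for the standard MNIST ordering the first test label is 7), the softmax output can be turned into a predicted digit with argmax:

probs = model.predict(x_Test_normalize[:1])   # probability distribution over the 10 digits
print(np.argmax(probs[0]))                    # predicted digit
print(y_test_label[0])                        # true digit: 7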

Set up show_train_history to display the training process

import matplotlib.pyplot as plt
def show_train_history(train_history, train, validation):
    plt.plot(train_history.history[train])
    plt.plot(train_history.history[validation])
    plt.title('Train History')
    plt.ylabel(train)
    plt.xlabel('Epoch')
    plt.legend(['train', 'validation'], loc = 'upper left')
    plt.show()
show_train_history(train_history, 'accuracy', 'val_accuracy')
# accuracy is computed on the training data
# val_accuracy is computed on the validation data

[Figure: training vs. validation accuracy over the 10 epochs]
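
The same helper can also plot the loss curves:

show_train_history(train_history, 'loss', 'val_loss')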

4. Experimental parameters

Activation function    Neurons    Avg. training time per epoch    Test accuracy
relu                   256        1 s                             0.9760
relu                   1000       3-4 s                           0.9801
sigmoid                256        1 s                             0.9645
tanh                   256        1 s                             0.9753
elu                    256        1 s                             0.9749

kernel_initializer     Test accuracy
normal                 0.9760
random_uniform         0.9778
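
Each row above can be reproduced with a small helper that rebuilds and retrains the same two-layer network for a given neuron count, activation, and initializer (a hedged sketch using the preprocessed arrays from section 1; it is not the exact code the timings were measured with). The detailed training logs for each configuration follow below.

def run_experiment(units, activation, initializer='normal'):
    # Rebuild the 784 -> units -> 10 network with the given settings
    model = Sequential()
    model.add(Dense(units=units, input_dim=784,
                    kernel_initializer=initializer, activation=activation))
    model.add(Dense(units=10, kernel_initializer=initializer,
                    activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    model.fit(x_Train_normalize, y_Train_OneHot, validation_split=0.2,
              epochs=10, batch_size=200, verbose=0)
    # Return the accuracy on the 10000 test images
    return model.evaluate(x_Test_normalize, y_Test_OneHot, verbose=0)[1]

# For example:
# print(run_experiment(256, 'relu'))
# print(run_experiment(256, 'sigmoid'))
# print(run_experiment(256, 'relu', initializer='random_uniform'))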

256 neurons

Activation function: relu

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 256)               200960    
_________________________________________________________________
dense_2 (Dense)              (None, 10)                2570      
=================================================================
Total params: 203,530
Trainable params: 203,530
Non-trainable params: 0
_________________________________________________________________
None
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
 - 1s - loss: 0.4379 - accuracy: 0.8830 - val_loss: 0.2182 - val_accuracy: 0.9407
Epoch 2/10
 - 1s - loss: 0.1909 - accuracy: 0.9454 - val_loss: 0.1559 - val_accuracy: 0.9555
Epoch 3/10
 - 1s - loss: 0.1355 - accuracy: 0.9617 - val_loss: 0.1260 - val_accuracy: 0.9649
Epoch 4/10
 - 1s - loss: 0.1027 - accuracy: 0.9704 - val_loss: 0.1119 - val_accuracy: 0.9683
Epoch 5/10
 - 1s - loss: 0.0810 - accuracy: 0.9773 - val_loss: 0.0979 - val_accuracy: 0.9721
Epoch 6/10
 - 1s - loss: 0.0659 - accuracy: 0.9817 - val_loss: 0.0936 - val_accuracy: 0.9722
Epoch 7/10
 - 1s - loss: 0.0543 - accuracy: 0.9851 - val_loss: 0.0912 - val_accuracy: 0.9737
Epoch 8/10
 - 1s - loss: 0.0460 - accuracy: 0.9877 - val_loss: 0.0830 - val_accuracy: 0.9767
Epoch 9/10
 - 1s - loss: 0.0379 - accuracy: 0.9902 - val_loss: 0.0828 - val_accuracy: 0.9760
Epoch 10/10
 - 1s - loss: 0.0316 - accuracy: 0.9917 - val_loss: 0.0807 - val_accuracy: 0.9769

test:

10000/10000 [==============================] - 4s 374us/step
0.07602789112742801
0.9757999777793884

1000 neurons

Activation function: relu

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 1000)              785000    
_________________________________________________________________
dense_2 (Dense)              (None, 10)                10010     
=================================================================
Total params: 795,010
Trainable params: 795,010
Non-trainable params: 0
_________________________________________________________________
None
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
 - 3s - loss: 0.2944 - accuracy: 0.9152 - val_loss: 0.1528 - val_accuracy: 0.9565
Epoch 2/10
 - 3s - loss: 0.1179 - accuracy: 0.9661 - val_loss: 0.1073 - val_accuracy: 0.9678
Epoch 3/10
 - 3s - loss: 0.0759 - accuracy: 0.9783 - val_loss: 0.0922 - val_accuracy: 0.9724
Epoch 4/10
 - 3s - loss: 0.0514 - accuracy: 0.9853 - val_loss: 0.0869 - val_accuracy: 0.9733
Epoch 5/10
 - 3s - loss: 0.0357 - accuracy: 0.9905 - val_loss: 0.0754 - val_accuracy: 0.9757
Epoch 6/10
 - 4s - loss: 0.0257 - accuracy: 0.9932 - val_loss: 0.0743 - val_accuracy: 0.9778
Epoch 7/10
 - 4s - loss: 0.0185 - accuracy: 0.9958 - val_loss: 0.0724 - val_accuracy: 0.9793
Epoch 8/10
 - 4s - loss: 0.0132 - accuracy: 0.9971 - val_loss: 0.0718 - val_accuracy: 0.9778
Epoch 9/10
 - 4s - loss: 0.0087 - accuracy: 0.9988 - val_loss: 0.0712 - val_accuracy: 0.9798
Epoch 10/10
 - 4s - loss: 0.0062 - accuracy: 0.9992 - val_loss: 0.0705 - val_accuracy: 0.9800

test:

10000/10000 [==============================] - 6s 569us/step
0.06873653566057918
0.9797999858856201

ps: sometimes the test accuracy exceeds 0.98

Activation function: sigmoid

256 neurons

Summary:

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_2 (Dense)              (None, 256)               200960    
_________________________________________________________________
dense_3 (Dense)              (None, 10)                2570      
=================================================================
Total params: 203,530
Trainable params: 203,530
Non-trainable params: 0
_________________________________________________________________
None
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
 - 1s - loss: 0.7395 - accuracy: 0.8315 - val_loss: 0.3386 - val_accuracy: 0.9109
Epoch 2/10
 - 1s - loss: 0.3100 - accuracy: 0.9136 - val_loss: 0.2560 - val_accuracy: 0.9277
Epoch 3/10
 - 1s - loss: 0.2492 - accuracy: 0.9290 - val_loss: 0.2233 - val_accuracy: 0.9381
Epoch 4/10
 - 1s - loss: 0.2119 - accuracy: 0.9391 - val_loss: 0.1974 - val_accuracy: 0.9424
Epoch 5/10
 - 1s - loss: 0.1835 - accuracy: 0.9466 - val_loss: 0.1757 - val_accuracy: 0.9517
Epoch 6/10
 - 1s - loss: 0.1608 - accuracy: 0.9533 - val_loss: 0.1607 - val_accuracy: 0.9551
Epoch 7/10
 - 1s - loss: 0.1424 - accuracy: 0.9593 - val_loss: 0.1489 - val_accuracy: 0.9587
Epoch 8/10
 - 1s - loss: 0.1269 - accuracy: 0.9638 - val_loss: 0.1394 - val_accuracy: 0.9621
Epoch 9/10
 - 1s - loss: 0.1141 - accuracy: 0.9677 - val_loss: 0.1291 - val_accuracy: 0.9634
Epoch 10/10
 - 1s - loss: 0.1025 - accuracy: 0.9711 - val_loss: 0.1216 - val_accuracy: 0.9659
10000/10000 [==============================] - 4s 380us/step
0.11642538407448501
0.9645000100135803

The accuracy is noticeably worse than with relu

Activation function: tanh

256 neurons

Train on 48000 samples, validate on 12000 samples
Epoch 1/10
 - 1s - loss: 0.4394 - accuracy: 0.8801 - val_loss: 0.2483 - val_accuracy: 0.9302
Epoch 2/10
 - 1s - loss: 0.2252 - accuracy: 0.9352 - val_loss: 0.1883 - val_accuracy: 0.9479
Epoch 3/10
 - 1s - loss: 0.1681 - accuracy: 0.9514 - val_loss: 0.1556 - val_accuracy: 0.9580
Epoch 4/10
 - 1s - loss: 0.1313 - accuracy: 0.9631 - val_loss: 0.1374 - val_accuracy: 0.9603
Epoch 5/10
 - 1s - loss: 0.1064 - accuracy: 0.9704 - val_loss: 0.1214 - val_accuracy: 0.9652
Epoch 6/10
 - 1s - loss: 0.0876 - accuracy: 0.9763 - val_loss: 0.1140 - val_accuracy: 0.9668
Epoch 7/10
 - 1s - loss: 0.0728 - accuracy: 0.9802 - val_loss: 0.1063 - val_accuracy: 0.9694
Epoch 8/10
 - 1s - loss: 0.0610 - accuracy: 0.9837 - val_loss: 0.0951 - val_accuracy: 0.9731
Epoch 9/10
 - 1s - loss: 0.0510 - accuracy: 0.9870 - val_loss: 0.0926 - val_accuracy: 0.9721
Epoch 10/10
 - 1s - loss: 0.0426 - accuracy: 0.9894 - val_loss: 0.0866 - val_accuracy: 0.9738
10000/10000 [==============================] - 4s 371us/step
0.08017727720420531
0.9753999710083008

Activation function: elu (Exponential Linear Units)

256 neurons

Train on 48000 samples, validate on 12000 samples
Epoch 1/10
 - 1s - loss: 0.4413 - accuracy: 0.8773 - val_loss: 0.2636 - val_accuracy: 0.9261
Epoch 2/10
 - 1s - loss: 0.2476 - accuracy: 0.9284 - val_loss: 0.2049 - val_accuracy: 0.9422
Epoch 3/10
 - 1s - loss: 0.1849 - accuracy: 0.9471 - val_loss: 0.1645 - val_accuracy: 0.9557
Epoch 4/10
 - 1s - loss: 0.1423 - accuracy: 0.9593 - val_loss: 0.1424 - val_accuracy: 0.9599
Epoch 5/10
 - 1s - loss: 0.1139 - accuracy: 0.9676 - val_loss: 0.1232 - val_accuracy: 0.9658
Epoch 6/10
 - 1s - loss: 0.0936 - accuracy: 0.9734 - val_loss: 0.1140 - val_accuracy: 0.9674
Epoch 7/10
 - 1s - loss: 0.0781 - accuracy: 0.9778 - val_loss: 0.1070 - val_accuracy: 0.9692
Epoch 8/10
 - 1s - loss: 0.0670 - accuracy: 0.9807 - val_loss: 0.0976 - val_accuracy: 0.9720
Epoch 9/10
 - 1s - loss: 0.0570 - accuracy: 0.9839 - val_loss: 0.0939 - val_accuracy: 0.9725
Epoch 10/10
 - 1s - loss: 0.0485 - accuracy: 0.9868 - val_loss: 0.0880 - val_accuracy: 0.9740
10000/10000 [==============================] - 4s 374us/step
0.07968259554752871
0.9749000072479248


Origin blog.csdn.net/qq_44082148/article/details/102298181