Handwritten digit recognition
A Keras multi-layer perceptron that recognizes handwritten digits from the MNIST data set
[Image: MNIST data set]
1. Preprocess the data
Import the required modules
from keras.utils import np_utils
import numpy as np
np.random.seed(10)
Read the MNIST data set
from keras.datasets import mnist
(x_train_image, y_train_label),\
(x_test_image, y_test_label) = mnist.load_data()
Convert the features (image pixel values) with reshape
Flatten each 28×28 image into 784 float numbers
x_Train = x_train_image.reshape(60000, 784).astype('float32')
x_Test = x_test_image.reshape(10000, 784).astype('float32')
Normalize the features (pixel values)
Scaling to the range 0-1 improves accuracy
x_Train_normalize = x_Train / 255
x_Test_normalize = x_Test / 255
Convert the labels (true digit values) with one-hot encoding
y_Train_OneHot = np_utils.to_categorical(y_train_label)
y_Test_OneHot = np_utils.to_categorical(y_test_label)
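One-hot encoding turns each digit label into a 10-element vector with a single 1 at the label's index. A minimal NumPy sketch of what np_utils.to_categorical produces (the one_hot helper here is illustrative, not part of Keras):

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # Build an (n, num_classes) matrix of zeros and set a 1 at each
    # label's index, mirroring np_utils.to_categorical for MNIST digits.
    encoded = np.zeros((len(labels), num_classes), dtype='float32')
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

print(one_hot([3]))  # digit 3 -> [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
```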
2. Build a model
The input layer has 784 neurons, the hidden layer has 1000 neurons, and the output layer has 10 neurons
Import required modules
from keras.models import Sequential
from keras.layers import Dense
Build a Sequential model
A Sequential model is a linear stack of layers
model = Sequential()
Create the input layer and the hidden layer
model.add(Dense(units = 1000, # the hidden layer has 1000 neurons
                input_dim = 784, # the input layer has 784 neurons
                kernel_initializer = 'normal', # initialize the weights and biases with normally distributed random numbers
                activation = 'relu')) # relu activation: values below 0 become 0, values above 0 pass through unchanged
Build the output layer
Add a Dense layer with the softmax activation function, which converts each neuron's output into the predicted probability of the corresponding digit
model.add(Dense(units = 10, # the output layer has 10 neurons
                kernel_initializer = 'normal', # initialize the weights and biases with normally distributed random numbers
                activation = 'softmax')) # softmax converts the outputs into class probabilities
# input_dim does not need to be set here: Keras infers it from the previous layer's 1000 units
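The softmax conversion mentioned above can be sketched in NumPy; the softmax helper below is illustrative, not the Keras implementation:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability, then normalize the
    # exponentials so the outputs sum to 1 and read as probabilities.
    e = np.exp(z - np.max(z))
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)        # the largest logit gets the largest probability
print(probs.sum())  # 1.0
```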
View the summary of the model
print(model.summary())
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 1000) 785000
_________________________________________________________________
dense_2 (Dense) (None, 10) 10010
=================================================================
Total params: 795,010
Trainable params: 795,010
Non-trainable params: 0
_________________________________________________________________
None
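The parameter counts in the summary can be checked by hand: a Dense layer has inputs × units weights plus one bias per unit. A quick sketch (dense_params is a hypothetical helper, not a Keras function):

```python
def dense_params(inputs, units):
    # Each Dense layer has inputs * units weights plus one bias per unit.
    return inputs * units + units

hidden = dense_params(784, 1000)   # matches dense_1 in the summary
output = dense_params(1000, 10)    # matches dense_2 in the summary
print(hidden, output, hidden + output)  # 785000 10010 795010
```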
3. Train the model
Define training methods
model.compile(loss = 'categorical_crossentropy', # cross-entropy loss function
              optimizer = 'adam', # use the Adam optimizer
              metrics = ['accuracy'])
Notes:
- Cross entropy measures the distance between two probability distributions; equivalently, it measures how hard it is to express distribution p using distribution q. Here p is the correct answer and q is the prediction: the smaller the cross entropy, the closer the two distributions are.
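The cross-entropy formula H(p, q) = -Σᵢ pᵢ log(qᵢ) can be sketched in NumPy to see this behavior (cross_entropy is an illustrative helper, not the Keras loss function):

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    # H(p, q) = -sum_i p_i * log(q_i); p is the true distribution
    # (a one-hot label), q is the predicted distribution from softmax.
    q = np.clip(q, eps, 1.0)  # avoid log(0)
    return -np.sum(p * np.log(q))

p = np.array([0.0, 0.0, 1.0])              # true class is index 2
good = cross_entropy(p, np.array([0.1, 0.1, 0.8]))
bad = cross_entropy(p, np.array([0.6, 0.3, 0.1]))
print(good < bad)  # True: a prediction closer to p gives a smaller cross entropy
```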
- The basic mechanism of the Adam optimization algorithm: unlike traditional stochastic gradient descent, which maintains a single learning rate (alpha) for all weight updates and keeps it fixed during training, Adam computes first- and second-order moment estimates of the gradients to give each parameter its own adaptive learning rate.
Advantages:
- computationally efficient
- low memory requirements
- invariant to diagonal rescaling of the gradients
- well suited to problems with large amounts of data and/or parameters
- appropriate for non-stationary objectives
- appropriate for problems with very noisy or sparse gradients
- hyperparameters have intuitive interpretations and typically require little tuning
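A minimal sketch of one Adam update step in NumPy, following the standard formulation (adam_step is illustrative; the hyperparameter defaults here are the commonly cited ones and are not meant to match Keras's internals exactly):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # Update biased first- and second-moment estimates of the gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Each parameter's effective step is scaled by its own gradient history.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 starting from x = 3; the gradient is 2x.
x, m, v = 3.0, 0.0, 0.0
for t in range(1, 501):
    x, m, v = adam_step(x, 2.0 * x, m, v, t)
print(x)  # approaches the minimum at x = 0
```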
Start training
train_history = model.fit(x = x_Train_normalize, # feature values
                          y = y_Train_OneHot, # true labels
                          validation_split = 0.2, # split ratio: 60000*0.8 samples for training, 60000*0.2 for validation
                          epochs = 10, # number of training epochs
                          batch_size = 200, # 200 samples per batch
                          verbose = 2) # display the training process
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
- 1s - loss: 0.4379 - accuracy: 0.8830 - val_loss: 0.2182 - val_accuracy: 0.9408
Epoch 2/10
- 1s - loss: 0.1908 - accuracy: 0.9454 - val_loss: 0.1557 - val_accuracy: 0.9553
Epoch 3/10
- 1s - loss: 0.1354 - accuracy: 0.9615 - val_loss: 0.1257 - val_accuracy: 0.9647
Epoch 4/10
- 1s - loss: 0.1026 - accuracy: 0.9703 - val_loss: 0.1118 - val_accuracy: 0.9683
Epoch 5/10
- 1s - loss: 0.0809 - accuracy: 0.9771 - val_loss: 0.0982 - val_accuracy: 0.9715
Epoch 6/10
- 1s - loss: 0.0658 - accuracy: 0.9820 - val_loss: 0.0932 - val_accuracy: 0.9725
Epoch 7/10
- 1s - loss: 0.0543 - accuracy: 0.9851 - val_loss: 0.0916 - val_accuracy: 0.9738
Epoch 8/10
- 1s - loss: 0.0458 - accuracy: 0.9876 - val_loss: 0.0830 - val_accuracy: 0.9762
Epoch 9/10
- 1s - loss: 0.0379 - accuracy: 0.9902 - val_loss: 0.0823 - val_accuracy: 0.9762
Epoch 10/10
- 1s - loss: 0.0315 - accuracy: 0.9916 - val_loss: 0.0811 - val_accuracy: 0.9762
Test
val_loss, val_acc = model.evaluate(x_Test_normalize, y_Test_OneHot, 1) # evaluate the model on the test set
print(val_loss) # the model's loss on the test set
print(val_acc) # the model's accuracy on the test set
10000/10000 [==============================] - 4s 379us/step
0.07567812022235794
0.9760000109672546
Set up show_train_history to display the training process
import matplotlib.pyplot as plt
def show_train_history(train_history, train, validation):
    plt.plot(train_history.history[train])
    plt.plot(train_history.history[validation])
    plt.title('Train History')
    plt.ylabel(train)
    plt.xlabel('Epoch')
    plt.legend(['train', 'validation'], loc = 'upper left')
    plt.show()
show_train_history(train_history, 'accuracy', 'val_accuracy')
# accuracy is computed on the training set
# val_accuracy is computed on the validation set
4. Experiments with different parameters
Activation function | Number of neurons | Average time per epoch | Test accuracy |
---|---|---|---|
relu | 256 | 1s | 0.9760 |
relu | 1000 | 3-4s | 0.9801 |
sigmoid | 256 | 1s | 0.9645 |
tanh | 256 | 1s | 0.9753 |
elu | 256 | 1s | 0.9749 |
kernel_initializer | Accuracy |
---|---|
normal | 0.9760 |
random_uniform | 0.9778 |
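For reference, the activation functions compared above written out in NumPy (these are the standard mathematical definitions, not the Keras implementations):

```python
import numpy as np

# The four activation functions compared in the experiments.
def relu(x):    return np.maximum(0.0, x)
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))
def tanh(x):    return np.tanh(x)
def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))     # negative inputs clamp to 0, positive pass through
print(sigmoid(x))  # squashes into (0, 1)
print(tanh(x))     # squashes into (-1, 1)
print(elu(x))      # smooth negative tail instead of a hard 0
```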
256 neurons
Activation function: relu
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 256) 200960
_________________________________________________________________
dense_2 (Dense) (None, 10) 2570
=================================================================
Total params: 203,530
Trainable params: 203,530
Non-trainable params: 0
_________________________________________________________________
None
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
- 1s - loss: 0.4379 - accuracy: 0.8830 - val_loss: 0.2182 - val_accuracy: 0.9407
Epoch 2/10
- 1s - loss: 0.1909 - accuracy: 0.9454 - val_loss: 0.1559 - val_accuracy: 0.9555
Epoch 3/10
- 1s - loss: 0.1355 - accuracy: 0.9617 - val_loss: 0.1260 - val_accuracy: 0.9649
Epoch 4/10
- 1s - loss: 0.1027 - accuracy: 0.9704 - val_loss: 0.1119 - val_accuracy: 0.9683
Epoch 5/10
- 1s - loss: 0.0810 - accuracy: 0.9773 - val_loss: 0.0979 - val_accuracy: 0.9721
Epoch 6/10
- 1s - loss: 0.0659 - accuracy: 0.9817 - val_loss: 0.0936 - val_accuracy: 0.9722
Epoch 7/10
- 1s - loss: 0.0543 - accuracy: 0.9851 - val_loss: 0.0912 - val_accuracy: 0.9737
Epoch 8/10
- 1s - loss: 0.0460 - accuracy: 0.9877 - val_loss: 0.0830 - val_accuracy: 0.9767
Epoch 9/10
- 1s - loss: 0.0379 - accuracy: 0.9902 - val_loss: 0.0828 - val_accuracy: 0.9760
Epoch 10/10
- 1s - loss: 0.0316 - accuracy: 0.9917 - val_loss: 0.0807 - val_accuracy: 0.9769
test:
10000/10000 [==============================] - 4s 374us/step
0.07602789112742801
0.9757999777793884
1000 neurons
Activation function: relu
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 1000) 785000
_________________________________________________________________
dense_2 (Dense) (None, 10) 10010
=================================================================
Total params: 795,010
Trainable params: 795,010
Non-trainable params: 0
_________________________________________________________________
None
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
- 3s - loss: 0.2944 - accuracy: 0.9152 - val_loss: 0.1528 - val_accuracy: 0.9565
Epoch 2/10
- 3s - loss: 0.1179 - accuracy: 0.9661 - val_loss: 0.1073 - val_accuracy: 0.9678
Epoch 3/10
- 3s - loss: 0.0759 - accuracy: 0.9783 - val_loss: 0.0922 - val_accuracy: 0.9724
Epoch 4/10
- 3s - loss: 0.0514 - accuracy: 0.9853 - val_loss: 0.0869 - val_accuracy: 0.9733
Epoch 5/10
- 3s - loss: 0.0357 - accuracy: 0.9905 - val_loss: 0.0754 - val_accuracy: 0.9757
Epoch 6/10
- 4s - loss: 0.0257 - accuracy: 0.9932 - val_loss: 0.0743 - val_accuracy: 0.9778
Epoch 7/10
- 4s - loss: 0.0185 - accuracy: 0.9958 - val_loss: 0.0724 - val_accuracy: 0.9793
Epoch 8/10
- 4s - loss: 0.0132 - accuracy: 0.9971 - val_loss: 0.0718 - val_accuracy: 0.9778
Epoch 9/10
- 4s - loss: 0.0087 - accuracy: 0.9988 - val_loss: 0.0712 - val_accuracy: 0.9798
Epoch 10/10
- 4s - loss: 0.0062 - accuracy: 0.9992 - val_loss: 0.0705 - val_accuracy: 0.9800
test:
10000/10000 [==============================] - 6s 569us/step
0.06873653566057918
0.9797999858856201
P.S. the test accuracy sometimes exceeds 0.98
Activation function: Sigmoid
256 neurons
Summary:
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_2 (Dense) (None, 256) 200960
_________________________________________________________________
dense_3 (Dense) (None, 10) 2570
=================================================================
Total params: 203,530
Trainable params: 203,530
Non-trainable params: 0
_________________________________________________________________
None
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
- 1s - loss: 0.7395 - accuracy: 0.8315 - val_loss: 0.3386 - val_accuracy: 0.9109
Epoch 2/10
- 1s - loss: 0.3100 - accuracy: 0.9136 - val_loss: 0.2560 - val_accuracy: 0.9277
Epoch 3/10
- 1s - loss: 0.2492 - accuracy: 0.9290 - val_loss: 0.2233 - val_accuracy: 0.9381
Epoch 4/10
- 1s - loss: 0.2119 - accuracy: 0.9391 - val_loss: 0.1974 - val_accuracy: 0.9424
Epoch 5/10
- 1s - loss: 0.1835 - accuracy: 0.9466 - val_loss: 0.1757 - val_accuracy: 0.9517
Epoch 6/10
- 1s - loss: 0.1608 - accuracy: 0.9533 - val_loss: 0.1607 - val_accuracy: 0.9551
Epoch 7/10
- 1s - loss: 0.1424 - accuracy: 0.9593 - val_loss: 0.1489 - val_accuracy: 0.9587
Epoch 8/10
- 1s - loss: 0.1269 - accuracy: 0.9638 - val_loss: 0.1394 - val_accuracy: 0.9621
Epoch 9/10
- 1s - loss: 0.1141 - accuracy: 0.9677 - val_loss: 0.1291 - val_accuracy: 0.9634
Epoch 10/10
- 1s - loss: 0.1025 - accuracy: 0.9711 - val_loss: 0.1216 - val_accuracy: 0.9659
10000/10000 [==============================] - 4s 380us/step
0.11642538407448501
0.9645000100135803
The result is noticeably worse than with relu
Activation function: tanh
256 neurons
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
- 1s - loss: 0.4394 - accuracy: 0.8801 - val_loss: 0.2483 - val_accuracy: 0.9302
Epoch 2/10
- 1s - loss: 0.2252 - accuracy: 0.9352 - val_loss: 0.1883 - val_accuracy: 0.9479
Epoch 3/10
- 1s - loss: 0.1681 - accuracy: 0.9514 - val_loss: 0.1556 - val_accuracy: 0.9580
Epoch 4/10
- 1s - loss: 0.1313 - accuracy: 0.9631 - val_loss: 0.1374 - val_accuracy: 0.9603
Epoch 5/10
- 1s - loss: 0.1064 - accuracy: 0.9704 - val_loss: 0.1214 - val_accuracy: 0.9652
Epoch 6/10
- 1s - loss: 0.0876 - accuracy: 0.9763 - val_loss: 0.1140 - val_accuracy: 0.9668
Epoch 7/10
- 1s - loss: 0.0728 - accuracy: 0.9802 - val_loss: 0.1063 - val_accuracy: 0.9694
Epoch 8/10
- 1s - loss: 0.0610 - accuracy: 0.9837 - val_loss: 0.0951 - val_accuracy: 0.9731
Epoch 9/10
- 1s - loss: 0.0510 - accuracy: 0.9870 - val_loss: 0.0926 - val_accuracy: 0.9721
Epoch 10/10
- 1s - loss: 0.0426 - accuracy: 0.9894 - val_loss: 0.0866 - val_accuracy: 0.9738
10000/10000 [==============================] - 4s 371us/step
0.08017727720420531
0.9753999710083008
Activation function: elu (Exponential Linear Unit)
256 neurons
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
- 1s - loss: 0.4413 - accuracy: 0.8773 - val_loss: 0.2636 - val_accuracy: 0.9261
Epoch 2/10
- 1s - loss: 0.2476 - accuracy: 0.9284 - val_loss: 0.2049 - val_accuracy: 0.9422
Epoch 3/10
- 1s - loss: 0.1849 - accuracy: 0.9471 - val_loss: 0.1645 - val_accuracy: 0.9557
Epoch 4/10
- 1s - loss: 0.1423 - accuracy: 0.9593 - val_loss: 0.1424 - val_accuracy: 0.9599
Epoch 5/10
- 1s - loss: 0.1139 - accuracy: 0.9676 - val_loss: 0.1232 - val_accuracy: 0.9658
Epoch 6/10
- 1s - loss: 0.0936 - accuracy: 0.9734 - val_loss: 0.1140 - val_accuracy: 0.9674
Epoch 7/10
- 1s - loss: 0.0781 - accuracy: 0.9778 - val_loss: 0.1070 - val_accuracy: 0.9692
Epoch 8/10
- 1s - loss: 0.0670 - accuracy: 0.9807 - val_loss: 0.0976 - val_accuracy: 0.9720
Epoch 9/10
- 1s - loss: 0.0570 - accuracy: 0.9839 - val_loss: 0.0939 - val_accuracy: 0.9725
Epoch 10/10
- 1s - loss: 0.0485 - accuracy: 0.9868 - val_loss: 0.0880 - val_accuracy: 0.9740
10000/10000 [==============================] - 4s 374us/step
0.07968259554752871
0.9749000072479248