Handwritten digit recognition
A Keras multilayer perceptron for recognizing handwritten MNIST digits
[Figure: sample images from the MNIST dataset]
1. Data preprocessing
Import the required modules
from keras.utils import np_utils
import numpy as np
np.random.seed(10)
Load the MNIST dataset
from keras.datasets import mnist
(x_train_image, y_train_label),\
(x_test_image, y_test_label) = mnist.load_data()
Convert the features (image pixel values) with reshape
Each 28 × 28 image becomes a vector of 784 floating-point numbers
x_Train = x_train_image.reshape(60000, 784).astype('float32')
x_Test = x_test_image.reshape(10000, 784).astype('float32')
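A quick sanity check of the shapes before and after the reshape (expected output shown as comments):
print(x_train_image.shape)  # (60000, 28, 28)
print(x_Train.shape)        # (60000, 784)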
Normalize the features (image pixel values) to improve training accuracy
x_Train_normalize = x_Train / 255
x_Test_normalize = x_Test / 255
Convert the labels (the actual digit values) with one-hot encoding
y_Train_OneHot = np_utils.to_categorical(y_train_label)
y_Test_OneHot = np_utils.to_categorical(y_test_label)
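For illustration, the first training label is the digit 5, which one-hot encoding turns into a 10-element vector with a 1 in position 5:
print(y_train_label[0])   # 5
print(y_Train_OneHot[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]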
2. Build the model
The input layer has 784 neurons, the hidden layer has 1000 neurons, and the output layer has 10 neurons
Import the required modules
from keras.models import Sequential
from keras.layers import Dense
Build a Sequential model (a linear stack of layers)
model = Sequential()
Create the input layer and the hidden layer
model.add(Dense(units = 1000, # the hidden layer has 1000 neurons
input_dim = 784, # the input layer has 784 neurons
kernel_initializer = 'normal', # initialize the weights and biases with normally distributed random numbers
activation = 'relu')) # relu activation: values below 0 become 0, values above 0 pass through unchanged
Build the output layer
Add a Dense layer with the softmax activation, which converts the neuron outputs into the probability of each predicted digit (see the NumPy sketch after this code block)
model.add(Dense(units = 10, # the output layer has 10 neurons
kernel_initializer = 'normal', # initialize the weights and biases with normally distributed random numbers
activation = 'softmax')) # softmax activation
# No input_dim is needed here: Keras infers it automatically from the previous layer's 1000 units
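To make the two activation functions concrete, here is a minimal NumPy sketch (for illustration only, not part of the model code):
import numpy as np

def relu(x):
    return np.maximum(0, x)    # values below 0 become 0, the rest pass through

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()         # positive outputs that sum to 1

z = np.array([2.0, -1.0, 0.5])
print(relu(z))     # [2.  0.  0.5]
print(softmax(z))  # roughly [0.79 0.04 0.18]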
View the model summary
print(model.summary())
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 1000) 785000
_________________________________________________________________
dense_2 (Dense) (None, 10) 10010
=================================================================
Total params: 795,010
Trainable params: 795,010
Non-trainable params: 0
_________________________________________________________________
None
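The parameter counts in the summary can be verified by hand: a Dense layer has (inputs × units) weights plus units biases.
print(784 * 1000 + 1000)  # 785000, hidden layer parameters
print(1000 * 10 + 10)     # 10010, output layer parameters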
3. Train the model
Define the training method
model.compile(loss = 'categorical_crossentropy', # loss function: categorical cross-entropy
optimizer = 'adam', # use the Adam optimizer
metrics = ['accuracy']) # track accuracy during training
PS:
- Cross-entropy measures the distance between two probability distributions; equivalently, it measures how hard it is to represent the true distribution p with the predicted distribution q. Here p is the correct answer and q is the predicted value: the smaller the cross-entropy, the closer the two distributions are. (The formula is written out after this list.)
- The basic mechanism of the Adam optimization algorithm: unlike classical stochastic gradient descent, which maintains a single learning rate (alpha) for all weight updates and never changes it during training, Adam computes first- and second-order moment estimates of the gradients to derive an individual adaptive learning rate for each parameter.

Advantages of Adam:
- computationally efficient
- low memory requirements
- invariant to diagonal rescaling of the gradients (a proof is given in the second part of the paper)
- well suited to problems that are large in terms of data and/or parameters
- appropriate for non-stationary objectives
- appropriate for problems with very noisy or sparse gradients
- hyperparameters have an intuitive interpretation and typically require little tuning
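For reference, the standard formulas behind these two notes (textbook definitions, not specific to this code). Cross-entropy between the true distribution p and the prediction q:

H(p, q) = -\sum_{x} p(x) \log q(x)

Adam's update for a parameter \theta with gradient g_t, using the first- and second-moment estimates m_t and v_t:

m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2
\hat{m}_t = m_t / (1 - \beta_1^t), \quad \hat{v}_t = v_t / (1 - \beta_2^t)
\theta_t = \theta_{t-1} - \alpha \, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)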
Start training
train_history = model.fit(x = x_Train_normalize, # features
y = y_Train_OneHot, # one-hot labels
validation_split = 0.2, # split: 60000 * 0.8 for training, 60000 * 0.2 for validation
epochs = 10, # number of training epochs
batch_size = 200, # 200 samples per batch
verbose = 2) # show the training progress
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
- 1s - loss: 0.4379 - accuracy: 0.8830 - val_loss: 0.2182 - val_accuracy: 0.9408
Epoch 2/10
- 1s - loss: 0.1908 - accuracy: 0.9454 - val_loss: 0.1557 - val_accuracy: 0.9553
Epoch 3/10
- 1s - loss: 0.1354 - accuracy: 0.9615 - val_loss: 0.1257 - val_accuracy: 0.9647
Epoch 4/10
- 1s - loss: 0.1026 - accuracy: 0.9703 - val_loss: 0.1118 - val_accuracy: 0.9683
Epoch 5/10
- 1s - loss: 0.0809 - accuracy: 0.9771 - val_loss: 0.0982 - val_accuracy: 0.9715
Epoch 6/10
- 1s - loss: 0.0658 - accuracy: 0.9820 - val_loss: 0.0932 - val_accuracy: 0.9725
Epoch 7/10
- 1s - loss: 0.0543 - accuracy: 0.9851 - val_loss: 0.0916 - val_accuracy: 0.9738
Epoch 8/10
- 1s - loss: 0.0458 - accuracy: 0.9876 - val_loss: 0.0830 - val_accuracy: 0.9762
Epoch 9/10
- 1s - loss: 0.0379 - accuracy: 0.9902 - val_loss: 0.0823 - val_accuracy: 0.9762
Epoch 10/10
- 1s - loss: 0.0315 - accuracy: 0.9916 - val_loss: 0.0811 - val_accuracy: 0.9762
Test
val_loss, val_acc = model.evaluate(x_Test_normalize, y_Test_OneHot, 1) # evaluate the model on the test set (the third argument is the batch size)
print(val_loss) # test loss
print(val_acc) # test accuracy
10000/10000 [==============================] - 4s 379us/step
0.07567812022235794
0.9760000109672546
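Once evaluated, the trained model can also make predictions; a minimal sketch (not in the original code):
prediction = model.predict(x_Test_normalize)      # class probabilities per test image
predicted_digits = np.argmax(prediction, axis=1)  # most likely digit per image
print(predicted_digits[:10])                      # first 10 predictions
print(y_test_label[:10])                          # ground truth for comparison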
Define show_train_history to display the training history
import matplotlib.pyplot as plt

def show_train_history(train_history, train, validation):
    plt.plot(train_history.history[train])       # training curve
    plt.plot(train_history.history[validation])  # validation curve
    plt.title('Train History')
    plt.ylabel(train)
    plt.xlabel('Epoch')
    plt.legend(['train', 'validation'], loc = 'upper left')
    plt.show()
show_train_history(train_history, 'accuracy', 'val_accuracy')
# accuracy is the accuracy computed on the training set
# val_accuracy is the accuracy computed on the validation set
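The same helper plots the loss curves when given the loss keys:
show_train_history(train_history, 'loss', 'val_loss')
# loss is the training-set loss, val_loss the validation-set loss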
4. Parameter experiments
Activation function | Number of neurons | Average time per epoch | Accuracy
---|---|---|---
relu | 256 | 1 s | 0.9760
relu | 1000 | 3-4 s | 0.9801
sigmoid | 256 | 1 s | 0.9645
tanh | 256 | 1 s | 0.9753
elu | 256 | 1 s | 0.9749

kernel_initializer | Accuracy
---|---
normal | 0.9760
random_uniform | 0.9778
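A sketch of how the table above could be reproduced by looping over configurations; the helper build_and_train is hypothetical, not from the original:
def build_and_train(units, activation, initializer):
    # Build, train and evaluate one model variant; returns test accuracy
    m = Sequential()
    m.add(Dense(units=units, input_dim=784,
                kernel_initializer=initializer, activation=activation))
    m.add(Dense(units=10, kernel_initializer=initializer, activation='softmax'))
    m.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    m.fit(x_Train_normalize, y_Train_OneHot, validation_split=0.2,
          epochs=10, batch_size=200, verbose=0)
    return m.evaluate(x_Test_normalize, y_Test_OneHot, verbose=0)[1]

for act in ['relu', 'sigmoid', 'tanh', 'elu']:
    print(act, build_and_train(256, act, 'normal'))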
256 neurons
Activation function: relu
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 256) 200960
_________________________________________________________________
dense_2 (Dense) (None, 10) 2570
=================================================================
Total params: 203,530
Trainable params: 203,530
Non-trainable params: 0
_________________________________________________________________
None
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
- 1s - loss: 0.4379 - accuracy: 0.8830 - val_loss: 0.2182 - val_accuracy: 0.9407
Epoch 2/10
- 1s - loss: 0.1909 - accuracy: 0.9454 - val_loss: 0.1559 - val_accuracy: 0.9555
Epoch 3/10
- 1s - loss: 0.1355 - accuracy: 0.9617 - val_loss: 0.1260 - val_accuracy: 0.9649
Epoch 4/10
- 1s - loss: 0.1027 - accuracy: 0.9704 - val_loss: 0.1119 - val_accuracy: 0.9683
Epoch 5/10
- 1s - loss: 0.0810 - accuracy: 0.9773 - val_loss: 0.0979 - val_accuracy: 0.9721
Epoch 6/10
- 1s - loss: 0.0659 - accuracy: 0.9817 - val_loss: 0.0936 - val_accuracy: 0.9722
Epoch 7/10
- 1s - loss: 0.0543 - accuracy: 0.9851 - val_loss: 0.0912 - val_accuracy: 0.9737
Epoch 8/10
- 1s - loss: 0.0460 - accuracy: 0.9877 - val_loss: 0.0830 - val_accuracy: 0.9767
Epoch 9/10
- 1s - loss: 0.0379 - accuracy: 0.9902 - val_loss: 0.0828 - val_accuracy: 0.9760
Epoch 10/10
- 1s - loss: 0.0316 - accuracy: 0.9917 - val_loss: 0.0807 - val_accuracy: 0.9769
Test:
10000/10000 [==============================] - 4s 374us/step
0.07602789112742801
0.9757999777793884
1000 neurons
Activation function: relu
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 1000) 785000
_________________________________________________________________
dense_2 (Dense) (None, 10) 10010
=================================================================
Total params: 795,010
Trainable params: 795,010
Non-trainable params: 0
_________________________________________________________________
None
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
- 3s - loss: 0.2944 - accuracy: 0.9152 - val_loss: 0.1528 - val_accuracy: 0.9565
Epoch 2/10
- 3s - loss: 0.1179 - accuracy: 0.9661 - val_loss: 0.1073 - val_accuracy: 0.9678
Epoch 3/10
- 3s - loss: 0.0759 - accuracy: 0.9783 - val_loss: 0.0922 - val_accuracy: 0.9724
Epoch 4/10
- 3s - loss: 0.0514 - accuracy: 0.9853 - val_loss: 0.0869 - val_accuracy: 0.9733
Epoch 5/10
- 3s - loss: 0.0357 - accuracy: 0.9905 - val_loss: 0.0754 - val_accuracy: 0.9757
Epoch 6/10
- 4s - loss: 0.0257 - accuracy: 0.9932 - val_loss: 0.0743 - val_accuracy: 0.9778
Epoch 7/10
- 4s - loss: 0.0185 - accuracy: 0.9958 - val_loss: 0.0724 - val_accuracy: 0.9793
Epoch 8/10
- 4s - loss: 0.0132 - accuracy: 0.9971 - val_loss: 0.0718 - val_accuracy: 0.9778
Epoch 9/10
- 4s - loss: 0.0087 - accuracy: 0.9988 - val_loss: 0.0712 - val_accuracy: 0.9798
Epoch 10/10
- 4s - loss: 0.0062 - accuracy: 0.9992 - val_loss: 0.0705 - val_accuracy: 0.9800
Test:
10000/10000 [==============================] - 6s 569us/step
0.06873653566057918
0.9797999858856201
PS: it can sometimes exceed 0.98
Activation function: sigmoid
256 neurons
Summary:
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_2 (Dense) (None, 256) 200960
_________________________________________________________________
dense_3 (Dense) (None, 10) 2570
=================================================================
Total params: 203,530
Trainable params: 203,530
Non-trainable params: 0
_________________________________________________________________
None
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
- 1s - loss: 0.7395 - accuracy: 0.8315 - val_loss: 0.3386 - val_accuracy: 0.9109
Epoch 2/10
- 1s - loss: 0.3100 - accuracy: 0.9136 - val_loss: 0.2560 - val_accuracy: 0.9277
Epoch 3/10
- 1s - loss: 0.2492 - accuracy: 0.9290 - val_loss: 0.2233 - val_accuracy: 0.9381
Epoch 4/10
- 1s - loss: 0.2119 - accuracy: 0.9391 - val_loss: 0.1974 - val_accuracy: 0.9424
Epoch 5/10
- 1s - loss: 0.1835 - accuracy: 0.9466 - val_loss: 0.1757 - val_accuracy: 0.9517
Epoch 6/10
- 1s - loss: 0.1608 - accuracy: 0.9533 - val_loss: 0.1607 - val_accuracy: 0.9551
Epoch 7/10
- 1s - loss: 0.1424 - accuracy: 0.9593 - val_loss: 0.1489 - val_accuracy: 0.9587
Epoch 8/10
- 1s - loss: 0.1269 - accuracy: 0.9638 - val_loss: 0.1394 - val_accuracy: 0.9621
Epoch 9/10
- 1s - loss: 0.1141 - accuracy: 0.9677 - val_loss: 0.1291 - val_accuracy: 0.9634
Epoch 10/10
- 1s - loss: 0.1025 - accuracy: 0.9711 - val_loss: 0.1216 - val_accuracy: 0.9659
10000/10000 [==============================] - 4s 380us/step
0.11642538407448501
0.9645000100135803
The result is clearly worse than with relu, likely because the sigmoid saturates for large inputs, which slows gradient-based learning
Activation function: tanh
256 neurons
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
- 1s - loss: 0.4394 - accuracy: 0.8801 - val_loss: 0.2483 - val_accuracy: 0.9302
Epoch 2/10
- 1s - loss: 0.2252 - accuracy: 0.9352 - val_loss: 0.1883 - val_accuracy: 0.9479
Epoch 3/10
- 1s - loss: 0.1681 - accuracy: 0.9514 - val_loss: 0.1556 - val_accuracy: 0.9580
Epoch 4/10
- 1s - loss: 0.1313 - accuracy: 0.9631 - val_loss: 0.1374 - val_accuracy: 0.9603
Epoch 5/10
- 1s - loss: 0.1064 - accuracy: 0.9704 - val_loss: 0.1214 - val_accuracy: 0.9652
Epoch 6/10
- 1s - loss: 0.0876 - accuracy: 0.9763 - val_loss: 0.1140 - val_accuracy: 0.9668
Epoch 7/10
- 1s - loss: 0.0728 - accuracy: 0.9802 - val_loss: 0.1063 - val_accuracy: 0.9694
Epoch 8/10
- 1s - loss: 0.0610 - accuracy: 0.9837 - val_loss: 0.0951 - val_accuracy: 0.9731
Epoch 9/10
- 1s - loss: 0.0510 - accuracy: 0.9870 - val_loss: 0.0926 - val_accuracy: 0.9721
Epoch 10/10
- 1s - loss: 0.0426 - accuracy: 0.9894 - val_loss: 0.0866 - val_accuracy: 0.9738
10000/10000 [==============================] - 4s 371us/step
0.08017727720420531
0.9753999710083008
Activation function: elu (exponential linear units)
256 neurons
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
- 1s - loss: 0.4413 - accuracy: 0.8773 - val_loss: 0.2636 - val_accuracy: 0.9261
Epoch 2/10
- 1s - loss: 0.2476 - accuracy: 0.9284 - val_loss: 0.2049 - val_accuracy: 0.9422
Epoch 3/10
- 1s - loss: 0.1849 - accuracy: 0.9471 - val_loss: 0.1645 - val_accuracy: 0.9557
Epoch 4/10
- 1s - loss: 0.1423 - accuracy: 0.9593 - val_loss: 0.1424 - val_accuracy: 0.9599
Epoch 5/10
- 1s - loss: 0.1139 - accuracy: 0.9676 - val_loss: 0.1232 - val_accuracy: 0.9658
Epoch 6/10
- 1s - loss: 0.0936 - accuracy: 0.9734 - val_loss: 0.1140 - val_accuracy: 0.9674
Epoch 7/10
- 1s - loss: 0.0781 - accuracy: 0.9778 - val_loss: 0.1070 - val_accuracy: 0.9692
Epoch 8/10
- 1s - loss: 0.0670 - accuracy: 0.9807 - val_loss: 0.0976 - val_accuracy: 0.9720
Epoch 9/10
- 1s - loss: 0.0570 - accuracy: 0.9839 - val_loss: 0.0939 - val_accuracy: 0.9725
Epoch 10/10
- 1s - loss: 0.0485 - accuracy: 0.9868 - val_loss: 0.0880 - val_accuracy: 0.9740
10000/10000 [==============================] - 4s 374us/step
0.07968259554752871
0.9749000072479248