Handwritten digit recognition with a convolutional neural network (CNN) in the Keras deep learning framework

Yesterday we built a simple neural network with Keras to recognize handwritten digits. When we finally ran it on our own handwritten digit images, the accuracy was a worrying 60%. Today we implement handwritten digit recognition with a convolutional neural network instead.

The idea behind building a convolutional neural network is similar to that of the simple network, except that concepts such as convolution and pooling are added. The network structure is a bit more complicated, but the overall workflow is the same: load the data set, preprocess the data, build the network model, compile the model, train it, save it, and use it to predict.

Here are two examples. The first builds the network and saves the trained model. The second predicts our own handwritten digit pictures by loading the saved model.

import keras
import numpy as np
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout, Conv2D, Flatten, MaxPool2D
from tensorflow.keras import datasets, utils
# Data preparation
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()
# Reshape to (samples, 28, 28, 1) and scale pixel values to [0, 1]
x_train = x_train.reshape(x_train.shape[0], x_train.shape[1], x_train.shape[2], 1)
x_train = x_train.astype('float32') / 255
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1], x_test.shape[2], 1)
x_test = x_test.astype('float32') / 255
# One-hot encode the labels
y_train = utils.to_categorical(y_train, num_classes=10)
y_test = utils.to_categorical(y_test, num_classes=10)
# Build the model
model = Sequential()
model.add(Conv2D(filters=16, kernel_size=(3, 3), padding='same', activation="relu", input_shape=(28, 28, 1)))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Conv2D(filters=36, kernel_size=(3, 3), padding='same', activation="relu"))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dropout(0.25))
model.add(Dense(10, activation="softmax"))
# Compile the model
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
# Train the model
model.fit(x_train, y_train, epochs=5, batch_size=128, validation_data=(x_test, y_test))
# Save the trained model
model.save("mnist.h5")

Training the model prints the following output:

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 28, 28, 16)        160       
                                                                 
 max_pooling2d (MaxPooling2D  (None, 14, 14, 16)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 14, 14, 36)        5220      
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 7, 7, 36)         0         
 2D)                                                             
                                                                 
 dropout (Dropout)           (None, 7, 7, 36)          0         
                                                                 
 flatten (Flatten)           (None, 1764)              0         
                                                                 
 dense (Dense)               (None, 128)               225920    
                                                                 
 dropout_1 (Dropout)         (None, 128)               0         
                                                                 
 dense_1 (Dense)             (None, 10)                1290      
                                                                 
=================================================================
Total params: 232,590
Trainable params: 232,590
Non-trainable params: 0
_________________________________________________________________
Epoch 1/5
2023-08-28 16:03:54.677314: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8800
469/469 [==============================] - 10s 17ms/step - loss: 0.2842 - accuracy: 0.9123 - val_loss: 0.0628 - val_accuracy: 0.9798
Epoch 2/5
469/469 [==============================] - 7s 16ms/step - loss: 0.0836 - accuracy: 0.9743 - val_loss: 0.0473 - val_accuracy: 0.9841
Epoch 3/5
469/469 [==============================] - 7s 16ms/step - loss: 0.0627 - accuracy: 0.9801 - val_loss: 0.0325 - val_accuracy: 0.9886
Epoch 4/5
469/469 [==============================] - 7s 15ms/step - loss: 0.0497 - accuracy: 0.9844 - val_loss: 0.0346 - val_accuracy: 0.9882
Epoch 5/5
469/469 [==============================] - 7s 15ms/step - loss: 0.0422 - accuracy: 0.9867 - val_loss: 0.0298 - val_accuracy: 0.9898

The final validation accuracy exceeds 98.5%.
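If you want to confirm that figure outside of the training log, a minimal sketch, assuming the model and test data from the script above are still in memory, is:

# Re-evaluate the trained model on the held-out test set.
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print("test loss: {:.4f}, test accuracy: {:.4f}".format(test_loss, test_acc))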

Predicting with the saved model

import keras
import numpy as np
import cv2
from keras.models import load_model

model = load_model("mnist.h5")


def predict(img_path):
    img = cv2.imread(img_path, 0)  # read as a grayscale image
    img = img.reshape(28, 28).astype("float32") / 255  # scale pixel values to [0, 1]
    img = img.reshape(1, 28, 28, 1)  # add batch and channel dims: (28, 28) -> (1, 28, 28, 1)
    label = model.predict(img)
    label = np.argmax(label, axis=1)  # index of the highest-probability class
    print('{} -> {}'.format(img_path, label[0]))


if __name__ == '__main__':
    for i in range(10):
        predict("number_images/b_{}.png".format(i))

The ten handwritten digit images are placed in the number_images directory of the project.

The prediction results, one line per image, are printed to the console.

The difference is noticeable: the accuracy on our own images rises from 60% to 90%. It is not 100%, but it is pretty good.

Compared with the previous code, the changes are very small. The main one is the shape of the data fed into the network: the simple neural network expects flattened vectors of shape (batch, 784), while the convolutional neural network expects image tensors of shape (batch, 28, 28, 1), so the data-preparation step was adjusted accordingly. The other difference is the network itself: before it was a simple two-layer network, while the convolutional neural network is considerably more complex.
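As a small illustration of that shape difference, here is a sketch that reshapes the raw (60000, 28, 28) array returned by mnist.load_data() for each network type:

from tensorflow.keras import datasets

(x_train, _), _ = datasets.mnist.load_data()            # raw shape: (60000, 28, 28)
x_dense = x_train.reshape(x_train.shape[0], 784)        # simple dense network input: (batch, 784)
x_cnn = x_train.reshape(x_train.shape[0], 28, 28, 1)    # convolutional network input: (batch, 28, 28, 1)
print(x_dense.shape, x_cnn.shape)                       # (60000, 784) (60000, 28, 28, 1)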

Origin: blog.csdn.net/feinifi/article/details/132540824