Building the LeNet5 neural network model with the Keras deep learning framework for handwritten digit recognition

    The previous two articles used the Keras deep learning framework to build a simple neural network and a convolutional neural network for handwritten digit recognition. This article shares a convolutional neural network I built based on the LeNet5 model for the same task.

    This experiment builds a convolutional neural network based on the LeNet5 model. The schematic diagram of the LeNet5 model is as follows:

    You have probably seen this model in many places, but much of the published code does not actually follow it strictly. Why? Mainly because the mnist handwritten digit images are 28*28, while this LeNet5 model expects 32*32 input. To make do, people quietly change the model's parameters so that the input becomes 28*28. Then after the first convolution the feature maps become 6@24*24, downsampling halves them to 6@12*12, and by the time the data reaches the fully connected layer the feature maps are 16@4*4.

    In fact, there is nothing wrong with this modification, and it still achieves the purpose of the experiment; after training, the test accuracy can reach more than 98%.
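
    For reference, here is a minimal sketch (my own illustration, not from the paper) of that common 28*28 variant, with the resulting feature map sizes marked:

from keras.models import Sequential
from keras.layers import Conv2D, AveragePooling2D, Dense, Flatten

# 28*28 variant: no resizing of the mnist images is needed
variant = Sequential()
variant.add(Conv2D(6, kernel_size=(5, 5), activation='tanh', input_shape=(28, 28, 1)))  # -> 6@24*24
variant.add(AveragePooling2D(pool_size=(2, 2)))  # -> 6@12*12
variant.add(Conv2D(16, kernel_size=(5, 5), activation='tanh'))  # -> 16@8*8
variant.add(AveragePooling2D(pool_size=(2, 2)))  # -> 16@4*4
variant.add(Flatten())  # -> 256-dimensional vector
variant.add(Dense(120, activation='tanh'))
variant.add(Dense(84, activation='tanh'))
variant.add(Dense(10, activation='softmax'))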

    To really follow this LeNet5 model, you need 32*32 input. As far as I know, only the KNN handwritten digit recognition experiment provides data in a 32*32=1024 format, with the training set trainingDigits and the test set testDigits, and the amount of data there is relatively small: about 2,000 training samples and a little over 900 test samples. Also, that data is stored as text, so to use it you would have to read and convert the text files.
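
    As a rough sketch (assuming that dataset's layout, where each sample is a text file of 32 lines of 32 '0'/'1' characters; the helper name is my own), the conversion could look like this:

import numpy as np

# Hypothetical helper: read one 32*32 text digit into a (32, 32, 1) array
def text_digit_to_matrix(path):
    with open(path) as f:
        rows = [list(line.strip()) for line in f if line.strip()]
    return np.array(rows, dtype='float32').reshape(32, 32, 1)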

    There is a compromise: we can use the resize method provided by opencv to convert the 28*28 mnist images into 32*32. This may cause some loss of accuracy, but since the test data and the images used for prediction are resized in the same way, the accuracy impact can be considered to roughly cancel out.
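
    For instance, a minimal sketch (cv2.resize defaults to bilinear interpolation, and its dsize argument is (width, height), which does not matter for square images):

import cv2
import numpy as np

img28 = np.zeros((28, 28), dtype="uint8")  # stand-in for one mnist image
img32 = cv2.resize(img28, (32, 32))        # bilinear upscaling by default
print(img32.shape)                         # (32, 32)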

    In fact, the model itself leaves some ambiguities. The figure does not specify how the downsampling layers work, whether max or average pooling is used, nor which activation function the convolution layers use; you can only find out by reading the paper carefully.

    Therefore, in implementations the convolution layers generally use the tanh activation, though relu works fine too. Likewise, the downsampling (SubSampling) layers can be implemented with either MaxPool2D or AveragePooling2D.

    A brief description of the model:

    Input: image matrix of shape (32, 32)

    The first convolutional layer: 6 convolution kernels of size 5*5, so the output feature maps are 6@(32-5+1)*(32-5+1) = 6@28*28

    The second layer, downsampling (SubSampling): 2*2 pooling, which halves the feature maps to 6@14*14

    The third convolutional layer: 16 convolution kernels of size 5*5; output feature maps 16@(14-5+1)*(14-5+1) = 16@10*10

    The fourth layer, downsampling: 2*2 pooling, halving the feature maps to 16@5*5

    The fifth layer: the feature maps are flattened and fed into a fully connected layer of 120 neurons

    The sixth layer, fully connected: 84 neurons

    The seventh layer, output: 10 neurons

    Next, build a neural network based on the mnist data set and the LeNet5 model, then train and test it. The code is as follows:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, AveragePooling2D
from keras.layers import Dense, Flatten
import keras
from keras.datasets import mnist
from keras.utils import np_utils
from keras.utils.vis_utils import plot_model
import numpy as np
import cv2

# Load the data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
input_shape = (32, 32, 1)
train_x = []
test_x = []

# Resize each 28*28 mnist image to 32*32, as required by the LeNet5 input
for val in X_train:
    img = cv2.resize(val, (input_shape[0], input_shape[1]))
    train_x.append(img.reshape(input_shape))

for val_ in X_test:
    img = cv2.resize(val_, (input_shape[0], input_shape[1]))
    test_x.append(img.reshape(input_shape))

# Preprocessing: scale pixel values to [0, 1]
X_train = np.array(train_x) / 255.0
X_test = np.array(test_x) / 255.0

# to_categorical() converts class vectors to a binary (0/1) one-hot matrix representation
y_train = np_utils.to_categorical(y_train, num_classes=10)
y_test = np_utils.to_categorical(y_test, num_classes=10)

model = Sequential()
model.add(Conv2D(6, kernel_size=(5, 5), activation='tanh', input_shape=input_shape))
model.add(AveragePooling2D(pool_size=(2, 2)))
model.add(Conv2D(16, kernel_size=(5, 5), activation='tanh'))
model.add(AveragePooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(120, activation='tanh'))
model.add(Dense(84, activation='tanh'))
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
# Train
model.fit(X_train, y_train, batch_size=128, epochs=10)

# Evaluate the model
score = model.evaluate(X_test, y_test)
print('acc', score[1])
plot_model(model, to_file='model.png', show_shapes=True)
model.save("lenet5.h5")

    Run the code, and the model's parameter information is printed as follows:

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 28, 28, 6)         156       
                                                                 
 average_pooling2d (AverageP  (None, 14, 14, 6)        0         
 ooling2D)                                                       
                                                                 
 conv2d_1 (Conv2D)           (None, 10, 10, 16)        2416      
                                                                 
 average_pooling2d_1 (Averag  (None, 5, 5, 16)         0         
 ePooling2D)                                                     
                                                                 
 flatten (Flatten)           (None, 400)               0         
                                                                 
 dense (Dense)               (None, 120)               48120     
                                                                 
 dense_1 (Dense)             (None, 84)                10164     
                                                                 
 dense_2 (Dense)             (None, 10)                850       
                                                                 
=================================================================
Total params: 61,706
Trainable params: 61,706
Non-trainable params: 0
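
    As a quick hand check (my own arithmetic, not part of the original code), these parameter counts can be verified:

# Hand-check of the parameter counts reported by model.summary()
conv1 = (5 * 5 * 1 + 1) * 6    # 156: 5*5 kernels over 1 input channel, plus bias, 6 filters
conv2 = (5 * 5 * 6 + 1) * 16   # 2416
dense1 = (400 + 1) * 120       # 48120: 16*5*5 = 400 flattened inputs
dense2 = (120 + 1) * 84        # 10164
dense3 = (84 + 1) * 10         # 850
print(conv1 + conv2 + dense1 + dense2 + dense3)  # 61706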

    The training and testing process is as follows:

Epoch 1/10
2023-08-30 22:04:32.166768: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8800
469/469 [==============================] - 8s 12ms/step - loss: 0.3042 - accuracy: 0.9110
Epoch 2/10
469/469 [==============================] - 5s 10ms/step - loss: 0.1113 - accuracy: 0.9664
Epoch 3/10
469/469 [==============================] - 5s 10ms/step - loss: 0.0709 - accuracy: 0.9784
Epoch 4/10
469/469 [==============================] - 5s 11ms/step - loss: 0.0530 - accuracy: 0.9843
Epoch 5/10
469/469 [==============================] - 5s 11ms/step - loss: 0.0410 - accuracy: 0.9875
Epoch 6/10
469/469 [==============================] - 5s 11ms/step - loss: 0.0329 - accuracy: 0.9898
Epoch 7/10
469/469 [==============================] - 6s 12ms/step - loss: 0.0283 - accuracy: 0.9910
Epoch 8/10
469/469 [==============================] - 6s 12ms/step - loss: 0.0228 - accuracy: 0.9928
Epoch 9/10
469/469 [==============================] - 6s 12ms/step - loss: 0.0193 - accuracy: 0.9939
Epoch 10/10
469/469 [==============================] - 6s 13ms/step - loss: 0.0159 - accuracy: 0.9949
313/313 [==============================] - 2s 4ms/step - loss: 0.0406 - accuracy: 0.9871
acc 0.9871000051498413

    The test accuracy is as high as 98.7%.

    At the end of the code, we also save an image of the model architecture via plot_model:

 

    In addition, we saved the trained model to the file lenet5.h5 for use in prediction.

    The prediction script is the same hand-me-down code as before, only with the image resized to 32*32 pixels, because the prediction images are still 28*28 pixels, with white digits on a black background:

import keras
import numpy as np
import cv2
from keras.models import load_model

model = load_model("lenet5.h5")


def predict(img_path):
    img = cv2.imread(img_path, 0)  # read as grayscale
    img = cv2.resize(img, (32, 32))
    img = img.astype("float32") / 255  # scale to [0, 1]
    img = img.reshape(1, 32, 32, 1)  # (32, 32) -> batch of one (1, 32, 32, 1)
    label = model.predict(img)
    label = np.argmax(label, axis=1)
    print('{} -> {}'.format(img_path, label[0]))


if __name__ == '__main__':
    for _ in range(10):
        predict("number_images/b_{}.png".format(_))

    Experimental results:

 

    This result is not really unexpected, nor does it mean the model is so powerful that prediction reaches 100%. Only 10 images are predicted here, which is not many, and I had already tested these images myself, so the results look more impressive than they are.

    There are several variables here. The first is the input size problem mentioned earlier: if we change 32*32 to 28*28, we no longer need to reshape the mnist data, although the network then deviates from the original model (it still trains with high test and prediction accuracy). Another variable is the activation function of the convolution layers: tanh is used here, but relu works too. Yet another is using MaxPool2D instead of AveragePooling2D for downsampling. Finally, when compiling the model we use the rmsprop optimizer, but adam can also be used.
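
    For example, a minimal sketch of those substitutions (my own variation, not the code used above) would be:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Dense, Flatten

# Same architecture with relu, max pooling, and the adam optimizer swapped in
alt = Sequential()
alt.add(Conv2D(6, kernel_size=(5, 5), activation='relu', input_shape=(32, 32, 1)))
alt.add(MaxPool2D(pool_size=(2, 2)))
alt.add(Conv2D(16, kernel_size=(5, 5), activation='relu'))
alt.add(MaxPool2D(pool_size=(2, 2)))
alt.add(Flatten())
alt.add(Dense(120, activation='relu'))
alt.add(Dense(84, activation='relu'))
alt.add(Dense(10, activation='softmax'))
alt.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])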

    The LeNet5 model is actually well suited to digit recognition, but there is no readily available 32*32 digit training set, while the cifar10 data set is 32*32 yet contains general images rather than digits. Therefore, from a coding perspective, this model is most readily applied to general image classification.

Origin blog.csdn.net/feinifi/article/details/132591032