Artificial Intelligence Study Notes 4 - Handwritten Number Recognition

This article uses a convolutional neural network to classify the MNIST handwritten digit dataset, with Keras as the framework. If you are not familiar with convolutional neural networks, see the article Convolutional Neural Networks (caodong0225.github.io).

MNIST is an image dataset of handwritten digits, derived from data collected by the National Institute of Standards and Technology (NIST). The digits were written by 250 different people, 50% of whom were high school students and 50% of whom were Census Bureau staff. The dataset was assembled so that algorithms for recognizing handwritten digits could be developed and compared.

The training set contains 60,000 images with labels, and the test set contains 10,000 images with labels. The first 5,000 test images come from the original NIST training set, and the last 5,000 come from the original NIST test set. The first 5,000 are cleaner and easier to recognize than the last 5,000, because the first 5,000 were written by U.S. Census Bureau employees while the last 5,000 were written by high school students.

Download link:

caodong0225.github.io/minist.zip at master · caodong0225/caodong0225.github.io

Since 1998, this dataset has been widely used in machine learning and deep learning to benchmark algorithms such as linear classifiers, K-nearest neighbors, support vector machines (SVMs), neural networks, and convolutional networks.

Figure 1 (part of the MNIST handwritten digit dataset)

Keras is an open-source artificial neural network library written in Python. It can serve as a high-level application programming interface for TensorFlow, Microsoft CNTK, and Theano to design, debug, evaluate, apply, and visualize deep learning models.

The Keras code base is object-oriented, fully modular, and extensible. Its design and documentation put user experience and ease of use first, and aim to make complex algorithms simpler to implement. Keras supports the mainstream algorithms of modern artificial intelligence, including feedforward and recurrent neural networks, and can also take part in building statistical learning models through wrappers. In terms of hardware and development environment, Keras supports multi-GPU parallel computing on multiple operating systems and, depending on the backend setting, can be converted into components of TensorFlow, Microsoft CNTK, and other systems. For these reasons, this article uses Keras as the framework.

Since each input image of a handwritten digit is 28 pixels wide and 28 pixels high, and is grayscale (a single channel), an overly complicated structure is unnecessary. The network first applies two convolution layers with 3×3 kernels, then a pooling layer, then two more convolution layers and another pooling layer. The result is flattened, passed through a 128-dimensional fully connected layer, and finally mapped to a 10-dimensional output. Every layer except the last uses the ReLU activation function; the last layer uses softmax, and the loss function is categorical cross-entropy. The structure is shown in Figure 2.

Figure 2 Schematic diagram of convolutional neural network

Figure 3 Schematic diagram of convolutional neural network
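To make the two basic operations in this structure concrete, here is a minimal pure-Python sketch of a 3×3 convolution with 'same' zero padding, followed by ReLU and 2×2 max pooling. It is a simplification for illustration only: it handles a single channel, omits the bias term, and the names conv2d_same, relu and max_pool_2x2 are hypothetical helpers, not Keras APIs.

```python
def conv2d_same(image, kernel):
    """Slide a square, odd-sized kernel over a 2-D image (no kernel flip).

    Positions outside the image count as zero, so the output keeps
    the same height and width as the input ('same' padding).
    """
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for i in range(k):
                for j in range(k):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < h and 0 <= cc < w:
                        total += image[rr][cc] * kernel[i][j]
            out[r][c] = total
    return out


def relu(feature_map):
    """Replace negative values with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the largest value of each non-overlapping 2x2 block."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, w - 1, 2)
        ]
        for r in range(0, h - 1, 2)
    ]


# Toy input: a 4x4 image of ones and a 3x3 kernel of ones.
image = [[1.0] * 4 for _ in range(4)]
kernel = [[1.0] * 3 for _ in range(3)]

feature_map = relu(conv2d_same(image, kernel))
pooled = max_pool_2x2(feature_map)

print(len(feature_map), len(feature_map[0]))  # size after convolution
print(len(pooled), len(pooled[0]))            # size after pooling
```

On the 4×4 toy input the convolution keeps the 4×4 size and the pooling step halves it to 2×2, which mirrors how the 28×28 digit images shrink to 14×14 and then 7×7 in the network described above.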

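The code for building the network is as follows (the full listing at the end of this article additionally places a Dropout layer with rate 0.5 before the output layer):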

# Build the convolutional neural network
import keras
from keras import layers

def create_model():
    model = keras.Sequential()
    # Two 3x3 convolutions with 5 filters each; 'same' padding keeps the 28x28 size
    model.add(layers.Conv2D(5, (3, 3), activation='relu', input_shape=(28, 28, 1), padding='same'))
    model.add(layers.Conv2D(5, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    # Two 3x3 convolutions with 10 filters each, followed by a second pooling step
    model.add(layers.Conv2D(10, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(10, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    model.add(layers.Flatten())  # flatten the feature maps into a one-dimensional array
    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))  # one output per digit class
    return model
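The output shape and parameter count of each layer in create_model can be traced by hand. The sketch below does that arithmetic in plain Python, assuming the standard counting rules (a convolution layer has (kernel height × kernel width × input channels + 1) × filters parameters, a dense layer has inputs × units + units). The helper names conv_same, max_pool and trace_network are hypothetical and only serve this illustration; the result should agree with what model.summary() prints for the same layers.

```python
def conv_same(shape, filters, kernel=3):
    """Shape and parameter count of a 'same'-padded convolution layer."""
    h, w, c = shape
    params = (kernel * kernel * c + 1) * filters  # weights plus one bias per filter
    return (h, w, filters), params


def max_pool(shape, size=2):
    """Max pooling shrinks height and width and has no parameters."""
    h, w, c = shape
    return (h // size, w // size, c), 0


def trace_network(input_shape=(28, 28, 1)):
    """Return (layer kind, output shape, parameter count) for every layer."""
    plan = [("conv", 5), ("conv", 5), ("pool", 2),
            ("conv", 10), ("conv", 10), ("pool", 2)]
    shape, rows = input_shape, []
    for kind, arg in plan:
        if kind == "conv":
            shape, params = conv_same(shape, arg)
        else:
            shape, params = max_pool(shape, arg)
        rows.append((kind, shape, params))

    flat = shape[0] * shape[1] * shape[2]
    rows.append(("flatten", (flat,), 0))

    for units in (128, 10):
        rows.append(("dense", (units,), flat * units + units))
        flat = units
    return rows


rows = trace_network()
for kind, shape, params in rows:
    print(f"{kind:8s} {str(shape):14s} {params:6d}")

total_params = sum(params for _, _, params in rows)
print("total parameters:", total_params)
```

Almost all of the parameters sit in the first fully connected layer, which connects the 7 × 7 × 10 = 490 flattened values to 128 units; the four convolution layers together contribute only a small fraction.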

After 25 epochs of training, the loss of the network dropped to 0.0307 and the accuracy reached 0.99.
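To give the loss value some intuition, the following sketch implements softmax and categorical cross-entropy in plain Python. For a one-hot label the loss of a single sample is the negative logarithm of the probability assigned to the correct class, so the exponential of the negative mean loss is the geometric mean of those probabilities. The function names and the example logits are made up for illustration; only the value 0.0307 comes from the training run above.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot, probs, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


# Illustrative logits for a 3-class problem (not taken from the model).
probs = softmax([1.0, 2.0, 3.0])
label = [0, 0, 1]
sample_loss = categorical_crossentropy(label, probs)
print("probabilities:", probs)
print("loss for this sample:", sample_loss)

# Geometric-mean probability of the correct class implied by a mean loss of 0.0307.
typical_prob = math.exp(-0.0307)
print("implied probability of the correct class:", round(typical_prob, 4))
```

Under this reading, a mean loss of 0.0307 means the network assigns roughly 97% probability to the correct digit on average (in the geometric-mean sense).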

Figure 4 shows how the accuracy changes during training. Train is the accuracy on the training set, and Test is the accuracy on the test set.

Figure 5 shows how the loss changes during training. Train is the loss on the training set, and Test is the loss on the test set.

The overall training process is shown in Figure 6.

Figure 4 Model accuracy change curve

Figure 5 Model loss value change curve

Figure 6 Training process diagram

The full code is:

from PIL import Image
import numpy as np
from keras import layers
import keras
import matplotlib.pyplot as plt
import glob

np.set_printoptions(threshold=np.inf)

wid = 28  # image width in pixels
hei = 28  # image height in pixels

def process(preimg):
    # Read an image file and convert it to a grayscale np.array of shape (hei, wid)
    imge = Image.open(preimg, 'r')
    imge = imge.convert('L')
    imge = imge.resize((wid, hei))
    return np.asarray(imge)

trainset = []
trainexpe = []
testset = []
testexpe = []

train_src = glob.glob("train_images/*.jpg")  # training data
test_src = glob.glob("test_images/*.jpg")  # test data

for data in train_src:  # fill the lists with images and one-hot labels
    trainset.append(process(data))
    datatem = [0] * 10
    datatem[int(data[-5])] = 1  # the character just before ".jpg" is the digit label
    trainexpe.append(datatem)

for data in test_src:
    testset.append(process(data))
    datatem = [0] * 10
    datatem[int(data[-5])] = 1
    testexpe.append(datatem)

# Add the channel axis expected by Conv2D: (samples, height, width, 1)
trainset = np.array(trainset).reshape((-1, hei, wid, 1))
trainexpe = np.array(trainexpe)
testset = np.array(testset).reshape((-1, hei, wid, 1))
testexpe = np.array(testexpe)

def create_model():
    model = keras.Sequential()
    # Conv2D arguments:
    #   filters: number of convolution kernels to train
    #   kernel_size: size of each convolution kernel
    #   activation: activation function that provides the nonlinearity
    #   input_shape: shape of the input data
    #   padding: 'same' pads the border with zeros, 'valid' applies no padding
    #   strides: step size of the sliding window
    model.add(layers.Conv2D(5, (3, 3), activation='relu', input_shape=(hei, wid, 1), padding='same'))
    # Later layers need no input_shape; it is inferred from the previous layer's output
    model.add(layers.Conv2D(5, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))  # (2, 2) is also the default pool size
    model.add(layers.Conv2D(10, (3, 3), activation='relu', padding='same'))
    model.add(layers.Conv2D(10, (3, 3), activation='relu', padding='same'))
    model.add(layers.MaxPooling2D(pool_size=(2, 2)))
    # model.summary()  # print the model information
    model.add(layers.Flatten())  # flatten the feature maps into a one-dimensional array
    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.Dropout(0.5))  # randomly drop half of the activations during training
    model.add(layers.Dense(10, activation='softmax'))
    return model

model = create_model()  # or reload a saved model: keras.models.load_model("test.h5")
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=['accuracy'])
history = model.fit(trainset, trainexpe, batch_size=60, epochs=25, verbose=2,
                    validation_data=(testset, testexpe))

# Plot training and validation accuracy
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

# Plot training and validation loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()

model.save('test.h5')
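The listing takes each label from the character just before the ".jpg" extension and turns it into a one-hot vector. A slightly more defensive version of that step might look like the sketch below. The helper names label_from_path and one_hot, and the example file name 123_7.jpg, are assumptions made for illustration; the actual file naming inside the downloaded archive should be checked before relying on it.

```python
import os


def label_from_path(path):
    """Return the digit stored as the last character of the file name stem.

    Assumes names such as 'train_images/123_7.jpg', where the final
    character before the extension is the class label.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    last = stem[-1]
    if not last.isdigit():
        raise ValueError(f"no digit label found in file name: {path}")
    return int(last)


def one_hot(digit, num_classes=10):
    """Encode a class index as a list with a single 1 at that index."""
    if not 0 <= digit < num_classes:
        raise ValueError(f"label {digit} is outside 0..{num_classes - 1}")
    vector = [0] * num_classes
    vector[digit] = 1
    return vector


example_path = "train_images/123_7.jpg"  # hypothetical file name
digit = label_from_path(example_path)
encoded = one_hot(digit)
print(digit, encoded)
```

Compared with indexing the path at position -5, this variant does not depend on the extension being exactly four characters long and fails with a clear error when a file name carries no digit label.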


Origin blog.csdn.net/qq_45198339/article/details/128708159