Handwritten digit recognition with a simple neural network in the Keras deep learning framework

   Background

     Keras is not a standalone deep learning framework; it runs on top of a backend such as TensorFlow or Theano, and most developers use TensorFlow. With Keras you can stack the layers of a neural network like building blocks, and then compile, train, test, and predict with the resulting model.

    The handwritten digit recognition experiment introduced here is mainly meant to get familiar with the process and general idea of building a neural network in Keras. There are already countless handwritten digit recognition examples; what a beginner needs is a simple, hello-world-style example that walks through building a network step by step.

    The example here builds a network, trains a model, and saves it. We then load the model and run predictions on real pictures, which also tests what the network can actually do.

     The handwritten digit data comes from the official MNIST dataset, which contains 60,000 training images and 10,000 test images. Each image is a 28 * 28 = 784 matrix of pixel values. The pictures we prepare for our own tests should therefore also be 28*28 in size, and before the final prediction they must be converted into a 784-element array, just like the training and test sets.

    Prepare code

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras import datasets, utils
import matplotlib.pyplot as plt


# Load MNIST, flatten each 28x28 image into a 784-element vector and scale pixels to [0, 1]
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()
x_train = x_train.reshape((-1, 28*28))
x_train = x_train.astype('float32')/255
x_test = x_test.reshape((-1, 28*28))
x_test = x_test.astype('float32')/255

# One-hot encode the labels (10 classes, digits 0-9)
y_train = utils.to_categorical(y_train, num_classes=10)
y_test = utils.to_categorical(y_test, num_classes=10)

print('x_train.shape', x_train.shape)
print('x_test.shape', x_test.shape)
print('y_train.shape', y_train.shape)
print('y_test.shape', y_test.shape)
"""
layer = [Dense(32, input_shape=(784,)),
         Activation('relu'),
         Dense(10),
         Activation('softmax')]

model = Sequential(layer)
"""
model = Sequential()
# model.add(Dense(units=784, activation="relu", input_dim=784))  # equivalent: declare the input size with input_dim
model.add(Dense(512, activation="relu", input_shape=(28*28, )))  # input layer: 512 nodes, relu activation
model.add(Dense(10, activation="softmax"))  # output layer: one unit per digit 0-9

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()

history = model.fit(x_train, y_train, epochs=5, batch_size=128, validation_data=(x_test, y_test))

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'bo', label="Training accuracy")
plt.plot(epochs, val_acc, 'b', label="Validation accuracy")
plt.title('Training and Validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
model.save("mnist.h5")
prediction = model.predict(x_test[:1], batch_size=32)
print(x_test[:1])
print(y_test[:1])
print(prediction)
print(np.argmax(prediction, axis=1))

    After importing the relevant libraries, the first thing the code does is process the data:

(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()
x_train = x_train.reshape((-1, 28*28))
x_train = x_train.astype('float32')/255
x_test = x_test.reshape((-1, 28*28))
x_test = x_test.astype('float32')/255
y_train = utils.to_categorical(y_train, num_classes=10)
y_test = utils.to_categorical(y_test, num_classes=10)

print('x_train.shape', x_train.shape)
print('x_test.shape', x_test.shape)
print('y_train.shape', y_train.shape)
print('y_test.shape', y_test.shape)

     The datasets x_train and x_test hold the image data. Each image becomes an array of 784 elements: we first reshape the 28x28 matrix into a flat vector, then divide the pixel values by 255 so they fall between 0 and 1. The code prints x_test[:1] at the end, so you can see what such an array looks like.

    We also use utils.to_categorical(y_test, num_classes=10) to one-hot encode the labels. As the printed output shows, the digit 7 becomes [0,0,0,0,0,0,0,1,0,0] after one-hot encoding.
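
    A quick way to confirm this mapping is to call utils.to_categorical directly on a single label:

from tensorflow.keras import utils

# The label 7 becomes a 10-element vector with a 1 at index 7
print(utils.to_categorical([7], num_classes=10))
# -> [[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]]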

    The next part of the code builds a simple neural network with just two layers.

     The first (input) layer, Dense(512, activation="relu", input_shape=(28*28, )), has 512 nodes, uses the relu activation function, and declares an input shape (dimension) of 28*28 = 784. The commented-out line shows another way to specify the input dimension via input_dim: model.add(Dense(units=784, activation="relu", input_dim=784)) means the same thing, except that there units=784. The number of units can be chosen freely; for handwritten digit recognition, 512 or 784 both work.

    The second (output) layer, Dense(10, activation="softmax"), has ten units, one for each of the digits 0 through 9, because handwritten digit recognition is a multi-class classification problem.

     There are no hidden layers and no dropout; it is a very simple network.
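
     If you later want to experiment with a deeper model, a minimal sketch (the extra layer sizes and the dropout rate below are arbitrary choices, not part of this experiment) could look like this:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

deeper = Sequential()
deeper.add(Dense(512, activation="relu", input_shape=(784,)))
deeper.add(Dropout(0.2))                     # randomly zero out 20% of activations during training
deeper.add(Dense(256, activation="relu"))    # a hidden layer
deeper.add(Dense(10, activation="softmax"))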

     In addition, the commented-out block in the code shows another way to build the same kind of network:

layer = [Dense(32, input_shape=(784,)),
         Activation('relu'),
         Dense(10),
         Activation('softmax')]

model = Sequential(layer)

    The meaning is the same, except that here units=32, i.e. the first layer consists of 32 nodes.

model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()

    These two lines compile the network and print a summary of it.

    When compiling, loss="categorical_crossentropy" specifies cross-entropy as the loss function. optimizer="adam" selects the Adam optimizer, an adaptive algorithm; you may also see sgd (stochastic gradient descent) or rmsprop, another adaptive method. metrics=["accuracy"] specifies which metric to track, here the accuracy.
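
    As a side note, the string "adam" can be replaced with an explicit optimizer object if you want to tune its settings; a minimal sketch (the learning rate here is just an illustrative value) could look like this:

from tensorflow.keras.optimizers import Adam

# Same compile call, but with an explicit optimizer object so the learning rate can be adjusted
model.compile(loss="categorical_crossentropy",
              optimizer=Adam(learning_rate=0.001),
              metrics=["accuracy"])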

   Through model.summary() we can see the layers and parameter counts of the network:
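
    The exact formatting of the summary output depends on the Keras version, but for the two Dense layers above it should look roughly like this (784*512 + 512 = 401,920 and 512*10 + 10 = 5,130 parameters):

Layer (type)                 Output Shape              Param #
=================================================================
dense (Dense)                (None, 512)               401920
dense_1 (Dense)              (None, 10)                5130
=================================================================
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0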

   

history = model.fit(x_train, y_train, epochs=5, batch_size=128, validation_data=(x_test, y_test))

    This trains and evaluates the network in one call: the validation_data argument supplies the test set as validation data. If you do not pass validation_data, you can still obtain the loss, accuracy and other metrics afterwards via model.evaluate(x_test, y_test), as sketched below.
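
    For reference, a minimal sketch of that evaluate call (using the same x_test and y_test as above; the batch size is arbitrary):

# Compute the test metrics after training, if validation_data was not passed to fit()
test_loss, test_acc = model.evaluate(x_test, y_test, batch_size=128)
print('test loss:', test_loss)
print('test accuracy:', test_acc)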

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'bo', label="Training accuracy")
plt.plot(epochs, val_acc, 'b', label="Validation accuracy")
plt.title('Training and Validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

    We use matplotlib to plot acc and val_acc per epoch: training accuracy as blue dots ('bo') and validation accuracy as a blue line ('b').

       We also save the model via model.save("mnist.h5"), which we will load later for prediction.

prediction = model.predict(x_test[:1], batch_size=32)
print(x_test[:1])
print(y_test[:1])
print(prediction)
print(np.argmax(prediction, axis=1))

    We run a quick check on the first sample of the test set, the digit 7. The point of this check is to learn what kind of data we need to feed in later for our own pictures, and how to read the prediction result. Here prediction is an array of class probabilities; the class with the highest probability is the final result. np.argmax(prediction, axis=1) returns, for each row (axis=1), the index of the largest value, which is the predicted digit.
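
    As a toy illustration of what np.argmax does here (the probabilities below are made up, with the largest value deliberately placed at index 7):

import numpy as np

# One fake probability row; the largest value sits at index 7
probs = np.array([[0.01, 0.0, 0.02, 0.0, 0.01, 0.0, 0.01, 0.90, 0.03, 0.02]])
print(np.argmax(probs, axis=1))   # -> [7]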

***************************************************************

    Predict

     In many code examples, the walkthrough basically stops after model.evaluate() scores the model. For someone just getting started, the network has been built and tested, but they still don't know whether it really works, because the training set and test set are the official examples. For a programmer, verifying a hunch in practice is what matters most, regardless of what the task is.

     At the end of the code above we made a simple prediction on x_test[:1], the first test digit, so we know that the model expects a 784-element array (28*28 = 784). The test pictures we prepare ourselves should therefore match the official test data: digit images of 28*28 pixels.
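
     If a picture is not already 28*28, it can be converted first; a minimal preprocessing sketch (the file name my_digit.png is just a placeholder, and the inversion step is only needed if the digit is dark on a light background) might look like this:

import cv2

img = cv2.imread("my_digit.png", 0)   # read as grayscale
img = cv2.resize(img, (28, 28))       # scale to 28x28 pixels
# MNIST digits are white strokes on a black background; uncomment if your image is the opposite
# img = 255 - img
cv2.imwrite("number_images/b_0.png", img)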

    The prediction code is as follows:

import numpy as np
import cv2
from tensorflow.keras.models import load_model

# Load the model saved earlier by model.save("mnist.h5")
model = load_model("mnist.h5")


def predict(img_path):
    img = cv2.imread(img_path, 0)                       # read the image as grayscale
    img = img.reshape(28, 28).astype("float32") / 255   # scale pixels to [0, 1]
    img = img.reshape(-1, 784)                          # flatten 28 * 28 -> 784
    label = model.predict(img)                          # array of 10 class probabilities
    label = np.argmax(label, axis=1)                    # index of the highest probability
    print('{} -> {}'.format(img_path, label[0]))


if __name__ == '__main__':
    for _ in range(10):
        predict("number_images/b_{}.png".format(_))

    We put these images in the number_images directory, named b_0.png, b_1.png, and so on up to b_9.png.

    Finally, we load the model, read each image with the OpenCV library, and convert the image matrix into a 784-element array before handing it to the model for prediction. The prediction result is an array of probabilities, and the index with the highest probability is taken as the predicted digit.

     The script prints one prediction per image, in the form number_images/b_0.png -> predicted digit.

    The results are rather underwhelming: the accuracy is only about 60%, and even that figure is flattering for recognizing our own handwritten pictures, because quite a few of the digit images are still recognized incorrectly.

    This article focused on building a simple neural network with Keras, training and testing it, and finally using our own handwritten digit pictures for prediction, getting a first taste of deep learning along the way.

    The keras and tensorflow versions used in this article are 2.8.0. A few APIs may differ from what you see elsewhere; for example, datasets is taken from tensorflow.keras.datasets here, and the accuracy is read from history.history['accuracy'], while in some places it may be history.history['acc']. This is a version difference; just use whichever form works for your own version.

Origin blog.csdn.net/feinifi/article/details/132523461