Using Keras to train a LeNet network for handwritten digit recognition

This blog describes how to train a LeNet network with Keras for handwritten digit recognition.

  • The LeNet architecture is a pioneering work in deep learning that demonstrates how to train a neural network to recognize objects in an image in an end-to-end fashion (i.e. without explicit feature extraction, the network learns patterns from the image itself). It was first introduced by LeCun et al. in their 1998 paper, "Gradient-Based Learning Applied to Document Recognition". As the title suggests, the authors' motivation for LeNet was primarily Optical Character Recognition (OCR).
  • Although groundbreaking, LeNet is considered a "shallow" network by today's standards. With only four trainable layers (two CONV layers and two FC layers), LeNet's depth is dwarfed by that of current state-of-the-art architectures such as VGG (16 and 19 layers) and ResNet (over 100 layers).
  • The LeNet architecture is simple and small (in terms of memory footprint), making it ideal for learning the basics of CNNs.

This blog will first review the LeNet architecture and then implement the network using Keras. Finally, LeNet for handwritten digit recognition will be evaluated on the MNIST dataset.

1. Results

Training for 20 or 10 epochs produced errors and pegged the CPU at 100%. After adjusting the run to 8 epochs, it succeeded...

2022-07-04 22:34:57.847384: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2022-07-04 22:34:57.848391: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
[INFO] accessing MNIST...
[INFO] compiling model...
D:\python374\lib\site-packages\keras\optimizer_v2\optimizer_v2.py:356: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
  "The `lr` argument is deprecated, use `learning_rate` instead.")
2022-07-04 22:35:35.461843: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'nvcuda.dll'; dlerror: nvcuda.dll not found
2022-07-04 22:35:35.462571: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-07-04 22:35:35.467148: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: WIN10-20180515Z
2022-07-04 22:35:35.467837: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: WIN10-20180515Z
2022-07-04 22:35:35.468665: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
[INFO] training network...
2022-07-04 22:35:38.528379: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
Epoch 1/8
  1/469 [..............................] - ETA: 4:54 - loss: 2.3132 - accuracy: 0.1250
  2/469 [..............................] - ETA: 27s - loss: 2.3172 - accuracy: 0.1211 
  3/469 [..............................] - ETA: 27s - loss: 2.3099 - accuracy: 0.1354
  4/469 [..............................] - ETA: 26s - loss: 2.3119 - accuracy: 0.1387
  5/469 [..............................] - ETA: 27s - loss: 2.3136 - accuracy: 0.1375
  6/469 [..............................] - ETA: 27s - loss: 2.3145 - accuracy: 0.1289
  7/469 [..............................] - ETA: 27s - loss: 2.3133 - accuracy: 0.1306
  8/469 [..............................] - ETA: 27s - loss: 2.3121 - accuracy: 0.1348
 ...
 ...
 ...
467/469 [============================>.] - ETA: 0s - loss: 1.0499 - accuracy: 0.7285
468/469 [============================>.] - ETA: 0s - loss: 1.0482 - accuracy: 0.7290
469/469 [==============================] - 28s 58ms/step - loss: 1.0469 - accuracy: 0.7293 - val_loss: 0.2980 - val_accuracy: 0.9138
Epoch 2/8
 ...
 ...
 ...
Epoch 8/8
 ...
 ...
 ...
468/469 [============================>.] - ETA: 0s - loss: 0.0795 - accuracy: 0.9769
469/469 [==============================] - 26s 55ms/step - loss: 0.0795 - accuracy: 0.9769 - val_loss: 0.0639 - val_accuracy: 0.9791
[INFO] evaluating network...
              precision    recall  f1-score   support

           0       0.98      0.99      0.98       980
           1       0.99      0.99      0.99      1135
           2       0.98      0.98      0.98      1032
           3       0.99      0.97      0.98      1010
           4       0.98      0.98      0.98       982
           5       0.98      0.98      0.98       892
           6       0.98      0.98      0.98       958
           7       0.98      0.97      0.98      1028
           8       0.96      0.98      0.97       974
           9       0.97      0.96      0.97      1009

    accuracy                           0.98     10000
   macro avg       0.98      0.98      0.98     10000
weighted avg       0.98      0.98      0.98     10000

As you can see, LeNet achieves 98% classification accuracy, which is a big improvement over 92% when using a standard feedforward neural network.

The loss and accuracy over time are plotted as follows:

[Figure: training and validation loss and accuracy over the 8 epochs]

It can be seen that the network performs quite well, reaching a classification accuracy of ≈96% after only 5 epochs. Because the learning rate is held constant and is not decayed, the loss on the training and validation data continues to drop with only a few small "spikes". After 8 epochs, the accuracy on the test set reaches 98%.

The training and validation loss and accuracy track each other almost exactly, with no signs of overfitting. A training curve this well behaved is often difficult to obtain; it indicates that the network is learning the underlying patterns without overfitting.

The MNIST dataset is heavily preprocessed and is not representative of the image classification problems that one would encounter in the real world. Researchers tend to use the MNIST dataset as a benchmark to evaluate new classification algorithms. If their method fails to achieve >95% classification accuracy, there is a flaw in (1) the logic of the algorithm or (2) the implementation itself.

2. Principle

pip install opencv-contrib-python
  1. The LeNet architecture is an excellent "real world" network. The network is small, easy to understand, and large enough to provide interesting results.

  2. The LeNet architecture consists of two series of CONV=>TANH=>POOL layer sets, followed by fully connected layers and softmax outputs.

  3. The combination of LeNet+MNIST can easily run on CPU, making it easier for beginners to take their first steps in deep learning and CNN. (LeNet+MNIST is the "Hello, World" equivalent of deep learning applied to image classification.)

  4. The LeNet architecture consists of the following layers, using the CONV=>ACT=>POOL layer pattern of a Convolutional Neural Network (CNN):

    INPUT => CONV => TANH => POOL => CONV => TANH => POOL => FC => TANH => FC

  5. The LeNet architecture uses the tanh activation function rather than the more popular ReLU. Back in 1998, ReLU had not yet been adopted for deep learning; it was more common to use tanh or sigmoid as the activation function. (The implementation used in this post swaps tanh for ReLU, as described next.)

To summarize the LeNet architecture used here: the input layer takes an image with 28 rows, 28 columns, and a single channel (grayscale) for the depth (i.e. the dimensions of the images in the MNIST dataset). The first CONV layer then learns 20 filters, each of size 5×5, and is followed by a ReLU activation and then max pooling with a 2×2 window and a 2×2 stride.

The next block of the architecture follows the same pattern, this time learning 50 5×5 filters. It is common for the number of CONV filters learned to increase in deeper layers of the network as the actual spatial input dimensions decrease.
Then come two FC layers. The first FC layer contains 500 hidden nodes, followed by a ReLU activation. The final FC layer has one node per output class label (the ten possible digits, 0-9). Finally, a softmax activation is applied to obtain the class probabilities.
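
The script in the next section imports this architecture as a LeNet class from the pyimagesearch package, but that module itself is not listed in this post. A minimal sketch of what its build method might look like, following the layer description above (20 and then 50 5×5 CONV filters with "same" padding, 2×2 max pooling, a 500-node FC layer, ReLU activations, and a softmax output), is shown below; the actual pyimagesearch implementation may differ in its details.

# lenet.py -- assumed sketch of the LeNet class imported in the script below
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Activation, Flatten, Dense
from tensorflow.keras import backend as K

class LeNet:
    @staticmethod
    def build(width, height, depth, classes):
        # initialize the model and assume "channels last" ordering for the input shape
        model = Sequential()
        inputShape = (height, width, depth)

        # switch to "channels first" ordering if the backend requires it
        if K.image_data_format() == "channels_first":
            inputShape = (depth, height, width)

        # first CONV => RELU => POOL block: 20 filters of size 5x5
        model.add(Conv2D(20, (5, 5), padding="same", input_shape=inputShape))
        model.add(Activation("relu"))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

        # second CONV => RELU => POOL block: 50 filters of size 5x5
        model.add(Conv2D(50, (5, 5), padding="same"))
        model.add(Activation("relu"))
        model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))

        # flatten, then FC => RELU with 500 hidden nodes
        model.add(Flatten())
        model.add(Dense(500))
        model.add(Activation("relu"))

        # final FC layer (one node per class) followed by a softmax classifier
        model.add(Dense(classes))
        model.add(Activation("softmax"))

        # return the constructed network architecture
        return model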

3. Source code

# Handwritten digit recognition with LeNet
# USAGE
# python lenet_mnist.py

# 1. Load the MNIST dataset from disk
# 2. Instantiate the LeNet architecture
# 3. Train the LeNet model
# 4. Evaluate the network's performance

# In the vast majority of machine learning settings, almost every example follows this general import pattern:
# the network architecture to be trained, the optimizer used to train it (SGD), a (set of) convenience function(s) for constructing the training and testing splits of a given dataset, and a function for computing a classification report in order to evaluate the classifier's performance,
# plus a few extra classes that make certain tasks (such as preprocessing images) more convenient.

# 导入必要的包
from pyimagesearch.nn.conv.lenet import LeNet
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.datasets import mnist
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report
from tensorflow.keras import backend as K
import matplotlib.pyplot as plt
import numpy as np

import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# The MNIST dataset is already preprocessed (an 11MB file that is downloaded automatically the first time)
# load_data() downloads the MNIST dataset from the Keras dataset repository; the dataset is serialized as a single 11MB file.
# Note: each MNIST sample is stored internally as a 784-d vector of raw pixel intensities for a 28x28 grayscale image,
# so the data matrix must be reshaped according to "channels first" or "channels last" ordering:
print("[INFO] accessing MNIST...")
((trainData, trainLabels), (testData, testLabels)) = mnist.load_data()

# If channels first, reshape to num_samples x depth x rows x columns
if K.image_data_format() == "channels_first":
    trainData = trainData.reshape((trainData.shape[0], 1, 28, 28))
    testData = testData.reshape((testData.shape[0], 1, 28, 28))

# If channels last, reshape to num_samples x rows x columns x depth
else:
    trainData = trainData.reshape((trainData.shape[0], 28, 28, 1))
    testData = testData.reshape((testData.shape[0], 28, 28, 1))

# Scale the pixel intensities to the range [0, 1]
trainData = trainData.astype("float32") / 255.0
testData = testData.astype("float32") / 255.0

# Convert the class labels from single integer values to one-hot vectors, e.g. 3 becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
# Note: every entry in the vector is zero except one; digit 0 occupies the first index, which is why 3 lands at the fourth index
le = LabelBinarizer()
trainLabels = le.fit_transform(trainLabels)
testLabels = le.transform(testLabels)

# Initialize the optimizer and model
# Initialize the SGD optimizer with a learning rate of 0.01
# Instantiate LeNet, indicating that every input image in the dataset is 28 pixels wide, 28 pixels high, and has a depth of 1.
# The MNIST dataset has ten classes (one per digit, 0-9), so classes is set to 10.
# Compile the model using categorical cross-entropy as the loss function
print("[INFO] compiling model...")
opt = SGD(learning_rate=0.01)  # the lr argument is deprecated in favor of learning_rate
model = LeNet.build(width=28, height=28, depth=1, classes=10)
model.compile(loss="categorical_crossentropy", optimizer=opt,
              metrics=["accuracy"])

# Train the network
# Train LeNet on MNIST for a total of 8 epochs using mini-batches of 128
print("[INFO] training network...")
H = model.fit(trainData, trainLabels,
              validation_data=(testData, testLabels), batch_size=128,
              epochs=8, verbose=1)

# Evaluate the network's performance and plot the loss and accuracy over time
# model.predict() builds batches of 128 samples from testData and passes them through the network for classification; once all test data points are classified, the predictions variable is returned.
# predictions is a NumPy array of shape (len(testData), 10), meaning there are now 10 probabilities, one per class label, for every data point in testData.
# argmax(axis=1) in the classification_report call finds the index of the label with the largest probability (i.e. the final output classification), so the predicted class labels can be compared against the ground-truth labels.
print("[INFO] evaluating network...")
predictions = model.predict(testData, batch_size=128)
print(classification_report(testLabels.argmax(axis=1),
                            predictions.argmax(axis=1),
                            target_names=[str(x) for x in le.classes_]))

# Plot the training/validation loss and accuracy
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, 8), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, 8), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, 8), H.history["accuracy"], label="train_acc")
plt.plot(np.arange(0, 8), H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend()
plt.show()
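
The script ends by displaying the plot. If you want to reuse the trained network later without retraining it, something the original post does not cover, the model can be serialized to disk with the standard Keras API; a minimal sketch, assuming the script above has just finished running:

# Optional (not part of the original script): save the trained model and reload it for inference
model.save("lenet_mnist.h5")

from tensorflow.keras.models import load_model
loaded = load_model("lenet_mnist.h5")

# classify the first test image again with the reloaded model
print(loaded.predict(testData[:1]).argmax(axis=1))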

Original article: blog.csdn.net/qq_40985985/article/details/125559814