Python and deep learning (5): CNN and handwritten digit recognition

1. Description

This article introduces the convolutional neural network (CNN). For image classification problems, a CNN is usually a better fit than a fully connected artificial neural network (ANN).

2. Convolution operation

The convolution operation is shown in the figure below. A grayscale image can be treated as a two-dimensional matrix. The convolution kernel is first aligned with the upper-left corner of the image; the numbers at corresponding positions are multiplied and the products are summed, which gives the first value of the feature map.
[Figure: the kernel aligned with the upper-left corner of the image, producing the first value of the feature map]
The kernel is then moved one position and the same multiply-and-sum is repeated, which gives the second value of the feature map. Here the stride (step size) is 1.
The kernel moves to the right until it reaches the end of the row, then moves down one step, returns to the left edge, and continues moving to the right. Repeating this scan over the whole image gives the result shown in the figure below.
[Figure: the complete feature map after the kernel has scanned the whole image]
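
The scan described above can be sketched in plain Python. This is a minimal illustration, not the way Keras implements convolution; the function name conv2d and the image and kernel values are made up for the example.

```python
def conv2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (both lists of lists) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    feature_map = []
    for i in range(out_h):          # move down one row of positions at a time
        row = []
        for j in range(out_w):      # move right across the image
            top, left = i * stride, j * stride
            # Multiply corresponding positions and add the products.
            total = sum(
                image[top + di][left + dj] * kernel[di][dj]
                for di in range(k_h)
                for dj in range(k_w)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [2, 0, 1],
    [0, 1, 2],
    [1, 0, 2],
]
print(conv2d(image, kernel))  # [[15, 16], [6, 15]]
```

A 4x4 image and a 3x3 kernel with stride 1 leave only 2x2 valid positions, which is why the feature map is smaller than the input.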
With a single-channel input and multiple kernels, each kernel is convolved with the single-channel image separately. Two kernels therefore produce two feature maps, as shown in the figure below.
[Figure: single-channel input convolved with two kernels]
The multi-channel case is shown in the figure below.
[Figure: multi-channel input and a multi-channel kernel]
Each channel of the input is convolved with the corresponding channel of the kernel, and the results of all channels are added to give a single feature map, as shown in the figure below.
[Figure: per-channel results summed into one feature map]
With a multi-channel input and multiple kernels, each kernel is applied to all input channels in this way, so each kernel produces one feature map, as shown in the figure below.
[Figure: multi-channel input convolved with multiple kernels]
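
The multi-channel, multi-kernel case can be sketched the same way. This is an illustrative sketch with stride 1 and no padding; conv2d_multi and all the numbers are hypothetical and chosen only to keep the arithmetic small.

```python
def conv2d_multi(image, kernels):
    """image: [channel][row][col]; kernels: [kernel][channel][row][col]."""
    channels = len(image)
    height, width = len(image[0]), len(image[0][0])
    feature_maps = []
    for kernel in kernels:                      # one feature map per kernel
        k_h, k_w = len(kernel[0]), len(kernel[0][0])
        fmap = []
        for i in range(height - k_h + 1):
            row = []
            for j in range(width - k_w + 1):
                # Sum the products over every channel and every kernel position.
                total = sum(
                    image[c][i + di][j + dj] * kernel[c][di][dj]
                    for c in range(channels)
                    for di in range(k_h)
                    for dj in range(k_w)
                )
                row.append(total)
            fmap.append(row)
        feature_maps.append(fmap)
    return feature_maps


# A 2-channel 3x3 input and two 2-channel 2x2 kernels (made-up values).
image = [
    [[1, 0, 2], [0, 1, 0], [2, 0, 1]],
    [[0, 1, 0], [1, 0, 1], [0, 1, 0]],
]
kernels = [
    [[[1, 0], [0, 1]], [[1, 1], [1, 1]]],
    [[[0, 1], [1, 0]], [[1, 0], [0, 0]]],
]
maps = conv2d_multi(image, kernels)
print(len(maps), "feature maps of size", len(maps[0]), "x", len(maps[0][0]))
```

Two kernels give two feature maps regardless of how many channels the input has, because the channel results are summed inside each kernel.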
All of the examples above use a stride of 1, meaning the kernel moves one pixel at a time. With a stride of 2, the kernel moves two pixels at a time.

3. Padding

As shown above, each convolution makes the feature map smaller, so after several convolutional layers information is lost. To reduce this effect, padding can be applied: a ring of zeros is added around the feature map before it is convolved with the kernel, as shown in the figure below.
[Figure: feature map surrounded by a ring of zeros]
In this case the padding parameter of the convolution layer is set to same. Without padding the parameter is valid, which is the default.
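
The effect of padding and stride on the feature map size can be checked with a little arithmetic. The sketch below assumes the usual output-size rules for valid and same padding; zero_pad and conv_output_size are illustrative helper names, not Keras functions.

```python
import math


def zero_pad(image, pad=1):
    """Surround a 2D list with `pad` rings of zeros."""
    width = len(image[0]) + 2 * pad
    padded = [[0] * width for _ in range(pad)]
    for row in image:
        padded.append([0] * pad + list(row) + [0] * pad)
    padded.extend([0] * width for _ in range(pad))
    return padded


def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Side length of the feature map for an n x n input."""
    if padding == "same":
        return math.ceil(n / stride)
    if padding == "valid":
        return (n - kernel) // stride + 1
    raise ValueError("padding must be 'valid' or 'same'")


print(zero_pad([[1, 2], [3, 4]]))
print(conv_output_size(28, 5))                  # 24
print(conv_output_size(28, 5, padding="same"))  # 28
print(conv_output_size(28, 5, stride=2))        # 12
```

With a 5x5 kernel and stride 1, valid padding shrinks a 28x28 image to 24x24, while same padding keeps it at 28x28.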

4. Pooling

Sometimes the feature map is very large and the network ends up with too many parameters. Pooling shrinks the feature map and so reduces the number of parameters in the layers that follow. There are two common kinds: max pooling and average pooling.
Max pooling outputs the maximum value in each pooling region of the feature map, as shown in the figure below.
[Figure: max pooling]
Average pooling outputs the average of the values in each pooling region of the feature map, as shown in the figure below.
[Figure: average pooling]
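
Both kinds of pooling can be sketched with one small function. This is a plain-Python illustration with made-up values; pool2d is a hypothetical helper, and the 2x2 window with stride 2 is an assumption chosen to match the pooling layers used later in the article.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Apply max or average pooling to a 2D list."""
    if mode == "max":
        reduce = max
    elif mode == "average":
        reduce = lambda values: sum(values) / len(values)
    else:
        raise ValueError("mode must be 'max' or 'average'")
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect every value inside the current pooling region.
            window = [
                feature_map[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(reduce(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [4, 2, 0, 1],
    [3, 1, 2, 5],
]
print(pool2d(feature_map, mode="max"))      # [[7, 8], [4, 5]]
print(pool2d(feature_map, mode="average"))  # [[4.0, 5.0], [2.5, 2.0]]
```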

5. Convolutional neural network in practice: a CNN model for handwritten digit recognition

5.1 Import related libraries

The following third-party Python libraries are used: TensorFlow and Keras for deep learning, and Matplotlib for plotting.

from keras.datasets import mnist
import matplotlib.pyplot as plt
from tensorflow import keras
from keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPool2D
from keras.models import Sequential
from keras.callbacks import EarlyStopping
import tensorflow as tf
from keras import optimizers, losses

5.2 Load data

Load the MNIST dataset.

# 1. Load the data
# x_train holds the MNIST training images (28*28 each); y_train holds the matching digit labels.
# x_test holds the MNIST test images (28*28 each); y_test holds the matching digit labels.
(x_train, y_train), (x_test, y_test) = mnist.load_data()  # load the MNIST dataset
print('mnist_data:', x_train.shape, y_train.shape, x_test.shape, y_test.shape)  # print the shapes of the training and test data

5.3 Data preprocessing

(1) Normalize the input images, scaling pixel values from 0-255 to 0-1;
(2) Reshape the input images from (60000, 28, 28) to (60000, 28, 28, 1) by adding a channel dimension, which is the layout the network expects as input;
(3) One-hot encode the label y. The network outputs 10 probability values while y is a single number, so the two cannot be compared directly when computing the loss. y is therefore encoded as a row vector of 10 numbers before the loss is computed. For example, the one-hot encoding of the value 1 over 10 classes is [0 1 0 0 0 0 0 0 0 0]: the position with index 1 is 1 and every other position is 0.

# 2. Data preprocessing


def preprocess(x, y):  # preprocessing function, applied to one (image, label) pair
    x = tf.cast(x, dtype=tf.float32) / 255.  # normalize the image from 0-255 to 0-1
    # Add a channel dimension: each image goes from (28, 28) to (28, 28, 1),
    # so the whole set goes from (60000, 28, 28) to (60000, 28, 28, 1).
    x = tf.reshape(x, [28, 28, 1])
    y = tf.cast(y, dtype=tf.int32)  # convert the label to int32
    # One-hot encode the label y: the network outputs 10 probabilities while y is
    # a single number, so y becomes a vector of 10 numbers before the loss is computed.
    # Example: the value 1 over 10 classes encodes as [0 1 0 0 0 0 0 0 0 0].
    y = tf.one_hot(y, depth=10)
    return x, y
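
The normalization and one-hot steps can be checked by hand without TensorFlow. The sketch below is only an illustration of the arithmetic; normalize and one_hot are hypothetical helper names, not the functions used in the article's pipeline.

```python
def normalize(pixels):
    """Scale 0-255 pixel values to the 0-1 range."""
    return [p / 255.0 for p in pixels]


def one_hot(label, depth=10):
    """Return a vector of length `depth` with 1.0 at index `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(depth)]


print(normalize([0, 51, 255]))  # [0.0, 0.2, 1.0]
print(one_hot(1))               # 1.0 at index 1, 0.0 everywhere else
```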

5.4 Data processing

After the data is loaded into the memory, it needs to be converted into a Dataset object in order to take advantage of various convenient functions provided by TensorFlow.
Through Dataset.from_tensor_slices, the data image x and label y of the training part can be converted into Dataset objects

batchsz = 128  # number of images fed to the network per batch
# Once loaded into memory, the data must be converted to Dataset objects to use
# the convenience functions TensorFlow provides. Dataset.from_tensor_slices
# converts the images x and the labels y into a Dataset object.
db = tf.data.Dataset.from_tensor_slices((x_train, y_train))  # build the training dataset
db = db.map(preprocess).shuffle(60000).batch(batchsz)  # preprocess, shuffle, and batch
ds_val = tf.data.Dataset.from_tensor_slices((x_test, y_test))  # build the test dataset
ds_val = ds_val.map(preprocess).batch(batchsz)  # preprocess and batch (no shuffling)
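
The batch size determines how many steps one epoch takes. A quick check, assuming the standard MNIST split of 60000 training and 10000 test images and that the last partial batch is kept:

```python
import math

batch_size = 128
train_images, test_images = 60000, 10000

# The last batch is smaller than 128 but still counts as a step.
train_steps = math.ceil(train_images / batch_size)
test_steps = math.ceil(test_images / batch_size)
print(train_steps, test_steps)  # 469 79
```

The 469 training steps match the 469/469 progress counter that Keras prints for each epoch.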

5.5 Building a network model

The model consists of two convolutional layers, each followed by a pooling layer, then a flatten layer (which unrolls the two-dimensional feature maps into a vector for the fully connected layers), followed by three fully connected layers.

# 3. Build the network model
model = Sequential([Conv2D(filters=6, kernel_size=(5, 5), activation='relu'),
                    MaxPool2D(pool_size=(2, 2), strides=2),
                    Conv2D(filters=16, kernel_size=(5, 5), activation='relu'),
                    MaxPool2D(pool_size=(2, 2), strides=2),
                    Flatten(),
                    Dense(120, activation='relu'),
                    Dense(84, activation='relu'),
                    Dense(10,activation='softmax')])

model.build(input_shape=(None, 28, 28, 1))  # input shape of the model
model.summary()  # print the network structure
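
The shapes and parameter counts of this model can be worked out by hand. The sketch below assumes the default valid padding and stride 1 for the convolutions, as in the model definition above; valid_out is an illustrative helper, and the numbers are plain arithmetic rather than output copied from model.summary().

```python
def valid_out(n, window, stride):
    """Output side length for a window sliding over n positions without padding."""
    return (n - window) // stride + 1


size, channels, total_params = 28, 1, 0

# (layer type, number of filters) for the convolutional part of the model.
for kind, filters in [("conv", 6), ("pool", None), ("conv", 16), ("pool", None)]:
    if kind == "conv":
        # Each filter has 5*5 weights per input channel plus one bias.
        params = (5 * 5 * channels + 1) * filters
        size, channels = valid_out(size, 5, 1), filters
    else:
        params = 0  # pooling has no trainable parameters
        size = valid_out(size, 2, 2)
    total_params += params
    print(f"{kind}: output {size}x{size}x{channels}, parameters {params}")

units = size * size * channels
print("flatten:", units)
for out_units in (120, 84, 10):
    params = (units + 1) * out_units  # weights plus one bias per output unit
    total_params += params
    print(f"dense({out_units}): parameters {params}")
    units = out_units
print("total parameters:", total_params)
```

The feature map shrinks from 28x28 to 24x24, 12x12, 8x8 and finally 4x4 with 16 channels, so the flatten layer passes 256 values to the first fully connected layer.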

5.6 Model compilation

The model uses the Adam optimizer with a learning rate of 0.01, losses.CategoricalCrossentropy as the loss function, and accuracy as the performance metric.

# 4. Compile the model
# Optimizer: Adam with a learning rate of 0.01.
# Loss: losses.CategoricalCrossentropy (from_logits=False because the last layer already applies softmax).
# Metric: accuracy.
model.compile(optimizer=optimizers.Adam(learning_rate=0.01),
                loss=tf.losses.CategoricalCrossentropy(from_logits=False),
                metrics=['accuracy']
                )
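
To see what the loss measures, the softmax output and the categorical cross-entropy for a single sample can be computed by hand. This is a simplified sketch, not the Keras implementation; the function names, the eps clipping value, and the 3-class example are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


y_true = [0, 1, 0]                   # one-hot label for class 1
y_pred = softmax([1.0, 3.0, 0.5])    # made-up raw scores
loss = categorical_crossentropy(y_true, y_pred)
print(y_pred, loss)
```

Because the label is one-hot, only the predicted probability of the correct class contributes to the loss: the closer it is to 1, the closer the loss is to 0.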

5.7 Model Training, Storage and Evaluation

The model is trained for 5 epochs, with validation after every epoch;
the model is saved in the .h5 file format;
the accuracy on the test set is evaluated.

# 5. Train the model: 5 epochs, validating after every epoch
history = model.fit(db, epochs=5, validation_data=ds_val, validation_freq=1)
# 6. Save the model
model.save('cnn_mnist.h5')  # save the model in the .h5 file format

# 7. Evaluate the model
model.evaluate(ds_val)  # loss and accuracy on the test set

5.8 Model testing

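Run the model on one batch of test data and compare the predictions with the labels.

# 8. Test the model
sample = next(iter(ds_val))  # take one batch of test data
x = sample[0]  # test images
y = sample[1]  # test labels
pred = model.predict(x)  # network output for one batch of test images
pred = tf.argmax(pred, axis=1)  # index of the highest probability in each prediction, i.e. the predicted digit
y = tf.argmax(y, axis=1)  # index of the maximum in each one-hot label, i.e. the true digit
print(pred)  # print the predicted digits
print(y)  # print the true digits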
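
The argmax step and the comparison between predictions and labels can be sketched in plain Python. The probabilities and labels below are made up and use 3 classes instead of 10 to keep the example short; argmax and batch_accuracy are hypothetical helpers.

```python
def argmax(values):
    """Index of the largest value (the first one if there is a tie)."""
    return max(range(len(values)), key=lambda i: values[i])


def batch_accuracy(probabilities, one_hot_labels):
    """Return predicted classes, true classes, and the fraction that match."""
    predicted = [argmax(p) for p in probabilities]
    actual = [argmax(t) for t in one_hot_labels]
    correct = sum(p == a for p, a in zip(predicted, actual))
    return predicted, actual, correct / len(actual)


probabilities = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
]
one_hot_labels = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 0, 1],
]
predicted, actual, acc = batch_accuracy(probabilities, one_hot_labels)
print(predicted, actual, acc)  # [1, 0, 2, 1] [1, 0, 2, 2] 0.75
```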

5.9 Visualization of model training results

Plot the accuracy and loss curves recorded during training.

# 9. Visualize the training process
# Show the accuracy and loss curves for the training and validation sets
acc = history.history['accuracy']  # training accuracy per epoch
val_acc = history.history['val_accuracy']  # validation accuracy per epoch
loss = history.history['loss']  # training loss per epoch
val_loss = history.history['val_loss']  # validation loss per epoch
# Plot the accuracy curves
plt.figure(1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.legend()
# Plot the loss curves
plt.figure(2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.show()  # display the figures

6. Training results of the CNN model for handwritten digit recognition

Epoch 1/5
469/469 [==============================] - 12s 22ms/step - loss: 0.1733 - accuracy: 0.9465 - val_loss: 0.0840 - val_accuracy: 0.9763
Epoch 2/5
469/469 [==============================] - 11s 21ms/step - loss: 0.0704 - accuracy: 0.9793 - val_loss: 0.0581 - val_accuracy: 0.9819
Epoch 3/5
469/469 [==============================] - 11s 22ms/step - loss: 0.0566 - accuracy: 0.9833 - val_loss: 0.0576 - val_accuracy: 0.9844
Epoch 4/5
469/469 [==============================] - 11s 22ms/step - loss: 0.0573 - accuracy: 0.9833 - val_loss: 0.0766 - val_accuracy: 0.9784
Epoch 5/5
469/469 [==============================] - 11s 22ms/step - loss: 0.0556 - accuracy: 0.9844 - val_loss: 0.0537 - val_accuracy: 0.9830

[Figure: training and validation accuracy curves]
[Figure: training and validation loss curves]
From the results above, the model reaches a validation accuracy of about 98% (0.9830 after the fifth epoch).
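
The validation accuracy does not rise monotonically, which is easy to confirm from the log. The list below copies the val_accuracy column from the training log above; the rest is a small illustrative check.

```python
# val_accuracy for epochs 1 to 5, copied from the training log
val_accuracy = [0.9763, 0.9819, 0.9844, 0.9784, 0.9830]

best_epoch = max(range(len(val_accuracy)), key=lambda i: val_accuracy[i]) + 1
print("best epoch:", best_epoch, "accuracy:", val_accuracy[best_epoch - 1])
print("final epoch accuracy:", val_accuracy[-1])
```

The best validation accuracy is reached at epoch 3, and the value after epoch 5 is slightly lower.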

7. Complete code

from keras.datasets import mnist
import matplotlib.pyplot as plt
from tensorflow import keras
from keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPool2D
from keras.models import Sequential
from keras.callbacks import EarlyStopping
import tensorflow as tf
from keras import optimizers, losses
# 1. Load the data
# x_train holds the MNIST training images (28*28 each); y_train holds the matching digit labels.
# x_test holds the MNIST test images (28*28 each); y_test holds the matching digit labels.
(x_train, y_train), (x_test, y_test) = mnist.load_data()  # load the MNIST dataset
print('mnist_data:', x_train.shape, y_train.shape, x_test.shape, y_test.shape)  # print the shapes of the training and test data

# 2. Data preprocessing


def preprocess(x, y):  # preprocessing function, applied to one (image, label) pair
    x = tf.cast(x, dtype=tf.float32) / 255.  # normalize the image from 0-255 to 0-1
    # Add a channel dimension: each image goes from (28, 28) to (28, 28, 1),
    # so the whole set goes from (60000, 28, 28) to (60000, 28, 28, 1).
    x = tf.reshape(x, [28, 28, 1])
    y = tf.cast(y, dtype=tf.int32)  # convert the label to int32
    # One-hot encode the label y: the network outputs 10 probabilities while y is
    # a single number, so y becomes a vector of 10 numbers before the loss is computed.
    # Example: the value 1 over 10 classes encodes as [0 1 0 0 0 0 0 0 0 0].
    y = tf.one_hot(y, depth=10)
    return x, y


batchsz = 128  # number of images fed to the network per batch
# Once loaded into memory, the data must be converted to Dataset objects to use
# the convenience functions TensorFlow provides. Dataset.from_tensor_slices
# converts the images x and the labels y into a Dataset object.
db = tf.data.Dataset.from_tensor_slices((x_train, y_train))  # build the training dataset
db = db.map(preprocess).shuffle(60000).batch(batchsz)  # preprocess, shuffle, and batch
ds_val = tf.data.Dataset.from_tensor_slices((x_test, y_test))  # build the test dataset
ds_val = ds_val.map(preprocess).batch(batchsz)  # preprocess and batch (no shuffling)

# 3. Build the network model
model = Sequential([Conv2D(filters=6, kernel_size=(5, 5), activation='relu'),
                    MaxPool2D(pool_size=(2, 2), strides=2),
                    Conv2D(filters=16, kernel_size=(5, 5), activation='relu'),
                    MaxPool2D(pool_size=(2, 2), strides=2),
                    Flatten(),
                    Dense(120, activation='relu'),
                    Dense(84, activation='relu'),
                    Dense(10, activation='softmax')])

model.build(input_shape=(None, 28, 28, 1))  # input shape of the model
model.summary()  # print the network structure

# 4. Compile the model
# Optimizer: Adam with a learning rate of 0.01.
# Loss: losses.CategoricalCrossentropy (from_logits=False because the last layer already applies softmax).
# Metric: accuracy.
model.compile(optimizer=optimizers.Adam(learning_rate=0.01),
                loss=tf.losses.CategoricalCrossentropy(from_logits=False),
                metrics=['accuracy']
                )

# 5. Train the model: 5 epochs, validating after every epoch
history = model.fit(db, epochs=5, validation_data=ds_val, validation_freq=1)
# 6. Save the model
model.save('cnn_mnist.h5')  # save the model in the .h5 file format

# 7. Evaluate the model
model.evaluate(ds_val)  # loss and accuracy on the test set

# 8. Test the model
sample = next(iter(ds_val))  # take one batch of test data
x = sample[0]  # test images
y = sample[1]  # test labels
pred = model.predict(x)  # network output for one batch of test images
pred = tf.argmax(pred, axis=1)  # index of the highest probability in each prediction, i.e. the predicted digit
y = tf.argmax(y, axis=1)  # index of the maximum in each one-hot label, i.e. the true digit
print(pred)  # print the predicted digits
print(y)  # print the true digits

# 9. Visualize the training process
# Show the accuracy and loss curves for the training and validation sets
acc = history.history['accuracy']  # training accuracy per epoch
val_acc = history.history['val_accuracy']  # validation accuracy per epoch
loss = history.history['loss']  # training loss per epoch
val_loss = history.history['val_loss']  # validation loss per epoch
# Plot the accuracy curves
plt.figure(1)
plt.plot(acc, label='Training Accuracy')
plt.plot(val_acc, label='Validation Accuracy')
plt.title('Training and Validation Accuracy')
plt.legend()
# Plot the loss curves
plt.figure(2)
plt.plot(loss, label='Training Loss')
plt.plot(val_loss, label='Validation Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.show()  # display the figures

Origin blog.csdn.net/qq_47598782/article/details/131871796