Preface

The videos I watched are Andrew Ng's deep learning lectures. Link:

14.14.2.2. Classic Networks (bilibili): https://www.bilibili.com/video/BV1SB4y1s7D2/?p=14&vd_source=1ac3c4db6c62f190a2b66f5032778fc9

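I. Edge Detection

II. Padding

III. 3D Convolution

IV. Single-Layer Convolutional Network

V. Pooling Layers

VI. Classic Networks: LeNet-5

1. Basic theory

2. Understanding the code

(1) Loading the dataset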

from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.metrics import confusion_matrix
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Input, Dropout
from tensorflow.keras.models import Model
# keras.utils.np_utils is gone from recent Keras releases; to_categorical is the same function
from tensorflow.keras.utils import to_categorical


"""
Load the dataset
"""
def get_mnist_data():

    (x_train_original, y_train_original), (x_test_original, y_test_original) = mnist.load_data()

    # Hold out the last 10,000 training samples as the validation set
    x_val = x_train_original[50000:]    # (10000, 28, 28): one image per sample
    y_val = y_train_original[50000:]    # (10000,): one label per image
    x_train = x_train_original[:50000]  # (50000, 28, 28)
    y_train = y_train_original[:50000]  # (50000,)

    # Reshape the images into a 4D array (nums, rows, cols, channels) and convert them
    # from uint8 to float32 so that the pixel values can be normalized below.
    x_train = x_train.reshape(x_train.shape[0], 28, 28, 1).astype('float32')
    # x_train.shape[0] is the number of samples; 28 is the image size.
    # This differs from the original LeNet-5, whose input size is 32.
    x_val = x_val.reshape(x_val.shape[0], 28, 28, 1).astype('float32')
    x_test = x_test_original.reshape(x_test_original.shape[0], 28, 28, 1).astype('float32')

    # The raw grayscale pixel values lie in 0-255; scaling them to 0-1 usually makes training easier.
    x_train = x_train / 255
    x_val = x_val / 255
    x_test = x_test / 255

    # There are 10 label classes (0-9); convert the labels to one-hot vectors
    y_train = to_categorical(y_train)
    y_val = to_categorical(y_val)
    y_test = to_categorical(y_test_original)

    return x_train, y_train, x_val, y_val, x_test, y_test

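Analysis:

① One-hot encoding

One-hot encoding is typically used for categorical features whose classes have no natural ordering.

For example, take the feature "blood type", which has four classes (A, B, AB, O). One-hot encoding turns each blood type into a sparse 4-dimensional vector:

A is represented as (1,0,0,0)
B is represented as (0,1,0,0)
AB is represented as (0,0,1,0)
O is represented as (0,0,0,1)

The encoded vector has as many dimensions as there are classes.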
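The blood-type example can be reproduced in a few lines of numpy. This is only a sketch of the idea, not the Keras implementation; the helper name one_hot and the sample list are made up for illustration.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (len(labels), num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels, dtype=np.int64)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[labels]

blood_types = ["A", "B", "AB", "O"]   # class order follows the text above
index_of = {name: i for i, name in enumerate(blood_types)}

samples = ["AB", "O", "A"]            # hypothetical samples
encoded = one_hot([index_of[s] for s in samples], num_classes=len(blood_types))
print(encoded)
```

For MNIST the same idea applies with 10 classes, so every label becomes a 10-dimensional vector.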

(2) Defining the network

"""
Define the LeNet-5 network model
"""
def LeNet5():

    input_shape = Input(shape=(28, 28, 1))

    x = Conv2D(6, (5, 5), activation="relu", padding="same")(input_shape)
    x = MaxPooling2D((2, 2), 2)(x)
    x = Conv2D(16, (5, 5), activation="relu", padding='same')(x)
    x = MaxPooling2D((2, 2), 2)(x)

    x = Flatten()(x)
    x = Dense(120, activation='relu')(x)
    x = Dense(84, activation='relu')(x)
    x = Dense(10, activation='softmax')(x)

    model = Model(input_shape, x)
    model.summary()  # summary() prints the table itself and returns None

    return model

Analysis:

① Using Conv2D

tf.keras.layers.Conv2D(
    filters,
    kernel_size,
    strides=(1, 1),
    padding="valid",
    data_format=None,
    dilation_rate=(1, 1),
    groups=1,
    activation=None,
    use_bias=True,
    kernel_initializer="glorot_uniform",
    bias_initializer="zeros",
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    **kwargs
)

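Here filters = 6, kernel_size = (5, 5), strides = (1, 1), padding = 'same', and activation = 'relu'; everything else keeps its default value. I would call this a modernized LeNet-5, though. The original network (LeCun et al., 1998) used unpadded convolutions (padding = 'valid') and sigmoid/tanh-style squashing activations rather than ReLU, because techniques such as ReLU were not yet in common use in the 1990s. I will compare the two later.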
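The effect of the padding choice on feature-map size can be traced by hand. The sketch below uses the standard output-size formulas ('same': ceil(n / stride); 'valid': floor((n - kernel) / stride) + 1) and assumes 5x5 convolutions followed by 2x2 pooling with stride 2 in both variants; the helper names are made up for illustration.

```python
import math

def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution or pooling window along one axis."""
    if padding == "same":
        return math.ceil(n / stride)
    if padding == "valid":
        return (n - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding}")

def trace(n, conv_padding):
    """Sizes after conv(5x5) -> pool(2x2, stride 2) -> conv(5x5) -> pool(2x2, stride 2)."""
    sizes = [n]
    for _ in range(2):
        n = conv_output_size(n, kernel=5, stride=1, padding=conv_padding)
        sizes.append(n)
        n = conv_output_size(n, kernel=2, stride=2, padding="valid")
        sizes.append(n)
    return sizes

print("this post, 28x28 input, 'same': ", trace(28, "same"))    # [28, 28, 14, 14, 7]
print("original, 32x32 input, 'valid':", trace(32, "valid"))   # [32, 28, 14, 10, 5]
```

With 'same' padding only the pooling layers shrink the feature maps, so the network in this post ends with 7x7 maps, while the unpadded original ends with 5x5 maps.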

② x = Flatten()(x)

Flatten reduces multi-dimensional data to fewer dimensions. The Keras Flatten() layer used here keeps the batch axis and merges all remaining axes into one.
The PyTorch method flatten() illustrates the same idea. Its start_dim argument defaults to 0, so flatten() and flatten(0) give the same result.
flatten(dim) unrolls the tensor from axis dim onward: the axes before dim are kept, and axis dim and all later axes are merged into a single axis.
For example, if the data has shape $(S_0, S_1, \dots, S_n)$, then after flatten(m) it has shape $(S_0, S_1, \dots, S_{m-1}, S_m \cdot S_{m+1} \cdots S_n)$.

Example:

import torch
a = torch.rand(2,3,4)
b = a.flatten()
c = a.flatten(0)
print(f'a is {a}')
print(f'b is {b}')
print(f'c is {c}')

Output:

a is tensor([[[0.9502, 0.0951, 0.9110, 0.5439],
         [0.9169, 0.2038, 0.2820, 0.6624],
         [0.2834, 0.9371, 0.7960, 0.4528]],

        [[0.2815, 0.3445, 0.7618, 0.8268],
         [0.3344, 0.4812, 0.5323, 0.8737],
         [0.9929, 0.2042, 0.4372, 0.3169]]])
b is tensor([0.9502, 0.0951, 0.9110, 0.5439, 0.9169, 0.2038, 0.2820, 0.6624, 0.2834,
        0.9371, 0.7960, 0.4528, 0.2815, 0.3445, 0.7618, 0.8268, 0.3344, 0.4812,
        0.5323, 0.8737, 0.9929, 0.2042, 0.4372, 0.3169])
c is tensor([0.9502, 0.0951, 0.9110, 0.5439, 0.9169, 0.2038, 0.2820, 0.6624, 0.2834,
        0.9371, 0.7960, 0.4528, 0.2815, 0.3445, 0.7618, 0.8268, 0.3344, 0.4812,
        0.5323, 0.8737, 0.9929, 0.2042, 0.4372, 0.3169])
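The PyTorch example flattens every axis. The Keras Flatten() layer in LeNet5() instead keeps the batch axis, which can be mimicked with a numpy reshape. This is a sketch rather than the Keras implementation; the batch size of 32 is an assumption, and (7, 7, 16) is the feature-map shape after the second pooling layer of the network in this post.

```python
import numpy as np

batch = np.zeros((32, 7, 7, 16), dtype=np.float32)   # (batch, rows, cols, channels)

# Keep axis 0 (the batch) and merge every remaining axis into one.
flat = batch.reshape(batch.shape[0], -1)
print(flat.shape)   # (32, 784)
```

Each sample therefore enters the first Dense layer as a vector of 7 * 7 * 16 = 784 values.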

③ Fully connected layer: Dense

keras.layers.Dense(units,
                   activation=None,
                   use_bias=True,
                   kernel_initializer='glorot_uniform',
                   bias_initializer='zeros',
                   kernel_regularizer=None,
                   bias_regularizer=None,
                   activity_regularizer=None,
                   kernel_constraint=None,
                   bias_constraint=None)

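The parameters are:

units: number of neurons in the layer
activation: activation function used by the layer
use_bias: whether to add a bias term
kernel_initializer: initializer for the weights
bias_initializer: initializer for the bias
kernel_regularizer: regularizer applied to the weights
bias_regularizer: regularizer applied to the bias
activity_regularizer: regularizer applied to the layer output
kernel_constraint: constraint applied to the weights
bias_constraint: constraint applied to the bias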
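What a Dense layer computes is output = activation(input @ kernel + bias). The numpy sketch below reproduces that forward pass for a layer shaped like the first fully connected layer of this post (784 inputs, 120 units). The weights are random placeholders, not trained values, and the function names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def dense(x, weights, bias, activation=None):
    """Forward pass of a fully connected layer: activation(x @ weights + bias)."""
    out = x @ weights + bias
    return activation(out) if activation is not None else out

x = rng.normal(size=(4, 784)).astype(np.float32)                # 4 flattened samples
w = rng.normal(scale=0.05, size=(784, 120)).astype(np.float32)  # placeholder kernel
b = np.zeros(120, dtype=np.float32)                             # bias, as with bias_initializer='zeros'

h = dense(x, w, b, activation=relu)
print(h.shape)           # (4, 120)
print(w.size + b.size)   # 94200 trainable parameters
```

Setting use_bias=False would correspond to dropping b, which removes 120 of the 94,200 parameters.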

④ model.summary()

When building a deep learning model with Keras, model.summary() prints each layer's output shape and parameter count.
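The parameter counts that model.summary() reports can be checked by hand. The sketch below assumes the network defined in LeNet5() in this post (5x5 kernels, 6 and 16 filters, a 7x7x16 tensor entering the first Dense layer); pooling and Flatten layers have no trainable parameters, so they are left out. The helper names are made up for illustration.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(in_units, out_units):
    """Each output unit has in_units weights plus one bias."""
    return (in_units + 1) * out_units

layers = [
    ("conv2d (6 filters)",  conv2d_params(5, 5, 1, 6)),
    ("conv2d (16 filters)", conv2d_params(5, 5, 6, 16)),
    ("dense (120)",         dense_params(7 * 7 * 16, 120)),
    ("dense (84)",          dense_params(120, 84)),
    ("dense (10)",          dense_params(84, 10)),
]

for name, count in layers:
    print(f"{name:<22}{count:>8}")

total_params = sum(count for _, count in layers)
print(f"{'total':<22}{total_params:>8}")
```

Most of the parameters sit in the first Dense layer, which is typical for small convolutional networks that flatten a feature map into a fully connected layer.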

(3) Compiling and training the network

"""
Compile and train the network
"""
x_train, y_train, x_val, y_val, x_test, y_test = get_mnist_data()
model = LeNet5()

# Compile the network (set the loss function, optimizer, and evaluation metric)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the network (training data, validation data, number of epochs, batch size)
train_history = model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=20, batch_size=32, verbose=2)

# Save the model
model.save('lenet_mnist.h5')

# Plot one training curve: a metric on the training set and the same metric on the validation set
def show_train_history(train_history, train, validation):
    plt.plot(train_history.history[train])
    plt.plot(train_history.history[validation])
    plt.title('Train History')
    plt.ylabel(train)
    plt.xlabel('Epoch')
    plt.legend(['train', 'validation'], loc='best')
    plt.show()

show_train_history(train_history, 'accuracy', 'val_accuracy')
show_train_history(train_history, 'loss', 'val_loss')

# Print the loss and accuracy on the test set
score = model.evaluate(x_test, y_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

# Predict on the test set
predictions = model.predict(x_test)
predictions = np.argmax(predictions, axis=1)
print('Predictions for the first 20 images:', predictions[:20])

# Visualize the predictions
(x_train_original, y_train_original), (x_test_original, y_test_original) = mnist.load_data()
def mnist_visualize_multiple_predict(start, end, length, width):

    for i in range(start, end):
        # 1 + i - start keeps the subplot index inside the grid even when start > 0
        plt.subplot(length, width, 1 + i - start)
        plt.imshow(x_test_original[i], cmap=plt.get_cmap('gray'))
        title_true = 'true=' + str(y_test_original[i])
        # title_prediction = ',' + 'prediction' + str(model.predict_classes(np.expand_dims(x_test[i], axis=0)))
        title_prediction = ',' + 'prediction=' + str(predictions[i])
        title = title_true + title_prediction
        plt.title(title)
        plt.xticks([])
        plt.yticks([])
    plt.show()

mnist_visualize_multiple_predict(start=0, end=9, length=3, width=3)

# Confusion matrix
cm = confusion_matrix(y_test_original, predictions)
cm = pd.DataFrame(cm)
class_names = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
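To make clear what the confusion matrix contains, here is a small numpy sketch that builds one by hand. It follows the same convention as sklearn's confusion_matrix (rows are true classes, columns are predicted classes), but it is an illustration on made-up labels, not the sklearn implementation.

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, num_classes):
    """cm[i, j] counts the samples whose true class is i and predicted class is j."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)   # unbuffered add, so repeated (i, j) pairs accumulate
    return cm

y_true = np.array([0, 1, 2, 2, 1, 0, 2])   # made-up true labels
y_pred = np.array([0, 2, 2, 2, 1, 0, 1])   # made-up predictions

cm = confusion_matrix_np(y_true, y_pred, num_classes=3)
accuracy = np.trace(cm) / cm.sum()         # correct predictions lie on the diagonal
print(cm)
print("accuracy:", accuracy)
```

The off-diagonal entries show which classes get confused with each other, which a single accuracy number hides.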

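Analysis:

① model.compile()

model.compile(optimizer=<optimizer>,
              loss=<loss function>,
              metrics=[<metric>, ...])

where:

optimizer is either the name of an optimizer given as a string or an optimizer object. The object form lets you set hyperparameters such as the learning rate and momentum. Recent Keras versions name the learning-rate argument learning_rate; older versions spelled it lr and also accepted a decay argument for learning-rate decay.

For example:

"sgd" or tf.keras.optimizers.SGD(learning_rate=<learning rate>, momentum=<momentum>)

"adagrad" or tf.keras.optimizers.Adagrad(learning_rate=<learning rate>)

"adadelta" or tf.keras.optimizers.Adadelta(learning_rate=<learning rate>)

"adam" or tf.keras.optimizers.Adam(learning_rate=<learning rate>)

loss is either the name of a loss function given as a string or a loss object.

For example:

"mse" or tf.keras.losses.MeanSquaredError()

"sparse_categorical_crossentropy" or tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)

Cross-entropy losses expect a probability distribution, which is usually produced by a softmax. from_logits tells the loss whether the model output has already gone through that step: from_logits=False means the output is already a probability distribution (as in this post, where the last layer uses softmax), and from_logits=True means the output is raw logits, so the loss applies the softmax itself.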
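The role of from_logits can be shown with a small numpy sketch. It is not the Keras implementation; the logits are made up, and the function only mirrors the idea that the loss either applies softmax itself or trusts that the model already did.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(y_true, y_pred, from_logits=False):
    """Mean cross-entropy for integer labels y_true and per-class scores y_pred."""
    probs = softmax(y_pred) if from_logits else y_pred
    picked = probs[np.arange(len(y_true)), y_true]   # probability assigned to the true class
    return float(-np.log(picked).mean())

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])   # made-up raw scores for two samples
labels = np.array([0, 1])

# Raw logits with from_logits=True ...
loss_from_logits = sparse_categorical_crossentropy(labels, logits, from_logits=True)
# ... give the same loss as softmax outputs with from_logits=False.
loss_from_probs = sparse_categorical_crossentropy(labels, softmax(logits), from_logits=False)
print(loss_from_logits, loss_from_probs)
```

Mixing the two up (for example, feeding softmax outputs while claiming from_logits=True) applies softmax twice and silently distorts the loss.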

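metrics names the evaluation metrics of the network.

For example (y_ is the true value, y is the prediction):

"accuracy": y_ and y are both plain values, e.g. y_ = [1], y = [1]

"categorical_accuracy": y_ is a one-hot vector and y is a probability distribution, e.g. y_ = [0, 1, 0], y = [0.256, 0.695, 0.048]

"sparse_categorical_accuracy": y_ is an integer label and y is a probability distribution, e.g. y_ = [1], y = [0.256, 0.695, 0.048]

The code in this post passes one-hot labels together with loss='categorical_crossentropy', so Keras resolves the string 'accuracy' to categorical accuracy.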

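② model.fit()

fit(x, y, batch_size=32, epochs=10, verbose=1, callbacks=None,
    validation_split=0.0, validation_data=None, shuffle=True,
    class_weight=None, sample_weight=None, initial_epoch=0)

x: input data. If the model has a single input, x is a numpy array; if it has several inputs, x is a list of numpy arrays, one per input.
y: labels, a numpy array.
batch_size: integer, the number of samples per batch for gradient descent. Each batch yields one gradient computation and one optimization step.
epochs: integer, the epoch index at which training stops. If initial_epoch is not set, this is the total number of training epochs; otherwise the number of epochs actually run is epochs - initial_epoch.
verbose: logging mode. 0 prints nothing to standard output, 1 shows a progress bar, 2 prints one line per epoch.
callbacks: list of keras.callbacks.Callback objects. They are called at the appropriate points during training.
validation_split: float between 0 and 1, the fraction of the training data to hold out as a validation set. These samples are not used for training; the model's metrics (loss, accuracy, and so on) are evaluated on them at the end of each epoch. Note that the split is taken before shuffling, so if your data is ordered you need to shuffle it manually before setting validation_split, otherwise the validation set may be unrepresentative.
validation_data: tuple (X, y) giving an explicit validation set. It overrides validation_split.
shuffle: boolean or string, usually a boolean, stating whether to shuffle the input samples during training. The string "batch" is a special case for HDF5 data and shuffles in batch-sized chunks.
class_weight: dictionary mapping classes to weights, used to scale the loss function during training (training only).
sample_weight: numpy array of weights used to scale the loss function during training (training only). You can pass a 1D vector with the same length as the samples for one-to-one weighting or, for temporal data, a matrix of shape (samples, sequence_length) to give each timestep of each sample its own weight. In that case, make sure sample_weight_mode='temporal' was set when compiling the model (an option of older Keras versions).
initial_epoch: epoch at which to start training, useful for resuming an earlier run.

fit returns a History object. Its History.history attribute records how the loss and the other metrics change from epoch to epoch, including the validation metrics when a validation set is used.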
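The warning about validation_split and ordered data can be demonstrated without training anything. The sketch below imitates the documented behaviour (the validation samples are taken from the end of the data, before shuffling) on made-up labels sorted by class; the helper name is hypothetical. The last lines also compute how many batches one epoch takes with this post's settings.

```python
import math
import numpy as np

labels = np.repeat(np.arange(10), 100)   # 1,000 made-up labels sorted by class

def split_like_validation_split(y, fraction):
    """Hold out the last `fraction` of the samples, in their original order."""
    split_at = int(len(y) * (1.0 - fraction))
    return y[:split_at], y[split_at:]

# Sorted data: the validation set only ever sees the last classes.
_, val_sorted = split_like_validation_split(labels, 0.2)
print("classes in validation set (sorted data):  ", np.unique(val_sorted))

# Shuffling by hand first gives a validation set that covers many classes.
rng = np.random.default_rng(0)
shuffled = rng.permutation(labels)
_, val_shuffled = split_like_validation_split(shuffled, 0.2)
print("classes in validation set (shuffled data):", np.unique(val_shuffled))

# Batches per epoch for this post's settings: 50,000 training samples, batch_size=32
steps_per_epoch = math.ceil(50000 / 32)
print("steps per epoch:", steps_per_epoch)
```

The code in this post sidesteps the issue by passing an explicit validation_data set instead of validation_split.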

③ model.evaluate()

Takes input data and labels and returns the loss value and the values of the selected metrics (such as accuracy).

Example 1:

# Evaluate the model; this does not return predictions
loss, accuracy = model.evaluate(X_test, Y_test)
print('\ntest loss', loss)
print('accuracy', accuracy)

Example 2:

score = model.evaluate(x_test, y_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

④ np.argmax(predictions, axis=1)

Returns the index of the maximum value along the given axis.

import numpy as np


# 1D array
data1 = np.random.randint(0, 10, size=10)
print(f'data1 is {data1}')
a1 = np.argmax(data1)
print(f'a1 is {a1}')

# 2D array
data2 = np.random.randint(0, 10, size=16).reshape((2,8))
print(f'data2 is {data2}')
a2_0 = np.argmax(data2, axis=0)
print(f'a2_0 is {a2_0}')
a2_1 = np.argmax(data2, axis=1)
print(f'a2_1 is {a2_1}')

Output:

data1 is [7 6 7 0 5 5 8 2 9 8]
a1 is 8
data2 is [[3 0 4 1 3 3 4 0]
 [4 3 8 1 5 6 4 6]]
a2_0 is [1 1 1 0 1 1 0 1]
a2_1 is [2 2]

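⑤ plt.subplot()

plt.subplot() selects where to draw by specifying the grid layout and the position directly.

# Create subplots with plt.subplot. plt.subplot(221) splits the figure into 2 rows and 2 columns and selects position 1.
plt.subplot(221)
# plt.subplot(222) splits the figure into 2 rows and 2 columns and selects position 2.
plt.subplot(222) # the right-hand plot of the first row
# plt.subplot(223) splits the figure into 2 rows and 2 columns and selects position 3.
plt.subplot(223)
# plt.subplot(224) splits the figure into 2 rows and 2 columns and selects position 4.
plt.subplot(224)

In this post's code, the call plt.subplot(length, width, ...) inside mnist_visualize_multiple_predict splits the figure into length rows and width columns and places each test image in the next cell.

⑥ plt.imshow(x_test_original[i], cmap=plt.get_cmap('gray'))

imshow() displays the values of an array as an image: each value is mapped to a color, and the pixel coordinates are the array indices. A 1000x1000 array, for example, produces an image with 1000x1000 points. The first point of the first row has coordinates (0, 0), and its value is rendered through the colormap (cmap). As I understand it, imshow() turns numeric values into a heat map. Here is a short snippet: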

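import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 1000)
# Broadcasting sin(x) (columns) against cos(x) (rows) gives a 1000x1000 array
I = np.sin(x) * np.cos(x[:, np.newaxis])
plt.imshow(I, cmap='RdBu')
cb = plt.colorbar(label='color bar settings')
plt.show()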

(4) Complete code

from tensorflow.keras.datasets import mnist
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.metrics import confusion_matrix
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Input, Dropout
from tensorflow.keras.models import Model
# keras.utils.np_utils is gone from recent Keras releases; to_categorical is the same function
from tensorflow.keras.utils import to_categorical


"""
Load the dataset
"""
def get_mnist_data():

    (x_train_original, y_train_original), (x_test_original, y_test_original) = mnist.load_data()

    # Hold out the last 10,000 training samples as the validation set
    x_val = x_train_original[50000:]    # (10000, 28, 28): one image per sample
    y_val = y_train_original[50000:]    # (10000,): one label per image
    x_train = x_train_original[:50000]  # (50000, 28, 28)
    y_train = y_train_original[:50000]  # (50000,)

    # Reshape the images into a 4D array (nums, rows, cols, channels) and convert them
    # from uint8 to float32 so that the pixel values can be normalized below.
    x_train = x_train.reshape(x_train.shape[0], 28, 28, 1).astype('float32')
    # x_train.shape[0] is the number of samples; 28 is the image size.
    # This differs from the original LeNet-5, whose input size is 32.
    x_val = x_val.reshape(x_val.shape[0], 28, 28, 1).astype('float32')
    x_test = x_test_original.reshape(x_test_original.shape[0], 28, 28, 1).astype('float32')

    # The raw grayscale pixel values lie in 0-255; scaling them to 0-1 usually makes training easier.
    x_train = x_train / 255
    x_val = x_val / 255
    x_test = x_test / 255

    # There are 10 label classes (0-9); convert the labels to one-hot vectors
    y_train = to_categorical(y_train)  # the labels become a 2D array of shape (nums, 10)
    y_val = to_categorical(y_val)
    y_test = to_categorical(y_test_original)

    return x_train, y_train, x_val, y_val, x_test, y_test


"""
Define the LeNet-5 network model
"""
def LeNet5():

    input_shape = Input(shape=(28, 28, 1))

    x = Conv2D(6, (5, 5), activation="relu", padding="same")(input_shape)  # first convolution layer
    # tf.keras.layers.Conv2D(
    #     filters=6,
    #     kernel_size=(5, 5),
    #     strides=(1, 1),
    #     padding="same",
    #     data_format=None,
    #     dilation_rate=(1, 1),
    #     groups=1,
    #     activation="relu",
    #     use_bias=True,
    #     kernel_initializer="glorot_uniform",
    #     bias_initializer="zeros",
    #     kernel_regularizer=None,
    #     bias_regularizer=None,
    #     activity_regularizer=None,
    #     kernel_constraint=None,
    #     bias_constraint=None,
    #     **kwargs
    # )

    x = MaxPooling2D((2, 2), 2)(x)  # first pooling layer; the original version used average pooling
    x = Conv2D(16, (5, 5), activation="relu", padding='same')(x)  # second convolution layer
    x = MaxPooling2D((2, 2), 2)(x)  # second pooling layer; the original also subsamples here, by averaging rather than taking the max

    x = Flatten()(x)  # flatten each sample's feature maps into a 1D vector
    x = Dense(120, activation='relu')(x)  # first fully connected layer
    # keras.layers.Dense(units=120,
    #                    activation="relu",
    #                    use_bias=True,
    #                    kernel_initializer='glorot_uniform',
    #                    bias_initializer='zeros',
    #                    kernel_regularizer=None,
    #                    bias_regularizer=None,
    #                    activity_regularizer=None,
    #                    kernel_constraint=None,
    #                    bias_constraint=None)

    x = Dense(84, activation='relu')(x)     # second fully connected layer
    x = Dense(10, activation='softmax')(x)  # third fully connected layer (output)

    model = Model(input_shape, x)  # the arguments are the model's input and output tensors
    model.summary()  # summary() prints the table itself and returns None

    return model

"""
Compile and train the network
"""
x_train, y_train, x_val, y_val, x_test, y_test = get_mnist_data()
model = LeNet5()

# Compile the network (set the loss function, optimizer, and evaluation metric)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# model.compile(optimizer=<optimizer>,
#               loss=<loss function>,
#               metrics=[<metric>, ...])
# optimizer is either an optimizer name given as a string or an optimizer object;
# the object form lets you set hyperparameters such as the learning rate and momentum.


# Train the network (training data, validation data, number of epochs, batch size)
train_history = model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=10, batch_size=32, verbose=2)

# Save the model
model.save('lenet_mnist.h5')

# Plot one training curve: a metric on the training set and the same metric on the validation set
def show_train_history(train_history, train, validation):
    plt.plot(train_history.history[train])
    plt.plot(train_history.history[validation])
    plt.title('Train History')
    plt.ylabel(train)
    plt.xlabel('Epoch')
    plt.legend(['train', 'validation'], loc='best')
    plt.show()

show_train_history(train_history, 'accuracy', 'val_accuracy')  # accuracy on the training and validation sets
show_train_history(train_history, 'loss', 'val_loss')

# Print the loss and accuracy on the test set
score = model.evaluate(x_test, y_test)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

# Predict on the test set
predictions = model.predict(x_test)
# predictions is a 2D array: each row is the prediction for one image, and each entry of a row
# is the probability that the image shows the digit 0-9; the largest entry is the most likely digit.
predictions = np.argmax(predictions, axis=1)  # column index of the maximum in each row
print('Predictions for the first 20 images:', predictions[:20])

# Visualize the predictions
(x_train_original, y_train_original), (x_test_original, y_test_original) = mnist.load_data()
def mnist_visualize_multiple_predict(start, end, length, width):

    for i in range(start, end):
        # Arguments: number of grid rows, number of grid columns, position of this subplot.
        # 1 + i - start keeps the position inside the grid even when start > 0.
        plt.subplot(length, width, 1 + i - start)
        plt.imshow(x_test_original[i], cmap=plt.get_cmap('gray'))
        title_true = 'true=' + str(y_test_original[i])
        # title_prediction = ',' + 'prediction' + str(model.predict_classes(np.expand_dims(x_test[i], axis=0)))
        title_prediction = ',' + 'prediction=' + str(predictions[i])
        title = title_true + title_prediction
        plt.title(title)
        plt.xticks([])
        plt.yticks([])
    plt.show()

mnist_visualize_multiple_predict(start=0, end=20, length=4, width=5)

Output:

Test loss: 0.04232775792479515
Test accuracy: 0.989799976348877
Predictions for the first 20 images: [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4]

VII. Summary

This is only the first classic CNN model; more will follow in later updates. Stay tuned!

[Machine Learning] Convolutional Neural Networks (CNN): Classic Network 1, LeNet-5
