Keras Deep Learning - Training a Vanilla Neural Network

Train a vanilla neural network

Now that we've learned the basic concepts of neural networks and seen how to use the Keras library to build neural network models, in this section we'll go a step further and get a glimpse of the power of neural networks by implementing a practical model.

Introduction to Vanilla Neural Networks and the MNIST Dataset

A network that stacks multiple fully connected layers between the input and the output is called a multilayer perceptron, and is sometimes colloquially called a "vanilla" neural network (i.e., the original form of neural network). To see how to train one, we'll train a model to predict the labels of the digits in MNIST, a very commonly used dataset of handwritten digits written by 250 different people. The training set contains 60,000 images and the test set contains 10,000 images; each image has a label, and each image is 28 x 28 pixels.

Building Neural Network Models with Keras

  1. Import the relevant packages and load the dataset, so that we can visualize it and understand the data:
from keras.datasets import mnist
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.utils import np_utils
import matplotlib.pyplot as plt

(x_train, y_train), (x_test, y_test) = mnist.load_data()

The preceding code imports the relevant Keras modules and loads the MNIST dataset.

  2. Each image in the MNIST dataset has shape 28 x 28. Plot a few images to get a better understanding of the dataset:
plt.subplot(221)
plt.imshow(x_train[0], cmap='gray')
plt.subplot(222)
plt.imshow(x_train[1], cmap='gray')
plt.subplot(223)
plt.imshow(x_test[0], cmap='gray')
plt.subplot(224)
plt.imshow(x_test[1], cmap='gray')
plt.show()

The following image shows the output of the above code:

MNIST dataset

  3. Flatten each 28 x 28 image so that the input becomes a one-dimensional vector of 784 pixel values that can be fed into a Dense layer. The labels also need to be converted to one-hot encoding. This step is critical in dataset preparation:
num_pixels = x_train.shape[1] * x_train.shape[2]
x_train = x_train.reshape(-1, num_pixels).astype('float32')
x_test = x_test.reshape(-1, num_pixels).astype('float32')

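In the code above, the reshape method changes the shape of the input dataset; np.reshape() converts an array of a given shape into a different shape. In this example, the x_train array holds x_train.shape[0] data points (images), each with x_train.shape[1] rows and x_train.shape[2] columns. We reshape it into an array of x_train.shape[0] samples, each containing x_train.shape[1] * x_train.shape[2] values. Next, we encode the labels as one-hot vectors: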

y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]

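Let's briefly look at how one-hot encoding works. Suppose the possible labels of a dataset are {apple, orange, banana, lemon, pear}. Converting these labels to one-hot encoding gives the following: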

Class    Index 0  Index 1  Index 2  Index 3  Index 4
apple    1        0        0        0        0
orange   0        1        0        0        0
banana   0        0        1        0        0
lemon    0        0        0        1        0
pear     0        0        0        0        1

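Each one-hot vector contains n values, where n is the number of possible labels. Only the value at the index corresponding to the label is 1; all other values are 0. As shown above, the one-hot encoding of apple is [1, 0, 0, 0, 0]. In Keras, the to_categorical method performs this encoding: unless num_classes is passed explicitly, it infers the number of classes from the largest label value (max + 1) and then converts each label into a one-hot vector.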
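The two preparation steps above (flattening and one-hot encoding) can be reproduced with NumPy alone, which makes the resulting shapes easy to check. This is a minimal sketch rather than what Keras does internally: the `images` and `labels` arrays are small made-up stand-ins for MNIST, and indexing an identity matrix is just one common way to build one-hot vectors.

```python
import numpy as np

# Stand-in for MNIST: 5 fake "images" of 28 x 28 pixels (assumed data, not the real dataset).
images = np.zeros((5, 28, 28), dtype="uint8")
labels = np.array([3, 0, 9, 1, 3])

# Flatten each image into a vector of 28 * 28 = 784 values.
num_pixels = images.shape[1] * images.shape[2]
flat = images.reshape(-1, num_pixels).astype("float32")

# One-hot encode: row i of the identity matrix is the one-hot vector for class i.
# The class count is taken as max label + 1, mirroring to_categorical's default.
num_classes = labels.max() + 1
one_hot = np.eye(num_classes, dtype="float32")[labels]

print(flat.shape)     # (5, 784)
print(one_hot.shape)  # (5, 10)
print(one_hot[0])     # 1.0 at index 3, 0.0 elsewhere
```

Because the class count is inferred from the largest label, a subset of the data that happens to lack the digit 9 would produce fewer columns; passing the number of classes explicitly avoids that.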

  4. Build a neural network with a hidden layer of 1000 nodes:
model = Sequential()
model.add(Dense(1000, input_dim=num_pixels, activation='relu'))
model.add(Dense(num_classes,  activation='softmax'))

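The input has 28×28=784 values, which are connected to the 1000 units of the hidden layer, with ReLU as the activation function. Finally, the hidden layer is connected to an output with num_classes=10 values (there are ten possible image labels, so the one-hot vectors created by to_categorical have 10 columns). A softmax activation is applied at the output to obtain the class probabilities of the image.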
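To make the data flow through this architecture concrete, the forward pass can be sketched in plain NumPy. This is an illustrative assumption-laden sketch, not Keras internals: the weights are random (Keras uses its own initializers), and the names `W1`, `b1`, `W2`, `b2`, `relu`, and `softmax` are introduced here only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # ReLU keeps positive values and zeroes out negatives.
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row-wise max for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Randomly initialised parameters (illustrative only).
W1 = rng.normal(scale=0.01, size=(784, 1000))
b1 = np.zeros(1000)
W2 = rng.normal(scale=0.01, size=(1000, 10))
b2 = np.zeros(10)

x = rng.random((2, 784))             # two fake flattened images
hidden = relu(x @ W1 + b1)           # shape (2, 1000)
probs = softmax(hidden @ W2 + b2)    # shape (2, 10)

print(hidden.shape)
print(probs.shape)
print(probs.sum(axis=1))             # each row of probabilities sums to 1
```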

  5. A summary of the model architecture can be printed as follows:
model.summary()

The architecture summary is printed as follows:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 1000)              785000    
_________________________________________________________________
dense_1 (Dense)              (None, 10)                10010     
=================================================================
Total params: 795,010
Trainable params: 795,010
Non-trainable params: 0
_________________________________________________________________

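In the architecture above, the first layer has 785000 parameters: the 784 input units are connected to 1000 hidden units, so the hidden layer contains 784 * 1000 weights plus 1000 biases. Similarly, the output layer has 10 units, each connected to the 1000 hidden units, which yields 1000 * 10 weights and 10 biases (10010 parameters in total). The output layer has 10 units because there are 10 possible labels; it gives the probability that a given input image belongs to each class. For example, the first unit represents the probability that the image is a 0, the second unit the probability that it is a 1, and so on.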
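The parameter counts in the summary can be verified with a few lines of arithmetic. The helper `dense_params` below is a hypothetical name used only for this check; it simply counts one weight per input-unit pair plus one bias per unit.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden_params = dense_params(784, 1000)   # hidden layer
output_params = dense_params(1000, 10)    # output layer
print(hidden_params, output_params, hidden_params + output_params)
# 785000 10010 795010
```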

  6. Compile the model:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

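Because the targets are one-hot encoded vectors spanning multiple classes, the loss function is categorical cross-entropy. In addition, we use the Adam optimizer to minimize the loss, and monitor the accuracy metric (accuracy, which can be abbreviated as acc) while training the model.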
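As a rough illustration of what categorical cross-entropy measures, the sketch below computes it by hand for a few made-up predictions. The function name, the `eps` clipping constant, and the sample arrays are assumptions for this example; the Keras implementation may differ in details.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0), then average the per-sample losses.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

# Two samples with three classes; the true classes are 2 and 0.
y_true = np.array([[0, 0, 1], [1, 0, 0]], dtype="float32")
confident = np.array([[0.05, 0.05, 0.90], [0.80, 0.10, 0.10]])
unsure = np.array([[0.30, 0.30, 0.40], [0.40, 0.30, 0.30]])

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident)  # lower: high probability on the correct classes
print(loss_unsure)     # higher: probability spread across classes
```

Only the probability assigned to the true class contributes to each sample's loss, so the loss shrinks as the model grows more confident in the correct label.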

  7. Fit the model:
history = model.fit(x_train, y_train,
                    validation_data=(x_test, y_test),
                    epochs=50,
                    batch_size=64,
                    verbose=1)

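In the code above, we specify the input (x_train) and output (y_train) that the model is fitted to, and pass the inputs and outputs of the test dataset as validation data. The model does not use the test dataset to train its weights; it is used only to observe how the loss and accuracy differ between the training and test datasets.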
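The batch size and epoch count determine how many weight updates training performs. A small calculation, assuming the full 60000-image training set is used and the final partial batch is kept, gives the numbers:

```python
import math

num_train = 60000   # MNIST training images
batch_size = 64
epochs = 50

# The last batch of each epoch is smaller (32 images), hence the ceiling.
steps_per_epoch = math.ceil(num_train / batch_size)
total_updates = steps_per_epoch * epochs
print(steps_per_epoch, total_updates)  # 938 46900
```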

  8. Extract the training and test loss and accuracy for each epoch:
history_dict = history.history
loss_values = history_dict['loss']
val_loss_values = history_dict['val_loss']
acc_values = history_dict['acc']
val_acc_values = history_dict['val_acc']
epochs = range(1, len(val_loss_values) + 1)

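When the model is fitted, the history variable stores the accuracy and loss values for the training and test datasets at every epoch. We extract these values into lists so that we can plot how accuracy and loss change on both datasets.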
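Once the per-epoch values are in lists, they can also be inspected numerically. The sketch below uses made-up numbers in place of a real `history.history` dictionary (they are not results from this model) to show one possible check: finding the epoch with the lowest test loss and the gap between test and training loss.

```python
import numpy as np

# Made-up values for illustration only; real ones come from history.history.
history_dict = {
    "loss": [0.90, 0.40, 0.25, 0.15, 0.08],
    "val_loss": [0.70, 0.45, 0.38, 0.41, 0.47],
}

train_loss = np.array(history_dict["loss"])
val_loss = np.array(history_dict["val_loss"])

best_epoch = int(np.argmin(val_loss)) + 1   # epochs are counted from 1
gap = val_loss - train_loss                 # a widening gap can hint at overfitting
print(best_epoch)
print(gap.round(2))
```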

  9. Visualize the training and test loss and accuracy across epochs:
plt.subplot(211)
plt.plot(epochs, loss_values, marker='x', label='Training loss')
plt.plot(epochs, val_loss_values, marker='o', label='Test loss')
plt.title('Training and test loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()

plt.subplot(212)
plt.plot(epochs, acc_values, marker='x', label='Training accuracy')
plt.plot(epochs, val_acc_values, marker='o', label='Test accuracy')
plt.title('Training and test accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

The preceding code produces the output shown in the following figures. The first figure shows how the training and test loss change as the number of epochs increases, and the second shows how the training and test accuracy change as the number of epochs increases:

Model training performance monitoring

The final model is about 97% accurate.

  10. We can also manually calculate the accuracy of the final model on the test set:
# Predicted class probabilities for every test image
preds = model.predict(x_test)
correct = 0
for i in range(len(x_test)):
    pred = np.argmax(preds[i], axis=0)  # predicted class index
    act = np.argmax(y_test[i], axis=0)  # true class index (from the one-hot label)
    if pred == act:
        correct += 1
accuracy = correct / len(x_test)
print('Test accuracy: {:.4f}%'.format(accuracy * 100))

In the code above, the model's predict method computes the predicted output for the given input (here, x_test). We then loop over all the predictions for the test set, using argmax to find the index with the highest probability, and do the same for the true labels of the test dataset. When the index of the highest value is the same in the prediction and the true label, the prediction is correct. The number of correct predictions divided by the total number of samples in the test dataset is the accuracy of the model.
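The same accuracy can be computed without an explicit loop by applying argmax along the class axis. This is a sketch on tiny made-up arrays standing in for `model.predict(x_test)` and the one-hot `y_test`; the values are assumptions for illustration, not model output.

```python
import numpy as np

# Tiny stand-ins for the predicted probabilities and the one-hot labels (assumed values).
preds = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.2, 0.5, 0.3]])
y_true = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 1, 0],
                   [0, 1, 0]])

# argmax along axis=1 gives one class index per row; the mean of the
# boolean matches is the fraction of correct predictions.
accuracy = np.mean(np.argmax(preds, axis=1) == np.argmax(y_true, axis=1))
print('Test accuracy: {:.4f}%'.format(accuracy * 100))
```

Here three of the four rows match, so the computed accuracy is 0.75.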

Related Links

Learning Neural Network Forward Propagation from Scratch - Nuggets (juejin.cn)

Learning Neural Network Backpropagation from Scratch - Nuggets (juejin.cn)

First experience of building neural networks with Keras - Nuggets (juejin.cn)

Origin juejin.im/post/7084749383882768421