Hands-on: Using Activation Functions and the Keras Framework to Solve Classification Problems

Introduction

This article shows how to use activation functions and the Keras framework to solve a classification problem. Classification is a fundamental task in machine learning: the goal is to assign each input to one of several categories. Using handwritten digit recognition as the example, we build a classifier that assigns each image of a handwritten digit to one of ten classes, 0 through 9.

Data loading

First, we load MNIST, the handwritten digit dataset. It contains 60,000 training samples and 10,000 test samples; each sample is a 28x28 grayscale image of one handwritten digit from 0 to 9. Keras provides a loader for it, which downloads the data on first use and caches it locally:

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

print('Training set size:', x_train.shape, y_train.shape)
print('Test set size:', x_test.shape, y_test.shape)

The output is as follows:

Training set size: (60000, 28, 28) (60000,)
Test set size: (10000, 28, 28) (10000,)

The training set contains 60,000 samples and the test set 10,000. Each image is a 28x28 array of grayscale pixel values, and each label is a single integer from 0 to 9.

Data preprocessing

Before training, we preprocess the data: each 28x28 image matrix is flattened into a one-dimensional vector of 784 values, the pixel values are scaled from the integer range 0-255 to the range 0-1, and the class labels are converted to one-hot encoding:

from keras.utils import to_categorical

# Flatten each 28x28 image matrix into a 784-element vector
x_train = x_train.reshape((x_train.shape[0], 28 * 28))
x_test = x_test.reshape((x_test.shape[0], 28 * 28))

# Scale pixel values from 0-255 to floats in the range 0-1
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# Convert the integer class labels (0-9) to one-hot vectors of length 10
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

print('Training set size:', x_train.shape, y_train.shape)
print('Test set size:', x_test.shape, y_test.shape)

After this processing, the training and test inputs have shapes (60000, 784) and (10000, 784), and the one-hot encoded labels have shapes (60000, 10) and (10000, 10).
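For intuition, the NumPy sketch below applies the same three preprocessing steps to a tiny fake batch of two images. The helper name preprocess is hypothetical, and the one-hot step is a hand-written equivalent of the output of to_categorical, not its actual implementation.

```python
import numpy as np

def preprocess(images, labels, num_classes=10):
    """Flatten, scale to [0, 1], and one-hot encode (mirrors the Keras steps)."""
    n = images.shape[0]
    # Flatten each 28x28 image into a 784-element row, then scale to [0, 1]
    flat = images.reshape((n, -1)).astype("float32") / 255
    # Put a 1 in column `label` of each row, zeros elsewhere
    one_hot = np.zeros((labels.shape[0], num_classes), dtype="float32")
    one_hot[np.arange(labels.shape[0]), labels] = 1.0
    return flat, one_hot

# Two fake 28x28 "images": one all black (0), one all white (255)
images = np.stack([np.zeros((28, 28), dtype=np.uint8),
                   np.full((28, 28), 255, dtype=np.uint8)])
labels = np.array([3, 7])

x, y = preprocess(images, labels)
print(x.shape, y.shape)  # (2, 784) (2, 10)
print(x.min(), x.max())  # 0.0 1.0
print(y[1])              # [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
```

Scaling to 0-1 keeps the inputs in a small, consistent range, which generally makes gradient-based training better behaved than feeding raw 0-255 values.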

Model design

We use Keras to build a fully connected (dense) neural network for the handwritten digit classification task. The network has an input layer, two hidden layers, and an output layer. Each hidden layer has 128 units and uses the ReLU activation function; the output layer has 10 units (one per class) and uses the softmax activation function, which turns the outputs into a probability distribution over the 10 digits. The network can be built with the following code:

from keras import models
from keras import layers

model = models.Sequential()
model.add(layers.Dense(128, activation='relu', input_shape=(784,)))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

# summary() prints the table itself and returns None, so it is not wrapped in print()
model.summary()

The structure of this neural network is as follows:

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 128)               100480    
_________________________________________________________________
dense_2 (Dense)              (None, 128)               16512     
_________________________________________________________________
dense_3 (Dense)              (None, 10)                1290      
=================================================================
Total params: 118,282
Trainable params: 118,282
Non-trainable params: 0
_________________________________________________________________

The input layer receives 784 values (the 28x28 image flattened into a vector), both hidden layers have 128 units, and the output layer has 10 units, one per class. Each Dense layer has one weight per input-unit pair plus one bias per unit, so the first layer has 784 × 128 + 128 = 100,480 parameters, as shown in the summary.
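The two activation functions can be written out directly. The NumPy sketch below runs a forward pass through a network with the same layer sizes (784-128-128-10), using ReLU in the hidden layers and softmax at the output. The weights are random placeholders chosen for illustration (Keras learns the real ones during training), and relu, softmax, and forward are hypothetical helper names, not Keras functions.

```python
import numpy as np

def relu(z):
    # ReLU: max(0, z), applied element-wise
    return np.maximum(0.0, z)

def softmax(z):
    # Subtract each row's max for numerical stability; each output row sums to 1
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
sizes = [784, 128, 128, 10]  # input, hidden, hidden, output
# Hypothetical random weights and zero biases, one (W, b) pair per Dense layer
params = [(rng.normal(0, 0.05, (n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(x):
    h = x
    for i, (w, b) in enumerate(params):
        z = h @ w + b
        # ReLU in the hidden layers, softmax in the output layer
        h = softmax(z) if i == len(params) - 1 else relu(z)
    return h

# Weights plus biases over all layers; matches "Total params" in the summary
n_params = sum(w.size + b.size for w, b in params)
print(n_params)  # 118282

x = rng.random((5, 784))  # five fake flattened images with values in [0, 1)
probs = forward(x)
print(probs.shape)                          # (5, 10)
print(np.allclose(probs.sum(axis=1), 1.0))  # True
print(probs.argmax(axis=1))                 # predicted class per image
```

ReLU zeroes out negative pre-activations and passes positive ones through unchanged, while softmax makes the 10 outputs non-negative and sum to 1, so the largest output can be read as the predicted digit.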

Model compilation

Once the model is defined, we compile it, specifying the loss function, the optimizer, and the evaluation metrics. For multi-class classification with one-hot labels, the standard loss is categorical cross-entropy (categorical_crossentropy); here we use plain stochastic gradient descent (SGD) as the optimizer and accuracy as the metric:

model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])
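To make the loss and the metric concrete, here is a minimal NumPy sketch of categorical cross-entropy and accuracy for one-hot labels. The function names are hypothetical, and the clipping constant eps=1e-7 is an assumption used only to avoid log(0); Keras handles such details in its own implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean over samples of -sum_k y_true[k] * log(y_pred[k])
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    # Fraction of samples whose highest-probability class matches the label
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# Two samples, three classes (one-hot labels and softmax-style predictions)
y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype="float32")
y_pred = np.array([[0.1, 0.8, 0.1], [0.3, 0.6, 0.1]], dtype="float32")

print(round(categorical_crossentropy(y_true, y_pred), 4))  # 0.7136
print(accuracy(y_true, y_pred))                            # 0.5
```

With one-hot labels only the predicted probability of the true class contributes to the loss: here -(ln 0.8 + ln 0.3) / 2 ≈ 0.7136. The second sample is misclassified (0.6 on class 1 versus 0.3 on the true class 0), so accuracy is 0.5.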

Model training

After compiling the model, we train it with the fit method. fit takes the training inputs and labels, the batch size, the number of training epochs, and, optionally, validation data on which the model is evaluated after each epoch (here, the test set):

history = model.fit(x_train, y_train,
                    batch_size=128,
                    epochs=20,
                    validation_data=(x_test, y_test))

The model is trained for 20 epochs. In each epoch the training set is shuffled and split into mini-batches of 128 samples, and the weights are updated once per batch. After every epoch the model is evaluated on the validation data (here, the test set), and the training and validation loss and accuracy are recorded in history.history.
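The mini-batch schedule described above can be sketched in NumPy: one epoch over 60,000 samples with a batch size of 128 gives 469 batches, the last holding the remaining 96 samples. minibatch_indices and sgd_step are hypothetical helpers written for illustration; sgd_step shows the plain SGD update w ← w - η·∇L with η = 0.01, the default learning rate of the Keras SGD optimizer (which has no momentum by default).

```python
import numpy as np

def minibatch_indices(n_samples, batch_size, rng):
    # Shuffle once per epoch, then cut the permutation into consecutive batches
    perm = rng.permutation(n_samples)
    return [perm[i:i + batch_size] for i in range(0, n_samples, batch_size)]

def sgd_step(w, grad, learning_rate=0.01):
    # Plain SGD update: move the weights a small step against the gradient
    return w - learning_rate * grad

rng = np.random.default_rng(0)
batches = minibatch_indices(60000, 128, rng)
print(len(batches))      # 469 weight updates per epoch
print(len(batches[-1]))  # 96 samples in the final, smaller batch

# The batches are disjoint and together cover every training sample exactly once
all_idx = np.concatenate(batches)
print(np.array_equal(np.sort(all_idx), np.arange(60000)))  # True

print(sgd_step(np.array([1.0, -2.0]), np.array([10.0, -10.0])))  # [ 0.9 -1.9]
```

So 20 epochs with these settings amount to 20 × 469 = 9,380 gradient updates.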

Model evaluation

After training the model, we can use the following code to evaluate the model's performance on the test set:

test_loss, test_acc = model.evaluate(x_test, y_test)

print('Test loss:', test_loss)
print('Test accuracy:', test_acc)

The output is as follows:

Test loss: 0.10732187752056122
Test accuracy: 0.968600034236908

The model reaches an accuracy of 96.86% on the test set.

We can also plot the training and test loss and accuracy curves during training with the following code:

import matplotlib.pyplot as plt

# Per-epoch metrics recorded by fit(); older Keras versions may name the
# accuracy keys 'acc' and 'val_acc' instead
history_dict = history.history
train_loss = history_dict['loss']
val_loss = history_dict['val_loss']  # renamed so the scalar test_loss above is not overwritten
train_acc = history_dict['accuracy']
val_acc = history_dict['val_accuracy']

epochs = range(1, len(train_loss) + 1)

# Loss curves: dots for training, solid line for the test (validation) data
plt.plot(epochs, train_loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Testing loss')
plt.title('Training and testing loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

# Accuracy curves, same styling
plt.plot(epochs, train_acc, 'bo', label='Training accuracy')
plt.plot(epochs, val_acc, 'b', label='Testing accuracy')
plt.title('Training and testing accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

As the number of epochs increases, the training loss steadily decreases and the training accuracy steadily increases. After a certain number of epochs, however, the test loss stops decreasing and starts to rise again while the test accuracy stops improving, which indicates that the model has begun to overfit. Overfitting can be reduced with techniques such as adding a regularization term (for example an L2 weight penalty) or using dropout.
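Both remedies are simple to state. The NumPy sketch below is not Keras code: dropout and l2_penalty are hypothetical helpers that illustrate inverted dropout (randomly zero units during training and scale the survivors by 1 / (1 - rate) so the expected activation is unchanged) and an L2 term added to the loss. In Keras itself, the corresponding building blocks are the layers.Dropout(rate) layer and the kernel_regularizer argument of Dense, for example regularizers.l2(0.01).

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    # Inverted dropout: zero each unit with probability `rate` during training
    # and scale the survivors by 1 / (1 - rate); identity at inference time
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)

def l2_penalty(weights, lam):
    # L2 regularization term added to the loss: lam * sum of squared weights
    return lam * sum(float(np.sum(w ** 2)) for w in weights)

rng = np.random.default_rng(0)
h = np.ones((1000, 128))  # fake hidden-layer activations, all equal to 1
dropped = dropout(h, 0.5, rng)
print(dropped.mean())  # close to 1.0: the expected activation is preserved
print(np.array_equal(dropout(h, 0.5, rng, training=False), h))  # True

# 0.01 * (1^2 + (-2)^2 + 3^2) = 0.01 * 14
print(round(l2_penalty([np.array([[1.0, -2.0]]), np.array([3.0])], 0.01), 6))  # 0.14
```

Dropout keeps the network from relying on any single hidden unit, while the L2 term pushes the optimizer toward smaller weights; both tend to narrow the gap between training and test performance.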

Summary

This article showed how to use activation functions and the Keras framework to solve a classification problem. Using handwritten digit recognition as the example, we built a fully connected neural network with Keras that classifies handwritten digits into the ten classes 0 to 9. We also covered model compilation, training, and evaluation.

Origin blog.csdn.net/java_wxid/article/details/132649550