"Introduction to Deep Learning" Chapter 7 Actual Combat: Handwritten Digit Recognition - Convolutional Neural Network



Foreword

I recently read Chapter 7 of the book "Introduction to Deep Learning: Python-Based Theory and Implementation". This chapter mainly explains convolutional neural networks (CNNs), which can be used in all kinds of image recognition and speech recognition settings.


1. A brief introduction

1. Overall structure

Compared with the neural networks covered earlier, a CNN adds two new kinds of layers: the convolution layer and the pooling layer.

The connection order of a CNN's layers is "Convolution - ReLU - (Pooling)" (the pooling layer is sometimes omitted). This can be understood as the earlier "Affine - ReLU" connection being replaced by a "Convolution - ReLU - (Pooling)" connection.
[Figure: the layer connection order of a CNN]

2. Convolution layer

In a fully connected layer, the data is flattened into a 1-dimensional array before being fed into the network, so the shape of the data is effectively ignored. A convolution layer, by contrast, receives and passes on the data with its shape (for images, the 3-dimensional channel × height × width structure) intact.
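A quick sketch (my own illustration, not code from the book) to make the difference concrete:

import numpy as np

x = np.random.rand(1, 28, 28)   # one MNIST-sized image: (channel, height, width)
flat = x.reshape(1, -1)         # input for a fully connected layer: the shape information is discarded
print(flat.shape)               # (1, 784)
# A convolution layer receives x (batched as (N, 1, 28, 28)) with its spatial structure intact.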

In a CNN, the input and output data of a convolution layer are sometimes called feature maps. The input data of a convolution layer is called the input feature map, and its output data is called the output feature map. In this book, "input-output data" and "feature map" are used as synonyms.

Convolution operation

The objects involved in the convolution operation are the input data and the filter. (Here we assume the input data is two-dimensional, so the corresponding filter is also two-dimensional.)
[Figure: example of a convolution operation]
For the input data, the convolution operation slides the filter window over the input at a fixed interval and applies the filter at each position. The window here refers to the gray 3 × 3 region in Figure 7-4. As shown in Figure 7-4, at each position the elements of the filter are multiplied by the corresponding elements of the input and the products are summed (this computation is sometimes called a multiply-accumulate operation). The result is then stored at the corresponding location of the output. Performing this process at all positions produces the output of the convolution operation.

In a fully connected neural network there are biases in addition to the weight parameters. In a CNN, the filter parameters play the role of the earlier weights, and there are biases as well. The convolution example in Figure 7-3 is shown only up to the stage where the filter is applied; the processing flow of a convolution operation including the bias is shown in Figure 7-5.
[Figure 7-5: the convolution operation with a bias]
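To make the multiply-accumulate operation and the bias concrete, here is a plain NumPy sketch I wrote for a 2-dimensional input with stride 1 and no padding (the input and filter values below are made up, not the numbers from Figure 7-3):

import numpy as np

def conv2d(x, w, b=0.0):
    # Slide the filter window over the input and multiply-accumulate at each position.
    H, W = x.shape
    FH, FW = w.shape
    out = np.zeros((H - FH + 1, W - FW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = x[i:i+FH, j:j+FW]            # the gray window in Figure 7-4
            out[i, j] = np.sum(window * w) + b    # element-wise product, summed, plus the bias
    return out

x = np.arange(16, dtype=float).reshape(4, 4)              # (4, 4) input data
w = np.array([[2., 0., 1.], [0., 1., 2.], [1., 0., 2.]])  # (3, 3) filter
print(conv2d(x, w, b=3.0))                                # (2, 2) output feature map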

Padding

Before the convolution layer processes the input, it is sometimes necessary to fill fixed data (for example, zeros) around the input data; this is called padding.

[Figure: a padding of 1 applied around a (4, 4) input]

Through padding, the (4, 4) input data becomes a (6, 6) shape. Applying a (3, 3) filter then produces output data of size (4, 4).

In this example the padding is set to 1, but it can be set to any integer such as 2 or 3. If the padding is set to 2, the size of the input data becomes (8, 8); if it is set to 3, the size becomes (10, 10).

Padding is used primarily to resize the output.
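A quick shape check with NumPy (my own sketch, using np.pad):

import numpy as np

x = np.ones((4, 4))                              # (4, 4) input data
x_pad = np.pad(x, pad_width=1, mode='constant')  # surround it with one row/column of zeros
print(x_pad.shape)                               # (6, 6)
# Applying a (3, 3) filter with stride 1 to the padded input gives
# (6 - 3) / 1 + 1 = 4, i.e. a (4, 4) output: the same size as the original input.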

Stride

The interval at which the filter window slides over the input is called the stride. In the examples so far the stride was 1; if the stride is set to 2, the window moves two elements at a time, so the output becomes smaller.

[Figure: convolution operation with a stride of 2]
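Putting input size, filter size, padding, and stride together: for an input of size (H, W), a filter of size (FH, FW), padding P and stride S, the output size is OH = (H + 2P - FH) / S + 1 and OW = (W + 2P - FW) / S + 1 (the stride and padding have to be chosen so that these expressions come out as integers). A small helper to check the examples (written here just as an illustration; the function name is mine):

def conv_output_size(input_size, filter_size, pad=0, stride=1):
    # OH = (H + 2P - FH) / S + 1  (the width direction is computed the same way)
    return (input_size + 2 * pad - filter_size) // stride + 1

print(conv_output_size(4, 3, pad=1, stride=1))    # 4: the padded (4, 4) example above
print(conv_output_size(7, 3, pad=0, stride=2))    # 3: a (7, 7) input with a stride of 2
print(conv_output_size(28, 5, pad=0, stride=1))   # 24: the convolution layer in SimpleConvNet below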
Convolution operation with multiple filters (each filter produces one channel of the output feature map):
[Figure: convolution operation with multiple filters]

3. Pooling layer

Pooling is an operation that shrinks the data in the height and width directions. Generally speaking, pooling comes in two kinds: Max pooling and Average pooling.

Max pooling: takes the maximum value of the target region.
Average pooling: takes the average value of the target region.

The example in the figure below shows the processing when 2 × 2 Max pooling is performed with a stride of 2.
[Figure: 2 × 2 Max pooling with a stride of 2]
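A small NumPy sketch of 2 × 2 Max pooling with a stride of 2 (the 4 × 4 input below is a toy example I made up):

import numpy as np

x = np.array([[1., 2., 1., 0.],
              [0., 1., 2., 3.],
              [3., 0., 1., 2.],
              [2., 4., 0., 1.]])

pool, stride = 2, 2
out = np.zeros((x.shape[0] // stride, x.shape[1] // stride))
for i in range(out.shape[0]):
    for j in range(out.shape[1]):
        window = x[i*stride:i*stride+pool, j*stride:j*stride+pool]
        out[i, j] = window.max()   # use window.mean() instead for Average pooling

print(out)   # [[2. 3.]
             #  [4. 2.]]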
Features of the pooling layer:
1. There are no parameters to learn.
Pooling only takes the maximum (or average) value of the target region, so there are no parameters to learn.
2. The number of channels does not change.
The pooling operation does not change the number of channels between the input and output data. As shown in the figure below, the computation is performed independently for each channel.
[Figure: pooling is computed independently for each channel]

2. Implementation of convolutional layer and pooling layer

Refer to this blog post: https://blog.csdn.net/LeungSr/article/details/127203161
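For reference, here is a forward-only sketch of the convolution layer in the im2col style used by the book (the full class, including backward, lives in common/layers.py of the book's repository; treat this as a simplified illustration rather than the complete implementation):

import numpy as np
from common.util import im2col   # im2col helper from the book's repository

class Convolution:
    def __init__(self, W, b, stride=1, pad=0):
        self.W = W            # filters: shape (FN, C, FH, FW)
        self.b = b            # biases: shape (FN,)
        self.stride = stride
        self.pad = pad

    def forward(self, x):
        FN, C, FH, FW = self.W.shape
        N, C, H, W = x.shape
        out_h = (H + 2 * self.pad - FH) // self.stride + 1
        out_w = (W + 2 * self.pad - FW) // self.stride + 1

        col = im2col(x, FH, FW, self.stride, self.pad)  # expand the input into rows: (N*out_h*out_w, C*FH*FW)
        col_W = self.W.reshape(FN, -1).T                # flatten the filters into columns: (C*FH*FW, FN)
        out = np.dot(col, col_W) + self.b               # multiply-accumulate plus bias as one matrix product
        return out.reshape(N, out_h, out_w, -1).transpose(0, 3, 1, 2)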

3. Full code and running results

import sys, os

sys.path.append(os.pardir)  # allow importing the book's common/ and dataset/ packages
import pickle
from collections import OrderedDict

import numpy as np
import matplotlib.pyplot as plt

from common.functions import *
from common.layers import *
from common.trainer import Trainer
from dataset.mnist import load_mnist

class SimpleConvNet:
    # input_dim: dimensions of the input data (default: 1 channel, height 28, width 28)
    # filter_num: number of filters
    # filter_size: filter size
    # pad: padding
    # stride: stride (default 1)
    # hidden_size: number of neurons in the fully connected hidden layer
    # output_size: number of neurons in the fully connected output layer
    # weight_init_std: standard deviation of the weights at initialization
    def __init__(self, input_dim=(1, 28, 28),
                 conv_param={'filter_num': 30, 'filter_size': 5, 'pad': 0, 'stride': 1},
                 hidden_size=100, output_size=10, weight_init_std=0.01):
        filter_num = conv_param['filter_num']
        filter_size = conv_param['filter_size']
        filter_pad = conv_param['pad']
        filter_stride = conv_param['stride']
        input_size = input_dim[1]
        conv_output_size = (input_size - filter_size + 2*filter_pad) / filter_stride + 1
        pool_output_size = int(filter_num * (conv_output_size/2) * (conv_output_size/2))
        # Initialize the weight parameters
        self.params = {}
        self.params['W1'] = weight_init_std * np.random.randn(filter_num, input_dim[0], filter_size, filter_size)
        self.params['b1'] = np.zeros(filter_num)
        self.params['W2'] = weight_init_std * np.random.randn(pool_output_size, hidden_size)
        self.params['b2'] = np.zeros(hidden_size)
        self.params['W3'] = weight_init_std * np.random.randn(hidden_size, output_size)
        self.params['b3'] = np.zeros(output_size)
        # Build the layers in the order Conv - ReLU - Pool - Affine - ReLU - Affine - SoftmaxWithLoss
        self.layers = OrderedDict()
        self.layers['Conv1'] = Convolution(self.params['W1'], self.params['b1'], conv_param['stride'], conv_param['pad'])
        self.layers['Relu1'] = Relu()
        self.layers['Pool1'] = Pooling(pool_h=2, pool_w=2, stride=2)
        self.layers['Affine1'] = Affine(self.params['W2'], self.params['b2'])
        self.layers['Relu2'] = Relu()
        self.layers['Affine2'] = Affine(self.params['W3'], self.params['b3'])
        self.last_layer = SoftmaxWithLoss()

    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)
        return x

    # x is the input data, t is the label
    def loss(self, x, t):
        y = self.predict(x)
        return self.last_layer.forward(y, t)

    # Compute the accuracy
    def accuracy(self, x, t):
        y = self.predict(x)
        y = np.argmax(y, axis=1)
        if t.ndim != 1:
            t = np.argmax(t, axis=1)
        accuracy = np.sum(y == t) / float(x.shape[0])
        return accuracy

    def save_params(self, file_name="params.pkl"):
        params = {}
        for key, val in self.params.items():
            params[key] = val
        with open(file_name, 'wb') as f:
            pickle.dump(params, f)

    def load_params(self, file_name="params.pkl"):
        with open(file_name, 'rb') as f:
            params = pickle.load(f)
        for key, val in params.items():
            self.params[key] = val

        for i, key in enumerate(['Conv1', 'Affine1', 'Affine2']):
            self.layers[key].W = self.params['W' + str(i + 1)]
            self.layers[key].b = self.params['b' + str(i + 1)]


    def gradient(self, x, t):
        # forward
        self.loss(x, t)
        # backward
        dout = 1
        dout = self.last_layer.backward(dout)
        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)
        # Collect the gradients from each layer
        grads = {}
        grads['W1'] = self.layers['Conv1'].dW
        grads['b1'] = self.layers['Conv1'].db
        grads['W2'] = self.layers['Affine1'].dW
        grads['b2'] = self.layers['Affine1'].db
        grads['W3'] = self.layers['Affine2'].dW
        grads['b3'] = self.layers['Affine2'].db

        return grads


# Load the MNIST data
(x_train, t_train), (x_test, t_test) = load_mnist(flatten=False)

# Reduce the amount of data if training takes too long
# x_train, t_train = x_train[:5000], t_train[:5000]
# x_test, t_test = x_test[:1000], t_test[:1000]

max_epochs = 20

network = SimpleConvNet(input_dim=(1, 28, 28),
                        conv_param={'filter_num': 30, 'filter_size': 5, 'pad': 0, 'stride': 1},
                        hidden_size=100, output_size=10, weight_init_std=0.01)

trainer = Trainer(network, x_train, t_train, x_test, t_test,
                  epochs=max_epochs, mini_batch_size=100,
                  optimizer='Adam', optimizer_param={'lr': 0.001},
                  evaluate_sample_num_per_epoch=1000)
trainer.train()

# Save the learned parameters
network.save_params("params.pkl")
print("Saved Network Parameters!")

# Plot the training and test accuracy
markers = {'train': 'o', 'test': 's'}
x = np.arange(max_epochs)
plt.plot(x, trainer.train_acc_list, marker='o', label='train', markevery=2)
plt.plot(x, trainer.test_acc_list, marker='s', label='test', markevery=2)
plt.xlabel("epochs")
plt.ylabel("accuracy")
plt.ylim(0, 1.0)
plt.legend(loc='lower right')
plt.show()






Running result:
[Figure: training and test accuracy over the epochs]
