"Introduction to Deep Learning" Chapter 4 Actual Combat: Handwritten Number Recognition


Foreword

This article works through the small handwritten digit recognition case based on the content of Chapter 4 of "Introduction to Deep Learning". The focus of this chapter is how to let the neural network "learn to learn". To make this possible, a loss function is introduced as an indicator, and learning becomes the search for the weight parameters that minimize this loss function. To find the smallest possible value of the loss function, we use gradient descent.


1. Theoretical knowledge

(1) Learning steps of a neural network

  1. mini-batch : Randomly select a portion of the training data; this portion is called a mini-batch. Feed the mini-batch into the network to obtain predictions, then compute the loss function from the predictions and the correct labels.
  2. Calculate the gradient : To reduce the value of the loss function on the mini-batch, compute the gradient of each weight parameter. The gradient points in the direction in which the value of the loss function decreases the most.
  3. Update parameters : Update the weight parameters by a small step along the gradient direction.
  4. Repeat : Repeat steps 1-3. (The sketch after this list shows how these steps map to code.)
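
The full script in Section 2 implements exactly these four steps; its core training loop looks like the following sketch (the network, x_train, t_train and hyperparameter names come from that script, so this excerpt is not meant to run on its own):

for i in range(iters_num):
    # Step 1: randomly draw a mini-batch of batch_size samples
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    # Step 2: compute the gradient of the loss on this mini-batch
    grad = network.numerical_gradient(x_batch, t_batch)

    # Step 3: nudge every parameter a small step against its gradient
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]
    # Step 4: the loop itself repeats steps 1-3 iters_num times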

(2) Gradient and gradient descent

Gradient : The vector of the partial derivatives with respect to all of the variables is called the gradient. At each point, the gradient points in the direction in which the value of the function decreases the most.
Gradient method : A method that repeatedly takes a small step along the gradient direction to gradually reduce the value of the function. The gradient ascent method is the gradient method used to find a maximum; the gradient descent method is the gradient method used to find a minimum.
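
As a concrete illustration (written from scratch for this sketch, not the book's exact implementation), a minimal numerical gradient based on central differences and a plain gradient descent loop could look like this:

import numpy as np

def numerical_gradient(f, x):
    h = 1e-4  # small step for the central difference
    grad = np.zeros_like(x)
    for idx in range(x.size):
        tmp = x.flat[idx]
        x.flat[idx] = tmp + h
        fxh1 = f(x)                      # f(x + h)
        x.flat[idx] = tmp - h
        fxh2 = f(x)                      # f(x - h)
        grad.flat[idx] = (fxh1 - fxh2) / (2 * h)
        x.flat[idx] = tmp                # restore the original value
    return grad

def gradient_descent(f, init_x, lr=0.01, step_num=100):
    x = init_x
    for _ in range(step_num):
        x -= lr * numerical_gradient(f, x)   # move against the gradient
    return x

# Minimizing f(x) = x0^2 + x1^2 from (3, -4) ends up very close to the minimum (0, 0)
print(gradient_descent(lambda x: np.sum(x ** 2), np.array([3.0, -4.0]), lr=0.1, step_num=100))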

(3) Loss function

Loss function : The indicator used in neural network learning; it expresses the extent to which the current network fails to fit the supervised (training) data. Commonly used loss functions are the mean squared error and the cross-entropy error.
Mean squared error:

E = (1/2) * Σ_k (y_k - t_k)^2

where y_k is the output of the neural network, t_k is the supervised data (the correct label), and k is the dimensionality of the data.
Code:

def mean_squared_error(y, t):
	return 0.5 * np.sum((y-t)**2)
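
For example, with a one-hot label for the digit "2" and a softmax-like output (assuming numpy has been imported as np), the error is small when the network assigns the highest probability to the correct class:

t = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])                        # correct label is "2" (one-hot)
y = np.array([0.1, 0.05, 0.6, 0.0, 0.05, 0.1, 0.0, 0.1, 0.0, 0.0])  # network output (probabilities)
print(mean_squared_error(y, t))                                      # 0.0975 -> fairly confident, small error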

Cross-entropy error:

E = -Σ_k t_k * log(y_k)

where y_k is the output of the neural network (a probability, such as the output of sigmoid or softmax) and t_k is the correct label, represented as a one-hot vector.
Code implementation:

def cross_entropy_error(y, t):
    delta = 1e-7  # added to y so that log(0) never occurs
    return -np.sum(t * np.log(y + delta))
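
For mini-batch learning, the cross-entropy is usually averaged over the batch. Below is a sketch of a batch-compatible version (the name cross_entropy_error_batch is only for this illustration; the full code in Section 2 uses the same idea under the original name):

def cross_entropy_error_batch(y, t):
    # Reshape a single sample into a mini-batch of one
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)
    batch_size = y.shape[0]
    # Average the per-sample cross-entropy; 1e-7 avoids log(0)
    return -np.sum(t * np.log(y + 1e-7)) / batch_size

# Reusing y and t from the mean squared error example above: about 0.51 (= -log 0.6)
print(cross_entropy_error_batch(y, t))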

(4) epoch and iters_num

Epoch : An epoch is a unit: one epoch corresponds to the number of updates needed for all of the training data to be used once. For example, with 10,000 training samples and a mini-batch of 100, repeating stochastic gradient descent 100 times means every training sample has been seen once, so in this example one epoch is 100 iterations.
iters_num : The number of iterations of the gradient method, i.e. how many times a mini-batch is randomly drawn and the parameters are updated. (The book runs this case with iters_num = 10000; the code below uses 500 because the numerical gradient is very slow.)
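
A tiny calculation of the epoch/iteration relationship, using the numbers from the epoch example above:

train_size = 10000   # training samples in the example above
batch_size = 100     # samples per mini-batch
iter_per_epoch = max(train_size / batch_size, 1)
print(iter_per_epoch)   # 100.0 -> one epoch corresponds to 100 mini-batch updates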

(5) The neural network structure of this case

This case uses a two-layer neural network. The network structure is roughly as follows (the corresponding parameter shapes are sketched right after the list):

Input layer: 784 neurons.
Hidden layer: 50 neurons.
Output layer: 10 neurons.
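
With this structure, the parameters created in the TwoLayerNet class below have the following shapes (a sketch assuming the same weight_init_std = 0.01 Gaussian initialization used in the full code):

import numpy as np

input_size, hidden_size, output_size = 784, 50, 10
W1 = 0.01 * np.random.randn(input_size, hidden_size)   # shape (784, 50)
b1 = np.zeros(hidden_size)                              # shape (50,)
W2 = 0.01 * np.random.randn(hidden_size, output_size)   # shape (50, 10)
b2 = np.zeros(output_size)                              # shape (10,)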


2. Full code

import sys, os

sys.path.append(os.pardir)
import numpy as np
import matplotlib.pyplot as plt
from common.functions import *
from common.gradient import numerical_gradient
from dataset.mnist import load_mnist


def cross_entropy_error(y, t):
    # Reshape a single sample into a mini-batch of one so both cases share the same code
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)
    batch_size = y.shape[0]
    # Average the cross-entropy over the batch; 1e-7 avoids log(0)
    return -np.sum(t * np.log(y + 1e-7)) / batch_size


class TwoLayerNet:
    def __init__(self, input_size, hidden_size, output_size, weight_init_std=0.01):
        # Initialize weights with small Gaussian values and biases with zeros
        self.params = {}
        self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size)
        self.params['b2'] = np.zeros(output_size)

    def sigmoid(self, a):
        return 1 / (1 + np.exp(-a))

    def softmax(self, a):
        # Subtract the row-wise maximum for numerical stability, then normalize per sample
        a = a - np.max(a, axis=-1, keepdims=True)
        exp_a = np.exp(a)
        return exp_a / np.sum(exp_a, axis=-1, keepdims=True)

    def predict(self, x):
        W1, W2 = self.params['W1'], self.params['W2']
        b1, b2 = self.params['b1'], self.params['b2']
        a1 = np.dot(x, W1) + b1
        z1 = self.sigmoid(a1)
        a2 = np.dot(z1, W2) + b2
        z2 = self.softmax(a2)
        return z2

    # x: input data, t: one-hot labels
    def loss(self, x, t):
        y = self.predict(x)
        return cross_entropy_error(y, t)  # cross-entropy loss

    def accuracy(self, x, t):
        y = self.predict(x)
        y = np.argmax(y, axis=1)
        t = np.argmax(t, axis=1)
        accuracy = np.sum(y == t) / float(x.shape[0])
        return accuracy

    def numerical_gradient(self, x, t):
        # Loss as a function of the weights (W itself is unused; self.params is read inside loss).
        # The numerical_gradient called below is the function imported from common.gradient.
        loss_W = lambda W: self.loss(x, t)
        grads = {}
        grads['W1'] = numerical_gradient(loss_W, self.params['W1'])
        grads['W2'] = numerical_gradient(loss_W, self.params['W2'])
        grads['b1'] = numerical_gradient(loss_W, self.params['b1'])
        grads['b2'] = numerical_gradient(loss_W, self.params['b2'])
        return grads

# Load the MNIST data (normalized pixels, one-hot labels)
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)
train_loss_list = []
train_acc_list = []
test_acc_list = []

# Hyperparameters
iters_num = 500
train_size = x_train.shape[0]
batch_size = 100
learning_rate = 0.1
network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)
# Average number of iterations per epoch
iter_per_epoch = max(train_size / batch_size, 1)

for i in range(iters_num):
    # Get a mini-batch
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    # Calculate the gradient
    grad = network.numerical_gradient(x_batch, t_batch)

    # Update the parameters
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]

    # Record the learning progress
    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)

    # Compute the recognition accuracy once per epoch
    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        test_acc = network.accuracy(x_test, t_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)

# Plotting
m = np.arange(1, iters_num + 1)
print(train_loss_list)
print(train_acc_list)
print(test_acc_list)
# Plot the training loss curve
plt.subplot(221)
plt.plot(m, train_loss_list)
# Show the figure
plt.show()

Running results:
[Figure: training loss curve — x-axis: number of iterations (iters_num), y-axis: loss value]
