[Fundamentals of deep learning] A detailed explanation of the backpropagation (BP) algorithm with a practical demonstration (with source code)


The design of neural networks is inspired by biological neural networks. As shown in the figure, each node is a neuron and each connection indicates the direction in which information flows. Layer 1 is the input layer, Layers 2 and 3 are hidden layers, and Layer 4 is the output layer. The network transforms the input data into the desired output; in other words, a neural network is a mapping from the raw data to the desired data, and the BP algorithm is one way to learn such a mapping. The following concrete example demonstrates the BP algorithm step by step.

Suppose the network is as shown in the figure: the first layer has two neurons x1, x2 and a bias term c1; the second layer has two neurons y1, y2 and a bias term c2; the third layer is the output layer with two neurons h1, h2. Each edge carries the weight of the connection between two neurons (the concrete values are given in Figure 1-2), and the activation function σ is the Sigmoid function. The Sigmoid function and its derivative with respect to x are:

$$\sigma(x)=\frac{1}{1+e^{-x}}, \qquad \frac{d\sigma(x)}{dx}=\sigma(x)\big(1-\sigma(x)\big)$$

Input: x_1 = 0.05, x_2 = 0.1. Target: make the outputs h1, h2 as close as possible to [0.03, 0.05].
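As a quick reference, here is a minimal NumPy sketch of the sigmoid and its derivative (the function names are illustrative, not taken from the original post):

import numpy as np

def sigmoid(x):
    # sigmoid activation: 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # derivative of the sigmoid expressed through its own output
    s = sigmoid(x)
    return s * (1.0 - s)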

1: Forward propagation

input layer -> hidden layer

$$net_{y_j} = a_{1j}\,x_1 + a_{2j}\,x_2 + c_1, \qquad out_{y_j} = \sigma(net_{y_j}), \qquad j = 1, 2$$

where $a_{ij}$ is the weight on the edge from $x_i$ to $y_j$; substituting the weights from Figure 1-2 gives the hidden-layer outputs.

hidden layer -> output layer

$$net_{h_j} = b_{1j}\,out_{y_1} + b_{2j}\,out_{y_2} + c_2, \qquad out_{h_j} = \sigma(net_{h_j}), \qquad j = 1, 2$$

where $b_{ij}$ is the weight on the edge from $y_i$ to $h_j$.

This completes the forward pass. The output at this point is [0.694, 0.718], which is far from the expected output [0.03, 0.05]. Next, backpropagation is used to update the weights on each edge, and the output is then recomputed.
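To make the forward pass concrete, here is a small self-contained NumPy sketch of the same 2-2-2 structure. The concrete weight values from Figure 1-2 are not reproduced in the text, so the numbers below are placeholders; only the structure (weighted sum plus bias, then sigmoid) follows the example, and the printed output will not match [0.694, 0.718].

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([0.05, 0.1])             # inputs x1, x2
a = np.array([[0.2, 0.2],             # placeholder weights a_ij: input i -> hidden j
              [0.3, 0.4]])
b = np.array([[0.5, 0.6],             # placeholder weights b_ij: hidden i -> output j
              [0.7, 0.8]])
c1, c2 = 0.3, 0.6                     # placeholder bias terms

net_y = x @ a + c1                    # hidden-layer pre-activation
out_y = sigmoid(net_y)                # hidden-layer output
net_h = out_y @ b + c2                # output-layer pre-activation
out_h = sigmoid(net_h)                # network output [h1, h2]
print(out_h)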

2: Backpropagation

Calculate the total error, using the mean squared error as the total error function: target_out_{h_i} denotes the i-th target value, out_{h_i} the i-th predicted value, and N = 2.
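Written out with the values from the forward pass, this is

$$E_{total} = \frac{1}{N}\sum_{i=1}^{N}\big(target\_out_{h_i} - out_{h_i}\big)^2 = \frac{1}{2}\Big[(0.03 - 0.694)^2 + (0.05 - 0.718)^2\Big] \approx 0.4436$$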

Because every weight contributes to the error, we want to know how much each weight contributes; this is obtained by taking the partial derivative of the total error with respect to that weight. From the output layer to the hidden layer, five parameters need to be updated: b11, b12, b21, b22 and c2. Taking b22 as an example, the chain rule gives

$$\frac{\partial E_{total}}{\partial b_{22}} = \frac{\partial E_{total}}{\partial out_{h_2}} \cdot \frac{\partial out_{h_2}}{\partial net_{h_2}} \cdot \frac{\partial net_{h_2}}{\partial b_{22}}$$

with

$$\frac{\partial E_{total}}{\partial out_{h_2}} = out_{h_2} - target\_out_{h_2}, \qquad \frac{\partial out_{h_2}}{\partial net_{h_2}} = out_{h_2}\big(1 - out_{h_2}\big), \qquad \frac{\partial net_{h_2}}{\partial b_{22}} = out_{y_2}$$

and the weight is then updated by gradient descent:

$$b_{22}^{new} = b_{22} - \rho \cdot \frac{\partial E_{total}}{\partial b_{22}}$$

Here ρ is the learning rate, set to 0.5 in this example. In the same way, b_11^new = 0.458, b_12^new = 0.560, b_21^new = 0.658.

The bias term is handled in a similar way, except that the bias contributes to the error of every output neuron, so its gradient is computed for each neuron and then summed. Since the last factor in the chain rule, $\partial net_{h_i}/\partial c_2$, equals 1, the expression simplifies to

$$\frac{\partial E_{total}}{\partial c_2} = \sum_{i=1}^{2}\frac{\partial E_{total}}{\partial out_{h_i}} \cdot \frac{\partial out_{h_i}}{\partial net_{h_i}}, \qquad c_2^{new} = c_2 - \rho \cdot \frac{\partial E_{total}}{\partial c_2}$$

hidden layer -> input layer 

The method is similar to the output layer -> hidden layer step, with one difference. As shown in the figure, an output neuron such as h1 only produces the final output and feeds no further neuron, whereas the output of hidden neuron y1 is fed into both h1 and h2. Consequently, y1 receives error signals from both h1 and h2, and the contributions from both must be computed and summed.

From the hidden layer to the input layer, five parameters need to be updated: a11, a12, a21, a22 and c1. Taking a11 as an example, the chain rule gives

$$\frac{\partial E_{total}}{\partial a_{11}} = \left(\frac{\partial E_{h_1}}{\partial out_{y_1}} + \frac{\partial E_{h_2}}{\partial out_{y_1}}\right) \cdot \frac{\partial out_{y_1}}{\partial net_{y_1}} \cdot \frac{\partial net_{y_1}}{\partial a_{11}}$$

where $\partial net_{y_1}/\partial a_{11} = x_1$; the remaining partial derivatives already appeared in the output layer -> hidden layer weight updates.

 In the same way, a_12^new=0.199, a_21^new=0.298, a_22^new=0.398

The bias term c1 is computed in the same way as in the output layer -> hidden layer step, so the details are omitted here. Note, however, that the update of c1 involves y1, y2, h1 and h2; substituting the values gives c_1^new = 0.307.

At this point, all parameters have been updated.
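To tie the update rules together, below is a self-contained NumPy sketch of one full forward and backward pass for a toy 2-2-2 network of this shape. The weights are again placeholders (not the Figure 1-2 values), so the numerical results will differ from the worked example; the gradient formulas, however, are the chain-rule expressions derived above.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# placeholder parameters (the actual values come from Figure 1-2)
x = np.array([0.05, 0.1])
target = np.array([0.03, 0.05])
a = np.array([[0.2, 0.2], [0.3, 0.4]])    # a_ij: input i -> hidden j
b = np.array([[0.5, 0.6], [0.7, 0.8]])    # b_ij: hidden i -> output j
c1, c2 = 0.3, 0.6
rho = 0.5                                  # learning rate

# forward pass
out_y = sigmoid(x @ a + c1)
out_h = sigmoid(out_y @ b + c2)

# backward pass, with E = (1/N) * sum((target - out)^2) and N = 2
delta_out = (out_h - target) * out_h * (1.0 - out_h)       # dE/dnet_h
grad_b = np.outer(out_y, delta_out)                        # dE/db_ij = out_y_i * delta_out_j
grad_c2 = delta_out.sum()                                  # the bias collects the summed deltas

delta_hidden = (delta_out @ b.T) * out_y * (1.0 - out_y)   # error propagated back to the hidden layer
grad_a = np.outer(x, delta_hidden)                         # dE/da_ij = x_i * delta_hidden_j
grad_c1 = delta_hidden.sum()

# gradient-descent update
b -= rho * grad_b
c2 -= rho * grad_c2
a -= rho * grad_a
c1 -= rho * grad_c1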

Using the updated parameters, the new output is [0.667, 0.693] (the original output was [0.694, 0.718] and the target output is [0.03, 0.05]), and the new total error is 0.44356 (the original total error was 0.444).

With the updated weights, the output moves closer to the target and the total error decreases. As the number of iterations increases, the output converges toward the target values.

3: Implementing the backpropagation algorithm with NumPy

1: Import dataset

The dataset is generated with sklearn.datasets.make_moons() (see below), and sklearn is also used for data preprocessing. The dataset is visualized as follows.

 2: Preprocessing

Our model is trained with a gradient-descent optimizer. To help this kind of algorithm optimize the network, we standardize the dataset, using sklearn's library functions.
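A minimal, self-contained sketch of this preprocessing step with sklearn (the variable names follow the full code at the end of the post):

import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_moons(n_samples=2000, noise=0.4)
trainX, testX, trainY, testY = train_test_split(X, y, train_size=0.6)

standard = StandardScaler()
trainX = standard.fit_transform(trainX)   # fit the scaler on the training data only
testX = standard.transform(testX)         # apply the same mean/std to the test data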

Next, we build a simple two-layer neural network, as shown in the figure,

where X is a p×q matrix. The cross-entropy loss is chosen as the loss function of the network, the learning rate is 0.05, and the parameters are updated with gradient descent.
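For reference, one common form of the binary cross-entropy loss referred to here is

$$L = -\frac{1}{p}\sum_{i=1}^{p}\Big[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\Big], \qquad \hat{y}_i = \sigma(X_i W + B)$$

where p is the number of samples; the exact normalization used in the original code may differ.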

The weight matrix W and the bias B are initialized randomly. When the dimensionality of W and B is high, initializing them by hand is impractical. Moreover, if W and B are initialized to 0 (or to identical values), the gradients stay equal during gradient descent and the weights remain identical, so different hidden units end up computing the same function of their input; random initialization of the parameters breaks this symmetry.
At the same time, the initialization must be reasonable in scale, otherwise the gradients may vanish or explode.
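A minimal sketch of such a random initialization (the 0.01 scale factor is an illustrative choice, not taken from the original post):

import numpy as np

q = 2                               # number of input features in this example
# all-zero (or identical) initialization would give every unit the same gradient,
# so the units could never differentiate from each other during training
W = np.random.randn(q) * 0.01       # small random values break the symmetry
B = np.random.randn(1) * 0.01       # a small scale keeps the sigmoid away from
                                    # its saturated region (vanishing gradients)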

When implementing the sigmoid directly, np.exp(-x) overflows when x is a large negative number. We can use scipy.special.expit() to implement the sigmoid instead.
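A sketch of a sigmoid that avoids the overflow by delegating to scipy.special.expit:

import numpy as np
from scipy.special import expit

def sigmoid(x):
    # expit computes 1 / (1 + exp(-x)) in a numerically stable way,
    # so large negative inputs no longer overflow np.exp
    return expit(x)

print(sigmoid(np.array([-1000.0, 0.0, 1000.0])))   # [0.  0.5 1. ] with no warning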

With this, the basic framework is in place. Next we write the training loop, with the learning rate set to 0.05.
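The training loop below calls forward, loss_func and backword, whose definitions are not included in this excerpt. A minimal sketch of what they might look like for this single-layer sigmoid model with a cross-entropy loss (the original implementation may differ):

import numpy as np
from scipy.special import expit

learning_rate = 0.05

def forward(X, W, B):
    # predicted probability for each sample: sigmoid(X W + B)
    return expit(X @ W + B)

def loss_func(y, y_pred):
    # binary cross-entropy averaged over the samples
    eps = 1e-12                                  # avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y * np.log(y_pred) + (1 - y) * np.log(1 - y_pred))

def backword(W, B, y, y_pred, X, learning_rate):
    # gradient of the averaged cross-entropy w.r.t. W and B,
    # followed by an in-place gradient-descent step
    n = len(y)
    dW = X.T @ (y_pred - y) / n
    dB = np.mean(y_pred - y)
    W -= learning_rate * dW
    B -= learning_rate * dB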

3: Training and testing

 

The test set prediction results are as follows

We define a prediction function, run it on the test set with the code below, and visualize the predictions. The dataset is separated into two classes fairly clearly.
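The visualization code itself is not included in the listing below; a minimal sketch, assuming the trained W, B, the preprocessed testX, and the predict() function from the full code:

import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

pred = predict(testX, W, B).ravel()          # 0/1 class labels for the test set
plt.figure(figsize=(8, 8))
plt.scatter(testX[:, 0], testX[:, 1], c=pred,
            cmap=ListedColormap(['#B22222', '#87CEFA']), edgecolors='k')
plt.title('predicted classes on the test set')
plt.show()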

 

The error curve is shown below; the loss has essentially stabilized after about 300 iterations.

 

Finally, the code is as follows.

# import modules
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib.colors import ListedColormap
from sklearn.datasets import make_moons
X, y = make_moons(n_samples=2000, noise=0.4, random_state=None)
# visualize the dataset
plt.figure(figsize=(8, 8))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=ListedColormap(['#B22222', '#87CEFA']), edgecolors='k')
# 60% of the data for training, 40% for testing
from sklearn.model_selection import train_test_split
trainX, testX, trainY, testY = train_test_split(X, y, train_size=0.6)

# standardize the training set
from sklearn.preprocessing import StandardScaler
standard = StandardScaler()
trainX = standard.fit_transform(trainX)

# standardize the test set with the mean and standard deviation learned on the training set
testX = standard.transform(testX)

# define the neural network
# we use a simple model with only two parameters: W (weight matrix) and B (bias)
def _init_(q):
    # q: dimension of W (i.e., there are q input features)
    # randomly initialize W and B
    W = np.random.randn(q)
    B = np.random.randn(1)
    return W, B
def train(trainX, trainY, testX, testY, W, B, epochs, flag):
    # trainX: np.ndarray, training set, shape = (p, q)
    # trainY: np.ndarray, training labels, shape = (p, )
    # testX: np.ndarray, test set, shape = (m, q)
    # testY: np.ndarray, test labels, shape = (m, )
    # W: np.ndarray, weight matrix, shape = (q, )
    # B: np.ndarray, bias term, shape = (1, )
    # epochs: number of training iterations
    # flag: True -> print the loss values, False -> do not print them

    train_loss_list = []
    test_loss_list = []
    for i in range(epochs):
        # training set
        pred_train_Y = forward(trainX, W, B)
        train_loss = loss_func(trainY, pred_train_Y)
        
        # test set
        pred_test_Y = forward(testX, W, B)
        test_loss = loss_func(testY, pred_test_Y)
        
        if flag:
            print('the training loss of epoch %s : %s' % (i + 1, train_loss))
            print('the test loss of epoch %s : %s' % (i + 1, test_loss))
            print('=========================')
        
        train_loss_list.append(train_loss)
        test_loss_list.append(test_loss)
        
        # backpropagation: update W and B in place
        backword(W, B, trainY, pred_train_Y, trainX, learning_rate)
    return train_loss_list, test_loss_list
def predict(X, W, B):
    # X: np.ndarray, input data, shape = (n, m)
    # W: np.ndarray, weights, shape = (m, )
    # B: np.ndarray, bias b, shape = (1, )
    
    y_pred = forward(X,W,B)
    n = len(y_pred)
    prediction = np.zeros((n,1))
    for i in range(n):
        if y_pred[i] > 0.5:
            prediction[i,0] = 1
        else:
            prediction[i,0] = 0
    return prediction
def plot_loss(train_loss_list, test_loss_list):
    plt.figure(figsize = (10,8))
    plt.plot(train_loss_list, label='train_loss')
    plt.plot(test_loss_list, label='test_loss')
    
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.legend()
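
A possible way to wire the pieces together, assuming the preprocessed trainX/testX from above and the helper functions sketched earlier (the epoch count here is an arbitrary choice for illustration):

W, B = _init_(trainX.shape[1])
train_loss_list, test_loss_list = train(trainX, trainY, testX, testY, W, B,
                                        epochs=1000, flag=False)
plot_loss(train_loss_list, test_loss_list)

pred = predict(testX, W, B)
accuracy = np.mean(pred.ravel() == testY)
print('test accuracy: %.3f' % accuracy)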

