Quick Learning in One Article - Let Neural Networks No Longer Be Mysterious, Learn the Basics of Neural Networks in One Day (6) - Backpropagation Based on Numerical Differentiation


Foreword

I debated for a long time whether to publish deep learning content, since more than half of the machine learning material in the mathematical modeling column is still unfinished. In the end I decided to put out a series of articles on neural networks; otherwise, when neural networks or more advanced models built on them (such as LSTMs for time series prediction) come up in mathematical modeling competitions, it would be hard to explain both how to use them and why they work. Deep learning is not easy to master: it involves a great deal of mathematical theory and many formulas that need to be derived, and without hands-on practice it is hard to see what the code we write corresponds to inside a neural network framework. I will do my best to simplify the material and relate it to things we are already familiar with, avoiding excessive mathematical formulas and overly academic theory, so that everyone can follow the framework smoothly, keep up with each derivation, and understand and implement the algorithm quickly and efficiently.

Although many competitions do not restrict which algorithmic frameworks can be used, more and more award-winning teams rely on deep learning, and traditional machine learning methods are gradually losing ground. For example, in Problem C of the 2022 American college mathematical modeling contest, teams that used deep learning networks won awards at a very high rate. Artificial intelligence and data mining competitions keep multiplying, and with them the demand for neural network knowledge, so it is well worth mastering the various neural network algorithms.

The blogger has focused on modeling for four years and has taken part in dozens of mathematical modeling contests, large and small, and understands the principles behind the various models, the modeling process for each, and the different ways of analyzing problems. The goal of this column is to let readers with zero background quickly use the various mathematical models, machine learning and deep learning methods, and code; every article includes a practical project with runnable code. The blogger follows all kinds of modeling competitions, and for each one will write up the latest ideas and code in this column, with detailed reasoning and complete code. I hope readers who need it will not miss this carefully prepared column.

This article is the final and most critical chapter of the neural network foundations series. By now the basic structure of the whole network and the role of each node have been laid out, and I think fairly clearly and in detail. With the groundwork of the previous five articles we can already assemble a basic neural network, but the key step that contains the soul of the neural network, backpropagation, has not yet been described. In this article we will use the gradient descent optimization algorithm to train a neural network for handwritten digit recognition.


Backpropagation Based on Numerical Differentiation

Activation Functions

Following the steps we have learned, we first write the tools, that is, the activation functions. Our goal is clear: a handwritten digit recognition project, which in the end has to classify the recognized images, so the Softmax activation function naturally comes to mind. And since this is a deep network, the ReLU family is indispensable. Let us write these two tools first:

import numpy as np

# Activation function: ReLU
def _relu(in_data):
    return np.maximum(0, in_data)

# Activation function: Softmax
def _softmax(x):
    if x.ndim == 2:
        c = np.max(x, axis=1)
        x = x.T - c  # subtract the per-row max to avoid overflow
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        return y.T
    c = np.max(x)
    exp_x = np.exp(x - c)
    return exp_x / np.sum(exp_x)
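As a quick sanity check (a small illustration added here, not in the original post), the two helpers can be exercised on a toy array; note that the very large values in the second row do not overflow thanks to the max-subtraction trick:

# Illustrative check of _relu and _softmax
scores = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 1001.0, 1002.0]])   # large values would overflow a naive softmax
print(_relu(np.array([-1.0, 0.0, 3.0])))        # [0. 0. 3.]
probs = _softmax(scores)
print(probs)                                    # each row is a valid probability distribution
print(probs.sum(axis=1))                        # [1. 1.]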

Loss Function

The loss function was already touched on in an earlier article. For an image classification task like this, the cross-entropy loss (Cross Entropy Loss) is the usual choice. It measures the gap between the predicted values and the true labels in classification problems and is used very widely in deep learning; it performs especially well in multi-class problems such as image classification and natural language processing. If you are not yet comfortable with loss functions, I recommend reading this article of mine:

Detailed Explanation of Loss Function (Loss Function) - Python Code Implementation of Common Loss Functions for Classification Problems + Analysis of Calculation Principles

Here is the code:

# Loss function: cross-entropy error
def cross_entropy_error(p, y):
    delta = 1e-7  # small constant to avoid log(0)
    batch_size = p.shape[0]
    return -np.sum(y * np.log(p + delta)) / batch_size

The implementation is quite simple.
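A quick illustration (added here, not from the original) of how the function behaves on one-hot labels: a confident correct prediction yields a small loss, a confident wrong one a large loss.

# Illustrative check: p holds predicted probabilities, y holds one-hot labels
y = np.array([[0, 0, 1], [0, 0, 1]])            # both samples belong to class 2
p_good = np.array([[0.05, 0.05, 0.90],
                   [0.10, 0.10, 0.80]])
p_bad = np.array([[0.80, 0.10, 0.10],
                  [0.70, 0.20, 0.10]])
print(cross_entropy_error(p_good, y))   # small loss, roughly 0.16
print(cross_entropy_error(p_bad, y))    # much larger loss, roughly 2.3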

Numerical Differentiation

The concept of numerical differentiation was not described in detail before, so let me add it here. Numerical differentiation is a numerical method for approximating derivatives (slopes). A derivative is the rate of change of a function at a point; it tells us how strongly the function's output responds to changes in the input at that point. Numerical differentiation estimates the derivative from the function's values in a small neighborhood of the point. Two methods are commonly used:

Forward Difference: the forward difference estimates the derivative from the difference between the function's value at a point x and its value at the nearby point x+h. The forward-difference formula is:

f'(x)\approx \frac{f(x+h)-f(x)}{h}

where h is a small positive number called the differentiation step size. Choosing different step sizes yields estimates of different accuracy.

Central Difference: the central difference estimates the derivative from the difference between the function's values at the points x-\frac{h}{2} and x+\frac{h}{2}. The central-difference formula is:

f'(x)\approx \frac{f(x+\frac{h}{2})-f(x-\frac{h}{2})}{h}

The central difference is generally more accurate than the forward difference because it uses function values on both sides of the point x.

The main application of numerical differentiation is computing derivatives when no analytic expression for the derivative is available. It is widely used in numerical optimization, numerical integration, gradient computation in machine learning, and other numerical methods that involve derivatives.

Note that numerical differentiation is an approximation, and its accuracy depends on the differentiation step size. A smaller step generally gives a more accurate estimate, but may also introduce numerical stability problems, so in practice the step size has to be chosen according to the specific problem and the available precision.
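To make the accuracy comparison concrete, here is a tiny illustrative check (not from the original article): for f(x) = x^3 at x = 2, whose true derivative is 12, the central difference lands much closer than the forward difference at the same step size.

def f(x):
    return x ** 3          # true derivative at x = 2 is 3 * 2**2 = 12

h = 1e-4
x0 = 2.0
forward = (f(x0 + h) - f(x0)) / h               # forward difference
central = (f(x0 + h / 2) - f(x0 - h / 2)) / h   # central difference
print(forward, abs(forward - 12))   # error about 6e-4
print(central, abs(central - 12))   # error about 2.5e-9 (before floating-point round-off)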

In Python, the gradient (the vector of partial derivatives) can be computed with central differences as follows:

def numerical_gradient(f, x):
    h = 1e-4  # 0.0001
    grad = np.zeros_like(x)
    # iterate over every element of x, perturbing one element at a time
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        tmp_val = x[idx]
        x[idx] = float(tmp_val) + h
        fxh1 = f(x)  # f(x+h)

        x[idx] = tmp_val - h
        fxh2 = f(x)  # f(x-h)
        grad[idx] = (fxh1 - fxh2) / (2 * h)

        x[idx] = tmp_val  # restore the original value
        it.iternext()

    return grad
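As a quick sanity check of this routine (an illustration added here, not in the original), we can compare its output with an analytically known gradient, for example f(x) = x_0^2 + x_1^2, whose gradient is (2x_0, 2x_1):

def sum_of_squares(x):
    return np.sum(x ** 2)   # analytic gradient is 2 * x

x = np.array([3.0, 4.0])
print(numerical_gradient(sum_of_squares, x))   # approximately [6. 8.]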

Defining the Neural Network

Next we build our own neural network, following the same steps so that no key point is forgotten. The network-building process was described in detail in the earlier articles, including the one on forward propagation; if you have forgotten, I recommend going back and rereading them, since memorizing the structure and construction of a neural network takes repetition. First we initialize the network:

class TwoLayerNet:
    def __init__(self, input_size, hidden_size, output_size, weight_init_std=0.01):
        # initialize weights and biases
        self.params = {}
        self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size)
        self.params['b2'] = np.zeros(output_size)

This sets up the weights and the hidden layer: W1 has shape (input_size, hidden_size) and W2 has shape (hidden_size, output_size). Next comes forward propagation to complete the network's functionality.
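A quick illustrative shape check for the MNIST-sized network used later (784 inputs, 50 hidden units, 10 output classes):

net = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)
for key, value in net.params.items():
    print(key, value.shape)   # W1 (784, 50), b1 (50,), W2 (50, 10), b2 (10,)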

Forward Propagation

It is recommended to reread the forward propagation article as a refresher: Quick Learning in One Article - Let Neural Networks No Longer Be Mysterious, Learn the Basics of Neural Networks in One Day - Forward Propagation (3)

Without repeating the details, the implementation is:

    def predict(self, x):
        W1, W2 = self.params['W1'], self.params['W2']
        b1, b2 = self.params['b1'], self.params['b2']

        a1 = np.dot(x, W1) + b1   # input layer -> hidden layer
        z1 = _relu(a1)
        a2 = np.dot(z1, W2) + b2  # hidden layer -> output layer
        p = _softmax(a2)
        return p

Now let us compute the loss. The prediction is produced by forward propagation, and then cross_entropy_error is called to obtain the loss value Loss. Our goal is to make Loss decrease continuously:

    # x: input data, y: supervision (label) data
    def loss(self, x, y):
        p = self.predict(x)

        return cross_entropy_error(p, y)

Then, based on the loss value, we compute gradients and update the parameters to search for the optimal set of weights; the goal is very clear.

Optimization

What we use is the gradient descent optimization algorithm. First, the gradients of the loss with respect to every parameter are computed with the numerical_gradient routine defined above:

    # x: input data, y: supervision (label) data
    def numerical_gradient(self, x, y):
        # the lambda ignores W: numerical_gradient perturbs the parameter arrays in place,
        # so recomputing the loss picks up each perturbation
        loss_W = lambda W: self.loss(x, y)

        grads = {}
        grads['W1'] = numerical_gradient(loss_W, self.params['W1'])
        grads['b1'] = numerical_gradient(loss_W, self.params['b1'])
        grads['W2'] = numerical_gradient(loss_W, self.params['W2'])
        grads['b2'] = numerical_gradient(loss_W, self.params['b2'])

        return grads
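With the gradients in hand, the actual parameter update used in the training loop below is plain gradient descent, moving each parameter a small step against its gradient with learning rate \eta:

W \leftarrow W - \eta \frac{\partial L}{\partial W},\qquad b \leftarrow b - \eta \frac{\partial L}{\partial b}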

Model Evaluation

There are many ways to compute model metrics. For this kind of classification algorithm, three indicators are commonly used; I have introduced them in detail, together with very useful visualizations, in an earlier article that I recommend reading:

Detailed calculation of sklearn prediction evaluation indicators: accuracy (Accuracy), precision (Precision), recall (Recall), F1score

Here, for convenience, we simply use accuracy:

    def accuracy(self, x, t):
        p = self.predict(x)
        p = np.argmax(p, axis=1)   # predicted class per sample
        y = np.argmax(t, axis=1)   # true class per sample (from one-hot labels)

        accuracy = np.sum(p == y) / float(x.shape[0])
        return accuracy
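The argmax-and-compare logic is easy to see on a small made-up batch (illustrative values only):

p = np.array([[0.1, 0.8, 0.1],    # predicted class 1
              [0.7, 0.2, 0.1],    # predicted class 0
              [0.2, 0.3, 0.5]])   # predicted class 2
t = np.array([[0, 1, 0],          # true class 1 -> correct
              [0, 0, 1],          # true class 2 -> wrong
              [0, 0, 1]])         # true class 2 -> correct
print(np.sum(np.argmax(p, axis=1) == np.argmax(t, axis=1)) / float(p.shape[0]))  # 0.666...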

With the basic building blocks in place, we can now make the code run:

# hyperparameters
iters_num = 1000        # number of training iterations; adjust as appropriate
train_size = x_train.shape[0]
batch_size = 100
learning_rate = 0.001

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)
for i in range(iters_num):
    # draw a random mini-batch
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    y_batch = y_train[batch_mask]

    # compute gradients by numerical differentiation
    grad = network.numerical_gradient(x_batch, y_batch)

    # gradient descent update
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]

    # record the learning process
    loss = network.loss(x_batch, y_batch)
    print(loss)
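Note that the loop above assumes x_train and y_train already exist as flattened, normalized images with one-hot labels; the original post does not show that preparation step. One possible way to do it, assuming the Keras MNIST loader is available, is sketched below:

# Hypothetical data-preparation step (an assumption, not from the original article):
# load MNIST via Keras, flatten to 784-dim vectors, scale to [0, 1], one-hot encode labels
from tensorflow.keras.datasets import mnist

(x_train, y_train_labels), (x_test, y_test_labels) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0
y_train = np.eye(10)[y_train_labels]   # one-hot labels, shape (60000, 10)
y_test = np.eye(10)[y_test_labels]

# after training, accuracy can be checked on held-out data:
# print(network.accuracy(x_test, y_test))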

After training for a few dozen iterations here, the accuracy reaches about 67%.

This brings the article to a close. We have now built the framework of an entire neural network; there is still plenty to refine and fill in, which we will cover later. Having come this far, you are already able to build a basic neural network and can try it on other traditional machine learning projects to see how it performs. In the next chapter we will look at training a neural network over multiple epochs.



Origin: blog.csdn.net/master_hunter/article/details/132625461