Educode: Implementing Common Loss Functions in Python

Pass 1: Implement forward propagation of common loss functions

Mission Details

The task of this level: implement the forward propagation of common loss functions.

Related Information

In order to complete this task, you need to master: the definition of common loss functions.

For the content of this training, please refer to chapters 4.1−4.2 in the book "Introduction to Deep Learning - Theory and Implementation Based on Python".

Neural Network Training

Neural networks had already appeared by the 1990s, so why did they suddenly re-enter the public eye after 2012 and show unprecedented dominance in computer vision, natural language processing, and other fields? The answer has two parts. The first is the rapid development of computer hardware, which provides powerful computing support for deep learning. The second is data: with the rapid growth of cloud computing and related technologies, the Internet provides massive amounts of data for deep learning. It can be said that data is the lifeblood of deep learning; deep learning is driven by data, and data quality is the most critical factor in determining the performance of a neural network model.

So, given the data, how do we obtain a model that can solve the problem? This process is model training. The purpose of training is to make the model recognize the patterns that exist in the data, which can be understood as the hidden common features behind the data. When training a model, we need to set a goal so that the model's output gets close to the result we want. The loss function is what measures how close the model's output is to that desired result. Different types of tasks use different loss functions; the commonly used ones are introduced below.

Definition of Common Loss Functions

1. Cross Entropy

In classification tasks, the network usually predicts the probability that the input sample belongs to each class, and our goal is for the correct class to have the highest probability. This is the idea of maximum likelihood. Based on this idea, we obtain the Cross Entropy loss function. Its expression is:

E = -\sum_{i=1}^{C} q_i \log(p_i)

where q_i is the one-hot encoding of the label: when the sample belongs to the i-th category, q_i = 1, otherwise q_i = 0; C is the number of categories; and p_i is the predicted probability of each category, usually the output of softmax:

p_i = \frac{e^{x_i}}{\sum_{j=1}^{C} e^{x_j}}

where x_i is the output of the network model. Softmax converts the network's outputs into positive real numbers that sum to 1, and a larger x_i corresponds to a larger p_i, so the results can be interpreted as the probabilities of the classes.

When implementing cross-entropy, for simplicity of the derivation during backpropagation, it is usually implemented together with softmax: the model's output first goes through softmax, and the cross-entropy loss is then computed on the result. The advantage of this approach for backpropagation will be introduced in the next level.
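As an illustration, here is a minimal numpy sketch of the combined softmax + cross-entropy computation for a single sample (the array values and variable names are only illustrative, not part of the training's interface):

import numpy as np

x = np.array([-1.0, 0.0, 1.0])   # raw network outputs (logits) for C = 3 classes
t = 1                            # index of the correct class

# softmax: shift by the maximum for numerical stability, then normalize
p = np.exp(x - x.max())
p = p / p.sum()                  # p is approximately [0.090, 0.245, 0.665] and sums to 1

# with a one-hot target, cross-entropy reduces to -log of the correct class's probability
loss = -np.log(p[t])             # approximately 1.408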

2. Mean Squared Error

Regression problems are simpler than classification problems. In a regression problem, our goal is to make the output of the network model as close as possible to the target. The most straightforward way to achieve this is the Mean Squared Error (MSE). The mean squared error is a commonly used loss function in regression problems, and its core idea is to minimize the square of the difference between the model's prediction and the target. Its expression is:

E = \frac{1}{2} \sum_{i=1}^{N} (y_i - t_i)^2

where y_i is the output of the network model, t_i is the desired target, and N is the number of outputs.
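As a quick illustration, a minimal numpy sketch of this formula (the numbers are made up for the example):

import numpy as np

y = np.array([0.8, 1.9, 3.2])        # model outputs
t = np.array([1.0, 2.0, 3.0])        # desired targets
loss = 0.5 * np.sum((y - t) ** 2)    # 0.5 * (0.04 + 0.01 + 0.04) ≈ 0.045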

Implementation of Forward Propagation of Common Loss Functions

For the cross-entropy loss function, the training pre-defines a SoftmaxWithLoss class, which is a composite implementation of softmax and cross-entropy. You need to implement the forward function of this class, forward(x, t). Its input x is a 2-dimensional numpy.ndarray with shape (B, C), where B is the batch size and C is the number of categories; t is the category each sample in the batch belongs to, an int numpy.ndarray with shape (B,). First, apply softmax to x along the second dimension; the training provides an implementation of the softmax function, which you can use directly. Then compute the cross-entropy loss. The final output is the average of the loss over all samples in the batch.

For the mean squared error loss function, the training pre-defines a MeanSquaredError class. You need to implement the forward function of this class, forward(y, t). Its input y is a 2-dimensional numpy.ndarray with shape (B, N), where B is the batch size and N is the number of outputs; t is also a numpy.ndarray with shape (B, N), representing the expected value of each output. You need to implement the mean squared error loss; the final output is the sum of the loss over all samples in the batch.

For the above two loss functions, you need to record the last loss value in self.loss.

Programming Requirements

Following the prompts, add your code between the Begin and End markers in the editor on the right to implement the loss functions described above.

Test Instructions

The platform will test the code you write. The test method is as follows: the platform randomly generates an input x / y and a target t, creates an instance of the SoftmaxWithLoss / MeanSquaredError class from your implementation, and then uses this instance to perform the forward propagation calculation. Your answer is compared with the standard answer. Because floating-point calculations may introduce errors, your answer is considered correct as long as it differs from the standard answer by no more than 10^-5.

Sample input:

 
 
# For the SoftmaxWithLoss loss function:
x:
[[-1 0 1]
 [-2 0 2]]
t:
[1, 2]
# output loss
0.775
# For the MeanSquaredError loss function:
y:
[[-1 0 1]
 [-2 0 2]]
t:
[[0 0 0]
 [0 0 0]]
# output loss
5.0

The above results have rounding errors, which you can ignore.


Let's start your mission, I wish you success!

Code:

import numpy as np


def softmax(x):
    # Subtract the per-row maximum before exponentiating for numerical stability
    x = x - np.max(x, axis=1, keepdims=True)
    return np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)


class SoftmaxWithLoss:
    def __init__(self):
        self.loss = None

    def forward(self, x, t):
        r'''
        Forward propagation of Softmax + Cross Entropy

        Parameters:
        - x: numpy.ndarray, (B, C)
        - t: numpy.ndarray, (B,)

        Return:
        - loss: float
        '''
        ########## Begin ##########
        y = softmax(x)
        batch_size = y.shape[0]
        # Add a small constant (1e-7) inside the log so that a near-zero
        # probability does not produce -inf
        loss = -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size
        self.loss = loss
        return loss
        ########## End ##########


class MeanSquaredError:
    def __init__(self):
        self.loss = None

    def forward(self, y, t):
        r'''
        Forward propagation of Mean Squared Error

        Parameters:
        - y: numpy.ndarray, (B, N)
        - t: numpy.ndarray, (B, N)

        Return:
        - loss: float
        '''
        ########## Begin ##########
        # Half of the summed squared differences over the whole batch
        loss = 0.5 * np.sum((y - t) ** 2)
        self.loss = loss
        return loss
        ########## End ##########
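As a quick sanity check (not part of the required submission), the classes above can be run on the sample input from the test instructions:

x = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0]])
t = np.array([1, 2])
print(SoftmaxWithLoss().forward(x, t))                   # approximately 0.775

y = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0]])
print(MeanSquaredError().forward(y, np.zeros((2, 3))))   # 5.0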


Pass 2: Implement backpropagation for common loss functions

Mission Details

The task of this level: implement the backpropagation of common loss functions.

Related Information

In order to complete this task, you need to master: the definition of common loss functions.

For the content of this training, please refer to chapters 4.1−4.2 in the book "Introduction to Deep Learning - Theory and Implementation Based on Python".

Gradient Descent

Through the previous level, we know that the loss function measures the gap between the model's output and our expectation. The smaller the value of the loss function, the more likely we are to obtain a model with better performance. So how do we change the values of the parameters in the model so that the value of the loss function keeps getting smaller? Here we introduce the concept of the gradient. The gradient is the direction in which the function value increases fastest; conversely, the negative gradient is the direction in which the function value decreases fastest. If we can find the gradient ∂l/∂w of each parameter, then we can move every parameter a small step along its negative gradient direction and obtain a new set of parameters. This is the basic idea of gradient descent, and the size of this small step is controlled by the learning rate η. The parameter update can be expressed by the following formula:

w_i' = w_i - \eta \cdot \frac{\partial l}{\partial w_i}
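As a minimal illustration of this update rule, here is a single gradient-descent step in numpy (the learning rate and gradient values are made up for the example):

import numpy as np

eta = 0.1                              # learning rate
w = np.array([0.5, -0.3, 0.8])         # current parameters
grad = np.array([0.2, -0.1, 0.4])      # gradient dl/dw obtained from backpropagation
w_new = w - eta * grad                 # one step against the gradient: [0.48, -0.29, 0.76]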

So the question now is: how do we find the gradient of each parameter? This process is called backpropagation (backprop). For a neural network, its stacked structure makes computing the gradient a little more complicated; but because of this same structure, the computation can also be carried out step by step in the stacking order. The core idea of backpropagation is the chain rule of differentiation:

\frac{\partial l}{\partial x} = \frac{\partial l}{\partial f(x)} \cdot \frac{\partial f(x)}{\partial x}

In this way, computing the gradient of the neural network becomes computing the gradient of each layer. In this exercise, we focus on the first term of the formula above, that is, the backpropagation of the loss function itself.

Backpropagation of Common Loss Functions

1. Cross Entropy

In the previous level, we introduced the functional expression of the cross-entropy loss function as:

E = -\sum_{i=1}^{C} t_i \log(y_i)

When combined with softmax, we have:

y_i = \frac{e^{x_i}}{\sum_{j=1}^{C} e^{x_j}}, \qquad E = -\sum_{i=1}^{C} t_i \log(y_i)

The gradient of the loss with respect to x_i corresponding to the formula above is:

\frac{\partial l}{\partial x_i} = y_i - t_i

Since t is one-hot encoded, this can be further simplified: for the incorrect classes the gradient is simply y_i, and for the correct class it becomes:

\frac{\partial l}{\partial x_k} = y_k - 1

where k is the index of the correct category. The derivation of this formula is slightly cumbersome; interested readers can refer to this article.
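If you would rather not work through the derivation, a numerical gradient check can confirm the result. The sketch below compares the analytical gradient (softmax output minus the one-hot target) with central finite differences for a single sample; it is an independent illustration and does not use the training's classes:

import numpy as np

def softmax_ce(x, t):
    # combined softmax + cross-entropy for one sample; t is the correct class index
    p = np.exp(x - x.max())
    p = p / p.sum()
    return -np.log(p[t]), p

x = np.array([-1.0, 0.0, 1.0])
t = 1
_, p = softmax_ce(x, t)

analytical = p.copy()
analytical[t] -= 1.0                       # y_i - t_i with a one-hot target

eps = 1e-6
numerical = np.zeros_like(x)
for i in range(x.size):
    x_plus, x_minus = x.copy(), x.copy()
    x_plus[i] += eps
    x_minus[i] -= eps
    numerical[i] = (softmax_ce(x_plus, t)[0] - softmax_ce(x_minus, t)[0]) / (2 * eps)

print(np.max(np.abs(analytical - numerical)))   # should be tiny, on the order of 1e-9 or less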

2. Mean Squared Error

Compared with cross-entropy, the backpropagation of the mean squared error is simpler. Recall its forward propagation formula:

E = \frac{1}{2} \sum_{i=1}^{N} (y_i - t_i)^2

We can directly write the formula for its backpropagation:

\frac{\partial l}{\partial y_i} = y_i - t_i

where y_i is the output of the network model, t_i is the desired target, and N is the number of outputs.
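Correspondingly, a one-line numpy check of this gradient (the numbers are made up for the example):

import numpy as np

y = np.array([0.8, 1.9, 3.2])
t = np.array([1.0, 2.0, 3.0])
grad = y - t      # [-0.2, -0.1, 0.2], the gradient of 0.5 * np.sum((y - t) ** 2) with respect to y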

Implementation of Backpropagation of Common Loss Functions

For the cross-entropy loss function, the training extends the SoftmaxWithLoss class defined in the previous pass, which is a composite implementation of softmax and cross-entropy. The implementation of forward(x, t) is provided and has been modified to meet the needs of backpropagation. You need to implement the backpropagation function of this class, backward(). backward() takes no input; you need to compute the gradient of the input x from the values of self.y and self.t recorded when forward(x, t) was called, and return it as the return value of backward().

For the mean squared error loss function, the training extends the MeanSquaredError class defined in the previous pass. The implementation of forward(y, t) is provided and has been modified to meet the needs of backpropagation. You need to implement the backpropagation function of this class, backward(). backward() takes no input; you need to compute the gradient of the input y from the values of self.y and self.t recorded when forward(y, t) was called, and return it as the return value of backward().

Programming Requirements

Following the prompts, add your code between the Begin and End markers in the editor on the right to implement the loss functions described above.

Test Instructions

The platform will test the code you write. The test method is as follows: the platform randomly generates an input x / y and a target t, creates an instance of the SoftmaxWithLoss / MeanSquaredError class from your implementation, and then uses this instance to perform the forward and backward propagation calculations. Your answer is compared with the standard answer. Because floating-point calculations may introduce errors, your answer is considered correct as long as it differs from the standard answer by no more than 10^-5.

Sample input:

 
 
# For the SoftmaxWithLoss loss function:
x:
[[-1 0 1]
 [-2 0 2]]
t:
[1, 2]
# gradient of x
[[ 0.04501529 -0.37763578  0.33262047]
 [ 0.00793812  0.05865521 -0.06659332]]
# For the MeanSquaredError loss function:
y:
[[-1 0 1]
 [-2 0 2]]
t:
[[0 0 0]
 [0 0 0]]
# gradient of y
[[-1 0 1]
 [-2 0 2]]

The above results have rounding errors, which you can ignore.


Let's start your mission, I wish you success!

Code:

import numpy as np


def softmax(x):
    # Subtract the per-row maximum before exponentiating for numerical stability
    x = x - np.max(x, axis=1, keepdims=True)
    return np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)


class SoftmaxWithLoss:
    def __init__(self):
        self.loss = None
        self.y = None
        self.t = None

    def forward(self, x, t):
        r'''
        Forward propagation of Softmax + Cross Entropy

        Parameters:
        - x: numpy.ndarray, (B, C)
        - t: numpy.ndarray, (B,)

        Return:
        - loss: float
        '''
        y = softmax(x)
        batch_size = y.shape[0]
        loss = -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size
        self.loss = loss
        self.y = y
        self.t = t
        return loss

    def backward(self):
        r'''
        Backpropagation of Softmax + Cross Entropy

        Return:
        - dx: numpy.ndarray, (B, C)
        '''
        ########## Begin ##########
        batch_size = self.t.shape[0]
        # dl/dx = (softmax output - one-hot target) / batch size:
        # subtract 1 at the correct class of each sample, then average over the batch
        dx = self.y.copy()
        dx[np.arange(batch_size), self.t] -= 1
        dx = dx / batch_size
        return dx
        ########## End ##########


class MeanSquaredError:
    def __init__(self):
        self.loss = None
        self.y = None
        self.t = None

    def forward(self, y, t):
        r'''
        Forward propagation of Mean Squared Error

        Parameters:
        - y: numpy.ndarray, (B, N)
        - t: numpy.ndarray, (B, N)

        Return:
        - loss: float
        '''
        loss = 0.5 * np.sum((y - t) ** 2)
        self.loss = loss
        self.y = y
        self.t = t
        return loss

    def backward(self):
        r'''
        Backpropagation of Mean Squared Error

        Return:
        - dy: numpy.ndarray, (B, N)
        '''
        ########## Begin ##########
        # dl/dy = y - t, the derivative of 0.5 * sum((y - t) ** 2)
        y_grad = self.y - self.t
        return y_grad
        ########## End ##########
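As in the previous pass, a quick check (not part of the required submission) against the sample input reproduces the gradients listed in the test instructions, assuming the implementation above:

x = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0]])
t = np.array([1, 2])
swl = SoftmaxWithLoss()
swl.forward(x, t)
print(swl.backward())
# close to the sample gradient [[ 0.04501529 -0.37763578  0.33262047]
#                              [ 0.00793812  0.05865521 -0.06659332]]
# (any differences are far below the 1e-5 tolerance)

y = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0]])
mse = MeanSquaredError()
mse.forward(y, np.zeros((2, 3)))
print(mse.backward())
# [[-1.  0.  1.]
#  [-2.  0.  2.]]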

