Pass 1: Implement forward propagation of common loss functions
mission details
The task of this level: implement the forward propagation of common loss functions.
related information
In order to complete this task, you need to master: the definition of common loss functions.
For the content of this training, please refer to chapters 4.1−4.2 in the book "Introduction to Deep Learning - Theory and Implementation Based on Python".
Neural Network Training
Neural networks already existed in the 1990s, so why did they suddenly re-enter people's field of vision after 2012 and show unprecedented dominance in computer vision, natural language processing, and other fields? The answer involves two factors. The first is the rapid development of computer hardware, which provides powerful computing support for deep learning. The second is data: with the rapid development of cloud computing and related technologies, the Internet provides massive amounts of data for deep learning. Data is the lifeblood of deep learning. Deep learning is driven by data, and data quality is among the most critical factors in determining the performance of a neural network model.
Given the data, how do we obtain a model that can solve the problem? This process is the training of the model. The purpose of training is to make the model recognize the patterns that exist in the data, which can be understood as the hidden common features behind the data. When training the model, we need to set a goal so that the output of the model is close to the result we want. The loss function measures how close the model's output is to our desired result. Different types of tasks use different loss functions. Below, we introduce the commonly used ones.
Definition of Common Loss Functions
1. Cross Entropy
In classification tasks, the network usually predicts the probability that the input sample belongs to each class, and our goal is for the correct class to receive the highest probability. This is the idea of maximum likelihood. Based on this idea, we obtain the Cross Entropy loss function. Its expression is:
$$E = -\sum_{i=1}^{C} q_i \log(p_i)$$
where $q_i$ is the one-hot encoding of the label category: $q_i = 1$ when the sample belongs to the $i$-th category and $q_i = 0$ otherwise; $C$ is the number of categories; and $p_i$ is the predicted probability of the $i$-th category, usually the output of softmax:
$$p_i = \frac{e^{x_i}}{\sum_{j=1}^{C} e^{x_j}}$$
where $x_i$ is the output of the network model. Softmax converts the outputs of the network into positive real numbers that sum to 1. A larger $x_i$ corresponds to a larger $p_i$, so the results can be interpreted as the probabilities of the categories.
When implementing cross-entropy, for the simplicity of derivation during backpropagation, it is usually implemented together with softmax, that is, for the output of the model, first do softmax once, and then calculate the cross-entropy loss. The advantages of backpropagation corresponding to this approach will be introduced in the next level.
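As a rough illustration of the two formulas above, the following sketch computes softmax and cross-entropy for a single sample. The helper names `softmax_1d` and `cross_entropy` and the input values are made up for this example and are not part of the training's code.

```python
import numpy as np

def softmax_1d(x):
    # Subtracting the maximum does not change the result,
    # but it keeps exp() from overflowing for large inputs.
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

def cross_entropy(p, q):
    # E = -sum_i q_i * log(p_i)
    return -np.sum(q * np.log(p))

x = np.array([-1.0, 0.0, 1.0])   # network outputs for one sample (illustrative)
q = np.array([0.0, 1.0, 0.0])    # one-hot label: the sample belongs to class 1

p = softmax_1d(x)
loss = cross_entropy(p, q)
print(p, p.sum(), loss)
```

Because `q` is one-hot, only the term of the correct class survives, so the loss reduces to `-log(p[1])`.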
2. Mean Squared Error
Regression problems are simpler than classification problems. In regression problems, our goal is to make the output of the network model as close as possible to the target. The most straightforward way to achieve this is to use the Mean Squared Error (MSE). The mean squared error is a commonly used loss function in regression problems, and its core idea is to minimize the square of the difference between the model's prediction and the target. Its expression is:
$$E = 0.5 \sum_{i=1}^{N} (y_i - t_i)^2$$
where $y_i$ is the output of the network model, $t_i$ is our desired target, and $N$ is the number of outputs.
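A minimal sketch of this formula for one sample, with illustrative values (the function name `mean_squared_error` is an assumption made for the example):

```python
import numpy as np

def mean_squared_error(y, t):
    # E = 0.5 * sum_i (y_i - t_i)^2
    return 0.5 * np.sum((y - t) ** 2)

y = np.array([-1.0, 0.0, 1.0])   # model outputs (illustrative)
t = np.zeros(3)                  # targets (illustrative)

loss = mean_squared_error(y, t)  # 0.5 * (1 + 0 + 1)
print(loss)
```

Note that, as written in the formula, the squared differences are summed and scaled by 0.5 rather than divided by $N$.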
Implementation of forward propagation of common loss functions
For the cross-entropy loss function, the training has pre-defined a SoftmaxWithLoss class, which is a composite implementation of softmax and cross-entropy. You need to implement the forward function of this class, forward(x, t). The input x of forward(x, t) is a 2-dimensional numpy.ndarray of shape (B, C), where B is the batch size and C is the number of categories; t holds the category to which each sample in the batch belongs and is an int-type numpy.ndarray of shape (B,). First, apply softmax to x along the second dimension. The training provides an implementation of the softmax function, which you can use directly. Then compute the cross-entropy loss. The final output is the average of the loss over all samples in the batch.
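Since t stores class indices rather than one-hot vectors, the probability of the correct class can be picked out with NumPy integer indexing. The sketch below, using made-up probabilities, suggests that this gives the same batch-averaged loss as the one-hot formula:

```python
import numpy as np

# Made-up softmax outputs of shape (B, C) and integer labels of shape (B,)
p = np.array([[0.1, 0.2, 0.7],
              [0.3, 0.5, 0.2]])
t = np.array([2, 1])
B, C = p.shape

# Pick p[0, 2] and p[1, 1], i.e. the probability of the correct class per sample
picked = p[np.arange(B), t]
loss_indexed = -np.mean(np.log(picked))

# Same quantity via an explicit one-hot encoding of t
q = np.eye(C)[t]
loss_one_hot = -np.sum(q * np.log(p)) / B

print(picked, loss_indexed, loss_one_hot)
```

The indexed form avoids building the one-hot matrix, which matters when C is large.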
For the mean squared error loss function, the training has pre-defined a MeanSquaredError class. You need to implement the forward function of this class, forward(y, t). The input y of forward(y, t) is a 2-dimensional numpy.ndarray of shape (B, N), where B is the batch size and N is the number of outputs; t is also a numpy.ndarray of shape (B, N) and holds the expected value of each output. You need to implement the mean squared error loss function. The final output is the sum of the loss over all samples in the batch.
For both loss functions, you need to record the most recent loss value in self.loss.
programming requirements
Following the prompts, add code between the Begin and End markers in the editor on the right to implement the loss functions above.
Test instruction
The platform will test the code you write as follows: it randomly generates an input x / y and a target t, creates an instance of the SoftmaxWithLoss / MeanSquaredError class from your implementation, and uses this instance to perform the forward propagation calculation. Your answers are then compared with the standard answers. Because floating-point calculations may introduce small errors, your answer is accepted as long as its error relative to the standard answer does not exceed $10^{-5}$.
Sample input:
# For the SoftmaxWithLoss loss function:
x:
[[-1 0 1]
 [-2 0 2]]
t:
[1, 2]
# Output loss
0.775
# For the MeanSquaredError loss function:
y:
[[-1 0 1]
 [-2 0 2]]
t:
[[0 0 0]
 [0 0 0]]
# Output loss
5.0
The results above are rounded, so small discrepancies can be ignored.
Now start your mission. Good luck!
Code:
import numpy as np


def softmax(x):
    x = x - np.max(x, axis=1, keepdims=True)
    return np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)


class SoftmaxWithLoss:
    def __init__(self):
        self.loss = None

    def forward(self, x, t):
        r'''
        Forward propagation of SoftMax + Cross Entropy
        Parameter:
        - x: numpy.array, (B, C)
        - t: numpy.array, (B)
        Return:
        - loss: float
        '''
        ########## Begin ##########
        y = softmax(x)
        batch_size = y.shape[0]
        # Add 1e-7 inside the log so that a probability of 0 does not produce -inf
        loss = -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size
        self.loss = loss
        return loss
        ########## End ##########


class MeanSquaredError:
    def __init__(self):
        self.loss = None

    def forward(self, y, t):
        r'''
        Forward propagation of Mean Squared Error
        Parameter:
        - y: numpy.array, (B, N)
        - t: numpy.array, (B, N)
        Return:
        - loss: float
        '''
        ########## Begin ##########
        loss = 0.5 * np.sum((y - t) ** 2)
        self.loss = loss
        return loss
        ########## End ##########
Pass 2: Implement backpropagation for common loss functions
mission details
The task of this level: implement the backpropagation of common loss functions.
related information
In order to complete this task, you need to master: the definition of common loss functions.
For the content of this training, please refer to chapters 4.1−4.2 in the book "Introduction to Deep Learning - Theory and Implementation Based on Python".
Gradient Descent
From the previous level, we know that the role of the loss function is to measure the gap between the output of the model and our expectation. The smaller the value of the loss function, the more likely we are to obtain a model with better performance. So how do we change the values of the parameters in the model so that the value of the loss function keeps getting smaller? Here we introduce the concept of the gradient. The gradient points in the direction in which the function value increases fastest, so its negative is the direction in which the function value decreases fastest. If we can find the gradient $\partial l / \partial w$ of each parameter, then we can move all parameters a small step along their negative gradient direction and obtain a new set of parameters. This is the basic idea of the gradient descent method, and the size of this small step is controlled by the learning rate $\eta$. The parameter update can be expressed by the following formula:
$$w_i' = w_i - \eta \cdot \frac{\partial l}{\partial w_i}$$
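To make the update rule concrete, here is a small sketch that applies it to a toy one-parameter loss $l(w) = (w - 3)^2$. The loss, the starting point, and the learning rate are all chosen only for illustration.

```python
def grad(w):
    # Derivative of the toy loss l(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w = 0.0     # initial parameter value (arbitrary)
eta = 0.1   # learning rate (arbitrary)

for _ in range(100):
    # w' = w - eta * dl/dw
    w = w - eta * grad(w)

print(w)
```

Each step shrinks the distance to the minimum at w = 3 by a constant factor, so w approaches 3 after enough iterations.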
The question now is how to find the gradient of each parameter. This process is called backpropagation (backprop) of the neural network. Because a neural network has a stacked structure, finding the gradient becomes a little more complicated. But thanks to this same stacked structure, the gradient can be computed step by step, following the stacking order. The core idea of backpropagation is the chain rule of differentiation, namely:
$$\frac{\partial l}{\partial x} = \frac{\partial l}{\partial f(x)} \cdot \frac{\partial f(x)}{\partial x}$$
In this way, the process of obtaining the gradient of the neural network becomes the process of obtaining the gradient of each layer. In this exercise, we focus on the first term of the above formula, which is the backpropagation of the loss function itself.
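The chain rule can be checked numerically on a toy example. The sketch below assumes a hypothetical layer $f(x) = 3x + 1$ followed by a hypothetical loss $l(u) = u^2$, and compares the chain-rule product with a finite-difference estimate.

```python
def f(x):
    # Hypothetical "layer"
    return 3.0 * x + 1.0

def loss(u):
    # Hypothetical loss applied to the layer output
    return u ** 2

x = 2.0

dl_df = 2.0 * f(x)       # derivative of the loss with respect to f(x)
df_dx = 3.0              # derivative of f with respect to x
dl_dx = dl_df * df_dx    # chain rule

# Central finite difference as an independent check
eps = 1e-6
numeric = (loss(f(x + eps)) - loss(f(x - eps))) / (2 * eps)

print(dl_dx, numeric)
```

The first factor, `dl_df`, plays the role of the loss function's own backpropagation, which is the part this exercise focuses on.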
Backpropagation of common loss functions
1. Cross Entropy
In the previous level, we introduced the functional expression of the cross-entropy loss function as:
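$$E = -\sum_{i=1}^{C} t_i \log(y_i)$$
If softmax is combined, then we have:
$$y_i = \frac{e^{x_i}}{\sum_{j=1}^{C} e^{x_j}}$$
$$E = -\sum_{i=1}^{C} t_i \log(y_i)$$
Writing the loss $E$ as $l$, its gradient with respect to $x_i$ is:
$$\frac{\partial l}{\partial x_i} = y_i - t_i$$
When $t$ is one-hot encoded, this simplifies for the correct category to:
$$\frac{\partial l}{\partial x_k} = y_k - 1$$
and to $\partial l / \partial x_i = y_i$ for every other category $i \neq k$,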
where $k$ is the index of the correct category. The derivation of the above formula is slightly cumbersome. Interested students can refer to this article.
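Without going through the derivation, the result can at least be sanity-checked numerically. The following sketch, for a single sample with illustrative values, compares $y_i - t_i$ with a central finite-difference estimate of the gradient; the helper names are assumptions made for the example.

```python
import numpy as np

def softmax_1d(x):
    e = np.exp(x - np.max(x))
    return e / np.sum(e)

def loss_fn(x, t):
    # Softmax followed by cross-entropy for one sample
    return -np.sum(t * np.log(softmax_1d(x)))

x = np.array([-1.0, 0.0, 1.0])   # network outputs (illustrative)
t = np.array([0.0, 1.0, 0.0])    # one-hot label, correct class k = 1

analytic = softmax_1d(x) - t     # y_i - t_i

eps = 1e-5
numeric = np.zeros_like(x)
for i in range(x.size):
    step = np.zeros_like(x)
    step[i] = eps
    numeric[i] = (loss_fn(x + step, t) - loss_fn(x - step, t)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(analytic, numeric, max_diff)
```

Only the entry of the correct class is negative (it equals $y_k - 1$), while the other entries equal $y_i$.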
2. Mean Squared Error
Compared with cross entropy, the backpropagation of mean square error is simpler. Recall the forward propagation formula for mean squared error:
$$E = 0.5 \sum_{i=1}^{N} (y_i - t_i)^2$$
We can directly write the formula for its backpropagation:
$$\frac{\partial l}{\partial y_i} = y_i - t_i$$
where $y_i$ is the output of the network model, $t_i$ is our desired target, and $N$ is the number of outputs.
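The same kind of numerical check works here. This sketch uses a batch of shape (B, N) with illustrative values and compares $y - t$ with a finite-difference estimate of the gradient of the summed loss.

```python
import numpy as np

def mse(y, t):
    # Summed (not averaged) squared error with the 0.5 factor
    return 0.5 * np.sum((y - t) ** 2)

y = np.array([[-1.0, 0.0, 1.0],
              [-2.0, 0.0, 2.0]])
t = np.zeros_like(y)

analytic = y - t

eps = 1e-5
numeric = np.zeros_like(y)
for idx in np.ndindex(*y.shape):
    step = np.zeros_like(y)
    step[idx] = eps
    numeric[idx] = (mse(y + step, t) - mse(y - step, t)) / (2 * eps)

max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)
```

Because the loss is a plain sum over the batch, no division by the batch size appears in the gradient. The 0.5 factor cancels the 2 produced by differentiating the square.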
Implementation of backpropagation of common loss functions
For the cross-entropy loss function, the training extends the SoftmaxWithLoss class defined in the previous pass, which is a composite implementation of softmax and cross-entropy. The training already provides the implementation of forward(x, t), modified to meet the needs of backpropagation. You need to implement the backpropagation function of this class, backward(). The backward() function takes no input. You need to calculate the gradient of the input x from the values of self.t and self.y recorded when forward(x, t) was called, and return it as the return value of backward().
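Because forward(x, t) returns the loss averaged over the batch, the gradient with respect to x carries an extra factor of 1/B on top of $y_i - t_i$. The sketch below is a standalone check under that assumption, not the class method itself. It uses the sample input from the test description and, for simplicity, omits the small constant added inside the log.

```python
import numpy as np

def softmax(x):
    x = x - np.max(x, axis=1, keepdims=True)
    return np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)

def batch_loss(x, t):
    # Cross-entropy averaged over the batch, with integer labels t
    B = x.shape[0]
    y = softmax(x)
    return -np.sum(np.log(y[np.arange(B), t])) / B

x = np.array([[-1.0, 0.0, 1.0],
              [-2.0, 0.0, 2.0]])
t = np.array([1, 2])
B = x.shape[0]

# Analytic gradient: (y - one_hot(t)) / B
dx = softmax(x)
dx[np.arange(B), t] -= 1
dx /= B

# Finite-difference check
eps = 1e-5
numeric = np.zeros_like(x)
for idx in np.ndindex(*x.shape):
    step = np.zeros_like(x)
    step[idx] = eps
    numeric[idx] = (batch_loss(x + step, t) - batch_loss(x - step, t)) / (2 * eps)

max_diff = np.max(np.abs(dx - numeric))
print(dx)
print(max_diff)
```

Each row of `dx` sums to zero, since the softmax probabilities sum to 1 and exactly 1 is subtracted per row.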
For the mean squared error loss function, the training extends the MeanSquaredError class defined in the previous pass. The training already provides the implementation of forward(y, t), modified to meet the needs of backpropagation. You need to implement the backpropagation function of this class, backward(). The backward() function takes no input. You need to calculate the gradient of the input y from the values of self.t and self.y recorded when forward(y, t) was called, and return it as the return value of backward().
programming requirements
Following the prompts, add code between the Begin and End markers in the editor on the right to implement the backpropagation of the loss functions above.
Test instruction
The platform will test the code you write as follows: it randomly generates an input x / y and a target t, creates an instance of the SoftmaxWithLoss / MeanSquaredError class from your implementation, and uses this instance to perform the forward propagation and backpropagation calculations. Your answers are then compared with the standard answers. Because floating-point calculations may introduce small errors, your answer is accepted as long as its error relative to the standard answer does not exceed $10^{-5}$.
Sample input:
# For the SoftmaxWithLoss loss function:
x:
[[-1 0 1]
 [-2 0 2]]
t:
[1, 2]
# Gradient of x
[[ 0.04501529 -0.37763578 0.33262047]
 [ 0.00793812 0.05865521 -0.06659332]]
# For the MeanSquaredError loss function:
y:
[[-1 0 1]
 [-2 0 2]]
t:
[[0 0 0]
 [0 0 0]]
# Gradient of y
[[-1 0 1]
 [-2 0 2]]
The results above are rounded, so small discrepancies can be ignored.
Now start your mission. Good luck!
Code:
import numpy as np


def softmax(x):
    x = x - np.max(x, axis=1, keepdims=True)
    return np.exp(x) / np.sum(np.exp(x), axis=1, keepdims=True)


class SoftmaxWithLoss:
    def __init__(self):
        self.loss = None
        self.y = None
        self.t = None

    def forward(self, x, t):
        r'''
        Forward propagation of SoftMax + Cross Entropy
        Parameter:
        - x: numpy.array, (B, C)
        - t: numpy.array, (B)
        Return:
        - loss: float
        '''
        y = softmax(x)
        batch_size = y.shape[0]
        loss = -np.sum(np.log(y[np.arange(batch_size), t] + 1e-7)) / batch_size
        self.loss = loss
        self.y = y
        self.t = t
        return loss

    def backward(self):
        r'''
        Backpropagation of SoftMax + Cross Entropy
        Return:
        - dx: numpy.array, (B, C)
        '''
        ########## Begin ##########
        batch_size = self.t.shape[0]
        # Gradient is (y - one_hot(t)) / B because forward averages over the batch
        dx = self.y.copy()
        dx[np.arange(batch_size), self.t] -= 1
        dx = dx / batch_size
        return dx
        ########## End ##########


class MeanSquaredError:
    def __init__(self):
        self.loss = None
        self.y = None
        self.t = None

    def forward(self, y, t):
        r'''
        Forward propagation of Mean Squared Error
        Parameter:
        - y: numpy.array, (B, N)
        - t: numpy.array, (B, N)
        Return:
        - loss: float
        '''
        loss = 0.5 * np.sum((y - t) ** 2)
        self.loss = loss
        self.y = y
        self.t = t
        return loss

    def backward(self):
        r'''
        Backpropagation of Mean Squared Error
        Return:
        - dy: numpy.array, (B, N)
        '''
        ########## Begin ##########
        # forward sums over the batch, so no division by the batch size here
        y_grad = self.y - self.t
        return y_grad
        ########## End ##########