I just finished using ResNet-18 to implement CIFAR-10 classification. Looking back, I realized that my grasp of the loss function, backpropagation, and gradient descent is still shaky, so I am going to study these concepts again.
Loss
While training a model, we need a way to measure whether its parameters are good or bad, and that measure is the loss of its output. For image classification, the model outputs a predicted label for each picture during training; the difference between this predicted label and the picture's true label is what we call the loss.
What is a loss function
The loss has to be computed, and the method of computing it is the loss function.
We use **Li()** to denote the function that measures this difference for a single example; averaging it over all N training examples then gives the loss for the entire dataset:

    L = (1/N) * Σ_i Li(s_i, y_i)

Where:
s_i represents the scores the model predicts for example i, and y_i represents its true label.
The "label" here refers to the classification score an image receives for each category, so it is a numerical value.
In practice, we generally add a regularization term R(W) and use the following formula for the loss function:

    L = (1/N) * Σ_i Li(f(x_i, W), y_i) + λ * R(W)

The regularization term serves two purposes:
1. It improves the generalization ability of the model.
2. It prevents the model from overfitting the training set.
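As a small sketch of the data term plus regularization term, assume an L2 penalty R(W) = Σ W², with made-up per-example losses and a made-up weight matrix (the function name `regularized_loss` and the value of λ are illustrative, not from any particular library):

```python
import numpy as np

def regularized_loss(data_losses, W, lam=1e-3):
    """Average per-example loss plus an L2 penalty on the weights."""
    data_term = np.mean(data_losses)     # (1/N) * Σ_i Li
    reg_term = lam * np.sum(W ** 2)      # λ * R(W), with R(W) = Σ W²
    return data_term + reg_term

losses = np.array([1.0, 2.0, 3.0])       # made-up per-example losses
W = np.array([[0.5, -0.5], [1.0, 0.0]])  # made-up weight matrix
print(regularized_loss(losses, W, lam=0.1))  # 2.0 + 0.1 * 1.5 = 2.15
```

Note that the penalty depends only on the weights W, not on the data: it pushes the model toward smaller weights regardless of how well it fits the training set.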
Example: Multiclass SVM Loss
For an example with score vector s and true label y_i, the multiclass SVM (hinge) loss is:

    Li = Σ_{j ≠ y_i} max(0, s_j − s_{y_i} + 1)

Note: on the right side of the picture the term should read s_j + 1, with a plus sign, not a minus sign.
Calculation example:
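The original calculation example was an image, so here is a small numeric sketch instead, using made-up scores for a 3-class problem where class 0 is the true label:

```python
import numpy as np

def multiclass_svm_loss(scores, y):
    """Li = Σ over j ≠ y of max(0, s_j - s_y + 1)."""
    margins = np.maximum(0, scores - scores[y] + 1)
    margins[y] = 0               # the true class contributes no loss
    return margins.sum()

scores = np.array([3.2, 5.1, -1.7])  # made-up class scores
print(multiclass_svm_loss(scores, y=0))
# max(0, 5.1 - 3.2 + 1) + max(0, -1.7 - 3.2 + 1) = 2.9 + 0 = 2.9
```

Notice that a wrong class only adds loss when its score comes within the margin of 1 of the true class's score; classes scored far below the true class contribute nothing.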
There are many other loss functions; you can read about more of them in the following two blogs:
https://xiongyiming.blog.csdn.net/article/details/99672818
https://mp.weixin.qq.com/s?__biz=MzA3MzI4MjgzMw==&mid=2650748392&idx=2&sn=1cc2080bad1cfee17e8f292256742c44&chksm=871
Optimization
Deep learning means the computer optimizes its own parameters: during optimization, the network keeps adjusting its parameters so that the loss function keeps decreasing, thereby achieving better accuracy.
How to optimize
In order to reduce the loss, we need to optimize, and there are many ways to do so:
1. Gradient descent and Newton's method: Taylor expansion + minimization + update;
2. Heuristic methods: ant colony, genetic algorithms, simulated annealing, particle swarm, etc.;
3. Constrained optimization: the Lagrange multiplier method, etc.;
4. Batch and stochastic gradient descent: SGD, Adam, etc.

These are the general categories covered in class. Here we focus only on gradient descent; methods such as ant colony, genetic algorithms, and simulated annealing are not covered, and you can study them on your own.
In deep learning today, stochastic gradient descent is the most widely used method.
Stochastic Gradient Descent
The update rule is:

    W ← W − η * ∂L/∂W

where W refers to the weights (parameters) of the network and η is the learning rate; in stochastic gradient descent, the gradient is estimated on a random minibatch rather than on the whole dataset.
Put simply, gradient descent reduces the loss by moving each parameter a small step against its gradient. Computing those gradients through the network is backpropagation, which relies on the chain rule of differentiation; this involves some mathematics you can look up and study further on your own.
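To make the update rule concrete, here is a minimal gradient-descent sketch (not the ResNet training loop) that fits a single weight w in y = w * x by minimizing the mean squared error; the data, learning rate, and step count are all made up for illustration:

```python
import numpy as np

# Fit w in y = w * x by minimizing L(w) = mean((w*x - y)**2).
x = np.array([1.0, 2.0, 3.0])
y = 2.0 * x                              # true weight is 2.0

w = 0.0                                  # initial parameter
lr = 0.05                                # learning rate η
for _ in range(200):
    grad = np.mean(2 * (w * x - y) * x)  # dL/dw via the chain rule
    w -= lr * grad                       # W ← W − η * ∂L/∂W
print(round(w, 3))                       # converges toward 2.0
```

Each iteration is one application of the update rule above; in true stochastic gradient descent the gradient would be computed on a randomly sampled minibatch of (x, y) pairs instead of the full arrays.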