Deep Learning - Loss Functions and Optimization


I just finished using ResNet-18 to implement CIFAR-10 classification. Looking back, I realized that my grasp of the loss function, backpropagation, and gradient descent is still weak, so I am studying these concepts again.

Loss

While training a model, we need to measure whether its parameters are good or bad, and the way to measure the quality of the parameters is to look at the loss of the output. For image classification, the model outputs a predicted label for each picture; the difference between this predicted label and the picture's true label is what we call the loss.

What is a loss function

The loss has to be computed by us, and the method for computing it is the loss function.
We use a function **Li()** to represent our way of measuring this difference for the *i*-th example; the loss for the entire dataset is then the average of the per-example losses:

$$L = \frac{1}{N}\sum_{i=1}^{N} L_i\big(f(x_i, W),\ y_i\big)$$

Where:
$x_i$ is the *i*-th image, $y_i$ is its true label, and $s = f(x_i, W)$ is the vector of class scores the model predicts.

Note that the "labels" being compared here are the classification scores the model assigns to each category, so they are numerical values rather than class names.

In practice, we generally add a regularization term and use the following formula for the loss function:

$$L = \frac{1}{N}\sum_{i=1}^{N} L_i\big(f(x_i, W),\ y_i\big) + \lambda R(W)$$

The regularization term $\lambda R(W)$ plays the following roles:
1. It ensures the generalization ability of the model.
2. It prevents the model from overfitting the training set.
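
As a minimal sketch of the formula above (my illustration, not code from the original post; `per_example_losses`, `W`, and `lam` are assumed names), the regularized loss with an L2 penalty $R(W) = \sum W^2$ could be computed like this:

```python
import numpy as np

def total_loss(per_example_losses, W, lam):
    """L = (1/N) * sum(L_i) + lam * R(W), with R(W) = sum of squared weights (L2)."""
    data_loss = np.mean(per_example_losses)  # (1/N) * sum over the dataset
    reg_loss = lam * np.sum(W ** 2)          # lambda * R(W)
    return data_loss + reg_loss
```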

Example: Multiclass SVM Loss

The SVM multi-class (hinge) loss for the *i*-th example is:

$$L_i = \sum_{j \neq y_i} \max\big(0,\ s_j - s_{y_i} + 1\big)$$

where $s_j$ is the predicted score for class $j$ and $s_{y_i}$ is the score for the true class. (Note: the margin term is $+1$, not $-1$.)

Calculation example: suppose the true class of an image is "cat" and the scores are cat $=3.2$, car $=5.1$, frog $=-1.7$. Then

$$L_i = \max(0,\ 5.1 - 3.2 + 1) + \max(0,\ -1.7 - 3.2 + 1) = 2.9 + 0 = 2.9$$
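
Here is a small NumPy sketch of this loss (illustrative, not from the original post), which reproduces the worked example above:

```python
import numpy as np

def svm_loss(scores, y_true, margin=1.0):
    """Multiclass SVM (hinge) loss for one example.

    scores: 1-D array of class scores s_j
    y_true: index of the true class y_i
    """
    margins = np.maximum(0.0, scores - scores[y_true] + margin)
    margins[y_true] = 0.0          # the j == y_i term is excluded from the sum
    return margins.sum()

# cat=3.2, car=5.1, frog=-1.7, true class = cat (index 0)
print(svm_loss(np.array([3.2, 5.1, -1.7]), y_true=0))  # -> 2.9
```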

There are many other loss functions; you can read about more of them in the following two blog posts:

https://xiongyiming.blog.csdn.net/article/details/99672818

https://mp.weixin.qq.com/s?__biz=MzA3MzI4MjgzMw==&mid=2650748392&idx=2&sn=1cc2080bad1cfee17e8f292256742c44&chksm=871

Optimization

Deep learning is, at its core, a computer optimizing its own parameters: during training, the network continuously adjusts its parameters so that the loss function keeps decreasing, which in turn improves accuracy.

How to optimize

In order to reduce the loss, we need to optimize, and there are many ways to do so:

- Gradient descent, Newton's method: Taylor expansion + minimization + update (a small sketch follows this list);
- Heuristic methods: ant colony, genetic algorithms, simulated annealing, particle swarm, etc.;
- Constrained optimization: the Lagrange multiplier method, etc.;
- Batch gradient descent and stochastic gradient descent: SGD, Adam, etc.

These are the broad categories covered in class; we focus only on gradient descent here. Methods such as ant colony, genetic algorithms, and simulated annealing are not discussed, and you can study them on your own.
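
To make the "Taylor + minimization + update" recipe concrete, here is a minimal sketch of plain gradient descent on a one-variable function (my illustration, not from the original post):

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient: w <- w - lr * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Minimize f(w) = (w - 3)**2, whose gradient is 2 * (w - 3).
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_min)  # converges to ~3.0
```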

In deep learning today, stochastic gradient descent is the most widely used of these methods.

Stochastic Gradient Descent

$$W \leftarrow W - \alpha\,\nabla_W L$$

Here, $W$ refers to the weights (parameters) of the network, $\alpha$ is the learning rate, and $\nabla_W L$ is the gradient of the loss with respect to $W$. In *stochastic* gradient descent, the gradient is estimated on a randomly sampled minibatch of training examples rather than on the whole dataset.

The simple intuition is that gradient descent repeatedly moves the weights a small step in the direction that reduces the loss; computing the gradient through the layers of the network relies on the chain rule of differentiation (backpropagation). This involves some mathematics that you can look up on Baidu.
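
As a hedged sketch of what one SGD step looks like in PyTorch (illustrative only; the tiny model, data shapes, and learning rate are placeholder assumptions, not the original post's code):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 3)                      # placeholder f(x, W): 10 features -> 3 class scores
criterion = nn.CrossEntropyLoss()             # a common classification loss L_i
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # alpha = 0.01

x = torch.randn(8, 10)                        # a minibatch of 8 examples
y = torch.randint(0, 3, (8,))                 # their true labels

optimizer.zero_grad()                         # clear old gradients
loss = criterion(model(x), y)                 # forward pass: compute the loss
loss.backward()                               # backprop: the chain rule computes dL/dW
optimizer.step()                              # SGD update: W <- W - alpha * dL/dW
```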

Original post: blog.csdn.net/scarecrow_sun/article/details/119698134