Softmax regression + loss functions + image classification dataset (Hands-on Deep Learning v2, PyTorch)

1. Softmax regression

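Softmax regression maps each input to a vector of class scores with a single linear layer, turns the scores into a probability distribution with the softmax function, and is trained by minimizing the cross-entropy between that distribution and the label. Below is a minimal PyTorch sketch of this setup; the input size (28x28 images), the number of classes (10), and the toy batch are my own assumptions for illustration, not taken from the course.

```python
import torch
from torch import nn

# Softmax turns arbitrary scores (logits) into a probability distribution:
# softmax(o)_i = exp(o_i) / sum_j exp(o_j)
def softmax(o):
    o_exp = torch.exp(o)
    return o_exp / o_exp.sum(dim=1, keepdim=True)

# Toy setup (assumed shapes): 28x28 images flattened to 784 features, 10 classes.
net = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
loss = nn.CrossEntropyLoss()          # log-softmax + negative log-likelihood
trainer = torch.optim.SGD(net.parameters(), lr=0.1)

X = torch.randn(32, 1, 28, 28)        # a fake batch of 32 images
y = torch.randint(0, 10, (32,))       # fake labels

logits = net(X)                       # shape (32, 10)
probs = softmax(logits)               # each row sums to 1
l = loss(logits, y)                   # cross-entropy expects the raw logits
l.backward()
trainer.step()
```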

2. Loss function


2.1 L2 Loss (mean squared error)

The L2 loss is l(y, y') = 0.5 * (y - y')^2. In the figure the true value is fixed at y = 0 and the prediction y' varies along the horizontal axis:

  • Blue line: the loss itself, the quadratic 0.5 * y'^2.
  • Green line: the likelihood e^-l, which is a Gaussian.
  • Orange line: the gradient of the loss, the linear function y' - y.

During gradient descent we update the parameters in the direction of the negative gradient, so the derivative determines the size of each step. When the true value y is far from the prediction y', the gradient y' - y is large and the parameters change a lot; as the prediction approaches the true value, the gradient becomes small and the updates shrink.

The disadvantage is that when the prediction is far from the true value, we do not necessarily want the gradient, and hence the parameter update, to be that large.
[Figure: L2 loss (blue), likelihood e^-l (green), and gradient (orange) as functions of y']
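To see this gradient behaviour concretely, we can let autograd differentiate the squared loss; the toy values below are my own. The gradient with respect to the prediction comes out as y' - y, so the farther the prediction is from the target, the larger the update.

```python
import torch

y = torch.tensor(0.0)                               # true value (the figure uses y = 0)
y_hat = torch.tensor([-3.0, -1.0, 0.5, 4.0], requires_grad=True)

l = (0.5 * (y_hat - y) ** 2).sum()                  # L2 loss: 0.5 * (y' - y)^2
l.backward()

print(y_hat.grad)    # gradient equals y' - y: a large error means a large step
```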

2.2 L1 Loss (absolute error)

The L1 loss is l(y, y') = |y - y'|. Again the true value is fixed at y = 0 and the prediction y' varies:

  • Blue line: the loss itself, the V-shaped |y'|.
  • Green line: the likelihood e^-l; it still peaks at 0, but the peak is much sharper than a Gaussian.
  • Orange line: the (sub)gradient of the loss, which stays at -1 or 1 no matter how far the prediction is from the true value.

If we do not want the size of the update to grow with the distance between prediction and true value, we can use the absolute value loss instead. Its derivative with respect to y' (still with y = 0) is:

  • when y' > 0, the derivative is 1;
  • when y' < 0, the derivative is -1;
  • at y' = 0 the loss is not differentiable; the subgradient can be any value in [-1, 1].

So the step size is constant no matter how far apart the true and predicted values are, which makes training more stable in the early stage. The drawback appears at the end of optimization: as y - y' approaches 0 the gradient does not shrink but jumps abruptly between -1 and 1, so it is hard to converge; this corresponds to the sharp peak of the green curve.

[Figure: L1 loss (blue), likelihood e^-l (green), and gradient (orange) as functions of y']
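The same autograd check for the absolute value loss (again with toy numbers of my own) shows that the gradient is just the sign of y' - y, i.e. ±1 regardless of how large the error is.

```python
import torch

y = torch.tensor(0.0)
y_hat = torch.tensor([-3.0, -1.0, 0.5, 4.0], requires_grad=True)

l = (y_hat - y).abs().sum()                         # L1 loss: |y' - y|
l.backward()

print(y_hat.grad)    # tensor of -1s and 1s: only the sign of the error matters
```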

2.3 Huber’s Robust Loss

Huber's robust loss combines the benefits of L2 loss and L1 loss:

l(y, y') = |y - y'| - 0.5 if |y - y'| > 1, and 0.5 * (y - y')^2 otherwise.

As before, y = 0 and the prediction y' varies:

  • Blue line: the loss itself; it is quadratic near 0 and linear once the error exceeds 1, and the two pieces join smoothly.
  • Green line: the likelihood e^-l, which resembles a Gaussian and is smooth at 0, without the sharp peak of the L1 case.
  • Orange line: the gradient of the loss; when the prediction is far from the true value it is a constant ±1 (as with L1 loss), and when the prediction is close it is linear in the error (as with L2 loss), so it shrinks smoothly toward 0 as the optimum is approached.

[Figure: Huber's robust loss (blue), likelihood e^-l (green), and gradient (orange) as functions of y']
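The piecewise definition above is easy to write with torch.where; the sketch below fixes the threshold at 1 as in the lecture formula (recent PyTorch versions also ship an equivalent built-in, nn.HuberLoss, but the manual version mirrors the formula more directly). The toy values are my own.

```python
import torch

def huber(y_hat, y):
    """Huber's robust loss with the threshold fixed at 1."""
    err = (y_hat - y).abs()
    return torch.where(err > 1.0,
                       err - 0.5,            # far from the target: behaves like L1
                       0.5 * err ** 2)       # close to the target: behaves like L2

y = torch.tensor(0.0)
y_hat = torch.tensor([-3.0, -0.5, 0.5, 4.0], requires_grad=True)

l = huber(y_hat, y).sum()
l.backward()
print(y_hat.grad)    # [-1.0, -0.5, 0.5, 1.0]: constant far out, linear near 0
```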

3. Image Classification Dataset

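The image classification dataset used in this part of the course is Fashion-MNIST: 60,000 training and 10,000 test images of 28x28 grayscale clothing items in 10 classes. A minimal sketch of loading it through torchvision is below; the batch size of 256 and the ./data download directory are my own choices.

```python
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader

# Convert PIL images to float tensors in [0, 1]; each image is 1x28x28.
trans = transforms.ToTensor()

train_set = torchvision.datasets.FashionMNIST(
    root="./data", train=True, transform=trans, download=True)
test_set = torchvision.datasets.FashionMNIST(
    root="./data", train=False, transform=trans, download=True)

train_iter = DataLoader(train_set, batch_size=256, shuffle=True)
test_iter = DataLoader(test_set, batch_size=256, shuffle=False)

X, y = next(iter(train_iter))
print(X.shape, y.shape)                # torch.Size([256, 1, 28, 28]) torch.Size([256])
print(len(train_set), len(test_set))   # 60000 10000
```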

Reference

https://www.bilibili.com/video/BV1K64y1Q7wu?p=1
