Softmax classifier and cross entropy loss (easy to understand)

Before we talk about classifiers, let's first understand linear classification.

Start from the familiar linear function y = kx + b. When there are multiple categories and multiple features, W becomes a matrix: each row corresponds to a category and each column to a feature value. In the example used below there are 3 categories and each sample has only 2 features.

 A linear classification function can be defined as:

f(x_{i}, W, b)=W x_{i}+b

Our goal is to learn the parameters W and b from the training data. Once learning is complete, the training set can be discarded and only the learned parameters are kept.
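
As a minimal sketch of this score function (the weight and bias values below are made up for illustration; only the shapes follow the 3-category, 2-feature setup above):

```python
import numpy as np

# 3 categories, 2 features: one row of W per category, one column per feature
W = np.array([[ 0.2, -0.5],   # hypothetical weights for category 1
              [ 1.5,  1.3],   # category 2
              [ 0.0,  0.3]])  # category 3
b = np.array([0.1, -0.2, 0.4])  # one bias per category

x = np.array([2.0, 1.0])        # a single sample with 2 feature values

scores = W @ x + b              # f(x, W, b) = Wx + b -> one score per category
print(scores)                   # shape (3,): the raw class scores
```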

1. Loss function

The loss function is an evaluation function that tells us how well the current classifier is performing, and it guides how the classifier's weights should be adjusted: through this function we know how to improve the weight coefficients. Generally speaking, a set of parameters (W, b) corresponds to a loss L, and the smaller the loss, the better the model. Our goal is to drive the loss toward an optimal value through optimization (the smallest value is not necessarily the optimal one).

Common loss functions:

  • log-likelihood loss (a quick numerical check follows this list)

\small L(y, \hat y)=-y\log\hat y-(1-y)\log(1-\hat y)

  • hinge loss

\small L_i = \sum_{j\neq y_i} \max(0, s_j - s_{y_i} + \Delta)
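
A quick numerical check of the log-likelihood formula, using made-up values for y and the predicted probability:

```python
import numpy as np

def log_likelihood_loss(y, y_hat):
    """L(y, y_hat) = -y*log(y_hat) - (1 - y)*log(1 - y_hat) for one binary label."""
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)

print(log_likelihood_loss(1, 0.9))  # ~0.105: confident and correct -> small loss
print(log_likelihood_loss(1, 0.1))  # ~2.303: confident and wrong  -> large loss
```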

Now use the hinge loss to compute the loss for the scores produced by the linear classifier above. The three category scores are 0.77, 0.67 and 2.3; for this example take category 1 (score 0.77) as the true class and Δ = 1:

L_{i}=\sum_{j \neq y_{i}} \max \left(0, s_{j}-s_{y_{i}}+\Delta\right)=\max (0,\; 0.67-0.77+1)+\max (0,\; 2.3-0.77+1)=\max (0,\; 0.9)+\max (0,\; 2.53)=3.43
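
A minimal sketch that reproduces this hinge-loss computation (scores 0.77, 0.67 and 2.3 as above, the true class assumed to be category 1, Δ = 1):

```python
import numpy as np

def hinge_loss(scores, true_idx, delta=1.0):
    """Multiclass hinge loss: sum over j != true of max(0, s_j - s_true + delta)."""
    margins = np.maximum(0.0, scores - scores[true_idx] + delta)
    margins[true_idx] = 0.0              # the correct class contributes nothing
    return margins.sum()

scores = np.array([0.77, 0.67, 2.3])     # class scores from the example above
print(hinge_loss(scores, true_idx=0))    # max(0, 0.9) + max(0, 2.53) ≈ 3.43
```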

2. Softmax classifier and cross-entropy loss

  • Softmax

Function definition:

\small y_{i}=\frac{e^{f_{i}}}{\sum_{j} e^{f_{j}}}

Simply put, the softmax function squashes each output score into a value between 0 and 1, and the resulting values sum to 1.

Apply the softmax function to the same scores (0.77, 0.67, 2.3) from the linear classifier above:

Category 1:                                    y_{1}=\frac{e^{0.77}}{e^{0.77}+e^{0.67}+e^{2.3}} \approx 0.15

Category 2:                                    y_{2}=\frac{e^{0.67}}{e^{0.77}+e^{0.67}+e^{2.3}} \approx 0.14

Category 3:                                    y_{3}=\frac{e^{2.3}}{e^{0.77}+e^{0.67}+e^{2.3}} \approx 0.71
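
A minimal sketch of the same softmax computation (the shift by the maximum score is a standard numerical-stability trick, not part of the formula itself):

```python
import numpy as np

def softmax(f):
    """y_i = e^{f_i} / sum_j e^{f_j}, computed in a numerically stable way."""
    e = np.exp(f - np.max(f))   # subtracting the max score does not change the result
    return e / e.sum()

scores = np.array([0.77, 0.67, 2.3])
print(softmax(scores))          # ≈ [0.15, 0.14, 0.71], and the values sum to 1
```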

  • Cross-entropy loss

The cross-entropy loss measures the performance of a classification model whose output is a probability between 0 and 1. The loss increases as the predicted probability diverges from the actual label: predicting a probability of 0.012 when the actual label is 1 is a poor prediction and yields a high loss, while a perfect model has a log loss of 0. Cross-entropy is generally applied to the result of the softmax function.

Function definition:

L_{\mathrm{CE}}=-\sum_{i=1}^{n} t_{i} \log \left(p_{i}\right)

Here t_{i} is the true label and p_{i} is the corresponding probability produced by the softmax function.

Because a sample either belongs to a category or it does not, the true label is one-hot: a 1 marks the correct category and the rest are 0. In this example the input belongs to category 3, so t = (0, 0, 1).

Cross-entropy calculation:

\begin{aligned} L_{CE} &=-\sum_{i=1}^{3} t_{i} \log _{2}\left(p_{i}\right) \\ &=-\left[0 \log _{2}(0.15)+0 \log _{2}(0.14)+1 \log _{2}(0.71)\right] \\ &=-\log _{2}(0.71) \\ &\approx 0.49 \end{aligned}
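
A minimal sketch of this cross-entropy computation, using log base 2 to match the worked example (with the natural log the numbers change but the comparison stays the same):

```python
import numpy as np

def cross_entropy(t, p):
    """L_CE = -sum_i t_i * log2(p_i) for a one-hot target t and softmax output p."""
    return -np.sum(t * np.log2(p))

t = np.array([0.0, 0.0, 1.0])        # true label: category 3
p = np.array([0.15, 0.14, 0.71])     # softmax output from above
print(cross_entropy(t, p))           # = -log2(0.71) ≈ 0.49
```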

Why the minus sign?

Recall the shape of the log function: for inputs between 0 and 1 its value is negative. Because the softmax output lies in (0, 1), each log(p_i) is negative, and the leading minus sign makes the loss positive.

Suppose that after some optimization the softmax outputs become 0.10, 0.08 and 0.82. Recompute the cross-entropy and compare:

\begin{aligned} L_{CE2} &=-\sum_{i=1}^{3} t_{i} \log _{2}\left(p_{i}\right) \\ &=-\left[0 \log _{2}(0.10)+0 \log _{2}(0.08)+1 \log _{2}(0.82)\right] \\ &=-\log _{2}(0.82) \\ &\approx 0.29 \end{aligned}

0.29 is less than the previous loss of 0.49, which suggests that the model is learning. The optimization process (adjusting the weights so that the output gets closer to the true labels) continues until training ends.
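
The same sketch can be used to compare the two probability vectors (0.10, 0.08 and 0.82 are the hypothetical post-optimization outputs from the text):

```python
import numpy as np

def cross_entropy(t, p):
    return -np.sum(t * np.log2(p))

t = np.array([0.0, 0.0, 1.0])
before = cross_entropy(t, np.array([0.15, 0.14, 0.71]))  # ≈ 0.49
after  = cross_entropy(t, np.array([0.10, 0.08, 0.82]))  # ≈ 0.29
print(before, after)  # the loss falls as the probability of the true class rises
```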

Reference https://towardsdatascience.com/cross-entropy-loss-function-f38c4ec8643e


Origin blog.csdn.net/Peyzhang/article/details/125418625