Understand the softmax function in one minute

Anyone who has worked on classification tasks will have come across the softmax function. The softmax function, also called the normalized exponential function, generalizes the binary classification function sigmoid to multi-class classification; its purpose is to express the results of a multi-class model as probabilities. For prediction scores x1, x2, ..., xn, softmax is computed as:

softmax(xi) = exp(xi) / (exp(x1) + exp(x2) + ... + exp(xn))

Let me explain why softmax takes this form.

First of all, we know that probability has two properties: 1) each predicted probability is non-negative; 2) the probabilities of all predicted outcomes sum to 1.

Softmax converts prediction scores, which can range from negative infinity to positive infinity, into probabilities in the following two steps.

1) Convert the prediction scores into non-negative numbers

Consider the exponential function y = exp(x): its value range is (0, +∞), so whatever real number goes in, a positive number comes out. The first step of softmax is therefore to pass each of the model's prediction scores through the exponential function, which guarantees that the results are non-negative.
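As a quick illustration, here is a minimal Python sketch of this first step (the scores are the ones used in the worked example below; the variable names are my own):

```python
import math

# Raw model outputs (logits) can be any real number, including negatives.
scores = [-3.0, 1.5, 2.7]

# Step 1: exponentiate each score. exp(x) > 0 for every real x,
# so the results are guaranteed to be positive.
exp_scores = [math.exp(s) for s in scores]
print(exp_scores)  # [0.0497..., 4.4816..., 14.8797...] -> all positive
```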

2) Make the probabilities of all predicted outcomes sum to 1

To ensure that the probabilities of all predicted outcomes sum to 1, we only need to normalize the exponentiated results: divide each one by the sum of all of them. Each value can then be understood as that result's share of the total, which gives us a probability for each class.
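And a matching sketch of the normalization step, again in plain Python:

```python
import math

scores = [-3.0, 1.5, 2.7]
exp_scores = [math.exp(s) for s in scores]  # step 1 from above

# Step 2: divide each exponentiated score by the sum of all of them.
total = sum(exp_scores)
probs = [e / total for e in exp_scores]
print(probs)       # approximately [0.0026, 0.2308, 0.7666]
print(sum(probs))  # 1.0 (up to floating-point rounding)
```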

Here is an example. Suppose a model produces the raw scores -3, 1.5, and 2.7 for a three-class classification problem, and we want to use softmax to convert these scores into probabilities. Proceed as follows:

1) Convert the prediction scores into non-negative numbers

y1 = exp(x1) = exp(-3) ≈ 0.05

y2 = exp(x2) = exp(1.5) ≈ 4.48

y3 = exp(x3) = exp(2.7) ≈ 14.88

2) Make the probabilities of all predicted outcomes sum to 1

z1 = y1/(y1+y2+y3) = 0.05/(0.05+4.48+14.88) ≈ 0.0026

z2 = y2/(y1+y2+y3) = 4.48/(0.05+4.48+14.88) ≈ 0.2308

z3 = y3/(y1+y2+y3) = 14.88/(0.05+4.48+14.88) ≈ 0.7666
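The whole calculation fits in a few lines of Python; this sketch wraps the two steps into a function and reproduces the hand calculation above:

```python
import math

def softmax(scores):
    """Convert a list of real-valued scores into probabilities."""
    exps = [math.exp(s) for s in scores]  # step 1: make everything positive
    total = sum(exps)
    return [e / total for e in exps]      # step 2: normalize to sum to 1

print(softmax([-3, 1.5, 2.7]))
# approximately [0.0026, 0.2308, 0.7666], matching z1, z2, z3 above
```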

To summarize, softmax converts multi-class outputs into probabilities in two steps:

1) Numerator: the exponential function maps each real-valued output to a positive number in (0, +∞).

2) Denominator: divide by the sum of all the exponentiated outputs, so the results add up to 1.
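One practical note that goes beyond the original post: computed naively, exp() overflows for large scores, so implementations typically subtract the maximum score first. This leaves the result unchanged, because the common factor cancels between the numerator and the denominator. A sketch:

```python
import math

def softmax_stable(scores):
    # Subtracting the max score leaves the result unchanged (the common
    # factor exp(-m) cancels out) but prevents exp() from overflowing.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax_stable([1000.0, 1001.0, 1002.0]))  # fine; naive exp(1000) overflows
```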

[Figure: the explanation of softmax from Stanford University's CS224n course.]


——————————————
Copyright statement: This is an original article by CSDN blogger "-Never Compromise-" and is licensed under CC 4.0 BY-SA. Include the original source link and this statement when reprinting.
Original link: https://blog.csdn.net/lz_peter/article/details/84574716
