The Softmax function and its misconceptions


Source: 深度学习爱好者, CV技术指南
This article is about 1,300 words; a 6-minute read is suggested.
It gives you a well-rounded introduction to the Softmax function.

[Guide] Softmax is an activation function that everyone is familiar with. However, many people know only its expression and its position in the network, and cannot explain the specific reasons and details behind it. This article provides that introduction.

Softmax is a mathematical function that normalizes a vector of values to the range 0 to 1.

In this article, you'll learn about:

  • What is the Softmax activation function, and what is its mathematical expression?

  • How is it used with the argmax() function?

  • Why is Softmax only used in the last layer of the neural network?

  • Misconceptions about Softmax


What is the Softmax activation function, and what is its mathematical expression?

In deep learning, Softmax is used as an activation function that normalizes the output, scaling each value in a vector to lie between 0 and 1. Softmax is used for classification tasks: the last layer of the network produces an N-dimensional vector, with one element for each class in the classification task.

Figure: N-dimensional vector in the output layer of the network

Softmax normalizes those weighted-sum values so that they lie between 0 and 1 and sum to 1. That is why most people take these values to be class probabilities, but this is a misconception, which we discuss later in this article.

The formula for the Softmax function:

$$\mathrm{Softmax}(\theta_i) = \frac{e^{\theta_i}}{\sum_{j=1}^{N} e^{\theta_j}}$$

Using this mathematical expression, we calculate the normalized value for each class of data. Here θ_i is the input we get from the flatten layer.

For each class, the normalized value is computed with the exponential of that class's score in the numerator and the sum of the exponentials of all the classes' scores in the denominator. Using the Softmax function, we therefore get values that all lie between 0 and 1 and that sum to 1. This is why people read them as probabilities, which is the misconception.
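As a concrete illustration of this computation, here is a minimal NumPy sketch; the three class scores are made up for the example:

```python
import numpy as np

def softmax(theta):
    # exponential of each score over the sum of exponentials of all scores
    exps = np.exp(theta - np.max(theta))  # subtracting the max avoids overflow
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical scores for three classes
out = softmax(scores)
print(out)        # [0.659 0.242 0.099]
print(out.sum())  # 1.0
```

The max-subtraction does not change the result, because it cancels between numerator and denominator; it only keeps exp() in a safe numeric range.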

How is it used with the argmax() function?

After the above mathematical function is applied to each class, Softmax yields a value between 0 and 1 per class.

Now that we have a value for each class, we need to decide which class the input belongs to. For this, argmax() is used: it returns the index of the largest of the values produced by Softmax.
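A minimal sketch of that step (NumPy; reusing the made-up scores from the sketch above):

```python
import numpy as np

softmax_out = np.array([0.659, 0.242, 0.099])  # Softmax outputs from the earlier sketch
predicted = np.argmax(softmax_out)             # index of the largest value
print(predicted)                               # 0 -> the first class
```

Since exp() is monotonically increasing, argmax over the Softmax outputs picks the same index as argmax over the raw scores.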

Figure: visual interpretation of argmax

Why is Softmax only used in the last layer of the neural network?

Now to the important part: Softmax is used only in the last layer, to normalize the values, while other activation functions (ReLU, leaky ReLU, sigmoid, and various others) are used in the inner layers.

If we look at other activation functions such as ReLU, leaky ReLU, and sigmoid, they all operate on a single value at a time to introduce non-linearity. They cannot see what the other values are.

The Softmax function, however, takes the sum of the exponentials of all the class scores in its denominator in order to normalize the values of every class. It takes the values of all the classes into account, and that is why we use it in the last layer: to determine which class the input belongs to, all the values must be analyzed together.

Figure: the Softmax activation function in the last layer
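To make the placement concrete, here is a minimal sketch of such a network, assuming PyTorch; the layer sizes and the three-class setup are made-up values:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 128),
    nn.ReLU(),          # element-wise: each value is transformed on its own
    nn.Linear(128, 3),  # one output per class
    nn.Softmax(dim=1),  # row-wise: normalizes across all class scores together
)

x = torch.randn(1, 784)  # a dummy input
out = model(x)
print(out.sum(dim=1))    # tensor([1.]) — the class values sum to 1
```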

Misconceptions about Softmax

The first and biggest misconception about Softmax is that the normalized values it outputs are probabilities for each class, which is completely wrong. This misunderstanding arises because the values sum to 1, but they are merely normalized scores, not class probabilities.
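One way to see that these are normalized scores rather than calibrated probabilities is that they depend on the overall scale of the raw scores. A small NumPy illustration with made-up numbers:

```python
import numpy as np

def softmax(theta):
    exps = np.exp(theta - np.max(theta))
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))      # [0.659 0.242 0.099]
print(softmax(3 * scores))  # [0.950 0.047 0.003] — same ranking, very different values
```

Nothing about the classes changed between the two calls, yet the outputs differ drastically, so they cannot be read as true class probabilities without calibration.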


Instead of using Softmax alone in the last layer, we often prefer Log Softmax, which simply takes the logarithm of the normalized values produced by the Softmax function.

Log Softmax is superior to Softmax in terms of numerical stability and cheaper model training cost, and it penalizes large errors more heavily (the greater the error, the greater the penalty).
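The numerical-stability point comes from the standard log-sum-exp shift: log(Softmax(x)) can be computed as (x - max(x)) - log(sum(exp(x - max(x)))), which never exponentiates large numbers. A minimal NumPy sketch with made-up values:

```python
import numpy as np

def log_softmax(theta):
    # log(softmax(theta)) computed without ever forming huge exponentials
    shifted = theta - np.max(theta)
    return shifted - np.log(np.sum(np.exp(shifted)))

big = np.array([1000.0, 1001.0, 1002.0])
# the naive route overflows: np.exp(big) -> inf, so log(exp / sum) -> nan
print(log_softmax(big))  # [-2.408 -1.408 -0.408]
```

In practice, frameworks pair Log Softmax with a negative log-likelihood loss; in PyTorch, for example, nn.LogSoftmax followed by nn.NLLLoss is equivalent to applying nn.CrossEntropyLoss to the raw scores.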

That is the Softmax function as used as an activation function in neural networks. I believe that after reading this article, you have a clear understanding of it.

Original link: https://medium.com/artificialis/softmax-function-and-misconception-4248917e5a1c

Editor: Huang Jiyan

