Sigmoid vs. Softmax: how should a classifier's outputs be handled?

Source: HowNet new business. Original title: "Classification function showdown: Sigmoid and Softmax, and when to use each?"


When designing a model for a classification task (such as diagnosing diseases from chest X-rays, or classifying handwritten digits), you sometimes need to allow multiple answers (such as selecting both pneumonia and abscess at the same time), and sometimes only one answer is allowed (such as the digit "8"). This article discusses how to process a classifier's raw output values with the Sigmoid function or the Softmax function.


Neural network classifier

There are many classification algorithms, but this article limits its discussion to neural network classifiers. Classification problems can be solved with different neural networks, such as feed-forward neural networks and convolutional neural networks.

Applying the Sigmoid or Softmax function

The final result of a neural network classifier is a vector of "raw output values," such as [-0.5, 1.2, -0.1, 2.4]. For a chest X-ray examination, the four output values might correspond to pneumonia, cardiac hypertrophy, tumor, and abscess. But what do these raw output values mean?

Converting the output values to probabilities makes them easier to understand. Compared to a seemingly arbitrary "2.4," a statement like "there is a 91% probability the patient has an abscess" is much easier to interpret.

Both the Sigmoid function and the Softmax function can map a classifier's raw output values to probabilities.

The figure below shows the raw output values (blue) of a feed-forward neural network mapped to probabilities (red) by the Sigmoid function:

[Figure: raw output values (blue) mapped to probabilities (red) by the Sigmoid function]

Here is the same process repeated with the Softmax function:

[Figure: the same raw output values mapped to probabilities by the Softmax function]

As shown, the Sigmoid and Softmax functions return different results.

The reason is that the Sigmoid function processes each raw output value independently, so the resulting probabilities are independent of one another and do not necessarily sum to 1; in the figure, 0.37 + 0.77 + 0.48 + 0.91 = 2.53.

In contrast, the Softmax function links the output values together, so the probabilities always sum to 1, as in 0.04 + 0.21 + 0.05 + 0.70 = 1.00. With the Softmax function, increasing the probability of one category therefore requires a corresponding decrease in the probabilities of the other categories.
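Both mappings can be reproduced in a few lines of NumPy. This is a sketch, not the article's own code, and the rounded values differ in the last digit from the figure's in a couple of places:

```python
import numpy as np

# The raw output values ("logits") from the article's example
z = np.array([-0.5, 1.2, -0.1, 2.4])

def sigmoid(z):
    # Element-wise: each class gets an independent probability
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Shared denominator ties the classes together
    e = np.exp(z - np.max(z))  # subtracting the max improves numerical stability
    return e / e.sum()

print(np.round(sigmoid(z), 2))  # ≈ [0.38, 0.77, 0.48, 0.92]; sums to about 2.54
print(np.round(softmax(z), 2))  # ≈ [0.04, 0.21, 0.06, 0.70]; sums to exactly 1
```

Note that the Sigmoid probabilities exceed 1 in total, while the Softmax probabilities always sum to 1.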


Applications of the Sigmoid function:

Examples: chest X-rays and hospital admission

Chest X-rays: a single chest X-ray can show multiple diseases at the same time, so a chest X-ray classifier must also be able to report multiple findings. The figure below shows a chest X-ray with both pneumonia and abscess, giving two "1"s in the label column on the right:

[Figure: a chest X-ray labeled with both pneumonia and abscess]

Hospital admission: the goal is to predict, from a patient's health record, the likelihood of future hospital admission. The classification problem can therefore be framed as: classify a patient's existing health record according to the conditions (if any) that could lead to a future admission. Multiple diseases can lead to admission, so there may be more than one answer.

Diagrams: the two feed-forward neural networks below correspond to the two problems above. In the final step, the raw output values are processed by the Sigmoid function to produce the corresponding probabilities, allowing multiple possibilities to coexist, since a chest X-ray may show several abnormalities and a patient may be admitted for more than one condition.

[Figures: the two feed-forward networks, each ending in Sigmoid outputs]

Applications of the Softmax function:

Examples: handwritten digits and Iris flowers

Handwritten digits: when distinguishing handwritten digits (the MNIST dataset: https://en.wikipedia.org/wiki/MNIST_database), the classifier should use the Softmax function to decide which single class the digit belongs to. After all, a digit 8 can only be an 8; it cannot simultaneously be a 7.


Iris: the Iris dataset was introduced in 1936 (https://en.wikipedia.org/wiki/Iris_flower_data_set). It contains 150 examples in 3 classes (Iris setosa, Iris versicolor, and Iris virginica), with 50 examples per class, and each example has 4 attributes: sepal length, sepal width, petal length, and petal width.

The following nine examples are taken from the Iris dataset:

[Table: nine examples from the Iris dataset]

The dataset contains no images, but here is an Iris versicolor (https://en.wikipedia.org/wiki/Iris_flower_data_set#/media/File:Iris_versicolor_3.jpg) for you to enjoy:

[Image: Iris versicolor]

A neural network classifier for the Iris dataset should process its raw output values with the Softmax function, because an iris can only be one particular species; splitting it across several species would make no sense.
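To make this concrete, here is a minimal sketch of the final Softmax step of such a classifier. The three logits are invented for illustration:

```python
import numpy as np

# Hypothetical final layer of an Iris classifier: one raw output value
# ("logit") per species. The numbers are invented for illustration.
species = ["Iris setosa", "Iris versicolor", "Iris virginica"]
z = np.array([-1.0, 2.2, 0.3])

e = np.exp(z - z.max())        # Softmax, numerically stabilized
probs = e / e.sum()

prediction = species[int(np.argmax(probs))]
print(prediction)              # the single winning species
```

Because the probabilities sum to 1, taking the argmax always yields exactly one species.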


A note on "e"

To understand the Sigmoid and Softmax functions, we first need to introduce "e." For this article, you only need to know that e is a mathematical constant approximately equal to 2.71828.

Here is some additional information about e:

• The decimal expansion of e goes on forever, with the digits appearing completely random, similar to pi.

• e often appears in the study of compound interest, gambling, and certain probability distributions.

• Here is one formula for e, as a limit:

e = lim (n → ∞) (1 + 1/n)^n

But this is not the only formula for e; there are several ways to compute it.

For examples, see: https://www.intmath.com/exponential-logarithmic-functions/calculating-e.php

• In 2004, Google's initial public offering was for $2,718,281,828, i.e., "e billion dollars."

• Wikipedia charts the record number of known decimal digits of e throughout history (https://en.wikipedia.org/wiki/E_%28mathematical_constant%29#Bernoulli_trials), starting with a single digit in 1690 and reaching 116,000 digits by 1978:
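The bullets above mention that e can be computed in several ways. Here is a quick Python sketch comparing two classic approximations against the math module's built-in constant; the choices of n and the 20-term cutoff are arbitrary:

```python
import math

# Two classic ways to approximate e, compared with math.e
n = 100_000
limit_approx = (1 + 1 / n) ** n                                # (1 + 1/n)^n for large n
series_approx = sum(1 / math.factorial(k) for k in range(20))  # sum of 1/k!

print(limit_approx)    # close to e, but converges slowly
print(series_approx)   # matches math.e to double precision
print(math.e)
```

The series converges far faster: twenty terms already agree with math.e to machine precision, while the limit form is still off in the fifth decimal place at n = 100,000.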


The Sigmoid function and the Softmax function

Sigmoid = multi-label classification = multiple correct answers = non-exclusive outputs (e.g., chest X-rays, hospital admission)

• When building a classifier for a problem with more than one correct answer, apply the Sigmoid function to each raw output value independently.

• The Sigmoid function looks like this (note the e):

σ(zj) = 1 / (1 + e^(−zj))

In this formula, σ denotes the Sigmoid function, and σ(zj) means applying the Sigmoid function to the number zj. "zj" is a single raw output value, such as -0.5. j indexes the output value currently being computed; with four raw output values, j = 1, 2, 3, or 4. In the earlier example, the raw output values were [-0.5, 1.2, -0.1, 2.4], so z1 = -0.5, z2 = 1.2, z3 = -0.1, and z4 = 2.4.

So,

σ(z1) = 1 / (1 + e^(0.5)) ≈ 0.37

The computations for z2, z3, and z4 proceed the same way.

Because the Sigmoid function is applied to each raw output value separately, the possible outcomes include: all class probabilities low (e.g., "this chest X-ray shows no abnormalities"), one class probability high while the others are low (e.g., "the chest X-ray shows only pneumonia"), or several or even all class probabilities high (e.g., "the chest X-ray shows pneumonia and abscess").
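A minimal sketch of this multi-label behavior, with invented logits for the article's four findings and the common (but tunable) 0.5 decision threshold:

```python
import numpy as np

# Invented logits for the four chest X-ray findings from the article
labels = ["pneumonia", "cardiomegaly", "tumor", "abscess"]
z = np.array([2.1, -1.3, -0.8, 1.5])

probs = 1.0 / (1.0 + np.exp(-z))    # independent Sigmoid per class
threshold = 0.5                     # common default; tune per application
findings = [name for name, p in zip(labels, probs) if p > threshold]
print(findings)                     # multiple findings can co-occur
```

With these logits, both "pneumonia" and "abscess" clear the threshold, so the classifier reports two findings at once, which a Softmax output could never do.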

The curve of the Sigmoid function is shown below:

[Figure: the Sigmoid function curve]

https://en.wikipedia.org/wiki/Sigmoid_function#/media/File:Logistic-curve.svg

Softmax = multi-class classification = only one correct answer = mutually exclusive outputs (e.g., handwritten digits, iris species)

• When building a classifier for a problem with exactly one correct answer, apply the Softmax function to the raw output values.

• The denominator of the Softmax function combines all the raw output values, which means the probabilities obtained from the Softmax function are interrelated.

• The Softmax function is expressed as follows:

σ(zj) = e^(zj) / (e^(z1) + e^(z2) + ... + e^(zK))

Apart from the denominator, which sums the e^z terms of all the raw output values, the Softmax function differs little from the Sigmoid function. In other words, when applying the Softmax function to a single raw output value (e.g., z1), you cannot compute it from z1 alone; the denominator also requires z2, z3, and z4, as follows:

σ(z1) = e^(−0.5) / (e^(−0.5) + e^(1.2) + e^(−0.1) + e^(2.4)) ≈ 0.04

An advantage of the Softmax function is that all the output probabilities sum to 1:

0.04 + 0.21 + 0.05 + 0.70 = 1.00

When distinguishing handwritten digits, the raw output values are processed with the Softmax function, so increasing the probability that a sample is classified as an "8" necessarily decreases the probability that it is classified as one of the other digits (0, 1, 2, 3, 4, 5, 6, 7, and/or 9).
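This competition between classes is easy to demonstrate. In the sketch below the ten logits are invented; raising only the "8" logit lowers every other class's probability:

```python
import math

def softmax(zs):
    m = max(zs)                       # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Invented logits for the ten digit classes 0-9
z = [0.1, -0.4, 0.3, 0.0, -0.2, 0.5, -0.1, 0.2, 1.0, -0.3]
before = softmax(z)

z[8] += 2.0                           # push only the "8" logit up
after = softmax(z)

# "8" gains probability, and every other digit necessarily loses some
print(after[8] > before[8])
print(all(after[i] < before[i] for i in range(10) if i != 8))
```

This follows directly from the shared denominator: the other classes' numerators are unchanged while the denominator grows, so their probabilities must shrink.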

Other examples of Sigmoid and Softmax

[Figures: additional examples of Sigmoid and Softmax applications]

Summary

• If the model's outputs are non-mutually-exclusive categories and multiple categories can be selected at once, process the network's raw output values with the Sigmoid function.

• If the model's outputs are mutually exclusive categories and only one category can be selected, process the network's raw output values with the Softmax function.


Source: www.cnblogs.com/xinzhihao/p/11077704.html