An introduction to ROC and AUC, and how to calculate AUC

The ROC (Receiver Operating Characteristic) curve and AUC are often used to evaluate the performance of a binary classifier. For a brief introduction to the two, see the links in the references at the end of this post. This post briefly introduces the characteristics of ROC and AUC, and then discusses in more depth how to draw an ROC curve and how to calculate AUC.

ROC curve

It should be noted in advance that we only discuss binary classifiers here. For classifiers (classification algorithms), the main evaluation metrics include precision, recall, F-score[1], and the ROC and AUC that we discuss today. The figure below is an example of an ROC curve[2].

ROC Curve Example

As this example shows, the horizontal axis of the ROC curve is the false positive rate (FPR) and the vertical axis is the true positive rate (TPR). The figure below details how FPR and TPR are defined.

FPR and TPR Definitions
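In case the figure does not display, the two definitions are FPR = FP / (FP + TN) and TPR = TP / (TP + FN). Below is a minimal Python sketch of these two formulas; the function name fpr_tpr is my own and not part of the original post.

```python
import numpy as np

def fpr_tpr(y_true, y_pred):
    """Compute (FPR, TPR) from true labels and hard predictions (1 = positive, 0 = negative)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fpr = fp / (fp + tn)  # fraction of real negatives wrongly predicted positive
    tpr = tp / (tp + fn)  # fraction of real positives correctly predicted positive
    return fpr, tpr

print(fpr_tpr([1, 0, 1, 0], [1, 1, 1, 0]))  # (0.5, 1.0)
```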

Next, consider four points and a line in the ROC plot. The first point, (0,1), corresponds to FPR=0 and TPR=1, which means FN (false negatives)=0 and FP (false positives)=0: this is a perfect classifier that classifies every sample correctly. The second point, (1,0), i.e. FPR=1 and TPR=0, can be analyzed similarly and turns out to be the worst possible classifier, since it successfully avoids every correct answer. The third point, (0,0), where FPR=TPR=0, means FP (false positives)=TP (true positives)=0, i.e. the classifier predicts every sample to be negative. Similarly, at the fourth point, (1,1), the classifier predicts every sample to be positive. From this analysis we can assert that the closer the ROC curve is to the upper-left corner, the better the classifier performs.

Next, consider the points on the dashed line y=x in the ROC plot. Points on this diagonal represent the results of a classifier that uses a random-guessing strategy; for example, (0.5, 0.5) means the classifier randomly labels half of the samples as positive and the other half as negative.

How to draw a ROC curve

For a particular classifier and test dataset, we obviously obtain only one classification result, i.e. a single pair of FPR and TPR values. To get a curve, we actually need a whole series of FPR and TPR values. How do we obtain them? Let's first look at the definition of the ROC curve on Wikipedia:

In signal detection theory, a receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied.

The key phrase is "as its discrimination threshold is varied". How should we understand this "discrimination threshold"? We have overlooked an important capability of the classifier: its "probability output", i.e. how likely the classifier thinks a sample is to be a positive (or negative) sample. By looking into the internal mechanics of a classifier, we can usually find a way to obtain such a probabilistic output. Generally speaking, this amounts to mapping a real-valued score into the (0,1) interval through some transformation[3].
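One common transformation of this kind is the logistic (sigmoid) function, which logistic regression uses to turn a raw real-valued score into a value in (0,1). The snippet below is only an illustrative sketch of the idea, not the specific mapping used by any particular classifier mentioned in this post.

```python
import numpy as np

def sigmoid(score):
    """Map a real-valued classifier score into the (0, 1) interval."""
    return 1.0 / (1.0 + np.exp(-score))

print(sigmoid(2.3))   # ~0.909: the classifier is fairly confident the sample is positive
print(sigmoid(-1.0))  # ~0.269: the sample is more likely to be negative
```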

Assuming we have obtained the probability output for every sample (the probability of it being a positive sample), the question becomes: how do we vary the "discrimination threshold"? We sort the test samples in descending order of their probability of being positive. The following figure shows an example with 20 test samples. The "Class" column is the true label of each sample (p for positive, n for negative), and "Score" is the probability that the sample is positive[4].

Sort by probability

Next, we take each "Score" value as the threshold, in order from high to low. When a test sample's probability of being positive is greater than or equal to the threshold, we consider it positive; otherwise we consider it negative. For example, for the 4th sample in the figure, whose "Score" value is 0.6, samples 1, 2, 3, and 4 are considered positive because their "Score" values are all greater than or equal to 0.6, while the remaining samples are considered negative. Each time we choose a different threshold, we obtain a pair of FPR and TPR values, i.e. one point on the ROC curve. In this way we obtain 20 pairs of FPR and TPR values, and plotting them gives the ROC curve shown below:

ROC curve example

When we set the threshold high enough (e.g. 1) no sample is predicted positive, giving the point (0,0); setting it low enough (e.g. 0) predicts every sample positive, giving the point (1,1). Connecting all of these (FPR, TPR) pairs yields the ROC curve. The more thresholds we use, the smoother the ROC curve becomes.
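The construction above translates almost directly into code. The sketch below (the function name roc_points is my own) recomputes the confusion counts from scratch for every threshold, so it is O(n²); production implementations instead make a single pass over the sorted scores.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return the (FPR, TPR) pairs obtained by using each score value as a threshold."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    points = [(0.0, 0.0)]                        # threshold above every score
    for thr in np.sort(scores)[::-1]:            # thresholds from high to low
        y_pred = (scores >= thr).astype(int)     # score >= threshold -> predicted positive
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        points.append((fp / n_neg, tp / n_pos))
    return points
```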

In fact, we do not necessarily need the probability that each test sample is positive; it is enough to have the classifier's "score value" for each test sample (this score value is not necessarily in the (0,1) interval). The higher the score, the more confident the classifier is that the sample is positive, and we can still use each score value as a threshold in turn. Personally, I find the procedure slightly easier to reason about when the score values are converted into probabilities.
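scikit-learn's roc_curve follows exactly this convention: it accepts raw score values (not necessarily probabilities) and returns the FPR and TPR arrays together with the thresholds it used. A minimal usage sketch with made-up labels and scores:

```python
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

# Made-up example; note that the scores do not have to lie in (0, 1).
y_true  = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
y_score = [2.3, 1.8, 1.1, 0.7, 0.4, 0.2, -0.1, -0.5, -0.9, -1.6]

fpr, tpr, thresholds = roc_curve(y_true, y_score)
plt.plot(fpr, tpr, marker='o')
plt.xlabel('False Positive Rate (FPR)')
plt.ylabel('True Positive Rate (TPR)')
plt.show()
```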

Calculation of AUC value

AUC (Area Under Curve) is defined as the area under the ROC curve; clearly this area is never greater than 1. Moreover, since the ROC curve generally lies above the line y=x, the AUC value typically falls between 0.5 and 1. We use AUC as an evaluation criterion because, in many cases, the ROC curves alone do not clearly show which classifier is better, whereas AUC is a single number: the classifier with the larger AUC performs better.

Once you understand how the ROC curve is constructed, writing code to implement it is not difficult. That said, you sometimes learn more from reading other people's code than from writing your own, even if the process is more painful. Here I recommend reading the code in scikit-learn for computing AUC.
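For concreteness, here is a minimal sketch of computing AUC with scikit-learn on made-up data: roc_auc_score computes the value directly from labels and scores, and auc integrates an already-computed ROC curve with the trapezoidal rule.

```python
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Made-up labels and scores for illustration.
y_true  = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
y_score = [0.92, 0.85, 0.78, 0.66, 0.60, 0.55, 0.43, 0.42, 0.31, 0.20]

print(roc_auc_score(y_true, y_score))   # AUC computed directly from labels and scores

fpr, tpr, _ = roc_curve(y_true, y_score)
print(auc(fpr, tpr))                    # same value, via the ROC curve and the trapezoidal rule
```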

What the AUC value means

So what exactly does the AUC value mean? According to (Fawcett, 2006), the meaning of the AUC value is:

The AUC value is equivalent to the probability that a randomly chosen positive example is ranked higher than a randomly chosen negative example.

This statement is a bit convoluted, so let me try to explain it. The AUC value is a probability: if you randomly pick one positive sample and one negative sample, the probability that the current classification algorithm ranks the positive sample ahead of the negative sample (according to its computed Score values) is exactly the AUC. Naturally, the larger the AUC, the more likely the algorithm is to rank positive samples ahead of negative ones, i.e. the better it separates the two classes.
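This ranking interpretation can be checked directly with a brute-force sketch: count, over all (positive, negative) pairs, how often the positive sample receives the higher score, with ties counting as one half. The function name auc_by_pairs is my own.

```python
import numpy as np

def auc_by_pairs(y_true, scores):
    """Estimate AUC as the fraction of (positive, negative) pairs in which the
    positive sample is scored higher; tied scores contribute 0.5."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

On the same data, this pairwise estimate matches the value returned by roc_auc_score.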

Why use the ROC curve

Given that there are already so many evaluation metrics, why also use ROC and AUC? Because the ROC curve has a very nice property: when the distribution of positive and negative samples in the test set changes, the ROC curve stays essentially unchanged. Class imbalance is common in real datasets, i.e. there are many more negative samples than positive ones (or vice versa), and the distribution of positive and negative samples in the test data may also change over time. The figure below compares ROC curves with Precision-Recall curves[5]:

ROC curve vs. Precision-Recall curve

In the figure above, (a) and (c) are ROC curves, while (b) and (d) are Precision-Recall curves. (a) and (b) show the classifier's results on the original test set (with a balanced distribution of positive and negative samples); (c) and (d) show the results after the number of negative samples in the test set is increased to 10 times the original. It is clear that the ROC curves remain essentially unchanged, whereas the Precision-Recall curves change considerably.
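This effect can be reproduced with a small simulation: keep the scores fixed, replicate the negative samples 10 times, and compare the two kinds of curves. The sketch below uses made-up scores and is not the experiment from the paper.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, roc_curve

rng = np.random.default_rng(0)
pos_scores = rng.normal(1.0, 1.0, 500)   # positives tend to score higher
neg_scores = rng.normal(0.0, 1.0, 500)

def curves(neg_copies):
    """ROC and PR curves after replicating the negative samples neg_copies times."""
    scores = np.concatenate([pos_scores] + [neg_scores] * neg_copies)
    labels = np.concatenate([np.ones(len(pos_scores))] + [np.zeros(len(neg_scores))] * neg_copies)
    fpr, tpr, _ = roc_curve(labels, scores)
    precision, recall, _ = precision_recall_curve(labels, scores)
    return (fpr, tpr), (recall, precision)

roc_balanced, pr_balanced = curves(1)    # balanced test set, as in panels (a) and (b)
roc_skewed,   pr_skewed   = curves(10)   # 10x more negatives, as in panels (c) and (d)
# Plotting shows the two ROC curves nearly coincide,
# while the two Precision-Recall curves differ substantially.
```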

Note: apart from the first figure, which comes from Wikipedia, all other figures in this post are screenshots from the paper (Fawcett, 2006)[6].

References and other links:

  • Wikipedia's introduction to ROC: http://en.wikipedia.org/wiki/Receiver_operating_characteristic
  • ROC曲线及AUC评价指标 (ROC curves and the AUC evaluation metric) by 冒泡的崔: http://bubblexc.com/y2011/148/
  1. I avoid translating evaluation metrics such as precision and recall into Chinese, because each of them may correspond to several Chinese renderings, which easily causes confusion. 

  2. Image source: http://en.wikipedia.org/wiki/File:Roccurves.png 

  3. Such a mapping is not necessarily reliable, i.e. you do not necessarily obtain the true probability that a given sample is positive. 

  4. Note that "Score" is used here rather than probability; for the time being, we can treat the "Score" value as the probability of being a positive sample. 

  5. Davis, J., & Goadrich, M. (2006, June). The relationship between Precision-Recall and ROC curves. In Proceedings of the 23rd international conference on Machine learning (pp. 233-240). ACM. 

  6. (Fawcett, 2006): Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861-874. 
