Model Evaluation (1): AUC

Questions:

  • What is AUC?
  • What can AUC be used for?
  • How is AUC computed (a deeper understanding of AUC)?

What is AUC

Confusion matrix

The confusion matrix is the basis for understanding most evaluation metrics, and it is certainly the basis for understanding AUC. Plenty of material already introduces the confusion matrix; the classic diagram below explains what it is.
[Figure: the confusion matrix]
Clearly, the confusion matrix contains four pieces of information:
1. True negative (TN): the number of samples that are actually negative and are predicted as negative
2. False positive (FP): the number of samples that are actually negative but are predicted as positive
3. False negative (FN): the number of samples that are actually positive but are predicted as negative
4. True positive (TP): the number of samples that are actually positive and are predicted as positive

Read against the matrix, these relationships are easy to understand, but over time the terms are easy to forget. A handy trick is to split each term into two parts by position: the first part, True/False, says whether the prediction is correct; the second part, Positive/Negative, is the predicted class. Each cell of the confusion matrix can therefore be read as (correctness, predicted class). With this reading, the four quantities above become (each denotes a number of samples, omitted below):
1. TN: predicted negative, and the prediction is correct
2. FP: predicted positive, but the prediction is wrong
3. FN: predicted negative, but the prediction is wrong
4. TP: predicted positive, and the prediction is correct

Almost every evaluation metric I know of is built on the confusion matrix, including accuracy, precision, recall, F1-score, and of course AUC.
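
As a minimal sketch (with made-up 0/1 labels and predictions, 1 = positive), the four counts can be tallied directly:

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual labels
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]   # predicted labels
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # predicted positive, correctly
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # predicted positive, wrongly
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # predicted negative, wrongly
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # predicted negative, correctly
print(tp, fp, fn, tn)   # 3 1 1 3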

ROC curve

It is not that easy to grasp what AUC is in one go; we have to start from the ROC curve. For a binary classifier, the output label (0 or 1) usually depends on the output probability and a predefined probability threshold. A common threshold is 0.5: a sample is predicted positive if its probability is greater than 0.5 and negative otherwise. If you raise this threshold, the chance of a false positive (predicting a sample as positive when it is actually negative) decreases, but the chance of correctly catching a positive also decreases; if you lower it, the chance of catching positives increases, but so does the chance of false positives. The choice of threshold therefore reflects, to some extent, the classification ability of the classifier. Ideally, we want the classifier to classify as correctly as possible no matter which threshold is chosen; the stronger this property, the stronger the classifier, which can be understood as a kind of robustness.
To measure this classification ability visually, the ROC curve was invented. The figure below shows an ROC curve (the raw data behind it is introduced in the third part). For now, the two things to focus on are:
- Horizontal axis: False Positive Rate (FPR) 
- Vertical axis: True Positive Rate (TPR)

[Figure: an example ROC curve]
- The false positive rate, FPR = FP / (FP + TN), is the fraction of actual negatives that are wrongly predicted as positive. Obviously we do not want this to be high.
- The true positive rate, TPR = TP / (TP + FN), is the fraction of actual positives that are correctly predicted as positive. Naturally, the higher the better.
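
Both rates come straight from the confusion matrix. A small sketch (made-up scores and labels) that applies two thresholds and computes TPR and FPR with the formulas above; raising the threshold lowers the FPR but also lowers the TPR, which is the trade-off described earlier:

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                   # hypothetical true labels
y_score = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]  # hypothetical output probabilities
for threshold in (0.5, 0.75):
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    print(threshold, tpr, fpr)   # 0.5 -> TPR 0.75, FPR 0.5; 0.75 -> TPR 0.25, FPR 0.25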

 

Clearly, both coordinates of the ROC curve lie in [0, 1], so the area under the curve is at most 1. Let's look at a few special cases to get a better feel for the properties of the ROC curve:
- (0,0): FPR and TPR are both 0, i.e., the classifier predicts every sample as negative
- (0,1): FPR is 0 and TPR is 1, i.e., every prediction is correct; the ideal case
- (1,0): FPR is 1 and TPR is 0, i.e., every prediction is wrong; the worst case
- (1,1): FPR and TPR are both 1, i.e., the classifier predicts every sample as positive
- TPR = FPR, the diagonal: the classifier flags actual positives and actual negatives as positive at the same rate, which is exactly what a random classifier does

This gives us a basic conclusion: if the ROC curve lies below the diagonal, the classifier is worse than a random classifier; if it lies above, it is better. Naturally, we want the ROC curve to stay above the diagonal as much as possible, i.e., to bulge toward the upper-left corner (0,1).
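
A quick numerical illustration of the diagonal (a sketch assuming NumPy is available; the scores are drawn independently of the labels, so the classifier is effectively random):

import numpy as np

rng = np.random.default_rng(0)
y_true  = rng.integers(0, 2, size=100000)   # random 0/1 labels
y_score = rng.random(size=100000)           # scores independent of the labels
for threshold in (0.2, 0.5, 0.8):
    pred = (y_score >= threshold).astype(int)
    tpr = ((pred == 1) & (y_true == 1)).sum() / (y_true == 1).sum()
    fpr = ((pred == 1) & (y_true == 0)).sum() / (y_true == 0).sum()
    print(threshold, round(tpr, 3), round(fpr, 3))   # TPR is approximately equal to FPR at every threshold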

AUC (Area Under the ROC Curve)

The ROC curve reflects the classification ability of a classifier to some degree, but it is not a single, intuitive number. We would like a metric where larger means better and smaller means worse; that metric is AUC. AUC is simply the area under the ROC curve, and it summarizes the classification ability that the ROC curve expresses.
- AUC = 1: a perfect classifier
- 0.5 < AUC < 1: better than a random classifier
- 0 < AUC < 0.5: worse than a random classifier
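
As a quick cross-check (a sketch assuming scikit-learn and NumPy are available; the labels and scores below are made up for illustration), roc_curve and roc_auc_score compute these quantities directly from (label, score) pairs:

import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0])                    # hypothetical true labels
y_score = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1])   # hypothetical output probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # points on the ROC curve
print(list(zip(fpr, tpr)))
print("AUC =", roc_auc_score(y_true, y_score))      # area under that curve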

What can AUC be used for

In my limited experience, the biggest application of AUC is the offline evaluation of click-through rate (CTR) prediction. Offline CTR evaluation plays an important role in a company's engineering process: A/B tests and full-traffic rollouts are relatively expensive in time, manpower, and resources, so a good offline metric saves a lot of all three. So why can AUC be used to evaluate CTR? Two things need to be clear first:
1. CTR prediction treats the probability output by the classifier as the estimated click-through rate. For example, the LR model commonly used in industry maps the input features to an output probability through the sigmoid function, and that probability is the estimated CTR (a minimal sketch of this mapping follows the list). Which content gets recalled is often decided by ranking on this estimated CTR.
2. AUC quantifies the classification ability expressed by the ROC curve. That ability is closely tied to the output probabilities and the thresholds: the better the classification ability (the larger the AUC), the more reasonable the output probabilities, and the more reasonable the resulting ranking.
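
For item 1, a minimal sketch of that mapping (the weights, bias, and features here are made up for illustration, not a real model):

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w = [0.8, -1.2, 0.5]   # hypothetical learned weights
b = -0.3               # hypothetical bias
x = [1.0, 0.2, 0.7]    # hypothetical feature vector for one impression
z = sum(wi * xi for wi, xi in zip(w, x)) + b
ctr_estimate = sigmoid(z)   # this probability is used as the estimated CTR
print(ctr_estimate)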

We want the classifier not only to say whether a user will click, but also to give an accurate probability that can serve as the basis for ranking. In this sense, AUC directly reflects the quality of the CTR estimates, i.e., their ranking ability.
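
This ranking view has a well-known probabilistic interpretation: AUC equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative sample (ties counted as half). A minimal sketch with made-up scores:

import itertools

def auc_by_pairs(labels, scores):
    # AUC as the fraction of (positive, negative) pairs that are ranked correctly
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p, n in itertools.product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5        # ties count as half
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 1, 0, 0, 1, 0]                       # hypothetical true labels
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]      # hypothetical output probabilities
print(auc_by_pairs(labels, scores))                     # 0.6875, the area under the ROC curve for this data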

How to compute AUC

The steps are as follows:
1. Collect the prediction results, one record per sample: (output probability, true label)
2. Group the records by output probability, producing (output probability, number of true positives at that probability, number of true negatives at that probability). This makes the later per-threshold statistics easier.
3. Sort the records by output probability in descending order
4. Going from the largest probability to the smallest, take each output probability as the classification threshold and compute the TPR and FPR at that threshold
5. Sum small rectangles to compute the area under the ROC curve, and plot the curve

The code is as follows:

import pylab as pl

def read_file(file_path, accuracy=2):
    db = []  # each record is [score, nonclk, clk]
    pos, neg = 0, 0  # number of positive / negative samples
    # read the (predicted probability, true label) pairs
    with open(file_path, 'r') as fs:
        for line in fs:
            temp = eval(line)
            # the precision of the score can be reduced here if needed, e.g.:
            # score = '%.1f' % float(temp[0])
            score = float(temp[0])
            trueLabel = int(temp[1])
            sample = [score, 0, 1] if trueLabel == 1 else [score, 1, 0]
            score, nonclk, clk = sample
            pos += clk      # positive samples
            neg += nonclk   # negative samples
            db.append(sample)
    return db, pos, neg

def get_roc(db, pos, neg):
    # sort by output probability, from largest to smallest
    db = sorted(db, key=lambda x: x[0], reverse=True)
    # optionally dump the sorted data for inspection
    with open('data.txt', 'w') as f:
        f.write(str(db))
    # compute the ROC coordinate points: each prefix of the sorted list
    # corresponds to using that sample's score as the classification threshold
    xy_arr = []
    tp, fp = 0., 0.
    for i in range(len(db)):
        tp += db[i][2]
        fp += db[i][1]
        xy_arr.append([fp / neg, tp / pos])
    return xy_arr

def get_AUC(xy_arr):
    # integrate the area under the curve by summing small rectangles
    auc = 0.
    prev_x = 0
    for x, y in xy_arr:
        if x != prev_x:
            auc += (x - prev_x) * y
            prev_x = x
    return auc

def draw_ROC(xy_arr, auc):
    x = [_v[0] for _v in xy_arr]
    y = [_v[1] for _v in xy_arr]
    pl.title("ROC curve of %s (AUC = %.4f)" % ('clk', auc))
    pl.xlabel("False Positive Rate")
    pl.ylabel("True Positive Rate")
    pl.plot(x, y)   # use pylab to plot x and y
    pl.show()       # show the plot on the screen
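
A minimal driver for the functions above, assuming the result data described below has been downloaded and saved locally as 'clk_result.txt' (a hypothetical path):

if __name__ == '__main__':
    # 'clk_result.txt' is a hypothetical local path to the (probability, label) data
    db, pos, neg = read_file('clk_result.txt')
    xy_arr = get_roc(db, pos, neg)
    auc = get_AUC(xy_arr)
    print('AUC = %.6f' % auc)
    draw_ROC(xy_arr, auc)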

Data: each record in the provided data is a (predicted probability, true label) tuple for one sample.
Data link: https://pan.baidu.com/s/1c1FUzVM (password: 1ax8)
Result: AUC = 0.747925810016, which is basically consistent with the roc_AUC value computed by Spark MLlib.

Summary

  1. The ROC curve reflects the classification ability of a classifier, taking the accuracy of its output probabilities into account.
  2. AUC quantifies the classification ability expressed by the ROC curve: the larger the AUC, the better the classification and the more reasonable the output probabilities.
  3. AUC is often used for offline evaluation of CTR prediction: the larger the AUC, the stronger the ranking ability of the predicted CTR.

References

Many experts have shared their own understanding of what AUC means. Here are a few answers that can help build intuition:
[1] How do you understand AUC in machine learning and statistics?


[2] How to understand AUC in machine learning and statistics?


[3] What are the advantages and disadvantages of precision, recall, F1 score, ROC, and AUC?


[4] How high does an AUC have to be to count as high?


Some other reference materials:
Using Python to draw the ROC curve and compute the AUC value
Precision and recall, the ROC curve and the PR curve
An introduction to ROC and AUC, and how to calculate AUC
Evaluation metrics based on the confusion matrix
Machine learning performance evaluation metrics
