Model assessment methods

1 Commonly used model assessment methods

Typically, a single score cannot fully assess a machine learning model. Judging a model as simply "good" or "bad" in isolation from the actual application scenario is flawed; the model should be evaluated in the context of the scene where it will be used. The following describes the common assessment methods for classification and regression models.

Common assessment methods for classification models: the confusion matrix, error rate and accuracy, precision and recall, and ROC/AUC, each covered below.

Common assessment methods for regression models: error measures such as mean absolute error (MAE) and mean squared error (MSE).

2 Confusion matrix
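
The confusion matrix cross-tabulates predicted labels against actual labels. For binary classification it collects the four outcome counts defined in Section 4 below:

                    Predicted positive   Predicted negative
  Actual positive   TP                   FN
  Actual negative   FP                   TN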

3 Error rate and accuracy

  1. Error rate: the proportion of misclassified samples out of the total number of samples.
  2. Accuracy: the proportion of correctly classified samples out of the total number of samples.
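
As a quick illustration, here is a minimal Python sketch computing both quantities (the labels and predictions are made-up toy data):

```python
# Made-up toy data for illustration only.
y_true = [1, 0, 1, 1, 0]  # actual labels
y_pred = [1, 0, 0, 1, 1]  # predicted labels

n_correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = n_correct / len(y_true)  # correctly classified / total = 3/5
error_rate = 1 - accuracy           # misclassified / total = 2/5

print(accuracy, error_rate)  # 0.6 0.4
```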

4 Precision and recall

A classifier's predictions on a sample fall into four cases:

  1. True Positive (TP): predicted positive, actually positive
  2. True Negative (TN): predicted negative, actually negative
  3. False Positive (FP): predicted positive, actually negative
  4. False Negative (FN): predicted negative, actually positive

Then:

Precision = TP / (TP + FP)

Understanding: among the samples predicted as positive, the proportion that are actually positive. Contrast this with accuracy, which is the proportion of correctly predicted samples, both positive and negative, out of all samples.
For example: among all patients we predict to have malignant tumors, the percentage who actually have malignant tumors; the higher, the better.

Recall = TP / (TP + FN)

Understanding: the proportion of all actual positive samples that are correctly predicted as positive.
For example: among all patients who actually have malignant tumors, the percentage successfully predicted as having malignant tumors; the higher, the better.
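
Both formulas are easy to express in code; a minimal Python sketch with made-up toy labels:

```python
# Made-up toy data for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]

pairs = list(zip(y_true, y_pred))
TP = sum(t == 1 and p == 1 for t, p in pairs)  # predicted positive, actually positive
FP = sum(t == 0 and p == 1 for t, p in pairs)  # predicted positive, actually negative
FN = sum(t == 1 and p == 0 for t, p in pairs)  # predicted negative, actually positive

precision = TP / (TP + FP)  # 2/3: of the samples predicted positive, how many are right
recall = TP / (TP + FN)     # 2/3: of the actual positives, how many were found
```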

5 ROC and AUC

ROC stands for Receiver Operating Characteristic.

The area under the ROC curve is the AUC (Area Under Curve).

AUC measures the performance (generalization ability) of binary-classification machine learning algorithms.

The ROC curve is drawn by sweeping a threshold over the classifier's continuous output, computing a (false positive rate, true positive rate) pair for each threshold, and plotting the pairs with the false positive rate on the abscissa and the true positive rate on the ordinate. The larger the area under the curve, the more accurate the inference. On the ROC curve, the point closest to the upper-left corner of the figure corresponds to the threshold with a low false positive rate and a high true positive rate.

For classification algorithms, the main evaluation measures are Precision, Recall, and F-score. Below is an example of an ROC curve.

The abscissa of the ROC curve is the False Positive Rate (FPR) and the ordinate is the True Positive Rate (TPR), where
\[TPR = \frac{TP}{TP + FN}, \qquad FPR = \frac{FP}{FP + TN}\]
The following highlights four points and a line on the ROC curve.
The first point, (0, 1), i.e., FPR = 0 and TPR = 1, means FN (False Negative) = 0 and FP (False Positive) = 0. This is a perfect classifier: it classifies every sample correctly.
The second point, (1, 0), i.e., FPR = 1 and TPR = 0, is the worst classifier: it successfully avoids every correct answer.
The third point, (0, 0), i.e., FPR = TPR = 0, so FP (False Positive) = TP (True Positive) = 0: the classifier predicts every sample as negative (Negative).
The fourth point, (1, 1), i.e., FPR = TPR = 1: the classifier predicts every sample as positive (Positive).
From this analysis, the closer the ROC curve is to the upper-left corner, the better the classifier's performance.

The area under the ROC curve is called the AUC (Area Under Curve). It judges a learner's performance more intuitively: the larger the AUC, the better.

6 How to draw an ROC curve

The figure below is an example with 20 test samples. The "Class" column gives each test sample's true label (p for a positive sample, n for a negative sample), and "Score" gives the probability that the sample is positive.

Steps:
1. Assume the probability of each sample being classified as positive has been obtained, and sort the samples by this probability in descending order.
2. From the highest to the lowest, take each "Score" value in turn as the threshold: when a test sample's probability of being positive is greater than or equal to the threshold, it is predicted positive, otherwise negative. For example, the 4th sample in the figure has a Score of 0.6, so samples 1, 2, 3, and 4 are all predicted positive, because their Score values are all at least 0.6, while the remaining samples are predicted negative.
3. Each choice of threshold yields one pair of FPR and TPR values, i.e., one point on the ROC curve; in total this gives 20 pairs of FPR and TPR values.
4. Plot the points obtained in step 3 (a code sketch follows).
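
If scikit-learn is available, steps 1 to 3 correspond to what sklearn.metrics.roc_curve computes. A minimal sketch (using the small four-sample data set from Section 7, since the 20-sample table is not reproduced here):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

y_true = [0, 0, 1, 1]           # true labels (data from Section 7)
scores = [0.1, 0.4, 0.35, 0.8]  # predicted probability of being positive

# roc_curve sweeps the thresholds from high to low, as in steps 1-3.
fpr, tpr, thresholds = roc_curve(y_true, scores)

plt.plot(fpr, tpr, marker="o")          # step 4: plot the points
plt.xlabel("False Positive Rate (FPR)")
plt.ylabel("True Positive Rate (TPR)")
plt.show()
```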

7 How to compute TPR and FPR

7.1 Analyze the data

y_true = [0, 0, 1, 1]; scores = [0.1, 0.4, 0.35, 0.8]

7.2 Tabulate

The four samples, listed with their scores and true labels:

  sample   score   y_true
  1        0.1     0
  2        0.4     0
  3        0.35    1
  4        0.8     1

7.3 Take each score value in turn as the cutoff and compute TPR and FPR.

When the cutoff is 0.1:
Any sample with score >= 0.1 is predicted positive. Since all 4 samples have scores >= 0.1, every sample's predicted class is P.
scores = [0.1, 0.4, 0.35, 0.8]; y_true = [0, 0, 1, 1]; y_pred = [1, 1, 1, 1]
Positive and negative counts: TP = 2, FN = 0, FP = 2, TN = 0.

This gives:
TPR = TP/(TP+FN) = 1; FPR = FP/(TN+FP) = 1

When the cutoff is 0.35:
scores = [0.1, 0.4, 0.35, 0.8]; y_true = [0, 0, 1, 1]; y_pred = [0, 1, 1, 1]
Positive and negative counts: TP = 2, FN = 0, FP = 1, TN = 1.

This gives:
TPR = TP/(TP+FN) = 1; FPR = FP/(TN+FP) = 0.5

When the cutoff is 0.4:
scores = [0.1, 0.4, 0.35, 0.8]; y_true = [0, 0, 1, 1]; y_pred = [0, 1, 0, 1]
Positive and negative counts: TP = 1, FN = 1, FP = 1, TN = 1.

This gives:
TPR = TP/(TP+FN) = 0.5; FPR = FP/(TN+FP) = 0.5

When the cutoff is 0.8:
scores = [0.1, 0.4, 0.35, 0.8]; y_true = [0, 0, 1, 1]; y_pred = [0, 0, 0, 1]
Positive and negative counts: TP = 1, FN = 1, FP = 0, TN = 2.

This gives:
TPR = TP/(TP+FN) = 0.5; FPR = FP/(TN+FP) = 0

7.4 Using the TPR and FPR values, plot with FPR as the horizontal axis and TPR as the vertical axis.
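
All of Sections 7.3 and 7.4 can be reproduced with a short Python loop; a sketch using the same y_true and scores:

```python
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# Take each score, from high to low, as the cutoff (Section 7.3).
for cutoff in sorted(scores, reverse=True):
    y_pred = [1 if s >= cutoff else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    TP = sum(t == 1 and p == 1 for t, p in pairs)
    FN = sum(t == 1 and p == 0 for t, p in pairs)
    FP = sum(t == 0 and p == 1 for t, p in pairs)
    TN = sum(t == 0 and p == 0 for t, p in pairs)
    print(f"cutoff={cutoff}: TPR={TP / (TP + FN)}, FPR={FP / (FP + TN)}")

# cutoff=0.8:  TPR=0.5, FPR=0.0
# cutoff=0.4:  TPR=0.5, FPR=0.5
# cutoff=0.35: TPR=1.0, FPR=0.5
# cutoff=0.1:  TPR=1.0, FPR=1.0
```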

8 How to compute AUC

  • Sort the coordinate points by the horizontal coordinate FPR.
  • Compute the spacing \(dx\) between the \(i\)-th point and the \((i+1)\)-th point.
  • Take the vertical coordinate \(y\) of the \(i\)-th (or the \((i+1)\)-th) point.
  • Compute the area element \(ds = y\,dx\).
  • Accumulate the area elements to obtain the AUC (a sketch follows).
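
A minimal Python sketch of this accumulation, applied to the four (FPR, TPR) points computed in Section 7 and taking each strip's left endpoint for \(y\):

```python
# (FPR, TPR) points from Section 7, sorted by FPR (step 1).
points = [(0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]

auc = 0.0
for i in range(len(points) - 1):
    dx = points[i + 1][0] - points[i][0]  # spacing between neighbouring points (step 2)
    y = points[i][1]                      # vertical coordinate of the i-th point (step 3)
    auc += y * dx                         # area element ds = y * dx (steps 4 and 5)

print(auc)  # 0.75
```

Because this empirical ROC curve is a step function, the left- and right-endpoint choices give the same 0.75 here; sklearn.metrics.roc_auc_score on the same data also returns 0.75.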

9 Why use ROC and AUC to evaluate classifiers

There are many ways to evaluate a model, so why also use ROC and AUC?
Because the ROC curve has an excellent property: when the distribution of positive and negative samples in the test set changes, the ROC curve stays unchanged. Real data sets frequently exhibit class imbalance, i.e., a large gap between the numbers of positive and negative samples, and the positive/negative mix in the test data may also change over time.

10 An intuitive understanding of AUC

The figure below shows three different AUC values:

AUC is an evaluation metric for binary classification models: it is the probability that a positive example is ranked ahead of a negative example. Other evaluation metrics include precision, accuracy, and recall, but AUC is used more often than these three.
In classification models the predictions usually come as probabilities. To compute accuracy, a threshold is normally set by hand to convert each probability into a class label, so this threshold largely determines the computed accuracy.
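
A small illustration of this threshold dependence, reusing the four-sample data from Section 7:

```python
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# The same scores yield different accuracies at different thresholds.
for threshold in (0.05, 0.5):
    y_pred = [1 if s >= threshold else 0 for s in scores]
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(threshold, accuracy)  # 0.05 -> 0.5, 0.5 -> 0.75
```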
Example:
Suppose a trained binary classifier predicts on 10 samples (5 positive, 5 negative). The best possible result, with labels sorted by score from high to low, is [1, 1, 1, 1, 1, 0, 0, 0, 0, 0], i.e., all 5 positives rank ahead of all 5 negatives, so the probability of a positive ranking ahead of a negative is 100%. We then draw its ROC curve; since there are 10 samples, we trace 10 points besides the origin, as follows:

The points are traced by scanning the samples from the highest score to the lowest. Starting at the origin, every 1 encountered moves one minimal step in the positive y direction, here 1/5 = 0.2; every 0 encountered moves one minimal step in the positive x direction, also 0.2. It is easy to see that the AUC of the figure above equals 1, confirming that the probability of a positive ranking ahead of a negative is indeed 100%.

Now suppose the prediction sequence is [1, 1, 1, 1, 0, 1, 0, 0, 0, 0].

The AUC of the corresponding figure is 0.96, which equals the probability of a positive ranking ahead of a negative: 0.8 × 1 + 0.2 × 0.8 = 0.96. The shaded area in the upper-left corner is the probability of a negative ranking ahead of a positive: 0.2 × 0.2 = 0.04.

Finally, suppose the prediction sequence is [1, 1, 1, 0, 1, 0, 1, 0, 0, 0].

The AUC of the corresponding figure is 0.88, which equals the probability of a positive ranking ahead of a negative: 0.6 × 1 + 0.2 × 0.8 + 0.2 × 0.6 = 0.88, while the shaded upper-left area is the probability of a negative ranking ahead of a positive: 0.2 × 0.2 × 3 = 0.12.
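
This rank interpretation is easy to verify in code: over all positive-negative pairs, count the fraction in which the positive is ranked first. A minimal sketch:

```python
def auc_from_ranking(labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive appears earlier in the score-sorted label sequence."""
    n_pos = labels.count(1)
    n_neg = labels.count(0)
    ordered = 0
    for i, label in enumerate(labels):
        if label == 1:
            # Every negative after this positive is a correctly ordered pair.
            ordered += labels[i + 1:].count(0)
    return ordered / (n_pos * n_neg)

print(auc_from_ranking([1, 1, 1, 1, 1, 0, 0, 0, 0, 0]))  # 1.0
print(auc_from_ranking([1, 1, 1, 1, 0, 1, 0, 0, 0, 0]))  # 0.96
print(auc_from_ranking([1, 1, 1, 0, 1, 0, 1, 0, 0, 0]))  # 0.88
```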
