ROC AUC: principles and calculation in sklearn

ROC stands for Receiver Operating Characteristic.

Definitions

TPR: true positive rate, the proportion of positive samples that are correctly classified, i.e. TP / (TP + FN); generally, the larger the better.

FPR: false positive rate, the proportion of negative samples that are incorrectly classified as positive, i.e. FP / (FP + TN); generally, the smaller the better.

ROC curve: FPR on the x-axis, TPR on the y-axis.
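The two definitions above can be sketched directly from confusion-matrix counts (the counts below are illustrative values, not from a real model):

```python
# TPR and FPR from raw confusion-matrix counts.
def tpr(tp, fn):
    # true positive rate: fraction of actual positives classified correctly
    return tp / (tp + fn)

def fpr(fp, tn):
    # false positive rate: fraction of actual negatives classified as positive
    return fp / (fp + tn)

print(tpr(8, 2))   # 0.8
print(fpr(3, 7))   # 0.3
```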

roc_curve: principle and calculation

To draw an ROC curve, we need to compute the corresponding FPR and TPR values.

For a given set of predicted probabilities, different thresholds yield different predicted labels. For example, thresholding the scores at 0.3 versus 0.5 can produce entirely different predictions, and therefore different TPR and FPR values. The ROC curve is built from the (FPR, TPR) pairs computed at these different thresholds.
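A minimal sketch of this effect, using the same scores as the example below: the two thresholds 0.3 and 0.5 turn one score vector into two different label vectors.

```python
import numpy as np

# The same predicted scores give different labels at different thresholds.
scores = np.array([0.1, 0.4, 0.35, 0.8])

pred_at_03 = (scores >= 0.3).astype(int)  # threshold 0.3
pred_at_05 = (scores >= 0.5).astype(int)  # threshold 0.5

print(pred_at_03)  # [0 1 1 1]
print(pred_at_05)  # [0 0 0 1]
```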

The following example is taken from https://www.w3cschool.cn/doc_scikit_learn/scikit_learn-modules-generated-sklearn-metrics-roc_curve.html?lang=en

>>> import numpy as np
>>> from sklearn import metrics
>>> y = np.array([1, 1, 2, 2])
>>> scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> fpr, tpr, thresholds = metrics.roc_curve(y, scores, pos_label=2)
>>> fpr
array([ 0. ,  0.5,  0.5,  1. ])
>>> tpr
array([ 0.5,  0.5,  1. ,  1. ])
>>> thresholds
array([ 0.8 ,  0.4 ,  0.35,  0.1 ])

roc_curve takes the four distinct score values as thresholds, computes fpr and tpr at each threshold, and the ROC curve is then drawn from these (fpr, tpr) pairs.
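The same procedure can be reproduced by hand, sweeping each distinct score as a threshold from highest to lowest and counting true and false positives; the result matches the fpr and tpr arrays returned above.

```python
import numpy as np

# Recompute the (fpr, tpr) points manually, one threshold per distinct score.
y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])
pos = (y == 2)  # pos_label=2

fprs, tprs = [], []
for thr in sorted(set(scores), reverse=True):
    pred_pos = scores >= thr                      # predict positive at this threshold
    tprs.append((pred_pos & pos).sum() / pos.sum())       # TP / (TP + FN)
    fprs.append((pred_pos & ~pos).sum() / (~pos).sum())   # FP / (FP + TN)

print(fprs)  # [0.0, 0.5, 0.5, 1.0]
print(tprs)  # [0.5, 0.5, 1.0, 1.0]
```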

auc: principle and calculation

AUC is short for Area Under the Curve, i.e. the area under the ROC curve. sklearn computes this value with the trapezoidal rule. For the example above, the code is:

>>> metrics.auc(fpr, tpr)
0.75
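The trapezoidal rule itself is a one-liner; a sketch applying it to the fpr/tpr arrays from the example reproduces the same 0.75:

```python
import numpy as np

# Trapezoidal rule: sum of trapezoid areas between consecutive fpr points.
fpr = np.array([0.0, 0.5, 0.5, 1.0])
tpr = np.array([0.5, 0.5, 1.0, 1.0])

area = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
print(area)  # 0.75
```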

roc_auc_score: principle and calculation

For binary classification, roc_auc_score gives the same result: it computes the AUC.
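This equivalence is easy to check on the running example: calling roc_auc_score directly gives the same 0.75 as the roc_curve + auc pipeline.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

y = np.array([1, 1, 2, 2])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Explicit two-step computation...
fpr, tpr, _ = roc_curve(y, scores, pos_label=2)
print(auc(fpr, tpr))             # 0.75

# ...and the one-call shortcut (the larger label, 2, is taken as positive).
print(roc_auc_score(y, scores))  # 0.75
```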

For multi-class problems, there are two schemes: One-vs-Rest and One-vs-One, selected with the multi_class parameter as "ovr" and "ovo" respectively.
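A sketch of both options on a toy 3-class problem (the data below is made up for illustration; each row of y_score is a probability vector summing to 1, as roc_auc_score requires in the multi-class case). Because these toy scores always rank the true class highest, both averages come out to a perfect 1.0.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 1, 2, 2])
y_score = np.array([
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
])

print(roc_auc_score(y_true, y_score, multi_class="ovr"))  # 1.0
print(roc_auc_score(y_true, y_score, multi_class="ovo"))  # 1.0
```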

ovr: taking 3-class classification as an example, the confusion matrix is split into three binary layers: the first layer treats class C1 as positive and all other classes as negative, the second treats C2 as positive and the rest as negative, and the third treats C3 as positive and the rest as negative.

In this case, we also need to specify how to combine the per-class results into a total score; sklearn's average parameter offers four options:

micro: pool the counts over all classes, i.e.

$ TPR= \frac{TP1+TP2+TP3}{TP1+FN1+TP2+FN2+TP3+FN3} $

$ FPR= \frac{FP1+FP2+FP3}{FP1+TN1+FP2+TN2+FP3+TN3} $

Then draw the ROC curve from these pooled rates and take its area as the score.
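A minimal sketch of the micro pooling above, with illustrative per-class counts (TP1..TP3 etc. are made-up values, not from a real model):

```python
import numpy as np

# Per-class counts from the three one-vs-rest layers (illustrative).
tp = np.array([30, 20, 10])   # TP1, TP2, TP3
fn = np.array([5, 10, 5])     # FN1, FN2, FN3
fp = np.array([4, 6, 2])      # FP1, FP2, FP3
tn = np.array([61, 64, 83])   # TN1, TN2, TN3

# Pool the counts first, then form a single TPR and FPR.
micro_tpr = tp.sum() / (tp.sum() + fn.sum())  # 60 / 80
micro_fpr = fp.sum() / (fp.sum() + tn.sum())  # 12 / 220
print(micro_tpr)  # 0.75
```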

macro: give each class (layer) equal weight, i.e.

$ TPR= \frac{1}{3}(\frac{TP1}{TP1+FN1}+\frac{TP2}{TP2+FN2}+\frac{TP3}{TP3+FN3}) $

$ FPR= \frac{1}{3}(\frac{FP1}{FP1+TN1}+\frac{FP2}{FP2+TN2}+\frac{FP3}{FP3+TN3}) $

weighted: use each class's proportion of the samples as its weight when combining the per-class TPR and FPR, i.e.

$ TPR= \frac{TP1}{TP1+FN1}w_1+\frac{TP2}{TP2+FN2}w_2+\frac{TP3}{TP3+FN3}w_3 $

$ FPR= \frac{FP1}{FP1+TN1}w_1+\frac{FP2}{FP2+TN2}w_2+\frac{FP3}{FP3+TN3}w_3 $
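The macro and weighted formulas above differ only in the weights; a sketch on the same illustrative counts as before (weights taken as each class's share of the positive samples):

```python
import numpy as np

# Illustrative per-class counts for a 3-class one-vs-rest split.
tp = np.array([30, 20, 10])
fn = np.array([5, 10, 5])

per_class_tpr = tp / (tp + fn)   # TPR of each layer
n_c = tp + fn                    # positives per class: 35, 30, 15
w = n_c / n_c.sum()              # class proportions as weights w1, w2, w3

macro_tpr = per_class_tpr.mean()          # equal weight per class
weighted_tpr = (per_class_tpr * w).sum()  # weight by class frequency
print(macro_tpr, weighted_tpr)
```

Note that with these weights, the weighted TPR collapses to total TP over total positives (60/80 = 0.75), while the macro TPR treats the small class C3 on equal footing with the others.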

samples: this method can be employed when the class distribution is very imbalanced.


Source: www.cnblogs.com/webbery/p/12123148.html