[Evaluation indicators] Principles and calculation of sensitivity, specificity, PPV, NPV, and related indicators

Keywords:
machine learning classification indicators, clinical evaluation indicators, accuracy/precision/recall/F1, sensitivity/specificity/Youden index, ROC/AUC

Machine learning classification models are usually evaluated with accuracy, precision, recall, and the F1 score. Clinical studies usually report sensitivity, specificity, the Youden index, PPV, and NPV. The two sets of indicators are closely related. Because some students find them confusing, this article explains them with the following goals:

  • Be able to explain the above indicators to non-technical students in plain language;
  • Be able to understand how the clinical indicators relate to the classification model indicators;
  • Be able to use the relevant tools to calculate the above indicators quickly.

1. Confusion matrix of binary classification prediction results

A 2x2 confusion matrix represents the model's prediction results, with rows for the actual labels and columns for the predicted labels. It divides the samples into four categories:

  • TP (True Positive): the number of samples that are actually positive and predicted to be positive.
  • TN (True Negative): the number of samples that are actually negative and predicted to be negative.
  • FP (False Positive): the number of samples that are actually negative and predicted to be positive.
  • FN (False Negative): the number of samples that are actually positive and predicted to be negative.

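Memory aid: the first letter (T/F) says whether the prediction is correct, and the second letter (P/N) is the predicted label. FP, for example, is a sample predicted positive (P) where that prediction is wrong (F), so it is actually negative.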

| Actual \ Predicted | Predicted positive | Predicted negative | Actual total |
|---|---|---|---|
| Actually positive | TP | FN | P = TP + FN |
| Actually negative | FP | TN | N = FP + TN |
| Predicted total | TP + FP | FN + TN | P + N |
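
The four counts can be tallied directly from paired actual and predicted labels. The following is a minimal sketch in plain Python; the function name `confusion_counts` and the sample labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred, pos_label=1):
    """Return (TP, FP, TN, FN) for binary labels.

    Any label other than pos_label is treated as negative.
    """
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        actual_pos = actual == pos_label
        pred_pos = predicted == pos_label
        if actual_pos and pred_pos:
            tp += 1      # actually positive, predicted positive
        elif not actual_pos and pred_pos:
            fp += 1      # actually negative, predicted positive
        elif not actual_pos and not pred_pos:
            tn += 1      # actually negative, predicted negative
        else:
            fn += 1      # actually positive, predicted negative
    return tp, fp, tn, fn


# Illustrative labels (not from the article's demo)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(confusion_counts(y_true, y_pred))  # (3, 1, 3, 1)
```

Here P = TP + FN = 4 and N = FP + TN = 4, which matches the number of actual positives and negatives in `y_true`.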

2. Description of indicators such as ACC/P/R/F1/sensitivity/specificity/Youden index

| Indicator | Formula | Explanation |
|---|---|---|
| Accuracy (ACC) | $\frac{TP+TN}{TP+FP+FN+TN}$ | Of all samples, the proportion that are predicted correctly |
| Precision (P) | $\frac{TP}{TP+FP}$ | Of the samples predicted positive, the proportion that are actually positive |
| Recall (R) | $\frac{TP}{TP+FN}$ | Of the samples that are actually positive, the proportion predicted positive |
| F1 score | $2\frac{P \times R}{P+R}$ | A composite of precision and recall |
| Sensitivity | $\frac{TP}{TP+FN}$ | Of the samples that are actually positive, the proportion predicted positive; a positive case that is not recalled is a missed diagnosis |
| Specificity | $\frac{TN}{TN+FP}$ | Of the samples that are actually negative, the proportion predicted negative; a negative case that is not recalled is a misdiagnosis |
| Youden index | $Sensitivity + Specificity - 1$ | A composite of sensitivity and specificity |
| Positive predictive value (PPV) | $\frac{TP}{TP+FP}$ | Equivalent to precision: of the people who test positive, the proportion who really are positive; the errors are false positives |
| Negative predictive value (NPV) | $\frac{TN}{TN+FN}$ | Of the people who test negative, the proportion who really are negative; the errors are false negatives |
| True positive rate (TPR) | $\frac{TP}{TP+FN}$ | Equivalent to sensitivity: of the samples that are actually positive, the proportion predicted positive |
| False positive rate (FPR) | $\frac{FP}{TN+FP}$ | Equal to $1 - Specificity$: of the samples that are actually negative, the proportion predicted positive |
| AUC | Area under the ROC curve | The ROC curve plots TPR against FPR across different thresholds. AUC is largely insensitive to the ratio of positive to negative samples and reflects the model's overall performance across thresholds |
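
To make the AUC entry concrete, the sketch below sweeps the decision threshold over every distinct score, records one (FPR, TPR) point per threshold, and integrates the resulting curve with the trapezoidal rule. The function names `roc_points` and `auc_trapezoid` and the toy data are illustrative assumptions, not part of any library, and the code assumes both classes are present.

```python
def roc_points(y_true, y_score, pos_label=1):
    """Return ROC points (FPR, TPR), starting at (0, 0) and ending at (1, 1).

    A sample is predicted positive when its score >= threshold.
    Assumes y_true contains at least one positive and one negative sample.
    """
    n_pos = sum(1 for y in y_true if y == pos_label)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= thr and y == pos_label)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= thr and y != pos_label)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data: 3 positives and 3 negatives with distinct scores
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc_trapezoid(roc_points(y_true, y_score)))  # about 0.889 (8/9)
```

The value 8/9 also has a direct reading: of the 9 possible (positive, negative) pairs in the toy data, the positive sample has the higher score in 8.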

Relationships among the indicators

1. Recall of the positive class (Recall) = Sensitivity = True positive rate (TPR)
2. Precision = Positive predictive value (PPV)
3. False positive rate (FPR) = 1 - Specificity
4. 2/F1 = 1/P + 1/R, that is, F1 = 2*P*R/(P+R)
5. Youden index = Sensitivity + Specificity - 1
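
These relationships can be checked numerically. The sketch below computes each indicator from the four confusion-matrix counts and asserts the identities; the function name `binary_metrics` and the counts are hypothetical values chosen for illustration, and the code assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute the indicators from the four confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    sensitivity = tp / (tp + fn)   # also recall R and TPR
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # also precision P
    npv = tn / (tn + fn)
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "P": ppv,
        "R": sensitivity,
        "F1": 2 * ppv * sensitivity / (ppv + sensitivity),
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "Youden": sensitivity + specificity - 1,
        "PPV": ppv,
        "NPV": npv,
        "TPR": sensitivity,
        "FPR": fp / (tn + fp),
    }


# Hypothetical counts, for illustration only
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)

assert m["R"] == m["Sensitivity"] == m["TPR"]                  # relationship 1
assert m["P"] == m["PPV"]                                      # relationship 2
assert abs(m["FPR"] - (1 - m["Specificity"])) < 1e-12          # relationship 3
assert abs(2 / m["F1"] - (1 / m["P"] + 1 / m["R"])) < 1e-12    # relationship 4
print({name: round(value, 3) for name, value in m.items()})
```

Relationship 4 says that F1 is the harmonic mean of P and R, so it is pulled toward whichever of the two is smaller.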

In machine learning, we usually care mainly about identifying the positive class, so the corresponding indicators are precision P, recall R, and the composite F1 score. In risk control, for example, positive cases are risky items and negative cases are normal items. A positive case that is not recalled may let a serious risk slip through, so the requirement on R is higher. The recall of the negative class (specificity) is less informative here: negative cases far outnumber positive cases, so this indicator is naturally high.
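
The effect of class imbalance is easy to see with made-up numbers. In the sketch below, all counts are hypothetical: 10 risky items, 990 normal items, and a degenerate model that predicts "normal" for everything.

```python
# Hypothetical, heavily imbalanced data: 10 positive (risky) and 990 negative (normal) items.
# A degenerate "model" predicts negative for every item.
tp, fn = 0, 10     # every positive item is missed
tn, fp = 990, 0    # every negative item is classified correctly

accuracy = (tp + tn) / (tp + fp + tn + fn)
specificity = tn / (tn + fp)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.2f}, specificity={specificity:.2f}, recall={recall:.2f}")
# accuracy=0.99, specificity=1.00, recall=0.00
```

Accuracy and specificity look excellent even though not a single risky item is caught, which is why recall is the indicator to watch in this setting.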
In the medical field, a positive case corresponds to a "positive" result and a negative case to a "negative" result. Sensitivity is the recall of positive cases; a positive case that is not recalled is a missed diagnosis. PPV and NPV are the precision of positive and negative predictions, respectively. An error counted against PPV is a false positive, that is, a misdiagnosis, and an error counted against NPV is a false negative, that is, a missed diagnosis. In medical diagnosis, both missed diagnoses and misdiagnoses are serious events, so indicators that reflect both positive and negative recognition are commonly used.

3. Calculation tools

Source code: https://github.com/donote/youden_index

For specific usage, please refer to the demo:

import numpy as np
from youden import youden_index

# Generate random labels and predicted probabilities
np.random.seed(42)
y_true = np.random.randint(0, 2, size=50)
y_score = np.random.rand(50)
df, mj_val, mf1_val, auc = youden_index(y_true, y_score, pos_label=1, step=5)
print(df)    # df holds all the indicators; see the GitHub repository for details

The execution results are as follows:

 Thr    ACC    PPV    NPV Sens(Rec/TPR)   Spec YoudenIdx     F1 TrueBen TrueMal PredBen PredMal      TP      FP      TN      FN
0.000  0.540  0.540  0.000         1.000  0.000     0.000  0.701  27.000  23.000  50.000   0.000  27.000  23.000   0.000   0.000
0.050  0.540  0.543  0.500         0.926  0.087     0.013  0.685  27.000  23.000  46.000   4.000  25.000  21.000   2.000   2.000
0.100  0.580  0.571  0.625         0.889  0.217     0.106  0.696  27.000  23.000  42.000   8.000  24.000  18.000   5.000   3.000
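
As a sanity check, the third row (threshold 0.100) can be reproduced by hand from its own counts, TP = 24, FP = 18, TN = 5, FN = 3, using the formulas in section 2. F1 is written here in the equivalent count form 2TP / (2TP + FP + FN).

```latex
\begin{aligned}
\mathrm{ACC} &= \frac{24 + 5}{24 + 18 + 5 + 3} = \frac{29}{50} = 0.580 \\
\mathrm{PPV} &= \frac{24}{24 + 18} = \frac{24}{42} \approx 0.571 \\
\mathrm{NPV} &= \frac{5}{5 + 3} = \frac{5}{8} = 0.625 \\
\mathrm{Sensitivity} &= \frac{24}{24 + 3} = \frac{24}{27} \approx 0.889 \\
\mathrm{Specificity} &= \frac{5}{5 + 18} = \frac{5}{23} \approx 0.217 \\
\mathrm{Youden} &= 0.889 + 0.217 - 1 \approx 0.106 \\
F1 &= \frac{2 \times 24}{2 \times 24 + 18 + 3} = \frac{48}{69} \approx 0.696
\end{aligned}
```

All seven values agree with the printed row.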

Also published on: AI gas station

Origin blog.csdn.net/iling5/article/details/130526176