Keywords:
machine learning classification metrics, clinical evaluation metrics, accuracy/precision/recall/F1, sensitivity/specificity/Youden index, ROC/AUC
Machine learning classification models are usually evaluated with accuracy, precision, recall, and the F1 score. Clinical studies usually report sensitivity, specificity, the Youden index, PPV, and NPV. The two sets of metrics are closely related, and several of them are the same quantity under different names. Because some students find these metrics confusing, this article works through and explains them, with the following goals:
- Be able to explain the above metrics to non-technical students in plain language.
- Be able to understand the relationship between clinical metrics and classification model metrics.
- Be able to use the relevant tools to calculate the above metrics quickly.
1. Confusion matrix of binary classification prediction results
A 2x2 confusion matrix represents the model's prediction results, with rows for the actual labels and columns for the predicted labels. It divides the samples into four categories:
- TP (True Positive): the number of samples that are actually positive and predicted to be positive.
- TN (True Negative): the number of samples that are actually negative and predicted to be negative.
- FP (False Positive): the number of samples that are actually negative and predicted to be positive.
- FN (False Negative): the number of samples that are actually positive and predicted to be negative.
Mnemonic: the first letter (T/F) says whether the prediction is correct, and the second letter (P/N) is the predicted label.
Actual / Predicted | Predicted positive | Predicted negative | Actual total |
---|---|---|---|
Actual positive | TP | FN | P = TP + FN |
Actual negative | FP | TN | N = FP + TN |
Predicted total | TP + FP | FN + TN | P + N |
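The four counts above can be tallied directly from paired actual and predicted labels. The following is a minimal sketch in plain Python; the function name `confusion_counts` and the sample labels are made up for illustration and are not part of any tool referenced in this article.

```python
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], pos_label: int = 1
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for a binary classification result."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == pos_label:
            # Predicted positive: correct -> TP, wrong -> FP
            if actual == pos_label:
                tp += 1
            else:
                fp += 1
        else:
            # Predicted negative: correct -> TN, wrong -> FN
            if actual == pos_label:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


# Made-up labels, for illustration only
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")  # TP=3 FP=1 TN=3 FN=1
```

Note that TP + FN equals the number of actual positives (4 here) and FP + TN the number of actual negatives (4 here), matching the row totals of the matrix.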
2. Definitions of ACC, P, R, F1, sensitivity, specificity, the Youden index, and related metrics
Metric | Formula | Explanation |
---|---|---|
Accuracy (ACC) | $\frac{TP+TN}{TP+FP+FN+TN}$ | Among all samples, the proportion that are predicted correctly |
Precision (P) | $\frac{TP}{TP+FP}$ | Among the samples predicted to be positive, the proportion that are actually positive |
Recall (R) | $\frac{TP}{TP+FN}$ | Among the samples that are actually positive, the proportion predicted to be positive |
F1 score | $2\frac{P\times R}{P+R}$ | The harmonic mean of precision and recall, a composite of the two |
Sensitivity | $\frac{TP}{TP+FN}$ | Among the samples that are actually positive, the proportion predicted to be positive; a positive case that is not recalled is a missed diagnosis |
Specificity | $\frac{TN}{TN+FP}$ | Among the samples that are actually negative, the proportion predicted to be negative; a negative case that is not recalled is a misdiagnosis |
Youden index | $Sensitivity + Specificity - 1$ | A composite measure of sensitivity and specificity |
Positive predictive value (PPV) | $\frac{TP}{TP+FP}$ | Equivalent to precision: among the people who test positive, the proportion who are truly positive; the errors are false positives |
Negative predictive value (NPV) | $\frac{TN}{TN+FN}$ | Among the people who test negative, the proportion who are truly negative; the errors are false negatives |
True Positive Rate (TPR) | $\frac{TP}{TP+FN}$ | Equivalent to sensitivity: among the samples that are actually positive, the proportion predicted to be positive |
False Positive Rate (FPR) | $\frac{FP}{TN+FP}$ | Equal to $1 - Specificity$: among the samples that are actually negative, the proportion predicted to be positive |
AUC | – | The area under the ROC curve, which plots TPR against FPR as the decision threshold varies. AUC is not affected by the ratio of positive to negative samples and reflects the model's overall performance across thresholds. |
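The formulas in the table can be collected into one helper that takes the four confusion-matrix counts. This is a sketch under stated assumptions: the function name `classification_metrics`, the dictionary keys, and the example counts are hypothetical, and a zero denominator is reported as NaN rather than raising an error.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the table's metrics from confusion-matrix counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # Report an undefined ratio as NaN instead of dividing by zero
        return numerator / denominator if denominator else float("nan")

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "ACC": ratio(tp + tn, tp + fp + tn + fn),
        "Precision/PPV": ratio(tp, tp + fp),
        "Recall/Sensitivity/TPR": sensitivity,
        "Specificity": specificity,
        "NPV": ratio(tn, tn + fn),
        "FPR": ratio(fp, fp + tn),
        # 2TP / (2TP + FP + FN) is algebraically the same as 2PR / (P + R)
        "F1": ratio(2 * tp, 2 * tp + fp + fn),
        "Youden": sensitivity + specificity - 1,
    }


# Made-up counts, for illustration only
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Precision and PPV share one key, as do recall, sensitivity, and TPR, because they are the same ratio under different names.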
Relationships among the metrics
1. Recall of the positive class = Sensitivity = True Positive Rate (TPR)
2. Precision = Positive Predictive Value (PPV)
3. False Positive Rate (FPR) = 1 - Specificity
4. 2/F1 = 1/P + 1/R, that is, F1 = 2*P*R/(P+R)
5. Youden index = Sensitivity + Specificity - 1
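These relationships can be checked numerically. The sketch below uses made-up counts and verifies that the harmonic-mean form of F1 matches the product form, that FPR is the complement of specificity, and that the Youden index can equivalently be written as TPR - FPR (which follows from items 3 and 5).

```python
import math

# Made-up counts, for illustration only
tp, fp, tn, fn = 30, 20, 40, 10

precision = tp / (tp + fp)      # 0.6
recall = tp / (tp + fn)         # 0.75, also sensitivity and TPR
specificity = tn / (tn + fp)    # 40 / 60
fpr = fp / (fp + tn)            # 20 / 60

# F1 is the harmonic mean of P and R: 2 / F1 = 1 / P + 1 / R
f1_harmonic = 2 / (1 / precision + 1 / recall)
f1_product = 2 * precision * recall / (precision + recall)
f1_counts = 2 * tp / (2 * tp + fp + fn)

youden = recall + specificity - 1

assert math.isclose(f1_harmonic, f1_product)
assert math.isclose(f1_harmonic, f1_counts)
assert math.isclose(fpr, 1 - specificity)
assert math.isclose(youden, recall - fpr)
print(f"F1={f1_product:.4f} Youden={youden:.4f}")  # F1=0.6667 Youden=0.4167
```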
In machine learning, we usually care only about identifying the positive class, so the relevant metrics are precision P, recall R, and the composite F1 score. In risk control, for example, positive cases are risky items and negative cases are normal items. A positive case that is not recalled is a missed risk, which can have serious consequences, so the requirement on R is higher. The recall of the negative class (specificity) is less informative here: negative cases far outnumber positive cases, so this metric tends to be high anyway.
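A small worked example makes the imbalance effect concrete. The counts below are hypothetical (100 positives among 10,000 samples); they show accuracy and specificity looking strong while precision and F1 on the positive class stay modest.

```python
# Hypothetical, heavily imbalanced data: 100 positives among 10,000 samples
tp, fn = 80, 20        # actual positives: recalled / missed
fp, tn = 198, 9702     # actual negatives: wrongly flagged / correctly passed

accuracy = (tp + tn) / (tp + fp + tn + fn)
specificity = tn / (tn + fp)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f}")        # accuracy=0.978
print(f"specificity={specificity:.3f}")  # specificity=0.980
print(f"recall={recall:.3f}")            # recall=0.800
print(f"precision={precision:.3f}")      # precision=0.288
print(f"F1={f1:.3f}")                    # F1=0.423
```

Only 2% of the negatives are misclassified, yet those 198 false positives outnumber the 80 true positives, which is why precision is low even though specificity is 0.98.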
In the medical field, a positive case corresponds to a positive result and a negative case to a negative result. Sensitivity is the recall of positive cases; a positive case that is not recalled is a missed diagnosis. Specificity is the recall of negative cases; a negative case that is not recalled is a misdiagnosis. PPV and NPV are the precision of positive and negative identifications respectively: a PPV error is a false positive (a misdiagnosis), and an NPV error is a false negative (a missed diagnosis). In medical diagnosis, both missed diagnoses and misdiagnoses are serious events, so metrics that reflect both positive and negative identification are in common use.
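In clinical terms, the two error rates are the complements of sensitivity and specificity. The sketch below uses a hypothetical screening result (50 diseased and 950 healthy people); the variable names `missed_diagnosis_rate` and `misdiagnosis_rate` are chosen here for illustration.

```python
# Hypothetical screening result: 50 diseased and 950 healthy people
tp, fn = 45, 5       # diseased people: detected / missed
fp, tn = 95, 855     # healthy people: wrongly flagged / correctly cleared

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

# A missed diagnosis is a false negative; a misdiagnosis is a false positive
missed_diagnosis_rate = fn / (tp + fn)   # equals 1 - sensitivity
misdiagnosis_rate = fp / (tn + fp)       # equals 1 - specificity

ppv = tp / (tp + fp)   # share of positive results that are truly diseased
npv = tn / (tn + fn)   # share of negative results that are truly healthy

print(f"sensitivity={sensitivity:.3f} missed={missed_diagnosis_rate:.3f}")
print(f"specificity={specificity:.3f} misdiagnosed={misdiagnosis_rate:.3f}")
print(f"PPV={ppv:.3f} NPV={npv:.3f}")
```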
3. Calculation tools
Open-source repository: https://github.com/donote/youden_index
For usage details, see the demo:
import numpy as np
from youden import youden_index
# Generate random labels and predicted probabilities
np.random.seed(42)
y_true = np.random.randint(0, 2, size=50)
y_score = np.random.rand(50)
df, mj_val, mf1_val, auc = youden_index(y_true, y_score, pos_label=1, step=5)
print(df)  # df stores all of the metrics; see the GitHub repository for details
The execution results are as follows:
Thr ACC PPV NPV Sens(Rec/TPR) Spec YoudenIdx F1 TrueBen TrueMal PredBen PredMal TP FP TN FN
0.000 0.540 0.540 0.000 1.000 0.000 0.000 0.701 27.000 23.000 50.000 0.000 27.000 23.000 0.000 0.000
0.050 0.540 0.543 0.500 0.926 0.087 0.013 0.685 27.000 23.000 46.000 4.000 25.000 21.000 2.000 2.000
0.100 0.580 0.571 0.625 0.889 0.217 0.106 0.696 27.000 23.000 42.000 8.000 24.000 18.000 5.000 3.000
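To show what a per-threshold table like this involves, here is a minimal plain-Python sketch that sweeps a few thresholds, computes sensitivity, specificity, and the Youden index at each, and picks the threshold with the largest Youden index. It is not the repository's implementation: the function names, the rule that a score greater than or equal to the threshold is predicted positive, and the sample data are all assumptions. The AUC is computed with the pairwise-ranking equivalence (the fraction of positive/negative pairs in which the positive sample scores higher, ties counting half) rather than by integrating the ROC curve.

```python
from typing import List, Sequence, Tuple


def youden_by_threshold(
    y_true: Sequence[int], y_score: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float, float, float]]:
    """Return (threshold, sensitivity, specificity, Youden index) per threshold."""
    n_pos = sum(1 for actual in y_true if actual == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    rows = []
    for thr in thresholds:
        # Assumed cut-off rule: score >= threshold means predicted positive
        tp = sum(1 for a, s in zip(y_true, y_score) if a == 1 and s >= thr)
        tn = sum(1 for a, s in zip(y_true, y_score) if a == 0 and s < thr)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        rows.append((thr, sensitivity, specificity, sensitivity + specificity - 1))
    return rows


def pairwise_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for a, s in zip(y_true, y_score) if a == 1]
    neg = [s for a, s in zip(y_true, y_score) if a == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and scores, for illustration only
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]

rows = youden_by_threshold(y_true, y_score, thresholds=[0.0, 0.25, 0.5, 0.75])
best = max(rows, key=lambda row: row[3])
auc_value = pairwise_auc(y_true, y_score)
print(best)       # (0.5, 0.75, 0.75, 0.5)
print(auc_value)  # 0.75
```

Choosing the threshold with the largest Youden index balances sensitivity against specificity; a different cost trade-off between missed diagnoses and misdiagnoses would justify a different cut-off.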