Accuracy, precision, recall, true positive rate, false positive rate, ROC/AUC

Foreword

  Recently these terms kept confusing me, and I would forget them soon after reading about them, so I looked up some material and wrote this note.
When we design a deep learning model, we often need to evaluate it, and these metrics are what that evaluation relies on. Before introducing all of these rates, let me first explain the confusion matrix, shown in the following table:
                 Predicted 1 (P)    Predicted 0 (N)
  Actual 1       TP                 FN
  Actual 0       FP                 TN

  • P (Positive): the predicted value is 1
  • N (Negative): the predicted value is 0
  • T (True): the prediction is correct
  • F (False): the prediction is wrong

TP: predicted 1 and the prediction is correct, i.e., the actual value is 1
FP: predicted 1 but the prediction is wrong, i.e., the actual value is 0
FN: predicted 0 but the prediction is wrong, i.e., the actual value is 1
TN: predicted 0 and the prediction is correct, i.e., the actual value is 0
A simple way to remember the confusion matrix: the first letter tells you whether the prediction is correct, and the second letter tells you the predicted value.
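To make the four cells concrete, here is a minimal sketch in Python (assuming NumPy is available; the labels `y_true`/`y_pred` are made-up toy data) that counts TP, FP, FN and TN directly from the definitions above:

```python
import numpy as np

# Toy labels (illustrative only): 1 = positive, 0 = negative
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # actual values
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])  # predicted values

# Count each cell of the confusion matrix from the definitions above
TP = np.sum((y_pred == 1) & (y_true == 1))  # predicted 1, actually 1
FP = np.sum((y_pred == 1) & (y_true == 0))  # predicted 1, actually 0
FN = np.sum((y_pred == 0) & (y_true == 1))  # predicted 0, actually 1
TN = np.sum((y_pred == 0) & (y_true == 0))  # predicted 0, actually 0

print(TP, FP, FN, TN)  # 3 1 1 3 for this toy data
```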

1. Accuracy

After understanding the confusion matrix, let's look at accuracy.
  Accuracy is the proportion of correctly predicted samples among all samples. The formula is:
Accuracy = \frac{TP+TN}{TP+TN+FP+FN}
  Although accuracy measures overall correctness, it becomes nearly useless when the classes are unbalanced. For example, if positive samples make up 95% of a dataset and negative samples only 5%, then simply predicting every sample as positive already yields 95% accuracy, which is clearly unreasonable. It is precisely this defect of accuracy that gave rise to precision and recall.
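A quick sketch of this pitfall, reusing the 95/5 split from the text (NumPy assumed; the do-nothing classifier that predicts everything positive is of course only illustrative):

```python
import numpy as np

def accuracy(TP, TN, FP, FN):
    # Accuracy = (TP + TN) / (TP + TN + FP + FN)
    return (TP + TN) / (TP + TN + FP + FN)

# Unbalanced toy data from the text: 95% positives, 5% negatives
y_true = np.array([1] * 95 + [0] * 5)
y_pred = np.ones_like(y_true)  # naively predict every sample as positive

TP = np.sum((y_pred == 1) & (y_true == 1))  # 95
FP = np.sum((y_pred == 1) & (y_true == 0))  # 5
FN = np.sum((y_pred == 0) & (y_true == 1))  # 0
TN = np.sum((y_pred == 0) & (y_true == 0))  # 0

print(accuracy(TP, TN, FP, FN))  # 0.95, even though the "model" learned nothing
```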

2. Precision

  Precision, also called the precision rate, is defined with respect to the prediction results: among all the samples predicted to be positive, it is the proportion that are actually positive. It expresses how confident we can be that a predicted positive is indeed correct. The formula is:
Precision = \frac{TP}{TP+FP}
In other words, precision is the fraction of what you flagged as positive that is actually right.

  Note: precision measures how accurate the positive predictions are, while accuracy measures overall prediction correctness, covering both positive and negative samples.
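A minimal sketch of the formula (the helper name `precision` and the counts are illustrative):

```python
def precision(TP, FP):
    # Precision = TP / (TP + FP): of all samples predicted positive,
    # the fraction that are actually positive
    return TP / (TP + FP) if (TP + FP) > 0 else 0.0

# Illustrative counts: 8 correct positive predictions, 2 false alarms
print(precision(TP=8, FP=2))  # 0.8
```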

3. Recall rate

  Recall, also called the recall rate, is defined with respect to the original samples: among the samples that are actually positive, it is the proportion that are predicted to be positive. The formula is:
Recall = \frac{TP}{TP+FN}
In other words, recall is the fraction of the actually positive samples that we manage to find.

  The higher the recall, the larger the fraction of truly positive samples that get caught. Its spirit is similar to the saying that it is better to wrongly flag a thousand than to let a single one slip through.
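And the mirror-image sketch for recall (again, the helper name and counts are made up):

```python
def recall(TP, FN):
    # Recall = TP / (TP + FN): of all actually positive samples,
    # the fraction that we predicted as positive
    return TP / (TP + FN) if (TP + FN) > 0 else 0.0

# Illustrative counts: 8 positives found, 4 positives missed
print(recall(TP=8, FN=4))  # ~0.667
```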

4. The relationship between precision and recall: the P-R curve

  From the formulas above, we can see that precision and recall share the same numerator, TP, but have different denominators: one is TP+FP and the other is TP+FN. Plotting the two against each other as the decision threshold varies gives the P-R curve. How do we understand this curve? Precision tells us what fraction of the samples we predict as positive are truly positive, while recall tells us what fraction of the truly positive samples we have found. Take logistic regression as an example: its output is a probability between 0 and 1, so to decide from this probability whether a user is good or bad we must define a threshold. Generally, the larger the predicted probability (the closer to 1), the more likely the user is a bad user. For instance, with a threshold of 0.5, every user with probability below 0.5 is considered good and every user above 0.5 is considered bad. For that threshold of 0.5 we therefore obtain one corresponding pair of precision and recall. But this threshold was chosen arbitrarily; to find the threshold that best meets our requirements, we traverse all thresholds between 0 and 1, and each threshold yields one precision-recall pair, which together trace out the P-R curve. Let me add one related term here: mAP, which we often see in object detection, is essentially the area under the P-R curve (averaged over classes).
[Figure: example P-R curve]
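A small sketch of this threshold sweep (NumPy assumed; the scores stand in for logistic-regression probabilities and are purely illustrative):

```python
import numpy as np

# Illustrative scores (e.g. logistic regression probabilities) and true labels
y_true  = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 1])
y_score = np.array([0.1, 0.3, 0.35, 0.4, 0.45, 0.55, 0.6, 0.65, 0.8, 0.9])

# Traverse thresholds; each threshold yields one (precision, recall) pair
for t in np.linspace(0.0, 1.0, 11):
    y_pred = (y_score >= t).astype(int)
    TP = np.sum((y_pred == 1) & (y_true == 1))
    FP = np.sum((y_pred == 1) & (y_true == 0))
    FN = np.sum((y_pred == 0) & (y_true == 1))
    p = TP / (TP + FP) if TP + FP else 1.0  # convention: precision = 1 when nothing is predicted positive
    r = TP / (TP + FN) if TP + FN else 0.0
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")
```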

5. F1 score

  Now, what is the F1 score? If we want to strike a balance between precision and recall, we need a new metric: the F1 score. It takes both precision and recall into account, looking for the point where the two are jointly as high as possible, i.e. a compromise between them. The formula is:
F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}
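As a sketch (the function name is illustrative), the F1 score is simply the harmonic mean of the two:

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Using the illustrative values from the earlier sketches
print(f1_score(0.8, 0.667))  # ~0.727
```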

6. Sensitivity and specificity

  Having covered the various rates above, let's now look at ROC/AUC. Before introducing ROC/AUC, we need two more concepts: sensitivity and specificity.
Sensitivity = \frac{TP}{TP+FN}
Sensitivity is exactly recall.
Specificity = \frac{TN}{FP+TN}
Specificity is the probability that an actual negative sample is correctly predicted as negative. However, we usually care more about the positive samples than the negative ones, which leads to the false positive rate below: the probability that an actual negative sample is wrongly predicted as positive.

7. True positive rate and false positive rate

  Careful readers may have noticed that sensitivity is nothing but recall wearing a different hat. Since we care more about the positive samples, we want to know how many negative samples are incorrectly predicted as positive, so we use (1 - specificity) instead of specificity. This gives two new concepts, the true positive rate (TPR) and the false positive rate (FPR):
TPR = Recall = Sensitivity = \frac{TP}{TP+FN}
FPR = 1 - Specificity = \frac{FP}{FP+TN}
  From these definitions we can see that the true positive rate and the false positive rate are computed within the actual positives and the actual negatives, respectively; each looks only at probabilities inside its own class. Because of this, they are insensitive to class imbalance. Take the earlier example again, where 95% of the samples are positive and 5% negative: we saw that accuracy is misleading there, but TPR and FPR are not. TPR only measures how many of the 95% positive samples are actually covered and has nothing to do with the 5%; likewise, FPR only measures how many of the 5% negative samples are wrongly flagged and has nothing to do with the 95%. By looking at outcomes within each actual class, we sidestep the class-imbalance problem, which is exactly why TPR and FPR were chosen as the axes of ROC/AUC.
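A short sketch of the two rates, revisiting the unbalanced example and the predict-everything-positive classifier (the helper names are illustrative):

```python
def tpr(TP, FN):
    # True positive rate = sensitivity = recall = TP / (TP + FN)
    return TP / (TP + FN) if (TP + FN) > 0 else 0.0

def fpr(FP, TN):
    # False positive rate = 1 - specificity = FP / (FP + TN)
    return FP / (FP + TN) if (FP + TN) > 0 else 0.0

# 95 positives, 5 negatives, and a classifier that predicts everything positive:
# TPR only looks at the 95 positives, FPR only at the 5 negatives
print(tpr(TP=95, FN=0))  # 1.0 -- every positive is covered
print(fpr(FP=5, TN=0))   # 1.0 -- but every negative is falsely flagged too
```

Unlike the flattering 95% accuracy, the FPR of 1.0 immediately exposes that this classifier is useless on the negative class.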

8. ROC curve

  Now let's look at the ROC (Receiver Operating Characteristic) curve, also known as the receiver operating characteristic curve. It was first used in the field of radar signal detection to distinguish signal from noise, and was later adopted to evaluate the predictive ability of models; like the other metrics, the ROC curve is derived from the confusion matrix.
  The two key quantities in the ROC curve are the true positive rate and the false positive rate, and the benefit of choosing them was explained above. The horizontal axis is the false positive rate (FPR) and the vertical axis is the true positive rate (TPR). Below is a standard ROC plot.

[Figure: example ROC curve]
  Similar to the P-R curve before, the ROC curve is also drawn by traversing all thresholds. As we sweep the threshold, the sets of predicted positives and negatives keep changing, and the corresponding operating point slides along the ROC curve.
[Figures: the operating point sliding along the ROC curve as the threshold changes]
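Here is a minimal sketch of plotting an ROC curve, assuming scikit-learn and matplotlib are installed and reusing the illustrative scores from the P-R sketch above:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# Same illustrative scores and labels as in the P-R sketch
y_true  = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 1])
y_score = np.array([0.1, 0.3, 0.35, 0.4, 0.45, 0.55, 0.6, 0.65, 0.8, 0.9])

# roc_curve traverses the thresholds for us and returns (FPR, TPR) pairs
fpr, tpr, thresholds = roc_curve(y_true, y_score)

plt.plot(fpr, tpr, marker="o", label="model")
plt.plot([0, 1], [0, 1], linestyle="--", label="random guess (AUC = 0.5)")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```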

  One question remains with the ROC curve: how do we judge whether it is good? Changing the threshold only changes the numbers of predicted positives and negatives, i.e. the TPR and FPR; the curve itself does not move. So how do we tell whether a model's ROC curve is good? This comes back to our purpose: FPR measures how much the model raises false alarms, and TPR measures how much of the real positives the model covers. Naturally we want as few false alarms and as much coverage as possible. In short, the higher the TPR and the lower the FPR (i.e. the steeper the ROC curve towards the top-left corner), the better the model performs. Like TPR and FPR themselves, the ROC curve is unaffected by class imbalance.
9. AUC

  Just as mAP is the area under the P-R curve, AUC is the area under the ROC curve. Interestingly, the diagonal line encloses an area of exactly 0.5. The practical meaning of the diagonal is random guessing: deciding response or non-response at random covers 50% of both the positive and the negative samples, i.e. a purely random effect. The steeper the ROC curve the better, so the ideal value is 1 (the whole unit square), while the worst case, random guessing, gives 0.5; hence AUC generally lies between 0.5 and 1.
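As a sketch, scikit-learn can compute this area directly (assuming it is installed; the scores are the same illustrative ones as above):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true  = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 1])
y_score = np.array([0.1, 0.3, 0.35, 0.4, 0.45, 0.55, 0.6, 0.65, 0.8, 0.9])

# AUC is the area under the ROC curve: 0.5 ~ random guessing, 1.0 ~ perfect ranking
print(roc_auc_score(y_true, y_score))
```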
General rules of thumb for judging AUC:

  • 0.5 - 0.7: limited effectiveness (though already quite good for predicting stocks)
  • 0.7 - 0.85: moderate effectiveness
  • 0.85 - 0.95: good effectiveness
  • 0.95 - 1: excellent, but generally unlikely in practice


Origin: blog.csdn.net/qq_38683460/article/details/126492686