1. TP, FP, FN, TN
True Positive
A prediction counts as a TP when it meets all three of the following conditions:
1. Its confidence is greater than the confidence threshold (there is a threshold for the class score, and a separate IoU threshold for judging whether the bounding box fits).
2. The predicted class matches the ground-truth class (the classification is correct).
3. The IoU between the predicted bounding box and the ground truth is greater than the IoU threshold (the box is correct). When several predicted boxes all meet these conditions for the same ground truth, the one with the highest confidence is counted as the TP and the rest as FPs.
False Positive
1. The predicted class does not match the ground-truth class (a classification error).
2. The IoU between the predicted bounding box and the ground truth is below the threshold (the box is poor; a localization error).
False Negative
A ground-truth object that should have been detected but was instead treated as a negative sample, i.e., missed by the detector.
True Negative
Negative samples that are correctly left undetected. The vast majority of candidate boxes fall into this category, and since TN is not used when computing precision and recall below, it is simply not counted.
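The three TP conditions can be sketched in code. This is only an illustrative sketch: the `[x1, y1, x2, y2]` box format and the function names are my own assumptions, not from any particular detector.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes in [x1, y1, x2, y2] format."""
    # Intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def classify_prediction(pred_cls, pred_box, conf, gt_cls, gt_box,
                        conf_thresh=0.5, iou_thresh=0.5):
    """Return 'TP' or 'FP' for one prediction against one ground truth."""
    if conf < conf_thresh:
        return None          # below the confidence threshold: not counted
    if pred_cls != gt_cls:
        return 'FP'          # classification error
    if iou(pred_box, gt_box) < iou_thresh:
        return 'FP'          # localization error
    return 'TP'
```

(Handling several candidate boxes for one ground truth, where only the highest-confidence match is the TP, is omitted for brevity.)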
2. Precision and Recall
Precision = TP/(TP+FP) = TP / (the number of samples I predicted as positive)
Recall = TP/(TP+FN) = TP / (the number of actual positive samples)
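In code, as a trivial helper (the example counts are the TP=6, FP=4, FN=4 case from the Threshold=0.5 row worked out below):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN); note TN is never needed."""
    return tp / (tp + fp), tp / (tp + fn)

# e.g. TP=6, FP=4, FN=4 gives precision 0.6 and recall 0.6
print(precision_recall(6, 4, 4))   # (0.6, 0.6)
```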
3. PR curve
The Precision-Recall curve is obtained by sweeping the decision threshold from 0 to 1 and, for each threshold, plotting the model's precision and recall as the vertical and horizontal coordinates respectively. (Each threshold θ yields one (Precision, Recall) point; connecting these points gives the PR curve.)
For example, suppose we collect 20 samples whose true labels and confidence scores are as follows.
To draw the PR curve, we compute the coordinates of the points on the curve at a series of thresholds.
Threshold = 0.9: TP = len([#1]) = 1; FP = 0; FN = len([#2, #4, #5, #6, #9, #11, #13, #17, #19]) = 9 → Precision = TP/(TP+FP) = 1/(1+0) = 1; Recall = TP/(TP+FN) = 1/(1+9) = 0.1
Threshold = 0.8: TP = len([#1, #2]) = 2; FP = 0; FN = len([#4, #5, #6, #9, #11, #13, #17, #19]) = 8 → Precision = 2/(2+0) = 1; Recall = 2/(2+8) = 0.2
Threshold = 0.7: TP = len([#1, #2]) = 2; FP = len([#3]) = 1; #positives = 10 → Precision = 2/(2+1) = 0.67; Recall = 2/10 = 0.2
Threshold = 0.6: TP = len([#1, #2, #4]) = 3; FP = len([#3]) = 1; #positives = 10 → Precision = 3/(3+1) = 0.75; Recall = 3/10 = 0.3
Threshold = 0.5: TP = len([#1, #2, #4, #5, #6, #9]) = 6; FP = len([#3, #7, #8, #10]) = 4; #positives = 10 → Precision = 6/(6+4) = 0.6; Recall = 6/10 = 0.6
Threshold = 0.4: TP = len([#1, #2, #4, #5, #6, #9, #11]) = 7; FP = len([#3, #7, #8, #10]) = 4; #positives = 10 → Precision = 7/(7+4) = 0.64; Recall = 7/10 = 0.7
Threshold = 0.3: TP = len([#1, #2, #4, #5, #6, #9, #11, #13, #17, #19]) = 10; FP = len([#3, #7, #8, #10, #12, #14, #15, #16, #18]) = 9; #positives = 10 → Precision = 10/(10+9) = 0.53; Recall = 10/10 = 1
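Each of these rows can be checked mechanically. A small sketch (same labels and scores as above; treating a score equal to the threshold as positive, i.e., `>=`, matches how the rows above are counted):

```python
import numpy as np

# True labels and confidence scores of the 20 samples
y_true = np.array([1,1,0,1,1,1,0,0,1,0,1,0,1,0,0,0,1,0,1,0])
y_scores = np.array([0.9,0.8,0.7,0.6,0.55,0.54,0.53,0.52,0.51,0.505,
                     0.4,0.39,0.38,0.37,0.36,0.35,0.34,0.33,0.30,0.1])

def pr_at_threshold(threshold):
    """Precision and recall when samples with score >= threshold are predicted positive."""
    pred = y_scores >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    return tp / (tp + fp), tp / (tp + fn)

print(pr_at_threshold(0.5))   # precision 0.6, recall 0.6, as in the table
```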
You can verify these numbers with the following sklearn code:
import numpy as np
import pandas as pd
from sklearn.metrics import precision_recall_curve

# Input data: true labels and predicted confidence scores
y_true = np.array([1,1,0,1,1,1,0,0,1,0,1,0,1,0,0,0,1,0,1,0])
y_scores = np.array([0.9,0.8,0.7,0.6,0.55,0.54, 0.53,0.52,0.51,0.505, 0.4, 0.39,0.38, 0.37, 0.36, 0.35, 0.34, 0.33, 0.30, 0.1])

# Compute precision and recall at every threshold
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# These two lines stop pandas from hiding middle columns behind ellipses
pd.options.display.max_rows = None
pd.options.display.max_columns = None

# Arrange the results as a wide DataFrame for easier viewing
precision = pd.DataFrame(precision).T
recall = pd.DataFrame(recall).T
thresholds = pd.DataFrame(thresholds).T

# Stack the three rows vertically
results = pd.concat([thresholds, recall, precision], axis=0)

# Keep only 2 decimal places
results = round(results, 2)

# Rename the rows
results.index = ["thresholds", "recall", "precision"]
print(results)
The precision-recall curve can then be drawn as follows (pass the raw arrays returned by precision_recall_curve, not the transposed DataFrames from the display code above):

import matplotlib.pyplot as plt

def plot_pr_curve(recall, precision):
    plt.plot(recall, precision, label='Precision-Recall curve')
    plt.xlabel('Recall')
    plt.ylabel('Precision')
    plt.title('PR Curve')
    plt.legend()
    plt.show()

plot_pr_curve(recall, precision)
For a jagged curve like this, the usual next step is to smooth away the sawtooth. The smoothing works along the Recall axis: for the recall point computed at each threshold θ, find the largest precision at any recall to its right (itself included), and use that precision over the whole interval. In the example below, Recall in (0, 0.4] all uses Precision = 1, Recall in (0.4, 0.8] all uses Precision = 0.57, and Recall in (0.8, 1) all uses Precision = 0.5.
After this smoothing, the curve in our example looks like this.
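This smoothing (often called interpolated precision) is just a running maximum taken from the high-recall end of the curve. A minimal sketch, assuming `precision` is ordered by increasing recall:

```python
import numpy as np

def interpolate_precision(precision):
    """For each point, replace its precision with the maximum precision at any
    recall >= its own (i.e., at or to its right on the PR curve)."""
    precision = np.asarray(precision, dtype=float)
    # Running max over the reversed array starts from the high-recall end;
    # reversing again restores the original (increasing-recall) order.
    return np.maximum.accumulate(precision[::-1])[::-1]

print(interpolate_precision([1.0, 0.5, 0.57, 0.4, 0.5]))
```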
The trade-off between Precision and Recall
In the lower-right corner of the figure above, recall is high: say 99% of all murderers have been caught, but only because the police arrested everyone who had anything to do with the case. Inevitably, only a small fraction of those arrested are actual criminals, which means low precision.
Conversely, if you want a higher fraction of the people arrested to actually be criminals (higher precision), the police can no longer arrest anyone without solid evidence. The result? Many well-hidden criminals are missed, so the fraction of all criminals who get caught drops, which means low recall. This corresponds to the upper-left corner of the figure above.
How to compare different models
Since a model's precision and recall trade off against each other and cannot both be great at the same time, how do we judge which model is better? The answer: the more the PR curve bulges toward the upper-right corner, the better the model. In the figure above, both red line A and black line B are better than model C.
But which is better between model A and model B? That leads to the concept of the "Break-Even Point" (BEP): the point on the curve where Precision = Recall, i.e., the three marked points in the figure above. The larger the coordinates of the break-even point, the better the model.
In addition to the break-even point, the F1 score can be used: F1 = 2PR/(P+R), the harmonic mean of precision and recall, which takes both P and R into account. As before, the larger the F1, the better the model.
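As a quick sketch (the example values reuse P = R = 0.6 from the Threshold=0.5 row above):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# When P = R, the harmonic mean equals their common value
print(f1_score(0.6, 0.6))
# The harmonic mean is dragged toward the smaller of the two
print(f1_score(1.0, 0.5))
```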
4. From PR curve to AP
The formula for AP is:
AP = Σ_n (R_n − R_{n−1}) × P_n
The actual meaning of the AP metric:
AP is the average of the precision values obtained at the "different recall values" produced by "different thresholds". It summarizes the precision-recall curve as a weighted mean of the precision achieved at each threshold, with the increase in recall from the previous threshold used as the weight.
Geometrically, the AP value is the area enclosed between the PR curve and the X and Y axes: the difference between one threshold's recall and the previous threshold's recall is the width of a rectangle, the precision at that threshold is its height, and width times height is the area of one rectangular column.
Summing the columns approximates the integral of the PR curve, i.e., the area under it.
This can be computed with sklearn:
from sklearn.metrics import average_precision_score
AP = average_precision_score(y_true, y_scores, average='macro', pos_label=1, sample_weight=None)
print(AP)
# 0.7357475805927818
AP=(0.2-0)× 1 + (0.5-0.2)× 0.83 + (0.6-0.5) × 0.67 + (0.7-0.6)×0.64 +(0.8-0.7)×0.62 +(1-0.8)×0.53=0.7480
There is a difference of roughly 0.01 from what the program above computes, which comes from the coarse set of hand-picked thresholds in this calculation; the overall picture is the same.
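For comparison, a hand-rolled AP that uses every score as a threshold (which is effectively what sklearn does; this sketch assumes no tied scores) reproduces sklearn's value:

```python
import numpy as np

y_true = np.array([1,1,0,1,1,1,0,0,1,0,1,0,1,0,0,0,1,0,1,0])
y_scores = np.array([0.9,0.8,0.7,0.6,0.55,0.54,0.53,0.52,0.51,0.505,
                     0.4,0.39,0.38,0.37,0.36,0.35,0.34,0.33,0.30,0.1])

def average_precision(y_true, y_scores):
    """AP = sum over thresholds of (R_n - R_{n-1}) * P_n."""
    order = np.argsort(-y_scores)          # rank samples by descending score
    labels = y_true[order]
    tp = np.cumsum(labels)                 # TP count at each rank
    ranks = np.arange(1, len(labels) + 1)
    precision = tp / ranks                 # P_n at each rank
    recall = tp / labels.sum()             # R_n at each rank
    # recall only increases at true positives, so only those terms contribute
    delta_r = np.diff(np.concatenate(([0.0], recall)))
    return np.sum(delta_r * precision)

print(average_precision(y_true, y_scores))   # ≈ 0.7357, matching sklearn
```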
5. From AP to mAP
mAP is the average of the AP values over all categories (such as cars, pedestrians, buses, bicycles).
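As a sketch with made-up per-class AP values (the classes and numbers are purely illustrative, not measured):

```python
# Hypothetical AP for each category
ap_per_class = {"car": 0.72, "pedestrian": 0.65, "bus": 0.80, "bicycle": 0.55}

# mAP is simply the mean of the per-class APs
mAP = sum(ap_per_class.values()) / len(ap_per_class)
print(round(mAP, 3))   # 0.68
```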
6. From mAP to mAP50, mAP75, mAP95
Both mAP and mAP50 involve IoU, so what is the relationship between the two?
In mAP50, the "50" names the IoU threshold used for localization: a predicted bounding box must have IoU ≥ 0.5 with the ground-truth box (in addition to the correct class) for the prediction to count as True rather than False. mAP75 uses IoU ≥ 0.75, a stricter localization requirement, and mAP95 would use IoU ≥ 0.95. COCO's headline metric (often written mAP@[.5:.95]) averages the mAP computed at IoU thresholds from 0.5 to 0.95 in steps of 0.05.
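A sketch of the relationship between these metrics, with hypothetical per-threshold mAP values (the linear decay is made up purely for illustration; real numbers come from evaluating the detector at each IoU threshold):

```python
import numpy as np

# COCO-style: evaluate mAP at IoU thresholds 0.50, 0.55, ..., 0.95
iou_thresholds = np.linspace(0.50, 0.95, 10)

# Hypothetical mAP at each threshold (stricter IoU -> lower mAP)
map_at_iou = 0.9 - 0.6 * (iou_thresholds - 0.5)

map50 = map_at_iou[0]             # mAP@0.5  : the classic "mAP50"
map75 = map_at_iou[5]             # mAP@0.75 : stricter localization
map_coco = map_at_iou.mean()      # mAP@[.5:.95]: averaged over all thresholds
```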
Things I don't understand yet
What is the ROC curve? What does it mean, and how is it used?
The same questions for AUC (the area under the ROC curve).