Object detection evaluation metric mAP: from Precision, Recall, and the PR curve to AP

1. TP, FP, FN, TN

True Positive

A prediction counts as a TP when all three of the following conditions hold:

        1. Its confidence is greater than the confidence threshold (there is a threshold on the class score, and a separate IoU threshold for judging whether the bounding box is good enough).

        2. The predicted class matches the ground-truth class (the classification is correct).

        3. The IoU between the predicted bounding box and the ground truth is greater than the IoU threshold (the localization is correct). When several predicted boxes satisfy these conditions for the same ground truth, only the one with the highest confidence is counted as a TP; the rest are counted as FP (a code sketch of this matching step is given at the end of this section).

False Positive

        A prediction counts as an FP in either of these cases:

        1. The predicted class does not match the ground-truth class (classification error).

        2. The IoU between the predicted bounding box and the ground truth is below the IoU threshold (the localization is wrong even if the class is right).

False Negative

        A ground-truth object that the detector missed: it should have been detected with the correct class and box, but no prediction was matched to it.

True Negative

        Correctly rejected negatives. In detection, the overwhelming majority of candidate boxes are background, so TN is effectively uncountable; it is not used when computing precision and recall later, so nobody bothers to count it.
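To make these rules concrete, here is a minimal Python sketch of matching predictions to ground truths for one image and one class and counting TP/FP/FN. The [x1, y1, x2, y2] box format, the thresholds, and the greedy highest-confidence-first matching are illustrative assumptions, not the exact procedure of any particular benchmark.

def iou(box_a, box_b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def count_tp_fp_fn(predictions, gt_boxes, conf_thr=0.5, iou_thr=0.5):
    """predictions: list of (box, confidence) for one class in one image.
    gt_boxes: list of ground-truth boxes for that class.
    Each ground truth can be matched by at most one prediction."""
    # keep only confident predictions and process them from highest confidence down
    preds = sorted([p for p in predictions if p[1] >= conf_thr],
                   key=lambda p: p[1], reverse=True)
    matched = set()
    tp = fp = 0
    for box, conf in preds:
        # find the still-unmatched ground truth with the largest IoU
        best_iou, best_gt = 0.0, None
        for i, gt in enumerate(gt_boxes):
            if i in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_gt = overlap, i
        if best_gt is not None and best_iou >= iou_thr:
            tp += 1               # good box: correct class and enough overlap
            matched.add(best_gt)
        else:
            fp += 1               # duplicate or badly localized box
    fn = len(gt_boxes) - len(matched)  # ground truths nobody detected
    return tp, fp, fn

# toy example: two ground truths, three predictions
gts = [[0, 0, 10, 10], [20, 20, 30, 30]]
preds = [([1, 1, 10, 10], 0.9), ([0, 0, 9, 9], 0.8), ([50, 50, 60, 60], 0.7)]
print(count_tp_fp_fn(preds, gts))  # (1, 2, 1)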

2. Precision and Recall

Precision = TP/(TP+FP) = TP / (the number of samples I predicted as positive)

Recall = TP/(TP+FN) = TP / (the number of actual positives in the data)
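As a quick numeric illustration, with made-up counts:

TP, FP, FN = 8, 2, 4          # hypothetical counts from some detector
precision = TP / (TP + FP)    # 8 / 10 = 0.8: of everything flagged positive, how much was right
recall = TP / (TP + FN)       # 8 / 12 ≈ 0.67: of all real positives, how many were found
print(precision, recall)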

3. PR curve

The Precision-Recall curve is drawn by sweeping the confidence threshold from 0 to 1 and plotting, for each threshold, the model's precision on the vertical axis against its recall on the horizontal axis. (Each threshold θ gives one (Recall, Precision) point; connecting these points yields the PR curve.)

For example, suppose we collect 20 samples (#1 to #20) whose true labels and confidence scores are the y_true and y_scores arrays used in the code below; 10 of them (#1, #2, #4, #5, #6, #9, #11, #13, #17, #19) are actual positives.

To draw the PR curve, we compute the (Recall, Precision) point for each threshold:

Threshold = 0.9: TP = len([#1]) = 1; FP = 0; FN = len([#2, #4, #5, #6, #9, #11, #13, #17, #19]) = 9. Precision = TP/(TP+FP) = 1/(1+0) = 1; Recall = TP/(TP+FN) = 1/(1+9) = 0.1

Threshold = 0.8: TP = len([#1, #2]) = 2; FP = 0; FN = len([#4, #5, #6, #9, #11, #13, #17, #19]) = 8. Precision = 2/(2+0) = 1; Recall = 2/(2+8) = 0.2

Threshold = 0.7: TP = len([#1, #2]) = 2; FP = len([#3]) = 1; #_of_True = 10. Precision = 2/(2+1) = 0.67; Recall = 2/10 = 0.2

Threshold = 0.6: TP = len([#1, #2, #4]) = 3; FP = len([#3]) = 1; #_of_True = 10. Precision = 3/(3+1) = 0.75; Recall = 3/10 = 0.3

Threshold = 0.5: TP = len([#1, #2, #4, #5, #6, #9]) = 6; FP = len([#3, #7, #8, #10]) = 4; #_of_True = 10. Precision = 6/(6+4) = 0.6; Recall = 6/10 = 0.6

Threshold = 0.4: TP = len([#1, #2, #4, #5, #6, #9, #11]) = 7; FP = len([#3, #7, #8, #10]) = 4; #_of_True = 10. Precision = 7/(7+4) = 0.64; Recall = 7/10 = 0.7

Threshold = 0.3: TP = len([#1, #2, #4, #5, #6, #9, #11, #13, #17, #19]) = 10; FP = len([#3, #7, #8, #10, #12, #14, #15, #16, #18]) = 9; #_of_True = 10. Precision = 10/(10+9) = 0.53; Recall = 10/10 = 1

You can verify these numbers with the following scikit-learn code:

import numpy as np
import pandas as pd
from sklearn.metrics import precision_recall_curve

# The 20 samples: true labels and predicted confidence scores
y_true = np.array([1,1,0,1,1,1,0,0,1,0,1,0,1,0,0,0,1,0,1,0])
y_scores = np.array([0.9,0.8,0.7,0.6,0.55,0.54,0.53,0.52,0.51,0.505,0.4,0.39,0.38,0.37,0.36,0.35,0.34,0.33,0.30,0.1])

# Compute precision and recall at every threshold
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# With these two options, the middle columns are not abbreviated with "..."
pd.options.display.max_rows = None
pd.options.display.max_columns = None

# Arrange the results as wide one-row DataFrames for easier inspection
# (keep the original precision/recall arrays untouched so they can be plotted below)
thresholds_df = pd.DataFrame(thresholds).T
recall_df = pd.DataFrame(recall).T
precision_df = pd.DataFrame(precision).T

# Stack the three rows vertically
results = pd.concat([thresholds_df, recall_df, precision_df], axis=0)
# Keep only 2 decimal places
results = round(results, 2)
# Rename the rows
results.index = ["thresholds", "recall", "precision"]
print(results)

The precision-recall curve can then be plotted as follows:

import matplotlib.pyplot as plt

def plot_pr_curve(recall, precision):
    # recall and precision are the arrays returned by precision_recall_curve
    plt.plot(recall, precision, label='Precision-Recall curve')
    plt.xlabel('Recall')
    plt.ylabel('Precision')
    plt.title('PR Curve')
    plt.legend()
    plt.show()

plot_pr_curve(recall, precision)

The raw PR curve is jagged, so it is usually smoothed (interpolated). The smoothing rule works along the Recall axis: for the Recall value obtained at each threshold θ, take the largest Precision found at that Recall or anywhere to its right (i.e. at any recall greater than or equal to it), and use that Precision over the whole interval. In the illustrative figure, Recall in (0, 0.4] all uses Precision = 1, Recall in (0.4, 0.8] all uses Precision = 0.57, and Recall in (0.8, 1) all uses Precision = 0.5.

After this smoothing, the jagged curve from our example becomes a monotonically non-increasing step curve.
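A minimal sketch of this smoothing, reusing the precision and recall arrays returned by precision_recall_curve in the code above (the raw and smoothed curves are plotted together for comparison):

import numpy as np
import matplotlib.pyplot as plt

# precision_recall_curve returns recall in decreasing order, so a running
# maximum over precision, taken in that order, is exactly "the largest
# precision at this recall or to its right".
interp_precision = np.maximum.accumulate(precision)

plt.plot(recall, precision, label='raw PR curve')
plt.plot(recall, interp_precision, label='smoothed (interpolated) PR curve')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.legend()
plt.show()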

The trade-off between Precision and Recall

        The lower right corner of the figure above is the high-recall regime: say 99% of all criminals have been caught, but only because the police arrested everyone who had anything to do with the case. Inevitably, only a small fraction of those arrested are actual criminals, which means low precision.

        Conversely, if you want a higher fraction of the people arrested to actually be criminals (higher precision), the police cannot arrest anyone without solid evidence. The result is that many well-hidden criminals are missed, so the fraction of all criminals who get caught drops, which means low recall. This corresponds to the upper left corner of the figure above.

How to compare different models

        Since a model's precision and recall trade off against each other and cannot both be excellent at the same time, how do we judge which model is better? The answer: the closer the PR curve bulges toward the upper right corner, the better the model. In the figure above, both the red curve A and the black curve B are better than model C.

        The next question is which of model A and model B is better. This leads to the Break-Even Point (BEP): the point on the curve where Precision = Recall, i.e. the three marked points in the figure above. The larger the coordinate of a model's break-even point, the better the model.

        Besides the break-even point, the F1 score can be used: F1 = 2 * P * R / (P + R). F1 takes both the P value and the R value into account; it is the harmonic mean of precision and recall. Likewise, the larger the F1 value, the better the learner.
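For instance, with toy numbers: a balanced model (P = R = 0.6) scores higher than a lopsided one (P = 0.9, R = 0.3), even though both pairs have the same arithmetic mean:

def f1(p, r):
    # harmonic mean of precision and recall
    return 2 * p * r / (p + r)

print(f1(0.6, 0.6))  # 0.6
print(f1(0.9, 0.3))  # 0.45 -- the weak recall drags the score down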

4. From PR curve to AP

The formula for AP is as follows

\text{AP} = \sum_n (R_n - R_{n-1}) P_n

The practical meaning of the AP metric:

        AP is the average of the Precision values obtained at the different Recall values that arise as the threshold changes. It summarizes the precision-recall curve as a weighted mean of the precision achieved at each threshold, with the increase in recall from the previous threshold used as the weight.

        Geometrically, the AP value is the area enclosed between the PR curve and the coordinate axes, i.e. the area under the PR curve.

        The difference between one threshold's recall and the previous one is the width of a rectangle, and the precision at that threshold is its height; width times height is the area of one rectangular column.

        Summing these columns gives the area under the PR curve, so AP is a discrete version of the integral of the PR curve.
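The sketch below implements this sum directly, reusing the precision and recall arrays from precision_recall_curve in the earlier code; it is essentially the same computation that average_precision_score performs.

import numpy as np

# recall comes back from precision_recall_curve in decreasing order, so
# np.diff(recall) is negative; the leading minus sign turns each step into a
# positive rectangle width (R_n - R_{n-1}), and precision[:-1] is its height.
ap = -np.sum(np.diff(recall) * precision[:-1])
print(ap)  # ≈ 0.7357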

The same value can be computed directly with scikit-learn:

from sklearn.metrics import average_precision_score
AP = average_precision_score(y_true, y_scores, average='macro', pos_label=1, sample_weight=None)
print(AP)

# 0.7357475805927818

AP=(0.2-0)× 1 + (0.5-0.2)× 0.83 + (0.6-0.5) × 0.67 + (0.7-0.6)×0.64 +(0.8-0.7)×0.62 +(1-0.8)×0.53=0.7480

This hand calculation differs from the program's output by about 0.01 (0.748 vs. 0.736), because it lumps several recall steps together and rounds the precision values, while sklearn uses every distinct score as a threshold. The difference is small, so we will not chase down the rounding details.

5. From AP to mAP

mAP is obtained by averaging the AP values of all categories (such as cars, pedestrians, buses, bicycles).
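A minimal sketch of the averaging step, with made-up per-class AP values (each would be computed as in section 4):

# hypothetical AP for each category
ap_per_class = {"car": 0.72, "pedestrian": 0.61, "bus": 0.80, "bicycle": 0.55}

mAP = sum(ap_per_class.values()) / len(ap_per_class)
print(round(mAP, 2))  # 0.67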

6. From mAP to mAP50, mAP75, mAP95

mAP involves an IoU threshold, and so does mAP50. What is the relationship between the two?

        In mAP, the IoU threshold is part of deciding whether each prediction counts as True or False: a predicted box must overlap a ground-truth box of the correct class by at least that much to be a TP.

        The number in mAP50 simply makes that localization threshold explicit: mAP50 counts a predicted box as correctly localized when its IoU with the ground-truth box is at least 0.5, while mAP75 and mAP95 raise the bar to 0.75 and 0.95, rewarding increasingly precise localization.
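Written out (following the convention popularized by the COCO benchmark, stated here as background): the subscript fixes the IoU threshold used to decide TP vs. FP, and the COCO-style summary metric averages mAP over ten IoU thresholds:

\text{mAP}_{50} = \text{mAP at IoU} \ge 0.50, \quad \text{mAP}_{75} = \text{mAP at IoU} \ge 0.75, \quad \text{mAP}_{95} = \text{mAP at IoU} \ge 0.95

\text{mAP}_{50:95} = \frac{1}{10} \sum_{t \in \{0.50,\, 0.55,\, \ldots,\, 0.95\}} \text{mAP}_{t}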

What I still don't understand

        What is the ROC curve? What does it mean, and how is it used?

        The same questions apply to AUC.

