[Pedestrian detection] miss rate versus false positives per image (FPPI) Past and Present (Theory)

I. Introduction

I have recently been working on pedestrian detection and used a metric called miss rate versus false positives per image (FPPI). When I searched online I found very little material on it, so I am writing up what I know about this metric here for anyone who needs it later. If anything is wrong, please correct me.

II. miss rate versus false positives per window (FPPW)


Before introducing miss rate versus false positives per image (hereinafter FPPI), I have to cover another metric: miss rate versus false positives per window (hereinafter FPPW).

At first, FPPW was the standard metric for evaluating pedestrian detection. It first appeared in the paper Histograms of Oriented Gradients for Human Detection, which also released the INRIA pedestrian dataset and used FPPW to evaluate performance. (The classic HOG+SVM pedestrian detection method was proposed in the same paper.)

The following briefly explains how FPPW works:

In pedestrian detection we only care about the detection boxes, so the results fall into three main cases:

  1. If the ground truth is [there is a pedestrian] and the detector finds the pedestrian, the detection is a true positive.
  2. If the ground truth is [there is a pedestrian] but the detector does not find the pedestrian, the pedestrian is a false negative, i.e. a missed detection.
  3. If the ground truth is [no pedestrian] but the detector reports a pedestrian, the detection is a false positive, i.e. the detector fired on background (a false detection).

Here positive = true positive + false negative, which is the number of real pedestrians.
A true negative means the ground truth is [no pedestrian] and the detector reports no pedestrian. This case tells us nothing useful about detection, so it is not used when constructing the evaluation metrics.

(If you find TP, TN, FP and FN easy to confuse, see this memory aid: [Machine Learning] TP, TN, FP, FN memory method )

In an FPPW plot the vertical axis is miss rate and the horizontal axis is false positives per window. Both axes are logarithmic:

  1. miss rate = false negative / positive, i.e. 1 - recall: of all the pedestrians that exist (positive), the fraction that were missed (false negative)
  2. false positives per window = false positive / number of windows
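The two axis definitions above can be written as a minimal sketch. The function names and the counts in the usage lines are hypothetical and chosen only for illustration.

```python
def miss_rate(false_negatives: int, true_positives: int) -> float:
    """Miss rate = FN / (TP + FN), which equals 1 - recall."""
    positives = true_positives + false_negatives
    if positives == 0:
        raise ValueError("there must be at least one ground-truth pedestrian")
    return false_negatives / positives


def false_positives_per_window(false_positives: int, num_windows: int) -> float:
    """FPPW = FP / number of windows that were classified."""
    if num_windows <= 0:
        raise ValueError("num_windows must be positive")
    return false_positives / num_windows


# Hypothetical counts: 80 pedestrians found, 20 missed,
# 5 false alarms over 50,000 classified windows.
mr = miss_rate(false_negatives=20, true_positives=80)
fppw = false_positives_per_window(false_positives=5, num_windows=50_000)
```

With these made-up counts the point sits at a miss rate of 0.2 and an fppw of 1e-4.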

Why the number of windows? It follows from how HOG+SVM works. Its detection process is roughly:

  1. Take an input image to run detection on
  2. Use the sliding-window method to select a region of the image (this region is called a window from here on)
  3. Extract the HOG features of this region
  4. Feed the HOG features into the SVM, which classifies the window as pedestrian or not

As the process shows, the SVM acts only as a classifier. To detect pedestrians of different sizes, the sliding-window method has to slide windows of many different sizes over the image. Every slide position produces one window, so a very large number of windows is generated, and each window gets one SVM prediction.

For an image, what we care about is whether the SVM classifies these windows correctly, so false positives / number of windows measures the SVM's detection performance on that image.
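To get a feel for how many windows this produces, here is a rough counting sketch. The base window of 64x128 pixels, the stride of 8 pixels and the scale step of 1.2 are assumptions made for illustration, and `count_windows` is a hypothetical helper, not code from the paper.

```python
def count_windows(img_w: int, img_h: int,
                  base_w: int = 64, base_h: int = 128,
                  stride: int = 8, scale_step: float = 1.2) -> int:
    """Count sliding windows of growing size that fit inside one image."""
    total = 0
    scale = 1.0
    while True:
        win_w = int(round(base_w * scale))
        win_h = int(round(base_h * scale))
        if win_w > img_w or win_h > img_h:
            break  # the window no longer fits, stop growing it
        # Number of positions along each axis for this window size.
        nx = (img_w - win_w) // stride + 1
        ny = (img_h - win_h) // stride + 1
        total += nx * ny
        scale *= scale_step
    return total


num_windows = count_windows(640, 480)
```

Even a single 640x480 image yields several thousand windows under these assumptions, which is why the denominator of fppw is so large compared with the number of pedestrians.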

How to get multiple miss rate and fppw values?

This works the same way as for an ROC curve: adjusting the detection threshold yields a series of (miss rate, fppw) pairs.

For example, a higher threshold means that only detection boxes with higher confidence count as detector output. Fewer boxes are output, those boxes are more accurate, and the chance of firing on background drops. At the same time the chance of a missed detection rises (a true positive with low confidence becomes a false negative). So the miss rate increases and fppw decreases. Lowering the threshold has the opposite effect.

The above describes a single image, but multiple images work the same way. Pool the results from all images, sort them by confidence from high to low, and sweep the detection threshold down through the confidence values. At each threshold, count the false negatives and the false positives, and divide the false positive count by the total number of windows (the number of windows per image * the number of images). This gives a series of (miss rate, fppw) pairs.
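The pooled threshold sweep can be sketched as follows. This is a minimal sketch that assumes every window already has a classifier score and a ground-truth label (1 for pedestrian, 0 for background), and that the scores are distinct. The function name and the toy data are hypothetical.

```python
import numpy as np


def miss_rate_fppw_curve(scores, labels):
    """Sweep the threshold over pooled window scores.

    Returns the thresholds (scores sorted high to low) and, for each
    threshold, the miss rate and the false positives per window.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    num_pos = int(labels.sum())
    if num_pos == 0:
        raise ValueError("need at least one pedestrian window")
    order = np.argsort(-scores, kind="stable")  # high confidence first
    sorted_labels = labels[order]
    # With the threshold at the k-th score, the first k windows are accepted.
    tp = np.cumsum(sorted_labels == 1)
    fp = np.cumsum(sorted_labels == 0)
    miss = 1.0 - tp / num_pos
    fppw = fp / labels.size
    return scores[order], miss, fppw


# Toy pooled results from all images: 5 windows, 2 of them pedestrians.
thresholds, miss, fppw_vals = miss_rate_fppw_curve(
    scores=[0.9, 0.8, 0.7, 0.6, 0.5],
    labels=[1, 0, 1, 0, 0],
)
```

As the threshold drops from 0.9 to 0.5, the miss rate falls from 0.5 to 0 while fppw rises from 0 to 0.6, which is the trade-off described above.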

(Examples of adjusting the detection threshold are given below)

How does FPPW quantify the comparison?

A whole curve cannot be compared with another curve using a single number, so the authors use the miss rate at FPPW = $10^{-4}$ as the reference point for comparing results (it plays a role similar to the AUC value for an ROC curve).

The above is the general principle of FPPW.

In the original paper, the authors point out that a small change in miss rate corresponds to a large change in fppw on the horizontal axis. For example, every 1% decrease in miss rate is equivalent to reducing fppw by a factor of 1.57.

III. miss rate versus false positives per image (FPPI)


FPPW was introduced earlier, but FPPW has the following problems:

  1. It cannot show how false positives behave across different sizes and locations. That is, we cannot tell how the classifier performs near a target, or on background that resembles a target.
    From a per-window count we know neither where a window sits in the image nor how large it is, so it tells us little about the windows, and the per-window measure has no particular advantage.
  2. The FPPW metric is hard to interpret, because the per-window concept is tied too closely to the underlying detection mechanism. What we naturally want to know is "how many false detections are there per image", which is a higher-level view and closer to the real application than the outcome for each individual window.

Therefore, in the paper Pedestrian detection: A benchmark, the authors proposed FPPI as a more suitable measure for pedestrian detection.

The main benefits of FPPI are as follows:

  1. The concept of per image is closer to real life and better understood

The following briefly explains how FPPI works:

In an FPPI plot the vertical axis is miss rate and the horizontal axis is false positives per image. Both axes are logarithmic:

  1. miss rate = false negative / positive, i.e. 1 - recall
  2. false positives per image = false positive / number of images
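At one fixed threshold, the two values can be computed from per-image counts. The sketch below assumes the detections have already been matched against the ground truth, so that each image contributes a (TP, FP, FN) triple. The function name and the counts are hypothetical.

```python
def miss_rate_and_fppi(per_image_counts):
    """per_image_counts: one (tp, fp, fn) tuple per image, at one threshold."""
    num_images = len(per_image_counts)
    tp = sum(c[0] for c in per_image_counts)
    fp = sum(c[1] for c in per_image_counts)
    fn = sum(c[2] for c in per_image_counts)
    if num_images == 0 or tp + fn == 0:
        raise ValueError("need at least one image and one pedestrian")
    miss_rate = fn / (tp + fn)   # same vertical axis as FPPW
    fppi = fp / num_images       # only the denominator differs from FPPW
    return miss_rate, fppi


# Three images; the last one has no pedestrians but still counts as an image.
counts = [(2, 1, 0), (1, 0, 1), (0, 2, 0)]
mr, fppi = miss_rate_and_fppi(counts)
```

Here 1 of 4 pedestrians is missed and there are 3 false positives over 3 images, so the point is at miss rate 0.25 and fppi 1.0.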

Only the horizontal axis has changed; the vertical axis is the same as for FPPW.

Similarly, adjusting the threshold gives a series of (miss rate, fppi) pairs.

How does FPPI quantify the comparison?

As with FPPW, a whole curve cannot be compared with another curve using a single number, so at first the miss rate at FPPI = 1 was used as the reference point for comparing results.

In the follow-up paper Pedestrian Detection: An Evaluation of the State of the Art (Piotr Dollar is an author of both papers), the authors switched to the log-average miss rate as the reference value for comparing results. It is calculated as follows:

Take 9 FPPI values evenly spaced in log space between $10^{-2}$ and $10^{0}$. These 9 FPPI values correspond to 9 miss rate values; averaging those 9 miss rate values gives the log-average miss rate.

(For a curve that ends before reaching a given FPPI value, the miss rate used is the minimum value the curve reaches)
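The calculation can be sketched as below, following the description in this article (a plain average of the 9 sampled miss rates). This is not the benchmark's own code. It assumes the curve is given as points sorted by increasing fppi with non-increasing miss rate, and it assumes a miss rate of 1.0 for reference values that lie before the first point of the curve; that last choice is my own assumption.

```python
import numpy as np


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Average the miss rate sampled at FPPI values evenly spaced in log space."""
    fppi = np.asarray(fppi, dtype=float)
    miss_rate = np.asarray(miss_rate, dtype=float)
    refs = np.logspace(-2.0, 0.0, num_points)  # 10^-2 ... 10^0
    samples = []
    for ref in refs:
        reached = np.where(fppi <= ref)[0]
        if reached.size == 0:
            samples.append(1.0)  # assumption: curve has not started yet
        else:
            # Last point at or below ref. If the curve ended early, this is
            # its final point, i.e. the minimum miss rate it reaches.
            samples.append(miss_rate[reached[-1]])
    return float(np.mean(samples))


# Hypothetical curve that ends early at fppi = 0.5.
lamr = log_average_miss_rate(fppi=[0.005, 0.05, 0.5],
                             miss_rate=[0.9, 0.5, 0.2])
```

For this toy curve, three reference values sample 0.9, four sample 0.5 and the last two sample 0.2, including the reference value $10^{0}$ that the curve never reaches.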

What is a curve that ends early?

We can look at the figure below.

[Figure: miss rate versus FPPI curves for several detectors, including HogLbp]

The second-to-last curve, HogLbp (purple), ends early, before reaching $10^{0}$, and you will notice that different curves have different lengths. Why does this happen? It comes down to the outputs of the different detectors.

The curve in an FPPI plot is really a set of [fppi,mr] points connected into a line. A curve that ends early means that the largest fppi among its points is below $10^{0}$. The set of [fppi,mr] points is obtained by sweeping the detector's threshold, and the thresholds are chosen from the detection boxes the detector outputs and their confidence values.

For example, detector A processes 3 images and outputs a total of 10 detection boxes on them, each with a confidence value. We sort the boxes by confidence from high to low, for example: 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45.

We first choose 0.9 as the detector's threshold: a box with confidence of 0.9 or higher counts as a pedestrian, and a box below 0.9 counts as no pedestrian. This gives one [fppi,mr] point.

Next, we choose 0.85 as the detector's threshold: a box with confidence of 0.85 or higher counts as a pedestrian, and a box below 0.85 counts as no pedestrian. This gives another [fppi,mr] point.

Continuing in the same way down to 0.45, we get 10 [fppi,mr] points in total; that is, the detector yields as many [fppi,mr] points as it outputs detection boxes. Suppose that at threshold 0.45 the [fppi,mr] point for detector A is [0.8, 0.25]. Then the curve for detector A can only be drawn up to fppi = 0.8, so it never reaches $10^{0}$.

A different detector may output different detection boxes. For example, detector B processes the same 3 images and, let us assume, also outputs a total of 10 detection boxes on them. Following the steps above, detector B yields 10 [fppi,mr] points. Suppose that at threshold 0.45 the [fppi,mr] point for detector B is [1.5, 0.25]. Then the curve for detector B extends beyond $10^{0}$.

(Different detectors may also output different numbers of detection boxes, which is another influencing factor. To keep the description simple, it is not considered in the example above)

Put plainly, different detectors perform differently. When the threshold is set to the lowest confidence value, every detection box is treated as a pedestrian. A detector with many false detection boxes will then reach a relatively high fppi, while a detector with few false detection boxes will have a lower maximum fppi. So the upper bound on fppi differs between detectors.
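The detector A and detector B example can be sketched in code. The confidence values are the ones from the example; everything else (which boxes are correct, 10 ground-truth pedestrians, the function name) is a hypothetical choice of mine, so the resulting numbers differ from the [0.8, 0.25] and [1.5, 0.25] points assumed in the text.

```python
def fppi_points(confidences, is_true_positive, num_gt, num_images):
    """One [fppi, mr] point per detection box, sweeping the threshold down."""
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    points = []
    tp = fp = 0
    for i in order:
        # Lowering the threshold to this box's confidence accepts the box.
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / num_images, 1.0 - tp / num_gt))
    return points


conf = [0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45]
# Hypothetical matching results: detector A has 2 false boxes, B has 5.
a_correct = [True, True, True, False, True, True, True, True, False, True]
b_correct = [True, False, True, False, True, False, True, False, True, False]

points_a = fppi_points(conf, a_correct, num_gt=10, num_images=3)
points_b = fppi_points(conf, b_correct, num_gt=10, num_images=3)
max_fppi_a = points_a[-1][0]  # 2 false positives over 3 images
max_fppi_b = points_b[-1][0]  # 5 false positives over 3 images
```

With these made-up labels the curve of detector A stops at an fppi of about 0.67, below $10^{0}$, while the curve of detector B continues to about 1.67, which is exactly the "different curve lengths" effect.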

The above is the general principle of FPPI.

IV. Getting the FPPI curve from the ROC curve

In the code that actually draws the FPPI curve, the author uses functions such as compRoc and plotRoc, whose names contain the word ROC. So let us look at how, in theory, the FPPI curve is obtained from the ROC curve.


The y-axis of the ROC curve is TPR (True positive rate), and the x-axis is FPR (False positive rate):

  • TPR = TP / ( TP + FN )
  • FPR = FP / ( FP + TN )

The calculation formula of TPR and recall is the same, so we can consider TPR=recall

The y-axis of the FPPI curve is miss rate, and the x-axis is fppi (false positives per image):

  • miss rate = FN / ( TP + FN )
  • fppi = FP / number of images

About y-axis conversion

miss rate = ( TP + FN - TP) / ( TP + FN ) = 1 - recall = 1 - TPR

So subtracting the y value of the ROC curve from 1 gives the y value of the FPPI curve

About x-axis conversion

In the compRoc function, the x-axis calculated for the "ROC curve" is fppi! This is really a matter of terminology.

Usually, when we hear the ROC curve, what we think of is a curve where the y-axis is TPR and the x-axis is FPR.

But in the author's code, "ROC" refers to a curve with TPR on the y-axis and fppi on the x-axis. So, unlike the y-axis, there is no conversion from FPR to fppi, because the author calculates fppi directly

We can also look at it from another angle to see why he chose this name. The FPR of the ROC curve and the fppi of the FPPI curve are very similar in nature. Both have FP as the numerator, and both measure false detections.
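Given such "ROC" points with fppi on the x-axis and TPR on the y-axis, the conversion to FPPI-curve points only touches the y values. The helper below is a hypothetical illustration of that step and is not the compRoc or plotRoc code.

```python
def roc_to_fppi_curve(fppi, tpr):
    """Turn (fppi, TPR) points into (fppi, miss rate) points."""
    if len(fppi) != len(tpr):
        raise ValueError("fppi and tpr must have the same length")
    # x stays as it is; y becomes miss rate = 1 - TPR.
    return [(x, 1.0 - y) for x, y in zip(fppi, tpr)]


# Hypothetical points.
curve = roc_to_fppi_curve(fppi=[0.1, 0.5, 1.0], tpr=[0.5, 0.75, 1.0])
```

The three points become (0.1, 0.5), (0.5, 0.25) and (1.0, 0.0).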

The difference between the two is only in the denominator. The denominator of FPR is "all negative cases", and the denominator of fppi is "all pictures".

This is also easy to understand: in the pedestrian detection task, the number of "all negative cases" is enormous. An image contains only a few pedestrians (the "positive cases"), and every place without a pedestrian can be regarded as a "negative case", so this number cannot be determined.

Therefore FPR cannot be calculated in the pedestrian detection task, and fppi is used instead to measure false detections

Finally, I recommend once more that you get these concepts firmly memorized! ( [Machine Learning] TP, TN, FP, FN memory methods )


In the next article, [Pedestrian Detection] miss rate versus false positives per image (FPPI) Past and Present (Practical Part-Part 1), we will run the author's code for drawing the FPPI plot and walk through the source code


Origin blog.csdn.net/weixin_38705903/article/details/109654157