YOLOv1 Meditations

Question 1:
Why are only positive samples considered when calculating BBox-loss and class-loss? What about negative samples?

Answer:
When calculating the BBox loss and the class loss, only positive samples are considered. This is similar to the R-CNN head in Faster R-CNN: only the positive samples produced by the RPN are used to compute the loss of the R-CNN part.
YOLOv1 predicts a confidence (objectness) score to tell foreground from background, and this term is the key to balancing positive and negative samples: negative samples only contribute to the confidence loss.
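Below is a minimal sketch (in PyTorch) of how this masking could look. The tensor layout, the names, and the simplifications (the B=2 responsible-box selection and the square roots on w, h are omitted) are assumptions for illustration, not the paper's actual implementation.

```python
import torch

def yolo_v1_loss_sketch(pred, target, obj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    """Illustrative sketch of how YOLOv1 masks positives and negatives.

    pred, target: (N, S, S, 5 + C) tensors laid out as [x, y, w, h, conf, classes...]
    obj_mask:     (N, S, S) boolean, True where a grid cell contains an object center.
    Layout and names are assumptions, not the original implementation.
    """
    pos = obj_mask
    neg = ~obj_mask

    # BBox loss and class loss: positive cells only.
    bbox_loss = ((pred[..., :4] - target[..., :4])[pos] ** 2).sum()
    class_loss = ((pred[..., 5:] - target[..., 5:])[pos] ** 2).sum()

    # Confidence (objectness) loss: both positives and negatives,
    # with negatives down-weighted by lambda_noobj.
    conf_err = (pred[..., 4] - target[..., 4]) ** 2
    conf_loss = conf_err[pos].sum() + lambda_noobj * conf_err[neg].sum()

    return lambda_coord * bbox_loss + conf_loss + class_loss
```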

Question 2:
Why does each grid cell predict B (B=2) bounding boxes?

Question 3:
How are positive and negative samples determined?
In YOLOv1, the image is divided into 7x7 grid cells. The grid cell that contains the center of an object's ground-truth box is marked as a positive sample; all other cells are negative samples.
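A minimal sketch of this assignment, assuming the ground-truth box center is given in absolute pixel coordinates; the function name is made up for illustration:

```python
def assign_positive_cell(gt_center, img_w, img_h, S=7):
    """Return the (row, col) of the grid cell responsible for a GT box.

    gt_center: (cx, cy) of the ground-truth box in pixels (illustrative format).
    The cell whose area contains the center becomes the positive sample.
    """
    cx, cy = gt_center
    col = min(int(cx / img_w * S), S - 1)  # which of the S columns the center falls in
    row = min(int(cy / img_h * S), S - 1)  # which of the S rows the center falls in
    return row, col
```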

Question 4: What role does the IOU between the prediction box and the GT box play?
In the YOLO series, only positive samples are used to compute the bbox loss and class loss; negative samples are not considered.
The target value of the Pr(objectness) confidence is either 1/0 or IOU/0: a value of 1 (or the IOU between the prediction box and the GT box) marks a positive sample, and 0 marks a negative sample. My view here is the same as that of the author of the article below: simply taking 1 would be enough. But the paper uses the IOU, which leads to many hard-to-follow details later and is a particular headache.
Understanding a Real-Time Object Detection Network: You Only Look Once (YOLOv1)
According to that article's analysis, using the IOU as the confidence target makes what the model learns closer to reality.

The IOU between the prediction box and the GT box serves two purposes:
First, a positive cell has two prediction boxes but only one ground-truth box, so only one of them can be used to compute the bbox loss. How to choose? Take the box with the largest IOU.
Second, the IOU serves as the ground-truth value of the confidence.
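A small sketch of both uses, assuming boxes are given as (cx, cy, w, h); the helper names are made up for illustration:

```python
def iou_xywh(a, b):
    """IoU of two boxes given as (cx, cy, w, h); a plain helper, not paper code."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def pick_responsible_box(pred_boxes, gt_box):
    """Among the B predicted boxes of a positive cell, pick the one with the
    highest IoU against the GT box; that IoU is also the confidence target
    when the IoU variant of the label is used."""
    ious = [iou_xywh(p, gt_box) for p in pred_boxes]
    best = max(range(len(ious)), key=lambda i: ious[i])
    return best, ious[best]
```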

Question 5:
How is the Pr(objectness) confidence used in prediction?
The Pr(objectness)=1 mentioned in the paper refers, in today's terminology, to a "positive sample".
First, be clear that YOLO makes three predictions: objectness, class, and bbox. Objectness is a binary classification (object vs. no object), i.e. the "confidence of the bounding box", corresponding to the "C" in the loss function. For cells without an object, the label is obviously 0; for cells with an object, the label can simply be set to 1, or the IoU between the currently predicted bbox and the GT box can be used as the label. Note that this IoU is only the learning target of the objectness prediction. Class is the category prediction, and it is trained only at the positive-sample grid cells, i.e. where Pr(objectness)=1. Note that Pr(objectness)=1 here only marks the positive samples; it has nothing to do with the IoU and depends only on the label: whichever grid cell the center of the GT box falls in is the positive sample, i.e. Pr(objectness)=1. The same holds for the bbox prediction.

In the test phase, YOLO outputs all three predictions: objectness, class, and bbox. First, we compute score = objectness * class as the score of each bounding box. The Pr(class) * IoU written in the paper is actually this product: the IoU there is the objectness prediction, because the positive-sample label of objectness during training is the IoU. So objectness can be seen as implicitly encoding the IoU, although in essence it predicts whether an object is present.

Don't forget that during training objectness has learned to judge whether each grid cell contains an object. So for cells with an object, objectness will be very close to 1, and an accurate class prediction should also be close to 1. For cells without an object, objectness will be very close to 0 while the class prediction is essentially arbitrary, since the class loss is only computed on the object cells during training. But objectness plays the leading role: even if class blindly predicts a value close to 1, objectness knows there is no object there and outputs a value close to 0, so the final score is very close to 0.
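A tiny illustration of how the objectness score suppresses confident but spurious class predictions; the numbers are made up:

```python
def box_scores(objectness, class_probs):
    """Test-time score sketch: score = objectness * class probability.

    objectness:  predicted confidence of one bounding box, a scalar in [0, 1]
    class_probs: per-class probabilities predicted by the grid cell
    Returns one score per class; in practice these are thresholded and
    passed to non-maximum suppression.
    """
    return [objectness * p for p in class_probs]

# A confident cell vs. an empty one (illustrative values):
print(box_scores(0.9, [0.05, 0.92, 0.03]))  # the true class dominates
print(box_scores(0.02, [0.10, 0.95, 0.20])) # objectness ~0 suppresses everything
```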

Question 6:
How are the c, x, y, w, h of a positive sample computed? This is a question I urgently want answered in every object detection paper, because it determines the loss computation and the final prediction box (the kind of box you would draw by hand).
c stands for confidence, as discussed above.

Now let's look at x, y, w, h:
x, y are the offsets of the box center relative to its grid cell, with values in (0, 1).
w, h are normalized relative to the whole image. Note that at prediction time w, h may exceed the image, so they can be greater than 1.
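A minimal encoding sketch under these conventions; the function name and the pixel-coordinate input format are assumptions for illustration:

```python
def encode_xywh(gt_box, img_w, img_h, S=7):
    """Encode a GT box (cx, cy, w, h in pixels) into YOLOv1-style targets.

    x, y are offsets of the center inside its grid cell, in (0, 1);
    w, h are normalized by the full image size.
    """
    cx, cy, w, h = gt_box
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    x = cx / img_w * S - col   # center offset within the cell, in (0, 1)
    y = cy / img_h * S - row
    w_n = w / img_w            # normalized width
    h_n = h / img_h            # normalized height
    return row, col, (x, y, w_n, h_n)
```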

Origin: blog.csdn.net/u010006102/article/details/126866631