YOLOv1 Meditations
Question 1:
Why are only positive samples considered when calculating BBox-loss and class-loss? What about negative samples?
Answer:
When computing the BBox loss and class loss, only positive samples are considered. This is somewhat similar to the R-CNN head in Faster R-CNN: of the proposals produced by the RPN, only the positive samples are used to compute the R-CNN head's loss.
YOLOv1 uses a confidence (objectness) parameter to distinguish foreground from background. This parameter is the key to balancing positive and negative samples.
Question 2:
Why predict B (B = 2) bounding boxes per grid cell?
Question 3:
How to determine positive and negative samples?
In YOLOv1, the image is divided into 7×7 grid cells. The cell that contains the center of an object is marked as a positive sample; all other cells are negative samples.
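As a minimal sketch (not the original implementation), the positive-cell assignment above can be written as a small function; the function name and arguments here are illustrative:

```python
# Sketch: assigning the positive grid cell for a ground-truth box
# on YOLOv1's S x S grid (S = 7). Not taken from any official codebase.

def positive_cell(cx, cy, img_w, img_h, S=7):
    """Return (col, row) of the grid cell containing the GT box center.

    cx, cy: center of the ground-truth box, in pixels.
    img_w, img_h: image size in pixels.
    """
    # Scale the center into grid units, then floor to get the cell index;
    # clamp so a center sitting exactly on the right/bottom edge stays in bounds.
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    return col, row
```

For example, a box centered at (224, 224) in a 448×448 image lands in cell (3, 3); that cell becomes the positive sample, and the other 48 cells are negatives.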
Question 4: What is the relationship between the IOU of the prediction box and the GT box?
In the YOLO series, only positive samples are used to compute the bbox and class losses; negative samples are not considered.
The target value of the Pr(objectness) confidence is either 1/0 or IoU/0: a value of 1 (or the IoU between the prediction box and the GT box) marks a positive sample, and 0 marks a negative sample. Like the author of the article below, I think simply taking 1 is enough; the paper uses the IoU instead, which leads to many hard-to-understand details later and is a particular headache.
The article Understanding a Real-Time Object Detection Network: You Only Look Once (YOLOv1) argues that using the IoU makes the model's learning more realistic.
The IoU between the prediction box and the GT box serves two purposes:
First, there are two prediction boxes but only one ground-truth box, so only one of them can be chosen to compute the bbox loss. How to choose? Take the box with the largest IoU.
Second, it serves as the ground-truth value of the confidence.
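The two purposes above can be sketched in a few lines of Python. This is an illustrative sketch, assuming axis-aligned boxes in (x1, y1, x2, y2) form; the function names are my own, not from any YOLO codebase:

```python
# Sketch: IoU between two boxes, and picking the "responsible" predictor,
# i.e. the one of the B = 2 predicted boxes with the highest IoU against
# the GT box. That IoU can then also serve as the confidence target.

def iou(a, b):
    """IoU of boxes a, b given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def responsible_box(pred_boxes, gt_box):
    """Return (index, IoU) of the predictor with the highest IoU vs. GT."""
    ious = [iou(p, gt_box) for p in pred_boxes]
    best = max(range(len(pred_boxes)), key=lambda i: ious[i])
    return best, ious[best]
```

Only the returned predictor contributes to the bbox loss for that cell; its IoU (or simply 1, as discussed above) is used as the confidence target.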
Question 5:
How to use Pr(objectness) confidence in prediction?
The Pr(objectness) = 1 mentioned in the paper refers, in today's terminology, to "positive samples".
First of all, it should be clear that YOLO makes three predictions: objectness, class, and bbox. Objectness is a binary classification, i.e. object vs. no object, which is the "confidence of the bounding box", corresponding to the C in the loss function. The label for cells without an object is obviously 0; for cells with an object, the label can simply be 1, or it can be the IoU between the currently predicted bbox and the GT. Note that this IoU is only the label that the objectness prediction learns toward. Class is the category prediction, and it is trained only at positive samples, that is, where Pr(objectness) = 1. Note that Pr(objectness) = 1 just marks out the positive sample; it has nothing to do with IoU or with what YOLO predicts, only with the label: whichever grid cell the center of the GT box falls into is the positive sample, i.e. Pr(objectness) = 1. The same is true for the bbox prediction.
In the test phase, YOLO outputs all three predictions: objectness, class, and bbox. First, we compute score = objectness × class as the score of each bounding box. The Pr(class) × IoU written in the paper is actually this score: the "IoU" there is the objectness prediction, because during training the positive-sample label for objectness is the IoU. So objectness can be thought of as implying the concept of IoU, but in essence it is a prediction of whether there is an object.
Don't forget that during training objectness has learned to judge whether each grid cell contains an object. Then, obviously, where there is an object, objectness will be very close to 1, and an accurate class prediction should also be very close to 1. Where there is no object, objectness will be very close to 0, while the class head predicts blindly, since the class loss is only computed on the object part during training. But objectness will clearly play the leading role: even if the class head blindly predicts a value very close to 1, objectness knows there is no object here and will output a value close to 0, so the score will be very close to 0.
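The test-time scoring described above can be sketched as follows; this is an illustrative sketch of score = objectness × class, with names of my own choosing, not code from the paper:

```python
# Sketch: class-specific confidence at test time.
# Each box's score per class is objectness * Pr(class); a background box
# with objectness near 0 scores near 0 even if the class head predicts high.

def box_scores(objectness, class_probs):
    """objectness: per-box confidence values in [0, 1].
    class_probs: per-cell class probabilities, shared by that cell's boxes.
    Returns, for each box, the score for every class."""
    return [[o * p for p in class_probs] for o in objectness]
```

For example, a box with objectness 0.9 in a cell whose best class probability is 0.8 scores 0.72 for that class, while a box with objectness 0.0 scores 0 for every class; the scores are then thresholded and passed to NMS.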
Question 6:
How are c, x, y, w, and h computed for a positive sample? This is a question I urgently want to answer in every object detection paper, because it determines the loss calculation and the final predicted box (the kind of box you would draw by hand).
c stands for confidence, as mentioned above
Let's take a look at x, y, w, h
x, y are the offsets of the box center relative to its grid cell, with values in (0, 1).
w, h are normalized relative to the entire image. Note that at prediction time w, h may exceed the image, so they can be greater than 1.
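The encoding above can be made concrete with a small encode/decode pair. This is a sketch under the stated convention (cell-relative center offsets, image-relative width/height); the function names are illustrative, and the original paper additionally predicts sqrt(w), sqrt(h) in the loss, which is omitted here:

```python
# Sketch of YOLOv1's box parameterization on an S x S grid (S = 7):
# (x, y) = offset of the box center inside its grid cell, in [0, 1);
# (w, h) = box width/height normalized by the full image size.

def encode(cx, cy, bw, bh, img_w, img_h, S=7):
    """Map a pixel-space box (center cx,cy; size bw,bh) to grid targets."""
    col = min(int(cx / img_w * S), S - 1)
    row = min(int(cy / img_h * S), S - 1)
    x = cx / img_w * S - col      # fractional offset within the cell
    y = cy / img_h * S - row
    w = bw / img_w                # image-relative size
    h = bh / img_h
    return (row, col), (x, y, w, h)

def decode(row, col, x, y, w, h, img_w, img_h, S=7):
    """Inverse of encode: recover the pixel-space box."""
    cx = (col + x) / S * img_w
    cy = (row + y) / S * img_h
    return cx, cy, w * img_w, h * img_h
```

Round-tripping a box through encode and decode recovers the original coordinates, which is an easy sanity check when implementing the loss.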
- YOLO Introductory Tutorial: YOLOv1(2)-Analysis of YOLOv1
- Understanding a Real-Time Object Detection Network: You Only Look Once (YOLOv1)
References
- YOLO slideshow: when I reproduced the code back in 2018, I did not miss this slideshow.
- Graphic YOLO: the author combined it with the YOLO slideshow and wrote an analysis; in 2018 it was very good study material.
- Real-time Object Detection with YOLO, YOLOv2 and now YOLOv3: I read this article when I reproduced yolov1 and yolov2 in 2018. In the absence of Chinese analyses, its illustrations are very clear and its ideas are very good. Highly recommended.
- YOLO Introductory Tutorial: YOLOv1(2) - Analysis of YOLOv1: the author analyzes the YOLO series of papers very well and has also done his own code reproduction and improvements, making this good learning material. yjh0410 is the author's own GitHub handle; he has also written an object detection series.
- Understanding a Real-Time Object Detection Network: You Only Look Once (YOLOv1): an analysis of the original yolov1.
- You must have never seen such an easy-to-understand model interpretation of the YOLO series (from v1 to v5) (Part 1): the author starts from the "Gourd Babies" and analyzes the principles of the YOLO series in detail, with hand-annotated code. Note that it is not the original YOLO code, but the author's own version, evolved from YOLO's principles.