YOLO v1 notes

Things I personally don't understand:

1. Terms 4 and 5 of the loss function

[Paper interpretation] Interpretation of the YOLO trilogy - YOLOv1 (bzdww)

https://www.youtube.com/watch?v=NkFENlEb4kM&t=672s


Training phase:

C_i (predicted value): the confidence score of the first and second bbox in the 7*7*30 tensor output by the network

C_i^hat (label value): defined in the paper as Pr(Object) * IOU, where the IOU is computed between the predicted box and the ground-truth box

PS:

1.1. C_i^hat is defined as the IOU in YOLOv1, but the author moved away from this after v1: by v3 the label is simply 1, which does converge and is more stable.

1.2. A considerable number of code implementations also use 1 directly as the label value (both variants are sketched below).
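
As a minimal sketch, assuming axis-aligned boxes in (x1, y1, x2, y2) format: the iou helper, confidence_label, and the use_iou_target switch below are illustrative names of mine, not code from the paper or from any particular implementation.

```python
# Minimal sketch (not the authors' code): computing the confidence label
# C_i^hat for a responsible predictor. The use_iou_target flag switches
# between the v1 definition (Pr(Object) * IOU) and the later, more common
# practice of using 1 directly.

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def confidence_label(pred_box, gt_box, has_object, use_iou_target=True):
    """C_i^hat = Pr(Object) * IOU; Pr(Object) is 1 if the cell holds an object."""
    if not has_object:
        return 0.0
    return iou(pred_box, gt_box) if use_iou_target else 1.0

# Example: a predicted box partially overlapping the ground truth.
print(confidence_label((0, 0, 2, 2), (1, 1, 3, 3), has_object=True))  # 1/7 ≈ 0.143
```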

Personal understanding:

Terms 1 and 2 make the predicted bbox gradually learn the position of the ground-truth box and move close to it; term 3 accelerates this convergence of the predicted box toward the ground-truth box.

Terms 1 and 2 make the xy and wh predictions fit better.

Extreme case for term 3: the two boxes have no intersection, so the IOU equals 0; terms 1 and 2 will be very large, and so is the loss.

Ideal case for term 3: the two boxes coincide, so the IOU equals 1, and the predicted C_i is pushed toward 1, meaning the model is 100% sure an object is present. Terms 1 and 2 nearly vanish, so the loss from terms 1 and 2 is very small.

(Pure conjecture) A compromise case: the two boxes intersect, but not by much. Here the author wants the confidence of the predicted bbox to approach the IOU value. The gap between the predicted box and the ground-truth box in xywh can be expressed with numbers, but that is not intuitive enough, so the author instead uses the IOU between the predicted box and the ground-truth box as the percentage to which the predicted box can be trusted, i.e., the confidence. In other words, C_i gradually approaches the IOU of the two boxes, treating the IOU as a measure of trustworthiness. But as mentioned above, using 1 as the label is more stable in practice, and I guess the author later gave this idea up: the compromise case is not modeled well by the IOU, and it is better to regress directly toward 1, i.e., toward the case where the two boxes fully overlap.
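
To make the interplay between the terms concrete, here is a rough sketch of terms 1-3 of the loss for a single responsible predictor, following the paper's squared-error formulation with lambda_coord = 5; the function and variable names are mine, not the paper's.

```python
import math

# Rough sketch (my own, not the paper's code) of loss terms 1-3 for one
# responsible predictor. lambda_coord = 5 as in the YOLOv1 paper.
LAMBDA_COORD = 5.0

def loss_terms_1_to_3(pred, gt, c_pred, c_label):
    """pred/gt are (x, y, w, h); c_pred is C_i, c_label is C_i^hat.

    Term 1: squared error on the box center (x, y).
    Term 2: squared error on sqrt(w), sqrt(h) (softens the penalty on large boxes).
    Term 3: squared error on the confidence.
    """
    term1 = LAMBDA_COORD * ((pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2)
    term2 = LAMBDA_COORD * ((math.sqrt(pred[2]) - math.sqrt(gt[2])) ** 2
                            + (math.sqrt(pred[3]) - math.sqrt(gt[3])) ** 2)
    term3 = (c_pred - c_label) ** 2
    return term1, term2, term3

# Far-off box, IOU ~ 0: terms 1 and 2 dominate the loss.
print(loss_terms_1_to_3((0.1, 0.1, 0.2, 0.2), (0.8, 0.8, 0.5, 0.5), 0.9, 0.0))
# Near-perfect box, IOU ~ 1: all three terms are close to zero.
print(loss_terms_1_to_3((0.8, 0.8, 0.5, 0.5), (0.8, 0.8, 0.5, 0.5), 0.95, 1.0))
```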

2. Inception modules

3. 1*1 reduction: a 1*1 convolution layer is often used to keep the spatial size constant while changing the number of channels and mixing features across channels.

The origin of the word "reduction": in deep learning, a 1x1 convolution is commonly used to reduce the dimensionality (channel count) of a feature map; this operation is called "1x1 convolution dimensionality reduction" or simply "1x1 reduction".
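
A small PyTorch sketch of the reduction idea, assuming an illustrative 256 -> 64 channel squeeze (the sizes are not taken from the actual YOLOv1 architecture): the spatial size stays the same, only the channel count changes.

```python
import torch
import torch.nn as nn

# Sketch of a 1x1 "reduction" layer: spatial size is unchanged, only the
# number of channels drops (here 256 -> 64; sizes are illustrative).
reduce = nn.Conv2d(in_channels=256, out_channels=64, kernel_size=1)

x = torch.randn(1, 256, 28, 28)   # (batch, channels, height, width)
y = reduce(x)
print(y.shape)                    # torch.Size([1, 64, 28, 28])
```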

4. Neural network calculation process
