[yolov7 series 2] Positive and negative sample assignment strategy

This article focuses on yolov7's positive and negative sample screening strategy and compares it with those of yolov5 and yolov6.

First, following up on yolov7 series 1 on the overall structure of the network, let's patch a few small holes, which hopefully won't cause anyone trouble:
[Figure: corrected overall yolov7 network structure]
For example, in the E-ELAN layer, a conv layer is needed after the cat operation for feature fusion:
[Figure: E-ELAN layer after the fix]
The SPPCSPC layer, after everyone's errata, changes as follows:
[Figure: corrected SPPCSPC layer]
There are a few other small issues: for example, the RepConv layer in the yolov7 paper removes the identity branch, and the activation function after convolution is SiLU. Because the yolov7 network diagram was built from yolov7.yaml of the Tag 0.1 code, which the author keeps optimizing and iterating on, updates will continue to follow.

**yolov7, like yolov5, is anchor-based object detection, while yolov6's positive and negative sample matching strategy is the same as yolox's; yolov7 essentially combines the strengths of both.** Let's first review the positive and negative sample matching strategies of yolov5 and yolov6.

yolov5's positive and negative sample matching strategy

yolov5 is anchor-based. Before training starts, 9 prior anchor boxes, sorted from small to large, are obtained by k-means clustering over the gt (ground-truth) boxes of the training set. Each gt is first matched against the 9 anchors (previously by IoU matching; yolov5 switched to shape matching: compute the width and height ratios between the gt and each of the 9 anchors, and if the ratio is below the set threshold, that gt and anchor match).
[Figure: yolov5 network architecture]
As shown above in the yolov5 network architecture, yolov5 has three prediction layers and 9 anchors sorted from small to large; every 3 anchors correspond to one prediction layer. A gt is assigned to the layer of each anchor it matches, which then makes training predictions for that gt, and one gt may match several anchors.
A gt may therefore be trained on several different prediction layers, which greatly increases the number of positive samples. Of course, a gt may also match none of the anchors; it is then treated as background and does not participate in training, which indicates the anchor box sizes were not well designed.
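The shape-matching rule above can be sketched as follows. This is a minimal illustration written from the description, not the repository code; `match_anchors_by_shape` is a hypothetical name, and the threshold corresponds to yolov5's `anchor_t` hyperparameter (default 4.0).

```python
import numpy as np

def match_anchors_by_shape(gt_wh, anchors_wh, thr=4.0):
    """Shape-based gt/anchor matching (sketch).
    gt_wh: (N, 2) gt widths/heights; anchors_wh: (A, 2) anchor widths/heights.
    A gt matches an anchor when neither width nor height differs by more
    than a factor of `thr` in either direction. Returns an (N, A) bool mask."""
    gt_wh = np.asarray(gt_wh, dtype=float)
    anchors_wh = np.asarray(anchors_wh, dtype=float)
    ratio = gt_wh[:, None, :] / anchors_wh[None, :, :]        # (N, A, 2)
    worst = np.maximum(ratio, 1.0 / ratio).max(axis=2)        # worst-case ratio
    return worst < thr
```

A gt that matches no anchor at all would be treated as background, as described above.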

How are positive and negative samples defined during training? Because negative samples in yolov5 do not participate in training, the number of positive samples needs to be increased. After a gt box is matched to an anchor, find the grid of the prediction layer that anchor belongs to and see which grid cell the gt centre falls in: not only the matched anchor in that cell is taken as a positive sample, but also the matched anchors in the two adjacent cells.

**As shown in the figure below, the centre of the green gt box falls in the third quadrant of the red grid cell, so not only that cell but also the cell to its left and the cell below it are taken.** Based on these three cells, the anchors matched to the gt are placed with their centres at the centres of the three cells and with the anchors' widths and heights, as positive samples. Meanwhile, a gt may match more than one anchor box, so one gt may yield anywhere from 3 to 27 positive samples, increasing the number of positive samples.

[Figure: gt centre cell plus the two nearest neighbouring cells taken as positive samples]
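The three-cell selection described above can be sketched as follows. This is a simplified illustration of the offset rule (the centre cell plus the two neighbours nearest to the gt centre), not the repository code; `positive_grids` is a made-up name.

```python
def positive_grids(cx, cy, grid_w, grid_h):
    """Return the grid cell containing the gt centre (cx, cy, in grid units)
    plus the two nearest neighbouring cells, as in yolov5's expansion rule."""
    gi, gj = int(cx), int(cy)
    cells = [(gi, gj)]
    # horizontal neighbour: left if the centre is in the left half of the cell
    if cx - gi < 0.5 and gi > 0:
        cells.append((gi - 1, gj))
    elif cx - gi >= 0.5 and gi < grid_w - 1:
        cells.append((gi + 1, gj))
    # vertical neighbour: up if the centre is in the top half of the cell
    if cy - gj < 0.5 and gj > 0:
        cells.append((gi, gj - 1))
    elif cy - gj >= 0.5 and gj < grid_h - 1:
        cells.append((gi, gj + 1))
    return cells
```

For a centre in the lower-left quadrant of its cell, this picks the cell itself, the cell to the left, and the cell below, matching the figure.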

yolov6's positive and negative sample matching strategy

yolov6's positive and negative sample matching strategy is the same as yolox's: it is anchor-free, and because it does not rely on prior knowledge of prior boxes, it should in theory generalize better across scenes. Quoting Megvii's official interpretation: anchors add complexity to the detection head and increase the number of generated results, and for some edge devices, transferring a large number of detection results from the NPU to the CPU is intolerable.

The positive sample screening in yolov6 is mainly divided into the following parts:
①: Rough screening based on two dimensions;
②: Further screening based on simOTA.

The specific steps are as follows:
[Figure: a labeled gt box with its centre point and centre-to-corner distances]
For a labeled gt as shown in the figure, find its centre point (Cx, Cy) and compute the distances from the centre to the top-left corner (l_l, l_t) and to the bottom-right corner (l_r, l_b); then filter positive samples in two steps:

Step 1 is coarse screening along two dimensions. **First dimension:** if a grid cell's centre falls inside the gt, the box predicted by that cell is considered a positive-sample candidate, shown as the red and orange parts. **Second dimension:** taking the centre of the cell containing the gt centre as the origin, expand up, down, left, and right within 2.5 cell strides; the boxes predicted by those cells are also candidates by default, shown as the purple and orange parts in the figure. In this way the first step screens out 31 positive samples in the figure (note: this is a single layer; yolov6 has three prediction layers, and the positive samples of each layer are computed separately and then combined).
[Figure: two-dimensional coarse screening, cells inside the gt (red/orange) plus cells within 2.5 strides of its centre (purple/orange)]
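The two-dimensional coarse screening can be sketched as follows. This is an illustration written from the description above; `coarse_candidates` and its arguments are hypothetical names, not the yolov6 source.

```python
import numpy as np

def coarse_candidates(grid_centers, gt_box, gt_center, stride, radius=2.5):
    """First-stage screening as in yolox/yolov6 (sketch): a grid cell is a
    candidate if its centre lies inside the gt box, OR within `radius`
    strides of the gt centre (the 'centre prior').
    grid_centers: (M, 2) cell centres in image coords; gt_box: (x1, y1, x2, y2)."""
    gc = np.asarray(grid_centers, dtype=float)
    x1, y1, x2, y2 = gt_box
    inside_box = (gc[:, 0] > x1) & (gc[:, 0] < x2) & \
                 (gc[:, 1] > y1) & (gc[:, 1] < y2)
    cx, cy = gt_center
    inside_prior = (np.abs(gc[:, 0] - cx) < radius * stride) & \
                   (np.abs(gc[:, 1] - cy) < radius * stride)
    return inside_box | inside_prior
```

Cells satisfying both conditions correspond to the orange region in the figure; cells satisfying only one correspond to the red or purple regions.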
Step 2: Further screening by SimOTA:

SimOTA is an optimization of OTA, a dynamic matching algorithm. For details, see Megvii's official interpretation (https://www.zhihu.com/question/473350307/answer/2021031747).

The SimOTA process is as follows:
① Calculate the IoU between each coarsely screened candidate and the gt, sort the IoUs from large to small, take the sum of the top ten, and round it to an integer, recorded as b.
② Calculate the cost function of each candidate, sort the costs from small to large, and take the b samples with the smallest cost as positive samples.
At the same time, if the same grid's prediction box is matched to two gts, keep the gt with the smaller cost; the prediction box becomes a positive sample for that gt only.
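Steps ① and ② can be sketched together for a single gt as follows. This is a simplified illustration with hypothetical names; the real implementation builds the cost from classification and IoU losses and also resolves multi-gt conflicts, and the integer rounding here simply truncates (the exact rounding convention may differ from the text).

```python
import numpy as np

def simota_dynamic_k(ious, costs, topk=10):
    """Second-stage simOTA screening (sketch) for one gt.
    ious/costs: (M,) per-candidate IoU and cost against the gt.
    dynamic k = integer part of the sum of the top-`topk` IoUs (at least 1);
    keep the k candidates with the smallest cost."""
    ious = np.asarray(ious, dtype=float)
    costs = np.asarray(costs, dtype=float)
    k = max(1, int(np.sort(ious)[::-1][:topk].sum()))  # dynamic k, 'b' in the text
    return np.argsort(costs)[:k]                       # indices of kept positives
```

The intuition is that a gt overlapped well by many candidates deserves more positive samples, while a hard, poorly-overlapped gt gets only a few.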
For details, please refer to Jiang Dabai's Zhihu article: https://www.zhihu.com/search?type=content&q=simOTA

yolov7's positive and negative sample matching strategy

Because yolov7 is anchor-based, it combines the essence of both v5 and v6: the first (coarse screening) step of yolov6 is replaced with yolov5's positive-sample screening strategy, while the second step, further screening with simOTA, is retained.

Meanwhile, yolov7 has two heads, an aux_head and a lead_head. The aux_head serves as an auxiliary; its positive-sample screening strategy is the same as lead_head's, only more relaxed. For example, in the first screening step, lead_head takes the cell containing the gt centre plus the two nearest cells for its candidate prediction boxes, shown as the green cells, while aux_head takes the centre cell plus all four surrounding cells, shown as the green plus blue cells in the figure below.
[Figure: lead_head candidate cells (green) vs aux_head candidate cells (green + blue)]
Likewise, in the second, simOTA step, lead_head computes the IoU between the coarse candidates and the gt, sorts the IoUs from large to small, and takes the sum of the top ten, rounded to an integer, as b; aux_head instead takes the sum of the top twenty. The other steps are the same. aux_head mainly increases recall and prevents missed detections, while lead_head screens further on top of aux_head's results.
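The numerical difference between the two heads in this step, the top-10 vs top-20 dynamic k, can be illustrated as follows (a hypothetical sketch, not the repository code):

```python
import numpy as np

def dynamic_k(ious, topk):
    """Dynamic-k estimate used in the simOTA stage (sketch): integer part of
    the sum of the top-`topk` candidate IoUs, at least 1."""
    top = np.sort(np.asarray(ious, dtype=float))[::-1][:topk]
    return max(1, int(top.sum()))

ious = [0.25] * 15                     # toy IoUs of 15 coarse candidates
k_lead = dynamic_k(ious, topk=10)      # lead_head: top-10 sum -> 2
k_aux = dynamic_k(ious, topk=20)       # aux_head: top-20 sum -> 3, more relaxed
```

With the larger top-k, aux_head keeps at least as many positives as lead_head, which is what gives it the higher recall described above.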

The above is yolov7's positive and negative sample matching strategy; I hope it helps. If there are any bugs in the article, feel free to discuss them.

Finally, if you need the ppt used in this article, follow the official account, message the backend to add WeChat, and note "ppt" to receive it.

Reference:
[1] https://github.com/WongKinYiu/yolov7 (official github code)
[2] https://arxiv.org/pdf/2207.02696.pdf (yolov7 paper)
[3] https://zhuanlan.zhihu.com/p/39
[4] YOLOv7 officially open-sourced | From Alexey Bochkovskiy's platform, accuracy and speed surpassing all YOLOs (qq.com)
[5] How to evaluate Megvii's open-source YOLOX, with results better than YOLOv5?

Origin blog.csdn.net/zqwwwm/article/details/125971506