Interpretation of the YOLOv6 v3.0 Object Detection Paper


Paper: "YOLOv6 v3.0: A Full-Scale Reloading"
github: https://github.com/meituan/YOLOv6

Summary

In YOLOv6 v3.0, YOLOv6-N reaches 37.5% AP at 1187 FPS;
YOLOv6-S reaches 45.0% AP at 484 FPS;
with an enlarged backbone and neck, YOLOv6-M/L reach 50.0%/52.8% AP at essentially unchanged latency;
YOLOv6-L6 achieves state-of-the-art accuracy among real-time object detectors. Figure 1 compares YOLOv6 with other detectors.
[Figure 1: comparison of YOLOv6 v3.0 with other real-time detectors]
The contributions of YOLOv6 v3.0 are summarized as follows:
1. Renew the neck as RepBi-PAN, introducing the BiC module and the SimCSPSPPF block;
2. Adopt an Anchor-Aided Training (AAT) strategy that adds no inference cost;
3. Add a deeper stage to the backbone and neck to strengthen performance on high-resolution inputs;
4. Introduce a new self-distillation strategy to improve small YOLOv6 models: a heavier DFL-equipped branch assists the regression branch during training and is removed at inference to avoid extra latency.

Algorithm

2.1 Network Design

Building on PAN, the author proposes the Bi-directional Concatenation (BiC) module, shown in Figure 2, which fuses the backbone feature C_{i-1} with the neck feature P_i so that more precise localization signals are preserved, benefiting small-object localization.
The author also simplifies the SPPF block into the SimCSPSPPF block to enhance representational ability. The resulting neck of YOLOv6 is named RepBi-PAN.
[Figure 2: the BiC module and the RepBi-PAN neck]

2.2 Anchor-Aided Training

The author finds that anchor-based YOLOv6-N outperforms its anchor-free counterpart, as shown in Table 1.
[Table 1: anchor-based vs. anchor-free YOLOv6-N]
To combine the advantages of the anchor-based and anchor-free paradigms, the author proposes the Anchor-Aided Training (AAT) scheme, shown in Figure 3. During training, the anchor-based auxiliary branch and the anchor-free branch compute their losses independently, and the auxiliary branch helps optimize the anchor-free head; at inference, the auxiliary branch is removed, so accuracy improves while speed stays unchanged.
[Figure 3: anchor-aided training]

2.3 Self-distillation

The self-distillation loss used in the previous version of YOLOv6 is shown in Equation 1; DFL is adopted so that the box regression branch can be distilled alongside the classification branch:

$$L_{KD} = KL\big(p_t^{cls} \,\|\, p_s^{cls}\big) + KL\big(p_t^{reg} \,\|\, p_s^{reg}\big) \tag{1}$$
Soft labels from the teacher are easier to learn in the early stage of training, while hard labels suit the later stage better. The author therefore applies cosine decay to the distillation weight, as shown in Equation 3.

[Equation 3: cosine decay schedule of the distillation weight]

Because DFL slows down inference, the author designs Decoupled Localization Distillation (DLD): during distillation, the student carries both the original regression branch and an auxiliary branch equipped with DFL, while the teacher uses only the auxiliary branch; the original branch is trained with hard labels, and the auxiliary branch is updated with both hard labels and the teacher's outputs. After distillation, the auxiliary branch is removed.

Experiments

The author compares the models under FP16 precision; the results are shown in Table 2 and Figure 1.
[Table 2: comparison with other real-time detectors]
YOLOv6-N surpasses YOLOv5-N/YOLOv7-Tiny by 9.5%/4.2%;
YOLOv6-S surpasses YOLOX-S/PPYOLOE-S by 3.5%/0.9% while running faster;
YOLOv6-M surpasses YOLOv5-M by 4.6%;
YOLOv6-L surpasses YOLOX-L/PPYOLOE-L by 3.1%/1.4%;
compared with the YOLOv8 series, the performance is close.

Similar to YOLOv5, the author adds a C6 layer to the backbone to detect larger objects, with the neck adjusted accordingly; the resulting models are named YOLOv6-N6/S6/M6/L6. The experimental results are shown in Table 2: compared with YOLOv5, performance improves while inference speed is basically unchanged; compared with YOLOv7-E6E, YOLOv6-L6 improves performance by 0.4% and reduces latency by 63%.

Ablation experiments

The ablation results are shown in Table 3: BiC+SimCSPSPPF improves performance by 0.6%, AAT by 0.3%, and DLD by 0.7%.
[Table 3: ablation of BiC+SimCSPSPPF, AAT, and DLD]
The impact of the BiC module is shown in Table 4. Inserting BiC into the top-down path of PAN improves YOLOv6-S/L by 0.6%/0.4%, whereas inserting it into the bottom-up path brings no gain; the author's analysis is that a BiC in the bottom-up path causes the detection head to confuse features of different scales.
[Table 4: effect of BiC placement in the PAN paths]
Table 5 compares different types of SPP blocks, where SimSPPF*3 means that the P3, P4 and P5 layers all use SimSPPF blocks. SimSPPCSPC surpasses SimSPPF by 1.6%/0.3% on YOLOv6-N/S but at higher latency, while SimCSPSPPF surpasses SimSPPF by 1.1%/0.4%/0.1% on YOLOv6-N/S/M. Balancing performance against latency, the author adopts SimCSPSPPF in YOLOv6-N/S and SimSPPF in YOLOv6-M/L.
[Table 5: comparison of SPP block variants]
As shown in Table 6, anchor-aided training (AAT) brings 0.3%/0.5%/0.5% performance gains on YOLOv6-S/M/L, and the small-object performance of YOLOv6-N/S/M improves markedly.
[Table 6: effect of anchor-aided training]
Table 7 shows that the cosine decay of the distillation weight improves YOLOv6-L by 0.6%.
[Table 7: effect of distillation weight decay]
Table 8 shows that DLD brings a 0.7% performance gain on YOLOv6-S.
[Table 8: effect of DLD]

Conclusion

The author further improves YOLOv6, reaching SOTA performance in real-time object detection.

Source: blog.csdn.net/qq_41994006/article/details/129150299