Safety helmet detection system based on YOLOv8 (2): Gold-YOLO, far ahead, assisting behavior detection | Huawei Noah NeurIPS23

Table of contents

1. YOLOv8 introduction

2. Introduction to the safety helmet dataset

3. Gold-YOLO

4. Analysis of training results


1. YOLOv8 introduction

Ultralytics YOLOv8 is the latest version of the YOLO object detection and image segmentation model developed by Ultralytics. It is a cutting-edge, state-of-the-art (SOTA) model that builds on the success of previous YOLO versions and introduces new features and improvements for better performance and flexibility. It can be trained on large datasets and runs on a variety of hardware platforms, from CPUs to GPUs.
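As a quick orientation, a minimal training and inference sketch with the official ultralytics Python package might look like the following; the checkpoint name, dataset YAML path, image file, and hyperparameters are placeholders rather than values from this article.

```python
from ultralytics import YOLO

# Load a pretrained YOLOv8 nano checkpoint (placeholder weight name).
model = YOLO("yolov8n.pt")

# Fine-tune on a custom dataset described by a YOLO-format YAML file
# (path and hyperparameters are illustrative, not from this article).
model.train(data="helmet.yaml", epochs=100, imgsz=640, batch=16)

# Validate, then run inference on a sample image.
metrics = model.val()
results = model.predict("example.jpg", conf=0.25)
print(metrics.box.map50, len(results))
```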

The specific improvements are as follows:

  1. Backbone: still follows the CSP idea, but the C3 module from YOLOv5 is replaced by the C2f module for further lightweighting. YOLOv8 also keeps the SPPF module used in YOLOv5 and other architectures (a minimal sketch of the C2f block follows this list);

  2. PAN-FPN: YOLOv8 still follows the PAN idea, but comparing the structure diagrams of YOLOv5 and YOLOv8 shows that YOLOv8 removes the convolution in the PAN-FPN upsampling stage of YOLOv5 and likewise replaces the C3 module with the C2f module;

  3. Decoupled-Head: YOLOv8 switches to a decoupled detection head;

  4. Anchor-Free: YOLOv8 abandons the previous anchor-based approach in favor of an anchor-free design;

  5. Loss function: YOLOv8 uses VFL Loss as the classification loss and DFL Loss + CIoU Loss as the regression loss;

  6. Sample matching: YOLOv8 abandons the previous IoU matching and single-side proportion assignment strategies, and instead uses the Task-Aligned Assigner matching method.
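For reference, here is a minimal PyTorch sketch of the C2f block in the spirit of the ultralytics implementation; the class names, default arguments, and the smoke test at the bottom are illustrative rather than copied from the official code.

```python
import torch
import torch.nn as nn

class ConvBNSiLU(nn.Module):
    """Conv + BatchNorm + SiLU, the basic building block used throughout YOLOv8."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Bottleneck(nn.Module):
    """Residual bottleneck used inside C2f."""
    def __init__(self, c, shortcut=True):
        super().__init__()
        self.cv1 = ConvBNSiLU(c, c, 3)
        self.cv2 = ConvBNSiLU(c, c, 3)
        self.add = shortcut

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.add else y

class C2f(nn.Module):
    """Split the input, chain n bottlenecks on one half, and concatenate every
    intermediate output before a final 1x1 fusion (richer gradient flow than C3)."""
    def __init__(self, c_in, c_out, n=1, shortcut=False, e=0.5):
        super().__init__()
        self.c = int(c_out * e)
        self.cv1 = ConvBNSiLU(c_in, 2 * self.c, 1)
        self.cv2 = ConvBNSiLU((2 + n) * self.c, c_out, 1)
        self.m = nn.ModuleList(Bottleneck(self.c, shortcut) for _ in range(n))

    def forward(self, x):
        y = list(self.cv1(x).chunk(2, 1))      # split into two halves
        y.extend(m(y[-1]) for m in self.m)     # chain bottlenecks, keep every output
        return self.cv2(torch.cat(y, 1))       # fuse all branches

if __name__ == "__main__":
    x = torch.randn(1, 64, 80, 80)
    print(C2f(64, 128, n=2)(x).shape)  # torch.Size([1, 128, 80, 80])
```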

The framework diagram is provided at the link: Brief summary of YOLOv8 model structure · Issue #189 · ultralytics/ultralytics · GitHub

2. Introduction to the safety helmet dataset

The dataset contains 3241 images, randomly split into train:val:test = 7:2:1; the single category is hat.
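As an illustration of how such a split might be produced, here is a hedged sketch; the directory layout, file names, and the generated YAML are assumptions for demonstration, not the author's actual pipeline.

```python
import random
import shutil
from pathlib import Path

# Assumed layout: all images in images/, YOLO-format labels in labels/ (hypothetical paths).
images = sorted(Path("images").glob("*.jpg"))
random.seed(0)
random.shuffle(images)

n = len(images)
splits = {
    "train": images[: int(0.7 * n)],
    "val":   images[int(0.7 * n): int(0.9 * n)],
    "test":  images[int(0.9 * n):],
}

for split, files in splits.items():
    (Path("datasets/helmet/images") / split).mkdir(parents=True, exist_ok=True)
    (Path("datasets/helmet/labels") / split).mkdir(parents=True, exist_ok=True)
    for img in files:
        lbl = Path("labels") / (img.stem + ".txt")
        shutil.copy(img, Path("datasets/helmet/images") / split / img.name)
        if lbl.exists():
            shutil.copy(lbl, Path("datasets/helmet/labels") / split / lbl.name)

# Minimal dataset YAML for ultralytics training (single class: hat).
Path("helmet.yaml").write_text(
    "path: datasets/helmet\n"
    "train: images/train\n"
    "val: images/val\n"
    "test: images/test\n"
    "names:\n  0: hat\n"
)
```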

 

3. Gold-YOLO

Link: https://arxiv.org/pdf/2309.11331.pdf 

Problems with traditional YOLO

In a detection model, the backbone first extracts a series of features at different levels, and FPN builds a corresponding fusion structure on top of them: features at different levels carry position information for objects of different sizes. Although the information contained in these features differs, once fused with each other they compensate for each other's missing information, enrich the information at every level, and improve network performance.

Thanks to its layer-by-layer, progressive fusion pattern, the original FPN structure fuses the information of adjacent layers thoroughly, but the same pattern causes problems for cross-layer fusion: when non-adjacent layers need to exchange information, the lack of a direct connection means they can only rely on intermediate layers as "intermediaries", which introduces a certain amount of information loss. Many previous works have noticed this problem, and the usual remedy is to add more paths, such as shortcuts, to strengthen the information flow.
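To make this limitation concrete, below is a hedged, minimal sketch of a classic FPN top-down pathway: information from the deepest level P5 can only reach P3 after passing through P4, which is exactly the indirect, lossy route criticized above. Channel counts and layer choices are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    """Classic top-down FPN: each level is fused only with its direct neighbour."""
    def __init__(self, in_channels=(256, 512, 1024), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, c3, c4, c5):
        p5 = self.lateral[2](c5)
        # P5 information reaches P4 directly ...
        p4 = self.lateral[1](c4) + F.interpolate(p5, scale_factor=2, mode="nearest")
        # ... but reaches P3 only indirectly, via the already-fused P4.
        p3 = self.lateral[0](c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
        return [s(p) for s, p in zip(self.smooth, (p3, p4, p5))]

if __name__ == "__main__":
    c3, c4, c5 = (torch.randn(1, c, s, s) for c, s in ((256, 80), (512, 40), (1024, 20)))
    print([o.shape for o in TinyFPN()(c3, c4, c5)])
```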

Abstract: Current YOLO-series models usually adopt FPN-like structures for information fusion, but such structures lose information when fusing features across non-adjacent layers. To address this, we propose a new Gather-and-Distribute (GD) mechanism, which aggregates and distributes features from different levels from a global perspective, building a more comprehensive and efficient information interaction and fusion mechanism, and we construct Gold-YOLO on top of the GD mechanism. On the COCO dataset, our Gold-YOLO surpasses the existing YOLO series and achieves SOTA on the accuracy-speed curve.

 

A new information interaction and fusion mechanism is proposed: the Gather-and-Distribute (GD) mechanism. It obtains global information by globally fusing features from all levels, and then injects this global information back into the features at each level, achieving efficient information interaction and fusion. The GD mechanism significantly enhances the information fusion ability of the neck without significantly increasing latency, and improves the model's ability to detect objects of different sizes.
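The following is a minimal, hedged PyTorch sketch of the gather-and-distribute idea as described above, not the authors' implementation: all levels are first aligned to one resolution and fused into a single global feature ("gather"), and that global feature is then resized and injected back into every level ("distribute"). The channel counts, the 1x1 convolution used as the fusion block, and the additive injection are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatherAndDistributeSketch(nn.Module):
    """Gather: align all levels to one scale and fuse them into a global feature.
    Distribute: resize the global feature back to each level and inject it."""
    def __init__(self, channels=(256, 512, 1024), fused=256):
        super().__init__()
        self.align = nn.ModuleList(nn.Conv2d(c, fused, 1) for c in channels)
        self.fuse = nn.Conv2d(fused * len(channels), fused, 1)   # conv-based fusion
        self.inject = nn.ModuleList(nn.Conv2d(fused, c, 1) for c in channels)

    def forward(self, feats):
        # Gather: bring every level to a common (mid-level) resolution and fuse globally.
        target = feats[len(feats) // 2].shape[-2:]
        aligned = [F.interpolate(a(f), size=target, mode="nearest")
                   for a, f in zip(self.align, feats)]
        global_feat = self.fuse(torch.cat(aligned, dim=1))
        # Distribute: inject the global feature into each level at its own resolution.
        return [f + inj(F.interpolate(global_feat, size=f.shape[-2:], mode="nearest"))
                for f, inj in zip(feats, self.inject)]

if __name__ == "__main__":
    feats = [torch.randn(1, c, s, s) for c, s in ((256, 80), (512, 40), (1024, 20))]
    print([o.shape for o in GatherAndDistributeSketch()(feats)])
```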

In Gold-YOLO, to handle objects of different sizes while trading off accuracy against speed, two GD branches are constructed to fuse information: the low-stage gather-and-distribute branch (Low-GD) and the high-stage gather-and-distribute branch (High-GD), which extract and fuse feature information based on convolution and on transformer blocks, respectively.
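As a rough illustration of the high-stage idea, the conv-based fusion step in the previous sketch could be swapped for a self-attention block. The module below is a hedged stand-in, not the paper's High-GD implementation; the token layout and layer settings are assumptions.

```python
import torch
import torch.nn as nn

class AttentionFuseSketch(nn.Module):
    """Drop-in replacement for the 1x1-conv fusion step: flatten the gathered map
    into tokens, run one transformer encoder layer, and reshape back."""
    def __init__(self, channels=768, out_channels=256, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=channels, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.proj = nn.Conv2d(channels, out_channels, 1)

    def forward(self, x):                      # x: [B, C, H, W] gathered feature
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # [B, H*W, C]
        tokens = self.encoder(tokens)          # global self-attention over positions
        x = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.proj(x)

if __name__ == "__main__":
    print(AttentionFuseSketch()(torch.randn(1, 768, 40, 40)).shape)  # [1, 256, 40, 40]
```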

For details of the source code, see: Gold-YOLO, far ahead, surpassing all YOLO | Huawei Noah NeurIPS23 — AI Little Monster's Blog, CSDN.

4. Analysis of training results

The training results are as follows:

mAP@0.5 improved from 0.897 to 0.913.


Origin: blog.csdn.net/m0_63774211/article/details/133513119