What improvements does YOLOv8 have over the previous generation?

The YOLO series has been updated again!

All I can say is that the YOLO series is developing so fast that it's hard to keep up!

YOLOv1-YOLOv8 series review

YOLOv1: 2015, Joseph Redmon and Ali Farhadi et al. (University of Washington)

YOLOv2: 2016, Joseph Redmon and Ali Farhadi et al. (University of Washington)

YOLOv3: 2018, Joseph Redmon and Ali Farhadi et al. (University of Washington)

YOLOv4: 2020, Alexey Bochkovskiy and Chien-Yao Wang et al.

YOLOv5: 2020, Ultralytics

YOLOv6: 2022, Meituan

YOLOv7: 2022, Alexey Bochkovskiy and Chien-Yao Wang et al.

YOLOv8: 2023, Ultralytics

Let’s take a look at the classic YOLOv5 first

Backbone: CSPDarkNet structure. Its main structural idea is embodied in the C3 module, which is also where the gradient-splitting idea lives;

PAN-FPN: the dual-stream FPN is both accurate and fast, though quantization still requires graph optimization to reach optimal performance, e.g. aligning scales before and after the concat. Besides the upsampling and CBS convolution modules, the most important component here is again the C3 module (remember this C3 module);

Head: Coupled Head + Anchor-Based. There is no doubt that YOLOv3, YOLOv4, YOLOv5, and YOLOv7 are all anchor-based. Will that change in the future?

Loss: BCE Loss is used for classification and CIoU Loss is used for regression.
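Since CIoU comes up again in YOLOv8's regression loss, here is a minimal pure-Python sketch of CIoU for axis-aligned (x1, y1, x2, y2) boxes, written from the published formula rather than from the Ultralytics source:

```python
import math

def ciou(a, b, eps=1e-9):
    """CIoU = IoU - center-distance penalty - aspect-ratio penalty."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # intersection over union
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    iou = inter / (area_a + area_b - inter + eps)
    # squared center distance over squared enclosing-box diagonal
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    rho2 = ((ax1 + ax2 - bx1 - bx2) ** 2 + (ay1 + ay2 - by1 - by2) ** 2) / 4
    c2 = cw ** 2 + ch ** 2 + eps
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan((bx2 - bx1) / (by2 - by1))
                              - math.atan((ax2 - ax1) / (ay2 - ay1))) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v

print(round(ciou((0, 0, 10, 10), (0, 0, 10, 10)), 6))  # identical boxes -> 1.0
```

Unlike plain IoU, CIoU stays informative for disjoint boxes (it goes negative as centers drift apart), which is what makes it a usable regression loss.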

Without further ado, on to YOLOv8!

YOLOv8 is a cutting-edge, state-of-the-art (SOTA) model that builds on the success of previous YOLO versions and introduces new features and improvements to further boost performance and flexibility. It can be trained on large datasets and runs on a variety of hardware platforms, from CPUs to GPUs.

A key feature of YOLOv8 is its extensibility: it is designed as a framework that supports all previous versions of YOLO, making it easy to switch between different versions and compare their performance.

In addition to extensibility, YOLOv8 includes many other innovations that make it an attractive choice for a wide range of object detection and image segmentation tasks, including a new backbone network, a new anchor-free detection head, and a new loss function.

Overall, YOLOv8 is a powerful and flexible tool for object detection and image segmentation that offers the best of both worlds: the latest SOTA technology, and the ability to use and compare all previous YOLO versions.

YOLOv8 code: Ultralytics

Weights: Ultralytics

YOLOv8 documentation: Ultralytics Docs

Compared with YOLOv5, the specific improvements in v8 are as follows:

Backbone: still uses the CSP idea, but YOLOv5's C3 module is replaced by the C2f module, making the network lighter. Meanwhile, YOLOv8 keeps the SPPF module that YOLOv5 and other architectures use;
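To make the C3 → C2f change concrete, here is a hypothetical channel-bookkeeping sketch (not Ultralytics code) showing why C2f concatenates more gradient branches than C3: C2f keeps every bottleneck output for the final concat, while C3 only concatenates its two 1×1 branches.

```python
def c3_concat_channels(c_in, c_out, e=0.5):
    """C3: two 1x1 branches of c_out*e channels each; bottlenecks keep
    their width, and only the two branches are concatenated."""
    c_hidden = int(c_out * e)
    return [c_hidden, c_hidden]          # concat of 2 tensors

def c2f_concat_channels(c_in, c_out, n=1, e=0.5):
    """C2f: one 1x1 conv to 2*c_hidden, split in two; each of the n
    bottleneck outputs is also kept and concatenated."""
    c_hidden = int(c_out * e)
    parts = [c_hidden, c_hidden]         # the initial split
    parts += [c_hidden] * n              # every bottleneck output too
    return parts

print(c3_concat_channels(64, 64))        # [32, 32]
print(c2f_concat_channels(64, 64, n=3))  # [32, 32, 32, 32, 32]
```

The extra concatenated branches give C2f richer gradient flow at a similar parameter cost, which is the "further lightweighting" trade-off described above.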

PAN-FPN: YOLOv8 undoubtedly still uses the PAN idea, but comparing the structure diagrams of YOLOv5 and YOLOv8 shows that YOLOv8 deletes the convolution before the upsampling stage of YOLOv5's PAN-FPN and replaces the C3 module with the C2f module;

Decoupled-Head: notice something different? Yes, YOLOv8 switches to a Decoupled-Head;

Anchor-Free: YOLOv8 abandons the previous Anchor-Based approach and adopts the Anchor-Free idea;

Loss function: YOLOv8 uses VFL Loss as the classification loss and DFL Loss + CIoU Loss as the regression loss;

Sample matching: YOLOv8 abandons the previous IoU matching and one-sided aspect-ratio assignment, and instead uses the Task-Aligned Assigner.

What has PAN-FPN improved?

Let’s first take a look at the structure diagram of the PAN-FPN part of YOLOv5 and YOLOv6:

The structure diagram of the Neck part of YOLOv5 is as follows:

The structure diagram of the Neck part of YOLOv6 is as follows:

Let’s look at the structure diagram of YOLOv8:

It can be seen that, compared with YOLOv5 or YOLOv6, YOLOv8 replaces the C3 module and RepBlock with C2f. Looking carefully, you can also see that, unlike YOLOv5 and YOLOv6, YOLOv8 removes the 1×1 convolution before upsampling and feeds the output features of the different Backbone stages directly into the upsampling operation.

What has changed in the Head part?

Let’s first take a look at the Head (Coupled-Head) of YOLOv5 itself:

YOLOv8 uses a Decoupled-Head. At the same time, because it adopts the DFL idea, the number of channels of the regression head becomes 4*reg_max:
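To see why the regression head needs 4*reg_max channels, here is a hedged pure-Python sketch of DFL-style decoding: each of the four box sides (left, top, right, bottom) gets reg_max logits, and the predicted distance is the expectation of a softmax over those bins.

```python
import math

REG_MAX = 16  # the default mentioned in the text

def dfl_decode_side(logits):
    """Expected value of the softmax distribution over bins 0..REG_MAX-1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return sum(i * e / total for i, e in enumerate(exps))

def decode_box(logits_4xr):
    """Split a 4*REG_MAX vector into (left, top, right, bottom) distances."""
    assert len(logits_4xr) == 4 * REG_MAX
    return [dfl_decode_side(logits_4xr[i * REG_MAX:(i + 1) * REG_MAX])
            for i in range(4)]

# A distribution peaked at bin 5 decodes to a distance near 5.
peaked = [10.0 if i == 5 else 0.0 for i in range(REG_MAX)]
print(round(dfl_decode_side(peaked), 2))  # -> 5.0
```

So the head predicts a distribution per side rather than a single scalar, which is exactly what the DFL loss below trains.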

Compare the YAML of YOLOv5 and YOLOv8

Loss function

For YOLOv8, the classification loss is VFL Loss, and the regression loss is CIoU Loss + DFL, where reg_max defaults to 16.

The main improvement of VFL is its asymmetric weighting; both FL and QFL are symmetric. The idea of asymmetric weighting comes from the PISA paper, which points out that beyond the imbalance between positive and negative samples, there is also unequal importance among the positive samples themselves, since mAP is computed mainly from the positive samples.

q is the label: for positive samples it is the IoU between the predicted bbox and the ground truth, and for negative samples q=0. For positive samples, FL is not actually used; instead, ordinary BCE is applied with an additional adaptive IoU weighting to highlight the main samples. For negative samples, it is standard FL. It is clear that VFL is simpler than QFL; its main features are the asymmetric weighting of positive and negative samples and the highlighting of positive samples as the main samples.
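This asymmetry can be sketched in a few lines; the following is an illustrative per-sample term written from the description above, not the official implementation (alpha and gamma defaults are assumptions):

```python
import math

def vfl_term(p, q, alpha=0.75, gamma=2.0):
    """p: predicted score in (0, 1); q: IoU target (>0 for positives, 0 for negatives)."""
    bce = -(q * math.log(p) + (1 - q) * math.log(1 - p))
    if q > 0:
        weight = q                 # positive: adaptive IoU-aware weighting
    else:
        weight = alpha * p ** gamma  # negative: standard focal down-weighting
    return weight * bce

# At the same predicted score, a high-IoU positive contributes more loss
# than a low-IoU one -- the "highlight the main samples" effect.
print(vfl_term(0.5, 0.9) > vfl_term(0.5, 0.3))  # -> True
```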

DFL (Distribution Focal Loss) models the box position as a general distribution, letting the network quickly focus on the distribution of positions close to the target position.

DFL allows the network to focus faster on values near the target y, increasing their probability;

The meaning of DFL is to optimize, in cross-entropy form, the probabilities of the two positions closest to the label y, so that the network focuses faster on the distribution around the target position. In other words, the learned distribution should theoretically lie near the true floating-point coordinate, with the weights of the left and right integer coordinates obtained by linear interpolation.
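The two-neighbor cross-entropy described above can be sketched directly (written from the DFL formula, not from any particular codebase):

```python
import math

def dfl_loss(probs, y):
    """probs: a distribution over integer bins; y: float target in [0, len(probs) - 1]."""
    yl = int(y)        # left integer neighbor of y
    yr = yl + 1        # right integer neighbor of y
    wl = yr - y        # linear-interpolation weights: wl + wr = 1
    wr = y - yl        # and wl * yl + wr * yr = y exactly
    return -(wl * math.log(probs[yl]) + wr * math.log(probs[yr]))

y = 5.3
good = [0.001] * 16                 # mass concentrated on the bins adjacent to y
good[5], good[6] = 0.65, 0.336
bad = [0.001] * 16                  # mass far from y
bad[0], bad[1] = 0.65, 0.336
print(dfl_loss(good, y) < dfl_loss(bad, y))  # -> True
```

Minimizing this loss pushes probability mass onto the two bins bracketing y, so the softmax expectation used at decode time lands near the true floating-point coordinate.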

Sample matching

Label assignment is a very important part of object detection. Early versions of YOLOv5 used MaxIoU as the label assignment method, but in practice it was found that directly using the side-length ratio achieves the same effect. YOLOv8 abandons the anchor-based method in favor of anchor-free, and adopts TaskAligned as a matching method to replace the side-length ratio.

To work well with NMS, the anchor assignment for training samples needs to satisfy the following two rules:

Well-aligned anchors should predict high classification scores with precise localization;

Misaligned anchors should have low classification scores and be suppressed during the NMS stage. Based on these two goals, TaskAligned designs a new anchor alignment metric to measure the level of Task-Alignment at the anchor level, and integrates this metric into both sample assignment and the loss function to dynamically optimize each anchor's predictions.

Anchor alignment metric:

The classification score and the IoU represent the prediction quality of the two tasks, so TaskAligned uses a high-order combination of the classification score and the IoU to measure the degree of Task-Alignment. The anchor-level alignment for each instance is computed as:

t = s^α · u^β

where s and u are the classification score and the IoU value respectively, and α and β are weight hyperparameters. From the formula, t jointly controls the optimization of the classification score and the IoU to achieve Task-Alignment, which guides the network to dynamically focus on high-quality anchors.
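The metric is a one-liner; the defaults α=1, β=6 below are the values commonly cited for TOOD and are an assumption here:

```python
def alignment_metric(s, u, alpha=1.0, beta=6.0):
    """t = s**alpha * u**beta; s: classification score, u: IoU with ground truth."""
    return (s ** alpha) * (u ** beta)

# With beta > alpha, localization dominates: a confident but poorly
# localized anchor scores far lower than a confident, well-localized one.
print(alignment_metric(0.9, 0.9) > alignment_metric(0.9, 0.5))  # -> True
```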

Training sample Assignment:

To improve the alignment of the two tasks, TOOD focuses on task-aligned anchors and uses a simple assignment rule to select training samples: for each instance, the m anchors with the largest t values are selected as positive samples, and the remaining anchors become negative samples. Training then proceeds with a loss function designed for the alignment of classification and localization.
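The top-m selection above can be sketched as follows (a minimal illustration of the rule, not the Ultralytics assigner, which also handles ties and anchors shared by multiple instances):

```python
def assign_samples(t_values, m):
    """For one instance: rank anchors by alignment metric t, take the m
    largest as positives; everything else is a negative."""
    order = sorted(range(len(t_values)), key=lambda i: t_values[i], reverse=True)
    positives = sorted(order[:m])
    negatives = sorted(order[m:])
    return positives, negatives

pos, neg = assign_samples([0.1, 0.8, 0.05, 0.6, 0.3], m=2)
print(pos)  # -> [1, 3]: the anchors with the largest t values
```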

---------

The v8 open-source release attracted more than 600 stars in a single day, which shows how popular it is, but stars, forks, and so on should still be viewed comprehensively. When a codebase is first released, marketing pushes the star count up rapidly; only after a while does the growth stabilize, and at that point the star/fork ratio is actually worth watching. All the major YOLOs keep innovating and updating, and you can look forward to YOLOv9 and v10, but there is no need to blindly switch models: roughly understand the improvements, strengths, and weaknesses first, then choose carefully.


Origin blog.csdn.net/qq_53545309/article/details/134217739