Industrial small-target defect detection with YOLOv8 in practice (2): Dynamic Snake Convolution for a large accuracy gain | ICCV2023

Table of contents

1. Introduction to the industrial oil stain dataset

1.1 Definition of small targets

1.2 Difficulties

1.3 Introduction to industrial defect detection algorithms

1.3.1 YOLOv8

2. Dynamic Snake Convolution

2.1 Adding Dynamic Snake Convolution to YOLOv8

3. Analysis of training results

4. Series


1. Introduction to the industrial oil stain dataset

Samsung oil stain defect categories: hair strands and small black spots, labeled ["TFS", "XZW"].

Dataset size: 660 images, including some defect-free (good product) images so the model also learns what the background looks like.

Data set address: https://download.csdn.net/download/m0_63774211/87741209

Defect characteristics: the defects are small targets that are difficult to detect, as shown in the figure below.

1.1 Definition of small targets

1) In COCO, the general-purpose dataset for object detection, small objects are those smaller than 32×32 pixels; medium objects are between 32×32 and 96×96 pixels, and large objects are larger than 96×96 pixels.
2) In practical applications, a definition relative to the original image is usually preferred: multiply the width and height of the object's label box, divide by the product of the width and height of the whole image, and take the square root. If the result is less than 3%, the object is called a small target.
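The two definitions above can be sketched in a few lines of Python. This is a minimal illustration and not part of the original project: the function names are hypothetical, boxes are given as pixel width and height, and the handling of boxes that sit exactly on the 32×32 or 96×96 boundary is an assumption.

```python
import math


def coco_size_class(box_w: float, box_h: float) -> str:
    """Absolute (COCO-style) size class based on box area in pixels."""
    area = box_w * box_h
    if area < 32 ** 2:
        return "small"
    if area <= 96 ** 2:  # boundary handling is an assumption
        return "medium"
    return "large"


def relative_size(box_w: float, box_h: float, img_w: float, img_h: float) -> float:
    """Square root of (box area / image area)."""
    return math.sqrt((box_w * box_h) / (img_w * img_h))


def is_small_relative(box_w, box_h, img_w, img_h, threshold: float = 0.03) -> bool:
    """Relative definition: small if the ratio is below 3%."""
    return relative_size(box_w, box_h, img_w, img_h) < threshold


if __name__ == "__main__":
    # A 20x20 defect in a 1280x1024 image is small under both definitions.
    print(coco_size_class(20, 20), is_small_relative(20, 20, 1280, 1024))
    # A 40x40 box in a 640x640 image is "medium" and not small in relative terms.
    print(coco_size_class(40, 40), is_small_relative(40, 40, 640, 640))
```

The relative definition is useful when image resolution varies, because the same physical defect can fall into different COCO classes depending on how the image was captured or resized.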

1.2 Difficulties

1) Few samples contain small targets, which can bias the detection model toward medium and large targets;

2) Small targets cover a smaller area, so their locations lack diversity. We speculate that this makes it difficult to verify how well small-target detection generalizes;

3) Anchor matching is difficult. This mainly affects anchor-based methods. Because both the ground-truth (gt) box and the anchor of a small target are very small, even a slight offset between them makes the IoU very low, so the anchor is easily treated as a negative sample by the network;

4) Small targets are not only small but also hard examples, with varying degrees of occlusion, blur, and incompleteness.
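The anchor-matching problem in item 3 can be checked numerically. The sketch below is illustrative only: it uses axis-aligned square boxes in (x1, y1, x2, y2) form, shifts the anchor diagonally by the same number of pixels in both cases, and compares the resulting IoU. The helper names are hypothetical.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def shifted_iou(size, shift):
    """IoU between a size x size gt box and an equal anchor shifted diagonally."""
    gt = (0, 0, size, size)
    anchor = (shift, shift, size + shift, size + shift)
    return iou(gt, anchor)


if __name__ == "__main__":
    for size in (10, 100):
        print(f"{size}x{size} box, 4 px shift: IoU = {shifted_iou(size, 4):.3f}")
```

With a 4-pixel shift, the 10×10 box drops to an IoU of about 0.22, while the 100×100 box stays at about 0.85. Under a typical positive-sample threshold such as 0.5 (an illustrative value), only the large box would still match.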

1.3 Introduction to industrial defect detection algorithms

The most popular deep learning frameworks for industrial defect detection are PaddlePaddle and PyTorch, and the most popular detection algorithms are YOLOv8, YOLOv5, and PP-YOLOE. This article uses YOLOv8 and modifies it to improve defect detection.

1.3.1 YOLOv8

Ultralytics YOLOv8 is the latest version of the YOLO object detection and image segmentation model developed by Ultralytics. YOLOv8 is a state-of-the-art (SOTA) model that builds on the success of previous YOLO versions and introduces new features and improvements to further increase performance and flexibility. It can be trained on large datasets and runs on a variety of hardware platforms, from CPUs to GPUs.

The specific improvements are as follows:

Backbone: still follows the CSP idea, but the C3 module in YOLOv5 is replaced by the C2f module, making the model more lightweight. YOLOv8 also keeps the SPPF module used in YOLOv5 and other architectures;

PAN-FPN: YOLOv8 still follows the PAN idea, but comparing the structure diagrams of YOLOv5 and YOLOv8 shows that YOLOv8 removes the convolution in the upsampling stage of the YOLOv5 PAN-FPN and also replaces the C3 module with the C2f module;

Decoupled-Head: YOLOv8 switches to a decoupled head, with separate branches for classification and box regression;

Anchor-Free: YOLOv8 abandons the earlier anchor-based design and adopts an anchor-free approach;

Loss function: YOLOv8 uses VFL Loss as the classification loss and DFL Loss + CIoU Loss as the regression loss;

Sample matching: YOLOv8 abandons the earlier IoU-based matching and side-ratio assignment methods and uses the Task-Aligned Assigner instead.
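To make the last item more concrete, here is a rough sketch of the ranking idea behind task-aligned assignment: each candidate prediction gets an alignment score that combines its classification score and its IoU with the ground-truth box, and the top-k candidates become positive samples. This is a simplified illustration, not the Ultralytics implementation. The function names and candidate data are hypothetical, the exponents alpha=0.5 and beta=6.0 are assumed values, and the real assigner also filters candidates by location, which is omitted here.

```python
def alignment_metric(cls_score, iou, alpha=0.5, beta=6.0):
    """Alignment score t = s^alpha * u^beta (alpha and beta are assumed values)."""
    return (cls_score ** alpha) * (iou ** beta)


def select_positives(candidates, topk=2):
    """Return the names of the top-k candidates ranked by alignment score.

    candidates: list of (name, cls_score, iou_with_gt) tuples.
    """
    ranked = sorted(
        candidates,
        key=lambda c: alignment_metric(c[1], c[2]),
        reverse=True,
    )
    return [name for name, _, _ in ranked[:topk]]


if __name__ == "__main__":
    # Hypothetical candidates for a single ground-truth box.
    candidates = [
        ("a", 0.9, 0.3),  # confident class score, poor localization
        ("b", 0.4, 0.8),
        ("c", 0.6, 0.6),
        ("d", 0.2, 0.9),  # weak class score, good localization
    ]
    print(select_positives(candidates, topk=2))
```

With a large beta, localization quality dominates the ranking, so candidate "a" is not selected despite having the highest classification score.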

2. Dynamic Snake Convolution

Paper: arXiv:2307.08388 (https://arxiv.org/abs/2307.08388)

Abstract: Accurate segmentation of topological tubular structures such as blood vessels and roads is crucial in various fields to ensure the accuracy and efficiency of downstream tasks. However, many factors complicate the task, including thin local structures and variable global morphology. In this work, we note the peculiarities of tubular structures and exploit this knowledge to guide our DSCNet to simultaneously enhance perception in three stages: feature extraction, feature fusion, and loss constraints. First, we propose a dynamic snake convolution to accurately capture the features of tubular structures by adaptively focusing on elongated and tortuous local structures. Subsequently, we propose a multi-view feature fusion strategy to complement the attention to features from multiple perspectives during feature fusion and ensure that important information from different global morphologies is retained. Finally, a continuity constraint loss function based on persistent homology is proposed to better constrain the topological continuity of the segmentation. Experiments on 2D and 3D datasets show that our DSCNet provides better accuracy and continuity on the tubular structure segmentation task compared to multiple methods. Our code is public.

The main challenges come from thin, weak local structural features and from complex, variable global morphology. The paper focuses on the slender and continuous nature of tubular structures and uses this information to enhance perception at three stages of the neural network simultaneously: feature extraction, feature fusion, and loss constraints. For these three stages it designs, respectively, Dynamic Snake Convolution, a multi-view feature fusion strategy, and a continuity-constrained topological loss.

On the one hand, we want the convolution kernel to fit the structure freely in order to learn its features; on the other hand, it should stay constrained so that it does not drift too far from the target structure. Observing the elongated and continuous shape of tubular structures brought an animal to mind: the snake. We want the convolution kernel to twist dynamically like a snake to fit the structure of the target.
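The "free but constrained" behaviour can be illustrated with a toy coordinate calculation. The sketch below is a simplified reading of the idea and not the DSConv code: for a kernel laid out along the x-axis, each sampling point moves one step in x, while its y position is the previous point's y plus a learned offset. Accumulating offsets outward from the center keeps neighbouring points connected. The function name is hypothetical, and limiting each offset to [-1, 1] is an assumption made here.

```python
def snake_kernel_coords(center, offsets_left, offsets_right):
    """Sampling coordinates of an x-axis snake kernel (toy illustration).

    center: (x, y) of the kernel center.
    offsets_left / offsets_right: per-step y offsets, ordered from the
    center outward. Each offset is clamped to [-1, 1] (an assumption).
    """
    def clamp(v):
        return max(-1.0, min(1.0, v))

    cx, cy = center
    coords = [(float(cx), float(cy))]

    y = float(cy)
    for step, dy in enumerate(offsets_right, start=1):
        y += clamp(dy)  # accumulate, so each point stays near its neighbour
        coords.append((float(cx + step), y))

    y = float(cy)
    for step, dy in enumerate(offsets_left, start=1):
        y += clamp(dy)
        coords.insert(0, (float(cx - step), y))

    return coords


if __name__ == "__main__":
    # A 5-tap kernel centered at (10, 10); the offset 1.5 is clamped to 1.0.
    for point in snake_kernel_coords((10, 10), [0.5, 0.5], [-0.5, 1.5]):
        print(point)
```

Because offsets are accumulated rather than applied independently, adjacent sampling points never differ by more than one unit in y, so the kernel bends but does not break apart.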

2.1 Adding Dynamic Snake Convolution to YOLOv8

Core code:

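import torch
import torch.nn as nn
from ultralytics.nn.modules import Conv  # YOLOv8's standard convolution block

# DSConv (the snake convolution itself) is defined in the post linked below.

class DySnakeConv(nn.Module):
    """Concatenates a standard convolution with two snake convolutions."""
    def __init__(self, inc, ouc, k=3) -> None:
        super().__init__()
        self.conv_0 = Conv(inc, ouc, k)       # standard convolution branch
        self.conv_x = DSConv(inc, ouc, 0, k)  # snake convolution, x-axis branch
        self.conv_y = DSConv(inc, ouc, 1, k)  # snake convolution, y-axis branch

    def forward(self, x):
        # The three branches are concatenated, so the output has 3 * ouc channels.
        return torch.cat([self.conv_0(x), self.conv_x(x), self.conv_y(x)], dim=1)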

For details, see:

First release of a YOLOv8 accuracy booster: Dynamic Snake Convolution, for a large accuracy gain | ICCV2023 (AI Little Monster's Blog, CSDN)

3. Analysis of training results

The training results are as follows:

mAP@0.5 improved from 0.679 (original YOLOv8) to 0.743.

YOLOv8-C2f-DySnakeConv summary: 249 layers, 3425894 parameters, 0 gradients, 8.7 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 2/2 [00:04<00:00,  2.15s/it]
                   all         66        187      0.722      0.668      0.743      0.342
                   TFS         66        130      0.582        0.6      0.638      0.295
                   XZW         66         57      0.862      0.737      0.847      0.388
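As a quick sanity check on these numbers, the snippet below recomputes the "all" row from the two class rows and the size of the gain over the baseline. It only uses values reported in this post; the variable names are illustrative, and treating the "all" row as the unweighted mean over classes is an assumption that the numbers appear to confirm.

```python
baseline_map50 = 0.679  # original YOLOv8, as reported in this post
improved_map50 = 0.743  # YOLOv8-C2f-DySnakeConv, "all" row

# Per-class results copied from the validation log.
per_class = {
    "TFS": {"mAP50": 0.638, "mAP50-95": 0.295},
    "XZW": {"mAP50": 0.847, "mAP50-95": 0.388},
}


def class_mean(metric):
    """Unweighted mean of a metric over classes."""
    values = [row[metric] for row in per_class.values()]
    return sum(values) / len(values)


mean_map50 = class_mean("mAP50")
mean_map50_95 = class_mean("mAP50-95")
abs_gain = improved_map50 - baseline_map50
rel_gain = abs_gain / baseline_map50

print(f"mean mAP50 over classes:    {mean_map50:.4f}")
print(f"mean mAP50-95 over classes: {mean_map50_95:.4f}")
print(f"absolute gain: {abs_gain:.3f}, relative gain: {rel_gain:.1%}")
```

The class means match the "all" row up to rounding, and the improvement is 0.064 in absolute terms, or about 9.4% relative to the baseline. The log also shows that TFS is the harder class, with noticeably lower precision and mAP than XZW.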

For details, see: https://cv2023.blog.csdn.net/article/details/133125904

4. Series

1) Industrial small-target defect detection based on YOLOv8 (1)

2) Dynamic Snake Convolution for a large accuracy gain | ICCV2023

3) Multiple detection heads to improve small-target detection accuracy

 


Origin blog.csdn.net/m0_63774211/article/details/133125347