YOLOv8 improvement series (effective mAP gains) -> A step-by-step guide to adding Dynamic Snake Convolution

Paper code address: Dynamic Snake Convolution official code download address
Paper address: [Free] Dynamic Snake Convolution (DynamicSnakeConvolution) resources - CSDN library

Introduction to this article

Dynamic Snake Convolution was inspired by observing the peculiarities of tubular structures. When segmenting tubular structures such as blood vessels and roads, the task is complicated by the nature of these structures: the local structure can be very elongated and circuitous, while the overall morphology can vary widely.
To deal with this challenge, the authors' research team focused on this particularity of tubular structures and proposed Dynamic Snake Convolution (DSConv). Dynamic snake convolution accurately captures the characteristics of tubular structures by adaptively focusing on elongated and circuitous local structures. The core idea is to enhance perception through dynamically shaped convolution kernels and thereby optimize feature extraction for tubular structures.

In short, dynamic snake convolution is an innovative method for tubular structure segmentation tasks. It has brought effective mAP gains to many models on certain datasets, and it is important and widely applicable.

Dynamic Snake Convolution is suitable for a variety of models: it can be added to, or used as a replacement inside, many architectures. The model this article targets is YOLOv8. The article also fixes a bug in the official dynamic snake convolution code (for example: "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!") and uses this as a worked example to help everyone understand and master both dynamic snake convolution and the YOLOv8 model.

PS -> If you just want to learn how to add dynamic snake convolution to YOLOv8 and are not interested in its principles, you can skip most of this article and jump directly to the "Official code bug fix" chapter, then read that chapter and the ones after it.

Introduction to YOLOv8

The YOLO (You Only Look Once) series of algorithms has attracted much attention for its efficiency and accuracy. The latest version, YOLOv8, released by Ultralytics in 2023, integrates and improves on previous generations of YOLO and has been deeply optimized for both speed and accuracy.

If you want to understand and learn YOLOv8 in depth, I suggest you read the following article; this article focuses mainly on the Dynamic Snake Convolution improvement.

Detailed explanation of YOLOv8 network structure/environment construction/data set acquisition/training/inference/verification/export/deployment.

Dynamic snake convolution background and principle

Background -> Dynamic Snake Convolution originates from clinical medicine. Clearly delineating blood vessels is a key prerequisite for computational fluid dynamics research and helps radiologists diagnose and locate lesions. In remote sensing applications, complete road segmentation provides a solid foundation for path planning. Regardless of the field, these structures share the common characteristics of being elongated and tortuous, and because they occupy only a small proportion of the image they are difficult to capture. There is therefore an urgent need to improve the perception of elongated tubular structures, and it is in this context that the authors proposed Dynamic Snake Convolution.

Principle -> The figure above shows a three-dimensional cardiovascular dataset and a 2D remote-sensing road dataset. Both datasets aim to extract tubular structures, but the task is challenging because of the fragile local structures and complex overall morphology mentioned earlier. Standard convolution kernels are designed to extract local features; building on them, deformable convolution kernels were designed to enrich their applicability and adapt to the geometric deformations of different targets. However, effectively focusing on small tubular structures remains difficult for the reasons above.

This remains a challenging task due to the following difficulties:

  1. Small and fragile local structures: As shown in the figure above, small structures account for only a small part of the overall image, and because they consist of only a limited number of pixels, they are susceptible to interference from complex backgrounds, making it hard for the model to accurately distinguish subtle variations of the target. As a result, the model may struggle to distinguish these structures, leading to fragmentation in the segmentation results.

  2. Complex and varied overall morphology: The image above shows the complex and varied morphology of tiny tubular structures, even within the same image. Targets in different regions show morphological changes, including the number of branches, bifurcation positions, and path lengths. When the data presents a morphological structure that has never been seen before, the model may overfit the features that have been seen before, resulting in weak generalization ability under the new morphological structure.

To deal with the above obstacles, the following solutions are proposed, which include a tubular-aware convolution kernel, a multi-view feature fusion strategy and a topological continuity constrained loss function. The details are as follows:

        1. To address the challenge that small and fragile local structures account for only a small proportion of the image and are hard to focus on, dynamic snake convolution (DSConv) is proposed. It enhances the perception of geometric structure by adaptively focusing on the elongated, curved local features of tubular structures. Unlike deformable convolution, DSConv takes the serpentine morphology of tubular structures into account and constrains the otherwise free learning process, enhancing the perception of tubular structures in a targeted way.

        2. To address the challenge of complex and changeable overall shapes, a multi-view feature fusion strategy is proposed. Multiple morphological convolution kernel templates are generated based on DSConv, the structural characteristics of the target are observed from different angles, and efficient feature fusion is achieved by summarizing the typical important features.

        3. To address the tendency of tubular structure segmentation to break, a topological continuity constrained loss function (TCLoss) based on Persistent Homology (PH) is proposed. PH tracks topological features from their appearance to their disappearance and can obtain sufficient topological information from noisy high-dimensional data; the associated Betti numbers describe the connectivity of a topological space. Unlike other methods, TCLoss combines PH with point-set similarity to guide the network to focus on fracture regions with abnormal pixel/voxel distributions, implementing continuity constraints from a topological perspective.

Summary: To overcome the challenges, a DSCNet framework is proposed, including a tubular perceptual convolution kernel, a multi-view feature fusion strategy and a topological continuity constrained loss function. DSConv enhances the perception of slender curve features, the multi-view feature fusion strategy improves the processing capabilities of complex overall shapes, and TCLoss implements continuity constraints from a topological perspective based on persistent homology.

Advantages of dynamic snake convolution

To improve performance on tubular structures, various methods have been proposed that design specific network architectures and modules according to the morphology of tubular structures. The details are as follows:

1. Methods based on convolution kernel design: Well-known approaches such as dilated convolution and deformable convolution were proposed to overcome the inherent limitations of convolutional neural networks with respect to geometric transformations, and they achieve excellent performance on complex detection and segmentation tasks. These methods are likewise designed to dynamically sense the geometric characteristics of objects so as to adapt to structures with variable morphology; DUNet is one example.

2. Morphology-based methods:Some methods focus on using morphological information to deal with tubular structures. For example, the Morphological Reconstruction Network uses morphological reconstruction operations to reconstruct tubular structures, thereby achieving more accurate segmentation. In addition, morphological operations such as opening and closing operations are also widely used to deal with tubular structures.

3. Topology-based methods:Topology methods are used to deal with the topological characteristics of tubular structures. For example, methods based on persistent homology can obtain topological information from high-dimensional data and be used to analyze the connectivity and morphological characteristics of tubular structures.

Summary: To deal with tubular structures, various methods have been proposed. These methods include methods based on convolution kernel design, methods based on morphology, and methods based on topology. The goal of these methods is to improve the detection and segmentation performance of tubular structures by designing network architectures and modules that adapt to the morphology of tubular structures.

Advantages -> The methods above analyze the problem from only a single perspective. DSConv therefore adopts a multi-view feature fusion strategy that supplements attention to important features from multiple angles. In this strategy, multiple morphological convolution kernel templates are generated based on dynamic snake convolution (DSConv), the structural characteristics of the target are observed from multiple angles, and feature fusion is achieved by summarizing the key features, thereby improving model performance.
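
To make the fusion idea concrete, below is a minimal illustrative sketch of my own (not the paper's exact fusion module, which includes additional steps). It combines a standard convolution with an x-axis and a y-axis DSConv branch, using the DSConv_pro module whose full code appears later in this article, and fuses the three views with a 1x1 convolution:

import torch
from torch import nn

class MultiViewFusionSketch(nn.Module):
    """Toy multi-view block: standard conv + x-axis DSConv + y-axis DSConv, fused by a 1x1 conv."""

    def __init__(self, channels: int, device: str = "cpu"):
        super().__init__()
        # channels should be divisible by 4 because DSConv_pro uses GroupNorm(out_channels // 4, ...)
        self.conv_std = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv_x = DSConv_pro(channels, channels, kernel_size=9, morph=0, device=device)  # x-axis view
        self.conv_y = DSConv_pro(channels, channels, kernel_size=9, morph=1, device=device)  # y-axis view
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        views = torch.cat([self.conv_std(x), self.conv_x(x), self.conv_y(x)], dim=1)
        return self.fuse(views)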

Experiments and results

Datasets

Three datasets are used to validate the framework, including two public datasets and one internal dataset. In 2D, the DRIVE retina dataset and the Massachusetts Roads dataset are evaluated. In 3D, an in-house dataset called Cardiac CCTA Data is used.

Experiments

Comparative experiments and ablation studies were performed to demonstrate the advantages of DSCNet. To verify accuracy, it was compared with the classic segmentation network U-Net and with CS2-Net (2021), proposed for blood vessel segmentation. To verify the network design, DCU-Net (2022), proposed for retinal blood vessel segmentation, was compared. To verify the advantages of feature fusion, TransUNet (2021), proposed for medical image segmentation, was compared. To verify the loss function constraints, clDice (2021) and the Wasserstein-distance-based TCLoss variant (LWTC) were compared. All models were trained on the same datasets and evaluated with the following metrics; every metric is calculated per image and then averaged.

1. Volume scores: the average Dice coefficient (Dice), relative Dice coefficient (RDice), centerline Dice (clDice), accuracy (ACC) and AUC are used to evaluate the results (a minimal Dice sketch is given after this list).
2. Topological error: Calculate topology-based scores, including Betti errors for Betti numbers β0 and β1. Meanwhile, to objectively verify the continuity of coronary artery segmentation, the overlap until first error (OF) was used to evaluate the completeness of the extracted centerline.
3. Distance error: Hausdorff distance (HD) is also widely used to describe the similarity between two sets of points and is recommended for evaluating the similarity of thin tubular structures.
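
Of these metrics, the plain Dice coefficient is the easiest to reproduce yourself. The snippet below is a minimal sketch of my own for binary masks, not the authors' evaluation code:

import torch

def dice_coefficient(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> float:
    """Dice = 2 * |P intersect G| / (|P| + |G|) for binary masks."""
    pred, target = pred.bool(), target.bool()
    intersection = (pred & target).sum().item()
    return (2.0 * intersection + eps) / (pred.sum().item() + target.sum().item() + eps)

# Toy example with two overlapping 2x3 masks: Dice = 2*2 / (3+3) ~= 0.667
pred = torch.tensor([[1, 1, 0], [0, 1, 0]])
gt = torch.tensor([[1, 0, 0], [0, 1, 1]])
print(dice_coefficient(pred, gt))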

Experimental results

The advantages of the DSCNet method on each metric are demonstrated in the table below, and the results show that the proposed DSCNet achieves better results on both 2D and 3D datasets.

In the evaluation on the DRIVE dataset, DSCNet outperforms the other models in both segmentation accuracy and topological continuity. As shown in the table below, DSCNet achieved the best volumetric segmentation results, with a Dice coefficient of 82.06%, an RDice coefficient of 90.17%, a clDice coefficient of 82.07%, an accuracy of 96.87% and an AUC of 90.27%. From a topological perspective, DSCNet also achieved the best topological continuity, with a β0 error of 0.998 and a β1 error of 0.803. These results show that DSCNet better captures the characteristics of thin tubular structures and delivers more accurate segmentation with a more continuous topology. As shown in rows 6 to 12 of Table 1, after introducing TCLoss, the different models all improved in the topological continuity of their segmentations, indicating that TCLoss accurately constrains the model to focus on thin tubular structures that would otherwise lose topological continuity.

In the evaluation on the ROADS dataset, DSCNet also achieved the best results. As shown in Table 1, the proposed DSCNet with TCLoss achieved the best segmentation results, with a Dice coefficient of 78.21%, an RDice coefficient of 85.85% and a clDice coefficient of 87.64%. Compared with the classic segmentation network UNet, the DSCNet method improves the Dice, RDice and clDice coefficients by up to 1.31%, 1.78% and 0.77% respectively. This shows that DSCNet also performs well on road data with complex structures and varied shapes.

In the evaluation on the CORONARY dataset, it was verified that DSCNet still achieved the best results on the 3D thin tubular structure segmentation task. As shown in the table below, compared with other methods, the proposed DSCNet achieved the best results in segmentation results, with a Dice coefficient of 80.27%, a RDice coefficient of 86.37%, and a clDice coefficient of 85.26%. Compared with the results of the classic segmentation network UNet, the DSCNet method improves the Dice coefficient, RDice coefficient and clDice coefficient by up to 3.40%, 1.89% and 3.83% respectively. Meanwhile, the OF metric is used to evaluate the continuity of segmentation. Using the DSCNet method, the OF index of LAD increased by 6.00%, the OF index of LCX increased by 3.78%, and the OF index of RCA increased by 3.30%.

Demonstration of effectiveness

DSCNet and TCLoss have decisive visual advantages in every aspect.

(1) The figure below shows the effectiveness of DSCNet. From left to right, the third to fifth columns show the segmentation accuracy of the different networks. Because DSConv can adaptively perceive key features, DSCNet performs well in the segmentation results; compared with other methods, it better captures and preserves the details of thin tubular structures.

(2) In order to verify the effectiveness of DSCNet's TCLoss, the sixth column shows the segmentation results without using TCLoss. It can be seen that the method without TCLoss has obvious problems in topological continuity, while the DSCNet method can accurately constrain the topology of the segmentation results through TCLoss, making the segmentation results more continuous.

(3) In the seventh and eighth columns, the segmentation results of DSCNet on different data sets are shown. It can be seen that DSCNet can achieve accurate and continuous segmentation results on both DRIVE and ROADS data sets, further proving the versatility and robustness of DSCNet.

Overall, it can be clearly seen from Figure 6 that our DSCNet and TCLoss have significant advantages in segmentation accuracy and topological continuity, which further proves the effectiveness and superiority of our method.

DSConv is able to dynamically adapt to the shape of the tubular structure, and the attention adapts well to the target.

(1) Adapt to the shape of the tubular structure. The top part of the image below shows the location and shape of the convolution kernel. Visualization results show that DSConv adapts well to tubular structures and maintains shape, while deformable convolutions wander outside the target.

(2) Pay attention to the position of the tubular structure. The bottom part of the image below shows the attention heatmap for a given point. The results show that the brightest areas of DSConv are concentrated on tubular structures, which means that DSConv is more sensitive to tubular structures.

These results demonstrate that our DSConv is able to effectively adapt and focus on tubular structures, thereby enabling DSCNet to better capture and segment these structures.

Applications and future prospects

The framework of DSCNet handles the segmentation of thin tubular structures well and successfully combines morphological features with topological knowledge to guide the model to adapt to the segmentation task. However, whether other morphological targets can achieve better performance in similar paradigms remains an exciting subject. Meanwhile, more research will explore the possibility of applying similar paradigms on other morphological targets.

The DSCNet method focuses on the segmentation of thin tubular structures, but similar ideas and techniques may also be applicable to other morphological targets, such as slender lines, slender stretches of cells, etc. By combining morphological features and topological knowledge, DSCNet can guide the model to better adapt to the segmentation tasks of different morphological objects.

Future research can explore how similar paradigms can be applied to other morphological targets and further improve and optimize the segmentation algorithm. This may involve work on designing network structures more suitable for specific goals, introducing richer prior knowledge, and developing more effective training strategies. Through these efforts, it is expected to achieve more accurate and robust segmentation results on different morphological targets.

Add dynamic snake convolution (Dynamic Snake Convolution) in YOLOv8

The sections above introduced Dynamic Snake Convolution itself. If you just want to improve your model and don't want to study the principle, you can simply read from this section onward.

My environment and version

Because different versions of yolov8 may have different code directory layouts, here is the version I use. It is essentially the latest version, so the steps should apply if you keep your copy up to date.

python=3.9.7

torch=2.1

cuda=11.8

My yolov8 version is version 8.0.203
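
If you are unsure what your own environment looks like, a quick check like the following (my own snippet, not required by the tutorial) prints the relevant versions:

import torch

print(torch.__version__)          # e.g. 2.1.x
print(torch.version.cuda)         # e.g. 11.8
print(torch.cuda.is_available())  # should be True if you plan to train on GPU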

Official code bug fix 

If you add the official code as-is, you may get the following error; the code below has been modified to resolve it.

    y_new_ = y_new_.add(y_offset_new_.mul(extend_scope))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

The modified code is as follows; you can copy it directly to fix the problem!

# -*- coding: utf-8 -*-
from __future__ import annotations

import os

import numpy as np
import torch
from torch import nn
import einops


"""Dynamic Snake Convolution Module"""


class DSConv_pro(nn.Module):
    def __init__(
        self,
        in_channels: int = 1,
        out_channels: int = 1,
        kernel_size: int = 9,
        extend_scope: float = 1.0,
        morph: int = 0,
        if_offset: bool = True,
        device: str | torch.device = "cpu",
    ):
        """
        A Dynamic Snake Convolution Implementation

        Based on:

            TODO

        Args:
            in_channels: number of input channels. Defaults to 1.
            out_channels: number of output channels. Defaults to 1.
            kernel_size: the size of kernel. Defaults to 9.
            extend_scope: the range to expand. Defaults to 1 for this method.
            morph: the morphology of the convolution kernel is mainly divided into two types along the x-axis (0) and the y-axis (1) (see the paper for details).
            if_offset: whether deformation is required,  if it is False, it is the standard convolution kernel. Defaults to True.

        """

        super().__init__()

        if morph not in (0, 1):
            raise ValueError("morph should be 0 or 1.")

        self.kernel_size = kernel_size
        self.extend_scope = extend_scope
        self.morph = morph
        self.if_offset = if_offset
        self.device = torch.device(device)
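        # NOTE: get_coordinate_map_2D() below creates its coordinate tensors on self.device,
        # so the device passed here should match the device the model actually runs on
        # (e.g. "cuda"); otherwise the cpu/cuda mismatch error mentioned above can reappear.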
        self.to(device)

        # self.bn = nn.BatchNorm2d(2 * kernel_size)
        self.gn_offset = nn.GroupNorm(kernel_size, 2 * kernel_size)
        self.gn = nn.GroupNorm(out_channels // 4, out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.tanh = nn.Tanh()

        self.offset_conv = nn.Conv2d(in_channels, 2 * kernel_size, 3, padding=1)

        self.dsc_conv_x = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=(kernel_size, 1),
            stride=(kernel_size, 1),
            padding=0,
        )
        self.dsc_conv_y = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=(1, kernel_size),
            stride=(1, kernel_size),
            padding=0,
        )

    def forward(self, input: torch.Tensor):
        # Predict offset map between [-1, 1]
        offset = self.offset_conv(input)
        # offset = self.bn(offset)
        offset = self.gn_offset(offset)
        offset = self.tanh(offset)

        # Run deformative conv
        y_coordinate_map, x_coordinate_map = get_coordinate_map_2D(
            offset=offset,
            morph=self.morph,
            extend_scope=self.extend_scope,
            device=self.device,
        )
        deformed_feature = get_interpolated_feature(
            input,
            y_coordinate_map,
            x_coordinate_map,
        )

        if self.morph == 0:
            output = self.dsc_conv_x(deformed_feature)
        elif self.morph == 1:
            output = self.dsc_conv_y(deformed_feature)

        # Groupnorm & ReLU
        output = self.gn(output)
        output = self.relu(output)

        return output


def get_coordinate_map_2D(
    offset: torch.Tensor,
    morph: int,
    extend_scope: float = 1.0,
    device: str | torch.device = "cpu",
):
    """Computing 2D coordinate map of DSCNet based on: TODO

    Args:
        offset: offset predict by network with shape [B, 2*K, W, H]. Here K refers to kernel size.
        morph: the morphology of the convolution kernel is mainly divided into two types along the x-axis (0) and the y-axis (1) (see the paper for details).
        extend_scope: the range to expand. Defaults to 1 for this method.
        device: location of data. Defaults to 'cpu'.

    Return:
        y_coordinate_map: coordinate map along y-axis with shape [B, K_H * H, K_W * W]
        x_coordinate_map: coordinate map along x-axis with shape [B, K_H * H, K_W * W]
    """

    if morph not in (0, 1):
        raise ValueError("morph should be 0 or 1.")

    batch_size, _, width, height = offset.shape
    kernel_size = offset.shape[1] // 2
    center = kernel_size // 2
    device = torch.device(device)

    y_offset_, x_offset_ = torch.split(offset, kernel_size, dim=1)

    y_center_ = torch.arange(0, width, dtype=torch.float32, device=device)
    y_center_ = einops.repeat(y_center_, "w -> k w h", k=kernel_size, h=height)

    x_center_ = torch.arange(0, height, dtype=torch.float32, device=device)
    x_center_ = einops.repeat(x_center_, "h -> k w h", k=kernel_size, w=width)

    if morph == 0:
        """
        Initialize the kernel and flatten the kernel
            y: only need 0
            x: -num_points//2 ~ num_points//2 (Determined by the kernel size)
        """
        y_spread_ = torch.zeros([kernel_size], device=device)
        x_spread_ = torch.linspace(-center, center, kernel_size, device=device)

        y_grid_ = einops.repeat(y_spread_, "k -> k w h", w=width, h=height)
        x_grid_ = einops.repeat(x_spread_, "k -> k w h", w=width, h=height)

        y_new_ = y_center_ + y_grid_
        x_new_ = x_center_ + x_grid_

        y_new_ = einops.repeat(y_new_, "k w h -> b k w h", b=batch_size)
        x_new_ = einops.repeat(x_new_, "k w h -> b k w h", b=batch_size)

        y_offset_ = einops.rearrange(y_offset_, "b k w h -> k b w h")
        y_offset_new_ = y_offset_.detach().clone()

        # The center position remains unchanged and the rest of the positions begin to swing
        # This part is quite simple. The main idea is that "offset is an iterative process"

        y_offset_new_[center] = 0

        for index in range(1, center + 1):
            y_offset_new_[center + index] = (
                y_offset_new_[center + index - 1] + y_offset_[center + index]
            )
            y_offset_new_[center - index] = (
                y_offset_new_[center - index + 1] + y_offset_[center - index]
            )

        y_offset_new_ = einops.rearrange(y_offset_new_, "k b w h -> b k w h")

        y_new_ = y_new_.add(y_offset_new_.mul(extend_scope))

        y_coordinate_map = einops.rearrange(y_new_, "b k w h -> b (w k) h")
        x_coordinate_map = einops.rearrange(x_new_, "b k w h -> b (w k) h")

    elif morph == 1:
        """
        Initialize the kernel and flatten the kernel
            y: -num_points//2 ~ num_points//2 (Determined by the kernel size)
            x: only need 0
        """
        y_spread_ = torch.linspace(-center, center, kernel_size, device=device)
        x_spread_ = torch.zeros([kernel_size], device=device)

        y_grid_ = einops.repeat(y_spread_, "k -> k w h", w=width, h=height)
        x_grid_ = einops.repeat(x_spread_, "k -> k w h", w=width, h=height)

        y_new_ = y_center_ + y_grid_
        x_new_ = x_center_ + x_grid_

        y_new_ = einops.repeat(y_new_, "k w h -> b k w h", b=batch_size)
        x_new_ = einops.repeat(x_new_, "k w h -> b k w h", b=batch_size)

        x_offset_ = einops.rearrange(x_offset_, "b k w h -> k b w h")
        x_offset_new_ = x_offset_.detach().clone()

        # The center position remains unchanged and the rest of the positions begin to swing
        # This part is quite simple. The main idea is that "offset is an iterative process"

        x_offset_new_[center] = 0

        for index in range(1, center + 1):
            x_offset_new_[center + index] = (
                x_offset_new_[center + index - 1] + x_offset_[center + index]
            )
            x_offset_new_[center - index] = (
                x_offset_new_[center - index + 1] + x_offset_[center - index]
            )

        x_offset_new_ = einops.rearrange(x_offset_new_, "k b w h -> b k w h")

        x_new_ = x_new_.add(x_offset_new_.mul(extend_scope))

        y_coordinate_map = einops.rearrange(y_new_, "b k w h -> b w (h k)")
        x_coordinate_map = einops.rearrange(x_new_, "b k w h -> b w (h k)")

    return y_coordinate_map, x_coordinate_map


def get_interpolated_feature(
    input_feature: torch.Tensor,
    y_coordinate_map: torch.Tensor,
    x_coordinate_map: torch.Tensor,
    interpolate_mode: str = "bilinear",
):
    """From coordinate map interpolate feature of DSCNet based on: TODO

    Args:
        input_feature: feature that to be interpolated with shape [B, C, H, W]
        y_coordinate_map: coordinate map along y-axis with shape [B, K_H * H, K_W * W]
        x_coordinate_map: coordinate map along x-axis with shape [B, K_H * H, K_W * W]
        interpolate_mode: the arg 'mode' of nn.functional.grid_sample, can be 'bilinear' or 'bicubic' . Defaults to 'bilinear'.

    Return:
        interpolated_feature: interpolated feature with shape [B, C, K_H * H, K_W * W]
    """

    if interpolate_mode not in ("bilinear", "bicubic"):
        raise ValueError("interpolate_mode should be 'bilinear' or 'bicubic'.")

    y_max = input_feature.shape[-2] - 1
    x_max = input_feature.shape[-1] - 1

    y_coordinate_map_ = _coordinate_map_scaling(y_coordinate_map, origin=[0, y_max])
    x_coordinate_map_ = _coordinate_map_scaling(x_coordinate_map, origin=[0, x_max])

    y_coordinate_map_ = torch.unsqueeze(y_coordinate_map_, dim=-1)
    x_coordinate_map_ = torch.unsqueeze(x_coordinate_map_, dim=-1)

    # Note here grid with shape [B, H, W, 2]
    # where the last dimension of the grid is ordered as [x, y]
    grid = torch.cat([x_coordinate_map_, y_coordinate_map_], dim=-1)

    interpolated_feature = nn.functional.grid_sample(
        input=input_feature,
        grid=grid,
        mode=interpolate_mode,
        padding_mode="zeros",
        align_corners=True,
    )

    return interpolated_feature


def _coordinate_map_scaling(
    coordinate_map: torch.Tensor,
    origin: list,
    target: list = [-1, 1],
):
    """Map the value of coordinate_map from origin=[min, max] to target=[a,b] for DSCNet based on: TODO

    Args:
        coordinate_map: the coordinate map to be scaled
        origin: original value range of coordinate map, e.g. [coordinate_map.min(), coordinate_map.max()]
        target: target value range of coordinate map,Defaults to [-1, 1]

    Return:
        coordinate_map_scaled: the coordinate map after scaling
    """
    min, max = origin
    a, b = target

    coordinate_map_scaled = torch.clamp(coordinate_map, min, max)

    scale_factor = (b - a) / (max - min)
    coordinate_map_scaled = a + scale_factor * (coordinate_map_scaled - min)

    return coordinate_map_scaled
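
As a quick sanity check of the fixed module (my own smoke test, not part of the official code), you can run DSConv_pro on a random feature map; output channels must be divisible by 4 because of the GroupNorm inside:

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
conv = DSConv_pro(in_channels=64, out_channels=64, kernel_size=9, morph=0, device=device).to(device)
x = torch.randn(2, 64, 40, 40, device=device)
print(conv(x).shape)  # expected: torch.Size([2, 64, 40, 40]) -- spatial size is preserved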


Where the code needs to be changed

There are five things we need to change to add this convolution to our code!

Modification 1

First, copy the bug-fixed code above to the end of the ultralytics/nn/modules/conv.py file.

After the modification, it should look like the following. Because the file is too long to reproduce in full here, I will upload the modified code to CSDN for everyone to download.

PS: Note that the required imports, shown in the code box below, must be moved to the top of the file, otherwise an error will be reported.

# -*- coding: utf-8 -*-
from __future__ import annotations

import os

import numpy as np
import torch
from torch import nn
import einops

If you do not move them to the top of the file, the following error will be reported!

Modification 2

The second modification is near the top of the same file, where the name of the class we defined is added so it is exported (a sketch is shown below).
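
In my copy, conv.py keeps an __all__ tuple near the top of the file; appending the new class name there makes it importable from the package. A sketch is shown below (the entries already in your file may differ from mine):

# ultralytics/nn/modules/conv.py -- near the top of the file
__all__ = ('Conv', 'Conv2', 'LightConv', 'DWConv', 'DWConvTranspose2d', 'ConvTranspose', 'Focus',
           'GhostConv', 'ChannelAttention', 'SpatialAttention', 'CBAM', 'Concat', 'RepConv',
           'DSConv_pro')  # DSConv_pro added at the end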

Modification 3 

Next, open ultralytics/nn/modules/__init__.py and modify it: add the class name here as well (a sketch is shown below).
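
The existing import list in __init__.py is long; the point is simply to append DSConv_pro to the import from .conv (and to the file's __all__ tuple, if your version keeps one). A sketch, with the surrounding imports abbreviated:

# ultralytics/nn/modules/__init__.py -- abbreviated sketch; your existing import list will be longer
from .conv import Conv, DWConv, GhostConv, DSConv_pro  # DSConv_pro appended to the existing import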

Modification 4

Next, open ultralytics/nn/tasks.py and modify it. First add the class name to the module imports at the top of the file (a sketch is shown below).
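
Again, the real import statement at the top of tasks.py is much longer; just append DSConv_pro to it. A sketch (abbreviated, your list will differ):

# ultralytics/nn/tasks.py -- top-of-file imports (abbreviated sketch)
from ultralytics.nn.modules import C2f, Conv, Detect, SPPF, DSConv_pro  # DSConv_pro added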

Modification 5 

In the same file, find the following method:

def parse_model(d, ch, verbose=True):  # model_dict, input_channels(3)

Modify it as follows, adding the class name (a sketch is shown below).
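
Concretely, inside parse_model() there is a branch that resolves input/output channels for "ordinary" modules (Conv, C2f, SPPF and friends); appending DSConv_pro to that tuple is enough for the YAML parser to build it. A sketch is shown below, with the module list abbreviated (the exact contents in your version of tasks.py will differ):

# ultralytics/nn/tasks.py, inside parse_model() -- abbreviated sketch
if m in (Classify, Conv, ConvTranspose, GhostConv, Bottleneck, SPP, SPPF, DWConv, Focus,
         C1, C2, C2f, C3, DSConv_pro):               # <- DSConv_pro added to this tuple
    c1, c2 = ch[f], args[0]                          # input channels from the previous layer, output from the YAML
    if c2 != nc:
        c2 = make_divisible(min(c2, max_channels) * width, 8)
    args = [c1, c2, *args[1:]]                       # becomes DSConv_pro(in_channels, out_channels, kernel_size, ...)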

Modify model configuration file

At this point the convolution can be added to the model. Find and open ultralytics/cfg/models/v8/yolov8.yaml; the file initially looks as follows:

Note that dynamic snake convolution does not change the width and height of the feature map, so it can be added to the Backbone, or to the Neck to assist feature fusion, depending on your own needs. Here I add it to the Backbone, replacing the first C2f and the last C2f, which gave an effective mAP gain; the modification is shown below.
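
Since the original screenshot is not reproduced here, the excerpt below sketches what the modified backbone might look like (based on the stock yolov8.yaml in my version; only the backbone is shown, the head is unchanged, and the layer indices stay the same because each C2f is replaced in place). The arguments [128, 9] mean output channels and kernel_size; morph, extend_scope and if_offset keep their defaults. Note that the device argument of DSConv_pro defaults to "cpu", so when training on GPU you may need to pass the device or adapt the module to take it from the input (see the note in the code above):

# ultralytics/cfg/models/v8/yolov8.yaml -- backbone excerpt (sketch; your file may differ)
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]        # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]       # 1-P2/4
  - [-1, 1, DSConv_pro, [128, 9]]    # 2  (was: [-1, 3, C2f, [128, True]])
  - [-1, 1, Conv, [256, 3, 2]]       # 3-P3/8
  - [-1, 6, C2f, [256, True]]        # 4
  - [-1, 1, Conv, [512, 3, 2]]       # 5-P4/16
  - [-1, 6, C2f, [512, True]]        # 6
  - [-1, 1, Conv, [1024, 3, 2]]      # 7-P5/32
  - [-1, 1, DSConv_pro, [1024, 9]]   # 8  (was: [-1, 3, C2f, [1024, True]])
  - [-1, 1, SPPF, [1024, 5]]         # 9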

Start training

Here I perform training by creating a py file. I created a file named run and placed it in the root directory.

The content is as follows

from ultralytics import YOLO

model = YOLO("ultralytics/cfg/models/v8/yolov8.yaml").train(**{'cfg':'ultralytics/cfg/default.yaml'}) # pass any model type


Run this file and the console will start to print output, verifying the model; the printed network structure should reflect our modification.
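
To double-check before a full training run, you can also just build the model from the modified YAML and print its summary (my own quick check, not from the original article):

from ultralytics import YOLO

model = YOLO("ultralytics/cfg/models/v8/yolov8.yaml")
model.info(detailed=True)  # detailed listing; DSConv_pro submodules (offset_conv, dsc_conv_x, ...) should appear at the replaced layers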

Comparative Results 

After training completes, we can look at the detection results. Most objects can still be detected, and the mAP has also improved slightly. I won't show the results here because the dataset I used for this blog was found online and only contains a few hundred images; I will test on a larger dataset later and provide an accuracy comparison.

Modified files 

Finally, here is the download address for the fully modified files; the changes have already been made for you, and you only need to add DSConv where you want it!

ultralytics files modified with DSConv (dynamic snake convolution)

Finally, I hope this article can help you.

Reprinted from: blog.csdn.net/java1314777/article/details/134139459