The most detailed YOLOv3 SPP structure code analysis in history

Let me recommend a particularly good blogger, @太阳花的小豆豆. I am a novice, and I learned all my deep learning and neural network knowledge from this blogger. I recently studied the YOLOv3 SPP code and found it difficult, so I am sharing it here, with comments added to nearly every line of code.
This article is also a summary written after studying the blogger's videos.
The blogger's videos are on Bilibili; anyone who wants to watch them can find them there, along with the link to the YOLOv3 SPP code.
The code is also the blogger's code; just download all the files in that directory.

Preparation

We need to download the VOC2012 dataset from https://pjreddie.com/media/files/VOCtrainval_11-May-2012.tar
In the Yolov3spp folder, create a folder called VOCdevkit and place the downloaded dataset in it.
The README file in the spp folder explains the meaning of each file, so I won't repeat that here. There are also two scripts for converting formats and generating files; @太阳花的小豆豆 is really great. Note that you may need to change the paths. The code is listed below, in the order in which I understood it.
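
For reference, the tar already contains the VOCdevkit/VOC2012 hierarchy, so a minimal extraction sketch in Python (assuming the archive sits next to your script) looks like this:

import tarfile

# Unpack VOCtrainval_11-May-2012.tar; it expands to
# ./VOCdevkit/VOC2012/{Annotations, ImageSets, JPEGImages, ...}
with tarfile.open("VOCtrainval_11-May-2012.tar") as tar:
    tar.extractall(".")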

Code

1. parse_config.py parses the network structure

First look at the yolov3-spp.cfg file in the cfg folder. This file records the parameter configuration of every layer of the whole SPP network. To put it bluntly, parse_config.py reads and parses these parameters so that we can use them to build the network model.
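
For orientation, the top of yolov3-spp.cfg looks roughly like this (abridged here): a [net] block of training hyperparameters followed by one block per layer.

[net]
batch=64
subdivisions=16
width=608
height=608
channels=3
# ... more hyperparameters ...

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
# ... more layer blocks ...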

import os
import numpy as np

def parse_model_cfg(path: str):
    # check that a .cfg file exists at the given path
    if not path.endswith(".cfg") or not os.path.exists(path):
        # path.endswith checks for the .cfg suffix
        raise FileNotFoundError("the cfg file does not exist...")

    # read the file content
    with open(path, "r") as f:
        lines = f.read().split("\n")
        # f.read().split("\n") returns the file content as a list, split on newlines into individual lines

    # drop empty lines and comment lines
    lines = [x for x in lines if x and not x.startswith("#")]
    # startswith() checks whether a string begins with the given character or substring.
    # Keep x in the lines list only if it is non-empty and does not start with '#', otherwise discard it.
    # for...[if]... is just the list-comprehension way of building a list

    lines = [x.strip() for x in lines]
    # strip() removes leading and trailing whitespace from each line (with no argument it strips whitespace)
    mdefs = []  # define an empty list
    for line in lines:            # iterate over lines
        if line.startswith("["):  # a line starting with '[' opens a new block
            mdefs.append({})      # append an empty dict to the mdefs list
            mdefs[-1]["type"] = line[1:-1].strip()  # record the module type
            # [-1] is the empty dict just appended. Create a key "type" in it whose value is line[1:-1], again stripped of whitespace.
            # Open yolov3-spp.cfg and look at a [convolutional] line: '[' has index 0, 'c' has index 1, ']' has index -1.
            # Slicing is half-open, so line[1:-1] ends at the final 'l' and yields "convolutional".
            # In other words, this step reads out the name of the conv / pooling / yolo block
            if mdefs[-1]["type"] == "convolutional":
                mdefs[-1]["batch_normalize"] = 0
                # for conv modules, default to no BN (ordinary conv layers overwrite this with 1;
                # the final predictor convs keep 0)
        else:
            # a line not starting with '[' carries a parameter value
            key, val = line.split("=")
            # split on '=' to get key and value
            key = key.strip()
            val = val.strip()

            if key == "anchors":
                # anchors appear in the final yolo blocks
                # anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
                val = val.replace(" ", "")  # remove the spaces
                mdefs[-1][key] = np.array([float(x) for x in val.split(",")]).reshape((-1, 2))  # np anchors
                # np.array() converts to an array; float(x) because the values may be fractional
                # arr.reshape(-1, m) reshapes to d rows and m columns (-1 means the row count d = total/m is inferred),
                # here d = 1*18/2 = 9, so 9 rows and 2 columns: 9 anchor pairs

            elif (key in ["from", "layers", "mask"]) or (key == "size" and "," in val):
                mdefs[-1][key] = [int(x) for x in val.split(",")]
                # split on commas and convert to int
            else:  # neither of the two cases above, so this is an ordinary conv parameter
                # TODO: .isnumeric() actually fails to get the float case
                if val.isnumeric():  # val.isnumeric() checks whether the string is numeric
                    mdefs[-1][key] = int(val) if (int(val) - float(val)) == 0 else float(val)
                    # decide whether the number is an int or a float: for an int, int(val) - float(val) == 0,
                    # so store int(val), otherwise store float(val)
                else:
                    mdefs[-1][key] = val  # return string (the non-numeric case)

    # check all fields are supported -- all key values allowed in the dicts
    supported = ['type', 'batch_normalize', 'filters', 'size', 'stride', 'pad', 'activation', 'layers', 'groups',
                 'from', 'mask', 'anchors', 'classes', 'num', 'jitter', 'ignore_thresh', 'truth_thresh', 'random',
                 'stride_x', 'stride_y', 'weights_type', 'weights_normalization', 'scale_x_y', 'beta_nms', 'nms_kind',
                 'iou_loss', 'iou_normalizer', 'cls_normalizer', 'iou_thresh', 'probability']

    # iterate over each module's configuration and check it
    for x in mdefs[1:]:  # index 0 is the [net] configuration
        # iterate over the keys of each config dict
        for k in x:
            if k not in supported:
                # check that every key is in the supported list
                raise ValueError("Unsupported fields:{} in cfg".format(k))

    return mdefs  # finally return the parsed network parameters


def parse_data_cfg(path):
    # Parses the data configuration file
    if not os.path.exists(path) and os.path.exists('data' + os.sep + path):  # add data/ prefix if omitted
        path = 'data' + os.sep + path

    with open(path, 'r') as f:
        lines = f.readlines()

    options = dict()
    for line in lines:
        line = line.strip()
        if line == '' or line.startswith('#'):
            continue
        key, val = line.split('=')
        options[key.strip()] = val.strip()

    return options
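
A minimal usage sketch (the cfg path is an assumption; adjust it to where your files live):

mdefs = parse_model_cfg("cfg/yolov3-spp.cfg")
print(mdefs[0]["type"])      # 'net' -- the training hyperparameters block
print(mdefs[1]["type"])      # 'convolutional' -- the first real layer
print(mdefs[-1]["anchors"])  # the 9x2 anchor array parsed in the last yolo block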

2. models.py builds the network model (model building and forward propagation)

After parsing the network parameters, we build the network model. This part of the code is in models.py, except that some of the modules it uses are built in layers.py.

Code reading steps

1. The whole network should be read starting from class Darknet(nn.Module):. At self.module_list, self.routs = create_modules(self.module_defs, img_size) we have to look at how the create_modules() function is constructed, so jump to this function.
2. When it runs to the check of whether the type is route or shortcut, the two modules FeatureConcat() and WeightedFeatureFusion() are used. They are built in layers.py, so jump to these two classes.
3. When the type is judged to be yolo (mdef["type"] == "yolo"), modules = YOLOLayer(...) is built; we have to see how YOLOLayer is constructed, so jump to this class, which processes the output of YOLO. After that, read the rest of the function sentence by sentence normally.
4. In the end create_modules() returns two values, module_list and routs_binary, and we come back to class Darknet(nn.Module): and keep reading normally. Note that ONNX_EXPORT = False was set at the very beginning, so just ignore the statements related to ONNX_EXPORT.

The models.py code is as follows.

from build_utils.layers import *
from build_utils.parse_config import *

ONNX_EXPORT = False


def create_modules(modules_defs: list, img_size):
    # two parameters: the first is a list -- when this function is called later, the parsed network
    # structure list is passed in; the second parameter is the image size
    """
    Constructs module list of layer blocks from module configuration in module_defs
    :param modules_defs: list of per-layer configurations parsed from the .cfg file
    :param img_size:
    :return:
    """

    img_size = [img_size] * 2 if isinstance(img_size, int) else img_size
    # pop(0) deletes the first config in the parsed cfg list (the [net] configuration)
    modules_defs.pop(0)  # cfg training hyperparams (unused)
    output_filters = [3]  # records the output channels of each module built below; starts with 3 because RGB images have 3 channels
    module_list = nn.ModuleList()   # every module built below is appended to nn.ModuleList()
    # routs records which layers' outputs are used by later layers (feature fusion or concatenation);
    # if this is unclear now, it will become clear once it is used
    routs = []  # list of layers which rout to deeper layers
    yolo_index = -1

    # iterate over the configs and build each layer
    for i, mdef in enumerate(modules_defs):
        # i is the index, mdef holds the layer info
        modules = nn.Sequential()
        # the module's layers are stored in a Sequential
        if mdef["type"] == "convolutional":
            bn = mdef["batch_normalize"]  # 1 or 0 / use or not
            filters = mdef["filters"]
            k = mdef["size"]  # kernel size
            stride = mdef["stride"] if "stride" in mdef else (mdef['stride_y'], mdef["stride_x"])  # here it is always just "stride"
            if isinstance(k, int):   # isinstance() checks whether an object is of a known type; here, whether the kernel size k is an int
                # add_module can define submodules of A outside A.__init__(self); add a conv layer
                modules.add_module("Conv2d", nn.Conv2d(in_channels=output_filters[-1],  # 3 for the first conv layer
                                                       out_channels=filters,            # number of kernels
                                                       kernel_size=k,
                                                       stride=stride,
                                                       padding=k // 2 if mdef["pad"] else 0,
                                                       bias=not bn))                    # with BN, bias is False; without BN it is True
            else:
                raise TypeError("conv2d filter size must be int type.")

            if bn:
                modules.add_module("BatchNorm2d", nn.BatchNorm2d(filters))   # if BN is used, also add a BN layer; its input is the previous layer's filters
            else:
                # a conv without a BN layer means this layer is one of YOLO's predictors
                routs.append(i)  # detection output (goes into yolo layer) -- record the index

            if mdef["activation"] == "leaky":  # except for the three predictor convs, which are linear, all are leaky;
                                               # globally, this checks which activation sits on top of the conv
                modules.add_module("activation", nn.LeakyReLU(0.1, inplace=True))
            else:
                pass

        elif mdef["type"] == "BatchNorm2d":
            pass

        elif mdef["type"] == "maxpool":   # only the SPP structure uses maxpool
            k = mdef["size"]  # kernel size
            stride = mdef["stride"]
            modules = nn.MaxPool2d(kernel_size=k, stride=stride, padding=(k - 1) // 2)

        elif mdef["type"] == "upsample":
            if ONNX_EXPORT:  # explicitly state size, avoid scale_factor
                g = (yolo_index + 1) * 2 / 32  # gain
                modules = nn.Upsample(size=tuple(int(x * g) for x in img_size))
            else:
                modules = nn.Upsample(scale_factor=mdef["stride"])  # the upsampling factor is also already defined in the cfg

        elif mdef["type"] == "route":  # [-2],  [-1,-3,-5,-6], [-1, 61]
            layers = mdef["layers"]
            filters = sum([output_filters[l + 1 if l > 0 else l] for l in layers])
            # filters is the output channel depth of the current layer. For each l in layers: if l > 0 use index l+1,
            # because output_filters already holds the element 3 at index 0; if l < 0 index directly; sum the channels
            routs.extend([i + l if l < 0 else l for l in layers])
            # the difference between extend and append: append keeps the added item's type (a list stays a list
            # inside the list), while extend stores the added list's items as individual elements of routs
            # if l < 0 the referenced module is relative to the current layer i, so count back with i + l;
            # if l > 0 it is the absolute layer index l itself
            # for example, with i = 18 and layers = [-1, 61], the operation above extends routs with [17, 61]
            modules = FeatureConcat(layers=layers)
            # FeatureConcat concatenates multiple feature maps along the channel dimension; its structure is written in layers.py

        elif mdef["type"] == "shortcut":
            layers = mdef["from"]   # "from" points at the earlier layer to connect with; it means the same as "layers"
            filters = output_filters[-1]  # output channels of the previous layer
            # routs.extend([i + l if l < 0 else l for l in layers])
            routs.append(i + layers[0])
            # record which layer's output is being reused
            modules = WeightedFeatureFusion(layers=layers, weight="weights_type" in mdef)
            # this adds the output feature maps element-wise; its structure is written in layers.py
        elif mdef["type"] == "yolo":
            yolo_index += 1  # records which yolo_layer this is [0, 1, 2]; it starts at -1 above, and this
                             # structure has 3 yolo layers in total, hence 0, 1, 2
            stride = [32, 16, 8]  # downsample factor of each prediction feature map relative to the original image

            modules = YOLOLayer(anchors=mdef["anchors"][mdef["mask"]],  # mask decides which of the 9 anchor pairs are used
                                nc=mdef["classes"],  # number of classes
                                img_size=img_size,
                                stride=stride[yolo_index])   # each prediction feature map has its own stride

            # Initialize preceding Conv2d() bias (https://arxiv.org/pdf/1708.02002.pdf section 3.3)
            try:   # this targets the predictor conv just before the yolo layer, which is what j = -1 means; a quick look is enough
                j = -1
                # bias: shape(255,); index 0 selects the Conv2d inside the Sequential
                # view: shape(3, 85)
                b = module_list[j][0].bias.view(modules.na, -1)
                b.data[:, 4] += -4.5  # obj
                b.data[:, 5:] += math.log(0.6 / (modules.nc - 0.99))  # cls (sigmoid(p) = 1/nc)
                module_list[j][0].bias = torch.nn.Parameter(b.view(-1), requires_grad=True)
            except Exception as e:
                print('WARNING: smart bias initialization failure.', e)
        else:
            print("Warning: Unrecognized Layer Type: " + mdef["type"])

        # Register module list and number of output filters
        module_list.append(modules)    # put the modules built above into module_list; module_list = nn.ModuleList() above, and ModuleList is a subclass of Module
        # when an nn.ModuleList is added as a member of an nn.Module object (i.e. when we add modules to our network),
        # all parameters of the nn.Modules inside the nn.ModuleList are also registered as parameters of our network
        output_filters.append(filters)  # likewise, each module's filters is appended to output_filters

    routs_binary = [False] * len(modules_defs)   # build a list with as many False entries as modules_defs has modules
    for i in routs:
        routs_binary[i] = True   # routs stores indices; set True wherever an index was recorded
    return module_list, routs_binary    # back to the create_modules call inside Darknet


class YOLOLayer(nn.Module):
    """
    Post-processes the output of YOLO
    """
    def __init__(self, anchors, nc, img_size, stride):   # nc is the number of classes
        super(YOLOLayer, self).__init__()
        self.anchors = torch.Tensor(anchors)      # the anchors were in numpy format; convert them to tensors
        self.stride = stride  # layer stride: one step on the feature map corresponds to this many pixels on the original image [32, 16, 8]
        self.na = len(anchors)  # number of anchors (3)
        self.nc = nc  # number of classes (80)
        self.no = nc + 5  # parameters predicted per anchor: 80 + 5 (85: x, y, w, h, obj, cls1, ...)
        self.nx, self.ny, self.ng = 0, 0, (0, 0)  # nx, ny are the prediction feature map's width and height; ng is its size; all initialized to 0
        self.anchor_vec = self.anchors / self.stride    # scale the anchors down to the grid (prediction feature map) scale
        # batch_size, na, grid_h, grid_w, wh,
        # dimensions of size 1 are not fixed values; later operations expand them automatically via broadcasting,
        # so they change with the input data
        self.anchor_wh = self.anchor_vec.view(1, self.na, 1, 1, 2)  # reshape the scaled anchors; this gives the anchor
                                                                    # width and height relative to a grid cell, i.e. pw and ph
        self.grid = None  # reassigned during forward propagation

        if ONNX_EXPORT:
            self.training = False
            self.create_grids((img_size[1] // stride, img_size[0] // stride))  # number x, y grid points

    def create_grids(self, ng=(13, 13), device="cpu"):
        """
        Update the grid info and generate new grid parameters
        :param ng: feature map size
        :param device:
        :return:
        """
        self.nx, self.ny = ng     # feature map width and height
        self.ng = torch.tensor(ng, dtype=torch.float)   # convert to a tensor and assign to ng

        # build xy offsets -- the xy offsets of the anchors at each cell (on the feature map)
        if not self.training:  # training mode does not need to regress to the final predicted boxes;
                               # during training only the losses are computed, so the boxes need not be decoded
            yv, xv = torch.meshgrid([torch.arange(self.ny, device=device),
                                     torch.arange(self.nx, device=device)])
            # arange produces a 1-D integer tensor 0, 1, ..., ny-1
            # meshgrid yields the x and y coordinates with the origin at the top-left, assigned to xv and yv
            # batch_size, na, grid_h, grid_w, wh
            self.grid = torch.stack((xv, yv), 2).view((1, 1, self.ny, self.nx, 2)).float()
            # stack pairs xv and yv into (x, y) coordinates, then view adjusts the shape
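            # A tiny worked example of the grid above (illustrative, for ng = (2, 2)):
            #   xv = [[0, 1], [0, 1]], yv = [[0, 0], [1, 1]]
            #   grid[0, 0] = [[[0, 0], [1, 0]],
            #                 [[0, 1], [1, 1]]]  -- the top-left corner (cx, cy) of each cell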
        if self.anchor_vec.device != device:        # check whether the device types match
            self.anchor_vec = self.anchor_vec.to(device)
            self.anchor_wh = self.anchor_wh.to(device)

    def forward(self, p):  # p holds the predicted parameters: batch_size, anchors, class scores, confidence, and so on
        if ONNX_EXPORT:
            bs = 1  # batch size
        else:
            bs, _, ny, nx = p.shape  # unpack p's shape: batch_size, predict_param(255) (unused, hence _), grid height (13), grid width (13)
            if (self.nx, self.ny) != (nx, ny) or self.grid is None:  # if the feature map's width/height changed,
                # or grid is None (i.e. the very first call), the grid parameters must be (re)built
                self.create_grids((nx, ny), p.device)   # build the 13x13 grid

        # view: (batch_size, 255, 13, 13) -> (batch_size, 3, 85, 13, 13)
        # permute: (batch_size, 3, 85, 13, 13) -> (batch_size, 3, 13, 13, 85)
        # [bs, anchor, grid, grid, xywh + obj + classes]
        p = p.view(bs, self.na, self.no, self.ny, self.nx).permute(0, 1, 3, 4, 2).contiguous()  # adjust the view: split the earlier _ into the 3 anchors and the 85 predicted parameters
        # permute reorders the dimensions (see the comments above); after reordering, the elements are no longer
        # contiguous in memory, so contiguous() is used to make them contiguous again
        if self.training:  # in training mode, return the processed p directly
            return p
        elif ONNX_EXPORT:
            # Avoid broadcasting for ANE operations
            m = self.na * self.nx * self.ny  # 3 * nx * ny
            ng = 1. / self.ng.repeat(m, 1)
            grid = self.grid.repeat(1, self.na, 1, 1, 1).view(m, 2)
            anchor_wh = self.anchor_wh.repeat(1, 1, self.nx, self.ny, 1).view(m, 2) * ng

            p = p.view(m, self.no)
            # xy = torch.sigmoid(p[:, 0:2]) + grid  # x, y
            # wh = torch.exp(p[:, 2:4]) * anchor_wh  # width, height
            # p_cls = torch.sigmoid(p[:, 4:5]) if self.nc == 1 else \
            #     torch.sigmoid(p[:, 5:self.no]) * torch.sigmoid(p[:, 4:5])  # conf
            p[:, :2] = (torch.sigmoid(p[:, 0:2]) + grid) * ng  # x, y
            p[:, 2:4] = torch.exp(p[:, 2:4]) * anchor_wh  # width, height
            p[:, 4:] = torch.sigmoid(p[:, 4:])
            p[:, 5:] = p[:, 5:self.no] * p[:, 4:5]
            return p
        else:  # inference / validation
            # [bs, anchor, grid, grid, xywh + obj + classes]
            io = p.clone()  # inference output -- clone p into io
            io[..., :2] = torch.sigmoid(io[..., :2]) + self.grid  # xy: compute the center xy coordinates on the feature map
            # "..." covers the bs, anchor, grid, grid dimensions; ":2" takes the first two of the 85 parameters, i.e. x and y;
            # sigmoid the xy offsets and add the grid cell's top-left coordinates to get the xy centers
            io[..., 2:4] = torch.exp(io[..., 2:4]) * self.anchor_wh  # "2:4" selects the w and h parameters, matching e^tw and e^th in the formulas; computes the anchor width and height
            io[..., :4] *= self.stride  # map the boxes back to the original image scale
            torch.sigmoid_(io[..., 4:])  # apply sigmoid to the confidence and class scores, squashing them into (0, 1)
            return io.view(bs, -1, self.no), p  # view [1, 3, 13, 13, 85] as [1, 507, 85]; -1 is inferred automatically, 13*13 is just an example
            # end of YOLOLayer; back to create_modules
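            # A worked example of the decoding above (illustrative numbers, not from a real run):
            # take the 13x13 head (stride 32) and one of its anchors (116, 90), so anchor_vec = (3.625, 2.8125).
            # for a prediction in grid cell (cx, cy) = (6, 6) with tx = ty = tw = th = 0:
            #   bx = sigmoid(0) + 6 = 6.5, by = 6.5                    (feature-map scale)
            #   bw = e^0 * 3.625 = 3.625, bh = e^0 * 2.8125 = 2.8125
            # multiplying by stride 32 maps back to the input image: center (208, 208), size (116, 90)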

class Darknet(nn.Module):
    """
    YOLOv3 spp object detection model
    """
    def __init__(self, cfg, img_size=(416, 416), verbose=False):
        # three parameters: cfg is the parameter configuration file, img_size is the input image size
        # (not used during training), verbose controls whether detailed info of every module is printed
        super(Darknet, self).__init__()
        # the img_size passed in here only matters when exporting an ONNX model; isinstance() checks whether an object is of a known type
        self.input_size = [img_size] * 2 if isinstance(img_size, int) else img_size
        # parse the network's .cfg file; parse_model_cfg is the network-parsing function defined in parse_config.py
        self.module_defs = parse_model_cfg(cfg)
        # build the network structure from the parsed configs via create_modules(); the return values go to module_list, and routs holds indices
        self.module_list, self.routs = create_modules(self.module_defs, img_size)
        # get the indices of all YOLOLayer layers
        self.yolo_layers = get_yolo_layers(self)

        # print the model info; if verbose is True, detailed info is printed (it was set to False above)
        self.info(verbose) if not ONNX_EXPORT else None  # print model description

    def forward(self, x, verbose=False):
        return self.forward_once(x, verbose=verbose)

    def forward_once(self, x, verbose=False):
        # yolo_out collects the output of each yolo_layer
        # out collects the output of each module
        yolo_out, out = [], []
        if verbose:
            print('0', x.shape)
            str = ""

        for i, module in enumerate(self.module_list):
            name = module.__class__.__name__
            if name in ["WeightedFeatureFusion", "FeatureConcat"]:  # sum, concat
                if verbose:
                    l = [i - 1] + module.layers  # layers
                    sh = [list(x.shape)] + [list(out[i].shape) for i in module.layers]  # shapes
                    str = ' >> ' + ' + '.join(['layer %g %s' % x for x in zip(l, sh)])
                x = module(x, out)  # WeightedFeatureFusion(), FeatureConcat() -- why is this written as module(x, out)?
                                    # recall the forward passes of WeightedFeatureFusion and FeatureConcat: they take two arguments, x and outputs
            elif name == "YOLOLayer":
                yolo_out.append(module(x))
            else:  # run module directly, i.e. mtype = 'convolutional', 'upsample', 'maxpool', 'batchnorm2d' etc.
                x = module(x)

            out.append(x if self.routs[i] else [])
            if verbose:
                print('%g/%g %s -' % (i, len(self.module_list), name), list(x.shape), str)
                str = ''

        if self.training:  # train
            return yolo_out
        elif ONNX_EXPORT:  # export
            # x = [torch.cat(x, 0) for x in zip(*yolo_out)]
            # return x[0], torch.cat(x[1:3], 1)  # scores, boxes: 3780x80, 3780x4
            p = torch.cat(yolo_out, dim=0)

            # # filter out low-probability targets based on objectness
            # mask = torch.nonzero(torch.gt(p[:, 4], 0.1), as_tuple=False).squeeze(1)
            # # onnx does not support indexing with more than one dimension (pytorch is too flexible)
            # # p = p[mask]
            # p = torch.index_select(p, dim=0, index=mask)
            #
            # # filter out small-area targets, w > 2 and h > 2 pixel
            # # ONNX does not yet support the bitwise_and and all operations
            # mask_s = torch.gt(p[:, 2], 2./self.input_size[0]) & torch.gt(p[:, 3], 2./self.input_size[1])
            # mask_s = torch.nonzero(mask_s, as_tuple=False).squeeze(1)
            # p = torch.index_select(p, dim=0, index=mask_s)  # width-height: filter out small targets
            #
            # if mask_s.numel() == 0:
            #     return torch.empty([0, 85])

            return p
        else:  # inference or test
            x, p = zip(*yolo_out)  # inference output, training output
            # zip() takes iterables as arguments, packs the corresponding elements into tuples,
            # and returns the list made of those tuples
            x = torch.cat(x, 1)  # concatenate the final predictions

            return x, p

    def info(self, verbose=False):
        """
        Print the model info
        :param verbose:
        :return:
        """
        torch_utils.model_info(self, verbose)


def get_yolo_layers(self):
    """
    Get the indices of the three "YOLOLayer" modules in the network
    :param self:
    :return:
    """
    return [i for i, m in enumerate(self.module_list) if m.__class__.__name__ == 'YOLOLayer']  # [89, 101, 113]
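
To see the two output modes in action, here is a minimal usage sketch; it assumes cfg/yolov3-spp.cfg is available and feeds the model an all-zero dummy image:

import torch

model = Darknet("cfg/yolov3-spp.cfg", img_size=512)
x = torch.zeros(1, 3, 512, 512)  # dummy RGB batch

model.train()
yolo_out = model(x)  # training mode: a list with one raw tensor per YOLOLayer
print([list(t.shape) for t in yolo_out])  # [[1, 3, 16, 16, 85], [1, 3, 32, 32, 85], [1, 3, 64, 64, 85]]

model.eval()
with torch.no_grad():
    inference_out, train_out = model(x)  # eval mode: decoded boxes plus the raw tensors
print(list(inference_out.shape))  # [1, 16128, 85] = (16*16 + 32*32 + 64*64) * 3 anchors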

The layers.py part of the code is as follows.

class FeatureConcat(nn.Module):
    """
    将多个特征矩阵在channel维度进行concatenate拼接
    """
    def __init__(self, layers):
        super(FeatureConcat, self).__init__()
        self.layers = layers  # layer indices
        self.multiple = len(layers) > 1  # 如果layers中是多个参数 multiple返回ture

    def forward(self, x, outputs):
        return torch.cat([outputs[i] for i in self.layers], 1) if self.multiple else outputs[self.layers[0]]
        # C=torch.cat((A,B),1)就表示按维数1(列)拼接A和B,也就是横着拼接,A左B右
        # 如果是多个参数就将通道拼接在一起 否则就是那个通道
        # output在Darknet中建立
class WeightedFeatureFusion(nn.Module):  # weighted sum of 2 or more layers https://arxiv.org/abs/1911.09070
    """
    将多个特征矩阵的值进行融合(add操作)
    """
    def __init__(self, layers, weight=False):
        super(WeightedFeatureFusion, self).__init__()
        self.layers = layers  # layer indices
        self.weight = weight  # apply weights boolean
        self.n = len(layers) + 1  # number of layers 融合的特征矩阵个数 两个矩阵融合
        if weight:
            self.w = nn.Parameter(torch.zeros(self.n), requires_grad=True)  # layer weights

    def forward(self, x, outputs):
        # Weights
        if self.weight:
            w = torch.sigmoid(self.w) * (2 / self.n)  # sigmoid weights (0-1)
            x = x * w[0]

        # Fusion
        nx = x.shape[1]  # input channels
        for i in range(self.n - 1):  #两个融合其实n=2 n-1=1,就循环一次
            a = outputs[self.layers[i]] * w[i + 1] if self.weight else outputs[self.layers[i]]  # feature to add
            na = a.shape[1]  # feature channels

            # Adjust channels
            # 根据相加的两个特征矩阵的channel选择相加方式
            if nx == na:  # same shape 如果channel相同,直接相加 只会用到这种情况
                x = x + a
            elif nx > na:  # slice input 如果channel不同,将channel多的特征矩阵砍掉部分channel保证相加的channel一致
                x[:, :na] = x[:, :na] + a  # or a = nn.ZeroPad2d((0, 0, 0, 0, 0, dc))(a); x = x + a
            else:  # slice feature
                x = x + a[:, :nx]

        return x


class MixConv2d(nn.Module):  # MixConv: Mixed Depthwise Convolutional Kernels https://arxiv.org/abs/1907.09595
    def __init__(self, in_ch, out_ch, k=(3, 5, 7), stride=1, dilation=1, bias=True, method='equal_params'):
        super(MixConv2d, self).__init__()

        groups = len(k)
        if method == 'equal_ch':  # equal channels per group
            i = torch.linspace(0, groups - 1E-6, out_ch).floor()  # out_ch indices
            ch = [(i == g).sum() for g in range(groups)]
        else:  # 'equal_params': equal parameter count per group
            b = [out_ch] + [0] * groups
            a = np.eye(groups + 1, groups, k=-1)
            a -= np.roll(a, 1, axis=1)
            a *= np.array(k) ** 2
            a[0] = 1
            ch = np.linalg.lstsq(a, b, rcond=None)[0].round().astype(int)  # solve for equal weight indices, ax = b

        self.m = nn.ModuleList([nn.Conv2d(in_channels=in_ch,
                                          out_channels=ch[g],
                                          kernel_size=k[g],
                                          stride=stride,
                                          padding=k[g] // 2,  # 'same' pad
                                          dilation=dilation,
                                          bias=bias) for g in range(groups)])

    def forward(self, x):
        return torch.cat([m(x) for m in self.m], 1)
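
A quick standalone check of the two modules (the shapes here are made up purely for illustration):

import torch

# pretend these are the cached per-layer outputs that Darknet builds up in "out"
outputs = [torch.randn(1, 256, 13, 13), torch.randn(1, 512, 13, 13)]
x = torch.randn(1, 512, 13, 13)  # the current layer's output

concat = FeatureConcat(layers=[0, 1])
print(list(concat(x, outputs).shape))  # [1, 768, 13, 13] -- channels 256 + 512

fuse = WeightedFeatureFusion(layers=[1])
print(list(fuse(x, outputs).shape))    # [1, 512, 13, 13] -- element-wise add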

I hope everyone will criticize and correct me; please be kind. Next, following the order of @太阳花的小豆豆's Bilibili videos, I will cover the code for data reading and preprocessing, which is the datasets.py file under the utils folder.

Origin: blog.csdn.net/JiatongForever/article/details/125973726