YOLOv5 Comprehensive Analysis Tutorial ①: Line-by-line code interpretation of network structure


Written by | Fengwen, BBuf

The code covered in this tutorial is at:

https://github.com/Oneflow-Inc/one-yolov5

This tutorial also applies to Ultralytics/YOLOv5: One-YOLOv5 only swaps the runtime backend, and the computation logic and code are otherwise unchanged from Ultralytics/YOLOv5. Stars are welcome. For details, see: A faster YOLOv5 comes out, with a comprehensive Chinese analysis tutorial

 1

Introduction

YOLOv5 uses the same overall network architecture for every model size (n, s, m, l, x). The sizes differ only in the depth and width of each sub-module, which are controlled by the depth_multiple and width_multiple parameters in the yaml file.

Besides the n, s, m, l, x versions, the official release also includes n6, s6, m6, l6, x6, which target higher-resolution images such as 1280x1280. The former downsample by at most 32x and use 3 prediction feature layers, while the latter downsample by up to 64x and use 4 prediction feature layers.
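To make the difference between the two families concrete, the sketch below computes the side length of each prediction feature map from the input resolution. The helper name `feature_map_sizes` is hypothetical, and the stride lists (8, 16, 32 and 8, 16, 32, 64) are an assumption consistent with the 32x and 64x downsampling described above.

```python
def feature_map_sizes(img_size, strides):
    """Return the side length of each prediction feature map."""
    if img_size % max(strides) != 0:
        raise ValueError("img_size should be divisible by the largest stride")
    return [img_size // s for s in strides]


# n/s/m/l/x: three prediction layers (assumed strides 8, 16, 32)
p5_sizes = feature_map_sizes(640, [8, 16, 32])
# n6/s6/m6/l6/x6: four prediction layers (assumed strides 8, 16, 32, 64)
p6_sizes = feature_map_sizes(1280, [8, 16, 32, 64])

print(p5_sizes)  # [80, 40, 20]
print(p6_sizes)  # [160, 80, 40, 20]
```

Under these assumptions, the smallest feature map is 20x20 in both cases, which is why the 6-series models need the extra downsampling step for 1280x1280 inputs.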

This chapter takes YOLOv5s as an example and interprets the source code from the configuration file models/yolov5s.yaml (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolov5s.yaml) to models/yolo.py (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolo.py).

 2

yolov5s.yaml file content

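nc: 80  # number of classes in the dataset
depth_multiple: 0.33  # model depth multiple (scales the number of layers)
width_multiple: 0.50  # layer channel multiple (scales the number of channels)
# depth_multiple and width_multiple set the depth (layer count) and width (channel count) of the whole model; how they are applied is explained with the backbone code below.


anchors: # anchor sizes used on each feature map
# 9 anchors in total; P denotes the feature level, e.g. P3/8 is the level-3 feature map, downsampled to 1/8 of the input
  - [10,13, 16,30, 33,23]  # P3/8, i.e. the 3 anchors [10,13], [16,30], [33,23]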
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32




# YOLOv5s v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]


# YOLOv5s v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13


   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)


   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)


   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)


   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

3

Anchor interpretation

YOLOv5 initializes 9 anchors, which are used on three feature maps; each grid cell of each feature map has three anchors for prediction. The assignment rules are:

  • Larger feature maps come from earlier in the network: their downsampling rate relative to the original image is smaller and their receptive field is smaller, so they are better suited to predicting small objects and are assigned smaller anchors.

  • Smaller feature maps come from deeper in the network: their downsampling rate relative to the original image is larger and their receptive field is larger, so they are suited to predicting large objects and are assigned larger anchors.

  • That is, large targets are detected on small feature maps, medium targets are detected on medium-sized feature maps, and small targets are detected on large feature maps.
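The assignment rule can be checked against the numbers in yolov5s.yaml. The following is a small sketch, not code from the repository: the helper `anchor_pairs` is hypothetical, and the 640x640 input size is an assumption used only to show the grid size at each stride.

```python
ANCHORS = [
    [10, 13, 16, 30, 33, 23],       # P3/8
    [30, 61, 62, 45, 59, 119],      # P4/16
    [116, 90, 156, 198, 373, 326],  # P5/32
]
STRIDES = [8, 16, 32]


def anchor_pairs(flat):
    """Split a flat [w0, h0, w1, h1, ...] list into (w, h) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


mean_areas = []
for stride, flat in zip(STRIDES, ANCHORS):
    pairs = anchor_pairs(flat)
    grid = 640 // stride  # feature-map side length for an assumed 640x640 input
    mean_area = sum(w * h for w, h in pairs) / len(pairs)
    mean_areas.append(mean_area)
    print(f"stride {stride:>2}: {grid}x{grid} grid, anchors {pairs}, mean area {mean_area:.0f}")
```

The mean anchor area grows as the grid gets coarser, which matches the rule that small feature maps handle large targets.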

4

Backbone & head Interpretation

[from, number, module, args] arguments

The meanings of the four parameters are:

  1. The first parameter, from: which layer the input comes from. -1 means the previous layer; [-1, 6] means the previous layer and layer 6.

  2. The second parameter, number: how many identical modules there are; 9 means 9 identical modules.

  3. The third parameter, module: the name of the module. These modules are defined in common.py.

  4. The fourth parameter, args: the initialization arguments of the class, which are parsed and passed to the module.
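As an illustration of the four fields, the sketch below wraps one row of the yaml in a named tuple and resolves relative from values into absolute layer indices. `LayerSpec` and `resolve_sources` are hypothetical names introduced here; they do not exist in the repository.

```python
from typing import List, NamedTuple, Union


class LayerSpec(NamedTuple):
    src: Union[int, List[int]]  # the "from" field ("from" is a Python keyword)
    number: int                 # how many times the module is repeated
    module: str                 # module name, e.g. "Conv" or "C3"
    args: list                  # constructor arguments before scaling


def resolve_sources(index, src):
    """Turn relative 'from' values (e.g. -1) into absolute layer indices."""
    sources = [src] if isinstance(src, int) else src
    return [s if s >= 0 else index + s for s in sources]


row = [[-1, 6], 1, "Concat", [1]]  # layer 12 in yolov5s.yaml
spec = LayerSpec(*row)
print(spec.module, resolve_sources(12, spec.src))  # Concat [11, 6]
```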

Let's take the first module Conv as an example to introduce the modules in common.py

The Conv module is defined as follows:

class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        """
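        @param c1: number of input channels
        @param c2: number of output channels
        @param k : kernel size (kernel_size)
        @param s : stride
        @param p : padding applied to the feature map
        @param g : number of groups; must divide the input channel count evenly so the channels can be grouped correctly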
        """
        super().__init__()
        # https://oneflow.readthedocs.io/en/master/generated/oneflow.nn.Conv2d.html?highlight=Conv
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())


    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


    def forward_fuse(self, x):
        return self.act(self.conv(x))
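The Conv module relies on autopad(k, p) to choose the padding. The sketch below shows one plausible version of it together with the standard convolution output-size formula. This `autopad` is a simplified re-implementation that ignores dilation, and `conv_out_size` is a hypothetical helper; treat both as assumptions rather than the exact code in common.py.

```python
def autopad(k, p=None):
    """Return 'same'-style padding (k // 2) when p is not given."""
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
    return p


def conv_out_size(size, k, s, p=None):
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    p = autopad(k, p)
    return (size + 2 * p - k) // s + 1


print(conv_out_size(640, k=6, s=2, p=2))  # 320, layer 0: Conv [64, 6, 2, 2]
print(conv_out_size(320, k=3, s=2))       # 160, layer 1: Conv [128, 3, 2]
print(conv_out_size(20, k=1, s=1))        # 20,  a 1x1 Conv keeps the size
```

With this padding, every stride-2 Conv halves the feature map, which is where the P1/2, P2/4, ... P5/32 labels in the yaml come from.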

For example, with width_multiple set to 0.5 as above, the first args list [64, 6, 2, 2] is parsed into [3, 64*0.5=32, 6, 2, 2], where 3 is the number of input channels (the input is a 3-channel image) and 32 is the number of output channels.
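The width scaling can be reproduced in a few lines. parse_model calls make_divisible(c2 * gw, 8); the implementation below, which rounds up to the next multiple of the divisor, is an assumption about that helper, and `scaled_channels` is a hypothetical wrapper added for the example.

```python
import math


def make_divisible(x, divisor):
    """Round x up to the nearest multiple of divisor (assumed behaviour)."""
    return math.ceil(x / divisor) * divisor


def scaled_channels(c2, gw, divisor=8):
    """Output channels after applying the width multiple gw."""
    return make_divisible(c2 * gw, divisor)


for c2 in (64, 128, 256, 512, 1024):
    print(c2, "->", scaled_channels(c2, gw=0.50))
```

For YOLOv5s every channel count in the yaml is simply halved, because the halved values are already multiples of 8; the rounding only matters for multiples that do not divide evenly.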

Detailed instructions for resizing the network

Line 256 of yolo.py (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolo.py) reads parameters such as nc and depth_multiple from the yaml file. The code is as follows:

anchors, nc, gd, gw = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple']

The function of the "width_multiple" parameter has been introduced in the args parameter, so what is the function of "depth_multiple"?

Line 257 of yolo.py (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolo.py) shows how it is used:

n = n_ = max(round(n * gd), 1) if n > 1 else n  # depth gain; treat this line as formula (1) for now

Here gd is the value of depth_multiple, and n is the second element of each list in the backbone.

From formula (1), gd scales n and therefore changes the size of the network.
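Formula (1) can be applied directly to the number column of the backbone. In the sketch below, `depth_gain` is a hypothetical wrapper around the formula; gd = 0.33 comes from yolov5s.yaml, while gd = 1.0 for YOLOv5l is an assumption not shown in this tutorial.

```python
def depth_gain(n, gd):
    """Formula (1): scale the repeat count n by the depth multiple gd."""
    return max(round(n * gd), 1) if n > 1 else n


# Second column (number) of the ten backbone rows in yolov5s.yaml
backbone_numbers = [1, 1, 3, 1, 6, 1, 9, 1, 3, 1]

yolov5s = [depth_gain(n, 0.33) for n in backbone_numbers]
yolov5l = [depth_gain(n, 1.00) for n in backbone_numbers]  # gd = 1.0 is an assumption

print(yolov5s)  # [1, 1, 1, 1, 2, 1, 3, 1, 1, 1]
print(yolov5l)  # [1, 1, 3, 1, 6, 1, 9, 1, 3, 1]
```

Only rows with number greater than 1 (the C3 modules here) are affected; single modules such as Conv and SPPF always stay at 1.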

In YOLOv5l, the number of modules and the number of convolution kernels in each layer are also larger, so YOLOv5l has several times as many trainable parameters as YOLOv5s.

Because its depth and width are much larger, YOLOv5l is considerably more accurate than YOLOv5s at inference, but its inference speed is slower.

Therefore, YOLOv5 provides different options. If inference speed matters most, choose a smaller model such as YOLOv5s or YOLOv5m. If accuracy matters more than inference speed, choose one of the two larger models.

Figure: YOLOv5 model complexity comparison chart

5

Conv module interpretation

Network structure preview

The following is a simplified diagram of the overall network structure, drawn according to yolov5s.yaml (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolov5s.yaml).

Figure: Overall structure diagram of the yolov5s network

  1. Detailed network structure diagram:

    https://oneflow-static.oss-cn-beijing.aliyuncs.com/one-yolo/imgs/yolov5s.onnx.png

    This image was produced by exporting the model to ONNX format with export.py and rendering it on https://netron.app/ (model export will be covered separately in a later article of this tutorial).

  2. The values on the right side of each module are feature-map shapes. For example, the input image to the first layer (Conv) has shape [3, 640, 640]. To obtain these values, feed a fixed-size image through the model built from yolov5s.yaml (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolov5s.yaml) and print the shapes with code in the project's models/yolo.py (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolo.py). For the detailed data, see Table 2.1 in the appendix.
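The backbone shapes can also be estimated by arithmetic alone, without running the network. The sketch below is a simplification under stated assumptions: every stride-2 Conv exactly halves the feature map, C3 and SPPF keep the spatial size, and channels are rounded up to a multiple of 8. `BACKBONE` and `trace_shapes` are hypothetical names introduced for the example.

```python
import math

GW = 0.50  # width_multiple from yolov5s.yaml

# (module, unscaled output channels, stride) for backbone layers 0-9
BACKBONE = [
    ("Conv", 64, 2), ("Conv", 128, 2), ("C3", 128, 1), ("Conv", 256, 2),
    ("C3", 256, 1), ("Conv", 512, 2), ("C3", 512, 1), ("Conv", 1024, 2),
    ("C3", 1024, 1), ("SPPF", 1024, 1),
]


def trace_shapes(layers, in_shape=(3, 640, 640), gw=GW):
    """Return the [C, H, W] output shape of every layer in order."""
    _, h, w = in_shape
    shapes = []
    for _, c2, stride in layers:
        c = math.ceil(c2 * gw / 8) * 8   # scale channels, round up to a multiple of 8
        h, w = h // stride, w // stride  # stride-2 convs halve the feature map
        shapes.append([c, h, w])
    return shapes


shapes = trace_shapes(BACKBONE)
for i, (layer, shape) in enumerate(zip(BACKBONE, shapes)):
    print(i, layer[0], shape)
```

The printed shapes should agree with the output column for layers 0 to 9 in Table 2.1.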

6

Interpretation of yolo.py module

File address (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolo.py)

The file mainly contains three parts: the Detect class, the Model class, and the parse_model function.

You can run the script with python models/yolo.py --cfg yolov5s.yaml to observe the output.

7

Interpretation of parse_model function

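def parse_model(d, ch):  # model_dict, input_channels(3)
    """Used by the Model class below.
    Parses the model definition (a dict) and builds the network.
    In short, for each layer it: updates the layer's args and computes c2 (the layer's output channels) =>
                          builds the layer from those args =>
                          appends to layers + save
    @param d: model_dict, the model definition as a dict {dict:7}: the 6 entries of yolov5s.yaml (https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolov5s.yaml) + ch
    @param ch: output channels of every layer so far; initially ch=[3], which is discarded later
    @return nn.Sequential(*layers): the layers of the network
    @return sorted(save): sorted indices of all layers referenced by a from value other than -1, [4, 6, 10, 14, 17, 20, 23]
    """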
    LOGGER.info(f"\n{'':>3}{'from':>18}{'n':>3}{'params':>10}  {'module':<40}{'arguments':<30}")
    # Read anchors and the parameters (nc, depth_multiple, width_multiple) from dict d
    anchors, nc, gd, gw = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple']
    # na: number of anchors on each prediction head = 3
    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors  # number of anchors
    no = na * (nc + 5)  # number of outputs = anchors * (classes + 5), i.e. the output channels of each prediction head
    # Start building the network
    # layers: the module built for each layer
    # save: indices of layers referenced by a from value other than -1
    # c2: output channels of the current layer
    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out
    # enumerate() yields (index, item) pairs over an iterable, so i is the layer index in the for loop below.
    for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):  # from, number, module, args
        m = eval(m) if isinstance(m, str) else m  # eval strings
        for j, a in enumerate(args):
            # args is a list; evaluate each string element in place
            with contextlib.suppress(NameError):
                args[j] = eval(a) if isinstance(a, str) else a  # eval strings
        
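        # Multiply the repeat count by the depth multiple to get the layer depth; the minimum is 1.
        n = n_ = max(round(n * gd), 1) if n > 1 else n  # depth gain
        
        # If module m is one of the module types below, it is handled here
        if m in (Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d, Focus, CrossConv,
                 BottleneckCSP, C3, C3TR, C3SPP, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x):
            # c1: input channels, c2: output channels
            c1, c2 = ch[f], args[0] 
            # If this is not the output layer, multiply the channels by the width multiple; the width multiple applies to every layer except the output
            if c2 != no:  # if not output
                # make_divisible rounds the scaled channel count to a multiple of 8, which generally helps parallelism and inference performance.
                c2 = make_divisible(c2 * gw, 8)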


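            # Store the results in args; these are the final constructor arguments of the module.
            args = [c1, c2, *args[1:]] 
            # Adjust args per module type; see each class's __init__ for the exact parameters
            if m in [BottleneckCSP, C3, C3TR, C3Ghost, C3x]:
                # Insert the repeat count n (the depth computed above) so the module repeats its inner block n times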
                args.insert(2, n)  # number of repeats
                n = 1
        elif m is nn.BatchNorm2d:
            args = [ch[f]]
        elif m is Concat:
            c2 = sum(ch[x] for x in f)
        elif m is Detect:
            args.append([ch[x] for x in f])
            if isinstance(args[1], int):  # number of anchors
                args[1] = [list(range(args[1] * 2))] * len(f)
        elif m is Contract:
            c2 = ch[f] * args[0] ** 2
        elif m is Expand:
            c2 = ch[f] // args[0] ** 2
        else:
            c2 = ch[f]
        # Build the layer: n copies of module m with its args, wrapped in nn.Sequential when n > 1
        m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
        # Get the module type name, e.g. models.common.Conv, models.common.C3, models.common.SPPF.
        t = str(m)[8:-2].replace('__main__.', '')  # replace() strips the '__main__.' prefix; it has no effect in this project.
        np = sum(x.numel() for x in m_.parameters())  # number params
        m_.i, m_.f, m_.type, m_.np = i, f, t, np  # attach index, 'from' index, type, number params
        LOGGER.info(f'{i:>3}{str(f):>18}{n_:>3}{np:10.0f}  {t:<40}{str(args):<30}')  # print
        """
        If x is not -1, store it in save: the feature map of that layer must be kept.
        Here x % i equals x, e.g. at the last layer:
        f = [17,20,23] , i = 24 
        y = [ x % i for x in ([f] if isinstance(f, int) else f) if x != -1 ]
        print(y) # [17, 20, 23] 
        # x % i is probably used because -1 % i == i - 1 (e.g. with i = 12 and f = [-1], [x % i for x in f] gives [11])
        """
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
        layers.append(m_)
        if i == 0: # On the first iteration, reset ch (the ch argument is needed to build the first layer, so ch is reinitialized only after that layer is built)
            ch = []
        ch.append(c2)
    # Wrap all layers in nn.Sequential and sort the saved layer indices
    return nn.Sequential(*layers), sorted(save)
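The save list in the docstring can be reproduced from the from column alone. The sketch below reuses the save.extend expression from parse_model on the yolov5s layout; `SPECIAL_FROM` and `build_save_list` are hypothetical names, and every layer not listed is assumed to take its input from the previous layer (-1).

```python
NUM_LAYERS = 25  # layers 0-24 of yolov5s (backbone + head)

# Layers whose "from" is not simply -1, taken from yolov5s.yaml
SPECIAL_FROM = {
    12: [-1, 6],
    16: [-1, 4],
    19: [-1, 14],
    22: [-1, 10],
    24: [17, 20, 23],
}


def build_save_list(num_layers, special_from):
    """Collect every layer index that a later layer reads from explicitly."""
    save = []
    for i in range(num_layers):
        f = special_from.get(i, -1)
        # Same expression as in parse_model: skip -1, map the rest with x % i
        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)
    return sorted(save)


print(build_save_list(NUM_LAYERS, SPECIAL_FROM))  # [4, 6, 10, 14, 17, 20, 23]
print(-1 % 12)  # 11: a negative index wraps around to i - 1
```

Only the outputs at these indices are cached during the forward pass; all other intermediate feature maps can be discarded once the next layer has consumed them.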

8

Model class interpretation

class Model(nn.Module):
    # YOLOv5 model
    def __init__(self, cfg='yolov5s.yaml', ch=3, nc=None, anchors=None):  # model, input channels, number of classes
        super().__init__()
        # If cfg is already a dict, use it directly; otherwise load the file at cfg into a dict and store it in self.yaml.
        if isinstance(cfg, dict): 
            self.yaml = cfg  # model dict
        else:  # is *.yaml: load it with the yaml module
            import yaml  # for flow hub 
            self.yaml_file = Path(cfg).name
            with open(cfg, encoding='ascii', errors='ignore') as f:
                self.yaml = yaml.safe_load(f)  # model dict loaded from the yaml file


        # Define model
        # ch: input channels. If self.yaml has the key 'ch', use its value; otherwise use the ch argument
        ch = self.yaml['ch'] = self.yaml.get('ch', ch)  # input channels
        # If the nc argument differs from nc in the yaml, override the yaml value.
        if nc and nc != self.yaml['nc']:
            LOGGER.info(f"Overriding model.yaml nc={self.yaml['nc']} with nc={nc}")
            self.yaml['nc'] = nc  # override yaml value
        if anchors: # anchors: prior box configuration
            LOGGER.info(f'Overriding model.yaml anchors with anchors={anchors}')
            self.yaml['anchors'] = round(anchors)  # override yaml value
        # Build the model and the list of layer indices whose feature maps are saved.
        self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch])  # model, savelist
        self.names = [str(i) for i in range(self.yaml['nc'])]  # default class names, ['0', '1', '2', ...]
        
        # self.inplace defaults to True to save memory
        self.inplace = self.yaml.get('inplace', True)


        # Build strides, anchors: determine the strides and the anchors that go with each stride
        m = self.model[-1]  # Detect()
        if isinstance(m, Detect): # check that the last layer is a Detect module
            s = 256  # 2x min stride
            m.inplace = self.inplace
            # Compute the downsampling factor of the three feature maps: [8, 16, 32]
            m.stride = flow.tensor([s / x.shape[-2] for x in self.forward(flow.zeros(1, ch, s, s))])  # forward
            # Check that the anchor order matches the stride order (anchors should go from small to large) and reorder if needed
            check_anchor_order(m)  # must be in pixel-space (not grid-space)
            # Rescale the anchors to feature-map units: the loaded anchors are in input-image pixels, but each feature map is smaller than the input by its stride.
            m.anchors /= m.stride.view(-1, 1, 1)
            self.stride = m.stride
            self._initialize_biases() # only run once: initialize the biases


        # Init weights, biases
        # Call initialize_weights from oneflow_utils.py to initialize the model weights
        initialize_weights(self)
        self.info() # print model information
        LOGGER.info('')
    # Dispatch the forward pass
    def forward(self, x, augment=False, profile=False, visualize=False):
        if augment:# whether to use Test Time Augmentation (TTA)
            return self._forward_augment(x)  # augmented inference, None
        return self._forward_once(x, profile, visualize)  # single-scale inference, train
    # Forward pass with test-time augmentation
    def _forward_augment(self, x):
        img_size = x.shape[-2:]  # height, width
        s = [1, 0.83, 0.67]  # scales
        f = [None, 3, None]  # flips (2-ud, 3-lr)
        y = []  # outputs
        for si, fi in zip(s, f):
            xi = scale_img(x.flip(fi) if fi else x, si, gs=int(self.stride.max()))
            yi = self._forward_once(xi)[0]  # forward
            # cv2.imwrite(f'img_{si}.jpg', 255 * xi[0].cpu().numpy().transpose((1, 2, 0))[:, :, ::-1])  # save
            yi = self._descale_pred(yi, fi, si, img_size)
            y.append(yi)
        y = self._clip_augmented(y)  # clip augmented tails
        return flow.cat(y, 1), None  # augmented inference, train
    # The actual forward pass
    def _forward_once(self, x, profile=False, visualize=False):
        """
        @params x: input image
        @params profile: if True, profile each layer
        @params visualize: if set, save feature visualizations
        """
        # y: outputs of the layers listed in self.save, which the later feature-fusion layers need
        y, dt = [], []  # outputs
        # Run each layer in order   m.i=index   m.f=from   m.type=class name   m.np=number of params
        for m in self.model:
            # m.f is the layer (or layers) whose output feeds the current layer; for most layers m.f is -1
            if m.f != -1:  # if not from previous layer
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
            if profile:
                self._profile_one_layer(m, x, dt)
            x = m(x)  # run
            y.append(x if m.i in self.save else None)  # save output
            if visualize:
                feature_visualization(x, m.type, m.i, save_dir=visualize)
        return x
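    # Map predictions back to the original image size (inverse operation)
    def _descale_pred(self, p, flips, scale, img_size):
        # de-scale predictions following augmented inference (inverse operation)
        """Used by _forward_augment above.
        Maps predictions back to the original image size; used in Test Time Augmentation (TTA).
        @params p: predictions
        @params flips: flip applied to the input (2 = up-down, 3 = left-right, None = no flip)
        @params scale: scale factor applied to the input
        @params img_size: (height, width) of the original input
        """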
        if self.inplace:
            p[..., :4] /= scale  # de-scale
            if flips == 2:
                p[..., 1] = img_size[0] - p[..., 1]  # de-flip ud
            elif flips == 3:
                p[..., 0] = img_size[1] - p[..., 0]  # de-flip lr
        else:
            x, y, wh = p[..., 0:1] / scale, p[..., 1:2] / scale, p[..., 2:4] / scale  # de-scale
            if flips == 2:
                y = img_size[0] - y  # de-flip ud
            elif flips == 3:
                x = img_size[1] - x  # de-flip lr
            p = flow.cat((x, y, wh, p[..., 4:]), -1)
        return p
    # Clips the tails of the augmented predictions; used only during TTA inference.
    def _clip_augmented(self, y):
        # Clip YOLOv5 augmented inference tails
        nl = self.model[-1].nl  # number of detection layers (P3-P5)
        g = sum(4 ** x for x in range(nl))  # grid points
        e = 1  # exclude layer count
        i = (y[0].shape[1] // g) * sum(4 ** x for x in range(e))  # indices
        y[0] = y[0][:, :-i]  # large
        i = (y[-1].shape[1] // g) * sum(4 ** (nl - 1 - x) for x in range(e))  # indices
        y[-1] = y[-1][:, i:]  # small
        return y
    # Profile one layer: log its forward time, GFLOPs and parameter count
    def _profile_one_layer(self, m, x, dt):
        c = isinstance(m, Detect)  # is final layer, copy input as inplace fix
        o = thop.profile(m, inputs=(x.copy() if c else x,), verbose=False)[0] / 1E9 * 2 if thop else 0  # FLOPs
        t = time_sync()
        for _ in range(10):
            m(x.copy() if c else x)
        dt.append((time_sync() - t) * 100)
        if m == self.model[0]:
            LOGGER.info(f"{'time (ms)':>10s} {'GFLOPs':>10s} {'params':>10s}  module")
        LOGGER.info(f'{dt[-1]:10.2f} {o:10.2f} {m.np:10.0f}  {m.type}')
        if c:
            LOGGER.info(f"{sum(dt):10.2f} {'-':>10s} {'-':>10s}  Total")
    # initialize biases into Detect(), cf is class frequency
    def _initialize_biases(self, cf=None): 
        # https://arxiv.org/abs/1708.02002 section 3.3
        # cf = flow.bincount(flow.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.
        m = self.model[-1]  # Detect() module
        for mi, s in zip(m.m, m.stride):  # from
            b = mi.bias.view(m.na, -1).detach()  # conv.bias(255) to (3,85)
            b[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)
            b[:, 5:] += math.log(0.6 / (m.nc - 0.999999)) if cf is None else flow.log(cf / cf.sum())  # cls
            mi.bias = flow.nn.Parameter(b.view(-1), requires_grad=True)
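    # Print the biases of the final Detect layer (other layers could be chosen as well)
    def _print_biases(self):
        """
        Print the biases of the convolution layers inside the final Detect module (other layers could be chosen as well).
        """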
        m = self.model[-1]  # Detect() module
        for mi in m.m:  # from
            b = mi.bias.detach().view(m.na, -1).T  # conv.bias(255) to (3,85)
            LOGGER.info(
                ('%6g Conv2d.bias:' + '%10.3g' * 6) % (mi.weight.shape[1], *b[:5].mean(1).tolist(), b[5:].mean()))


    def _print_weights(self):
        """
        Print the weights of the Bottleneck layers (other layers could be chosen as well).
        """
        for m in self.model.modules():
            if type(m) is Bottleneck:
                LOGGER.info('%10.3g' % (m.w.detach().sigmoid() * 2))  # shortcut weights
    
    # fuse() merges conv and bn layers to speed up inference.
    def fuse(self):  # fuse model Conv2d() + BatchNorm2d() layers
        """Used in detect.py and val.py.
        fuse model Conv2d() + BatchNorm2d() layers
        Calls fuse_conv_and_bn from oneflow_utils.py and forward_fuse of the Conv module in common.py.
        """
        LOGGER.info('Fusing layers... ')
        for m in self.model.modules():
            # If the layer is a Conv (or DWConv) that still has a bn, fuse conv and bn with fuse_conv_and_bn to speed up inference
            if isinstance(m, (Conv, DWConv)) and hasattr(m, 'bn'):
                m.conv = fuse_conv_and_bn(m.conv, m.bn)  # update conv
                delattr(m, 'bn')  # remove batchnorm
                m.forward = m.forward_fuse  # update forward (backward is irrelevant, since the fused model is used only for inference)
        self.info()  # print model information after conv+bn fusion
        return self
    # Print model structure information; also called at the end of __init__
    def info(self, verbose=False, img_size=640):  # print model information
        model_info(self, verbose, img_size)


    def _apply(self, fn):
        # Apply to(), cpu(), cuda(), half() to model tensors that are not parameters or registered buffers
        self = super()._apply(fn)
        m = self.model[-1]  # Detect()
        if isinstance(m, Detect):
            m.stride = fn(m.stride)
            m.grid = list(map(fn, m.grid))
            if isinstance(m.anchor_grid, list):
                m.anchor_grid = list(map(fn, m.anchor_grid))
        return self
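The routing logic in _forward_once is easier to see on a toy example. The sketch below keeps the same from handling and the same "save or None" caching, but replaces the network layers with plain Python callables; `forward_once` and `toy_layers` are hypothetical stand-ins, not code from the repository.

```python
def forward_once(layers, save, x):
    """layers: list of (from, callable); save: indices whose outputs are kept."""
    y = []
    for i, (f, layer) in enumerate(layers):
        if f != -1:  # input does not come only from the previous layer
            x = y[f] if isinstance(f, int) else [x if j == -1 else y[j] for j in f]
        x = layer(x)
        y.append(x if i in save else None)  # keep only what later layers need
    return x, y


toy_layers = [
    (-1, lambda v: v + 1),  # layer 0
    (-1, lambda v: v * 2),  # layer 1
    ([-1, 0], sum),         # layer 2: Concat-like merge of layer 1 and layer 0
]

out, cache = forward_once(toy_layers, save=[0], x=1)
print(out, cache)  # 6 [2, None, None]
```

Layer 2 receives the list [4, 2]: the running value from layer 1 (the -1 entry) and the cached output of layer 0. This is the same mechanism the Concat and Detect layers use to pull in earlier feature maps.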

9

Interpretation of Detect class

class Detect(nn.Module):
    """
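    The Detect module builds the detection layer: a convolution plus the decoding formulas turn each input feature map into the desired shape, ready for the loss computation or NMS post-processing.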
    """
    stride = None  # strides computed during build
    onnx_dynamic = False  # ONNX export parameter
    export = False  # export mode


    def __init__(self, nc=80, anchors=(), ch=(), inplace=True):  # detection layer
        super().__init__()
        # nc: number of classes
        self.nc = nc  # number of classes  
        # no: number of outputs per anchor
        self.no = nc + 5  # number of outputs per anchor
        # nl: number of detection layers, 3 here
        self.nl = len(anchors)  # number of detection layers
        # na: number of anchors per detection layer, 3 here
        self.na = len(anchors[0]) // 2  # number of anchors
        # grid: cell coordinates of each detection layer, covering (input.w/stride) x (input.h/stride) cells
        self.grid = [flow.zeros(1)] * self.nl  # init grid
        self.anchor_grid = [flow.zeros(1)] * self.nl  # init anchor grid
        # Register the anchors as a buffer named 'anchors'
        self.register_buffer('anchors', flow.tensor(anchors).float().view(self.nl, -1, 2))  # shape(nl,na,2)
        # A 1x1 convolution maps each input to self.no * self.na channels, acting like a fully connected layer
        self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch)  # output conv
        self.inplace = inplace  # use inplace ops (e.g. slice assignment)


    def forward(self, x):
        z = []  # inference output
        for i in range(self.nl):
            x[i] = self.m[i](x[i])  # conv
            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)
            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()


            if not self.training:  # inference
                if self.onnx_dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:
                    # At inference, the relative offsets must be converted to absolute grid coordinates
                    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)
                y = x[i].sigmoid()
                if self.inplace:
                    y[..., 0:2] = (y[..., 0:2] * 2 + self.grid[i]) * self.stride[i]  # xy
                    y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh
                else:  # for YOLOv5 on AWS Inferentia https://github.com/ultralytics/yolov5/pull/2953
                    xy, wh, conf = y.split((2, 2, self.nc + 1), 4)  # y.tensor_split((2, 4, 5), 4)  
                    xy = (xy * 2 + self.grid[i]) * self.stride[i]  # xy
                    wh = (wh * 2) ** 2 * self.anchor_grid[i]  # wh
                    y = flow.cat((xy, wh, conf), 4)
                z.append(y.view(bs, -1, self.no))


        return x if self.training else (flow.cat(z, 1),) if self.export else (flow.cat(z, 1), x)
    
    # Build the grid used to convert relative offsets to absolute grid coordinates
    def _make_grid(self, nx=20, ny=20, i=0):
        d = self.anchors[i].device
        t = self.anchors[i].dtype
        shape = 1, self.na, ny, nx, 2  # grid shape
        y, x = flow.arange(ny, device=d, dtype=t), flow.arange(nx, device=d, dtype=t)
       
        yv, xv = flow.meshgrid(y, x, indexing="ij")
        grid = flow.stack((xv, yv), 2).expand(shape) - 0.5  # add grid offset, i.e. y = 2.0 * x - 0.5
        anchor_grid = (self.anchors[i] * self.stride[i]).view((1, self.na, 1, 1, 2)).expand(shape)
        return grid, anchor_grid
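To see what the decoding formulas in forward do, the sketch below applies them to a single anchor in a single grid cell using plain Python floats. `decode_box` is a hypothetical helper; it assumes the anchor is given in input-image pixels, which matches anchor_grid = anchors * stride in _make_grid.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(raw, cell_xy, anchor_wh, stride):
    """Decode raw (tx, ty, tw, th) for one anchor in one grid cell into pixel units."""
    sx, sy, sw, sh = (sigmoid(t) for t in raw)
    gx, gy = cell_xy[0] - 0.5, cell_xy[1] - 0.5  # same -0.5 offset as _make_grid
    x = (sx * 2 + gx) * stride          # centre x in pixels
    y = (sy * 2 + gy) * stride          # centre y in pixels
    w = (sw * 2) ** 2 * anchor_wh[0]    # width, between 0 and 4x the anchor width
    h = (sh * 2) ** 2 * anchor_wh[1]    # height, between 0 and 4x the anchor height
    return x, y, w, h


nc, na = 80, 3
no = nc + 5
print(no, no * na)  # 85 255: outputs per anchor, channels of each output conv

print(decode_box((0, 0, 0, 0), cell_xy=(10, 7), anchor_wh=(116, 90), stride=32))
# (336.0, 240.0, 116.0, 90.0)
```

A raw prediction of all zeros therefore decodes to a box centred on its grid cell with exactly the anchor's size; the network only has to learn offsets from that default.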

10

Appendix

Table 2.1 yolov5s.yaml parsing table

(https://github.com/Oneflow-Inc/one-yolov5/blob/main/models/yolov5s.yaml)

layer | from | module | arguments | input | output
0 | -1 | Conv | [3, 32, 6, 2, 2] | [3, 640, 640] | [32, 320, 320]
1 | -1 | Conv | [32, 64, 3, 2] | [32, 320, 320] | [64, 160, 160]
2 | -1 | C3 | [64, 64, 1] | [64, 160, 160] | [64, 160, 160]
3 | -1 | Conv | [64, 128, 3, 2] | [64, 160, 160] | [128, 80, 80]
4 | -1 | C3 | [128, 128, 2] | [128, 80, 80] | [128, 80, 80]
5 | -1 | Conv | [128, 256, 3, 2] | [128, 80, 80] | [256, 40, 40]
6 | -1 | C3 | [256, 256, 3] | [256, 40, 40] | [256, 40, 40]
7 | -1 | Conv | [256, 512, 3, 2] | [256, 40, 40] | [512, 20, 20]
8 | -1 | C3 | [512, 512, 1] | [512, 20, 20] | [512, 20, 20]
9 | -1 | SPPF | [512, 512, 5] | [512, 20, 20] | [512, 20, 20]
10 | -1 | Conv | [512, 256, 1, 1] | [512, 20, 20] | [256, 20, 20]
11 | -1 | Upsample | [None, 2, 'nearest'] | [256, 20, 20] | [256, 40, 40]
12 | [-1, 6] | Concat | [1] | [1, 256, 40, 40], [1, 256, 40, 40] | [512, 40, 40]
13 | -1 | C3 | [512, 256, 1, False] | [512, 40, 40] | [256, 40, 40]
14 | -1 | Conv | [256, 128, 1, 1] | [256, 40, 40] | [128, 40, 40]
15 | -1 | Upsample | [None, 2, 'nearest'] | [128, 40, 40] | [128, 80, 80]
16 | [-1, 4] | Concat | [1] | [1, 128, 80, 80], [1, 128, 80, 80] | [256, 80, 80]
17 | -1 | C3 | [256, 128, 1, False] | [256, 80, 80] | [128, 80, 80]
18 | -1 | Conv | [128, 128, 3, 2] | [128, 80, 80] | [128, 40, 40]
19 | [-1, 14] | Concat | [1] | [1, 128, 40, 40], [1, 128, 40, 40] | [256, 40, 40]
20 | -1 | C3 | [256, 256, 1, False] | [256, 40, 40] | [256, 40, 40]
21 | -1 | Conv | [256, 256, 3, 2] | [256, 40, 40] | [256, 20, 20]
22 | [-1, 10] | Concat | [1] | [1, 256, 20, 20], [1, 256, 20, 20] | [512, 20, 20]
23 | -1 | C3 | [512, 512, 1, False] | [512, 20, 20] | [512, 20, 20]
24 | [17, 20, 23] | Detect | [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]] | [1, 128, 80, 80], [1, 256, 40, 40], [1, 512, 20, 20] | [1, 3, 80, 80, 85], [1, 3, 40, 40, 85], [1, 3, 20, 20, 85]




Origin blog.csdn.net/OneFlow_Official/article/details/128310247