[6] Learning YOLOv3 from Scratch: YOLOLayer in Model Construction

Introduction: The previous article covered model building in YOLOv3, reading through the whole process from parsing the cfg file to constructing the model. The most important part of that process, YOLOLayer, was left unexplained. This article looks at YOLOLayer from the model-building perspective and walks through its code.

1. Creating the Grid

YOLOv3 is a single-stage object detector. The image is divided into a grid of cells, and each cell is assigned three prior anchor boxes for matching. First, let's read the code that creates the grid.

Start with the relevant PyTorch API: torch.meshgrid

A simple example makes it clearer:

Python 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> a = torch.arange(3)
>>> b = torch.arange(5)
>>> x,y = torch.meshgrid(a,b)
>>> a
tensor([0, 1, 2])
>>> b
tensor([0, 1, 2, 3, 4])
>>> x
tensor([[0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1],
        [2, 2, 2, 2, 2]])
>>> y
tensor([[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4]])
>>>

The inputs and outputs alone may not make this clear, so here is another example:

>>> for i in range(3):
...     for j in range(4):
...         print("(", x[i,j], "," ,y[i,j],")")
...
( tensor(0) , tensor(0) )
( tensor(0) , tensor(1) )
( tensor(0) , tensor(2) )
( tensor(0) , tensor(3) )
( tensor(1) , tensor(0) )
( tensor(1) , tensor(1) )
( tensor(1) , tensor(2) )
( tensor(1) , tensor(3) )
( tensor(2) , tensor(0) )
( tensor(2) , tensor(1) )
( tensor(2) , tensor(2) )
( tensor(2) , tensor(3) )

>>> torch.stack((x,y),2)
tensor([[[0, 0],
         [0, 1],
         [0, 2],
         [0, 3],
         [0, 4]],

        [[1, 0],
         [1, 1],
         [1, 2],
         [1, 3],
         [1, 4]],

        [[2, 0],
         [2, 1],
         [2, 2],
         [2, 3],
         [2, 4]]])
>>>

Now it is clearer: x and y together describe a 3 × 5 grid (the loop above prints only the first four columns), and iterating over x and y in step visits every cell of the grid.

Here is the code provided by yolov3 (note that it handles a single YOLOLayer, not all of them):

def create_grids(self,
                 img_size=416,
                 ng=(13, 13),
                 device='cpu',
                 type=torch.float32):
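    nx, ny = ng  # grid size
    # max() requires img_size to be a sequence such as (416, 416)
    self.img_size = max(img_size)
    # downsampling factor, e.g. 416 / 13 = 32
    self.stride = self.img_size / max(ng)

    # build the grid of cell offsets relative to the top-left corner
    yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])
    # easy to follow after the meshgrid example above
    self.grid_xy = torch.stack((xv, yv), 2).to(device).type(type).view(
        (1, 1, ny, nx, 2))

    # process the anchors: divide them by the downsampling factor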
    self.anchor_vec = self.anchors.to(device) / self.stride
    self.anchor_wh = self.anchor_vec.view(1, self.na, 1, 1,
                                          2).to(device).type(type)
    self.ng = torch.Tensor(ng).to(device)
    self.nx = nx
    self.ny = ny
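
To see what create_grids produces, here is a minimal standalone sketch of the same grid and anchor computation outside the class. The function name build_grid and the example anchor sizes are assumptions made for illustration; they are not part of the code above. The indexing="ij" argument assumes PyTorch 1.10 or newer and matches the default ordering used in the earlier example.

```python
import torch


def build_grid(img_size=416, ng=(13, 13),
               anchors=((116, 90), (156, 198), (373, 326))):
    """Hypothetical standalone version of the grid/anchor setup."""
    nx, ny = ng
    stride = img_size / max(ng)  # 416 / 13 = 32.0

    # With 'ij' ordering the first output varies along rows (y),
    # the second along columns (x).
    yv, xv = torch.meshgrid(torch.arange(ny), torch.arange(nx), indexing="ij")

    # Each cell stores its own (x, y) top-left corner in grid units.
    grid_xy = torch.stack((xv, yv), 2).float().view(1, 1, ny, nx, 2)

    # Anchors are given in pixels; dividing by the stride puts them in grid units.
    anchor_wh = torch.tensor(anchors, dtype=torch.float32) / stride
    anchor_wh = anchor_wh.view(1, len(anchors), 1, 1, 2)
    return stride, grid_xy, anchor_wh


stride, grid_xy, anchor_wh = build_grid()
print(grid_xy.shape)        # torch.Size([1, 1, 13, 13, 2])
print(grid_xy[0, 0, 4, 6])  # row y=4, column x=6 -> tensor([6., 4.])
print(anchor_wh[0, 0, 0, 0])  # tensor([3.6250, 2.8125])
```

The leading singleton dimensions let grid_xy and anchor_wh broadcast against a prediction tensor of shape (bs, anchors, ny, nx, outputs).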

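2. YOLOLayer

As mentioned in an earlier article, the convolutional layer immediately before a YOLO layer must have a specific number of filters, calculated as:
\[ filter\_num = anchor\_num \times (classes\_num + 5) \]
For example, 3 anchors and 80 classes give 3 × (80 + 5) = 255 filters.
[Figure: the convolutional layer in front of the YOLO layer]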
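
As a quick check of the filter formula, the short sketch below evaluates it for two class counts. The helper name yolo_filter_count is hypothetical and exists only for this illustration.

```python
def yolo_filter_count(anchor_num: int, classes_num: int) -> int:
    """Filters needed in the conv layer that feeds a YOLO layer.

    Each anchor predicts x, y, w, h, conf (5 values) plus one score per class.
    """
    return anchor_num * (classes_num + 5)


print(yolo_filter_count(3, 80))  # 255
print(yolo_filter_count(3, 20))  # 75
```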

Training process:

YOLOLayer processes the tensor output by the preceding convolutional layer. The code below shows what happens during training (ignore the ONNX-related parts for now):

class YOLOLayer(nn.Module):
    def __init__(self, anchors, nc, img_size, yolo_index, arc):
        super(YOLOLayer, self).__init__()

        self.anchors = torch.Tensor(anchors)
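        self.na = len(anchors)  # number of anchors this YOLOLayer assigns to each grid cell
        self.nc = nc  # number of classes
        self.no = nc + 5  # outputs per cell per anchor: classes + 5 (x, y, w, h, conf)
        self.nx = 0  # number of grid cells along x (set later)
        self.ny = 0  # number of grid cells along y (set later)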
        self.arc = arc

        if ONNX_EXPORT:  # grids must be computed in __init__
            stride = [32, 16, 8][yolo_index]  # stride of this layer
            nx = int(img_size[1] / stride)  # number x grid points
            ny = int(img_size[0] / stride)  # number y grid points
            create_grids(self, img_size, (nx, ny))

    def forward(self, p, img_size, var=None):
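        '''
        ONNX stands for Open Neural Network Exchange.
        PyTorch models can be exported or converted to the standard ONNX format.
        Once a model is in ONNX format, it can run on many platforms and devices.
        Here the ONNX branch represents a standardized inference path.
        '''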
        if ONNX_EXPORT:
            bs = 1  # batch size
        else:
            bs, _, ny, nx = p.shape  # bs, 255, 13, 13
            if (self.nx, self.ny) != (nx, ny):
                create_grids(self, img_size, (nx, ny), p.device, p.dtype)

        # p.view(bs, 255, 13, 13) -- > (bs, 3, 13, 13, 85)
        # (bs, anchors, grid, grid, classes + xywh)
        p = p.view(bs, self.na, self.no, self.ny,
                   self.nx).permute(0, 1, 3, 4, 2).contiguous()  

        if self.training:
            return p

To understand the code above, you need to know what each dimension represents. p is the raw feature map produced by the preceding convolutional layer. With 80 classes and a 416 input downsampled by a factor of 32, its shape is [batch size, anchor × (80 + 5), 13, 13]. During training, the view and permute operations reshape this tensor to [batch size, anchor, 13, 13, 85].
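
The reshaping step can be checked in isolation. The sketch below uses a random tensor in place of a real feature map (the batch size of 2 and the probe indices are arbitrary choices for illustration) and confirms which raw channel ends up where.

```python
import torch

bs, na, nc, ny, nx = 2, 3, 80, 13, 13
no = nc + 5  # 85 outputs per anchor

# Stand-in for the conv output: (bs, 255, 13, 13)
p = torch.randn(bs, na * no, ny, nx)

# Same view + permute as in YOLOLayer.forward
out = p.view(bs, na, no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()
print(out.shape)  # torch.Size([2, 3, 13, 13, 85])

# Channel a * no + k of the raw map becomes output k of anchor a.
a, k, y, x = 1, 4, 5, 7
print(torch.equal(out[0, a, y, x, k], p[0, a * no + k, y, x]))  # True
```

In other words, the 255 channels are laid out as three consecutive blocks of 85 values, one block per anchor.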

Testing process:

# p currently has shape [bs, anchor_num, grid_y, grid_x, xywh + conf + classes]
else:  # inference
   # s = 1.5  # scale_xy  (pxy = pxy * s - (s - 1) / 2)
   io = p.clone()  # io is the inference output
   io[..., :2] = torch.sigmoid(io[..., :2]) + self.grid_xy  # xy
   # grid_xy is each cell's top-left corner; io[..., :2] is the xy offset inside the cell
   io[..., 2:4] = torch.exp(
       io[..., 2:4]) * self.anchor_wh  # wh yolo method
   # io[..., 2:4] = ((torch.sigmoid(io[..., 2:4]) * 2) ** 3) * self.anchor_wh  
   # wh power method
   io[..., :4] *= self.stride

   if 'default' in self.arc:  # separate obj and cls
       torch.sigmoid_(io[..., 4])
   elif 'BCE' in self.arc:  # unified BCE (80 classes)
       torch.sigmoid_(io[..., 5:])
       io[..., 4] = 1
   elif 'CE' in self.arc:  # unified CE (1 background + 80 classes)
       io[..., 4:] = F.softmax(io[..., 4:], dim=4)
       io[..., 4] = 1

   if self.nc == 1:
       io[..., 5] = 1
       # single-class model https://github.com/ultralytics/yolov3/issues/235

   # reshape from [1, 3, 13, 13, 85] to [1, 507, 85]
   return io.view(bs, -1, self.no), p

To understand the code above, match it against the following formulas:
\[ b_x=\sigma(t_x)+c_x \]

\[ b_y=\sigma(t_y)+c_y \]

\[ b_w=p_we^{t_w} \]

\[ b_h=p_he^{t_h} \]

xy part:

\[ b_x=\sigma(t_x)+c_x \]

\[ b_y=\sigma(t_y)+c_y \]

\(c_x, c_y\) are the coordinates of the top-left corner of the grid cell; \(t_x, t_y\) are the raw predictions of the network; \(\sigma\) is the sigmoid activation function. The corresponding code is:

io[..., :2] = torch.sigmoid(io[..., :2]) + self.grid_xy  # xy
# grid_xy is each cell's top-left corner; io[..., :2] is the xy offset inside the cell
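
A worked scalar example may help here. The sketch below applies the xy formulas to a single cell and then multiplies by the stride, as the inference code does with io[..., :4] *= self.stride. The function name decode_xy and the sample values are assumptions chosen for illustration.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def decode_xy(t_x, t_y, c_x, c_y, stride):
    """Return the box centre in input-image pixels."""
    b_x = sigmoid(t_x) + c_x  # grid units; sigmoid keeps the offset in (0, 1)
    b_y = sigmoid(t_y) + c_y
    return b_x * stride, b_y * stride


# Cell at column 6, row 4 of a 13 x 13 grid, stride 32.
x_px, y_px = decode_xy(t_x=0.0, t_y=2.0, c_x=6, c_y=4, stride=32)
print(x_px)            # 208.0  (sigmoid(0) = 0.5, so the centre is mid-cell)
print(round(y_px, 2))  # 156.19
```

Because the sigmoid output lies strictly between 0 and 1, the decoded centre always stays inside the cell that predicted it.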

wh part:

\[ b_w=p_we^{t_w} \]

\[ b_h=p_he^{t_h} \]

\(p_w, p_h\) are the width and height of the prior anchor box measured on the feature map. \(t_w, t_h\) are the scaling factors learned by the network. The corresponding code is:

# wh yolo method
io[..., 2:4] = torch.exp(io[..., 2:4]) * self.anchor_wh  
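
The same kind of scalar check works for the width and height. In the sketch below, the function name decode_wh and the anchor size of 116 × 90 pixels are assumptions for illustration; the anchor is first converted to grid units, as create_grids does, and the result is scaled back to pixels by the stride.

```python
import math


def decode_wh(t_w, t_h, anchor_w_px, anchor_h_px, stride):
    """Return the box width and height in input-image pixels."""
    p_w = anchor_w_px / stride  # anchor size in grid units
    p_h = anchor_h_px / stride
    b_w = p_w * math.exp(t_w)   # exp() keeps the scale factor positive
    b_h = p_h * math.exp(t_h)
    return b_w * stride, b_h * stride


w_px, h_px = decode_wh(t_w=0.0, t_h=math.log(2.0),
                       anchor_w_px=116, anchor_h_px=90, stride=32)
print(w_px)  # 116.0, since t_w = 0 leaves the anchor width unchanged
print(h_px)  # about 180, since t_h = ln 2 doubles the anchor height
```

A prediction of zero therefore reproduces the anchor exactly, and positive or negative values grow or shrink it multiplicatively.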

Class part:

For the class scores, the code provides several methods and selects one according to the arc parameter. Taking CE (cross-entropy) as an example:

# io: (bs, anchors, grid, grid, xywh+classes)
io[..., 4:] = F.softmax(io[..., 4:], dim=4)  # apply softmax
io[..., 4] = 1 
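
To make the CE branch concrete, the sketch below runs the same two statements on a single made-up prediction with three classes. The logits are arbitrary illustration values. Setting index 4 to 1 afterwards presumably lets later steps that combine the objectness score with the class scores see only the class probability; that reading is an assumption, not something stated in the code.

```python
import torch
import torch.nn.functional as F

nc = 3  # three classes for this illustration
io = torch.zeros(1, 1, 1, 1, 5 + nc)

# Index 4 acts as the background logit, indices 5.. as class logits.
io[..., 4:] = torch.tensor([1.0, 2.0, 0.5, 3.0])

# Softmax over background + classes, along the last dimension.
io[..., 4:] = F.softmax(io[..., 4:], dim=4)
total = io[..., 4:].sum().item()               # probabilities sum to 1
best_class = io[..., 5:].argmax(dim=4).item()  # index of the largest class score

io[..., 4] = 1  # overwrite the background slot, as in the original code
print(round(total, 4), best_class)  # 1.0 2
```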

3. References

PyTorch official API documentation

Output decoding: https://zhuanlan.zhihu.com/p/76802514

Origin www.cnblogs.com/pprp/p/12228991.html