Reference: https://blog.paperspace.com/how-to-implement-a-yolo-v3-object-detector-from-scratch-in-pytorch-part-3/

How to implement a YOLO (v3) object detector from scratch in PyTorch: Part 3

Image credit: Karol Majek. Check out his YOLO v3 real-time detection video here.

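This is Part 3 of the tutorial on implementing a YOLO v3 detector from scratch. In the last part, we implemented the layers used in YOLO's architecture. In this part, we are going to implement the network architecture of YOLO in PyTorch, so that we can produce an output for a given image.
Our objective is to design the forward pass of the network.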

The code for this tutorial is designed to run on Python 3.5 and PyTorch 0.4. It can be found in its entirety at this GitHub repo.

This tutorial is broken into 5 parts:

Prerequisites


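Part 1 and Part 2 of the tutorial.
Basic working knowledge of PyTorch, including how to create custom architectures with the nn.Module, nn.Sequential and torch.nn.parameter classes.
Working with images in PyTorch.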

Defining the Network

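As I pointed out earlier, we use the nn.Module class to build custom architectures in PyTorch. Let us define a network for our detector. In the darknet.py file, we add the following class.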

class Darknet(nn.Module):
    def __init__(self, cfgfile):
        super(Darknet, self).__init__()
        self.blocks = parse_cfg(cfgfile)
        self.net_info, self.module_list = create_modules(self.blocks)

Here, we have subclassed the nn.Module class and named our class Darknet. We initialize the network with the members blocks, net_info and module_list.

Implementing the forward pass of the network

The forward pass of the network is implemented by overriding the forward method of the nn.Module class.

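forward serves two purposes. First, it calculates the output. Second, it transforms the output detection feature maps into a form that is easier to process (for example, transforming them so that detection maps from multiple scales can be concatenated, which would otherwise be impossible because they have different dimensions).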

def forward(self, x, CUDA):
    modules = self.blocks[1:]
    outputs = {}   #We cache the outputs for the route layer

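forward takes three arguments: self, the input x, and CUDA, which, if true, indicates that the computation runs on the GPU. In this code, the flag is passed on to predict_transform, which uses it to create its helper tensors on the GPU; the model and the input themselves still have to be moved to the GPU separately.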

Here, we iterate over self.blocks[1:] instead of self.blocks, because the first element of self.blocks is a net block, which isn't part of the forward pass.

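Since the route and shortcut layers need output maps from previous layers, we cache the output feature map of every layer in the dict outputs. The keys are the indices of the layers, and the values are the feature maps.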

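As was the case with the create_modules function, we now iterate over module_list, which contains the modules of the network. The thing to notice here is that the modules were appended in the same order as they appear in the configuration file. This means we can simply run the input through each module in turn to get our output.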

    write = 0     #This is explained a bit later
    for i, module in enumerate(modules):        
        module_type = (module["type"])

Convolutional and Upsample Layers

If the module is a convolutional or upsample module, this is how the forward pass should work.

        if module_type == "convolutional" or module_type == "upsample":
            x = self.module_list[i](x)

Route Layer / Shortcut Layer

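If you look at the code for the route layer, we have to account for two cases (as described in Part 2). For the case in which we have to concatenate two feature maps, we use the torch.cat function with the second argument as 1. This is because we want to concatenate the feature maps along the depth. (In PyTorch, the input and output of a convolutional layer have the format B x C x H x W, and the depth corresponds to the channel dimension.) A positive entry in layers is an absolute layer index, so the code first converts it into an offset relative to the current index i; negative entries are already relative. The shortcut layer adds the output of the previous layer to the output of the layer that from points to.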
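A minimal sketch of the channel-wise concatenation, using NumPy arrays as stand-ins for PyTorch tensors (the shapes are illustrative assumptions). np.concatenate(..., axis=1) behaves like torch.cat(..., 1) on B x C x H x W data: the channel counts add up, while the batch and spatial sizes must match.

```python
import numpy as np

# Two feature maps in B x C x H x W layout with the same spatial size (illustrative shapes)
map1 = np.zeros((1, 256, 26, 26), dtype=np.float32)
map2 = np.ones((1, 512, 26, 26), dtype=np.float32)

# Concatenate along dimension 1 (channels), as torch.cat((map1, map2), 1) does for tensors
routed = np.concatenate((map1, map2), axis=1)
print(routed.shape)  # (1, 768, 26, 26)

# The first 256 channels come from map1 (zeros), the remaining 512 from map2 (ones)
print(routed[:, :256].max(), routed[:, 256:].min())  # 0.0 1.0

# Maps with different spatial sizes cannot be concatenated along the channels
mismatch_failed = False
try:
    np.concatenate((map1, np.zeros((1, 512, 13, 13), dtype=np.float32)), axis=1)
except ValueError:
    mismatch_failed = True
print(mismatch_failed)  # True
```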

        elif module_type == "route":
            layers = module["layers"]
            layers = [int(a) for a in layers]

            if (layers[0]) > 0:
                layers[0] = layers[0] - i

            if len(layers) == 1:
                x = outputs[i + (layers[0])]

            else:
                if (layers[1]) > 0:
                    layers[1] = layers[1] - i

                map1 = outputs[i + layers[0]]
                map2 = outputs[i + layers[1]]

                x = torch.cat((map1, map2), 1)

        elif  module_type == "shortcut":
            from_ = int(module["from"])
            x = outputs[i-1] + outputs[i+from_]
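The index arithmetic in the route and shortcut branches can be isolated into two small helpers and checked by hand. This is a sketch: resolve_route and resolve_shortcut are hypothetical names that do not appear in the tutorial's code, and the layer indices in the usage lines are only examples.

```python
def resolve_route(layers, i):
    """Absolute indices of the layers a route block at index i reads from.

    Mirrors the forward code: a positive entry is an absolute layer index and is
    first turned into an offset relative to i; a negative entry is already relative.
    """
    layers = [int(a) for a in layers]
    if layers[0] > 0:
        layers[0] = layers[0] - i
    if len(layers) == 1:
        return [i + layers[0]]
    if layers[1] > 0:
        layers[1] = layers[1] - i
    return [i + layers[0], i + layers[1]]


def resolve_shortcut(from_, i):
    """Indices of the two layers whose outputs a shortcut block at index i adds."""
    return [i - 1, i + int(from_)]


print(resolve_route(["-4"], 83))        # [79]
print(resolve_route(["-1", "61"], 86))  # [85, 61]
print(resolve_shortcut("-3", 4))        # [3, 1]
```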

YOLO (Detection Layer)

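The output of YOLO is a convolutional feature map that contains the bounding box attributes along the depth of the feature map. The attributes of the bounding boxes predicted by a cell are stacked one after another. So, if you want to access the second bounding box of the cell at (5,6), you have to index it as map[5,6, (5+C): 2*(5+C)]. This form is very inconvenient for output processing such as thresholding by object confidence, adding grid offsets to centers, applying anchors, etc.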
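To make the indexing concrete, here is a small hypothetical helper (not part of the tutorial's code) that computes which depth entries belong to box b of a cell, using the H x W x depth view of the map from the text. It assumes each box occupies 5 + C consecutive entries.

```python
def box_slice(b, num_classes):
    """Depth range holding the attributes of bounding box b (0-based) of one cell.

    Each box takes 5 + num_classes consecutive values:
    x, y, w, h, objectness, then one score per class.
    """
    bbox_attrs = 5 + num_classes
    return slice(b * bbox_attrs, (b + 1) * bbox_attrs)


# Second box (b = 1) with the 80 COCO classes -> depth indices 85 to 169
s = box_slice(1, 80)
print(s.start, s.stop)  # 85 170
```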

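Another problem is that, since detection happens at three scales, the dimensions of the prediction maps will be different. Although the dimensions of the three feature maps differ, the output processing operations to be done on them are similar. It would be nice to do these operations on a single tensor, rather than on three separate tensors.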

To remedy these problems, we introduce the function predict_transform.

Transforming the output

The function predict_transform lives in the file util.py, and we will import it when we use it in the forward method of the Darknet class.

Add the imports to the top of util.py:

from __future__ import division

import torch 
import torch.nn as nn
import torch.nn.functional as F 
from torch.autograd import Variable
import numpy as np
import cv2

predict_transform takes in 5 parameters: prediction (our output), inp_dim (the input image dimension), anchors, num_classes, and an optional CUDA flag.

def predict_transform(prediction, inp_dim, anchors, num_classes, CUDA = True):

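The predict_transform function takes a detection feature map and turns it into a 2-D table per image in the batch, where each row corresponds to the attributes of one bounding box, in the following order.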

[Figure: order of the bounding box attributes in the transformed tensor]

Here's the code to do the above transformation.


    batch_size = prediction.size(0)
    stride =  inp_dim // prediction.size(2)
    grid_size = inp_dim // stride
    bbox_attrs = 5 + num_classes
    num_anchors = len(anchors)
    
    prediction = prediction.view(batch_size, bbox_attrs*num_anchors, grid_size*grid_size)
    prediction = prediction.transpose(1,2).contiguous()
    prediction = prediction.view(batch_size, grid_size*grid_size*num_anchors, bbox_attrs)
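The row ordering produced by the view/transpose/view sequence can be checked with NumPy, whose reshape and transpose follow the same row-major rules as PyTorch's contiguous view (torch's transpose(1, 2) corresponds to NumPy's transpose(0, 2, 1) on a 3-D array). The sizes below are made up so the check stays small; it confirms that anchor a of the cell at row y, column x lands in row (y * grid_size + x) * num_anchors + a.

```python
import numpy as np

# Made-up sizes, small enough to check exhaustively
batch_size, num_anchors, num_classes, grid_size = 1, 3, 2, 4
bbox_attrs = 5 + num_classes

# Fake detection map in B x (A * attrs) x G x G layout with distinct values
fmap = np.arange(batch_size * num_anchors * bbox_attrs * grid_size * grid_size, dtype=np.float32)
fmap = fmap.reshape(batch_size, num_anchors * bbox_attrs, grid_size, grid_size)

# The three steps of predict_transform: view -> transpose(1, 2) -> contiguous view
pred = fmap.reshape(batch_size, bbox_attrs * num_anchors, grid_size * grid_size)
pred = pred.transpose(0, 2, 1)
pred = pred.reshape(batch_size, grid_size * grid_size * num_anchors, bbox_attrs)


def row_index(y, x, a):
    """Row of the output table that holds anchor a of the cell at row y, column x."""
    return (y * grid_size + x) * num_anchors + a


# Every row must equal the matching depth slice of the original map
for y in range(grid_size):
    for x in range(grid_size):
        for a in range(num_anchors):
            expected = fmap[0, a * bbox_attrs:(a + 1) * bbox_attrs, y, x]
            assert np.array_equal(pred[0, row_index(y, x, a)], expected)

print(pred.shape)  # (1, 48, 7)
```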

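The dimensions of the anchors are expressed in terms of the height and width attributes of the net block. These attributes describe the dimensions of the input image, which is larger than the detection map by a factor of stride. Therefore, we must divide the anchors by the stride of the detection feature map.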

    anchors = [(a[0]/stride, a[1]/stride) for a in anchors]
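As a worked example, assume a 416 x 416 input and the three largest anchors from yolov3.cfg (116,90, 156,198, 373,326), which belong to the 13 x 13 detection layer. Dividing by the stride turns their pixel sizes into grid-cell units:

```python
# Anchors in input-image pixels, assumed to be those of the coarsest (13 x 13) layer in yolov3.cfg
anchors = [(116, 90), (156, 198), (373, 326)]

inp_dim = 416
grid_size = 13
stride = inp_dim // grid_size        # 32 input pixels per grid cell

# Same expression as in predict_transform: express the anchors in grid-cell units
scaled = [(a[0] / stride, a[1] / stride) for a in anchors]
print(stride, scaled)
# 32 [(3.625, 2.8125), (4.875, 6.1875), (11.65625, 10.1875)]
```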

Now, we need to transform our output according to the equations we discussed in Part 1.
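For a single box, those equations (b_x = sigmoid(t_x) + c_x, b_y = sigmoid(t_y) + c_y, b_w = p_w * exp(t_w), b_h = p_h * exp(t_h)) can be sketched in plain Python. This is an illustrative scalar version: decode_box is a hypothetical helper, the final multiplication by the stride anticipates the last step of predict_transform, and the anchor values assume the scaled 13 x 13 anchors.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(tx, ty, tw, th, cx, cy, pw, ph, stride):
    """Decode one raw prediction into a box centre and size in input-image pixels.

    cx, cy: column and row of the grid cell; pw, ph: anchor size in grid units.
    """
    bx = sigmoid(tx) + cx          # centre offset within the cell, plus the cell position
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)         # log-space scaling of the anchor
    bh = ph * math.exp(th)
    return bx * stride, by * stride, bw * stride, bh * stride


# With all-zero raw outputs the centre sits mid-cell and the box equals the anchor
print(decode_box(0, 0, 0, 0, cx=6, cy=6, pw=3.625, ph=2.8125, stride=32))
# (208.0, 208.0, 116.0, 90.0)
```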

Sigmoid the x, y coordinates and the objectness score.

    #Sigmoid the centre_X, centre_Y and object confidence
    prediction[:,:,0] = torch.sigmoid(prediction[:,:,0])
    prediction[:,:,1] = torch.sigmoid(prediction[:,:,1])
    prediction[:,:,4] = torch.sigmoid(prediction[:,:,4])

Add the grid offsets to the center coordinate predictions.

    #Add the center offsets
    grid = np.arange(grid_size)
    a,b = np.meshgrid(grid, grid)

    x_offset = torch.FloatTensor(a).view(-1,1)
    y_offset = torch.FloatTensor(b).view(-1,1)

    if CUDA:
        x_offset = x_offset.cuda()
        y_offset = y_offset.cuda()

    x_y_offset = torch.cat((x_offset, y_offset), 1).repeat(1,num_anchors).view(-1,2).unsqueeze(0)

    prediction[:,:,:2] += x_y_offset
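To see why these offsets line up with the rows produced earlier, here is a NumPy re-creation of x_y_offset for a tiny 2 x 2 grid with 3 anchors (sizes chosen only for readability; the trailing unsqueeze(0), which adds a batch dimension for broadcasting, is omitted). Each cell's (x, y) offset appears num_anchors times in a row, matching the order of the reshaped predictions.

```python
import numpy as np

grid_size, num_anchors = 2, 3

# Same construction as predict_transform, with NumPy in place of torch
grid = np.arange(grid_size)
a, b = np.meshgrid(grid, grid)            # a[y, x] = x, b[y, x] = y
x_offset = a.reshape(-1, 1)
y_offset = b.reshape(-1, 1)

# torch's .repeat(1, num_anchors) on a 2-D tensor tiles the columns, like np.tile(..., (1, num_anchors))
x_y_offset = np.tile(np.concatenate((x_offset, y_offset), 1), (1, num_anchors)).reshape(-1, 2)

print(x_y_offset.tolist())
# [[0, 0], [0, 0], [0, 0], [1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [1, 1], [1, 1], [1, 1]]
```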

Apply the anchors to the dimensions of the bounding box.

    #log space transform height and the width
    anchors = torch.FloatTensor(anchors)

    if CUDA:
        anchors = anchors.cuda()

    anchors = anchors.repeat(grid_size*grid_size, 1).unsqueeze(0)
    prediction[:,:,2:4] = torch.exp(prediction[:,:,2:4])*anchors

Apply the sigmoid activation to the class scores.

    prediction[:,:,5: 5 + num_classes] = torch.sigmoid((prediction[:,:, 5 : 5 + num_classes]))

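The last thing we want to do here is resize the detection map to the size of the input image. The bounding box attributes here are sized according to the feature map (say, 13 x 13). If the input image is 416 x 416, we multiply the attributes by 32, the value of the stride variable.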

    prediction[:,:,:4] *= stride

That concludes the transformation.

Return the predictions at the end of the function.

    return prediction

Detection Layer Revisited

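Now that we have transformed our output tensors, we can concatenate the detection maps at three different scales into one big tensor. Notice that this was not possible prior to our transformation, as one cannot concatenate feature maps having different spatial dimensions. But now that our output tensor acts merely as a table with bounding boxes as its rows, concatenation is very much possible.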

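One obstacle in our way is that we cannot initialize an empty tensor and then concatenate a non-empty tensor (of a different shape) to it. So, we delay the initialization of the collector (the tensor that holds the detections) until we get our first detection map, and then concatenate subsequent detection maps to it.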

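Notice the write = 0 line just before the loop in the forward function. The write flag indicates whether we have encountered the first detection. If write is 0, the collector hasn't been initialized yet. If it is 1, the collector has been initialized and we can simply concatenate our detection maps to it.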
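The lazy-initialization pattern can be mimicked with NumPy, next to a common alternative (not the tutorial's code) that appends each scale's output to a Python list and concatenates once at the end; torch.cat likewise accepts a list of tensors. The grid sizes 2 and 4 are toy values standing in for the real scales.

```python
import numpy as np

# Hypothetical per-scale outputs of predict_transform: B x (G * G * A) x (5 + C)
batch_size, num_anchors, bbox_attrs = 1, 3, 85
per_scale = [np.zeros((batch_size, g * g * num_anchors, bbox_attrs)) for g in (2, 4)]

# Pattern used in forward: initialize the collector lazily with a write flag
write = 0
for x in per_scale:
    if not write:
        detections = x
        write = 1
    else:
        detections = np.concatenate((detections, x), 1)

# Alternative: gather into a list and concatenate once at the end
collected = []
for x in per_scale:
    collected.append(x)
detections_alt = np.concatenate(collected, 1)

print(detections.shape, detections_alt.shape)  # (1, 60, 85) (1, 60, 85)
```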

Now that we have the predict_transform function, we write the code that handles the detection feature maps in the forward function.

At the top of your darknet.py file, add the following import.

from util import *

Then, in the forward function:

        elif module_type == 'yolo':        

            anchors = self.module_list[i][0].anchors
            #Get the input dimensions
            inp_dim = int (self.net_info["height"])

            #Get the number of classes
            num_classes = int (module["classes"])

            #Transform 
            x = x.data
            x = predict_transform(x, inp_dim, anchors, num_classes, CUDA)
            if not write:              #if no collector has been initialised
                detections = x
                write = 1

            else:       
                detections = torch.cat((detections, x), 1)

        outputs[i] = x

Now, simply return the detections.

    return detections

Testing the forward pass

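Here's a function that creates a dummy input, which we will pass to our network. Before writing this function, save this image to your working directory. If you're on Linux, type: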

wget https://github.com/ayooshkathuria/pytorch-yolo-v3/raw/master/dog-cycle-car.png

Now, define the function at the top of your darknet.py file as follows:

def get_test_input():
    img = cv2.imread("dog-cycle-car.png")
    img = cv2.resize(img, (416,416))          #Resize to the input dimension
    img_ =  img[:,:,::-1].transpose((2,0,1))  # BGR -> RGB | H x W x C -> C x H x W 
    img_ = img_[np.newaxis,:,:,:]/255.0       # Add a batch dimension at 0 | Normalise to [0, 1]
    img_ = torch.from_numpy(img_).float()     #Convert to float
    img_ = Variable(img_)                     # Convert to Variable
    return img_
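The preprocessing steps in get_test_input can be checked without an image file by passing a synthetic array through the same NumPy operations. The all-blue test image is an assumption made for the check, and astype(np.float32) stands in for torch.from_numpy(...).float().

```python
import numpy as np

# Synthetic 416 x 416 image standing in for the output of cv2.imread + cv2.resize
img = np.zeros((416, 416, 3), dtype=np.uint8)
img[..., 0] = 255                                # blue only (OpenCV stores channels as BGR)

img_ = img[:, :, ::-1].transpose((2, 0, 1))      # BGR -> RGB, H x W x C -> C x H x W
img_ = img_[np.newaxis, :, :, :] / 255.0         # add a batch dimension at 0, scale to [0, 1]
img_ = img_.astype(np.float32)

print(img_.shape)                                # (1, 3, 416, 416)
print(img_[0, 2].max(), img_[0, 0].max())        # 1.0 0.0  (blue is now the last channel)
```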

Then, we type the following code:


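model = Darknet("cfg/yolov3.cfg")
inp = get_test_input()
CUDA = torch.cuda.is_available()
if CUDA:
    # With CUDA=True, predict_transform creates GPU tensors, so the model and input must be on the GPU too
    model = model.cuda()
    inp = inp.cuda()
pred = model(inp, CUDA)
print (pred)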

You will see an output like this:

(  0  ,.,.) = 
   16.0962   17.0541   91.5104  ...     0.4336    0.4692    0.5279
   15.1363   15.2568  166.0840  ...     0.5561    0.5414    0.5318
   14.4763   18.5405  409.4371  ...     0.5908    0.5353    0.4979
               ⋱                ...             
  411.2625  412.0660    9.0127  ...     0.5054    0.4662    0.5043
  412.1762  412.4936   16.0449  ...     0.4815    0.4979    0.4582
  412.1629  411.4338   34.9027  ...     0.4306    0.5462    0.4138
[torch.FloatTensor of size 1x10647x85]

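The shape of this tensor is 1 x 10647 x 85. The first dimension is the batch size, which is simply 1 because we have used a single image. For each image in a batch, we have a 10647 x 85 table. Each row of this table represents a bounding box (4 bbox attributes, 1 objectness score, and 80 class scores).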
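Both numbers follow from the three detection scales. Assuming a 416 x 416 input, strides of 32, 16 and 8, three anchors per cell at each scale, and the 80 COCO classes, as in yolov3.cfg, the row and column counts work out as follows:

```python
num_anchors = 3          # anchors per cell at each scale
num_classes = 80         # COCO classes
inp_dim = 416
strides = [32, 16, 8]    # one stride per detection scale

grid_sizes = [inp_dim // s for s in strides]               # [13, 26, 52]
rows = sum(g * g * num_anchors for g in grid_sizes)        # 507 + 2028 + 8112
cols = 4 + 1 + num_classes                                 # box, objectness, class scores
print(grid_sizes, rows, cols)  # [13, 26, 52] 10647 85
```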

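At this point, our network has random weights and will not produce the correct output. We need to load a weights file into our network. For this purpose, we'll use the official weights file.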

Downloading the Pre-trained Weights

Download the weights file into your detector directory. Grab the weights file from here, or, if you're on Linux:

wget https://pjreddie.com/media/files/yolov3.weights

Understanding the Weights File

The official weights file is a binary file that contains the weights stored serially.

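Extreme care must be taken when reading the weights. They are stored just as floats, with nothing to tell us which layer they belong to. If you get it wrong, nothing stops you from, say, loading the weights of a batch norm layer into a convolutional layer. Since you're reading only floats, there's no way to tell which weight belongs to which layer. Hence, we must understand how the weights are stored.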

First, the weights belong to only two types of layers: batch norm layers and convolutional layers.

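The weights for these layers are stored in exactly the same order as the layers appear in the configuration file. So, if a convolutional block is followed by a shortcut block, and the shortcut block by another convolutional block, you should expect the file to contain the weights of the first convolutional block, followed by those of the second (the shortcut block has no weights).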

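When a batch norm layer appears in a convolutional block, the convolutional layer has no biases. However, when there's no batch norm layer, the bias "weights" of the convolutional layer have to be read from the file.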

The following diagram sums up how the weights are stored.

[Figure: how the weights are stored in the weights file]
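Given that layout, the number of floats a convolutional block consumes can be computed from its configuration. The helper below is a hypothetical sanity check, not part of the tutorial; if the weights file matches the configuration, summing it over all convolutional blocks should give the length of the weights array that follows the header. The layer shapes in the usage lines are assumed from yolov3.cfg (its first layer, and a 1 x 1 detection-head convolution).

```python
def conv_block_floats(in_channels, out_channels, kernel_size, batch_normalize):
    """Number of float32 values a convolutional block consumes from the weights file."""
    conv_weights = out_channels * in_channels * kernel_size * kernel_size
    if batch_normalize:
        # bn biases, bn weights, running mean, running variance: one value per output channel each
        return 4 * out_channels + conv_weights
    # conv biases, one per output channel
    return out_channels + conv_weights


# 32 filters of size 3 x 3 on an RGB input, with batch norm
print(conv_block_floats(3, 32, 3, True))        # 992
# 255 filters of size 1 x 1 on 1024 channels, without batch norm
print(conv_block_floats(1024, 255, 1, False))   # 261375
```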

Loading Weights

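Let us write a function to load the weights. It will be a member function of the Darknet class. Besides self, it takes one argument: the path of the weights file.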

def load_weights(self, weightfile):

The first 20 bytes (160 bits) of the weights file store 5 int32 values, which constitute the header of the file.

    #Open the weights file
    fp = open(weightfile, "rb")

    #The first 5 values are header information 
    # 1. Major version number
    # 2. Minor Version Number
    # 3. Subversion number 
    # 4,5. Images seen by the network (during training)
    header = np.fromfile(fp, dtype = np.int32, count = 5)
    self.header = torch.from_numpy(header)
    self.seen = self.header[3]
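The header arithmetic can be checked with the standard library alone: five int32 values occupy 5 x 4 = 20 bytes, which is what np.fromfile with count = 5 consumes, and the float32 weights start right after them. The sketch builds a tiny fake file in memory; the header values and weights are made up, and little-endian byte order is assumed.

```python
import io
import struct

# A tiny in-memory stand-in for a weights file: 5 int32 header values, then float32 weights
header_values = (0, 2, 0, 12345, 0)
fake_weights = (0.5, -1.0, 2.0, 0.25)
payload = struct.pack("<5i", *header_values) + struct.pack("<4f", *fake_weights)
fp = io.BytesIO(payload)

header = struct.unpack("<5i", fp.read(5 * 4))   # 5 int32 values = 20 bytes
offset = fp.tell()                              # position where the weights start
data = fp.read()
weights = struct.unpack("<%df" % (len(data) // 4), data)

print(header, offset)  # (0, 2, 0, 12345, 0) 20
print(weights)         # (0.5, -1.0, 2.0, 0.25)
```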

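The rest of the bits now represent the weights, in the order described above. The weights are stored as float32, i.e. 32-bit floats. Let's load the rest of the weights into a np.ndarray.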

    weights = np.fromfile(fp, dtype = np.float32)

Now, we iterate over the modules of our network and load the weights into them.

    ptr = 0
    for i in range(len(self.module_list)):
        module_type = self.blocks[i + 1]["type"]

        #If module_type is convolutional load weights
        #Otherwise ignore.

Inside the loop, we first check whether the convolutional block has batch_normalize set to True. Based on that, we load the weights.

        if module_type == "convolutional":
            model = self.module_list[i]
            try:
                batch_normalize = int(self.blocks[i+1]["batch_normalize"])
            except KeyError:                # the block has no batch_normalize entry
                batch_normalize = 0

            conv = model[0]

We keep a variable ptr to track our position in the weights array. Now, if batch_normalize is True, we load the weights as follows.

            if (batch_normalize):
                bn = model[1]

                #Get the number of weights of Batch Norm Layer
                num_bn_biases = bn.bias.numel()

                #Load the weights
                bn_biases = torch.from_numpy(weights[ptr:ptr + num_bn_biases])
                ptr += num_bn_biases

                bn_weights = torch.from_numpy(weights[ptr: ptr + num_bn_biases])
                ptr += num_bn_biases

                bn_running_mean = torch.from_numpy(weights[ptr: ptr + num_bn_biases])
                ptr += num_bn_biases

                bn_running_var = torch.from_numpy(weights[ptr: ptr + num_bn_biases])
                ptr += num_bn_biases

                #Cast the loaded weights into dims of model weights. 
                bn_biases = bn_biases.view_as(bn.bias.data)
                bn_weights = bn_weights.view_as(bn.weight.data)
                bn_running_mean = bn_running_mean.view_as(bn.running_mean)
                bn_running_var = bn_running_var.view_as(bn.running_var)

                #Copy the data to model
                bn.bias.data.copy_(bn_biases)
                bn.weight.data.copy_(bn_weights)
                bn.running_mean.copy_(bn_running_mean)
                bn.running_var.copy_(bn_running_var)

If batch_normalize is not True, simply load the biases of the convolutional layer.

            else:
                #Number of biases
                num_biases = conv.bias.numel()

                #Load the weights
                conv_biases = torch.from_numpy(weights[ptr: ptr + num_biases])
                ptr = ptr + num_biases

                #reshape the loaded weights according to the dims of the model weights
                conv_biases = conv_biases.view_as(conv.bias.data)

                #Finally copy the data
                conv.bias.data.copy_(conv_biases)

Finally, we load the weights of the convolutional layer.

            #Let us load the weights for the Convolutional layers
            num_weights = conv.weight.numel()

            #Do the same as above for weights
            conv_weights = torch.from_numpy(weights[ptr:ptr+num_weights])
            ptr = ptr + num_weights

            conv_weights = conv_weights.view_as(conv.weight.data)
            conv.weight.data.copy_(conv_weights)
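To see the pointer arithmetic in isolation, here is a pure-Python sketch that walks a flat list the same way load_weights walks the weights array. split_conv_block is a hypothetical helper, and the tiny layer sizes are made up so the slices are easy to read. After loading real weights, checking whether ptr ends exactly at len(weights) is a cheap way to catch a mismatch between the file and the configuration.

```python
def split_conv_block(weights, ptr, out_channels, num_conv_weights, batch_normalize):
    """Slice one convolutional block's parameters out of a flat weights list.

    Order as in load_weights: with batch norm, bn biases, bn weights, running mean,
    running variance; otherwise conv biases; then the conv weights.
    Returns the slices and the updated pointer.
    """
    params = {}
    names = (["bn_biases", "bn_weights", "bn_running_mean", "bn_running_var"]
             if batch_normalize else ["conv_biases"])
    for name in names:
        params[name] = weights[ptr:ptr + out_channels]
        ptr += out_channels
    params["conv_weights"] = weights[ptr:ptr + num_conv_weights]
    ptr += num_conv_weights
    return params, ptr


flat = list(range(20))
# Batch-normalised block: 2 output channels, 6 conv weights
bn_block, ptr = split_conv_block(flat, 0, out_channels=2, num_conv_weights=6, batch_normalize=True)
print(bn_block["bn_running_var"], bn_block["conv_weights"], ptr)  # [6, 7] [8, 9, 10, 11, 12, 13] 14
# Next block without batch norm: 2 biases, then 4 conv weights
plain_block, ptr = split_conv_block(flat, ptr, out_channels=2, num_conv_weights=4, batch_normalize=False)
print(plain_block["conv_biases"], plain_block["conv_weights"], ptr)  # [14, 15] [16, 17, 18, 19] 20
```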

We're done with this function. You can now load the weights into a Darknet object by calling its load_weights method.

model = Darknet("cfg/yolov3.cfg")
model.load_weights("yolov3.weights")

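That's it for this part. With our model built and the weights loaded, we can finally start detecting objects. In the next part, we will cover the use of objectness confidence thresholding and non-maximum suppression to produce our final set of detections.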

Further Reading

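Ayoosh Kathuria is currently an intern at the Defence Research and Development Organisation, India, where he is working on improving object detection in grainy videos. When he's not working, he's either sleeping or playing Pink Floyd on his guitar. You can connect with him on LinkedIn or see more of his work on GitHub.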
