Using PyTorch to implement the YOLO v3 object detection algorithm from scratch (2)


Blog translation

This is Part 2 of the tutorial on implementing a YOLO v3 detector from scratch. In the previous part I explained how YOLO works; in this part we will implement the layers used by YOLO in PyTorch. In other words, this is the part where we create the building blocks of our model.
The code in this tutorial is designed to run on Python 3.5 and PyTorch 0.4. The full code can be found in this GitHub repository.

This tutorial is divided into 5 parts:

  • Part 1: Understanding How YOLO Works
  • Part 2: Creating the Network Structure
  • Part 3: Implementing forward propagation of the network
  • Part 4: Object Confidence Thresholding and Non-Maximum Suppression
  • Part 5: Designing Input and Output Pipelines

Prerequisites

This part requires the reader to have a basic understanding of how YOLO works, as well as basic knowledge of PyTorch, such as how to build custom neural network architectures with classes like nn.Module, nn.Sequential, and torch.nn.Parameter.

Getting started

First create a folder for the detector code, then create the Python file darknet.py in it. Darknet is the name of the underlying architecture of YOLO; this file will contain all the code that implements the YOLO network. We also need a file called util.py, which will hold the various helper functions we need to call. After saving both files in the detector folder, we can use git to track their changes.

Configuration file

The official code (implemented in C) uses a configuration file to build the network, i.e., the cfg file describes the network architecture block by block. If you have used the Caffe backend before, it is analogous to the .prototxt file that describes the network.

We will build the network using the official cfg file, published by the author of YOLO. Download it from the address below and place it in a cfg folder inside the detector directory.

Configuration file download: https://github.com/pjreddie/darknet/blob/master/cfg/yolov3.cfg

Of course, if you're using Linux, you can cd into the detector directory and run the following commands:

mkdir cfg
cd cfg
wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg

Open the cfg file, and you will see network blocks like the following:

[convolutional]
...
[convolutional]
...
[convolutional]
...
[shortcut]
...

We see four blocks above: three describe convolutional layers, and the last describes the shortcut layer, i.e., the skip connection commonly used in ResNet. Here are the 5 types of layers used in YOLO:

  • convolutional layer
[convolutional]
batch_normalize=1  
filters=64  
size=3  
stride=1  
pad=1  
activation=leaky
  • Skip connections
    Skip connections are similar to the structure used in residual networks. The parameter from is -3, which means that the output of the shortcut layer is obtained by adding the feature map of the previous layer to the feature map of the layer three layers back from the shortcut layer (see the sketch after this list).
[shortcut]
from=-3 
activation=linear 
  • Upsampling layer
    The upsampling layer upsamples the feature map of the previous layer by a factor given by the parameter stride.
[upsample]
stride=2
  • Routing layer
    The routing layer needs some explanation. Its parameter layers can hold one or two values. When it holds a single value, the layer outputs the feature map of the layer indexed by that value. Below it is set to -4, so the layer will output the feature map of the layer four layers back from the route layer.
    When layers holds two values, the layer outputs the feature maps of the layers indexed by those two values, concatenated along the depth dimension. Below they are -1 and 61, so the layer will output the feature maps from the previous layer (-1) and the 61st layer, concatenated by depth (see the sketch after this list).
[route]
layers = -4

[route]
layers = -1, 61
  • YOLO
    The YOLO layer corresponds to the detection layer described in Part 1. The parameter anchors defines 9 anchors, but only the anchors indexed by the mask attribute are used. Here, the mask values 0, 1, 2 mean the first, second, and third anchors are used. This makes sense, since each cell of the detection layer predicts 3 boxes. In total, we have detection layers at 3 scales, which together use all 9 anchors.
[yolo]
mask=0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=80
num=9
jitter=.3
ignore_thresh = .5
truth_thresh = 1
random=1

  • There is one more kind of block in the cfg file, called net, but I wouldn't call it a layer, since it only describes information about the network input and training parameters and is not used in YOLO's forward pass. However, it does provide information such as the size of the network input, which we use to adjust the anchors in the forward pass.
[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=16
width= 320
height = 320
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
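
Before moving on to parsing, here is a small illustrative sketch (not part of the detector code; the shapes are invented for illustration) of the tensor operations that the shortcut and route blocks correspond to:

import torch

# Dummy feature maps: (batch, channels, height, width)
prev = torch.randn(1, 64, 52, 52)           # output of layer i-1
three_back = torch.randn(1, 64, 52, 52)     # output of layer i-3

# [shortcut] from=-3: element-wise addition, the shape is unchanged
shortcut_out = prev + three_back            # (1, 64, 52, 52)

# [route] layers=-1,61: concatenation along the depth (channel) axis
layer_61 = torch.randn(1, 256, 52, 52)      # hypothetical output of layer 61
route_out = torch.cat((prev, layer_61), 1)  # (1, 64 + 256, 52, 52)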

Parse the configuration file

Before we start, let's add the necessary imports at the top of the darknet.py file.

from __future__ import division

import torch 
import torch.nn as nn
import torch.nn.functional as F 
from torch.autograd import Variable
import numpy as np

We define a function parse_cfg that takes the path to the configuration file as input.

def parse_cfg(cfgfile):
    """
    Takes a configuration file

    Returns a list of blocks. Each block describes a block in the neural
    network to be built. A block is represented as a dictionary in the list.
    """

The idea here is to parse the cfg file and store every block as a dictionary. The attributes of a block and their values are stored as key-value pairs in the dictionary. As we parse, we append these dictionaries (denoted by the variable block in the code) to the list blocks. Our function will return this list.
We start by saving the contents of the configuration file in a list of strings. The following code preprocesses this list:

    file = open(cfgfile, 'r')
    lines = file.read().split('\n')                        # store the lines in a list
    lines = [x for x in lines if len(x) > 0]               # get rid of the empty lines
    lines = [x for x in lines if x[0] != '#']              # get rid of comments
    lines = [x.rstrip().lstrip() for x in lines]           # get rid of fringe whitespaces

Then, we iterate over the preprocessed list and get the chunks.

    block = {}
    blocks = []

    for line in lines:
        if line[0] == "[":               # This marks the start of a new block
            if len(block) != 0:          # If block is not empty, implies it is storing values of previous block.
                blocks.append(block)     # add it the blocks list
                block = {}               # re-init the block
            block["type"] = line[1:-1].rstrip()     
        else:
            key,value = line.split("=") 
            block[key.rstrip()] = value.lstrip()
    blocks.append(block)

    return blocks
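
As a quick sanity check, you can print the first block that comes back; you should see something along these lines (abbreviated here):

blocks = parse_cfg("cfg/yolov3.cfg")
print(blocks[0])
# {'type': 'net', 'batch': '1', 'subdivisions': '1', 'width': '320', ...}

Note that every value is stored as a string, which is why create_modules below casts the ones it needs with int().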

Now we will use the list returned by parse_cfg above to build PyTorch modules for the blocks in the configuration file.
There are 5 types of layers in the list (described above). PyTorch provides preset layers for convolutional and upsample. We will have to write our own modules for the remaining layers by extending the nn.Module class.
The create_modules function takes the list of blocks returned by parse_cfg:

def create_modules(blocks):
    net_info = blocks[0]  # Captures the information about the input and pre-processing
    module_list = nn.ModuleList()
    prev_filters = 3
    output_filters = []

Before iterating over the list of blocks, we define the variable net_info to store information about the network.
Our function returns an nn.ModuleList. This class is almost like a normal Python list containing nn.Module objects. However, when we add an nn.ModuleList as a member of an nn.Module object (i.e., when we add modules to our network), all the parameters of the nn.Module objects (modules) inside the nn.ModuleList are also added as parameters of the nn.Module object (i.e., our network) to which we added the nn.ModuleList as a member.
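
A tiny self-contained sketch of this behavior (the layer sizes here are arbitrary):

import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # Because module_list is an nn.ModuleList rather than a plain Python
        # list, the parameters of the convolutions inside it are registered
        # as parameters of Net itself.
        self.module_list = nn.ModuleList([nn.Conv2d(3, 16, 3), nn.Conv2d(16, 32, 3)])

net = Net()
print(len(list(net.parameters())))  # 4: one weight and one bias per conv
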
When we define a new convolutional layer, we must define the dimensions of its kernel. While the kernel height and width are provided by the cfg file, the kernel depth is determined by the number of kernels (the feature map depth) of the previous layer. This means we need to keep track of the number of kernels in the layer that the convolution is being applied to. We use the variable prev_filters to do this. We initialize it to 3, because the image has 3 channels corresponding to RGB.
The route layer brings forward feature maps (possibly concatenated) from previous layers. If there is a convolutional layer right after a route layer, the kernel is applied to the feature maps of those previous layers, precisely the ones the route layer brought forward. Therefore, we need to keep track of the number of kernels not only in the immediately preceding layer, but in each of the preceding layers. As we iterate, we append the number of output filters of each block to the list output_filters.
Now, the idea is to iterate over the list of blocks and create a PyTorch module for each block as we go.

    for index, x in enumerate(blocks[1:]):  # the index helps with implementing route layers
        module = nn.Sequential()
        # check the type of block
        # create a new module for the block
        # append to module_list

The nn.Sequential class is used to sequentially execute a number of nn.Module objects. If you look at the cfg file, you will notice that a block may contain more than one layer. For example, a block of type convolutional has a batch normalization layer and a leaky ReLU activation layer in addition to the convolutional layer itself. We string these layers together using nn.Sequential and its add_module function. For example, this is how we create the convolutional and upsampling layers:

        # If it's a convolutional layer
        if (x["type"] == "convolutional"):
            # Get the info about the layer
            activation = x["activation"]
            try:
                batch_normalize = int(x["batch_normalize"])
                bias = False
            except:
                batch_normalize = 0
                bias = True

            filters = int(x["filters"])
            padding = int(x["pad"])
            kernel_size = int(x["size"])
            stride = int(x["stride"])

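            # When pad=1 in the cfg, Darknet uses "same"-style padding of
            # (kernel_size - 1) // 2, which preserves the spatial size for
            # stride-1 convolutions.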
            if padding:
                pad = (kernel_size - 1) // 2
            else:
                pad = 0

            # Add the convolutional layer
            conv = nn.Conv2d(prev_filters, filters, kernel_size, stride, pad, bias=bias)
            module.add_module("conv_{0}".format(index), conv)
            # Add the Batch Norm Layer
            if batch_normalize:
                bn = nn.BatchNorm2d(filters)
                module.add_module("batch_norm_{0}".format(index), bn)

            # Check the activation.
            # It is either Linear or a Leaky ReLU for YOLO
            if activation == "leaky":
                activn = nn.LeakyReLU(0.1, inplace=True)
                module.add_module("leaky_{0}".format(index), activn)

        # If it's an upsampling layer
        # We use nearest-neighbour upsampling
        elif (x["type"] == "upsample"):
            stride = int(x["stride"])
            upsample = nn.Upsample(scale_factor=stride, mode="nearest")
            module.add_module("upsample_{}".format(index), upsample)

Next, let's write the code for creating the route layer and the shortcut layer:

The code for creating the route layer deserves some explanation. First, we extract the value of the layers attribute, cast it to an integer (or two integers), and store it in a list.

Then we create a new layer called EmptyLayer, which, as the name suggests, is just an empty layer:

route = EmptyLayer()

It is defined as follows:

class EmptyLayer(nn.Module):
    def __init__(self):
        super(EmptyLayer, self).__init__()

Now, an empty layer might seem confusing, because it doesn't do anything. The route layer, however, just like any other layer, performs an operation (bringing forward / concatenating previous layers). In PyTorch, when we define a new layer, we subclass nn.Module and write the operation the layer performs in the forward function of that nn.Module object.

To design a layer for the route block, we would have to build an nn.Module object that is initialized with the value of layers as a member. Then we could write the code that concatenates the feature maps in the forward function. Finally, we would execute this layer somewhere in the forward function of the network.

But the concatenation code is fairly short and simple (a call to torch.cat on the feature maps), and designing a layer as above would lead to unnecessary abstraction and boilerplate code. Instead, we can put a dummy layer in place of the proposed route layer, and perform the concatenation directly in the forward function of the nn.Module object representing Darknet. (If the last sentence confuses you, I suggest you read up on how the nn.Module class is used in PyTorch.)
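
As a preview of what this will look like, here is a minimal sketch of the pattern (the real forward function is written in Part 3; outputs, i, and block below are stand-ins invented for illustration):

import torch

# Stand-in cache of per-layer feature maps, keyed by layer index
outputs = {0: torch.randn(1, 32, 13, 13),
           1: torch.randn(1, 32, 13, 13),
           2: torch.randn(1, 32, 13, 13)}
i = 3                                       # index of the current dummy layer
block = {"type": "shortcut", "from": "-3"}  # hypothetical parsed cfg block

if block["type"] == "route":
    # the concatenation happens here, not inside EmptyLayer
    layers = [int(v) for v in block["layers"].split(",")]
    if len(layers) == 1:
        x = outputs[i + layers[0]]
    else:
        # positive indices would first be converted to offsets relative to i
        x = torch.cat((outputs[i + layers[0]], outputs[i + layers[1]]), 1)
elif block["type"] == "shortcut":
    # the addition happens here, not inside EmptyLayer
    x = outputs[i - 1] + outputs[i + int(block["from"])]

outputs[i] = x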

A convolutional layer right after a route layer applies its kernel to the (possibly concatenated) feature maps of previous layers. The following code updates the filters variable to hold the number of filters output by the route layer.

        # If it is a route layer
        elif (x["type"] == "route"):
            x["layers"] = x["layers"].split(',')

            # Start of a route
            start = int(x["layers"][0])

            # end, if there exists one.
            try:
                end = int(x["layers"][1])
            except:
                end = 0

            # Positive annotations are converted to offsets relative to index
            if start > 0:
                start = start - index

            if end > 0:
                end = end - index

            route = EmptyLayer()
            module.add_module("route_{0}".format(index), route)

            if end < 0:
                filters = output_filters[index + start] + output_filters[index + end]
            else:
                filters = output_filters[index + start]

        # shortcut corresponds to skip connection
        elif x["type"] == "shortcut":
            from_ = int(x["from"])
            shortcut = EmptyLayer()
            module.add_module("shortcut_{}".format(index), shortcut)

Finally, we will write the code to create the YOLO layer:

        # Yolo is the detection layer
        elif x["type"] == "yolo":
            mask = x["mask"].split(",")
            mask = [int(x) for x in mask]

            anchors = x["anchors"].split(",")
            anchors = [int(a) for a in anchors]
            anchors = [(anchors[i], anchors[i + 1]) for i in range(0, len(anchors), 2)]
            anchors = [anchors[i] for i in mask]

            detection = DetectionLayer(anchors)
            module.add_module("Detection_{}".format(index), detection)

We define a new layer DetectionLayer to hold the anchors used to detect bounding boxes.
The detection layer is defined as follows:

class DetectionLayer(nn.Module):
    def __init__(self, anchors):
        super(DetectionLayer, self).__init__()
        self.anchors = anchors

At the end of the loop body, we do some bookkeeping.

        module_list.append(module)
        prev_filters = filters
        output_filters.append(filters)

    # After the loop ends, return the network info together with the module list
    return (net_info, module_list)

Testing the code

You can test the code by typing the following lines at the end of darknet.py and then running the file.

blocks = parse_cfg("cfg/yolov3.cfg")
print(create_modules(blocks))

You'll see a long list (106 items to be exact) with elements that look like this:

 (9): Sequential(
 (conv_9): Conv2d (128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
 (batch_norm_9): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
 (leaky_9): LeakyReLU(0.1, inplace)
 )
 (10): Sequential(
 (conv_10): Conv2d (64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
 (batch_norm_10): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
 (leaky_10): LeakyReLU(0.1, inplace)
 )
 (11): Sequential(
 (shortcut_11): EmptyLayer(
 )
 )
