[7] Learning YOLOv3 from scratch: adding an attention mechanism to the YOLOv3 model

Preface: The "learning YOLOv3 from scratch" series keeps growing. I originally planned only a small amount of content, but while reading the code I kept finding new points of interest, so the series keeps getting additions. In the earlier posts on the YOLOv3 code we covered the cfg file, model construction, and so on. This post builds on that code and modifies the model to add the SE and CBAM modules from the earlier Attention series to YOLOv3.

1. Specifying the format

Just like the [convolutional], [maxpool], [net], [route], and other layers defined in the cfg file, a new module needs an agreed cfg format before it can be added. We use the following conventions:

The SE module (explained in detail in: [Attention mechanisms in CV] The simplest and most easily implemented SE module) has one parameter, reduction, whose default is 16, so in the cfg file we define this module as follows:

[se]
reduction=16

The CBAM module (explained in detail in: [Attention mechanisms in CV] ECCV 2018 Convolutional Block Attention Module) combines a spatial attention mechanism and a channel attention mechanism, which together take two parameters: ratio and kernel_size. In the cfg file the kernel size key is written kernelsize, so the CBAM format is:

[cbam]
ratio=16
kernelsize=7
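
As a quick check of the format, the following minimal sketch parses the two blocks above into the kind of dictionaries the cfg parser produces. The helper name parse_blocks and the inline CFG_TEXT string are assumptions made for illustration; they are not part of the YOLOv3 code base.

```python
# Hypothetical helper: parse cfg text into a list of dicts, one per
# [section], in the same spirit as parse_model_cfg.
CFG_TEXT = """
[se]
reduction=16

[cbam]
ratio=16
kernelsize=7
"""


def parse_blocks(text):
    blocks = []
    for raw in text.split('\n'):
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        if line.startswith('['):
            # a bracketed name starts a new module definition
            blocks.append({'type': line[1:-1].strip()})
        else:
            key, val = line.split('=', 1)
            blocks[-1][key.strip()] = val.strip()
    return blocks


blocks = parse_blocks(CFG_TEXT)
print(blocks)
```

Note that the values stay strings after parsing, so they have to be converted with int() when the module is built.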

2. Modifying the parsing code

Because these parameters are custom additions, the cfg parsing function has to be modified. As discussed in an earlier post, the part to change is in parse_config.py:

import os

import numpy as np


def parse_model_cfg(path):
    # The path argument looks like: cfg/yolov3-tiny.cfg
    if not path.endswith('.cfg'):
        path += '.cfg'
    if not os.path.exists(path) and \
           os.path.exists('cfg' + os.sep + path):
        path = 'cfg' + os.sep + path

    with open(path, 'r') as f:
        lines = f.read().split('\n')

    # Drop empty lines and comment lines that start with '#'
    lines = [x for x in lines if x and not x.startswith('#')]
    lines = [x.strip() for x in lines]
    mdefs = []  # module definitions
    for line in lines:
        if line.startswith('['):  # marks the start of a module
            '''
            eg:
            [shortcut]
            from=-3
            activation=linear
            '''
            mdefs.append({})
            mdefs[-1]['type'] = line[1:-1].rstrip()
            if mdefs[-1]['type'] == 'convolutional':
                mdefs[-1]['batch_normalize'] = 0 
        else:
            key, val = line.split("=")
            key = key.rstrip()

            if 'anchors' in key:
                mdefs[-1][key] = np.array([float(x) for x in val.split(',')]).reshape((-1, 2))
            else:
                mdefs[-1][key] = val.strip()

    # Check all fields are supported
    supported = ['type', 'batch_normalize', 'filters', 'size',\
                 'stride', 'pad', 'activation', 'layers', \
                 'groups','from', 'mask', 'anchors', \
                 'classes', 'num', 'jitter', 'ignore_thresh',\
                 'truth_thresh', 'random',\
                 'stride_x', 'stride_y']

    f = []  # fields
    for x in mdefs[1:]:
        [f.append(k) for k in x if k not in f]
    u = [x for x in f if x not in supported]  # unsupported fields
    assert not any(u), "Unsupported fields %s in %s. See https://github.com/ultralytics/yolov3/issues/631" % (u, path)

    return mdefs

In the code above, the part that needs to change is the supported list; we add our new fields to it:

supported = ['type', 'batch_normalize', 'filters', 'size',\
            'stride', 'pad', 'activation', 'layers', \
            'groups','from', 'mask', 'anchors', \
            'classes', 'num', 'jitter', 'ignore_thresh',\
            'truth_thresh', 'random',\
            'stride_x', 'stride_y',\
            'ratio', 'reduction', 'kernelsize']
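
The effect of this change can be illustrated with a reduced version of the field check at the end of parse_model_cfg. This is only a sketch: the names NEW_FIELDS and unsupported_fields are made up for the example, and the supported list is shortened.

```python
# Reduced sketch of the "unsupported fields" check. NEW_FIELDS and
# unsupported_fields are illustrative names, not repository identifiers.
NEW_FIELDS = ['ratio', 'reduction', 'kernelsize']
supported = ['type', 'batch_normalize', 'filters', 'size', 'stride', 'pad',
             'activation'] + NEW_FIELDS


def unsupported_fields(mdefs, supported):
    """Return the keys used in mdefs that are not in the supported list."""
    seen = []
    for mdef in mdefs:
        for key in mdef:
            if key not in seen:
                seen.append(key)
    return [key for key in seen if key not in supported]


good = [{'type': 'cbam', 'ratio': '16', 'kernelsize': '7'}]
bad = [{'type': 'cbam', 'ratio': '16', 'kernel_size': '7'}]

print(unsupported_fields(good, supported))  # []
print(unsupported_fields(bad, supported))   # ['kernel_size']
```

The key has to be spelled in the cfg file exactly as it is listed in supported, which is why kernelsize (without an underscore) is used throughout.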

3. Implementing SE and CBAM

For the underlying principles, see the two posts [Attention mechanisms in CV] The simplest and most easily implemented SE module and [Attention mechanisms in CV] ECCV 2018 Convolutional Block Attention Module. The code below is taken directly from those two posts:

SE

import torch
import torch.nn as nn


class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y.expand_as(x)
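
To get a feel for the cost of this module, the sketch below counts the weights in the two bias-free nn.Linear layers of SELayer. The function name se_param_count is an assumption for illustration, and 1024 is used because it is the filter count of the last backbone convolution in yolov3-tiny.cfg.

```python
def se_param_count(channel, reduction=16):
    """Number of weights in SELayer.fc (two bias-free linear layers)."""
    hidden = channel // reduction
    # Linear(channel -> hidden) plus Linear(hidden -> channel)
    return channel * hidden + hidden * channel


print(se_param_count(1024, 16))  # 131072
print(se_param_count(1024, 4))   # 524288
```

A smaller reduction value gives the module more capacity at the price of more parameters.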

CBAM

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super(SpatialAttention, self).__init__()
        assert kernel_size in (3, 7), "kernel size must be 3 or 7"
        padding = 3 if kernel_size == 7 else 1

        self.conv = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avgout = torch.mean(x, dim=1, keepdim=True)
        maxout, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avgout, maxout], dim=1)
        x = self.conv(x)
        return self.sigmoid(x)
    
class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super(ChannelAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)

        # Shared MLP implemented with 1x1 convolutions
        self.sharedMLP = nn.Sequential(
            nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False), nn.ReLU(),
            nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False))
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avgout = self.sharedMLP(self.avg_pool(x))
        maxout = self.sharedMLP(self.max_pool(x))
        return self.sigmoid(avgout + maxout)

Add the code for the modules above to the models.py file.

4. Designing the cfg file

Here we take yolov3-tiny.cfg as the baseline and add the attention module to it.

CBAM and SE are handled the same way. Taking SE as the example, we add it right after the backbone to refine the extracted features.

[net]
# Testing
batch=1
subdivisions=1
# Training
# batch=64
# subdivisions=2
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

[se]
reduction=16

# Add the SE module at the end of the backbone
#####backbone######

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear



[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=18
activation=linear

[yolo]
mask = 0,1,2
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=1
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
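
Inserting a new layer shifts the absolute index of every layer after it, so the [route] layers are worth checking. The sketch below lists the layers of the cfg above by hand (without the [net] header) and resolves both routes. The labels in LAYERS and the helper resolve are made up for illustration.

```python
# Layers of the cfg above in order; index 0 is the first [convolutional].
LAYERS = [
    'conv16', 'maxpool', 'conv32', 'maxpool', 'conv64', 'maxpool',
    'conv128', 'maxpool', 'conv256', 'maxpool', 'conv512', 'maxpool',
    'conv1024', 'se', 'conv256_1x1', 'conv512', 'conv18', 'yolo',
    'route', 'conv128_1x1', 'upsample', 'route', 'conv256', 'conv18', 'yolo',
]


def resolve(route_index, offsets):
    """Map the values of a [route] layer to absolute layer indices."""
    return [o if o >= 0 else route_index + o for o in offsets]


first_route = resolve(18, [-4])      # [route] layers = -4
second_route = resolve(21, [-1, 8])  # [route] layers = -1, 8
print([LAYERS[i] for i in first_route])   # ['conv256_1x1']
print([LAYERS[i] for i in second_route])  # ['upsample', 'conv256']
```

Both routes still point at the same layers as in the unmodified yolov3-tiny.cfg, because [se] sits after layer 8 and outside the span covered by the relative offset -4. If the module were inserted before layer 8, the absolute index in the second route would have to be increased by one.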

5. Model construction

With the preparations above done, and taking SE as the example, we modify the part of models.py that builds the model from the cfg file, together with the forward function, so that the new module works:

In the create_modules function of models.py, add:

        elif mdef['type'] == 'se':
            modules.add_module(
                'se_module',
                SELayer(output_filters[-1], reduction=int(mdef['reduction'])))
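
Because SE only rescales the channels, its output has the same shape as its input, so the layers that follow see the same channel count as before. The sketch below checks this on an illustrative 13x13 feature map. SELayer is repeated from section 3 so that the example runs on its own, and prev_filters stands in for output_filters[-1].

```python
import torch
import torch.nn as nn


class SELayer(nn.Module):
    """SELayer as listed in section 3, repeated so the check runs alone."""

    def __init__(self, channel, reduction=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)   # squeeze: (B, C)
        y = self.fc(y).view(b, c, 1, 1)   # excitation: (B, C, 1, 1)
        return x * y.expand_as(x)         # rescale each channel


mdef = {'type': 'se', 'reduction': '16'}  # as returned by the cfg parser
prev_filters = 1024                       # stands in for output_filters[-1]
se = SELayer(prev_filters, reduction=int(mdef['reduction']))

x = torch.randn(2, prev_filters, 13, 13)
y = se(x)
print(y.shape == x.shape)  # True
```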

Then modify the forward function of the Darknet class:

def forward(self, x, var=None):
    img_size = x.shape[-2:]
    layer_outputs = []
    output = []

    for i, (mdef,
            module) in enumerate(zip(self.module_defs, self.module_list)):
        mtype = mdef['type']
        if mtype in ['convolutional', 'upsample', 'maxpool']:
            x = module(x)
        elif mtype == 'route':
            layers = [int(x) for x in mdef['layers'].split(',')]
            if len(layers) == 1:
                x = layer_outputs[layers[0]]
            else:
                try:
                    x = torch.cat([layer_outputs[i] for i in layers], 1)
                except:  # apply stride 2 for darknet reorg layer
                    layer_outputs[layers[1]] = F.interpolate(
                        layer_outputs[layers[1]], scale_factor=[0.5, 0.5])
                    x = torch.cat([layer_outputs[i] for i in layers], 1)

        elif mtype == 'shortcut':
            x = x + layer_outputs[int(mdef['from'])]
        elif mtype == 'yolo':
            output.append(module(x, img_size))
        layer_outputs.append(x if i in self.routs else [])

Adding the SE module to forward is actually very simple. The SE module is handled exactly like the convolutional, upsample, and maxpool layers and needs no extra operations, so only this part of the code above has to change:

    for i, (mdef,
            module) in enumerate(zip(self.module_defs, self.module_list)):
        mtype = mdef['type']
        if mtype in ['convolutional', 'upsample', 'maxpool', 'se']:
            x = module(x)

The overall process for CBAM is similar. Try it yourself as a way to become familiar with the overall flow of YOLOv3.
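
As a starting point for that exercise, here is one possible sketch of a CBAM wrapper. It is not code from the original posts: the class name CBAM and its wiring are assumptions, following the channel-then-spatial order described in the CBAM paper. The two attention classes are restated in compact form so the example runs on its own.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, in_planes, ratio=16):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.shared_mlp = nn.Sequential(
            nn.Conv2d(in_planes, in_planes // ratio, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(in_planes // ratio, in_planes, 1, bias=False))

    def forward(self, x):
        avg_out = self.shared_mlp(self.avg_pool(x))
        max_out = self.shared_mlp(self.max_pool(x))
        return torch.sigmoid(avg_out + max_out)  # shape (B, C, 1, 1)


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        assert kernel_size in (3, 7), "kernel size must be 3 or 7"
        self.conv = nn.Conv2d(2, 1, kernel_size,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        # shape (B, 1, H, W)
        return torch.sigmoid(self.conv(torch.cat([avg_out, max_out], dim=1)))


class CBAM(nn.Module):
    """Hypothetical wrapper: channel attention, then spatial attention."""

    def __init__(self, in_planes, ratio=16, kernel_size=7):
        super().__init__()
        self.channel = ChannelAttention(in_planes, ratio)
        self.spatial = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.channel(x)     # reweight channels
        return x * self.spatial(x)  # reweight spatial positions


# mdef as the cfg parser would produce it for a [cbam] block
mdef = {'type': 'cbam', 'ratio': '16', 'kernelsize': '7'}
cbam = CBAM(1024, ratio=int(mdef['ratio']),
            kernel_size=int(mdef['kernelsize']))
out = cbam(torch.randn(2, 1024, 13, 13))
print(out.shape)  # torch.Size([2, 1024, 13, 13])
```

With such a wrapper, the remaining steps would mirror SE: a 'cbam' branch in create_modules that builds the module from output_filters[-1], and 'cbam' added to the list of layer types that forward applies directly.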

Postscript: The content of this post is simple: it only adds attention modules, which is easy to implement. Where exactly to place the attention mechanism and how many modules to use, however, must be verified by experiment. Attention is not a cure-all; it takes parameter tuning to get satisfactory results. You are welcome to contact me, join the group chat, and report the results on your own datasets.

PS: Take care of your health these days, and wear a mask when you go out.

Origin www.cnblogs.com/pprp/p/12241054.html